Dispatches from Spark Summit: What You Need to Know
As we've been reporting in conjunction with Spark Summit this week in San Francisco, the Big Data and Hadoop communities are becoming increasingly interested in Apache Spark, an open source cluster computing framework for data analytics originally developed in the AMPLab at UC Berkeley. At Spark Summit, IBM, Microsoft and other organizations upped the ante on Spark, delivering new initiatives and tools.
There are also some numbers coming out that illustrate Spark's momentum. Here are the most important Spark-related sound bites from this week.
According to Stack Overflow’s recent developer survey, Spark tied with Scala as the top-paying technology, with Spark developers in the U.S. earning an average of $125,000 per year.
Meanwhile, there are over 1,000 contributors to Spark, Databricks leaders said at this week’s Spark Summit. Databricks was out with major news, too. The company has announced the General Availability of Databricks Community Edition (DCE), a free version of the just-in-time data platform built on top of Apache Spark.
"This year we've seen explosive growth for the Apache Spark project and all signs indicate the pace will only accelerate as the community expands even more," said Matei Zaharia, cofounder and chief technology officer at Databricks. "Databricks Community Edition has created an ideal environment for learning Apache Spark. Developers of all backgrounds can now use Databricks Community Edition to learn Spark and mitigate the acute Spark skills gap."
IBM had previously announced a major commitment to Apache Spark, billing it as "potentially the most important new open source project in a decade that is being defined by data." Now, the company has launched Data Science Experience, a cloud-based development environment for Apache Spark that is intended to help data scientists work more efficiently with developers to build smarter apps. According to TechRepublic:
"The Data Science Experience is available through IBM's Cloud Bluemix platform, and it provides curated data sets, open source toolsets, and a collaborative space. In theory, it will allow data scientists to provide developers with better insights and data-driven models to be used in application development."
Microsoft also had major Spark news. Building on its previous investments, Microsoft announced a large commitment to Spark, similar in scope to IBM's earlier one. Microsoft said it will be integrating its HDInsight, Cortana Intelligence Suite, Power BI and Microsoft R Server with Spark.
Additionally, MapR Technologies, known for its focus on Hadoop, announced a new enterprise-grade Apache Spark distribution. This new distribution includes the complete Spark stack engineered to support advanced analytic applications, along with patented innovations in the MapR Platform, plus several open source projects that complement Spark.
It was also clear at the summit that the Spark skills gap is real. Companies need workers experienced with Spark, and toward that end, Databricks has been promoting free Spark training.