Microsoft Lays Out Details on its Plans for Apache Spark
This week in San Francisco, thousands of people converged on Spark Summit to share how they use Apache Spark to get the most out of big data. Building on its previous investments, Microsoft announced a surprisingly large commitment to Spark, similar to the far-reaching commitment that IBM had previously announced. Microsoft said it will integrate HDInsight, Cortana Intelligence Suite, Power BI and Microsoft R Server with Spark.
Now, further details are emerging about Microsoft and its plans for Spark.
In a post, Microsoft detailed the following plans:
Spark for Azure HDInsight General Availability: Previously announced as public preview, Spark for Azure HDInsight is generally available today, introducing a fully managed Spark service from Hortonworks that has been hardened for the enterprise and made simpler for you to use. You can also rely on the industry’s highest availability service level agreement for Spark at 99.9%. You can get value out of Spark immediately with out-of-the-box integration with Jupyter, the most popular open source notebook for data scientists.
R Server for HDInsight in the cloud powered by Spark: Previously announced as public preview, R Server for HDInsight will be generally available in the summer, making the Spark integration available both on-premises and in the cloud. This makes it easy to move code and projects to the cloud with a few clicks and within a few minutes, without buying the hardware or hiring the specialized operations teams typically associated with big data infrastructure.
R Server for Hadoop on-premises now powered by Spark: As the leading solution in the world to run R at scale, R Server for Hadoop will support both Microsoft R and native Spark execution frameworks, available in June. Combining R Server with Spark gives users the ability to run R functions over thousands of Spark nodes, letting you train your models on data 1000x larger and 100x faster than was possible with open source R, and nearly 2x faster than Spark’s own MLlib.
Free R Client for Data Scientists: Today we are announcing Microsoft R Client, a new freely available tool for data scientists to build high performance analytics using R. R Client not only allows you to use any of the open source R functions to analyze the data present on your local workstation, it also enables you to analyze remote big data and scale out the analytics by pushing the computation to a production instance of Microsoft R Server such as SQL Server R Services, R Server for Hadoop and HDInsight with Spark. You can download Microsoft R Client today at http://aka.ms/rclient.
Power BI support for Spark Streaming: Previously announced with Power BI General Availability, Spark support in Power BI is now expanded with new support for Spark Streaming scenarios. This allows you to publish real-time events from Spark Streaming directly into one of the fastest growing visualization tools in the market today.
This week, IBM also launched Data Science Experience, a cloud-based development environment for Apache Spark that could help data scientists work efficiently with developers to build smarter apps. According to TechRepublic:
"The Data Science Experience is available through IBM's Cloud Bluemix platform, and it provides curated data sets, open source toolsets, and a collaborative space. In theory, it will allow data scientists to provide developers with better insights and data-driven models to be used in application development."
Microsoft, of course, has been increasing its focus on open source. CEO Satya Nadella has commented on how he "loves Linux," and he reportedly claims that more than 20 percent of Microsoft's Azure cloud is already Linux-based. It looks like Spark will be the next open source frontier for Microsoft.