IBM's Massive Spark Initiatives Include an Offering for Data Scientists

by Ostatic Staff - Jun. 30, 2016

People all over the Big Data and Hadoop communities are becoming increasingly interested in Apache Spark, an open source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. In fact, Spark has drawn billions of dollars of commitments from organizations including IBM. As I covered here, IBM has called Apache Spark "potentially the most important new open source project in a decade that is being defined by data."

IBM is weaving Spark into its own product offerings, and the company also has a new offering for data scientists that leverages Spark and cloud services. Big Blue announced what it describes as the first cloud-based development environment for near real-time, high performance analytics, "giving data scientists the ability to access and ingest data and deliver insight-driven models to developers." Available on the IBM Cloud Bluemix platform, the Data Science Experience provides 250 curated data sets, open source tools and a collaborative workspace to help data scientists uncover and share meaningful insights with developers, making it easier to rapidly develop applications that are infused with intelligence.

Building on its $300 million investment in developing Apache Spark as a type of "analytics operating system," IBM created the Data Science Experience to extend Spark's data crunching power to more than two million members of the R community through new contributions to SparkR, SparkSQL and Apache SparkML.

"The Data Science Experience's open and collaborative environment allows data scientists to accelerate and simplify data ingestion, curation and analysis by bringing together the content, data, models, and open source resources from IBM and others including H2O, RStudio, Jupyter Notebooks on Apache Spark in a single security-rich managed environment," claims IBM.

"With Apache Spark, we see an opportunity to significantly transform the role of the data scientist by providing access to curated data sets, open source tools and a collaborative platform to accelerate innovation," said Bob Picciano, Senior Vice President, IBM Analytics. "IBM's Data Science Experience is the killer enterprise app for Apache Spark, and gives data scientists new opportunities to deliver insight-driven models to developers, and opens the door for unprecedented innovation from the open source community."

With H2O.ai as a partner, IBM can also leverage open source machine learning and artificial intelligence tools with its offering. We interviewed H2O.ai officials here and here. Oleg Rogynskyy, VP of Marketing & Growth at H2O.ai, said that its tools help drive better predictions.

IBM is tying its Spark initiatives to the rise of the Internet of Things, too. As data and analytics are embedded into all kinds of objects and apps as part of the Internet of Things (IoT) push, IBM claims that "Spark brings essential advances to large-scale data processing." The company says Spark dramatically improves the performance of data-dependent apps and radically simplifies the process of developing intelligent apps, which are fueled by data.