IBM's Spark-Driven Data Science Experience Cozies Up to GitHub
A few months ago, we caught up with Kavitha Mariappan, Vice President of Marketing at Databricks, for a guest post on open source tools and the rapidly evolving field of data science. She noted that Apache Spark supports complete data science pipelines with libraries that run on the Spark engine.
Also a few months ago, IBM announced the first Spark-driven, cloud-based development environment for near-real-time, high-performance analytics, "giving data scientists the ability to access and ingest data and deliver insight-driven models to developers." Available on the IBM Cloud Bluemix platform, the Data Science Experience provides 250 curated data sets, open source tools and a collaborative workspace to help data scientists uncover and share meaningful insights with developers, making it easier to rapidly develop applications that are infused with intelligence. Now, IBM has announced that it is integrating GitHub with the Data Science Experience to improve collaboration among data scientists.
According to Big Blue:
"Last year we announced a strategic partnership with GitHub. GitHub boasts roughly 12 million users, including some 60,000 organizations and many data scientists. We are excited to announce the integration of GitHub with the IBM Data Science Experience to enhance the collaboration between data scientists. Now you can combine code, data and visualizations in real-time with your colleagues."
"Millions of developers use GitHub to build personal projects, support their businesses, and work together on open source technologies," said GitHub's Todd Berman, VP of Engineering. "GitHub is a powerful addition to IBM's Bluemix that builds on our strategic partnership to dramatically advance the development of next generation cloud applications for enterprise customers."
Building on its $300 million investment in developing Apache Spark as a type of "analytics operating system," IBM created the Data Science Experience to extend Spark's data crunching power to more than two million members of the R community through new contributions to SparkR, SparkSQL and Apache SparkML.
"The Data Science Experience's open and collaborative environment allows data scientists to accelerate and simplify data ingestion, curation and analysis by bringing together the content, data, models, and open source resources from IBM and others including H2O, RStudio, Jupyter Notebooks on Apache Spark in a single security-rich managed environment," claims IBM.
"With Apache Spark, we see an opportunity to significantly transform the role of the data scientist by providing access to curated data sets, open source tools and a collaborative platform to accelerate innovation," said Bob Picciano, Senior Vice President, IBM Analytics. "IBM's Data Science Experience is the killer enterprise app for Apache Spark, and gives data scientists new opportunities to deliver insight-driven models to developers, and opens the door for unprecedented innovation from the open source community."