Cloudera Launches Plan to Help Unite Apache Spark and Hadoop

by Ostatic Staff - Sep. 10, 2015

Rarely has any Apache project gained so much momentum so early as Apache Spark has. Folks everywhere in the Big Data and Hadoop communities are becoming increasingly interested in Spark, an open source data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. In fact, IBM has called Spark "potentially the most important new open source project in a decade that is being defined by data."

Now, Cloudera, which focuses on enterprise analytic data management powered by Hadoop, has announced the One Platform Initiative, an effort to accelerate Apache Spark development for the enterprise.

Cloudera officials call Spark "already the most popular open source project in the Hadoop ecosystem," and say the company's initiative will enable Spark to become the successor to Hadoop's original MapReduce framework for general Hadoop data processing. By embedding Spark deeply and broadly across the platform, in areas spanning management, security, scalability, and streaming, Cloudera is looking to help make the next generation of analytic applications possible.

According to the company:

Over the past 18 months, Spark has seen wide adoption, with over 200 of Cloudera's customers - including Avvo, Barclays, Concur, DigitalGlobe, RelayHealth, and Santander UK - running Spark across diverse industries and for multiple use cases. Recognizing Spark's potential to become the next general processing framework for Hadoop, due to its ease-of-use for developers, modular flexibility, and performance, Cloudera invested ahead of the market in core engineering, support, services, and training to make customers successful with Spark.

As the first Hadoop vendor to ship and support Spark, Cloudera has been a leader in the Spark community and, in particular, in integrating Spark and Hadoop. With over 5x the Spark engineering resources of other Hadoop vendors, Cloudera has contributed over 370 patches and 43,000 lines of code to Spark and has made its development a key initiative with its partner, Intel. As a result, Spark is a deeply integrated and widely used component of Cloudera's Hadoop platform. This production experience has provided considerable insight into the challenges of running Spark in customer environments at scale, and extensive knowledge of engineering and analytics teams' requirements.

 "Spark is rapidly becoming a popular choice to complement Hadoop as businesses want a friendly, fast, and versatile engine to cover analytics needs around streaming, graph, and even machine learning," said Nik Rouda, senior analyst, ESG. "Cloudera is making big investments in developing and supporting Spark as a full-fledged component of their robust offerings. The big data market will continue to evolve rapidly, but this ensures Cloudera will be not only relevant, but remain a leader going forward."

"Spark is well on its way to succeeding MapReduce in enabling jobs with hundreds of executors each, running simultaneously on large multi-tenant clusters with tens of thousands of nodes, but there is still some heavy lifting to do," said Mike Olson, founder and chief strategy officer, Cloudera. "It's an ambitious goal, but with the community of committers and supporters, and our leadership, we think that's highly achievable."

For more information about the One Platform Initiative and how to get involved, you can join Doug Cutting, the co-creator of Hadoop and chief architect at Cloudera, for the webinar "Unifying Spark and Hadoop: The One Platform Initiative" on Thursday, September 24 at 10:00 a.m. PT.