Cloudera Deepens Integration of Spark with Hadoop

by Ostatic Staff - Dec. 04, 2015

Cloudera, a company focused on big data and Apache Hadoop, has announced that it has further matured Apache Spark integration within Hadoop environments. Spark and Hadoop are both flourishing on the big data scene. To further expand the enterprise capabilities of Spark, Cloudera has added support for Spark SQL and MLlib to Cloudera Enterprise 5.5 and CDH 5.5, which the company launched recently.

Thanks to its ease of development and flexible data processing, Spark has taken off in the open source community and across customer use cases. It is the most active project in the Apache Software Foundation (ASF), with more than 800 developers from more than 200 companies, and IBM is spending huge sums on initiatives surrounding it. According to the company, Cloudera’s team of Spark committers has been actively driving the enterprise capabilities of Spark and uniting Spark with Hadoop to meet customer needs and further production adoption. Cloudera has also provided a related infographic.

“The embrace of Spark by the developer community and Cloudera’s efforts in the past year to drive its mainstream adoption have been nothing short of remarkable,” said Doug Cutting, chief architect at Cloudera. “With the most customers running Spark with Hadoop, we have already made impressive strides in furthering the enterprise capabilities of Spark for Hadoop deployments across industries and use cases. With the addition of Spark SQL and MLlib to Cloudera’s platform, and a clear roadmap with the One Platform Initiative, Spark adoption will continue to soar for batch, streaming, and machine learning use cases.”

Cloudera claims that over the past year it has made significant strides in maturing Spark to address a wider range of data processing use cases, including end-to-end Internet of Things (IoT) applications, simpler batch processing, and native machine learning.