Cloudera Delivers Release Built on Apache Spark 2.0, and Advances Kudu

by Ostatic Staff - Sep. 30, 2016

Cloudera, focused on Apache Hadoop and other open source technologies, has announced a release built on Apache Spark 2.0 (Beta), with enhancements to the API experience, performance improvements, and enhanced machine learning capabilities.

The company is also working with the community to continue developing Apache Kudu 1.0, recently released by the Apache Software Foundation, which we covered earlier. Kudu is an open source columnar storage engine built for the Apache Hadoop ecosystem and designed to enable flexible, high-performance analytic pipelines. Taken together, Cloudera's new tools give it a more diverse presence on the Big Data scene.

Cloudera claims it was the first Hadoop big data analytics vendor to deliver a commercially supported version of Spark, and has participated actively in the open source community to enhance Spark for the enterprise through its One Platform Initiative. "With Spark 2.0, organizations are better able to take advantage of streaming data, develop richer machine learning models, and deploy them in real time, enabling more workloads to go into production," the company reports.

Spark 2.0 features include: 

  • Better performance and enhanced usability with the new Dataset API
  • Structured Streaming for better performance and easier ingest of traditional structured data for time series, tabular, and Internet of Things (IoT) data
  • Compile-time type safety for user-defined functions for improved reliability in mission-critical applications
  • Machine learning model and pipeline persistence, plus newly supported machine learning libraries, to take on new data sets and analytic applications

"Cloudera was the first vendor to offer a commercially supported version of Apache Spark in our big data platform. In the years since then, Spark has become a standard for stream processing and machine learning workloads across the industry," said Mike Olson, founder and chief strategy officer at Cloudera. "As a component of a Cloudera enterprise data hub, Spark benefits from the security, manageability, data governance, and compliance services that customers demand. It can handle high-scale, high-performance workloads reliably. [We are] a part of the global Spark community, and committed to continued enhancements for demanding enterprises."

As for Apache Kudu, it's one of several Big Data tools that the Apache Software Foundation has recently graduated to Top-Level Project status. Cloudera donated Kudu to the Apache Software Foundation (ASF) to open it to the broader developer community and expand the type and variety of use cases for it. "While Spark 2.0 will give businesses better access to streaming data, Kudu 1.0 will enable enterprises to adopt real-time use cases at a greater pace," Cloudera claims.

“Kudu is a response to the increase in prevalence of real-time analytic use cases in the market,” said Charles Zedlewski, vice president, Products at Cloudera. “As far back as 2012, Cloudera recognized the analytic gap in the Hadoop ecosystem that was leading architects to create complex hybrid architectures for real-time analytics. With the Apache Kudu 1.0 launch, the original vision is coming to fruition as users can now rely on a single, simplified project for fast analytics on fast data. We’ve seen the community quickly adopt Kudu and apply it to numerous high-scale, real-time analytic use cases.”

Kudu offers fast scans across data for analytics, and instant read/write capabilities for frequent updates and searches.

Here are the download links for the new tools:

  • Download Spark 2.0
  • Download Kudu 1.0