Cloudera Tests Impala Against Competitive Analytics Engines

by Ostatic Staff - Sep. 22, 2016

In the cloud and on the Big Data scene, there is a pronounced need for advanced data analytics and database-driven insights. Apache Impala has emerged as an important tool for meeting that need, and Cloudera is out with some notable test results for it. Cloudera, which focuses on Apache Hadoop, has released benchmark results that it says show its analytic database solution, powered by Apache Impala (incubating), delivering fast performance for cloud-native workloads at better cost performance than alternatives.

Here are some of the results of tests comparing this solution, running in the cloud over S3 and EBS storage, to Amazon Redshift.

Impala decouples data and compute to provide SQL analytics, whether cloud-natively over data in S3 or across on-premises and cloud storage options. Cloudera claims that Impala enables all these capabilities while also delivering up to 275% better cost-efficiency and up to 10x greater performance compared to Redshift, Amazon’s analytic database.

Using queries from the industry-standard TPC-DS benchmark, Cloudera compared Impala running in the cloud (both cloud-natively over S3 and over local EBS storage) to Amazon Redshift (which can run only over its own storage on dedicated AWS instances). Cloudera reports that Impala on EBS is 28-275% less costly and 42-400% faster than either pre-tuned or general-purpose-tuned Redshift.

“Increasingly our customers are looking to move BI and analytic workloads to cloud environments to tap into the cost-effectiveness of elastic scale and greater flexibility. But they still require the high-performance analytics and big data agility they’re used to on-premises,” said Charles Zedlewski, Vice President, Products, at Cloudera. “Impala brings all its advantages it has over traditional, on-premise analytic databases to the cloud with a modern architecture that enables unprecedented agility no matter where the data lives. This comparison is clear evidence that Impala is unmatched for these BI and analytic workloads in the cloud.”

Impala works natively with data stored on a number of storage engines, including the Amazon S3 object store.

Cloudera also reports that the latest release of Impala delivers 12x better performance on secure workloads compared to its two prior versions.

For more on Cloudera's tests, see Performance Comparison Blog of Apache Impala (incubating) and Amazon Redshift.