Google Cloud Dataflow Stacks Up with Spark in Benchmark Tests

by Ostatic Staff - May 3, 2016

Apache Spark has rocketed to success as an in-memory data processing framework, and it even has a billion-dollar development commitment from IBM behind it. But Mammoth Data, which specializes in Big Data consulting, has announced the findings of its latest cloud solution benchmark study, which compares Google Cloud Dataflow and Apache Spark. Surprisingly, in performance tests, Cloud Dataflow was very competitive with Spark.

In its benchmark tests, Mammoth Data identified five key advantages of using Google Cloud Dataflow:

Performance: Google Cloud Dataflow provides dynamic work rebalancing and intelligent auto-scaling, which enable increased performance with no added operational complexity.

Developer friendly: Google Cloud Dataflow features a developer-friendly API with a unified approach to batch and streaming analysis.

Operational simplicity: Google Cloud Dataflow holds distinct advantages with a job-centric and fully managed resource model.

Easy integration: Google Cloud Dataflow can easily be integrated with Google Cloud Platform and its various services.

Open-source: Google Cloud Dataflow's API was recently accepted as an Apache Software Foundation incubator project called Apache Beam.

"Google Cloud Platform data processing and analytics services are aimed at removing the implementation complexity and operational burden found in traditional Big Data technologies. Mammoth Data found that Cloud Dataflow outperformed Apache Spark, underscoring our commitment to balance performance, simplicity and scalability for our customers," said Eric Schmidt, product manager for Google Cloud Dataflow.

"When Google asked us to compare Dataflow to other Big Data offerings, we knew this would be an exciting project," said Andrew C. Oliver, president and founder of Mammoth Data. "We were impressed by Dataflow's performance, and think it is a great fit for large-scale ETL or data analysis workloads. With the Dataflow API now part of the Apache Software Foundation as Apache Beam, we expect the technology to become a key component of the Big Data ecosystem."

Spark grabs so many headlines that it is easy to think of it as the only option for many Big Data tasks, but Cloud Dataflow is worth a look, especially for enterprises.

Here are the complete Benchmark Results.