Survey Shows Spark Spreading Out, Heading to the Cloud

by Ostatic Staff - Nov. 10, 2016

New survey data from nearly 7,000 respondents in the big data space are in. The survey was conducted by The Taneja Group for Cloudera, which focuses on Hadoop- and Spark-based data-centric tools. The resulting "Apache Spark Market Survey" shows that Spark is poised to break away from the Hadoop ecosystem and increasingly function as an independent data processing tool. In many instances, Spark deployments may also move from on-premises installations to the cloud.

Here are the details.

According to the survey's authors:

"We found that across the broad range of industries, company sizes, and big data maturities represented in the survey, over one-half (54%) of respondents are already actively using Spark. Spark is proving invaluable as 64% of those currently using Spark plan to notably increase their usage within the next 12 months. And new Spark user adoption is clearly growing – 4 out of 10 of those who are already familiar with Spark but not yet using it plan to deploy Spark soon.

The top reported use cases globally for Spark include the expected Data Processing/Engineering/ETL (55%), followed by forward-looking data science applications like Real-Time Stream Processing (44%), Exploratory Data Science (33%), and Machine Learning (33%). The more traditional analytics applications like Customer Intelligence (31%) and BI/DW (29%) were close behind, and illustrate that Spark is capable of supporting many different kinds of organizational big data needs. The main reasons and drivers reported for adopting Spark over other solutions start with Performance (mentioned by 74%), followed by capabilities for Advanced Analytics (49%), Stream Processing (42%) and Ease of Programming (37%).

When it comes to choosing a source for Spark, more than 6 out of 10 Spark users in the survey have considered or evaluated Cloudera, nearly double the 35% that may have looked at the Apache Download or the 33% that considered Hortonworks. Interestingly, almost all (90+%) of those looking at Cloudera Spark adopted it for their most important use case, equating to 57% of those who evaluated Cloudera overall. Organizations cited quality of support (46%) as their most important selection factor, followed by demonstrated commitment to open source (29%), enterprise licensing costs (27%) and the availability of cloud support (also 27%)."

The data show that Spark has remarkable momentum and continues to drive strategy at up-and-coming companies.

“We have seen a significant customer adoption of Spark for building data pipelines and advanced analytics,” said Anoop Dawar, vice president of product management, Spark and Hadoop, MapR Technologies. “MapR has fully supported the Spark stack for two years – more than any other vendor in this industry. Based on customer feedback, MapR provides early preview releases so data scientists and developers can try cutting edge features and then follows it up with a GA release for production deployments.”

The Taneja Group has summarized many of its survey findings in a set of infographics.