Report Shows Hadoop on the Rise, But Demand for Spark is Most Notable

by OStatic Staff - Jan. 19, 2016

Syncsort, which specializes in Big Data analytics and mainframe software, has announced the results of its second annual Hadoop survey. The survey shows that as more organizations move from Hadoop experimentation to production and work to realize the full potential of big data analytics, they will focus on a few top areas in 2016.

Notably, the survey also shows strong demand for Spark in the analytics community.

Based on the survey results, Syncsort reports on three trends that it anticipates in 2016:

1. Apache Spark will move from a talking point into deployment. Nearly 70 percent of respondents are most interested in Apache Spark, surpassing interest in all other compute frameworks, including the recognized incumbent, MapReduce (55 percent). While Syncsort expects MapReduce will still be the prevalent compute framework in production, the high level of interest should translate into more Spark deployments, mostly running on Hadoop.

2. Offloading from expensive platforms into Hadoop will continue to increase in numbers and scope. 63 percent of respondents feel Hadoop will help them increase business/IT agility, 55 percent expect to increase operational efficiency and reduce costs, and over 51 percent want to leverage it to make more data available for business use across the entire organization. These findings are consistent with Syncsort customer use cases that should continue to gain steam in 2016, including Mainframe and Enterprise Data Warehouse (EDW) offload to Hadoop. 

3. A growing number of companies will look to leverage Hadoop for advanced use cases. More than half of respondents see Hadoop as a way to innovate, using data from social media and IoT, and applying predictive analytics and visualization for greater insights about their business. Hadoop has yet to be leveraged for mobile apps and software, as only 4.9 percent of respondents reported using it for these use cases.

"As Hadoop adoption becomes mainstream, the number of applications in production increases and the use cases, frameworks and data sources become more varied and complex. Organizations realize significant benefits from Hadoop; however, they also cite challenges in keeping up with new tools and skills, connectivity and data movement, and unforeseen costs," said Tendü Yoğurtçu, General Manager of Syncsort's Big Data business. "A single software environment to access all enterprise data and manage the entire data pipeline will be critical for organizations to maximize the ROI on their Big Data projects, especially as the demand for real-time analytics in industries such as financial services, healthcare, telecommunications, and retail increases."

This latest survey is not the only piece of analysis pointing to big things ahead for the open source Spark platform. Apache Spark is a data analytics cluster computing framework originally developed in the AMPLab at UC Berkeley. According to Apache, Spark can run programs up to 100 times faster than Hadoop MapReduce in memory, and ten times faster on disk. When crunching large data sets, those are big performance differences.

In OStatic's interview with Eucalyptus cloud originator Rich Wolski, he cited Spark and other technologies competitive with MapReduce as being very interesting. Databricks and Typesafe have also released survey results that bolster the case that Spark usage is on the rise.

Those survey results indicate that 13 percent of respondents are already using Spark in production environments, 20 percent plan to deploy it in production, and 31 percent are currently evaluating it. We provided more details in this post.