Spark Gains More Momentum; New, Free Insider's Guide Available

by OStatic Staff - Nov. 10, 2015

Folks in the Big Data and Hadoop communities have been getting increasingly interested in Apache Spark, an open source cluster computing framework for data analytics originally developed in the AMPLab at UC Berkeley. Apache notes that Spark can run programs up to 100 times faster than Hadoop MapReduce in memory, and ten times faster on disk. When crunching large data sets, those are big performance differences.

In OStatic's recent interview with Eucalyptus cloud originator Rich Wolski, he cited Spark and other technologies competitive with MapReduce as being very interesting. Now, insideBIGDATA is offering An Insider’s Guide to Apache Spark, available for download from the insideBIGDATA White Paper Library.

According to insideBIGDATA:

"All of the major Hadoop distributions now support Spark, and with good reason: Spark is vendor agnostic, which means it doesn’t tie the user to any specific provider. Due to Spark’s open-source nature, businesses are free to create a Spark-based analytics infrastructure without worrying about what happens if they change Hadoop vendors later. If they make a switch, they can bring their analytics with them."

Additionally, there are some new survey findings regarding Spark.

Typesafe recently conducted a survey of the Spark ecosystem. Key takeaways show how Spark is being deployed:

For data sources, 62% of Spark survey respondents were using HDFS. Nearly half, 46%, were using some form of database. 41% were using Kafka, and 29% were using Amazon S3.

For cluster management, 56% were running standalone Spark. 42% were on YARN, and 26% were on Apache Mesos.

For languages, 88% were using Scala, 44% were using Java, and 22% were using Python.

According to the Typesafe summary of the survey results: "By far the most desirable features are Spark's vastly improved processing power over MapReduce (over 78 percent mention this) and the ability to process event streams (over 66 percent mention this), which MapReduce cannot do."