Apache Spark Gets Billed as the Next Big Data Thing
by Sam Dean - Jul. 31, 2014
People in the Big Data and Hadoop communities are becoming increasingly interested in Apache Spark, an open source cluster computing framework for data analytics originally developed in the AMPLab at UC Berkeley. According to Apache, Spark can run programs up to 100 times faster than Hadoop MapReduce in memory, and ten times faster on disk. When crunching large data sets, those are big performance differences.

Among vendors making moves surrounding Spark, Cloudera made a number of notable announcements recently. The Hadoop-focused company announced Apache Spark training "to prepare developers and software engineers to build complete, unified applications that combine batch, streaming, and interactive analytics."

"Broadly embraced by the open source community, Big Data vendors, and data-intensive enterprises for its stream processing capabilities and its support for complex, iterative algorithms, Spark offers performance gains that enable applications to run on the data in a Hadoop cluster at speeds up to 100 times faster than traditional MapReduce programs," Cloudera claims.

Cloudera already offers commercial support for Spark as part of its Cloudera Enterprise subscription, and the company recently announced a collaboration with Databricks, IBM, Intel, and MapR to broaden support for Spark as the standard data processing engine for the Hadoop ecosystem.

"Spark offers clear benefits for realizing sophisticated analytics and is quickly becoming the future of data processing on Hadoop," said Sarah Sproehnle, vice president, Education Services, Cloudera, in a statement. "With Spark, customers can realize immediate business advantages. For example, Spark Streaming enables businesses to process live data as it arrives in the enterprise data hub, rather than having to wait to batch-process it later. The fact that the same codebase can be used for streaming data and data-at-rest significantly reduces development time for Big Data applications, speeding up time-to-insight by several orders of magnitude and decreasing the need for expensive specialized systems. This is just one case where the benefits of Spark have a direct impact on a company's bottom line."

Some are actually calling Apache Spark "the next big thing in Big Data." According to a post by John Furrier: "What is the next big thing in #bigdata? It’s called Spark. Spark is a fast data analysis engine. Think Hadoop MapReduce, but 100x faster and still fully interoperable with the wider Hadoop ecosystem. Spark has the largest open-source development community in the Big Data space, after Hadoop MapReduce, with over 90 developers from 25 companies contributing code."

You can find out more about Spark from the Apache Spark project, including release notes on a brand new version that arrived a week ago. We also covered Cloudera's work with Intel and partners to deliver Hadoop appliances leveraging Apache Spark. In an announcement, Cloudera, Dell and Intel said they are launching a dedicated Dell In-Memory Appliance for Cloudera Enterprise, to be known as Dell Engineered Systems for Cloudera Enterprise. It is essentially an integrated appliance intended to make advanced Hadoop-driven analytics easy to implement in data centers, with Spark integration supplying the performance.
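
To make the "same codebase" point from Cloudera's statement a little more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use any Spark API; it only illustrates the general idea of writing the processing logic once and applying it both to data at rest and to small batches of arriving data. The function names (`count_words`, `run_batch`, `run_streaming`) and the sample data are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def count_words(lines: Iterable[str]) -> Counter:
    """Shared logic (hypothetical): count words in any iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_batch(dataset: List[str]) -> Counter:
    """Data at rest: process the whole stored data set in one pass."""
    return count_words(dataset)


def run_streaming(micro_batches: Iterable[List[str]]) -> Iterator[Counter]:
    """Live data: apply the same function to each small batch as it arrives,
    yielding a snapshot of the running total after every batch."""
    running = Counter()
    for batch in micro_batches:
        running.update(count_words(batch))
        yield Counter(running)


# Illustrative sample data (an assumption, not taken from the article).
stored = ["spark runs on hadoop", "spark streaming processes live data"]
arriving = [["spark runs on hadoop"], ["spark streaming processes live data"]]

batch_result = run_batch(stored)
stream_results = list(run_streaming(arriving))

# The final streaming total matches the batch result.
print(stream_results[-1] == batch_result)  # prints True
```

In this sketch the only difference between the two modes is how the input is fed in; the counting logic itself is written once. That is the kind of reuse the statement describes, although a real Spark application would express it through Spark's own APIs rather than plain Python functions.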