Google's Managed Hadoop and Spark Cloud Service Goes Live

by Ostatic Staff - Feb. 24, 2016

Google has announced that its Cloud Dataproc service — a managed tool based on the Hadoop and Spark open source big data software — is now generally available. Google Cloud Dataproc, because it leverages both Apache Hadoop and Apache Spark, promises to be in strong demand, especially at enterprises.

"When analyzing data, your attention should be focused on insights, not your tools," Google notes. "Often, popular tools to process data, such as Apache Hadoop and Apache Spark, require a careful balancing act between cost, complexity, scale, and utilization. Unfortunately, this means you focus less on what is important, your data, and more on what should require little or no attention, the cluster processing it. We created our managed Spark and Hadoop cloud service, Google Cloud Dataproc, to rectify the balance, so that using these powerful data tools is as easy as 1-2-3."

According to the announcement:

"Since Cloud Dataproc entered beta last year, customers have taken advantage of its speed, scalability, and simplicity. We’ve seen them create clusters from three to thousands of virtual CPUs, using our Developers Console and Google Cloud SDK, without wasting time waiting for their cluster to be ready."

"With integrations to Google BigQuery, Google Cloud Bigtable, and Google Cloud Storage, which provide reliable storage independent from Dataproc clusters, customers have created clusters only when they need them, saving time and money, without losing data. Cloud Dataproc can also be used in conjunction with Google Cloud Dataflow for real-time batch and stream processing."

"While in beta, Cloud Dataproc added several important features including property tuning, VM metadata and tagging, and cluster versioning. In general availability, just like in beta, new versions of Cloud Dataproc, with new features, functionalities and software components, will be frequently released. One example is support for custom machine types, available today."

Also according to Google, Cloud Dataproc minimizes cost and complexity by providing:

Low-cost. We believe two things: using Spark and Hadoop should not break the bank and that you should pay for what you actually use. As a result, Cloud Dataproc is priced at only 1 cent per virtual CPU in your cluster per hour, on top of the other Cloud Platform resources you use. Moreover, with per-minute billing and a low 10-minute minimum, you pay for what you actually use, not a rounded (up) approximation.

Speed. With Cloud Dataproc, clusters do not take 10, 15, or more minutes to start or stop. On average, Cloud Dataproc start and stop operations take 90 seconds or less. This can be a 2-10x improvement over other on-premises and IaaS solutions. As a result, you spend less time waiting on clusters and more time hands-on with data.

Management. Cloud Dataproc clusters don't require specialized administrators or software products. Cloud Dataproc clusters are built on proven Cloud Platform services, such as Google Compute Engine, Google Cloud Networking, and Google Cloud Logging to increase availability while eliminating the need for complicated hands-on cluster administration. Moreover, Cloud Dataproc supports cluster versioning, giving you access to modern, tested, and stable versions of Spark and Hadoop.
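
To make the pricing item above concrete, here is a rough sketch of how the Cloud Dataproc fee could be estimated from the figures Google quotes (1 cent per virtual CPU per hour, per-minute billing, 10-minute minimum). The function name and the choice to round partial minutes up are assumptions made for illustration, not Google's published billing logic, and the result excludes the underlying Compute Engine and storage charges.

```python
import math

# Figures taken from the announcement.
DATAPROC_RATE_PER_VCPU_HOUR = 0.01  # USD per virtual CPU per hour
MINIMUM_BILLED_MINUTES = 10         # 10-minute minimum


def estimate_dataproc_fee(vcpus: int, runtime_minutes: float) -> float:
    """Estimate the Dataproc-only fee in USD for one cluster run.

    Assumption: partial minutes are rounded up to a whole minute.
    Compute Engine, storage, and other Cloud Platform costs are not included.
    """
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_minutes))
    return vcpus * DATAPROC_RATE_PER_VCPU_HOUR * billed_minutes / 60


if __name__ == "__main__":
    # A 24-vCPU cluster that runs for 45 minutes.
    print(f"${estimate_dataproc_fee(24, 45):.2f}")
    # A 3-minute job on the same cluster is billed at the 10-minute minimum.
    print(f"${estimate_dataproc_fee(24, 3):.2f}")
```

Under these assumptions, a 24-vCPU cluster running for 45 minutes would incur about $0.18 in Dataproc fees, and a 3-minute job on the same cluster would be billed as 10 minutes, or about $0.04.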

Meanwhile, partners are already aligning around the new offering. Attunity Ltd., a provider of Big Data management software solutions, announced a new cloud solution, Attunity CloudBeam for Google Cloud Dataproc. "The Attunity solution seamlessly integrates with Google Cloud Platform to enable users accelerated Big Data loading from on-premises data centers and the cloud into Spark and Hadoop on Cloud Platform," according to the company. "This enables a faster and easier process for enabling Big Data analytics in the cloud."