Netflix Open Sources Genie Tool for Hadoop Services on AWS

by Ostatic Staff - Jun. 24, 2013

Although cloud computing platforms, including leading open source platforms, make headlines every day now, there is still a great need for reliable, proven tools and components for cloud deployments. Los Gatos, Calif.-based Netflix has extensive experience with running cloud services, and last year the company began steadily open sourcing a series of interesting components that it has deployed as satellite utilities orbiting its central cloud platform.

First, Netflix released Chaos Monkey, and then came Janitor Monkey. Chaos Monkey randomly kills instances within Netflix's architecture, working on the assumption that constant failures will help build robust defenses against catastrophic failure. Janitor Monkey is a service that runs in the Amazon Web Services (AWS) cloud looking for unused resources to clean up. Now, Netflix has open sourced its Genie tool for running Hadoop jobs on AWS.

Sriram Krishnan from Netflix has a blog post up about Genie:

"Salient features of our architecture include the use of Amazon’s Simple Storage Service (S3) as our "source of truth", leveraging the elasticity of the cloud to run multiple dynamically resizable Hadoop clusters to support various workloads, and our horizontally scalable Hadoop Platform as a Service called Genie. Today, we are pleased to announce that Genie is now open source, and available to the public from the Netflix OSS GitHub site."

"Genie provides job and resource management for the Hadoop ecosystem in the cloud. From the perspective of the end-user, Genie abstracts away the physical details of various (potentially transient) Hadoop resources in the cloud, and provides a REST-ful Execution Service to submit and monitor Hadoop, Hive and Pig jobs without having to install any Hadoop clients. And from the perspective of a Hadoop administrator, Genie provides a set of Configuration Services, which serve as a registry for clusters, and their associated Hive and Pig configurations."

Netflix runs Hadoop clusters in the cloud to support a number of different types of services. Some of the tasks run nightly in algorithmic fashion, and some run occasionally on an ad-hoc basis. The Netflix post stresses that Genie is not a workflow scheduler such as Oozie: Genie’s unit of execution is a single Hadoop, Hive or Pig job.
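To make the idea of submitting and monitoring a single job over REST more concrete, here is a rough Python sketch of what a client might look like. It is an illustration only: the article does not give Genie's actual host, resource paths or field names, so `BASE_URL`, `JOBS_PATH`, `jobType`, `cmdArgs` and `userName` are all hypothetical placeholders, and the code builds a request without sending it.

```python
import json
from urllib.request import Request

# Hypothetical values: the article does not specify Genie's real host,
# resource paths, or JSON field names. Everything here is a placeholder.
BASE_URL = "http://genie.example.com"
JOBS_PATH = "/jobs"

# The three job types the Netflix post names as Genie's units of execution.
SUPPORTED_JOB_TYPES = {"hadoop", "hive", "pig"}


def build_job_request(job_type, command_args, user):
    """Build (but do not send) an HTTP POST describing one job."""
    job_type = job_type.lower()
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type!r}")
    # Assumed payload shape, for illustration only.
    payload = {"jobType": job_type, "cmdArgs": command_args, "userName": user}
    return Request(
        url=BASE_URL + JOBS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def status_url(job_id):
    """Return the (assumed) URL a client would poll to monitor a job."""
    return f"{BASE_URL}{JOBS_PATH}/{job_id}"


if __name__ == "__main__":
    req = build_job_request("hive", "-f query.q", "analyst")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    print(status_url("job-123"))
```

The point of the sketch is that the client needs nothing beyond an HTTP library: no Hadoop, Hive or Pig client is installed locally, and each request describes exactly one job rather than a workflow.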

There is a useful diagram showing how Genie works at the bottom of this page.

Netflix has more tools from its cloud arsenal due to be open sourced. The notable thing about these utilities and components is that they were designed by cloud-experienced engineers to solve real cloud problems. As open source cloud deployments proliferate, these are worth watching.