LinkedIn Open Sources Highly Useful Hadoop Tools

by Ostatic Staff - Aug. 17, 2015

From Google to Netflix to Facebook, many large tech companies are good citizens when it comes to open sourcing tools that can be widely used by others, and now LinkedIn is joining the crowd. Developers at LinkedIn internally created a plugin for the Gradle build system, comprising several workflow tools that simplify connecting multiple Hadoop jobs within the context of an application.

Many organizations know the challenge of building applications that run across a Hadoop cluster. LinkedIn's Hadoop Plugin for Gradle is a potentially effective solution.

The LinkedIn Gradle Plugin for Apache Hadoop ("Hadoop Plugin") includes the LinkedIn Gradle DSL for Apache Hadoop ("Hadoop DSL"). You can get the Hadoop Plugin on GitHub today.

LinkedIn has already adopted Gradle as its primary build system. "With Gradle, developers can easily extend the build system by defining their own plugins," the company says. "We developed the Hadoop Plugin to help our Hadoop application developers more effectively build, test and deploy Hadoop applications. The Plugin includes the Hadoop DSL, a domain-specific language for specifying jobs and workflows for Hadoop workflow managers like Azkaban and Apache Oozie."

A post from Alex Bain adds:

"In particular, the Hadoop Plugin includes tasks that will help you more easily work with a number of Hadoop application frameworks. Since no one tool is perfect for every kind of job, Hadoop jobs at LinkedIn are written using a number of different application frameworks. The Hadoop Plugin enables developers to organize their Hadoop projects in a consistent fashion regardless of the particular tool they choose for the job."

"Long before the Hadoop Plugin, Hadoop developers at LinkedIn had realized that writing individual Hadoop jobs was only part of the challenge in using Hadoop effectively. Most data-driven features that appear on LinkedIn are actually generated by processing pipelines that may consist of dozens of individual Hadoop jobs chained together into workflows managed in Azkaban or Oozie. Understanding the relationships between jobs in a workflow and managing the workflow specification files became a challenge in itself."

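To make the idea of jobs "chained together into workflows" concrete, here is a minimal Python sketch. It is not the Hadoop DSL and does not use any Azkaban or Oozie API; the job names and the execution_order helper are hypothetical, invented purely for illustration. It models a workflow as a map from each job to the jobs it depends on, then uses the standard library's graphlib module (Python 3.9 or later) to work out an order in which every job runs after its dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
# None of these names come from LinkedIn; they are illustrative only.
workflow = {
    "extract_profiles": set(),
    "extract_connections": set(),
    "join_data": {"extract_profiles", "extract_connections"},
    "score_recommendations": {"join_data"},
    "publish_results": {"score_recommendations"},
}


def execution_order(jobs):
    """Return one valid run order in which each job follows its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(workflow)
print(" -> ".join(order))
```

Even in this five-job toy, the order of the two extract jobs is interchangeable, and a single mistyped dependency would change the result. With dozens of jobs, that bookkeeping is the kind of problem a workflow specification language is meant to take off developers' hands.
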
You can find out much more about Gradle here.