Hadoop Adjuncts Proliferate: YARN, KOYA, Slider, and, Yes...Kafka

by Ostatic Staff - Feb. 06, 2015

Lately we've been covering tools that orbit Hadoop in the Big Data ecosystem, ranging from Elasticsearch to Qubole, which offers analytics on Hadoop data as a service (HaaS), to the Apache Spark project. In this arena, Kafka and YARN are much talked about. YARN is a sub-project of Hadoop at the Apache Software Foundation that takes Hadoop beyond batch to enable broader data processing. Kafka allows a single cluster to serve as the central data backbone for a large organization. With it, data streams are partitioned and spread over a cluster of machines, which allows for streams larger than any single machine could handle.
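For readers who want a concrete picture of what "partitioned and spread over a cluster" means, here is a minimal sketch in Python. It is an illustration only, not Kafka's actual code: the function names, the broker names, and the use of CRC32 as the hash are all assumptions made for this example, and real Kafka clients use their own partitioning logic.

```python
import zlib
from collections import defaultdict


def assign_partition(key, num_partitions):
    """Map a message key to a partition number.

    CRC32 is a stand-in hash chosen for this sketch; it is not
    what Kafka clients use.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def spread_stream(messages, num_partitions, brokers):
    """Group (key, value) messages by the (broker, partition) that would hold them.

    Partitions are assigned to the hypothetical brokers round-robin,
    so no single machine has to hold the whole stream.
    """
    placement = defaultdict(list)
    for key, value in messages:
        partition = assign_partition(key, num_partitions)
        broker = brokers[partition % len(brokers)]
        placement[(broker, partition)].append(value)
    return dict(placement)


if __name__ == "__main__":
    # Hypothetical broker names and sample messages.
    brokers = ["broker-0", "broker-1", "broker-2"]
    stream = [("user-1", "click"), ("user-2", "view"), ("user-1", "purchase")]
    for (broker, partition), values in sorted(spread_stream(stream, 6, brokers).items()):
        print(broker, "partition", partition, values)
```

Because the partition is derived from the key, messages that share a key land in the same partition and keep their relative order, while different keys can be served by different machines.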

Back in November, DataTorrent announced KOYA, an open source initiative to integrate Kafka and YARN. With KOYA, Kafka runs under the YARN umbrella and can utilize its centrally managed pool of resources.

Apache Slider from Hortonworks was built to enable long-running services on YARN without making changes to the services themselves. Slider allows users to create and run different versions of long-running applications in Hadoop with YARN. DataTorrent has been able to bring Kafka to YARN using Slider.

According to a guest post by DataTorrent on Hortonworks' blog:

"It makes sense to integrate Kafka with YARN. Existing investments and skills can be leveraged. Kafka running under the YARN umbrella can utilize the centrally managed pool of resources. The process monitoring and recovery features of YARN can be extended to provide complete HA for Kafka servers (Kafka provides replicated partitions, but it does not offer automation for dealing with failed brokers)."

"Given the background, why not set out and write a completely new application master for KOYA? Considering our goals with KOYA and that Kafka was built with fault tolerance in mind and already provides most of the HA features, we evaluated Apache Slider. Slider was built to enable long running services on YARN without making changes to the services themselves. We found it sufficient to bring Kafka to YARN using Slider as it provides much of the infrastructure required for KOYA."

KOYA requires Hadoop 2.6. It supports installation with embedded Slider or as a Slider application package that can be added to an existing Slider install. KOYA consists of Python scripts for the agent and configuration files.

KOYA is under development as open source and is attracting collaboration from the Kafka and YARN communities. The first official release is scheduled for the second quarter of this year.