Myriad Project Marries YARN and Apache Mesos Resource Management
There are a lot of interesting announcements arriving as the O'Reilly Strata event rolls out. In one notable example, MapR and Mesosphere have announced a new open source Big Data framework, called Myriad, that allows Apache YARN jobs to run alongside other applications and services in enterprise and cloud datacenters. The initiative was started by a developer at eBay and grew into a collaborative effort among multiple companies. The project is now approaching Apache incubation. Here are more details.
Myriad (available on GitHub) is an open source project focused on consolidating big data with other workloads in the datacenter into a single pool of resources for operational efficiency. There are plans to submit Myriad as an Apache Incubator project with the Apache Software Foundation in the first quarter of 2015.
According to the announcement of Myriad:
"To date, Hadoop developers have been forced to run big data jobs on dedicated clusters, leaving those resources isolated from other applications and services in production, and typically resulting in poor server utilization rates. Myriad leverages both Apache YARN and Apache Mesos, allowing big data workloads to run alongside other applications including long-running Web services, streaming applications (like Storm), build systems, continuous integration tools (like Jenkins), HPC jobs (like MPI), Docker containers, as well as custom scripts and applications."
“Big data developers no longer have to choose between YARN and Mesos for managing clusters,” said Florian Leibert, CEO and co-founder of Mesosphere. “Myriad allows you to run both, and to run all of your big data workloads and distributed applications and systems on a single pool of resources. Big data developers get the best of YARN’s power for Hadoop-driven workloads, and Mesos’ ability to run any other kind of workload, including non-Hadoop applications like Web applications and other long-running services.”
YARN is a sub-project of Hadoop at the Apache Software Foundation that takes Hadoop beyond batch jobs to enable broader data processing. Tools like these, which aim to improve on popular forms of batch processing, came up in our recent interview with Eucalyptus cloud platform founder Rich Wolski. We also discussed related topics with Mesosphere's Ben Hindman in a recent interview.
Mesosphere has also published a post on Myriad, which notes the following:
"Today, Mesosphere and MapR are proud to announce project Myriad, an open source framework for running YARN on Mesos that integrates the two major powerhouses in the datacenter—Mesos and Hadoop—and makes them fully compatible technologies."
For organizations that run YARN, conventional practice has been for operations teams to create a statically partitioned cluster dedicated to YARN workloads. The YARN cluster would run only Hadoop workloads and nothing else. It would have its own hardware or cloud instances, its own operations team, and could not share resources with other workloads in the datacenter.
"Project Myriad combines the best of YARN and Mesos, allowing modern Hadoop workloads to run elastically with other datacenter and cloud workloads, thereby sharing resources with all of the organization's Linux applications (e.g., web servers, Java apps) as well as their datacenter services like Cassandra, Kafka, Elasticsearch, and Kubernetes."