Q&A: Pepperdata's Chad Carson Discusses Getting Much More Out of Hadoop
In the data analytics and Hadoop arena, the folks at Pepperdata have an interesting story to tell. Pepperdata's cofounders ran the web search engineering team at Yahoo during the development of the first production use of Hadoop. They created Pepperdata with the mission of providing a simple way to prioritize Hadoop jobs, giving resources to the ones that need them most while ensuring that a company adheres to its SLAs.
The company's software installs in under 30 minutes on an existing Hadoop cluster without any modifications to the scheduler, workflow, or jobs, delivering visibility into Hadoop workloads at the task level. As part of an interview series that we’re doing here at OStatic, we caught up with Chad Carson, cofounder of Pepperdata, for a talk. Here are his thoughts.
Yahoo has a long history of using Hadoop and you were involved with the web search engineering team there that leveraged it. What did you learn from that experience and how did it lead you to found Pepperdata?
The web search team that my Pepperdata co-founder, Sean Suchter, led at Yahoo ran the first production use of Hadoop anywhere in the world. Hadoop had a huge impact on our ability to make improvements to the product very quickly and cost-effectively. I saw similar gains from using Hadoop on my sponsored search data science team, where we were able to run hundreds of live experiments each year, an increase of about 10X, leading to huge revenue gains. When we founded Pepperdata, our goal was to help enterprises get those same kinds of benefits from using Hadoop by giving it the level of performance and reliability that enterprises need.
Tell us how Pepperdata's software tools work, and why organizations focusing on Hadoop should be interested.
Pepperdata’s software (which installs on customers' existing Hadoop clusters in less than one hour) monitors and controls the use of every kind of hardware resource in real time, so that Hadoop operators can ensure that high-priority production jobs get the resources they need and complete on time. The best part: our overhead is typically 1% or less for this level of control. Pepperdata enables enterprises to rely on Hadoop to hit SLAs for production workloads.
There have been comparisons between your software and Ambari, which helps provision, manage and monitor Hadoop clusters. What are the differences between the tools?
Tools like Ambari are great for setting up and managing Hadoop clusters, and most of our customers are using tools like Ambari, Cloudera Manager, etc. Pepperdata’s software is totally complementary to those tools: we monitor and control hardware resource usage by every job and process, in real time, once they’ve started running. Enterprises that rely on Hadoop in production should definitely be using both cluster management tools and Pepperdata’s real-time cluster supervisor software; they work great together.
What do you have in the works as you develop your software?
We recently added support for Spark and we’ll be continuing to add support for new kinds of applications over the coming months.
Gartner recently reported that many enterprises are finding it hard to deploy, maintain and optimally use Hadoop. Do you think these difficulties are real, and is there a shortage of skilled Hadoop workers?
Definitely. A lot of the focus has been on a shortage of data scientists who are skilled in Hadoop, but we also see a real shortage of skilled Hadoop operators. More and more people are learning how to deploy and operate Hadoop, but industry adoption of Hadoop is growing at an even faster rate. Software like Pepperdata’s, along with tools like Ambari and Cloudera Manager, will help address the skills gap by making it easier for operators to deploy Hadoop and get predictable performance.
What other projects in the Big Data and analytics spaces have you taken notice of?
We see a lot of people using HBase, of course, along with a huge spike of interest in Spark. Spark adoption is still early, but companies are definitely starting to use it in production.
OStatic's latest series of interviews with project leaders working on the cloud, Big Data, and the Internet of Things has included talks with Rich Wolski who founded the Eucalyptus cloud project, Ben Hindman from Mesosphere, Tomer Shiran of the Apache Drill project, Philip DesAutels who oversees the AllSeen Alliance, CEO of StackStorm Evan Powell, Tomer Shiran on MapR and Hadoop, the University of Washington team behind Grappa for data analytics, Luke Marsden, co-founder of ClusterHQ, and co-founder of Mirantis Boris Renski.