Q&A: H2O's Oleg Rogynskyy on Open Source and Machine Learning

by OStatic Staff - Sep. 22, 2015

H2O, formerly known as 0xdata, has steadily been carving out a niche with its open source software for big data analysis and machine learning. A community has formed around the company's tools, and machine learning is a rapidly expanding field.

OStatic caught up with Oleg Rogynskyy, VP of Marketing & Growth at H2O, for an interview. Here are his thoughts.

Please tell us what H2O’s tools are good for, in terms of allowing organizations to make better sense of their stored data.

H2O’s machine learning technology can take a vast quantity of data and reduce it to manageable, actionable insights. To illustrate, one of our customers, Cisco, uses 60,000 models to predict purchasing decisions. According to Cisco’s principal data scientist, Lou Carvalheira, our technology scored all of these models accurately and 10 to 15 times faster than their previous solution. In addition, H2O makes data scientists more efficient by automating many of their most tedious duties, such as data munging, so that they can focus on more critical tasks.

What is the H2O business model and to what extent do you keep your tools open source?

We are a completely open source organization. We make the world's best machine learning technology and give it away for free. It is our open source roots that have allowed us to reach unparalleled adoption. Of the approximately 250,000 data scientists in the world, 25,000 already use H2O, even though our product has been around for less than three years. We follow the traditional open source business model pioneered by companies like Red Hat and Hortonworks: we sell enterprise service and support for our customers’ mission-critical applications.

You have some very high-profile customers, ranging from PayPal to Cisco. What do some of these organizations do with your tools?

All of our customers use our tools to make better predictions. Many, like Cisco and PayPal, already have extensive predictive models in place. For organizations like these, our primary goal is to make the predictive process easier by offering them a way to score their models faster and more accurately. Other organizations lack the in-house technical expertise to develop their own models, so we offer software and support to help them build predictive models from scratch. These models are used to predict fraud, calculate risk and determine consumer preferences, to give just a few examples.

Gartner recently issued a report on Hadoop's complexity and how many organizations are finding it difficult to work with. To what extent do you work to keep your tools simple, even when working with large data stores?

Simplicity and ease of use are extremely important to us at H2O. It’s our belief that data science should be easy and fun. The majority of our customers run H2O alongside data stored in Hadoop. That’s why H2O runs out of the box on top of customers’ Hadoop clusters; the technology is completely plug-and-play. Now that more organizations are turning to Spark, we’ve begun offering Sparkling Water + H2O, the best machine learning technology available on the Spark platform. The idea is to integrate easily with whatever solutions our customers are using.

Where can people find your tools?

All our software is available for download at h2o.ai/download.

Over the next five years, on the data crunching and analytics scenes, what do you forecast will happen? Will currently popular tools like Hadoop and MapReduce see viable competition appear?

We’re already seeing significant interest from the developer community in machine learning and its uses for app development. Underlying tools and platforms will need to abstract away much of the data science and domain science and expose intelligent APIs that can be used to build smarter applications. We’re also seeing Spark become increasingly popular for data crunching on Hadoop, overtaking MapReduce. Platforms like Hadoop and Spark will continue to evolve, and newer tools will continue to arise to support faster and easier processing of data. At H2O we provide the tools and building blocks for creating smarter applications, whether you use Hadoop, Spark or something else entirely.

Is there anything you would like to add?

If I may, I just want to add that if you’re interested in open source machine learning technology, I would encourage you to attend our annual H2O World conference this November 9th to 11th. We’re going to have some great speakers, including Hilary Mason, the founder of Fast Forward Labs, and Monica Rogati, Data Science Advisor at Data Collective. Looking forward to seeing you all there!


Editor's Note: This interview continues our comprehensive collection of recent talks with some really influential people. These have included project leaders working on the cloud, Big Data, and the Internet of Things, among them Chad Carson, co-founder of Pepperdata; Rich Wolski, who founded the Eucalyptus cloud project; Ben Hindman from Mesosphere; Tomer Shiran of the Apache Drill project; Philip DesAutels, who oversees the AllSeen Alliance; StackStorm CEO Evan Powell; Tomer Shiran on MapR and Hadoop; the University of Washington team behind Grappa for data analytics; and Mirantis co-founder Boris Renski.