Q&A: MapR Wraps Apache Drill into its Hadoop Distribution

by Ostatic Staff - May 19, 2015

MapR Technologies, which focuses on Apache Hadoop, today announced the general availability of Apache Drill 1.0 in the MapR Distribution. Drill, which we've covered before, delivers self-service SQL analytics without requiring predefined schemas, dramatically reducing the time required for business analysts to explore and understand data. It also enables interactivity with data from both legacy transactional systems and new data sources, such as Internet of Things (IoT) sensors, Web click-streams, and other semi-structured data, along with support for popular business intelligence (BI) and data visualization tools.

“Backed by a vibrant open source community, Apache Drill combines on-the-fly schema discovery with the familiarity of ANSI SQL so analysts can interactively explore any type of data in a self-service fashion,” said Anil Gadre, senior vice president, product management, MapR Technologies.

For more on Drill and the MapR distribution, we interviewed MapR's VP of Product Management Tomer Shiran. Here are Tomer's thoughts in a Q&A session:

What are the advantages that customers can leverage with Drill, and why did you wrap it into your distribution?

When you look at the role of a query engine, you want to be able to execute queries really quickly on a lot of data. But you don’t really want a lot of overhead from things like loading data, creating and maintaining schemas, and transporting and converting data.

Drill gives you what you want from a query engine without all of that overhead. It makes for much more agile queries.

Gartner researchers recently reported that some enterprises are finding Hadoop just plain difficult, and some can’t find skilled Hadoop workers. Do you think Drill can make leveraging Hadoop simpler?

That’s exactly what Drill does. It can open Hadoop up to a much broader set of users, so you don’t have just developers. You can get analysts, data scientists and business users involved. Drill leaves all the advantages you get with Hadoop in terms of being agile and flexible, as well.

The first generation of Hadoop users really relied on running SQL on Hadoop, but that can limit much of Hadoop’s agility and flexibility. You have to maintain schemas and pre-process your data with that approach. Drill can help anyone who is capable of using a Business Intelligence tool leverage Hadoop.

Drill works with legacy systems as well as new and emerging types of data streams, correct?

Yes, and it’s also worth taking a step back on this topic. For forty years the relational database was basically a monopoly—the only game in town. Only in the last five years, and especially right now, are we seeing the rise of non-relational data stores. That’s due to two things: the volume of data has grown incredibly, and the data itself is rapidly changing with developers adding new fields and changing the structure of the data.

The data captured in Internet of Things (IoT) streams, for example, is rapidly changing in structure. Lots of data types are doing so. It’s essential for data-centric tools to work with all kinds of emerging data types.

Editor's Note: This is our latest interview in a series of talks with project leaders working on the cloud, Big Data, and the Internet of Things, which have included talks with Rich Wolski, who founded the Eucalyptus cloud project, Ben Hindman from Mesosphere, Tomer Shiran of the Apache Drill project, Philip DesAutels, who oversees the AllSeen Alliance, CEO of StackStorm Evan Powell, Tomer Shiran on MapR and Hadoop, the University of Washington team behind Grappa for data analytics, and co-founder of Mirantis Boris Renski.