Hadoop Drives Storage Costs Down, Needs Friendly Front Ends
The Hadoop Summit took place this week in San Jose, California, right in the heart of Silicon Valley, sponsored by Hortonworks and Yahoo. There were some interesting keynotes, including one from Microsoft on "Transforming data into action using Hadoop, Excel, and the Cloud," while Red Hat officials delved into "Enterprise Hadoop and the open hybrid Cloud." At the Summit, it was clear that Hadoop has become a true open source success story. It's also driving down enterprise storage costs.
In conjunction with the Summit, Thomas Davenport, a distinguished professor at Babson College, delivered a keynote and penned a blog post for The Wall Street Journal. In his post, he cites an interesting reason why many organizations are looking into Hadoop:
"A single factoid will explain why Hadoop is growing rapidly in popularity among large corporations. I went to one presentation by TrueCar, a website that tracks vehicle prices. They said that their previous cost for storing a gigabyte of data (including hardware, software, and support) for a month in a data warehouse was $19. Using Hadoop, they pay 23 cents a month per gigabyte. That two orders-of-magnitude cost differential has got to be appealing to a lot of CIOs. There are performance improvements as well, of course, although they weren’t quite as dramatic as the cost reductions in the examples I heard."
The cost of storage itself, of course, has been plummeting, but it's interesting that organizations are reaping further storage savings by leveraging a free, open source software platform.
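Davenport's TrueCar figures make the scale of those savings easy to check with a little arithmetic. The sketch below simply works through the numbers quoted above; the 100 TB dataset size is a hypothetical illustration, not a figure cited at the Summit.

```python
# Storage cost figures cited by TrueCar at Hadoop Summit
# (per gigabyte per month, including hardware, software, and support).
warehouse_cost_per_gb = 19.00  # traditional data warehouse, USD
hadoop_cost_per_gb = 0.23      # Hadoop cluster, USD

ratio = warehouse_cost_per_gb / hadoop_cost_per_gb
print(f"Hadoop storage is roughly {ratio:.0f}x cheaper per GB-month")
# 19 / 0.23 ≈ 83x, i.e. close to two orders of magnitude

# Hypothetical example: annual savings on a 100 TB dataset at these rates.
gigabytes = 100 * 1024
annual_savings = (warehouse_cost_per_gb - hadoop_cost_per_gb) * gigabytes * 12
print(f"Annual savings on 100 TB: ${annual_savings:,.0f}")
```

At roughly 83x, the differential falls just short of a literal factor of 100, which is presumably why Davenport hedges with "two orders-of-magnitude" rather than a precise multiple.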
Davenport also points to the fact that many organizations are looking for a higher level of integration between Hadoop and other key software components:
"A lot of people were talking about greater integration of Hadoop with enterprise software of various types, particularly SQL. That would make it much easier to query Hadoop datasets for non-technical analysts. I expect we will see many pairings of Hadoop at the back end for cheap storage and fast processing, and familiar tools like Excel, Tableau, and SAS at the front end for analysis and user interface."
In fact, early last year I did a post on how front-end tools that bring Hadoop into familiar applications are already arriving. Talend Open Studio for Big Data provides a front end for easily working with Hadoop to mine large data sets, and is released under an Apache license. Hortonworks has bundled it with its Hadoop distribution.
The upshot from the Summit is that enterprises large and small are looking into Hadoop as a way to draw better insights from large data sets. "Overall, I get a feeling of having glimpsed the future of enterprise IT by coming to [Hadoop Summit]," concludes Davenport.