Apache Hadoop is a free Java software framework that supports data-intensive distributed applications running on large clusters of commodity computers. It enables applications to e...
At today's Hadoop Summit in Silicon Valley, Yahoo! announced the availability of the Yahoo! Distribution of Hadoop, a source-only version of Apache Hadoop that Yahoo! uses within its own search engine. Hadoop is an open source software framework that helps process very large data sets; it is widely used in large-scale data mining applications and in search tools at sites like Facebook and many others. For developers and users interested in Hadoop, it's worth noting that the Yahoo! Distribution of Hadoop has been tested and developed at Yahoo! for years, as Eric Baldeschwieler, VP of grid computing at Yahoo!, described in detail here.
What’s next for Hadoop, the open source software framework that helps process very large data sets? "We’re in the midst of a data-mining renaissance, and Hadoop is playing a leading role," writes Gary Orenstein on GigaOm. Hadoop recently helped the Yahoo! Developer Network set a new record in data sorting, and it is reaching other milestones. Check out the GigaOm story.
Last summer, we reported on Microsoft's acquisition (reportedly for $100 million) of Powerset, which specializes in semantic search based on the open source, cluster-based software framework Hadoop. This acquisition of an open source-centric search company was more strategic than many people realize. Hadoop also underlies Yahoo!'s search engine, giving it the ability to search large data sets quickly, and the acquisition of Powerset may have played a key part in Microsoft's decision to give up its effort to acquire Yahoo!
Of course, Microsoft's big search engine news of the week is Bing, which I've found to have both strengths and weaknesses. Surprisingly, as The Register reports, Powerset's technology plays only a small part in how Bing works, but the part it does play is open source-driven and interesting.