Powerset, Leveraging Open Source Hadoop, Powers Microsoft's Bing

by Ostatic Staff - Jun. 05, 2009

Last summer, we reported on Microsoft's acquisition (reportedly for $100 million) of Powerset, which specializes in semantic search built on Hadoop, the open source, cluster-based software framework. This acquisition of an open source-centric search company was more strategic than many people realize. Hadoop, which can process very large data sets quickly across clusters of machines, also underlies Yahoo!'s search engine, and the acquisition of Powerset may have played a key part in Microsoft's decision to give up its effort to acquire Yahoo!

Of course, Microsoft's big search engine news of the week is Bing, which I've found to have both strengths and weaknesses. Surprisingly, as The Register reports, Powerset's technology plays only a small part in how Bing works, but what it does contribute is open source-driven and interesting.

As a blog post from Powerset describes, "the Powerset division has contributed to Bing in both subtle and more conspicuous ways." Most notably, Powerset's technology provides a companion engine to Bing's main engine, designed to search Wikipedia. For example, go to Bing.com and search for "squirrel monkey." In the left rail of the results, you'll find a "Reference" link. Click it and you'll get a formatted version of the Wikipedia entry for squirrel monkey, with extras such as an outline of the article that links to its key sections.

What's less apparent, though, is that Bing includes the Hadoop-driven semantic wikisearch technology that is Powerset's real specialty. I've written previously about how Powerset does this with Hadoop and clusters. You can quickly get a sense of it by going to Bing.com and typing in natural-language queries. Try these at the site:

Was Einstein married?

What did Benjamin Franklin invent?

What is the top selling album of all time?

Powerset's technology in Bing delivers easily scannable answers to questions like these, and also links back to the reference source for each answer. This part of Bing is actually pretty good, though semantic search has never been perfect, and I'm surprised more people aren't talking about it as Bing rolls out. It's also an example of Microsoft leveraging open source technology in a big way.