Yahoo's CaffeOnSpark Tool Open Sourced, for "Deep Learning"

by Ostatic Staff - Feb. 25, 2016

Artificial intelligence and machine learning are going through a mini-renaissance right now, and some of the biggest tech companies are helping to drive the trend. Recently, I covered Google's decision to open source a program called TensorFlow. It’s based on the same internal toolset that Google has spent years developing to support its AI software and other predictive and analytics programs. Additionally, Facebook is open sourcing its machine learning system designed for artificial intelligence (AI) computing at a large scale. It's based on Nvidia hardware. And IBM announced that its proprietary machine learning program, known as SystemML, will be freely available to share and modify through the Apache Software Foundation.

Now, Yahoo has released its key artificial intelligence (AI) software under an open source license. The company previously developed a library called CaffeOnSpark to perform a popular type of AI called “deep learning” on the big troves of data found in its Hadoop file system. CaffeOnSpark is now available for community use under an open source Apache license on GitHub.

Yahoo has made lots of open source contributions over the years, and Hadoop has its roots at the company. As WIRED notes:

"CaffeOnSpark is based on deep learning, a branch of artificial intelligence particularly useful in helping machines recognize human speech, or the contents of a photo or video. Yahoo, for example, uses it to improve search results on Flickr by determining the contents of different photos. Instead of relying on the descriptions and keywords entered by the people who upload photos to the site, Yahoo teaches its computers to recognize certain characteristics of a photo, such as specific colors or even objects and animals."

CaffeOnSpark works with x86 chips or graphics processing units (GPUs). It can be run on cloud infrastructure or within data centers. Among many uses for it at Yahoo, it has helped make connections for content recommendations.