Baidu Delivers a Hardened Open Source Deep Learning Tool

by Ostatic Staff - Sep. 01, 2016

A few weeks ago, in an article for TechCrunch, Spark Capital's John Melas-Kyriazi weighed in on how startups can leverage artificial intelligence and deep learning tools to advance their businesses or even give birth to brand new ones. In a subsequent post, I noted that quite a few of these tools have been tested and hardened at Google, Facebook, Microsoft and other companies, and some of them may represent business opportunities. Since then, there has been more action on the open source AI and deep learning front.

Now, Baidu, the leading Chinese language Internet search provider, has announced it will release an open source software platform for the deep learning community. Tested, hardened open source tools in this technology space are proliferating.

Baidu's PaddlePaddle will be released on GitHub on Sept. 30 with full documentation and specs. A pre-release alpha version is available now. PaddlePaddle is being released under an Apache license.

Originally developed for internal use by Baidu's own engineers, the platform focuses on data handling and specifying model structure. "Its ease of use makes it a natural starting point for programmers and enthusiasts who want to apply deep learning to their projects and products," the company claims. It has been used to develop a range of Baidu's products and technologies in areas such as advertising, search ranking, large-scale image classification, optical character recognition and machine translation.

Pieter Abbeel, Associate Professor, UC Berkeley EECS and Research Scientist, OpenAI, commented, "Progress in AI is critically dependent on time spent in software development to prototype and test new ideas. Sharing software development frameworks, like Baidu is doing with its platform, is key to accelerating progress for the entire community."

Xu Wei, Baidu scientist and leader of PaddlePaddle development, said, "With this platform, designing a deep learning model is like writing pseudocode. Engineers can focus on the high-level structure of their model without worrying about the low-level details. We expect it to be useful to programmers who want to quickly apply deep learning models to problems in areas that can really benefit from AI, such as health care and finance."

Dr. Andrew Ng, Baidu's chief scientist, added, "Other deep learning platforms have been a great boon to researchers wanting to invent new deep learning algorithms, but their high degree of flexibility limits their ease of use. In contrast, PaddlePaddle focuses on making it easy for enthusiasts and programmers -- not just machine learning researchers -- to learn and use powerful deep learning tools."

In other open source deep learning and AI news, you may want to look into the following developments:

Nervana. Just recently, Nervana Systems, a startup focused on artificial intelligence and deep learning, announced that it had released its Neon deep learning software under an Apache open source license, allowing anyone to try it out for free. Soon after that, Intel announced that it is acquiring the company. Neon is written in Python, and includes a Machine Learning Operations (MOP) Layer, allowing other deep learning systems, like Theano and Caffe, to integrate with it.

H2O.ai. In recent interviews here on OStatic, we have explored the efforts of H2O.ai, formerly known as 0xdata, which has steadily been carving out a niche with its open source software for big data analysis and machine learning. You can get the main H2O platform and Sparkling Water, a package that works with Apache Spark, by simply downloading them. You can run them on clusters powered by Amazon Web Services (AWS) and others for just a few hundred dollars.

From Redmond. Microsoft CEO Satya Nadella has been very enthusiastic about AI. Microsoft has open sourced the artificial intelligence framework it uses to power speech recognition in its Cortana digital assistant and Skype Translator applications. The framework, called the Computational Network Toolkit (CNTK), can help machines do things like understand speech and determine logical connections between photos. Microsoft released CNTK as an open source project on GitHub, and developers are likely to leverage it to advance deep learning networks.

Facebook On Board. In early 2015, Facebook open sourced modules for the Torch deep learning toolkit. It has since followed up with Torchnet, a framework built on Torch. According to Facebook leaders: "Torchnet provides a collection of boilerplate code, key abstractions, and reference implementations that can be snapped together or taken apart and then later reused, substantially speeding development. It encourages a modular programming approach, reducing the chance of bugs while making it easy to use asynchronous, parallel data loading and efficient multi-GPU computations."

Meanwhile, Facebook has open sourced its machine learning system designed for artificial intelligence (AI) computing at a large scale. It's based on Nvidia hardware. Facebook's Kevin Lee and Serkan Piantino wrote in a blog post that the open sourced AI hardware is more efficient than off-the-shelf options because the servers can be operated within data centers based on Open Compute Project standards.

Google's TensorFlow. In numerous recent posts, we covered Google's decision to open source a program called TensorFlow and the related platform TensorFlow Serving. These are based on the same internal toolset that Google has spent years developing to support its AI software and other predictive and analytics programs. TensorFlow is rapidly gaining momentum. 

It is being leveraged by researchers who need to analyze very large sets of complex data, according to Google. A Google post describes TensorFlow Serving this way:

"TensorFlow Serving is a high performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow. TensorFlow Serving is ideal for running multiple models, at large scale, that change over time based on real-world data."

Caffe. Yahoo has released its key artificial intelligence (AI) software under an open source license. The company previously developed a library called CaffeOnSpark to perform “deep learning” on the big troves of data stored in its Hadoop file system. Now CaffeOnSpark is available for community use under an open source Apache license on GitHub. CaffeOnSpark works with x86 chips or graphics processing units (GPUs). It can be run on cloud infrastructure or within data centers. Among its many uses at Yahoo, it has helped make connections for content recommendations.