Google Opens Up What it Bills as the World's Best Language Parser

by Ostatic Staff - May 13, 2016

Artificial intelligence and machine learning are going thoroughly open source, with some of the biggest tech companies contributing projects to the community. Recently, I covered Google's decision to open source a program called TensorFlow. It’s based on the same internal toolset that Google has spent years developing to support its AI software and other predictive and analytics programs.

Now, in a follow-on move, Google is open sourcing SyntaxNet, natural-language understanding software that can automatically parse sentences. SyntaxNet is implemented in TensorFlow, Google's open source machine learning library, and has been hardened and tested by Google. It includes code for training new models, as well as a pre-trained model for parsing English text.

As Wired notes:

"Using deep neural networks, SyntaxNet parses sentences in an effort to understand what role each word plays and how they all come together to create real meaning. The system tries to identify the underlying grammatical logic—what’s a noun, what’s a verb, what the subject refers to, how it relates to the object—and then, using this info, it tries to extract what the sentence is generally about—the gist, but in a form machines can read and manipulate."

And Google's Research Blog adds:

"Our release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that we have trained for you and that you can use to analyze English text. Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language, and that can explain the functional role of each word in a given sentence. Because Parsey McParseface is the most accurate such model in the world, we hope that it will be useful to developers and researchers interested in automatic extraction of information, translation, and other core applications of NLU.

SyntaxNet applies neural networks to the ambiguity problem. An input sentence is processed from left to right, with dependencies between words being incrementally added as each word in the sentence is considered. At each point in processing many decisions may be possible—due to ambiguity—and a neural network gives scores for competing decisions based on their plausibility."
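To make that description a little more concrete, here is a minimal Python sketch of a left-to-right parser that adds dependencies one decision at a time. It is not SyntaxNet code. The three actions (an "arc-standard" style transition system), the function names, and the hand-written PLAUSIBLE table are all assumptions made for illustration; in particular, toy_score is a stand-in for the neural network that would score competing decisions in a real system.

```python
from typing import Callable, Dict, List

SHIFT, LEFT_ARC, RIGHT_ARC = "SHIFT", "LEFT_ARC", "RIGHT_ARC"

# Hypothetical (head, dependent) pairs. A trained model would learn
# such preferences from data; this table is hard-coded for one sentence.
PLAUSIBLE = {("ROOT", "saw"), ("saw", "Alice"), ("saw", "Bob")}


def toy_score(action: str, stack: List[int], buffer: List[int],
              tokens: List[str]) -> float:
    """Stand-in for the neural network: rate one candidate decision."""
    if action == SHIFT:
        return 0.5
    top, below = tokens[stack[-1]], tokens[stack[-2]]
    if action == LEFT_ARC:  # top of stack becomes head of the word below it
        return 1.0 if (top, below) in PLAUSIBLE else 0.0
    # RIGHT_ARC: hold off while the top word may still gain dependents.
    waiting = any((top, tokens[i]) in PLAUSIBLE for i in buffer)
    return 1.0 if (below, top) in PLAUSIBLE and not waiting else 0.0


def legal_actions(stack: List[int], buffer: List[int]) -> List[str]:
    actions = []
    if buffer:
        actions.append(SHIFT)
    if len(stack) >= 2:
        actions.append(RIGHT_ARC)
        if stack[-2] != 0:  # ROOT (index 0) can never be a dependent
            actions.append(LEFT_ARC)
    return actions


def parse(words: List[str], score: Callable[..., float]) -> Dict[int, int]:
    """Greedy left-to-right parse; returns {dependent index: head index}."""
    tokens = ["ROOT"] + words
    stack, buffer = [0], list(range(1, len(tokens)))
    heads: Dict[int, int] = {}
    while buffer or len(stack) > 1:
        # Several decisions may be possible; take the best-scoring one.
        best = max(legal_actions(stack, buffer),
                   key=lambda a: score(a, stack, buffer, tokens))
        if best == SHIFT:
            stack.append(buffer.pop(0))
        elif best == LEFT_ARC:
            dep = stack.pop(-2)
            heads[dep] = stack[-1]
        else:  # RIGHT_ARC
            dep = stack.pop()
            heads[dep] = stack[-1]
    return heads


tokens = ["ROOT", "Alice", "saw", "Bob"]
for dep, head in sorted(parse(tokens[1:], toy_score).items()):
    print(f"{tokens[dep]} <- {tokens[head]}")
```

On the toy sentence, "Alice" and "Bob" both attach to "saw," and "saw" attaches to ROOT. The sketch always commits to the single best-scoring action at each step; the quoted description does not say whether SyntaxNet keeps more than one candidate alive, so that choice is also an assumption here.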

In other open source news from Google, the company has contributed OpenThread to the community. It was posted on GitHub this week under a BSD license and can help developers build low-power mesh networking into smart home products. It comes from the Google subsidiary Nest.