Google's Custom Chip Can Accelerate Machine Learning Jobs
On the heels of open sourcing TensorFlow, a possibly hugely influential contribution to the field of machine learning, Google recently officially released TensorFlow Serving to the open-source community. TensorFlow Serving is billed as "a high performance, open source serving system for machine learning models, designed for production environments and optimized for TensorFlow." TensorFlow itself is the actual engine behind several Google tools you may already use, including Google Photos and the speech recognition found in the Google app.
Now, Google has delivered another important complement to TensorFlow, which is a custom chip called Tensor Processing Unit, or TPU. At the Google I/O conference this week, CEO Sundar Pichai said the TPU provides an order of magnitude faster performance per watt than existing chips for machine learning jobs. It could help democratize inexpensive machine learning.
Google engineer Norm Jouppi, in a blog post, said:
"Machine learning provides the underlying oomph to many of Google’s most-loved applications. In fact, more than 100 teams are currently using machine learning at Google today, from Street View, to Inbox Smart Reply, to voice search. But one thing we know to be true at Google: great software shines brightest with great hardware underneath. That’s why we started a stealthy project at Google several years ago to see what we could accomplish with our own custom accelerators for machine learning applications. The result is called a Tensor Processing Unit (TPU), a custom ASIC we built specifically for machine learning — and tailored for TensorFlow. We’ve been running TPUs inside our data centers for more than a year, and have found them to deliver an order of magnitude better-optimized performance per watt for machine learning. This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore’s Law).
TPU is tailored to machine learning applications, allowing the chip to be more tolerant of reduced computational precision, which means it requires fewer transistors per operation. Because of this, we can squeeze more operations per second into the silicon, use more sophisticated and powerful machine learning models and apply these models more quickly, so users get more intelligent results more rapidly. A board with a TPU fits into a hard disk drive slot in our data center racks."