Google Rolls Out Smart, Open Image Analysis Tools

by Ostatic Staff - Feb. 19, 2016

There has been a lot of discussion in the tech arena about a next wave of applications that can understand and process the components within images. This capability has a lot of useful potential. For example, ecommerce sites could use it to create better associations between the blue sweater you are looking at online and other, similar sweaters that you might like.

On this front, back in December, Google showed off a limited preview of its Cloud Vision API, giving select developers an opportunity to run advanced image processing services that let applications intelligently handle pictures. With it, Google said that "developers can now build powerful applications that can see, and more importantly understand, the content of images." As of this week, Google has launched the open beta of its Cloud Vision service, advancing this effort.
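
To make that concrete, here is a minimal sketch of what a Cloud Vision call looks like: a single image is base64-encoded and posted to the v1 images:annotate REST endpoint, asking only for label detection. The endpoint and feature name are the public API; the API key, the sweater.jpg file, and the use of the third-party requests library are placeholder assumptions for illustration.

```python
import base64
import requests  # third-party HTTP client, assumed installed

# Read a local image and base64-encode it, as the REST API expects.
with open("sweater.jpg", "rb") as f:  # placeholder image file
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "requests": [{
        "image": {"content": image_b64},
        "features": [{"type": "LABEL_DETECTION", "maxResults": 10}],
    }]
}

resp = requests.post(
    "https://vision.googleapis.com/v1/images:annotate",
    params={"key": "YOUR_API_KEY"},  # placeholder API key
    json=payload,
)
resp.raise_for_status()

# Print the labels the service guessed for the image, with confidence scores.
for label in resp.json()["responses"][0].get("labelAnnotations", []):
    print(label["description"], round(label["score"], 2))
```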

Developers will be able to put up to 1,000 images through the service at no cost, and then pay a small fee for each group of 1,000 images they upload after that, with discounts for sending large volumes of pictures through the service. It is important to note, though, that you can send a maximum of 20 million images a month through Google Cloud Vision during the open beta test.
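
Since both billing and the beta quota are expressed in round numbers of images, a client would typically chunk its backlog accordingly. Below is a rough sketch of that bookkeeping; the 1,000-image block and the 20-million-image monthly cap come from the article, while the function and variable names are purely illustrative.

```python
# Illustrative sketch: group pending images into 1,000-image billing blocks
# and stop at the open-beta ceiling of 20 million images per month.
BILLING_BLOCK = 1_000               # images per billed unit (from the article)
OPEN_BETA_MONTHLY_CAP = 20_000_000  # open-beta limit (from the article)

def plan_uploads(image_paths, sent_this_month=0):
    """Return lists of at most 1,000 paths each, respecting the monthly cap."""
    remaining = max(OPEN_BETA_MONTHLY_CAP - sent_this_month, 0)
    to_send = image_paths[:remaining]
    return [to_send[i:i + BILLING_BLOCK]
            for i in range(0, len(to_send), BILLING_BLOCK)]
```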

According to a Google post:

Have you ever wondered how Google Photos helps you find all your favorite dog photos? With today’s release of Google Cloud Vision API, developers can now build powerful applications that can see, and more importantly understand, the content of images. The uses of Cloud Vision API are game changing to developers of all types of applications and we are very excited to see what happens next!

Advances in machine learning, powered by platforms like TensorFlow, have enabled models that can learn and predict the content of an image. Our limited preview of Cloud Vision API encapsulates these sophisticated models as an easy-to-use REST API. Cloud Vision API quickly classifies images into thousands of categories (e.g., "boat", "lion", "Eiffel Tower"), detects faces with associated emotions, and recognizes printed words in many languages. With Cloud Vision API, you can build metadata on your image catalog, moderate offensive content, or enable new marketing scenarios through image sentiment analysis.

The following set of Google Cloud Vision API features can be applied in any combination on an image:

Label/Entity Detection picks out the dominant entity (e.g., a car, a cat) within an image, from a broad set of object categories. You can use the API to easily build metadata on your image catalog, enabling new scenarios like image based searches or recommendations.

Optical Character Recognition to retrieve text from an image. Cloud Vision API provides automatic language identification, and supports a wide variety of languages.

Safe Search Detection to detect inappropriate content within your image. Powered by Google SafeSearch, the feature enables you to easily moderate crowd-sourced content.

Facial Detection can detect when a face appears in photos, along with associated facial features such as eye, nose and mouth placement, and likelihood of over 8 attributes like joy and sorrow. We don't support facial recognition and we don’t store facial detection information on any Google server.

Landmark Detection to identify popular natural and manmade structures, along with the associated latitude and longitude of the landmark.

Logo Detection to identify product logos within an image. Cloud Vision API returns the identified product brand logo, with the associated bounding polygon.
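
To show how those features combine, here is a hedged sketch of a single images:annotate request that stacks several of them on one picture and then reads back each feature's own field in the response. The feature-type strings and response keys are the public v1 REST API; the API key, the storefront.jpg file, and the helper function are assumptions made for illustration.

```python
import base64
import requests  # third-party HTTP client, assumed installed

ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"
API_KEY = "YOUR_API_KEY"  # placeholder

def annotate_image(path):
    """Run several Cloud Vision features against one image in a single call."""
    with open(path, "rb") as f:
        content = base64.b64encode(f.read()).decode("utf-8")
    features = [
        {"type": "LABEL_DETECTION", "maxResults": 5},
        {"type": "TEXT_DETECTION"},
        {"type": "SAFE_SEARCH_DETECTION"},
        {"type": "FACE_DETECTION"},
        {"type": "LANDMARK_DETECTION"},
        {"type": "LOGO_DETECTION"},
    ]
    body = {"requests": [{"image": {"content": content}, "features": features}]}
    resp = requests.post(ENDPOINT, params={"key": API_KEY}, json=body)
    resp.raise_for_status()
    return resp.json()["responses"][0]

result = annotate_image("storefront.jpg")  # placeholder image

# Each feature writes to its own field of the response.
for label in result.get("labelAnnotations", []):
    print("label:", label["description"], round(label["score"], 2))
texts = result.get("textAnnotations", [])
if texts:
    print("text found:", texts[0]["description"][:80])  # first entry is the full block
print("safe search:", result.get("safeSearchAnnotation", {}))
for face in result.get("faceAnnotations", []):
    print("face, joy likelihood:", face["joyLikelihood"])
for lm in result.get("landmarkAnnotations", []):
    print("landmark:", lm["description"], lm["locations"][0]["latLng"])
for logo in result.get("logoAnnotations", []):
    print("logo:", logo["description"])
```

Since the features can be applied in any combination, the list in the request can be trimmed to whatever a given application actually needs.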

If you find this interesting, definitely watch the video at the bottom of this post, which presents an example of how the API could be leveraged in a robotics scenario.