
Google And Embedded Vision: Cat Cognition And Practical Applications


Around two months ago, I mentioned some of the notable image processing-based technologies that Google's R&D lab was busy improving and turning into publicly available products. Here's another one, involving neural network-based analysis and identification. Earlier this summer, in a blog post co-authored by company Fellow Jeff Dean and Andrew Ng, a Visiting Faculty member from Stanford University, Google announced a "purrfect" breakthrough:

Today’s machine learning technology takes significant work to adapt to new uses. For example, say we’re trying to build a system that can distinguish between pictures of cars and motorcycles. In the standard machine learning approach, we first have to collect tens of thousands of pictures that have already been labeled as “car” or “motorcycle”—what we call labeled data—to train the system. But labeling takes a lot of work, and there’s comparatively little labeled data out there.

Fortunately, recent research on self-taught learning (PDF) and deep learning suggests we might be able to rely instead on unlabeled data—such as random images fetched off the web or out of YouTube videos. These algorithms work by building artificial neural networks, which loosely simulate neuronal (i.e., the brain’s) learning processes.
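To make the quoted idea concrete, here's a toy-scale sketch of unsupervised feature learning with a sparse autoencoder, assuming PyTorch. The layer sizes, sparsity weight, and random stand-in "images" below are illustrative placeholders; Google's actual billion-connection network was far more elaborate.

    # Train a sparse autoencoder on unlabeled images so that hidden units
    # learn useful features without any labels ("self-taught" learning).
    import torch
    import torch.nn as nn

    class SparseAutoencoder(nn.Module):
        def __init__(self, n_inputs=32 * 32, n_hidden=256):
            super().__init__()
            self.encoder = nn.Sequential(nn.Linear(n_inputs, n_hidden), nn.Sigmoid())
            self.decoder = nn.Linear(n_hidden, n_inputs)

        def forward(self, x):
            h = self.encoder(x)        # hidden "feature" activations
            return self.decoder(h), h  # reconstruction plus features

    model = SparseAutoencoder()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Stand-in for a batch of unlabeled frames (e.g., YouTube stills),
    # flattened to vectors and scaled to [0, 1].
    unlabeled = torch.rand(512, 32 * 32)

    for _ in range(100):
        recon, hidden = model(unlabeled)
        # Reconstruction error plus an L1 sparsity penalty on the hidden
        # units; sparsity pushes individual units to specialize.
        loss = nn.functional.mse_loss(recon, unlabeled) + 1e-4 * hidden.abs().mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()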

Neural networks are very computationally costly, so to date, most networks used in machine learning have used only 1 to 10 million connections. But we suspected that by training much larger networks, we might achieve significantly better accuracy. So we developed a distributed computing infrastructure for training large-scale neural networks. Then, we took an artificial neural network and spread the computation across 16,000 of our CPU cores (in our data centers), and trained models with more than 1 billion connections.
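As a hedged illustration of the data-parallel half of that idea, here's a toy synchronous "parameter server" loop in NumPy: each simulated worker computes a gradient on its own data shard, and the averaged gradient updates the shared parameters. The linear model, shard sizes, and learning rate are placeholders; Google's real infrastructure also partitioned the model itself across thousands of cores, at vastly larger scale than this sketch.

    # Toy "parameter server": workers compute gradients on private data
    # shards; the server averages them and updates shared parameters.
    import numpy as np

    rng = np.random.default_rng(0)
    n_features, n_workers = 100, 8
    params = rng.normal(size=n_features)      # shared model parameters

    # Each worker holds its own shard of (x, y) data for a linear model.
    shards = [(rng.normal(size=(64, n_features)), rng.normal(size=64))
              for _ in range(n_workers)]

    def worker_gradient(params, x, y):
        # Gradient of mean squared error for y ~ x @ params on one shard.
        residual = x @ params - y
        return x.T @ residual / len(y)

    for step in range(200):
        grads = [worker_gradient(params, x, y) for x, y in shards]
        params -= 0.01 * np.mean(grads, axis=0)   # server-side update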

We then ran experiments that asked, informally: If we think of our neural network as simulating a very small-scale “newborn brain,” and show it YouTube video for a week, what will it learn? Our hypothesis was that it would learn to recognize common objects in those videos. Indeed, to our amusement, one of our artificial neurons learned to respond strongly to pictures of… cats. Remember that this network had never been told what a cat was, nor was it given even a single image labeled as a cat. Instead, it “discovered” what a cat looked like by itself from only unlabeled YouTube stills. That’s what we mean by self-taught learning.
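One way to probe what such a unit has learned (and a technique the paper itself used to visualize the "cat neuron") is to numerically optimize an input image so that it maximizes the unit's activation. Below is a minimal gradient-ascent sketch of that trick against a stand-in encoder; the unit index, step size, and iteration count are arbitrary illustrative choices, not values from the paper.

    # Gradient ascent on the input: find an image that maximally excites
    # one hidden unit, approximating that unit's preferred stimulus.
    import torch
    import torch.nn as nn

    # Stand-in for a trained network's feature layer (in the experiment
    # described above, this would be the learned billion-connection net).
    encoder = nn.Sequential(nn.Linear(32 * 32, 256), nn.Sigmoid())

    unit = 42                                  # hypothetical neuron to probe
    image = torch.rand(1, 32 * 32, requires_grad=True)

    for _ in range(200):
        activation = encoder(image)[0, unit]
        activation.backward()                  # d(activation)/d(pixels)
        with torch.no_grad():
            image += 0.1 * image.grad          # step uphill in pixel space
            image.clamp_(0.0, 1.0)             # keep pixels in a valid range
            image.grad.zero_()
    # "image" now approximates the unit's optimal stimulus.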

Using this large-scale neural network, we also significantly improved the state of the art on a standard image classification test—in fact, we saw a 70 percent relative improvement in accuracy. We achieved that by taking advantage of the vast amounts of unlabeled data available on the web, and using it to augment a much more limited set of labeled data. This is something we’re really focused on—how to develop machine learning systems that scale well, so that we can take advantage of vast sets of unlabeled training data.
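In practice, that "augment a limited set of labeled data" step usually amounts to reusing the features learned from unlabeled data as the input to a supervised classifier. Here's a hedged sketch of the recipe, again assuming PyTorch, with a stand-in pretrained encoder and a placeholder class count and labeled set.

    # Semi-supervised recipe: keep the unsupervised features fixed and
    # train only a small classifier head on the scarce labeled examples.
    import torch
    import torch.nn as nn

    # Stand-in for an encoder pretrained on unlabeled data (e.g., the
    # autoencoder from the earlier sketch).
    encoder = nn.Sequential(nn.Linear(32 * 32, 256), nn.Sigmoid())
    head = nn.Linear(256, 10)                 # 10 classes, placeholder
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

    labeled_x = torch.rand(64, 32 * 32)       # small labeled set
    labeled_y = torch.randint(0, 10, (64,))

    for _ in range(100):
        with torch.no_grad():                 # reuse unsupervised features as-is
            features = encoder(labeled_x)
        loss = nn.functional.cross_entropy(head(features), labeled_y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()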

Given the comparative prevalence of cat videos on YouTube, the neural network's particular adeptness at identifying cats versus other subjects is perhaps not surprising 😉 Accurate identification of human faces was another notable success story for the algorithm.

For more information on the project, you can check out the comparatively detailed writeup at Google+, along with the full paper presented at the International Conference on Machine Learning in late June. The announcement also received a fair bit of print and online editorial attention, beginning with a fairly comprehensive article from John Markoff at the New York Times and followed by coverage from a number of other media outlets.

It's perhaps no shock to learn that Google has broader search-based aspirations for its artificial intelligence breakthroughs, and not just of an image-centric nature, either. To wit, note a Google Research blog post published two months ago that discussed speech recognition applications for deep learning. A 20+ percent accuracy improvement for U.S. English was officially announced at a conference last month and picked up by MIT Technology Review a few days ago, with subsequent coverage at Slashdot and VentureBeat.
