
Deep Learning at the Boston Image Processing Computer Vision Group

This blog post was originally published at Auviz Systems' website. It is reprinted here with the permission of Auviz Systems.

Since I spend my summers in New England, I often look for ways to tap into the technology community there. We have been doing a lot of work at Auviz on Deep Learning and are also planning a full day on this topic at next year's Embedded Vision Summit, so last week I attended a Boston Image Processing Computer Vision Group (BIPCVG) Meetup on Deep Learning at the Microsoft NERD (New England Research and Development) Center, located in Cambridge, Massachusetts, near MIT. Started in 2012 by Youssef Rouchdy and Samson Timoner, the BIPCVG now has roughly 900 members, meets once a month, and covers a range of computer vision topics. This month's topic was Deep Learning and Vision, and it was a full house with 200 attendees. Not only was it a great location (with a high-rise view of Boston and the Charles River), but there was also free pizza, networking, and an opportunity for a one-minute pitch at the end. The agenda was a nice mix of topics covering research, applications, and products and tools.

First up was Professor Tomaso Poggio from MIT, speaking on the "Visual Cortex and Deep Networks." His talk was fascinating, since he has been working in this field for years and is now part of the Center for Brains, Minds and Machines, whose mission is to create a new field, the Science and Engineering of Intelligence, by bringing together computer scientists, cognitive scientists, and neuroscientists to work in close collaboration. This new field is dedicated to developing a computationally based understanding of human intelligence and establishing an engineering practice based on that understanding. Needless to say, a lot of research is going into how the brain (specifically the visual cortex) works and how to use this understanding to improve Deep Learning. Clearly, visual understanding is at the heart of intelligence, and how the mammalian brain learns to recognize objects, faces, etc. is key to understanding how to take full advantage of Deep Learning. A takeaway for me was that while the research is still in its early stages, we are quickly learning how the brain actually learns (and studying infants is an important part of that).

Next up was Daniel McDuff from Affectiva, presenting "Deep Emotion Learning: Mining the World's Largest Facial Expression Dataset." Affectiva has to date analyzed over 3 million faces from more than 75 countries. McDuff did a good job of laying out the challenges in decoding emotion from visual images. The company is applying Deep Learning in the hope of speeding up both the analysis and the application of the results. Affectiva is based on research by Professor Rosalind Picard at the MIT Media Lab, who was a speaker at a past Embedded Vision Summit. Professor Picard's original work initially used only cameras but later added skin sensors. Maybe a role for the Apple Watch? Professor Picard coined the term Affective Computing, hence the company name. Affectiva has released its Affdex SDK for building applications. I can think of lots of cool ones, starting with trying to figure out what my wife's emotional state is (or my son's).

Finally, Barton Fiske from NVIDIA spoke on "Accelerated Deep Learning with GPUs." NVIDIA is doing great work in this area and has built a complete platform for Deep Learning. The speedup over CPUs is impressive, and they have a great roadmap. The inherent parallelism of GPUs is what delivers the acceleration. At Auviz, we are working with FPGAs to do the same thing, but the NVIDIA platform is superior at this point for the training part of Deep Learning. Once a network is trained, either GPUs or FPGAs can run the network and the algorithms. Our AuvizDNN library matches the functionality of NVIDIA's cuDNN, which is used to create and train the networks. Regardless of which hardware platform you deploy your algorithms on, you will most likely start with NVIDIA.

This Meetup was oversubscribed, which speaks to the high level of interest from developers, product people, and investors, and to how Deep Learning is poised to transform and accelerate "visual intelligence," which is why we founded the Embedded Vision Alliance four-plus years ago. As I mentioned earlier, we plan a full day of technical and business discussions at the next Embedded Vision Summit, May 2-4, 2016, in Santa Clara, California. Auviz will have some new products there, and I hope to see you there as well.

By Vin Ratford
Co-Founder, Auviz Systems
Executive Director, Embedded Vision Alliance