May 2014 Embedded Vision Summit West Presentations


Computer Vision for Next-Generation Products

The event for software and hardware developers who want to incorporate visual intelligence into their products

29 May 2014 • 8 am to 7:30 pm
Santa Clara Convention Center • Santa Clara, CA USA

Below are abstracts for the presentations at the now-concluded May 2014 Embedded Vision Summit West.


"Trends and Recent Developments in Processors for Vision" by Jeff Bier, BDTI
Processor suppliers are investing intensively in new processors for vision applications, employing a diverse range of architecture approaches to meet the conflicting requirements of high performance, low cost, energy efficiency, and ease of application development. In this presentation, we draw from our ongoing processor evaluation work to highlight significant recent developments in processors for vision applications, including mobile application processors, graphics processing units, and specialized vision processors. We also explore what we consider to be the most significant trends in processors for vision—such as the increasing use of heterogeneous architectures—and the implications of these trends for system designers and application developers.

"What’s New in Tools for Vision Application Design and Development?" by Jeff Bier, BDTI
Today, there’s an unprecedented diversity of tools, APIs and libraries available for product creators who are designing and implementing vision applications, algorithms, and systems. But this wealth of options also creates challenges. New offerings arrive at a dizzying pace, established tools sometimes vanish when companies are acquired or change direction, and it can be difficult to choose from the increasingly crowded field. In this talk, we highlight some of the most significant recent arrivals and departures in the realm of tools, APIs and libraries for designing and implementing vision algorithms, applications, and systems.

"Embedded Lucas-Kanade Tracking: How It Works, How to Implement It, and How to Use It" by Goksel Dedeoglu, PercepTonic
This tutorial is intended for technical audiences interested in learning about the Lucas-Kanade (LK) tracker, also known as the Kanade-Lucas-Tomasi (KLT) tracker. Invented in the early 1980s, this method has been widely used to estimate pixel motion between two consecutive frames. We present how the LK tracker works and discuss its advantages, limitations, and how to make it more robust and useful. Using DSP-optimized functions from TI's Vision Library (VLIB), we will also show how to detect feature points in real time and track them from one frame to the next using the LK algorithm. We demonstrate this on Texas Instruments' C6678 Keystone DSP, where we detect and track thousands of Harris corner features in 1080p HD resolution video.
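As a rough illustration of the core LK step (not the VLIB implementation described in the talk), the sketch below estimates a single displacement for one small window by solving the 2x2 least-squares system built from image gradients. The function name `lk_flow` and the plain-list image representation are assumptions made for this example; a practical tracker would add image pyramids, iteration, and per-feature windows.

```python
def lk_flow(prev, curr):
    """Estimate one (u, v) displacement for the whole window.

    prev, curr: equally sized 2-D lists of floats (grayscale intensities).
    Spatial gradients use central differences, the temporal gradient is a
    frame difference, and the 2x2 normal equations are solved directly.
    Returns None when the window has too little texture to resolve motion.
    """
    h, w = len(prev), len(prev[0])
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix = (prev[y][x + 1] - prev[y][x - 1]) / 2.0
            iy = (prev[y + 1][x] - prev[y - 1][x]) / 2.0
            it = curr[y][x] - prev[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-9:
        return None  # flat or edge-only window: motion is ambiguous
    # Solve [[sxx, sxy], [sxy, syy]] @ [u, v] = -[sxt, syt]
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v
```

The determinant check reflects why corner-like features, such as the Harris corners mentioned above, are preferred for tracking: both gradient directions must be present for the system to be solvable.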

"Self-Driving Cars" by Nathaniel Fairfield, Google
Self-driving cars have the potential to transform how we move: they promise to make us safer, give freedom to millions of people who can't drive, and give people back their time. The Google Self-Driving Car project was created to rapidly advance autonomous driving technology and build on previous research. For the past four years, Google has been working to make cars that drive reliably on many types of roads, using lasers, cameras, and radar, together with a detailed map of the world. Fairfield will describe how Google leverages maps to assist with challenging perception problems such as detecting traffic lights, and how the different sensors can be used to complement each other. Google's self-driving cars have now traveled more than half a million miles autonomously. In this talk, Fairfield will discuss Google's overall approach to solving the driving problem, the capabilities of the car, the company's progress so far, and the remaining challenges to be resolved.

"Computer Vision Powered by Heterogeneous System Architecture (HSA)" by Harris Gasparakis, AMD
We will review the HSA vision and its current incarnation through OpenCL 2.0, and discuss its relevance and advantages for Computer Vision applications. HSA unifies CPU cores, GPU compute units, and auxiliary co-processors (such as an ISP, DSP, and Video Codecs) on the same die. It enables all IPs to have a unified and coherent view of system memory and enables concurrent processing, allowing the most suitable IP to be used for each vision pipeline task. We will elucidate this concept with examples (such as multi-resolution optical flow and adaptive deep learning networks) and live demos. Finally, we will describe the transparent integration of OpenCL in OpenCV, soon to be released in OpenCV 3.0, first conceived and evangelized in the community by the author.

"How to Make the World More Interactive: Augmented Reality as the Interface Between Wearable Tech and the Internet of Things" by Ori Inbar,
In this talk, we will explain how augmented reality, which relies heavily on embedded vision, is transitioning from a bleeding-edge technology embraced mainly by enthusiasts to a mainstream commercial technology with applications in diverse markets ranging from mobile devices to retail point-of-sale systems to enterprise and industrial systems. We will also connect the discussion to related trends in wearable computing and the Internet of Things.

"Implementing Histogram of Oriented Gradients on a Parallel Vision Processor" by Marco Jacobs, videantis
Object detection in images is one of the core problems in computer vision. The Histogram of Oriented Gradients method (Dalal and Triggs 2005) is a key algorithm for object detection, and has been used in automotive, security and many other applications. In this presentation we will give an overview of the algorithm and show how it can be implemented in real time on a high-performance, low-cost, and low-power parallel vision processor. We will demonstrate the standard OpenCV-based HOG with Linear SVM for Human/Pedestrian detection on VGA sequences in real time. The SVM Vectors used are provided with OpenCV, learned from the Daimler Pedestrian Detection Benchmark Dataset and the INRIA Person Dataset.
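To give a sense of the basic building block of HOG, the following minimal sketch computes an unsigned-orientation gradient histogram for a single cell. It is a simplified illustration and not the OpenCV or videantis implementation: the function name `cell_histogram` is an assumption, bin assignment is hard (no interpolation between neighboring bins), and the block normalization and SVM classification stages are omitted.

```python
import math


def cell_histogram(cell, bins=9):
    """Gradient-orientation histogram for one cell.

    cell: 2-D list of grayscale intensities. Border pixels are skipped so
    that central differences stay inside the cell. Orientations are folded
    into [0, 180) degrees and each pixel votes with its gradient magnitude.
    """
    h, w = len(cell), len(cell[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist
```

Each pixel's work is independent of its neighbors' results, which is one reason the algorithm maps naturally onto parallel processors.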

"Combining Flexibility and Low-Power in Embedded Vision Subsystems: An Application to Pedestrian Detection" by Bruno Lavigueur, Synopsys
We present an embedded-mapping and refinement case study of a pedestrian detection application. Starting from a high-level functional description in OpenCV, we decompose and map the application onto a heterogeneous parallel platform consisting of a high-performance control processor and application-specific instruction-set processors (ASIPs). This application makes use of the HOG (Histogram of Oriented Gradients) algorithm. We will review the computation requirements of the different kernels of the HOG algorithm, and present possible mapping options onto the control processor and ASIPs. We will also present an OpenCV-to-ASIP software refinement methodology and supporting tools. We will present detailed results of the final configuration consisting of one control processor and four ASIPs, including cost and power figures. Finally, we will summarize the results on an FPGA-based rapid prototyping platform.

"Convolutional Networks: Unleashing the Potential of Machine Learning for Robust Perception Systems" by Yann LeCun, Facebook
Convolutional Networks (ConvNets) have become the dominant method for a wide array of computer perception tasks including object detection, object recognition, face recognition, image segmentation, visual navigation, handwriting recognition, as well as acoustic modeling for speech recognition and audio processing. ConvNets have been widely deployed for such tasks over the last two years by companies such as Facebook, Google, Microsoft, NEC, IBM, Baidu, and Yahoo, sometimes with levels of accuracy that rival human performance. ConvNets are composed of multiple layers of filter banks (convolutions) interspersed with point-wise non-linearities and spatial pooling and subsampling operations. ConvNets are a particular embodiment of the concept of "deep learning" in which all the layers in a multi-layer architecture are subject to training. This is unlike more traditional pattern recognition architectures that are composed of a (non-trainable) hand-crafted feature extractor followed by a trainable classifier. Deep learning allows us to train a system end to end, from raw inputs to ultimate outputs, without the need for a separate feature extractor or pre-processor. This presentation will demonstrate several practical applications of ConvNets. ConvNets are particularly easy to implement in hardware, especially using dataflow architectures. A design called NeuFlow will be described. Large-scale ConvNets for image labeling have been demonstrated to run on an FPGA implementation of the NeuFlow architecture. ConvNets bring the promise of real-time embedded systems capable of impressive image recognition tasks with applications to smart cameras, mobile devices, automobiles, and robots.
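The layer structure described above (filter bank, point-wise non-linearity, spatial pooling) can be sketched for a single filter as follows. This is an illustrative example only: the function names are assumptions, ReLU is chosen as one common non-linearity, the filter weights would normally be learned rather than hand-set, and it is unrelated to the NeuFlow design.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most ConvNet code, this is cross-correlation: the kernel is not
    flipped before the multiply-accumulate.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def relu(fmap):
    """Point-wise non-linearity: clamp negative responses to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool_2x2(fmap):
    """Keep the largest response in each non-overlapping 2x2 block."""
    return [[max(fmap[y][x], fmap[y][x + 1],
                 fmap[y + 1][x], fmap[y + 1][x + 1])
             for x in range(0, len(fmap[0]) - 1, 2)]
            for y in range(0, len(fmap) - 1, 2)]


def convnet_stage(image, kernel):
    """One stage: convolution, then non-linearity, then pooling."""
    return max_pool_2x2(relu(conv2d_valid(image, kernel)))
```

A full network stacks several such stages, each with many filters, and trains all filter weights jointly. The fixed, regular data movement in each function hints at why dataflow hardware suits this workload.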

"Fast 3D Object Recognition in Real-World Environments" by Ken Lee, VanGogh Imaging
Real-time 3D object recognition can be computationally intensive and difficult to implement when there are a lot of other objects (i.e. clutter) around the target. There are several approaches to deal with the clutter problem, but most are computationally expensive. We will describe an algorithm that uses a robust descriptor, fast data structures, and efficient sampling, with a parallel implementation on an FPGA platform. Our method can recognize multiple model instances in the scene and provide their position as well as orientation. Our algorithm scales well with the number of models and runs in linear time.

"Vision-Based Gesture User Interfaces" by Francis MacDougall, Qualcomm
The means by which we interact with the machines around us is undergoing a fundamental transformation. While we may still sometimes need to push buttons, touch displays and trackpads, and raise our voices, we’ll increasingly be able to interact with and control our devices simply by signaling with our fingers, gesturing with our hands, and moving our bodies. This presentation explains how gestures fit into the spectrum of advanced user interface options, compares and contrasts the various 2-D and 3-D technologies (vision and other) available to implement gesture interfaces, gives examples of the various gestures (and means of discerning them) currently in use by systems manufacturers, and forecasts how the gesture interface market may evolve in the future.

"Multiple Uses of Pipelined Video Pre-Processor Hardware in Vision Applications" by Rajesh Mahapatra, Analog Devices
Significant resemblance and overlap exist among the pre-processing blocks of different vision applications. For instance, image gradients and edges have proven beneficial for a variety of applications, such as face detection, traffic sign recognition, and fault detection. What if such pre-processing was available "for free" to vision developers, as a built-in hardware feature with a compute pipelining mechanism? In this presentation, we illustrate how edge detection as a pre-processing tool can be used effectively to reduce the computational cost of vision applications, enabling low-power implementations. Our examples will include the detection of faces, traffic speed limit signs and irises, demonstrating a significant reduction in programmable DSP loading.

"Evolving Algorithmic Requirements for Recognition and Classification in Augmented Reality" by Simon Morris, CogniVue
Augmented reality (AR) applications are based on accurately computing a camera’s 6 degrees of freedom (6DOF) position in 3-dimensional space, also known as its “pose”. In vision-based approaches to AR, the most common and basic approach to determine a camera’s pose is with known fiducial markers (typically square, black and white patterns that encode information about the required graphic overlay). The position of the known marker is used along with camera calibration to accurately overlay the 3D graphics. In marker-less AR, the problem of finding the camera pose requires significantly more complex and sophisticated algorithms, e.g. disparity mapping, feature detection, optical flow, and object classification. This presentation compares and contrasts the typical algorithmic processing flow and processor loading for both marker-based and marker-less AR. Processor loading and power requirements are discussed in terms of the constraints associated with mobile platforms.
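Once a pose has been estimated, overlaying graphics comes down to projecting 3D points into the image. The sketch below shows that step under a simple pinhole camera model, assuming the pose (rotation and translation) and the calibration values are already known. The function name `project_point` and its parameters are hypothetical, and lens distortion is ignored.

```python
def project_point(point, rotation, translation, fx, fy, cx, cy):
    """Project a 3-D point in marker coordinates to pixel coordinates.

    rotation: 3x3 matrix as nested lists; translation: length-3 vector.
    Together they map marker coordinates into the camera frame (the pose).
    fx, fy: focal lengths in pixels; cx, cy: principal point, all taken
    from camera calibration. Returns None if the point is not in front
    of the camera.
    """
    xc, yc, zc = (
        sum(rotation[r][k] * point[k] for k in range(3)) + translation[r]
        for r in range(3)
    )
    if zc <= 0:
        return None
    return fx * xc / zc + cx, fy * yc / zc + cy
```

For example, with an identity rotation and the marker two units in front of the camera, a point on the marker plane lands at a pixel offset from the principal point that is proportional to the focal length and inversely proportional to depth.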

"How to Create a Great Object Detector" by Avinash Nehemiah, MathWorks
Detecting objects of interest in images and video is a key part of practical embedded vision systems. Impressive progress has been made over the past few years by optimizing object detectors built on statistical machine learning methods. However, the pre-trained object detectors available today do not satisfy the increasing diversity of embedded vision system requirements. This talk will teach you the basics of creating a robust and accurate object detector. We cover the following topics: the importance of good training data sets, the curse of dimensionality, overfitting (why too much training data is not a good thing), and how to select a classifier/detector based on the problem you are trying to solve.

"Challenges in Object Detection on Embedded Devices" by Adar Paz, CEVA
As more products ship with integrated cameras, there is an increased potential for computer vision (CV) to enable innovation. For instance, CV can tackle the “scene understanding” problem by first figuring out what the various objects in the scene are. Such "object detection" capability holds big promise for embedded devices in mobile, automotive, and surveillance markets. However, performing real-time object detection while meeting a strict power budget remains a challenge on existing processors. In this session, we will analyze the trade-offs of various object detection, feature extraction and feature matching algorithms, assess their suitability for embedded vision processing, and recommend methods for efficient implementation in a power- and budget-constrained embedded device.

"Taming the Beast: Performance and Energy Optimization Across Embedded Feature Detection and Tracking" by Chris Rowen, Cadence
We will look at a cross-section of advanced feature detectors, and consider the algorithm, bit precision, arithmetic primitives and implementation optimizations that yield high pixel processing rates, high result quality and low energy. We will also examine how these optimization methods apply to kernels used in tracking applications, including fast connected component labeling. From this we will derive general principles on the priority and likely impact of different optimization types.

"The OpenVX Hardware Acceleration API for Embedded Vision Applications and Libraries" by Neil Trevett, Khronos
This presentation will introduce OpenVX, a new application programming interface (API) from the Khronos Group. OpenVX enables performance and power optimized vision algorithms for use cases such as face, body and gesture tracking, smart video surveillance, automatic driver assistance systems, object and scene reconstruction, augmented reality, visual inspection, robotics and more. OpenVX enables significant implementation innovation while maintaining a consistent API for developers. OpenVX can be used directly by applications or to accelerate higher-level middleware with platform portability. OpenVX complements the popular OpenCV open source vision library that is often used for application prototyping.

"Programming Novel Recognition Algorithms on Heterogeneous Architectures" by Kees Vissers, Xilinx
The combination of processors and FPGAs in a heterogeneous system is a high-performance implementation platform for image and vision processing. One of the significant hurdles in leveraging this compute potential has been the inherently low level of programming with RTL for the FPGA part and of connecting RTL blocks to processors. Novel complete software environments are now available that support algorithm development and programming exclusively in C/C++ and OpenCL. We will show examples of relevant novel vision and recognition algorithms for Zynq-based devices, with a complete platform abstraction of any RTL design, High-Level Synthesis interconnect, or processor low-level drivers. We will show the outstanding system-level performance and power consumption of a number of applications programmed on these devices.

About the Sketches

The artist’s portraits of Embedded Vision Summit speakers shown on this page were drawn by Kurt Salinas using the neo.1 smart pen from NEOlab. The pen uses embedded vision to determine which page the user is currently writing on, as well as determining its location and movement on the page. The neo.1 is powered by CogniVue’s low-power embedded vision processor. For a video of the neo.1 in action, click here.

Additional Information

If you have other questions about the Embedded Vision Summit West, please contact us at

See you at the Summit! May 18-21 in Santa Clara, California!