Embedded Vision Alliance: Technical Articles

Developing a Part Finishing and Inspection System Using NI Smart Cameras and LabVIEW

"By using NI hardware and software, we seamlessly combined the material removal and inspection solution using the framework for previously developed solutions."
- Michael Muldoon, AV&R Vision & Robotics

The Challenge

Automating the deburring and final inspection of turbine airfoils for aircraft engines.

The Solution

Sylvania Lighting Develops Flexible Machine Control System Using NI Vision, Motion, Intelligent DAQ, and LabVIEW

"With the LabVIEW platform, we can easily adapt to different process requirements and adapt the software for future enhancements."
- Danny Hendrikx, Sylvania Lighting International

The Challenge

Building an accurate and flexible research and development production machine for a new, smaller type of metal halide lamp.

The Solution

Implementing Visual Intelligence in Embedded Systems

By Jeff Bier
Founder, Embedded Vision Alliance
Co-Founder and President, BDTI

Camera-Based ADAS for Mass Deployments


By Peter Voss and Benno Kusstatscher, Automotive Segment Team
Analog Devices
This article was originally published by Elektronik Magazine. It is reprinted here with the permission of the original publisher.


Car manufacturers (OEMs) and system suppliers (Tier 1s) agree: Advanced Driver Assistance Systems (ADAS) will see steep growth in the years to come. One driving factor is certainly increased safety awareness and the desire for more driving comfort on the customer side. But first and foremost, Euro NCAP's tightened safety requirements will boost installed ADAS equipment from single-digit take rates to almost 100% over the next few years. It is therefore no surprise that the call is on for commercially viable solutions: solutions suited to deployment as high-volume standard equipment rather than the moderate-volume customer-option business seen today. Analog Devices recently announced a new ADAS processor family, the Blackfin® ADSP-BF60x series, which was developed specifically for this emerging mass-market requirement. The first two processors of this new family are available now to serve camera-based solutions.


Depending on the ADAS tasks to be solved, different types of sensors are deployed in systems today. For near- and far-field monitoring of the environment around the vehicle, radar, ultrasonic, LIDAR, PMD, camera, and night-vision camera sensors are used in either single- or multi-sensor systems. The latter combine sensor data to achieve more accurate results. Camera systems will see an additional boost from looking inside the car to analyze the driver's state. Driver situation and prediction analysis will be used to better filter the various warning messages an ADAS system can generate, the goal being not to overwhelm the driver with irrelevant warnings. If the system detects that a sporty driver is concentrating on the traffic in front of him, a premature warning about the car he is following might be very annoying. However, knowing that the driver is distracted by his mobile phone, or is simply about to fall asleep, an early warning or even a braking action in the same situation might be very appropriate.

ADAS systems that provide an extended view outside the car while detecting and classifying objects, together with systems that determine the driver's state, all aim at increasing road safety and will soon become part of the standard equipment in new cars. Increased driving comfort and economic benefits, such as timing gear shifts appropriately before an incline, will further drive customer acceptance and will enable car manufacturers to market optional functionality on top of the standard ADAS equipment.

ADAS Deployments Fueled by Euro NCAP

Euro NCAP tests and publicly reports the safety of vehicles marketed in Europe. Transparent test methods are used across four categories: adult occupant protection, child occupant protection, pedestrian protection, and safety assist (e.g., seatbelt reminders). Based on the results, Euro NCAP awards up to five safety stars, which are then published on its web site. Each safety level (one to five stars) requires a minimum number of points in each of the four categories, plus a minimum total number of points. It is expected that by 2017, cars will not be able to achieve the desired five-star rating without ADAS present, and that car manufacturers will therefore include at least one ADAS system as standard equipment by that time. Comparing how the Euro NCAP rating scheme has changed over the years gives a feel for the current priorities. Between 2011 and 2012, the requirements to achieve the minimum points for pedestrian protection were raised by 50%. This boosts the importance of ADAS cameras, since they can not only detect and classify pedestrians but can also do the same with partially obscured pedestrians, as required by the rating scheme.

Cameras Make the Standard Equipment

Camera-based ADAS is not new; the technology has had time to mature over the past years, but was mostly offered as a customer option on premium-brand cars. The accumulated know-how is now very useful in addressing the emerging Euro NCAP requirements. However, a rethinking has to happen, as the standard-equipment business brings the commercial aspect into focus. While complex, high-performance systems were developed in the past, the task at hand is now a different one: ADAS systems must support "just the required" functionality at the appropriate price point. System providers and component suppliers like Analog Devices must now walk a fine line between enabling commercially attractive solutions and preserving the freedom of OEMs to differentiate.

Thanks to its early engagement in driver assistance systems and continued investment in this emerging technology, Analog Devices started the development of dedicated ADAS solutions in good time and is now sampling the first two processors focused on camera-based systems. The specification for these camera-centric ADAS processors targeted total cost of ownership (TCO) from the start, without taking flexibility out of the hands of system suppliers and OEMs. In addition to the required programmability and processing power, the lowest power consumption in its class was achieved in order to keep the thermal design manageable. Support for functional safety per ISO 26262 and the availability of an application-oriented development environment with optimized vision processing libraries help designers build a total system with time to market and low risk in mind.

Optimum Performance on System Level

The ADSP-BF60x family reduces the overall cost of a five-function system by up to 30%. The ADSP-BF609 (processing up to megapixel formats) and the ADSP-BF608 (processing up to VGA format) support up to five concurrent vision functions at up to 30 processed frames per second. At less than 1.3 W at an ambient temperature of 105°C, the ADSP-BF60x processors offer the lowest power consumption figures in their class.

To enable this, Analog Devices deployed a straightforward but unique concept. It is based on two Blackfin cores, as used in production ADAS systems today. However, algorithms that cannot be implemented efficiently and economically in software have instead been implemented as hardware engines, resulting in a highly configurable toolbox of vision processing units. Analog Devices calls this the Pipelined Vision Processor (PVP1), which is now part of the new ADSP-BF60x processors. Although everything is implemented in a low-power process technology, further innovation was needed to tackle the consumers that dominate power dissipation in modern designs even more than logic circuits do: external memory (DDR) interfaces. The lowest power dissipation is achieved by distributing processing and by smart use of moderate memory bandwidth. In addition, the Blackfin architecture has been enriched with a number of hardware blocks to address functional safety requirements.

Efficient Data Streaming

Many chip architectures receive data from the video sensor and save it frame by frame into external memory, such as DDR, only to read it back frame by frame (slightly delayed). Multi-core architectures even tend to multiply that data movement, for the sole benefit of letting the cores identify the regions of interest (ROIs) in every frame. The ADSP-BF60x concept avoids such power-consuming transport of video data. The full frame is still stored in DDR2 memory, but the entire frame never needs to be read back. As soon as it is received by the device, the incoming data is multicast to the Pipelined Vision Processor (PVP1), which preprocesses the data directly on its way in.

Figure 1: Vision Processing with ADSP-BF60x Processors

As shown in Figure 1, the so-called PVP1 Camera Pipes can generate up to three intermediate preprocessing results, such as edge maps, integrals, or re-quantization by non-linear thresholds. In addition, they can provide status information such as histograms. Not a single instruction has to be executed on the Blackfin cores, and not a single byte has to be transferred over the DDR bus, to achieve these results. The PVP1 also uses dense data formats when outputting intermediate results, so most of the time the results fit into on-chip L1 and L2 memories.

From here, software running on the Blackfin cores can take over. The results provided by the PVP1 Camera Pipes enable software to identify ROIs efficiently. Now only the ROIs need to be read back from external memory for further analysis. They can be read by DMA, by the cores, or by the so-called PVP1 Memory Pipe. The latter can further analyze or massage the ROI data, for example filtering it or scaling it to the preferred template size. Again, the histogram results come for free. If the dense output of the Memory Pipe is stored in on-chip memory, the Blackfin cores can classify the ROIs locally and, if needed, verify them against raw data or against the respective region in previous frames as stored in DDR2 memory.
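The bandwidth saving behind this scheme can be pictured with a toy sketch (plain NumPy, purely conceptual, not ADI code): coarse tiles of a PVP1-style edge map whose edge energy exceeds a threshold are flagged as ROIs, and only those tiles would then be fetched back from external memory.

```python
import numpy as np

def find_rois(edge_map, thresh, tile=16):
    """Flag coarse tiles whose total edge energy exceeds a threshold.

    Only the flagged tiles would need to be read back from DDR for
    further analysis; everything else stays untouched in memory.
    Returns (y, x, height, width) tuples for each flagged tile.
    """
    h, w = edge_map.shape
    rois = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            if edge_map[y:y + tile, x:x + tile].sum() > thresh:
                rois.append((y, x, tile, tile))
    return rois
```

In a real system the threshold and tile size would be tuned per function; the point is simply that the cores decide what to fetch from cheap, dense intermediate results rather than re-reading whole frames.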

Pipelined Vision Processor (PVP1) - A Closer Look

The PVP1 can process up to four data streams (three Camera Pipes and one Memory Pipe), not counting the histogram status outputs. As shown in Figure 2, the PVP1 toolbox consists of twelve processing blocks, optimized for various vision processing steps.

Figure 2: Pipelined Vision Processor (PVP1)

Typically, the Memory Pipe utilizes one or more processing blocks on demand to analyze a series of ROIs, and may change operators on the fly. Camera Pipes tend to process the full frame in a parallel manner. Incoming data can be multicast to multiple processing branches. The user has full flexibility in interconnecting operators to form the pipes. One possible configuration is shown in Figure 3:

Figure 3: Example Configuration for Object and Pedestrian Detection

The traditional Canny structure can be identified in the figure. It consists of a low-pass filter with Gaussian coefficients, enhanced Sobel filters with a 5x5 matrix, Cartesian-to-polar conversion, and a non-maxima suppression (NMS) stage. The remaining maxima can be passed through a Threshold block to reduce bit resolution, or to run-length compress the output for minimal memory loading. If post-processed by software, say with a Hough transform algorithm, line detection and lane-keep-assist strategies can be derived from this edge map.
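The front of this Canny structure can be sketched in software for illustration (NumPy, not PVP1 hardware; the 3x3 kernels here are simplifications of the 5x5 blocks described above): low-pass filtering, Sobel gradients, then Cartesian-to-polar conversion feeding the NMS stage.

```python
import numpy as np

def convolve2d(img, kernel):
    """Naive same-size 2-D correlation, sufficient for a sketch.

    (No kernel flip; fine for the symmetric Gaussian, and the Sobel
    sign flip does not affect the gradient magnitude.)
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=np.float64)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            out[y, x] = np.sum(padded[y:y + kh, x:x + kw] * kernel)
    return out

def canny_stage(img):
    """Low-pass -> Sobel gradients -> polar conversion, as in the pipe."""
    gauss = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]]) / 16.0
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    smoothed = convolve2d(img.astype(np.float64), gauss)
    gx = convolve2d(smoothed, sobel_x)
    gy = convolve2d(smoothed, sobel_x.T)
    mag = np.hypot(gx, gy)    # Cartesian-to-polar: gradient magnitude...
    ang = np.arctan2(gy, gx)  # ...and angle, as needed by the NMS stage
    return mag, ang
```

The PVP1 performs the equivalent steps in fixed-function hardware without touching the cores or the DDR bus.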

If the gradients are forwarded to an Integral block (Histogram of Gradients, HoG), half of the processing required for pedestrian detection has already been accomplished. The square of the low-passed pixels may feed the other Integral block, which can operate in regular (SAT) or diagonal (RSAT) mode. It can reduce the frame resolution at the output if high resolution is not needed, as in the case of shadow detection.

The example configuration demonstrates how lane keep assist as well as pedestrian and vehicle detection can be implemented efficiently on a single chip, while the MIPS load of the Blackfin cores is kept at a moderate level. Unused MIPS and the not-yet-used Memory Pipe functionality can be utilized to integrate headlamp control and/or traffic sign detection on top. All of this is enabled by an ADSP-BF60x device at less than 1.3 W at 105°C ambient temperature, which can therefore easily be installed behind the central mirror in the car, next to the image sensor.

Affordable Driver Assistance

ADAS solutions for vision and radar systems are a focus at Analog Devices. With the new ADAS processor models ADSP-BF609 and ADSP-BF608, camera-based driver assistance is becoming commercially attractive for mass deployment. ADAS camera systems no longer need to be a privilege of premium models. Euro NCAP's tightened safety requirements will indirectly fuel the acceptance of ADAS vision systems, and similar trends are observed worldwide. The ADAS vision processors from Analog Devices support day and night vision systems that analyze the surroundings of the car, as well as driver monitoring systems that detect the state of the driver.

Auto-Safety Mandate Brings Video Systems into Full View, Along with Design Challenges


By Don Nisbett, Marketing Manager in the High Speed Signal Conditioning Group
Analog Devices
This article was originally published by Hanser Automotive Magazine. It is reprinted here with the permission of the original publisher.

The emerging field of video-based automotive safety made headlines in December 2011, when the U.S. Department of Transportation proposed safety regulations to help eliminate the blind zones behind vehicles that can make it difficult to see pedestrians, especially toddlers. The proposed rearview-visibility rule was mandated by Congress as part of the Cameron Gulbransen Kids Transportation Safety Act of 2007 (the "K.T. Safety Act"), named for a two-year-old boy who was accidentally killed by his father in the family's driveway. Back-over accidents cause about 300 deaths and 18,000 injuries a year.

The proposed rule requires rearward detection that will allow drivers to "see" behind the vehicle using new technology such as a rear-mounted video camera. The proposal calls for expanding the required field of view for all passenger cars, pickup trucks, minivans, buses, and low-speed vehicles so that drivers can see directly behind the vehicle when backing up. By September 2012, 10 percent of new vehicles must comply with the requirements of the proposed rule, 40 percent by September 2013, and 100 percent by September 2014. Some vehicles can already be purchased with surround-vision safety systems that show a 360-degree view of everything in the vehicle's proximity.

The Rise of Rearview Video Will Improve Safety

To meet the new standard, the National Highway Traffic Safety Administration (NHTSA) projects that automobile manufacturers will install rear-mounted video cameras and in-vehicle displays. NHTSA's preliminary assessment was that rearview video (RV) systems have greater potential to improve vehicles' rear visibility than sensor-based rear object detection systems or rear-mounted convex mirrors.

Rearview video represents the latest in driver-assistance safety, merging infotainment and communication technologies. A growing number of vehicles in the U.S., and around the world, are already equipped with RV systems, which permit a driver to see much of the area behind the vehicle on an in-car video display showing the image from a camera mounted on the rear of the vehicle. Aftermarket RV systems are also available and retail for about $60 to $200.

Rearview Video System Design Challenges

Designers of RV systems must address performance and robustness in order to meet automakers' strict requirements. For starters, the rearview camera must be as small as possible to fit seamlessly into the vehicle's existing body structure. Designers of OEM systems have been hard-pressed to eliminate some of the dozen or more electronic components that go into an RV camera. However, ICs are being introduced to help reduce overall assembly size by replacing numerous discrete electronic components with an integrated approach. For example, a new RV camera from a leading manufacturer integrates high-speed video reconstruction filters with short-to-battery protection in a package that reduces space requirements by 90% compared to an equivalent discrete solution. Space will be even more critical for future surround-vision safety systems using four or more cameras per vehicle.

A second pressing design issue centers on the robustness of the signal chain, as RV systems must be virtually indestructible. A robust signal chain is required to survive an overvoltage condition up to 18 V, be ESD-hardened, and operate without interruption in the presence of large common-mode voltage noise. Since video, ground, and battery signals all run on the same long cable assembly, one of the most severe faults for any video signal is a short directly to the battery voltage. Such a short to battery could damage not only the RV or surround-vision system but also the more expensive head-unit video systems.

In addition, sources of transient noise abound: windshield wipers, power windows, and A/C compressor motors turning on and off all cause current and voltage spikes in the chassis ground, resulting in common-mode error voltages that can wreak havoc on RV systems. Noise from these sources can degrade image quality and even damage electronic systems. Anything outside of the regular video signal can be seen as noise, but regardless of the interference source, car manufacturers expect their suppliers to meet stringent robustness requirements. Most often, harmful voltage surges come from electrostatic discharge (ESD), a single, fast, high-current transfer of electrostatic charge that can permanently damage electronic systems. While most manufacturers install safeguards, ESD-hardened ICs provide an extra level of robustness.

New automotive operational amplifier and analog video filter ICs, such as Analog Devices' ADA4830 series of difference amplifiers and the ADA4433/32 series of video reconstruction filters, offer integrated short-to-battery protection, large common-mode rejection, and heightened ESD tolerance in small footprints. These devices integrate many of the costly, bulky discrete components, such as capacitors, diodes, transistors, and switches, that typically protect standard operational amplifiers. The fault-detection output of these new integrated amplifiers and video filters allows for proactive and speedy diagnosis. Eliminating discrete electronic components lets system designers shave about 20 percent from their component costs while saving up to 90% of PCB real estate, an important feature for today's rearview video and tomorrow's surround-vision safety systems.

In just a few short years, rearview video, and eventually surround vision, will be a reality for all cars. RV system designers can save space, time, and resources now by adopting integrated ICs that offer crucial overvoltage protection along with excellent picture quality, low power consumption, and diagnostic features. Robust video amplifier integrated circuits (ICs) are arriving to meet the mandate for automobile rearview and, increasingly, surround-view safety standards; by protecting sensitive video circuitry from shorts to the battery voltage, they keep life-saving equipment working. Most importantly, the new generation of rearview video and communication systems can help end the tragedy of back-over deaths in the U.S. and around the world.

Don Nisbett is a marketing engineer in the High Speed Signal Conditioning Group. Prior to his current position, he held product engineering and applications engineering responsibilities. He has worked at Analog Devices, Inc. since 2002, following his graduation from Worcester Polytechnic Institute with a Bachelor of Science degree in Electrical Engineering.

Image Sensors Evolve to Meet Emerging Embedded Vision Needs - Part 2: HDR Processing


by Michael Tusch
Founder and CEO
Apical Limited
This article was originally published at EDN Magazine. It is reprinted here with the permission of EDN. It was adapted from Michael's technical article on the Embedded Vision Alliance website.

Part 1 of this article looks at examples of embedded vision and how the technology transition from elementary image capture to more robust image analysis, interpretation and response has led to the need for more capable image sensor subsystems.

In any review of image sensor technology, it's important to discuss so-called HDR (high dynamic range) or WDR (wide dynamic range) sensors. Many embedded vision applications (as with photography and other image capture applications more generally) require robust functionality even with challenging real-life scenes. HDR and WDR mean the same thing; it's just a matter of how you use each axis of your dynamic range graph. I'll employ the common terminology "HDR" throughout this discussion.

Cameras, even high-end DSLRs, often cannot capture as much information in high-contrast scenes as our eyes can discern. This fact explains why we have rules of photography such as "make sure the sun is behind you." Although conventional image sensors struggle in such conditions, the industry has devoted significant work over many years to developing HDR sensors that extend raw image capture capability. The reliability of the image capture component is, of course, one key element of overall system performance.

The dynamic range (DR) of the sensor is the ratio of the brightest pixel intensity to the darkest pixel intensity that the camera can capture within a single frame. This number is often expressed in decibels (dB), i.e.,

DR in dB = 20 * log10 (DR)

The human eye does very well with respect to dynamic range and, depending on exactly how the quantity is measured, is typically quoted as being able to resolve around 120 to 130 dB (i.e., ~20 bits) in daytime conditions.

Image sensors are analog devices that convert pixel intensities to digital values via an analog-to-digital converter (ADC). The bit depth of the output pixels sets an upper limit on the sensor's dynamic range (Table A). In reality, the maximum dynamic range is never quite achieved, since in practice the noise level takes up to ~2 bits off the useful pixel range.

Type of sensor | Maximum intensity levels recorded | Maximum sensor dynamic range (dB)
Very low-cost standard | 256 (8-bit) | 48
Average standard | 1,024 (10-bit) | 60
Higher quality standard | 4,096 (12-bit) | 72
Table A. The dynamic range potential of various image sensor types
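The relationship between ADC bit depth and the dynamic range figures in Table A follows directly from the dB formula given earlier; a quick sketch (the function name is mine) reproduces them:

```python
import math

def sensor_dynamic_range_db(bit_depth, noise_bits=0):
    """Ideal dynamic range in dB for a sensor with the given ADC bit depth.

    noise_bits approximates the ~2 bits that the noise floor takes off
    the useful pixel range in practice.
    """
    levels = 2 ** (bit_depth - noise_bits)
    return 20 * math.log10(levels)

for bits in (8, 10, 12):
    print(f"{bits}-bit: {sensor_dynamic_range_db(bits):.0f} dB ideal, "
          f"~{sensor_dynamic_range_db(bits, noise_bits=2):.0f} dB usable")
```

Each additional bit adds roughly 6 dB, which is why a 12-bit sensor tops out near the ~72 dB figure quoted below.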

Standard CMOS and CCD sensors achieve up to ~72 dB of dynamic range. This is sufficient for the great majority of scene conditions. However, some commonly encountered scenes overwhelm such sensors. Well-known examples are backlit conditions (i.e., a subject standing in front of a window), outdoor scenes with deep shadows and sunsets, and nighttime scenes with bright artificial lights (Figure B).

Figure B: This backlit scene has a dynamic range of around 80 dB.

Such scenes typically exhibit a dynamic range of around 100 dB and, in rare cases, up to 120 dB (Figure C). If captured with a conventional sensor, the image either loses detail in shadows or has blown-out (i.e., clipped, also known as "blooming") highlights.

Figure C: This high-contrast scene has a dynamic range of around 100 dB.

Numerous attempts have been made to extend standard CMOS and CCD image sensor technologies, overcoming the limitations of pixel sensitivity and ADC precision, in order to capture such scenes. Pixim developed the first really successful HDR sensor, based on CCD technology, and it was the industry standard for many years. However, the technology, which effectively processes each pixel independently, is somewhat costly.

More recently, other vendors have concentrated on sensors constructed from more conventional CMOS technology. Numerous different solutions are available; the remainder of this essay will survey the main vendors and the techniques that they employ.

Available solutions

Multi-frame HDR is an HDR method that does not rely on custom CMOS or CCD technology. Acting as a sort of video camera, the sensor is programmed to alternate between long and short exposures on a frame-by-frame basis, with successive images blended together by the image signal processor (ISP) in memory to produce a single HDR image (Figure D). If the blending algorithm is robust, an exposure ratio of around 16 is comfortably achievable, adding an extra 4 bits to the single-exposure dynamic range. For example, by using multi-frame HDR, a 12-bit sensor-based system can produce images characteristic of a 16-bit sensor.

Figure D: Blending together short- and long-exposure versions of a scene creates a multi-frame HDR result.

As with all HDR technologies, there is a catch. In this particular case, it is the potential generation of motion artifacts, most noticeable as "ghosting" along the edges of objects that have moved between the two frames. Such artifacts are very expensive to eliminate even partially, although processing in the ISP can significantly suppress their appearance. Further, the effective frame rate is reduced: if the input frame rate is 60 fps, the output can remain at 60 fps, but highlights and shadows will exhibit an effective frame rate closer to 30 fps, and mid-tones will fall somewhere between 30 and 60 fps, depending on how clever the blending algorithm is.
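The core of the exposure blending can be sketched in a few lines (NumPy; the hard switch at the saturation point is a simplification of what a real ISP does, which is to blend smoothly around the knee and manage the motion artifacts described above):

```python
import numpy as np

def blend_hdr(long_exp, short_exp, exposure_ratio=16, sat_level=4095):
    """Merge long- and short-exposure 12-bit frames into one linear HDR frame.

    Where the long exposure clips, substitute the short exposure scaled
    by the exposure ratio; elsewhere keep the cleaner long-exposure data.
    With a ratio of 16, 12-bit inputs yield a roughly 16-bit result.
    """
    long_f = long_exp.astype(np.float64)
    short_f = short_exp.astype(np.float64) * exposure_ratio
    # Hard switch at saturation; a production blend would feather the
    # transition to hide it, especially along moving edges.
    return np.where(long_exp >= sat_level, short_f, long_f)
```

The scaled result spans about 16 bits of linear intensity, matching the "12-bit sensor producing 16-bit images" example above.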

A related approach is employed by the AltaSens A3372 12-bit CMOS sensor, which uses a "checkerboard" pixel structure, wherein alternating Bayer pattern (RGGB) quad-pixel clusters are set to long- and short-exposure configurations (Figure E). In HDR scenes, the long-exposure pixels capture dark information, while short-exposure pixels handle bright details.

Figure E: The AltaSens A3372 checkerboard array devotes alternating quad-pixel clusters to capturing dark and light scene details.

Long exposure delivers improved signal-to-noise but results in the saturation of pixels corresponding to bright details; the short exposure pixels conversely capture the bright details properly. Dynamic range reaches ~100 dB. The cost of HDR in this case is the heavy processing required to convert the checkerboard pattern to a normal linear Bayer pattern. This reconstruction requires complex interpolation because, for example, in highlight regions of an HDR image, half of the pixels are missing (clipped). An algorithm must estimate these missing values.

While such interpolation can occur with remarkable effectiveness, some impact on effective resolution inevitably remains. However, this tradeoff is rather well controlled, since the sensor only needs to employ the dual-exposure mode when the scene demands it; the A3372 reverts to non-HDR mode when it's possible to capture the scene via the standard 12-bit single-exposure model.
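One way to picture the checkerboard reconstruction is the following crude NumPy stand-in (all names and parameters are mine, and the four-neighbour estimate is far simpler than the sensor's actual interpolation): short-exposure pixels are rescaled to the long-exposure scale, and clipped long-exposure pixels are estimated from their neighbours.

```python
import numpy as np

def merge_checkerboard(frame, long_mask, exposure_ratio=16, sat_level=4095):
    """Rebuild a linear frame from interleaved long/short exposures.

    long_mask marks the long-exposure pixels. Short-exposure pixels are
    rescaled by the exposure ratio; long-exposure pixels that clipped
    are replaced by the mean of their four neighbours, a crude stand-in
    for the complex interpolation the text describes.
    """
    f = frame.astype(np.float64)
    linear = np.where(long_mask, f, f * exposure_ratio)
    # Four-neighbour mean as a simple estimate for clipped pixels.
    neighbours = sum(np.roll(linear, shift, axis)
                     for shift, axis in ((1, 0), (-1, 0), (1, 1), (-1, 1))) / 4.0
    clipped = long_mask & (frame >= sat_level)
    return np.where(clipped, neighbours, linear)
```

Even this toy version shows why effective resolution suffers in highlight regions: there, half of the output is estimated rather than measured.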

A very different HDR method is the so-called "companding" technique employed by sensors such as Aptina's MT9M034 and AR0330, along with alternatives from other vendors. These sensors use line buffers to accumulate multiple exposures (up to four, in some cases) line by line. The output pixels retain a 12-bit depth, set by the ADC precision, but those 12 bits encompass up to 20 (or more) effective bits of linear intensity data. Companding is conceptually similar to the way gamma correction encodes roughly 2 bits of additional data in a color space such as sRGB. Inverting this non-linear data structure yields an HDR Bayer-pattern image.

This method produces the highest dynamic ranges; one vendor claims 160 dB. But it again comes with associated costs. First, the data inversion relies on very accurate and stable knowledge of where the various exposures begin and end. In practice, imperfections lead to noise at specific intensity levels that can be hard to eliminate. Second, the sequential exposures in time create motion artifacts, which can be suppressed but are difficult to remove. Standard techniques for flicker avoidance (such as "beating" with the 50 Hz or 60 Hz flicker of indoor lighting) also don't work when more than one exposure time exists.
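The inversion itself is just piecewise-linear expansion; a sketch follows (knee positions and gains here are illustrative, not any vendor's actual register values):

```python
import numpy as np

def decompand(codes, knees, gains):
    """Expand companded sensor codes back to linear intensities.

    Segment i covers codes from the previous knee up to knees[i]; each
    code step in that segment represents gains[i] steps of linear light.
    The knees and gains must mirror the sensor's compression curve
    exactly, which is why inaccurate knowledge of them produces the
    noise at specific intensity levels mentioned above. Codes at or
    above the last knee are not handled in this sketch.
    """
    codes = np.asarray(codes, dtype=np.float64)
    linear = np.zeros_like(codes)
    prev_knee, prev_lin = 0.0, 0.0
    for knee, gain in zip(knees, gains):
        seg = (codes >= prev_knee) & (codes < knee)
        linear[seg] = prev_lin + (codes[seg] - prev_knee) * gain
        prev_lin += (knee - prev_knee) * gain
        prev_knee = knee
    return linear

# A 12-bit code space split into two segments: below code 1024 the data
# is already linear (gain 1); above it, each code spans 16 linear steps.
values = decompand([0, 1023, 1024, 2048], knees=[1024, 4096], gains=[1, 16])
```

With more segments and larger gains, 12-bit codes can span the ~20 effective linear bits described above.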

Yet another HDR sensor implementation is the dual-pixel structure employed by OmniVision in sensors such as the OV10630. It consists of a non-Bayer array of pixels made up of two physically different types: a "dark" pixel and a "bright" pixel, which can be of different sizes. The dark pixels are more sensitive to light and therefore handle dark areas well, with good signal-to-noise. Conversely, the bright pixels are less light-sensitive and therefore don't saturate as readily in bright regions.

In principle, the dual-pixel approach is a very "clean" HDR technology. It avoids motion artifacts and requires no complex non-linear processing. Penalties include the fact that two pixels are blended into one, so the effective resolution is half of the actual resolution. The dual-pixel structure is also more costly on a per-pixel basis, and the output raw pixel pattern cannot be processed by standard ISPs.

More generally, each of the sensor types described in this discussion requires a different image-processing pipeline to convert its captured images into a standard output type. This fact means that it is not typically possible to directly connect an HDR sensor to a standard camera ISP and obtain an HDR result. Figure F shows the pipelines for Bayer-domain processing of the multi-frame AltaSens- and Aptina-type HDR sensors' raw inputs. Standard processing is possible subsequent to color interpolation.

Figure F: The image processing flow varies depending on what type of HDR sensor is being employed.

Obtaining genuine HDR imagery is also not just a matter of pairing an HDR sensor with an HDR ISP. For scenes with dynamic range beyond 100 dB, optics also plays a central role. Unless the lens is of sufficient quality and the optical system has the necessary internal anti-reflection coatings to prevent back-reflection from the sensor, it is impossible to avoid flare and glare in many HDR scenes, creating artifacts that effectively negate much of the sensor's capture capability. Put simply, building an HDR camera suitable for the full range of scene conditions is not inexpensive.

In conclusion, a variety of sensor and ISP technologies exist for capturing and processing HDR imagery. They all involve some kind of image quality trade-off in exchange for the extended dynamic range, either in resolution or in time. It is worth remembering that although the technology may be elaborate, the purpose is simply to extend effective pixel bit depth and reduce noise. To see this, compare the images shown in Figure G.

Figure G: A comparison of two images reveals HDR shadow strengths.

The first image was captured using a 12-bit CMOS sensor in normal mode. The second image harnesses the exact same sensor but employs the multi-exposure mode discussed earlier. The effect of the HDR mode closely resembles that of noise reduction. In the first image, strong local tone mapping is used to increase the digital gain so that shadows are visible, while exposure is kept low enough to avoid highlight clipping. This technique in effect captures the window area at ISO 100 and the shadow area at ISO 3200, and it does not require any non-standard capture technology. The HDR image, conversely, obtains the same exposure values for shadows and highlights, but this time by varying the exposure times, leading to greater sensitivity and lower noise in the shadow region.

High-performance temporal and spatial noise reduction technology can extend dynamic range by up to ~12 dB. And high-performance dynamic range compression technology can map input dynamic range to a standard output without loss of information. So a standard 12-bit CMOS sensor with good noise reduction can achieve around 84 dB, which is "pretty good HDR", while a 14-bit CMOS sensor with good noise reduction can achieve nearly 100 dB, which is "mainstream HDR". However, HDR-specific sensors are required for truly high dynamic range scenes.
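The decibel figures above follow directly from effective pixel bit depth: dynamic range is roughly 20·log10(2^N) dB for an N-bit sensor, plus whatever extension noise reduction provides. A back-of-the-envelope sketch (the arithmetic only, not any vendor's specific formula):

```python
import math

def dynamic_range_db(bit_depth, noise_reduction_db=0.0):
    """Approximate sensor dynamic range in dB: 20*log10(2^bit_depth),
    plus any extension contributed by noise reduction."""
    return 20.0 * math.log10(2 ** bit_depth) + noise_reduction_db

# A 12-bit sensor spans about 72 dB on its own; adding ~12 dB of
# noise reduction yields the "pretty good HDR" figure of ~84 dB.
print(round(dynamic_range_db(12)))        # 72
print(round(dynamic_range_db(12, 12.0)))  # 84
print(round(dynamic_range_db(14, 12.0)))  # 96, approaching "mainstream HDR"
```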

Michael Tusch is founder and CEO of Apical Limited, a UK-based technology company specializing in image and video processing technology. He started his career as a researcher in semiconductor quantum theory at Oxford University before moving to industry, first with the Boston Consulting Group and later holding several technology management positions before founding Apical in 2001.

Image Sensors Evolve to Meet Emerging Embedded Vision Needs - Part 1

By Brian Dipert
Embedded Vision Alliance
and Eric Gregori and Shehrzad Qureshi
Senior Engineers, BDTI
This article was originally published at EDN Magazine. It is reprinted here with the permission of EDN. It was adapted from Eric and Shehrzad's technical trends presentation at the March 2012 Embedded Vision Alliance Member Summit.

In Part 1 of this two-part series put together by Embedded Vision Alliance editor-in-chief Brian Dipert and his colleagues Eric Gregori and Shehrzad Qureshi at BDTI, we look at examples of embedded vision and how the technology transition from elementary image capture to more robust image analysis, interpretation and response has led to the need for more capable image sensor subsystems.

In Part 2, "HDR processing for embedded vision," by Michael Tusch of Apical Limited, an EVA member, we discuss the dynamic range potential of image sensors, and the various technologies being employed to extend the raw image capture capability.

Look at the systems you're designing, or more generally at the devices that surround your life, and you're likely to see a camera or few staring back at you. Image sensors and their paired image processors are becoming an increasingly common presence in a diversity of electronic products. It's nearly impossible to purchase a laptop computer without a bezel-mount camera, for example, and an increasing percentage of all-in-one desktop PCs, dedicated computer displays and even televisions are now also including them.

Smartphones and tablets frequently feature image sensors, too, often located on both the front and back panels, and sometimes even arranged in "stereo" configurations for 3-D image capture purposes. And you'll even find cameras embedded in portable multimedia players and mounted in cars, among myriad other examples.

Application abundance

The fundamental justification for including the camera(s) in the design is often for elementary image capture purposes, notably still photography, videography, and videoconferencing. However, given that the imaging building blocks are already in place, trendsetting software and system developers are also leveraging them for more evolved purposes, not only capturing images but also discerning meaning from the content and taking appropriate action in response to the interpreted information.

In the just-mentioned vehicle case study, for example, an advanced analytics system doesn't just "dumbly" display the rear-view camera's captured video feed on an LCD but also warns the driver if an object is detected behind the vehicle, even going so far (in advanced implementations) as to slam on the brakes to preclude impact.

Additional cameras, mounted both inside the vehicle and at various locations around it, alert the driver to (and, in advanced implementations, take active measures to avoid) unintended lane transitions and collisions with objects ahead. They can also discern road signs' meanings, consequently warning the driver of excessive speed and potentially dangerous roadway conditions. And they can minimize distraction by enabling gesture-interface control of the radio and other vehicle subsystems, as well as snap the driver back to full attention if he or she has been distracted by a text message or another task that has redirected eyeballs from the road, or has dozed off.

In smartphones, tablets, computers and televisions, front-mounted image sensors are now being employed for diverse purposes. They can advise you if you're sitting too close to or too far away from a display, or if your posture is poor, and they can keep the backlight from extinguishing for as long as they sense you're still sitting in front of them (conversely auto-powering down the display once you've left). Gesture interfaces play an increasingly important role in these and other consumer electronics applications such as game consoles, supplementing (if not supplanting) traditional button and key presses, trackpad and mouse swipes and clicks, and the like.

A forward-facing camera can monitor your respiration (by measuring the chest rise-and-fall cadence) and heart rate (by detecting the minute cyclical face color variance caused by blood flow), as well as advise if it thinks you've had too much to drink (by monitoring eyeball drift). It can also uniquely identify you, automatically logging you in to a system when you appear in front of it and loading account-specific programs and settings. Meanwhile, a rear-mount camera can employ augmented reality to supplement the conventional view of an object or scene with additional information.

These are all examples of embedded vision, a burgeoning application category that also extends to dedicated-function devices such as surveillance systems of numerous types, and manufacturing line inspection equipment. In some cases, computers running PC operating systems historically handled the vision analytics task; this was a costly, bulky, power-hungry and unreliable approach. In other situations, for any or all of these reasons, it has been inherently impractical to implement vision functionality.

Nowadays, however, the increased performance, decreased cost and reduced power consumption of processors, image sensors, memories and other semiconductor devices have led to embedded vision capabilities being evaluated in a diversity of system form factors and price points. But they have also led to the need for increasingly robust imaging subsystems (see Sidebar A, "Focus: the fourth dimension").

Dynamic range versus resolution

For many years, the consumer digital camera market, whose constituent image sensors (by virtue of their high volumes and consequent low prices) also find homes in many embedded vision systems, was predominantly fueled by a "more megapixels is better" mentality. However, in recent times, the limitations of such a simplistic selection strategy have become progressively more apparent.

For one thing, consumers increasingly realize that unless they're printing wall-sized enlargements or doing tight crops of a source photo, they don't need high-resolution pictures that take up significantly more storage space than their lower-resolution precursors. And for another, the noisy and otherwise artifact-filled images generated by modern cameras reveal the lithography-driven downside of the increasing-resolution trend.

Sensors need to remain small in overall dimensions in order to remain cost-effective, a critical attribute in consumer electronics systems. As manufacturers shoehorn an increasing number of pixels onto them, individual pixel dimensions must therefore predictably shrink. Smaller pixels are capable of collecting fewer photons in a given amount of time, thereby leading to decreased light sensitivity. And this phenomenon not only degrades a camera's low-light performance, it also adversely affects the system's dynamic range, a compromise which can only be partially compensated for by post-processing, and then often only with motion artifacts and other tradeoffs (see Part 2 "HDR processing for embedded vision").

Ironically, embedded vision-focused applications tend to have lower resolution requirements than does general-purpose photography. The infrared and visible light image sensors in the Microsoft Kinect, for example, are VGA (640x480 pixel) resolution models, and the vision peripheral only passes along QVGA (320x240 pixel) depth map images to the connected game console or PC.

Given the plethora of available pixels in modern sensors, some suppliers leverage the surplus to both improve light sensitivity and color accuracy, by migrating beyond the conventional Bayer RGB pattern filter array approach (Figure 1). Additional (and altered) filter colors claim to enhance full-spectrum per-pixel interpolation results, while monochrome (filter-less) pixels let in even more light at the tradeoff of dispensing with color discernment (Reference 1).

Figure 1: The Bayer sensor pattern, named after a Kodak imaging scientist, remains the predominant filter array scheme used in digital imaging applications (a). More modern approaches increase the number of green-spectrum pixels in a random pattern for enhanced detail in this all-important portion of the visible light spectrum (b), leverage subtractive colors (at the tradeoff of greater required post-processing) in order to improve the filters' light transmission capabilities (c) and even add filter-less monochrome pixels to the mix (d).

Leica's latest digital still camera takes filter alteration to the extreme; this particular model captures only black-and-white images by virtue of its filter-less monochrome image sensor (Figure 2a). Reviewers, however, praise the sharpness of its photographs even at high ISO settings.

Meanwhile, Nokia's 808 PureView smartphone (Figure 2b) embeds a 41 Mpixel image sensor but by default outputs 8 Mpixel or smaller-resolution images (Reference 2). By combining multiple pixels in variable-sized clusters depending on the digital zoom setting, the 808 PureView dispenses with the need for a complex, costly and bulky optical zoom structure, and it also multiplies the effective per-image-pixel photon collection capability for improved low light performance.
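The clustered-pixel idea, often called "binning," can be sketched in a few lines. The following is a hypothetical illustration of the principle, not Nokia's actual pipeline: each 2x2 cluster of raw pixel values is averaged into a single output pixel, quartering resolution while quadrupling per-output-pixel photon collection.

```python
def bin_pixels(image, factor=2):
    """Combine factor x factor clusters of raw pixels into one output
    pixel by averaging, trading resolution for light collection."""
    h, w = len(image), len(image[0])
    binned = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            cluster = [image[y + dy][x + dx]
                       for dy in range(factor) for dx in range(factor)]
            row.append(sum(cluster) // len(cluster))
        binned.append(row)
    return binned

# A 4x4 raw readout binned 2x2 yields a 2x2 image.
raw = [[10, 12, 20, 22],
       [14, 16, 24, 26],
       [30, 32, 40, 42],
       [34, 36, 44, 46]]
print(bin_pixels(raw))  # [[13, 23], [33, 43]]
```

Varying the cluster size per digital-zoom setting is what lets a very high-resolution sensor stand in for an optical zoom.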

Implementation specifics aside, a new sensor design technique pioneered by Sony called backside illumination (Figure 2c) routes the inter-pixel wiring behind the pixels' photodiodes as a means of improving the sensor's per-pixel fill factor (the percentage of total area devoted to light collection) and therefore its low-light capabilities.

Figure 2: Leica's M Monochrom will set you back $8,000 or so and only captures black-and-white images, but reviewers rave at its sharpness and low-light performance (a). The Nokia 808 PureView Smartphone packs an enormous 41 Mpixel image sensor, which it uses to implement digital zoom and clustered-pixel photon collection capabilities (b). A Sony-pioneered sensor design technique called backside illumination routes the inter-pixel wiring behind the photodiodes, thereby improving the per-pixel fill factor ratio (c).

Depth discernment

A conventional image sensor-based design is capable of supporting many embedded vision implementations. It can, for example, assist in interpreting elementary gestures, as well as tackle rudimentary face detection and recognition tasks, and it's often adequate for optical character recognition functions. However, it may not be able to cleanly detect a more elaborate gesture that incorporates movement to or away from the camera (3-D) versus one that's completely up-and-down and side-to-side (2-D). More generally, it's unable to fully discern the entirety (i.e., the depth) of an object; it can't, for example, easily distinguish between a person's face and a photograph of that same person. For depth-cognizant requirements like these, a 3-D image sensor subsystem is often necessary (Reference 3).

Regardless of the specific 3-D sensor implementation, the common output is the depth map, an image matrix in which each pixel data entry (which is sometimes additionally color-coded for human interpretation purposes) represents the distance between the sensor and a point in front of the sensor (Figure 3). Each depth map frame commonly mates with a corresponding frame captured by a conventional 2-D image sensor, with the two frames parallax-corrected relative to each other, since they source from separate cameras in separate locations.

Figure 3: Regardless of the specific 3-D camera technique employed, they all output a depth map of objects they discern (a). A device such as the HTC EVO 3D smartphone, whose stereo sensor array is primarily intended to capture 3-D still and video images, can also be used for 3-D embedded vision purposes (b). Microsoft's Kinect harnesses the structured light method of discerning depth (c), by projecting a known infrared light pattern in front of it and then analyzing the shape and orientation of the ellipses that it sees (d).

One common method of discerning depth is by means of a stereo sensor array, i.e., a combination of two image sensors spaced apart in a manner reminiscent of a pair of eyes. As with the eyes and brain, the differing-perspective viewpoints of a common object from both sensors are processed by the imaging SoC in order to assess the distance between the object and the sensor array. It's possible to extend the concept beyond two sensors, in multi-view geometry applications. The dual-sensor approach is often the lowest-cost, lowest-power-consumption and smallest-form-factor candidate. And it's particularly attractive if, as previously mentioned, it's already present in the design for 3-D still image photography and video capture purposes.
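The underlying triangulation is simple: for a calibrated, rectified pair, depth equals the focal length times the baseline (sensor separation), divided by the disparity (the pixel offset of the object between the two views). A minimal sketch; the focal length and baseline below are illustrative values, not any particular camera's:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Triangulate distance from a calibrated, rectified stereo pair:
    depth = focal_length * baseline / disparity."""
    if disparity_px <= 0:
        return float("inf")  # zero disparity: point at infinity
    return focal_length_px * baseline_m / disparity_px

# Hypothetical rig: 700-pixel focal length, 6 cm baseline.
# An object producing 21 pixels of disparity sits about 2 m away.
print(round(depth_from_disparity(700, 0.06, 21), 3))  # 2.0
```

Note the inverse relationship: nearby objects produce large disparities and are measured precisely, while distant ones produce tiny disparities, which is why stereo depth accuracy degrades with range.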

One stereo imaging implementation is the discrete "binocular" approach, achieved by "bolting" two cameras together. Although this version of the concept may be the most straightforward from a hardware standpoint, the required software support is more complex. By means of analogy, two motorcycles linked together with a common axle do not make a car! The cameras require calibration for robust image registration, and their frames need to be synchronized if, as is commonly the case, either the camera array or the subject is moving.

Alternatively, it's possible to combine two image sensors in a unified SoC or multi-die package, outputting a combined data stream over a single bus. The advantages of the fully integrated approach include improved control and frame synchronization. The tighter integration leads to improved calibration, which in turn yields better stereo imaging results manifested as increased depth perception and faster image processing.

Projection approaches

Structured light, the second common 3-D sensor scheme, is the technology employed by Microsoft's Kinect. The method projects a predetermined pattern of light onto a scene, for the purpose of analysis. 3-D sensors based on the structured light method use a projector to create the light pattern and a camera to sense the result.

In the case of the Microsoft Kinect, the projector employs infrared light. Kinect uses an astigmatic lens with different focal lengths in the X and Y direction. An infrared laser behind the lens projects an image consisting of a large number of dots that transform into ellipses, whose particular shape and orientation in each case depends on how far the object is from the lens.

Advantages of the structured light approach include its finely detailed resolution and its high accuracy, notably in dimly illuminated environments where visible light spectrum-focused image sensors might struggle to capture adequate images. Structured light software algorithms are also comparatively simple relative to those of the stereo sensor approach, although the concomitant point cloud processing can be computationally expensive, approaching that of stereo vision.

Conversely, the structured light technique's reliance on infrared light, at least as manifested in Kinect, means that it will have issues operating outdoors, where ambient infrared illumination in sunlight will destructively interfere with the light coming from the projector. The projector is also costly and bulky, consumes a substantial amount of power and generates a large amount of heat; Kinect contains a fan specifically for this reason (Reference 4). And further adding to the total bill of materials cost is the necessary custom projector lens.

Time-of-flight is the third common method of implementing a 3-D sensor. As with structured light, a time-of-flight camera contains an image sensor, a lens and an active illumination source. However, in this case, the camera derives range (i.e., distance) from the time it takes for projected light to travel from the transmission source to the object and back to the image sensor (Reference 5). The illumination source is typically either a pulsed laser or a modulated beam, depending on the image sensor type employed in the design.

Image sensors that integrate digital time counters typically combine with pulsed laser beams, as do shutter-inclusive range-gated sensors. In the latter case, the shutter opens and closes at the same rate with which the projector emits the pulsed beam. The amount of light each image sensor pixel "sees" is therefore related to the distance the pulse travelled, hence the distance from the sensor to the object. Time-of-flight designs that include image sensors with phase detectors conversely use modulated beam sources. Since the strength of the beam varies over time, measuring the incoming light's phase indirectly derives the time-of-flight distance.
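Both variants reduce to simple arithmetic on the speed of light. A sketch of the two range equations; the pulse timing and modulation frequency below are illustrative values, not any specific product's:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def distance_from_round_trip(t_seconds):
    """Pulsed time-of-flight: the light travels out and back,
    so distance is half the round-trip path."""
    return C * t_seconds / 2.0

def distance_from_phase(phase_rad, mod_freq_hz):
    """Modulated-beam time-of-flight: the phase shift of the returned
    beam maps to distance, ambiguous beyond c / (2 * f_mod)."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)

# A 20 ns round trip corresponds to roughly 3 m.
print(round(distance_from_round_trip(20e-9), 2))  # 3.0
# A pi/2 phase shift at 20 MHz modulation is about 1.87 m.
print(round(distance_from_phase(math.pi / 2, 20e6), 2))  # 1.87
```

The phase formula also makes the modulated-beam range ambiguity concrete: at 20 MHz, distances repeat every c / (2 × 20 MHz) ≈ 7.5 m.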

Time-of-flight cameras are common in automotive applications such as pedestrian detection and driver assistance, and are equally prevalent in robotics products. They also have a long and storied history in military, defense, and aerospace implementations. The required image processing software tends to be simpler (and therefore more real-time amenable) than that needed with stereo camera setups, although the time-of-flight technique's susceptibility to ambient illumination interference and multiple reflections somewhat complicates the algorithms. Frame rates can be quite high, upwards of 60 fps, a speed that's difficult to achieve with stereo imager setups, but the comparative resolution is usually lower. And, as with the structured light technique, the required time-of-flight light projector translates into cost, power consumption (and heat dissipation), size and weight downsides.

Industry alliance can help ensure design success

As the previous examples showcase, embedded vision technology has the potential to enable a wide range of electronic products that are more intelligent and responsive than before, and thus more valuable to users. It can add helpful features to existing products. And it can provide significant new markets for hardware, software, semiconductor and systems manufacturers alike. A worldwide organization of technology developers and providers, the Embedded Vision Alliance (see Sidebar B), was formed last summer to provide engineers with the tools necessary to help speed the transformation of this potential into reality.


  1. www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/documents/pages/selecting-and-designing-image-sensor-
  2. www.embedded-vision.com/news/2012/03/05/nokias-808-pureview-technical-article-and-several-videos-you
  3. www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/march-2012-embedded-vision-alliance-summ
  4. www.ifixit.com/Teardown/Microsoft-Kinect-Teardown/4066/1
  5. www.embedded-vision.com/news/2012/03/01/time-flight-samsungs-new-rgb-image-sensor-also-has-depth-sight

Sidebar A: Focus: the fourth dimension

Conventional digital cameras, regardless of which 2-D or 3-D image sensor technology they employ, are all hampered by a fundamental tradeoff that has its heritage in "analog" film. The smaller the lens aperture (or said another way, the larger the aperture setting number), the broader the depth of field over which objects will be in focus. But with a smaller aperture, the lens passes a smaller amount of light in a given amount of time, translating into poor low-light performance for the camera.

Conversely, the wider the aperture (i.e., the smaller the aperture setting number), the better the low-light performance, but the smaller the percentage of the overall image which will be in focus. This balancing act between depth of focus and exposure has obvious relevance to embedded vision applications, which strive to accurately discern the identities, locations and movements of various objects and therefore need sharp source images.

Plenoptic imaging systems, also known as "light field" or "integral" imagers (with the Lytro Light Field camera the best-known consumer-tailored example), address the depth of field-versus-exposure tradeoff by enabling post-capture selective focus on any region within the depth plane of an image (Figure A). The approach employs an array of microlenses in-between the main camera lens and the image sensor, which re-image portions of the image plane. These microlenses sample subsets of the light rays, focusing them to multiple sets of points known as microimages.

Subsequent image reconstruction processes the resulting light field (i.e., the suite of microimages) to generate a 2-D in-focus picture. Altering the process parameters allows for interactive re-focusing within a single acquired image, and also engenders highly robust and real-time 3D reconstruction such as that promoted by Raytrix with its light field cameras.

Figure A: Lytro's Light Field camera (a) is the best-known consumer example of the plenoptic image capture technique, which allows for selective re-focus on various portions of the scene post-capture (b).

The technique was first described in the early 1900s, but digital image capture coupled with robust and low-cost processing circuitry has now made it feasible for widespread implementation (Reference A). Potential plenoptic imaging applications include:

  • Optical quality inspection
  • Security monitoring
  • Microscopy
  • 3-D input devices, and
  • Both consumer and professional photography

The primary tradeoff at the present time versus conventional digital imaging involves resolution: the typical reconstructed plenoptic frame is 1 to 3 Mpixels in size. However, larger sensors and beefier image processors can improve the effective resolution, albeit at cost and other tradeoffs.

And longer term, the same lithography trends that have led to today's mainstream double-digit megapixel conventional cameras will also increase the resolutions of plenoptic variants. Also, sophisticated "super-resolution" interpolation techniques purport to at least partially mitigate low-resolution native image shortcomings.

A. http://en.wikipedia.org/wiki/Light-field_camera

Sidebar B: The Embedded Vision Alliance

The mission of the Embedded Vision Alliance is to provide practical education, information, and insights to help designers incorporate embedded vision capabilities into products. The Alliance's website (www.Embedded-Vision.com) is freely accessible to all and includes articles, videos, a daily news portal, and a discussion forum staffed by a diversity of technology experts.

Registered website users can receive the Embedded Vision Alliance's e-mail newsletter, Embedded Vision Insights, and have access to the Embedded Vision Academy, containing numerous training videos, technical papers, and file downloads, intended to enable those new to the embedded vision application space to rapidly ramp up their expertise (References 6 and 7).

A few examples of the website's content include:

  • The BDTI Quick-Start OpenCV Kit, a VMware virtual machine image that includes OpenCV and all required tools pre-installed, configured, and built, thereby making it easy to quickly get started with vision development using the popular OpenCV library (Reference 8).
  • The BDTI OpenCV Executable Demo Package, a combination of a descriptive article, tutorial video, and interactive Windows vision software demonstrations (Reference 9).
  • And a three-part video interview with Jitendra Malik, Arthur J. Chick Professor of EECS at the University of California at Berkeley and a computer vision pioneer (Reference 10).

For more information, visit www.Embedded-Vision.com. Contact the Embedded Vision Alliance at info@Embedded-Vision.com and +1-510-451-1800.


  6. www.embeddedvisioninsights.com
  7. www.embeddedvisionacademy.com
  8. www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/downloads/pages/OpenCVVMwareImage
  9. www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/downloads/pages/introduction-computer-vision-using-op
  10. www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/malik-uc-interview-1

Part 2 of this article discusses HDR processing for embedded vision.

Brian Dipert is Editor-In-Chief of the Embedded Vision Alliance (info@Embedded-Vision.com, +1-510-451-1800). He is also a Senior Analyst at BDTI, the industry's most trusted source of analysis, advice, and engineering for embedded processing technology and applications, and Editor-In-Chief of InsideDSP, the company's online newsletter dedicated to digital signal processing technology. Brian has a bachelor's degree in Electrical Engineering from Purdue University in West Lafayette, IN. His professional career began at Magnavox Electronics Systems in Fort Wayne, IN; Brian subsequently spent eight years at Intel Corporation in Folsom, CA. He then spent 14 years (and five months) at EDN Magazine.

Eric Gregori is a Senior Software Engineer and Embedded Vision Specialist with BDTI. He is a robot enthusiast with over 17 years of embedded firmware design experience. His specialties are computer vision, artificial intelligence, and programming for Windows Embedded CE, Linux, and Android operating systems. Eric authored the Robot Vision Toolkit and developed the RobotSee Interpreter. He is working towards his Master's in Computer Science and holds 10 patents in industrial automation and control.

Shehrzad Qureshi is a Senior Engineer with BDTI and Principal of Medallion Solutions LLC. He has an M.S. in Computer Engineering from Santa Clara University, and a B.S. in Computer Science from the University of California, Davis. He is an inventor or co-inventor on 8 U.S. issued patents, with many more pending. Earlier in his career, Shehrzad was Director of Software Engineering and Acting Director of IT at Restoration Robotics. Prior to Restoration Robotics, Shehrzad was the Software Manager at biotech startup Labcyte. Shehrzad has held individual contributor positions at Accuray Oncology, where he worked in medical imaging, and Applied Signal Technology, where he worked on classified SIGINT defense programs. He is the author of Embedded Image Processing on the TMS320C6000 DSP (Springer, 2005).

The OpenCV Foundation: Gary Bradski Provides More Information

By Brian Dipert
Embedded Vision Alliance
Senior Analyst

Building Machines That See: Finding Edges in Images

By Eric Gregori
Senior Software Engineer
This article was originally published at EE Times' Embedded.com Design Line. It is reprinted here with the permission of EE Times.

With the emergence of increasingly capable low-cost processors and image sensors, it’s becoming practical to incorporate computer vision capabilities into a wide range of embedded systems, enabling them to analyze their environments via image and video inputs.

Products like Microsoft’s Kinect game controller and Mobileye’s driver assistance systems are raising awareness of the incredible potential of embedded vision technology. As a result, many embedded system designers are beginning to think about implementing embedded vision capabilities.

In this article, the second in a series, we introduce edge detection, one of the fundamental techniques of computer vision. Like many aspects of computer vision, edge detection sounds simple but turns out to be complex. We explore some of the common challenges and pitfalls associated with edge detection, and present techniques for addressing them. Then, after explaining edge detection from an algorithmic perspective, we show how to use OpenCV, a free open-source computer vision software component library, to quickly implement application software incorporating edge detection. For an overview of computer vision and OpenCV, read "Introduction to Embedded Vision and the OpenCV Library."

What is an "Edge" in Computer Vision?

Images, of course, are made up of individual points called pixels. In a grayscale image, the value of each pixel represents the intensity of the light captured when creating the image. An edge is defined as an area of significant change in the image intensity. Figure 1 below shows a grayscale image on the left, with its corresponding binary edge map on the right. If you look carefully, you will notice that the white lines in the edge map delineate the portions of the image separated by pixels of different intensity.

Figure 1. Grayscale image and its corresponding binary edge map

If you graph the values of a single row of pixels containing an edge, the strength of the edge is shown by the slope of the line in the graph. We use the term "hard" to describe an edge with a large difference between nearby pixel values, and "soft" to describe an edge with small differences in close-proximity pixel values.

Figure 2. Grayscale soft edge

Figure 2 above and Figure 3 below represent eight pixels from two different parts of a grayscale image. The numeric value for each pixel is shown above an expanded view of the actual pixel. Figure 2 represents a "soft" edge while Figure 3 represents a "hard" edge.
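One way to quantify the hard/soft distinction is to look at the largest jump between adjacent pixels along a row. A small illustrative sketch; the two 8-pixel rows below are hypothetical values, not the ones in the figures:

```python
def edge_strength(row):
    """Largest absolute difference between adjacent pixels in a row:
    a simple proxy for how 'hard' any edge crossing that row is."""
    return max(abs(b - a) for a, b in zip(row, row[1:]))

# A gradual intensity ramp versus an abrupt step.
soft = [90, 100, 110, 120, 130, 140, 150, 160]  # "soft" edge
hard = [90, 90, 90, 90, 200, 200, 200, 200]     # "hard" edge
print(edge_strength(soft))  # 10
print(edge_strength(hard))  # 110
```

Real edge detectors generalize this idea into the 2-D intensity gradient discussed later in the article.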

Figure 3. Grayscale hard edge

How Edges are Used in Computer Vision

The goal of a computer vision application is to extract meaning from images. In general, this objective is first accomplished by finding relevant features in the images. Edges are one type of image feature.

For example, edges can be used to find the boundaries of objects in an image, in order to enable isolating those objects for further processing. This process is known as "segmentation." After an image is segmented, objects can be individually analyzed or discarded depending on what the application is trying to accomplish.

Unfortunately, in many applications, not all edges in an image actually represent separate objects, and not all objects are separated by clean edges. As mentioned earlier, edges are graded on a scale from no edge to "soft" edge, and finally to "hard" edge. Somewhere in the algorithm, a decision has to be made regarding whether a given edge is "hard" enough to be classified as a true edge versus, for example, just a small change in brightness due to lighting variation.

Finding Edges in an Image: The Canny Algorithm

Many algorithms exist for finding edges in an image. This example focuses on the Canny algorithm. Considered by many to be the best edge detector, the Canny algorithm was described in 1986 by John F. Canny of U.C. Berkeley. In his paper, "A Computational Approach to Edge Detection," Canny describes three criteria to evaluate the quality of an edge detection algorithm:

  1. Good detection: There should be a low probability of failing to mark real edge points, and a low probability of falsely marking non-edge points. This criterion corresponds to maximizing signal-to-noise ratio.
  2. Good localization: The points marked as edge points by the operator should be as close as possible to the center of the true edge.
  3. Only one response to a single edge: This criterion is implicitly also captured in the first one, since when there are two responses to the same edge, one of them must be considered false.

The algorithm achieves these criteria using multiple stages, as shown in Figure 4 below.

Figure 4. Block diagram of the Canny edge detection algorithm

First Stage: Gaussian Filter

In the first stage of the Canny edge detection algorithm, noise in the image is filtered out using a Gaussian filter. This step is referred to as image smoothing or blurring. The Gaussian filter removes the high frequency white noise (i.e. "popcorn noise") common with images collected from CMOS and CCD sensors, without significantly degrading the edges within the image.

The amount of blurring done by the Gaussian filter is controlled in part by the size of the filter kernel. As the filter kernel size increases, the amount of blur also increases. Too much blur will soften the edges to a point where they cannot be detected. Too little blur conversely will allow noise to pass through, which will be detected by the next stage as false edges. The series of images in Figure 5 below shows the result of applying a Gaussian filter to a grayscale image.

Figure 5a. Original image

Figure 5b. Gaussian filtered image: filter kernel size 3

Figure 5c. Gaussian filtered image: filter kernel size 5

Figure 5d. Gaussian filtered image: filter kernel size 7
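The smoothing stage can be sketched in a few lines of Python. The snippet below builds a normalized 1-D Gaussian kernel and applies it to a single row of pixels (a full 2-D blur is this filter applied along both axes). The function names and the sigma-from-kernel-size rule are illustrative assumptions, loosely modeled on OpenCV's default behavior, not the library's actual implementation:

```python
import math

def gaussian_kernel(size, sigma=None):
    """Build a normalized 1-D Gaussian kernel of odd length `size`.

    If sigma is not given, derive it from the kernel size, loosely
    following OpenCV's rule of thumb: 0.3*((size-1)*0.5 - 1) + 0.8.
    """
    if sigma is None:
        sigma = 0.3 * ((size - 1) * 0.5 - 1) + 0.8
    half = size // 2
    kernel = [math.exp(-(x * x) / (2.0 * sigma * sigma))
              for x in range(-half, half + 1)]
    total = sum(kernel)
    return [k / total for k in kernel]

def smooth(row, kernel):
    """Filter one row of pixels with the kernel, replicating the border."""
    half = len(kernel) // 2
    out = []
    for i in range(len(row)):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - half, 0), len(row) - 1)
            acc += row[idx] * k
        out.append(acc)
    return out

# A noisy step edge: a larger kernel blurs the noise (and the edge) more.
row = [10, 12, 9, 11, 200, 198, 202, 199]
print([round(v, 1) for v in smooth(row, gaussian_kernel(3))])
print([round(v, 1) for v in smooth(row, gaussian_kernel(7))])
```

Running the sketch shows the trade-off described above: the size-7 kernel flattens the pixel-to-pixel noise more thoroughly, but it also spreads the step edge across more pixels than the size-3 kernel does.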

Second Stage: Intensity Gradient

The second stage of the Canny edge detection algorithm calculates the intensity gradient of the image using a first derivative operator. Looking back at Figures 2 and 3, the intensity gradient corresponds to the slope of the line in the graph. The intensity gradient of an image reflects the strength of the edges in the image.

Strong edges have larger slopes and therefore have larger intensity gradients. The slope of a line can be calculated using a first derivative operator. The original Canny paper tested various first derivative operators; most modern Canny implementations (including the one in the OpenCV library) use the Sobel operator (Figure 6, below).

Figure 6. Vertical derivative of the image shown in Figure 5b, calculated using the Sobel operator

The Sobel operator separately returns the first derivative of the image in the horizontal and vertical directions. This process is done on every pixel in the image. The Sobel operator results in two images, with each pixel in the first image representing the horizontal derivative of the input image and each pixel in the second image representing the vertical derivative, as illustrated in Figure 6 above and Figure 7 below.

Figure 7. Horizontal derivative of the image shown in Figure 5b, calculated using the Sobel operator
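The two Sobel passes can be sketched directly. The snippet below defines the standard 3x3 Sobel kernels and slides them over a small image held as a list of lists; the helper name `convolve3x3` and the toy image are illustrative, not OpenCV code:

```python
# 3x3 Sobel kernels: SOBEL_X takes the horizontal derivative (responds to
# vertical edges); SOBEL_Y takes the vertical derivative.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [ 0,  0,  0],
           [ 1,  2,  1]]

def convolve3x3(img, kernel):
    """Correlate a 3x3 kernel over the interior pixels of a 2-D list."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    acc += img[y + ky - 1][x + kx - 1] * kernel[ky][kx]
            out[y][x] = acc
    return out

# A vertical step edge: dark left half, bright right half.
img = [[0, 0, 100, 100] for _ in range(4)]
gx = convolve3x3(img, SOBEL_X)
gy = convolve3x3(img, SOBEL_Y)
print(gx[1])  # [0, 400, 400, 0]: strong horizontal derivative at the step
print(gy[1])  # [0, 0, 0, 0]: no vertical derivative on a vertical edge
```

As the output shows, a purely vertical edge produces a large horizontal derivative and a zero vertical derivative, which is exactly why both passes are needed to find edges at arbitrary orientations.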

To calculate the overall image gradient requires converting the horizontal and vertical scalar values to a vector value comprising a magnitude and an angle. Given horizontal and vertical scalars, the vector can be calculated using the equations in Figure 8 below.

Figure 8. Converting the horizontal and vertical first derivative scalar values to a vector
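The Figure 8 conversion is a per-pixel operation; a minimal Python version (the function name is illustrative) computes the magnitude as sqrt(Gx^2 + Gy^2) and the angle with atan2, which handles all four quadrants of the arctan(Gy/Gx) form:

```python
import math

def gradient_vector(gx, gy):
    """Combine horizontal and vertical derivatives into (magnitude, angle).

    magnitude = sqrt(gx^2 + gy^2); angle is in degrees, with 0 meaning the
    left-to-right direction, normalized into [0, 360).
    """
    magnitude = math.hypot(gx, gy)
    angle = math.degrees(math.atan2(gy, gx)) % 360.0
    return magnitude, angle

print(gradient_vector(3.0, 4.0))  # (5.0, 53.13...)
print(gradient_vector(1.0, 0.0))  # (1.0, 0.0)
```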

The equations in Figure 8 are applied to each pixel, resulting in an image gradient such as the one shown in Figure 9 below.

Figure 9. Image gradient of the image shown in Figure 5b

Each pixel has a magnitude and angle value, although Figure 9 shows only the magnitude; the angle information is not included. Sharper edges have higher magnitudes.

Note how thick the edges in Figure 9 are. This is a by-product of the method that the Sobel operator uses to calculate the first derivative, which employs a group of pixels. The minimum number of pixels is nine, organized as a 3-pixel-by-3-pixel square cluster.

This 3x3 group of pixels is used to calculate the derivative of the single pixel in the center of the group. The group size is important because it has an effect both on the performance of the operator and on the number of computation operations required. In this case, a smaller operator creates the cleanest edges.

Third Stage: Non-Maximum Suppression

The next stage in the Canny algorithm thins the edges created by the Sobel operator, using a process known as non-maximum suppression. Non-maximum suppression removes all pixels that are not actually on the edge "ridge top," thereby refining the thick line into a thin line. Simply put, non-maximum suppression finds the peak of the cross section of the thick edge created by the Sobel operator. Figure 10 below shows a graph of a single row of pixels from an edge detected in Figure 9.

Figure 10. Thick edges caused by the Sobel operator

The non-maximum suppression method finds the peak of the edge by scanning across the edge using the angle data calculated by the Sobel operator, looking for a maximum magnitude. Any pixels less than the maximum magnitude are set to zero. The result is shown in Figure 11 below. Notice that the edges have been thinned, but they are still not perfect.

Figure 11. Thinned edge using non-maximum suppression
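The ridge-top idea can be shown on a single cross-section like the one graphed in Figure 10. The sketch below (an illustrative simplification: real Canny scans perpendicular to the edge, along the gradient angle computed in stage two) keeps only the local maximum and zeroes everything else:

```python
def non_max_suppression_1d(magnitudes):
    """Keep only ridge tops along one cross-section of a thick edge.

    Each magnitude is compared with its two neighbors along the scan
    direction; anything that is not a local maximum is set to zero.
    """
    out = [0.0] * len(magnitudes)
    for i in range(1, len(magnitudes) - 1):
        m = magnitudes[i]
        if m > 0 and m >= magnitudes[i - 1] and m >= magnitudes[i + 1]:
            out[i] = m
    return out

# A thick, blurred edge response: only the ridge top survives.
row = [0.0, 1.0, 3.0, 7.0, 3.0, 1.0, 0.0]
print(non_max_suppression_1d(row))  # [0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 0.0]
```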

Fourth Stage: Hysteresis

The final stage of the Canny edge detector is referred to as hysteresis. In this stage, the detector uses high (T2) and low (T1) threshold values to filter out pixels in the gradient image left over from the previous stage that are not part of an edge.

The algorithm first finds a gradient in the image greater than T2. It then follows the gradient, marking each pixel that is greater than T1 as a definite edge. The algorithm requires a gradient greater than the high threshold to begin following the edge, but it will continue to follow the edge as long as the gradient stays above the low threshold.

Any pixels with gradients above T1 but below T2 that are not connected to a gradient above T2 are rejected as edges, as are all pixels with gradients below T1. This hysteresis helps to ensure that noisy edges are not broken up into multiple edge fragments.
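The two-threshold edge following described above amounts to a flood fill seeded at the strong pixels. The sketch below (an illustrative simplification with hypothetical names; it uses 8-connected neighbors on a 2-D gradient map) shows how a weak pixel survives only when it is connected to a strong one:

```python
from collections import deque

def hysteresis(grad, t_low, t_high):
    """Hysteresis thresholding on a 2-D gradient magnitude map.

    Pixels at or above t_high seed edges; each edge is then followed
    through any 8-connected neighbors that stay at or above t_low.
    Everything else is rejected.
    """
    h, w = len(grad), len(grad[0])
    edges = [[0] * w for _ in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w)
                  if grad[y][x] >= t_high)
    for y, x in queue:               # mark the strong seeds
        edges[y][x] = 1
    while queue:                     # follow each edge through weak pixels
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < h and 0 <= nx < w and not edges[ny][nx]
                        and grad[ny][nx] >= t_low):
                    edges[ny][nx] = 1
                    queue.append((ny, nx))
    return edges

# The weak pixel (40) at top survives because it touches a strong one (90);
# the isolated weak pixel at bottom left is rejected.
grad = [[ 0, 40, 90],
        [ 0,  0,  0],
        [40,  0,  0]]
print(hysteresis(grad, t_low=30, t_high=80))
# [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
```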

Finding Edges in an Image Using OpenCV

OpenCV is an open-source software component library for computer vision application development, and a powerful tool for prototyping embedded vision algorithms. Originally released in 2000, it has been downloaded over 3.5 million times. The OpenCV library comprises more than 2,500 functions and contains dozens of valuable vision application examples. The library supports C, C++, and Python, and has been ported to Windows, Linux, Android, Mac OS X, and iOS.

The great thing about OpenCV is how much work it does for you behind the scenes. Some of the leading minds in the field have contributed to the OpenCV library, making it a very powerful tool. You can find the edges in a grayscale image using the OpenCV Canny() function described here (from the OpenCV documentation):

void Canny(InputArray image, OutputArray edges, double threshold1, double threshold2, int apertureSize=3, bool L2gradient=false)

  • image: Single-channel 8-bit input image.
  • edges: Output edge map. It has the same size and type as image.
  • threshold1: First threshold for the hysteresis procedure.
  • threshold2: Second threshold for the hysteresis procedure.
  • apertureSize: Aperture size for the Sobel() operator.
  • L2gradient: Flag indicating the method used to calculate the magnitude of the gradient.
The input image must be a grayscale image. An example of how to convert a color image to a grayscale image will be discussed in detail later in this article. The output will also be a grayscale image, with edge pixels marked with a non-zero value and non-edge pixels marked as 0.

threshold1 and threshold2 are the lower (T1) and upper (T2) Canny thresholds used in the fourth (hysteresis) stage of the algorithm.

apertureSize sets the size of the Sobel operator used in stage two of the Canny algorithm. In the OpenCV implementation, apertureSize can be 3, 5, or 7, representing operator sizes of 3x3, 5x5, and 7x7, respectively.

The L2gradient flag determines the method used to calculate the magnitude when combining the horizontal and vertical Sobel results, as shown in Figure 8. For performance purposes, OpenCV offers a simplified form of the magnitude equation in Figure 8: G = |Gx| + |Gy| gives a rough approximation to G = sqrt(Gx^2 + Gy^2) without requiring an expensive square root operation. Setting L2gradient to false uses the faster G = |Gx| + |Gy| equation to calculate the magnitude.
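The trade-off between the two magnitude formulas is easy to quantify. This short sketch (function names are illustrative) compares them; the L1 form never underestimates the true magnitude, and overestimates it by at most a factor of sqrt(2), with the worst case on diagonal gradients:

```python
import math

def magnitude_l2(gx, gy):
    return math.sqrt(gx * gx + gy * gy)   # exact, needs a square root

def magnitude_l1(gx, gy):
    return abs(gx) + abs(gy)              # cheap approximation (L2gradient=false)

for gx, gy in [(3, 4), (10, 0), (-5, 5)]:
    l2, l1 = magnitude_l2(gx, gy), magnitude_l1(gx, gy)
    assert l2 <= l1 <= math.sqrt(2) * l2 + 1e-9
    print(gx, gy, round(l2, 2), l1)
```

In practice the overestimate only shifts where the hysteresis thresholds bite, which is why the faster form is usually an acceptable default.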

OpenCV Canny Edge Detector Examples

Two easy-to-use tools have been created to help developers get up and running with OpenCV. The first tool, the BDTI OpenCV Executable Demo Package, is Microsoft Windows-based and allows you to experience OpenCV using just your mouse, with no programming required. The BDTI OpenCV Executable Demo Package includes various OpenCV examples that you can execute by simply double clicking on an icon.

All of the examples have sliders that allow you to alter each demonstrated computer vision algorithm by adjusting parameters with the mouse.

You can download the BDTI OpenCV Executable Demo Package here (free registration is required). BDTI has also developed an online user guide and tutorial video for the BDTI OpenCV Executable Demo Package.

The second tool, the BDTI Quick-Start OpenCV Kit, is for engineers who want to develop their own algorithms using OpenCV and who want to get started quickly. The included VMware virtual machine image is Ubuntu Linux-based and runs on a Windows machine using the free VMware Player. OpenCV, along with all the required packages and tools, has been preinstalled in the image so the user can be up and running in minutes, with no prior Linux experience required.

The image contains Eclipse and GCC for C/C++ development, pre-installed and configured with many OpenCV examples. Eclipse is a very common and easy to use graphical integrated development environment (IDE) that makes it easy to get OpenCV up and running. The BDTI Quick-Start OpenCV Kit is a full OpenCV installation, with source code included so you can configure the library to your own specific needs. The library is prebuilt with a documented configuration, making it easy to get up and running quickly.

You can download the BDTI Quick-Start OpenCV Kit here (free registration is required). BDTI has also developed an online user guide for the BDTI Quick-Start OpenCV Kit.

The edge detection demo included in the BDTI Quick-Start OpenCV Kit is shown in Figure 12 below.

Figure 12. OpenCV Canny example showing the edge detection process

The demo software can process a static image, a video file, or the video stream from a web camera. Processing occurs in real-time so you see results on the fly. The demo opens multiple windows, representing various stages in Canny edge detection algorithm. All windows are updated in real-time. The next section describes each of the windows displayed by this demo.

Canny Edge Detector Example Window Names and Functions

Parameters: Allows you to modify the Canny parameters on the fly using simple slider controls.

The parameters controlled by the sliders are:

Low Thres: Canny Low Threshold Parameter (T1) (LowThres)

High Thres: Canny High Threshold Parameter (T2) (HighThres)

Gaus Size: Gaussian Filter Size (Fsize)

Sobel Size: Sobel Operator Size (Ksize)

The corresponding OpenCV calls are:

// smooth gray scale image using Gaussian, output to image (blurred)

GaussianBlur( gray, blurred, Size(Fsize, Fsize), 0 );

// Perform Canny edge detection on image (blurred), output edgemap to (CannyEdges)

Canny( blurred, CannyEdges, LowThres, HighThres, Ksize );

Gaussian Filter: The output of the Gaussian filter.

GradientX: The result of the horizontal derivative (Sobel) of the image in the Gaussian Filter window.

GradientY: The result of the vertical derivative (Sobel) of the image in the Gaussian Filter window.

Magnitude: The result of combining the GradientX and GradientY images using the equation G = |Gx|+|Gy|

Angle: Color-coded result of the angle equation from Figure 8 combining GradientX and GradientY using arctan(Gy/Gx).

Black = 0 degrees

Red = 1 degree to 45 degrees

White = 46 degrees to 135 degrees

Blue = 136 degrees to 225 degrees

Green = 226 degrees to 315 degrees

Red = 316 degrees to 359 degrees

The 0-degree marker indicates the left to right direction, as shown in Figure 13 below.

Figure 13. The direction color code for the angle window in the Canny edge detector example. Left to right is 0 degrees

Canny: The Canny edgemap.
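The color coding in the Angle window can be re-expressed as a small lookup function. The sketch below is an illustrative re-encoding of the table above (the function name is hypothetical, not part of the demo's source code):

```python
def angle_color(deg):
    """Map a gradient angle in degrees (0-359) to the Angle window's color."""
    deg = int(deg) % 360
    if deg == 0:
        return "black"
    if 1 <= deg <= 45 or 316 <= deg <= 359:
        return "red"
    if 46 <= deg <= 135:
        return "white"
    if 136 <= deg <= 225:
        return "blue"
    return "green"   # 226 to 315 degrees

print([angle_color(a) for a in (0, 30, 90, 180, 270, 350)])
# ['black', 'red', 'white', 'blue', 'green', 'red']
```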


Edge detection is a fundamental building block in many computer vision and embedded vision applications. Edges represent unique groups of pixels that form natural boundaries around objects. By robustly identifying edges, many vision applications make an important first step towards discerning meaning from images.

Edges can be used to separate the objects in an image, in an operation referred to as segmentation. The output of an edge detector is called an edge map, in which pixels represent the edges detected in the original image. Subsequent algorithms use edge maps to group and classify objects.

The Canny algorithm is an optimal algorithm (per Canny’s criteria) for finding edges in a grayscale image. The algorithm produces thin edges that are well localized, accurately indicating the position of the edge in the image. Canny is a multi-stage algorithm, but only two lines of code are required to invoke it in OpenCV.

BDTI has created two tools to help engineers get started in computer vision quickly. The Windows-based BDTI OpenCV Executable Demo Package allows you to experiment with OpenCV algorithms such as Canny edge detection, face detection, line detection, and optical flow. It requires no programming; the demos can be run with the click of a mouse.

The BDTI Quick-Start OpenCV Kit is a VMware virtual machine image with OpenCV pre-installed, along with all required tools. The kit contains the same examples as in the BDTI OpenCV Executable Demo Package, along with full source code. The BDTI Quick-Start OpenCV Kit is intended for developers who want to quickly get started using OpenCV for computer vision application development. Simply modify the provided examples to begin implementing your own algorithm ideas.

Eric Gregori is a Senior Software Engineer and Embedded Vision Specialist with Berkeley Design Technology, Inc. (BDTI), which provides analysis, advice, and engineering for embedded processing technology and applications. He is a robot enthusiast with over 17 years of embedded firmware design experience, with specialties in computer vision, artificial intelligence, and programming for Windows Embedded CE, Linux, and Android operating systems.

To read more about the topic of embedded vision, go to “Jumping on the embedded vision bandwagon.”

(Editor’s Note: The Embedded Vision Alliance was launched in May of 2011, and now has 19 sponsoring member companies: AMD, Analog Devices, Apical, Avnet Electronics Marketing, BDTI, CEVA, Cognimem, CogniVue, eyeSight, Freescale, IMS Research, Intel, MathWorks, Maxim Integrated Products, NVIDIA, National Instruments, Omek Interactive, Texas Instruments, Tokyo Electron Device, and Xilinx.

The first step of the Embedded Vision Alliance was to launch a website at www.Embedded-Vision.com. The site serves as a source of practical information to help design engineers and application developers incorporate vision capabilities into their products. The Alliance’s plans include educational webinars, industry reports, and other related activities.

Everyone is free to access the resources on the website, which is maintained through member and industry contributions. Membership details are also available at the site. For more information, contact the Embedded Vision Alliance at info@embedded-vision.com and 1-510-451-1800.)

Introduction to Embedded Vision and the OpenCV Library (Embedded.com Article)


By Eric Gregori
Senior Software Engineer
This article was originally published at EE Times' Embedded.com Design Line. It is reprinted here with the permission of EE Times.

The term “embedded vision” refers to the use of computer vision technology in embedded systems. Stated another way, “embedded vision” refers to embedded systems that extract meaning from visual inputs.

Similar to the way that wireless communication has become pervasive over the past 10 years, embedded vision technology is poised to be widely deployed in the next 10 years. Vision algorithms were originally only capable of being implemented on costly, bulky, power-hungry computer systems, and as a result computer vision has mainly been confined to a few application areas, such as factory automation and military equipment.

Today, however, a major transformation is underway. Due to the emergence of very powerful, low-cost, and energy-efficient processors, image sensors, memories and other ICs, it has become possible to incorporate vision capabilities into a wide range of embedded systems.

Similarly, OpenCV, a library of computer vision software algorithms originally designed for vision applications and research on PCs, has recently expanded to support embedded processors and operating systems.

Intel started OpenCV in the mid-1990s as a method of demonstrating how to accelerate certain algorithms. In 2000, Intel released OpenCV to the open source community as a beta version, followed by v1.0 in 2006. Robot developer Willow Garage, founded in 2006, took over support for OpenCV in 2008 and immediately released v1.1. The company has been in the news lately, following the unveiling of its PR2 robot.

OpenCV v2.0, released in 2009, contained many improvements and upgrades. Initially, the majority of algorithms in the OpenCV library were written in C, and the primary method of using the library was via a C API. OpenCV v2.0 migrated towards C++ and a C++ API. Subsequent versions of OpenCV added Python support, along with Windows, Linux, iOS and Android OS support, transforming OpenCV (currently at v2.3) into a cross-platform tool. OpenCV v2.3 contains more than 2,500 functions.

What can you do with OpenCV v2.3? Think of OpenCV as a box of 2,500 different food items. The chef's job is to combine the food items into a meal. OpenCV in itself is not the full meal; it contains the pieces required to make a meal. But the good news is that OpenCV includes a suite of recipes that provide examples of what it can do.

Experimenting with OpenCV

If you’d like to quickly do some hands-on experimentation with basic computer vision algorithms, without having to install any tools or do any programming, you’re in luck. BDTI has created the BDTI OpenCV Executable Demo Package, an easy-to-use software package that allows anyone with a Windows computer and a web camera to run a set of interactive computer vision algorithm examples built with OpenCV v2.3.

You can download the installer zip file from the Embedded Vision Alliance website, in the Embedded Vision Academy section (free registration is required). The installer will place several prebuilt OpenCV applications on your computer, and you can run the examples directly from your Start menu. BDTI has also developed an online user guide and tutorial video for the OpenCV demonstration package.

Examples named xxxxxxSample.bat use a video clip file as an input (example video clips are provided with the installation), while examples named xxxxxWebCamera.bat use a video stream from a web camera as an input. BDTI will periodically expand the BDTI OpenCV Executable Demo Package with additional OpenCV examples; keep an eye on the Embedded Vision Academy section of the Embedded Vision Alliance website for updates.

Developing Apps with OpenCV

The most difficult part of using OpenCV is building the library and configuring the tools. The OpenCV development team has made great strides in simplifying the OpenCV build process, but it can still be time consuming.

To make it as easy as possible to start using OpenCV, BDTI has also created the BDTI Quick-Start OpenCV Kit – a VMware virtual machine image that includes OpenCV and all required tools preinstalled, configured, and built. The BDTI Quick-Start OpenCV Kit makes it easy to quickly get OpenCV running and to start developing vision algorithms using OpenCV.

The BDTI Quick-Start OpenCV Kit image uses Ubuntu 10.04 as the operating system (Figure 1 below). The associated Ubuntu desktop is intuitive and easy to use. OpenCV 2.3.0 has been preinstalled and configured in the kit, along with the GNU C compiler and tools (GCC version 4.4.3). Various examples are included, along with a framework so you can get started creating your own vision algorithms immediately.

The Eclipse integrated development environment is also installed and configured for debugging OpenCV applications. Four example Eclipse OpenCV projects are included to get you up and running quickly and to seed your own projects.

Figure 1: The Ubuntu Desktop installed in the BDTI VMware virtual machine image

A USB webcam is required to use the examples provided in the BDTI Quick-Start OpenCV Kit. Logitech USB web cameras (specifically the Logitech C160) have been tested with this virtual machine image, in conjunction with the free VMware Player for Windows and VMware Fusion for Mac OS X. Be sure to install the drivers provided with the camera in Windows or whatever other operating system you use.

To get started, download the BDTI Quick-Start OpenCV Kit from the Embedded Vision Academy area of the Embedded Vision Alliance website (free registration required). BDTI has also created an online user guide for the Quick-Start OpenCV Kit.

OpenCV Examples

Two sets of OpenCV example programs are included in the BDTI Quick-Start OpenCV Kit. The first set is command-line based, the second set is Eclipse IDE based.

The command line examples include example makefiles to provide guidance for your own projects. The Eclipse examples are the same as the command-line examples but configured to build in the Eclipse environment. The source code is identical, but the makefiles are specialized for building OpenCV applications in an Eclipse environment.

The same examples are also provided in the previously mentioned BDTI OpenCV Executable Demo Package.

Embedded vision involves, among other things, classifying groups of pixels in an image or video stream as either background or a unique feature.

Each of the examples demonstrates various algorithms that accomplish this goal using different techniques. Some of them use code derived from the book OpenCV 2 Computer Vision Application Programming Cookbook by Robert Laganiere (ISBN-10: 1849513244, ISBN-13: 978-1849513241).

A Motion Detection Example

As the name implies, motion detection uses the differences between successive frames to classify pixels as unique features (Figure 2 below). The algorithm considers pixels that do not change between frames as being stationary and therefore part of the background. Motion detection or background subtraction is a very practical and easy-to-implement algorithm.

In its simplest form, the algorithm looks for differences between two frames of video by subtracting one frame from the next. In the output display in this example, white pixels are moving, black pixels are stationary.

Figure 2: The user interface for the motion detection example, included in both the BDTI OpenCV Executable Demo Package (top) and the BDTI Quick-Start OpenCV Kit (bottom)

This example adds an additional element to the simple frame subtraction algorithm: a running average of the frames. The number of frames in the running average represents a length in time.

The LearnRate sets how fast the accumulator “forgets” about earlier images. The higher the LearnRate, the longer the running average. Setting LearnRate to 0 disables the running average, and the algorithm simply subtracts one frame from the next. Increasing the LearnRate is useful for detecting slow-moving objects.

The Threshold parameter sets the change level required for a pixel to be considered moving. The algorithm subtracts the current frame from the previous frame, giving a result. If the result is greater than the threshold, the algorithm displays a white pixel and considers that pixel to be moving.


  • LearnRate: Regulates the update speed (how fast the accumulator "forgets" about earlier images).
  • Threshold: The minimum value for a pixel difference to be considered moving.
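The accumulator-and-threshold logic above can be sketched in a few lines. This is an illustrative simplification operating on flat lists of pixel values, with hypothetical helper names; OpenCV's accumulateWeighted() function provides this kind of running average on real images:

```python
def update_accumulator(acc, frame, learn_rate):
    """Exponential running average of frames. A higher learn_rate keeps
    more history (a "longer" average); learn_rate = 0 makes the accumulator
    a copy of the current frame, so the method reduces to plain frame
    differencing on the next frame."""
    return [(1.0 - learn_rate) * f + learn_rate * a
            for f, a in zip(frame, acc)]

def motion_mask(acc, frame, threshold):
    """White (255) wherever the frame differs from the accumulator by more
    than the threshold; black (0) where the pixel is considered stationary."""
    return [255 if abs(f - a) > threshold else 0 for f, a in zip(frame, acc)]

acc = [10.0, 10.0, 10.0]        # history: a static dark scene
frame = [10, 10, 200]           # one pixel brightens between frames
print(motion_mask(acc, frame, threshold=50))   # [0, 0, 255]
acc = update_accumulator(acc, frame, learn_rate=0.5)
```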

The Line Detection Example

Line detection classifies straight edges in an image as features (Figure 3 below). The algorithm relegates to the background anything in the image that it does not recognize as a straight edge, thereby ignoring it. Edge detection is another fundamental function in computer vision.

Figure 3: The user interface for the line detection example, included in both the BDTI OpenCV Executable Demo Package (top) and the BDTI Quick-Start OpenCV Kit (bottom)

Image processing determines an edge by sensing close-proximity pixels of differing intensity. For example, a black pixel next to a white pixel defines a “hard” edge. A gray pixel next to a black (or white) pixel defines a “soft” edge.

The Threshold parameter sets a minimum on how hard an edge has to be in order for it to be classified as an edge. A Threshold of 255 would require that a white pixel be next to a black pixel to qualify as an edge. As the Threshold value decreases, softer edges in the image are detected.

After the algorithm detects an edge, it must make a difficult decision: is this edge part of a straight line? The Hough transform, employed to make this decision, attempts to group pixels classified as edges into a straight line.

It uses the MinLength and MaxGap parameters to classify (in computer science lingo) a group of edge pixels as either a continuous straight line or ignored background information (edge pixels that are not part of a continuous straight line are considered background, and therefore not a feature).


  • Threshold: Sets the minimum difference between adjacent pixels to be classified as an edge.
  • MinLength: The minimum number of "continuous" edge pixels required to classify a potential feature as a straight line.
  • MaxGap: The maximum allowable number of missing edge pixels that still enable classification of a potential feature as a "continuous" straight line.
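The voting idea behind the Hough transform can be shown on a handful of points. The sketch below is an illustrative, heavily simplified accumulator (the function name is hypothetical, and it omits the peak thresholding, MinLength, and MaxGap handling that the OpenCV implementation performs): each edge pixel votes for every line, parameterized by distance rho and angle theta, that could pass through it, so collinear pixels pile their votes into the same bin:

```python
import math
from collections import Counter

def hough_votes(points, theta_steps=180):
    """Accumulate (rho, theta-index) votes for a list of (x, y) edge pixels.

    rho = x*cos(theta) + y*sin(theta), rounded to integer bins;
    theta is sampled in `theta_steps` steps over [0, pi).
    """
    votes = Counter()
    for x, y in points:
        for t in range(theta_steps):
            theta = math.pi * t / theta_steps
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            votes[(rho, t)] += 1
    return votes

# Five collinear points on the horizontal line y = 2.
points = [(x, 2) for x in range(5)]
votes = hough_votes(points)
print(votes[(2, 90)])  # theta index 90 = 90 degrees, rho = 2: all 5 votes
```

A bin that collects many votes corresponds to a line supported by many edge pixels; that is the "group of edge pixels organized in a straight line" that the detector reports as a feature.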

The Optical Flow Example

Optical flow estimates motion by analyzing how groups of pixels in the current frame changed position from the previous frame of a video sequence (Figure 4 below).

The "group of pixels" is a feature. Optical flow estimation finds use in predicting where objects will be in the next frame. Many optical flow estimation algorithms exist; this particular example uses the Lucas-Kanade approach. The algorithm's first step involves finding "good" features to track between frames. Specifically, the algorithm is looking for groups of pixels containing corners or points.

Figure 4: The user interface for the optical flow example, included in both the BDTI OpenCV Executable Demo Package (top) and the BDTI Quick-Start OpenCV Kit (bottom)

The qlevel variable determines the required quality of a selected feature. The end objective of all the math used to find quality features is consistency.

A "good" feature (group of pixels surrounding a corner or point) is one that an algorithm can find under various lighting conditions, as the object moves. The goal is to find these same features in each frame.

Once the same feature appears in consecutive frames, tracking an object is possible. The lines in the output video represent the optical flow of the selected features.

The MaxCount parameter determines the maximum number of features to look for. The minDist parameter sets the minimum distance between features. The more features used, the more reliable the tracking.

The features are not perfect, and sometimes a feature used in one frame disappears in the next frame. Using multiple features decreases the chances that the algorithm will not be able to find any features in a frame.


  • MaxCount: The maximum number of good features to look for in a frame.
  • qlevel: The acceptable quality of the features. A higher quality feature is more likely to be unique, and therefore to be correctly identified in the next frame. A low quality feature may get lost in the next frame, or worse yet may be confused with another feature in the image of the next frame.
  • minDist: The minimum distance between selected features.
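At the core of the Lucas-Kanade approach is a small least-squares solve per tracked feature. The sketch below is an illustrative reduction to a single window (hypothetical function name, no pyramids or iteration as in the full OpenCV implementation): given per-pixel spatial gradients Ix, Iy and temporal differences It over the window, it solves the 2x2 normal equations for the flow vector (u, v):

```python
def lucas_kanade(ix, iy, it):
    """Solve the 2x2 Lucas-Kanade normal equations for one window.

    Brightness constancy gives Ix*u + Iy*v + It = 0 per pixel; least
    squares over the window yields the flow (u, v). Returns None when
    the system is degenerate (a flat or edge-only window, which is
    exactly why the algorithm first looks for "good" corner features).
    """
    sxx = sum(x * x for x in ix)
    syy = sum(y * y for y in iy)
    sxy = sum(x * y for x, y in zip(ix, iy))
    sxt = sum(x * t for x, t in zip(ix, it))
    syt = sum(y * t for y, t in zip(iy, it))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    u = (-syy * sxt + sxy * syt) / det
    v = ( sxy * sxt - sxx * syt) / det
    return u, v

# Gradients consistent with a corner-like patch shifting one pixel right:
ix = [1, 2, 1, 0]
iy = [0, 1, 2, 1]
it = [-x for x in ix]            # brightness constancy for a shift of (1, 0)
print(lucas_kanade(ix, iy, it))  # (1.0, 0.0)
```

The degenerate-determinant check also explains qlevel intuitively: windows without two strong, independent gradient directions cannot pin down both u and v, so they make poor features.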

The Face Detector Example

The face detector used in this example is based on the Viola-Jones feature detector algorithm (Figure 5 below). Throughout this article, we have been working with different algorithms for finding features; i.e. closely grouped pixels in an image or frame that are unique in some way.

The motion detector used subtraction of one frame from the next frame to find pixels that moved, classifying these pixel groups as features. In the line detector example, features were groups of edge pixels organized in a straight line. And in the optical flow example, features were groups of pixels organized into corners or points in an image.

Figure 5: The user interface for the face detector example, included in both the BDTI OpenCV Executable Demo Package (top) and the BDTI Quick-Start OpenCV Kit (bottom)

The Viola-Jones algorithm uses a discrete set of six Haar-like features (the OpenCV implementation adds additional features). Haar-like features in a 2D image include edges, corners, and diagonals. They are very similar to features in the optical flow example, except that detection of these particular features occurs via a different method.
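What makes Haar-like features cheap enough for real-time detection is the integral image: once it is built, the sum of any rectangle costs four table lookups. The sketch below (illustrative helper names; a real detector evaluates thousands of such features per window) computes a two-rectangle edge feature this way:

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of any rectangle in O(1) using four table lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_edge_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like edge feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

# A dark-left / bright-right patch gives a strong negative response.
img = [[0, 0, 255, 255]] * 4
ii = integral_image(img)
print(haar_edge_feature(ii, 0, 0, 4, 4))  # -2040
```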

As the name implies, the face detector example detects faces. Detection occurs within each individual frame; the detector does not track the face from frame to frame.

The face detector can also detect objects other than faces. An XML file "describes" the object to detect. OpenCV includes various Haar cascade XML files that you can use to detect various object types. OpenCV also includes tools to allow you to train your own cascade to detect any object you desire and save it as an XML file for use by the detector.


  • MinSize: The smallest face to detect. As a face gets further from the camera, it appears smaller. This parameter also defines the furthest distance a face can be from the camera and still be detected.
  • MinN: The “minimum neighbor” parameter combines, into a single detection, faces that are detected multiple times. The face detector actually detects each face multiple times in slightly different positions. This parameter simply defines how to group the detections together. For example, a MinN of 20 would group all detections within 20 pixels of each other as a single face.
  • ScaleF: Scale factor determines the number of times to run the face detector at each pixel location. The Haar cascade XML file that determines the parameters of the to-be-detected object is designed for an object of only one size.

Detecting objects of various sizes (faces close to the camera as well as far away from the camera, for example) requires scaling the detector.

This scaling process has to occur at every pixel location in the image. The process is computationally expensive, and a scale factor that is too large will not detect faces whose sizes fall between detector sizes.

A scale factor too small, conversely, can use a huge amount of CPU resources. You can see this phenomenon in the example if you first set the scale factor to its max value of 10. In this case, you will notice that as each face moves closer to or away from the camera, the detector will not detect it at certain distances.

At these distances, the face size is in-between detector sizes. If you decrease the scale factor to its minimum, on the other hand, the required CPU resources skyrocket, as shown by the prolonged detection time.

Detection Time Considerations

Each of these examples writes the detection time to the console while the algorithm is running. This time represents the number of milliseconds the particular algorithm took to execute.

A larger amount of time represents higher CPU utilization. The OpenCV library as built in these examples does not have hardware acceleration enabled; however, OpenCV currently supports CUDA and NEON acceleration.

The intent of this article and accompanying software is to help you quickly get up and running with OpenCV. The examples discussed in this article represent only a minuscule subset of the algorithms available in OpenCV; they were chosen because, at a high level, they represent a broad variety of computer vision functions.

Leveraging these algorithms in combination with, or alongside, other algorithms can help you solve various embedded vision problems in a variety of applications.

Stay tuned for future articles in this series on Embedded.com, which will both go into more detail on the OpenCV library algorithms already mentioned and introduce new algorithms (along with examples based on the BDTI OpenCV Executable Demo Package and the BDTI Quick-Start OpenCV Kit).


Embedded vision technology has the potential to enable a wide range of electronic products that are more intelligent and responsive than before, and thus more valuable to users. It can add helpful features to existing products.

And it can provide significant new markets for hardware, software and semiconductor manufacturers. The Embedded Vision Alliance, a worldwide organization of technology developers and providers of which BDTI is a founding member, is working to empower engineers to transform this potential into reality in a rapid and efficient manner.

More specifically, the mission of the Alliance is to provide engineers with practical education, information, and insights to help them incorporate embedded vision capabilities into products.

To execute this mission, the Alliance has developed a full-featured website, freely accessible to all and including (among other things) articles, videos, a daily news portal and a discussion forum staffed by a diversity of technology experts.

Registered website users can receive the Embedded Vision Alliance's e-mail newsletter; they also gain access to the Embedded Vision Academy, containing numerous training videos, technical papers and file downloads, intended to enable those new to the embedded vision application space to rapidly ramp up their expertise.

Eric Gregori is a Senior Software Engineer and Embedded Vision Specialist with Berkeley Design Technology, Inc. (BDTI), which provides analysis, advice, and engineering for embedded processing technology and applications. He is a robot enthusiast with over 17 years of embedded firmware design experience, with specialties in computer vision, artificial intelligence, and programming for Windows Embedded CE, Linux, and Android operating systems. Eric authored the Robot Vision Toolkit and developed the RobotSee Interpreter. He is working towards his Masters in Computer Science and holds 10 patents in industrial automation and control.

(Editor’s Note: The Embedded Vision Alliance was launched in May of 2011, and now has 20 sponsoring member companies: AMD, Analog Devices, Apical, Avnet Electronics Marketing, BDTI, CEVA, Cognimem, CogniVue, eyeSight, Freescale, IMS Research, Intel, MathWorks, Maxim Integrated Products, NVIDIA, National Instruments, Omek Interactive, Texas Instruments, Tokyo Electron Device, and Xilinx.

The first step of the Embedded Vision Alliance was to launch a website at www.Embedded-Vision.com. The site serves as a source of practical information to help design engineers and application developers incorporate vision capabilities into their products. The Alliance’s plans include educational webinars, industry reports, and other related activities.

Everyone is free to access the resources on the website, which is maintained through member and industry contributions. Membership details are also available at the site. For more information, contact the Embedded Vision Alliance at info@embedded-vision.com and 1-510-451-1800.)