Embedded Vision Alliance: Technical Articles

Computer Vision Metrics: Bibliography



  1. Bajcsy, R. “Computer Description of Textured Surfaces.” International Conference on Artificial Intelligence, 1973.
  2. Bajcsy, R., and L. Lieberman. “Texture Gradient as a Depth Cue.” Computer Graphics and Image Processing 5, no. 1 (1976).
  3. Cross, G. R., and A. K. Jain. “Markov Random Field Texture Models.” PAMI 5, no. 1 (1983).
  4. Gonzalez, R., and R. Woods. Digital Image Processing, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall, 2007.
  5. Haralick, R. M. “Statistical and Structural Approaches to Texture.” Proceedings of the International Joint Conference on Pattern Recognition, 1979.
  6. Haralick, R. M., K. Shanmugam, and I. Dinstein. “Textural Features for Image Classification.” IEEE Transactions on Systems, Man, and Cybernetics SMC-3, no. 6 (1973).
  7. Hu, M. K. “Visual Pattern Recognition by Moment Invariants.” IRE Transactions on Information Theory 8, no. 2 (1962).
  8. Lu, H. E., and K. S. Fu. “A Syntactic Approach to Texture Analysis.” Computer Graphics Image Processing 7, no. 3 (1978).
  9. Pratt, W. K. Digital Image Processing, 3rd ed. Hoboken, NJ: John Wiley, 2002.
  10. Rosenfeld, A., and A. C. Kak. Digital Picture Processing, 2nd ed. New York: Academic Press, 1982.
  11. Tomita, F., Y. Shirai, and S. Tsuji. “Description of Texture by a Structural Analysis.” Pattern Analysis and Machine Intelligence 4, no. 2 (1982).
  12. Wong, R. Y., and E. L. Hall. “Scene Matching with Invariant Moments.” Computer Graphics Image Processing 8 (1978).
  13. Zhao, Guoying, and Matti Pietikäinen. “Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions.” IEEE Transactions on Pattern Analysis and Machine Intelligence (2007).
  14. Kellokumpu, Vili, Guoying Zhao, and Matti Pietikäinen. “Human Activity Recognition Using a Dynamic Texture Based Method.”
  15. Zhao, Guoying, and Matti Pietikäinen. “Dynamic Texture Recognition Using Local Binary Patterns with an Application to Facial Expressions.” Pattern Analysis and Machine Intelligence (2007).
  16. Eichmann, G., and T. Kasparis. “Topologically Invariant Texture Descriptors.” Computer Vision, Graphics and Image Processing 41, no. 3 (March 1988).
  17. Lam, S. W. C., and H. H. S. Ip. “Structural Texture Segmentation Using Irregular Pyramid.” Pattern Recognition Letters 15, no. 7 (July 1994).
  18. Pietikäinen, Matti, Abdenour Hadid, Guoying Zhao, and Timo Ahonen. Computer Vision Using Local Binary Patterns. New York: Springer, 2011.
  19. Ojala, T., M. Pietikäinen, and D. Harwood. “Performance Evaluation of Texture Measures with Classification Based on Kullback Discrimination of Distributions.” Proceedings of the...

Computer Vision Metrics: Chapter One (Part D)



For Part C of Chapter One, please click here.

Bibliography references are set off with brackets, i.e. "[XXX]". For the corresponding bibliography entries, please click here.


Correspondence

Correspondence, or feature matching, is common to most depth-sensing methods. For a taxonomy of stereo feature matching algorithms, see Scharstein and Szeliski [440]. Here, we discuss correspondence along the lines of feature descriptor methods and triangulation as applied to stereo, multi-view stereo, and structured light.

Subpixel accuracy is a goal in most depth-sensing methods, so several algorithms exist [468]. It’s popular to correlate two patches or intensity templates by fitting the surfaces to find the highest match; however, Fourier methods are also used to correlate phase [467, 469], similar to the intensity correlation methods.
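As a rough illustration of the Fourier approach, the sketch below (Python with NumPy; the function name, patch inputs, and whole-pixel peak localization are assumptions for illustration, not code from the book) correlates the phase of two patches and reads off the shift at the correlation peak. A subpixel-accurate version would additionally fit a surface to the neighborhood of that peak.

    import numpy as np

    def phase_correlate(patch_a, patch_b):
        """Estimate the (dy, dx) shift aligning patch_b to patch_a via phase correlation."""
        Fa = np.fft.fft2(patch_a)
        Fb = np.fft.fft2(patch_b)
        cross_power = Fa * np.conj(Fb)
        cross_power /= np.abs(cross_power) + 1e-12        # keep phase, discard magnitude
        surface = np.fft.ifft2(cross_power).real          # correlation surface
        peak = np.unravel_index(np.argmax(surface), surface.shape)
        # Shifts beyond half the patch size wrap around the FFT grid.
        return tuple(p - n if p > n // 2 else p for p, n in zip(peak, patch_a.shape))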

For stereo systems, the image pairs are rectified prior to feature matching so that corresponding features are expected to lie along the same scan line at about the same scale, as shown in Figure 1-11; descriptors with little or no rotational invariance are therefore suitable [215, 120]. A feature descriptor as simple as a correlation template is adequate, while a powerful method such as the SIFT feature description method [161] is overkill. The descriptor region may be a rectangle favoring the x-axis, such as a rectangular 3x9 descriptor shape, since disparity is expected along the x-axis with little variance in the y-axis. Several window-sizing methods for the descriptor shape are used, including fixed size and adaptive size [440].
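A minimal block-matching sketch along these lines is shown below (Python/NumPy; the SAD cost, window size, and search range are assumptions for illustration). It searches only along the x-axis of a rectified pair using a 3x9 window, which matches the descriptor shape described above.

    import numpy as np

    def disparity_sad(left, right, max_disp=64, half_h=1, half_w=4):
        """Brute-force SAD block matching on a rectified stereo pair.
        half_h=1, half_w=4 gives a 3x9 window; the search runs only
        along x, where disparity is expected."""
        h, w = left.shape
        disp = np.zeros((h, w), dtype=np.float32)
        for y in range(half_h, h - half_h):
            for x in range(half_w + max_disp, w - half_w):
                ref = left[y - half_h:y + half_h + 1, x - half_w:x + half_w + 1].astype(np.float32)
                best_cost, best_d = np.inf, 0
                for d in range(max_disp):
                    cand = right[y - half_h:y + half_h + 1,
                                 x - d - half_w:x - d + half_w + 1].astype(np.float32)
                    cost = np.abs(ref - cand).sum()
                    if cost < best_cost:
                        best_cost, best_d = cost, d
                disp[y, x] = best_d
        return disp

Real systems replace the exhaustive loops with vectorized or hardware-accelerated matching and refine the winning disparity to subpixel precision.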

Multi-view stereo systems are similar to stereo; however, the rectification stage may not be as accurate, since motion between frames can include scaling, translation, and rotation. Since scale and rotation may have significant correspondence problems between frames, other approaches to feature description have been applied to MVS, with better results. A few notable feature descriptor methods applied to multi-view and wide baseline stereo include the MSER [194] method (also discussed in Chapter 6), which uses a blob-like patch, and the SUSAN [164, 165] method (also discussed in Chapter 6), which defines the feature based on an object region or segmentation with a known centroid or nucleus around which the feature exists.

For structured light systems, the type of light pattern will determine the feature, and correlation of the phase is a popular method [469]. For example, structured light methods that rely on phase-shift patterns using phase correlation [467]...

Computer Vision Metrics: Chapter One (Part C)



For Part B of Chapter One, please click here.

Bibliography references are set off with brackets, i.e. "[XXX]". For the corresponding bibliography entries, please click here.


Time-of-Flight Sensors

A time-of-flight (TOF) sensor measures the time taken for emitted infrared light to travel into the scene and reflect back [450]. A TOF sensor is a type of range finder or laser radar [449]. Several single-chip TOF sensor arrays and depth camera solutions are available, such as the second version of the Kinect depth camera. The basic concept involves broadcasting infrared light into the scene at a known time, such as from a pulsed IR laser, and then measuring at each pixel the time taken for the light to return. Sub-millimeter accuracy at ranges up to several hundred meters is reported for high-end systems [449], depending on the conditions under which the TOF sensor is used, the particular methods employed in the design, and the amount of power given to the IR laser.

Each pixel in the TOF sensor has several active components, as shown in Figure 1-14, including the IR sensor well, timing logic to measure the round-trip time from illumination to detection of IR light, and optical gates to synchronize the electronic shutter with the pulsed IR laser. TOF sensors provide laser range-finding capabilities. For example, by gating the electronic shutter to reject short round-trip responses, reflections from environmental conditions such as fog or smoke can be suppressed. In addition, specific depth ranges, such as long ranges, can be isolated by opening and closing the shutter at the desired time intervals.
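The range-gating idea reduces to simple round-trip timing. The sketch below is illustrative only; the 2d/c model ignores electronics latency and pulse width.

    C = 299_792_458.0  # speed of light, m/s

    def gate_window(d_min_m, d_max_m):
        """Shutter open/close delays (seconds after the IR pulse fires)
        that accept only returns from the depth range [d_min, d_max]."""
        return 2.0 * d_min_m / C, 2.0 * d_max_m / C

    # Example: accept only returns between 10 m and 50 m
    print(gate_window(10.0, 50.0))   # approx (6.7e-08, 3.3e-07) seconds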


Figure 1-14. A hypothetical TOF sensor configuration. Note that the light pulse length and sensor can be gated together to target specific distance ranges

Illumination methods for TOF sensors may use very short IR laser pulses for a first image, acquire a second image with no laser pulse, and then take the difference between the images to eliminate ambient IR light contributions. By modulating the IR beam with an RF carrier signal using a photonic mixer device (PMD), the phase shift of the returning IR signal can be measured to increase accuracy—which is common among many laser range-finding methods [450]. Rapid optical gating combined with intensified CCD sensors can be used to increase accuracy to the sub-millimeter range in limited...
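For the RF-modulated (PMD-style) case, depth follows from the measured phase shift of the returning signal. A minimal sketch, assuming an ideal measurement and ignoring phase wrapping beyond the unambiguous range:

    import math

    C = 299_792_458.0  # speed of light, m/s

    def depth_from_phase(phase_shift_rad, f_mod_hz):
        """Depth implied by the phase shift of an IR beam modulated at f_mod.
        Unambiguous only up to C / (2 * f_mod)."""
        return C * phase_shift_rad / (4.0 * math.pi * f_mod_hz)

    # Example: 30 MHz modulation, quarter-cycle (pi/2) phase shift -> ~1.25 m
    print(depth_from_phase(math.pi / 2, 30e6))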

Computer Vision Metrics: Chapter One (Part B)



For Part A of Chapter One, please click here.

Bibliography references are set off with brackets, i.e. "[XXX]". For the corresponding bibliography entries, please click here.


2D Computational Cameras

Novel configurations of programmable 2D sensor arrays, lenses, and illuminators are being developed into camera systems known as computational cameras [424,425,426], with applications ranging from digital photography to military and industrial uses; computational imaging methods are used to enhance the images after the fact. Computational cameras borrow many computational imaging methods from confocal imaging [419] and confocal microscopy [421, 420], for example the use of multiple illumination patterns and multiple focal plane images. They also draw on research from synthetic aperture radar systems [422] developed after World War II to create high-resolution images and 3D depth maps using wide baseline data from a single moving-camera platform. Synthetic apertures built from multiple image sensors and optics with overlapping fields of view, integrated at wafer scale, are also a topic of research [419]. We survey here a few computational 2D sensor methods, including high resolution (HR), high dynamic range (HDR), and high frame rate (HF) cameras.

The current wave of commercial digital megapixel cameras, ranging from around 10 megapixels on up, provides resolution matching or exceeding that of high-end film used in a 35mm camera [412], so a pixel from an image sensor is comparable in size to a grain of silver on the best-resolution film. On the surface, there appears to be little incentive to pursue higher resolution for commercial use, since current digital methods have replaced most film applications and film printers already exceed the resolution of the human eye.

However, very high resolution gigapixel imaging devices are being devised and constructed as an array of image sensors and lenses, providing advantages for computational imaging after the image is taken. One configuration is the 2D array camera, composed of an orthogonal 2D array of image sensors and corresponding optics; another configuration is the spherical camera as shown in Figure 1-8 [411, 415], developed as a DARPA research project at Columbia University CAVE.


Figure 1-8. (...

Computer Vision Metrics: Chapter One



Bibliography references are set off with brackets, i.e. "[XXX]". For the corresponding bibliography entries, please click here.


Image Capture and Representation

“The changing of bodies into light, and light into bodies, is very conformable to the course of Nature, which seems delighted with transmutations.”
—Isaac Newton

Computer vision starts with images. This chapter surveys a range of topics dealing with capturing, processing, and representing images, including computational imaging, 2D imaging, and 3D depth imaging methods, sensor processing, depth-field processing for stereo and monocular multi-view stereo, and surface reconstruction. A high-level overview of selected topics is provided, with references for the interested reader to dig deeper. Readers with a strong background in the area of 2D and 3D imaging may benefit from a light reading of this chapter.

Image Sensor Technology

This section provides a basic overview of image sensor technology as a basis for understanding how images are formed and for developing effective strategies for image sensor processing to optimize the image quality for computer vision.

Typical image sensors are created from either CCD cells (charge-coupled device) or standard CMOS cells (complementary metal-oxide semiconductor). The CCD and CMOS sensors share similar characteristics and both are widely used in commercial cameras. The majority of sensors today use CMOS cells, though, mostly due to manufacturing considerations. Sensors and optics are often integrated to create wafer-scale cameras for applications like biology or microscopy, as shown in Figure 1-1.


Figure 1-1. Common integrated image sensor arrangement with optics and color filters

Image sensors are designed to meet specific goals for different applications, providing varying levels of sensitivity and quality. Consult the manufacturer’s information to become familiar with each sensor. For example, the size and material composition of each photodiode sensor cell element is optimized for a given semiconductor manufacturing process so as to achieve the best tradeoff between silicon die area and dynamic response for light intensity and color detection.

For computer vision, the effects of sampling theory are relevant—for example, the Nyquist frequency applied to pixel coverage of the target scene. The sensor resolution and optics together must provide adequate resolution for each...
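As a back-of-the-envelope illustration of the Nyquist consideration (the simple pinhole-projection model and all numbers below are assumptions, not taken from the text): the smallest scene feature of interest should project onto at least two pixels.

    def pixels_across_feature(feature_m, distance_m, focal_length_m, pixel_pitch_m):
        """How many pixels a scene feature spans under a pinhole-projection model."""
        image_size_m = feature_m * focal_length_m / distance_m
        return image_size_m / pixel_pitch_m

    # Example: a 5 mm feature at 2 m, 8 mm lens, 1.4 um pixel pitch -> ~14 pixels
    coverage = pixels_across_feature(0.005, 2.0, 0.008, 1.4e-6)
    print(coverage, "OK" if coverage >= 2.0 else "undersampled")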

Computer Vision Metrics: Introduction



Dirt. This is a jar of dirt.
Yes.
...Is the jar of dirt going to help? If you don’t want it, give it back.
—Pirates of the Caribbean, Jack Sparrow and Tia Dalma

This work, Computer Vision Metrics, focuses on a slice through the field from the viewpoint of feature description metrics: how to describe, compute, and design the macro-features and micro-features that make up larger objects in images. The focus is on the pixel side of the vision pipeline, rather than the back-end training, classification, machine learning, and matching stages. This book is suitable for reference, higher-level courses, and self-directed study in computer vision. It is aimed at readers already familiar with computer vision and image processing; however, even those new to the field will find good introductions to the key concepts at a high level, via the ample illustrations and summary tables.

I view computer vision as a mathematical art form and its researchers and practitioners as artists. So this book is more like a tour through an art gallery than a technical or scientific treatise. Observations are provided, interesting questions are raised, a vision taxonomy is suggested to draw a conceptual map of the field, and references are provided to dig deeper. This book is an attempt to draw a map of the world centered around feature metrics, inaccurate and fuzzy as that map may be, in the hope that others will be inspired to expand the level of detail in their own way, better than what I, or even a few people, can accomplish alone. If I could have found a similar book covering this particular slice of subject matter, I would not have taken on the project of writing this one.

What is not in the Book

Readers looking for computer vision “how-to” source code examples, tutorial discussions, performance analysis, and shortcuts will not find them here, and should instead consult the well-regarded http://opencv.org library resources, including many fine books, online resources, source code examples, and several blogs. There is nothing better than OpenCV for the hands-on practitioner. For this reason, this book steers a clear path around duplicating the “how-to” materials already provided by the OpenCV community and elsewhere, and instead provides a counterpoint discussion, including a comprehensive survey, analysis, and taxonomy of methods. Also, do not expect all computer vision topics to be covered deeply with proofs and performance analysis, since the bibliography references cover these matters quite well: for example, machine learning, training, and classification methods are only lightly introduced, since the focus here is on the feature metrics.

In summary, this book is about the feature metrics, showing “what” methods practitioners are using, with detailed observations and analysis of...

Mobile Photography's Developing Image


A version of this article was originally published at EE Times' Embedded.com Design Line. It is reprinted here with the permission of EE Times.

Still photos and videos traditionally taken with standalone cameras are increasingly being captured by camera-inclusive smartphones and tablets instead. And the post-capture processing that traditionally required a high-end computer and took a lengthy amount of time can now also take place near-immediately on a mobile electronics device, thanks to the rapidly improving capabilities of embedded vision technology.

Michael McDonald
President, Skylane Technology Consulting
Consultant, Embedded Vision Alliance

Next time you take a "selfie" or a shot of that great meatball entrée you're about to consume at the corner Italian restaurant, you will be contributing to the collection of 880 billion photos that Yahoo expects will be taken in 2014. Every day, Facebook users upload 350 million pictures and Snapchat users upload more than 400 million images. Video is also increasingly popular, with sites like YouTube receiving 100 hours of video every minute. These statistics, forecast to increase further in the coming years, are indicative of two fundamental realities: people like to take pictures and shoot video, and the increasingly ubiquitous camera phone makes it easy to do so. Cell phone manufacturers have recognized this opportunity, and their cameras' capabilities are increasingly becoming a significant differentiator between models and, therefore, a notable investment target.

However, image sensor technology is quickly approaching some fundamental limits. The geometries of the sensor pixels are approaching the wavelengths of visible light, making it increasingly difficult to shrink their dimensions further. For example, latest-generation image sensors are constructed using 1,100 nm pixels, leaving little spare room to capture red-spectrum (~700 nm wavelength) light. Also, as each pixel's silicon footprint shrinks, the amount of light it is capable of capturing and converting to a proportional electrical charge also decreases. This decrease in sensitivity increases noise in low-light conditions, and decreases the dynamic range – the ability to see details in shadows and bright areas of images. Since smaller pixel sensors can capture fewer photons, each photon has a more pronounced impact on each pixel’s brightness.
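A rough, assumed-numbers sketch of why smaller pixels are noisier: photon shot noise alone limits SNR to roughly the square root of the number of photons collected, and that number scales with pixel area.

    import math

    def shot_noise_snr(photons):
        return photons / math.sqrt(photons)   # equals sqrt(photons)

    photons_1p4um = 4000                                   # assumed collection for a 1.4 um pixel
    photons_1p1um = int(photons_1p4um * (1.1 / 1.4) ** 2)  # same light, smaller pixel area

    print(shot_noise_snr(photons_1p4um))   # ~63
    print(shot_noise_snr(photons_1p1um))   # ~50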

Resolution Ceilings Prompt Refocus On Other Features

Given the challenges of further increases in image sensor resolution, camera phone manufacturers appear reluctant to further promote the feature. As a case in point, Apple's advertising for the latest iPhone 5s doesn't even mention resolution, instead focusing generically on image quality and other camera features. Many of these features leverage computational photography: using increasing processing power and sophisticated vision algorithms to make better photographs. After taking pictures, such advanced camera phones can edit them in such a way that image flaws (blur, low light, poor color fidelity, and so on) are eliminated. In addition, computational photography enables brand-new applications, such as reproducing a photographed object on a 3D printer, automatically labeling pictures so that you can easily find them in the future, or easily removing that person who walked in front of you while you were taking an otherwise perfect picture.

High Dynamic Range (HDR) is an example of computational photography that is now found on many camera phones. A camera without this capability may be hampered by images with under- and/or over-exposed regions. It can be difficult, for example, to capture the detail found in a shadow without making the sky look pure white. Conversely, capturing detail in the sky can make shadows pitch black. With HDR, multiple pictures are taken at different exposure settings, some optimized for bright regions of the image (such as the sky in the example), while others are optimized for dark areas (i.e. shadows). HDR algorithms then select and combine the best details of these multiple pictures, using them to synthesize a new image that captures the nuances of the clouds in the sky and the details in the shadows. This merging also needs to comprehend and compensate for between-image camera and subject movement, along with determining the optimum amount of light for different portions of the image (Figure 1).
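A minimal exposure-fusion sketch of this idea is shown below (Python/NumPy). It assumes already-aligned 8-bit frames with a linear response and uses a crude normalization in place of a real tone mapper, so it is illustrative rather than any vendor's actual HDR pipeline.

    import numpy as np

    def merge_hdr(images, exposure_times):
        """Weight well-exposed pixels most heavily, scale each frame by its
        exposure time to a common radiance estimate, and average."""
        acc = np.zeros(images[0].shape, dtype=np.float64)
        weight_sum = np.zeros_like(acc)
        for img, t in zip(images, exposure_times):
            x = img.astype(np.float64) / 255.0
            w = 1.0 - np.abs(2.0 * x - 1.0)       # hat weight, peaks at mid-gray
            acc += w * (x / t)                     # per-frame radiance estimate
            weight_sum += w
        radiance = acc / np.maximum(weight_sum, 1e-6)
        return radiance / radiance.max()           # crude mapping back to [0, 1]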


Figure 1. An example of a high dynamic range image created by combining images captured using different exposure times.

Another example of computational photography is Super Resolution, wherein multiple images of a given scene are algorithmically combined, resulting in a final image that delivers finer detail than that present in any of the originals. A similar technique can be used to transform multiple poorly lit images into a higher-quality well-illuminated image. Movement between image frames, either caused by the camera or subject, increases the Super Resolution implementation challenge, since the resultant motion blur must be correctly differentiated from image noise and appropriately compensated for. In such cases, more intelligent (i.e. more computationally intensive) processing, such as object tracking, path prediction, and action identification, is required in order to combine pixels that might be in different locations of each image frame.
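The core "shift-and-add" idea behind simple Super Resolution can be sketched as follows (Python/NumPy; grayscale frames and already-estimated subpixel shifts are assumed, and the holes remaining on the fine grid would be interpolated in a complete implementation).

    import numpy as np

    def shift_and_add(frames, shifts, scale=2):
        """Place each aligned low-resolution frame onto a finer grid at its
        (dy, dx) subpixel offset and average the overlapping samples."""
        h, w = frames[0].shape
        acc = np.zeros((h * scale, w * scale), dtype=np.float64)
        hits = np.zeros_like(acc)
        for frame, (dy, dx) in zip(frames, shifts):
            ys = (np.arange(h) * scale + round(dy * scale)).clip(0, h * scale - 1)
            xs = (np.arange(w) * scale + round(dx * scale)).clip(0, w * scale - 1)
            acc[np.ix_(ys, xs)] += frame
            hits[np.ix_(ys, xs)] += 1.0
        return acc / np.maximum(hits, 1.0)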

Multi-Image Combination and Content Subtraction

Users can now automatically "paint" a panorama image simply by capturing sequential frames of the entire scene from top to bottom and left to right, which are subsequently "stitched" together by means of computational photography algorithms. By means of this technique, the resolution of the resultant panorama picture will far exceed the native resolution of the camera phone's image sensor. As an example, a number of highly zoomed-in still pictures can be aggregated into a single image that shows a detailed city skyline, with the viewer then being able to zoom in and inspect the rooftop of a specific building (Figure 2). Microsoft (with its PhotoSynth application), CloudBurst Research, Apple (with the panorama feature built into latest-generation iPhones), and GigaPan are examples of companies and products that expose the potential of these sophisticated panorama algorithms, which do pattern matching and aspect ratio conversion as part of the "stitching" process.
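For readers who want to experiment, OpenCV exposes this matching-and-blending pipeline behind a high-level API; the short sketch below (file names are placeholders) is one way to try it, not the implementation any of the products above use.

    import cv2

    images = [cv2.imread(p) for p in ("pano_01.jpg", "pano_02.jpg", "pano_03.jpg")]
    stitcher = cv2.Stitcher_create(cv2.Stitcher_PANORAMA)
    status, panorama = stitcher.stitch(images)
    if status == cv2.Stitcher_OK:
        cv2.imwrite("panorama.jpg", panorama)
    else:
        print("Stitching failed with status", status)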


Figure 2. The original version of this "stitched" panorama image is 8 gigapixels in size, roughly 1000x the resolution of a normal camera, and allows you to clearly view the people walking on the street when you zoom into it.

Revolutionary capabilities derived from inpainting, the process of reconstructing lost or deteriorated parts of images and videos, involve taking multiple pictures from slightly different perspectives and comparing them in order to differentiate between objects. Undesirable objects such as the proverbial "photobomb," identified via their location changes from frame to frame, can then be easily removed (Figure 3). Some replacement schemes sample the (ideally uniform) area surrounding the removed object (such as a grassy field or blue sky) and use it to fill in the region containing the removed object. Other approaches use pattern matching and change detection techniques to fill in the resulting image "hole". Alternatively, and similar to the green-screen techniques used in making movies, you can use computational photography algorithms to automatically and seamlessly insert a person or object into a still image or real-time video stream. The latter approach is beneficial in advanced videoconferencing setups, for example.
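One simple way to realize the "photobomb removal" idea, sketched below under the assumption that several frames of the scene have already been aligned to each other: a per-pixel median keeps whatever is present in most frames (the static background) and suppresses anything that appears only briefly.

    import numpy as np

    def remove_transients(aligned_frames):
        """Per-pixel median across aligned frames of the same scene."""
        stack = np.stack([f.astype(np.float32) for f in aligned_frames], axis=0)
        return np.median(stack, axis=0).astype(aligned_frames[0].dtype)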


Figure 3. Object replacement enables the removal of unwanted items in a scene, such as this "photobombing" seal.

Extrapolating the Third Dimension

The ability to use computational photography to obtain "perfect focus" anywhere in a picture is being pursued by companies such as Lytro, Pelican Imaging, and Qualcomm. A technique known as plenoptic imaging involves taking multiple simultaneous pictures of the same scene, with the focus point for each picture set to a different distance. The images are then combined, placing every part of the final image in focus – or not – as desired (Figure 4). As a byproduct of this computational photography process, the user is also able to obtain a complete depth map for 3D image generation purposes, useful for 3D printing and other applications.
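A related, much simpler technique, focus stacking, conveys the flavor of combining differently focused captures: keep, at each pixel, the frame with the strongest local sharpness. The sketch below (OpenCV + NumPy) is only an analogy; real plenoptic pipelines reconstruct the light field rather than merging a focal stack.

    import cv2
    import numpy as np

    def all_in_focus(focal_stack):
        """Pick, per pixel, the frame with the largest Laplacian response."""
        grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in focal_stack]
        sharpness = np.stack([np.abs(cv2.Laplacian(g, cv2.CV_64F)) for g in grays], axis=0)
        best = np.argmax(sharpness, axis=0)          # per-pixel winning frame index
        stack = np.stack(focal_stack, axis=0)
        rows, cols = np.indices(best.shape)
        return stack[best, rows, cols]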




Figure 4. Plenoptic cameras enable you to post-capture refocus on near (top), mid (middle), and far (bottom) objects, all within the same image.

Homography is another way of building up a 3D image with a 2D camera using computational photography. In this process, a user moves the camera around and sequentially takes a series of shots of an object and/or environment from multiple angles and perspectives. The subsequent processing of the various captured perspectives is an extrapolation of the stereoscopic processing done by our eyes, and is used to determine depth data. The coordinates of the different viewpoints can be known with high precision thanks to the inertial (accelerometer, gyroscope) and location (GPS, Wi-Fi, cellular triangulation, magnetometer, barometer) sensors now built into mobile phones. By capturing multiple photos at various locations, you can assemble a 3D model of a room you're in, with the subsequent ability to virtually place yourself anywhere in that room and see things from that perspective. Google's recently unveiled Project Tango smartphone exemplifies the 3D model concept. The resultant 3D image of an object can also feed a 3D printer for duplication purposes.
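The planar-alignment building block of such multi-view processing, fitting a homography between two overlapping views from matched features, can be sketched with OpenCV as below. The detector choice, match filtering, and RANSAC threshold are assumptions; recovering full 3D structure additionally requires camera calibration and triangulation.

    import cv2
    import numpy as np

    def estimate_homography(img_a, img_b):
        """Match ORB features between two views and fit a homography with RANSAC."""
        orb = cv2.ORB_create(2000)
        kps_a, desc_a = orb.detectAndCompute(img_a, None)
        kps_b, desc_b = orb.detectAndCompute(img_b, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(desc_a, desc_b), key=lambda m: m.distance)
        src = np.float32([kps_a[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kps_b[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        return H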

Capturing and Compensating for Action

Every home user will soon have the ability to shoot a movie that's as smooth as that produced by a Hollywood cameraman (Figure 5). Computational photography can be used to create stable photos and videos, free of the blur artifacts that come from not holding the camera steady. Leveraging the movement information coming from the previously mentioned sensors, already integrated into cell phones, enables motion compensation in the final image or video. Furthermore, promising new research is showing that video images can be stabilized in all three dimensions, with the net effect of making it seem like the camera is smoothly moving on rails, for example, even if the photographer is in fact running while shooting the video. 
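A translation-only stabilization sketch conveys the basic approach: estimate how the camera drifted from frame to frame, smooth that trajectory, and warp each frame by the difference. Everything below (corner tracking, a moving-average smoother, the OpenCV calls) is an illustrative assumption; shipping stabilizers also handle rotation, zoom, and rolling shutter, and may fuse gyroscope data.

    import cv2
    import numpy as np

    def stabilize(frames, radius=15):
        # Accumulate the camera path from tracked corner motion.
        path = [np.zeros(2)]
        prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
        for frame in frames[1:]:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            pts = cv2.goodFeaturesToTrack(prev, 200, 0.01, 10)
            nxt, ok, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)
            shift = np.mean((nxt - pts)[ok.ravel() == 1], axis=0).ravel()
            path.append(path[-1] + shift)
            prev = gray
        path = np.array(path)
        # Simple moving average; edge frames are under-smoothed due to zero padding.
        kernel = np.ones(2 * radius + 1) / (2 * radius + 1)
        smooth = np.column_stack([np.convolve(path[:, i], kernel, mode="same") for i in range(2)])
        stabilized = []
        for frame, corr in zip(frames, smooth - path):
            M = np.float32([[1, 0, corr[0]], [0, 1, corr[1]]])
            stabilized.append(cv2.warpAffine(frame, M, (frame.shape[1], frame.shape[0])))
        return stabilized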


Figure 5. Motion compensation via vision algorithms enables steady shots with far less equipment than previously required with conventional schemes.

Cinemagraphs are an emerging art medium that blends otherwise still photographs with minute movements. Think of a portrait where a person's hair is blowing or the subject's eye periodically winks, or a landscape shot that captures wind effects (Figure 6). The Cinemagraphs website offers some great examples of this new type of photography. Action, a Google Auto-Awesome feature, also enables multiple points in time to be combined and displayed in one image. In this way, the full range of motion of a person jumping, for example (lift-off, in flight, and landing), or a bird flying, or a horse running can be captured in a single shot (Figure 7).


Figure 6. In this cinemagraph created using the animated GIF format, the reed grass is moving in the wind, while the rest of the image is static.



Figure 7. This Google-provided example shows how you can combine multiple shots (top) into a single image (bottom) to show motion over time.

Other advanced features found in the wildly popular GoPro and its competitors are finding their way to your cell phone as well. Leading-edge camera phones such as Apple's iPhone 5s offer the ability to manipulate time in video; you can watch a person soar off a ski jump in slow motion, or peruse an afternoon's worth of skiing footage in just a few minutes, and even do both in the same video. Slow-motion capture in particular requires faster capture frame rates and more image processing, as well as significantly more storage (Figure 8). As an alternative to consuming local resources, these needs align well with the increasingly robust ability to wirelessly stream high-resolution captured video to the "cloud" for remote processing and storage.
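A back-of-the-envelope calculation (with assumed 720p YUV420 numbers) shows why high-frame-rate capture stresses storage and bandwidth:

    def raw_rate_mb_per_s(width, height, bytes_per_pixel, fps):
        return width * height * bytes_per_pixel * fps / 1e6

    print(raw_rate_mb_per_s(1280, 720, 1.5, 30))    # ~41 MB/s at 30 fps
    print(raw_rate_mb_per_s(1280, 720, 1.5, 240))   # ~332 MB/s at 240 fps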


Figure 8. Extreme slow motion photography creates interesting new pictures and perspectives.

The ability to stream high quality video to the cloud is also enabling improved time-lapse photography and "life logging". In addition to the commercial applications of these concepts, such as a law enforcement officer or emergency professional recording a response situation, personal applications also exist, such as an Alzheimer's patient using video to improve memory recall or a consumer enhancing the appeal of an otherwise mundane home movie. The challenge with recording over extended periods of time is to intelligently and efficiently identify significant and relevant events to include in the video, labeling them for easier subsequent search. Video surveillance companies have already created robust analytics algorithms to initiate video recording based on object movement, face detection, and other "triggers". These same techniques will soon find their way into your personal cell phone.
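The simplest form of such a "trigger" is plain frame differencing, sketched below with OpenCV (the threshold and changed-pixel fraction are arbitrary assumptions; commercial analytics use far more robust background models).

    import cv2
    import numpy as np

    def motion_trigger(prev_frame, frame, threshold=25, min_fraction=0.01):
        """Return True if enough pixels changed noticeably since the last frame."""
        a = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        b = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(a, b)
        return np.count_nonzero(diff > threshold) > min_fraction * diff.size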

Object Identification and Reality Augmentation

The ubiquity of cell phone cameras is an enabler for new cell phone applications. For example, iOnRoad – acquired in 2013 by Harman – created a cell phone camera-based application to enable safer driving. With the cell phone mounted to the vehicle dashboard, it recognizes and issues alerts for unsafe driving conditions – following too closely, leaving your lane, etc – after analyzing the captured video stream. Conversely, when not driving, a user might enjoy another application called Vivino, which recognizes wine labels using image recognition algorithms licensed from a company called Kooaba.

Plenty of other object identification applications are coming to market. In early February, for example, Amazon added the "Flow" feature to its mobile app, which identifies objects (for sale in stores, for example) you point the cell phone's camera at, tells you how much Amazon is selling the item for, and enables you to place an order then and there. This is one of myriad ways that the camera in a cell phone (or integrated into your glasses or watch, for that matter) will be able to identify objects and present the user with additional information about them, a feature known as augmented reality. Applications are already emerging that identify and translate signs written in foreign languages, enable a child to blend imaginary characters with real surroundings to create exciting new games, and countless other examples (Figure 9).


Figure 9. This augmented reality example shows how the technology provides additional information about objects in an image.

And then of course there's perhaps the most important object of all, the human face. Face detection, face recognition and other facial analysis algorithms represent one of the hottest areas of industry investment and development. Just in the past several years, Apple acquired Polar Rose, Google acquired Neven Vision, PittPatt, and Viewdle, Facebook acquired Face.com, and Yahoo acquired IQ Engines. While some of these solutions implement facial analysis "in the cloud", other mobile-based solutions will eventually automatically tag individuals as their pictures are taken. Other facial analysis applications detect that a person is smiling and/or that their eyes are open, triggering the shutter action at that precise moment (Figure 10).
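Classical on-device face detection of the kind these features build on can be sketched in a few lines with OpenCV's bundled Haar cascade (shown here as a generic example, not the method any particular phone or vendor uses).

    import cv2

    # The cascade XML ships with the OpenCV Python package.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

    def detect_faces(frame):
        """Return (x, y, w, h) rectangles of detected frontal faces."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)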


Figure 10. Face detection and recognition and other facial analysis capabilities are now available on many camera phones.

For More Information

The compute performance required to enable computational photography and its underlying vision processing algorithms is quite high. Historically, these functions and solutions have been deployed exclusively on more powerful desktop systems; mobile architectures had insufficient performance or a power budget that limited how much they were able to do. However, with the increased interest in and proliferation of these vision and computational photography functions, silicon architectures are being optimized to run these algorithms more efficiently and with lower power. The next article in this series will take a look at some of the emerging changes in processors and sensors that will make computational photography capabilities increasingly prevalent in the future.

Computational photography is one of the key applications being enabled and accelerated by the Embedded Vision Alliance, a worldwide organization of technology developers and providers. Embedded vision refers to the implementation of vision technology in mobile devices, embedded systems, special-purpose PCs, and the "cloud". First and foremost, the Alliance's mission is to provide engineers with practical education, information, and insights to help them incorporate embedded vision capabilities into new and existing products. To execute this mission, the Alliance maintains a website providing tutorial articles, videos, code downloads and a discussion forum staffed by technology experts. Registered website users can also receive the Alliance’s twice-monthly email newsletter, among other benefits.

In addition, the Embedded Vision Alliance offers a free online training facility for embedded vision product developers: the Embedded Vision Academy. This area of the Alliance website provides in-depth technical training and other resources to help engineers integrate visual intelligence into next-generation embedded and consumer devices. Course material in the Embedded Vision Academy spans a wide range of vision-related subjects, from basic vision algorithms to image pre-processing, image sensor interfaces, and software development techniques and tools such as OpenCV. Access is free to all through a simple registration process.

The Alliance also holds Embedded Vision Summit conferences in Silicon Valley. Embedded Vision Summits are technical educational forums for product creators interested in incorporating visual intelligence into electronic systems and software. They provide how-to presentations, inspiring keynote talks, demonstrations, and opportunities to interact with technical experts from Alliance member companies. These events are intended to:

  • Inspire attendees' imaginations about the potential applications for practical computer vision technology through exciting presentations and demonstrations.
  • Offer practical know-how for attendees to help them incorporate vision capabilities into their hardware and software products, and
  • Provide opportunities for attendees to meet and talk with leading vision technology companies and learn about their offerings.

The most recent Embedded Vision Summit was held in May 2014, and a comprehensive archive of keynote, technical tutorial and product demonstration videos, along with presentation slide sets, is available on the Alliance website. The next Embedded Vision Summit will take place on April 30, 2015 in Santa Clara, California; please reserve a spot on your calendar and plan to attend.

Michael McDonald is President of Skylane Technology Consulting, which provides marketing and consulting services to companies and startups primarily in vision, computational photography, and ADAS markets. He has over 20 years of experience working for technology leaders including Broadcom, Marvell, LSI, and AMD. He is hands-on and has helped define everything from hardware to software, and silicon to systems. Michael has previously managed $200+M businesses, served as a general manager, managed both small and large teams, and executed numerous partnerships and acquisitions. He has a BSEE from Rice University.

Industrial Automation and Embedded Vision: A Powerful Combination



A version of this article was originally published at InTech Magazine. It is reprinted here with the permission of the International Society of Automation.

In order for manufacturing robots and other industrial automation systems to meaningfully interact with the objects they're assembling, as well as to deftly and safely move about in their environments, they must be able to see and understand their surroundings. Cost-effective and capable vision processors, fed by depth-discerning image sensors and running robust software algorithms, are transforming longstanding autonomous and adaptive industrial automation aspirations into reality.

By Michael Brading
Automotive and Industrial Business Unit Chief Technology Officer
Aptina Imaging

Brian Dipert
Editor-in-Chief
Embedded Vision Alliance

Tim Droz
Vice President and General Manager
SoftKinetic North America

Pedro Gelabert
Senior Member of the Technical Staff and Systems Engineer
Texas Instruments

Carlton Heard
Product Marketing Manager – Vision Hardware & Software
National Instruments

Yvonne Lin
Marketing Manager – Medical & Industrial Imaging
Xilinx

Thomas Maier
Sales & Business Development – Time-of-Flight Sensors
Bluetechnix

Manjunath Somayaji
Staff Imaging Scientist
Aptina Imaging

and Daniël Van Nieuwenhove
Chief Technical Officer
SoftKinetic

Automated systems in manufacturing line environments can work more tirelessly, faster, and more precisely than their human counterparts. However, their success has traditionally been predicated on incoming parts arriving in fixed orientations and locations, which increases manufacturing process complexity; any deviation in part position and/or orientation can result in assembly failures. Humans use their eyes (along with other senses) and brains to understand and navigate the world around them. Robots and other industrial automation systems should be able to do the same thing, leveraging camera assemblies, vision processors, and various software algorithms to skillfully adapt to evolving manufacturing line circumstances, as well as to extend vision processing's benefits to other areas of the supply chain, such as piece-parts and finished-goods inventory tracking.

Historically, such vision-augmented technology has typically only been found in a short list of complex, expensive systems. However, cost, performance and power consumption advances in digital integrated circuits are now paving the way for the...

Realizing the Benefits of GPU Compute for Real Applications with Mali GPUs

This article was originally published at ARM's Community site. It is reprinted here with the permission of ARM.

It’s Tegra K1 Everywhere at Google I/O

This article was originally published at NVIDIA's blog. It is reprinted here with the permission of NVIDIA.

You couldn’t get very far at Google I/O’s dazzling kickoff today without bumping into our new Tegra K1 mobile processor.