Embedded Vision Alliance: Technical Articles

Designing Visionary Mobile Apps Using the Tegra Android Development Pack

by Shalini Gupta
This article was originally published at EE Times' Embedded.com Design Line. It is reprinted here with the permission of EE Times.

NVIDIA makes life easier for Android developers by providing all of the software tools needed to develop for Android (see "Developing OpenCV Computer Vision Apps for the Android Platform") on NVIDIA’s Tegra platform in a single easy-to-install TADP (Tegra Android Development Pack). The TADP is available for Windows, Mac OS X, Ubuntu Linux 32-bit and Ubuntu Linux 64-bit. It is targeted for Tegra devices, but will set up your development environment for any Android device.

You can obtain the TADP by registering on NVIDIA’s Developer Zone website and by applying for the Tegra Registered Developer Program. Once approved, typically within 48 hours, you’ll be able to access the TADP by logging into your NVIDIA Developer Zone account and by following the links to DEVELOPER CENTERS > Mobile > TEGRA ANDROID DEVELOPMENT PACK.

Every one to two months, NVIDIA publishes incremental updates to the TADP containing newer versions of its components. Major updates are published about once a year and may contain a completely restructured package with new components.

OpenCV for Tegra

Beginning with TADP 2.0, NVIDIA provides a software development kit (SDK) for OpenCV for Tegra. OpenCV for Tegra is a fully compatible, optimized backend implementation of OpenCV for Android, which runs 2-20x faster on Tegra 3-based devices.

The OpenCV for Tegra SDK in the TADP contains:

  • OpenCV for Tegra libraries in binary form
  • Pre-configured Android Eclipse example projects for OpenCV for Tegra
  • Step-by-step documentation on How to Use OpenCV for Tegra, including details of its functionality, and
  • An OpenCV for Tegra Demo App, which demonstrates the optimization that can be achieved with OpenCV for Tegra vs. the un-optimized OpenCV for Android library on Tegra 3-based devices.

The OpenCV for Tegra library is packaged into a single Android Application Package (OpenCV_2.4.2_binary_pack_tegra3.apk) located in the OpenCV-2.4.2-Tegra-sdk/apk/ folder of the TADP installation directory. You can also download this .apk, free of cost, from the Google Play store. Once installed, the OpenCV for Tegra libraries are ready to be dynamically linked into your Android applications.

If your application uses the OpenCV Manager service, it will automatically search for and load the OpenCV for Tegra library on your Tegra 3-based device at run time. If the OpenCV for Tegra library is absent on your device, its .apk will either be downloaded automatically from the Google Play store, or, if the latter is inaccessible, the user will be prompted to install the .apk manually.

Alternatively, the native (C/C++) versions of the OpenCV for Tegra libraries are also available in binary form in the OpenCV-2.4.2-Tegra-sdk/sdk/native/libs/tegra3 folder of your TADP installation directory. You can link them statically into the native part of your Android application by including the path to the OpenCV-tegra3.mk file in your application’s Android.mk file:

include ../../sdk/native/jni/OpenCV-tegra3.mk
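In context, a minimal Android.mk for the native part of an application might look like the sketch below. The module name and source file are placeholders, and the relative path to OpenCV-tegra3.mk depends on where your project sits relative to the TADP installation:

```makefile
LOCAL_PATH := $(call my-dir)

include $(CLEAR_VARS)

# Pull in the Tegra-optimized OpenCV build settings (path is illustrative)
include ../../sdk/native/jni/OpenCV-tegra3.mk

LOCAL_MODULE    := vision_demo        # placeholder module name
LOCAL_SRC_FILES := vision_demo.cpp    # placeholder source file

include $(BUILD_SHARED_LIBRARY)
```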

By default, all the Eclipse OpenCV examples in the TADP come pre-configured to use the OpenCV for Tegra libraries (via static or dynamic linking). Note that if the Tegra-optimized libraries are statically linked in, they are not guaranteed to run on non-Tegra 3-based devices. Hence, it is advisable to always use dynamic linking via the OpenCV Manager service in your published apps to ensure their compatibility across different hardware platforms.

To determine whether the Tegra-optimized functions are indeed being used in your application, check for the presence of the following message in the Android system logs. You can use the adb logcat command to write out the system logs.

E/OpenCV_for_Tegra(28465): Tegra platform detected, optimizations are switched ON!
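On a connected device you would pipe the live log through grep; the snippet below demonstrates the same filter against a locally written sample line (the log text is the one quoted above; the file name is arbitrary):

```shell
# On a connected device: adb logcat -d | grep OpenCV_for_Tegra
# Demonstrated here against a captured sample line:
printf 'E/OpenCV_for_Tegra(28465): Tegra platform detected, optimizations are switched ON!\n' > sample_log.txt
grep -c 'optimizations are switched ON' sample_log.txt
```

A count of 1 (or more) confirms the optimizations banner is present in the captured log.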

Additionally, the OpenCV Manager installed on your device will display information about the detected Tegra hardware (Figure 1).

Figure 1: The OpenCV Manager displays information about your detected Tegra hardware

To check out the speedups you can get with OpenCV for Tegra, download and run the OpenCV for Tegra Demo App. You can download it from Google Play, or find its OpenCVBenchmark.apk in the OpenCV-2.4.2-Tegra-sdk/apk/ folder of your TADP installation directory. The app continuously grabs preview frames from the camera, processes them in an asynchronous processing thread, and displays the processed frames.

Different processing modes (medianBlur, GaussianBlur, etc.) can be selected from the menu options. OpenCV for Tegra optimizations can be turned on or off by touching on the NVIDIA logo on the lower right of the display. When Tegra optimizations are enabled, the NVIDIA logo will turn green. Check the information displayed in the upper part of the display for your hardware's performance statistics.

Shalini Gupta is a Senior Mobile Computer Vision Engineer at NVIDIA. Her previous experience includes two years as an Imaging and Architecture Scientist at Texas Instruments, along with work at AT&T Laboratories and Advanced Digital Imaging Research, LLC (where she developed successful novel algorithms for 3D facial recognition). Shalini obtained her Bachelors Degree in Electronics and Electrical Communication Engineering at Punjab Engineering College, and her Masters Degree and Ph.D. in Electrical and Computer Engineering at the University of Texas at Austin.

Developing OpenCV Computer Vision Apps for the Android Platform

By Eric Gregori
This article was originally published at EE Times' Embedded.com Design Line. It is reprinted here with the permission of EE Times.

You now can hold in the palm of your hand computing power that required a desktop PC form factor just a decade ago. And with its contributions to the development of open-source OpenCV4Android, NVIDIA has brought the power of the OpenCV computer vision library to the smartphone and tablet.

The tools described in this article provide a unique opportunity for computer vision experimenters, academics, and professionals alike to do robust development on low-cost, full-featured hardware and software.

The present and future of embedded vision

Beginning in 2011, the number of smartphones shipped exceeded shipments of client PCs (netbooks, nettops, notebooks and desktops). [1] And nearly three quarters of all smartphones sold worldwide during the third quarter of 2012 were based on Google’s Android operating system. [2] Analysts forecast that more than 1 billion smartphones and tablets will be purchased in 2013. [3]

Statistics such as these suggest that smartphones and tablets, particularly those based on Android, are becoming the dominant computing platforms. This trend is occurring largely because the performance of these devices is approaching that of laptops and desktops. [4] Reflective of smartphone and tablet ascendance, Google announced that as of September 2012 more than 25 billion apps had been downloaded from Google's online application store, Google Play (Figure 1). [5]

Figure 1: Google's Play online application store has experienced dramatic usage growth

The term “embedded vision” refers to the use of computer vision technology in systems other than conventional computers. Stated another way, “embedded vision” refers to non-computer systems that extract meaning from visual inputs. Vision processing algorithms were originally only capable of being implemented on costly, bulky, and power-hungry high-end computers. As a result, computer vision has historically been primarily confined to a scant few application areas such as factory automation and military equipment.

However, mainstream systems such as smartphones and tablets now include SoCs with dual- and quad-core GHz+ processors, along with integrated GPUs containing abundant processing capabilities and dedicated image processing function blocks. Some SoCs even embed general-purpose DSP cores. The raw computing power in the modern-day mobile smartphone and tablet is now at the point where the implementation of embedded vision is not just possible but practical. [6] NVIDIA recognized the power of mobile embedded vision in 2010 and began contributing to the OpenCV computer vision library, to port code originally intended for desktop PCs to the Android operating system.


The OpenCV library

The OpenCV (Open Source Computer Vision) Library was created to provide a common resource for diverse computer vision applications and to accelerate the use of computer vision in everyday products. [7] OpenCV is distributed under the liberal BSD open-source license and is free for commercial or research use.

Top universities and Fortune 100 companies, among other sources, develop and maintain the 2,500 algorithms contained in the library. OpenCV is written in C and C++, but its application programming interface also includes "wrappers" for Java, MATLAB and Python. Community-supported ports currently exist for Windows, Linux, Mac OS X, Android, and iOS platforms. [8]

The OpenCV library has been downloaded more than five million times and is popular with both academics and design engineers. Quoting from the Wiki for the Google Summer of Code 2010 OpenCV project (http://opencv.willowgarage.com/wiki/GSOC_OpenCV2010), “OpenCV is used extensively by companies, research groups, and governmental bodies. Some of the companies that use OpenCV are Google, Yahoo, Microsoft, Intel, IBM, Sony, Honda, and Toyota. Many startups such as Applied Minds, VideoSurf, and Zeitera make extensive use of OpenCV. OpenCV's deployed uses span the range from stitching Streetview images together, detecting intrusions in surveillance video in Israel, monitoring mine equipment in China (more controversially, OpenCV is used in China's "Green Dam" internet filter), helping robots navigate and pick up objects at Willow Garage, detection of swimming pool drowning events in Europe, running interactive art in Spain and New York, checking runways for debris, inspecting labels on products in factories around the world on to rapid face detection in Japan.”

With the creation of the OpenCV Foundation in 2012, OpenCV has a new face and a new infrastructure [9, 10]. It now encompasses more than 40 different "builders", which test OpenCV in various configurations on different operating systems, both mobile and desktop (a "builder" is an environment used to build the library under a specific configuration, verifying that the library builds correctly there). A "binary compatibility builder" also exists, which evaluates binary compatibility of the current snapshot against the latest OpenCV stable release, along with a documentation builder that creates reference manuals and uploads them to the OpenCV website.


OpenCV4Android

OpenCV4Android is the official name of the Android port of the OpenCV library. [11] OpenCV began supporting Android in a limited "alpha" fashion in early 2010 with OpenCV version 2.2. [12] NVIDIA subsequently joined the project and accelerated the development of OpenCV for Android by releasing OpenCV version 2.3.1 with beta Android support. [13] This initial beta version included an OpenCV Java API and native camera support. The first official non-beta release of OpenCV for Android was in April 2012, with OpenCV 2.4. At the time of this article's publication, OpenCV 2.4.3 has just been released, with even more Android support improvements and features, including a Tegra 3 SoC-accelerated version of OpenCV.

OpenCV4Android supports two languages for coding OpenCV applications to run on Android-based devices. [14] The easiest way to develop your code is to use the OpenCV Java API. OpenCV exposes most (but not all) of its functions to Java. This incomplete implementation can pose problems if you need an OpenCV function that has not yet received Java support. Before choosing to use Java for an OpenCV project, then, you should review the OpenCV Java API for functions your project may require.

OpenCV Java API

With the Java API, each ported OpenCV function is "wrapped" with a Java interface. The OpenCV function itself, on the other hand, is written in C++ and compiled. In other words, all of the actual computations are performed at a native level. However, the Java wrapper results in a performance penalty in the form of JNI (Java Native Interface) overhead, which occurs twice: once at the start of each call to the native OpenCV function and again after each OpenCV call (i.e. during the return). This performance penalty is incurred for every OpenCV function called; the more OpenCV functions called per frame, the bigger the cumulative performance penalty.

Although applications written using the OpenCV Java API run under the Android Dalvik virtual machine, for many applications the performance decrease is negligible. Figure 2 shows an OpenCV for Android application written using the Java API. This application calls three OpenCV functions per video frame. The ellipses highlight each cluster of two JNI penalties (entry and exit); this particular application will incur six total JNI call penalties per frame.

Figure 2: When using the Java API, each OpenCV function call incurs JNI overhead, potentially decreasing performance
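The per-frame arithmetic above can be sketched as a toy model. The call counts come from the article's example; the class and method names are illustrative bookkeeping, not OpenCV code:

```java
// Toy model of per-frame JNI boundary crossings (illustrative only).
public class JniOverheadModel {
    // Java API: every wrapped OpenCV call crosses the JNI boundary twice
    // (once on entry, once on return).
    static int javaApiCrossings(int opencvCallsPerFrame) {
        return 2 * opencvCallsPerFrame;
    }

    // Native pipeline: one C++ entry point per frame, so two crossings
    // total, regardless of how many OpenCV functions it calls internally.
    static int nativePipelineCrossings() {
        return 2;
    }

    public static void main(String[] args) {
        System.out.println(javaApiCrossings(3));       // the three-call example: 6
        System.out.println(nativePipelineCrossings()); // 2
    }
}
```

Note that for a single OpenCV call per frame the two approaches converge, which is exactly the face detection case discussed later.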

A slightly more difficult but more performance-optimized development method uses the Android NDK (Native Development Kit). In this approach, the OpenCV vision pipeline code is written entirely in C++, with direct calls to OpenCV. You simply encapsulate all of the OpenCV calls in a single C++ class, calling into it once per frame. With this method, only two JNI call penalties are incurred per frame, so the per-frame JNI performance penalty is significantly reduced. Java is still used for non-vision portions of the application, including the GUI.

Using this method, you first develop and test the OpenCV implementation of your algorithm on a host platform. Once your code works the way you want it to, you simply copy the C++ implementation into an Android project and rebuild it using the Android tools. You can also easily port the C++ implementation to another platform such as iOS by rebuilding it with the correct tools in the correct environment.

Figure 3 shows the same OpenCV for Android application as in Figure 2, but this time written using the native C++ API. It calls three OpenCV functions per video frame. The ellipse highlights the resulting JNI penalties. The OpenCV portion of the application is written entirely in C++, thereby incurring no JNI penalties between OpenCV calls. Using the native API for this application reduces the per-video frame JNI penalties from six to two.

Figure 3: Using native C++ to write the OpenCV portion of your application reduces JNI calls, optimizing performance

Figure 4 shows the OpenCV4Android face detection demo running on an NVIDIA Tegra 3-based Nexus 7 tablet. The demo has two modes: Java mode uses the Java API, while native mode uses the C++ API. Note the frame rates in the two screen snapshots: in this case, the OpenCV face detection Java API performs at essentially the same frame rate as the C++ API.

Figure 4: A face detection algorithm implemented using OpenCV4Android delivers nearly identical performance in Java mode (a) and native mode (b), at approximately 7.55 fps

How is this possible, considering the previously discussed performance discrepancies between Java and C++ API development? Unlike the previous application, this example implements only one function call (detectMultiScale()) per frame, for face detection purposes. Calling only a single OpenCV function, regardless of whether you're using the native C++ or Java API, incurs the same two JNI call penalties.

In this case, the slight difference in performance is most likely the result of the number of parameters that have to pass through the JNI interface. The native C++ face detector call only has two parameters; the remainder are passed during the initialization phase. The Java API detectMultiScale() call, on the other hand, passes seven parameters through the JNI interface.

The OpenCV Manager

With the release of OpenCV version 2.4.2, NVIDIA introduced the OpenCV Manager, an application that can be downloaded and installed on an Android device from the Google Play Store or directly installed using the Tegra Android Development Pack (Figure 5). Once installed, the OpenCV Manager manages the OpenCV libraries on the device, automatically updating them to the latest versions and selecting the one that is optimized for the device.

Figure 5: NVIDIA's OpenCV Manager is available for download from the Google Play store

The best practice for Android OpenCV applications is to use dynamically linked versions of the OpenCV libraries. Said another way, the OpenCV libraries should not be a part of the application (i.e., statically linked); instead, they should be dynamically linked (i.e., runtime linked) when the application is executed. The biggest advantage of dynamically linking the application to an OpenCV library involves updates. If an application is statically linked to the OpenCV library, the library and application must be updated together. Therefore, every time a new version of OpenCV is released, if the application is dependent on changes in the new version (bug fixes, for example), the application must also be upgraded and re-released. With dynamic linking, on the other hand, the application is only released once. Subsequent OpenCV updates do not require application upgrades.
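In the OpenCV4Android build system of this era, the linking choice is expressed in your Android.mk before including OpenCV's makefile. The fragment below is a sketch; the module name, source file, and relative path are placeholders, and the OPENCV_* flags should be verified against your SDK version's documentation:

```makefile
include $(CLEAR_VARS)

# Default (recommended): dynamic linking via the OpenCV Manager service.
# To statically link OpenCV into your .so instead, uncomment these lines:
# OPENCV_LIB_TYPE := STATIC
# OPENCV_INSTALL_MODULES := on

include ../../sdk/native/jni/OpenCV.mk

LOCAL_MODULE    := my_opencv_app      # placeholder module name
LOCAL_SRC_FILES := my_opencv_app.cpp  # placeholder source file

include $(BUILD_SHARED_LIBRARY)
```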

An additional advantage of dynamic linking in conjunction with the OpenCV Manager involves the latter's automatic hardware detection feature. The OpenCV Manager automatically detects the platform it is installed on and selects the optimum OpenCV library for that hardware. Prior to NVIDIA's release of the OpenCV Manager, no mechanism existed for selecting the optimum library for a particular hardware platform. Instead, the application developer had to release multiple versions of the application for various hardware types. [15]

OpenCV for Android Tutorials

The OpenCV4Android project has developed a series of tutorials that walk the reader through the process of creating an OpenCV4Android host build machine and developing an OpenCV4Android application. The root node of the documentation tree can be found in the Android section of the OpenCV website.

The first tutorial, "Introduction into Android Development," covers two methods of creating an Android host build machine. The automatic method, using NVIDIA’s TADP, is described later. The manual method requires that you install the following software:

  • Sun/Oracle JDK 6
  • Android SDK
  • Android SDK components
  • Eclipse IDE
  • ADT plugin for Eclipse
  • Android NDK, and
  • CDT plugin for Eclipse

The next tutorial, "OpenCV for Android SDK," covers the OpenCV4Android SDK package, which enables development of Android applications using the OpenCV library.

The SDK structure is illustrated here: [16]


|_ apk
|   |_ OpenCV_2.4.3_binary_pack_XXX.apk
|   |_ OpenCV_2.4.3_Manager.apk
|
|_ doc
|_ samples
|_ sdk
|    |_ etc
|    |_ java
|    |_ native
|          |_ 3rdparty
|          |_ jni
|          |_ libs
|               |_ armeabi
|               |_ armeabi-v7a
|               |_ x86
|
|_ license.txt
|_ README.android

  • The sdk folder contains the OpenCV API and libraries for Android.
  • The sdk/java folder contains an Android library Eclipse project, providing an OpenCV Java API that can be imported into a developer’s workspace.
  • The sdk/native folder contains OpenCV C++ headers (for JNI code) and native Android libraries (*.so and *.a) for ARM-v5, ARM-v7a and x86 architectures.
  • The sdk/etc folder contains the Haar and LBP cascades distributed with OpenCV.
  • The apk folder contains Android packages that should be installed on the target Android device to enable OpenCV library access via OpenCV Manager API. On production devices that have access to the Internet and the Google Play Market, these packages will be installed from the Market on the first application launch, via the OpenCV Manager API. Development kits without Internet and Market connections require these packages to be manually installed. Specifically, you must install the Manager.apk and corresponding binary_pack.apk, dependent on the device CPU (the Manager GUI provides this info). However, installation from the Internet is the preferable approach, since the OpenCV team may publish updated versions of various packages via the Google Play Market.
  • The samples folder contains sample application projects and their prebuilt packages (APKs). Import them into an Eclipse workspace and browse the code to learn ways of using OpenCV on Android.
  • The doc folder contains OpenCV documentation in PDF format; this documentation is also available online. The most recent (i.e. nightly build) documentation is generally more up-to-date, but it can refer to not-yet-released functionality.

Beginning with version 2.4.3, the OpenCV4Android SDK uses the OpenCV Manager API for library initialization.

Finally, the "Android Development with OpenCV" tutorial walks the reader through creating a first OpenCV4Android application. This tutorial covers both Java and native development, using the Eclipse-based tools. It also provides a framework for binding to the OpenCV Manager, to take advantage of the dynamic OpenCV libraries. The example code snippet that follows is reproduced from the OpenCV website. [17]

public class MyActivity extends Activity implements HelperCallbackInterface {

    private BaseLoaderCallback mOpenCVCallBack = new BaseLoaderCallback(this) {
        @Override
        public void onManagerConnected(int status) {
            switch (status) {
                case LoaderCallbackInterface.SUCCESS: {
                    Log.i(TAG, "OpenCV loaded successfully");
                    // Create and set View
                    mView = new puzzle15View(mAppContext);
                } break;
                default: {
                    super.onManagerConnected(status);
                } break;
            }
        }
    };

    /** Called on every application resume **/
    @Override
    protected void onResume() {
        super.onResume();
        Log.i(TAG, "called onResume");
        Log.i(TAG, "Trying to load OpenCV library");
        if (!OpenCVLoader.initAsync(OpenCVLoader.OPENCV_VERSION_2_4_2, this, mOpenCVCallBack)) {
            Log.e(TAG, "Cannot connect to OpenCV Manager");
        }
    }
}


NVIDIA’s TADP (Tegra Android Development Pack)

NVIDIA has been a significant contributor to the OpenCV library since 2010. NVIDIA has continued its support of OpenCV, and Android more generally, with the TADP (see "Designing Visionary Mobile Apps Using the Tegra Android Development Pack"). The development pack was originally intended only for general Android development. However, with release 2.0, OpenCV was added as part of the TADP download directly from NVIDIA.

Per NVIDIA's documentation, the Tegra Android Development Pack 2.0 installs all software tools required to develop for Android on NVIDIA’s Tegra platform. This suite of developer tools is targeted at Tegra devices, but will configure a development environment that will work with almost any Android device. TADP 2.0 is available on Windows, Mac OS X, Ubuntu Linux 32-bit and Ubuntu Linux 64-bit (Figure 6).

Figure 6: The TADP 2.0 can be installed on 32-bit Ubuntu Linux, along with other operating systems

Tegra Android Development Pack 2.0 includes the following Android development tools:

  • Android SDK r18
  • Android APIs
  • Google USB Driver
  • Android NDK r8
  • JDK 6u24
  • Cygwin 1.7
  • Eclipse 3.7.1
  • CDT 8.0.0
  • ADT 18.0.0
  • Apache Ant 1.8.2
  • Python 2.7

Tegra libraries and tools include:

  • Nsight Tegra 1.0, Visual Studio Edition (Windows only)
  • NVIDIA Debug Manager for Eclipse 12.0.1
  • PerfHUD ES 1.9.7
  • Tegra Profiler 1.0
  • Perf for Tegra
  • OpenCV for Tegra 2.4.2
  • Tegra samples, documentation and OS images
  • Tegra SDK samples (all of which can also be imported into an Eclipse workspace, see Figure 7)
  • Tegra SDK documentation
  • Tegra Android OS images for Cardhu, Ventana and Enterprise development kits

Figure 7: TADP examples can also be imported into an Eclipse workspace

OpenCV for Tegra

OpenCV for Tegra is a version of OpenCV for Android that NVIDIA has optimized for Tegra 3 platforms running the Android operating system. It currently supports Android API levels 9 through 16, and contains optimizations that often enable OpenCV for Tegra to run several times faster on Tegra 3 than the generic open-source OpenCV for Android implementation. The TADP includes an SDK package for OpenCV for Tegra.

Figure 8 shows the OpenCV for Tegra Demo available for download from the Google Play store. Only the Sobel and Morphology algorithms are shown, although the demo supports additional algorithms such as various blurs and optical flow. The screen shots show performance both with and without Tegra optimizations enabled. Notice in Figures 8b and 8c that the Sobel algorithm runs twice as fast using the NVIDIA optimized version of OpenCV. Figures 8d and 8e show edge detection using morphology operators. The operations are listed in red at the bottom of the image. In this case, the NVIDIA-optimized OpenCV library executes the specified operators five times faster than the standard ARM version of OpenCV.

Figure 8: The OpenCV for Tegra demo is also available on Google Play (a). The included Sobel algorithm can be run either with Tegra optimizations off (b) or on (c); in the latter case it's twice as fast. Similarly, the morphology algorithm runs five times faster with optimizations on (e) than off (d).

Sample applications

The OpenCV4Android SDK includes four sample applications and five tutorials to help you get started in developing OpenCV applications for Android. The tutorials are meant to serve as frameworks or foundations for your specific application: simply open an appropriate tutorial project and start adding your code:

  • Android Camera: This tutorial is a skeleton application for all of the others. It does not use OpenCV at all, but provides an example of an Android Java application that works with a camera.
  • Add OpenCV: This tutorial shows the simplest way to add a Java OpenCV call to the Android application.
  • Use OpenCV Camera: This tutorial functions exactly the same as the previous one, but uses OpenCV’s native camera for video capture.
  • Add Native OpenCV: This tutorial demonstrates how you can use OpenCV in the native part of your application, through JNI.
  • Mix Java + Native OpenCV: This tutorial shows you how to use both the C++ and Java OpenCV APIs within a single application.

The sample applications, on the other hand, are complete applications that you can build and run:

  • Image-manipulations: This sample demonstrates how you can use OpenCV as an image processing and manipulation library. It supports several filters and demonstrates color space conversions and working with histograms.
  • 15-puzzle: This sample shows how you can implement a simple game with just a few calls to OpenCV. It is also available on Google Play.
  • Face-detection: This sample is the simplest implementation of the face detection functionality on Android. It supports two modes of execution: an available-by-default Java wrapper for the cascade classifier, and a manually crafted JNI call to a native class which supports tracking. Even the Java version is able to deliver close to real-time performance on a Google Nexus One device.
  • Color-blob-detection: This sample shows a trivial implementation of a color blob tracker. After the user points to a particular image region, the algorithm attempts to select the whole blob of a similar color.


We are at a notable point in the evolution of computing. Modern smartphones and tablets are now quite capable of running useful computer vision algorithms. And by delivering significant advancements to OpenCV4Android, NVIDIA has brought the power of OpenCV to the smartphone and tablet. Developers can implement their algorithms using either the Java or native C++ APIs. The Java API in particular exposes computer vision to a whole new level of developers. This is a very exciting time to be involved with computer vision!

Eric Gregori is a Senior Software Engineer and Embedded Vision Specialist with Berkeley Design Technology, Inc. (BDTI), which provides engineering services for embedded vision applications. He is a robot enthusiast with over 17 years of embedded firmware design experience, with specialties in computer vision, artificial intelligence, and programming for Windows Embedded CE, Linux, and Android operating systems.


  1. Smart phones overtake client PCs in 2011
  2. Gartner Says Worldwide Sales of Mobile Phones Declined 3 Percent in Third Quarter of 2012; Smartphone Sales Increased 47 Percent
  3. Gartner Says 821 Million Smart Devices Will Be Purchased Worldwide in 2012; Sales to Rise to 1.2 Billion in 2013
  4. December 2012 Embedded Vision Alliance Member Summit Technology Trends Presentation (requires registration)
  5. Google Play hits 25 billion downloads
  6. July 2012 Embedded Vision Alliance Member Summit Technology Trends Presentation on OpenCL (requires registration)
  7. Introduction To Computer Vision Using OpenCV (registration required)
  8. Home page of OpenCV.org
  9. September 2012 Embedded Vision Summit Afternoon Keynote: Gary Bradski, OpenCV Foundation (requires registration)
  10. July 2012 Embedded Vision Alliance Member Summit Keynote: Gary Bradski, Industrial Perception (requires registration)
  11. Introduction into Android Development
  12. OpenCV Change Logs
  13. Android Release Notes 2.3.1 (beta1)
  14. OpenCV4Android Usage Models
  15. OpenCV4Android Reference
  16. OpenCV4Android SDK
  17. Android development with OpenCV

The Gesture Interface: A Compelling Competitive Advantage in the Technology Race

By Brian Dipert
Senior Analyst, BDTI
Editor-in-Chief, Embedded Vision Alliance

Yair Siegel
Director of Marketing, Multimedia, CEVA

Simon Morris
Chief Executive Officer, CogniVue

Liat Rostock
Marketing Director, eyeSight Mobile Technologies

Gershom Kutliroff
Chief Technical Officer, Omek Interactive
This article was originally published at EE Times' Communications Design Line. It is reprinted here with the permission of EE Times.

The means by which we interact with the machines around us is undergoing a fundamental transformation. While we may still sometimes need to push buttons, touch displays and trackpads, and raise our voices, we’ll increasingly be able to interact with and control our devices simply by signaling with our fingers, gesturing with our hands, and moving our bodies.

Most consumer electronic devices today - smartphones, tablets, PCs, TVs, and the like - either already include or will soon integrate one or more cameras. Automobiles and numerous other products are rapidly becoming camera-enabled, too. What can be achieved with these cameras is changing the way we interact with our devices and with each other. Leveraging one or multiple image sensors, these cameras generate data representing the three-dimensional space around the device, and innovators have developed products that transform this data into meaningful operations.

Gesture recognition, one key example of these sensor-enabled technologies, is achieving rapid market adoption as it evolves and matures. Although various gesture implementations exist in the market, a notable percentage of them are based on embedded vision algorithms that use cameras to detect and interpret finger, hand and body movements. Gestures have been part of humans’ native interaction language for eons. Adding support for various types of gestures to electronic devices enables using our natural "language" to operate these devices, which is much more intuitive and effortless when compared to touching a screen, manipulating a mouse or remote control, tweaking a knob, or pressing a switch.

Gesture controls will notably contribute to easing our interaction with devices, reducing (and in some cases replacing) the need for a mouse, keys, a remote control, or buttons (Figure 1). When combined with other advanced user interface technologies such as voice commands and face recognition, gestures can create a richer user experience that strives to understand the human "language," thereby fueling the next wave of electronic innovation.



Figure 1. Touching a tablet computer or smartphone's screen is inconvenient (at best, and more likely impossible) when you're in the kitchen and your hands are coated with cooking ingredients (a). Similarly, sand, sun tan lotion and water combine to make touchscreens infeasible at the beach, assuming the devices are even within reach (b).

Not just consumer electronics

When most people think of gesture recognition, they often imagine someone waving his or her hands, arms or body in an effort to control a game or other application on a large-screen display. Case studies of this trend include Microsoft’s Kinect peripheral for the Xbox 360, along with a range of gesture solutions augmenting traditional remote controls for televisions and keyboards, mice, touchscreens and trackpads for computers. At the recent Consumer Electronics Show, for example, multiple TV manufacturers showcased camera-inclusive models that implemented not only gesture control but also various face recognition-enabled features. Similarly, Intel trumpeted a diversity of imaging-enabled capabilities for its Ultrabook designs.

However, gesture recognition as a user interface scheme also applies to a wide range of applications beyond consumer electronics. In the automotive market, for example, gesture is seen as a convenience-driven add-on feature for controlling the rear hatch and sliding side doors. Cameras already installed in the rear of the vehicle for reversing, and in the side mirrors for blind spot warning, can also be employed for these additional capabilities. As the driver approaches the car, a proximity sensor detects the ignition key in the pocket or purse and turns on the cameras. An appropriate subsequent wave of the driver’s hand or foot could initiate opening of the rear hatch or side door.

Another potential automotive use case is inside the cabin, when an individual cannot (or at least should not) reach for a particular button or knob when driving but still wants to answer an incoming cellphone call or change menus on the console or infotainment unit. A simple hand gesture may be a safer, quicker and otherwise more convenient means of accomplishing such a task. Many automotive manufacturers are currently experimenting with (and in some cases already publicly demonstrating) gesture as a means for user control in the car, among other motivations as an incremental safety capability.

Additional gesture recognition opportunities exist in medical applications where, for health and safety reasons, a nurse or doctor may not be able to touch a display or trackpad but still needs to control a system. In other cases, the medical professional may not be within reach of the display yet still needs to manipulate the content being shown on the display. Appropriate gestures, such as hand swipes or using a finger as a virtual mouse, are a safer and faster way to control the device (Figure 2).

Figure 2. Microsoft's "Kinect Effect" video showcased a number of applications then under development for Kinect for Windows (and conceptually applicable to other 2-D and 3-D sensor technologies, as well)

Gesture interfaces are also useful in rehabilitation situations. Gesturetek’s IREX, for example, guides patients through interactive exercises that target specific body parts. And less conventional health-related applications for gesture recognition also exist. For instance, users with physical handicaps may not be able to use a conventional keyboard or mouse but could instead leverage the recognition of facial gestures as a means of control. And active university research is also underway on using gesture recognition as a means of translating sign language to text and speech.

More generally, diverse markets exist where gestures are useful for display control. You might recall, for example, the popular image of Tom Cruise manipulating the large transparent display in the movie Minority Report. Or consider the advertising market where interactive digital signs could respond to viewers’ gestures (not to mention identifying a particular viewer's age, gender, ethnicity and other factors) in order to optimize the displayed image and better engage the viewer. Even in industrial markets, appliances such as ceiling-positioned HVAC sensors could be conveniently controlled via gestures. As sensor technologies, gesture algorithms and vision processors continue to improve over time, what might appear today to be a unique form of interactivity will be commonplace in the future, across a range of applications and markets.

Implementations vary by application

The meaning of the term "gesture recognition" has become broader over time, as it's used to describe an increasing range of implementation variants. These specific solutions may be designed and optimized, for example, for either close- or long-range interaction, for fine-resolution gestures or robust full-bodied movements, and for continuous tracking or brief-duration gestures. Gesture recognition technology entails a wide variety of touch-free interaction capabilities, each serving a different type of user interface scenario.

Close-range gesture detection is typically used in handheld devices such as smartphones and tablets, where the interaction occurs in close proximity to the device’s camera. In contrast, long-range gesture control is commonly employed with devices such as TVs, set-top boxes, digital signage, and the like, where the distance between the user and the device can span multiple feet and interaction is therefore from afar.

While user interface convenience is central to gesture control in both user scenarios, the algorithms used, specifically the methods by which gestures are performed and detected, are fundamentally different. In close-range usage, the camera "sees" a hand gesture in a completely different way than how the camera "sees" that same hand and gesture in long-range interaction.

Additionally, a distinction exists between different gesture "languages." For example, when using gestures to navigate through the detailed menus of a "smart" TV, the user will find it intuitive to use fine-resolution, small gestures to select menu items. However, when using the device to play games based on full-body detection, robust gestures are required to deliver the appropriate experience.

Moreover, differences exist between rapid-completion gestures and those that involve continuous hand tracking. A distinctive hand motion from right to left or left to right can be used, for example, to flip eBook pages or change songs on a music playback application. These scenarios contrast to continuous hand tracking, which is relevant for control of menus and other detailed user interface elements, such as a Windows 8 UI or a smart TV's screen.
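The two interaction styles map to distinct code paths. As a rough illustration (the function names and thresholds below are invented for this sketch, not drawn from any shipping gesture SDK), a rapid-completion swipe can be classified from the net displacement of a short window of tracked hand positions, while continuous tracking simply forwards every sample to the user interface:

```python
def classify_swipe(x_positions, min_travel=0.4, max_jitter=0.1):
    """Classify a short sequence of normalized hand x-coordinates (0..1)
    as a left/right swipe, or None if the motion is not decisive.

    min_travel: minimum net horizontal displacement to count as a swipe.
    max_jitter: maximum allowed backtracking against the dominant direction.
    """
    if len(x_positions) < 2:
        return None
    net = x_positions[-1] - x_positions[0]
    if abs(net) < min_travel:
        return None
    # Reject erratic motion: steps opposing the net direction must stay small.
    direction = 1 if net > 0 else -1
    for a, b in zip(x_positions, x_positions[1:]):
        if (b - a) * direction < -max_jitter:
            return None
    return "swipe_right" if direction > 0 else "swipe_left"

def track_cursor(x, y, screen_w=1920, screen_h=1080):
    """Continuous tracking, by contrast, consumes every sample: map a
    normalized hand position directly to a screen coordinate."""
    return int(x * screen_w), int(y * screen_h)
```

For example, `classify_swipe([0.2, 0.35, 0.55, 0.8])` reports a rightward swipe suitable for a page-flip command, whereas a small back-and-forth wiggle returns `None` rather than triggering a spurious action.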

Other implementation challenges

Any gesture control product contains several different key hardware and software components, all of which must be tightly integrated in order to provide a compelling user experience. First is the camera, which captures the raw data that represent the user’s actions. Generally, this raw data is then processed, in order to reduce the noise in the signal, for example, or (in the case of 3-D cameras) to compute the depth map.

Specialized algorithms subsequently interpret the processed data, translating the user’s movements into "actionable" commands that a computer can understand. And finally, an application integrates these actionable commands with user feedback in a way that must be both natural and engaging. Adding to the overall complexity of the solution, the algorithms and applications are increasingly implemented on embedded systems with limited processing, storage and other resources.
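The staging described above (raw capture, noise reduction, interpretation into commands) can be sketched in miniature. All class and parameter names here are hypothetical, and the "gesture" detected is deliberately trivial; the point is the flow of data between stages, not any production algorithm:

```python
from collections import deque

class GesturePipeline:
    """Minimal sketch of the stages described above: raw depth samples
    are denoised, then interpreted into commands an application can
    consume. Thresholds are illustrative, not from a shipping product."""

    def __init__(self, window=3, press_depth=0.3):
        self.history = deque(maxlen=window)   # temporal noise reduction
        self.press_depth = press_depth        # depth threshold for a "push"

    def process(self, raw_depth_sample):
        # Stage 1: noise reduction via a short moving average.
        self.history.append(raw_depth_sample)
        smoothed = sum(self.history) / len(self.history)
        # Stage 2: interpretation - translate motion into a command.
        if smoothed < self.press_depth:
            return "SELECT"       # hand pushed toward the camera
        return None               # no actionable gesture this frame

# A hand approaching the camera (depth shrinking) eventually triggers SELECT.
pipe = GesturePipeline()
commands = [pipe.process(d) for d in [0.9, 0.85, 0.4, 0.2, 0.15]]
```

Note that the smoothing stage delays the trigger by a frame or two; tuning that latency-versus-stability tradeoff is exactly the kind of integration work the surrounding text describes.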

Tightly integrating these components to deliver a compelling gesture control experience is not a simple task, and the complexity is further magnified by the demands of gesture control applications. In particular, gesture control systems must be highly interactive, able to process large amounts of data with imperceptible latency. Commonly encountered incoming video streams, depending on the application, have frame resolutions ranging from QVGA to 1080p HD, at frame rates of 24 to 60 fps.

Bringing gesture control products to market therefore requires a unified effort among the different members of the technology supplier ecosystem: sensor and camera manufacturers, processor companies, algorithm providers, and application developers. Optimizing the different components to work together smoothly is critical in order to provide an engaging user experience. Vision functions, at the core of gesture algorithms, are often complex to implement and may require substantial additional work to optimize for the specific features of particular image processors. However, a substantial set of functions finds common and repeated use across various applications and products. A strong case can therefore be made for the development of cross-platform libraries that provide common low-level vision functions.

In a market as young as gesture control, there is also still little to no standardization across the ecosystem. Multiple camera technologies are used to generate 3-D data, and each technique produces its own characteristic artifacts. Each 3-D camera also comes with its own proprietary interface. And gesture dictionaries are not standardized; a motion that means one thing on one system implementation may mean something completely different (or alternatively nothing at all) on a different system. Standardization is inevitable and is necessary for the industry to grow and otherwise mature.

Industry alliance opportunities

The term “embedded vision,” of which gesture control is one key application example, refers to the use of computer vision technology in embedded systems, mobile devices, PCs, and the cloud. Stated another way, “embedded vision” refers to embedded systems that extract meaning from visual inputs. Similar to the way that wireless communication has become pervasive over the past 10 years, embedded vision technology is poised to be widely deployed in the next 10 years.

Embedded vision technology has the potential to enable a wide range of electronic products that are more intelligent and responsive than before, and thus more valuable to users. It can add helpful features to existing products. And it can provide significant new markets for hardware, software and semiconductor manufacturers. The Embedded Vision Alliance, a worldwide organization of technology developers and providers, is working to empower engineers to transform this potential into reality.

BDTI, CEVA, CogniVue, eyeSight Mobile Technologies and Omek Interactive, the co-authors of this article, are all members of the Embedded Vision Alliance. First and foremost, its mission is to provide engineers with practical education, information, and insights to help them incorporate embedded vision capabilities into products. To execute this mission, the Alliance has developed a website (www.Embedded-Vision.com) providing tutorial articles, videos, code downloads and a discussion forum staffed by a diversity of technology experts. Registered website users can also receive the alliance’s twice-monthly e-mail newsletter (www.embeddedvisioninsights.com), among other additional benefits.

Transforming a gesture control experience into a shipping product entails compromises – in cost, performance, and accuracy, to name a few. The Embedded Vision Alliance catalyzes these conversations in a forum where such tradeoffs can be understood and resolved, and where the effort to productize gesture control can therefore be accelerated, enabling system developers to effectively harness gesture user interface technology. For more information on the Embedded Vision Alliance, including membership details, please visit www.Embedded-Vision.com, email info@Embedded-Vision.com or call 925-954-1411.

Please also consider attending the Alliance's upcoming Embedded Vision Summit, a free day-long technical educational forum to be held on April 25th in San Jose, California and intended for engineers interested in incorporating visual intelligence into electronic systems and software. The event agenda includes how-to presentations, seminars, demonstrations, and opportunities to interact with Alliance member companies. For more information on the Embedded Vision Summit, including an online registration application form, please visit www.embeddedvisionsummit.com.

Author biographies

Brian Dipert is Editor-In-Chief of the Embedded Vision Alliance. He is also a Senior Analyst at Berkeley Design Technology, Inc., which provides analysis, advice, and engineering for embedded processing technology and applications, and Editor-In-Chief of InsideDSP, the company's online newsletter dedicated to digital signal processing technology. Brian has a B.S. degree in Electrical Engineering from Purdue University in West Lafayette, IN. His professional career began at Magnavox Electronics Systems in Fort Wayne, IN; Brian subsequently spent eight years at Intel Corporation in Folsom, CA. He then spent 14 years at EDN Magazine.

Yair Siegel serves as Director of Marketing, covering multimedia, at CEVA. Prior to this, he was Director of Worldwide Field Applications. Yair has worked with CEVA, along with the licensing division of DSP Group, since 1997, serving in various R&D engineering and management positions within the Software Development Tools department. He holds a BSc degree in Computer Science and Economics from the Hebrew University in Jerusalem, as well as a MBA and a MA in Economics from Tel-Aviv University.

Simon Morris has over 20 years of professional experience in both private and public semiconductor companies, and as CogniVue's CEO is responsible for leading the company's evolution from an R&D cost center through spin-out to an independent privately held fabless semiconductor and embedded software business. Prior to joining CogniVue, Simon was Director at BDC Venture Capital. From 1995-2006 he also held various senior and executive leadership positions at Atsana Semiconductor, and senior positions at Texas Instruments. Simon has an M.Eng in Electrical Engineering and a B.Eng in Electrical Engineering from the Royal Military College of Canada, and is a member of the Professional Engineers of Ontario.

As eyeSight’s Marketing Director, Liat Rostock’s responsibilities cover company branding, public relations, event management, channel marketing and communication strategy. Prior to this position, Liat gained valuable insight into the company as eyeSight’s Senior Project Manager, a role in which she was responsible for the design, development process, and distribution of applications integrated with eyeSight’s technology. Liat holds a B.A. from IDC Herzeliya, where she majored in marketing.

Gershom Kutliroff is CTO and co-founder of Omek Interactive, in which role he is responsible for the company’s proprietary algorithms and software development. Before founding Omek, Dr. Kutliroff was the Chief Scientist of IDT Video Technologies, where he led research efforts in developing computer vision applications to support IDT’s videophone operations. Prior to that, he was the leader of the Core Technologies group at DPSi, a computer animation studio. He earned his Ph.D. and M.Sc. in Applied Mathematics from Brown University, and his B.Sc. in Applied Mathematics from Columbia University.

Expanding Resources Streamline the Creation of "Machines That See"

By Brian Dipert and Jeff Bier
Embedded Vision Alliance

The Embedded Vision Revolution


Semiconductor and software advances are enabling medical devices to derive meaning from digital still and video images.

By Brian Dipert
Editor-in-Chief, Embedded Vision Alliance
Senior Analyst, BDTI
Kamran Khan
Technical Marketing Engineer, Xilinx
This article was originally published in the May 2013 edition of MD+DI (Medical Device and Diagnostic Industry) Magazine. It is reprinted here with the permission of the original publisher.

Imaging technologies such as x-rays and MRI have long been critical diagnostic tools used by healthcare professionals. But it's ultimately up to a human operator to analyze and interpret the images these technologies produce, and that leaves open the possibility that critical information will be overlooked. In a recent study, for example, 20 out of 24 radiologists missed the image of a gorilla that was around 48 times the size of a typical cancer nodule artificially inserted into CT scans of patients' lungs (Figure 1). This oversight occurred even though the radiologists were presented with the highly anomalous and seemingly easily detectable primate picture on average more than four times in separate images.

Figure 1. See the gorilla in the upper right corner of this CT scan? 20 of 24 radiologists in a recent study didn't notice it.

Advanced, intelligent x-ray and CT systems—which, unlike humans, are not subject to fatigue, distraction, and other degrading factors—could assist radiologists in rapidly and accurately identifying image irregularities. Such applications are among many examples of the emerging technology known as embedded vision, the incorporation of automated image analysis, or computer vision, capabilities into a variety of electronic devices. Such systems extract meaning from visual inputs, and similar to the way wireless communication has become pervasive over the past 10 years, embedded vision technology is poised to be widely deployed in medical electronics.

Just as high-speed wireless connectivity began as an exotic, costly technology, embedded vision has until recently been found only in complex, expensive systems, such as quality-control inspection systems for manufacturing. Advances in digital integrated circuits were critical in enabling high-speed wireless technology to evolve from fringe to mainstream. Similarly, advances in digital chips such as processors, image sensors, and memory devices are paving the way for the proliferation of embedded vision into a variety of applications, including high-volume consumer devices such as smartphones and tablets (see sidebar “Mobile Electronics Applications Showcase Embedded Vision Opportunities").

How It Works

Vision algorithms typically require high compute performance, and, of course, embedded systems of all kinds are usually required to fit into tight cost and power consumption envelopes. In other digital signal processing application domains, such as wireless communications and compression-centric consumer video equipment, chip designers achieve this challenging combination of high performance, low cost, and low power by using specialized coprocessors and accelerators to implement the most demanding processing tasks in the application. These coprocessors and accelerators are typically not programmable by the chip user, however.

This tradeoff is often acceptable in applications where software algorithms are standardized. In vision applications, however, there are no standards constraining the choice of algorithms. On the contrary, there are often many approaches to choose from to solve a particular vision problem. Therefore, vision algorithms are very diverse, and change rapidly over time. As a result, the use of nonprogrammable accelerators and coprocessors is less attractive.

Achieving the combination of high performance, low cost, low power and programmability is challenging. Although there are few chips dedicated to embedded vision applications today, these applications are adopting high-performance, cost-effective processing chips developed for other applications, including DSPs, CPUs, FPGAs and GPUs. Particularly demanding embedded vision applications often use a combination of processing elements. As these chips and cores continue to deliver more programmable performance per dollar and per watt, they will enable the creation of more high-volume embedded vision products. Those high-volume applications, in turn, will attract more attention from silicon providers, who will deliver even better performance, efficiency and programmability.

A Diversity of Opportunities

Medical imaging is not the only opportunity for embedded vision to supplement—if not supplant—human medical analysis and diagnosis. Consider endoscopy (Figure 2), for example. Historically, endoscope systems only displayed unenhanced, low-resolution images. Physicians had to interpret the images they saw based solely on knowledge and experience (see sidebar "Codifying Intuition"). The low-quality images and subjective nature of the diagnoses inevitably resulted in overlooked abnormalities and incorrect treatments. Today, endoscope systems are armed with high-resolution optics, image sensors, and embedded vision processing capabilities. They can distinguish among tissues; enhance edges and other image attributes; perform basic dimensional analytics (e.g., length, angle); and overlay this data on top of the video image in real time. Advanced designs can even identify and highlight unusual image features, so physicians are unlikely to overlook them.
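The edge-enhancement stage such systems perform can be illustrated with a simple gradient-magnitude overlay. This is a deliberately simplified stand-in, not an actual endoscope algorithm (real systems use tuned, multi-scale filters), and the synthetic "tissue region" below is invented for the demonstration:

```python
import numpy as np

def enhance_edges(frame, strength=0.5):
    """Overlay a normalized gradient-magnitude edge map onto a grayscale
    frame, brightening tissue boundaries while leaving flat regions
    untouched - a simplified sketch of the edge-enhancement step
    described above."""
    gy, gx = np.gradient(frame.astype(float))
    edges = np.hypot(gx, gy)
    if edges.max() > 0:
        edges /= edges.max()               # normalize to 0..1
    return np.clip(frame + strength * 255 * edges, 0, 255).astype(np.uint8)

# Synthetic frame: dark background with a brighter square "tissue region".
frame = np.zeros((64, 64), dtype=np.uint8)
frame[20:44, 20:44] = 120
enhanced = enhance_edges(frame)
```

After enhancement, the square's border pixels are markedly brighter than in the source frame while its flat interior is unchanged, which is the visual effect a physician relies on when tracing a boundary.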

Figure 2. Leading-edge endoscope systems not only output high-resolution images but also enhance and information-augment them to assist in physician analysis and diagnosis.

In ophthalmology, doctors historically relied on elementary cameras that only took pictures of the inner portions of the eye. Subsequent analysis was left to the healthcare professional. Today, ophthalmologists use medical devices armed with embedded vision capabilities to create detailed 2-D and 3-D models of the eye in real time as well as overlay analytics such as volume metrics and the dimensions of critical ocular components. With ophthalmological devices used for cataract or lens correction surgery preparation, for example, embedded vision processing helps differentiate the cornea from the rest of the eye. It then calculates an overlay, complete with surgical cut lines, based on various dimensions it has ascertained. Physicians now have a customized operation blueprint, which dramatically reduces the likelihood of mistakes. Such data can even be used to guide human-assisted or fully automated surgical robots with high precision (Figure 3).

Figure 3. Robust vision capabilities are essential to robotic surgery systems, whether they be human-assisted (either local or remote) or fully automated.

Another example of how embedded vision can enhance medical devices involves advancements in clinical testing. In the past, it took days, if not weeks, to receive the results of blood and genetic tests. Today, more accurate results are often delivered in a matter of hours. DNA sequencers use embedded vision to accelerate analysis by focusing in on particular areas of a sample. After DNA molecules and primers are added to a slide, the sample begins to group into clusters. A high-resolution camera then scans the sample, creating numerous magnified images. After stitching these images together, the embedded vision–enhanced system identifies the clusters, along with their density parameters. These regions are then subjected to additional chemical analysis to unlock their DNA attributes. This method of visually identifying clusters before continuing the process drastically reduces procedure times and allows for more precise results. Faster blood or genetic test results enable quicker treatment and improve healthcare.
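The cluster-identification step amounts to thresholding the stitched image and grouping connected bright pixels, reporting each group's size and position. The sketch below uses a plain flood fill on a tiny grid; real sequencer pipelines operate on enormous images with far more sophisticated segmentation, so treat this purely as an illustration of the principle:

```python
from collections import deque

def find_clusters(image, threshold=128):
    """Group above-threshold pixels into 4-connected clusters and report
    each cluster's size and centroid - a toy version of the
    cluster-detection stage described above."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    clusters = []
    for r in range(h):
        for c in range(w):
            if image[r][c] >= threshold and not seen[r][c]:
                # Breadth-first flood fill over one cluster.
                queue, pixels = deque([(r, c)]), []
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and image[ny][nx] >= threshold
                                and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                cy = sum(p[0] for p in pixels) / len(pixels)
                cx = sum(p[1] for p in pixels) / len(pixels)
                clusters.append({"size": len(pixels), "centroid": (cy, cx)})
    return clusters
```

The resulting size figures are what the text calls density parameters: they tell the downstream chemistry which regions of the slide are worth analyzing further.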

Magnifying Minute Variations

Electronic systems are also adept at detecting and accentuating minute image-to-image variations that the human visual system is unable to perceive, whether due to insufficient sensitivity or inadequate attention. As research at MIT and elsewhere has shown, it's possible to accurately measure pulse rate simply by placing a camera in front of a person and logging the minute facial color change cycles that are reflective of capillary blood flow (Figure 4). Similarly, embedded vision systems can precisely assess respiration rate by measuring the periodic rise and fall of the subject's chest.
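The principle behind the MIT pulse-measurement work can be demonstrated on synthetic data: average a color channel over the face region in each frame, then find the dominant frequency in the physiologically plausible band. The function below is an illustrative sketch of that idea, not the researchers' actual algorithm (which amplifies the variations spatially as well):

```python
import numpy as np

def estimate_pulse_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0):
    """Estimate pulse rate from a per-frame series of mean green-channel
    values over the face region, by locating the dominant frequency in
    the plausible heart-rate band (42-240 bpm)."""
    signal = np.asarray(green_means, dtype=float)
    signal -= signal.mean()                       # remove the DC offset
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
    power = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

# Synthetic 10-second clip at 30 fps: a faint 1.2 Hz (72 bpm) color
# cycle buried in camera noise.
fps, seconds = 30, 10
t = np.arange(fps * seconds) / fps
rng = np.random.default_rng(0)
green = 100 + 0.5 * np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(len(t))
bpm = estimate_pulse_bpm(green, fps)
```

Even though the color oscillation here is only half a gray level in amplitude, far below what an observer could perceive, the frequency-domain peak recovers the 72 bpm rate cleanly.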

Figure 4. Embedded vision systems are capable of discerning (and amplifying) the scant frame-to-frame color changes in a subject's face, suggestive of blood flow.

This same embedded vision can be used to provide early indication of neurological disorders such as amyotrophic lateral sclerosis (also known as Lou Gehrig's disease) and Parkinson's disease. Early warning signs such as minute trembling or aberrations in gait—so slight that they may not yet even be perceptible to the patient—are less likely to escape the perceptive gaze of an embedded vision–enabled medical device.

Microsoft's Kinect game console peripheral (initially harnessed for image-processing functions such as a gesture interface and facial detection and recognition) is perhaps the best known embedded vision product. More recently, its official sanction expanded beyond the Xbox 360 to Windows 7 and 8 PCs. In March of this year, Microsoft began shipping an upgraded Kinect for Windows SDK that includes Fusion, a feature that transforms Kinect-generated images into 3-D models. Microsoft demonstrated the Fusion algorithm in early March by transforming brain scans into 3-D replicas of subjects' brains, which were superimposed onto a mannequin's head and displayed on a tablet computer's LCD screen.

Gesture and Security Enhancements

The Kinect and its 2-D and 3-D sensor counterparts have other notable medical uses. At the end of October 2011, Microsoft released "The Kinect Effect," a video showcasing a number of Kinect for Windows applications then under development. One showed a surgeon in the operating room flipping through LCD-displayed x-ray images simply by gesturing with his hand in the air (Figure 5). A gesture interface is desirable in at least two scenarios: when the equipment to be manipulated is out of arm's reach (thereby making conventional buttons, switches, and touchscreens infeasible) and when sanitary concerns prevent tactile control of the gear.

Figure 5. Gesture interfaces are useful in situations where the equipment to be controlled is out of arm's reach and when sanitary or other concerns preclude tactile manipulation of it.

Embedded vision intelligence can also ensure medical facility employees follow adequate sanitary practices. Conventional video-capture equipment, such as that found in consumer and commercial settings, records constantly while it is operational, creating an abundance of wasted content that must be viewed in its entirety in order to pick out footage of note. An intelligent video monitoring system is a preferable alternative; it can employ motion detection to discern when an object has entered the frame and use facial detection to confirm the object is a person. Recording will only continue until the person has exited the scene. Subsequent review—either by a human operator or, increasingly commonly, by the computer itself—will assess whether adequate sanitary practices have been followed in each case.
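The triggering logic described above can be sketched with simple frame differencing against a stored background. This is an assumption-laden toy (frames are flat lists of grayscale values, and the face-detection confirmation step is deliberately omitted), meant only to show why such a system records a fraction of what a conventional camera would:

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute per-pixel difference between two grayscale frames,
    given as flat lists of equal length."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def select_frames_to_record(frames, background, threshold=10.0):
    """Record only frames that differ from the stored background - the
    motion/presence trigger described above. A real system would next
    confirm the foreground object is a person via face detection."""
    return [f for f in frames if mean_abs_diff(f, background) > threshold]

# Static scene, then a person enters (pixel values jump), then leaves.
static = [50] * 16
person = [120] * 16
clip = [static, static, person, person, static]
kept = select_frames_to_record(clip, background=static)
```

Of the five frames in the toy clip, only the two containing the "person" are kept, so a reviewer (human or automated) examines minutes of footage rather than hours.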

Future Trends

In the future, medical devices will use increasing degrees of embedded vision technology to better diagnose, analyze, and treat patients. The technology has the potential to make healthcare safer, more effective, and more reliable than ever before. Subjective methods of identifying ailments and guess-and-check treatment methods are fading. Smarter medical devices with embedded vision are the future. Indicative of this trend, a recent Indiana University study found that analyzing patient data with simulation-modeling machine-learning algorithms can drastically reduce healthcare costs and improve quality. The artificial intelligence models used in the study for diagnosing and treating patients resulted in a 30–35% increase in positive patient outcomes.

Vision-enhanced medical devices will be able to capture high-resolution images, analyze them, and provide recommendations and guidance for treatment. Next-generation devices will also be increasingly automated, performing preoperation analysis, creating surgical plans from visual and other inputs, and in some cases even performing surgeries with safety and reliability that human hands cannot replicate. As embedded vision in medical devices evolves, more uses for it will emerge. The goal is to improve safety, remove human subjectivity and irregularity, and administer the correct treatment the first (and only) time.

Embedded vision has the potential to enable a range of medical and other electronic products that will be more intelligent and responsive than before—and thus more valuable to users. The technology can both add helpful features to existing products and open up brand new markets for equipment manufacturers. The examples discussed in this article, along with its accompanying sidebars, have hopefully sparked your creativity. A worldwide industry alliance of semiconductor, software and services suppliers is also poised to assist you in rapidly and robustly transforming your next-generation ideas into shipping-product reality (see sidebar "The Embedded Vision Alliance"). The era of cost-effective, power-efficient, and full-featured embedded vision capabilities has arrived. What you do with the technology's potential is fundamentally limited only by your imagination.

Sidebar: Mobile Electronics Applications Showcase Embedded Vision

Thanks to service provider subsidies coupled with high shipment volumes, relatively inexpensive smartphones and tablets supply formidable processing capabilities: multi-core GHz-plus CPUs and graphics processors, on-chip DSPs and imaging coprocessors, and multiple gigabytes of memory. Plus, they integrate front- and rear-viewing cameras capable of capturing high-resolution still images and HD video clips. Harnessing this hardware potential, developers have created medical applications for mobile electronics devices.

Sidebar: Codifying Intuition

Medicine, like any science, heavily relies on data to diagnose and treat patient ailments. However, as any healthcare professional will tell you, physicians and nurses also commonly rely on intuition. Simply by glancing at a patient, and relying in no small part on past experience, a medical professional can often deliver an accurate diagnosis, even in the presence of nebulous or contradictory information.

While we typically don’t understand the processes by which such snap judgments are created, this does not necessarily preclude harnessing them to create insightful automated systems. The field of machine learning, which is closely allied with embedded vision, involves creating machines that can learn by example. By inputting to a computer the data sets of various patient cases, complete with information on healthcare professionals’ assessments, the system's artificial intelligence algorithms may subsequently be able to intuit with human-like success rates.

Sidebar: The Embedded Vision Alliance

The Embedded Vision Alliance, a worldwide organization of technology developers and providers, is working to empower engineers to transform the potential of embedded vision technology into reality. The alliance’s mission is to provide engineers with practical education, information, and insights to help them incorporate embedded vision capabilities into products. To execute this mission, the alliance has developed a Web site with tutorial articles, videos, code downloads, and a discussion forum staffed by a diversity of technology experts. For more information on the Embedded Vision Alliance, including membership details, please email info@Embedded-Vision.com or call 925-954-1411.

Brian Dipert is editor-in-chief of the Embedded Vision Alliance (Walnut Creek, CA) and a senior analyst at Berkeley Design Technology Inc. (BDTI; Walnut Creek), a firm that provides analysis, advice, and engineering for embedded processing technology and applications. Dipert holds a bachelor’s degree in electrical engineering from Purdue University. Reach him at dipert@embedded-vision.com.

Kamran Khan is a technical marketing engineer at Xilinx Corp. (San Jose, CA). He has more than six years of technical marketing and application experience in the semiconductor industry, focusing on FPGAs. Prior to joining Xilinx, he worked on developing FPGA-based applications and solutions for customers in an array of end markets, including medical. Contact him at kamran.khan@xilinx.com.

Xilinx is a platinum member of the Embedded Vision Alliance. Xilinx enables smarter devices by pioneering new technologies and tools to make embedded vision more powerful. Xilinx’s All Programmable SoCs provide embedded applications with the productivity of a dual-core processor and FPGA fabric, all in one device. Building smaller, smarter and otherwise better medical equipment begins with the right devices and tools to accelerate applications. You can, for example, off-load real-time analytics to hardware accelerator blocks in the FPGA fabric, while the integrated microprocessor handles secure operating systems and communications protocols via software. Xilinx’s commitment to innovating All Programmable SoCs will usher in new medical devices that will be smarter, safer and more accurate. For more information, please visit http://www.xilinx.com.

3D Imaging With National Instruments' LabVIEW

Bookmark and Share

3D Imaging With National Instruments' LabVIEW

by Vineet Aggarwal
Embedded Systems Group Manager
National Instruments
This is a reprint of a National Instruments-published white paper, which is also available here.


3D imaging technology has come a long way from its roots in academic research labs, and thanks to innovations in sensors, lighting and most importantly, embedded processing, 3D vision is now appearing in a variety of machine automation applications. From vision-guided robotic bin-picking to high precision metrology, the latest generation of processors can now handle the immense data sets and sophisticated algorithms required to extract depth information and quickly make decisions. The LabVIEW 2012 Vision Development Module makes 3D vision accessible to engineers through seamless integration of software and hardware tools for 3D within one graphical development environment.

Introduction to 3D Imaging

There are several ways to calculate depth information using 2D camera sensors or other optical sensing technologies; binocular stereo vision, the approach covered in detail below, is among the most common.


New Stereo Vision Functions in Vision Development Module 2012

Starting with LabVIEW 2012, Vision Development Module now includes binocular stereo vision algorithms to calculate depth information from multiple cameras.  By using calibration information between two cameras, the new algorithms can generate depth images, providing richer data to identify objects, detect defects, and guide robotic arms on how to move and respond.

A binocular stereo vision system uses exactly two cameras. Ideally, the two cameras are separated by a short distance and are mounted almost parallel to one another. In the example shown in Figure 1, a box of spherical chocolates is used to demonstrate the benefits of 3D imaging for automated inspection. After calibrating the two cameras to determine their 3D spatial relationship, such as separation and tilt, two different images are acquired to locate potential defects in the chocolates. Using the new 3D Stereo Vision algorithms in the Vision Development Module, the two images can be combined to calculate depth information and visualize a depth image.

Figure 1. Example of depth image generated from left and right images using Stereo Vision

While it’s less apparent in the 2-dimensional images, the 3D depth image shows that two of the chocolates are not spherical enough to pass the high-quality standards.  The image in Figure 2 shows a white box around the defects that have been identified.

Figure 2. Depth image with the detected defects outlined in white

One important consideration when using stereo vision is that the computation of the disparity is based on locating a feature from a line of the left image and the same line of the right image. To be able to locate and differentiate the features, the images need to have sufficient texture, and to obtain better results, you may need to add texture by illuminating the scene with structured lighting.
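The line-by-line feature matching described above can be illustrated with a minimal sum-of-absolute-differences (SAD) search, the simplest block-matching cost. The function name, window size, and scanline data below are invented for the sketch; production algorithms match 2-D windows and add validation steps.

```python
def disparity_sad(left_row, right_row, x, window=3, max_disp=16):
    """Find the disparity for pixel x on one scanline by minimizing the
    sum of absolute differences (SAD) over a small window. Rows are 1-D
    lists of intensities; a real system matches 2-D windows."""
    half = window // 2
    best_d, best_cost = 0, float("inf")
    for d in range(min(max_disp, x - half) + 1):
        cost = sum(abs(left_row[x + k] - right_row[x - d + k])
                   for k in range(-half, half + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d

# A textured scanline whose features appear 4 pixels earlier in the right view
left  = [10, 10, 80, 20, 60, 30, 90, 10, 10, 10, 10, 10]
right = left[4:] + [10, 10, 10, 10]
print(disparity_sad(left, right, x=6))  # 4
```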

Finally, binocular stereo vision can be used to calculate the 3D coordinates (X, Y, Z) of points on the surface of an object being inspected. These points are often referred to as a point cloud (or cloud of points). Point clouds are very useful for visualizing the 3D shape of objects and can also be used by other 3D analysis software.  The AQSense 3D Shape Analysis Library (SAL3D), for example, is now available in the LabVIEW Tools Network, and uses a cloud of points for further image processing and visualization.
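Given a depth image and the camera intrinsics, generating such a point cloud is a direct application of the pinhole model. This sketch assumes a single rectified camera with focal length f (in pixels) and principal point (cx, cy); the function name and the toy depth values are illustrative.

```python
def depth_to_point_cloud(depth, f, cx, cy):
    """Back-project a depth image (list of rows, metres) into 3-D points
    using the pinhole model: X=(u-cx)*Z/f, Y=(v-cy)*Z/f, Z=depth[v][u]."""
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:                       # skip pixels with no depth
                points.append(((u - cx) * z / f, (v - cy) * z / f, z))
    return points

# Hypothetical 2x2 depth image, f = 100 px, principal point at (1, 1);
# the two zero-depth pixels are skipped
cloud = depth_to_point_cloud([[2.0, 0.0], [0.0, 4.0]], f=100.0, cx=1.0, cy=1.0)
print(cloud)
```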

How Stereo Vision Works

To better illustrate how binocular stereo vision works, Figure 3 shows the diagram of a simplified stereo vision setup, where both cameras are mounted perfectly parallel to each other, and have the exact same focal length.

Figure 3. Simplified Stereo Vision System

The variables in Figure 3 are:

  • b is the baseline, or distance between the two cameras
  • f is the focal length of a camera
  • XA is the X-axis of a camera
  • ZA is the optical axis of a camera
  • P is a real-world point defined by the coordinates X, Y, and Z
  • uL is the projection of the real-world point P in an image acquired by the left camera
  • uR is the projection of the real-world point P in an image acquired by the right camera

Since the two cameras are separated by distance “b”, both cameras view the same real-world point P in a different location on the 2-dimensional images acquired.  The X-coordinates of points uL and uR are given by:

uL = f * X/Z


uR = f * (X-b)/Z

The distance between these two projected points is known as the "disparity," and we can use the disparity value to calculate depth information, which is the distance between the real-world point P and the stereo vision system.

disparity = uL – uR = f * b/Z

depth = f * b/disparity
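Plugging numbers into the two formulas above shows how a pixel offset becomes a physical distance. The focal length, baseline, and projected coordinates below are invented for the example.

```python
def depth_from_disparity(f_px, baseline_m, disparity_px):
    """depth = f * b / disparity, from the simplified parallel-camera
    geometry above. f in pixels, baseline in metres, disparity in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Hypothetical numbers: 700 px focal length, 10 cm baseline.
# A point projected at uL = 350 px and uR = 315 px has disparity 35 px:
uL, uR = 350, 315
print(depth_from_disparity(700, 0.10, uL - uR))  # 700*0.10/35 = 2.0 metres
```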

In reality, an actual stereo vision setup is more complex and looks more like the typical system shown in Figure 4, but the same fundamental principles still apply.

Figure 4. Typical Stereo Vision System

The ideal assumptions made for the simplified stereo vision system cannot be made for real-world stereo vision applications. Even the best cameras and lenses introduce some level of distortion into the acquired image, so a typical stereo vision system also requires calibration to compensate.  The calibration process involves acquiring images of a calibration grid at different angles to calculate the image distortion as well as the exact spatial relationship between the two cameras.  Figure 5 shows the calibration grid included with the Vision Development Module.

Figure 5. A calibration grid is included as a PDF file with the Vision Development Module

The Vision Development Module includes functions and LabVIEW examples that walk you through the stereo vision calibration process to generate several calibration matrices that are used in further computations to calculate disparity and depth information.  You can then visualize 3D images as shown earlier in Figure 1, as well as perform different types of analysis for defect detection, object tracking, and motion control.
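The lens distortion that calibration estimates is commonly approximated with a polynomial radial model; this is a generic textbook model, not a description of the Vision Development Module's internals, and the coefficients k1 and k2 below are invented.

```python
def apply_radial_distortion(x, y, k1, k2):
    """Apply the common two-coefficient radial distortion model to
    normalized image coordinates: x_d = x * (1 + k1*r^2 + k2*r^4).
    Calibration estimates k1 and k2 (plus the camera matrix) from
    images of a grid acquired at different angles."""
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2
    return x * scale, y * scale

# Hypothetical barrel distortion (k1 < 0): points pull toward the centre,
# so 0.5 maps to 0.5 * (1 - 0.2*0.25) = 0.475
xd, yd = apply_radial_distortion(0.5, 0.0, k1=-0.2, k2=0.0)
print(xd)
```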

Stereo Vision Applications

Stereo vision systems are best suited for applications in which the camera settings and locations are fixed and won't experience large disturbances. Common applications include navigation, industrial robotics, automated inspection and surveillance.


Navigation

Autonomous vehicles use depth information to measure the size and distance of obstacles for accurate path planning and obstacle avoidance.  Stereo vision systems can provide a rich set of 3D information for navigation applications, and can perform well even in changing light conditions.

Industrial Robotics

A stereo vision system is useful in robotic industrial automation of tasks such as bin picking or crate handling. A bin-picking application requires a robot arm to pick a specific object from a container that holds several different kinds of parts.  A stereo vision system can provide an inexpensive way to obtain 3D information and determine which parts are free to be grasped.  It can also provide precise locations for individual products in a crate and enable applications in which a robot arm removes objects from a pallet and moves them to another pallet or process.

Automated Inspection

3D information is also very useful for ensuring high quality in automated inspection applications.  You can use stereo vision to detect defects that are very difficult to identify with only 2-dimensional images. Ensuring the presence of pills in a blister pack, inspecting the shape of bottles and looking for bent pins on a connector are all examples of automated inspection where depth information has a high impact on ensuring quality.


Surveillance

Stereo vision systems are also good for tracking applications because they are robust in the presence of lighting variations and shadows. A stereo vision system can accurately provide 3D information for tracked objects, which can be used to detect abnormal events, such as trespassing individuals or abandoned baggage. Stereo vision systems can also be used to enhance the accuracy of identification systems such as facial recognition or other biometrics.

Summary and Next Steps

The new stereo vision features in the LabVIEW Vision Development Module bring 3D vision capabilities to engineers in a variety of industries and application areas.  Through the openness of LabVIEW, engineers can also use third-party hardware and software 3D vision tools for additional advanced capabilities, including the SICK 3D Ranger Camera for laser triangulation imaging and the AQSense 3D Shape Analysis Library for 3D image processing. The LabVIEW 2012 Vision Development Module makes 3D vision accessible to engineers in one graphical development environment.


An Architecture for Compute-Intensive, Custom Machine Vision

Bookmark and Share

An Architecture for Compute-Intensive, Custom Machine Vision

By Tom Catalino
Vice President
Critical Link, LLC
Asheesh Bhardwaj
DSP Senior Applications Engineer
Texas Instruments
This is a reprint of a Texas Instruments-published white paper, which is also available here (560 KB PDF).


Machine vision technology is growing in adoption, and it has been and will continue to be deployed in a variety of application areas. These include instrumentation and inspection equipment used to manufacture a wide range of products as varied as sheet goods, pharmaceuticals, semiconductors, razor blades and automobiles. Each application of machine vision carries a unique set of vision system requirements that are often not readily served by preconfigured devices. This white paper addresses a particular class of machine vision systems – those that are compute intensive – by detailing an architecture and additional resources the reader can use to move forward with their own vision system design, based on this, or a similar, architecture.

Problem statement

Many machine vision algorithms are very compute intensive and therefore may require dedicated hardware. Each application carries unique requirements that lend themselves to programmable architectures such as DSPs or FPGAs, rather than to a dedicated, fixed-function device or core, such as a hardwired codec or ASIC for vision analytics and video compression.

There are a number of readily available vision products on the market today that provide the building blocks suitable for a standards-based vision system. These include open source vision algorithms from OpenCV. The standards many vision systems must support include video encoding or transcoding using a number of different standard video formats (MPEG-4, H.264, etc.) as well as a standard set of digital interfaces (USB, GigE, Camera Link, HDMI, etc.).

But what if you needed to develop a very specialized camera? Maybe you need a camera where full-resolution, high-definition video at 30 frames per second (fps) or 60 fps is not necessarily the objective. Instead, you require fully customized algorithmic processing on a small region of interest at a very high frame rate – in the thousands of fps, with 50×50 pixel resolution. At the other end of the spectrum, you might need to execute a custom algorithm on an image that is non-standard, very high resolution, at a lower frame rate with low overall power consumption. And perhaps the volumes needed for this special camera are low to medium – not enough to justify the schedule, expense or risk required for the development of a specialized ASIC.

This paper provides an architecture that addresses such unique requirements, and provides resources necessary to move forward with your own design based on the outlined concepts.


An architecture used by Critical Link, a member of the Texas Instruments Incorporated (TI) Design Network, in a number of designs with these types of requirements leverages the Spartan-6 family of FPGAs from Xilinx, along with the power and flexibility of the TI OMAP-L138 DSP+ARM9™ processor, including its uPP interface. The combination of these two devices, illustrated in this paper with Critical Link's MityDSP-L138F System on Module (SoM), allows for low-power systems that can efficiently address unique requirements.

Programmable logic with the Xilinx Spartan-6

Incorporating programmable logic into any hardware design increases the overall flexibility of the system. In this case, the Xilinx Spartan-6 is a particularly good match for the TI OMAP-L138 processor and the types of systems addressed in this paper (Figure 1). The Spartan-6 provides SelectIO technology, which can be configured to support a wide variety of signaling standards, including LVDS, LVTTL, LVCMOS and many others. It can be customized to interface directly to the digital imaging sensor most suitable for the system being designed. Further, it can also interface to analog-based sensors, such as CCD imagers, by placing an ADC between the FPGA and the sensor itself.

Figure 1. Image processing in Xilinx FPGA

Once the image data has been acquired by the FPGA, the programmable logic can be used to quickly and efficiently perform a wide variety of operations on the data. The FPGA is well suited for operations such as edge or corner detection, extraction and low-level analytics operations. In addition, basic frame-to-frame operations can also be efficiently performed in the FPGA.

Another advantage of the FPGA is that it can be used to cull raw data down to only that data which must be touched by the DSP/applications processor. This is particularly useful when the processing requirements are more than the DSP/applications processor can handle – such as analyzing every frame of a full-resolution video stream at high frame rate, or when the sheer volume of data is higher than the processor’s digital interfaces can take in. This culling of the raw data can be performed by using the FPGA to statically or dynamically identify particular regions of the image that would be of interest to the applications processor (for example, a license plate or a face), or by using the FPGA to calculate key statistical data such as a histogram or background estimation on the images and provide this information to the OMAP-L138 processor.
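As a software analogue of this culling strategy, the sketch below keeps only the image rows containing enough above-threshold pixels and returns a per-row hit count as the summary statistic handed to the host processor. The function, threshold, and data are invented for illustration; in the real design this logic would live in the FPGA fabric.

```python
def cull_rows(frame, threshold, min_hits):
    """Mimic FPGA-side data culling in software: keep only rows with at
    least `min_hits` pixels above `threshold`, and return a histogram-style
    per-row summary the host processor can inspect."""
    kept, summary = [], []
    for idx, row in enumerate(frame):
        hits = sum(1 for px in row if px > threshold)
        summary.append(hits)
        if hits >= min_hits:
            kept.append((idx, row))
    return kept, summary

frame = [[5, 5, 5], [5, 200, 220], [5, 5, 5], [190, 210, 5]]
kept, summary = cull_rows(frame, threshold=128, min_hits=2)
print([idx for idx, _ in kept])  # [1, 3]
```

Only the two "interesting" rows would cross the link to the processor, cutting the required bandwidth in half for this toy frame.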

The Xilinx Spartan-6 FPGA is capable of capturing the data at a very high frame rate and dropping extraneous full or partial frames. It can then pass on only the interesting data for processing in the DSP, reducing the overall bandwidth required between the FPGA and the DSP/applications processor.

Xilinx Spartan-6 devices provide up to 180 DSP48A1 slices that can be leveraged to implement high-performance video acceleration blocks using parallel hardware architecture. Xilinx also provides video- and image-processing IP cores that can be used to reduce development time.

Finally, Xilinx FPGAs are also used to implement non-standard interfaces. This could be a multiple display or multiple camera system, custom synchronous serial interfaces, a custom parallel interface, or simply added capability for additional standard interfaces for which the applications processor within the system lacks support. These interfaces can again take advantage of the variety of physical layers provided by the FPGA (LVDS, etc.).

Carrying image data from FPGA to DSP via uPP

Critical Link has used the uPP interface on the TI OMAP-L138 processor to carry image data from the Xilinx Spartan-6 to the DSP on board the OMAP-L138 processor. This interface is specifically designed to move large amounts of data continuously into or out of the processor’s memory.

The uPP can clock one data word (8 or 16 bits) per clock cycle. (Or it can handle two data words per clock for double-data-rate, but the clock speed must be half as fast.) The uPP clock rate can be up to half of the processor clock rate, with a maximum uPP clock of 75 MHz, allowing a throughput of up to 150 MBytes/s. This elegantly simple uPP interface allows for easy integration with an FPGA (Figure 2).

Figure 2. uPP Interface between the TI OMAP-L138 DSP+ARM9™ processor and an FPGA

The TI OMAP-L138 processor actually includes two uPP interfaces, each of which can be independently configured. From a hardware point of view, the uPP interface is a fairly simple synchronous data interface. It includes a clock pin, data pins and several control pins that indicate valid data and start/wait conditions. In fact, the interface can be used gluelessly with some parallel ADCs and DACs.

Using the above uPP data rates as an example, Table 1 illustrates the theoretical maximum fps that can be transferred from the Xilinx Spartan-6 FPGA to the OMAP-L138 processor system memory. Note that these calculations assume no inter-frame or inter-line gaps that may be required by the sensor. The ability of the OMAP-L138 processor's 456-MHz ARM9™ and 456-MHz floating-point DSP to execute the intended algorithm on the data must be taken into account when sizing up a system, as this will affect the overall processed frame rate.

Table 1. Theoretical frame rates via uPP at 150 MB/s
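The arithmetic behind such a table is simply the link rate divided by the frame size, ignoring inter-frame and inter-line gaps. The resolution used below is an example chosen here, not an entry reproduced from the original table.

```python
def max_fps(width, height, bytes_per_pixel, link_bytes_per_s=150_000_000):
    """Theoretical frames/s through a link: link rate / frame size,
    ignoring inter-frame and inter-line gaps (as the table does)."""
    return link_bytes_per_s / (width * height * bytes_per_pixel)

# E.g. VGA, 8-bit monochrome, over the 150 MB/s uPP figure quoted above:
print(round(max_fps(640, 480, 1)))  # 488 frames/s, rounded
```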

Preferably, this sizing should be done by experimenting with the intended algorithms on a prototype system or evaluation module. In any case, when the application processing in the DSP or the ARM® is identified as the system design's limiting factor, this is precisely the situation where the FPGA can be leveraged the most, by identifying repetitive operations that can be moved out of the OMAP-L138 processor and into the FPGA. This leaves the DSP and ARM to attend to non-repetitive operations, more global operations, or those requiring multiple frames at once. The FPGA can also be leveraged to reduce the overall data rate if the capture rate is very high and the uPP peripheral is unable to handle the full volume of raw data.

Figure 3. uPP into and out of the TI OMAP-L138 DSP+ARM9 processor

Using the uPP’s two independent channels, an architecture like that shown in Figure 3 can be conveniently implemented. Inbound data can be pre-processed by the FPGA and sent via uPP to the DSP in the OMAP-L138 processor, where it is further operated on to perform intelligent image/video analytic operations. Finally, the data can be sent back through the FPGA to any of the output interfaces. As it passes through the FPGA for the second time, low-level output processing can be performed, such as overlaying the video on top of a graphical user interface provided by the OMAP-L138’s ARM core, or providing text or other graphical overlays as directed by the DSP in the OMAP-L138 processor.

Integrated ARM and DSP processing with the OMAP-L138 processor

Up to this point, we’ve discussed data acquisition into the FPGA, FPGA processing of the data, and data transfer to the OMAP-L138 processor. The OMAP-L138 itself provides a unique architecture for dealing with image data, integrating a 456-MHz ARM9 applications processing core and a 456-MHz TMS320C674x DSP core (Figure 4).

Figure 4. Image processing in the TI OMAP-L138 DSP+ARM9 processor

The combined OMAP-L138 processor + Xilinx Spartan-6 FPGA architecture is typically leveraged at Critical Link by allowing the DSP in the OMAP-L138 processor to perform the remaining algorithmic “heavy lifting” on the pre-processed image data as it arrives from the FPGA. This work can be performed by implementing custom algorithms hand-coded in C or C++ and optimized for the DSP using the TI-provided compilers in the Code Composer Studio™ integrated development environment (IDE), or by using the already optimized libraries provided by TI for image and video processing: IMGLIB and VLIB. OpenCV is an open-source library featuring many vision-related algorithms that are easy to port to the DSP.

The DSP is capable of handling computationally-intensive image analytics and processing operations, such as object detection, object identification, edge detection, color conversions, image filtering, object tracking and resizing.

Machine vision algorithms require many filtering operations for finding shapes, cracks, dirt and other anomalies on an object. Dilation, erosion, Sobel and Canny filters, Harris corner detection, Hough transforms and Haar classifiers help with object finding and feature extraction. Object detection and tracking algorithms are mainly supported by Lucas-Kanade optical flow, Kalman filtering, the Bhattacharyya distance and Gaussian models.
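As a concrete instance of one of these filters, the sketch below computes a Sobel gradient magnitude on a toy grayscale image in plain Python; a DSP implementation would use the optimized IMGLIB/VLIB kernels instead of nested loops.

```python
def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| with the 3x3 Sobel
    kernels. `img` is a list of rows of grayscale values; the
    one-pixel border is left at zero."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            out[y][x] = abs(gx) + abs(gy)
    return out

# A vertical step edge: the response peaks along the transition
img = [[0, 0, 255, 255]] * 4
print(sobel_magnitude(img)[1])  # [0, 1020, 1020, 0]
```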

Most image-processing operations happen in grayscale or RGB color mode. If the captured image data is in YCbCr format, the luminance (Y) data can be used for grayscale processing. Depending on the type of processing needed, captured data in RGB format can either be converted to YCbCr or remain in RGB. The functions that perform extensive signal-processing algorithms are part of either IMGLIB or VLIB, or are available in OpenCV. Applications also utilize DMA access to external memory, so that data can be transferred into internal memory for faster processing.

Moving object segmentation is an example of machine vision processing that requires video to be processed in grayscale mode, with the background removed and the foreground data processed. This is done with morphological operations such as dilation and erosion on the foreground mask. The object boundary is formed using a connected-component model (Figure 5).

Figure 5. Moving object segmentation
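The dilation and erosion mentioned above can be sketched with minimal binary 3x3 operators; applying erosion then dilation (an "opening") removes isolated noise pixels from a foreground mask while preserving solid blobs. The mask and helper names are invented for the example.

```python
def erode(mask):
    """3x3 binary erosion: a pixel stays set only if its whole
    neighbourhood is set (border pixels are cleared)."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y][x] = int(all(mask[y+dy][x+dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out

def dilate(mask):
    """3x3 binary dilation: a pixel becomes set if any in-bounds
    neighbour is set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(any(mask[y+dy][x+dx]
                                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                                if 0 <= y+dy < h and 0 <= x+dx < w))
    return out

# A solid 3x3 foreground blob plus a single-pixel noise speck at (2, 5):
mask = [[0]*6, [0,1,1,1,0,0], [0,1,1,1,0,1], [0,1,1,1,0,0], [0]*6]
opened = dilate(erode(mask))
print(sum(map(sum, opened)))  # 9 (the speck is gone, the blob survives)
```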

Moving object segmentation at 16-bit precision has lower performance than a single Gaussian model at the same precision. Depending on the type of operation needed, the appropriate resolution fed to the DSP can be chosen. There is an additional 50% overhead for VLIB processing when data resides in external memory (Table 2).

Table 2. VLIB performance benchmarks

External system interfaces are connected to and managed by the ARM core, so that data handling and the drivers' processing load are handled there. Careful partitioning of the computations across the three available cores will extract the most processing power from the architecture.

As the DSP algorithmic work is being performed, the DSP can communicate with the ARM in the OMAP-L138 processor, and, through the ARM, with the external world. This may be a local or remote user interface, or may even be another processing subsystem within a larger system. The DSP may communicate with the ARM via shared memory, mailboxes or a variety of other mechanisms provided by TI’s DSPLink library. RingIO, MessageQ and Notify provide the interfaces within DSPLink for this communication. The shared memory region between the ARM and the DSP is used for sharing the data pointers between these processors. The DSPLink library increases the efficiency of data exchange.
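As a conceptual analogue of this ARM/DSP message passing (and only an analogue: none of the names below come from the DSPLink API), two Python threads and a pair of queues can stand in for the two cores exchanging work items and results.

```python
import queue
import threading

def dsp_worker(inbox, outbox):
    """Stand-in for the DSP side: receive a frame id, 'process' it,
    and send a result message back to the 'ARM' side."""
    while True:
        msg = inbox.get()
        if msg is None:                 # shutdown sentinel from the "ARM"
            break
        outbox.put(("processed", msg))

arm_to_dsp, dsp_to_arm = queue.Queue(), queue.Queue()
t = threading.Thread(target=dsp_worker, args=(arm_to_dsp, dsp_to_arm))
t.start()

for frame_id in range(3):               # the "ARM" submits three frames
    arm_to_dsp.put(frame_id)
arm_to_dsp.put(None)                    # tell the worker to stop
t.join()

results = [dsp_to_arm.get() for _ in range(3)]
print(results)  # [('processed', 0), ('processed', 1), ('processed', 2)]
```

In the real system the "queues" are shared-memory regions exchanging data pointers, and DSPLink's RingIO, MessageQ and Notify provide the transport.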

Configuration data may be passed from the ARM to the DSP, while measurements, statistics, and raw or processed image/video data may be passed to the ARM for sharing with the outside world. The ARM is particularly well suited for communications and display functions, as it can run a full-featured operating system such as Embedded Linux™, Windows® CE, QNX® or ThreadX.


This paper illustrates an architecture that addresses some of the widely varying requirements in the machine-vision market. This architecture leverages the power of Xilinx Spartan-6 FPGAs (though any FPGA family could be used in the design), the floating- and fixed-point DSP and ARM9™ provided in the TI OMAP-L138 processor, and the convenient, easy-to-use uPP interface peripheral that it provides.

Additional information about each of these topics can be explored through the references provided below. In particular, Critical Link's MityDSP-L138F SoM and Vision Development Kit offer a convenient platform engineers can use to explore this architecture further in their own designs.

Useful links


  1. "An Optimized Vision Library Approach for Embedded Systems", G. Dedeoglu, B. Kisacanin, D. Moore, V. Sharma, and A. Miller, Proceedings of the IEEE Workshop on Embedded Computer Vision, pp. 8-13, 2011.
  2. "Moving Object Segmentation for Security and Surveillance Applications", G. Dedeoglu, D. Moore, Texas Instruments, Inc., Proceedings of the Embedded Vision Summit, September 2012.

Using Xilinx FPGAs to Solve Endoscope System Architecture Challenges

Bookmark and Share

Using Xilinx FPGAs to Solve Endoscope System Architecture Challenges

By Jon Alexander, Technical Marketing Manager for ISM (Industrial, Scientific, Medical) Markets
Xilinx Corporation
Image enhancement functions – noise reduction, edge enhancement, dynamic range correction, digital zoom, scaling, etc. – are key elements of many embedded vision designs, improving the ability of downstream algorithms to automatically extract meaning from the image. Interface flexibility and performance are also important attributes in many embedded vision systems. All of these concepts are discussed in this case study article, a reprint of a Xilinx-published white paper, which is also available here (543 KB PDF).

The advent and growth of minimally invasive surgery (MIS) has made endoscopes an indispensable part of improved surgical procedures. Advances in semiconductor manufacturing and imaging technology continue to fuel innovation and pave a path for endoscopes to be used in many new applications each year.

This white paper describes how Xilinx® FPGAs can enable endoscope system manufacturers to meet complex design constraints and produce competitive products: low power, small form-factor endoscope camera heads; low cost, high-performance camera control units (CCUs); and low cost, versatile image management devices.

Introduction to Endoscope Systems

Technology advancements have enabled the use of endoscopes in many different applications. Health care providers have aggressively adopted endoscopic techniques and continue to challenge suppliers to push the boundaries of technology for many reasons.

Endoscopy provides unprecedented diagnostic capabilities for certain ailments that no other method can match today, such as detecting polyps in the colon and ulcers or fungi in the GI tract. Diagnosis with endoscopes is radiation-free and can be done with minimal pain to the patient. With these inherent benefits, physicians have aggressively adopted endoscopic techniques and are continually demanding innovation to improve imaging capabilities even further. Such demands force suppliers to deploy new techniques such as Narrow Band Imaging, Autofluorescence Imaging, and Multi-Band Imaging. These methods provide much more accurate visualization of blood vessels, lesions, and mucosal surfaces than could be achieved in earlier endoscope systems, enabling physicians to more accurately diagnose patient ailments.

Endoscopy greatly improves the quality of patient care by enabling minimally invasive surgical techniques. While traditional surgeries required large incisions to enable surgeons to view the subject tissue and to use large, hand-held instruments, endoscopes and laparoscopes (a type of endoscope with a rigid tube) enable minimally invasive surgical techniques with only one or two incisions less than a centimeter in length. This greatly reduces risk of infection and provides faster patient recovery time, allowing patients to leave the hospital in days compared to weeks for many procedures. Shorter hospital stays are a big benefit to the cost structure of health care providers and insurance companies. However, laparoscopic procedures tend to be more expensive and take longer to perform than open cavity procedures. So the health care providers are continually pushing for innovative ways to perform operations more efficiently and at a lower cost point.

Physicians want equipment to be small, flexible and lightweight so they can easily position it for sustained periods of time, maximizing patient comfort without causing operator fatigue. In both diagnostic and surgical endoscopy, the burden is placed on the physician to maneuver equipment through small openings to obtain a usable visual image of the subject. In diagnostic procedures with flexible endoscopes, the physician often has to hold the endoscope for a period of time. In surgical procedures, although equipment is mounted on a mechanical assembly, several laparoscopes and operating tools are used simultaneously in a confined location, leading to complexity in setting up for a procedure.

Within this small system, the electronics must also generate little heat, since there is minimal clearance for heat dissipation techniques and the tolerance for heat at the exterior of handheld products is low. This demands that suppliers keep the mechanical footprint of the electronics to a minimum while also satisfying low-power design constraints.

Endoscope System Architecture

A typical endoscope system has five key components (Figure 1).

Figure 1. Endoscope System Components

Camera Head

The camera head is the physical device that contains the CCD or CMOS image sensor, pre-processing electronics and connections for the light source, and various mechanical apparatus such as water tubes, air, vacuum, and biopsy tools.

In flexible endoscopes, the image sensor is located at the distal end of the tube; in rigid endoscopes the sensor is located at the proximal end of the tube, often in the camera head itself. The camera head is connected to the camera control unit through a cable that supplies power to the camera head and enables data transfer between the two units.

One of the main design constraints of the camera head is to keep the mechanical form factor and the electronics footprint to a minimum for improved ease of use. Reducing the electrical components through integration of functionality and using smaller component packages provides system designers the ability to shrink the overall mechanical envelope. To further reduce the form factor, system designers can also reduce the amount of processing functions performed in the camera head. This places the burden of performing the majority of the image processing on the CCU. A typical system-level functional block diagram of an endoscope camera head is shown in Figure 2.

Figure 2. Endoscope Camera Head Diagram

The initial image is generated on the CMOS/CCD image sensor and is then passed to the downstream image processing chain before being transmitted to the CCU. The standard Bayer pattern for image sensors is passed through the lens shading and distortion blocks to help minimize noise and create uniformity in the image. The Color Filter Array (CFA) interpolates between pixels and converts the Bayer image to RGB color space. The RGB image can then be used to automatically control the Auto Exposure, Gain, White Balance, and other parameters such as Focus.
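The CFA interpolation step described above can be illustrated with a minimal sketch. This example assumes an RGGB Bayer pattern and collapses each 2x2 mosaic cell into one RGB pixel, averaging the two green samples; real pipelines interpolate at full resolution with bilinear or edge-aware filters, so this is purely illustrative.

```python
def demosaic_rggb(bayer):
    """bayer: 2D list of raw sensor values with an RGGB mosaic.
    Returns a half-resolution image of (R, G, B) tuples."""
    rows, cols = len(bayer), len(bayer[0])
    out = []
    for y in range(0, rows, 2):
        row = []
        for x in range(0, cols, 2):
            r = bayer[y][x]                                # top-left: red
            g = (bayer[y][x + 1] + bayer[y + 1][x]) / 2.0  # average two greens
            b = bayer[y + 1][x + 1]                        # bottom-right: blue
            row.append((r, g, b))
        out.append(row)
    return out

raw = [[10, 20, 12, 22],
       [21, 30, 23, 32],
       [11, 24, 13, 26],
       [25, 31, 27, 33]]
print(demosaic_rggb(raw))
```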

  • Auto Exposure - Controls the brightness of the image by allowing the sensor to absorb more or less light. Several functions, from the mechanical shutter through the digital gain, affect exposure.
  • Gain - The amount of amplification that is performed on the sensor output. This is one mechanism used to adjust exposure.
  • White Balance - Controls the color fidelity of the image and usually employs the color correction block to align the image to the appropriate color temperature of the scene.
  • Focus - When using an Auto-Focus lens, this provides the sharpest image automatically. Focus is generally independent of the color or exposure controls.

Adjustment of the exposure, gain, and white balance is most effective when completed as far upstream as possible, where there is a minimal amount of noise. These adjustments are completed in a two-step process: first, a dedicated hardware block gathers the image statistics; second, the image statistics are passed to software, which decides how to set the various parameters in the system for optimum quality. The Exposure, Gain, and White Balance functions are interdependent to some extent and can therefore be complicated to implement. For example, if the exposure control needs to change the iris size, there is a downstream effect of altering the white balance. All of the automatic functions are performed simultaneously with the image processing steps, on a frame-by-frame basis.
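The two-step flow described above can be sketched as follows. A "hardware" statistics pass computes per-channel averages, then a "software" decision step derives white balance gains. Gray-world balancing is used here only as a stand-in for the proprietary algorithms a real camera system would implement.

```python
def channel_stats(pixels):
    """Step 1: accumulate per-channel means over a frame of (R, G, B) pixels."""
    n = len(pixels)
    sums = [0.0, 0.0, 0.0]
    for p in pixels:
        for c in range(3):
            sums[c] += p[c]
    return [s / n for s in sums]

def white_balance_gains(stats):
    """Step 2: gray-world decision - scale R and B so their means match G."""
    r_avg, g_avg, b_avg = stats
    return (g_avg / r_avg, 1.0, g_avg / b_avg)

frame = [(100, 120, 80), (110, 130, 90), (90, 110, 70)]
stats = channel_stats(frame)
gains = white_balance_gains(stats)
balanced = [tuple(v * g for v, g in zip(p, gains)) for p in frame]
print(gains)
```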

Camera Head Design Challenges

The image sensors and analog circuitry used in endoscopes have a low tolerance for power supply noise. While power is delivered to the camera head through a long cable from the CCU, the power within the camera head is regulated and filtered to maintain a stable, spike-free supply to the camera system. This power regulation must be designed carefully because voltage spikes occur naturally as current switches across the impedance of long cables. In a typical system, voltage spikes are mitigated by voltage regulation, using bulk capacitance of the Printed Circuit Board (PCB) and by adding additional bulk and bypass capacitors to the PCB. The added capacitance also helps reduce noise generated from local switching activity on the board such as from logic devices. However, in a small form-factor system like an endoscope camera head, there simply is not enough bulk capacitance on the PCB or enough space to add capacitors around the components.

The best solution to minimize power supply noise is to reduce the power consumed by logic devices in the camera head. This limits power spikes and the switching current across the power cable, thereby reducing local noise of the system. The benefits from reducing power are twofold; first, the cost and mechanical footprint are lowered; second, heat is minimized. Since power supplies for medical systems have uniquely stringent requirements for safety and quality that must be adhered to, increasing the system power increases the cost and complexity of the power supply design. The camera head has low heat dissipation capabilities, so lowering the power translates directly to lower heat within the camera head.

System designers have several options when choosing the central image processor in an endoscope camera head. One solution is to implement multiple ASSPs and/or DSP processors to support the functions; however, these types of implementations are not the most efficient use of PCB real estate. A single device solution is a better choice for two reasons. It has a much smaller PCB footprint compared to a multiple device solution, and it can offer improved manufacturing reliability since there is only one component to be assembled. The system designer must decide whether to implement an ASIC or an FPGA. The ASIC implementation has substantial NRE and design costs, which might not provide the return on investment needed to justify the costs for this implementation. FPGA technology provides the best cost and performance in a single device, low-power solution for developers of endoscope systems.

Camera Control Unit

The CCU receives image data from the camera head in RGB or YUV format through a DVI or SDI interface, and then performs any combination of processing steps to enhance the image quality. Dedicated image processing devices are typically used to deliver optimal image quality at high resolution with minimal lag.

A functional block diagram of a typical CCU is shown in Figure 3.

Figure 3. CCU Functional Diagram

The first stage of image enhancements typically includes:

  • Noise reduction
  • Edge enhancement
  • Wide dynamic range correction

The image enhancement stage is followed by user-controlled image adjustments, which typically include:

  • Digital zoom
  • Video scaler
  • Static image capture

A processor is often employed to manage data flow and to control the algorithms and functions of the CCU. It also can oversee communications to the camera head, the image management unit, and the display.

Noise Reduction

The best noise reduction algorithm to implement in a given endoscopy system is highly dependent on the specific application and the sensor quality. In general, endoscopy generates video of slow-moving objects, so high-speed motion noise is not a concern. However, it is still critical for video to be blur-free, as motion blur can adversely affect the sharpness of subject tissue, making it difficult to perform an accurate diagnosis. High frame rate image sensors can help reduce motion blur, but they do not solve the problem completely; noise reduction algorithms in the CCU add further improvement. In applications where small, low frame rate image sensors are employed, the noise reduction algorithm plays a key role in improving the image quality.

Temporal, or motion-based, noise reduction techniques are often best suited for endoscopy applications where motion is a concern. In temporal filtering, a noise model is created for individual pixels over time, and a low pass filter is applied to eliminate rapid pixel changes. These rapid changes are predominantly caused by noise, so by filtering them, only the slower-moving pixel changes related to the subject are allowed to pass, leaving a clean, blur-free image.
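A minimal sketch of the per-pixel low-pass filtering idea is shown below as an exponential moving average. The fixed `alpha` coefficient is an illustrative simplification; production filters are motion-adaptive and run in dedicated hardware.

```python
def temporal_filter(frames, alpha=0.25):
    """frames: list of same-sized 1D pixel lists. Runs a per-pixel IIR
    low-pass filter across the frames and returns the filtered result."""
    state = list(frames[0])
    for frame in frames[1:]:
        for i, v in enumerate(frame):
            # Keep most of the history, admit only a fraction of the new sample
            state[i] = (1.0 - alpha) * state[i] + alpha * v
    return state

# A static scene with one noisy spike in the middle frame:
frames = [[100, 100], [100, 180], [100, 100]]
print(temporal_filter(frames))
```

The spike at the second pixel is heavily attenuated while the static pixel passes through unchanged, which is the behavior described above.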

In applications where motion is less of a concern, such as in colonoscopy, or where a small, low resolution sensor is used, such as in cystoscopy, a spatial noise reduction technique can be employed. In spatial noise reduction, noise is detected and corrected on a frame-by-frame basis. This technique can exacerbate blur, so a hybrid spatial/temporal filter might be needed.
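Spatial noise reduction can be sketched with a simple median filter, shown here as a 3-tap filter along one row of pixels for brevity. Medians reject impulsive noise within a single frame without needing any history, at the cost of some softening, which is why the hybrid approaches mentioned above exist.

```python
def median3(row):
    """Replace each interior pixel with the median of its 3-pixel neighborhood."""
    out = list(row)
    for i in range(1, len(row) - 1):
        out[i] = sorted(row[i - 1:i + 2])[1]  # middle value of three neighbors
    return out

print(median3([10, 10, 200, 10, 10]))  # the impulse at index 2 is rejected
```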

Edge Enhancement

Edge enhancement is an important image processing technique for endoscopes because it provides physicians a better view of the boundary of abnormalities in tissue. As an example, small blood vessels might be difficult to discern from surrounding tissue based on color alone. Edge enhancement might be required to generate a sharper view of the vessels so they can be further analyzed by the physician. Edge enhancement is also commonly used for improved viewing of tissue textures and the surface of mucous membranes.

A variety of edge enhancement technologies can be leveraged for endoscopy. Sobel operators and bilateral filters are two popular implementations. Many endoscopy system suppliers have developed proprietary enhancement techniques for specific applications. Most systems give the user control over the amount of filtering to provide, in the form of "low," "medium," and "high" settings.
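The Sobel-based approach can be sketched as follows: compute the gradient magnitude and add a scaled copy back to the image. The `strength` parameter is a hypothetical stand-in for the "low," "medium," and "high" user settings mentioned above, and border pixels are left untouched for brevity.

```python
# Standard 3x3 Sobel kernels for horizontal and vertical gradients
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_enhance(img, strength=0.5):
    """Add a fraction of the Sobel gradient magnitude back to each interior pixel."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(GX[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(GY[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            mag = (gx * gx + gy * gy) ** 0.5
            out[y][x] = img[y][x] + strength * mag
    return out

# A vertical edge between dark and bright tissue:
img = [[10, 10, 90, 90]] * 3
print(sobel_enhance(img)[1])
```

Pixels on either side of the edge are boosted, sharpening the boundary, while uniform regions are unchanged.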

Wide Dynamic Range Correction

Wide dynamic range (WDR) describes the ability of an imaging system to provide clear images when there is a wide range of luminance within each image. Since endoscopes often acquire imagery in a setting with a bright foreground and a dark background, WDR is an important characteristic of the system. High quality, low noise CCD or CMOS image sensors have the biggest impact on dynamic range and should be used whenever possible. However, many endoscopy applications use sensors that are constrained by other parameters, such as physical size and resolution, and so do not offer maximum dynamic range. In such cases, WDR processing algorithms become a critical component of the system. The closer the WDR processing block is to the sensor, the larger the impact it has on the resultant image. Ideally, WDR is located in the camera head; however, to satisfy power and device density constraints in the camera head, the designer needs to consider locating the WDR processing in the CCU.
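One simple form of WDR processing is a global tone-mapping curve; the sketch below uses logarithmic compression to map 12-bit sensor data into an 8-bit display range, lifting dark-region detail. Real WDR blocks typically use local, spatially varying operators, so this global curve only illustrates the principle.

```python
import math

def wdr_compress(pixels, max_in=4095.0, max_out=255.0):
    """Map wide-range (e.g. 12-bit) pixel values into a display range
    with a logarithmic curve that preserves detail in dark regions."""
    scale = max_out / math.log(1.0 + max_in)
    return [scale * math.log(1.0 + p) for p in pixels]

out = wdr_compress([0, 40, 400, 4095])
print([round(v) for v in out])
```

Note how the dark values 40 and 400 are spread across much of the output range, while the bright end is compressed.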

Digital Zoom

While some endoscopes include an optical zoom lens, many do not due to size restrictions. The ability to zoom is a valuable feature in endoscope systems that enables physicians to get a much closer view of the subject. Digital zoom increases the image size at the expense of resolution. If the native resolution of the video stream is high, the quality of the resulting zoom image can be acceptable. For lower resolution systems, interpolation filtering might be needed to improve the resolution.
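Digital zoom with interpolation filtering can be sketched in one dimension for brevity: a centered crop of a pixel line is resampled back to the original width with linear interpolation. A 2D implementation applies the same idea along both axes (bilinear filtering); this version is illustrative only.

```python
def zoom_line(line, factor):
    """Resample `line` to the same length after a centered crop of 1/factor."""
    n = len(line)
    crop = n / factor                 # width of the cropped region, in samples
    start = (n - crop) / 2.0          # left edge of the centered crop
    out = []
    for i in range(n):
        pos = start + i * (crop - 1) / (n - 1)  # source coordinate for output i
        lo = int(pos)
        frac = pos - lo
        hi = min(lo + 1, n - 1)
        out.append((1.0 - frac) * line[lo] + frac * line[hi])  # linear blend
    return out

print(zoom_line([0, 10, 20, 30, 40], 2.0))  # 2x zoom into the center
```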

Video Scaler

A video scaler is used to map a video stream to the appropriate aspect ratio and resolution for a receiving device. As shown in Figure 3, the CCU can output the video stream to both a local display and to the image management unit. The local display might have a much lower resolution and a smaller aspect ratio than the native video format, so the video scaler adjusts the video accordingly for the display device while passing the native video stream directly to the image management unit.
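The scaler's geometry decision can be sketched as follows: fit a native video format into a display of different resolution and aspect ratio while preserving aspect, padding with letterbox or pillarbox bars. Only the geometry is computed here; the actual resampling would follow. The resolutions are illustrative assumptions.

```python
def fit_to_display(src_w, src_h, dst_w, dst_h):
    """Return (scaled_w, scaled_h, x_offset, y_offset) that fit the source
    into the display without distorting its aspect ratio."""
    scale = min(dst_w / src_w, dst_h / src_h)  # limiting dimension wins
    out_w, out_h = round(src_w * scale), round(src_h * scale)
    x_off = (dst_w - out_w) // 2   # pillarbox bars, if any
    y_off = (dst_h - out_h) // 2   # letterbox bars, if any
    return out_w, out_h, x_off, y_off

# 1080p native video shown on a hypothetical 1024x768 local display:
print(fit_to_display(1920, 1080, 1024, 768))
```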

Static Image Capture

Static image capture is used by physicians to quickly capture and share an image of the subject tissue. The sensor control circuit built into some image sensors contains a static image capture circuit. In other systems, this function is typically performed downstream, after the image enhancement functions have been performed. The static image capture can be performed either in hardware or software, and the image is typically held in local memory until saved to disk by the physician.

Camera Control Unit Design Challenges

In endoscopic surgeries, physicians are often looking at imagery for hours at a time, so eye fatigue can become an issue. High-speed image processing, which reduces video lag and achieves maximum frame rate, is important for delivering smooth video that minimizes eye fatigue and reduces the chance of injury to the patient due to lag between the actual and perceived location of instruments. Some endoscopy systems, especially those used in critical applications such as surgery, avoid compression altogether due to the inherent lag that results from calculation-intensive compression algorithms.

Endoscope suppliers often differentiate their products by the unique image enhancement functions that they implement in the CCU, and they are continually pushing the limits of processing capabilities to deliver new image enhancement techniques without sacrificing processing speed. Like most electronic systems, the CCU is also constrained by a power budget, time to market requirements, and by cost. When selecting the type of device to use for image processing in the CCU, it can be a difficult task to meet the performance requirements without sacrificing the other constraints.

Dedicated ASSPs and DSP processors tend to have a familiar design flow and short ramp-up time, but they do not typically offer the processing performance required and are limited in I/O options and count. An ASIC solution offers the performance needed, but limited volume and long redesign schedules make it an unattractive option. The best balance between performance, cost, power, and development cycle is achieved through the use of FPGAs as the main image processor. Furthermore, a very high level of integration can be achieved with Xilinx FPGAs by integrating multiple interfaces into a single device to provide interface bridging, which reduces component count and lowers both the cost and power of the system.

Image Management

In endoscopy systems, the image management unit is usually responsible for image file management, user interface, network connectivity, and other system management functions. It can also perform some post-processing image enhancement such as rotation, on-screen display (OSD), and picture-in-picture display. In compact systems, the image management functions can be incorporated as one unit with the CCU; in large systems, they can be a separate unit. The image management unit is typically a PC-style architecture, built around a processor system (Figure 4). It will likely include an operating system such as Windows or Linux with a custom GUI for endoscopy. It can have multiple video inputs to simultaneously display images from different angles or different zoom levels. It can offer control with a mouse and keyboard and have its own monitor.
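The picture-in-picture function mentioned above can be sketched as a simple composite: a scaled-down inset frame is pasted over a corner of the main frame. Frames are small 2D lists of pixel values here; a real implementation operates on live video streams, often in FPGA hardware.

```python
def picture_in_picture(main, inset, x, y):
    """Overlay `inset` onto a copy of `main` with its top-left corner at (x, y)."""
    out = [row[:] for row in main]          # leave the source frame untouched
    for j, row in enumerate(inset):
        for i, v in enumerate(row):
            out[y + j][x + i] = v
    return out

main = [[0] * 4 for _ in range(4)]
inset = [[9, 9], [9, 9]]
print(picture_in_picture(main, inset, 2, 0))
```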

Figure 4. Image Management Unit

High-end systems, especially those incorporating high definition video, often offload the video interface and video processing to a Xilinx FPGA. In such a case, image rotation, picture-in-picture, on-screen display, and video receive and transmit are all handled in FPGA hardware, rather than in software on the processor. This architecture ensures that the video stream is processed only by high-performance logic, resulting in low-lag video that does not impact other processes running simultaneously in the processor. System designers can also leverage the FPGA's available high pin count and versatile I/O interface standards to implement data, network, storage, and user interfaces. This can help reduce overall component count for the system.

Light Source

The light source is used for illumination of the subject during endoscopic examinations and procedures. The light unit connects to the camera head via fiber-optic cables, and the light is then transferred to the distal tip of the endoscope through another set of fiber-optic cables. The light unit typically consists of either a xenon or metal halide bulb and is predominantly built of power electronics. It also typically has a simple user interface with controls for brightness, power, and system status.

Some light sources might connect to the image management unit via Ethernet or another communication protocol so they can be controlled remotely by an operator using the image management interface. This interface must be real-time and must remain active even when the rest of the system is inactive, so it can wake up the light source on command. While multiple ASSPs can be used to implement each of the various functions, an FPGA offers a single-device solution for all of the user interface and communication requirements. A small, low power FPGA such as the Artix™-7 FPGA from Xilinx is well suited to managing the logic requirements in the light source, with its optimized combination of low power consumption, high performance, and high degree of interconnect versatility.


Display Monitor

The final component of an endoscopy system is the display monitor. Displays are a key component that can influence diagnostic accuracy in endoscopy systems. Medical displays have a few unique requirements beyond those of commercial displays, so they are typically purpose-built. Some of the key requirements include: excellent gray scale and black level performance, factory and on-site calibration, communication with a PC for diagnostics and calibration, the ability to tile multiple monitors together to display a single image, image enhancement for long cable lengths, anti-glare and low-reflection surfaces, and support for multiple, simultaneous inputs. The display monitor, like other medical devices, must conform to stringent medical safety and quality criteria. It can connect to the CCU or the image management unit. It is also common to have multiple display monitors in a single system, where one or more monitors are connected to the CCU for viewing by physicians and assistants, and another is connected to the image management unit for viewing or control by others.

The architecture of the display electronics is similar to a consumer monitor, and a typical block diagram is shown in Figure 5.

Figure 5. Medical Display Block Diagram

Several different technology solutions can fulfill the logic requirements of display monitors. ASICs, microprocessors, DSP processors, ASSPs, and FPGAs are all capable of performing many of the functions. However, only FPGAs are able to provide a cost effective, fast time-to-market, scalable solution that can be used across a product line. They also are capable of supporting the wide array of interface standards found within displays and are capable of performing the required image enhancement, gamma correction, and noise reduction processing functions, making them an excellent fit.

Power Consumption

Reducing power consumption is a major design constraint for endoscopy systems. In medical systems, this requirement stems from uniquely stringent requirements for safety and quality that must be adhered to. To remain within safety and quality constraints, the cost and complexity of the power supply design can increase greatly as the demand for added power increases. System designers continually strive to adopt new technology and design techniques that keep power consumption at a minimum without sacrificing performance.

Heat is another major factor driving the reduction of power consumption. When semiconductors containing an enormous number of system gates are clocked at high frequency, they generate heat that has to be rapidly removed from the system to keep the temperature of components within the desired operating range. To expel this heat, heat sinks, fans, packaging, and the PCB must be carefully designed. The heat management system adds to the overall weight, size, and cost of the system, and increasing fan speeds adds to the power consumption.


Interconnect

Endoscope systems use many different types of logic devices to handle the various interconnect and processing tasks. Each of these devices has different interface requirements, creating the need for a versatile interconnect solution. However, the interconnect solution must also be high performance due to the demand for increasing bandwidth throughout the system. The high bandwidth requirement stems from high resolution image sensors, large displays, and the need to pass serial data between system components through cabling. The combination of requirements for versatility and high bandwidth places an enormous burden on the I/O count and throughput between devices and the system components. Most of the microprocessors, ASSPs, and DSP processors available do not offer enough I/Os for parallel interfaces and do not directly support many of the interfaces found in endoscope systems, such as PCIe®, USB, and the serial transceiver data transfers used for SDI and chip-to-chip communication. Figure 6 shows common interfaces used in endoscope systems.

Figure 6. Common Interfaces in Endoscope Systems

System architects are left with few choices to address this interface complexity. They can select discrete components for each interface, build a custom ASIC, or use FPGAs. ASICs tend to be too expensive for endoscopes. FPGAs offer the highest I/O count and support most interface standards for a reasonable price. Therefore, they are commonly used to handle interface challenges throughout an endoscope system. Depending on the size of the FPGA being used, system architects might decide it is more cost effective to use dedicated interface devices for the most complex functions such as USB.

Xilinx FPGAs in Endoscopes

An FPGA is a multipurpose silicon device that enables designers to integrate multiple system functions in a single device. It is a collection of configurable memory, DSP, and I/Os that are tightly integrated with a large array of logic cells, built on leading-edge process technology. This single device system integration greatly reduces the need for challenging and expensive physical PCB-level connectivity.

A Xilinx FPGA is more than just a silicon device though. It represents a design ecosystem that is packaged with design tools and a comprehensive IP library to help users create their designs quickly. Since the silicon is designed by Xilinx, there are no NRE mask or production costs for the system designer, who only has to create the design, download the design file to the device, and then the device is configured for that specific design.


Low Power

High power requirements increase the cost, form factor, and noise of an endoscope system, while impacting reliability and performance. Xilinx has produced high performance, power optimized FPGAs, which are ideally suited for endoscope camera heads, CCUs, and displays.

Xilinx achieves an excellent economy of scale by providing a product line that spans many customers' applications. This economy of scale is unmatched by other types of logic products, so it enables Xilinx to lead the adoption of advanced semiconductor process technologies. By utilizing the latest process, Xilinx offers an excellent balance of power, performance, cost, and features. Xilinx brings customers the ability to migrate their silicon to the next generation process technology easily by continuing to push technology node advancements with every product family.

Xilinx works closely with semiconductor fabrication partners to develop a variety of transistor sizes that can be used opportunistically across the product line and within each device. This approach ensures that system designers experience extremely high performance with substantially reduced power compared to other FPGAs at the same process node. In addition, high-speed DSP blocks are able to maintain Xilinx's performance leadership by leveraging dedicated high-performance processing slices. Xilinx FPGAs also provide the localized memory and logic resources to achieve the performance requirements for endoscope applications. That, coupled with embedded processing, standardized I/O, and a proven ecosystem of soft IP, brings customers a path to reducing risk, cost, and schedule when developing their products.

Several advanced design optimization techniques have been simplified to help designers achieve the lowest possible power consumption in the 7 series FPGAs. A low power device option enables the use of a low power supply voltage, which reduces standby power by 26%, giving a 50% improvement over previous generation FPGAs. Power optimization tools, such as automatic clock gating, reduce dynamic power by an additional 30%, while best-in-class power analysis tools help design engineers pinpoint power inefficiencies in their designs so they can implement significant power saving design techniques.

More information and resources supporting Xilinx's low power advantage can be found at the Power Solutions Resource Center:

Cost Savings with Scalable Design

Xilinx 7 series FPGAs use nearly identical logic architectures across the Artix-7, Kintex™-7, and Virtex®-7 product families. This enables IP portability across devices, so that system designers can scale their designs up or down to efficiently address their entire product line with a single base design. In endoscope systems, this can be particularly valuable because there is often proprietary image processing IP or functionality that is used for different systems with different feature sets and image resolutions. Xilinx's common device architecture provides system designers with a large cost and time savings over the re-coding time that is typically required to port RTL from one FPGA architecture to another.

Intellectual Property

Xilinx Intellectual Property (IP) cores are key building blocks for Xilinx designs. An extensive catalog of general purpose cores is available to address the general needs of FPGA designers as well as robust domain- and market-specific cores to address requirements found in DSP, embedded, and connectivity designs. Many of the key DSP functions and connectivity interfaces found in endoscope systems are available as direct or partner IP cores throughout Xilinx's extensive partner ecosystem. Using Xilinx ecosystem IP minimizes development schedules and enables system designers to focus on the differentiating aspects of the design rather than developing standard functions—a distinct advantage with Xilinx.

Refer to Xilinx's IP Center for more information:

Refer to Xilinx's video and image processing IP for more information:

Design Platforms

Xilinx has created a variety of Targeted Design Platforms (TDPs) that enable system designers to evaluate Xilinx FPGAs in specific applications and for specific functions. The combination of hardware, reference designs, and development tools integrated into the TDPs provides customers with the ability to start differentiating their product from the very beginning. The market-specific TDP kits provide additional market-specific applications to help system designers evaluate IP, demonstrate product implementations, and develop advanced algorithms. One such market-specific TDP kit is the Spartan®-6 FPGA Industrial Video Kit (IVK), which gives system designers an out-of-the-box video processing system to use during the development of display and camera applications. The IVK leverages Xilinx's image processing pipeline (iPipe) to perform image pre-processing functions, so system designers can focus their efforts on developing their own high-value, proprietary algorithms. The IVK is well suited as a starting point for endoscope system development.

Some of Xilinx's relevant TDPs include:

Selecting the Right Xilinx Device

Xilinx offers an array of devices with varying feature sets, logic densities, package options, and power-performance trade-offs. While this wide selection gives users the opportunity to find a perfectly optimized product for their application, it also can be a challenge to compare and contrast the multitude of parameters that define the optimal part. To help system designers select the best product for their implementation, Table 1 provides initial guidance for the product selection process along with some suggested devices.

Table 1. Xilinx Device Selection for Endoscope Systems


Conclusion

FPGAs are an excellent fit for endoscope systems, where small form factor, low power, and high performance are critical. Xilinx Spartan-6, Virtex-6, and 7 series FPGAs offer the performance of ASICs with the added benefits of low NRE cost, substantially reduced time to market, scalable design, and high I/O count. In addition, Xilinx's custom low-power process, coupled with leading edge power optimization tools, offers significantly lower power consumption than competing solutions. All of these benefits enable endoscope system developers to improve patient care by rapidly deploying systems that deliver the latest technology within budget and power consumption constraints.

For additional information on Xilinx's medical solutions, please visit:

Available Resources

Xilinx Medical Solutions:

Xilinx 7 Series FPGAs

Xilinx Targeted Design Platforms:

Xilinx Industrial Video Kit:

Connectivity Design Platform:

DSP Design Platform:

Embedded Platform:

Xilinx IP Center:

Xilinx Video IP:

Camera Head:

CCU and Image Management:

Power Solutions Resource Center:

Eight Considerations When Evaluating a Smart Camera


By Carlton Heard
Product Engineer – Vision Hardware and Software
National Instruments

With the increase in performance and decrease in cost, smart cameras have become increasingly accessible over the past decade. Given this trend, how do you determine which smart camera best meets your needs, or decide whether a smart camera is appropriate for your application at all? Learn about the top eight considerations you don't want to forget when selecting a smart camera, including application examples and alternative solutions.

Smart cameras have been around for many years, but advances in processor technologies have made smart cameras much more accessible and popular within the past decade, especially in applications such as machine vision and surveillance. However, when the term smart camera is mentioned, a wide variety of ideas still come to mind, because there is no widespread agreement on the technical definition of just what a smart camera is.

It is generally agreed that the basics of a smart camera include not only the image sensor but also some type of processing chip: a CPU, DSP, FPGA, or other processing device. However, today even an off-the-shelf point-and-shoot digital camera has some type of built-in image processing, for example to make the image look more desirable, remove red-eye effects, or conduct facial recognition.

So, if the combination of a processor and an image sensor isn’t the defining attribute of a smart camera, what makes a smart camera “smart”? The key lies in the output. Unlike most cameras, the primary output of a smart camera is not an image but a decision or some other type of information (Figure 1). Because the image processing or machine vision algorithm runs directly on the smart camera, the image does not need to be passed on to a PC or another device. Instead, the result of the processing can be sent directly to an operator or to another device in the system.

Figure 1. Smart cameras differ from normal cameras in that the primary output is not an image, but a decision or some other kind of information. This difference results in a potential reduction in system size, cost, and complexity.

For example, a smart camera may be selected for use in an in-line inspection system on a manufacturing line. The output of the smart camera could be a pass/fail report sent over a network to a database, a digital signal triggering a sorting system, or a serial command to a PLC. A smart camera is a decision maker. Still, an internet search for a smart camera returns a wide variety of results with very different features and form factors. How do you first decide whether a smart camera is best suited to a specific application, and then choose from the sea of available options? Let’s review eight considerations to keep in mind when evaluating smart cameras.
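The decision-making role described above can be sketched as a minimal on-camera inspection loop. The following is an illustrative Python sketch, not vendor code; the `inspect_frame` function, its thresholds, and the grayscale-frame representation are all assumptions made for the example.

```python
# Minimal sketch of a smart camera's decision path: the output is a
# pass/fail result (e.g., for a PLC or a sorter), not the image itself.

def inspect_frame(pixels, min_mean=100, max_dark=5, dark_threshold=30):
    """Inspect one grayscale frame (a list of 0-255 intensity values).

    Returns a small decision record instead of the image, mirroring how
    a smart camera reports results rather than raw frames.
    """
    mean = sum(pixels) / len(pixels)
    # Count very dark pixels as candidate defects.
    dark = sum(1 for p in pixels if p < dark_threshold)
    passed = mean >= min_mean and dark <= max_dark
    return {"pass": passed, "mean": mean, "dark_pixels": dark}

# A bright, defect-free frame passes; a frame with many dark pixels fails.
good_frame = [150] * 100
bad_frame = [150] * 90 + [0] * 10
```

In a real deployment, only the returned decision would leave the camera, for example as a digital line toggle or a serial message to a PLC.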

1. Processor
As previously mentioned, the growing popularity of smart cameras can mainly be attributed to the increase in processor performance over the past decade. Fifteen years ago, a 1 MHz smart camera would have been four times the size and cost of one of today's >1 GHz smart cameras. Until a few years ago, smart cameras did not have the capacity to process and interpret images and were instead limited to simple tasks such as reading bar codes. Today, many may be surprised to learn that the processing performance of some smart cameras rivals that of PC-based systems. Smart cameras come with a range of available processors including DSPs, PowerPC-class, and Atom-class processors. There are also mixed offerings, such as a CPU paired with a DSP co-processor for certain algorithms.

2. Size
One benefit to using a smart camera is that multiple components of a vision system are integrated into a single package, resulting in a small size and the potential to save a lot of space. Space savings can be very important for machine vision and embedded applications where the space for a particular inspection cell may be fixed, but where more and more inspection or control steps are being added to the same space. Smart cameras also benefit from Moore’s Law, with today's available cameras being smaller than 55mm x 50mm and weighing less than 60 grams.

3. Image Sensor
Of course, a smart camera is still a camera and must acquire images. Both CMOS and CCD sensors can be found in smart cameras, with resolutions up to 5 MP and both color and monochrome variants available. Smart cameras aren’t limited to area scan, either: line scan smart cameras are available with line rates over 10 kHz. While smart cameras do not cover as wide a range of options as normal cameras, their ever increasing availability still spans a broad diversity of options, including some of the most popular sensors.

4. Operating System
As with any embedded system, the operating system varies between models of smart cameras. Real-time variants provide the software stability and deterministic operation desired for many embedded applications, such as machine vision, surveillance, or robotics. However, Linux- and Windows-based models also exist for those who need to run Linux- or Windows-only software (such as a database access program) alongside the vision processing, or who simply want software familiarity.

5. Software
Software is often the key differentiator between smart cameras, because most manufacturers offer a range of smart cameras rather than a single model. This means there can be substantial overlap in features such as the image sensor, processor performance, and size; the main differentiation lies in how the camera is programmed (Figure 2). Most smart cameras include relatively simple-to-use software, so advanced programming skills are not required to use them, but it is important to keep flexibility and scalability in mind.

Figure 2. Most smart cameras come with relatively intuitive programming software, reducing the need for advanced programming skills. The software should not only be easy to use, but should also be flexible enough to scale to different hardware targets without a complete rewrite.

The investment to learn a new piece of software and write an application should be somewhat long-term in nature. This means that the software should scale with application requirements and future projects. Sensor and processor technologies are advancing rapidly, so the best-case scenario is when the smart camera model and software are well integrated but not exclusive to each other. As a result, if you change smart camera models to a new version or need to move to a different hardware or operating system platform, a complete rewrite of the application or IP should not be required.

6. Distributed Versus Standalone
This aspect is one of the key factors in deciding when to use a smart camera versus another type of vision system. In a smart camera, by definition, the processing unit and image sensor are integrated into one device (with the benefits mentioned previously), but some vision applications require multiple cameras. Synchronizing the outputs of multiple processing units can be more difficult than connecting multiple cameras to a single processing unit, and it can also increase system cost.

The ability to upgrade hardware can also be a challenge with smart cameras, since proprietary hardware is often used and the components are tightly coupled. To add memory or change the image sensor, the entire smart camera may need to be replaced with a different unit. Upgradability and distributed imaging are therefore common reasons to use a PC-based system or another type of vision system.

7. Ruggedness
The level of ruggedness required depends on the environment in which the smart camera will be deployed, and many applications take place in fairly harsh environments. Whether on a manufacturing line or outdoors, environmental specifications must always be considered. The ingress protection (IP) rating can help with this decision (Figure 3). IP ratings are usually two-digit ratings in which each digit represents a certain protection level:

  1. Protection from solid material such as dust (on a scale between 0 and 6)
  2. Protection from liquids (on a scale between 0 and 8)

Figure 3. Smart cameras such as National Instruments' NI 1772 have high IP ratings for protection against dust, water and other environmental exposures. Higher IP ratings are beneficial for applications in harsh environments such as outdoor monitoring and industrial vision inspection.

The higher the number, the more protected the smart camera is from that specific ingress. For example, a camera with an IP rating of 40 protects against solid objects over 1 mm, such as wires and operators' fingers, but offers no protection against dust or water. A smart camera with an IP rating of 67 offers total protection against dust and can be submerged in water up to 1 m deep. An example of where a higher IP rating can be beneficial is food inspection, where the cleaning process often involves washing everything on the line (including the camera).
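The two-digit IP scheme above can be decoded mechanically. The sketch below paraphrases the IEC 60529 protection levels in abbreviated wording; the table entries are condensed summaries, and the `decode_ip` function name is purely illustrative.

```python
# Paraphrased, abbreviated IEC 60529 ingress protection levels.
SOLID = {
    0: "no protection",
    1: "objects > 50 mm",
    2: "objects > 12.5 mm (fingers)",
    3: "objects > 2.5 mm (tools)",
    4: "objects > 1 mm (wires)",
    5: "dust-protected",
    6: "dust-tight",
}
LIQUID = {
    0: "no protection",
    1: "vertically dripping water",
    2: "dripping water, tilted 15 degrees",
    3: "spraying water",
    4: "splashing water",
    5: "water jets",
    6: "powerful water jets",
    7: "immersion up to 1 m",
    8: "continuous immersion beyond 1 m",
}

def decode_ip(rating):
    """Decode a two-digit IP rating string like "67" into its two levels."""
    solid, liquid = int(rating[0]), int(rating[1])
    return SOLID[solid], LIQUID[liquid]
```

For example, `decode_ip("40")` yields protection against objects over 1 mm but none against liquids, matching the IP 40 camera described above, while `decode_ip("67")` yields dust-tight operation and immersion up to 1 m.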

Another aspect of ruggedness is the temperature rating. To reduce the chance of failure, smart cameras are usually fanless, which increases heat dissipation complexity for the manufacturer, especially with higher performance smart cameras containing Atom-class processors and above that require more power. Wider range temperature ratings make it possible for the smart camera to operate in a broader range of environments, which can be very beneficial in applications such as traffic monitoring or other types of surveillance where the smart camera is located outside.

8. Integration
Those who have completed a vision application know that vision is often part of a much larger system. Since the primary output of a smart camera is a decision, result, or some other information beyond an image, most smart cameras have built-in I/O to communicate or control other devices in the system (Figure 4). With industrial automation, the smart camera may need to control actuators to sort products, communicate inspection results to a robot controller, programmable logic controller (PLC), or programmable automation controller (PAC), save images and data to network servers, or communicate inspection parameters and results to a local or remote user interface.

Figure 4. Since the primary output of a smart camera is a decision or some type of signal other than an image, many smart cameras include integrated I/O and communication ports. These ports can include digital I/O, serial, Ethernet, and USB buses as well as display ports for user interfaces.

With USB and display ports, smart cameras can completely replace PC-based vision systems where an operator interface is required, since everything is integrated in a single device. In scientific imaging applications, vision must often integrate with motion stages, data acquisition systems, microscopes, specialized optics, and advanced triggering. As a result, many smart cameras today include I/O such as industrial digital inputs and outputs, encoder inputs for image synchronization, displays, and communication ports.

Models are also available with built-in lighting or with light controllers that drive illumination directly from the smart camera, reducing the need for external illumination, light controllers, and cabling. More and more industrial communication protocols, such as DeviceNet, EtherNet/IP, and serial, are also supported natively in smart cameras for communicating with other devices. It is critical to think about the big picture to understand how the smart camera will best integrate into the overall system.
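To make the integration point concrete, the sketch below formats an inspection result as a simple ASCII message that could be sent to a PLC over a serial link. The message layout (delimiters, field order, field widths) is a made-up example for illustration; real PLC protocols such as Modbus or EtherNet/IP define their own framing.

```python
def plc_message(station_id, passed, part_count):
    """Format an inspection result as a simple ASCII serial command.

    The framing here (angle-bracket delimiters, two-digit station ID,
    P/F status flag, four-digit part counter) is a hypothetical example,
    not a real PLC protocol.
    """
    status = "P" if passed else "F"
    return f"<{station_id:02d},{status},{part_count:04d}>"
```

A message like this is the kind of non-image output the article describes: the camera keeps the frame and transmits only the decision.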

With the capabilities of today’s smart cameras, adoption of these devices continues to grow, and newer application areas are emerging that could help feed that growth, such as the use of vision in robotics. Robotic cells are usually space-confined, creating a need to reduce the number of cables as much as possible. Smart cameras can be mounted on the end effector of a robot, or positioned to perform guidance or inspection.

The smart camera can essentially become the system master, acquiring images and outputting coordinates to the robot for in-line quality inspection, or carrying out visual servoing, which provides continuous position feedback to the robot to align or track parts and locations. Smart cameras are also moving into the 3D vision space, with solutions that integrate multiple image sensors into stereoscopic or laser triangulation packages (Figure 5). The potential cost savings, ease of integration, and increasing performance make smart cameras a cutting-edge option for many vision applications.

Figure 5. Smart cameras are now moving into emerging application areas such as 3D vision. 3D smart cameras incorporate the image sensor, laser, and processing into one device to report information about objects' height, shape and volume.

Particle Board Quality Control Using Parallel Processing of NI Smart Cameras

"Obvious advantages of NI Smart Camera integration with the existing system include zero accidents or health issues, a savings of 150,000 THB (~$4,879 USD) from the labor overtime cost in H1 2012, no transportation cost from returned products, and improving quality control errors detected from 85 percent to 100 percent."
- Sarapong Kaney, Siam Riso Wood Products Co. Ltd.

The Challenge

Automating portions of particle board manufacturing to prevent exposure to harmful dust and chemicals and to improve the accuracy of production.