
General Discussion

Free Book Opportunity: "Computer Vision Metrics: Survey, Taxonomy, and Analysis"
8 replies [Last post]
Brian Dipert
Last seen: 14 hours 12 min ago
Level 3: Conjurer
Joined: 2011-07-20
Points: 39

The Embedded Vision Alliance, in partnership with Apress Media LLC, is pleased to provide you with a free electronic copy of the newly published book "Computer Vision Metrics" by Scott Krig, hosted on the Alliance website. The Alliance will periodically publish new book chapters in both HTML and PDF formats; you'll find the links here.

The publisher has also provided the Alliance with a limited number of complimentary print copies; if you're located in the U.S. and willing to post a review of the book to the discussion forum area of the website, please contact us with your interest in receiving a free book. You can publish your reviews as follow-up posts to this one. The Alliance reserves the right to edit them as deemed necessary.

Last seen: 2 years 18 weeks ago
Level 1: Prestidigitator
Joined: 2014-12-22
Points: 1

This site is a great place to find resources and inspiration. Thanks a lot.

Last seen: 2 years 21 weeks ago
Level 1: Prestidigitator
Joined: 2014-06-01
Points: 1

Computer Vision Metrics: Survey, Taxonomy, and Analysis by Scott Krig provides a comprehensive and well-organized overview of image feature descriptors. Despite the large number of books on computer vision available in the market today (many of which cover unnecessarily overlapping topics), this particular area has not been properly addressed so far. Although the book is still a bit rough around the edges, particularly in terms of copyediting details (which I am sure will be fixed in future printings), Scott Krig does a very good job of covering these important topics and giving his readers a clearer perspective on them. It is a book worth having on your bookshelf.


Chapter one covers image acquisition methods. It discusses not only traditional 2D sensors and cameras, but also talks about 3D acquisition using technologies such as time-of-flight sensors or stereo cameras. It also discusses several methods and techniques to generate 3D data from 2D sensors, from multi-view stereo to simultaneous localization and mapping (SLAM) and structure from motion. Its most innovative point is to make the connection between 3D sensing systems and algorithms that attempt to generate the 3D information as alternative means to the same end. It should help bridge the gap between researchers who focus exclusively on camera-based data and those who work with alternative types of sensors.


Chapter two offers a brief overview of common image pre-processing methods that can be employed ahead of subsequent image analysis. It covers topics such as colorimetry, spatial filters, edge detectors, transforms, morphological operations and segmentation, and histogram processing, among others. The chapter presents intuitive insights as to how different pre-processing steps might be useful for different types of descriptors, and makes a connection between the vision pipeline stages and the corresponding common types of operations. However, since many of these topics are covered in standard image processing textbooks, as the author warns on the very first page of the chapter, "readers with a strong background on image processing may benefit from a light reading of this chapter".
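As a small illustration of the kind of histogram processing the chapter covers, here is a minimal sketch of histogram equalization on a toy 8-bit image, in pure Python (the function and example data are my own, not taken from the book):

```python
def equalize_histogram(pixels, levels=256):
    """Histogram equalization: remap intensities through the normalized
    cumulative histogram so the output spans the full dynamic range."""
    n = len(pixels)
    # Intensity histogram.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Normalized cumulative distribution function (CDF).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / n)
    # Map each pixel through the scaled CDF.
    return [round(cdf[p] * (levels - 1)) for p in pixels]

# A low-contrast 4x4 image flattened to a list: values cluster in 100-103.
image = [100, 100, 101, 101,
         101, 102, 102, 102,
         102, 103, 103, 103,
         100, 101, 102, 103]
print(equalize_histogram(image))  # intensities now spread from 48 to 255
```

The same idea underlies the contrast-enhancement step a descriptor pipeline might apply before edge or interest point detection.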


Chapter three discusses different features that can be used to characterize images at a global or local scale. After a brief historical review of image features, texture methods are discussed at some length, with particular emphasis on spatial dependency matrices (SDM). Image characterization through statistical methods such as image moments and global or local histograms is briefly covered and then a discussion on how basis functions can also be used to represent image features is presented. The chapter presents an interesting perspective that shows how these topics, which on the surface may seem unrelated, can all be used as means for the same goal of representing images as sets of features.
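For readers unfamiliar with spatial dependency (co-occurrence) matrices, the core mechanics fit in a few lines of Python; this is a toy sketch of my own, not code from the book. Each entry counts how often one grey level appears at a fixed offset from another:

```python
def cooccurrence_matrix(img, levels, dx=1, dy=0):
    """Spatial dependency matrix (SDM/GLCM): entry [i][j] counts how
    often grey level j occurs at offset (dx, dy) from grey level i."""
    h, w = len(img), len(img[0])
    m = [[0] * levels for _ in range(levels)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                m[img[y][x]][img[ny][nx]] += 1
    return m

# A 4x4 image with three grey levels; horizontal neighbours, offset (1, 0).
img = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 2, 2, 2],
       [2, 2, 2, 2]]
glcm = cooccurrence_matrix(img, levels=3)
print(glcm)  # large diagonal entries indicate smooth horizontal texture
```

Texture statistics such as contrast, energy, and homogeneity are then computed from this matrix.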


Chapter four starts to delve into the main topic of the book: descriptors. It covers several concepts important for understanding image descriptors, such as the design desiderata, distance functions to compare descriptors, and so on. As illustrations of the points being covered, it briefly presents some of the main descriptors in the literature, such as BRISK, FREAK, and ORB. It goes on to discuss descriptor accuracy, search strategies to find correspondences, and methods to select features (from manual, to statistical, to machine learning-based). Although the chapter covers a lot of ground, and hence does so rather quickly, it provides an interesting overview of the different aspects involved in designing a descriptor and what a practitioner should consider when using one.
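As a concrete example of the distance functions mentioned, binary descriptors such as BRISK, FREAK, and ORB are typically compared with the Hamming distance, which reduces to XOR plus a bit count. A toy sketch of my own (real ORB descriptors are 32 bytes, not 4):

```python
def hamming_distance(d1, d2):
    """Number of differing bits between two binary descriptors."""
    return sum(bin(a ^ b).count("1") for a, b in zip(d1, d2))

def best_match(query, candidates):
    """Brute-force nearest neighbour under the Hamming metric."""
    return min(range(len(candidates)),
               key=lambda i: hamming_distance(query, candidates[i]))

# Toy 4-byte "descriptors".
query = bytes([0b10110010, 0b00001111, 0b11110000, 0b01010101])
candidates = [
    bytes([0b10110010, 0b00001111, 0b11110000, 0b01010100]),  # 1 bit off
    bytes([0b01001101, 0b11110000, 0b00001111, 0b10101010]),  # all 32 bits off
]
print(best_match(query, candidates))  # prints 0
```

The same XOR-and-popcount operation is why binary descriptors are so cheap to match on embedded hardware compared with the Euclidean distances used by float descriptors like SIFT.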


Chapter five provides a three-dimensional taxonomy of feature descriptors. The axes of the taxonomy are shape & pattern, density, and spectra. In summary, the three axes define which points in a neighborhood compose a given feature descriptor, how densely over an image these descriptors are computed, and which kinds of features are employed in the descriptor (e.g., gradients, intensities, colors, etc.). Chapter five is possibly the most useful and interesting in the book, as it provides extensive lists of attributes that correspond to each of the three axes of the taxonomy and related factors. These lists can potentially be quite useful for researchers and practitioners when selecting feature descriptors for different applications. The extensive list of spectra types is particularly useful, but would be made even more valuable by including references to the different methods that employ each type in every item in the list. As a matter of fact, most lists presented in chapter five would benefit from references to the corresponding descriptors that match each category; at the least, the lists that correspond to major axes in the overall taxonomy might include the references. Readers with experience in the field may consider referring directly to chapter five to gain some additional insights from its feature descriptor presentation.


Chapter six discusses a number of well-known interest point detection methods and feature descriptors (from SIFT to ORB, SURF, and their variations, among many others) according to the taxonomy presented in chapter five. Each feature descriptor is categorized according to the taxonomy and its characteristics are briefly described, with particular attention to the computational efficiency of each approach. Chapter six is an excellent reference for practitioners and researchers searching for potential descriptors to use in their projects. For example, if a certain application requires rotation invariance, the taxonomy allows the user to discard several methods right away. In fact, an online interactive tool based on the information presented in chapter six might be a great resource for the research community.
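That elimination process can be pictured as a simple attribute filter over taxonomy records. The sketch below is my own illustration with simplified attribute values, not a reproduction of the book's tables:

```python
# Simplified, illustrative attribute records (not the book's actual taxonomy).
descriptors = [
    {"name": "SIFT",  "rotation_invariant": True,  "binary": False},
    {"name": "ORB",   "rotation_invariant": True,  "binary": True},
    {"name": "BRIEF", "rotation_invariant": False, "binary": True},
]

def shortlist(required):
    """Keep only descriptors whose attributes satisfy every requirement."""
    return [d["name"] for d in descriptors
            if all(d.get(k) == v for k, v in required.items())]

print(shortlist({"rotation_invariant": True}))                  # SIFT, ORB
print(shortlist({"rotation_invariant": True, "binary": True}))  # ORB only
```

An interactive version of chapter six could work much like this: check the boxes your application requires and see which descriptor families survive.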


Chapter seven provides an interesting and detailed discussion on the often overlooked subject of ground truth. Although this topic is naturally quite broad and application-specific, by covering topics such as manual and automated ground truth generation using real and synthetic data, this chapter takes a step forward in creating a more systematic discussion of this issue, which is fundamental to keep moving computer vision to the next stage of maturity, in which new methods and techniques can be compared objectively across the board.
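To make the payoff of good ground truth concrete: once labelled data exists, detector output can be scored objectively, for instance with precision and recall (a minimal sketch of my own, not from the book):

```python
def precision_recall(predicted, ground_truth):
    """Precision: fraction of detections that are correct.
    Recall: fraction of ground-truth items actually detected."""
    tp = len(predicted & ground_truth)  # true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Hypothetical detector output vs. labelled ground truth (object IDs).
detected = {1, 2, 3, 5}
truth = {1, 2, 4, 5, 6}
p, r = precision_recall(detected, truth)
print(f"precision={p:.2f} recall={r:.2f}")  # prints precision=0.75 recall=0.60
```

Without agreed-upon ground truth, numbers like these cannot be compared across methods, which is exactly the maturity gap the chapter addresses.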


Chapter eight wraps up the book by presenting four hypothetical applications showing how the materials covered in the previous chapters can be applied in practice. Again, particular attention is paid to the computational requirements of each individual step of the illustrative systems. Chapter eight is particularly relevant to anyone who is interested in the area of embedded computer vision, where mapping the algorithmic steps to the hardware resources is critical. It is still worth reading even for those who are not particularly interested in performance as the illustrative applications should give the readers an idea of what is involved in the process of designing a computer vision system.


Overall, Computer Vision Metrics: Survey, Taxonomy and Analysis is an interesting book which provides a broad description of image descriptors from a new perspective and can provide new insights for seasoned computer vision professionals. Although the book is mostly self-contained, due to its organizational structure, which is closer to that of a reference manual than a textbook, beginners may want to refer to some of the standard, more didactic texts before getting their hands on Scott Krig's book. For more experienced researchers and practitioners, chapters six to eight provide the most innovative perspective on the topic and are particularly useful in helping foster alternative (and potentially more rigorous and carefully designed) approaches to current practices.

Last seen: 48 weeks 4 days ago
Level 1: Prestidigitator
Joined: 2013-03-27
Points: 1

Hats off to Scott Krig for attempting to classify such a massive topic. Reading through the book reminded me of the valiant Indian mythological character Abhimanyu in the Mahabharat, who single-handedly fought a supremely challenging circular battle formation called "Chakravyuh." The bibliography of 551 references cited in the book is a testament to the daunting task.

As someone who has been involved with making cameras for scientific applications, I found the book a great way to fill gaps in my understanding of computer vision.    

Chapter One
Provides what appears to be a customary mention of the sources of images, i.e., sensors and cameras relevant to CV. There is only a brief mention of sensor technologies such as CCD, CMOS, etc., along with a brief discussion of the corrections necessary. This topic itself has massive volumes of literature that could not possibly be included within the scope of this book. He briefly touches on various camera systems such as single-pixel and 2D computational cameras, 3D depth camera systems, stereo cameras, and various array cameras. There is also a discussion of 3D depth processing. The author has clearly had to make a valiant effort to shrink the vast literature and diverse systems into just a few pages. But to the author's credit, the material serves as a great launching pad for jumping further into a specific topic, helped by some great references noted for further study.

Chapter Two
Pre-processing with the goal of feature extraction is discussed. Table 2-1 lays out the problems in image pre-processing required for feature extraction, i.e., problems of image correction and enhancement. This list by no means covers the entire gamut of image pre-processing, but seeing the problems classified this way for feature extraction is a great way to develop insight into cleaning up images for the purposes of computer vision. The techniques for each pre-processing function are grouped into four families: local binary, local gradient, FFT, and polygons. These techniques are compared in the discussion, and their suitability for different images is mentioned. In addition, there are sections on colorimetry and spatial filters describing their applications in segmentation. The structure of the text does, however, make the material difficult to follow against the classification presented early in the chapter.

Chapter Three
The metrics for texture analysis of image regions are classified into four groups: structural, statistical, model-based, and transform-based. First, the metrics for feature description based on texture analysis are surveyed: edge metrics, cross-correlation, Fourier spectrum signatures, the co-occurrence matrix, Laws texture metrics, local binary patterns, and dynamic textures. Subsequently, metrics to describe texture are presented in two groups: first, statistical metrics, e.g., histograms, points, and image moments; then transform-based ones such as Haar, Slant, Walsh-Hadamard, KLT, wavelet and Gabor filters, and the Hough and Radon transforms.
Worth noting is the experience the author shares in tracing the historical development of various metrics, as growing computational power allowed more complex analysis.
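Of the texture metrics surveyed, the local binary pattern is perhaps the easiest to show compactly; here is a minimal 8-neighbour sketch of my own (not code from the book):

```python
def lbp_code(img, y, x):
    """8-neighbour local binary pattern: threshold each neighbour against
    the centre pixel (>= counts as 1) and pack the bits into one byte."""
    center = img[y][x]
    # Neighbours, clockwise from the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dy, dx) in enumerate(offsets):
        if img[y + dy][x + dx] >= center:
            code |= 1 << bit
    return code

img = [[10, 20,  5],
       [30, 15, 40],
       [ 8, 12, 50]]
print(lbp_code(img, 1, 1))  # prints 154
```

A histogram of these codes over a region then serves as the texture descriptor, which is part of why LBP variants are popular on resource-constrained hardware.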

Chapter Four
This chapter discusses the metrics and techniques commonly used in describing local features, i.e., regions of pixels with interesting information. The idea is to first describe a local feature effectively so that it can subsequently be detected accurately. Various metrics are discussed for describing a feature and its attributes of invariance and robustness. Table 4-1 sums up the goodness criteria for feature metrics. Subsequently, various distance functions are discussed for matching features. There is some valuable practical information regarding the density of descriptors, e.g., a rule of thumb that between 0.1 and 1 percent of the pixels in an image can yield interest points, i.e., anchors to regions of interesting features.
The discussion continues with various methods to represent feature descriptors so that they can be stored and matched; this is mostly a discussion of various coordinate spaces. Next, various shapes and patterns for performing feature computations are surveyed. An objective discussion of descriptor discrimination techniques follows; this is a very practical discussion that compares the techniques against compute resources and robustness. Next, search strategies are discussed for sparse local features.
Finally, a system-level perspective is presented to show how features are used in computer vision systems. This includes a discussion of classification and learning of features.
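The 0.1-to-1-percent rule of thumb mentioned above translates into a quick interest point budget for a given frame size (my own back-of-the-envelope arithmetic, for illustration):

```python
# Interest point budget implied by the 0.1-1 percent rule of thumb.
width, height = 1920, 1080
pixels = width * height        # 2,073,600 pixels per frame
low = pixels // 1000           # 0.1 percent -> 2,073 points
high = pixels // 100           # 1 percent   -> 20,736 points
print(f"{low} to {high} candidate interest points per {width}x{height} frame")
```

Budgets like this matter when sizing descriptor storage and matching workloads for an embedded pipeline.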

Chapter Five
The author creates a taxonomy as a framework to analyze feature descriptor methods, "to record 'what' practitioners are doing," in the author's words. The classification is based on shape and pattern, spectra, and density, i.e., three dimensions along which to classify a feature descriptor. The taxonomy for robustness of feature descriptors lays down the criteria for measuring the robustness (a.k.a. invariance) and accuracy of the feature descriptors.
Figure 5-2 and Table 5-1 are the key takeaways.
The last section provides some examples of how the taxonomy can be used to evaluate the metrics of corresponding feature description algorithms.

Chapter Six
A survey is provided of algorithms to identify interest points (anchors to features) and algorithms to detect features (descriptors). Together, anchors and features identify an object. The computational complexity involved is mentioned for some of the algorithms. The chapter is a rich resource of references for further details on the various algorithms.
I do feel that, for people new to the subject, it may be easier to follow the discussion of image pre-processing in Chapter 2 after going through this chapter.

Chapter Seven
Provides a framework for creating test images and data to measure the effectiveness of feature descriptor algorithms. It shows how to specify synthetic as well as real test images and data based on the feature descriptor robustness taxonomy developed earlier, and identifies the important criteria for test images and data. Tables 7-1 and 7-2 summarize the discussion well.

Chapter Eight
Shows various hardware/software trade-offs in creating a vision system. Example vision pipelines are discussed to illustrate the architectural analysis for specific vision problems. Four examples are chosen: automobile recognition, face recognition, image classification, and augmented reality. The examples provide insight into designing systems for a particular problem. This chapter is a rich resource of the author's practical insights into optimizing the execution of the algorithms on various platforms.

In closing, this book has helped me understand the entire computer vision pipeline.  It gave me a perspective that other books such as Learning OpenCV did not.  

I should mention that the book needs better formatting.  Numbering the titles of each topic would help.

Last seen: 2 years 26 weeks ago
Level 1: Prestidigitator
Joined: 2014-06-01
Points: 1

I agree with the previous reviews: This is a great source of information about computer vision, with good breadth and some interesting depth. Subjects are covered with a seemingly appropriate balance of key information and conciseness.

While the goals of the book excluded tutorials, it would be very instructive to have included architecture and implementation details, even if very concise, for some key historical and modern systems.  I think this could provide the beginnings of a reasoning framework for choosing and assembling certain approaches at each level.  Possible examples: USPS OCR, NIST / FBI fingerprint minutia analysis, Google self-driving car LIDAR etc., typical photography in-camera and postprocessing, one of the image recognition services, traditional and current manufacturing automation, historical and somewhat current military aircraft ID, etc.

References, both for papers/books and software / systems / services, should be updated and maintained on a web page.  Freely available vs. fee-based solution classification would be helpful.  Paths for learning or standard engineering combinations ought to be laid out for easy assembly and development setup.

The convolutional neural network and deep learning section ought to be expanded significantly, perhaps considering how a number of other computer vision related algorithms would work with or be implemented by such approaches.

There is a notable lack of any detail related to array cameras (Chapter 1, Pg. 22).  The following Siggraph paper and related video, at least, should be summarized and referenced:

Somewhat related, the depth granularity discussion doesn't mention the effects of superresolution on the possible depth detection accuracy.  The next section on multi-view stereo perhaps should include a summary of array camera parallax enhancements.

Last seen: 2 years 30 weeks ago
Level 1: Prestidigitator
Joined: 2013-06-27
Points: 1

I was both impressed and disheartened by Scott Krig’s book “Computer Vision Metrics”.


Foremost, it is an ambitious book, providing a thorough survey of topics and methods attached to what is variously called Computer Vision, Image Processing, or Artificial Vision. It is encyclopedic in scope. It provides a framework for discussing all aspects of the topic, from sensing through information extraction to getting automated [software] processes to reach conclusions about the scene[s] being viewed.


In the end, though, I found many of the discussions depressingly shallow.  There are almost no equations offered;  lists of techniques are offered but without a discussion about how to choose among the techniques or a basis for deciding which technique would best fit a given problem.  It may well be that this book should have become a multi-volume encyclopedia in order to adequately address the issues.


Let me explain that I have been active in the field for more than 45 years. At first, Image Processing was limited to the operations that electrical engineers could realize in circuitry; my digital data sets would stretch over 30 or 50 magnetic tapes, and it could take over 24 hours to run a single image processing operation on a data set. The focus of so much work over the years was on developing heuristics and shortcuts to reduce the processing load, without too much concern for the errors introduced by getting away from the theory and adopting the shortcuts. My experience is that the closer one can come to the basic equations of a system, the more robustly an image processing system can provide answers.


My experience is that some image processing developers have a poor understanding of the actual sensing process, the underlying physics, and the underlying mathematics that enable extraction of relevant information from images. I first looked for a discussion of spatial resolution, equations for radiation transfer, and explanations of the sensing process. The naïve assumption I hear time and again is that a pixel represents only the energy returned from the area the pixel represents. There is, in conversations I have with other imaging engineers, a lack of appreciation for the difference between pixels [picture elements] and reselms [resolution elements], and the role of sample rate in marrying the two. I would have hoped for a discussion in the book about reselms and pixels and the actual meaning of the sensed data and the values reported by the imaging system. With the common use of Bayer cameras [which provide 3-color imaging using a single imager], there should have been a discussion about the effects that Bayer imaging introduces into the data, and how it confounds both the spatial resolution and the color information.


I was bothered, for example, by the discussion of preprocessing; the impression I got from the text is that this is a step that can help obtain better results from detection and classification processing but is not a necessary one, nor was there any discussion about how one should choose a preprocessing method from the list provided. In my experience, preprocessing is a critical step because it transforms the sensed data into the data that would have been collected had I placed an ideal sensor right next to the object or class of interest. Preprocessing, for example, can be used to remove the effects of imaging through an atmosphere, changes in illumination intensity and spectrum within an image, imaging with a noisy sensor, imaging with a sensor whose spatial resolution does not match the sampling resolution, imaging with a sensor whose output characteristics change with the class or object being sensed, etc. I recently worked on a system where one of the objects of interest was in shadow, being located next to a brick wall; the sensed spectrum of the object did not match the laboratory-observed spectrum because the sensed object was illuminated mostly with sunlight colored by the wall it was reflected from. Obtaining robust results from any imaging begins with writing out the equations of the complete sensing system, then analyzing the equations to isolate the characteristics of interest and devising a means of solving for them. Too often, it seems to me, engineers treat Computer Vision problems as though they were using the Joy of Cooking, throwing techniques or algorithms at a problem without fully understanding whether the technique is applicable to the specific problem at hand. My problem with this book is that it speaks to the Joy-of-Cooking method of computer vision rather than to solving specifically the inverse of the imaging system.


In summary, while I found the book quite useful for its manner of bringing all sorts of discussions and techniques inside a single tent, the lack of in-depth discussion and equations left me, well, disheartened. Mr. Krig set himself an enormous task but failed to deliver as comprehensive a volume as I would have liked to see.

Jim Morgenstern

Image Mining LLC


Last seen: 41 weeks 4 days ago
Level 1: Prestidigitator
Joined: 2014-07-07
Points: 1

Scott's book, Computer Vision Metrics, is a great resource both for experts and for those new to computer vision. But just like any tool in your toolbox, you do need to learn to use it correctly. The author provides numerous lists, descriptions, etc., of various algorithms and techniques used in computer vision. If you are fairly familiar with some of the algorithms, for example, you'll easily be able to learn about similar algorithms, where they differ and where they are the same. If you are a beginner, the way the book is organized will help you classify and organize all these new terms and definitions and what they are used for. But the book does not go into enough depth on each algorithm to let you understand it well enough to implement it. That's OK; that is not the purpose of this book. For that you will have to dive deeper via Wikipedia, the OpenCV documentation, or someplace else.

And you get lots of help on where else to go: the author annotates well over 500 references. I found myself reading many of these, on areas beyond my knowledge or just to get more detail or a more complete understanding of things I thought I knew well. A good example is the data in Table 6-2 that outlines the performance characteristics of SIFT; you really have to read a dozen pages in the original paper to understand where all those numbers come from.

The online version of the bibliography would be great if its entries were links (for those sources that are online), though of course I realize that links eventually break, and keeping them up-to-date would be a large undertaking.

In the last chapter, four hypothetical use cases and their vision pipelines are presented. A few years ago, some of these may have seemed a bit like TV science fiction. But the author shows that, with careful analysis and the right toolset and data, some objects are really closer than they appear to be.

Appendix A (not yet online as I write this) contains a description of a set of synthetic test images and test results which should prove invaluable to both researchers and implementors. I applaud this great effort, as well as the author's suggestion that this type of work will help develop simple intuition about human vs. machine detection of interest points. While there is usually no shortcut to gaining experience, this type of work makes it a bit easier.




Last seen: 1 year 50 weeks ago
Level 1: Prestidigitator
Joined: 2012-12-10
Points: 1

Computer Vision Metrics

Book review by Steve Leibson


This is a very meaty book on computer vision that Scott Krig has written. Rather than come up with new words, I think the best way to give you an overview of this book is to use the author’s own, very accurate words:

“This book is suitable for reference, higher-level courses, and self-directed study in computer vision. This book is aimed at someone already familiar with computer vision and image processing.”

“Readers looking for computer vision “how-to” source code examples, tutorial discussions, performance analysis, and short-cuts will not find them here, and instead should consult the well-regarded library resources, including many fine books, online resources, source code examples, and several blogs. There is nothing better than OpenCV for the hands-on practitioner.”

“…this book is about the feature metrics, showing “what” methods practitioners are using, with detailed observations and analysis of “why” those methods work, with a bias towards raising questions via observations rather than providing too many answers.”

“This book is aimed at a survey level, with a taxonomy and analysis, so no detailed examples of individual use-cases or horse races between methods are included. However, much detail is provided in over 540+ bibliographic references to dig deeper into practical matters.”

So Krig has been very forthcoming about what this book is not. What's left is for me to tell you what the book is. In my opinion, it is a dictionary or encyclopedia of computer vision algorithms. It describes a staggering number of algorithms and tells you what they do. As such, this book is not a straight read, any more than the dictionary is. Unless you are a Scrabble player or a spelling bee contestant, you don't read the dictionary like a book; you use it as a reference and sample it as needed.

That’s what “Computer Vision Metrics” is. It lists hundreds of algorithms by name and explains what they do. When you need to know what a particular algorithm does because you’ve come across the name of the algorithm and nothing more, this book will fill in the details for you.

If you’re new to learning computer vision, this is not a textbook for you. However, you will likely want it on your shelf as a reference due to its comprehensive contents.

Last seen: 5 weeks 1 day ago
Level 1: Prestidigitator
Joined: 2012-10-30
Points: 4

How Do You Know When Your Vision Algorithm Is Done?

One of our key missions here at the Embedded Vision Alliance is to educate developers on how to integrate embedded vision into their products. Since I am not a computer vision expert, I have turned to several reference books to educate myself, so in some ways I am a proxy for engineers who are re-educating themselves on new technology. I have read a couple of books to date: "Heterogeneous Computing with OpenCL" by Gaster, Howes, et al., and "Algorithms for Image Processing and Computer Vision" by J.R. Parker. These are good how-to books with lots of examples and enough computer science and math to jar my faded memories of engineering school.

However, this new book by Scott Krig, which I recently finished, was probably where I should have started. It's not a how-to guide; instead, it's more like a reference guide, taking the reader through the basic principles of computer vision, without dragging the reader into great detail. This includes not only the algorithms themselves but also the implementation options. Overall, Krig provides an excellent overview of the technology, along with lots of sources of information. Specifically, the book has an excellent section on taxonomy, which is important, since having a standard vocabulary and definitions is essential for many of our member companies in the Alliance.

Scott uses an image pipe as a metaphor for classifying vision data, which helps bring into context the underlying hardware and software functions required to build the higher-level algorithms. It also helps one see how different implementation options can be used to trade off performance, power, latency, and other key metrics. The book is strong in helping the reader understand the principles behind "ground truth data" and how to measure algorithm efficacy and accuracy. In many ways, verification of algorithms against the right data sets remains one of the most significant barriers to productization of a prototype, so this particular chapter provides a good overview of the fundamentals and suggests ways to measure the results.

After all, how do you know when your algorithm or application is done? Look, for example, at Google/Nest's recent smoke alarm recall. I have several of these devices in my home; a feature called Nest Wave allows you to use gestures to temporarily suspend the alarm by waving vigorously nearby. Apparently, however, activity near the product during a fire can prevent the alarm from immediately sounding when the Nest Wave feature is enabled. And of course, the default is that the feature is enabled. I don't know the exact reason for the failure, but it sounds like there wasn't adequate verification done on the underlying gesture recognition algorithms. Maybe Nest should have read Scott's book.

Even though the book was sponsored and selected by Intel, it is not an Intel promotion piece. The major implementation focus is on CPUs and GPUs so it's "light" on other platforms like DSPs, FPGAs and dedicated vision processors. Nevertheless, the basic principles still apply to all processing platforms, including heterogeneous multi-processor SoC and system architectures. If you're looking for a good introduction to the world of computer vision and in particular want to avoid future product recalls, I strongly suggest you start here.

Vin Ratford
Executive Director, Embedded Vision Alliance
Investor, Advisor, Co-Founder, Auviz Systems