Stereo Disparity Processor
oznelig

Hi,

My background is in image compression, where I have previously designed JPEG, MPEG-4 and H.264 processors in hardware (HDL coding for ASICs and FPGAs). I have now started to look at computer vision.

Although this might look like a hardware question, it really belongs here (see below). Besides, the algorithm in question can probably be implemented very efficiently on a processor with SSE instructions or similar.

Basically, I have come up with a stereo disparity matching algorithm that accepts a rectified stereo pair as input. The quality of the algorithm is similar to block-based SAD matching. For example, the algorithm correctly calculates over 80% of the disparities in the two well-known stereo pairs "Teddy" and "Cones". See:
http://vision.middlebury.edu/stereo/data/scenes2003/

To prove to myself how simple the algorithm really is, I have coded a core in VHDL and targeted the Xilinx Artix-7 family. The demo core consists of two video raster inputs (from the left and right rectified images), buffers, and 8 stereo disparity matching units. Each unit can check one disparity per clock cycle. In Artix-7 speed grade 3, the core reaches 250 MHz and therefore 2 billion disparities/sec (there are basically no dead cycles). The matching should be relatively immune to affine changes in illumination.

The footprint of the core is only 667 slices + 10 RAMB36 + 2 DSP48, which is only ~13% of the area of the smallest Artix-7 part. From what I can see in the available literature, this is 6-8x smaller than similarly performing cores. With 2 billion disparities per second it can do better than, for example, 640x480 @ 60 fps with 100 maximum disparity. The core is not ready for shipping as it needs documentation and more testing. It also needs a small addition in the form of a median filter. Since the problem is embarrassingly parallel, the core can be scaled for higher performance.
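
To make the comparison with block SAD concrete, here is a minimal software sketch of the plain fixed-window SAD search I use as a quality reference; it is only the textbook baseline, not the core itself, and it assumes the rectified images are 8-bit grayscale NumPy arrays:

import numpy as np

def sad_block_match(left, right, max_disp=100, block=8):
    # Brute-force winner-take-all search: for every pixel of the left image,
    # compare the block centred on it against every candidate block on the
    # same right scan line (within max_disp) and keep the disparity with the
    # lowest sum of absolute differences.
    h, w = left.shape
    half = block // 2
    left = left.astype(np.int32)
    right = right.astype(np.int32)
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half, x - half:x + half]
            best_d, best_cost = 0, np.iinfo(np.int32).max
            for d in range(min(max_disp, x - half) + 1):
                cand = right[y - half:y + half, x - d - half:x - d + half]
                cost = int(np.abs(ref - cand).sum())
                if cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y, x] = best_d
    return disp

Each inner iteration is one 8x8 block comparison, the same unit of work that each of the 8 hardware matching units performs per clock cycle.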

Now to my question, keeping in mind that computer vision is not my area.

Is this potentially a valid product? I have the book "Embedded Computer Vision" (Springer). In the chapter "SAD-based Stereo Matching Using FPGAs", it says that stereo disparity extraction "has been considered computationally too complex for low cost solutions". This core would certainly allow very low-cost solutions, since it is many times smaller than the solution proposed in the same book and in other papers.
However, 3D sensors like PrimeSense (http://www.primesense.com/) are now emerging. Do you think there's still a market for passive stereo cameras?

Thank you very much in advance.

goksel

Hi, stereo disparity computation at low power consumption is a very intriguing topic. I would encourage you to prepare a submission for the upcoming IEEE Embedded Vision Workshop (http://cvisioncentral.com/evw2014), where a Program Committee with a mix of academic and industrial members would review and discuss the technical merits of the approach. It is perfectly fine to propose an application-specific stereo solution, so consider testing it out on various problems. Please note: since a paper submission would constitute a public disclosure, you may want to secure any IP by filing patent applications first, so that you can protect it on the way to a product.

oznelig

Thank you for this information; I will consider it, even though the date is not really convenient.

I'm also working on other applications of this thing (see my reply below).

Do you know of a place on the web where the most appropriate conferences are listed together? That way I might be able to choose one at a later date.

twilson

Hi, I wanted to thank you for sharing the results of your disparity mapping implementation. I would like to hazard some answers to your questions:

1) Is what you developed a valid product?

2) With other 3D sensor technologies available, are passive stereo cameras and disparity mapping still pertinent?

My thoughts on 1 and 2 below:

1) It's difficult to judge whether it's a valid design/product based on the information so far. The real-time performance (in MDE/s) seems very good and you should definitely be encouraged by that. However, it would be good to hear how the algorithm performs using, for example, the percentage of bad pixels in non-occluded areas on the Middlebury test set. I read the 80% figure you provided above, but I'm not sure how that maps to the percent-bad-pixel measures in the Middlebury evaluation table. You mention it compares to a SAD block matching approach, but typically a straightforward local dense block matching approach (I assume a constant support window size in your implementation?) doesn't fare as well as more advanced global, semi-global or more sophisticated local dense approaches (e.g. adaptive window, etc.). The advantage of a software-programmable implementation is that it allows more tuning of the algorithm to improve its 3D depth map quality (playing with adaptive support window parameters, etc.). It may be that a hardware implementation of a local block matching approach is too limited despite its real-time performance; however, that depends entirely on the relative quality of its 3D depth maps. I would encourage you to find out more about its 3D depth map output quality and look for optimizations as required.

2) Definitely, stereo vision will continue to play a key role as a 3D sensing technology. Your reference is partly correct in that disparity mapping has historically been too computationally intensive for conventional architectures to do in real time. However, with new image cognition processing cores driving new levels of performance per area and per watt (from CogniVue and others), that is changing. In the range of 3D sensing technologies available, stereo vision offers unique value propositions in terms of range, flexibility to lighting conditions, cost and power. There's a table in my October EVA Summit presentation that compares the different 3D sensing technologies, and stereo has important characteristics that probably make it the only practical approach for many use cases. These advantages are driving strong interest in stereo vision for 3D sensing amongst the biggest names in consumer verticals.


oznelig

Thank you for your useful considerations. Regarding the quality of the result, I have marked as wrong all disparities that differ from the ground truth by more than 1 pixel. I noticed that, for those images in the Middlebury files, the ground truth seems to have 2 extra bits (probably subpixel matches, so one pixel is scaled by 4). So, for every pixel in the image I do the following test:

IF abs(my_disparity*4 - ground_truth) <= 4 THEN
    correct pixel
ELSE
    wrong pixel
END IF

When I do that, I get over 80% correct pixels for "Teddy" and "Cones", which is in line with other SAD block matching plus edge-enhancing algorithms that I have seen. In fact, most of the FPGA articles I have found use fixed-block SAD matching plus some pre-processing.
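
The same check over a whole image pair can be sketched in a few lines; the scale factor of 4 and the use of 0 to mark unknown ground-truth pixels are my reading of the Middlebury 2003 files:

import numpy as np

def percent_correct(my_disp, ground_truth, scale=4, thresh_px=1):
    # ground_truth is assumed to hold disparity * scale, with 0 for unknown.
    gt = ground_truth.astype(np.int64)
    mine = my_disp.astype(np.int64) * scale
    valid = gt > 0
    correct = (np.abs(mine - gt) <= thresh_px * scale) & valid
    return 100.0 * correct.sum() / valid.sum()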

Regarding your other comments, I realise that I didn't write this properly: I have not found a novel algorithm for stereo matching at all. What I have found is a (hopefully novel) very simple and fast way to match two blocks of pixels, with the option of making the match relatively immune to affine changes in illumination (a*x+b gain and offset are automatically corrected). Then I decided to test it and picked the simplest algorithm that does stereo matching: compare the block of pixels centered on every pixel of the left scan line with each candidate block on the right (within the maximum disparity), keep the scores and choose the best match. So, the algorithm is not new, just the way to match the blocks.
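
To give an idea of what I mean by immunity to affine illumination changes, here is a minimal sketch of zero-mean normalized cross-correlation (ZNCC), a well-known block score with the same a*x+b invariance; it is not my matching method, just a standard point of reference:

import numpy as np

def zncc(block_a, block_b, eps=1e-12):
    # Subtracting the mean removes the offset b; dividing by the norms
    # removes the gain a, so the score is unchanged under a*x + b.
    a = block_a.astype(np.float64).ravel()
    b = block_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + eps
    return float(np.dot(a, b) / denom)  # 1.0 means a perfect match

In the winner-take-all search this score would simply be maximised instead of minimising the SAD.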

So, those 667 slices of Xilinx fabric + memories contain two raster-to-block converters (left and right) and then 8 block matching units, each matching a block per clock cycle. The block size is fixed at 8x8 pixels. At 250 MHz that gives you 2 billion 8x8 block comparisons per second.

Some considerations/consequences of the above:

1) I was looking at this paper from Willow Garage and realized that even a stupid algorithm like this can be greatly enhanced with an infrared LED:

http://www.willowgarage.com/sites/default/files/ptext.pdf

2) I seem to remember that one of the top-performing algorithms on the Middlebury evaluation table uses the census transform as its basic matching block (see the sketch after this list). Maybe all I need to do to get a much more accurate algorithm than my current one is to substitute my matching for the census matching (at extra gate cost compared to the current one, of course). The same goes for any other stereo matching algorithm that uses block matching at its core.

3) I haven't done any work on this yet, but matching blocks of pixels in a way that is immune to affine transforms in the illumination seems, to me, a bit like matching keypoints in algorithms like SURF and SIFT. Being able to match a few billion keypoints/s should be very useful in recognition, automotive, robotics, etc.

4) Normally the stuff I implement in hardware doesn't map well to software, but in this case I think SSE-type instructions would do a very good job.
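
Here is the sketch referred to in point 2: a plain census transform with Hamming-distance matching, again just the textbook version for illustration (5x5 window; image borders wrap around for brevity):

import numpy as np

def census_transform(img, radius=2):
    # One bit per neighbour in the (2*radius+1)^2 window, set when the
    # neighbour is darker than the centre pixel; 5x5 gives 24 bits, which
    # fits comfortably in a uint32 code per pixel.
    codes = np.zeros(img.shape, dtype=np.uint32)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            codes = (codes << np.uint32(1)) | (neighbour < img).astype(np.uint32)
    return codes

def census_cost(code_a, code_b):
    # The matching cost between two pixels is the Hamming distance
    # between their census codes.
    return bin(int(code_a) ^ int(code_b)).count("1")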

So, thank you for your comments; they made me better understand that, rather than being a product in itself, this could be a precursor to a multitude of computer vision products. If I find the time, maybe I should try to show up at one of the Embedded Vision Summits and try to better understand the practical applications of computer vision by talking to the other attendees.