[DPRG] webcam robot vision limitations

Subject: [DPRG] webcam robot vision limitations
From: David Murphy dfm7 at earthlink.net
Date: Thu Jun 28 12:53:53 CDT 2007

Hi Chris,

On Jun 27, 2007, at 8:09 PM, Chris Jang wrote:


>  The thing I wondered then is how the board interfaces to
> another computer? It needs to be like a GPU card or a programmable
> network camera with the GPU embedded inside it. I know...this is not
> a product yet. It's a prototype.
In their case the CPU is inside the FPGA and, although I don't know
specifically what they did, at the 50,000 ft level the CPU and 'vision
acceleration' hardware share RAM through a memory controller that
manages the requests from the two sources. So the vision accelerator
is reading data from the camera, processing it and dumping the
results into RAM for the CPU to pick up.
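
As a rough illustration of that arrangement (not the HBRC design, whose details are unknown here), the Python sketch below models one RAM shared by two requesters through a toy round-robin memory controller. The class name `MemoryController`, the source labels "accel" and "cpu", and the thresholding step standing in for "vision processing" are all assumptions made up for this example.

```python
from collections import deque


class MemoryController:
    """Toy arbiter (hypothetical): one RAM, two requesters, one request per cycle."""

    def __init__(self, size):
        self.ram = [0] * size
        self.queues = {"accel": deque(), "cpu": deque()}
        self.order = ["accel", "cpu"]  # round-robin priority, rotated every look-up

    def request(self, source, op, addr, value=None):
        """Queue a read or write request from one of the two sources."""
        self.queues[source].append((op, addr, value))

    def cycle(self):
        """Serve at most one pending request, alternating between the sources."""
        for _ in range(len(self.order)):
            source = self.order.pop(0)
            self.order.append(source)
            if self.queues[source]:
                op, addr, value = self.queues[source].popleft()
                if op == "write":
                    self.ram[addr] = value
                    return (source, op, addr, value)
                return (source, op, addr, self.ram[addr])
        return None  # nothing was pending this cycle


def run_demo():
    mc = MemoryController(size=4)
    camera_pixels = [12, 200, 90, 255]  # made-up camera data

    # "Accelerator": threshold each pixel and queue a write of the result.
    for addr, pixel in enumerate(camera_pixels):
        mc.request("accel", "write", addr, 1 if pixel > 128 else 0)
    while mc.queues["accel"]:
        mc.cycle()

    # "CPU": pick the processed results up from the same RAM.
    for addr in range(len(camera_pixels)):
        mc.request("cpu", "read", addr)
    results = []
    while mc.queues["cpu"]:
        results.append(mc.cycle()[3])
    return results


if __name__ == "__main__":
    print(run_demo())  # [0, 1, 0, 1]
```

The point of the sketch is only that neither side talks to the other directly: the accelerator writes, the CPU reads, and the controller decides whose request goes through on each cycle.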


> I know that hardware acceleration is the only way computer vision
> can work. Doing it all in software just will never be enough for
> the foreseeable future. But as a software guy, I am deeply intimidated
> by electronics. That's why I haven't pursued this direction.
>
I know what you mean, but the learning curve is probably less steep
than it appears through the lens of your intimidation. All of the
hardware in the FPGA was written in a language called Verilog, which
uses your standard control flow structures (if/else, for, etc.) plus
some additional structures to facilitate hardware representation. Of
course, you still have to think in terms of hardware to get an
efficient design; if you write algorithms directly, the way you would
in software, you will be disappointed and frustrated by the results.
The code is run through a synthesis tool (the hardware counterpart of
a compiler) that converts it into logic, and the result is then
downloaded to the FPGA. You might find it interesting to look into
this some time in the future.
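
One way to picture "thinking in terms of hardware" is the difference between a software loop and an adder tree. The sketch below is plain Python, not Verilog, and only counts steps; the function names are invented for this illustration. It assumes a non-empty input and that all adders on one tree level would run in the same clock cycle in real hardware.

```python
def sequential_sum(values):
    """Software style: a single accumulator, one addition per step."""
    total, steps = 0, 0
    for v in values:
        total += v
        steps += 1
    return total, steps


def adder_tree_sum(values):
    """Hardware style: each level adds neighbouring pairs side by side.

    Returns the sum and the number of tree levels, which stands in for
    the number of clock cycles if every level takes one cycle.
    """
    level, levels = list(values), 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd-length levels with a zero input
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels += 1
    return level[0], levels


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    print(sequential_sum(data))  # (36, 8)
    print(adder_tree_sum(data))  # (36, 3)
```

Both give the same answer, but the tree needs three levels where the loop needs eight steps. A loop written "directly" tends to synthesize into the first shape; restructuring it into the second is the kind of hardware thinking meant above.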

>
> I'll come clean and say it took me a long time to grasp what SIFT was
> doing. How does this work? Why does it work? I read the papers about it
> and was confused. But one day it clicked.
Well, I wouldn't worry about that. The guys in HBRC doing this are
extremely bright and one of them commented something along the lines
of "I read the papers multiple times and didn't really understand it
until I wrote a program to emulate it."

> I think I read somewhere that no system using interest points has a
> "vocabulary" of more than about a dozen objects it can recognize. So I
> took from this the view that the big problems have yet to be solved.
> I've also seen some research with SLAM using interest points, kind of
> like "visual odometry".
I think these guys feel that they can have quite a bit more than  
that, like maybe 50 to 100 with the hardware they are using.

> Honestly, I don't know what to do next. Vision is so hard and LIDAR
> SLAM works so well that... it is discouraging. I feel that I need a
> huge boost in computer power, like something in the tens or low
> hundreds of gigaflops range. But this is a radical shift in direction.
> It is not embedded at all.
Yeah, I know. But I applaud what you are doing and hope you will keep
at it. In Stanford's Stanley robot, LIDAR by itself was not
sufficient; they combined vision and LIDAR to get the results they
achieved. In addition, one guy who worked on that project came to
talk at the HBRC and he said that although they had multiple (I
forget whether it was 4 or 5) blade servers on board, in the end, for
the competition, they in fact only used one. So they got away with
one CPU for everything. Maybe you want to consider adding some
hardware to the problem: if not an FPGA, perhaps splitting the work
between two CPUs. Let one handle all the vision stuff and offload all
other control and decision making to the other?
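
A minimal sketch of that two-CPU split, using two operating-system processes joined by a queue as a stand-in for two physical processors. Everything here is hypothetical: the "vision" stage just finds the brightest image column in tiny made-up frames, and the "control" stage turns that into a steering word.

```python
import multiprocessing as mp


def brightest_column(frame):
    """Vision stage: index of the column with the largest total brightness."""
    sums = [sum(col) for col in zip(*frame)]
    return sums.index(max(sums))


def steering_command(column, width):
    """Control stage: steer toward the column reported by the vision stage."""
    centre = (width - 1) / 2
    if column < centre:
        return "left"
    if column > centre:
        return "right"
    return "straight"


def vision_worker(frames, out_queue):
    """Runs in its own process: reduce each frame to one small message."""
    for frame in frames:
        out_queue.put((brightest_column(frame), len(frame[0])))
    out_queue.put(None)  # sentinel: no more frames


def control_loop(in_queue):
    """Runs in the main process: consume vision results and decide."""
    commands = []
    while True:
        item = in_queue.get()
        if item is None:
            break
        commands.append(steering_command(*item))
    return commands


if __name__ == "__main__":
    frames = [
        [[0, 0, 9], [0, 1, 8]],  # bright on the right
        [[5, 0, 0], [4, 1, 0]],  # bright on the left
        [[0, 7, 0], [1, 9, 1]],  # bright in the middle
    ]
    queue = mp.Queue()
    worker = mp.Process(target=vision_worker, args=(frames, queue))
    worker.start()
    print(control_loop(queue))  # ['right', 'left', 'straight']
    worker.join()
```

The design choice worth noting is that only a tiny result crosses the boundary, not the image itself, so the link between the two processors can be slow (a serial line would do) without starving the control side.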

Anyway, keep pluggin'

David

