Subject: [DPRG] webcam robot vision limitations
From: David Murphy
dfm7 at earthlink.net
Date: Thu Jun 28 12:53:53 CDT 2007
Hi Chris,
On Jun 27, 2007, at 8:09 PM, Chris Jang wrote:
> The thing I wondered then is how the board interfaces to
> another computer. It needs to be like a GPU card or a programmable
> network camera with the GPU embedded inside it. I know...this is not
> a product yet. It's a prototype.
In their case the CPU is inside the FPGA. Although I don't know
specifically what they did, at the 50,000-foot level the CPU and the
'vision acceleration' hardware share RAM through a memory controller
that arbitrates the requests from the two sources. So the vision
accelerator reads data from the camera, processes it, and dumps the
results into RAM for the CPU to pick up.
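To make the shared-RAM picture concrete, here is a toy Python model of two bus masters sharing one memory. Since the actual design isn't known, everything here is an assumption: the "accelerator" just thresholds pixels, the arbiter uses a simple round-robin policy granting one access per cycle, and the function name is made up for illustration.

```python
from collections import deque


def simulate_shared_ram(frame, threshold=128):
    """Toy model of two bus masters sharing one RAM through an arbiter.

    The 'accelerator' thresholds each camera pixel and writes the result;
    the 'CPU' reads results back. One memory access is granted per cycle,
    and priority alternates (round-robin) between the two masters.
    """
    ram = [0] * len(frame)
    # Pending accelerator writes: (address, value), produced in address order.
    accel_writes = deque((addr, 1 if px >= threshold else 0)
                         for addr, px in enumerate(frame))
    cpu_results = []
    next_read = 0   # next address the CPU wants
    written = 0     # number of results the accelerator has stored so far
    turn = 0        # 0 = accelerator has priority, 1 = CPU has priority
    cycles = 0
    while next_read < len(frame):
        cycles += 1
        cpu_ready = next_read < written  # CPU only reads finished results
        accel_ready = bool(accel_writes)
        if accel_ready and (turn == 0 or not cpu_ready):
            addr, value = accel_writes.popleft()
            ram[addr] = value
            written += 1
        elif cpu_ready:
            cpu_results.append(ram[next_read])
            next_read += 1
        turn ^= 1   # hand priority to the other master next cycle
    return cpu_results, cycles


results, cycles = simulate_shared_ram([10, 200, 130, 50])
print(results, cycles)  # [0, 1, 1, 0] 8
```

The point of the arbiter is that neither side has to know the other exists; each just issues requests and waits for a grant. A real controller would also deal with bursts, refresh, and priorities, which this sketch ignores.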
> I know that hardware acceleration is the only way computer vision
> can work. Doing it all in software just will never be enough for
> the foreseeable future. But as a software guy, I am deeply intimidated
> by electronics. That's why I haven't pursued this direction.
>
I know what you mean, but the learning curve is probably less steep
than it appears through the lens of your intimidation. All of the
hardware in the FPGA was written in a language called Verilog, which
uses your standard control-flow structures (if/else, for, etc.) plus
some additional structures to facilitate hardware representation. Of
course, you still have to think in terms of hardware to get an
efficient design; if you write algorithms directly, you will be
disappointed and frustrated by the results. The code is run through a
synthesis tool (effectively a compiler) that converts it into logic,
and the result is downloaded to the FPGA. You might find it
interesting to look into this some time in the future.
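As a rough illustration of "thinking in terms of hardware," here is the same 3-pixel box filter written two ways in Python (not Verilog). The first indexes the whole row the way software normally does. The second models a streaming pipeline: one pixel arrives per clock tick and a small shift register holds the window. The function names are hypothetical, and this models the idea only; it is not how any particular FPGA design was written.

```python
def box3_direct(pixels):
    """Software view: the whole row is in memory, so index it freely."""
    return [pixels[i - 1] + pixels[i] + pixels[i + 1]
            for i in range(1, len(pixels) - 1)]


def box3_streaming(pixels):
    """Hardware-style view: one pixel arrives per clock tick.

    A 3-stage shift register holds the current window, and a sum is
    produced every tick once the register is full. Nothing ever looks
    back at the whole row.
    """
    taps = [0, 0, 0]  # models three registers clocked together
    filled = 0        # how many stages hold real pixels so far
    out = []
    for px in pixels:            # one loop iteration = one clock tick
        taps = [px] + taps[:2]   # shift the new pixel in; the oldest falls off
        filled = min(filled + 1, 3)
        if filled == 3:
            out.append(taps[0] + taps[1] + taps[2])  # combinational adder
    return out


row = [1, 2, 3, 4, 5]
print(box3_direct(row), box3_streaming(row))  # [6, 9, 12] [6, 9, 12]
```

Both give the same answer, but only the second maps naturally onto registers and adders: it needs three pixels of storage instead of a whole frame buffer, and it produces one result per clock.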
>
> I'll come clean and say it took me a long time to grasp what SIFT was
> doing. How does this work? Why does it work? I read the papers about
> it and was confused. But one day it clicked.
Well, I wouldn't worry about that. The guys in HBRC doing this are
extremely bright, and one of them commented something along the lines
of, 'I read the papers multiple times and didn't really understand it
until I wrote a program to emulate it.'
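In the spirit of "writing a program to emulate it," here is a much-simplified sketch of only the first stage of SIFT: blur the signal at a ladder of scales, subtract neighbouring blurs (difference of Gaussians), and keep points that are extrema in both position and scale. To keep it short and dependency-free it works on a 1-D signal rather than an image, and it omits SIFT's octaves, sub-pixel refinement, edge rejection, orientation assignment, and descriptors. The scale values (ratio about 1.6) are illustrative choices.

```python
import math


def gaussian_kernel(sigma):
    """Sampled, normalised Gaussian truncated at about 3 sigma."""
    radius = max(1, int(3 * sigma))
    k = [math.exp(-(x * x) / (2 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def blur(signal, sigma):
    """1-D convolution with edge clamping."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            acc += w * signal[min(max(i + j - r, 0), n - 1)]
        out.append(acc)
    return out


def dog_extrema(signal, sigmas=(1.0, 1.6, 2.56, 4.1, 6.55)):
    """Return (position, sigma) pairs where the difference-of-Gaussians
    response is a strict extremum among its 8 neighbours in position
    and scale (2 on its own scale, 3 on each adjacent scale)."""
    blurred = [blur(signal, s) for s in sigmas]
    dogs = [[b - a for a, b in zip(blurred[i], blurred[i + 1])]
            for i in range(len(blurred) - 1)]
    points = []
    for s in range(1, len(dogs) - 1):        # needs a scale above and below
        for x in range(1, len(signal) - 1):  # needs a neighbour on each side
            v = dogs[s][x]
            neighbours = [dogs[ds][x + dx]
                          for ds in (s - 1, s, s + 1)
                          for dx in (-1, 0, 1)
                          if (ds, dx) != (s, 0)]
            if v > max(neighbours) or v < min(neighbours):
                points.append((x, sigmas[s]))
    return points


# A bright 7-pixel blob centred at index 23 on a dark background.
signal = [0.0] * 20 + [10.0] * 7 + [0.0] * 20
print(dog_extrema(signal))
```

The blob's centre should show up as a DoG minimum at the scale that best matches its width. That "pick the scale where the response peaks" step is what makes SIFT keypoints roughly scale-invariant.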
> I think I read somewhere that no system using interest points has a
> "vocabulary" of more than about a dozen objects it can recognize. So I
> took from this the view that the big problems have yet to be solved.
> I've also seen some research with SLAM using interest points, kind of
> like "visual odometry".
I think these guys feel that they can handle quite a bit more than
that, maybe 50 to 100 objects with the hardware they are using.
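For a sense of where the "vocabulary" limit comes from, here is a minimal sketch of recognizing an object by matching interest-point descriptors against a database and letting matches vote. It uses brute-force nearest-neighbour search with Lowe's ratio test (reject a match unless the best distance is clearly smaller, here below 0.8 times the distance to the closest descriptor from a *different* object). The database format, tiny 3-D descriptors, and the `min_votes` threshold are hypothetical; real SIFT descriptors have 128 dimensions, and real systems replace the linear scan with approximate search such as k-d trees. This is not the HBRC group's code.

```python
import math


def dist(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognize(query_descs, database, ratio=0.8, min_votes=2):
    """database: list of (object_name, descriptor) pairs (hypothetical format).

    Each query descriptor votes for the object owning its nearest database
    descriptor, but only if it passes the ratio test against the nearest
    descriptor of any other object. Returns the winning name, or None.
    """
    votes = {}
    for q in query_descs:
        best_name, best_vec = min(database, key=lambda item: dist(q, item[1]))
        d1 = dist(q, best_vec)
        rivals = [dist(q, vec) for name, vec in database if name != best_name]
        if rivals and d1 >= ratio * min(rivals):
            continue  # ambiguous match: reject it
        votes[best_name] = votes.get(best_name, 0) + 1
    if not votes:
        return None
    name, count = max(votes.items(), key=lambda kv: kv[1])
    return name if count >= min_votes else None


db = [("mug", (0.0, 0.0, 1.0)), ("mug", (0.0, 1.0, 0.0)),
      ("book", (1.0, 0.0, 0.0)), ("book", (1.0, 1.0, 1.0))]
query = [(0.0, 0.1, 0.9), (0.0, 0.9, 0.1), (0.5, 0.5, 0.5)]
print(recognize(query, db))  # mug
```

Each added object adds more descriptors that every query must be compared against and more chances for ambiguous matches, which is one reason the number of recognizable objects grows with available compute.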
> Honestly, I don't know what to do next. Vision is so hard and LIDAR
> SLAM works so well that... it is discouraging. I feel that I need a
> huge boost in computer power, like something in the tens or low
> hundreds of gigaflops range. But this is a radical shift in direction.
> It is not embedded at all.
Yeah, I know. But I applaud what you are doing and hope you will keep
at it. On Stanford's Stanley robot, LIDAR by itself was not
sufficient; they combined vision and LIDAR to get the results they
achieved. In addition, one guy who worked on that project came to
talk at the HBRC, and he said that although they had multiple blade
servers on board (I forget whether it was 4 or 5), in the end they
actually used only one for the competition. So they got away with
one CPU for everything. Maybe you want to consider adding some
hardware to the problem: if not an FPGA, then perhaps splitting the
work between two CPUs. Let one handle all the vision stuff and
offload all other control and decision making to the other?
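Here is a minimal sketch of that split, with a "vision" worker that reduces each frame to a small result and a "control" worker that only sees those results. Python threads stand in for the two CPUs, so this shows the structure, not a real speedup (threads share one interpreter). On a robot the two halves would run on separate processors talking over some link, and `multiprocessing.Queue` would be the closest drop-in for separate processes. The brightest-column "vision" step and the left/right/straight commands are hypothetical placeholders.

```python
import queue
import threading


def vision_worker(frames, out_q):
    """Stands in for the vision CPU: turns each frame into a compact result."""
    for frame in frames:
        # Hypothetical vision step: column of the brightest pixel.
        target_col = max(range(len(frame)), key=lambda i: frame[i])
        out_q.put(target_col)
    out_q.put(None)  # sentinel: no more frames


def control_worker(in_q, center, commands):
    """Stands in for the control CPU: consumes vision results and decides."""
    while True:
        col = in_q.get()
        if col is None:
            break
        if col < center:
            commands.append("left")
        elif col > center:
            commands.append("right")
        else:
            commands.append("straight")


def run(frames):
    q = queue.Queue(maxsize=4)  # bounded, so vision can't run far ahead
    commands = []
    center = len(frames[0]) // 2
    t_vision = threading.Thread(target=vision_worker, args=(frames, q))
    t_control = threading.Thread(target=control_worker,
                                 args=(q, center, commands))
    t_vision.start()
    t_control.start()
    t_vision.join()
    t_control.join()
    return commands


frames = [[0, 9, 0, 0, 0], [0, 0, 9, 0, 0], [0, 0, 0, 0, 9]]
print(run(frames))  # ['left', 'straight', 'right']
```

The useful property is the narrow interface: the control side never touches pixels, only small messages, so the link between the two processors can be modest and each side can be developed and tested on its own.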
Anyway, keep pluggin'
David