Subject: [DPRG] webcam robot vision limitations
From: David Murphy
dfm7 at earthlink.net
Date: Wed Jun 27 13:51:05 CDT 2007
Hi Chris,
I want to share some work with you that you might find interesting
and relevant.
It will involve some background, so bear with me.
About 18 months to 2 years ago, a group of folks in the Home Brew
Robotics Club (S.F. Bay area, Ingolf Sander, John Slater, Brandon
Blodget, Dave Wyland) became interested in object recognition for
robotics and in particular the SIFT algorithm. If you give me some
'artistic license' here for the description as I'm not sufficiently
familiar with it to be accurate, it works something like this.
Scan a scene looking for 'key-points'. A key-point is basically an
intersection of lines, or a place where there is a sharp curve in a
line, plus the gradient of illumination in the immediate vicinity of
the intersection.
An object is known by the set of key-points extracted for it during
training.
After the key-points are extracted from the scene, they are compared
against the database looking for the highest number of matches, and
this gives you the object(s) in the scene.
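To make the matching step concrete, here is a toy sketch in Python. It
is not the SIFT matcher itself: the descriptors are made-up 2-D points
(real SIFT descriptors are 128-dimensional), and the names identify,
database and max_distance are my own, purely for illustration. Each
scene key-point votes for the object that owns its nearest stored
key-point, and the object with the most votes wins.

```python
from collections import Counter
from math import dist

def identify(scene_descriptors, database, max_distance=0.5):
    """Return (label, votes) for the object with the most matched key-points."""
    votes = Counter()
    for descriptor in scene_descriptors:
        # Find the closest stored descriptor across all trained objects.
        best_label, best_dist = None, float("inf")
        for label, stored_descriptors in database.items():
            for stored in stored_descriptors:
                distance = dist(descriptor, stored)
                if distance < best_dist:
                    best_label, best_dist = label, distance
        # Only count matches that are close enough (assumed threshold).
        if best_label is not None and best_dist <= max_distance:
            votes[best_label] += 1
    return votes.most_common(1)[0] if votes else (None, 0)

# Hypothetical training data: object name -> list of key-point descriptors.
database = {
    "mug": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "ball": [(5.0, 5.0), (6.0, 5.0)],
}
scene = [(0.1, 0.0), (0.9, 0.1), (5.1, 5.0), (9.0, 9.0)]
result = identify(scene, database)
```

The brute-force search is the part that would stay on the CPU in the
split described further down; a real system would use a smarter index
than three nested loops.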
This algorithm is somewhat tolerant of the rotation of an object and
changes in illumination from the training position and apparently
mimics the activity of some cells in the visual cortex of mammals.
One of the folks working on this project has worked with FPGAs and a
lot of what is going on here in the early stages is very amenable to
implementation in hardware (extracting lines, intersections,
computing gradients, and the like). So their approach was to put all
of this initial stuff into hardware and let the CPU worry about
database matching and decision making.
The FPGA guy in this group, along with another fellow in the club
had, a few years ago, built a board for robotics projects that had a
Xilinx Spartan FPGA; they programmed the FPGA with a MicroBlaze CPU
(a soft core available for Xilinx) and ran a version of Linux on it.
I think they intended to commercialize it, but for whatever reason
did not.
Ok, now these two groups have gotten together and formed a company
called Roboticore to make this stuff available.
I saw the demo about a year ago before they formed the company. They
had much of the hardware running and in real time could extract the
key-points from an image without stressing the FPGA. They demo'd it
by outputting the results to a frame buffer and displaying them on a
monitor. They could wave the camera at the audience and you could see
the results on the monitor in real time with no lag.
I did not see their recent presentation at the HBRC, but you can see
it at the HBRC website, http://www.hbrobotics.org/HBRC_Presentations.htm
(look for FPGA vision).
I guess the point is that a lot of the low level vision stuff is
repetitive and uniform and hence amenable to hardware. Putting this
stuff into an FPGA (you can get a development board for a spartan
FPGA for about $100.00) would free up a lot of CPU cycles for other
things.
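As a rough illustration of what "repetitive and uniform" means here,
the sketch below computes a simple central-difference gradient in
Python. This is my own toy example, not Roboticore's design: the
function name gradients and the tiny test image are assumptions. The
point is that every interior pixel gets exactly the same two
subtractions, which is the kind of work that maps well onto parallel
hardware.

```python
def gradients(image):
    """Central-difference gradient (gx, gy) for each interior pixel.

    `image` is a list of equal-length rows of brightness values.
    Border pixels are left at 0.
    """
    height, width = len(image), len(image[0])
    gx = [[0] * width for _ in range(height)]
    gy = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # The same fixed operation is applied at every pixel.
            gx[y][x] = image[y][x + 1] - image[y][x - 1]
            gy[y][x] = image[y + 1][x] - image[y - 1][x]
    return gx, gy

# Hypothetical 3x4 image with a vertical edge down the middle.
image = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [0, 0, 10, 10],
]
gx, gy = gradients(image)
```

On a CPU the two nested loops run one pixel at a time; in an FPGA the
same subtractions can be done as the pixels stream in from the camera.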
Cheers,
David
On Jun 26, 2007, at 9:16 PM, Chris Jang wrote:
> Hello, I'm not sure if anyone is interested in this...
>
> But it's something different to discuss.
>
> I have a small robot with a VGA webcam and ARM9 PC104 SBC. Up until
> last night, the performance out of this combination has been
> embarrassing - frame rates of around 1.2 fps with a low percentage
> of corrupted images.
>
> After lots of experimentation, the webcam now runs at 15 fps with
> long sequences of around 10 seconds without any image corruption.
> This includes V4L capture, JPEG decode, and saving to SD flash. I
> hope that with some more twiddling, there will be no more corruption.
>
> Here's the trick (I believe) - cameras have a native frame rate.
> If whatever consumes and processes the video does not operate at
> pretty much exactly this speed, then output is prone to corruption
> due to device/driver sync issues. So I had to put an adaptive
> spinning delay loop which adjusts depending on the measured time
> between each frame capture. This adaptive delay tries to hold the
> capture rate at 15 fps which matches the webcam.
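For anyone following along, the adaptive delay described above might
look roughly like this in Python. This is only a sketch under my own
assumptions, not Chris's actual code: grab_frame is a hypothetical
stand-in for the V4L capture call, and the gain of 0.5 is an arbitrary
choice.

```python
import time

TARGET_PERIOD = 1.0 / 15.0  # seconds per frame at 15 fps

def adjust_delay(delay, measured_period, target=TARGET_PERIOD, gain=0.5):
    """Nudge the per-frame delay so the measured period approaches the target."""
    error = target - measured_period  # positive: running too fast, wait longer
    return max(0.0, delay + gain * error)

def paced_capture(grab_frame, frames, target=TARGET_PERIOD):
    """Call grab_frame `frames` times, spinning between calls to hold the rate."""
    delay = 0.0
    last = time.monotonic()
    for _ in range(frames):
        grab_frame()  # hypothetical capture/decode/save step
        deadline = time.monotonic() + delay
        while time.monotonic() < deadline:
            pass  # spin rather than sleep, as in the description above
        now = time.monotonic()
        # Measure the real frame-to-frame time and correct the delay.
        delay = adjust_delay(delay, now - last, target)
        last = now
    return delay
```

The delay grows when frames arrive faster than the target period and
shrinks (never below zero) when they arrive slower.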
>
> Ok, so boring...here's the interesting part.
>
> Now that the basic stuff is worked out, the amount of CPU time
> available for processing each frame is known. On the 200 MHz ARM9
> based computer board I'm using, roughly half the time is spent
> capturing images, decoding and saving them to SD flash (some time
> could be recovered by not saving images - but then debugging is
> impossible as we can't know what the robot saw). The other half of
> the time is available. That's roughly 30 milliseconds at 15 times
> each second, once for each image frame from the webcam.
>
> I think this is enough time for pixel based image segmentation
> (fancy term for color blobs). There is enough time for some
> statistics and simple morphological filters. But there is not
> enough time for any feature based techniques (no convolution).
>
> Stanford's DARPA Grand Challenge vehicle dedicated one 1.6 GHz
> Pentium M computer to 320x240 monocular RGB video. They did color
> based image segmentation with some morphological filtering. So even
> with over 10x the power of a 200 MHz ARM9, they were still limited
> to very sophisticated color blob detection.