[DPRG] webcam robot vision limitations
Subject: [DPRG] webcam robot vision limitations
From: Chris Jang
cjang at ix.netcom.com
Date: Wed Jun 27 22:09:42 CDT 2007
Thanks David,
Ed Okerson mentioned this about six months ago in conversation at
a DPRG meeting: a group in the Bay Area had hardware-based SIFT
working. I think he sent a link to Roboticore before, as I recall
seeing this. What I wondered then was how the board interfaces to
another computer. It would need to be something like a GPU card, or a
programmable network camera with the GPU embedded inside it. I
know... this is not a product yet. It's a prototype.
I know that hardware acceleration is the only way computer vision
can work. Doing it all in software will not be enough for the
foreseeable future. But as a software guy, I am deeply intimidated
by electronics. That's why I haven't pursued this direction.
COTS GPU acceleration does look interesting. It shows up in
published research on computing Gaussian convolutions. SIFT uses a
difference-of-Gaussians approximation to the Laplacian in scale space
(the interest points are local extrema). So being able to compute
Gaussian convolutions quickly at many different scales matters a
great deal.
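A minimal pure-Python sketch of that scale-space step follows. It assumes a grayscale image stored as a list of rows. The image is blurred at a geometric series of scales (sigma0 = 1 and k = sqrt(2) are illustrative choices, not values from this thread), neighboring levels are subtracted to get difference-of-Gaussian images, and only points that are larger or smaller than all 26 neighbors in position and scale are kept. It leaves out Lowe's octave downsampling, sub-pixel refinement, and edge-response rejection.

```python
import math

def gaussian_kernel(sigma):
    """1-D Gaussian kernel, truncated at 3 sigma and normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur(img, sigma):
    """Separable Gaussian blur: a row pass then a column pass, borders clamped."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    h, w = len(img), len(img[0])
    tmp = [[sum(k[i] * row[min(max(x + i - r, 0), w - 1)] for i in range(len(k)))
            for x in range(w)] for row in img]
    return [[sum(k[i] * tmp[min(max(y + i - r, 0), h - 1)][x] for i in range(len(k)))
             for x in range(w)] for y in range(h)]

def dog_stack(img, sigma0=1.0, k=math.sqrt(2), levels=5):
    """Blur at sigma0 * k**i, then D_i = G_(i+1) - G_i (difference of Gaussians)."""
    g = [blur(img, sigma0 * k ** i) for i in range(levels)]
    return [[[b - a for a, b in zip(ra, rb)] for ra, rb in zip(g[i], g[i + 1])]
            for i in range(levels - 1)]

def local_extrema(dogs, threshold=0.01):
    """(scale, y, x, value) for points above or below all 26 scale-space neighbors."""
    found = []
    for s in range(1, len(dogs) - 1):
        h, w = len(dogs[s]), len(dogs[s][0])
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                v = dogs[s][y][x]
                if abs(v) < threshold:
                    continue  # ignore weak, low-contrast responses
                neighbors = [dogs[s + ds][y + dy][x + dx]
                             for ds in (-1, 0, 1) for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                             if (ds, dy, dx) != (0, 0, 0)]
                if v > max(neighbors) or v < min(neighbors):
                    found.append((s, y, x, v))
    return found
```

Each blur is just a pair of 1-D passes repeated at every scale, which is the regular, data-parallel kind of work that maps well onto a GPU.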
I'll come clean and say it took me a long time to grasp what SIFT was
doing. How does this work? Why does it work? I read the papers about it
and was confused. But one day it clicked.
The Laplacian of a smoothed image is zero on an edge. On either side
of the edge, it has opposite signs. As the scale of the Gaussian
convolution operator increases, the response to the edge spreads
farther out from it. So if you have a circular shape in an image and
a Gaussian convolution operator of roughly the same size, there will
be an extremal point near the center of the shape. These are the
potential interest points.
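Both effects can be checked in a 1-D toy sketch (1-D stands in for the 2-D image purely to keep it short). The discrete second difference of a Gaussian-smoothed step changes sign across the edge, and for a bright bar the scale-normalized response sigma^2 * |Laplacian| at the bar's center is largest when sigma is close to the bar's half-width. The function names are illustrative.

```python
import math

def gaussian_smooth(signal, sigma):
    """Convolve a 1-D signal with a normalized Gaussian truncated at 3 sigma (ends clamped)."""
    r = max(1, int(math.ceil(3 * sigma)))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(k)
    n = len(signal)
    return [sum(k[i + r] * signal[min(max(x + i, 0), n - 1)] for i in range(-r, r + 1)) / total
            for x in range(n)]

def second_difference(signal):
    """Discrete 1-D Laplacian f[x-1] - 2 f[x] + f[x+1]; the two end samples are set to 0."""
    inner = [signal[x - 1] - 2 * signal[x] + signal[x + 1] for x in range(1, len(signal) - 1)]
    return [0.0] + inner + [0.0]

def best_scale(signal, center, sigmas):
    """The sigma whose scale-normalized response sigma^2 * |Laplacian| at `center` is largest."""
    return max(sigmas,
               key=lambda s: abs(s * s * second_difference(gaussian_smooth(signal, s))[center]))

step = [0.0] * 50 + [1.0] * 50                       # edge between samples 49 and 50
lap = second_difference(gaussian_smooth(step, 3.0))  # positive left of the edge, negative right
bar = [1.0 if 45 <= x <= 55 else 0.0 for x in range(100)]  # 11-sample bright bar
scale = best_scale(bar, 50, range(1, 13))            # lands near the half-width, about 5.5
```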
As you mentioned, each interest point is described by the gradients
in a neighborhood around it (this works out to 128 scalars, I think).
This helps distinguish each point and simplifies the later
classification of these clouds of interest points into objects.
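In Lowe's SIFT descriptor the 128 numbers come from a 4 x 4 grid of cells around the key-point, each holding an 8-bin histogram of gradient orientations (4 x 4 x 8 = 128). The sketch below is a deliberately simplified version. It assumes a 16 x 16 grayscale patch that has already been cut out around the key-point, and it skips the Gaussian weighting, rotation to the dominant orientation, trilinear interpolation, and clipping that the real descriptor uses.

```python
import math

def simple_sift_descriptor(patch):
    """Simplified 128-D SIFT-style descriptor for a 16x16 patch (list of rows)."""
    size = 16
    assert len(patch) == size and all(len(row) == size for row in patch)
    hist = [0.0] * 128  # 4 x 4 cells, 8 orientation bins per cell
    for y in range(size):
        for x in range(size):
            # Central-difference gradient, clamped at the patch border.
            dx = patch[y][min(x + 1, size - 1)] - patch[y][max(x - 1, 0)]
            dy = patch[min(y + 1, size - 1)][x] - patch[max(y - 1, 0)][x]
            mag = math.hypot(dx, dy)
            if mag == 0:
                continue
            angle = math.atan2(dy, dx) % (2 * math.pi)
            b = int(angle / (2 * math.pi) * 8) % 8  # orientation bin 0..7
            cell = (y // 4) * 4 + (x // 4)          # which of the 16 cells
            hist[cell * 8 + b] += mag               # magnitude-weighted vote
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0 else hist
```

Normalizing the vector to unit length is what makes the descriptor insensitive to uniform changes in image contrast.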
I think I read somewhere that no system using interest points has a
"vocabulary" of more than about a dozen objects it can recognize. So I
took from this the view that the big problems have yet to be solved.
I've also seen some research with SLAM using interest points, kind of
like "visual odometry".
Honestly, I don't know what to do next. Vision is so hard and LIDAR
SLAM works so well that... it is discouraging. I feel that I need a
huge boost in computer power, like something in the tens or low
hundreds of gigaflops range. But this is a radical shift in direction.
It is not embedded at all.
So for now, I'm just working with the problem at hand with my small
robot Stella.
Chris
-----Original Message-----
From: David Murphy <dfm7 at earthlink.net>
>Sent: Jun 27, 2007 2:51 PM
>To: Chris Jang <cjang at ix.netcom.com>
>Cc: DPRG <dprglist at dprg.org>
>Subject: Re: [DPRG] webcam robot vision limitations
>
>Hi Chris,
>
>I want to share some work with you that you might find interesting
>and relevant.
>
>It will involve some background, so bear with me.
>
>About 18 months to 2 years ago, a group of folks in the Home Brew
>Robotics Club (S.F. Bay area, Ingolf Sander, John Slater, Brandon
>Blodget, Dave Wyland) became interested in object recognition for
>robotics and in particular the SIFT algorithm. If you give me some
>'artistic license' here for the description, as I'm not sufficiently
>familiar with it to be accurate, it works something like this.
>
>Scan a scene looking for 'key-points'. A key-point is basically an
>intersection of lines, or a place where there is a sharp curve in a
>line, plus the gradient of illumination in the immediate vicinity of
>the intersection.
>An object is known by the set of key-points extracted for it during
>training.
>After the key-points are extracted from the scene, they are compared
>against the database looking for the highest number of matches, and
>this gives you the object(s) in the scene.
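The matching step can be sketched as nearest-neighbor voting. This version adds the nearest/second-nearest distance ratio test from Lowe's SIFT work (0.8 is the threshold he suggested) to discard ambiguous matches. The object names and tiny 2-D "descriptors" in the example are made up for illustration; a real system would use 128-D vectors and an approximate nearest-neighbor index instead of this brute-force sort.

```python
import math
from collections import Counter

def euclid(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def recognize(scene_descs, database, ratio=0.8):
    """Each scene key-point votes for the object owning its nearest training key-point,
    but only if that match passes the nearest/second-nearest ratio test."""
    train = [(name, d) for name, descs in database.items() for d in descs]
    votes = Counter()
    if len(train) < 2:
        return votes  # the ratio test needs at least two candidates
    for s in scene_descs:
        ranked = sorted(train, key=lambda item: euclid(s, item[1]))
        (best_name, best), (_, second) = ranked[0], ranked[1]
        if euclid(s, best) < ratio * euclid(s, second):
            votes[best_name] += 1
    return votes

db = {"mug": [[0.0, 0.0], [0.0, 1.0]], "book": [[10.0, 10.0], [10.0, 11.0]]}
votes = recognize([[0.1, 0.1], [10.0, 10.9], [0.0, 0.9]], db)
```

Here `votes.most_common(1)` names the object with the most matched key-points, which is the "highest number of matches" decision described above.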
>
>This algorithm is somewhat tolerant of the rotation of an object and
>changes in illumination from the training position and apparently
>mimics the activity of some cells in the visual cortex of mammals.
>
>One of the folks working on this project has worked with FPGAs and a
>lot of what is going on here in the early stages is very amenable to
>implementation in hardware (extracting lines, intersections,
>computing gradients, and the like). So their approach was to put all
>of this initial stuff into hardware and let the CPU worry about
>database matching and decision making.
>
>The FPGA guy in this group, along with another fellow in the club,
>had built a board for robotics projects a few years ago that had a
>Xilinx Spartan FPGA; they programmed the FPGA with a MicroBlaze CPU
>(a soft core available for Xilinx) and ran a version of Linux on it.
>I think they intended to commercialize it, but for whatever reason
>did not.
>
>Ok, now these two groups have gotten together and formed a company
>called Roboticore to make this stuff available.
>
>I saw the demo about a year ago before they formed the company. They
>had much of the hardware running and in real time could extract the
>key-points from an image without stressing the FPGA. They demo'd it
>by outputting the results to a frame buffer and displaying them on a
>monitor. They could wave the camera at the audience and you could see
>the results on the monitor in real time with no lag.
>
>I did not see their recent presentation at the HBRC, but you can see
>it at the HBRC website: http://www.hbrobotics.org/HBRC_Presentations.htm
>(look for "FPGA vision").
>
>I guess the point is that a lot of the low level vision stuff is
>repetitive and uniform and hence amenable to hardware. Putting this
>stuff into an FPGA (you can get a development board for a spartan
>FPGA for about $100.00) would free up a lot of CPU cycles for other
>things.
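One reason this low-level work suits an FPGA shows up in a software model of a streaming 3x3 gradient stage. Pixels arrive one at a time in raster order, and the stage keeps only two line buffers and a 3x3 window, so it never needs the whole frame in memory. This is a generic illustration using integer Sobel kernels and |gx| + |gy| (a common hardware-friendly stand-in for the gradient magnitude), not Roboticore's actual design.

```python
def stream_sobel(pixels, width):
    """Software model of a raster-scan 3x3 Sobel stage that stores only two image rows.

    pixels: iterable of integer pixel values in raster order (row by row).
    Yields (row, col, |gx| + |gy|) for every interior pixel, one row and one column
    behind the incoming pixel, as a pipelined hardware block would.
    """
    line1 = [0] * width                 # line buffer holding row y-1
    line2 = [0] * width                 # line buffer holding row y-2
    win = [[0] * 3 for _ in range(3)]   # 3x3 window; column 2 is the newest
    for i, p in enumerate(pixels):
        y, x = divmod(i, width)
        # Shift the window left, then load the new column from the line buffers.
        for r in range(3):
            win[r][0], win[r][1] = win[r][1], win[r][2]
        win[0][2], win[1][2], win[2][2] = line2[x], line1[x], p
        # Age the line buffers for this column.
        line2[x], line1[x] = line1[x], p
        if y >= 2 and x >= 2:           # the window now covers a full 3x3 neighborhood
            gx = (win[0][2] + 2 * win[1][2] + win[2][2]) - (win[0][0] + 2 * win[1][0] + win[2][0])
            gy = (win[2][0] + 2 * win[2][1] + win[2][2]) - (win[0][0] + 2 * win[0][1] + win[0][2])
            yield (y - 1, x - 1, abs(gx) + abs(gy))
```

Every pixel gets exactly the same shifts, adds, and comparisons, with no data-dependent branching, which is why this kind of stage can be laid out directly in logic and run at the camera's pixel rate.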
>
>Cheers,
>David
>
>On Jun 26, 2007, at 9:16 PM, Chris Jang wrote:
>
>> Hello, I'm not sure if anyone is interested in this...
>>
>> But it's something different to discuss.
>>
>> I have a small robot with a VGA webcam and ARM9 PC104 SBC. Up until
>> last night, the performance out of this combination had been
>> embarrassing - frame rates of around 1.2 fps with a low percentage
>> of corrupted images.
>>
>> After lots of experimentation, the webcam now runs at 15 fps with
>> long sequences of around 10 seconds without any image corruption.
>> This includes V4L capture, JPEG decode, and saving to SD flash. I
>> hope that with some more twiddling, there will be no more corruption.
>>
>> Here's the trick (I believe) - cameras have a native frame rate.
>> If whatever consumes and processes the video does not operate at
>> pretty much exactly this speed, then output is prone to corruption
>> due to device/driver sync issues. So I had to add an adaptive
>> spinning delay loop which adjusts depending on the measured time
>> between each frame capture. This adaptive delay tries to hold the
>> capture rate at 15 fps which matches the webcam.
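A minimal sketch of the adaptive delay idea, assuming a simple proportional correction: after each capture, compare the measured interval with the target period (1/15 s) and lengthen or shorten a busy-wait accordingly. The class name, the gain of 0.5, and the injected clock are illustrative assumptions; the actual loop on the ARM9 board isn't shown in the post.

```python
import time

class AdaptivePacer:
    """Hypothetical sketch: tune a spin delay so the capture loop period tracks
    the camera's native frame period."""

    def __init__(self, fps=15.0, gain=0.5, clock=time.perf_counter):
        self.period = 1.0 / fps  # target time between captures, in seconds
        self.gain = gain         # fraction of the timing error corrected per frame
        self.delay = 0.0         # current spin delay, in seconds
        self.clock = clock
        self.last = None

    def update(self, now):
        """Feed the timestamp of the latest capture; returns the new delay in seconds."""
        if self.last is not None:
            measured = now - self.last
            # Loop too fast -> wait longer; too slow -> wait less (never below zero).
            self.delay = max(0.0, self.delay + self.gain * (self.period - measured))
        self.last = now
        return self.delay

    def spin(self):
        """Busy-wait for the current delay."""
        end = self.clock() + self.delay
        while self.clock() < end:
            pass
```

In a capture loop it would be used roughly as `pacer.update(pacer.clock())` right after each frame is grabbed, with `pacer.spin()` once processing is done. With a fixed 40 ms of work per frame, the delay settles at about 26.7 ms, so the loop period converges to 1/15 s; if the work alone takes longer than a frame period, the delay stays at zero.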
>>
>> Ok, so boring...here's the interesting part.
>>
>> Now that the basic stuff is worked out, the amount of CPU time
>> available for processing each frame is known. On the 200 MHz ARM9
>> based computer board I'm using, roughly half the time is spent
>> capturing images, decoding and saving them to SD flash (some time
>> could be recovered by not saving images - but then debugging is
>> impossible as we can't know what the robot saw). The other half of
>> the time is available. That's roughly 30 milliseconds (half of the
>> ~67 ms frame period), 15 times each second, once for each frame.
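The budget arithmetic can be made explicit. Assuming every pixel of a full VGA frame is visited once per frame (an assumption; the post doesn't say what resolution is processed), half of a 1/15 s frame on a 200 MHz CPU leaves about 21.7 clock cycles per pixel:

```python
def cycles_per_pixel(cpu_hz, fps, free_fraction, width, height):
    """CPU clock cycles available per pixel, per frame, for the processing step."""
    frame_period_s = 1.0 / fps               # time between frames
    free_s = frame_period_s * free_fraction  # share left after capture/decode/save
    return cpu_hz * free_s / (width * height)

free_ms = 1000.0 / 15 * 0.5                          # free time per frame in ms
vga = cycles_per_pixel(200e6, 15, 0.5, 640, 480)
qvga = cycles_per_pixel(200e6, 15, 0.5, 320, 240)
print(f"{free_ms:.1f} ms free, {vga:.1f} cycles/pixel at 640x480, {qvga:.1f} at 320x240")
# prints: 33.3 ms free, 21.7 cycles/pixel at 640x480, 86.8 at 320x240
```

These are clock cycles, not instructions. A single 3x3 convolution already costs nine multiply-adds per pixel before loads, stores, and loop overhead, which fits the conclusion that convolution-based features don't fit in this budget.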
>>
>> I think this is enough time for pixel based image segmentation
>> (fancy term for color blobs). There is enough time for some
>> statistics and simple morphological filters. But there is not
>> enough time for any feature based techniques (no convolution).
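Pixel-based segmentation of this kind can be sketched in three passes: threshold each pixel on color, clean the mask with a morphological filter, then group the remaining pixels into 4-connected blobs and report each blob's area and centroid. The "red" thresholds (min_r=150, margin=50) are hypothetical. Only an erosion is shown; a fuller pipeline would usually follow it with a dilation (an opening).

```python
from collections import deque

def threshold(rgb, min_r=150, margin=50):
    """Binary mask of pixels that are 'red enough' (hypothetical thresholds)."""
    return [[1 if r >= min_r and r - g >= margin and r - b >= margin else 0
             for (r, g, b) in row] for row in rgb]

def erode(mask):
    """Cross-shaped erosion: keep a pixel only if it and its 4 neighbors are all set."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            if mask[y][x] and mask[y - 1][x] and mask[y + 1][x] and mask[y][x - 1] and mask[y][x + 1]:
                out[y][x] = 1
    return out

def blobs(mask, min_area=1):
    """4-connected components found by breadth-first search; returns (area, cx, cy) per blob."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pts = deque([(y, x)]), []
            while queue:
                cy, cx = queue.popleft()
                pts.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(pts) >= min_area:
                n = len(pts)
                found.append((n, sum(p[1] for p in pts) / n, sum(p[0] for p in pts) / n))
    return found
```

Each pass touches a pixel only a handful of times with integer compares, which is why this style of processing fits in a budget where convolution does not.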
>>
>> Stanford's DARPA Grand Challenge vehicle dedicated one 1.6 GHz
>> Pentium M computer to 320x240 monocular RGB video. They did color
>> based image segmentation with some morphological filtering. So even
>> with over 10x the power of a 200 MHz ARM9, they were still limited
>> to very sophisticated color blob detection.
>> _______________________________________________
>> DPRGlist mailing list
>> DPRGlist at dprg.org
>> http://list.dprg.org/mailman/listinfo/dprglist
>