[DPRG] webcam robot vision limitations

Subject: [DPRG] webcam robot vision limitations
From: Ed Okerson ed at okerson.com
Date: Thu Jun 28 11:58:14 CDT 2007

Chris,

Yes, I was in San Jose when they started working on this project.

I came across another interesting page on optic flow:

http://www.centeye.com/pages/techres/opticflow.html

After reading this description of optic flow, it seems to me that much of
the processing necessary to generate the object vectors is already coded
in an MPEG2 or MPEG4 encoder.  You can visualize the motion vectors using
mplayer by specifying -lavdopts vismv=1.  If you are looking at MPEG2
video, you may also need to tell mplayer to use the ffmpeg decoder instead
of the libmpeg2 decoder with the option -vc ffmpeg12,mpeg12, so the
command would look like:

mplayer -vc ffmpeg12 -lavdopts vismv=1 video.mpg

The arrows look amazingly similar to those shown in the "yaw right" and
"move forward" samples on the optic flow web page above.  So if you have a camera
that can send an MPEG video stream, you could just analyze the motion
vectors for optical flow.  Otherwise, you could probably hack into the
libavcodec code to just do enough encoding to send you the motion vectors.
 If you do a full MPEG4 encode, it could also solve your video storage
problems, because it would be an easy step to just stream the MPEG4 via
WiFi to a PC for storage.
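
As a rough illustration of "analyzing the motion vectors", here is a minimal
Python sketch. It assumes the vectors have already been pulled out of the
decoder as (x, y, dx, dy) tuples, one per macroblock; that format and the
function name summarize_flow are hypothetical and not part of mplayer or
libavcodec. The sign convention (whether a vector points to or from the
reference block) depends on the decoder, so treat the signs as an assumption.

```python
from statistics import mean

def summarize_flow(vectors, width, height):
    """Reduce a motion-vector field to three numbers.

    vectors: list of (x, y, dx, dy), one per macroblock, in pixels
             (hypothetical format, assumed already extracted).
    Returns (mean_dx, mean_dy, expansion).
    """
    cx, cy = width / 2.0, height / 2.0
    # A uniform sideways shift of the whole image suggests yaw.
    mean_dx = mean(dx for _, _, dx, _ in vectors)
    mean_dy = mean(dy for _, _, _, dy in vectors)
    # Radial component: positive when a vector points away from the
    # image centre, which is the pattern forward motion produces.
    radial = []
    for x, y, dx, dy in vectors:
        rx, ry = x - cx, y - cy
        r = (rx * rx + ry * ry) ** 0.5
        if r > 0:
            radial.append((dx * rx + dy * ry) / r)
    expansion = mean(radial) if radial else 0.0
    return mean_dx, mean_dy, expansion
```

A large mean_dx with near-zero expansion would correspond to the yaw sample,
and near-zero mean_dx with positive expansion to the move-forward sample.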

Ed Okerson

> Thanks David,
>
> Ed Okerson mentioned this about six months ago in conversation at
> a DPRG meeting - a group in the Bay Area had hardware based SIFT
> working. And I think he sent a link to Roboticore before as I recall
> seeing this. The thing I wondered then is how the board interfaces to
> another computer? It needs to be like a GPU card or a programmable
> network camera with the GPU embedded inside it. I know...this is not
> a product yet. It's a prototype.
>
> I know that hardware acceleration is the only way computer vision
> can work. Doing it all in software just will never be enough for
> the foreseeable future. But as a software guy, I am deeply intimidated
> by electronics. That's why I haven't pursued this direction.
>
> COTS GPU acceleration is something that does appear interesting. This
> appears in published research for Gaussian convolution calculation.
> SIFT uses difference of Gaussian approximation to the Laplacian in
> scale space (the interest points are local extrema). So being able to
> compute Gaussian convolution quickly at different scales is very
> significant.
>
> I'll come clean and say it took me a long time to grasp what SIFT was
> doing. How does this work? Why does it work? I read the papers about it
> and was confused. But one day it clicked.
>
> Laplacians are zero on the edge. On either side of the edge, they are
> of opposite sign. As the scale of the Gaussian convolution operator
> increases, the disturbance from the edge moves farther away from the
> edge. So if you have a circular shape in an image and a Gaussian
> convolution operator of roughly the same size, there will be an extremal
> point near the center of the shape. These are the potential interest
> points.
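
The scale-matching idea in the paragraph above can be checked on a 1-D toy
signal. This is only a sketch under stated assumptions: a bar of ones stands
in for the "circular shape", the ratio k=1.6 between the two Gaussian scales
is a conventional choice and not something from this thread, and the function
names are made up for illustration.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian, truncated at about 4 sigma."""
    radius = int(4 * sigma) + 1
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve with a Gaussian; samples outside the signal count as zero."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * signal[j]
        out.append(acc)
    return out

def dog_peak(signal, sigma, k=1.6):
    """Index of the strongest difference-of-Gaussians response."""
    fine = blur(signal, sigma)
    coarse = blur(signal, k * sigma)
    dog = [a - b for a, b in zip(fine, coarse)]
    return max(range(len(dog)), key=lambda i: abs(dog[i]))

# A bar 21 samples wide, centred on index 50.
bar = [1.0 if 40 <= i <= 60 else 0.0 for i in range(101)]
print(dog_peak(bar, 8.0))   # sigma comparable to the bar's half-width
print(dog_peak(bar, 1.5))   # sigma much smaller than the bar
```

With sigma near the bar's half-width the strongest response sits at the bar's
centre; with a much smaller sigma it sits next to an edge instead.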
>
> As you mentioned, there are gradients in a neighborhood around each
> interest point (this works out to 128 scalars, I think). This helps
> distinguish each point and simplifies later classification of these
> clouds of interest points to objects.
>
> I think I read somewhere that no system using interest points has a
> "vocabulary" of more than about a dozen objects it can recognize. So I
> took from this the view that the big problems have yet to be solved.
> I've also seen some research with SLAM using interest points, kind of
> like "visual odometry".
>
> Honestly, I don't know what to do next. Vision is so hard and LIDAR
> SLAM works so well that... it is discouraging. I feel that I need a
> huge boost in computer power, like something in the tens or low
> hundreds of gigaflops range. But this is a radical shift in direction.
> It is not embedded at all.
>
> So for now, I'm just working with the problem at hand with my small
> robot Stella.
>
> Chris
>
>
> -----Original Message-----
>>From: David Murphy <dfm7 at earthlink.net>
>>Sent: Jun 27, 2007 2:51 PM
>>To: Chris Jang <cjang at ix.netcom.com>
>>Cc: DPRG <dprglist at dprg.org>
>>Subject: Re: [DPRG] webcam robot vision limitations
>>
>>Hi Chris,
>>
>>I want to share some work with you that you might find interesting
>>and relevant.
>>
>>It will involve some background, so bear with me.
>>
>>About 18 months to 2 years ago, a group of folks in the Home Brew
>>Robotics Club (S.F. Bay area, Ingolf Sander, John Slater, Brandon
>>Blodget, Dave Wyland) became interested in object recognition for
>>robotics and in particular the SIFT algorithm. If you give me some
>>'artistic license' here for the description as I'm not sufficiently
>>familiar with it to be accurate, it works something like this.
>>
>>Scan a scene looking for 'key-points'. A key-point is basically an
>>intersection of lines, or a place where there is a sharp curve in a
>>line, plus the gradient of illumination in the immediate vicinity of
>>the intersection.
>>An object is known by the set of key-points extracted for it during
>>training.
>>After the key-points are extracted from the scene, they are compared
>>against the database looking for the highest number of matches, and
>>this gives you the object(s) in the scene.
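
The "highest number of matches" step can be sketched as a simple vote. This
is a toy under stated assumptions, not the SIFT matcher itself: descriptors
are short tuples here instead of SIFT's much longer vectors, the distance
threshold max_dist is arbitrary, and best_match is a hypothetical name.

```python
import math

def best_match(scene, database, max_dist=0.3):
    """Vote for the trained object whose key-points best explain the scene.

    scene:    list of descriptor tuples extracted from the image
    database: {object_name: [descriptor, ...]} built during training
    Returns (winning_name, votes_per_object).
    """
    votes = {name: 0 for name in database}
    for d in scene:
        best_name, best = None, max_dist
        # Find the nearest trained descriptor within max_dist, if any.
        for name, descs in database.items():
            for t in descs:
                dist = math.dist(d, t)
                if dist < best:
                    best_name, best = name, dist
        if best_name is not None:
            votes[best_name] += 1
    return max(votes, key=votes.get), votes

database = {"mug": [(0.0, 0.0), (1.0, 0.0)], "book": [(5.0, 5.0)]}
scene = [(0.1, 0.0), (0.9, 0.1), (5.1, 5.0)]
winner, votes = best_match(scene, database)
print(winner, votes)
```

The brute-force search is what makes this expensive on a CPU as the database
grows, which fits the split described below: hardware extracts key-points,
the CPU does the matching.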
>>
>>This algorithm is somewhat tolerant of the rotation of an object and
>>changes in illumination from the training position and apparently
>>mimics the activity of some cells in the visual cortex of mammals.
>>
>>One of the folks working on this project has worked with FPGAs and a
>>lot of what is going on here in the early stages is very amenable to
>>implementation in hardware (extracting lines, intersections,
>>computing gradients, and the like). So their approach was to put all
>>of this initial stuff into hardware and let the CPU worry about
>>database matching and decision making.
>>
>>The FPGA guy in this group, along with another fellow in the club,
>>had, a few years ago, built a board for robotics projects that had a
>>Xilinx Spartan FPGA; they programmed the FPGA with a MicroBlaze CPU
>>(a soft core available for Xilinx) and ran a version of Linux on it.
>>I think they intended to commercialize it, but for whatever reason
>>did not.
>>
>>Ok, now these two groups have gotten together and formed a company
>>called Roboticore to make this stuff available.
>>
>>I saw the demo about a year ago before they formed the company. They
>>had much of the hardware running and in real time could extract the
>>key-points from an image without stressing the FPGA. They demo'd it
>>by outputting the results to a frame buffer and displaying them on a
>>monitor. They could wave the camera at the audience and you could see
>>the results on the monitor in real time with no lag.
>>
>>I did not see their recent presentation at the HBRC, but you can see
>>it at the HBRC website. http://www.hbrobotics.org/HBRC_Presentations.htm
>>look for FPGA vision.
>>
>>I guess the point is that a lot of the low level vision stuff is
>>repetitive and uniform and hence amenable to hardware. Putting this
>>stuff into an FPGA (you can get a development board for a spartan
>>FPGA for about $100.00) would free up a lot of CPU cycles for other
>>things.
>>
>>Cheers,
>>David
>>
>>On Jun 26, 2007, at 9:16 PM, Chris Jang wrote:
>>
>>> Hello, I'm not sure if anyone is interested in this...
>>>
>>> But it's something different to discuss.
>>>
>>> I have a small robot with a VGA webcam and ARM9 PC104 SBC. Up until
>>> last night, the performance out of this combination has been
>>> embarrassing - frame rates of around 1.2 fps with a low percentage
>>> of corrupted images.
>>>
>>> After lots of experimentation, the webcam now runs at 15 fps with
>>> long sequences of around 10 seconds without any image corruption.
>>> This includes V4L capture, JPEG decode, and saving to SD flash. I
>>> hope that with some more twiddling, there will be no more corruption.
>>>
>>> Here's the trick (I believe) - cameras have a native frame rate.
>>> If whatever consumes and processes the video does not operate at
>>> pretty much exactly this speed, then output is prone to corruption
>>> due to device/driver sync issues. So I had to put an adaptive
>>> spinning delay loop which adjusts depending on the measured time
>>> between each frame capture. This adaptive delay tries to hold the
>>> capture rate at 15 fps which matches the webcam.
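
One way such an adaptive delay might look is sketched below in Python. It is
not the code from the robot: the class name FramePacer, the gain of 0.5, and
the use of time.sleep in place of a spinning loop are all assumptions made
for illustration. The clock and sleep functions are injectable so the logic
can be exercised without real waiting.

```python
import time

class FramePacer:
    """Adjust a per-frame delay so the loop settles at a target frame rate."""

    def __init__(self, target_fps=15.0, gain=0.5,
                 clock=time.monotonic, sleep=time.sleep):
        self.period = 1.0 / target_fps   # desired time between captures
        self.gain = gain                 # fraction of the error corrected per frame
        self.clock = clock
        self.sleep = sleep
        self.delay = 0.0                 # current delay, adapted every frame
        self.last = None                 # timestamp of the previous call

    def wait(self):
        """Call once per frame, just before capturing."""
        now = self.clock()
        if self.last is not None:
            # Positive error: the last frame came too soon, so wait longer.
            error = self.period - (now - self.last)
            self.delay = max(0.0, self.delay + self.gain * error)
        self.last = now
        if self.delay > 0:
            self.sleep(self.delay)
```

If the per-frame work takes a steady 30 ms, the delay converges to about
36.7 ms so that each loop iteration takes 1/15 s. On a board where sleep
granularity is too coarse, a busy-wait function could be passed as sleep.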
>>>
>>> Ok, so boring...here's the interesting part.
>>>
>>> Now that the basic stuff is worked out, the amount of CPU time
>>> available for processing each frame is known. On the 200 MHz ARM9
>>> based computer board I'm using, roughly half the time is spent
>>> capturing images, decoding and saving them to SD flash (some time
>>> could be recovered by not saving images - but then debugging is
>>> impossible as we can't know what the robot saw). The other half of
>>> the time is available. That's roughly 30 milliseconds, 15 times
>>> each second, once for each image frame from the webcam.
>>>
>>> I think this is enough time for pixel based image segmentation
>>> (fancy term for color blobs). There is enough time for some
>>> statistics and simple morphological filters. But there is not
>>> enough time for any feature based techniques (no convolution).
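
The time budget above can be turned into a rough cycles-per-pixel figure.
This sketch assumes "VGA" means 640x480 processed at full resolution; the
function name is made up for illustration.

```python
def cycles_per_pixel(cpu_hz, fps, busy_fraction, width, height):
    """CPU cycles left per pixel after capture, decode, and save."""
    frame_time = 1.0 / fps
    free_time = frame_time * (1.0 - busy_fraction)
    return cpu_hz * free_time / (width * height)

# 200 MHz ARM9, 15 fps, half the time already spent, 640x480 assumed.
budget = cycles_per_pixel(200e6, 15, 0.5, 640, 480)
print(round(budget, 1))
```

That comes to a little under 22 cycles per pixel. A single 5x5 convolution
needs 25 multiply-accumulates per pixel, so even one pass would not fit,
while a per-pixel color threshold plausibly does.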
>>>
>>> Stanford's DARPA Grand Challenge vehicle dedicated one 1.6 GHz
>>> Pentium M computer to 320x240 monocular RGB video. They did color
>>> based image segmentation with some morphological filtering. So even
>>> with over 10x the power of a 200 MHz ARM9, they were still limited
>>> to very sophisticated color blob detection.
>>> _______________________________________________
>>> DPRGlist mailing list
>>> DPRGlist at dprg.org
>>> http://list.dprg.org/mailman/listinfo/dprglist
>>

