
Robotic Imagery

Started by February 07, 2004 11:42 PM
11 comments, last by CrunchDown 21 years ago
Does anybody know of a program, or library of some sort, where I can scan in real time an image coming from a web cam? Is there some way to interface the TWAIN driver directly into memory for analysis by my imagery program? I'm developing a program which, hopefully, can convert the 2D image of a 3D scene into 3D objects in the computer's brain (3D world), using motion for navigation by the robot.
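For the capture side of this, here is a minimal sketch of grabbing webcam frames with OpenCV's Python bindings (not the TWAIN route asked about; the device index 0 is just an assumption about a typical setup):

[code]
# Minimal sketch: grab frames from a webcam with OpenCV (cv2).
# Device index 0 is an assumption; it may differ on your machine.
import cv2

cap = cv2.VideoCapture(0)          # open the first attached camera
if not cap.isOpened():
    raise RuntimeError("Could not open the webcam")

while True:
    ok, frame = cap.read()         # frame arrives as a BGR numpy array
    if not ok:
        break
    cv2.imshow("webcam", frame)    # display; swap in your own analysis here
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
[/code]

Each frame comes back as an in-memory array, so the analysis code can read it directly rather than going through the TWAIN driver.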
just a comment, you'll need two cameras which are split by a known offset to get any sort of 3D derivation.
The Love Of Trees
quote:
Original post by strider44
just a comment, you'll need two cameras which are split by a known offset to get any sort of 3D derivation.


Not true. You can simply move one camera to achieve this. However, even then, you'll just get a bad-looking heightmap from this method.

-~-The Cow of Darkness-~-
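For reference, here is a sketch of how two views separated by a known baseline (whether two cameras or one camera moved by a known offset) give a rough depth map via block matching; the image file names, focal length, and baseline below are placeholders, not measured values:

[code]
# Sketch: depth from two views with a known baseline, using OpenCV block matching.
# "left.png"/"right.png", focal_px and baseline_m are placeholder values.
import cv2
import numpy as np

left  = cv2.imread("left.png",  cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

focal_px   = 700.0  # focal length in pixels (placeholder)
baseline_m = 0.1    # distance between the two viewpoints in metres (placeholder)

depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = focal_px * baseline_m / disparity[valid]  # depth = f * B / d
[/code]

The per-pixel depth this produces is essentially the heightmap-like result mentioned above; turning it into clean 3D objects takes considerably more work.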
I am interested in this too. And maybe also some technical articles on motion capturing.
Assuming you're using Windows you can use DirectShow to capture from the webcam, but it's a real pain to do.
You could also take a look at the ARToolKit source code. Grab the version 2.52 downloads that use DirectShow and Video For Windows. There's also a version for Linux.

Enigma
Yes, though I respect everyone's ideas and give them thought, to me it is out of the question to have two cameras for the job.

The human brain doesn't need two eyes, try for yourself. Because of this I don't believe any computer imaging system should require two offset cameras, because if it does, it uses a different method than our mind does.

The philosophy behind my system is this example (and I can elaborate hugely if you guys are interested): if there's a ZIP file on your computer, the zip file does not jump up and get running when you click on it; rather, Windows compares the extension with that of all known file extensions. It compares only what it knows to what it sees. My imaging tech will only be able to distinguish what it's looking at if it has a predetermined model to compare against. The rest is just filling holes.
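One concrete reading of that "compare a stored model to what you see" idea is classic template matching; a minimal sketch (the file names and the 0.8 threshold are hypothetical):

[code]
# Sketch of the "compare a known model against the scene" idea via template matching.
# scene.png and model.png are hypothetical file names; the threshold is arbitrary.
import cv2

scene = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
model = cv2.imread("model.png", cv2.IMREAD_GRAYSCALE)

result = cv2.matchTemplate(scene, model, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_loc = cv2.minMaxLoc(result)

if best_score > 0.8:
    print("model found at", best_loc, "with score", best_score)
else:
    print("no confident match")
[/code]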

The long and short of it is, computers aren't human and they cannot simply look at an object and make sense of it. They have to try to fit their own idea of what it is, and what the computer thinks it should be, onto it. This idea works its way down through the system which I'm abstracting on right now, and the way I work it we should have already established a lot of the technology to do this. In fact, all the techniques can be built right on top of a lot of the functions available in OpenGL. (I'm not a DX fan, I've only learned GL and it seems to do the job.)

The goal of my research is to build a cheap, fast, independent program that needs no more than one camera to understand its surroundings and, with a glance and a little process comparison, can construct in its memory a 3D world which closely reflects the real world around it.

This kind of technology would enable us to build the robots that we have always wanted. If you guys want to discuss this, I'm all up for it, I just hope we're in the right forum!
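As one small building block toward that single-camera goal, the sketch below tracks features across frames from a moving camera with Lucas-Kanade optical flow, the usual starting point for monocular structure-from-motion; the camera index and parameter values are assumptions:

[code]
# Sketch: track features across frames from one moving camera (Lucas-Kanade),
# the usual first step toward monocular structure-from-motion.
import cv2

cap = cv2.VideoCapture(0)                      # camera index is an assumption
ok, prev = cap.read()
if not ok:
    raise RuntimeError("Could not read from the camera")
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok or points is None or len(points) == 0:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    good_new = new_points[status.ravel() == 1]
    good_old = points[status.ravel() == 1]
    # The good_old -> good_new correspondences are what the 3D geometry
    # (essential matrix estimation, triangulation) would be built from.
    prev_gray, points = gray, good_new.reshape(-1, 1, 2)

cap.release()
[/code]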
Oh, and did I forget to say that I think the system should be able to look at any scene we are exposed to and resolve it? Not just a dinky little red-ball finder; no, this vision system would be told to find a path of adequate size in front, or a machine or object of a certain size.

I'm working on some concept photos of the imaging process right now, which I hope will help project my ideas better, so everybody can get a clear picture (I visualize everything) and challenge me all the better.

With that said, rip my little theory apart.
Have a look at DirectShow; that should do you for the image capture end of things. It's not the fastest, but it works under Windows. Under Linux you can use Video4Linux.

The image processing end of things has been done before, so my suggestion is to use a third-party library like OpenCV to give you the best shot at getting finished in a reasonable time. It comes with code and samples that deal with stereo vision and some sophisticated tracking systems (Kalman and HMM).
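As one tiny illustration of what such a tracker looks like (a hand-rolled example, not the library's own sample), here is a constant-velocity Kalman filter over a 2D point; the noise covariances and the detections fed in are made up:

[code]
# Sketch: constant-velocity Kalman filter for a 2D point, using OpenCV.
# Noise covariances and the "detections" are arbitrary assumptions, not tuned values.
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)              # state: x, y, vx, vy; measurement: x, y
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1

for x, y in [(10, 10), (12, 11), (15, 13), (19, 16)]:    # fake detections
    predicted = kf.predict()                             # where the filter expects the point
    corrected = kf.correct(np.array([[x], [y]], dtype=np.float32))
    print("predicted", predicted[:2].ravel(), "corrected", corrected[:2].ravel())
[/code]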

You might want to look into some of the papers from RoboCup (the robot soccer competition). They get the Sony AIBO dogs to do some really amazing 3D localisation with very minimal resources.

[edit]typo[/edit]

[edited by - XXX_Andrew_XXX on February 10, 2004 11:40:23 PM]
Thanks. I'll be sure to check that out.
No, you are incorrect when you say our human brain can do it with only one eye. You already have memories stored in your mind of how objects look in three dimensions, and you use that data to make sense of what you see, not even decode it, because you really can't tell; you just make sense of it. I had a friend who only had one eye, and believe me... sometimes he couldn't tell if a moth was fluttering 3 feet away or a bird was flying 3 dozen feet away. That is just a simple example... let's not even get into the details of all the objects, shapes, colors, light intensities, and so on that might be present in a scene for a camera to decode.
Whatsoever you do, do it heartily as to the Lord, and not unto men.

This topic is closed to new replies.
