3D model reconstruction from 2D picture(s)
Hello everybody,
Recently a good friend of mine asked me about quite an interesting problem. I had no clue about the solution, so I decided to ask somebody who is more educated in this area :)
His question was whether it is possible to somehow reconstruct a 3D model (points only) from a picture or a series of pictures (every frame shot from a different angle) of human faces. The pictures would be normal 32-bit pictures in high resolution. By the way, would a different way of capturing help to solve this problem?
Does anybody have any idea whether this can be solved somehow? If yes, how?
Thanks a lot for your answers
Yes. It is a very difficult problem, one requiring immense amounts of math. Here is the book about how to do it. (Here's another one.)
Quote: Original post by Sneftel
Yes. It is a very difficult problem, one requiring immense amounts of math. Here is the book about how to do it. (Here's another one.)
How difficult this is really depends on what your assumptions are. I did this in college during a one-semester course in computer vision. If you want, you could create a 3D model, take two shots of it and try to reconstruct it. It would teach you the basics and give you a very controllable sandbox. I wouldn't start with two real-world pictures...
Don't shoot! I'm with the science team.....
There are essentially two competing methods for solving this problem.
The first is to have an a priori parametric model of the thing you expect to see in the image and then to find the best set of parameters for the model that explains the data you see. This method is known by a variety of names, but if you look for 'image understanding', you'll find information on this method.
The second method is to infer the 3D position of each point on the surface of the object in the 2D image from the sequence of images and then construct a 2D surface embedded in 3-space from those inferred points. This is the method of surface reconstruction.
The results of the latter are generally less accurate, since they are sensitive to noise in the 2D image. However, the latter method makes no assumption as to the underlying object structure in the image, thus making it more widely applicable.
The actual method you choose would largely depend on what you want to do with the reconstructed 3D object. If you were, for example, trying to do facial recognition, then the first method would give better results than the second, since you can perform the classification in the model parameter space (of course, there are good ways of doing facial classification using just the 2D image). If you simply wanted to create a 3D 'head' of someone's face as seen in a camera, so you could display it on a computer screen, then the latter method would be sufficient.
There is a huge amount of literature out there on this problem so you shouldn't be left in the dark if you choose to follow up on the solution methods.
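As a toy illustration of the first (model-based) approach, here is a minimal sketch that fits two parameters of a known template to noisy measurements by least squares. The template values, the scale/offset parameterisation and the function name fit_scale_offset are all made up for illustration; a real face model would have many more parameters and a nonlinear fit.

```python
# Toy model-based fit: a fixed template profile with two free parameters.
# The template and the scale/offset parameterisation are illustrative
# assumptions, not something taken from this thread.

def fit_scale_offset(template, observed):
    """Least-squares fit of observed ~= scale * template + offset."""
    n = len(template)
    mean_t = sum(template) / n
    mean_o = sum(observed) / n
    covariance = sum((t - mean_t) * (o - mean_o)
                     for t, o in zip(template, observed))
    variance = sum((t - mean_t) ** 2 for t in template)
    scale = covariance / variance
    offset = mean_o - scale * mean_t
    return scale, offset


# Hypothetical template profile and noisy observations of it.
template = [0.0, 1.0, 3.0, 1.5, 0.5]
noise = [0.05, -0.03, 0.02, -0.04, 0.01]
observed = [2.0 * t + 5.0 + e for t, e in zip(template, noise)]

scale, offset = fit_scale_offset(template, observed)
print("scale = %.3f, offset = %.3f" % (scale, offset))
```

The recovered parameters (not the raw points) are what you would then feed to a classifier, which is why this approach suits recognition tasks.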
Cheers,
Timkin
I've a friend who is specifically working in this domain. The results of his team are pretty amazing. I'll try to get in touch with him and show him this thread.
I work and do research in this domain, quite interesting :)
From one picture, the only serious class of method I can think of is called "Shape from shading". You use the variation in shadows on the face to compute the shape...
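To give an idea of what shape from shading has to invert, here is a minimal sketch of the Lambertian reflectance model, which is a common assumption in that literature (it is my assumption here, not something stated above). The function names and the light direction are hypothetical.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def lambertian_intensity(normal, light_dir, albedo=1.0):
    """Brightness of a matte surface patch: albedo * max(0, n . l)."""
    n = normalize(normal)
    l = normalize(light_dir)
    cos_angle = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_angle)


def angle_from_intensity(intensity, albedo=1.0):
    """Angle (degrees) between normal and light implied by one intensity.

    Only the angle is recovered, not the direction of the tilt, which is
    why a single pixel is not enough and extra constraints are needed.
    """
    ratio = min(1.0, max(0.0, intensity / albedo))
    return math.degrees(math.acos(ratio))


light = (0.0, 0.0, 1.0)  # assumed: light shining along the viewing axis
for tilt_deg in (0, 30, 60):
    tilt = math.radians(tilt_deg)
    normal = (math.sin(tilt), 0.0, math.cos(tilt))
    intensity = lambertian_intensity(normal, light)
    print(tilt_deg, round(intensity, 3), round(angle_from_intensity(intensity), 1))
```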
For two or more images, if you know the exact change in camera position and orientation between your shots, you can use a simple technique generally called "stereo". Basically, you find a per-pixel correspondence between a pair of images. Knowing the camera motion parameters, the magnitude of each pixel displacement will be inversely proportional to the distance of that point from the camera.
Of course, this supposes that the images are taken simultaneously. If they are not, even the smallest motion of the person will ruin your result. You also need high precision and image resolution.
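Here is a minimal sketch of the stereo idea on a single scanline, assuming the two cameras are side by side with parallel optical axes so that a point only shifts horizontally between the images. The matching cost (sum of absolute differences), the toy pixel values, and the focal length and baseline numbers are assumptions chosen for illustration.

```python
def best_disparity(left, right, x, half_window, max_disparity):
    """Find the shift d that best matches left[x] to right[x - d].

    Compares a small window around the pixel using the sum of absolute
    differences (SAD) and returns the shift with the lowest cost.
    """
    best_d, best_cost = 0, float("inf")
    for d in range(max_disparity + 1):
        if x - d - half_window < 0:
            break  # window would fall off the left edge of the image
        cost = sum(abs(left[x + k] - right[x - d + k])
                   for k in range(-half_window, half_window + 1))
        if cost < best_cost:
            best_d, best_cost = d, cost
    return best_d


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth Z = f * B / d for parallel cameras (larger shift = closer)."""
    if disparity_px <= 0:
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Toy scanlines: the bright feature sits 3 pixels further left in the
# right image than in the left image.
left = [10, 10, 10, 10, 10, 10, 80, 200, 90, 10, 10, 10]
right = [10, 10, 10, 80, 200, 90, 10, 10, 10, 10, 10, 10]

d = best_disparity(left, right, x=7, half_window=1, max_disparity=5)
z = depth_from_disparity(d, focal_px=600.0, baseline_m=0.06)
print("disparity:", d, "pixels, depth: %.2f m" % z)
```

Doubling the disparity halves the computed depth, which is the inverse relation between pixel displacement and distance.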
If you generalize this class of technique, it is called "shape from motion" (also known as "structure from motion").
You can PM me if you want the equations...
Check out my website, under Projects. I have a few screen shots from an application I wrote which extracts 3D information from image pairs.
http://www.nentari.com/stereo_image_processing.htm
and again at
http://www.nentari.com/3d_shadow_scanner.htm
Both pages are a work in progress, but I promise I'll update them with more information one of these days. :)
Essentially, what Steadtler said, only I've never done anything quite so precise as knowing the camera's position. :)
Will
------------------http://www.nentari.com
Nice work, Rpgeezus.
I see that for stereo you use cameras at different angles; the problem is MUCH simpler if you use cameras with parallel optical axes. Camera rotation yields no information about depth anyway.
edit: knowing the cameras' relative positions is not so important... you can still get results up to a constant factor (as you have done, I presume)
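To see why an unknown camera spacing only costs you a constant factor, here is a minimal sketch with an idealised pinhole camera (my assumption, as are the point coordinates and the helper names). Scaling the whole scene and the baseline by the same factor gives exactly the same pair of images, so the images alone cannot tell the two apart.

```python
def project(point, camera_x, focal=1.0):
    """Pinhole projection for a camera at (camera_x, 0, 0) looking down +Z."""
    x, y, z = point
    return (focal * (x - camera_x) / z, focal * y / z)


def stereo_views(points, baseline):
    """Image coordinates of each point in the left and right camera."""
    return [(project(p, 0.0), project(p, baseline)) for p in points]


# Hypothetical scene points (metres) and camera spacing.
scene = [(0.1, -0.2, 2.0), (-0.3, 0.1, 2.5), (0.4, 0.3, 1.8)]
baseline = 0.1

# The same scene made three times bigger, seen by cameras three times
# further apart.
k = 3.0
scaled_scene = [(k * x, k * y, k * z) for x, y, z in scene]
scaled_baseline = k * baseline

original = stereo_views(scene, baseline)
scaled = stereo_views(scaled_scene, scaled_baseline)

# Largest difference between any image coordinate in the two setups.
max_diff = max(abs(a - b)
               for (l1, r1), (l2, r2) in zip(original, scaled)
               for a, b in zip(l1 + r1, l2 + r2))
print("largest image difference:", max_diff)
```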
About that shape from shadow on your site (Bouguet and Perona), that's old stuff. If you want something much more evolved, check out moire patterns. Instead of projecting a band of shadow, you can project one or several grid patterns on the object. I am currently measuring objects with precision in the micron range... that's 1/1000 of a millimeter :P
You can find a lot of information about 3D reconstruction from 2D images on the robotvis project website :
http://www-sop.inria.fr/robotvis/
Unfortunately, the project has ended, but there is still a lot of valuable information on their pages.
About what you're specifically looking for (3D reconstruction of human faces), have a look at this page :
http://www-sop.inria.fr/robotvis/demo/diffprop/
More recent works about this subject are available there :
http://www-rocq.inria.fr/~gouet/Recherche/These/recherche_anglais.html
I'm working in a related area - augmented reality - 3D registration of markers (though there are also markerless registration methods).
What you are looking for is a very common task in 3D registration. It's not very difficult mathematically - it requires some stuff from linear algebra - understanding eigenvalues and eigenvectors, and no more.
http://en.wikipedia.org/wiki/Eigenvalue
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG54
However, the process itself is not quite simple.
3D reconstruction from multiple (usually two) pictures usually goes like this:
Identify feature points (points of interest) in the pictures, and find the corresponding points in both pictures.
http://homepages.inf.ed.ac.uk/rbf/CVonline/feature.htm
Build the fundamental matrix
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG82
Solve the epipolar constraint
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG91
with singular value decomposition, throwing away the zero singular values.
http://homepages.inf.ed.ac.uk/cgi/rbf/CVONLINE/entries.pl?TAG61
and you have your 3D reconstruction - a complete model, up to some error.
Those are the basic steps. You can google each term - there are a lot of articles on the web.
There are also other, more arcane methods, but this is kind of the standard method.
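To make the epipolar constraint from the steps above a bit more concrete, here is a minimal sketch that checks it for a known camera motion. It assumes calibrated cameras (so the matrix is the essential matrix E = [t]x R, the calibrated form of the fundamental matrix) and made-up values for the rotation, translation and 3D points. It only verifies the constraint; estimating the matrix from matches with SVD is the part left to the links above.

```python
import math


def skew(t):
    """Cross-product matrix [t]x, so that skew(t) * v equals t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty],
            [tz, 0.0, -tx],
            [-ty, tx, 0.0]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def matvec(m, v):
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_y(angle):
    """Rotation about the vertical (Y) axis."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def project(p):
    """Normalised image coordinates (x/z, y/z, 1) of a 3D point."""
    return [p[0] / p[2], p[1] / p[2], 1.0]


def to_second_camera(point, R, t):
    """Coordinates of a first-camera point in the second camera: R * X + t."""
    return [a + b for a, b in zip(matvec(R, point), t)]


def epipolar_residual(E, x1, x2):
    """x2^T * E * x1, which is zero for a true correspondence."""
    Ex1 = matvec(E, x1)
    return sum(x2[i] * Ex1[i] for i in range(3))


# Assumed motion of the second camera relative to the first.
R = rot_y(math.radians(5.0))
t = [-0.2, 0.0, 0.01]
E = matmul(skew(t), R)

# Hypothetical 3D points in front of the first camera.
points = [[0.1, -0.2, 2.0], [-0.3, 0.1, 2.5], [0.0, 0.0, 3.0], [0.4, 0.3, 1.8]]

residuals = [epipolar_residual(E, project(X), project(to_second_camera(X, R, t)))
             for X in points]

# Pairing a point with the image of a different point breaks the constraint.
wrong = epipolar_residual(E, project(points[0]),
                          project(to_second_camera(points[1], R, t)))

print("true matches:", ["%.1e" % r for r in residuals])
print("wrong match: %.4f" % wrong)
```

In practice you go the other way: you collect many matched feature points, stack one such equation per match, and solve for the matrix.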