I once had a contract job with a similar task: 3D reconstruction of an object from a video (with a few constraints that made the task easier than it sounds). I had to turn it down, though, because of life issues and lack of free time.
Yes, lack of free time is an issue I'm familiar with ^^
Videos have their pros and cons. On the one hand, the change from one frame to the next is very small and can be exploited, and you have tons of frames. On the other hand, video frame resolution is usually rather low (<= Full HD) and most frames will contain motion blur.
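To make the motion blur point concrete, here is a rough sketch of one common way to pick the least blurry frame out of each short run of video frames: score every frame by the variance of its Laplacian (higher usually means sharper) and keep the best one per window. The function names `laplacian_variance` and `sharpest_per_window` and the window size are made up for illustration, and the frames are assumed to be 2D grayscale NumPy arrays that have already been decoded.

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness score: variance of a 4-neighbour Laplacian of a 2D grayscale image."""
    g = np.asarray(gray, dtype=np.float64)
    # Discrete Laplacian evaluated on interior pixels only (no padding).
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def sharpest_per_window(frames, window=10):
    """Return the index of the highest-scoring frame in each consecutive window."""
    scores = [laplacian_variance(f) for f in frames]
    picks = []
    for start in range(0, len(scores), window):
        chunk = scores[start:start + window]
        picks.append(start + int(np.argmax(chunk)))
    return picks
```

The score is only comparable between frames showing roughly the same content, which is why it is applied per window here rather than across the whole video.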
Hey, I'd be interested to hear what you're up to. I had an interest in reconstruction which unfortunately (or perhaps fortunately) got put off when I got on an indie game team. I played with some of the tools and some overseas holiday snaps, but didn't get great results. Obviously my ability to get more data is limited. ;) It did get me wondering whether you could apply user knowledge to further constrain solutions, for example: marking an area as sky (infinite distance), automatically or manually specifying known geometry to iteratively improve the results (this point cloud really is a solid cylinder), or letting the user manually tag matching areas across the photos. I was also looking at stitching video from a train ride into panorama-style photos. I can't say the open source image library documentation impressed me. ;)
Getting data shouldn't be a problem because, unlike with SAR, LIDAR, ... you don't need any special equipment. Just an off-the-shelf digital camera.
Integrating knowledge about the world (houses have straight walls, windows tend to be periodic, etc.) is something I will very likely look into. Obviously I want to refrain from too much user interaction because a) I'm lazy and b) it becomes impossible to compare your results to those in the literature once user interaction becomes a big part of the process.
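As a toy illustration of what a "straight walls" prior could look like (not a method anyone in this thread has described), one simple option is to fit a plane to the points believed to belong to a wall and then project them onto it. The sketch below does a least-squares plane fit via SVD. The names `fit_plane` and `snap_to_plane` are hypothetical, and it assumes the wall points have already been segmented out of the cloud as an N x 3 array.

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through an (N, 3) point set: returns (centroid, unit normal)."""
    pts = np.asarray(points, dtype=np.float64)
    centroid = pts.mean(axis=0)
    # The right singular vector with the smallest singular value is the
    # direction of least variance, i.e. the plane normal.
    _, _, vt = np.linalg.svd(pts - centroid, full_matrices=False)
    return centroid, vt[-1]

def snap_to_plane(points, centroid, normal):
    """Project each point orthogonally onto the plane defined by centroid and normal."""
    pts = np.asarray(points, dtype=np.float64)
    dist = (pts - centroid) @ normal  # signed distance of each point to the plane
    return pts - np.outer(dist, normal)
```

A real pipeline would need a robust fit (outliers from windows, balconies and so on would pull a plain least-squares plane off the wall), so this only shows the basic idea.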
Apart from exploiting knowledge about the world, I would also like to entertain the idea of trying to extract material parameters like albedo, specularity, glossiness, ambient occlusion and small scale normals from the reconstruction. But I'm not sure how well that works without any control over the light source.
I don't know which libraries you are referring to (OpenCV?), but I noticed that the software that is around is not very beginner friendly. It usually works rather well, but there are no warnings or hints. So if it doesn't work, you get no feedback on how to improve your data. Are you still experimenting with this? If you have any specific questions I'd be happy to help.