1. I know there is (or rather was) a game called Fuel.
Fuel is not big. Fuel is 249.79 miles across.
http://static.giantbomb.com/uploads/original/3/30984/1366065-xju7q.jpg
By comparison:
Caveman v3.0 is 2500 miles across.
Airships! v1.0 is 9030 miles across, from the Mississippi to the Urals and from the Arctic to North Africa.
And SIMSpace v8.0 is a cube about 150 light-years across.
All are procedurally generated.
But what about a planet-sized world?
Caveman is about the size of North America, but it could easily be as big as I want. I'm not even paging the world map off disk yet. Everything is stored in RAM, generated on the fly, and cached. It's so fast there's not even time to display a "loading area..." or "generating area..." message, and that's single-threaded at a 1.3 GHz clock speed.
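A minimal sketch of the "generated on the fly, and cached" idea, assuming a hash-based height function and 16-cell square chunks (both hypothetical; this is not Caveman's actual code). Because each cell's value depends only on the seed and its coordinates, a chunk never has to be loaded from anywhere: it can simply be generated the first time it is needed and kept in RAM.

```python
import hashlib

CHUNK_SIZE = 16  # hypothetical: cells per chunk edge


def cell_height(seed: int, x: int, z: int) -> int:
    """Deterministic pseudo-random height (0..255) for one world cell.

    The same (seed, x, z) always hashes to the same value, so a chunk can be
    regenerated at any time instead of being stored.
    """
    digest = hashlib.blake2b(f"{seed}:{x}:{z}".encode(), digest_size=1).digest()
    return digest[0]


class ChunkCache:
    """Generate chunks on first request and keep them in RAM afterwards."""

    def __init__(self, seed: int):
        self.seed = seed
        self.chunks = {}    # (cx, cz) -> rows of heights
        self.generated = 0  # how many chunks were actually generated

    def get_chunk(self, cx: int, cz: int):
        key = (cx, cz)
        if key not in self.chunks:
            self.generated += 1
            x0, z0 = cx * CHUNK_SIZE, cz * CHUNK_SIZE
            # Row index is the local z, column index is the local x.
            self.chunks[key] = [
                [cell_height(self.seed, x0 + i, z0 + j) for i in range(CHUNK_SIZE)]
                for j in range(CHUNK_SIZE)
            ]
        return self.chunks[key]

    def height_at(self, x: int, z: int) -> int:
        # Floor division and modulo also map negative coordinates correctly.
        chunk = self.get_chunk(x // CHUNK_SIZE, z // CHUNK_SIZE)
        return chunk[z % CHUNK_SIZE][x % CHUNK_SIZE]
```

This cache never evicts anything, which matches the everything-in-RAM approach described above; a planet-sized world would eventually need to drop or page out chunks that are far from the player.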
What I have in mind is something slightly similar to Google Earth data (maybe I could even extract data from there). Only instead of high-res images, have either image-based data (displacement maps) or some other type of data for basic terrain generation, which then gets detailed out with procedural generation.
So what would it take?
Not that big a deal. You use Google Maps data as input to your terrain data conversion code, and this spits out your "world map". You're not really generating anything yet, just turning Google Maps data into world map data. Once that's done, you can use procedural content generation to populate the world with whatever you want (cities, roads, buildings, people, etc.). Obviously a chunk-based approach, most likely paging from disk, would be called for given the vast amounts of data.
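As a rough illustration of "turning map data into world map data, then detailing it out", here is a sketch that assumes the source data has already been reduced to a coarse grid of elevations in meters. The grid, scale factor, seed, and noise amplitude are all hypothetical. The sketch bilinearly upsamples the grid and adds a small deterministic offset per fine cell. Real terrain would more likely use a smoother noise such as Perlin or simplex, so treat the hashed noise here as a placeholder.

```python
import hashlib


def detail_noise(seed: int, x: int, z: int, amplitude: float) -> float:
    """Deterministic offset in [-amplitude, +amplitude] for one fine cell."""
    b = hashlib.blake2b(f"{seed}:{x}:{z}".encode(), digest_size=2).digest()
    unit = int.from_bytes(b, "big") / 65535.0  # 0.0 .. 1.0
    return (unit * 2.0 - 1.0) * amplitude


def bilinear(coarse, fx: float, fz: float) -> float:
    """Sample a coarse elevation grid (rows = z, columns = x) at fractional coords."""
    rows, cols = len(coarse), len(coarse[0])
    x0, z0 = int(fx), int(fz)
    x1, z1 = min(x0 + 1, cols - 1), min(z0 + 1, rows - 1)
    tx, tz = fx - x0, fz - z0
    top = coarse[z0][x0] * (1 - tx) + coarse[z0][x1] * tx
    bottom = coarse[z1][x0] * (1 - tx) + coarse[z1][x1] * tx
    return top * (1 - tz) + bottom * tz


def build_world_map(coarse, scale: int, seed: int, amplitude: float):
    """Upsample source elevations by `scale`, then add procedural detail."""
    rows, cols = len(coarse), len(coarse[0])
    out_rows = (rows - 1) * scale + 1
    out_cols = (cols - 1) * scale + 1
    return [
        [bilinear(coarse, x / scale, z / scale) + detail_noise(seed, x, z, amplitude)
         for x in range(out_cols)]
        for z in range(out_rows)
    ]
```

With amplitude 0.0 the output is just the interpolated source data; raising it adds detail that the source never contained, and the same seed always reproduces the same detail.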
I would like to keep as much data as possible outside the gaming device.
You're going to need a hard drive, or a LOT of RAM.
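To put rough numbers on "a hard drive, or a LOT of RAM", here is a back-of-the-envelope estimate for heightmap data alone. It assumes roughly 149 million km² of land (about 510 million km² for the whole surface), 16-bit height samples, and a few grid spacings. A 30 m spacing is about the resolution of SRTM's 1 arc-second elevation data. These are approximations, not measurements of any particular dataset.

```python
# Rough storage estimate for raw heightmap data only (no textures, objects, etc.).
# Assumed figures: approximate areas of Earth and 16-bit (2-byte) height samples.
EARTH_SURFACE_KM2 = 510_000_000
EARTH_LAND_KM2 = 149_000_000


def heightmap_bytes(area_km2: float, spacing_m: float, bytes_per_sample: int = 2) -> float:
    """Bytes needed for a square grid of height samples covering area_km2."""
    samples_per_km2 = (1000.0 / spacing_m) ** 2
    return area_km2 * samples_per_km2 * bytes_per_sample


if __name__ == "__main__":
    for label, area in (("land", EARTH_LAND_KM2), ("whole surface", EARTH_SURFACE_KM2)):
        for spacing in (30.0, 10.0, 1.0):
            gb = heightmap_bytes(area, spacing) / 1e9
            print(f"{label:>13} @ {spacing:>4} m: {gb:,.0f} GB")
```

Under these assumptions, land at 30 m spacing comes to about 331 GB, the whole surface at 30 m to about 1.13 TB, and land at 1 m to about 298 TB, before a single texture or object is stored.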
Also, there would be incoming data after the initial world is built: terraforming, buildings, vegetation, etc. (as much of it procedurally generated as possible, too).
It's just more stuff. More of the same.
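One way to make incoming changes "more of the same" (an assumption, not something the post prescribes) is to keep the procedural terrain regenerable and store only the edits as a sparse overlay. That also fits the wish to keep data off the device: only the overrides would need to be saved or downloaded. The seed, the hash-based base height, and the class and method names below are hypothetical.

```python
import hashlib


def base_height(seed: int, x: int, z: int) -> int:
    """Procedural base terrain: deterministic, so it never needs to be stored."""
    return hashlib.blake2b(f"{seed}:{x}:{z}".encode(), digest_size=1).digest()[0]


class EditableTerrain:
    """Procedural base terrain plus a sparse layer of terraforming edits."""

    def __init__(self, seed: int):
        self.seed = seed
        self.edits = {}  # (x, z) -> overridden height; only changed cells are kept

    def height_at(self, x: int, z: int) -> int:
        # An edit, if present, wins over the procedural value.
        if (x, z) in self.edits:
            return self.edits[(x, z)]
        return base_height(self.seed, x, z)

    def terraform(self, x: int, z: int, delta: int) -> None:
        # Edits stack: each one is applied on top of the current height.
        self.edits[(x, z)] = self.height_at(x, z) + delta

    def serialize_edits(self):
        # Only the overrides need to be saved or sent over the network.
        return sorted((x, z, h) for (x, z), h in self.edits.items())
```

Buildings or vegetation could follow the same pattern: the procedural generator decides the default, and a sparse list of additions and removals records everything that has changed since.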
2. Which game engine would be suitable for something like this? Can existing ones handle it, or would it need a new engine?
Any engine with built-in support for large game worlds. Unreal supposedly has this. Not sure if Unity does it yet; other engines may have it as well.
Obviously, the terrain data conversion code and the procedural content generation code specific to your title are your responsibility.
P.S. If possible, I would like to keep this to the technological side for now. I know it isn't the smallest project idea. How to do it... I'll figure that out later. Probably Indiegogo.
On the contrary: assuming the Google Maps data is available and in a relatively easy-to-use format, you add a Google Maps file reader to a terrain generation engine, add Unreal, add game-specific procedural content generation, get them all to talk to each other, press the button, and out comes your game world, complete with objects and actors. Then you just have to code actor behavior. The real killer is the assets. You'll be procedurally generating ground meshes, and perhaps ground textures too. You will procedurally generate game objects, but they will still need assets (meshes, models, textures, sound effects, AI scripts, etc.), and those assets must be made, bought, or procedurally generated.
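The game-specific procedural content generation step can be as simple as deterministic per-chunk placement. This sketch assumes a hypothetical asset catalogue, chunk size, and object count; it seeds Python's random.Random from the world seed and chunk coordinates, so a chunk gets the same objects every time it is generated, regardless of load order. Note that every placement still points at an asset name, which is exactly the part that has to be made, bought, or generated.

```python
import random
from dataclasses import dataclass

# Hypothetical asset catalogue: these names stand in for meshes, models, and
# textures that still have to exist before anything can be drawn.
ASSETS = ["oak_tree", "pine_tree", "boulder", "hut"]


@dataclass(frozen=True)
class Placement:
    asset: str
    x: float
    z: float
    rotation_deg: float


def populate_chunk(world_seed: int, cx: int, cz: int,
                   chunk_size: float = 64.0, count: int = 8):
    """Place `count` objects inside chunk (cx, cz), reproducibly."""
    # A private RNG seeded from a string is deterministic across runs, so the
    # chunk can be thrown away and rebuilt later with identical contents.
    rng = random.Random(f"{world_seed}:{cx}:{cz}")
    placements = []
    for _ in range(count):
        placements.append(Placement(
            asset=rng.choice(ASSETS),
            x=cx * chunk_size + rng.uniform(0.0, chunk_size),
            z=cz * chunk_size + rng.uniform(0.0, chunk_size),
            rotation_deg=rng.uniform(0.0, 360.0),
        ))
    return placements
```

In a real pipeline the placement rules would also read the terrain (slope, height, nearby roads), but the asset list is the part no amount of generation code removes.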
Big worlds aren't lots of work. Lots of assets are lots of work.