I spent a lot of time (a few years) figuring out the best way to do solar-system+ scale worlds (blog post). The best overall method I came up with is to use floats for everything, where each object holds a pointer to a CoordinateFrame object, which itself has a double-precision position offset relative to its parent CoordinateFrame. This forms a hierarchy of coordinate systems, which can support a world of any size (even beyond what a double could represent).
class CoordinateFrame
{
    Vector3<double> position; // offset relative to the parent frame, in meters
    // I also have rotation here, but it's optional and adds a lot of complexity
    CoordinateFrame* parent; // nullptr if at root level
    std::vector<CoordinateFrame*> children;
    CoordinateFrameGenerator* childGenerator; // lazily creates child frames (see below)
};

class SceneObject
{
    Transform3<float> transform; // single-precision, local to `frame`
    CoordinateFrame* frame; // nullptr if at root level
    // Components, child objects, etc.
};
This way you can do most of your calculations in single precision, except when objects are in different frames. Each coordinate frame is large enough (about 1.6km) to contain the majority of objects in a typical scene. On every game tick in which an object moves, I check whether it is still within a valid distance of its CoordinateFrame's origin; if not, I do a slower traversal of the frame hierarchy to find the CoordinateFrame whose origin the object is now closest to, and re-parent it there.
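In sketch form (includes and the obvious Vector3 helpers omitted; FindNearestFrame and OffsetBetween are illustrative names, not my exact API), the per-move check looks roughly like this:

const float FRAME_RADIUS = 1600.0f; // ~1.6km of comfortable float precision

void UpdateObjectFrame(SceneObject& obj)
{
    // Fast common case: still close to the origin of the current frame.
    if (obj.transform.position.length() < FRAME_RADIUS)
        return;

    // Slow path: search nearby frames (parent, siblings, children, plus any
    // frames the generator can create) for the one whose origin is nearest.
    CoordinateFrame* nearest = FindNearestFrame(obj.frame, obj.transform.position);

    // Shift the object's float position by the double-precision offset of the
    // new frame's origin expressed in the old frame, then re-parent.
    Vector3<double> shift = OffsetBetween(obj.frame, nearest);
    obj.transform.position = Vector3<float>(Vector3<double>(obj.transform.position) - shift);
    obj.frame = nearest;
}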
I also have a concept of a CoordinateFrameGenerator, which allows a CoordinateFrame to generate child CoordinateFrames within it for objects that move outside the bounds of existing frames. For instance, a solar system has a coordinate frame at its center, with a frame generator that places new frames on a grid as objects move outside existing frames. Similarly, a planet has its own frame and a generator that places frames on the planet's surface every 1.6km. So I have a solar system frame containing a planet frame, which contains surface frames. It's frames all the way down.
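A minimal flat-grid generator could look something like this (a hypothetical sketch of the idea, not my production code; the planet-surface generators are more involved):

class GridFrameGenerator : public CoordinateFrameGenerator
{
    double cellSize = 1600.0; // one frame per 1.6km grid cell
    std::map<std::array<int64_t, 3>, CoordinateFrame*> cells;

public:
    // Returns the child frame for the cell containing `p` (given in the
    // parent frame's coordinates), creating it lazily on first use.
    CoordinateFrame* FrameFor(CoordinateFrame* parent, const Vector3<double>& p)
    {
        std::array<int64_t, 3> key = { (int64_t)std::floor(p.x / cellSize),
                                       (int64_t)std::floor(p.y / cellSize),
                                       (int64_t)std::floor(p.z / cellSize) };
        CoordinateFrame*& frame = cells[key];
        if (!frame)
        {
            frame = new CoordinateFrame();
            frame->parent = parent;
            frame->position = Vector3<double>((key[0] + 0.5) * cellSize,  // cell center
                                              (key[1] + 0.5) * cellSize,
                                              (key[2] + 0.5) * cellSize);
            parent->children.push_back(frame);
        }
        return frame;
    }
};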
To do something like render an object relative to a camera, I consider the frames of the object and the camera. If they are the same, rendering proceeds as normal. If they differ, I compute a (single-precision) unit vector pointing from the camera to the object, plus a double-precision distance along that vector, by walking the frame hierarchy and doing the fewest transforms necessary to preserve precision along the way. Using this unit vector and distance, it is possible to render objects at any distance by scaling far-away objects down around the camera.
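Sketched in the same spirit (LowestCommonAncestor is an illustrative helper), the camera-relative computation is roughly:

struct CameraRelative { Vector3<float> direction; double distance; };

CameraRelative RelativeToCamera(const SceneObject& obj, const SceneObject& cam)
{
    // Accumulate offsets in double only up to the lowest common ancestor
    // frame (never all the way to the root), which keeps magnitudes, and
    // therefore rounding error, as small as possible.
    CoordinateFrame* ancestor = LowestCommonAncestor(obj.frame, cam.frame);
    Vector3<double> p = Vector3<double>(obj.transform.position);
    for (CoordinateFrame* f = obj.frame; f != ancestor; f = f->parent)
        p += f->position;
    Vector3<double> c = Vector3<double>(cam.transform.position);
    for (CoordinateFrame* f = cam.frame; f != ancestor; f = f->parent)
        c += f->position;

    Vector3<double> delta = p - c;
    double dist = delta.length();
    return { Vector3<float>(delta / dist), dist }; // unit vector is safe in float
}

A distant object can then be drawn at direction * k with its scale multiplied by k / distance, for whatever k fits inside the depth range.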
Physics is harder if you want interactions (collisions, gravity) between frames. I have a custom physics engine with this CoordinateFrame concept integrated at a deep level. This keeps things efficient in the common case; things only get slower when interacting objects are far apart (a rare thing) or at frame boundaries. I see about a 2x slowdown in an N-body gravity simulation with CoordinateFrames vs. no frames. Not bad at all for a rare case.
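The cross-frame pattern in the gravity code is roughly this (a hypothetical sketch; Body, OffsetBetween, and G are illustrative):

void AccumulateGravity(Body& a, Body& b)
{
    // Same frame: the ordinary single-precision path. Different frames: fold
    // the double-precision frame-to-frame offset into the separation vector,
    // then proceed as usual.
    Vector3<double> delta = Vector3<double>(b.position) - Vector3<double>(a.position);
    if (a.frame != b.frame)
        delta += OffsetBetween(a.frame, b.frame); // b's frame origin relative to a's

    double r2 = delta.dot(delta);
    Vector3<double> force = delta * (G * a.mass * b.mass / (r2 * std::sqrt(r2)));
    a.force += Vector3<float>(force);  // force on a points toward b
    b.force -= Vector3<float>(force);
}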
This is working well for me, but I'll admit it's not simple to implement, because it requires deep integration into all of the core engine systems.
Doubles:
- - Only works with millimeter-or-better precision out to about Neptune's orbit: one ulp is ~0.1mm at Jupiter's distance and ~1mm at Neptune's (see the snippet after this list).
- - 8 bytes vs. 4 bytes, which means more memory traffic and cache misses, i.e. slower performance
- - Not supported (efficiently) by GPUs; you still need to convert to camera-relative floats
- - Not supported by most physics engines
- - SIMD support is not great (wide double SIMD requires AVX, which doesn't exist on ARM Macs)
- + Simple arithmetic
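To put numbers on that first point, you can print the spacing between adjacent doubles (one ulp) at a few orbital radii:

#include <cmath>
#include <cstdio>

int main()
{
    const double orbits[] = { 1.5e11, 7.8e11, 4.5e12 }; // Earth, Jupiter, Neptune radii (m)
    for (double d : orbits)
        std::printf("at %.1e m, ulp = %.2g m\n", d, std::nextafter(d, 1e300) - d);
    // Prints roughly 3e-05, 0.00012, and 0.00098 meters respectively.
}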
Int + Float:
- + Fast within a chunk; can use an existing physics engine within a chunk
- + Can do physics in each chunk separately
- + Can do SIMD easily on all platforms
- - Boundaries are hard (e.g. how to do physics/rendering efficiently across chunks; see the renormalization sketch after this list)
- - Slow arithmetic between chunks
- - Limited to 32- or 64-bit chunk indices, which is not quite enough for a large galaxy (32-bit) or the universe (64-bit)
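For concreteness, the int+float layout is typically something like this (an illustrative sketch, not code from my engine):

struct ChunkPosition
{
    int64_t cx, cy, cz;   // chunk indices per axis (1.6km cells)
    float   ox, oy, oz;   // single-precision offset within the chunk

    // Call after movement: push whole cells of travel into the integer part
    // so the float offsets stay small (in [0, 1600)) and precise.
    void Renormalize()
    {
        constexpr float SIZE = 1600.0f;
        auto fix = [&](int64_t& c, float& o) {
            float cells = std::floor(o / SIZE);
            c += (int64_t)cells;
            o -= cells * SIZE;
        };
        fix(cx, ox); fix(cy, oy); fix(cz, oz);
    }
};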
CoordinateFrame + Float:
- + Can handle any size world by just adding more hierarchy levels
- + Fast performance within a chunk (same as a small-scale game engine)
- + Acceptable performance across chunks
- + Can do SIMD easily on all platforms
- - Harder to implement, especially physics.
- - Difficult to spawn objects at absolute positions, because everything is defined relative to frames generated at runtime. I'm still struggling with how to do this properly.
I also looked into other options like 80-bit and 128-bit floats and float-float arithmetic, and discarded them because they are slow and/or not supported on many platforms/compilers.