hi,
Not really a beginner question, as I implemented something similar back in the DirectX 8/9 era: I used an octree and "zones" to render a semi-huge city. My question is: nowadays, with faster GPUs and more memory, what are the latest algorithms and approaches for rendering huge levels, such as those we see in Ocarina of Time or the latest AAA games (not trying to make a AAA game here, haha)? Is the octree still the industry standard, or are there new techniques out there, like passing the processing to the hardware through shaders, etc.?
Thanks in advance!
Rendering Huge scene/level how to?
Not here with knowledge about recent industry practices - sorry. Do you recall why you picked an octree over a quadtree back then? Lots of vertical information, perhaps?
cebugdev2 said:
is Octree still the industry standard
To me, a BVH seems more attractive because it's more flexible and can be refitted instead of rebuilt. But that's always a question of details.
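To illustrate what refitting means here, a minimal sketch, assuming a binary BVH whose leaves each reference one object. The names `AABB`, `BVHNode` and `refit` are made up for this example. After objects move, the boxes are recomputed bottom-up while the tree topology stays untouched:

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class AABB:
    lo: Vec3
    hi: Vec3

    def union(self, other: "AABB") -> "AABB":
        # Smallest box that encloses both boxes.
        return AABB(
            tuple(min(a, b) for a, b in zip(self.lo, other.lo)),
            tuple(max(a, b) for a, b in zip(self.hi, other.hi)),
        )


@dataclass
class BVHNode:
    box: AABB
    left: Optional["BVHNode"] = None
    right: Optional["BVHNode"] = None
    object_id: Optional[int] = None  # set on leaves only (assumption of this sketch)


def refit(node: BVHNode, object_boxes: List[AABB]) -> AABB:
    """Recompute boxes bottom-up; no nodes are created, removed or re-parented."""
    if node.object_id is not None:
        node.box = object_boxes[node.object_id]
    else:
        node.box = refit(node.left, object_boxes).union(
            refit(node.right, object_boxes)
        )
    return node.box
```

A refit costs one pass over the nodes, but tree quality degrades if objects move far from where they were when the tree was built, so an occasional full rebuild is usually still needed.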
What I think is a general trend is larger branching factors. Taking the octree as an example, we can use the same concept but subdivide each node into 4^3 children instead of 2^3. This has advantages that suit modern hardware: far fewer tree levels are necessary, so there are fewer cache misses, and we get longer lists of content that we can iterate with linear access. We likely waste some more memory, but we have a lot of that these days.
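A small sketch of the arithmetic behind the wider tree, assuming the finest level is a regular grid of `cells_per_axis` cells per axis. Both function names are made up for this example:

```python
def levels_needed(cells_per_axis: int, branch_per_axis: int) -> int:
    """Tree levels below the root needed to reach the finest grid resolution."""
    levels, span = 0, 1
    while span < cells_per_axis:
        span *= branch_per_axis
        levels += 1
    return levels


def child_index(x: int, y: int, z: int, branch_per_axis: int) -> int:
    """Flatten a child's local (x, y, z) coordinate into a slot in the node's child array."""
    b = branch_per_axis
    return x + b * (y + b * z)


# 4096 cells per axis: a classic octree (2 per axis) vs. a 4^3 tree.
print(levels_needed(4096, 2))  # 12
print(levels_needed(4096, 4))  # 6
```

Halving the number of levels halves the number of dependent pointer hops per traversal, and each node's 64 children can sit in one contiguous array.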
it's a combination of methods, depending on your engine, game, story etc…:
- dynamic objects: you can still use octrees to organise your dynamic objects, scenes or terrain, LOD, chunked LOD, etc…
- states: scenegraphs are still in use (for animation, item placement, state sorting…)
- static objects:
- bsps (these guys tend to be compiled in for indoor static geom),
- but also often, depending on the nature of the static scene geom, no bsp is used and no compiling may be required; instead, reliance on the z-buffer for early z rejection is preferred ("just throw the whole batch in"); these are convex rooms, caves, gothic enclosures, basically objects that don't need to be space partitioned …etc…
- frustum culling: still used to get rid of objects outside the camera view frustum
- occlusion culling: still used to get rid of objects that are inside the view frustum but hidden by other objects also inside it (for example, a tall building hiding a park from the player's point of view: the park can be skipped and not rendered if it is completely hidden behind the building)…
- pvs, clustered pvs, areas, portals, zones, etc…
- voxels also making their way in…
- clustered rendering where the frustum is subdivided in clusters so to speak…
- hardware queries…
- height maps, depth maps, ….
etc…
the list goes on…
and don't forget whatever else you can think of.
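As a concrete example of the frustum culling item in the list above, a minimal sketch assuming each object has a bounding sphere and the frustum is given as planes `(nx, ny, nz, d)` with unit normals pointing inward. That convention is an assumption of this example, not something a particular engine prescribes:

```python
from typing import Sequence, Tuple

Plane = Tuple[float, float, float, float]  # (nx, ny, nz, d), unit normal points into the frustum


def sphere_in_frustum(
    center: Tuple[float, float, float], radius: float, planes: Sequence[Plane]
) -> bool:
    """Conservative test: False only if the sphere is fully behind some plane."""
    cx, cy, cz = center
    for nx, ny, nz, d in planes:
        # Signed distance from the sphere centre to the plane.
        if nx * cx + ny * cy + nz * cz + d < -radius:
            return False
    return True
```

The test can report a sphere near a frustum corner as visible when it is not, which is harmless for culling: a false positive only costs a wasted draw.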
In my experience, I dare say that octrees are still a good choice. Nowadays the problem is sending data to the GPU in the fastest way possible. I have a project in progress and I tried different methods. First I used an old BSP implementation of mine, and I found that it is not suitable anymore for such purposes: the GPU works on vertices, and sending many surfaces one after the other stalls it very quickly. Then I tried the grid approach: large chunks of data were stored in each grid cell, some surfaces spanned multiple cells, and I flagged those surfaces so they were sent only once. The problem was still the same: creating a vertex buffer with all attributes and sending it to the GPU inside the rendering loop will stall your GPU. Then I tried the octree approach and hit the same problem. The problem is not the data structure, it's the vertex bandwidth. So I decided to use the octree approach and to clip each triangle against the cell that contains it. This gives more triangles, but for each cell I create a vertex buffer and upload it to the GPU once, statically; when the node is on screen I draw it with a single command.
So basically, use the octree or grid approach: for each surface of your level, compute where it has to go (I won't explain the workings of an octree or grid data structure; from what I see you know that very well), clip that triangle against the octree node's bounding box, and store the resulting triangles. You will end up with a nice box of triangles perfectly fitting the octree node. At this point, create the vertex buffer object for each node and you will have a performant octree visibility data structure. In my project the code is so fast that I am thinking I won't need to store the octree in a file and can build it on the fly instead (still to be decided, though).
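The clipping step could look roughly like the sketch below. It is not the poster's code: it assumes Sutherland-Hodgman clipping against the six planes of the node's box followed by fan triangulation, works on positions only (other vertex attributes would need the same interpolation), and all names are made up for this example.

```python
def clip_polygon_axis(poly, axis, bound, keep_below):
    """One Sutherland-Hodgman step: clip a convex polygon against an axis-aligned plane."""
    def inside(p):
        return p[axis] <= bound if keep_below else p[axis] >= bound

    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]  # i == 0 wraps around to the last vertex
        if inside(cur) != inside(prev):
            # The edge crosses the plane: emit the intersection point.
            t = (bound - prev[axis]) / (cur[axis] - prev[axis])
            out.append(tuple(a + t * (b - a) for a, b in zip(prev, cur)))
        if inside(cur):
            out.append(cur)
    return out


def triangle_area(tri):
    (ax, ay, az), (bx, by, bz), (cx, cy, cz) = tri
    ux, uy, uz = bx - ax, by - ay, bz - az
    vx, vy, vz = cx - ax, cy - ay, cz - az
    nx, ny, nz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    return 0.5 * (nx * nx + ny * ny + nz * nz) ** 0.5


def clip_triangle_to_box(tri, lo, hi):
    """Return the triangles covering the part of `tri` inside the box [lo, hi]."""
    poly = list(tri)
    for axis in range(3):
        poly = clip_polygon_axis(poly, axis, lo[axis], keep_below=False)
        poly = clip_polygon_axis(poly, axis, hi[axis], keep_below=True)
    # Fan-triangulate the clipped convex polygon and drop zero-area slivers,
    # which appear when a vertex lies exactly on a clip plane.
    fan = [(poly[0], poly[i], poly[i + 1]) for i in range(1, len(poly) - 1)]
    return [t for t in fan if triangle_area(t) > 0.0]
```

The triangles returned for each node would then be packed into that node's static vertex buffer once, at build time.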
Levels in Ocarina of Time are both tiny and incredibly low-detail by modern standards. If that's all you've got, you can pretty much just brute force your way through with no visibility culling whatsoever.
I don't know if you are looking for low-level specifics or high-level concepts here. In the end, the issue remains today just as it did in the past: "how do you fit an 'infinite' world in a finite, discrete space?" Finite and discrete meaning a computer with limited memory and storage. Until computers are equipped with unlimited memory and storage, this problem will always exist, so techniques such as spatial partitioning (of which the octree is just one specific example), streaming, etc. are still needed. Which one to use depends on the context and is tailored to a specific use case. Granted, a lot of these techniques can be modified to fit "modern" hardware usage or different architectures, but in the end they are still based on these "older techniques".
I work on a custom AAA game engine used to render very large open-world games. For (legacy) reasons we have two rendering code paths in our engine.
- Legacy (all CPU): Geometry is merged as much as possible offline. Runtime consists of hierarchical frustum cull + BSP check, with an optional occlusion culling check using last frame's reprojected depth buffer.
- New (some CPU, mostly GPU): Drawable instances are aggregated into "sets" of things using the same material, and their bounding boxes are merged offline. At runtime, the merged bounding box is used to do a very coarse frustum cull and occlusion cull on the CPU to throw out or accept an entire set (could be tens of thousands of objects). GPU async compute is then used to cull all instances in the set (frustum + occlusion again), and the remaining objects are drawn using indirect draws. We successfully process more than 2 million instances per frame using this method.
We actually have a relatively simple version of the new path compared to others in the industry: we've stopped at instance-level culling, but many studios will additionally cull meshlets (~256 tris) of each instance and even individual triangles.
For anything less than a gigantic open-world game, simpler methods are probably fine (especially if using a lower-level graphics API like DX12 or Vulkan) since draw calls are so much cheaper.
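The two-level culling of the newer path can be mimicked on the CPU to show the data flow. This is only a sketch under loud assumptions: bounding spheres stand in for the merged bounding boxes, a single caller-supplied `is_visible` predicate stands in for the frustum and occlusion tests, and the inner loop is what would run as a compute shader writing indirect draw arguments. `InstanceSet` and `build_draw_lists` are made-up names.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Sphere = Tuple[float, float, float, float]  # (x, y, z, radius)


@dataclass
class InstanceSet:
    material: str
    bounds: Sphere           # merged offline from all instance bounds
    instances: List[Sphere]  # per-instance bounds


def build_draw_lists(
    sets: List[InstanceSet], is_visible: Callable[[Sphere], bool]
) -> List[Tuple[str, List[int]]]:
    """Return (material, surviving instance indices) for each set that has survivors."""
    draws = []
    for s in sets:
        # Coarse pass: one test accepts or rejects the entire set.
        if not is_visible(s.bounds):
            continue
        # Fine pass: per-instance test (GPU compute in the post, CPU here).
        survivors = [i for i, b in enumerate(s.instances) if is_visible(b)]
        if survivors:
            draws.append((s.material, survivors))
    return draws
```

Each returned entry corresponds to one indirect draw whose instance list was compacted by the fine pass; a rejected set costs a single test no matter how many instances it holds.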