
Using SDF rendering with large world?

Started by April 10, 2020 03:41 AM
5 comments, last by JoeJ 4 years, 8 months ago

There are lots of cool things you can do with SDFs to render implicit surfaces. Most of the examples I've seen have been “all in the shader”, where the entire distance function is encoded in the shader source code.

Examples:

  • https://www.iquilezles.org/www/articles/distfunctions2d/distfunctions2d.htm
  • https://www.shadertoy.com/view/3lsSzf
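For reference, the core building blocks in those examples are just small distance functions combined with min/max. A minimal C++ sketch of the 2D pieces (the GLSL versions are nearly identical; this is only an illustration):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>

// Signed distance to a circle: negative inside, positive outside.
float sdCircle(float px, float py, float cx, float cy, float r) {
    return std::sqrt((px - cx) * (px - cx) + (py - cy) * (py - cy)) - r;
}

// The union of two SDFs is simply the minimum of the two distances.
float opUnion(float a, float b) { return std::min(a, b); }

int main() {
    // Sample the combined field of two circles at one point.
    float d = opUnion(sdCircle(0.5f, 0.0f, 0.0f, 0.0f, 1.0f),
                      sdCircle(0.5f, 0.0f, 2.0f, 0.0f, 0.5f));
    std::printf("distance = %f\n", d); // negative: the point is inside the first circle
}
```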

The problem is that all of those examples assume the shader has total knowledge of the entire world.

I want to make a large world (too large to fit in a shader, certainly), and am looking for approaches to render it with the SDF approach.

Does anyone have experience with this? I am 2D-only, if it helps.

The stumbling block seems to be: Since the rendering happens in the fragment shader, I somehow have to transfer “game world” information into that shader. But there do not seem to be good ways to send bulk data to the fragment shader.

Some ideas I have had:

  • Write one generic shader that can draw, say, a combination of 500 SDFs.
    • The input to the shader (maybe a UBO?) would contain an encoded version of a piece of the world, with commands like “put a circle at (x,y,radius), do a union with the next object, …” to build the total SDF for that piece of the world (see the sketch after this list).
    • On the CPU side, I'd have to split the world into chunks that could be rendered by that shader, and populate the input data appropriately for each draw call.
    • So for 2D, I might do a tile-based render of the screen, where each tile has a “small” amount of data, enough to be handled in the shader.
  • Generate shaders on-the-fly, depending on the part of the world I want to render.
    • Here, the shader code would look a lot like the “everything baked in” shaders, but I'd just be generating the code on the CPU.
    • This approach seems bad though, since compiling shaders is a pretty heavy process, in my experience. I did a test with an SDF that was the union of a few thousand circles, and it took 30sec+ to compile.
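To make the first idea concrete, here is a rough CPU-side sketch of what I imagine the encoding could look like. The names (`SdfCommand`, `buildChunk`) and the layout are made up, and a real UBO would need to follow std140-style alignment rules:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical encoding of one SDF "command". An array of these would be
// uploaded (UBO or SSBO) and walked by the generic shader for each chunk.
enum class SdfOp : std::uint32_t { Circle = 0, Box = 1 };

struct SdfCommand {
    SdfOp         op;       // which primitive to evaluate
    std::uint32_t combine;  // how to fold into the running result: 0 = union, 1 = subtract
    float         x, y;     // centre in world space
    float         p0, p1;   // radius for circles, half-extents for boxes
    float         pad[2];   // keep the stride a multiple of 16 bytes for std140-style layouts
};

// CPU-side chunking: gather the commands whose bounds touch one chunk of the world.
std::vector<SdfCommand> buildChunk(const std::vector<SdfCommand>& world,
                                   float minX, float minY, float maxX, float maxY) {
    std::vector<SdfCommand> out;
    for (const SdfCommand& c : world) {
        float r = (c.op == SdfOp::Circle) ? c.p0 : std::max(c.p0, c.p1);
        if (c.x + r >= minX && c.x - r <= maxX &&
            c.y + r >= minY && c.y - r <= maxY)
            out.push_back(c);
    }
    return out;  // upload before the chunk's draw call (e.g. glBufferSubData)
}
```

The shader side would then just loop over the array, evaluate each primitive, and fold it into a running distance with min()/max().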

Any other ideas?


The obvious alternative to processing distance functions per primitive would be to generate a single distance field from multiple (or all) primitives.
So, before rendering, generate a big texture that stores the distance to the closest primitive per texel.
Ofc. this causes some error due to texture resolution, so a circle smaller than a few texels could appear blocky and not perfectly round.
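A brute-force CPU sketch of that bake, purely to illustrate the idea (a real version would run on the GPU or use something like jump flooding; `bakeDistanceField` is a made-up name):

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

struct Circle { float x, y, r; };

// Brute-force bake: one float per texel, the signed distance to the closest primitive.
std::vector<float> bakeDistanceField(const std::vector<Circle>& prims,
                                     int width, int height, float texelSize) {
    std::vector<float> field(width * height, std::numeric_limits<float>::max());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float px = (x + 0.5f) * texelSize;  // texel centre in world units
            float py = (y + 0.5f) * texelSize;
            float& d = field[y * width + x];
            for (const Circle& c : prims) {
                float dist = std::hypot(px - c.x, py - c.y) - c.r;
                d = std::min(d, dist);
            }
        }
    }
    return field;  // upload as e.g. a GL_R32F texture and sample it in the shader
}
```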

Looking up some resources about the game ‘Claybook’ might be helpful - it is 3D and uses SDFs for all graphics and physics.


jwdevel said:
Some ideas I have had:
  • Write one generic shader that can draw, say, a combination of 500 SDFs. The input to the shader (maybe a UBO?) would contain an encoded version of a piece of the world, with commands like “put a circle at (x,y,radius), do a union with the next object, …” to build the total SDF for that piece of the world. On the CPU side, I'd have to split the world into chunks that could be rendered by that shader, and populate the input data appropriately for each draw call. So for 2D, I might do a tile-based render of the screen, where each tile has a “small” amount of data, enough to be handled in the shader.
  • Generate shaders on-the-fly, depending on the part of the world I want to render. Here, the shader code would look a lot like the “everything baked in” shaders, but I'd just be generating the code on the CPU. This approach seems bad though, since compiling shaders is a pretty heavy process, in my experience. I did a test with an SDF that was the union of a few thousand circles, and it took 30sec+ to compile.
Any other ideas?

Your first suggestion sounds much better, but i would describe it a bit differently:

Use a finer-grained grid instead of large chunks, e.g. each grid cell referring to a tile of 16 x 16 pixels on screen. For each cell, store the list of primitives that are closest to at least one pixel of the cell.
(A similar preprocessing task to generating the global SDF - both ideas could be combined if you wanted a LOD solution for a 3D perspective view.)

On the GPU, each thread of a wavefront would then iterate the same list of primitives, processing them in the same order without data or execution divergence. So this is fast; just make sure wavefronts align to the grid tiling.
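A rough CPU sketch of the binning step. This version just tests inflated primitive bounds against tiles, which is conservative compared to “closest to at least one pixel”, but simpler; all names are made up:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Circle { float x, y, r; };

constexpr int kTileSize = 16;  // pixels per grid cell, as suggested above

// For each screen tile, collect the indices of primitives that can influence it.
// "Influence" is a conservative bound: the primitive radius inflated by the
// maximum distance you care about (e.g. a glow radius).
std::vector<std::vector<int>> binPrimitives(const std::vector<Circle>& prims,
                                            int screenW, int screenH, float maxDist) {
    int tilesX = (screenW + kTileSize - 1) / kTileSize;
    int tilesY = (screenH + kTileSize - 1) / kTileSize;
    std::vector<std::vector<int>> tiles(tilesX * tilesY);

    for (int i = 0; i < (int)prims.size(); ++i) {
        float reach = prims[i].r + maxDist;
        int x0 = std::max(0, (int)std::floor((prims[i].x - reach) / kTileSize));
        int x1 = std::min(tilesX - 1, (int)std::floor((prims[i].x + reach) / kTileSize));
        int y0 = std::max(0, (int)std::floor((prims[i].y - reach) / kTileSize));
        int y1 = std::min(tilesY - 1, (int)std::floor((prims[i].y + reach) / kTileSize));
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tiles[ty * tilesX + tx].push_back(i);
    }
    return tiles;  // flatten into (offset, count) plus an index buffer for the GPU
}
```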

I'm unsure if you really need distance to primitives, or just the intersections of primitives. The latter would make building the grid acceleration structure easier and faster.

… alternative to processing distance functions per primitive would be to generate a single distance field from multiple (or all) primitives.
So, before rendering, generate a big texture …

I think you mean to do this on the CPU side? Unfortunately, I'm seriously CPU-bound already so am looking to push things to the GPU where possible.

I will check out Claybook though, thanks.

Use a finer-grained grid instead of large chunks.

Is the idea here to line up the “chunk size” to the CU size of the GPU?

So this is fast, just make sure wavefronts align to grid tiling.

Could you elaborate here? Is there a generic way to do that, like via OpenGL, etc?

I'm unsure if you really need distance to primitives, or just the intersections of primitives.

Well, it's still experimental right now, so I'm not sure. I think I want to do distance-based effects though (eg: ‘glow’ surrounding an object, per-pixel AA, and similar), so that's my current direction.
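For what it's worth, my mental model is that both effects come straight from the distance value. A tiny sketch of the math (C++ here, mirroring what the fragment shader would do; the 0.5px edge and glowRadius are arbitrary choices):

```cpp
#include <algorithm>
#include <cmath>

// smoothstep as defined in GLSL, reproduced on the CPU for reference.
float smoothstep(float edge0, float edge1, float x) {
    float t = std::clamp((x - edge0) / (edge1 - edge0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Per-pixel coverage with roughly 1px of anti-aliasing; d is the signed distance in pixels.
float coverage(float d) {
    return 1.0f - smoothstep(-0.5f, 0.5f, d);
}

// A soft glow that fades out over glowRadius pixels outside the shape.
float glow(float d, float glowRadius) {
    return 1.0f - smoothstep(0.0f, glowRadius, std::max(d, 0.0f));
}
```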

Thanks for your responses!


jwdevel said:
I think you mean to do this on the CPU side?

It can be done on the GPU as well, and faster. The game Dreams is an example.

jwdevel said:
Is the idea here to line up the “chunk size” to the CU size of the GPU?

It should match but can be larger. If you have 8 x 8 tiles, 64 threads will be happy. Also 32, because the whole CU still iterates the same data. So 64 x 64 tiles would also work, but would contain primitives that could be culled using smaller tiles.
A bad example would be 10 x 10 pixel tiles - here different threads of the same CU would end up processing different tiles, reading different data and executing different branches of code. I'd expect a slowdown of at least a factor of 2… depends.

Using pixel shaders you just have to know this, and do some experimentation and profiling to find a sweet spot.

With compute shader the underlying HW is better exposed and things become clear more easily. (But if there is no need for threads to interact with each other, pixel shaders could be faster)

jwdevel said:
Is there a generic way to do that, like via OpenGL, etc?

I don't know for sure. Probably, if you draw a full-screen triangle, CUs are assigned to quads starting from (0,0). But i don't know if NVidia (and now RDNA) would size them 8x4 or 4x8, while GCN would use 8x8, or whatever else.
There should be some resources around, and some people here should know better.

To be sure, personally i would use compute shaders, which give full control over this. And after this is working, i'd copy the code to pixel shaders and test on multiple HW.
If in doubt, a minimum tile size of 8 x 8 should just work well everywhere with pixel shaders.

oh... i realize: If you want to scroll the grid, compute shader might be the only option to keep grid and CUs in sync at all.
And if you want to zoom in and out so grid and screen ratio changes, the proposed optimization is only possible if you rebuild the grid each frame.

Though, likely you don't have to worry about any of this until you see an actual performance problem.

jwdevel said:
I think I want to do distance-based effects though (eg: ‘glow’ surrounding an object, per-pixel AA, and similar)

hmmm… maybe then just draw one quad per primitive for now? That would be most flexible, with no need to think about an acceleration structure like the grid. If perf becomes an issue you could still try the optimizations.
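A rough sketch of what building those quads could look like on the CPU (names and vertex layout are made up; the fragment shader would reconstruct the circle's SDF from the interpolated per-primitive data):

```cpp
#include <vector>

struct Circle { float x, y, r; };

struct Vertex {
    float x, y;       // position of the quad corner
    float cx, cy, r;  // per-primitive data the fragment shader needs to evaluate the SDF
};

// Build one quad per circle, padded so glow / AA has room to draw outside the shape.
std::vector<Vertex> buildQuads(const std::vector<Circle>& prims, float padding) {
    std::vector<Vertex> verts;
    for (const Circle& c : prims) {
        float e = c.r + padding;
        const float corners[4][2] = {
            {c.x - e, c.y - e}, {c.x + e, c.y - e},
            {c.x + e, c.y + e}, {c.x - e, c.y + e}};
        for (auto& p : corners)
            verts.push_back({p[0], p[1], c.x, c.y, c.r});
    }
    return verts;  // draw as two triangles per quad (e.g. with an index buffer)
}
```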

