
implement and understand voxel cone tracing


Images time!

2 hours ago, evelyn4you said:

when tracing a higher resolution grid/3D mipmap the results will also be better, but the memory is increased by a factor of 8, which is very much

Yes. Although the quality impact will be huge (I intentionally left the reflective sphere in view). Samples have 4x MSAA for deferred shading and 1x MSAA for cone tracing.
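
To put numbers on that factor of 8 - a back-of-the-envelope sketch, assuming a plain RGBA16F volume (8 bytes per voxel) with a full mip chain; the exact numbers depend on the voxel format used (anisotropic voxels would multiply this further):

```cpp
#include <cstdio>
#include <cstdint>

// Rough memory estimate for a dense voxel volume with a full mip chain.
// Assumes one RGBA16F texel (8 bytes) per voxel; actual engine formats vary.
int main() {
    const double bytesPerVoxel = 8.0;                     // RGBA16F = 4 channels * 2 bytes
    for (uint32_t n = 64; n <= 512; n *= 2) {
        double base = double(n) * n * n * bytesPerVoxel;
        double withMips = base * 8.0 / 7.0;               // mip pyramid: 1 + 1/8 + 1/64 + ... = 8/7
        std::printf("%4u^3: %8.1f MiB base, %8.1f MiB with mips\n",
                    n, base / (1024.0 * 1024.0), withMips / (1024.0 * 1024.0));
    }
    return 0;
}
```

Each doubling of resolution multiplies the footprint by 8, so 512^3 already lands around 1 GiB under these assumptions.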

Fig. 1 - 64^3 volume (image: eBk5SPQ.png)

Fig. 2 - 128^3 volume (image: zH82K24.png)

Fig. 3 - 256^3 volume (image: RTgOc2g.png)

Fig. 4 - 512^3 volume (image: QguX96d.png)

Note, see how much the Voxelization phase took. I intentionally re-voxelized the whole scene (no progressive static/dynamic optimization is turned on). The computation is done on a Radeon RX 480 (with a Ryzen 7 1700 and 16 GB of DDR4 @ 3 GHz, if I'm not mistaken).
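
For reference, the progressive static/dynamic optimization mentioned above can be sketched roughly like this - a simplified CPU-side illustration, not my actual implementation (names like revoxelizeFrame are just for the example):

```cpp
#include <vector>

// Hypothetical sketch of a static/dynamic split: static geometry is
// voxelized once into its own volume; per frame, only the regions
// ("bricks") overlapped by dynamic objects are cleared and re-voxelized.
constexpr int kBricksPerAxis = 16;   // volume split into 16^3 bricks

struct Brick { bool dirty = false; };

struct VoxelVolume {
    std::vector<Brick> bricks =
        std::vector<Brick>(kBricksPerAxis * kBricksPerAxis * kBricksPerAxis);

    void markDirty(int bx, int by, int bz) {
        bricks[(bz * kBricksPerAxis + by) * kBricksPerAxis + bx].dirty = true;
    }
};

void revoxelizeFrame(VoxelVolume& dynamicVolume) {
    // 1. Mark bricks overlapped by each dynamic object's AABB (omitted here).
    // 2. Clear and re-voxelize only the dirty bricks, then reset the flags.
    for (Brick& b : dynamicVolume.bricks) {
        if (!b.dirty) continue;
        // clearBrick(...); voxelizeGeometryIntoBrick(...);  // GPU passes in practice
        b.dirty = false;
    }
    // The static volume is voxelized once and never touched here; tracing
    // samples both volumes and composites them.
}
```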

2 hours ago, evelyn4you said:

when tracing smaller cone angles the result will be better, but the tracing time I expect will grow MORE than linearly

The next image shows, for comparison, cone tracing with 8 random cones whose angle is close to 0 (i.e. they're technically rays).

Fig. 5 - Cones with a small angle (image: iHTiZtQ.png)

This is pretty unacceptable for realtime rendering, and I assume most GPU path tracers could beat these times for GI of similar quality. As you noted, the time grows a lot. The only way to trace these rays efficiently is to use a sparse voxel octree (the octree can serve as an acceleration structure for actual ray casting - yet the traversal, even for rays, is quite complex, and I haven't figured out an optimal way to perform cone tracing in an octree, aside from sampling and stepping based on the maximum reached octree level).
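
For clarity, here is roughly what the basic cone march over the mipmapped volume looks like - a simplified CPU-side sketch, not my actual shader code; sampleVolume stands in for the 3D texture fetch:

```cpp
#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };
struct Vec4 { float r, g, b, a; };   // rgb = pre-filtered radiance, a = occlusion

// Stand-in for a quadrilinear fetch from the mipmapped 3D texture; a real
// implementation samples the voxel volume here.
static Vec4 sampleVolume(const Vec3& /*worldPos*/, float /*mipLevel*/) {
    return Vec4{0.1f, 0.1f, 0.1f, 0.05f};
}

// Minimal cone-march sketch. 'aperture' is tan of the cone half-angle; the
// step grows with the cone radius, which is why wide cones finish in far
// fewer iterations than near-zero "ray" cones.
Vec4 traceCone(Vec3 origin, Vec3 dir, float aperture,
               float voxelSize, float maxDistance)
{
    Vec4 result = {0.0f, 0.0f, 0.0f, 0.0f};
    float dist = voxelSize;                            // offset to avoid self-sampling
    while (dist < maxDistance && result.a < 1.0f) {
        float diameter = std::max(voxelSize, 2.0f * aperture * dist);
        float mip = std::log2(diameter / voxelSize);   // coarser mips for wider footprints
        Vec3 pos{origin.x + dir.x * dist,
                 origin.y + dir.y * dist,
                 origin.z + dir.z * dist};
        Vec4 s = sampleVolume(pos, mip);
        float w = 1.0f - result.a;                     // front-to-back compositing
        result.r += w * s.r;
        result.g += w * s.g;
        result.b += w * s.b;
        result.a += w * s.a;
        dist += diameter * 0.5f;                       // step proportional to cone size
    }
    return result;
}
```

The key difference versus rays is that last line: with a near-zero aperture the step degenerates to tiny, almost fixed increments, which is where the time in Fig. 5 goes.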

2 hours ago, evelyn4you said:

Did you test the tradeoff of both variations? E.g. 4 cones vs. 5, 6, 7 - quality improvement vs. render time - and the same for grid size?

Here are some results, at the highest resolution (512^3), with no MSAA.

Fig. 6 - 1 cone (image: RR9IwX3.png)

Fig. 7 - 5 cones (image: DOXY4KQ.png)

Fig. 8 - 9 cones (image: wwuol4t.png)

Note, you're now interested in the GlobalIllumination entry in the profiler, which shows how much time was spent in the actual cone tracing. The cone angles were adjusted so that the set covers the hemisphere. Which brings me to...

2 hours ago, evelyn4you said:

This means that the smaller angles will result in more steps to trace, because the geometrical step size is smaller. Is this understanding OK?
How do the timings increase in a practical scenario?

The lighting result here won't make much sense - I intentionally used 9 cones (to keep the same amount) but changed the angle, so I could demonstrate how the angle impacts performance. You're again interested in GlobalIllumination in the profiler:

Fig. 9 - 9 cones, angle factor 0.3 (image: l39ENMg.png)

Fig. 10 - 9 cones, angle factor 0.6 (image: nd7qAd1.png)

Fig. 11 - 9 cones, angle factor 0.9 (image: m1gZXo6.png)

Now here is something important - the angle factor, for me, is the tangent of half the apex angle (i.e. the ratio between the radius and the height of the cone). You can clearly see that the higher the angle factor, the higher the performance (as fewer steps of the cone tracing loop are performed).
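
You can even estimate the step count: with the step equal to half the cone diameter (i.e. factor * distance, under the tangent interpretation above), the sample distance grows geometrically, d_{k+1} = d_k * (1 + factor), so the number of iterations is roughly log(maxDist / startDist) / log(1 + factor). A tiny sketch with illustrative distances:

```cpp
#include <cmath>
#include <cstdio>

// With the step proportional to the current cone diameter, the sample
// distance grows geometrically, so the iteration count is logarithmic in
// the traced range - with the log base set by the angle factor.
int main() {
    const double startDist = 0.01, maxDist = 10.0;   // illustrative range
    for (double factor : {0.3, 0.6, 0.9}) {
        double steps = std::log(maxDist / startDist) / std::log1p(factor);
        std::printf("angle factor %.1f -> ~%.0f steps\n", factor, steps);
    }
    return 0;
}
```

Over three decades of distance this gives roughly 26, 15, and 11 steps for factors 0.3, 0.6, and 0.9 respectively.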

2 hours ago, evelyn4you said:

do you sample every single point in screen space, or do you sample only every n-th pixel, which decreases render time by a factor of 4

So, this will be a bit complex. I don't render at half resolution - I always render at full resolution - but I do have MSAA support in my deferred pipeline. Let me show you an example:

Fig. 12 - 1x MSAA cone tracing, 4x MSAA rendering (image: RTgOc2g.png)

Fig. 13 - 4x MSAA cone tracing, 4x MSAA rendering (image: FtFSKbC.png)

You will probably need to take those images and zoom in on them (and subtract them in GIMP or another editor of your choice). There is a small difference on a few edges (e.g. where the object in the middle intersects the floor). When tracing at a lower sample count than you shade with, there are multiple ways to pick/upsample the GI (a sketch of the first approach follows the list):

  • Using a simple filter that looks at X neighboring pixels, finds the most compatible depth/normal and selects the corresponding GI sample - a good way to go.
  • Bilateral upsampling
  • etc.
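
The first option could look like this - a simplified sketch with illustrative names, not my actual filter:

```cpp
#include <cmath>

struct Vec3f { float x, y, z; };
struct GISample { Vec3f irradiance; float depth; Vec3f normal; };

// When shading a full-resolution (or per-MSAA-sample) pixel, pick the
// low-resolution GI sample whose depth and normal best match, instead of
// blindly interpolating across geometric edges.
Vec3f upsampleGI(const GISample* coarse, int count,   // candidate neighbors
                 float pixelDepth, Vec3f pixelNormal)
{
    int best = 0;
    float bestScore = -1e30f;
    for (int i = 0; i < count; ++i) {
        float dz = std::fabs(coarse[i].depth - pixelDepth);   // depth compatibility
        float dn = coarse[i].normal.x * pixelNormal.x +
                   coarse[i].normal.y * pixelNormal.y +
                   coarse[i].normal.z * pixelNormal.z;        // normal similarity
        float score = dn - 10.0f * dz;    // the depth weight is a tuning knob
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return coarse[best].irradiance;
}
```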

Notes

  • I tried to simulate a real-world scenario (with brute-force recomputed voxels); it uses 1 point light and 1 spotlight, both with PCF-filtered shadow maps (in the first 4 samples I used PCSS for the spotlight). Shadows are computed dynamically (I have a virtual shadow maps implementation).
  • All the surfaces are PBR-based materials; the reflective surface uses the same material as all the others visible.
  • Some objects have alpha masks; transparency is handled using sample-alpha-to-coverage.
  • There is some post-processing (filmic tone mapping and bloom).
  • Rendering is done using a deferred renderer (in some cases with 4x MSAA enabled; the buffer resolve is done AFTER the tone mapping step, at the end of the pipeline).
  • The renderer is custom, Direct3D 12 based; the whole engine runs on Windows 10.
  • Hardware, as mentioned before: Ryzen 7 1700, 16 GB of DDR4 RAM and a Radeon RX 480 - if you want the full exact specs, I can provide them.
  • There is some additional overhead due to this being an editor and not a runtime (the ImGui!), which is why I intentionally showed the profiler - it measures the time the GPU spent on each specific part of the application.

I'm quite sure I forgot some details!

EDIT: And yes, the 1st thing I forgot, probably one of the most important: the actual reflection (specular contribution) is calculated in a completely separate pass, named Reflection in the profiler. It always uses just a single additional cone per pixel (and yes, all objects do calculate it!).

This shows the reflection buffer (image: DYprjKA.png).
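
In sketch form, that specular pass is just one more cone per pixel along the mirror reflection direction. A simplified illustration, reusing Vec3/Vec4 and traceCone() from the earlier sketch (the roughness-to-aperture mapping here is a guess for the example, not necessarily what the engine does):

```cpp
#include <algorithm>
#include <cmath>

// Mirror reflection of the view direction around the surface normal.
Vec3 reflectDir(Vec3 i, Vec3 n) {
    float d = 2.0f * (i.x * n.x + i.y * n.y + i.z * n.z);
    return Vec3{i.x - d * n.x, i.y - d * n.y, i.z - d * n.z};
}

// One specular cone per pixel: narrow for smooth surfaces, wider for
// rough ones.
Vec4 traceSpecular(Vec3 worldPos, Vec3 viewDir, Vec3 normal,
                   float roughness, float voxelSize, float maxDistance)
{
    Vec3 r = reflectDir(viewDir, normal);
    float aperture = std::max(0.01f, roughness * roughness);   // illustrative mapping
    return traceCone(worldPos, r, aperture, voxelSize, maxDistance);
}
```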

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com
