
implement and understand voxel cone tracing


Images time!

2 hours ago, evelyn4you said:

when tracing a higher resolution grid/3D mipmap the results will also be better, but the memory is increased by a factor of 8, which is very much

Yes. Although the quality impact will be huge (I intentionally left the reflective sphere in view). Samples have 4x MSAA for deferred shading and 1x MSAA for cone tracing.
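
To put numbers on that factor of 8 - a back-of-the-envelope sketch, assuming a plain RGBA16F volume (8 bytes per voxel) with a full mip chain; the exact numbers depend on the voxel format used (anisotropic voxels would multiply this further):

```cpp
#include <cstdio>
#include <cstdint>

// Rough memory estimate for a dense voxel volume with a full mip chain.
// Assumes one RGBA16F texel (8 bytes) per voxel; actual engine formats vary.
int main() {
    const double bytesPerVoxel = 8.0;                     // RGBA16F = 4 channels * 2 bytes
    for (uint32_t n = 64; n <= 512; n *= 2) {
        double base = double(n) * n * n * bytesPerVoxel;
        double withMips = base * 8.0 / 7.0;               // mip pyramid: 1 + 1/8 + 1/64 + ... = 8/7
        std::printf("%4u^3: %8.1f MiB base, %8.1f MiB with mips\n",
                    n, base / (1024.0 * 1024.0), withMips / (1024.0 * 1024.0));
    }
    return 0;
}
```

Each doubling of resolution multiplies the footprint by 8, so 512^3 already lands around 1 GiB under these assumptions.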

Fig. 1 - 64^3 volume (image: eBk5SPQ.png)

Fig. 2 - 128^3 volume (image: zH82K24.png)

Fig. 3 - 256^3 volume (image: RTgOc2g.png)

Fig. 4 - 512^3 volume (image: QguX96d.png)

Note, see how much the Voxelization phase took. I intentionally re-voxelized the whole scene (no progressive static/dynamic optimization is turned on). The computation is done on a Radeon RX 480 (with a Ryzen 7 1700 and 16 GB of DDR4 @ 3 GHz, if I'm not mistaken).
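
For reference, the progressive static/dynamic optimization mentioned above can be sketched roughly like this - a simplified CPU-side illustration, not my actual implementation (names like revoxelizeFrame are just for the example):

```cpp
#include <vector>

// Hypothetical sketch of a static/dynamic split: static geometry is
// voxelized once into its own volume; per frame, only the regions
// ("bricks") overlapped by dynamic objects are cleared and re-voxelized.
constexpr int kBricksPerAxis = 16;   // volume split into 16^3 bricks

struct Brick { bool dirty = false; };

struct VoxelVolume {
    std::vector<Brick> bricks =
        std::vector<Brick>(kBricksPerAxis * kBricksPerAxis * kBricksPerAxis);

    void markDirty(int bx, int by, int bz) {
        bricks[(bz * kBricksPerAxis + by) * kBricksPerAxis + bx].dirty = true;
    }
};

void revoxelizeFrame(VoxelVolume& dynamicVolume) {
    // 1. Mark bricks overlapped by each dynamic object's AABB (omitted here).
    // 2. Clear and re-voxelize only the dirty bricks, then reset the flags.
    for (Brick& b : dynamicVolume.bricks) {
        if (!b.dirty) continue;
        // clearBrick(...); voxelizeGeometryIntoBrick(...);  // GPU passes in practice
        b.dirty = false;
    }
    // The static volume is voxelized once and never touched here; tracing
    // samples both volumes and composites them.
}
```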

2 hours ago, evelyn4you said:

when tracing smaller cone angles the result will be better, but the tracing time I expect will grow MORE than linearly

The next image shows, for comparison, cone tracing with 8 random cones whose angle is close to 0 (i.e. they're technically rays).

Fig. 5 - Cones with a small angle (image: iHTiZtQ.png)

This is pretty unacceptable for realtime rendering, and I assume most GPU path tracers could beat these times for GI of similar quality. As you noted, the time grows a lot. The only way to trace these rays efficiently is to use a sparse voxel octree (the octree can serve as an acceleration structure for actual ray casting - yet the traversal, even for rays, is quite complex, and I haven't figured out an optimal way to perform cone tracing in an octree, aside from sampling and stepping based on the maximum reached octree level).
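
For clarity, here is roughly what the basic cone march over the mipmapped volume looks like - a simplified CPU-side sketch, not my actual shader code; sampleVolume stands in for the 3D texture fetch:

```cpp
#include <cmath>
#include <algorithm>

struct Vec3 { float x, y, z; };
struct Vec4 { float r, g, b, a; };   // rgb = pre-filtered radiance, a = occlusion

// Stand-in for a quadrilinear fetch from the mipmapped 3D texture; a real
// implementation samples the voxel volume here.
static Vec4 sampleVolume(const Vec3& /*worldPos*/, float /*mipLevel*/) {
    return Vec4{0.1f, 0.1f, 0.1f, 0.05f};
}

// Minimal cone-march sketch. 'aperture' is tan of the cone half-angle; the
// step grows with the cone radius, which is why wide cones finish in far
// fewer iterations than near-zero "ray" cones.
Vec4 traceCone(Vec3 origin, Vec3 dir, float aperture,
               float voxelSize, float maxDistance)
{
    Vec4 result = {0.0f, 0.0f, 0.0f, 0.0f};
    float dist = voxelSize;                            // offset to avoid self-sampling
    while (dist < maxDistance && result.a < 1.0f) {
        float diameter = std::max(voxelSize, 2.0f * aperture * dist);
        float mip = std::log2(diameter / voxelSize);   // coarser mips for wider footprints
        Vec3 pos{origin.x + dir.x * dist,
                 origin.y + dir.y * dist,
                 origin.z + dir.z * dist};
        Vec4 s = sampleVolume(pos, mip);
        float w = 1.0f - result.a;                     // front-to-back compositing
        result.r += w * s.r;
        result.g += w * s.g;
        result.b += w * s.b;
        result.a += w * s.a;
        dist += diameter * 0.5f;                       // step proportional to cone size
    }
    return result;
}
```

The key difference versus rays is that last line: with a near-zero aperture the step degenerates to tiny, almost fixed increments, which is where the time in Fig. 5 goes.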

2 hours ago, evelyn4you said:

Did you test the tradeoff of both variations? E.g. 4 cones vs. 5, 6, 7 - quality improvement vs. render time - and the same for grid size?

Here are some results, at the highest resolution (512^3), with no MSAA.

Fig. 6 - 1 cone (image: RR9IwX3.png)

Fig. 7 - 5 cones (image: DOXY4KQ.png)

Fig. 8 - 9 cones (image: wwuol4t.png)

Note, you're now interested in the GlobalIllumination entry in the profiler, which shows how much time was spent in the actual cone tracing. The cone angles were adjusted so that the set covers the hemisphere. Which brings me to...

2 hours ago, evelyn4you said:

This means that the smaller angles will result in more steps to trace, because the geometrical step size is smaller. Is this understanding OK?
How do the timings increase in a practical scenario?

The lighting result here won't make much sense - I intentionally used 9 cones (to keep the same amount) but changed the angle, so I could demonstrate how the angle impacts performance. You're again interested in GlobalIllumination in the profiler:

Fig. 9 - 9 cones, angle factor 0.3 (image: l39ENMg.png)

Fig. 10 - 9 cones, angle factor 0.6 (image: nd7qAd1.png)

Fig. 11 - 9 cones, angle factor 0.9 (image: m1gZXo6.png)

Now here is something important - the angle factor, for me, is the tangent of half the apex angle (i.e. the ratio between the radius and the height of the cone). You can clearly see that the higher the angle factor, the higher the performance (as fewer steps of the cone tracing loop are performed).
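
You can even estimate the step count: with the step equal to half the cone diameter (i.e. factor * distance, under the tangent interpretation above), the sample distance grows geometrically, d_{k+1} = d_k * (1 + factor), so the number of iterations is roughly log(maxDist / startDist) / log(1 + factor). A tiny sketch with illustrative distances:

```cpp
#include <cmath>
#include <cstdio>

// With the step proportional to the current cone diameter, the sample
// distance grows geometrically, so the iteration count is logarithmic in
// the traced range - with the log base set by the angle factor.
int main() {
    const double startDist = 0.01, maxDist = 10.0;   // illustrative range
    for (double factor : {0.3, 0.6, 0.9}) {
        double steps = std::log(maxDist / startDist) / std::log1p(factor);
        std::printf("angle factor %.1f -> ~%.0f steps\n", factor, steps);
    }
    return 0;
}
```

Over three decades of distance this gives roughly 26, 15, and 11 steps for factors 0.3, 0.6, and 0.9 respectively.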

2 hours ago, evelyn4you said:

do you sample every single point in screen space, or do you sample only every n-th pixel, which decreases render time by a factor of 4

So, this will be a bit complex. I don't render at half resolution - I always render at full resolution - but I do have MSAA support in my deferred pipeline. Let me show you an example:

Fig. 12 - 1x MSAA cone tracing, 4x MSAA rendering (image: RTgOc2g.png)

Fig. 13 - 4x MSAA cone tracing, 4x MSAA rendering (image: FtFSKbC.png)

You will probably need to take those images and zoom in on them (and subtract them in GIMP or another editor of your choice). There is a small difference on a few edges (e.g. where the object in the middle intersects the floor). When tracing at a lower sample count than you shade with, there are multiple ways to pick/upsample the GI (a sketch of the first approach follows the list):

  • Using a simple filter that looks at X neighboring pixels, finds the most compatible depth/normal and selects the corresponding GI sample - a good way to go.
  • Bilateral upsampling
  • etc.
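
The first option could look like this - a simplified sketch with illustrative names, not my actual filter:

```cpp
#include <cmath>

struct Vec3f { float x, y, z; };
struct GISample { Vec3f irradiance; float depth; Vec3f normal; };

// When shading a full-resolution (or per-MSAA-sample) pixel, pick the
// low-resolution GI sample whose depth and normal best match, instead of
// blindly interpolating across geometric edges.
Vec3f upsampleGI(const GISample* coarse, int count,   // candidate neighbors
                 float pixelDepth, Vec3f pixelNormal)
{
    int best = 0;
    float bestScore = -1e30f;
    for (int i = 0; i < count; ++i) {
        float dz = std::fabs(coarse[i].depth - pixelDepth);   // depth compatibility
        float dn = coarse[i].normal.x * pixelNormal.x +
                   coarse[i].normal.y * pixelNormal.y +
                   coarse[i].normal.z * pixelNormal.z;        // normal similarity
        float score = dn - 10.0f * dz;    // the depth weight is a tuning knob
        if (score > bestScore) { bestScore = score; best = i; }
    }
    return coarse[best].irradiance;
}
```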

Notes

  • I tried to simulate a real-world scenario (with brute-force recomputed voxels); it uses 1 point light and 1 spotlight, both with PCF-filtered shadow maps (in the first 4 samples I used PCSS for the spotlight). Shadows are computed dynamically (I have a virtual shadow maps implementation).
  • All the surfaces are PBR-based materials; the reflective surface uses the same material as all the others visible.
  • Some objects have alpha masks; transparency is handled using sample-alpha-to-coverage.
  • There is some post-processing (filmic tone mapping and bloom).
  • Rendering is done using a deferred renderer (in some cases with 4x MSAA enabled; the buffer resolve is done AFTER the tone mapping step, at the end of the pipeline).
  • The renderer is custom, Direct3D 12 based; the whole engine runs on Windows 10.
  • Hardware, as mentioned before: Ryzen 7 1700, 16 GB of DDR4 RAM and a Radeon RX 480 - if you want the full exact specs, I can provide them.
  • There is some additional overhead due to this being an editor and not a runtime (the ImGui!), which is why I intentionally showed the profiler - it measures the time the GPU spent on each specific part of the application.

I'm quite sure I forgot some details!

EDIT: And yes, the 1st thing I forgot, probably one of the most important: the actual reflection (specular contribution) is calculated in a completely separate pass, named Reflection in the profiler. It always uses just a single additional cone per pixel (and yes, all objects do calculate it!).

This shows the reflection buffer (image: DYprjKA.png).
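
In sketch form, that specular pass is just one more cone per pixel along the mirror reflection direction. A simplified illustration, reusing Vec3/Vec4 and traceCone() from the earlier sketch (the roughness-to-aperture mapping here is a guess for the example, not necessarily what the engine does):

```cpp
#include <algorithm>
#include <cmath>

// Mirror reflection of the view direction around the surface normal.
Vec3 reflectDir(Vec3 i, Vec3 n) {
    float d = 2.0f * (i.x * n.x + i.y * n.y + i.z * n.z);
    return Vec3{i.x - d * n.x, i.y - d * n.y, i.z - d * n.z};
}

// One specular cone per pixel: narrow for smooth surfaces, wider for
// rough ones.
Vec4 traceSpecular(Vec3 worldPos, Vec3 viewDir, Vec3 normal,
                   float roughness, float voxelSize, float maxDistance)
{
    Vec3 r = reflectDir(viewDir, normal);
    float aperture = std::max(0.01f, roughness * roughness);   // illustrative mapping
    return traceCone(worldPos, r, aperture, voxelSize, maxDistance);
}
```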

My current blog on programming, linux and stuff - http://gameprogrammerdiary.blogspot.com
