
Sampling shadow maps

Started by October 30, 2024 09:43 AM
18 comments, last by NubDevice 9 hours, 55 minutes ago

I have worked on area light shadows approximation using shadow maps, but the one method showing good quality seems pretty expensive.

For a lookup i need a 4x4 kernel of SM depth values in GPU registers for a higher order filter. (Just depth, no additional squares or moments.)

I read there is some form of HW acceleration for PCF, and 16 samples isn't uncommon for PCF, so maybe i can profit from that too.
But how can we enable this acceleration? Looking at PCF shaders, they just loop to gather the samples one after another, so i guess it's up to compilers to figure out that acceleration can be used. Is there a catch here?

I also know there is a texture lookup function to get all 4 texels of a bilinear lookup.
Maybe this is faster than getting 4 unfiltered samples sequentially as with PCF, and i should use that instead?

And finally, can somebody tell how practical it is these days to use advanced soft shadow mapping techniques?
Remembering related papers from the 2010s, they all were very expensive back then, using just a single light.
On the other hand i see ‘raytraced’ shadow maps in UE5, which is probably more expensive than my stuff.
Is this something one can use for multiple lights, or is it more meant for cutscenes or just a single light?
(I'll need to sample 3 mip levels, so that's 16*3 texture fetches in total for a single lookup.)
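To put a number on the fetch count mentioned above, here is a minimal CPU-side Python sketch of how a 4x4 depth footprint could be assembled from four gather operations instead of 16 single fetches. `gather4` is a hypothetical stand-in that mimics the component order of GLSL `textureGather`; the texture contents are made up, and whether gathers are actually faster on a given GPU is an assumption that would need profiling.

```python
def gather4(tex, i0, j0):
    """Mimic GLSL textureGather for the 2x2 quad with lower-left texel (i0, j0).

    Returns the texels in the order (i0,j1), (i1,j1), (i1,j0), (i0,j0).
    """
    h, w = len(tex), len(tex[0])
    i1, j1 = min(i0 + 1, w - 1), min(j0 + 1, h - 1)  # clamp-to-edge
    return (tex[j1][i0], tex[j1][i1], tex[j0][i1], tex[j0][i0])


def fetch_4x4(tex, x, y):
    """Collect the 4x4 block whose lower-left texel is (x, y) using 4 gathers."""
    block = [[0.0] * 4 for _ in range(4)]
    gathers = 0
    for qy in (0, 2):
        for qx in (0, 2):
            t01, t11, t10, t00 = gather4(tex, x + qx, y + qy)
            gathers += 1
            block[qy][qx], block[qy][qx + 1] = t00, t10
            block[qy + 1][qx], block[qy + 1][qx + 1] = t01, t11
    return block, gathers


# Made-up 8x8 depth texture where each texel stores its own linear index.
depth = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
block, gathers = fetch_4x4(depth, 2, 3)
```

With three mip levels this would be 12 gather instructions rather than 48 single fetches, although the amount of data landing in registers stays the same.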

I read there is some form of HW acceleration for PCF, and 16 samples isn't uncommon for PCF, so maybe i can profit from that too.
But how can we enable this acceleration? Looking at PCF shaders, they just loop to gather the samples one after another, so i guess it's up to compilers to figure out that acceleration can be used. Is there a catch here?

In OpenGL the shader code sets the sampler/texture type as sampler2DShadow. If on the CPU you bind the texture as LINEAR filtering, then it knows it's doing a linear (2x2) depth compare. I don't think there is any way to set the linear sampling radius via hardware, which is why they are doing a loop in a shader. I never messed with soft shadow techniques that much. There is something called Summed Area Tables which I believe helps. You can also downsample your shadow map texture (needs a custom downsampling, not just averaging). You can sample multiple levels of your shadow map and determine how much coverage a pixel is in by how many times it passes or fails.

There is one thing I did a long time ago called screen space soft shadow. You output a white frame buffer, if a pixel fails a shadow map check, it goes black. Afterwards you blur the black and white image in screen space or you can downsample the image as well, and then combine it back with your standard color output buffer. You have to take depth into consideration so you don't blur a black shadow pixel into the skybox. But it produces decent results.
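A minimal CPU-side Python sketch of the depth-aware blur described above, assuming a box filter and a simple depth-difference threshold. The function name, the `max_depth_diff` parameter and the example values are all made up for illustration; a real implementation would run as a screen-space shader pass.

```python
def blur_shadow_mask(mask, depth, radius=1, max_depth_diff=0.1):
    """Box-blur a shadow mask (1.0 = lit, 0.0 = shadowed).

    Neighbours whose depth differs too much from the centre pixel are
    skipped, so shadow does not bleed onto the skybox or distant surfaces.
    """
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    sx, sy = x + dx, y + dy
                    if not (0 <= sx < w and 0 <= sy < h):
                        continue
                    if abs(depth[sy][sx] - depth[y][x]) > max_depth_diff:
                        continue
                    total += mask[sy][sx]
                    count += 1
            out[y][x] = total / count  # count >= 1, the centre always passes
    return out


# One scanline: two shadowed pixels, one lit pixel, then a far-away skybox pixel.
mask = [[0.0, 0.0, 1.0, 1.0]]
depth = [[1.0, 1.0, 1.0, 50.0]]
soft = blur_shadow_mask(mask, depth)
```

In this example the skybox pixel keeps its value of 1.0 because its only neighbour fails the depth test, while the pixels on the surface get a soft edge.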

NBA2K, Madden, Maneater, Killing Floor, Sims


I have to ask: why not consider path tracing, where your light can be any shape?

dpadam450 said:
In OpenGL the shader code sets the sampler/texture type as sampler2DShadow. If on the CPU you bind the texture as LINEAR filtering, then it knows it's doing a linear (2x2) depth compare. I don't think there is any way to set the linear sampling radius via hardware, which is why they are doing a loop in a shader.

Ah, i see. So the HW does 4 samples and returns the percentage which passed the given depth threshold.
That's useful for PCF approaches, as you know where those samples are taken, and thus we can implement the concept of a sampling radius with weighting.
I can't use it because i do prefiltering instead of point sampling, but i see it's not really a big optimization i'm missing out on.
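For reference, a minimal Python emulation of what such a hardware lookup is commonly understood to do: compare the reference depth against each of the four texels first, then bilinearly blend the pass/fail results. The function name and the "less or equal" compare are assumptions for this sketch, and real hardware may weight or quantize the result differently.

```python
import math


def hw_pcf_2x2(shadow_map, u, v, ref_depth):
    """Emulate a linear-filtered depth-compare lookup.

    u, v are in texel units with texel centres at integer + 0.5.
    Returns the bilinearly weighted fraction of texels that pass the test.
    """
    h, w = len(shadow_map), len(shadow_map[0])
    x, y = u - 0.5, v - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def lit(ix, iy):
        ix = min(max(ix, 0), w - 1)  # clamp-to-edge addressing
        iy = min(max(iy, 0), h - 1)
        return 1.0 if ref_depth <= shadow_map[iy][ix] else 0.0

    row0 = lit(x0, y0) * (1 - fx) + lit(x0 + 1, y0) * fx
    row1 = lit(x0, y0 + 1) * (1 - fx) + lit(x0 + 1, y0 + 1) * fx
    return row0 * (1 - fy) + row1 * fy


# Made-up 2x2 shadow map: one texel holds a near occluder at depth 0.2.
shadow_map = [[0.2, 0.9],
              [0.9, 0.9]]
between = hw_pcf_2x2(shadow_map, 1.0, 1.0, 0.5)    # exactly between all 4 texels
on_texel = hw_pcf_2x2(shadow_map, 0.5, 0.5, 0.5)   # centre of the occluded texel
```

The important point is that the comparison happens before the filtering, which is why the result is a visibility fraction and not a blurred depth.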

dpadam450 said:
There is one thing I did a long time ago called screen space soft shadow. You output a white frame buffer, if a pixel fails a shadow map check, it goes black. Afterwards you blur the black and white image in screen space or you can downsample the image as well, and then combine it back with your standard color output buffer.

I was considering this. I need to inject the direct lighting into a surfel hierarchy for GI, and i can do spatial blurs over the surfels.
But the problem is: to support multiple lights, each causing a different penumbra at a shading point, we would need a unique blur for each light, so it scales badly.

taby said:
I have to ask: why not consider path tracing, where your light can be any shape?

Actually i was thinking i would never have to work on shadow maps. At some point RT scales better, because it has only one acceleration structure for the whole scene, not one shadow map per light.

But sadly, 6 years later RT is still stuck at static topology and the acceleration structure is still black boxed. It's not flexible enough to be generally useful and remains a gimmick until they fix it.

I could use my own BVH of surfels for raytracing, but it should be much faster to make shadow maps with a compute rasterizer.
Quality looks promising:


(Only the floor and wall show area light shadows; the model uses PCF with disabled bias settings. So far i'm only doing research on the CPU.)

But the problem is light leaks. I have the same kind of leaks as variance shadow maps.
Leaks are not just an artifact but also a feature. They are needed so light can leak past an edge, which is what causes the penumbra.
Fixing the leaks is possible, but it always means losing the half of the filter region which is below the occluder, in all mips. If we make soft shadows from that, it causes over-occlusion.
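For readers who haven't seen this kind of leak, it can be reproduced in a few lines with plain variance shadow maps. This is a minimal Python sketch of the standard Chebyshev upper bound used by VSM, not the filter described in this post; the depth values and the `min_variance` clamp are made up for illustration.

```python
def vsm_visibility(depths, receiver_depth, min_variance=1e-6):
    """Chebyshev upper bound on the fraction of the filter region that is lit.

    depths: shadow-map depths inside the filter region (the prefiltered
    moments E[d] and E[d^2] are computed from them here).
    """
    n = len(depths)
    m1 = sum(depths) / n                 # first moment  E[d]
    m2 = sum(d * d for d in depths) / n  # second moment E[d^2]
    if receiver_depth <= m1:
        return 1.0                       # receiver in front of the mean: lit
    variance = max(m2 - m1 * m1, min_variance)
    diff = receiver_depth - m1
    return variance / (variance + diff * diff)


# Two overlapping occluders at depth 1 and 5; the receiver at depth 10 is
# behind both and should be fully shadowed.
leak = vsm_visibility([1.0, 1.0, 5.0, 5.0], 10.0)
# A single flat occluder has (almost) no variance, so there is no leak.
no_leak = vsm_visibility([1.0, 1.0, 1.0, 1.0], 10.0)
```

Here the depth variance between the two occluders makes the bound return 4/53, roughly 7.5% light, for a point that is completely occluded, while the flat occluder gives essentially zero.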

I should just accept the leaks, but can't stop working desperately on a true fix which so far nobody has found… : )

Same scene using variance shadow maps:

Looks good too, and this would need only 3 bilinear lookups, so that's really cheap.
Compare this to averaging 30 rays per pixel ; )
But ofc. SMs always miss contact hardening details, and further details in the penumbra region.
Pixar solved this by using multiple depth fragments per texel, similar to OIT fragment lists. They matched quality of RT in practice.
Stochastic shadow maps would be another option.

But i don't want to dig into such things. Seems a waste of time if we already have RT HW acceleration.

@JoeJ That looks really cool. I'd love to see the end result.

I'm fairly certain almost every game uses some variation of cascaded shadow maps. I think Unreal Engine does this by default for global shadows. If you are looking at solutions with many lights casting shadows with various properties, then I'd still probably use shadow maps. You can use different ways to cull which pixel needs to look up which shadow maps, rather than sampling 100 textures per pixel.

I guess you just have to state what you are looking to do with your game/scenes, how complex they are, etc.



dpadam450 said:
I guess you just have to state what you are looking to do with your game/scenes, how complex they are, etc.

Ideally my stuff ends up flexible enough so it works for any kind of game.
But such goals are not really my problem. It's that i lack experience with shadow maps.
So i don't know related costs, and i also don't know how common and practical advanced soft shadows currently are.

But i'll see once i can port the code to GPU… : )

This might help. I remember about 5 years ago someone mentioning summed area tables (there is a header section at the bottom). In case you don't know them: because each texel stores the sum of everything above and to the left of it, you can compute the average of any rectangle in constant time. For instance, if you want the average of the texel data around pixel 140,140, you combine the table values at the four corners (180,180), (100,180), (180,100) and (100,100) and divide by the area, which gives you the average of an 80x80 region without actually taking 6400 samples.
https://developer.nvidia.com/gpugems/gpugems3/part-ii-light-and-shadows/chapter-8-summed-area-variance-shadow-maps
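A minimal Python sketch of the summed area table lookup, assuming an inclusive prefix sum stored with one extra zero row and column so the box query needs no special cases at the border. The function names and the tiny test image are made up for illustration; a rectangular average needs the table values at all four corners of the box.

```python
def build_sat(img):
    """Build a summed area table with an extra zero row and column.

    sat[y][x] holds the sum of all img texels with row < y and column < x.
    """
    h, w = len(img), len(img[0])
    sat = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0
        for x in range(w):
            row_sum += img[y][x]
            sat[y + 1][x + 1] = sat[y][x + 1] + row_sum
    return sat


def box_average(sat, x0, y0, x1, y1):
    """Average over texels x0 <= x < x1, y0 <= y < y1 using four lookups."""
    total = sat[y1][x1] - sat[y0][x1] - sat[y1][x0] + sat[y0][x0]
    return total / ((x1 - x0) * (y1 - y0))


# Made-up 4x4 image where each texel stores its own linear index.
img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
sat = build_sat(img)
avg = box_average(sat, 1, 1, 3, 3)   # texels 5, 6, 9, 10
```

The cost of `box_average` is the same four lookups no matter how large the rectangle is, which is the whole appeal of the technique.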


dpadam450 said:
This might help. I remember about 5 years ago someone mentioning summed area tables

I know about SAT, but i have never used it so far. It's clever and elegant, but here's why i think applications are rare:

Using SAT we can get the average of a queried, rectangular region of an image, using only 4 fetches.
But we rarely want a rectangular region. We either want a circular region, or an oriented ellipsoid for anisotropy.
We can't do that with one SAT lookup. But we can approximate the ellipsoid using multiple rectangles:

(image from the Exponential Soft Shadow Mapping paper using SAT - that's lots of samples)

So far so good. What's the alternative? Mip Maps ofc. Where we can do the same thing with the same number of samples.
But mips have some advantages over SAT:
Lookup is just one fetch, not 4 in each corner of the rectangle, spread over a large area and thus not cache efficient.
No need to calculate a prefix sum over a large image, so no precision problems from the numbers at the end of the sum becoming really big.
And assuming we still need to keep the original data as well, mips need less memory than SAT. (+33% vs +100%)
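The memory figures above can be checked with a quick Python sketch, assuming a square power-of-two texture and a SAT stored at the same resolution and bit depth as the original. The function name is made up; in practice a SAT may need a wider format to avoid the precision issue mentioned above, which would push its cost higher.

```python
def mip_chain_texels(size):
    """Total texel count of a square power-of-two texture including all mips."""
    total = 0
    while size >= 1:
        total += size * size
        size //= 2
    return total


base = 1024 * 1024
# Extra memory for the mip chain, relative to the base level alone.
mip_overhead = (mip_chain_texels(1024) - base) / base
# Extra memory for a SAT kept next to the original: one full-resolution copy.
sat_overhead = base / base
```

The mip overhead converges to 1/4 + 1/16 + ... = 1/3, so roughly +33%, against +100% for the SAT.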

Chances are we'll probably both die without having ever used SAT. :D

Killing Floor 2 (Unreal Engine 3) was using an SAT for something. I don't recall if it was screen space reflections or something else, or if this was built into Unreal Engine or our code. It's useful for many things. It may even have been some kind of screen space motion blur or something where you actually want screen space rectangular sampling.

