Clustered shading thoughts

Started by April 14, 2020 10:06 AM
17 comments, last by J. Rakocevic 4 years, 7 months ago

I spent some time looking at alternatives to deal with light culling and came upon clustered shading. Now there are a few resources on how it's done online, but implementations differ a little bit every time.

One thing that warrants attention is a talk Ola Olsson gave in Brisbane, which you can find here: https://www.youtube.com/watch?v=uEtI7JRBVXk

In there he said that many implementations (https://pastebin.com/FDqGvpGE) create the per-cluster bounds for all clusters, which wastes time. I thought about it some and came up with something that might or might not be a good idea; I'd like to throw it out there and see what you think.

What I usually see when reading articles online is small frustums being created per cluster, either as planes or approximated by a conservative box. These are stored in memory and culled against on a per-cluster basis. Even the conservative box approximation needs 2 × 3 floats, which probably ends up being 2 × float4 for alignment (not sure if this matters in structured buffers, but it does in constant buffers at least).

Consider the following.

1. Store three arrays of planes (or pack them into one with some offsets if that's better; I really can't tell). Each array holds the planes that slice the original frustum along one axis, including the bounding planes of the frustum itself. Not too much memory, since a plane is a float4 and there are three linear arrays, one per axis.

2. When creating clusters, assign nothing but three numbers to each cluster. Each number is a plane index for the front, right and top plane (or the back, left and bottom plane; either way the other three can be inferred with ±1). These can be 8-bit unsigned integers, as it can be assumed that one won't need more than 256 planes per axis. This would make a cluster use only 32 bits, with 8 bits wasted but maybe useful for something else. If one wants more planes, each index can be expanded to 16 bits, which puts a cluster at 6 bytes (likely padded to 8), but that seems unnecessary.

3. When culling, instead of testing each frustum individually over and over, repeating a lot of the plane tests, go through the three arrays and for each light (I guess the same goes for other things like probes and decals, but I'm not at that stage yet) mark whether it intersects each of the planes. A bit mask can be used for this; it would need as many bits as there are planes (or one could store the indices of the two outermost planes in that axis?).

4. Then, when it comes to light-to-cluster assignment, simply check whether the planes the cluster indexes intersect the light we are interested in. If the light intersects either of the planes (or lies entirely between them), it can be considered to affect the cluster.
This would, naively speaking, use an if/else, but I guess that can be avoided; I didn't give it that much thought.
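To make the idea concrete, here is a minimal CPU sketch of steps 1–4 (all names are made up, and the plane layout is an assumption on my part); a GPU version would do the same tests in a compute shader:

```cpp
#include <cstdint>
#include <vector>
#include <array>
#include <algorithm>

// Hypothetical names; a sketch of the idea in the post, not a full implementation.
struct Plane { float nx, ny, nz, d; };        // n·p + d = 0, unit normal

struct Cluster { uint8_t x, y, z, pad; };     // index of the lower slicing plane
                                              // per axis; the opposite plane is
                                              // index + 1

// Signed distance from a view-space point to a plane.
static float signedDist(const Plane& p, float x, float y, float z) {
    return p.nx * x + p.ny * y + p.nz * z + p.d;
}

// For one axis, find the inclusive range of slabs [lo, hi] a sphere touches.
// planes[i] and planes[i+1] bound slab i, so there are planes.size()-1 slabs.
// Assumes all planes of the array face the same way along the axis.
static std::array<int, 2> slabRange(const std::vector<Plane>& planes,
                                    float cx, float cy, float cz, float r) {
    int lo = int(planes.size()) - 2, hi = 0;
    for (int i = 0; i + 1 < int(planes.size()); ++i) {
        // The sphere touches slab i if it is not entirely behind plane i
        // nor entirely in front of plane i+1.
        if (signedDist(planes[i],     cx, cy, cz) > -r &&
            signedDist(planes[i + 1], cx, cy, cz) <  r) {
            lo = std::min(lo, i);
            hi = std::max(hi, i);
        }
    }
    return { lo, hi };
}

// A cluster is affected if its slab index lies inside the range on every axis,
// so the per-cluster work collapses to three integer range checks.
static bool clusterHitsLight(const Cluster& c,
                             const std::array<int, 2>& rx,
                             const std::array<int, 2>& ry,
                             const std::array<int, 2>& rz) {
    return c.x >= rx[0] && c.x <= rx[1] &&
           c.y >= ry[0] && c.y <= ry[1] &&
           c.z >= rz[0] && c.z <= rz[1];
}
```

The point of the sketch is that the per-light plane tests run once per plane array, and the per-cluster test degenerates to index comparisons, which is where the proposed saving would come from.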

I'm just halfway implementing this technique so I might be misunderstanding something. Would like some of your thoughts on this.

I wonder why you intend to precompute and store frustum planes at all.

You could just calculate the bounding rectangle of lights/objects in screen space, from that get the range of froxel cells they intersect, and bin the lights to the z-range of intersecting cells.
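A minimal sketch of that, assuming a fixed 16×16×32 froxel grid, a conservative screen-space AABB per light and linear depth slicing (all of which are placeholder choices, not a recommendation):

```cpp
#include <vector>
#include <algorithm>

// Sketch of the implicit-grid idea: no per-froxel planes are stored anywhere;
// the cell bounds follow from the grid dimensions alone. All names are made up.
struct LightRect {
    int x0, x1, y0, y1;   // inclusive tile range on screen
    int z0, z1;           // inclusive depth-slice range
};

constexpr int kTilesX = 16, kTilesY = 16, kSlices = 32;   // 8192 froxels

// Map a screen-space AABB (in pixels) plus a view-space z range to cell indices.
LightRect lightToCells(float px0, float py0, float px1, float py1,
                       float zMin, float zMax,
                       int screenW, int screenH, float zNear, float zFar) {
    auto clampi = [](int v, int lo, int hi) { return std::max(lo, std::min(v, hi)); };
    LightRect r;
    r.x0 = clampi(int(px0 * kTilesX / screenW), 0, kTilesX - 1);
    r.x1 = clampi(int(px1 * kTilesX / screenW), 0, kTilesX - 1);
    r.y0 = clampi(int(py0 * kTilesY / screenH), 0, kTilesY - 1);
    r.y1 = clampi(int(py1 * kTilesY / screenH), 0, kTilesY - 1);
    // Linear depth slicing for simplicity; exponential slicing is common too.
    auto slice = [&](float z) {
        return clampi(int((z - zNear) / (zFar - zNear) * kSlices), 0, kSlices - 1);
    };
    r.z0 = slice(zMin);
    r.z1 = slice(zMax);
    return r;
}

// Append the light index to every cell in the range.
void binLight(std::vector<std::vector<int>>& cells, int lightIdx, const LightRect& r) {
    for (int z = r.z0; z <= r.z1; ++z)
        for (int y = r.y0; y <= r.y1; ++y)
            for (int x = r.x0; x <= r.x1; ++x)
                cells[(z * kTilesY + y) * kTilesX + x].push_back(lightIdx);
}
```

Getting a conservative screen-space AABB for a sphere is the only nontrivial input here; that projection question is exactly what the rest of the thread discusses.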

You are suggesting an implicit grid. Didn't think of that; I'll fiddle with it a bit and then hopefully get back here.

I've had a case where I did something similar with compute shaders: calculate the bounding rect of a sphere in a sphere-map projection (half-space environment map). The projected sphere becomes an ellipse, and I got the rectangle from that. The math is more complex here than when working with a planar projection, straight lines and planes.

So I tried using a lookup table to get the bounding rects instead. My sphere-map resolution is tiny, so I could pack the rect into a single 32-bit value. But in the end, doing all that math was just as fast as the single fetch from the LUT. This makes me very confident that storing and reading froxel planes is more expensive than calculating anything on the fly, in general.

My initial idea was to create view-space planes for the small frustums, transform the light spheres to view space as well, and cull there. Non-projected spheres are something I know how to cull against frustums (even the small-frustum issue is solvable with some added tricks to avoid false positives).

What you suggest seems like a faster solution (and takes less memory, since cluster bounds are implicit!). This is a big gain; however, I don't understand how to cull this way. Spheres will project to ellipses; what cones project to depends on their rotation relative to the frustum. Also, I'm not sure exactly how I would “bin” them by z. I guess it's something I should look into then.

I think i remember a paper about cones, but only found this about spheres: https://research.nvidia.com/publication/2d-polyhedral-bounds-clipped-perspective-projected-3d-sphere
Maybe getting bounding rect of base spherical cap, and extending with tip point would be good enough.

Otherwise, calculating world-space clip planes from froxels likely has similar speed and accuracy.

J. Rakocevic said:
Also I'm not sure exactly how I would “bin” them by z. I guess it's something I should look into then.

I lack experience with GPU binning if objects can occupy multiple buckets.

But maybe it's not that different…

1. For each object, calculate the intersected range, increase the atomic bucket counters, and store the range of occupied buckets plus the returned atomic count in per-object data.
2. Prefix sum over bucket counters to get offsets and space of compacted per bucket lists.
3. Per object, for each of its buckets, store it into list at bucket offset + stored object counter.

Should work. Each object can precompute the maximum number of buckets it may ever need (though it still sucks to manage its storage), and it's likely easy to find a list max size that is enough in any situation of the game.
Probably only step 2 has noticeable cost if the grid is really fine.
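In CPU terms, the three passes could look like this (plain increments stand in for the GPU atomics; all names are made up):

```cpp
#include <vector>
#include <utility>

// CPU analogue of the three passes described above. "Object" here is anything
// that occupies a contiguous, precomputed range of buckets.
struct Range { int first, last; };   // inclusive bucket range per object

// Returns (offsets, flat list) such that bucket b's objects occupy
// list[offsets[b]] .. list[offsets[b + 1] - 1].
std::pair<std::vector<int>, std::vector<int>>
binObjects(const std::vector<Range>& objects, int numBuckets) {
    // Pass 1: count how many objects land in each bucket.
    std::vector<int> counts(numBuckets, 0);
    for (const Range& r : objects)
        for (int b = r.first; b <= r.last; ++b)
            ++counts[b];                        // atomicAdd on the GPU

    // Pass 2: exclusive prefix sum over counts -> start offset of each bucket.
    std::vector<int> offsets(numBuckets + 1, 0);
    for (int b = 0; b < numBuckets; ++b)
        offsets[b + 1] = offsets[b] + counts[b];

    // Pass 3: scatter object indices into the compacted per-bucket lists.
    std::vector<int> list(offsets[numBuckets]);
    std::vector<int> cursor(offsets.begin(), offsets.end() - 1);
    for (int i = 0; i < int(objects.size()); ++i)
        for (int b = objects[i].first; b <= objects[i].last; ++b)
            list[cursor[b]++] = i;              // per-bucket atomic counter on GPU
    return { offsets, list };
}
```

On the GPU, pass 3's ordering within a bucket is nondeterministic (atomics race), but for light lists that usually doesn't matter.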

Did you think about adding raytracing? Unfortunately a froxel grid won't help to shade out of screen hit points.
On the long run, we'll likely end up using BVH of lights instead? (But having froxel grid additionally could be still a win…)

EDIT: btw, if you get away with 8K froxels, a single large 1024-workgroup could do all 3 steps of the entire algorithm in LDS and with one dispatch? Doing this async it would be totally free i guess.

J. Rakocevic said:
What you suggest seems like a faster solution (and takes less memory since cluster bounds are implicit!). This is a big gain, however, I don't understand how to cull in this way. Spheres will project to ellipses, cones to… depends, depending on the rotation relative to the frustum. Also I'm not sure exactly how I would “bin” them by z. I guess it's something I should look into then.

The whole idea of that algorithm is to somewhat overshoot the estimation, to guarantee that at least all the right cells are filled, even taking the whole projection aspect into consideration.
The z-binning is actually the easier part: take a direction parallel to the screen, take the sphere's position +/- this direction * the light's range, and that's the z-range you want to fill. This works because the imaginary planes on the z-axis are all perpendicular to each other (I think that's the right term?)
Calculating the ranges on the x/y axes then consists of finding a point that, when projected, encloses the entire sphere; if you find those point(s), you can apply the same function you use to determine the bin each pixel goes to, and fill the range between them. I think in my implementation I simply treated each sphere as if it were a screen-aligned box or something.

And yeah, obviously that means that the culling is a lot more crude than it could be, but it also makes culling itself very lightweight. There might even be some better approximation than what I'm using, just showing a way how it can be done.
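For the depth part specifically, here's a small sketch of mapping a light's view-space depth extent to a slice range; the exponential slice distribution is just one common choice in clustered shading (a linear split works the same way with a different slice function), and all constants here are placeholders:

```cpp
#include <cmath>
#include <algorithm>

// Placeholder grid constants.
constexpr int   kSlices = 32;
constexpr float kNear = 0.1f, kFar = 100.0f;

// Exponential depth slicing: slice = floor(numSlices * log(z/near) / log(far/near)).
int depthToSlice(float viewZ) {
    float t = std::log(viewZ / kNear) / std::log(kFar / kNear);
    return std::clamp(int(t * kSlices), 0, kSlices - 1);
}

// The light's depth extent is just its center depth +/- its radius along view z,
// clamped so we never take the log of a depth in front of the near plane.
void lightSliceRange(float centerZ, float radius, int& s0, int& s1) {
    s0 = depthToSlice(std::max(centerZ - radius, kNear));
    s1 = depthToSlice(centerZ + radius);
}
```

Every cell with a slice index in [s0, s1] (and inside the light's x/y tile range) then gets the light appended to its list.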

Juliean, I think it's the other way around in your second paragraph: you mean the z direction in view space (perpendicular to the screen), and the planes that slice the frustum by depth are parallel to the screen? It makes sense that way, and it's what I would do, but it's exactly finding the projected points that was the problem. With the link from JoeJ I should be able to do this, or if not, I could use a box too.

JoeJ, this is likely on me, but you totally lost me with the binning, all the way. I don't understand what atomic bucket counters are, whether “object” stands for light volumes or froxels, what prefix summing is, etc. I'm not asking you to write me a book on that, don't worry; I just didn't want to leave it without a response, because you clearly put effort into replying. I will learn about it and come back to read it again.
I am going to be using ~8000 cells, yes, and can think about raytracing when I know more, because implementing this is clearly a long shot as it is for me.

I found a way (that makes sense to me) to get a bounding rectangle of a sphere at least… From what I can tell, it's how Intel's demo implemented point-light projections too. If anyone is interested, the math is explained in the linked article.
https://www.gamasutra.com/view/feature/131351/the_mechanics_of_robust_stencil_.php?page=6

I derived it pen-and-paper style to better understand what's going on. The author of the article jumps a few steps ahead every time, but if you know the math for plane-sphere intersections, the dot product and the quadratic equation (mostly all high-school-level stuff), you can do it yourself from the two starting conditions that describe a sphere-tangent plane in view space, which are very intuitive.
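For reference, here is roughly what that construction gives for the horizontal bounds, the way I understand it (conventions and names are my own, not the article's; the vertical case is symmetric with y in place of x):

```cpp
#include <cmath>
#include <algorithm>

// Sketch of the tangent-plane construction for the horizontal screen bounds
// of a view-space sphere. View space is assumed right-handed, camera at the
// origin looking down -z, so cz < 0 for visible spheres.
struct Bounds { float xMin, xMax; };

// Find the two planes through the camera and the view-space y axis that are
// tangent to the sphere (center cx, cz in the xz-plane, radius r), then
// project the tangent points. 'focal' is the projection scale (cot of half
// the horizontal FOV). Returns false if the camera is inside the sphere.
bool sphereScreenBoundsX(float cx, float cz, float r, float focal, Bounds& out) {
    float d2 = cx * cx + cz * cz;
    float disc = d2 - r * r;
    if (disc <= 0.0f) return false;
    float root = std::sqrt(disc);
    float xMin = 1e30f, xMax = -1e30f;
    for (float sign : { -1.0f, 1.0f }) {
        // Unit plane normal N = (nx, 0, nz) satisfying tangency N.C = r;
        // substituting nz = (r - nx*cx)/cz into nx^2 + nz^2 = 1 gives a
        // quadratic in nx, and each root is one tangent plane.
        float nx = (r * cx + sign * cz * root) / d2;
        float nz = (r - nx * cx) / cz;
        // Tangent point on the sphere, then perspective projection of its x.
        float px = cx - r * nx;
        float pz = cz - r * nz;
        float sx = focal * px / -pz;          // assumes looking down -z
        xMin = std::min(xMin, sx);
        xMax = std::max(xMax, sx);
    }
    out = { xMin, xMax };
    return true;
}
```

A quick sanity check: for a sphere straight ahead at distance d, the half-angle of the tangent cone satisfies tan(a) = r / sqrt(d^2 - r^2), and the bounds above come out symmetric at exactly that value.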

Still not sure about the spotlights, but there are a few methods for them around, using frustum projections or similar. Approximating them with spheres or rectangles is quite wasteful (an overly conservative projection can be huge for long/diagonal lights) unless they are pointing towards/away from the camera, so I'd rather not do that. If I find a good solution for them, I'll post it here too, in case someone happens to need help with the same thing.

Edit: I noticed the article was written by Eric Lengyel, and I know he is a member of this forum, so if you ever read this, thank you!

You might also find my post in the following thread useful: https://www.gamedev.net/forums/topic/705859-deferred-vs-forward-shading/5422150/

This topic is closed to new replies.