I spent some time looking at alternatives for dealing with light culling and came upon clustered shading. There are a few resources online on how it's done, but implementations differ a little bit every time.
One thing that warrants attention is a talk that Ola Olsson gave in Brisbane, which you can find here: https://www.youtube.com/watch?v=uEtI7JRBVXk
In there he said that many implementations create per-cluster bounds for all clusters, which wastes time (https://pastebin.com/FDqGvpGE). I thought about it some and came up with something that might or might not be a good idea; I would like to throw it out there and see what you think.
What I usually see when reading articles online is small frustums being created per cluster, either as planes or approximated by a conservative box. These are stored in memory and culled against on a per-cluster basis. Even the conservative box approximation needs 2 * 3 floats, which probably ends up being 2 x float4 for alignment (not sure if this matters in structured buffers, but it does in constant buffers at least).
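To put a rough number on that, here is a minimal sketch of the per-cluster box cost. The names `Float4`, `ClusterBox` and `boxBytes`, and the 16 x 8 x 24 grid, are made up for illustration and not taken from any particular implementation.

```cpp
#include <cstddef>

// Hypothetical layout: a conservative box stored as two float4-sized corners.
struct Float4 { float x, y, z, w; };
struct ClusterBox {
    Float4 minCorner;  // w is padding
    Float4 maxCorner;  // w is padding
};
static_assert(sizeof(ClusterBox) == 32, "2 x float4 per cluster");

// Total bytes needed to store one box per cluster for a given grid.
constexpr std::size_t boxBytes(std::size_t gridX, std::size_t gridY, std::size_t gridZ) {
    return gridX * gridY * gridZ * sizeof(ClusterBox);
}

// Assumed example grid: 16 x 8 x 24 = 3072 clusters, 32 bytes each.
static_assert(boxBytes(16, 8, 24) == 98304, "96 KiB of cluster bounds");
```

Storing full planes per cluster instead of a box would cost more than this, since each cluster would then need several float4s.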
Consider the following.
1. Store three arrays of planes (or pack them into one with some offsets if that's better, I really can't tell). Each array holds the planes that slice the original frustum along one axis, including the bounding planes of the frustum itself. This doesn't take much memory, since a plane is a float4 and there are only three linear arrays, one per axis.
2. When creating clusters, assign nothing but three numbers to each cluster. Each number is a plane index for the front, right and top plane (or the back, left and bottom plane; either way the other three can be inferred with ±1). These can be 8-bit unsigned integers, as it can be assumed that one won't need more than 256 planes per axis. This would make a cluster use only 32 bits, with 8 bits wasted but maybe useful for something else. If one wants more planes, the indices can be expanded to 16 bits each, which brings the total to 48 bits (64 with padding), but that seems unnecessary.
3. When culling, instead of testing each frustum individually over and over and repeating a lot of the plane tests, go through the three arrays and, for each light (I guess the same can go for other things like probes and decals, but I'm not sure, I'm not at that stage yet), mark whether it intersects each of the planes. A bit mask can be used for this; it would need as many bits as there are planes (or one could store the indices of the two outermost planes in that axis?).
4. Then, when it comes to light-to-cluster assignment, simply check whether the planes that the cluster “points” to, or more precisely the planes it indexes, intersect the light we are interested in. If the light intersects any of those planes, it can be considered to affect the cluster.
This would, naively speaking, use an if/else, but I guess that can be avoided; I didn't really give it that much thought.
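The four steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not a finished implementation. It assumes lights are bounded by spheres, and that each plane array is sorted along its axis with unit normals pointing toward higher indices. It uses the "two outermost indices" variant from step 3 rather than a bit mask. All type and function names are hypothetical. Storing a range of cells instead of "intersects plane i" flags also covers a small light that sits wholly between two planes, and the final test is a few comparisons combined with `&`, so no if/else is needed.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Plane   { float nx, ny, nz, d; };        // signed distance = dot(n, p) + d
struct Sphere  { float x, y, z, radius; };      // assumed light bounds
struct Cluster { std::uint8_t x, y, z, spare; };  // step 2: three plane indices, 32 bits
struct CellRange { std::uint8_t first, last; };   // empty when first > last
struct LightRanges { CellRange x, y, z; };

inline float signedDistance(const Plane& p, const Sphere& s) {
    return p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
}

// Step 3: walk one plane array once per light and record the outermost cells
// the light touches. Cell i lies between planes[i] and planes[i + 1].
inline CellRange cellsTouched(const std::vector<Plane>& planes, const Sphere& light) {
    assert(planes.size() >= 2 && planes.size() <= 256);
    const int cellCount = static_cast<int>(planes.size()) - 1;
    int first = cellCount;
    int last = -1;
    for (int i = 0; i < cellCount; ++i) {
        // The sphere reaches into the slab if it is not entirely below the
        // lower plane and not entirely above the upper plane.
        const bool aboveLower = signedDistance(planes[i], light) >= -light.radius;
        const bool belowUpper = signedDistance(planes[i + 1], light) <= light.radius;
        if (aboveLower && belowUpper) {
            if (i < first) first = i;
            last = i;
        }
    }
    if (last < 0) return CellRange{1, 0};  // light misses every cell on this axis
    return CellRange{static_cast<std::uint8_t>(first), static_cast<std::uint8_t>(last)};
}

inline LightRanges classifyLight(const std::vector<Plane>& xPlanes,
                                 const std::vector<Plane>& yPlanes,
                                 const std::vector<Plane>& zPlanes,
                                 const Sphere& light) {
    return LightRanges{cellsTouched(xPlanes, light),
                       cellsTouched(yPlanes, light),
                       cellsTouched(zPlanes, light)};
}

// Step 4: comparisons joined with bitwise AND, so there is no branching.
inline bool inRange(std::uint8_t cell, const CellRange& r) {
    return (cell >= r.first) & (cell <= r.last);
}

inline bool lightAffectsCluster(const Cluster& c, const LightRanges& l) {
    return inRange(c.x, l.x) & inRange(c.y, l.y) & inRange(c.z, l.z);
}
```

With this layout the plane tests run once per light per axis, and the per-cluster work is reduced to comparing three indices against three ranges. For real frustum slices the x and y planes are not parallel, so the slab test is conservative rather than exact.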
I'm only halfway through implementing this technique, so I might be misunderstanding something. I would like to hear your thoughts on this.