I also implemented clustered shading based on the description in “idTech 666: The Devil is in the Details”.
The clustering is done on the CPU, SIMD optimised and multithreaded with one task per depth slice. The algorithm takes as input a list of items (lights, decals, probes, ...) each bounded by either an oriented bounding box or frustum.
I first compute the range of clusters an item can overlap by projecting the item’s axis-aligned bounding box (AABB), and then cull by testing each cluster in that range against the planes of the item. This is done in normalised device coordinates (NDC) because there the clusters are just AABBs.
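As a rough illustration of the first step, the sketch below maps an NDC interval to an inclusive range of cluster indices along one axis. It assumes the axis is split uniformly over [-1, 1] into `count` clusters; `Range` and `clusterRange` are hypothetical names, and the depth axis in a real implementation may well use a different, non-uniform slicing.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Range { uint32_t first, last; }; // inclusive cluster indices (hypothetical type)

// Assumes count >= 1 and equally sized clusters covering NDC [-1, 1] on this axis.
// Items that lie completely outside [-1, 1] should be rejected before calling this,
// because clamping would otherwise assign them to the border clusters.
Range clusterRange(float ndcMin, float ndcMax, uint32_t count)
{
    const float scale = 0.5f * static_cast<float>(count);
    const float maxIndex = static_cast<float>(count - 1);
    // Map [-1, 1] to [0, count], round down to a cluster index, then clamp.
    const float lo = std::max(0.0f, std::min(std::floor((ndcMin + 1.0f) * scale), maxIndex));
    const float hi = std::max(0.0f, std::min(std::floor((ndcMax + 1.0f) * scale), maxIndex));
    return { static_cast<uint32_t>(lo), static_cast<uint32_t>(hi) };
}
```

Calling it once per axis with the projected AABB's minimum and maximum gives the `x0..x1` and `y0..y1` style ranges that the plane tests then refine.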
A cluster is culled when it lies completely in front of any one of the item’s planes. In order to test for this in NDC-space we first have to transform the planes:
For view-space plane \( \mathbf{f} (f_a, f_b, f_c, f_d) \), point \( \mathbf{p} (p_x, p_y, p_z, 1) \) and projection matrix \( P \):
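\[ \mathbf{f}^\intercal \mathbf{p} > 0 \\ \mathbf{f}^\intercal P^{-1} P \mathbf{p} > 0 \\ (P^{-\intercal} \mathbf{f})^\intercal (P \mathbf{p}) > 0 \\ \mathbf{g}^\intercal \mathbf{p}' > 0 \]
where \( \mathbf{g} = P^{-\intercal} \mathbf{f} \) is the transformed plane and \( \mathbf{p}' = P \mathbf{p} \) is the point in clip space. Since \( p'_w > 0 \) for all points in front of the view point, we can divide by it without flipping the inequality and get:
\[ \mathbf{g}^\intercal (\mathbf{p}' / p'_w) > 0 \]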
Then the NDC-space cluster AABB with centre \( \mathbf{c} (c_x, c_y, c_z, 1) \) and half-extents \( \mathbf{r} (r_x, r_y, r_z, 0) \) (where \( r_x, r_y, r_z > 0 \) ) is in front of the plane if
\[ \min[\mathbf{g}^\intercal(\mathbf{c}\pm\mathbf{r})] > 0 \]
This expands to
\[ (\mathbf{g}^\intercal \mathbf{c}) + \min [ \pm g_a r_x \pm g_b r_y \pm g_c r_z ] > 0 \\ (\mathbf{g}^\intercal \mathbf{c}) - (\left| g_a \right| r_x + \left| g_b \right| r_y + \left| g_c \right| r_z) > 0 \]
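Written out for a single cluster, the final inequality could look like the sketch below. The `Plane` and `Vec3` types and the function name `aabbInFrontOfPlane` are assumptions made for illustration, not taken from the actual implementation.

```cpp
#include <cmath>

struct Plane { float a, b, c, d; }; // NDC-space plane g (hypothetical type)
struct Vec3  { float x, y, z; };

// True when the AABB with centre c and half-extents r (all > 0)
// lies entirely on the positive side of plane g.
bool aabbInFrontOfPlane(const Plane& g, const Vec3& c, const Vec3& r)
{
    // Signed plane value at the box centre: g^T c.
    const float centreDist = g.a*c.x + g.b*c.y + g.c*c.z + g.d;
    // Largest amount any corner can reduce that value by.
    const float extent = std::fabs(g.a)*r.x + std::fabs(g.b)*r.y + std::fabs(g.c)*r.z;
    return centreDist - extent > 0.0f;
}
```

Only the corner that minimises the plane value matters, so one dot product and three absolute values replace testing all eight corners.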
Note that we can compute the left hand side once for each plane and then simply update it as we move from cluster to cluster:
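// c is the NDC centre of the first cluster (x0, y0); r is the half-extent of every cluster.
f32 d0 = (g.a*c.x + g.b*c.y + g.c*c.z + g.d) - (abs(g.a)*r.x + abs(g.b)*r.y + abs(g.c)*r.z);
f32 dx = 2*g.a*r.x; // change when stepping one cluster in x
f32 dy = 2*g.b*r.y; // change when stepping one cluster in y
for(u32 y = y0; y <= y1; y++)
{
    f32 d1 = d0; // value for the first cluster of this row
    d0 += dy;
    for(u32 x = x0; x <= x1; x++)
    {
        if(d1 > 0)
        {
            // cluster (x, y) is entirely in front of the plane: cull it
        }
        d1 += dx;
    }
}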
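For experimenting with the incremental update, here is a self-contained sketch of the same sweep that records one flag per cluster. The types, the function name `sweepPlane` and the `std::vector<bool>` output are assumptions for illustration only; the real implementation is SIMD-optimised and will store its results differently.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct Plane { float a, b, c, d; }; // NDC-space plane g (hypothetical type)
struct Vec3  { float x, y, z; };

// Returns one flag per cluster in [x0, x1] x [y0, y1], row by row:
// true when the cluster lies entirely in front of plane g.
// c is the NDC centre of cluster (x0, y0); r is the half-extent shared by all clusters.
std::vector<bool> sweepPlane(const Plane& g, const Vec3& c, const Vec3& r,
                             uint32_t x0, uint32_t x1, uint32_t y0, uint32_t y1)
{
    std::vector<bool> inFront;
    float d0 = (g.a*c.x + g.b*c.y + g.c*c.z + g.d)
             - (std::fabs(g.a)*r.x + std::fabs(g.b)*r.y + std::fabs(g.c)*r.z);
    const float dx = 2.0f * g.a * r.x; // neighbouring centres are 2*r.x apart in x
    const float dy = 2.0f * g.b * r.y; // and 2*r.y apart in y
    for (uint32_t y = y0; y <= y1; y++)
    {
        float d1 = d0;
        d0 += dy;
        for (uint32_t x = x0; x <= x1; x++)
        {
            inFront.push_back(d1 > 0.0f);
            d1 += dx;
        }
    }
    return inFront;
}
```

Because the value is accumulated by repeated addition, it can differ slightly from evaluating the plane directly at each cluster, so clusters that sit almost exactly on a plane may be classified either way.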