Your “classic” deferred rendering approach is to draw one mesh per light that represents the bounding volume of that light. Then in the pixel shader you read the G-Buffer, calculate the lighting response for that particular light, and output the result to a lighting buffer using additive blending. After drawing all of your light meshes, the lighting buffer ends up containing the sum of all lights.
There are a few nice things about this approach:
- It's pretty straightforward to get working
- You can use a pretty tight bounding mesh for your light, which helps avoid calculating lighting for pixels outside of that bounding volume
- You can have as many light types as you want: each just needs its own bounding mesh and shader. For instance, adding area lights is pretty straightforward
- You only have to calculate one light per draw, which can keep the shader simpler
- It's easy to figure out the cost of any single light source, since each one is a separate draw
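The per-light accumulation loop can be sketched on the CPU. This is a toy model: the G-Buffer layout, the directional-light parameters, and the Lambert-style response below are illustrative assumptions, not any engine's actual code, and the GPU would of course do the inner loop via rasterization and blending hardware.

```python
# Minimal CPU sketch of classic deferred shading: one "draw" per light,
# additively blended into a lighting buffer.

def shade_pixel(gbuf_px, light):
    """Toy Lambert response for one G-Buffer pixel against one light."""
    nx, ny, nz = gbuf_px["normal"]
    lx, ly, lz = light["dir"]
    n_dot_l = max(0.0, nx * lx + ny * ly + nz * lz)
    return tuple(c * light["intensity"] * n_dot_l for c in gbuf_px["albedo"])

def deferred_lighting(gbuffer, lights):
    """Accumulate every light's contribution per pixel (additive blend)."""
    lit = [(0.0, 0.0, 0.0) for _ in gbuffer]
    for light in lights:                  # one draw per light mesh
        for i, px in enumerate(gbuffer):  # pixels covered by that mesh
            r, g, b = shade_pixel(px, light)
            lr, lg, lb = lit[i]
            lit[i] = (lr + r, lg + g, lb + b)  # additive blend
    return lit
```

Note that every light touching a pixel re-reads that pixel's G-Buffer data and re-writes the lighting buffer, which is exactly the bandwidth cost discussed below.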
But also a few downsides:
- If there are large overlapping lights, you can end up reading your G-Buffer many times per-pixel and also blending results many times per-pixel, which can chew through your bandwidth very quickly
- It might require lots of draw calls, although you can potentially solve this using instancing to batch multiple lights into one draw
- Determining which pixels actually intersect the light volume in depth using rasterization and depth buffers is not straightforward. One of the old ways to resolve this was a two-pass stencil buffer technique, which is pretty ugly.
One popular approach to solving that first issue was to do some or all lights in a single full-screen pass where each pixel loops over all lights that affect it. This presentation in particular was very influential: it used compute shaders and shared memory to build a list of potentially-intersecting lights for a “tile” of pixels (a group of neighboring NxN pixels) and then looped over that list in the same shader program. This lets you accumulate lighting in registers instead of memory, and also lets you read the G-Buffer once per-pixel instead of once per overlapping light. On the downside the shader gets more complicated, it gets trickier to handle multiple light types and shadow maps, and it's now much harder to figure out the cost of a single light source. So it's a trade-off, and hybrid approaches are also possible.
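The binning side of that idea can be sketched on the CPU. The tile size, the light representation (screen-space circle), and the conservative AABB-vs-tile test here are all assumptions for illustration; a real implementation would test tile sub-frusta against light volumes in a compute shader.

```python
# Rough sketch of tile-based light binning: bucket lights into NxN-pixel
# tiles, then shade each pixel against only its tile's light list.

TILE = 16  # pixels per tile side (assumed)

def bin_lights(lights, width, height):
    """lights: list of (center_x, center_y, radius) in screen space."""
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    tile_lists = [[] for _ in range(tiles_x * tiles_y)]
    for idx, (cx, cy, radius) in enumerate(lights):
        # Conservative screen-space AABB of the light's bounding circle
        x0 = max(0, int((cx - radius) // TILE))
        x1 = min(tiles_x - 1, int((cx + radius) // TILE))
        y0 = max(0, int((cy - radius) // TILE))
        y1 = min(tiles_y - 1, int((cy + radius) // TILE))
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                tile_lists[ty * tiles_x + tx].append(idx)
    return tile_lists, tiles_x

def lights_for_pixel(tile_lists, tiles_x, x, y):
    """The per-pixel shader loop would iterate over exactly this list."""
    return tile_lists[(y // TILE) * tiles_x + (x // TILE)]
```

Each pixel now reads the G-Buffer once and loops over a short, pre-culled list, rather than being touched once per light volume.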
“Clustered" is basically a variant on forward shading, where the pixel shader executed for a mesh loops over all applicable lights and calculates the final lighting response. The main interesting part is that you build a frustum-aligned grid that you bin your lights into ahead of time, and then each pixel figures out which grid cell its in to retrieve the list of relevant lights. Since it's forward rendering it brings in a much different set of performance characteristics compared to deferred, in particular that your shading efficiency becomes heavily tied to density of your meshes (in the worst case suffering 4x over-shading). You can also end up in a really bad “everything is in one mega pixel shader problem” which can make you end up with big unwieldy shaders that take too long to compile, use too many registers, and have too many permutations. But on the positive side it can be much simpler to get working compared to deferred, can work better on older/mobile hardware, and can beat deferred if the triangle density stays low.