
Best Practice For Treating Light Volumes in Deferred Lighting Pipeline

Started by December 04, 2021 02:28 PM
11 comments, last by pavixavi 2 years, 11 months ago

This topic is about ways to handle light volumes in a deferred lighting pipeline.

I currently have a deferred lighting pipeline where spheres are sent to an OpenGL GLSL lighting shader. The sphere frags are shaded using my point and spotlight GLSL functions.

Currently, I calculate the radius/position of the volume sphere individually for each light and send it to the lighting shader, which accumulates light from every light in the scene for each frag in the sphere. So I end up doing number-of-lights * number-of-lights iterations in my loop.
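For context, the radius is usually derived by solving the attenuation function for the distance at which the light's contribution drops below a small threshold. Something like this (a sketch assuming a constant/linear/quadratic falloff; the 5/256 cutoff and the names are just illustrative):

// Distance at which maxIntensity / (constant + linear*d + quadratic*d*d)
// drops below the threshold; fragments beyond it are skipped by the volume.
float lightVolumeRadius(float constant, float linear, float quadratic, float maxIntensity)
{
    float threshold = 5.0 / 256.0;
    float c = constant - maxIntensity / threshold;
    return (-linear + sqrt(linear * linear - 4.0 * quadratic * c)) / (2.0 * quadratic);
}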

There must be a more efficient way to handle this for large numbers of lights. Two alternatives I can think of are:

  1. Combine sphere volumes into a single buffer (especially easy if light positions are static) and make a single drawing call in which the shader iterates over all the spot and point lights for each frag.
  2. Draw each volume separately and specify a single light to calculate for each volume. Many more drawing calls, but calculations are limited to total-number-of-lights. I think this may be expensive for large numbers of lights, and I am always trying to reduce the number of my drawing calls.

Any thoughts specific or general appreciated.

My personal favourite is the clustered (deferred) approach:

https://www.humus.name/Articles/PracticalClusteredShading.pdf

It's kind of what you describe in your point 1), with a practical algorithm for the problem of how to actually supply the information of which pixel is affected by which light (which in this case is done by subdividing the scene into 64x64xX-pixel blocks, where the z-buffer is subdivided unevenly).
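Roughly, the lookup in the lighting shader then ends up looking something like this. This is just a sketch with made-up names and layout (64x64 screen tiles, exponential depth slices, and a flat light-index list that a binning pass has filled in beforehand), not code from the paper:

#version 430

struct PointLight { vec4 positionRadius; vec4 color; };   // view-space position + radius
struct Cluster    { uint offset; uint count; };            // slice of the light-index list

layout(std430, binding = 0) buffer Lights       { PointLight lights[]; };
layout(std430, binding = 1) buffer Clusters     { Cluster clusters[]; };
layout(std430, binding = 2) buffer LightIndices { uint lightIndices[]; };

uniform uvec3 uClusterDims;   // e.g. (screenW / 64, screenH / 64, numZSlices)
uniform float uZNear;
uniform float uZFar;

uniform sampler2D gAlbedo;
uniform sampler2D gNormal;
uniform sampler2D gViewPos;   // view-space position G-Buffer

in  vec2 vUV;
out vec4 fragColor;

void main()
{
    vec3  albedo = texture(gAlbedo, vUV).rgb;
    vec3  N      = normalize(texture(gNormal, vUV).xyz);
    vec3  P      = texture(gViewPos, vUV).xyz;
    float viewZ  = -P.z;

    // Which cluster is this pixel in? 64x64 screen tiles, exponential depth slices.
    float sliceF = log(max(viewZ, uZNear) / uZNear) / log(uZFar / uZNear);
    uint  zSlice = min(uint(sliceF * float(uClusterDims.z)), uClusterDims.z - 1u);
    uvec3 cell   = uvec3(uvec2(gl_FragCoord.xy) / 64u, zSlice);
    uint  idx    = cell.x + cell.y * uClusterDims.x + cell.z * uClusterDims.x * uClusterDims.y;

    // Only iterate the lights binned into this cluster.
    vec3 lighting = vec3(0.0);
    for (uint i = 0u; i < clusters[idx].count; ++i)
    {
        PointLight L = lights[lightIndices[clusters[idx].offset + i]];
        vec3  toLight = L.positionRadius.xyz - P;
        float dist    = length(toLight);
        float atten   = clamp(1.0 - dist / L.positionRadius.w, 0.0, 1.0);
        lighting += albedo * L.color.rgb * atten * max(dot(N, toLight / dist), 0.0);
    }
    fragColor = vec4(lighting, 1.0);
}

The exponential depth slicing is what keeps the cluster count manageable across a large depth range, and the same lookup works whether the inputs come from a G-Buffer (as here) or from a mesh's own pixel shader in a forward pass.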


@Juliean Thanks very much for the link, I have saved that for future reference. Do you know what the “classic” solution they refer to in deferred shading is?

pavixavi said:
@Juliean Thanks very much for the link, I have saved that for future reference. Do you know what the “classic” solution they refer to in deferred shading is?

I think the classic solution refers to rendering a mesh for each light. Anything else is most likely too slow when you get to more than just a few lights (which kind of defeats the whole purpose of deferred, doesn't it?). Personally, after understanding clustered shading, I see little point in doing the classic/legacy solution any more. It's really easy to implement and maintain (much more so than classic deferred), it works pretty well for forward, deferred and a mix (i.e. for including transparency while using the same shading model), and it's pretty fast (2000 lights in my example without any problems). So I'd recommend taking the time to fully understand and implement clustered, or at least look for some other, even more modern variant (which I cannot say anything about).

@Juliean Having had a closer look, I think medium term clustered is the way to go. I don't have time just yet to implement it, so individual light meshes will have to do for the minute. Thanks again for the link.

The “classic” deferred rendering approach is to draw one mesh per light that represents the bounding volume of that light. Then in the pixel shader you read the G-Buffer, calculate the lighting response for that particular light, and output the result to a lighting buffer using additive blending. Ultimately you end up with the sum of all lights in your lighting buffer after drawing all of your light meshes.
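In shader terms it looks roughly like this (a sketch with illustrative names: one sphere or cone mesh drawn per light, with additive blending such as glBlendFunc(GL_ONE, GL_ONE) enabled on the lighting buffer):

#version 330

uniform sampler2D gAlbedo;
uniform sampler2D gNormal;
uniform sampler2D gViewPos;   // view-space position G-Buffer
uniform vec2 uScreenSize;

uniform vec3  uLightPosVS;    // view-space light position
uniform vec3  uLightColor;
uniform float uLightRadius;

out vec4 fragColor;

void main()
{
    // The light's bounding mesh is rasterized, so gl_FragCoord tells us which
    // G-Buffer texel this fragment covers.
    vec2 uv     = gl_FragCoord.xy / uScreenSize;
    vec3 albedo = texture(gAlbedo, uv).rgb;
    vec3 N      = normalize(texture(gNormal, uv).xyz);
    vec3 P      = texture(gViewPos, uv).xyz;

    vec3  toLight = uLightPosVS - P;
    float dist    = length(toLight);
    float atten   = clamp(1.0 - dist / uLightRadius, 0.0, 1.0);

    // One light per draw; the blend state sums the results of all light meshes.
    fragColor = vec4(albedo * uLightColor * atten * max(dot(N, toLight / dist), 0.0), 1.0);
}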

There are a few nice things about this approach:

  • It's pretty straightforward to get working
  • You can use a pretty tight bounding mesh for your light, which helps avoid calculating lighting for pixels outside of that bounding volume
  • You can have as many light types as you want, you just have different bounding meshes and shaders. For instance adding area lights is pretty straightforward
  • You only have to calculate one light per draw which can keep the shader simpler
  • It's easy to figure out the cost of any single light source, since each one is a separate draw

But also a few downsides:

  • You can end up reading your G-Buffer many times per-pixel, as well as blending results many times per-pixel, if there are large overlapping lights, which can chew through your bandwidth very quickly
  • It might require lots of draw calls, although you can potentially solve this using instancing to batch multiple lights into one draw
  • Determining which pixels actually intersect the lighting volume in terms of their Z depth, using rasterization and depth buffers, is not straightforward. One of the old ways to resolve this was to use two passes and the stencil buffer, which is pretty ugly.

One popular approach to solving that first issue was to do some or all lights in a single full-screen pass where each pixel loops over all lights that affect that pixel. This presentation in particular was very influential, it used compute shaders and shared memory to build a list of potentially-intersecting lights for a “tile” of pixels (a group of neighboring NxN pixels) and then looped over that list in the same shader program. This allows you to accumulate lighting in registers instead of memory, and also lets you read the G-Buffer once per-pixel instead of N times. On the downside the shader gets more complicated, it gets trickier to handle N lighting types and shadow maps, and it's now much harder to figure out the cost of a single light source. So it's a trade-off, and hybrid approaches are also possible.
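To make that concrete, the structure of such a tiled compute shader is roughly the following (a sketch with illustrative names; it only culls against the tile's min/max depth, where a real implementation would also cull against the tile's four side planes):

#version 430
layout(local_size_x = 16, local_size_y = 16) in;   // one work group per 16x16 tile

struct PointLight { vec4 positionRadius; vec4 color; };   // view-space position + radius
layout(std430, binding = 0) buffer Lights { PointLight lights[]; };

uniform sampler2D gAlbedo;
uniform sampler2D gNormal;
uniform sampler2D gViewPos;   // view-space position G-Buffer
uniform uint uLightCount;
layout(rgba16f, binding = 0) uniform writeonly image2D uOutput;

const uint MAX_TILE_LIGHTS = 256u;
shared uint sMinZ, sMaxZ;     // float bits; ordering works because view depth > 0
shared uint sTileLightCount;
shared uint sTileLightIndices[MAX_TILE_LIGHTS];

void main()
{
    ivec2 pixel = ivec2(gl_GlobalInvocationID.xy);
    vec3  P     = texelFetch(gViewPos, pixel, 0).xyz;
    float viewZ = -P.z;

    if (gl_LocalInvocationIndex == 0u)
    {
        sMinZ = 0x7F7FFFFFu;   // +FLT_MAX bit pattern
        sMaxZ = 0u;
        sTileLightCount = 0u;
    }
    barrier();

    // Step 1: the whole group finds the tile's depth range.
    atomicMin(sMinZ, floatBitsToUint(viewZ));
    atomicMax(sMaxZ, floatBitsToUint(viewZ));
    barrier();
    float tileMinZ = uintBitsToFloat(sMinZ);
    float tileMaxZ = uintBitsToFloat(sMaxZ);

    // Step 2: each thread tests a slice of the light list and appends hits
    // to the shared per-tile list.
    for (uint i = gl_LocalInvocationIndex; i < uLightCount; i += 256u)
    {
        float lightZ = -lights[i].positionRadius.z;
        float radius = lights[i].positionRadius.w;
        if (lightZ + radius >= tileMinZ && lightZ - radius <= tileMaxZ)
        {
            uint slot = atomicAdd(sTileLightCount, 1u);
            if (slot < MAX_TILE_LIGHTS) sTileLightIndices[slot] = i;
        }
    }
    barrier();

    // Step 3: read the G-Buffer once and accumulate all tile lights in registers.
    vec3 albedo = texelFetch(gAlbedo, pixel, 0).rgb;
    vec3 N      = normalize(texelFetch(gNormal, pixel, 0).xyz);
    vec3 lighting = vec3(0.0);
    for (uint i = 0u; i < min(sTileLightCount, MAX_TILE_LIGHTS); ++i)
    {
        PointLight L = lights[sTileLightIndices[i]];
        vec3  toLight = L.positionRadius.xyz - P;
        float dist    = length(toLight);
        float atten   = clamp(1.0 - dist / L.positionRadius.w, 0.0, 1.0);
        lighting += albedo * L.color.rgb * atten * max(dot(N, toLight / dist), 0.0);
    }
    imageStore(uOutput, pixel, vec4(lighting, 1.0));
}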

“Clustered” is basically a variant on forward shading, where the pixel shader executed for a mesh loops over all applicable lights and calculates the final lighting response. The main interesting part is that you build a frustum-aligned grid that you bin your lights into ahead of time, and then each pixel figures out which grid cell it's in to retrieve the list of relevant lights. Since it's forward rendering it brings in a much different set of performance characteristics compared to deferred, in particular that your shading efficiency becomes heavily tied to the triangle density of your meshes (in the worst case suffering 4x over-shading, since pixels are shaded in 2x2 quads). You can also run into a really bad “everything is in one mega pixel shader” problem, which can leave you with big, unwieldy shaders that take too long to compile, use too many registers, and have too many permutations. But on the positive side it can be much simpler to get working compared to deferred, can work better on older/mobile hardware, and can beat deferred if the triangle density stays low.
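The binning step itself can be a small compute pass. Here is a sketch of one way to do it (illustrative names, and it assumes each cluster's view-space AABB was computed in an earlier pass): one work group per cluster tests the whole light list against that AABB and appends hits to the cluster's slice of a flat index list, which is what the per-pixel lookup then reads.

#version 430
layout(local_size_x = 64) in;   // one work group per cluster

struct PointLight  { vec4 positionRadius; vec4 color; };   // view-space position + radius
struct ClusterAABB { vec4 minPoint; vec4 maxPoint; };       // precomputed per cluster
struct Cluster     { uint offset; uint count; };

layout(std430, binding = 0) buffer Lights       { PointLight  lights[]; };
layout(std430, binding = 1) buffer ClusterAABBs { ClusterAABB aabbs[]; };
layout(std430, binding = 2) buffer Clusters     { Cluster     clusters[]; };
layout(std430, binding = 3) buffer LightIndices { uint        lightIndices[]; };

uniform uint uLightCount;
uniform uint uMaxLightsPerCluster;

shared uint sCount;

bool sphereIntersectsAABB(vec3 c, float r, vec3 mn, vec3 mx)
{
    vec3 d = clamp(c, mn, mx) - c;   // vector from sphere centre to closest AABB point
    return dot(d, d) <= r * r;
}

void main()
{
    uint clusterIdx = gl_WorkGroupID.x;
    if (gl_LocalInvocationIndex == 0u) sCount = 0u;
    barrier();

    uint base = clusterIdx * uMaxLightsPerCluster;
    for (uint i = gl_LocalInvocationIndex; i < uLightCount; i += gl_WorkGroupSize.x)
    {
        if (sphereIntersectsAABB(lights[i].positionRadius.xyz, lights[i].positionRadius.w,
                                 aabbs[clusterIdx].minPoint.xyz, aabbs[clusterIdx].maxPoint.xyz))
        {
            uint slot = atomicAdd(sCount, 1u);
            if (slot < uMaxLightsPerCluster)
                lightIndices[base + slot] = i;
        }
    }
    barrier();

    if (gl_LocalInvocationIndex == 0u)
    {
        clusters[clusterIdx].offset = base;
        clusters[clusterIdx].count  = min(sCount, uMaxLightsPerCluster);
    }
}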


@mjp Clustered can actually be used both forward and deferred. Either you execute the clustered shader on the mesh (optimally with a z-prepass), or on a G-Buffer. It's comparatively easy to get to work in both cases; as you also said, clustered is very easy to implement. In my tests in the “sponza” scene with up to 2000 lights, if I remember correctly the forward + z-prepass variant was faster - but it's definitely a huge plus having the option to choose between forward and deferred based on your needs, and perhaps even per-scene, without having to write two completely different renderers!

@Juliean indeed, that is a great point. The actual “clustering” part of clustered forward rendering is totally usable for deferred pipelines. For a single-pass deferred shader it can be more appealing than the “build a sub-frustum and cull against it” approach mentioned in that Intel presentation, since that approach tends to suffer in cases where the visible depth buffer range is high within a tile. Plus you can re-use the cluster grid for transparents and other things that aren't present in the G-Buffer, which is great.

@juliean @mjp Have either of you come across this previously? I am trying to add the results from “classic” volume mesh passes and am getting the results below using the OpenGL settings:

glBlendEquation(GL_FUNC_ADD);
glBlendFuncSeparate(GL_SRC_COLOR, GL_DST_COLOR, GL_DST_ALPHA, GL_DST_ALPHA);

Oddly it appears later mesh draw calls using a sphere volume are subtracting rather than adding to the buffer. See the dark circle toward the bottom left.

pavixavi said:
Oddly it appears later mesh draw calls using a sphere volume are subtracting rather than adding to the buffer. See the dark circle toward the bottom left.

Your blend-function is off. If you want additive blending, you should use

glBlendFunc(GL_ONE, GL_ONE);   // additive: result = source + destination

This topic is closed to new replies.
