
Is deferred rendering still good to use in 2020 (and almost 2021)?

Started by October 09, 2020 02:39 PM
20 comments, last by vladislavbelov 4 years, 3 months ago

I was going to give my opinion until I remembered that I didn't exactly get on the deferred bandwagon. I never understood why it was being pushed and becoming more and more popular. I guess I didn't, and still don't, understand what problem it solves. So I'll match Alundra's "is deferred wrong in 2020 and beyond" and raise it to: I think deferred was a mistake, mainly because the memory bandwidth requirements and the problems with transparency were never worth it. Tiled forward made a whole lot of sense, but there were no explicit compute shaders yet. Still, GPGPU efforts like Brook GPU were feasible, so it might have been possible if NVIDIA and ATI had pushed it with GDC/SIGGRAPH papers. I wish I had spent some time learning Brook GPU (or the other one, Sh, was it?); then I'd know whether it was feasible.

BTW Doom IIRC used clustered forward rendering and I've heard of no changes for Eternal.

I think the right answer will always be a technique involving both deferred texturing and deferred lighting. Just to be clear, I mean deferred in the abstract sense, not G-buffer deferred.

-potential energy is easily made kinetic-

Infinisearch said:

I was going to give my opinion until I remembered that I didn't exactly get on the deferred bandwagon. I never understood why it was being pushed and becoming more and more popular. I guess I didn't, and still don't, understand what problem it solves.

The problem it initially tried to solve was the dependency of performance on scene complexity in classic forward rendering. In a plain old forward renderer, you have to render every object once for every light that affects it. That means that when you double the number of objects, you roughly double the number of lighting calculations and the setup that has to be performed. Deferred, by “flattening” the scene geometry into a 2D texture, means that lighting always costs roughly the same, regardless of how many objects you have in the scene. That, in theory, meant you could have more lights, even in fairly complex scenes.
Or, in computer-science terms, the complexity of forward is O(N*M) (where N = objects, M = lights), whereas deferred is O(N) + O(M).
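
Very roughly, the difference in loop structure looks like this (a CPU-side C++ sketch with made-up types and functions, purely to illustrate the big-O point; in practice both obviously run on the GPU):

#include <vector>

// Hypothetical stand-ins, just to show the loop structure.
struct Object {};
struct Light {};
struct GBufferPixel {};

void shade(const Object&, const Light&) {}                  // forward: shade one object under one light
void drawToGBuffer(const Object&) {}                        // deferred: write attributes, no lighting
GBufferPixel readGBuffer(int) { return {}; }                // deferred: fetch stored attributes
void accumulateLight(const GBufferPixel&, const Light&) {}  // deferred: light one pixel

// Classic multi-pass forward: every object is shaded for every light that affects it -> O(N*M).
void forwardRender(const std::vector<Object>& objects, const std::vector<Light>& lights)
{
    for (const Object& obj : objects)      // N objects
        for (const Light& light : lights)  // M lights
            shade(obj, light);             // N*M lighting passes
}

// Deferred: the scene is "flattened" into a G-buffer once, then lighting runs per screen
// pixel, independent of the object count -> O(N) + O(M).
void deferredRender(const std::vector<Object>& objects, const std::vector<Light>& lights, int pixelCount)
{
    for (const Object& obj : objects)      // O(N): fill the G-buffer
        drawToGBuffer(obj);

    for (const Light& light : lights)      // O(M) light passes over the flattened scene
        for (int p = 0; p < pixelCount; ++p)
            accumulateLight(readGBuffer(p), light);
}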

I'd argue that neither old forward nor deferred are particularly great. If you look at Unity's legacy pipeline, which can do about 4 dynamic lights forward/deferred, and compare that to the 100s to 1000s of lights you can push through with even the most basic clustered implementation, I think there's little argument here.


Juliean said:
I'd argue that neither old forward nor deferred are particularly great. If you look at Unity's legacy pipeline, which can do about 4 dynamic lights forward/deferred, and compare that to the 100s to 1000s of lights you can push through with even the most basic clustered implementation, I think there's little argument here.

All tiled light solutions were intellectually within reach from the beginning of computer graphics. Tile-based rendering was already a thing, and I'm not just talking about PowerVR; it was a popular hardware approach to rendering in the mid-to-late 90s. Deferred came to the forefront of the industry right around the time DirectX 9.0c was becoming a standard, so long programmable shaders were a thing. Brook GPU and so on were also a thing. The PS3 and using the SPUs/SPEs to accelerate lighting and post-processing was a thing a little later on. DX10, and later compute shaders, were becoming a thing by then. I see no reason why tile-based lighting solutions weren't being explored and given a higher priority than deferred rendering by 2004/5. But then again, maybe that's why compute shaders exist at all. But a lot of effort goes into the things popularized by SIGGRAPH, GDC, and sometimes a properly done write-up on a website. So back when Brook GPU, and then stream-out, were a thing, it was possible for the industry to move over to a forward tiled lighting solution. Instead we have, I think, Battlefield 3 being the first in 2011, six years later. The Leo demo by AMD around 2012 was also tiled forward.

Then of course there are always ID-based solutions, but those didn't make much of a splash either.

-potential energy is easily made kinetic-

Infinisearch said:
I see no reason why tile-based lighting solutions weren't being explored and given a higher priority than deferred rendering by 2004/5. But then again, maybe that's why compute shaders exist at all.

You can certainly do tiled without compute shaders, as I don't use them myself for my tiled solution, which can still bang out >1k lights. The one thing is that tiled shaders tend to be quite branchy, with loops that cannot be unrolled, and data having to be fetched indirectly from at least two cbuffers/textures/UAVs. Maybe that wasn't really an option in 2004, as I think I recall GPUs were much worse at this kind of stuff back then?

BTW, the old fixed-function pipeline had up to 8 lights per vertex per pass, right?

Juliean said:
You can certainly do tiled without compute shaders, as I don't use them myself for my tiled solution

You mind if I ask about your technique and how it works?

Juliean said:
The one thing is that tiled shaders tend to be quite branchy, with loops that cannot be unrolled, and data having to be fetched indirectly from at least two cbuffers/textures/UAVs. Maybe that wasn't really an option in 2004, as I think I recall GPUs were much worse at this kind of stuff back then?

If you treat directional lights separately, can't you eliminate most of the branchy code by using separate loops for each light type?

-potential energy is easily made kinetic-

Infinisearch said:
You mind if I ask about your technique and how it works?

For filling the light-tile list, I just do a really dumb CPU-based iteration over all lights, calculating a bounding box for each light and adding it to a per-tile list. This is then uploaded to the GPU in two textures (one list of lights, and one list of tiles with the offset into the light list). The code for that is already floating around: https://pastebin.com/FDqGvpGE (as you will see, I do some really dumb stuff like vector<vector<>>, but in practice it doesn't even add up to much). I assume this is the part you could/should move to a compute shader.
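
In rough pseudocode, the binning described above looks something like this (a simplified C++ sketch with made-up names, not the actual pastebin code):

#include <algorithm>
#include <cstdint>
#include <vector>

struct PointLight { float x, y, radius; };   // light position/extent in screen space (illustrative)

constexpr int TILE_SIZE = 64;                // pixels per tile edge, arbitrary choice

struct TileGrid
{
    int tilesX = 0, tilesY = 0;
    std::vector<std::vector<uint32_t>> lightsPerTile;   // the "dumb" vector<vector<>>
};

TileGrid binLights(const std::vector<PointLight>& lights, int screenW, int screenH)
{
    TileGrid grid;
    grid.tilesX = (screenW + TILE_SIZE - 1) / TILE_SIZE;
    grid.tilesY = (screenH + TILE_SIZE - 1) / TILE_SIZE;
    grid.lightsPerTile.resize(size_t(grid.tilesX) * grid.tilesY);

    for (uint32_t i = 0; i < lights.size(); ++i)
    {
        const PointLight& l = lights[i];

        // Screen-space bounding box of the light, clamped to the tile grid.
        int minX = std::max(0, int((l.x - l.radius) / TILE_SIZE));
        int maxX = std::min(grid.tilesX - 1, int((l.x + l.radius) / TILE_SIZE));
        int minY = std::max(0, int((l.y - l.radius) / TILE_SIZE));
        int maxY = std::min(grid.tilesY - 1, int((l.y + l.radius) / TILE_SIZE));

        // Append the light index to every tile its bounding box touches.
        for (int ty = minY; ty <= maxY; ++ty)
            for (int tx = minX; tx <= maxX; ++tx)
                grid.lightsPerTile[size_t(ty) * grid.tilesX + tx].push_back(i);
    }
    return grid;
}

// The per-tile lists then get flattened into one light-index list plus one offset/count per
// tile, which is what gets uploaded to the GPU as the two textures mentioned above.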

The rest is just reading the light data in either the pixel shader of the model in forward, or in one post-processing pass for deferred. This is pretty much exactly as shown in the Avalanche paper that I already linked.

Infinisearch said:
If you treat directional lights separately, can't you eliminate most of the branchy code by using separate loops for each light type?

If you are OK with a (small) maximum number of lights per cluster, then perhaps. But as-is, you have to get the amount of point/spot lights per cluster from the texture/buffer and loop over that, which works via this code:

// Per-cluster entry: x = offset into the light list, y = number of lights in this cluster
uint2 clusterData = Read(%GRID_LIST%, cluster);
uint lightIndex = clusterData.x;
float3 vSummedLight = float3(0.0f, 0.0f, 0.0f);

// Ambient and the single directional light are handled outside the per-cluster loop
vSummedLight += vColor * vAmbientColor;
vSummedLight += calculateDirectionalLight(vColor, vDirectionalDirection, vViewNormal, vViewDir);

// Loop over the point lights listed for this cluster
for(uint i = 0; i < clusterData.y; i++)
{
	// Turn the linear light index into 2D texel coordinates in the 16384-wide light-list texture
	uint index = Read(%LIGHT_LIST%, int3(lightIndex %% 16384, lightIndex / 16384, 0));
	lightIndex++;
	vSummedLight += calculatePointLight(index, vColor, vViewPosition, vViewDir, vViewNormal, specular, roughness, metalness);
}

Juliean said:
If you are OK with a (small) maximum number of lights per cluster, then perhaps. But as-is, you have to get the amount of point/spot lights per cluster from the texture/buffer and loop over that, which works via this code:

This is another reason I brought up the 8-light limit of the FF pipeline: if you take the brightest lights, or accumulate the rest into 8 or 16 directions (dropping back-facing ones), so roughly 8 lights, I think you'd get pretty good results.

-potential energy is easily made kinetic-

Infinisearch said:
This is another reason I brought up the 8-light limit of the FF pipeline: if you take the brightest lights, or accumulate the rest into 8 or 16 directions (dropping back-facing ones), so roughly 8 lights, I think you'd get pretty good results.

Perhaps you would have done that on older GPUs back then. I don't think it would even be wise to make that optimization today: since nearby pixels are likely to fall into the same clusters, I think it would be faster to fetch the light count dynamically. Unless most of your clusters really do have about 8 or 16 lights; but from what I've seen, you could end up calculating 8 lights for many pixels where only one or two are actually needed.

Also, you would have to be very careful with the light-selection algorithm and scene setup, as differences between adjacent clusters are very noticeable (blocky), more so than in regular/FF rendering when one nearby object just didn't receive a light.

Let's see, IIRC constant buffers back then maxed out at 64 KB (KiB is the new ‘standard’, I think). I forgot when, though, so I'll look it up and report back.

If you limit yourself to point and spot lights, I think you can get away with 32 bytes per light if the lights aren't colored, and 48 bytes if they are… so at least 1024 lights per constant buffer. In your implementation, do you use constant buffers? If so, is that the same sort of limitation you experienced?
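
As a rough sanity check on that math, hypothetical light packings might look like this (illustrative layouts only, not anyone's actual engine format):

#include <cstdint>

// 32 bytes: position + radius, direction + cone angle (no color).
struct PackedLight32
{
    float posX, posY, posZ, radius;        // 16 bytes
    float dirX, dirY, dirZ, cosConeAngle;  // 16 bytes
};
static_assert(sizeof(PackedLight32) == 32, "uncolored light packs into 32 bytes");

// 48 bytes: the same, plus color and intensity.
struct PackedLight48
{
    PackedLight32 base;                    // 32 bytes
    float r, g, b, intensity;              // 16 bytes
};
static_assert(sizeof(PackedLight48) == 48, "colored light packs into 48 bytes");

// With the D3D11-era 64 KiB constant-buffer limit:
constexpr uint32_t kCBufferBytes = 64 * 1024;
constexpr uint32_t kMaxUncoloredLights = kCBufferBytes / sizeof(PackedLight32); // 2048
constexpr uint32_t kMaxColoredLights   = kCBufferBytes / sizeof(PackedLight48); // 1365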

Because of deferred we have MRT, so again, if it was possible with Brook GPU-style GPGPU, you could have done light culling on the GPU back then.

Also, I really shouldn't gloss over ID-based rendering as an alternative to G-buffer deferred: forward rendering in two (or, if you prefer, three) passes, with reduced geometry work from pass 2 onwards.

-potential energy is easily made kinetic-

Infinisearch said:
Let's see, IIRC constant buffers back then maxed out at 64 KB (KiB is the new ‘standard’, I think). I forgot when, though, so I'll look it up and report back. If you limit yourself to point and spot lights, I think you can get away with 32 bytes per light if the lights aren't colored, and 48 bytes if they are… so at least 1024 lights per constant buffer. In your implementation, do you use constant buffers? If so, is that the same sort of limitation you experienced?

Yes, I am using a cbuffer for the plain light data, and textures as buffers for the tile/indirection data. So indeed I use 32 bytes per light, which kind of limited me to 2048 lights. Not sure again why I'm not using a texture for this as well; I think I read in the Avalanche presentation that a cbuffer is better for the “fixed” structure of the light data than having to use a third texture. Also not sure if UAVs are any faster; I didn't have support in my API/shader-generator at the time, so I didn't try them.

This topic is closed to new replies.
