
Overhead of unused DescriptorTables inside RootSignatures.

Started by piluve, December 04, 2017 02:09 PM
11 comments, last by SoldierOfLight 7 years, 2 months ago

Hi,

I recently started reworking my RootSignature system; I basically want to declare a RootSignature in HLSL and then reuse it across all my shaders. Until this point, I've been generating a custom RS for each PSO (with the exact number of resources that PSO will use). This means that each RS is efficiently used but it also means that each time I draw or dispatch the RS will change.

This new system will use this RS:


#define GFXRS   "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)," \
                "DescriptorTable" \
                "(" \
                        "CBV(b0, numDescriptors = 16)," \
                        "SRV(t0, numDescriptors = 16)," \
                        "UAV(u0, numDescriptors = 16)" \
                ")," \
                "DescriptorTable" \
                "(" \
                        "Sampler(s0, numDescriptors = 16)" \
                ")"

NOTE: I want to start with only one RS for everything and then maybe have one for gfx and another for compute.

However, I was wondering if there is any overhead to declaring the above RS and not using all the descriptors. As far as I can tell it shouldn't cause much trouble (I'm also aware that Unreal Engine uses something like this). I would love to hear the opinion of someone with more experience on this topic.

Thanks!

49 minutes ago, piluve said:

This means that each RS is efficiently used but it also means that each time I draw or dispatch the RS will change.

Well, changing shaders still isn't free, so I think sorting by shader is still recommended. Although Hodgman once said that if you do enough work with one shader it's OK not to sort by shader. (I think he said 400 pixels, but don't quote me on that; I'll try to dig up the quote.)

50 minutes ago, piluve said:

However, I was wondering if there is any overhead to declaring the above RS and not using all the descriptors. As far as I can tell it shouldn't cause much trouble (I'm also aware that Unreal Engine uses something like this). I would love to hear the opinion of someone with more experience on this topic.

Now, I don't know the internals of anything, but I would think compiling a PSO would optimize out anything unnecessary. @SoldierOfLight, do you know if PSO compilation would optimize out something like piluve describes?

-potential energy is easily made kinetic-


It's going to depend on the hardware. For binding tier 1 and 2, this will have some impact, as changing some descriptor tables devolves into a per-descriptor operation for them. For tier 1, this applies to all descriptors, and for tier 2, this applies to UAVs and CBVs. For everything else (tier 3, or tier 2 SRVs/samplers), changing a descriptor table is just a pointer swap.

As for within the shader, I don't see what optimizations could occur. The root signature indicates where to grab the descriptor, but the descriptor table is populated without knowledge of the root signature that will consume it, and the root parameter is set without knowledge of which PSO will consume it.
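To make the tier breakdown concrete, here's a toy C++ model (my own sketch; the function and the per-tier rules are a simplified restatement of the answer above, not code from any driver) of how many descriptors a SetDescriptorTable rebind might touch for the thread's 16+16+16 CBV/SRV/UAV table:

```cpp
// Illustrative cost model only: per the tier breakdown above, on tier 1 a
// SetDescriptorTable devolves into per-descriptor work for every range; on
// tier 2 only CBV and UAV ranges do (SRVs/samplers are "bindless" there);
// on tier 3 the rebind is just a pointer swap.
struct TableRanges {
    int cbvs;
    int srvs;
    int uavs;
};

// Returns how many descriptors the driver would have to touch when the
// table is rebound, under the simplified model above.
int DescriptorsTouchedOnBind(int bindingTier, const TableRanges& t) {
    switch (bindingTier) {
        case 1:  return t.cbvs + t.srvs + t.uavs; // everything is per-descriptor
        case 2:  return t.cbvs + t.uavs;          // SRVs skipped ("bindless")
        default: return 0;                        // tier 3: pointer swap only
    }
}
```

So for the 48-descriptor table in the original post, a rebind would cost roughly 48 descriptor updates on tier 1, 32 on tier 2, and effectively nothing on tier 3, under this toy model.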

@SoldierOfLight thanks for pointing out the differences between hardware tiers, I forgot about that. On the other hand, I think it could be possible that compiling the shader + rs at the same time may allow for better optimisations. I'm guessing that because Xbox requires the shader to be compiled with the RS (otherwise it will do it at runtime).

@piluve, you'll also have memory overhead/waste in the first table, because UAV u0, for example, will always be expected at offset 16+16=32, no matter whether you used the full 16+16 CBVs+SRVs or far fewer. Also, you have to copy all SRVs+UAVs into another 'version' of the table even if only one of the CBVs changed.

I'm sure you're aware that's the price we pay; I think many people are doing this.
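The fixed-offset point can be sketched in plain C++ (the helper names are mine and purely illustrative): with the single three-range table from the original post, each register maps to a fixed slot, so u0 lands at slot 32 regardless of how many CBVs/SRVs are actually populated:

```cpp
// Fixed slot layout for DescriptorTable(CBV(b0,16), SRV(t0,16), UAV(u0,16)):
// the ranges are laid out back to back, so every register has a fixed home.
constexpr int kCbvCount = 16;
constexpr int kSrvCount = 16;

constexpr int CbvSlot(int reg) { return reg; }                          // b0..b15 -> 0..15
constexpr int SrvSlot(int reg) { return kCbvCount + reg; }              // t0..t15 -> 16..31
constexpr int UavSlot(int reg) { return kCbvCount + kSrvCount + reg; }  // u0..u15 -> 32..47

// Byte offset into the descriptor heap, given the handle increment size the
// device reports (ID3D12Device::GetDescriptorHandleIncrementSize; the value
// 32 used below is just an example, it varies by hardware).
constexpr int ByteOffset(int slot, int incrementSize) {
    return slot * incrementSize;
}
```

Even if a draw only uses b0 and u0, u0 is still fetched from slot 32, so the 31 slots in between are reserved (and potentially copied) anyway.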

3 hours ago, piluve said:

On the other hand, I think it could be possible that compiling the shader + rs at the same time may allow for better optimisations.

No. The Xbox compiler goes from HLSL -> Xbox hardware instructions. The PC compiler goes from HLSL -> DXBC (bytecode), which the driver later translates into hardware instructions. By the time the driver sees the DXBC, they're guaranteed to have a root signature with it.

13 minutes ago, pcmaster said:

@piluve, you'll also have memory overhead/waste in the first table, because UAV u0, for example, will always be expected at offset 16+16=32, no matter whether you used the full 16+16 CBVs+SRVs or far fewer. Also, you have to copy all SRVs+UAVs into another 'version' of the table even if only one of the CBVs changed.

Thanks for pointing this out, I didn't read closely enough to see it was a single table with 3 ranges; I assumed it was 3 tables.

On 12/4/2017 at 10:43 AM, SoldierOfLight said:

As for within the shader, I don't see what optimizations could occur. The root signature indicates where to grab the descriptor, but the descriptor table is populated without knowledge of the root signature that will consume it, and the root parameter is set without knowledge of which PSO will consume it.

Basically I was thinking reduced register usage. My thinking is that when you compile a PSO against a root signature, it will realize that certain bindings aren't used and won't bother allocating a register for them, whether at point of use or as a "prefetch" (shader setup just prior to shader execution).

@SoldierOfLight BTW, if something in the root signature is unused in a particular PSO, is it still validated? If not, that could be a win too.

-potential energy is easily made kinetic-

Oh, of course, I'd expect a shader to only have instructions to fetch (or prefetch) data that will be used within the shader. However for some hardware, a "SetDescriptorTable" operation turns into (e.g.) multiple "SetDescriptor" operations, and since descriptor table setting happens without knowledge of a particular PSO, it needs to respect the bound root signature instead.

As for validation, again it depends on the tier and type of descriptor. On tier 3, nothing is required to be valid unless it's actually referenced in a shader (potentially even including conditional call flow, depending on compilation flags). But on tier 1, everything is required to be valid if it's declared in the root signature, regardless of whether it's referenced in a shader. And tier 2 is a middle ground, where SRVs/samplers are "bindless"/unvalidated, but CBVs and UAVs need to be valid.
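Those per-tier validation rules could be condensed into a small helper (my own illustrative code, not a D3D12 API; the names are assumptions):

```cpp
// Condensed restatement of the validation rules described above: returns
// whether a descriptor of the given type must point at valid data merely
// because the root signature declares it, even if no shader reads it.
enum class DescType { CBV, SRV, UAV, Sampler };

bool MustBeValidIfDeclared(int bindingTier, DescType type) {
    switch (bindingTier) {
        case 1:  return true;   // tier 1: everything declared must be valid
        case 2:  return type == DescType::CBV || type == DescType::UAV;
        default: return false;  // tier 3: only descriptors a shader references
    }
}
```

Under this reading, the original 16+16+16 table would require all 48 descriptors (plus 16 samplers) to be valid on tier 1 hardware, only the CBVs and UAVs on tier 2, and nothing unreferenced on tier 3.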

1 hour ago, SoldierOfLight said:

However for some hardware, a "SetDescriptorTable" operation turns into (e.g.) multiple "SetDescriptor" operations, and since descriptor table setting happens without knowledge of a particular PSO, it needs to respect the bound root signature instead.

Are you talking about tier 1 hardware in general, or a specific vendor and family? My guess is the former. Also, in a command list you set a root signature and then set a compatible PSO... is it possible that the driver, upon building/validating the command list, would realize it could skip some "SetDescriptor"s from a "SetDescriptorTable"?

-potential energy is easily made kinetic-

Correct, the former.

No, because the same root signature and the same descriptor tables can be used across multiple draw calls using different PSOs. Theoretically if the driver wanted to spend a lot of time searching and accumulating across all PSOs and draws that used a particular descriptor table, they could, but that goes against the spirit of D3D12, which is that the app knows what it's doing and is being explicit about the work it wants done, so that the driver doesn't need to waste time trying to be smarter than the app.

This topic is closed to new replies.
