
Overhead of unused DescriptorTables inside RootSignatures.

Started by piluve, December 04, 2017 02:09 PM
11 comments, last by SoldierOfLight 7 years, 2 months ago

Hi,

I recently started reworking my RootSignature system; I basically want to declare a RootSignature in HLSL and then reuse it across all my shaders. Until this point, I've been generating a custom RS for each PSO (with the exact number of resources that PSO will use). This means that each RS is efficiently used but it also means that each time I draw or dispatch the RS will change.

This new system will use this RS:


#define GFXRS   "RootFlags(ALLOW_INPUT_ASSEMBLER_INPUT_LAYOUT)," \
                "DescriptorTable" \
                "(" \
                        "CBV(b0, numDescriptors = 16)," \
                        "SRV(t0, numDescriptors = 16)," \
                        "UAV(u0, numDescriptors = 16)" \
                ")," \
                "DescriptorTable" \
                "(" \
                        "Sampler(s0, numDescriptors = 16)" \
                ")"

NOTE: I want to start with only one RS for everything and then maybe have one for gfx and another for compute.

However, I was wondering if there is any overhead to declaring the above RS and not using all the descriptors. As far as I can tell it shouldn't cause much trouble (I'm also aware that Unreal Engine uses something like this). I would love to hear the opinion of someone with more experience on this topic.

Thanks!

49 minutes ago, piluve said:

This means that each RS is efficiently used but it also means that each time I draw or dispatch the RS will change.

Well, changing shaders still isn't free, so I think sorting by shader is still recommended. Although Hodgman once said that if you do enough work with one shader it's OK not to sort by shader. (I think he said 400 pixels, but don't quote me on that; I'll try to dig up the quote.)

50 minutes ago, piluve said:

However, I was wondering if there is any overhead to declaring the above RS and not using all the descriptors. As far as I can tell it shouldn't cause much trouble (I'm also aware that Unreal Engine uses something like this). I would love to hear the opinion of someone with more experience on this topic.

Now, I don't know the internals of anything, but I would think compiling a PSO would optimize out anything unnecessary. @SoldierOfLight, do you know if PSO compilation would optimize out something like piluve describes?

-potential energy is easily made kinetic-


It's going to depend on the hardware. For binding tier 1 and 2, this will have some impact, as changing some descriptor tables devolves into a per-descriptor operation for them. For tier 1, this applies to all descriptors, and for tier 2, this applies to UAVs and CBVs. For everything else (tier 3, or tier 2 SRVs/samplers), changing a descriptor table is just a pointer swap.

As for within the shader, I don't see what optimizations could occur. The root signature indicates where to grab the descriptor, but the descriptor table is populated without knowledge of the root signature that will consume it, and the root parameter is set without knowledge of which PSO will consume it.
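To make the tier breakdown concrete, here's a toy C++ model (my own sketch; the function and the per-tier rules are a simplified restatement of the answer above, not code from any driver) of how many descriptors a SetDescriptorTable rebind might touch for the thread's 16+16+16 CBV/SRV/UAV table:

```cpp
// Illustrative cost model only: per the tier breakdown above, on tier 1 a
// SetDescriptorTable devolves into per-descriptor work for every range; on
// tier 2 only CBV and UAV ranges do (SRVs/samplers are "bindless" there);
// on tier 3 the rebind is just a pointer swap.
struct TableRanges {
    int cbvs;
    int srvs;
    int uavs;
};

// Returns how many descriptors the driver would have to touch when the
// table is rebound, under the simplified model above.
int DescriptorsTouchedOnBind(int bindingTier, const TableRanges& t) {
    switch (bindingTier) {
        case 1:  return t.cbvs + t.srvs + t.uavs; // everything is per-descriptor
        case 2:  return t.cbvs + t.uavs;          // SRVs skipped ("bindless")
        default: return 0;                        // tier 3: pointer swap only
    }
}
```

So for the 48-descriptor table in the original post, a rebind would cost roughly 48 descriptor updates on tier 1, 32 on tier 2, and effectively nothing on tier 3, under this toy model.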

@SoldierOfLight thanks for pointing out the differences between hardware tiers, I forgot about that. On the other hand, I think it could be possible that compiling the shader + rs at the same time may allow for better optimisations. I'm guessing that because Xbox requires the shader to be compiled with the RS (otherwise it will do it at runtime).

@piluve, you'll also have memory overhead/waste in the first table, because UAV u0, for example, will always be expected at offset 16+16=32, no matter whether you used the full 16+16 CBVs+SRVs or far fewer. Also, you have to copy all SRVs+UAVs into another 'version' of the table even if only one of the CBVs changed.

I'm sure you're aware that's the price we pay; I think many people are doing this.
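The fixed-offset point can be sketched in plain C++ (the helper names are mine and purely illustrative): with the single three-range table from the original post, each register maps to a fixed slot, so u0 lands at slot 32 regardless of how many CBVs/SRVs are actually populated:

```cpp
// Fixed slot layout for DescriptorTable(CBV(b0,16), SRV(t0,16), UAV(u0,16)):
// the ranges are laid out back to back, so every register has a fixed home.
constexpr int kCbvCount = 16;
constexpr int kSrvCount = 16;

constexpr int CbvSlot(int reg) { return reg; }                          // b0..b15 -> 0..15
constexpr int SrvSlot(int reg) { return kCbvCount + reg; }              // t0..t15 -> 16..31
constexpr int UavSlot(int reg) { return kCbvCount + kSrvCount + reg; }  // u0..u15 -> 32..47

// Byte offset into the descriptor heap, given the handle increment size the
// device reports (ID3D12Device::GetDescriptorHandleIncrementSize; the value
// 32 used below is just an example, it varies by hardware).
constexpr int ByteOffset(int slot, int incrementSize) {
    return slot * incrementSize;
}
```

Even if a draw only uses b0 and u0, u0 is still fetched from slot 32, so the 31 slots in between are reserved (and potentially copied) anyway.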

3 hours ago, piluve said:

On the other hand, I think it could be possible that compiling the shader + rs at the same time may allow for better optimisations.

No. The Xbox compiler goes from HLSL -> Xbox hardware instructions. The PC compiler goes from HLSL -> DXBC (bytecode), which the driver later translates into hardware instructions. By the time the driver sees the DXBC, they're guaranteed to have a root signature with it.

13 minutes ago, pcmaster said:

@piluve, you'll also have memory overhead/waste in the first table, because UAV u0, for example, will always be expected at offset 16+16=32, no matter whether you used the full 16+16 CBVs+SRVs or far fewer. Also, you have to copy all SRVs+UAVs into another 'version' of the table even if only one of the CBVs changed.

Thanks for pointing this out, I didn't read closely enough to see it was a single table with 3 ranges; I assumed it was 3 tables.

On 12/4/2017 at 10:43 AM, SoldierOfLight said:

As for within the shader, I don't see what optimizations could occur. The root signature indicates where to grab the descriptor, but the descriptor table is populated without knowledge of the root signature that will consume it, and the root parameter is set without knowledge of which PSO will consume it.

Basically I was thinking reduced register usage. My thinking is that when you compile a PSO against a root signature, it will realize that certain bindings aren't used and won't bother allocating a register for them, whether at point of use or as a "prefetch" (shader setup just prior to shader execution).

@SoldierOfLight BTW, if something in the root signature is unused in a particular PSO, is it still validated? If not, that could be a win too.

-potential energy is easily made kinetic-

Oh, of course, I'd expect a shader to only have instructions to fetch (or prefetch) data that will be used within the shader. However for some hardware, a "SetDescriptorTable" operation turns into (e.g.) multiple "SetDescriptor" operations, and since descriptor table setting happens without knowledge of a particular PSO, it needs to respect the bound root signature instead.

As for validation, again it depends on the tier and type of descriptor. On tier 3, nothing is required to be valid unless it's actually referenced in a shader (potentially even including conditional call flow, depending on compilation flags). But on tier 1, everything is required to be valid if it's declared in the root signature, regardless of whether it's referenced in a shader. And tier 2 is a middle ground, where SRVs/samplers are "bindless"/unvalidated, but CBVs and UAVs need to be valid.
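Those per-tier validation rules could be condensed into a small helper (my own illustrative code, not a D3D12 API; the names are assumptions):

```cpp
// Condensed restatement of the validation rules described above: returns
// whether a descriptor of the given type must point at valid data merely
// because the root signature declares it, even if no shader reads it.
enum class DescType { CBV, SRV, UAV, Sampler };

bool MustBeValidIfDeclared(int bindingTier, DescType type) {
    switch (bindingTier) {
        case 1:  return true;   // tier 1: everything declared must be valid
        case 2:  return type == DescType::CBV || type == DescType::UAV;
        default: return false;  // tier 3: only descriptors a shader references
    }
}
```

Under this reading, the original 16+16+16 table would require all 48 descriptors (plus 16 samplers) to be valid on tier 1 hardware, only the CBVs and UAVs on tier 2, and nothing unreferenced on tier 3.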

1 hour ago, SoldierOfLight said:

However for some hardware, a "SetDescriptorTable" operation turns into (e.g.) multiple "SetDescriptor" operations, and since descriptor table setting happens without knowledge of a particular PSO, it needs to respect the bound root signature instead.

Are you talking about tier 1 hardware in general, or a specific vendor and family? My guess is the former. Also, in a command list you set a root signature and then set a compatible PSO... is it possible that the driver, upon building/validating the command list, would realize it could skip some "SetDescriptor"s from a "SetDescriptorTable"?

-potential energy is easily made kinetic-

Correct, the former.

No, because the same root signature and the same descriptor tables can be used across multiple draw calls using different PSOs. Theoretically if the driver wanted to spend a lot of time searching and accumulating across all PSOs and draws that used a particular descriptor table, they could, but that goes against the spirit of D3D12, which is that the app knows what it's doing and is being explicit about the work it wants done, so that the driver doesn't need to waste time trying to be smarter than the app.

This topic is closed to new replies.
