When do we need Dx12 Descriptors to be contiguous?

Started by June 19, 2023 06:48 AM
5 comments, last by Gnollrunner 1 year, 6 months ago

In many tutorials and example code for DirectX 12 we can find ways to manage descriptors by creating a few descriptor heaps and manually allocating descriptors inside them (e.g. in MiniEngine the DescriptorHeap class, or the DescriptorAllocator in the excellent 3dgep tutorial).

So far every example I found on the web supports allocating n contiguous descriptors and therefore implements a non-trivial allocator to support it. However, all the call sites where several descriptors are allocated at once seem more like a convenience than a requirement, since each individual descriptor is then bound individually later in the code.

I don't really understand in which cases we need contiguous descriptors, other than for possible cache friendliness.

Is there a specific feature or case where it's required to have several descriptors next to each other?

The examples I found:

If they go in a descriptor table, then they need to be contiguous. When you build your root signature you tell it the ranges of CBVs, UAVs and SRVs that the root signature entry will point to. I believe the last one can be variable. You can also put descriptors directly in the root signature, but the root signature has a fixed size budget (64 DWORDs, with each root descriptor costing 2), so there is a limit to the number you can have.
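To make the contiguity requirement concrete, here is a minimal sketch of the address arithmetic behind a descriptor table. It does not use the real D3D12 headers: `DescriptorHandle`, `TableRange`, `DescriptorAt` and `TableSize` are hypothetical names for illustration, and `incrementSize` stands in for the value that `ID3D12Device::GetDescriptorHandleIncrementSize` returns for the heap type. The point is that binding a table passes only the start handle, so descriptor i is assumed to live at start + i * increment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a D3D12 descriptor handle: an opaque address
// into a descriptor heap.
struct DescriptorHandle {
    std::uint64_t ptr;
};

// Hypothetical description of one range in a table, e.g. "3 SRVs".
struct TableRange {
    std::uint32_t count;
};

// Address of descriptor `index` in a table that starts at `tableStart`.
// Only the start is bound, so every other descriptor is found by offset.
inline DescriptorHandle DescriptorAt(DescriptorHandle tableStart,
                                     std::uint32_t index,
                                     std::uint32_t incrementSize) {
    return DescriptorHandle{tableStart.ptr +
                            std::uint64_t(index) * incrementSize};
}

// Number of contiguous heap slots needed for a table made of these ranges,
// assuming the ranges are packed back to back.
inline std::uint32_t TableSize(const TableRange* ranges, std::size_t rangeCount) {
    assert(ranges != nullptr || rangeCount == 0);
    std::uint32_t total = 0;
    for (std::size_t i = 0; i < rangeCount; ++i) {
        total += ranges[i].count;
    }
    return total;
}
```

With, say, 2 CBVs, 1 UAV and 4 SRVs packed back to back, the table needs 7 adjacent slots, which is why an allocator that hands out n contiguous descriptors is useful here.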

I can describe how I use this in my engine if you like, but usage is quite dependent on what exactly you are trying to achieve. The main problem I found with DX12 is that there are a lot of features, but little guidance on how to use them to achieve your desired outcome.

Thank you for your reply.

I completely agree with your last point and I'm curious indeed to know how you manage descriptors. At the moment I'm simply trying to get familiar with the API and understand enough to design a small abstraction layer to simplify usage (a very lightweight equivalent of bgfx or the-forge), more as an exercise than anything else.

From what I understood after digging a little more into it today (and analysing the different codebases I linked earlier), I was kind of mixing several things into one. I take MiniEngine as the example, as I suppose it's how the API was meant to be used. There seems to be a DescriptorAllocator, a DescriptorHeap and a DynamicDescriptorHeap.

My understanding is that the DynamicDescriptorHeap in MiniEngine is used to build descriptor tables that match the root signature (it involves copying descriptors). So in that context, as you said, it makes sense to allocate/reserve several contiguous descriptors. DynamicDescriptorHeap seems to be a linear allocator, so it's no big deal to support such a feature. And it would make sense to use it with CBVs, SRVs and UAVs.

The DescriptorAllocator, however, allocates ID3D12DescriptorHeaps that are not “shader_visible” (cf. https://github.com/microsoft/DirectX-Graphics-Samples/blob/master/MiniEngine/Core/DescriptorHeap.cpp#L39). So I assume it is not meant to be used with descriptor tables (is it limited to RTV, DSV? Not sure about that). I also noticed that DescriptorAllocator in MiniEngine does not actually implement a complex variable-size allocator, unlike the other links I shared; it's just a classic linear allocation. So maybe there is no relevance to the fact that DescriptorAllocator supports n contiguous allocations in MiniEngine, other than that it was easy and convenient to implement. And other implementations provide it simply for ease of use (after all, it's much easier to handle a single ID + fixed offset for a group of descriptors than a different ID for each descriptor of the group).
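As a rough illustration of why contiguous allocation is nearly free in a linear allocator, here is a minimal sketch. It is not MiniEngine's code: `LinearDescriptorAllocator` and its members are hypothetical names, and it only tracks slot indices rather than real heap handles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical linear (bump) allocator over a fixed number of descriptor
// slots. It hands out slot indices; a real version would turn an index into
// a heap handle.
class LinearDescriptorAllocator {
public:
    static constexpr std::uint32_t kInvalid = 0xFFFFFFFFu;

    explicit LinearDescriptorAllocator(std::uint32_t capacity)
        : capacity_(capacity) {}

    // Returns the index of the first of `count` contiguous slots,
    // or kInvalid if the remaining space is too small.
    std::uint32_t Allocate(std::uint32_t count) {
        assert(count > 0);
        if (count > capacity_ - next_) {
            return kInvalid;
        }
        const std::uint32_t first = next_;
        next_ += count;  // contiguity falls out of bumping a single cursor
        return first;
    }

    // Frees everything at once, e.g. when a frame's tables are no longer used.
    void Reset() { next_ = 0; }

private:
    std::uint32_t capacity_;
    std::uint32_t next_ = 0;
};
```

Allocating one descriptor or n descriptors is the same cursor bump, so supporting n costs nothing extra. The trade-off is that individual blocks cannot be freed, only the whole range.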

Here's what I do. Keep in mind this is just what I came up with, and I'm not claiming it's an optimal solution. It's what works for me. I'm doing procedural generation, so I have to contend with a lot of creation and destruction at run time. If you're not working under the same constraints, there are likely better solutions. But anyway...

First I have frame slots. This is information for every frame that can be “in flight”. I support N frame slots but in practice this will be set at like 2 or 3. For every frame slot I have a descriptor heap. Technically you can use one heap for all slots and you can write to it while it's being used by another slot, as long as you don't write to the same area. However, to keep my sanity I decided to make things simple and have one descriptor heap per frame slot (so again like 2 or 3).

I have the concept of a “material”. A material is just a set of shaders and a set of resources that the shaders can use. The resources are what we are concerned with here. I call the resource part of a material a “data-set”. I currently support 0→7 CBVs, 0→7 UAVs and 0→15 SRVs in each data-set. Each data-set is basically a descriptor table in the heap(s).

The tricky part is managing everything. I have a descriptor allocator which allocates descriptor space in powers of two (1,2,4,8,16,32). When you deallocate, it puts space on a free-list for the size you are dealing with, so it can be reused by another data-set later. So for instance, if I need 6 descriptors it would check the size 8 free list to see if there is anything free. If not it adds space to the end of the used area of the descriptor heap. Note in this case there are 2 wasted descriptor slots.
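The power-of-two scheme described above could be sketched roughly like this. It is a guess at the shape, not the actual engine code: `BucketedDescriptorAllocator` and its members are hypothetical names, it works on slot indices only, and it ignores heap capacity and growth.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical allocator that rounds every request up to a power of two
// (1, 2, 4, 8, 16, 32) and keeps one free list per size.
class BucketedDescriptorAllocator {
public:
    static constexpr std::uint32_t kBucketCount = 6;  // sizes 1..32

    // Smallest bucket whose block size (1 << bucket) can hold `count`.
    static std::uint32_t BucketFor(std::uint32_t count) {
        assert(count >= 1 && count <= 32);
        std::uint32_t bucket = 0;
        while ((1u << bucket) < count) {
            ++bucket;
        }
        return bucket;
    }

    // Returns the first slot index of a contiguous block for `count` descriptors.
    std::uint32_t Allocate(std::uint32_t count) {
        const std::uint32_t bucket = BucketFor(count);
        std::vector<std::uint32_t>& freeList = freeLists_[bucket];
        if (!freeList.empty()) {
            // Reuse a previously freed block of exactly this size.
            const std::uint32_t first = freeList.back();
            freeList.pop_back();
            return first;
        }
        // Otherwise extend the used area at the end of the heap.
        const std::uint32_t first = usedEnd_;
        usedEnd_ += 1u << bucket;
        return first;
    }

    // Puts the block back on the free list for its size.
    void Free(std::uint32_t first, std::uint32_t count) {
        freeLists_[BucketFor(count)].push_back(first);
    }

    std::uint32_t UsedEnd() const { return usedEnd_; }

private:
    std::vector<std::uint32_t> freeLists_[kBucketCount];
    std::uint32_t usedEnd_ = 0;
};
```

A request for 6 descriptors lands in the size-8 bucket and wastes 2 slots, matching the example in the post. Because every block in a bucket has the same size, freeing and reusing never needs splitting or merging.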

As I said, I have a descriptor heap for each frame slot, which means most of the time they will be mirrors of each other. However, if you delete a material, its data-set still might be in use by an in-flight frame, so you can't just destroy everything at once.

What I have is a command queue which takes two types of commands: add data-set and delete data-set. When you want to add a material, you add a command on the queue to do the add of its data-set. Commands on the queue get executed before each frame. Note, your add command also has to get executed for subsequent frames, so it doesn't get removed from the queue right away. It has a countdown counter. When that reaches zero, we know it's been added to the descriptor heaps for each frame slot, and so we can remove the command from the queue. The same thing happens with delete data-set commands.
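The countdown idea can be sketched as follows. This is only an illustration under assumptions: `DataSetCommandQueue`, `PendingCommand` and `ExecuteForFrame` are hypothetical names, the `apply` callback stands in for whatever actually writes or releases descriptors in a frame slot's heap, and frame slots are assumed to be visited round-robin so that N consecutive frames touch each of the N slots once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

enum class CommandType { AddDataSet, DeleteDataSet };

struct PendingCommand {
    CommandType type;
    std::uint32_t dataSetId;       // hypothetical identifier for the data-set
    std::uint32_t slotsRemaining;  // frame slots that still have to apply it
};

class DataSetCommandQueue {
public:
    explicit DataSetCommandQueue(std::uint32_t frameSlotCount)
        : frameSlotCount_(frameSlotCount) {
        assert(frameSlotCount > 0);
    }

    void Push(CommandType type, std::uint32_t dataSetId) {
        pending_.push_back(PendingCommand{type, dataSetId, frameSlotCount_});
    }

    // Call once before rendering a frame. `apply(frameSlot, command)` performs
    // the change on that slot's descriptor heap. A command is dropped only
    // after it has been applied once per frame slot.
    template <typename ApplyFn>
    void ExecuteForFrame(std::uint32_t frameSlot, ApplyFn apply) {
        std::size_t kept = 0;
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            PendingCommand cmd = pending_[i];
            apply(frameSlot, cmd);
            if (--cmd.slotsRemaining > 0) {
                pending_[kept++] = cmd;  // still needed by other slots
            }
        }
        pending_.resize(kept);
    }

    std::size_t PendingCount() const { return pending_.size(); }

private:
    std::uint32_t frameSlotCount_;
    std::vector<PendingCommand> pending_;
};
```

With 3 frame slots, a pushed command is applied on three consecutive frames and then disappears from the queue, at which point every per-slot heap reflects the change.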

One advantage of the separate descriptor heap for each frame slot is that we can grow a heap if it completely runs out of space, and it won't mess up any other in-flight frames.

Now we have to deal with the root signatures. My root signatures have a couple of slots for general-purpose constant data that's used in all rendering. The final slot is a pointer into the table for the material's data-set we are currently using. The tricky part is that this slot has to be configured for the number of CBVs, UAVs and SRVs in the material we are rendering with. This means I would normally have to create and destroy root signatures a lot, which I wanted to avoid.

What I have is a root signature pool. Since the last one (SRVs) can be variable, all I have to worry about is variations in the CBV count and UAV count, which is 8 x 8 or 64. So I have a table of 64 possible root signatures (Edit: Actually, I think it's 128 to handle the case where there are no SRVs at all). They aren't actually all created, however. Each root signature is created the first time it is needed and then cached for future use, so in general most of the pool will be empty.
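A lazily filled pool like the one described could look roughly like this. It is a sketch under assumptions: `RootSignaturePool` and `RootSignature` are hypothetical names, and the `RootSignature` struct is a plain stand-in for a real object that would be built with `ID3D12Device::CreateRootSignature`. The 8 x 8 x 2 layout follows the 128-entry figure from the edit in the post.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for a real root signature object.
struct RootSignature {
    std::uint32_t cbvCount;
    std::uint32_t uavCount;
    bool hasSrvs;
};

class RootSignaturePool {
public:
    static constexpr std::uint32_t kCbvVariants = 8;  // 0..7 CBVs
    static constexpr std::uint32_t kUavVariants = 8;  // 0..7 UAVs

    // Returns the cached signature for this layout, creating it on first use.
    const RootSignature& Get(std::uint32_t cbvCount, std::uint32_t uavCount,
                             bool hasSrvs) {
        assert(cbvCount < kCbvVariants && uavCount < kUavVariants);
        const std::size_t index =
            (cbvCount * kUavVariants + uavCount) * 2 + (hasSrvs ? 1 : 0);
        std::unique_ptr<RootSignature>& entry = pool_[index];
        if (!entry) {
            // A real implementation would serialize and create the root
            // signature here; this sketch just records the layout.
            entry.reset(new RootSignature{cbvCount, uavCount, hasSrvs});
            ++createdCount_;
        }
        return *entry;
    }

    std::uint32_t CreatedCount() const { return createdCount_; }

private:
    // 8 x 8 x 2 = 128 slots, all empty until requested.
    std::array<std::unique_ptr<RootSignature>, kCbvVariants * kUavVariants * 2> pool_;
    std::uint32_t createdCount_ = 0;
};
```

Because the key space is tiny and dense, a flat array indexed by the counts avoids any hashing, and filling it eagerly at startup would be a small change if first-use creation ever showed up in a profile.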

This is all from memory, so I hope I didn't miss anything. As you can see this stuff can get a bit sophisticated, but again you may not need all the features I do, or you may need different features.

Thanks for sharing the details.

Having bucketed free-lists for the descriptor allocator is a smart move. That takes away a lot of the complexity.

I have some follow-up questions:

  • Did you notice significant spikes when creating a new root signature that is not yet in your pool? I have yet to get around to profiling anything, so I don't know whether it could be an option.
  • Did you write some abstraction layer for all of what you describe, or is it fairly tied to the DX12 API? Especially around root signatures and handling CBVs, SRVs, UAVs etc.

  • Did you notice significant spikes when creating a new root signature that is not yet in your pool? I have yet to get around to profiling anything, so I don't know whether it could be an option.

I haven't seen any spikes for the root signature creation. But keep in mind root signatures aren't created very often. In fact, they will mostly be created when the program starts running, and after that it will be pretty rare since they are reused. If it was a problem, I could just fill the table when the program starts, but I haven't noticed anything.

However, I did notice BIG spikes when deleting mesh resources. This might be specific to my usage, since my LOD updates may involve deleting dozens of meshes between frames. My solution was to implement a “mesh cemetery”. Instead of deleting meshes in the render thread, they are handed off to a separate thread that is normally waiting. When it gets a signal, it starts up (I use a pair of semaphores for communication). What it actually does is release smart pointers and let everything that has no references delete itself. Now there are no spikes as near as I can tell, but I had to rework the cemetery a couple of times to get things smooth. This was mostly because I did some stupid things, however.
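For anyone curious what such a cemetery might look like, here is a minimal sketch. It is not the poster's implementation: `MeshCemetery`, `Mesh` and `Bury` are hypothetical names, and it swaps the pair of semaphores for a mutex plus `std::condition_variable`, purely to keep the example within C++11.

```cpp
#include <cassert>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

struct Mesh {};  // hypothetical stand-in for a mesh owning GPU resources

class MeshCemetery {
public:
    MeshCemetery() : worker_([this] { Run(); }) {}

    ~MeshCemetery() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_one();
        worker_.join();  // the worker drains any remaining meshes first
    }

    // Called from the render thread: hands over ownership and returns quickly.
    void Bury(std::vector<std::shared_ptr<Mesh>> meshes) {
        assert(!meshes.empty());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            for (std::shared_ptr<Mesh>& mesh : meshes) {
                graves_.push_back(std::move(mesh));
            }
        }
        wake_.notify_one();
    }

private:
    void Run() {
        std::unique_lock<std::mutex> lock(mutex_);
        for (;;) {
            wake_.wait(lock, [this] { return stopping_ || !graves_.empty(); });
            std::vector<std::shared_ptr<Mesh>> batch;
            batch.swap(graves_);
            const bool stop = stopping_;
            lock.unlock();
            batch.clear();  // drop references outside the lock; unreferenced meshes die here
            if (stop) {
                return;
            }
            lock.lock();
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::vector<std::shared_ptr<Mesh>> graves_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};
```

The render thread only pays for moving a few pointers under a short lock. The actual destructors run on the worker, which is where the expensive releases would otherwise have caused the spike.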

  • Did you write some abstraction layer for all of what you describe, or is it fairly tied to the DX12 API? Especially around root signatures and handling CBVs, SRVs, UAVs etc.

Yes, of course. IMO it's pretty much insane to code most low-level APIs directly into a game. This is especially true of DX12, but it applies to OpenGL etc. The whole thing is abstracted, but since I did it myself, it's optimized around my usage. Also, I'm free to add features when I feel something's lacking. I wouldn't say anything is tied to DX12, but I certainly kept DX12 in mind when I wrote my API. Probably if I wrote an OpenGL driver for it there would be a lot of NOPs, since it doesn't really have the concept of in-flight frames.