Hello!
I need some guidance on CommandQueue/CommandAllocator/CommandList management. In my current project I have a few "systems" that need to execute graphics commands, such as rendering terrain, rendering water, rendering particles, etc. Right now my project is very simple, so I'm not even using command lists during initialization. However, that is becoming necessary.
Currently I'm just using a single command queue with a ring buffer of 2 command allocators that are recorded through a single command list. Each time I render the scene, one command allocator and the command list are reset and then recorded. After all commands have been recorded, the list is executed and the swap chain is flipped. Here's some pseudo-code:
void Initialize()
{
    [...]
    device->CreateCommandQueue(...);
    device->CreateCommandAllocator(...); // commandAllocator[0]
    device->CreateCommandAllocator(...); // commandAllocator[1]
    device->CreateCommandList(...);      // the list is created in the recording state
    commandList->Close();                // so close it; Render() starts with a Reset()
    [...]
}
void Render()
{
    WaitForPreviousFrame();
    commandAllocator[i]->Reset(); // i = swapChain->GetCurrentBackBufferIndex()
    commandList->Reset(...);
    RecordAllCommands();
    commandList->Close();
    commandQueue->ExecuteCommandLists(...);
    Signal(...);
    swapChain->Present(...);
}
The issue with this is that I cannot record commands during initialization, and with this design it's also quite cumbersome to execute command lists multiple times during one frame, since the command allocator ring buffer is tied to the swap chain's back buffer index. So I started to think about how I should redesign this, preferably with future support for threading as well. I've thought about it for quite some time now and can't come up with a good solution.
One idea is that each system should have its own command list with a ring buffer of 2 command allocators, record into it, and then use a global command queue to execute the list. This works well from a parallelism point of view, but the issue is that each system now needs to check individually whether the GPU is done with its commands before resetting the command allocator. This feels like a huge waste of CPU time.
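To make this first idea a bit more concrete, here is a rough, self-contained sketch of what I have in mind. The D3D12 objects are replaced by stand-ins so it compiles on its own: FakeFence, FakeAllocator and SystemCommands are names I made up for illustration, not D3D12 API. It assumes every system remembers the fence value that was signaled after its last submission and compares it against the fence's completed value before reusing an allocator, which is exactly the per-system check I'm worried about.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for ID3D12Fence: only tracks the last value the "GPU" has reached.
struct FakeFence {
    std::uint64_t completed = 0;
    std::uint64_t GetCompletedValue() const { return completed; }
};

// Stand-in for a command allocator plus the fence value of its last submission.
struct FakeAllocator {
    std::uint64_t lastSubmitValue = 0; // 0 means "never submitted"
    int resetCount = 0;
    void Reset() { ++resetCount; }
};

// One instance per system (terrain, water, particles, ...). Hypothetical helper.
class SystemCommands {
public:
    // Returns true if the next allocator in the ring is free and was reset.
    bool TryBeginRecording(const FakeFence& fence) {
        FakeAllocator& allocator = allocators_[index_];
        if (fence.GetCompletedValue() < allocator.lastSubmitValue) {
            return false; // the GPU may still be reading from this allocator
        }
        allocator.Reset();
        recording_ = true;
        return true;
    }

    // Call after the list was executed on the global queue; signalValue is the
    // value the queue signals on the fence right after that execution.
    void EndAndSubmit(std::uint64_t signalValue) {
        assert(recording_);
        allocators_[index_].lastSubmitValue = signalValue;
        index_ = (index_ + 1) % allocators_.size();
        recording_ = false;
    }

    int TotalResets() const {
        return allocators_[0].resetCount + allocators_[1].resetCount;
    }

private:
    std::array<FakeAllocator, 2> allocators_{};
    std::size_t index_ = 0;
    bool recording_ = false;
};
```

With this, a system would call TryBeginRecording() before recording and EndAndSubmit() after its list has been executed. In the sketch the check is just one integer comparison per system, but I don't know how that compares to the real cost of polling a fence.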
Another idea is to have only one global command list that is already available while the other systems initialize. After initialization this command list gets executed, before entering the game loop. During the game loop, the global command list gets executed once per frame, as I do now. However, there are 2 issues with this. First of all, some systems might want to execute their commands earlier than at the end of each frame. Secondly, if multiple threads record into the same command list, we might get a situation like this:
commandList->SetPipelineState(pipelineState1); // Thread 1 wants pipelineState1.
commandList->SetPipelineState(pipelineState2); // Thread 2 wants pipelineState2.
[...]
commandList->DrawInstanced(...); // Thread 1 expects pipelineState1 to be set...
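To illustrate the ordering problem without real threads, here is a small single-threaded replay of the snippet above. FakeCommandList, RecordShared and RecordSeparate are made-up stand-ins, not D3D12 API, and the pipeline states are just strings. It assumes a command list simply remembers the most recently set pipeline state, and it compares one shared list against one list per "thread".

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for a command list: remembers the current pipeline state and logs
// which state each draw was recorded with.
struct FakeCommandList {
    std::string currentPipelineState;
    std::vector<std::string> drawsRecordedWith;

    void SetPipelineState(const std::string& pso) { currentPipelineState = pso; }

    void DrawInstanced() {
        assert(!currentPipelineState.empty()); // a draw needs some state set
        drawsRecordedWith.push_back(currentPipelineState);
    }
};

// Replays the interleaving from the snippet on a single shared list.
inline std::vector<std::string> RecordShared() {
    FakeCommandList shared;
    shared.SetPipelineState("pipelineState1"); // "thread 1"
    shared.SetPipelineState("pipelineState2"); // "thread 2"
    shared.DrawInstanced();                    // "thread 1" draws
    return shared.drawsRecordedWith;
}

// Same calls in the same order, but each "thread" records into its own list.
inline std::vector<std::string> RecordSeparate() {
    FakeCommandList list1;
    FakeCommandList list2;
    list1.SetPipelineState("pipelineState1"); // "thread 1"
    list2.SetPipelineState("pipelineState2"); // "thread 2"
    list1.DrawInstanced();                    // "thread 1" draws
    return list1.drawsRecordedWith;
}
```

In the shared case the draw ends up recorded with pipelineState2, while with separate lists thread 1's draw keeps pipelineState1. That is why a single global list doesn't look workable to me once more than one thread records.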
I'm out of ideas for how to implement this in a simple and elegant way. Or maybe I'm going about this entirely wrong. Basically, what I need is:
- Systems should be able to record commands already during initialization.
- At least during initialization, it should be possible to execute commands in multiple steps and even wait for the GPU to complete them.
- When rendering the scene, it would be nice if multiple threads could record commands in parallel.
Does anyone have a good solution to this problem? What is the AAA game engine way of dealing with this?