Hi,
I am writing a linear allocator of per-frame constants using the DirectX 11.1 API. My plan is to replace the traditional constant allocation strategy, where most of the work is done by the driver behind my back, with a manual one inspired by the DirectX 12 and Vulkan APIs.
In brief, the allocator maintains a list of 64 KB pages; each page owns a constant buffer managed as a ring buffer and keeps a history of the N previous frames. At the beginning of a new frame, the allocator retires the frames that the GPU has finished processing and frees the corresponding space in each page. I use DirectX 11 queries to detect when a frame is complete, and the ID3D11DeviceContext1::VS/PSSetConstantBuffers1 methods to bind constant buffers with an offset.
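To make the bookkeeping concrete, here is a minimal CPU-only sketch of one such page. It is not the code from the paste linked below; the names RingPage, Allocation, BeginFrame, Alloc and RetireUpTo are made up for illustration, and no D3D11 call is involved. It assumes every allocation is rounded up to 256 bytes (16 constants of 16 bytes each), since the *SetConstantBuffers1 methods take offsets and counts in multiples of 16 constants.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

constexpr uint32_t kPageSize  = 64 * 1024;  // bytes per page, as described above
constexpr uint32_t kConstant  = 16;         // one shader constant is 16 bytes
constexpr uint32_t kAlignment = 256;        // 16 constants: assumed binding granularity

// Hypothetical result type: values expressed in constants, not bytes.
struct Allocation {
    uint32_t firstConstant;
    uint32_t numConstants;
};

// Hypothetical page: a ring buffer plus the bytes consumed by each in-flight frame.
class RingPage {
public:
    void BeginFrame(uint64_t frame) { history_.push_back({frame, 0}); }

    // Returns std::nullopt when the page cannot satisfy the request.
    std::optional<Allocation> Alloc(uint32_t sizeBytes) {
        if (history_.empty() || sizeBytes == 0 || sizeBytes > kPageSize) return std::nullopt;
        const uint32_t size = (sizeBytes + kAlignment - 1) & ~(kAlignment - 1);
        if (used_ == 0) head_ = 0;  // nothing live: restart at the beginning
        // If the block would straddle the end of the page, skip the remainder.
        const uint32_t pad = (head_ + size > kPageSize) ? kPageSize - head_ : 0;
        if (used_ + pad + size > kPageSize) return std::nullopt;
        const uint32_t offset = (head_ + pad) % kPageSize;
        head_ = (offset + size) % kPageSize;
        used_ += pad + size;
        history_.back().bytes += pad + size;  // charge the padding to this frame too
        return Allocation{offset / kConstant, size / kConstant};
    }

    // Frees the space of every frame up to and including completedFrame.
    void RetireUpTo(uint64_t completedFrame) {
        while (!history_.empty() && history_.front().frame <= completedFrame) {
            assert(used_ >= history_.front().bytes);
            used_ -= history_.front().bytes;
            history_.pop_front();
        }
    }

private:
    struct FrameRange { uint64_t frame; uint32_t bytes; };
    std::deque<FrameRange> history_;  // oldest frame at the front
    uint32_t head_ = 0;               // next write position in bytes
    uint32_t used_ = 0;               // bytes owned by frames not yet retired
};
```

In a real allocator the returned pair would presumably be what gets passed as pFirstConstant and pNumConstants, and a failed Alloc would fall through to the next page or create a new one.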
The new allocator appears to be working but I am not 100% confident it is actually correct. In particular:
1) It relies on queries, which I am not too familiar with. Are they 100% reliable?
2) It maps and immediately unmaps the constant buffer of each page at the beginning of a new frame, then keeps writing through the mapped pointer as the frame is built. In pseudocode:
BeginFrame:
    page.data = context.Map(page.buffer)
    context.Unmap(page.buffer)
RenderFrame:
    Alloc(size, initData)
        ...
        memcpy(page.data + page.start, initData, size)
    Alloc(size, initData)
        ...
        memcpy(page.data + page.start, initData, size)
(Note: calling Unmap at the end of the frame instead is not an option, because a buffer that is still mapped cannot be bound and doing so triggers an error in the debug layer.)
Is this valid?
3) I don't fully understand how many frames I should keep in the history. My intuition says it should equal the maximum latency reported by IDXGIDevice1::GetMaximumFrameLatency, which is 3 on my machine. However, while this value works fine in a unit test, in a more complex demo I need to set it manually to 5, otherwise the allocator starts overwriting previous frames that have not completed yet. Shouldn't the swap chain's Present method block the CPU in this case?
4) Should I expect this approach to be more efficient than the one managed by the driver? I don't have meaningful profiling data yet.
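Regarding question 3, one alternative I am considering is to not hard-code the history length at all and let it grow until the query for the oldest frame reports completion. Below is a small CPU-only sketch of that idea. FrameHistory and its methods are hypothetical names, and the isComplete callback merely stands in for polling an event query (ID3D11DeviceContext::GetData returning S_OK); it is an assumption of the sketch, not something taken from my allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical tracker: keeps every submitted frame until the caller-supplied
// predicate says the GPU has finished it, instead of assuming a fixed depth N.
class FrameHistory {
public:
    explicit FrameHistory(std::function<bool(uint64_t)> isComplete)
        : isComplete_(std::move(isComplete)) {
        assert(isComplete_);  // a predicate is required
    }

    // Retires finished frames in submission order, then opens a new frame.
    // Returns the newest frame known to be complete (0 if none yet).
    uint64_t BeginFrame() {
        while (!pending_.empty() && isComplete_(pending_.front())) {
            lastCompleted_ = pending_.front();
            pending_.pop_front();
        }
        pending_.push_back(++current_);
        return lastCompleted_;
    }

    // Number of frames not yet retired, including the one being recorded.
    std::size_t InFlight() const { return pending_.size(); }

private:
    std::function<bool(uint64_t)> isComplete_;
    std::deque<uint64_t> pending_;  // oldest frame at the front
    uint64_t current_ = 0;          // index of the frame being recorded
    uint64_t lastCompleted_ = 0;
};
```

The value returned by BeginFrame is what each page would use to decide how much space to release, and logging InFlight over time would at least show how deep the queue really gets in the demo compared with the unit test.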
Is anybody familiar with the approach described above who can answer my questions and discuss the pros and cons of this technique based on their experience?
For reference, I've uploaded the (WIP) allocator code at https://paste.ofcode.org/Bq98ujP6zaAuKyjv4X7HSv. Feel free to adapt it for your engine, and please let me know if you spot any mistakes.
Thanks
Stefano Lanza