CONUNDRUM
I'm very new to DirectX C++ programming. I come from a Unity background, but I'm trying to optimize my procedural mesh generation system to run on the graphics card with the new Unity ECS and Jobs system. For that I need native access to the render API, and I've settled on DirectX 11. I spent the last week or so learning DirectX 11 and implementing Compute Shader marching cubes to run in native code in a multi-threaded environment. Everything seems to be working, but because I don't have much experience implementing a multi-threaded rendering system, I thought I'd ask here whether I'm doing it right. I plan on running my own tests, but nothing beats someone with years of experience. If anyone has any advice to offer, that would be amazing :).
IMPLEMENTATION
For my rendering system, I knew I had to minimize blocking between the processing job threads and the rendering thread if I wanted to get any performance out of this at all. I decided to follow a double-buffer design similar to modern rendering APIs: I have one front queue of draw calls being rendered by the rendering thread, and a back queue that the processing threads append to. At the end of the frame, on the main thread, I "present" the back queue to the front queue by swapping their pointers. I of course do this inside a Windows CRITICAL_SECTION lock. In the render thread I take the same CRITICAL_SECTION while I access the front queue. I copy the contents of the front queue into a dynamic buffer, release the lock, and then render from the copy. I copy the buffer instead of rendering directly from it because I want to minimize how long the main thread's present task can be blocked.
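A minimal sketch of the queue scheme described above, not the actual implementation. It uses `std::mutex` as a portable stand-in for CRITICAL_SECTION, and the names `DrawCall`, `DrawQueue`, `Submit`, `Present` and `SnapshotFront` are hypothetical. The post does not say how the processing threads synchronize their writes to the back queue, so the second mutex is an assumption.

```cpp
#include <mutex>
#include <vector>

// Hypothetical draw-call record; a real one would reference D3D11 resources.
struct DrawCall {
    int meshId;
};

class DrawQueue {
public:
    // Processing threads: append a draw call to the back queue.
    // Assumption: back-queue writes are guarded by their own mutex.
    void Submit(const DrawCall& call) {
        std::lock_guard<std::mutex> lock(backMutex_);
        back_.push_back(call);
    }

    // Main thread, end of frame: publish the back queue as the new front.
    // Only this function takes both locks, always front first, so the
    // lock order cannot invert.
    void Present() {
        std::lock_guard<std::mutex> frontLock(frontMutex_);
        std::lock_guard<std::mutex> backLock(backMutex_);
        front_.swap(back_);  // swaps internal pointers, no element copies
        back_.clear();       // drop last frame's calls, keep the capacity
    }

    // Render thread: copy the front queue while holding the lock, then
    // render from the returned copy with the lock already released.
    std::vector<DrawCall> SnapshotFront() {
        std::lock_guard<std::mutex> lock(frontMutex_);
        return front_;
    }

private:
    std::mutex frontMutex_;
    std::mutex backMutex_;
    std::vector<DrawCall> front_;
    std::vector<DrawCall> back_;
};
```

The lock in `Present` is held only for a swap and a clear, while the lock in `SnapshotFront` is held for a full copy, so the copy is what determines how long the main thread can be blocked.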
On top of this, I also have to guarantee that the resources in the front queue that are being rendered are not destroyed or corrupted while they are being accessed. To do this I implemented my own thread-safe pinning system. It's like a reference counting system, except it deletes the data whenever I tell it to in the processing thread, but it does not delete the object holding the data, so I can tell any other thread that is attempting to acquire the lock that the data is gone. When all pins are released and the object's GPU data has been killed, the holding object is destroyed. I use another CRITICAL_SECTION per renderable object to pin, unpin, and generally access and modify this holder data.
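A minimal sketch of one way such a pinning holder could look, not the actual implementation. It uses `std::mutex` as a stand-in for the per-object CRITICAL_SECTION, and `GpuData`, `PinnedResource`, `TryPin`, `Unpin` and `Kill` are hypothetical names. Two assumptions differ from, or go beyond, the description above: the sketch defers freeing the data until the last pin is released, and it leaves destroying the holder to whichever caller is told it is safe.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for GPU data (e.g. a vertex buffer).
struct GpuData {
    int vertexCount;
};

class PinnedResource {
public:
    explicit PinnedResource(std::unique_ptr<GpuData> data)
        : data_(std::move(data)) {}

    // Render thread: pin the data so it stays alive while in use.
    // Returns nullptr if the data has already been killed.
    GpuData* TryPin() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (killed_) return nullptr;
        ++pins_;
        return data_.get();
    }

    // Release one pin. Returns true when the holder may now be destroyed.
    bool Unpin() {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(pins_ > 0);
        --pins_;
        return ReleaseIfUnused();
    }

    // Processing thread: mark the data dead. New pins are refused at once;
    // the data is freed when no pins remain (assumption, see lead-in).
    bool Kill() {
        std::lock_guard<std::mutex> lock(mutex_);
        killed_ = true;
        return ReleaseIfUnused();
    }

private:
    // Caller must hold mutex_.
    bool ReleaseIfUnused() {
        if (killed_ && pins_ == 0) {
            data_.reset();
            return true;
        }
        return false;
    }

    std::mutex mutex_;
    std::unique_ptr<GpuData> data_;
    int pins_ = 0;
    bool killed_ = false;
};
```

The sketch does not show how the holder itself is kept alive for threads that may still call `TryPin` on it. That depends on who owns the holder and is not specified here.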
PRESENT QUEUE
EXECUTE DRAW
QUESTIONS
1.) Is it reasonable to copy the whole front buffer, element by element, into a dynamic array and delete it after rendering? Will this be too much allocation? Would it be better to just lock the whole front queue while I am rendering and render directly from it?
2.) Is it reasonable to use a CRITICAL_SECTION for every renderable object for pinning and unpinning? Is that too many critical sections? Is there a possible workaround with atomic functions, and would there be a way to do automated pinning and unpinning so I can use std::copy instead of manually going element by element and pinning the data? I feel more secure knowing exactly when the data is pinned and unpinned, as well as when it is alive or dead. (BTW, the pin and unpin methods also unlock the CS; that's why you see a lock with no unlock.)
3.) Is there a better method for threading that does not require 3 buffers at once, or maybe just a better way of preserving the integrity of GPU data while it's being rendered in the render thread?
4.) Am I making any noob D3D11 mistakes? This is my first time using it. Everything seems to be working in the Unity Editor, but I want to be sure before I continue and build on top of this.