I'm working on a GPU-based particle system in DX12. Each frame I run a compute shader that updates all the particles and appends the indices of the active particles to an index buffer. When it's time to render, I use this index buffer as a lookup table so that each particle can find its own data.
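A rough CPU-side C++ sketch of the append step described above, for illustration only (this is not DX12 or HLSL code). It assumes a hypothetical `Particle` struct whose `lifetime` field marks a particle as dead when it is `<= 0`. `std::atomic::fetch_add` stands in for the UAV counter increment, which on the GPU would come from `RWStructuredBuffer::IncrementCounter()` or `AppendStructuredBuffer::Append()`; because each live particle receives a unique slot, no two threads write the same element of the index list.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical particle record; a real layout will differ.
struct Particle {
    float position[3];
    float lifetime; // <= 0 means the particle is dead
};

// CPU stand-in for the compute pass: each worker thread handles a strided
// subset of particles and appends the index of every live one.
std::vector<uint32_t> CompactAlive(const std::vector<Particle>& particles,
                                   unsigned numThreads) {
    std::vector<uint32_t> indices(particles.size());
    std::atomic<uint32_t> counter{0}; // plays the role of the UAV counter

    auto worker = [&](unsigned t) {
        for (std::size_t i = t; i < particles.size(); i += numThreads) {
            if (particles[i].lifetime > 0.0f) {
                // Reserve a unique slot, then write into it without a race.
                uint32_t slot = counter.fetch_add(1, std::memory_order_relaxed);
                indices[slot] = static_cast<uint32_t>(i);
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) threads.emplace_back(worker, t);
    for (auto& th : threads) th.join(); // join makes all writes visible

    indices.resize(counter.load()); // final count = number of live particles
    return indices;
}

// Render-side lookup: draw slot k finds its particle through the index list.
const Particle& LookupParticle(const std::vector<Particle>& particles,
                               const std::vector<uint32_t>& indices,
                               uint32_t k) {
    assert(k < indices.size());
    return particles[indices[k]];
}
```

The order of the compacted indices depends on thread scheduling, just as it does with an append buffer on the GPU, so nothing downstream should rely on it.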
At least that's the plan. The problem is that I don't know how to handle the indirect draw. As I see it, I have two options, and each has a problem:
- If I use the index buffer's UAV counter as the vertex count for the indirect draw, I get one vertex per particle, so I'd need a geometry shader to expand each vertex into a quad. From what I've heard, geometry shaders are slow and best avoided.
- If I instead use a separate counter that I increment by 4 for each particle to be rendered, I can use division and modulo on the vertex ID to build the quad in the vertex shader. But since every compute shader thread may update this counter, I'd need some kind of atomic write to avoid a data race, and I have no idea how to do that.
So I'd really appreciate some guidance on which option is best and, if it's option 2, how to prevent a data race when writing to the counter buffer. Or is there a much better way to handle this, and I'm just overcomplicating it?