Indirect Draw For Particle System In DX12

fighting_falcon93 · 2020-04-30T18:06:51

I'm working on a GPU-based particle system in DX12. Each frame I run a compute shader that updates all the particles and appends the indices of the active particles to an index buffer. When it's time to render, I use this index buffer as a lookup table so that each particle can find its own data. At least that's the plan.

The problem is that I don't know how I should handle the indirect drawing. As I see it, I currently face 2 problems:

1. If I use the UAV counter from the index buffer, then I will make a draw call with 1 vertex per particle. This means I'd need to use the geometry shader to expand this single vertex into a quad. And from what I've heard, the geometry shader is quite slow and best avoided.

2. If I instead use a separate counter that I increase by 4 for each particle that should be rendered, then I can use modulo and division to create a quad in the vertex shader. But in order to prevent a data race between the threads in the compute shader, I will probably need to use some kind of atomic writes into this buffer, which I have no idea how to do.

So I'd really appreciate some guidance on what the best option here is, and if it would be option 2, then how I could prevent a data race when writing to the counter buffer. Or maybe there's a much better way to handle this and I'm just overcomplicating it?
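For what it's worth, option 2 could look roughly like the sketch below. It is plain C++ standing in for the two shaders, not actual HLSL: `std::atomic::fetch_add` plays the role that an `InterlockedAdd` on the counter would play in the compute shader, and the names `DrawArgs`, `appendIfActive`, `QuadVertex` and `decodeVertex` are made up for illustration. The corner numbering is an arbitrary choice, and how the four corners are turned into triangles depends on the topology, which is left out here.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

// CPU stand-in for the indirect draw arguments the compute pass fills in.
// The field order mirrors D3D12_DRAW_ARGUMENTS; the struct itself is hypothetical.
struct DrawArgs {
    std::atomic<uint32_t> vertexCountPerInstance{0};
    uint32_t instanceCount = 1;
    uint32_t startVertexLocation = 0;
    uint32_t startInstanceLocation = 0;
};

// What one compute thread would do for its particle.
// fetch_add returns the value *before* the add, so every caller gets a
// distinct range of 4 vertices even when many threads run at once.
void appendIfActive(uint32_t particleIndex, bool active,
                    DrawArgs& args, std::vector<uint32_t>& activeIndices) {
    if (!active) return;
    const uint32_t firstVertex = args.vertexCountPerInstance.fetch_add(4);
    activeIndices[firstVertex / 4] = particleIndex;  // one slot per quad
}

struct QuadVertex {
    uint32_t particleSlot;  // which entry of activeIndices to look up
    uint32_t corner;        // 0..3 within the quad
    float u, v;             // corner offset in [0, 1]
};

// What the vertex shader would do with the incoming vertex ID:
// division picks the particle, modulo picks the corner.
QuadVertex decodeVertex(uint32_t vertexId) {
    QuadVertex out;
    out.particleSlot = vertexId / 4;
    out.corner = vertexId % 4;
    out.u = static_cast<float>(out.corner & 1u);
    out.v = static_cast<float>(out.corner >> 1u);
    return out;
}
```

After all threads have run, `vertexCountPerInstance` holds 4 times the number of active particles, which is the value the indirect draw would consume. The order of the entries in `activeIndices` depends on which thread reaches the counter first, so on a GPU it should not be relied upon.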


Started by fighting_falcon93 April 29, 2020 07:17 PM

34 comments, last by NikiTo 4 years, 9 months ago

NikiTo

245

April 30, 2020 12:24 PM

JoeJ said:
To be clear, you'd need to show from where you read your data and how you index it.

particle.xyzw = bufferInVRAM[(i * 64) + flatThreadID];

No problem - all the data is parsed and pushed. You can use 32 threads too.

It is just an example to show you that it works. GPUs hate large loops, so it will be super slow, but it is enough to prove my point.
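To make the indexing in that line concrete, here is a small CPU sketch of it. It assumes that `i` is a loop counter that every thread of the group runs through and that `flatThreadID` runs from 0 to the group size minus one (64 in the post, 32 works the same way); neither is spelled out in the post. The function name `countReads` is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Emulates a group of `groupSize` threads looping over a buffer with the
// index expression (i * groupSize) + flatThreadID, and records how many
// times each element is read.
std::vector<uint32_t> countReads(uint32_t elementCount, uint32_t groupSize) {
    std::vector<uint32_t> reads(elementCount, 0);
    // Round up so a partially filled last chunk is still visited.
    const uint32_t iterations = (elementCount + groupSize - 1) / groupSize;
    for (uint32_t i = 0; i < iterations; ++i) {
        // On the GPU these iterations run side by side, one per thread.
        for (uint32_t flatThreadID = 0; flatThreadID < groupSize; ++flatThreadID) {
            const uint32_t index = i * groupSize + flatThreadID;
            if (index < elementCount) {
                ++reads[index];
            }
        }
    }
    return reads;
}
```

Each element ends up being read exactly once, and neighbouring threads read neighbouring elements in every iteration.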

JoeJ

4,406

April 30, 2020 01:29 PM

I get you, but you still use an LDS buffer large enough to store indices for all particles, which becomes the ‘second buffer’ and so you also agree to my claim of doubling the memory requirement.

So, to defrag particles I see no reasonable alternative to having 2 (or one double sized) buffers.
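A minimal CPU sketch of the two-buffer defrag, to show where the extra memory comes from. The `Particle` layout and the rule that `life > 0` means active are assumptions made for the example, and `compactInto` is a hypothetical name.

```cpp
#include <atomic>
#include <cstdint>
#include <vector>

struct Particle {
    float x, y, z;
    float life;  // assumed convention: > 0 means the particle is active
};

// Copies the active particles of `src` to the front of `dst` and returns
// how many were copied. `dst` must be at least as large as `src`, which is
// the doubled memory requirement.
uint32_t compactInto(const std::vector<Particle>& src, std::vector<Particle>& dst) {
    std::atomic<uint32_t> liveCount{0};
    for (const Particle& p : src) {  // each iteration stands in for one GPU thread
        if (p.life > 0.0f) {
            // The atomic add hands every surviving particle a unique slot.
            dst[liveCount.fetch_add(1)] = p;
        }
    }
    return liveCount.load();
}
```

On the CPU the survivors keep their original order. On a GPU the slot order would depend on thread scheduling.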

A pair-swapping algorithm like bitonic sort could avoid this, but it would need multiple iterations over all particles and a dispatch + barrier for each iteration, so that's surely a big loss for performance.
Maybe it would be worth it if sorting by depth is necessary anyway, which reminds me of this paper: https://de.slideshare.net/DevCentralAMD/holy-smoke-faster-particle-rendering-using-direct-compute-by-gareth-thomas
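To put a number on "multiple iterations", here is a CPU sketch of an in-place bitonic sort that counts its passes. Each pass of the inner `i` loop corresponds to one full sweep over the data, which on the GPU would be one dispatch followed by a barrier. The function name `bitonicSortPasses` is hypothetical, and the sketch assumes the element count is a power of two.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Sorts `keys` ascending in place using only pair swaps (no second buffer)
// and returns the number of full passes over the data that were needed.
// Assumes keys.size() is a power of two.
uint32_t bitonicSortPasses(std::vector<uint32_t>& keys) {
    const uint32_t n = static_cast<uint32_t>(keys.size());
    uint32_t passes = 0;
    for (uint32_t k = 2; k <= n; k <<= 1) {          // size of the sorted runs being built
        for (uint32_t j = k >> 1; j > 0; j >>= 1) {  // distance between compared partners
            // One pass: every element is compared with exactly one partner.
            for (uint32_t i = 0; i < n; ++i) {
                const uint32_t partner = i ^ j;
                if (partner > i) {
                    const bool ascending = (i & k) == 0;
                    if ((keys[i] > keys[partner]) == ascending) {
                        std::swap(keys[i], keys[partner]);
                    }
                }
            }
            ++passes;
        }
    }
    return passes;
}
```

For n = 2^m elements this takes m * (m + 1) / 2 passes, so 8 elements need 6 passes and roughly a million elements (2^20) need 210.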

NikiTo

245

April 30, 2020 01:49 PM

JoeJ said:
I get you, but you still use an LDS buffer large enough to store indices for all particles, which becomes the ‘second buffer’ and so you also agree to my claim of doubling the memory requirement.

No no. The VRAM buffer has 4bln of particles.

JoeJ said:
Maybe it would be worth it if sorting by depth is necessary anyway, which reminds me of this paper:

I'm gonna reinvent that well….

JoeJ

4,406

April 30, 2020 02:04 PM

NikiTo said:
No no. The VRAM buffer has 4bln of particles.

So you claim you have an 8bln-sized buffer, containing 8bln particles, randomly active or inactive, and you can defrag this with a time complexity of O(1) per particle, without needing any other memory (except LDS).

Then I would like to see the full algorithm please. I'd certainly learn from this.

NikiTo

245

April 30, 2020 06:06 PM

The trip from VRAM to the CU is a huge slowdown. Just compare a shader that reads from 4bln pixels and operates on them one by one with a shader that eats chunks of 8K… HUGEEEEE difference!!!

No program is about just sorting. No program is about just adding two arrays. No program is so simple. The real speedup comes from completely re-designing an algorithm to live comfortably inside the GPU.
