
Compute Shaders Engine Integration

Started by iradicator, July 24, 2019 06:10 PM
2 comments, last by MJP 5 years, 6 months ago

I want to experiment with offloading some engine tasks to compute shaders (e.g., computing summed-area tables, blurs, etc.). Yes, I know I can do this in a pixel shader, but I want to experiment with compute.
I'm hesitant about switching back and forth between the graphics pipeline and the compute pipeline during the normal flow of producing a frame.
I was wondering: what are the best practices for integrating compute shaders into an engine?
Is the recommendation to "separate" all compute from graphics-related tasks and do them all at either the beginning or the end of a frame?
Any real-life examples would be great! Thanks!

2 hours ago, iradicator said:

Is the recommendation to "separate" all compute from graphics-related tasks and do them all at either the beginning or the end of a frame?

It depends on the API (which options you have at all), but mainly on the hardware (is async compute supported?).

The ideal case would be a kind of task graph configurable for various hardware: e.g., on older NVIDIA GPUs, do everything serially; on AMD, do it async.

The latter case still requires some experimentation. It is advised to pair bandwidth-heavy tasks with ALU-heavy tasks (for example, do async compute while rendering shadow maps). I remember another piece of advice from AMD: use long-running, persistent compute threads, because the scheduler can become a bottleneck. And cache thrashing can even hurt performance.

It is also possible to run multiple different compute tasks asynchronously using multiple queues; or, if there are no dependencies (barriers) between dispatches, async compute happens automatically within a single queue anyway.

So there are many options; it's a matter of testing and profiling, and not easy to figure out. Personally, I struggled initially because I had used only Vulkan's dedicated compute queues for my tests, and the results were disappointing. Then I found out that only the main gfx/compute queue offers full compute performance, while the dedicated compute queues seem somehow throttled, intended for background tasks. Knowing this, I got close to optimal results in the end, but such things are not documented, queue priorities at the API level are ignored, etc. (Not sure how things have changed recently.)
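To make that queue choice concrete, here is a minimal C++ sketch of the selection logic. This is a toy model: the flag constants mirror the values of `VK_QUEUE_GRAPHICS_BIT` / `VK_QUEUE_COMPUTE_BIT`, but `PickComputeFamily` and the mocked family list are invented for illustration; in real code you would query `vkGetPhysicalDeviceQueueFamilyProperties` first.

```cpp
#include <cstdint>
#include <vector>

// Mocked queue flags; the values match VK_QUEUE_GRAPHICS_BIT and
// VK_QUEUE_COMPUTE_BIT, but no real Vulkan device is queried here.
constexpr uint32_t GRAPHICS_BIT = 0x1;
constexpr uint32_t COMPUTE_BIT  = 0x2;

// Pick a queue family index for compute work. For throughput-critical
// compute, prefer the combined graphics+compute family (the one that,
// per the observation above, offers full compute performance). For
// background/async work, prefer a compute-only family if one exists.
int PickComputeFamily(const std::vector<uint32_t>& familyFlags, bool background) {
    int combined = -1, computeOnly = -1;
    for (int i = 0; i < static_cast<int>(familyFlags.size()); ++i) {
        const bool compute  = (familyFlags[i] & COMPUTE_BIT) != 0;
        const bool graphics = (familyFlags[i] & GRAPHICS_BIT) != 0;
        if (compute && graphics && combined < 0)     combined = i;
        if (compute && !graphics && computeOnly < 0) computeOnly = i;
    }
    if (background && computeOnly >= 0) return computeOnly;  // throttled is fine here
    return combined >= 0 ? combined : computeOnly;           // full-rate queue
}
```

On a typical AMD-style layout `{graphics+compute, compute, compute}`, this picks family 0 for performance-critical dispatches and family 1 for background work.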


You should be fine as long as you're not constantly doing Draw -> Dispatch -> Draw -> Dispatch where all of the draws go to the same render target, because you'll get more stalls than usual. Generally people do things more like "draw a bunch of things to a render target, then run a compute shader that reads from that render target and outputs to another texture, then draw some other things to a different render target", which works fine. The kinds of operations you mentioned all sound totally fine for compute, so I wouldn't worry about it too much.
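One way to see why the batched ordering wins is a toy dependency model (pure illustration; `Cmd` and `CountStalls` are invented names, not engine or API code). It counts a "stall" whenever a command reads a resource written by the command immediately before it, which is the read-after-write hazard the interleaved Draw -> Dispatch -> Draw -> Dispatch pattern keeps triggering:

```cpp
#include <string>
#include <vector>

// Toy model: each command either draws to a render target or dispatches
// a compute shader that reads one texture and writes another. Only the
// simplest hazard is modeled: reading a resource the previous command
// just wrote forces a wait (real GPUs also stall on write-after-read,
// decompression, layout transitions, etc.).
struct Cmd {
    enum Kind { Draw, Dispatch } kind;
    std::string writes;  // resource written by this command
    std::string reads;   // resource read (empty if none)
};

int CountStalls(const std::vector<Cmd>& frame) {
    int stalls = 0;
    for (size_t i = 1; i < frame.size(); ++i) {
        if (!frame[i].reads.empty() && frame[i].reads == frame[i - 1].writes)
            ++stalls;  // read-after-write on back-to-back commands
    }
    return stalls;
}
```

Ping-ponging Draw/Dispatch on the same target stalls on every dispatch, while drawing everything first and then running one compute pass pays for a single transition.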

This topic is closed to new replies.
