
Compute Shaders Engine Integration

Started by iradicator, July 24, 2019 06:10 PM
2 comments, last by MJP 5 years, 6 months ago

I want to experiment with offloading some engine tasks to compute shaders (e.g., computing summed-area tables, blurs, etc.). Yes, I know I can do this in a pixel shader, but I want to experiment with compute.
I'm hesitant about switching back and forth between the graphics pipeline and the compute pipeline during the normal flow of producing a frame.
I was wondering: what are the best practices for integrating compute shaders into an engine?
Is the recommendation to "separate" all compute from graphics-related tasks and do them all at either the beginning or the end of a frame?
Any real-life examples would be great! Thanks!

2 hours ago, iradicator said:

Is the recommendation to "separate" all compute from graphics-related tasks and do them all at either the beginning or the end of a frame?

It depends on the API (which options you have at all), but mainly on the hardware (is async compute supported?).

The ideal case would be a kind of task graph configurable for various hardware: e.g., on older NVIDIA GPUs, do everything serially; on AMD, do it async.

The latter case still requires some experimentation. It is advised to pair bandwidth-heavy tasks with ALU-heavy tasks (for example, do async compute while rendering shadow maps). I remember another piece of advice from AMD: use long-running, persistent compute threads, because the scheduler can become a bottleneck. And cache thrashing can even hurt performance.

It is also possible to run multiple different compute tasks asynchronously using multiple queues; or, if there are no dependencies (barriers) between dispatches, async compute happens automatically within a single queue anyway.

So there are many options; it's a matter of testing and profiling, and not easy to figure out. Personally, I struggled initially because I had used only Vulkan's dedicated compute queues for my tests, and the results were disappointing. Then I found out that only the main gfx/compute queue offers full compute performance, while the dedicated compute queues seem somehow throttled, intended for background tasks. Knowing this, I got close to optimal results in the end, but such things are not documented, queue priorities at the API level are ignored, etc. (Not sure how things have changed recently.)
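To make that queue choice concrete, here is a minimal C++ sketch of the selection logic. This is a toy model: the flag constants mirror the values of `VK_QUEUE_GRAPHICS_BIT` / `VK_QUEUE_COMPUTE_BIT`, but `PickComputeFamily` and the mocked family list are invented for illustration; in real code you would query `vkGetPhysicalDeviceQueueFamilyProperties` first.

```cpp
#include <cstdint>
#include <vector>

// Mocked queue flags; the values match VK_QUEUE_GRAPHICS_BIT and
// VK_QUEUE_COMPUTE_BIT, but no real Vulkan device is queried here.
constexpr uint32_t GRAPHICS_BIT = 0x1;
constexpr uint32_t COMPUTE_BIT  = 0x2;

// Pick a queue family index for compute work. For throughput-critical
// compute, prefer the combined graphics+compute family (the one that,
// per the observation above, offers full compute performance). For
// background/async work, prefer a compute-only family if one exists.
int PickComputeFamily(const std::vector<uint32_t>& familyFlags, bool background) {
    int combined = -1, computeOnly = -1;
    for (int i = 0; i < static_cast<int>(familyFlags.size()); ++i) {
        const bool compute  = (familyFlags[i] & COMPUTE_BIT) != 0;
        const bool graphics = (familyFlags[i] & GRAPHICS_BIT) != 0;
        if (compute && graphics && combined < 0)     combined = i;
        if (compute && !graphics && computeOnly < 0) computeOnly = i;
    }
    if (background && computeOnly >= 0) return computeOnly;  // throttled is fine here
    return combined >= 0 ? combined : computeOnly;           // full-rate queue
}
```

On a typical AMD-style layout `{graphics+compute, compute, compute}`, this picks family 0 for performance-critical dispatches and family 1 for background work.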


You should be fine as long as you're not constantly doing Draw -> Dispatch -> Draw -> Dispatch where all of the draws go to the same render target, because you'll get more stalls than usual. Generally people do things more like "draw a bunch of things to a render target, then run a compute shader that reads from that render target and outputs to another texture, then draw some other things to a different render target", which works fine. The kinds of operations you mentioned all sound totally fine for compute, so I wouldn't worry about it too much.
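One way to see why the batched ordering wins is a toy dependency model (pure illustration; `Cmd` and `CountStalls` are invented names, not engine or API code). It counts a "stall" whenever a command reads a resource written by the command immediately before it, which is the read-after-write hazard the interleaved Draw -> Dispatch -> Draw -> Dispatch pattern keeps triggering:

```cpp
#include <string>
#include <vector>

// Toy model: each command either draws to a render target or dispatches
// a compute shader that reads one texture and writes another. Only the
// simplest hazard is modeled: reading a resource the previous command
// just wrote forces a wait (real GPUs also stall on write-after-read,
// decompression, layout transitions, etc.).
struct Cmd {
    enum Kind { Draw, Dispatch } kind;
    std::string writes;  // resource written by this command
    std::string reads;   // resource read (empty if none)
};

int CountStalls(const std::vector<Cmd>& frame) {
    int stalls = 0;
    for (size_t i = 1; i < frame.size(); ++i) {
        if (!frame[i].reads.empty() && frame[i].reads == frame[i - 1].writes)
            ++stalls;  // read-after-write on back-to-back commands
    }
    return stalls;
}
```

Ping-ponging Draw/Dispatch on the same target stalls on every dispatch, while drawing everything first and then running one compute pass pays for a single transition.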

This topic is closed to new replies.
