Hey
I'm working on an engine. Whenever we run dispatch calls they are much more expensive than draw calls for the drivers on Nvidia cards (checked with Nsight). Roughly the same cost for running 5000 draw calls as for a few hundred dispatches. Is this normal?
That doesn't sound right, but then again I've never measured it. I assume all your profiling is being done in an optimised build without the debug layer enabled?
Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group
10 hours ago, ajmiles said: That doesn't sound right, but then again I've never measured it. I assume all your profiling is being done in an optimised build without the debug layer enabled?
It occurred on both dev & release builds. It happened close to a deadline so I haven't done a very detailed investigation yet. In this case the high number of CS dispatches was due to some unoptimized setup on the art side, so it could be "fixed". However I'm planning to do some closer investigation soon, run some tests, swap in an ATI card, etc. I just wanted to check beforehand in case other people had an opinion. So cheers for the reply.
Do you know which cost you were measuring in this case? As in, were you measuring CPU cost or GPU cost? Only Nvidia could say for sure what's going on in their driver, so I don't think that anybody here could tell you definitively why that might be the case.
In terms of GPU cost, one thing that can really cause trouble in certain cases is that the D3D11 spec requires a full GPU sync point between dispatches. This can be expensive, since it usually requires a full thread/execution barrier as well as flushing of caches. Normally you need this in cases where one dispatch writes a result and the next dispatch immediately reads from it, but in cases where you have lots of dispatches writing to the same buffer or texture with no dependencies, the flushes are unnecessary. Nvidia and AMD actually have special "extension" APIs that let you disable the sync/flush for cases where you know it's safe to do so, but you have to be verrrrrrrrrry careful when using these. If you mess it up, you can get unpredictably corrupted results. See the section called "UAV Overlap" from my list of D3D11 driver hacks for more info.
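To illustrate the idea, here's a rough sketch of what the Nvidia path looks like using NVAPI's UAV-overlap extension (`NvAPI_D3D11_BeginUAVOverlap`/`NvAPI_D3D11_EndUAVOverlap` from the public NVAPI headers; AMD exposes an equivalent through their AGS library). This is only a sketch of the pattern, it needs the NVAPI SDK and a real device context to actually run, and the function/parameter usage here is my assumption based on the public headers:

```cpp
// Sketch only: requires nvapi.h and the NVAPI static lib from Nvidia's SDK.
#include <d3d11.h>
#include "nvapi.h"

// Issue a batch of independent dispatches without the implicit
// D3D11 sync/flush between each one. ONLY safe if no dispatch in
// the batch reads memory that another dispatch in the batch writes.
void DispatchIndependentBatch(ID3D11DeviceContext* ctx,
                              UINT dispatchCount, UINT groupsX)
{
    // Tell the driver that UAV accesses from the following dispatches
    // don't depend on each other, so it can skip the barrier/flush.
    NvAPI_D3D11_BeginUAVOverlap(ctx);

    for (UINT i = 0; i < dispatchCount; ++i)
        ctx->Dispatch(groupsX, 1, 1);

    // Restore D3D11's normal ordering guarantees.
    NvAPI_D3D11_EndUAVOverlap(ctx);
}
```

Again, if any dispatch inside that region actually depends on another's output, you'll get corruption with no validation-layer warning, so it's worth gating this behind an engine flag you can toggle while verifying results.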
In terms of CPU cost, that's much harder to say. It really depends on what's going on in the driver. For instance it might be doing some expensive check to see if it can skip the flush depending on which resources are bound when you call Dispatch, but it's impossible to say for sure without seeing the driver code.