Hey
I'm working on an engine. Whenever we run dispatch calls they are much more expensive than draw calls for the drivers on Nvidia cards (checked with Nsight). Roughly the same cost for running 5000 draw calls as for a few hundred dispatches. Is this normal?
That doesn't sound right, but then again I've never measured it. I assume all your profiling is being done in an optimised build without the debug layer enabled?
Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group
10 hours ago, ajmiles said: That doesn't sound right, but then again I've never measured it. I assume all your profiling is being done in an optimised build without the debug layer enabled?
It occurred on both dev & release builds. It happened close to a deadline so I haven't done a very detailed investigation yet. In this case the high number of CS dispatches was due to some unoptimized setup on the art side, so it could be "fixed". However I'm planning to do some closer investigation soon, run some tests, swap in an ATI card, etc. I just wanted to check beforehand in case other people had an opinion. So cheers for the reply.
Do you know which cost you were measuring in this case? As in, were you measuring CPU cost or GPU cost? Only Nvidia could say for sure what's going on in their driver, so I don't think that anybody here could tell you definitively why that might be the case.
In terms of GPU cost, one thing that can really cause trouble in certain cases is that the D3D11 spec requires a full GPU sync point between dispatches. This can be expensive, since it usually requires a full thread/execution barrier as well as flushing of caches. Normally you need this in cases where one dispatch writes a result and the next dispatch immediately reads from it, but in cases where you have lots of dispatches writing to the same buffer or texture with no dependencies, the flushes are unnecessary. Nvidia and AMD actually have special "extension" APIs that let you disable the sync/flush for cases where you know it's safe to do so, but you have to be verrrrrrrrrry careful when using these. If you mess it up, you can get unpredictably corrupted results. See the section called "UAV Overlap" from my list of D3D11 driver hacks for more info.
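To illustrate the idea, here's a rough sketch of what the Nvidia path looks like using NVAPI's UAV-overlap extension (`NvAPI_D3D11_BeginUAVOverlap`/`NvAPI_D3D11_EndUAVOverlap` from the public NVAPI headers; AMD exposes an equivalent through their AGS library). This is only a sketch of the pattern, it needs the NVAPI SDK and a real device context to actually run, and the function/parameter usage here is my assumption based on the public headers:

```cpp
// Sketch only: requires nvapi.h and the NVAPI static lib from Nvidia's SDK.
#include <d3d11.h>
#include "nvapi.h"

// Issue a batch of independent dispatches without the implicit
// D3D11 sync/flush between each one. ONLY safe if no dispatch in
// the batch reads memory that another dispatch in the batch writes.
void DispatchIndependentBatch(ID3D11DeviceContext* ctx,
                              UINT dispatchCount, UINT groupsX)
{
    // Tell the driver that UAV accesses from the following dispatches
    // don't depend on each other, so it can skip the barrier/flush.
    NvAPI_D3D11_BeginUAVOverlap(ctx);

    for (UINT i = 0; i < dispatchCount; ++i)
        ctx->Dispatch(groupsX, 1, 1);

    // Restore D3D11's normal ordering guarantees.
    NvAPI_D3D11_EndUAVOverlap(ctx);
}
```

Again, if any dispatch inside that region actually depends on another's output, you'll get corruption with no validation-layer warning, so it's worth gating this behind an engine flag you can toggle while verifying results.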
In terms of CPU cost, that's much harder to say. It really depends on what's going on in the driver. For instance it might be doing some expensive check to see if it can skip the flush depending on which resources are bound when you call Dispatch, but it's impossible to say for sure without seeing the driver code.