39 minutes ago, maxest said:
I tested both. No difference.
I thought about something along those lines, but quickly came to the conclusion that it shouldn't happen. I assumed everything should take just as much time as in the no-VSync case, because Present is where the waiting happens; why would any redundant work show up in my actual computation time?
I just checked how much time Present takes with VSync on, and indeed it's around 15 ms, with some variance of course. So it's still a mystery to me why the computation code I'm profiling would take more time in VSync mode. I wonder if that would also be the case under D3D12.
EDIT: Wrapping the whole Render function in one disjoint query ( http://reedbeta.com/blog/gpu-profiling-101/ ) actually works when VSync is off. My earlier observation was wrong. It behaves exactly the same as beginning/ending the disjoint query right before and after the block we're profiling.
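For reference, the disjoint/timestamp pattern from that article looks roughly like this in D3D11. This is an untested sketch with my own names, just to make the mechanics concrete:

```cpp
#include <d3d11.h>

// Illustrative sketch only; identifiers are mine. Assumes you already have a
// working ID3D11Device / ID3D11DeviceContext.
ID3D11Query* g_disjointQuery = nullptr;
ID3D11Query* g_startQuery    = nullptr;
ID3D11Query* g_endQuery      = nullptr;

void CreateTimingQueries(ID3D11Device* device)
{
    D3D11_QUERY_DESC desc = {};
    desc.Query = D3D11_QUERY_TIMESTAMP_DISJOINT;
    device->CreateQuery(&desc, &g_disjointQuery);
    desc.Query = D3D11_QUERY_TIMESTAMP;
    device->CreateQuery(&desc, &g_startQuery);
    device->CreateQuery(&desc, &g_endQuery);
}

void RenderWithTiming(ID3D11DeviceContext* context)
{
    context->Begin(g_disjointQuery);
    context->End(g_startQuery);      // timestamp before the work
    // ... issue the draws/dispatches you want to measure here ...
    context->End(g_endQuery);        // timestamp after the work
    context->End(g_disjointQuery);
}

// Call this a few frames later so we don't stall waiting on the GPU.
double ReadTimingMs(ID3D11DeviceContext* context)
{
    D3D11_QUERY_DATA_TIMESTAMP_DISJOINT disjointData = {};
    UINT64 startTicks = 0, endTicks = 0;
    while (context->GetData(g_disjointQuery, &disjointData, sizeof(disjointData), 0) != S_OK) {}
    context->GetData(g_startQuery, &startTicks, sizeof(startTicks), 0);
    context->GetData(g_endQuery,   &endTicks,   sizeof(endTicks),   0);
    if (disjointData.Disjoint)
        return -1.0; // clock changed mid-measurement; discard this sample
    return double(endTicks - startTicks) / double(disjointData.Frequency) * 1000.0;
}
```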
Even if you time only the work you're interested in (and not the whole frame), it's still going to take a variable amount of time depending on how high the GPU's clock speed happens to be at that point in time.
If the GPU can see it's only doing 2ms of work every 16ms, then it may underclock itself by a factor of 3-4x such that the 2ms of work ends up taking 6ms-8ms instead.
What's happening is something like this:
1) At 1500MHz, your work takes 0.4ms and ~16.2ms is spent idle at the end of the frame.
2) The GPU realises it could run a bit slower and still be done in plenty of time so it underclocks itself just a little bit to save power.
3) At 1200MHz, your work takes 0.5ms and ~16.1ms is spent idle at the end of the frame.
4) Still plenty of time spare, so it underclocks itself even further.
5) At 900MHz, your work takes 0.6ms and ~16.0ms is spent idle at the end of the frame.
6) *Still* plenty of time spare, so it dramatically underclocks itself.
7) At 500MHz, your work takes 3x longer than it did originally, now costing 1.2ms. There's still 15.4ms of idle time at the end of the frame, so this is still OK.
8) At this point the GPU may not have any lower power states to clock down to, so the work never takes any more than 1.2ms.
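The arithmetic there is just the same number of GPU cycles spread over a lower clock. A quick back-of-the-envelope version (my own rounding, assuming a 60Hz / ~16.7ms frame):

```cpp
#include <cstdio>

int main()
{
    const double baseClockMHz = 1500.0;
    const double baseWorkMs   = 0.4;            // work measured at the base clock
    const double frameMs      = 1000.0 / 60.0;  // ~16.7 ms per frame at 60 Hz VSync

    const double clocksMHz[] = { 1500.0, 1200.0, 900.0, 500.0 };
    for (double clockMHz : clocksMHz)
    {
        // Same cycle count, lower clock => proportionally longer work.
        double workMs = baseWorkMs * baseClockMHz / clockMHz;
        double idleMs = frameMs - workMs;
        std::printf("%4.0f MHz: work %.1f ms, idle %.1f ms\n", clockMHz, workMs, idleMs);
    }
    return 0;
}
```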
In D3D12 we (Microsoft) added an API called ID3D12Device::SetStablePowerState, in part to address this problem.
This API fixes the GPU's clock speed to something it can always run at without having to throttle back due to thermal or power limitations. So if your GPU has a "Base Clock" of 1500MHz but can periodically "Boost" to 1650MHz, we'll fix the clock speed to 1500MHz. Note that this API does not work on end-users' machines as it requires the Debug bits to be installed, so it can't be used in retail titles. Note also that performance will likely be worse than on an end-user's machine, because we've artificially limited the clock speed below its peak to ensure it stays stable and consistent. With this in place, profiling becomes easier because the clock speed is known to be stable across runs and won't ramp up and down as in your situation.
I don't think SetStablePowerState was ever added to D3D11, but it should be simple enough to create a dummy D3D12 application that creates a device, calls SetStablePowerState and then puts itself into an infinite Sleep in the background. I've never tried this, but that should be sufficient to keep the GPU's frequency fixed to some value for as long as that dummy D3D12 application/device exists and is running.
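Something along these lines should do it; a minimal sketch, assuming the Debug bits / Graphics Tools are installed (otherwise SetStablePowerState will fail or remove the device):

```cpp
#include <d3d12.h>
#include <windows.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    // Create a device on the default adapter; no swap chain or window needed.
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device))))
        return 1;

    // Pin the GPU to its stable (base) clock. Requires the debug bits,
    // and the effect only lasts while this device exists.
    if (FAILED(device->SetStablePowerState(TRUE)))
        return 1;

    // Keep the process (and therefore the device and the fixed clock) alive
    // while you profile your D3D11 app in another process.
    Sleep(INFINITE);
    return 0;
}
```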