This is pretty much a question for @SoldierOfLight, probably..
I've read a ton of information about the different flip modes and the various ways of configuring the swap chain. Would really like to get down to near 0ms latency at 60fps.
My GPU is somewhat old - NVIDIA GeForce GT 430 - but my software is up to date: latest NVIDIA drivers, latest Windows 10 (April 2018 Update, version 1803).
PresentMon indicates dwm.exe is "Hardware: Legacy Flip" (not sure if this is important but thought I'd include it since 'Legacy' sounds bad)
If I run windowed, PresentMon indicates "Composed: Flip" with a latency around 48ms
If I run fullscreen with SetFullscreenState(true), PresentMon indicates "Hardware Composed: Independent Flip: Plane 0" with a latency around 46ms
If I run fullscreen as just a borderless window covering the whole screen, PresentMon indicates "Hardware Composed: Independent Flip: Plane 0" and around 32ms latency
In windowed mode, DXGI_SWAP_CHAIN_DESC1 setup is:
swapChainDesc.SwapEffect = DXGI_SWAP_EFFECT_FLIP_DISCARD;
swapChainDesc.Flags = DXGI_SWAP_CHAIN_FLAG_FRAME_LATENCY_WAITABLE_OBJECT;
SetMaximumFrameLatency is set to 1 frame
(FLIP_DISCARD seems to have the same latency as FLIP_SEQUENTIAL)
In fullscreen mode with SetFullscreenState, I find I have to remove the WAITABLE_OBJECT flag. If I don't, DX returns an error when SetFullscreenState is called. With the DX debug layer enabled, it logs a message saying that the WAITABLE_OBJECT flag can't be combined with fullscreen (although I've seen other posts claiming that this restriction was lifted at some point? Not on my machine.)
When I present, I'm just calling swapChain->Present(1, 0)
Questions:
1) Why can't I combine WAITABLE_OBJECT with SetFullscreenState?
2) Do I need to use SetFullscreenState anyway? Currently the lowest latency is just borderless window covering the screen, with 32ms latency. But why is it not 16ms?
3) Why is SetFullscreenState slower, at 46ms latency? It's worth mentioning that in this case I am also creating a borderless window that covers the screen, and then calling SetFullscreenState on that window. Maybe that's confusing the system?
4) Is "Hardware Composed: Independent Flip: Plane 0" the best I can hope for or is there some other flip mode that is optimal? If so, what changes do I need to make to the code to get there?
-------
More information after further testing:
With the borderless fullscreen window (not using SetFullscreenState), loop looks like this:
1) WaitForSingleObject(WAITABLE_OBJECT)
2) Spin loop for 15ms (almost the entire duration of the frame) <-- added after writing the original post
3) read controller/user inputs
4) Draw the next frame of the game
5) Present
With the above, PresentMon indicates around 17ms latency with "Hardware Composed: Independent Flip: Plane 0"
Is this as good as I can do or can I somehow get the latency reported by PresentMon even lower?
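To make the timing of that loop concrete, here is a minimal, portable sketch of step 2 and of the budget it leaves. It is not DXGI code: `remainingBudgetMs` and `spinFor` are hypothetical helper names made up for illustration, and the sketch assumes a 60 Hz refresh and that the waitable object is signaled near the start of the refresh period.

```cpp
#include <cassert>
#include <chrono>

using Clock = std::chrono::steady_clock;
using Millis = std::chrono::duration<double, std::milli>;

// Time left in one refresh period after the spin, in milliseconds.
// Input sampling, drawing and Present() all have to fit in this window.
double remainingBudgetMs(double refreshHz, double spinMs) {
    assert(refreshHz > 0.0);
    return 1000.0 / refreshHz - spinMs;
}

// Busy-wait for the given duration (step 2 of the loop) and return how long
// the spin actually took, measured with a monotonic clock.
Millis spinFor(Millis duration) {
    const Clock::time_point start = Clock::now();
    const Clock::time_point deadline =
        start + std::chrono::duration_cast<Clock::duration>(duration);
    while (Clock::now() < deadline) {
        // Intentionally empty: burn CPU until the deadline passes.
    }
    return Clock::now() - start;
}
```

With these numbers, `remainingBudgetMs(60.0, 15.0)` is about 1.67ms, so steps 3 to 5 have well under 2ms to finish. With Present(1,0), a frame that misses the vertical blank is shown at the following one, so overshooting that budget would cost a whole extra refresh rather than a fraction of one.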
I am measuring controller-to-display latency with a 240 Hz camera and a gamepad with an LED wired into the start button. I am seeing as low as 5 camera frames at 240 Hz (about 21ms latency) between the LED lighting up on the controller and visible results appearing on screen. But sometimes I see up to 14 camera frames (about 58ms). The average is probably around 8 or 9 frames (roughly 33 to 38ms). Have I minimized the latency from the perspective of the application? For some reason I feel like I should be able to achieve very close to 0ms latency. Conceptually, if I wait until the very end of a vertical refresh cycle, then sample the user input, draw the game, and call Present() *right* before the GPU is ready to display the next frame, then it would get my back buffer and swap it to front only 0-2ms after I call Present. How do I get to a solution like this?
If you're curious, I'm using the Dell U2414H monitor, which is reviewed to have 4ms latency, and other tests I've done with dedicated hardware more or less confirm this (http://www.tftcentral.co.uk/reviews/dell_u2414h.htm#lag)
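For reference, the conversion from high-speed camera frame counts to milliseconds can be sketched as below. `cameraFramesToMs` is a hypothetical helper name, and the sketch assumes the camera really captures at a constant 240 frames per second.

```cpp
#include <cassert>

// Convert a count of high-speed camera frames into milliseconds.
// Each frame spans 1000 / cameraHz ms, so a 240 Hz camera resolves
// latency in steps of roughly 4.2ms.
double cameraFramesToMs(int frames, double cameraHz) {
    assert(cameraHz > 0.0);
    return frames * 1000.0 / cameraHz;
}
```

At 240 Hz this gives about 20.8ms for 5 frames, 33.3ms for 8, 37.5ms for 9 and 58.3ms for 14. Since each camera frame covers about 4.2ms, any single reading is only accurate to within roughly one frame of that size.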
Thanks!