D3D12 Fence and Present

acerskyline · 2019-01-22T08:04:00

I have been trying to figure out how fences and Present synchronize the pipeline when using vsync. I have read https://computergraphics.stackexchange.com/questions/2166/how-does-vsync-affect-fps-exactly-when-not-at-full-vsync-fps (link #1), https://www.gamedev.net/forums/topic/677527-dx12-fences-and-swap-chain-present/ (link #2), https://www.gamedev.net/forums/topic/679050-how-come-changing-dxgi-swap-chain-descbuffercount-has-no-effect/ (link #3), https://software.intel.com/en-us/articles/sample-application-for-direct3d-12-flip-model-swap-chains (link #4) and https://docs.microsoft.com/en-us/windows/desktop/api/dxgi/nf-dxgi-idxgiswapchain-present (link #5), but I'm still a little confused.

My main question is: assuming we are using triple buffering, will Present block the CPU thread? If yes, when will it block the CPU thread? I made this picture; please tell me which combination is the correct situation for the next frame. In my opinion it should be B, E, H. But if it really is B, E, H, it doesn't match what link #4 suggests under the classic mode section. As a matter of fact, I don't even understand how the GPU thread could be two vsyncs behind the CPU thread in that situation in the first place. Also, if it really is B, E, H, it doesn't match what Nathan Reed suggested in link #1. In his example, the CPU thread does not seem to be throttled by Present or vsync at all: the CPU thread starts working right after the GPU finishes its work.

Started by acerskyline January 21, 2019 07:00 AM

15 comments, last by acerskyline

acerskyline

Author

January 22, 2019 02:14 AM

Based on your reply, I changed the original Intel diagram a little, just to make sure I understand what you mean.

The first diagram is the original one. The second diagram is what I made. The third one has some marks so that you know what I'm talking about.

In the third diagram, the red rectangle marks what I changed: I made the GPU work last longer, which caused some other changes to the pipeline. The yellow rectangle shows what I presume you mean by

1 hour ago, SoldierOfLight said:
A frame in the "present queue" is waiting for all associated GPU work to finish before actually being processed and showing up on screen

The GPU work lasts longer for that frame, so the "present queue" has to wait for the GPU to finish it. Also, by

1 hour ago, SoldierOfLight said:
The way I prefer to think about / visualize it is that a frame is waiting in the GPU queue until all previous work is completed, and is then moved to a present queue after that, where it waits for all previous presents to complete.

I think you are saying that, since the "present queue" waits for the GPU work to finish anyway, we might as well think of the frame as not being put in the "present queue" until the GPU finishes its work for that frame.
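The two-stage picture (wait in the GPU queue for all earlier work, then wait in the present queue for earlier presents and the next vsync) can be sketched as a toy C++ model. Everything here is hypothetical and simplified, not D3D12 API: times are in milliseconds, vsyncs fall on multiples of `period`, and the model ignores that the GPU may also have to wait for a buffer to come off screen.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical model, not D3D12 API. Times in ms; vsyncs at multiples of period.
struct FrameTimes {
    double gpuDone;   // stage 1 exit: all of this frame's GPU work finished
    double flipTime;  // stage 2 exit: the vsync at which the frame is shown
};

// First vsync at or after time t.
double nextVsync(double t, double period) {
    return std::ceil(t / period) * period;
}

// submit[i]: when the CPU queued frame i's command lists and its Present.
// gpuCost[i]: how long frame i's rendering takes on the GPU.
std::vector<FrameTimes> simulatePresentQueue(const std::vector<double>& submit,
                                             const std::vector<double>& gpuCost,
                                             double period) {
    assert(submit.size() == gpuCost.size() && period > 0.0);
    std::vector<FrameTimes> out;
    double gpuFree = 0.0;       // the GPU queue runs work strictly in order
    double lastFlip = -period;  // lets the first frame flip at any vsync
    for (std::size_t i = 0; i < submit.size(); ++i) {
        // Stage 1 (GPU queue): wait for all previously submitted work.
        double done = std::max(submit[i], gpuFree) + gpuCost[i];
        gpuFree = done;
        // Stage 2 (present queue): wait for the previous present to reach the
        // screen, then for the next vsync; at most one flip per vsync.
        double flip = nextVsync(std::max(done, lastFlip + period), period);
        lastFlip = flip;
        out.push_back({done, flip});
    }
    return out;
}
```

For example, `simulatePresentQueue({0.0, 5.0, 10.0}, {10.0, 10.0, 30.0}, 16.0)` flips the frames at 16, 32 and 64 ms: the third frame's longer GPU work (done at 50 ms) makes it miss the 48 ms vsync, which is the "present queue has to wait for the GPU" effect from the diagram.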

1. Now, my first question is: which way better visualizes what happens at the hardware level? (Conceptually they make no difference; it only changes where a "colored block" in the "present queue" starts, and the start does not matter as much as the end.)

2. My second question is: within the green rectangle, the (light blue) CPU thread is blocked by a fence (dark blue) and then blocked by Present (purple), am I right?

3. My third question is: within the blue rectangle, the brown "GPU thread" (command queue) is blocked by a present-to-render-target barrier, am I right?

SoldierOfLight

January 22, 2019 02:25 AM

3 minutes ago, acerskyline said:
1. Now, my first question is: which way better visualizes what happens at the hardware level? (Conceptually they make no difference; it only changes where a "colored block" in the "present queue" starts, and the start does not matter as much as the end.)

On modern hardware under Windows, the number of commands submitted to the hardware at any given time is pretty small - generally one or two per piece of schedulable hardware (or zero if it's idle). Depending on what type of swapchain we're dealing with, a "present" operation is either a hardware operation (i.e. a flip / scanout pointer swap) or a software operation (notifying some other component about the frame). In both cases, the present is queued up alongside rendering work in a software queue, until the hardware is ready to process it. If the present is a hardware command, then it's submitted to the hardware when it reaches the front of the queue. If it's a software command, then it's processed by the OS at that time.

With that said, for some types of presents, a "present" object is constructed at the time the present is enqueued. So really, both models are right: something is created when Present is called, even though nothing actually happens with the present until all prior rendering work is complete.
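A toy C++ model of that software queue, with hypothetical names and no real D3D12 or DXGI calls: rendering entries and present entries share one in-order queue, a present "exists" from its `enqueueTime` (when Present was called), and it is only processed (`processTime`) once it reaches the front, after all earlier rendering work.

```cpp
#include <algorithm>
#include <cassert>
#include <deque>
#include <vector>

// Hypothetical model of one software queue feeding the hardware in order.
struct Entry {
    bool isPresent;      // true for a present, false for rendering work
    double enqueueTime;  // when the CPU submitted it (ms)
    double duration;     // GPU time for rendering entries; ignored for presents
};

struct PresentRecord {
    double enqueueTime;  // the present object exists from here...
    double processTime;  // ...but is only acted on here
};

std::vector<PresentRecord> drain(std::deque<Entry> queue) {
    std::vector<PresentRecord> presents;
    double now = 0.0;
    while (!queue.empty()) {
        Entry e = queue.front();
        queue.pop_front();
        now = std::max(now, e.enqueueTime);  // hardware idles until work arrives
        if (e.isPresent) {
            presents.push_back({e.enqueueTime, now});
        } else {
            now += e.duration;  // rendering occupies the queue
        }
    }
    return presents;
}
```

With rendering at 0 ms (10 ms long), a present at 1 ms, rendering at 2 ms (5 ms long) and a present at 3 ms, the presents are processed at 10 ms and 15 ms, well after Present was called.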

8 minutes ago, acerskyline said:
2. My second question is: within the green rectangle, the (light blue) CPU thread is blocked by a fence (dark blue) and then blocked by Present (purple), am I right?

Any waiting due to a fence is completely up to the application. If the app only allows 3 frames of GPU work in flight, then yes, that's where the app would block waiting for a fence. And yes, that is where the app would block in Present.
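The application-side fence throttling described here can be sketched as a small C++ timeline model. It is a hypothetical simulation, not D3D12 code: the comment marks where a real app would call `ID3D12Fence::GetCompletedValue` and `SetEventOnCompletion`. Frame `i` signals fence value `i + 1`, and before recording frame `i` the CPU waits until frame `i - maxInFlight` has finished on the GPU. Blocking inside Present is not modeled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of "N frames of GPU work in flight" throttling.
struct Timeline {
    std::vector<double> cpuStart;  // when the CPU begins recording frame i
    std::vector<double> cpuWait;   // time the CPU spent blocked on the fence
    std::vector<double> gpuDone;   // when frame i's fence value (i + 1) is reached
};

Timeline simulateFence(std::size_t frames, double cpuCost, double gpuCost,
                       std::size_t maxInFlight) {
    assert(maxInFlight >= 1);
    Timeline t;
    double cpuFree = 0.0, gpuFree = 0.0;
    for (std::size_t i = 0; i < frames; ++i) {
        double ready = cpuFree;
        if (i >= maxInFlight) {
            // Real app: if (fence->GetCompletedValue() < i + 1 - maxInFlight)
            //   { fence->SetEventOnCompletion(i + 1 - maxInFlight, ev); wait(ev); }
            ready = std::max(ready, t.gpuDone[i - maxInFlight]);
        }
        t.cpuWait.push_back(ready - cpuFree);
        t.cpuStart.push_back(ready);
        double submit = ready + cpuCost;  // ExecuteCommandLists + Signal(i + 1)
        double done = std::max(submit, gpuFree) + gpuCost;  // GPU runs in order
        gpuFree = done;
        t.gpuDone.push_back(done);
        cpuFree = submit;
    }
    return t;
}
```

With a 2 ms CPU cost, a 10 ms GPU cost and 3 frames in flight, the first three frames start without waiting, and frame 3 cannot start until frame 0's fence value is reached at 12 ms (a 6 ms stall). From then on, the CPU is paced by the GPU.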

9 minutes ago, acerskyline said:
3. My third question is: within the blue rectangle, the brown "GPU thread" (command queue) is blocked by a present-to-render-target barrier, am I right?

More or less - the GPU is not processing commands, because working on the next command list would involve writing to / modifying the swapchain buffer, and it's not ready to be modified yet. It's the entire command list that's stopped. The GPU does not process any commands in the command list before the barrier prior to waiting.

acerskyline

Author

January 22, 2019 03:07 AM

45 minutes ago, SoldierOfLight said:
More or less - the GPU is not processing commands, because working on the next command list would involve writing to / modifying the swapchain buffer, and it's not ready to be modified yet. It's the entire command list that's stopped.

Question 1: does this mean the present-to-render-target barrier is unnecessary, since the entire command list is stopped by some magic in the driver(?), as opposed to the command list being executed and getting blocked at the barrier?

A separate question: according to the Microsoft DX12 documentation, the BufferCount member of DXGI_SWAP_CHAIN_DESC is:

Quote
A value that describes the number of buffers in the swap chain. When you call IDXGIFactory::CreateSwapChain to create a full-screen swap chain, you typically include the front buffer in this value. For more information about swap-chain buffers, see Remarks.

So, question 2: in the above example, isn't the actual buffer count 4 (the number you created the swapchain with), with 1 front buffer and 3 back buffers? Only then would it support the point that

45 minutes ago, SoldierOfLight said:
the GPU is not processing commands, because working on the next command list would involve writing to / modifying the swapchain buffer, and it's not ready to be modified yet

If the top "colored block" is not part of the swapchain (meaning you created the swapchain with a buffer count of 3), why is the GPU blocked by it?

acerskyline

Author

January 22, 2019 03:23 AM

5 minutes ago, acerskyline said:
If the top "colored block" is not part of the swapchain (meaning you created the swapchain with a buffer count of 3), why is the GPU blocked by it?

Oh wait, I think I found a possible reason. Maybe the copy operation in the blt model is not finished and is still holding the front buffer. There ARE 3 buffers (1 front, 2 back), but the "display buffer" is currently using one of them (the front buffer) to copy from, so the GPU command list is blocked until the copy operation is finished. Is this valid?

SoldierOfLight

January 22, 2019 07:25 AM

4 hours ago, acerskyline said:
Question 1: does this mean the present-to-render-target barrier is unnecessary, since the entire command list is stopped by some magic in the driver(?), as opposed to the command list being executed and getting blocked at the barrier?

No, it's still necessary as a command inside the command list for the command list to be correct. I just wanted to clarify that while the normal role of barriers can include synchronization, this specific barrier does not synchronize with presentation directly; the rest of what it does is still important.
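One way to see why the barrier is still needed for correctness is a toy state tracker, loosely in the spirit of the checks the debug layer performs. The enums below are hypothetical stand-ins for `D3D12_RESOURCE_STATE_PRESENT` and `D3D12_RESOURCE_STATE_RENDER_TARGET`, not the real types, and `PresentCall` stands for the later Present, which is not actually a command-list operation. The swapchain buffer is modeled as starting in the PRESENT state; drawing requires RENDER_TARGET, and presenting requires PRESENT again.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified model; not the real D3D12 types or validation.
enum class State { Present, RenderTarget };
enum class Op { BarrierPresentToRT, Draw, BarrierRTToPresent, PresentCall };

// Returns true if every operation sees the buffer in the state it requires.
bool validate(const std::vector<Op>& ops) {
    State s = State::Present;  // buffer starts out owned by presentation
    for (Op op : ops) {
        switch (op) {
        case Op::BarrierPresentToRT:
            if (s != State::Present) return false;  // StateBefore must match
            s = State::RenderTarget;
            break;
        case Op::Draw:
            if (s != State::RenderTarget) return false;  // can't render in PRESENT
            break;
        case Op::BarrierRTToPresent:
            if (s != State::RenderTarget) return false;
            s = State::Present;
            break;
        case Op::PresentCall:
            if (s != State::Present) return false;  // must hand the buffer back
            break;
        }
    }
    return true;
}
```

`validate({Op::BarrierPresentToRT, Op::Draw, Op::BarrierRTToPresent, Op::PresentCall})` passes, while dropping either barrier fails. The barrier's job here is the state transition, independent of whether it also makes the GPU wait.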

As for buffer count, the number of buffers you ask for is the number of buffers you get, period. If you ask for 3 buffers, there are 3 buffers. The reason for the comment on that page was simply to contrast with D3D9 swapchains, where you specified a back buffer count instead of a buffer count, so there was sometimes one more than what you asked for.
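A minimal C++ illustration, assuming a flip-model swapchain whose back buffer index (what `IDXGISwapChain3::GetCurrentBackBufferIndex` reports) typically cycles in order after each Present. The helper `backBufferSequence` is hypothetical. With `BufferCount = 3`, frames rotate through exactly three buffers and there is no hidden fourth "front buffer".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: which of the bufferCount swapchain buffers frame f
// renders into, assuming the index advances by one per Present and wraps.
std::vector<std::uint32_t> backBufferSequence(std::uint32_t bufferCount,
                                              std::uint32_t frames) {
    assert(bufferCount >= 2);  // flip-model swapchains need at least 2 buffers
    std::vector<std::uint32_t> seq;
    for (std::uint32_t f = 0; f < frames; ++f) {
        seq.push_back(f % bufferCount);  // the whole set rotates; no extra buffer
    }
    return seq;
}
```

For 3 buffers and 7 frames this yields 0, 1, 2, 0, 1, 2, 0: whichever buffer is on screen at a given moment is one of those three.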

Also, D3D12 doesn't support blt model, so there are no copies. The resource is either consumed by a composition pass, or directly by the screen. While a resource is being scanned out or composed from, you can't write to it.
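The "can't write while it's being scanned out" rule explains the stalled command list in the blue rectangle. A hypothetical C++ sketch, assuming direct scanout where a buffer stays on screen until the next frame flips in (composition can release buffers on a different schedule): frame `k` renders into buffer `k % bufferCount`, so frame `j` reuses frame `j - bufferCount`'s buffer and cannot start writing before frame `j - bufferCount + 1` replaces it on screen.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model, direct scanout only. flipTime[k] is when frame k
// appears on screen; it stays there until flipTime[k + 1].
double earliestWrite(const std::vector<double>& flipTime,
                     std::size_t bufferCount, std::size_t j) {
    assert(bufferCount >= 2);
    if (j < bufferCount) return 0.0;  // buffer has not been presented yet
    // Frame j reuses the buffer of frame j - bufferCount, which leaves the
    // screen when the following frame flips in.
    std::size_t replacedBy = j - bufferCount + 1;
    assert(replacedBy < flipTime.size());
    return flipTime[replacedBy];
}
```

With flips at 16, 32, 48 and 64 ms and 3 buffers, frame 3 cannot write its buffer before 32 ms and frame 4 not before 48 ms, however early their command lists were submitted.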

acerskyline

Author

January 22, 2019 08:04 AM

Thank you so much for answering all my questions! All your answers are very helpful. I learned a lot. Thanks again!

This topic is closed to new replies.
