ccherng said:
Secondly, as alluded to at the end of the article Vulkan and D3D12 support multithreading. Exactly how does this give benefits. Can the command buffer mentioned above avoid having to be fully sorted and have some commands issued independently and in parallel to the gpu and that is where the performance benefit comes. If that is the case then how does the gpu driver handle the synchronization of all these gpu calls coming in so that the overhead of this contention synchronization still do appreciable better than serially making all the gpu calls on one thread.
Let's say you need to issue 10,000 draw calls to render your frame, and each draw call takes roughly 1 microsecond of CPU time. Doing all of those on a single thread means that it will take about 10 milliseconds to issue all of those draw calls, which would be more than half of a 16.6ms (60 Hz) frame. Without API support for multithreading (which is what you're stuck with in D3D11 and OpenGL) you can possibly do other work simultaneously on other cores while issuing all of those draw calls, but it's always going to take 10ms from when you start issuing draw calls until when you finish. This means it's impossible for you to run at full framerate on a 144 Hz monitor for instance, since you would need to get below 6.94ms for that to happen. Or alternatively if you increased to 20,000 draw calls you wouldn't hit 60 Hz, since now you need 20ms to issue those draw calls.
With D3D12/Vulkan you can actually spread the work of issuing those draw calls over multiple threads. This means that if you have 4 cores completely available to you when it comes time to issue draw calls, you could crunch through those draws in only 2.5ms (at least in an idealized world with perfect parallelism and no issues from cache/memory contention or downclocking). Therefore you actually have a chance of hitting that 144 Hz target, assuming you can do the rest of the frame's work in the remaining 4.4ms or so. This also lets you potentially achieve lower latency than other techniques that can be used for achieving parallelism, in particular the “render thread” approach where the rendering thread issues draw calls a frame behind the gameplay code.
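If it helps to see the arithmetic spelled out, here's a rough sketch of the numbers above. The function names are just made up for illustration, and it assumes the same idealized model: a fixed cost per draw call and perfect scaling across threads.

```python
def frame_budget_ms(refresh_hz):
    """Time available per frame at a given refresh rate, in milliseconds."""
    return 1000.0 / refresh_hz

def submit_time_ms(num_draws, us_per_draw, num_threads=1):
    """Idealized time to issue all draws, assuming perfect parallel scaling."""
    return num_draws * us_per_draw / num_threads / 1000.0

# Numbers from the example: 10,000 draws at roughly 1 microsecond each.
single = submit_time_ms(10_000, 1.0)                # one thread
quad = submit_time_ms(10_000, 1.0, num_threads=4)   # four fully available cores
budget_144 = frame_budget_ms(144)
remaining = budget_144 - quad                        # left for the rest of the frame

print(f"1 thread:  {single:.2f} ms")
print(f"4 threads: {quad:.2f} ms")
print(f"144 Hz budget: {budget_144:.2f} ms, remaining: {remaining:.2f} ms")
```

In practice you won't get perfect scaling, so treat the 4-thread number as a best case rather than something to plan around.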
GPUs consume draws and other commands in large batches, encoded in a chunk of memory called a command buffer. So the CPU doesn't really feed the GPU 1 draw at a time. Instead the CPU batches up hundreds or thousands of commands into a buffer that's then submitted to the GPU at a later time. In D3D11 and OpenGL this is all hidden from you, but in D3D12 and Vulkan it's something you explicitly handle yourself. The way multithreading typically works with those APIs is that each command buffer can only be written by 1 thread at a time. So you might break up your frame into say a dozen or so command buffers, and for each one kick off a task on a multithreaded job scheduler that fills in that command buffer with draws and other commands. When all of those tasks complete you can then submit those command buffers to the GPU, in order, with a single function call. So there's not really much contention to worry about in this case: each thread only touches its own command buffer while recording, and the GPU ultimately just executes a serial list of commands. Where things get more complicated is when you submit command buffers to multiple hardware queues, but that's a much more advanced topic.
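Here's a toy model of that pattern, just to show the shape of it. To be clear, this isn't D3D12 or Vulkan code: plain Python lists stand in for command buffers, a thread pool stands in for the job scheduler, and `record_command_buffer`, `build_frame` and `submit` are hypothetical names. It also won't run faster in CPython because of the GIL, so it only illustrates the structure, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def record_command_buffer(draw_ids):
    """Fill one command buffer. Only the thread running this task writes to it."""
    commands = []  # stand-in for a real command buffer
    for draw_id in draw_ids:
        commands.append(("draw", draw_id))
    return commands

def build_frame(num_draws, num_buffers, num_workers=4):
    """Split the frame's draws into chunks and record each chunk in its own task."""
    chunk = max(1, -(-num_draws // num_buffers))  # ceiling division
    chunks = [range(start, min(start + chunk, num_draws))
              for start in range(0, num_draws, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() hands results back in chunk order, no matter which task finishes first.
        return list(pool.map(record_command_buffer, chunks))

def submit(buffers):
    """Stand-in for the single submission call: the GPU sees one serial list."""
    return [cmd for buf in buffers for cmd in buf]

buffers = build_frame(num_draws=10_000, num_buffers=12)
stream = submit(buffers)
print(len(buffers), "command buffers,", len(stream), "commands")
```

Notice there are no locks anywhere: the tasks never share a buffer, and the ordering is decided once at submission time rather than fought over while recording.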