Hi Folks,
I am currently designing a new rendering pipeline and I have a serious question about the mechanism behind MultiDrawIndirect/ExecuteIndirect...
Say that, for the purpose of efficient culling, I split all my geometries into batches and store these batches into a large buffer containing N batches.
A batch simply contains a StartIndex pointing at its first triangle and an integer defining the triangle count (anywhere from 64 to 512 triangles per batch).
Let's assume that I want to use MultiDrawIndirect to draw these N batches in a single draw call...
The draw arguments buffer would thus look something like this:
----- batch 0 -----
vertex count: 89
start index: 0
----- batch 1 -----
vertex count: 394
start index: 89
----- batch 2 -----
vertex count: 145
start index: 483
...
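For concreteness, here is a minimal sketch of how such an argument buffer could be filled on the CPU. It assumes the OpenGL indexed variant (glMultiDrawElementsIndirect), whose per-draw record is the five-field DrawElementsIndirectCommand. The Batch struct and the buildIndirectCommands helper are hypothetical names used only for illustration, and "vertex count" / "start index" above are mapped to count / firstIndex.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One indirect record as read by glMultiDrawElementsIndirect:
// five tightly packed 32-bit values.
struct DrawElementsIndirectCommand {
    std::uint32_t count;         // number of indices to draw ("vertex count" above)
    std::uint32_t instanceCount; // number of instances; 1 per batch in this sketch
    std::uint32_t firstIndex;    // offset into the index buffer ("start index" above)
    std::int32_t  baseVertex;    // constant added to each fetched index; unused here
    std::uint32_t baseInstance;  // first instance id; unused here
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20,
              "indirect commands must be tightly packed");

// Hypothetical CPU-side description of one batch (assumption, not from any API).
struct Batch {
    std::uint32_t startIndex; // first index of the batch in the shared index buffer
    std::uint32_t indexCount; // number of indices belonging to the batch
};

// Converts N batches into N indirect draw records, one record per batch.
std::vector<DrawElementsIndirectCommand>
buildIndirectCommands(const std::vector<Batch>& batches) {
    std::vector<DrawElementsIndirectCommand> cmds;
    cmds.reserve(batches.size());
    for (const Batch& b : batches) {
        assert(b.indexCount > 0 && "empty batches should be filtered out earlier");
        cmds.push_back({b.indexCount, 1u, b.startIndex, 0, 0u});
    }
    return cmds;
}
```

The resulting array would then be uploaded to a buffer bound to GL_DRAW_INDIRECT_BUFFER and submitted with a single glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, N, 0) call, where a stride of 0 means the records are tightly packed.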
Will the graphics hardware execute these draw calls serially, i.e. draw batch 0 first, then batch 1, and so on?
If the batches are executed serially (batch 0 must be finished before batch 1 starts), then I reckon that, given the size of the batches, some of my SMs (streaming multiprocessors) will remain idle the whole time...
That's my assumption ATM, so an alternative I came up with is to use a single DrawIndirect (one vertex shader per batch) and use the tessellation shader to dynamically inject the geometry of each batch. That way all the batches would be executed simultaneously and my SM occupancy would stay at 100%.
I know that the tessellation shader has a certain overhead, but maybe it would still be worth it if it meant full SM usage...
Are there any GPU gurus out there who could tell me if I am too far off from reality?