My Direct3D 11 game has a number of large vertex buffers (10K+ vertices) that get updated every frame, but on any individual frame only a subset of the vertices in the buffers actually changes (perhaps ⅓ of them on average, but it varies greatly). The particular vertices that will be updated are not predictable ahead of time and change from frame to frame, which makes it impractical to split them into separate buffers or group them contiguously within a buffer.
So my question is: What is the most performant (i.e., uses the least amount of CPU time) way to update these vertex buffers every frame?
These are the methods I've tried:
1. Create the vertex buffer with D3D11_USAGE_DEFAULT and update it with UpdateSubresource from a copy of the vertex data stored in CPU memory. (The individual vertices are updated as needed in the CPU copy, and then the whole thing is sent to the GPU.) With this approach, my game spends the majority of its CPU time within the UpdateSubresource call.
2. Create the vertex buffer with D3D11_USAGE_DYNAMIC and update it by calling Map/memcpy/Unmap from a copy of the vertex data stored in CPU memory. (As with the previous approach, the individual vertices are updated as needed in the CPU copy, and then the whole thing is sent to the GPU. My understanding of D3D11_USAGE_DYNAMIC is that it must be mapped using D3D11_MAP_WRITE_DISCARD, which is what forces me to keep a local copy, since not all vertices are updated every frame.) With this approach, my game spends the majority of its CPU time within the memcpy call. This approach is slightly faster than the first one. (My guess is that because my game is generally CPU-bound, a dynamic vertex buffer is slightly preferable to a default one in this case.)
3. Create the vertex buffer with D3D11_USAGE_DEFAULT and update it by calling CopyResource from another vertex buffer created with D3D11_USAGE_STAGING. The staging buffer itself is updated by calling Map on it when individual vertices are about to be modified, modifying those vertices in place, and then calling Unmap before the copy. This approach is several times slower than either of the previous two, for reasons I don't really understand, since I thought this was essentially what UpdateSubresource does behind the scenes. (If someone can shed some light on this, I would be curious to know more!) With this approach, my game spends the vast majority of its CPU time within the Map call. (I'm guessing it's stalled by the previous CopyResource call.)
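For comparison, the three update paths described above can be sketched roughly as follows. This is a minimal sketch, not the code from my game: the Vertex layout and all function names (CopyWholeShadow, WriteChangedVertices, UpdateDefault, UpdateDynamic, UpdateViaStaging) are hypothetical. The Direct3D 11 calls are compiled only on Windows (_WIN32), while the CPU-side copying is plain C++. It assumes the staging buffer was created with D3D11_CPU_ACCESS_WRITE and has the same size as the default buffer, which CopyResource requires.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>
#ifdef _WIN32
#include <d3d11.h>
#endif

// Hypothetical vertex layout; the real one is not specified here.
struct Vertex { float pos[3]; float uv[2]; };

// Methods 1 and 2: the whole CPU shadow copy is sent every frame,
// even though only a subset of its vertices changed.
void CopyWholeShadow(void* dst, const std::vector<Vertex>& shadow) {
    assert(dst != nullptr);
    std::memcpy(dst, shadow.data(), shadow.size() * sizeof(Vertex));
}

// Method 3: staging memory keeps its previous contents, so only the
// changed vertices (given by index) have to be written.
void WriteChangedVertices(Vertex* dst, const std::vector<std::size_t>& changed,
                          const std::vector<Vertex>& newValues) {
    assert(changed.size() == newValues.size());
    for (std::size_t i = 0; i < changed.size(); ++i) {
        dst[changed[i]] = newValues[i];
    }
}

#ifdef _WIN32
// Method 1: D3D11_USAGE_DEFAULT buffer updated with UpdateSubresource.
void UpdateDefault(ID3D11DeviceContext* ctx, ID3D11Buffer* vb,
                   const std::vector<Vertex>& shadow) {
    // A null destination box updates the whole buffer; the row and
    // depth pitches are ignored for buffers.
    ctx->UpdateSubresource(vb, 0, nullptr, shadow.data(), 0, 0);
}

// Method 2: D3D11_USAGE_DYNAMIC buffer, Map(WRITE_DISCARD) / memcpy / Unmap.
HRESULT UpdateDynamic(ID3D11DeviceContext* ctx, ID3D11Buffer* vb,
                      const std::vector<Vertex>& shadow) {
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    // WRITE_DISCARD returns memory whose previous contents are undefined,
    // so the full shadow copy has to be written every time.
    HRESULT hr = ctx->Map(vb, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped);
    if (FAILED(hr)) return hr;
    CopyWholeShadow(mapped.pData, shadow);
    ctx->Unmap(vb, 0);
    return S_OK;
}

// Method 3: edit a D3D11_USAGE_STAGING buffer in place, then CopyResource
// into the D3D11_USAGE_DEFAULT buffer.
HRESULT UpdateViaStaging(ID3D11DeviceContext* ctx, ID3D11Buffer* staging,
                         ID3D11Buffer* vb, const std::vector<std::size_t>& changed,
                         const std::vector<Vertex>& newValues) {
    D3D11_MAPPED_SUBRESOURCE mapped = {};
    // D3D11_MAP_WRITE preserves the old contents, so this Map can wait
    // until the GPU has finished any pending CopyResource from `staging`.
    HRESULT hr = ctx->Map(staging, 0, D3D11_MAP_WRITE, 0, &mapped);
    if (FAILED(hr)) return hr;
    WriteChangedVertices(static_cast<Vertex*>(mapped.pData), changed, newValues);
    ctx->Unmap(staging, 0);
    ctx->CopyResource(vb, staging);
    return S_OK;
}
#endif
```

The difference the sketch tries to make visible: methods 1 and 2 copy the entire shadow copy regardless of how many vertices changed, while method 3 writes only the changed vertices but has to map a buffer the GPU may still be reading from.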
Is there any approach you know of that might be better than these? I'm currently using the second method above, and I would really like to avoid the large amount of time my game spends in that memcpy call if at all possible, but maybe that's just something I have to live with?
Thanks in advance!