Question:
When a DX11 application executes Map/Unmap with D3D11_MAP_WRITE_NO_OVERWRITE, is there a way to efficiently know which region of the provided buffer was modified?
Situation:
I hooked all DX11 calls of the game “Heroes of the Storm” and made it playable on low-end machines.
https://www.reddit.com/r/heroesofthestorm/comments/g3piro/i_reprogrammed_hots_so_you_can_play_it_on_a_poor/
The game creates a 2 MB vertex buffer and a 0.5 MB index buffer for all the dynamic geometry (fancy magic spells/particles). Assume there are 200 Map/Unmap writes with D3D11_MAP_WRITE_NO_OVERWRITE per frame into these 2 buffers.
Let's say I want to have my own memory management, so when the game calls my hooked version of Map/Unmap, I hand it my own CPU buffer and later I need to pass these writes on to the real DX11 Map/Unmap functions. The problem is that the API has no way to tell me which regions of the memory I have to propagate. Map fills in a D3D11_MAPPED_SUBRESOURCE structure, which only describes the whole buffer (base pointer and pitch values), and we cannot request to map a specific region.
Solution 1:
When the "Draw" function is called, I know whether the dynamic vertex buffer is bound, and if it is, I know which range the call draws, which means I know exactly which range had to be modified. Similarly, when "DrawIndexed" is called, I know exactly the range of indices that is used, so I just copy that range. The problem with "DrawIndexed" is that I don't know which range of vertices I have to update, because multiple indices can point to the same vertex, hence I don't know the length of the vertex data. I could traverse the list of indices manually to get the min/max index and calculate the range from that, or I could take the theoretical maximum range by presuming that no two indices point to the same vertex. The disadvantage is that if the game executes a single Map/Unmap over a bigger region and then issues 10 Draw calls within this region, I will also have to execute 10 Map/Unmaps or somehow group them smartly with some heuristics.
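To make the two ideas in Solution 1 concrete, here is a minimal sketch, assuming 16-bit indices and half-open ranges. It scans the indices of one DrawIndexed call for the min/max vertex, and it merges overlapping or adjacent ranges so that several draws over the same region collapse into one copy. The names Range, vertexRangeForDraw and mergeRanges are hypothetical helpers, not part of D3D11, and the caller is assumed to add BaseVertexLocation to the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Half-open element range [first, last). Hypothetical helper type.
struct Range {
    std::uint32_t first;
    std::uint32_t last;
};

// Scan the indices consumed by one DrawIndexed call and return the range of
// vertices they reference. The caller still has to add BaseVertexLocation.
Range vertexRangeForDraw(const std::uint16_t* indices, std::size_t indexCount) {
    assert(indices != nullptr || indexCount == 0);
    if (indexCount == 0) return {0u, 0u};
    auto mm = std::minmax_element(indices, indices + indexCount);
    return {static_cast<std::uint32_t>(*mm.first),
            static_cast<std::uint32_t>(*mm.second) + 1u};
}

// Merge overlapping or adjacent ranges so that many draws touching the same
// region lead to a single copy instead of one copy per draw.
std::vector<Range> mergeRanges(std::vector<Range> ranges) {
    std::sort(ranges.begin(), ranges.end(),
              [](const Range& a, const Range& b) { return a.first < b.first; });
    std::vector<Range> merged;
    for (const Range& r : ranges) {
        if (!merged.empty() && r.first <= merged.back().last) {
            merged.back().last = std::max(merged.back().last, r.last);
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

The index scan is linear in the number of indices per draw, so whether it is cheap enough depends on how large the dynamic draws are; that is something I would have to measure.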
Solution 2:
Provide my CPU buffers with read-only protection (PAGE_READONLY), so that when the game tries to write to a page it triggers an access violation. My handler would then know at least the page-aligned region being written and could switch that page to PAGE_READWRITE. I don't have any experience here to judge the efficiency or feasibility of this solution. This leads to a more generic question: is there any way to efficiently know when the CPU writes into a given memory region?
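Whatever mechanism reports the write, the bookkeeping side of Solution 2 could look roughly like the sketch below. It only tracks which pages were reported as written and turns them into byte ranges to copy; the interception itself (for example VirtualProtect plus an exception handler) is deliberately not shown. DirtyPages and ByteRange are hypothetical names, the 4096-byte page size is an assumption, and the buffer base is assumed to be page aligned.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumption: typical x86/x64 page size

// Half-open byte range [begin, end) relative to the buffer start.
struct ByteRange {
    std::size_t begin;
    std::size_t end;
};

class DirtyPages {
public:
    // base is assumed to be page aligned.
    DirtyPages(std::uintptr_t base, std::size_t sizeBytes)
        : base_(base),
          size_(sizeBytes),
          dirty_((sizeBytes + kPageSize - 1) / kPageSize, false) {
        assert(base % kPageSize == 0);
    }

    // Call with the faulting address reported by the write-fault handler.
    // Returns false if the address lies outside the tracked buffer.
    bool markWrite(std::uintptr_t address) {
        if (address < base_ || address - base_ >= size_) return false;
        dirty_[(address - base_) / kPageSize] = true;
        return true;
    }

    // Collect runs of consecutive dirty pages as byte ranges and clear them.
    std::vector<ByteRange> takeRanges() {
        std::vector<ByteRange> out;
        for (std::size_t page = 0; page < dirty_.size(); ++page) {
            if (!dirty_[page]) continue;
            const std::size_t begin = page * kPageSize;
            while (page < dirty_.size() && dirty_[page]) {
                dirty_[page] = false;
                ++page;
            }
            out.push_back({begin, std::min(page * kPageSize, size_)});
        }
        return out;
    }

private:
    std::uintptr_t base_;
    std::size_t size_;
    std::vector<bool> dirty_;
};
```

After takeRanges() the pages would have to be write-protected again, so every first write to a page per round costs one fault; I don't know how that compares to the other solutions.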
Solution 3:
Provide my CPU buffers filled with a magic number pattern, and after Map/Unmap scan them to see where the pattern has changed. This is the simplest solution, but it requires an extra fill pass before the game writes and an extra read pass afterwards.
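A minimal sketch of Solution 3, assuming the buffer is scanned in 32-bit words and that a single covering range is enough. The sentinel value and the names fillWithSentinel and findModified are hypothetical. Note the inherent weakness: if the game happens to write the sentinel value itself at the edge of its region, that word goes undetected.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kSentinel = 0xDEADBEEFu;  // hypothetical magic pattern

// Half-open byte range [begin, end); {0, 0} means "nothing modified".
struct ByteRange {
    std::size_t begin;
    std::size_t end;
};

// Fill the shadow buffer with the sentinel before handing it to the game.
void fillWithSentinel(std::uint32_t* words, std::size_t wordCount) {
    std::fill(words, words + wordCount, kSentinel);
}

// Return the smallest byte range that covers every word differing from the
// sentinel. Scans from both ends, so untouched words inside the range are
// included in the copy.
ByteRange findModified(const std::uint32_t* words, std::size_t wordCount) {
    std::size_t first = 0;
    while (first < wordCount && words[first] == kSentinel) ++first;
    if (first == wordCount) return {0, 0};
    std::size_t last = wordCount;
    while (words[last - 1] == kSentinel) --last;  // stops at or before first
    return {first * sizeof(std::uint32_t), last * sizeof(std::uint32_t)};
}
```

After the modified range has been propagated, that range would need to be refilled with the sentinel so the next Map/Unmap can be detected the same way.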
Bonus question:
Because of this, I realized that if I cannot know the range that is being written, neither can the DX11 API. Is it really necessary to call Map/Unmap every time we write with D3D11_MAP_WRITE_NO_OVERWRITE? I checked all the memory flags of such a mapped pointer:
PAGE_READWRITE | PAGE_WRITECOMBINE
MEM_COMMIT
MEM_PRIVATE
The mapped memory is created with these flags, and after calling Map/Unmap/Flush none of the flags ever change. The pointer also stays the same across all D3D11_MAP_WRITE_NO_OVERWRITE Map/Unmap calls (unless we map with D3D11_MAP_WRITE_DISCARD, which makes sense).
I didn't find much about the mysterious PAGE_WRITECOMBINE flag. I can only guess that even if there is some hardware mechanism that tells the GPU "the CPU just wrote some bytes here, quickly copy them to video memory", such information would probably be page aligned at best. But if we are writing in the middle of a page (which we can do with D3D11_MAP_WRITE_NO_OVERWRITE), the whole page couldn't be copied while the GPU is using the lower part of that page. At least MSDN prohibits touching data in use, or maybe it is allowed here, because we didn't modify the lower part, so copying the whole page just overwrites it with the same data. Does anyone know how this is done, and whether Map/Unmap is needed every time we write with D3D11_MAP_WRITE_NO_OVERWRITE?