taby said:
do I need to use a staging buffer to copy from CPU memory to image? Or can I somehow just use memcpy? You'd think it would be well documented, especially on stack exchange.
The problem is that the docs read like patents, and tutorials etc. can't easily get around that jargon either.
It helps me to know about the HW. From that i make an assumption about what should ideally work technically, and then i research resources to confirm those assumptions and to get details on how to implement them.
So in this case i start from the example of a texture in VRAM. I know very little about framebuffers, but i guess they're similar enough to a texture.
The texture must be tiled. Otherwise, for a filtered texel fetch, we would have a large stride in memory between vertically adjacent rows of pixels, since each row spans the whole horizontal resolution of the image.
So let's guess they tile it into 16x16 texel blocks and arrange the tiles in memory in Morton or Hilbert order, or along some other cache efficient space filling curve.
Details don't matter, but they surely do something like that. Thus, no matter if our VRAM is on a dGPU or an iGPU, resource transitions are needed to convert back and forth between those vendor specific formats ('swizzling').
Likely the driver has to dispatch compute shaders to do this work, besides handling the memory transfer itself.
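To make the tiling guess above a bit more concrete, here is a minimal sketch of what such an addressing scheme could look like. Everything in it is an assumption for illustration: the 16x16 tile size, Morton order inside a tile, plain row-major order between tiles, and the names `spreadBits4`, `mortonInTile`, `linearIndex` and `tiledIndex`. Real GPUs use their own undocumented layouts.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kTileSize = 16;  // assumed tile edge length in texels

// Spread the low 4 bits of v into the even bit positions: abcd -> 0a0b0c0d.
constexpr std::uint32_t spreadBits4(std::uint32_t v) {
    v &= 0xFu;
    v = (v | (v << 2)) & 0x33u;
    v = (v | (v << 1)) & 0x55u;
    return v;
}

// Morton (Z-order) index of a texel inside one 16x16 tile, range 0..255.
// x bits land on even positions, y bits on odd positions.
constexpr std::uint32_t mortonInTile(std::uint32_t x, std::uint32_t y) {
    return spreadBits4(x) | (spreadBits4(y) << 1);
}

// Plain row-major texel index, as in an untiled CPU-side image.
constexpr std::uint64_t linearIndex(std::uint32_t x, std::uint32_t y,
                                    std::uint32_t width) {
    return static_cast<std::uint64_t>(y) * width + x;
}

// Tiled texel index: tiles stored row-major (a simplification),
// texels inside each tile stored in Morton order.
inline std::uint64_t tiledIndex(std::uint32_t x, std::uint32_t y,
                                std::uint32_t width) {
    assert(width % kTileSize == 0);  // sketch only handles whole tiles
    const std::uint64_t tilesPerRow = width / kTileSize;
    const std::uint64_t tile = (y / kTileSize) * tilesPerRow + (x / kTileSize);
    return tile * (kTileSize * kTileSize)
         + mortonInTile(x % kTileSize, y % kTileSize);
}
```

For a 4096 texel wide image, the vertical neighbours (100, 50) and (100, 51) are 4096 texels apart with `linearIndex`, but only 2 texels apart with `tiledIndex`, because they fall into the same tile. This is also why a plain memcpy of row-major CPU data into such a layout would produce garbage: something has to do the reordering.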
Another point is the type of memory we want to use. The ideal choice depends on the HW: on the vendor, but mainly on whether it's a dGPU or an iGPU.
Newer HW can address the whole VRAM on a dGPU afaik, and maybe this could avoid a need for a staging buffer. Idk.
But you'd need a fallback using the staging buffer for older HW anyway, so i would not bother and just use the staging buffer.
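The "direct access if available, staging buffer otherwise" decision can be sketched like this. It is not real Vulkan code: `MemoryType`, `findMemoryType`, `UploadPath` and `chooseUploadPath` are hypothetical stand-ins so the snippet compiles without the SDK. Only the four flag values are meant to mirror Vulkan's `VkMemoryPropertyFlagBits`. In a real program the list would come from `vkGetPhysicalDeviceMemoryProperties` and the allowed bits from the resource's memory requirements.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Stand-in constants with the same values as Vulkan's VkMemoryPropertyFlagBits.
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;  // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr std::uint32_t HOST_VISIBLE  = 0x2;  // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr std::uint32_t HOST_COHERENT = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
constexpr std::uint32_t HOST_CACHED   = 0x8;  // VK_MEMORY_PROPERTY_HOST_CACHED_BIT

// Hypothetical stand-in for one entry of the device's memory type list.
struct MemoryType {
    std::uint32_t propertyFlags;
};

// Return the first memory type that is allowed by the bitmask and has
// every required property flag set, or nothing if no type qualifies.
inline std::optional<std::uint32_t> findMemoryType(
        const std::vector<MemoryType>& types,
        std::uint32_t allowedTypeBits,
        std::uint32_t required) {
    for (std::size_t i = 0; i < types.size() && i < 32; ++i) {
        const bool allowed = (allowedTypeBits & (1u << i)) != 0;
        const bool hasAll = (types[i].propertyFlags & required) == required;
        if (allowed && hasAll) {
            return static_cast<std::uint32_t>(i);
        }
    }
    return std::nullopt;
}

enum class UploadPath { DirectWrite, StagingBuffer };

// Prefer memory the CPU can map that also lives on the device;
// otherwise fall back to the staging buffer path.
inline UploadPath chooseUploadPath(const std::vector<MemoryType>& types,
                                   std::uint32_t allowedTypeBits) {
    const auto direct =
        findMemoryType(types, allowedTypeBits, DEVICE_LOCAL | HOST_VISIBLE);
    return direct ? UploadPath::DirectWrite : UploadPath::StagingBuffer;
}
```

Note that this only answers whether mappable device memory exists. Whether an image can sensibly be written through a mapped pointer also depends on its tiling: a linearly tiled image has a queryable row layout, while an optimally tiled one has an implementation-defined layout, which is one more reason to just go through the staging buffer and a copy command.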
Maybe, if you really know what you're doing, you could use memcpy with all kinds of hacks and tricks on some HW configurations.
But i would not want to do this, because we can't treat all those various memory pools as general RAM. Sometimes the memory can't be cached, for example, so i'd rather use API functions than memcpy, assuming they properly implement the related care, optimizations, scheduling of transfers over the bus, etc.
With all this in mind, i expect the related VK functions to require information about all those points, and translating the patent-speak into English becomes a bit easier… : /