Hey
I do some atomic-add operations in a shader on groupshared data (groupshared = local shared memory). At the end of the shader I write this data to VRAM. I'm trying to figure out whether I need a barrier (GroupMemoryBarrierWithGroupSync) at the end of the shader, before that write. If I put the VRAM write on the last thread in the group, is that thread likely to be executed last in the group? Or is there some other heuristic or method that might work? There will be a few dozen instructions between the last atomic operation and the VRAM write. The code will still function if some atomic-add operations occasionally “go missing”, but that is only acceptable if it happens rarely, e.g. less than 0.1% of the time.
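To make the question concrete, here is a simplified sketch of the pattern I mean. It is not my actual shader: the buffer names, the group size, the bin count and the way the bin index is computed are all placeholders made up for illustration. The barrier marked in the comments is the one I'm asking about.

```hlsl
// Simplified sketch only. All names and sizes here are placeholders.
#define GROUP_SIZE 64
#define NUM_BINS   64   // kept equal to GROUP_SIZE so each thread clears one bin

StructuredBuffer<uint>   gInput  : register(t0);
RWStructuredBuffer<uint> gOutput : register(u0);

groupshared uint sBins[NUM_BINS];

[numthreads(GROUP_SIZE, 1, 1)]
void CSMain(uint3 dtid : SV_DispatchThreadID,
            uint3 gid  : SV_GroupID,
            uint  gi   : SV_GroupIndex)
{
    // Clear the shared counters, then wait until every thread has done so.
    sBins[gi] = 0;
    GroupMemoryBarrierWithGroupSync();

    // Atomic adds on groupshared data.
    uint value = gInput[dtid.x];
    InterlockedAdd(sBins[value % NUM_BINS], 1);

    // ... a few dozen unrelated instructions go here ...

    // This is the barrier in question: can it be dropped?
    GroupMemoryBarrierWithGroupSync();

    // The last thread in the group writes the shared data out to VRAM.
    if (gi == GROUP_SIZE - 1)
    {
        for (uint i = 0; i < NUM_BINS; ++i)
        {
            gOutput[gid.x * NUM_BINS + i] = sBins[i];
        }
    }
}
```

The idea I'm asking about would be to remove the second barrier and rely on the thread with the highest SV_GroupIndex happening to run after all the others.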
I imagine this could be platform specific as well. I mainly wanted to reach out and check whether anyone reading this has any insight or experience in this regard.