NikiTo said:
@JoeJ It looks like the driver will reorder the shaders in groups(of the wave size) if there is not a meaningful usage of the LDS. https://www.khronos.org/opengl/wiki/Compute_Shader#Compute_space
Why do you think so? The quoted picture does not mention any driver-side work on this.
Some vendors may implement this, some may not. So we cannot rely on it, and using workgroups that are too small remains bad practice.
Because AMD has the larger WGs and most devs focus on NV, I can imagine they do this optimization, but I would need to test it within a project that is not bottlenecked by CPU-GPU communication.
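To put rough numbers on why small workgroups are risky, here is a minimal sketch. It assumes the pessimistic case where the driver does not pack several small workgroups into one wave, which is exactly the open question above. The function name `lane_utilization` is made up for illustration, and the wave sizes are the usual 32 for NVIDIA warps and 64 for GCN-era AMD wavefronts.

```python
import math

def lane_utilization(workgroup_size: int, wave_size: int) -> float:
    """Fraction of SIMD lanes doing useful work.

    Assumption: every workgroup gets its own waves, so a partially
    filled wave leaves the remaining lanes idle.
    """
    waves = math.ceil(workgroup_size / wave_size)
    return workgroup_size / (waves * wave_size)

# Wave size 32 (NVIDIA warp) and 64 (GCN-era AMD wavefront).
for wave_size in (32, 64):
    for workgroup_size in (8, 32, 64):
        util = lane_utilization(workgroup_size, wave_size)
        print(f"wave={wave_size} workgroup={workgroup_size} utilization={util:.3f}")
```

Under that assumption a workgroup of 8 threads would use only a quarter of a 32-wide wave and an eighth of a 64-wide one, so the cost of guessing wrong is larger on the wider hardware. If a driver does merge small workgroups, the real numbers would be better than this.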
@taby Do you also have an older repo with CPU code for calculating the fractal?
I have a multithreaded CPU isosurface extraction and could paste your code in to compare performance.
My isosurface extraction should be slower than marching cubes because I guarantee manifold output and generate full mesh adjacency information, but it could still be interesting.