dpadam450 said:
Same concept as wavefronts. If all 4 pixels don't execute the same code path there are some issues that arise.
Performance-wise, the behavior is the same for pixel and compute shaders, because pixel shaders also run in wavefronts / warps. If only one pixel out of 32/64 diverges, the whole wave has to execute both paths, so the other quads in the wave slow down too.
However, programmers need branches; without them there's not much you can do at all. Which is why I never understood the advice to ‘avoid branches’, even on very old GPUs.
Personally, I had one problem that cured me of the compulsive desire to keep all my threads 100% busy all the time: a simple N-body problem.
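To put rough numbers on that, here is a toy cost model in Python. It is only a sketch, not how any particular GPU accounts for time: `wave_cost` and the cycle counts are made up for illustration, and it assumes the usual SIMT behavior where a wave executes each side of a branch once if at least one lane needs it.

```python
def wave_cost(lane_takes_branch, cost_taken, cost_not_taken):
    """Toy model: cost of one wave for a single if/else.

    Each side of the branch is paid for once if any lane needs it;
    lanes that don't need it are masked off but still wait.
    """
    cost = 0
    if any(lane_takes_branch):
        cost += cost_taken
    if not all(lane_takes_branch):
        cost += cost_not_taken
    return cost


WAVE = 32  # hypothetical wave size; 64 works the same way

uniform = [False] * WAVE                      # nobody takes the expensive path
one_diverges = [True] + [False] * (WAVE - 1)  # a single lane takes it

print(wave_cost(uniform, 100, 10))       # 10
print(wave_cost(one_diverges, 100, 10))  # 110
```

One lane out of 32 is enough to make the whole wave pay for both paths, which is the slowdown described above.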
Imagine we have workloads of many N-body problems, each having a variable size between 30 and 200 bodies.
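The fastest way to process them is to use several dispatches with different workgroup sizes: 32, 64, 128 and 256. Each workgroup size handles the problems that fit into it (≤ its size) but are larger than half its size, because anything smaller wastes fewer threads in the next smaller workgroup.
Rephrasing this, we can say: ‘To solve the problem most efficiently, you'll have about 75% of your threads busy.’ Problem sizes in a bucket range from just over half the workgroup size up to the full size, so if they are spread evenly, about three quarters of the threads have work on average. The rest are idle, but this does not change the fact that this is the most efficient solution.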
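The bucketing and the 75% figure can be checked with a few lines of Python. This is just a sketch of the arithmetic, not the actual dispatch code: `pick_workgroup_size` and `utilization` are names made up here, and the example assumes one problem of every size in a bucket, i.e. an even spread.

```python
WORKGROUP_SIZES = (32, 64, 128, 256)


def pick_workgroup_size(num_bodies):
    """Smallest workgroup size that still fits the whole problem."""
    for size in WORKGROUP_SIZES:
        if num_bodies <= size:
            return size
    raise ValueError("problem larger than the largest workgroup")


def utilization(problem_sizes):
    """Fraction of launched threads that actually have a body to process."""
    busy = sum(problem_sizes)
    launched = sum(pick_workgroup_size(n) for n in problem_sizes)
    return busy / launched


# One problem of every size that lands in the 128-thread bucket (65..128 bodies).
bucket_128 = range(65, 129)

print(pick_workgroup_size(100))           # 128
print(round(utilization(bucket_128), 3))  # 0.754
```

With a different size distribution the number shifts, but in this scheme every workgroup is always more than half full.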
The same is true for branches in general, if you look at it from this angle.