Alberth said:
That is, a single instruction stream shared by all processors. In such a setup, “delete for some” isn't feasible at all.
This should not matter, since most likely an entire CU / SM is filled with the same draw call.
Personally I do not think they analyze pixel shaders to optimize the preceding vertex shaders in the current pipeline. Since neither the API nor the GPU scheduler can know which combinations of shaders will be used during the frame, the optimization would need to happen per draw call at runtime, which would require modifying code that has already been compiled.
Since this is usually resolved using shader permutations, which are the responsibility of devs, not drivers, related optimizations done by the driver or on the GPU would be redundant and would likely cause more harm than good.
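To make "shader permutations" concrete, here is a rough sketch of how a build step might enumerate variants on the developer side. The feature names and the helper functions are made up for illustration and do not come from any particular engine or API. Each permutation is just a block of preprocessor defines that would be prepended to the shader source before compiling, so an unused interpolator is compiled out ahead of time instead of being removed by the driver.

```python
from itertools import product

# Hypothetical feature flags; each one would gate an interpolator in the shader source.
FEATURES = ("USE_NORMAL", "USE_TANGENT", "USE_VERTEX_COLOR")


def permutation_defines(enabled):
    """Return the preprocessor header for one shader permutation."""
    return "".join(f"#define {name} {int(name in enabled)}\n" for name in FEATURES)


def all_permutations():
    """Enumerate every on/off combination of FEATURES, keyed by a bitmask."""
    table = {}
    for bits in product((0, 1), repeat=len(FEATURES)):
        enabled = {name for name, bit in zip(FEATURES, bits) if bit}
        # Bit i of the key is set when FEATURES[i] is enabled.
        key = sum(bit << i for i, bit in enumerate(bits))
        table[key] = permutation_defines(enabled)
    return table


if __name__ == "__main__":
    variants = all_permutations()
    print(len(variants), "permutations")
    print(variants[0b001])  # only USE_NORMAL enabled
```

At draw time the application would pick the variant whose key matches what the material actually needs, which is why the number of variants doubles with every added flag.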
But maybe using GPU profiling tools would help to clarify. Ideally they show which interpolators are active for a draw call, and compiler output as well.