glUseProgram

Started by October 12, 2024 11:10 PM
2 comments, last by JoeJ 2 days, 10 hours ago

I’m writing CAD/CAM software and have started rendering arcs and splines. A spline needs more vertex data than an arc, and an arc needs more than a line. The shader code is different too. Currently, I have separate shader programs for the different curve types, and I switch programs (i.e. glUseProgram) before making draw calls. Changing programs requires a GPU state change, right? That can be expensive. Is my approach normal, or do professional CAD packages handle the differences within the shader?

Different state changes take different amounts of time. They all cost something, but whether the cost is “too much” is relative. Today's premium graphics cards perform very differently from a card that is 5 or 10 years old.

Things like vertex binding are relatively fast; you can do many millions per second. Changing render targets is slow; you might reach 100k per second if you are doing nothing else.

You might be able to hit a million or so shader program switches per second if you try, but your program is presumably doing more useful things, like drawing to the screen. The difficulty is that the specific numbers are context dependent: state changes generally require waiting for existing work to finish before the next state can be entered.

Microsecond-scale tasks aren't something to do constantly, but hundreds or thousands per frame are possible if the systems are organized well. It is often cheaper to have one more complex program with more options than to swap shader programs, and much better to sort work so that each program is bound once rather than repeatedly, but it is also not a sin to change programs as needed.
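To make the "sort work to only change programs once" point concrete, here is a minimal sketch. It is not from the posts above: `DrawItem`, `countProgramSwitches`, and `sortByProgram` are hypothetical names, and the program field only stands in for a real OpenGL program handle. It sorts a draw list by program and counts how many glUseProgram calls the list would need before and after.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record. `program` stands in for a GLuint program handle
// (for example one each for lines, arcs, and splines).
struct DrawItem {
    std::uint32_t program;
    std::uint32_t firstVertex;
    std::uint32_t vertexCount;
};

// Counts how often the bound program would change if the items were
// submitted in order. In real code each change is one glUseProgram call.
std::size_t countProgramSwitches(const std::vector<DrawItem>& items) {
    std::size_t switches = 0;
    std::uint32_t bound = 0;  // 0 means "no program bound", as in OpenGL
    for (const DrawItem& item : items) {
        assert(item.program != 0 && "0 is reserved for 'nothing bound'");
        if (item.program != bound) {
            bound = item.program;
            ++switches;
        }
    }
    return switches;
}

// Groups draws by program. A stable sort keeps the original submission
// order within each program, which matters if draw order is meaningful.
void sortByProgram(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.program < b.program;
                     });
}
```

With an interleaved list of line, arc, and spline draws, the switch count after sorting drops to the number of distinct programs. Whether that matters in practice still depends on the hardware and driver, as described above.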

tlewick1 said:
Changing programs requires a GPU state change, right? That can be expensive. Is my approach normal, or do professional CAD packages handle the differences within the shader?

The usual way to minimize state changes is batching all draws per program (or material, texture, whatever is your bottleneck).
I do not think CAD software would go much further than that.
In games, and with lower-level APIs like Vulkan / DX12, we can go further. The related term is ‘GPU driven rendering’, and the goal is that the GPU can do all rendering work on its own, without frequent CPU/GPU synchronization where each side waits for the other to submit / receive work.
This might look like so, for example:

We generate a single command buffer, which performs the following tasks:
Frustum culling per object.
Occlusion culling per cluster of triangles.
Generate indirect draw calls for all the surviving clusters, binned by material.
Draw all the stuff.

After that we upload the command buffer to the GPU once at startup.
Per frame we only upload a buffer describing which objects have changed their position, then we execute the command buffer.
Now the GPU does its work and the CPU is completely free for other tasks. Basically we only need a single ‘draw call’ from the CPU side per frame, which is enqueuing the command buffer.
That's the ideal case, but in practice it's very hard and complex to do. (Btw, recently they introduced a new feature ‘Work Graphs’ to DX12 and Vulkan, which helps the GPU generate its own work. But sadly it works only on the very latest GPU generations.)

However, I would say that's total overkill for CAD software. Likely you prefer keeping your code maintainable over maximum framerates, and low-level APIs really are a pain. OpenGL can do indirect draws, but it has no command buffers, no multithreaded context, etc. But it's simple and I would try to stick with it. OpenGL can still do frustum culling on the GPU with a compute shader and generate indirect draws. You would need some more draw calls / dispatches than with Vulkan to do this, but it should be good enough.
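As a rough illustration of the culling-plus-indirect-draws idea, here is a CPU-side reference sketch of what such a compute shader would do. It runs on the CPU only for readability. `Plane`, `Object`, `sphereVisible`, and `buildIndirectCommands` are hypothetical names, and it assumes bounding spheres and frustum planes with inward-facing normals. Only the command struct follows OpenGL: its five 32-bit fields are the layout that glMultiDrawElementsIndirect reads from the buffer bound to GL_DRAW_INDIRECT_BUFFER.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Field layout of one indirect indexed draw as consumed by
// glMultiDrawElementsIndirect (five tightly packed 32-bit values).
struct DrawElementsIndirectCommand {
    std::uint32_t count;
    std::uint32_t instanceCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
    std::uint32_t baseInstance;
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "must be packed");

// Assumption: plane normals point into the frustum, so a point p is inside
// a plane's half-space when nx*p.x + ny*p.y + nz*p.z + d >= 0.
struct Plane { float nx, ny, nz, d; };

// Hypothetical per-object record: bounding sphere plus mesh range.
struct Object {
    float cx, cy, cz, radius;
    std::uint32_t indexCount;
    std::uint32_t firstIndex;
    std::int32_t  baseVertex;
};

// Conservative sphere test: culled only if fully outside some plane.
bool sphereVisible(const Object& o, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum) {
        float dist = p.nx * o.cx + p.ny * o.cy + p.nz * o.cz + p.d;
        if (dist < -o.radius) return false;
    }
    return true;
}

// Emits one command per surviving object. baseInstance carries the object
// index so a shader could look up per-object data (an assumed convention).
std::vector<DrawElementsIndirectCommand> buildIndirectCommands(
        const std::vector<Object>& objects,
        const std::array<Plane, 6>& frustum) {
    std::vector<DrawElementsIndirectCommand> commands;
    for (std::size_t i = 0; i < objects.size(); ++i) {
        const Object& o = objects[i];
        if (!sphereVisible(o, frustum)) continue;
        commands.push_back({o.indexCount, 1u, o.firstIndex, o.baseVertex,
                            static_cast<std::uint32_t>(i)});
    }
    return commands;
}
```

On the GPU the loop body would run once per object in a compute invocation and append to a buffer instead of a vector. The result would then be drawn with a single glMultiDrawElementsIndirect call per program or material bin.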

But I would recommend profiling tools such as Nvidia Nsight or Radeon GPU Profiler. Those tools show a timeline of your frame, and you can see the gaps caused by things like state changes or barriers. Those tools also tell you things like ‘Your shader uses too many registers, so it spills them to VRAM and thus your stuff is slow’.
Really helpful, and much better than speculating ‘I've heard state changes can be a performance problem, so maybe I'm affected and I could optimize that’, only to see later that the optimizations do not affect performance at all.