PolarWolf said:
This is what I mean: maybe in the vertex shader, or after it but before the fragment shader, when all 3 vertices of a triangle are out of view, the pipeline culls that triangle; and if a triangle is only partially visible, the pipeline clips away the invisible part. In both cases the culling happens once per triangle, which is why I assume a lower triangle count makes the pipeline cull more efficiently.
That's right, yes. Such culling can only happen after the vertex shader, because only then is it known whether a triangle is fully on screen, clipped by the screen edges, or entirely off screen.
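For intuition, here is a minimal sketch of the "trivial reject" test that stage effectively performs (the struct and function names are mine, not any real API): a triangle can be discarded when all three of its clip-space vertices lie outside the same frustum plane.

```cpp
// Clip-space position as output by the vertex shader.
struct Vec4 { float x, y, z, w; };

// A point is inside the clip volume when -w <= x,y,z <= w (OpenGL-style).
bool outsidePlane(const Vec4& v, int plane) {
    switch (plane) {
        case 0: return v.x < -v.w;  // left
        case 1: return v.x >  v.w;  // right
        case 2: return v.y < -v.w;  // bottom
        case 3: return v.y >  v.w;  // top
        case 4: return v.z < -v.w;  // near
        case 5: return v.z >  v.w;  // far
    }
    return false;
}

// True if the whole triangle can be thrown away without rasterizing anything:
// all three vertices fail the same plane test.
bool triviallyRejected(const Vec4& a, const Vec4& b, const Vec4& c) {
    for (int plane = 0; plane < 6; ++plane)
        if (outsidePlane(a, plane) && outsidePlane(b, plane) && outsidePlane(c, plane))
            return true;
    return false;
}
```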
But on GPUs such clipping is really irrelevant to us, and no motivation for specific optimisations or compromises related to content. We don't need to know precisely how this clipping process works at all.
As with anything that works using parallel processing, we always have to work and optimize on larger batches, not single units. So we cull either entire models or clusters of many triangles.
And we do this only coarsely. For example, testing a bounding box against the frustum to cull an entire model is quick and simple, and thus worth it.
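A minimal sketch of that coarse test (the types and names here are mine, not from any particular engine): check the model's world-space AABB against the six frustum planes and skip the whole model if it lies fully outside any one of them.

```cpp
#include <array>

struct Vec3  { float x, y, z; };
struct Plane { Vec3 n; float d; };   // plane equation dot(n, p) + d = 0, n points into the frustum
struct AABB  { Vec3 min, max; };

bool aabbOutsidePlane(const AABB& box, const Plane& p) {
    // Pick the box corner farthest along the plane normal; if even that
    // corner is behind the plane, the whole box is outside.
    Vec3 corner {
        p.n.x >= 0.0f ? box.max.x : box.min.x,
        p.n.y >= 0.0f ? box.max.y : box.min.y,
        p.n.z >= 0.0f ? box.max.z : box.min.z,
    };
    return p.n.x * corner.x + p.n.y * corner.y + p.n.z * corner.z + p.d < 0.0f;
}

bool aabbInFrustum(const AABB& box, const std::array<Plane, 6>& frustum) {
    for (const Plane& p : frustum)
        if (aabbOutsidePlane(box, p))
            return false;  // fully outside one plane -> cull the entire model
    return true;           // conservative: some boxes near frustum corners pass anyway
}
```

Note the test is deliberately conservative: a box near a frustum corner may pass even though it's not visible, which is fine, because the per-triangle hardware clipping described above cleans up the rest.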
By contrast, if we used a more accurate and complicated test, e.g. like Quake's BSP levels, we would get very accurate culling, almost per triangle, but the complexity and fine granularity of the algorithm would cost us more than we save.
Then, on a modern GPU, it would be faster to do no culling at all and render the entire Quake level as a single mesh.
But: notice how the introduction of GPUs has, over decades, educated us to replace work-efficient optimization with dumb, simple brute-force solutions. This worked until now, but with Moore's Law dead we need to change our mindset again, going back to complex algorithms to achieve further progress in software while progress on hardware stagnates and becomes unaffordable. Thus, this quote:
PolarWolf said:
As for occlusion culling, I think the overhead will be too much
Is not actually true in general. It's the opposite: we now need to research complicated algorithms again to get further progress. UE5's Nanite is again a good example.
But the effort and complexity are much higher now than they were back when Quake was awesome.
So you have to choose: do I want to make a game, or do I want to make cutting-edge rendering tech?
Likely you don't have the time for both, so I try to guide you towards using the simple solutions first and seeing if you get good enough performance for your goals. Usually that should be the case.
This problem is also part of why so many people nowadays use off-the-shelf engines: that way they outsource the technical problems to experts who work full time on just that.
As you work on your own engine, you need to find some middle ground. You likely can't expect to beat those engines, but you gain a lot of experience and flexibility. And it's a good long-term investment, I think.