
Tips for increasing fps?

Started by March 05, 2022 06:01 PM
12 comments, last by LtFutt 2 years, 7 months ago

Acosix said:

Dude, you do realize that after 20 minutes of playing, the speed could be too hard to follow?

Yes. But the world is seed-based, so you could practice and get further each time. Eventually, though, it becomes “impossible”.

LtFutt said:
I'm currently looking into what you previously mentioned, the “hierarchical Z pyramid”. Do you still think that could be a good way to do it after seeing the picture?

Yes. But my understanding is only very basic. The idea is similar to how we use occlusion queries. I think it would work somehow like this:

You have clusters of geometry (like your ‘regions’). Each cluster has a bounding box, which you convert to a bounding rectangle in screen space; the rectangle also gets a closest depth value.
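
As a rough sketch of that first step, here is how a cluster's bounding box could be turned into a screen-space rectangle plus closest depth. This is a minimal Python sketch assuming a toy pinhole camera at the origin looking down +z with focal length f; a real engine would use its own view-projection matrix and handle boxes crossing the near plane.

```python
# Minimal sketch: project a cluster's world-space AABB to a
# screen-space bounding rectangle plus its closest depth.
# Assumes a toy pinhole camera at the origin looking down +z with
# focal length f; a real renderer would use its view-projection
# matrix and clip boxes crossing the near plane.

def aabb_screen_bounds(aabb_min, aabb_max, f=1.0, width=640, height=480):
    xs, ys, zs = [], [], []
    for cx in (aabb_min[0], aabb_max[0]):       # enumerate all 8 corners
        for cy in (aabb_min[1], aabb_max[1]):
            for cz in (aabb_min[2], aabb_max[2]):
                # Perspective divide; assumes cz > 0 (box fully in front).
                xs.append(f * cx / cz * width / 2 + width / 2)
                ys.append(f * cy / cz * height / 2 + height / 2)
                zs.append(cz)
    rect = (min(xs), min(ys), max(xs), max(ys))
    return rect, min(zs)  # nearest corner depth: conservative for culling
```

For a box spanning x, y in [-1, 1] at z in [10, 12], this yields a rect of roughly 288..352 by 216..264 pixels with a closest depth of 10.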

You reproject the previous frame's depth to the current time (using the camera transformation, but also motion vectors for non-static stuff, which isn't easy).
Then you build a Z pyramid from the resulting depth. But instead of averaging as with regular mip maps, you keep the max depth of the 4 source texels.
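
One reduction step of that max-depth pyramid can be sketched in a few lines of Python (CPU-side for illustration; on the GPU this would be one compute-shader pass per mip):

```python
# One reduction step of the Hi-Z pyramid. Each texel of the next
# mip keeps the MAX depth of its 4 source texels, not the average
# as a color mip chain would.

def build_max_mip(depth):
    h, w = len(depth), len(depth[0])  # assumes even dimensions
    return [[max(depth[y][x],     depth[y][x + 1],
                 depth[y + 1][x], depth[y + 1][x + 1])
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]
```

Applying it repeatedly gives the full pyramid; e.g. a 4 x 4 depth tile reduces to 2 x 2, then to 1 x 1.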

Then you use the pyramid to cull your clusters.
You take the bounding rectangle of a cluster, see how many pixels the rectangle would cover, and select a mip of the pyramid so you do not need to make too many tests.
For example, a bounding rect of 80 x 40 pixels covers an area of 3200 pixels, which is too much. If we halve the resolution 3 times, we get 10 x 5 = only 50 pixels, which is a nice small number fitting into a workgroup of 64 threads. So we pick mip 3 of the pyramid and do the tests, and if all pyramid depths are closer than the closest depth of the rectangle, we can cull the cluster.
The surviving clusters can be appended to a list, which we then draw using an indirect draw, so there is no need for any CPU <-> GPU sync.
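
The mip selection and the cull test from the example above could look like this. A Python sketch with hypothetical names (select_mip, cluster_occluded), using the convention that larger depth means farther away:

```python
def select_mip(rect_w, rect_h, max_tests=64):
    # Halve the rect until its area fits one workgroup worth of tests.
    mip, w, h = 0, rect_w, rect_h
    while w * h > max_tests:
        w, h = max(1, (w + 1) // 2), max(1, (h + 1) // 2)
        mip += 1
    return mip, w, h

def cluster_occluded(level, x0, y0, w, h, closest_depth):
    # level: the chosen mip of the max-depth pyramid (list of rows).
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            if level[y][x] >= closest_depth:
                return False  # stored depth is farther: cluster may be visible
    return True  # every texel is closer than the cluster: cull it
```

select_mip(80, 40) returns mip 3 with a 10 x 5 footprint, matching the numbers above. On the GPU, each surviving cluster would be appended (e.g. via an atomic counter) into the argument buffer of the indirect draw.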

Besides the Z pyramid, you can also build a hierarchy from the clusters, e.g. a BVH. Then you could cull a whole branch of sub-clusters with a single test on the parent bound.
That's what I'd call a cool and efficient non-brute-force algorithm, but as always: to make it worth it, your initial problem has to be really big. Otherwise the complexity of the neat algorithm causes a higher constant cost than the savings you get.
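
The BVH traversal sketched above, with a hypothetical node layout of (bound, children, cluster_id) and is_occluded standing in for the Hi-Z test described earlier:

```python
# Hierarchical culling over a cluster BVH: if a parent node's bound
# is occluded, the entire subtree is skipped with one test.

def cull_bvh(node, is_occluded, out_visible):
    bound, children, cluster_id = node
    if is_occluded(bound):
        return  # one test rejects the whole branch
    if children:
        for child in children:
            cull_bvh(child, is_occluded, out_visible)
    else:
        out_visible.append(cluster_id)  # leaf cluster survives culling
```

A GPU version would typically turn this recursion into an iterative traversal with an explicit node stack or work queue per thread group.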

I would implement all this with compute shaders.

Some tricks would make sense, like excluding the player ship. As the rest of the world is static, you would not need to mess with motion vectors, and reprojection becomes easier / more robust.
Then there are problems, e.g. on camera rotation you'll miss information at the edges of the screen. Depth at the edges would stay at infinity, and you would end up drawing everything that intersects the edges. To prevent this, I would diffuse valid depth values outwards to cover the edges.
The whole reprojection process isn't perfect even in the middle of the screen; there is a chance of error. Maybe you have noticed single-frame white flashes of missing geometry in UE4 games, e.g. when walking around a corner. That artifact seems to be caused by failures of this system. They are rare, but noticeable. Seemingly they improved this for UE5 with a two-pass approach, which they discuss in detail in the Nanite Deep Dive video.

That said, you see it's a lot of work and new stuff, I guess. It's no longer just fun; it becomes hard work, taking a lot of time. Not sure if that's worth it. And somehow I think there is indeed another issue causing your perf problems, one which might be easier to fix.

LtFutt said:
I also compile the regions so that in the next frame i can just call a compile list.

‘Compiling’ sounds like you might be using the ancient OpenGL display lists? You don't do that, do you?
You do not upload vertex data for your persisting regions each frame?
You do frustum culling, I guess?
And could you eventually just reduce the draw distance?

LtFutt said:
I would like that “code snippet” yes.

It's easy to use. But your density values need to be real numbers in the range 0 - 1, not a boolean which just says solid or empty. So it depends on your data - is it real-valued?


JoeJ said:
Yes. But my understanding is only very basic. The idea is similar to how we use occlusion queries. […]

Thank you so much for helping me out!! I really appreciate it. I will have to spend the evening seeing if I can understand any of it!! =)

This topic is closed to new replies.
