
Software rasterization for occlusion culling using planes?

Started by March 05, 2018 01:47 PM
5 comments, last by JoeJ 6 years, 10 months ago

In all presentations regarding software occlusion culling that I have stumbled upon, there is mention of mipmapping the software-rasterized depth buffer to speed up queries. Given that a bounding box covers, say, 500 pixels on the screen, it's better (faster) to test a single pixel from a higher (low-res) mip level than to test many pixels in the lower (high-res) mip level. But these mipmaps are usually generated using a conservative min or max filter, which greatly reduces occlusion culling efficiency.

So an idea popped into my head. Why not, in some cases, instead of storing the min or max of depth values, store the plane equation of the triangle covering the area? That would greatly help in cases where we have long corridors at angles oblique to the camera. There, a min filter would act very poorly, whereas a stored plane equation would be exact.
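To make the idea concrete, here's a minimal sketch (names and layout are my own, not from any of the presentations) of testing a bounding box against a coarse texel that stores a plane instead of a min-depth, assuming a depth convention where larger z is farther:

```cpp
#include <cstdint>

// Hypothetical coarse-texel record: the plane of the triangle covering
// this texel, stored as z = z0 + ddx*x + ddy*y, with (x, y) relative to
// the texel's top-left corner.
struct PlaneTexel {
    float z0;    // depth at the texel origin
    float ddx;   // depth gradient per pixel in x
    float ddy;   // depth gradient per pixel in y
};

// Depth of the occluder at an arbitrary sub-texel position. For an
// oblique corridor this is exact, whereas a min-filtered mip would have
// to report the nearest depth of the whole footprint.
float occluderDepthAt(const PlaneTexel& t, float x, float y) {
    return t.z0 + t.ddx * x + t.ddy * y;
}

// Conservative visibility: the box is occluded only if its nearest depth
// is behind the plane across the whole footprint, i.e. behind the
// plane's farthest point over the texel (evaluate the corners).
bool boxOccludedByTexel(const PlaneTexel& t, float texelSize, float boxNearZ) {
    float z00 = occluderDepthAt(t, 0.0f, 0.0f);
    float z10 = occluderDepthAt(t, texelSize, 0.0f);
    float z01 = occluderDepthAt(t, 0.0f, texelSize);
    float z11 = occluderDepthAt(t, texelSize, texelSize);
    float planeMax = z00;                 // farthest point of the plane
    if (z10 > planeMax) planeMax = z10;   // (larger z = farther here)
    if (z01 > planeMax) planeMax = z01;
    if (z11 > planeMax) planeMax = z11;
    return boxNearZ > planeMax;           // box entirely behind the occluder
}
```

With a min filter, the whole texel would report the nearest corner's depth, so a box at the far end of the corridor would fail the test; the plane form rejects it correctly.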

A variant of this idea was used in a GPU Pro article about speeding up shadow map testing: depending on the situation, either the min/max of Z is stored in a helper shadow map, or the plane equation of the underlying geometry.

I am wondering if anyone here has used or tried using this approach and what the outcomes were.

I have tried this for a different use case, but the problem is that the plane you pick might intersect another plane within the same texel, so you either store multiple planes or compute a new plane that bounds the farthest texel-corner intersections.

What I do for occlusion culling is use scanline spans instead (half vertical screen resolution, full horizontal). Testing a span is a simple 2D line intersection on the z coordinates, so it's faster than testing hundreds of individual pixels.


AFAIK, there are GPUs that actually do this as part of their internal Hi-Z implementations, so it's definitely not crazy.

There are loads of options, too. Instead of the typical 4-value form of a plane equation, you can store z, ddx(z) and ddy(z) for one of the pixels, plus a pixel index. To handle edge cases slightly better, you can store multiple planes and a plane index per pixel.

e.g. you could have an 8x8 tile, two planes with 16-bit z, ddx(z) and ddy(z), and a 6-bit pixel index, and then 8x8 1-bit plane indices, which would be very accurate in many cases but a fraction of the storage of a float per pixel.
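A sketch of what such a packed tile could look like (field widths follow the example above; everything else, including the 16-bit fixed-point depth in [0,1] and the names, is my own assumption):

```cpp
#include <cstdint>

// Packed 8x8 tile: two planes, each with 16-bit z and gradients, a
// 6-bit anchor pixel index per plane, and one selector bit per pixel.
// 24 bytes total versus 256 bytes of raw float depth for the tile.
struct PackedTile {
    uint16_t z[2], ddx[2], ddy[2]; // per-plane depth + gradients (fixed point)
    uint8_t  anchor[2];            // 6-bit pixel index the plane is anchored at
    uint64_t planeMask;            // 8x8 1-bit plane selectors, row-major
};

// Reconstruct the depth at pixel (px, py) inside the tile.
float tileDepth(const PackedTile& t, int px, int py) {
    int p  = int((t.planeMask >> (py * 8 + px)) & 1); // plane covering this pixel
    int ax = t.anchor[p] & 7, ay = t.anchor[p] >> 3;  // anchor pixel coords
    int32_t z = int32_t(t.z[p])
              + int32_t(int16_t(t.ddx[p])) * (px - ax)  // signed gradients
              + int32_t(int16_t(t.ddy[p])) * (py - ay);
    return float(z) / 65535.0f;                        // back to [0,1] depth
}

// Example: plane 0 anchored at pixel (0,0), depth ~0.5, sloping +100
// fixed-point units per pixel in x; all pixels select plane 0.
const PackedTile kExampleTile = {{32768, 0}, {100, 0}, {0, 0}, {0, 0}, 0};
```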

The goal is to fit into caches: if plane equations let you reduce the resolution enough to use less memory than a downscaled depth buffer, it would be worth it. In a good case you'd want to fit into L1; e.g. on AMD Jaguar you might go for 128x64x16-bit twice (for minZ|maxZ). If your plane equations get below that, then you only need to benchmark the rejection rate.
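A back-of-envelope check of those numbers: two 128x64 buffers of 16-bit depth (one for minZ, one for maxZ) against the 32 KiB L1 data cache of an AMD Jaguar core. They fill it exactly, so a plane-equation representation would need to come in below this to win on cache footprint:

```cpp
#include <cstddef>
#include <cstdint>

// minZ and maxZ at 128x64, 16 bits each.
constexpr std::size_t kHiZBytes = 128 * 64 * sizeof(std::uint16_t) * 2;
static_assert(kHiZBytes == 32 * 1024, "minZ|maxZ exactly fills a 32 KiB L1D");
```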

But testing is usually the smaller problem; generating the occlusion buffer is where the cost and complexity are. If you want faster testing, you can simply go for more mip levels of minZ|maxZ.


If you want faster testing, you can simply go for more mip levels of minZ|maxZ.

But that's the problem: at high mip levels, minZ/maxZ becomes *very* crude.


But testing is usually the smaller problem; generating the occlusion buffer is where the cost and complexity are.

Sure, but the *real* cost is spitting out draw calls for objects that are occluded yet pass the occlusion test due to the very crude high-mip minZ/maxZ.


What I do for occlusion culling is use scanline spans instead (half vertical screen resolution, full horizontal). Testing a span is a simple 2D line intersection on the z coordinates, so it's faster than testing hundreds of individual pixels.

Could you elaborate a little more on this?

26 minutes ago, maxest said:

Could you elaborate a little more on this?

I have a traditional rasterizer that walks the left/right polygon edges top-down and calculates the start and end point of each scanline.

Instead of drawing that line pixel by pixel, I intersect it with the scanlines already in the framebuffer. It may be (partially) visible, in which case I store or replace start and end points in the framebuffer, or it may be occluded. So instead of testing all pixels, I test only against longer spans until one is visible. I think Quake 1 used a similar technique for HSR, resulting in zero overdraw. (Note that with a BSP and guaranteed front-to-back rendering, the intersection is not necessary; I used this in an older engine.)
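For illustration, a minimal sketch of how such a span test could look (my own naming and data layout, assuming sorted, non-overlapping spans per scanline and larger z = farther). Since both depths are linear in x, their difference is linear too, so checking the endpoints of each overlap interval is sufficient; two lines can only swap order at their single intersection point:

```cpp
#include <vector>

// One occluder span on a scanline; z varies linearly from x0 to x1.
struct Span { float x0, x1, z0, z1; };

float spanZ(const Span& s, float x) {              // linear depth along the span
    float t = (x - s.x0) / (s.x1 - s.x0);
    return s.z0 + t * (s.z1 - s.z0);
}

// A query span is fully occluded if, over its whole x-range, stored
// spans cover it with no gaps and lie nearer at every point.
bool spanOccluded(const std::vector<Span>& line, const Span& q) {
    float x = q.x0;                                 // left edge still unproven
    for (const Span& s : line) {                    // spans sorted left to right
        if (s.x1 <= x) continue;                    // ends before the frontier
        if (s.x0 > x) return false;                 // gap: query is visible there
        float xEnd = (s.x1 < q.x1) ? s.x1 : q.x1;   // overlap interval [x, xEnd]
        if (spanZ(q, x) < spanZ(s, x) || spanZ(q, xEnd) < spanZ(s, xEnd))
            return false;                           // query pokes in front
        x = xEnd;
        if (x >= q.x1) return true;                 // whole query covered
    }
    return false;                                   // ran out of occluder spans
}
```

The early-out on the first visible point is what makes this cheap compared to walking hundreds of pixels, and it is also why the approach resists parallelization, as noted below.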

Actually, I use an octree for coarse front-to-back order, and in one pass I add occluder polys and also test bounding boxes, appending stuff to a visible list. The advantage is that a wall occludes not only the geometry behind it but also the occluders there. The system is very work-efficient but not that cache-friendly and not well parallelizable (like anything that relies heavily on early termination to avoid unnecessary work). It's good for a city where houses have full interiors visible through windows, but it's overkill for many game-like scenes.

However, the span idea might be good for a simpler algorithm that first renders a sparse set of good occluders and then tests geometry against the result. This could use multithreading and tiles for cache efficiency.


Edit:

IIRC my framebuffer stores an offset to the next discontinuity (span start/end) and a Z value for each pixel; most pixels are unused. I failed to do this efficiently with a list approach that would pack discontinuities in left-to-right order so a scanline would be cache-friendly, but I might try again sometime...

This topic is closed to new replies.
