24bit depthbuffer is a sub-optimal format?

Author

545

August 19, 2017 11:58 AM

In DirectX 11 we have a 24 bit integer depth + 8 bit stencil format for depth-stencil resources ( DXGI_FORMAT_D24_UNORM_S8_UINT ). However, AMD GPU documentation for consoles that I have seen mentions that internally this format is implemented as a 64 bit resource, with 32 bits for depth (truncated to 24 bits) and 32 bits for stencil (truncated to 8 bits). AMD recommends using a 32 bit floating point depth buffer with 8 bit stencil instead, which is this format: DXGI_FORMAT_D32_FLOAT_S8X24_UINT.

Does anyone know why this is? What is the usual way of doing this, just follow the recommendation and use a 64 bit depthstencil? Are there performance considerations or is it just recommended to not waste memory? What about Nvidia and Intel, is using a 24 bit depthbuffer relevant on their hardware?

Cheers!

Wicked Engine

Hodgman

52,718

August 19, 2017 01:03 PM

Yeah AFAIK AMD doesn't even support 24bit depth buffers in hardware. They might have some hack where they use the 24bit mantissa of a 32bit float buffer, but in any case it's basically emulation of an old legacy feature.

The default depth format these days should be 32bit floating point.

On a side note, you should combine your floating point depth buffer with a projection matrix that maps the far plane to 0 and the near plane to 1 (e.g. swap the near/far params that are being fed into your projection-matrix construction function) and use GEqual depth comparison instead of LEqual.

32bit floating point depth buffers, when combined with these reversed projection matrices, produce amazing precision. See here: https://developer.nvidia.com/content/depth-precision-visualized

I'm not sure about Intel/NVidia for performance / memory use trade-offs... It may be that Intel uses different packing to AMD, etc.
However, there's a massive quality difference between doing the "reversed float" format above and using the traditional 24-bit format. The old way results in huge amounts of z-fighting, forcing you to always be tweaking your near/far values to hide it, while the new way practically solves z-fighting.

. 22 Racing Series .

turanszkij

Author

545

August 19, 2017 02:47 PM

Yes, I am aware of the reversed depth buffer trick. I wanted to try it once, but it was a bit more work than I was expecting, so I put it off for later. I had also switched to the 24bit depthbuffer, which wouldn't really benefit from it.

But I will definitely try it again now that I have switched back to floats.

Wicked Engine

Adam Miles

3,468

August 19, 2017 02:50 PM

AMD hardware (GCN at least) allocates two separate planes for Depth and Stencil. D24/D32 reside in a 32-bit plane and S8 resides in a separate 8-bit plane. There's no Depth-Stencil format that has the memory footprint of 64-bits per sample that I'm aware of.
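To put rough numbers on the planar layout described above, here is a back-of-the-envelope sketch. It assumes a 1920x1080 single-sample surface and ignores alignment, tiling padding and any compression metadata a real driver may add, so treat the figures as lower bounds rather than what a GPU actually allocates. `surfaceBytes`, `kPlanarBits` and `kPackedBits` are made-up names for illustration.

```cpp
#include <cstdint>

// Raw payload size of a width x height surface at a given number of bits per
// sample. Assumption: no alignment, padding, or metadata is counted.
constexpr std::uint64_t surfaceBytes(std::uint64_t width,
                                     std::uint64_t height,
                                     std::uint64_t bitsPerSample) {
    return width * height * bitsPerSample / 8;
}

// Separate planes: 32-bit depth plane + 8-bit stencil plane.
constexpr std::uint64_t kPlanarBits = 32 + 8;
// Hypothetical packed layout where the X24 padding really is allocated.
constexpr std::uint64_t kPackedBits = 64;

// 1920 * 1080 = 2,073,600 samples.
static_assert(surfaceBytes(1920, 1080, kPlanarBits) == 10368000,
              "5 bytes per sample");
static_assert(surfaceBytes(1920, 1080, kPackedBits) == 16588800,
              "8 bytes per sample");
```

Under these assumptions the planar layout costs about 10.4 MB against about 16.6 MB for a truly packed 64-bit format.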

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

turanszkij

Author

545

August 19, 2017 03:11 PM

20 minutes ago, ajmiles said:
AMD hardware (GCN at least) allocates two separate planes for Depth and Stencil. D24/D32 reside in a 32-bit plane and S8 resides in a separate 8-bit plane. There's no Depth-Stencil format that has the memory footprint of 64-bits per sample that I'm aware of.

Hm, this is confusing, I thought this: S8X24_UINT from the format declaration meant that the stencil is 8bit and 24 bit is unused.

Wicked Engine

Adam Miles

3,468

August 19, 2017 03:15 PM

If it's unused/unavailable and the API gives you no way to write to it or read from it then the IHVs are free to not allocate it. The 'X24' part isn't necessarily allocated (but it might be). There are no guarantees whether it is or not.

On DX12 you can use the GetResourceAllocationInfo API to query how much memory a particular driver/GPU needs to allocate for a surface and you'll be able to see whether DXGI_FORMAT_D32_FLOAT_S8X24_UINT is 64-bit or not.

Adam Miles - Principal Software Development Engineer - Microsoft Xbox Advanced Technology Group

Matias Goldberg

9,638

August 19, 2017 04:03 PM

51 minutes ago, turanszkij said:
Hm, this is confusing, I thought this: S8X24_UINT from the format declaration meant that the stencil is 8bit and 24 bit is unused.

ajmiles has already explained it, but I just wanted to put it in clearer words: The GPU only has to pretend you get what you ask for via DirectX. It doesn't have to do it exactly that way internally, as long as it produces the same results.

Twitter: @matiasgoldberg

Distant Souls | Alliance AirWar | My Free Royalty-Free Music Library

turanszkij

Author

545

August 19, 2017 06:24 PM

Thanks guys, I learned something new today!

Wicked Engine

ErnieDingo

619

August 19, 2017 10:25 PM

9 hours ago, Hodgman said:
Yeah AFAIK AMD doesn't even support 24bit depth buffers in hardware. They might have some hack where they use the 24bit mantissa of a 32bit float buffer, but in any case it's basically emulation of an old legacy feature.
The default depth format these days should be 32bit floating point.
On a side note, you should combine your floating point depth buffer with a projection matrix that maps the far plane to 0 and the near plane to 1 (e.g. swap the near/far params that are being fed into your projection-matrix construction function) and use GEqual depth comparison instead of LEqual.
32bit floating point depth buffers, when combined with these reversed projection matrices, produce amazing precision. See here: https://developer.nvidia.com/content/depth-precision-visualized
I'm not sure about Intel/NVidia for performance / memory use trade-offs... It may be that Intel uses different packing to AMD, etc.
However, there's a massive quality difference between doing the "reversed float" format above and using the traditional 24-bit format. The old way results in huge amounts of z-fighting, forcing you to always be tweaking your near/far values to hide it, while the new way practically solves z-fighting.

Is this only a trick you apply to the Projection matrix? Or is there something more to this? I'm struggling to come to terms with the impact of making this change, or where to start. I see Z fighting from afar (thankfully my camera is zoomed in most of the time), so it might not be worth my while in my current project.

Indie game developer - Game WIP

Strafe (Working Title) - Currently in need of another developer and modeler/graphic artist (professional & amateur's artists welcome)

Insane Software Facebook

Hodgman

52,718

August 20, 2017 03:50 AM

On 8/20/2017 at 8:25 AM, ErnieDingo said:
Is this only a trick you apply to the Projection matrix? Or is there something more to this? I'm struggling to come to terms with the impact of making this change, or where to start. I see Z fighting from afar (thankfully my camera is zoomed in most of the time), so it might not be worth my while in my current project.

If Z-fighting is an issue, then yeah I'd definitely recommend doing this.

Make sure you're using a 32_FLOAT depth format.
Swap the near/far params of your projection matrix creation code (e.g. instead of Projection(fov, aspect, near, far), use Projection(fov, aspect, far, near)).
Swap your depth comparison function (e.g. replace LESS_EQUAL with GREATER_EQUAL).
If you have any shaders that read the depth buffer (e.g. deferred lighting reconstructing positions from depth), then fix the bugs that this has introduced to that code.
[edit] Clear your depth buffer to 0.0f instead of 1.0f. [/edit]
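A minimal sketch of the steps above, assuming a D3D-style left-handed projection with row vectors (clip = v * M) and a [0, 1] depth range. `Mat4`, `perspectiveFovLH`, `projectedDepth` and `viewZFromDepth` are hypothetical helpers written for illustration; they are not DirectX calls and not code from any engine mentioned in this thread.

```cpp
#include <cmath>

struct Mat4 { float m[4][4]; };

// D3D-style left-handed perspective matrix, row-vector convention.
// Passing (zFar, zNear) instead of (zNear, zFar) gives the reversed mapping.
Mat4 perspectiveFovLH(float fovY, float aspect, float zNear, float zFar) {
    const float yScale = 1.0f / std::tan(fovY * 0.5f);
    const float xScale = yScale / aspect;
    const float q = zFar / (zFar - zNear);
    Mat4 r = {};            // all elements start at zero
    r.m[0][0] = xScale;
    r.m[1][1] = yScale;
    r.m[2][2] = q;
    r.m[2][3] = 1.0f;       // copies view-space z into clip-space w
    r.m[3][2] = -zNear * q;
    return r;
}

// Depth (clip z / clip w) that a view-space point (0, 0, zView, 1) receives.
float projectedDepth(const Mat4& p, float zView) {
    const float clipZ = zView * p.m[2][2] + p.m[3][2];
    const float clipW = zView * p.m[2][3] + p.m[3][3];
    return clipZ / clipW;
}

// Inverse of projectedDepth, i.e. what a shader reading the depth buffer
// needs: depth = (z*A + B) / z  =>  z = B / (depth - A).
// Because A and B come from the matrix, this works for either mapping.
float viewZFromDepth(const Mat4& p, float depth) {
    return p.m[3][2] / (depth - p.m[2][2]);
}
```

With near = 0.1 and far = 1000, `perspectiveFovLH(fov, aspect, 0.1f, 1000.0f)` maps the near plane to 0 and the far plane to 1, while `perspectiveFovLH(fov, aspect, 1000.0f, 0.1f)` maps the near plane to 1 and the far plane to 0. The reversed matrix then has to be paired with a GREATER_EQUAL depth test and a clear value of 0.0f, as listed above.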

The link I posted earlier explains why this is magic.

But quickly -- z-buffers store z/w, which is a hyperbolic curve that focuses most precision on values that are close to the near plane (something like 50% of your depth buffer values cover the range of (near, 2*near]!!), and floating point formats do a similar thing -- they're a logarithmic format that focuses most precision on values that are close to zero. If you simply use a floating point format to store z/w, you make the problem twice as bad -- you've got two different encodings that both focus on making sure that values next to the near plane are perfect, and do a bad job of values next to the far plane... So if you invert one of the encodings (by mapping the far plane to zero), then you've now got two encodings that are fighting against each other -- the z/w hyperbolic curve is fighting to focus precision towards the near plane, and the floating point logarithmic curve is fighting to focus precision towards 0.0f (which we've mapped to the far plane). The result is that you end up with an almost linear distribution of values between near and far, and great precision at every distance.
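The effect can be estimated numerically. The sketch below asks: at a given view-space distance, how far must a surface move before the stored depth value changes by one step? It only models quantization of the stored value (the mapping itself is evaluated in double precision), so it ignores rounding in the vertex transform and rasterizer and should be read as a rough illustration. All function names and the `DepthRange` struct are made up for this example.

```cpp
#include <cmath>

struct DepthRange { double zNear; double zFar; };

// Standard mapping: near -> 0, far -> 1.
double standardDepth(double z, DepthRange r) {
    return r.zFar / (r.zFar - r.zNear) * (1.0 - r.zNear / z);
}
double standardViewZ(double d, DepthRange r) {
    return r.zNear * r.zFar / (r.zFar - d * (r.zFar - r.zNear));
}

// Reversed mapping: near -> 1, far -> 0.
double reversedDepth(double z, DepthRange r) {
    return r.zNear / (r.zFar - r.zNear) * (r.zFar / z - 1.0);
}
double reversedViewZ(double d, DepthRange r) {
    return r.zNear * r.zFar / (r.zNear + d * (r.zFar - r.zNear));
}

// View-space distance covered by one step of a 24-bit UNORM depth value.
double stepUnorm24Standard(double z, DepthRange r) {
    const double maxCode = 16777215.0;  // 2^24 - 1
    const double code = std::round(standardDepth(z, r) * maxCode);
    return standardViewZ((code + 1.0) / maxCode, r) -
           standardViewZ(code / maxCode, r);
}

// One float ULP with the standard mapping (farther means larger depth).
double stepFloatStandard(double z, DepthRange r) {
    const float stored = static_cast<float>(standardDepth(z, r));
    const float next = std::nextafter(stored, 2.0f);
    return standardViewZ(next, r) - standardViewZ(stored, r);
}

// One float ULP with the reversed mapping (farther means smaller depth).
double stepFloatReversed(double z, DepthRange r) {
    const float stored = static_cast<float>(reversedDepth(z, r));
    const float next = std::nextafter(stored, -1.0f);
    return reversedViewZ(next, r) - reversedViewZ(stored, r);
}
```

By my arithmetic, with near = 0.1 and far = 1000, a surface at z = 900 has a step of roughly half a world unit for both the 24-bit UNORM buffer and the non-reversed float buffer, while the reversed float buffer resolves differences several orders of magnitude smaller.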

. 22 Racing Series .
