The integer operations they're referring to happen in the shader core, so they directly correspond to integer usage in shader code. The #1 reason is indexing to read data from buffers: as GPUs and shading languages have gotten more sophisticated, shader programs have taken on a more active role in pulling data from data structures rather than being spoon-fed things like matrices and vertices. So you will end up having things like instanced meshes pulling their per-instance transform from a StructuredBuffer in the vertex shader, or perhaps even going through an indirection buffer first in order to allow the per-instance data to remain constant after culling. You may also have compute shaders that do the actual culling, which would require integer operations both for reading in the data used for culling and for writing out the culling results to a buffer. There's also using bitwise operations for computing masks or unpacking data, looking up lighting data from per-tile lists or bitfields, traversing tree structures for computing intersections…the list is very long.
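To make that concrete, here's a minimal sketch of a vertex shader pulling its per-instance transform through an indirection buffer. The buffer names and layouts here are hypothetical, but the integer indexing is the point:

```hlsl
// Minimal sketch: per-instance transform fetched through an indirection buffer.
// Names/layouts are made up for illustration.
cbuffer PerFrame
{
    float4x4 ViewProjection;
};

StructuredBuffer<uint> VisibleInstanceIndices;   // indirection buffer written by a culling pass
StructuredBuffer<float4x4> InstanceTransforms;   // per-instance world transforms

struct VSInput
{
    float3 Position : POSITION;
    uint InstanceID : SV_InstanceID;
};

float4 VSMain(VSInput input) : SV_Position
{
    // Both lookups are integer indexing operations in the shader core
    uint instanceIdx = VisibleInstanceIndices[input.InstanceID];
    float4x4 world = InstanceTransforms[instanceIdx];
    float4 worldPos = mul(float4(input.Position, 1.0f), world);
    return mul(worldPos, ViewProjection);
}
```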
The “advantage” of integers is that they're integers! They exactly represent all integer values in their range, unlike floats, which can exactly represent some integers but not most. So if you need to do something like “multiply the index by the size of the struct to get a byte offset you can use with ByteAddressBuffer”, you'll get an exact result from integer math. There are also some things that can only be done naturally with integer math, like logical bitwise operations and shifts.
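As a rough example of both of those patterns in HLSL (the buffer name, stride, and packing scheme are assumptions made up for this sketch):

```hlsl
// Sketch only: exact byte offsets via uint math, plus bitwise unpacking.
ByteAddressBuffer Particles;               // name/layout assumed for illustration
static const uint ParticleStride = 32;     // size of one particle struct in bytes

float3 LoadParticlePosition(uint particleIndex)
{
    // Exact integer multiply; a float multiply could round for large indices
    uint byteOffset = particleIndex * ParticleStride;
    return asfloat(Particles.Load3(byteOffset));
}

void UnpackTexCoord(uint packed, out float u, out float v)
{
    // Bitwise ops: pull two 16-bit halves out of one 32-bit word
    u = float(packed & 0xFFFF) / 65535.0f;
    v = float(packed >> 16) / 65535.0f;
}
```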
On many GPUs still out in the wild it is common for integer operations to run more slowly than floating-point operations, and/or for there to be more limited resources for doing the integer operations. Getting about ¼ integer throughput compared to floating point is not uncommon, although it depends on the exact operation as well as the specific GPU architecture. For instance, on some GPUs an integer add is full speed but an integer multiply is slower.
Lower-precision types like FP16 and INT16 have three main advantages:
- They are smaller, so you can pack more of them into your data structures
- Typically they consume less space in the register file when they're natively supported, which helps avoid register pressure (which can lead to low occupancy)
- They are often faster than their full-precision counterparts; getting 2x throughput relative to 32-bit is pretty common
Using them depends on the API and shader language you're using. There are a few different flavors, but typically you need to make some kind of modification to your shader code to indicate that you are okay with lower precision. You generally don't want to just use the lower-precision types everywhere, because some calculations won't fit within their precision limits. You also usually need to check whether the GPU you're running on actually supports lower-precision types, since on desktop they're fairly new (on mobile, fp16 support is quite common). If you're interested, I wrote an article about using fp16 in HLSL/D3D that should give you a sense of how it works: https://therealmjp.github.io/posts/shader-fp16/
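As a very rough sketch of what that looks like in HLSL (this assumes compiling with DXC's -enable-16bit-types flag under Shader Model 6.2+, and a GPU/driver that reports native 16-bit support):

```hlsl
// Sketch only: true 16-bit types require -enable-16bit-types (DXC, SM 6.2+)
// and a device that reports native 16-bit shader op support.
float16_t3 ApplyTint(float16_t3 albedo, float16_t3 tint)
{
    // Low-dynamic-range color math like this fits comfortably in fp16;
    // something like a world-space position generally does not.
    return albedo * tint;
}
```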
If you haven't read through it, this is a pretty good overview of how GPUs work: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
It explains the reason for the “pixel shader quads” thing that you mentioned (spoiler: it's for computing the mip level to sample), as well as a bunch of other important bits.
I also always recommend this presentation, which talks about how shader threads actually execute on GPU SIMD hardware: https://engineering.purdue.edu/~smidkiff/KKU/files/GPUIntro.pdf
Another good article on that subject: https://anteru.net/blog/2018/intro-to-compute-shaders/
If you're willing to purchase a book, then “Real-Time Rendering, 4th Edition” is my go-to suggestion.