The integer operations they're referring to happen in the shader core, so they directly correspond to integer usage in shader code. The #1 reason is indexing to read data from buffers: as GPUs and shading languages have gotten more sophisticated, shader programs have taken on a more active role in pulling data from data structures rather than being spoon-fed things like matrices and vertices. So you will end up having things like instanced meshes pulling their per-instance transform from a StructuredBuffer in the vertex shader, or perhaps even going through an indirection buffer first in order to allow the per-instance data to remain constant after culling. You may also have compute shaders that do the actual culling, which would require integer operations both for reading in the data used for culling and for writing out the culling results to a buffer. There's also using bitwise operations for computing masks or unpacking data, looking up lighting data from per-tile lists or bitfields, traversing tree structures for computing intersections…the list is very long.
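To make that concrete, here's a minimal sketch of a vertex shader pulling its per-instance transform through an indirection buffer. The buffer names and layouts here are hypothetical, but the integer indexing is the point:

```hlsl
// Minimal sketch: per-instance transform fetched through an indirection buffer.
// Names/layouts are made up for illustration.
cbuffer PerFrame
{
    float4x4 ViewProjection;
};

StructuredBuffer<uint> VisibleInstanceIndices;   // indirection buffer written by a culling pass
StructuredBuffer<float4x4> InstanceTransforms;   // per-instance world transforms

struct VSInput
{
    float3 Position : POSITION;
    uint InstanceID : SV_InstanceID;
};

float4 VSMain(VSInput input) : SV_Position
{
    // Both lookups are integer indexing operations in the shader core
    uint instanceIdx = VisibleInstanceIndices[input.InstanceID];
    float4x4 world = InstanceTransforms[instanceIdx];
    float4 worldPos = mul(float4(input.Position, 1.0f), world);
    return mul(worldPos, ViewProjection);
}
```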
The “advantage” of integers is that they're integers! They exactly represent all integer values in their range, unlike floats, which can exactly represent some integers but not most. So if you need to do something like “multiply the index by the size of the struct to get a byte offset you can use with ByteAddressBuffer”, you'll get an exact result from integer math. There are also some things that can only be done naturally with integer math, like logical bitwise operations and shifts.
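As a rough example of both of those patterns in HLSL (the buffer name, stride, and packing scheme are assumptions made up for this sketch):

```hlsl
// Sketch only: exact byte offsets via uint math, plus bitwise unpacking.
ByteAddressBuffer Particles;               // name/layout assumed for illustration
static const uint ParticleStride = 32;     // size of one particle struct in bytes

float3 LoadParticlePosition(uint particleIndex)
{
    // Exact integer multiply; a float multiply could round for large indices
    uint byteOffset = particleIndex * ParticleStride;
    return asfloat(Particles.Load3(byteOffset));
}

void UnpackTexCoord(uint packed, out float u, out float v)
{
    // Bitwise ops: pull two 16-bit halves out of one 32-bit word
    u = float(packed & 0xFFFF) / 65535.0f;
    v = float(packed >> 16) / 65535.0f;
}
```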
On many GPUs still out in the wild it is common for integer operations to run more slowly than floating-point operations, and/or for there to be more limited resources for doing the integer operations. Getting about ¼ integer throughput compared to floating point is not uncommon, although it depends on the exact operation as well as the specific GPU architecture. For instance, on some GPUs an integer add is full speed but an integer multiply is slower.
Lower-precision types like FP16 and INT16 have three main advantages:
- They are smaller, so you can pack more of them into your data structures
- Typically they consume less space in the register file when they're natively supported, which helps avoid register pressure (which can lead to low occupancy)
- They are often faster than their full-precision counterparts; getting 2x throughput relative to 32-bit is pretty common
Using them depends on the API and shader language you're using. There are a few different flavors, but typically you need to make some kind of modification to your shader code to indicate that you are okay with lower precision. You generally don't want to just use the lower-precision types everywhere, because some calculations won't fit within their precision limits. You also usually need to check whether the GPU you're running on actually supports lower-precision types, since on desktop they're fairly new (on mobile, fp16 support is quite common). If you're interested, I wrote an article about using fp16 in HLSL/D3D that should give you a sense of how it works: https://therealmjp.github.io/posts/shader-fp16/
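As a very rough sketch of what that looks like in HLSL (this assumes compiling with DXC's -enable-16bit-types flag under Shader Model 6.2+, and a GPU/driver that reports native 16-bit support):

```hlsl
// Sketch only: true 16-bit types require -enable-16bit-types (DXC, SM 6.2+)
// and a device that reports native 16-bit shader op support.
float16_t3 ApplyTint(float16_t3 albedo, float16_t3 tint)
{
    // Low-dynamic-range color math like this fits comfortably in fp16;
    // something like a world-space position generally does not.
    return albedo * tint;
}
```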
If you haven't read through it, this is a pretty good overview of how GPUs work: https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-graphics-pipeline-2011-index/
It explains the reason for the “pixel shader quads” thing that you mentioned (spoiler: it's for computing the mip level to sample), as well as a bunch of other important bits.
I also always recommend this presentation, which talks about how shader threads actually execute on GPU SIMD hardware: https://engineering.purdue.edu/~smidkiff/KKU/files/GPUIntro.pdf
Another good article on that subject: https://anteru.net/blog/2018/intro-to-compute-shaders/
If you're willing to purchase a book, then “Real-Time Rendering, 4th Edition” is my go-to suggestion.