Recently a lot of videos of 3DO and Atari Jaguar games have been popping up on YouTube. That got me thinking about perspective-correct texture mapping again. Google ranks a tutorial by Chris Hecker very highly. Hardware seems to do it the way MS Flight Simulator 3 or so did: tiles. Software does it with spans of length 2^x.
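To make the "spans of length 2^x" idea concrete, here is a minimal sketch of how I understand the software approach: do the exact division only every `step` pixels and interpolate linearly in between. The function names, the parameter layout and the default `step=16` are my own assumptions, not taken from any particular engine.

```python
def exact_u(x, u0_w, w0, du_w, dw):
    """Perspective-correct u at pixel x: (u/z) / (1/z), both linear in x."""
    return (u0_w + du_w * x) / (w0 + dw * x)


def span_u(length, u0_w, w0, du_w, dw, step=16):
    """Return u for pixels 0..length (inclusive).

    u is exact at every multiple of `step` (and at the last pixel),
    and linearly interpolated in between.
    """
    out = []
    x = 0
    u_left = exact_u(0, u0_w, w0, du_w, dw)
    while x < length:
        n = min(step, length - x)                 # last span may be shorter
        u_right = exact_u(x + n, u0_w, w0, du_w, dw)
        for i in range(n):
            out.append(u_left + (u_right - u_left) * i / n)
        u_left = u_right
        x += n
    out.append(u_left)                            # exact value at the last pixel
    return out
```

With a power-of-two `step`, the `/ n` inside the loop becomes a shift in fixed point, which is presumably why the span length is 2^x.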
Tiles are independent of orientation but look bad at the horizon. Doom looked good there. It uses constant-Z spans. So I (like others) figured the best way to interpolate would be to choose vertical, horizontal or diagonal spans, depending on the orientation. In fact the best direction for the span is the one where delta_Z * delta_screen_distance is at a minimum. So with 3 cases we have covered most of it. We could probably add 5 more by sitting down and doing the math. Anyhow, what I want to conclude from this is: we need integer vertex coordinates on the screen. I was dabbling with floating point, but for that I would have to draw all tris from top to bottom. Same with scanline rendering. That does not fit. But luckily, scanline rendering is dead, and screen vertex coordinates are integers / fixed point in most implementations.
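One possible reading of the direction rule, as a sketch only: I assume the polygon is planar, so 1/Z is affine in screen space with a constant gradient (`dw_dx`, `dw_dy`), and I use the change of 1/Z per pixel step as a stand-in for delta_Z. The candidate table and all names are made up for illustration.

```python
import math

# Candidate per-pixel steps (dx, dy) in screen space, y pointing down.
CANDIDATES = {
    "horizontal": (1, 0),
    "vertical": (0, 1),
    "diagonal_down": (1, 1),
    "diagonal_up": (1, -1),
}


def best_span_direction(dw_dx, dw_dy):
    """Pick the step along which depth changes least.

    Cost is |change of 1/Z per step| * screen length of the step,
    my interpretation of "delta_Z * delta_screen_distance".
    """
    def cost(step):
        dx, dy = step
        delta_w = dw_dx * dx + dw_dy * dy
        return abs(delta_w) * math.hypot(dx, dy)

    return min(CANDIDATES, key=lambda name: cost(CANDIDATES[name]))
```

A floor (depth changes only with y) gets horizontal spans and a wall (depth changes only with x) gets vertical ones, which matches what Doom does.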
So we have spans where Z varies slowly. Why not aim for perfection and somehow use the last value of Z and the new value of 1/Z to get a good starting value for the 1/Z calculation? I found that extrapolation is not stable. All algorithms use interpolation at all stages. So the way to get a perfect 1/Z with some speed-up from coherence is to subdivide each span into a binary tree (node based). In software, working with such a tree would be a nightmare, but in HW it only means some gates. For a midpoint we take the mean of the surrounding Z (left = l, right = r) and the mean of the surrounding 1/Z as a guess (1/Z is usually written as W). The constraint is Z * W = 1. Multiplying out (Z_mean + Z_correction) * W_mean = 1, we find that we are still left with a product before we can even start the division for the small correction value: 1 + (Z_r - Z_l) * (W_l - W_r) / 4 + Z_correction * W_mean = 1.
But anyhow: a quotient with a low number of digits => fast division (the number of digits is often 0 or 1 in the leaves, something which can be looked up in a single cycle using the MSB of the mantissa).
In the product, the factors are delta values => also fairly small => low energy consumption in a CMOS multiplier.
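Here is the midpoint step as a software sketch, just to check the algebra. It uses Z_correction = -(Z_r - Z_l) * (W_l - W_r) / (4 * W_mean), which follows from (Z_mean + Z_correction) * W_mean = 1 with W = 1/Z. The recursion and the function names are my own; real hardware would presumably be a fixed network of gates, not a recursive call.

```python
def subdivide(z_l, w_l, z_r, w_r, depth, out):
    """Append Z at all interior points between l and r, left to right."""
    if depth == 0:
        return
    w_m = 0.5 * (w_l + w_r)        # W = 1/Z is linear on screen, so this is exact
    z_guess = 0.5 * (z_l + z_r)    # arithmetic mean, always a bit too large
    # One product of two deltas, then one division for a small quotient.
    correction = -(z_r - z_l) * (w_l - w_r) / (4.0 * w_m)
    z_m = z_guess + correction
    subdivide(z_l, w_l, z_m, w_m, depth - 1, out)
    out.append(z_m)
    subdivide(z_m, w_m, z_r, w_r, depth - 1, out)


def span_z(z_l, z_r, depth):
    """Z at 2**depth + 1 evenly spaced screen positions from l to r."""
    out = [z_l]
    subdivide(z_l, 1.0 / z_l, z_r, 1.0 / z_r, depth, out)
    out.append(z_r)
    return out
```

For example, `span_z(2.0, 4.0, 3)` gives 9 values that agree with 1/W at each position up to rounding. Only the two endpoint reciprocals are full divisions. Every interior point needs a delta product and a division with a small quotient.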
The products that turn U/Z and V/Z back into U and V may cost even more. Doing this in the tree and multiplying out, here too only deltas are multiplied. In the leaves of the tree the deltas are smallest. Same for RGB (texture caching?).
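The same trick for U, again only as a sketch to check the algebra. I write P for U/Z (linear on screen) and use U = P * Z at both ends. Multiplying out P_mean * (Z_mean + Z_correction) gives the mean of U, minus a product of two deltas, plus P_mean times the small Z correction. All names are hypothetical.

```python
def midpoint_u(u_l, z_l, u_r, z_r):
    """Perspective-correct U at the screen midpoint between l and r."""
    w_l, w_r = 1.0 / z_l, 1.0 / z_r    # in the tree these are already stored per node
    p_l, p_r = u_l * w_l, u_r * w_r    # P = U/Z, also already stored per node
    w_m = 0.5 * (w_l + w_r)
    p_m = 0.5 * (p_l + p_r)
    # Small Z correction at the midpoint (same formula as for the Z tree).
    z_corr = -(z_r - z_l) * (w_l - w_r) / (4.0 * w_m)
    u_guess = 0.5 * (u_l + u_r)
    # delta * delta product, plus P_mean times the small correction
    return u_guess - (p_r - p_l) * (z_r - z_l) / 4.0 + p_m * z_corr
```

Note that the last term is not purely delta times delta: one factor is the full P_mean, but the other is the small correction, so the product should still be cheap near the leaves.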
With perspective correction I always feared overflowing the texture buffer. But as said, we are only doing interpolation, so no overflow here. Furthermore, a division looks horrible in fixed point (and we use fixed point for the edges), but for Z, U and 1/Z floating point is much better. Division becomes a negation of the exponent, and the mantissa only loses about 1 bit of precision. Then with subtexel precision we only overflow out of the texture by at most 1 texel. For first-gen 3D (not much memory) (in an alternative universe) I would love the texels to precisely match many edges. Saturation (cheap in HW) takes care of the handful of pixels slightly overflowing.
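A small sketch of the two ideas above, using Python's `math.frexp` / `math.ldexp` to split a float into mantissa and exponent. It assumes Z > 0, and the helper names and texture size are made up. In hardware the mantissa reciprocal would be a table lookup or a short iteration, not a full divide.

```python
import math


def reciprocal_parts(z):
    """Split 1/z into (mantissa reciprocal, negated exponent). Assumes z > 0."""
    m, e = math.frexp(z)      # z = m * 2**e with 0.5 <= m < 1
    return 1.0 / m, -e        # 1/z = (1/m) * 2**(-e), with 1 < 1/m <= 2


def reciprocal(z):
    """Reassemble 1/z from the two parts."""
    m_inv, e_neg = reciprocal_parts(z)
    return math.ldexp(m_inv, e_neg)


def saturate_texel(u, size):
    """Clamp a texel coordinate into [0, size - 1] instead of wrapping."""
    return max(0, min(size - 1, int(math.floor(u))))
```

So `reciprocal(8.0)` only flips the exponent (8 = 0.5 * 2^4, 1/8 = 2.0 * 2^-4), and `saturate_texel(64.3, 64)` lands on texel 63 instead of reading past the texture.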
I wonder how you guys prove the correctness of complicated shaders. The correctness of Phong shading is already a miracle to me. Raytracers, anyone? Sorry that this is a zero-replies follow-up from 2016.