Taking advantage of T&L

Started by October 29, 1999 11:23 AM
0 comments, last by Jonathan 25 years, 3 months ago
I've noticed it mentioned in a lot of places that for hardware T&L to be truly effective, the vertices need to be stored in local video memory. I was wondering if anyone knows how much that affects one's ability to implement various vis solutions (BSPs, portals, octrees, whatever). As far as I can tell, you'd need to run your vis stuff to figure out what polygons you're going to need for the frame, upload them to the card, and then the T&L engine can take care of the rest and transform them. Is that need to upload going to kill a lot of the benefits of using the T&L, or is there another way around this that I can't see just yet?

Jonathan

Not just local video memory: you can also use AGP memory and still get a good performance boost. AGP memory is quick for the CPU to write, but slow for it to read.

There is a way to use the above knowledge to get good performance out of hardware TnL, while keeping our vertices local in system memory.

We create a single vertex buffer of fixed size, say 1K vertices. We specify it as "write-only." The driver should put it in AGP memory. We initialize an index value to 0.

We start our HSR, portal, BSP, whatever.

When we see that we have vertices to render, we check whether there are enough slots left in the vertex buffer. If so, we lock with the "Do Not Overwrite" flag. If there are not enough slots, we reset the index to 0 and lock with the "discard contents" flag. We put our vertices in the vertex buffer, starting at index, and unlock. We then call DP() on the vertex buffer, with a start vertex of index and a vertex count equal to how many vertices we just inserted. Finally, we advance index by that same count, so the next batch lands after the one we just drew.
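The index bookkeeping in that step can be sketched on its own, without any Direct3D calls. This is only an illustration: `DynamicVertexCursor`, `Reservation`, and `LockMode` are made-up names, not part of DirectX, and the sketch assumes the index advances by the number of vertices written in each batch. It just decides where the next batch goes and which of the two lock modes to ask for.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two lock flags discussed in this post.
enum class LockMode { NoOverwrite, Discard };

// Where the next batch goes, and how the buffer should be locked for it.
struct Reservation {
    std::size_t start;  // first slot to write; also the draw call's start vertex
    LockMode mode;      // lock flag to use for this batch
};

// Tracks the write position inside one fixed-size, write-only vertex buffer.
class DynamicVertexCursor {
public:
    explicit DynamicVertexCursor(std::size_t capacity)
        : capacity_(capacity), next_(0) {}

    // Reserve `count` consecutive slots for the next batch.
    Reservation reserve(std::size_t count) {
        assert(count <= capacity_);  // a batch must fit in the buffer at all

        LockMode mode = LockMode::NoOverwrite;
        if (next_ + count > capacity_) {
            // Not enough room left: start over at slot 0 and ask the
            // driver to discard the old contents.
            next_ = 0;
            mode = LockMode::Discard;
        }

        Reservation r{next_, mode};
        next_ += count;  // advance past the slots we just handed out
        return r;
    }

    std::size_t next() const { return next_; }

private:
    std::size_t capacity_;  // total vertex slots in the buffer
    std::size_t next_;      // first slot not yet used since the last discard
};
```

In a real renderer, each `reserve(n)` would be followed by a lock using the returned mode, a copy of `n` vertices starting at `start`, an unlock, and a DP() call with `start` as the start vertex and `n` as the vertex count.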


Be sure to create a write-only VB! This will allocate the VB in AGP, for quick-write, and quick-read by the card. If you don't specify this, the VB will be in video memory and the copying over the bus will kill your frame rate.

Be sure to lock with either NOOVERWRITE or DISCARDCONTENTS, as described above. NOOVERWRITE just returns a cached pointer, and takes about 50 cycles. If the driver is busy drawing the VB, we won't have to wait for it to complete, because we're promising not to touch memory in use by previous DP operations. Naturally, if you do touch memory used by previous DP operations, the behavior is undefined. DISCARDCONTENTS will swap the VB for another one that is not in use. This is slower than NOOVERWRITE, but not by much, since we still don't have to wait for a DP operation to complete before the lock can be taken. The first few times this is called, the driver will allocate new vertex buffers, which is slow. Eventually, it will have a pool of VBs and just round-robin between them.

The above techniques all require DX7; they won't work on DX6 and below.

Sameer Nene has given a good overview of this procedure several times on the DXDev mailing list; you may want to search the archives for further information.
