DirectX: Are Indexed Vertex Buffer calls faster?
Suppose we do our own TnL and then call DrawIndexedPrimitive() with a pointer to our transformed vertices. D3D must copy those vertices, to ensure that the data cannot change during the DMA operation that downloads them to the card. A vertex buffer, since it must be locked before it can be modified, doesn't have to be copied.
So in this situation, if we were to Lock, Fill, Unlock, Draw, and then repeat the whole cycle, there would be no performance benefit, because each Lock must wait until rendering from the buffer is complete.
If we fill once, draw many times, there should be a performance increase.
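To put rough numbers on the fill-once case, here is a toy count of how many vertices get copied for static geometry under each approach. It is only back-of-the-envelope arithmetic written as C++, not D3D code: both function names are made up for illustration, and it ignores lock stalls, bus speed, and everything else that matters in a real profile.

```cpp
#include <cstddef>

// Toy model, not D3D code. Both function names are hypothetical.

// Drawing from a user pointer: as described above, the runtime copies
// the vertices on every draw call.
constexpr std::size_t copiedWithUserPointer(std::size_t vertexCount,
                                            std::size_t drawCalls) {
    return vertexCount * drawCalls;
}

// Fill a vertex buffer once, then draw from it repeatedly: the only
// copy is the initial fill, however many draws follow.
constexpr std::size_t copiedWithFilledOnceBuffer(std::size_t vertexCount,
                                                 std::size_t /*drawCalls*/) {
    return vertexCount;
}
```

For a 1,000-vertex mesh drawn 60 times, that works out to 60,000 vertices copied from a user pointer versus 1,000 for the buffer that was filled once.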
If we use D3D's software TnL, D3D should automatically buffer the transformed vertices it creates. So everything is getting transformed into vertex buffers anyway. No real performance increase would be seen.
For hardware TnL, we would see a performance increase again, as extra copies are avoided. Vertex buffers can also be allocated in AGP memory for hardware TnL, eliminating extra memory and bus transfers.
Static geometry that is never altered should be created in a vertex buffer and then optimized by calling Optimize(). This enables D3D to reorganize the contents of the vertex buffer for better performance. You won't be able to lock a vertex buffer once it has been optimized. On a PIII or a K6 with 3DNow!, the vertices will probably be arranged to take advantage of the floating point SIMD instructions, so on these processors a performance increase would also be seen when using VBs with D3D's software TnL.
There are ways to use the DISCARDCONTENTS and NOOVERWRITE flags when locking a vertex buffer to get increased vertex buffer performance when you lock, fill, unlock, draw. Check the SDK docs for more info.
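As a rough illustration of how those two flags are commonly combined, here is a small bookkeeping sketch. It contains no D3D calls. LockHint, LockPlan, and AppendCursor are hypothetical names invented for this example, and the enum values are stand-ins for the real flags, not the SDK constants. The assumed idea is: keep appending new vertices after the ones already submitted and lock with the no-overwrite hint, and when the buffer runs out of room, lock with the discard hint and start again from the front.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the two lock flags; NOT the real D3D constants.
enum class LockHint { DiscardContents, NoOverwrite };

// Result of planning one lock: which hint to pass and where to start writing.
struct LockPlan {
    LockHint hint;
    std::size_t firstVertex;
};

// Tracks the next free slot in a dynamic vertex buffer of fixed capacity.
class AppendCursor {
public:
    explicit AppendCursor(std::size_t capacity)
        : capacity_(capacity), next_(0) {}

    // Reserve room for `count` vertices and report how to lock the buffer.
    LockPlan reserve(std::size_t count) {
        assert(count <= capacity_ && "batch larger than the whole buffer");
        LockPlan plan;
        if (next_ + count > capacity_) {
            // Out of room: give up the old contents and restart at the front.
            plan.hint = LockHint::DiscardContents;
            next_ = 0;
        } else {
            // Room left: promise not to touch vertices already submitted.
            plan.hint = LockHint::NoOverwrite;
        }
        plan.firstVertex = next_;
        next_ += count;
        return plan;
    }

private:
    std::size_t capacity_;  // total vertices the buffer can hold
    std::size_t next_;      // index of the first unused vertex
};
```

In a real renderer, each plan would translate into one Lock call with the matching flag, a fill starting at firstVertex, an Unlock, and a draw of just that range. The SDK docs remain the authority on the exact flag names and semantics.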
This topic comes up frequently on the DXDEV mailing list. You can find more information by searching the DXDEV archives at http://www.microsoft.com/DirectX
Thanks in advance!
-ns