Even though DirectX 8 (and now 9.0) has been out for quite a while, a lot of people still seem to have problems getting used to the "proper" usage of VertexBuffers. Even a cursory glance through the DirectX forum reveals posts by confused programmers and game developers who are making the switch from OpenGL to Direct3D, or who are simply trying to figure out why their frame rate isn't as high as it could be. I will attempt to cover some known (and maybe not so known) tidbits of knowledge on VertexBuffers, and hopefully people will benefit from it.
To try to help those who need it, I decided to whip up a small document/tutorial that might ease the suffering. (Note that I'm not really covering anything "new" here; I'm just compiling information gathered from the MSDN and other online documentation.)
What Are Vertex Buffers?
Vertex Buffers are built into Direct3D 8 as a way of creating a rendering pipeline in which the processing is shared by both the CPU and the GPU (of the video hardware). Vertex Buffers give us a mechanism for filling in vertex data with the CPU while, at the same time, the GPU processes an earlier batch of vertices. In effect, this gives us a small degree of parallel processing in our game.
So how does using a vertex buffer help us, compared with just allocating a hunk of memory to stick our vertex data in? Well, theoretically, a vertex buffer is optimized by the device driver for faster access and flexibility within our rendering pipeline.
Static or Dynamic?
Vertex Buffers can be created in two forms: static and dynamic. Once static vertex buffers are created, they are placed in an "optimal location" chosen by the device driver. This location lets the driver switch between static vertex buffers as fast as possible.
Dynamic Vertex Buffers, on the other hand, are filled and tossed away every frame. One of the advantages of dynamic VBs is that you can build large batches of triangles to send off to the GPU, which, according to both ATI and NVidia, is the way to go in terms of maximizing performance. Note that even this point is debatable: the "Performance Optimizations" tips included with the SDK recommend using static vertex buffers wherever possible.
The scope of this article will deal with Dynamic Vertex Buffers, as the majority of your vertex information will probably be changing throughout the lifetime of your scene. IMHO even though the jury is still kinda out on Dynamic vs. Static, Dynamic VB's are the way to go. In all of my projects (so far) using DirectX8, I just create one dynamic VB that I empty and fill every frame to maximize the amount of triangle batching I can send to the GPU.
Creation of Dynamic Vertex Buffers
The first step in optimizing our use of Vertex Buffers is to carefully examine how we create them. The DirectX SDK documentation and whitepapers from NVidia both claim that improper initialization of Vertex Buffers will seriously impede the performance of your application.
Direct3D8.0
HRESULT CreateVertexBuffer(
    UINT Length,
    DWORD Usage,
    DWORD FVF,
    D3DPOOL Pool,
    IDirect3DVertexBuffer8** ppVertexBuffer
);
Direct3D9.0
HRESULT CreateVertexBuffer(
    UINT Length,
    DWORD Usage,
    DWORD FVF,
    D3DPOOL Pool,
    IDirect3DVertexBuffer9** ppVertexBuffer,
    HANDLE* pHandle
);
The only parameters we need to worry about for this article are Usage and Pool. For dynamic vertex buffers, which contain information about primitives that change often in the scene, we need to specify the D3DUSAGE_DYNAMIC | D3DUSAGE_WRITEONLY flags for the Usage and the D3DPOOL_DEFAULT flag for the Pool. The DYNAMIC and WRITEONLY flags tell the driver how we intend to use the buffer, which typically lets it place the vertex buffer in AGP memory, since we will be writing to it far more often than we would to static vertex data.
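As a rough illustration of the flag combination described above, here is a small sketch that checks a Usage/Pool pair before it would be handed to CreateVertexBuffer. The constants are local stand-ins, assumed to mirror the values in the Direct3D headers, so that the snippet compiles without the SDK. The helper isRecommendedDynamicRequest is hypothetical and not part of Direct3D.

```cpp
#include <cstdint>

// Local stand-ins for the Direct3D constants named in the text
// (assumption: the values mirror the ones in the SDK headers).
const std::uint32_t kUsageWriteOnly = 0x00000008; // D3DUSAGE_WRITEONLY
const std::uint32_t kUsageDynamic   = 0x00000200; // D3DUSAGE_DYNAMIC

enum Pool { kPoolDefault = 0, kPoolManaged = 1, kPoolSystemMem = 2 };

// Hypothetical helper: returns true when the request matches the
// combination recommended above for dynamic vertex buffers, that is
// DYNAMIC | WRITEONLY usage in the DEFAULT pool.
bool isRecommendedDynamicRequest(std::uint32_t usage, Pool pool)
{
    const std::uint32_t wanted = kUsageDynamic | kUsageWriteOnly;
    return (usage & wanted) == wanted && pool == kPoolDefault;
}
```

A check like this could sit in a debug build next to the real creation call, so that a wrong flag shows up as a failed check rather than as a mysteriously low frame rate.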
Locking / Unlocking Dynamic Vertex Buffers
In order to update the vertex information contained within a vertex buffer, we need to get a handle to the Vertex Buffer resource. Using the Lock method of the vertex buffer interface (IDirect3DVertexBuffer8 or IDirect3DVertexBuffer9), we signal the hardware that we wish to acquire a pointer to the area in memory containing our primitive information. Note that while an area of memory is Locked, no other area of memory containing primitive information can be touched. It is for this reason that we keep Locks to a bare minimum.
Direct3D8.0
HRESULT Lock(
    UINT OffsetToLock,
    UINT SizeToLock,
    BYTE** ppbData,
    DWORD Flags
);
Direct3D9.0
HRESULT Lock(
    UINT OffsetToLock,
    UINT SizeToLock,
    VOID** ppbData,
    DWORD Flags
);
Again, the most important parameter of this method is Flags. Here we have two options available to us. The first, D3DLOCK_NOOVERWRITE, is used when we wish to keep the existing vertex information within the buffer: we promise not to overwrite any vertices that are already there. By specifying this flag, we are able to append more vertex information to the primitive data already contained in the vertex buffer. The alternative is the D3DLOCK_DISCARD flag. This flag signals the device that we wish to throw away the current contents of the vertex buffer and start anew. Note that the VertexBuffer interface returns a new area of memory to us with this call, just in case there's a DMA conflict with the existing area of memory (i.e. the existing vertex buffer could be in use at the same time by the GPU of the video hardware).
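To make the difference between the two flags a little more concrete, here is a toy CPU-side model of the behaviour just described. It is only a sketch: ToyVertexBuffer, LockFlag and inFlight are invented for illustration and are not the Direct3D interface, and a real driver is free to manage its memory differently.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

enum LockFlag { kNoOverwrite, kDiscard };

// Toy model of a dynamic vertex buffer (not the Direct3D interface).
class ToyVertexBuffer {
public:
    explicit ToyVertexBuffer(std::size_t bytes)
        : size_(bytes),
          storage_(std::make_shared<std::vector<unsigned char>>(bytes)) {}

    // Returns a pointer `offset` bytes into the buffer. With kDiscard the
    // model swaps in fresh storage, the way a driver may hand back a new
    // area of memory. With kNoOverwrite the existing storage is kept.
    unsigned char* lock(std::size_t offset, LockFlag flag)
    {
        assert(offset < size_);
        if (flag == kDiscard) {
            inFlight_ = storage_; // pretend the GPU is still reading this
            storage_ = std::make_shared<std::vector<unsigned char>>(size_);
        }
        return storage_->data() + offset;
    }

    // Storage the pretend GPU is still reading (null before any discard).
    const unsigned char* inFlight() const
    {
        return inFlight_ ? inFlight_->data() : nullptr;
    }

private:
    std::size_t size_;
    std::shared_ptr<std::vector<unsigned char>> storage_;
    std::shared_ptr<std::vector<unsigned char>> inFlight_;
};
```

In this model a NOOVERWRITE lock keeps pointing into the same storage, while a DISCARD lock returns a different area and leaves the old data untouched for whoever is still reading it.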
Once we are finished either appending additional vertices or creating new ones, we need to Unlock the area of memory.
Direct3D8.0
HRESULT Unlock();
Direct3D9.0
HRESULT Unlock();
Very simple, and no explanation needed. It just frees our handle to the area of video memory we were working with.
Some Common Usages of Vertex Buffers : Good and Bad
Now that we've basically gone over the preamble to using VertexBuffers, we should follow up with some clearer code examples that demonstrate the points I attempted to outline above : proper dynamic VB usage.
Example #1: The vanilla "OpenGL" way (OpenGL v1.1)
If you're coming from the world of OpenGL, then you might make some mistakes in using VBs. Consider this:
//OpenGL method of drawing some particles
for(int i = 0; i < max_particles; i++){
    glLoadIdentity();
    glTranslatef(particles[i].x, particles[i].y, particles[i].z);
    glBegin(GL_TRIANGLE_STRIP);
    //vertex data here
    glEnd();
}
Not that there's anything wrong with that, but look what it might translate to in Direct3D.
Direct3D8.0
//first crack using Direct3D method of drawing
//particles in an OpenGL port
for(int i = 0; i < max_particles; i++){
    D3DXMatrixIdentity(&matWorld);
    D3DXMatrixTranslation(&matWorld,
                          particles[i].x,
                          particles[i].y,
                          particles[i].z);
    m_lpD3DDevice->SetTransform( D3DTS_WORLD, &matWorld);
    m_lpD3DDevice->SetVertexShader( D3DFVF_PARTICLEVERTEX );
    m_lpD3DDevice->SetStreamSource( 0, m_pVB,
                                    sizeof(PARTICLE_VERTEX) );
    m_lpD3DDevice->DrawPrimitive(D3DPT_TRIANGLELIST, 0, 2);
}
Direct3D9.0
//first crack using Direct3D method of drawing
//particles in an OpenGL port
for(int i = 0; i < max_particles; i++){
    D3DXMatrixIdentity(&matWorld);
    D3DXMatrixTranslation(&matWorld,
                          particles[i].x,
                          particles[i].y,
                          particles[i].z);
    m_lpD3DDevice->SetTransform( D3DTS_WORLD, &matWorld);
    m_lpD3DDevice->SetFVF( D3DFVF_PARTICLEVERTEX );
    m_lpD3DDevice->SetStreamSource(0, m_pVB, 0,
                                   sizeof(PARTICLE_VERTEX) );
    m_lpD3DDevice->DrawPrimitive(D3DPT_TRIANGLELIST, 0, 2);
}
Looks perfectly reasonable, right? We might even try to set the StreamSource and VertexShader BEFORE the for loop in an attempt to increase performance, right?
This example is a good one to show, as it points out the bad usage of Vertex Buffers. Here we are only sending a measly 2 triangles to the GPU every iteration of the for loop. Not only does this waste state changes for EACH iteration of the loop, but we're nowhere NEAR flexing the muscle of our VertexBuffer. In fact, our application is almost entirely CPU-bound, as we are fiddling with the vertices in memory before sending off a paltry few to the rendering pipeline.
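To put a rough number on the cost of the loop above, the sketch below simply counts draw calls. The function names and the batch size used in the usage note are assumptions chosen for illustration. Nothing here measures real hardware; it only shows how quickly the call count drops once particles share a draw call.

```cpp
#include <cassert>
#include <cstddef>

// Draw calls needed when every particle is submitted on its own,
// as in the loop above.
std::size_t drawCallsUnbatched(std::size_t particleCount)
{
    return particleCount;
}

// Draw calls needed when up to `batchSize` particles share one call.
std::size_t drawCallsBatched(std::size_t particleCount, std::size_t batchSize)
{
    assert(batchSize > 0);
    return (particleCount + batchSize - 1) / batchSize; // ceiling division
}
```

With 10000 particles, the per-particle loop issues 10000 draw calls (plus the state changes that go with each one), while batches of 2048 need only 5.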
Well now that we've seen so much about Vertex Buffers, let's take a good crack at speeding up the performance of the rendering mechanism we outlined in Example #1. Our approach will try to take into account the dynamic usage of a Vertex Buffer, and try to keep the GPU and CPU more in parallel. After all, we've just spent a lot of money on our new video hardware and want to sport it!
Direct3D8.0
//second crack at drawing particles
//first we lock down our vertex buffer memory, using the
//DISCARD flag to throw away any existing vertices
HRESULT hr;
VERTEX_PARTICLE* pVertices = NULL;
if(FAILED(hr = m_lpVB->Lock(0, m_iVBSize * sizeof(VERTEX_PARTICLE),
        (BYTE **) &pVertices, D3DLOCK_DISCARD))){
    //we failed, so return our error code
    return hr;
}
//create a variable to store how many particles we've
//dumped into the VertexBuffer since the last lock
DWORD dwVertexCount = 0;
//set the vertex shader and the stream source once, outside the loop
m_lpD3DDevice->SetVertexShader( D3DFVF_PARTICLEVERTEX );
m_lpD3DDevice->SetStreamSource( 0, m_lpVB, sizeof(VERTEX_PARTICLE) );
//begin our loop, going through our particle vertex
//array
for(int i = 0; i < max_particles; i++){
    pVertices->vecPos = m_particleArray[i].vecPos;
    pVertices->dwColor = m_particleArray[i].dwColor;
    pVertices++;
    //increase our counter
    dwVertexCount++;
    if(dwVertexCount == m_iVBSize){
        //we're at our max size for our vertex buffer
        //so we should flush it out to the GPU
        m_lpVB->Unlock();
        //each particle is a single vertex, so we draw a point list;
        //the last argument is a primitive count, which for points
        //is the same as the vertex count
        if(FAILED(hr = m_lpD3DDevice->DrawPrimitive(
                D3DPT_POINTLIST,
                0,
                dwVertexCount))){
            return hr;
        }
        //the GPU may still be drawing that batch, so relock with
        //DISCARD to get a fresh area of memory to fill
        if(FAILED(hr = m_lpVB->Lock(0, m_iVBSize * sizeof(VERTEX_PARTICLE),
                (BYTE **) &pVertices, D3DLOCK_DISCARD))){
            return hr;
        }
        //reset our counter
        dwVertexCount = 0;
    }
} //end for
// Unlock the vertex buffer
m_lpVB->Unlock();
// Render any remaining vertices
if( dwVertexCount ){
    if(FAILED(hr = m_lpD3DDevice->DrawPrimitive( D3DPT_POINTLIST, 0, dwVertexCount )))
        return hr;
}
Direct3D9.0
//second crack at drawing particles
//first we lock down our vertex buffer memory, using the
//DISCARD flag to throw away any existing vertices
HRESULT hr;
VERTEX_PARTICLE* pVertices = NULL;
if(FAILED(hr = m_lpVB->Lock(0, m_iVBSize * sizeof(VERTEX_PARTICLE),
        (VOID **) &pVertices, D3DLOCK_DISCARD))){
    //we failed, so return our error code
    return hr;
}
//create a variable to store how many particles we've
//dumped into the VertexBuffer since the last lock
DWORD dwVertexCount = 0;
//set the vertex format and the stream source once, outside the loop
m_lpD3DDevice->SetFVF( D3DFVF_PARTICLEVERTEX );
m_lpD3DDevice->SetStreamSource( 0, m_lpVB, 0, sizeof(VERTEX_PARTICLE) );
//begin our loop, going through our particle vertex
//array
for(int i = 0; i < max_particles; i++){
    pVertices->vecPos = m_particleArray[i].vecPos;
    pVertices->dwColor = m_particleArray[i].dwColor;
    pVertices++;
    //increase our counter
    dwVertexCount++;
    if(dwVertexCount == m_iVBSize){
        //we're at our max size for our vertex buffer
        //so we should flush it out to the GPU
        m_lpVB->Unlock();
        //each particle is a single vertex, so we draw a point list;
        //the last argument is a primitive count, which for points
        //is the same as the vertex count
        if(FAILED(hr = m_lpD3DDevice->DrawPrimitive(
                D3DPT_POINTLIST,
                0,
                dwVertexCount))){
            return hr;
        }
        //the GPU may still be drawing that batch, so relock with
        //DISCARD to get a fresh area of memory to fill
        if(FAILED(hr = m_lpVB->Lock(0, m_iVBSize * sizeof(VERTEX_PARTICLE),
                (VOID **) &pVertices, D3DLOCK_DISCARD))){
            return hr;
        }
        //reset our counter
        dwVertexCount = 0;
    }
} //end for
// Unlock the vertex buffer
m_lpVB->Unlock();
// Render any remaining vertices
if( dwVertexCount ){
    if(FAILED(hr = m_lpD3DDevice->DrawPrimitive( D3DPT_POINTLIST, 0, dwVertexCount )))
        return hr;
}
Okay, so the rendering loop is a teeny bit longer here than the OpenGL port. But we're not doing anything terribly complex. Our goal is to use the GPU to our advantage, and render some vertex information while we continue filling up another area of video memory with the CPU.
The samples above could probably be used for rendering particles, but they can be easily modified to render simple triangle data of other objects, such as a terrain engine.
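The control flow of the fill-and-flush loop is easy to get subtly wrong (forgetting the leftover batch is a classic), so here is a stripped-down model of it that runs without Direct3D. It keeps only the counting: writing a vertex, drawing and relocking are reduced to comments, and flushBatches is a made-up name for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the size of every batch the fill-and-flush loop would draw
// for `particleCount` particles and a buffer that holds `capacity`
// vertices.
std::vector<std::size_t> flushBatches(std::size_t particleCount,
                                      std::size_t capacity)
{
    assert(capacity > 0);
    std::vector<std::size_t> batches;
    std::size_t filled = 0;          // vertices written since the last lock
    for (std::size_t i = 0; i < particleCount; ++i) {
        ++filled;                    // stands in for writing one vertex
        if (filled == capacity) {    // buffer full: unlock, draw, relock
            batches.push_back(filled);
            filled = 0;
        }
    }
    if (filled > 0)                  // draw whatever is left over
        batches.push_back(filled);
    return batches;
}
```

For 10 particles and a 4-vertex buffer this yields batches of 4, 4 and 2, so every particle is drawn exactly once and the final partial batch is not lost.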
Example #3: The Microsoft Way
Yet another way to use Vertex Buffers appears in most of the samples contained within the SDK. While the algorithm is similar to the one used above, it differs in some subtle ways. Primarily, we simply append vertices to the existing vertex buffer using the D3DLOCK_NOOVERWRITE flag. Once we've finished appending a small chunk of vertices, we blast it to the GPU using the DrawPrimitive method. We then repeat this process of locking, appending, unlocking and blasting until we run out of space within the vertex buffer. The bonus of this method is that, again, our GPU is able to render our primitive data while we're filling in some more vertices.
Direct3D8.0
//The Microsoft Way - join us or die Doctor!
//first we lock a chunk of the vertex buffer (hopefully in AGP memory).
//We want to append a small chunk of vertex information to the VB, so we
//use the D3DLOCK_NOOVERWRITE flag. Should our VB no longer have any room,
//we wrap back to the start and lock with D3DLOCK_DISCARD instead.
HRESULT hr;
VERTEX_PARTICLE* pVertices = NULL;
if(FAILED(hr = m_lpVB->Lock(m_dwBase * sizeof(VERTEX_PARTICLE), m_dwFlush * sizeof(VERTEX_PARTICLE),
        (BYTE **) &pVertices, m_dwBase ? D3DLOCK_NOOVERWRITE : D3DLOCK_DISCARD))){
    return hr;
}
DWORD dwNumParticlesToRender = 0;
for(int i = 0; i < max_particles; i++){
    pVertices->vecPos = m_particleArray[i].vecPos;
    pVertices->dwColor = m_particleArray[i].dwColor;
    pVertices++;
    dwNumParticlesToRender++;
    if(dwNumParticlesToRender == m_dwFlush){
        // Done filling this chunk of the vertex buffer. Lets unlock and
        // draw this portion so we can begin filling the next chunk.
        m_lpVB->Unlock();
        // Each particle is one vertex, so draw a point list that
        // starts at m_dwBase.
        if(FAILED(hr = m_lpD3DDevice->DrawPrimitive( D3DPT_POINTLIST, m_dwBase, dwNumParticlesToRender)))
            return hr;
        // Lock the next chunk of the vertex buffer. If we are at the
        // end of the vertex buffer, DISCARD the vertex buffer and start
        // at the beginning. Otherwise, specify NOOVERWRITE, so we can
        // continue filling the VB while the previous chunk is drawing.
        m_dwBase += m_dwFlush;
        if(m_dwBase >= m_dwDiscard)
            m_dwBase = 0;
        if(FAILED(hr = m_lpVB->Lock(m_dwBase * sizeof(VERTEX_PARTICLE), m_dwFlush * sizeof(VERTEX_PARTICLE),
                (BYTE **) &pVertices, m_dwBase ? D3DLOCK_NOOVERWRITE : D3DLOCK_DISCARD))){
            return hr;
        }
        //reset our particle counter variable
        dwNumParticlesToRender = 0;
    }//end if
}//end for
// Unlock the vertex buffer
m_lpVB->Unlock();
// Render any remaining particles
if( dwNumParticlesToRender ){
    if(FAILED(hr = m_lpD3DDevice->DrawPrimitive( D3DPT_POINTLIST, m_dwBase, dwNumParticlesToRender )))
        return hr;
    // Step past the chunk we just drew, so the next NOOVERWRITE lock
    // does not write over vertices the GPU may still be reading.
    m_dwBase += m_dwFlush;
    if(m_dwBase >= m_dwDiscard)
        m_dwBase = 0;
}
Direct3D9.0
//The Microsoft Way - join us or die Doctor!
//first we lock a chunk of the vertex buffer (hopefully in AGP memory).
//We want to append a small chunk of vertex information to the VB, so we
//use the D3DLOCK_NOOVERWRITE flag. Should our VB no longer have any room,
//we wrap back to the start and lock with D3DLOCK_DISCARD instead.
HRESULT hr;
VERTEX_PARTICLE* pVertices = NULL;
if(FAILED(hr = m_lpVB->Lock(m_dwBase * sizeof(VERTEX_PARTICLE), m_dwFlush * sizeof(VERTEX_PARTICLE),
        (VOID **) &pVertices, m_dwBase ? D3DLOCK_NOOVERWRITE : D3DLOCK_DISCARD))){
    return hr;
}
DWORD dwNumParticlesToRender = 0;
for(int i = 0; i < max_particles; i++){
    pVertices->vecPos = m_particleArray[i].vecPos;
    pVertices->dwColor = m_particleArray[i].dwColor;
    pVertices++;
    dwNumParticlesToRender++;
    if(dwNumParticlesToRender == m_dwFlush){
        // Done filling this chunk of the vertex buffer. Lets unlock and
        // draw this portion so we can begin filling the next chunk.
        m_lpVB->Unlock();
        // Each particle is one vertex, so draw a point list that
        // starts at m_dwBase.
        if(FAILED(hr = m_lpD3DDevice->DrawPrimitive( D3DPT_POINTLIST, m_dwBase, dwNumParticlesToRender)))
            return hr;
        // Lock the next chunk of the vertex buffer. If we are at the
        // end of the vertex buffer, DISCARD the vertex buffer and start
        // at the beginning. Otherwise, specify NOOVERWRITE, so we can
        // continue filling the VB while the previous chunk is drawing.
        m_dwBase += m_dwFlush;
        if(m_dwBase >= m_dwDiscard)
            m_dwBase = 0;
        if(FAILED(hr = m_lpVB->Lock(m_dwBase * sizeof(VERTEX_PARTICLE), m_dwFlush * sizeof(VERTEX_PARTICLE),
                (VOID **) &pVertices, m_dwBase ? D3DLOCK_NOOVERWRITE : D3DLOCK_DISCARD))){
            return hr;
        }
        //reset our particle counter variable
        dwNumParticlesToRender = 0;
    }//end if
}//end for
// Unlock the vertex buffer
m_lpVB->Unlock();
// Render any remaining particles
if( dwNumParticlesToRender ){
    if(FAILED(hr = m_lpD3DDevice->DrawPrimitive( D3DPT_POINTLIST, m_dwBase, dwNumParticlesToRender )))
        return hr;
    // Step past the chunk we just drew, so the next NOOVERWRITE lock
    // does not write over vertices the GPU may still be reading.
    m_dwBase += m_dwFlush;
    if(m_dwBase >= m_dwDiscard)
        m_dwBase = 0;
}
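The bookkeeping around m_dwBase, m_dwFlush and m_dwDiscard is the heart of this method, so here it is pulled out into a tiny standalone sketch. ChunkCursor and LockFlag are names invented for this illustration; only the arithmetic and the flag choice are taken from the listing above.

```cpp
#include <cassert>
#include <cstddef>

enum LockFlag { kNoOverwrite, kDiscard };

// Tracks where the next chunk goes in a buffer of `discard` vertices
// that is filled `flush` vertices at a time (the fields play the roles
// of m_dwBase, m_dwFlush and m_dwDiscard).
struct ChunkCursor {
    std::size_t base;
    std::size_t flush;
    std::size_t discard;

    // Flag to use when locking the chunk at the current base: start
    // over with DISCARD at offset 0, append with NOOVERWRITE otherwise.
    LockFlag flagForLock() const
    {
        return base ? kNoOverwrite : kDiscard;
    }

    // Move to the next chunk, wrapping to the start of the buffer
    // once it has been used up.
    void advance()
    {
        assert(flush > 0 && flush <= discard);
        base += flush;
        if (base >= discard)
            base = 0;
    }
};
```

With a 2048-vertex buffer filled 512 vertices at a time, the cursor visits offsets 0, 512, 1024 and 1536 and then wraps to 0, so only one lock in four uses DISCARD and the rest append with NOOVERWRITE.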
Common Mistakes
Avoiding some of the more common mistakes with Vertex Buffers can boost performance in your app tremendously. Again, these gems are straight from the DX SDK documentation, but I'll drop them here for the lazy people.
- Render your scene from front to back.
- Minimize vertex buffer switching
- Use triangle strips instead of lists and fans wherever possible
- Batch, batch, batch!
- Keep vertex buffer locking down to a minimum
- Use D3DLOCK_DISCARD wherever possible
- Triple-check Vertex Buffer creation flags!
I urge you to read the documents in the MSDN pertaining to Direct3D performance optimization for more tips!
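As a quick worked example of the strips-versus-lists tip, the sketch below counts the vertices needed for the same number of triangles. It assumes a single connected strip with no degenerate triangles, and the function names are made up for illustration.

```cpp
#include <cstddef>

// A triangle list spends three vertices on every triangle.
std::size_t verticesForList(std::size_t triangles)
{
    return triangles * 3;
}

// A single connected strip needs two vertices to start and then one
// more per triangle.
std::size_t verticesForStrip(std::size_t triangles)
{
    return triangles == 0 ? 0 : triangles + 2;
}
```

For 100 triangles that is 300 vertices as a list against 102 as a strip, which means less data to write into the vertex buffer for the same geometry.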
Closing
Well that's a pretty whirlwind tour of the usage of Vertex Buffers. Hopefully we all learned something along our journey, or at the very least, enough to get us interested to find out more about them! If you are struggling to make your application faster than 10 FPS even though you're not doing much at all, chances are it can be due to poor vertex buffer usage. Use them right, and they scream, but use them wrong and they become quicksand.
If you have any questions or comments about this article, feel free to send me some email.
References
- MSDN. Microsoft DirectX8 Developer FAQ, February 2001.
- MSDN. Microsoft DirectX9 Developer FAQ, May 2003.
- Huddy, Richard. D3D Optimization, GDC 2001.
- Huddy, Richard. Basic Mistakes, GDC 2001.