Direct3D 9 glDrawElements equivalent
I am trying to render an array of vertices (Quake 3 BSP). In OpenGL, the glVertexPointer and glDrawElements functions work very well (same process as GameTutorials.com). What I want is to render the same vertices in Direct3D. I can do this, but it only runs at a fraction of the speed of OpenGL. Here is what I am doing:
//************************************************
// Begin Render
Lock the vertex buffer.
Copy the BSP vertices into the vertex buffer.
Unlock the vertex buffer.
For each face (polygons only... no patches yet):
    Lock the index buffer.
    Copy the face's mesh indices into the index buffer.
    Unlock the index buffer.
    Call DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, first face vertex index, number of face vertices, 0, number of face mesh indices / 3).
// End Render
//*************************************************
This process renders the level perfectly, just much much slower.
I am using all the same data structures, classes, etc. with both OpenGL and Direct3D. Nothing in the code is changing except the specific API calls and how the window is created.
I am wondering if the problem lies in my rendering process?
Am I possibly doing something wrong with my window creation in Direct3D?
Could it be my video card? I can't tell you what kind of video card I have, except that it is integrated and sucks!
Thanks for any help
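The first thing that comes to mind is how you create your Direct3D device: you may be requesting software vertex processing (D3DCREATE_SOFTWARE_VERTEXPROCESSING). If that is the case, you should see a nice boost from switching to hardware vertex processing (D3DCREATE_HARDWARE_VERTEXPROCESSING), provided the card supports it.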
But there are still problems with your approach.
If you used VBOs in OpenGL to do the same thing, you would likely get the same bad performance.
The problem is that you are effectively creating the vertex and index buffers every frame and then throwing them away again. D3D is designed to be as efficient as possible, so it forces you to render from buffers, and all your vertex/index buffers will be stored in video memory (or in both system and video memory if they are managed).
The catch is that the video memory must be allocated and then filled by a copy, which is a slow operation that normally runs in parallel with the CPU doing something else.
Your problem is that you are asking D3D to render from these buffers straight away. So the CPU must wait not only for the buffers to be allocated, but also for them to be filled, before it can continue.
The fact that you have an integrated video card makes this more likely, as it won't be as well optimized for this case.
In GL, when you use glVertexPointer and point it at system memory, that memory is streamed to the video card in an efficient way while the CPU does something else. The drawback (and this is why VBOs are faster) is that the transfer is limited by the AGP/PCI Express bus the video card is sitting on, and the video card can most likely render geometry much faster than it can be sent.
So with VBOs, and also with Direct3D vertex/index buffers, the idea is either to have static buffers that never (or rarely) change, or to have dynamic buffers that are allocated and filled early (well before rendering). Both methods hopefully prevent the CPU-to-GPU synchronisation from breaking down.
That said, you can get the glVertexPointer(...) style of streamed rendering in D3D: the DrawIndexedPrimitiveUP command does this. But if you read the D3D docs on MSDN, you will see that it is not recommended, for performance reasons.
So.
Unfortunately, the BSP map format that Quake 3 uses is really not at all suited to modern graphics hardware, because it demands a lot of dynamic data (index buffers mostly). This can be overcome, but it's tricky.
I once made a Q3 map renderer.
I created a single vertex buffer that stored every vertex in the entire Q3 map.
I also created a single index buffer.
Then I traversed the entire BSP tree, and at each leaf I grabbed all the groups of triangles in that leaf that would be drawn and stuck them into the big index buffer. I also stored where in the index buffer they ended up (first index and count).
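To make that build step a bit more concrete, here is a minimal sketch in plain C++ with no D3D calls. The Face and IndexRange structs, the leafFaces layout, and buildIndexBuffer are hypothetical names made up for illustration; they are not the real Q3 structures and not the code from the renderer described here. It also assumes that each face's mesh indices are relative to the face's first vertex and that 32-bit indices are acceptable.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a BSP face (not the real Q3 struct).
struct Face {
    std::uint32_t firstVertex;               // face's first vertex in the map-wide vertex array
    std::vector<std::uint32_t> meshIndices;  // triangle-list indices, relative to firstVertex
};

// Where one leaf's triangles ended up inside the big index array.
struct IndexRange {
    std::uint32_t first;  // offset of the first index
    std::uint32_t count;  // number of indices (a multiple of 3)
};

struct BuiltIndices {
    std::vector<std::uint32_t> indices;  // contents for the single index buffer
    std::vector<IndexRange> leafRanges;  // one range per leaf, in leaf order
};

// leafFaces[i] lists the face numbers referenced by leaf i (assumed layout).
BuiltIndices buildIndexBuffer(const std::vector<Face>& faces,
                              const std::vector<std::vector<std::uint32_t>>& leafFaces)
{
    BuiltIndices out;
    for (const auto& faceList : leafFaces) {
        IndexRange range{static_cast<std::uint32_t>(out.indices.size()), 0};
        for (std::uint32_t f : faceList) {
            assert(f < faces.size());
            const Face& face = faces[f];
            // Convert face-relative indices to absolute vertex indices.
            for (std::uint32_t idx : face.meshIndices) {
                out.indices.push_back(face.firstVertex + idx);
            }
            range.count += static_cast<std::uint32_t>(face.meshIndices.size());
        }
        out.leafRanges.push_back(range);
    }
    return out;
}
```

This runs once at load time. The resulting indices array would be copied into the index buffer a single time, and only the small leafRanges table is consulted per frame. Note that a face referenced by two leaves is written twice in this naive version.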
Then, when rendering, I would traverse the BSP tree again, and instead of getting the groups of triangles that had to be drawn, I would get a range of indices in the index buffer.
After merging any contiguous ranges, I would end up with a bunch of ranges to draw, maybe 40 or so.
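The merging step can be sketched roughly like this. IndexRange and mergeRanges are illustrative names (assumptions, not from any API), and the sketch assumes the ranges never overlap, only touch end to start.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A run of indices inside the big index buffer (hypothetical helper type).
struct IndexRange {
    std::uint32_t first;  // offset of the first index
    std::uint32_t count;  // number of indices
};

// Sorts the visible ranges by offset and fuses ranges that sit directly
// next to each other, so each fused range needs only one draw call.
std::vector<IndexRange> mergeRanges(std::vector<IndexRange> ranges)
{
    std::sort(ranges.begin(), ranges.end(),
              [](const IndexRange& a, const IndexRange& b) { return a.first < b.first; });

    std::vector<IndexRange> merged;
    for (const IndexRange& r : ranges) {
        if (r.count == 0) continue;  // nothing to draw
        // Assumption: input ranges do not overlap.
        assert(merged.empty() || merged.back().first + merged.back().count <= r.first);
        if (!merged.empty() && merged.back().first + merged.back().count == r.first) {
            merged.back().count += r.count;  // contiguous: extend the previous range
        } else {
            merged.push_back(r);
        }
    }
    return merged;
}
```

Each merged range then maps onto one DrawIndexedPrimitive call, with first as the start index and count / 3 as the primitive count for a triangle list.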
It can be a bit tricky with 16-bit indices, as many Q3 maps have more than 65k vertices (e.g. nv15).
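One possible way around the 16-bit limit (an assumption on my part, not necessarily what the renderer above did) is to store indices relative to the lowest vertex a batch uses and hand that lowest vertex to DrawIndexedPrimitive as its BaseVertexIndex argument. A rough sketch, where RebasedBatch and rebaseTo16Bit are made-up names:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Result of rebasing one batch of absolute indices (hypothetical helper type).
struct RebasedBatch {
    std::uint32_t baseVertex;           // lowest vertex used; candidate for BaseVertexIndex
    std::uint32_t vertexCount;          // size of the vertex span the batch touches
    std::vector<std::uint16_t> indices; // indices relative to baseVertex
};

// Returns false if the batch is empty or spans more vertices than a 16-bit
// index can address; in that case the caller would have to split the batch.
bool rebaseTo16Bit(const std::vector<std::uint32_t>& indices, RebasedBatch& out)
{
    if (indices.empty()) return false;

    const std::uint32_t lo = *std::min_element(indices.begin(), indices.end());
    const std::uint32_t hi = *std::max_element(indices.begin(), indices.end());
    if (hi - lo > 0xFFFFu) return false;

    out.baseVertex = lo;
    out.vertexCount = hi - lo + 1;
    out.indices.clear();
    out.indices.reserve(indices.size());
    for (std::uint32_t i : indices) {
        out.indices.push_back(static_cast<std::uint16_t>(i - lo));
    }
    return true;
}
```

The trade-off is that batches with different base vertices can no longer be merged into one draw call, so this only helps when the vertices of a batch sit close together in the vertex buffer. If the hardware supports 32-bit indices, using those is simpler.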
Rendering therefore became quite easy, as it was just a series of calls to DrawIndexedPrimitive with appropriate initial index values and the like.
There was, however, if I remember correctly, an issue with Q3 in that a leaf node of the BSP tree could share triangles with other leaf nodes. I'm not entirely sure how I got around this, but probably just with a tag of some type to say the triangle was already part of the index buffer...
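For what it's worth, a common form of such a tag is a per-face frame stamp checked while collecting visible faces. The sketch below shows that variant as an illustration only; it is a guess at the idea, not the original code, and collectVisibleFaces and lastDrawnFrame are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Collects each visible face once, even if several visible leaves reference it.
// lastDrawnFrame holds one stamp per face and persists across frames; it should
// start out as all zeros, with frameNumber starting at 1 and increasing each frame.
std::vector<std::uint32_t> collectVisibleFaces(
    const std::vector<std::vector<std::uint32_t>>& visibleLeafFaces,
    std::vector<std::uint32_t>& lastDrawnFrame,
    std::uint32_t frameNumber)
{
    std::vector<std::uint32_t> result;
    for (const auto& faceList : visibleLeafFaces) {
        for (std::uint32_t f : faceList) {
            assert(f < lastDrawnFrame.size());
            if (lastDrawnFrame[f] == frameNumber) continue;  // already collected this frame
            lastDrawnFrame[f] = frameNumber;                 // tag the face
            result.push_back(f);
        }
    }
    return result;
}
```

Using a frame number instead of a boolean flag avoids having to clear all the tags at the start of every frame.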
Anywho.
I hope that may help out a bit.