Fast OpenGL 2D wrap C++ ??

Started by March 22, 2008 07:16 AM
6 comments, last by YanPL 16 years, 7 months ago
I dropped into the forums just to ask a few questions (greetz all, BTW). Let's say I've got an STL list of something like this:
class blit
{
GLuint tex_id; // or, list_id if it would be faster
float x; //x position of texture wanted output place
float y; //y position of texture wanted output place
float w; //width
float h; //height
float dir; //direction rotated around z-axis
};

My render function creates such a list, and I want a wrapper function to run through the list and draw all "blit" elements as fast as possible. The conditions are:
1. Textures of the same type must be drawn in order;
2. Types of textures must be drawn in order (so I output all textures with tex_id equal to list.front().tex_id and erase them from the list, then pop the front element, and do the same for each of the tex_ids in the list).
What I need:
void StartDrawingSequence(); // function called before the "blitting block"

void StartDrawingTexture(GLuint tex); // function called once for each unique tex_id in the list

void OutputTexture(float x, float y, float w, float h, float dir); // blits at [x,y] texture set by last called StartDrawingTexture
//sized [w,h] and rotated z-axis around its centre by dir degrees

void EndDrawingSequence();//After drawing all of the blit elements we can do some cleanup to go back
//to the state the app was before StartDrawingSequence call.

I know how to make it just work, but the FPS drops by half if I draw more than 15 blit objects (even if they share the same tex_id, and even if I compile some parts of the sequence and use glCallList...). I tried many, many variants but it gets me nowhere: ogl_technique_demo from the NeHe site runs at 73 fps with any settings, while even simple 2D textures make my app fall below 40 fps. Questions are:
1. What should I put in those functions to make them optimal?
2. What might make my code slower than other people's 3D apps?
3. Is this (a texture-grouped draw list) the best possible approach to 2D rendering?
I'll put my code here just for comparison, right after I strip all of the non-OGL-specific data.
#include <list>
class EBlit
{
public:
unsigned int tex_id;
float x;
float y;
float w;
float h;
float dir;
EBlit(unsigned int _id,float _x,float _y,float _w,float _h,float _dir);
};
std::list<EBlit> EList;
void EDrawList()
{
    std::list<EBlit>::iterator it1;
    unsigned int t_id;
    // StartDrawingSequence part
    glEnable(GL_TEXTURE_2D); // Enable Texture Mapping
    glColor4f(1, 1, 1, 1);
    // ~StartDrawingSequence part
    // Each outer pass handles the texture of the first element still in the list.
    for (it1 = EList.begin(); it1 != EList.end(); it1 = EList.begin())
    {
        t_id = it1->tex_id;
        // StartDrawingTexture part
        glBindTexture(GL_TEXTURE_2D, t_id);
        // ~StartDrawingTexture part
        for (; it1 != EList.end();)
        {
            // Skip elements that use a different texture; a later pass draws them.
            if (it1->tex_id != t_id) { ++it1; continue; }

            // OutputTexture part
            glPushMatrix();
            glTranslatef(it1->x, it1->y, 0);
            glRotatef(it1->dir, 0, 0, 1);
            glBegin(GL_QUADS);
                glTexCoord2f(1.0f, 1.0f); glVertex2f( it1->w, -it1->h);
                glTexCoord2f(0.0f, 1.0f); glVertex2f(-it1->w, -it1->h);
                glTexCoord2f(0.0f, 0.0f); glVertex2f(-it1->w,  it1->h);
                glTexCoord2f(1.0f, 0.0f); glVertex2f( it1->w,  it1->h);
            glEnd();
            glPopMatrix();
            // ~OutputTexture part
            it1 = EList.erase(it1); // erase() returns the element after the erased one
        }
    }
    // EndDrawingSequence part
    glDisable(GL_TEXTURE_2D);
    // ~EndDrawingSequence part
    return;
}

[Edited by - YanPL on March 22, 2008 9:45:48 AM]
You can improve your code in several ways:
* use the std algorithm sort in conjunction with a functor, then you don't need the 2nd list
* create an array of all the vertices (2 floats each in this case) and take a look at vertex arrays to see how to send them to OpenGL
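
The first suggestion could be sketched roughly as below. This is only an illustration: the `Blit` struct mirrors the fields from the original post, while `ByTexture`, `groupByTexture` and `countTextureRuns` are made-up names, and a std::vector is assumed instead of the std::list from the post (a std::list would use its own `sort` member with the same functor, since std::stable_sort needs random-access iterators). Note that sorting orders the groups by numeric tex_id, not by first appearance in the list, so it only fits if that group order is acceptable.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Same fields as the EBlit class from the original post.
struct Blit {
    unsigned int tex_id;
    float x, y, w, h, dir;
};

// Functor that orders blits by texture id only.
struct ByTexture {
    bool operator()(const Blit& a, const Blit& b) const {
        return a.tex_id < b.tex_id;
    }
};

// Groups blits by texture. stable_sort keeps the submission order
// of blits that share the same texture.
inline void groupByTexture(std::vector<Blit>& blits) {
    std::stable_sort(blits.begin(), blits.end(), ByTexture());
}

// Counts how many glBindTexture calls a single pass over the
// container would need (one per run of equal tex_id values).
inline int countTextureRuns(const std::vector<Blit>& blits) {
    int runs = 0;
    for (std::size_t i = 0; i < blits.size(); ++i) {
        if (i == 0 || blits[i].tex_id != blits[i - 1].tex_id) {
            ++runs;
        }
    }
    return runs;
}
```

After sorting, a single loop can bind a texture whenever tex_id changes and draw every element once, without erasing anything from the container.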

But it's really weird that it's so slow... are you using non-power-of-two (NPOT) textures? Or textures that are too big? A resolution of 512x512 is the maximum [smile], at least if you're using lots of different textures.
Quote: Original post by Caste
You can improve your code in several ways:
* use the std algorithm sort in conjunction with a functor, then you don't need the 2nd list
* create an array of all the vertices (2 floats each in this case) and take a look at vertex arrays to see how to send them to OpenGL

Oh yes, a sorted list could be a lot faster than running through it multiple times. This really gets rid of the second loop :D:D
About vertex arrays: are they really faster than multiple glVertex calls? How big is the difference?

I have one more approach: a compiled list with constant vertices between glBegin and glEnd, AND a glScalef before the compiled block:
glScalef(w,h,1);
//compiled block:
	glBegin(GL_QUADS);
		glTexCoord2f(0.0f,0.0f); glVertex2f(1,1);
		glTexCoord2f(1.0f,0.0f); glVertex2f(-1,1);
		glTexCoord2f(1.0f,1.0f); glVertex2f(-1,-1);
		glTexCoord2f(0.0f,1.0f); glVertex2f(1,-1);
	glEnd();
	glScalef(1,1,1); // note: multiplies by identity, so it does not undo the scale above
// compiled block end
This makes a difference.
Quote: Original post by Caste
But it's really weird that it's so slow... are you using non-power-of-two (NPOT) textures? Or textures that are too big? A resolution of 512x512 is the maximum [smile], at least if you're using lots of different textures.

This was the case, but I kind of figured it out by myself. After considering the (only) difference between my texture and the texture I used for the font, which was relatively fast, I checked the format and the load procedure's data; the only thing that differed was the size, and I eventually realized my texture was NPOT.
Being hit by an obvious mistake sometimes hurts :P It took me 14 hours of coding, googling and specification reading to notice this plain fact...
Well, in your case (translating and rotating before each quad) vertex arrays won't give you a big performance gain, but in general they're a lot faster (and you only need about 5 extra lines of code to use them as a VBO, which is even faster).

Another argument for VAs is that immediate mode (glBegin() - glEnd()) won't be available in OpenGL 3.0 any more... in case it ever gets released [wink]
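
As a rough sketch of the vertex array idea: the per-quad translate and rotate can be applied on the CPU, so that all quads of one texture end up in flat arrays. The `Blit` struct mirrors the original post; `appendQuad` is a made-up helper name, and the corner order and half-extent convention (w and h used as offsets from the centre) are copied from the posted glBegin block. The GL calls are left as comments because they need a live context.

```cpp
#include <cmath>
#include <vector>

struct Blit {
    unsigned int tex_id;
    float x, y, w, h, dir; // dir in degrees, as passed to glRotatef
};

// Appends the four corners of one blit, already rotated about its centre
// and translated to (x, y), to flat arrays holding 2 floats per vertex.
inline void appendQuad(const Blit& b, std::vector<float>& pos, std::vector<float>& uv) {
    const float rad = b.dir * 3.14159265358979f / 180.0f;
    const float c = std::cos(rad);
    const float s = std::sin(rad);
    // Corner offsets and texture coordinates, in the order used by the posted code.
    const float corner[4][2] = { { b.w, -b.h }, { -b.w, -b.h }, { -b.w, b.h }, { b.w, b.h } };
    const float tex[4][2] = { { 1.0f, 1.0f }, { 0.0f, 1.0f }, { 0.0f, 0.0f }, { 1.0f, 0.0f } };
    for (int i = 0; i < 4; ++i) {
        pos.push_back(b.x + corner[i][0] * c - corner[i][1] * s);
        pos.push_back(b.y + corner[i][0] * s + corner[i][1] * c);
        uv.push_back(tex[i][0]);
        uv.push_back(tex[i][1]);
    }
}

// With a current OpenGL context, one texture group could then be drawn as:
//   glEnableClientState(GL_VERTEX_ARRAY);
//   glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//   glVertexPointer(2, GL_FLOAT, 0, &pos[0]);
//   glTexCoordPointer(2, GL_FLOAT, 0, &uv[0]);
//   glDrawArrays(GL_QUADS, 0, (GLsizei)(pos.size() / 2));
```

This trades the glPushMatrix/glTranslatef/glRotatef calls per quad for a few multiplications on the CPU, which is what allows many quads to go out in one draw call.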
Yes it's true, legacy GL functions such as glBegin and glEnd (and probably most things relating to the fixed function pipeline) will disappear in gl3. This is for a good reason though, the legacy gl functions are extremely slow and remain in the library only for compatibility. The thing is though, there's quite a lot of these functions.

VARs are good for performance, but VBO's offer better flexibility which gives you the leverage you need to improve performance. Most notably you can specify storage - though not explicitly. All you can do is hint to opengl what the usage will be.

The most optimal code will work within the following simple guidelines:
- Indexed triangles are the fastest drawing method
- Runtime allocations are slow, so make all your storage static
- std::list is not a good data structure, prefer std::vector, or in this case a vanilla c-style array is optimal
- Use VBOs because they're the fastest drawing method. You will need a separate indices VBO
- Make your indices VBO static - the winding order of your vertices won't change
- NPOT textures are evil.
- State changes incur a performance penalty, try to draw as much as you can in a single call without any state changes (particularly shader or texture changes)
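
To illustrate the static index buffer from the guidelines above, here is a minimal sketch that assumes each quad is stored as four consecutive vertices and is split into two triangles. `buildQuadIndices` is a made-up name, and 16-bit indices are an assumption that limits one buffer to 16384 quads.

```cpp
#include <cstddef>
#include <vector>

// Builds indices for quadCount quads stored as 4 consecutive vertices each.
// Every quad becomes the two triangles (0,1,2) and (0,2,3).
inline std::vector<unsigned short> buildQuadIndices(std::size_t quadCount) {
    static const unsigned short pattern[6] = { 0, 1, 2, 0, 2, 3 };
    std::vector<unsigned short> idx;
    idx.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        for (int i = 0; i < 6; ++i) {
            idx.push_back(static_cast<unsigned short>(q * 4 + pattern[i]));
        }
    }
    return idx;
}

// The result never changes, so it could be uploaded once, for example with
//   glBufferData(GL_ELEMENT_ARRAY_BUFFER, idx.size() * sizeof(unsigned short),
//                &idx[0], GL_STATIC_DRAW);
// and reused for every frame with
//   glDrawElements(GL_TRIANGLES, (GLsizei)(quadsThisFrame * 6), GL_UNSIGNED_SHORT, 0);
```

Only the vertex data then needs to be rewritten each frame; the index buffer can be sized once for the largest expected batch.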

Hope this helps, and good luck!
Yes, I second the proposition for using a VBO. Run through your sorted object list and output the transformed vertex data for each render state to a vertex set and index set. Simply create your data buffer with the appropriate options (typically, asynchronous sending of data in order not to interfere with rendering) and send the buffer where needed.

Also, it would improve performance if your object transforms were represented in complex similarity format t(z) = az + b, since then transforming your objects only requires 4 muls and 4 adds instead of the more costly 4x4 matrix transform used by OpenGL.
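
A small sketch of that t(z) = az + b form using std::complex, where a carries rotation and uniform scale and b carries the translation. The names `Similarity` and `makeSimilarity` are made up for illustration; non-uniform sizes (w different from h) would have to be baked into the corner offsets before the transform is applied.

```cpp
#include <complex>

typedef std::complex<float> Cf;

// Similarity transform t(z) = a*z + b on the complex plane.
struct Similarity {
    Cf a; // rotation and uniform scale
    Cf b; // translation
    // One complex multiply (4 muls, 2 adds) plus one complex add (2 adds).
    Cf apply(const Cf& z) const { return a * z + b; }
};

// Builds the transform from a scale factor, an angle in radians and a translation.
inline Similarity makeSimilarity(float scale, float radians, float tx, float ty) {
    Similarity t;
    t.a = std::polar(scale, radians); // scale * (cos + i*sin)
    t.b = Cf(tx, ty);
    return t;
}
```

A point (px, py) is passed in as Cf(px, py), and the real and imaginary parts of the result are the transformed x and y.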
I'm new to VBOs. Please tell me which version of OpenGL I need in order to use VBOs (I know the extension name); also, please supply me with a code snippet showing how to
a) set up the VBO and the VBO extension, and return an error on failure,
b) put texture-position, color and output-position vertex data into the VBO (AFAIK the order of data in a VBO matters),
c) render the buffer's contents,
d) safely delete/deallocate the buffer before program termination.

One more question: how can I use 2D with a VBO? Do I have to add a third dimension set to 0 to make it work?
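
For readers with the same questions, a hedged sketch of points a) to d). Buffer objects are core in OpenGL 1.5 and are otherwise exposed through GL_ARB_vertex_buffer_object, and glVertexPointer accepts a size of 2, so no padding z of 0 is needed. The `Vertex2D` layout and the offset constants below are made-up names; the order of fields is a free choice as long as the stride and offsets given to the gl*Pointer calls match it. The GL calls are shown as comments because they need a live context and loaded entry points.

```cpp
#include <cstddef>

// Hypothetical interleaved layout: 2D position, texture coordinate, RGBA colour.
struct Vertex2D {
    float x, y;               // output position (two components are enough)
    float u, v;               // texture position
    unsigned char r, g, b, a; // colour
};

// Byte offsets that the gl*Pointer calls take when a VBO is bound.
const std::size_t kPosOffset = offsetof(Vertex2D, x);
const std::size_t kTexOffset = offsetof(Vertex2D, u);
const std::size_t kColorOffset = offsetof(Vertex2D, r);

// a) Check glGetString(GL_VERSION) for 1.5 or later, or look for
//    "GL_ARB_vertex_buffer_object" in glGetString(GL_EXTENSIONS); fail otherwise.
//    GLuint vbo = 0; glGenBuffers(1, &vbo);
// b) glBindBuffer(GL_ARRAY_BUFFER, vbo);
//    glBufferData(GL_ARRAY_BUFFER, count * sizeof(Vertex2D), vertices, GL_DYNAMIC_DRAW);
// c) glEnableClientState(GL_VERTEX_ARRAY);
//    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
//    glEnableClientState(GL_COLOR_ARRAY);
//    glVertexPointer(2, GL_FLOAT, sizeof(Vertex2D), (const GLvoid*)kPosOffset);
//    glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex2D), (const GLvoid*)kTexOffset);
//    glColorPointer(4, GL_UNSIGNED_BYTE, sizeof(Vertex2D), (const GLvoid*)kColorOffset);
//    glDrawArrays(GL_QUADS, 0, (GLsizei)count);
// d) glDeleteBuffers(1, &vbo);
```

GL_DYNAMIC_DRAW is only a usage hint and is assumed here because the blit list changes every frame.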
Never mind, I've found almost everything I needed by myself once again.
For those who may have similar problems: http://www.songho.ca/opengl/gl_vbo.html helps a lot.
The tricky thing is that when it comes to working with OpenGL I'm much like a penguin in a jet's cockpit: can't fly either way. I got used to the simple, organised wrappers of a DirectX-based engine; now that I'm writing such a wrapper myself and going for speed, it's hard for me. But despite minor setbacks and quite frequent educational breaks, I'm making progress.

This topic is closed to new replies.
