Optimize this

Started by NewDeal September 04, 2001 09:19 AM

10 comments, last by NewDeal 23 years, 5 months ago

Author

134

September 04, 2001 09:19 AM

Im working on a split-only ROAM implementation. Im having some problems with framerates though and i pretty much narrowed the trouble down to this function. Its WAY too slow and is taking up most of my running time. The function traverses a tree which is built up each frame. The tree is also different each frame which means i cant just put the whole thing in a displaylist or vertex array. I think maybe the recursion is taking up alot of time but i really dont know enough about the mechanisms behind to tell.

  
void Patch::RenderR(TriNode *Tri,int LeftX, int RightX, int TopX, int LeftY, int RightY, int TopY)
{
	if (Tri->LeftChild)
	{
		int CenterX = (int)((LeftX + RightX) / 2);
		int CenterY = (int)((LeftY + RightY) / 2);

		RenderR(Tri->LeftChild, TopX, LeftX, CenterX, TopY, LeftY, CenterY);
		RenderR(Tri->RightChild, RightX, TopX, CenterX, RightY, TopY, CenterY);
	}
	else
	{
		TrisRenderedTotal++;

		int LeftZ  = HeightMap[(LeftY * MAP_SIZE)+LeftX];
		int RightZ = HeightMap[(RightY * MAP_SIZE)+RightX];
		int TopZ  = HeightMap[(TopY * MAP_SIZE)+TopX ];

		float p = (float)MAP_SIZE;
		float l = (float)PATCH_SIZE/4.0;
		

		int lX = WorldX+LeftX;
		int lY = WorldY+LeftY;

		int rX = WorldX+RightX;
		int rY = WorldY+RightY;

		int tX = WorldX+TopX;
		int tY = WorldY+TopY;

		glMultiTexCoord2fARB(GL_TEXTURE0_ARB , lX/p, lY/p);
		glMultiTexCoord2fARB(GL_TEXTURE1_ARB , lX/l, (float)lY/l);
		glVertex3f((GLfloat)LeftX, (GLfloat) LeftZ, (GLfloat) LeftY);
		glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,rX/p, rY/p);	
		glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,rX/l, rY/l); 
		glVertex3f((GLfloat) RightX,(GLfloat) RightZ,(GLfloat) RightY);
		glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,tX/p, (float)tY/p);
		glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,tX/l, (float)tY/l); 
		glVertex3f((GLfloat) TopX, (GLfloat) TopZ, (GLfloat) TopY);
	}

}

If you got any ideas, any at all, please let me know. I dont mind major changes so if im missing some general optimization techniques in ROAM let me know as well.

intellibug

122

September 04, 2001 10:03 AM

you''re doing pretty many float/int conversions, i dont know anything about ROAM though ...

Anonymous

September 04, 2001 10:11 AM

const your parameters and calculated numbers possible

psychic_treason

122

September 04, 2001 10:14 AM

Okay, this is a very rough first look based on some general optimisation principles. I would check the algorithm if it is taking too long, perhaps that isn''t quite right. A B-Tree is always going to be O(n) to traverse it completely and process every node.

1. Check your MAP_SIZE constant. If possible, make your mapsize a power of 2 and then you can bit shift instead. that will speed up those calcs to start with.

2. Do you need to work in floats? You seem to be working with Ints and then cast MAP_SIZE to a float and divide an int by what is effectively a floated int?? Do you need to do this?

3. l is PATCH_SIZE/4.0, does this change? If not, cache it at the start and also try and make it a power of 2. Does it need to be a float?

3. If you can keep them as ints, use bit shifting again to mimimise your divides through the last part. You don''t re-use anything in each operation as they are all different but you can minimise the number of ops to do.

4. The casting to GLFloat probably isn''t helping either particularly as this is a full re-shuffle of the bits.

Not sure if this has helped at all.

-- That''s my $0.02 worth --

Hang on, where I come from, $0.02 is rounded down. Does that mean my opinion is worthless or priceless?

CHROM

-- That's my $0.02 worth --
Hang on, where I come from, $0.02 is rounded down. Does that mean my opinion is worthless or priceless?
CHROM

Anonymous

September 04, 2001 10:52 AM

Off the top of my head;

1) Remove the glMultiTexCoord2fARB() and glVertex() functions but keep the work they''re doing (the float division). You want to know how much time is spent doing algorithmic vs gl work.

2) Make the function a loop with a stack preallocated to the depth you''ll need. This will eliminate the function overhead. The STL has a stack you can use.

3) Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children. Check out http://www.longbowdigitalarts.com/ubb/Forum3/HTML/000016.html for details.

4) Try to work with either ints or floats. You''re doing a bunch of conversions in the glVertex() and glMultiTexCoord2fARB() functions.

5) Choose PATCH_SIZE such that PATCH_SIZE/4.0 is a power of two to allow faster division. Same with MAP_SIZE.

6) Const you function. For example, make LeftZ, RightZ, and TopZ const. Any additional information that you give the compiler helps it make better optmization decisions.

7) Eliminate the temporaries for variables p and l.

Good luck.

NewDeal

Author

134

September 04, 2001 11:40 AM

Thanks for all your replies. At least im learning a lot

. Ive changed some of the things you suggested.

    void Patch::RenderR(TriNode *Tri, const int LeftX, const int RightX, const int TopX, const int LeftY, const int RightY, const int TopY){	if (Tri->LeftChild)	{		int CenterX = (LeftX + RightX) >> 1;		int CenterY = (LeftY + RightY) >> 1;		RenderR(Tri->LeftChild, TopX, LeftX, CenterX, TopY, LeftY, CenterY);		RenderR(Tri->RightChild, RightX, TopX, CenterX, RightY, TopY, CenterY);	}	else	{		TrisRenderedTotal++;		int LeftZ  = HeightMap[(LeftY * MAP_SIZE)+LeftX];		int RightZ = HeightMap[(RightY * MAP_SIZE)+RightX];		int TopZ  = HeightMap[(TopY * MAP_SIZE)+TopX ];		float lX = (float)WorldX+LeftX;		float lY = (float)WorldY+LeftY;		float rX = (float)WorldX+RightX;		float rY = (float)WorldY+RightY;		float tX = (float)WorldX+TopX;		float tY = (float)WorldY+TopY;		glMultiTexCoord2fARB(GL_TEXTURE0_ARB , lX / BASE_SIZE, lY / BASE_SIZE);		glMultiTexCoord2fARB(GL_TEXTURE1_ARB , lX / DETAIL_SIZE, lY / DETAIL_SIZE);		glVertex3i(LeftX, LeftZ, LeftY);		glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,rX / BASE_SIZE, rY / BASE_SIZE);			glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,rX / DETAIL_SIZE, rY / DETAIL_SIZE); 		glVertex3i(RightX, RightZ, RightY);		glMultiTexCoord2fARB(GL_TEXTURE0_ARB ,tX / BASE_SIZE, tY / BASE_SIZE);		glMultiTexCoord2fARB(GL_TEXTURE1_ARB ,tX / DETAIL_SIZE, tY / DETAIL_SIZE); 		glVertex3i(TopX, TopZ, TopY);	}}

Ive elimminated the variables l and p and replaced them with constants instead. The CenterX, CenterY no longer needs conversion thanks to bit-shifting. The list lx,ly,rx etc is converted to float at calculation only. I need to work in floats with these variables since they are used for texture mapping. The function parametres are now consts. And finally the glvertex() calls now use ints instead of floats.

Still got some questions i hope you will take the time to answer.

quote:

Make the function a loop with a stack preallocated to the depth you'll need. This will eliminate the function overhead. The STL has a stack you can use.

Could you point me to a place where this is described in more detail. sounds interesting.

quote:

Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children.

Im already using implicit binary trees other places in the program. I will still have to send at least an int along to let the function know what tree-node i wish to access right ?. If thats the case then why is it faster than sending a pointer instead ?

And again, thanks for all your help, appreciate it

.

Edited by - NewDeal on September 4, 2001 12:41:59 PM

Anonymous

September 04, 2001 12:37 PM

I don''t have an immediate reference for the stack. You basically want something like:

struct Params
{ TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
}

Anonymous

September 04, 2001 01:05 PM

Check out "Essentials of Programming Languages", Friedman, Wand and Haynes. The work is in Scheme, but there is information for turning recursion into a stack-loop. (The method has a name, but I can''t remember as its been too long.)

Of the top of my head, its something like:

struct Params
{
TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
};

stack myStack;
bool doPop = false;

while (!doPoP || !myStack.empty())
{
if (doPop)
{
// Reached leaf node.

// Extract saved params from stack
TriNode* Tri = myStack.top().Tri;
int LeftX = myStack.top().LeftX;
...

// Pop stack
myStack.pop();

// Do work.
...
}

// Assume next item is a leaf
doPop = true;

if (Tri->LeftChild)
{
// Node has children.

// Push params onto stack.
myStack.push(Params(Tri, LeftX, TopX, LeftY, RightY, TopY);

// Set stack indicating that the top isn''t a leaf node.
doPop = false;
}
}

You''ll probably want to use references and try to avoid
extra copying. However, if you use an stl stack, be sure
you''re done using the data before you pop if using references.
(There may be consequences based on the stl implementation.)

Implicit Binary Trees:
You are correct, you still have to pass along the location to reference the child. You probably already know this, but the savings are storage itself (no explicit pointers) and quicker access to the children (mathmatical instead of following pointers). Also, with your stucture taking less memory, you reduce your probabilty of cache hits.

Don''t forget to post back to tell us how you made out.

NewDeal

Author

134

September 04, 2001 03:46 PM

Thanks AP, appreciate it

zedzeek

529

September 04, 2001 04:29 PM

what u have there is gonna be slow.
the biggest problem is u are calculating one tri at a time and then drawing it.

try this ( u will need to rewrite/replan your code

).

*calculate ALL the triangles in the scene.
*discard the ones that aint visable
*build a list of the visable triangles
*use vertex arrays to draw those.

using the above method will increase the speed from 3-30x (not percent but times)

Captain Courgette, eat more veg

Optimize this

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Optimize this

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines