|
Optimize this
Im working on a split-only ROAM implementation. Im having some problems with framerates though and i pretty much narrowed the trouble down to this function. Its WAY too slow and is taking up most of my running time.
The function traverses a tree which is built up each frame. The tree is also different each frame which means i cant just put the whole thing in a displaylist or vertex array. I think maybe the recursion is taking up alot of time but i really dont know enough about the mechanisms behind to tell.
If you got any ideas, any at all, please let me know. I dont mind major changes so if im missing some general optimization techniques in ROAM let me know as well.
you''re doing pretty many float/int conversions, i dont know anything about ROAM though ...
Okay, this is a very rough first look based on some general optimisation principles. I would check the algorithm if it is taking too long, perhaps that isn''t quite right. A B-Tree is always going to be O(n) to traverse it completely and process every node.
1. Check your MAP_SIZE constant. If possible, make your mapsize a power of 2 and then you can bit shift instead. that will speed up those calcs to start with.
2. Do you need to work in floats? You seem to be working with Ints and then cast MAP_SIZE to a float and divide an int by what is effectively a floated int?? Do you need to do this?
3. l is PATCH_SIZE/4.0, does this change? If not, cache it at the start and also try and make it a power of 2. Does it need to be a float?
3. If you can keep them as ints, use bit shifting again to mimimise your divides through the last part. You don''t re-use anything in each operation as they are all different but you can minimise the number of ops to do.
4. The casting to GLFloat probably isn''t helping either particularly as this is a full re-shuffle of the bits.
Not sure if this has helped at all.
-- That''s my $0.02 worth --
Hang on, where I come from, $0.02 is rounded down. Does that mean my opinion is worthless or priceless?
CHROM
1. Check your MAP_SIZE constant. If possible, make your mapsize a power of 2 and then you can bit shift instead. that will speed up those calcs to start with.
2. Do you need to work in floats? You seem to be working with Ints and then cast MAP_SIZE to a float and divide an int by what is effectively a floated int?? Do you need to do this?
3. l is PATCH_SIZE/4.0, does this change? If not, cache it at the start and also try and make it a power of 2. Does it need to be a float?
3. If you can keep them as ints, use bit shifting again to mimimise your divides through the last part. You don''t re-use anything in each operation as they are all different but you can minimise the number of ops to do.
4. The casting to GLFloat probably isn''t helping either particularly as this is a full re-shuffle of the bits.
Not sure if this has helped at all.
-- That''s my $0.02 worth --
Hang on, where I come from, $0.02 is rounded down. Does that mean my opinion is worthless or priceless?
CHROM
-- That's my $0.02 worth --
Hang on, where I come from, $0.02 is rounded down. Does that mean my opinion is worthless or priceless?
CHROM
Hang on, where I come from, $0.02 is rounded down. Does that mean my opinion is worthless or priceless?
CHROM
September 04, 2001 10:52 AM
Off the top of my head;
1) Remove the glMultiTexCoord2fARB() and glVertex() functions but keep the work they''re doing (the float division). You want to know how much time is spent doing algorithmic vs gl work.
2) Make the function a loop with a stack preallocated to the depth you''ll need. This will eliminate the function overhead. The STL has a stack you can use.
3) Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children. Check out http://www.longbowdigitalarts.com/ubb/Forum3/HTML/000016.html for details.
4) Try to work with either ints or floats. You''re doing a bunch of conversions in the glVertex() and glMultiTexCoord2fARB() functions.
5) Choose PATCH_SIZE such that PATCH_SIZE/4.0 is a power of two to allow faster division. Same with MAP_SIZE.
6) Const you function. For example, make LeftZ, RightZ, and TopZ const. Any additional information that you give the compiler helps it make better optmization decisions.
7) Eliminate the temporaries for variables p and l.
Good luck.
1) Remove the glMultiTexCoord2fARB() and glVertex() functions but keep the work they''re doing (the float division). You want to know how much time is spent doing algorithmic vs gl work.
2) Make the function a loop with a stack preallocated to the depth you''ll need. This will eliminate the function overhead. The STL has a stack you can use.
3) Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children. Check out http://www.longbowdigitalarts.com/ubb/Forum3/HTML/000016.html for details.
4) Try to work with either ints or floats. You''re doing a bunch of conversions in the glVertex() and glMultiTexCoord2fARB() functions.
5) Choose PATCH_SIZE such that PATCH_SIZE/4.0 is a power of two to allow faster division. Same with MAP_SIZE.
6) Const you function. For example, make LeftZ, RightZ, and TopZ const. Any additional information that you give the compiler helps it make better optmization decisions.
7) Eliminate the temporaries for variables p and l.
Good luck.
Thanks for all your replies. At least im learning a lot
. Ive changed some of the things you suggested.
Ive elimminated the variables l and p and replaced them with constants instead. The CenterX, CenterY no longer needs conversion thanks to bit-shifting. The list lx,ly,rx etc is converted to float at calculation only. I need to work in floats with these variables since they are used for texture mapping. The function parametres are now consts. And finally the glvertex() calls now use ints instead of floats.
Still got some questions i hope you will take the time to answer.
Could you point me to a place where this is described in more detail. sounds interesting.
Im already using implicit binary trees other places in the program. I will still have to send at least an int along to let the function know what tree-node i wish to access right ?. If thats the case then why is it faster than sending a pointer instead ?
And again, thanks for all your help, appreciate it
.
Edited by - NewDeal on September 4, 2001 12:41:59 PM
data:image/s3,"s3://crabby-images/94034/94034f0805586c371a939655147d4e17c8e84481" alt=""
|
Ive elimminated the variables l and p and replaced them with constants instead. The CenterX, CenterY no longer needs conversion thanks to bit-shifting. The list lx,ly,rx etc is converted to float at calculation only. I need to work in floats with these variables since they are used for texture mapping. The function parametres are now consts. And finally the glvertex() calls now use ints instead of floats.
Still got some questions i hope you will take the time to answer.
quote:
Make the function a loop with a stack preallocated to the depth you'll need. This will eliminate the function overhead. The STL has a stack you can use.
Could you point me to a place where this is described in more detail. sounds interesting.
quote:
Make the data structure an implicit binary tree. This will allow you eliminate the pointer overhead (memory and CPU) and give you quick access time to children.
Im already using implicit binary trees other places in the program. I will still have to send at least an int along to let the function know what tree-node i wish to access right ?. If thats the case then why is it faster than sending a pointer instead ?
And again, thanks for all your help, appreciate it
data:image/s3,"s3://crabby-images/94034/94034f0805586c371a939655147d4e17c8e84481" alt=""
Edited by - NewDeal on September 4, 2001 12:41:59 PM
September 04, 2001 12:37 PM
I don''t have an immediate reference for the stack. You basically want something like:
struct Params
{ TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
}
struct Params
{ TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
}
September 04, 2001 01:05 PM
Check out "Essentials of Programming Languages", Friedman, Wand and Haynes. The work is in Scheme, but there is information for turning recursion into a stack-loop. (The method has a name, but I can''t remember as its been too long.)
Of the top of my head, its something like:
struct Params
{
TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
};
stack myStack;
bool doPop = false;
while (!doPoP || !myStack.empty())
{
if (doPop)
{
// Reached leaf node.
// Extract saved params from stack
TriNode* Tri = myStack.top().Tri;
int LeftX = myStack.top().LeftX;
...
// Pop stack
myStack.pop();
// Do work.
...
}
// Assume next item is a leaf
doPop = true;
if (Tri->LeftChild)
{
// Node has children.
// Push params onto stack.
myStack.push(Params(Tri, LeftX, TopX, LeftY, RightY, TopY);
// Set stack indicating that the top isn''t a leaf node.
doPop = false;
}
}
You''ll probably want to use references and try to avoid
extra copying. However, if you use an stl stack, be sure
you''re done using the data before you pop if using references.
(There may be consequences based on the stl implementation.)
Implicit Binary Trees:
You are correct, you still have to pass along the location to reference the child. You probably already know this, but the savings are storage itself (no explicit pointers) and quicker access to the children (mathmatical instead of following pointers). Also, with your stucture taking less memory, you reduce your probabilty of cache hits.
Don''t forget to post back to tell us how you made out.
Of the top of my head, its something like:
struct Params
{
TriNode *Tri,
int LeftX, RightX, TopX, LeftY, RightY, TopY
};
stack
bool doPop = false;
while (!doPoP || !myStack.empty())
{
if (doPop)
{
// Reached leaf node.
// Extract saved params from stack
TriNode* Tri = myStack.top().Tri;
int LeftX = myStack.top().LeftX;
...
// Pop stack
myStack.pop();
// Do work.
...
}
// Assume next item is a leaf
doPop = true;
if (Tri->LeftChild)
{
// Node has children.
// Push params onto stack.
myStack.push(Params(Tri, LeftX, TopX, LeftY, RightY, TopY);
// Set stack indicating that the top isn''t a leaf node.
doPop = false;
}
}
You''ll probably want to use references and try to avoid
extra copying. However, if you use an stl stack, be sure
you''re done using the data before you pop if using references.
(There may be consequences based on the stl implementation.)
Implicit Binary Trees:
You are correct, you still have to pass along the location to reference the child. You probably already know this, but the savings are storage itself (no explicit pointers) and quicker access to the children (mathmatical instead of following pointers). Also, with your stucture taking less memory, you reduce your probabilty of cache hits.
Don''t forget to post back to tell us how you made out.
what u have there is gonna be slow.
the biggest problem is u are calculating one tri at a time and then drawing it.
try this ( u will need to rewrite/replan your code
).
*calculate ALL the triangles in the scene.
*discard the ones that aint visable
*build a list of the visable triangles
*use vertex arrays to draw those.
using the above method will increase the speed from 3-30x (not percent but times)
the biggest problem is u are calculating one tri at a time and then drawing it.
try this ( u will need to rewrite/replan your code
data:image/s3,"s3://crabby-images/8ae42/8ae426cc991394500232d48f900f034f63ca7e47" alt=""
*calculate ALL the triangles in the scene.
*discard the ones that aint visable
*build a list of the visable triangles
*use vertex arrays to draw those.
using the above method will increase the speed from 3-30x (not percent but times)
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement
Recommended Tutorials
Advertisement