La Mothe - Optimated vector length-allgorithm
I copied an allgorithm to approximate the length of a vector by 96%. The page where I got this allgo, said that its written in one of Andre LaMothe's books.
It's based on the McLaurin-formula. (but for 3 unknown vars).
I found a point where I could optimize the allgo and that's astonishing, since LaMothe should have done this himself, I think.
I commented the multiplication by 1024 and replaced it by a bitshifting of 10.
I know this can be more unexact, but I tested it and found the quote of 96% was hardly influenced by my manipulation.
Here's the code.
VVal TVector::FastLength()
{
int temp;
int x, y, z;
x = (int)(fabs(X)/* * 1024*/) << 10; //<-- my manipulation
y = (int)(fabs(Y)/* * 1024*/) << 10; //<-- my manipulation
z = (int)(fabs(Z)/* * 1024*/) << 10; //<-- my manipulation
if (y < x) {
temp = y;
y = x;
x = temp; }
if (z < y) {
temp = y;
y = z;
z = temp; }
if (y < x) {
temp = x;
x = y;
y = temp; }
int dist = ( z + 11*(y >> 5) + (x >> 2) );
return (VVal(dist >> 10));
}
Why didn't LaMothe do this himself???
Edited by - Wunibald on 10/26/00 6:57:36 AM
(Computer && !M$ == Fish && !Bike)
Bitshifting by 10 is exactly the same as multiplying by 1024. Furthermore, compilers do this for you already. I guarantee if you test this algorithm in an optimized build against the "old multiply by ten" algorithm in an optimized build that they''ll perform exactly the same.
It''s good that you''re looking for ways to optimize, though. The author of the original algorithm should have probably commented his/her code with "using 1024 because compiler will optimize it to << 10".
It''s good that you''re looking for ways to optimize, though. The author of the original algorithm should have probably commented his/her code with "using 1024 because compiler will optimize it to << 10".
Yes, a good optimization if you know what you''re doing. Suppose, however, that you pass it a unit vector? Those (int) casts will probably cause you to get a length of zero. Of course, most of the time when you are dealing with unit vectors, you know it''s a unit vector, so you don''t need to calculate its length... Something to keep in mind though.
If a man is talking in the forest, and there is no woman there to hear him, is he still wrong?
I think this method is out of date!!!
The FPU is quite fast nowadays. FABS, conversion to int, a set of jumps and an integer multiplication make this quite slow.
I haven''t calculated the exact time it takes, but I won''t be supprised if this function is slower! I would recommend simply using:
If you want to squeeze max speed out of this:
- Set FPU to lowest precision.
- Use some asm FPU pipelining optimization. Use the FSQRT function - not the C version.
- Get FMUL down to 1 cycle per multiplication.
Otherwise you could always use a lookup table.
The FPU is quite fast nowadays. FABS, conversion to int, a set of jumps and an integer multiplication make this quite slow.
I haven''t calculated the exact time it takes, but I won''t be supprised if this function is slower! I would recommend simply using:
sqrt(x*x + y*y + z*z);
If you want to squeeze max speed out of this:
- Set FPU to lowest precision.
- Use some asm FPU pipelining optimization. Use the FSQRT function - not the C version.
- Get FMUL down to 1 cycle per multiplication.
Otherwise you could always use a lookup table.
Wunibald: you are adding to the innaccuracy of the approximation by casting the float/double to int before multiplying by 1024. Let''s say the float value was 1.9 the difference between 1.9*1024 and int(1.9)<<10 is almost 100%. (I know this is an extreme example :-)
Also casting a float to int is expensive (if you''re looking to squeeze more efficiency)
Wunibald: Here's some code that may help to get your code a little faster (also the FloatToLong() function will round the float properly).
baskuenen: I think AMD processors have very fast sqrt() calculations. Intel's FPU is still a bit lagging in this department.
Edited by - neocron on October 26, 2000 6:58:48 PM
inline long FloatToLong( float a ){ assert( !_isnan( a )); int b; _asm { fld a; fistp b; } return b;}//// Inlining fabs()//inline float FABS( float a ){ assert( !_isnan( a )); if ( a < 0 ) return -a; return a;}
baskuenen: I think AMD processors have very fast sqrt() calculations. Intel's FPU is still a bit lagging in this department.
Edited by - neocron on October 26, 2000 6:58:48 PM
I hope you don''t mind neocron, I want to make a small contribution to your FABS function
The MSB (Most significant bit) of a float also stands for it''s sign. If you treat it like an int, it will be much faster.
But watch out with above code! Only use in 32bit systems!
Oh - and maybe it''s better to use an int for the FloatToLong function, because long''s are unsigned. Or am I wrong? I''m sometimes getting a bit rusty on assemly
The MSB (Most significant bit) of a float also stands for it''s sign. If you treat it like an int, it will be much faster.
inline float FABS( float a ){ assert( !_isnan( a )); int *p = (int *)&a if ( *p < 0 ) return -a; return a;}
But watch out with above code! Only use in 32bit systems!
Oh - and maybe it''s better to use an int for the FloatToLong function, because long''s are unsigned. Or am I wrong? I''m sometimes getting a bit rusty on assemly
I'm pretty sure long's are signed. On a 32bit system int & long are identical on the bit level, or at least on 32bit x86 with M$VC. If you want a 16bit value, you need to use a short...
Anyway, is you're final use of the vector length integral or floating? If you convert it to and from float->int->float I think you're wasting cpu ticks. It will probably take less time to just do floating point math and skip the conversions.
And that may be why he did *1024 and not <<10 :-) Actually, if X Y & Z are float, you *must* multiple by 1024 prior to converting to an int, otherwise you will lose all your precision. It would only produce 96% accuracy for whole numbers. Try |<.25, .5, 0.1275>| and see how accurate it is
//mo betta abs()
Edited by - Magmai Kai Holmlor on October 26, 2000 9:36:43 PM
Anyway, is you're final use of the vector length integral or floating? If you convert it to and from float->int->float I think you're wasting cpu ticks. It will probably take less time to just do floating point math and skip the conversions.
And that may be why he did *1024 and not <<10 :-) Actually, if X Y & Z are float, you *must* multiple by 1024 prior to converting to an int, otherwise you will lose all your precision. It would only produce 96% accuracy for whole numbers. Try |<.25, .5, 0.1275>| and see how accurate it is
//mo betta abs()
float x= -10.5; cout<<"x="<<x<<endl; //x = (x & 0x7FFFFFFF); //can't do this in C++, could do an improper cast in C though... __asm{ mov eax, x; and eax, 0x7FFFFFFF; mov x, eax } cout<<"abs(x)="<<x<<endl;
Edited by - Magmai Kai Holmlor on October 26, 2000 9:36:43 PM
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
LOL!
Why didn''t I think of that?
Anyway, I forgot to mention a slightly off-topic optimization:
If you need the length of the vector to compare to another value, you can drop the sqrt(), and multiply the thing you want to compare with, with itself:
Why didn''t I think of that?
Anyway, I forgot to mention a slightly off-topic optimization:
If you need the length of the vector to compare to another value, you can drop the sqrt(), and multiply the thing you want to compare with, with itself:
if ((x*x + y*y + z*z) > (r*r)){}
Disclaimer: This has nothing to do with the thread .
Welcome to gamedev Wunibald, seems a lot of people are moving here from programmer''s heaven, I assume you''re the same person, heh...
Null and Void
At least I don't know COBOL...
http://www.crosswinds.net/~druidgames/
Welcome to gamedev Wunibald, seems a lot of people are moving here from programmer''s heaven, I assume you''re the same person, heh...
Null and Void
At least I don't know COBOL...
http://www.crosswinds.net/~druidgames/
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement
Recommended Tutorials
Advertisement