Problems with "dirty" fabs
Hiho!
I have a program that's in need of a fabs function (for calculating the distance between a point and an axis-aligned plane). Since this function gets called VERY often, I thought about optimizing it by simply stripping away the sign bit of my float with a bitwise & operator.
This _basically_ works, but it seems to get more and more inaccurate the smaller the numbers become (and normally the numbers I deal with are in the 0.0-1.0 range).
Now, I know that this way of calculating the absolute value isn't IEEE-conformant, but why exactly is that, and is this non-conformance the cause of my troubles?
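For reference, the trick I mean looks roughly like this (only a sketch, assuming 32-bit IEEE-754 floats; the function name is just for illustration, and I've spelled the bit access with memcpy here rather than a raw pointer cast):

#include <cstdint>
#include <cstring>

// "Dirty" fabs: clear the IEEE-754 sign bit (bit 31) of a 32-bit float.
// memcpy only moves the bit pattern around; the intent is the same as
// masking through a casted pointer.
inline float dirty_fabs(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // grab the float's raw bits
    bits &= 0x7fffffffu;                 // mask off the sign bit
    std::memcpy(&f, &bits, sizeof f);    // put the bits back
    return f;
}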
Thanks for any advice
Ciao, ¡muh!
They're watching us...
There's been some discussion of this on the GDAlgorithms list recently (and lots in the past). Some points which arose, and some from me:
- On a P4, the "fabs" x86 instruction is _cheaper_ than a "fadd". If this "optimization" isn't profile-led, it's probably an un-optimization (depending on the platform and compiler, of course). As a knowledgeable bloke once said, "premature optimization is the root of all evil", and he was right.
- A straight *((uint*)&fp) &= 0x7fffffff; isn't pointer-type-alias safe (i.e. the compiler isn't expecting you to reinterpret something defined as a float as an int; for example, that goes a bit screwy if the compiler decides it can do something with SSE with your floats). Use a union to make it safe; see the sketch after this list.
- Internally on the x86, the entries on the FPU floating-point stack are 80 bits wide, much wider than the 32/64-bit representation of a floating-point value in memory.
Normally, compiled code tries to keep the current result of any calculation on the FP stack for as long as possible so that it has the extra bits of precision to play with. Any time it has to write to memory, all of the lower-precision bits get lost in the transfer.
By accessing the value as an integer to remove the sign bit, you force the value back into memory from the FP stack, which removes any latency-hiding FPU optimisations the compiler may have done AND chops off the lower bits of precision. That, I suspect, is where the "inaccuracy" is coming from.
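By "use a union" I mean roughly the following (just a sketch, assuming float is a 32-bit IEEE-754 type; the function name is only for illustration, and note that type-punning through a union is well-defined in C but strictly speaking only a widely supported compiler extension in C++):

#include <cstdint>

// Union-based sign-bit clear: write the float, mask the sign bit through
// the integer view, read the float back out.
inline float abs_via_union(float f)
{
    union { float f; std::uint32_t u; } b;
    b.f = f;
    b.u &= 0x7fffffffu;  // clear bit 31, the sign bit
    return b.f;
}

Even then, a plain fabsf()/std::fabs call is likely to compile down to the cheap fabs instruction anyway, so profile before preferring the bit trick.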
--
Simon O'Connor
Creative Asylum Ltd
www.creative-asylum.com
Thanks a bunch!
I guess I will leave that out then.
My calculations were a lot faster using the (broken) dirty fabs; however, I suspect this was more because other parts of my algorithm were skipped as a result of the wrong fabs values than because of the faster fabs itself.
Ciao, ¡muh!
They're watching us...