Fast casting
Hi
I need to know how to cast a float to an int using FPU code (or any other way). I do loads of casting and I think it's really slowing down my code. I do use fixed-point math for some routines, but I keep reading bad things about it, and my code does not speed up much anyway. What do you all use for casting? Thanks a lot.
P.S. Anything would be appreciated, thanks.
July 17, 2000 07:14 PM
>What's wrong with:
>
>int x = (int) 9.0f;
>
>Let the compiler deal with it.
What's wrong is that it can be horribly slow, especially inside a big loop.
Try this:
// IEEE 754 compliant rounding to nearest (the default FPU rounding mode)
inline int __stdcall round( float x )
{
    int t;
    __asm fld x
    __asm fistp t
    return t;
}
When optimization is enabled, MSVC compiles this function down to just 2 instructions.
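For compilers without MSVC-style inline assembly, a rough portable counterpart can be sketched as follows. It assumes a C++11 standard library, where `std::lrint` converts using the current rounding mode (round to nearest by default), which is also what `fistp` does. The name `round_nearest` is made up for this example and is not from the thread.

```cpp
#include <cmath>

// Hypothetical helper, not from the thread: converts with the current
// rounding mode, which is round-to-nearest (ties to even) by default.
inline int round_nearest(float x)
{
    return static_cast<int>(std::lrint(x));
}
```

With the default rounding mode, `round_nearest(2.5f)` gives 2 and `round_nearest(3.5f)` gives 4, because ties go to the even neighbour.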
You want it for Alpha?
extern "C" { int64 __asm (char *, ...); };
#pragma intrinsic(__asm)
#ifdef _ALPHA_21264
inline int round( float x )
{
    return (int) __asm("cvttq f16, f0;" "ftois f0 , v0", x);
}
#else
inline int round( float x )
{
    int64 t;
    __asm("cvttq f16, f0;" "stt f0 ,(a1)", x, &t);
    return int(t);
}
#endif
Edited by - Serge K on July 18, 2000 12:56:57 AM
--== FLOAT 2 INT ==--
by: Alex Chalfin (aka Phred)
achalfin@one.net
Since the introduction of the Intel Pentium chip, many programmers have
switched from fixed-point mathematics to floating point. This is due to
the superior floating point unit on the Pentium chip. However, this has left
a small debate within the programming world. What is the best way to perform
a floating point to integer conversion?
I will present 4 methods for float to int conversion in this document. It is
up to you to decide which is best for you.
Method 1:
---------
Typecasting. This method is a high level language method for converting a
float to an integer. Here is a small piece of code demonstrating it:
MyInt = (int)MyFloat;
Advantages:
- Completely portable and standard.
- Works with float and double without modification.
- Performs correct rounding.
Disadvantages:
- Heavily compiler dependent.
- Tends to be slow (e.g. Watcom's slow typecast).
Method 2:
---------
Explicit FPU instruction to convert to an integer. On the x86 platform, the
instructions take the following form:
fist dword ptr [eax] ; store integer
-or-
fistp dword ptr [eax] ; store integer and pop
Using this form on x86 platforms generally avoids the overhead associated
with the compiler type casting. When compared to the typecasting under the
Watcom 10.6 compiler, the cycle count dropped from 40 to 6.
Advantages:
- Good performance
- Works with float and double without modification.
Disadvantages:
- x86 CPU dependent
- requires assembler (not really a disadvantage)
- requires 6 cycles (6 cycles for 1 instruction is quite a bit)
- Ignores rounding state of the FPU
Method 3:
---------
Magic number/fadd trick. This method uses a trick in the IEEE double format
to perform the typecasting without actual conversion.
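  int FLT2INT[2]      = {0, 0x43380000};   // double 1.5 * 2^52: plain integer
  int FLT2FXD24_8[2]  = {0, 0x42B80000};   // double 1.5 * 2^44: 24.8 fixed point
  int FLT2FXD16_16[2] = {0, 0x42380000};   // double 1.5 * 2^36: 16.16 fixed point
  int FLT2FXD8_24[2]  = {0, 0x41B80000};   // double 1.5 * 2^28: 8.24 fixed point
  int TEMP[2]         = {0, 0};

  fadd qword ptr [FLT2INT]      ; st(0) += magic number
  fstp qword ptr [TEMP]         ; store the sum as a double
  mov  eax, [TEMP]              ; the result is in the low dword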
Advantages:
- Good performance
Disadvantages:
- Dependent on "double" data type, doesn't work on "float".
- Extra constants (has to be stored as a double or two ints).
- Ignores rounding
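The same trick can be sketched in portable C++. This is an illustration under stated assumptions, not the article's code: it assumes IEEE 754 doubles, the default round-to-nearest mode, arithmetic carried out in double precision (for example SSE2 rather than an x87 FPU set to another precision), and an input whose rounded value fits in 32 bits. The name `magic_round` is invented here.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: rounds x to the nearest integer by adding
// 1.5 * 2^52, which pushes the integer part of x into the low 32 bits
// of the double's mantissa.
inline std::int32_t magic_round(double x)
{
    const double magic = 6755399441055744.0;   // 1.5 * 2^52
    const double t = x + magic;                // the FPU rounds away the fraction here

    std::uint64_t bits;
    std::memcpy(&bits, &t, sizeof bits);       // read the bit pattern without aliasing issues

    // Keep only the low 32 bits; negative values come out in two's complement.
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(bits));
}
```

Because the addition does the rounding, the result follows the current rounding mode: with the default mode, `magic_round(2.5)` is 2 and `magic_round(3.5)` is 4.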
Method 4:
---------
Integer pipeline conversion. This method takes the IEEE float format and uses
it completely to convert to an integer.
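  FltInt = *(int *)&MyFloat;                    /* raw bits of the float */
  mantissa = (FltInt & 0x07fffff) | 0x800000;   /* restore the implicit leading 1 */
  exponent = 150 - ((FltInt >> 23) & 0xff);     /* shift count: 150 = 127 + 23 */
  if (exponent < 0)
      MyInt = (mantissa << -exponent);
  else
      MyInt = (mantissa >> exponent);
  if (FltInt & 0x80000000)                      /* sign bit set: negate */
      MyInt = -MyInt;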
Advantages:
- Good performance
- Pure integer pipeline based (good for pairing with FPU)
Disadvantages:
- Separate routines necessary for floats and doubles.
- Costly jump to handle negatives (can hurt on PPro machines)
- Ignores rounding
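A self-contained version of the integer-pipeline method might look like the sketch below. It is an illustration, not the article's routine: `std::memcpy` replaces the pointer cast, shift counts are guarded because shifting a 32-bit value by 32 or more is undefined in C++, and out-of-range inputs saturate, which is a policy chosen only for this example. The name `float_to_int_trunc` is hypothetical.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: truncates an IEEE 754 single toward zero using
// integer operations only. Out-of-range inputs (including infinities and
// NaNs) saturate; that policy is a choice made for this sketch.
inline std::int32_t float_to_int_trunc(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);                 // read the raw bits

    const bool negative = (bits >> 31) != 0;
    const std::uint32_t mantissa =
        (bits & 0x007FFFFFu) | 0x00800000u;              // restore the implicit leading 1
    const int shift =
        150 - static_cast<int>((bits >> 23) & 0xFFu);    // 150 = 127 (bias) + 23

    std::uint32_t magnitude;
    if (shift > 23) {
        magnitude = 0;                                   // |f| < 1, zero, or denormal
    } else if (shift >= 0) {
        magnitude = mantissa >> shift;                   // drop the fractional bits
    } else if (shift >= -7) {
        magnitude = mantissa << -shift;                  // |f| < 2^31, still fits
    } else {
        return negative ? std::numeric_limits<std::int32_t>::min()
                        : std::numeric_limits<std::int32_t>::max();
    }
    return negative ? -static_cast<std::int32_t>(magnitude)
                    : static_cast<std::int32_t>(magnitude);
}
```

For example, `float_to_int_trunc(2.9f)` is 2 and `float_to_int_trunc(-2.9f)` is -2, matching a C typecast for in-range values.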
Stuff
-----
The main purpose of this document is to introduce the fourth method of float
to int conversion. I had never seen anything like it and I thought it was
pretty cool. Here is how it works in a little bit more detail:
IEEE 32-bit floating point number:
  31 30       23 22                    0
 ________________________________________
 |s |   exp     |       mantissa        |
 ----------------------------------------
What this diagram shows is the 23-bit mantissa, the 8-bit exponent, and the
1-bit sign.
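The field layout in the diagram can be checked with a small sketch like the following. The struct and function names are invented for this illustration, and the code assumes `float` is a 32-bit IEEE 754 single.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical struct and function names, used only for this illustration.
struct FloatFields {
    std::uint32_t sign;      // 1 bit
    std::uint32_t exponent;  // 8 bits, biased by 127
    std::uint32_t mantissa;  // 23 bits, implicit leading 1 not stored
};

inline FloatFields split_float(float f)
{
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // read the raw bits of the float
    return FloatFields{ bits >> 31, (bits >> 23) & 0xFFu, bits & 0x007FFFFFu };
}
```

`split_float(16.0f)` gives sign 0, exponent 131 (127 + 4) and mantissa 0, since 16 is an exact power of two.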
The first stage of the conversion is to extract the mantissa. This is done
with simple bit masking.
mantissa = (FltInt & 0x07fffff);
With IEEE numbers, the most significant bit is always assumed to be set. This
is why the mantissa bits are all zeros for numbers which are powers of two
(like 16, 256, etc.). This bit nee
There is the question: which sort of floating point to integer conversion do you want?
Regular C typecasting uses truncation.
Don't know about you, but usually I need round, ceil or floor.
It is very frustrating that C has such poor support for floating point to integer conversion.
Even Java is better in this field: at least it has a round function.
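For comparison, the four conversions mentioned here can be sketched with the standard library as it exists in later standards (C99 and C++11 added `lround` and `lrint`). The wrapper names are hypothetical, and this says nothing about speed; it only shows how the results differ.

```cpp
#include <cmath>

// Hypothetical wrappers; the names are not from the thread.
inline int to_int_trunc(float x) { return static_cast<int>(x); }               // toward zero
inline int to_int_floor(float x) { return static_cast<int>(std::floor(x)); }   // toward -infinity
inline int to_int_ceil (float x) { return static_cast<int>(std::ceil(x)); }    // toward +infinity
inline int to_int_round(float x) { return static_cast<int>(std::lround(x)); }  // nearest, ties away from zero
```

For -2.5f these give -2, -3, -2 and -3 respectively, so the choice matters most for negative values and for ties.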
> Method 1:
> ---------
> Typecasting.
...
> Advantages:
...
> - Performs correct rounding.
Hmm, that depends... It performs correct truncation.
> Disadvantages:
> - Heavily compiler dependent.
> - Tends to be slow (e.g. Watcom's slow typecast).
It is slow (on x86) because the x86 FPU can convert floating point to integer only with the current rounding mode.
The normal rounding mode is round to nearest.
To force the FPU to truncate, you have to change the FPU state, and this is very slow.
> Method 2:
....
> Disadvantages:
> - x86 CPU dependent
> - requires assembler (not really a disadvantage)
You just have to write a different implementation of one function for each target platform (and some generic code for all unknown ones).
> - requires 6 cycles (6 cycles for 1 instruction is quite a bit)
Anyway, it is the fastest method practically possible.
> - Ignores rounding state of the FPU
Nonsense. It does use the rounding state of the FPU
(which is usually "round to nearest").
> Method 3:
> ---------
> Magic number/fadd trick. This method uses a trick in the IEEE double format
> to perform the typecasting without actual conversion.
.....
> Advantages:
> - Good performance
It's not so simple...
This code was fast on the Pentium.
But on PPro/PII/PIII it can be even slower than software conversion in pure integer code, and it is slower in my applications.
It happens because of how store-to-load forwarding works.
If you store a double and then load just the lower 32 bits, you get a memory access stall (the load must wait for the store to write to memory before it can access the required data).
Hmm, I guess it's okay on the Athlon: as I remember, it can do fast forwarding if the load needs the lower part of the data from the same address the store wrote to.
> Disadvantages:
> - Dependent on "double" data type, doesn't work on "float".
Incorrect. It depends on the FPU's internal precision.
For the fastest code you may set the FPU to single (float) precision.
In that case it doesn't work.
> - Ignores rounding
It uses the current rounding state.
> Method 4:
> ---------
> Integer pipeline conversion. This method takes the IEEE float
> format and uses it completely to convert to an integer.
...........
> Advantages:
> - Good performance
Not so good. I tried it before. Nothing special.
Slower than Method #2.
> - Pure integer pipeline based (good for pairing with FPU)
Well, maybe, if you really want to write FPU code in pure assembler.
> Disadvantages:
......
> - Costly jump to handle negatives (can hurt on PPro machines)
Hmm, my version has no "costly" (hard to predict) jumps but correctly supports overflow, infinities and NaNs:
inline int __stdcall trunc( float x )
{
    DWORD e = (0x7F + 31) - ((*(DWORD*)&x & 0x7F800000) >> 23);
    if(e < 32)
    {
        int s = *(int*)&x >> 31;
        return int((((0x80000000 | (*(DWORD*)&x << 8)) >> e) ^ s) - s);
    }
    else
        return (e & 0x80000000);
}
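The function above avoids a branch on the sign by computing `s` as 0 or -1 (an arithmetic right shift of the raw bits by 31, which the code relies on) and then applying `(v ^ s) - s`. That idiom on its own can be sketched as follows; the helper name is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper showing the (v ^ s) - s idiom on its own.
// s must be 0 (keep v) or -1 (negate v).
inline std::int32_t apply_sign(std::int32_t v, std::int32_t s)
{
    // s == 0 : (v ^ 0) - 0      == v
    // s == -1: (v ^ -1) - (-1)  == ~v + 1 == -v
    return (v ^ s) - s;
}
```

`apply_sign(5, 0)` is 5 and `apply_sign(5, -1)` is -5, with no conditional jump for the CPU to mispredict.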
> - Ignores rounding
It performs truncation.
P.S.
If you are fine with a limited range of integer values, you can use Method #3 with float:
// argument must be in range -0x200000..0x1FFFFF
const float FLT2INT = 0xC00000;
inline int __stdcall trunc( float x )
{
    float t = x + FLT2INT;
    return ((*(int*)&t)<<10)>>10;
}
This code is just a little bit slower than Method #2 (for me).
Edited by - Serge K on July 19, 2000 9:39:37 AM
Thanks a lot, Serge K, that works fine. I haven't thoroughly tested the performance yet, but when I do I'll put the results up here. All the best, cheers.
Hello
It seems to be three times faster, but when compiler optimizations are turned on it seems to be slightly slower.
I have an AMD 450 K6-2, and it might also have something to do with the code I am testing it with. So I don't know whether it is faster or not. What do you all think? And why is casting with the compiler so slow? Surely Microsoft could easily sort it out?
Thanks for all your help.
P.S. I'm using Pro VC++ 4.0