float variables

Started by December 18, 2000 03:23 PM
6 comments, last by Densun 24 years, 1 month ago
I should already know this, but could someone explain how floats work? I know they can store a number with a mantissa, but that's all.

---
psilord.com
Byte Size Games
Okay, you want to know how a float works...
I''ll briefly summarize the IEEE floating point standard

First of all, you must understand that floating-point numbers are stored in scientific notation.
e.g., 834.642 would conceptually be stored as 8.34642 * 10^2.

sizeof(float) is typically 4 (32 bits), the same size as a DWORD. 1 bit is used for the sign of the number (1 == negative number, 0 == positive number).
8 bits are used for the value of the exponent.
The remaining 23 bits are the "precision" bits (the mantissa), which hold the significant digits of the number.

So, the data format would be:
first segment: sign
second segment: integer containing exponent value
third segment: precision bits

I could be much more technical, if you wanted =P
Everything is actually in binary, so the exponent part is actually 2^x instead of 10^x, and because the first precision bit will always be 1 (unless the number is zero), that bit is left off, etcetera.

If you have any more questions on how floating-point numbers work, ask.

--Rob
I think that about clears it up. There's one question I want to ask to make sure I understood you. Let's say I wanted to store the number 9.2. Would the precision bits equal 92 and the exponent equal -1 (assuming 10^x is used instead of 2^x)?

---
psilord.com

Byte Size Games
> I think that about clears it up. There's one question I want to ask to make sure I understood you. Let's say I wanted to store the number 9.2. Would the precision bits equal 92 and the exponent equal -1 (assuming 10^x is used instead of 2^x)?


Close. But remember: this is in _scientific_ notation.

So, in reality, it would store this number:
9.2 * 10^0

The precision integer would be 92 (the processor always presumes that the decimal point follows the first digit).
The exponent integer would be 0 (because you're not sliding it over any; it's already in scientific-notation form).

--Rob
No, it is base 2. The fractional part is made of negative powers of 2 (1/2, 1/4, 1/8, ...). Since the number is normalized, i.e. the exponent is chosen so that there is one digit to the left of the binary point, the first digit of the mantissa is always 1 in base 2, so it isn't stored.

9.2 is a bad example because it can't be stored exactly: it is 46/5, and a fraction whose denominator is not a power of 2 has no finite binary expansion. 9.5 would be much easier. In hex it would be 9.8, or in binary 1001.1000. You would move the binary point three places to the left to get 1.0011000 * 2^3, drop the leading 1, and pad on the right to the number of bits of the mantissa. Your exponent is then 3, i.e. 11 in binary, padded on the left to the number of bits of the exponent.

I believe IEEE also has a few other rules, such as zero being stored as an all-zero mantissa and exponent; it needs its own rule because, with the leading 1 implied, no ordinary mantissa/exponent combination gives zero. The sign is kept in a separate bit rather than in the mantissa, so -9.5 and 9.5 have the same mantissa (0011000...) and differ only in the sign bit.
Keys to success: Ability, ambition and opportunity.
All right, a few more things. First of all, the exponent is stored in biased-127 format, so the value in the exponent field is the actual exponent plus 127, not the exponent itself. To be clear, here's what a floating-point number really looks like:

SEEE EEEE EFFF FFFF FFFF FFFF FFFF FFFF

where the S is the sign bit (0 for positive, 1 for negative), the Es are the eight-bit exponent, stored as a biased-127 integer, and the Fs are the 23-bit mantissa. The 1 to the left of the binary point is assumed, and therefore not included in the mantissa. This is sometimes referred to as the "hidden bit."

Here's an example. Suppose you want to know the floating-point representation for a number x=17.875. In binary, this is 10001.1110. So we have:

x = 10001.1110 * 2^0
x = 1.00011110 * 2^4

So now we have a mantissa of 00011110 (the leading 1, remember, is not represented explicitly), and an exponent of 4. But since the exponent is stored as a biased-127 integer, we actually store the unsigned representation of 4+127=131, which is 10000011. The sign bit is 0 for a positive number, and we simply add 0s to the end of the mantissa to extend it to 23 bits. Thus the floating-point representation of 17.875 is:

0100 0001 1000 1111 0000 0000 0000 0000

Finally, as LilBudyWizer said, there are some special representations defined according to the IEEE standard. Zero is represented by storing 0 for both the exponent and the mantissa:

0000 0000 0000 0000 0000 0000 0000 0000

The sign bit may also be set; that pattern is -0.0, which compares equal to +0.0. Infinity is defined by setting the exponent to 255 and the mantissa to 0, with the sign bit determining the sign of infinity. So positive and negative infinity are:

0111 1111 1000 0000 0000 0000 0000 0000
1111 1111 1000 0000 0000 0000 0000 0000

respectively. There is also a special value called NaN, which stands for "not a number", used when the result is undefined, such as 0/0 or the square root of a negative number. (Dividing a nonzero number by zero gives infinity, not NaN.) This is represented by setting the exponent to 255, and the mantissa to anything EXCEPT 0.

Note that because exponent=0 and exponent=255 are reserved for special cases, the functional range of the floating-point standard is stored exponent values between 1 and 254; without the bias, this corresponds to actual exponents between -126 and 127.

One last thing. The double-precision floating point standard is analogous, except the exponent is 11 bits and stored as a biased-1023 integer, and the mantissa is 52 bits.

Hope this helps-

-Ironblayde
 Aeon Software

Next thing you know, they'll take my thoughts away.

Edited by - Ironblayde on December 18, 2000 6:40:50 PM
"Your superior intellect is no match for our puny weapons!"
The new posts are correct.

Being precise is definitely very important in computer science, but my motive in simplifying things above was to try to teach him the *essentials* of what he needed to know to use it, without any excess information that might confuse him.

As a clarification, I did briefly mention the following: that it is actually stored in binary (e.g., 2^x exponents), and I also mentioned that the first 1 is implicit (in binary it will always be one, so it can be left off).

Otherwise, good work in copying a technical manual's worth of information over here.

--Rob
Thanks for the info. That should be all I ever need to know about floating point variables.

---
psilord.com
Byte Size Games

