well that's because you get a shift by 32 instead of 16 if you multiply them. a multiplication basically doubles the number of bits needed to hold the result. ie 65535 is the highest 16-bit number and 65535 squared is 4294836225, while the highest 32-bit number is 4294967295. so essentially you get 20 shifted by 32, which overflows the unsigned int.
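here is a rough sketch of that overflow in C. the function names and the FIX_SHIFT constant are made up for illustration, and it assumes a 16.16 format on both operands and a compiler that has stdint.h (uint64_t stands in for a 64-bit type like __int64):

```c
#include <stdint.h>

#define FIX_SHIFT 16  /* assumed 16.16 fixed point on both operands */

/* naive version: the product carries 32 fractional bits, but a 32-bit
   multiply only keeps the low 32 bits, so the whole integer part is lost */
uint32_t fix_mul_naive(uint32_t a, uint32_t b)
{
    return (uint32_t)(a * b) >> FIX_SHIFT;
}

/* widened version: do the multiply in 64 bits, then shift back to 16.16 */
uint32_t fix_mul_wide(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> FIX_SHIFT);
}
```

with a = 20<<16 and b = 1<<16 (ie 20 times 1.0) the naive version gives 0, while the widened version gives 20<<16 as expected.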
instead you should either use __int64, which is MUCH slower than a normal 32-bit int, or drop the accuracy of the fractions. you will have to compromise somewhere.
how much accuracy is good enough? basically you have only 16 bits to work with, since multiplying doubles the bits required. and since you want signed numbers you really only have 15 bits of precision total. ie a shift by 15 means all the bits are fractional and anything over 1.0 will overflow.
normally, if you were coding this in asm, you would just drop the lower 32 bits instead of shifting. the compiler can't do this for you since it needs to remain consistent.
asm example (please learn asm and don't rip the code, it's basically useless as it is (read that as incomplete)):
mov eax, pointx   // move pointx into the register eax
shr eax, 16       // shift right by 16
mov ebx, COS[0]   // move COS[0] to ebx, it's already shifted
mul ebx           // multiply eax*ebx, top 32 bits go to edx, bottom 32 bits go to eax
mov newx, edx     // just take the top 32 bits and we don't need to shift
the compiler instead treats the multiply as a plain 32-bit operation, so the result overflows and the upper 32 bits are dropped. the compiler can't just put the upper 32 bits into newx (which would give you the answer you want without needing to shift it), since on its own that number is less than the real product (32-bit multiplies produce 64-bit results).
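if you don't want to write asm, you can ask for the same "keep only the top half" result in C by widening first. this is only a sketch: mul_hi32 is a made-up name, int64_t stands in for __int64, and it assumes the two shifts add up to 32 (eg 16.16 times 16.16). whether it ends up as fast as the hand written version depends on your compiler.

```c
#include <stdint.h>

/* hypothetical helper: multiply two signed 32-bit fixed point values and
   keep only the upper 32 bits of the 64-bit product (what the asm reads
   from edx). note that right shifting a negative value is implementation
   defined in C; most compilers do an arithmetic shift. */
int32_t mul_hi32(int32_t a, int32_t b)
{
    int64_t product = (int64_t)a * (int64_t)b;  /* full 64-bit result */
    return (int32_t)(product >> 32);            /* drop the lower 32 bits */
}
```

so mul_hi32(20<<16, 1<<16) gives plain 20, with no extra shift needed afterwards.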
i probably should have mentioned that. i completely forgot to actually try the code and forgot about how the compiler handles things like that, since i normally use much smaller shifts or only shift one side. i have also been using mmx for some code, so i don't notice the overflow problem since i can just grab the top bits and ignore the lower ones.
my advice is ranked like the olympics, where gold is the "best" way (ie best compromise in speed, ease of use, and accuracy):
bronze. take the easy and very slow way and use __int64
silver. learn asm and code the sections of fixed point in asm
gold. lower the precision you need. remember you don't need the same decimal point placement for each set of numbers. thus you can use a shift of 4 for the points and maybe 15 (need one bit for sign) for the COS/SIN table. this limits your points to a range of -4095 to 4095. then when you multiply you get:
(COS[0]/(2^15))*((20*(2^4))/(2^4))
(COS[0]*(20*(2^4)))/(2^19)
so you shift by 19 (ie 15+4).
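a minimal sketch of the mixed shift multiply in plain 32-bit C. the names POINT_SHIFT, TRIG_SHIFT, mul_to_int and mul_keep_q4 are made up for this example, and it assumes points are stored with a shift of 4 and the trig table with a shift of 15:

```c
#include <stdint.h>

#define POINT_SHIFT 4   /* assumed: points stored as value * 2^4 */
#define TRIG_SHIFT  15  /* assumed: COS/SIN table stored as value * 2^15 */

/* multiply a point by a table entry and return a plain integer.
   the raw product has 4 + 15 = 19 fractional bits, so shift by 19.
   (right shifting a negative value is implementation defined in C;
   most compilers do an arithmetic shift.) */
int32_t mul_to_int(int32_t point_q4, int32_t trig_q15)
{
    return (point_q4 * trig_q15) >> (POINT_SHIFT + TRIG_SHIFT);
}

/* same multiply, but shift by only 15 so the result stays in the
   point format (shift of 4) and can be fed back into more math */
int32_t mul_keep_q4(int32_t point_q4, int32_t trig_q15)
{
    return (point_q4 * trig_q15) >> TRIG_SHIFT;
}
```

for example mul_to_int(20<<4, 1<<15) gives 20, and even the largest point, 4095<<4 times 1.0, still fits in a signed 32-bit product.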
maybe you don't need -4095 to 4095, maybe only -1023 to 1023 for the points. then you can get 6 bits of fractional precision. you see, the larger the fractional portion, the less you get for the whole number range. remember that you only have 16 bits to work with: 1 is for the sign (which is handled automatically, yet you should still consider it) and then 15 bits for your actual number. this only applies when multiplying numbers. for pure addition and subtraction you don't need to deal with that stuff, though to add or subtract numbers they must be in the same fixed point format.
thus you can't add 20<<15 and 20<<6 directly and would need to shift the lower one up to the higher one. so you get 20<<15 and (20<<6)<<9 and now they are in the same format. multiplies automatically get a common denominator just like normal fractional arithmetic. in fact this adjustment is similar to how floating point must deal with things, and why it can handle large or small numbers but can have inconsistent precision. you can see in this example the lower precision number just gets zeros padded to its fractional portion. but don't worry about how floating point deals with things, i only mention it as a tidbit and since you may be adding/subtracting numbers in different fixed point formats.
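the alignment step for addition could look something like this in C. add_q15_q6 is a made-up name and the two formats (shift of 15 and shift of 6) are just the ones from the example above:

```c
#include <stdint.h>

/* hypothetical helper: add a value stored with a shift of 15 to one
   stored with a shift of 6. the lower precision value is scaled up by
   2^(15-6) = 2^9 first, which pads its fractional part with zeros.
   a multiply is used instead of << because left shifting a negative
   signed value is undefined in C. the caller has to make sure the
   scaled value still fits in 32 bits. */
int32_t add_q15_q6(int32_t a_q15, int32_t b_q6)
{
    int32_t b_q15 = b_q6 * (1 << (15 - 6));  /* now both use a shift of 15 */
    return a_q15 + b_q15;
}
```

so add_q15_q6(20<<15, 20<<6) gives 40<<15, ie 40 in the higher precision format.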