Fast Trig function

Started by February 08, 2002 06:10 PM
47 comments, last by dragon376 23 years ago
quote:
Original post by Dredge-Master
Regarding Beer-Hunter's post (still a cool name), I think under Win32 one of the Microsoft headers has an FSINCOS-type function in it. I forget the name, but it is faster if you ever have to calculate both sin and cos (or use it for tan).

I know that Borland has a SinCos function, and MSVC might have one, but it's non-standard. g++ certainly doesn't have one. I gave the assembly version just to be sure.

For calculating tangents, I'm pretty sure there's an FPTAN instruction...
OK,

I did some benchmarks. I'm using Visual C++ 6.0, and I tested both Beer Hunter's asm version and the standard sin() function.

I did 1 million iterations, and looked at the optimized code to make sure nothing funny was happening that I was unaware of (the sin() version uses the fsin asm instruction).

System:
AMD 1.33GHz
512MB
WinXP Home

Results:
asm version: 146 cycles
sin version: 85 cycles

Make of that what you will. It seems the asm version is only useful if you want to calculate both the sin and cos values of an angle at the same time. Otherwise it's not worth the hassle.

BTW, these results could be different on a Pentium vs Athlon. Someone care to test it?

SS

Edited by - Axter on February 10, 2002 2:24:10 PM
Infinisearch: actually the "sin" instruction i was talking about isn't for a PC. it's not even Intel x86 based, and on the hardware i am speaking about i know it does not take that many cycles to access memory.

furthermore, before "dissing" my code logic, you might want to actually run the code first.

  float cvt_s_d(double d)
  {
    float            result;
    unsigned long    *dst, fixed, mantissa, sign;
    unsigned __int64 *src;

    if (d != 0.0) {
      src = (unsigned __int64 *)&d;

      /* extract the sign-bit. */
      sign = (unsigned long)((*src) >> 32);
      sign &= 0x80000000;

      /* calculate the mantissa. */
      mantissa = (unsigned long)(((*src) << 1) >> 53);
      mantissa = (((mantissa - 896) & 0xff) << 23);

      /* truncate the 52-bit fixed point value to 23-bit. */
      fixed = (unsigned long)((*src) >> 29);
      fixed &= 0x7fffff;

      /* compound the result. */
      dst = (unsigned long *)&result;
      (*dst) = (sign | mantissa | fixed);
    }
    else result = 0.0f;

    return (result);
  }


try calling this function and see the results. i tried it with "-897" and it yielded incorrect results.
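
For readers without MSVC, here is a hedged, portable sketch of the same bit manipulation. It is not the poster's code: the name cvt_s_d_portable is hypothetical, the fixed-width types from <stdint.h> replace unsigned long and unsigned __int64 (unsigned long is 64 bits wide on many non-Windows systems), and memcpy replaces the pointer casts. It assumes IEEE 754 double and float layouts.

```c
#include <stdint.h>
#include <string.h>

/* Portable variant of cvt_s_d (illustrative sketch, assumes IEEE 754).
   Like the original, it truncates instead of rounding and does not handle
   infinities, NaN, denormals, or values outside float's normal range. */
float cvt_s_d_portable(double d)
{
    uint64_t src;
    uint32_t sign, exponent, fraction, bits;
    float result;

    if (d == 0.0)
        return 0.0f;

    memcpy(&src, &d, sizeof src);                  /* raw bits of the double */

    /* the sign bit is the top bit of the upper 32-bit half */
    sign = (uint32_t)(src >> 32) & 0x80000000u;

    /* 11-bit biased exponent, rebiased from 1023 to 127 (1023 - 127 = 896) */
    exponent = (uint32_t)((src << 1) >> 53);
    exponent = ((exponent - 896u) & 0xffu) << 23;

    /* keep the top 23 of the 52 fraction bits */
    fraction = (uint32_t)(src >> 29) & 0x7fffffu;

    bits = sign | exponent | fraction;
    memcpy(&result, &bits, sizeof result);         /* raw bits to float */
    return result;
}
```

For inputs that a float can represent exactly, such as 1.0, 0.5, 3.25 or -897.0, this should return the same value as a plain (float) cast, which makes the cast a convenient reference when testing.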

To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.
Jenova, I know you weren't talking about the PC, but I figured I'd point this out in defense of Abstraction's statement: "For most modern processors calling sin or cos is about the same or even faster than accessing into RAM (i.e. into your array)". The simple fact is there are very few platforms where RAM speed is greater than or equal to processor speed, and even when it is, the latency is what counts.

As to me "dissing your code", I thought it didn't work (your edit)... it was merely a suggestion. But if you know of a free MIPS simulator that supports 64-bit registers, I will run your original assembly and a version I wrote. Right now I have PCSPIM and it only supports MIPS32 instructions/32-bit registers. I already wrote one version of your original assembly but I wanted to work on it some more... I think conditional moves could be used to make it faster. But when I realized I couldn't test it...

-potential energy is easily made kinetic-


quote:
Original post by Axter
It seems the asm version is only useful if you want to calculate both the sin and cos values of an angle at the same time. Otherwise it's not worth the hassle.


What did you expect?

Someone mentioned msvc having a SinCos function (or something) of its own. Perhaps you could compare that one to my version as well?
Infinisearch: sorry, i didn't mean to be so offensive. unfortunately i don't know of any MIPS 64-bit simulators/emulators.

To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.
Might be stating the obvious here, but if you do construct lookup tables, don't create separate SIN and COS tables; just make the table cover 360+90 degrees, i.e. instead of 512 entries make it 640 entries. The cosine table starts at table + 128.

BTW, isn't SIN/COS functionality provided by the MMX instructions nowadays?

Finally, I am amazed that people feel they have to be rude, patronising and belittling to each other during a discussion about SIN & COS implementation. Why?
quote:
Original post by Axter
It seems the asm version is only useful if you want to calculate both the sin and cos values of an angle at the same time. Otherwise it's not worth the hassle.



quote:
Original post by Beer Hunter
What did you expect?





I was merely stating this because it was implied (not necessarily by you) that some compilers might not use the latest features of the hardware. As far as sin, cos, etc. go, on VC++ at least, this does not seem to be a problem. So it's not necessary to implement an asm version.

A quick and dirty table lookup method I checked ran roughly twice as fast as the fsin call. But as was mentioned before, a large lookup table competes for the cache, so the gain might not be big enough to warrant the added effort, time, and loss in accuracy. What I'm saying is, the table ran faster, but in a real situation, where other parts of the program are also accessing memory, the additional cache misses would probably make the results worse than in the ideal test case that I did. You might only gain 10% or whatever, not worth the hassle.

SS


Edited by - Axter on February 11, 2002 1:15:00 PM
arrgggghhhh.
Instead of yadding on about different trig variations, why don't we check out how it is calculated?
I, unfortunately, cannot analytically calculate a trig function of an angle.
I'm certain there's someone on this board who can do the math out for any angle a, and find sin/cos/tan, etc.

Therefore, I suggest that this person weigh in with the mathematical algorithm, and we turn our attention to optimizing this algorithm.

OK?

Thank you,
~V'lion

Bugle4d
quote:
Original post by Vlion
arrgggghhhh.
Instead of yadding on about different trig variations, why don't we check out how it is calculated?
I, unfortunately, cannot analytically calculate a trig function of an angle.
I'm certain there's someone on this board who can do the math out for any angle a, and find sin/cos/tan, etc.

Therefore, I suggest that this person weigh in with the mathematical algorithm, and we turn our attention to optimizing this algorithm.



I can just about guarantee you now that any custom version of a trig function will not be faster than the hardware version. Why would they waste precious silicon on implementing hardware trig functions if it was possible to get faster results in software? It doesn't make sense.

My test shows the sin/cos function to be roughly 85 cycles, including the loop overhead (probably faster), so trying to implement a complex function like that in less than 85 cycles would be hard to do.

I can see what you mean about just trying it out, and I don't know what the formula for sin is, but I do know it's not trivial.

SS

This topic is closed to new replies.
