Fast Trig function

Started by February 08, 2002 06:10 PM
47 comments, last by dragon376 23 years ago
TO ECKILLER:
I said that the sin/cos functions are slow, right? Do you think I am just guessing that, or do you think I tested it? I know where my bottleneck is; that is why I am trying to improve it.
My question was: "Anybody has a trick to get fast trig function", which basically means "what are the optimization methods?"
I didn't need your ass to tell me "look into optimization methods for them", since that is my question.
So either help by telling me where I can find this info, or don't post an answer.
You obviously think you know everything about programming and that people who ask questions are morons who haven't thought before posting. But your answer shows that you obviously haven't written a full 3D engine, because then you would know how often cos and sin are called every frame.

TO EVERYONE ELSE:
Thanks, all your answers were very useful

if you are currently using and you want it slightly faster, use instead (if your compiler has it). It is the "math" library without any error checking.



Beer - the love catalyst
good ol' homepage
I saw a post about this a couple of weeks ago, and what makes using the processor's instructions better than a table is that they don't soak up the cache.
But since you'd probably be doing most transformations all in a row, the table should stay in the cache, which would of course make it lightning fast compared to the processor's trig instructions.
I'd say try both in your actual engine and see which works better.
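
For anyone who wants to try the table side of that comparison, here is a rough sketch of a sine lookup table. The class name, the 4096-entry default and the nearest-entry rounding are all my own assumptions, not something from the earlier posts; a real engine might prefer a smaller table or interpolation.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

constexpr float kTwoPi = 6.28318530717958647692f;

// Hypothetical helper (not from this thread): a sine table with a
// power-of-two number of entries so the index can be wrapped with a mask.
class SinTable {
public:
    explicit SinTable(std::size_t bits = 12)
        : mask_((std::size_t{1} << bits) - 1),
          scale_(static_cast<float>(std::size_t{1} << bits) / kTwoPi),
          table_(std::size_t{1} << bits) {
        // Entry i holds sin(i * 2*pi / N).
        for (std::size_t i = 0; i < table_.size(); ++i) {
            table_[i] = std::sin(static_cast<float>(i) / scale_);
        }
    }

    // Nearest-entry lookup; the angle is in radians.
    float sin(float radians) const {
        // Round to the nearest slot, then wrap into [0, N) with the mask.
        // The unsigned conversion makes negative angles wrap correctly.
        long idx = std::lround(radians * scale_);
        return table_[static_cast<std::size_t>(idx) & mask_];
    }

    // cos(x) = sin(x + pi/2), so one table serves both.
    float cos(float radians) const { return sin(radians + kTwoPi / 4.0f); }

private:
    std::size_t mask_;
    float scale_;
    std::vector<float> table_;
};
```

With 4096 entries the table is 16 KB of floats and the result is only good to roughly three decimal places, so whether it beats the FPU depends on how much else is competing for the cache. Time it against plain sin/cos inside the real transform loop, not in an empty benchmark.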



-Deku-chan

DK Art (my site, which has little programming-related stuff on it, but you should go anyway^_^)
Firstly, I apologize that you did not like my reply. Sometimes people ask how to optimize their code when they don't even need optimization.

Before we get into optimization methods, what are you trying to accomplish? You must be calling these trig functions very often for them to be your application's bottleneck. Either that or you are developing on a 486. There may be a non-trig method to solve your problem.

ECKILLER
You might want to try google. Google is what they call a search engine. A search for "trig function optimizations" yielded 889 results. Let me know if you have trouble using the search engine.

ECKILLER
TO ECKILLER:
Oh, I apologize. I thought that you were being sarcastic in your first answer, but apparently it is brain damage that you have; you don't know how to answer any other way.
FSINCOS can be a handy optimisation. Here's a function for it (it requires radians).

Intel version (untested):
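  inline void SinCos(float Angle, float *SinAns, float *CosAns) {
      __asm {
          MOV EAX, SinAns         // load the output pointers into registers;
          MOV EDX, CosAns         // FSTP [CosAns] would overwrite the pointer itself
          FLD Angle               // push the angle (radians)
          FSINCOS                 // ST(0) = cos, ST(1) = sin
          FSTP DWORD PTR [EDX]    // pop cos into *CosAns
          FSTP DWORD PTR [EAX]    // pop sin into *SinAns
          FWAIT
      }
  }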


AT&T version (tested under g++):
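  inline void SinCos(float Angle, float *SinAns, float *CosAns) {
      __asm__ __volatile__ (
          "FLD %0;"         // AT&T version
          "FSINCOS;"        // ST(0) = cos, ST(1) = sin
          "FSTP (%2);"      // pop cos into *CosAns
          "FSTP (%1);"      // pop sin into *SinAns
          "FWAIT;"
          : : "m" (Angle), "r" (SinAns), "r" (CosAns)
          : "memory");      // the asm writes through both pointers
  }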
Insulting people because they didn't give you the answer you wanted won't get you anywhere. Anyway, I won't be checking back on this thread, so insult me all you want.

ECKILLER
Regarding Beer-Hunter's post (still a cool name): I think under Win32, one of the Microsoft headers has an FSINCOS-type function in it. I forget the name, but it is faster whenever you have to calculate both sin and cos (or use them for tan).
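
To show where a combined sin/cos call pays off, here is a small sketch of a 2D rotation that needs both values for the same angle. Since I can't name the Microsoft function, `sin_cos` below is a made-up portable stand-in built on the standard `std::sin` and `std::cos`; all the struct and function names are hypothetical. The point is only the shape: get the pair once per angle, then reuse it.

```cpp
#include <cmath>

// Hypothetical names, not from any Microsoft header.
struct SinCosPair {
    float s;  // sine of the angle
    float c;  // cosine of the angle
};

// Portable stand-in for an FSINCOS-style routine: one call, both results.
// An FSINCOS-based version could be swapped in behind the same signature.
inline SinCosPair sin_cos(float radians) {
    return {std::sin(radians), std::cos(radians)};
}

struct Vec2 {
    float x;
    float y;
};

// Rotate v counter-clockwise using a pair computed once for the angle.
inline Vec2 rotate(Vec2 v, SinCosPair sc) {
    return {v.x * sc.c - v.y * sc.s, v.x * sc.s + v.y * sc.c};
}

// tan from the same pair; the caller must make sure sc.c is not zero.
inline float tan_from(SinCosPair sc) {
    return sc.s / sc.c;
}
```

If a whole batch of vertices shares one angle, call `sin_cos` once outside the loop and pass the pair to `rotate` for each vertex, so the trig cost no longer scales with the vertex count.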



Beer - the love catalyst
good ol' homepage
quote:
originally posted by jenova
i can speak for one piece of hardware where it takes "27 cycles" for "sin" ; and this is one instruction. please, don't tell me it takes "27 cycles" to access memory.


27 cycles to access memory is a conservative figure for PCs, assuming the access results in a cache miss. First you have to deal with the cache latency. Then, if it is a miss, the address is sent to the North Bridge over the FSB, which runs at a fraction of the processor speed. Then you deal with contention latency: if RAM is being accessed over the AGP or PCI bus, the processor's request is queued. Then the access is sent to RAM, where you have RAM access latency. Then a whole cache line has to be transferred (if the page is cacheable), and there is your data transfer latency. The whole process can easily take 100+ processor cycles, depending on the CPU's clock multiplier. If you want numbers, ask and I'll post them.
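
The reason the clock multiplier matters is that everything past the cache happens in bus clocks, and each bus clock costs several CPU clocks. A tiny sketch of that arithmetic follows; the struct, its field names and any values fed into it are illustrative assumptions, not measurements.

```cpp
// Illustrative only: the names and the split into fields are assumptions.
struct MissCost {
    unsigned cache_lookup_cpu_cycles;  // CPU cycles spent finding out it is a miss
    unsigned bus_cycles;               // FSB + contention + RAM + line fill, in bus clocks
    unsigned clock_multiplier;         // CPU clock divided by bus clock
};

// Convert the bus-side work into CPU cycles and add the cache lookup.
constexpr unsigned total_cpu_cycles(MissCost m) {
    return m.cache_lookup_cpu_cycles + m.bus_cycles * m.clock_multiplier;
}
```

With made-up inputs of 3 CPU cycles for the lookup, 12 bus cycles and a multiplier of 10, `total_cpu_cycles` returns 123, so even a modest number of bus cycles lands past the 100-cycle mark.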

Oh and jenova I forgot to post this on your floating point thread, you need to subtract 897 from the exponent not 896.

-potential energy is easily made kinetic-


This topic is closed to new replies.
