Fast Trig function
Does anybody have a trick for getting fast trig functions?
I tried using a precalculated array covering 360 degrees, but for many applications rounding to the nearest degree isn't precise enough. And the C functions sin() and cos() are kind of slow.
quote:
Original post by YourOtherLeft
You could always do them in asm if u can b bothered....
What do you think C function calls do?
Even asm sin/cos are SLOW (~50 cycles IIRC)
You can implement sin and cos as a lookup table holding the values for the first quadrant, and use the appropriate trig transforms for the other quadrants.
Optionally, add linear interpolation for values that fall in between two table entries, but if it is just to implement rotations & such, 256 entries (for 1/4 circle, remember), is usually more than enough.
"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." — Brian W. Kernighan
The easiest way, as Fruny said, is to use a lookup table whose size is a power of 2 (usually 256 entries), because that lets you replace the modulus that maps the angle into 0-255 with a single bitwise AND.
I know that I don't know nothing... Operation Ivy
For most modern processors, calling sin or cos is about as fast as, or even faster than, accessing RAM (i.e. your array).
A lot of CPUs have special instructions to handle floating point values. Lookup tables are beginning to lose their value as computers get faster, at least when you're talking about minor calculations like blending operations or trig functions.
To get a complete set of values from the unit circle you only need to calculate the cosine of those in quadrant 1. Just like Fruny said, apply transformations to get the values you want.
cos(0) = sin(90) | cos(180) = -cos(0)
256 values for 1 quadrant means 1024 values for the whole circle. That's a heck of a lot.
I would not interpolate between values of your lookup table. This is going to be really slow. Robust but slow. Making a memory access and then doing more calculations is a lot less efficient than just calling cos(). I've never actually profiled this sort of thing, so it's possible I'm wrong and Fruny is on to something. (Trying not to step on people's toes.) My gut feeling says no, though.
You say the sin / cos functions are slow, but are they slow enough that your application won't function as intended? Most likely not. Your bottleneck will probably be somewhere else, which means that even if you implement faster trig functions, your application will only go as fast as its slowest link.
If the trig functions are your bottleneck then look into optimization methods for them.
ECKILLER
"For most modern processors calling sin or cos is about the same or even faster than accessing into RAM (i.e. into your array)"
Do you even know how many clock ticks the C runtime cos/sin use, and how few a lookup table can use? If you call them often enough, cos and sin can be a performance drain (around 140 clock ticks, if I remember correctly).
there is no f'in way it takes the same amount of time to access memory as it takes to calculate sin/cos.
i can speak for one piece of hardware where it takes "27 cycles" for "sin", and this is one instruction. please, don't tell me it takes "27 cycles" to access memory.
dragon376: if you need better precision, use finer steps than 1/360 of a circle for your lookup table. make the table size a power of 2, so you can shift instead of multiplying.
To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.
February 08, 2002 07:30 PM
The main problem is that most C/C++ compilers are still behind when it comes to taking full advantage of the latest CPUs' features. Using assembly language to take advantage of an FPU's trig functions will benefit you greatly.
Most modern CPUs use lookup tables to speed up trig functions nowadays anyway, so the need to create your own lookup tables is not really relevant anymore.
no need to speculate. i have documentation for a given piece of hardware that says "27 cycles". it does not take 27 cycles to access memory. therefore (today, on that piece of hardware), it's faster to use a lookup table than to use this instruction. i'm sure the x86 "fsin" instruction takes more time to execute than a lookup table.
AP: the "sin" or "cos" function in the "libc" library should be written to take advantage of hardware features.
NOTE: we can always time the code to be sure.
To the vast majority of mankind, nothing is more agreeable than to escape the need for mental exertion... To most people, nothing is more troublesome than the effort of thinking.
This topic is closed to new replies.