sin, cos, since when?
I would assume a lookup table is still faster; what changed, rather, is that the CPU now uses one internally, or at least a bigger one.
Keys to success: Ability, ambition and opportunity.
I'm pretty sure that even on a 50 MHz Amiga with an FPU, sin and cos were faster than a LUT. Not quite sure about that, though.
Perhaps when programming for mobile phones, LUTs would be good again for the moment.
First, use a small lookup table, say 16K or less. Then use it only in inner loops, where addressing the LUT is faster because it replaces several calculations and/or comparisons.
Or am I crazy?!?!?
Here's an idea I was giving someone for alpha blending: http://www.gamedev.net/community/forums/topic.asp?topic_id=120728
Was I wrong about a speed increase?! Even if the code Craazer posted had been better designed (removing the six function calls per pixel).
Yes, I'm old-school C mostly! But I'm working on it!
Check out Intel's site if you are interested in performance on Intel-based computers.
Some Links:
http://www.intel.com/design/PentiumII/manuals/24512701.pdf
See Page 2-21, Transcendental Functions
Mentions that software implementations will NEVER be as fast as the hardware implementations, unless accuracy is sacrificed.
http://www.intel.com/technology/itj/q41999/articles/art_5.htm
Talks about how the transcendental functions are computed and gives some latencies for double precision:
Function   Latency (cycles)   Max. Error (ulps)
cbrt             60                0.51
exp              60                0.51
ln               52                0.53
sin              70                0.51
cos              70                0.51
tan              72                0.51
atan             66                0.51
Also, here is an assembly snippet from an asm function I had lying around:

__asm {
    fld DWORD PTR [esp+4]
    fsin
    ret 4
}
Hopefully this helps.
Wow, how low-level can you go. Those are good sources for getting an idea of what it was like to optimize code for an Intel PII.
Well, cos/sin at 70 cycles... hmm? Versus addressing a double out of a 360-element lookup table... interesting.
Can anyone post some test results for this case?
Sorry, I can't. Don't know how to accurately determine cycles.
Crap. Way back in the day, it took a 486 257-354 cycles to perform FSIN or FCOS. Well, they have made quite an improvement over the years.
[edited by - CodeJunkie on October 28, 2002 8:16:59 PM]
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>

#ifndef PI
#define PI 3.141592654
#endif

#define ndegrees 100000

int main(){
    long int t0, t1, delLUT, delSIN;
    static double s1n[ndegrees];         /* static so the 800K table isn't on the stack */
    double d2r = PI/(ndegrees*0.5);
    double r2d = ndegrees*0.5/PI;

    for(int i = 0; i < ndegrees; i++){   /* fill the whole table, not just 360 entries */
        s1n[i] = sin(i*d2r);
    }

    volatile double val;                 /* volatile so the loops aren't optimized away */

    t0 = clock();
    for(int j = 0; j < 1000; j++){
        for(int i = 0; i < ndegrees; i++){
            val = s1n[i];
        }
    }
    t1 = clock();
    delLUT = t1 - t0;

    t0 = clock();
    for(int j = 0; j < 1000; j++){
        for(int i = 0; i < ndegrees; i++){
            val = sin(d2r*i);
        }
    }
    t1 = clock();
    delSIN = t1 - t0;

    printf("del time (millisecs) for LUT=%ld and for processor sin=%ld\n processor faster by %f\n",
           delLUT, delSIN, (float)delLUT/(float)delSIN);

    return 0;
}

Can somebody remind me how to post source? Surprisingly to me, LUTs are 6-8 times faster on my 1.2 GHz Duron with DDR memory.

[edited by - kindfluffysteve on October 28, 2002 8:52:17 PM]
Um, an integer-to-double conversion followed by a multiplication doesn't seem to me to accurately reflect what you would actually do in a program. It seems you should at least use a double for that loop.
Keys to success: Ability, ambition and opportunity.
First of all, in contrived examples like that the LUT will win out every time, as that is exactly what data caches are built to handle.
However, in real-world scenarios you do not want your 256K of precious L2 cache taken up by one lookup table. A cache miss is quickly becoming the most expensive operation on a CPU (if it isn't already), and when you have that massive sin table sitting around it will send your cache misses through the roof.
If you are building your app to target the new generation of processors (P3, Athlon, etc.), then sin/cos LUTs are generally more harm than good.
My advice: stay away from them, and if you are using VC++, enable intrinsic functions, which will change a sin(x) call from an actual function call into a straight fsin asm opcode.
I noticed I didn't really explain *why* things changed, however.
Basically, on the older generation of computers (386, 486, etc.), calculating a sin was extremely slow (on the order of hundreds of cycles), and a cache miss wasn't that painful, as execution speeds were still very close to the overall memory speed.
However, in more modern systems RAM is much slower than the processor, and having to go outside the processor's cache (L1 or L2) is a very expensive operation that is to be avoided at all costs.
So, in the old days a 64K sin table was great, as it gave you nearly free sins compared to the dog-slow FPU operations (if you even had an FPU). Slowly, however, processors became much faster than memory, and so caches started gaining in importance. Suddenly that 64K tradeoff didn't look so great when your fsin operation only took 35 cycles or so. And on newer processors it's gotten even better, with on-die lookup tables for the most frequently used math operations (which are nigh on impossible to beat).
As to when the change took place, it was a gradual shift (that is still happening) that started 4 or 5 years ago, IIRC.
But how many clocks does a stall cost? 60-70 clock cycles is a huge amount; a data lookup takes two clocks.
~CGameProgrammer( );