Do we still need lookup tables for sin/cos?

Started by November 10, 2000 02:05 AM
21 comments, last by Quantum 24 years, 2 months ago
Anyhow, to come out fighting in favour of NOT using tables for sin/cos, my wonderful MSDN reference says this about sin/cos/tan/sqrt/atan/atan2/log/log10/exp for MSVC6:

"The floating-point functions listed below have true intrinsic forms when you specify both the /Oi and /Og compiler options (or any option that includes /Og: /Ox, /O1, and /O2):"

Remember the nasty effects that an 8 KB table might have on a small L1 data cache (8 or 16 KB). This is not the same as textures because, unless you're editing them, they should only be in RAM or the video card's memory.

Interesting thread. I'd like to point out something...

1- The explanation given of the sin/cos functions looks to me like some kind of Taylor series approximation formula. Sin/cos are geometric functions and their explanation should be geometric: the x/y position of a point at angle theta on a circle of radius 1 centred at (0,0).

2- Could the cache be blown by using LUTs? Since the data cache is built on the principles of both spatial and temporal locality, LUTs fit this scheme perfectly.
If you need 10 FSIN/FCOS per frame, go for FSIN/FCOS;
when you need massive sin/cos calculations (real-time filters and things like that), a LUT implicitly ensures that the data you need sits in a small (and sequential) space and that you'll access it multiple times in the same loop.

3- As far as I know there's no official timing document for the Pentium II (or newer) from Intel. FSIN/FCOS takes 18-124 clock cycles on a non-MMX Pentium. The instruction isn't pairable at all. Assuming Intel improved the P2/P3/P4 FPU, I suppose they could have made it pairable, nothing more. If you can have a virtual TOS control inside your pipeline, pairing an 18-124 cycle instruction should be possible under certain conditions. Coming back to the formula posted by Magmai Kai Holmlor and civguy, I suppose the FSIN/FCOS operations are implemented as microcode in the Pentium processor using this algorithm. Maybe the wide gap between best and worst case is due to the microcode checking for special cases in which it can stop after a few iterations. FADD and FMUL take 3/1 clock cycles and they're pairable. Probably the execution time for those instructions is still the same, even on P3s.
We are back at the starting point: 18-124 cycles because of the use of FMUL/FADD (which are already highly optimized) in the microcode.
With SIMD-oriented extensions like MMX already part of the standard 80x86 instruction set, I'd not call an 18-124 cycle instruction an "on the fly" one...

Do you have a doc with execution times for P2/P3, or do you know a book that covers them? Thank you.
sin/cos are more than likely converted to several microops by the decoder in the PII/PIII, and these microops are able to execute in parallel with other instructions.
