Advertisement

trig precision question for certified math genius

Started by July 19, 2012 01:43 AM
0 comments, last by alvaro 12 years, 7 months ago
[font=georgia,serif]Recenly I wrote some[font=courier new,courier,monospace] f64vec2 [/font]and[font=courier new,courier,monospace] f64vec4 [/font]sine and cosine routines in SIMD/AVX/FMA assembly-language. For example, the two "fundamental" functions take a 4-element vector containing 4 angles and return the sine or cosine of those 4 angles in a second[font=courier new,courier,monospace] f64vec4[/font]. Essentially...[/font]

[font=courier new,courier,monospace]error = sine (f64vec4* result, const f64vec4* angle);
error = cosine (f64vec4* result, const f64vec4* angle);[/font]


[font=georgia,serif]The routines compute all 4 elements simultaneously and in parallel with SIMD/AVX/FMA instructions. The more generalized function looks like this:[/font]

[font=courier new,courier,monospace]error = sincos (f64vec4 result, const f64vec4* angle, int select);[/font]

[font=georgia,serif]Where the low 4-bits of the[font=courier new,courier,monospace] select [/font]argument specify whether the sine or cosine is desired for each of the[font=courier new,courier,monospace] f64 [/font]elements in the angle argument. And yes, believe it or not, the routine computes any arbitrary combination of sine and cosine, and it computes them all simultaneously and in parallel. I didn't even realize that was possible (efficiently) until I wrote these routines, but mostly thanks to the[font=courier new,courier,monospace] vcmppd [/font]and[font=courier new,courier,monospace] vpcmov [/font]instructions, they are![/font]

[font=georgia,serif]Anyway, my problem is testing the results! I have several versions of these functions based upon 7, 8, 9, 10 and 11 coefficient chebyshev polynomials. Obviously the fewer terms the faster they run... though the difference is strangely tiny. My problem is this. I need to know what errors are produced to make a final decision on which routines to "keep" and make standard.[/font]

[font=georgia,serif]I can't trust the results of the FPU sine and cosine instructions or fancy math libraries to be precise to the final bit... partly because I don't trust anyone or anything, but partly because my results imply they are not precise. For example, my 9, 10, 11 coefficient routines are often strangely different than those "standard" results by several bits, but strangely similar to each other. Hmmmm. And I learned that some of those "standard" computations are done with 7 and 8 coefficient chebyshev routines, which should be on the low end of the precision scale. My suspicion is further reinforced by observing those "standard" computations give results much closer to my 7 and 8 coefficient routines than my 9, 10, 11 coefficient routines.[/font]

[font=georgia,serif]But the above is purely "reasonable inference" and "educated guesswork" on my part. I need someone to tell me how to create a "gold routine" to test ALL of them against. I'm plenty math-savvy to take an existing equation (like the chebyshev equations for sine and cosine) plus a table of coefficients --- and make working routines. I am NOT math-savvy enough to figure out how to make "gold standard" sine and cosine routines to verify the precision of all these "standard/common" routines, and my routines.[/font]

[font=georgia,serif]So, is anyone out there enough of a math genius to face this challenge?[/font]
Hi. Math genius here. But you don't actually need one: You need someone that knows about the CORDIC algorithm, or about arbitrary-precision calculators, like Unix's `bc -l'. You can compute very precise sines and cosines that way.

You can also take the complex number z = cos(Pi/n)+i*sin(Pi/n), compute pow(z,n) and see how close you get to 1 with different implementations of sine and cosine.

EDIT: You can also enter something like cos(0.1) in Wolfram's Alpha and click on "more digits" to your heart's content.

ANOTHER EDIT: I think the FPU values are precise to the last bit. But it's OK to be a little paranoid.

This topic is closed to new replies.

Advertisement