
How does a 64-bit OS affect performance of floats vs doubles?

Started by August 21, 2008 11:57 AM
10 comments, last by Chadivision 16 years, 2 months ago
I'm in the VERY early stages of planning for a realtime audio synthesis application. It's going to be a native Windows application, written in C++, that uses DirectX for audio output, but all of the signal generation and processing will be done with my own code (rather than using DirectX to generate effects, etc.).

I wrote a non-realtime synthesis program about ten years ago, and I used floats to represent the audio signals (which I later converted to 8- or 16-bit samples that I could write into a wave file). Floats work fine for many types of synth modules, but anything that performs a lot of operations on the same data (a complex filter or a reverb, for example) could definitely benefit from the extra precision of a double. This time around, I'm thinking of using doubles, instead of floats, for all audio signals.

Right now I'm running Windows XP (a 32-bit platform), but this program is going to be a very long-term project, and I'm sure that it will mainly be used on 64-bit operating systems in the future. I don't know much about operating systems, so here's my question: in the not-too-distant future, 32-bit operating systems are going to be far less common than they are now. How should this affect my decision on whether to use floats or doubles? Obviously, if I use floats the program will use less memory, and it seems to me that it should run faster (am I right about that?), but I really think I'm going to want the precision of doubles. Will a 64-bit operating system be much more efficient at doing math on a 64-bit double? It seems to make sense to me, but I really don't know much about operating systems, chips, etc.

Sorry for the long post. I just wanted to make sure that I gave all the details of the project. Thanks.
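To show the kind of thing I mean about repeated operations, here's a quick sketch I tried (just accumulating a small value over and over, the way a filter feedback path would):

```cpp
#include <cstdio>

int main()
{
    // Accumulate 0.0001 one million times; the exact answer is 100.
    float  f = 0.0f;
    double d = 0.0;
    for (int i = 0; i < 1000000; ++i) {
        f += 0.0001f;  // 1e-4 isn't exactly representable in binary
        d += 0.0001;
    }
    // The float total drifts visibly off 100; the double barely moves.
    std::printf("float:  %f\ndouble: %f\n", f, d);
    return 0;
}
```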
Atari 2600 Homebrew Author - Someone who is smart enough to learn 6502 assembly language but dumb enough to actually want to use it.
64-bit primarily affects pointer sizes, and integer types to a lesser extent. The FPU is a somewhat separate entity, so 32-bit vs 64-bit won't really make much of a difference either way.
SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.
If you're not sending your doubles to a 3D accelerator, you can probably use doubles for your signal processing math without *much* of a performance penalty.

Memory load/store is the only thing I know of that might affect you. If you're heavily into SSE, you can't pack as many doubles into one operation (two per 128-bit register instead of four floats), but I'm not really that knowledgeable about SSE instructions yet.

You should get the same performance on a 32- or 64-bit OS, since the FPU, MMX, and SSE instructions haven't changed in EM64T as far as I know...
Thanks for the feedback. I think I probably will use doubles for the audio code. I may or may not be using 3D acceleration (right now I'm mostly just trying to figure out the design of the audio engine itself--I haven't really thought too much about the UI yet), but if I do I'll definitely be using floats for that.

I probably will write a math library that uses SSE, though that's a topic that I haven't really learned too much about yet--aside from just a basic understanding of what it is.
Atari 2600 Homebrew Author - Someone who is smart enough to learn 6502 assembly language but dumb enough to actually want to use it.
Quote: I think I probably will use doubles for the audio code. I may or may not be using 3D acceleration (right now I'm mostly just trying to figure out the design of the audio engine itself--I haven't really thought too much about the UI yet),


You could actually use a GPU to accelerate signal processing, though I'm not sure how well that would work with your realtime requirement (you'd get plenty of throughput, but the latency may be too high). Check out CUDA if you're interested, though you need a recent GeForce to use it. If you don't have one, you can use DirectX graphics with vertex and pixel shaders instead; you just don't actually display anything to the screen. Check out GPGPU for more info.

As for doubles vs floats, I believe the FPU on a modern processor will process both just as quickly; the problem is that you'll be using twice the storage space, so you may end up with more cache misses. However, with the kind of data processing you're doing you'll have very good spatial locality, so this may not be too much of a problem. If you write your code in such a way that changing between double and float is easy (e.g. use typedefs and don't assume a particular variable is a double or float anywhere), then you can profile a version of the code using floats against one using doubles and see if there's much of a performance difference. Something like the sketch below would do it.
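A minimal sketch of that typedef approach (the sample_t name and AUDIO_USE_DOUBLE switch are mine, not anything standard):

```cpp
// sample.h -- hypothetical header: the whole engine's precision lives here.
#ifndef SAMPLE_H
#define SAMPLE_H

// Define AUDIO_USE_DOUBLE in the build settings (or flip the typedef)
// and rebuild to profile a double build against a float build.
#ifdef AUDIO_USE_DOUBLE
typedef double sample_t;
#else
typedef float sample_t;
#endif

#endif // SAMPLE_H

// Example module code written against sample_t rather than float or double:
void apply_gain(sample_t* buffer, int count, sample_t gain)
{
    for (int i = 0; i < count; ++i)
        buffer[i] *= gain;
}
```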
I guess using typedefs is probably the best way to go for right now...keep my functions generic until I have enough code written to profile it and see which way to go. That would also give me the flexibility to easily change it in the future.

It probably makes sense to wrap any <math.h> function calls with my own inline functions (see the sketch below). That way I can get the code up and running without having to create an entire math library from scratch. Then I can profile it and only optimize the functions that are causing performance problems. I know that optimizing too early can cause all kinds of problems, but so can charging ahead without a plan. I'm trying to find a balance between the two.
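Something like this is what I have in mind (just a sketch; the mymath names are placeholders, not an existing library):

```cpp
#include <cmath>

// mymath.h -- start every wrapper as a plain pass-through to <cmath>;
// if profiling later shows one is a hotspot, only its body has to change.
namespace mymath {

inline double sin(double x) { return std::sin(x); }
inline double cos(double x) { return std::cos(x); }
inline double exp(double x) { return std::exp(x); }
inline double pow(double b, double e) { return std::pow(b, e); }

} // namespace mymath
```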

I like the idea of using the GPU to do the processing. I think I'll look into that a little more, but I might end up building the user interface on DirectX, so my application may end up needing all of the graphics card's power to do graphics. I guess I could have the audio engine support a few different options and have the app choose between them at startup.

So many choices! Sometimes I miss writing BASIC on my Atari 400. That was a simpler time.
Atari 2600 Homebrew Author - Someone who is smart enough to learn 6502 assembly language but dumb enough to actually want to use it.
Quote: Original post by Chadivision
It probably makes sense to wrap any <math.h> function calls with my own inline functions. That way I can get the code up and running without having to create an entire math library from scratch. Then I can profile it and only optimize the functions that are causing performance problems.

I only skimmed this thread, so perhaps I missed something obvious, but why on Earth would you do that?
First, why would you need to create a math library from scratch? Isn't that what math.h provides? (oh, and you should use the cmath header instead, in C++)
Why the need to wrap it at all?
And second, what makes you think you're able to optimize the standard library math functions? They're already pretty efficient, and on MSVC at least, most of them map directly to compiler intrinsics, so they won't even be compiled as function calls, but directly mapped to assembler instructions.
I could be wrong, but I was under the impression that certain math functions can be rewritten using SSE to get better performance. What I was thinking was to wrap the math functions so that, after profiling and determining where the performance issues are, I can rewrite a wrapper using SSE (instead of having it just call the standard function). That way I could optimize the function without having to track down and change every single function call in all of my code.

But I really don't know that much about SSE at this point, so I could be way off base here.
Atari 2600 Homebrew Author - Someone who is smart enough to learn 6502 assembly language but dumb enough to actually want to use it.
Quote: I could be wrong, but I was under the impression that certain math functions can be rewritten using SSE in order to get better performance


You can get better performance by using SSE, but that won't come from simply replacing calls to basic maths functions with ones that use SSE (chances are the versions your compiler uses end up as inline SSE anyway). To gain extra performance from SSE you'd need to rewrite the algorithms that are calling the basic maths functions to use SSE, as in the sketch below.
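To illustrate the difference (a rough sketch, assuming a float build, a 16-byte-aligned buffer, and a sample count that's a multiple of 4): rather than swapping sin() for an "SSE sin()", you vectorize the loop itself, e.g. a gain stage processing four samples per instruction:

```cpp
#include <xmmintrin.h>  // SSE intrinsics

void apply_gain_sse(float* buffer, int count, float gain)
{
    __m128 g = _mm_set1_ps(gain);            // broadcast gain into all 4 lanes
    for (int i = 0; i < count; i += 4) {
        __m128 s = _mm_load_ps(buffer + i);  // load 4 samples (aligned)
        s = _mm_mul_ps(s, g);                // multiply all 4 at once
        _mm_store_ps(buffer + i, s);         // write them back
    }
}
```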
That makes sense. Thanks for clearing that up.

Also, the more I think about it, the more I lean toward using a mixture of floats and doubles. Even if I get comparable speeds, going to all doubles still doubles the amount of memory, which would limit the number of synthesis modules that could be used in a patch.

There are some types of modules that could benefit from using doubles, but most of them would only need to use doubles internally and could then convert the output to a float. Or I could write two versions of some modules--one that outputs a float and one that outputs a double.
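For instance, a filter module could keep its feedback state in double precision but take and return floats at its edges; something like this sketch (the names are just placeholders):

```cpp
// One-pole lowpass: y[n] = a*x[n] + (1 - a)*y[n-1]
class OnePoleLP
{
public:
    explicit OnePoleLP(double coeff) : a(coeff), z(0.0) {}

    float process(float in)
    {
        // Accumulate in double so the feedback path doesn't
        // collect single-precision rounding error over time.
        z = a * in + (1.0 - a) * z;
        return static_cast<float>(z);  // narrow only at the output
    }

private:
    double a;  // smoothing coefficient, 0 < a <= 1
    double z;  // filter state, kept in double
};
```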

So I think I'll design a flexible system that can handle both and then let the user decide which is appropriate for the situation.
Atari 2600 Homebrew Author - Someone who is smart enough to learn 6502 assembly language but dumb enough to actually want to use it.

This topic is closed to new replies.
