Support the open source community
sqrt() slow? and Shortest distance, 2 pts in C
Hello, I was just wondering how one very talented physics guru might go about creating a macro/function to find the shortest distance between 2 points (I''d like it to return float):
r^2=(x1-x2)^2+(y1-y2)^2
I''ve heard that standard c''s exp() and sqrt() functions are dead slow and should be avoided when programming games. I have to perform about a combined 35 sqrt''s and exp''s every frame. will this be a problem?
Ciphersoftware website
Ciphersoftware website
Support the open source community
Well, I'm certainly no physics guru, but here goes:
Since you know that the exponent is always 2 you can easily avoid exp() by observing that X2 = X*X. so you get
r2 = (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2)
sqrt is more difficult to avoid. If you don't actually need the distance, but only want to compare two distances, then compare the squares of the distances. That way you don't need the sqrt. Of course, if you really need the distance, then I guess you're screwed![](wink.gif)
There might be more efficient ways than sqrt() to calculate the square root though, especially if you can tolerate less accuracy.
Edited by - Dactylos on October 19, 2001 2:48:31 AM
Since you know that the exponent is always 2 you can easily avoid exp() by observing that X2 = X*X. so you get
r2 = (x1-x2)*(x1-x2) + (y1-y2)*(y1-y2)
sqrt is more difficult to avoid. If you don't actually need the distance, but only want to compare two distances, then compare the squares of the distances. That way you don't need the sqrt. Of course, if you really need the distance, then I guess you're screwed
![](wink.gif)
There might be more efficient ways than sqrt() to calculate the square root though, especially if you can tolerate less accuracy.
Edited by - Dactylos on October 19, 2001 2:48:31 AM
35 squareroots is not much at all and is perfectly acceptable on nowadays computers. (Do you have any idea how many squareroots 3D games use?) However, sqrt is indeed not such a fast way to compute squareroots. For such a small number of squareroots it might not matter, but for higher numbers you''ll need an alternative.
davepermen posted a nice templated fast squareroot function for 32-bit floats on the Flipcode forum: http://www.flipcode.com/cgi-bin/msg.cgi?showThread=00003300&forum=3dtheory&id=-1
The beauty of it is that you can specify how accurate you want it to be. If it''s to slow, lower the accuracy. Fast but not accurate enough, raise the accuracy.
I''d ask him if you may code his code if I were you.
davepermen posted a nice templated fast squareroot function for 32-bit floats on the Flipcode forum: http://www.flipcode.com/cgi-bin/msg.cgi?showThread=00003300&forum=3dtheory&id=-1
The beauty of it is that you can specify how accurate you want it to be. If it''s to slow, lower the accuracy. Fast but not accurate enough, raise the accuracy.
I''d ask him if you may code his code if I were you.
![](smile.gif)
Dirk =[Scarab]= Gerrits
October 19, 2001 10:19 AM
Hi,
The squareroot found at flicode is more or less standard newton-raphson-iteration for square roots, I think, where in that case, the number of iterations is not fixed. A similar implementation can be found in the Quake III Tools sources, function rsqrt. I don''t think that precision is critical in games (in geometry-preprocessing steps, however, it can be critical). I''ve compared several algorithms on my machine once. The fastest and most precise was the AMD Math Lib''s sqrt, but it processor- and platform specific, and therefore not well suited.
Another one is simply using the FPU:
__asm {
fld [floatvalue]
fsqrt
fstp [somefloat]}
Thats very simple, comparetively precise, and a bit faster than Quake''s reciprocal square root.
Standard C''s sqrt was a bit slower, but in the end, the difference wasn''t big.
I''ve tested that on an AMD Athlon 600MHz. You might want to run your own tests.
However, keep in mind that 1000 standard sqrt''s in DEBUG mode on such a machine take less than 0.000080 seconds...
The squareroot found at flicode is more or less standard newton-raphson-iteration for square roots, I think, where in that case, the number of iterations is not fixed. A similar implementation can be found in the Quake III Tools sources, function rsqrt. I don''t think that precision is critical in games (in geometry-preprocessing steps, however, it can be critical). I''ve compared several algorithms on my machine once. The fastest and most precise was the AMD Math Lib''s sqrt, but it processor- and platform specific, and therefore not well suited.
Another one is simply using the FPU:
__asm {
fld [floatvalue]
fsqrt
fstp [somefloat]}
Thats very simple, comparetively precise, and a bit faster than Quake''s reciprocal square root.
Standard C''s sqrt was a bit slower, but in the end, the difference wasn''t big.
I''ve tested that on an AMD Athlon 600MHz. You might want to run your own tests.
However, keep in mind that 1000 standard sqrt''s in DEBUG mode on such a machine take less than 0.000080 seconds...
If you only want an approx distance between two points, the following works well providing the points are too close (<10 units).
|
October 20, 2001 10:11 AM
Is that fsqrt portable? I mean, do all processor types support it?
If all you have to do is sort objects by distance, you can take out the square root and it will still work. The curve won''t be the same, but objects with greater distances will still have greated "pseudo distances" (that is, without the square root) because, even though the difference in distances between two objects won''t be correct, the fact that a given distance is greater or less than another will be.
A word of warning, cipher.
Dont worry about speed of the code while implementing something for the first time.
First get it implemented.
If its slow, then do some research on better algorithms.
If you are using the fastest known algorithm for the problem only then start optimizing the code.
And even then, first profile, and then optimize the code that needs to be optimized.
Those 35 sqrts per frame are like nothing. In fact, you should worry about speed of the standard C functions only when you are executing them like million times per frame.
Dont worry about speed of the code while implementing something for the first time.
First get it implemented.
If its slow, then do some research on better algorithms.
If you are using the fastest known algorithm for the problem only then start optimizing the code.
And even then, first profile, and then optimize the code that needs to be optimized.
Those 35 sqrts per frame are like nothing. In fact, you should worry about speed of the standard C functions only when you are executing them like million times per frame.
Beelzebub: Won''t the conditional testing in your hypotq() stall the processor?
With good code generation it shouldn''t since there''s an assembler/compiler optimization trick for doing the abs() without any jumps, something along the lines of...
memorys a bit foggy, but I think that''s right - if you imagine temp and value are both CPU registers.
Anyway, back in the old days (when div''s took 50+ cycles), a processor stall was faster.
|
memorys a bit foggy, but I think that''s right - if you imagine temp and value are both CPU registers.
Anyway, back in the old days (when div''s took 50+ cycles), a processor stall was faster.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement
Recommended Tutorials
Advertisement