Interesting!! Very interesting....
I always thought that the first IF statement would execute faster than the last one. Well, I was wrong!
The first IF statement is slower than the last one. The last one is the fastest (don't know why).
Does anyone know the reason behind this?
E.g. (this is taken straight from my test code):
inline float Ticker::Elapsed(int which)
{
    if (which == 1) {
        QueryPerformanceCounter(&temp);
        return (temp.QuadPart - now.QuadPart) * res;
    }
    if (which == 2) {
        QueryPerformanceCounter(&temp);
        return (temp.QuadPart - now.QuadPart) * res;
    }
    if (which == 3) {
        QueryPerformanceCounter(&temp);
        return (temp.QuadPart - now.QuadPart) * res;
    }
    return 0.0;
}
Now, if you called Elapsed(3), this would execute faster than Elapsed(1) or Elapsed(2). Why? I don't know!!
sig u say?
Yes! There are kangaroos in Australia but I haven't seen them...yet
Are you ever calling it with a value other than three? If not, your compiler is probably optimizing the other two cases away altogether.
No, I used 1, 2 & 3.
Also, there was no difference when I used Debug mode (compared to Release mode).
Yes! There are kangaroos in Australia but I haven't seen them...yet
November 08, 2000 07:11 PM
quote: Original post by Xtreme
I always thought that the first IF statement would execute faster than the last one. Well, I was wrong!
The first IF statement is slower than the last one. The last one is the fastest (don't know why).
Does anyone know the reason behind this?
My guess, without seeing more of the code or the machine code, is branch prediction. To keep performance high, modern pipelined processors will predict which way a branch (that is, an "if" statement) will go and prepare the code for that.
If the chip's prediction is correct, then the cost of the branch instruction is zero or almost zero. However, if the branch prediction is wrong, the cost of the branch instruction will be no more (or possibly slightly more) than if the processor didn't use branch prediction.
Realize that without branch prediction (or when the guess is incorrect), things can get real slow. Why? The processors are pipelined; this means that while one instruction is executing, the next instruction is being decoded and the one following that is being loaded. There can be just a few stages or many, depending on the chip design.
When a branch is taken that the processor didn't expect, it has to empty the pipeline, since the wrong instructions are in it. It then restarts the pipeline at the right location, but it can take several pipeline stages to refill it again.
Anyway, to make a long story short, my guess is that in your particular instance, the first call is predicting incorrectly, but future guesses are correct and therefore faster.
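For illustration, here is a stand-alone toy (not the Ticker code from this thread; the array size and pass count are arbitrary) that usually shows branch prediction at work: summing only the "large" elements of an array tends to be much faster when the array is sorted, because the branch direction becomes easy to predict.
// Toy demonstration of branch prediction; not the original Ticker code.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <ctime>
#include <vector>

int main()
{
    std::vector<int> data(1000000);
    for (size_t i = 0; i < data.size(); ++i)
        data[i] = std::rand() % 256;

    // With this sort the branch below is almost perfectly predicted;
    // comment it out and the loop usually gets noticeably slower.
    std::sort(data.begin(), data.end());

    long long sum = 0;
    std::clock_t start = std::clock();
    for (int pass = 0; pass < 100; ++pass)
        for (size_t i = 0; i < data.size(); ++i)
            if (data[i] >= 128)   // predictable when sorted, near-random otherwise
                sum += data[i];
    std::clock_t end = std::clock();

    std::printf("sum = %lld, time = %.2f s\n",
                sum, double(end - start) / CLOCKS_PER_SEC);
    return 0;
}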
November 08, 2000 07:11 PM
Why are you so certain that if(which==3) ends up last in the optimized object code? A lot of optimizers will reorder code.
To speed it up more, just go
if (which == 1) { /* do whatever */ } else if (which == 2) ......
Using the else if should speed it up. The others are never even evaluated if the preceding one is true. Also, by putting the one that is most likely to be true first, you can save time running through the else ifs.
"We are the music makers, and we are the dreamers of the dreams."
- Willy Wonka
I showed it to my friend, who also predicted it may be faster because of branch prediction. But if that were the case, wouldn't the branch prediction choose the 1st if statement rather than the last?
We tried it in debug mode, and stepped into it and it seems like
it is not jumping into the 3rd if statement at all (which is not unusual).
selecting 2 instead of 3 is slower too.
Bitblt, surprise, surprise!! I just tried using your method:
if () {} else if () {} else if ()...
and it turns out to be SLOWER than only ifs without the else's.
Don't believe me? Here is the code for what you suggested:
inline float Ticker::Elapsed(int which)
{
    if (which == 1) {
        QueryPerformanceCounter(&temp);
        return (temp.QuadPart - now.QuadPart) * res;
    }
    else if (which == 2) {
        QueryPerformanceCounter(&temp2);
        return (temp2.QuadPart - now2.QuadPart) * res;
    }
    else if (which == 3) {
        QueryPerformanceCounter(&temp3);
        return (temp3.QuadPart - now3.QuadPart) * res;
    }
    return 0.0;
}
This is slower not faster!
Yes! There are kangaroos in Australia but I haven't seen them...yet
Personally, I wouldn't rely on your measurements. It is fine and great that you have a high-resolution timer, but that doesn't mean your measurements are repeatable. The call to the function most likely took far longer than what you are trying to measure, and synchronization between the function and the hardware that maintains the counter most likely produces an extreme variance. You have to be able to repeat the measurement to within +/- a very small percentage, a couple of percent, to have any confidence your measurement isn't completely wrong. Perhaps your timeslice ended between one query and the next.
If you want to measure a very small time you have to measure a larger one and take an average. Execute it a few thousand or a million times. Measure how long the loop without what you are measuring took, subtract it from your measurement, then divide by the number of times you executed it. That is a far better measurement of how long it took. One call to QueryPerformanceCounter per second has a negligible impact on your measurement. Once per 10 or 20 clock cycles is an entirely different story. Repeat your measurement ten times and see how much it varies. Which one of those is actually right?
It doesn't make any sense that three if statements are faster than one. The only reason it seems that way is the inaccuracy of your measurement. You might be able to repeat it to some degree, but if so it is because of induction variables, caching or some other issue, and not because three ifs run consistently faster than one.
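For illustration, here is a rough sketch of that loop-and-average approach. In this sketch the "code under test" is itself just a QueryPerformanceCounter call; substitute whatever you actually want to time, and note that the iteration count is arbitrary.
// Rough sketch of the loop-and-average approach described above.
#include <windows.h>
#include <cstdio>

int main()
{
    const int N = 1000000;                  // repeat the code under test many times
    LARGE_INTEGER freq, start, stop, dummy;
    QueryPerformanceFrequency(&freq);

    // 1) Time an empty loop so its overhead can be subtracted later.
    QueryPerformanceCounter(&start);
    for (volatile int i = 0; i < N; ++i)
        ;                                   // volatile keeps the loop from being optimized away
    QueryPerformanceCounter(&stop);
    double emptyLoop = double(stop.QuadPart - start.QuadPart) / freq.QuadPart;

    // 2) Time the same loop with the code under test inside it.
    QueryPerformanceCounter(&start);
    for (volatile int i = 0; i < N; ++i)
        QueryPerformanceCounter(&dummy);    // the thing being measured
    QueryPerformanceCounter(&stop);
    double fullLoop = double(stop.QuadPart - start.QuadPart) / freq.QuadPart;

    // 3) Subtract the loop overhead and divide by the iteration count.
    double perCall = (fullLoop - emptyLoop) / N;
    std::printf("roughly %.9f seconds per call\n", perCall);
    return 0;
}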
Keys to success: Ability, ambition and opportunity.
If you are interested in cache effects, you have to either prime or flush the cache, depending on what you are trying to do, to get accurate and repeatable measurements. Priming is doing it once before you measure. Flushing is overrunning the cache before you start measuring.
As an example with the disk cache, a prime would be to read a small file once, forcing it into the cache, and then measure how fast you can read it again; that tells you how fast you can read from cache. If you wanted to see how fast you could read from disk, on the other hand, you would read a file repeatedly where the entire file cannot possibly fit in memory. The first read fills the cache with pages from the end of the file, so the second read is forced to get the first page of the file from disk, because the end of the file is what is still in cache.
That is a simplified view, but if you really want to get into performance tuning it is enough to get you started in the right direction. You actually have to know a little about how the caching algorithm works to be sure you flushed or primed it, but brute force works pretty well most of the time.
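For what it's worth, here is a minimal sketch of priming versus flushing the CPU data cache around a timed section; the 8 MB buffer size is only a guess and just needs to be larger than whatever cache you are trying to defeat.
// Minimal sketch of "prime" vs "flush" for the CPU data cache.
// The 8 MB figure is a guess; it only needs to be larger than the cache.
#include <cstddef>
#include <vector>

// Prime: touch the working set once so the timed run starts with a warm cache.
void PrimeCache(const std::vector<char>& workingSet)
{
    volatile char sink = 0;
    for (std::size_t i = 0; i < workingSet.size(); ++i)
        sink = sink + workingSet[i];
}

// Flush: walk a buffer much larger than the cache so the working set is evicted.
void FlushCache()
{
    static std::vector<char> big(8 * 1024 * 1024);
    volatile char sink = 0;
    for (std::size_t i = 0; i < big.size(); ++i)
        sink = sink + big[i];
}
Call PrimeCache on your data right before the timed section if you want warm-cache numbers, or FlushCache if you want cold-cache numbers.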
Keys to success: Ability, ambition and opportunity.
LilBudyWizer, do you mean put it in a loop and then find the average? like:
int i=0;
float times[10];
for (i = 0; i < 10; i++)
    times[i] = Elapsed(3);
and then find the average of these?
Coz, I repeated it about 5 times and almost got the same result.
Here are the stats:
For Elapsed(3):
either 0.0050 or 0.0042 milliseconds.
For Elapsed(2):
either 0.0092 or 0.0101 milliseconds.
For Elapsed(1):
either 0.0134 or 0.0142 milliseconds.
But clearly, Elapsed(3) is still faster.
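For reference, a rough sketch of what averaging those samples might look like, assuming a Ticker instance like the one shown earlier in the thread:
// Rough sketch: run Elapsed several times and average the samples.
// Ticker is assumed to be the class shown earlier in this thread.
float AverageElapsed(Ticker& ticker, int which, int runs)
{
    float total = 0.0f;
    for (int i = 0; i < runs; ++i)
        total += ticker.Elapsed(which);
    return total / runs;
}
// e.g. float avg = AverageElapsed(myTicker, 3, 10);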
Yes! There are kangaroos in Australia but I haven't seen them...yet