Clock cycle cost of inlining
I'm wondering how much overhead a function call really has. Does anyone know how many clock cycles a function like the following will save when inlined?
unsigned long CastToFloat(const float Value);
Is calling a function through a function pointer more expensive than a normal function call? My thought is that both functions will live at an address, so they will take the same execution time...
Many thanks
Chris
Chris Brodie
February 21, 2001 02:06 AM
When inlining a function, some time is saved by removing the function entry/exit code (a few push and pop instructions). Overall, there's no point in inlining a function unless it's in an innermost loop (in which case it can save you a large amount of time).
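A minimal sketch of the "innermost loop" case, assuming GCC/Clang or MSVC: the hypothetical `NOINLINE` macro maps to `__attribute__((noinline))` or `__declspec(noinline)` so one version keeps a real call on every iteration. `AddScaled`, `SumInlined` and `SumCalled` are illustrative names only, and `inline` itself is just a hint that the compiler may ignore.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical tiny helper. With optimization on, the compiler may paste this
// body into the caller, so no CALL/RET or push/pop entry/exit code is emitted.
inline int AddScaled(int a, int b) { return a + 2 * b; }

// Same body, but the compiler is asked not to inline it, so the loop below
// pays the call overhead on every iteration.
NOINLINE int AddScaledCall(int a, int b) { return a + 2 * b; }

long SumInlined(int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) total += AddScaled(i, 1);
    return total;
}

long SumCalled(int n) {
    long total = 0;
    for (int i = 0; i < n; ++i) total += AddScaledCall(i, 1);
    return total;
}

// Sanity check: both versions compute the same result; only the cost differs.
void CheckSameResult(int n) { assert(SumInlined(n) == SumCalled(n)); }
```

Comparing the generated assembly of `SumInlined` and `SumCalled` (e.g. with `-O2 -S`) shows where the entry/exit code disappears.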
As for function pointers, the timing should be (almost) exactly the same. The only difference is the addressing mode used for the call instruction (CALL immediate vs. CALL register).
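The two call styles side by side, as a sketch with hypothetical names (`Square`, `CallDirect`, `CallViaPointer`). One practical difference beyond the addressing mode: a direct call can be inlined, while a call through a pointer usually cannot be, unless the compiler can prove which function the pointer holds.

```cpp
#include <cassert>

// Hypothetical target function.
int Square(int x) { return x * x; }

// Direct call: the target is known at compile time. On x86 this is typically
// a CALL with an immediate (relative) address, and it is eligible for inlining.
int CallDirect(int x) { return Square(x); }

// Indirect call: the target is only known at run time. On x86 this is
// typically a CALL through a register or memory operand.
int CallViaPointer(int (*fn)(int), int x) {
    assert(fn != nullptr);
    return fn(x);
}
```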
In addition to saving a few cycles, the compiler can generate more sophisticated code when inlining is used. For example, the contents of an innermost loop can be unrolled, or copies of argument/return variables can be eliminated, which can save significantly more cycles. Those variables may then be kept in registers, again increasing speed.
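A sketch of the argument/return copies mentioned above, using a hypothetical `Vec3` passed and returned by value. Whether a non-inlined call really copies through memory depends on the ABI; once `Add` is inlined, the compiler is free to drop those copies and keep the running total in registers.

```cpp
#include <cassert>

// Hypothetical small value type, passed and returned by value.
struct Vec3 {
    float x, y, z;
};

// As a real call, a and b may be copied into the callee's argument slots and
// the result copied back. Inlined, those copies can disappear entirely.
inline Vec3 Add(Vec3 a, Vec3 b) {
    return {a.x + b.x, a.y + b.y, a.z + b.z};
}

// Inner loop where the per-call copies would otherwise add up.
Vec3 SumPoints(const Vec3* pts, int count) {
    assert(pts != nullptr || count == 0);
    Vec3 total{0.0f, 0.0f, 0.0f};
    for (int i = 0; i < count; ++i) {
        total = Add(total, pts[i]);
    }
    return total;
}
```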
In my projects, the speed-up from inlining throughout versus optimized code without inlining is somewhere between a factor of 3 and 10, and even more in special cases such as matrix multiplication.
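For the matrix-multiplication case, a sketch assuming a row-major 4x4 float matrix; `Mat4`, `At` and `Multiply` are hypothetical names. Because the `At` accessor is tiny, inlining lets its index arithmetic fold into the loops, and its bounds `assert` compiles away when `NDEBUG` is defined, so the readable version can cost about the same as hand-written indexing.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t N = 4;

// Hypothetical row-major 4x4 matrix.
struct Mat4 {
    float m[N * N];
};

// Small accessors: cheap only if the compiler inlines them into the loops.
inline float& At(Mat4& a, std::size_t r, std::size_t c) {
    assert(r < N && c < N);
    return a.m[r * N + c];
}

inline float At(const Mat4& a, std::size_t r, std::size_t c) {
    assert(r < N && c < N);
    return a.m[r * N + c];
}

// Naive triple loop: 64 multiply-adds, each going through At().
Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (std::size_t r = 0; r < N; ++r) {
        for (std::size_t c = 0; c < N; ++c) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < N; ++k) {
                sum += At(a, r, k) * At(b, k, c);
            }
            At(out, r, c) = sum;
        }
    }
    return out;
}
```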
I was able to call a function taking no parameters, doing nothing and returning nothing about 46.6 million times a second on a 450 MHz Pentium III. Passing three integers and returning an integer, but otherwise doing nothing, dropped that to about 30.7 million times a second. The call overhead is relatively small, but if you make a large number of calls it can make a big difference. If you were calling a function once per pixel at 1024x768 and 30 FPS, that would be about 23.6 million calls a second, and at least half of your time would be spent on call overhead. The other optimizations the compiler can perform after inlining should really be done by you when optimizing the routine; i.e., if the call can be moved outside the loop, move it yourself rather than counting on the optimizer to do it.
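A rough way to repeat that measurement today, plus the per-pixel arithmetic (1024 × 768 × 30 = 23,592,960 calls/s). This is a sketch with hypothetical names: it times an empty function called through a `volatile` function pointer so the optimizer can neither inline it nor delete the loop (which makes it an indirect call, said above to cost about the same as a direct one). Results will vary widely with compiler, flags and CPU, so compare against your own hardware rather than the Pentium III figures.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical empty function: no parameters, does nothing, returns nothing.
void DoNothing() {}

// Approximate calls per second for DoNothing, called n times.
double MeasureCallsPerSecond(std::uint64_t n) {
    // volatile: the pointer must be reloaded on every iteration, so the call
    // can be neither inlined nor removed as dead code.
    void (*volatile fn)() = DoNothing;
    const auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < n; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> seconds = stop - start;
    return seconds.count() > 0.0 ? static_cast<double>(n) / seconds.count() : 0.0;
}

// Calls per second needed for one call per pixel per frame.
constexpr std::uint64_t CallsPerSecondNeeded(std::uint64_t width, std::uint64_t height,
                                             std::uint64_t fps) {
    return width * height * fps;
}

// Fraction of each second spent purely on call overhead.
double OverheadFraction(std::uint64_t callsNeeded, double callsPerSecond) {
    assert(callsPerSecond > 0.0);
    return static_cast<double>(callsNeeded) / callsPerSecond;
}
```

With the figures from this post, `OverheadFraction(CallsPerSecondNeeded(1024, 768, 30), 46.6e6)` comes out just over 0.5, matching the "at least half" estimate.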
Keys to success: Ability, ambition and opportunity.