local variable == SLOW?

jho · 2000-10-25T23:34:03

I was trying to find the bottleneck in my program. I thought it was the math I'm doing, but it isn't. I have my own vector3D and matrix3D classes, and their member functions get called quite frequently.. 1000 times per second or so. What I've noticed is that with a low-level function like this:

    inline void matrix3D::mult(const vector3D &V, vector3D &result) { vector3D temp; .... }

it is faster if I take out the "vector3D temp;" local variable. My program has sped up noticeably. vector3D is a 40-byte class. If I take out the math it doesn't make much difference. I added more vector3D local variables to test it, and it can slow my program to a halt. If I replace the vector3D class with a struct it's ALSO noticeably FASTER!

Can you confirm whether classes are NOT the way to go for writing fast code? Is there something wrong with my vector3D class that's making it slow to use as a local variable? What to do now? Should I rewrite my vectors and matrices as structs with a set of mult, add, etc. functions, C style? Assembly is not an option for me.

Started by jho October 17, 2000 05:41 PM

33 comments, last by jho 24 years, 2 months ago

Stoffel

250

October 18, 2000 01:53 PM

jho:
Hint on how to fix your last problem, insert this into your code:

    void vector3d::mult (const vector3d& rhs, vector3d& result)
    {
        // protect against self-reference
        ASSERT (&rhs != &result && this != &result);
        // ...
    }

This will throw an exception if result is the same object as rhs or this, but only in debug builds. (BTW, ASSERT is an MFC macro, but there's a standard C++ assert macro, lower-case, as well.) Assert is your friend.
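As a rough illustration of the self-reference check using the standard lower-case `assert` from `<cassert>`, here is a minimal sketch. The `matrix3d` layout and the body of `mult` are assumptions made for the example, not jho's actual classes.

```cpp
#include <cassert>

// Hypothetical minimal types, for illustration only.
struct vector3d {
    double x, y, z;
};

struct matrix3d {
    double m[3][3];

    // Writes M * v into result. result must not be the same object as v,
    // because v is still being read while result is being written.
    void mult(const vector3d& v, vector3d& result) const {
        assert(&v != &result);  // compiled out when NDEBUG is defined
        result.x = m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z;
        result.y = m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z;
        result.z = m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z;
    }
};
```

With the check in place, the function can write straight into `result` without a temporary, which is the change jho described making.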

Also, I wouldn't initialize vector3d to (0,0,0) at all. How many null vectors do you use? For example, if you have this code:

    vector3d v;
    v.x = 1.0; v.y = 2.0; v.z = 3.0;

..then you just wasted cycles setting x, y, and z to 0, 0 and 0--they're overwritten almost immediately. Instead, make a meaningful constructor:

    vector3d (double xval, double yval, double zval)
    : x (xval), y (yval), z (zval) { }

..and make your default constructor do nothing. This saves on execution speed, though it leaves you vulnerable to using uninitialized values in your vectors. You just have to be careful.
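Put together, the two constructors described above might look like the sketch below. The `make_offset` helper is a hypothetical name added only to show the value constructor in use.

```cpp
// Sketch of the suggested constructors; member names follow the post.
struct vector3d {
    double x, y, z;

    // Default constructor deliberately does nothing: x, y, z are left
    // uninitialized, so they must be assigned before they are read.
    vector3d() {}

    // Value constructor: each member is written exactly once.
    vector3d(double xval, double yval, double zval)
        : x(xval), y(yval), z(zval) {}
};

// Hypothetical helper: builds the vector in one step instead of
// default-constructing it and then overwriting every member.
inline vector3d make_offset() {
    return vector3d(1.0, 2.0, 3.0);
}
```

The trade-off is the one stated in the post: a `vector3d` created with the default constructor holds indeterminate values until it is assigned.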

Wilka

122

October 18, 2000 02:55 PM

ASSERT (or assert) won't throw an exception, it'll just give you a "debug assertion failed" message. There's no point in using ASSERT to check things like this: the check won't be there in release, and even in debug you can still hit 'ignore' and it'll do its stuff anyway. And even if it did throw an exception, it seems a bit extreme to do that in this case. You'd need to have try and catch all over the place.

You should use a normal 'if' instead of ASSERT (you might want to use assert as well, but don't use only assert).
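One way to read this suggestion is to keep the assert for debug builds and add an ordinary `if` that still runs in release. The sketch below assumes a hypothetical `matrix3d::mult` that reports failure through a `bool` return value; that return convention is an assumption for the example, not something proposed in the thread.

```cpp
#include <cassert>

// Hypothetical minimal types, for illustration only.
struct vector3d {
    double x, y, z;
};

struct matrix3d {
    double m[3][3];

    // Returns false and leaves result untouched if result aliases v.
    bool mult(const vector3d& v, vector3d& result) const {
        assert(&v != &result);   // loud failure in debug builds
        if (&v == &result) {     // still checked when NDEBUG is defined
            return false;
        }
        result.x = m[0][0] * v.x + m[0][1] * v.y + m[0][2] * v.z;
        result.y = m[1][0] * v.x + m[1][1] * v.y + m[1][2] * v.z;
        result.z = m[2][0] * v.x + m[2][1] * v.y + m[2][2] * v.z;
        return true;
    }
};
```

The extra branch runs on every call, which is the cost the later replies weigh against the safety it buys.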

Z01

134

October 18, 2000 04:27 PM

Interesting topic. This thread raises some questions I've wanted to ask for a while now.

Stoffel: I think I understood what you were saying, but just to be sure: Are you saying that

for() {
char buffer[80];
// do stuff to buffer
}

will execute at the same speed as

char buffer[80];
for() {
// do stuff to buffer
}

You also mentioned that "all variables that are created on the stack have their memory allocated at the same time." When exactly is the memory on the stack allocated for a local variable? For instance, if you have

int aFunction(int x, int y) {
char buffer2[80];
// do some stuff
char buffer3[80];
for () {
char buffer4[80];
// do stuff
}
// do some more stuff
return (someInt);
}

int main(void) {
char buffer1[80];
aFunction(2,3);
}

When are buffer2 and buffer3 allocated? Before the function parameters are pushed onto the stack, or after? How about buffer4?

Neophyte: Why are classes with methods slower than those that don't have methods? I understand the cost associated with a V-table, but why would normal methods slow down a class?

To anyone: Passing by pointer is slower than using globals to pass in parameters, right? As I understand it, this is because of two costs: 1) having to dereference the pointer and 2) having to pass in the pointer.

How about passing by reference? Is a reference just a constant pointer, so the cost is the same as passing by pointer?

Thanks for your answers!

jho

Author

122

October 18, 2000 05:02 PM

Thanks, I know about the ASSERT checking. I just recently took out my temp variable, so I haven't put in the ASSERT yet. ASSERTs are almost FUN to use (peace of mind). On platforms without ASSERTs my hands start to shake badly and I get a mild headache.

Oh.. I think declaring your variables outside the loops is a good thing so you don't have to allocate the variable again and again. I think for neatness' sake we are used to using local variables.. but globals have gotta be faster.

Stoffel

250

October 18, 2000 05:12 PM

quote:
Are you saying that

for() {
char buffer[80];
// do stuff to buffer
}

will execute at the same speed as

char buffer[80];
for() {
// do stuff to buffer
}

Yes. There's only one buffer in the function. Since the built-in type char has no constructor, there's no initialization. These two execute with the same number of instructions.

quote:
You also mentioned that "all variables that are created on the stack have their memory allocated at the same time." When exactly is the memory on the stack allocated for a local variable? For instance, if you have...*snip*

(Disclaimer: I don't make it a habit to keep one window open with the disassembly, so forgive me if some of the minor details are inaccurate.)

The answer is: at the beginning of the function. You have to look at what the compiler does at a very low level to understand why.

When you call a function in C (and I believe C++), the compiler generates assembly calls that push each argument to the function onto the stack, and then jumps to the function by modifying the instruction pointer (IP).

The first thing the function does is allocate room on the stack for all local variables in that function. In C, this was really obvious, because you HAD to declare all of your locals at the top of the function before the first statement. This is an indicator of why C is a lower-level language than C++: the syntax of the language very closely resembles what the compiler is going to churn out in assembly.

The compiler's job is to check the function and add up all the sizes of all the local variables. Think of it as saying, "I'm going to go through and add up the sizeof each local object declared in this function in every single control path". When the function starts, it just moves the stack pointer backward by this amount. It's now reserved that much space on the stack for all the local variables to exist.

Another thing it does is replace every single one of your variables with its location in memory in reference to the base pointer. Each of these you can think of as a pointer to the reserved block of memory on the stack we got at the beginning of the function call.

Look at this simple function "func" and its related source listing:

    void func ()
    {
        int x;
        x = 5;
        int* y;
        y = &x;
        double z;
        z = 3.14;
    }

    // source listing portion
    PUBLIC  ?func@@YAXXZ                    ; func
    EXTRN   __fltused:NEAR
    ;       COMDAT ?func@@YAXXZ
    _TEXT   SEGMENT
    _x$ = -4
    _y$ = -8
    _z$ = -16
    ?func@@YAXXZ PROC NEAR                  ; func, COMDAT
    ; 2    : {
            push    ebp
            mov     ebp, esp
            sub     esp, 80                 ; 00000050H
            push    ebx
            push    esi
            push    edi
    ; 3    :     int x;
    ; 4    :     x = 5;
            mov     DWORD PTR _x$[ebp], 5
    ; 5    :     int* y;
    ; 6    :     y = &x;
            lea     eax, DWORD PTR _x$[ebp]
            mov     DWORD PTR _y$[ebp], eax
    ; 7    :     double z;
    ; 8    :     z = 3.14;
            mov     DWORD PTR _z$[ebp], 1374389535      ; 51eb851fH
            mov     DWORD PTR _z$[ebp+4], 1074339512    ; 40091eb8H
    ; 9    : }
            pop     edi
            pop     esi
            pop     ebx
            mov     esp, ebp
            pop     ebp
            ret     0
    ?func@@YAXXZ ENDP                       ; func
    _TEXT   ENDS

Notice x, y, and z (_x$, _y$, etc.) are defined as pointers with negative value. It's counting back from the base pointer (ebp) to find these variables.

The first thing the function does is save the previous value of the base pointer (ebp), move the stack pointer (esp) into the base pointer--this will be our point-of-reference to local variables--and then reserve 80 bytes on the stack. I'm not sure why it uses 80 when it only needs 16 bytes to take care of our locals (the offsets above end at -16)--maybe somebody here knows?

Regardless, those three lines, and the way x, y, and z are defined, allocated memory for all those variables at once. Even though they're not all declared at the same time, that one instruction (sub esp, 80) allocates all the memory needed for all local variables.

Then, it pushes all the registers it's going to use to save their values. Finally, it executes the first statement, x = 5, with "mov DWORD PTR _x$[ebp], 5". As you can see, x exists as a negative offset from the base pointer. And the rest of the function you should be able to piece together yourself.

So, stack variables don't cost you anything in memory allocation, even in loops. However, these are all built-in types. If any of these were user types (i.e. user-defined structs or classes), and that type had a constructor, you would see a call to the constructor wherever the statement that creates the object is executed.

I hope you can see now that it takes the same amount of time to allocate an int[1] and an int[1000] on the stack, and that declaring an array inside a loop doesn't cost anything, but that you pay the cost of a constructor call every single time the declaration is executed, whether the object being constructed is on the stack or on the heap.
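The constructor cost is easy to observe by counting calls. The following is a minimal sketch, assuming a hypothetical `Counted` type whose default constructor zeroes its members and bumps a static counter; the two functions differ only in where the object is declared.

```cpp
// Hypothetical type that counts how often its constructor runs.
struct Counted {
    static int constructions;
    double x, y, z;
    Counted() : x(0.0), y(0.0), z(0.0) { ++constructions; }
};
int Counted::constructions = 0;

// Object declared inside the loop: the constructor runs on every
// iteration, even though the stack space was reserved once at entry.
int construct_inside(int iterations) {
    Counted::constructions = 0;
    for (int i = 0; i < iterations; ++i) {
        Counted temp;
        temp.x = i;
    }
    return Counted::constructions;
}

// Object declared before the loop: the constructor runs exactly once.
int construct_outside(int iterations) {
    Counted::constructions = 0;
    Counted temp;
    for (int i = 0; i < iterations; ++i) {
        temp.x = i;
    }
    return Counted::constructions;
}
```

For 1000 iterations the first function reports 1000 constructor calls and the second reports 1. A type with no constructor, such as the `char buffer[80]` discussed earlier, has nothing to run in either position.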

Edited by - Stoffel on October 18, 2000 6:15:45 PM

Neophyte

595

October 18, 2000 05:36 PM

Wow. Very interesting thread this is shaping up to be...

Stoffel:
Looking over the code you tested with, I notice that while you make the method in the C++ class inline, the similar C function is not. This will likely influence the speeds you're seeing. The reason why you're seeing different speed comparisons between the debug and the release builds is probably that there is more debug overhead to a C++ class than to a C function (was that clear? I actually don't think it was, but I can't figure out a clearer way of putting it right now, sorry. Hope you understand what I mean anyway).
That said, I must admit I am surprised by the result, and that I don't really have a good way of explaining it...

Anonymous:
Yes, yes. I should think it goes without saying that optimized C++ code is no slower, and likely faster, than unoptimized C code. There is still no disputing that you are incurring extra overhead when passing a class/struct with a constructor as opposed to a class/struct without a constructor or a pointer (or ref) to an existing class/struct.

Z01:
Alright, I admit it; parts of the post you are referring to were written during a period of intense caffeine deprivation, and I can't think of a good reason why normal (non-constructor) methods would incur extra overhead.

Stoffel (again):
That was a very interesting post you had there (the ASM-code).
Now, I can read assembly and understand what it does and so on, but I'm not very strong on speed... So I'm wondering, how costly are those 'push'es and 'pop's that are done at the start and end of a function call?

-Neophyte

- Death awaits you all with nasty, big, pointy teeth. -

Stoffel

250

October 18, 2000 06:25 PM

((obviously not getting any work done today))

quote:
Looking over the code you tested with I notice that while you make the method in the C++-class inline, the similar c-function is not.

Can you even inline global functions? Since I never use them, I guess I didn't know.

I believe you're correct--I'll leave it to someone else to test.
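On the question of inlining global functions: C++ does allow `inline` on free (non-member) functions, so the C-style version of a benchmark can be marked the same way as the member function. A minimal sketch, with a hypothetical `add` function that was not part of the code tested in the thread:

```cpp
// Hypothetical plain struct, C style.
struct vector3d {
    double x, y, z;
};

// A free function declared inline, just as a member function can be.
// The keyword is a request; the compiler still decides whether to
// expand the call in place.
inline void add(const vector3d& a, const vector3d& b, vector3d& result) {
    result.x = a.x + b.x;
    result.y = a.y + b.y;
    result.z = a.z + b.z;
}
```

Comparing an inline member function against a free function like this one keeps the inlining variable the same on both sides of the comparison.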

quote:
So I'm wondering, how costly are those 'push'es and 'pop's that are done at the start and end of a function call?

Um, as far as cycles? No idea. I believe that pushing and popping is the fastest thing a processor can do, though, so maybe 1? I'm not THAT low-level. Keep in mind I learned C/ASM on an 8088 platform (embedded system, not even a PC). My info might be a little dated, but the principles are all still there.

BTW, the pushes, pops, stack moving, return, and the stuff I didn't show you (pushing the IP and loading the function IP in the calling context) are the mystical "function call overhead" people talk about. Every time you call a function you pay for these operations.

Z01

134

October 18, 2000 08:37 PM

Wow! I didn't expect such a detailed and useful explanation. I understand much(!) better now - thanks Stoffel.

Coconut

122

October 18, 2000 09:01 PM

Hey, I don't know if this has been posted yet, but you could create a global instance of your Vector3D type, and just use that instance every time you need a generic instance. This way you would not need to allocate and destroy 40 bytes of memory every time you called a function using a temp instance.

Brent Robinson
"What if this is as good as it gets?"

"The computer programmer is a creator of universes for which he alone is the lawgiver...No playwright, no stage director, no emperor, however powerful, has ever exercised such absolute authority to arrange a stage or a field of battle and to command such unswervingly dutiful actors or troops." - Joseph Weizenbaum

Houdini

266

October 19, 2000 07:45 AM

quote: Original post by Wilka

ASSERT (or assert) won't throw an exception, it'll just give you a "debug assertion failed" message. There's no point in using ASSERT to check things like this: the check won't be there in release, and even in debug you can still hit 'ignore' and it'll do its stuff anyway. And even if it did throw an exception, it seems a bit extreme to do that in this case. You'd need to have try and catch all over the place.

You should use a normal 'if' instead of ASSERT (you might want to use assert as well, but don't use only assert).

Wilka, the point of assertions is to check for BAD CODE written by the programmer. It's just an extra "safety net" during development and isn't needed in release mode. In Stoffel's example, if it works in debug it WILL work in release, so why would you use an 'if' statement that just does extra checking on code you already know is correct?

Use 'if' statements for checking conditions that the user can affect, and use ASSERT for conditions only the programmer can affect.
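That division of labour can be sketched as follows. Both functions are hypothetical examples, not code from the thread: `parse_scale` handles text a user might type, so it validates with `if` in every build, while `component` has a precondition only the calling programmer can break, so it uses `assert`.

```cpp
#include <cassert>
#include <cstdlib>

// User-controlled input: validate with an ordinary if, in every build.
// Returns false and leaves scale untouched when text is not a number.
bool parse_scale(const char* text, double& scale) {
    if (text == nullptr || *text == '\0') {
        return false;
    }
    char* end = nullptr;
    double value = std::strtod(text, &end);
    if (*end != '\0') {  // trailing characters that are not part of a number
        return false;
    }
    scale = value;
    return true;
}

// Programmer-controlled precondition: index is chosen by the calling
// code, so a bad value is a bug. The assert is compiled out under NDEBUG.
double component(const double (&v)[3], int index) {
    assert(index >= 0 && index < 3);
    return v[index];
}
```

A release build drops the check in `component` and keeps the ones in `parse_scale`, which matches the rule of thumb above.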

- Houdini
