
Code optimisation

Started by January 28, 2000 06:47 AM
13 comments, last by Spura 24 years, 10 months ago
I have a lot of problems with the speed of my code. I am writing a Pacman clone in Delphi, but the code is very slow, especially the graphics. I gave the code to a better programmer, and he made it six times as fast as it was before. The difference between us is that he knows more about system-level stuff and which things are faster. Stuff like: if you load bitmaps into a TBitmap object, the system loads them from the file every time I use them, but if you load them from a resource file they stay in memory. Could you tell me about general things which are faster than others, or point me to an on-line document about this?
1) C/C++ is faster than Delphi.
2) DirectDraw is faster than GDI.

Yes, I am aware of this, but I meant stuff not related to a specific language or library, stuff like various statement structures and a comparison of the speed of basic variable types (shortint, pointer, longint, fixed point, floating point), etc.
Hmmm, it's really hard to say exactly. There's no one way to optimize code, and there are many different levels you can do it on. I'll try to explain a few common techniques used in C/C++ code. I don't know how well they'll carry over to Pascal.

At the top level, make sure that you are only moving as much data around as needed. Pointers can be a good way to pass structures to a function, since you only have to copy the pointer to the stack. Also, function calls add a little overhead, so don't write a function for every little thing (macros or inline functions in C++ also work).
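To make the pointer point concrete, here is a minimal C sketch (the Sprite type and its field sizes are invented for illustration): passing the struct by value copies the whole thing to the stack, while passing a pointer copies only the address.

    #include <stdio.h>

    /* A hypothetical sprite record, just for illustration. */
    typedef struct {
        int x, y;
        int frame;
        int pixels[64 * 64];   /* large payload we don't want to copy around */
    } Sprite;

    /* Passing by value copies the entire struct onto the stack. */
    void draw_by_value(Sprite s) { printf("%d,%d\n", s.x, s.y); }

    /* Passing a pointer copies only the address (4 bytes on a 32-bit machine). */
    void draw_by_pointer(const Sprite *s) { printf("%d,%d\n", s->x, s->y); }

    int main(void)
    {
        Sprite pac = { 10, 20, 0, { 0 } };
        draw_by_value(pac);      /* copies roughly 16 KB */
        draw_by_pointer(&pac);   /* copies one pointer */
        return 0;
    }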

At a lower level, make sure that your data is aligned on 32-bit boundaries. This makes memory moves faster. Also, only use 32-bit data types. Fixed point can be good, but it's not as useful as it used to be due to faster floating point operations in today's processors. It's still faster for addition and subtraction. Don't forget about bit shifts with integers. You can very quickly multiply and divide by powers of two this way.
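A tiny C sketch of the shift trick; note that a modern compiler will usually make this substitution for you when the factor is a constant power of two.

    #include <stdio.h>

    int main(void)
    {
        int x = 100;

        /* Multiplying or dividing by a power of two can be done with a shift. */
        int times8   = x << 3;   /* same as x * 8 */
        int divided4 = x >> 2;   /* same as x / 4 (for non-negative x) */

        printf("%d %d\n", times8, divided4);
        return 0;
    }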

At the lowest level, there's always assembly, but you don't want to go overboard here. Assembly should really only be used for the most speed-critical routines, since most compilers can do a pretty good job with normal code.

Edited by - I-Shaolin on 1/28/00 1:36:19 PM
quote:
C/C++ is faster than Delphi.

Actually, the Borland C++ compiler and the Object Pascal compiler have the same back-end. That is to say, they share an intermediate code representation. There is no speed difference between C++ and Object Pascal. Perceived speed differences between Delphi and other Windows programs written in C++ come more from the fact that the VCL library Delphi uses for its windows functions is slower than the equivalent functionality provided by, for example, MFC. For proof, benchmark a C++ Builder program against a Delphi program. They both share the VCL library for their windows functionality.

By the same logic, as long as you can translate I-Shaolin's comments into their Pascal equivalents, you can still benefit.

Also don't forget things like table driven logic, and taking advantage of temporal and spatial locality in code and data.
quote: Original post by SiCrane
Also don't forget things like table driven logic, and taking advantage of temporal and spatial locality in code and data.

Could you explain what table driven logic and temporal and spatial locality in code and data are?
Temporal and spatial locality in code and data refers to how the cache system works. Your cache assumes that any code or data you just used is likely to be used again. Similarly, it assumes that any code or data near the code or data you've just used is likely to be used soon. To take advantage of this you need to code carefully. For example, if you have an array of structs with two values, and you want to multiply the first value by 2 and the second value by 5, it would be faster to do both in the same pass of a loop than to do all the first multiplies in one loop and then all the second multiplies in another loop. That's because your cache will probably load both values at the same time from memory. For the same reason, arrays are faster than linked lists by more than just the indirection overhead would indicate, because linked-list nodes can be anywhere in memory.
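Here is a rough C sketch of that struct example (the Pair type is made up). Both versions compute the same thing, but the first touches each element's two fields while that element is still in the cache.

    #include <stdio.h>
    #include <stddef.h>

    typedef struct {
        int first;
        int second;
    } Pair;

    /* Cache-friendly: both fields of an element are used while the cache
       line holding that element is still loaded. */
    void scale_one_pass(Pair *a, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            a[i].first  *= 2;
            a[i].second *= 5;
        }
    }

    /* Same result, but the array is walked twice, so each element may have
       to be fetched from memory into the cache a second time. */
    void scale_two_passes(Pair *a, size_t n)
    {
        for (size_t i = 0; i < n; i++) a[i].first  *= 2;
        for (size_t i = 0; i < n; i++) a[i].second *= 5;
    }

    int main(void)
    {
        Pair data[4] = { {1, 1}, {2, 2}, {3, 3}, {4, 4} };
        scale_one_pass(data, 4);
        printf("%d %d\n", data[0].first, data[0].second);  /* prints 2 5 */
        return 0;
    }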

Code works the same way; the cache works best for loops and stretches of uninterrupted code. So swiss-cheesing your code with lots of calls to functions outside the current object file will incur an overhead. Poor use of gotos can really louse up your cache as well. However, this is less of an issue, because your compiler can optimize most of this for you.

Table driven logic, essentially, is storing the results of function calls in arrays (or hash tables) and accessing the array rather than recomputing the function every time. A trivial example: let's say you need the value of sin and cos for all the angles from 0 to 359 in one-degree increments for your program. You can precompute the sin and cos values and stick them in arrays. Then you never have to call the sin or cos function. AIs use this technique often when speed is an issue.
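A minimal C sketch of that sin/cos table idea (the names sin_table, cos_table and init_tables are just illustrative): the trig functions are called once up front, and afterwards every use is a plain array lookup.

    #include <math.h>
    #include <stdio.h>

    #define DEGREES 360
    static double sin_table[DEGREES];
    static double cos_table[DEGREES];

    /* Compute the expensive trig values once, up front. */
    void init_tables(void)
    {
        for (int deg = 0; deg < DEGREES; deg++) {
            double rad = deg * (3.14159265358979323846 / 180.0);
            sin_table[deg] = sin(rad);
            cos_table[deg] = cos(rad);
        }
    }

    int main(void)
    {
        init_tables();
        /* Later, a lookup replaces each call to sin()/cos(). */
        printf("sin(45) = %f, cos(45) = %f\n", sin_table[45], cos_table[45]);
        return 0;
    }
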
quote: Original post by I-Shaolin
At a lower level, make sure that your data is aligned on 32-bit boundaries. This makes memory moves faster. Also, only use 32-bit data types. Fixed point can be good, but it's not as useful as it used to be due to faster floating point operations in today's processors. It's still faster for addition and subtraction.

How exactly do you make sure your data is aligned on 32-bit boundaries?
Only use 32-bit data types - so is it better to use DWORDs than CHARs, even for small values?
What exactly is fixed point?

Thanks



Edited by - TUna on 2/1/00 4:23:51 AM
Fixed-point maths involves using half of the variable for the value before the decimal point and the other half for the value after it.
On most computers it's not necessary; in fact it's generally only used for console programming, such as on the PlayStation, which doesn't have a maths co-processor.
Personally I'd say don't worry about it.
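For the curious, here is a minimal sketch of 16.16 fixed point in C (the names fixed, TO_FIXED and fixed_mul are just illustrative): addition and subtraction are ordinary integer operations, while multiplication needs a shift to put the result back into range.

    #include <stdio.h>

    /* 16.16 fixed point: the top 16 bits hold the integer part, the low
       16 bits hold the fraction, so 1.0 is represented as 65536. */
    typedef long fixed;

    #define TO_FIXED(x)   ((fixed)((x) * 65536.0))
    #define TO_DOUBLE(x)  ((x) / 65536.0)

    fixed fixed_mul(fixed a, fixed b)
    {
        /* Multiply, then shift back down; done in 64 bits to avoid overflow. */
        return (fixed)(((long long)a * b) >> 16);
    }

    int main(void)
    {
        fixed a = TO_FIXED(1.5);
        fixed b = TO_FIXED(2.25);

        fixed sum  = a + b;              /* addition is plain integer maths */
        fixed prod = fixed_mul(a, b);

        printf("sum = %f, product = %f\n", TO_DOUBLE(sum), TO_DOUBLE(prod));
        return 0;
    }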

With regard to data structure memory alignment, the way you do it is to order the fields in a structure so that the compiler can pack them into 32-bit compartments.
The fact is, the compiler will pad everything smaller than 32 bits up to 32 bits.
For example, if you had a struct
struct MyStruct
{
    char   a;
    float  b;
    double c;
    char   d;
    float  e;
};
Because a is 8 bits and b is 32 bits, the compiler would pad a out to 32 bits, adding 24 bits you don't need. The same would happen to d.
If you put a and d at the beginning of the structure, you would only lose 16 bits between them to padding, rather than the 48 bits you would lose with the original ordering.
You could also put two 8-bit vars and a 16-bit var next to each other and they would all be put into one 32-bit compartment.
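A small C sketch of that point, using the same struct with the chars grouped together; the exact numbers printed depend on the compiler's alignment rules, but comparing sizeof for the two orderings shows the padding difference.

    #include <stdio.h>

    /* Original ordering: the small char fields sit between larger ones,
       so the compiler inserts padding after a and after d. */
    struct Original {
        char   a;
        float  b;
        double c;
        char   d;
        float  e;
    };

    /* Grouping the small fields together means less padding between them. */
    struct Reordered {
        char   a;
        char   d;
        float  b;
        double c;
        float  e;
    };

    int main(void)
    {
        /* Sizes vary by compiler, but the reordered version is typically
           the same size or smaller. */
        printf("Original:  %u bytes\n", (unsigned)sizeof(struct Original));
        printf("Reordered: %u bytes\n", (unsigned)sizeof(struct Reordered));
        return 0;
    }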

I hope this makes sense

Mike
Yeah, that makes sense, thanks, but how would that increase performance? Just by stopping the L2 cache being wasted on the extra 24 bits that aren't needed?

