
Optimizations!!!!

Started by November 14, 2000 04:28 PM
2 comments, last by Bully 24 years, 1 month ago
Everyone has their own optimization tricks. Why don't you share yours with us? Feel free to discuss anything from memory management to making the transform and lighting pipeline more efficient. Alrighty then. -David
"The fastest code is the code you don't call"
cheat.
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
When using C++, tell the compiler to inline only the functions you specify. If you let the compiler decide what gets inlined, it will only inline small functions.

When you inline a function, you save the cost of pushing parameters onto the stack, plus a CALL and a RET.

The downside is that if you inline a large function into another large function, you can lose some of the global function optimization you would normally get.
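If you want to take that decision away from the compiler, most compilers offer a non-standard way to force it either way. Below is a minimal sketch, assuming MSVC's `__forceinline` / `__declspec(noinline)` or GCC/Clang's `always_inline` / `noinline` attributes. The `FORCE_INLINE` / `NO_INLINE` macro names and the `dot` / `sum_of_dots` functions are hypothetical, made up for illustration. Plain `inline` is only a hint, so the macro falls back to it on other compilers.

```cpp
// Hypothetical macros: force or forbid inlining where the compiler supports it.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#define NO_INLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#define NO_INLINE __attribute__((noinline))
#else
#define FORCE_INLINE inline  // only a hint on other compilers
#define NO_INLINE
#endif

struct Vec3 { float x, y, z; };

// Small and called from hot loops: forcing it inline skips the push/CALL/RET.
FORCE_INLINE float dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Larger routine: kept out of line so it is not pasted into other big callers.
NO_INLINE float sum_of_dots(const Vec3* a, const Vec3* b, int count) {
    float total = 0.0f;
    for (int i = 0; i < count; ++i) {
        total += dot(a[i], b[i]);  // this call gets expanded in place
    }
    return total;
}
```

The split mirrors the trade-off above: force the tiny, frequently called helpers inline, and explicitly keep the big ones out of line.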



D.V.

--------------------------
Carpe Diem
Do as little work as possible. The quickest functions are the ones you don't call. The quickest triangles are the ones you don't draw.
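One common way to skip triangles is back-face culling: in a closed mesh, triangles facing away from the camera are hidden anyway. Below is a minimal sketch, assuming counter-clockwise winding marks the front face and that the vertices and the eye position are in the same coordinate space. `Vec3` and the helper names are hypothetical. D3D (`D3DRS_CULLMODE`) and OpenGL (`glCullFace`) can already do this in the pipeline, so a CPU-side test like this is mainly useful for rejecting geometry before it ever reaches the API.

```cpp
struct Vec3 { float x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if triangle (v0, v1, v2), assumed counter-clockwise when seen from the
// front, faces away from the eye and can be skipped entirely.
bool is_back_facing(Vec3 v0, Vec3 v1, Vec3 v2, Vec3 eye) {
    Vec3 normal = cross(sub(v1, v0), sub(v2, v0));  // face normal from winding
    Vec3 to_eye = sub(eye, v0);                      // vector toward the viewer
    return dot(normal, to_eye) <= 0.0f;              // facing away (or edge-on)
}
```

A triangle in the z = 0 plane wound counter-clockwise is kept when viewed from +z and culled when viewed from -z.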

As for speeding up the transform and lighting pipeline, get out of the API's way. D3D's software TnL has specialized pipes written by teams at Intel and AMD. I've heard stories of people writing their own TnL and, after months of work tailoring it to their app, getting it only 10% quicker. The quality of OpenGL's TnL varies from driver to driver; apparently Nvidia's is very good, as for others . . .
(This only applies if you're doing a traditional TnL pipe. If you have some extremely specialized one, you can probably do it quicker.)

But for a stupid optimization trick, I've done stuff like this:

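for (p = start; p != end; p += 8) {   // assumes (end - start) is a multiple of 8
    // Touch the next block early; skip it on the last pass so we never read past end.
    // volatile keeps the compiler from throwing the "useless" load away.
    if (p + 8 != end) { volatile int junk = *(p + 8); }
    // Do stuff with the first eight entries at p.
}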


The dereference of (p+8) causes the CPU to start fetching the next cache line. While that load makes its slow trip out to main memory, the rest of the loop body keeps executing, because nothing depends on the loaded value. By the time the next iteration comes around, the data it needs is already in the cache, and we save some cycles.

BTW, this only works on CPUs with dynamic (out-of-order) microarchitectures, like the Pentium Pro and K6 on up.
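Here is the same touch-ahead idea as a self-contained C++ function. It is a sketch under a few assumptions: the array length is a multiple of eight, an eight-int block (32 bytes) roughly matches the cache line size of Pentium Pro and K6 era parts, and `sum_touch_ahead` is a hypothetical name. The `volatile` store keeps a modern optimizer from deleting the otherwise unused load, and the last block skips the touch so nothing past the end is read.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: sums an array eight ints at a time, touching the next
// block first so its cache line is already on its way in.
long long sum_touch_ahead(const int* data, std::size_t n) {
    assert(n % 8 == 0);  // assumption: length is a multiple of eight
    long long total = 0;
    for (std::size_t i = 0; i < n; i += 8) {
        if (i + 8 < n) {
            // Read one element of the next block. Nothing below depends on
            // it, so an out-of-order CPU keeps working while the miss resolves.
            volatile int junk = data[i + 8];
        }
        for (std::size_t j = 0; j < 8; ++j) {
            total += data[i + j];  // "do stuff" with the current block
        }
    }
    return total;
}
```

Newer compilers also expose non-blocking prefetch hints, such as GCC/Clang's `__builtin_prefetch` and the SSE intrinsic `_mm_prefetch`, which request the cache line without waiting on the value. Whether either approach helps depends on the CPU and the access pattern, so measure it.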

One thing I want to emphasize: don't optimize early, don't optimize if it's not working, profile your code, and obey Amdahl's Law (look it up!)
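Amdahl's Law puts a ceiling on what optimizing one part can buy you: if a fraction p of the run time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). A small C++ helper (the name `amdahl_speedup` is hypothetical) makes the arithmetic concrete, assuming p comes from your profiler.

```cpp
#include <cassert>

// Overall speedup when a fraction p (0..1) of the run time gets s times faster.
double amdahl_speedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

For example, making a routine that takes 10% of the frame ten times faster gives `amdahl_speedup(0.1, 10.0)` = 1 / (0.9 + 0.01), about 1.10, an overall gain of under 10%. Even an infinitely fast version of that routine caps out at 1 / 0.9, about 1.11. That is why profiling comes first.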
