Introduction to Software Optimization

Published August 21, 2014 by Bartlomiej Filipek, posted by Fen

As a software/game developer, you usually want more and more... of everything, actually! More pixels, more triangles, more FPS, more objects on the screen, bots, monsters. Unfortunately, you don't have endless resources, so you end up making compromises. The optimization process can help reduce performance bottlenecks, and it may free up some capacity hidden in the code.

Optimization shouldn't be based on random guesses: "oh, I think, if I rewrite this code to SIMD, the game will run a bit faster". How do you know that "this code" causes a real performance problem? Is investing there a good option? Will it pay off? It would be nice to have a clear guide, a direction.

To get a better understanding of what to improve, you need to establish a baseline for the system/game. In other words, you need to measure the current state of the system and find hot spots and bottlenecks. Then think about the factors you would like to improve... and then... start optimizing the code! Such a process might not be perfect, but at least you will minimize potential errors and maximize the outcome.

Of course, the process will not be finished with only one iteration. Every time you make a change, the process starts from the beginning. Do one small step at a time. Iteratively.

At the end, your game/app should still work (without new bugs, hopefully) and it should run X times faster. The factor X can even be measured accurately if you do the optimization right.

The Software Optimization Process

According to this and this book, the process should look like this:

  1. Benchmark
  2. Find hot spots and bottlenecks
  3. Improve
  4. Test
  5. Go back

Optimization should not wait until the whole implementation is finished (when there is usually no time left for it); it should run throughout the project. In the case of our particle system, I tried to think about possible improvements up front.

1. The benchmark

Having a good benchmark is crucial. If you get it wrong, the whole optimization process can even be a waste of time.

From The Software Optimization Cookbook:

The benchmark is the program or process used to:
  • Objectively evaluate the performance of an application
  • Provide repeatable application behavior for use with performance analysis tools.

The core and required attributes:

  • Repeatable - gives the same results every time you run it.
  • Representative - uses a large portion of the main application's use cases. It would be pointless to focus only on a small part of it. For a game, such a benchmark could include the most common scene or the scene with the maximum number of triangles/objects (that way simpler scenes will also run faster).
  • Easy to run - you don't want to spend hours setting up and running the benchmark. A benchmark is definitely harder to make than a unit test, but it would be nice if it runs as fast as possible. Another point is that it should produce easy to read output: for instance FPS report, timing report, simple logs... but not hundreds of lines of messages from internal subsystems.
  • Verifiable - make sure the benchmark produces valid and meaningful results.
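As a minimal sketch of these attributes, the C++ fragment below times a workload several times and reports the median, so a single noisy run does not distort the result. The names `median_time_ms` and `sum_of_squares` are hypothetical and used only for illustration; the workload is a stand-in with a fixed input and a result that can be checked, not code from the particle system.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: runs `workload` `runs` times and returns the median
// wall-clock time in milliseconds (the upper middle sample for even counts).
template <typename Workload>
double median_time_ms(Workload&& workload, std::size_t runs) {
    assert(runs > 0);
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        workload();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration<double, std::milli>(stop - start).count());
    }
    std::sort(samples.begin(), samples.end());
    return samples[samples.size() / 2];
}

// Stand-in workload: fixed input size (repeatable) and a known
// closed-form result (verifiable).
long long sum_of_squares(int n) {
    long long total = 0;
    for (int i = 1; i <= n; ++i) {
        total += static_cast<long long>(i) * i;
    }
    return total;
}
```

A call such as `median_time_ms([&] { result = sum_of_squares(1000); }, 5)` gives one number that is easy to read and log. Checking `result` afterwards covers the "verifiable" attribute.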

2. Find hot spots and bottlenecks


When you run your benchmark you will get some output. You can also run profiling tools and get more detailed results of how the application is performing.

Having data is one thing, but it is more important to understand it, analyze it, and draw sound conclusions. You need to find the problem that blocks the application from running at full speed.

Just to summarize:

  • bottleneck - a place in the system that makes the whole application slower, like the weakest link in a chain. For instance, you can have a powerful GPU, but without enough memory bandwidth you will not be able to feed this GPU monster with data - it will wait.
  • hot spot - a place in the system that does a crucial, intensive job. If you optimize such a module, the whole system should work faster. For instance, if the CPU is too hot, then maybe offload some work to the GPU (if it has some free compute resources available).

This part may be the hardest. In a simple system it is easy to see a problem, but in large-scale software it can be quite tough. Sometimes it can be only one small function, or the whole design, or some algorithm used.

Usually it is better to use a top-down approach. For example:

Your framerate is too low. Measure your CPU/GPU utilization. Then go to the CPU or GPU side. If CPU: think about your main subsystems: is it the animation module, AI, physics? Or maybe your driver cannot process so many draw calls? If GPU: is it vertex or fragment bound? Go down to the details.
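The first two steps of that descent could be sketched in C++ as follows. This assumes you already have per-frame CPU and GPU times and per-subsystem timings from your own counters or a profiler; the types `FrameTimings` and `SubsystemTime`, the functions `classify` and `slowest`, and the 0.5 ms tolerance are all hypothetical choices for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

enum class Bound { Cpu, Gpu, Balanced };

// Hypothetical per-frame measurements, in milliseconds.
struct FrameTimings {
    double cpu_ms;
    double gpu_ms;
};

// Step 1: decide which side of the system to look at first.
// Differences within `tolerance_ms` are treated as noise.
Bound classify(const FrameTimings& t, double tolerance_ms = 0.5) {
    if (t.cpu_ms > t.gpu_ms + tolerance_ms) return Bound::Cpu;
    if (t.gpu_ms > t.cpu_ms + tolerance_ms) return Bound::Gpu;
    return Bound::Balanced;
}

// Hypothetical per-subsystem timing for one frame.
struct SubsystemTime {
    std::string name;
    double ms;
};

// Step 2: on the chosen side, pick the most expensive subsystem.
SubsystemTime slowest(const std::vector<SubsystemTime>& frame) {
    assert(!frame.empty());
    return *std::max_element(
        frame.begin(), frame.end(),
        [](const SubsystemTime& a, const SubsystemTime& b) {
            return a.ms < b.ms;
        });
}
```

The largest entry is only a starting point for the next level down. It still has to be confirmed with a profiler before any code is rewritten.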

3. Improve


Now the fun part! Improve something and the application should work better :)

What you can improve:

  • at system level - look at utilization of your whole app. Are any resources idle? (CPU or GPU waiting?) Do you use all the cores?
  • at algorithmic level - do you use proper data structures/algorithms? Maybe an O(n) solution can be replaced with an O(log n) one?
  • at micro level - the most fun part, but do it only when the first two levels are satisfied. If you are sure that nothing more can be designed better, you need to use some dirty code tricks to make things faster.
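To illustrate the algorithmic level, here is a small C++ sketch of the O(n) versus O(log n) case: a membership test done first with a linear scan and then with a binary search. The function names are hypothetical, and the second version assumes the data is already kept sorted, which has its own cost if the container changes often.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// O(n): scans every element until the key is found.
bool contains_linear(const std::vector<int>& values, int key) {
    return std::find(values.begin(), values.end(), key) != values.end();
}

// O(log n): binary search, valid only if `sorted` is in ascending order.
bool contains_sorted(const std::vector<int>& sorted, int key) {
    // Debug-only precondition check; compiled out when NDEBUG is defined.
    assert(std::is_sorted(sorted.begin(), sorted.end()));
    return std::binary_search(sorted.begin(), sorted.end(), key);
}
```

Both functions return the same answers on sorted input, so the change can be verified against the original before measuring whether it actually helps.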

One note: instead of rewriting everything in assembly, use your tools first. Today's compilers are powerful optimizers as well. Another issue here is portability: a trick might not work on another platform.

4. Test

After you make a change, test how the system behaves. Did you get a 50% speed increase? Or maybe it is even slower?

Besides performance testing, please make sure you are not breaking anything! I know that making systems 10% faster is nice, but your boss will not be happy if, thanks to this improvement, you introduce several hard-to-find bugs!
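Both checks can be expressed in a few lines of C++. The sketch below is an assumption-laden illustration: `speedup` and `results_match` are hypothetical helpers, the timings are expected to come from the same benchmark before and after the change, and a tolerance is used because an optimized floating-point path rarely reproduces the reference output bit for bit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ratio of old time to new time: 1.5 means a 50% speed increase,
// a value below 1.0 means the change made things slower.
double speedup(double baseline_ms, double optimized_ms) {
    assert(baseline_ms > 0.0 && optimized_ms > 0.0);
    return baseline_ms / optimized_ms;
}

// Compares the optimized output against the reference output
// element by element, within an absolute tolerance.
bool results_match(const std::vector<float>& reference,
                   const std::vector<float>& candidate,
                   float tolerance) {
    if (reference.size() != candidate.size()) return false;
    for (std::size_t i = 0; i < reference.size(); ++i) {
        const float diff = reference[i] > candidate[i]
                               ? reference[i] - candidate[i]
                               : candidate[i] - reference[i];
        // Written as a negation so that NaN values are rejected too.
        if (!(diff <= tolerance)) return false;
    }
    return true;
}
```

For example, a frame that went from 30 ms to 20 ms gives `speedup(30.0, 20.0)` equal to 1.5, but that number only counts if `results_match` still passes.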

5. Go back


After you are sure everything works even better than before... just run your benchmark and repeat the process. It is better to make a small, simple change than a big, complex one. With smaller moves it is harder to make a mistake. Additionally, it is easy to revert the changes.

Profiling Tools

Main methods:

  • custom timers/counters - you can create a separate configuration (based on Release mode) and enable a set of counters or timers. For instance, you can place it in every function in a critical subsystem. You can generate call hierarchy and analyse it further on.
  • instrumentation - the tool adds special fragments of code to your executable so that it can measure the execution process.
  • interception - the tool intercepts API calls (for instance OpenGL - glIntercept, or DirectX) and later analyses the recorded log.
  • sampling - tool stops the application at specific intervals and analyses the function stack. This method is usually much lighter than instrumentation.
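The first method, custom timers/counters, could look roughly like the C++ sketch below: a scoped timer that adds its elapsed time to a named counter when it goes out of scope. `ScopedTimer`, `TimerStats`, `timer_registry` and `update_particles` are hypothetical names, and the global registry is a simplification that is not thread-safe.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated statistics for one named timer.
struct TimerStats {
    double total_ms = 0.0;
    unsigned long calls = 0;
};

// Hypothetical global registry of counters (not thread-safe).
inline std::map<std::string, TimerStats>& timer_registry() {
    static std::map<std::string, TimerStats> registry;
    return registry;
}

// Starts timing on construction and records the elapsed time on destruction.
class ScopedTimer {
public:
    explicit ScopedTimer(std::string name)
        : name_(std::move(name)), start_(std::chrono::steady_clock::now()) {
        assert(!name_.empty());
    }
    ~ScopedTimer() {
        const auto stop = std::chrono::steady_clock::now();
        TimerStats& stats = timer_registry()[name_];
        stats.total_ms +=
            std::chrono::duration<double, std::milli>(stop - start_).count();
        ++stats.calls;
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};

// Example of instrumenting one function in a critical subsystem.
void update_particles() {
    ScopedTimer timer("update_particles");
    volatile double accumulator = 0.0;  // stand-in for real work
    for (int i = 0; i < 1000; ++i) {
        accumulator = accumulator + i * 0.5;
    }
}
```

In a real project the timer would typically be wrapped in a macro that compiles to nothing outside the dedicated profiling configuration, so Release builds pay no cost.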

Below is a list of professional tools that can help:

  • Intel(R) VTune(TM) Amplifier
  • Visual Studio Profiler
  • AMD CodeXL - FREE. AMD created a good, easy-to-use profiling tool for both CPU and GPU. It does the best job when you also have an AMD CPU (which I don't have ;/), but for Intel CPUs it will give you at least timing reports.
  • Valgrind - runs your app on a virtual machine and can detect various problems: from memory leaks to performance issues.
  • gprof - Unix, uses a hybrid of sampling and instrumentation.
  • Lots of others... listed on Wikipedia

Something more

Automate

I probably do not need to write this... but the more you automate the easier your job will be.

This rule applies, nowadays, to almost everything: testing, setup of application, running the application, etc.

Have Fun!

The above process sounds very 'professional' and 'boring'. There is also another factor that plays an important role when optimizing the code: just have fun!

You want to make mistakes, you want to guess what to optimize, and you want to learn new things. In the end, you will still gain some new experience (even if you optimized the wrong method).

You might not have enough time for this at your day job, but what about some hobby project?

The more experience with the optimization process you have, the faster your code can run.

Article Update Log

17th August 2014: Initial version, based on post from Code and Graphics blog

Comments

Aardvajk
Good, accessible overview. Nicely written and structured. Thanks.
August 20, 2014 07:17 PM
Navyman

Well written.

August 20, 2014 11:52 PM
NightCreature83

Are you going to follow this up with an introduction on how to use the profiling tools in a game-related setting?

I am asking this because I have tried CodeAnalyst and VTune before, but always fell back to SN tuner for PS3 because it would actually allow me to do sampling within a frame instead of continuously over time. I have to say that I only used the PC tools a long time ago, and because of their evolution they might now be able to do what I want.

Just had a quick look at the VTune features and this is now indeed capable of detecting frame flushes :). Still might be a good article to have to know how to set this up in the other profilers out there. Seeing we are generally more interested in seeing why a particular frame sequence is slow, instead of seeing where most of our time is spent in the application.

August 21, 2014 11:40 AM
Fen

@all - thanks for your positive feedback!

@NightCreature83 Actually, I will not spend much time writing about 'profiling tools'. The next article will be about some compiler tweaks, then how to use SIMD. So, it is basic stuff. Maybe next time, with some other example.

VTune is awesome, but unfortunately, costs a decent amount of money.

August 27, 2014 08:52 AM
NightCreature83

I found out how you can make VTune recognise frames programmatically, which will allow you to make more sense of the VTune data in the context of your game. https://software.intel.com/en-us/articles/use-itt-frame-apis-from-intel-vtune-amplifier-xe-2011-to-analyze-each-frame-in-critical

September 13, 2014 11:06 AM