Every Game Should Have an In-Code Profiler
In-code profiling has really fallen out of style in recent years. The wide availability of non-intrusive profiling tools really took the steam out of building profiling into your code. For basically zero effort, you can get a full breakdown of where your code is spending time, how much time is being spent (including or excluding children), and all sorts of other neat statistics. Intrusive profiling tools give you even more information: function hit counts and average time spent per hit, for example. Despite all that, these tools are not adequate. Why not?
One of the more common performance bugs is stuttering. Basically, some frames are fast enough, and others are not fast enough. How are you going to diagnose and repair an app that is stuttering? Your profiler output is worthless, because it's gone and averaged the fast frames with the slow ones, destroying information about the slow frames in the process. Moreover, you do not care about the information in fast frames. A frame that slides in under that 16ms boundary is not interesting. Even considering it is a waste of time. It's the slow frames you want to take a look at, and you don't want to average those together either. The fact of the matter is that in order to effectively study what is happening, we need to isolate frames, profile them separately, and discard the results when a frame finishes quickly.
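To make the "isolate frames, keep the slow ones, discard the rest" idea concrete, here is a minimal sketch of what that bookkeeping could look like. Everything in it (`FrameProfiler`, `Sample`, `FrameCapture`, the budget parameter) is a hypothetical name invented for illustration, not an existing library, and it assumes the caller measures the timings itself (for example with `std::chrono::steady_clock`) and passes them in as milliseconds.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration only.
struct Sample {
    std::string name;    // label of the timed scope
    double      millis;  // time spent in that scope
};

struct FrameCapture {
    unsigned long long  frameIndex;   // which frame this was, counted from 0
    double              totalMillis;  // whole-frame time
    std::vector<Sample> samples;      // everything recorded during the frame
};

class FrameProfiler {
public:
    explicit FrameProfiler(double budgetMillis) : budgetMillis_(budgetMillis) {
        assert(budgetMillis > 0.0);
    }

    // Start collecting samples for a new frame.
    void beginFrame() { current_.clear(); }

    // Record one timed scope inside the current frame.
    void record(std::string name, double millis) {
        current_.push_back({std::move(name), millis});
    }

    // Finish the frame. Frames within budget are thrown away;
    // frames over budget are kept whole, never averaged.
    // Returns true if the frame was kept.
    bool endFrame(double frameMillis) {
        const unsigned long long index = frameCount_++;
        if (frameMillis <= budgetMillis_) {
            current_.clear();
            return false;
        }
        slow_.push_back({index, frameMillis, std::move(current_)});
        current_.clear();  // leave the moved-from vector in a known state
        return true;
    }

    const std::vector<FrameCapture>& slowFrames() const { return slow_; }
    unsigned long long frameCount() const { return frameCount_; }

private:
    double                    budgetMillis_;
    unsigned long long        frameCount_ = 0;
    std::vector<Sample>       current_;
    std::vector<FrameCapture> slow_;
};
```

With a budget of 16.0, a 10ms frame is dropped on the floor and a 35ms frame is retained along with every sample recorded inside it, which is exactly the information an averaging profiler destroys.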
While there's no technical reason a profiler couldn't provide an API to support this type of usage pattern, none that I'm aware of actually does so. VTune allows you to pause and resume sampling, but that doesn't really help, because you can't throw away results from a frame, or isolate frames. And good luck finding a profiler that supports the necessary API calls on every platform you need to target. It's clear, then, that an in-code profiler is a necessity for effectively tuning a game for performance.
It's common to use the excellent memory tracker written by Paul Nettle to find memory leaks in C++ code. Unfortunately, I was unable to find a similar library for in-code profiling. (If any of you know one, please comment. [EDIT] I'm told that Game Programming Gems has one by Scott Bilas.) I don't think it would've mattered if I found one, though; I do not think the fine coders at CodeProject are likely to write one that is adequate. What defines adequate? Well, this is a very game-centric bit of coding, so it's important to be aware of the complexities of a modern game:
* As I mentioned before, we need frame-to-frame measurements, with the ability to discard uninteresting data and keep the rest. This is also a lot of data. Simply vomiting it out to a text file is not good enough, unless it's a format that can be parsed into a more effective system (a database maybe).
* The definition of "frame" differs depending on what part of the game you're looking at. We need to be able to profile gameplay, graphics, physics, etc. separately, since they will frequently be running at different frequencies. Besides, it's helpful to be able to see a per-subsystem breakdown of where time's being spent at a high level. The reporting needs to support this as well.
* We're threading games heavily now. The profiler needs to be thread safe and thread aware, and it shouldn't mix the results from each thread together. We also have to consider that the same subsystem may well be using multiple threads, which adds another bit of complexity to our reporting. In other words, every function call that is recorded needs to be tagged with both its thread and its subsystem.
* You want to collect results during QA and playtesting as well. Writing to a hardcoded C:\perf.log file isn't anywhere near good enough here. We need support for sending the results over a network to servers that can accumulate and parse the data.
* Data is needed over extended sessions as well (especially after the game's been running a couple of hours and your heap is getting a little awkward), and depending on the format of the game, you may want to split up the data depending on what level/map is running.
* Related to the above, a timestamp relative to when the game started is necessary, both as a frame count and as a human time.
* For extreme sophistication, you might want a sampling based profiler included that can look up symbols or map files. That will allow you to diagnose where time is being spent even outside your code, mainly in Windows components. (Mixing this in with an intrusive stack tracing profiler could be complicated though.)
I'm sure there's more (comment!), but those are the ones that immediately come to mind. Some of this stuff is pretty high end; most indies aren't going to need network-based reporting so that they can get perf data from remote test labs. It's the sort of thing I like to keep in mind though. Still, I think the absolute basics should be built into everyone's code from the beginning, rather than being retrofitted in. I've seen enough posts on these forums by people trying to figure out why their game is stuttering or otherwise slow.
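As a rough idea of what those absolute basics might look like, here is a sketch of a scoped timer whose records are tagged with both thread and subsystem and stamped relative to when the profiler was created. All of the names (`Profiler`, `ScopedTimer`, `Record`, the `Subsystem` enum) are made up for this example. It also assumes a single mutex-guarded vector is acceptable, which is the simplest thing that is thread safe; a serious implementation would more likely want per-thread buffers to keep the profiler's own overhead down.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical subsystem tags; a real game would define its own set.
enum class Subsystem { Gameplay, Graphics, Physics };

struct Record {
    const char*     name;            // label of the timed scope
    Subsystem       subsystem;       // which subsystem the scope belongs to
    std::thread::id thread;          // thread that ran the scope
    double          startMillis;     // start time relative to profiler creation
    double          durationMillis;  // time spent in the scope
};

class Profiler {
public:
    using Clock = std::chrono::steady_clock;

    Profiler() : start_(Clock::now()) {}

    // Called by ScopedTimer on the thread that ran the scope.
    void submit(const char* name, Subsystem subsystem,
                Clock::time_point begin, Clock::time_point end) {
        assert(end >= begin);
        using Ms = std::chrono::duration<double, std::milli>;
        Record r{name, subsystem, std::this_thread::get_id(),
                 Ms(begin - start_).count(), Ms(end - begin).count()};
        std::lock_guard<std::mutex> lock(mutex_);  // simple, not fast
        records_.push_back(r);
    }

    // Copy of everything recorded so far, safe to inspect off-thread.
    std::vector<Record> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return records_;
    }

private:
    Clock::time_point   start_;
    mutable std::mutex  mutex_;
    std::vector<Record> records_;
};

// RAII helper: times the enclosing scope and reports it on destruction.
class ScopedTimer {
public:
    ScopedTimer(Profiler& profiler, const char* name, Subsystem subsystem)
        : profiler_(profiler), name_(name), subsystem_(subsystem),
          begin_(Profiler::Clock::now()) {}

    ~ScopedTimer() {
        profiler_.submit(name_, subsystem_, begin_, Profiler::Clock::now());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    Profiler&                   profiler_;
    const char*                 name_;
    Subsystem                   subsystem_;
    Profiler::Clock::time_point begin_;
};
```

Usage is one line at the top of whatever you want measured, e.g. `ScopedTimer t(profiler, "StepWorld", Subsystem::Physics);`. Because every record carries its thread and subsystem, the reporting side can group by either one without the instrumented code having to care.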
This is actually sort of a new revelation for me. Up until recently, I was perfectly happy to use the conventional profiling tools. Then it came to actually analyzing performance at work, and suddenly it hit me like a ton of bricks. Conventional profiling is practically useless, because of the averaging effect. How on earth are you going to find out what made some arbitrary frame slow? What if it's only one frame out of every hundred that is off? You're completely in the dark with something like VTune. In-code is really the only way to go.