
Worst profiler results

Started by January 22, 2014 01:39 AM
9 comments, last by frob 10 years, 9 months ago

Question is pretty straight-forward:

You just finished writing a bit of code, and it doesn't quite meet your performance demands, so you run it through a profiler to find out where you need to trim fat. What are you really hoping doesn't show up in your profiler results?

For me, I just got the result that memory allocation is accounting for roughly 75% of my runtime. This really is one of those things I hope not to see, mostly because it's the kind of problem that can't be addressed in one place, and is a sign that everything, everywhere, needs to change.

I just got the result that memory allocation is accounting for roughly 75% of my runtime.

Dynamic memory allocations are expensive. Heap fragmentation is devastating. Analyze your objects' lifetimes carefully. Use the stack where it makes sense (whenever an object's lifetime is FILO ordered), object pooling where it makes sense, and avoid frequent heap allocations by reserving the size of containers before using them.
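
A minimal sketch of the pooling idea (the class and names below are purely illustrative, not anything from the thread): objects are allocated once up front, and acquiring or releasing one is just a pointer swap, so no heap traffic happens per frame.

    #include <cstddef>
    #include <vector>

    // Fixed-size object pool sketch. All storage is allocated once in the
    // constructor; acquire()/release() only move pointers on a free list.
    template <typename T, std::size_t N>
    class ObjectPool {
    public:
        ObjectPool() : storage_(N) {
            freeList_.reserve(N);                    // reserve up front, never reallocates
            for (std::size_t i = 0; i < N; ++i)
                freeList_.push_back(&storage_[i]);
        }

        T* acquire() {
            if (freeList_.empty()) return nullptr;   // pool exhausted; caller decides policy
            T* obj = freeList_.back();
            freeList_.pop_back();
            return obj;
        }

        void release(T* obj) {
            *obj = T{};                              // reset to a default state for reuse
            freeList_.push_back(obj);
        }

    private:
        std::vector<T> storage_;    // contiguous backing memory, allocated once
        std::vector<T*> freeList_;  // objects currently available
    };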

I have one fear when profiling. I fear that when I run it, nothing interesting shows up. No hot spots, no obvious mistakes, no clear points to tackle. Just a nice even trace with lots of things taking up small amounts of time to do small things. Because optimizing that is the stuff of nightmares.

Also, any time I see large amounts of time spent in middleware, or especially in the graphics driver, I know it's likely to be hell.

SlimDX | Ventspace Blog | Twitter | Diverse teams make better games. I am currently hiring capable C++ engine developers in Baltimore, MD.


You just finished writing a bit of code, and it doesn't quite meet your performance demands, so you run it through a profiler to find out where you need to trim fat. What are you really hoping doesn't show up in your profiler results?
FifoFullCallback :(

You just finished writing a bit of code, and it doesn't quite meet your performance demands, so you run it through a profiler to find out where you need to trim fat. What are you really hoping doesn't show up in your profiler results?

Seeing nothing big is a serious one for me. I've done a lot of profiling and instrumenting of code, especially on some code bases destined for 20MHz and 66MHz processor systems. You can pick out the low hanging fruit pretty quickly.

What is devastating to me is finding hundreds or even thousands of tiny things, each with a cost that is only slightly too high, and each requiring a manual effort to fix.


For me, I just got the result that memory allocation is accounting for roughly 75% of my runtime. This really is one of those things I hope not to see, mostly because it's the kind of problem that can't be addressed in one place, and is a sign that everything, everywhere, needs to change.

That one is usually pretty simple. Count yourself lucky.

Assuming your profiler lets you sort by caller, you can usually navigate quickly up the tree to find a small number of serious offenders. If the profiler itself doesn't do that (most do), then export your data into a big spreadsheet with calling data, build an Excel pivot table, and play with it until you discover the offenders.

If this is the first pass through the code base there are a few common patterns. One is not reserving space in dynamic arrays (such as std::vector) and instead simply adding items one at a time, causing a large number of reallocations. Usually these are evidenced by a brief stutter as the system drops a frame or two doing twenty thousand allocations. Another is the frequent creation of temporary objects; since it is a performance concern it is likely happening in a relatively tight loop, so it is probably just a single location where you need to adjust object lifetimes. It may be a resource that is frequently created and destroyed where a cache or persistent buffer would help. Or it could be a memory leak, premature object release and re-allocation, or a similar problem.
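
As a concrete illustration of that first pattern (the struct and function names here are made up for the example), compare filling a std::vector with and without reserving:

    #include <cstddef>
    #include <vector>

    struct Particle { float x, y, z; };

    // Without reserve(): the vector grows geometrically, so filling it triggers
    // a series of reallocations plus copies/moves of everything already stored.
    std::vector<Particle> fillNaive(std::size_t count) {
        std::vector<Particle> particles;
        for (std::size_t i = 0; i < count; ++i)
            particles.push_back(Particle{0.0f, 0.0f, 0.0f});
        return particles;
    }

    // With reserve(): one allocation up front, then every push_back just
    // constructs into memory the vector already owns.
    std::vector<Particle> fillReserved(std::size_t count) {
        std::vector<Particle> particles;
        particles.reserve(count);                    // single allocation
        for (std::size_t i = 0; i < count; ++i)
            particles.push_back(Particle{0.0f, 0.0f, 0.0f});
        return particles;
    }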

All of those problems can be quickly identified with a good profiler. With 75% of your time spent in the allocator it should be glaringly obvious from the profile which functions need examination.

Man, allocation is one of my favorite profiler results, because it's basically trivial to change to better allocation strategies without rewriting a ton of code. You know exactly what you need to hit and you should ideally also know its usage patterns well enough to know immediately how to pick a better allocation scheme. It's like free performance.
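
One common example of such a scheme in game code is a per-frame linear ("bump") allocator; the sketch below is just an illustration of that idea, not something described in the thread:

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Per-frame linear allocator sketch. Everything allocated during a frame
    // comes from one pre-allocated block; reset() at the end of the frame
    // releases it all at once, so there is no per-object free and no fragmentation.
    class FrameArena {
    public:
        explicit FrameArena(std::size_t capacity) : buffer_(capacity), offset_(0) {}

        void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
            std::size_t aligned = (offset_ + align - 1) & ~(align - 1); // align must be a power of two
            if (aligned + size > buffer_.size()) return nullptr;        // arena exhausted
            offset_ = aligned + size;
            return buffer_.data() + aligned;
        }

        void reset() { offset_ = 0; }  // call once per frame; invalidates all arena pointers

    private:
        std::vector<std::uint8_t> buffer_;  // single up-front allocation
        std::size_t offset_;                // current bump position
    };

Anything placement-new'd into the arena either needs to be trivially destructible or must be destroyed manually before reset().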

Algorithmic improvements are a little worse, because they require hitting more code to improve; but they usually are only painful to me in the sense that they mean I made a dumb implementation choice up-front.

Past that is micro-optimization, where I have to do fiddly stupid things to try and squeeze out a few thousand cycles here and there, deep in some hot inner loop or something.

But I'm in agreement with earlier posters in that seeing nothing is by far the worst. A similar cousin is seeing only calls that block inside the kernel, such as waiting on mutexes in a multithreaded program. Seeing only blocking calls/suspended threads means you're going to have a nasty time finding the actual performance problem, because your wall-clock performance is dominated by not doing anything.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]


I agree, seeing nothing would be worse.

The reason that I hate seeing 'new' float to the top of stuff is mostly because it almost guarantees having to make a bunch of tiny tweaks in a bunch of tiny places to implement either a pooling mechanism, or a more intelligent copy-by-value mechanism for composite classes. The second one is what's currently killing me. It either ends up meaning a few high-level structural changes, or a whole bunch of small and annoying changes in a lot of places.
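
For the second case, one common approach (sketched below with made-up names, not the poster's actual code) is copy-on-write: the composite class shares its heavy payload when copied and only deep-copies it when someone writes to a shared instance.

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Copy-on-write sketch: copying a Mesh just bumps a reference count;
    // the vertex buffer is only duplicated on the first write to a shared copy.
    class Mesh {
    public:
        explicit Mesh(std::vector<float> vertices)
            : data_(std::make_shared<std::vector<float>>(std::move(vertices))) {}

        float vertex(std::size_t i) const { return (*data_)[i]; }

        void setVertex(std::size_t i, float v) {
            if (data_.use_count() > 1)  // payload is shared: deep copy before writing
                data_ = std::make_shared<std::vector<float>>(*data_);
            (*data_)[i] = v;
        }

    private:
        std::shared_ptr<std::vector<float>> data_;  // shared vertex payload
    };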

Normally my first pass on code is written to the algorithmic complexity of the problem at hand, without taking performance into much consideration beyond that. Seeing "new" at the top of my profiler hot-spot list is pretty much the computer telling me "no", and demanding that I now mire myself in the fine-grained performance details. Admittedly, there are worse things the profiler can tell me, but this is always an annoying one that tends to mean far-reaching changes.

Yes though, seeing nothing is definitely worse. I once got a profiler result that didn't have a single hit over 0.1% without counting children in the call graph. That one turned out to be not so bad, because there were some very high-level changes that could be made that had a big impact, but it wasn't obvious from the profiler results.

Sounds like some others are working on embedded systems, in which case the profiler can't hurt you any more than you're already used to being hurt on a daily basis. Embedded systems are so painful in general. I did some work on FPGA/CPU hybrid systems a while back, and it took so much work to get even the smallest things done.

If you're using "new" that heavily, you're probably doing something wrong.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

There are three situations I fear:

  1. The one mentioned: Lots of small things adding up (aka nothing "big" to focus on). This is by far fear #1
  2. Everything is a disaster: There are so many "big" things, the code is so badly written it's just better to rewrite from scratch. It's very similar to #1 (if everything's big, nothing's big). But the situation is so bad, it needs to be put in its own category.
  3. The "big" things can't be optimized further: Something big is showing up on the profiler, but it is already cache friendly, SIMD optimized, and multithreaded. The reason it's showing up is because... well, it's the nature of the job being performed. There's just too much data to process. The only solution is to search for other algorithmic optimizations, but you have a hard time thinking of anything better than the algo being used. Fortunately this one is really hard to happen, because rarely I see code so well written and designed. Where I see this problem the most is in emulators.
I'd agree with all the 'nothing' and 'lots of small things' points above, but I'd also like to add one more to the mix: something that you caused ;)

This topic is closed to new replies.
