On 5/5/2019 at 11:10 AM, ahmadierfan99 said:
Which Hardware(s) did you run your app on and how important was that?
I ask this because, from your previous replies, I understood that an algorithm's performance and its superiority over other algorithms heavily depend on the hardware.
The final hardware usually matters far less than people expect. Most of the things people fear turn out to be unimportant for performance. People spend a lot of effort worrying that an algorithm won't work well on some specific system, but in practice it usually isn't an issue at all.
But sometimes the hardware is critically important. Sometimes there are things on the critical performance path and one system happens to have radically different performance than the others. You can ONLY determine that with careful measurements. It usually cannot be figured out in advance.
Some properties of algorithms do matter in general. First, not doing work is faster than doing work, and doing less work is generally faster than doing more work. Spending a small amount of work up front to prune out an enormous amount of work later is generally a good idea. Said differently: don't be stupid. If you have a choice between a library quicksort and rolling your own bubble sort, use the quicksort. Don't pick known pessimizations or repeat known mistakes.
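As a rough sketch of both ideas, assuming a made-up Entity type and placeholder numbers: a cheap squared-distance check prunes most of the work before anything expensive happens, and the sorting is left to the standard library instead of a hand-rolled bubble sort.

```cpp
// Sketch only: the Entity type, field names, and radius are made up for the example.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Entity {
    float x = 0.0f, y = 0.0f;
    float priority = 0.0f;
};

// Cheap squared-distance test prunes most entities before any expensive work;
// no sqrt is needed for a pure comparison.
std::vector<Entity*> gatherNearby(std::vector<Entity>& all, float cx, float cy, float radius)
{
    std::vector<Entity*> result;
    const float r2 = radius * radius;
    for (Entity& e : all) {
        const float dx = e.x - cx;
        const float dy = e.y - cy;
        if (dx * dx + dy * dy <= r2)
            result.push_back(&e);
    }

    // Let the library do the sorting (introsort, O(n log n)) instead of a
    // hand-rolled bubble sort.
    std::sort(result.begin(), result.end(),
              [](const Entity* a, const Entity* b) { return a->priority > b->priority; });
    return result;
}

int main()
{
    std::vector<Entity> world(1000);   // placeholder data
    auto nearby = gatherNearby(world, 0.0f, 0.0f, 10.0f);
    std::printf("%zu entities survived the cheap prune\n", nearby.size());
    return 0;
}
```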
On 5/5/2019 at 11:10 AM, ahmadierfan99 said:
How did you budget your "subsystem's updates"?
You already know how long you've got in total. If your target is 60 frames per second from the old television standards (which are long outdated), you get roughly 16.7 milliseconds per frame, which most people budget down to about 15 milliseconds to leave some headroom. If your target is 72 or 75 or 120 or 144 or even the 288 Hz that some high-end gamer displays are pushing, you can calculate how many microseconds you've got for your target.
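For concreteness, the arithmetic is just 1000 milliseconds divided by the refresh rate; here is a tiny sketch that prints the budget for the rates mentioned above:

```cpp
// Sketch: per-frame time budget for a few refresh rates.
#include <cstdio>

int main()
{
    const double rates[] = { 60.0, 72.0, 75.0, 120.0, 144.0, 288.0 };
    for (double hz : rates) {
        const double budgetMs = 1000.0 / hz;        // total frame time in milliseconds
        const double budgetUs = 1'000'000.0 / hz;   // same thing in microseconds
        std::printf("%6.1f Hz -> %7.3f ms (%9.1f us) per frame\n", hz, budgetMs, budgetUs);
    }
    return 0;
}
```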
From there, start with a measurement of where it is now. If you run all your updates in a single thread, you know that simulation + rendering + networking + audio + everything else must fit inside the 15-millisecond target. If you measure today and see that simulation is taking 4 milliseconds, rendering is peaking at 6 milliseconds, audio is peaking at 1 millisecond, and networking is peaking at 3 milliseconds, that's 4 + 6 + 1 + 3 = 14 milliseconds, so you know you're at risk of blowing a 15-millisecond target if everything peaks on the same frame. And if the combined times come in at 6 or 7 or 10 milliseconds instead, make those measured numbers your limits.
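A minimal measurement sketch along those lines, assuming hypothetical subsystem update functions (the function names and the 15 ms figure are placeholders, not anyone's real engine code):

```cpp
// Sketch only: times each (hypothetical) subsystem update and flags frames
// that exceed the budget. Real engines use their own profilers, but the idea
// is the same: measure first, then set limits from the measurements.
#include <chrono>
#include <cstdio>

using Clock = std::chrono::steady_clock;

// Stand-ins for the real subsystem updates.
void updateSimulation() {}
void updateRendering()  {}
void updateAudio()      {}
void updateNetworking() {}

double timeMs(void (*update)())
{
    const auto start = Clock::now();
    update();
    const auto end = Clock::now();
    return std::chrono::duration<double, std::milli>(end - start).count();
}

int main()
{
    const double frameBudgetMs = 15.0;   // chosen target, not a law of nature

    const double sim   = timeMs(updateSimulation);
    const double rend  = timeMs(updateRendering);
    const double audio = timeMs(updateAudio);
    const double net   = timeMs(updateNetworking);
    const double total = sim + rend + audio + net;

    std::printf("sim %.2f  render %.2f  audio %.2f  net %.2f  total %.2f ms\n",
                sim, rend, audio, net, total);
    if (total > frameBudgetMs)
        std::printf("over budget by %.2f ms\n", total - frameBudgetMs);
    return 0;
}
```

In practice you would accumulate these every frame and track the peaks, since it's the worst frame, not the average, that causes a visible hitch.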
Over time as you gain experience, you learn that one system or another will become slow. Pathfinding has the potential to be slow. Many simulation steps like physics updates have the potential to be slow. Measure and monitor, then make decisions based on the details of that specific game.
On 5/5/2019 at 11:10 AM, ahmadierfan99 said:
Was there a time when you had to refer to your code's compiled assembly, or to CPU instruction cycles on your specific hardware, to see the actual or estimated time needed to do a specific job?
Fixed CPU instruction timings haven't been a meaningful thing on modern hardware for about two decades. There are too many interdependencies with the surrounding code. Multiprocessing context switches, hardware parallelism, hyperthreading, out-of-order cores, and long CPU pipelines all combine to make instruction counting nearly useless.
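One way to see this for yourself: the memory system alone can dominate. The two loops below execute essentially the same instructions, but the shuffled traversal defeats the caches and prefetcher, so on typical modern hardware it runs several times slower. This is a rough demonstration, not a rigorous benchmark.

```cpp
// Rough demonstration, not a rigorous benchmark: both loops sum the same
// ~16M integers with essentially the same instruction count, but the shuffled
// order defeats the caches and prefetcher and usually runs several times slower.
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main()
{
    const std::size_t n = std::size_t{1} << 24;   // ~16M elements, ~64 MB of data
    std::vector<int> data(n, 1);

    std::vector<std::uint32_t> sequential(n);
    std::iota(sequential.begin(), sequential.end(), std::uint32_t{0});
    std::vector<std::uint32_t> shuffled = sequential;
    std::shuffle(shuffled.begin(), shuffled.end(), std::mt19937{42});

    auto run = [&](const std::vector<std::uint32_t>& order) {
        const auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::uint32_t i : order)
            sum += data[i];
        const auto end = std::chrono::steady_clock::now();
        return std::make_pair(sum, std::chrono::duration<double, std::milli>(end - start).count());
    };

    const auto [sumA, sequentialMs] = run(sequential);
    const auto [sumB, shuffledMs]   = run(shuffled);
    std::printf("sequential: %.1f ms   shuffled: %.1f ms   (checksums %lld / %lld)\n",
                sequentialMs, shuffledMs, sumA, sumB);
    return 0;
}
```

The exact ratio depends on the machine, which is exactly the point: you find out by measuring, not by counting instructions.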
Yes, there are times when people need to look at the raw assembly code. It is fairly rare, getting rarer every day, but it still happens. It usually isn't necessary until you hit very large professional games trying to fine-tune a specific segment of code. It is best to swap in better algorithms first, but sometimes you hit the point where there are no better algorithms and you need to hand-tune the details even further.
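If you do reach that point, the usual starting places are the compiler's own assembly output, a disassembler, or Compiler Explorer. A tiny example, with the commands in comments (the file name is made up for the example):

```cpp
// Sketch: a small function worth inspecting, plus (in comments) the usual ways
// to look at what the compiler actually generated for it.
//
//   g++ -O2 -S hot_path.cpp -o hot_path.s   # emit assembly from GCC/Clang
//   objdump -d -C hot_path.o                # disassemble a compiled object file
//
// Compiler Explorer (godbolt.org) does the same thing interactively.
float dot3(const float a[3], const float b[3])
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}
```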
Normally there are better options: different algorithm choices and different broad design decisions that are more effective at finding performance gains.