Reading through your discussion points suggests to me that you are optimizing the wrong problem in general. atan2 should not be on your list of problems; more than likely the real issue is simply that you haven't stepped back and looked at how often you are calling various functions at a higher level. That is, the quote "so far I've gone to selecting a target (which requires a range check against every active target) once every 5 seconds" should draw your scrutiny, and atan2 in general should be ignored. What you probably want to be looking at instead is how to remove calls to the range check in the first place.
The first thing that jumps out at me in that quote is that it describes a very bad "O" notation processing-time curve. Your curve is basically number of calls versus number of objects: 2 objects means 2 calls (i.e. distance from me to you and the inverse), 3 objects means 6 calls, 4 objects means 12, 5 objects means 20 calls, etc. That's an O(n^2 - n) pattern, which is very bad. By updating less frequently you are reducing the cost only by a constant factor, so no matter what you do, at some point the number of calls still drags everything to a standstill. What you really want is something that gets rid of the n^2 portion of this problem, so you are only calling the code in question once in a while, or preferably not at all.
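To make the call-count growth concrete, here is a minimal sketch of the naive all-pairs targeting loop described above (`rangeCheckCalls` is a hypothetical name used only for illustration); counting the iterations reproduces the 2/6/12/20 sequence, i.e. n*(n-1) checks per update:

```cpp
#include <cassert>
#include <cstddef>

// Naive targeting: every object range-checks every other active object.
// The number of distance checks per update is n*(n-1), i.e. O(n^2 - n).
std::size_t rangeCheckCalls(std::size_t n) {
    std::size_t calls = 0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            if (i != j)
                ++calls;  // one "distance from i to j" check
    return calls;
}
```

Doubling the object count roughly quadruples the work, which is why lowering the update rate (dividing by a constant) only delays the stall rather than preventing it.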
Before I suggest my solution, I'll make a specific point:
atan2 is not really evil; it's not even that slow unless you call it in massive amounts. What is more evil is that it is often a function where you are getting a cache miss, since it is used to figure out an angle between this object and another object, and that other object is unlikely to be in cache at the moment. A profiler will often glob the cache miss up into the atan2 call, because the compiler will often make the actual "access" to the other object while setting up the stack prior to calling atan2. Usually this means you are seeing the cost of a cache miss included in the atan2 call and misinterpreting what is actually slow. If you see atan2 in your profiles, check the call count: if it is very high (per frame, that is) then the cost is likely real; if it is a couple hundred, you are likely looking at a cache-miss problem and not a function-call problem. Profiling is critical, but reading "between the lines" is exceptionally important, since cache misses are more of a black art to figure out than gospel from any profiler or guess you can make.
Anyway, my suggestion is to stop optimizing things the way you are, because your first goal should be to reduce the big-O issue (i.e. the n^2 portion of the call count I described above). You also want to reduce cache misses, which again suggests you need to step back and work the problem at a higher level. That higher level would be an SAS (spatial awareness system), which works over all objects and only triggers your AI bits as actual "entry/exit" events happen. An SAS is optimized to figure out the distances between objects and fire events only when an object properly enters or leaves a given range, without going n^2 in processing time or constantly incurring the dreaded cache misses.
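To show what "only trigger the AI on entry/exit events" means in practice, here is a hypothetical sketch of the event-facing side of such a system (`AwarenessBubble`, `onEnter`, `onExit`, and `update` are all made-up names, not an API from the article linked below). The key idea is that the AI only hears about *transitions*, not about every range check:

```cpp
#include <functional>
#include <set>

// Hypothetical sketch: the SAS tracks which target ids are inside an
// awareness radius and fires onEnter/onExit only on the frame a target
// crosses the boundary. The AI reacts to events instead of re-scanning
// every object every update.
struct AwarenessBubble {
    std::set<int> inside;                     // targets currently in range
    std::function<void(int)> onEnter, onExit; // AI callbacks

    // Called by the SAS once it has determined a target's range status.
    void update(int id, bool inRange) {
        if (inRange && inside.insert(id).second && onEnter)
            onEnter(id);                      // transition: out -> in
        else if (!inRange && inside.erase(id) > 0 && onExit)
            onExit(id);                       // transition: in -> out
    }
};
```

Repeated updates with the same status fire nothing, so expensive reactions (target selection, atan2, pathing) run only on the handful of frames where something actually changes.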
You might want to check out http://codesuppository.blogspot.com/2008/03/spatial-awareness-system.html for an overall description of several variations. I tend to use a sweep-and-prune method myself, since flocking is part of my simulation; while it is slower overall, it doesn't get the big performance spikes other solutions show when many objects move together.
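For a flavor of how sweep-and-prune avoids the all-pairs test, here is a minimal single-axis sketch (my own illustration, not code from the linked article; `Interval` and `overlappingPairs` are hypothetical names). Sorting objects by their low endpoint and sweeping means each object is compared only against the intervals still "open" at the sweep point:

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// One object's extent projected onto a single axis:
// [center - radius, center + radius].
struct Interval { int id; float lo, hi; };

// 1D sweep-and-prune: returns all pairs whose intervals overlap on this
// axis, without testing every object against every other object.
std::vector<std::pair<int, int>> overlappingPairs(std::vector<Interval> objs) {
    // Sort by low endpoint; this is the "sweep" order.
    std::sort(objs.begin(), objs.end(),
              [](const Interval& a, const Interval& b) { return a.lo < b.lo; });

    std::vector<std::pair<int, int>> pairs;
    std::vector<Interval> active;  // intervals still open at the sweep point
    for (const Interval& cur : objs) {
        // Prune intervals that ended before this one starts.
        active.erase(std::remove_if(active.begin(), active.end(),
                         [&](const Interval& a) { return a.hi < cur.lo; }),
                     active.end());
        // Everything still active overlaps cur on this axis.
        for (const Interval& a : active)
            pairs.emplace_back(a.id, cur.id);
        active.push_back(cur);
    }
    return pairs;
}
```

A real implementation sweeps all three axes and keeps the sorted lists persistent between frames (objects move only a little per frame, so re-sorting is nearly free), which is also where the "many objects moving together" spike the other schemes suffer gets smoothed out.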
Given the problems you seem to be facing, though, working the problem at a higher level seems to be required; the current direction is not likely to be good enough without sacrificing a lot of gameplay. My personal unoptimized single-core version deals with 5k objects at 500fps easily in debug, while release is 1700+ (STL usage in debug sucks). Given something like this in your game, you would simply add an awareness bubble around the character and, if there are any "entry" events for enemies, go back to normal update speed. No more n^2 issues, and optimizing small, repeatedly called functions won't be a big issue anymore.