Advertisement

AI speedups for running game in accelerated time

Started by March 01, 2013 05:11 PM
27 comments, last by Norman Barrows 11 years, 7 months ago

Possibly 900X 5400X as long as you can cut down the rendering (and the question then will be how frequently you should or what events warrant an update)

900x = one frame drawn per game minute.

5400x = one frame drawn per game hour.

If you have heavyweight AI taking up the majority of the processing time then you may be stuck if you cannot abstract the solutions the AI finds and uses well enough to preserve a valid simulation.

If you have heavyweight AI taking up the majority of the processing time then you may be stuck if you cannot abstract the solutions the AI finds and uses well enough to preserve a valid simulation.

Well, its not really heavyweight, randomly change states (flock, wander, stand), turn, move. occasional heading calculations for turning (i had to go to once per second in the attack AI's to get it responsive enough), and of course, collision checks when moving.

Changing state once every 5 seconds vs once per second didn't do much.

Oh and your optimization attempts at a low level - I assume you have already done more general optimiztions like streamlining object create/destruct functions (particularly use of object pools to get rid of the horrendous alloc/dealloc processing costs).....

In the test case, no targets are being added or removed, there's just 2 dozen of them active on the screen mulling around while you watch 4 hours of game time time go by in seven seconds (at 900x).

all targets are stored in a static array of structs, so no malloc's or free's to slow things down or cause a possible memory leak.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Task queues to get rid of polling loops??

windows message pump is the closest thing to a polling loop active during the test case.

the game does a lot of simulation in the background. i may need to time it with and without critters active. I may be down to the point where active critters are not such a bottleneck compared to the overall overhead of everything else being simulated.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Advertisement
Given the amount of speculation you're doing on what's slow, I'm still rather shocked you haven't bothered using a profiler yet. Seriously, give it a go, you might be surprised how much you discover.

Wielder of the Sacred Wands
[Work - ArenaNet] [Epoch Language] [Scribblings]

Given the amount of speculation you're doing on what's slow, I'm still rather shocked you haven't bothered using a profiler yet. Seriously, give it a go, you might be surprised how much you discover.

Yes, at this point all the obvious stuff has been done, and it time for divide and conquer with the timers if i want to do more. but the most recent tests show the game running at about 917x normal speed, when running at "900x" speed. Note that 900x is a theoretical approximate speed. IE its only drawing the screen once every 900 "game turns". And inspection of the code shows a relatively even distribution of relatively fast code. So the results will most likely be that i'm spending a little time everywhere. The first thing to test is time with critters active vs time without. that will determine whether its the AI of the active animals, or the background modeling being done all the time. then its time for the timers to determine more.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php


Given that it runs at over 100fps without rendering and the code is experimental and as such not highly optimized, that should give you an idea of how well the SAS system can help you out in the future with such problems if you do actually run into them.

I can see how a system like this would probably be required for persistent NPC's that go about their business whether the player is there or not, like the dogfights occurring all over the front in Red Baron II. When you went on a mission, it modeled the activity of every airborne plane on the entire front.

in my case, you only have to worry about the awareness bubble of band members the player controls (max 10 band members). the simulation adds and removes targets from the simulation as they move into and out of the awareness bubbles of the player's band members. They are added when a band member encounters them, either by random encounter,or by moving withing range of some objective, such as an occupied cave. The are removed when they are no longer in any band member's "bubble". checks for removal are done once per second.

The way I deal with this is very simple but it leverages the SAS considerably expecting it to have done all the heavy lifting. This is the basic idea:


class somedoodad : public Go::Object
{
  somedoodad()
  :  mUpdate( std::bind( &somedoodad::Update, this ) )
  {....  this is only used for serialization since member func pointers don't serialize ....}
 
public:
  somedoodad( Go::Manager& gm )
  : Go::Object( gm )
  , mUpdate( gm, std::bind( &somedoodad::Update, this ), World::Priorities::kActiveGo )
  {
    // Go ahead and activate this object, this will bind the update and start calling it
    // in the multicore engine.
    Manager().Activate( this );
  }
 
  void Update()
  {
    if( GetComponent< Go::Awareness >()->Inverse().size()==0 )
    {
       // Oh noes, time to die.
       Manager().Destroy( this );
       return;
    }
 
    // Do normal realtime update when in view of other objects.
  }
};

Simple, and the heavy lifting is performed in the SAS as mentioned. Performance wise, the combination of the object/component system itself and leveraging the SAS solution allows an unoptimized (it's a rewrite leveraging C++11 as you can see) simulation engine to run 500-1000 "active" objects to be updated at well over 500fps when graphics/sound items are disabled. So, again, the SAS solution does most of the work and my current variation has plenty of room for more specific optimization and multi-core work which can supply an easy 2-3 times performance increase overall without nit-picking details.

I do agree with Apoch though, performing a real profiling pass would refine the issue considerably. I personally never optimize anything without a profile in hand to tell me what specifically started slowing things down. Currently though, the only low level optimization I have in my rewrite is the SIMD math library which I moved over from the older system, everything I've been doing to maintain the current performance is usually just fixing brute force items to be more systematic, such as the SAS solution instead of searching for things manually.

Is it possible to simplify your rendering - as that might now be eating significant time in the speed up mode (to get to your 5400X)

Is the game already threaded with the graphics on a seperate thread and working on a once (interval) copied snapshot of the game situation so the simulation only need to wait briefly for that atomic copy (should be fast and maybe only copy the render related data if that is faster)

It sounds like you are running exactly the same simulation calculations which will result in exactly the same results as if it was run at normal speed (and it doesnt sound like its abstractable to simplify the simulation calculations)

What kind of sensory scanning do your objects do (adjacents only ?? a box search ? something more comples)

How is the edge of map endcase handled. There are some tricks to get rid of the XY tests needed when doing object sensor scanning which can simplify the objects calculation some. (I had a guy doing A* change to have a boundry edge of cell blocked status to eliminate the x<0 x>maxx y<0 y>maxy that had to be tested on all the adjacency tests and for a box check it is setting the XY loop limits once instead of inside the loops

---

Depending on the complexity of the behaviors here - the old method called the Big Switch inside the main game turn object loop which flattens out

the program alot can eliminate a great deal of processing overhead. Of course it depends on how your simulation is run whether that would actually be multiple processing loops per turn (sensor+decision phase, action phase, action resolution phase).

But if you dont have multiturn interactions bewteen objects (and have resolution of all actions in the current turn) the logic can usually fit and not be too obscured in one loop routine employing the Big Switch

--------------------------------------------[size="1"]Ratings are Opinion, not Fact
Advertisement

Is it possible to simplify your rendering - as that might now be eating significant time in the speed up mode (to get to your 5400X)

Is the game already threaded with the graphics on a seperate thread and working on a once (interval) copied snapshot of the game situation so the simulation only need to wait briefly for that atomic copy (should be fast and maybe only copy the render related data if that is faster)

the game is currently single threaded. one loop for drawscreen, doinput, and move_everything. the game is framerate limited to 15 frames per second. i know, 15 fps seems slow. and it is. its the slowest you can go, and still get sufficiently responsive input and smooth enough animation. so its the speed at which you'll have the maximum possible number of milliseconds (66 2/3) per frame to draw stuff and simulate stuff. and lord knows, none of us has enough frame time. even cheating to get 66ms vs 33 ms at 30fps, i'm still pressed for what i want to usually accomplish.

multi threading has not yet been considered due to keeping system requirements to a minimum for maximum compatibility. right now a directx 9 capable pc is the only system requirement - not even QueryPerformanceCounter is a requirement (yet).

it doesn't get a screen dump and then just display that. it draws the scene. once per game minute - once every 900 iterations of the main game loop, hence the 900x speed. it does 900 "turns" of input and simulation, then draws a frame. same idea for the 5400x speed, one frame drawn per game hour (once every 5400 main loop iterations). a screen dump would be faster, but then you couldn't watch the world go by in time lapse photography mode like you do now.

It sounds like you are running exactly the same simulation calculations which will result in exactly the same results as if it was run at normal speed (and it doesnt sound like its abstractable to simplify the simulation calculations)

correct. it turns off the frame rate limiter, allowing the simulation scream at full throttle. then it just draws the screen once every 900 turns, where a "turn" is one iteration of the main game loop, and represents 1/15th of a second of game time.

as for an abstraction, that may be possible. i was doing the same thing in another title the other day (a flight sim). any idea how long it takes to fly to a target 1000 miles away in an airship that does 47 mph? <g>. there i was using 100x and 1000x speeds. but it still wasn't fast enough. so i did an approximation. there it was easy, just moving a single target a long way until it gets close to something. just draw the chronometer showing mission time passing. before you know it, that 13 hour flight to target is done, and its time to lower the observation car and bombs away! accelerated time is an absolute must in any simulator. unless its one of those "real time" sims - log in once a day, water your virtual pet kind things.

but abstracting the AI would much more difficult here. you have predatory, stand your ground, and maintain distance behaviors all going on at once. and all of those use wandering and flocking as well as targeting and combat behaviors. so approximation of results is much more difficult.

in the airship sim, i didn't have to recalculate the density of helium given the altitude and temperature 15 time a second to approximate flying across europe, the way it does when running in real time. all i had to do was move the ship based on current orientation and airspeed, and check for range to targets, then fixup the altitude etc when they dropped back to real time speed. in the caveman sim, an average approximation could be done. IE given some mix of critters, on average who will have eaten whom after some given period of time. but it would not yield the same results as running the simulation in real time with full AI and modeling, the way the airship approximation does.

What kind of sensory scanning do your objects do (adjacents only ?? a box search ? something more comples)

there is no chess board like map for targets. so no map based searches such as adjacency or box search apply. targets are stored in a sparse matrix, IE a target list. sensory scanning consists of looping through the target list looking for nearby targets of interest. and nearby means looping through the list again for each target to get range. thus the (O)n^2 (or whatever it crunches out to) kinda thing going on.

How is the edge of map endcase handled. There are some tricks to get rid of the XY tests needed when doing object sensor scanning which can simplify the objects calculation some. (I had a guy doing A* change to have a boundry edge of cell blocked status to eliminate the x<0 x>maxx y<0 y>maxy that had to be tested on all the adjacency tests and for a box check it is setting the XY loop limits once instead of inside the loops

not an issue yet. environments are mostly outdoors with just trees or big rocks to navigate around. so no A* yet. just bang and turn, no avoidance or anything. works ok for now. they still kill you plenty quick (you die a LOT in this game!) <g>. i've already gone though the code and pulled all non-essentials out of the loops. about all i found was one lookup call to terrain_visual_range() that returns the max visual range based on intervening vegetation coverage (you can see far in the desert and not far in the jungle).

Depending on the complexity of the behaviors here - the old method called the Big Switch inside the main game turn object loop which flattens out
the program alot can eliminate a great deal of processing overhead. Of course it depends on how your simulation is run whether that would actually be multiple processing loops per turn (sensor+decision phase, action phase, action resolution phase).

you mean something like this?

void move_animals()
{
int a;
for (a=0; a<maxanimals; a++)
{
if (!animal[a].active) continue;
if (!animal[a].alive) continue;
if (animal[a].suprised) continue;
setstate(a);
runstate(a);
}
}
void setstate(int a)
{
int t;
t=animal[a].type;
if (animal[a].defend_location) defend_location_setstate(a);
else if (animal[a].state==FRIENDLY) friendly_setstate(a);
else if (animal[a].state==HOSTILE) hostile_setstate(a);
else if (animaltype[t].AI==ATTACK_AI) attack_setstate(a);
else if (animaltype[t].AI==MAINTAIN_DISTANCE_AI) maintain_distance_setstate(a);
else if (animaltype[t].AI==DEFEND_AI) defend_setstate(a);
else all_others_setstate(a);
}

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

The 'big switch' Ive used before usually has a persistant state of behavior for an object where It doenst recalcumate it scoarse stance as frequently as it does incremental moves

Thus for your irt might be every one in 10 'turns' it recalculated the objects general situation and may change its stance and within that stance ther are particular incremental actions at a specific level.

The object then has a switch/case statement it jumps to for its current state most of the time instead of running all the possible logic.

That also allows state specific logic to be run and special cases or logic with adjusted coefficient to match that 'stance' (state)

Smarter objects may reevaluate their state more frequently (or when woken up by important external effects)

Thus the first phase is figuring out of objects need a state change and if so calculating it

and the second is running the state specific logic for the incremental turns

SOme obejcts actions take several turns and they are locked into that action for a certain count and they dont need to do any complex situation analysis because they are already committed to the action

Another phasing can be where you decide what all objects have to do using the data from the last turns situation (its frozen) and then the next phase is using that decision to carry out its actions generating THIS turns results and situation data. Otherwise the situation data would be changes andreflect turn+1 for some objects and not for others. Depending on game mechanics and increment sizes that may not be needed)

---

for the Grid map way of deciding adjacent objects you can do it for Space Partitioning with arbitrarily sized grid cells If your N for the N^2 comparisons is sufficiently large (and interaction range means you usually interact with a substantially smaller subset ) then the overhea of inserting and removing object markers on a space partitioning grid can increase efficiency of finding which objects interact (when adjacent/close enough to each other.)

This can be particularly tru when you have alot of passive object that dont more or rarely move (an they dont need to have their grid positions checked and adjusted much)

Whenever you move you check if an object is still in the grid cell it previously was and if changed its removed from the old cell and added into the new current cell (link list) If the cells are many (fine grained) the link lists are short if the cellsa are large (in relation to how far objects move per turn) then they dont change cells very frequently negating possibly longer link lists (objects that move the most are near the head of the link list so they have lower removal times (insert at head is in constant time)

Generally you want to size the grid cells to have be larger than the maximum interaction range so you only have to check membership in the objects immediate cell and its 8 neighbors (if 2D mapping) 3x3 cells to check ... It Can be done checking further out 5x5..7x7..9x9 etc.. cells using a xy loop traversal (if needed when some objects have very long interaction ranges)

Example - a 10x10 full space partitioning grid of map with garanteed interaction range of less than 1/10th width of map you only have to check 3x3=9 out of 100 total cells for each object (and you can sometimes do optimizations - gang all the objects in that center cell together to test against the 9 candidate 'adjacent' cells - lots of walking of link lists...)

For large maps and large counts of objects that method can cut down the N^2 interaction testing significantly (I had maps with huge numbers of largely inactive objects (10s of thousand) and a few significant objects that moved alot

--------------------------------------------[size="1"]Ratings are Opinion, not Fact

Thus for your irt might be every one in 10 'turns' it recalculated the objects general situation and may change its stance and within that stance ther are particular incremental actions at a specific level.

This would be the "select target" every 5 seconds, and the "randomly change AI state" every 5 or 10 seconds, when critters are just mulling about (stand, flock, and wander behaviors).

The object then has a switch/case statement it jumps to for its current state most of the time instead of running all the possible logic.

You mean like this...

void runstate(int a)

{
//show_hostile_info(a);
if (animal[a].collision_recovery)
{
move_collision_recovery(a);
return;
}
switch(animal[a].AIstate)
{
case 0: // stand
if (animaltype[animal[a].type].avian)
{
if ( animal[a].y > heightmap(animal[a].mx,animal[a].mz,animal[a].x,animal[a].z) ) land(a);
else standani(a);
}
else standani(a);
break;
case 1: // wander
steer_animal3(a,0);
moveRun(a,0);
walkani(a);
break;
case 2: // run away
steer_animal3(a,1);
moveRun(a,1);
walkani(a);
do_animal_atk(a);
break;
case 3: // flock
steer_animal3(a,0);
moveRun(a,0);
walkani(a);
break;
case 4: // attack
case 11: // predator hunt
if (frame % 15 == a) calc_heading_to_target(a);
set_rng(a); // calc rng to tgt
if (animaltype[animal[a].type].avian) run_avian_attack_AI(a);
else
{
steer_animal3(a,0);
new_moveAtk(a,0);
walkani(a);
do_animal_atk(a);
}
break;
case 5: new_eatkill(a,animal[a].butchered); break;
case 6: new_move_cornered_animal(a); break;
case 7: run_defend_location_attack(a); break;
// case 8: not used
case 9: new_moveto_home(a); break;
case 10: // friendly & all others attack
if (frame % 15 == a) calc_heading_to_target(a);
set_rng(a); // calc rng to tgt
new_steer_animal(a);
new_move_animal(a);
do_animal_atk(a);
break;
}
}

for the Grid map way of deciding adjacent objects you can do it for Space Partitioning with arbitrarily sized grid cells If your N for the N^2 comparisons is sufficiently large (and interaction range means you usually interact with a substantially smaller subset ) then the overhea of inserting and removing object markers on a space partitioning grid can increase efficiency of finding which objects interact (when adjacent/close enough to each other.)

yes, right now, almost all data in the game is stored in sparse matrices (IE lists of whats on the map and where, but no map grid). easy on the memory requirements, but slow to search. environments for collision check purposes can be tight and complex, as the terrain is randomly generated at the start of a new game. so for collision checks, a moving critter might look through a list of trees, a list of fruit trees, a list of rocks, a list of PCs, a list of caves and rockshelters, a list of critters, and the world map (the only grid data structure) for flags indicating other collidables such as tarpits, volcanoes, and huts. a change of data representation to a method that is more conducive to spatial searching may be called for. but at the moment, its really not that bad. I did test it with and without critters, and i'm looking at 4 games hours in 4 seconds with no critters active, and 4 game hours in 7 seconds with critters active. its rare that the player will accelerate time for more than 8 game hours at a time. imagine playing the sims, and the game is running at 1 game hour per second. in no time at all, your poor little sim would be dead from starvation, etc.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

This topic is closed to new replies.

Advertisement