
Hitting the performance wall: sensory scanning

Started by August 17, 2011 05:47 AM
15 comments, last by NEXUSKill 13 years, 2 months ago
Well, I'm currently trying to test and tune the performance of my game and I'm hitting a performance wall within my AI.
Some background:
  • pathfinding is not part of the bottleneck and can be ignored
  • entities are controlled by a behavior tree written in Lua using LuaJIT 2
  • entities change their update interval according to the activity of their surroundings (LOD)

My stress test is a cluster of about 50 entities in a combat situation with the player close by. My idle performance with these entities is ~4 ms; this stress test goes up to 60-120 ms, that is 1-2 ms per entity :o. My goal is to reduce it to something much lower and more stable (~20 ms).

Due to the non-linear behavior of the performance, I fear that the sensory scanning of the surroundings is the bottleneck (O(n^2)). I can query the surroundings for entities which meet a list of certain criteria and get a basic rating of their importance (checking the criteria is quite expensive). I'm not sure what the best approach would be; some ideas so far:
1. Asynchronous sensory scanning: an entity enqueues a request and gets the result a few frames later (see the rough sketch below the list).
   Issue: delayed reactions and refactoring of my BHTs.
2. Dynamic range adjustment: when performance drops, the scanning range shrinks too.
   Issue: missing important targets when inside a cluster.
3. Caching of sensory results: partly done (the current target is locked and re-evaluated).
   Issue: missing changes in the surroundings, delayed reactions?
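
To make idea 1 a bit more concrete, this is roughly the shape I have in mind for the engine side (just a sketch; SenseSystem, the ticket handle and the per-frame budget are made-up placeholders):

[code]
// Idea 1 as a sketch: the behaviour tree enqueues a scan request and polls a
// ticket on later ticks instead of scanning synchronously. All names are illustrative.
#include <cstdint>
#include <deque>
#include <unordered_map>
#include <utility>
#include <vector>

using EntityId = std::uint32_t;

struct SenseQuery  { EntityId owner; float radius; };
struct SenseResult { bool ready = false; std::vector<EntityId> candidates; };

class SenseSystem {
public:
    // Called from the behaviour tree: returns a ticket that can be polled later.
    int Submit(const SenseQuery& q) {
        int ticket = nextTicket_++;
        pending_.push_back({ticket, q});
        results_[ticket] = {};
        return ticket;
    }

    // Called by the engine once per frame: serve at most `budget` queries,
    // so a cluster of agents cannot blow up a single frame.
    void Update(int budget) {
        while (budget-- > 0 && !pending_.empty()) {
            auto [ticket, query] = pending_.front();
            pending_.pop_front();
            SenseResult r;
            r.candidates = BroadphaseQuery(query);   // e.g. the sweep'n'prune result
            r.ready = true;
            results_[ticket] = std::move(r);
        }
    }

    // The behaviour tree polls this on later ticks; the reaction is a few frames late.
    const SenseResult& Poll(int ticket) const { return results_.at(ticket); }

private:
    std::vector<EntityId> BroadphaseQuery(const SenseQuery&) { return {}; }  // stub
    std::deque<std::pair<int, SenseQuery>> pending_;
    std::unordered_map<int, SenseResult> results_;
    int nextTicket_ = 0;
};
[/code]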

Does anyone have experience, best practices or ideas?

You may be able to prune your calculations using BSPs or quadtrees to designate which units are allowed to update because something interesting is going on next to them (i.e., the player is there); the rest are not even queried until they are close enough to become actively involved, at which point they are activated. Your best bet is to reduce, as much as practical, any "search" or "scanning" algorithms in favor of direct checks between an entity and its local neighbors in the BSP or quadtree, to see if an action should take place due to some sort of threshold (e.g., I can see the player and I want to hurt him, so I attack; the player made a noise I could hear, so I'm going to go find him; I see that food and I'm hungry, so I'm going to eat it, etc.). At least that is my approach and what I've seen from a lot of AIs that try to implement more complex behavior.

Each AI can easily know what is near it locally, since that is quickly accessible in the BSP or quadtree, and therefore just does a test to see whether something is close enough to care about and respond to with some sort of action. Of course, if you get 50 AIs in the same location this could still slow things down, so you'll want to spread them out smartly, but it is still better than testing 50 AIs in a level when you only need to test the 2 or 3 which are close enough to respond to the player, because they're all in the same quadrant or subquadrant. Either way, quadtrees make finding which AIs should be active logarithmic (and constant afterwards, until the quadrant is left), and make action determination constant in the best case (1 AI and 1 player) or linear on average (more than one entity in a quadrant, with memory of past actions) per AI. Even if you allow testing of everything in that quadrant, you're still dealing with a handful of things, so the number of tests is low and things move quickly.
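
For example, something along these lines, with a coarse uniform grid standing in for the quadtree/BSP (just a sketch; all names, numbers and the hash are illustrative):

[code]
// Pruning with a coarse uniform grid standing in for the quadtree/BSP: only
// agents in cells near the player get a full update, and an active agent only
// tests entities from its own and adjacent cells. Everything here is illustrative.
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <functional>
#include <unordered_map>
#include <vector>

struct Vec2 { float x = 0, y = 0; };
using EntityId = std::uint32_t;

struct CellKey { int x, y; bool operator==(const CellKey& o) const { return x == o.x && y == o.y; } };
struct CellKeyHash {
    std::size_t operator()(const CellKey& k) const {
        return std::hash<int>()(k.x) * 73856093u ^ std::hash<int>()(k.y) * 19349663u;
    }
};

class SpatialGrid {
public:
    explicit SpatialGrid(float cellSize) : cellSize_(cellSize) {}

    void Insert(EntityId id, Vec2 pos) { cells_[KeyOf(pos)].push_back(id); }

    // Entities in the 3x3 block of cells around `pos`: the agent's local neighbors.
    std::vector<EntityId> Neighbors(Vec2 pos) const {
        std::vector<EntityId> out;
        CellKey c = KeyOf(pos);
        for (int dy = -1; dy <= 1; ++dy)
            for (int dx = -1; dx <= 1; ++dx) {
                auto it = cells_.find({c.x + dx, c.y + dy});
                if (it != cells_.end()) out.insert(out.end(), it->second.begin(), it->second.end());
            }
        return out;
    }

    // Cheap gate: is this agent within `radiusCells` cells of the player?
    bool NearPlayer(Vec2 pos, Vec2 playerPos, int radiusCells) const {
        CellKey a = KeyOf(pos), b = KeyOf(playerPos);
        return std::abs(a.x - b.x) <= radiusCells && std::abs(a.y - b.y) <= radiusCells;
    }

private:
    CellKey KeyOf(Vec2 p) const {
        return { static_cast<int>(std::floor(p.x / cellSize_)),
                 static_cast<int>(std::floor(p.y / cellSize_)) };
    }
    float cellSize_;
    std::unordered_map<CellKey, std::vector<EntityId>, CellKeyHash> cells_;
};
[/code]

An agent that fails NearPlayer() is simply skipped that frame; one that passes runs its threshold tests only against Neighbors().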

This also helps a bit with allowing AIs more freedom of action separate from the player, as you can run a quadrant per cycle, or a subquadrant per cycle, to further limit things, and then start updating "uninvolved" AIs less when player-vicinity AIs get actively involved with the player. This allows for more autonomy with less overhead and lets your AIs do other interesting things, like stumble upon the player, stalk other AIs, look for junk, or respond to a level-wide alarm. For finding stuff your AI doesn't directly know about but may have "general" ideas of, depending on your implementation, you may be able to "roll up" information: if you need an AI to find something, it knows that quadrant X contains a subquadrant with item Y, so it proceeds to quadrant X and looks further when it gets there, eventually getting to the location of item Y. You can also use this for AIs to talk to each other and share knowledge of interest, like "I saw a player in quadrant X" or "there was a dead body in quadrant X, watch out!".

Basically, the best thing I can suggest is: if the player isn't experiencing it, it is ultimately wasted cycles, so don't update AI that isn't involved with the player unless it is important to your game. Keep AI dormant or unchecked until it becomes a concern to the player whenever possible. You can cheat this in many ways: dormancy can be AIs sleeping, "chatting" with other AIs, "goofing off", being "bored", watching a campfire, "cleaning" something, whatever makes them look more interesting when the player stumbles upon them, but it should ultimately require no cycles to process in any way until then, at which point you can activate animations or whatever to give variety until the player takes action or the local AIs notice the player in some way and react.
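
The dormancy gate itself can be very cheap, something like this (sketch only; the tiers and ranges are placeholders):

[code]
// "Dormant until it matters": dormant agents cost nothing, nearby agents only
// play a cheap ambient act, and only agents close to the player run their full
// behaviour tree. The tiers and ranges are placeholders.
#include <vector>

enum class AiLod { Dormant, Ambient, Active };

struct Agent {
    AiLod lod = AiLod::Dormant;
    float distToPlayerSq = 1e30f;
    void PlayAmbientIdle() { /* campfire, chatting, goofing off... */ }
    void FullUpdate(float /*dt*/) { /* behaviour tree, sensors, pathfinding */ }
};

void UpdateAgents(std::vector<Agent>& agents, float dt) {
    constexpr float kAmbientRangeSq = 40.0f * 40.0f;  // made-up thresholds
    constexpr float kActiveRangeSq  = 15.0f * 15.0f;

    for (Agent& a : agents) {
        if (a.distToPlayerSq > kAmbientRangeSq) { a.lod = AiLod::Dormant; continue; }  // costs nothing
        if (a.distToPlayerSq > kActiveRangeSq)  { a.lod = AiLod::Ambient; a.PlayAmbientIdle(); continue; }
        a.lod = AiLod::Active;
        a.FullUpdate(dt);  // only this tier pays the full AI cost
    }
}
[/code]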

Anyway, hope that helps.
Make sure your actions are largely event-based rather than polling for them. e.g. Don't constantly check to see if a grenade went off in your area. Let the grenade tell you that it went off.

Also, if there are expensive calculations that are going to be used by multiple agents, store them separately and don't recalculate them for each agent. When needed, the agents can check for the calculated value and simply use it.
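
For example, the grenade case could look something like this (illustrative names only, not any particular engine's API):

[code]
// The grenade case, event-driven: the grenade pushes one event when it
// detonates; nobody polls for explosions every frame. EventBus and the
// handler wiring are illustrative, not an existing engine API.
#include <functional>
#include <utility>
#include <vector>

struct Vec3 { float x = 0, y = 0, z = 0; };

struct ExplosionEvent { Vec3 pos; float radius; };

class EventBus {
public:
    using Handler = std::function<void(const ExplosionEvent&)>;
    void Subscribe(Handler h) { handlers_.push_back(std::move(h)); }
    void Publish(const ExplosionEvent& e) { for (auto& h : handlers_) h(e); }
private:
    std::vector<Handler> handlers_;
};

struct Agent {
    Vec3 pos;
    void OnExplosion(const ExplosionEvent& e) {
        // React only if it was close enough to matter; otherwise ignore it.
        float dx = pos.x - e.pos.x, dy = pos.y - e.pos.y, dz = pos.z - e.pos.z;
        if (dx * dx + dy * dy + dz * dz <= e.radius * e.radius) {
            // e.g. interrupt the current behaviour: flee, take cover, investigate...
        }
    }
};

// Wiring at spawn time:  bus.Subscribe([&agent](const ExplosionEvent& e) { agent.OnExplosion(e); });
// The grenade tells everyone once, instead of every agent asking every frame:
void Detonate(EventBus& bus, Vec3 at) { bus.Publish({at, 12.0f}); }
[/code]

The same idea covers the second point: compute the expensive value once, publish it (or park it on a shared blackboard), and let every interested agent read it.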

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

As NetGnome suggested, are you using space partitioning for anything else like collision or rendering? If so, you can use it to make AI queries fast. After all, AI sensor range queries are very similar to collision detection ... i.e. "what units intersect with my sensor range"?

Personally, I did a game with hundreds of units on essentially a 2D plane (top-down RTS-style gameplay) and used the game's existing spatial partitioning grid to optimize the AI queries. Since the partitions are uniform, I was able to define all AI units' sensor ranges in terms of grid squares, so there was very little explicit distance calculation when finding targets. If the target is in a nearby grid square, it is in range. Even with 200+ AI units the game runs at 60 fps on 5-year-old laptop hardware.
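
In sketch form, the lookup is basically just this (map size and names are placeholders):

[code]
// "Sensor range in grid squares" on a fixed-size RTS grid: if a target sits in
// one of the cells covered by the unit's range, it is in range, with no
// per-target distance computation. Map size is a placeholder.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int kMapW = 128, kMapH = 128;
using EntityId = std::uint32_t;

// One bucket of unit ids per cell, stored flattened: cell (x, y) is at index y * kMapW + x.
using Grid = std::vector<std::vector<EntityId>>;

std::vector<EntityId> UnitsInSensorRange(const Grid& cells,
                                         int unitCellX, int unitCellY,
                                         int rangeInCells) {
    std::vector<EntityId> out;
    int x0 = std::max(0, unitCellX - rangeInCells), x1 = std::min(kMapW - 1, unitCellX + rangeInCells);
    int y0 = std::max(0, unitCellY - rangeInCells), y1 = std::min(kMapH - 1, unitCellY + rangeInCells);
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x) {
            const std::vector<EntityId>& bucket = cells[static_cast<std::size_t>(y) * kMapW + x];
            out.insert(out.end(), bucket.begin(), bucket.end());
        }
    return out;
}
[/code]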

Anyway, yeah...use spatial partitioning if you can. ;)

@NetGnome and EJH:
Thanks for the suggestions, but I have already considered and implemented all these things in a similar way. For example, I'm using sweep'n'prune + binary search to query the surroundings, which is very fast even with a very high number of queries per frame.

This isn't the issue. The problem is that once I've determined the potential candidates, I need to check game-related properties (is it an enemy, is it injured, is it an item or an NPC, etc.). These tests are quite expensive in my engine due to the nature of my attribute handling; this, in combination with a cluster of NPCs, seems to be one of the bottlenecks (~40% of AI performance in the cluster test case).

Currently I'm checking all potential candidates in the surroundings; in a cluster this could be 10-20 per entity. With a cluster of roughly 50 entities, that is 500-1000 entity scans per frame. Certainly I have to take a look at the per-frame scans, which obviously seems too much, but I still have to do some load balancing to avoid sudden time peaks.
If I only checked the first 10 entities, it could lead to reaction artifacts (i.e. an entity stands in a cluster of 10 unimportant entities and doesn't see the 11th, dangerous entity approaching).
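
For the load balancing, something like a per-frame budget with a wrapping cursor might work, so no candidate is permanently ignored (sketch; RateCandidate stands in for the expensive attribute checks):

[code]
// Spreading the expensive property checks over several frames without
// permanently ignoring candidate #11: each agent keeps a cursor into its
// candidate list, evaluates at most `budget` entries per frame, and remembers
// the best-rated target seen so far. Names are illustrative.
#include <cstddef>
#include <cstdint>
#include <vector>

using EntityId = std::uint32_t;

// Stub standing in for the expensive game-property rating (enemy? injured? item?).
float RateCandidate(EntityId /*observer*/, EntityId /*candidate*/) { return 0.0f; }

struct SensorState {
    std::vector<EntityId> candidates;  // refreshed occasionally from the broadphase
    std::size_t cursor = 0;            // where the incremental evaluation resumes
    EntityId bestTarget = 0;
    float bestRating = -1.0f;
};

void EvaluateSome(EntityId self, SensorState& s, std::size_t budget) {
    for (std::size_t n = 0; n < budget && !s.candidates.empty(); ++n) {
        s.cursor %= s.candidates.size();        // wrap, so every candidate gets its turn
        EntityId c = s.candidates[s.cursor++];
        float rating = RateCandidate(self, c);  // the expensive part, now capped per frame
        if (rating > s.bestRating) { s.bestRating = rating; s.bestTarget = c; }
    }
}
[/code]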

[quote]
Make sure your actions are largely event-based rather than polling for them. e.g. Don't constantly check to see if a grenade went off in your area. Let the grenade tell you that it went off.

Also, if there are expensive calculations that are going to be used by multiple agents, store them separately and don't recalculate them for each agent. When needed, the agents can check for the calculated value and simply use it.
[/quote]
I'm only polling to determine which entities are around, so that new entities entering the area of interest are included. When scanning an entity I already use some kind of "view" which maps certain states to simple attributes (e.g. 100 of 500 hp => condition="bad"). After that, most checks depend on the association between two entities.

Hmmm... maybe it is time to implement some kind of query engine and move the script part into the more efficient C++ engine, in addition to an entity-entity query cache... :unsure:
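
Such a cache could start out as simply as a frame-stamped "view" per entity, roughly like this (sketch; all names are placeholders):

[code]
// Frame-stamped "view" cache: the expensive attribute lookups are flattened
// once per entity per frame into a plain struct, so pairwise checks read cheap
// data instead of hitting the attribute system for every observer. All names
// are placeholders.
#include <cstdint>
#include <unordered_map>

using EntityId = std::uint32_t;

enum class Condition { Good, Hurt, Bad };

struct EntityView {                          // cheap, flat snapshot of the expensive attributes
    std::uint32_t frameStamp = 0xFFFFFFFFu;  // "never built"
    int faction = 0;
    bool isItem = false;
    Condition condition = Condition::Good;
};

class ViewCache {
public:
    const EntityView& Get(EntityId id, std::uint32_t frame) {
        EntityView& v = views_[id];
        if (v.frameStamp != frame) {   // rebuild at most once per entity per frame
            v = BuildView(id);         // the expensive attribute queries live here
            v.frameStamp = frame;
        }
        return v;
    }
private:
    EntityView BuildView(EntityId /*id*/) {
        EntityView v;
        // e.g. v.condition = (hp * 5 <= maxHp) ? Condition::Bad : Condition::Good;
        return v;
    }
    std::unordered_map<EntityId, EntityView> views_;
};

// A pairwise check then becomes a couple of plain compares:
inline bool IsEnemy(const EntityView& a, const EntityView& b) { return a.faction != b.faction; }
[/code]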

Given that you have so many AIs so close together, you can probably get away with implementing some form of swarming, so that 2-5 of your enemies are actually just one AI that appears to be 2-5 individuals. Of course this is heavily dependent on the game you're making and whether it can be achieved without becoming obvious to the player, but if done right, it may look like a squad of enemies working in tandem, giving the appearance of "better" AI while also cutting your calculations down by ~2-5x (assuming you're crafty about it).
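
In sketch form (the types and engine calls are made up):

[code]
// The squad idea as a sketch: one "brain" does the expensive sensing for a
// handful of bodies and hands each member a target, so a five-strong cluster
// pays for one scan instead of five. Types and engine calls are made up.
#include <cstddef>
#include <cstdint>
#include <vector>

using EntityId = std::uint32_t;

// Stand-ins for the existing engine calls.
std::vector<EntityId> ScanThreats(EntityId /*nearMember*/) { return {}; }  // the expensive query
void OrderAttack(EntityId /*member*/, EntityId /*target*/) {}
void OrderRegroup(EntityId /*member*/) {}

struct Squad {
    std::vector<EntityId> members;

    void Update() {
        if (members.empty()) return;
        // One scan for the whole squad, centred on an arbitrary member.
        std::vector<EntityId> threats = ScanThreats(members.front());
        for (std::size_t i = 0; i < members.size(); ++i) {
            if (threats.empty()) OrderRegroup(members[i]);
            else OrderAttack(members[i], threats[i % threats.size()]);  // spread the targets out
        }
    }
};
[/code]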

[quote]
Given that you have so many AIs so close together, you can probably get away with implementing some form of swarming, so that 2-5 of your enemies are actually just one AI that appears to be 2-5 individuals. Of course this is heavily dependent on the game you're making and whether it can be achieved without becoming obvious to the player, but if done right, it may look like a squad of enemies working in tandem, giving the appearance of "better" AI while also cutting your calculations down by ~2-5x (assuming you're crafty about it).
[/quote]

This is just a test case. All of my AI agents are individuals with their own demands, weaknesses, fears etc., but there are certain game events where a cluster of entities is likely, and I want to cushion the FPS impact. In fact it is a performance optimisation.

I've already extended my engine to run on multiple cores, but AI, especially when running in a VM (Lua), is really hard to optimize. Therefore I've started to transfer the scanning of the surroundings from my script engine to the C++ engine and make it multithreaded and asynchronous. I've done this already with pathfinding, with really great results. :D

Yea, a VM would definitely kill performance :) Though I am interested in how you structured your multi-threaded pathfinding and returned the info to the AI.

[quote]
Though I am interested in how you structured your multi-threaded pathfinding and returned the info to the AI.
[/quote]

To avoid unnecessary locking, my pathfinding only runs while modification of the underlying waypoint structure is barred. In that case multiple threads can read the waypoint structure without any problem. Whenever I manipulate the waypoint structure (seldom), I reset all running pathfinding processors.

The communication between the pathfinder and the AI is event-driven. The AI can submit a pathfinding request, which is enqueued in the pathfinding queue. Requests are dispatched to one of the pathfinding processors, which does the pathfinding job within a time slice (e.g. 50 nodes visited per frame) and sends the result to the AI. The current action of the AI is stopped while waiting for the path result, though the AI is still active (scanning the surroundings, reacting to events, etc.).

My multithreaded approach is similar to the one in id Tech 5 (google for "id Tech 5 challenges"), where they use job lists and signal-wait tokens for synchronisation.
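
Stripped of the threading, the request/slice/result flow looks roughly like this (simplified sketch, not my actual code; the names and the node budget are illustrative):

[code]
// Single-threaded skeleton of the request -> time-sliced search -> result-event
// flow described above (the real system dispatches to worker threads with job
// lists). The names and the node budget are illustrative.
#include <cstdint>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

using EntityId = std::uint32_t;
struct Node { int id = 0; };

struct PathRequest {
    EntityId requester = 0;
    int startNode = 0, goalNode = 0;
    std::function<void(std::vector<Node>)> onDone;  // the "path ready" event back to the AI
};

class PathProcessor {
public:
    void Submit(PathRequest r) { queue_.push_back(std::move(r)); }

    // Called once per frame: expand at most `nodeBudget` nodes in total, then yield.
    void Update(int nodeBudget) {
        while (nodeBudget > 0 && !queue_.empty()) {
            PathRequest& r = queue_.front();
            nodeBudget -= ExpandSome(r, nodeBudget);  // resumes the search where it left off
            if (Finished(r)) {
                r.onDone(ExtractPath(r));             // event-driven hand-back to the AI
                queue_.pop_front();
            }
        }
    }

private:
    // Stubs standing in for an incremental A* over the waypoint graph.
    int ExpandSome(PathRequest&, int budget) { return budget; }
    bool Finished(PathRequest&) { return true; }
    std::vector<Node> ExtractPath(PathRequest&) { return {}; }

    std::deque<PathRequest> queue_;
};
[/code]

The onDone callback is the "path ready" event: it flips the agent from its wait-for-path state back into an action state, while the rest of the behaviour keeps running in the meantime.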

Very interesting. So your AI queues up an "Update Path" job to your pathfinder processor and enters a "wait for path update" state so it can continue to monitor things. Then, before the pathfinder thread rejoins, it fires off an "Update AI with Path" event which the AI consumes, at which point it changes itself back to some sort of action state? Very slick!

Are the events managed via the AI or are they managed via the pathfinder processor, or some game event manager?

This topic is closed to new replies.
