
Turn-based RPG AI: how can I handle risk/reward?

Started by September 10, 2015 04:53 PM
31 comments, last by IADaveMark 9 years ago

Hello!

I've got a game-design/AI-related issue I'm having trouble solving, and I'd love to have your input on it :)

A bit of background on my game:

I'm designing an idle/incremental RPG with a turn-based battle system similar to FF7. The player can set up predetermined rules for his party that the game engine then executes (very much like the FF12 gambit system: a bunch of if/then/else rules).

I am now working on the monsters' AI. I've settled on a chess-like approach to the problem: create a tree of all possible game states by listing, at each unit's turn, every move/skill it can use, then parse that tree to find the best move for the AI. I use a custom evaluation function to assess leaf nodes and value how good/bad the situation is for the AI.
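To make that concrete, here is a minimal sketch of this kind of search in Python. The two-field BattleState, the move list, and the evaluation are made-up placeholders standing in for the real engine's much larger state:

```python
from dataclasses import dataclass, replace

# Minimal sketch of the tree search described above. All names here
# (BattleState, "attack", "skip") are hypothetical toy stand-ins.

@dataclass(frozen=True)
class BattleState:
    ai_hp: int
    player_hp: int

def legal_moves(state):
    return ["attack", "skip"]          # hypothetical move list

def apply_move(state, move, ai_turn):
    dmg = 10 if move == "attack" else 0
    if ai_turn:
        return replace(state, player_hp=state.player_hp - dmg)
    return replace(state, ai_hp=state.ai_hp - dmg)

def evaluate(state):
    # Custom evaluation of a leaf: how good is the situation for the AI?
    return state.ai_hp - state.player_hp

def minimax(state, depth, ai_turn):
    if depth == 0 or state.ai_hp <= 0 or state.player_hp <= 0:
        return evaluate(state)
    scores = [minimax(apply_move(state, m, ai_turn), depth - 1, not ai_turn)
              for m in legal_moves(state)]
    # The AI maximizes the evaluation on its turns; the player minimizes it.
    return max(scores) if ai_turn else min(scores)

print(minimax(BattleState(100, 100), depth=4, ai_turn=True))
```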

I've managed to build up my AI's data tree quite nicely and it can use a MiniMax algorithm to find a good move to play. I've got other issues unrelated to this topic (mainly that my AI is way too slow and can only fully expand around 100 nodes, but that shouldn't really matter for why I'm here).

The issue:

For now, my AI uses a fixed random seed to simulate a unit's turn, then uses the same seed for the real turn. That lets me know exactly how the game will pan out during my simulation (a minimal sketch follows the list below).

The main things affected are everything that is percentage based:

  • Chance to hit, mainly
  • Skills with a proc chance (e.g., 10% chance to do this, 90% chance to do nothing)
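For illustration, the fixed-seed trick boils down to something like this (resolve_attack is a hypothetical stand-in for the real combat routine):

```python
import random

# Seed the RNG identically for the simulation and for the real turn, so
# every percentage roll (hit chance, proc chance, ...) comes out the same
# both times.

def resolve_attack(rng, hit_chance=0.75):
    return "hit" if rng.random() < hit_chance else "miss"

seed = 12345
simulated = resolve_attack(random.Random(seed))   # during tree search
real      = resolve_attack(random.Random(seed))   # during the actual turn
assert simulated == real   # the AI "knows" the outcome in advance
```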

Since my AI knows what these random numbers will be, it knows, for example, whether its next move is going to fail. It then chooses a move that will either not miss or that simply buys time until the RNG seed is more favorable. If every skill would fail this turn, it simply picks the one that costs the fewest resources so it can maintain its advantage. That is smart of it, but it means the AI will likely never "miss" an attack and will only use proc-based skills when they will actually proc.

For example, it effectively transforms a spell that has "X% chance to do Y" into a spell that has "100% chance to do Y, but with a cooldown of 1/(X%) turns". This is not what I want.

I want my AI to use actions and sometimes miss. I've tried to find a "clean" solution to this, but every one I think of has big flaws.

As my AI is already slow, I'd like to avoid adding a layer of complexity to it. I can compute things a bit differently, but I cannot compute many more scenarios than it already handles. Example: I cannot split every percentage roll into two sub-branches (passed/failed). That would solve most of my issues but would be too time-consuming.

Some of my ideas:

1 - Leave it that way and make it a design feature: fighting an AI that can "predict" the future can be a challenge in itself, maybe even an interesting one if balanced properly. For example, the AI may never miss a skill, but if the player uses skills that reduce the monster's chance to hit, the AI may be forced into poor choices in the end if it really has no other options. Not my favorite option, but still worth exploring.

2 - Change the RNG seed dynamically so that any random number generated during the simulation of a move/turn will not be the same when the turn is really played. Easy to do. The simulation will be "realistic" but ultimately inaccurate: it reflects a possibility that could happen, but not the real one. The deeper we explore, the farther we deviate from the real outcome of the game, since we re-roll every random number. In this scheme, if the AI selects a move because it looked good during simulation, the move can still fail when it is really made (that is the main advantage of this method). But it doesn't solve the inverse case: if the AI thinks it will miss an attack, it will still discard it for safer moves, which eventually leads to stupid moves. If the AI has only an attack and the option to skip its turn, it will skip its turn, which makes no sense.

3 – Change my simulation function to average every proc based on its percentage (see the first sketch after this list). For example, instead of hit/miss I could multiply the damage done by the hit% chance and make the attack always hit; on average it deals the same damage. That does solve the "miss" issue, since the AI will choose the best move regardless. The thing is, the deeper we analyze, the farther we end up from reality (making the decision process more fragile), and some procs are really hard to average accurately (a stupid example to illustrate: a skill that has a 5% chance to kill a character). It would also require extensive recoding, which I'd like to avoid if possible.

4 – Use the same RNG in simulation and in the real turn, but while deciding which move to choose, dumb down my AI by adding an X% chance to choose a move other than the best one (see the second sketch after this list). That lets us force the AI to sometimes select a move that will miss its target. I don't really like dumbing down my AI on purpose that way: it could make it choose very stupid moves when an optimal one was the obvious choice (e.g., a target could be finished off with a simple attack, but since the AI knows the attack will fail, it skips its turn; that is counter-intuitive compared to what a player would do, since a player, not knowing whether the hit will land, would still try to finish off the target given a decent hit chance). It also poses the problem of how to choose that X% for selecting a different move... not a trivial task at all if we want to be as realistic as possible.

5 – Some others I've already forgotten :/
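For what it's worth, idea 3 can be sketched in a few lines; the damage numbers and the crit handling are invented for illustration:

```python
# Idea 3 as a tiny sketch: replace each random roll with its expected
# value so the simulation never branches on hit/miss. All numbers are
# made up for illustration.

def expected_damage(base_damage, hit_chance, crit_chance=0.0, crit_mult=2.0):
    per_hit = base_damage * (1 - crit_chance) + base_damage * crit_mult * crit_chance
    return hit_chance * per_hit     # average over hit/miss instead of rolling

print(expected_damage(100, hit_chance=0.8))                   # 80.0
print(expected_damage(100, hit_chance=0.8, crit_chance=0.1))  # 88.0
```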
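And one possible way to pick that X% in idea 4 is a softmax over move scores, so the AI deviates from the best move in proportion to how close the alternatives are (a sketch with hypothetical moves and scores, not the game's real scoring):

```python
import math, random

# Idea 4 as a sketch: rather than a flat X% chance to pick "some other
# move", pick among the scored moves with probability proportional to
# exp(score / T). The temperature T controls how often the AI deviates
# from the best move: higher T = dumber, more random AI.

def pick_move(scored_moves, temperature=5.0):
    moves, scores = zip(*scored_moves)
    weights = [math.exp(s / temperature) for s in scores]
    return random.choices(moves, weights=weights, k=1)[0]

moves = [("fireball", 42.0), ("attack", 38.0), ("skip", 1.0)]
print(pick_move(moves))   # usually "fireball", sometimes "attack", rarely "skip"
```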

In the end, my problem is heavily tied to risk assessment. Figuring out how humans assess these risk/reward situations would definitely help me make the AI do the "right thing". I want it to appear "smart" and avoid making choices the player can easily label as poor. The optimal move is often debatable, but the really bad ones tend to stand out, and that's what I want to stay away from.

If you have ideas on any of this, or questions, don't be shy :)

I have little experience here, but my first guess is to reduce the weight of the second and further successes. If you continue past the first success/fail, you're planning with less certainty, so any success there should weigh less, since you may never get there.

This may mean that after a few wins/losses, there is no point in looking further ahead, as the weight is too small.

Currently, you basically abort on any fail and only search win steps. I don't know whether "fail" means death (in which case aborting the search doesn't actually cut anything), but if you live after a fail, you may want to consider a more balanced search between win/lose, but less deep (since a win/loss further away doesn't mean much).

Minimax is not a reasonable choice for a game where actions have random outcomes. Try MCTS instead.


I've tried to find a "clean" solution to this, but every one I think of has big flaws.
As my AI is already slow, I'd like to avoid adding a layer of complexity to it. I can compute things a bit differently, but I cannot compute many more scenarios than it already handles.

sounds like you're hitting the limits of complexity that your chosen AI implementation method can handle without undue work on your part.

it might be time to consider a different system.

Norm Barrows

Rockland Software Productions

"Building PC games since 1989"

rocklandsoftware.net

PLAY CAVEMAN NOW!

http://rocklandsoftware.net/beta.php

Thanks to you three for your input!

I have little experience here, but my first guess is to reduce the weight of the second and further successes. If you continue past the first success/fail, you're planning with less certainty, so any success there should weigh less, since you may never get there.

This may mean that after a few wins/losses, there is no point in looking further ahead, as the weight is too small.

Currently, you basically abort on any fail and only search win steps. I don't know whether "fail" means death (in which case aborting the search doesn't actually cut anything), but if you live after a fail, you may want to consider a more balanced search between win/lose, but less deep (since a win/loss further away doesn't mean much).

In my game a "fail" just means that a skill will miss its target; it doesn't mean the loss of the game, far from it :) I don't see how weighting nodes at greater depth would give me a different result since, if I understood correctly, you would weight, for example, all nodes at depth 2 at 95%, all nodes at depth 3 at 90%, all nodes at depth 4 at 85%, etc. How does that affect which move the AI chooses?

I may have forgotten to mention something: my AI's game tree is rebuilt from the new game state after each move. I don't build one at the beginning of a fight and stick with it, so if I compute moves at great depth and they are inaccurate (for now that is not the case), it only affects the AI's next move decision. After that, I rebuild a new tree.

Minimax is not a reasonable choice for a game where actions have random outcomes. Try MCTS instead.

I've read a bit more about it. If I'm not mistaken, it should generally perform better than my MiniMax with alpha/beta, but it won't solve my current issues unless it can expand so many nodes that I could split my hit/miss scenarios into two nodes each time. And since I have lots of random numbers (chance to hit, chance to crit, lots of proc chances: x% to stun, x% to silence, etc.), that would mean a really high number of nodes to explore, even for MCTS.

The algorithm also seems to require, when choosing a node, the ability to expand it out to a win or loss state. In my game, expanding nodes that way (even for one move) is really slow too: I must simulate all the computations the game engine normally does when registering a new move (and there is a LOT going on). Since that is one of the two things holding me back time-wise, I think it would be more efficient to store a new node with that information (as I do now) rather than playing out to an end-point and, in a way, wasting all the game states computed to get there, don't you think?

There is also the issue that a fight in my game can end in an unpredictable number of moves: as few as one, most likely a few hundred, and in some cases possibly never (extreme case: two healers with infinite mana fighting each other ;o). I could use my current evaluation function after X playout moves and call it a probable win or loss based on that, but again, my eval function is slow and I can't help thinking it would be a "waste" to use it and not store that information.

I may be missing important things here, but so far it doesn't seem like a fit for my game :/


I've tried to find a "clean" solution to this, but every one I think of has big flaws.
As my AI is already slow, I'd like to avoid adding a layer of complexity to it. I can compute things a bit differently, but I cannot compute many more scenarios than it already handles.

sounds like you're hitting the limits of complexity that your chosen AI implementation method can handle without undue work on your part.

it might be time to consider a different system.

It is quite possible. I'm open to any suggestions :)

MCTS as mentioned above, or just completely rework it into a hand-authored utility-based system. At that point, you use formulas to score each potential action based on how reasonable it is. That could include hit %, damage it will do, amount of "mana" remaining, how much health you have left (which might increase or decrease the priorities of different actions), etc. You come up with a single score for each action, then choose the one that scored the highest.
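A rough sketch of what that single-score approach might look like, with entirely made-up considerations and numbers:

```python
# Invented example of single-score action selection: expected damage,
# a bonus for finishing a target, and a penalty for burning scarce mana.

def score_action(action, me, target):
    expected = action["damage"] * action["hit_chance"]
    kill_bonus  = 2.0 if expected >= target["hp"] else 1.0          # finish them off
    mana_factor = 0.5 if action["cost"] > me["mana"] * 0.5 else 1.0 # conserve mana
    return expected * kill_bonus * mana_factor

me      = {"mana": 40}
target  = {"hp": 35}
actions = [
    {"name": "attack",   "damage": 30, "hit_chance": 0.9, "cost": 0},
    {"name": "fireball", "damage": 60, "hit_chance": 0.7, "cost": 30},
]
best = max(actions, key=lambda a: score_action(a, me, target))
print(best["name"])   # "fireball": expected 42 damage plus the kill bonus wins out
```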

For more on utility-based systems, see my 2 AI Summit lectures here.

Also, this AI Summit lecture shows an architecture that we developed around utility that implements something similar to what you might want.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"


I've read a bit more about it. If I'm not mistaken, it should generally perform better than my MiniMax with alpha/beta, but it won't solve my current issues unless it can expand so many nodes that I could split my hit/miss scenarios into two nodes each time. And since I have lots of random numbers (chance to hit, chance to crit, lots of proc chances: x% to stun, x% to silence, etc.), that would mean a really high number of nodes to explore, even for MCTS.


The fact that there are lots of random events in your simulation shouldn't be a problem for MCTS. Perhaps the "Tree" part of MCTS would be difficult to use effectively, but Monte Carlo simulations could still be performed.


The algorithm also seems to require, when choosing a node, the ability to expand it out to a win or loss state. In my game, expanding nodes that way (even for one move) is really slow too: I must simulate all the computations the game engine normally does when registering a new move (and there is a LOT going on). Since that is one of the two things holding me back time-wise, I think it would be more efficient to store a new node with that information (as I do now) rather than playing out to an end-point and, in a way, wasting all the game states computed to get there, don't you think?


If your game is too slow to simulate, you could still use MCTS by simplifying the rules used in the MCTS. After all, my mental model of how the world works is a vast simplification of the actual Physics running the world, and yet I can make meaningful decisions in my day-to-day life based on my simple model. Your agents could do the same thing, and map the game to one that is simpler and much faster to simulate.


There is also the issue that a fight in my game can end in an unpredictable number of moves: as few as one, most likely a few hundred, and in some cases possibly never (extreme case: two healers with infinite mana fighting each other ;o). I could use my current evaluation function after X playout moves and call it a probable win or loss based on that, but again, my eval function is slow and I can't help thinking it would be a "waste" to use it and not store that information.


That sounds like a potential game-design flaw. It's good to be able to guarantee finite games. There is usually some resource that is being depleted as the game progresses (perhaps mana and health points of the participants). Still, you could simulate a number of moves and then use an evaluation function to decide how happy you are with the outcome, instead of simulating to the end of the game. MCTS can accommodate that.
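As a sketch of that idea: random playouts on a deliberately simplified toy model, cut off after a fixed number of moves, with an evaluation standing in for a finished game. The two-HP-counter state below is hypothetical; the real game would map its ~1000 variables down to something this cheap:

```python
import random

# Playouts on a *simplified* model of the battle, cut off after a fixed
# number of moves, falling back on an evaluation instead of playing to
# the (possibly endless) end.

def playout(ai_hp, player_hp, max_moves=50):
    for i in range(max_moves):
        if ai_hp <= 0:
            return 0.0                       # loss
        if player_hp <= 0:
            return 1.0                       # win
        dmg = random.randint(0, 10)          # simplified combat rules
        if i % 2 == 0:
            player_hp -= dmg                 # AI's move
        else:
            ai_hp -= dmg                     # player's move
    # Cutoff reached: score the unfinished state instead of playing on.
    ai_hp, player_hp = max(ai_hp, 0), max(player_hp, 0)
    return ai_hp / (ai_hp + player_hp)

estimate = sum(playout(100, 80) for _ in range(1000)) / 1000
print(f"probable win rate: {estimate:.2f}")
```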


One more architecture to consider is an artificial neural net (ANN). A few years ago I would have told you myself that this is a terrible idea, but ANNs have improved a lot lately (people use the term "deep learning", which is short for "neural nets now work"). However, they are not the magic bullet that they appear to be at first, and you need significant expertise to make them work well.

If you are interested in exploring ANNs for your game, I can go into some detail of how they can be used.

To IADaveMark:

I've finally taken some time to watch your three AI Summit videos, Dave. It was a great watch :)

The utility-based approach seems nice for behavior decisions (move, talk, shoot, find ammo, etc.), but I don't see it working well in my case, where the decisions are all based on the same motivation (kill or be killed) and the actions are very similar to each other. What you suggest is (to simplify) to evaluate my possible actions with a weighted scoring system of "considerations". I've done that already in my evaluation function: it averages hit chance and such to give me a threat value for each unit based on each of its skills.

My evaluation function basically scores the result of the action instead of the action itself, but it has the same effect (I can tell which one is best for a given turn). For now I'm subject to RNG since I create my tree based on one possible outcome of a skill, but I could apply the kind of averaging I use in my evaluation function to the tree creation. That would still leave me with some flaws, though: the deeper I search, the farther from the truth my approximation will be [I'm pretty sure I'm forgetting another big flaw here, because otherwise this would probably be a decent solution].

What bothers me more is that the utility-based AI you propose seems to reason with a task priority in mind but doesn't plan ahead. It would not be able to set up skill "combos", unlike a MiniMax (or MCTS) that would eventually discover them without needing extra knowledge about the game. For example: casting a debuff that lowers fire resistance, then casting a fire spell. A tree-based AI would test both orders and see that it's better to cast the debuff, then the fireball. It could probably be added to the utility-based AI as a consideration on the fireball, but:

  1. To be efficient, that means a ***load of considerations for my skills that are basically already in place when I simulate a new move, since my battle routine already takes fire resistance into account when applying damage.
  2. The fire debuff skill would still be hard to score unless, again, we add lots of "considerations" like: how many offensive fire spells my allies and I have, how efficient those spells are, whether they need the lowered fire resistance to be effective, etc. Again, this seems very redundant with my current Evaluation() function, which can already assess this pretty well by simulating every unit's skills on every possible target.

To Alvaro:

The reason I cannot use MCTS is that my move generation is a slow process. Most examples I've seen have a very simple game data representation that they can modify really fast to generate a new state. In my case, a BattleState is over a thousand variables (floats, not booleans). For each move I've got to save the current state, generate the new one, then revert to the old one. I'd avoid the saving/reverting if I did a full-depth playout, but it is still way too much work.

Regarding battles that may never end, that is by design. It probably won't happen very often, but it is a perfectly possible scenario. Not a big issue since, as you said, I could use my evaluation function after a while and back-propagate a win/loss from that. The thing with RNG and any tree-based structure is that to be somewhat exhaustive, I'd need to generate, for a single move, every possible combination of my different RNG factors. That is a combinatorial branching multiplier on every node. Way too much for me :)

Concerning NNs, you already convinced me to try them one or two months ago. I gave it a shot but concluded it was not for me, since I want the AI to adapt in real time (even if I throw some new content at it) without needing to build up knowledge beforehand (in our case, the NN coefficients).


In my game a "fail" just means that a skill will miss its target; it doesn't mean the loss of the game, far from it :) I don't see how weighting nodes at greater depth would give me a different result since, if I understood correctly, you would weight, for example, all nodes at depth 2 at 95%, all nodes at depth 3 at 90%, all nodes at depth 4 at 85%, etc. How does that affect which move the AI chooses?
I meant to suggest using the chance of success/failure as a multiplication factor for the sub-tree.

Say you have an 80% chance of winning. The result from the 'win' subtree thus counts for 0.8, and the result from the 'fail' side counts for 0.2. Since at each level you multiply by such a factor, significance goes down really quickly, but you explore both sides.
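In code, that weighting is just a chance node blending both outcomes by probability (expectimax-style; the values here are made up):

```python
# A chance node that blends both outcomes by probability instead of
# committing to one rolled result.

def chance_node(hit_value, miss_value, hit_chance):
    return hit_chance * hit_value + (1 - hit_chance) * miss_value

# 80% chance to land a hit worth 10; the miss branch is worth 2:
print(chance_node(10, 2, 0.8))                         # 8.4

# Nested one level deeper: far branches fade fast, since only
# 0.8 * 0.8 = 0.64 of the weight survives two hits in a row.
print(chance_node(chance_node(10, 2, 0.8), 2, 0.8))    # 7.12
```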


The utility-based approach seems nice for behavior decisions (move, talk, shoot, find ammo, etc.), but I don't see it working well in my case, where the decisions are all based on the same motivation (kill or be killed) and the actions are very similar to each other.

That's not entirely true. In either case, you have multiple actions that you can take. In some of the examples in my lectures, there were simply many types of actions. You have many types of actions as well -- they are just more focused on combat. The bottom line is that you want to figure out what the best action to take at that moment is... it doesn't matter what the type of action is.


What bothers me more is that the utility-based AI you propose seems to reason with a task priority in mind but doesn't plan ahead. It would not be able to set up skill "combos", unlike a MiniMax (or MCTS) that would eventually discover them without needing extra knowledge about the game. For example: casting a debuff that lowers fire resistance, then casting a fire spell. A tree-based AI would test both orders and see that it's better to cast the debuff, then the fireball. It could probably be added to the utility-based AI as a consideration on the fireball, but:
1. To be efficient, that means a ***load of considerations for my skills that are basically already in place when I simulate a new move, since my battle routine already takes fire resistance into account when applying damage.
2. The fire debuff skill would still be hard to score unless, again, we add lots of "considerations" like: how many offensive fire spells my allies and I have, how efficient those spells are, whether they need the lowered fire resistance to be effective, etc. Again, this seems very redundant with my current Evaluation() function, which can already assess this pretty well by simulating every unit's skills on every possible target.

You are correct that it doesn't do any sort of "planning". It is simply a single-ply reactive system, just like most other AI architectures out there (other than GOAP or HTN, which are, by definition, planners). However, even MCTS is not a planner per se. It is only scoring the fact that if you WERE to continue down this branch later, you would get these benefits. That's really no different from the other architectures (mine included), where you are going to be reevaluating the next time around anyway. Either way, you are coming up with a way of determining that doing A is a good idea because you could follow up with B -- which is preferable to doing B without A (or B then A). All of this can be codified quite easily through the addition of one or more considerations in my IAUS architecture.
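As an invented illustration of that last point, the debuff/fireball combo from earlier could be folded into single-ply scores like this (all names and weights made up, not IAUS itself):

```python
# The debuff scores high while resistance is up and fire follow-ups exist,
# and drops to zero once the resistance is stripped, at which point the
# fireball's own score takes over: A-then-B emerges from reactive scoring.

def fireball_score(target_fire_res):
    return 60 * (1.0 - target_fire_res)

def debuff_score(target_fire_res, my_fire_spell_count):
    return 120 * target_fire_res * min(my_fire_spell_count, 3) / 3

print(debuff_score(0.5, 2), fireball_score(0.5))  # 40.0 > 30.0: debuff first
print(debuff_score(0.0, 2), fireball_score(0.0))  # 0.0 < 60.0: then fireball
```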

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

This topic is closed to new replies.
