AI Scripting for RTS
Hi, I'm currently planning/designing the high-level AI scripting language for a historically-based real-time strategy game called 0 A.D. It's similar to Age of Empires, Warcraft, and Command & Conquer. I'm relatively new to AI, although I feel I've learned enough to do this job well. I'm taking an object-oriented approach using Mozilla's JavaScript engine (SpiderMonkey), and it will make use of several behavioral design patterns that fit remarkably well with the AI techniques I plan to adopt.

I may be lacking some understanding in what I'm about to say, but the plan is to create a finite state machine manipulated by hierarchical control. This would require an expert system that looks up state and anticipates transitions to build a decision tree, which could then be analyzed using Boolean or fuzzy logic. The hierarchy will probably mirror military ranks, with the addition of an economic node. For example, the generals are responsible for what-if analysis and strategy, while individual unit AI handles very specific questions: where do I go? Do I attack? Am I just scouting?

I came up with this after spending some time reading articles here and in many other places. The hierarchy idea isn't new, but I've never seen it implemented. Basically, each unit reports to the next node up, which makes decisions and/or reports to the next node above it, and so on until the root node (probably the General) is reached. The root node uses the reports it receives to give orders to its child nodes, which eventually get carried out by the units. So, what do you guys think about this?

In addition, I'd like to somehow give the AI the ability to learn, even in a limited way. It could be automated or it could be exposed through a scripting API somehow, but I don't really know where to start or how it can be worked into the above plan. Thanks for any input :).
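A minimal sketch of that chain-of-command idea in JavaScript, just to make the reporting/ordering flow concrete. All names here (CommandNode, report, issueOrders) are hypothetical and not part of any existing 0 A.D. API.

```javascript
// Each node in the command hierarchy has a parent (null for the root/"General"),
// children, and a list of reports received from below.
function CommandNode(name, parent) {
    this.name = name;
    this.parent = parent || null;
    this.children = [];
    this.reports = [];
    if (parent)
        parent.children.push(this);
}

// A child pushes an observation up the chain; each node stores it and
// forwards it towards the root.
CommandNode.prototype.report = function (info) {
    this.reports.push(info);
    if (this.parent)
        this.parent.report(info);
};

// The root (or any intermediate node) pushes orders back down the tree;
// leaves are the individual units that actually carry them out.
CommandNode.prototype.issueOrders = function (order) {
    for (var i = 0; i < this.children.length; i++)
        this.children[i].issueOrders(order);
    if (this.children.length === 0)
        print(this.name + " executes order: " + order);
};

// Usage: a General with one Captain commanding two units.
var general = new CommandNode("General");
var captain = new CommandNode("Captain", general);
var unitA   = new CommandNode("Spearman", captain);
var unitB   = new CommandNode("Scout", captain);

unitA.report({ type: "enemySighted", where: [120, 45] });
if (general.reports.length > 0)
    general.issueOrders("attack");
```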
We use neural networks for a "learning" AI in our upcoming MMORPG. Have you given that any thought?
Dark Destiny - Probably your final Cyberpunk-Experience....
I can barely read about neural networks, let alone understand them well enough to implement something like that, so it's out of the question. I prefer to keep complications to a minimum, because I'm focusing on ease of use for the poor souls who'd like to script their own AI. It's not an in-team-only feature.
One of the first things to figure out when adding learning is what you want your system to train. Given your idea, the fuzzy logic operators might be an option. How precisely they are learned is often not that important, but a relatively straightforward gradient descent is a starting point at least.
To illustrate a bit, consider the following:
Say you have a fuzzy logic rule to determine whether to fire your siege weapons at an enemy army, and that the rule depends (among other things) on the distance to the enemy. Something like: IF distance_is_medium THEN fire_catapult.
Say that you have a triangular membership function for distance_is_medium. Then each time the distance_is_medium antecedent in the rule fires with some value > 0, you record the final decision made and how you evaluate that decision.
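A rough sketch of what that could look like in script; the distance values (300/500/700) and the log structure are made up for illustration.

```javascript
// Triangular membership function: 0 outside [left, right], 1 at the peak,
// linear slopes in between.
function triangular(x, left, peak, right) {
    if (x <= left || x >= right)
        return 0;
    if (x <= peak)
        return (x - left) / (peak - left);
    return (right - x) / (right - peak);
}

// IF distance_is_medium THEN fire_catapult
function fireCatapultDegree(distance) {
    return triangular(distance, 300, 500, 700); // degree in 0..1
}

// Whenever the antecedent fires with a value > 0, record the decision
// so it can be evaluated later.
var decisionLog = [];
var distance = 420;
var degree = fireCatapultDegree(distance);
if (degree > 0)
    decisionLog.push({ rule: "fire_catapult", degree: degree, distance: distance });
```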
This evaluation is the next (very) hard point. You can give manual feedback to the system whenever it asks for it (obviously not suitable for a game shipped to end users), you can provide feedback for all actions after the entire battle or game is complete (which for this situation doesn't seem useful), you can give instant feedback based on where the shot landed (which doesn't take the global picture into account, only the local view from the catapult), or something in between. In short, this is often very hard and will probably require experimentation.
Once you have an evaluation of the action, you can decide to change it. Gradient descent, which is also at the core of the backpropagation algorithm for neural networks, is easily visualized as a graph on which the current value you want to optimize is a point. To move that point to a better (lower) value, you step in the direction of steepest descent. How far you move is determined by what is commonly called a learning rate: the lower the learning rate, the smaller the steps you take and the less chance you have of overstepping the bottom of the slope.
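A toy gradient-descent step on a single parameter, just to show the role of the learning rate; the quadratic error function here is invented for illustration.

```javascript
// One gradient-descent step: move against the gradient, scaled by the
// learning rate. Smaller learning rates mean smaller, safer steps that are
// less likely to overshoot the minimum.
function gradientStep(value, gradient, learningRate) {
    return value - learningRate * gradient;
}

// e.g. error = (value - target)^2, so gradient = 2 * (value - target)
var target = 500, value = 650, learningRate = 0.1;
for (var i = 0; i < 5; i++) {
    var gradient = 2 * (value - target);
    value = gradientStep(value, gradient, learningRate);
}
// value has moved from 650 towards 500 in progressively smaller steps
```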
Put into context: if you figure out that the catapult fired when the enemy was too close, you adjust the membership function so it shifts towards a longer-range median. You can either move the entire triangle or adjust its slopes.
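A sketch of that adjustment, continuing the triangular membership function from above; the field names and the sign convention for the error are assumptions.

```javascript
// Current definition of distance_is_medium as a triangle.
var distanceIsMedium = { left: 300, peak: 500, right: 700 };

// error > 0 means "we fired at too short a range"; error < 0 the opposite.
// Here the whole triangle is shifted; adjusting left/right independently
// would change the slopes instead.
function adjustMembership(mf, error, learningRate) {
    var shift = learningRate * error;
    mf.left  += shift;
    mf.peak  += shift;
    mf.right += shift;
}

adjustMembership(distanceIsMedium, 50, 0.2); // nudges the triangle 10 units outward
```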
Again, training these low-level things will probably require quite a bit of experimentation. You might find that scripts plus some rand() do the trick just as well (or better) from the player's point of view. Training higher-level parameters, such as what kinds of units to favor or how aggressively to play, might be easier.
I hope this at least gave you an idea of how it can be done. I am convinced there are far better ways of doing it :)
"I hope this at least gave you an idea of how it can be done. I am convinced there are far better ways of doing it." Hopefully there is ;). I understand that I have to somehow remember decisions and evaluate their relative outcomes, but how to do it extensibly with little complication is an issue.
For me, it's not what I want the system to train; it's how I enable scripters to decide the training. Maybe the learning can be based on the hierarchy, which means it would learn at different levels and simplify the process, but the interface/implementation is still a mystery to me. Remembering decisions is the easy part; evaluating them, as you said, gets tricky, and modifying the evaluations seems even harder.
I guess you helped me specify my queries much better:
How exactly do I evaluate situations using the type of system described above, what does an evaluation look like, and how do I go about modifying them?
Your examples of how to evaluate just don't fit into the whole RTS model.
Okay, how does this sound? When the AI makes a decision, the system triggers observers that keep track of state. Is the ending state stored if positive, if negative, or both? A negative state could act as a filter against the generated decision tree, while a positive state could act as a vote for a specific decision route. I'm not sure if this makes sense or if I'm onto something. If I am onto something, is there a clear name for what this is that I could use to expand on it?
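A small sketch of that observer idea, under my own reading of it: every decision registers an observer, and when the observed situation resolves, the outcome is folded back into a score for that decision route. Positive outcomes vote for the route; negative ones vote against it, which effectively filters it out later. All names here are hypothetical.

```javascript
var routeScores = {};   // decision route -> accumulated votes

function recordDecision(route) {
    return {
        route: route,
        // called by the game when the observed situation resolves
        onResolved: function (outcomeWasPositive) {
            if (!(route in routeScores))
                routeScores[route] = 0;
            routeScores[route] += outcomeWasPositive ? 1 : -1;
        }
    };
}

// Prefer the candidate route with the most positive votes so far.
function chooseRoute(candidateRoutes) {
    candidateRoutes.sort(function (a, b) {
        return (routeScores[b] || 0) - (routeScores[a] || 0);
    });
    return candidateRoutes[0];
}

// Usage: the flanking attack is observed, then later judged a success.
var observer = recordDecision("flankLeft");
observer.onResolved(true);
```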
What you describe there (if I understand you correctly) is what is called reinforcement learning. You provide your agent with a "delayed reward" in a final state and use that to reward all the actions that led to that state. The problem is deciding how to change each of the actions, i.e. which action is most "responsible" for achieving that reward.
An algorithm called Q-learning can help you here. In short, it discounts the reward given to an action the further that action is from the reward state. You can then use the discounted reward to bias the node in your decision tree where that decision was made. As mentioned in the previous post, you can do this biasing by changing the relevant membership functions at that node.
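To make the discounting concrete, here is a minimal illustration of spreading a delayed reward back along the sequence of decisions that led to it, in the spirit of Q-learning. The per-action update shown (a simple discounted credit added to a bias) is a deliberate simplification, not the full Q-learning update rule, and all names are made up.

```javascript
var gamma = 0.9;          // discount factor
var learningRate = 0.1;

// Actions taken this episode, oldest first; each carries a 'bias' that the
// corresponding decision-tree node would consult next time it chooses.
var episode = [
    { name: "advanceCatapults",  bias: 0 },
    { name: "fireAtMediumRange", bias: 0 },
    { name: "retreatBehindWalls", bias: 0 }
];

// The last action gets the most credit; earlier ones get gamma^k less,
// where k is the number of steps between the action and the reward.
function propagateReward(actions, finalReward) {
    for (var k = 0; k < actions.length; k++) {
        var stepsFromEnd = actions.length - 1 - k;
        var discounted = finalReward * Math.pow(gamma, stepsFromEnd);
        actions[k].bias += learningRate * discounted;
    }
}

propagateReward(episode, 1.0); // e.g. the battle was won
```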