Do you think this is do-able (long)?

hatch22 · 2004-10-16T19:46:38

Greetings everyone. I wanted to ask what you all thought of a project that a friend and I have started working on. If this description gets long, I apologize in advance; I simply want to make sure we are all on the same page as much as possible.

I have been studying reinforcement learning (particularly Temporal Difference algorithms), especially in conjunction with function approximators such as ANNs for approximating the value function (see an example and code). I want to see if this strategy can actually be adapted to an FPS-style fighting game in real time. Both my friend and I are experienced programmers and are using existing engines to create a 3D environment for the agent. To keep things simple, the agent will be a companion to the player, and all other opponent AI will use an FSM. This way the expensive iterations through the ANN are kept to a minimum.

Most of the examples of reinforcement learning I have seen applied to a game environment (or most other environments, for that matter) learn entirely on their own through sheer trial and error, improving only after a great many iterations. I am going to try to speed up the learning process in two ways.

First and most importantly, in addition to the agent training the ANN by adjusting the weights based on reward, I also want the ANN to be updated by the player's behavior. The agent has the same interface to the game as the player (i.e. cursor position, movement controls, etc.), and when the player performs actions and receives a reward, the ANN is updated in the same manner as if the agent had performed the action. This allows good strategies performed by the player to be taught to the agent via the value function approximated by the ANN. Also, the player's avatar and the agent's avatar are very similar (though not identical), so most of the states that the player encounters are likely to be experienced by the agent as well.

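
To make the "update the ANN from the player's actions too" idea concrete, here is a minimal sketch. It assumes a linear approximator as a stand-in for the ANN and a Sarsa-style TD update; the class name `LinearQ`, the feature vectors, and the reward values are all hypothetical and only for illustration. The point is that the same update is called whether the transition came from the player or from the agent.

```python
import numpy as np

class LinearQ:
    """Linear stand-in for the ANN: Q(s, a) = w[a] . features(s)."""

    def __init__(self, n_features, n_actions, alpha=0.1, gamma=0.9):
        self.w = np.zeros((n_actions, n_features))
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def q(self, s, a):
        return float(self.w[a] @ s)

    def sarsa_update(self, s, a, r, s_next, a_next, done=False):
        """One semi-gradient Sarsa step; returns the TD error."""
        target = r if done else r + self.gamma * self.q(s_next, a_next)
        td_error = target - self.q(s, a)
        self.w[a] += self.alpha * td_error * s
        return td_error

value_fn = LinearQ(n_features=3, n_actions=2)
s = np.array([1.0, 0.0, 0.5])       # hypothetical state features
s_next = np.array([0.0, 1.0, 0.5])

# Transition observed from the player's input (made-up values).
value_fn.sarsa_update(s, a=1, r=1.0, s_next=s_next, a_next=0)
# Transition generated by the agent itself, fed through the same update.
value_fn.sarsa_update(s, a=0, r=-1.0, s_next=s_next, a_next=1)
```

After these two updates the shared value function prefers the action the player was rewarded for in state `s`, even though the agent never tried it.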
In effect the agent learns by example, but retains the ability to experiment on its own and generalize from past experience using the ANN. You might think of it as a kind of real-time semi-supervised learning.

The second strategy in my design is to separate the AI into two parts. The low level AI is the one described above, learning by directly imitating the player's keyboard and mouse input based on the state the player or agent finds themselves in. Since it would be difficult to learn any long-term strategies this way, there is a second layer to the AI that monitors the human player's behavior in terms of what path the player selects to follow given the player's state. This AI is aware of what parts of the level have been explored and what parts of the level are currently visible, using a waypoint grid. The possible paths that the player might take are determined by using a combined version of Dynamic A* Lite (for search speed) and Tactical A*. The high level AI observes what paths the player takes and what rewards are received for those choices, once again applying reinforcement learning and generalizing with an ANN. The agent's behavior is also monitored in the same manner, and the low level AI is rewarded by the high level AI if it stays on the appropriate path selected by the high level AI's ANN (more info here and here).

So at a more strategic level the agent learns to mimic the player when the player performs actions that are beneficial, and to avoid behaviors that are detrimental. Since the goal of such an agent is not to be the most intelligent, but instead to appear human, I believe that this approach of learning by example has merit. The agent will be able to find state-action paths that lead to reward more quickly by following the example of the human companion, but it will not be prevented from performing its own experimentation and will be able to learn on its own if the player is absent.

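
The reward that the high level AI hands to the low level AI for staying on the selected path could be as simple as the sketch below. This is only an assumption about how such a signal might be computed: the function name `path_following_reward`, the distance tolerance, and the bonus and penalty values are all hypothetical, and waypoints are treated as plain 2D points.

```python
import math

def path_following_reward(position, path, tolerance=1.0, bonus=0.1, penalty=-0.1):
    """Return `bonus` if `position` is within `tolerance` of any waypoint
    on the selected path, otherwise `penalty`."""
    nearest = min(math.dist(position, waypoint) for waypoint in path)
    return bonus if nearest <= tolerance else penalty

# Hypothetical path chosen by the high level AI.
path = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]
on_path = path_following_reward((2.3, 0.4), path)   # 0.5 units from a waypoint
off_path = path_following_reward((2.0, 3.0), path)  # 3.0 units from the nearest
```

This value would be added to whatever reward the low level AI receives from the game itself, so the low level layer is nudged toward the path without being forced onto it.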
Breaking up the AI into a hierarchy of high and low level reinforcement learning segments has been shown in other research to provide slightly suboptimal results in exchange for a large reduction in state space and run time. Since the goal is not optimal behavior but believable behavior, I believe this is an acceptable trade-off. The reinforcement learning AI will then consist of two smaller ANNs, one per AI level (rather than one huge one larger than the two combined), and four iterations of the same reinforcement learning algorithm (one for each AI level and one for each avatar). Iterations will also be staggered across frames so that the entire system does not run every frame and kill the frame rate. It is our hope that such a system, dealing with only one agent and one human player along with a small number of simpler opponents, will allow real-time performance.

That is the idea of what we are attempting. Do any of you see any potential pitfalls or alternatives that I should explore? Do you think that such a system is too complex to run in real time on a high-end machine? Do you have any questions regarding this approach or need clarification? I would basically like any and all constructive feedback anyone wishes to give. I am fully aware that this approach may not work at all in practice, but do you think that the theory is sound, or am I overlooking something important? Any suggestions for improvements?

If anyone would like to see more sources, just ask and I will provide them. For an overview, check out the reinforcement learning survey. Thank you for taking the time to read all this.

[Edited by - hatch22 on October 15, 2004 3:45:41 PM]
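
On staggering the iterations across frames: a round-robin schedule is one simple way to do it. The sketch below is an assumption, not a description of our engine; `StaggeredScheduler` and the job names are hypothetical, and the four jobs are read here as two AI levels for each of two avatars.

```python
class StaggeredScheduler:
    """Runs exactly one learner per frame, in round-robin order."""

    def __init__(self, learners):
        self.learners = list(learners)
        self.frame = 0

    def tick(self):
        """Advance one frame and run the single learner whose turn it is."""
        learner = self.learners[self.frame % len(self.learners)]
        self.frame += 1
        return learner()

# Four hypothetical update jobs; each just records that it ran.
log = []
names = ("low/agent", "low/player", "high/agent", "high/player")
jobs = [lambda name=name: log.append(name) or name for name in names]

scheduler = StaggeredScheduler(jobs)
for _ in range(6):  # six simulated frames
    scheduler.tick()
```

Each learner then runs once every four frames, so the per-frame cost is one network update instead of four.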

Artificial Intelligence Programming

Started by hatch22 October 15, 2004 03:27 PM

9 comments, last by hatch22 20 years, 1 month ago

hatch22

October 16, 2004 07:46 PM

ST: It appears we have had yet another (perfectly reasonable) misunderstanding. When I said noise, I was not referring to sound (as compared to sight), but rather to the more mathematical use of the word: a random and persistent disturbance that obscures or reduces the clarity of a signal. Basically, the farther out you go, the more the precision of ALL your senses degrades (including sight). I apologize for momentarily slipping into math and physics lingo. I am an engineer, and despite my training in avoiding jargon unless it is accompanied by sufficient explanation, I do make mistakes. My bad.

The mouse speed issue actually occurred to me after finishing my last post. It is a difficult problem to tackle because different players prefer different mouse sensitivities. Suffice it to say that the agent will not be able to aim any faster than would be possible with the mouse sensitivity turned up all the way. This would at least give it an upper limit. The use of a noise signal applied to the aiming direction (so that it shakes slightly) should avoid perfect aim at long range as well. More complex solutions may become apparent later.
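
A minimal sketch of that aiming limit, assuming angles in degrees, a fixed maximum turn per update standing in for the highest mouse sensitivity, and Gaussian jitter as the noise signal. The function name `step_aim` and all the numbers are hypothetical.

```python
import random

def step_aim(current, target, max_turn, jitter_std, rng=random):
    """Turn the aim angle toward `target` by at most `max_turn` degrees,
    then add Gaussian jitter so the aim shakes slightly."""
    # Shortest signed turn from current to target, in (-180, 180].
    delta = (target - current + 180.0) % 360.0 - 180.0
    # Cap the turn rate: the agent cannot swing faster than the limit.
    delta = max(-max_turn, min(max_turn, delta))
    return (current + delta + rng.gauss(0.0, jitter_std)) % 360.0

rng = random.Random(0)  # seeded so the run is repeatable
aim = 350.0
for _ in range(3):
    aim = step_aim(aim, 10.0, max_turn=5.0, jitter_std=0.5, rng=rng)
```

Because the jitter is angular, the same small shake produces a larger miss distance the farther away the target is, which is the long-range effect I am after.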

I did indeed misinterpret what you meant by misdirection and manipulation, and your clarification is noted. However, I don't think I was too far off the mark, since I demonstrated that the agent is unable to do anything more than try to predict what might happen. Prediction with certainty is quite impossible with incomplete knowledge of the environment. Thankfully, it is in situations with incomplete knowledge of the nature of the environment that RL performs better than other learning algorithms.

Also, for a really good explanation and demonstration of reinforcement learning that I just found, go here. The cat and mouse Java applet is fun to play with. If anyone is interested, my particular approach to reinforcement learning is a Sarsa algorithm using a softmax policy. If you don't understand what I just said, check out that site.
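
For anyone who wants to see Sarsa with a softmax policy in a few lines, here is a toy sketch. It is tabular rather than ANN-based, and the corridor environment, the function names, and the parameter values are all hypothetical; it is only meant to show the shape of the algorithm, not the project's code.

```python
import math
import random

def softmax_action(q_values, temperature, rng):
    """Sample an action with probability proportional to exp(Q / temperature)."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    return rng.choices(range(len(q_values)), weights=weights)[0]

def sarsa_chain(episodes=500, n_states=5, alpha=0.1, gamma=0.9,
                temperature=0.5, seed=0):
    """Tabular Sarsa on a toy corridor. Action 1 moves right, action 0
    moves left; reaching the rightmost state gives reward 1 and ends
    the episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        a = softmax_action(q[s], temperature, rng)
        while s != goal:
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            a_next = softmax_action(q[s_next], temperature, rng)
            # On-policy update: bootstrap from the action actually chosen
            # next. The goal row is never updated, so it contributes zero.
            q[s][a] += alpha * (r + gamma * q[s_next][a_next] - q[s][a])
            s, a = s_next, a_next
    return q

q_table = sarsa_chain()
```

A lower temperature makes the policy greedier, while a higher one makes it explore more, which is the knob that trades imitation of known good behavior against experimentation.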
