
Do you think this is do-able (long)?

Started by October 15, 2004 03:27 PM
9 comments, last by hatch22 20 years, 1 month ago
ST: It appears we have had yet another (perfectly reasonable) misunderstanding. When I said noise, I was not referring to sound (as compared to sight), but rather to the more mathematical use of the word noise, meaning a random and persistent disturbance that obscures or reduces the clarity of a signal. Basically, the farther out you go, the more the precision of ALL your senses degrades (including sight). I apologize for momentarily slipping into math and physics lingo. I am an engineer, and despite my training in avoiding jargon unless it is accompanied by sufficient explanation, I do make mistakes. My bad.
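To make the "noise grows with distance" idea concrete, here is a rough sketch of how a perceived position might be blurred. The linear scaling, the `noise_per_unit` value, and the function name are all assumptions made up for illustration, not something taken from an actual implementation.

```python
import math
import random

def perceive_position(observer, target, noise_per_unit=0.02, rng=random):
    """Return a noisy estimate of target's 2D position as seen from observer.

    Assumption: the noise standard deviation grows linearly with distance.
    noise_per_unit is a hypothetical tuning value, not a measured one.
    """
    dx = target[0] - observer[0]
    dy = target[1] - observer[1]
    distance = math.hypot(dx, dy)
    sigma = noise_per_unit * distance  # farther away means a blurrier reading
    return (target[0] + rng.gauss(0.0, sigma),
            target[1] + rng.gauss(0.0, sigma))
```

The same function could be applied to any sense, since it only cares about the distance between the observer and the thing being sensed.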

The mouse speed issue actually occurred to me after finishing my last post. It is a difficult problem to tackle because different players prefer different mouse sensitivities. Suffice it to say that the agent will not be able to aim any faster than would be possible with the mouse sensitivity turned up all the way. This would at least put an upper limit on how quickly it can turn. Applying a noise signal to the aiming direction (so that it shakes slightly) should also prevent perfect aim at long range. More complex solutions may become apparent later.
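As a minimal sketch of the two limits described above (a capped turn rate plus a small shake on the aim direction), something like the following could work. The constants `MAX_TURN_RATE` and `JITTER_STD` are placeholder values chosen for illustration, and Gaussian jitter is just one possible choice of noise signal.

```python
import math
import random

MAX_TURN_RATE = math.radians(720.0)  # hypothetical cap, radians per second
JITTER_STD = math.radians(0.5)       # hypothetical aim shake, radians

def wrap_angle(a):
    """Wrap an angle into the range [-pi, pi)."""
    return (a + math.pi) % (2.0 * math.pi) - math.pi

def step_aim(current, target, dt, rng=random,
             max_rate=MAX_TURN_RATE, jitter_std=JITTER_STD):
    """Advance the aim angle toward target for one frame of length dt."""
    error = wrap_angle(target - current)
    max_step = max_rate * dt
    # Clamp the turn so the agent never rotates faster than the cap allows.
    step = max(-max_step, min(max_step, error))
    # Add a small random shake so the aim is never perfectly steady.
    noise = rng.gauss(0.0, jitter_std)
    return wrap_angle(current + step + noise)
```

Because the shake is an angle, the resulting miss distance grows with range (roughly range times the tangent of the angular error), so the same small jitter matters much more against distant targets.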

I did indeed misinterpret what you meant by misdirection and manipulation, and your clarification is noted. However, I don't think I was too far off the mark, since I demonstrated that the agent is unable to do anything more than try to predict what might happen. Prediction with certainty is quite impossible with incomplete knowledge of the environment. Thankfully, it is precisely in situations with incomplete knowledge of the nature of the environment that RL performs better than other learning algorithms.

Also, for a really good explanation and demonstration of reinforcement learning that I just found, go here. The cat and mouse Java applet is fun to play with. If anyone is interested, my particular approach to reinforcement learning is a Sarsa algorithm using a softmax policy. If you don't understand what I just said, check out that site.
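For anyone who wants to see what "Sarsa with a softmax policy" looks like in practice, here is a small tabular sketch on a toy corridor. This is not the agent's actual code: the corridor environment, the function names, and the values of `alpha`, `gamma`, and `temperature` are all assumptions picked to keep the example short.

```python
import math
import random
from collections import defaultdict

def softmax_action(q, state, actions, temperature, rng):
    """Pick an action with probability proportional to exp(Q / temperature)."""
    prefs = [q[(state, a)] / temperature for a in actions]
    m = max(prefs)  # subtract the max for numerical stability
    weights = [math.exp(p - m) for p in prefs]
    return rng.choices(actions, weights=weights, k=1)[0]

def sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done):
    """On-policy update: the target uses the action actually chosen next."""
    target = r if done else r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])

def train_corridor(episodes=200, n_states=5, alpha=0.5, gamma=0.9,
                   temperature=0.5, seed=0):
    """Toy task (hypothetical): walk right along a corridor to reach the goal."""
    rng = random.Random(seed)
    actions = [-1, 1]          # step left or step right
    goal = n_states - 1
    q = defaultdict(float)     # Q-values start at zero
    for _ in range(episodes):
        s = 0
        a = softmax_action(q, s, actions, temperature, rng)
        for _ in range(10_000):  # safety cap on episode length
            s_next = min(max(s + a, 0), goal)
            done = s_next == goal
            r = 1.0 if done else 0.0
            a_next = softmax_action(q, s_next, actions, temperature, rng)
            sarsa_update(q, s, a, r, s_next, a_next, alpha, gamma, done)
            if done:
                break
            s, a = s_next, a_next
    return q
```

Calling `train_corridor()` returns the learned Q-table. The temperature controls how random the policy is: a high temperature makes the action choice nearly uniform, while a low one makes it nearly greedy.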

