Q-learning
I want to use this kind of algorithm to make a maze with one agent finding its way to the goal; of course there are a lot of obstacles set up in the maze. The agent will move randomly and, through experience, find the shortest way to reach the goal.
Could anybody here help me or give me any suggestions and instructions, please? Thank you so much.
(nice teeth)
W.F.Disneyland
Whenever I want to cry, I think of forums and chitchat
January 06, 2003 10:33 PM
Well, the simplest approach for a simple one-entrance, one-exit maze is the right-hand rule. Think of it like this: you are walking through the maze and always keep your right hand on the right wall. If you keep moving forward, following the right wall, eventually you will exit the maze, unless there is a loop in the maze arranged in a certain way, but that is unlikely and would more or less have to be engineered that way.
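In code it might look roughly like this Python sketch (the grid layout, the 0/1/2 cell codes and the function name are just my own illustration, and it assumes the maze edge is all wall):

```python
# Rough sketch of the right-hand rule: 0 = open, 1 = wall, 2 = exit.
# Assumes the outer border of the maze is solid wall, so no bounds checks.
DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left (row, col)

def right_hand_walk(maze, start, facing=1, max_steps=100000):
    r, c = start
    for _ in range(max_steps):
        if maze[r][c] == 2:
            return True  # found the exit
        # Prefer turning right, then straight, then left, then turning back.
        for turn in (1, 0, -1, 2):
            d = (facing + turn) % 4
            nr, nc = r + DIRS[d][0], c + DIRS[d][1]
            if maze[nr][nc] != 1:  # not a wall, step that way
                facing, r, c = d, nr, nc
                break
    return False  # gave up (e.g. the exit sits inside a loop we can't reach)
```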
Hope that helps.
Thank you for your help!!!
But my problem is different. Basically, mine is taken from Sutton's book (Reinforcement Learning: An Introduction). The agent has 4 directions to move in randomly (up, down, left, right), and every time it reaches the goal, the Q-values will be updated. Could you please point out some useful web pages for me to solve this problem? Thank you.
Whenever I want to cry, I think of forums and chitchat
Create an array that, for each map cell, can store a Q-value (a float) for each possible action in that cell.
This array will contain the Q-values. Initially they are all set to 0.
Start with your character moving randomly.
If your character reaches a trap, the reward is -1 (or some negative number), and if the character reaches the exit, the reward is 1.
Each time your character moves to a new state, update the Q-values:
Q'(s,a) = (1-r)*Q(s,a) + r*[reward + y*max_b Q(s',b)]
(I hope I've remembered the equation correctly; I haven't used reinforcement learning in a long time.)
where:
s is the current state
a is the action you take (move-left, move-right, etc.)
s' is the state you reach after applying the action a
reward is the reward obtained in state s' (-1 for traps, 1 for goals and 0 otherwise)
Q(s,a) was the previous Q value for the state s and action a
Q'(s,a) is the updated Q value for state s and action a
0 < r < 1 is the learning rate
0 < y < 1 is the discount factor
max_b Q(s',b) is the estimated value of the new state s', computed as the maximum Q value over all actions b you could take in s'.
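In code, the update might look roughly like this minimal Python sketch (the names, the dict-of-lists Q table and the default lr/gamma values are just my assumptions, not from the book):

```python
# A state is just a (row, col) cell; the four actions are indices 0..3.
ACTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

def q_update(Q, s, a, reward, s_next, lr=0.1, gamma=0.9, terminal=False):
    """One step of Q'(s,a) = (1-lr)*Q(s,a) + lr*(reward + gamma * max_b Q(s',b))."""
    best_next = 0.0 if terminal else max(Q[s_next])   # best Q over actions in the new state
    Q[s][a] = (1 - lr) * Q[s][a] + lr * (reward + gamma * best_next)

# Q could be a dict mapping each cell to a list of 4 floats, all starting at 0, e.g.:
# Q = {(r, c): [0.0] * 4 for r in range(height) for c in range(width)}
```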
To make the system learn, use a factor e (the exploration-exploitation factor). Initially set it to 1, and decrease it over time until it reaches a small value (but never 0); 0.1 or 0.05 will do. At each step, with probability e pick a random action, and with probability 1-e pick the best action (highest Q value). This way, at the beginning (high e) the character explores random states (in order to learn), and as time passes (and e decreases) it stops making random moves and starts choosing the best actions.
It is important not to set e=0, so the system always has a chance to keep learning.
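Roughly, the action selection with a decaying e could look like this (again just a sketch, continuing the names from the snippet above; the decay rate, episode count and the 0.05 floor are arbitrary choices of mine):

```python
import random

def choose_action(Q, s, epsilon, n_actions=4):
    """With probability epsilon explore (random action), otherwise exploit (highest Q)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return Q[s].index(max(Q[s]))

# Decay epsilon from 1.0 toward a small floor, but never all the way to 0,
# so the agent always keeps a little exploration.
epsilon = 1.0
for episode in range(5000):
    # ... run one episode here: pick moves with choose_action(),
    #     apply them, and call q_update() after every step ...
    epsilon = max(0.05, epsilon * 0.999)
```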
I hope this helps. But I have to tell you: I don't think you are going to get it to learn the exit of BIG mazes... reinforcement learning is SLOW when the number of states is big...
cheers
[edited by - popolon on January 7, 2003 10:26:12 AM]
I did something like this a long time ago.
You have a maze in one array. Empty square = 0, wall = 1, exit = 2.
In another array, same size as maze, you have all 0s.
Put your guy somewhere in the maze (the entrance). Move him randomly n times (where n is just some number so that you don't wait forever). To make things run a little faster, don't let him move back onto the square he just came from.
When the guy hits the exit, go into your second array and, for every square the guy moved over in the maze, increment the corresponding element by 1. If the guy doesn't hit the exit, ignore that run and repeat.
You can keep doing this for 100 iterations or so, and eventually the shortest path will be the squares in the second array with the highest scores.
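Something like this rough Python sketch (the 0/1/2 cell codes follow the arrays above; the function names, the 500-step cap per walk and the assumption of a solid wall border are mine):

```python
import random

DIRS = [(-1, 0), (0, 1), (1, 0), (0, -1)]  # up, right, down, left

def random_walk(maze, start, max_steps=500):
    """Wander randomly; return the visited squares if we hit the exit, else None."""
    r, c = start
    prev = None
    path = [(r, c)]
    for _ in range(max_steps):
        if maze[r][c] == 2:
            return path
        # Legal moves: not a wall, and not the square we just came from.
        moves = [(r + dr, c + dc) for dr, dc in DIRS
                 if maze[r + dr][c + dc] != 1 and (r + dr, c + dc) != prev]
        if not moves:
            if prev is None:
                return None      # boxed in right at the start
            moves = [prev]       # dead end: the only way out is back
        prev, (r, c) = (r, c), random.choice(moves)
        path.append((r, c))
    return None                  # never reached the exit within max_steps

def count_successful_paths(maze, start, iterations=100):
    """visits[r][c] = how many successful random walks passed over square (r, c)."""
    visits = [[0] * len(maze[0]) for _ in range(len(maze))]
    for _ in range(iterations):
        path = random_walk(maze, start)
        if path:                          # ignore walks that never found the exit
            for r, c in set(path):        # count each square once per walk
                visits[r][c] += 1
    return visits
```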
Cheers,
Will
------------------http://www.nentari.com