I'm not understanding this very well.
Trial and error vs. past cases... but if you want to learn something from trial and error, you'll save the goddamn tries (which in my brain can be seen as past cases, which in my brain is the same thing as a state).
i.e.
When the player was at that position, I shot that, he shot that, I died, the player lives;
Learned that I shouldn't shoot that when the player is at that position with that weapon.
Save the state/case for reference the next time the same thing happens.
My knowledge of the subject is very limited, as you can see.
What's the difference between Reinforcement Learning and Case-Based Reasoning?
I think there are two answers.
1. "Reinforcement learning" tends to involve a Dynamic Programming step that propagates information from neighboring states.
2. "Reinforcement learning" is typically "eager learning," (either of a policy, or of something that implies a policy, like a cost-to-go function or q-table) whereas "case-based reasoning" is typically "lazy learning."
1. "Reinforcement learning" tends to involve a Dynamic Programming step that propagates information from neighboring states.
2. "Reinforcement learning" is typically "eager learning," (either of a policy, or of something that implies a policy, like a cost-to-go function or q-table) whereas "case-based reasoning" is typically "lazy learning."
I realize my reply may not have been very helpful, so let me be more specific by giving examples of how each method can work:
Case-based reasoning
Save a state trajectory -- a "replay" -- of every game that you play to a database. Then, when the player is at state x, look in your database for saved state trajectories that pass near x. Then, take whatever action was performed in the most successful trajectory (i.e., the one that has the best final state).
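For concreteness, here is a minimal Python sketch of that lookup, assuming states are plain numeric vectors and each saved replay carries a final score; the Trajectory class, distance function, and radius threshold are illustrative choices, not anything standard:

```python
# A minimal sketch of the case-based approach described above.
# The Trajectory class, state vectors, distance function, and radius
# are illustrative assumptions, not a specific library's API.
import math

class Trajectory:
    def __init__(self, states, actions, final_score):
        self.states = states            # list of state vectors, e.g. (x, y, health, ...)
        self.actions = actions          # action taken at each state
        self.final_score = final_score  # how well that game ended

def distance(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def choose_action(case_base, current_state, radius=1.0):
    """Find saved trajectories that pass near the current state and
    reuse the action from the most successful one."""
    best_action, best_score = None, -math.inf
    for traj in case_base:
        for state, action in zip(traj.states, traj.actions):
            if distance(state, current_state) <= radius and traj.final_score > best_score:
                best_action, best_score = action, traj.final_score
    return best_action  # None means "no similar case yet; fall back to a default"
```

Note that all the learning here is "lazy": nothing is computed until the lookup happens at play time.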
Reinforcement learning
Store a "how good is this state" number for each state. This is called the "value" of a state. Then,
- To select an action during play, take a look at the different states you can get to, and pick the action that takes you to the one with the best value.
- The value of a state is tricky: it needs to encode the future rewards that the player will get. In other words, it needs to account for the reward you'll get at the next state, and the one after that, and so on. So to update the value of state x, move it closer to this quantity:
[the reward you just got at state x] + [the best value you can get from a neighboring state].
If you want to know the details, take a look at Q-learning; most "reinforcement learning" algorithms are variations on this theme.
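As a concrete illustration, here is a minimal tabular Q-learning sketch in Python; the environment interface (reset()/step()), the state and action encoding, and the hyperparameter values are assumptions made for the sketch, not part of any particular engine:

```python
# A minimal tabular Q-learning sketch, assuming a small environment with
# discrete states and actions and an `env` object exposing reset()/step().
# The environment interface and hyperparameters are illustrative assumptions.
import random
from collections import defaultdict

def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.9, epsilon=0.1):
    q = defaultdict(float)  # q[(state, action)] -> "how good is taking action in state"

    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Epsilon-greedy selection: usually take the best-known action,
            # occasionally explore a random one.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])

            next_state, reward, done = env.step(action)

            # Move Q(state, action) toward:
            #   [the reward you just got] + [the best value reachable from the next state]
            best_next = 0.0 if done else max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

            state = next_state
    return q
```

This is "eager" in the sense above: the work of propagating reward information happens during training, so at play time you just read the best action out of the table.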