Hi,
I'm trying to get to grips with the basics of RL and have been doing my best to get through Sutton and Barto's online book "Reinforcement Learning: An Introduction", but I'm having a bit of difficulty with states.
One of the examples they go into is Tic-Tac-Toe. It's more a comparison of RL vs. search vs. evolutionary algorithms, but I've been trying to work through it. Are all the states at the start of the game "empty" states because they are neither an 'X' nor an 'O'? Could a state be considered a position on the board? And each state has a Q-value for each possible action that could be taken from that state, right? Does that mean that in a game like Tic-Tac-Toe a possible Q-value table would be something like:
State | Possible Action
(0,0)| Place X / Leave Blank
(1,1)| Place X / Leave Blank
(1,2)| Place X / Leave Blank
(1,3)| ...
... | ...
or is it:
State | Possible Action
(0,0)| (1,1), (1,2), (1,3), (2,1), (2,2), ...
(1,1)| (1,2), (1,3), (2,1) ...
(1,2)| (1,1), (1,3), (2,1) ...
(1,3)| ......basically all the empty squares but the current one
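To make the second layout concrete, here is a rough Python sketch of what I have in mind. It only mirrors the table above and is not taken from the book; the names START, SQUARES and build_q_table are placeholders I made up, and I'm assuming every Q-value starts at 0.

```python
from itertools import product

# Hypothetical names throughout; this only mirrors the second table above.
START = (0, 0)  # "not on the board yet"
SQUARES = list(product(range(1, 4), repeat=2))  # (1,1) ... (3,3)


def build_q_table():
    """Map each state to {action: Q-value}, with every Q-value starting at 0.0."""
    q = {}
    for state in [START] + SQUARES:
        # Every square except the one the state itself names is an action.
        q[state] = {a: 0.0 for a in SQUARES if a != state}
    return q


q_table = build_q_table()
print(len(q_table))          # start state plus the nine squares
print(len(q_table[START]))   # possible first moves
print(len(q_table[(1, 1)]))  # squares other than (1,1)
```

Written out like this, a state only records one square, so nothing in the table says which other squares are already taken.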
And so (I apologise if I make a mess of my attempt at an explanation; I try hard but I'm not that smart!), if 'X' plays first, it would start in the state (0,0) (an initial state, i.e. not being on the board) and then select an action ("place an X in (1,1)"), which leads to:
   1  2  3
1 _X|__|__
2 __|__|__
3 __|__|__
Would it be correct to say that the new state (s') for X is now (1,1)? And because they haven't lost or won the reward = zero? So when updating the Q-value
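As far as I understand it, the usual one-step Q-learning update for that first move would look roughly like the sketch below. The values of ALPHA and GAMMA, the function name q_update, and the 0.5 already sitting in the table are all made up purely for illustration.

```python
ALPHA = 0.1  # learning rate (placeholder value)
GAMMA = 0.9  # discount factor (placeholder value)


def q_update(q, state, action, reward, next_state):
    """One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    next_values = q.get(next_state, {})
    # A state with no recorded actions (e.g. a finished game) contributes 0.
    best_next = max(next_values.values()) if next_values else 0.0
    td_target = reward + GAMMA * best_next
    q[state][action] += ALPHA * (td_target - q[state][action])
    return q[state][action]


# Tiny table covering only the move in the example: X goes from (0,0) to (1,1).
# The 0.5 is a made-up Q-value, as if it had been learned in earlier games.
q = {(0, 0): {(1, 1): 0.0}, (1, 1): {(1, 2): 0.0, (2, 2): 0.5}}
new_value = q_update(q, (0, 0), (1, 1), reward=0.0, next_state=(1, 1))
print(new_value)
```

If I have this right, a reward of zero does not stop the Q-value from changing, because the estimate for the next state still feeds into the update.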
The next move is then made by 'O':
   1  2  3
1 _X|__|__
2 __|_O|__
3 __|__|__
Do the states for 'O' contain information about the 'X' positions, or is that taken care of some other way: either 'O' makes an illegal move (trying to place an 'O' in a square already occupied by an 'X'), or it loses the game and, through the reward value (-1), becomes probabilistically less likely to choose a losing move in future games?
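For comparison, if the state did hold both players' marks, I imagine it would look something like this sketch, where the state is the whole board. The tuple encoding and the names EMPTY, legal_actions and place are my own guesses, not something from the book.

```python
EMPTY = "_"


def legal_actions(board):
    """Indices (0-8, row by row) of the squares that are still empty."""
    return [i for i, cell in enumerate(board) if cell == EMPTY]


def place(board, index, mark):
    """Return a new board tuple with `mark` placed at `index`."""
    if board[index] != EMPTY:
        raise ValueError("square already occupied")
    return board[:index] + (mark,) + board[index + 1:]


board = (EMPTY,) * 9
board = place(board, 0, "X")  # X takes the top-left square, (1,1) above
board = place(board, 4, "O")  # O takes the centre square, (2,2) above
print(board)
print(legal_actions(board))  # squares 0 and 4 are no longer offered
```

With this encoding an occupied square is never offered as an action, so 'O' would not need to learn about illegal moves through the reward.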
I'd be really grateful for some advice on this, and I'd also like to thank anyone who manages to read this humungous post!