To use backpropagation without having to determine the "right" move each time, you need to play the games to the end, then "credit" the moves which contributed to a win. This is the technique that was used very successfully for backgammon (search for TD-Gammon). I suspect the details of how to implement Temporal Difference learning are tricky, but it should be possible to learn with Tic-tac-toe.
This book is an academic treatment of the whole area:
http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html
Tic-Tac-Toe
An actor/critic machine-learning approach would satisfy everything you want in such simple games.
Even without rotations, there are only 3^9 possible game positions (some of which arent legal) and at most 9 possible choices. Simply put, you dont have to do any generalization of any kind and can self-play to perfection in a few milliseconds at most.
Imperfect play is built right into most actor/critic models in the form of occasionally choosing a random move (used to explore alternatives to what is currently considered best)
Even without rotations, there are only 3^9 possible game positions (some of which arent legal) and at most 9 possible choices. Simply put, you dont have to do any generalization of any kind and can self-play to perfection in a few milliseconds at most.
Imperfect play is built right into most actor/critic models in the form of occasionally choosing a random move (used to explore alternatives to what is currently considered best)
I used this AI struct for my tictac toe, its intelligent, is difficult to beat, but can be beaten.
1. If the computer can win on this move, make that move.
2. if the human can win on his next move, block it.
3. Otherwise, take the best remaining open square. The best square is the center. The next best squares are the corners, and then the rest of the squares.
If you want you can spice up 3 a bit and have it try to do opposite corner's, or even better add in a small (about 30%) chance the computer will not take the "next best" move but the "secondbest" place.
1. If the computer can win on this move, make that move.
2. if the human can win on his next move, block it.
3. Otherwise, take the best remaining open square. The best square is the center. The next best squares are the corners, and then the rest of the squares.
If you want you can spice up 3 a bit and have it try to do opposite corner's, or even better add in a small (about 30%) chance the computer will not take the "next best" move but the "secondbest" place.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement