Lines of Action - Using TD+NN for a deterministic perfect information board game
I've recently become interested in the board game Lines of Action and have been poking away at creating some evaluation functions for it. I've simultaneously become interested in reinforcement learning and thought this game would be a good test bed for practising.
One thing I've seen mentioned in the literature is the introduction of random moves to prevent the NN from getting stuck in a loop. I'm wondering if there are any good guidelines for introducing the random moves. Should it be a fixed percentage of moves, or should the randomness ramp up if the game is going on too long?
Two totally random players (not even checking to see whether the next move leads to a winning/losing state) facing off against each other take about 120 moves ±30 altogether to play a game to completion. A very basic hand-crafted evaluation function with a half-ply search beats the random player in about 23 moves.
If you just don't want the players to get stuck in an infinite loop, implement stalemate conditions. Set a limit on how long the game can be and let the AI learn to avoid the moves that lead to over-long games. Then once it's learned to avoid making stupid moves that make the game take too long, it can work on learning how to win the game faster.
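Something like this, for instance. The `Game`/agent interface and the cap of 200 plies here are just assumptions for illustration, not anything specific to Lines of Action:

```python
MAX_PLIES = 200  # assumed cap; tune to your game

def play_episode(game, agents):
    """Play one self-play game, scoring over-long games as a draw."""
    history = []
    for ply in range(MAX_PLIES):
        player = ply % 2
        move = agents[player].choose_move(game)   # hypothetical agent interface
        history.append((game.state(), move))
        game.apply(move)
        if game.is_won():                         # hypothetical win test
            return history, (+1 if player == 0 else -1)
    # Hit the cap: score it as a draw (or penalise both sides slightly)
    return history, 0
```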
You still probably need randomness to allow the AI to explore different positions, but having a fixed chance of performing random moves can accomplish that. The rate at which the AI chooses random moves can be adjusted during training but I don't think there's much to be gained by changing it during a game.
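To make that concrete, here's a rough sketch of a fixed-but-annealed exploration rate. The schedule numbers and the `game` interface (`legal_moves`, `after`) are placeholders:

```python
import random

EPS_START, EPS_END, DECAY_GAMES = 0.20, 0.02, 10_000  # assumed schedule

def epsilon_for_game(game_index):
    """Anneal the random-move probability across training games, not within one."""
    frac = min(game_index / DECAY_GAMES, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def choose_move(game, evaluate, epsilon):
    """Greedy on the evaluation function, except random with probability epsilon."""
    moves = game.legal_moves()
    if random.random() < epsilon:
        return random.choice(moves)
    return max(moves, key=lambda m: evaluate(game.after(m)))
```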
I think you've already answered your own question in a way: a very basic hand-crafted evaluation function. Expand it to do more than a 1/2-ply search, and refine it as you find weaknesses.
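If you deepen the search, plain negamax with alpha-beta pruning over the same evaluation function is the usual next step. A sketch, assuming the same hypothetical `game` interface as above and an `evaluate` that scores positions from the side to move's perspective:

```python
import math

def negamax(game, depth, alpha, beta, evaluate):
    """Alpha-beta negamax; evaluate() must score from the side to move's view."""
    if depth == 0 or game.is_terminal():
        return evaluate(game)
    best = -math.inf
    for move in game.legal_moves():
        score = -negamax(game.after(move), depth - 1, -beta, -alpha, evaluate)
        best = max(best, score)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # prune: the opponent will never allow this line
    return best
```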
The fact that you know the magic words "deterministic perfect information game" shows that I don't need to tell you about minimax. Why oh why are you using NN for this?
Sneftel, unlike the usual abuses of neural networks, this one has actually been shown to have some promise. A computer backgammon player, TD-gammon, uses temporal differencing and backpropagation to train a neural-network based evaluation function. TD-gammon is actually one of the best backgammon AIs, and it plays on the level of human experts.
The goal is for the AI to improve its ability beyond the knowledge provided by an expert. Temporal differencing is used to improve an evaluation function. A neural network is just a convenient evaluation function for this purpose since there has already been work done on combining backpropagation with temporal differencing and the resulting algorithm is on the same order of complexity as ordinary backpropagation.
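For reference, the update being described looks roughly like this when written generically for any differentiable evaluation function. The `value` and `gradient` callables stand in for the network's forward and backward passes, and the hyperparameters are just placeholders, not TD-gammon's actual settings:

```python
import numpy as np

def td_lambda_episode(positions, outcome, value, gradient, weights,
                      alpha=0.01, lam=0.7):
    """One episode of TD(lambda) on a position sequence ending with `outcome`.

    value(pos, weights)    -> scalar prediction V(s)
    gradient(pos, weights) -> dV/dw, same shape as weights
    """
    trace = np.zeros_like(weights)
    for t in range(len(positions)):
        v_t = value(positions[t], weights)
        # The target is the next prediction, or the actual result at the end.
        v_next = outcome if t == len(positions) - 1 else value(positions[t + 1], weights)
        trace = lam * trace + gradient(positions[t], weights)  # eligibility trace
        weights = weights + alpha * (v_next - v_t) * trace     # TD update
    return weights
```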
The neural network can still be given hand-crafted features as input in addition to a raw encoding of the board state in order to give it a head start. TD-gammon played at the level of its predecessor, neurogammon, a neural-network-based player that didn't use temporal differencing, with only a very raw encoding of the board. When given the higher-level features that were designed for neurogammon as input, TD-gammon performed even better. Apparently TD-gammon learned to evaluate certain board positions in ways that were interesting to expert backgammon players.
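The "raw encoding plus hand-crafted features" input could be as simple as concatenating the two. Here `concentration` and `mobility` are hypothetical stand-ins for whatever your existing evaluation terms compute:

```python
import numpy as np

def encode_position(board, to_move):
    """One-hot planes for an 8x8 board plus a few hand-crafted scalars."""
    raw = np.concatenate([
        (board == +1).astype(np.float32).ravel(),   # my checkers
        (board == -1).astype(np.float32).ravel(),   # opponent's checkers
        [1.0 if to_move == +1 else 0.0],             # side to move
    ])
    handcrafted = np.array([
        concentration(board, +1),   # hypothetical: how clustered my pieces are
        concentration(board, -1),
        mobility(board, to_move),   # hypothetical: number of legal moves
    ], dtype=np.float32)
    return np.concatenate([raw, handcrafted])
```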
The same techniques have been applied to other games but without as much success. The success of TD-gammon was apparently a bit of a surprise itself. Still it shows that temporal differencing + neural networks can be effective for learning an evaluation function for use with minimax (or at least expectiminimax...). The neural network can be replaced with any other function representation, but the basic algorithm would still be the same (reinforcement learning with temporal differencing).
For what it's worth, many chess engines use TD learning in one manner or another. They rarely use NNs, though.
For example:
EXCHESS (the one I am most familiar with) uses TD to learn the value of each entry in its piece-square tables, as well as a more general value of each piece.
KNIGHTCAP uses a lot of TD as well and perhaps took it a bit too far.
TD is *perfect* for coefficient-based evaluation functions. Self-learning has proven to give fairly good coefficients, and has even opened the eyes of many experts on the contextual value of pieces in chess.
TD allows the use of an extremely large number of coefficients (hundreds, thousands, or even hundreds of thousands) without manual tuning; it lets you invest CPU cycles in place of human hours and expertise.
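In that setting the evaluation is just a dot product between a long coefficient vector and a feature vector (piece values, piece-square entries, and so on), so the TD(lambda) gradient is the feature vector itself. A minimal sketch, with made-up hyperparameters:

```python
import numpy as np

def linear_eval(features, coeffs):
    """Coefficient-based evaluation: score = coeffs . features."""
    return float(np.dot(coeffs, features))

def td_update_linear(feature_seq, outcome, coeffs, alpha=1e-4, lam=0.7):
    """TD(lambda) for a linear evaluator; the gradient w.r.t. coeffs is just the features."""
    trace = np.zeros_like(coeffs)
    for t, phi in enumerate(feature_seq):
        v_t = linear_eval(phi, coeffs)
        v_next = outcome if t == len(feature_seq) - 1 else linear_eval(feature_seq[t + 1], coeffs)
        trace = lam * trace + phi
        coeffs = coeffs + alpha * (v_next - v_t) * trace
    return coeffs
```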
I don't know how appropriate NNs are for the game in question, but I do know that they have proven less than satisfactory for chess position evaluation (the form an NN's coefficients take is too generalized; an intractable number of CPU cycles would be required to train them).
Quote: Original post by Vorpy
You still probably need randomness to allow the AI to explore different positions, but having a fixed chance of performing random moves can accomplish that.
It's rather like inserting the random mutations into a new generation of GAs. Same idea... fresh ideas that maybe haven't been covered.
Related question: Has anyone ever put a bunch of NNs through a generational process like GAs? Talk about over the top! :-D
Quote: Original post by Vorpy
Sneftel, unlike the usual abuses of neural networks, this one has actually been shown to have some promise. A computer backgammon player, TD-gammon, uses temporal differencing and backpropagation to train a neural-network based evaluation function. TD-gammon is actually one of the best backgammon AIs, and it plays on the level of human experts.
I have to say, I'm rather surprised about that. Then again, backgammon has a wide enough branching factor that minimax (or expectiminimax) is unlikely to do very well regardless of how clever your heuristic is. It's an interesting result.
Quote: Original post by InnocuousFox
Related question: Has anyone ever put a bunch of NNs through a generational process like GAs? Talk about over the top! :-D
Over the top perhaps, but there's been a lot of work done on the topic. One advantage of GA-based NN learning over simple gradient descent is the potential to learn network connectivity as well as weighting parameters. Then again, since there's been so little mathematical formalism applied to GAs, it's difficult to determine how useful that is except through empiricism.
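As a toy sketch of the weights-only version (evolving connectivity needs a variable-length genome and crossover rules, which is where most of the real work goes), something like this would do. The fitness function is assumed to come from self-play matches:

```python
import numpy as np

def evolve_weights(fitness, dim, pop_size=50, generations=100,
                   sigma=0.1, elite_frac=0.2):
    """Simple elitist GA over flat NN weight vectors.

    fitness(weights) -> higher is better, e.g. win rate from self-play.
    """
    pop = [np.random.randn(dim) for _ in range(pop_size)]
    n_elite = max(1, int(elite_frac * pop_size))
    for _ in range(generations):
        elites = sorted(pop, key=fitness, reverse=True)[:n_elite]
        # Refill the population with mutated copies of the elites.
        pop = list(elites)
        while len(pop) < pop_size:
            parent = elites[np.random.randint(n_elite)]
            pop.append(parent + sigma * np.random.randn(dim))
    return max(pop, key=fitness)
```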
Quote: Original post by InnocuousFox
Related question: Has anyone ever put a bunch of NNs through a generational process like GAs? Talk about over the top! :-D
I know of some work done in checkers: http://en.wikipedia.org/wiki/Blondie24. I read their book and it was entertaining, but I didn't particularly like it from a technical point of view. The original experiment was supposed to be about making a program that would come up with a good evaluation function with no human-provided input. The architecture of the NN they were using was changed during the experiment (or you can consider that separate experiments), and they ended up giving material evaluation as a precomputed input. In the end, it is possible that the evaluation function they built was just material + noise, which is known to be a reasonable evaluation function (the noise makes minimax capture some dynamic notion of mobility).
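For what it's worth, "material plus noise" really is as simple as it sounds; the small random term shuffles otherwise-equal positions, which the minimax search turns into a crude mobility preference. A sketch, with placeholder piece values and a made-up `board.pieces()` interface:

```python
import random

PIECE_VALUES = {'man': 1.0, 'king': 1.3}  # placeholder checkers values

def material_plus_noise(board, noise=0.05):
    """Signed sum of piece values plus a small random perturbation."""
    score = sum(PIECE_VALUES[p.kind] * (1 if p.mine else -1)
                for p in board.pieces())
    return score + random.uniform(-noise, noise)
```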
Quote: Original post by InnocuousFox
Related question: Has anyone ever put a bunch of NNs through a generational process like GAs? Talk about over the top! :-D
Optimisation of parameters in ANNs using GAs has certainly been done to death. Structural adaptation is less prevalent in the literature.