This year for the science fair I have chosen to find out if an ANN can learn to play Connect 4. I realize ANNs don't necessarily get the best reputation on these boards, but it's what I have chosen to do.
My current implementation is this:
I have 3 input neurons for every square on the Connect 4 board, one for each possible state (red, black, empty), giving 42 x 3 = 126 input neurons
I have 4 hidden neurons connected to each input neuron (126 x 4 = 504 hidden neurons)
and each hidden neuron connects to 7 output neurons (1 for each possible column the ANN can play in)
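To make that input layer concrete, here is a minimal sketch of the encoding in Python. The EMPTY/RED/BLACK constants and the row-major ordering are my own assumptions for illustration, not part of any particular board class:

```python
# Minimal sketch of the 126-neuron input encoding described above.
# The EMPTY/RED/BLACK constants and row-major ordering are assumptions.
EMPTY, RED, BLACK = 0, 1, 2

def encode_board(board):
    """board: 6 rows x 7 columns of EMPTY/RED/BLACK values.
    Returns a flat list of 42 * 3 = 126 inputs in {0.0, 1.0}."""
    inputs = []
    for row in board:
        for cell in row:
            inputs.extend(1.0 if cell == state else 0.0
                          for state in (EMPTY, RED, BLACK))
    return inputs

# An empty board yields 126 inputs with every third value set to 1.0.
empty_board = [[EMPTY] * 7 for _ in range(6)]
assert len(encode_board(empty_board)) == 126
```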
I have also created 4 AI opponents of varying strength to provide training data for the ANN. My plan is to have each level of AI play games against the others and save the board states to a text file for the ANN to read, using the column the winning algorithm played on that turn as the target.
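Roughly, the logging step I have in mind looks like the sketch below; play_game() is a hypothetical stand-in for my AI-vs-AI driver, and encode_board() is the input encoding sketched above:

```python
# Sketch of the training-data logger. play_game() is a hypothetical helper that
# returns the winner and a move history; only the winner's moves become targets.
def log_games(player_a, player_b, n_games, path="training_data.txt"):
    with open(path, "a") as f:
        for _ in range(n_games):
            winner, history = play_game(player_a, player_b)  # hypothetical AI-vs-AI driver
            if winner is None:                 # skip drawn games
                continue
            for board, mover, column in history:
                if mover == winner:            # board state before the winner's move
                    f.write(" ".join(str(v) for v in encode_board(board)))
                    f.write(" {}\n".format(column))  # target: column the winner played
```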
Now on to my question: how much training data should I supply my ANN with? I realize that the more I give it, the better it will play, but how much is too much, and how little is too little? One game between each pair of training AIs? 1x10^1000 games? Do I just keep feeding it more games until the ANN reaches a state where it cannot lose?
Any help would be greatly appreciated.
how much training data to give my ANN
Well, you could always feed it X games for an easier difficulty and scale that up until you have an "impossible" mode. That's how I would go about it. You don't want it to only be impossible, or it would be no fun at all, but can you truly make it impossible to beat? You might just end up making it draw every time. If you get good results with Connect 4 you could always move on to something more advanced like Chess. Ask yourself what you want from this ANN.
When in training mode, let the ANN observe the actions of two human players - at each step, the ANN should be shown the SAME input data until its outputs are 'correct' (ie, match the expected outputs) - this will often take from 100 to 2000 training cycles for the weights to settle down (rarely more), even for a simple XOR network!
Basically just repeat the input until the outputs match what you told them they should be, then continue to the next step (game turn). It might take just one cycle, it might take thousands, that really depends on the random distribution of the weights that you began with, more than any other factor.
Train, check the outputs, train some more, check the outputs.
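Something along these lines is what I mean. Here 'net' stands for any backprop network object; the predict() and train() method names are just placeholders for whatever your ANN code provides, and the cycle cap mirrors the 100-2000 range mentioned above:

```python
# Sketch: hammer the same example until the outputs match the targets.
# 'net' is assumed to expose predict(inputs) and train(inputs, targets);
# those method names are placeholders, not any particular library's API.
def train_until_correct(net, inputs, targets, max_cycles=2000, tolerance=0.1):
    for cycle in range(max_cycles):
        outputs = net.predict(inputs)
        if all(abs(o - t) < tolerance for o, t in zip(outputs, targets)):
            return cycle + 1              # outputs settled on this example
        net.train(inputs, targets)        # one more backprop pass on the SAME input
    return max_cycles                     # gave up; weights haven't settled yet
```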
It's worth mentioning that subsequent training can 'damage' prior training by unsettling the weights, so it can pay to record the ENTIRE game for training purposes, not just one step at a time... by that I mean that the outputs for a given input can become 'no longer correct' even if they were correct before. You just need to run the ENTIRE game through a few times and it will begin to stick (I guess this is like short and long term memory in humans, it takes repetition for memories to become long-term).
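In code terms that just means looping over every recorded position of the game for several passes, instead of hammering one position and moving on (same placeholder 'net' object as above; the epoch count is arbitrary):

```python
# Sketch: several passes over the WHOLE game so later positions don't
# unsettle what was learned from earlier ones. 'game' is a list of
# (inputs, targets) pairs recorded over one complete game.
def train_on_whole_game(net, game, epochs=50):
    for _ in range(epochs):
        for inputs, targets in game:
            net.train(inputs, targets)
```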
Don't use 'bruteforce training' by feeding it zillions of generated games... this will make the ANN impossible to beat, which is no fun.
By observing actual games between humans, the ANN learns typical human game strategies, and will be defeated by strategies it has never seen before.
However you can leave training active at ALL TIMES, even when 'training is completed' (ie in Run mode) - this allows the ANN to learn NEW strategies for tackling new adversaries (if the ANN relies totally on its prior training then it is doomed to repeat the same mistakes, unable to learn anything 'on the fly'). Basically, you let the ANN learn from its human adversary by putting the ANN in the shoes of said human - show it the state before and after the human's turn, train until outputs match, and thus the ANN can learn new tactics from its opponent during actual games, which it can employ in future games.
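As a rough sketch, learning from the opponent mid-game could look like this; the helper name and cycle count are illustrative, and encode_board() is the encoding sketch from earlier in the thread:

```python
# Sketch: after each human move, use the pre-move board as the input and the
# human's chosen column as the target, then train for a handful of cycles so
# the net picks up the opponent's tactics during the actual game.
def learn_from_opponent(net, board_before_move, human_column, cycles=100):
    inputs = encode_board(board_before_move)
    targets = [1.0 if col == human_column else 0.0 for col in range(7)]
    for _ in range(cycles):
        net.train(inputs, targets)
```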
Also, depending on your ANN code, you can optionally reduce the number of output neurons by using a 'fuzzy logic' interpretation of the output - the outputs will never be a pure 0 or 1 unless you round them. Using a SINGLE output neuron, you can divide the range (0.0 through 1.0) into n sub-ranges (7 in your case) and determine which range the output falls into; you might want to add one more hidden layer in this case. The same 'fuzzy logic' concept can also be applied to the input layer.
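For example, the single-output version just bins the value; the exact thresholds below are up to you:

```python
# Sketch: map a single fuzzy output in [0.0, 1.0] onto one of 7 column bins.
def output_to_column(output, n_columns=7):
    column = int(output * n_columns)      # 0.0 up to 1/7 -> column 0, and so on
    return min(column, n_columns - 1)     # clamp so exactly 1.0 maps to column 6

assert output_to_column(0.0) == 0
assert output_to_column(0.5) == 3
assert output_to_column(1.0) == 6
```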
I'll leave it up to you to experiment and determine whether or not this scheme works for you.
Just to clarify that, one of my old 2D creature AI demos uses a single output neuron to drive left and right 'thrusters' which control the direction and rate of change of orientation (like tank tracks), by determining if the output is greater or less than 0.5
This is a really difficult problem for a machine learning algorithm. Since your mind is made up, though:
There is another way to approach this without having a human involved.
Use evolutionary programming as the learning algorithm (no backpropagation). Implement the following mutation operators (a rough sketch follows the list):
- Create an 'insert node' and 'remove node' mutation operator.
- Create an 'add connection' and 'remove connection' mutation operator.
- Create 'change weight' and 'change threshold' mutation operators (you probably want to add 'increment weight' and 'increment threshold' as well).
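Here is what those operators might look like, assuming a genome stored as plain node/connection lists; the representation and the mutation ranges are only illustrative, and the bookkeeping for re-wiring connections after a node is removed is left out:

```python
import random

# Sketch: one random mutation applied to a genome stored as node/connection lists.
# Representation and ranges are illustrative; re-wiring after 'remove node' is omitted.
def mutate(genome):
    op = random.choice(["insert node", "remove node", "add connection",
                        "remove connection", "change weight", "change threshold"])
    nodes, conns = genome["nodes"], genome["connections"]
    if op == "insert node":
        nodes.append({"threshold": random.uniform(-1.0, 1.0)})
    elif op == "remove node" and nodes:
        nodes.pop(random.randrange(len(nodes)))
    elif op == "add connection" and nodes:
        conns.append({"src": random.randrange(len(nodes)),
                      "dst": random.randrange(len(nodes)),
                      "weight": random.uniform(-1.0, 1.0)})
    elif op == "remove connection" and conns:
        conns.pop(random.randrange(len(conns)))
    elif op == "change weight" and conns:
        random.choice(conns)["weight"] += random.uniform(-0.5, 0.5)
    elif op == "change threshold" and nodes:
        random.choice(nodes)["threshold"] += random.uniform(-0.5, 0.5)
    return genome
```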
Then, create two pools of players (red pool, black pool). Initialize the pools so that there are NO HIDDEN LAYERS at the beginning. Trust me.
Let them play.
Give points based on who wins. Immediate disqualification for illegal moves. Losers get points based on how long they lasted. Winners lose points for each move (you want to encourage winning right away).
Top scorers replicate.
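Put together, one generation might look roughly like this. play_match() is a hypothetical helper returning (winner, move_count, disqualified_player), mutate() is the sketch above, and the point values are just arbitrary examples of the scoring rules described here:

```python
import copy
import random

# Sketch of one generation of the two-pool tournament. play_match() is a
# hypothetical helper; the point values are arbitrary examples of the rules above.
def run_generation(red_pool, black_pool, keep_fraction=0.25):
    scores = {id(g): 0 for g in red_pool + black_pool}
    for red in red_pool:
        for black in black_pool:
            winner, moves, disqualified = play_match(red, black)
            if disqualified is not None:
                scores[id(disqualified)] -= 1000          # illegal move: out
                continue
            if winner is not None:
                loser = black if winner is red else red
                scores[id(winner)] += 100 - moves         # quicker wins score higher
                scores[id(loser)] += moves                # losers credited for lasting
    def next_generation(pool):
        ranked = sorted(pool, key=lambda g: scores[id(g)], reverse=True)
        survivors = ranked[:max(1, int(len(pool) * keep_fraction))]
        # top scorers replicate (with mutation) until the pool is full again
        return [mutate(copy.deepcopy(random.choice(survivors))) for _ in pool]
    return next_generation(red_pool), next_generation(black_pool)
```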
You probably won't get a very good connect-4 player out of this, but you should have lots of fun either way.
I believe you've just described the NEAT algorithm:
http://en.wikipedia.org/wiki/Neuroevolution_of_augmenting_topologies
There are a few implementations of it in different languages, so if you were to go down this route you could just use one of those existing libraries.