
Tic-Tac-Toe

Started by September 21, 2006 03:45 PM
21 comments, last by Kaanin 18 years, 4 months ago
I'd like a full source please...
Quote:
Original post by sharpnova

how would the neural network look? 27 inputs, 9 outputs, how many hidden layers, and how many neurons per hidden layer?


Why 27 inputs? I think you could do the same or better with just 9, with the states -1, 0 and 1. If that doesn't work, I'd go for 18, though, as having 9 neurons for "empty" is redundant, and will probably slow down your learning.

With almost all problems of this sort, where you're not trying to, say, create a cognitive model, or have some plan for what the different hidden layers ought to represent, you can get by with just one hidden layer. In fact, any function can be approximated by a three-layer ANN, although sometimes it's not worth it.

The number of hidden neurons is your own preference. I'd go for an hourglass-shaped network, to encourage generalization, so I'd use four or five hidden neurons.

Also, I'd help the network out a little bit by doing all the rotations for it. If someone started in the corner, I'd rotate the board so the network always sees a corner opening as the same opening. This is something we do in our heads, and an ANN could probably get there eventually, but it's nice to help it out. This shrinks the 9x8x7... state tree down to 3x5x...
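The rotation trick above can be sketched in a few lines. This is a minimal illustration, not from the thread: the index tables and the name `canonical` are my own, and the idea is simply to map each board to the lexicographically smallest of its 8 symmetries (4 rotations, each optionally mirrored), so equivalent openings look identical to the network.

```python
# Board: tuple of 9 ints, row-major; -1 = O, 0 = empty, 1 = X.
ROT90 = (6, 3, 0, 7, 4, 1, 8, 5, 2)   # source index for each target cell (90° clockwise)
FLIP  = (2, 1, 0, 5, 4, 3, 8, 7, 6)   # mirror left-right

def transform(board, idx):
    return tuple(board[i] for i in idx)

def symmetries(board):
    b = board
    for _ in range(4):          # 4 rotations x 2 reflections = 8 symmetries
        yield b
        yield transform(b, FLIP)
        b = transform(b, ROT90)

def canonical(board):
    # Representative: lexicographically smallest symmetry.
    return min(symmetries(board))
```

With this, all four corner openings collapse to one canonical position, all four edge openings to another, and the center opening to a third.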

Quote:

Also, how is feedback applied? I cannot find anything online explaining using feedback. I understand the net is given feedback based on the outcome of the game, but how is this fed through, and how does it alter the weights of the synapses?

Personally, I'd go for a GA here. Looking online, it seems quite a few other people have come to the same conclusion. Alternatively, you could create a table of the best move in every situation, and apply backprop. However, this requires you to work out the best move each time, which rather defeats the purpose.
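In case it helps, here's a bare-bones sketch of the GA approach: evolve a flat vector of network weights by selection plus mutation. None of this is from the thread; the fitness function in a real setup would be win rate from self-play, but it's stubbed here as distance to an arbitrary target vector purely so the sketch runs.

```python
import random

random.seed(1)
N_WEIGHTS, POP, GENS = 12, 20, 60
TARGET = [0.5] * N_WEIGHTS            # stand-in for "weights that play well"

def fitness(w):
    # Higher is better; 0 is perfect for this stub.
    return -sum((a - b) ** 2 for a, b in zip(w, TARGET))

def mutate(w, rate=0.1, scale=0.2):
    # Perturb a few weights with Gaussian noise.
    return [x + random.gauss(0, scale) if random.random() < rate else x for x in w]

pop = [[random.uniform(-1, 1) for _ in range(N_WEIGHTS)] for _ in range(POP)]
init_best = max(pop, key=fitness)
for _ in range(GENS):
    pop.sort(key=fitness, reverse=True)
    survivors = pop[:POP // 2]                     # truncation selection (keeps the best)
    pop = survivors + [mutate(random.choice(survivors)) for _ in range(POP - len(survivors))]
best = max(pop, key=fitness)
```

Because the top half survives every generation, the best fitness never decreases.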

Quote:
Original post by Moon 111
id like a full source please...

The link above you wasn't enough? I think the source code there was pretty good. Work through that and try to understand it.
The thing is I don't want to use a GA. I have another project which is applying a GA quite nicely. I just want to learn how to apply backpropagation. I've looked online and most sources give a vague explanation of how to do it, but I'm trying to find specifics.

Let me give you an example, and if you would be so kind as to tell me step by step how to apply backpropagation, it would solve my dilemma completely.

I'll make this small so it won't be a pain in the butt:

3 layers
2 inputs (I1, I2), 3 hiddens (H1, H2, H3), 2 outputs (O1, O2)

so we have 12 weighted synapses given arbitrary values:

I1->H1 = .2
I1->H2 = .3
I1->H3 = .4
I2->H1 = .5
I2->H2 = .6
I2->H3 = .1
H1->O1 = .7
H1->O2 = .3
H2->O1 = .4
H2->O2 = .25
H3->O1 = .45
H3->O2 = .05

now let's say our inputs were .5 and .25

H1 = .225
H2 = .3
H3 = .225

O1 = .37875
O2 = .15375

let's say my desired outputs were..
O1 = .45
O2 = .05


how would I apply backpropagation to train this network?
Here's some source code for back-propagation, hope it helps:

http://cortex.snowseed.com/neural_networks.htm

Also, I have a PDF on neural nets and backpropagation which covers a lot of the math, if you want me to email it to you.
Quote:
Original post by sharpnova
I just want to learn how to apply back propogation. I've looked online and most sources give a vague explanation of how to do it but I'm trying to find specifics.


The simplest tutorial by far that I've found on backprop is at www.cs.ucc.ie/~dgb/courses/ai/notes/notes9.pdf. The first two pages are sufficient to code the algorithm, but he also has source-code and an example.

Note something that's confusing for beginners learning backprop: the backprop algorithm uses a function, g'(x), that depends on the activation function you have chosen. This function will be different for different activation functions. The one he gives in this tutorial applies only if your activation function is the sigmoid function (which is handy, as sigmoid is probably the only function you'll need for a long time).

I still don't think tic-tac-toe is a good scenario for learning backprop, however. It should work fine for your mini-problem, though.
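To make the g'(x) point concrete (my addition, not from the tutorial): for the logistic sigmoid, the derivative simplifies to g'(x) = g(x)(1 - g(x)), so in backprop it can be computed straight from the neuron's output with no extra exponentials.

```python
import math

def sigmoid(x):
    # g(x) = 1 / (1 + e^-x)
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x):
    # g'(x) = g(x) * (1 - g(x)) -- reuses the forward-pass output
    g = sigmoid(x)
    return g * (1.0 - g)
```

With sigmoid units, each delta in the backprop sums gets multiplied by this g'(x) term.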
To use backpropagation without having to determine the "right" move each time, you need to play the games to the end, then "credit" the moves which contributed to a win. This is the technique that was used very successfully for backgammon (search for TD-Gammon). I suspect the details of how to implement Temporal Difference learning are tricky, but it should be possible to learn with Tic-tac-toe.

This book is an academic treatment of the whole area:
http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.html
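A scaled-down sketch of that idea (play to the end, then credit the moves that led to the result) might look like the following. This is my own tabular TD(0) illustration, not TD-Gammon itself: X learns a value table against a random opponent, draws are lumped with losses as reward 0 to keep it short, and all constants are arbitrary choices.

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for a, c, d in LINES:
        if b[a] != 0 and b[a] == b[c] == b[d]:
            return b[a]
    return 0

def after(board, i, p):
    b = list(board)
    b[i] = p
    return tuple(b)

V = {}            # state -> estimated value for X (toward 1 = X wins)
ALPHA = 0.2

def value(s):
    return V.get(s, 0.5)

def td_episode():
    board, player, visited = [0] * 9, 1, []
    while True:
        moves = [i for i in range(9) if board[i] == 0]
        w = winner(board)
        if w != 0 or not moves:
            target = 1.0 if w == 1 else 0.0      # final reward for X
            for s in reversed(visited):          # back it up through X's states
                V[s] = value(s) + ALPHA * (target - value(s))
                target = V[s]
            return
        if player == 1:
            if random.random() < 0.1:            # occasional exploration
                m = random.choice(moves)
            else:                                # greedy over afterstate values
                m = max(moves, key=lambda i: value(after(board, i, 1)))
            board[m] = 1
            visited.append(tuple(board))
        else:
            board[random.choice(moves)] = -1     # random opponent
        player = -player

random.seed(0)
for _ in range(5000):
    td_episode()
```

Each update nudges a state's value toward the value of what followed it, which is the "temporal difference" part; real TD(λ) also decays credit over earlier moves.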

An actor/critic machine-learning approach would satisfy everything you want in such simple games.

Even without rotations, there are only 3^9 possible game positions (some of which aren't legal) and at most 9 possible choices. Simply put, you don't have to do any generalization of any kind and can self-play to perfection in a few milliseconds at most.

Imperfect play is built right into most actor/critic models in the form of occasionally choosing a random move (used to explore alternatives to what is currently considered best).
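The point about the tiny state space is easy to verify. The sketch below is plain memoized game-tree search rather than the actor/critic learner described above, but it walks the same space: exhaustive minimax over every reachable position finishes near-instantly and confirms that perfect play from the empty board is a draw.

```python
from functools import lru_cache

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]

def winner(b):
    for a, c, d in LINES:
        if b[a] != 0 and b[a] == b[c] == b[d]:
            return b[a]
    return 0

@lru_cache(maxsize=None)
def minimax(board, player):
    # Game value from X's perspective: 1 = X wins, 0 = draw, -1 = O wins.
    # board: tuple of 9 ints (1 = X, -1 = O, 0 = empty); player: 1 or -1 to move.
    w = winner(board)
    if w != 0:
        return w
    moves = [i for i in range(9) if board[i] == 0]
    if not moves:
        return 0
    values = []
    for m in moves:
        b = list(board)
        b[m] = player
        values.append(minimax(tuple(b), -player))
    return max(values) if player == 1 else min(values)
```

Memoization (`lru_cache`) collapses transpositions, so far fewer than 3^9 positions are ever evaluated.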
I used this AI structure for my tic-tac-toe; it's intelligent and difficult to beat, but it can be beaten.

1. If the computer can win on this move, make that move.
2. if the human can win on his next move, block it.
3. Otherwise, take the best remaining open square. The best square is the center. The next best squares are the corners, and then the rest of the squares.

If you want, you can spice up rule 3 a bit and have it try to take opposite corners, or, even better, add a small (about 30%) chance that the computer will not take the "next best" square but the second-best one.
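The three rules above can be sketched directly; this is my own minimal version, with the "spice" option included as an optional parameter.

```python
import random

LINES = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
PREFERENCE = [4, 0, 2, 6, 8, 1, 3, 5, 7]   # center, then corners, then edges

def winning_move(board, mark):
    # Return a square that completes a line for `mark`, or None.
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(mark) == 2 and cells.count(' ') == 1:
            return line[cells.index(' ')]
    return None

def choose_move(board, computer='X', human='O', spice=0.0):
    # Board: list of 9 cells, 'X'/'O'/' ', row-major.
    m = winning_move(board, computer)       # rule 1: win now if possible
    if m is not None:
        return m
    m = winning_move(board, human)          # rule 2: block the human's win
    if m is not None:
        return m
    open_ranked = [i for i in PREFERENCE if board[i] == ' ']   # rule 3
    if spice > 0 and len(open_ranked) > 1 and random.random() < spice:
        return open_ranked[1]               # sometimes take the second-best square
    return open_ranked[0]
```

Because rule 3 never considers forks, a player who builds a double threat can still beat it, which matches the "can be beaten" caveat.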

This topic is closed to new replies.
