8 hours ago, sjhalayka said:
9 states takes 4 bits to encode.
This is true, but it's probably not the way you want to proceed, because it means certain inputs are true for completely unrelated states. For example, the same bit would be set both for cards in player 1's hand and for cards in player 3's hand. The NN then has to work harder to tell those two states apart, effectively spending layers just to decode your 4-bit representation back into the 9 separate states.
Instead, you probably want "one-hot" encoding; 9 input neurons for each card, with only one of them 'hot', and that represents which of the 9 states it's in.
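To make that concrete, here's a minimal sketch in Python. The state labels are purely hypothetical - substitute whatever 9 states your game actually uses:

```python
import numpy as np

# Hypothetical labels for the 9 states a card can be in - substitute your own.
CARD_STATES = ["deck", "discarded", "community",
               "p1_hand", "p2_hand", "p3_hand",
               "p1_folded", "p2_folded", "p3_folded"]

def one_hot(state):
    """Length-9 vector with a single 1 marking the card's current state."""
    vec = np.zeros(len(CARD_STATES), dtype=np.float32)
    vec[CARD_STATES.index(state)] = 1.0
    return vec

# The input layer is one such vector per card, e.g. 52 * 9 = 468 inputs.
example_input = np.concatenate([one_hot("deck"), one_hot("p1_hand")])
print(example_input.shape)  # (18,)
```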
8 hours ago, sjhalayka said:
The biggest problem that I face now is deciding on how many hidden layers there should be, and how many neurons per hidden layer there should be.
This is all part and parcel of working with neural networks. If you know about things like bias vs variance, overfitting/underfitting, learning rates, error rates, then you can do the following:
1. Pick reasonable estimates for these hyperparameters.
2. Perform some training with that neural network.
3. Observe how well it learns, and see whether it is converging, failing to converge, converging too slowly, etc.
4. Adjust the hyperparameters according to these observations.
5. Repeat from step 2.
If you don't know about some of those things, it's time to do some research, because it's a deep subject and not easy to cover here succinctly.
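In case it helps to see that loop written down, here's a very rough sketch of the kind of reasoning you'd apply after each training round. The thresholds and the example numbers are made up, and in practice you'd also be eyeballing loss curves rather than relying on a function like this:

```python
def diagnose(train_loss, val_loss, val_history):
    """Rough heuristic for interpreting one round of training - a sketch of
    the reasoning above, not a drop-in auto-tuner. val_history is the list of
    validation losses from previous rounds."""
    if val_history and val_loss >= val_history[-1] and train_loss < val_loss * 0.5:
        return "overfitting: shrink the hidden layers, add regularisation, or get more data"
    if val_history and val_loss >= val_history[-1]:
        return "not converging: lower the learning rate or rethink the inputs"
    if val_history and (val_history[-1] - val_loss) < 0.01 * val_history[-1]:
        return "converging too slowly: raise the learning rate or add capacity"
    return "keep training with the current hyperparameters"

print(diagnose(train_loss=0.05, val_loss=0.80, val_history=[0.75]))  # overfitting
```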
8 hours ago, sjhalayka said:
I'll train the ANN initially by facing it off against a computer player that picks pseudorandom state changes.
This is a bad idea. A neural network learns by comparing its own output to a desired output, and adjusting accordingly. A pseudo-random state change on the opponent's part is not going to produce reasonable states to compare against. And you've not mentioned how you'll measure the error, which is the important part.
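For what it's worth, "measuring the error" normally means a loss function that compares the network's output to a known-good target for the same input. Cross-entropy is one common choice when the output is a probability distribution over actions; a minimal sketch (the fold/call/raise action set is just for illustration):

```python
import numpy as np

def cross_entropy(predicted, target):
    """Error between the network's action probabilities and the desired ones.
    predicted and target each have one entry per possible action and sum to 1.
    Lower is better; this is the quantity backpropagation minimises."""
    eps = 1e-12  # avoid log(0)
    return -np.sum(target * np.log(predicted + eps))

# e.g. the network favoured folding, but the desired action was to raise:
predicted = np.array([0.7, 0.2, 0.1])   # fold, call, raise
target    = np.array([0.0, 0.0, 1.0])
print(cross_entropy(predicted, target))  # ~2.3, a large error to propagate back
```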
8 hours ago, sjhalayka said:
Once the ANN has won a significant number of games, I will copy it a bunch of times and gently perturb the weights of these copies.
This is also a really bad idea. By perturbing copies of the weights instead of training them, you throw away the key benefit of a neural network - the fact that errors at the output layer can be propagated back through the network to adjust every layer and improve the response - and you vastly increase the search space of your system by multiplying the number of networks you need to train. For a game with a state space as large as poker's, you will probably find that the universe ends long before your networks become usable.
What you should do is something more like this:
- Use knowledge of the poker rules and a database of past poker games to generate a training set of (game state, chosen action, score/benefit) examples (see the sketch after this list).
- Train on this data so that, in the same situation, the network will take the actions that scored positively and avoid the actions that scored negatively.
- Ensure the network is not overfitting, and is therefore able to generalise to similar situations
- Generate extra training data, e.g. by playing against existing computer programs, or even by having 2 programs play each other.
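As a sketch of what one such training example might look like - the field names and the scoring are purely illustrative, and the real feature set would come from whatever describes your game state:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingExample:
    """One (game state, chosen action, score) record. Field names are
    illustrative - replace them with whatever actually describes your state."""
    card_states: List[str]    # per-card state labels, one-hot encoded before training
    pot_size: int
    chips_remaining: int
    action: str               # e.g. "fold", "call", "raise"
    score: float              # e.g. chips won or lost by the end of the hand

# A hand pulled from a database of past games might become:
example = TrainingExample(
    card_states=["p1_hand", "p1_hand", "deck"],  # truncated; really one label per card
    pot_size=120,
    chips_remaining=880,
    action="call",
    score=-40.0,  # this call lost 40 chips by showdown, so training discourages it
)
```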