1 hour ago, alvaro said:
I only read the vague initial description of the game, but I think I got enough details to tell you what I think.
[...]
Hi Alvaro,
Thanks for your reply!
As I said before, I’m very new to ML, so I will need guidance on the steps that involve it.
I’m quite motivated for this project and I’m not afraid to learn something new, so please suggest the resources you think I need to study.
Quote
>I think using reinforcement learning is quite reasonable for games with randomness and hidden state. I don't have much experience here, but I think I know how I would do it. First you need a fast simulator for the game. You can start with a strategy that picks an action at random to make sure that the simulator is working.
I’ve already built a library/engine (in JavaScript) that plays complete matches (a match is a set of rounds) for 2 or 4 players. It implements all the rules, the scoring, and so on.
Instead of picking an action at random (as you suggest), I’ve built a very basic heuristic to make the players play.
If you want to know more about it, I think you would need a better knowledge of the game.
I take this opportunity to point out that in this game the player has to make several different decisions: on his turn he first has to decide whether to draw a card from the top of the deck or to take all the cards that have been discarded up to that point.
Then he can make multiple plays until he decides to discard a card and pass the turn.
As you can see, even this adds a bit of complexity.
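To make the turn structure above concrete, here is a minimal sketch of one turn as a sequence of decision points, driven by an agent that picks at random (the smoke test suggested in the quote). Every name in it (`playTurn`, `legalPlays`, `chooseAction`, the action labels) is hypothetical and not taken from my engine. The toy rule that a play simply removes one card from the hand only stands in for the real rules.

```javascript
// Hypothetical sketch: none of these names come from the actual engine.
// One turn = a draw decision, zero or more plays, then a discard.

// Pick a uniformly random element from a non-empty array.
function pickRandom(options, rng = Math.random) {
  return options[Math.floor(rng() * options.length)];
}

// Play one turn by asking `chooseAction(phase, options)` at every decision point.
// `legalPlays(hand)` is an assumed callback returning the plays currently available.
function playTurn(hand, legalPlays, chooseAction) {
  const log = [];

  // Decision 1: draw from the deck or take the whole discard pile.
  // (Only the choice is recorded; the draw itself is not simulated here.)
  log.push(chooseAction('draw', ['DRAW_FROM_DECK', 'TAKE_DISCARD_PILE']));

  // Decision 2 (repeated): make a play, or stop and move on to the discard.
  let plays = legalPlays(hand);
  while (plays.length > 0) {
    const choice = chooseAction('play', [...plays, 'STOP']);
    if (choice === 'STOP') break;
    log.push(choice);
    hand = hand.filter((card) => card !== choice); // toy rule: a play removes one card
    plays = legalPlays(hand);
  }

  // Decision 3: discard one card to end the turn (skipped if the hand is empty).
  if (hand.length > 0) log.push('DISCARD:' + chooseAction('discard', hand));
  return log;
}

// Random agent: ignores the phase and picks any legal option.
const randomAgent = (phase, options) => pickRandom(options);
const turnLog = playTurn(['A', 'B', 'C'], (hand) => hand.slice(0, 1), randomAgent);
console.log(turnLog);
```

The point of passing `chooseAction` in as a function is that the same loop could later be driven by the heuristic or by a learned policy instead of the random agent.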
Quote
>You then define a "policy" to be a mapping from the state known to the agent (including the history of previous actions by all players) to a probability distribution over actions. This policy can be implemented as some sort of neural network, with a final SoftMax operation to make sure what is produced is indeed a probability distribution. If you initialize the weights so the neural network produces small outputs before the SoftMax, the SoftMax will in turn produce close to a uniform distribution.
Can you elaborate on this? Unfortunately I do not understand it (my ML gaps! :/)
The only thing I understand is that I need to store the history of the game state. At the moment my library has a state object that stores all the information needed to carry the game on at the current step, and it deletes whatever is no longer needed. I can easily adjust this.
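For reference, the SoftMax step mentioned in the quote can be sketched in a few lines. This is only an illustration under stated assumptions: the names `softmax` and `sampleAction` are made up, and the hard-coded numbers stand in for the outputs a real neural network would produce.

```javascript
// Turn raw network outputs ("logits") into a probability distribution.
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Sample an action index according to the given probabilities.
function sampleAction(probs, rng = Math.random) {
  let r = rng();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point rounding
}

// Small outputs before the SoftMax give a nearly uniform distribution.
const smallOutputs = softmax([0.01, -0.02, 0.0]);
// Large differences between outputs give a strong preference for one action.
const largeOutputs = softmax([5, 0, -5]);
console.log(smallOutputs, largeOutputs, sampleAction(smallOutputs));
```

With three actions, the first distribution stays close to 1/3 each, which matches the remark in the quote that small initial outputs make the policy start out close to picking at random.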
Quote
>Start playing games. After each game, you can tweak the weights of the neural network to increase the probability of the actions taken by the winners and decrease the probability of the actions taken by the losers. In order to do this, you would need to be able to compute the derivative of the probability of producing a move with respect to each of the weights in the neural network, which is what backpropagation does.
I think/hope this will become clearer once I understand the step above.
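As a rough illustration of the weight tweak described in the quote, the sketch below does one update for a single chosen action. It rests on a simplifying assumption: a single linear layer stands in for the neural network, so the derivative can be written by hand instead of being computed by backpropagation. The names `actionProbs` and `updateWeights`, the feature values, and the learning rate are all hypothetical.

```javascript
function softmax(logits) {
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Linear policy (assumption): logits[a] = sum over i of weights[a][i] * features[i].
function actionProbs(weights, features) {
  const logits = weights.map((row) => row.reduce((s, w, i) => s + w * features[i], 0));
  return softmax(logits);
}

// One policy-gradient step for a single (state, action) pair.
// For a softmax over linear logits, the derivative of log p(action)
// with respect to weights[a][i] is ((a === action ? 1 : 0) - probs[a]) * features[i].
// reward = +1 for a winner's action, -1 for a loser's action.
function updateWeights(weights, features, action, reward, learningRate) {
  const probs = actionProbs(weights, features);
  return weights.map((row, a) =>
    row.map((w, i) =>
      w + learningRate * reward * ((a === action ? 1 : 0) - probs[a]) * features[i]));
}

const weights = [[0, 0], [0, 0], [0, 0]]; // 3 actions, 2 state features
const features = [1, 0.5];                // made-up encoding of the known state
const before = actionProbs(weights, features)[1];

// Action 1 was taken by the winner: its probability should go up.
const afterWin = actionProbs(updateWeights(weights, features, 1, +1, 0.1), features)[1];
// Action 1 was taken by the loser: its probability should go down.
const afterLoss = actionProbs(updateWeights(weights, features, 1, -1, 0.1), features)[1];
console.log(before, afterWin, afterLoss);
```

In a full training loop this step would be repeated for every action taken during a game, once the winner is known.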
Quote
>If you had a database of games played by experts, you could train a policy to simply imitate their playing. In that case you would be using supervised learning, which is easier. You could even use supervised learning first and then tweak the resulting network using reinforcement learning. At some stage AlphaGo used this (although it's not the primary mechanism of how AlphaGo works; they just used this to generate a database of labeled positions that could be used to train their value network).
No, I don’t have a database of expert games.
Quote
>Monte Carlo search would be difficult to get to work, but not impossible. I can give you more details if you want to go down this route, but I don't recommend it. I will just say that using Monte Carlo search would still require a "policy", as described above. So you have to start there anyway.
If you think that reinforcement learning is a viable option, I’m very excited to try it. I’m eager to learn it, so let’s focus on it. However, I have to say that more details on the Monte Carlo approach would interest me as well.
Quote
>This sounds like a fun project. I should do something like this with some other card game.
You are very welcome to help me with this!