MCTS with a Draw reward

Artificial Intelligence Programming

Started by mudslinger September 11, 2018 08:42 PM

1 comment, last by alvaro 6 years, 4 months ago

mudslinger

September 11, 2018 08:42 PM

If I use MCTS with a reward of -1, 0, and 1 for a loss, draw, and win respectively, can I use the UCT formula as is?

uct = node.rewards/(node.visits+1.0) + explorationRate * sqrt(ln(node.parent.visits) / (node.visits+1.0))

Afterwards, do I still return the most-visited node as the best move?

alvaro

September 12, 2018 03:50 PM

That seems reasonable. You just need to use an estimate of the expected value of the distribution, and node.rewards/(node.visits+1) is reasonable.

A minor matter of naming: I normally call that the "UCB1 formula", not the "UCT formula". UCT is the algorithm resulting from using the UCB1 formula at every node of an expanding tree.
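
To make this concrete, here is a minimal Python sketch of the formula from the question plus the "return the most-visited child" rule. The `Node` class, the function names, and the default `exploration_rate` of 1.4 are assumptions made up for illustration; only the formula itself and the -1/0/1 rewards come from the thread. It also assumes `rewards` is accumulated from the point of view of the player choosing at the parent.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    # Hypothetical minimal node; field names mirror the formula in the question.
    rewards: float = 0.0  # sum of playout results: -1 loss, 0 draw, +1 win
    visits: int = 0
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)


def ucb1(node: Node, exploration_rate: float = 1.4) -> float:
    """UCB1 score of a child, keeping the +1.0 smoothing from the question."""
    mean = node.rewards / (node.visits + 1.0)  # estimate of the expected reward
    # max(..., 1) avoids log(0) if the parent has not been visited yet
    log_parent = math.log(max(node.parent.visits, 1))
    bonus = math.sqrt(log_parent / (node.visits + 1.0))
    return mean + exploration_rate * bonus


def select_child(parent: Node, exploration_rate: float = 1.4) -> Node:
    """Tree policy: descend into the child with the highest UCB1 score."""
    return max(parent.children, key=lambda c: ucb1(c, exploration_rate))


def best_move(root: Node) -> Node:
    """Final decision: the most-visited child of the root."""
    return max(root.children, key=lambda c: c.visits)


# Tiny usage example with made-up statistics.
root = Node(visits=10)
a = Node(rewards=3.0, visits=5, parent=root)   # mostly wins
b = Node(rewards=-1.0, visits=4, parent=root)  # slightly losing
c = Node(rewards=0.0, visits=0, parent=root)   # never tried
root.children = [a, b, c]

next_to_explore = select_child(root)
chosen = best_move(root)
```

With these numbers the unvisited child gets the largest exploration bonus and is selected next, while the final move is still the child with the most visits. One thing to keep in mind: the rewards now span a range of 2 rather than 1, so an exploration constant tuned for 0/1 rewards may need to be roughly doubled to give the same balance.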
