Advertisement

Multithreading Monte Carlo - UCB1

Started by May 23, 2015 12:24 PM
5 comments, last by ddyer 9 years, 5 months ago

Hello.

I have the basic AI for a card game:


While there is time{
     Find a move that maximizes UCB1
     Play move
     Finish game randomly (if won, increment counter for this move)
}
Choose move that was tried the most.

.

.

.

How would I multithread this?

It would have been nice if you had a more complete description of UCB1. I happen to be familiar with it, but most people here probably aren't.
Also, you slightly misrepresented UCB1 in one key aspect: If lost, increment loss counter for this move.

The usual solution to your problem is to account a "virtual loss" right away when you start exploring the move, so any threads coming after this one will be discouraged from trying the same move. I believe the idea is attributed to Rémi Coulom, but it's very natural and I thought about it myself.

Some reading material: https://www.aaai.org/ocs/index.php/SOCS/SOCS11/paper/viewFile/4023/4374
Advertisement

Also, you slightly misrepresented UCB1 in one key aspect: If lost, increment loss counter for this move.

I don't think I've read a UCB paper that uses a loss counter. This is my current code:


While there is time{
     Find a move that maximizes UCB1 (move.wins+(explorationRate * sqrt(2*log(totalTries)/move.tries)))
     Play move
     Finish game randomly (if won, move.wins++)
     move.tries++
     totalTries++
}
Choose move that was tried the most.

How would I use a loss counter in this code? Will it be better?


The usual solution to your problem is to account a "virtual loss" right away when you start exploring the move, so any threads coming after this one will be discouraged from trying the same move. I believe the idea is attributed to Rémi Coulom, but it's very natural and I thought about it myself.

For example, if thread1 runs the move with the highest UCB value, thread2 will now run the move with second highest UCB value. Is this correct?

You either keep two counters for wins and losses or you keep two counters for wins and visits (what you call "tries"; I'll stick to the names I'm used to.).

Your UCB1 formula seems to be completely messed up. It should be something like this:

 UCB1 = (move.wins + 1.0) / (move.tries + 2.0) + explorationRate * sqrt(2*log(totalTries)/(move.tries+1))

Now, the idea of the virtual loss is that you increment move.tries right after you pick the move. So while this thread is running the simulation, the other threads will see the statistics as if the result of the random game had been a loss. This doesn't guarantee that they are not going to pick the same move, but they will be less likely to do so.


Your UCB1 formula seems to be completely messed up. It should be something like this:

Oh yeah, that's what my code looks like, except I don't have the +1,+2. What is that for? Is it because these numbers start with zero? Instead, I initially I visit all the possible moves, so I don't get div by zeros.

Thanks, I'll be trying your suggestions on multi threading.

Adding the +1 and +2 is a minor variation. I believe the original UCB1 is precisely the way you have it, but there are good reasons to have the +1 and +2 (some people just initialize the counters pretending that a loss and a win have already happened, which has the same effect).

I explained the reason for the +1 and +2 in this thread.
Advertisement

The random playout part needs no special treatment for threads, but the UCT tree-building

and feedback steps need to be carefully synchronized. The most devilish part is, if your

search optimization includes any pruning of the tree, considering what happens if you

remove part of the tree that is in-use by some other thread.

---visit my game site http://www.boardspace.net - free online strategy games

This topic is closed to new replies.

Advertisement