Multithreading Monte Carlo - UCB1 - Artificial Intelligence

Author

109

May 23, 2015 12:24 PM

Hello.

I have the basic AI for a card game:


While there is time{
     Find a move that maximizes UCB1
     Play move
     Finish game randomly (if won, increment counter for this move)
}
Choose move that was tried the most.

.

How would I multithread this?

alvaro

21,607

May 23, 2015 12:52 PM

It would have been nice if you had a more complete description of UCB1. I happen to be familiar with it, but most people here probably aren't.
Also, you slightly misrepresented UCB1 in one key aspect: If lost, increment loss counter for this move.

The usual solution to your problem is to account a "virtual loss" right away when you start exploring the move, so any threads coming after this one will be discouraged from trying the same move. I believe the idea is attributed to Rémi Coulom, but it's very natural and I thought about it myself.

Some reading material: https://www.aaai.org/ocs/index.php/SOCS/SOCS11/paper/viewFile/4023/4374

paulo_

Author

109

May 23, 2015 03:31 PM

Also, you slightly misrepresented UCB1 in one key aspect: If lost, increment loss counter for this move.

I don't think I've read a UCB paper that uses a loss counter. This is my current code:


While there is time{
     Find a move that maximizes UCB1 (move.wins+(explorationRate * sqrt(2*log(totalTries)/move.tries)))
     Play move
     Finish game randomly (if won, move.wins++)
     move.tries++
     totalTries++
}
Choose move that was tried the most.

How would I use a loss counter in this code? Will it be better?

The usual solution to your problem is to account a "virtual loss" right away when you start exploring the move, so any threads coming after this one will be discouraged from trying the same move. I believe the idea is attributed to Rémi Coulom, but it's very natural and I thought about it myself.

For example, if thread1 runs the move with the highest UCB value, thread2 will now run the move with second highest UCB value. Is this correct?

alvaro

21,607

May 23, 2015 05:57 PM

You either keep two counters for wins and losses or you keep two counters for wins and visits (what you call "tries"; I'll stick to the names I'm used to.).

Your UCB1 formula seems to be completely messed up. It should be something like this:

 UCB1 = (move.wins + 1.0) / (move.tries + 2.0) + explorationRate * sqrt(2*log(totalTries)/(move.tries+1))

Now, the idea of the virtual loss is that you increment move.tries right after you pick the move. So while this thread is running the simulation, the other threads will see the statistics as if the result of the random game had been a loss. This doesn't guarantee that they are not going to pick the same move, but they will be less likely to do so.

paulo_

Author

109

May 23, 2015 06:48 PM

Your UCB1 formula seems to be completely messed up. It should be something like this:

Oh yeah, that's what my code looks like, except I don't have the +1,+2. What is that for? Is it because these numbers start with zero? Instead, I initially I visit all the possible moves, so I don't get div by zeros.

Thanks, I'll be trying your suggestions on multi threading.

alvaro

21,607

May 23, 2015 07:30 PM

Adding the +1 and +2 is a minor variation. I believe the original UCB1 is precisely the way you have it, but there are good reasons to have the +1 and +2 (some people just initialize the counters pretending that a loss and a win have already happened, which has the same effect).

I explained the reason for the +1 and +2 in this thread.