Killing bugs in Chess AI and correctly implementing alpha-beta and transpos. tables
For some time now, I've been implementing an AI for my 3d chess game, which is available with screenies at my webpage.
After I implemented the first minmax version (static 3-ply), I noticed that the computer plays always the same game against itself, of course, because there is no randomness. So I took this as a 'bug-test' requirement.
All the info I've read up on alpha-beta say that it is a conservative mathematical algorithm, so it cuts off only the paths that wouldn't be taken anyway. So wouldn't this mean that the AI should play the same with and without alpha-beta pruning?
Ok, now I have minmax, alpha-beta and I'm now experimenting on good move ordering, iterative deepening and transposition tables. This is where my problems arise. Aren't all of these methods also 'conservative' (I don't know if this is the right term), so that they only speed up the search but the result should be the same with and without them?
Because the problem is that with transposition tables, the AI starts behaving erratically. I'm not really sure how to store alpha & beta hashes in the table, so for now I just store exact hashes in it, i.e. I only hash positions which I've evaluated at depth=0. But still, the AI starts to make weird moves. Could this be due to collisions or something like that?
Do you know any good articles that describe the use of transposition tables? Many sites do describe the alpha-beta algorithm in detail, but I haven't been able to find many on the actual implementation of transposition tables.
This is a good website on chess programming:
http://www.brucemo.com/compchess/programming/
Changing the order of the moves is not "conservative" in the sense that the search can return some other move as best if two moves have the same score. Transposition tables are definitely not conservative, at least if you store bounds and if you use stored scores for searches of smaller depth. Read Bruce Moreland's page on "Search Instability".
However, your program shouldn't select bad moves because of search instability. Your problem lies somewhere else. I suggest you make a list of positions with clear best moves (chess problems would do) and use them to test changes to your program. When you introduce a big ugly bug, hopefully you'll see that your program doesn't solve the problems anymore.
http://www.brucemo.com/compchess/programming/
Changing the order of the moves is not "conservative" in the sense that the search can return some other move as best if two moves have the same score. Transposition tables are definitely not conservative, at least if you store bounds and if you use stored scores for searches of smaller depth. Read Bruce Moreland's page on "Search Instability".
However, your program shouldn't select bad moves because of search instability. Your problem lies somewhere else. I suggest you make a list of positions with clear best moves (chess problems would do) and use them to test changes to your program. When you introduce a big ugly bug, hopefully you'll see that your program doesn't solve the problems anymore.
I've pretty much read through the website, but I think the bug lies somewhere in my alphabeta -function. Moreland's site describes Negamax implementation, while I use minmax, because it's easier for me to understand. That's why I have had to adapt a little, and I'm not sure if I got it right. I can post source code if necessary, but I doubt if you have time to look it trough.
Thanks for pointing out the situation about two move paths having the same score, I hadn't thought of that. I maintain a list of best moves so far, and while it seems that most of the moves are logical, the computer often makes a half-dubious move like trading its queen for a knight.
One thing that puzzles me, is that how do I really cope with has collisions? I have a hash table of about 400MB, which I index by hashTable[board->zobristKey % hashTableSize] like it is usually done.. Then I check that the key in the hash table is the same as the board's key, but what if they are the same and the positions are in fact different? I would then get the score from an incorrect position which would really mess up the AI (make it do bad moves).
For now I have disabled the tables, and I think the AI still makes some very stupid moves. (not obviously stupid, but semi-stupid :)) My evaluation function is quite simple, in addition to calculating material score it only penalizes unmoved knights, bishops and king's and queen's pawns, and gives bonuses to pawns nearing promotion. I wonder if this simplicity might be the biggest reason for the AI acting so dumb?
Thanks for pointing out the situation about two move paths having the same score, I hadn't thought of that. I maintain a list of best moves so far, and while it seems that most of the moves are logical, the computer often makes a half-dubious move like trading its queen for a knight.
One thing that puzzles me, is that how do I really cope with has collisions? I have a hash table of about 400MB, which I index by hashTable[board->zobristKey % hashTableSize] like it is usually done.. Then I check that the key in the hash table is the same as the board's key, but what if they are the same and the positions are in fact different? I would then get the score from an incorrect position which would really mess up the AI (make it do bad moves).
For now I have disabled the tables, and I think the AI still makes some very stupid moves. (not obviously stupid, but semi-stupid :)) My evaluation function is quite simple, in addition to calculating material score it only penalizes unmoved knights, bishops and king's and queen's pawns, and gives bonuses to pawns nearing promotion. I wonder if this simplicity might be the biggest reason for the AI acting so dumb?
My alpha-beta implementation is based on what I learned from here:
wikipedia
More specific, I think here's my problem:
1) If I understood right, (beta <= alpha) means cutoff? this is a 'bad' node, and won't be played? so why do we return alpha?
The parent node is a max node, so it will select
the biggest node given to it. Shouldn't we return beta (a bad score) so that the parent function won't select this node?
2) And shouldn't here be return alpha; instead?
Or am I just way off somewhere?
wikipedia
More specific, I think here's my problem:
1) If I understood right, (beta <= alpha) means cutoff? this is a 'bad' node, and won't be played? so why do we return alpha?
The parent node is a max node, so it will select
the biggest node given to it. Shouldn't we return beta (a bad score) so that the parent function won't select this node?
2) And shouldn't here be return alpha; instead?
Or am I just way off somewhere?
AlphaBeta will not always play the same game as MinMax. This is because you can have several moves which are all equally as good.
AlphaBeta generally wants to do this:
-If my opponent already has a better move than the one I'm looking at, stop searching this node.
Go to:
http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic11/
for a decent description.
Transposition tables are very difficult to implement because of issues with bugs. I recommend you do the following:
Debug stage 1:
1. Take out the code that lets you return early due to move transposition.
2. Create a function called 'createHash', which will look at a board and generate a hash value for it.
3. Every time you see a move, call createHash and compare this value against the hash value you already have.
4. If there is a discrepency, find out why.
Debug stage 2:
1. Set a static search depth (4 ply?).
2. With move transposition turned off, play a test game and record every single move / response, and the scores, out to like 14 moves or so.
3. Turn on move transposition only for the last ply (the one where you would call your evaluation function).
4. Replay the same game-- the results should be identical, as move transposition is only removing a few evaluates.
Debug stage 3:
(You can use this one for every new feature you add)
1. Download some chess problems.. Things like 'mate in 4' or 'mate in 7'.
2. Make sure your program can solve them.
Best of luck,
Will
AlphaBeta generally wants to do this:
-If my opponent already has a better move than the one I'm looking at, stop searching this node.
Go to:
http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic11/
for a decent description.
Transposition tables are very difficult to implement because of issues with bugs. I recommend you do the following:
Debug stage 1:
1. Take out the code that lets you return early due to move transposition.
2. Create a function called 'createHash', which will look at a board and generate a hash value for it.
3. Every time you see a move, call createHash and compare this value against the hash value you already have.
4. If there is a discrepency, find out why.
Debug stage 2:
1. Set a static search depth (4 ply?).
2. With move transposition turned off, play a test game and record every single move / response, and the scores, out to like 14 moves or so.
3. Turn on move transposition only for the last ply (the one where you would call your evaluation function).
4. Replay the same game-- the results should be identical, as move transposition is only removing a few evaluates.
Debug stage 3:
(You can use this one for every new feature you add)
1. Download some chess problems.. Things like 'mate in 4' or 'mate in 7'.
2. Make sure your program can solve them.
Best of luck,
Will
------------------http://www.nentari.com
Quote: Original post by RPGeezus
http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic11/
been there, it's been a great help.
Quote: Original post by RPGeezus
Debug stage 2:
1. Set a static search depth (4 ply?).
2. With move transposition turned off, play a test game and record every single move / response, and the scores, out to like 14 moves or so.
3. Turn on move transposition only for the last ply (the one where you would call your evaluation function).
4. Replay the same game-- the results should be identical, as move transposition is only removing a few evaluates.
I'm now doing exactly that, and the results are nowhere near identical.
Quote: Original post by RPGeezus
Debug stage 3:
(You can use this one for every new feature you add)
1. Download some chess problems.. Things like 'mate in 4' or 'mate in 7'.
2. Make sure your program can solve them.
Hey, that's a GREAT suggestion! I think I'll try that some day.
I have a hard time thinking of alpha-beta when it's not written in the negamax style. Anyway, the way I think about alpha-beta is that it can return three things:
- An exact score between alpha and beta,
- alpha or lower, which means that the score is alpha or lower,
- beta or higher, which means that the score is beta or higher.
I hope that helps in general.
Are you using a quiescence search at all? You know that without quiescence search your program will play horribly no matter what, right?
I didn't find hash tables very hard to implement. The problem you mention of using a score that was store for one position as valid for a different position can be avoided by storing the zobrist key in the table for verification. If you read Bruce Moreland's page carefully, you'll see that he does that.
If you want to send me code, I'll probably have time to look over it during the weekend.
- An exact score between alpha and beta,
- alpha or lower, which means that the score is alpha or lower,
- beta or higher, which means that the score is beta or higher.
I hope that helps in general.
Are you using a quiescence search at all? You know that without quiescence search your program will play horribly no matter what, right?
I didn't find hash tables very hard to implement. The problem you mention of using a score that was store for one position as valid for a different position can be avoided by storing the zobrist key in the table for verification. If you read Bruce Moreland's page carefully, you'll see that he does that.
If you want to send me code, I'll probably have time to look over it during the weekend.
October 12, 2005 05:54 AM
Quote: Original post by alvaro
I have a hard time thinking of alpha-beta when it's not written in the negamax style. Anyway, the way I think about alpha-beta is that it can return three things:
- An exact score between alpha and beta,
- alpha or lower, which means that the score is alpha or lower,
- beta or higher, which means that the score is beta or higher.
I hope that helps in general.
I changed my alphabeta function a little and I think I had a bug there, it might be solved now. I think the problem is that when I return a value from alphabeta function, the parent function can't tell if the score returned is the exact score or a upper or a lower bound. So if I call alphabeta with a value like a=1002 and it returns 1002, I can't tell if it is 'the real' computed result for this node, or if it just cut off (which it probably did). So for now I just select the best move with '>' and not '>=' because the later searches will return the same score if they got cut off and shouldn't be selected. As a result (I always order moves so that captures will be tried first) the AI is very keen on capturing enemy pieces, because it will select the first move, usually capture, rather than some other move which actually has the same score. Is this normal and how would I overcome this? I don't want the AI to always just trade pieces.
Another thing that I can't make a "score" list of the evaluated moves because the list would look something like this:
1204 e2-d3 <
October 12, 2005 06:01 AM
Ok, achieved to mess up my post by putting tag signs there, doh me.
The point is that my 'best moves' -list ends up with the same score for the best move and several 'cut-off' moves. I can only tell from the list that which was the best move, but can't try out the 2nd best move for example (I'd like the AI to vary its play a bit)
The quiescence search I implemented wasn't very good because I didn't have the null-move option in it. I'll try that after I get the transposition tables to work.
I do store the zobrist keys in the hash table, what I meant to ask is that what if I get the same hash for two different positions? Can this happen or do I have a bug if it does?
I'll put something up by the weekend, so you can look what's going on.
The point is that my 'best moves' -list ends up with the same score for the best move and several 'cut-off' moves. I can only tell from the list that which was the best move, but can't try out the 2nd best move for example (I'd like the AI to vary its play a bit)
The quiescence search I implemented wasn't very good because I didn't have the null-move option in it. I'll try that after I get the transposition tables to work.
I do store the zobrist keys in the hash table, what I meant to ask is that what if I get the same hash for two different positions? Can this happen or do I have a bug if it does?
I'll put something up by the weekend, so you can look what's going on.
Quote: Original post by Anonymous Poster
The point is that my 'best moves' -list ends up with the same score for the best move and several 'cut-off' moves. I can only tell from the list that which was the best move, but can't try out the 2nd best move for example (I'd like the AI to vary its play a bit)
Sure, alphabeta won't tell you anything about the second best move. If you want that, you need to change the logic of the search at the root a little. Anyway, if you only want to randomize your games a little, you can start by giving it a broad opening book. After that, if you ponder during the opponent's turn, variations in time control should be enough to not play the exact same game over and over again. You can also add a little random component to the evaluation function.
Quote: The quiescence search I implemented wasn't very good because I didn't have the null-move option in it. I'll try that after I get the transposition tables to work.
Bad idea. Get your quiescence search to work as soon as possible. Everything else is secondary.
Quote: I do store the zobrist keys in the hash table, what I meant to ask is that what if I get the same hash for two different positions? Can this happen or do I have a bug if it does?
I haven't made the computations in a while, but if you store a 64-bit key and the table you use to compute the Zobrist keys is reasonably random, you will probably never see a collision.
This topic is closed to new replies.
Advertisement
Popular Topics
Advertisement