Advertisement

Heuristic works just when I play for black player

Started by January 08, 2015 10:36 AM
19 comments, last by StepTNT 9 years, 10 months ago
I'm working on an AI for a 2-players board game and I ran into this issue: when my AI plays for the black player everything goes smooth, but when I decide to use it to play for the white one it chooses the worst possible move. I've tried changing the sign of the result when it plays as the white player but it still doesn't work. The heuristic that I'm using is taken from a working software which doesn't distinguish between black and white when it coms to evaluate the board, so I don't get why it doesn't work for me. Is there something that I'm not aware of? Do I need to do some changes in my negaScout based on what's my Ai's player?
You have a bug. Finding the problem can be very challenging even with complete access to the code and with an understanding of how it's supposed to work. But you seem to want someone to do this for you without seeing any of the code and without even knowing what the game is or what the heuristic looks like.
Advertisement

I cannot post the code but I can post some pseudocode and explain how things should work. I've not included more details in the question because I thought I was missing some easy thing and I hoped you could enlighten me, never thought of a bug in my code.

So, the game is Abalone and here's how the heuristic is supposed to work:

  • compute the center of mass for black marbles, call it Cb
  • compute the center of mass for white marbles, call it Cw
  • compute the geometric mean between Cb, Cw and C (which is the center of the board) and call it R
  • compute the sum of the distances between each black marble and R, call it Db
  • compute the sum of the distances between each white marble and R, call it Dw
  • the result of the evaluation function should be Db - Dw

This is based on AbaPro and it's explained better here (my version is a little bit simple but the idea is the same).

Now, why is the result Db - Dw?
The idea is to minimize the distance from the center for the player's marbles and maximize the opponent's one, so, assuming that the player is the black one, a good result is when Db < Dw and this leads to a negative score which is fine because I have


double score = -negaScout(...)

so the lowest negative score will be the better one for my negaScout code.

When the player is the white one I'd expect to have a good result when Dw < Db, but having Db - Dw as result of the evaluation leads to a positive score in this situation and, given the previouse line of code, this positive result will be the worst one for the negaScout.

I've tried switching to Dw - Db when I play as the white player but it still doesn't work somehow and I don't know why.

Of course I'm not asking you to solve the problem without having the full code, but I need to know if the idea is correct or if I was wrong since the beginning.

There are several places where one could mess up a sign and end up with a situation like what you describe, but I have a suspicion of where your problem is.

You can write an evaluation function that returns "positive is good for black", like the one you describe, or one that returns "positive is good for the player whose turn it is". Whether you use one or the other is a matter of picking a convention. The important thing is sticking to it.

Towards the beginning of the negaScout function there should be code that says something like this:

int negaScout(Board board, int alpha, int beta, int depth_left) {
  if (depth_left <= 0)
    return static_evaluation(board);
  
  //...

Or perhaps it goes like this:

int negaScout(Board board, int alpha, int beta, int depth_left) {
  if (depth_left <= 0)
    return board.black_to_move() ? static_evaluation(board) : -static_evaluation(board);
  
  //...

If you have a mismatch between your static evaluation function convention and what negaScout expects, you might have had it working OK by coincidence for one of the sides, as long as the depth is always performed to a fixed depth, or to the depths with the same parity. If you try the other player, you would get something like what you describe. If you change your search to go to depth 7 instead of 6, it would also break for the black side.

I hope this helps. But you probably need to do your own debugging. Good luck!

Isn't what you're suggesting the same of doing


double evaluate(){
    ...
    return (isBlackPlayer) ? (black - white) : (white - black);
}

Because this is what I tried without success.

Isn't what you're suggesting the same of doing


double evaluate(){
    ...
    return (isBlackPlayer) ? (black - white) : (white - black);
}
Because this is what I tried without success.


My answer depends on what `isBlackPlayer' means. If it means "is the computer the black player" or "is the black side the player (as in, not the computer)", that's completely wrong: The evaluation function in principle shouldn't even know about such things. If it means "it is black's turn on the position I am evaluating", it still seems wrong, because although you didn't tell me what `black' and `white' mean, they are probably sums of distances for black and white, and therefore undesirable, so you probably want to write `return (isBlacksTurn) ? (white - black) : (black - white)'.

Notice how isBlackPlayer is a long descriptor that could mean one thing, it's opposite, or something else altogether. smile.png

What I am suggesting is that either you show us some real code (it's not like NegaScout is a state secret), or you take the general flavor of the issues I have described to guide you through your own debugging.
Advertisement

Here's the code I'm using for the negaScout.

I've added transposition tables, history sorting and null move pruning.


protected MoveValue executeSearch(double alpha, double beta, int depth) {
    MoveValue bestMove = null;
    double best = Double.MIN_VALUE;
    ArrayList<Move> moves = sortMoves(generateLegalMoves(playerType), 0);
    Iterator<Move> movesIterator = moves.iterator();
    while (movesIterator.hasNext() && canContinue()) {
        Move move = movesIterator.next();
        board.applyMove(move);
        if (bestMove == null) {
            bestMove = new MoveValue();
            bestMove.returnMove = move;
        }
        double score = -negaScout(best, best, depth - 1, 0, playerType.opponent());
        board.undoLastMove();
        if (score > best) {
            bestMove = new MoveValue(score, move);
        }
    }
    return bestMove;
}

protected double negaScout(double alpha, double beta, int maxDepth, int currentDepth, MarbleType player) {
    if (!canContinue()) {
		// This happens when time's up so we don't need to go on.
        return 0;
    }
    if (maxDepth == 0 || board.isGameOver()) {
        return currentHeuristic.evaluateBoard();
    }
    long hash = ZobristKey.generateHash();
    HashEntry entry = transpositionTable.get(hash);
    if (entry != null) {
        if (entry.depth >= currentDepth) {
            switch (entry.flag) {
                case LOWER_BOUND: {
                    alpha = Math.max(alpha, entry.score);
                    break;
                }
                case UPPER_BOUND: {
                    beta = Math.min(beta, entry.score);
                    break;
                }
                case EXACT_SCORE: {
                    return entry.score;
                }
            }
            if (alpha >= beta) {
                return alpha;
            }
        }
    }
	// Null move pruning
    double nullValue = -negaScout(-beta, -beta - 1, 0, 0, player.opponent());
    if (nullValue >= beta) {
        updateTranspositionTableEntry(nullValue, alpha, beta, currentDepth, hash);
        return nullValue;
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player), currentDepth);
    double scoutWindow = beta, currentValue = alpha;
    Iterator<Move> movesIterator = moves.iterator();
    while (movesIterator.hasNext() && canContinue()) {
        Move move = movesIterator.next();
        board.applyMove(move);
        double score = -negaScout(-scoutWindow, -currentValue, maxDepth--, currentDepth++, player.opponent());
        if (score > currentValue) {
            currentValue = score;
        }
        if (currentValue >= beta) {
            board.undoLastMove();
            // History sorting based on prunings
            int prunings = 0;
            if (historyTable.containsKey(move)) {
                prunings = historyTable.get(move);
            }
            historyTable.put(move, prunings++);
            break;
        }
        if (currentValue >= scoutWindow) {
            currentValue = -negaScout(-beta, -currentValue, maxDepth--, currentDepth++, player.opponent());
            if (currentValue >= beta) {
                board.undoLastMove();
                // History sorting based on prunings
                int prunings = 0;
                if (historyTable.containsKey(move)) {
                    prunings = historyTable.get(move);
                }
                historyTable.put(move, prunings++);
                break;
            }
        }
        scoutWindow = currentValue++;
        board.undoLastMove();
    }
    updateTranspositionTableEntry(currentValue, alpha, beta, currentDepth, hash);
    return currentValue;
}

isBlackPlayer means that we searching a move for the black player, but I'm starting to suspect that this is completely wrong :D

Goodness me. You don't have a bug: You have many bugs. I'll add some comments to your code.

protected MoveValue executeSearch(double alpha, double beta, int depth) {
    MoveValue bestMove = null;
    double best = Double.MIN_VALUE;
    // This looks like a mistake. See https://code.google.com/p/error-prone/issues/detail?id=203
    
    ArrayList<Move> moves = sortMoves(generateLegalMoves(playerType), 0);
    Iterator<Move> movesIterator = moves.iterator();
    while (movesIterator.hasNext() && canContinue()) {
        Move move = movesIterator.next();
        board.applyMove(move);
        if (bestMove == null) {
            bestMove = new MoveValue();
            bestMove.returnMove = move;
        }
        double score = -negaScout(best, best, depth - 1, 0, playerType.opponent());
        // This call to negaScout seems incorrect. If I understand correctly and we
        // are in the function that searches the root, you want to pass (-beta, -alpha) as the search window

        board.undoLastMove();
        if (score > best) {
            bestMove = new MoveValue(score, move);
            // Where is `best' updated? Shouldn't it be here?
            // Why does `best' even exist? Why not use `alpha' instead?
            // Speaking of which, where is `alpha' used in this function?
        }
    }
    return bestMove;
}

protected double negaScout(double alpha, double beta, int maxDepth, int currentDepth, MarbleType player) {
    if (!canContinue()) {
        // This happens when time's up so we don't need to go on.
        // You probably shouldn't be checking this every single node. You might find out you spend a significant chunk of time checking the clock
        return 0;
    }
    if (maxDepth == 0 || board.isGameOver()) {
        return currentHeuristic.evaluateBoard();
        // Aha! Your evaluation function is expected to return a score from the point of view of the player to move.
    }
    long hash = ZobristKey.generateHash();
    HashEntry entry = transpositionTable.get(hash);
    if (entry != null) {
        if (entry.depth >= currentDepth) {
            switch (entry.flag) {
                case LOWER_BOUND: {
                    alpha = Math.max(alpha, entry.score);
                    break;
                }
                case UPPER_BOUND: {
                    beta = Math.min(beta, entry.score);
                    break;
                }
                case EXACT_SCORE: {
                    return entry.score;
                }
            }
            if (alpha >= beta) {
                return alpha;
            }
        }
    }
    // You should probably also store a first move to try in the hash tables.
    // But this can wait until you get your signs right. :)
	
// Null move pruning
    double nullValue = -negaScout(-beta, -beta - 1, 0, 0, player.opponent());
    if (nullValue >= beta) {
        updateTranspositionTableEntry(nullValue, alpha, beta, currentDepth, hash);
        return nullValue;
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player), currentDepth);
    double scoutWindow = beta, currentValue = alpha;
    Iterator<Move> movesIterator = moves.iterator();
    while (movesIterator.hasNext() && canContinue()) {
        Move move = movesIterator.next();
        board.applyMove(move);
        double score = -negaScout(-scoutWindow, -currentValue, maxDepth--, currentDepth++, player.opponent());
        // Operators -- and ++ don't do what you think they do. You want `maxDepth - 1' and
        // `currentDepth + 1' instead.
        
        // You could call undoLastMove here, instead of doing it in three different places further down.
        
        if (score > currentValue) {
            currentValue = score;
        }
        if (currentValue >= beta) {
            board.undoLastMove();
            // History sorting based on prunings
            int prunings = 0;
            if (historyTable.containsKey(move)) {
                prunings = historyTable.get(move);
            }
            historyTable.put(move, prunings++);
            // Again, you probably mean `prunings + 1', unless this is a bizarre language where that's what ++ means.
            break;
        }
        if (currentValue >= scoutWindow) {
            currentValue = -negaScout(-beta, -currentValue, maxDepth--, currentDepth++, player.opponent());
            // -- and ++, as above
            if (currentValue >= beta) {
                board.undoLastMove();
                // History sorting based on prunings
                int prunings = 0;
                if (historyTable.containsKey(move)) {
                    prunings = historyTable.get(move);
                }
                historyTable.put(move, prunings++);
                break;
            }
        }
        scoutWindow = currentValue++;
        // ++, as above
        board.undoLastMove();
    }
    updateTranspositionTableEntry(currentValue, alpha, beta, currentDepth, hash);
    return currentValue;
}
You really should sort out the sign issue before you introduce transposition tables or even scout search, just because those are extra sources of bugs. I would actually recommend you start with vanilla NegaMax and progressively introduce alpha-beta cuts, transposition tables and scout search, probably in that order.

It is possible your scout implementation is sound, but I find it easier to think about if you follow the pseudo-code in the Wikipedia article about PVS.

Another piece of advice: Use integers as evaluation values. It's not like you can't use doubles, but it gets trickier, and all the pieces of code you based your code on were actually using integers as scores. That's why you add 1 to the current best score for the scout search, instead of using something like java.lang.Math.nextAfter.

I'll take some time to read and understand your answer but I'll give you more details on this:

  • executeSearch starts the search from the root. This is needed because the standard negaScout returns a numeric value while I need a Move (wrapped inside the MoveValue class). The function takes both alpha and beta because it extends a class which requires this signature but, from the pseudocode found online, I don't need them
  • double best = Double.MIN_VALUE;
    is used to give a low value before starting the search. I know I should use Double.NEGATIVE_INFINITY but using it gave odd results and I reverted to MIN_VALUE
  • double score = -negaScout(best, best, depth - 1, 0, playerType.opponent());
    I found this code somewhere online, I can't find the source again but it happened to work for the black player so I thought it was correct

  • "Speaking of which, where is `alpha' used in this function?"
    Considering that the main call is
    executeSearch(Double.MIN_VALUE, Double.MAX_VALUE, depth), alpha and best should be the same thing. I could use just alpha, but my source used this code and so I did
  • I'm checking canContinue everytime to avoid going deeper if it's not needed, so that the thread that's doing the search can terminate fine.
  • "Your evaluation function is expected to return a score from the point of view of the player to move."
    Actually I'm starting to have some doubts because the software that I'm using as reference shows the same scores despite the player that I'm using
  • "You could call undoLastMove here, instead of doing it in three different places further down"
    I'm not undoing it because it may happen that the result is outside the scout window
    so I need to make a full search for the same move
  • "Operators -- and ++ don't do what you think they do"
    And you're right, this one was really really noobish. I'll fix them ASAP

Use integers as evaluation values

I would use them if only the evaluation function won't return a double :)

I don't like making line-by-line comments about a post, but the format you used makes it too natural.

executeSearch starts the search from the root. This is needed because the standard negaScout returns a numeric value while I need a Move (wrapped inside the MoveValue class). The function takes both alpha and beta because it extends a class which requires this signature but, from the pseudocode found online, I don't need them


It looks like it should not extend a class that requires that signature, then. Rethink the design so you don't need to pass parameters that don't belong.


double best = Double.MIN_VALUE;
is used to give a low value before starting the search. I know I should use Double.NEGATIVE_INFINITY but using it gave odd results and I reverted to MIN_VALUE


Double.MIN_VALUE is effectively zero. -Double.MAX_VALUE would do. Of course you see odd results if you use the right number, because your code is seriously broken. You don't see odd results when using Double.MIN_VALUE only because you are not looking carefully enough at the answers you get.


double score = -negaScout(best, best, depth - 1, 0, playerType.opponent());
I found this code somewhere online, I can't find the source again but it happened to work for the black player so I thought it was correct


I can guarantee this is not doing the right thing for black either.


"Speaking of which, where is `alpha' used in this function?"
Considering that the main call is executeSearch(Double.MIN_VALUE, Double.MAX_VALUE, depth), alpha and best should be the same thing. I could use just alpha, but my source used this code and so I did


The confusing thing here is that the function that searches the root takes an alpha-beta window.


I'm checking canContinue everytime to avoid going deeper if it's not needed, so that the thread that's doing the search can terminate fine.


The alternative I suggest is only checking every certain number of nodes. This is the beginning of the search function in my chess program (in C++):

int Engine::negamax(int alpha, int beta, int depth, int from_root) {
  if (depth <= 0)
    return quiescence_search(alpha, beta, from_root);

  if (++nodes_searched >= next_time_check) {
    next_time_check = nodes_searched + 30000l;
    check_if_time_is_up();
  }
  // ...

"Your evaluation function is expected to return a score from the point of view of the player to move."
Actually I'm starting to have some doubts because the software that I'm using as reference shows the same scores despite the player that I'm using


There is no doubt that NegaScout uses that convention. What the software ends up displaying to the user is a different matter.


"You could call undoLastMove here, instead of doing it in three different places further down"
I'm not undoing it because it may happen that the result is outside the scout window so I need to make a full search for the same move


Oh, I see.


Use integers as evaluation values


I would use them if only the evaluation function won't return a double smile.png


You can always multiply the return value of the evaluation function by some constant and round to an integer. I have recently written an alpha-beta searcher using floats for evaluation values for the first time, and it's doable, but you have to be extra careful with some details, and I don't recommend it to you are this point. I might still go back and change it to ints, because it bothers me to no end that a draw is sometimes 0 and sometimes -0 (whatever that means).

This topic is closed to new replies.

Advertisement