Advertisement

Chess Engine - search development priority

Started by April 21, 2013 02:51 PM
16 comments, last by crybot 11 years, 9 months ago

hi, I'm currently developing a chess engine in c++ (here is the github repository: https://github.com/crybot/NapoleonPP), it is called NapoleonPP (probably i will change the name...) and it is based on magic bitboards move generation.

I've recently completed move generation which i think is fast enough (it can calculate perft 6 in about 1,2 seconds without special improvements like threading or transposition tables) . So I started working on search and evaluation, my primary goal was to build a nice search function so i just created a very simple evaluation function that takes care only of material, movement of some pieces and center control.

Now i'm stuck...

I tried to play with the program, and i must say it plays a discrete chess, but it is sooo slow... at some point it takes more than 10 seconds to search under ply 6...

what i need now is to know what to do first to improve my engine speed.

I need to implement move ordering (by now i search first captures and then non captures), Null move pruning, pv-search, principal variation extraction, iterative deeping, transposition tables(actually i've implemented transposition tables, but i didn't see benefits on speed, maybe i did something wrong? ), ecc.ecc.

If someone could direct me to the right way (maybe with description and some resources on the argument) I would be very happy.

Here is my actual search function:


int Search::search(int depth, int alpha, int beta, Board &board)
    {
        int bound = Alpha;
        int score;

        if((score = board.Table.Probe(board.zobrist, depth, alpha, beta)) != TranspositionTable::Unknown)
            return score;

        if( depth == 0 )
        {
            score = quiesce(alpha, beta, board);
            board.Table.Save(HashEntry(board.zobrist, depth, score, 0, Exact)); // should i save leaf nodes??
            return score;
        }

        int pos = 0;
        Move moves[Constants::MaxMoves];

        BitBoard attackers = board.KingAttackers(board.KingSquare[board.SideToMove], board.SideToMove);

        if (attackers)
            MoveGenerator::GetEvadeMoves(board, attackers, moves, pos);
        else
            MoveGenerator::GetAllMoves(moves, pos, board);

        BitBoard pinned = board.GetPinnedPieces();

        for (int i=0; i<pos; i++)
        {
            if (board.IsMoveLegal(moves, pinned))
            {
                board.MakeMove(moves);
                score = -search(depth - 1, -beta, -alpha, board);
                board.UndoMove(moves);

                if( score >= beta )
                {
                    board.Table.Save(HashEntry(board.zobrist, depth, beta, 0, Beta));
                    return beta;   //  fail hard beta-cutoff
                }
                if( score > alpha )
                {
                    bound = Exact;
                    alpha = score; // alpha acts like max in MiniMax
                }
            }
        }

        board.Table.Save(HashEntry(board.zobrist, depth, alpha, 0, bound));
        return alpha;
    }

Here is Transposition Table Code:


    TranspositionTable::TranspositionTable(unsigned long size)
    {
        Size = size;
        Table = new HashEntry[size];
    }

    void TranspositionTable::Save(HashEntry entry)
    {
        HashEntry* hash = &Table[entry.Hash % Size];

        hash->Hash = entry.Hash;
        hash->Score = entry.Score;
        hash->Depth = entry.Depth;
        hash->Bound = entry.Bound;
    }

    int TranspositionTable::Probe(ZobristKey key, int depth, int alpha, int beta)
    {
        HashEntry* hash = &Table[key % Size];

        if (hash->Hash == key)
        {
            if (hash->Depth >= depth)
            {
                if (hash->Bound == Exact)
                    return hash->Score;
                if (hash->Bound == Alpha && hash->Score <= alpha)
                    return alpha;
                if (hash->Bound == Beta && hash->Score >= beta)
                    return beta;
            }
        }

        return TranspositionTable::Unknown;
    }

I made some test, in particular i calculated kNodes per second by incrementing a variable every time i call alphabeta and quiesce:

I got about 3000-4000 kn/s using 32 MB for the transposition table (even if it does not make almost any difference).

My system configuration:

O.S.: Arch linux 64 bit (xfce)

CPU: Intel i5 760 2.80 Ghz (Quad core)

It is fast enough? I'm average with other non professional engines?

Any advice is welcome.

thanks all smile.png

Move ordering is probably the most important part, especially in the quiescence search. MVV-LVA is a simple and effective order for captures. For non-captures, search killer moves first and then everything else in history-heuristic order. You can then try to move captures with negative SEE to the end of the move list, so they are searched after non-captures.

Ah, and more important than most of those things, you should store the best move in the hash tables, and try it first in case of a hash hit where you cannot just return a score (because of not enough depth or a useless bound).

By the way, perft 6 in 1.2 seconds is pretty impressive, if it's from the starting position. My own engine fits a similar description to yours, and my perft is about 6 times slower than that (although we are not using the same hardware, of course).
Advertisement

Thanks for your answer, now I see all more clear, but I still have some doubts:

regarding Transposition tables, should i save leaf nodes during alphabeta (and maybe also during quiescence search)? I tried to run the program without saving them and it seems to work faster.

For storing the best move, can i just use the index of the move inside the array?

That's all, thank you, again.

P.S. i forgot one important thing: How can i prove my transposition table is working correctly?

Thanks for your answer, now I see all more clear, but I still have some doubts:
regarding Transposition tables, should i save leaf nodes during alphabeta (and maybe also during quiescence search)? I tried to run the program without saving them and it seems to work faster.

That's a minor detail. In my own engine, I don't use the transposition tables at all in quiescence search, but I know others do it. You will just have to test it. But it's not critical.

For storing the best move, can i just use the index of the move inside the array?

I used to do that in my old engine (Ruy-López), but that requires you to generate all the moves before you even try the move from the transposition tables. Since this move very often produces a beta cutoff, it's worth writing a function that checks validity of a move on a given position and try to avoid generating the move list altogether.

I use a 16-bit format for the move, as described here: http://chessprogramming.wikispaces.com/Encoding+Moves

P.S. i forgot one important thing: How can i prove my transposition table is working correctly?

That's a very difficult question. A few ideas:
* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.

I recommend you spend some time trying better replacement schemes. I use buckets of size 4, which seems to work very well.

I used to do that in my old engine (Ruy-López), but that requires you to generate all the moves before you even try the move from the transposition tables. Since this move very often produces a beta cutoff, it's worth writing a function that checks validity of a move on a given position and try to avoid generating the move list altogether.

I use a 16-bit format for the move, as described here: http://chessprogramming.wikispaces.com/Encoding+Moves

That's sound cheaper than what I implemented now, I'll try.

You said that you can encode a move in 16 bit, but as far as I know the minimum is 32 bit (how do you manage castle and en passants?). Omitting this, I made some test when i was working on the move generator and it came out that my move encoding (which is a class with 4 byte fields ) is faster than using a single 32 bit integer for representing it. Should I change encode method for saving some memory in the transposition table?

That's a very difficult question. A few ideas:
* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.

for trying point one I should first implement iterative deeping (when should i do that?).

I will try point 2 and 3 right now.

Thank you for your detailed answers.

P.S. I've just implemented MVV-LVA and the search is now MUCH faster! thank you.


I used to do that in my old engine (Ruy-López), but that requires you to generate all the moves before you even try the move from the transposition tables. Since this move very often produces a beta cutoff, it's worth writing a function that checks validity of a move on a given position and try to avoid generating the move list altogether.

I use a 16-bit format for the move, as described here: http://chessprogramming.wikispaces.com/Encoding+Moves


That's sound cheaper than what I implemented now, I'll try.
You said that you can encode a move in 16 bit, but as far as I know the minimum is 32 bit (how do you manage castle and en passants?).


Castling and en-passants are encoded in the 4-bit "move type" described in the page I linked to. That plus 6-bit "to" and "from" gets you to 16 bits.

Omitting this, I made some test when i was working on the move generator and it came out that my move encoding (which is a class with 4 byte fields ) is faster than using a single 32 bit integer for representing it. Should I change encode method for saving some memory in the transposition table?

No, you can always convert to and from the 16-bit format just for transposition-table access. That shouldn't be too slow.


That's a very difficult question. A few ideas:
* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.


for trying point one I should first implement iterative deeping (when should i do that?).


Now would be a good time. It's basically impossible to implement time control without it, and it's very hard to work with an engine that doesn't control time properly.

P.S. I've just implemented MVV-LVA and the search is now MUCH faster! thank you.

Yup, that one is a biggie.
Advertisement

Now would be a good time. It's basically impossible to implement time control without it, and it's very hard to work with an engine that doesn't control time properly.

I've just implemented a sort of iterative deeping (without time control, just for testing the engine) and started checking correctness of the transposition table.

I have found a few bugs and the engine seems to work properly now:

* You can run a search up to depth D (I assume you are using iterative deepening) and then rerun it without clearing the transposition tables: It should be almost instantaneous.
* You can run a long search on a position with and without transposition tables, and the version with transposition tables should generally get to the same depth faster.
* You can match your program against itself, with and without transposition tables, at a fixed depth, and the results should be even.

point 1 and 2 are ok, but point at point 3 i noticed that then engine doesn't produce the same final state after a "self-play" with and without transposition table.

I also tried to build a perft routine with the transposition table and, probably for some kind of collision error, i got, at depth 6, a node count of 119060316 instead of 119060324 (that i can obtain if I reduce the table size...).

I think zobrist keys are ok, so what's going on?

The last thing I have doubts about is principal variation: I'm triyng to improve move ordering and it seems that principal variation moves are the most important moves to try during alpha-beta search, but, with transposition table implemented, aren't them already the first moves tried (best moves)?

Thanks.

You need to debug the difference in perft with and without transposition tables. Can you post the definition of the type of your TT entries?

You need to debug the difference in perft with and without transposition tables. Can you post the definition of the type of your TT entries?


namespace Napoleon
{
    enum ScoreType { Exact, Alpha, Beta };

    class HashEntry
    {
    public:
        ZobristKey Hash; // unsigned long long
        Byte Depth; // unsigned char
        Byte Bound; // unsigned char
        Move BestMove; // 32 bit integer wrapper
        int Score;

        HashEntry();
        HashEntry(ZobristKey, Byte, int, Move, Byte);
    };
}

The data structure seems correct. The only comment I have is that if you switch to using a 16-bit representation for the move, the size of a hash entry will go from 24 bytes to 16 bytes (this depends on the compiler, of course).

This topic is closed to new replies.

Advertisement