I'm currently developing a solver for a trick-taking card game called Skat in a perfect-information situation. Although most people may not know the game, please bear with me; my problem is of a general nature.
Short introduction to Skat:
Basically, each player plays one card in turn, and three cards form a trick. Every card has a specific point value. A player's score is the sum of the values of all cards contained in the tricks that player has won. I've left out details that are unimportant for my problem, e.g. who plays against whom or how a trick is won.
What we should keep in mind is that there is a running score, and that when investigating a certain position, who played what before (i.e. its history) is relevant to that score.
So that everyone can better imagine how the score develops throughout a Skat game until all cards have been played, here's an example. The course of the game is shown in the lower table, one trick per line. The running score after each trick is on its left side, where +X is the declarer's score (-Y is the defending team's score, which is irrelevant for alpha-beta). As I said, the winner of a trick (declarer or defending team) adds the value of each card in that trick to their score.
The card values are:
Card:  J   A   10  K   Q   9   8   7
Value: 2   11  10  4   3   0   0   0
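To make the scoring concrete, here's a tiny standalone sketch of the value table and how a trick's worth is computed (the class and method names are made up for this post, not from my solver):

```java
import java.util.Map;

public class CardValues {
    // J=2, A=11, 10=10, K=4, Q=3, 9/8/7=0; over four suits the
    // whole deck is worth 4 * 30 = 120 points
    static final Map<String, Integer> VALUE = Map.of(
        "J", 2, "A", 11, "10", 10, "K", 4, "Q", 3,
        "9", 0, "8", 0, "7", 0);

    // Sum of the point values of the three cards in a trick
    static int trickValue(String... cards) {
        int sum = 0;
        for (String c : cards) sum += VALUE.get(c);
        return sum;
    }

    public static void main(String[] args) {
        // e.g. a trick of A, 10, K is worth 11 + 10 + 4 = 25 points
        System.out.println(trickValue("A", "10", "K")); // prints 25
    }
}
```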
If you have questions about Skat/its rules nonetheless, I would love to elaborate.
I have written an alpha beta algorithm in Java which seems to work fine, but it's way too slow. The first enhancement that seems the most promising is the use of a transposition table (TT). I read that when searching the tree of a Skat game, one will encounter a lot of positions that have already been investigated.
In theory, I figured, when I find a position that has already been investigated, I need to substitute the "old" position's running score with the score of the current position. In other words: when I store a position in the TT, I take the optimal value that alpha-beta returns, subtract the node's running score from it, and associate the difference with the position. To my understanding, this difference represents the maximum points that the declarer can still achieve in the subtree rooted at the current position. Analogously, when I find a previously investigated position, I return the sum of the stored value and the current position's running score.
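A minimal numeric sketch of that normalization, with made-up method names (not my actual code):

```java
public class TTNormalization {
    // On store: value relative to the subtree root
    static int toRelative(int absoluteValue, int runningScore) {
        return absoluteValue - runningScore;
    }

    // On lookup: re-anchor the stored value to the current history
    static int toAbsolute(int storedRelative, int runningScore) {
        return storedRelative + runningScore;
    }

    public static void main(String[] args) {
        // Position first reached with a running score of 30,
        // and alpha-beta returns an optimal value of 61:
        int stored = toRelative(61, 30); // 31 points achievable in the subtree
        // Same position reached later via a different history, score 25:
        int looked = toAbsolute(stored, 25); // 25 + 31 = 56
        System.out.println(stored + " " + looked); // prints 31 56
    }
}
```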
Enough with the theory, here's how I implemented it in Java:
The storing routine of a node:
// "bestVal" is alpha or beta, depending on the player
// "tranpo" is a HashMap<Integer, int[]>
int val = bestVal - node.getScore();
int hash = logic.hashNode(node);
// "TT_VALID" is a flag constant
transpo.put(hash, new int[]{TT_VALID, val});
And the lookup:
int hash = logic.hashNode(node);
int[] sameState = transpo.get(hash);
if (sameState != null) {
    int val = node.getScore() + sameState[1];
    if (sameState[0] == TT_VALID) {
        return val;
    }
}
I'm only storing "VALID" nodes for now. Those are the nodes that haven't been pruned by alpha/beta.
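For reference, the descriptions of alpha-beta transposition tables I've read also tag entries with a bound flag, since a value computed inside a narrowed (alpha, beta) window is only a lower or upper bound rather than exact. Here's a rough sketch of what I understand that to look like (the flag names, method names, and entry layout are invented for this post, not my current code):

```java
import java.util.HashMap;

public class BoundedTT {
    static final int EXACT = 0, LOWER = 1, UPPER = 2;
    final HashMap<Integer, int[]> table = new HashMap<>();

    // Store the subtree value relative to the node's running score,
    // together with a flag saying how trustworthy it is
    void store(int hash, int relativeValue, int flag) {
        table.put(hash, new int[]{flag, relativeValue});
    }

    // Returns a usable absolute value, or null if the stored entry
    // doesn't permit a cutoff in the current (alpha, beta) window
    Integer probe(int hash, int runningScore, int alpha, int beta) {
        int[] e = table.get(hash);
        if (e == null) return null;
        int val = runningScore + e[1];
        if (e[0] == EXACT) return val;
        if (e[0] == LOWER && val >= beta) return val;  // fail-high cutoff
        if (e[0] == UPPER && val <= alpha) return val; // fail-low cutoff
        return null;
    }

    public static void main(String[] args) {
        BoundedTT tt = new BoundedTT();
        tt.store(42, 31, EXACT); // position worth 31 points from its root
        System.out.println(tt.probe(42, 25, 0, 120)); // prints 56
    }
}
```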
The problem is that, with the TT enabled, it doesn't return the same (correct) results as the plain alpha-beta algorithm. The moves it suggests are clearly suboptimal, though not outright "absurd".
I've been debugging this for a while without success. The hashing function seems to work correctly (I tested it by comparing allegedly identical nodes via their human-readable representations). More importantly, I don't suspect any error in the alpha-beta algorithm itself, since it works perfectly without the TT. If there's a reason my assumptions could be wrong, please let me know and I'll post more code.
Assuming my reasoning is correct, the problem has to be in the storing or lookup routine. What am I missing? Is my whole approach logically flawed? Or is it something technical I've overlooked?
Since I don't expect a magical solution to appear, I'm mainly asking how I would debug this. The algorithm makes millions of recursive calls, so I can't just step through it line by line and sit there with my calculator until I find the mysterious error. Where should I start debugging? Maybe some kind of back-to-back test against the working alpha-beta function? Or what inconsistencies could I look for?
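Here's the kind of back-to-back harness I have in mind: search the same positions once without and once with the TT, and stop at the first mismatch so I have a small failing case to inspect (the `Searcher` interface and the toy lambdas are placeholders for my real search methods):

```java
public class BackToBack {
    // Placeholder for a solver entry point that searches the position
    // derived from a seed and returns the declarer's optimal score
    interface Searcher { int search(long positionSeed); }

    // Returns the seed of the first position where the two searchers
    // disagree, or -1 if all n positions match
    static long firstMismatch(Searcher plain, Searcher withTT, int n) {
        for (long seed = 0; seed < n; seed++) {
            if (plain.search(seed) != withTT.search(seed)) return seed;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Toy stand-ins: pretend the TT version is off by one for seed 3
        Searcher plain = s -> (int) (s % 5);
        Searcher buggy = s -> (int) (s % 5) + (s == 3 ? 1 : 0);
        System.out.println(firstMismatch(plain, buggy, 10)); // prints 3
    }
}
```

Once a mismatching seed is found, I could re-run only that position and log every TT store/probe to see where the two searches diverge.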
References:
There are almost no English sources out there that deal with AI for Skat, but I found this one: "A Skat Player Based on Monte Carlo Simulation" by Kupferschmid and Helmert. Unfortunately, the whole paper, and especially the section on transposition tables, is rather compact.
Again, if there's something you would like to have clarified, don't hesitate to ask!