Quote: Original post by the_absent_king
How many playouts will I need to run / how many playouts per possible move? Is there any way of estimating the order of playouts without testing? I hear 500+ playouts is good for Connect Four - I have a much larger search space.
I wouldn't venture to make a guess without testing. You can think about it this way: With 0 playouts your program will pick random moves. With infinitely many playouts (and infinite RAM), it will play perfectly. Whether anything in between is acceptably strong is a matter of measuring it.
Quote:
How can I keep the AI strong on weak hardware without running up the amount of time the AI has to think each turn? By our release date, Unity3d will publish to Windows/Mac desktop, iPhone/iPad/iPod, Android, Wii, and PS3. The iPhone processor is something like 400MHz.
Well, if you find out that you need too many playouts to get a decent level of play, this will be an issue. If you are targeting the iPhone, perhaps you should consider a more immediate utility-based scheme of the type that Dave described earlier. I'll be happy to try to explain it differently if you didn't understand his description.
Quote:
It is possible that for some playout strategies the game state would reach equilibrium, where neither side is able to win. Since most games would last no more than N turns, is it reasonable to call a playout a loss after N*M turns, where M>1?
It's OK to call it a draw too. It's not too hard to adapt MCTS to accept draws.
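For what it's worth, one common way to make MCTS accept draws is to score results on a 0/0.5/1 scale instead of loss/win, so a draw backpropagates symmetrically. A minimal sketch (the `Node`/`backpropagate` names are illustrative, not from any particular library):

```python
# Sketch: backpropagating playout results where a draw counts as half a win.
RESULT_LOSS, RESULT_DRAW, RESULT_WIN = 0.0, 0.5, 1.0

class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.value = 0.0  # accumulated result from this node's perspective

def backpropagate(node, result):
    """Walk from a leaf to the root, flipping the result each ply.

    A draw (0.5) is symmetric, so it stays 0.5 for both players."""
    while node is not None:
        node.visits += 1
        node.value += result
        result = 1.0 - result  # opponent's perspective; 0.5 stays 0.5
        node = node.parent

root = Node()
child = Node(parent=root)
backpropagate(child, RESULT_DRAW)
backpropagate(child, RESULT_WIN)
```

The nice property is that nothing else in the tree policy has to change; a draw just looks like a half-win to both sides.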
Quote:
What if no playouts return victory? Pick a random move, or have a default one? (this question is mostly just me pondering)
Well, if you have some idea of what makes a "better loss" (longer game, more dignified or whatever), you can use a continuum of results for the playouts.
The thing is, in Connect 4, most moves are either good or bad. In a game like yours, there will be a lot of moves that are completely meaningless, so I'm curious how you're going to deal with that. Are you doing the normal MCTS with truly random moves, or do you plan to implement some kind of move-ordering? I can imagine wasting a great deal of playthroughs just moving characters around aimlessly.
Quote: Original post by lmelior
The thing is, in Connect 4, most moves are either good or bad. In a game like yours, there will be a lot of moves that are completely meaningless, so I'm curious how you're going to deal with that. Are you doing the normal MCTS with truly random moves, or do you plan to implement some kind of move-ordering? I can imagine wasting a great deal of playthroughs just moving characters around aimlessly.
I am sure it's not that hard to come up with a policy that is the practical equivalent of random moves in connect-4 or go. For instance, consider only moves that consist of moving near a possible target for an action and then performing the action, and pick one of those at random. If that doesn't result in short enough playouts, favor attacking actions.
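That filtering idea could be sketched roughly like this (all names — `candidate_turns`, `pick_turn`, the Manhattan-distance range check — are placeholders for the game's own queries, not anything from the original posts):

```python
import random

# Sketch of the "move near a target, then act" playout policy: instead of any
# random (move, action) pair, only consider destinations that put the actor in
# range of some target, and pick uniformly among those.

def candidate_turns(reachable, targets, attack_range):
    """Return (destination, target) pairs where the target is attackable."""
    out = []
    for dest in reachable:
        for t in targets:
            # Toy range rule: Manhattan distance on a grid.
            if abs(dest[0] - t[0]) + abs(dest[1] - t[1]) <= attack_range:
                out.append((dest, t))
    return out

def pick_turn(reachable, targets, attack_range, rng=random):
    moves = candidate_turns(reachable, targets, attack_range)
    if not moves:  # nothing useful in range: fall back to a random move
        return (rng.choice(list(reachable)), None)
    return rng.choice(moves)
```

This keeps the move distribution uniform over *useful* turns, which is the practical analogue of uniform-random moves in Connect 4.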
I've been trying to figure out how to maximize the speed of the playout. I'm prone to making things too complicated, so can you please provide a sanity check?
Is it a good idea to consider running the first N ply of a playout using a "slow" strategy, then reverting to a simpler, faster strategy for the remainder? I probably won't need to do that here, just curious.
---
I think that, of the classical pathfinding approaches, Dijkstra's may make the most sense, given the size of the graph (the largest map in the game I'm all-but-cloning is 14*13=182 nodes) and the regularity of obstructions. However, the playout needs speed, and running Dijkstra's is not as fast as a lookup table...
My favored implementation is a three-dimensional matrix. The first dimension is the "from" node, and the second dimension is the "to" node (there would need to be a separate vector of pointers tying every index in this lookup table to a node). The third dimension is the actor's "jump" score (a measurement of how well you traverse vertically). This dimension ranges from 0 to 10, where 0..9 are possible jump values and 10 is the special case where an actor has the ignore-height flag.
Therefore, the value at [I][J][K] is the minimum number of squares an actor with a jump of K must move to reach node J from node I. A value of -1 is the special case where no path is possible.
This matrix would be algorithmically built during development for each map (~50 maps intended for launch).
Assuming each element is a 1-byte integer, and the array takes no space beyond the sum of the sizes of its elements, the array will take N*N*11 bytes. For our "worst case" of N=182, that is 182*182*11=364,364 bytes that will need to be held in memory. Realistically, there is the chance of bridges etc. that allow two nodes to be stacked atop each other. For N=200, N*N*11 = 440,000 bytes.
The iPhone has 128 megabytes of available RAM for apps (correct?), so that is < 0.5% of total allowable size. This seems reasonable to me.
Note that this approach does not tell you what the path is, merely the length of the path. The path itself is needed for animation, but not for playout. Path can be generated *after* deciding where to move, via Dijkstra's algorithm.
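The table-building step described above could be done with one breadth-first search per (source, jump) pair. A toy sketch, where `heights` and the step rule stand in for the real map's traversal rules (all of this is an assumption about how your height/jump check works):

```python
from collections import deque

# Sketch of precomputing the [from][to][jump] cost table. One BFS per
# (source, jump) fills a whole row; jump == 10 models the ignore-height flag.
UNREACHABLE = -1

def build_table(n_nodes, neighbors, heights, max_jump=10):
    # table[i][j][k] = min steps from i to j with jump k
    table = [[[UNREACHABLE] * (max_jump + 1) for _ in range(n_nodes)]
             for _ in range(n_nodes)]
    for jump in range(max_jump + 1):
        for src in range(n_nodes):
            dist = {src: 0}
            q = deque([src])
            while q:
                a = q.popleft()
                for b in neighbors[a]:
                    climb = abs(heights[b] - heights[a])
                    if jump < max_jump and climb > jump:
                        continue  # can't make that vertical step
                    if b not in dist:
                        dist[b] = dist[a] + 1
                        q.append(b)
            for j, d in dist.items():
                table[src][j][jump] = d
    return table

# Toy map: three tiles in a line, the middle one 5 units higher.
table = build_table(3, [[1], [0, 2], [1]], [0, 5, 0])
```

Since this runs offline during development, per the post, its speed barely matters; only the table's memory footprint does.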
Also, there would be another two two-dimensional arrays, both Boolean: the first tells you whether there is line-of-sight between a pair of nodes, and the second whether the pair threaten each other with parabolic projectiles.
---
I have to go back to work now, but I am almost done writing up my proposed implementation of MCTS for a similar sanity-check from the community. Thanks, you guys are really making my life a lot simpler.
This sounds OK as a first attempt. Perhaps you can use bitboards to determine what moves are available. A bitboard is a compact representation of a subset of the board (i.e., one bit per square). std::vector<bool> in C++ might do the job. Or you can roll your own structure using as many machine words as you need to cover the board size.
You can have a structure with one bitboard per origin square and character type (or radius-jump combination). Presumably there aren't that many character types on the board in a given game, and this structure will use about N*N bits per character type. You can easily compute it at the beginning of the game.
You can also have a similar structure to find out which squares a character can be attacked from (with whatever parameters of the attack apply here).
When trying to generate a random turn for an actor, you can pick an attack, determine all the places from which it can be applied by using bitwise OR on the maps for each enemy actor. Then you can use bitwise AND between that bitmap and the places that the actor can get to. This will identify the places from which that attack can be used. If it's empty, pick a different attack.
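That OR-then-AND scheme might look like this in code (here each bitboard is just an arbitrary-precision integer with one bit per square; the `attack_from` maps and `reachable` set are assumed to be precomputed as described above):

```python
# Sketch of the bitboard scheme: each board subset is one integer with a bit
# per square (index = row * width + col).

def square_bit(row, col, width):
    return 1 << (row * width + col)

def usable_squares(attack_from_maps, reachable):
    """Squares from which some enemy can be hit AND the actor can reach."""
    attackable = 0
    for m in attack_from_maps:
        attackable |= m            # OR over every enemy's attack-from map
    return attackable & reachable  # AND with where the actor can stand

width = 4
enemy_a = square_bit(0, 1, width) | square_bit(0, 2, width)
enemy_b = square_bit(1, 1, width)
reachable = square_bit(0, 2, width) | square_bit(3, 3, width)
spots = usable_squares([enemy_a, enemy_b], reachable)
```

If `spots` is zero, the attack can't be used from anywhere reachable, and a different attack gets picked, exactly as described.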
Well, perhaps you can't use this scheme exactly, but hopefully you can extract some ideas out of it.
Quote: Original post by the_absent_king
Note that this approach does not tell you what the path is, merely the length of the path. The path itself is needed for animation, but not for playout. Path can be generated *after* deciding where to move, via Dijkstra's algorithm.
Let me point out that having costs is equivalent to having paths, and you don't need to run Dijkstra again. At every step, you just need to do the following: Look at all the nodes adjacent to where you are, and go to whichever one has the lowest cost relative to the goal. That's it: A greedy steepest-descent algorithm. All the "looking into the future" that Dijkstra does has already been summed up by the costs.
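The greedy walk is tiny in code. A sketch, where `cost[n]` plays the role of the precomputed distance-to-goal read out of the table (names illustrative):

```python
# Greedy steepest-descent walk over a precomputed cost-to-goal field:
# at each step, move to the adjacent node with the lowest remaining cost.

def greedy_path(start, goal, neighbors, cost):
    path = [start]
    node = start
    while node != goal:
        node = min(neighbors[node], key=lambda n: cost[n])
        path.append(node)
    return path

# Toy 4-node line graph: 0 - 1 - 2 - 3, with goal = 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cost = {0: 3, 1: 2, 2: 1, 3: 0}  # distance-to-goal, as read from the table
path = greedy_path(0, 3, neighbors, cost)
```

Note this assumes the cost field has no unreachable (-1) entries along the way; those would need to be excluded from the `min`.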
Quite right, Emergent, and that's what I intended to do. I called it Dijkstra's due to an artifact of how I had originally written that post (I am usually very verbose; I have to trim my soliloquies to a reasonable size before posting).
I probably won't be using bitboards. Unity3d (my engine) uses Mono, and there are three different 'dialects' (languages) you can use to get there - javascript, C#, and Boo. (Note: the "javascript" is compiled.) Also, I dislike C++.

What InnocuousFox mentioned sounded like some type of graph traversal which the textbook in front of me doesn't have a name for, but I remember encountering it before. I believe I get the gist of it, but a translation would be welcome. I am, however, a little reluctant to implement a rule-based system, since my game has a lot of emergent behavior (and I cringe to use buzz-words, even correct ones). Plus, there are a lot of different types of actions, and I think I'm getting a grip on the specifics of MCTS.

The rest is mostly just thinking aloud in a place where someone might be able to correct me, just so I can convince myself that I know how to do all of this.

---

The first thing I will need to do is develop a playout strategy. Each skill in the game (excluding the special-case, always-available "do nothing" skill) is given a simple formula to determine its 'usefulness score' for that actor. For simple, single-target damage skills this value is their expected average damage. Hit percentile, AOE, charge time, range, and mana cost all further adjust a skill's score up or down. This is because a randomly generated actor may have low mana but high-mana-cost skills, which will need to be used sparingly.

Skills which don't deal damage (buffs, debuffs) will have to be evaluated in some other way.

At game start a vector of length N+1 is generated, where index N corresponds to skill N and index 0 is the special case "do nothing". Each element is set to that skill's 'usefulness score'. Index 0 is set to, say, 10-20% of the magnitude of that vector. A second vector, a probability vector (http://en.wikipedia.org/wiki/Probability_vector), is created by normalizing the first one.
This is done instead of overwriting the first vector, in case the probability vector needs to be recalculated. That can happen because of debuffs, like the actor having its weapon broken (which massively changes the usefulness of certain skills). Every function that changes a stat which could affect the probability vector will cause it to be recalculated before returning.

---

Next I need to determine how the simulation is run. Nominally, on any given turn a unit may move and act, in either order. Both have the special case of "do nothing". At the end of every turn, even if the actor is unable to act or move, the actor may choose a new (cardinal) direction to face. (Attacking from the side or behind is often more successful.) Because doing nothing in either or both of the move and act parts of your turn makes your next turn come around sooner, sometimes the best strategy is to do nothing.

I've decided to handle it like so internally: at simulation start, a vector of all nodes it is possible to move to with the current Move/Jump score is created. The creation of this vector is unaffected by the "Don't Move" debuff, in case the actor is able to cure it using its action, then retreat.

Before a playout can begin, a turn must be chosen. A vector of 5 elements represents the actor's turn, conceptualized as Move1-Act-ActTarget-Move2-Face. Each element is chosen pseudo-randomly. There are two arrays of equal length, with

If the "Don't Move" debuff is present, then Move1 is set to 0 ("do nothing"). Otherwise, with 50% chance, Move1 is set to 0 (it is far more often a good strategy to stay still and act than it is to move without acting). The remainder of the time, a node to move to is randomly selected from the vector of legal moves.

Next we have to decide what action to take. An action is selected with probability dependent upon the actor's playout strategy (the previously mentioned probability vector).
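The score-to-probability step and the weighted action roll could be sketched like this. The 15% "do nothing" weight and taking the sum as the "magnitude" are my assumptions (the post says 10-20% of the vector's magnitude, which could equally mean the Euclidean norm); the usefulness scores themselves would come from the per-skill formulas:

```python
import random

# Sketch of the playout-strategy vector: index 0 is "do nothing", set to a
# fraction of the total skill score, then the whole vector is normalized.

def build_probability_vector(skill_scores, idle_fraction=0.15):
    scores = [idle_fraction * sum(skill_scores)] + list(skill_scores)
    total = sum(scores)
    return [s / total for s in scores]

def choose_action(prob_vector, rng=random):
    """Sample an index (0 = do nothing) according to the probability vector."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(prob_vector):
        acc += p
        if r < acc:
            return i
    return len(prob_vector) - 1  # guard against floating-point round-off

probs = build_probability_vector([10.0, 30.0, 60.0])
rng = random.Random(1)
counts = [0] * len(probs)
for _ in range(10000):
    counts[choose_action(probs, rng)] += 1
```

Recalculating after a debuff is then just calling `build_probability_vector` again on the adjusted scores.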
A target for this action must be decided. There are two types of skills: instant, and those with a charge time.

If the skill is instant, or has a charge time that will resolve before the next actor's turn, a vector of all legal targets is created and put in random order. The vector is then traversed in order, and the first target node with a chance of hitting anyone is used. (This prevents running playouts where you attack nothing.)

If the skill is charged, and its charge time will resolve only after at least one other actor's turn has passed, then a random node is chosen as the target.

If Move1 is non-zero, Move2 is set to 0. If the action is a charged skill flagged "Unable to move while charging", then Move2 is 0. If Move1 is 0, there is a 50% chance that Move2 is 0. Otherwise, Move2 is randomly selected from the vector of legal moves. During resolution of the turn, if the "Don't Move" debuff is present when evaluating Move2, then Move2 is set to 0. (This allows an actor to move in the same turn as removing the "Don't Move" debuff from itself.)

The cardinal direction to face is simply chosen at random from the 4 possible options.

Playout can now begin. Each non-dead actor uses their playout strategy blindly, and this repeats until either there is an excess of turns (a draw) or all of the actors on one side are dead. If the actor running the simulation has living team members, this is backpropagated as a win; otherwise, as a loss.

"Zero-ply" turns continue to be generated in this way, and their playouts evaluated, until X time has passed and at least N playouts have been performed.

The turn with the best win:loss ratio is considered the winner, as long as it has at least M playouts to its name.

If the amount of time the simulation has been running exceeds a certain threshold Y (where Y > X), then the best turn found so far is chosen regardless.

---

When it is time to resolve the chosen turn in the "real" game, a movement path is built using the greedy-descent method Emergent pointed out. There are actually slight changes to this, due to how actors can jump across small gaps (i.e., the next node might not always be adjacent) and the presence of varying depths of water, but I feel confident in how I plan to approach those problems. Animation is done based on the path generated by this approach.

The action (with ActTarget) is evaluated, either hits or does not, and is animated accordingly. Move2 is evaluated as Move1 was, and is in turn animated. The direction to face is chosen. ActiveTurn passes to the next actor.

I can adjust the AI by changing how the probability vector for playouts is created, changing the threshold of playouts required, taking draws into consideration, and further culling pre-playout.

For some reason, I am only half sure this will work. It feels almost like I am cheating (having an AI that is based around "elegant brute force").

---

EDIT: One more thing. Because of the freakin' huge search space, I was considering that turns after, say, ply 50 are evaluated using an easier/faster approach than the relatively in-depth method I described for playout.

After this point, all units are considered to be within range of each other. Standard AOEs have a 50% chance of hitting a second target, while the largest AOEs have a further 50% chance of hitting a third target.
All actors are considered to have unlimited mana, and always choose their most useful action (the highest score in the probability vector).

This should result in a more rapid collapse to a win/loss state, and any inaccuracies introduced should be mitigated by the law of large numbers, and by the fact that the tactical advantages of a ply-0 move are effectively impossible to evaluate that far in the future.

Basically, after a certain point in the playout, a tactical representation of the board is eschewed for an idealized, strategic approach.

My only concern is that because the enemy team will typically be point-for-point stronger than the player team (larger number of actors, higher level, better gear, etc.), the super-fast approach that deep in the playout will almost always return a victory for the AI team, regardless of the actual 'worthiness' of the playout's ply-0 move.

Would it also be acceptable to, instead of running a playout until a "true" win/loss state is reached, run all playouts for 100 turns, then determine the "win/loss" state based on the number of actors alive per team and their percentile health?

Sorry for what must seem like silly questions (the back of my brain is saying "of course that will work, if you set the threshold deep enough") but I'm cautious of the whole approach for reasons I stated earlier. Thank you again for the help.

[Edited by - the_absent_king on June 20, 2010 2:43:58 PM]
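The fixed-horizon scoring asked about above could look like the following sketch. The particular weighting (survivor count plus summed health fraction) is an assumption for illustration, not anything decided in the thread:

```python
# Sketch: after the 100-turn cutoff, score the position from one team's point
# of view using surviving actors and their remaining health, instead of
# playing on to a true win/loss.

def cutoff_result(my_team, enemy_team):
    """Each team is a list of (current_hp, max_hp); dead actors have hp 0.
    Returns 1.0 (win), 0.0 (loss), or 0.5 (dead heat)."""
    def strength(team):
        alive = [hp / max_hp for hp, max_hp in team if hp > 0]
        return len(alive) + sum(alive)  # count survivors, weight by health %
    mine, theirs = strength(my_team), strength(enemy_team)
    if mine > theirs:
        return 1.0
    if mine < theirs:
        return 0.0
    return 0.5

result = cutoff_result([(50, 100), (100, 100)], [(10, 100)])
```

A graded score like this also plays nicely with the "continuum of results" idea mentioned earlier in the thread: instead of the hard 1/0/0.5, the strength difference itself could be squashed into [0, 1] and backpropagated.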
I think at this point you'll have to implement the basic idea and experiment with the different options that you are considering. You should find some method to evaluate your progress, like playing 100 matches against a fixed opponent using a script.
I just wanted to mention that using bitmaps doesn't mean you have to implement them in C++. All it needs is a compact structure that uses one bit per square to represent some property on the board. Finding the squares that satisfy both condition A and condition B can be done in a few cycles on a 64-bit machine (all it takes is 4 bitwise AND operations).
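To make the word-count arithmetic concrete: a board of up to 256 squares fits in four 64-bit words, so testing "condition A AND condition B" over the whole board is four word-sized ANDs. A language-neutral sketch (shown in Python, but the same works with `ulong[]` in C#):

```python
# A ~200-square board as four 64-bit words; AND-ing two board properties is
# four word operations, regardless of language.
WORDS = 4  # 4 * 64 = 256 bits >= 200 squares

def board_and(a, b):
    return [x & y for x, y in zip(a, b)]

def any_set(board):
    return any(w != 0 for w in board)

def set_square(board, index):
    board[index // 64] |= 1 << (index % 64)

a = [0] * WORDS
b = [0] * WORDS
set_square(a, 70); set_square(a, 3)
set_square(b, 70)
both = board_and(a, b)  # squares satisfying both properties
```

The point is only the shape of the operation: one fixed-size array of machine words per board property, combined with bitwise operators.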
One more idea is to try to separate the GUI part of the program from the AI as much as possible. Ideally you would run them as separate processes, communicating through a pipe (although I don't think the iPhone would allow this). In chess there is a protocol called UCI which is used precisely in this manner. This means the same GUI can be used for many engines, you can match engines automatically, and you can connect your engine to an Internet server so people can play against it, and it all works because a standard protocol has been adopted.
Agreed on needing to implement. As mentioned, this was public pseudocode.
On separating the visual side of things (the AI is ignorant of any 'graphical user interface', so I assume you mean the visual representation of the game state): I'm not 100% sure of that, but I think what I'll do during the simulation/playouts is copy the relevant part of the game state elsewhere (1000 units away, with a culling layer set so it is invisible to the camera), and act on it freely.
Not sure about that, but I think asking the Unity community on how to do this is my best bet, since a number of the particulars of my implementation (avoided here) are specific to Unity.
I remember trying to optimize some code in Unity one time by including bit operations (XOR), and it didn't work. I believe NOT, AND, and OR were the only ones implemented, and they didn't show the speed increase they should've.
Given that knowledge, is a bitboard still generally superior to a Boolean matrix, or to storing such booleans in my nodes (which are Unity GameObjects)? It seems like a sufficiently long vector of booleans (or a bitstream), with rows given by Placement/Width and columns by Placement%Width, would be an easy way of making my own bitboards. Not sure I know enough of the internals to say that's true, though.
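For reference, the index mapping described there is just integer division and modulo, and it round-trips cleanly (a trivial sketch, language-agnostic):

```python
# Mapping a single square index ("Placement") to (row, col) and back,
# as described in the post: row = Placement / Width, col = Placement % Width.

def to_row_col(placement, width):
    return placement // width, placement % width

def to_placement(row, col, width):
    return row * width + col
```

The same mapping works whether the backing store is a bit-per-square integer, a boolean array, or flags on node objects, so it can be decided independently of the storage question.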
I am sure I will be back to ask for more help when it comes time to implement. Till then: I noticed that this board has only one page. Why? Are old posts culled? Will I need to back up this thread elsewhere?