The classic simple example of a cooperation problem in game theory is the Prisoner's Dilemma. That may give you a starting point.
Sorry I haven't weighed in, Kylotan... it's been a busy week. This sort of thing is right up my alley. Maybe later tonight.
Grouping actors based on competence and interest
Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play
"Reducing the world to mathematical equations!"
Quote: Original post by Kylotan
Quote: Original post by NotAYakk
Let's suppose we have the threshold game -- you need to meet the competence total, or else others can freeload and/or you are screwed success-wise.
Then for each task:
Sort all actors by competence. Sum up how many of the most-competent are needed to break the difficulty.
Divide (reward) / (# of actors needed). (I believe reward is fixed by task, not by actor, right?)
Do this for each task. Then, for each task, for each player, figure out if they prefer a different task.
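A minimal sketch of the per-task computation described above -- sort actors by competence, take the most competent until their summed competence breaks the task's difficulty, then split the fixed reward evenly. The names (Actor, Task, reward_per_actor) are illustrative, not from the thread:

```python
from dataclasses import dataclass

@dataclass
class Actor:
    name: str
    competence: float

@dataclass
class Task:
    name: str
    difficulty: float
    reward: float

def reward_per_actor(task, actors):
    """Return (team, share) for the smallest most-competent team whose
    summed competence meets the task's difficulty threshold, or
    (None, 0.0) if even everyone together falls short."""
    team, total = [], 0.0
    for actor in sorted(actors, key=lambda a: a.competence, reverse=True):
        team.append(actor)
        total += actor.competence
        if total >= task.difficulty:
            # reward is fixed per task and split evenly among the team
            return team, task.reward / len(team)
    return None, 0.0
```

For example, with actors of competence 5, 3 and 2 and a task of difficulty 7 paying 90, the two strongest suffice and each earns 45.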
Yep, my original plan was just like this, except instead of summing up the most competent, I summed up all people eligible for the task (ie. with some positive degree of competence).
What about the question -- do you have linear results? If so, then nobody ever wants someone less competent helping them at a task if they want to maximize their average reward.
Quote:
Quote: Now if you don't want to use currency, you can make conclusions about a subset of games at this point.
I think you've talked about 'using currency' a couple of times but I still have no idea what that really means in implementation terms. I can imagine some sort of hypothetical currency based on competence level and a reverse auction where actors bid on the tasks they want, something like easyBob0101's suggestion?
Either allow people to trade on the rewards produced by the tasks, or allow them to create a "favor" currency in which actors are convinced to do something they find less than optimal in exchange for a bunch of other actors getting a benefit from it, which is later used to have other actors do favors for this actor.
Ie: bittorrent clients have a goal (bytes downloaded) and a limited period currency (are you also uploading bytes at me?). The goal and the currency are distinct.
Because of exploit risks, it doesn't go further than this -- but imagine if whenever you uploaded to someone, they would say "thanks" and toss you upload currency.
This isn't exchanged for downloads, but rather to encourage people to download to you. And when you upload, you bias yourself towards uploading to people who bribe you with more currency.
Same thing here. When you want a task done with high rewards, you offer what currency you have to bribe competent people who don't need it as much to do it for you.
This doesn't work in a perfectly linear model, because more competent people never want the help of less competent people -- so someone who dominates everyone else will have no use for the currency...
The basic idea is there -- you allow the actors to pay others to help them, and later cash in on the payments to get others to help them.
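The favor-currency idea might be sketched minimally like this; the names (FavorBank, recruit) and the join rule are illustrative choices, not from the thread:

```python
class FavorBank:
    """Track favor tokens. Actors earn tokens by helping on tasks they
    would not otherwise pick, and later spend them to recruit help."""
    def __init__(self, actors, stake=0):
        self.balance = dict.fromkeys(actors, stake)

    def pay(self, payer, payee, amount):
        if self.balance[payer] < amount:
            raise ValueError(f"{payer} cannot afford {amount}")
        self.balance[payer] -= amount
        self.balance[payee] += amount

def recruit(bank, requester, helper, bribe, helper_reward):
    """Helper joins if the bribe plus the task's own reward beats doing
    nothing (threshold 0 here -- purely an illustrative decision rule)."""
    if bribe + helper_reward > 0 and bank.balance[requester] >= bribe:
        bank.pay(requester, helper, bribe)
        return True
    return False
```

Starting both actors with a stake of 10, a bribe of 4 moves the balances to 6 and 14, and the helper can later spend those tokens to recruit in turn.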
Quote:
Quote: Any task which has an ideal set of actors that ideally want to do that task can be committed to it.
As it happens, this is guaranteed to exist. The task with the highest return per actor is guaranteed to be the task that is best for each of its participants. In the case of a tie, pick at random.
Remove those actors and that task from the pot, rebuild the teams for the remaining tasks from the remaining actors, and repeat.
But what about the situation where someone's best task never draws enough interest for it to be viable. eg. Task 1 fills up nicely, but Task 2's estimate relied on those people who are now doing Task 1. So someone who picked Task 2 over Task 3 and Task 4 is now allocated to Task 2, but in fact they will now receive no reward because Task 2 cannot be done. Task 2 is only guaranteed to be the best for them if the estimate was correct, yes?
Then nobody wanted to do Task 2, because everyone sufficiently effective at doing Task 2 preferred some other task which gave them higher rewards.
I don't see the problem -- what do you want to happen when a task is both hard and doesn't reward the people who are needed to complete it as much as other tasks? Shouldn't it be neglected?
Quote:
Quote: So you can do all of this in O(T * A * ln A) steps, which should be fast enough. It does require that order of memory.
It's ok, I'm not concerned about that aspect of complexity. Just in knowing exactly how to implement the algorithm, and in guaranteeing that it produces useful results.
If you don't care about complexity, then just look at every possible combination of actors on every combination of tasks, and score them.
This takes time roughly exponential in the number of actors/tasks -- ie, basically forever. But it would allow you to solve the problem as perfectly as your game evaluation function allows.
(I'd advise against that: complexity of solution is very important, because you want it to be solved this century.)
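The exhaustive search being warned against might look like this; brute_force and the caller-supplied score function (the "game evaluation function") are illustrative names. Each actor is assigned a task or None (idle), so this enumerates (T+1)^A assignments:

```python
from itertools import product

def brute_force(actors, tasks, score):
    """Try every assignment of each actor to a task (or None = idle)
    and return the best-scoring one. O((T+1)^A) -- fine for toy sizes,
    hopeless beyond a handful of actors."""
    choices = list(tasks) + [None]
    best, best_score = None, float("-inf")
    for combo in product(choices, repeat=len(actors)):
        assignment = dict(zip(actors, combo))
        s = score(assignment)
        if s > best_score:
            best, best_score = assignment, s
    return best, best_score
```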
I think we're not quite understanding each other.
Quote: Original post by NotAYakk
What about the question -- do you have linear results? If so, then nobody ever wants someone less competent helping them at a task if they want to maximize their average reward.
If by linear you mean straight line linear, then no. There is an optimal value of actors that can be assigned to a given task.
Quote: Either allow people to trade on the rewards produced by the tasks, or allow them to create a "favor" currency in which actors are convinced to do something they find less than optimal in exchange for a bunch of other actors getting a benefit from it, which is later used to have other actors do favors for this actor. [..] The basic idea is there -- you allow the actors to pay others to help them, and later cash in on the payments to get others to help them.
This may well work, but from my point of view, it just moves the problem to 'how do the actors decide how to trade this currency'.
Quote:
Quote: But what about the situation where someone's best task never draws enough interest for it to be viable. eg. Task 1 fills up nicely, but Task 2's estimate relied on those people who are now doing Task 1. So someone who picked Task 2 over Task 3 and Task 4 is now allocated to Task 2, but in fact they will now receive no reward because Task 2 cannot be done. Task 2 is only guaranteed to be the best for them if the estimate was correct, yes?
Then nobody wanted to do Task 2, because everyone sufficiently effective at doing Task 2 preferred some other task which gave them higher rewards.
I don't see the problem -- what do you want to happen when a task is both hard and doesn't reward the people who are needed to complete it as much as other tasks? Shouldn't it be neglected?
The point is that your first sentence there is false. Let's say Bob prefers Task 2 because he is not competent enough for Task 1. However, his preference was based upon Alice also being a participant in Task 2. Once you remove Task 1 and Alice from the equation, on the second pass Task 2 is no longer of interest to Bob because, without Alice, it's now too hard. Yet someone did want Task 2 as their first choice.
This may not necessarily be a problem for my system, but I just wanted to highlight the discrepancy in your assumption.
Quote: (I'd advise against that: complexity of solution is very important, because you want it to be solved this century.)
All I was trying to say is that I don't particularly care about the optimal time/space solution. Just one that is practical to implement and will run in reasonable time. Given that there are probably loads of simple approximations that will yield something acceptable, I didn't want anybody to be worrying about the complexity of their suggestion.
The trick is I decided that actors want the task that they are best at.
Second, I decided that if a more competent actor wants to do a task, then a less competent actor has lower priority, because the rest of the actors would rather do the task with the more competent actor and get more reward (more competent = fewer actors needed = higher reward per actor).
So you first build the "dream team" for every single task. That's cheap. You aren't paying attention to "but what if they go somewhere else" at this point.
Then you find the task that has the best reward per actor.
No one on this task has another task that they would prefer doing, because this is the highest-reward task that could possibly be generated by the rules. So in all possible setups, none of these actors would prefer a different one.
So those actors run off and do their task. And then you eliminate them from the pool of potential actors to solve tasks, as well as the task, and find the next best possible rewarding task per actor.
(Note: if actors can bribe each other, then the above assumption doesn't hold.)
(Note: as you stated, the reward per task is fixed, and divided among the actors equally. This property is important to make the above simple algorithm work.)
Everyone who is assigned to a task thus finds the task that is the best of all possible task setups for them at that point, given that other actors have also picked tasks that they cannot be lured away from.
This can screw some actors with crappy tasks, but that is because they suck. :)
...
This places the agency with the actors, and they maximize their own rewards. If your goal was to maximize the total task reward produced, the problem probably becomes NP-hard.
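The procedure above can be sketched end to end in Python, assuming the threshold model from earlier in the thread. Tasks as (difficulty, reward) pairs and actors as (name, competence) pairs are illustrative representations:

```python
def dream_team(task, actors):
    """Smallest most-competent team whose summed competence meets the
    task's difficulty; None if even everyone together falls short.
    task = (difficulty, reward); actor = (name, competence)."""
    team, total = [], 0.0
    for actor in sorted(actors, key=lambda a: a[1], reverse=True):
        team.append(actor)
        total += actor[1]
        if total >= task[0]:
            return team
    return None

def allocate(tasks, actors):
    """Greedy allocation from the post above: repeatedly commit the
    dream team of whichever remaining task pays the most per actor,
    then remove that task and those actors and repeat."""
    tasks, actors = dict(tasks), list(actors)  # tasks: name -> (difficulty, reward)
    result = {}
    while tasks and actors:
        best_name, best_team, best_share = None, None, 0.0
        for name, task in tasks.items():
            team = dream_team(task, actors)
            if team:
                share = task[1] / len(team)
                if share > best_share:
                    best_name, best_team, best_share = name, team, share
        if best_name is None:
            break  # no remaining task can be completed
        result[best_name] = best_team
        for a in best_team:
            actors.remove(a)
        del tasks[best_name]
    return result
```

With toy numbers such as T1 = (8, 100) and T2 = (4, 30) over actors of competence 5, 4 and 3, T1 absorbs the two strongest actors and T2 then goes unstaffed -- which is exactly the situation debated in this thread.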
Ok...New idea to throw at ya...
Tasks will have a total reward and a total risk. From those two values, calculate a task competence value. This value will be used to select actors from the pool of available actors.
The actors themselves will have a competence value and a reward threshold. Let's just say those two values are random for now, or that the reward threshold is related to competence (as I don't know how you go about setting up your actors' values).
Start randomly selecting actors (or just try them one at a time) that match the competence level of the task; add some noise so it looks more random. Every actor that is added divides the total reward evenly. If the divided reward drops below an actor's reward threshold, that actor will not be able to join (he doesn't want to, you could say). You can stop at the first actor that doesn't join, or continue for some more iterations to see if you can fit a few more in.
You can, if you want, allow the actors that already joined the group to leave if the divided reward is below their reward threshold. This process should attempt to "pack" the task, lowering risk to individual actors, and putting the most actors on the task with the required competence and minimum reward threshold.
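This packing loop might be sketched as below; the tolerance band around the task's competence value and the stop-at-first-refusal rule are illustrative choices, not from the post:

```python
import random

def pack_task(total_reward, task_competence, pool, tolerance=1.0, seed=None):
    """Greedily pack a task with actors whose competence lies within
    `tolerance` of the task's competence value, stopping once the
    evenly split reward would drop below a candidate's reward
    threshold.  pool: list of (name, competence, reward_threshold)."""
    rng = random.Random(seed)
    candidates = [a for a in pool if abs(a[1] - task_competence) <= tolerance]
    rng.shuffle(candidates)  # the "noise so it looks more random"
    team = []
    for actor in candidates:
        share = total_reward / (len(team) + 1)
        if share < actor[2]:
            break  # this actor won't join for so small a cut
        team.append(actor)
    return team, (total_reward / len(team) if team else 0.0)
```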
-Chris
[Edited by - easyBob0101 on April 13, 2008 11:05:55 AM]
Quote: Original post by easyBob0101
Ok...New idea to throw at ya...
An idea very much in line with your proposal was put forward in the last iteration of this thread.
Quote: Original post by Kylotan
Unfortunately my understanding of game theory is minimal. Can you explain in layman's terms what makes this situation different from Rock/Paper/Scissors, which has no pure strategy that is a Nash equilibrium, only a mixed strategy - right?
Rock-paper-scissors is an intransitive zero-sum game: the situation of maximum payoff for one player coincides with the minimum for the other. As such, two players cannot agree on an equilibrium point that maximises their individual rewards.
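The no-pure-equilibrium claim for rock-paper-scissors can be verified mechanically: for each cell of the payoff matrix, check whether both players are best-responding. A small Python sketch:

```python
# Payoffs are for player 1; player 2 gets the negation (zero-sum).
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def payoff(m1, m2):
    if m1 == m2:
        return 0
    return 1 if BEATS[m1] == m2 else -1

def pure_nash_equilibria():
    """Cells where neither player can gain by deviating unilaterally."""
    eq = []
    for m1 in MOVES:
        for m2 in MOVES:
            p1_best = all(payoff(m1, m2) >= payoff(d, m2) for d in MOVES)
            p2_best = all(-payoff(m1, m2) >= -payoff(m1, d) for d in MOVES)
            if p1_best and p2_best:
                eq.append((m1, m2))
    return eq
```

Running `pure_nash_equilibria()` returns an empty list: in every cell, at least one player profits by switching.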
Quote: Original post by Kylotan
In this situation, since it's a non-repeated game, a mixed strategy isn't of any use, as I understand it.
The simple answer is that your game situation is not zero-sum. There is a strategy that each player could follow such that all players then receive a positive reward (or at least a positive expected reward).
Quote: Original post by Kylotan
And I would figure that adding damping to a permanently oscillating system would result in purely arbitrary results?
Not at all. By definition, adding (sufficient) damping to any oscillatory system will force it to a stable critical point in finite time; in other words, to a steady-state solution.
Doesn't that damping theorem depend on the function being continuous though? If the 2 (or 3) possible values are discrete then I can imagine a situation where one is essentially picked arbitrarily. Anyway, it's not all that important in this scenario. I am just interested in learning a bit more about the theory here.
I'm almost done implementing one of the methods, but before I get some sleep and continue tomorrow, I was wondering if there was a simple formula for implementing a degree of risk aversion in my actors? At the moment all my actors are risk neutral, with the expected value being p(success) * reward(success) - p(failure) * cost(failure). Halve the chance of success and their interest roughly halves. How can I add another parameter in here to set risk aversion, parameterised on the amount? The formulae given on Wikipedia didn't make much sense to me, sadly.
Quote: Original post by Kylotan
Doesn't that damping theorem depend on the function being continuous though?
No. Discrete systems can be damped, and you'll end up with the same scenario as in the continuous case: the system settles to a single state given sufficient time. In this problem, though, you don't actually have a discrete system, just a discrete-time system, as the expected payoff is a continuous function (at least in theory... in practice, finite-precision representations will give you a numerically discrete system).
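A toy illustration of the point: take a discrete-time update whose target is target(x) = 1 - x. The undamped update (alpha = 1) flips between two values forever, while a damped step (alpha < 1) contracts toward the fixed point x = 0.5. This is a made-up example map, not anything from the thread:

```python
def iterate(x, alpha, steps):
    """Move x a fraction alpha of the way toward target(x) = 1 - x.
    alpha = 1 is the undamped best-response (period-2 oscillation);
    alpha < 1 damps the system toward the fixed point x = 0.5."""
    for _ in range(steps):
        x += alpha * ((1 - x) - x)
    return x
```

With alpha = 1, starting at 0.2 gives 0.8, 0.2, 0.8, ... indefinitely; with alpha = 0.3, the same start converges to 0.5.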
Quote:
I'm almost done implementing one of the methods, but before I get some sleep and continue tomorrow, I was wondering if there was a simple formula for implementing a degree of risk aversion in my actors?
You'd need to look into how risk aversion is incorporated in investment strategies to get a good answer to this question. Off the top off my head I don't have an answer for you... but perhaps someone around here does. ;)
Cheers,
Timkin
Risk Aversion (wikipedia) does a lot of math, but then notes that the computations don't... mesh well with lab results. The example given is that a person who refuses a gamble that could lose $100 or win $110 with even chances will consistently refuse a gamble that could lose $1000 with a 50% chance to win any larger amount of money.
At least in this case, the old saying "a bird in the hand is worth two in the bush" seems to be an understatement. Perhaps the Nigerian prince scams, mail-in rebates, and proclamations of winning a bajillion dollars or a free Xbox have left their mark on estimations of likelihood.
--"I'm not at home right now, but" = lights on, but no ones home
It's quite likely that the existing models are too simple, but I think they do capture a useful proposition, which is that a guaranteed $50 is more desirable to many people than a 50% chance of $100. I could do with modelling that curve into my system so that, as p(success) rises from 0 to 1, the expected reward rises more slowly at first, making the actors prefer slightly larger groups for safety.
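One simple way to get that preference -- a sure $50 over a 50% shot at $100 -- is to score gambles by the expected value of a concave utility of reward, e.g. the power utility u(r) = r^gamma with 0 < gamma <= 1, dropped into the earlier expected-value formula. This is a common textbook form (a sketch, not anything prescribed in the thread):

```python
def attractiveness(p_success, reward, cost, gamma=0.5):
    """Expected utility of a gamble under the concave power utility
    u(r) = r ** gamma.  gamma = 1 recovers the risk-neutral expected
    value p(success)*reward - p(failure)*cost; smaller gamma means
    more risk aversion."""
    return p_success * reward ** gamma - (1 - p_success) * cost ** gamma
```

With gamma = 0.5, a sure 50 scores about 7.07 while a 50% shot at 100 scores 5, so the sure thing wins; gamma is the single risk-aversion parameter asked about above.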