
Decision theory, high risk scenarios, etc

Started by February 15, 2008 10:34 AM
30 comments, last by Kylotan 16 years, 9 months ago
Quote: Original post by Kylotan
...I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


How do you define 'risk' for your agents (quantify it please).

Quote: the duration of the task (which is how long they'll be unavailable for other tasks), and in the situation I envisaged, the attributes of the other members of the proposed group.


This clearly indicates that agents should be trying to maximise their expected future rewards (or minimise expected future losses) given the population of agents (at least in the ideal solution) and the set of tasks.

Okay... more info needed...

Can tasks run concurrently, or are all tasks sequentially ordered? If the latter, then each agent has the option to participate in each task. If the former, then an agent must always choose a schedule of tasks to participate in, such that this schedule maximises some quantity over this set of tasks. If you force them to choose a task at any given time they are free, and to evaluate their potential risk/reward based only on that task, you will not be able to make any assurances about the long-term viability of agents (nor encode this in their solutions). They need to be able to consider what they are giving up by accepting a given task in order to make rational decisions.

You should probably use a discounted future reward model of expected utility.
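That is, value a prospective schedule of tasks as the discounted sum of its rewards. A standard form (the notation here is mine, not anything specified above):

V = sum over t of (gamma^t * r_t),   where 0 < gamma < 1

where r_t is the reward expected t steps into the future and gamma controls how strongly the agent prefers near-term payoffs over distant ones.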

Fundamentally you still have one problem: each agent can only make a decision after all other agents have made a decision. You can get around this by asking agents to list their preferences for tasks. Once preferences have been given (and this might be random as a first assignment, or based on some agent attribute) each agent can assess the potential risk/reward of each task more accurately and re-order their preferences. You could iterate this and hope for a stable solution, or merely limit each agent to a finite number of changes they can make to their preference list.
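For what it's worth, a minimal Python sketch of that preference iteration. The score() method, the convergence test, and max_rounds are placeholders I've invented, not anything specified above:

import random

def assign_by_preference(agents, tasks, max_rounds=10):
    # First pass: random assignment, as suggested above.
    assignment = {task: [] for task in tasks}
    for agent in agents:
        assignment[random.choice(tasks)].append(agent)

    for _ in range(max_rounds):
        changed = False
        for agent in agents:
            current = next(t for t, members in assignment.items()
                           if agent in members)
            # Re-rank tasks now that tentative group sizes are known;
            # score() is a stand-in for the agent's risk/reward estimate.
            best = max(tasks, key=lambda t: agent.score(t, assignment))
            if best is not current:
                assignment[current].remove(agent)
                assignment[best].append(agent)
                changed = True
        if not changed:  # stable solution reached
            break
    return assignment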
chance of success: 1 - 1/2^numAgents
reward: prize / numAgents
expected reward: (1 - 1/2^numAgents) * (prize / numAgents)

As agents are added, the difference between Reward and Expected Reward diminishes. For example, using a prize value of 1:

Agents  Reward   Expected  Difference
1       1.0      0.5       0.5
2       0.5      0.375     0.125
3       0.3333   0.2917    0.0417
4       0.25     0.2344    0.0156
5       0.2      0.1938    0.0063

So it might make sense to have the agents choose whether to join based on the difference. Individual risk tolerances could be measured as the maximum difference that an agent is willing to accept.
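For reference, a few lines of Python reproduce the table above (prize = 1 and the 1/2 failure base are the values from the example):

prize = 1.0
print("Agents  Reward   Expected  Difference")
for n in range(1, 6):
    reward = prize / n                   # even split of the prize
    expected = (1 - 0.5 ** n) * reward   # weighted by chance of success
    print(f"{n:<7} {reward:<8.4f} {expected:<9.4f} {reward - expected:.4f}")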

[Edited by - AngleWyrm on February 19, 2008 8:28:07 PM]
--"I'm not at home right now, but" = lights on, but no ones home
Quote: Original post by Timkin
Quote: Original post by Kylotan
...I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


How do you define 'risk' for your agents (quantify it please).


Consider it the probability of dying during the course of that task, which is proportional to the probability of the group failing the task. I can't give you an exact quantity because I don't know it - that's part of my problem. I mentioned Pascal's Wager as an example of why it might be hard to quantify it.

Quote: Can tasks run concurrently, or are all tasks sequentially ordered? If the latter then each agent has the option to participate in each task. If the former, then an agent must always choose a schedule of tasks to participate in such that this schedule maximises some quantity over this set of tasks.


It's the former. At any given time, there can be a variety of tasks ongoing, each with several people assigned exclusively to that task.

Quote: If you force them to choose a task at any given time they are free, and to evaluate their potential risk/reward based only on that task, you will not be able to make any assurances about the long-term viability of agents (nor encode this in their solutions). They need to be able to consider what they are giving up by accepting a given task in order to make rational decisions.


It is possible to present them with all the current tasks on offer. They can also judge or rank their own suitability for/interest in them.

They're not forced to take a task whenever one is available for them; they can ignore tasks entirely, if they don't suit. I have to balance the game so that this doesn't happen too often.

Quote: Fundamentally you still have one problem: each agent can only make a decision after all other agents have made a decision.


I really must stress that it's not important for me to have each agent acting individually here. If a top-down system presents some sort of resolution that is considered likely to be accepted - eg. "People A, B, D, and G join Task 1" - and then those people get to accept or veto this, that's fine, providing I can come up with resolutions that have a decent chance of being accepted. I don't want any potential solutions to allocating people to groups to be held back by the notion of each agent needing to act totally individually.

Quote: Once preferences have been given (and this might be random as a first assignment, or based on some agent attribute) each agent can assess the potential risk/reward of each task more accurately and re-order their preferences. You could iterate this and hope for a stable solution, or merely limit each agent to a finite number of changes they can make to their preference list.


Hmm. Unless the reward levels differ significantly between tasks, I would expect the individual agents' preferences to spread them out fairly evenly. But I suppose it wouldn't take much deviation from an even spread for one or two tasks to become more attractive, and on subsequent iterations maybe that would draw others in.

However, I still don't have a criterion for deciding when a group is 'good enough' anyway, since I don't know where this risk/reward crossover is (or whether it even exists yet). I can't just make the best groups that are on offer - I have to be able to have people join no groups at all, if none meet the agents' criteria.
Quote: No, I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


I don't think that works. If an agent believes there is any chance of death, it may not want to be involved in any group of any size that participates in that activity.

You also assume that the "threat level" applies to agents on a group basis. That may be particular to your game. But have you considered cases where that may not be so? For instance, living on a major fault-line or in Tornado Alley. It doesn't matter how many people live there already, there is no protection in being part of a group when a tornado runs over you.
If every task has a base risk and reward, you know what the expected value is from each task. If all of your agents were super-smart with perfect information, they would only go on the tasks with the highest possible expected values. If we assume that isn't the case, then we have a fairly simple algorithm:

1. Give each agent an attribute (call it 'wisdom') valued between 0 and 1.

2. Add up all of the expected values from the tasks on offer.

3. Map each task to a range of the total based on its expected value with the lowest expected value at the start leading up to the highest at the end.

4. Multiply the agent's wisdom by the total to find which task it picks.

The tasks with bigger expectation values then get more agents. Wiser agents will go for tasks with bigger expectation values.
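In Python, steps 1-4 might look like this (the task names and expected values are invented for illustration):

def pick_task(wisdom, tasks):
    # tasks: list of (name, expected_value) pairs.
    # Map the tasks onto contiguous ranges of the total, lowest
    # expected value first, then land in a range via wisdom * total.
    ranked = sorted(tasks, key=lambda t: t[1])
    total = sum(ev for _, ev in ranked)
    target = wisdom * total
    cumulative = 0.0
    for name, ev in ranked:
        cumulative += ev
        if target <= cumulative:
            return name
    return ranked[-1][0]  # wisdom == 1.0 falls through to the best task

# e.g. pick_task(0.9, [("forage", 1.0), ("raid", 3.0)]) returns "raid"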

We don't need to worry about a change in numbers per task, because while the reward will go down, so will the risk - unless, of course, the relationship isn't linear. If the reward goes down disproportionately to the risk, then you can always rearrange afterwards, biasing toward smaller groups - or larger, in the reverse case.

You do need to build the risk of death into the expected value though, like Sneftel did. It's not really the same as Pascal's Wager, because an eternity in hell is a lot worse than death, which is coming to us in any case. A life without reward should be worthless to your agents, in which case rewards can always be valued against death.

Quote: Original post by leiavoia
Quote: No, I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


I don't think that works. If an agent believes there is any chance of death, it may not want to be involved in any group of any size that participates in that activity.


In real life, that's not an issue for normal people, who do things that carry a small risk of death all the time (eg. crossing the road). I will have more dangerous tasks, but braver characters. The agents will want to go on some tasks.

Quote: You also assume that the "threat level" applies to agents on a group basis. That may be particular to your game.


It does. (Mostly.)

Quote: But have you considered cases where that may not be so?


They aren't in my game. :)

Quote: Original post by Argus2
If every task has a base risk and reward, you know what the expected value is from each task. If all of your agents were super-smart with perfect information, they would only go on the tasks with the highest possible expected values. [...]
We don't need to worry about a change in numbers per task because while the reward will go down, so will the risk. Unless the relationship is not linear of course. If the reward goes down disproportionately to risk, then you can always rearrange afterwards, biasing toward smaller groups - or larger, in the reverse case.


I do need to worry about the change of numbers per task, because that's the entire problem! The number of people per task is precisely the thing I'm trying to set. Simply adding any interested person to the task means the groups could grow infinitely, and if there weren't many interested people, there's no guarantee the groups would be big enough to make it safe in the first place.

I don't think I'm making it very clear what I'm trying to do, because I keep seeing the same advice repeated, and it just doesn't apply to my problem.

I need to fill groups for certain tasks. The group needs to have 'enough' people to have made the task safe. The group needs to have 'few enough' people for the reward to be worthwhile. Simply mapping characters to tasks is trivial. The problem is mapping a decent quantity of them in each case, while giving the impression that they're making reasonable decisions about which tasks to do. Just to give some ball-park figures, I want 'enough' to be something like N>=3, and 'few enough' to be N<=12, but they will vary from task to task depending on how dangerous they are.

If I just add people who are interested in a task, then where do I stop? Assuming I had a large supply of eligible people, N would end up being 50 or 100, unless I add some arbitrary limit, at which point N would always be that limit. I need some sort of curve where N naturally limits itself somewhere between 5 and 15 depending on the task, not on me deciding to clamp it.

Quote: You do need to build the risk of death into the expected value though, like Sneftel did. It's not really the same as Pascal's Wager, because an eternity in hell is a lot worse than death, which is coming to us in any case. A life without reward should be worthless to your agents, in which case rewards can always be valued against death.


Yes, I always planned on building the risk of death in. I was just unsure how to value it. At the moment I think a large fixed cost for death, plus a smaller linear cost for task duration, should be enough.
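As a sketch of that valuation in Python (the constants are invented, not tuned):

def expected_utility(p_success, reward, duration,
                     death_cost=100.0, time_cost=1.0):
    # Reward weighted by the chance of succeeding (and surviving),
    # minus a large fixed penalty for death on failure and a linear
    # cost for the time the agent is unavailable for other tasks.
    p_fail = 1.0 - p_success
    return p_success * reward - p_fail * death_cost - time_cost * duration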
Quote: Original post by Kylotan
I need to fill groups for certain tasks. The group needs to have 'enough' people to have made the task safe. The group needs to have 'few enough' people for the reward to be worthwhile. Simply mapping characters to tasks is trivial. The problem is mapping a decent quantity of them in each case, while giving the impression that they're making reasonable decisions about which tasks to do. Just to give some ball-park figures, I want 'enough' to be something like N>=3, and 'few enough' to be N<=12, but they will vary from task to task depending on how dangerous they are.



To vary the limits from task to task depending on how dangerous each task is, change the value for risk, previously stated as being proportional to the group's chance of failing.

The green line is then proportional to an agent's desire to join the group.

[Edited by - AngleWyrm on February 24, 2008 12:57:55 AM]
--"I'm not at home right now, but" = lights on, but no ones home
Yeah, that's the shape of the graph I expected, and which I got when I plotted it, and which is unfortunately not really much use to me. What I want is a graph like the one Sneftel had on the first page, but even with a fixed participation cost, I didn't get the same sort of results.

I'm also still interested in iterative methods such as the one Timkin proposed, but without an extremum in the expected value function somewhere other than at the limits, I don't see how it will work.
Quote: Original post by Kylotan
Yeah, that's the shape of the graph I expected, and which I got when I plotted it, and which is unfortunately not really much use to me. What I want is a graph like the one Sneftel had on the first page, but even with a fixed participation cost, I didn't get the same sort of results.[...]
If you graph the parts individually, you'll get a graph as AngleWyrm showed, but if you graph it the way Sneftel mentioned (p_succ(N) * r_succ(N) + p_fail(N) * r_fail(N)), you should get a graph similar to the one he showed. If you're not, double-check the signs of everything and try different values for the variables. The expected cost of failure diminishes much faster than the expected reward does, since it is 1/2^N vs 1/N, which gives you the shape he showed.
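A quick way to check, in Python; the prize and death cost below are invented, with death costed as a negative failure payoff. With these numbers the expected value peaks around N = 6 and declines on either side, which is the hump in question:

prize, death_cost = 10.0, 20.0
for n in range(1, 11):
    p_fail = 0.5 ** n              # chance of failure shrinks as 1/2^N
    p_succ = 1.0 - p_fail
    ev = p_succ * (prize / n) + p_fail * (-death_cost)
    print(n, round(ev, 4))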
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Yeah, true enough, it was my mistake, and after I rectified my SciLab code I came out with a graph similar to Sneftel's. It looks like the next step is to tweak the cost of participation and the cost of death so that I can vary the probability of success (typically 1 - x^n, where 0 < x < 1; the examples above used x = 0.5) to simulate task difficulty, which in turn should tip the balance towards smaller or larger groups.

My problem still isn't entirely addressed though, because the 'x' above is actually based on the abilities of the participants, so it's only possible to estimate how dangerous (and therefore how rewarding) a task will be by assuming typical participants, rather than calculating it accurately. Perhaps this is where the iterative method would come in: starting with an estimate, allocating people to their preferred tasks based on that, re-calculating how rewarding the task is for each person in that group, moving some to other groups, and so on. Not sure if this is stable, however. I'll try and give it a go. Any other suggestions would be appreciated though!
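If it helps, here's roughly how the re-estimation step might look in Python; the ability attribute and the blend used to derive x are invented placeholders:

def group_success_chance(members, base_difficulty):
    # p_success = 1 - x^n, where x is derived from the actual members'
    # abilities (each in [0, 1]) instead of a 'typical' participant.
    if not members:
        return 0.0
    avg_ability = sum(m.ability for m in members) / len(members)
    x = base_difficulty * (1.0 - avg_ability)  # per-member failure factor
    return 1.0 - x ** len(members)

Plugging this into the preference loop sketched earlier, re-scoring each task from its current members every round, would let you iterate until no one moves, or cap the number of moves per agent as Timkin suggested.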

