
Decision theory, high risk scenarios, etc

Started by February 15, 2008 10:34 AM
30 comments, last by Kylotan 16 years, 9 months ago
Quote: Original post by Kylotan
...I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


How do you define 'risk' for your agents (quantify it please).

Quote: the duration of the task (which is how long they'll be unavailable for other tasks), and in the situation I envisaged, the attributes of the other members of the proposed group.


This clearly indicates that agents should be trying to maximise their expected future rewards (or minimise expected future losses) given the population of agents (at least in the ideal solution) and the set of tasks.

Okay... more info needed...

Can tasks run concurrently, or are all tasks sequentially ordered? If the latter, then each agent has the option to participate in each task. If the former, then an agent must always choose a schedule of tasks to participate in, such that this schedule maximises some quantity over this set of tasks. If you force them to choose a task at any given time they are free, and to evaluate their potential risk/reward based only on that task, you will not be able to make any assurances about the long-term viability of agents (nor encode this in their solutions). They need to be able to consider what they are giving up by accepting a given task in order to make rational decisions.

You should probably use a discounted future reward model of expected utility.
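That is, value a prospective schedule of tasks as the discounted sum of its rewards. A standard form (the notation here is mine, not anything specified above):

V = sum over t of (gamma^t * r_t),   where 0 < gamma < 1

where r_t is the reward expected t steps into the future and gamma controls how strongly the agent prefers near-term payoffs over distant ones.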

Fundamentally you still have one problem: each agent can only make a decision after all other agents have made a decision. You can get around this by asking agents to list their preferences for tasks. Once preferences have been given (and this might be random as a first assignment, or based on some agent attribute) each agent can assess the potential risk/reward of each task more accurately and re-order their preferences. You could iterate this and hope for a stable solution, or merely limit each agent to a finite number of changes they can make to their preference list.
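For what it's worth, a minimal Python sketch of that preference iteration. The score() method, the convergence test, and max_rounds are placeholders I've invented, not anything specified above:

import random

def assign_by_preference(agents, tasks, max_rounds=10):
    # First pass: random assignment, as suggested above.
    assignment = {task: [] for task in tasks}
    for agent in agents:
        assignment[random.choice(tasks)].append(agent)

    for _ in range(max_rounds):
        changed = False
        for agent in agents:
            current = next(t for t, members in assignment.items()
                           if agent in members)
            # Re-rank tasks now that tentative group sizes are known;
            # score() is a stand-in for the agent's risk/reward estimate.
            best = max(tasks, key=lambda t: agent.score(t, assignment))
            if best is not current:
                assignment[current].remove(agent)
                assignment[best].append(agent)
                changed = True
        if not changed:  # stable solution reached
            break
    return assignment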
chance of success: 1 - 1/2^numAgents
reward: prize / numAgents
expected reward: (1 - 1/2^numAgents) * (prize / numAgents)

As agents are added, the difference between Reward and Expected Reward diminishes. For example, using a prize value of 1:

Agents  Reward   Expected  Difference
1       1.0      0.5       0.5
2       0.5      0.375     0.125
3       0.3333   0.2917    0.0417
4       0.25     0.2344    0.0156
5       0.2      0.1938    0.0063

So it might make sense to have the agents choose whether to join based on the difference. Individual risk tolerances could be measured as the maximum difference that an agent is willing to accept.
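For reference, a few lines of Python reproduce the table above (prize = 1 and the 1/2 failure base are the values from the example):

prize = 1.0
print("Agents  Reward   Expected  Difference")
for n in range(1, 6):
    reward = prize / n                   # even split of the prize
    expected = (1 - 0.5 ** n) * reward   # weighted by chance of success
    print(f"{n:<7} {reward:<8.4f} {expected:<9.4f} {reward - expected:.4f}")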

[Edited by - AngleWyrm on February 19, 2008 8:28:07 PM]
--"I'm not at home right now, but" = lights on, but no ones home
Quote: Original post by Timkin
Quote: Original post by Kylotan
...I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


How do you define 'risk' for your agents (quantify it please).


Consider it the probability of dying during the course of that task, which is proportional to the probability of the group failing the task. I can't give you an exact quantity because I don't know it - that's part of my problem. I mentioned Pascal's Wager as an example of why it might be hard to quantify it.

Quote: Can tasks run concurrently, or are all tasks sequentially ordered? If the latter then each agent has the option to participate in each task. If the former, then an agent must always choose a schedule of tasks to participate in such that this schedule maximises some quantity over this set of tasks.


It's the former. At any given time, there can be a variety of tasks ongoing, each with several people assigned exclusively to that task.

Quote: If you force them to choose a task at any given time they are free, and to evaluate their potential risk/reward based only on that task, you will not be able to make any assurances about the long-term viability of agents (nor encode this in their solutions). They need to be able to consider what they are giving up by accepting a given task in order to make rational decisions.


It is possible to present them with all the current tasks on offer. They can also judge or rank their own suitability for/interest in them.

They're not forced to take a task whenever one is available for them; they can ignore tasks entirely, if they don't suit. I have to balance the game so that this doesn't happen too often.

Quote: Fundamentally you still have one problem: each agent can only make a decision after all other agents have made a decision.


I really must stress that it's not important for me to have each agent acting individually here. If a top-down system presents some sort of resolution that is considered likely to be accepted - eg. "People A, B, D, and G join Task 1" - and then those people get to accept or veto this, that's fine, providing I can come up with resolutions that have a decent chance of being accepted. I don't want any potential solutions to allocating people to groups to be held back by the notion of each agent needing to act totally individually.

Quote: Once preferences have been given (and this might be random as a first assignment, or based on some agent attribute) each agent can assess the potential risk/reward of each task more accurately and re-order their preferences. You could iterate this and hope for a stable solution, or merely limit each agent to a finite number of changes they can make to their preference list.


Hmm. Unless the reward levels differ significantly between tasks, I would expect the individual agents' preferences to spread them out fairly evenly. But I suppose it wouldn't take much deviation from an even spread for one or two tasks to become more attractive, and on subsequent iterations maybe that would draw others in.

However, I still don't have a criterion for deciding when a group is 'good enough' anyway, since I don't know where this risk/reward crossover is (or whether it even exists yet). I can't just make the best groups that are on offer - I have to be able to have people join no groups at all, if none meet the agents' criteria.
Quote: No, I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


I don't think that works. If an agent believes there is any chance of death, it may not want to be involved in any group of any size that participates in that activity.

You also assume that the "threat level" applies to agents on a group basis. That may be particular to your game. But have you considered cases where that may not be so? For instance, living on a major fault-line or in Tornado Alley. It doesn't matter how many people live there already, there is no protection in being part of a group when a tornado runs over you.
If every task has a base risk and reward, you know what the expected value is from each task. If all of your agents were super-smart with perfect information, they would only go on the tasks with the highest possible expected values. If we assume that isn't the case, then we have a fairly simple algorithm:

1. Give each agent an attribute (call it 'wisdom') valued between 0 and 1.

2. Add up all of the expected values from the tasks on offer.

3. Map each task to a range of the total based on its expected value with the lowest expected value at the start leading up to the highest at the end.

4. Multiply the agent's wisdom by the total to find which task it picks.

The tasks with bigger expectation values then get more agents. Wiser agents will go for tasks with bigger expectation values.
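In Python, steps 1-4 might look like this (the task names and expected values are invented for illustration):

def pick_task(wisdom, tasks):
    # tasks: list of (name, expected_value) pairs.
    # Map the tasks onto contiguous ranges of the total, lowest
    # expected value first, then land in a range via wisdom * total.
    ranked = sorted(tasks, key=lambda t: t[1])
    total = sum(ev for _, ev in ranked)
    target = wisdom * total
    cumulative = 0.0
    for name, ev in ranked:
        cumulative += ev
        if target <= cumulative:
            return name
    return ranked[-1][0]  # wisdom == 1.0 falls through to the best task

# e.g. pick_task(0.9, [("forage", 1.0), ("raid", 3.0)]) returns "raid"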

We don't need to worry about a change in numbers per task, because while the reward will go down, so will the risk - unless, of course, the relationship isn't linear. If the reward goes down disproportionately to the risk, then you can always rearrange afterwards, biasing toward smaller groups - or larger, in the reverse case.

You do need to build the risk of death into the expected value though, like Sneftel did. It's not really the same as Pascal's Wager, because an eternity in hell is a lot worse than death, which is coming to us in any case. A life without reward should be worthless to your agents, in which case rewards can always be valued against death.

Quote: Original post by leiavoia
Quote: No, I want agents to form groups where the size of the group reflects their wish to reduce their perceived risk (while not making the payoff trivial).


I don't think that works. If an agent believes there is any chance of death, it may not want to be involved in any group of any size that participates in that activity.


In real life, that's not an issue for normal people, who do things that carry a small risk of death all the time (eg. crossing the road). I will have more dangerous tasks, but braver characters. The agents will want to go on some tasks.

Quote: You also assume that the "threat level" applies to agents on a group basis. That may be particular to your game.


It does. (Mostly.)

Quote: But have you considered cases where that may not be so?


They aren't in my game. :)

Quote: Original post by Argus2
If every task has a base risk and reward, you know what the expected value is from each task. If all of your agents were super-smart with perfect information, they would only go on the tasks with the highest possible expected values. [...]
We don't need to worry about a change in numbers per task because while the reward will go down, so will the risk. Unless the relationship is not linear of course. If the reward goes down disproportionately to risk, then you can always rearrange afterwards, biasing toward smaller groups - or larger, in the reverse case.


I do need to worry about the change of numbers per task, because that's the entire problem! The number of people per task is precisely the thing I'm trying to set. Simply adding any interested person to the task means the groups could grow infinitely, and if there weren't many interested people, there's no guarantee the groups would be big enough to make it safe in the first place.

I don't think I'm making it very clear what I'm trying to do, because I keep seeing the same advice repeated, and it just doesn't apply to my problem.

I need to fill groups for certain tasks. The group needs to have 'enough' people to have made the task safe. The group needs to have 'few enough' people for the reward to be worthwhile. Simply mapping characters to tasks is trivial. The problem is mapping a decent quantity of them in each case, while giving the impression that they're making reasonable decisions about which tasks to do. Just to give some ball-park figures, I want 'enough' to be something like N>=3, and 'few enough' to be N<=12, but they will vary from task to task depending on how dangerous they are.

If I just add people who are interested in a task, then where do I stop? Assuming I had a large supply of eligible people, N would end up being 50 or 100, unless I add some arbitrary limit, at which point N would always be that limit. I need some sort of curve where N naturally limits itself somewhere between 5 and 15 depending on the task, not on me deciding to clamp it.

Quote: You do need to build the risk of death into the expected value though, like Sneftel did. It's not really the same as Pascal's Wager, because an eternity in hell is a lot worse than death, which is coming to us in any case. A life without reward should be worthless to your agents, in which case rewards can always be valued against death.


Yes, I always planned on building the risk of death in. I was just unsure how to value it. At the moment I think a large fixed cost for death, plus a smaller linear cost for task duration, should be enough.
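As a sketch of that valuation in Python (the constants are invented, not tuned):

def expected_utility(p_success, reward, duration,
                     death_cost=100.0, time_cost=1.0):
    # Reward weighted by the chance of succeeding (and surviving),
    # minus a large fixed penalty for death on failure and a linear
    # cost for the time the agent is unavailable for other tasks.
    p_fail = 1.0 - p_success
    return p_success * reward - p_fail * death_cost - time_cost * duration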
Quote: Original post by Kylotan
I need to fill groups for certain tasks. The group needs to have 'enough' people to have made the task safe. The group needs to have 'few enough' people for the reward to be worthwhile. Simply mapping characters to tasks is trivial. The problem is mapping a decent quantity of them in each case, while giving the impression that they're making reasonable decisions about which tasks to do. Just to give some ball-park figures, I want 'enough' to be something like N>=3, and 'few enough' to be N<=12, but they will vary from task to task depending on how dangerous they are.



To vary the limits from task to task depending on how dangerous each task is, change the value for risk, previously stated as being proportional to the group's chance of failing.

The green line is then proportional to an agent's desire to join the group.

[Edited by - AngleWyrm on February 24, 2008 12:57:55 AM]
--"I'm not at home right now, but" = lights on, but no ones home
Yeah, that's the shape of the graph I expected, and which I got when I plotted it, and which is unfortunately not really much use to me. What I want is a graph like the one Sneftel had on the first page, but even with a fixed participation cost, I didn't get the same sort of results.

I'm also still interested in iterative methods such as the one Timkin proposed, but without an extremum in the expected value function somewhere other than at the limits, I don't see how it will work.
Quote: Original post by Kylotan
Yeah, that's the shape of the graph I expected, and which I got when I plotted it, and which is unfortunately not really much use to me. What I want is a graph like the one Sneftel had on the first page, but even with a fixed participation cost, I didn't get the same sort of results.[...]
If you graph the parts individually, you'll get a graph as AngleWyrm showed, but if you graph it the way Sneftel mentioned (p_succ(N) * r_succ(N) + p_fail(N) * r_fail(N)), you should get a graph similar to the one he showed. If you're not, double-check the signs of everything and try different values for the variables. The expected cost of failure diminishes much faster than the expected reward does, since it is 1/2^N vs 1/N, which gives you the shape he showed.
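A quick way to check, in Python; the prize and death cost below are invented, with death costed as a negative failure payoff. With these numbers the expected value peaks around N = 6 and declines on either side, which is the hump in question:

prize, death_cost = 10.0, 20.0
for n in range(1, 11):
    p_fail = 0.5 ** n              # chance of failure shrinks as 1/2^N
    p_succ = 1.0 - p_fail
    ev = p_succ * (prize / n) + p_fail * (-death_cost)
    print(n, round(ev, 4))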
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
Yeah, true enough, it was my mistake, and after I rectified my SciLab code I came out with a graph similar to Sneftel's. It looks like the next step is to tweak the cost of participation and the cost of death so that I can vary the probability of success (typically 1 - x^n, where 0 < x < 1; the examples above used x = 0.5) to simulate task difficulty, which in turn should tip the balance towards smaller or larger groups.

My problem still isn't entirely addressed though, because the 'x' above is actually based on the abilities of the participants, so it's only possible to estimate how dangerous (and therefore how rewarding) a task will be by assuming typical participants, rather than calculating it accurately. Perhaps this is where the iterative method would come in: starting with an estimate, allocating people to their preferred tasks based on that, re-calculating how rewarding the task is for each person in that group, moving some to other groups, and so on. Not sure if this is stable, however. I'll try and give it a go. Any other suggestions would be appreciated though!
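If it helps, here's roughly how the re-estimation step might look in Python; the ability attribute and the blend used to derive x are invented placeholders:

def group_success_chance(members, base_difficulty):
    # p_success = 1 - x^n, where x is derived from the actual members'
    # abilities (each in [0, 1]) instead of a 'typical' participant.
    if not members:
        return 0.0
    avg_ability = sum(m.ability for m in members) / len(members)
    x = base_difficulty * (1.0 - avg_ability)  # per-member failure factor
    return 1.0 - x ** len(members)

Plugging this into the preference loop sketched earlier, re-scoring each task from its current members every round, would let you iterate until no one moves, or cap the number of moves per agent as Timkin suggested.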

