
Decision theory, high risk scenarios, etc

Started by Kylotan, February 15, 2008 10:34 AM
30 comments, last by Kylotan
I have a system where several agents have to be arranged into groups to attempt certain abstract tasks, for which a fixed reward is on offer to be shared among the group members if successful. The mathematics are chosen so that each additional person contributes less towards the chance of success than the previous one. e.g. If one person attempted Task A, he'd have a 50% chance of success; if two people attempted it, the chance would rise to 75%; if three attempted it, the chance would become 87.5%; etc.

Obviously this means that the expected value for an agent drops as the group grows bigger, so agents would always opt for smaller groups - a 25% chance of $10 is better than a 90% chance of $1, for example.

One factor not yet mentioned, though, is that these tasks carry risk; failed tasks can be considered dangerous, and an agent may be destroyed as a result of the task. Similar to the conclusions of Pascal's Wager, one could consider that any potential reward, no matter how high, compares unfavourably to the chance of death - arguably an infinitely negative penalty - and that therefore the rational decision swings back the other way: all agents should opt for the largest group possible to reduce the risk (assuming they can't just avoid attempting the task altogether). Obviously that is an extreme position, as we daily perform acts with a minuscule but non-zero chance of fatality. But how do we rationalise this?

For gameplay reasons, I don't want the agents to always form the smallest group (to earn the most reward) nor to always form the biggest group (to minimise risk), but to come to tradeoffs based on the amount of reward and the amount of risk. Intuitively, this is how I think humans act, but mathematically it's hard to see where those two curves cross over.

Basically my question is: is there a good way to model this sort of decision-making process, where people decide that the chance of a high penalty has been reduced to an acceptable level? Can the penalty in this case be adequately measured, and thus compared directly against the reward?

(Edit: reworded first sentence to remove emphasis on each agent and more on the group.)

[Edited by - Kylotan on February 15, 2008 5:42:59 PM]
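To make the tradeoff concrete, here's a minimal sketch (Python, with made-up numbers for the reward and a finite stand-in for the death penalty) of the per-agent expected value as the group grows:

# Illustrative constants only - not from the actual design.
REWARD = 10.0       # fixed pot, shared equally on success
DEATH_COST = 50.0   # finite stand-in for the "infinite" death penalty

def success_chance(n):
    # each extra member halves the remaining failure chance:
    # 1 -> 50%, 2 -> 75%, 3 -> 87.5%, ...
    return 1.0 - 0.5 ** n

def expected_value(n):
    p = success_chance(n)
    return p * (REWARD / n) - (1.0 - p) * DEATH_COST

for n in range(1, 11):
    print(n, round(expected_value(n), 3))

With these particular numbers the expected value is negative for very small groups (the risk dominates), turns positive at n = 5, peaks around n = 8, and then shrinks again as the share is diluted - which is exactly the crossover I'm struggling to pin down in general.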
Quote: Original post by Kylotan
Obviously that is an extreme position, as we daily perform acts with a minuscule but non-zero chance of fatality. But how do we rationalise this?

We don't. But if you want a game-theoretical explanation, we place a finite negative utility on death.

Now, I'm a little confused as to the exact mechanics of this game. Do all agents announce their decision to participate or not participate simultaneously, do they announce one at a time, or do they have multiple (or infinite) rounds in which to change their mind? That last category can become extremely hairy, as there's no guarantee that the agents will ever make up their minds.
Apologies if some of the initial post is vague or lacking in information. I wasn't sure what was relevant and what was not, since I'm really learning as I explore this myself. I have a game design and am trying to work out the mechanics to fit it.

A finite negative utility sounds ok, and is what I expected really. But is there some way of picking a decent value for this, possibly based on predicted future rewards or something?

Currently, the manner in which the agents decide to participate in groups is not finalised. At a given point, there are N agents available, and M tasks available to perform (though it is possible that none will be performed). The use of the term agent is perhaps misleading, as I had envisaged having an external omniscient system coordinating the grouping of the agents. Assume that this is a genetic algorithm that creates random candidate groups, and the fitness of a group is how much each agent wishes to be part of it. Thus the agents will veto proposed groupings that they do not think are worthwhile, based on their own assessment of the chance of success and the relative costs of success or defeat. I just need to work out how to determine those values on a per-agent basis.
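Something like this is what I have in mind for the coordinator - a rough sketch only, with a placeholder Agent whose valuation reuses the halving success chance from my first post (all names and numbers here are illustrative, not from the design):

import random

class Agent:
    # Placeholder valuation: fixed pot, finite death penalty.
    def __init__(self, pot=200.0, death=500.0):
        self.pot, self.death = pot, death

    def expected_value(self, n):
        p_succ = 1.0 - 0.5 ** n
        return p_succ * (self.pot / n) - (1.0 - p_succ) * self.death

def propose_group(agents, rng, attempts=100, max_size=10):
    # Draw random candidate groups; accept the first that no member vetoes.
    for _ in range(attempts):
        size = rng.randint(1, min(max_size, len(agents)))
        group = rng.sample(agents, size)
        if all(a.expected_value(len(group)) > 0 for a in group):
            return group
    return None  # no acceptable group could be formed

group = propose_group([Agent() for _ in range(20)], random.Random(1))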
Well, let's put some exactness in, then. Here are the formulae I'm using:

$ = total reward (to be split) from success
-D = punishment for failure (death)
-C = cost of participation, regardless of outcome

p_fail,n = 1/2^n = probability of failure with n agents
p_succ,n = 1 - 1/2^n = probability of success with n agents
r_fail,n = -D - C = reward from failure with n agents
r_succ,n = $/n - C = reward from success with n agents

E_part,n = p_succ,n * r_succ,n + p_fail,n * r_fail,n = expected reward from being the nth agent to participate
E_abst,n = 0 = expected reward from abstaining
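(Or in Python, for anyone who wants to play along - a direct transcription of the above, nothing added:)

def p_fail(n):
    return 1.0 / 2 ** n          # probability of failure with n agents

def p_succ(n):
    return 1.0 - 1.0 / 2 ** n    # probability of success with n agents

def e_part(n, pot, D, C):
    # expected reward from participating in a group of n;
    # pot is $ (the total reward), D the death penalty, C the entry cost
    r_succ = pot / n - C
    r_fail = -D - C
    return p_succ(n) * r_succ + p_fail(n) * r_fail

E_ABSTAIN = 0.0                  # expected reward from abstaining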

Now, assume that in round 1, each agent in turn announces his decision to participate or not participate. After all have announced, all have a chance to change their minds (again in order). If any do, it can be assumed that the game diverges and there is no solution. If nobody changes their mind, the game succeeds.

From some Matlab scribbling, it looks relatively easy to come up with constants that produce reasonable results. $=200, D=100, C=10, for instance, make it a good bet to be the 20th agent, and a bad bet to be the 21st. In this case, the game does not diverge; all of the first 20 agents sign on, nobody else does, and nobody changes their mind (note that by that time, each participating agent's expected reward is minuscule but positive). If C=0, the solution is again non-divergent, but boring; either everybody signs on, or nobody wants to be the first to sign on.
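(Easy to verify by scanning n - note that with these constants $/n - C hits exactly zero at n = 20, so the 20th seat sits right on the boundary:)

pot, D, C = 200.0, 100.0, 10.0

def e_part(n):   # same formula as above
    p = 1.0 - 0.5 ** n
    return p * (pot / n - C) + (1.0 - p) * (-D - C)

for n in (19, 20, 21):
    print(n, round(e_part(n), 4))
# 19 -> +0.5261, 20 -> -0.0001 (only the tiny death term remains),
# 21 -> -0.4762: the break-even sits right at the 20th agent.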

If already-participating agents can veto other agents from signing on, the signup simply stops near the maximum. For the values above, that's 1 or 2 agents.
Ok, that's quite helpful. I hadn't thought about a constant cost of participation, but I suppose it makes sense that it's needed, since the expected reward function curve doesn't have any local maxima and will therefore, as you add participants, either always grow, always shrink, or always remain unchanged. (Is that right? Or am I making this up? My maths is not too hot.)

I don't have an actual constant cost of participation, but I can come up with some arbitrary value based on the fact that each task has a duration, during which you are unavailable for other tasks. I would assume that a constant proportional to this duration would be sufficient.

Ideally I want the groups to have between 1 and 10 members. Given that the probability of success/failure will vary depending on the members and the task (ie. the 2 in the formula is liable to change, though the result will always converge on 1 as the number of agents increases), I assume I would have to tweak that participation cost constant to yield groups of this size.

One thing in your prototype that doesn't seem to ring true with what I want to do, however, is that I would expect most tasks to be so risky that no individual agent would choose to do them alone. Hence the auto-generation of potential groups, based on some heuristic I'd come up with. If a group is vetoed, it's thrown away and another is chosen, until either a group for the task is formed or it becomes apparent that one won't be. Does increasing D significantly affect this?

It's worth noting that optimising the reward per agent is desirable but not necessary. Since there is a massive number of permutations, I don't expect or require anything resembling the best solution. Mainly I'm just looking for a method that creates groups of a size appropriate to the task's reward and risk, where each member of the group feels that it is in their benefit to be there. It has to look like a decision that each group member might reasonably make.
Quote: Original post by Kylotan
Ok, that's quite helpful. I hadn't thought about a constant cost of participation, but I suppose it makes sense that it's needed, since the expected reward function curve doesn't have any local maxima and will therefore, as you add participants, either always grow, always shrink, or always remain unchanged. (Is that right? Or am I making this up? My maths is not too hot.)
The curve does have exactly one maximum for reasonable values. It also has a horizontal asymptote to the right. The purpose of C here is to make that asymptote at a negative value (so not everybody plays).
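(To see the asymptote: as n grows, p_succ,n -> 1 and $/n -> 0, so E_part,n -> 1*(0 - C) + 0*(-D - C) = -C. With C > 0 the tail of the curve sits below zero, which is what keeps latecomers out.)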
Quote: I don't have an actual constant cost of participation, but I can come up with some arbitrary value based on the fact that each task has a duration, during which you are unavailable for other tasks. I would assume that a constant proportional to this duration would be sufficient.
You can alternatively have a cost of success only, which produces a curve that looks about the same.

Quote: One thing in your prototype that doesn't seem to ring true with what I want to do, however, is that I would expect most tasks to be so risky that no individual agent would choose to do them alone. Hence the auto-generation of potential groups, based on some heuristic I'd come up with. If a group is vetoed, it's thrown away and another is chosen, until either a group for the task is formed or it becomes apparent that one won't be. Does increasing D significantly affect this?

Yes, with a D much higher than $, the expected reward for the first agent is far, far negative. If a few agents can form a cabal which agrees to participate together then this is circumvented (this is an area of game theory which I'm not familiar with, though).

Quote: Mainly I'm just looking for a method that creates groups of a size appropriate to the task's reward and risk, where each member of the group feels that it is in their benefit to be there. It has to look like a decision that each group member might reasonably make.

Note that, as with many game theory tasks, the self-interest solution I've posed results in a bunch of agents whose expected utility is barely greater than zero. If you aren't interested in dry game theory stuff and instead just want that sort of a grouping, I do suggest you simply find the maximum of the curve (the solution where participating agents can veto other agents).
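(Concretely, that maximum is just the argmax of E_part - e.g.:)

pot, D, C = 200.0, 100.0, 10.0

def e_part(n):
    p = 1.0 - 0.5 ** n
    return p * (pot / n - C) + (1.0 - p) * (-D - C)

best = max(range(1, 51), key=e_part)
# e_part(1) == e_part(2) == 40.0 with these constants, so the veto
# solution stops the signup at 1 or 2 agents, as noted earlier.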
IMHO, in real life we do not put an infinite negative value on death, since we come across suicides and acts of self-sacrifice (a mother for her children, etc.).

But the chances we keep taking despite the possibility of death may not be as high as we think. You are taking a risk of death while crossing a street or driving a car, but what is the probability of death? IMHO quite low - maybe less than 1 in 10 million. (Some think it is higher on a plane, and so won't get on planes.)

Would you do something with a higher death probability - say 10%? Would you join a group of 10 people, where one of them is chosen and killed and the remaining ones each get, say, $1000? Would you accept if it were $1,000,000? Some people tend to accept that second offer (I would also give it serious thought). Or would you accept if it was not 10 people (10%) but 1000 (0.1%) or 10,000 (0.01%)?

So, what am I trying to say?
1. The cost of death is not infinite.
2. If you mimic real life, your agents will try to form groups to decrease the risk to an acceptable (very low) level. How to solve it? Maybe use a logarithmic/quadratic ratio calculation, so the success chance drops off more sharply with fewer agents.

Quote: one could consider that any potential reward, no matter how high, compares unfavourably to the chance of death - arguably an infinitely negative penalty -

Only if you are otherwise immortal. But if you're going to die later on in any case, then an agent can just try to maximise his rewards over his expected lifetime.

If the chance of death is directly related to the expectation value of a task, then they can just pick tasks randomly. If it isn't, and I'm immortal, then I wait for the best possible ratio every time. If it isn't, and I'm not immortal, then the problem becomes interesting. Model this as a small chance to die when not accepting a task?
Quote: Obviously that is an extreme position, as we daily perform acts with a minuscule but non-zero chance of fatality. But how do we rationalise this?

We rationalise this because:
a. we're likely to die in the future anyway.
b. since we don't have perfect information, not performing those acts may also bring us closer to death.

Quote: Original post by Sneftel
The curve does have exactly one maximum for reasonable values. It also has a horizontal asymptote to the right. The purpose of C here is to make that asymptote at a negative value (so not everybody plays).


Ok, I think my problem is that I don't have an intuitive picture of what this curve looks like, and I have no tool to plot it with. Is it highest where N=1, dropping down as N increases, approaching -C in the limit as N -> infinity?

Quote: Yes, with a D much higher than $, the expected reward for the first agent is far, far negative. If a few agents can form a cabal which agrees to participate together then this is circumvented (this is an area of game theory which I'm not familiar with, though).


This implies that the curve rises though, so I think it's clear I don't quite understand what is going on. :(

Either way, I want to keep things simple by eliminating any iterative aspect to the decision making. Basically the character should see the group offered, and be able to say how much it suits him (ie. report their expected value, I think). They don't have to weigh it against the same group with 1 fewer person, or 1 more person, or potential other groups that may be offered to them if they refuse this one.

Perhaps if I retract my original statement of "arrange themselves in groups" and replace it with "be arranged into groups", it will make things clearer. In other aspects of the game, they will sense and act autonomously, but at this point, I'm not interested in having them maximise their reward, just in forming groups of reasonable sizes where they all anticipate a positive reward, taking into account that chance of a massive cost.

Unfortunately this seems to leave me back at the point where the optimum group size is either 1 or infinity, depending on whether the cost or the benefit diminishes more quickly. If I could work out how to use SciPad, I'd plot things for myself and work something out. :)
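In the meantime, tabulating the curve is easy enough even without a plotting tool - e.g. in Python, with Sneftel's constants alongside a deliberately made-up "too risky to solo" parameter set:

def e_part(n, pot, D, C):
    p = 1.0 - 0.5 ** n
    return p * (pot / n - C) + (1.0 - p) * (-D - C)

for n in range(1, 16):
    print(n,
          round(e_part(n, 200.0, 100.0, 10.0), 2),    # Sneftel's constants
          round(e_part(n, 200.0, 2000.0, 10.0), 2))   # made-up high-risk D

# With D = 100 the curve starts at its peak (40 at n = 1 or 2) and decays
# towards -C. With D = 2000 it starts far negative (about -910 at n = 1),
# turns positive around n = 7, peaks near n = 9, then decays towards -C.
# So the curve really does rise at first when death dominates the reward.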
Quote: Original post by Argus2
Quote: one could consider that any potential reward, no matter how high, compares unfavourably to the chance of death - arguably an infinitely negative penalty -

Only if you are otherwise immortal. But if you're going to die later on in any case, then an agent can just try to maximise his rewards over his expected lifetime.


I think the idea is that since you don't know how many rewards you will get in your future lifetime, you don't know how much you could lose by dying. Therefore death is modelled as an infinitely high cost, since it always has to be greater than your potential future rewards. Obviously as soon as you know your lifespan or know the limit to the rewards you can obtain within it, the cost of death can become proportional to that.

Quote: If the chance of death is directly related to the expectation value of a task, then they can just pick tasks randomly. If it isn't, and I'm immortal, then I wait for the best possible ratio every time. If it isn't, and I'm not immortal, then the problem becomes interesting. Model this as a small chance to die when not accepting a task?


I'm not sure what you're suggesting here. The chance of death is directly related to the expected value of a task, but the task's reward is also directly related, as is the number of people participating in the task. So you can't just pick tasks randomly, as there are more factors than just the risk. Either way, I don't need to find the best ratio for a given person. I need to tune the curves so that the sensible group sizes come out somewhere between 1 and 10 or 15 or so, varying with the task's reward and the task's risk.
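For the tuning itself, something like this crude search might do - the helper names are hypothetical, and the probability base is a parameter since the 2 in the formula is liable to change:

def e_part(n, pot, D, C, base=2.0):
    # base generalises the 2 in 1/(2^n); success chance still tends to 1
    p = 1.0 - base ** (-n)
    return p * (pot / n - C) + (1.0 - p) * (-D - C)

def viable_sizes(pot, D, C, base=2.0, max_n=30):
    # group sizes where every member expects a positive reward
    return [n for n in range(1, max_n + 1) if e_part(n, pot, D, C, base) > 0]

def tune_death_penalty(pot, C, target_min_size, base=2.0, max_D=1e9):
    # Raising D pushes the break-even size upward, so scale it up until
    # the smallest viable group is at least the target size.
    D = pot
    while D < max_D:
        sizes = viable_sizes(pot, D, C, base)
        if not sizes:
            return None      # overshot: no group size is viable at all
        if sizes[0] >= target_min_size:
            return D
        D *= 1.5
    return None

That way each task's reward and risk pin down their own band of group sizes, rather than every task sharing one global constant.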

