i'm reposting this from comp.ai.games 'cause i like to see my name on lots of forums
It's an
attempt to come up with a standard format for documenting various techniques i use in game AI
Problem Area: action selection
AI Type: decision making, learning (secondary),
personality (secondary)
Detail Level: mid-level
Technique: matching law
Assumptions: Options are relatively equal
Example Uses: sports: choosing a shot type
FPS: choosing a weapon
RPG: choosing a spell
RTS: choosing a build unit type
Explanation
-----------
In 1962, psychologist Herrnstein discovered that, when an animal is presented with two options, the relative rate of responding (choosing) of a given option is equal to the relative rate of reinforcement (success) of that option. This was called the matching law and written as RA/RA+RB = rA/rA+rB. In 1974, psychologist Baum refined the formula by adding the notion of bias and sensitivity, resulting in the new equation RA/RB = b(rA/rB)^s. The matching law has been repeatedly upheld in human and non-human animal studies
In 2000, Vollmer and Bourret studied the shot selection of 26 college basketball players. They found that how often a player attempted a three-point shot or a two-point shot was proportional to the relative rates of reinforcement (success rates) for each. If a player successfully completed 10% more two-point shots than three-point shots, the player would should 10% more two-point shots than three-point shots
Variables
---------
RA/RA+RB = b(rA/rA+rB)^s = matching law
RA = Rate of response for option A.
How often option A is chosen.
This is a counter
RA/RA+RB = Relative rate of response for option A.
The percentage of time option A is chosen from
options A&B
rA = Rate of reinforcement for option A.
The percentage of time choosing option A has lead
to a good result
rA/rA+rB = Relative rate of response for option A.
The percentage of overall successes that have
come from choosing option A
b = Response bias.
A preference for a given option or outcome.
b>1 means prefers
s = Sensitivity.
How much better an option must be to switch to it.
s<1 means must be better than the other option(s).
A major cause for sensitivity is switching cost.
In an FPS, switching cost would be the time spent
unarmed while switching to the new weapon
Game Example
------------
A wizard is 30 meters away from a group of orcs. He has three third level spells that are appropriate to use - flamestrike, iceblade and stonestorm. Question: which spell should the wizard cast?
Assume that the wizard has successfully hit his enemies 3/10 times with flamestrike, 2/4 times with iceblade and 6/7 times with stonestorm.
r(flamestrike) = 3/10 = 0.3 (30%)
r(iceblade) = 2/4 = 0.5 (50%)
r(stonestorm) = 6/7 = 0.86 (86%)
relative r(flamestrike) = .3/.3+.5+.86 = .3/1.66 = 0.18 (18%)
relative r(iceblade) = .5/.3+.5+.86 = .5/1.66 = 0.30 (30%)
relative r(stonestorm) = .86/.3+.5+.86 = .86/1.66 = 0.52 (52%)
So the wizard would cast stonestorm 52% of the time, iceblade 30% of the time and flamestrike 18% of the time
Personality . To see how this affects the personality of the character, assume we have an NPC named Pyro the Flame Wizard. The NPC has no special advantages with fire, he just likes how pretty it is. Pyro's bias for fire-based spells is 1.25 (25% bias).
relative r(flamestrike) = 1.25(.3/.3+.5+.86) = 1.25(.3/1.66) = 0.23 (23%)
Note that if the other options (iceblade and stonestorm) do not have their biases downgraded accordingly, the percentages (.23, .30, .52) will add up to 105%, not 100%. Given that this is, by definition, impossible, all numbers would need to be normalized. We do this by dividing each result by the overage (1.05)
relative r(flamestrike) = 0.23/1.05 = 0.22 (22%)
relative r(iceblade) = 0.30/1.05 = 0.29 (29%)
relative r(stonestorm) = 0.52/1.05 = 0.50 (50%)
The numbers here add up to 101 because i did a lot of rounding, but if you don't round they should add to 100%
Note that, even with the bias, Pyro the Flame Wizard still only casts flamestrike 22% of the time. This is because he'd be suicidal or a fool to cast it more than that given his poor track record with the spell
Learning . Action selection is driven by two counters, numAttempted and numAttemptsSuccessful. Because of that, the agents will constantly adapt their actions based on feedback and their history
Limitations
-----------
- Learning via the matching law is slow. Does not smoothly accomodate rapid changes in skills such as happens in games when leveling up or aquiring a skill-enhancing weapon
- As stated, does not accomodate learning by observation (ie, switching to a sniper rifle because the winner of the last four rounds used the sniper rifle)
- Modifications must be made to accomodate context-specific decisions (ie, tracking numAttempts at a per-enemy or per-enemyType level rather than at a global level)
Notes
-----
- Bias and sensitivity are not learned, they are hard coded by the designer
- Sensitivity should be situation specific. Sensitivity to switching weapons (armor, etc.) should be normal (~1) in safe spots and should be lower when surrounded by enemies. The sensitivity of the current item should reflect the fact that there is no switching cost (ie, s=1)
[edited by - baylor on November 4, 2003 6:30:36 PM]
[edited by - baylor on November 4, 2003 6:31:40 PM]
[edited by - baylor on November 4, 2003 6:33:08 PM]