I've got a broad production-level puzzle I've been trying to figure out, and I've never been sure whether I've been working with all of the pieces. I'll try to explain the problem and my considerations.
1) I need to create AI for all of my characters and it needs to be good enough that it's convincing.
2) My approach has been to quickly write Finite State Machine scripts in C++. It has worked well enough.
3) During development, I change stuff in my game and I will be adding in new mechanics.
4) Every time I change something important, I need to refactor my FSM AI to account for it. This becomes a "tax" I need to pay, and when I just have a few relatively simple characters, it's not intolerable.
Now, I put on my magical hat of farseeing +3, and I see a future where my game has dozens of characters, each with FSM scripts. The overhead tax I need to pay increases proportionally, and it gets to the point where I spend as much time maintaining AI scripts as I do building out game systems, which ultimately slows down the pace of development.
Here's where I get a bit conflicted. One side of me says, "Premature optimization! YAGNI! Solving problems you don't have!"
The other side of me says, "But these are real problems you're going to have if you don't head them off early and it's just going to be more expensive to fix them later. This has a compounding cost in terms of refactoring effort. Nip it in the bud now. Work smarter, not harder."
And the business side of me says, "Will this help me sell more copies today? What's urgent? If this costs me X months and I don't sell Y copies to sustain development work, then my effort is misdirected. I should be focused exclusively on building what sells!"
I think all are sage, prudent voices of reason to listen to and have merit behind them.
My creative engineering side has been quietly asking, "How can I avoid paying that increasing maintenance tax as development goes forward?" and the answer I keep circling back to is, "Design an AI system that doesn't need maintenance. It'll have a somewhat steeper upfront cost, but the investment will pay off in saved time over the course of development." Easier said than done: building such a system is a tall order which teams of other, much smarter people have attempted with limited success. My engineering approach has been to dig into current research and development in machine learning, particularly the work being done by Google-owned DeepMind. They've been able to create model-free AI systems which can learn to play games without being given any instructions on how to play them, and their systems have beaten world champions at the game of Go. A year ago I said, "That sounds like something I need for my game!" (cue: ridiculous groans).
The reality is that I don't have a long history and deep background in machine learning. I didn't know anything about artificial neural networks, reinforcement learning, CNNs, or any of the other acronyms super smart AI people mention off the cuff. I fumbled in the dark trying to understand these various AI systems to see if they could help me build an AI system for my game, and I explored quite a few dead ends over the months. It's hard to recognize a dead end when you're in the middle of development, and it's also important not to get mentally entrenched on one track. I have to take a step back and look at the bigger picture to recognize these tracks and traps.
One track/trap is to think, "I need to have an artificial neural network! That's the only way to make this work."
Another one is, "I need to have machine learning, and my implementation needs to match the AI industry's definition! It's not machine learning if it's not an implementation of backpropagation!"
Another trap: "My machine learning type of AI should convince AI professionals it's real!"
All of that is obviously nonsense and nobody is actually saying it -- it's an invented internal narrative, which creates unnecessary constraints, which creates traps.
Russel (Magforce7) has helped me see these traps and stay focused on what matters, but I'm still fumbling my way in the dark within a maze. I'm just taking inspiration from the ideas proposed by these other AI systems, but creating my own and trying to keep it as simple as possible while trying to minimize the amount of future work I need to do to maintain the system.
So, here are my design goals for the AI system:
1) It shouldn't require a lot of maintenance and upkeep after it has been deployed
2) It should ideally be flexible: Updates to the game mechanics shouldn't cause AI system code updates
3) The AI controlled characters should appear reasonably intelligent and perform plausibly intelligent behaviors.
4) There should be support for variation in behavior between characters of the same class, and between characters of different classes.
5) I don't want to write any special case code. If I start writing FSM code, I'm doing it wrong.
6) Complex behavior should be emergent rather than designed.
7) It should be relatively simple. Integrating things into the AI system should be fast, easy and non-technical.
Designing a system to meet these broad goals has been very challenging, but I think I've done it. I have spent several months thinking about how people and animals think and trying to create a model for intelligence which is consistent with biological intelligence. It's taken a lot of internal reflection on my own mind and thought processes, and doing a bit of external research.
A small side note on the external internet research: there are a lot of different psychological models for how the mind works, but almost none of it is scientifically backed with empirical evidence or tested for correctness. Pretty much all of the research and proposed intelligence models out there are guesswork, some of which contradicts other proposed models. That leads me to believe that even the experts don't know very much, and to make things even more complicated, there are a lot of wacky pseudoscience people mixed in.
One important constraint on proposing broad intelligence models is that, at the end of the day, the model must be computable. If you can't reduce an intelligence model into computable systems, then it is no good for AI, and it probably isn't a well-defined model anyway -- it gets lumped in with all of the other guesswork people have proposed. I've come up with a few of these myself and haven't been able to reduce them into data structures or algorithms, so I had to throw them out.
Anyways, on to my model! I've decided to structure it as a set of loosely coupled systems: if one particular module is flawed and needs to be refactored, that shouldn't mean the whole system is flawed and needs to be refactored. Here are the modules:
PropertySet:
The mind does not use or work with objects, but with the set of properties for those objects. It's critical to make this distinction for the sake of pattern matching.
Memory:
Memory is transient, stateful information which is used to choose the most optimal behavior. All memories come with expiration times; when an expiration time is up, the memory is lost and forgotten. The importance of a memory determines how long it persists, and importance is driven by relevance to internal motivators and the number of times the memory is recalled. This constant trimming of memory state is what prevents cognitive overload.
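As a rough sketch of what a memory record might look like in code (the names and the decay rule here are just illustrative, not a final design):

```cpp
#include <set>
#include <string>

// A remembered object is stored as its descriptive tags, not as a pointer
// to the game object itself (see PropertySet above).
using PropertySet = std::set<std::string>;

// One record in transient memory.
struct MemoryEntry {
    PropertySet tags;              // abstract description of the sensed object
    float       importance = 0.0f; // relevance to motivators + number of recalls
    float       timeToLive = 0.0f; // seconds until this memory is forgotten

    // Tick the memory; returns false once it has expired and should be trimmed.
    bool update(float dt) {
        timeToLive -= dt;
        return timeToLive > 0.0f;
    }

    // Recalling a memory extends its life in proportion to its importance.
    void recall() {
        timeToLive += importance;
    }
};
```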
Sensory Input:
Sensory input is how an agent gets stateful information about the environment around itself. Sensory input information is fed directly into transient memory. There is no filter applied at the sensory input level. Sensor inputs get fed sets of properties created by external objects in the surrounding environment.
Behavior:
Behavior is a type of response or interaction an agent can have with itself or the external world around it. Behavior choice is the only tangible evidence we have of an agent's internal intelligence, so choosing correct or incorrect behaviors is what determines whether the agent passes an intelligence test.
Motivator:
Every character has internal needs it is trying to satisfy through the use of behaviors. Motivators are what drive behavior choice in the agent's given environmental context. A motivator is defined by a name, a normalized value, and a set of behaviors which have been discovered to change the motivation value one way or another.
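In code, a motivator might be as simple as this (a sketch with my own field names):

```cpp
#include <string>
#include <vector>

// A motivator as described above: a name, a normalized value, and the set of
// behaviors the agent has discovered will move that value one way or the other.
struct Motivator {
    std::string name;                        // e.g. "Hunger"
    float       value = 0.0f;                // normalized need level
    std::vector<std::string> knownBehaviors; // behaviors known to change it
};
```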
Reward:
Reward (emergent) is the summed result of all motivation changes when a behavior's effect has been applied to an object. The amount of reward gained scales with the cube of the motivational need, using the equation F(X, W) = W(10X)^3, where X is the normalized motivational need and W is a weight. If you are manually assigning reward values to actions or properties, you're doing it wrong.
Knowledge:
Knowledge is a collection of abstract ideas and concepts which are used to identify relationships and associations between things. The abstractions can be applied to related objects to predict outcomes, even when there is no prior history of experience. Knowledge is stored as a combination of property sets, behaviors, motivators, and reward experiences, and it is transferable between agents.
Knowledge reflection: This is an internal process where we look at our collection of assembled knowledge sets and try to infer generalizations, remove redundancies, streamline connections, and make better organized sense of things. In humans, this crystallization phase typically happens during sleep or through intentional reflection.
Mind:
The mind is the central repository for storing memory, knowledge, motivators, and behavior sets, and chooses behaviors based on these four areas of cognition. This is also where it does behavior planning/prediction via a dynamically generated behavior graph, with each node weighted by anticipated reward value evaluated through knowledge.
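Roughly, in code terms, the mind just owns the other modules and exposes one decision entry point, something like this (only a sketch, with stand-in types and my own naming):

```cpp
#include <string>
#include <vector>

// Minimal stand-ins for the modules sketched above.
struct MemoryEntry    { /* tags, importance, expiration ... */ };
struct Motivator      { /* name, normalized value, known behaviors ... */ };
struct KnowledgeEntry { /* tag set, behavior, remembered motive effects ... */ };

// The mind is the central repository: it owns the four areas of cognition
// and is the only thing that actually picks a behavior each tick.
class Mind {
public:
    // Real version scores candidate behaviors by anticipated reward,
    // evaluated through knowledge; stubbed here.
    std::string chooseBehavior() { return "idle"; }

private:
    std::vector<MemoryEntry>    memory;     // transient world state
    std::vector<KnowledgeEntry> knowledge;  // learned tag/behavior/motive associations
    std::vector<Motivator>      motivators; // internal needs driving everything
    std::vector<std::string>    behaviors;  // everything this character can do
};
```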
A picture or class diagram would be helpful in understanding this better. But let me describe the general workflow here for AI characters.
Each character has a mind. The mind has memory, knowledge, motivators, and a list of possible behaviors to choose from; it fills the role the finite state machine used to. Each character has sensory inputs (eyesight, hearing, smell, etc.). The only way an AI character can know about the environment around it is through its sensory inputs, and those inputs are just data feeds which go directly into memory.
Memory is where we keep transient state about the environment around us. Memory can persist even after a sensory feed is cut -- closing your eyes doesn't cause objects to disappear, so we have object permanence. Objects stored in memory have "importance value" filters applied to them and they also have expiration times. Ultimately, the memory in a mind contains all relevant state information!
Our mind has a list of all possible behaviors it can perform, so there needs to be a way to choose the most optimal behavior from that list, given the current memory state. That means there has to be some internal decision-making process which quickly evaluates the optimal behavior. How do we build this without creating a bunch of FSM scripts? Because if that's what we end up doing, we've failed: we're just building overly complicated FSMs and scripted behavior models which need to be maintained and updated.
Here's the trick where it gets interesting. When we get objects through our sensory inputs, we don't store references to those instanced objects in memory. Instead, we store a set of descriptive tags for those objects. We don't choose our behavior based on objects in memory, but on memory's abstract representation of those objects. Our brain also has a knowledge repository of behaviors, tag sets, and their effects on motivators. Our goal is to choose behaviors which create the most reward for our character, and reward is determined by the sum of motivators satisfied (more on this below). Our agent doesn't intrinsically know how much reward a given behavior and tag set generates, so it needs to query its internal knowledge repo.
The knowledge repo is where abstract reasoning happens. Since we're not working with instanced objects directly, but rather with abstract representations of those objects, we can look at the tag sets which define our objects in memory, pattern match them against tag sets in knowledge, find which pieces of knowledge are relevant to the object, and then look at historical motivation satisfaction (not historical rewards). Essentially, we're looking at an object, querying our knowledge banks for similarly related objects, asking about our past experiences, and then projecting the best-matched experience onto the current object. We're trying to match the most motivationally satisfying behavior to the object, and that becomes our most optimal behavior towards that particular object. We repeat this process for all objects in transient memory, keep a running high score of the most rewarding behavior, and then choose that as our most optimal behavior.
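In rough C++ terms, the selection loop I'm describing might look something like this. The RewardQuery callback is a stand-in of mine for the knowledge query; the real matching is described in the next paragraph:

```cpp
#include <functional>
#include <set>
#include <string>
#include <vector>

using PropertySet = std::set<std::string>;   // a remembered object is just its tags

// Stand-in for the knowledge query: "how much reward do I project for doing
// this behavior to something that looks like this, given my current motives?"
using RewardQuery = std::function<float(const std::string& behavior,
                                        const PropertySet& objectTags)>;

// For every object in transient memory, try every behavior the character has,
// keep a running high score, and return the winner.
std::string chooseOptimalBehavior(const std::vector<PropertySet>& objectsInMemory,
                                  const std::vector<std::string>& behaviors,
                                  const RewardQuery& projectReward)
{
    std::string bestBehavior = "idle";   // fall back to idling if nothing scores
    float       bestReward   = 0.0f;
    for (const PropertySet& object : objectsInMemory) {
        for (const std::string& behavior : behaviors) {
            const float reward = projectReward(behavior, object);
            if (reward > bestReward) {
                bestReward   = reward;
                bestBehavior = behavior;
            }
        }
    }
    return bestBehavior;
}
```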
What's interesting is that this applies abstraction to objects and doesn't require thousands of training cycles. Imagine an AI character reaches out and touches a burning candle flame. That creates a negative reward for that action. The AI looks at the set of properties which define that candle flame and stores it in knowledge and associates the negative motivational experience. Let's define this by the property set {A,B,C,X,Y}. Now, some time passes, and the AI is looking at a campfire, which has the property set {C,G,H,J,X,Y}. It queries its knowledge base and sees that there is a set intersection between {A,B,C,X,Y} and {C,G,H,J,X,Y} which is {C,X,Y}. It can then say that there is a relationship between the campfire and the candle flame and based on its experience with the candle flame, it can project a predicted outcome to what would happen to its motivations if it touches the campfire, without ever actually having touched the campfire. In other words, we can make generalizations and apply those generalizations to related objects.
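Here's a tiny self-contained example of that tag-set intersection, using the same property sets as above (the tag letters are placeholders, exactly as in the example):

```cpp
#include <algorithm>
#include <iostream>
#include <iterator>
#include <set>
#include <string>

using PropertySet = std::set<std::string>;

// Shared tags between the object we're looking at now and one we remember.
PropertySet sharedTags(const PropertySet& a, const PropertySet& b) {
    PropertySet shared;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(shared, shared.begin()));
    return shared;
}

int main() {
    const PropertySet candleFlame = {"A", "B", "C", "X", "Y"};     // stored in knowledge with a negative experience
    const PropertySet campfire    = {"C", "G", "H", "J", "X", "Y"};

    const PropertySet overlap = sharedTags(candleFlame, campfire); // {C, X, Y}
    if (!overlap.empty()) {
        // The campfire is related to the candle flame, so we project the
        // remembered negative outcome onto it without ever touching it.
        std::cout << "Related by " << overlap.size() << " shared tags\n";
    }
    return 0;
}
```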
I was initially making the mistake of defining how much reward each tag was worth. This is wrong; don't do this. Let's talk about reward calculation, how it's related to motive satisfaction, and how this can vary by character class and by character instance. Here is a general set of motives every character will have:
- Existence / Self Preservation
- Pain avoidance
- Hunger
- Sex
- Love / Affection
- Comfort
- Greed
- Morality
- Justice
- Fear
- Power
- Curiosity
This is not a complete set of motives; you can add more as you think of them. The central idea here is that our underlying motives/goals are what truly drive our actions and behaviors. It's the undercurrent to everything we do as humans and animals.
The general equation for evaluating reward is:
F(a, b, w) = w((10a)^3 - (10b)^3)
a = normalized motivation before the behavior
b = normalized motivation after the behavior
w = weight factor
We're essentially calculating the sum of all F(a,b,w) for all changes in motivation factors. Let's look specifically at hunger to illustrate how this works:
In our knowledge repo, we have the following:
Eating [TagSet]:
Base Effects:
  hunger -0.25
  pleasure -0.1
which describes the effects on your motivations when you eat a loaf of bread: It satisfies hunger and creates a little bit of pleasure (the bread is tasty!).
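One minimal way to represent that entry in data (the tag names here are placeholders of mine, and the exact layout is still up for grabs):

```cpp
#include <map>
#include <set>
#include <string>

using PropertySet = std::set<std::string>;

// One knowledge entry: performing this behavior on something matching these
// tags has these base effects on the motives.
struct KnowledgeEntry {
    std::string                  behavior;
    PropertySet                  tags;          // what the object looked like
    std::map<std::string, float> baseEffects;   // motive name -> delta
};

// The "eating bread" entry quoted above.
const KnowledgeEntry eatingBread = {
    "eat",
    {"bread", "food"},
    {{"hunger", -0.25f}, {"pleasure", -0.10f}},
};
```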
We have a few different character actors which have an eating behavior:
1) Humans
2) Zombies
3) Cows
4) Termites
For reference, humans like to eat breads and meats, as long as the meat is not human flesh. Zombies are exclusively carnivores who eat any kind of meat and have no qualms about eating human flesh. Cows are herbivores who only eat grass and nothing else. Termites are a type of insect which only eats wood and nothing else. We 100% definitely do not want to write a state machine describing these behaviors and conditions! Let the characters learn for themselves through trial and error and abstraction!
So, our token human is hungry and we represent this by setting his initial hunger motivation value to 0.5. In front of him is a loaf of bread and he has prior experience/knowledge with that bread, as described above in the quote. Using our equation, how much reward would he get for performing the "eat" behavior on the bread multiple times?
F(a, b, w) = w((10a)^3 - (10b)^3)

Very hungry (0.5), eat bread!
  hunger a: 0.5 = 125
  hunger b: 0.25 = -15.625
  reward: 125 + (-15.625) = 109.375

Not very hungry (0.25), eat bread.
  hunger a: 0.25 = 15.625
  hunger b: 0 = 0
  reward: 15.625 + 0 = 15.625

Full (0), eat bread:
  hunger a: 0 = 0
  hunger b: -0.25 = -15.625
  reward: 0 + (-15.625) = -15.625
As our human continues to eat bread, it satisfies his hunger and it becomes decreasingly rewarding to continue eating bread, to the point that it becomes a disincentivized behavior when he can't eat anymore (represented by the F(X)=X^3 graph).
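For reference, here's that curve as a plain function; it reproduces the 109.375 and 15.625 figures above (weights fixed at 1):

```cpp
#include <cmath>
#include <cstdio>

// Reward contribution of one motive moving from a (before) to b (after),
// scaled by this character's weight w for that motive.
float motiveReward(float a, float b, float w) {
    return w * (std::pow(10.0f * a, 3.0f) - std::pow(10.0f * b, 3.0f));
}

int main() {
    // Very hungry human (0.5) eats bread; hunger drops by 0.25:
    std::printf("%.3f\n", motiveReward(0.50f, 0.25f, 1.0f));  // prints 109.375
    // Not very hungry (0.25), eat bread again:
    std::printf("%.3f\n", motiveReward(0.25f, 0.00f, 1.0f));  // prints 15.625
    return 0;
}
```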
Let's place zombies and humans and put a chunk of human flesh in front of them both. The knowledge looks like this:
Eat human flesh:
Base effects:
  hunger -0.3
  morality +2
It satisfies hunger, but generates a moral crisis! Here's where weights come into play.
Very hungry human, eat human flesh:
  hunger before: 0.75 = 421.875
  hunger after: 0.45 = 91.125
  morality before: -1 = -1000
  morality after: 1 = 1000
  reward: 1*(421.875 + (-91.125)) + 1*(-1000 + (-1000)) = -1669.25 (very bad moral cost!)

Very hungry zombie, eat human flesh (no moral weight):
  hunger before: 0.75 = 421.875
  hunger after: 0.45 = 91.125
  morality before: -1 = -1000
  morality after: 1 = 1000
  reward: 1*(421.875 + (-91.125)) + 0*(-1000 + (-1000)) = 330.75 (good)
Internally, each character has a constant, fixed weight on the influence of morality on its reward. Zombies have no morality, so they are completely unaffected. Our particular human has a strong moral conscience, so eating human flesh would be deeply objectionable. We *could* adjust the human's morality weighting to 0.8 or so, and if they eventually get hungry enough, the morality consequence could be overridden by the motivation to eat, and we'd have a cannibal. Notice that no extra code needs to be written to create these special behavior cases? These numbers can be adjusted in a spreadsheet to change behavior patterns.
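Those weights really can be plain data. A minimal sketch of how a per-class weight table might look (the values are illustrative):

```cpp
#include <map>
#include <string>

// Per-class motive weights; these numbers could live in a spreadsheet and be
// tweaked by a designer, with no behavior code changing at all.
using MotiveWeights = std::map<std::string, float>;

const std::map<std::string, MotiveWeights> classWeights = {
    {"human",  {{"hunger", 1.0f}, {"morality", 1.0f}}},  // strong conscience
    {"zombie", {{"hunger", 1.0f}, {"morality", 0.0f}}},  // no morality at all
    // Drop the human's morality weight toward 0.8 and a starving human can
    // eventually turn cannibal, without a single new line of behavior code.
};
```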
We also don't want to go through the process of describing what behaviors can be performed with particular objects. That would add extra work. Let's say we have a wooden door. It's entirely possible and allowable for the human and the zombie to eat the door (or anything for that matter). But how do we prevent them from doing so? If they attempt to eat something they aren't supposed to eat, we simply don't change a single motivating value. They will both learn that eating wood doors does not help them satisfy their driving motives, so when it comes to choosing rewarding behaviors, this would score a big fat zero. If we have an idle behavior which scores a minimum reward of 1, then the characters would prefer to idle around doing nothing before they'd go around eating wooden doors. It's a bit hilarious that the threshold between idling and eating wooden doors is so small though.
Taking a few steps back, I think I've got all of the working pieces together now and it's mostly going to be a matter of implementing this. One piece that's still missing is future planning and look-ahead. If you are hungry and standing outside of a house, look in through the window and see a ham sandwich, and want to eat it, then there is an intermediate step of moving to a door and opening it. This series of chained actions has a cost which needs to be factored into the reward calculation, and the system would also need to be capable of working towards goals whose objects don't exist yet (such as deciding to plant crops to get food to eat -- the food doesn't exist in the present). For the last week or so, I've been building this AI system out and I've got a rough working prototype. I'm still implementing the underlying framework and discovering design oversights and errors, but I think once this is working, I'll have a pretty unique type of AI capable of abstract reasoning, learning, planning, and optimized behaviors.
I suspect that lots of different characters with slight variations in weights could generate an interesting system of interactions, and that an economy of competing interests could emerge from these underlying systems of motivation-satisfaction-driven behavior. I think this is also reflective of real life? It's been making me look at people very differently for the last few days, and it's been blowing my mind a bit.
You might want to read this book...
https://books.google.com/books/about/Designing_games_and_simulations.html?id=n9TtAAAAMAAJ
Part of what you are describing in your post is a very old method of scientific modeling known as "Needs, Wants & Desires". You probably recognize it better as Will Wright's style. The book I linked to above has a very extensive section on Needs, Wants & Desires; it was Will Wright's game design bible, the one that taught him how to make games the way he made them. Your list of what you are calling "motives" is actually a list of Needs, Wants & Desires, and this book will show you a more powerful way of doing what you are doing here.
This is actually 1970's era simulation design stuff...