Below is my preliminary draft design for the AI system within Spellbound. I'm slowly migrating away from scripted expert systems towards a more dynamic and fluid AI system based on machine learning and neural networks. I may be crazy to attempt this, but I find this topic fascinating. I ended up having a mild existential crisis as a result of this. Let me know what you think or if I'm missing something.
Artificial Intelligence:
Objectives:
Spellbound is going to be a large open world with many different types of characters, each with different motives and behaviors. We want this open world to feel alive, as if the characters within the world are inhabitants. If we went with pre-scripted behavioral patterns, the characters would be unable to learn and adapt to changes in their environment. It would also be very labor intensive to write specific AI routines for each character. Ideally, we just give every character a self-adapting brain and let them loose to figure out the rest for themselves.
Core Premise: (very dense, take a minute to soak this in)
Intelligence is not a fixed intrinsic property of creatures. Intelligence is an emergent property which results directly from the neural topology of a biological brain. True sentience can be created if the neural topology of an intelligent being is replicated with data structures and the correct intelligence model. If intelligence is an emergent property, and emergent properties are simple rule sets working together, then creating intelligence is a matter of discovering the simple rule sets.
Design:
Each character has its own individual Artificial Neural Network (ANN). This is a weighted graph which uses reinforcement learning. Throughout the character's lifespan, the graph will become more weighted towards rewarding actions and away from displeasurable ones. Any time an action causes a displeasure to go away or brings a pleasure, that neural pathway will be reinforced. If a neural pathway has not been used in a long time, we reduce its weight. Over time, the creature will learn.
A SIMPLE ANN is just a single cluster of connected neurons. Each neuron is a “node” which is connected to nearby neurons. Each neuron receives inputs and generates outputs. The neural outputs always fire and activate a connected neuron. When a neuron receives enough inputs, it itself fires and activates downstream neurons. So, a SIMPLE ANN receives input and generates outputs which are a reaction to the inputs. At the end of neural cycle, we have to give response feedback to the ANN. If the neural response was positive, we strengthen the neural pathway by increasing the neural connection weights. If the response was negative, we decrease the weights of the pathway. With enough trial runs, we will find the neural pathway for the given inputs which creates the most positive outcome.
The SIMPLE ANN can be considered a single cluster. It can be abstracted into a single node for the purposes of creating a higher layer of connected node networks. When we have multiple source inputs feeding into our neural network cluster and each node is running its most optimal neural pathway depending on the input, we get complex unscripted behavior. A brain is just a very large collection of layered neural nodes connected to each other. We’ll call this our “Artificial Brain” (AB)
Motivation, motivators (rule sets):
-All creatures have a “desired state” they want to achieve and maintain. Think about food. When you have eaten and are full, your state is at an optimally desired state. When time passes, you become increasingly hungry. Being just a teensy bit hungry may not be enough to compel you to change your current behavior, but as time goes on and your hunger increases, your motivation to eat increases until it supersedes the motives for all other actions. We can create a few very simple rules to create complex, emergent behavior.
Rule 1: Every creature has a desired state they are trying to achieve and maintain. Some desired states may be unachievable (ie, infinite wealth)
Rule 2: States are changed by performing actions. Actions may change one or more states at once (one to many relationship).
Rule 3: “Motive” is created by a delta between current state (CS) and desired state (DS). The greater the delta between CS and DS, the more powerful the motive is. (Is this a linear graph or an exponential graph?)
Rule 4: “relief” is the sum of all deltas between CS and DS provided by an action.
Rule 5: A creature can have multiple competing motives. The creature will choose the action which provides the greatest amount of relief.
Rule 6: Some actions are a means to an end and can be chained together (action chains). If you’re hungry and the food is 50 feet away from you, you can’t just start eating. You first must move to the food to get within interaction radius, then eat it.
Q: How do we create an action chain?
Q: How do we know that the action chain will result in relief?
A: We generally know what desired result we want, so we work backwards. What action causes desired result (DR)? Action G does (learned from experience). How do we perform Action G? We have to perform Action D, which causes Action G. How do we cause Action D? We perform Action A, which causes Action D. Therefore, G<-D<-A; So we should do A->D->G->DR. Back propagation may be the contemporary approach to changing graph weights, but it's backwards.
Q: How does long term planning work?
Q: What is a conceptual idea? How can it be represented?
A: A conceptual idea is a set of nodes which is abstracted to become a single node?
Motivators: (Why we do the things we do)
Hunger
Body Temperature
Wealth
Knowledge
Power
Social Validation
Sex
Love/Compassion
Anger/Hatred
Pain Relief
Fear
Virtues, Vices & Ethics
Notice that all of these motivators are actually psychological motivators. That means they happen in the head of the agent rather than being a physical motivator. You can be physically hungry, but psychologically, you can ignore the pains of hunger. The psychological thresholds would be different per agent. Therefore, all of these motivators belong in the “brain” of the character rather than all being attributes of an agents physical body. Hunger and body temperature would be physical attributes, but they would also be “psychological tolerances”.
Psychological Tolerances:
{motivator} => 0 [------------|-----------o----|----] 100
A B C D E
A - This is the lowest possible bound for the motivator.
B - This is the lower threshold point for the motivator. If the current state falls below this value, the desired state begins to affect actions.
C - This is the current state of the motivator.
D - This is the upper threshold point for the motivator. If the current state exceeds this value, the desired state begins to affect actions.
E - This is the highest bounds for the motivator.
The A & E bounds values are fixed and universal.
The B and D threshold values vary by creature. Where you place them can make huge differences in behavior.
Psychological Profiles:
We can assign a class of creatures a list of psychological tolerances and assign their current state to some preset values. The behavioral decisions and subsequent actions will be driven by the psychological profile based upon the actions which create the sum of most psychological relief. The psychological profile will be the inputs into an artificial neural network, and the outputs will be the range of actions which can be performed by the agent. Ideally, the psychological profile state will drive the ANN, which drives actions, which changes the state of the psychological profile, which creates a feedback loop of reinforcement learning.
Final Result:
We do not program scripted behaviors, we assign psychological profiles and lists of actions. Characters will have psychological states which drive their behavioral patterns. Simply by tweaking the psychological desires of a creature, we can create emergent behavior resembling intelligence. A zombie would always be hungry, feasting on flesh would provide temporary relief. A goblin would have a strong compulsion for wealth, so they'd be very motivated to perform actions which ultimately result in gold. Rather than spending lots of time writing expert systems styled AI, we create a machine learning type of AI.
Challenges:
I have never created a working artificial neural network type of AI.
Experimental research and development:
The following notes are crazy talk which may or may not be feasible. They may need more investigation to measure their merit as viable approaches to AI.
Learning by Observation:
Our intelligent character doesn’t necessarily have to perform an action themselves to learn about its consequences (reward vs regret). If they watch another character perform an action and receive a reward, the intelligent character creates a connection between an action and consequence.
Exploration Learning:
A very important component to getting an simple ANN to work most efficiently is to get the neurons to find and establish new connections with other neurons. If we have a neural connection topology which always results in a negative response, we’ll want to generate a new connection at random to a nearby neuron.
Exploration Scheduling:
When all other paths are terrible, the new path becomes better and we “try it out” because there’s nothing better. If the new pathway happens to result in a positive outcome, suddenly it gets much stronger. This is how our simple ANN discovers new unscripted behaviors.
The danger is that we will have a sub-optimal behavior pattern which generates some results, but they’re not the best results. We’d use the same neural pathway over and over again because it is a well travelled path.
Exploration Rewards:
In order to encourage exploring different untravelled paths, we gradually increase the “novelty” reward value for taking that pathway. If traveling this pathway results in a large reward, the pathway is highly rewarded and may become the most travelled path.
Dynamic Deep Learning:
On occasion, we’ll also want to create new neurons at random and connect them to at least one other nearby downstream neuron. If a neuron is not connected to any other neurons, it becomes an “island” and must die. When we follow a neural pathway, we are looking at two costs: The connection weight and the path weight. We always choose the shortest path with the least weight. Rarely used pathways will have their weight decrease over a long period of time. If a path weight reaches zero, we break the connection and our brain “forgets” the neural connection.
Evolutionary & Inherited Learning:
It takes a lot of effort for a neural pathway to become developed. We will want to speed up the development. If a child is born to two parents, those parents will rapidly increase the neural pathways of the child by sharing their own pathways. This is one way to "teach". Thus, children will think very much like their parents do. Other characters will also share their knowledge with other characters. In order for knowledge to spread, it must be interesting enough to be spread. So, a character will generally share the most interesting knowledge they have.
Network Training & Evolutionary Inheritance:
An untrained ANN results in an uninteresting character. So, we have to have at least a trained base preset for a brain. This is consistent with biological brains because our brains have been pre-configured through evolutionary processes and come pre-wired with certain regions of the brain being universally responsible for processing certain input types. The training method will be rudimentary at first, to get something at least passable, and it can be done as a part of the development process.
When we release the game to the public, the creatures are still going to be training. The creatures which had the most “success” will become a part of the next generation. These brain configurations can be stored on a central database somewhere in the cloud. When a player begins a new game, we download the most recent generation of brain configurations. Each newly instanced character may have a chance to have a random mutation. When the game completes, if there were any particular brains which were more successful than the current strain, we select it for “breeding” with other successful strains so that the next generation is an amalgamation of the most successful previous generations. We’ll probably begin to see some divergence and brain species over time?
Predisposition towards Behavior Patterns via bias:
Characters will also have slight predispositions which are assigned at birth. 50% of their predisposition is innate to their creature class. 25% is genetically passed down by parents. 25% is randomly chosen. A predisposition causes some pleasures and displeasures to be more or less intense. This will skew the weightings of a developing ANN a bit more heavily to favor particular actions. This is what will create a variety in interests between characters, and will ultimately lead to a variety in personalities. We can create very different behavior patterns in our AB’s by tweaking the amount of pleasure and displeasure various outputs generate for our creature. The brain of a goblin could derive much more pleasure from getting gold, so it will have strong neural pathways which result in getting gold.
AI will be able to interact with interactable objects. An interactable object has a list of ways it can be interacted with. Interactable objects can be used to interact with other interactable objects. Characters are considered to be interactable objects. The AI has a sense of ownership for various objects. When it loses an object, it is a displeasurable feeling. When they gain an object, it is a pleasurable feeling. Stealing from an AI will cause it to be unhappy and it will learn about theft and begin trying to avoid it. Giving a gift to an AI makes it very happy. Trading one object for another will transfer ownership of objects. There is no "intrinsic value" to an object. The value of an object is based on how much the AI wants it compared to how much it wants the other object in question.
Learning through Socialization:
AI's will socialize with each other. This is the primary mechanism for knowledge transfer. They will generally tell each other about recent events or interests, choosing to talk about the most interesting events first. If an AI doesn't find a conversation very interesting, they will stop the conversation and leave (terminating condition). If a threat is nearby, the AI will be very interested in it and will share with nearby AI. If a player has hurt or killed a townsfolk, all of the nearby townsfolk will be very upset and may attack the player on sight. If enough players attack the townsfolk, the townsfolk AI will start to associate all players with negative feelings and may attack a player on sight even if they didn't do anything to aggravate the townsfolk AI.
What is it that you're hoping that an AI like this adds to Spellbound?
The mindless undead that attack the player... well they're mindless, so they don't really need an AI, right? For an AI like this to be applied to an end boss or other main character that may or may not be adversarial, I would think that you would run the risk of not providing a consistent experience for players when the boss is encountered. Which then would leave assorted NPCs like villagers to be given the AI, the main purpose of which, from a game play perspective, I would think would be to generate side quests which I would expect there to be simpler ways to go about generating.