
Artificial Morality

Started by AngleWyrm, December 15, 2009 07:12 PM
2 comments, last by wodinoneeye 15 years, 2 months ago
A recent thread on Moral Complexity and Character Identification got me designing a model for agent morality, which exhibits some interesting meta-behaviors. Among them are curiosity, the pursuit of happiness, judgement, morality, and the beginnings of storytelling and trust. Sound good so far? It's especially pretty considering the simplicity of its construction.

A Model for Decisions

[Image: Directed graph representation]

For a virtual decision space, a series of choices is modeled as a journey along a directed graph. Within this graph, each node has at least two outbound edges, representing a choice of directions to take to the next decision. Add a further distinction between significant and insignificant decisions by defining a significant decision as a choice that leads to a change in future choices. This crops out paths that refer back to the originating node, and duplicate nodes that offer no difference in their sources and destinations. The space of all decisions is thus not covered; only the subset deemed significant. Furthermore, this model imposes a containment within the graph, so that there is a single, atomic level of detail over which to navigate. With a clear framework in place, it is then possible to look into the nature of the decisions made at each node, and how to get to the next one.

Memory and Judgement

[Image: Agent using Memory and Judgement]

Choosing one path over another requires only one thing: that we rank one choice over another. This could be as arbitrary as flipping a coin, or first come, first served. Or it could be more involved, including assessing the value of future positions and matching present circumstance with a memory of similar situations and their outcomes. One popular method of scoring among natural life on earth is Good attractors vs Bad repulsors: Happiness, Growth and Fulfillment vs Pain, Hunger and Death. This simple binary partition of the possibilities is enough to recursively subdivide the choices to any level of detail. It's a hardware-level function implemented as a nervous system and neurochemicals feeding into a neural net. This function summarizes present circumstance in the simplest of terms, Good or Bad, the seed of Judgement. It's a boolean utility function that returns Good/Bad. During the comparison of two choices, a memory of previous outcomes is employed, which contains the sum of previous judgements; namely, how many Good/Bad votes did each choice accumulate? The memory function is a two-step process. The first step is Memory-Retrieval, where a lookup function returns a coordinate pair {GoodPoints, BadPoints}. The winner of the comparison is then determined first by who has the fewest BadPoints and then by who has the most GoodPoints. The second step is Memory-Storage, a post-process of the judgement utility function. This two-step process thus has both a relative comparison (between choices) and an 'absolute' comparison, where the value is relative to self.

Morality as Summarization

One view of moral and immoral choices is as a classifier of memories long forgotten and lessons learned through others but not experienced. Accepting a summarization of the experiences of many in the form of a code of conduct can be more efficient than individual trial-and-error. Within the model, this can be seen as adopting a set of {GoodPoints, BadPoints} for choices that haven't been made by that agent, but are instead the summary of another agent's experiences. By sharing their summaries, agents can pass on their knowledge and experience of their world.
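Here's a minimal sketch of the above in Python, just to make the moving parts concrete. The names (DecisionNode, AgentMemory, judge, choose) are mine and purely illustrative; nothing in the model depends on them.

class DecisionNode:
    """A node in the directed decision graph; edges lead to further decisions."""
    def __init__(self, name):
        self.name = name
        self.edges = []          # outbound choices, each another DecisionNode

class AgentMemory:
    """Memory-Retrieval and Memory-Storage of {GoodPoints, BadPoints} per choice."""
    def __init__(self):
        self.scores = {}         # choice name -> [good_points, bad_points]

    def lookup(self, choice):
        """Memory-Retrieval: return the {GoodPoints, BadPoints} pair for a choice."""
        return self.scores.get(choice.name, [0, 0])

    def store(self, choice, was_good):
        """Memory-Storage: post-process of the judgement utility function."""
        good, bad = self.lookup(choice)
        if was_good:
            good += 1
        else:
            bad += 1
        self.scores[choice.name] = [good, bad]

def judge(outcome_value):
    """Boolean utility function: summarize the present circumstance as Good or Bad."""
    return outcome_value >= 0

def choose(memory, node):
    """Rank choices: fewest BadPoints wins, ties broken by most GoodPoints."""
    def rank(choice):
        good, bad = memory.lookup(choice)
        return (bad, -good)
    return min(node.edges, key=rank)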
Such sharing of knowledge potentially improves the survivability of both agents. So even though the representation of choice includes only a boolean Good/Bad judgement of the present outcome and a running summary of those judgements, the agents exhibit several emergent features: curiosity, a potential to share their experiences, and an internal representation of the right thing to do, as agreed upon by their society. [Edited by - AngleWyrm on December 19, 2009 8:16:32 PM]
--"I'm not at home right now, but" = lights on, but no ones home
I like this kind of approach, although I think you set the expectations too high at the beginning of the post. :)

Your greedy rule for selecting an action (pick the one with the highest frequency of good outcomes) is a bit simplistic. If one option gave you a bad result the first time around and its alternative gave you a good result, you'll never give the first option a second chance, even if you revisit this node a million times. The theory of multi-armed bandits has more sophisticated solutions that balance exploration and exploitation, like the epsilon-greedy strategy, or Upper Confidence Bounds (UCB).
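For instance, here's a quick Python sketch of epsilon-greedy and UCB1 over your GoodPoints/BadPoints counts; the memory.lookup interface is just assumed from your post, and the constants are the usual textbook defaults:

import math
import random

def epsilon_greedy(memory, node, epsilon=0.1):
    """With probability epsilon explore a random choice; otherwise exploit the best one."""
    if random.random() < epsilon:
        return random.choice(node.edges)
    def good_rate(choice):
        good, bad = memory.lookup(choice)
        total = good + bad
        return good / total if total else 0.5   # neutral default for untried choices
    return max(node.edges, key=good_rate)

def ucb1(memory, node, c=math.sqrt(2)):
    """Upper Confidence Bound: estimated value plus an exploration bonus."""
    total = sum(sum(memory.lookup(ch)) for ch in node.edges)   # visits at this node
    def score(choice):
        good, bad = memory.lookup(choice)
        n = good + bad
        if n == 0:
            return float('inf')                 # always try untried choices first
        return good / n + c * math.sqrt(math.log(total) / n)
    return max(node.edges, key=score)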

If you do this computation speculatively and in a Monte-Carlo style (running many simulations to accumulate statistics), the resulting planning algorithm is called Upper Confidence Bounds applied to Trees (UCT). Well, "trees" or "directed graphs"... same difference. This has been very successful in computer go. It can be applied to all sorts of situations: minimax (as in the go example), planning under uncertainty, games with more than 2 players (you would need to accumulate counts for how satisfactory the results have been for each player)...
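A rough skeleton of that loop, reusing the ucb1/judge/choose sketches above; the leaf evaluation (e.g. a random rollout) is left as an assumed function, and a real UCT would also keep separate statistics per tree node rather than per choice name:

def uct_search(root, memory, simulations, evaluate):
    """Run many simulated walks from the root, backing each Good/Bad judgement up the path."""
    for _ in range(simulations):
        path, node = [], root
        while node.edges:                  # selection: descend by UCB1 until a leaf
            node = ucb1(memory, node)
            path.append(node)
        was_good = judge(evaluate(node))   # Monte-Carlo evaluation of the leaf position
        for visited in path:               # back-propagation of the judgement
            memory.store(visited, was_good)
    return choose(memory, root)            # finally commit to the best-looking choice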

I think the morality will only really come into play when you incorporate a more interesting utility function, which can then be parametrized by the agent's personality and preferences. For the decisions in the tree that are taken by other agents, you need a model of the personality of those agents. You can also make a poker-playing program this way.

Or perhaps I misinterpreted your idea altogether. :)
Quote:
Original post by alvaro
I like this kind of approach, although I think you set the expectations too high at the beginning of the post. :)

Your greedy rule for selecting an action (pick the one with the highest frequency of good outcomes) is a bit simplistic. If one option gave you a bad result the first time around and its alternative gave you a good result, you'll never give the first option a second chance, even if you revisit this node a million times.

Looks a lot like First Impressions. It seems a legitimate outcome, although possibly too effective. I'm leaning towards a noisy selector in the Decision phase, so that a bit of low-confidence randomness may offer 'accidental' exploration. Maybe a fuzzy logic decision: Instead of choosing the highest of the two, choose randomly, with their goodPoints and badPoints used as chance weights. More research is in order :)
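Something like this, perhaps; the add-one smoothing is just one guess at how to keep untried or badly-scored options alive:

import random

def noisy_choose(memory, node):
    """Weighted random selection: GoodPoints/BadPoints act as chance weights."""
    weights = []
    for choice in node.edges:
        good, bad = memory.lookup(choice)
        weights.append((good + 1) / (good + bad + 2))   # smoothed good-rate as weight
    return random.choices(node.edges, weights=weights, k=1)[0]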
Quote:
I think the morality will only really come into play when you incorporate a more interesting utility function, which can then be parametrized by the agent's personality and preferences. For the decisions in the tree that are taken by other agents, you need a model of the personality of those agents. You can also make a poker-playing program this way.

If we define personality as a sum of experiences that is expressed in our choices, then we can say that agents using this memory model present such personalities. This also makes personality something that develops with larger jumps early on, and eventually begins to settle near a social norm.

Instilling character into an agent would constitute giving it various (possibly false) summaries of which choices are the best thing to do. The 'age' of an agent also becomes a natural part of who they are, as measured in total GoodPoints/BadPoints contained in their memory. A young and impressionable agent with high ideals could be represented by having very few points in their starting memory. A grizzled vet would have many points applied.
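As a sketch, assuming the AgentMemory layout from the first post; seed_memory and agent_age are placeholder names of my own:

def seed_memory(memory, code_of_conduct, weight=1):
    """Instill character: adopt another agent's {good, bad} summaries, scaled by weight."""
    for name, (good, bad) in code_of_conduct.items():
        g, b = memory.scores.get(name, [0, 0])
        memory.scores[name] = [g + good * weight, b + bad * weight]

def agent_age(memory):
    """'Age' as total points held: few points = young and impressionable, many = grizzled vet."""
    return sum(good + bad for good, bad in memory.scores.values())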

[Edited by - AngleWyrm on December 19, 2009 8:37:36 PM]
--"I'm not at home right now, but" = lights on, but no ones home
You might seek a different term than "Morality", as that is not what's good/bad for an individual but what behavior is seen as 'proper' for a society. The proper behaviors are decided upon for their advantages to the group (which does include a good amount of advantage for the individual, who mostly has to continue to exist and function).

A particular decision is likewise based on the current situation where different goals may have different priorities.

Consider also that decisions are based on their effect on the current situation and the likelihood of a desired change happening (effectiveness), which is then weighed against the cost/risk (resources/time/danger). Certain desired future situations can offer more potential options (higher probabilities of achieving goals), and thus the 'utility' of achieving that future position becomes a significant consideration. With uncertainties increasing further into the future, positions of versatility should have high value to achieve.
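As a rough illustration only (every term here, and the discounting of distant outcomes, is an assumption on my part):

def action_score(p_success, future_value, cost, risk, discount=0.9, depth=1):
    """Weigh the likelihood of the desired change and the value of the resulting
    position against the cost/risk, discounting positions further in the future."""
    expected_gain = p_success * future_value * (discount ** depth)
    return expected_gain - (cost + risk)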

There is the usual problem in a complex system of classifying the situation to symbolize/generalize the factors and to winnow out the insignificant. The (memory) logic grows geometrically with the situational complexity (and therefore so do the test cases required to train/decide on the formation of that logic).

A learning system would have to take into account the temporal flow of a situation, both in making a decision and in finding the future impact, to gauge the full effect of the action taken (to evaluate for the 'memory'). Combinations of actions lead to outcomes, so the single-edge transition model is insufficient to visualize the mechanism.
--------------------------------------------Ratings are Opinion, not Fact

