
Dialogue AI

Started by February 29, 2004 04:20 AM
9 comments, last by Crispy 20 years, 11 months ago
I've done a lot in my life, starting with simple algorithms and going all the way up to rather complex solutions for different problems. I have, however, never even touched AI before (apart from pathfinding). Right now I'm poking around to see what the best ways are to have the computer generate sensible dialogue based on an adaptive memory system. Basically, what I'm trying to simulate is a group of 5 people or so who live near each other. I am the sixth person. Their lives are intermingled and therefore slowly generate a database for each of them, based on which I want to make them talk. A common scenario:

      to Salisbury
 ______  |   |
|      |_|   |
|  P1 \ _|   |
|______| |   |   ______
         |   |__|      |
 ______  |    __ / P4  |
|      |_|   |  |______|
|  P2 \ _|   |
|______| |   |_______
         |    ____   |
 ______  |   |    |  |
|      |_|   |   _|  |____
|  P3 \ _|   |  | /  \    |
|______| |   |  |   P5    |
         |   |  |_________|
         |   |
      to Heathrow
Random facts:

P1
- is a businessman from Salisbury, who goes there every weekday
- has a wife and 2 children

P2
- is a gardener who works for P3
- is single, but lives with a young girl as a housemate

P3
- is a gene engineer in Heathrow
- goes there every Tue, Thu and Fri

P4
- is a gardening expert from Heathrow
- visits this place every weekend

P5
- is a hermit
- never speaks to people
- is sometimes seen staring at the stars on his porch

I am a random visitor whom everyone here knows quite well. I can visit any time I like and ask any of the people any question. What I'd like to do is come up with an efficient data structure to hold each of these people's memories for, say, a month. I'd also like to be able to ask them questions about their community and make basically random queries against their memories. I hacked through some basic concepts on my own, but didn't manage to come up with anything too useful. I suppose some related tutorials would be in order. I know dialogue generation is a slightly different problem from traditional decision-making AI - can anyone direct me to some resources on that subject or perhaps share their own expertise?
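
A minimal Python sketch of one way the per-person memory asked about here could be laid out. It assumes each memory is a timestamped (subject, relation, value) fact kept for a 30-day window. The names Fact, Memory, remember, forget_before and ask are made up for illustration and do not come from any existing library.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Fact:
    day: int        # day on which the fact was learned
    subject: str    # who the fact is about, e.g. "P3"
    relation: str   # e.g. "job", "commutes_on", "seen"
    value: str      # e.g. "gene engineer in Heathrow"


@dataclass
class Memory:
    """One person's memory: facts indexed by subject for quick lookup."""
    owner: str
    horizon_days: int = 30
    by_subject: dict = field(default_factory=lambda: defaultdict(list))

    def remember(self, fact):
        self.by_subject[fact.subject].append(fact)

    def forget_before(self, today):
        # Drop everything older than the horizon (the "one month" window).
        cutoff = today - self.horizon_days
        for subject in list(self.by_subject):
            kept = [f for f in self.by_subject[subject] if f.day > cutoff]
            self.by_subject[subject] = kept

    def ask(self, subject, relation=None):
        # "Random query": everything known about a subject, optionally
        # narrowed down to one kind of relation.
        facts = self.by_subject.get(subject, [])
        return [f for f in facts if relation is None or f.relation == relation]


# What P2 (the gardener) might know after a few days.
p2 = Memory(owner="P2")
p2.remember(Fact(1, "P3", "job", "gene engineer in Heathrow"))
p2.remember(Fact(3, "P3", "commutes_on", "Tue, Thu, Fri"))
p2.remember(Fact(5, "P5", "seen", "staring at the stars on his porch"))
```

Indexing by subject keeps "what do you know about P3?" cheap. A second index by relation or by day could be added the same way if those queries turn out to be common.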

"Finishing in second place, simply means you are the first loser." - PouyaCat
"Literally, it means that Bob is everything you can think of, but not dead; i.e., Bob is a purple-spotted, yellow-striped bumblebee/dragon/pterodactyl hybrid with a voracious addiction to Twix candy bars, but not dead."- kSquared
Have a look at www.alicebot.org. While it is not exactly what you want, checking out AIML and the Graphmaster algorithm might help you come up with a solution. AIML is an extensible markup language for artificial intelligence in the area of dialogue. Most of the stuff presented there is really geared toward a single entity that interacts with a large number of humans, but it does have mechanisms for storing facts and bits of conversations for later retrieval, and it shouldn't be too hard to extend to multiple agents. You probably want to start out with a very minimal AIML set so as not to overwhelm your agents when they start talking to each other.

Join and search the alicebot general mailing list and you will find a wealth of information, from experts in the areas of psychology, linguistics and story writing.

There are several open source implementations of the Graphmaster in the downloads area. 'Program D' is probably the most widely documented, and 'Program N' is probably the most frequently updated, and has recently been integrated with the Cyc inference engine.

Peace
Thanks a lot Krippy! That's a pretty amazing piece of software right there. I'll probably be starting a thread in the Lounge soon enough about some of the funnyish foul-ups I got Alice to say. Anyway, on a serious note - I have a question. I read most of the information there that seemed useful for someone about to implement a simple version of it, but I didn't quite understand the following:

In this article, ALICE's graphing system is explained, which is pretty simple. However, I'm having trouble understanding what the underscore denotes. The star is a simple wildcard as I understand it. For instance, "Are you *?" would expand to:

+ are
  + you
    + *


where the asterisk stands for, say, any of a set of properties such as "sad", "slim", "stupid", "very hot", etc.

However, how would a phrase with an underscore expand?


Hi,

When searching through the graph, the node precedence is as follows:

1) match _
2) match exact word
3) match *

So for instance, if you have a category with the pattern "_", it would catch ALL input. In contrast, if you have a category with the pattern "*" it will act as a default category. If no other categories match, it will fall back to the "*" category.

For this reason, the _ is used sparingly. In general it is used when the words after the wildcard are most important, or when you want to do a word replacement.

The underscore acts just like the wildcard in all other regards, as far as the variables are concerned.

There is an article somewhere, either on alicebot.org or aiml.info, that deals with ISA, HASA, etc. relationships in AIML; it shows a great example of when the underscore is useful. I think the article title has "symbolic reduction" in it, which is the concept that is really at the core of AIML. If I run across it again I'll post the link here.

Peace

[edited by - krippy2k on February 29, 2004 11:09:49 PM]
I guess to answer your question more directly...

If you have the following patterns:
ARE YOU *
ARE YOU OKAY
ARE YOU _

The representation would be:
+ ARE
  + YOU
    + _
    + OKAY
    + *


In this case, the pattern "ARE YOU _" overrides the OKAY and *. So even if the input is "ARE YOU OKAY", the pattern matched will be "ARE YOU _". The input "ARE YOU CRAZY" would also match the pattern "ARE YOU _".

If you didn't have the "_", you would have this:

+ ARE
  + YOU
    + OKAY
    + *


Now the input "ARE YOU OKAY" would match the "ARE YOU OKAY" pattern. The input "ARE YOU CRAZY" would match the "ARE YOU *" pattern.
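
A small Python sketch of the precedence described above ("_" first, then the exact word, then "*"), assuming patterns are stored word by word in a tree. Node, add_pattern and match are hypothetical names, and this is only an illustration of the idea: a real Graphmaster also records what each wildcard captured and handles the THAT and TOPIC parts of the path.

```python
class Node:
    def __init__(self):
        self.children = {}    # word -> Node; "_" and "*" are stored as ordinary keys
        self.pattern = None   # set on the node where a complete pattern ends


def add_pattern(root, pattern):
    node = root
    for word in pattern.split():
        node = node.children.setdefault(word, Node())
    node.pattern = pattern


def match(node, words):
    """Return the first pattern found, trying "_", then the exact word, then "*"."""
    if not words:
        return node.pattern
    head, rest = words[0], words[1:]
    for key in ("_", head, "*"):
        child = node.children.get(key)
        if child is None:
            continue
        if key in ("_", "*"):
            # A wildcard swallows one or more words; try 1, 2, ... in turn.
            for i in range(1, len(words) + 1):
                found = match(child, words[i:])
                if found:
                    return found
        else:
            found = match(child, rest)
            if found:
                return found
    return None


with_underscore = Node()
for p in ("ARE YOU *", "ARE YOU OKAY", "ARE YOU _"):
    add_pattern(with_underscore, p)

without_underscore = Node()
for p in ("ARE YOU *", "ARE YOU OKAY"):
    add_pattern(without_underscore, p)
```

With these two trees, match(with_underscore, "ARE YOU OKAY".split()) lands on "ARE YOU _", while the same input against without_underscore lands on "ARE YOU OKAY", which is the behaviour described in the two cases above.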

Peace

[edited by - krippy2k on February 29, 2004 11:56:30 PM]
I had a look around on both sites for the use of the underscore, but unfortunately found nothing useful. Basically - I'm not stupid (I hope) - I can understand the general structure, but it seems to me that the underscore and the list of atomic choices are duplicates of each other, even if they are treated as separate things. To bring out the conflict as it appears to me, consider the following:

NM0
+ Be
  + _ (links to nodemapper NM1)
  + ready to rumble
  + *
NM1
+ happy
+ satisfied
+ content


What's different between "Be ready to rumble" and "Be content"?



Never mind - found an answer here.


You may also want to visit www.elbot.com.

edit: go to www.kiwilogic.com for details regarding the chatbot.

[edited by - pacman2003 on March 2, 2004 2:48:39 PM]
Thanks for the links pacman.

I've been trying to implement a simple "5-question" chatbot of my own (in C++) and quite honestly I'm having trouble with efficient pattern matching. Deviating from the ALICE method, which uses XML as the basic means of pattern/response description, I chose a somewhat different method, which is also iterative, but in a different sense:

1) matching is done only against the simplest of patterns, which include one type of wildcard and base forms of certain words. For instance:

Pattern: WHO _BE_ *

This would handle any sentences from: "who are you?" to "who was the president of the United States in 1946?" and so forth.

2) Once the wildcard part(s) is/are identified, any articles are removed and the leftovers can be processed independently. For instance:

Input: "who was the president of the United States in 1946?"
After pattern-matching: which living being + past form of be + "president of United States in 1946"
Object matching: search for a match for "which living being" in the wildcard part: "president of United States"

How the rest of the sentence ("in 1946") is handled is irrelevant at the moment. My aim is to have the bot answer simple questions such as "what is your name?", "who is McGyver?", etc. based on its knowledge base.
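
A rough Python sketch of the two steps described above for the single pattern WHO _BE_ *: match against the base form of "be", then strip articles from the wildcard part. It is written in Python rather than C++ purely for brevity. BE_FORMS, ARTICLES, tokenize and match_who_be are illustrative names, and the word lists are assumptions that are far from complete.

```python
import re

BE_FORMS = {"am", "is", "are", "was", "were", "be", "been"}   # assumed, not exhaustive
PAST_FORMS = {"was", "were", "been"}
ARTICLES = {"a", "an", "the"}


def tokenize(sentence):
    # Lower-case the input and keep only words and numbers.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def match_who_be(sentence):
    """Match the pattern WHO _BE_ * and return (tense, wildcard words), or None."""
    words = tokenize(sentence)
    if len(words) < 3 or words[0] != "who" or words[1] not in BE_FORMS:
        return None
    tense = "past" if words[1] in PAST_FORMS else "present"
    # Step 2: drop articles so the leftovers can be processed independently.
    rest = [w for w in words[2:] if w not in ARTICLES]
    return tense, rest
```

The returned word list is what a later object-matching step would search for a "living being"; how that search works is left open here, as it is in the post.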



RANT

After a little bit of reading up, I regard ALICE's structural buildup as flawed if the goal is the ultimate dialog AI engine - simply because it relies too much on static replies and doesn't really have any "emotion" coded into it. That is, it relies on a huge knowledge base, but does not draw connections between facts very efficiently based on the nature of the input.

It only takes one look at the way we think to see that ALICE is a whole lot different from that. Humans effectively take a bunch of phrases and sentences and mix and match them based on the situation, whereas ALICE relies on the answer being hard-coded in its memory.

For instance, consider the following example as input: "I'm taken aback". While ALICE, with all its knowledge base, searches for a match for "be taken aback", it fails to take the shortcut of finding a logical answer to the statement rather than a hard-coded one. Suppose "be taken aback" wasn't added to the bot's knowledge base itself, but to a list of synonyms, which would cut down the size of the knowledge base. Suppose "be taken aback" was resolved to a more common word or phrase, such as "astonished", before being processed, so that the bot wouldn't match against "I'm taken aback", but rather "I'm astonished". If "astonished" was chosen as the base synonym for "very surprised", "speechless", etc. as well, this would greatly simplify the pattern matcher's job and improve its chances of actually finding a match at all. In my view, this would serve far better as AI than direct matching against the input.

Alice's replies are often ill-formed and fall out of context even when the context is clear, simply because there is no direct match for an input. A rather rudimentary look-up dictionary like this would solve most such problems.
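
A minimal Python sketch of the look-up dictionary idea from this rant, assuming every phrase is mapped to one chosen base synonym before any pattern matching happens. The SYNONYMS table and the normalize function are hypothetical, and the table holds only the examples mentioned above.

```python
# Hypothetical table: each phrase maps to the one base form chosen for it.
SYNONYMS = {
    "taken aback": "astonished",
    "very surprised": "astonished",
    "speechless": "astonished",
}
MAX_PHRASE_LEN = max(len(p.split()) for p in SYNONYMS)


def normalize(text):
    """Replace known phrases with their base synonym, longest phrase first."""
    words = text.lower().strip("!?. ").replace("i'm", "i am").split()
    out, i = [], 0
    while i < len(words):
        for n in range(MAX_PHRASE_LEN, 0, -1):
            if i + n > len(words):
                continue                      # window would run off the end
            phrase = " ".join(words[i:i + n])
            if phrase in SYNONYMS:
                out.append(SYNONYMS[phrase])
                i += n
                break
        else:
            out.append(words[i])              # no phrase starts here; keep the word
            i += 1
    return " ".join(out)
```

The pattern matcher then only needs one entry for "i am astonished" instead of one per synonym. Punctuation inside the sentence is not handled in this sketch.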

Comments?

END OF RANT



Back to my problem: functional programming provides quite a few simple tools for doing pattern matching, while structural languages, such as C++, fail at that miserably. Simple patterns are easy to match (such as "WHO _BE_ *"); however, more complex ones can mess the algorithm up quite a bit (such as "WHO * _BE_ * _WHEN_ *", to match the example sentence with the president of the US). Pattern matching also includes matching the analyzed sentence structure to the data in memory. While the documentation on the bots this thread has linked to has been of great help to me, it hasn't really given me an idea of how exactly to store the stuff in the bot's memory. I suppose neural networks would be one solution, although I don't really know how they work precisely.


Hi Crispy,

Well, while Alice is not the 'perfect' dialog machine, the things you say it lacks are actually at the core of what Alice is all about. Unfortunately, the ALICE that you talk to on alicebot.org is a general Alice that uses a very general and limited AIML set. In order to converse with the 'real' Alice (also known as Silver Alice), you need to be a member of the Alice AI Foundation. You are basically talking to the 'base' set of AIML categories that is distributed as a starting point for bot creation, with a small set of ALICE personality thrown in.

That being said, the concept you are talking about of 'synonyms' is at the heart of the ALICE paradigm. This is what is known as 'symbolic reduction'.

Let's take your case example.

quote:

For instance, consider the following example as input: "I'm taken aback". While ALICE, with all its knowledge base, searches for a match for "be taken aback", it fails to take the shortcut of finding a logical answer to the statement rather than a hard-coded one. Suppose "be taken aback" wasn't added to the bot's knowledge base itself, but to a list of synonyms, which would cut down the size of the knowledge base. Suppose "be taken aback" was resolved to a more common word or phrase, such as "astonished", before being processed, so that the bot wouldn't match against "I'm taken aback", but rather "I'm astonished". If "astonished" was chosen as the base synonym for "very surprised", "speechless", etc. as well, this would greatly simplify the pattern matcher's job and improve its chances of actually finding a match at all. In my view, this would serve far better as AI than direct matching against the input.



In AIML, we would do it like this:

<category>
   <pattern>I M VERY SURPRISED</pattern>
   <template>
      For some reason, that doesn't surprise me.
   </template>
</category>
<category>
   <pattern>_ ASTONISHED</pattern>
   <template>
      <srai><star index="1"/> VERY SURPRISED</srai>
   </template>
</category>
<category>
   <pattern>_ TAKEN ABACK</pattern>
   <template>
      <srai><star index="1"/> ASTONISHED</srai>
   </template>
</category>


With this set of categories, this would be the dialog:

Client: I'm taken aback!
Alice: For some reason, that doesn't surprise me.


The SRAI tag is the symbolic reduction tag. First it expands the tags and text inside the SRAI element, then it takes what is generated and passes it back through the pattern matcher. So it first reduces I M TAKEN ABACK to I M ASTONISHED and passes I M ASTONISHED back through the Graphmaster. I M ASTONISHED is further reduced to I M VERY SURPRISED, which is fed into the Graphmaster again and matches a category that has no SRAI tag. One could also do this:

<category>
   <pattern>_ VERY _</pattern>
   <template>
      <srai><star index="1"/> <star index="2"/></srai>
   </template>
</category>
<category>
   <pattern>I M SURPRISED</pattern>
   <template>
      For some reason, that doesn't surprise me.
   </template>
</category>





In this case, the first category effectively removes the extraneous 'very', which would reduce I M VERY SURPRISED to I M SURPRISED.... it would also serve to reduce I M VERY VERY VERY VERY VERY VERY VERY VERY VERY VERY VERY SURPRISED to I M SURPRISED.
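
The reduction loop can be mimicked in a few lines of Python. This is only a sketch of the idea, not an AIML interpreter: RULES, to_regex and respond are made-up names, a template starting with "SRAI " stands in for the srai element, {1} and {2} stand in for the star elements, and the rules are simply tried in list order instead of going through a Graphmaster.

```python
import re

# (pattern, template) pairs modelled on the categories above.
RULES = [
    ("_ TAKEN ABACK", "SRAI {1} ASTONISHED"),
    ("_ ASTONISHED", "SRAI {1} VERY SURPRISED"),
    ("_ VERY _", "SRAI {1} {2}"),
    ("I M SURPRISED", "For some reason, that doesn't surprise me."),
]


def to_regex(pattern):
    # "_" captures one or more whole words; every other word must match literally.
    parts = [r"(\S+(?: \S+)*?)" if w == "_" else re.escape(w) for w in pattern.split()]
    return re.compile(" ".join(parts))


def respond(text, depth=0):
    if depth > 20:                 # guard against rules that reduce in a circle
        return None
    for pattern, template in RULES:
        m = to_regex(pattern).fullmatch(text)
        if m is None:
            continue
        if template.startswith("SRAI "):
            reduced = template[len("SRAI "):]
            for i, captured in enumerate(m.groups(), start=1):
                reduced = reduced.replace("{%d}" % i, captured)
            # Feed the reduced sentence back through the matcher.
            return respond(reduced, depth + 1)
        return template
    return None
```

Calling respond("I M TAKEN ABACK") walks through I M ASTONISHED, I M VERY SURPRISED and I M SURPRISED before it reaches a template without a reduction, and any number of repeated VERYs is stripped one per pass.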


In addition to the inherent AIML reduction, most of the AIML interpreters also have a file called 'substitutions.xml', which is a list of synonyms just like you described, as well as substitutions for first/second/third person conversions.

Now let's take your example a little bit further... What if we wanted ALICE to respond in context to the conversation? There are a couple of ways to facilitate this. First the THAT method. Consider the following categories:


<category>
   <pattern>WHAT IS DADA</pattern>
   <template>
      The first thing you hear when a shark approaches.
   </template>
</category>
<category>
   <pattern>* YOUR SEXUAL PREFERENCE</pattern>
   <template>
      I am gay.
   </template>
</category>
<category>
   <pattern>I M VERY SURPRISED</pattern>
   <template>
      For some reason, that doesn't surprise me.
   </template>
</category>
<category>
   <pattern>I M VERY SURPRISED</pattern>
   <that>* GAY</that>
   <template>
      Does my sexual preference really bother you?
   </template>
</category>
<category>
   <pattern>_ ASTONISHED</pattern>
   <template>
      <srai><star index="1"/> VERY SURPRISED</srai>
   </template>
</category>
<category>
   <pattern>_ TAKEN ABACK</pattern>
   <template>
      <srai><star index="1"/> ASTONISHED</srai>
   </template>
</category>


Possible dialog:

Client: What is Dada?
Alice: The first thing you hear when a shark approaches.
Client: I'm taken aback!
Alice: For some reason, that doesn't surprise me.
Client: What is your sexual preference?
Alice: I am gay.
Client: I'm taken aback!
Alice: Does my sexual preference really bother you?

...

In this case, there are 2 categories for I M VERY SURPRISED, one default, and one THAT. THAT refers to the last thing Alice said. You can use all wildcards in the THAT as well. Now any time you say any phrase that evaluates to I M VERY SURPRISED, it will first match the pattern I M VERY SURPRISED, and then check the THAT. THAT is added to the end of the search sequence as follows:

I'M TAKEN ABACK is first evaluated to I M VERY SURPRISED and then expanded to:

I M VERY SURPRISED <that> I AM GAY

And then the category pattern is expanded to:
I M VERY SURPRISED <that> * GAY

So we get a match.

....

The idea of context can further be expanded into topics. You can embed categories in topics that will only evaluate if their topic is set as the current topic. Some interpreters allow various types of nesting of topics, but I don't think the standard defines topic nesting. Anyway, here is an example:

<category>
   <pattern>WHAT IS DADA</pattern>
   <template>
      The first thing you hear when a shark approaches.
   </template>
</category>
<category>
   <pattern>* YOUR SEXUAL PREFERENCE</pattern>
   <template>
      <condition name="botsexualpreference">
         <li value="hetero">I'm straight as an arrow!</li>
         <li value="homo">I'm gay.</li>
         <li>Uhhh... I'm not really sure?</li>
      </condition>
      <think>
         <set name="topic">sexual preference</set>
      </think>
   </template>
</category>
<category>
   <pattern>I M VERY SURPRISED</pattern>
   <template>
      For some reason, that doesn't surprise me.
   </template>
</category>
<topic name="sexual preference">
   <category>
      <pattern>I M VERY SURPRISED</pattern>
      <template>
         Sexual preference is a boring subject, I'm surprised it surprises you.
      </template>
   </category>
   <category>
      <pattern>*</pattern>
      <template>
         Hmmmm. So, what is your preference?
      </template>
   </category>
</topic>
<category>
   <pattern>_ ASTONISHED</pattern>
   <template>
      <srai><star index="1"/> VERY SURPRISED</srai>
   </template>
</category>
<category>
   <pattern>_ TAKEN ABACK</pattern>
   <template>
      <srai><star index="1"/> ASTONISHED</srai>
   </template>
</category>


Client: What is your sexual preference?
Alice: I'm straight as an arrow!
Client: I'm taken aback!
Alice: Sexual preference is a boring subject, I'm surprised it surprises you.
Client: Well it does!
Alice: Hmmmm. So, what is your preference?

The TOPIC is expanded just like the THAT. In a normal input, this is what it would be like:

WHAT IS YOUR SEXUAL PREFERENCE <that> * <topic> *

Then the second input would look like:
I M VERY SURPRISED <that> I AM STRAIGHT AS AN ARROW <topic> SEXUAL PREFERENCE

and the relevant category would be defined as:
I M VERY SURPRISED <that> * <topic> SEXUAL PREFERENCE

Therefore, a match.
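
The expansion above can be sketched in Python as a flat word list in which <that> and <topic> are just marker tokens. build_path and matches are hypothetical helpers, and using a literal "*" for an unset THAT or TOPIC is an assumption of this sketch rather than something every interpreter is guaranteed to do.

```python
MARKERS = {"<that>", "<topic>"}


def build_path(user_input, that="*", topic="*"):
    # Input, last bot reply and current topic joined into one word sequence.
    return f"{user_input} <that> {that} <topic> {topic}".split()


def matches(pattern, path):
    """True if the pattern words match the path words; "*" and "_" take one or more words."""
    if not pattern:
        return not path
    if not path:
        return False
    head, rest = pattern[0], pattern[1:]
    if head in ("*", "_"):
        for i in range(1, len(path) + 1):
            if path[i - 1] in MARKERS:
                break                  # a wildcard never swallows a section marker
            if matches(rest, path[i:]):
                return True
        return False
    return head == path[0] and matches(rest, path[1:])


in_topic = "I M VERY SURPRISED <that> * <topic> SEXUAL PREFERENCE".split()
after_gay = "I M VERY SURPRISED <that> * GAY <topic> *".split()
default = "I M VERY SURPRISED <that> * <topic> *".split()

path = build_path("I M VERY SURPRISED",
                  that="I AM STRAIGHT AS AN ARROW",
                  topic="SEXUAL PREFERENCE")
```

Against this path, in_topic and default both match while after_gay does not. Picking the more specific of the two matching categories is the job of the precedence rules, which this sketch leaves out.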

...

Anyway, if you really need more powerful cognitive functionality, and actual concept evaluation, you might consider looking into Thought Treasure. It is an open source project that combines a variety of different methods, from simple pattern matching to hardcore natural language processing. However, I tend to get better conversations out of AIML. Personally, I am currently adding Thought Treasure style cognition to my AIML interpreter, as I find XML a lot easier to work with in a content-oriented way, and ALICE's means of robust dialog personality are far superior.

Peace



[edited by - krippy2k on March 5, 2004 8:22:33 AM]

