I want to talk in my game

Started by November 05, 2013 01:57 AM
9 comments, last by powerneg 11 years, 2 months ago

I want to reintroduce an old staple of gaming with an updated twist. Some of you old gamers out there should remember text-based games where you had to type what you were doing like "go north" and "get sword". I don't want to use it for directing the player character, but in conversations with NPCs. I would like to not only be able to type conversations, but speak into the microphone and have the text appear on screen and be understood by the NPC in the conversation. I think I will need the following:

-- speech-to-text software. This is something that Windows already has, but I don't know how to access it in my application (or if it can be done at all).

-- a thesaurus database.

-- a function that parses the sentence and references the thesaurus database to convert it to a set of values.
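A minimal sketch of that parser/thesaurus step, assuming the "thesaurus database" can start out as a plain dictionary that maps synonyms to one canonical keyword. The `THESAURUS` entries and the `sentence_to_keywords` name are made up for illustration; a real game would load a much larger table.

```python
import re

# Hypothetical thesaurus: maps each synonym to one canonical keyword.
THESAURUS = {
    "hello": "greet", "hi": "greet", "greetings": "greet",
    "buy": "trade", "purchase": "trade", "sell": "trade",
    "ship": "ship", "vessel": "ship", "craft": "ship",
    "where": "location", "location": "location",
}

def sentence_to_keywords(sentence):
    """Lowercase the sentence, split it into words, and keep known keywords."""
    words = re.findall(r"[a-z']+", sentence.lower())
    keywords = []
    for word in words:
        canonical = THESAURUS.get(word)
        # Unknown words are treated as chaff; duplicates are dropped.
        if canonical is not None and canonical not in keywords:
            keywords.append(canonical)
    return keywords
```

With this table, "Hi, I want to purchase a vessel" reduces to the keywords greet, trade and ship, which is the "set of values" the NPC logic would then look at.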

I think that's all I will need. Any thoughts?

Check out the program www.cleverbot.com and other chatbots for coding ideas. Cleverbot seems to work on the principles of the Chinese Room.

--"I'm not at home right now, but" = lights on, but no one's home
I used this .Net library for a couple of simple speech-recognizing programs and it worked pretty well (I used a very small grammar):

http://msdn.microsoft.com/en-us/library/system.speech.recognition(v=vs.110).aspx

Basically it works like this:

- You configure the words/phrases you want to be able to recognize using a 'grammar'.
- You listen to some events which may occur depending on what the microphone hears.
- You start the recognizer.
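The three steps above can be sketched without any speech engine at all. This is a toy stand-in written in Python, not the System.Speech API: `ToyRecognizer` and its methods are hypothetical names, and it is fed text where a real engine would be fed audio. It only shows the shape of the flow (load a grammar, attach event handlers, start).

```python
class ToyRecognizer:
    """Stand-in for a real speech engine; it is fed text instead of audio."""

    def __init__(self):
        self.grammar = set()
        self.on_recognized = []   # callbacks for phrases in the grammar
        self.on_rejected = []     # callbacks for everything else
        self.running = False

    def load_grammar(self, phrases):
        """Step 1: configure the phrases we want to be able to recognize."""
        self.grammar = {p.lower() for p in phrases}

    def start(self):
        """Step 3: start listening."""
        self.running = True

    def hear(self, phrase):
        """Simulate the engine hearing a phrase and fire the matching event."""
        if not self.running:
            return
        text = phrase.lower().strip()
        handlers = self.on_recognized if text in self.grammar else self.on_rejected
        for handler in handlers:   # Step 2: the events we subscribed to
            handler(text)
```

A game would append its own functions to `on_recognized` and `on_rejected`, call `start()`, and react inside those handlers. With a real library, `hear` is replaced by whatever the engine does with the microphone.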

I don't know how refined a system we are talking about. I assume you're going for something more than an NPC just "listening" to you until it picks up the word "sword", at which point it prints the set text about how to get a sword. That level wouldn't be much of a conversation, in my opinion.

Speech recognition and AI conversation are two big topics, and you can pour as much time into each as you can spare. It's good that you can use the Windows library for a head start and focus mainly on making the conversations interesting.

Finally, something mood-lightening on speech-driven gaming:

<- Audio not work safe.

I actually watched most of that playthrough live and it was pure gold with the misinterpretations and frustration.

I think you should look into SHRDLU. It is an old (1968) AI that can stack blocks, learn things about its world (such as blocks can't go on pyramids), and learn words. It has an official page here.

SAMULIKO: My blog

I've seen a bunch of videos of this game. It's a shame the speech recognition is so bad (it's a PS2 game though, so things might have evolved).

You have to design the game with the technical limitations in mind. Don't put options on the same screen that could lead the player to say two words that sound even remotely similar.
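One rough way to check a screen for this at design time is to compare the spelling of every pair of options. This is only a sketch: spelling similarity (here via Python's `difflib`) is a crude proxy for how similar two words sound, and the `confusable_pairs` name and the 0.7 threshold are assumptions picked for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations

def confusable_pairs(options, threshold=0.7):
    """Return pairs of options whose spelling similarity is at or above threshold."""
    pairs = []
    for a, b in combinations(options, 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b))
    return pairs
```

For a screen offering "attack", "attach" and "leave", this flags the attack/attach pair and nothing else. A phonetic comparison would be a better fit, but it needs a pronunciation dictionary.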

Others have alluded to this, but I'll reiterate: this ought not to be too difficult if you don't have really high expectations, but creating a program that can hold a reasonable, somewhat open-ended conversation is a subject suitable for the life's work of an Ivy League professor.

Realistically, though certainly non-trivially, it ought to be feasible if:

  • NPC dialog is still driven largely by dialog trees, plus maybe some general pleasantries/banter/canned responses.
  • Speech parsing works similarly to keyboard-driven text adventures in the vein of Zork -- understands basic grammatical forms, cares about important words, ignores the chaff.
  • Speech-to-text is handled by a third-party library, preferably one which can be trained to your grammar.
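As a sketch of the first two points combined, here is a tiny dialog tree where the branches are picked by important words in the player's sentence instead of by menu clicks. The tree contents, `FALLBACK` and `respond` are all made up for illustration, and the sentence could come from the keyboard or from a speech-to-text library.

```python
import re

# Hypothetical dialog tree: each node has NPC text and keyword-triggered branches.
DIALOG_TREE = {
    "start": {
        "text": "Welcome to the station. Need fuel or repairs?",
        "branches": {"fuel": "fuel", "repairs": "repairs", "repair": "repairs"},
    },
    "fuel": {"text": "Fuel is 10 credits a unit.", "branches": {}},
    "repairs": {"text": "The dock crew can patch your hull.", "branches": {}},
}
FALLBACK = "Sorry, I didn't catch that."

def respond(node_id, player_sentence):
    """Return (next_node_id, npc_text) for a typed or transcribed sentence."""
    words = re.findall(r"[a-z']+", player_sentence.lower())
    branches = DIALOG_TREE[node_id]["branches"]
    for word in words:
        # The first important word wins; everything else is ignored as chaff.
        if word in branches:
            next_id = branches[word]
            return next_id, DIALOG_TREE[next_id]["text"]
    # Canned response: stay on the same node so the player can try again.
    return node_id, FALLBACK
```

Saying "I could really use some fuel, please" at the start node moves to the fuel node, while "nice weather" gets the canned reply and stays put.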

There will still be design challenges to overcome. For one, in typical dialog tree systems today, the player drives the conversation by selecting statements and questions from a list, to which the NPC responds, and which may open new paths on the dialog tree. The player knows the conversation is finished when there are no more interesting options left to choose. If your system is driven by speech recognition, how does the player know when the NPC has no more information to give? Do you retain the selection-driven UI? Is speech recognition then just another input option? Just something to think about.

throw table_exception("(? ???)? ? ???");

I looked at this about 10 years ago and played with the Microsoft Speech SDK enough to see some possibilities.

The limitation was that you probably want a small vocabulary and very simple sentence trees, like: actionx; toolx; actionx - toolx; actionx - targetx; toolx - actionx - targetx. Systems like this should have a training mode where the specific user is given words to say repeatedly (say, 5 times) to tune the recognition.

I also had another idea: use the standard (maybe game-expanded) vocabulary to convert voice input to text and show it as the older 'talk bubble over the head' that Ultima Online had (which would greatly cut down the talk data sent to and from masses of players). Recent games like LOTRO had nice 'party' voice features, but casual talk was then limited to the chat box (and in UO a great deal of the interest was seeing what many other people were talking about). I once ran a test with the Speech SDK, watching a TV show and repeating random dialogue as input. At that time about 1/4 to 1/3 of the widely varying input was mistranscribed (again, that was 10 years ago with a freebie SDK).
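A minimal sketch of such a sentence tree as slot filling, assuming three small, non-overlapping word lists. The vocabularies and the `parse_command` name are invented for illustration; the word order does not matter, so "actionx - toolx" and "toolx - actionx - targetx" are handled by the same loop.

```python
# Hypothetical small vocabularies, one set per slot.
ACTIONS = {"open", "cut", "scan"}
TOOLS = {"knife", "scanner", "key"}
TARGETS = {"door", "rope", "crate"}

def parse_command(utterance):
    """Fill the action/tool/target slots from the recognized words."""
    slots = {"action": None, "tool": None, "target": None}
    for word in utterance.lower().split():
        # Only the first word found for each slot is kept.
        if word in ACTIONS and slots["action"] is None:
            slots["action"] = word
        elif word in TOOLS and slots["tool"] is None:
            slots["tool"] = word
        elif word in TARGETS and slots["target"] is None:
            slots["target"] = word
    return slots
```

"knife cut rope" fills all three slots, while a bare "scan" fills only the action and leaves the game to decide what a missing tool or target means.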

--------------------------------------------
Ratings are Opinion, not Fact

Here's an idea: have the conversations happen with the NPC party that is traveling with you, so the player can discover info and switch equipment while traveling.

It's good for entertaining players and more forgiving if sentences are misunderstood :)

Thank you all for replying.

The STT will be limited to conversation only. My plan is to have conventional dialog options like in most games but give the player a "textbox" to type in simple sentences. This textbox can alternatively be filled in with the STT output. Once the player gives the command to "say" the sentence (or a specific time passes without input), the entire string will be sent to the parser/thesaurus function to create a string of key words that will then be interpreted by the NPC's conversation function and decide on the appropriate response.
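A rough sketch of that textbox logic, assuming the game loop passes in the current time so the idle timeout can be checked each frame. `SpeechBox`, its method names and the 3-second default are placeholders, and whatever `poll` returns would be handed to the parser/thesaurus function.

```python
class SpeechBox:
    """Text box filled by typing or STT output; submits on 'say' or after idle time."""

    def __init__(self, idle_timeout=3.0):
        self.idle_timeout = idle_timeout   # seconds without input before auto-submit
        self.text = ""
        self.last_input_time = None

    def add_input(self, fragment, now):
        """Append typed or transcribed text and remember when it arrived."""
        self.text = (self.text + " " + fragment).strip()
        self.last_input_time = now

    def clear(self):
        """Throw the current text away, e.g. after a bad transcription."""
        self.text = ""
        self.last_input_time = None

    def poll(self, now, say_pressed=False):
        """Return the sentence to send to the parser, or None if not ready."""
        if not self.text:
            return None
        timed_out = now - self.last_input_time >= self.idle_timeout
        if say_pressed or timed_out:
            sentence = self.text
            self.clear()
            return sentence
        return None
```

Because the text stays visible in the box until it is submitted, the player can call `clear` through some hot-key and try again if the STT got it wrong.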

As far as the NPC-party conversations while traveling go, that's going to be a big part of the game. Travel times will be on the order of minutes to hours.

I just realized some of you may not know what type of game I'm working on. It's a space simulation RPG. It will have no "loading" screens and each star system will be generated on-the-fly while traveling to it. The RPG aspect will be first person shooter-like.

With that out of the way..... Certain NPCs will have hidden Easter Eggs that you can discover by listening to their conversations with other NPCs/your character and then asking them about it. These Easter Eggs won't be part of the normal conversation menu, so it's not like you can just click every option to get what you want.
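One possible way to model this, sketched under the assumption that an Easter Egg only unlocks after the player has actually overheard its topic (the game could just as well let anyone who guesses the word trigger it). The `Npc` class, its fields and the example secret are all hypothetical.

```python
class Npc:
    """NPC with hidden topics that are not listed in the normal dialog menu."""

    def __init__(self, name, secrets):
        self.name = name
        self.secrets = secrets      # topic keyword -> hidden response
        self.overheard = set()      # secret topics the player has heard mentioned

    def overhear(self, topic):
        """Call when the player is in earshot of a conversation about topic."""
        if topic in self.secrets:
            self.overheard.add(topic)

    def ask(self, keywords):
        """Return a hidden response if the player asks about an overheard secret."""
        for keyword in keywords:
            if keyword in self.secrets and keyword in self.overheard:
                return self.secrets[keyword]
        return None   # fall through to the normal conversation logic
```

The `keywords` list is meant to be the output of the parser step, so asking about the topic works the same whether the sentence was typed or spoken.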

The interface will also allow the player to move around and interact with objects while still in conversation with an NPC. A hot-key will toggle between "I want to talk" and "I want to move". This will allow the player to effectively talk and chew gum at (nearly) the same time.

About "mistranslation":

I understand that even the best STTs out there sometimes have trouble getting what you say correctly. That's why I want to show the text in the textbox as you say it. This way you can see if there's an error and clear it so you can start over, perhaps saying it differently. I have played with Windows Speech Recognition (Win7) and it seems to be fine as long as I don't talk too fast. It took a while for it to "learn" my vocal patterns, but once it did, it seemed to work just fine. I want to use this in my game, but I don't know how to access it in my program..... :(

This topic is closed to new replies.