
Voice Recognition

Started by January 15, 2007 10:17 AM
6 comments, last by WeirdoFu 18 years, 1 month ago
When do you guys think Voice Recognition software will be at an acceptable level for use in games? I had an idea the other day for a game where you play as a cop, and you have to speak commands into your microphone for use in the game: "PUT YOUR HANDS IN THE AIR AND STEP OUT OF THE VEHICLE!" "DROP THE WEAPON!" "FREEEEEZE!" What are some other good uses of this technology for games?
I know there are a couple Air Traffic Control simulations out there which use speech recognition.

Xavius Software publishes the sim I'm most familiar with. The speech recognition functionality, however, was not part of the original product and was created by a user a couple years ago using Microsoft's free Speech API.

ATC Simulator I have not used. I know from glancing over their forums that some folks have achieved 90%-95% recognition rates with the SR. I believe it also employs the Microsoft SAPI, though don't quote me on that.

I never had particularly good luck with the SR when I tried it with ATCC (Xavius' product). Part of the problem was that I never took time to "train" the SR properly. I know there are professional SR engines out there that do a lot of the training in a user-friendly fashion, but the one for ATCC involved a lot of file editing.

I think that training the SR to recognize a unique voice is the biggest hurdle in terms of getting the functionality into games. It (most likely) won't work that well out of the box, and it can take a significant amount of time to train the software to an acceptable level of accuracy, whatever that might be. It will also never be 100%. Part of the reason the SR is successful for both the ATC sim products I mentioned is that it was presented to an already established user base with a demonstrated commitment to the product. They were willing to invest the time in order to give themselves the experience they wanted. Not everyone is willing or has time to do that. An ATC sim exists very much in a niche marketplace, and a lot of the folks using it consider it a hobby rather than simply a game.

A similar issue involves investment in a microphone. A higher-quality microphone, particularly one with an on/off switch, makes a considerable difference in effectiveness. Your users may have to invest in this as well. Again, it's a matter of commitment, as with the training.

One final obstacle is that in times of stress the voice changes and people have a tendency to speak faster as well. That means reduced accuracy for the SR exactly when the user needs it most. This was another problem I'd have with the Air Traffic Control stuff. If you listen to the actual controllers they sound pretty laconic all of the time, even when they're getting slammed. Me? Not so much.

I have seen SR used to great effect in situations (dictation for example) where folks have taken time to train their software and are unlikely to raise their voice or change their tone when things don't quite work.

My, I have rambled on a bit. Sorry about that.

One last thing. The Dragon NaturallySpeaking engine is the one I have seen in use. It is quite effective. I know they provide an SDK, but I have no idea what sort of costs might be involved. I am similarly ignorant of whatever freeware/shareware there might be out there other than the aforementioned Microsoft SAPI, which I believe is a free download somewhere.

I always wondered how difficult this would be to implement in a game.

Euclidean Crisis uses voice recognition software for a number of functions (Special Attack, Activate Special, Detonate, Set Group Alpha, Find Group Alpha, Set Formation Line, and a few others), but that's still a very short list of commands to work with.

In a police game, I need a large number of commands. For a single felony traffic stop, I would need dozens, and they're all relatively complex. Also, there are a lot of commands that have to be given very quickly. Again, I don't know how well voice recognition works. I assume a game would simplify this, but even the old text-input Police Quest games were marred by a too-small dictionary, and that didn't require the delicate work of voice recognition.

However, there is an Army training game that uses a great deal of fairly advanced voice recognition: "Sergeant! Take two squads and sweep the street." I have no idea how well it works, but I assume it's fairly effective. The guys do speak very clearly and a little slowly, though.

I think there's a lot of room to use voice recognition as an additional form of input as done in Euclidean Crisis. That technology is certainly effective and not nearly as difficult as implementing an entire library of possible commands or conversations.

To be able to use a mouse, keyboard, and voice to select and control your RTS units? That could be awesome; in fact, that's the use I would most like to see come out of voice recognition.
gsgraham.com
So, no, zebras are not causing hurricanes.
I think what I've actually been really impressed by is the "obeymoto" software on my cell phone. It doesn't require a trainer, and it understands when you say the names you've personally typed into your phone. For example, I have an entry "Wings2Go" which is one solid word. When I say, at a natural pace and in my natural speaking voice, "Call Wings To Go" it pulls up that specific entry. That astounds me, and makes me think that before 2010 voice recognition will be awesome in games.
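For what it's worth, one way a typed entry like "Wings2Go" could line up with the spoken "Wings To Go" is to reduce both to the same comparison key. This is only a minimal sketch of that idea, not how the phone's software actually works: the `HOMOPHONES` table and the `spoken_key` and `find_contact` helpers are made up for illustration.

```python
import re

# Hypothetical homophone table: spoken forms that should compare equal.
HOMOPHONES = {"2": "to", "two": "to", "too": "to", "4": "for", "four": "for"}

def spoken_key(text):
    """Reduce a typed name or a recognized phrase to a comparable key."""
    # Split into runs of letters or single digits, e.g. "Wings2Go" -> wings, 2, go.
    tokens = re.findall(r"[a-z]+|\d", text.lower())
    return "".join(HOMOPHONES.get(tok, tok) for tok in tokens)

def find_contact(heard, contacts):
    """Return the first contact whose key matches the recognized phrase, else None."""
    target = spoken_key(heard)
    for name in contacts:
        if spoken_key(name) == target:
            return name
    return None

print(find_contact("Wings To Go", ["Mom", "Wings2Go"]))
```

The point is that the recognizer only has to choose among the names already in the phone book, which is a much smaller problem than open dictation.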
Well, sometimes I have problems recognising what people are saying to me, and I speak perfect English (sort of). Other times I have to work out what they've said, not because I'm deaf, but because I don't listen or I blank them out through boredom. Yet I can still sort of work out what they've said.

I don't know exactly how speech recognition software works, but if it works just by recognizing the words and how they are said, then we will not get very far with it. To get even a bit closer to being usable, it needs to recognize the context of what is being said and calculate the probabilities of what could've been said, along with all manner of other heavy computation and checking to be as certain as it can be.
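To make the "context plus probabilities" idea concrete, here is a rough sketch that weighs how well each candidate phrase matched the audio against how likely that phrase is in the current game situation. Everything in it is an assumption for illustration: the `pick_command` helper, the candidate phrases, and the numbers are invented, and a real engine would produce its own scores.

```python
def pick_command(acoustic_scores, context_prior):
    """Pick the command that maximizes acoustic score times context prior."""
    best, best_score = None, 0.0
    for command, acoustic in acoustic_scores.items():
        # Commands missing from the prior are treated as impossible right now.
        score = acoustic * context_prior.get(command, 0.0)
        if score > best_score:
            best, best_score = command, score
    return best

# Hypothetical scores: on sound alone, "drop the weapon" looks slightly better.
acoustic = {"drop the weapon": 0.55, "stop the car": 0.45}

# Hypothetical context: the suspect is still driving, so "stop the car" is likelier.
prior = {"drop the weapon": 0.2, "stop the car": 0.8}

print(pick_command(acoustic, prior))  # stop the car
```

Here 0.45 x 0.8 beats 0.55 x 0.2, so the context overrides the slightly better acoustic match.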

Also, if you speak with a heavy Scottish accent or some other, then voice recognition will likely never work for you (although they should be thankful that anyone can understand them).

As for possible uses I can think of one, swearing at your computer when it goes wrong, "I do not understand, would you repeat that".
The Microsoft Speech API (and, presumably, others as well) allows you to define a grammar, which minimizes problems with accents, etc.

We use this for voice-controlled autopilot functions in our flight sim: "AUTOPILOT SPEED HOLD", "AUTOPILOT ALTITUDE HOLD", etc. The SR engine has a very finite set of possible inputs defined by a grammar, and it looks for those.
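As a rough illustration of why a finite grammar helps, the sketch below snaps whatever was heard to the closest phrase in a fixed command list and rejects anything that isn't close. It does not use the Microsoft Speech API at all; it is plain Python with the standard `difflib` module standing in for the engine, and the `GRAMMAR` list, the `match_command` helper, and the 0.6 cutoff are assumptions for the example.

```python
import difflib

# Hypothetical command grammar: the only phrases the game will accept.
GRAMMAR = [
    "AUTOPILOT SPEED HOLD",
    "AUTOPILOT ALTITUDE HOLD",
]

def match_command(heard, grammar=GRAMMAR, cutoff=0.6):
    """Return the grammar phrase closest to `heard`, or None if nothing is close enough."""
    # Normalize case and whitespace so only the wording matters.
    normalized = " ".join(heard.upper().split())
    matches = difflib.get_close_matches(normalized, grammar, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(match_command("autopilot altitud hole"))  # AUTOPILOT ALTITUDE HOLD
print(match_command("xyzzy"))                   # None
```

With only a handful of valid phrases, a slightly mangled input still has one obvious nearest neighbour, which is roughly why a small grammar is more forgiving of accents than free-form dictation.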
Some company recently (well... within the last four years or so) had to discontinue their use of voice recognition software because it had such a difficult time understanding the Cajun drawl of the callers who seemed to make up a large percentage of their calls.

Still, for now, it's probably easiest and most effective when you have a system like smitty mentioned.

Quote:
As for possible uses I can think of one, swearing at your computer when it goes wrong, "I do not understand, would you repeat that".
Pfft, clearly the computer should swear back [grin].
Probably shouldn't forget the game Lifeline released in 2004 for PS2.

http://www.metacritic.com/games/platforms/ps2/lifeline

It's definitely related to what you want to do, sort of. It wasn't perfect, but it worked better than many expected.

This topic is closed to new replies.
