Voice recognition and text to speech
OK, here goes, and it's probably a long shot. I'm designing a game to be playable over the Internet, and here's a problem we have probably all come across at one time or another.
First, it's annoying to have to type everything you say, especially in the heat of battle, when taking time to type could mean being killed by another player.
Second, an alternative to typing is some kind of voice technology like Roger Wilco. The problem with this comes when you have younger players or people with (sorry, but there's no other way to say it) geeky or high voices. Obviously this takes away from the seriousness and realism of the game if your character is some kind of marine grunt.
So while I was brainstorming, I thought of possibly using voice recognition technology to overcome this. Basically, someone would say something, that would be converted to text, and then the text would be output in a new voice by the computer. I know converting voice into text can be done, as I've seen it in many programs. I also know text to voice is possible, from applications like SimpleText for the Macintosh. However, it is output in a very monotone voice. If the monotone output can't be overcome, I can still work around it.
So I have three basic questions.
First: are there any SDKs out there, or can I use a combination of SDKs, to help me with this? If there are, it would also be nice if any of them supported accents, yelling, lower or higher tones, male or female voices, etc.
Second: if there are no SDKs, what kind of books are out there for this? I would prefer SDKs, however.
Third: if that approach is too difficult, how hard would it be, and what is involved, to convert someone's voice directly so it sounds like something else (a grunt, more manly, etc.) and do this effectively?
P.S. If it makes any difference, I am programming in C++.
Current Projects -
+Tactical Assault: A Half-Life Mod: tactical-assault.tripod.com
quote: Original post by BobInTown
So while I was brainstorming, I thought of possibly using voice recognition technology to overcome this. Basically, someone would say something, that would be converted to text, and then the text would be output in a new voice by the computer. I know converting voice into text can be done, as I've seen it in many programs. I also know text to voice is possible, from applications like SimpleText for the Macintosh. However, it is output in a very monotone voice. If the monotone output can't be overcome, I can still work around it.
So I have three basic questions.
First: are there any SDKs out there, or can I use a combination of SDKs, to help me with this? If there are, it would also be nice if any of them supported accents, yelling, lower or higher tones, male or female voices, etc.
In REBEL MOON REVOLUTION (it was never published) I used the IBM ViaVoice SDK to recognize spoken commands from the human player, then built my own context associator to try to understand what the player wanted, and then directed the NPCs to perform the tasks generated by the player's commands.
It worked pretty well in testing, and I thought it might be workable in actual gameplay provided the player 1) took the time to train the voice recognition software to his/her voice, 2) did not get excited while speaking commands (hard to do in the middle of a first-person firefight), and 3) had a computer powerful enough to drive the voice recognition.
quote:
Second: if there are no SDKs, what kind of books are out there for this? I would prefer SDKs, however.
Try a web search on "voice recognition software" and see what you come up with.
quote:
Third: if that approach is too difficult, how hard would it be, and what is involved, to convert someone's voice directly so it sounds like something else (a grunt, more manly, etc.) and do this effectively?
P.S. If it makes any difference, I am programming in C++.
I also used the text-to-voice capabilities. It is hard to make the spoken words sound like anything but a computer voice. Of course, this was in 1998 and things may have improved (though probably not by much).
Good luck,
Eric
If you want good text-to-speech, you will have to write your own routines, I'm afraid... The fact is, I never managed to find a single program that could do this even a *little* bit okay...
Maybe you could also sample the average pitch and volume of the recorded voice, say every 1/4 of a second, and transmit that with the text. Then you could use the data to add life to the synthesized voice.
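The quarter-second sampling idea could look roughly like the sketch below. It assumes mono samples in the range [-1, 1], measures volume as RMS amplitude, and estimates pitch with a plain autocorrelation search between 80 Hz and 400 Hz. The `Prosody` struct, the function names, and the pitch range are all illustrative assumptions, not part of any SDK mentioned in this thread.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One prosody sample per frame, to be sent alongside the recognized text.
struct Prosody {
    float rms;      // loudness: root-mean-square amplitude of the frame
    float pitchHz;  // estimated fundamental frequency, 0 if none found
};

// Estimate loudness and pitch for one frame of mono samples in [-1, 1].
// Pitch is searched between 80 Hz and 400 Hz, an assumed range for speech.
Prosody analyzeFrame(const std::vector<float>& frame, int sampleRate) {
    assert(sampleRate > 0);
    Prosody p{0.0f, 0.0f};
    if (frame.empty()) return p;

    double energy = 0.0;
    for (float s : frame) energy += s * s;
    p.rms = static_cast<float>(std::sqrt(energy / frame.size()));

    // Autocorrelation: the lag with the strongest self-similarity
    // approximates one pitch period.
    const std::size_t minLag = static_cast<std::size_t>(sampleRate) / 400;
    const std::size_t maxLag = static_cast<std::size_t>(sampleRate) / 80;
    double best = 0.0;
    std::size_t bestLag = 0;
    for (std::size_t lag = minLag; lag <= maxLag && lag < frame.size(); ++lag) {
        double sum = 0.0;
        for (std::size_t i = 0; i + lag < frame.size(); ++i)
            sum += frame[i] * frame[i + lag];
        if (sum > best) { best = sum; bestLag = lag; }
    }
    if (bestLag > 0) p.pitchHz = static_cast<float>(sampleRate) / bestLag;
    return p;
}

// Split a recording into quarter-second frames and analyze each one.
// A trailing partial frame is dropped.
std::vector<Prosody> analyze(const std::vector<float>& samples, int sampleRate) {
    assert(sampleRate >= 4);
    const std::size_t frameLen = static_cast<std::size_t>(sampleRate) / 4;
    std::vector<Prosody> out;
    for (std::size_t start = 0; start + frameLen <= samples.size(); start += frameLen) {
        std::vector<float> frame(samples.begin() + start,
                                 samples.begin() + start + frameLen);
        out.push_back(analyzeFrame(frame, sampleRate));
    }
    return out;
}
```

Each `Prosody` value is only two floats, so four of them per second add very little to the text being transmitted. Real speech is much messier than a test tone, so a production pitch tracker would need voicing detection and smoothing on top of this.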
You may want to look at some of the tools and documentation associated with Microsoft Agent. Their newest speech-to-text engine works very well (and is freely distributed). They also have a text-to-speech engine, but the sound quality is poor; however, the documentation refers to a technique for using sound files in place of the engine (I believe a recording of each of the phonetic sounds). I believe these files are then used by the text-to-speech engine, so your work would be reduced. If that isn't true, you could look into coding a tool that plays the phonetic sound files in order for you.
Well, thanks for all the help, everyone. If anyone has any more tips for me, please continue to post. Also, does anyone know if there are any programming sites devoted to text to speech and speech to text? Thanks.
April 18, 2001 08:01 PM
In my opinion it would make more sense to alter the voice directly and so avoid the speech->text->speech conversions. This could be done by creating a speech profile for every player/creature that stores the amplitudes of the main frequency components of the voice. Then you could convert one voice into another by compensating for the differences between the profiles with a filter/EQ. Just an idea, but I think it's worth mentioning.
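A minimal sketch of the profile-matching step, assuming each profile is already available as average amplitudes in a fixed number of frequency bands (how those are measured, e.g. with an FFT, is not shown). The band count, the `VoiceProfile` type, and both function names are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Number of frequency bands in a profile; 8 is an arbitrary choice.
constexpr std::size_t kBands = 8;

// Average amplitude of a voice in each band, measured ahead of time.
using VoiceProfile = std::array<float, kBands>;

// Per-band gains that move the speaker's profile toward the target's.
// Gains are clamped so near-silent bands do not get boosted into noise.
VoiceProfile matchingGains(const VoiceProfile& speaker,
                           const VoiceProfile& target,
                           float maxGain = 8.0f) {
    assert(maxGain > 0.0f);
    VoiceProfile gains{};
    for (std::size_t b = 0; b < kBands; ++b) {
        const float g = speaker[b] > 0.0f ? target[b] / speaker[b] : maxGain;
        gains[b] = g > maxGain ? maxGain : g;
    }
    return gains;
}

// Apply the gains to one frame's band amplitudes, like a graphic EQ.
VoiceProfile applyGains(const VoiceProfile& frameBands, const VoiceProfile& gains) {
    VoiceProfile out{};
    for (std::size_t b = 0; b < kBands; ++b) out[b] = frameBands[b] * gains[b];
    return out;
}
```

The gains would be computed once per player/target pair and then applied to every frame. Note that a static EQ like this reshapes timbre only; it does not move the pitch of the voice, so a high voice would still need some form of pitch shifting to sound lower.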
Greetings,
-Thies Heidecke
I can't help but notice that no one has mentioned Microsoft DirectX.
Yes, they have a Speech SDK... and I quote:
quote:
The Microsoft® Speech SDK 5.0 includes the redesigned Win32 Speech API, which includes expanded COM support for both speech recognition and text-to-speech, improved audio handling, and a new context-free grammar namespace that supports XML. It also includes a new grammar compiler, tutorial, sample applications, documentation, null sample engines, and improved Microsoft continuous speech recognition and text-to-speech. The text-to-speech engines are available in US English and Simplified Chinese. The speech recognition engines are available in US English, Simplified Chinese, and Japanese.
And it's here
I don't know anything about the damn thing, except that it exists... Knowing that most people use Visual C++ to program Half-Life mods, I think this would fit, wouldn't it?
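On the "yelling, lower or higher tones" part of the original question: as far as I recall, the SAPI 5 text-to-speech engine accepts inline XML tags such as `<pitch middle="...">`, `<rate speed="...">`, and `<volume level="...">`. The sketch below only builds such a marked-up string and does not touch the SDK at all. `styledSpeech` and `escapeXml` are hypothetical helpers, and the value ranges are from memory and should be checked against the SDK documentation.

```cpp
#include <algorithm>
#include <string>

// Escape the characters that would otherwise be read as markup.
std::string escapeXml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

// Wrap a chat line in volume/rate/pitch tags.
// Assumed ranges: pitch and rate -10..10, volume 0..100; inputs are clamped.
std::string styledSpeech(const std::string& text, int pitch, int rate, int volume) {
    pitch  = std::max(-10, std::min(10, pitch));
    rate   = std::max(-10, std::min(10, rate));
    volume = std::max(0, std::min(100, volume));

    std::string xml;
    xml += "<volume level=\"" + std::to_string(volume) + "\">";
    xml += "<rate speed=\"" + std::to_string(rate) + "\">";
    xml += "<pitch middle=\"" + std::to_string(pitch) + "\">";
    xml += escapeXml(text);
    xml += "</pitch></rate></volume>";
    return xml;
}
```

For a low, loud "marine" voice one might call `styledSpeech("Cover me!", -8, 2, 100)`. The result would then be converted to a wide-character string and handed to the engine, presumably through `ISpVoice::Speak` with the `SPF_IS_XML` flag, which is Windows-only and not shown here.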
youpla :-P
Sancte Isidore ora pro nobis !
Don't do speech to text and then text to speech. All you need are the phonemes, the phonetic elements of the spoken sentence. You recognize what is being said as a collection of sounds, like f, th, sh, short a, long a, oo, whatever. You pass that along to the other side and reconstruct the sound in whatever synthesized voice you want. This has several advantages: 1) it's faster than trying to get English text out of spoken words; 2) it works for any language, as long as the computer can recognize the correct phonemes; 3) minimal data is passed across the network.
If someone is hard of hearing, their client can convert the phonemes into text.
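To give a feel for how little data that is, here is a rough sketch that sends one byte per phoneme and turns a received packet into a crude caption. The six-entry `Phoneme` enum and its labels are invented for the example (a real English inventory has roughly 40 entries), and the recognizer and synthesizer on either end are assumed to exist elsewhere.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// A tiny, made-up phoneme inventory used only for this example.
enum class Phoneme : std::uint8_t { F, TH, SH, ShortA, LongA, OO, Count };

// Readable label for each phoneme, indexed by its numeric value.
const char* const kLabels[] = {"f", "th", "sh", "a", "ay", "oo"};

// Serialize phonemes to one byte each for sending over the network.
std::vector<std::uint8_t> encode(const std::vector<Phoneme>& spoken) {
    std::vector<std::uint8_t> packet;
    packet.reserve(spoken.size());
    for (Phoneme p : spoken) {
        assert(p < Phoneme::Count);
        packet.push_back(static_cast<std::uint8_t>(p));
    }
    return packet;
}

// Turn a received packet into a rough caption, skipping unknown bytes.
std::string toCaption(const std::vector<std::uint8_t>& packet) {
    std::string caption;
    for (std::uint8_t byte : packet) {
        if (byte >= static_cast<std::uint8_t>(Phoneme::Count)) continue;
        if (!caption.empty()) caption += '-';
        caption += kLabels[byte];
    }
    return caption;
}
```

The same packet could feed either a synthesizer or `toCaption`, so the sender does not need to know which kind of client is listening. Turning phonemes into properly spelled text is a much harder problem than this label lookup suggests.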