
a program that reads

Started by April 10, 2002 09:08 PM
14 comments, last by redneckCoder 22 years, 7 months ago
Hey, I know it would be hard to do this, but I need to know how hard it would be. As part of a program I'm designing right now, I need to code some kind of algorithm that will read a word in and then 'say' the word back. For example, suppose I have an appointment reminder and the user's name is Jack. I'll store the user's name as part of a config file or something, and when it comes time to remind Jack of the appointment, the program reads the name from the config file and then says through the speakers, "Jack, you have an appointment at 1:30." How would I go about making the application 'say' the user's name? Thanks.

-AJ

C:\DOS
C:\DOS\RUN
RUN\DOS\RUN

-Comic Book Store Guy's t-shirt that I saw on the Simpsons, although it didn't actually come from the Simpsons.

http://vdsoft.netfirms.com/home.html
I don't think it would be a simple algorithm, I'll tell you that. It would require you to parse the word (in this case "Jack") into its phonetic sounds. Then you could try to play back pre-recorded versions of those sounds in the right sequence and blend them together with another algorithm to make the result sound somewhat human. It would be really tough. I think you might be better off having the user record his name once, and then just playing back the recording of the full name. I really don't think it would be worth the effort just so the program could "say" his name.

However, there are programs out there that read your e-mail to you or convert text to sound. I believe they probably use a database of prerecorded words. For names, they may include some common ones, and when they don't recognize a word they could of course spell it out loud.
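As a rough illustration of the "database of prerecorded words, spell it out otherwise" idea, here is a minimal Python sketch. The table contents and the clip file names (such as `jack.wav` and `letter_a.wav`) are hypothetical, and the function only decides which clips to play; actual playback is left out.

```python
# Hypothetical lookup table: word -> recording file. All names are illustrative.
RECORDED_WORDS = {
    "you": "you.wav",
    "have": "have.wav",
    "an": "an.wav",
    "appointment": "appointment.wav",
    "jack": "jack.wav",
}


def clips_for(text):
    """Return the clip files to play for `text`, in order."""
    clips = []
    for raw in text.split():
        # Strip punctuation and digits so "Jack," matches "jack".
        word = "".join(ch for ch in raw.lower() if ch.isalpha())
        if not word:
            continue
        if word in RECORDED_WORDS:
            clips.append(RECORDED_WORDS[word])
        else:
            # Unknown word: fall back to spelling it letter by letter.
            clips.extend(f"letter_{ch}.wav" for ch in word)
    return clips
```

With this table, a known name maps to a single clip, while an unrecognized name such as "AJ" falls back to one clip per letter.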
Starting that kind of thing from scratch seems pretty daunting. However, you could look into using Microsoft Agent. I haven't used it myself, but it seems like it fits the bill.
quote: Original post by Anonymous Poster
I think you might be better off having the user record his name once, and then just playback the recording of the full name.


That was my first thought, but then I realized that not everyone has a mic, and it also wouldn't flow well enough to create the feeling that the program is actually talking to you. I might have to settle for it, though, for the sake of simplicity. Thanks for the suggestions.

quote: Original post by TerranFury
Starting that kind of thing from scratch seems pretty daunting. However, you could look into using Microsoft Agent. I haven't used it myself, but it seems like it fits the bill.


Thanks, I'll look into it.

-AJ

Head over to Google and dig; you'll find some code eventually. There was an article on this topic in DDJ 5 or 6 years ago. There's more to it than just phonemes, but they play a large part.
"I thought what I'd do was, I'd pretend I was one of those deaf-mutes." - the Laughing Man
Try creating a program that converts the word to the sounds that the dictionary gives for its pronunciation. For example, "aw" is normally pronounced one way, so by default you would play that sound, but in some words, specifically when it is followed by an a, h, o, r, or e and a vowel, it is pronounced differently. Find all of these special circumstances, and create your program.

Thanks for the suggestions, guys. I'll keep them in mind. However, I was thinking of something along these lines: the program reads in the word, then analyzes the first and second letters, deciding what sound to play for the first letter based on what the second letter is, and so on for the rest of the letters, but only in special circumstances like Puzzler183 said. I would only have to code for the special circumstances, not for every letter of the alphabet. So let's say, for example, I have the word 'right'. Here's some pseudo-code of what would happen:

Read in word;
First letter = r, second letter = i;
Third letter = g, fourth letter = h;
Fifth letter = t;
Play r sound;
Play long i sound;
Play t sound;
//There is no play sound for g or h because g and h together make no sound.

Now here's another scenario, the word 'write':

Read in word;
First letter = w, second letter = r;
Third letter = i, fourth letter = t;
Fifth letter = e;
//'w' when followed by r makes no sound, so skip right to the r sound. The e on the end makes the i long, the t sound is normal, and the e sound is omitted because all it does is make the i long.
Play r sound;
Play long i sound;
Play t sound;

For simple words it looks like it should work pretty well. Of course, it will need to be refined for more complex words, but it's a start. Thanks again for the help, guys.
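The two walk-throughs above can be turned into a small Python sketch. This is only one possible reading of the pseudo-code: it handles exactly the three special cases mentioned (silent w before r, silent gh after i, and a final e that lengthens the vowel) and treats every other letter as its own sound. The function name and the sound labels are made up for illustration.

```python
VOWELS = "aeiou"


def letters_to_sounds(word):
    """Map a word to a list of sound labels using a few special-case rules."""
    word = word.lower()
    sounds = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch == "w" and nxt == "r":
            # 'w' followed by 'r' makes no sound.
            i += 1
            continue
        if ch == "i" and word[i + 1:i + 3] == "gh":
            # 'igh': long i, and the g and h together make no sound.
            sounds.append("long i")
            i += 3
            continue
        if (ch in VOWELS and i + 3 == len(word)
                and word[i + 1] not in VOWELS and word[i + 2] == "e"):
            # vowel + consonant + final e: the vowel is long, the e is silent.
            sounds.append("long " + ch)
            sounds.append(word[i + 1])
            i += 3
            continue
        # Default: the letter stands for its own plain sound.
        sounds.append(ch)
        i += 1
    return sounds
```

Both 'right' and 'write' come out as the same three sounds (r, long i, t), matching the pseudo-code. Words outside these rules, such as 'jack', simply fall through letter by letter, which is where the refinement mentioned above would have to go.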

-AJ


You do still have to analyze special cases longer than one letter. Start with 26 wav files (one for each letter in its simplest form). Later, look for combos like ph, mb, and th and tell your program to play those files. Finally, implement special cases such as a*e playing as long a and then * (where * is a character).
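Here is a minimal Python sketch of that three-stage plan: one file per letter, then the ph/mb/th combos, then the a*e case. The file names (`ph.wav`, `long_a.wav`, and so on) are hypothetical, and applying the a*e rule only at the end of a word is an assumption of this sketch, not something stated above. It also assumes the input contains letters only.

```python
DIGRAPHS = ("ph", "mb", "th")  # the combos named in the post


def sound_files(word):
    """Return the wav files to play for `word`, in order."""
    word = word.lower()
    files = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            # Two-letter combo with its own recording.
            files.append(pair + ".wav")
            i += 2
        elif word[i] == "a" and i + 3 == len(word) and word[i + 2] == "e":
            # a*e at the end of the word: long a, then the middle character.
            files.append("long_a.wav")
            files.append(word[i + 1] + ".wav")
            i += 3
        else:
            # Default: the single-letter recording.
            files.append(word[i] + ".wav")
            i += 1
    return files
```

Checking the combos before the single letters is what keeps "th" in 'thumb' from being played as separate t and h files.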
If it's simply the functionality that you want (i.e. it's not a school project on speech synthesis), then you could just download the Microsoft Speech SDK.
If you need to code your own, I would break down the process into two distinct steps.

- The first converts a text string into phonetic tokens
- The second combines these tokens to create a sound
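The two steps above can be sketched in Python as two separate functions. Everything concrete here is a placeholder: the pronunciation table, the token names (`JH`, `AE`, `K`), and the short sine tones standing in for real recorded sounds are all assumptions made so the example runs with only the standard library.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 8000        # samples per second
SAMPLES_PER_TOKEN = 800   # 0.1 s of audio per token

# Step 1 data: hypothetical word -> phonetic token table.
PRONUNCIATIONS = {"jack": ["JH", "AE", "K"]}

# Step 2 data: placeholder "recordings"; each token is just a tone here.
TOKEN_PITCH_HZ = {"JH": 300.0, "AE": 440.0, "K": 600.0}


def text_to_tokens(text):
    """Step 1: convert a text string into phonetic tokens."""
    tokens = []
    for word in text.lower().split():
        tokens.extend(PRONUNCIATIONS[word])  # raises KeyError for unknown words
    return tokens


def tone(freq_hz):
    """Generate 16-bit samples of a sine tone (stand-in for a recording)."""
    return [int(12000 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(SAMPLES_PER_TOKEN)]


def tokens_to_wav(tokens):
    """Step 2: combine the tokens' sounds into one mono WAV byte string."""
    samples = []
    for token in tokens:
        samples.extend(tone(TOKEN_PITCH_HZ[token]))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 16-bit samples
        out.setframerate(SAMPLE_RATE)
        out.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()
```

Keeping the steps apart means the text-to-token rules can be improved without touching the audio code, and the placeholder tones can later be swapped for real recordings.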

This topic is closed to new replies.
