
a scientist kneels to the arts

Started by September 07, 2002 02:31 PM
6 comments, last by walkingcarcass 22 years, 3 months ago
One thing that has always annoyed me about NPCs, e.g. Half-Life's scientists, is that no effort is made to make them individuals. Even if the models change, the voices never do. As a programmer I want to do better. It's my duty, dammit!

Rather than record many accents, I figured it would be possible to distort speech at runtime to introduce small variations. The obvious start is changing pitch or speed slightly, but I plan a speech engine with a simple framework that allows a lot of flexibility.

The basic idea is that a map is created for the phonemes in a sample, and each element is given attributes. At runtime, the engine introduces random variations such as a teenager's voice breaking after most high "ee"s, a military commander who raises his voice on every word beginning with a short consonant, or a drunkard bum who sniffs or coughs roughly every 5-9 syllables.

This seems ambitious (just a little), but the building blocks are actually quite primitive. What I want and need to know is whether a bland, neutral voice can be realistically squished with little more than pitch, speed and volume controls.

********
A Problem Worthy of Attack Proves Its Worth by Fighting Back

[edited by - walkingcarcass on September 7, 2002 3:38:59 PM]
spraff.net: don't laugh, I'm still just starting...
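
A minimal sketch of the phoneme map idea from the post above, in Python. Everything here is an assumption made for illustration: the `Phoneme` fields, the `SHORT_CONSONANTS` set, the 1.5x gain, the 1.3x pitch jump and the 70% chance are all hypothetical, and nothing in the thread specifies them. It only shows how per-character rules could rewrite attributes on a shared phoneme map before playback.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Phoneme:
    symbol: str               # hypothetical label, e.g. "ee" or "t"
    start: int                # first sample index in the recording
    end: int                  # one past the last sample index
    pitch: float = 1.0        # playback pitch multiplier
    gain: float = 1.0         # volume multiplier
    word_start: bool = False  # True if this phoneme begins a word

# Assumed set of "short" consonants; not taken from the thread.
SHORT_CONSONANTS = {"p", "t", "k", "b", "d", "g"}

def commander(phonemes, rng):
    """Raise the volume of every word that begins with a short consonant."""
    out = []
    loud = False
    for ph in phonemes:
        if ph.word_start:
            loud = ph.symbol in SHORT_CONSONANTS
        out.append(replace(ph, gain=ph.gain * 1.5) if loud else ph)
    return out

def teenager(phonemes, rng):
    """After most "ee" phonemes, make the next phoneme jump up in pitch."""
    out = []
    crack = False
    for ph in phonemes:
        if crack:
            ph = replace(ph, pitch=ph.pitch * 1.3)
        # Decide whether the phoneme after this one should crack.
        crack = ph.symbol == "ee" and rng.random() < 0.7
        out.append(ph)
    return out
```

Each rule takes the neutral map and returns a modified copy, so one recording can feed several characters, e.g. `teenager(phoneme_map, random.Random(npc_id))`. The playback side (actually applying `pitch` and `gain` to the samples between `start` and `end`) is left out.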
It's certainly ambitious, and I don't believe that it would be possible to get the voice to sound natural. For instance, when the military commander says certain syllables louder, it isn't just a volume increase; the timbre of his voice would change as well (i.e. it would be more gravelly). There has been a lot of research into this type of thing, so I doubt it's something you could pull off on your own.

Having said that, it does look like something interesting to try out, even if you don't get totally realistic results.

www.bankie.com
I'm sure Neverwinter Nights does this: slightly pitching NPCs so they sound different (unless it's several similar WAVs). Keep clicking the talk icon on some people to see what I mean.
It's possible to record every possible sound and join them together to make words, and then modulate. But your best bet is to just use whole phrases and then modulate pitch, and perhaps add some other filtering for extra effect. That would be well beyond acceptable, as well as being easy to do.
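
A rough sketch of the "whole phrase, modulated pitch" approach in Python, assuming the phrase is already decoded into a list of sample values. The function names and the 6% spread are made up for illustration. Note that this is plain resampling, so pitch and speed change together; shifting pitch without changing duration needs a more involved technique that is not shown here.

```python
import random

def resample(samples, factor):
    """Play `samples` back at `factor` times the original rate.

    Uses linear interpolation. factor > 1 raises the pitch and shortens
    the clip; factor < 1 lowers the pitch and lengthens it.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor          # fractional read position in the input
        j = int(pos)
        frac = pos - j
        a = samples[j]
        # Hold the last sample if the interpolation would read past the end.
        b = samples[j + 1] if j + 1 < len(samples) else samples[j]
        out.append(a + (b - a) * frac)
    return out

def npc_voice_factor(npc_id, spread=0.06):
    """Pick a stable per-NPC playback factor in [1 - spread, 1 + spread].

    Seeding with the NPC's id (a hypothetical integer or string) means the
    same character always gets the same voice.
    """
    rng = random.Random(npc_id)
    return 1.0 + rng.uniform(-spread, spread)
```

Usage would be something like `resample(phrase, npc_voice_factor(npc_id))` once per NPC when the voice is loaded.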

Do not meddle in the affairs of moderators, for they are subtle and quick to anger. ANDREW RUSSELL STUDIOS

I don't have a link at the moment (sorry!) but there are a couple of standalone programs that approach speech synthesis in just this way. After all, there aren't so many phonemes that they couldn't be recorded with a bunch of different emphases for your idea . . . a lot of trial and error, but the most natural-sounding TTS program I've heard (including Dragon NaturallySpeaking or whatever it's called) works off this method, minus the modulation, and it was done by a hobbyist coder. I don't think it's all that out of reach to get it to an acceptable level for games . . . especially for "stock" characters like the scientists. I'll try and find a link to the program, but no promises . . . I originally found it on Google by searching for "text to speech phoneme" but I don't see it right now.
If you see the Buddha on the road, Kill Him. -apocryphal
Try looking at the Festival speech engine (Google for it).
Try ADAPTIVE DELAY!

-> The delay time for each sample of your sound can be computed from THE VALUES of your samples (for example, (the average of the last 64 samples) * (a magic value), or anything you want).

It's dynamic (because you use the sound sample data) and it's very easily tunable (the magic number).
What are the results? It can be changes in the rhythm, the articulation of the sound (longer or shorter silences...), or other things. It works great for speech, but it needs very long and hard tuning!

Good luck and welcome to the world of ADAPTIVE DELAY!

PS: a compressor is an adaptive GAIN FX, because you use the sample data to compute the gain to apply to the samples...
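
One possible reading of the adaptive delay described above, as a Python sketch. Several choices are assumptions, not the poster's: the level is the average of the absolute values of the last 64 samples (a plain average of audio tends to sit near zero), the delay is measured in whole samples, and `max_delay` caps it so the read position stays inside a bounded history.

```python
from collections import deque

def adaptive_delay(samples, magic=200.0, window=64, max_delay=2048):
    """Read each output sample from a delay that follows the signal level.

    delay[n] = int(magic * mean(|x|) over the last `window` samples),
    clamped to `max_delay` and to the number of samples seen so far.
    """
    recent = deque(maxlen=window)   # absolute values inside the window
    total = 0.0                     # running sum of `recent`
    out = []
    for n, x in enumerate(samples):
        if len(recent) == window:
            total -= recent[0]      # this value is about to be pushed out
        recent.append(abs(x))
        total += abs(x)
        level = total / len(recent)
        delay = min(int(level * magic), max_delay, n)
        out.append(samples[n - delay])
    return out
```

Because the delay grows with loudness, loud passages get read from further back than quiet ones, which is where the changes in rhythm and articulation would come from. `magic` is the tuning knob the post refers to; the default of 200.0 is arbitrary.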
Interesting! Will it work well if the "magic number" is randomly shifted slightly per NPC voice, or is it sensitive and highly sample-set dependent?

********


A Problem Worthy of Attack
Proves Its Worth by Fighting Back
spraff.net: don't laugh, I'm still just starting...

This topic is closed to new replies.
