How to get started with voice recognition?

wenching · 2003-09-05T06:47:51

Hi there, I plan to write my very own voice recognition engine, but I do not know how to start writing one. I know it requires biometrics. Wow.. that sounds like a pretty advanced topic. Does voice recognition use neural networks, e.g. for training purposes? I think voice recognition needs to identify sound waves... Any ideas? Is there any sample app I can refer to, ideally with the source code provided? Any help? Thanks. Regards, Chua Wen Ching

wenching

Author

122

September 03, 2003 09:58 AM

Hi botman,

the link http://www.isip.msstate.edu/projects/speech/ doesn't work?

Any other link?

Thanks.

Regards,
Chua Wen Ching

"Very new to games I think"

wenching

Author

122

September 03, 2003 10:00 AM

What does FFT stand for?

Hmm.. I checked out the FFTW website, but I'm not sure about it and didn't see the keywords voice/speech?

Comparing 50%, hmm.. will that be accurate?

Thanks.

Regards,
Chua Wen Ching

"Very new to games I think"

RPGeezus

216

September 03, 2003 10:46 AM

FFT is a Fast Fourier Transform.

A Fourier Transform is used to convert a signal into its frequency components. Most of the stuff you'll find on the net will probably deal with Fourier transforms and images, but it applies to sound as well.

In the case of sound it can make it much easier to work with and understand. Within the realm of voice recognition it provides you with a nice way of generating sound "fingerprints".
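To make the "frequency components" idea concrete, here is a minimal sketch in Python (chosen for brevity; the thread's target is C#, and the same loops port directly). It uses a naive O(n^2) discrete Fourier transform rather than a fast one, and the test tone, `N`, and the function names are assumptions made up for illustration.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform of a list of samples."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def magnitude_spectrum(samples):
    """Magnitudes of the first n/2 bins (for real input the rest mirror them)."""
    return [abs(c) for c in dft(samples)[: len(samples) // 2]]

# Hypothetical test signal: 64 samples of a sine completing exactly 5 cycles.
N = 64
tone = [math.sin(2 * math.pi * 5 * t / N) for t in range(N)]

spectrum = magnitude_spectrum(tone)
peak_bin = max(range(len(spectrum)), key=lambda k: spectrum[k])
print("strongest frequency bin:", peak_bin)
```

The list of bin magnitudes is the kind of "fingerprint" meant above: it describes which frequencies are present rather than what the waveform looks like sample by sample.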

Comparing 1 signal, byte by byte, to another will NOT work. If the user speaks a little slower, a little faster, a little softer, or a little louder, then the two signals will appear to have nothing in common. It is also quite possible that a cat meowing may, at the byte level, share 50% of the data that a human speaking their name would.
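A small Python sketch of why the sample-by-sample comparison breaks down, under simplifying assumptions: the "second utterance" is just the first one delayed (circularly) and played at half volume, and the signal itself is a made-up two-tone mix. All names and the tolerance are hypothetical. Changes in speaking speed are not covered here; they also alter the spectrum and need more than this.

```python
import cmath
import math

def magnitude_spectrum(samples):
    """Magnitudes of the first n/2 bins of a naive DFT."""
    n = len(samples)
    return [
        abs(sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n // 2)
    ]

def normalise(values):
    """Scale values to sum to 1 so that overall loudness drops out."""
    total = sum(values) or 1.0
    return [v / total for v in values]

def sample_match_fraction(a, b, tolerance=0.05):
    """Fraction of positions where the two signals agree within tolerance."""
    return sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance) / len(a)

N = 64
original = [
    math.sin(2 * math.pi * 3 * t / N) + 0.5 * math.sin(2 * math.pi * 7 * t / N)
    for t in range(N)
]
# Same sound, delayed by 10 samples (wrapping around) and half as loud.
delayed_quiet = [0.5 * original[(t - 10) % N] for t in range(N)]

raw_match = sample_match_fraction(original, delayed_quiet)
spectral_difference = max(
    abs(x - y)
    for x, y in zip(
        normalise(magnitude_spectrum(original)),
        normalise(magnitude_spectrum(delayed_quiet)),
    )
)
print("fraction of matching samples:", raw_match)
print("largest normalised spectrum difference:", spectral_difference)
```

The two signals disagree at most sample positions, yet their normalised magnitude spectra are the same up to rounding error, which is why the comparison is usually done in the frequency domain.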

Good luck with your project. I'd love to know how you eventually solve this problem.

Will

------------------http://www.nentari.com

Predictor

198

September 03, 2003 10:47 AM

quote: Original post by wenching
What does FFT stand for?

Hmm.. I checked out the FFTW website, but I'm not sure about it and didn't see the keywords voice/speech?

Comparing 50%, hmm.. will that be accurate?

"FFT" is short for "fast Fourier transform", which is (very) commonly used in digital signal processing. The 50% idea will not work; I think he was kidding.

bucheron

122

September 03, 2003 11:03 AM

Wenching, it seems you are not familiar with the basics of sound and signals in general. It would be reasonable to read an easy introduction first, before tackling the complex task you want to solve:

Introduction to DSP

If you want to know more about FFT, read the first half of chapter 1 of the next link. The tutorial is about the wavelet transform but begins with an intro to FFT. It's really easy to get (the rest is way harder).

The Wavelet tutorial by Rob Polikar

[edited by - bucheron on September 3, 2003 12:05:13 PM]

wenching

Author

122

September 03, 2003 12:02 PM

Well, I have no idea about sound stuff.. but I'm willing to learn...

Hmm..

But FFT is a C library component..

and I am using .NET C#. Hmm.. is there any managed and free FFT?

Hehe! That would be easier for me to use!

Well, have you guys played with sound or voice recognition? Hmm.. if not, how did you get to know this stuff?

Regards,
Chua Wen Ching

"Very new to games I think"

bucheron

122

September 03, 2003 02:10 PM

About C#, you should check http://www.codeproject.com, there should be an FFT routine there. Note that the FFTW library is designed for speed, which is not necessary in your case. It's easy to convert a simple FFT from C/C++ to C#. Converting the code available on Paul Bourke's page shouldn't be a problem, it's just a matter of replacing the pointers with references to an array or a list:

http://astronomy.swin.edu.au/~pbourke/analysis/dft/

ps. You should create a tool in C# that displays the spectrum just to ensure everything is working.
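As a rough idea of what "a simple FFT plus a spectrum display" can look like, here is a sketch in Python (not C#, and not the code from the page linked above). It is a textbook recursive radix-2 Cooley-Tukey FFT with a crude text bar chart. The test signal and the names `fft` and `print_spectrum` are assumptions for illustration; the same structure should port to C# using its complex number type and arrays.

```python
import cmath
import math

def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; the length must be a power of two."""
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])  # transform of even-indexed samples
    odd = fft(values[1::2])   # transform of odd-indexed samples
    result = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        result[k] = even[k] + twiddle
        result[k + n // 2] = even[k] - twiddle
    return result

def print_spectrum(samples, width=40):
    """Print one bar per frequency bin, scaled to the strongest bin."""
    mags = [abs(c) for c in fft(samples)[: len(samples) // 2]]
    peak = max(mags) or 1.0
    for k, m in enumerate(mags):
        print(f"{k:3d} {'#' * round(width * m / peak)}")

# Hypothetical test signal: a strong tone in bin 4 and a weaker one in bin 9.
N = 32
signal = [
    math.sin(2 * math.pi * 4 * t / N) + 0.5 * math.sin(2 * math.pi * 9 * t / N)
    for t in range(N)
]
print_spectrum(signal)
```

Feeding in a known tone like this is a cheap sanity check: if the bars do not show up in the expected bins, the transform or the indexing is wrong.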

quote: Well, have you guys played with sound or voice recognition? Hmm.. if not, how did you get to know this stuff?

Not directly with voice, but with sound: I tried some DSP stuff and sound generation (square/triangle waves that you then apply filters to). I'm more into image processing though.

Timkin

864

September 03, 2003 08:34 PM

The point of computing the FFT of a signal is typically to determine the frequency components of that signal. However, for speech signals in particular, this doesn't uniquely identify the speaker. Computing the cepstrum, on the other hand (compare "cepstrum" with "spectrum"), does typically uniquely identify a speaker, since you can use it to determine the components of the vocalisation, i.e. the glottal excitation and the vocal tract emission. If you look up cepstral analysis you'll find that you can compute the cepstrum utilising just the FFT, the iFFT (inverse transform), log10 and summations. There is an immense amount of literature on the web that gives more information than you could ever need for processing audio signals using cepstral analysis. Try here for a detailed starting point:

http://mi.eng.cam.ac.uk/~ajr/SA95/SpeechAnalysis.html
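As a hedged illustration of the recipe (transform, log10 of the magnitude, inverse transform), here is a Python sketch of the real cepstrum. It uses a naive DFT instead of an FFT to stay short, and the "voiced" signal is synthetic: a decaying pulse train standing in for the glottal excitation, smoothed by a two-tap filter standing in for the vocal tract. `N`, `PERIOD`, `DECAY`, the filter, and the search range are all assumptions, not values from real speech.

```python
import cmath
import math

def dft(values, inverse=False):
    """Naive O(n^2) DFT; inverse=True applies the inverse transform."""
    n = len(values)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t, v in enumerate(values))
        for k in range(n)
    ]
    return [c / n for c in out] if inverse else out

def real_cepstrum(samples, floor=1e-12):
    """Inverse transform of log10 of the magnitude spectrum."""
    log_magnitude = [math.log10(abs(c) + floor) for c in dft(samples)]
    return [c.real for c in dft(log_magnitude, inverse=True)]

# Synthetic stand-in for a voiced sound (assumed values, not real speech).
N, PERIOD, DECAY = 128, 16, 0.6
excitation = [0.0] * N
for m in range(N // PERIOD):
    excitation[m * PERIOD] = DECAY ** m  # one pulse every PERIOD samples
# Two-tap smoothing applied circularly (index -1 wraps to the last sample).
voiced = [excitation[t] + 0.5 * excitation[t - 1] for t in range(N)]

cepstrum = real_cepstrum(voiced)
# Skip the lowest quefrencies, where the filter's contribution is concentrated.
pitch_quefrency = max(range(8, N // 2), key=lambda q: cepstrum[q])
print("strongest quefrency beyond the filter region:", pitch_quefrency)
```

In this toy setup the strongest high-quefrency peak should land at the pulse spacing, while the filter's contribution stays in the first few coefficients. That separation of excitation from filter is the property described above; real recordings would additionally need framing and windowing.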

If you have any questions about the maths, I (and others) would be happy to help, so feel free to post them in this thread rather than in the maths forum. There's no need to start a new thread.

Good luck,

Timkin

[edited by - Timkin on September 3, 2003 9:35:34 PM]

wenching

Author

122

September 03, 2003 11:47 PM

Thanks..

Well, now the main problem is that there are so many theories here and there.. thanks...

But is there any sample app that runs on a PC, so that I can see how voice recognition works?

Hmm.. at least then I could see how it works.. right now I can only visualise it!

Hehe!

Regards,
Chua Wen Ching

"Very new to games I think"

Predictor

198

September 04, 2003 05:28 AM

quote: Original post by wenching
Well, now the main problem is that there are so many theories here and there.. thanks...

I think your best next step, regardless of which method you use, is to collect audio samples of yourself and a number of unauthorized people (several samples from each person). Regardless of the technique you select, you will at the very least need to test the system when it's done.

More likely than not, you'll also use at least some of this data in constructing the system. Once audio data is obtained, you can try the simplest solutions first and work your way up the ladder of complexity.
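One way to make the testing step concrete, sketched in Python under stated assumptions: suppose whatever method is chosen produces a distance score per recording (smaller means "more like the enrolled voice"), and a recording is accepted when its score is at or below a threshold. The function name and the scores below are made-up placeholders for illustration, not measurements.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_reject_rate, false_accept_rate) for a distance threshold.

    A recording is accepted when its distance score is <= threshold.
    """
    false_rejects = sum(1 for s in genuine_scores if s > threshold)
    false_accepts = sum(1 for s in impostor_scores if s <= threshold)
    return (
        false_rejects / len(genuine_scores),
        false_accepts / len(impostor_scores),
    )

# Placeholder scores: your own recordings vs. recordings of other people.
genuine = [0.10, 0.15, 0.22, 0.40]
impostors = [0.35, 0.55, 0.60, 0.80]

for threshold in (0.2, 0.3, 0.5):
    frr, far = error_rates(genuine, impostors, threshold)
    print(f"threshold={threshold}: false rejects={frr:.2f}, false accepts={far:.2f}")
```

Sweeping the threshold over the collected samples shows the trade-off between rejecting yourself and accepting someone else, and gives a baseline against which more complex methods can be compared.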

This topic is closed to new replies.