Hi botman,
the link http://www.isip.msstate.edu/projects/speech/ doesn't seem to work.
Any other link?
Thanks.
Regards,
Chua Wen Ching
How to get started with voice recognition?
What does FFT stand for?
Hmm.. I checked out the FFTW website, but I'm not sure about it; I didn't see the keywords voice/speech anywhere.
Compare 50%, hmm.. will that be accurate?
Thanks.
Regards,
Chua Wen Ching
"Very new to games I think"
FFT is a Fast Fourier Transform.
A Fourier Transform is used to convert a signal into its frequency components. Most of the stuff you'll find on the net will probably deal with Fourier transforms and images, but it applies to sound as well.
For sound, it can make the signal much easier to work with and understand. Within the realm of voice recognition it provides you with a nice way of generating sound "fingerprints".
Comparing one signal, byte by byte, to another will NOT work. If the user speaks a little slower, a little faster, a little softer, or a little louder, then the two signals will appear to have nothing in common. It is also quite possible that a cat meowing may, at the byte level, share 50% of the data that a human speaking their name would.
Good luck with your project. I'd love to know how you eventually solve this problem.
Will
------------------http://www.nentari.com
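To make the "fingerprint" point concrete, here is a minimal sketch in Python (not from anyone in this thread; the names fingerprint, similarity and tone are made up for illustration, and the normalised magnitude spectrum is only one assumed way to build a fingerprint). It compares a test tone with a softer, time-shifted copy of itself, first sample by sample and then via the FFT.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fingerprint(samples):
    """Hypothetical fingerprint: magnitude spectrum (lower half),
    scaled to unit length so that overall loudness cancels out."""
    mags = [abs(c) for c in fft(samples)[: len(samples) // 2]]
    norm = math.sqrt(sum(m * m for m in mags)) or 1.0
    return [m / norm for m in mags]

def similarity(a, b):
    """Cosine similarity of two unit-length fingerprints (1.0 = same shape)."""
    return sum(x * y for x, y in zip(a, b))

def tone(freq_bin, n=256, gain=1.0, phase=0.0):
    """Synthetic test signal: a sine that fits freq_bin whole cycles in n samples."""
    return [gain * math.sin(2 * math.pi * freq_bin * i / n + phase) for i in range(n)]

a = tone(10)
b = tone(10, gain=0.3, phase=1.0)  # same "sound", but softer and shifted in time
c = tone(40)                       # a genuinely different sound

# Fraction of samples that (nearly) agree when compared one by one.
raw_match = sum(1 for x, y in zip(a, b) if abs(x - y) < 0.05) / len(a)
same = similarity(fingerprint(a), fingerprint(b))
diff = similarity(fingerprint(a), fingerprint(c))
print(f"raw sample match: {raw_match:.2f}")
print(f"fingerprint similarity, same tone: {same:.3f}, different tone: {diff:.3f}")
```

The sample-by-sample match is poor even though a and b are the same tone, while the spectral fingerprints agree. Note this toy only covers loudness and time shift; speaking faster or slower stretches the spectrum too, so real speech needs more than this.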
quote: Original post by wenching
What does FFT stand for?
Hmm.. I checked out the FFTW website, but I'm not sure about it; I didn't see the keywords voice/speech anywhere.
Compare 50%, hmm.. will that be accurate?
"FFT" is short for "fast Fourier transform", which is (very) commonly used in digital signal processing. The 50% idea will not work; I think he was kidding.
Wenching, it seems you are not familiar with the basics of sound and signals in general. It would be reasonable to read an easy introduction first before tackling the complex task you want to solve:
Introduction to DSP
If you want to know more about the FFT, read the first half of chapter 1 of the next link. The tutorial is about the wavelet transform but begins with an intro to the FFT. It's really easy to get (the rest is way harder).
The Wavelet tutorial by Rob Polikar
[edited by - bucheron on September 3, 2003 12:05:13 PM]
Well, I had no idea about sound stuff.. but I'm willing to learn...
Hmm..
but FFT is a C library component..
and I am using .NET C#, hmm.. is there any free, managed FFT?
Hehe! That would be easier for me to use!
Well, have you guys played with sound or voice recognition? Hmm.. if not, how did you get to know this stuff?
Regards,
Chua Wen Ching
"Very new to games I think"
About C#: you should check http://www.codeproject.com, there should be an FFT routine there. Note that the FFTW library is designed for speed, which is not necessary in your case. It's easy to convert a simple FFT from C/C++ to C#. Converting the code available on Paul Bourke's page shouldn't be a problem; it's just a matter of replacing the pointers with references to an array or a list:
http://astronomy.swin.edu.au/~pbourke/analysis/dft/
ps. You should create a tool in C# that displays the spectrum just to ensure everything is working.
quote: Well, have you guys played with sound or voice recognition? Hmm.. if not, how did you get to know this stuff?
Not directly with voice but with sound: I tried some DSP stuff and sound generation (square/triangle waves with filters applied to them). I'm more into image processing though.
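For anyone following along, the "display the spectrum to check everything works" suggestion can be sketched like this. It is Python rather than C#, it uses a plain O(n^2) DFT instead of a real FFT, and the function names and the 8000 Hz test signal are assumptions made up for the example, not code from Paul Bourke's page.

```python
import cmath
import math

def dft(samples):
    """Direct discrete Fourier transform: slow, but short and easy to port."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def spectrum(samples, sample_rate):
    """List of (frequency in Hz, magnitude) pairs for the lower half of the bins."""
    n = len(samples)
    bins = dft(samples)[: n // 2]
    return [(k * sample_rate / n, abs(c) / n) for k, c in enumerate(bins)]

def peak_frequency(samples, sample_rate):
    """Frequency of the strongest bin."""
    return max(spectrum(samples, sample_rate), key=lambda fm: fm[1])[0]

def show(samples, sample_rate, width=40):
    """Crude text 'display': one bar per frequency bin."""
    spec = spectrum(samples, sample_rate)
    top = max(m for _, m in spec) or 1.0
    for freq, mag in spec:
        print(f"{freq:7.1f} Hz | " + "#" * round(width * mag / top))

SAMPLE_RATE = 8000  # assumed sample rate for the synthetic test signal
N = 64
# Known input: a 1000 Hz sine plus a weaker 2500 Hz sine.
test_signal = [
    math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2500 * i / SAMPLE_RATE)
    for i in range(N)
]

show(test_signal, SAMPLE_RATE)
print("strongest component:", peak_frequency(test_signal, SAMPLE_RATE), "Hz")
```

Feeding in a signal whose frequencies you already know is the point: if the bars do not show up at 1000 Hz and 2500 Hz, the transform (or the port of it) is broken.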
The point of computing the FFT of a signal is typically to determine its frequency components. However, for speech signals in particular, this doesn't uniquely identify the speaker. Computing the cepstrum, on the other hand (compare "cepstrum" with "spectrum"), typically does a far better job of identifying a speaker, since you can use it to determine the components of the vocalisation; i.e., the glottal excitation and the vocal tract emission. If you look up cepstral analysis you'll find that you can compute the cepstrum utilising just the FFT, the iFFT (inverse transform), log10 and summations. There is an immense amount of literature on the web that gives more information than you could ever need for processing audio signals using cepstral analysis. Try here for a detailed starting point:
http://mi.eng.cam.ac.uk/~ajr/SA95/SpeechAnalysis.html
If you have any questions about the maths, I (and others) would be happy to help, so feel free to post them in this thread rather than the maths forum. There's no need to start a new thread.
Good luck,
Timkin
[edited by - Timkin on September 3, 2003 9:35:34 PM]
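As a rough illustration of the recipe above (FFT, log10 of the magnitude, inverse FFT), here is a minimal Python sketch of the real cepstrum. The toy signal is an assumption invented for the example: two pulses 20 samples apart stand in for the glottal excitation, and a two-tap filter stands in for the vocal tract. Real speech would need framing and windowing, which this leaves out.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(bins):
    """Inverse FFT via the conjugate trick: conj(fft(conj(X))) / N."""
    n = len(bins)
    return [c.conjugate() / n for c in fft([complex(b).conjugate() for b in bins])]

def real_cepstrum(samples, eps=1e-12):
    """Real cepstrum: inverse FFT of log10 of the magnitude spectrum.
    eps only guards against log10(0)."""
    log_mag = [math.log10(abs(c) + eps) for c in fft(samples)]
    return [c.real for c in ifft(log_mag)]

N = 256
PERIOD = 20          # assumed spacing of the two excitation pulses, in samples
tract = [1.0, 0.5]   # made-up two-tap "vocal tract" filter

# Toy signal: the filter response triggered twice, second time more weakly.
signal = [0.0] * N
for start, gain in ((0, 1.0), (PERIOD, 0.6)):
    for i, h in enumerate(tract):
        signal[(start + i) % N] += gain * h

ceps = real_cepstrum(signal)
# Skip the lowest quefrencies, where the filter itself dominates.
pitch_lag = max(range(10, N // 2), key=lambda q: ceps[q])
print("excitation period found at lag:", pitch_lag)
```

The filter shows up only at the very low end of the cepstrum, while the pulse spacing shows up as a separate peak further out, which is the separation of excitation and vocal tract described above.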
Thanks..
Well, now the main problem is that there are so many theories here and there.. thanks...
but is there any sample app that runs on a PC, so that I can see how voice recognition works..
Hmm.. at least I can see how it works.. I can only visualise it!
Hehe!
Regards,
Chua Wen Ching
"Very new to games I think"
quote: Original post by wenching
Well, now the main problem is that there are so many theories here and there.. thanks...
I think your best next step, regardless of which method you use, is to collect audio samples of yourself and a number of unauthorized people (several samples from each person). Whichever technique you select, you will at the very least need to test the system when it's done.
More likely than not, you'll also use at least some of this data in constructing the system. Once audio data is obtained, you can try the simplest solutions first and work your way up the ladder of complexity.
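One way such a test could be scored, sketched in Python: assume whatever method you pick produces a similarity score per sample (higher means "more like the authorized speaker"), then count how often authorized samples are rejected and unauthorized ones accepted. The function names are hypothetical and the score lists are made-up placeholders, not measurements.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false accept rate, false reject rate) at a given threshold.
    A sample is accepted when its score is >= threshold."""
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (
        false_accepts / len(impostor_scores),
        false_rejects / len(genuine_scores),
    )

def best_threshold(genuine_scores, impostor_scores):
    """Pick the observed score that minimises false accepts + false rejects."""
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    return min(
        candidates,
        key=lambda t: sum(error_rates(genuine_scores, impostor_scores, t)),
    )

# Placeholder scores purely for illustration (not real data):
genuine = [0.91, 0.87, 0.78, 0.95, 0.66]         # samples of the authorized speaker
impostor = [0.35, 0.52, 0.71, 0.40, 0.28, 0.60]  # samples of unauthorized people

far, frr = error_rates(genuine, impostor, threshold=0.7)
print(f"threshold 0.70: false accepts {far:.2f}, false rejects {frr:.2f}")
print("threshold with fewest total errors:", best_threshold(genuine, impostor))
```

If some of the recordings are used to tune the threshold, keep the rest aside and only report error rates on recordings the system has not seen.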
This topic is closed to new replies.