Wav file frequency/intensity info
How do you convert the data in a wav file into frequency/intensity information?
------------------------------"My sword is like a menacing cloud, but instead of rain, blood will pour in its path." - Sehabeddin, Turkish Military Commander 1438.
Hehe, with a Fast Fourier Transform and/or a Power Spectrum Analysis function.
(Unless you have a mathematics and/or electrical engineering degree, don't write your own; go find one to use.)
- The trade-off between price and quality does not exist in Japan. Rather, the idea that high quality brings on cost reduction is widely accepted.-- Tajima & Matsubara
All I want to be able to do is what winamp/media player 7 does with the frequency bars and the wave form line.
They use some form of a power density function, likely an audio-specific derivation. It seems to group certain frequency bands together (the x axis) and display the power logarithmically (the y axis).
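As a rough sketch of that idea (not WinAmp's actual method, which nobody in this thread knows), the function below sums a linear power spectrum into logarithmically spaced bands and converts each band to decibels. The names `group_into_bands` and `to_db`, the band spacing, and the -120 dB floor are all assumptions made for illustration.

```python
import math

def group_into_bands(power, n_bands):
    """Sum a linear power spectrum into n_bands log-spaced bands.

    power[0] is the DC bin and is skipped; every band gets at least one bin.
    """
    n = len(power)
    if not 1 <= n_bands < n:
        raise ValueError("need 1 <= n_bands < len(power)")
    bands = []
    lo = 1
    for b in range(1, n_bands + 1):
        # Upper edge grows geometrically from bin 1 up to bin n.
        hi = max(lo + 1, round(n ** (b / n_bands)))
        hi = min(hi, n)
        bands.append(sum(power[lo:hi]))
        lo = hi
    return bands

def to_db(p, floor=1e-12):
    """Convert a power value to decibels; the floor avoids log10(0)."""
    return 10.0 * math.log10(max(p, floor))

# Hypothetical flat spectrum of 512 bins grouped into 16 display bars.
bars = [to_db(p) for p in group_into_bands([1.0] * 512, 16)]
```

With a flat spectrum the higher bars come out taller, simply because each one covers more bins; a real display would probably compensate for that.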
The topic is called 'digital signal processing' in engineering applications, it may have a different name in acoustics...
They probably use some cobbled APS estimating function.
...
Knowing how to use these functions, and knowing how to write these functions are two very different things. You really need to go find one... I wrote my own FFT, and I still don't know exactly how they do it in WinAMP. I could never get mine to look as "pretty". I don't know how they twiddled the FFT, or if they just used something simpler that I don't know about...
Edited by - Magmai Kai Holmlor on December 26, 2000 2:08:15 AM
I don't know exactly what winamp did, but I've spent a lot of time with audio signal processing, so I can take some educated guesses:
- First of all, winamp is definitely using an FFT to get the spectrogram you see. No two ways about it.
- A common mistake beginners make when trying to do spectrograms is to just take an FFT of N samples of data. This is the wrong way to do it. You must window the data first, or it'll look like crap. The Hann and Hamming windows are easy to implement and use (and it's "Hann", not "Hanning", a common mistake even in textbooks).
- When windowing, you should usually use some overlap. If it's for a real-time display, I'm not sure that's necessary.
- If you want 'N' frequency bins, you have to do your window/FFT on at least 'N' samples.
- It looks like winamp throws away a lot of the higher frequencies. It's not showing you all 22.05kHz of visible spectrum that exists in a CD-quality recording (bonus cookie to whoever knows why you can only see 22.05kHz of frequency if the sampling rate is 44.1kHz). If you get some acoustically simple music playing from winamp, you'll notice that only about the first 1/4 of the spectrum is where the "musical" notes are; the rest consists of the upper frequency harmonics. If you have a tune with just a singer, you'll notice the right side responds to any "t" or "s" sounds in the voice, but the open-throated notes are all on the left.
- Winamp's FFT display has falloff, even if you set the falloff to zero. If you have no falloff, your spectrogram looks very jumpy (and in fact it usually is--the human ear has auditory persistence built in). So throwing in some maximum rate at which a spectral line can fall will make it look prettier.
- I haven't done the math, but I'm sure they can sample a full FFT frame many more times than they're displaying it. They might take the maximum value for each spectral line per frame, then display that value when it comes time for a screen update. This would also make the display less jumpy and you wouldn't miss any quick transients.
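The windowing and overlap points in the list above can be sketched roughly as follows. This uses the textbook symmetric Hann and Hamming formulas; the `windowed_frames` helper, the frame size and the hop size are hypothetical choices, and the FFT itself is left to whatever library you end up using.

```python
import math

def hann(n):
    """Symmetric Hann window: 0.5 - 0.5*cos(2*pi*i/(n-1))."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def hamming(n):
    """Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*i/(n-1))."""
    if n < 2:
        return [1.0] * n
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def windowed_frames(samples, size, hop, window=hann):
    """Yield overlapping frames, each multiplied by the window.

    hop < size gives overlap; hop == size // 2 is 50% overlap.
    Each yielded frame is what you would hand to the FFT.
    """
    w = window(size)
    for start in range(0, len(samples) - size + 1, hop):
        yield [s * c for s, c in zip(samples[start:start + size], w)]

# Hypothetical usage: 10 samples, frames of 4 with 50% overlap.
frames = list(windowed_frames([1.0] * 10, size=4, hop=2))
```

Both windows taper the ends of each frame toward zero (Hamming stops at 0.08 rather than 0), which is what suppresses the smearing you get from a raw N-sample FFT.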
As a final note, if this discussion interests anybody, I heartily suggest you look into it. Digital Signal Processing is the second-fastest and second-highest-paying electrical engineering field (device physics is the highest--if you like quantum physics, go for it), and it's just a boatload of fun. There are numerous applications for DSP (video & audio), and you'll be in high demand if you're a proficient programmer who specializes in DSP, or even the other way around.
I could talk for hours about this subject, but I'll stop now.
Edited by - Stoffel on December 26, 2000 3:04:31 PM
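The falloff and peak-hold guesses from the post above could look something like this. It is only a sketch of the idea, not WinAmp's code; `peak_hold`, `apply_falloff` and the `max_drop` parameter are hypothetical names.

```python
def peak_hold(spectra):
    """Per-bin maximum over all spectra computed since the last screen update."""
    return [max(column) for column in zip(*spectra)]

def apply_falloff(displayed, target, max_drop):
    """Bars jump up immediately but fall by at most max_drop per update."""
    return [max(t, d - max_drop) for d, t in zip(displayed, target)]

# Two FFT frames arrived between screen updates.
latest = peak_hold([[1, 5, 2], [3, 4, 0]])

# Current bar heights, new target heights, and a fall limit of 1 per update.
bars = apply_falloff([10, 10, 2], [0, 9.5, 8], max_drop=1)
```

Here `latest` is `[3, 5, 2]` and `bars` is `[9, 9.5, 8]`: the first bar only drops by one step, while the third rises straight to its new value.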
quote:
- If you want 'N' frequency bins, you have to do your window/FFT on [exactly 2N] samples.
quote:
(bonus cookie to whoever knows why you can only see 22.05kHz frequency if the sampling rate is 44.1kHz.)
The FFT produces an odd function (that's a specific mathematical term), kinda like a sine wave. It's the same on both sides of the y axis. Exactly the same. Sooo you get the same data twice when you do an FFT, and you usually don't display it twice. Intuitively, you can't determine the frequency of one point, you need at least two. You can only deduce so much out of the data you have; in particular, you can't determine a frequency without sampling at a rate that's at least twice as fast as that frequency. CD audio is sampled at 44,100Hz, so the highest frequency you can determine is 22,050Hz. The FFT output spreads from -22,050Hz to 22,050Hz, but like the sine, it repeats, so it's the same if you display from 0 to 44,100.
Now one of the ways mp3 gets the compression that it does is by whacking off data in the higher frequency ranges (that you can barely hear) and adding it back in when it's needed (cymbal crash). So, if you look at mp3 playback on a spectrograph, it sharply drops off at 16kHz (for 128kbps), a little higher for higher bit rates.
The WinAmp spectrum doesn't show anything above about 16kHz.
Also, pure FFT output is in complex data points. You need to apply some power function to get the graph everyone is used to seeing (even though the raw FFT looks much neater when plotted in 3D). You usually take the squared magnitude of the complex point and divide by N².
You can't FFT every frame multiple times and only use 3% of the cpu power to do it. An FFT is on the order of n log(n), so the more points you have, the longer it takes (more than linear). It's faster to do 32 1024-point FFTs than to do 1 32768-point FFT. You get the same power with both methods if you sum the 32 'little' ones (except the bucket size is larger for the 32).
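To put rough numbers on that comparison, here is the n log2(n) estimate worked out for both cases, along with the width of one frequency bucket at the CD sample rate. The operation counts ignore constant factors, so treat them as a ratio rather than a timing; `fft_cost` is a hypothetical helper name.

```python
import math

def fft_cost(n):
    """Rough operation count for one n-point FFT, ignoring constant factors."""
    return n * math.log2(n)

SAMPLE_RATE = 44100  # CD audio, as discussed in the thread

many_short = 32 * fft_cost(1024)   # 32 * 1024 * 10 = 327,680
one_long = fft_cost(32768)         # 32768 * 15     = 491,520
ratio = one_long / many_short      # 1.5

# Width of one frequency bucket for each transform length.
short_bin_hz = SAMPLE_RATE / 1024    # about 43.07 Hz per bin
long_bin_hz = SAMPLE_RATE / 32768    # about 1.35 Hz per bin
```

So by this estimate the single long transform costs about 1.5 times as much, and the price of the short ones is buckets roughly 32 times wider.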
The WinAmp FFT updates at up to 70Hz, so they must do a rolling sum with the FFT data.
And finally, they have very good spectral containment. I have not been able to duplicate their speed and have a nicely synchronized visual graph.
As you can see UR, this is not a trivial task. Unless someone has an acoustic APS that works well, and has made it open source, and you find it.
...
P.S. You can calculate APSs without FFTs, by using other more optimized algorithms, like Hartley transforms (which do a similar, but not exactly the same, thing).
Hmm, this is turning out to be much more complex than I had first thought. So for understanding's sake, I would need to feed raw wav data into an FFT, and then it would gimme what?
I thank you all for your help by the way
quote: Original post by Magmai Kai Holmlor
The FFT produces an odd function (that's a specific mathematical term), kinda like a sine wave. It's the same on both sides of the y axis. Exactly the same. Sooo you get the same data twice when you do an FFT, and you usually don't display it twice. Intuitively, you can't determine the frequency of one point, you need at least two. You can only deduce so much out of the data you have; in particular, you can't determine a frequency without sampling at a rate that's at least twice as fast as that frequency. CD audio is sampled at 44,100Hz, so the highest frequency you can determine is 22,050Hz. The FFT output spreads from -22,050Hz to 22,050Hz, but like the sine, it repeats, so it's the same if you display from 0 to 44,100.
Close. It produces an even function (cosine is even, sine is odd), but that's the effect rather than the cause. The FFT calculates both positive and negative frequencies. For a real (no imaginary part) wave, the magnitudes at +f and -f are always the same (where f is some frequency).
This is the cause of your statement "you can't determine a frequency without sampling at a rate at least twice as fast", which is known as the Nyquist sampling theorem.
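Both points can be checked numerically. The sketch below uses a naive O(n²) DFT rather than a real FFT, purely so it stays self-contained; the 16 Hz sample rate and the 3 Hz and 13 Hz test tones are made-up values chosen so that bin k lines up with k Hz.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform, used here only for clarity."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

SAMPLE_RATE = 16  # hypothetical: 16 samples per second, so Nyquist is 8 Hz
N = 16            # one second of data, so bin k corresponds to k Hz

def tone(freq_hz):
    """One second of a real sine wave at freq_hz."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(N)]

mags = [abs(v) for v in dft(tone(3))]

# Real input gives mirrored magnitudes: bin k matches bin N - k.
mirrored = all(abs(mags[k] - mags[N - k]) < 1e-9 for k in range(1, N))

# A 13 Hz tone is above Nyquist; sampled at 16 Hz, its magnitudes are
# indistinguishable from those of the 3 Hz tone.
alias_mags = [abs(v) for v in dft(tone(13))]
aliased = all(abs(a - b) < 1e-9 for a, b in zip(mags, alias_mags))
```

The 3 Hz tone shows up with equal magnitude in bins 3 and 13, which is why only the first half of the output is normally displayed.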
Edited by - Stoffel on December 27, 2000 2:27:31 PM
December 27, 2000 11:25 PM
OK, I've done some research and I've discovered this web page: http://www.dspdimension.com/html/dftapied.html. At the bottom of the page, listing 1.4 is an FFT function that calculates the sin and cos parts into an array.
How could this be converted into frequency/intensity information? Am I missing something that is obvious? Knowing me, probably.
Yeah, and thanks for all your time and help too, guys. I appreciate it.
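One plausible way to do that conversion, assuming the routine leaves the cosine (real) and sine (imaginary) parts of each bin in two arrays indexed by bin number: bin k corresponds to the frequency k * sample_rate / n_samples, and its intensity is the length of the (cos, sin) vector. The function name, parameter names and array layout here are assumptions, not taken from the linked listing.

```python
import math

def to_frequency_intensity(cos_part, sin_part, sample_rate, n_samples):
    """Return (frequency_hz, magnitude) pairs for bins 0 .. n_samples // 2.

    cos_part[k] and sin_part[k] are assumed to hold the real and imaginary
    parts of bin k for a transform of n_samples time-domain samples.
    """
    last_bin = min(len(cos_part), len(sin_part), n_samples // 2 + 1)
    pairs = []
    for k in range(last_bin):
        frequency_hz = k * sample_rate / n_samples
        magnitude = math.hypot(cos_part[k], sin_part[k])
        pairs.append((frequency_hz, magnitude))
    return pairs

# Hypothetical 8-point transform of audio sampled at 8000 Hz.
cos_part = [0, 3, 0, 0, 0, 0, 0, 3]
sin_part = [0, -4, 0, 0, 0, 0, 0, 4]
pairs = to_frequency_intensity(cos_part, sin_part, 8000, 8)
```

In this made-up example `pairs[1]` is `(1000.0, 5.0)`, meaning a component at 1000 Hz with magnitude 5. Only the bins up to half the sample rate are returned, since the rest mirror them for real input.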
This topic is closed to new replies.