Advertisement

fourier transforms

Started by January 27, 2003 12:58 PM
20 comments, last by walkingcarcass 22 years ago
quote:
Original post by Anonymous Poster
Very important is the usage of windows (hamming,hanning(!) etc) since it''s the only way to make a useful thing out of the Fourier Transform.


"There''s no such thing as a Hanning window. The man''s name is Hann. It''s the Hann window. Would you like it if people started putting ''ing'' at the end of your name?"
- paraphrased from fred harris, of the Blackmun-harris window & the deinitive IEEE paper on windows (and yes, that''s little-f, little-h.)

Otherwise, nice post. =)


Think Liberally..
quote:
Original post by walkingcarcass
oh dear my brain still lacks the eureka monent in this...


Don''t be too hard on yourself--it''s tough to get the first time.

quote:

let''s say i have a buffer of A amplitude samples, with a rate of B samples per second, and the buffer represents A/B seconds of sound

at any point buffer[X], how much of the signal is made up of a C hertz sine wave?



It''s not useful to look at the frequency spectrum on a sample-by-sample basis. If you wanted to look at just one sample like you say, the only sinusoid you can use is the 0Hz sinusoid (DC value), which would be proportional to the sample itself. Pretty useless.

What you really do (and what the AP was getting at) is divide your buffer into frames . Let''s say you have 10240 samples. What you could do is split that buffer into 1024-sample frames. Perform the FFT on each frame, and you will get the frequency spectrum for that frame. In other words, you would have 10 sets of spectra, with each set being the sectrum of the signal in the given frame.

Changing the size of the frame has two effects:
- if a frame is shorter, you have less frequency resolution but can track a changing signal better
- if a frame is longer, you have greater frequency resolution but can''t track a changing signal as well

So there''s your key tradeoff. What you usually do is figure out a frame size based on either your minimum frequency resolution needed or how fast your signal''s changing that you want to track. That sets your frame size. Also, you window (already covered) and usually overlap your frames, so the first frame for our example of 1024 frame size would start at sample 0, the second at sample 512, etc. This is called 50% overlap (should be easy to see why). This can give the illusion of more data points, but you have to be careful about fooling yourself. One of the H windows (Hamming or Hann, forget which) will give you a perfect unity envelope with 50% overlap.

If it''s still not clear, you should try to work on an example of a 4-point frame, and explain what each point of the FFT (which would give you 4 data points as well) actually means.

Think Liberally..
Advertisement
quote:
Original post by walkingcarcass at any point buffer[X], how much of the signal is made up of a C hertz sine wave?


there''s your problem. you can''t determine it _AT_A_POINT_. a point does not have a sinewave in it. a timeregion does. a wave/part of a wave does. you determine how a timeslice is built up by frequencies/sinewaves.

over that timeslice, those frequencies have a constant factor, sin(t), sine(2t), sine(4t), etc.. they all have a constant factor, summed together they build up exactly the wave segment you wanted.

so for a given point at t on your whole soundbuffer, determine in what timeslice it is (what chunk of say 1024 samples), transform that chunk with fft, and look up wich frequency you want. and any other point at t2 in the same timeslice/chunk has the same frequency factors.

"take a look around" - limp bizkit
www.google.com
If that's not the help you're after then you're going to have to explain the problem better than what you have. - joanusdmentia

My Page davepermen.net | My Music on Bandcamp and on Soundcloud


>at any point buffer[X], how much of the signal is made up of a >C hertz sine wave?

Restating what Davepermen said: a sine wave extends over all time [Now, we''re talking in a mathematical way - no windowing here]. Thus if you say you have a C Hz sine wave at a time X you will have it for all times <X and >X. It is no meaning to assign a frequency to a specific point(or time).
I you have use frames/windows as mentioned before, then you can say things like: In frame 3 (meaning _all__over_ frame 3)there are 10% of the energy in the frequence band 0-20Hz. Suddenly, by appying a moving window you have started to localize the sinus wave! like magic!

And this has profound implications. I physics we talk a lot about Heisenbergs incertainty relation (sorry if my Scandinavian spelling hurts ) Basically the better you know the position of an electron the less certain you are about it''s speed.
In the signal processing game the relation goes: the shorter window you use (meaning more precise positioning in time) the less certain you''re about the frequency.

I won''t go deeper into this, but to help motivate you to further insights: There are multidimensional fourier transforms, where one talks about spatial frequencies. This is used a lot in digital image compression, where things like DCT (discrete cosinus transform) are very much similar to DFT. You can apply Fourier methods everywhere! [grin]

And it''s quite a good thing to calculate a 4-point FFT. And then a 8... ... and also 1024-point.
But there is a real strange thing about DFT in frequence domain, frequences are "folded" and you really need a book on that if you only have seen ordinary FT before.


/Ola


thanks for the technical pointers, but still noone''s said how to, in practice, determine the frequency composition for a given part of the sound data. a simple code example, please, anything to make me slap my head and say "d''oh!"

********


A Problem Worthy of Attack
Proves It''s Worth by Fighting Back
spraff.net: don't laugh, I'm still just starting...
quote:
Original post by walkingcarcass
thanks for the technical pointers, but still noone's said how to, in practice, determine the frequency composition for a given part of the sound data. a simple code example, please, anything to make me slap my head and say "d'oh!"



OK, I'll try.

You take a part of sound data and multiply each sample with the window function. Then you apply the Fourier transform to the result and get the frequency distribution at the point where the maximum of window function was located.

                       part to analyzeSound ~~~~~~~~~~~~~~~~|~~~~~~~~~~~~~~~~|~~~~~~~~                            ______      Window function |____/      \____| multiply those together and apply Fourier transfor to the result.       the result      |--~~/\/\/\/\~~--|   


Actually you can apply Fourier transform directly to the some part of sound data and you will get some result, but result will be bad because of discontinuities at the beginning and end of data. This is why you need window function.

Some pseudocode:


      cont int length;double windowFunction[length];Frequencies getFrequencies(Sample[] sound, int at){  Sample buffer[length];  for(int i = 0; i < length; i++)    buffer[i] = sound[at - length / 2 + i] * windowWindowFunction[i]; // MOST IMPORTANT LINE  return doFourierTransform(buffer);}      



[edited by - Advanced Bug on January 30, 2003 5:31:32 AM]

[edited by - Advanced Bug on January 30, 2003 5:39:28 AM]
Advertisement
thanks, but what data is in windowWindowFunction, what operations does doFourierTransform do, what kind of structure is Frequencies, and why is the length divided by 2?

********


A Problem Worthy of Attack
Proves It''s Worth by Fighting Back
spraff.net: don't laugh, I'm still just starting...
I don't know about the FFT or any of the advanced math, but the code is easy: the window function appears to hold a single cosine wave from -pi to pi, stretched over length. The 'at - length / 2 + i' centers the samples taken around the parameter at (it takes length/2 samples before and length/2 samples after).

To fill windowFunction:

for(i = 0; i < length; ++i)
windowFunction[i+0] = MaxValue*((cos((i/length*2-1)*3.14159)+1)*2)

the +0 is to avoid it being detected as the board-code for italic

[edited by - Extrarius on January 30, 2003 8:55:00 AM]
"Walk not the trodden path, for it has borne it's burden." -John, Flying Monk
quote:

but what data is in windowWindowFunction


It depends on what window function you choose. A Hann window is a simple window formula.

    const int length = 512;double windowFunction[length];for(int n = 0; n < length; ++n)    windowFunction[n] = 0.5*(1-cos(2*pi*n/length))    

quote:

what kind of structure is Frequencies


You should no look at the code as something you just plug into the compiler as it is. Frequencies in this case would be an array of the same length as the sample buffer. Each entry in the array represents the amplitude of a frequency (the fourier transform actually results in complex values, but it's the absolute value of these complex values you're generally interested in).
quote:

why is the length divided by 2


I assume you mean '/2' in 'sound[at - length / 2 + i]'

That is because the function getFrequencies takes a parameter 'at', which is a point in the sample buffer where you want to extract frequencies, and you pick a window of size 'length' around this point. '/2' is just to get half the window size on each side of 'at'. Don't worry about that.
quote:

what operations does doFourierTransform do


Now this is where all magic happens. The discrete fourier transform is defined as
X(k) = sum from n=0 to n=length-1, x(n)*exp(-j*2*pi*k*n/length) 

Where x(n) is the sound buffer, and X(n) is the frequency components of the sound buffer. doFourierTransform could look something like this, in pseudo-code.
complex X[length];double  F[length];for(k = 0; k < length; ++k){    X[k] = 0;    for(n = 0; n < length; ++k)    {        X[k] += x[n]*exp(complex(0, 2*pi*k*n/length));    }    F[k] = abs(X[k]);}return F; 

complex is a, say, class for complex numbers.
F is the frequency spectrum of the buffer.

As you can see, the discrete fourier transform (DFT) is an O(n2) operation. As the buffer grows, the time taken to get the frequencies grows squared. The fast fourier transform (FFT) is another way to get the DFT, which uses symmetries in the different values of n and k, and the corresponding values of the exponential function, to greatly reduce the number of calculations needed, but on the other hand, it requires the buffer to be of power of two length. The main idea behind FFT is to break the buffer down into smaller parts and perform the DFT on these instead. Since splitting the buffer into two reduces the DFT-time by four, we have reduced the time by one half. But we don't stop here, so we keep on splitting all the way down to a length of 2 samples. The FFT is a O(n*logn) operation.


[edited by - Brother Bob on January 30, 2003 8:57:42 AM]
And if the above post makes you think that sound spectrum analyzing is somethink very complicated you can try this Java applet:

Click here

The source code in Java is here.


I really hope it will work since it is my first applet ever. Took me about an hour to write (most of it reading the Java API reference ). It uses Discrete Fourier Transform (not FFT).


IF ARROW KEYS INITIALLY DON'T WORK, CLICK YOUR MOUSE ON AN APPLET.

The upper graph is for frequencies, the middle one for original sound and the lower one for sound multiplied with the window function.

[edited by - Advanced Bug on January 31, 2003 6:11:32 AM]

This topic is closed to new replies.

Advertisement