
standard deviation: what's it mean? (excuse the pun)

Started by January 09, 2003 02:41 AM
17 comments, last by Mnsr Cola 22 years, 1 month ago
What exactly is the standard deviation of a distribution? I need a random number function that returns a value based upon a Gaussian distribution (mean of 0) with a spread between -1 and 1. I thought that would mean the standard deviation should be 1, but as usual I'm totally wrong. So, I'd like to know how to calculate the SD required to get the distribution model I require. Thanks for any assistance.
the standard deviation is calculated as:

r^2 = (1/(N - 1)) * sum((xi - m)^2)

the sum runs over 0 <= i <= N - 1

where m is the signal's mean, N is the number of samples, xi is the value of the i-th sample, and r^2 is the square of the standard deviation (hence you have to take the sqrt of it). sorry for the lousy rendering of a mathematical formula - i'm just not very good with the extended charset.

Hope this helps,
Crispy

edit: added casts for 1 / (N - 1) to make it clearer

[edited by - crispy on January 9, 2003 6:02:21 AM]
"Literally, it means that Bob is everything you can think of, but not dead; i.e., Bob is a purple-spotted, yellow-striped bumblebee/dragon/pterodactyl hybrid with a voracious addiction to Twix candy bars, but not dead."- kSquared
okay, I understand that, thanks. Although I still have no idea what actual use it is :-)

What I'd really like to know is how I can calculate the standard deviation required to give me a 'spread' of +/- n from the mean.

I have found a normal distribution random number function I want to use:

float Gaussian(float mean, float StandardDeviation);

I want to be able to generate random numbers that are distributed about zero and bounded by -1 < r < 1. But I'm not sure how to calculate the SD.
You cannot bound a true Gaussian distribution to +/- 1 (unless you set the sd to 0). To my mind, it makes no sense for the "spread" of a Gaussian distribution to mean anything other than the standard deviation. What exactly are you trying to do? What will the random numbers be used for?
well, what you're asking for is, as far as I know, impossible, since Gaussian distributions have infinite support (with P(x) -> 0 as x -> +/-infinity) - but we can estimate

Anywho, the standard deviation is defined as the square root of the variance, and the variance is defined as E[X^2] - E[X]^2, where E[] is the expected value (mean) and X is the random variable (this turns out to be the same as Crispy's formula, where r^2 is the variance - it just looks nicer)

It's just a number though (certainly useful in statistics, but a number nonetheless)... a number used to describe how closely grouped around the mean the values are... what you're probably looking for is called a confidence interval, which gives you the probability that a value will fall within a certain interval ([-1,1] in this case)... anyway, the math gets pretty hairy with integrals of probability density functions and whatnot, so I won't try to reproduce it here

To sum up, the smaller the sigma, the more confident you can be that the value falls within [-1,1]... but it also leads to the values being more tightly grouped around the mean, so you'll have a rather large probability of returning values near zero, which might not be what you were looking for - but that's what you get

You can look up all this stuff at www.mathworld.com if you want to get into the math (which you'll probably have to in order to make a good Gaussian PRNG)

quote:
Original post by Anonymous Poster
You cannot bound a true Gaussian distribution to +/- 1 (unless you set the sd to 0).


Setting the standard deviation to 0 would bound it a lot better than +/- 1... it would always equal the mean with probability 1, which isn't much of a distribution, but yeah... I imagine you knew that

Gaussian (aka "normal") distributions aren't bounded. A normal random variable with variance > 0 has some probability of generating numbers anywhere from -inf to +inf, but is most likely to generate numbers near the mean. There's no way to "bound" it.

However, you can be confident that, on average, a certain percentage of the numbers generated will be within those bounds. This is usually how standard deviation is talked about in the context of random variables. Here's a good page on it.

Think Liberally.
quote:
Original post by Stoffel
However, you can be confident that, on average, a certain percentage of the numbers generated will be within those bounds.


In fact, this is what I remember reading somewhere: the Gaussian distribution falls off much more steeply than most other functions (such as 1/x) - in other words, you don't need to go very far from the centre of the distribution to find a value that has "probably occurred for no more than a few microseconds throughout the entire existence of the Universe (10+ billion years)". Knowing that values towards the rim become progressively less probable, you can safely presume that what you're looking for is somewhere near the mean.

Crispy
"Literally, it means that Bob is everything you can think of, but not dead; i.e., Bob is a purple-spotted, yellow-striped bumblebee/dragon/pterodactyl hybrid with a voracious addiction to Twix candy bars, but not dead."- kSquared
Mnsr Cola,

I hope the following helps you with your understanding of parametric distributions (like the Gaussian) and how to use them.

Probability density functions are just like other density functions, and this can help us understand them better. For example, the distribution of matter in an iron rod can be described by a density function along the length of the rod. Imagine two such rods with the same total mass. If they have the same volume, then they both have the same average density. However, consider one rod where most of the mass is concentrated at the ends and another where the mass is concentrated near the centre. Assume each rod has a symmetric mass distribution about its centre. Notice how I can switch freely between density and mass: I'm keeping volume constant, so variations in density represent variations in mass and vice versa.

Now, if the rods are symmetric, then the centre of mass of each rod is at the halfway point along the rod (each half balances the other). This corresponds to the mean of the mass and density distributions, which is also known as the first moment (and is also the first cumulant). As nohbdy mentioned, the variance describes how concentrated the density is about the mean. Because it is computed relative to the mean, it is of second order. But what does it mean in terms of the distribution of density/mass/probability?

Consider the two rods again. If you were to spin them about the centre of mass using the same amount of energy to get them going, the rod with the mass concentrated at the middle would spin faster. This is the same effect as the ice skater who pulls her arms in to speed up her spin. Clearly there are different physical results from having mass spread further away from the centre of mass of an object. This quality is measured by the second moment (strictly speaking, variance is the second cumulant of the distribution, but common usage of "second moment" from physics seems to have carried over to probability as well) of the mass/density distribution.

Now, there are higher order moments/cumulants as well. However, for any symmetric distribution of mass/density/probability, all odd moments are zero. Furthermore, the Gaussian distribution has all cumulants of order higher than 2 equal to zero. For this reason it is a very special distribution (since it can be shown that either all cumulants higher than 2 are zero, or there are an infinite number of non-zero cumulants).

Anyway, getting back to your questions. I hope you can see that the variance (and standard deviation) represent the spread of mass/density/probability about the mean.

As for your other questions...

As mentioned, you cannot constrain a true Gaussian distribution to (-1,+1), since it is defined on (-inf,+inf). However, you can create a distribution which has a percentage of its mass concentrated within these bounds. You cannot say that 100% of the mass lies between these bounds, but you can say that some percentage does... any number between 0% and 100%. For the Gaussian distribution, about 99.73% of the probability mass lies within 3 standard deviations of the mean. So, if you wanted this mass to lie within -1,+1, then set the standard deviation to 1/3 (so that 3*1/3 = 1, the distance from the mean to the confidence limit). Since variance is the square of the standard deviation, a distribution with mean 0 and variance 1/9 will have about 99.73% of its probability mass between -1 and +1; about 0.27% of the time you would find a number outside these bounds. You could tighten this further by putting 4 standard deviations between the mean and each bound. However, you'd find that numbers near -1 and +1 would be very rare indeed!
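That recipe is easy to sketch in C. The names are ours, and the crude central-limit-theorem normal generator below is just to keep the sketch self-contained (a real implementation would use something like Box-Muller):

```c
#include <stdlib.h>

/* Crude standard normal: a sum of 12 uniform(0,1) samples minus 6
 * is approximately N(0, 1). */
static float ApproxNormal(void)
{
    float s = 0.0f;
    int i;
    for (i = 0; i < 12; ++i)
        s += rand() / ((float)RAND_MAX + 1.0f);
    return s - 6.0f;
}

/* sd = 1/3 puts roughly 99.7% of the mass inside [-1, 1]; rejection
 * sampling then discards the rare outliers, so the result is strictly
 * bounded (strictly speaking, a truncated Gaussian). */
float BoundedGaussian(void)
{
    float v;
    do {
        v = ApproxNormal() / 3.0f;   /* mean 0, sd 1/3 */
    } while (v < -1.0f || v > 1.0f);
    return v;
}
```

The rejection loop almost never runs more than once, since only about 0.3% of draws fall outside the bounds.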

I hope this has helped your understanding of distributions. If you have any more questions, feel free to ask.

Cheers,

Timkin
Since this is a math board, it proves helpful to examine the actual equation of the standard normal distribution curve (also known as a "bell curve"):

e^(-x^2 / 2) / sqrt(2*PI)

The generalised formula is actually:

e^(-((x - μ)^2) / (2*σ^2)) / sqrt(2*PI*σ^2)

where μ is the mean and σ is the standard deviation. As you can see, the standard deviation is squared wherever it appears, so you can replace σ^2 with the variance. And when μ is 0 and σ is 1, you can see how we recover the standard equation.

This page is a quick reference. It also has a nice probability chart that your professors are sure to force you to use every now and then.

[edited by - Zipster on January 10, 2003 12:31:27 PM]

This topic is closed to new replies.
