Bias in Neural Networks
Hi, I'm a newbie to the world of ANNs. I'm aware of the gradient descent rule and the backpropagation algorithm. What I don't get is: when is using a bias important?
For example, when mapping the AND function, if I use 2 inputs and 1 output, it does not give the correct weights. However, when I use 3 inputs (1 of which is a bias), it gives the correct weights.
You can think of a single neuron as separating the inputs by a hyperplane. If you don't use a bias, you are forcing the hyperplane to pass through the origin. Generally you want to use a bias.
A situation that is easier to understand is the difference between doing simple linear regression of the form y = Kx versus allowing for lines of the form y = Kx + b. That b allows the line to not pass through the origin.
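To make the regression analogy concrete, here is a minimal sketch (using numpy, with made-up data points) that fits both a line forced through the origin and a line with an intercept; the intercept plays the same role as a neuron's bias:

```python
import numpy as np

# A few made-up points whose trend clearly does not pass through the origin.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([5.0, 7.0, 9.0, 11.0])   # true relationship: y = 2x + 5

# Fit y = Kx (no intercept): design matrix is just the single column x.
K_no_bias, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)

# Fit y = Kx + b: add a column of ones, whose weight becomes the intercept b.
A = np.column_stack([x, np.ones_like(x)])
(K, b), *_ = np.linalg.lstsq(A, y, rcond=None)

print("without intercept: y = %.2f x" % K_no_bias[0])    # poor fit, forced through (0, 0)
print("with intercept:    y = %.2f x + %.2f" % (K, b))    # recovers y = 2x + 5
```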
Hey, thanks for the quick reply! However, I don't really understand the concept of hyperplanes. Could you explain it a bit further? What problem is caused when the hyperplane passes through the origin?
Also, where do I need to include a bias? In all the hidden and output nodes?
We'll start with no bias. Say you use the function 1/(1+exp(-S)) as the transfer function, where S is the linear combination of the inputs. If S is far from 0 (either positive or negative), the output saturates to something very close to 1 or 0. Imagine a neuron with only two inputs, so we can have good geometric intuition for what's going on: any combination of inputs can be represented as a point in the plane. Now consider where the boundary that separates the 1s from the 0s lies. The points where the output is 0.5 are exactly those where S is 0, and the set of points where a linear combination of the inputs is 0 is a line through the origin (something like 2x-3y=0). So this neuron distinguishes between points on one side of that line and points on the other, and the position of the line is learned so that it does the best job of classifying the points correctly.
Now, if you had a bias, your neuron could learn to classify points by separating them with lines that don't go through the origin, because S=0 now would look something like 2x-3y+5=0.
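As a rough illustration, here is a small sketch of such a neuron, reusing the example weights 2 and -3 from above plus a bias of 5 (these numbers are just illustrative, not learned values):

```python
import numpy as np

def neuron(x, y, w=(2.0, -3.0), bias=0.0):
    """Sigmoid neuron with two inputs: output = 1 / (1 + exp(-S))."""
    S = w[0] * x + w[1] * y + bias
    return 1.0 / (1.0 + np.exp(-S))

# Without a bias the boundary 2x - 3y = 0 passes through the origin...
print(neuron(0.0, 0.0))            # 0.5: the origin sits exactly on the boundary
# ...with a bias of 5 the boundary is 2x - 3y + 5 = 0, shifted away from the origin.
print(neuron(0.0, 0.0, bias=5.0))  # ~0.993: the origin is now clearly on one side
```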
A single neuron still only allows you to classify points by which side of the line the input is on. A multi-layer neural network can learn other, more complicated shapes, because the inputs to the last layer are themselves functions that have been learned.
If instead of two inputs we had three, there would be a plane separating the inputs that result in outputs greater than 0.5 from those lower than 0.5. In general, if you have n inputs, the barrier is an affine subspace of dimension n-1, which is called a hyperplane. In all these cases, the hyperplane is forced to pass through the origin if you don't have a bias.
So yes, you probably want to have biases in all neurons.
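To answer the question about where the bias goes: in practice every hidden and output neuron gets its own bias, often implemented as an extra weight on a constant input of 1. A rough forward-pass sketch of a tiny 2-3-1 network (the weights here are placeholder random values, not anything learned):

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

rng = np.random.default_rng(0)

# Every hidden and output neuron has its own bias term.
W_hidden, b_hidden = rng.normal(size=(3, 2)), rng.normal(size=3)
W_output, b_output = rng.normal(size=(1, 3)), rng.normal(size=1)

def forward(x):
    hidden = sigmoid(W_hidden @ x + b_hidden)      # bias in every hidden node
    return sigmoid(W_output @ hidden + b_output)   # bias in the output node

print(forward(np.array([0.0, 1.0])))
```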
I am sure there is a web page somewhere that explains these things perfectly well, with pictures and all, but I can't find it.
Quote: Original post by kvsingh
Hi, I'm a newbie to the world of ANNs. I'm aware of the gradient descent rule and the backpropagation algorithm. What I don't get is: when is using a bias important?
For example, when mapping the AND function, if I use 2 inputs and 1 output, it does not give the correct weights. However, when I use 3 inputs (1 of which is a bias), it gives the correct weights.
I think the simplest way to understand the usefulness of a bias term is to think about what happens when all of the input variables equal zero. No matter what the weights are on those variables, the output of the linear part of any neuron will be zero. The bias term allows the linear portion of a neuron to output a value other than zero when all inputs are zero.
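That is exactly what goes wrong in the AND example from the original post: for the input (0, 0) the weighted sum is 0 no matter what the weights are, so a sigmoid neuron without a bias is stuck outputting 0.5 there, while AND needs an output near 0. A rough sketch (the weights 20, 20 and bias -30 are just one hand-picked hypothetical solution, not learned values):

```python
from math import exp

def sigmoid(s):
    return 1.0 / (1.0 + exp(-s))

def and_neuron(x1, x2, w1=20.0, w2=20.0, bias=-30.0):
    """A single sigmoid neuron wired (by hand) to behave like AND."""
    return sigmoid(w1 * x1 + w2 * x2 + bias)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, round(and_neuron(a, b), 3))   # ~0, ~0, ~0, ~1: only possible with a bias

# Without the bias, the (0, 0) case is always sigmoid(0) = 0.5, whatever the weights are.
print(and_neuron(0, 0, bias=0.0))             # 0.5
```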