Neural network problem
I've been reading the neural network chapter of the book 'AI for Game Programmers' and there's something I don't understand.
The author explains that we get the error of an output node (this is back-propagation) in the following way:
error = (desired value - calculated value) * (derivative at calculated value)
This error is then used to calculate the node 'bias' as:
bias = old bias + (learning rate) * error
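In code form, this is roughly how I'm reading it (my own sketch, assuming the standard logistic sigmoid, whose derivative can be written in terms of its output; the book's actual source may differ):

#include <cmath>

// Standard logistic sigmoid; its derivative, evaluated at the calculated value,
// can be written directly in terms of that output.
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoidDerivative(double output) { return output * (1.0 - output); }

// The two rules quoted above, for a single output node.
double outputError(double desired, double calculated) {
    return (desired - calculated) * sigmoidDerivative(calculated);
}

double updatedBias(double oldBias, double learningRate, double error) {
    return oldBias + learningRate * error;
}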
So my take on this is that ((learning rate) * error) represents the change in 'x' (where 'x' is the input to the sigmoid function) which gives us a better approximation to the desired value.
But I can't understand how this could work. Obviously we all know that derivatives represent (change in y)/(change in x). So how can we get the desired change in 'x' by multiplying the (desired value - calculated value) by the derivative?
(desired value - calculated value) is a difference of y-values, so wouldn't we want to use the reciprocal of the derivative instead?
I don't want to just use the code without understanding it, so if anyone can offer any insight here I'd appreciate it.
Edit:
Changed 'So my take on this is the bias...' to 'So my take on this is the ((learn...'
Other Clarifications
[Edited by - averisk on March 31, 2006 10:20:02 AM]
The output of any unit is the dot product of the inputs and the weights, "normalised" by the sigmoid (squash) function. The latter serves to limit the output of a unit to a certain range, often [0,1] or [-1,1]. The bias is added to adjust (i.e. bias) this basic evaluation for a given unit.
This relates to neurons in human brains that have inhibitory or reinforcing behaviour. So the input weights of a unit work together with the bias to determine its output value. Basically, the bias is added to the dot product mentioned above.
Now, in practice this (the matrix/vector multiplication, for multiple units) is usually implemented by treating the bias as just another weight with a constant input value of 1. In mathematical terms, the bias is then simply another weight.
Quote: "So my take on this is that the bias represents the change in 'x' (where 'x' is the input to the sigmoid function)"
-> The backprop algorithm does not change only the bias but, for a given unit, all of its input weights (including the bias), because the output value is the result of all the weighted inputs plus the bias. So bias = old bias + (learning rate) * error really means bias = old bias + (learning rate) * error * 1, which has the same form as the weight update w = old w + (learning rate) * error * input value.
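A minimal sketch of what I mean (my own illustration, not the book's source, assuming the standard logistic sigmoid), with the bias stored as one extra weight whose input is always 1:

#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double sigmoidDerivative(double output) { return output * (1.0 - output); }

struct Unit {
    std::vector<double> weights; // one weight per input, plus weights.back() for the bias
    double output;

    // weights.size() is assumed to be inputs.size() + 1
    double feedForward(const std::vector<double>& inputs) {
        double sum = weights.back() * 1.0; // the bias: a weight on a constant input of 1
        for (std::size_t i = 0; i < inputs.size(); ++i)
            sum += weights[i] * inputs[i];
        output = sigmoid(sum);
        return output;
    }

    // Output-node training step: every weight moves by
    // (learning rate) * error * (its own input); for the bias that input is 1.
    void train(const std::vector<double>& inputs, double desired, double learningRate) {
        double error = (desired - output) * sigmoidDerivative(output);
        for (std::size_t i = 0; i < inputs.size(); ++i)
            weights[i] += learningRate * error * inputs[i];
        weights.back() += learningRate * error * 1.0; // same rule, input of 1
    }
};

So the bias update in your book is just the ordinary weight update with the input fixed at 1.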
Thanks Degski for the response
Quote: Original post by Degski
-> The backprop algorithm does not change only the bias but, for a given unit, all of its input weights (including the bias), because the output value is the result of all the weighted inputs plus the bias. So bias = old bias + (learning rate) * error really means bias = old bias + (learning rate) * error * 1, which has the same form as the weight update w = old w + (learning rate) * error * input value.
I understand what you are saying. So the bias weight is not the only adjustment that is made in back-propagation.
My problem is that I'm trying to view the error adjustment as something like a numerical root-solver. The weight-adjustment problem, to me at least, seems like the opposite of Euler integration: we have the difference in function values (yd - yc) and the derivative, and we want an approximation of the x-value that gives us yd, so that we can adjust the 'shift' of the sigmoid function and make a better guess next time.
So, unlike Euler integration, we should be dividing by the derivative to get the error, not multiplying by it. And yet according to this book we multiply. I've been playing with the book's source and it works pretty well too... so obviously I don't understand something here.
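To spell out what I have in mind (my own notation, not the book's): if I treat it as root-finding on f(x) = y_d, a Newton-style step would be

\Delta x \approx \frac{y_d - y_c}{f'(x)}

i.e. the difference in y-values divided by the derivative, whereas the book multiplies (y_d - y_c) by f'.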
Quote: I understand what you are saying. So the weight of the bias is not the only adjustment that is made in back-propagation.
That's correct, all weights are changed, including the bias, which is just a weight with a special role: it has a constant input of 1 or, what is mathematically the same, it is added directly to the dot product.
Quote: So, unlike Euler integration, we should be dividing by the derivative to get the error, not multiplying by it.
Unless I misunderstand you, I think this is the problem: you're not trying to get the error, you already have it (the target output minus the output). What you're doing in backprop is calculating the required weight adjustment given the error and the derivative, so compared to where you're coming from you shouldn't be dividing but multiplying (because you're doing the opposite of what you were thinking). Hope this helps (and that I understood correctly)!
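To put it another way (my own summary of the usual derivation, not from your book): take the squared error E = (1/2)(t - o)^2 with o = f(net), where net is the weighted sum and x_i is the input on weight w_i. By the chain rule,

\frac{\partial E}{\partial w_i} = -(t - o)\, f'(net)\, x_i

and gradient descent moves against the gradient,

\Delta w_i = -\eta\, \frac{\partial E}{\partial w_i} = \eta\, (t - o)\, f'(net)\, x_i

which is exactly the (learning rate) * error * input rule. The derivative appears as a factor because we follow the slope of the error downhill; dividing by the derivative would instead be a Newton-style root-finding step, which is a different procedure.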