double Layernet::find_grad(double *input)
{
    int n = n_hid * (n_in + 1) + n_out * (n_hid + 1);   // total number of weights (including biases)
    for (int j = 0; j < n; j++)
        grad[j] = 0;

    double *hidgrad = grad;                              // hidden-layer gradients come first
    //double *outgrad = grad + n_hid * (n_in + 1);       // output-layer gradients follow

    double diff;
    double error = 0.0;
    for (int k = 0; k < n_out; k++)
    {
        diff = input[k] - out[k];                        // target is the input itself
        error += diff * diff;
        outdelta[k] = diff * act_deriv(out[k]);
    }

    // gradient of the weights connecting hidden to output
    double delta;
    for (int k = 0; k < n_out; k++)
    {
        delta = outdelta[k];
        int l;
        for (l = 0; l < n_hid; l++)
            grad[n_hid * (n_in + 1) + k * (n_hid + 1) + l] = delta * hid[l];
        grad[n_hid * (n_in + 1) + k * (n_hid + 1) + l] = delta;   // bias term (l == n_hid here)
    }

    // the hidden-layer gradients now
    for (int i = 0; i < n_hid; i++)
    {
        delta = 0;
        for (int jj = 0; jj < n_out; jj++)
            delta += outdelta[jj] * out_coeffs[jj * (n_hid + 1) + i];
        delta *= act_deriv(hid[i]);
        for (int k = 0; k < n_in; k++)
            *hidgrad++ = delta * input[k];
        *hidgrad++ = delta;                              // bias term
    }
    return error / (double)n_out;
}
void Layernet::modify_weights()
{
    int i, j;
    // input-to-hidden weights (each hidden neuron has n_in weights plus a bias)
    for (i = 0; i < n_hid; i++)
    {
        for (j = 0; j <= n_in; j++)
            hid_coeffs[j + i * (n_in + 1)] += .4 * grad[j + i * (n_in + 1)];
    }
    // hidden-to-output weights; their gradients start after the hidden block
    double *newgrad = grad + n_hid * (n_in + 1);
    for (i = 0; i < n_out; i++)
    {
        for (j = 0; j <= n_hid; j++)
            out_coeffs[j + i * (n_hid + 1)] += .4 * newgrad[j + i * (n_hid + 1)];
    }
}
Thanks.
Back Prop Problem
I am writing a neural network that will take in 64 inputs (all normalised between 0 and 1) and try to reproduce the same values at the output. It will have 16 hidden neurons.
I am using simple back-prop to train it, but I only seem to be cycling between 2 or 3 different weight vectors. I initialised the network with small random weights ((double)rand()/RAND_MAX). What could be the problem? My textbook uses conjugate gradient for training, but I don't want to use it, as I don't understand the math behind it.
I am actually calculating the gradient vector with respect to the weight vector, and then using w += grad w * learning rate as the modification step. Am I trying to train a little too much with a rather simplistic algorithm? Or is the problem with my algorithm?
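For context, the overall training loop I'm describing looks roughly like this (a sketch only; feed_forward, the trainset layout, and the iteration count are placeholders, not my real code):

// Rough sketch of the training procedure described above.
// Layernet is the class shown earlier; feed_forward is an assumed
// forward-pass method that fills hid[] and out[] for one pattern.
void train(Layernet &net, double **trainset, int n_patterns, int n_iterations)
{
    for (int it = 0; it < n_iterations; it++)
    {
        double total_error = 0.0;
        for (int p = 0; p < n_patterns; p++)
        {
            net.feed_forward(trainset[p]);             // assumed forward pass
            total_error += net.find_grad(trainset[p]); // target == input (auto-association)
            net.modify_weights();                      // w += learning rate * grad
        }
        // total_error is expected to fall from one iteration to the next
    }
}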
Some debugging has shown that the total error of the network, (target[k] - observed[k])^2 summed over the outputs, actually starts 'increasing' after about 100 iterations. How is that happening? Any clues??
The code's a bit fuzzy to me, but first of all, you don't need to calculate the gradient explicitly; you just need to keep the outputs of the units (which you'll need anyway; that's where your problem is). The gradient of the sigmoid is output * ( 1 - output ), so forget the explicit gradients.
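A minimal sketch of that shortcut, assuming a logistic (sigmoid) activation; the helper names are mine, not from the code above:

#include <cmath>

// Logistic activation and its derivative, expressed in terms of the
// unit's stored output (illustrative helper names only).
double sigmoid(double x)
{
    return 1.0 / (1.0 + std::exp(-x));
}

// d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x)) = out * (1 - out),
// so only the forward-pass output of each unit needs to be kept.
double sigmoid_deriv_from_output(double out)
{
    return out * (1.0 - out);
}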
Then, w += grad w * learning is not complete; it should be
w += grad w * learning * error * input
The error is obviously determined by the output (that's why you need to keep them): for the output layer this is (output - desired output), and for the hidden layer(s) this value is calculated backwards through the network (the inverse of how you calculated the output going forward).
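A per-weight sketch of that update for a single output unit, assuming a logistic activation (the function and parameter names here are illustrative, not the thread's actual variables):

// Update the weights of one output unit k with the delta rule.
// delta combines the error with the activation derivative; each weight's
// update is then scaled by the input that fed it. (Illustrative names;
// the learning rate and array layout are assumptions.)
void update_output_unit(double *w,          // weights of unit k, w[n_hid] is the bias
                        const double *hid,  // hidden-layer outputs feeding unit k
                        int n_hid,
                        double out,         // actual output of unit k
                        double target,      // desired output of unit k
                        double learning_rate)
{
    double error = target - out;
    double delta = error * out * (1.0 - out);    // error * sigmoid derivative
    for (int l = 0; l < n_hid; l++)
        w[l] += learning_rate * delta * hid[l];  // learning * error term * input
    w[n_hid] += learning_rate * delta;           // bias input is 1
}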
In linear algebra terms it's like:
going forward, for all layers:
sigmoid( weight-matrix * input ) = ( output of next layer )
at the output layer:
output error = ( output - desired output )
then backwards, for all layers:
weight-matrix-transposed * output error = input error (or output error of the previous layer)
and
weight adjustment = learning * ( output * ( 1 - output ) ) * input * output error
This is probably still not very clear, but looking at it as matrix/vector products simplifies your implementation tremendously, and you might be able to use an EFFICIENT linear algebra package! If you use matrix-matrix multiplies, the same scheme lets you process several inputs at the same time more efficiently (for offline training). Linear algebra packages usually do a better job of optimising a matrix-matrix multiply than consecutive matrix-vector multiplies. Speed is really the crux!
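A rough sketch of that matrix/vector view in plain C++ (no linear algebra package; the function names, the flat row-major weight layout, and the omission of bias terms are my own simplifications):

#include <cmath>
#include <vector>

// Forward pass for one layer: out = sigmoid(W * in),
// where W is rows x cols and cols == in.size().
std::vector<double> forward(const std::vector<double> &W, int rows, int cols,
                            const std::vector<double> &in)
{
    std::vector<double> out(rows);
    for (int r = 0; r < rows; r++)
    {
        double sum = 0.0;
        for (int c = 0; c < cols; c++)
            sum += W[r * cols + c] * in[c];
        out[r] = 1.0 / (1.0 + std::exp(-sum));   // sigmoid
    }
    return out;
}

// Backward pass through the same weights: in_err = W-transposed * out_err,
// i.e. the "output error of the previous layer".
std::vector<double> backward(const std::vector<double> &W, int rows, int cols,
                             const std::vector<double> &out_err)
{
    std::vector<double> in_err(cols, 0.0);
    for (int r = 0; r < rows; r++)
        for (int c = 0; c < cols; c++)
            in_err[c] += W[r * cols + c] * out_err[r];
    return in_err;
}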
Quote:
Then, w += grad w * learning is not complete, should be
w += grad w * learning * error * input
Page 163 of Neural Networks (Simon Haykin, 2nd edition) shows
del w(i,j) = -eta * grad   (eqn 4.12)
where grad = the partial derivative of the error function with respect to w(i,j).
This equation should translate into del w = learning rate * grad (the negative sign has already been folded into grad, so grad here is the negative gradient).
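For what it's worth, the two statements agree once the gradient is written out; here is a short sketch of the standard derivation for an output unit with a logistic activation (my notation, not quoted from either book):

\[
E = \tfrac{1}{2}\sum_k (t_k - o_k)^2,
\qquad
\Delta w_{kj} = -\eta \frac{\partial E}{\partial w_{kj}}
              = \eta \,(t_k - o_k)\, o_k (1 - o_k)\, h_j .
\]

That is, Haykin's "minus eta times the gradient" becomes "learning rate times error times activation derivative times input" once the chain rule is applied.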
Well, I don't want to argue with you; I looked it up in my handbook, which is R. Rojas, Neural Networks (A Systematic Introduction); pages 165-167 give the above information. All I can say is that my networks work and you seem to have a problem. Have a look at this link: http://www.dontveter.com/bpr/public2.html. There's a detailed numerical example there, so that should help.