
Backprop Issues

Started by August 29, 2006 11:33 PM
5 comments, last by haemonculus 18 years, 2 months ago
Hey all,

I've read several books on neural networks as well as innumerable sites. I recently attempted to code a simple net in C to play around with. It's possible (perhaps even likely) that I made a coding mistake, but I used the equations for backpropagation from this page: http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/backprop/backprop.html . I can't imagine these equations are wrong, and I've checked my code several times and it looks correct.

The issue is as follows: I have a simple test coded to input two numbers between 0 and 1. I want to teach the net to multiply them, so the target value is in1*in2. I've tested my net layout (2-8-8-1) with a "real" NN package, and it learns the task very quickly. When I run my code, the first few iterations operate properly, then suddenly the output of the net zooms up to huge numbers and finally just returns "nan" (or "really big number that's too big for this format string"). Example:

0.182 , 0.936 --> 1.647 (real: 0.170 error: -1.477)
0.969 , 0.597 --> 2.223 (real: 0.578 error: -1.645)
0.031 , 0.345 --> 0.560 (real: 0.011 error: -0.549)
0.370 , 0.929 --> 1.847 (real: 0.344 error: -1.503)
0.043 , 0.860 --> 1.757 (real: 0.037 error: -1.721)
0.119 , 0.907 --> 6.960 (real: 0.108 error: -6.852)
0.390 , 0.032 --> 513554528.000 (real: 0.012 error: -513554527.988)
0.217 , 0.052 --> nan (real: 0.011 error: nan)
0.756 , 0.104 --> nan (real: 0.079 error: nan)
0.556 , 0.911 --> nan (real: 0.507 error: nan)
0.769 , 0.309 --> nan (real: 0.237 error: nan)

The full file is here: http://www.crepinc.com/files/neural-backprop_g.c

I've done all the debugging I can think of and I can't pin down the source of the crazy values. Thanks for any input you can give!

-Jack Carrozzo
jack {@} crepinc.com
It would probably be a lot easier if you were to post some of the relevant sections of your code. However, there are some checks you should do yourself first:

Are you certain the basic feed-forward section of your ANN is working correctly? Create a random 2-1-1 network, print out the values of the weights, and give it some inputs. Are the outputs the same as those you get when you work through the network with pencil and paper?

Can a 2-2-1 network be trained to solve easier problems? In order of difficulty: OR, AND, XOR?

Are you sure the delta rule is coded correctly? Are you using the correct derivative of your activation function? If the g(x) you're using is the basic sigmoid, g'(x) should be g(x) * (1 - g(x)). If you're using a different activation function, the derivative will be different.
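
To make the first and third checks concrete: with made-up weights and no bias terms, a 2-1-1 net with inputs 1.0 and 0.5, input-to-hidden weights 0.5 and -0.3, and a hidden-to-output weight of 0.8 should give

hidden net input = 1.0*0.5 + 0.5*(-0.3) = 0.35
hidden output = sigmoid(0.35) ≈ 0.587
output net input = 0.587 * 0.8 ≈ 0.470
net output = sigmoid(0.470) ≈ 0.615

and your code should print the same numbers. For the activation function itself, here's a minimal sketch in C of the standard sigmoid and its derivative (the function names are just for illustration, not from your code):

#include <math.h>

/* standard logistic sigmoid */
float sigmoid(float x) {
	return 1.0f / (1.0f + expf(-x));
}

/* derivative of the sigmoid, written in terms of its output:
   if s = sigmoid(x), then sigmoid'(x) = s * (1 - s).
   Note the argument here is the node's *output*, not its net input. */
float sigmoid_deriv(float s) {
	return s * (1.0f - s);
}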
I verified that the feed-forward section was working properly, as Asbestos described.

I went back through the whole back-propagation sequence to see where the really large numbers were coming from. I did this by starting at the weight deltas. They "exploded". I continued working backwards to find the variable that first gets really big.
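
Roughly, I did this by adding a trace print after each stage, something like the line below (iter is just my training-loop counter):

printf("iter %d: out[0]=%f out_o[0]=%f omega_out[0]=%f dwa_hid1_out[0][0]=%f\n",
	iter, out[0], out_o[0], omega_out[0], dwa_hid1_out[0][0]);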

Here's my feed-forward function: (2nd hidden layer removed for testing)

void run_net ( void ) {
	int i,j;
	float tmp;

	//the array in[] needs to have been set manually before calling this function

	for (i=0;i<NUM_HID1;i++) {
		tmp=0.0;
		for (j=0;j<NUM_IN;j++) {
			tmp+=in[j]*wa_in_hid1[j][i];
		}
		hid1[i]=tmp;
		hid1_o[i]=sigmoid(hid1[i]);
	}

	for (i=0;i<NUM_OUT;i++) {
		tmp=0.0;
		for (j=0;j<NUM_HID1;j++) {
			tmp+=hid1[j]*wa_hid1_out[j][i];
		}
		out[i]=tmp;
		out_o[i]=sigmoid(out[i]);
	}

	//the output has been stored in out_o[]
}

and my back-propagation function:

void back_prop( void ) {

	int i,j;
	float omega_out[NUM_OUT];
	float omega_hid1[NUM_HID1];
	float tmp;

	for (i=0;i<NUM_OUT;i++) { //output nodes
		omega_out[i]=out[i]*(1.0-out[i])*(target[i]-out_o[i]); //<-- !!! correct?
	}

	for (i=0;i<NUM_HID1;i++) { //hidden
		tmp=0.0;
		for (j=0;j<NUM_OUT;j++) {
			tmp+=wa_hid1_out[i][j]*omega_out[j];
		}
		omega_hid1[i]=hid1[i]*(1.0-hid1[i])*tmp;
	}

	//now change the weights

	for (i=0;i<NUM_HID1;i++) {
		for (j=0;j<NUM_OUT;j++) {
			dwa_hid1_out[i][j]=(mu*omega_out[j]*out[j])+(dowa_hid1_out[i][j]*mc);
		}
	}

	for (i=0;i<NUM_IN;i++) {
		for (j=0;j<NUM_HID1;j++) {
			dwa_in_hid1[i][j]=(mu*omega_hid1[j]*hid1[j])+(dowa_in_hid1[i][j]*mc);
		}
	}

	//change weights from deltas, and save to old weights

	for (i=0;i<NUM_HID1;i++) {
		for (j=0;j<NUM_OUT;j++) {
			wa_hid1_out[i][j]+=dwa_hid1_out[i][j];
			dowa_hid1_out[i][j]=dwa_hid1_out[i][j];
		}
	}

	for (i=0;i<NUM_IN;i++) {
		for (j=0;j<NUM_HID1;j++) {
			wa_in_hid1[i][j]+=dwa_in_hid1[i][j];
			dowa_in_hid1[i][j]=dwa_in_hid1[i][j];
		}
	}
}

Again, I'm using the back-propagation equations from http://www.cse.unsw.edu.au/~billw/cs9414/notes/ml/backprop/backprop.html

"If node j is an output node, then δj is the product of φ'(vj) and the error signal ej, where φ(*) is the logistic function and vj is the total input to node j (i.e. Σi wjiyi), and ej is the error signal for node j (i.e. the difference between the desired output and the actual output)" (I beleive that's what I've coded, please let me know if I've done it wrong)

The first variable that gets really big is out[], which holds the INPUT to each output node (the actual output is in out_o[], where out_o[i]=sigmoid(out[i])).
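
Just to spell the quoted rule out in terms of my variables (please tell me if I'm mapping it wrong):

vj = out[j] (the total input to output node j)
φ(vj) = sigmoid(out[j]) = out_o[j]
φ'(vj) = out_o[j] * (1 - out_o[j]) (for the logistic function)
ej = target[j] - out_o[j]
δj = φ'(vj) * ej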

To recap: the weights grow ridiculously huge in a few iterations, because the out[] variable gets really big. But in a vicious circle, the out[] variable gets even bigger because the weights got huge in the previous iteration!

I checked my equations a few more times as well, and I can find nothing wrong.

In response to Asbestos: the net will not learn anything, as the results go to "inf" or "nan" (too big to represent) before any real learning could have occurred.

Thanks for looking, hope somebody has a clue what's going on...

-Jack Carrozzo
jack {@} crepinc.com
Math isn't my strongest point, so I don't know if these different methods give different results, but for what it's worth, the backprop algorithm I use is best explained in this tutorial: http://www.cs.ucc.ie/~dgb/courses/tai/notes/handout10.pdf (a simple, two-page PDF).

He has

For output 'o':

Error(o) = target(o) - actual(o)
delta(o) = g'(input(o)) x error(o)

where, if your activation is sigmoid, g'(x) = g(x) x (1 - g(x))

To adjust each weight:

w(h,o) = w(h,o) + (learning rate x output(h) x delta(o))

... with something similar for the hiddens.

This seems to be much simpler than the other version cited, so I don't know why they differ or what the reason is. You should try it out, though, and see if you can get better results.
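
Translated into C, the output-layer part would look roughly like this (the names target, out_o, hid1_o, delta_out, w_hid_out and learning_rate are placeholders, not taken from your code):

/* rough sketch of the handout's output-layer update; assumes the arrays
   and learning_rate are globals, as in your code */
void update_output_weights(void) {
	int o, h;
	for (o=0; o<NUM_OUT; o++) {
		float error = target[o] - out_o[o];          /* target(o) - actual(o) */
		float deriv = out_o[o] * (1.0f - out_o[o]);  /* g'(x) = g(x) * (1 - g(x)) */
		delta_out[o] = deriv * error;
		for (h=0; h<NUM_HID1; h++) {
			/* the weight change uses the output of hidden node h times delta(o) */
			w_hid_out[h][o] += learning_rate * hid1_o[h] * delta_out[o];
		}
	}
}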
Asbestos,

Thanks for the equations. Mathematically those are the same as what I have (or rather, the same as the equations I used; it's still a good possibility that I didn't implement them properly in the code).

If this makes sense at all, when the code runs and before it crashes, the output always converges to 0.5, no matter the input.
If this question is still live: I know nothing about neural nets, but it seems to me your omega value for the output node is of the wrong sign. If you are higher than the target value, omega will be positive, you will increase the weights of the connections and you will be even more above the target on the next iteration. That's positive feedback and will tend to result in the behaviour you see. I would suggest the insertion of a minus sign on the line you marked "correct?" :).
Hm, judging from your output, you are using the NN with ordinary real values.
With how much test data have you fed it? A NN needs to learn the function for a long time (not long in wall-clock time, but in number of tries) before it can get the right results.
Perhaps your learning phase is too short (or too long, which can also be a problem if your training data set is too small).

But I haven't looked at the code (I haven't implemented a NN myself, so I don't know exactly what it has to look like), and I don't remember the exact formulas, so maybe it's just your code that is buggy.

But if you are using a small dataset and only a few iterations to train your NN, then you should first try a larger dataset and more iterations.
And even if you're using the same layout as the NN package you're testing against, perhaps that package uses a better algorithm than you do.

