First I feed the input forward into the single hidden layer, and then from the hidden layer to the output, using tanh() as the activation function:
for(int i = 0; i < numhidden; ++i) {
    float value = 0.0f;
    for(int j = 0; j < numinputs; ++j) {
        value += w[j][i] * x[j];
    }
    hidden[i] = tanh(value);
}
for(int i = 0; i < numoutputs; ++i) {
    float value = 0.0f;
    for(int j = 0; j < numhidden; ++j) {
        value += v[j][i] * hidden[j];
    }
    y[i] = tanh(value);
}
I then calculate the error terms at the output layer and update the weights:
for(int i = 0; i < numoutputs; ++i) {
    errors[i] = (target[i] - y[i]) * (1 - (y[i]*y[i]));
    for(int j = 0; j < numhidden; ++j) {
        v[j][i] += lrate * errors[i] * hidden[j];
    }
}
where y[] is the output array and target[] is the target output. I don't know at this point whether I should delay updating the weights until after I've calculated the error terms for every layer, or update the weights in one layer and use the result to calculate the errors in the previous layer (I've seen implementations do it both ways)?
I then calculate the errors at the hidden layer and update the w weights (from input->hidden):
for(int i = 0; i < numhidden; ++i) {
    float sum = 0.0f;
    for(int j = 0; j < numoutputs; ++j) {
        sum += errors[j] * v[i][j];
    }
    hiddenerrors[i] = sum * (1 - (hidden[i]*hidden[i]));
}
// adjust the w weights
for(int i = 0; i < numinputs; ++i) {
    for(int j = 0; j < numhidden; ++j) {
        w[i][j] += lrate * hiddenerrors[j] * x[i];
    }
}
As you can see, I'm indexing the weight matrices as w[input][hidden] and v[hidden][output] consistently, so I don't think that's the problem. The weird thing is that if I change the x[i] to x[j] on the third-last line there ^ it suddenly learns the pattern, which is strange considering that's clearly incorrect: j indexes into the wrong array. (In this example numinputs and numhidden happen to be the same, so the addressing is still valid.) Currently the program just runs forever and never learns; the exit criterion is that every input pattern's output equals its target output within a small margin (0.1). The input in this example is the typical XOR truth table. Can you see anything I'm doing wrong?
Thanks.