
The maths behind back propagation

Started by January 26, 2006 03:00 PM
13 comments, last by NickGeorgia 18 years, 10 months ago
"Ive been debugging my neural net class for a week now, I cant figure out whats wrong. I thought it could be that the back propagation equation isnt implented as intended."

I too had similar problems a while ago. It turned out that the problem was not in the back propagation. There are many other things a neural network is sensitive to:

* Initial weight values. I found it a good idea to use random numbers from some configurable bounds, e.g. [-0.1, +0.1] (see the sketch below). My network actually refused to learn XOR with zero initial weights!

* Order of teaching. I found that stochastic teaching (i.e. choosing teaching samples at random) produces better results than orderly teaching, which can introduce bias. Again, my network often refused to learn XOR if teaching was done in a fixed, predictable order (00, 01, 10, 11).

* Learning constant. The learning constant obviously affects learning a great deal - try lowering your value first.

As with all debugging, you should start with very simple test data, e.g. learning the AND or OR binary gates (a single perceptron is enough for this). Then you can proceed to slightly more complicated examples, e.g. the XOR gate (which requires a hidden layer), and off you go.
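For the first point, a minimal sketch of bounded random initialisation (plain standalone code, not tied to any particular net class; the function and parameter names are just illustrative) could look like this:

#include <cstdlib>

// Fill 'weights' with random values drawn uniformly from [-bound, +bound],
// e.g. bound = 0.1f for the [-0.1, +0.1] range mentioned above.
void InitWeights(float *weights, int count, float bound) {
    for(int i = 0; i < count; ++i) {
        float r = (float)rand() / (float)RAND_MAX;  // r in [0, 1]
        weights[i] = 2.0f * bound * r - bound;      // scale and shift to [-bound, +bound]
    }
}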

Good luck,
-- Mikko
Thanks for your input, uutee. [smile]

I've added stochastic training to my network. It didn't make it better, though.

I randomize the weights in the constructor so they are between -1 and 1.

I just lowered the learning constant to 0.1 and ran a whopping 500,000 iterations. The net just gave 0.03 on every input... couldn't be further off. :P

I've already made a net with a single perceptron which managed to learn AND. It's now that I've expanded the class to a variable size (you choose how many hidden layers and how many neurons in each layer) that the problems have become really hard to solve.

Here's the source to my net, if you (for any reason) want to look at it. The training algorithm was taken from a back propagation article on generation5.org (see the earlier post for the link), and I tried to show the architecture of my net (that is, what ID a certain weight/output has) a couple of posts up. Look at that if you don't understand how it's laid out from the source.

Here it is, anyway:

NeuralNet.h
#ifndef _NEURAL_NET
#define _NEURAL_NET

#include <vector>
#include <iostream>
#include <time.h>
#include <math.h>
#include <fstream>
#include <sstream>
#include <conio.h>

using namespace std;

#define LEARN_RATE 0.1
#define OUTPUTS 1

class NeuralNet {
private:
    float   Sigmoid(float num) { return (float)(1/(1+exp(-num))); }
    vector<vector<float> > W;   // weights
    vector<vector<float> > X;   // the neurons' outputs
    int     Inputs, HiddenLayers, NeuronsInHidden;
    float   NetOut;             // the total output of the net
    float   x(int l, int n);        // output x in layer l from neuron n (l = 0 for the inputs to the net)
    float   w(int l, int f, int n); // weight of input f (bias = 0) for neuron n in layer l, l = 0 is hidden, l = 1 is output layer
    void    SetX(int l, int n, float NewX);             // same as above but set the output instead of read it
    void    SetW(int l, int f, int n, float NewWeight); // same as above but set the weight instead of read it
    bool    Debug;              // debug output flag
    //bool  StepThrough;
public:
            NeuralNet(int _Inputs, int _HiddenLayers, int _NeuronsInHidden, bool _Debug);
            ~NeuralNet();
    void    Train(vector<float> NetIn, float CorrectOutput);   // same as below but trains it too
    float   Process(vector<float> NetIn);                      // takes the inputs, returns the outputs of the net
    void    Print();
};

#endif


NeuralNet.cpp
#include "NeuralNet.h"NeuralNet::NeuralNet(int _Inputs, int _HiddenLayers, int _NeuronsInHidden, bool _Debug) {	Inputs = _Inputs; 	HiddenLayers = _HiddenLayers;	NeuronsInHidden = _NeuronsInHidden; 	NetOut = 0;	// the total output of the net	Debug = _Debug;	//StepThrough = _StepThrough;	srand( (unsigned)time( NULL ) );	// seed the randomizer	W.resize(HiddenLayers+1);	// setup weights	for(int l = 0; l <= HiddenLayers; ++l) {		if(l == 0) {	// first layer, these inputs are the net inputs			W[l].resize((Inputs+1) * NeuronsInHidden);			for(int n = 0; n < NeuronsInHidden; ++n) {				for(int f = 0; f < (Inputs+1); ++f)					SetW(l,f,n, 2.0f*((float)rand()/(float)RAND_MAX)-0.5f);			}		}		else {			W[l].resize((NeuronsInHidden+1)*NeuronsInHidden);	// hidden layers, inputs to these are the outputs from the former layer			for(int n = 0; n < NeuronsInHidden; ++n) {				for(int f = 0; f < (NeuronsInHidden+1); ++f)					SetW(l,f,n, 2.0f*((float)rand()/(float)RAND_MAX)-0.5f);			}		}	}	W[HiddenLayers].resize((NeuronsInHidden+1) * OUTPUTS);	// output layer	for(int f = 0; f < ((NeuronsInHidden+1) * OUTPUTS); ++f)		SetW(HiddenLayers, f, 0, 2.0f*((float)rand()/(float)RAND_MAX)-0.5f);	// initialize the input/output holders for perceptrons	X.resize(HiddenLayers+1);	// +1 for the input layer	X[0].resize(Inputs+1, 0);	// the input layer	for(int l = 1; l <= HiddenLayers; ++l)		X[l].resize(NeuronsInHidden+1, 0);	// store biases	for(int l = 0; l <= HiddenLayers; ++l)		SetX(l,0,1);	if(Debug) {		cout << "--SETUP-------------\n";		for(int l = 0; l <= HiddenLayers; ++l) 			cout << "W Layer " << l << " have size " << W[l].size() << endl;		cout << endl;		for(int l = 0; l <= HiddenLayers; ++l)			cout << "X Layer " << l << " have size " << X[l].size() << endl;		cout << endl;		for(int l = 0; l <= HiddenLayers; ++l) {			for(int n = 0; n < W[l].size(); ++n)				cout << "w(" << l << "," << n << ") is " << W[l][n] << endl;			cout << endl;		}		getch();	}}NeuralNet::~NeuralNet() {}float NeuralNet::w(int l, int f, int n) {	// f = 0 is bias weight	if(l > W.size()) {		cout << "SetW error: Bad layer number: " << l << endl;		return 0;	}	else if(l == 0 ) { // input layer		if(((Inputs+1) * n) + f > W[l].size()) {			cout << "SetW error: Bad weight id number: " << ((Inputs+1) * n) + f << " on layer " << l << "\n\n";			return 0;		}		return W[l][((Inputs+1) * n) + f];	}	else if(l == 0 || l <= HiddenLayers)	{ 		if(((NeuronsInHidden+1) * n) + f > W[l].size()) {			cout << "SetW error: Bad weight id number: " << ((NeuronsInHidden+1) * n) + f << " on layer " << l << "\n\n";			return 0;		}		return W[l][((NeuronsInHidden+1) * n) + f];	}	return 0;	// just in case}void NeuralNet::SetW(int l, int f, int n, float NewWeight) {	if(l == HiddenLayers)	// output layer		W[HiddenLayers][((NeuronsInHidden+1) * n) + f] = NewWeight;	else if(l == 0 )	 // input layer		W[l][((Inputs+1) * n) + f] = NewWeight;	else if(l == 0 || l < HiddenLayers)	 // hidden layers		W[l][((NeuronsInHidden+1) * n) + f] = NewWeight;	else if(l < 0 && l > HiddenLayers) {		cout << "W Error: Bad layer number: " << l << endl;		return;	}}void NeuralNet::SetX(int l, int n, float NewX) {	if(l < 0 && l > HiddenLayers) 		cout << "SetX error: Bad layer number: " << l << endl;	if(n >= X[l].size()) {		cout << "SetX error: Bad layer number: " << l << endl;		return;	}	// we are inside boundries	X[l][n] = NewX;}float NeuralNet::x(int l, int n) {	// n = 0 is bias (1)	if(l < 0 && l > HiddenLayers) 		cout << "X Error: Bad layer number: " << l << endl;	if(n >= X[l].size()) {		cout << "X Error: Bad neuron number: " 
<< n << endl;		return 0;	}	// we are inside boundries	return X[l][n];}void NeuralNet::Train(vector<float> NetIn, float d) {	// first, process so we have the correct values stored inside the neural net	Process(NetIn);	vector<vector<float> > Delta;	Delta.resize(HiddenLayers+1);		// one for the output layer to	for(int l = 0; l <= HiddenLayers; ++l) {		if(l == HiddenLayers)	// output layer			Delta[l].resize(OUTPUTS, 0);		else			Delta[l].resize(NeuronsInHidden, 0);	}	// output layer delta (we only have one output now so the loop will only run once)	// d(2,0) = x(3,0)(1 - x(3,0))(d - x(3,0))	//Delta[HiddenLayers][n] = x(HiddenLayers+1,n) * (1 - x(HiddenLayers+1,n)) * (d - x(HiddenLayers+1,n));	Delta[HiddenLayers][0] = NetOut * (1 - NetOut) * (d - NetOut);	// hidden layer delta, first one before output	// d2(1) = x2(1)(1 - x2(1))w2(1,1)d3(1)	// formula: d(l,n) = x(l,n) (1 - x(l,n)) w(l+1,n,n-1) d(l+1,n)	// loop through the net backwards	for(int l = HiddenLayers-1; l >= 0; --l) {		if(l == HiddenLayers-1) {	// layer directly before output layer			for(int n = 0; n < NeuronsInHidden; ++n) 				Delta[l][n] = x(l+1,n+1) * (1 - x(l+1,n+1)) * w(l+1,n+1,0) * Delta[HiddenLayers][0];		}		else {			for(int n = 0; n < NeuronsInHidden; ++n) 				Delta[l][n] = x(l+1,n+1) * (1 - x(l+1,n+1)) * w(l+1,n+1,n) * Delta[l+1][n];		}	}	// Delta calculated, now alter the weights (we only have one output now so the loop will only run once)	// formula: w2(0,1) = h*x1(0)*d2(1)	// formula: w(l,f,n) = h * x(l,f) * d(l,n)	for(int f = 0; f < NeuronsInHidden+1; f++) 		SetW(HiddenLayers,f,0, w(HiddenLayers,f,0)+(LEARN_RATE * x(HiddenLayers,f) * Delta[HiddenLayers][0]));	// alter the weights for the hidden layers to	for(int l = 0; l < HiddenLayers; l++) {		if(l == 0) {	// first layer			for(int n = 0; n < NeuronsInHidden; n++) {				for(int f = 0; f < Inputs+1; f++) 					SetW(0,f,n, w(0,f,n)+(LEARN_RATE * x(0,f) * Delta[0][n]));			}		}		else {			for(int n = 0; n < NeuronsInHidden; n++) {				for(int f = 0; f < NeuronsInHidden+1; f++) 					SetW(l,f,n, w(l,f,n)+(LEARN_RATE * x(l,f) * Delta[l][n]));			}		}	}	if(Debug) {		cout << "--TRAIN-------------\n";		for(int l = HiddenLayers; l >= 0; --l) {			if(l == HiddenLayers) {	// output layer				for(int n = 0; n < OUTPUTS; ++n) 					cout << "Delta(" << l << "," << n << ") " << Delta[l][n] << " "; 				cout << endl;			}			else if(l == 0) {	// input layer				for(int n = 0; n < Inputs; ++n) 					cout << "Delta(" << l << "," << n << ") " << Delta[l][n] << " "; 				cout << endl;			}			else {					for(int n = 0; n < NeuronsInHidden; ++n) 					cout << "Delta(" << l << "," << n << ") " << Delta[l][n] << " "; 				cout << endl;			}		}		cout << endl;		for(int l = 0; l <= HiddenLayers; ++l) {			for(int n = 0; n < W[l].size(); ++n)				cout << "New weight (" << l << "," << n << ") is " << W[l][n] << endl;			cout << endl;		}		getch();	}}float NeuralNet::Process(vector<float> NetIn) {	// reset values in net	for(int l = 0; l <= HiddenLayers; ++l) {		if(l == 0) {	// input layer			for(int n = 1; n < Inputs+1; ++n)				SetX(l,n,0);		}		else {			for(int n = 1; n < NeuronsInHidden+1; ++n)				SetX(l,n,0);		}	}	NetOut = 0;	// reset output neuron	// initial net inputs	for(int n = 1; n <= Inputs; ++n)		SetX(0,n,NetIn[n-1]);	// first layer	float Fire = 0;	// what the neuron fires	for(int n = 1; n <= Inputs; ++n) {		for(int i = 0; i <= Inputs; ++i)			Fire += x(0, i) * w(0, i, n-1);		SetX(1,n, Sigmoid(Fire));	// store it as output		Fire = 0;					// reset fire	}	// sort out the hidden layers outputs	for(int l = 0; l < 
HiddenLayers; l++) {			// loop through layers		for(int n = 1; n <= NeuronsInHidden; n++) {	// loop through hiddens, start at one so we dont overwrite the bias			for(int i = 0; i < Inputs+1; i++) 		// loop through inputs				Fire += x(l, i) * w(l, i, n-1);		// store outputs as inputs in the next layer			SetX(l+1,n, Sigmoid(Fire));		// store it as output			Fire = 0;						// reset fire		}	}	// output neuron	for(int i = 0; i <= NeuronsInHidden; i++) 		NetOut = x(HiddenLayers, i) * w(HiddenLayers, i, 0);	NetOut = Sigmoid(NetOut);	// --- Calculation done ---	if(Debug) {		cout << "--PROCESS-----------\n";		for(int l = 0; l <= HiddenLayers; ++l) {			for(int n = 1; n < X[l].size(); ++n)				cout << "x(" << l << "," << n << ") = " << x(l,n) << " ";			cout << endl;		}		cout << "Netout: " << NetOut << "\n\n";		getch();	}	return NetOut;	}void NeuralNet::Print() {	// print output	stringstream str;	// hidden layer weights	str << "Hidden W: --- ";	for(int l = 0; l < HiddenLayers; ++l) {		if(l == 0) {	// first layer			str << "\n\nLayer " << l << "\n";			for(int n = 0; n < NeuronsInHidden; ++n) {				for(int i = 0; i < Inputs+1; ++i)					str << "w(" << l << "," << i << "," << n << "): " << w(l,i,n) << "\t";				str << endl;			}		}		else {	// every other hidden layer			str << "\n\nLayer " << l << "\n";			for(int n = 0; n < NeuronsInHidden; ++n) {				for(int i = 0; i < NeuronsInHidden+1; ++i)					str << "w(" << l << "," << i << "," << n << "): " << w(l,i,n) << "\t";				str << endl;			}			}		str << endl;	}	str << "\n\n";	// output layer weights	str << "Output W: --- \n";	for(int n = 0; n < OUTPUTS; n++) {		for(int i = 0; i <= NeuronsInHidden; i++)			str << "w(" << HiddenLayers << "," << i << "," << n << "): " << w(HiddenLayers,i,n) << "\t";	}	str << "\n\n";	// open file	ofstream file("Net.txt");	if(!file.is_open()) {		cout << "Print failed, unable to create file: Net.txt\n";		return;	}	// print it	file << str.str();	cout << "Net data printed to file\n";}
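For reference, here is a sketch of the textbook sigmoid back-propagation maths that the comments in Train() refer to; this is not the posted code, and all names are purely illustrative. For sigmoid units, the output delta is y*(1-y)*(target-y), a hidden neuron's delta is y*(1-y) times the sum, over every next-layer neuron it feeds, of that neuron's incoming weight times its delta, and each weight changes by learn_rate * input * delta. The hidden-layer part is the easiest piece to get wrong, so it is sketched below:

#include <vector>
#include <cstddef>

// Textbook hidden-layer delta for a sigmoid neuron j (illustrative only):
//   delta_j = y_j * (1 - y_j) * sum_k( w_jk * delta_k )
// where k runs over every neuron in the next layer that j feeds.
float HiddenDelta(float y_j,
                  const std::vector<float>& wNext_j,    // weights from j to each next-layer neuron
                  const std::vector<float>& deltaNext)  // deltas of the next layer
{
    float sum = 0;
    for(std::size_t k = 0; k < deltaNext.size(); ++k)
        sum += wNext_j[k] * deltaNext[k];
    return y_j * (1 - y_j) * sum;
}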


Main.cpp
#include <iostream>
#include <conio.h>
#include "NeuralNet.h"

int main() {
    NeuralNet X(2, 2, 2, 1);
    vector<float> NetIn;
    NetIn.resize(2);

    // Train
    int s = 0;  // for stochastic teaching
    for(int a = 0; a < 500000; ++a) {
        s = rand()%3;
        cout << s << endl;
        if(s == 0) {
            NetIn[0] = 0; NetIn[1] = 0;
            X.Train(NetIn, 0);
        }
        else if(s == 1) {
            NetIn[0] = 1; NetIn[1] = 0;
            X.Train(NetIn, 0);
        }
        else if(s == 2) {
            NetIn[0] = 0; NetIn[1] = 1;
            X.Train(NetIn, 0);
        }
        else if(s == 3) {
            NetIn[0] = 1; NetIn[1] = 1;
            X.Train(NetIn, 1);
        }
    }

    // Output what weve learned
    NetIn[0] = 0; NetIn[1] = 0;
    cout << "0,0 = " << X.Process(NetIn);
    NetIn[0] = 1; NetIn[1] = 0;
    cout << endl << "1,0 = " << X.Process(NetIn);
    NetIn[0] = 0; NetIn[1] = 1;
    cout << endl << "0,1 = " << X.Process(NetIn);
    NetIn[0] = 1; NetIn[1] = 1;
    cout << endl << "1,1 = " << X.Process(NetIn) << "\n\n";

    X.Print();
    getch();
    return 1;
}
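As a side note on the stochastic-teaching idea earlier in the thread, a minimal sketch of drawing uniformly over all four truth-table rows might look like the following (it assumes the NeuralNet class posted above, and the targets simply mirror the ones used in Main.cpp; the function name is just illustrative). Note that rand() % 3 can only return 0, 1 or 2, so rand() % 4 is needed if all four cases are meant to be drawn:

#include <cstdlib>
#include <vector>
#include "NeuralNet.h"

// Train on randomly chosen rows of the truth table used in Main.cpp above.
void TrainStochastic(NeuralNet &Net, int Samples) {
    static const float Pattern[4][2] = { {0,0}, {1,0}, {0,1}, {1,1} };
    static const float Target[4]     = {   0,     0,     0,     1   };
    std::vector<float> NetIn(2);
    for(int a = 0; a < Samples; ++a) {
        int s = rand() % 4;         // draw one of the four rows uniformly
        NetIn[0] = Pattern[s][0];
        NetIn[1] = Pattern[s][1];
        Net.Train(NetIn, Target[s]);
    }
}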
Hey Miz,

You can send me your project file and I can try to debug it if you want me to. NickGeorgia@hotmail.com
I just sent the mail with the project attached in a .rar file. Also included some extra info.
OK, got it. Let me take a look at it. I'll email you back once I get a chance (probably by tomorrow).

This topic is closed to new replies.
