Problem with net learning XOR

Mizipzor

Author

247

January 22, 2006 04:42 AM

I made a thread before about my neural net learning XOR. I was told that, as the net was just a single perceptron, it wouldn't be able to learn XOR. I made it learn AND instead, and that worked better. Now I've extended the net so it has a hidden layer with two perceptrons. Like this:


>- i1 -- h1 -- 
      ><      -- o1 -->
>- i2 -- h2 --

i = input, h = hidden, o = output. This net can learn AND, but not XOR... is it the limits of the net itself that are the problem again? Do I need a bigger hidden layer? Since it can learn AND, I think the code must be all right... but I will post it if anyone wants to look at it. I just want to know if, in theory, that net could learn XOR.

Fruny

1,658

January 22, 2006 05:05 AM

Yes, it should be able to, with:

H1 = f( -1 * I1 +  1 * I2 )
H2 = f(  1 * I1 + -1 * I2 )
O  = f(  1 * H1 +  1 * H2 )
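
These weights are easy to check by hand. The sketch below assumes f is a hard threshold that outputs 1 for a strictly positive sum and 0 otherwise; the thread does not say which f is meant, and a sigmoid would only approximate these values. The names step, xorNet and checkTruthTable are made up for illustration.

```cpp
#include <cassert>

// Assumed activation: hard threshold at zero (not stated in the thread).
int step(int sum) { return sum > 0 ? 1 : 0; }

// Evaluates the 2-2-1 net with the weights given above.
int xorNet(int i1, int i2) {
    int h1 = step(-1 * i1 + 1 * i2);  // fires only for (0,1)
    int h2 = step( 1 * i1 - 1 * i2);  // fires only for (1,0)
    return step(1 * h1 + 1 * h2);     // OR of the two hidden units
}

// Walks the whole truth table and aborts if any row is not XOR.
void checkTruthTable() {
    for (int i1 = 0; i1 <= 1; ++i1)
        for (int i2 = 0; i2 <= 1; ++i2)
            assert(xorNet(i1, i2) == (i1 ^ i2));
}
```

Each hidden unit detects one of the two "exactly one input is on" cases, and the output unit ORs them together, which is why two hidden units are enough in theory.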


Mizipzor

Author

247

January 22, 2006 05:15 AM

I was afraid of that... then it must be something faulty with my training routine.

glSmurf

216

January 22, 2006 05:49 AM

This works well with my neural network. Replace my net with yours and see if it works =)

#include <cstdio>   // printf
#include <cstdlib>  // system

// NeuralNetwork and random(lo, hi) come from glSmurf's own code and are not shown in the thread.

// Fill indices with 0..size-1, then shuffle them with random pairwise swaps.
void randomize ( unsigned int size, unsigned int indices[] )
{
	unsigned int i, a, b;

	for ( i=0; i<size; ++i )
		indices[i] = i;

	for ( i=0; i<size; ++i )
	{
		a = random(0, size-1);
		b = random(0, size-1);
		while ( a == b )
			b = random(0, size-1);

		// XOR swap; only valid because a != b
		indices[a] ^= indices[b];
		indices[b] ^= indices[a];
		indices[a] ^= indices[b];
	}
}

struct Pattern
{
	float	input[2];
	float	output[1];
};

// XOR pattern
const unsigned int samples = 4;
const Pattern sample[] = {{{0.0f, 0.0f}, {0.0f}},
			  {{1.0f, 1.0f}, {0.0f}},
			  {{1.0f, 0.0f}, {1.0f}},
			  {{0.0f, 1.0f}, {1.0f}},};

int main ()
{
	unsigned int indices[samples];
	unsigned int iterations = 200;
	unsigned int i;

	NeuralNetwork net(3,  2,2,1); // 3 layers, 2I, 2H, 1O

	float* input  = net.input();
	float* output = net.output();
	float* target = net.target();

	// train: present the four patterns in a new random order each iteration
	while (	--iterations != 0 )
	{
		randomize(samples, indices);
		for ( i=0; i<samples; ++i )
		{
			input[0]  = sample[indices[i]].input[0];
			input[1]  = sample[indices[i]].input[1];
			target[0] = sample[indices[i]].output[0];
			net.learn();
		}
	}

	// test all four input combinations
	input[0] = 1;
	input[1] = 0;
	net.update();
	printf("%f ^ %f = %f\n", net.input(0), net.input(1), net.output(0));

	input[0] = 0;
	input[1] = 1;
	net.update();
	printf("%f ^ %f = %f\n", net.input(0), net.input(1), net.output(0));

	input[0] = 1;
	input[1] = 1;
	net.update();
	printf("%f ^ %f = %f\n", net.input(0), net.input(1), net.output(0));

	input[0] = 0;
	input[1] = 0;
	net.update();
	printf("%f ^ %f = %f\n", net.input(0), net.input(1), net.output(0));

	system("pause");
	return 0;
}

Mizipzor

Author

247

January 22, 2006 06:17 AM

Hmm, thanks for the help glSmurf, but I think our nets are a little too different to just replace and try. :P

Here's the training routine I use (back propagation). I think the error must be somewhere in here; do you see anything obviously faulty?

void NeuralNet::Train(vector<float> NetIn, float d) {
	// first, process so we have the correct values stored inside the neural net
	Process(NetIn);

	vector<float> HiddenDelta;
	HiddenDelta.resize(NeuronsInHidden);
	vector<float> OutputDelta;
	OutputDelta.resize(Outputs);

	// output layer delta (we only have one output now so the loop will only run once)
	// d3(1) = x3(1)(1 - x3(1))(d - x3(1))
	for(int n = 0; n < Outputs; n++)
		OutputDelta[n] = x(2,n) * (1 - x(2,n)) * (d - x(2,n));

	// hidden layer delta
	// formula: d2(1) = x2(1)(1 - x2(1))w3(1,1)d3(1)
	for(int n = 0; n < NeuronsInHidden; n++)
		HiddenDelta[n] = x(1,n+1) * (1 - x(1,n+1)) * w(1,n+1,0) * OutputDelta[0];

	// deltas calculated, now alter the weights (we only have one output now so the loop will only run once)
	// formula: w2(0,1) = h*x1(0)*d2(1)
	for(int n = 0; n < Outputs; n++) {
		for(int i = 0; i < NeuronsInHidden+1; i++)
			SetW(1,i,n, w(1,i,n)+(LEARN_RATE * x(1,i) * OutputDelta[n]));
	}

	for(int n = 0; n < NeuronsInHidden; n++) {
		for(int i = 0; i < Inputs+1; i++)
			SetW(0,i,n, w(0,i,n)+(LEARN_RATE * x(0,i) * HiddenDelta[n]));
	}
}

Here are the function headers: x() gives the output of a neuron (which is the input to the next layer), and w() gives a weight.

	float	x(int l, int n);		// output x in layer l from neuron n (l = 0 for the inputs to the net)
	float	w(int l, int f, int n);	// weight of input f (bias = 0) for neuron n in layer l; l = 0 is the hidden layer, l = 1 is the output layer
	void	SetW(int l, int f, int n, float NewWeight);	// same as above, but sets the weight instead of reading it

Is there anything in there that could get me on track again? It just feels like I've tried everything :P. I'll see if it's possible to make my net work with your code.
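
For comparison, here are the same delta formulas written against a stripped-down 2-2-1 network. TinyNet is a hypothetical stand-in, not the poster's NeuralNet class. It assumes a sigmoid activation (which the x(1 - x) factor in the formulas implies) and keeps the bias weight at index 0, as in the headers above.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical stand-in for the poster's net: 2 inputs, 2 hidden, 1 output.
struct TinyNet {
    double wH[2][3];  // hidden weights: wH[neuron][input], input 0 = bias
    double wO[3];     // output weights: wO[0] = bias, wO[1..2] = hidden 1..2
    double h[2];      // hidden outputs from the last forward pass
    double out;       // network output from the last forward pass

    static double sigmoid(double s) { return 1.0 / (1.0 + std::exp(-s)); }

    // Forward pass; stores the activations that train() needs.
    double process(const std::vector<double>& in) {
        assert(in.size() == 2);
        for (int n = 0; n < 2; ++n)
            h[n] = sigmoid(wH[n][0] + wH[n][1] * in[0] + wH[n][2] * in[1]);
        out = sigmoid(wO[0] + wO[1] * h[0] + wO[2] * h[1]);
        return out;
    }

    // One online backpropagation step toward target d.
    void train(const std::vector<double>& in, double d, double rate) {
        process(in);
        double dOut = out * (1.0 - out) * (d - out);
        double dHid[2];
        for (int n = 0; n < 2; ++n)  // uses the output weights before they change
            dHid[n] = h[n] * (1.0 - h[n]) * wO[n + 1] * dOut;

        wO[0] += rate * dOut;  // bias input is 1
        for (int n = 0; n < 2; ++n) wO[n + 1] += rate * h[n] * dOut;
        for (int n = 0; n < 2; ++n) {
            wH[n][0] += rate * dHid[n];
            wH[n][1] += rate * in[0] * dHid[n];
            wH[n][2] += rate * in[1] * dHid[n];
        }
    }
};
```

With E = 0.5 * (d - out)^2, each weight change here equals -rate * dE/dw, so a finite-difference check of the error against a single weight is a cheap way to test a training routine independently of whether it converges on XOR.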

Mizipzor

Author

247

January 22, 2006 07:36 AM

I think I've made some progress now: I changed the way the weights are initialized. Instead of assigning them all the same random value, I loop through them so each one gets its own random value.

This:

	W.resize(2);	// setup weights

	W[0].resize((Inputs+1) * NeuronsInHidden);		// hidden layer
	for(int i = 0; i < ((Inputs+1) * NeuronsInHidden); ++i)
		W[0][i] = 2.0f*((float)rand()/(float)RAND_MAX)-0.5f;	// a fresh value in [-0.5, 1.5] for each weight

	W[1].resize((NeuronsInHidden+1) * Outputs);	// output layer
	for(int i = 0; i < ((NeuronsInHidden+1) * Outputs); ++i)
		W[1][i] = 2.0f*((float)rand()/(float)RAND_MAX)-0.5f;

Instead of this:

	W.resize(2);	// setup weights

	// resize(count, value) evaluates the random expression once, so every weight in a layer gets the same value
	W[0].resize(((Inputs+1) * NeuronsInHidden), 2.0f*((float)rand()/(float)RAND_MAX)-0.5f);	// hidden layer
	W[1].resize(((NeuronsInHidden+1) * Outputs), 2.0f*((float)rand()/(float)RAND_MAX)-0.5f);	// output layer
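
The difference between the two versions can be seen in isolation. This is a minimal sketch with made-up names (initSame, initEach, allEqual); it uses std::mt19937 and a range of [-0.5, 0.5] instead of the rand() expression above, purely as assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fills with ONE random value: the value argument of resize() is evaluated once.
std::vector<float> initSame(std::size_t count, std::mt19937& rng) {
    std::uniform_real_distribution<float> dist(-0.5f, 0.5f);
    std::vector<float> w;
    w.resize(count, dist(rng));
    return w;
}

// Draws a fresh random value for each weight.
std::vector<float> initEach(std::size_t count, std::mt19937& rng) {
    std::uniform_real_distribution<float> dist(-0.5f, 0.5f);
    std::vector<float> w(count);
    for (float& weight : w) weight = dist(rng);
    return w;
}

// True if every weight holds the same value.
bool allEqual(const std::vector<float>& w) {
    assert(!w.empty());
    for (float weight : w)
        if (weight != w.front()) return false;
    return true;
}
```

When both hidden neurons start with identical weights they produce identical outputs and receive identical updates, so they can never take on the two different roles XOR needs. That is a likely reason the second version never learned.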

But I've encountered another quite strange problem. The network manages to learn the XOR operation... if I define LEARN_RATE as 0.1. If, however, I define it as something lower, 0.01 or 0.001, it fails and the outputs for all four combinations are somewhere around 0.5.

Anyone have any idea of what could be the cause of this?

glSmurf

216

January 22, 2006 08:53 AM

A learning rate of 0.1 is pretty low. The lower the learning rate, the longer it will take for the network to learn a pattern. With a problem as simple as XOR you should do fine with a learning rate of 1.5-2.0 (momentum = 0.5).
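
Momentum is not part of the training routine posted earlier, so here is a rough sketch of what it adds: each weight change carries over a fraction of the previous change. MomentumUpdater and its members are hypothetical names; only the rate and momentum values come from the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Per-weight state for gradient descent with momentum.
struct MomentumUpdater {
    double rate;                   // learning rate, e.g. 1.5 to 2.0
    double momentum;               // fraction of the previous change carried over, e.g. 0.5
    std::vector<double> previous;  // last applied change for each weight

    MomentumUpdater(std::size_t count, double rate_, double momentum_)
        : rate(rate_), momentum(momentum_), previous(count, 0.0) {}

    // step[i] is the plain backprop term (input * delta) for weight i.
    void apply(std::vector<double>& weights, const std::vector<double>& step) {
        assert(weights.size() == previous.size() && step.size() == previous.size());
        for (std::size_t i = 0; i < weights.size(); ++i) {
            previous[i] = rate * step[i] + momentum * previous[i];
            weights[i] += previous[i];
        }
    }
};
```

With rate 2.0 and momentum 0.5, two consecutive steps of 0.1 move a weight by 0.2 and then by 0.3, so repeated steps in the same direction speed up while alternating ones partly cancel.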

Anonymous

January 22, 2006 11:22 AM

How many training epochs are you using? A lower learning rate may require more epochs (while a higher learning rate might not be able to fine-tune the network as well). Also, with a lower learning rate it might be heading towards a different local minimum, but the odds of this happening depend on the initial weights, the function being learned, the training algorithm, etc.

If all the input/output pairs are known in advance, there is a variation that can be used where you present all the input/output pairs and add up all the deltas to get a more global gradient, and then you adjust all the weights. This technique might provide better results.
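
The accumulate-then-update order described above can be sketched for a single sigmoid neuron, which keeps the example short; in a multi-layer net the same summing would be done for every weight in every layer. The Pattern struct here is a simplified, hypothetical version of the one posted earlier, and batchEpoch is a made-up name.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Pattern {
    double input[2];
    double target;
};

// One batch epoch for a single sigmoid neuron; weights[0] is the bias weight.
// Deltas from every pattern are summed first, then applied in one update.
void batchEpoch(std::vector<double>& weights,
                const std::vector<Pattern>& patterns, double rate) {
    assert(weights.size() == 3);
    double sum[3] = {0.0, 0.0, 0.0};
    for (const Pattern& p : patterns) {
        double x[3] = {1.0, p.input[0], p.input[1]};  // bias input is 1
        double net = 0.0;
        for (int i = 0; i < 3; ++i) net += weights[i] * x[i];
        double out = 1.0 / (1.0 + std::exp(-net));
        double delta = out * (1.0 - out) * (p.target - out);
        for (int i = 0; i < 3; ++i) sum[i] += x[i] * delta;
    }
    for (int i = 0; i < 3; ++i) weights[i] += rate * sum[i];
}
```

Starting from all-zero weights on the AND patterns, one epoch at rate 1 changes only the bias weight, to -0.25, because the contributions to the two input weights cancel. Unlike the online version, the result does not depend on the order in which the patterns are presented.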

glSmurf

216

January 22, 2006 11:45 AM

...or you could give the Delta-Bar-Delta method a try

Personally I'm going for the extended delta-bar-delta and mixing it with directed random search to help avoid local minima.
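
For anyone who hasn't met it, here is a rough sketch of the basic delta-bar-delta rule as it is usually described: every weight has its own learning rate, which grows by a constant while the gradient keeps its sign and shrinks by a fraction when the sign flips. The struct and parameter names (kappa, phi, theta) are assumptions for the example; the extended variant and the directed random search mentioned above are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Basic delta-bar-delta: every weight carries its own learning rate.
struct DeltaBarDelta {
    double kappa;  // additive rate increase when gradient signs agree
    double phi;    // fractional rate decrease when they disagree
    double theta;  // smoothing factor for the averaged gradient
    std::vector<double> rate;  // per-weight learning rates
    std::vector<double> bar;   // exponential average of past gradients

    DeltaBarDelta(std::size_t count, double initialRate,
                  double kappa_, double phi_, double theta_)
        : kappa(kappa_), phi(phi_), theta(theta_),
          rate(count, initialRate), bar(count, 0.0) {}

    // grad[i] is dE/dw for weight i, so the update moves against it.
    void apply(std::vector<double>& weights, const std::vector<double>& grad) {
        assert(weights.size() == rate.size() && grad.size() == rate.size());
        for (std::size_t i = 0; i < weights.size(); ++i) {
            double agreement = bar[i] * grad[i];
            if (agreement > 0.0) rate[i] += kappa;             // same direction as before
            else if (agreement < 0.0) rate[i] *= (1.0 - phi);  // sign flipped
            bar[i] = (1.0 - theta) * grad[i] + theta * bar[i];
            weights[i] -= rate[i] * grad[i];
        }
    }
};
```

Note that grad here is the error gradient itself, so it has the opposite sign of the input * delta term used in the training routine earlier in the thread.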