
XOR Problem

Started by February 12, 2006 04:17 PM
4 comments, last by sidhantdash 18 years, 9 months ago
This is a simple net to solve the XOR problem. I am training it on all possible inputs, and my training array is

int array[12] = {2,2,1,1,1,1,2,1,2,1,2,2}; //2 is the higher state, 1 is the 
                                           //lower. So 2,2 corresponds to 
                                           //output 0, which is 1 in the array
Following this, I train my net using the following code:

void Layernet::Trainnet (int *inputarray)
{
	//present the inputs
	int *input = inputarray;
	int counter = 1;

	for (int j=0; j<100; j++)
	{
		double hid_in = w0 * input[0] + w1 * input[1] + bias1;
		double hid_out = 1 / (1 + exp (-1 * hid_in));
		double out_in = w2 * hid_out + bias2;
		double out_out = 1 / (1 + exp(-1 * out_in));

		//train using backpropagation
		w0 += .5 * (out_out) * (1-out_out) * (input[2]-out_out) * w2 * (hid_out) * (1-hid_out) * input[0];
		w1 += .5 * (out_out) * (1-out_out) * (input[2]-out_out) * w2 * (hid_out) * (1-hid_out) * input[1];
		bias1 += .5 * (out_out) * (1-out_out) * (input[2]-out_out) * w2 * (hid_out) * (1-hid_out) * 1;

		w2 += .5 * hid_out * out_out * (1-out_out) * (input[2] - out_out);
		bias2 += .5 * 1 * out_out * (1-out_out) * (input[2] - out_out);

		if (counter == 4)
		{
			counter = 1;
			input = inputarray;
		}
		else
		{
			counter++;
			input = inputarray + 3;
		}
	}
}

The explanation of the code: in the lines above I use the simple BP algorithm with a learning rate of .5 (too high??). out_out is the output neuron's output, hid_out is the hidden neuron's output, and hid_in and out_in are the corresponding net inputs. w0 and w1 are the weights connecting the two input neurons (holding input[0] and input[1]) to the single hidden neuron, which has a bias of bias1. Similarly, the output neuron is connected to the hidden neuron by the weight w2 and has a bias of bias2. The counter variable is used to keep track of the position in the input array. (The chain-rule derivation behind the weight updates is written out at the end of this post.)

The training loop is run 100 times (too few??), but in spite of that the net doesn't learn the XOR problem. Why is that? I have tried all sorts of representations for the input (like substituting -1 for 0, or having zeroes and ones in the input itself), but nothing seems to work: all the outputs come out as either 1 or 0. This is the first NN program I have written. Until now I had concerned myself mostly with the mathematics, but it turns out that programming things is much harder. Please help!!!
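To make the updates explicit: they are the chain rule applied to the squared error (the error function is not written anywhere in the code, so I am assuming \(E = \tfrac{1}{2}(t - o)^2\) here), with \(t\) = input[2], \(o\) = out_out, \(h\) = hid_out and \(x_0\) = input[0]:

\[
-\frac{\partial E}{\partial w_0} = (t - o)\,\underbrace{o\,(1-o)}_{\sigma'(\text{out\_in})}\; w_2\;\underbrace{h\,(1-h)}_{\sigma'(\text{hid\_in})}\; x_0,
\qquad
-\frac{\partial E}{\partial w_2} = (t - o)\,o\,(1-o)\,h .
\]

Multiplying by the learning rate .5 and adding the result to the weight gives exactly the expressions used for w0 and w2 in the code (and likewise for w1, bias1 and bias2, with \(x_1\) = input[1] or the constant bias input 1).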
I'm sure someone with more experience in AI can give a better description, but a perceptron (which is what your network is) can only classify linearly separable datasets. Google for "perceptron xor" for more information.

Basically to solve XOR you'll need to add more layers (really just one more layer).

[edit]This looks like a decent description of what's happening[/edit]
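To spell out the linear-separability point (this is just the standard argument, not something specific to the linked page): a single threshold unit computing XOR on 0/1 inputs would need weights \(w_1, w_2\) and a bias \(b\) with

\[
\begin{aligned}
w_1\cdot 0 + w_2\cdot 0 + b &\le 0 &&(0\ \text{XOR}\ 0 = 0)\\
w_1\cdot 1 + w_2\cdot 1 + b &\le 0 &&(1\ \text{XOR}\ 1 = 0)\\
w_1\cdot 1 + w_2\cdot 0 + b &> 0 &&(1\ \text{XOR}\ 0 = 1)\\
w_1\cdot 0 + w_2\cdot 1 + b &> 0 &&(0\ \text{XOR}\ 1 = 1)
\end{aligned}
\]

Adding the first two lines gives \(w_1 + w_2 + 2b \le 0\), while adding the last two gives \(w_1 + w_2 + 2b > 0\), a contradiction. So no choice of weights works without a hidden layer.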
I think you'll need more than one node in the hidden layer to learn XOR.
I modified the code and added an additional hidden neuron. I have also re-written parts of it for clarity. Here is the new training function:
void Layernet::Trainnet (int *inputarray)
{
	//present the inputs
	int *input = inputarray;
	int counter = 1;
	double delta, delta_hid1, delta_hid2;
	double alpha = .4;

	for (int j=0; j<100; j++)
	{
		double hid1_in = w0 * input[0] + w1 * input[1] + bias1;
		double hid1_out = 1 / (1 + exp (-1 * hid1_in));
		double hid2_in = w2 * input[0] + w3 * input[1] + bias2;
		double hid2_out = 1 / (1 + exp (-1 * hid2_in));
		double out_in = w4 * hid1_out + w5 * hid2_out + bias3;
		double out_out = 1 / (1 + exp(-1 * out_in));

		//calculate the delta for the output layer first
		delta = (input[2] - out_out) * out_out * (1 - out_out);

		//the deltas for the hidden layer, using the delta calculated above
		delta_hid1 = delta * w4 * hid1_out * (1 - hid1_out);
		delta_hid2 = delta * w5 * hid2_out * (1 - hid2_out);

		//train using backpropagation, learning rate is alpha
		// w0, w1, w2, w3 are the weights going from the input neurons to the hidden neurons
		// w4 and w5 are the weights going from the hidden layer neurons to the o/p neuron
		w0 += alpha * delta_hid1 * input[0];
		w1 += alpha * delta_hid1 * input[1];
		bias1 += alpha * delta_hid1 * 1;

		w2 += alpha * delta_hid2 * input[0];
		w3 += alpha * delta_hid2 * input[1];
		bias2 += alpha * delta_hid2 * 1;

		w4 += alpha * delta * hid1_out;
		w5 += alpha * delta * hid2_out;
		bias3 += alpha * delta * 1;

		//advance to the next training pattern, wrapping after four
		if (counter == 4)
		{
			counter = 1;
			input = inputarray;
		}
		else
		{
			counter++;
			input += 3;
		}
	}
}


The code that initialises the weights is as follows.
Layernet::Layernet(void)
{
	//random weights
	w0 = .2;
	w1 = .13;
	w2 = .11;
	w3 = .22;
	w4 = .31;
	w5 = .12;
	bias1 = .12;
	bias2 = .11;
	bias3 = .10;
}


But the net still doesn't work. I am using a 3-layer feed-forward network with 2 hidden neurons in the middle layer, trained with the standard back-propagation algorithm (no momentum, learning rate = .2, activation function = sigmoid).
Can someone please tell me why? Just for further clarification on the code: w0 and w1 are the weights going from the 2 input neurons to hidden neuron 1, and bias1 is the bias of the first hidden neuron. w2, w3 and bias2 are for the second hidden neuron. Finally, w4, w5 and bias3 are for the output neuron. Additionally, the derivative of the sigmoid function f = 1 / (1 + exp(-x)) is f * (1 - f), and has been used directly in the code.
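For completeness, the derivation of that identity is a one-liner:

\[
f(x) = \frac{1}{1+e^{-x}}
\;\Rightarrow\;
f'(x) = \frac{e^{-x}}{\bigl(1+e^{-x}\bigr)^{2}}
= \frac{1}{1+e^{-x}}\left(1 - \frac{1}{1+e^{-x}}\right)
= f(x)\,\bigl(1-f(x)\bigr),
\]

which is where the hid1_out * (1 - hid1_out) and out_out * (1 - out_out) factors in the deltas come from.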
I have tried hard, maybe you can help me get this to work.

[Edit]
Here is the function that I use for o/p
int Layernet::output (int *inputarray)
{
	double hid1_in = w0 * inputarray[0] + w1 * inputarray[1] + bias1;
	double hid1_out = 1 / (1 + exp(-1 * hid1_in));
	double hid2_in = w2 * inputarray[0] + w3 * inputarray[1] + bias2;
	double hid2_out = 1 / (1 + exp(-1 * hid2_in));
	double out_in = w4 * hid1_out + w5 * hid2_out + bias3;
	double out_out = 1 / (1 + exp(-1 * out_in));

	if (out_out >= .9) return 1;
	else return 0;
}

Also, during training I used 1,-1 instead of 1,0, but it made no difference. I get all the o/p to be 0, which most certainly isn't the XOR gate.
[/Edit]
Try increasing the number of training cycles to something like 10,000 or 100,000 and see if that helps. You're currently going through the training set 25 times, which is an extremely low number for such a basic backprop implementation. One of the weaknesses of backprop is that it can converge kind of slowly.

Also try outputting the actual output of the network instead of putting it through a step function, to see if it is at least getting closer to the correct output; for example, the output for 1,0 should at least be higher than the output for 1,1, but the way it is set up right now, if both have an output less than .9 they'll both come out as 0.
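Both suggestions can be tried together in a small stand-alone program. The following is only a rough sketch, not the poster's Layernet class: it assumes plain 0/1 inputs and targets, the same 2-2-1 layout and sigmoid activations, a learning rate of 0.4, and starting weights similar to the constructor posted above, and it prints the raw sigmoid output for each pattern instead of thresholding it.

#include <cmath>
#include <cstdio>

// logistic sigmoid, the activation used throughout the thread
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

int main()
{
	// training patterns: in0, in1, target (0/1 encoding assumed here)
	const double data[4][3] = { {0,0,0}, {0,1,1}, {1,0,1}, {1,1,0} };

	// small non-zero starting weights, similar in spirit to the Layernet constructor
	double w0 = .2, w1 = .13, w2 = .11, w3 = .22, w4 = .31, w5 = .12;
	double bias1 = .12, bias2 = .11, bias3 = .10;
	const double alpha = 0.4; // learning rate

	// 100,000 online updates = 25,000 passes over the four patterns
	for (int i = 0; i < 100000; ++i)
	{
		const double *p = data[i % 4];

		// forward pass through the 2-2-1 network
		double hid1_out = sigmoid(w0 * p[0] + w1 * p[1] + bias1);
		double hid2_out = sigmoid(w2 * p[0] + w3 * p[1] + bias2);
		double out_out  = sigmoid(w4 * hid1_out + w5 * hid2_out + bias3);

		// backprop deltas (sigmoid derivative written as f * (1 - f))
		double delta      = (p[2] - out_out) * out_out * (1 - out_out);
		double delta_hid1 = delta * w4 * hid1_out * (1 - hid1_out);
		double delta_hid2 = delta * w5 * hid2_out * (1 - hid2_out);

		// weight and bias updates
		w0 += alpha * delta_hid1 * p[0];  w1 += alpha * delta_hid1 * p[1];  bias1 += alpha * delta_hid1;
		w2 += alpha * delta_hid2 * p[0];  w3 += alpha * delta_hid2 * p[1];  bias2 += alpha * delta_hid2;
		w4 += alpha * delta * hid1_out;   w5 += alpha * delta * hid2_out;   bias3 += alpha * delta;
	}

	// report the raw outputs instead of pushing them through a step function
	for (int k = 0; k < 4; ++k)
	{
		const double *p = data[k];
		double hid1_out = sigmoid(w0 * p[0] + w1 * p[1] + bias1);
		double hid2_out = sigmoid(w2 * p[0] + w3 * p[1] + bias2);
		double out_out  = sigmoid(w4 * hid1_out + w5 * hid2_out + bias3);
		std::printf("%.0f XOR %.0f -> %f (target %.0f)\n", p[0], p[1], out_out, p[2]);
	}
	return 0;
}

With enough iterations the raw outputs should settle near 0.0x and 0.9x; if they stay clustered around 0.5, something in the updates is still wrong.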
Thanks a lot. I increased the number of iterations to 100000, and it worked perfectly.
