First I feed the input forward into the single hidden layer, and then from the hidden layer to the output, using tanh() as the activation function:
for(int i = 0; i < numhidden; ++i) {
    float value = 0.0f;
    for(int j = 0; j < numinputs; ++j) {
        value += w[j][i] * x[j];
    }
    hidden[i] = tanh(value);
}
for(int i = 0; i < numoutputs; ++i) {
    float value = 0.0f;
    for(int j = 0; j < numhidden; ++j) {
        value += v[j][i] * hidden[j];
    }
    y[i] = tanh(value);
}
I then calculate the error terms at the output layer and update the weights:
for(int i = 0; i < numoutputs; ++i) {
    errors[i] = (target[i] - y[i]) * (1 - (y[i]*y[i]));
    for(int j = 0; j < numhidden; ++j) {
        v[j][i] += lrate * errors[i] * hidden[j];
    }
}
where y[] is the output array and target[] is the target output. I don't know at this point whether I should delay updating the weights until after I've calculated the error terms for every layer, or update the weights in one layer and use the result to calculate the errors in the previous layer (I've seen implementations do it both ways)?
I then calculate the errors at the hidden layer and update the w weights (from input->hidden):
for(int i = 0; i < numhidden; ++i) {
    float sum = 0.0f;
    for(int j = 0; j < numoutputs; ++j) {
        sum += errors[j] * v[i][j];
    }
    hiddenerrors[i] = sum * (1 - (hidden[i]*hidden[i]));
}
// adjust the w weights
for(int i = 0; i < numinputs; ++i) {
    for(int j = 0; j < numhidden; ++j) {
        w[i][j] += lrate * hiddenerrors[j] * x[i];
    }
}
As you can see, I'm indexing the weight matrices as w[input][hidden] and v[hidden][output] consistently, so I don't think that's the problem. The weird thing is that if I change the x[i] to x[j] on the third-last line there ^ it suddenly learns the pattern, which is strange considering that's clearly incorrect: j indexes into the wrong array. (In this example numinputs and numhidden happen to be the same, so the addressing is still valid.) Currently the program just runs forever and never learns; the exit criterion is that every input pattern's output equals its target output within a small margin (0.1). The input in this example is the typical XOR truth table. Can you see anything I'm doing wrong?
Thanks.