Hi.
I've been having trouble getting my neural net to train.
Here's some info so that hopefully I can get some pointers on where I'm going wrong.
Context:
My neural net is a simple one. I am using it to process some vectorised mouse input, and it has only 3 gestures to recognise. To this end it has 24 input nodes, a hidden layer of 6 nodes (to be tweaked), and a 3-node output layer. It uses a sigmoid function for its activation and backprop as the training method.
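As a rough picture of the topology just described, here is a sketch of a 24-6-3 net with randomly initialised weights. The names (Layer, MakeLayer, MakeGestureNet), the convention of storing the bias weight as the last entry of each row, the bias input of 1.0 and the ±0.5 initial range are all assumptions for illustration, not the classes used in the post.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical layout for a 24-6-3 net; not the original classes.
struct Layer {
    // weights[n][k] is the weight from input k into node n.
    // The last entry of each row is the bias weight (bias input assumed 1.0).
    std::vector<std::vector<double>> weights;
};

// Builds a layer of `nodes` nodes, each fed by `inputs` values plus a bias,
// with every weight drawn uniformly from [-range, range).
Layer MakeLayer(std::size_t nodes, std::size_t inputs, double range, std::mt19937& rng) {
    std::uniform_real_distribution<double> dist(-range, range);
    Layer layer;
    layer.weights.assign(nodes, std::vector<double>(inputs + 1));
    for (auto& row : layer.weights) {
        for (double& w : row) {
            w = dist(rng);
        }
    }
    return layer;
}

struct Network {
    Layer hidden;  // 6 nodes, each fed by 24 inputs + bias
    Layer output;  // 3 nodes, each fed by 6 hidden values + bias
};

// Same seed gives the same starting weights, which keeps debugging runs repeatable.
Network MakeGestureNet(unsigned seed) {
    std::mt19937 rng(seed);
    Layer hidden = MakeLayer(6, 24, 0.5, rng);
    Layer output = MakeLayer(3, 6, 0.5, rng);
    return Network{hidden, output};
}
```

A fixed seed is a deliberate choice here: when chasing a training bug, being able to rerun from identical weights makes comparisons between code changes meaningful.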
Issue:
I have got the neural net activating and setting up with random weights, but as it trains I notice the outputs for every gesture tending towards (1, 1, 1) instead of (1, 0, 0), (0, 1, 0) and (0, 0, 1) respectively. The training is way out of whack. I have been debugging it as best I can and have noticed some errors, but the network still doesn't train. The mean squared error also tends to 6 and stops there, although it will happily train forever.
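For reference, the arithmetic behind an error of 6: with one-hot targets, an output of (1, 1, 1) misses two of the three components by 1, so one pattern's sum of squared errors is about 2, and one pass over all three gestures sums to about 6. The sketch below computes exactly that; the names are hypothetical, and it assumes the error is summed (not averaged) and reset at the start of each pass.

```cpp
#include <array>
#include <cstddef>

using Outputs = std::array<double, 3>;

// One-hot targets for the three gestures: (1,0,0), (0,1,0), (0,0,1).
const std::array<Outputs, 3> kTargets = {{
    {{1.0, 0.0, 0.0}},
    {{0.0, 1.0, 0.0}},
    {{0.0, 0.0, 1.0}},
}};

// Sum of squared errors for one pattern.
double PatternSSE(const Outputs& out, const Outputs& target) {
    double sse = 0.0;
    for (std::size_t k = 0; k < out.size(); ++k) {
        const double e = target[k] - out[k];
        sse += e * e;
    }
    return sse;
}

// Summed over one pass through the three gestures; outputs[g] is the
// network's output for gesture g. The total starts at zero on every call.
double PassSSE(const std::array<Outputs, 3>& outputs) {
    double total = 0.0;
    for (std::size_t g = 0; g < outputs.size(); ++g) {
        total += PatternSSE(outputs[g], kTargets[g]);
    }
    return total;
}
```

Dividing by the number of outputs or patterns turns this into a mean, if that is the preferred measure.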
Questions:
Has anyone else had these issues with their neural nets, and could you tell me where I might look to check them out?
I am going to review all of my processing code to make sure the network is set up correctly, but I have already been doing this recently, and comparing it against another, working net, to no avail.
*When comparing with the other net, I did discover that the only real difference was that I recalculate my output-layer weights before calculating my hidden-layer error values, whereas they calculate all the error values first and then recalculate the weights. But I don't see how this could cause it to screw up so badly.*
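The ordering the other net uses is the textbook one: every delta is computed from the weights that produced the forward pass, and only then are weights changed. Updating the output weights first means the hidden-layer errors are back-propagated through already-modified weights, which is no longer the gradient of the current error (how much that matters in practice depends on the learning rate). A minimal sketch of a single-pattern step in that order, assuming one hidden layer, sigmoid units with response 1, a bias input of 1.0 stored as the last weight in each row, and hypothetical names throughout:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// w[node][input]; the last entry of each row is the bias weight (bias input 1.0).
using Matrix = std::vector<std::vector<double>>;

double Sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Activations of one layer for the given inputs.
std::vector<double> Forward(const Matrix& w, const std::vector<double>& in) {
    std::vector<double> out(w.size());
    for (std::size_t n = 0; n < w.size(); ++n) {
        assert(w[n].size() == in.size() + 1);
        double sum = w[n].back();  // bias weight * 1.0
        for (std::size_t k = 0; k < in.size(); ++k) sum += w[n][k] * in[k];
        out[n] = Sigmoid(sum);
    }
    return out;
}

// One backprop step for one pattern: all deltas first, then all weight changes.
// Returns the pattern's sum of squared errors as measured before the update.
double TrainStep(Matrix& wHidden, Matrix& wOut, const std::vector<double>& input,
                 const std::vector<double>& target, double rate) {
    const std::vector<double> hidden = Forward(wHidden, input);
    const std::vector<double> out = Forward(wOut, hidden);
    assert(target.size() == out.size());

    // Phase 1a: output deltas, (t - o) * o * (1 - o).
    std::vector<double> dOut(out.size());
    double sse = 0.0;
    for (std::size_t k = 0; k < out.size(); ++k) {
        const double e = target[k] - out[k];
        sse += e * e;
        dOut[k] = e * out[k] * (1.0 - out[k]);
    }

    // Phase 1b: hidden deltas through the output weights, which are still unchanged.
    std::vector<double> dHid(hidden.size());
    for (std::size_t j = 0; j < hidden.size(); ++j) {
        double sum = 0.0;
        for (std::size_t k = 0; k < out.size(); ++k) sum += wOut[k][j] * dOut[k];
        dHid[j] = sum * hidden[j] * (1.0 - hidden[j]);
    }

    // Phase 2: each weight moves by rate * (delta of receiving node) * (value feeding it).
    for (std::size_t k = 0; k < out.size(); ++k) {
        for (std::size_t j = 0; j < hidden.size(); ++j) wOut[k][j] += rate * dOut[k] * hidden[j];
        wOut[k].back() += rate * dOut[k];  // bias input 1.0
    }
    for (std::size_t j = 0; j < hidden.size(); ++j) {
        for (std::size_t i = 0; i < input.size(); ++i) wHidden[j][i] += rate * dHid[j] * input[i];
        wHidden[j].back() += rate * dHid[j];
    }
    return sse;
}
```

With a small fixed learning rate, repeatedly calling TrainStep on one pattern should drive that pattern's error toward zero; training on the three gestures works the same way, one call per pattern.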
Code samples:
Here's some code from the net that should be useful in understanding my approach. I will abstract some of it to avoid explaining all the classes I have used, yadda yadda...
Activation function:
double sigmoid (double dInput, double dResponse) {
    return ( 1 / (1 + exp (-dInput / dResponse) ));
}
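One detail worth checking against the error function that follows: with the response parameter, the derivative of this sigmoid with respect to its input is s(1 - s) / dResponse, where s is the output. It reduces to the s(1 - s) used in Error() only when dResponse is 1. A small sketch with hypothetical function names, plus a numerical check of that derivative:

```cpp
#include <cmath>

// Same form as sigmoid() above: 1 / (1 + e^(-x / response)).
double Sigmoid(double x, double response) {
    return 1.0 / (1.0 + std::exp(-x / response));
}

// Its derivative with respect to x, written in terms of the output s = Sigmoid(x, response):
// ds/dx = s * (1 - s) / response. Only for response == 1 does this reduce to s * (1 - s).
double SigmoidSlope(double s, double response) {
    return s * (1.0 - s) / response;
}
```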
Backprop error function:
double ANN::BackProp::Output::Error (double dObserved, double dExpected) {
    return ((dExpected - dObserved) * (dObserved * (1 - dObserved)));
}
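To make the sign of this error term concrete: with outputs of (0.7, 0.7, 0.7) and a target of (1, 0, 0), the deltas come out at roughly (0.063, -0.147, -0.147), positive where the output should rise and negative where it should fall. A sketch applying the same formula across a whole output layer (names hypothetical, response assumed to be 1):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same formula as Error() above, assuming dResponse == 1:
// delta = (expected - observed) * observed * (1 - observed).
double OutputDelta(double observed, double expected) {
    return (expected - observed) * observed * (1.0 - observed);
}

// Applies OutputDelta to every node of the output layer.
std::vector<double> OutputDeltas(const std::vector<double>& observed,
                                 const std::vector<double>& expected) {
    assert(observed.size() == expected.size());
    std::vector<double> deltas(observed.size());
    for (std::size_t k = 0; k < observed.size(); ++k) {
        deltas[k] = OutputDelta(observed[k], expected[k]);
    }
    return deltas;
}
```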
Weight change for output layer:
double ANN::BackProp::Output::WeightChange (double dObserved, double dError, double dLastMomentumChange, double dLearningRate) {
    return (dObserved * dError * dLearningRate);
}
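WeightChange takes dLastMomentumChange but never uses it. If momentum is what that parameter is meant for, the usual form is change = rate * delta * input + momentum * lastChange, where input is the value feeding the weight (for a hidden-to-output weight, the hidden node's output) and delta is the error term of the node the weight feeds into. A sketch under those assumptions; the momentum coefficient is a hypothetical extra parameter:

```cpp
// Delta rule with momentum for one connection (hypothetical signature).
//   input:      value feeding the weight (for h>o weights, the hidden node's output)
//   delta:      error term of the node the weight feeds into
//   lastChange: change applied to this weight on the previous step
//   momentum:   assumed coefficient, e.g. 0.9; 0 gives the plain delta rule
double WeightChange(double input, double delta, double lastChange,
                    double learningRate, double momentum) {
    return learningRate * delta * input + momentum * lastChange;
}
```

The returned value would be both added to the weight and stored as lastChange for the next step, which is what the m_vecdLastWeights bookkeeping in the training loop appears to be set up for.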
Training iteration:
For each training step, the code is:
bool TrainIter (std::vector <double> * vecdObserved, std::vector <double> * vecdExpected) {
    double error = 0.0, sum=0, dWeightAdjust=0;
    int n=0;
    // loop through output layer calculating error
    for (int i=0; i < m_vecnOutLayer.size (); i++) {
        error = Error (vecdObserved->at (i), vecdExpected->at (i));
        m_vecnOutLayer.at (i).m_dError = error;
        double dTempError = (vecdObserved->at (i) - vecdExpected->at (i));
        m_dSSE += dTempError*dTempError;
        // adjust weight of h>o layer weights
        for (int j=0; j < m_vecnHiddenLayer.size (); j++) {
            dWeightAdjust = WeightChange (vecdObserved->at (i), vecdExpected->at (i), m_vecnOutLayer.at (i).m_vecdLastWeights.at (j));
            m_vecnOutLayer.at (i).m_vecdLastWeights.at (j) = dWeightAdjust;
            m_vecnOutLayer.at (i).m_vecdWeights.at (j) += dWeightAdjust;
        }
        dWeightAdjust = m_vecnOutLayer.at (i).m_dError * STD_BIAS * STD_LEARNINGRATE;
        m_vecnOutLayer.at (i).m_dBiasWeight += dWeightAdjust;
        m_vecnOutLayer.at (i).m_dBiasLastWeight = dWeightAdjust;
    }
    for (int i=0; i < m_vecnHiddenLayer.size (); i++) {
        // calc error for hidden layer
        sum = 0.0;
        for (int j=0; j < m_vecnOutLayer.size (); j++) {
            sum += m_vecnOutLayer.at (j).m_vecdWeights.at (i) * m_vecnOutLayer.at (j).m_dError;
        }
        sum *= (m_vecnHiddenLayer.at (i).m_dValue * (1 - m_vecnHiddenLayer.at (i).m_dValue));
        m_vecnHiddenLayer.at (i).m_dError = sum;
        // adjust weight of i>h layer weights
        for (int j=0; j<vecdObserved->size (); j++) {
            dWeightAdjust = BackProp::Hidden::WeightChange (vecdObserved->at (j), sum, m_vecnHiddenLayer.at (i).m_vecdLastWeights.at (j));
            m_vecnHiddenLayer.at (i).m_vecdWeights.at (j) += dWeightAdjust;
            m_vecnHiddenLayer.at (i).m_vecdLastWeights.at (j) = dWeightAdjust;
        }
        m_vecnHiddenLayer.at (i).m_dBiasWeight += m_vecnHiddenLayer.at (i).m_dError * STD_BIAS * STD_LEARNINGRATE;
    }
    if (m_dSSE < STD_ERRORTHRESHOLD) {
        return (true);
    }
    return (false);
}
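For comparison with the output-layer loop above: there, WeightChange is called with vecdObserved->at (i) and vecdExpected->at (i) as its first two arguments. If that call reaches the WeightChange shown earlier, each adjustment is output * target * rate, which is never negative when outputs lie in (0, 1) and targets are 0 or 1, so the hidden-to-output weights could only grow. The delta rule instead uses the output node's error term and the hidden node's value, and goes negative when an output sits above its target. Similarly, the hidden-layer loop takes its inputs from vecdObserved, which holds the outputs rather than the 24 input values. A small sketch contrasting the two output-layer updates (names hypothetical, response assumed to be 1):

```cpp
// Adjustment produced if WeightChange receives (observed output, expected output)
// in its first two slots: observed * expected * rate.
double ChangeFromTargets(double observed, double expected, double rate) {
    return observed * expected * rate;
}

// Delta-rule adjustment for the weight from hidden node j into output node k:
// rate * delta_k * h_j, where delta_k = (t_k - o_k) * o_k * (1 - o_k).
double DeltaRuleChange(double observed, double expected, double hiddenValue, double rate) {
    const double delta = (expected - observed) * observed * (1.0 - observed);
    return rate * delta * hiddenValue;
}
```

For an output of 0.8 whose target is 0, ChangeFromTargets(0.8, 0.0, rate) is exactly 0, so the weight is left alone, while DeltaRuleChange(0.8, 0.0, h, rate) is negative for any positive hidden value h and pulls that output down.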
I was going to abstract the iteration code above, but it is pretty simple.
There are two vectors of 'nodes', each node containing a list of the weight values for its incoming nodes.
TrainIter is called once the net has been fed and has processed the next gesture, with the gestures presented in a repeating order.
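The repeating presentation order can be sketched as a loop over passes, summing each pattern's error within a pass and stopping once a pass falls below a threshold. In TrainIter as posted, m_dSSE is only ever added to; whether it is reset elsewhere is not shown, so this sketch resets its own total at the start of every pass. Sample, Train and the step callback are hypothetical:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

struct Sample {
    std::vector<double> input;   // e.g. 24 values for one vectorised gesture
    std::vector<double> target;  // one-hot: (1,0,0), (0,1,0) or (0,0,1)
};

// Presents the samples in a fixed, repeating order. `step` is a hypothetical
// callback that feeds one sample forward, backpropagates, and returns that
// sample's sum of squared errors. The total is reset at the start of each pass,
// so the threshold is tested against the current pass only.
// Returns the pass on which the error fell below `threshold`, or 0 if it never did.
std::size_t Train(const std::vector<Sample>& samples,
                  const std::function<double(const Sample&)>& step,
                  double threshold, std::size_t maxPasses) {
    for (std::size_t pass = 1; pass <= maxPasses; ++pass) {
        double sse = 0.0;
        for (const Sample& s : samples) {
            sse += step(s);
        }
        if (sse < threshold) {
            return pass;
        }
    }
    return 0;
}
```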
Thanks.
nib
Edit:
I have looked over the processing function of the net, and it simply runs the network forward.
For each node X: sum the values of the nodes in the layer below, each multiplied by the appropriate weight, add bias * bias weight, and feed the result through the activation function. Et voilà. Nothing fancy and nothing wrong, which leads me to believe the error is in the area described above.
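The per-node computation described above, sketched for one layer with hypothetical names and the sigmoid response assumed to be 1. Calling it twice (input to hidden, then hidden to output) gives the full forward pass:

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// For each node: sum the values of the layer below times their weights,
// add bias * biasWeight, then apply the sigmoid (response assumed 1).
// weights[n][k] is the weight from node k of the layer below into node n.
std::vector<double> FeedLayer(const std::vector<double>& below,
                              const std::vector<std::vector<double>>& weights,
                              const std::vector<double>& biasWeights, double bias) {
    assert(weights.size() == biasWeights.size());
    std::vector<double> out(weights.size());
    for (std::size_t n = 0; n < weights.size(); ++n) {
        assert(weights[n].size() == below.size());
        double sum = bias * biasWeights[n];
        for (std::size_t k = 0; k < below.size(); ++k) {
            sum += below[k] * weights[n][k];
        }
        out[n] = 1.0 / (1.0 + std::exp(-sum));
    }
    return out;
}
```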
[Edited by - niblick on July 6, 2006 4:59:07 AM]