Hello everybody :) I've been learning NNs for some time now, and I've developed an NN class which I have successfully trained on (as an example) the OR and AND logic functions (using 1 neuron). The training was based on the simple 'learning rule':
new weight = old weight + delta * input
new bias = old bias - delta
where: delta = target value - output value.
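For reference, here is a minimal Python sketch of how I apply that rule. It's a simplification, and I'm assuming a plain threshold activation for this single-neuron case; my actual class differs in the details:

import random

def step(x):
    # simple threshold activation
    return 1 if x >= 0 else 0

weights = [random.uniform(-1, 1), random.uniform(-1, 1)]
bias = random.uniform(-1, 1)

# OR training set: (inputs, target)
train_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

for epoch in range(100):
    for inputs, target in train_set:
        # the bias uses a constant 'input' of -1, hence the minus sign
        net = sum(w * x for w, x in zip(weights, inputs)) - bias
        output = step(net)
        delta = target - output
        # new weight = old weight + delta * input
        weights = [w + delta * x for w, x in zip(weights, inputs)]
        # new bias = old bias - delta  (i.e. + delta * (-1))
        bias = bias - delta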
Everything worked fine, no problems there. However, if I got something wrong, let me know :)

So later I decided to advance to multilayer NNs, and so I did. I wanted to start with the immortal XOR function, and it turned out I can't train it... I have a network of 3 layers: 2 hidden and 1 output. Both hidden layers have two neurons and the output layer has one (so 5 neurons in total). All neurons look the same: 2 inputs + a bias (with a constant 'input' of -1) and 1 output. I use the sigmoid as the activation function.

To train it I needed backprop, so I started reading. And I have read a lot. And I've learned that not a single one of the articles I read was clear enough. So I was lured here... ;) (after reading a few posts from this forum)

So, the questions:

----------------------

#1 How to change the weights? Currently I'm using this equation for all neurons (the same for _all_ layers!):
foreach weight:
    weight = weight + learnRate * input * output * (1 - output) * delta
where:
    learnRate -> a constant, 0.3 for now, which gives me good (yet slow) results
    input -> the input 'coming in' through this weight
    output -> the output of this entire neuron
    output * (1 - output) -> the derivative of the sigmoid function
    delta -> the error value, calculated like this:
        for the output layer: delta = target - output
        for each earlier layer: delta is backpropagated, i.e. each downstream delta is multiplied by the weight of the connection it travels back through, and if a neuron receives contributions over several connections, they are summed.
This is based on (and drawn really nicely at) this site. What's wrong with this algorithm? A sketch of what I'm doing is below.
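To make #1 concrete, here is a stripped-down Python sketch of one training step the way I've just described it. The names and the dict-based layout are simplified stand-ins, not my actual class, and the bias update is the one from #2 below:

import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# network[k] is one layer: a list of neurons; each neuron has one weight per
# input plus a bias weight applied with a constant 'input' of -1.
def forward(network, inputs):
    # return the outputs of every layer, starting with the raw inputs
    outputs = [list(inputs)]
    for layer in network:
        prev = outputs[-1]
        outputs.append([sigmoid(sum(w * x for w, x in zip(n['w'], prev)) - n['bias'])
                        for n in layer])
    return outputs

def train_step(network, inputs, target, learn_rate=0.3):
    outputs = forward(network, inputs)

    # output-layer delta: target - output
    deltas = [[target - outputs[-1][0]]]

    # backpropagate: a neuron's delta is the sum, over the connections it
    # feeds, of (connection weight * downstream delta) -- exactly the rule
    # from #1, which is the part I'm unsure about
    for k in range(len(network) - 1, 0, -1):
        deltas.insert(0, [sum(n['w'][j] * d for n, d in zip(network[k], deltas[0]))
                          for j in range(len(network[k - 1]))])

    # the same update for every layer:
    #   weight += learnRate * input * output * (1 - output) * delta
    for k, layer in enumerate(network):
        for n, d, out in zip(layer, deltas[k], outputs[k + 1]):
            n['w'] = [w + learn_rate * x * out * (1 - out) * d
                      for w, x in zip(n['w'], outputs[k])]
            n['bias'] -= learn_rate * d  # the #2 update, see below
    return outputs[-1][0]

def neuron(n_in):
    return {'w': [random.uniform(-1, 1) for _ in range(n_in)],
            'bias': random.uniform(-1, 1)}

# 2 inputs -> two hidden layers of 2 neurons -> 1 output neuron
network = [[neuron(2), neuron(2)], [neuron(2), neuron(2)], [neuron(2)]]
for _ in range(100000):
    for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
        train_step(network, [a, b], a ^ b)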
----------------------

#2 How to change the bias? At the moment I just do:

bias = bias - learnRate * delta
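I've also wondered whether the bias should instead be treated exactly like any other weight from #1, i.e. with its constant 'input' of -1 and the derivative factor included. This is just a guess on my part (a hypothetical helper, not my current code):

def bias_update(bias, output, delta, learn_rate=0.3):
    # the bias as just another weight whose input is the constant -1,
    # so it gets the same factors as every other weight in #1
    return bias + learn_rate * (-1) * output * (1 - output) * delta

Is that closer to the right thing?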
----------------------

#3 How do I implement momentum? None of the articles I've read made it clear what to multiply by the infamous alpha parameter when updating weight number i. Should I use weight(i-1) (the previously updated one)? If so, how do I calculate momentum for i = 0? My best guess is sketched below.
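If I understand the articles correctly, alpha multiplies the previous change of the same weight (from the previous training step), not the neighbouring weight(i-1), and the stored change starts at zero, which would cover the first iteration. That's only my guess, though, so please correct me. A minimal sketch (the helper name and the alpha value are mine):

def momentum_update(weight, prev_change, grad_term, learn_rate=0.3, alpha=0.9):
    # grad_term stands for input * output * (1 - output) * delta from #1.
    # prev_change is the change applied to THIS weight on the previous
    # iteration; starting it at 0.0 covers the very first update.
    change = learn_rate * grad_term + alpha * prev_change
    return weight + change, change  # new weight, change to remember

# example: two consecutive updates of one weight
w, prev = 0.5, 0.0
w, prev = momentum_update(w, prev, grad_term=0.1)
w, prev = momentum_update(w, prev, grad_term=0.1)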
----------------------

To summarize, I should tell you that the network sometimes (VERY seldom) trains OK and does, after all, work for XOR. When it does, the output is very good (the value for zero is really low and the value for one is almost 1). But most of the time it can't manage it, even after 100000 iterations of backprop... The funny thing is that when I changed XOR to OR to see if it could manage that (still using the multilayer NN), it almost did: for the OR training set it almost always found the right solution and only very seldom failed.

So, as you can see, something's wrong and I can't find it... I will be very grateful for all your comments :)

Regards,
QmQ

UPDATE: I'm also including this dump to make everything clear. It's a bit long, but it's for demo purposes only. Each 4-line group is a separate training run; the number in parentheses is the raw network output, and the number in front of it is that output rounded.
            BEFORE TRAINING   ->   AFTER
0 XOR 0 = 0 (0.389031) -> 1 (0.540423)
0 XOR 1 = 0 (0.371058) -> 1 (0.507694)
1 XOR 0 = 0 (0.350031) -> 1 (0.920854)
1 XOR 1 = 0 (0.333749) -> 0 (0.044733)
0 XOR 0 = 0 (0.476952) -> 1 (0.590870)
0 XOR 1 = 0 (0.482337) -> 1 (0.598985)
1 XOR 0 = 0 (0.426534) -> 1 (0.584926)
1 XOR 1 = 0 (0.445482) -> 0 (0.244864)
0 XOR 0 = 0 (0.178641) -> 0 (0.013935)
0 XOR 1 = 0 (0.192264) -> 0 (0.494459)
1 XOR 0 = 0 (0.183975) -> 1 (0.983203)
1 XOR 1 = 0 (0.199005) -> 0 (0.494931)
0 XOR 0 = 1 (0.519888) -> 0 (0.019064)
0 XOR 1 = 0 (0.453202) -> 1 (0.980676)
1 XOR 0 = 0 (0.450809) -> 1 (0.980603)
1 XOR 1 = 0 (0.399967) -> 0 (0.019860)
0 XOR 0 = 0 (0.185017) -> 0 (0.226056)
0 XOR 1 = 0 (0.204674) -> 1 (0.586697)
1 XOR 0 = 0 (0.183746) -> 1 (0.586518)
1 XOR 1 = 0 (0.202097) -> 1 (0.587045)
0 XOR 0 = 1 (0.597928) -> 0 (0.022980)
0 XOR 1 = 1 (0.627749) -> 0 (0.490592)
1 XOR 0 = 1 (0.644793) -> 1 (0.982135)
1 XOR 1 = 1 (0.671332) -> 0 (0.491087)
0 XOR 0 = 1 (0.647987) -> 1 (0.588371)
0 XOR 1 = 1 (0.620041) -> 1 (0.866711)
1 XOR 0 = 1 (0.624817) -> 1 (0.555343)
1 XOR 1 = 1 (0.594609) -> 0 (0.008398)
0 XOR 0 = 1 (0.522973) -> 0 (0.019253)
0 XOR 1 = 1 (0.531933) -> 1 (0.980488)
1 XOR 0 = 1 (0.529388) -> 1 (0.980402)
1 XOR 1 = 1 (0.540660) -> 0 (0.020064)
0 XOR 0 = 0 (0.297622) -> 1 (0.628855)
0 XOR 1 = 0 (0.313843) -> 1 (0.578957)
1 XOR 0 = 0 (0.281573) -> 1 (0.566705)
1 XOR 1 = 0 (0.301616) -> 0 (0.246367)
0 XOR 0 = 1 (0.578521) -> 1 (0.530156)
0 XOR 1 = 1 (0.579873) -> 1 (0.870113)
1 XOR 0 = 1 (0.566371) -> 1 (0.616105)
1 XOR 1 = 1 (0.562629) -> 0 (0.000170)
0 XOR 0 = 0 (0.468334) -> 1 (0.531848)
0 XOR 1 = 1 (0.509844) -> 1 (0.679352)
1 XOR 0 = 0 (0.442307) -> 1 (0.669115)
1 XOR 1 = 0 (0.481913) -> 0 (0.145127)
0 XOR 0 = 1 (0.691101) -> 1 (0.589911)
0 XOR 1 = 1 (0.712194) -> 1 (0.850123)
1 XOR 0 = 1 (0.666724) -> 1 (0.513916)
1 XOR 1 = 1 (0.685775) -> 0 (0.064833)
0 XOR 0 = 1 (0.546131) -> 0 (0.229658)
0 XOR 1 = 1 (0.567433) -> 1 (0.894752)
1 XOR 0 = 1 (0.517446) -> 0 (0.434778)
1 XOR 1 = 1 (0.535919) -> 0 (0.435251)
0 XOR 0 = 1 (0.743372) -> 0 (0.291190)
0 XOR 1 = 1 (0.738016) -> 1 (0.896570)
1 XOR 0 = 1 (0.725057) -> 0 (0.404596)
1 XOR 1 = 1 (0.724783) -> 0 (0.405093)
0 XOR 0 = 0 (0.434931) -> 0 (0.158733)
0 XOR 1 = 0 (0.420743) -> 1 (0.916917)
1 XOR 0 = 0 (0.419672) -> 0 (0.457688)
1 XOR 1 = 0 (0.407032) -> 0 (0.458170)
0 XOR 0 = 1 (0.718538) -> 1 (0.584902)
0 XOR 1 = 1 (0.748483) -> 1 (0.869437)
1 XOR 0 = 1 (0.693568) -> 1 (0.559624)
1 XOR 1 = 1 (0.715729) -> 0 (0.004674)
0 XOR 0 = 0 (0.298970) -> 0 (0.020819)
0 XOR 1 = 0 (0.299558) -> 0 (0.491469)
1 XOR 0 = 0 (0.295715) -> 1 (0.982536)
1 XOR 1 = 0 (0.296965) -> 0 (0.491933)
0 XOR 0 = 0 (0.463896) -> 1 (0.586986)
0 XOR 1 = 0 (0.479988) -> 1 (0.851331)
1 XOR 0 = 0 (0.484199) -> 1 (0.504883)
1 XOR 1 = 0 (0.499989) -> 0 (0.075276)
0 XOR 0 = 1 (0.799050) -> 0 (0.017166)
0 XOR 1 = 1 (0.811284) -> 1 (0.984691)
1 XOR 0 = 1 (0.804772) -> 0 (0.489342)
1 XOR 1 = 1 (0.815360) -> 0 (0.493099)
0 XOR 0 = 1 (0.747252) -> 0 (0.019117)
0 XOR 1 = 1 (0.735567) -> 1 (0.980641)
1 XOR 0 = 1 (0.742461) -> 1 (0.980535)
1 XOR 1 = 1 (0.733916) -> 0 (0.019912)
0 XOR 0 = 0 (0.384284) -> 1 (0.545196)
0 XOR 1 = 0 (0.386848) -> 1 (0.524427)
1 XOR 0 = 0 (0.354438) -> 1 (0.922754)
1 XOR 1 = 0 (0.355706) -> 0 (0.021771)
0 XOR 0 = 1 (0.670121) -> 1 (0.577314)
0 XOR 1 = 1 (0.667188) -> 1 (0.870601)
1 XOR 0 = 1 (0.635874) -> 1 (0.510985)
1 XOR 1 = 1 (0.632673) -> 0 (0.059036)
0 XOR 0 = 0 (0.465121) -> 0 (0.019321)
0 XOR 1 = 1 (0.536488) -> 1 (0.981718)
1 XOR 0 = 1 (0.506689) -> 0 (0.489771)
1 XOR 1 = 1 (0.564385) -> 0 (0.493413)
0 XOR 0 = 0 (0.461397) -> 1 (0.575232)
0 XOR 1 = 0 (0.449817) -> 1 (0.612399)
1 XOR 0 = 0 (0.465259) -> 1 (0.597774)
1 XOR 1 = 0 (0.456207) -> 0 (0.234763)
0 XOR 0 = 0 (0.453744) -> 0 (0.023842)
0 XOR 1 = 0 (0.474004) -> 1 (0.978284)
1 XOR 0 = 0 (0.478326) -> 0 (0.492413)
1 XOR 1 = 0 (0.497930) -> 0 (0.492878)
0 XOR 0 = 1 (0.635919) -> 0 (0.019710)
0 XOR 1 = 1 (0.626395) -> 1 (0.980007)
1 XOR 0 = 1 (0.622861) -> 1 (0.980013)
1 XOR 1 = 1 (0.615217) -> 0 (0.020488)
0 XOR 0 = 0 (0.368990) -> 1 (0.540470)
0 XOR 1 = 0 (0.359281) -> 1 (0.502355)
1 XOR 0 = 0 (0.373723) -> 1 (0.917480)
1 XOR 1 = 0 (0.362805) -> 0 (0.053266)
0 XOR 0 = 1 (0.538190) -> 1 (0.587219)
0 XOR 1 = 0 (0.485423) -> 1 (0.867185)
1 XOR 0 = 1 (0.524154) -> 1 (0.538894)
1 XOR 1 = 0 (0.477507) -> 0 (0.025398)
0 XOR 0 = 0 (0.317085) -> 0 (0.300653)
0 XOR 1 = 0 (0.314979) -> 1 (0.896030)
1 XOR 0 = 0 (0.313905) -> 0 (0.400344)
1 XOR 1 = 0 (0.311818) -> 0 (0.400875)
0 XOR 0 = 1 (0.651883) -> 1 (0.569301)
0 XOR 1 = 1 (0.667792) -> 1 (0.613104)
1 XOR 0 = 1 (0.665322) -> 1 (0.598036)
1 XOR 1 = 1 (0.680291) -> 0 (0.238971)