Quote: Original post by Deliverance
I've been playing with neural networks these days and found quite an interesting thing (for me) about them. I'm trying to solve the XOR problem using a two-layer network. I found that in the initial phase every weight must be initialized with some random value. These random values seem to be very important, or so I found. There are combinations of random values that will cause the neural network not to converge to a solution, and I wonder why that is. How can I initialize the random values so that the neural network will always converge?
Take as an example the code here. In the file bpnet.h, change line 31 from this
srand((unsigned)(time(NULL)));
to this
srand(1269290629);
Now, when you run the sample, you'll see that the neural network does not successfully converge to an approximation of the XOR function. Why is that?
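For reference, a minimal sketch of the usual way to initialize such a network (this is not the bpnet.h code; the helper name and the 0.5 range are my own choices): draw each weight uniformly from a small symmetric interval such as [-0.5, 0.5]. This does not guarantee convergence on XOR, but it makes pathological starting points much less likely than unconstrained random values.

#include <cstdlib>
#include <ctime>

// Uniform random weight in [-range, +range].
double randomWeight(double range)
{
    return ((double)rand() / RAND_MAX) * 2.0 * range - range;
}

int main()
{
    srand((unsigned)time(NULL));        // or a fixed seed for a reproducible run

    const int numWeights = 9;           // e.g. a 2-2-1 network with biases: 2*2+2 + 2*1+1
    double weights[numWeights];
    for (int i = 0; i < numWeights; ++i)
        weights[i] = randomWeight(0.5); // small symmetric starting weights

    return 0;
}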
Quote: Original post by Predictor
[...sensible stuff that I agree with completely...]
Also, consider that "convergence" of the training process is likely neither necessary nor even desirable: by the time the training process quits, you have likely overfit the data. Better results can be had through early stopping or by constraining the number of hidden nodes in the model.
Quote: Original post by alexjc
Note that if you use 2-3-1 as the network configuration, the local minimum disappears, if I remember correctly...
Quote: Original post by alvaro
Quote: Original post by Predictor
[...sensible stuff that I agree with completely...]
Also, consider that "convergence" of the training process is likely neither necessary nor even desirable: by the time the training process quits, you have likely overfit the data. Better results can be had through early stopping or by constraining the number of hidden nodes in the model.
I have never been very convinced by early stopping. I fully appreciate how much of a danger overfitting is, and the natural solution to me seems to be reducing the number of free parameters that make up the model (i.e., fewer hidden nodes). If the model has more parameters than the data warrants, won't early stopping give us a function that still has too many "wrinkles", except now they are random rather than overfit?
Have you had good experiences using early stopping? Perhaps there is some way of looking at it that I am missing?
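In case it helps the discussion, this is roughly how early stopping is usually wired up; the struct, function names, and patience value below are placeholders I made up, not anything from the thread or from bpnet.h. The idea is to evaluate the network on a held-out validation set after every epoch, remember the weights from the epoch with the lowest validation error, and stop once that error has not improved for a while.

#include <cfloat>
#include <vector>

// Placeholder types and functions -- the real training code would supply these.
struct Network { std::vector<double> weights; };
void   trainOneEpoch(Network& net);              // one pass over the training set
double validationError(const Network& net);      // error on a held-out validation set

// Train until the validation error has not improved for `patience` epochs,
// then return the best weights seen so far.
Network trainWithEarlyStopping(Network net, int maxEpochs, int patience)
{
    double  bestError = DBL_MAX;
    Network bestNet   = net;
    int     stale     = 0;

    for (int epoch = 0; epoch < maxEpochs; ++epoch) {
        trainOneEpoch(net);
        double err = validationError(net);
        if (err < bestError) {
            bestError = err;
            bestNet   = net;     // remember the best weights so far
            stale     = 0;
        } else if (++stale >= patience) {
            break;               // validation error stopped improving
        }
    }
    return bestNet;              // return the best validation-error weights, not the last ones
}

The sketch only shows the mechanics; whether the weights it keeps still carry the "random wrinkles" alvaro asks about is exactly the open question above.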