
Question about ANN

Started by March 02, 2011 09:13 PM
4 comments, last by willh 13 years, 8 months ago
Hi, I was just wondering:

If you provide a large set of data, such as weekly profits, to an artificial neural network (using the backpropagation algorithm), is it true that it can never be fully trained to give the correct answer (even for the original training data), the way it can with, say, the XOR truth table? E.g. say you had 2 years of weekly sales, giving 104 data points, and you trained the network to predict a given week's sales from the previous 6 weeks or so. Couldn't there be duplicate 6-week windows somewhere that the network is unable to differentiate between?
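
To be concrete, the windowing I have in mind is something like this (Python sketch with made-up data; the names are just for illustration):

```python
import numpy as np

# Hypothetical sketch: 104 weekly figures turned into
# (6-week window -> next week) training pairs.
weekly_sales = np.random.rand(104) * 3   # placeholder data

WINDOW = 6
X = np.array([weekly_sales[i:i + WINDOW]
              for i in range(len(weekly_sales) - WINDOW)])
y = weekly_sales[WINDOW:]
# X.shape == (98, 6), y.shape == (98,)
# If two rows of X were identical but mapped to different y values,
# no deterministic model could fit both exactly.
```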

I'm asking this because I've recently implemented a network designed for a similar task and it never seems to learn completely. When I measure the least mean square error (total) per iteration, it's slowly but surely decreasing, but it's still at a very high value, such as 235, and it goes down by something like 0.001 every 10 seconds or so. Is this normal?
On topic: Your neural network question

Your network has 6 inputs and 1 output? And you have 104 training pairs? Does your network have hidden layers? How many weights in total are there?

Some preliminary answers:
1.) Even if there exist weights that would cause your ANN to approximate the training data with zero error, backprop is not guaranteed to converge to them.
2.) As for speed: Backprop is just gradient descent. It's not uncommon for gradient descent to be very slow, particularly in high-dimensional spaces.
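
To make point 2 concrete, here is a minimal sketch of what "backprop is just gradient descent" means (NumPy; the data, layer size, and learning rate are made up, and biases are omitted for brevity):

```python
import numpy as np

# One hidden layer, trained by batch gradient descent on mean
# squared error. Everything here is illustrative.
rng = np.random.default_rng(0)
X = rng.random((98, 6))                  # 6-week input windows
y = rng.random((98, 1))                  # next-week targets

W1 = 0.1 * rng.standard_normal((6, 8))   # input -> hidden
W2 = 0.1 * rng.standard_normal((8, 1))   # hidden -> output
lr = 0.01

for epoch in range(5000):
    h = np.tanh(X @ W1)                  # hidden activations
    out = h @ W2                         # linear output unit
    grad_out = 2 * (out - y) / len(X)    # d(MSE)/d(out)
    grad_h = (grad_out @ W2.T) * (1 - h ** 2)  # back through tanh
    W2 -= lr * (h.T @ grad_out)          # gradient step on each layer
    W1 -= lr * (X.T @ grad_h)
```

Each iteration is one small step downhill on the error surface, which is why convergence can crawl.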

Off topic: A simpler, faster alternative


If you continue to have problems, you might want to consider linear least-squares regression. It can be solved very quickly by inverting a matrix of inner products, and, like ANNs, it can approximate smooth functions arbitrarily well given enough basis functions. By "linear" I do not mean linear in the inputs; I only mean linear in the weights.

What the heck; here's how it works. Given training pairs (x1,y1)...(xN,yN), and assuming a function of the form

f(x,w) = w1 f1(x) + ... + wm fm(x)

you want to find the least-squares solution w=(w1,...,wm) to the system of equations,

w1 f1(x1) + ... + wm fm(x1) = y1
...
w1 f1(xN) + ... + wm fm(xN) = yN

or equivalently,

M w = y

where Mij = fj(xi). The solution is,

w = (M' M)^(-1) M' y.

Done. Grab a conjugate gradient solver off the shelf (or even implement your own; Wikipedia explains how and it's not too hard) and go home happy. Or direct-solve with QR decomposition of M if it's not too huge.
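
In code, the whole fit is a few lines (NumPy sketch; the basis functions here are placeholders, pick ones that suit your data):

```python
import numpy as np

def fit(X, y, funcs):
    # Build M with M[i][j] = fj(xi), then least-squares solve M w = y.
    # lstsq is numerically safer than forming (M'M)^(-1) explicitly.
    M = np.array([[f(x) for f in funcs] for x in X])
    w, *_ = np.linalg.lstsq(M, y, rcond=None)
    return w

# Example basis functions (placeholders):
basis_funcs = [
    lambda x: 1.0,           # constant term
    lambda x: x[-1],         # most recent week
    lambda x: np.mean(x),    # average of the window
]

X = np.random.rand(98, 6)    # placeholder inputs
y = np.random.rand(98)       # placeholder targets
w = fit(X, y, basis_funcs)

predict = lambda x: sum(wj * f(x) for wj, f in zip(w, basis_funcs))
print(predict(X[0]))
```
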
If you have a model that perfectly "predicts" your in-sample data, chances are it won't work at all on out-of-sample data. This phenomenon is called overfitting.

So in this case, is it best to just train the network for a certain fixed period of time, even though the error will remain quite high?

I said in the original post that my error value was really high, but I've just discovered that I was calculating it incorrectly. If I calculate the average error over all samples, it stalls at around 0.8 now (where the input data values range from about 0 to 3). But (to me) this makes sense, as the algorithm cannot fully converge on a data set whose input vectors are not unique. Is this assumption correct?

And in that case, would I just need to train the network for a certain period of time before using it to do the predictions?

If you have two training samples with identical inputs (v1 == v2) but different outputs (f(v1) != f(v2)), then your ANN will NEVER figure it out. The answer, of course, is to add another input; there has to be something else that can differentiate between the two samples.
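
A quick way to check whether your data actually has this problem (Python sketch; X and y here stand in for your windows and targets):

```python
import numpy as np
from collections import defaultdict

X = np.random.rand(98, 6)    # placeholder input windows
y = np.random.rand(98)       # placeholder targets

# Group targets by (rounded) input vector; any group with more than
# one distinct target is an unlearnable conflict.
targets_by_input = defaultdict(set)
for xi, yi in zip(X, y):
    targets_by_input[tuple(np.round(xi, 6))].add(round(float(yi), 6))

conflicts = sum(1 for t in targets_by_input.values() if len(t) > 1)
print(conflicts, "duplicate inputs with conflicting targets")
```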

In theory you should be able to get a correct answer, but this will largely depend on the architecture of your network and the parameters you use to train it. E.g. XOR won't work with 2 perceptrons, no matter how many training runs it goes through.
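
You can see the XOR case directly: it isn't linearly separable, so the best any model with no hidden layer can do is sit on the fence (sketch):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

A = np.hstack([X, np.ones((4, 1))])      # inputs plus a bias column
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(A @ w)   # approx [0.5, 0.5, 0.5, 0.5] for every input
```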
