
Artificial Neural Network and NPC AI

Started by July 30, 2006 08:22 AM
12 comments, last by jolyqr 18 years, 6 months ago
Quote:
Original post by kirkd
I don't really understand your question.

What I tried to say is that if you train your NN to 100% accuracy/performance on the training set, the net will likely find random correlations which allow it to get to that level. Then when you use the net in a new situation outside those used for training, it may not perform well. I would bet that it would perform quite poorly.

The method I described is referred to as cross-validation. Take your dataset of 50 patterns, as you called them. Take out 10 at random and use the remaining 40 to train your net. The training protocol tries to maximize performance on those 40. If you watch the performance on the 10 that were left out during training, what you'll typically see is that the net starts off terrible (as expected), gradually improves, and then starts to get worse, all the while getting better and better on the 40 training patterns. This is where overtraining sets in. The net has captured most of the detail of the problem and is now starting to find random correlations in the training data. This leads to a loss in performance on the 10 left-out patterns.

What typically is done is that the process is repeated a number of times in order to make sure that each training pattern was left out at some point, and also to ensure that the random selection of 10 to leave out was not biased in some way. By repeating the process and then averaging the performance results, you get a better estimate of how well the net will perform in a new situation.
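The repeated leave-10-out procedure described above can be sketched in code. This is a minimal illustration, not from the thread itself; the function name, the seed, and the fixed fold size are my own choices:

```python
import random

def cross_validation_folds(patterns, fold_size=10, seed=0):
    """Split patterns into held-out folds of `fold_size` so that every
    pattern is left out exactly once across the repeats."""
    shuffled = patterns[:]                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    folds = []
    for start in range(0, len(shuffled), fold_size):
        held_out = shuffled[start:start + fold_size]
        training = shuffled[:start] + shuffled[start + fold_size:]
        folds.append((training, held_out))
    return folds

# 50 patterns -> 5 rounds, each training on 40 and validating on the 10 left out
folds = cross_validation_folds(list(range(50)))
```

Averaging the held-out performance over all five rounds gives the estimate of how the net will do on new data.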

I hope that helps.

-Kirk


Basically I'm trying to understand what you call cross-validation. I understand that I should train on the set of 40 patterns, but I don't understand what the 10 left-out patterns are used for.

Sorry, you're doing your best to explain this to me, but when I don't understand something, I can't help asking...


Thanks again for your explanation.
Think of it a different way. Suppose we have a classifier that predicts good or bad for some arbitrary thing. With cross-validation, take some portion as the training set and some portion as the test set. The training set is used to make a classifier and then you use that newly developed classifier on the test set. What typically happens is that you get good results on the training set and not so good results on the test set. This is the overtraining problem I talked about.

Consider a neural network for this purpose. You use the training set to train the network: it is the data used to figure out how to adjust the weights. You continue training on this set, but the question arises of how you know when to stop. So, while you're training the network, at each step of training check how well it predicts the test set. Again, the network determines its weights from the training set, and it will keep improving its predictions on that set. But if you keep track of how well it predicts the test set at each step, you'll see that it gets better for a while and then starts to get worse. The point during training at which it starts to get worse is the point of overtraining, and the point at which to stop training.

At this stage, you could continue training and keep improving on the training set, but you would get worse and worse on the test set. This is obviously not a good situation, so you stop the process.
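This stopping rule (often called early stopping) can be sketched as follows. The function name, the `patience` parameter, and the synthetic error curve are illustrative, not from the thread:

```python
def early_stopping(val_errors, patience=3):
    """Return the epoch at which to stop: where the test-set error last
    improved, once it has failed to improve for `patience` epochs."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break                      # overtraining has set in; stop here
    return best_epoch

# Synthetic test-set error curve: improves, bottoms out, then overtraining
curve = [0.9, 0.6, 0.4, 0.35, 0.33, 0.36, 0.41, 0.50, 0.62]
stop_at = early_stopping(curve)        # epoch 4, where the error bottomed out
```

In practice you would save the network weights at the best epoch and use those, discarding the later, overtrained ones.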

Any clearer?

-Kirk

Quote:
Original post by jolyqr
okay. you're trying to say that the hidden neurons have to belong to the interval:

[sqrt(m),m^2]

where m is the number of inputs.


No, that's not what I was saying. I was saying that based on my experience, the number of hidden nodes discovered in automated structural learning is anywhere between sqrt(m) and m^2, where m is the number of input nodes (length of the input vector for a vector (linear algebra) representation of the network).

It has nothing to do with the sorts of hidden nodes used and their input domain (well, actually the resulting number of hidden nodes does depend on their partition properties, but you can ignore this).
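That rule of thumb can be written down directly. A minimal sketch; the function name is hypothetical, and the bounds are only the empirical range mentioned above, not a guarantee:

```python
import math

def hidden_node_range(m):
    """Empirical rule of thumb from the post: structural learning tends to
    settle on between sqrt(m) and m^2 hidden nodes, for m input nodes."""
    return math.ceil(math.sqrt(m)), m * m

lo, hi = hidden_node_range(9)   # for 9 inputs: between 3 and 81 hidden nodes
```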
I have calculated the error of each output and added them together to get a global error:

error1 = (T1 - Y1)^2  -> for output 1
error2 = (T2 - Y2)^2  -> for output 2
error3 = (T3 - Y3)^2  -> for output 3

Error = error1 + error2 + error3  -> global error
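That per-output sum of squared errors amounts to the following. A minimal sketch; the function name and the example values are illustrative:

```python
def global_error(targets, outputs):
    """Global error = sum over outputs of the squared error (T_i - Y_i)^2."""
    return sum((t - y) ** 2 for t, y in zip(targets, outputs))

# e.g. three outputs with targets T and network outputs Y
err = global_error([1.0, 0.0, 1.0], [0.2, 0.1, 0.7])
```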

The problem is that the global error does not decrease as the epochs increase.

Basically, for each epoch the error stays stuck at 153.69858.

[Edited by - jolyqr on August 18, 2006 4:26:48 PM]

This topic is closed to new replies.
