
Using backprop without using individual errors

Started by Asbestos, August 25, 2006 01:14 PM
4 comments, last by Timkin 18 years, 2 months ago
I have an idea for a neural network simulator that controls a virtual robot. When the robot is doing something good, like spinning in circles (say), I want to tell it it's being good. When it's doing something bad, like bumping into a wall, I want to tell it it's being bad. As I see it, I can work out some general way to calculate the overall error of the network: when it's bad, its error is high; when it's good, its error is low. However, I can't translate this into errors for each individual output neuron. That is, I can't take the output neuron controlling the left wheel and say "YOU are doing something wrong, because you should be spinning faster" -- I can only say whether the whole network is doing something wrong.

Is it still possible to use backprop in such a situation? As I understand it, backprop requires each output neuron to be told individually what its error is. If I just have some value for the global error, could I feed that same value to all the output neurons? Can backprop function this way? Is there another learning algorithm I ought to be using?
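To make the problem concrete, here is a minimal sketch (Python with NumPy; all the numbers are made up) of what backprop normally expects at the output layer, versus the "broadcast one global error to every output" idea:

```python
import numpy as np

# Standard backprop: each output neuron gets its OWN error signal.
outputs = np.array([0.7, 0.2])   # e.g. left wheel, right wheel (sigmoid outputs)
targets = np.array([0.9, 0.1])   # per-output teaching signal

per_output_error = targets - outputs
deltas = per_output_error * outputs * (1.0 - outputs)   # sigmoid derivative term

# The situation in the question: only one scalar "how bad was that?" value.
global_error = 0.35   # e.g. the robot just bumped into a wall

# Broadcasting it gives every output neuron the same delta, so every
# output is pushed in the same direction regardless of which neuron was
# actually at fault -- the credit-assignment problem.
broadcast_deltas = global_error * outputs * (1.0 - outputs)
```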
It sounds like what you are trying to do is reinforcement learning:
http://en.wikipedia.org/wiki/Reinforcement_learning

Quote: Reinforcement learning differs from the supervised learning problem in that correct input/output pairs are never presented, nor sub-optimal actions explicitly corrected.


...I love Wikipedia

I could probably come up with more info, but for now I'll let you see what you can get out of that search term.
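For illustration, the bare shape of that setting looks something like this (every name here is hypothetical -- this is just the interaction loop, not an implementation):

```python
# Reinforcement learning in one loop: the agent never sees correct
# input/output pairs, only a scalar reward after acting.
def run_episode(agent, environment, steps=100):
    state = environment.reset()
    for _ in range(steps):
        action = agent.choose_action(state)       # no teacher says which action is "correct"
        state, reward = environment.step(action)  # the world just hands back a scalar
        agent.learn(state, action, reward)        # learning must work from that alone
```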
Yeah, I realized that was what I was asking for pretty soon after making the post. None of the methods look very easy to implement, though (fortunately I'm on a university computer, so I have access to journal articles).

Doing some research, it seems one approach that might work is to have one neuron explicitly trained to predict the expected reward for any action as accurately as possible (since the reward actually received gives a concrete target, this neuron can be trained through backprop). Then it gets confusing. Possibly I can make the predictor neuron act as a kind of gate-keeper: if a proposed action gets a negative predicted reward, block the action. But then what? I'm thinking that if an action is blocked, the network is repeatedly randomly mutated until an action is proposed that the predictor accepts.

Has anyone done this kind of work? Does this solution sound like it could make sense?
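For what it's worth, the gate-keeper scheme described above might look roughly like this as a sketch (all names hypothetical -- `action_net` and `predictor` stand in for whatever networks you end up with):

```python
def choose_action(action_net, predictor, state, max_tries=50):
    # Gate-keeper idea: the predictor estimates the reward of a proposed
    # action; predicted-bad actions are blocked, and the action network
    # is randomly mutated until a proposal is accepted.
    action = action_net.propose(state)
    for _ in range(max_tries):
        if predictor.expected_reward(state, action) >= 0.0:
            return action                        # predictor approves
        action_net.mutate_weights(scale=0.1)     # random perturbation
        action = action_net.propose(state)
    return action   # give up after max_tries and act anyway
```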
Check out Mat Buckland's work on evolving neural controllers: www.ai-junkie.com.

Cheers,

Timkin
Quote: Original post by Timkin
Check out Mat Buckland's work on evolving neural controllers: www.ai-junkie.com.

Cheers,

Timkin


Thanks. I actually wrote my master's thesis on evolutionary ANNs, so it's a topic I'm familiar with. You're right -- EANNs are good when all you have is an overall measure of fitness. However, my aim was to try and create some creature to which I could say "now you're doing great, keep going! Oh, wait, that was bad, now you're doing poorly"... or some such nonsense. It would be far harder to adapt a GA to work in those kinds of circumstances.

I think in the end I'm going to try to separate the action outputs from the ANN itself, since that was what was causing me problems. I think the easiest way to do this would be to train the predictor neuron, and then hard-code the output options so that the option with the highest predicted reward always gets chosen. Thus the effectors are not actually outputs of the network at all (although they do act as inputs to the predictor during training).
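As a sketch, that plan reduces action selection to something like this (names hypothetical; `predictor` stands in for the backprop-trained reward estimator described above):

```python
# Hard-coded action options; the ANN only scores them.
ACTIONS = ["forward", "turn_left", "turn_right", "reverse"]

def select_action(predictor, state):
    # Execute whichever option the predictor scores highest. The chosen
    # action is also fed back in as a predictor input during training.
    return max(ACTIONS, key=lambda a: predictor.expected_reward(state, a))
```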

We'll see.
Quote: Original post by Asbestos
However, my aim was to try and create some creature to which I could say "now you're doing great, keep going! Oh, wait, that was bad, now you're doing poorly"... or some such nonsense. It would be far harder to adapt a GA to work in those kinds of circumstances.


So what you're looking for, then, is a parameter-training scheme based on correlating weight changes with performance. There are a variety of correlation-based training schemes out there. (Scrounging around on my desk for relevant papers!)... check out 'Alopex'. You might also want to look at classical correlation-based methods like 'cascade correlation', although I haven't played with these older methods myself, so I'm not sure how applicable they will be.
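In case it helps, here's a simplified sketch of the Alopex idea, assuming we're maximizing a scalar performance measure (the exact update rule and sign conventions vary between papers, so treat every name and constant here as illustrative):

```python
import numpy as np

# Simplified Alopex-style step for MAXIMIZING a scalar performance
# measure. prev_dw: each weight's previous change; prev_dperf: the
# previous change in performance (both from the last two evaluations).
def alopex_step(weights, prev_dw, prev_dperf, step=0.01, temperature=0.1):
    correlation = prev_dw * prev_dperf
    # Probability of repeating the previous direction rises when that
    # direction coincided with a performance increase.
    p_repeat = 1.0 / (1.0 + np.exp(-correlation / temperature))
    repeat = np.random.rand(weights.size) < p_repeat
    direction = np.where(repeat, np.sign(prev_dw), -np.sign(prev_dw))
    dw = step * direction
    return weights + dw, dw

# Seed prev_dw with random +/-step so every weight has a direction to repeat.
weights = np.zeros(10)
prev_dw = 0.01 * np.random.choice([-1.0, 1.0], size=10)
new_weights, new_dw = alopex_step(weights, prev_dw, prev_dperf=0.05)
```

Note that the update only ever consults the scalar performance measure -- exactly the "global error only" constraint from the original post.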

I'm actually working on correlation based learning for recurrent controllers at the moment, although I don't take such a 'high level' approach to performance metrics! ;)

Cheers,

Timkin
