Racing ANN
Hi,
I created a neural network for a racing game. The track consists of 157 checkpoints, and a racing line is defined between each pair of consecutive checkpoints.
Inputs:
- Direction error of current racing line
- Direction error of next racing line
- Position error in respect to racing line
- Speed
Outputs:
- Gas
- Brake
- Steering
The ANN is trained on data sets collected while the user drives the car, until enough training data has been gathered.
There's an XML file that contains the settings (how many data sets to collect, how many training runs, the number of layers and nodes, etc.) and is loaded on startup.
I currently drive the car in first gear only, to check that everything works properly.
Unfortunately, the ANN does not control the car properly, and I can't see why. Even though I don't use the brakes at all during training, the car halts before the first turn.
Does anyone have any idea why this happens? Would a training data analyser (one that filters the data before feeding it to the network, so the ANN doesn't learn from user mistakes) help?
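For concreteness, the input/output layout described above can be sketched as a small feed-forward pass. The hidden-layer size and tanh activation here are illustrative assumptions, not the actual settings from the XML file:

```python
import math
import random

def forward(weights, biases, inputs):
    """One feed-forward pass through a fully connected tanh layer."""
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)
n_in, n_hidden, n_out = 4, 6, 3          # 4 inputs and 3 outputs, as listed above
w1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [0.0] * n_hidden
w2 = [[random.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [0.0] * n_out

# One sample: [dir_error_current, dir_error_next, position_error, speed]
sample = [0.1, -0.2, 0.05, 0.8]
hidden = forward(w1, b1, sample)
gas, brake, steering = forward(w2, b2, hidden)
```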
When you train an AI, you need to have two sets of data: the training data and the testing data.
The training data is, in your case, the file generated by the user driving the car. I assume that this is composed of the inputs that were created as the user moved through the course, and the outputs that were created by the user's actions.
The testing data would be the inputs created by letting the computer drive a car on a brand-new course, but you shouldn't get to this yet.
What you need to be sure of is whether the ANN is accurately training itself.
1) When you apply backprop on the training set, what is the final error?
2) When you start the ANN at the beginning of the training course, exactly in the same position as you were in for the training data, does the car behave as expected?
1 and 2 ought to be directly related. If you have a low error in (1), then this means, by definition, that when the ANN is given the same inputs as some line in the training data, it should give the correct outputs. After all, that is how error is calculated: does the ANN make any errors anywhere in the training set?
So the first thing to do is to make sure you keep running backprop until you reach a low error, and then make sure the car can run the training course. If you have a low error, yet the car cannot run the training course, you're doing something wrong in your measurement of error.
The testing of the ANN, i.e. running it in a brand-new scenario (a new course, or starting from a slightly different place), is merely a test to see whether the ANN has generalized the training data correctly. If it can run the training course perfectly, yet fails on the testing course, it means that it hasn't generalized enough. You need more training data, or data from more situations. Or the data isn't generalizable.
So work out the answers to 1 & 2 before going any further. If you calculate error correctly, these ought to be directly correlated: low error = able to move through the training set.
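To make (1) concrete, here is a minimal sketch of measuring error over the training set as a mean squared error; `predict` and the toy data are illustrative stand-ins for the actual network and the recorded driving file:

```python
def mean_squared_error(predict, training_set):
    """Average squared difference between network outputs and recorded user outputs.

    predict: function mapping an input vector to an output vector (the network)
    training_set: list of (inputs, target_outputs) pairs recorded while driving
    """
    total, count = 0.0, 0
    for inputs, targets in training_set:
        outputs = predict(inputs)
        total += sum((o - t) ** 2 for o, t in zip(outputs, targets))
        count += len(targets)
    return total / count

# Toy data: a predictor that matches the targets exactly has zero error.
data = [([0.0], [0.5]), ([1.0], [1.0])]
err_perfect = mean_squared_error(lambda x: [0.5 * (1 + x[0])], data)
```

If this number stays low yet the car still fails on the training course, the mismatch points at how the error is being measured, as described above.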
You need to penalise the network based on the time it takes to travel through the full set of waypoints.
Also, if it were me doing this, I'd change your input state space to be:
- Transverse distance to the racing line
- Velocity component perpendicular to the racing line
- Velocity component parallel to the racing line
- Current braking setting
Then you should train the network to minimise the transverse distance (racing line tracking error) and maximise the average velocity parallel to the racing line over the whole set of waypoints.
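A sketch of how these suggested inputs could be computed from a 2D car state and a racing-line segment; the vector representation and names are assumptions, not code from the game:

```python
import math

def racing_line_features(pos, vel, a, b):
    """Return (transverse_distance, v_perpendicular, v_parallel) for segment a->b.

    pos, vel, a, b are (x, y) tuples; a and b are consecutive checkpoints.
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    tx, ty = dx / length, dy / length          # unit tangent along the racing line
    nx, ny = -ty, tx                           # unit normal to the racing line
    rx, ry = pos[0] - a[0], pos[1] - a[1]      # car position relative to segment start
    transverse = rx * nx + ry * ny             # signed distance off the line
    v_par = vel[0] * tx + vel[1] * ty          # speed along the line
    v_perp = vel[0] * nx + vel[1] * ny         # speed drifting off the line
    return transverse, v_perp, v_par
```

For a segment from (0, 0) to (10, 0), a car at (3, 2) with velocity (4, -1) is 2 units off the line, drifting back toward it at 1 unit/s, and moving along it at 4 units/s.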
..of course, I probably wouldn't use an ANN for this problem, but that's just me. 8)
Cheers,
Timkin
Hi,
thank you both for your replies.
I have a couple of questions on Timkin's post though:
How exactly would you suggest I penalize the network?
What do you mean by Current Braking Setting and how would that help the network?
I took on such a problem because I wanted to learn more about neural networks, but just out of curiosity, what method would you use for this application?
Regards
Alex
Quote: Original post by aparaske
But just out of curiosity, what method would you use for this application?
Regards
Alex
For something simple: a reactive system, like steering behaviors.
If you want something more complex, you can analyse each corner, consider the best strategies for taking it (entry speed, entry point, exit point, etc.), then find the sequence of cornering techniques that will minimize your total track time.
Then, try to stick to that strategy as you race around.
The error rate steadily decreases in parts, reaching a very low value, but in other parts it's random, or it suddenly increases tremendously. What do you think causes this?
Also,
@Timkin:
Could you please answer the above questions?
Do you think that using these inputs would improve the behaviour of the network?
Regards
Alex
Quote: Original post by aparaske
The error rate steadily decreases in parts, reaching a very low value, but in other parts it's random, or it suddenly increases tremendously. What do you think causes this?
If the error at one or more outputs isn't decreasing, you can't expect the car to behave well on the road.
An error that isn't decreasing could have a few possible causes:
1) The network isn't programmed correctly
2) The network hasn't been given enough time to train, or keeps getting stuck at local minima
3) It is impossible to train from your data
We'll assume the network is properly programmed, so we'll set (1) aside for the moment.
If (2) is the case, there are a couple of possible solutions. First, make sure you keep iterating through backprop for long enough: there should be no further downward trend in the error. If your current training takes, say, ten minutes, double or triple that to make sure no further progress can be made.
If you can make no further progress and the error at one or more outputs isn't low enough, you're getting stuck at a local minimum. Make sure you try training from a number of different random starting networks: weights should be initialized to random numbers. Also, you should try changing the number of hidden neurons.
If the above doesn't help, it is quite possible that (3) is the case. After all, when you were driving the car to get training data, you probably didn't do the same thing all the time. At one point on a straight road maybe you veered to the left; at another point you veered right. So you're giving the network two conflicting sets of data, and it can't satisfy both.
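The random-restart idea can be sketched like this; `train` and `error` are stand-ins for your own backprop routine and error measurement:

```python
import random

def random_network(n_weights, rng):
    """Fresh weight vector; initial weights drawn small and random."""
    return [rng.uniform(-0.5, 0.5) for _ in range(n_weights)]

def best_of_restarts(train, error, n_weights, restarts=5, seed=0):
    """Train from several random initialisations and keep the lowest-error result."""
    rng = random.Random(seed)
    best_net, best_err = None, float("inf")
    for _ in range(restarts):
        net = train(random_network(n_weights, rng))  # run backprop to convergence
        err = error(net)                             # error over the training set
        if err < best_err:
            best_net, best_err = net, err
    return best_net, best_err
```

Each restart starts from a different random point on the error surface, so a single bad local minimum is much less likely to decide the outcome.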
One possible solution would be to get a lot more training data into the system, and then average all your different runs. Or keep them all, and set the error to the minimum error compared to each of your runs.
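The averaging idea can be sketched like this; the data layout ((input vector, output vector) pairs) is an assumption:

```python
def average_conflicting_targets(samples):
    """Collapse training samples that share the same input vector.

    samples: list of (inputs, outputs) pairs. Conflicting outputs recorded for
    identical inputs are replaced by their component-wise mean.
    """
    groups = {}
    for inputs, outputs in samples:
        groups.setdefault(tuple(inputs), []).append(outputs)
    averaged = []
    for inputs, outs in groups.items():
        mean = [sum(vals) / len(outs) for vals in zip(*outs)]
        averaged.append((list(inputs), mean))
    return averaged

# Two runs over the same straight: one veered left, one right.
# Averaging the steering targets yields "steer straight".
data = [([0.0, 0.0, 0.0, 1.0], [1.0, 0.0, -0.2]),
        ([0.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.2])]
```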
To start out, I'd also make sure you're giving the ANN an easy enough problem. Can you train it on a straight road, where you never turn or brake? If you can't do that, there is something wrong with how you are setting this up. If you can, make things more complicated a little at a time.
Let me know if you can train the network on a straight road.
Quote: Original post by Timkin:
Also, if it were me doing this, I'd change your input state space to be:
...
Current Braking Setting
I disagree. The brake setting is one of the network's outputs. There is no reason to feed it back into the input, unless you're trying to get some kind of memory system. I assume the brake setting at time t[0] has no effect on the brake setting at t[1]? If not, then adding it as an input does nothing.
Quote: Original post by Asbestos
If the above doesn't help, it is quite possible that (3) is the case. After all, when you were driving the car to get training data, you probably didn't do the same thing all the time. At one point on a straight road maybe you veered to the left; at another point you veered right. So you're giving the network two conflicting sets of data, and it can't satisfy both.
One possible solution would be to get a lot more training data into the system, and then average all your different runs. Or keep them all, and set the error to the minimum error compared to each of your runs.
I was afraid that would be the reason; that's why I thought a training data analyser/filter would help in this case. It could ensure that only the 'correct' moves are learnt by the network, while other, pointless moves are simply ignored. I increased and decreased the number of training data sets quite a lot, and did the same for the training runs, but even though I might get a minor improvement, it's nowhere near enough. An analyser, however, would need more mathematical operations, thus more delay per step and fewer frames per second. Even though Timkin's suggestions may be correct, I doubt they will help with this problem: the network would still get some conflicting training data sets due to user error. What do you think?
Regards
Alex
What's the purpose of this programming exercise? Is it to learn about backprop, or is it to design a racing AI?
If it's the first, then I'd keep doing what you're doing, but a little at a time. Has the ANN succeeded in learning how to race down a perfectly straight road? That should be the absolute first step, because if you can't train it to do that, there's something wrong with your set-up.
Also, you should try different methods of assigning error. Like I mentioned above, you could either calculate the error as the difference between the current output and the average of all your outputs at that point (where "that point" is ALL points with the exact same input. So a straight stretch of road looks like any other straight stretch of road, and so all the actions on ANY straight stretch of road ought to be averaged), or you could set it as the smallest difference between the current output and EACH ONE of your outputs.
----
However, if your reason for programming this is the second, I don't think this is the right way to go about it. If you think about it, what are you trying to do with this network? What you WANT to be doing is to reward it for successfully navigating the course. What you ARE doing is punishing individual neurons for not behaving exactly like you.
So if you really want to go the neural network route, I think you need a system that rewards the entire system, not one that punishes individual neurons. The best example of this is an evolutionary neural network, which I think would be able to solve this problem quite easily. Alternatively, you can look up some reinforcement learning techniques, but they're pretty hard.
Or you could skip the whole network, as Timkin said. There are plenty of good steering behaviors algorithms that do exactly what you're after. They're probably not quite as fun to program, however.
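For reference, the evolutionary approach can be sketched as a simple truncation-selection loop. The fitness function below is a toy stand-in; in the racing game it would score a whole weight vector by, e.g., how far along the track the car gets:

```python
import random

def evolve(fitness, n_weights, pop_size=20, generations=50, sigma=0.1, seed=0):
    """Evolve whole weight vectors: keep the fittest quarter, refill by mutation."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1) for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)      # best first
        elite = pop[: pop_size // 4]             # survivors kept unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent = rng.choice(elite)
            children.append([w + rng.gauss(0, sigma) for w in parent])
        pop = elite + children
    return max(pop, key=fitness)

# Toy fitness: prefer weight vectors close to an arbitrary target vector.
target = [0.3, -0.7, 0.5]
best = evolve(lambda w: -sum((a - b) ** 2 for a, b in zip(w, target)), 3)
```

Note that the whole network is rewarded or punished for how well the car drives; no per-neuron error signal, and hence no conflicting training targets, are needed.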
I want to design AI with neural networks, to get a glimpse of how they work and what they are capable of.
How can I reward the network? Could you direct me to a source where I can find information and examples of what you're talking about?
Alex