Neuroevolution crossover operators... do they really work?
I've been experimenting with neuroevolution, which, for those who don't know, is just neural networks trained with a genetic algorithm.
I'm using a standard genetic algorithm: I evaluate the fitness of a list of candidate solutions, pick two 'parents', cross them over to create two children, mutate the children, and add the children to the new population.
The 'genome' for a solution here is just the list of weights for the neural network.
For selection I'm using roulette wheel selection.
For crossover I'm using two-point crossover (making sure to only pick points that fall between neurons).
And for mutation I'm just randomly perturbing the weights.
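For concreteness, here's a minimal C++ sketch of those three operators; the parameter values, names, and the Gaussian perturbation are placeholder assumptions rather than my actual code:

#include <vector>
#include <random>
#include <utility>

std::mt19937 rng{std::random_device{}()};

using Genome = std::vector<double>;   // flattened list of network weights

// Roulette-wheel selection: pick an index with probability proportional
// to fitness (assumes all fitness values are non-negative).
size_t RouletteSelect(const std::vector<double>& fitness) {
    std::discrete_distribution<size_t> dist(fitness.begin(), fitness.end());
    return dist(rng);
}

// Two-point crossover, restricted to cut points that fall between neurons.
// 'boundaries' holds the genome indices where each neuron's weights begin
// (plus one past the end), so a cut never splits one neuron's weights.
std::pair<Genome, Genome> Crossover(const Genome& a, const Genome& b,
                                    const std::vector<size_t>& boundaries) {
    std::uniform_int_distribution<size_t> pick(0, boundaries.size() - 1);
    size_t p1 = boundaries[pick(rng)];
    size_t p2 = boundaries[pick(rng)];
    if (p1 > p2) std::swap(p1, p2);
    Genome c1 = a, c2 = b;
    for (size_t i = p1; i < p2; ++i) std::swap(c1[i], c2[i]);
    return {c1, c2};
}

// Mutation: perturb each weight with a small probability.
void Mutate(Genome& g, double rate = 0.05, double sigma = 0.3) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    std::normal_distribution<double> noise(0.0, sigma);
    for (double& w : g)
        if (coin(rng) < rate) w += noise(rng);
}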
I'm not using a high crossover rate because I don't see how crossover for neural nets would actually speed up the evolution. If I divide the weights of two parents between the children, won't I be losing any progress either parent has made towards the desired output?
What else are people doing for neuroevolution?
As opposed to having one population of NNs, you can create a (sub-)population per neuron and a population of so-called blueprints (each a list of references, one per sub-population, to a member of that sub-population). You cross over the blueprints and you cross over the neuron sub-populations, evaluate the blueprints for fitness, and translate that into neuron fitness by averaging the performance of the blueprint(s) each member participates in. I'm sure this is not very clear, so here's a link:
http://www.cs.bham.ac.uk/~pxt/NIL/nn-coevolution.pdf
The idea is to improve specialisation (per sub-population) and then get the best combination of members of those neuron-sub-populations (by means of the blueprint-spec, basically a list of pointers to neurons).
Well anyway, what I wrote is not too clear, but the paper is much clearer! To me it made sense at least.
EDIT: The reason why this works better than simply doing "The 'genome' for a solution here is just the list of weights for the neural network" can be understood as follows: training a neural network, by whatever means, will make individual neurons "specialize" in some function/feature of the problem posed to it. In the standard approach stated above, every network will be different in the sense that they might have different feature detectors; but not only that, they might have some of the same feature detectors in different places in the network. So by randomly cutting up two networks, swapping the halves, and slapping them together, you might get the same feature detector (a neuron) twice in one network and not at all in the other. The chances of destroying a network are actually much bigger than of improving it. The above approach avoids this problem by making the evolution at the weight level local to a neuron and overlaying it with evolution at the blueprint level.
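To make the blueprint idea concrete, here is a rough C++ sketch of the bookkeeping it implies; the names and structure are my own illustration, not the paper's actual algorithm:

#include <vector>

struct Neuron {                       // one member of a neuron sub-population
    std::vector<double> weights;
    double fitnessSum = 0.0;          // credited from blueprints it appears in
    int participations = 0;
};

// A blueprint is just a list of pointers into the sub-populations:
// blueprint[i] = index of the chosen member of sub-population i.
using Blueprint = std::vector<size_t>;

// After evaluating the network a blueprint specifies, credit its fitness
// back to every participating neuron; a neuron's own fitness is then the
// average over all blueprints it took part in.
void CreditNeurons(const Blueprint& bp, double blueprintFitness,
                   std::vector<std::vector<Neuron>>& subPops) {
    for (size_t i = 0; i < bp.size(); ++i) {
        Neuron& n = subPops[i][bp[i]];
        n.fitnessSum += blueprintFitness;
        ++n.participations;
    }
}

double NeuronFitness(const Neuron& n) {
    return n.participations ? n.fitnessSum / n.participations : 0.0;
}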
[Edited by - Degski on April 25, 2006 2:08:20 PM]
What you're hoping for is that some segment of one parent is beneficial to some segment of the other parent. Each parent may have a highly optimized subset of weights but another subset of weights that are not so good. Crossover is hoping that you can find this complementary set. The probability of finding such a situation is very small, but that's how evolution works. All you need is a tiny advantage over random and you're on your way.
I assume you have mutation also involved, correct? Try this experiment: note the change in a net's fitness before and after mutation or crossover, and keep a tally of how many times you get an improvement. You can do this on a generational basis and plot the percentage of improvements (y-axis) vs. the generation number (x-axis). By separating crossover and mutation improvements you'll be able to see how much each is contributing to the overall fitness, and by plotting it relative to generation number, you can see how that changes over time.
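Something like this would do for the bookkeeping (a sketch; the structure and names are only illustrative):

struct OperatorStats {                 // one instance per generation
    int crossoverTrials = 0, crossoverWins = 0;
    int mutationTrials = 0, mutationWins = 0;

    void RecordCrossover(double before, double after) {
        ++crossoverTrials;
        if (after > before) ++crossoverWins;
    }
    void RecordMutation(double before, double after) {
        ++mutationTrials;
        if (after > before) ++mutationWins;
    }
    // Percentages to plot against the generation number.
    double CrossoverImprovementPct() const {
        return crossoverTrials ? 100.0 * crossoverWins / crossoverTrials : 0.0;
    }
    double MutationImprovementPct() const {
        return mutationTrials ? 100.0 * mutationWins / mutationTrials : 0.0;
    }
};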
And, post the results here. 8^) I'm sure we'd all like to see them.
-Kirk
Quote:
What else are people doing for neuroevolution?
You can evolve grammars that generate the ANN instead of directly evolving the ANN itself. Published examples of this can be complex, but you can make the scheme much simpler.
Quote: Original post by Degski
EDIT: The reason why this works better than simply doing "The 'genome' for a solution here is just the list of weights for the neural network" can be understood as follows: training a neural network, by whatever means, will make individual neurons "specialize" in some function/feature of the problem posed to it. In the standard approach stated above, every network will be different in the sense that they might have different feature detectors; but not only that, they might have some of the same feature detectors in different places in the network. So by randomly cutting up two networks, swapping the halves, and slapping them together, you might get the same feature detector (a neuron) twice in one network and not at all in the other. The chances of destroying a network are actually much bigger than of improving it. The above approach avoids this problem by making the evolution at the weight level local to a neuron and overlaying it with evolution at the blueprint level.
That's exactly what I wanted to know, thank you :)
I'll check out that paper, it sounds very promising
Quote: Original post by kirkd
And, post the results here. 8^) I'm sure we'd all like to see them.
-Kirk
Good idea. This is what I did: I printed out the average fitness of the two parents involved in the crossover versus the average fitness of the two children. As I expected, 95% of the time the average fitness would go down. I think this verifies the comment Degski made, at least for my specific problem.
All I'm doing is using the crossover operator from the book "AI Techniques for Game Programming". If crossover really IS this bad, I'm surprised that it made its way into the book...
Anyway, thanks everyone for your input
Regarding your experiment, you said:
Quote:
I printed out the average fitness of the two parents involved in the crossover versus the average fitness of the two children. As I expected, 95% of the time the average fitness would go down.
The average of the parents vs. the average of the children may not be the best measure. I would expect that you'll have small improvements and lots of significant decreases. If you have one child improve slightly and the other child gets really bad, then the average will make it look like they both were worse. I don't honestly know what the result would be, but comparing the average of the two children could potentially bias your results. I would look at the children independently and keep track of whether one child is better than one or both of the parents.
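For instance (purely illustrative):

#include <algorithm>

// Compare each child to the parents independently instead of averaging.
enum class Outcome { BeatsBoth, BeatsOne, BeatsNeither };

Outcome Classify(double childFit, double parentA, double parentB) {
    const double best = std::max(parentA, parentB);
    const double worst = std::min(parentA, parentB);
    if (childFit > best) return Outcome::BeatsBoth;
    if (childFit > worst) return Outcome::BeatsOne;
    return Outcome::BeatsNeither;
}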
Also, I wouldn't be surprised if you have a large number of decreases and fewer improvements, but this is likely to change over time. The first generations should have more improvements and then as the population converges the number of improvements will drop off significantly. It might be interesting to see the number of improvements per generation.
Regarding Degski's comments, the idea of incorporating feature selection (which inputs to use) as well as structural optimization of the network is a very good one. I've seen a number of papers where this was implemented successfully. The only issue is that it becomes a more complex problem, but once you're past the complexity, the results would be very interesting.
-Kirk
I created a GA for a multilayer network and tested it on a simple classification task, e.g., classifying red and blue points in RGB space.
In my case only a high crossover rate worked to get a population of all the fittest individuals; a lower crossover rate left the population unable to converge to a solution!
The key point: I used 1- and 2-point crossover, and the genes (weights) are actually integers, so we get proper mutation by flipping bits.
unsigned int gene;   // one 32-bit gene; mutation flips random bits
The floating-point weight is decoded from it as hiword(gene) / loword(gene), i.e., the ratio of the gene's two 16-bit halves (taking care that the low word isn't zero).
And the chromosomes are individual neurons of the ANN, with the number of input weights as the number of genes.
Hence diversity is important: the population converged even with a very low mutation rate (a 1-in-1000 chance) as long as every chromosome underwent crossover.
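Roughly, the mutation then looks like this (a sketch; the per-bit rate and RNG setup are assumptions):

#include <cstdint>
#include <random>

// Flip each of the gene's 32 bits with a small independent probability
// (the "1 chance from 1000" above).
void MutateGene(std::uint32_t& gene, std::mt19937& rng,
                double bitFlipChance = 0.001) {
    std::uniform_real_distribution<double> coin(0.0, 1.0);
    for (int bit = 0; bit < 32; ++bit)
        if (coin(rng) < bitFlipChance)
            gene ^= (1u << bit);    // flip this bit
}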