Question about genetic algorithms

JimPrice · 2004-09-22T11:44:38

Hi, Continuing my reading on AI, I've come across genetic programming. As I understand it, it works as follows: a} Loop over n generations. b) Each generation has m members. Each member of a generation has a single realisation of each of a number of characteristics. c) Calculate a fitness function for each member; keep the top few members and evolve the next generation from pairs drawn from this set. Also allow for the possibility of mutation, to keep from finding local features of the fitness function response surface. The final 'best' members of the final generation provide our genetically superior, trained, set. I would assume that if you wanted an 'easier' set, you could just draw realisations from an earlier generation. This seems very similar to problems that are encountered in statistical analysis very frequenty (ie multi-dimensional characteristic set, evaluable response). In statistical analysis, I would solve this problem using a response-surface analysis design (linear model or generalised additive model, built using splines). Essentially you generate members with given charateristics and use them to build up a mathematical profile of the fitness function, allowing characterisation both of where the 'optimal' region is, but also how changing individual (or sets of) characteristics changes the fitness function. At this stage (and admittedly, given my still limited knowledge of the AI field), I can't see the advantages that a genetic algorithm gives - what am I missing? Is anyone aware of any literature comparing techniques? Would anyone be interested in results from this if I played with it for a bit? Regards, Jim.

Artificial Intelligence Programming

Started by JimPrice September 03, 2004 08:28 PM

23 comments, last by Nathaniel Hammen 20 years, 2 months ago

Woodsman

426

September 11, 2004 08:29 AM

Quote: Original post by fup
"Your GA can't evolve complexity or simplicity to solve the problem at hand."

They can actually. Here's an example:

http://www.cs.utexas.edu/users/kstanley/

I stand corrected! Thanks for the link, that's pretty neat.

If a plant cannot live according to its nature, it dies; so a man.

Predictor

198

September 11, 2004 11:33 AM

Quote: Original post by fup
GPs encode solution candidates as trees of operators consisting of function nodes and terminal nodes.

GPs can also encode programs other ways, such as sequential programs. See, for instance:

Genetic Programming: An Introduction

-Predictor
http://will.dwinnell.com

[Edited by - Predictor on September 11, 2004 12:33:11 PM]

Yuri Burger

116

September 13, 2004 04:04 AM

Is there somebody to help me with free GA library?

there is C++ GPL library of genetic algorithms http://sourceforge.net/projects/cpplibga/

The help in development of library is necessary...

Predictor

198

September 13, 2004 06:45 AM

Quote: Original post by Yuri Burger
Is there somebody to help me with free GA library?

there is C++ GPL library of genetic algorithms http://sourceforge.net/projects/cpplibga/

The help in development of library is necessary...

I think that by the time one understands a GA library, it would have been just as easy to create one's own, which would, of course, be completely customizeable and free of legal entanglements. GAs are not that complicated. Additionally, many optimization problems in my experience have special requirements that are never well served by generic libraries.

-Predictor
http://will.dwinnell.com

Nathaniel Hammen

136

September 14, 2004 11:49 AM

I have programmed my own Genetic Program. To test it, I had it solve the distance between two points. On my first trial it got it by the tenth generation. On my second trial, I had it run for 224 generation before I decided to quit it. It was still off by more than 1,000. It also got slower and slower as the program trees evolved more and more complexity to try to solve such a simple problem.

On my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false&&a whole lot of shit , more shit , more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess thats a good thing, because some negative genes can carry over until they're needed, but it makes it so that there is a lot of code just to get sqrt((L-O)dot(L-O)).

I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.

I am the master of ideas.....If only I could write them down...

Woodsman

426

September 14, 2004 04:12 PM

Quote: Original post by Nathaniel Hammen
I have programmed my own Genetic Program. To test it, I had it solve the distance between two points. On my first trial it got it by the tenth generation. On my second trial, I had it run for 224 generation before I decided to quit it. It was still off by more than 1,000. It also got slower and slower as the program trees evolved more and more complexity to try to solve such a simple problem.

If you can, enforce a maximum depth of your trees, whether as a result of crossover or otherwise. When one child is too large, copy a random parent instead.

Quote: Original post by Nathaniel Hammen
On my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false&&a whole lot of shit , more shit , more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess thats a good thing, because some negative genes can carry over until they're needed, but it makes it so that there is a lot of code just to get sqrt((L-O)dot(L-O)).

Look up 'editing'. Basically, you give it patterns that go in and edit the results. An example would be pattern:
' (anything) | True' can be replaced with true.
Or 'iflessthanzero (some constant) x y' replace with x or y depending on the constant.

Quote: Original post by Nathaniel Hammen
I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.

What I've generally seen is multiple runs, rather than larger populations or number of generations. There is no guarantee that any particular run will find a solution. Increasing the number of runs will increase the chance.

You may want to save, say, the ten best programs generated from one complete run, and introduce them into the next run either in the initial generation or later on. Perhaps even take these best 10 from several runs and then run them all together.

If a plant cannot live according to its nature, it dies; so a man.

kwatz

122

September 15, 2004 01:45 PM

Quote: Original post by Nathaniel HammenOn my third trial it got it on the 31st generation. I looked at the code that the third trial generated and it was essentially If(false&&a whole lot of shit , more shit , more if statements that eventually led to the algorithm). So these if statements were sort of like dominant and recessive genes. I guess thats a good thing, because some negative genes can carry over until they're needed, but it makes it so that there is a lot of code just to get sqrt((L-O)dot(L-O)).

You may want to remove control statement-related alleles (the if statement, true/false, etc), as you're just looking for a plain mathematical formula, not a computer program per se.

Quote: Original post by Nathaniel HammenI'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.

There are other elements that may be problems, also. There are a number of parameters to tune before you should expect your algorithm to work well. Depending on the algorithm and representation you use, these can include mutation rate, crossover rate, degree of mutation, population size, fitness function, selection methodetc. Depending on the behavior of your population, you can tweak these and related values back and forth.
Could you give some specifics about your algorithm? For example, what crossover method you're using (if any), selection method, population size, and so on.

Anonymous

September 15, 2004 02:13 PM

Quote: Original post by Nathaniel Hammen
I'm also concerned about the results I got on the second trial. It just couldn't come up with the answer. Is there any way I can prevent this, or do I need to check the program every once in a while, just in case. I was thinking that maybe I should introduce a brand new member into the population every once in a while, but this would probably make my programs worse.

There is lots of literature on this. GP is not a simple process (although that doesn't stop lots of people from just diving in, naively thinking it'll work first time). The best advice is simply to use google and read the 4 or 5 main conferences (and their papers and journals) which are devoted to e.g. "machine learning". There are hundreds of academic papers; many are just individual techniques, so they're kind of like a shopping list of "if you want to do X, read this paper".

You should especially look at:

- ADF == automatically defined functions.
- "bushiness" vs "depth"

there are many papers on each, and most contain algorithms you can copy and use in your own work.

You're probably best off just buying a recent book on GP, though, since a good one ought to cover a variety of techniques and algorithms...

Anonymous

September 15, 2004 02:22 PM

Be skeptical of the genetic algorithm. There is no garauntee that after n generations they will perform better. Genetic algorithms are a kind of educated random step. If you want to look at an algorithm that touches more closely to what I think you were talking about check out this paper (http://www.debonet.com/Research/Publications/1996/DeBonet-NIPS96-MIMIC.pdf). Basically this algorithm keeps track of the history of generations to better forecast the best next generation.

Nathaniel Hammen

136

September 15, 2004 05:42 PM

Quote: Original post by Woodsman
If you can, enforce a maximum depth of your trees, whether as a result of crossover or otherwise. When one child is too large, copy a random parent instead.

I already enforce a maximum depth on creation, so adding that to mutation as well will be no problem.

Quote: Original post by Woodsman
Look up 'editing'. Basically, you give it patterns that go in and edit the results. An example would be pattern:
' (anything) | True' can be replaced with true.
Or 'iflessthanzero (some constant) x y' replace with x or y depending on the constant.

I already thought of doing this. Looking up previous ways of doing this will probably be useful. I'm a little leery of just destroying a "gene" that might be useful in a future generation. However, I also don't want to have my programs take up too much room. I'll look it up and see if anyone has a good solution to this. Off of the top of my head, I can only think of randomly deciding whether to do this or not; possibly, this random would be weighted to occur more often in larger trees.

Quote: Original post by Woodsman
What I've generally seen is multiple runs, rather than larger populations or number of generations. There is no guarantee that any particular run will find a solution. Increasing the number of runs will increase the chance.

You may want to save, say, the ten best programs generated from one complete run, and introduce them into the next run either in the initial generation or later on. Perhaps even take these best 10 from several runs and then run them all together.

Good idea! I'll probably use this.

Quote: Original post by kwatz
You may want to remove control statement-related alleles (the if statement, true/false, etc), as you're just looking for a plain mathematical formula, not a computer program per se.

The mathematical formula was only for a test to see whether my GP worked, and could evolve the answer. After I work out all of the kinks, I'm going to run my GP on a little battle arena type thing...

Quote: Original post by kwatz
There are other elements that may be problems, also. There are a number of parameters to tune before you should expect your algorithm to work well. Depending on the algorithm and representation you use, these can include mutation rate, crossover rate, degree of mutation, population size, fitness function, selection methodetc. Depending on the behavior of your population, you can tweak these and related values back and forth.
Could you give some specifics about your algorithm? For example, what crossover method you're using (if any), selection method, population size, and so on.

Tree Selection
I have a struct called ReproductionRate that I plug numbers into at the beginning of running the program. This decides how many trees will be copied, how many will be crossed over, and how many will be mutated. The ToCopy amount of trees that had the best fitness score are copied to the next generation. To select the tree that will be mutated, I pick a ToMutat number of trees randomly from all of the trees except for the ToCopy amount of trees that had the lowest fitness score. For crossing over, I use each of the best ToCross amount of trees exactly once, and select which ones are to be crossed over with which ones randomly.

Node Selection
For both Cross Over and Mutation, I select which node to use the operation on completely randomly. I plan to change this in the future, because there are obviously more nodes near the bottom of the tree, which makes the likelyhood of a specific node being selected increases as the depth increases. This makes it so that alot of times, the node selected is on the bottom row, or just above that, which means that only three or so nodes are being crossed over or mutated.

Crossing Over
For crossing over, all I do is take the selected nodes and switch them. Also, of course I change the data so that the NodeCount is updated, and the ParentPtr points to the correct place.

Mutation
For mutation, I delete the selected node and just build random nodes as I do when creating the tree in the first place.

Quote: Original post by Anonymous Poster
There is lots of literature on this. GP is not a simple process (although that doesn't stop lots of people from just diving in, naively thinking it'll work first time).

Like me! Except that I didn't think it would work perfectly the first time, and, "Ta da!" it didn't.

Quote: Original post by Anonymous Poster
The best advice is simply to use google and read the 4 or 5 main conferences (and their papers and journals) which are devoted to e.g. "machine learning". There are hundreds of academic papers; many are just individual techniques, so they're kind of like a shopping list of "if you want to do X, read this paper".

You should especially look at:

- ADF == automatically defined functions.
- "bushiness" vs "depth"

there are many papers on each, and most contain algorithms you can copy and use in your own work.

Yes, google is always good. I'll look up those things.

Quote: Original post by Anonymous Poster
You're probably best off just buying a recent book on GP, though, since a good one ought to cover a variety of techniques and algorithms...

A book... Paying money... Probably not. But, maybe. After all, I did buy "AI Game Programming Wisdom".

Quote: Original post by Anonymous Poster
Be skeptical of the genetic algorithm. There is no garauntee that after n generations they will perform better. Genetic algorithms are a kind of educated random step. If you want to look at an algorithm that touches more closely to what I think you were talking about check out this paper (http://www.debonet.com/Research/Publications/1996/DeBonet-NIPS96-MIMIC.pdf). Basically this algorithm keeps track of the history of generations to better forecast the best next generation.

Reading it now... literally.

[Edited by - Nathaniel Hammen on September 15, 2004 6:42:39 PM]

I am the master of ideas.....If only I could write them down...