
Genetic Algorithm and Sudoku

Started by November 04, 2005 11:22 AM
Quote: Original post by Kylotan
Quote: Original post by RPGeezus

Would you both agree with this statement:
"The effectiveness of cross-over is directly related to how statistically unrelated the discreet components, that make up the solution, are."

Yes to the first, maybe to the second. The quality of the operator goes hand in hand with the representation of the problem.


And "Cross-over alone, by neccessity, excludes many types of problems from the mix as encoding is a limiting factor."? :D

Quote: Original post by Kylotan
I say no to the 3rd [less cycles] because crossover - where applicable - makes better use of the information encoded in your population than mutation does.


_IF_ all other requirements are met.


Quote: Original post by Kylotan
When you get 2 solutions which have evolved to be good, but in different ways, crossover provides the chance of yielding a solution that takes the best from both.


ONLY if the operands that make up the solution are not bound by order of appearance. Then yes, there is a chance.

Quote: Original post by Kylotan
Mutation does not do anything to explicitly facilitate that.


Mutation works with a single line of ancestors: a family rope vs. a family tree. Conceivably you are using half the code per ancestor that you could have used, BUT you are guaranteed that the code you do have is statistically more significant. Sound reasonable?

Why limit cross-over to two parents? Why not 3, 4, 5 or 100? Why two? It would stand to reason that two should be the least you would want, but not necessarily the optimum number.


Quote: Original post by Kylotan
They're [poker hands] a poor example of what you'd use a GA for, because of the heavily discontinuous search space and impractical representation. Crossover would give you some benefit over random search, but not much. The same goes for mutation here.


You could make mutation work without having to worry about how 'orthogonal' the encoding was. If your GA evolved a compressed representation of a poker hand (which would be expanded later during evaluation), or your evaluation function was dynamic, you could reach a global maximum from any starting position. The drawback would be wasted space in your compressed representation of the solution (like unreachable code).

For this to happen, cross-over would not work so well.


Quote: Original post by Kylotan
There's no reason why not. GAs don't care about how many best solutions there are, they care about the shape of the solution space.


Cross-over cares about how many best solutions there are.

Maybe we're in disagreement because of our points of view. It would seem a system designed to work best with mutation will be a poor system for cross-over. It would only seem to work one way, though: would a system designed to work with cross-over limit the effectiveness of mutation?

Will

------------------http://www.nentari.com
Quote: Original post by RPGeezus
would a system designed to work with cross-over limit the effectiveness of mutation?


Quite the opposite. Crossover enhances mutation by providing a rapid means of distributing beneficial mutations into the population and by providing the mechanism to maximise the correlation between beneficial mutations and beneficial substrings (good-quality partial solutions).

Will, I definitely think you should read up on Holland's schema theorem.

Cheers,

Timkin
Quote: Original post by RPGeezus
And "Cross-over alone, by neccessity, excludes many types of problems from the mix as encoding is a limiting factor."? :D


Of course it does. But that applies to all methods. Some methods can be stretched to 'work' in any given situation at the expense of optimality. E.g. mutation can be broadened so that it becomes little better than a random search; then it is guaranteed to work... eventually... but it is then no better than picking random solutions from a hat. All methods have situations they are good at and those they are bad at.

Quote:
Quote: Original post by Kylotan
I say no to the 3rd [less cycles] because crossover - where applicable - makes better use of the information encoded in your population than mutation does.


_IF_ all other requirements are met.


Search is all about matching a method to the requirements. There is no one method that is statistically better than any other when measured over all possible problem types. Wolpert and Macready's work (the 'No Free Lunch' theorems) will elaborate on this if you're interested.

Quote: You could make mutation work without having to worry about how 'orthogonal' the encoding was.


Yes, you could, but that's not the point. You can make a linear search work without worrying about the encoding either. Or a random search. But those are rarely optimal. Neither is relying purely on mutation in many situations. The question is, is this the best approach for the given problem? In your problem, with your representation, crossover is not much use. That's not a problem with crossover.

Quote:
Quote: Original post by Kylotan
There's no reason why not. GAs don't care about how many best solutions there are, they care about the shape of the solution space.


Cross-over cares about how many best solutions there are.


No, it doesn't, and you've not produced evidence to back that assertion. The number of separate maxima is irrelevant. What is important is how the fitness landscape is shaped, as that dictates how the solutions will converge. As far as shape is concerned, it's assumed that there are numerous maxima (best or otherwise), each of which is handled identically. If there were only one maximum, and you knew this, then you'd just use hill-climbing or something simple like that.
Quote: Original post by Timkin
Quote: Original post by RPGeezus
would a system designed to work with cross-over limit the effectiveness of mutation?


Quite the opposite.


My question was rhetorical. I've been saying all along that mutated clones are king. ;)

Quote: Original post by Timkin
Crossover enhances mutation by providing a rapid means of distributing beneficial mutations.


Sure, I agree with you. It can, under the right circumstances. I'm not arguing about whether it can work, just that the requirements for it to happen are so huge, and mutation does not have those same requirements.

Again, I have to ask, why limit cross-over to two parents? Why not 5, 6, or 30?

I'll take a stab in the dark and propose that the number of dimensions in your solution space (if one could even know this) is also the optimal number of 'parents' per 'child'.


Quote: Original post by Kylotan
Mutation can be broadened so that it becomes little better than a random search; then it is guaranteed to work... eventually... but it is then no better than picking random solutions from a hat.


I think you're downplaying the utility of mutation. Cross-over will not work without mutation-- as Timkin has said, cross-over enhances mutation.

Quote: Original post by Kylotan
The number of separate maxima is irrelevant


I disagree, and here is why.

Separate global maxima are more likely to have solutions which are NOT related. Taking components from a solution at one maximum and mixing them with another is very likely to only give you a solution that is at NEITHER maximum.

The 'shape' of your solution space will not be conducive to the crossing of solutions if there is more than one maximum.

Here is a poor ASCII picture of a hypothetical solution space:

     A             B
    .'.           .'.
...'   '.........'   '.....


Any point between maximum A and maximum B is more inclined to be further away from either maximum.

If, on the other hand, a mutation occurs on a member already close to a maximum, it is more likely to still be close to that maximum.

The more maxima you have, the worse this problem gets.

Maybe an easy way of thinking about it is to look at it like image-processing operators-- you can blur and you can sharpen. At some point blurring will be a bad thing... We want sharp.

Quote: Original post by Kylotan
If there were only one maximum, and you knew this, then you'd just use hill-climbing or something simple like that.


That's what I said. :)

Quote: Original post by RPGeezus
if a problem only had ONE best solution, you probably wouldn't be using something like GAs to find it...



I brought this up because of the Poker example, to which you replied:

Quote: Original post by Kylotan
They're a poor example of what you'd use a GA for, because of the heavily discontinuous search space


What makes the search space so discontinuous? Multiple 'best' answers (aka global maxima).


Will
------------------http://www.nentari.com
Sorry for skimming over this thread :)

What about this instead of a genetic algorithm:

Start with a known, solved Sudoku puzzle. You could have a list of these, but one is enough (even if it's a trivial one). Then iterate over a few hundred thousand modifications to this puzzle; the only constraint is that after each one the puzzle must still be considered solved. Things like switching around numbers and stuff. After this, choose some numbers to hide randomly and voila!

This is a good recipe for most puzzle games, I'm sure it will work just fine for Sudoku :)
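
Something like this rough Python sketch, just to show the idea (the seed pattern, the particular shuffles, and the number of blanks are arbitrary choices I made up for illustration; a real generator would also want to check that the puzzle still has a unique solution before handing it out):

    import random

    def seed_grid():
        # A known-valid solved grid built from a simple modular pattern.
        return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
                for r in range(9)]

    def shuffle_solution(grid, iterations=100000):
        # Apply random transformations that keep the grid a valid solution.
        for _ in range(iterations):
            op = random.randrange(3)
            if op == 0:
                # Relabel: swap two digit symbols everywhere.
                a, b = random.sample(range(1, 10), 2)
                for r in range(9):
                    for c in range(9):
                        if grid[r][c] == a:
                            grid[r][c] = b
                        elif grid[r][c] == b:
                            grid[r][c] = a
            elif op == 1:
                # Swap two rows within the same band of three.
                band = 3 * random.randrange(3)
                r1, r2 = random.sample(range(band, band + 3), 2)
                grid[r1], grid[r2] = grid[r2], grid[r1]
            else:
                # Swap two columns within the same stack of three.
                stack = 3 * random.randrange(3)
                c1, c2 = random.sample(range(stack, stack + 3), 2)
                for r in range(9):
                    grid[r][c1], grid[r][c2] = grid[r][c2], grid[r][c1]
        return grid

    def make_puzzle(grid, blanks=50):
        # Hide some cells (0 = empty). No uniqueness check here.
        puzzle = [row[:] for row in grid]
        cells = [(r, c) for r in range(9) for c in range(9)]
        for r, c in random.sample(cells, blanks):
            puzzle[r][c] = 0
        return puzzle

    puzzle = make_puzzle(shuffle_solution(seed_grid()))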
Quote: Original post by Kylotan
Mutation produces a new solution fundamentally similar to the old solution. If it is near a local maximum, then it is likely to approach that local maximum and survive, or move away from it and die.

Crossover produces a new solution, which is - for some definition of distance appropriate to your representation - located somewhere between 2 existing solutions, which may be very distant in terms of search space. Both may be in local maxima, but if those maxima are distinct then the offspring may be in neither, meaning you've 'broken free'.

Now, you might say "but I can do that with large mutations too". The problem there is that large variations as a result of mutation make it hard to exploit a local maximum (which might actually be the global one). Imagine your whole population, or a large part of it, is already near the global maximum: wild mutation will send their offspring all over the place. Crossover won't do that.


I wouldn't propose making all mutations massive. Instead I would replace crossover with a mutation operator with a random size, so some mutations would be slight, allowing exploitation, and some would be large, allowing exploration.
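
Something like this, as a rough sketch in Python (the bit-string representation and the size distribution are placeholders I've picked for illustration; anything where small mutations are common and big ones are rare would do):

    import random

    def variable_size_mutation(chromosome, max_fraction=0.5):
        # Flip a random number of bits: usually just a few (exploitation),
        # occasionally many (exploration / "macromutation").
        child = chromosome[:]
        size = 1
        limit = max(1, int(len(child) * max_fraction))
        while size < limit and random.random() < 0.5:
            size += 1
        for i in random.sample(range(len(child)), size):
            child[i] = 1 - child[i]
        return child

    parent = [random.randint(0, 1) for _ in range(32)]
    child = variable_size_mutation(parent)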

The only unique benefits I can see with crossover are, first, the potential that crossing the best attributes of two parents will produce even fitter offspring. But I wonder how likely this is compared to the chance of macromutation producing fitter offspring. Second, crossover allows changes to spread faster (but only if crossover is producing anything useful).


Quote: Original post by RPGeezus
I've been saying all along that mutated clones are king. ;)


You're missing the point. Clones occur most frequently when the population is near convergence, irrespective of what point in the state space the population has converged upon.

Quote:
Quote: Original post by Timkin
Crossover enhances mutation by providing a rapid means of distributing beneficial mutations.


Sure, I agree with you. It can, under the right circumstances.


You seem to be convinced that crossover works only under very limiting constraints and in very narrow contexts. This point of view simply isn't supported by the literature, nor the mathematics. Crossover is a more effective operator for searching the state space in a directed manner (where the direction is provided by the population, which represents the successful aspects of the search so far) than mutation is. Mutation is not directed at all and thus MUST rely on some other operator, such as selection, for it to work. Otherwise, mutation is just drawing rabbits out of the hat, hoping to get the right one.

Quote:
Again, I have to ask, why limit cross-over to two parents? Why not 5, 6, or 30?


Because crossover between two parents preserves as many schemata presently in the population as possible, while possibly forming new schemata which can be evaluated during selection.

Quote: I'll take a stab in the dark and propose that the number of dimensions in your solution space (if one could even know this) is also the optimal number of 'parents' per 'child'.

That doesn't make sense to me. The 'solution' is a point in the state space (or any point within a finite volume). This point (or volume) will have as many dimensions as the state space (unless it is a submanifold of the state space, in which case you encoded too much information into the problem and should have mapped the input space first to a lower dimension).

Quote:
I think you're downplaying the utility of mutation. Cross-over will not work without mutation-- as Timkin has said, cross-over enhances mutation.

You're putting words into my mouth that I never intended. You're suggesting that crossover's role is only to enhance mutation, which you suggest is doing the grunt work of the search. Quite the opposite is true. Crossover does the grunt work, mutation just adds the spice (and another stochastic element in the generation of candidate solutions).

Quote:
Quote: Original post by Kylotan
The number of separate maxima is irrelevant


I disagree, and here is why.

Separate global maxima are more likely to have solutions which are NOT related. Taking components from a solution at one maximum and mixing them with another is very likely to only give you a solution that is at NEITHER maximum.

That totally depends on the volume of state space that the maxima encompass.
Quote:
The 'shape' of your solution space will not be conducive to the crossing of solutions if there is more than one maximum.

Again not true. GAs work very well on multimodal objective functions. In fact, they work better than many other techniques. The rate of convergence on the true global maximum will be proportional to the gradient on the submanifold that connects each of the peaks of the local maxima.

Quote: Here is a poor ASCII picture of a hypothetical solution space:

     A             B
    .'.           .'.
...'   '.........'   '.....


And if A were slightly higher than B, or vice versa, mutation would typically fail to solve this problem, since selection pressure would mitigate any beneficial mutations that enabled a single candidate solution to hop into the other maximum. The probability of reproducing that single chromosome would be insignificant unless the population was already well populated by candidates within the confines of the region housing the global optimum. Again, in this situation, crossover will find the maximum more quickly than mutation will.

Quote: Any point between maximum A and maximum B is more inclined to be further away from either maximum.

Any candidate solution generated that fell outside of A or B would be rejected very quickly by selection. The population consists of more than 1 candidate solution for just this reason... so that failed attempts don't significantly affect convergence speed.
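
If you want to convince yourself, a toy experiment along these lines is easy to knock up (purely illustrative Python, nothing tuned; the two-peak fitness function just mimics the A/B picture above, with the peak near x = 0.8 slightly higher):

    import math
    import random

    BITS, POP, GENS = 16, 60, 100

    def decode(bits):
        # Map a bit string onto x in [0, 1].
        return int("".join(map(str, bits)), 2) / (2 ** BITS - 1)

    def fitness(bits):
        # Two narrow peaks: a local maximum near x = 0.2, the global one near x = 0.8.
        x = decode(bits)
        return (0.9 * math.exp(-((x - 0.2) ** 2) / 0.002)
                + 1.0 * math.exp(-((x - 0.8) ** 2) / 0.002))

    def tournament(pop):
        # Selection: best of three randomly chosen candidates.
        return max(random.sample(pop, 3), key=fitness)

    def crossover(a, b):
        # Single-point crossover between two parents.
        cut = random.randrange(1, BITS)
        return a[:cut] + b[cut:]

    def mutate(bits, rate=1.0 / BITS):
        # Low-level mutation: flip each bit with small probability.
        return [b ^ (random.random() < rate) for b in bits]

    pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
    for _ in range(GENS):
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(POP)]

    best = max(pop, key=fitness)
    print(decode(best), fitness(best))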

Cheers,

Timkin
Quote: Original post by RPGeezus
Separate global maxima are more likely to have solutions which are NOT related. Taking components from a solution at one maximum and mixing them with another is very likely to only give you a solution that is at NEITHER maximum.

...

Any point between maximum A and maximum B is more inclined to be further away from either maximum.


Yes, but for every pairing of solutions that are near the top of different maxima, there are likely to be several pairings that are on either side of the same maximum, and end up exploiting that one as a result. Just as with mutation, the selection operator is necessary here to augment the procedure.

Quote: If, on the other hand, a mutation occurs on a member already close to a maximum, it is more likely to still be close to that maximum.


On the other other hand, if you're on a local maximum, and the global maximum is distant from it in search space, a mutation is less likely to find that global maximum. Neither method is 'better'; neither works well without the other.

Quote:
Quote: Original post by Kylotan
If there were only one maximum, and you knew this, then you'd just use hill-climbing or something simple like that.


That's what I said. :)


Maybe I wasn't clear. GAs are perfectly well suited to problems with one single maximum value, but often, if you know in advance that there is only one, it may be worth trying something else, especially if you also know that the fitness landscape is fairly smooth. Quite often it might be the case that there is only 1 global maximum but you don't know about that beforehand - you just want to find as good an answer as possible in a certain time.

Quote:
Quote: Original post by Kylotan
They're a poor example of what you'd use a GA for, because of the heavily discontinuous search space


What makes the search space so discontinuous? Multiple 'best' answers (aka global maxima).


No... a problem can have an infinite number of best answers and yet still be totally continuous. The mathematical definition of continuous is important here. E.g. what value of x gives you the highest result from sin(x)? In this example, a tiny change in x yields a tiny change in sin(x). Poker is totally different - the smallest possible change, such as changing one card up or down a suit or number, changes the value of the hand significantly in most cases.
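
To make that concrete, here's a throwaway Python sketch (the hand scoring is deliberately crude and ignores straights and flushes; it's only there to show the jump in hand category):

    import math
    from collections import Counter

    # Continuous: a tiny change in x gives a tiny change in sin(x).
    x = 1.2
    print(math.sin(x), math.sin(x + 1e-4))  # almost identical values

    # Discontinuous: the smallest change to a poker hand can jump categories.
    def crude_rank(hand):
        # Count how many cards share each rank (suit is the last character).
        counts = sorted(Counter(card[:-1] for card in hand).values(), reverse=True)
        if counts[0] == 4:
            return "four of a kind"
        if counts[0] == 3 and counts[1] == 2:
            return "full house"
        if counts[0] == 3:
            return "three of a kind"
        if counts[0] == 2 and counts[1] == 2:
            return "two pair"
        if counts[0] == 2:
            return "one pair"
        return "high card"

    print(crude_rank(["9h", "9s", "9d", "9c", "2c"]))  # four of a kind
    print(crude_rank(["9h", "9s", "9d", "8c", "2c"]))  # one card changed: three of a kind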
Quote: Original post by RunningInt
The only unique benefits I can see with crossover are, first, the potential that crossing the best attributes of two parents will produce even fitter offspring.


That's the way I see it too. There is some potential, if everything is in place.

Quote: Original post by RunningInt
But I wonder how likely this is compared to the chance of macromutation producing fitter offspring.


Good question.



Quote: Original post by Timkin
You seem to be convinced that crossover works only under very limiting constraints and in very narrow contexts.


For cross-over to work we must satisfy some not-so-trivial criteria. I thought we were all in agreement in this regard, at least semantically.


Quote: Original post by Timkin
This point of view simply isn't supported by the literature, nor the mathematics.


It certainly is, as you have pointed out. Just because you would think of using GAs for a non-orthogonally encoded problem as a faux pas doesn't mean those problems cease to exist.

Maybe you disagree with me because you feel like I'm hitting a thumb-tack with a sledge-hammer (i.e. wrong tool for the job)? :)

Quote: Original post by Timkin
Mutation is not directed at all and thus MUST rely on some other operator, such as selection, for it to work. Otherwise, mutation is just drawing rabbits out of the hat, hoping to get the right one.


We need selection anyway. How is this a bad thing?

Quote: Original post by Timkin
Quote:
Again, I have to ask, why limit cross-over to two parents? Why not 5, 6, or 30?


Because crossover between two parents preserves as many schemata presently in the population as possible, while possibly forming new schemata which can be evaluated during selection.


That doesn't make any sense to me, Tim. Why would 3 parents be less effective than 2? If I cut a pie in 2 or in 10, it's still all part of the same pie. The number '2' would be suspicious to someone looking for evidence of anthropomorphic biases...

Quote: Original post by Timkin
Quote:
I think you're downplaying the utility of mutation. Cross-over will not work without mutation-- as Timkin has said, cross-over enhances mutation.

Crossover does the grunt work, mutation just adds the spice (and another stochastic element in the generation of candidate solutions).


My apologies. Didn't mean to take your words out of context.

Will cross-over, without mutation, ever find a solution?

Will mutation, without cross-over, ever find a solution?


Quote: Original post by Timkin
Again not true. GAs work very well on multimodal objective functions. In fact, they work better than many other techniques. The rate of convergence on the true global maximum [.....]


We're talking about a problem with more than one global maximum. In the hypothetical example there is more than one true global maximum.


Quote: Original post by Timkin
And if A were slightly higher than B, or vice versa, mutation would typically fail to solve this problem,


There are ways to allow mutation alone to climb out of local maxima. I think that's what alleles, in biology, would facilitate.


Quote: Original post by Kylotan
Yes, but for every pairing of solutions that are near the top of different maxima, there are likely to be several pairings that are on either side of the same maximum


I agree with you. I was narrowing my view to a specific case of two maxima to illustrate a point.

Would you agree that, as the number of global maxima increases, the odds of a correct pairing go down? I'm not sure where I'm going with this. lol. :) So being to the left or the right of a maximum could still put you to the left or the right of yet another maximum.... Does any of what I'm saying make any sense? :D


Quote: Original post by Kylotan
On the other other hand, if you're on a local maximum, and the global maximum is distant from it in search space, a mutation is less likely to find that global maximum. Neither method is 'better'; neither works well without the other.


Yes. I agree with you, except that you can use mutation without cross-over. Cross-over will spread the mutation across members, at the beginning, until the population starts to settle on one solution. Selection / cloning will spread the information too, just not between different members.

Will




------------------http://www.nentari.com
Quote: Original post by RPGeezus
For cross-over to work we must satisfy some not-so-trivial criteria. I thought we were all in agreement in this regard, at least semantically.

No, I don't see agreement on this point. In my experience, crossover will always be effective on problems that are solvable by a directed random walk, and crossover will perform better than said random walk. On problems that are not solvable by a directed random walk, a random walk is your only option and so the point is moot: nothing other than mutation (which in that case is semantically equivalent to randomly generating a solution) will work. Such problems are typically identified by very narrow, sparsely distributed optima in the objective space, which have always been very difficult to solve using optimisation techniques. If you want a good solution for such problems, check out Terence Kwok's work on combining simulated annealing and self-organising maps. He uses edge-of-chaos behaviour to generate all solutions of problems such as N-queens very rapidly.


Quote: Original post by RPGeezus
It certainly is, as you have pointed out. Just because you would think of using GAs for a non-orthogonally encoded problem as a faux pas doesn't mean those problems cease to exist.


I never said you couldn't or shouldn't use them on such problems. Only that such problems will be more conducive to being solved by GAs that use single point crossover.


Quote:
Quote: Original post by Timkin
Because crossover between two parents preserves as many schemata presently in the population as possible, while possibly forming new schemata which can be evaluated during selection.


That doesn't make any sense to me, Tim. Why would 3 parents be less effective than 2? If I cut a pie in 2 or in 10, it's still all part of the same pie. The number '2' would be suspicious to someone looking for evidence of anthropomorphic biases...


Okay, a brief introduction to Schema Theory...

Consider the two binary strings:
    101100110
    110101100

If we consider the set of possible offspring generated by crossover, we get the following, where '#' means 0 or 1:
    1##10#1#0

This is a schema (a template) for the result of crossover. If we add a third member to the population, say '010111000', then the set of schemata represented by the population, taken pairwise, is
    1##10#1#0
    ###1####0
    #101#1#00

These schemata represent the possible next generation of candidate solutions (ignoring for the moment mutation) that the GA can produce using pairwise crossover (they are a roadmap as to where one might find an optimum). From this perspective, we can see how crossover preserves associations of bits present in a schema. Holland showed that it is indeed schemata (and their relative fitness) that represent the quality of knowledge regarding the optimal solution. It should be fairly obvious that '#########' represents a purely random solution set spanning the entire state space, which holds no useful knowledge of where to look for the optimal solution, since the fitness of members of this set encompasses the entire objective function.

So, now take our original 3 candidate solutions as a triplet. The schema resulting from this tuple is '###1####0', which, as you can see, is the worst (most general) schema from the pairwise set. Candidate solutions generated by producing offspring from a 3-way crossover would be randomly selected from this set and thus would randomly select one of the possible fitness values spanned by this set. The probability of producing good-quality offspring would be less than if pairwise crossover were performed.
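
You can check the schemata above with a few lines of code if you like (a quick Python sketch, nothing more):

    def schema(*strings):
        # Keep a bit where all parents agree; use '#' where they differ.
        return "".join(bits[0] if len(set(bits)) == 1 else "#"
                       for bits in zip(*strings))

    a, b, c = "101100110", "110101100", "010111000"

    print(schema(a, b))     # 1##10#1#0  (pairwise)
    print(schema(a, c))     # ###1####0
    print(schema(b, c))     # #101#1#00
    print(schema(a, b, c))  # ###1####0  (three-way: the most general of the lot)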

GAs preserve good-quality schemata and indeed, they search through the schemata present in the initial population to find an optimal solution (so you can see the importance of the initial population).

Mutation disrupts a schema. However, if used at a low level, it does not generally remove a schema from the population. Furthermore, mutation can help to produce schemata that might not have been present in the initial population, or that might have died out along the way due to unintended selection biases.

Understanding schemata is the key to understanding how the canonical operators affect the search of the state space... and thus, how changing those operators will affect the efficiency of the search.

Quote:
Will cross-over, without mutation, ever find a solution?

It will find the best solution available given the initial information contained in the population. See above.
Quote:
Will mutation, without cross-over, ever find a solution?

Possibly... and that is the best that can be said. There are no guarantees that you will stumble across the solution. Indeed, if your random number generator is poor, or your state space large (or both), you might go round in circles and never find it.

Quote:
We're talking about a problem with more than one global maxima. In the hypothetical example there is more than one true global maxima.

If they are both (all) global maxima, then you cannot care which global maximum you attain, since from the point of view of optimisation, you have found the global maximum whenever you find any of the global maxima. If your task was to find ALL global maxima then this isn't, strictly speaking, an optimisation problem, and very few optimisation algorithms will be useful, since such problems are typically NP-hard (and I think in that case the discussion is moot). See the aforementioned work of Terry Kwok for how to tackle such problems effectively.


Cheers,

Timkin
