
Recommend an algorithm for this?

Started by May 21, 2008 06:07 AM
47 comments, last by Timkin 16 years, 5 months ago
Quote: Original post by alvaro
If we use MML or a Bayesian approach (which in my view are optimization problems where the function to maximize is likelihood)

That would just be maximum likelihood estimation (ML) and not what I was advocating... although you could get it to work (this is basically the naive Bayes classifier).
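(For readers following along, a quick sketch of the distinction, in the P(H), P(D|H) notation used later in the thread: maximum likelihood picks the hypothesis that best explains the data alone, while the naive Bayes classifier also weights by the prior and assumes conditionally independent attributes:

    ML:           H^* = \arg\max_H P(D \mid H)
    naive Bayes:  C^* = \arg\max_C P(C) \prod_i P(x_i \mid C)

The second form is what you get if you bolt a prior and an independence assumption onto a likelihood-only search.)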

Quote: I don't know enough about those methods to make good choices


That's what research publications are for! ;)

Quote: By putting it in code, you have an opportunity to show us what a reasonable application of those methods to this case would look like. I am honestly interested in seeing that code.


I doubt that seeing it in code would be any more useful in aiding your understanding than reading any of the vast quantities of publications or books on the subject. Indeed, code will simply obfuscate the issues as it isn't a natural language for the expression of mathematical or logical principles.

Unfortunately I don't have the time to sit down and write my own code for an MML solution at the moment. I have 3 other unfinished code projects on my desk, marking to do, 2 papers I'm trying to write, and I'm about to have to mark nearly 300 exam papers, so things aren't going to get any better for me any time soon. Hence my suggestion of using pre-written code for those who want empirical results.

Quote: That depends on your definition of "correct". I inspected the results, and it looks like indeed it produces groupings where the averages are all close.


If all that was required was something that 'looks right' then you could probably get away with any number of sub-optimal or approximation methods and we could have avoided this whole conversation in the first place! ;)

Quote: I believe there is some theorem that can be used to prove that the probability of SA finding the global optimum tends to 1 as the number of iterations tends to infinity.


That assumes, essentially, that you have an unbiased generator of starting states such that in the limit, as the number of restarts goes to infinity, you cover all possible trajectories through the optimisation space. Clearly this isn't a reasonable limit or a useful theory. What we'd like to know is: given a number N of restarts, what is the expectation of the ratio of the current best solution to the global solution... and we'd like to know at what rate this ratio converges (or even that it converges uniformly to 1). I don't think this is computable in the original problem space... but I digress... this isn't particularly relevant (sorry).
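(For reference, the result alvaro is probably thinking of is the classical annealing convergence theorem: under a sufficiently slow logarithmic cooling schedule,

    T_n \ge c / \log(n + 1)   for a sufficiently large constant c,

the probability that the chain sits in the set of global optima tends to 1 as n \to \infty. Note that this schedule is far slower than the geometric cooling used in practice, which rather supports the point that the limiting guarantee isn't a useful one.)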

Quote: Original post by Timkin
I doubt that seeing it in code would be any more useful in aiding your understanding than reading any of the vast quantities of publications or books on the subject. Indeed, code will simply obfuscate the issues as it isn't a natural language for the expression of mathematical or logical principles.


Boy, do we have different perspectives on this!

My formal training was as a mathematician, and I always thought it was a great flaw of Mathematics that it is usually expressed using natural language, and that although we have Formal Logic, nobody has developed practical higher-level languages that all mathematicians can use to help in the verification of proofs. To make this objection clear, I used to draw a parallel with the development of algorithms: one could talk all day about algorithms using language similar to that of Math, or one could actually code the algorithm and get much better assurance that it indeed works. Since higher-level programming languages do exist, the latter is what actually gets done.

In any case, this is a forum for game developers, and unless someone explicitly says that they are interested in the theory behind a situation, I think we should assume they mean to find a practical solution to a real-world problem. Code matters.
Quote: Original post by Timkin
An MML (Minimum Message Length) approach to clustering/classification creates an information theoretic measure over the description of a cluster and the probability that the given data was generated by that description, P(H)P(D|H) (H: hypothesis, D: data). Many of the Bayesian techniques can, iirc, be shown to map back to MML or MDL (Minimum Description Length), an approximation to the strict MML solution. Solving the problem in an information space avoids the nastiness of using local optimisation in the problem space, because you're finding a globally optimal solution.
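(For readers unfamiliar with the notation: the quantity MML minimizes is the length of a two-part message, prior plus data-given-hypothesis,

    MsgLen(H, D) = -\log P(H) - \log P(D \mid H)

so minimizing the message length is equivalent to maximizing P(H)P(D|H).)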


This makes sense to me. The problem I would have would be establishing the probabilities up front so that a closed form solution could be found. For example, Bayesian classifiers have often been a problem for me as the underlying distribution was neither known nor easily definable. Many papers I've read make assumptions of normality as well as the associated parameters, but in my field (computational chemistry) those assumptions aren't necessarily valid.

Back to the topic at hand, I would have a hard time defining the appropriate probability distribution functions to be able to find such a closed form solution. I would, however, be very interested in seeing what you think might be an appropriate form for this type of problem.

-Kirk

Quote: Original post by alvaro
Boy, do we have different perspectives on this!

That's quite funny, especially since my formal training is in mathematics as well (and physics and, amusingly, philosophy too... yes, I'm a geek with a triple major ;) ). Of course, I HATE the use of natural language when discussing mathematics, particularly as I work with engineers and IT people (and have to teach them 8( ), and almost none of them actually understand mathematics.

Quote: In order to make this objection clear, I used to draw a parallel to the development of algorithms, where one could talk all day about algorithms using language similar to that of Math, or one could actually code the algorithm and have a much better idea that it indeed works fine.

My objection to this is that as soon as you write an algorithm down in anything other than an algorithmic expression language, you suffer from the limitations of the language you use. Computer languages are designed for computers to understand, NOT humans. I therefore don't believe (and I accept that I am in the minority when standing in a crowd of programmers) that you can learn solutions (and solution methodologies) efficiently and effectively by 'reading code', or by discussing what code is doing. The problem we have is that most programmers and software engineers don't know enough mathematics for it to be meaningful to talk to them at that abstract level (because they simply cannot ground it in terms that they understand). Thus, we're stuck in the midfield. I'd prefer to drag them kicking and screaming to the maths end, whereas most of them are staunchly seated (with roots sinking deeply) at the other end of the field! ;)



Quote: In any case, this is a forum for game developers, and unless someone explicitly says that they are interested in the theory behind a situation, I think we should assume they mean to find a practical solution to a real-world problem.

Ah, so now the issue is that maths isn't, in your opinion, practical!? ;)

Quote: Code matters.


Only when it comes to working implementations. Code is the expression of a solution to a problem; it isn't the solution. If one doesn't first solve the problem in the design phase, then the implementation won't get it correct either. This is, of course, the biggest failing of the software industry... coding before knowing what it is that the code is supposed to do. (That's not to say that I'm a fan of the Waterfall methodology... monolithic design docs, particularly in games, are as much a hindrance as an aid to good design.)

Anyway, this is diverging into a philosophical discussion and probably one best left for another thread.

kirkd: If the attributes are uncorrelated and the attribute values uniformly distributed, then we shouldn't expect any spurious clumping in the matrix above. That is, the prior expectation of the mean of any group is independent of the elements assigned to the group. Given more data, an assumption of Normally distributed deviations would be reasonable due to the Central Limit Theorem. Given such a small data set, though, this may not hold. We may also have correlations between attributes and non-uniformly distributed attribute values.

I would suspect then that a Dirichlet prior would be the next best guess, as this allows you to model clumping of data around points, where the clumps have a parameterised distribution and so do the clump means. Dirichlet priors have received a lot of attention in the Bayesian literature. David Dowe (at Monash), working with (the now deceased) Chris Wallace (inventor of MML), has done a lot of work on choosing distributions and has developed MML formulae for many cases.

One good measure for deciding whether the prior was well chosen is how easy it is to obtain an answer using that prior. Typically, a bad choice of prior will require more work to resolve the deficiency when computing the posterior given the data.
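(For reference, the standard definition: a Dirichlet prior over a probability vector \theta = (\theta_1, \ldots, \theta_K) with concentration parameters \alpha_i > 0 has density

    p(\theta \mid \alpha) = \frac{\Gamma(\sum_i \alpha_i)}{\prod_i \Gamma(\alpha_i)} \prod_i \theta_i^{\alpha_i - 1}

Its conjugacy with the multinomial is one concrete sense in which a well-chosen prior makes the answer easy to obtain: observing counts n_1, \ldots, n_K updates the posterior to Dirichlet(\alpha_1 + n_1, \ldots, \alpha_K + n_K) with no extra work.)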

Ultimately, the need to select a suitable prior is one of the biggest deficiencies of Bayesian methods and the one that draws the most criticism to the field.

As for finding a closed form solution, that typically isn't possible for strict MML... one is usually reduced to solving an approximation to the strict solution.

Regards,

Timkin
I ejected from this conversation early on since it seemed to be more of a religious debate than anything else. However, it DID start reminding me of a blog post that I made about someone else's blog post (and isn't that the way of the world these days?).

The point is... at what point is the quixotic stretch for the asymptote no longer necessary because it is no longer even perceptible in the environment in question?

Again... put down the slide rule and take a gander at the context. What ARE we trying to accomplish? If we were sorting out teams on the ball field, we wouldn't be able to get things "exactly right". At some point, we would look around at the groupings and say "yeah... that'll work." Really, it comes down to the same thing we all learned in Jr. High science class about "significant digits" in your calculations. There comes a time when you have to remind Lt. Data (or Mr. Spock for you old timers) that if we ask how many days it will take us to arrive at our destination, we don't care about fractional seconds.

What is an efficient way of getting a decent result? (And remember, in the game development world, the former is more valuable than the latter.)

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

Quote: Original post by Timkin
Ultimately, the need to select a suitable prior is one of the biggest deficiencies of Bayesian methods and the one that draws the most criticism to the field.


That has certainly been my experience.


Quote: Original post by Timkin
As for finding a closed form solution, that typically isn't possible for strict MML... one is usually reduced to solving an approximation to the strict solution.


Does this then degrade into an optimization problem? My perception is that at this point we would be attempting to minimize an error function through an iterative method, much like numerical differentiation or integration, where we're essentially following a gradient until we cross a defined stopping criterion or error threshold. I'm sure that it is not technically an optimization problem, but it starts to look like a duck at some point.
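(A generic illustration of the kind of iterative error minimization described above; the quadratic error function, step size and tolerance here are illustrative placeholders, not anything from the MML literature:)

    import java.util.function.DoubleUnaryOperator;

    public class DescentSketch {
        // Follow a numerically estimated gradient downhill until the
        // slope falls below a tolerance (the stopping criterion).
        static double minimize(DoubleUnaryOperator f, double x,
                               double step, double tol, int maxIters) {
            final double h = 1e-6; // finite-difference half-width
            for (int i = 0; i < maxIters; i++) {
                double grad = (f.applyAsDouble(x + h) - f.applyAsDouble(x - h)) / (2 * h);
                if (Math.abs(grad) < tol) break; // close enough to a stationary point
                x -= step * grad;                // step against the gradient
            }
            return x;
        }

        public static void main(String[] args) {
            // Toy error function (x - 3)^2; the minimizer should return ~3.0
            double best = minimize(x -> (x - 3) * (x - 3), 0.0, 0.1, 1e-9, 100000);
            System.out.println(best);
        }
    }

It degrades in exactly the way described: the answer depends on the starting point, the step size and the tolerance, which is why it starts to look like a duck.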

-Kirk

Quote: Original post by alvaro
Here's a simple C++ implementation of simulated annealing to minimize the global function that I suggested. It took me about an hour to write.


Thanks alvaro, I'll compare it against my solution to see if I can make mine better.


If you're interested, here's my solution; excuse the very hackish code, I was hurrying to get this done.
    public void calculate() {
        // set the initial temperature
        float temperature = 10;

        // anneal until the temperature falls below a threshold
        while(temperature > 0.1f) {
            // run a fixed number of swap attempts per temperature step
            for(int i = 0; i < 50; i++) {
                // pick two distinct students from different groups to swap
                int studentIndex1 = 0;
                int studentIndex2 = 0;
                while(studentIndex1 == studentIndex2 ||
                      students.get(studentIndex1).getGroup().equals(students.get(studentIndex2).getGroup())) {
                    studentIndex1 = generator.nextInt(students.size());
                    studentIndex2 = generator.nextInt(students.size());
                }

                Student s1 = students.get(studentIndex1);
                Student s2 = students.get(studentIndex2);

                int gsInd1 = s1.getIndexInGroup();
                int gsInd2 = s2.getIndexInGroup();

                Group g1 = s1.getGroup();
                Group g2 = s2.getGroup();

                // remember the scores before the swap
                float oldScore1 = g1.getScore();
                float oldScore2 = g2.getScore();

                g1.swapStudent(gsInd1, g2, gsInd2);

                g1.calculateScore();
                g2.calculateScore();

                // a positive difference means the swap made things worse
                float scoreDif1 = g1.getScore() - oldScore1;
                float scoreDif2 = g2.getScore() - oldScore2;
                float scoreDifTotal = (scoreDif1 + scoreDif2) / 2.0f;

                if(scoreDifTotal > 0) {
                    // accept a worsening swap with probability exp(-delta/T)
                    float p = (float)Math.exp((double)(-scoreDifTotal / temperature));

                    if(generator.nextFloat() >= p) {
                        // rejected: swap back and restore the old scores
                        g1.swapStudent(gsInd1, g2, gsInd2);
                        g1.setScore(oldScore1);
                        g2.setScore(oldScore2);
                    }
                }
            }

            // lower the temperature geometrically
            temperature *= 0.999f;
        }
    }
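(For reference, the accept/reject step above is the standard Metropolis criterion: a swap that worsens the average group score by \Delta > 0 is kept with probability

    P(\text{accept}) = e^{-\Delta / T}

so at high temperature almost any swap is accepted, and late in the schedule almost no worsening swap survives.)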


Quote: Original post by Timkin
Throwing your code at the problem won't answer any of the questions as to whether the problem is solvable using direct optimisation.


If clustering isn't too difficult I might try implementing that afterwards out of curiosity.
Quote: Original post by kirkd
Does this then degrade into an optimization problem?

No more so than the clustering problem is an optimisation one. The approximate MML formalism is not an iterative (or DP) solution, but rather a replacement of certain terms in the strict MML formalism with approximate terms.
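(For the curious: the best-known example is, as far as I recall, the Wallace–Freeman 'MML87' approximation, which replaces the strict coding construction with a closed-form expression, roughly

    L(\theta, D) \approx -\log h(\theta) + \tfrac{1}{2} \log F(\theta) - \log f(D \mid \theta) + \tfrac{d}{2}(1 + \log \kappa_d)

where h is the prior, F(\theta) the determinant of the Fisher information, d the number of free parameters, and \kappa_d a lattice quantisation constant. It is evaluated directly rather than iterated, which is why it doesn't degrade into a gradient-following search.)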

Quote:
My perception is that at this point we would be attempting to minimize an error function through an iterative method.

Not in this case. See above.

InnocuousFox: I agree completely that if the aim is to find a quick & dirty solution that appears reasonable, then we shouldn't be focussing on a more exacting solution method. As I said in an earlier post, if that is what the OP wanted then we could certainly redirect the discussion toward that. Unfortunately the OP didn't make this requirement clear.

On the other hand, it is often worthwhile asking whether or not there IS an exact solution to a given problem. If we can answer that question a priori, without the need to implement code to decide this question, then we can certainly save some development time in those cases where we deem that no exact solution exists (presuming we wanted one). This discussion started out with some people proposing a solution based on optimisation. My stance is that this is not the right way to go about it. Sure, you can get a solution in this way, but it is ad hoc and tells you nothing about whether you have the right solution to the problem. If you were happy with those conditions then there are far easier approaches to an approximate answer than optimisation. This was the real point of this discussion... if you want the right answer, there is a 'right' way to go about it... if you want any answer, there are simpler ways to go about it.

Cheers,

Timkin
