Recommend an algorithm for this?

anda · 2008-06-02T20:25:41

This is my problem: Say I have a list of 20 people, who I have to split up into 5 groups of 4. Each person has 5 integer attributes (labelled eg 0,1,2,3,4) I want to form the 5 groups where for each of those attributes the average is pretty close compared to the other groups' averages of each of those attributes. eg Group 1 (avg for attribute1 is 4, avg for attribute2 is 3, etc) Group 2 (avg for attribute1 is 4.5, avg for attribute2 is 2.8, etc) etc Can anyone think of any algorithms that might help here? Thank you. edit: I was considering using back tracking... I think thats what it is called. Where I try every way and back track when a solution doesn't work. Might be slow though. [Edited by - anda on May 21, 2008 6:38:14 AM]

Rockoon1

104

May 21, 2008 02:16 PM

Quote: Original post by anda
What do you mean by being minimized and maximized?

You stated that you want to keep the attributes balanced as much as possible.

Essentialy you want to minimize some quantified metric of 'unbalanced' (or maximize some quantified metric of balanced)

But you really havent defined what it really means to be balanced. You need to quantify it in terms other than 'pretty close' because that doesn't seperate differing instances at all.

Considering a single attribute, the 5 averages could be:

2 3 3 5 7

or

1 4 5 5 5

Which one is better? Define it mathematically. Put a single value onto each so that you may perform a comparison.

In terms of GA/SA algorithms, this mathematical definition is called your fitness function and you are trying to either minimize or maximize that value, depending on the context.

I should point out that in the above 2 examples, the average of those averages is 4 for both cases.

You really need to full-stop on your GA/SA work until you figure out how to compare the two cases.

One methodology is to sum up the squared difference between each value and the mean of values, this is a standard practice in mathematics in computing the variance or standard deviation of a sequence.

The mean is 4 in both cases, so each squared difference is (4 - x)^2 in both cases.. summing them you get:

Case 1: 2^2 + 1^2 + 1^2 + 1^2 + 3^2 = 16
Case 2: 3^2 + 0^2 + 1^2 + 1^2 + 1^2 = 12

Under this metric, the second case is "better" but I'm not sure thats what you really want.

anda

Author

148

May 21, 2008 02:19 PM

Quote: Original post by Sneftel
Simulated annealing in a nutshell: You start with a bad solution. Over a series of "rounds", you try to modify it into a good solution. Each round, you think of a way to slightly change the solution, and see how good an idea it is, and maybe do it. Early on in simulated annealing, you're likely to do it no matter how bad an idea it is; near the end, you only do it if it's a good idea.

Thanks, that gives me some very good ideas.

Quote: Original post by Rockoon1

You really need to full-stop on your GA/SA work until you figure out how to compare the two cases.

I think I know how to do that.

1. Calculate the global average of each attribute.

2. For each group determine how badly on each attribute the group's avg differs from the global attribute avgs.

3. I'll have a weight for each attribute determining how important it is, and I'll multiply that weight against the difference between each of the groups' attribute's avg and the global attribute's avg.

4. Then add up all the differences in each group to determine their mismatch value.

Hmm.

[Edited by - anda on May 21, 2008 9:19:50 PM]

IADaveMark

3,839

May 21, 2008 02:43 PM

Wow... I go away for a few hours and... WHOOSH! Many posts! (At least no one whacked me on the head.)

Good call on the SA solution, Sneftel. You had more caffeine that did I.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

Timkin

864

May 22, 2008 01:30 AM

Egads... !!!

I'm all for optimisation approaches when they're needed... even using GAs and SA... but this is a clustering problem, NOT an optimisation problem. You can tell it's not an optimisation problem by the distinct lack of a viable global objective function.

You even have a fairly simple clustering problem... you're dictating that you want 5 classes based on 5 attributes... and you want to minimise the variance in each attribute within a class (find things that are most alike). You have the further constraint to help determine the assignments to classes, that you want to maximise the distance in the attribute space between each class centre (you can treat each attribute individually or use a weighted sum of attributes to measure the distance between classes).

An half-decent clustering algorithm will solve this problem trivially.

Regards,

Timkin

alvaro

21,607

May 22, 2008 07:52 AM

Quote: Original post by Timkin
Egads... !!!

I'm all for optimisation approaches when they're needed... even using GAs and SA... but this is a clustering problem, NOT an optimisation problem. You can tell it's not an optimisation problem by the distinct lack of a viable global objective function.

You even have a fairly simple clustering problem... you're dictating that you want 5 classes based on 5 attributes... and you want to minimise the variance in each attribute within a class (find things that are most alike).

How would that not be an optimization problem? If this type of optimization problem falls into a more specific category, sure it would be good to use a more specific method, but that doesn't stop it from being an optimization problem.

Besides, I think you didn't read the problem description correctly. He wants to get groups with similar means, regardless of the variance within the group. The fact that he didn't have a global objective function just means that he hadn't thought his requirements through.

In practice, I am sure SA would work absolutely fine for this (I've done it for a single attribute and 2 classes before).

kirkd

505

May 22, 2008 10:15 AM

I don't think clustering is the answer. Clustering will group individuals with similar attribute values which will cause each cluster to have quite different average values. Effectively, you want an anti-cluster where the individuals in a cluster have values that distribute uniformly over the possible values for each attribute. This would provide similar averages between groups.

IADaveMark

3,839

May 22, 2008 03:49 PM

Agreed. He does, indeed, have specific objectives about the composition of the groups and how the groups themselves must be similar. If he was just sorting them out by a characteristic or 20, then it would be ONLY a clustering problem. Methinks you did not read/understand the problem properly.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

Timkin

864

May 22, 2008 08:23 PM

Quote: Original post by kirkd
Clustering will group individuals with similar attribute values which will cause each cluster to have quite different average values.

Kirk, did you actually read that sentence once you'd written it?

The definition of a cluster is algorithm independent. I can choose to cluster based on single attributes, or any function of any number of attributes. There does NOT have to be a 1-1 linear mapping from attributes to classes. You have made the mistake of assuming that each object (person in the OPs example) has to exist in only one class. I could just as easily cluster in any functional space defined by any mapping of the attribute/object space.

Alvaro: define for me please the global objective function to be optimised in the OPs problem.

InnocuousFox: clustering is NOT sorting, nor is it 'bin packing'. It's identifying associations between data in a given metric space.

Ultimately, you all seem convinced that optimisation is the way to go with this problem. I believe you're mistaken... you believe I'm wrong. A difference of opinion is fine in this forum. I'm certainly not going to suffer if the OP treats it as an optimisation problem, so be my guest to suggest whatever you like as a solution. Just don't be so quick to dismiss other people and assume they've read the question wrong. 8)

Regards,

Timkin

alvaro

21,607

May 22, 2008 08:58 PM

Quote: Original post by Timkin
Quote: Original post by kirkd
Clustering will group individuals with similar attribute values which will cause each cluster to have quite different average values.

Kirk, did you actually read that sentence once you'd written it?

kirkd's sentence seems fine to me.

Quote: [...] There does NOT have to be a 1-1 linear mapping from attributes to classes. You have made the mistake of assuming that each object (person in the OPs example) has to exist in only one class.

Well, that's part of what "class" means... How is that a mistake?

Quote: Alvaro: define for me please the global objective function to be optimised in the OPs problem.

The OP wasn't specific enough as to what the global objective function is, but here's one that's compatible with what he requested: Given a group G and an attribute A, define the error function E(G,A) as the square of the difference between the average of the elements of G in attribute A and the average of all 20 people in attribute A. The global function to minimize is the sum of the errors in all groups and attributes Total_E = Sum_{G}(Sum_{A}(E(G,A))).

Timkin, don't take this as an offense, but I still believe you didn't read the description of the problem correctly (it can happen to anyone). If you think of the people as points in a Euclidean space with the dimensionality given by the number of attributes, I would understand clustering as identifying groups of points that are nearby. The OP wants the baricenters of the groups to be near each other, which in my intuition is almost the opposite of clustering.

Let's consider a much simpler situation (2 groups, 1 attribute): We have 20 friends that want to play an informal game of soccer. In order to have a good game, we want to make two teams which have similar average skill. If you use clustering to solve this problem, you'll end up putting all the strong players on one team and all the weak players on the other, which is about the opposite of what we are trying to achieve.

Does it make more sense now?

Sneftel

1,788

May 22, 2008 09:03 PM

Quote: Original post by Timkin
You have made the mistake of assuming that each object (person in the OPs example) has to exist in only one class. I could just as easily cluster in any functional space defined by any mapping of the attribute/object space.

? The OP said it. "I have to split up into 5 groups of 4." I suppose you could nitpick that sentence with extreme semantics, but "equalizing team strength" seems pretty clearly implied by the post.

Quote: Alvaro: define for me please the global objective function to be optimised in the OPs problem.

I'm not alvaro, but giving it a shot: given n people and m attributes (the value of attribute j for person i denoted aij), partitioning them into c classes (n/c people in each), where M(i,k) is 1 if person i is in group k and is 0 otherwise, with weighting wj on attributes, pick M to minimize:

Σ[0<j<m](wjΣ[0<k<c](|Σ[0<i<n](M(i,k)aij) - Σ[0<i'<n](ai'j/n)|L))

(L denotes the norm; I'm assuming the OP wants it to be 1.) How's that?

Recommend an algorithm for this?

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Recommend an algorithm for this?

This topic is closed to new replies.

Popular Topics

Recommended Tutorials

Reticulating splines