Advertisement

Recommend an algorithm for this?

Started by May 21, 2008 06:07 AM
47 comments, last by Timkin 16 years, 5 months ago
Quote: Original post by Timkin
Kirk, did you actually read that sentence once you'd written it?


Well, yes. Consider:

Quote: Original post by anda
This is my problem:

Say I have a list of 20 people, who I have to split up into 5 groups of 4.

Each person has 5 integer attributes (labelled eg 0,1,2,3,4)

I want to form the 5 groups where for each of those attributes the average is pretty close compared to the other groups' averages of each of those attributes.

eg

Group 1 (avg for attribute1 is 4, avg for attribute2 is 3, etc)
Group 2 (avg for attribute1 is 4.5, avg for attribute2 is 2.8, etc)

etc



Quote: Original post by Timkin
you're dictating that you want 5 classes based on 5 attributes... and you want to minimise the variance in each attribute within a class (find things that are most alike).



From my reading of the OP's post, this approach does not agree with the original intent. Perhaps I'm reading that incorrectly - could you explain, please?

Then:

Quote: Original post by Timkin
The definition of a cluster is algorithm independent. I can choose to cluster based on single attributes, or any function of any number of attributes. There does NOT have to be a 1-1 linear mapping from attributes to classes. You have made the mistake of assuming that each object (person in the OPs example) has to exist in only one class. I could just as easily cluster in any functional space defined by any mapping of the attribute/object space.


That is true, but you've already stated in your first post:

Quote: Original post by Timkin
you're dictating that you want 5 classes based on 5 attributes... and you want to minimise the variance in each attribute within a class (find things that are most alike).


Which is exactly what I said:

Quote: Original post by KirkD
Clustering will group individuals with similar attribute values
Quote: Original post by anda
I want to form the 5 groups where for each of those attributes the average is pretty close compared to the other groups' averages of each of those attributes.

For each attribute, we want the spread to be similar across groups? Sort on each dimension (in the chosen ranked order), and then take from the list in round-robin fashion to fill the groups.

[Edited by - AngleWyrm on May 23, 2008 2:13:03 AM]
--"I'm not at home right now, but" = lights on, but no ones home
Advertisement
Quote: Original post by AngleWyrm
Quote: Original post by anda
I want to form the 5 groups where for each of those attributes the average is pretty close compared to the other groups' averages of each of those attributes.

For each attribute, we want the spread to be similar across groups? Sort on each dimension (in the chosen ranked order), and then take from the list in round-robin fashion to fill the groups.

Which is analagous to a "let's pick teams" approach where the function of selection is completely objective based on skill (i.e. no friend bias, etc.). However, there are some flaws to this based on what the spread of the skills are and the location of any outliers.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

Quote: Original post by AngleWyrm
Quote: Original post by anda
I want to form the 5 groups where for each of those attributes the average is pretty close compared to the other groups' averages of each of those attributes.

For each attribute, we want the spread to be similar across groups? Sort on each dimension (in the chosen ranked order), and then take from the list in round-robin fashion to fill the groups.


I think that this approach will also suffer if there is a lack of rank-ordering of the attributes. The OP mentioned that each attribute's average should be similar across groups/classes. If the attributes are not correlated and there is no rank-order, sorting won't solve the problem.

To me it comes down to a problem of combinatorics and sampling.
Quote: Original post by kirkd
To me it comes down to a problem of combinatorics and sampling.

Which brings us back to GAs with an appropriate fitness function.

Dave Mark - President and Lead Designer of Intrinsic Algorithm LLC
Professional consultant on game AI, mathematical modeling, simulation modeling
Co-founder and 10 year advisor of the GDC AI Summit
Author of the book, Behavioral Mathematics for Game AI
Blogs I write:
IA News - What's happening at IA | IA on AI - AI news and notes | Post-Play'em - Observations on AI of games I play

"Reducing the world to mathematical equations!"

Quote: Original post by InnocuousFox
Which brings us back to GAs with an appropriate fitness function.


You read my mind. Thankfully you didn't use braille. 8^)

Advertisement
Quote: Original post by kirkd
The OP mentioned that each attribute's average should be similar across groups/classes. If the attributes are not correlated...
If they are not correlated, then there is no similarity across groups/classes. A similar average across groups is a correlation.

Choosing an 'appropriate fitness function' will still need to make the decision about grouping (1,2) with (1,3) or (3,1).

[Edited by - AngleWyrm on May 24, 2008 1:08:55 AM]
--"I'm not at home right now, but" = lights on, but no ones home
Quote: Original post by AngleWyrm
Quote: Original post by kirkd
The OP mentioned that each attribute's average should be similar across groups/classes. If the attributes are not correlated...
If they are not correlated, then there is no similarity across groups/classes. A similar average across groups is a correlation.


I think you're misinterpreting the intent of the OP. The intent is to have the average for the attributes to be similar across classes, but the average between attributes is irrelevant. If the attributes are correlated, then sorting any one will lead to at least a partial sorting of the correlated ones, and the sorting method will work fine. If they are not correlated, sorting any one will not lead to an uncorrelated one to become sorted. A secondary sorting on the uncorrelated attribute will lead to a disordering of the previously sorted one, and thus destroy any uniform distribution-based choosing method.




For 'random' start groups (before applying adjustments to move towards an optimal solution), you could do a preprocessing stage to build a sorted list of averaged ranking (average all 5 values and order list high to low).
You would then select your groups either by the 'draft pick method' or by altenately taking members from the top and then bottom of the list (fortunate that the groups are 4 ....).

The processing for the pre sorting should be trivial and having a semi-balance start state for the main algorithm may cut down the number of iterations of adjustments (versus what you could get from a purely random start set...)
--------------------------------------------[size="1"]Ratings are Opinion, not Fact
Quote: Original post by Sneftel

I'm not alvaro, but giving it a shot: given n people and m attributes (the value of attribute j for person i denoted aij), partitioning them into c classes (n/c people in each), where M(i,k) is 1 if person i is in group k and is 0 otherwise, with weighting wj on attributes, pick M to minimize:

Σ[0<j<m](wjΣ[0<k<c](|Σ[0<i<n](M(i,k)aij) - Σ[0<i'<n](ai'j/n)|L))

(L denotes the norm; I'm assuming the OP wants it to be 1.) How's that?



Hmm I'm not sure I really understand, it looks pretty cryptic.



I've since completed a basic scoring on how mismatched a system is, by calculating the difference between each group's attribute means against the global mean.

I have another value 'm' which says atleast how small the groups' scores(the mismatch) should be before it is considered good enough.

With weighting I'd multiply it against the difference for the attributes, and I think I'd have to update the value m some how.


Anyway now how would I decide which student to move from which group to where for each round?


I've got a few ideas like:
- find a person in a group that when removed and replaced with a dummy person who has the mid values for each attribute (=2.5, each attribute is from 0 to 5) makes the group's score lower. And then something....

- randomly compare people from each group and see if swapping them lowers both scores or reduces out of both the groups a certain chunk of score.

Thanks.


edit:
Sorry if I have ignored any posts, i'm going through it all now.

This topic is closed to new replies.

Advertisement