Finding clusters in a graph

I'm not sure if this is really an AI question, but since it's an activity that humans can perform without difficulty, I figured it might fit in here.
I have a set of (between 100 and 1000) nodes, each of which has a two-way connection to a number (maybe 10%) of the other nodes. Each connection has a weight, which I'll call a distance. The weights are entirely arbitrary, so there is no guarantee that the graph could exist in 2D or 3D space. In fact, these weights are based on how many humans judge the two nodes to be related. There should be a high level of transitivity, in that the distance between A and C should correlate highly with the sum of the distances between A and B and between B and C.
Given this description, is there any way of finding areas of the graph that are tightly clustered, for the purposes of dividing it up into arbitrary sections? One system I've looked at is fuzzy clustering, but this appears to have some limitations.
quote:
Original post by Kylotan
I have a set of (between 100 and 1000) nodes, each of which has a two-way connection to a number (maybe 10%) of the other nodes. Each connection has a weight, which I'll call a distance. The weights are entirely arbitrary, so there is no guarantee that the graph could exist in 2D or 3D space. In fact, these weights are based on how many humans judge the two nodes to be related. There should be a high level of transitivity, in that the distance between A and C should correlate highly with the sum of the distances between A and B and between B and C.
Given this description, is there any way of finding areas of the graph that are tightly clustered, for the purposes of dividing it up into arbitrary sections?
This is a very conventional clustering problem. Try hierarchical clustering, which can accept distances between items as opposed to their features (coordinates). Any good statistical software should provide this functionality.
-Predictor
http://will.dwinnell.com
I don't have access to any statistical software, and need to code something like this myself. Any good references? I found this:
http://www.analytictech.com/networks/hiclus.htm
but everything else is way over my head, not being a mathematician. Ideally I'd like to see an algorithm described.
The basic idea behind hierarchical clustering is to either: 1. start with every item as its own cluster, and repeatedly merge, or 2. start with one cluster including all items and repeatedly split. The complication comes in the details. There are many ways to perform hierarchical clustering (for instance, does one consider the distance between two clusters by their closest points? their furthest points? something else?), resulting in many (!) variations.
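To make the merging variant concrete, here is a rough single-linkage sketch (untested, just to show the shape of the algorithm). It assumes you have already built a full symmetric distance matrix; node pairs with no connection could be given a large sentinel distance. The DistMatrix and Clusters typedefs are just for this sketch:

#include <vector>
#include <limits>
#include <cstddef>

typedef std::vector<std::vector<double> > DistMatrix;
typedef std::vector<std::vector<std::size_t> > Clusters;

// Single-linkage distance between clusters a and b: the distance
// between their closest pair of members.
double linkage(const Clusters& cs, std::size_t a, std::size_t b,
               const DistMatrix& dist)
{
    double best = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < cs[a].size(); ++i)
        for (std::size_t j = 0; j < cs[b].size(); ++j)
            if (dist[cs[a][i]][cs[b][j]] < best)
                best = dist[cs[a][i]][cs[b][j]];
    return best;
}

// Start with every node in its own cluster and repeatedly merge the
// two closest clusters until only targetCount remain.
Clusters agglomerate(const DistMatrix& dist, std::size_t targetCount)
{
    Clusters cs(dist.size());
    for (std::size_t i = 0; i < dist.size(); ++i)
        cs[i].push_back(i);

    while (cs.size() > targetCount)
    {
        std::size_t bestA = 0, bestB = 1;
        double best = std::numeric_limits<double>::max();
        for (std::size_t a = 0; a < cs.size(); ++a)
            for (std::size_t b = a + 1; b < cs.size(); ++b)
            {
                double d = linkage(cs, a, b, dist);
                if (d < best) { best = d; bestA = a; bestB = b; }
            }
        // Merge the closest pair: move b's members into a, drop b.
        cs[bestA].insert(cs[bestA].end(),
                         cs[bestB].begin(), cs[bestB].end());
        cs.erase(cs.begin() + bestB);
    }
    return cs;
}

For the furthest-point (complete-linkage) variant you would take the max instead of the min inside linkage; with a few hundred nodes the naive loops above should be fast enough.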
This type of clustering is used in many fields, and is very easy to program. You should be able to find a lot of material online. Here is a beginning:
http://www.resample.com/xlminer/help/HClst/HClst_intro.htm
http://www.crmportals.com/hierarchical_cluster_analysis.pdf
http://149.170.199.144/multivar/hc.htm
http://www.analytictech.com/networks/hiclus.htm
Good luck!
-Predictor
http://will.dwinnell.com
Yeah, I took another look at the stuff and it's actually really easy, so sorry for asking for more details when they weren't really necessary.
Do any of those links discuss the benefits and downsides of this method, or mention how to decide when to stop merging clusters?
Another method which has gained favor over hierarchical clustering in some domains is K-means clustering. Again, very simple to implement but requires that you provide the number of clusters you expect a priori.
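A rough, untested sketch of the standard algorithm (the centroid update is the part that assumes each item is a point with coordinates):

#include <vector>
#include <limits>
#include <cstddef>

typedef std::vector<double> Point; // one coordinate per dimension

// Squared Euclidean distance; assumes equal dimensionality.
double sqDist(const Point& a, const Point& b)
{
    double s = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        s += (a[d] - b[d]) * (a[d] - b[d]);
    return s;
}

// points: the data; centres: k initial guesses (e.g. k random points).
// Returns a cluster index for every point.
std::vector<std::size_t> kMeans(const std::vector<Point>& points,
                                std::vector<Point>& centres,
                                int maxIters)
{
    std::vector<std::size_t> label(points.size(), 0);
    for (int iter = 0; iter < maxIters; ++iter)
    {
        // Assignment step: each point joins its nearest centre.
        for (std::size_t i = 0; i < points.size(); ++i)
        {
            double best = std::numeric_limits<double>::max();
            for (std::size_t c = 0; c < centres.size(); ++c)
            {
                double d = sqDist(points[i], centres[c]);
                if (d < best) { best = d; label[i] = c; }
            }
        }
        // Update step: move each centre to the mean of its members.
        for (std::size_t c = 0; c < centres.size(); ++c)
        {
            Point mean(centres[c].size(), 0.0);
            std::size_t count = 0;
            for (std::size_t i = 0; i < points.size(); ++i)
                if (label[i] == c)
                {
                    for (std::size_t d = 0; d < mean.size(); ++d)
                        mean[d] += points[i][d];
                    ++count;
                }
            if (count > 0)
            {
                for (std::size_t d = 0; d < mean.size(); ++d)
                    mean[d] /= count;
                centres[c] = mean;
            }
        }
    }
    return label;
}

In practice you would seed the centres with k randomly chosen data points and stop early once no label changes between iterations.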
You could also look at partitioning, which is effectively clustering in reverse.
And if you want to get really fancy, there's the SOM (self-organising map) option.
Just my $0.02.
-Kirk
I suppose you could iteratively train K-means by picking a random number of clusters, measuring the 'quality' of the clustering, and adjusting the number of clusters accordingly.
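Something like this, perhaps, as an untested stab at a 'quality' score (reusing the DistMatrix and Clusters types from the hierarchical sketch above): the average distance between pairs of nodes that land in the same cluster, where lower is better.

// Average distance between pairs of nodes placed in the same
// cluster (lower is better). Untested.
double meanIntraClusterDistance(const Clusters& cs, const DistMatrix& dist)
{
    double total = 0.0;
    std::size_t pairs = 0;
    for (std::size_t c = 0; c < cs.size(); ++c)
        for (std::size_t i = 0; i < cs[c].size(); ++i)
            for (std::size_t j = i + 1; j < cs[c].size(); ++j)
            {
                total += dist[cs[c][i]][cs[c][j]];
                ++pairs;
            }
    return pairs > 0 ? total / pairs : 0.0;
}

Though I realise a raw score like this always improves as the number of clusters grows, so it would need some penalty on k.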
However, does the standard K-means algorithm require that all the points fit in Euclidean space? I could have a situation where A is 1 point away from B and B is 1 point away from C, yet A is 40 points away from C. Obviously this couldn't exist in 2D or 3D space, and any algorithm that assumes such properties may fail. On the face of it, K-means would appear to expect the distance between A and C to be 'reasonable', which I can't guarantee.
I suppose I could run some sort of pre-processing on the graph to attempt to compensate for this, however.
I don't think any clustering method is totally dependent upon Euclidean conditions. As long as you've defined a distance measure which behaves properly, i.e., as a metric (non-negative, symmetric, and satisfying the triangle inequality), I don't think it will matter. Usually Euclidean distances are the basis due to their ease of understanding.
I could be wrong, however. It happened once before. 8^)
-Kirk
quote:
Original post by kirkd
Another method which has gained favor over hierarchical clustering in some domains is K-means clustering. Again, very simple to implement but requires that you provide the number of clusters you expect a priori.
You could also look at partitioning, which is effectively clustering in reverse.
And if you want to get really fancy, there's the SOM (self-organising map) option.
Yes, but both of these methods require coordinates, as opposed to distances. Synthetic coordinates may be derived from distances, but may require many dimensions for an accurate mapping. Consider that any N points may theoretically require as many as N-1 dimensions for accurate coordinates. For example, in the worst case, 3 points (at the vertices of a triangle, i.e. not collinear) will require 2 dimensions. With hundreds or thousands of points, one may very well require hundreds or thousands of dimensions for an accurate representation. Converting to coordinates (by multidimensional scaling, etc.) adds another step and makes this solution clumsy.
While I tend to reach for centroid-based clustering algorithms such as you mention, I think they'd be an awkward fit to this problem.
Predictor: I was always under the impression that K-means assigned each node in the space to the nearest cluster point by some (possibly abstract) distance metric. Why would it need coordinates per se? Perhaps in some state spaces the process could have non-point attractors, where a change in the cluster positions changes the assignment of node n, which changes the cluster positions, which changes the assignment of node n again, in a continuous cycle. I imagine that this cannot happen in Euclidean geometry but could happen in other state spaces; however, I don't think this precludes use with _any_ distance metric, it just might be problematic.
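For instance, I'd have thought you could swap the centroid update for a 'medoid' update, picking the actual cluster member that minimises the total distance to the other members, so that only the distance matrix is ever needed. A rough, untested fragment (reusing the DistMatrix typedef from the hierarchical sketch above):

// Pick the member of a cluster that minimises the total distance
// to the other members; this needs no coordinates at all.
std::size_t medoid(const std::vector<std::size_t>& members,
                   const DistMatrix& dist)
{
    std::size_t best = members[0];
    double bestTotal = std::numeric_limits<double>::max();
    for (std::size_t i = 0; i < members.size(); ++i)
    {
        double total = 0.0;
        for (std::size_t j = 0; j < members.size(); ++j)
            total += dist[members[i]][members[j]];
        if (total < bestTotal) { bestTotal = total; best = members[i]; }
    }
    return best;
}

(I believe this variant goes by the name k-medoids.)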
Am I misunderstanding your use of the term coordinates? I look forward to elucidation on this point.
Mike