Advertisement

Finding clusters in a graph

Started by September 29, 2003 06:43 AM
26 comments, last by Kylotan 21 years, 4 months ago
quote:
Original post by Timkin
...I''m not as enthusiastic about methods such as split and merge (heirarchical clustering) or k-means/fuzzy k-means.



Could you elaborate on why? I don''t believe that anyone here is arguing that hierarchical clustering (or any of the other algorithms) is the "be all and end all" of clustering, just that they are useful tools which have proven effective for many problems. If hierarchical clustering hasn''t solved this problem, it certainly seems to be a good start:


quote:
Original post by Kylotan
...I implemented hierarchical clustering last night and had a reasonable degree of success using the median distance between cluster members. ... So the median method tended to give reasonably accurate clusters while not forcing the special cases into clusters they really didn''t belong in.






quote:
Original post by Predictor
quote:
Original post by Timkin
...I''m not as enthusiastic about methods such as split and merge (heirarchical clustering) or k-means/fuzzy k-means.



Could you elaborate on why? I don''t believe that anyone here is arguing that hierarchical clustering (or any of the other algorithms) is the "be all and end all" of clustering, just that they are useful tools which have proven effective for many problems. If hierarchical clustering hasn''t solved this problem, it certainly seems to be a good start


I''m not suggesting that methods such as split and merge or k-means don''t have their appropriate domain(s) of application, nor that they aren''t useful as a starting point. Simply that for the sorts of segmentation and clustering problems I have worked on, I have found other methods gave better results. For example, MML-based clustering. I find that the performance of most common clustering methods is greatly dependent on the level of noise in the data and they have particular difficulty in dealing with overlapping classes, since they cannot inherently describe partial assignment (although admittedly you can hybridise the method to do this).

In my work on segmenting MRI brain images we have to overcome two serious problems: 1) Sensor noise; and, 2) partial volume effects near tissue boundaries. Methods like split and merge (which for those that don''t know is a combination of top down decomposition and bottom up composition of classes) give acceptable results with regards to noise, but perform poorly on regions showing partial volume. When these two problems are combined, these algorithms tend to fall apart. The benefit of something like an MML based description of classes is that it describes a joint distribution over all things and all classes. ''Snob'' is a good example implementation of an MML based clustering algorithm.

We''ve developed our own in-house algorithm for 3D image segmentation and this outperforms other methods we''ve tested it against. The only comparison we''re yet to perform is against ''FAST''. We''re currently in the process of optimising our implementation and formalising our results and putting some papers together. I''ll be happy to share them with this group once they''re published.

Cheers,

Timkin
Advertisement
Timkin,

I, for one, would love to see some reprints/preprints once you have something publishable. Where are you planning to submit?

-Kirk
That''s still to be worked out and will, as usual, depend on the quality of our results and the written paper.
At the moment, hierarchical clustering seems to be good enough but I''m gonna add more data to see if it continues to be that way. It seems that tweaking the algorithm is going to be limited to picking a different value to the median... say the 40th percentile or the 60th instead, and that this is going to have to change to reflect the number of nodes that have zero similarity. (As if 90% of node pairs have zero similarity then using the median isn''t going to form many clusters, I expect.)

I would consider other methods, but they''d have to be simple! I''m a programmer, not a mathematician Sometimes the formulae I wade through in some of these papers are a lot longer than the code required to implement them

[ MSVC Fixes | STL Docs | SDL | Game AI | Sockets | C++ Faq Lite | Boost
Asking Questions | Organising code files | My stuff | Tiny XML | STLPort]
quote:
Original post by Kylotan
Sometimes the formulae I wade through in some of these papers are a lot longer than the code required to implement them



I find that this arises from having Computer Scientists playing at being Mathematicians!
Advertisement
I wish someone would write a "For Dummies" version with lots of colourful pictures.

There is a Cartoon Guide to Statistics (Larry Gonick, Woollcott Smith) so how about a cartoon guide to density estimation / clustering algorithms? Timkin, I think you were talking about writing up a segmentation primer for gamedev. How about it?

Will
------------------http://www.nentari.com
Oh no... I''ve been evil''d!!! I''m taking a month off work in November to do some writing... once that''s done I''ll see how I go in December... but don''t expect anything before Christmas!

Timkin

This topic is closed to new replies.

Advertisement