interpretation of cluster results

Hi All,

I am a first-time user of the CLUTO clustering software. I am using a text file (around 20k lines) to categorize words into clusters (classes), with the following command:

bin/Linux-x86_64/vcluster test.mat 500 -rlabelfile=test.mat.rlabel -clabelfile=test.mat.clabel -showfeatures -nfeatures=10000 -clmethod=graph -clustercolumns -sim="cos" -crfun="wupgma" -niter=40 -ntrials=40 -agglofrom=500 -nnbrs=40 >clustered.text.out

In the clustered.text.out file, I see that almost all words are present in all clusters.

Can anyone please explain:
[1] What is the correct interpretation of the clustered.text.out file?
[2] What is the command to assign a cluster to each word?
[3] Which of the "Descriptive:" and "Discriminating:" entries for a cluster is the right one?

Quick replies are highly appreciated.

Thanks in advance
--
bams

Submitted by bams on Thu, 2011-06-09 12:28

RE: If you want to cluster the

If you want to cluster the terms (columns), you may want to consider transposing the input file and using cluto to explicitly cluster the terms.
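As an illustration of the transposing step, here is a minimal Python sketch. It assumes the matrix is in the sparse text format (a header line "nrows ncols nnz", then one line per row holding 1-based "column value" pairs); the function name transpose_cluto_sparse is hypothetical and not part of cluto. A dense matrix file would need different parsing.

```python
def transpose_cluto_sparse(text):
    """Transpose a matrix given in the (assumed) sparse text format.

    Assumed layout: first line is "nrows ncols nnz"; each following line
    lists the non-zero entries of one row as 1-based "column value" pairs.
    """
    lines = text.splitlines()
    nrows, ncols, nnz = (int(tok) for tok in lines[0].split())

    # One bucket per original column; each bucket becomes a row of the output.
    columns = [[] for _ in range(ncols)]
    for row_idx in range(nrows):
        # A row with no non-zero entries is an empty (or missing) line.
        pos = 1 + row_idx
        tokens = lines[pos].split() if pos < len(lines) else []
        for col_tok, val_tok in zip(tokens[0::2], tokens[1::2]):
            # Values are kept as strings so their formatting is unchanged.
            columns[int(col_tok) - 1].append((row_idx + 1, val_tok))

    out = [f"{ncols} {nrows} {nnz}"]
    for entries in columns:
        out.append(" ".join(f"{r} {v}" for r, v in entries))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "2 3 3\n1 1.0 3 2.0\n2 0.5\n"
    print(transpose_cluto_sparse(sample), end="")
```

After transposing, the two label files swap roles: the old column labels (test.mat.clabel) describe the rows of the new matrix, and the old row labels describe its columns.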

As for which of the descriptive or discriminating features better represent a cluster of terms, the answer is the descriptive features.

When you specify -nfeatures=10000, cluto reports 10000 features for each cluster, and these will most likely overlap across the clusters. You may want to reduce that to something smaller; however, you will still have some overlap across the clusters.
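Regarding question [2], once the terms are the rows of the matrix, one way to get a single cluster per word is to pair the row labels with the clustering vector. The sketch below is only an illustration under assumptions: it supposes the clustering run leaves a file with one line per row, whose first number is that row's cluster id, in the same order as the row label file. The function name assign_clusters is hypothetical.

```python
def assign_clusters(labels_text, clustering_text):
    """Pair each row label with its cluster id.

    Assumptions: labels_text has one label per line; clustering_text has one
    line per row, in the same order, and the first number on each line is the
    cluster id (any further numbers on the line are ignored).
    """
    labels = [ln.strip() for ln in labels_text.splitlines() if ln.strip()]
    ids = [int(ln.split()[0]) for ln in clustering_text.splitlines() if ln.strip()]
    if len(labels) != len(ids):
        raise ValueError(
            f"{len(labels)} labels but {len(ids)} cluster ids; files do not match"
        )
    return list(zip(labels, ids))


if __name__ == "__main__":
    pairs = assign_clusters("apple\nbanana\ncar\n", "0\n0\n1\n")
    for word, cluster in pairs:
        print(f"{word}\t{cluster}")
```

Each word then appears exactly once with one cluster id, unlike the feature lists printed by -showfeatures, which can repeat the same word under many clusters.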

Submitted by karypis on Fri, 2011-06-10 07:13.
