ISims higher than 1


I've been experimenting with data clustering in Cluto for a few weeks now and have been content with the results so far. With my latest dataset, however, I've been seeing results that throw me off balance. With a low number of clusters, negative ISims and ESims appear. Increasing the number of clusters soon produces a cluster with an ISim greater than 1: the ISim starts at about 8 and decreases to 4 as the number of clusters grows, while the ESim becomes increasingly negative. As far as I can see, across runs with an increasing number of clusters, the first cluster keeps a fixed number of objects (presumably the same objects in each clustering solution) with an ISim of 4 and an ESim of -9. I would put this down to a fluke in my data, but since I have already filtered out anomalous or extreme data points, it is hard to narrow down the objects involved.
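For reference, here is a minimal sketch of why values like these look odd, assuming ISim is the average pairwise similarity between objects inside a cluster and ESim is the average similarity between a cluster's objects and all other objects. This is not Cluto's code: the function names and the toy data are hypothetical, and whether Cluto's averages include self-similarity is not something I can confirm. Under a cosine similarity both averages must stay within [-1, 1], so values like 8 or -9 would suggest that the similarities being averaged are not normalized.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the vector norms, always in [-1, 1].
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def raw_dot(u, v):
    # Unnormalized dot product: unbounded, grows with vector magnitude.
    return sum(a * b for a, b in zip(u, v))

def isim_esim(rows, labels, cluster, sim):
    # Hypothetical helper: average internal and external similarity of one cluster.
    inside = [r for r, l in zip(rows, labels) if l == cluster]
    outside = [r for r, l in zip(rows, labels) if l != cluster]
    internal = [sim(inside[i], inside[j])
                for i in range(len(inside))
                for j in range(i + 1, len(inside))]
    external = [sim(u, v) for u in inside for v in outside]
    isim = sum(internal) / len(internal) if internal else float("nan")
    esim = sum(external) / len(external) if external else float("nan")
    return isim, esim

# Toy data (made up): two clusters pointing in roughly opposite directions.
rows = [[2.0, 3.0], [3.0, 2.0], [-3.0, -1.0], [-1.0, -3.0]]
labels = [0, 0, 1, 1]

cos_isim, cos_esim = isim_esim(rows, labels, 0, cosine)
raw_isim, raw_esim = isim_esim(rows, labels, 0, raw_dot)

print("cosine:", round(cos_isim, 3), round(cos_esim, 3))  # 0.923 -0.877
print("raw dot:", raw_isim, raw_esim)                     # 12.0 -10.0
```

In this toy case, negative feature values are enough to make the external similarity negative, and skipping the normalization is enough to push the internal similarity above 1.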

I ran my tests with the rb clustering method, using the H2 and I2 criterion functions to get the best overall results possible (as per the document 'Criterion Functions for Document Clustering'). All that seems to change is the number of objects in the anomalous cluster, along with its ISim and ESim, which nevertheless still increase and decrease respectively.

Any hint would be welcome.



RE: Can you upload some of these

Can you upload some of these files to flyspay so that I can take a look at them?