CLUTO - Software for Clustering High-Dimensional Datasets

Current version: 2.1.2, 10/18/06

CLUTO is a software package for clustering low- and high-dimensional datasets and for analyzing the characteristics of the resulting clusters. It is well suited to datasets arising in diverse application areas, including information retrieval, customer purchasing transactions, the Web, GIS, science, and biology.

The CLUTO distribution consists of both stand-alone programs and a library through which application programs can directly access the clustering and analysis algorithms implemented in CLUTO.

Features
  • Multiple classes of clustering algorithms:
    • partitional, agglomerative, and graph-partitioning-based.
  • Multiple similarity/distance functions:
    • Euclidean distance, cosine, correlation coefficient, extended Jaccard, user-defined.
  • Numerous novel clustering criterion functions and agglomerative merging schemes.
  • Traditional agglomerative merging schemes:
    • single-link, complete-link, UPGMA.
  • Extensive cluster visualization capabilities and output options:
    • PostScript, SVG, GIF, xfig, etc.
  • Multiple methods for effectively summarizing the clusters:
    • most descriptive and discriminating dimensions, cliques, and frequent itemsets.
  • Can scale to very large datasets containing hundreds of thousands of objects and tens of thousands of dimensions.
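
To make the similarity functions and traditional merging schemes named in the feature list more concrete, the following is a minimal Python sketch of their standard textbook definitions. It is not CLUTO's code and does not use CLUTO's library interface; all function names are hypothetical, vectors are assumed to be dense, non-zero, and of equal length, and the merging schemes are expressed in terms of similarity (so single-link takes the maximum pairwise value).

```python
import math
from itertools import product


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more alike."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (assumes neither is all zeros)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def correlation_coefficient(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


def extended_jaccard(a, b):
    """Extended (Tanimoto) Jaccard coefficient for real-valued vectors."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)


def pairwise_similarities(cluster_a, cluster_b, sim):
    """All similarities between one object from each cluster."""
    return [sim(u, v) for u, v in product(cluster_a, cluster_b)]


def single_link(cluster_a, cluster_b, sim=cosine_similarity):
    """Similarity of the closest pair of objects."""
    return max(pairwise_similarities(cluster_a, cluster_b, sim))


def complete_link(cluster_a, cluster_b, sim=cosine_similarity):
    """Similarity of the farthest pair of objects."""
    return min(pairwise_similarities(cluster_a, cluster_b, sim))


def upgma(cluster_a, cluster_b, sim=cosine_similarity):
    """Average similarity over all cross-cluster pairs."""
    sims = pairwise_similarities(cluster_a, cluster_b, sim)
    return sum(sims) / len(sims)


if __name__ == "__main__":
    a, b = [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]
    print("cosine:", cosine_similarity(a, b))
    print("extended Jaccard:", extended_jaccard(a, b))
    print("correlation:", correlation_coefficient(a, b))
    print("Euclidean distance:", euclidean_distance(a, b))
    print("UPGMA:", upgma([a], [b, a]))
```

An agglomerative scheme would repeatedly merge the pair of clusters with the highest score under whichever of the three merging functions is chosen; the sketch shows only the scoring step, not the full merge loop.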