CLUTO - Software for Clustering High-Dimensional Datasets

The CLUTO data clustering package is currently distributed as a single file that contains binary distributions for Linux, Sun, OSX, and MS Windows platforms.

Getting the files
The first step in getting and using CLUTO is to download the binary distribution file. CLUTO's distribution is available as either a Unix gzipped tar file or a Windows zip file.
Available files for download
Latest experimental release (2.1.2a):
cluto-2.1.2a.tar.gz (20MB) Linux (i686/x86_64), OSX (ppc/i386), SunOS, MSWin (x86_32, x86_64)
cluto-2.1.2a.zip (20MB) Linux (i686/x86_64), OSX (ppc/i386), SunOS, MSWin (x86_32, x86_64)
Latest stable release (2.1.1):
cluto-2.1.1.tar.gz (9.3MB) Linux i686, SunOS, MSWin
cluto-2.1.1.zip (9.3MB) Linux i686, SunOS, MSWin

Older versions of CLUTO can be found here.

Installing
On Unix systems, after downloading CLUTO you need to uncompress and untar it. This is achieved by executing the following two commands:
gunzip cluto-xxx.tar.gz

tar -xvf cluto-xxx.tar

Here, 'xxx' stands for the release number of the CLUTO version that you downloaded. On Windows systems, you need a zip/unzip program (e.g., WinZip) to unzip the files. In general, double-clicking "cluto-xxx.zip" is enough to unzip the file automatically.

At this point you should have a directory named cluto-xxx. This directory contains CLUTO's library, stand-alone clustering programs, and some test datasets.
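
As a minimal sketch, the two Unix commands above can be combined into a single tar invocation, assuming your tar supports the -z flag (GNU and BSD tar do; older system tars may not). The helper name extract_cluto is hypothetical and not part of CLUTO. It also assumes the archive unpacks into a directory named after the file, as described above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CLUTO): unpack a gzipped tar file
# in one step and list the top-level contents of the extracted directory.
extract_cluto() {
    archive="$1"

    # -x extract, -z decompress with gzip, -f read from the named file
    tar -xzf "$archive" || return 1

    # Assumption: the archive unpacks into a directory named after the
    # file, e.g. cluto-xxx.tar.gz -> cluto-xxx
    dir=$(basename "$archive" .tar.gz)

    if [ -d "$dir" ]; then
        ls "$dir"
    else
        echo "expected directory $dir not found" >&2
        return 1
    fi
}
```

After sourcing or pasting the function into a shell, it would be called as extract_cluto cluto-xxx.tar.gz, with 'xxx' replaced by the release number of your download.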

Documentation
Instructions describing how to use CLUTO can be found at cluto-xxx/manual.pdf. You can get a PDF copy from here.
Converting Documents into CLUTO's Format
A simple Perl script called doc2mat, which converts a set of documents into the vector-space format used by CLUTO, is available by downloading the following tar file: doc2mat-1.0.tar.gz.

You can read its online documentation from here.

Datasets
The various datasets used in evaluating the performance of CLUTO's clustering algorithms are available by downloading datasets.tar.gz (28MB file).

Finally, the various datasets used in evaluating the Chameleon clustering algorithm are available by downloading chameleon-data.tar.gz.

Copyright & License Notice
CLUTO is copyrighted by the Regents of the University of Minnesota. It can be freely used for educational and research purposes by non-profit institutions and US government agencies only. Other organizations are allowed to use CLUTO only for evaluation purposes, and any further uses will require prior approval. The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies are not sold or distributed and are used under the same terms and conditions.

As unestablished research software, this code is provided on an "as is" basis without warranty of any kind, either expressed or implied. Downloading or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.