CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling
|George Karypis, Eui-Hong Han, Vipin Kumar|
|IEEE Computer 32(8): 68-75, 1999|
Clustering in data mining is a discovery process that groups a set of data such that the intracluster similarity is maximized and the intercluster similarity is minimized. Existing clustering algorithms, such as K-means, PAM, CLARANS, DBSCAN, CURE, and ROCK are designed to find clusters that fit some static models. These algorithms can breakdown if the choice of parameters in the static model is incorrect with respect to the data set being clustered, or if the model is not adequate to capture the characteristics of clusters. Furthermore, most of these algorithms breakdown when the data consists of clusters that are of diverse shapes, densities, and sizes. In this paper, we present a novel hierarchical clustering algorithm called CHAMELEON that measures the similarity of two clusters based on a dynamic model. In the clustering process, two clusters are merged only if the inter-connectivity and closeness (proximity) between two clusters are high relative to the internal inter-connectivity of the clusters and closeness of items within the clusters. The merging process using the dynamic model presented in this paper facilitates discovery of natural and homogeneous clusters. The methodology of dynamic modeling of clusters used in CHAMELEON is applicable to all types of data as long as a similarity matrix can be constructed. We demonstrate the effectiveness of CHAMELEON in a number of data sets that contain points in 2D space, and contain clusters of different shapes, densities, sizes, noise, and artifacts. Experimental results on these data sets show that CHAMELEON can discover natural clusters that many existing state-of-the art clustering algorithms fail to find.
A partial implementation of the CHAMELON algorithm is available in the CLUTO software package.
|Research topics: Clustering | CLUTO | Data mining|