Clustering Based On Association Rule Hypergraphs
Eui-Hong (Sam) Han, George Karypis, Vipin Kumar and Bamshad Mobasher |
Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997 |
Abstract
Clustering in data mining is a discovery process that groups a set of data such that intracluster similarity is maximized and intercluster similarity is minimized. The discovered clusters are used to explain the characteristics of the data distribution. In this paper we propose a new methodology for clustering related items using association rules, and for clustering related transactions using clusters of items. Our approach is linearly scalable with respect to the number of transactions. The frequent itemsets used to derive association rules are also used to group items into hypergraph edges, and a hypergraph partitioning algorithm is then used to find the clusters. Our experiments indicate that clustering using association rule hypergraphs holds great promise in several application domains. Experiments with stock-market data and congressional voting data show that this clustering scheme successfully groups items that belong together. Clusters of items can also be used to cluster the transactions that contain those items; experiments with congressional voting data show that this method is quite effective at finding clusters of transactions that correspond to either Democrat or Republican voting patterns. Compared with the existing clustering algorithm AutoClass, our algorithm produced clusters of comparable quality on the congressional voting data. |
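The pipeline described in the abstract (frequent itemsets become hyperedges, the hypergraph is partitioned into item clusters, and transactions are then assigned to item clusters) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. Each hyperedge is weighted by its support count, because the abstract does not specify a weighting. An exhaustive balanced bisection stands in for a real hypergraph partitioner such as hMETIS. Each transaction is scored by the fraction of each cluster's items it contains.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count, max_size=3):
    """Level-wise (Apriori-style) search; returns {frozenset(itemset): support_count}."""
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)
    size = 1
    while current and size < max_size:
        size += 1
        items = sorted({i for s in current for i in s})
        current_next = {}
        for combo in combinations(items, size):
            # Apriori pruning: every (size-1)-subset must already be frequent.
            if not all(frozenset(sub) in current for sub in combinations(combo, size - 1)):
                continue
            cand = frozenset(combo)
            c = sum(1 for t in transactions if cand <= t)
            if c >= min_count:
                current_next[cand] = c
        current = current_next
        result.update(current)
    return result

def build_hypergraph(itemsets):
    # Hyperedges are frequent itemsets with at least two items.
    # ASSUMPTION: weight = support count (the abstract does not specify a weighting).
    return [(s, c) for s, c in itemsets.items() if len(s) >= 2]

def cut_weight(edges, part):
    # A hyperedge is cut when its items fall on both sides of the partition.
    return sum(w for e, w in edges if len({part[i] for i in e}) > 1)

def brute_force_bisection(items, edges):
    # Stand-in for a real hypergraph partitioner (e.g. hMETIS): tries every
    # balanced split, so it is only usable for tiny item sets.
    items = sorted(items)
    if len(items) < 2:
        raise ValueError("need at least two items to bisect")
    half = len(items) // 2
    best, best_part = None, None
    # Fix items[0] on side 0 so each split is enumerated only once.
    for side in combinations(items[1:], half - 1):
        left = {items[0], *side}
        part = {i: 0 if i in left else 1 for i in items}
        w = cut_weight(edges, part)
        if best is None or w < best:
            best, best_part = w, part
    return best_part, best

def cluster_transactions(transactions, part):
    # ASSUMPTION: assign each transaction to the item cluster of which it
    # contains the largest fraction of items; ties go to the lower cluster id.
    clusters = {}
    for item, p in part.items():
        clusters.setdefault(p, set()).add(item)
    labels = []
    for t in transactions:
        scores = {p: len(t & c) / len(c) for p, c in clusters.items()}
        labels.append(max(sorted(scores), key=scores.get))
    return labels

# Toy data with two obvious item groups, {a, b, c} and {x, y, z}.
transactions = [
    {"a", "b", "c"}, {"a", "b"}, {"b", "c"}, {"a", "c"},
    {"x", "y", "z"}, {"x", "y"}, {"y", "z"}, {"x", "z"},
    {"a", "x"},
]
itemsets = frequent_itemsets(transactions, min_count=2)
edges = build_hypergraph(itemsets)
items = {i for s in itemsets for i in s}
part, cut = brute_force_bisection(items, edges)
labels = cluster_transactions(transactions, part)
print(labels)  # -> [0, 0, 0, 0, 1, 1, 1, 1, 0]
```

On this toy input, the split {a, b, c} versus {x, y, z} cuts no frequent hyperedge, so it is chosen. The mixed transaction {a, x} ties between the two clusters and falls to cluster 0 under the sketch's tie-breaking rule. Support counting needs one scan of the transactions per itemset size. This is consistent with, but does not reproduce, the linear scaling in the number of transactions claimed above.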
Research topics: Clustering | Data mining |