Clustering Based On Association Rule Hypergraphs

Eui-Hong (Sam) Han, George Karypis, Vipin Kumar and Bamshad Mobasher
Workshop on Research Issues on Data Mining and Knowledge Discovery, 1997
Clustering in data mining is a discovery process that groups a set of data such that intracluster similarity is maximized and intercluster similarity is minimized. The discovered clusters are used to explain the characteristics of the data distribution. In this paper we propose a new methodology for clustering related items using association rules, and for clustering related transactions using clusters of items. Our approach scales linearly with the number of transactions. The frequent item-sets used to derive association rules are also used to group items into hyperedges of a hypergraph, and a hypergraph partitioning algorithm is then used to find the clusters. Our experiments indicate that clustering using association rule hypergraphs holds great promise in several application domains. Experiments with stock-market data and congressional voting data show that this scheme successfully groups related items together. Clustering of items can also be used to cluster the transactions containing those items: our experiments with congressional voting data show that this method is quite effective at finding clusters of transactions that correspond to either Democratic or Republican voting patterns. On the congressional voting data, our algorithm produced clusters of quality comparable to those found by the existing clustering algorithm AutoClass.
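The pipeline described in the abstract (frequent item-sets become hyperedges, a partitioner splits the hypergraph into item clusters, and transactions are then assigned to item clusters) can be sketched on a toy dataset as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names are hypothetical. The hyperedge weight (mean confidence of the rules whose consequent is a single item of the set) is an assumption. The brute-force balanced bisection is a swapped-in stand-in for a real multilevel hypergraph partitioner (for example hMETIS) and is only feasible for a handful of items. The transaction score (fraction of a cluster's items present in the transaction) is also an assumption.

```python
from collections import defaultdict
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style level-wise search; returns {frozenset: support count}."""
    txs = [frozenset(t) for t in transactions]
    min_count = min_support * len(txs)
    counts = defaultdict(int)
    for t in txs:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(current)
    k = 2
    while current:
        # Join frequent (k-1)-itemsets, then prune candidates with an infrequent subset.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        candidates = {c for c in candidates if all(c - {i} in current for i in c)}
        current = {}
        for c in candidates:
            count = sum(1 for t in txs if c <= t)
            if count >= min_count:
                current[c] = count
        frequent.update(current)
        k += 1
    return frequent


def build_hypergraph(frequent):
    """One hyperedge per frequent itemset of size >= 2.

    ASSUMED weight: mean confidence of the rules (S - {i}) -> {i} for each i in S.
    """
    edges = {}
    for s, count in frequent.items():
        if len(s) < 2:
            continue
        confs = [count / frequent[s - {i}] for i in s]
        edges[s] = sum(confs) / len(confs)
    return edges


def min_cut_bisection(items, edges):
    """Brute-force balanced bisection minimizing the total weight of cut hyperedges.

    A hypothetical stand-in for a real hypergraph partitioner; exponential in len(items).
    """
    items = sorted(items)
    best_cut, best = float("inf"), None
    for part in combinations(items, len(items) // 2):
        side = frozenset(part)
        # A hyperedge is cut if it has items on both sides of the split.
        cut = sum(w for e, w in edges.items() if e & side and e - side)
        if cut < best_cut:
            best_cut, best = cut, side
    return [best, frozenset(items) - best]


def assign_transactions(transactions, clusters):
    """ASSUMED rule: assign each transaction to the cluster whose items it covers most."""
    return [max(range(len(clusters)),
                key=lambda j: len(set(t) & clusters[j]) / len(clusters[j]))
            for t in transactions]


# Toy data: two groups of items that co-occur only within their own group.
transactions = [
    {"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"},
    {"x", "y", "z"}, {"x", "y", "z"}, {"x", "y"}, {"x", "z"}, {"y", "z"},
]
freq = frequent_itemsets(transactions, min_support=0.2)
edges = build_hypergraph(freq)
items = {i for t in transactions for i in t}
clusters = min_cut_bisection(items, edges)
print([sorted(c) for c in clusters])                # [['a', 'b', 'c'], ['x', 'y', 'z']]
print(assign_transactions(transactions, clusters))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

Because no frequent item-set mixes the two groups, the split {a, b, c} / {x, y, z} cuts no hyperedge, and each transaction lands in the cluster whose items it contains. A real partitioner would replace the exhaustive search, and could be applied recursively to obtain more than two clusters.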