Finding Frequent Patterns in a Large Sparse Graph

Michihiro Kuramochi and George Karypis
Data Mining and Knowledge Discovery, 11(3): 243-271, 2005
Download Paper
Graph-based modeling has emerged as a powerful abstraction capable of capturing in a single and
unified framework many of the relational, spatial, topological, and other characteristics that are present in a
variety of datasets and application areas. Computationally efficient algorithms that find patterns corresponding
to frequently occurring subgraphs play an important role in developing data mining-driven methodologies for
analyzing the graphs resulting from such datasets. This paper presents two algorithms, based on the horizontal
and vertical pattern discovery paradigms, that find the connected subgraphs that have a sufficient number of edge-disjoint
embeddings in a single large undirected labeled sparse graph. These algorithms use three different methods
for determining the number of edge-disjoint embeddings of a subgraph and employ novel algorithms for candidate
generation and frequency counting, which allow them to operate on datasets with different characteristics and to
quickly prune unpromising subgraphs. Experimental evaluation on real datasets from various domains show that
both algorithms achieve good performance, scale well to sparse input graphs with more than 120,000 vertices or
110,000 edges, and significantly outperform previously developed algorithms.
This is an expanded version of the SDM04 paper.
Research topics: Data mining | Graph mining | Pattern discovery