An Efficient Algorithm for Discovering Frequent Subgraphs

Michihiro Kuramochi and George Karypis
IEEE Trans. Knowl. Data Eng. 16(9): 1038-1051, 2004
Over the years, frequent itemset discovery algorithms have been used to find interesting patterns in various application areas. However, as data mining techniques are increasingly applied to non-traditional domains, existing frequent pattern discovery approaches cannot be used, because the transaction framework that these algorithms assume cannot effectively model the datasets in these domains. An alternative way of modeling the objects in these datasets is to represent them as graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequently over the entire set of graphs. In this paper we present a computationally efficient algorithm, called FSG, for finding all frequent subgraphs in large graph databases. We experimentally evaluate the performance of FSG using a variety of real and synthetic datasets. Our results show that, despite the underlying complexity associated with frequent subgraph discovery, FSG is effective in finding all frequently occurring subgraphs in datasets containing over 100,000 graph transactions and scales linearly with respect to the size of the database.
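To make the problem statement concrete, the following minimal Python sketch counts in how many graph transactions each single-edge subgraph occurs and keeps those that meet a support threshold. It is not the FSG algorithm: it handles only the simplest possible patterns, which need no subgraph isomorphism test. The graph representation, the function names edge_pattern and frequent_single_edges, the min_support parameter, and the toy data are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy representation: each graph transaction is a pair
# (vertex_labels, edges), where vertex_labels maps vertex id -> label
# and edges is a list of (u, v, edge_label) tuples (undirected).


def edge_pattern(vertex_labels, u, v, edge_label):
    """Canonical form of a single-edge subgraph in an undirected labeled graph."""
    # Sorting the endpoint labels makes (C, O) and (O, C) the same pattern.
    a, b = sorted((vertex_labels[u], vertex_labels[v]))
    return (a, edge_label, b)


def frequent_single_edges(graphs, min_support):
    """Return single-edge patterns that occur in at least min_support graphs."""
    support = Counter()
    for vertex_labels, edges in graphs:
        # A set ensures each pattern is counted at most once per transaction.
        patterns = {edge_pattern(vertex_labels, u, v, lbl) for u, v, lbl in edges}
        support.update(patterns)
    return {p: n for p, n in support.items() if n >= min_support}


# Illustrative data only, not taken from the paper's datasets.
graphs = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1, "single"), (1, 2, "double")]),
    ({0: "C", 1: "O", 2: "N"}, [(0, 1, "double"), (0, 2, "single")]),
    ({0: "C", 1: "C"}, [(0, 1, "single")]),
]
result = frequent_single_edges(graphs, min_support=2)
```

Support here is counted per transaction rather than per occurrence, matching the abstract's notion of subgraphs that occur frequently over the set of graphs. Larger patterns would additionally require candidate generation and isomorphism checks, which this sketch deliberately omits.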
This paper describes an improved version of the FSG algorithm, which was initially presented in the ICDM 2001 paper. This improved algorithm is the one incorporated in PAFI.
Research topics: Cheminformatics | Data mining | Graph mining | PAFI | Pattern discovery