PAFI Software Package for Finding Frequent Patterns in Diverse Datasets

Current version: 1.0.1, 7/7/2003

PAFI is a set of programs for finding frequent patterns in large and diverse databases.

The current release of PAFI includes three different pattern discovery programs:

LPMiner
It is designed to find patterns corresponding to itemsets in a transaction database.
SLPMiner
It is designed to find patterns corresponding to sub-sequences in a sequential database.
FSG
It is designed to find patterns corresponding to connected undirected subgraphs in an undirected graph database.

These programs can be used to mine a wide range of datasets arising in commercial, information retrieval, and scientific applications.

All three programs can be used to find patterns that satisfy a constant minimum support. A key feature of LPMiner and SLPMiner is that they can find long frequent patterns without finding a large number of short patterns that are often useless. This is achieved by using length-decreasing support constraints, where the minimum occurrence frequency of a pattern is given as a non-increasing function of pattern length.
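As a rough illustration of a length-decreasing support constraint, the Python sketch below defines a threshold that falls linearly with pattern length until it reaches a floor. The function names and all numeric values are assumptions made for this example; they are not part of LPMiner or SLPMiner.

```python
def min_support_count(length, start=100, slope=10, floor=20):
    """Hypothetical length-decreasing support threshold.

    Returns the minimum number of supporting transactions required
    for a pattern of the given length. The value never increases as
    the length grows, and it never drops below `floor`.
    """
    if length < 1:
        raise ValueError("pattern length must be at least 1")
    return max(floor, start - slope * (length - 1))


def is_frequent(length, support_count):
    """Check a pattern's support against the threshold for its length."""
    return support_count >= min_support_count(length)


# A pattern supported by 35 transactions is rejected at length 2
# (threshold 90) but accepted at length 8 (threshold 30).
short_ok = is_frequent(2, 35)
long_ok = is_frequent(8, 35)
```

With a constant minimum support of 30, every short pattern with 30 or more occurrences would be reported as well; the decreasing threshold keeps the long pattern while filtering those out.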

PAFI's pattern discovery programs provide three additional features. First, all three programs can generate maximal frequent patterns. A maximal frequent pattern is a frequent pattern that is not contained in any other frequent pattern. The number of maximal frequent patterns is generally much smaller than the number of all frequent patterns, which makes the pattern output files easier to read. Second, SLPMiner and FSG can generate transaction-ID lists (TID-lists) indicating which sequences or graph transactions support a particular frequent pattern. Third, all three programs can generate parent-children lists (PC-lists) that can be used to construct the frequent pattern lattice.
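To make the definition of a maximal frequent pattern concrete, the following Python sketch filters a small, made-up collection of frequent itemsets down to the maximal ones. It covers the itemset case only, where containment is the proper-subset relation; for sequences and graphs the containment test would be sub-sequence or subgraph matching instead. The function name and the sample data are assumptions for illustration and do not reflect how the PAFI programs are implemented.

```python
def maximal_patterns(frequent_itemsets):
    """Return the itemsets that are not proper subsets of any other itemset."""
    sets = [frozenset(p) for p in frequent_itemsets]
    # `p < q` on frozensets is true when p is a proper subset of q.
    return [p for p in sets if not any(p < q for q in sets)]


# Hypothetical frequent itemsets found by some miner.
frequent = [{"a"}, {"b"}, {"a", "b"}, {"c"}]
maximal = maximal_patterns(frequent)
```

Here `{"a"}` and `{"b"}` are dropped because both are contained in `{"a", "b"}`, while `{"c"}` is kept because no other frequent itemset contains it. The quadratic comparison is fine for a small example but would be too slow for large pattern sets.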