BAMBOO: Itemset Mining by Deeply Pushing the Length-Decreasing Support Constraint

Jianyong Wang and George Karypis
SIAM International Conference on Data Mining, 2004
Abstract
Previous studies have shown that mining frequent patterns under a length-decreasing support constraint helps remove uninteresting patterns. The underlying observation is that short patterns tend to be interesting if they have high support, whereas long patterns can still be very interesting even if their support is relatively low. However, a large number of non-closed (i.e., redundant) patterns still cannot be filtered out by simply applying the length-decreasing support constraint. As a result, a more desirable pattern discovery task is mining closed patterns under the length-decreasing support constraint.
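To make the two notions above concrete, the following is a minimal brute-force sketch, not the BAMBOO algorithm. It enumerates every itemset in a tiny database, keeps those whose support meets a length-dependent threshold, and then keeps only the closed ones. The toy transactions, the threshold function min_support, and all helper names are assumptions made up for illustration.

```python
from itertools import combinations

# Hypothetical toy database; each transaction is a set of items.
TRANSACTIONS = [
    {"a", "b", "c", "d"},
    {"a", "b", "c", "d"},
    {"a", "b"},
    {"a", "b"},
    {"a", "e"},
    {"a"},
]


def min_support(length):
    """Assumed length-decreasing threshold: 4, 3, 2, 2, ... for lengths 1, 2, 3, 4, ..."""
    return max(2, 5 - length)


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def is_closed(itemset, transactions):
    """An itemset is closed if no proper superset has the same support,
    i.e. it equals the intersection of all transactions that contain it."""
    covering = [t for t in transactions if itemset <= t]
    if not covering:
        return False
    return set.intersection(*covering) == itemset


def mine(transactions, threshold):
    """Exhaustively list itemsets meeting the length-dependent threshold,
    and the closed ones among them. Exponential; for illustration only."""
    items = sorted(set().union(*transactions))
    satisfying, closed = [], []
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            x = frozenset(combo)
            if support(x, transactions) >= threshold(k):
                satisfying.append(x)
                if is_closed(x, transactions):
                    closed.append(x)
    return satisfying, closed


satisfying, closed = mine(TRANSACTIONS, min_support)
print(len(satisfying), "itemsets satisfy the constraint")
print(len(closed), "of them are closed:", [sorted(x) for x in closed])
```

On this toy data the constraint alone still leaves redundant itemsets such as {b}, which has the same support as its superset {a, b}; the closedness filter removes them.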

In this paper we study how to push the length-decreasing support constraint deeply into closed itemset mining, a particularly challenging problem because the downward-closure property cannot be used to prune the search space. We therefore propose several pruning methods and optimization techniques to enhance the closed itemset mining algorithm, and develop an efficient algorithm, BAMBOO. An extensive performance study based on various length-decreasing support constraints and datasets with different characteristics shows that BAMBOO not only generates a more concise result set, but also runs orders of magnitude faster than several efficient pattern discovery algorithms, including CLOSET+, CFP-tree, and LPMiner. In addition, BAMBOO shows very good scalability with respect to database size.
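The loss of downward closure can be seen on a small example. The sketch below does not show any of BAMBOO's pruning methods; it only illustrates, on made-up data with an assumed threshold function, that an itemset can satisfy a length-decreasing support constraint while none of its proper subsets do, so pruning a search branch as soon as a short prefix fails its threshold would be unsound.

```python
from itertools import combinations

# Hypothetical toy database for illustration.
TRANSACTIONS = [
    {"a", "b", "c"},
    {"a", "b", "c"},
    {"a"},
    {"b"},
    {"d"},
    {"d"},
]


def min_support(length):
    """Assumed length-decreasing threshold: 4 for length 1, 3 for length 2, 2 beyond."""
    return {1: 4, 2: 3}.get(length, 2)


def support(itemset):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def satisfies(itemset):
    """True if the itemset meets the threshold for its own length."""
    return support(itemset) >= min_support(len(itemset))


def failing_subsets(itemset):
    """Proper non-empty subsets that do NOT meet their own threshold."""
    items = sorted(itemset)
    result = []
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            if not satisfies(frozenset(combo)):
                result.append(frozenset(combo))
    return result


abc = frozenset({"a", "b", "c"})
print("abc satisfies the constraint:", satisfies(abc))
print("subsets of abc that fail:", [sorted(s) for s in failing_subsets(abc)])
```

Here every subset has support at least as high as {a, b, c}, yet each one fails because shorter itemsets face a higher threshold.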

Research topics: Data mining | Pattern discovery