Efficient Closed Pattern Mining in the Presence of Tough Block Constraints

Krishna Gade, Jianyong Wang, and George Karypis
10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2004
Download Paper
In recent years, various constrained frequent pattern mining problem formulations and associated algorithms have been developed that enable the user to specify various itemset-based constraints that better capture the underlying application requirements and characteristics. In this paper we introduce a new class of block constraints that determine the significance of an itemset pattern by considering the dense block that is formed by the pattern's items and its associated set of transactions. Block constraints provide a natural framework by which a number of important problems can be specified and make it possible to solve numerous problems on binary and real-valued datasets. However, developing computationally efficient algorithms to find these block constraints poses a number of challenges as unlike the different itemset-based constraints studied earlier, these block constraints are tough as they are neither anti-monotone, monotone, nor convertible. To overcome this problem, we introduce a new class of pruning methods that can be used to significantly reduce the overall search space and make it possible to develop computationally efficient block constraint mining algorithms. We present an algorithm called CBMiner that takes advantage of these pruning methods to develop an algorithm for finding the closed itemsets that satisfy the block constraints. Our extensive performance study shows that CBMiner generates more concise result set and can be order(s) of magnitude faster than the traditional frequent closed itemset mining algorithms.
Research topics: Data mining | Pattern discovery