Scalabale Parallel Data Mining for Association Rules

Eui-Hong (Sam) Han, George Karypis and Vipin Kumar
IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 3, pp337 - 352, 2000
Download Paper
In this paper, we propose two new parallel formulations of the Apriori algorithm that is used for computing association rules.
These new formulations, IDD and HD, address the shortcomings of two previously proposed parallel formulations CD and DD. Unlike
the CD algorithm, the IDD algorithm partitions the candidate set intelligently among processors to efficiently parallelize the step of
building the hash tree. The IDD algorithm also eliminates the redundant work inherent in DD, and requires substantially smaller
communication overhead than DD. But IDD suffers from the added cost due to communication of transactions among processors. HD
is a hybrid algorithm that combines the advantages of CD and DD. Experimental results on a 128-processor Cray T3E show that HD
scales just as well as the CD algorithm with respect to the number of transactions, and scales as well as IDD with respect to increasing
candidate set size.
Research topics: Data mining | Parallel processing | Pattern discovery