Efficient Algorithms for Creating Product Catalogs

Michael Steinbach, George Karypis and Vipin Kumar
WebMining Workshop at SIAM 1st Data Mining Conference, 2001
Download Paper
For the purposes of this paper we define a catalog to be a promotional catalog, i.e., a collection of products (items) presented to a customer with the hope of encouraging a purchase. The single mailing problem addresses how to build a collection of catalogs and distribute them to customers (one per customer) so as to achieve an optimal outcome, e.g., the most profit. Each catalog is a subset of a given set of items and different catalogs may contain some of the same items, i.e., catalogs may overlap. A slightly more general, but important extension of the single mailing problem seeks the optimal set of catalogs when multiple mailings are allowed, i.e., multiple catalogs can be sent to each customer. Catalog creation has important applications for e-commerce and traditional brick-and-mortar retailers, especially when used with personalized recommender systems.

The catalog creation problem is NP complete and some (relatively expensive) approximation algorithms have recently been developed. In this paper we describe more efficient techniques for building catalogs and show that these algorithms outperform one of the previously suggested approaches. (Indeed, the techniques previously suggested are not feasible for realistic numbers of customers and catalogs.) Some of our techniques directly use the objective function, e.g., maximize profit, to find a locally optimal solution in an approach based on gradient ascent. However, by combining such techniques with the clustering of similar customers, better results can sometimes be obtained. We also analyze the performance of our algorithms with respect to a theoretical bound and show that, in some cases, their performance is close to optimal.

Research topics: Clustering | Data mining | E-commerce