Document Categorization and Query Generation on the World Wide Web Using WebACE

D. Boley, M. Gini, R. Gross, E. Han, K. Hastings, G. Karypis, V. Kumar, B. Mobasher, and J. Moore
AI Review, Vol. 11, pp 365 - 391, 1999
We present WebACE, an agent for exploring and categorizing documents on the World Wide Web based on a user profile. The heart of the agent is an unsupervised categorization of a set of documents, combined with a process for generating new queries that is used to search for new related documents and for filtering the resulting documents to extract the ones most closely related to the starting set. The document categories are not given a priori. We present the overall architecture and describe two novel algorithms which provide significant improvement over the Hierarchical Agglomerative Clustering and AutoClass algorithms.
Research topics: Classification | Clustering | Data mining | Information retrieval | Text mining