Data Mining

The goal of this project is to develop effective and computationally efficient algorithms for analyzing large volumes of data. The ultimate purpose of these analyses is to discover key, actionable information and to gain insight into the underlying processes or systems that generated the data (or that the data describe).

This emerging discipline is becoming increasingly important as advances in data collection have led to explosive growth in the amount of available data. Data mining algorithms are used extensively to analyze business, commerce, scientific, engineering, and security data, and they have substantially improved the effectiveness of applications in areas such as marketing, predictive modeling, life sciences, information retrieval, and engineering.

Our research initially focused on developing high-performance, scalable parallel algorithms for solving core data mining problems. In recent years, it has expanded to include fundamental data mining algorithms for data clustering, classification, pattern discovery, sequence mining, and graph mining, as well as their applications in information retrieval, collaborative filtering, and bioinformatics.

Our current research focuses on the following areas:

  • Algorithms for finding meaningful clusters in large sparse graphs, such as those arising in relational/social networks and the web.
  • Large-margin and kernel-based classification algorithms, with an emphasis on algorithms that can learn mappings to arbitrary output spaces.
  • Algorithms that can mine large and complex graphs.

Over the years, this research has been funded by a number of federal agencies, including ARL, NSF, and NIH.