Bioinformatics

Bioinformatics has emerged as an exciting research area that gives rise to numerous challenging computational problems, whose successful solution will ultimately affect every aspect of our everyday lives. It is currently one of the lab's main research thrust areas, and our work here aims primarily to develop and apply data-mining and knowledge-based techniques to problems arising in this field.

Our ongoing research has led to the development of clustering and classification algorithms suitable for analyzing gene expression data, DNA- and protein-sequence-based classification algorithms, highly accurate remote homology recognition and fold prediction algorithms, scalable clustering algorithms for protein sequences, and algorithms that predict various aspects of a protein's secondary and tertiary structure based on its primary sequence.

Many problems arising in bioinformatics can be formulated as classification or prediction problems whose goal is to derive higher-level knowledge from primary information. Examples include gene prediction, promoter identification, protein family assignment, gene functional assignment, secondary structure prediction, fold recognition, and tertiary structure prediction. Developing effective algorithms for these problems usually involves two steps. The first step is to identify the signals in the data that capture the key physical, chemical, and biological properties of the various objects and classes. The second step is to develop supervised machine learning algorithms that can properly model and exploit those signals to build accurate classifiers. Within this context, our research focuses on identifying the right set of signals for various problems, developing novel classification algorithms, and analyzing whether the datasets contain signals strong enough to allow for the effective use of computational techniques in the first place.
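As a purely illustrative sketch of this two-step view, and not one of the lab's algorithms, the Python below uses normalized k-mer frequencies as a stand-in "signal" and a nearest-centroid rule with cosine similarity as a stand-in supervised classifier. The function names, the toy sequences, and the labels `helix_like` and `strand_like` are all hypothetical and chosen only to make the example run.

```python
import math
from collections import Counter


def kmer_features(sequence, k=2):
    """Step 1 (signal extraction): normalized k-mer frequencies of a sequence."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero on short input
    return {kmer: n / total for kmer, n in counts.items()}


def train_centroids(labeled_sequences, k=2):
    """Step 2 (supervised learning): average the feature vectors of each class."""
    sums, sizes = {}, Counter()
    for sequence, label in labeled_sequences:
        sizes[label] += 1
        class_sum = sums.setdefault(label, Counter())
        for kmer, freq in kmer_features(sequence, k).items():
            class_sum[kmer] += freq
    return {
        label: {kmer: value / sizes[label] for kmer, value in class_sum.items()}
        for label, class_sum in sums.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict(centroids, sequence, k=2):
    """Assign the class whose centroid is most similar to the sequence."""
    features = kmer_features(sequence, k)
    return max(centroids, key=lambda label: cosine(features, centroids[label]))


# Toy, made-up training data (not real annotated proteins).
training = [
    ("AAAAGAAAAL", "helix_like"),
    ("ALAAKAAAEA", "helix_like"),
    ("VTVTVYVTVK", "strand_like"),
    ("TVYVTVEVTV", "strand_like"),
]
centroids = train_centroids(training)
print(predict(centroids, "AAKAAALAAA"))
print(predict(centroids, "VTVKVTVYVT"))
```

In practice, both the choice of signals and the learning algorithm would be far richer than this; the sketch only shows how the two steps are separated.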

Our current research in this area focuses on protein function and structure prediction. Some of the specific research projects are:

  • Improve the performance of remote homology recognition and fold prediction algorithms by designing novel and effective kernel methods that combine various observed and/or predicted signals.
  • Enhance the effectiveness of local structure prediction algorithms by designing structural alphabets that combine predictability with structure reproducibility.
  • Improve the performance of ab initio

Over the years, this research has been funded by a number of federal agencies, including ARL, NSF, and NIH.