Protein Structure Prediction using String Kernels

Huzefa Rangwala, Kevin DeRonne, and George Karypis
UMN CSE 06-005, 2006
With recent advances in large-scale sequencing technologies, protein sequence information has grown exponentially. Currently,
our ability to produce sequence information far outpaces the rate at which we can produce structural and functional information.
Consequently, researchers increasingly rely on computational techniques to extract useful information from known structures contained in
large databases, though such approaches remain incomplete. Unraveling the relationship between pure sequence information and
three-dimensional structure therefore remains one of the great fundamental problems in molecular biology.
In this report we present several ways in which researchers characterize the structural, functional, and evolutionary nature of
proteins. Specifically, we focus on three common prediction problems: secondary structure prediction, remote homology detection, and
fold prediction. We describe a class of methods that employ large margin classifiers with novel kernel functions to solve these problems,
and we support them with a thorough evaluation study.
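To make the idea of a kernel function over protein sequences concrete, here is a minimal sketch of a k-mer spectrum kernel, a well-known string kernel. It is an illustration only: the abstract does not specify the report's own kernels, so this is not the authors' method, and the function names and the default k=3 are assumptions chosen for the example.

```python
import math
from collections import Counter


def spectrum(seq, k):
    """Count every contiguous k-mer (substring of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Inner product of the k-mer count vectors of sequences a and b."""
    sa, sb = spectrum(a, k), spectrum(b, k)
    # Iterate over the smaller spectrum; missing k-mers count as zero.
    if len(sa) > len(sb):
        sa, sb = sb, sa
    return sum(count * sb[kmer] for kmer, count in sa.items())


def normalized_kernel(a, b, k=3):
    """Cosine-normalized kernel, so that a sequence scores 1.0 with itself."""
    denom = math.sqrt(spectrum_kernel(a, a, k) * spectrum_kernel(b, b, k))
    # Sequences shorter than k have an empty spectrum; define the value as 0.
    return spectrum_kernel(a, b, k) / denom if denom else 0.0


def gram_matrix(seqs, k=3):
    """Pairwise normalized kernel values for a list of sequences."""
    return [[normalized_kernel(s, t, k) for t in seqs] for s in seqs]
```

A matrix built this way could be handed to any large margin classifier that accepts a precomputed kernel, for example an SVM, with one row and column per training sequence.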
Research topics: Bioinformatics | Data mining