Building Multiclass Classifiers for Remote Homology Detection and Fold Recognition

Huzefa Rangwala and George Karypis
BMC Bioinformatics, Jan 15;23(2):e17-23, 2006
Abstract
Motivation
Protein remote homology prediction and fold recognition are central problems
in computational biology. Supervised learning algorithms based on support
vector machines (SVMs) are currently among the most effective methods for solving
these problems. However, these methods have primarily been used to solve binary
classification problems, and they have not been extensively applied to the more
general multiclass remote homology prediction and fold recognition problems.

Methods
We developed a number of methods for building SVM-based multiclass
classification schemes in the context of the SCOP protein classification.
These methods include schemes that directly build an SVM-based multiclass
model, schemes that employ a second-level learning approach to combine the
predictions generated by a set of binary SVM-based classifiers, and schemes
that build and combine binary classifiers for various levels of the SCOP
hierarchy beyond those defining the target classes.
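As a rough illustration of the second-level idea, the sketch below assumes each binary SVM has already produced a real-valued score for every example (the SVM training itself is omitted) and learns a linear combiner over the vector of those scores. The combiner here is a multiclass perceptron, a stand-in chosen for brevity and not necessarily the second-level learner used in the paper; the function names and the toy scores are hypothetical. Scores from binary models built for ancestral SCOP categories could be appended to each feature vector in the same way.

```python
from typing import List, Sequence


def argmax_class(scores: Sequence[float]) -> int:
    """One-vs-rest baseline: pick the class whose binary model scores highest.

    Ties go to the lowest class index (max() returns the first maximum).
    """
    return max(range(len(scores)), key=lambda k: scores[k])


def _predict_with(weights: List[List[float]], xb: Sequence[float]) -> int:
    """Score a bias-augmented feature vector with every class weight vector."""
    return argmax_class([sum(w * v for w, v in zip(row, xb)) for row in weights])


def train_second_level(
    score_vectors: List[List[float]],
    labels: List[int],
    n_classes: int,
    epochs: int = 20,
) -> List[List[float]]:
    """Hypothetical second-level learner: a multiclass perceptron whose input
    features are the outputs of the first-level binary classifiers."""
    dim = len(score_vectors[0]) + 1  # +1 for a constant bias feature
    weights = [[0.0] * dim for _ in range(n_classes)]
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(score_vectors, labels):
            xb = list(x) + [1.0]
            pred = _predict_with(weights, xb)
            if pred != y:
                # Move the true class toward x and the wrongly predicted class away.
                mistakes += 1
                for j in range(dim):
                    weights[y][j] += xb[j]
                    weights[pred][j] -= xb[j]
        if mistakes == 0:  # training data separated; stop early
            break
    return weights


def predict_second_level(weights: List[List[float]], x: Sequence[float]) -> int:
    """Predict a class from the first-level score vector x."""
    return _predict_with(weights, list(x) + [1.0])


if __name__ == "__main__":
    # Toy first-level scores [score_class0, score_class1]: the class-1 binary
    # model is biased high, so the raw argmax always predicts class 1.
    X = [[0.5, 0.8], [-0.5, 1.5], [0.4, 0.7], [-0.4, 1.6]]
    y = [0, 1, 0, 1]
    W = train_second_level(X, y, n_classes=2)
    print([argmax_class(x) for x in X])             # [1, 1, 1, 1]
    print([predict_second_level(W, x) for x in X])  # [0, 1, 0, 1]
```

The toy run shows why a second level can help: the raw binary scores are not calibrated against each other, so taking their argmax misclassifies half the examples, while a learned combination of the same scores separates them.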

Results
We performed a comprehensive study analyzing the different approaches using
four different datasets. Our results show that most of the proposed multiclass
SVM-based classification approaches are quite effective in solving the remote
homology prediction and fold recognition problems, and that the schemes that use
predictions from binary models constructed for ancestral categories within the
SCOP hierarchy tend to qualitatively improve the prediction results.

Comments
Supplementary material is available at http://bioinfo.cs.umn.edu/supplements/mc-fold/.
Research topics: Bioinformatics | Classification | Data mining