Gene Classification using Expression Profiles: A Feasibility Study

Michihiro Kuramochi and George Karypis
2nd IEEE International Conference on Bioinformatics and Bioengineering (BIBE), pp. 191-200, 2001
As various genome sequencing projects have been completed or are nearing completion, genome researchers are shifting their focus from structural genomics to functional genomics. Functional genomics represents the next phase, which expands biological investigation to studying the functionality of the genes of a single organism as well as studying and correlating the functionality of genes across many different organisms. Recently developed methods for monitoring genome-wide mRNA expression changes hold the promise of allowing us to inexpensively gain insights into the function of unknown genes. In this paper we focus on evaluating the feasibility of using supervised machine learning methods for determining the function of genes based solely on their expression profiles. We experimentally evaluate the performance of traditional classification algorithms such as support vector machines and k-nearest neighbors on the yeast genome, and present new approaches for classification that improve the overall recall with moderate reductions in precision. Our experiments show that the accuracies achieved for different classes vary dramatically. In analyzing these results, we show that the accuracy achieved for a class is highly dependent on whether or not the genes of that class were significantly active during the various experimental conditions, suggesting that gene expression profiles can become a viable alternative to sequence similarity searches provided that the genes are observed under a wide range of experimental conditions.
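To make the setting concrete, the following is a minimal sketch of k-nearest-neighbor classification of genes from their expression profiles, together with the per-class precision and recall measures mentioned above. It is not the paper's implementation. The correlation-based distance, the value of k, the class names, and the toy profiles are all illustrative assumptions.

```python
import math
from collections import Counter


def pearson_distance(a, b):
    """Return 1 - Pearson correlation of two equal-length profiles.

    Assumption: a correlation-based distance is used here only for
    illustration; the abstract does not state the metric.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # a flat profile is treated as uncorrelated
    return 1.0 - cov / (norm_a * norm_b)


def knn_predict(train_profiles, train_labels, query, k=3):
    """Predict a functional class by majority vote of the k nearest genes."""
    ranked = sorted(
        range(len(train_profiles)),
        key=lambda i: pearson_distance(train_profiles[i], query),
    )
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall(true_labels, predicted_labels, target):
    """Compute precision and recall for one functional class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == target and p == target)
    fp = sum(1 for t, p in pairs if t != target and p == target)
    fn = sum(1 for t, p in pairs if t == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: each row is one gene's expression level
# across four experimental conditions (not real measurements).
train_profiles = [
    [0.1, 0.5, 0.9, 1.4],
    [0.0, 0.4, 1.1, 1.3],
    [0.2, 0.6, 0.8, 1.5],
    [1.2, 0.9, 0.3, 0.1],
    [1.4, 0.8, 0.4, 0.0],
    [1.1, 1.0, 0.2, 0.2],
]
train_labels = ["class_A"] * 3 + ["class_B"] * 3

test_profiles = [[0.1, 0.3, 1.0, 1.2], [1.3, 0.7, 0.5, 0.1]]
test_labels = ["class_A", "class_B"]

predictions = [
    knn_predict(train_profiles, train_labels, q) for q in test_profiles
]
print(predictions)
print(precision_recall(test_labels, predictions, "class_A"))
```

A class whose genes show little activity across the measured conditions would yield nearly flat profiles, which a distance of this kind cannot separate well; this is consistent with the abstract's observation that accuracy depends on how active a class's genes were.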
Research topics: Bioinformatics | Classification | Data mining