Multi-Assay-Based SAR Models: Improving SAR Models by Incorporating Activity Information from Related Targets

Xia Ning, Huzefa Rangwala, and George Karypis
J. Chem. Inf. Modeling, 2009
Download Paper
Abstract
Structure-activity relationship (SAR) models are used to inform and to guide the iterative optimization of chemical leads, and they play a fundamental role in modern drug discovery. In this paper, we present a new class of methods for building SAR models, referred to as multi-assay based, that utilize activity information from different targets. These methods first identify a set of targets that are related to the target under consideration, and then they employ various machine learning techniques that utilize activity information from these targets in order to build the desired SAR model. We developed different methods for identifying the set of related targets, which take into account the primary sequence of the targets or the structure of their ligands, and we also developed different machine learning techniques that were derived by using principles of semi-supervised learning, multi-task learning, and classifier ensembles. The comprehensive evaluation of these methods shows that they lead to considerable improvements over the standard SAR models that are based only on the ligands of the target under consideration. On a set of 117 protein targets, obtained from PubChem, these multi-assay-based methods achieve a receiver-operating characteristic score that is, on the average, 7.0 -7.2% higher than that achieved by the standard SAR models. Moreover, on a set of targets belonging to six protein families, the multi-assay-based methods outperform chemogenomics based approaches by 4.33%.
Research topics: Cheminformatics | Data mining