AFGen: Fragment-based Descriptors for Chemical Compounds

Current version: 2.0.0, 5/12/2008

AFGen is a program that takes a set of chemical compounds as input and generates their vector-space representation based on the fragment-based descriptors they contain. The descriptor space consists of graph fragments of three different topologies: paths (PF), acyclic subgraphs (AF), and arbitrary-topology subgraphs (GF). This vector-based representation can be used for a variety of tasks in cheminformatics, including similarity search, virtual screening, and library design.
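To make the idea of a fragment-based vector-space representation concrete, the following is a minimal Python sketch of path-fragment (PF-style) counting on a small labeled molecular graph. It is an illustration only and not AFGen's implementation or interface: the function name path_fragments, the max_bonds parameter, the (atom list, bond list) input format, and the way each fragment is canonicalized are all assumptions made for this example.

```python
from collections import Counter


def path_fragments(atoms, bonds, max_bonds=3):
    """Count linear path fragments with 1..max_bonds bonds.

    atoms: list of atom labels, e.g. ["C", "C", "O"]
    bonds: list of (i, j, bond_label) tuples over atom indices
    Returns a Counter mapping a canonical label tuple to its frequency.
    (Hypothetical helper for illustration; not part of AFGen.)
    """
    # Build an undirected adjacency list that keeps bond labels.
    adj = {i: [] for i in range(len(atoms))}
    for u, v, label in bonds:
        adj[u].append((v, label))
        adj[v].append((u, label))

    counts = Counter()

    def extend(path, labels):
        n_bonds = len(path) - 1
        # Every undirected simple path is reached once from each end;
        # record it only in the direction where start index < end index.
        if n_bonds >= 1 and path[0] < path[-1]:
            forward = tuple(labels)
            backward = tuple(reversed(labels))
            counts[min(forward, backward)] += 1  # direction-independent key
        if n_bonds == max_bonds:
            return
        for nxt, bond_label in adj[path[-1]]:
            if nxt not in path:  # simple paths only, no repeated atoms
                extend(path + [nxt], labels + [bond_label, atoms[nxt]])

    for start in range(len(atoms)):
        extend([start], [atoms[start]])
    return counts


# Heavy-atom graph of ethanol (C-C-O); "-" denotes a single bond.
ethanol = path_fragments(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
for fragment, frequency in sorted(ethanol.items()):
    print("".join(fragment), frequency)
```

Each distinct fragment acts as one dimension of the descriptor space, and the resulting Counter is a sparse vector of fragment frequencies for the compound. Acyclic (AF) and arbitrary-topology (GF) fragments would require enumerating trees and general connected subgraphs, which this sketch does not attempt.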

These descriptors are effective at capturing the structural characteristics of chemical compounds. Experiments in the context of SVM-based classification and ranked retrieval show that they consistently, and with statistical significance, outperform previously developed schemes based on the widely used fingerprint and MACCS keys descriptors, as well as recently introduced descriptors obtained by mining and analyzing the structure of the molecular graphs.