The following files constitute AFGen's distribution:
AFGen
This is the Linux binary.
AFGenWin.exe
This is the MS Windows binary. It is a console program and should be executed from within a command-line window (e.g., cmd in Windows XP).
manual.html
This is AFGen's documentation (i.e., this file).
sample.sdf
This is a sample input file containing a set of compounds.
afpaper.pdf
This is a paper containing a comprehensive experimental evaluation of the descriptors generated by AFGen.
VERSION
This file contains the version number of the distribution.
LICENSE
This file contains the copyright notice and license information.
Usage
AFGen [options] <input file>
Input
AFGen's only required parameter is a file that stores the input compounds.
AFGen supports the SDF and Mol2 file formats, and the format is selected based
on the extension of the input file. If the file has an ".sdf"
extension, AFGen expects the compounds to be specified in SDF format, whereas
if the extension is ".mol2", AFGen assumes that the file is
in Mol2 format.
For more information on the SDF file format, visit MDL;
for the Mol2 format, visit TRIPOS.
Note that AFGen accepts only a single input file, which must store all the compounds to be analyzed.
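The extension rule can be mirrored in a small Python sketch, for example to check an input file before handing it to AFGen. The helper name input_format is hypothetical, and the sketch matches the extensions exactly as written above; whether AFGen itself also accepts upper-case extensions such as ".SDF" is not stated in this manual.

```python
from pathlib import Path

# Extension -> format, as described in the Input section.
# Matching is exact (case-sensitive); this is an assumption of the sketch.
SUPPORTED_FORMATS = {".sdf": "sdf", ".mol2": "mol2"}


def input_format(path: str) -> str:
    """Return 'sdf' or 'mol2' for `path`, or raise ValueError (hypothetical helper)."""
    ext = Path(path).suffix
    if ext not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported input extension {ext!r}; expected .sdf or .mol2")
    return SUPPORTED_FORMATS[ext]
```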
Output
The output consists of two files: fragfile
and descrfile. The
fragfile contains the fragments that were generated by AFGen, whereas
the descrfile contains the fragment-based
representation of each input compound (i.e., the descriptor representation).
The fragments are stored using the same format as the input compounds (i.e., SDF or Mol2). The name of the fragfile is derived from the name of the input file by appending "_frags.sdf" or "_frags.mol2" to the stem of the input filename.
Example 1:
If the input filename is "mycompounds.sdf", the name of the fragfile is "mycompounds_frags.sdf".
For labeling purposes, AFGen assigns to each of these fragments an identifier from 1 to N, where N is the total number of unique fragments that were generated.
The descriptor-based representation of the compounds is stored in the descrfile. The name of the descrfile is derived from the name of the input file by appending ".out" to the stem of the input filename.
Example 2:
If the input filename is "mycompounds.sdf", the name of the descrfile is "mycompounds.out".
The descrfile contains as many lines as there are compounds, and the ith line stores the descriptor-based representation of the ith
input compound. The descriptor-based representation of each compound is a comma-separated
list whose first entry is the compound's identifier (as specified in the input file), followed
by a list of (fragment-identifier, occurrence-frequency) pairs.
Example 3:
"Benzene",2,1,10,1,58,2,64,1
...
"Folic Acid",10,1,50,4
...
In this example, the compound "Benzene" contains fragments 2, 10, 58, etc., and their frequencies are 1, 1, and 2, respectively. The fragment identifiers correspond to the numerical identifiers assigned to these fragments in the fragfile (i.e., from 1 to N). Note that the occurrence frequency is simply the number of times each fragment occurs in the compound. Two occurrences are considered different if they differ in at least one edge.
Options
-ds {AF,TF,PF}
Specifies the type of fragments to be generated. The possible values are:
AF Acyclic Fragments (default)
TF Tree Fragments (only acyclic fragments consisting of trees)
PF Path Fragments (only acyclic fragments consisting of paths)
-lmin [1...]
Specifies the minimum number of bonds (i.e., length) of the generated fragments.
The default value is one.
-lmax [1...]
Specifies the maximum number of bonds of the generated fragments. Note that lmax must be greater than or equal
to lmin. The default value is seven.
-fmin [1...]
Specifies the minimum frequency that a fragment must have in order to be used as a
descriptor. The frequency of a fragment is the number of distinct
compounds in which it occurs. The default value is one (i.e., all fragments
are treated as descriptors).
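The difference between this compound-level frequency and the per-compound occurrence frequency stored in the descrfile can be illustrated with a short Python sketch. It assumes the descriptor data is already held as a dictionary mapping each compound identifier to its {fragment identifier: occurrence frequency} pairs; the function names are hypothetical, and this is an illustration of the -fmin rule, not AFGen's own implementation.

```python
from collections import Counter
from typing import Dict, Set

# compound identifier -> {fragment identifier: occurrence frequency}
Descriptors = Dict[str, Dict[int, int]]


def fragment_support(compounds: Descriptors) -> Counter:
    """Number of distinct compounds in which each fragment occurs."""
    support: Counter = Counter()
    for counts in compounds.values():
        # A compound adds 1 per fragment, however many times the fragment occurs in it.
        support.update(frag for frag, freq in counts.items() if freq > 0)
    return support


def descriptors_with_fmin(compounds: Descriptors, fmin: int = 1) -> Set[int]:
    """Fragments whose compound-level frequency is at least fmin."""
    return {frag for frag, n in fragment_support(compounds).items() if n >= fmin}


if __name__ == "__main__":
    # Fragment data taken from the two descrfile lines of Example 3.
    data = {"Benzene": {2: 1, 10: 1, 58: 2, 64: 1}, "Folic Acid": {10: 1, 50: 4}}
    print(sorted(descriptors_with_fmin(data, fmin=2)))  # [10]
```

With fmin=2 only fragment 10 survives, because it is the only fragment found in both compounds; fragment 58 occurs twice, but only within a single compound.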
-NoAtmLabels
This option forces AFGen to ignore the fine atom typing specified in
the input file (if any). If this option is used, then only the basic atom types are used
(e.g., P, N, O, etc.). This option applies only to input files that use
the Mol2 format, as the SDF format does not support fine atom typing. By
default AFGen uses the supplied atom typing.
-NoBndLabels
This option forces AFGen to ignore the bond typing specified in the input
file (if any). If this option is used, then all bonds are treated as
belonging to the same type. By default AFGen uses the supplied bond typing.
-ofile <outfstem>
Specifies the stem of the output files. The output files will be named
outfstem.out (descriptor file) and outfstem_frags.sdf or outfstem_frags.mol2
(fragment file, depending on the input format). If no output stem is
specified, the stem of the input file is used.
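The naming rules for the two output files, including the effect of -ofile, can be summarized in a Python sketch. The function output_files is hypothetical; it assumes the "_frags" suffix given in the Output section, and it assumes that a directory part in the input path is kept in the derived stem, which this manual does not specify.

```python
from pathlib import Path
from typing import Optional, Tuple


def output_files(input_path: str, ofile: Optional[str] = None) -> Tuple[str, str]:
    """Return (descrfile, fragfile) names for an AFGen run (hypothetical helper).

    The fragment file keeps the input's extension (.sdf or .mol2). Without
    -ofile the stem is the input path minus its extension; keeping any
    directory part is an assumption, not documented behavior.
    """
    src = Path(input_path)
    stem = ofile if ofile is not None else str(src.with_suffix(""))
    return f"{stem}.out", f"{stem}_frags{src.suffix}"
```

For instance, output_files("mycompounds.sdf") gives the names from Examples 1 and 2, and output_files("sample.sdf", "output") corresponds to running AFGen with -ofile output.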
Examples
> AFGen -ds AF -lmin 2 -lmax 6 -fmin 1 -ofile output sample.sdf
Generates all Acyclic Fragments containing between 2 and 6 bonds that occur in at least one compound. The generated fragments will be stored in the file output_frags.sdf, and the fragment-based representation of each compound will be stored in output.out.
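When AFGen is driven from a script, the documented options and their constraints (-ds is one of AF, TF, PF; -lmin and -fmin are at least one; -lmax is not smaller than -lmin) can be checked before the program is launched. The wrapper below is a hypothetical Python sketch: it only builds the argument list, and the binary name defaults to the Linux binary AFGen (AFGenWin.exe would be used on Windows).

```python
import shlex
from typing import List, Optional

# Values accepted by -ds according to the Options section.
FRAGMENT_TYPES = ("AF", "TF", "PF")


def afgen_command(input_file: str, ds: str = "AF", lmin: int = 1, lmax: int = 7,
                  fmin: int = 1, no_atm_labels: bool = False,
                  no_bnd_labels: bool = False, ofile: Optional[str] = None,
                  binary: str = "AFGen") -> List[str]:
    """Build an AFGen argument list (hypothetical helper), validating the documented constraints."""
    if ds not in FRAGMENT_TYPES:
        raise ValueError(f"-ds must be one of {', '.join(FRAGMENT_TYPES)}")
    if lmin < 1 or fmin < 1:
        raise ValueError("-lmin and -fmin must be at least 1")
    if lmax < lmin:
        raise ValueError("-lmax must be greater than or equal to -lmin")
    args = [binary, "-ds", ds, "-lmin", str(lmin), "-lmax", str(lmax),
            "-fmin", str(fmin)]
    if no_atm_labels:
        args.append("-NoAtmLabels")  # only meaningful for Mol2 input
    if no_bnd_labels:
        args.append("-NoBndLabels")
    if ofile is not None:
        args += ["-ofile", ofile]
    args.append(input_file)  # Usage: AFGen [options] <input file>
    return args


if __name__ == "__main__":
    cmd = afgen_command("sample.sdf", lmin=2, lmax=6, ofile="output")
    print(shlex.join(cmd))
    # AFGen -ds AF -lmin 2 -lmax 6 -fmin 1 -ofile output sample.sdf
```

The resulting list can be passed to subprocess.run(cmd, check=True) to execute AFGen without going through a shell.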
Once installed, the OpenBabel package can be used to perform such normalizations as follows:
When citing AFGen in your papers, please use the following reference:
The software may not be sold or redistributed without prior approval. One may make copies of the software for their use provided that the copies are not sold or distributed and are used under the same terms and conditions.
As unestablished research software, this code is provided on an "as is" basis without warranty of any kind, either expressed or implied. Downloading or executing any part of this software constitutes an implicit agreement to these terms. These terms and conditions are subject to change at any time without prior notice.