A review of FAst MEtabolizer 2 (FAME2)

Whilst much computational work is undertaken to support library design, virtual screening, hit selection and affinity optimisation, the reality is that the most challenging issues to resolve in drug discovery often revolve around absorption, distribution, metabolism and excretion (ADME). Whilst we can measure the levels of parent drug in various media, tracking metabolic fate is often a considerably more difficult proposition that requires significant resources. For this reason, the prediction of sites of metabolism has become a subject of current interest.

The difficulty of the problem should not be underestimated. There are many potential enzymic reaction types, those that act directly on the drug (phase I) and those that further functionalize metabolites (phase II), and any drug can be a substrate for multiple enzymes. Predictions involving QM calculations or docking to the flexible binding sites of cytochrome P450 enzymes require prohibitive computing resources.

FAME (DOI) is a collection of random forest models trained on a comprehensive and highly diverse data set of 20,000 small molecules annotated with their experimentally determined sites of metabolism, taken from multiple species (rat, dog and human). In addition, dedicated models are available to predict sites of metabolism for phase I and phase II processes. Remarkably, this is achieved using only seven easily calculated descriptors (Table 1): six interpretable atomic descriptors (encoding the element type, hybridization state, and electronic configuration of each atom) and one molecular descriptor (encoding the topological size of a molecule).

FAME 2 (DOI) builds on this work to improve accuracy. In addition, FAME 2 uses a slightly modified version of the visualisation developed by Patrik Rydberg and implemented in SMARTCyp using ChemDoodle Web Components.

It is really useful to have two sites of metabolism tools available that use contrasting methodologies. FAME 2 uses a curated dataset of experimentally determined metabolism data to build a machine learning model from simple descriptors. In contrast, SMARTCyp uses precomputed activation energies from density functional theory (DFT) calculations on model compounds. These are used to predict the reactivity of similar fragments within the target molecule, and the final score is modified to reflect accessibility to the active site of the different CYP450 isoforms. Improvements for N-oxidation of tertiary amines are also included, specifically an empirical correction for unlikely oxidations of tertiary alkylamines.
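The idea of combining a fragment reactivity with an accessibility correction can be sketched in a few lines of Python. This is a toy illustration and not SMARTCyp's code: the activation energies are made-up placeholders, and both the accessibility measure (how far an atom sits from the most distant atom, relative to the longest path in the molecule) and the weight of 8.0 are assumptions chosen purely to show the principle.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to every atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    return dist


def accessibility(adjacency):
    """Farthest bond distance from each atom, scaled by the longest path found."""
    farthest = {a: max(bond_distances(adjacency, a).values()) for a in adjacency}
    longest = max(farthest.values())
    return {a: (f / longest if longest else 1.0) for a, f in farthest.items()}


def rank_sites(adjacency, energies, weight=8.0):
    """Rank candidate atoms by energy minus weight * accessibility (lowest first).

    `weight` is a hypothetical value used for illustration only.
    """
    acc = accessibility(adjacency)
    scores = {a: energies[a] - weight * acc[a] for a in energies}
    return sorted(scores, key=scores.get)


# Hypothetical five-atom chain 0-1-2-3-4 with placeholder energies
# for three candidate atoms (not real DFT values).
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
energies = {0: 60.0, 2: 58.0, 4: 75.0}

print(rank_sites(chain, energies))              # ranking with the correction
print(rank_sites(chain, energies, weight=0.0))  # ranking on energy alone
```

In this made-up case the central atom has the lowest energy, but the more exposed terminal atom overtakes it once the accessibility term is applied, which is the kind of reordering the correction is meant to capture.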

In FAME 2, rather than the simple random forest machine learning algorithm used in the original method, an extremely randomised trees approach (DOI) is used, which is a computationally efficient classification algorithm. FAME used seven easily calculated 2D descriptors: six interpretable atomic descriptors (encoding the element type, hybridization state, and electronic configuration of each atom) and one molecular descriptor (encoding the topological size of a molecule). In contrast, FAME 2 uses circular descriptions of atoms and their environments. As the program's help message shows, it is possible to change the diameter of the atom encoding fingerprint from 1 to 6. The default, 'circCDKATF1', is a model based on the atom itself and its immediate neighbors (atoms at most one bond away).
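To make the idea of a circular atom environment concrete, here is a minimal Python sketch. It is not how FAME 2 computes its descriptors; it only shows how the neighbourhood around an atom grows as the number of bonds considered increases. The function name `atom_environment`, the adjacency-list representation and the simple element counts are all assumptions made for illustration.

```python
from collections import Counter, deque


def atom_environment(elements, adjacency, centre, max_bonds=1):
    """Count element symbols at each bond distance (0..max_bonds) from `centre`."""
    dist = {centre: 0}
    queue = deque([centre])
    while queue:
        atom = queue.popleft()
        if dist[atom] == max_bonds:
            continue  # do not expand beyond the requested depth
        for nbr in adjacency[atom]:
            if nbr not in dist:
                dist[nbr] = dist[atom] + 1
                queue.append(nbr)
    layers = [Counter() for _ in range(max_bonds + 1)]
    for atom, d in dist.items():
        layers[d][elements[atom]] += 1
    return layers


# Heavy atoms of ethanol as a toy graph: C0-C1-O2.
elements = ["C", "C", "O"]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

# One bond away: the methyl carbon only "sees" the neighbouring carbon.
print(atom_environment(elements, adjacency, centre=0, max_bonds=1))
# Two bonds away: the oxygen now enters the environment.
print(atom_environment(elements, adjacency, centre=0, max_bonds=2))
```

A wider environment distinguishes atoms that look identical at depth one, at the cost of a larger descriptor, which fits the speed versus accuracy trade-off between the smaller and larger models.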

The predictions are generated as a simple HTML page (shown below) which displays the structure of the compound with the predicted SoMs highlighted with yellow circles. Moving the cursor over the structure reveals the atom numbers that correspond to the numbers in the table.

FAME II Output

Produced: 2017-08-15_20-53-43.

Input file: [/Users/username/Desktop/fame2/example_compounds/tamoxifen.sdf].

Visualization:

I also used SMARTCyp to predict the sites of metabolism for tamoxifen. The results are very similar, and both tools predict the known routes of metabolism. In particular, they flag the CYP2D6-mediated 4-hydroxylation that gives the active metabolite 4-hydroxytamoxifen, as well as the demethylation sites.

It is also possible to use SMILES as input.

I looked at the influence of the different models used to generate predictions.

Whilst the default circCDKATF1 is the fastest, I found instances where circCDKATF6 gave more accurate results, as shown below.

Summary

When I first reviewed FAME2 there were a couple of minor bugs. When I reported them to the developers, the bugs were fixed and a new version of FAME2 was made available within a day, really impressive support! Unfortunately, unlike FAME1, FAME2 only predicts CYP450-mediated metabolism; apparently the non-CYP-mediated metabolism data was not available to the author.
