A recent publication described BioTransformer: a comprehensive computational tool for small molecule metabolism prediction and metabolite identification (DOI). A number of tools predict sites of metabolism on a molecule, and I’ve mentioned a couple, FAME and SMARTCyp, in the past. These packages flag potential metabolic hot spots (mainly for CYP-mediated metabolism) but don’t attempt to provide any information on the putative metabolites.

BioTransformer combines a machine learning approach with a knowledge-based approach to predict small molecule metabolism in human tissues (e.g. liver tissue), the human gut, and the environment (soil and water microbiota), via its metabolism prediction tool. In addition, BioTransformer provides information on secondary metabolism.

A comprehensive evaluation of BioTransformer showed that it was able to outperform two state-of-the-art commercially available tools (Meteor Nexus and ADMET Predictor), with precision and recall values up to 7 times better than those obtained for Meteor Nexus or ADMET Predictor on the same sets of pharmaceuticals, pesticides, phytochemicals or endobiotics under similar or identical constraints. 

BioTransformer is available as an open access command-line tool (GNU GENERAL PUBLIC LICENSE Version 2) or as a software library. It is freely available at https://bitbucket.org/djoumbou/biotransformerjar/. If you are not comfortable using the command line, it can be accessed via an open access RESTful application at www.biotransformer.ca, where you can simply paste in a SMILES string or upload an SDF file.

BioTransformer consists of a metabolism prediction tool (BMPT), and a metabolite identification tool (BMIT). The BMPT consists of five independent prediction modules called “transformers”, namely: (1) the Enzyme Commission based (EC-based) transformer, (2) the CYP450 (phase I) transformer, (3) the phase II transformer, (4) the human gut microbial transformer, and (5) the environmental microbial transformer.

For the prediction of metabolites, BioTransformer implements two approaches, a rule-based or knowledge-based approach, and a machine learning approach. BioTransformer’s knowledge-based system consists of three major components: (1) a biotransformation database (called MetXBioDB) containing detailed annotations of experimentally confirmed metabolic reactions, (2) a reaction knowledgebase containing generic biotransformation rules, preference rules, and other constraints for metabolism prediction, and (3) a reasoning engine that implements both generic and transformer-specific algorithms for metabolite prediction and selection.

The BMPT machine learning system uses a set of random forest and ensemble prediction models for the prediction of CYP450 substrate selectivity, and for the Phase II filtering of molecules. BioTransformer’s Metabolite Identification Tool builds on the BMPT to identify specific metabolites using mass spectrometry (MS) data, namely accurate mass or chemical formula information.

I downloaded it, and it is worth noting that you also need to download the database and support files folders and save them in the same folder as the .jar file.

The help can be called from the command line.

The help also provides a few examples and tips.

Running the application

I tried this out with the structure of Emend, which I also used when looking at FAME and SMARTCyp.


The whole process took around 90 seconds; the results were written to an SDF file and displayed in Vortex. As well as the putative metabolite structure, a range of other properties are calculated, including the InChIKey, which can be used to search many other databases, and the Major Isotopic Mass, which could be used in metabolite identification, together with a description of the reaction and the likely enzymes involved. For the prediction of CYP450 metabolism, BioTransformer makes use of CypReact, a tool for CYP450 substrate specificity prediction.

We can also use Diazepam as the query molecule and add the -a option, which searches PubChem for each product and annotates it with the CID and synonyms when available.

The results now include synonyms and the PubChem CID, which can be searched for further information on the structure.
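If a product has an InChIKey but no CID annotation, the same lookup can be done by hand against PubChem's PUG REST service. The following is only a minimal sketch: the function names are my own, it is not how BioTransformer performs its annotation, and `fetch_cids` needs network access.

```python
import json
from typing import List
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey: str) -> str:
    """Build the PUG REST URL that maps an InChIKey to PubChem CIDs."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey.strip())}/cids/JSON"


def parse_cids(response_text: str) -> List[int]:
    """Extract the CID list from a PUG REST JSON reply (empty if none)."""
    data = json.loads(response_text)
    return data.get("IdentifierList", {}).get("CID", [])


def fetch_cids(inchikey: str, timeout: float = 30.0) -> List[int]:
    """Query PubChem; an unknown InChIKey (HTTP 404) gives an empty list."""
    try:
        with urlopen(cid_lookup_url(inchikey), timeout=timeout) as resp:
            return parse_cids(resp.read().decode("utf-8"))
    except HTTPError as err:
        if err.code == 404:
            return []
        raise
```

Calling `fetch_cids` with the InChIKey copied from a result record returns a list of matching CIDs, which may be empty for a structure PubChem does not know.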

Running the process on Diazepam yielded an SDF file containing 151 structures, but on closer examination there were many duplicates and only 51 unique structures (I used this Vortex script to flag duplicate structures). This is because several reaction patterns can yield the same structure. Around 30 had a PubChem CID, suggesting these are previously identified metabolites.
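If you don't have Vortex, duplicates can also be removed with a few lines of Python by keeping only the first record for each InChIKey. This is a rough sketch and not the Vortex script mentioned above; it assumes the InChIKey is stored in an SD data field named `InChIKey`, so check the field name in your own output file first.

```python
def split_sdf_records(sdf_text):
    """Split SDF text into records using the '$$$$' delimiter lines."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # final record without a delimiter
        records.append("\n".join(current))
    return records


def read_field(record, field_name):
    """Return the value of an SD data field, or None if it is absent."""
    lines = record.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(">") and f"<{field_name}>" in line:
            values = []
            for value_line in lines[i + 1:]:
                if not value_line.strip():  # a blank line ends the value
                    break
                values.append(value_line.strip())
            return "\n".join(values) if values else None
    return None


def unique_by_field(sdf_text, field_name="InChIKey"):
    """Keep the first record per field value; keep records lacking the field."""
    seen, kept = set(), []
    for record in split_sdf_records(sdf_text):
        key = read_field(record, field_name)  # field name is an assumption
        if key is not None:
            if key in seen:
                continue
            seen.add(key)
        kept.append(record)
    return kept
```

Read the results file into a string, pass it to `unique_by_field`, and write the kept records back out joined with `$$$$` lines. Note that only the first reaction description is retained for each structure, so alternative routes to the same metabolite are dropped.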

Looking at the results in more detail

One thing that is readily obvious is that BioTransformer is very comprehensive. This means that it suggests potential metabolites that you may never actually see in vivo; it also does not give any information about the rate at which the metabolites might be formed or which are the most likely. This is invaluable for anyone involved in metabolite identification but might be a little daunting for a medicinal chemist simply wanting to know which site to address to block metabolism. However, as I mentioned, the total number of putative metabolites includes duplicate structures, and the number of unique structures will be much smaller.

To further aid metabolite ID, BioTransformer calculates the metabolite major isotopic weight and also provides the precursor major isotopic weight and precursor InChIKey. I can see that BioTransformer could easily be incorporated into a metabolite ID workflow and be used to elucidate metabolic pathways.
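To illustrate how the major isotopic mass might be used in such a workflow, here is a minimal sketch that matches an observed accurate mass against a list of predicted metabolites within a ppm tolerance. It is not BioTransformer's own identification tool; the function names, the 10 ppm default and the assumption of a singly charged [M+H]+ ion are mine.

```python
PROTON_MASS = 1.007276  # Da, used to convert an [M+H]+ m/z to a neutral mass


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def neutral_mass_from_mh(mz):
    """Neutral monoisotopic mass for a singly charged [M+H]+ ion (assumed)."""
    return mz - PROTON_MASS


def match_candidates(observed_mass, candidates, tolerance_ppm=10.0):
    """Return (name, mass, ppm) for candidates within the tolerance.

    candidates is an iterable of (name, major_isotopic_mass) pairs,
    for example read from the BioTransformer results file.
    """
    hits = []
    for name, mass in candidates:
        error = ppm_error(observed_mass, mass)
        if abs(error) <= tolerance_ppm:
            hits.append((name, mass, error))
    return sorted(hits, key=lambda hit: abs(hit[2]))


if __name__ == "__main__":
    # Illustrative masses calculated from the molecular formulas,
    # not copied from BioTransformer output.
    predicted = [
        ("C15H11ClN2O", 270.055991),
        ("C15H11ClN2O2", 286.050906),
        ("C16H13ClN2O2", 300.066556),
    ]
    observed = neutral_mass_from_mh(271.0633)  # hypothetical [M+H]+ peak
    for name, mass, error in match_candidates(observed, predicted):
        print(f"{name}\t{mass:.4f}\t{error:+.2f} ppm")
```

Isomeric metabolites share the same mass, so a match only narrows the list of candidates; fragmentation data or retention time would still be needed to tell them apart.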

Last Updated 4 May 2019
