Vortex script using BioTransformer

Prediction of the metabolism of small molecules is very challenging and so having a variety of different tools is always useful. I’ve previously written Vortex scripts to use SMARTCyp and FAME2 and BioTransformer [DOI] is another useful tool to have in the toolbox.

BioTransformer 3.0 is a software tool that predicts small molecule metabolism in mammals, their gut microbiota, as well as the soil/aquatic microbiota. Moreover, it can also assist scientists in the identification of metabolites, which is based on the metabolism prediction. BioTransformer uses both a knowledge-based approach and a machine learning based approach to predict small molecules metabolism. In particular, BioTransformer consists of five independent modules: EC-based, CYP450, Phase II, Human Gut Microbial and Environmental Microbial.

Whilst it is possible to access BioTransformer via the web server (https://biotransformer.ca) I would not submit confidential structures however. Instead it is useful to be able to run the models locally, whilst you could do this from the command line this Vortex script does make things a little easier.

The source code can be downloaded from here and the download should look like this.

You can view details of of how to run it

 /usr/bin/java -jar '/Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar' -h

1	/usr/bin/java -jar '/Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar' -h

usage:
java -jar biotransformer-4.0.jar [-a] [-b <BioTransformer Option>] [-cm
       <CYP450 Prediction Mode>] [-f <Formulas>] [-h] [-icsv <csv Input>]
       [-imol <MOL Input>] [-isdf <Sdf Input>] [-ismi <SMILES Input>] [-k
       <BioTransformer Task>] [-m <Masses>] [-mt <mass threshold for input
       substrates>] [-multiThread <MultiThread input>] [-ocsv <Csv
       Output>] [-osdf <Sdf Output>] [-pm <PhaseII Prediction Mode>] [-q
       <BioTransformer Sequence Option>] [-s <Number of steps>] [-t <Mass
       Tolerance>] [-useDB <Use retrieving from database feature>]
       [-useSub <Use the first 14 characters of the InChIkey when
       retrieving from database feature>]

This is the version 4.0 of BioTransformer. BioTransformer is a software
tool that predicts small molecule metabolism in mammals, their gut
microbiota, as well as the soil/aquatic microbiota. BioTransformer also
assists scientists in metabolite identification, based on the metabolism
prediction.

 -a,--annotate
 Search PuChem for each product, and store with CID and synonyms, when
 available.
 -b,--btType <BioTransformer Option>
 The type of description: Type of biotransformer - EC-based  (ecbased),
 CYP450 (cyp450), Phase II (phaseII), Human gut microbial (hgut), human
 super transformer* (superbio, or allHuman), Environmental microbial
 (envimicro)**.
 If option -m is enabled, the only valid biotransformer types are
 allHuman, superbio and env.
 -cm,--cyp450mode <CYP450 Prediction Mode>
 Specify the CYP450 predictoin Mode here: 1) CypReact + BioTransformer
 rules; 2) CyProduct only; 3) Combined: CypReact + BioTransformer rules +
 CyProducts.
 Default mode is 1.
 -f,--formulas <Formulas>
 Semicolon-separated list of formulas of compounds to identify
 -h,--help
 Prints the usage.
 -icsv,--csvinput <csv Input>
 The input, which can be an csv file.
 -imol,--molinput <MOL Input>
 The input, which can be a Mol file
 -isdf,--sdfinput <Sdf Input>
 The input, which can be an SDF file.
 -ismi,--ismiles <SMILES Input>
 The input, which can be a SMILES string
 -k,--task <BioTransformer Task>
 The task to be permed: pred for prediction, or cid for compound
 identification
 -m,--masses <Masses>
 Semicolon-separated list of masses of compounds to identify
 -mt,--massThreshold <mass threshold for input substrates>
 Specify the mass threshold used to validate input substrates
 Default massThreshold is 1500Da.
 -multiThread <MultiThread input>
 Please input the parameters using the specific format to run this
 multi-thread feature
 -ocsv,--csvoutput <Csv Output>
 Select this option to return CSV output(s). You must enter an output
 filename
 -osdf,--sdfoutput <Sdf Output>
 Select this option to return SDF output(s). You must enter an output
 filename
 -pm,--phaseIImode <PhaseII Prediction Mode>
 Specify the PhaseII predictoin Mode here: 1) BioTransformer rules; 2)
 PhaseII predictor only; 3) Combined: PhaseII predictor + BioTransformer
 rules.
 Default mode is 1.
 -q,--bsequence <BioTransformer Sequence Option>
 Define an ordered sequence of biotransformer/nr_of_steps to apply. Choose
 only from the following BioTranformer Types: allHuman, cyp450, ecbased,
 env, hgut, and phaseII. For instance, the following string representation
 describes a sequence of 2 steps of CYP450 metabolism, followed by 1 step
 of Human Gut metabolism, 1 step of Phase II, and 1 step of Environmental
 Microbial Degradation:
 'cyp450:2; hgut:1; phaseII:1; env:1'
 -s,--nsteps <Number of steps>
 The number of steps for the prediction. This option can be set by the
 user for the EC-based, CYP450, Phase II, and Environmental microbial
 biotransformers. The default value is 1.
 -t,--mTolerance <Mass Tolerance>
 Mass tolerance for metabolite identification (default is 0.01).
 -useDB <Use retrieving from database feature>
 Please specify if you want to enable the retrieving from database
 feature.
 -useSub <Use the first 14 characters of the InChIkey when retrieving from
 database feature>   Please specify if you want to enable the using first
 14 characters of InChIKey when retrieving from database feature.

(* ) While the 'superbio' option runs a set number of transformation steps
in a pre-defined order (e.g. deconjugation first, then
Oxidation/reduction, etc.), the 'allHuman' option predicts all possible
metabolites from any applicable reaction(Oxidation, reduction,
(de-)conjugation) at each step.(** ) For the environmental microbial
biodegradation, all reactions (aerobic and anaerobic) are reported, and
not only the aerobic biotransformations (as per default in the EAWAG
BBD/PPS system).

usage:

java -jar biotransformer-4.0.jar [-a] [-b <BioTransformer Option>] [-cm

<CYP450 Prediction Mode>] [-f <Formulas>] [-h] [-icsv <csv Input>]

[-imol <MOL Input>] [-isdf <Sdf Input>] [-ismi <SMILES Input>] [-k

<BioTransformer Task>] [-m <Masses>] [-mt <mass threshold for input

substrates>] [-multiThread <MultiThread input>] [-ocsv <Csv

Output>] [-osdf <Sdf Output>] [-pm <PhaseII Prediction Mode>] [-q

<BioTransformer Sequence Option>] [-s <Number of steps>] [-t <Mass

Tolerance>] [-useDB <Use retrieving from database feature>]

[-useSub <Use the first 14 characters of the InChIkey when

retrieving from database feature>]

This is the version 4.0 of BioTransformer. BioTransformer is a software

tool that predicts small molecule metabolism in mammals, their gut

microbiota, as well as the soil/aquatic microbiota. BioTransformer also

assists scientists in metabolite identification, based on the metabolism

prediction.

-a,--annotate

Search PuChem for each product, and store with CID and synonyms, when

available.

-b,--btType <BioTransformer Option>

The type of description: Type of biotransformer - EC-based (ecbased),

CYP450 (cyp450), Phase II (phaseII), Human gut microbial (hgut), human

super transformer* (superbio, or allHuman), Environmental microbial

(envimicro)**.

If option -m is enabled, the only valid biotransformer types are

allHuman, superbio and env.

-cm,--cyp450mode <CYP450 Prediction Mode>

Specify the CYP450 predictoin Mode here: 1) CypReact + BioTransformer

rules; 2) CyProduct only; 3) Combined: CypReact + BioTransformer rules +

CyProducts.

Default mode is 1.

-f,--formulas <Formulas>

Semicolon-separated list of formulas of compounds to identify

-h,--help

Prints the usage.

-icsv,--csvinput <csv Input>

The input, which can be an csv file.

-imol,--molinput <MOL Input>

The input, which can be a Mol file

-isdf,--sdfinput <Sdf Input>

The input, which can be an SDF file.

-ismi,--ismiles <SMILES Input>

The input, which can be a SMILES string

-k,--task <BioTransformer Task>

The task to be permed: pred for prediction, or cid for compound

identification

-m,--masses <Masses>

Semicolon-separated list of masses of compounds to identify

-mt,--massThreshold <mass threshold for input substrates>

Specify the mass threshold used to validate input substrates

Default massThreshold is 1500Da.

-multiThread <MultiThread input>

Please input the parameters using the specific format to run this

multi-thread feature

-ocsv,--csvoutput <Csv Output>

Select this option to return CSV output(s). You must enter an output

filename

-osdf,--sdfoutput <Sdf Output>

Select this option to return SDF output(s). You must enter an output

filename

-pm,--phaseIImode <PhaseII Prediction Mode>

Specify the PhaseII predictoin Mode here: 1) BioTransformer rules; 2)

PhaseII predictor only; 3) Combined: PhaseII predictor + BioTransformer

rules.

Default mode is 1.

-q,--bsequence <BioTransformer Sequence Option>

Define an ordered sequence of biotransformer/nr_of_steps to apply. Choose

only from the following BioTranformer Types: allHuman, cyp450, ecbased,

env, hgut, and phaseII. For instance, the following string representation

describes a sequence of 2 steps of CYP450 metabolism, followed by 1 step

of Human Gut metabolism, 1 step of Phase II, and 1 step of Environmental

Microbial Degradation:

'cyp450:2; hgut:1; phaseII:1; env:1'

-s,--nsteps <Number of steps>

The number of steps for the prediction. This option can be set by the

user for the EC-based, CYP450, Phase II, and Environmental microbial

biotransformers. The default value is 1.

-t,--mTolerance <Mass Tolerance>

Mass tolerance for metabolite identification (default is 0.01).

-useDB <Use retrieving from database feature>

Please specify if you want to enable the retrieving from database

feature.

-useSub <Use the first 14 characters of the InChIkey when retrieving from

database feature> Please specify if you want to enable the using first

14 characters of InChIKey when retrieving from database feature.

(* ) While the 'superbio' option runs a set number of transformation steps

in a pre-defined order (e.g. deconjugation first, then

Oxidation/reduction, etc.), the 'allHuman' option predicts all possible

metabolites from any applicable reaction(Oxidation, reduction,

(de-)conjugation) at each step.(** ) For the environmental microbial

biodegradation, all reactions (aerobic and anaerobic) are reported, and

not only the aerobic biotransformations (as per default in the EAWAG

BBD/PPS system).

The format for accessing the prediction tool is

java -jar /Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar -k pred -b allHuman -ismi "O[C@@H]1CC2=C(O)C=C(O)C=C2O[C@@H]1C1=CC=C(O)C(O)=C1" -osdf outputfile  -s 2 -m "292.0946;304.0946" -t 0.01 -a

1	java -jar /Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar -k pred -b allHuman -ismi "O[C@@H]1CC2=C(O)C=C(O)C=C2O[C@@H]1C1=CC=C(O)C(O)=C1" -osdf outputfile -s 2 -m "292.0946;304.0946" -t 0.01 -a

Vortex Script

# BioTransformer
# BioTransformer 3.0  A Web Server for Accurately Predicting Metabolic Transformation  Products [Submitted in Nucleic Acids Research, Webserver Issue.Apr.2022]
# Wishart DS, Tian S, Allen D, Oler E, Peters H, Lui VW, Gautam V, Djoumbou Feunang Y, Greiner R, Metz TO;
# doi: https://doi.org/10.1093/nar/gkac313


import sys
import com.dotmatics.vortex.util.Util as Util
import subprocess
import csv


import tempfile

smiles = ''

col = vtable.findColumnWithName('Structure', 0)

if (col == None):
   col = vtable.findColumnWithName('SMILES', 0)
   if (col == None):
      vortex.alert('Load a workspace with a Structure or SMILES column please.')
      quit()
   else:
      inputsmiles = col.getValueAsString(cell_row)
else:
   inputsmiles = vortex.getMolProperty(vtable.getMolFileManager().getMolFileAtRow(cell_row), 'SMILES')
inputsmiles = inputsmiles.encode('utf-8')
#for testing
#vortex.alert('SMILES = '+inputsmiles)


	
# Get the list of column names for the table
columnNames = vtable.getColumnNames()

# Create a set of tabs for the different parts of the user interface
tabs = javax.swing.JTabbedPane()

# Create the main part of the user interface
content = javax.swing.JPanel()

layout.fill(content, javax.swing.JLabel("ID Column"), 0, 0)

columnID = javax.swing.JComboBox(columnNames)
	
layout.fill(content, columnID, 1, 0)

#Arguments input
Transformation = ["Phase 1 (CYP450)", "EC-based", "Phase II", "Human Gut", "Env Microbial", "AllHuman", "SuperBio", "MultiBo", "AbioticBio"]
CYP450mode = ["Combined", "RuleBased", "CyProduct"]
Iterations = ["1", "2", "3"]

#transformation input	
transformations = javax.swing.JComboBox(Transformation)

transformations.setSelectedIndex(5)
	
layout.fill(content, transformations, 1, 5)

layout.fill(content, javax.swing.JLabel("Transformation"), 0, 5)

#CYP450 mode
CYP450 = javax.swing.JComboBox(CYP450mode)

CYP450.setSelectedIndex(0)
	
layout.fill(content, CYP450, 1, 10)

layout.fill(content, javax.swing.JLabel("CYP450 mode"), 0, 10)

#Iterations
num_iterations = javax.swing.JComboBox(Iterations)

num_iterations.setSelectedIndex(0)
	
layout.fill(content, num_iterations, 1, 20)

layout.fill(content, javax.swing.JLabel("Number Iterations"), 0, 20)



# Add the column content to the tab panel
tabs.addTab("Arguments", content)


# show the dialog and process the input if the user presses ok
ret = vortex.showInDialog(tabs, "Calculate BioTransformations")

if ret == vortex.OK:
	input_idx = columnID.getSelectedIndex()
	colID = vtable.getColumn(input_idx)
	taskID = colID.getValueAsString(cell_row)
	#vortex.alert(taskID)
	biotrans_index = transformations.getSelectedIndex()
	biotrans = Transformation[biotrans_index]
	#vortex.alert(biotrans)
	numSteps_index = num_iterations.getSelectedIndex()
	numSteps = Iterations[numSteps_index]
	#vortex.alert(numSteps)
	

tmpdir = tempfile.mkdtemp()
outputfile = tmpdir + "/" + taskID + '.sdf'
#for testing
#outputfile = '/Users/chrisswain/Public/'+ taskID + '.sdf'


#Demo command
#java -jar /Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar -k pred -b allHuman -ismi "O[C@@H]1CC2=C(O)C=C(O)C=C2O[C@@H]1C1=CC=C(O)C(O)=C1" -osdf outputfile  -s 2 -m "292.0946;304.0946" -t 0.01 -a

# You can edit these arguments
task = "pred"
#biotrans = "allHuman"
#numSteps = "2"


#Need path to jar file
subprocess.call(['/usr/bin/java', '-jar', '/Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar', '-k', task, '-b', biotrans, '-ismi', inputsmiles, '-osdf', outputfile,  '-s', numSteps], cwd='Users/chrisswain/Projects/BioTransformer')

import com.dotmatics.vortex.table.VortexTableModel as vtm	
#vortex.alert(outputfile)
vortex.loadAnyFile(outputfile)

vtable.fireTableStructureChanged()

100

101

102

103

104

105

106

107

108

109

110

111

112

113

114

115

116

117

118

119

120

121

122

123

124

125

# BioTransformer

# BioTransformer 3.0 A Web Server for Accurately Predicting Metabolic Transformation Products [Submitted in Nucleic Acids Research, Webserver Issue.Apr.2022]

# Wishart DS, Tian S, Allen D, Oler E, Peters H, Lui VW, Gautam V, Djoumbou Feunang Y, Greiner R, Metz TO;

# doi: https://doi.org/10.1093/nar/gkac313

import sys

import com.dotmatics.vortex.util.Util as Util

import subprocess

import csv

import tempfile

smiles = ''

col = vtable.findColumnWithName('Structure', 0)

if (col == None):

col = vtable.findColumnWithName('SMILES', 0)

if (col == None):

vortex.alert('Load a workspace with a Structure or SMILES column please.')

quit()

else:

inputsmiles = col.getValueAsString(cell_row)

else:

inputsmiles = vortex.getMolProperty(vtable.getMolFileManager().getMolFileAtRow(cell_row), 'SMILES')

inputsmiles = inputsmiles.encode('utf-8')

#for testing

#vortex.alert('SMILES = '+inputsmiles)

# Get the list of column names for the table

columnNames = vtable.getColumnNames()

# Create a set of tabs for the different parts of the user interface

tabs = javax.swing.JTabbedPane()

# Create the main part of the user interface

content = javax.swing.JPanel()

layout.fill(content, javax.swing.JLabel("ID Column"), 0, 0)

columnID = javax.swing.JComboBox(columnNames)

layout.fill(content, columnID, 1, 0)

#Arguments input

Transformation = ["Phase 1 (CYP450)", "EC-based", "Phase II", "Human Gut", "Env Microbial", "AllHuman", "SuperBio", "MultiBo", "AbioticBio"]

CYP450mode = ["Combined", "RuleBased", "CyProduct"]

Iterations = ["1", "2", "3"]

#transformation input

transformations = javax.swing.JComboBox(Transformation)

transformations.setSelectedIndex(5)

layout.fill(content, transformations, 1, 5)

layout.fill(content, javax.swing.JLabel("Transformation"), 0, 5)

#CYP450 mode

CYP450 = javax.swing.JComboBox(CYP450mode)

CYP450.setSelectedIndex(0)

layout.fill(content, CYP450, 1, 10)

layout.fill(content, javax.swing.JLabel("CYP450 mode"), 0, 10)

#Iterations

num_iterations = javax.swing.JComboBox(Iterations)

num_iterations.setSelectedIndex(0)

layout.fill(content, num_iterations, 1, 20)

layout.fill(content, javax.swing.JLabel("Number Iterations"), 0, 20)

# Add the column content to the tab panel

tabs.addTab("Arguments", content)

# show the dialog and process the input if the user presses ok

ret = vortex.showInDialog(tabs, "Calculate BioTransformations")

if ret == vortex.OK:

input_idx = columnID.getSelectedIndex()

colID = vtable.getColumn(input_idx)

taskID = colID.getValueAsString(cell_row)

#vortex.alert(taskID)

biotrans_index = transformations.getSelectedIndex()

biotrans = Transformation[biotrans_index]

#vortex.alert(biotrans)

numSteps_index = num_iterations.getSelectedIndex()

numSteps = Iterations[numSteps_index]

#vortex.alert(numSteps)

tmpdir = tempfile.mkdtemp()

outputfile = tmpdir + "/" + taskID + '.sdf'

#for testing

#outputfile = '/Users/chrisswain/Public/'+ taskID + '.sdf'

#Demo command

#java -jar /Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar -k pred -b allHuman -ismi "O[C@@H]1CC2=C(O)C=C(O)C=C2O[C@@H]1C1=CC=C(O)C(O)=C1" -osdf outputfile -s 2 -m "292.0946;304.0946" -t 0.01 -a

# You can edit these arguments

task = "pred"

#biotrans = "allHuman"

#numSteps = "2"

#Need path to jar file

subprocess.call(['/usr/bin/java', '-jar', '/Users/chrisswain/Projects/BioTransformer/BioTransformer3.0_20230525.jar', '-k', task, '-b', biotrans, '-ismi', inputsmiles, '-osdf', outputfile, '-s', numSteps], cwd='Users/chrisswain/Projects/BioTransformer')

import com.dotmatics.vortex.table.VortexTableModel as vtm

#vortex.alert(outputfile)

vortex.loadAnyFile(outputfile)

vtable.fireTableStructureChanged()

The Vortex script needs to placed in the /Users/username/vortex/scripts/Vortex_Add-ons/context folder as shown below

If you now right-click on the structure in a workspace you should see something like this menu structure.

If you now click on BioTransformer3 a dialog box will open

For the dropdown menu “ID Column” choose the field that contains the molecule identifier.

From the “Transformation” dropdown the default is “AllHuman”, but there are options to focus on “Phase I”, “Phase II” or microbial etc.

The “CYP450” allows the user to choose the CYP rule-based prediction mode.

“Number of iterations” is the number of steps to include, Phase 1 only, or to also include phase 2 and subsequent reactions.

There are a number of other options but I decided to try and keep it as simple as possible but still useful. If you click “OK” it will take a few mins to run depending on the complexity of the output.

Using this example.

The output look something like this (I’ve hidden some of the empty columns), it contains molecular identifiers (InChi, InChiKey), calculated properties (MWt. Mol Formula) and notes about the metabolic reaction and enzymes that might be involved.

The script can be downloaded from here

BioTransformer3.vpy Download

With the InChIKey for potential metabolites defined it is also possible look up further information on the molecules, for example you can search using Un1Chem (this is an external search). https://macinchem.org/2023/03/11/vortex-script-for-flexible-search-using-un1chem/.

This will add information from a number of additional datasorces.

Bluesky Discussion

View on Bluesky

No replies yet. Be the first to comment on Bluesky!

Vortex script using BioTransformer

Vortex Script

Bluesky Discussion

Related Posts

BBEdit updated

AI Club for Biomedicine