An earlier script showed how to calculate molecular similarity between the first molecule in the file and all the other molecules using OpenBabel as shown below.
    PROMPT> babel mysmiles.smi mymols.sdf -ofpt
    MOL_00000067 Tanimoto from first mol = 0.0888889
    MOL_00000083 Tanimoto from first mol = 0.0869565
    MOL_00000105 Tanimoto from first mol = 0.0888889
    MOL_00000296 Tanimoto from first mol = 0.0714286
    MOL_00000320 Tanimoto from first mol = 0.0888889
This is fine when the first structure is, say, the active lead structure and the other molecules are candidate compounds for evaluation derived from substructure or pharmacophore searches, docking, etc. On many occasions, however, you may simply want to calculate the similarity to a probe structure that is not part of the file. To do this in Vortex we can capture the structural query from the built-in sketcher. Vortex supports a variety of third-party chemical drawing packages as well as its own in-house package, Elemental, but by capturing the content of the structural query box we can ignore which package was used to generate it.
This command captures the query as a MolFile:
    vtable.findColumnWithName("Structure", 0).getChemLink().getMolFile()
whilst this will get the structure in SMARTS format
    vtable.findColumnWithName("Structure", 0).getChemLink().getSmarts()
Unfortunately, if you run this from the console you get a rather unexpected result.
    smiQuery = vtable.findColumnWithName("Structure",0).getChemLink().getSmarts()
    smiQuery
    u'C{-2.6813,1.2964}=1C{-3.3957,0.8839}=C{-3.3957,0.0589}C{-2.6813,-0.3536}=C{-1.9668,0.0589}C{-1.9668,0.8839}1C{-1.2523,1.2964}(C{-0.5378,0.8839})=O{-1.2523,2.1214}'
It seems the coordinates are put inline with the SMARTS, but you can strip them out with the following one-liner:
    re.sub("{[^}]+}", "", smiQuery)
(remember to import re at the top)
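The clean-up step can be tried outside Vortex with plain Python. The sketch below is only an illustration: the function name `strip_inline_coordinates` is made up for this example, and the pattern escapes the braces explicitly, which should match the same text as the one-liner above.

```python
import re

# Matches one inline coordinate block such as {-2.6813,1.2964}
COORD_BLOCK = re.compile(r"\{[^}]*\}")

def strip_inline_coordinates(smarts):
    """Return the query string with every {x,y} block removed.

    Hypothetical helper name, used only for this illustration.
    """
    return COORD_BLOCK.sub("", smarts)

# The string returned by the console session shown above
raw = ("C{-2.6813,1.2964}=1C{-3.3957,0.8839}=C{-3.3957,0.0589}"
       "C{-2.6813,-0.3536}=C{-1.9668,0.0589}C{-1.9668,0.8839}1"
       "C{-1.2523,1.2964}(C{-0.5378,0.8839})=O{-1.2523,2.1214}")

cleaned = strip_inline_coordinates(raw)
print(cleaned)  # C=1C=CC=CC1C(C)=O
```

A string that contains no coordinate blocks passes through unchanged, so the function is safe to apply even if a sketcher returns a plain query.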
OpenBabel provides two programs, obabel and babel, which are cross-platform programs designed to interconvert between many file formats used in molecular modelling, computational chemistry and related areas. They can also be used for filtering molecules and for simple manipulation of chemical data. Whilst there are many similarities between the programs, obabel is essentially a modern version of babel with additional capabilities and a more standard interface. Over time, obabel will replace babel.
Specifically, the differences are as follows:
- obabel requires that the output file be specified with a -O option. This is closer to the normal Unix convention for commandline programs, and prevents users accidentally overwriting the input file.
- obabel is more flexible when the user needs to specify parameter values on options. For instance, the --unique option can be used with or without a parameter (specifying the criteria used). With babel, this only works when the option is the last on the line; with obabel, no such restriction applies. Because of the original design of babel, it is not possible to add this capability in a backwards-compatible way.
- obabel has a shortcut for entering SMILES strings. Precede the SMILES by -: and use in place of an input file. The SMILES string should be enclosed in quotation marks.
More than one can be used, and a molecule title can be included if enclosed in quotes:
    obabel -:"O=C(O)c1ccccc1OC(=O)C" -ocan
    obabel -:"O=C(O)c1ccccc1OC(=O)C aspirin" -:"Oc1ccccc1C(=O)O salicylic acid" -ofpt
- obabel cannot use concatenated single-character options.
So the obabel command we need is
    obabel -:"CC(=O)C1=CC=CC=C1Br" /Users/username/Desktop/ChemicalStructures/acetophenones.sdf -ofpt
In the script we prepend -: to the cleaned query SMARTS to build that first argument:
    newSmi = '-:' + smi_Query
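Note that no quotation marks are added around the query here. The quotes in the command-line form are there for the shell, and when the arguments are handed to `subprocess.Popen` as a list no shell is involved. A minimal sketch of assembling that argument list follows; `build_obabel_args` is a hypothetical helper name, and the paths are simply the ones used as examples in this article.

```python
def build_obabel_args(query, sdf_path, obabel='/usr/local/bin/obabel'):
    """Assemble the obabel argument list for a similarity run.

    Hypothetical helper for illustration. Each list element reaches
    obabel as one argument, so the query needs no shell quoting.
    """
    return [obabel, '-:' + query, sdf_path, '-ofpt', '-xfFP2']

args = build_obabel_args(
    'CC(=O)C1=CC=CC=C1Br',
    '/Users/username/Desktop/ChemicalStructures/acetophenones.sdf')
print(args)
```

The resulting list can be passed straight to `subprocess.Popen(args, stdout=subprocess.PIPE)`.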
The rest of the script is very similar to the previous scripts: we run the process and capture the output, then create two new columns. You can pass a third argument to findColumnWithName to set the type of the column if it gets created; the value 3 sets it to string and 1 sets it to double.
    column = vtable.findColumnWithName('Q_Sim_FP2', 1, 1)
    column = vtable.findColumnWithName('Superstructure', 1, 3)
    vtable.fireTableStructureChanged()
We then parse the output and populate the columns. One minor issue is that the fireTableStructureChanged() call resets all the filter cells, including the structure query, so you will need to redraw the query if you want to modify it.
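The parsing logic can be sketched as a plain function that runs without Vortex. This is an illustration under assumptions: `parse_similarity_output` is a made-up name, and the sample text is not captured from a real run but modelled on the two kinds of line the script looks for (lines starting with `>` that end in the Tanimoto value, and lines starting with `Possible superstructure`).

```python
def parse_similarity_output(output):
    """Return one dict per molecule in the file.

    The first line of the output belongs to the query molecule itself,
    so it is skipped, just as the Vortex script starts at index 1.
    """
    rows = []
    for line in output.splitlines()[1:]:
        if line.startswith('>'):
            # The similarity is the last whitespace-separated token
            rows.append({'tanimoto': float(line.split()[-1]),
                         'superstructure': False})
        elif line.startswith('Possible superstructure') and rows:
            # This note refers to the molecule reported just before it
            rows[-1]['superstructure'] = True
    return rows

# Illustrative sample only, not real obabel output
sample = (">\n"
          ">MOL_00000067 Tanimoto from first mol = 0.0888889\n"
          ">MOL_00000083 Tanimoto from first mol = 0.0869565\n"
          "Possible superstructure of first mol\n")

for row in parse_similarity_output(sample):
    print(row)
```

Returning a list keeps the parsing separate from the table updates, so the same function could feed `setValueFromString` row by row inside Vortex.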
The Vortex Script
    import sys

    # Uncomment the following 2 lines if running in console
    #vortex = console.vortex
    #vtable = console.vtable

    sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

    import re
    import subprocess

    # Get the path to the currently open sdf file
    sdfFile = vortex.getFileForPropertyCalculation(vtable)

    # Get the SMARTS of the drawn structure
    smiQuery = vtable.findColumnWithName("Structure", 0).getChemLink().getSmarts()

    # Remove coordinates from the SMARTS string
    smi_Query = re.sub("{[^}]+}", "", smiQuery)

    newSmi = '-:' + smi_Query

    # obabel -:"CC(=O)C1=CC=CC=C1Br" /Users/username/Desktop/ChemicalStructures/acetophenones.sdf -ofpt

    p = subprocess.Popen(['/usr/local/bin/obabel', newSmi, sdfFile, '-ofpt', '-xfFP2'], stdout=subprocess.PIPE)
    output = p.communicate()[0]

    # Create the result columns if needed (third argument: 1 = double, 3 = string)
    column = vtable.findColumnWithName('Q_Sim_FP2', 1, 1)
    column = vtable.findColumnWithName('Superstructure', 1, 3)
    vtable.fireTableStructureChanged()

    # Parse output; line 0 is the query molecule, so start at line 1
    lines = output.split('\n')
    currentRow = 0
    for i in range(1, len(lines)-1):
        # startswith() is used so that an empty line cannot raise an IndexError
        if lines[i].startswith('>'):
            column = vtable.findColumnWithName('Q_Sim_FP2', 0)
            column.setValueFromString(currentRow, lines[i].split()[-1])
            currentRow += 1
            # Clear Superstructure column of any previous data
            column = vtable.findColumnWithName('Superstructure', 0)
            column.setValueFromString(currentRow-1, '')
        elif lines[i].startswith('Possible superstructure'):
            column = vtable.findColumnWithName('Superstructure', 0)
            column.setValueFromString(currentRow-1, 'YES')
In this example I’ve used the FP2 fingerprint, but OpenBabel supports several different fingerprints:
    PROMPT> babel -L fingerprints
    FP2    Indexes linear fragments up to 7 atoms.
    FP3    SMARTS patterns specified in the file patterns.txt
    FP4    SMARTS patterns specified in the file SMARTS_InteLigand.txt
    MACCS    SMARTS patterns specified in the file MACCS.txt
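Any of the fingerprints listed above can be swapped in by changing the `-xf` option passed to obabel, which the script sets to `-xfFP2`. The sketch below derives the option and a matching column name from the fingerprint name. The helper `fingerprint_settings` and the `Q_Sim_<name>` column naming are assumptions that simply extend the `Q_Sim_FP2` convention used in the script.

```python
# Fingerprint names as reported by "babel -L fingerprints" above
FINGERPRINTS = ('FP2', 'FP3', 'FP4', 'MACCS')

def fingerprint_settings(name):
    """Return (obabel option, result column name) for a fingerprint.

    Hypothetical helper; the column naming is an assumption.
    """
    if name not in FINGERPRINTS:
        raise ValueError('Unknown fingerprint: %s' % name)
    return '-xf' + name, 'Q_Sim_' + name

option, column_name = fingerprint_settings('MACCS')
print(option)       # -xfMACCS
print(column_name)  # Q_Sim_MACCS
```

Writing each fingerprint to its own column keeps results from different runs side by side instead of overwriting them.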
In a similar way to the first script, it is straightforward to choose an alternative fingerprint. The results should look like the image shown below.
If you just want to compare the similarity with a structure in the worksheet, simply right-click on the structure and select Copy to ChemLink from the drop-down menu; the structure will then be copied to the chemical query box.
The script can be downloaded here SimqueryFP2.vpy.zip
Last Updated 7 Feb 2011