I’m finding that I use Vortex more and more in my day job; it is an excellent application for displaying and exploring large or complex datasets. In fact, the only issue is getting data into Vortex. It is possible to save the dataset in sdf format, use other applications to generate additional fields, and then use the rather nice merge function within Vortex. However, I end up with multiple copies of the dataset as I use different applications to calculate descriptors or properties, and it would be nicer to be able to choose an option and have all the columns of data added automatically. Since all the applications I want to use have a command-line interface, I thought this might be an ideal opportunity to try scripting Vortex to send the structures to an external application and import the results.
Vortex contains a powerful scripting facility built on Jython, a Java implementation of the Python programming language. Scripts have access to the key components of Vortex as well as to Python and Java. Whilst it is possible to build a Swing JPanel to provide a GUI, the scripts I have in mind will not need a user interface. Scripts in Vortex are accessed via the Scripts menu. This menu is built dynamically from the contents of the user’s local vortex folder. On a Mac, this folder is in the user’s home area and is addressable as ~/vortex. I created a sub-folder inside the Script folder and called it “My Scripts”. Vortex scripts can also be executed by opening a .vpy file from the system file browser.
These first four scripts use some of the tools provided by OpenBabel, a free, open-source chemistry toolbox. One of these tools is obprop, a program that prints a set of standard molecular properties for every molecule in a file. Its output includes:
```
name             [Name]
formula          [Formula]
mol_weight       [Molecular Weight]
exact_mass       [Isotopic Mass]
canonical_SMILES [String]
num_atoms        [Number]
num_bonds        [Number]
num_residues     [Number]
sequence         [Residue Sequence]
num_rings        [Number of Rings (by SSSR)]
logP             [Number (octanol-water partition)]
PSA              [Number (topological polar surface area)]
MR               [Number (molar refractivity)]
```
The obprop tool can be run from the Terminal. The output for the first molecule in the file is shown below, where $$$$ is the delimiter between molecule records.
```
MacbookPro:~ PROMPT$ /usr/local/bin/obprop '/Users/username/Desktop/temp.sdf'
name             N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
formula          C21H20N6O
mol_weight       372.423
exact_mass       372.17
canonical_SMILES N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(C1=O)c1ccncc1 N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
InChI            InChI=1S/C21H20N6O/c22-11-16-1-3-17(4-2-16)14-26-15-24-12-19(26)13-25-20-7-10-27(21(20)28)18-5-8-23-9-6-18/h1-6,8-9,12,15,20,25H,7,10,13-14H2/t20-/m1/s1
num_atoms        48
num_bonds        51
num_residues     0
sequence         -
num_rings        4
logP             2.54908
PSA              86.84
MR               107.424
$$$$
```
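Each record is a block of name/value lines ending in $$$$, so the parsing can be tried outside Vortex before it is wired into a script. The function below is a minimal sketch in plain Python. parse_obprop is a hypothetical helper name and is not part of Vortex or OpenBabel. It returns one dictionary per molecule, and it only uses string methods that should also be available in Jython.

```python
def parse_obprop(output):
    """Split obprop output into one dict per molecule.

    Records are separated by '$$$$'; each line inside a record is a
    property name, some whitespace, then the value.
    """
    records = []
    for chunk in output.split('$$$$'):
        props = {}
        for line in chunk.splitlines():
            # Split once on whitespace: property name, then the rest of the line
            words = line.split(None, 1)
            if len(words) == 2:
                props[words[0]] = words[1].strip()
        if props:  # skips the empty tail after the final '$$$$'
            records.append(props)
    return records
```

Applied to the output above, `parse_obprop(output)[0]['logP']` would give the string '2.54908'. Values are kept as strings, which mirrors the `setValueFromString` call used in the Vortex script.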
The Vortex script
The script starts by getting the path of the sdf file that was imported into Vortex. It then constructs the obprop command and captures its output in a variable, “output”. The next part creates the columns, using the property names from the obprop output to name them. The last part parses the output. “$$$$” is the divider between molecule records, and each line within a record is a name/value pair.
```python
import sys

# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')
import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run obprop on the file and capture its standard output
p = subprocess.Popen(['/usr/local/bin/obprop', sdfFile], stdout=subprocess.PIPE)
output = p.communicate()[0]

# Create new columns in table if needed (one per property name)
lines = output.split('\n')
keys = []
for i in lines:
    words = i.split(' ', 1)
    if len(words) == 2:
        keys.append(words[0])
columns = list(set(keys))
for c in columns:
    column = vtable.findColumnWithName(c, 1)
vtable.fireTableStructureChanged()

# Parse the output: '$$$$' separates molecule records and
# each line within a record is a name/value pair
rows = output.split('$$$$')
# Don't run past the end if obprop returned fewer records than there are rows
for r in range(min(len(rows), vtable.getRealRowCount())):
    keyvals = rows[r].split('\n')
    if len(keyvals) > 1:
        for i in keyvals:
            words = i.split(' ', 1)
            if len(words) == 2:
                key = words[0]
                value = words[1].lstrip()
                column = vtable.findColumnWithName(key, 0)
                column.setValueFromString(r, value)
```
One advantage of this approach is that if further properties are added to obprop, the script will automatically add the corresponding columns.
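The script builds its column list with `list(set(keys))`. That removes duplicates, but the order in which the columns are created is then arbitrary. If you would rather create them in the order obprop prints them, you can track the names already seen instead. The function below is a minimal sketch of that idea. property_names is a hypothetical helper, and it has not been tested against Vortex itself.

```python
def property_names(output):
    """Return every property name in obprop output, in first-seen order.

    Unlike list(set(...)), this keeps obprop's own ordering; a property
    added in a newer obprop simply shows up as an extra name.
    """
    seen = set()
    names = []
    for line in output.splitlines():
        words = line.split(None, 1)
        # '$$$$' separator lines have no value, so they are skipped
        if len(words) == 2 and words[0] not in seen:
            seen.add(words[0])
            names.append(words[0])
    return names
```

Looping over `property_names(output)` in place of `columns` would keep the rest of the script unchanged.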
The result looks like this (I’ve hidden the “sequence” and “num_residues” columns).
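The script above reads obprop’s standard output but never checks the exit status. If the tool fails, for example because of a wrong path, the columns are simply left empty. The function below is a minimal sketch of a wrapper that raises an error instead. run_tool is a hypothetical name. The sketch assumes the tool signals failure through a non-zero exit code, which is worth confirming for your OpenBabel version.

```python
import subprocess
import sys


def run_tool(cmd):
    """Run an external command and return its standard output as text.

    Raises RuntimeError on a non-zero exit status, so a failed run is
    reported rather than silently producing empty columns.
    """
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = p.communicate()  # reads both pipes, avoiding deadlock
    if not isinstance(out, str):  # Python 3 returns bytes
        out = out.decode('utf-8', 'replace')
        err = err.decode('utf-8', 'replace')
    if p.returncode != 0:
        raise RuntimeError('%s failed (exit %d): %s'
                           % (cmd[0], p.returncode, err.strip()))
    return out


if __name__ == '__main__':
    # Self-check, using the current Python interpreter as the "tool"
    print(run_tool([sys.executable, '-c', 'print("hello")']).strip())
```

In the Vortex script, the two Popen lines would become `output = run_tool(['/usr/local/bin/obprop', sdfFile])`. Capturing stderr also keeps babel’s “molecules converted” message out of the way.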
Similarity Calculation Scripts
The next three scripts calculate molecular similarity. One of the tasks I regularly undertake is to take an active lead structure and run a series of searches (substructure, pharmacophore, docking, etc.) to identify potential compounds for evaluation. It is useful to be able to compare the results using similarity measures.
OpenBabel supports four different fingerprints:
```
PROMPT> babel -L fingerprints
FP2    Indexes linear fragments up to 7 atoms.
FP3    SMARTS patterns specified in the file patterns.txt
FP4    SMARTS patterns specified in the file SMARTS_InteLigand.txt
MACCS    SMARTS patterns specified in the file MACCS.txt
```
These fingerprints can be used for similarity searches. For example, the following command gives the Tanimoto coefficient between a SMILES string in mysmiles.smi and each of the molecules in mymols.sdf:
```
PROMPT> babel mysmiles.smi mymols.sdf -ofpt
MOL_00000067   Tanimoto from first mol = 0.0888889
MOL_00000083   Tanimoto from first mol = 0.0869565
MOL_00000105   Tanimoto from first mol = 0.0888889
MOL_00000296   Tanimoto from first mol = 0.0714286
MOL_00000320   Tanimoto from first mol = 0.0888889
```
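The Tanimoto coefficient compares two binary fingerprints. It is the number of bits set in both fingerprints divided by the number of bits set in either one. The sketch below shows the standard formula. The fingerprints are represented as Python integers purely for illustration; this is not how OpenBabel stores them internally. Returning 0.0 for two empty fingerprints is a convention chosen here.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two binary fingerprints given as ints.

    T = c / (a + b - c), where a and b are the numbers of bits set in
    each fingerprint and c is the number of bits set in both.
    """
    a = bin(fp_a).count('1')
    b = bin(fp_b).count('1')
    c = bin(fp_a & fp_b).count('1')
    union = a + b - c
    return c / float(union) if union else 0.0
```

For example, 0b1110 and 0b0111 share two of the four bits set between them, giving 0.5. Identical fingerprints give 1.0.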
If you don’t specify a query file, babel will simply use the first molecule in the sdf file as the query, as shown below:
```
PROMPT> babel /Users/username/Desktop/temp.sdf -ofpt
```
The default fingerprint is FP2. You can change the fingerprint using the “f” output option (-xf followed by the fingerprint name). The example below shows the command and its output.
```
MacbookPro:~ PROMPT$ babel /Users/username/Desktop/temp.sdf -ofpt -xfMACCS
N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(C1=O)c1cccc2ncccc12   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.979592
Possible superstructure of N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1C   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.843137
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1N   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.830189
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1O   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.803571
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1OC   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.789474
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1S   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.767857
N#Cc1ccc(cc1)Cn1cncc1CN[C@H]1CCN(C1)C(=O)c1cccnc1SC   Tanimoto from N#Cc1ccc(cc1)Cn1cncc1CN[C@@H]1CCN(c2ccncc2)C1=O = 0.754386
```
The Similarity script
Again, the first part gets the path to the sdf file imported into Vortex (the file has the active lead structure as its first record). The next part constructs and runs the babel command, and its results are captured in output. The columns are then created if needed, including one for “Possible superstructure”, which babel occasionally reports. Each record in the output is on its own line, separated by a linefeed “\n”. Each line is parsed to get the similarity score. The exception is a “Possible superstructure” line, which flags the molecule on the line before it.
```python
import sys

# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')
import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run babel on the file; the first molecule is used as the query
p = subprocess.Popen(['/usr/local/bin/babel', sdfFile, '-ofpt', '-xfMACCS'],
                     stdout=subprocess.PIPE)
output = p.communicate()[0]

# Create the result columns if needed
column = vtable.findColumnWithName('Sim_MACCS', 1)
column = vtable.findColumnWithName('Possible_Superstructure', 1)
vtable.fireTableStructureChanged()

# Line 0 is the query itself, so results start at table row 1
lines = output.split('\n')
currentRow = 1
for i in range(1, len(lines) - 1):
    line = lines[i]
    # startswith() is safe on empty lines, unlike line[0]
    if line.startswith('>'):
        # Molecule line: the similarity score is the last word
        column = vtable.findColumnWithName('Sim_MACCS', 0)
        column.setValueFromString(currentRow, line.split()[-1])
        currentRow += 1
    elif line.startswith('Possible superstructure'):
        # Flags the molecule read on the previous line
        column = vtable.findColumnWithName('Possible_Superstructure', 0)
        column.setValueFromString(currentRow - 1, 'YES')
```
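The row-mapping logic in the similarity script can also be pulled out into a plain function and checked against saved babel output without starting Vortex. The function below is a minimal sketch. parse_fpt is a hypothetical helper name. It is a variation on the script rather than a copy: it identifies molecule lines by the “Tanimoto from” text instead of a leading “>”, so it works whether or not those lines carry that prefix.

```python
def parse_fpt(output):
    """Map babel -ofpt output onto table rows.

    The first output line is the query (table row 0) and is skipped.
    Each later molecule line ends with '= <score>'; a following
    'Possible superstructure' line flags the molecule just read.
    Returns a dict {row: (score, is_superstructure)}.
    """
    results = {}
    row = 0
    for line in output.splitlines()[1:]:
        if 'Tanimoto from' in line:
            row += 1
            results[row] = (float(line.split()[-1]), False)
        elif line.startswith('Possible superstructure') and row in results:
            results[row] = (results[row][0], True)
    return results
```

The Vortex script could then loop over the returned dictionary and call `setValueFromString` for each row.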
The script above uses the MACCS fingerprint. To use one of the other fingerprints, just change -xfMACCS and Sim_MACCS (for example, to -xfFP4 and Sim_FP4).
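To keep the -xf option and the column name in step, both can be derived from a single fingerprint name. The function below is a minimal sketch of that idea. similarity_job and FINGERPRINTS are hypothetical names. The tuple lists only the four fingerprints shown by `babel -L fingerprints` above, so adjust it if your OpenBabel installation lists others.

```python
# Fingerprints as listed by `babel -L fingerprints` above
FINGERPRINTS = ('FP2', 'FP3', 'FP4', 'MACCS')


def similarity_job(sdf_file, fp='MACCS', babel='/usr/local/bin/babel'):
    """Return (command, column_name) for a babel similarity run.

    Switching fingerprints becomes a one-argument change instead of
    editing the -xf option and the column name separately.
    """
    if fp not in FINGERPRINTS:
        raise ValueError('unknown fingerprint: %s' % fp)
    return [babel, sdf_file, '-ofpt', '-xf' + fp], 'Sim_' + fp
```

In the script, `cmd, colName = similarity_job(sdfFile, 'FP4')` would replace the hard-coded command list and the two 'Sim_MACCS' strings.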
Updated 31 October 2011