MOE from Chemical Computing Group is probably best known as a graphical user interface to a suite of computational chemistry tools. While this is undoubtedly how many users will interact with the program, it is worth finding out about the command-line tools that are also available. These tools are often called from pipelining tools such as Knime to allow rapid processing of large files. CCG provides four very useful command-line tools:
- sdwash prepares SD files by carrying out a number of operations on the molecular data field, which include 2D depiction layout, hydrogen correction, salt and solvent removal, chirality and bond type normalization, tautomer generation, adjustment and enumeration of protonation states, and expansion of fragment abbreviations.
- sdfilter performs selective filtering of SD files, removing molecules which do not meet certain criteria, such as druglike/leadlike characteristics, or have calculated properties which fall outside of a specified range; e.g., acceptor/donor count, rotatable bonds, molecular weight, log P, etc.
- sdsort sorts SD files according to the molecular structure or by a data field, and can remove duplicates (taking tautomers into account) or compute differences between SD files.
- sddesc allows the calculation of some or all of the MOE molecular descriptors for each molecular entry, with the results stored in corresponding SD file data fields.
It is the last of these we will be using to add descriptors to a Vortex table.
Typing
    sddesc -help
in a Terminal window gives a list of the options available.
Usage:
    sddesc [options...] [infiles...] [-o outfile]

        infile      name of input file (- for stdin)
        outfile     name of output file (- for stdout, . for null)

    Options:
        -help                   prints helpful information
        -verbose                enable information printing
        -quiet                  disable information printing
        -records range          process only given range of records
        -sdf                    output SD file (default)
        -ascii                  output ascii comma separated files with SMILES
        -keepfield field        SD field to transfer to ASCII output file
        -comma                  comma/quote separated ASCII output (default)
        -tab                    tab separated ASCII output
        -calc code_list         calculate descriptors (comma separated)
        -nocalc skip_list       skip a set of descriptors (comma separated)
        -class class            calculate descriptors in class
        -forcefield filename    use given forcefield file for 3D descriptors

    Range Syntax:
        range = n       equal to n
        range = n-      less than or equal to n
        range = n+      greater than or equal to n
        range = n,m     n through m (inclusive)
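The range syntax accepted by -records can be read as a simple membership test. The sketch below is only my interpretation of the help text above, not code taken from MOE; the function name `record_selected` is made up for illustration.

```python
def record_selected(spec, n):
    """Return True if record number n matches a range written in the
    syntax from the sddesc help text (interpretation, not MOE code)."""
    spec = spec.strip()
    if ',' in spec:
        # "n,m" means n through m, inclusive
        low, high = spec.split(',', 1)
        return int(low) <= n <= int(high)
    if spec.endswith('-'):
        # "n-" means less than or equal to n
        return n <= int(spec[:-1])
    if spec.endswith('+'):
        # "n+" means greater than or equal to n
        return n >= int(spec[:-1])
    # a bare "n" means exactly n
    return n == int(spec)
```

So, read this way, a value of 10- for -records would select the records numbered up to and including 10.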
So while the command is designed to work with SD files, it can also generate ASCII output as either comma-delimited or tab-delimited text. After a little experimentation I found that this command gave the desired result:
    sddesc -ascii -tab -calc Weight /Users/username/Desktop/ChemicalStructures/acetophenones.sdf
Note that you have to include both “-ascii” and “-tab”. In the above example I’ve only calculated the molecular weight, but MOE can calculate many, many more descriptors. For a full list of the 300+ molecular descriptors, both 2D and 3D, available for calculation in MOE, contact Chemical Computing Group through their website, www.chemcomp.com. Extra, custom descriptors are very straightforward to code up in MOE’s Scientific Vector Language platform. It is important to note that 3D descriptors are only meaningful when calculated from 3D coordinates; if you submit a 2D structure file, any 3D descriptors generated will not be reliable.
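Because the descriptor list is just a comma-separated string, the command line is easy to assemble programmatically. The following is a minimal sketch that uses only the flags shown in the help output above; the helper name `build_sddesc_command` and its parameters are my own invention, and the descriptor codes are the ones used in this article.

```python
def build_sddesc_command(sdf_path, descriptors, executable='sddesc', delimiter='tab'):
    """Assemble an sddesc argument list for delimited ASCII output.

    Hypothetical helper: it only strings together flags listed in the
    sddesc help text and does not check the descriptor codes themselves.
    """
    if delimiter not in ('tab', 'comma'):
        raise ValueError("delimiter must be 'tab' or 'comma'")
    if not descriptors:
        raise ValueError("at least one descriptor code is required")
    return [executable, '-ascii', '-' + delimiter,
            '-calc', ','.join(descriptors), sdf_path]


cmd = build_sddesc_command(
    '/Users/username/Desktop/ChemicalStructures/acetophenones.sdf',
    ['Weight', 'SlogP', 'mr', 'TPSA'])
print(' '.join(cmd))
```

Returning a list rather than a single string means the result can be handed straight to subprocess without worrying about quoting spaces in file paths.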
When I first tried this command in a Vortex script I got no output and a number of cryptic error messages. I then included the full path to sddesc:
    /Applications/moe2011/bin/sddesc -ascii -tab -calc Weight /Users/username/Desktop/ChemicalStructures/acetophenones.sdf
However, I still got no output, and the following error message appeared in the console:
    Vortex: /Applications/moe2011/bin/sddesc: line 3: /bin/moebatch: No such file or directory
After generous help from Matt, Dotmatics and CCG I worked out what was wrong. It seems that line 3 in $MOE/bin/sddesc is
    $MOE/bin/moebatch -run $0 $*
which opens a MOE/batch session and runs $MOE/bin/sddesc as an SVL file, using the arguments that were passed when $MOE/bin/sddesc was launched. The problem is that the program is running in a shell that does not have access to the environment variables defined in my .bash_profile. With $MOE unset, $MOE/bin/moebatch expands to /bin/moebatch, which is exactly the path reported in the error message. We can define the environment variables needed by moebatch thus
    import os
    # Work on a copy of the current environment and add what moebatch needs
    my_env = dict(os.environ)
    my_env["PATH"] = '/Applications/moe2011/bin' + os.pathsep + my_env.get('PATH', '')
    my_env["MOE"] = '/Applications/moe2011'
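To check the effect of these variables outside Vortex, the sketch below builds the environment in a small function and asks /bin/sh to expand the same expression that appears on line 3 of the sddesc wrapper. The function names are hypothetical, the PATH entries are joined with `os.pathsep`, and the example assumes a Unix-like system with /bin/sh available.

```python
import os
import subprocess


def moe_environment(moe_root, base_env=None):
    """Return a copy of the environment with MOE set and MOE's bin on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.join(moe_root, 'bin')
    old_path = env.get('PATH', '')
    # Prepend MOE's bin directory, using the platform's PATH separator
    env['PATH'] = bin_dir + os.pathsep + old_path if old_path else bin_dir
    env['MOE'] = moe_root
    return env


def expand_in_shell(text, env):
    """Let /bin/sh expand variables in text, as the wrapper script would."""
    p = subprocess.Popen(['/bin/sh', '-c', 'echo "%s"' % text],
                         stdout=subprocess.PIPE, env=env,
                         universal_newlines=True)
    return p.communicate()[0].strip()


# Without MOE defined, the shell produces the path from the error message
print(expand_in_shell('$MOE/bin/moebatch', {}))

# With MOE defined, the wrapper can find moebatch
env = moe_environment('/Applications/moe2011', {'PATH': '/usr/bin'})
print(expand_in_shell('$MOE/bin/moebatch', env))
```

The first call shows why the error mentioned /bin/moebatch: an unset variable simply expands to an empty string.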
The command to run sddesc then becomes
    p = subprocess.Popen(['/Applications/moe2011/bin/sddesc', '-ascii', '-tab', '-calc', 'Weight,SlogP,mr,TPSA', sdfFile], stdout=subprocess.PIPE, env=my_env)
    output = p.communicate()[0]
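Since the earlier failures only showed up as messages in the console, it may help while debugging to capture stderr and the exit status alongside stdout. This is a minimal sketch rather than part of the original script; `run_command` is a made-up name, and the Python interpreter stands in for sddesc so the example can run anywhere.

```python
import subprocess
import sys


def run_command(args, env=None):
    """Run a command and return (exit status, stdout text, stderr text)."""
    p = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                         env=env, universal_newlines=True)
    out, err = p.communicate()
    return p.returncode, out, err


# Stand-in for a failing external tool: writes to stderr and exits with 3
status, out, err = run_command(
    [sys.executable, '-c',
     'import sys; sys.stderr.write("oops\\n"); sys.exit(3)'])
if status != 0:
    print('command failed with status %d: %s' % (status, err.strip()))
```

In the Vortex script the same function could be called with the sddesc argument list and `env=my_env`, so that a missing moebatch is reported by the script itself instead of being silently ignored.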
The remainder of the script parses the data, adds columns and headers, and then inserts the data. Again the beauty of this approach is that more descriptors can be added to the list for calculation and they will be automatically added to the Vortex table.
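The parsing step can be tried on its own, without Vortex or MOE. The sketch below splits tab-delimited text into a header and one dictionary per row; the sample text, including its column names and placeholder values, is invented for illustration and is not real sddesc output.

```python
def parse_descriptor_table(text, delimiter='\t'):
    """Split delimited text into (column names, list of row dictionaries)."""
    # Drop blank lines (such as a trailing newline) and stray carriage returns
    lines = [ln.rstrip('\r') for ln in text.split('\n') if ln.strip()]
    if not lines:
        return [], []
    header = lines[0].split(delimiter)
    rows = [dict(zip(header, ln.split(delimiter))) for ln in lines[1:]]
    return header, rows


# Invented sample: the column names and values are placeholders only
sample = ("structure\tWeight\tSlogP\n"
          "mol_a\tw_a\ts_a\n"
          "mol_b\tw_b\ts_b\n")
columns, rows = parse_descriptor_table(sample)
print(columns)
print(rows[0]['Weight'])
```

Because the column names come from the first line of the output, adding another descriptor code to the -calc list needs no change to the parsing code.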
The Vortex Script
    import sys

    # Uncomment the following 2 lines if running in console
    #vortex = console.vortex
    #vtable = console.vtable

    sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

    import subprocess
    import os

    # Work on a copy of the environment and add the variables moebatch needs
    my_env = dict(os.environ)
    my_env["PATH"] = '/Applications/moe2011/bin' + os.pathsep + my_env.get('PATH', '')
    my_env["MOE"] = '/Applications/moe2011'

    # Get the path to the currently open sdf file
    sdfFile = vortex.getFileForPropertyCalculation(vtable)

    # Run sddesc on the file
    # /Applications/moe2011/bin/sddesc -ascii -tab -calc Weight /Users/swain/Desktop/ChemicalStructures/acetophenones.sdf
    p = subprocess.Popen(['/Applications/moe2011/bin/sddesc', '-ascii', '-tab', '-calc', 'Weight,SlogP,mr,TPSA', sdfFile], stdout=subprocess.PIPE, env=my_env)
    output = p.communicate()[0]

    # Create new columns in table if needed
    lines = output.split('\n')
    colName = lines[0].split('\t')
    for c in colName:
        column = vtable.findColumnWithName(c, 1)

    vtable.fireTableStructureChanged()

    # Collect the first field of any two-column lines (not used further in this script)
    keys = []
    for i in lines:
        words = i.split('\t')
        if len(words) == 2:
            keys.append(words[0])

    # Parse the output, one output row per table row
    rows = lines[1:]
    # Stop at the shorter of the table and the output to avoid an IndexError
    for r in range(0, min(vtable.getRealRowCount(), len(rows))):
        vals = rows[r].split('\t')
        for j in range(0, len(vals)):
            column = vtable.findColumnWithName(colName[j], 0)
            column.setValueFromString(r, vals[j])
The script can be downloaded here: CCGcolumns.vpy.zip
Last Updated 7 Feb 2012