One of the really neat features of the latest version of Vortex (build 29622 or later) is the ability to script multiple sub-structure searches using SMARTS. This is useful on many occasions, for example if you want to flag molecules that contain reactive functional groups, toxicophores, or PAINS functional groups that have been shown to interfere with a variety of screens. Alternatively, if you have a drug discovery project with multiple chemotypes, you might want to tag particular groups of compounds as belonging to a named series to aid analysis.
PAINS filter
A recent comment in Nature, “Naivety about promiscuous, assay-duping molecules is polluting the literature and wasting resources”, underlines the importance of ensuring that any hits from a bioassay are genuine ligands and not non-selective artefacts.
These molecules — pan-assay interference compounds, or PAINS — have defined structures, covering several classes of compound. But biologists and inexperienced chemists rarely recognize them. Instead, such compounds are reported as having promising activity against a wide variety of proteins. Time and research money are consequently wasted in attempts to optimize the activity of these compounds. Chemists make multiple analogues of apparent hits hoping to improve the ‘fit’ between protein and compound. Meanwhile, true hits with real potential are neglected….Most of all, academic drug discoverers must be more vigilant. Molecules that show the strongest activity in screening might not be the best starting points for drugs. PAINS hits should almost always be ignored. Even trained medicinal chemists have to be careful until they become experienced in screening. Take it from us: do not even start down these treacherous routes.
Jonathan B. Baell and Georgina A. Holloway published a very interesting paper on their analysis of frequent hitters from screening assays (DOI: 10.1021/jm901137j). In the supplementary information they provided the corresponding filters in Sybyl Line Notation (SLN) format. These were converted to SMARTS format for use with filter-it, but they can also be used to create a Vortex script that provides a PAINS filter, as nicely demonstrated by Dan Ormsby and Mike Hartshorn.
The first part of the script contains the series of SMARTS strings and their associated text labels in a standard format (only a limited selection is shown below).
["ene_six_het_A(483)","[#6]-1(-[#6](~[!#6&!#1]~[#6]-[!#6&!#1]-[#6]-1=[!#6&!#1])~[!#6&!#1])=[#6;!R]-[#1]"],
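To make the format concrete, here is a minimal sketch in plain Python (run outside Vortex) of the two-element `[label, SMARTS]` layout, using two entries copied from the PAINS list. The helper `find_problems` is my own hypothetical addition and is not part of the Vortex script; it only checks the shape of the list, not whether the SMARTS are chemically valid.

```python
# Two entries copied from the PAINS list; each one is [label, SMARTS].
patterns = [
    ["azo_A(324)", "[#7;!R]=[#7]"],
    ["mannich_A(296)", "[#7]-[#6;X4]-c:1:c:c:c:c:c:1-[#8]-[#1]"],
]


def find_problems(patterns):
    """Hypothetical helper: describe entries that break the [label, SMARTS] layout."""
    problems = []
    seen = set()
    for pos, entry in enumerate(patterns):
        if len(entry) != 2:
            problems.append("entry %d: expected [label, SMARTS]" % pos)
            continue
        label, smarts = entry
        if not label or not smarts:
            problems.append("entry %d: empty label or SMARTS" % pos)
        if label in seen:
            problems.append("entry %d: duplicate label %s" % (pos, label))
        seen.add(label)
    return problems


print(find_problems(patterns))
```

A duplicate label is worth catching because the labels end up in the result column, where two different patterns with the same name would be indistinguishable.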
Because the PAINS SMARTS patterns contain explicit hydrogens, the script then adds explicit Hs to all molecules before matching (currently this is limited to molecules with fewer than 1000 heavy atoms).
The script also displays a progress box which, rather nicely, shows the name of the SMARTS string currently being used as the query.
One of the nice things is that Vortex uses multiple threads (k-means clustering and property calculation are threaded too). Java asks the OS how many processors there are; my six-core Mac Pro has 12 notional cores, resulting in 1100% CPU usage.
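As a rough analogue of that processor query, here is a small Python sketch. On the Java side the usual call is `Runtime.getRuntime().availableProcessors()`, but whether Vortex uses exactly that call is an assumption on my part. In Python, `os.cpu_count()` reports logical processors, so hyper-threaded cores are counted twice, which is how a six-core machine comes to show 12.

```python
import os


def logical_cpus():
    """Number of logical processors the OS reports (falls back to 1 if unknown)."""
    return os.cpu_count() or 1


n = logical_cpus()
print("logical processors: %d" % n)
# Activity monitors typically report 100% per logical processor.
print("theoretical CPU ceiling: %d%%" % (100 * n))
```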
The PAINS filter Vortex Script
#
# Custom version of the Match Patterns script
#
# This version add explicit Hs to all molecules before matching to deal with the
# PAINS set from http://pubs.acs.org/doi/abs/10.1021/jm901137j as the patterns
# contain explicit Hs.
#
# Script to carry out matches of multiple patterns against a workspace.
# If the workspace comes from an SD file then it will use the structure column,
# otherwise it will use a column called SMILES.
#
# Author: Mike Hartshorn & Dan Ormsby
# Copyright (C) Dotmatics Limited, 2013

import time
import java
from com.dotmatics.vortex.mol2img import Mol2Img

#PAINS patterns here
patterns = [
["ene_six_het_A(483)","[#6]-1(-[#6](~[!#6&!#1]~[#6]-[!#6&!#1]-[#6]-1=[!#6&!#1])~[!#6&!#1])=[#6;!R]-[#1]"],
["hzone_phenol_A(479)","c:1:c:c(:c(:c:c:1)-[#6]=[#7]-[#7])-[#8]-[#1]"],
["anil_di_alk_A(478)","[#6](-[#1])(-[#1])-[#7](-[#6](-[#1])-[#1])-c:1:c:c(:c(:c(:c:1)-[$([#1]),$([#6](-[#1])-[#1]),$([#8]-[#6](-[#1])(-[#1])-[#6](-[#1])-[#1])])-[#7])-[#1]"],
["indol_3yl_alk(461)","n:1(c(c(c:2:c:1:c:c:c:c:2-[#1])-[#6;X4]-[#1])-[$([#6](-[#1])-[#1]),$([#6]=,:[!#6&!#1]),$([#6](-[#1])-[#7]),$([#6](-[#1])(-[#6](-[#1])-[#1])-[#6](-[#1])(-[#1])-[#7](-[#1])-[#6](-[#1])-[#1])])-[$([#1]),$([#6](-[#1])-[#1])]"],
["quinone_A(370)","[!#6&!#1]=[#6]-1-[#6]=,:[#6]-[#6](=[!#6&!#1])-[#6]=,:[#6]-1"],
["azo_A(324)","[#7;!R]=[#7]"],
["imine_one_A(321)","[#6]-[#6](=[!#6&!#1;!R])-[#6](=[!#6&!#1;!R])-[$([#6]),$([#16](=[#8])=[#8])]"],
["mannich_A(296)","[#7]-[#6;X4]-c:1:c:c:c:c:c:1-[#8]-[#1]"],
["anil_di_alk_B(251)","c:1:c:c(:c:c:c:1-[#7](-[#6;X4])-[#6;X4])-[#6]=[#6]"],
["anil_di_alk_C(246)","c:1:c:c(:c:c:c:1-[#8]-[#6;X4])-[#7](-[#6;X4])-[$([#1]),$([#6;X4])]"],
# .... Several hundred patterns are in the script ....
["ene_one_one_B(1)","[#6]-1(-[#6](=[#8])-[#6](-[#1])(-[#1])-[#6]-[#6](-[#1])(-[#1])-[#6]-1=[#8])=[#6](-[#7]-[#1])-[#6]=[#8]"],
["dhp_amino_CN_H(1)","[#7](-[#1])(-[#1])-[#6]-1=[#6](-[#6]#[#7])-[#6](-[#1])(-[#6]:[#6])-[#16]-[#6;X4]-[#16]-1"],
["het_66_anisole(1)","[#6](-[#1])(-[#1])-[#8]-c:1:c(:c(:c(:c(:c:1-[#1])-[#1])-[#1])-[#1])-[#7](-[#1])-c:2:c:c:n:c:3:c(:c:c:c(:c:2:3)-[#8]-[#6](-[#1])-[#1])-[#8]-[#6](-[#1])-[#1]"],
["thiazole_amine_N(1)","[#6](-[#1])(-[#1])-[#8]-c:1:c(:c(:c(:c(:c:1-[#1])-[#1])-[#8]-[#6](-[#1])-[#1])-[#1])-[#7](-[#1])-c:2:n:c(:c:s:2)-c:3:c:c:c(:c:c:3)-[#8]-[#6](-[#1])-[#1]"],
["het_pyridiniums_C(1)","[#6]~1~3~[#7](-[#6]:[#6])~[#6]~[#6]~[#6]~[#6]~1~[#6]~2~[#7]~[#6]~[#6]~[#6]~[#7+]~2~[#7]~3"],
["het_5_E(1)","[#7]-3(-c:2:c:1:c:c:c:c:c:1:c:c:c:2)-[#7]=[#6](-[#6](-[#1])-[#1])-[#6](-[#1])(-[#1])-[#6]-3=[#8]"]
]

class match_multiple(ProgressRunnable):

    def __init__(self):
        self.starttime = time.time()
        self.useMatchCount = 0
        self.molfiles = 1
        self.structureColumn = vtable.findColumnWithName(vtable.MolfileColumn, 0)
        self.resultCol = vtable.findColumnWithName('PAINS', 1, vortex.STRING)
        if not self.structureColumn:
            self.structureColumn = vtable.findColumnWithName("SMILES", 0)
            self.molfiles = 0
        self.dummifyColumn = vtable.findColumnWithName('SMILES_WITH_Hs', 1, vortex.STRING)
        if not self.structureColumn:
            vortex.alert("You need an SD file or a SMILES column")

    def updateProgress(self, perc, message):
        self.setProgressValue(perc)
        self.setProgressMessage(message)

    def run(self):
        self.updateProgress(0, 'calc_structures')
        if self.structureColumn:
            smis = vortex.getMolProperty(self.structureColumn.getStructureTexts(), 'DUMMIFY_SMILES')
            smis = [i.replace('*', 'H') for i in smis]
            self.dummifyColumn.setValuesFromStrings(smis)
            del(smis)
            vtable.fireTableStructureChanged()
        self.updateProgress(0, 'fingerprinting molecules/patterns')
        results = ['' for i in range(0, vtable.getRealRowCount())]
        message = ''
        for i in range(0, len(patterns)):
            self.updateProgress(int(100 * (float(i) / float(len(patterns)))), patterns[i][0])
            t0 = time.time()
            hits = Mol2Img.doSearch(self.dummifyColumn, patterns[i][1], 'mdl', 0)
            mylist = hits.keySet().toArray()
            for j in range(0, len(mylist)):
                if results[mylist[j]] == '':
                    results[mylist[j]] = patterns[i][0]
                else:
                    results[mylist[j]] = results[mylist[j]] + ',' + patterns[i][0]
            t1 = time.time()
            message = message + " %.2f " % (t1 - t0) + str(len(mylist)) + ' ' + patterns[i][0] + '\n'
        self.resultCol.setValuesFromStrings(results)
        vtable.fireTableStructureChanged()
        import javax.swing
        b = javax.swing.JTextArea(message);
        s = javax.swing.JScrollPane(b);
        s.setVerticalScrollBarPolicy(javax.swing.JScrollPane.VERTICAL_SCROLLBAR_ALWAYS)
        p = javax.swing.JPanel()
        p.add(s)
        # vortex.showInDialog(p, 'Benchmark results')
        vtable.fireTableStructureChanged()

if vws is None:
    vortex.alert("You must have a workspace loaded...")
else:
    if len(com.dotmatics.vortex.Vortex.getVersion().split('.')) == 1:
        v = -1
    else:
        v = int(com.dotmatics.vortex.Vortex.getVersion().split('.')[2])
    if (v < 29622):
        vortex.alert("<html><h2>This script required Vortex build 29622</h2><h3>If you proceed prepare for all kinds of calamity</h3></html>")
    matcher = match_multiple()
    vortex.run(matcher, "Generating matches")
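The heart of the script is the loop that runs each pattern in turn and builds one comma-separated string of labels per row. The sketch below reproduces just that bookkeeping in plain Python so it can be run outside Vortex. The names `tag_rows` and `substring_search` are hypothetical, and the substring test is only a toy stand-in for `Mol2Img.doSearch`; it is not real SMARTS matching.

```python
def tag_rows(rows, patterns, search, report=None):
    """Return one comma-joined label string per row.

    search(rows, query) stands in for the substructure search and must
    return the indices of the matching rows.
    """
    results = ['' for _ in rows]
    for i, (label, query) in enumerate(patterns):
        if report is not None:
            # Same percentage calculation as the progress box in the script.
            report(int(100 * (float(i) / float(len(patterns)))), label)
        for idx in search(rows, query):
            if results[idx] == '':
                results[idx] = label
            else:
                results[idx] = results[idx] + ',' + label
    return results


def substring_search(rows, query):
    """Toy stand-in: plain substring test, NOT real SMARTS matching."""
    return [i for i, row in enumerate(rows) if query in row]


rows = ["CC(=O)Cl", "OO", "CCO", "OOCC(=O)Cl"]
patterns = [["Acid Chloride", "C(=O)Cl"], ["Peroxide", "OO"]]
print(tag_rows(rows, patterns, substring_search))
```

Rows that match nothing keep an empty string, and rows that match several patterns collect every label, which is what makes the result column easy to filter on afterwards.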
Reactive Groups Filter
One of the important steps in building a screening collection is to remove molecules containing chemically reactive groups (unless you are looking for covalent modifiers). Most companies have their own set of functional groups they don’t want in the screening collection. I’ve compiled the list of groups in the script below over the years (usually as the result of finding a false positive in a screen). It would be trivial to add or remove groups.
The first part of the script contains the series of SMARTS strings and their associated text labels in a standard format.
["Alpha_HaloCarbonyl","[F,Cl,Br,I]CC=O"],
It is very straightforward to add new structural alerts by simply adding the appropriate query string. I would really recommend using the free online SMARTSviewer to check the queries.
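SMARTSviewer is the proper check, but a cheap first pass before pasting a new query into the list can catch simple typing slips. The sketch below is a plain-Python bracket-balance test; `brackets_balanced` is a hypothetical helper of mine, and passing it says nothing about whether the SMARTS is valid or means what you intended.

```python
def brackets_balanced(smarts):
    """Cheap typo check: are [] and () properly paired and nested?"""
    pairs = {']': '[', ')': '('}
    stack = []
    for ch in smarts:
        if ch in '[(':
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unclosed opener.
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack


print(brackets_balanced("[F,Cl,Br,I]CC=O"))
print(brackets_balanced("[N;X2=O"))
```

The first call reports a balanced query and the second reports the missing `]`.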
The Remove Reactive Vortex Script
#
# A script to filter out chemically reactive compounds
#
from com.dotmatics.vortex.mol2img.jni import genImage
import time
import java
from threading import Thread
from com.dotmatics.vortex.mol2img import Mol2Img

#Reactive SMARTS patterns here, [["4-Nitrophenyl Ester","[O-][N+](=O)c1ccc(OC=O)cc1"],
patterns = [
["4-Nitrophenyl Ester","[O-][N+](=O)c1ccc(OC=O)cc1"],
["Acid Chloride","C(=O)Cl"],
["Acid Bromide","C(=O)Br"],
["Acid Iodide","C(=O)I"],
["Acid Fluoride","C(=O)F"],
["Acyl Cyanide","N#CC(=O)"],
["Acyl hydrazine","NNC=O"],
["Anhydride","C(=O)OC(=O)"],
["Allyl Bromide","BrCC=C"],
["Allyl Chloride","ClCC=C"],
["Allyl Fluoride","FCC=C"],
["Allyl iodide","ICC=C"],
["Alpha_HaloCarbonyl","[F,Cl,Br,I]CC=O"],
["Beta_HaloCarbonyl","[F,Cl,Br,I]CCC=O"],
["Azide","N=[N+]=[N-]"],
["Aziridine","C1CN1"],
["Azo","[N;X2]=[N;X2]"],
["Benzyl Bromide","[H]C([H])(Br)c"],
["Benzyl Chloride","[H]C([H])(Cl)c"],
["Benzyl Iodide","[H]C([H])(I)c"],
["Beta ammonium carbonyl","C[N+](C)(C)CCC=O"],
["Carbazide","O=*N=[N+]=[N-]"],
["Carbodiimide","N=C=N"],
["Chloramine","[N;X3](Cl)"],
["Chloro Silane","Cl[Si]"],
["Cyanohydrin","N#CC[OH]"],
["cyanamides","N[CH2]C#N"],
["Cyanate","O=C=N"],
["diazo","cN=Nc"],
["Diazonium","[N+]#N"],
["Dichloramine","[N;X3](Cl)Cl"],
["Disulphide","SS"],
["Epoxide","C1CO1"],
["HaloAmine","[F,Cl,Br,I]N"],
["Beta_HaloAmine","[F,Cl,Br,I]CCN"],
["HaloMethylEther","[F,Cl,Br,I]C[OH0;X2]"],
["HaloMethylThioEther","[F,Cl,Br,I]C[SH0;X2]"],
["HydroxyBenzoylTriazole","C(=O)Onnn"],
["Imidoyl Chloride","ClC=N"],
["Imidoyl Bromide","BrC=N"],
["Iodoso","I(=O)"],
["Iodoxy","O=I=O"],
["Isocyanate","N=C=O"],
["Isothiocyanate","N=C=S"],
["isonitriles","[N+]#[C-]"],
["Ketene","C=C=O"],
["Lawesson's_reagents","P(=S)(S)S"],
["Nitroso","[N;X2]=O"],
["Oxaziridine","C1NO1"],
["Pentafluorophenyl Ester","Fc1c(F)c(F)c(OC=O)c(F)c1F"],
["Peroxide","OO"],
["Phosphine Chloride","PCl"],
["Phosphine Bromide","PBr"],
["Phosphine Fluoride","PF"],
["Phosphine Iodide","PI"],
["Cationic Br","[Br+]"],
["Cationic Cl","[Cl+]"],
["Cationic I","[I+]"],
["Cationic O","[O+,o+]"],
["Cationic P","[P+]"],
["Cationic S","[S+]"],
["Sulphonyl Chloride","S(=O)(=O)[Cl]"],
["Sulphonyl Bromide","S(=O)(=O)[Br]"],
["Sulphonyl Fluoride","S(=O)(=O)[F]"],
["Sulphonate Ester","COS(c)(=O)=O"],
["Sulphonyl Cyanide","S(=O)(=O)C#N"],
["Thioacyl Chloride","C(=S)Cl"],
["Thioacyl Bromide","C(=S)Br"],
["Thio Halides","[S][Cl,Br,F,I]"],
["Thiocyanate","SC#N"],
["Triflate","OS(=O)(=O)C(F)(F)F"],
["Vinylogous Acid Chloride","ClC=CC=O"]
]

class match_multiple(ProgressRunnable):

    def __init__(self):
        self.starttime = time.time()
        self.useMatchCount = 0
        self.molfiles = 1
        self.structureColumn = vtable.findColumnWithName(vtable.MolfileColumn, 0)
        self.resultCol = vtable.findColumnWithName('Reactive Group', 1, vortex.STRING)
        if not self.structureColumn:
            self.structureColumn = vtable.findColumnWithName("SMILES", 0)
            self.molfiles = 0
        self.dummifyColumn = vtable.findColumnWithName('SMILES_WITH_Hs', 1, vortex.STRING)
        if not self.structureColumn:
            vortex.alert("You need an SD file or a SMILES column")

    def updateProgress(self, perc, message):
        self.setProgressValue(perc)
        self.setProgressMessage(message)

    def run(self):
        self.updateProgress(0, 'calc_structures')
        if self.structureColumn:
            smis = vortex.getMolProperty(self.structureColumn.getStructureTexts(), 'DUMMIFY_SMILES')
            smis = [i.replace('*', 'H') for i in smis]
            self.dummifyColumn.setValuesFromStrings(smis)
            del(smis)
            vtable.fireTableStructureChanged()
        self.updateProgress(0, 'fingerprinting molecules/patterns')
        results = ['' for i in range(0, vtable.getRealRowCount())]
        message = ''
        for i in range(0, len(patterns)):
            self.updateProgress(int(100 * (float(i) / float(len(patterns)))), patterns[i][0])
            t0 = time.time()
            hits = Mol2Img.doSearch(self.dummifyColumn, patterns[i][1], 'mdl', 0)
            mylist = hits.keySet().toArray()
            for j in range(0, len(mylist)):
                if results[mylist[j]] == '':
                    results[mylist[j]] = patterns[i][0]
                else:
                    results[mylist[j]] = results[mylist[j]] + ',' + patterns[i][0]
            t1 = time.time()
            message = message + " %.2f " % (t1 - t0) + str(len(mylist)) + ' ' + patterns[i][0] + '\n'
        self.resultCol.setValuesFromStrings(results)
        vtable.fireTableStructureChanged()
        import javax.swing
        b = javax.swing.JTextArea(message);
        s = javax.swing.JScrollPane(b);
        s.setVerticalScrollBarPolicy(javax.swing.JScrollPane.VERTICAL_SCROLLBAR_ALWAYS)
        p = javax.swing.JPanel()
        p.add(s)
        # vortex.showInDialog(p, 'Benchmark results')
        vtable.fireTableStructureChanged()

if vws is None:
    vortex.alert("You must have a workspace loaded...")
else:
    if len(com.dotmatics.vortex.Vortex.getVersion().split('.')) == 1:
        v = -1
    else:
        v = int(com.dotmatics.vortex.Vortex.getVersion().split('.')[2])
    if (v < 29622):
        vortex.alert("<html><h2>This script required Vortex build 29622</h2><h3>If you proceed prepare for all kinds of calamity</h3></html>")
    matcher = match_multiple()
    vortex.run(matcher, "Generating matches")
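The guard at the end of the script takes the build number from the third dot-separated field of the version string and warns if it is below 29622. Here is a plain-Python sketch of that parse. The function `build_number` and the example version strings are hypothetical (I am only assuming the three-field layout the script itself relies on), and unlike the script this version also returns -1 for strings with two fields or a non-numeric third field instead of raising an error.

```python
MINIMUM_BUILD = 29622


def build_number(version):
    """Hypothetical helper: third dot-separated field as an int, or -1."""
    parts = version.split('.')
    if len(parts) < 3:
        return -1
    try:
        return int(parts[2])
    except ValueError:
        return -1


def is_supported(version):
    """True when the build number meets the minimum the script asks for."""
    return build_number(version) >= MINIMUM_BUILD


# Made-up version strings, for illustration only.
for example in ["2014.10.29622", "2013.06.20000", "dev"]:
    print(example, build_number(example), is_supported(example))
```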
Organising into Structural Classes
When working on a drug discovery project with multiple chemotypes you often want to tag particular groups of compounds as belonging to a named structural class to aid analysis. An example is shown below which classifies structures into Indoles, Indazoles, Benzimidazoles etc. by simply replacing the SMARTS patterns with those defining each of the structural classes.
#
# A script to assign structural classes
#
from com.dotmatics.vortex.mol2img.jni import genImage
import time
import java
from threading import Thread
from com.dotmatics.vortex.mol2img import Mol2Img

#Structural class SMARTS patterns here, ["Indole","N1C=CC2=C1C=CC=C2"],
patterns = [
["Indole","N1C=CC2=C1C=CC=C2"],
["Indazole","N1N=CC2=C1C=CC=C2"],
["Benzimidazole","N1C=NC2=C1C=CC=C2"],
["Quinoline","C1=CC2=C(C=C1)N=CC=C2"],
# etc
]
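Once the compounds are tagged, a quick tally of how many fall into each class is often the first thing you want. The sketch below works in plain Python on the comma-separated strings that the matching scripts above write to their result column; `count_classes` and the example values are hypothetical, and it assumes no label itself contains a comma.

```python
from collections import Counter


def count_classes(column_values):
    """Hypothetical helper: count rows per label in comma-joined result strings."""
    counts = Counter()
    for value in column_values:
        for label in value.split(','):
            if label:  # skip rows that matched nothing
                counts[label] += 1
    return counts


# Made-up result column: one string per compound.
values = ["Indole", "", "Indole,Quinoline", "Indazole"]
for label, n in sorted(count_classes(values).items()):
    print(label, n)
```

A compound that matches two classes is counted once under each, so the totals can add up to more than the number of rows.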
The scripts can be downloaded here
Page Updated 9 October 2014