I recently wrote a review of Reaction Workflows, a web-based tool that allows users to build workflows from nodes that provide inputs and outputs or perform actions, including reaction-, scaffold-, and transform-based enumeration, all within a web browser using drag and drop. Whilst you can draw input structures, one of its real strengths is the ability to import pre-categorised reagent files, e.g. acid chlorides or secondary amines. Workflows comes with a set of pre-categorised reagents, but I’m sure most users will want to add their own proprietary reagents or commercial reagent catalogues.
This script is intended to help with that categorisation. It uses SMARTS strings to define the queries. If you are not familiar with SMARTS, the Daylight Theory pages are a good starting place, and I also find the SMARTSviewer at the University of Hamburg really helpful. Checkmol, a program written in Pascal, does something similar.
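To make the categorisation step concrete, here is a minimal plain-Python sketch of how per-molecule SMARTS match counts (the kind of `<name>_count` columns the script below writes) could be turned into reagent categories. The pattern names (`acid_chloride`, `sec_amine`, `prim_amine`) and the category rules are hypothetical illustrations, not queries taken from the script.

```python
# Hypothetical category rules: each maps a category name to a predicate over a
# molecule's count dictionary. Pattern names are illustrative, not the script's.
RULES = {
    # exactly one acid chloride group (excludes bis-acid chlorides)
    "acid_chlorides": lambda c: c.get("acid_chloride_count", 0) == 1,
    # exactly one secondary amine and no competing primary amine
    "secondary_amines": lambda c: (c.get("sec_amine_count", 0) == 1
                                   and c.get("prim_amine_count", 0) == 0),
}


def categorise(reagents):
    """Return {category: [reagent ids]} for the reagents matching each rule.

    reagents maps a reagent id to its dict of <pattern>_count values;
    missing keys are treated as zero matches. A reagent may fall into
    more than one category, or into none.
    """
    categories = {name: [] for name in RULES}
    for reagent_id, counts in reagents.items():
        for name, rule in RULES.items():
            if rule(counts):
                categories[name].append(reagent_id)
    return categories


reagents = {
    "R1": {"acid_chloride_count": 1},
    "R2": {"sec_amine_count": 1},
    "R3": {"sec_amine_count": 1, "prim_amine_count": 1},  # mixed amine
    "R4": {"acid_chloride_count": 2},                     # bis-acid chloride
}
print(categorise(reagents))
# {'acid_chlorides': ['R1'], 'secondary_amines': ['R2']}
```

Rules that test for an exact count are what keep difunctional reagents such as R4 out of a monofunctional category; a simple present/absent flag could not do that.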
SMARTS is a language that allows you to specify substructures using rules that are straightforward extensions of SMILES. For example, to search a database for phenol-containing structures, one would use the SMARTS string [OH]c1ccccc1, which should be familiar to those acquainted with SMILES.
The script is a variation of the high-performance substructure search scripts described previously; however, instead of simply flagging the presence (or absence) of a SMARTS query, it reports the number of times each query matches within a molecule. The script uses all available cores, so it can run queries in parallel and handle very large datasets. It currently contains around 70 different SMARTS queries covering both functional groups and atom counts, and I’d be happy to add any suggestions.
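Judging from the listing, each query search returns a map from row index to match count that contains only the rows that matched, and the script writes 0 for every row missing from it. A minimal plain-Python sketch of that expansion follows, with the older presence flag recovered as `count > 0`. The function names are hypothetical and the hit map is toy data.

```python
def dense_counts(hits, n_rows):
    """Expand a sparse {row: count} map into one count per row (misses -> 0)."""
    return [hits.get(row, 0) for row in range(n_rows)]


def presence_flags(counts):
    """Collapse counts back to the simple present/absent flag."""
    return [count > 0 for count in counts]


# Toy hit map: row 0 matched twice (e.g. a diamine), row 3 matched once.
hits = {0: 2, 3: 1}
counts = dense_counts(hits, 5)
print(counts)                  # [2, 0, 0, 1, 0]
print(presence_flags(counts))  # [True, False, False, True, False]
```

Rows 0 and 3 get the same flag even though their counts differ; that difference is exactly the information the count columns keep.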
The result is shown in the screenshot below.
The Vortex Script
```python
import java
from com.dotmatics.vortex.mol2img import Mol2Img
from Queue import Queue  # Jython (Python 2) module name
from threading import Thread

# vortex, vtable and vws are globals supplied by the Vortex scripting environment.
processorcount = java.lang.Runtime.getRuntime().availableProcessors()


class smilesworker(Thread):
    """Worker thread: takes row indices off the queue and writes SMILES for each row."""

    def __init__(self, q, eval_column):
        self.q = q
        self.eval_column = eval_column
        Thread.__init__(self)

    def run(self):
        while True:
            row = self.q.get()
            if row is None:  # sentinel: no more rows to process
                return
            try:
                vortex_tmp_value = vortex.getMolProperty(vtable.getStructureText(row), "SMILES")
            except:
                # getMolProperty may raise (including Java exceptions); leave the cell empty
                vortex_tmp_value = None
            if vortex_tmp_value is None:
                self.eval_column.setValueFromString(row, None)
            else:
                self.eval_column.setValueFromString(row, str(vortex_tmp_value))


# Patterns: (column name prefix, SMARTS query)
patterns = [
    # Carbon functional groups
    ('aro', '[a]'),
    ('acetylene', 'C#[CH1]'),
    ('carbonyl', '[CX3]=[OX1]'),
    # Urea will count as 2 amides
    ('amide', '[OX1]=CN'),
    # ... further queries omitted from this listing (see the downloadable script) ...
    # Old-school HBA/HBD model: count N and O atoms; count N or O bearing H
    ('HBA', '[#7,#8]'),
    ('HBD', '[OX2H,NX3H,NX2H]'),
]


class match_multiple(ProgressRunnable):
    def __init__(self):
        self.calcSMILES = False
        self.nostructure = False
        self.structureColumn = vtable.findColumnWithName("SMILES")
        if self.structureColumn is None:
            self.calcSMILES = True
        # Without a SMILES column we need a molfile (SD) column to generate one
        if self.calcSMILES and vtable.findColumnWithName(vtable.MolfileColumn) is None:
            vortex.alert("You need an SD file or a SMILES column")
            self.nostructure = True

    def doCalcSmiles(self):
        self.structureColumn.setValueFromString(vtable.getRealRowCount() - 1, None)
        q = Queue(processorcount * 20)
        # Create and start one worker per processor
        workers = [smilesworker(q, self.structureColumn) for i in range(processorcount)]
        for w in workers:
            w.start()
        # Load the queue with row indices
        for row in range(vtable.getRealRowCount()):
            q.put(row)
        # One sentinel per worker tells the workers to stop
        for i in range(processorcount):
            q.put(None)
        for w in workers:
            w.join()

    def updateProgress(self, perc, message):
        self.setProgressValue(perc)
        self.setProgressMessage(message)

    def run(self):
        if not self.nostructure:
            self.updateProgress(0, 'Calculating SMILES')
            if self.calcSMILES:
                self.structureColumn = vtable.findColumnWithName("SMILES", 1, vortex.STRING)
                self.doCalcSmiles()
            self.updateProgress(0, 'Indexing SMILES (for performance)')
            Mol2Img.doSearch(self.structureColumn, '[U].Cl.F.Br.N.O.S', 'nomdl', 1)
            nrows = vtable.getRealRowCount()
            for i, (name, smarts) in enumerate(patterns):
                self.updateProgress(int(100 * float(i) / len(patterns)), name)
                # hits maps row index -> number of matches; unmatched rows are absent
                hits = Mol2Img.doSearch(self.structureColumn, smarts, 'nomdl', 1)
                mycol = vtable.findColumnWithName(name + "_count", 1, vortex.INT)
                for row in range(nrows):
                    if hits.containsKey(row):
                        mycol.setInt(row, hits[row])
                    else:
                        mycol.setInt(row, 0)
            vtable.fireTableStructureChanged()


if vws is None:
    vortex.alert("You must have a workspace loaded...")
else:
    matcher = match_multiple()
    vortex.run(matcher, "Generating matches")
```
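The SMILES step in `doCalcSmiles` follows a common producer/consumer pattern: a bounded queue, one worker thread per processor, and one `None` sentinel per worker so that every thread eventually exits. Below is a standalone sketch of the same pattern in standard CPython 3, using `queue` rather than Jython's Python 2 `Queue` module and `os.cpu_count()` rather than the Java `Runtime` call; `fake_smiles` is a hypothetical stand-in for the per-row `vortex.getMolProperty` call. Note that Jython, which Vortex uses, has no global interpreter lock, whereas in CPython threads only speed up work that releases the GIL.

```python
import os
import queue
import threading


def fake_smiles(row):
    # Hypothetical stand-in for the per-row structure -> SMILES conversion.
    return "C" * (row + 1)


def worker(q, results):
    while True:
        row = q.get()
        if row is None:        # sentinel: no more work for this thread
            return
        try:
            value = fake_smiles(row)
        except Exception:
            value = None       # mirror the script: failures become empty cells
        results[row] = value   # each row index is written by exactly one worker


def compute_all(n_rows):
    n_workers = os.cpu_count() or 1
    q = queue.Queue(n_workers * 20)   # bounded, as in the script
    results = [None] * n_rows
    threads = [threading.Thread(target=worker, args=(q, results))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for row in range(n_rows):
        q.put(row)                    # blocks while the queue is full
    for _ in threads:
        q.put(None)                   # one sentinel per worker
    for t in threads:
        t.join()
    return results


print(compute_all(4))  # ['C', 'CC', 'CCC', 'CCCC']
```

Bounding the queue keeps memory flat on very large tables, since the producer waits for the workers instead of enqueueing every row at once.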
The script can be downloaded from here.
Last Updated 21 June 2017