Vortex script to flag potential aggregators

Promiscuous inhibition caused by small molecule aggregation is a major source of false positive results in high-throughput screening. To mitigate this, the use of a nonionic detergent such as Triton X-100 or Tween-80, which can disrupt aggregates, has been studied and is now common in screening campaigns DOI.

Three key results emerge from this study: first, detergent-dependent identification of aggregate-based inhibition is feasible on the large scale. Second, 95% of the actives obtained in this screen are aggregate-based inhibitors. Third, aggregate-based inhibition is correlated with steep dose-response curves, although not absolutely.

Physicochemical Properties of Aggregators

A particularly valuable recent publication, Irwin, Duan, Torosyan, Doak, Ziebart, Sterling, Tumanian and Shoichet, J Med Chem, 2015, 58(17), 7076-7087 DOI, has collated over 12,000 organic molecules known to act as aggregators at concentrations used in screening campaigns, and provides a resource, Aggregation Advisor, that can be used to try to predict possible false positives. However, in many instances it would be unwise to submit proprietary information to the public web service. Potential aggregators are flagged based on calculated LogP >3 and/or similarity >0.85 to a known aggregator (using a path-based fingerprint). This script calculates XLogP using the algorithm provided by Dotmatics and then uses OpenBabel fast search to find the closest similarity to a known aggregator.
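
The flagging rule described above can be sketched in a few lines of plain Python. This is only an illustration of the LogP and similarity cut-offs quoted in the text; the function name flag_aggregator and its parameters are hypothetical and are not part of Vortex, OpenBabel or the Aggregation Advisor.

```python
def flag_aggregator(logp, max_similarity, logp_cutoff=3.0, sim_cutoff=0.85):
    """Return the reasons a compound is flagged as a potential aggregator.

    logp            calculated LogP of the query compound
    max_similarity  Tanimoto similarity to the closest known aggregator
    The cut-offs default to the values quoted in the text (LogP >3, similarity >0.85).
    """
    reasons = []
    if logp > logp_cutoff:
        reasons.append("LogP > %g" % logp_cutoff)
    if max_similarity > sim_cutoff:
        reasons.append("similarity > %g" % sim_cutoff)
    return reasons


# Made-up property values, purely for illustration
print(flag_aggregator(4.2, 0.54))  # lipophilic, no close analogue
print(flag_aggregator(2.1, 0.91))  # close analogue of a known aggregator
print(flag_aggregator(1.5, 0.40))  # not flagged: empty list
```

An empty list means neither criterion was met; a compound can also be flagged for both reasons at once.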

OpenBabel Fastsearch

OpenBabel is a chemical toolbox designed to speak the many languages of chemical data. It’s an open, collaborative project allowing anyone to search, convert, analyse, or store data from molecular modelling, chemistry, solid-state materials, biochemistry, or related areas. One of the ready-made applications is the fastsearch utility. This uses molecular fingerprints to prepare and search an index of a multi-molecule datafile, and allows very fast substructure and structural similarity searching. The indexing is a slow process (~30 minutes for a 250,000 molecule file) but the subsequent searching takes only a few seconds, and so can be done interactively.

Uses molecular fingerprints in an index file.
Writing to the fs format makes an index (a very slow process)
 babel datafile.xxx index.fs
Reading from the fs format does a fast search for:
Substructure
    babel index.fs -sSMILES outfile.yyy   or
    babel datafile.xxx -ifs -sSMILES outfile.yyy
Molecular similarity based on Tanimoto coefficient
    babel index.fs -sSMILES outfile.yyy -at0.7  (Tanimoto >0.7)
    babel index.fs -sSMILES outfile.yyy -at15   (best 15 molecules)
The structure spec can be a molecule from a file: -Spatternfile.zzz

Write Options (when making index) e.g. -xfFP3 
f# Fingerprint type
N# Fold fingerprint to # bits
u  Update an existing index

Read Options (when searching) e.g. -at0.7
t# Do similarity search: #mols or # as min Tanimoto
a  Add Tanimoto coeff to title
l# Maximum number of candidates. Default<4000>
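
For readers unfamiliar with the Tanimoto coefficient used by the similarity search, here is a minimal sketch of how it is calculated from two fingerprints. It assumes each fingerprint is represented simply as the collection of bit positions that are set; the function name tanimoto and the example bit positions are invented for illustration and do not correspond to real FP2 fingerprints.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: bits set in both / bits set in either."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # two empty fingerprints: define the similarity as zero
    shared = len(a & b)
    return shared / float(len(a) + len(b) - shared)


# Hypothetical fingerprints given as the positions of the "on" bits
query = [1, 4, 7, 9]
known = [1, 4, 8, 9, 12]
print(tanimoto(query, known))  # 3 shared bits out of 6 distinct bits = 0.5
```

A value of 1.0 means the two fingerprints are identical, which does not necessarily mean the two molecules are identical.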

OpenBabel supports a number of different fingerprints but the linear fingerprints FP2 are similar to those used in the publication.

user$ babel -L fingerprints
FP2    Indexes linear fragments up to 7 atoms.
FP3    SMARTS patterns specified in the file patterns.txt
FP4    SMARTS patterns specified in the file SMARTS_InteLigand.txt
MACCS    SMARTS patterns specified in the file MACCS.txt

There is a comprehensive tutorial describing OpenBabel fastsearch available online.

The authors very generously provide a file containing all the aggregators, which can be downloaded from http://advisor.bkslab.org/faq/. They were downloaded as SMILES strings, but I converted them into sdf format as a simple means of checking that all the SMILES were valid.

We first need to create the fast search index.

/usr/local/bin/obabel '/Users/username/Desktop/Aggregators/aggregators.sdf' -ofs -xfFP2 -O '/Users/username/Desktop/Aggregators/aggregators.fs'

You will need to make a note of the path to the .fs file to include in the script below. You can then test the search using the command shown below. It should return the SMILES string, ID and similarity for the most similar molecule.

username$ '/usr/local/bin/obabel' '/Users/username/Desktop/Aggregators/aggregators.fs' -osmiles -S'Fc1cc2NCCc2cc1' -at1 -aa
Cc1ccc(NCCc2c(cc(cc2[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-])cc1  MLS000590224-01 0.536585
1 molecule converted
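
The same test search can be driven from Python. The sketch below simply assembles the command shown above as an argument list and runs it with subprocess. The helper names build_search_command and run_search are hypothetical, and the obabel path is the one used in this post, so adjust it for your own installation.

```python
import os
import subprocess

OBABEL = '/usr/local/bin/obabel'  # path used in this post; adjust for your install


def build_search_command(index_path, smiles, obabel=OBABEL):
    """Return the argument list for the 'single most similar molecule' search."""
    return [obabel, index_path, '-osmiles', '-S' + smiles, '-at1', '-aa']


def run_search(index_path, smiles, obabel=OBABEL):
    """Run the search and return obabel's raw output, or None if obabel is not found."""
    if not os.path.isfile(obabel):
        return None
    proc = subprocess.Popen(build_search_command(index_path, smiles, obabel),
                            stdout=subprocess.PIPE)
    return proc.communicate()[0].decode('utf-8', 'replace')


# Show the command that would be run for the example query above
print(' '.join(build_search_command(
    '/Users/username/Desktop/Aggregators/aggregators.fs', 'Fc1cc2NCCc2cc1')))
```

Passing the arguments as a list avoids any need to quote the SMILES string for the shell.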

usename$ '/usr/local/bin/obabel' '/Users/username/Desktop/Aggregators/aggregators.fs' -osmiles -S'Fc1cc2NCCc2cc1' -at1 -aa

Cc1ccc(NCCc2c(cc(cc2[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-])cc1 MLS000590224-01 0.536585

1 molecule converted

We can now use this in the Vortex script.

The first part of the script calculates XLogP DOI as implemented within Vortex. Then the script loops through all the records in the workspace, generates the SMILES string for an individual record and then uses the OpenBabel fast search to find the most similar structure amongst the known aggregators. The last part of the script parses the output because the data returned contains a mixture of tab and space delimiters. Then the table is populated.

The final part of the script categorises compounds based on XLogP and similarity.
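
The output-parsing step described above can be tried outside Vortex. This is a minimal sketch, assuming each hit is returned as a single line of the form SMILES, ID, similarity separated by tabs and/or spaces, as in the example output shown earlier. The function name parse_hit is hypothetical.

```python
def parse_hit(line):
    """Split one line of fast search output into (SMILES, ID, similarity).

    Returns None if the line does not contain at least three fields,
    for example when no hit was returned.
    """
    fields = line.split()  # splits on any run of tabs, spaces or newlines
    if len(fields) < 3:
        return None
    smiles = fields[0]
    similarity = float(fields[-1])     # the Tanimoto value is the last field
    mol_id = ' '.join(fields[1:-1])    # everything in between is the ID/title
    return smiles, mol_id, similarity


# The example hit shown earlier, with a tab after the SMILES string
sample = ("Cc1ccc(NCCc2c(cc(cc2[N+](=O)[O-])[N+](=O)[O-])[N+](=O)[O-])cc1"
          "\tMLS000590224-01 0.536585\n")
print(parse_hit(sample))
```

Calling split() with no argument treats any run of whitespace as one delimiter, so the mixture of tabs and spaces needs no special handling.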

The Vortex Script

import sys
import com.dotmatics.vortex.util.Util as Util
import subprocess

# Calculate XLogP for every structure and store it in an 'XLogP' column
xlogps = [float(i) for i in vortex.getMolProperty(vtable.getStructureTexts(), 'XLogP')]
col = vtable.findColumnWithName('XLogP', 1, vortex.DOUBLE)
col.setDoubles(xlogps)

# Need to edit path to fast search file!!
aggfs = '/Users/username/Desktop/Aggregators/aggregators.fs'

# Stop before doing any searches if the workspace has no Structure column
col = vtable.findColumnWithName('Structure', 0)
if col is None:
    vortex.alert('Load a workspace with a Structure column please.')
    quit()

# Find (or create) the result columns once, outside the loop
simcol = vtable.findColumnWithName('sim SMILES', 1)
idcol = vtable.findColumnWithName('ID', 1)
scorecol = vtable.findColumnWithName('SimScore', 1, 1)

rows = vtable.getRealRowCount()
for r in range(0, int(rows)):
    smiles = vortex.getMolProperty(vtable.getMolFileManager().getMolFileAtRow(r), 'SMILES')

    # Equivalent command line:
    # '/usr/local/bin/obabel' '/Users/username/Desktop/Aggregators/aggregators.fs' -osmiles -S'Fc1cc2NCCc2cc1' -at1 -aa
    p = subprocess.Popen(['/usr/local/bin/obabel', aggfs, '-osmiles', '-S', smiles, '-at1', '-aa'], stdout=subprocess.PIPE)
    output = p.communicate()[0]

    # Parse output: the fields are separated by a mixture of tabs and spaces
    vals = output.split()
    if len(vals) < 3:
        continue  # nothing returned for this row, leave the cells empty
    simcol.setValueFromString(r, vals[0])
    idcol.setValueFromString(r, vals[1])
    scorecol.setValueFromString(r, vals[2])
vtable.fireTableStructureChanged()

# Categorise each compound by XLogP and by similarity to the closest known aggregator
col1 = vtable.findColumnWithName('XLogP', 0)
col2 = vtable.findColumnWithName('SimScore', 0)
column = vtable.findColumnWithName('Agg Score', 1)
for r in range(0, int(rows)):
    xlogp = col1.getValue(r)
    sim = col2.getValue(r)
    if xlogp > 3.0:
        score = 'High LogP'
    else:
        score = 'Low LogP'
    if sim > 0.83:
        score = score + ', similar'
    else:
        score = score + ', no similar'
    column.setValueFromString(r, score)

vtable.fireTableStructureChanged()

The Result

The result is shown below. The workspace now contains the calculated LogP (XLogP); the structure, ID and Tanimoto coefficient of the most similar known aggregator (sim SMILES, ID, SimScore); and the aggregation category (Agg Score).

The script can be downloaded from here

aggregatorsxLogP.vpy Download

Update

A profile of the physicochemical properties (HBD, HBA, PSA, HAC, LogP, LogD, MWt, RBC) was generated using an AppleScript that uses evaluate from ChemAxon to calculate the physicochemical properties and Aabel to construct the histograms. I also used it to determine pKa in order to identify acidic or basic groups, and categorized the aggregators accordingly. In addition, I calculated the fraction of aromatic atoms (number of aromatic atoms/number of heavy atoms). I’ve also included npri (normalized ratio of principal moments of inertia) as described by Sauer WH, Schwarz MK (2003) Molecular shape diversity of combinatorial libraries: A prerequisite for broad bioactivity. J Chem Inf Comput Sci 43:987–1003 DOI; this was calculated using MOE. One very striking feature of this analysis is the lack of ionisable groups observed in the aggregator compounds. The majority of aggregators do have a calculated LogP >3; however, because of the lack of ionisable groups, it might be better to use a cut-off of LogD >3.

An alternative script that uses LogD can be downloaded from here

aggregatorsxLogD.vpy Download

Page Updated 5 November 2015
