Vortex script using Using OpenBabel fastsearch

One thing I’ve needed to do a couple of times recently is give an idea of how many similar compounds are available to the set of compounds I’m currently viewing. For example in designing a fragment library it is very useful to know for a particular fragment how many similar fragments are commercially available. Or when looking at the results of a high-throughput screen how many similar analogues to a particular hit were also screened.

To do this we need a way of doing a rapid similarity search of the reference database. I use OpenBabel in particular using the fast search capability with molecular fingerprints.

Molecular fingerprints encode molecular structure in a series of binary digits (bits) that represent the presence or absence of particular substructures in the molecule. Comparing fingerprints will allow you rapidly to determine the similarity between two molecule

Using OpenBabel

We first need to create the fast search index, by default the fingerprint is FP2 (Indexes linear fragments up to 7 atoms) and does not need to be defined but I’ve left it in so you can see the format if you want to use other fingerprints

/usr/local/bin/obabel   '/Users/swain/Projects/FragFiles/sdf/allFragments.sdf'    -O '/Users/swain/Projects/FragFiles/sdf/allFragments.fs -xfFP2

1 2	/usr/local/bin/obabel '/Users/swain/Projects/FragFiles/sdf/allFragments.sdf' -O '/Users/swain/Projects/FragFiles/sdf/allFragments.fs -xfFP2

The available fingerprints are

FP2    Indexes linear fragments up to 7 atoms.
FP3    SMARTS patterns specified in the file patterns.txt
FP4    SMARTS patterns specified in the file SMARTS_InteLigand.txt
MACCS    SMARTS patterns specified in the file MACCS.txt

FP2 Indexes linear fragments up to 7 atoms.

FP3 SMARTS patterns specified in the file patterns.txt

FP4 SMARTS patterns specified in the file SMARTS_InteLigand.txt

MACCS SMARTS patterns specified in the file MACCS.txt

This process may take a little while, on my MacBook Pro a 50K sdf took less than a minute to generate the fast search index. You can test using a simple search, the following Terminal command returns all structures with a Tanimoto similarity of > 0.85.

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at0.85

1 2	/usr/local/bin/obabel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs' -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at0.85

c1(N2CCNCC2)ccccc1
N1(CCN(CC1)c1ccccc1)c1ccccc1
N1(CCN(CC1)C)c1ccccc1
N(c1ccccc1)CCN(CC)CC
N1(c2ccc(N)cc2)CCN(CC1)C
N1(c2ccc(N)cc2)CCN(CC1)CC
N1(c2ccc(N)cc2)CCN(CC1)C
N1(c2ccc(N)cc2)CCN(CC1)CC
N1(c2ccc(N)cc2)CCN(CC1)CC
N1(c2ccc(N)cc2)CCNCC1
N1(c2c(N)cccc2)CCNCC1
N1(c2c(N)cccc2)CCN(CC1)C
N1(c2cc(N)ccc2)CCN(CC1)C
N1(c2c(N)cccc2)CCN(CC1)C
N1(c2c(N)cccc2)CCN(CC1)CC
c1cccc(c1)N(CCN)CC
16 molecules converted

Whilst the following command would return the 5 most similar structures

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5

1 2	/usr/local/bin/obabel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs' -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5

c1(N2CCNCC2)ccccc1
N1(CCN(CC1)c1ccccc1)c1ccccc1
N1(CCN(CC1)C)c1ccccc1
N(c1ccccc1)CCN(CC)CC
N1(c2ccc(N)cc2)CCN(CC1)CC
5 molecules converted

There is a slight catch, whilst all the output is displayed in the Terminal window not all is derived from stdout, this is because by default stderr is also displayed in the Terminal window and stdlog is mapped to stderr. We can check what comes from where by selectively suppressing either stdout or stderr. Many thanks to Chris Morley for explaining this.

This command stops the stderr output

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5 2> /dev/null

1	/usr/local/bin/obabel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs' -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5 2> /dev/null

This results in

c1(N2CCNCC2)ccccc1 N1(CCN(CC1)c1ccccc1)c1ccccc1
N1(CCN(CC1)C)c1ccccc1
N(c1ccccc1)CCN(CC)CC
N1(c2ccc(N)cc2)CCN(CC1)CC

If we now stop the stdout output

/usr/local/bin/obabel  '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5 > /dev/null

1	/usr/local/bin/obabel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs' -osmi -s'C1CN(CCN1)C1=CC=CC=C1' -at5 > /dev/null

This results in

5 molecules converted

Since for this script we only need to know the number of similar molecules we need to capture the stderr output.

Script Explanation

The first part imports various libraries and I have to confess I now use a simple boilerplate so not all are actually needed. We then open a dialog to allow the user to select the appropriate fast search file and get the absolute path. We then create a column to store the results in, and then loop through the table extracting the SMILES string for each structure. The next step is key, we create the subprocess but ensure we capture both stdout and stderr, (we actually only need stderr but by capturing both it does give options for modifying the script). You can obviously change the similarity coefficient from 0.85 if you want.

p = subprocess.Popen(&#91;'/usr/local/bin/obabel', fsFile, '-o', 'smi', '-s',  smiles, '-at0.85'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

1 2	p = subprocess.Popen(['/usr/local/bin/obabel', fsFile, '-o', 'smi', '-s', smiles, '-at0.85'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

The output is in the following format

'1 molecule converted\n&amp;rsquo;

1 2	'1 molecule converted\n&rsquo;

We now parse the output, add it to the table and update the display.

Since we are doing a separate search for each row in the table this might be slow for large datasets.

The Vortex Script

# This uses the OpenBabel fastsearch (http://openbabel.org/wiki/Tutorial:Fingerprints)


import sys
import com.dotmatics.vortex.util.Util as Util

if Util.getPlatform() == Util.PlatformIsWindows:
  sys.path.append(vortex.getVortexFolder() + '\\modules\\jythonconsole')
  sys.path.append(vortex.getVortexFolder() + '\\modules\\jythonlib')
else:
  sys.path.append(vortex.getVortexFolder() + '/modules/jythonconsole')
  sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import urllib2
import urllib
import subprocess

# Open a dialog to choose a file
# getFile(title, extensions, 0 = Open, 1 = Save)
# vortex will keep track of the last folder you looked in etc

file = vortex.getFile("Choose a fastsearch file", &#91;".fs"], 0)
#display file path
if file:
    vortex.alert(str(file))


# Need to coerce file path  
fsFile=file.getAbsolutePath() 


#   /usr/local/bin/babel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs'  -osmi  -xt -s'NC(=N)c1ccccc1' 



column = vtable.findColumnWithName('NumSim', 1, 3)
mfm = vtable.getMolFileManager()
rows = vtable.getRealRowCount()
for r in range(0, rows):
    smiles = vortex.getMolProperty(mfm.getMolFileAtRow(r), 'SMILES')
    p = subprocess.Popen(&#91;'/usr/local/bin/obabel', fsFile, '-o', 'smi', '-s',  smiles, '-at0.85'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# for stdout output = p.communicate()&#91;0]

# stderr

    output = p.communicate()&#91;1]

    # output format '1 molecule converted\n&amp;rsquo;

# Parse the output


    vals= output.split(' ')

    column = vtable.findColumnWithName('NumSim', 0)     
    column.setValueFromString(r, vals&#91;0])


vtable.fireTableStructureChanged()

# This uses the OpenBabel fastsearch (http://openbabel.org/wiki/Tutorial:Fingerprints)

import sys

import com.dotmatics.vortex.util.Util as Util

if Util.getPlatform() == Util.PlatformIsWindows:

sys.path.append(vortex.getVortexFolder() + '\\modules\\jythonconsole')

sys.path.append(vortex.getVortexFolder() + '\\modules\\jythonlib')

else:

sys.path.append(vortex.getVortexFolder() + '/modules/jythonconsole')

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import urllib2

import urllib

import subprocess

# Open a dialog to choose a file

# getFile(title, extensions, 0 = Open, 1 = Save)

# vortex will keep track of the last folder you looked in etc

file = vortex.getFile("Choose a fastsearch file", [".fs"], 0)

#display file path

if file:

vortex.alert(str(file))

# Need to coerce file path

fsFile=file.getAbsolutePath()

# /usr/local/bin/babel '/Users/swain/Projects/FragFiles/sdf/allFragments.fs' -osmi -xt -s'NC(=N)c1ccccc1'

column = vtable.findColumnWithName('NumSim', 1, 3)

mfm = vtable.getMolFileManager()

rows = vtable.getRealRowCount()

for r in range(0, rows):

smiles = vortex.getMolProperty(mfm.getMolFileAtRow(r), 'SMILES')

p = subprocess.Popen(['/usr/local/bin/obabel', fsFile, '-o', 'smi', '-s', smiles, '-at0.85'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# for stdout output = p.communicate()[0]

# stderr

output = p.communicate()[1]

# output format '1 molecule converted\n&rsquo;

# Parse the output

vals= output.split(' ')

column = vtable.findColumnWithName('NumSim', 0)

column.setValueFromString(r, vals[0])

vtable.fireTableStructureChanged()

The vortex script can be downloaded from here

FindNumSimilar.vpy Download

Last Updated 7 May 2013

Vortex script using Using OpenBabel fastsearch

Using OpenBabel

Script Explanation

The Vortex Script

Related Posts

Selecting random clusters from a large dataset in Vortex

Using ChemDraw as input for Boltz docking