One thing I’ve needed to do a couple of times recently is get an idea of how many available compounds are similar to those in the set I’m currently viewing. For example, when designing a fragment library it is very useful to know, for a particular fragment, how many similar fragments are commercially available. Likewise, when looking at the results of a high-throughput screen, it helps to know how many analogues of a particular hit were also screened.
To do this we need a way of running a rapid similarity search against the reference database. I use OpenBabel, in particular its fast search capability based on molecular fingerprints.
In Scripting Vortex 13 there is a new script to do this.
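As a rough illustration of the idea (not the Vortex script itself), the sketch below builds the two obabel command lines involved: one to create a fastsearch index and one to run a Tanimoto similarity search against it. The file names, the 0.7 cut-off and the helper names are assumptions made for the example, and the option spellings should be checked against the Open Babel version you have installed.

```python
import subprocess


def build_index_command(database_path):
    # "-ofs" asks obabel to write a fastsearch index (.fs) for the database file.
    return ["obabel", database_path, "-ofs"]


def similarity_search_command(index_path, query_smiles, threshold=0.7,
                              output_path="hits.smi"):
    # "-s<SMILES>" supplies the query structure and "-at<value>" sets the
    # minimum Tanimoto similarity; both are passed as single arguments.
    return ["obabel", index_path, "-O", output_path,
            "-s" + query_smiles, "-at" + str(threshold)]


def count_records(smiles_text):
    # A SMILES file holds one structure per non-blank line.
    return sum(1 for line in smiles_text.splitlines() if line.strip())


def count_similar(index_path, query_smiles, threshold=0.7,
                  output_path="hits.smi"):
    # Runs the search (requires obabel on the PATH) and counts the hits.
    command = similarity_search_command(index_path, query_smiles,
                                        threshold, output_path)
    subprocess.run(command, check=True)
    with open(output_path) as handle:
        return count_records(handle.read())
```

Calling count_similar once per row of the current data set would give a "number of similar compounds" column; the index only needs to be built once per reference database.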
There are many more scripts and hints here.
In the previous tutorial we made use of the Virtual Computational Chemistry Laboratory web service to calculate aLogP and LogS; both results were returned in a simple text format. More recently there has been increased use of the JSON format for data exchange.
Molinspiration provide a number of cheminformatics tools and also a RESTful web service that can be used to calculate a range of molecular properties and bioactivity predictions.
The output from both web services is available either as a JSON string or as plain text, and the web service is accessed by submitting a URL.
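To make the URL-plus-JSON pattern concrete, here is a minimal sketch. The base URL, the parameter names and the layout of the JSON reply are all hypothetical placeholders invented for illustration; they are not the Molinspiration interface, so consult the provider's documentation for the real ones.

```python
import json
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint, used only to show how the query string is built.
BASE_URL = "https://example.org/properties"

# Made-up reply with placeholder property names and values.
EXAMPLE_RESPONSE = (
    '{"smiles": "CC(=O)O", '
    '"properties": {"property_a": "1.0", "property_b": 2.0}}'
)


def build_request_url(smiles, base_url=BASE_URL):
    # urlencode escapes characters such as "(", ")" and "=" that are common
    # in SMILES strings but not safe to place in a URL unescaped.
    return base_url + "?" + urlencode({"smiles": smiles, "format": "json"})


def parse_properties(json_text):
    # Convert the (assumed) "properties" object into a name -> float mapping,
    # ready to be written into data columns.
    payload = json.loads(json_text)
    return {name: float(value)
            for name, value in payload["properties"].items()}
```

The advantage of JSON over plain text is that each value arrives with its name attached, so the script does not depend on the order of the fields in the reply.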
I’ve just added the latest script for Vortex.
In previous scripts we have generated data using a local Java program, C program, Perl script, and SVL program. In this tutorial, rather than have a local application generate the data, we will use a web service.
There are more scripts on the Hints and Tutorial pages.
Vortex is an advanced data analysis package that understands chemistry, and its capabilities can be extended by the use of scripts. I’ve now created a Vortex script exchange that users can use to download or share scripts.
There are also a series of scripting tutorials here to provide a starting point for creating new scripts.
Hopefully these scripts will be valuable to you.
I recently wrote a review of ForgeV10 in which I imported the results into Vortex for analysis. This works fine; the only issue is that the resulting structures are 3D, which can make them difficult to interpret. This script uses OpenBabel to create SMILES, which can be rendered as 2D images.
One of the critical activities of most drug discovery programs is the identification of novel leads. These hits can come from high-throughput screening or fragment-based screening. There is, however, great interest in virtual screening, which allows the in silico evaluation of a vast number of compounds and the selection of a subset that has a greater chance of showing the desired activity. Virtual screening can be achieved by searching using sub-structures or molecular descriptors, by docking potential ligands into the target protein and scoring the resulting docked pose, or by comparing with the shape and/or electrostatic map of a known ligand.
Shape-it is a tool developed by Silicos-it that aligns a reference molecule against a set of database molecules using the shape of the molecules as the alignment criterion. It is based on the use of Gaussian volumes as a descriptor for molecular shape, as introduced by Grant, J.A.; Gallardo, M.A.; Pickup, B.T. (1996) ‘A fast method of molecular shape comparison: a simple application of a Gaussian description of molecular shape’, J. Comp. Chem. 17, 1653-1666.
This script shows how to run shape-it from within Vortex, bringing in the shape matching scores for filtering and analysis.
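To give a feel for what a Gaussian shape score measures, the sketch below computes the overlap of two sets of atom-centred spherical Gaussians and turns it into a Tanimoto score. It is a simplified illustration and not Shape-it's implementation: it keeps only pairwise overlaps (the full Gaussian volume includes higher-order terms), it assumes the two molecules are already aligned, and the amplitude 2.7 and the width values are illustrative assumptions.

```python
import math


def gaussian_overlap(center_a, center_b, alpha_a, alpha_b, p_a=2.7, p_b=2.7):
    # Closed-form integral of the product of two spherical Gaussians
    # p * exp(-alpha * |r - center|^2) over all space.
    d2 = sum((a - b) ** 2 for a, b in zip(center_a, center_b))
    total = alpha_a + alpha_b
    return (p_a * p_b * math.exp(-alpha_a * alpha_b * d2 / total)
            * (math.pi / total) ** 1.5)


def overlap_volume(mol_a, mol_b):
    # Each molecule is a list of (center, alpha) tuples; only pairwise
    # overlaps are summed (first-order approximation).
    return sum(gaussian_overlap(ca, cb, aa, ab)
               for ca, aa in mol_a for cb, ab in mol_b)


def shape_tanimoto(mol_a, mol_b):
    # Tanimoto = V_AB / (V_AA + V_BB - V_AB); 1.0 means identical shapes.
    v_ab = overlap_volume(mol_a, mol_b)
    v_aa = overlap_volume(mol_a, mol_a)
    v_bb = overlap_volume(mol_b, mol_b)
    return v_ab / (v_aa + v_bb - v_ab)
```

A real shape-matching tool also has to search over rotations and translations to find the alignment that maximises the overlap; the score above is what gets evaluated for each trial alignment.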
I’ve just added a new Vortex script. This one uses a Perl script that is part of the excellent MayaChemTools.
Scripting Vortex Using OpenBabel
Scripting Vortex 2 Using filter-it
Scripting Vortex 3 Using cxcalc
Scripting Vortex 4 Using MOE
Scripting Vortex 5 Calculating similarities using OpenBabel
Scripting Vortex 6 Filtering compounds
Scripting Vortex 7 Using MayaChemTools
I’ve just added another Vortex script. In this script we will make use of the ability of filter-it to categorise input molecules into 1) a set of molecules that fulfil all criteria defined in the filter definition file (passed molecules), and 2) a set of molecules that fail at least one of the defined filter criteria (failed molecules). The filter file defines the criteria for acceptable calculated physicochemical properties and also any substructures that should be included or excluded during the filtering. The filter file is a simple text file that users can define for themselves, and there is a detailed explanation on the Silicos-it website. They also provide several example filters: “Leadlike”, “Druglike”, “CMCLike” and “Clean”, the last of which cleans up a file without imposing a “drug like” filter. It should be relatively straightforward for users to create their own filters. One could imagine a rule-of-3 filter for use in fragment-based screening approaches, or a toxicophore filter based on SMARTS for substructures shown to be implicated in a specific toxicity. It might also be possible to define project-specific filters if a project requires a specific profile. If you need help it might be worth contacting Silicos-it.
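The passed/failed split is easy to picture in code. The sketch below applies a rule-of-3 style filter to precomputed descriptors; it is plain Python rather than a filter-it definition file, the descriptor names are invented for the example, and the cut-offs (molecular weight, logP, hydrogen bond donors and acceptors) follow the commonly quoted rule-of-3 values, which you may want to adjust.

```python
# Upper limits for a rule-of-3 style fragment filter (assumed cut-offs).
RULE_OF_THREE = {
    "mol_weight": 300.0,
    "logp": 3.0,
    "h_bond_donors": 3,
    "h_bond_acceptors": 3,
}


def failed_criteria(descriptors, limits=RULE_OF_THREE):
    # Return the names of every criterion the molecule violates.
    return [name for name, limit in limits.items()
            if descriptors[name] > limit]


def partition(molecules, limits=RULE_OF_THREE):
    # molecules maps an identifier to its descriptor dictionary.
    # A molecule passes only if it violates none of the criteria.
    passed, failed = [], []
    for identifier, descriptors in molecules.items():
        if failed_criteria(descriptors, limits):
            failed.append(identifier)
        else:
            passed.append(identifier)
    return passed, failed
```

Keeping the list of violated criteria, rather than a bare pass/fail flag, makes it easy to see why a compound was rejected when the results are viewed in a table.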
I’ve just posted the latest tutorial on scripting the chemically intelligent spreadsheet application Vortex. This tutorial shows how to use OpenBabel to provide similarity searching.
The full list of Vortex scripting tutorials is shown below.
More hints and tutorials can be found here.
This might be of interest.
Dotmatics is looking to expand the team working on Vortex, its data analysis platform. The candidate should have several years’ software development experience with Java and preferably with the Swing graphical user interface toolkit. The ideal candidate will have a degree or PhD in the life sciences, and will have experience with data visualisation and analysis techniques such as clustering. Experience with cheminformatics systems or statistical software, such as R, will be advantageous. Candidates will probably have experience working within the pharmaceutical/biotech sector or the life science software development industry.
The position will be based at the UK headquarters in Bishops Stortford (Herts, UK). We offer a competitive salary, benefits and a pleasant working environment at the Old Monastery site. Further information about the company and our software can be found at http://www.dotmatics.com.
This is the fourth tutorial on scripting Vortex, a chemically intelligent data visualisation package. In the previous tutorials we looked at getting data from OpenBabel, sieve, and cxcalc; in this tutorial we will use MOE as the compute engine. MOE from Chemical Computing Group is probably best known as a graphical user interface to a suite of computational chemistry tools. Whilst this is undoubtedly how many users will interact with the program, it is worth finding out about the command-line tools that are available. These tools are often accessed by pipeline tools such as Knime to allow rapid processing of large files. CCG provides four very useful command-line tools; in particular, sddesc allows the calculation of some or all of the MOE molecular descriptors for each molecular entry.
The Vortex Scripts
Vortex has tools that allow you to do some analysis, and of course you can use the scripting facility to access statistical or model-building packages like R. In this tutorial, however, we will take a model from the literature and implement it within Vortex, using a calculation field to construct the algorithm.
ChemAxon's Calculator (cxcalc) is a really useful command-line program in Marvin Beans and JChem that performs chemical calculations using calculator plugins. There are a lot of calculations provided by ChemAxon (e.g. charge, pKa, logP, logD), and others can be added by writing custom plugins. Perhaps one of the most useful is the ability to calculate the acidic and basic pKa. Calculation of pKa is essential to get a reasonable hold on the LogD of a molecule. LogD is probably the most critical physicochemical property in drug discovery: it has a major influence on absorption, cell penetration, metabolism, CYP450 inhibition and induction, PGP transporter activity and activity at the hERG channel, and it is often a critical component of any structure-activity relationship.
These scripts make use of cxcalc to generate data columns in Vortex.
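The link between pKa and LogD can be shown with the textbook relationship for a compound with a single ionisable group, assuming only the neutral form partitions into octanol. This is a simplified sketch for illustration and is not how cxcalc computes logD, which handles multiple ionisation sites; the function names are made up for the example.

```python
import math


def logd_monoprotic_acid(logp, pka, ph=7.4):
    # The ionised fraction grows as pH rises above the pKa of an acid,
    # which lowers the apparent lipophilicity.
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_monoprotic_base(logp, pka, ph=7.4):
    # For a base the protonated (ionised) form dominates below the pKa
    # of the conjugate acid.
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))
```

For example, an acid with pKa 4.4 is about 99.9% ionised at pH 7.4, so its logD at that pH is roughly three log units below its logP, which is why a poor pKa estimate translates directly into a poor LogD.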
This is the second page on scripting Vortex. On the first page I described how to use OpenBabel to calculate a limited selection of chemical properties. In this script we will use one of the brilliant tools from Silicos.
SIEVE is a program for filtering out molecules with unwanted properties. It is based on the Open Babel open source C++ API for rapid calculation of 45 different molecular properties.