Vortex script to flag potential aggregators

Promiscuous inhibition caused by small molecule aggregation is a major source of false positive results in high-throughput screening. To mitigate this, the use of a nonionic detergent such as Triton X-100 or Tween-80, which can disrupt aggregates, has been studied and is now common in screening campaigns DOI.

Three key results emerge from this study: first, detergent-dependent identification of aggregate-based inhibition is feasible on the large scale. Second, 95% of the actives obtained in this screen are aggregate-based inhibitors. Third, aggregate-based inhibition is correlated with steep dose-response curves, although not absolutely. 

Physicochemical Properties of Aggregators

A particularly valuable recent publication, Irwin, Duan, Torosyan, Doak, Ziebart, Sterling, Tumanian and Shoichet, J Med Chem, 2015, 58(17), 7076-7087 DOI, has collated over 12,000 organic molecules known to act as aggregators at concentrations used in screening campaigns, and provides a resource, Aggregation Advisor, that can be used to try to predict possible false positives. However, in many instances it would be unwise to submit proprietary information to a public web service. Potential aggregators are flagged based on a calculated LogP >3 and/or similarity >0.85 to a known aggregator (using a path-based fingerprint). This script calculates xLogP using the algorithm provided by Dotmatics and then uses OpenBabel fast search to calculate the closest similarity to a known aggregator.

OpenBabel Fastsearch

OpenBabel is a chemical toolbox designed to speak the many languages of chemical data. It’s an open, collaborative project allowing anyone to search, convert, analyse, or store data from molecular modelling, chemistry, solid-state materials, biochemistry, or related areas. One of the ready-made applications is the fastsearch utility. This uses molecular fingerprints to prepare and search an index of a multi-molecule datafile, and allows very fast substructure and structural similarity searching. The indexing is a slow process (~30 minutes for a 250,000 molecule file), but the subsequent searching takes only a few seconds and so can be done interactively.

OpenBabel supports a number of different fingerprints but the linear fingerprints FP2 are similar to those used in the publication.

There is a comprehensive tutorial describing OpenBabel fastsearch available online.

The authors very generously provide a file containing all the aggregators, which can be downloaded from http://advisor.bkslab.org/faq/. They were downloaded as SMILES strings, but I converted them into SDF format as a simple means of checking that all the SMILES were valid.

We first need to create the fast search index.
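As a rough sketch of this step, the index can be built by asking obabel to write the SDF file out with a .fs extension. The file names and the assumption that obabel is on the PATH are illustrative, not taken from the original script; adjust them to your own installation.

```python
import subprocess

# Assumed names: change these to match where the files live on your machine.
OBABEL = "obabel"
AGGREGATORS_SDF = "aggregators.sdf"
AGGREGATORS_FS = "aggregators.fs"


def build_index_command(sdf_path, fs_path, obabel=OBABEL):
    """Return the argument list that asks obabel to write a fastsearch index."""
    # The .fs output extension selects the fastsearch format,
    # which uses FP2 fingerprints unless told otherwise.
    return [obabel, sdf_path, "-O", fs_path]


def build_index(sdf_path, fs_path, obabel=OBABEL):
    """Run the indexing step; raises CalledProcessError if obabel fails."""
    subprocess.check_call(build_index_command(sdf_path, fs_path, obabel))


# Show the command without running it.
print(build_index_command(AGGREGATORS_SDF, AGGREGATORS_FS))
```

Calling build_index(AGGREGATORS_SDF, AGGREGATORS_FS) would actually run the command; it only needs to be done once for a given aggregator file.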

You will need to make a note of the path to the .fs file to include in the script below. You can then test the search using the command shown below. It should return the SMILES string, ID and similarity for the most similar molecule.
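A test search along these lines might look like the sketch below. It assumes obabel is on the PATH, uses a placeholder path for the .fs file and an arbitrary query SMILES (phenol), and relies on the fastsearch options -at1 (return the single most similar molecule) and -aa (append the Tanimoto coefficient to the title).

```python
import subprocess


def nearest_aggregator_command(smiles, fs_path, obabel="obabel"):
    """Return the obabel arguments for a one-hit similarity search."""
    return [
        obabel,
        fs_path,          # the fastsearch index created earlier
        "-osmi",          # write the hit as SMILES to standard output
        "-s" + smiles,    # query structure
        "-at1",           # keep only the most similar molecule
        "-aa",            # add the Tanimoto coefficient to the title
    ]


def nearest_aggregator(smiles, fs_path, obabel="obabel"):
    """Run the search and return the raw text that obabel prints."""
    raw = subprocess.check_output(nearest_aggregator_command(smiles, fs_path, obabel))
    return raw.decode("utf-8").strip()


# Placeholder path and query; only the command is printed here.
print(nearest_aggregator_command("c1ccccc1O", "/path/to/aggregators.fs"))
```

Passing the arguments as a list avoids shell quoting problems with SMILES strings that contain brackets or other special characters.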

We can now use this in the Vortex script.

The first part of the script calculates XLogP DOI as implemented within Vortex. The script then loops through all the records in the workspace, generates the SMILES string for each record and uses the OpenBabel fast search to find the most similar structure amongst the known aggregators. The returned data contains a mixture of tab and space delimiters, so the output has to be parsed before the table is populated.
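The parsing step can be illustrated in plain Python, independent of Vortex. This is a minimal sketch rather than the code from the script: it assumes each hit comes back as one line holding the SMILES, a tab, then the ID followed by a space and the similarity. The example line and the ID "AGG0001" are made up for illustration.

```python
def parse_fastsearch_line(line):
    """Split one line of search output into (smiles, mol_id, similarity).

    Returns None if the line does not look like a hit.
    """
    # split() with no argument treats tabs and spaces alike,
    # which copes with the mixed delimiters.
    fields = line.split()
    if len(fields) < 2:
        return None
    try:
        similarity = float(fields[-1])
    except ValueError:
        return None
    smiles = fields[0]
    # Anything between the SMILES and the similarity is treated as the ID.
    mol_id = " ".join(fields[1:-1])
    return smiles, mol_id, similarity


# Hypothetical output line, for illustration only.
example = "CCOC(=O)c1ccccc1\tAGG0001 0.87"
print(parse_fastsearch_line(example))
```

Taking the first field as the SMILES and the last as the similarity means an ID that itself contains spaces is still recovered intact.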

The final part of the script categorises compounds based on XLogP and similarity.
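The categorisation can be sketched as a small function using the cut-offs quoted earlier (LogP >3 and similarity >0.85). The category labels here are hypothetical and are not necessarily those written to the Agg Score column by the script.

```python
LOGP_CUTOFF = 3.0   # calculated LogP above this is flagged
SIM_CUTOFF = 0.85   # similarity to a known aggregator above this is flagged


def aggregation_category(xlogp, similarity):
    """Return an illustrative category label for one compound."""
    high_logp = xlogp > LOGP_CUTOFF
    similar = similarity > SIM_CUTOFF
    if high_logp and similar:
        return "High LogP and similar to known aggregator"
    if similar:
        return "Similar to known aggregator"
    if high_logp:
        return "High LogP"
    return "Not flagged"


# Made-up values for three compounds.
for xlogp, sim in [(4.2, 0.91), (1.5, 0.90), (2.0, 0.40)]:
    print(xlogp, sim, aggregation_category(xlogp, sim))
```

Keeping the cut-offs as named constants makes it easy to experiment with other thresholds.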

The Vortex Script

The Result

The result is shown below. The workspace now contains the calculated LogP (xLogP); the structure, ID and Tanimoto coefficient of the most similar known aggregator (sim SMILES, ID, SimSore); and the aggregation category (Agg Score).

The script can be downloaded from here 

Update

A profile of the physicochemical properties (HBD, HBA, PSA, HAC, LogP, LogD, MWt, RBC) was generated using an AppleScript that uses evaluate from ChemAxon to calculate the physicochemical properties and Aabel to construct the histograms. I also used it to determine pKa in order to identify acidic or basic groups and categorised the aggregators accordingly. In addition, I calculated the fraction of aromatic atoms (number of aromatic atoms/number of heavy atoms). I’ve also included npr1 (normalized ratio of principal moments of inertia) as described by Sauer WH, Schwarz MK (2003) Molecular shape diversity of combinatorial libraries: A prerequisite for broad bioactivity. J Chem Inf Comput Sci 43:987–1003. DOI. This was calculated using MOE. One very striking feature of this analysis is the lack of ionisable groups observed in the aggregator compounds. The majority of aggregators do have a calculated LogP >3, but because of the lack of ionisable groups it might be better to use a cut-off of LogD >3.

An alternative script that uses LogD can be downloaded from here 

Page Updated 5 November 2015
