Selecting random clusters from a large dataset in Vortex

When selecting from large datasets, it is worth noting that as datasets grow, a simple random selection is often the best (and quickest) choice.

Worth reading: "Relationships between Molecular Complexity, Biological Activity, and Structural Diversity", http://pubs.acs.org/doi/abs/10.1021/ci0503558

"None of the diversity selection methods studied, namely OptiSim, divisive K-means clustering, and self-organizing maps, yielded subsets covering the activity space of the IC50 summary data set better than subsets selected randomly."

I’ve previously described a random selection script: https://macinchem.org/2023/03/11/vortex-script-to-make-a-random-selection/ However, when selecting molecules for a screening collection, you might want a few close analogues of each randomly selected molecule to give an early indication of SAR.

This script first makes a user-defined random selection, and then selects the closest analogues for each molecule in that random selection.

The first dialog asks the user to choose the number of molecules to select at random, and these are then selected. The second dialog asks how many molecules should be in each cluster, and the script then performs a similarity search for each of the randomly selected compounds. The results are shown below; the Cluster Number refers to the row number of the initially selected compound in the table (these rows are highlighted).
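The two steps described above can be sketched in plain Python. This is not the Vortex script itself and it does not use the Vortex API. It is a minimal illustration that assumes each molecule's fingerprint is already available as a set of on-bit indices and that similarity is measured with the Tanimoto coefficient. The names `tanimoto`, `select_random_clusters`, `n_seeds` and `n_analogues` are hypothetical, and `n_analogues` is assumed to count the analogues picked per seed, not including the seed itself.

```python
import random


def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints stored as sets of on-bits."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / float(len(fp_a) + len(fp_b) - shared)


def select_random_clusters(fingerprints, n_seeds, n_analogues, rng=None):
    """Pick random seed rows, then the nearest unassigned analogues of each.

    Returns (seeds, assignments), where assignments maps a row index to its
    cluster number. As in the post, the cluster number is taken to be the
    row index of the seed compound (an assumption about the exact labelling).
    """
    rng = rng or random.Random()

    # Step 1: simple random selection of seed rows.
    seeds = rng.sample(range(len(fingerprints)), n_seeds)
    assignments = dict((seed, seed) for seed in seeds)

    # Step 2: similarity search around each seed.
    for seed in seeds:
        scored = []
        for row, fp in enumerate(fingerprints):
            if row in assignments:
                continue  # skip seeds and rows already placed in a cluster
            scored.append((tanimoto(fingerprints[seed], fp), row))
        # Most similar first; ties are broken by the lower row index.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        for _, row in scored[:n_analogues]:
            assignments[row] = seed

    return seeds, assignments
```

Calling `select_random_clusters(fingerprints, 100, 4)` would, under these assumptions, return 100 seed rows plus up to four analogues for each. Every row is compared against every seed, so the cost grows with the number of rows times the number of seeds. For very large tables a dedicated similarity search would be the more practical choice.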

The script can be downloaded here.
