Vortex script to count identical structures in two datasets

Sometimes I have two datasets and I just want to know the overlap of identical structures. This script counts the number of identical structures by comparing InChIKeys. We start by reading two files into separate workspaces.

The next part of the script generates the InChIKey for each molecule in both workspaces. We then check for duplicates, first within each table and then between the two tables. A new workspace is then generated with the results, displayed as a matrix.
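The counting step can be sketched in plain Python, independent of Vortex. This is a minimal illustration, not the downloadable script: it assumes the InChIKeys have already been generated and are available as lists of strings, and the function name `count_overlap` and the placeholder keys are made up for the example.

```python
def count_overlap(keys_a, keys_b):
    """Return (unique_a, unique_b, shared) for two lists of InChIKey strings."""
    # Converting to a set removes duplicates within each dataset.
    unique_a = set(keys_a)
    unique_b = set(keys_b)
    # The intersection holds the keys that occur in both datasets.
    shared = unique_a & unique_b
    return len(unique_a), len(unique_b), len(shared)


# Placeholder strings standing in for real InChIKeys (assumption for the example).
published = ["KEY-1", "KEY-2", "KEY-2", "KEY-3"]  # KEY-2 is duplicated
library = ["KEY-3", "KEY-4", "KEY-5"]

counts = count_overlap(published, library)
print(counts)
```

Because the comparison is done on sets, a structure that appears several times in one file is only counted once, both in the per-dataset totals and in the overlap.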

The value in the top left of the matrix (1137) is the number of unique structures in “PublishedFragments”, and the value in the bottom right (1500) is the number of unique structures in “DiverseFragmentLibrary”. For “PublishedFragments” this is less than the number of structures in the workspace, because that file contains a number of duplicate structures. The value of 120 is the number of identical structures shared by the two datasets.
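To make the layout of the matrix concrete, here is a small plain-Python sketch that builds the same kind of table from lists of keys. It is an illustration under assumptions rather than the Vortex code itself: the helper names `overlap_matrix` and `format_matrix` are hypothetical, the keys are placeholders, and it assumes the shared count sits in the off-diagonal cells.

```python
def overlap_matrix(key_lists):
    """Build a square matrix of shared-key counts between datasets."""
    sets = [set(keys) for keys in key_lists]
    # Cell [i][j] is the number of keys common to dataset i and dataset j.
    # On the diagonal this is simply the number of unique keys in dataset i.
    return [[len(a & b) for b in sets] for a in sets]


def format_matrix(names, matrix):
    """Return the matrix as text, one labelled row per dataset."""
    width = max(len(name) for name in names)
    lines = []
    for name, row in zip(names, matrix):
        cells = " ".join("%6d" % value for value in row)
        lines.append("%s %s" % (name.ljust(width), cells))
    return "\n".join(lines)


names = ["PublishedFragments", "DiverseFragmentLibrary"]
# Placeholder keys (assumption); the first list contains one duplicate.
key_lists = [["K1", "K2", "K2", "K3"], ["K3", "K4", "K5"]]

matrix = overlap_matrix(key_lists)
# Rows minus unique keys gives the number of duplicate entries per dataset.
duplicates = [len(keys) - matrix[i][i] for i, keys in enumerate(key_lists)]

print(format_matrix(names, matrix))
print(duplicates)
```

The `duplicates` list shows why a diagonal value can be smaller than the row count of its workspace: the difference is the number of repeated structures in that file.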

The Vortex Script

The script can be downloaded here 


Last updated 7 March 2018
