UniChem is a new web resource provided by the EBI. It is a ‘Unified Chemical Identifier’ system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between databases. UniChem currently contains data from 21 different data sources:

  • ChEMBL
  • DrugBank
  • PDBe (Protein Data Bank Europe)
  • International Union of Basic and Clinical Pharmacology
  • PubChem (‘Drugs of the Future’ subset)
  • KEGG (Kyoto Encyclopedia of Genes and Genomes) Ligand
  • ChEBI (Chemical Entities of Biological Interest)
  • NIH Clinical Collection
  • ZINC
  • eMolecules
  • IBM strategic IP insight platform and the National Institutes of Health
  • Gene Expression Atlas
  • IBM strategic IP insight platform and the National Institutes of Health
  • FDA/USP Substance Registration System (SRS)
  • SureChem
  • PharmGKB
  • Human Metabolome Database (HMDB)
  • Selleck
  • PubChem (‘Thomson Pharma’ subset)
  • PubChem Compounds
  • Mcule

UniChem’s primary function is to maintain cross-references between EBI chemistry resources. These include primary chemistry resources (ChEMBL, ChEBI and PDBeChem) and other resources where the main focus is not small molecules but which may nevertheless contain some small molecule information (e.g. Gene Expression Atlas). When I last looked, UniChem contained 62,187,830 structures.

Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI.

Searching uses either the source compound ID, the InChI or the InChIKey, and has the following format:

More details can be found here.
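As a rough illustration of the lookup format, the sketch below builds query URLs for an InChIKey search and for a source compound ID search. The base URL and the path layout (`/inchikey/<key>` and `/src_compound_id/<id>/<src_id>`) are assumptions based on the UniChem REST interface of the time and should be checked against the current UniChem documentation. The function names are hypothetical, and no network request is made.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2 / Jython

# ASSUMPTION: base URL of the UniChem REST service; verify before use.
UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest"


def inchikey_url(inchikey):
    """Build a lookup URL for a standard InChIKey (assumed path layout)."""
    return "%s/inchikey/%s" % (UNICHEM_REST, quote(inchikey))


def src_compound_url(compound_id, src_id):
    """Build a lookup URL for a compound ID within one data source.

    src_id is the numeric identifier UniChem assigns to a data source
    (assumed path layout).
    """
    return "%s/src_compound_id/%s/%s" % (UNICHEM_REST, quote(compound_id), src_id)


if __name__ == "__main__":
    # Example key used purely for illustration.
    print(inchikey_url("AAOVKJBEBIDNHE-UHFFFAOYSA-N"))
    print(src_compound_url("CHEMBL12", 1))
```

Only the URLs are constructed here; fetching and decoding the response is a separate step.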

The InChIKey is a short, fixed-length character signature based on a hash code of the InChI string. By definition, hashing is a one-way conversion, so the original structure cannot be restored from the InChIKey, which allows confidential searching. It should be noted, however, that this script does search a public resource.
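Because the key has a fixed length and layout, it is easy to sanity-check a value before sending it anywhere. The sketch below tests only the shape of a standard InChIKey (14 letters, a hyphen, 10 letters, a hyphen, 1 letter, 27 characters in total). It does not confirm that the key corresponds to a real structure, and `looks_like_inchikey` is a hypothetical helper name.

```python
import re

# Standard InChIKey layout: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def looks_like_inchikey(text):
    """Return True if text has the shape of a standard InChIKey."""
    return bool(INCHIKEY_RE.match(text.strip()))


if __name__ == "__main__":
    print(looks_like_inchikey("AAOVKJBEBIDNHE-UHFFFAOYSA-N"))
    print(looks_like_inchikey("InChI=1S/CH4/h1H4"))
```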

Recently Sune Askjær contacted me about a Vortex script he had written, and he has kindly agreed for it to be described on this site. The script has a couple of dependencies. It requires jyson-1.0.2.jar, which can be downloaded from here. The jar file should be placed in [profile]/vortex/libs, from where it will be imported automatically.

It also requires the file unichem_linkpagetemplate.html to be placed in a folder called “unichem”, which needs to be created in your Vortex folder. The path to the template file will thus be

As shown in the image below.

If you now import an SDF file into a Vortex workspace and run the script, you are first presented with a dialog asking which data sources are of interest and whether to store the InChIKey in the table in the workspace. It should be noted that this script searches an external public resource; whilst only InChIKeys are used, you may want to check that this is acceptable for your institution.

Once you click “OK”, the script works through the table, generating InChIKeys and submitting them to the UniChem web service. A new column containing the InChIKey is created if requested, along with a column indicating how many data source hits were obtained, together with a link to a locally generated web page (stored in the temporary item folder) containing all the information.

If you now click on the hypertext link the corresponding page will open in your web browser.

Vortex Script

The script is shown below and a couple of points are worth commenting on. One of the issues when writing scripts like this is dealing with the different file path conventions used by Mac OS X, Linux and Windows. This can be resolved using the following, as suggested by Matt:
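One portable way to build such paths is Python's standard `os.path.join`, which inserts the correct separator for the platform the script is running on. The sketch below is a generic illustration and not necessarily the approach used in the script. `template_path` is a hypothetical helper, and the base folder is passed in as a plain string because the way the Vortex folder is located is not covered here.

```python
import os


def template_path(vortex_folder):
    """Return the path to the UniChem HTML template under the Vortex folder.

    vortex_folder is whatever base directory Vortex uses on this machine;
    it is supplied by the caller (hypothetical parameter).
    """
    return os.path.join(vortex_folder, "unichem", "unichem_linkpagetemplate.html")


if __name__ == "__main__":
    # The separator in the output depends on the operating system.
    print(template_path(os.path.join(os.path.expanduser("~"), "vortex")))
```

Avoiding hard-coded `/` or `\` characters means the same script can run unchanged on all three platforms.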

Most of the scripts I’ve created have little or no user interface. In this script the Java Swing GUI toolkit is used to create a sophisticated dialog box that allows the user to customise the input. The data from UniChem is parsed using the JSON library and then inserted into the HTML template file. It should be very easy to customise the template file if needed.
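The parse-then-fill step can be sketched roughly as follows. This is not the script's own code. It uses the standard `json` module and `string.Template` in place of jyson and the real template file. The field names `src_id` and `src_compound_id` are assumptions about the shape of the UniChem response, and the sample data is made up for illustration.

```python
import json
from string import Template

# Stand-in for unichem_linkpagetemplate.html (hypothetical, much simplified).
PAGE = Template("<html><body><h1>$inchikey</h1><ul>\n$rows</ul></body></html>")


def render_page(inchikey, response_text):
    """Parse a JSON response and return (hit_count, html_page).

    ASSUMPTION: the response is a JSON list of objects that each carry
    'src_id' and 'src_compound_id'. Values are inserted unescaped, which
    is acceptable only for trusted identifier strings.
    """
    hits = json.loads(response_text)
    rows = "".join(
        "<li>source %s: %s</li>\n" % (hit["src_id"], hit["src_compound_id"])
        for hit in hits
    )
    return len(hits), PAGE.substitute(inchikey=inchikey, rows=rows)


if __name__ == "__main__":
    # Made-up sample response, for illustration only.
    sample = ('[{"src_id": "1", "src_compound_id": "CHEMBL12"},'
              ' {"src_id": "2", "src_compound_id": "DB00829"}]')
    count, html = render_page("AAOVKJBEBIDNHE-UHFFFAOYSA-N", sample)
    print(count)
    print(html)
```

Keeping the page layout in a separate template means the appearance can be changed without touching the parsing code.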

Update

There was a minor bug in the above script: if there were blank cells in the workspace, the script would fail.

If you replace line 97, which reads:

with the following four lines:

the script now functions correctly.
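A guard of this kind can be sketched as follows. This is a generic illustration rather than the script's actual fix. `nonblank_rows` is a hypothetical helper, and the column is modelled as a plain Python list rather than a Vortex column object.

```python
def nonblank_rows(cells):
    """Yield (row_index, value) for cells that hold a non-empty value.

    Cells that are None, empty, or whitespace-only are skipped, so later
    steps never receive a blank structure.
    """
    for row, value in enumerate(cells):
        if value is None or not str(value).strip():
            continue
        yield row, str(value).strip()


if __name__ == "__main__":
    column = ["CCO", None, "   ", "c1ccccc1 "]
    for row, value in nonblank_rows(column):
        print(row, value)
```

Row indices are preserved, so results can still be written back to the correct rows of the table.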

The updated script, template file and installation instructions can be downloaded from here.

Page Updated 20 March 2014
