UniChem is a web resource provided by the EBI. It is a ‘Unified Chemical Identifier’ system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases. UniChem currently contains data from 27 different data sources and provides links to 108,941,995 structures.
Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI: http://dx.doi.org/10.1186/1758-2946-5-3
The previous script showed how to search using a ChEMBL ID. However, one of the attractions of UniChem is that you can search with any molecule identifier, provided you know the datasource it comes from. This script allows the user to start from any type of molecule identifier, specify its datasource, and run the search using a common web service.
The first part of the script populates a dialog box that allows the user to select both the column that contains the molecule ID and the datasource that is to be searched.

All RESTful queries are constructed using the following base url
https://www.ebi.ac.uk/unichem/rest/
Specific query urls are then constructed by adding a method name to this base url, followed by input data.
Input data may consist of three types
src_compound_id (the molecule identifier)
src_id (the number for the datasource, ChEMBL is 1)
InChIKey
If the column contained ChEMBL IDs, the URL would have the form
https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1
For other datasources we just need to change the src_id.
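The URL pattern described above can be sketched as a small helper. This is only an illustration: the function name build_src_compound_url is hypothetical, and percent-encoding the identifier is an assumption made here for safety (the original script inserts the identifier unchanged).

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2 / Jython, as used by Vortex

BASE_URL = 'https://www.ebi.ac.uk/unichem/rest/'

def build_src_compound_url(compound_id, src_id):
    """Hypothetical helper: base url + method name + input data."""
    # Assumption: identifiers are percent-encoded so that spaces or
    # slashes in an ID cannot break the URL path.
    encoded_id = quote(str(compound_id), safe='')
    return '%ssrc_compound_id/%s/%s' % (BASE_URL, encoded_id, src_id)

# A ChEMBL identifier with src_id 1 gives the URL shown in the text.
print(build_src_compound_url('CHEMBL1089', 1))
```

For a plain identifier such as CHEMBL1089 the encoding step changes nothing, so the result matches the example URL above.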
One slight complication is that whilst there are 27 datasources, the numbering for the datasources goes up to 31. This is because 13, 16, 19 and 30 are missing. So we can get the index position of the selected datasource
input_dbx = input_db.getSelectedIndex()
but this index does not necessarily correspond to the src_id required for the URL, so we need to have a list of datasource numbers
datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '14', '15', '17', '18', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '31']
and then use the index position of the datasource to get the src_id
chosen_db = datasourceNumbers[input_dbx]
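As a sketch of why the lookup list is needed, the two lists from the script can be paired up and checked against each other. The names SOURCES and src_id_for_index are hypothetical and are not part of the original script; the list contents are copied from it.

```python
datasources = ['ChEMBL', 'Drugbank', 'PDB', 'Guide to Pharm', 'Drugs of the Future',
               'Kegg Ligand', 'ChEBI', 'NIH Clinical', 'ZINC', 'eMolecules', 'IBM IP',
               'Gene Expression', 'NFDA Substance', 'SureChEMBL Patents', 'PharmGKB',
               'Human Metab', 'Selleck', 'Thomson Pharma', 'Pubchem', 'Mcule',
               'NMR shift DB', 'Networks', 'Toxicology Resource', 'Human Metab',
               'MolPort', 'Japanese Chemicals', 'BindingDB']
datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12',
                     '14', '15', '17', '18', '20', '21', '22', '23', '24', '25',
                     '26', '27', '28', '29', '31']

# Both lists must stay the same length, otherwise the index lookup is wrong.
assert len(datasources) == len(datasourceNumbers)

# Pair each display name with its src_id so the two cannot drift apart.
SOURCES = list(zip(datasources, datasourceNumbers))

def src_id_for_index(index):
    """Hypothetical helper: combo box index -> src_id string for the URL."""
    return SOURCES[index][1]

# Index 12 (the 13th entry) maps to src_id 14, because src_id 13 is unused.
print(SOURCES[12])
print(src_id_for_index(26))
```

A list of pairs is used rather than a dictionary keyed by name because the name 'Human Metab' appears twice in the original list.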
We can then construct the url
api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s' % (chembl_id, chosen_db)
The rest of the script is similar to the previous version.

The Vortex Script
#Flexible Unichem search to get all ID
#Authored by Chris Swain (http://www.macinchem.org)
#All rights reserved.
# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json
# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os
columnNames = vtable.getColumnNames()
datasources = ['ChEMBL', 'Drugbank', 'PDB', 'Guide to Pharm', 'Drugs of the Future', 'Kegg Ligand', 'ChEBI', 'NIH Clinical', 'ZINC', 'eMolecules', 'IBM IP', 'Gene Expression', 'NFDA Substance', 'SureChEMBL Patents', 'PharmGKB', 'Human Metab', 'Selleck', 'Thomson Pharma', 'Pubchem', 'Mcule', 'NMR shift DB', 'Networks', 'Toxicology Resource', 'Human Metab', 'MolPort', 'Japanese Chemicals', 'BindingDB']
datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '14', '15', '17', '18', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '31']
input_label = swing.JLabel("ID column (for input)")
input_cb = javax.swing.JComboBox(columnNames)
input_db_label = swing.JLabel("Datasource (for input)")
input_db = javax.swing.JComboBox(datasources)
panel = swing.JPanel()
layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb, 1, 0)
layout.fill(panel, input_db_label, 0, 2)
layout.fill(panel, input_db, 1, 2)
ret = vortex.showInDialog(panel, "Choose ID column and Datasource")
if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()
    input_dbx = input_db.getSelectedIndex()
    #vortex.alert(input_dbx)
    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        chosen_col = vtable.getColumn(input_idx )
        # Convert the combo box index into the UniChem src_id
        chosen_db = datasourceNumbers[input_dbx]
        #vortex.alert(chosen_db)
        #col names from here https://www.ebi.ac.uk/unichem/ucquery/listSources
        # Output columns keyed by src_id; the trailing comment is the list position
        cols = {
            '1': vtable.findColumnWithName('ChEMBL', 1), #1
            '2': vtable.findColumnWithName('Drugbank', 1), #2
            '3': vtable.findColumnWithName('PDB', 1), #3
            '4': vtable.findColumnWithName('Guide to Pharm', 1), #4
            '5': vtable.findColumnWithName('Drugs of the Future', 1), #5
            '6': vtable.findColumnWithName('Kegg Ligand', 1), #6
            '7': vtable.findColumnWithName('ChEBI', 1), #7
            '8': vtable.findColumnWithName('NIH Clinical', 1), #8
            '9': vtable.findColumnWithName('ZINC', 1), #9
            '10': vtable.findColumnWithName('eMolecules', 1), #10
            '11': vtable.findColumnWithName('IBM IP', 1), #11
            '12': vtable.findColumnWithName('Gene Expression', 1), #12
            '14': vtable.findColumnWithName('NFDA Substance', 1), #13
            '15': vtable.findColumnWithName('SureChEMBL Patents', 1), #14
            '17': vtable.findColumnWithName('PharmGKB', 1), #15
            '18': vtable.findColumnWithName('Human Metab', 1), #16
            '20': vtable.findColumnWithName('Selleck', 1), #17
            '21': vtable.findColumnWithName('Thomson Pharma', 1), #18
            '22': vtable.findColumnWithName('Pubchem', 1), #19
            '23': vtable.findColumnWithName('Mcule', 1), #20
            '24': vtable.findColumnWithName('NMR shift DB', 1), #21
            '25': vtable.findColumnWithName('Networks', 1), #22
            '26': vtable.findColumnWithName('Toxicology Resource', 1), #23
            '27': vtable.findColumnWithName('Human Metab', 1), #24
            '28': vtable.findColumnWithName('MolPort', 1), #25
            '29': vtable.findColumnWithName('Japanese Chemicals', 1), #26
            '31': vtable.findColumnWithName('BindingDB', 1), #27
        }
        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            # chembl_id holds whatever identifier type the chosen column contains
            chembl_id = chosen_col.getValueAsString(r)
            #vortex.alert(chembl_id)
            # "https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1"
            #Need to add chosen_db
            api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s' % (chembl_id, chosen_db)
            try:
                molecule_record = urllib2.urlopen(api_url).read()
            except urllib2.HTTPError:
                # Skip rows for which the service returns an error
                continue
            j = json.loads(molecule_record)
            # Write each returned identifier into the column for its datasource
            for entry in j:
                src_id = entry['src_id']
                if src_id in cols:
                    cols[src_id].setValueFromString(r, entry['src_compound_id'])
        vtable.fireTableStructureChanged()
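The way the script handles each response can be tried outside Vortex with a short sketch. It uses the standard json module instead of Jyson, the helper name extract_ids is hypothetical, and the sample response contains made-up placeholder values that only mimic the src_id / src_compound_id fields the script reads.

```python
import json

def extract_ids(response_text, wanted_src_ids):
    """Hypothetical helper: map src_id -> src_compound_id for wanted sources."""
    found = {}
    for entry in json.loads(response_text):
        src_id = str(entry['src_id'])
        # Ignore datasources that have no output column, as the script does.
        if src_id in wanted_src_ids:
            found[src_id] = entry['src_compound_id']
    return found

# Placeholder response, not real UniChem data.
sample = ('[{"src_id": "2", "src_compound_id": "EXAMPLE-A"}, '
          '{"src_id": "99", "src_compound_id": "EXAMPLE-B"}]')
print(extract_ids(sample, set(['1', '2', '3'])))
```

Only the entry for src_id 2 is kept here, because 99 is not among the wanted datasources.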
The script can be downloaded from here
Page Updated 15 February 2016