UniChem is a web resource provided by the EBI. It is a ‘Unified Chemical Identifier’ system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases. UniChem currently contains data from 27 different data sources and provides links to 108,941,995 structures.
Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI: http://dx.doi.org/10.1186/1758-2946-5-3
Whilst I’ve written about a script to search using InChIKeys, it is also possible to search using compound identifiers.
ChEMBL also provides a RESTful web service that can be used to retrieve data from the UniChem database programmatically.
All RESTful queries are constructed using the following base URL:
https://www.ebi.ac.uk/unichem/rest/
Specific query URLs are then constructed by adding a method name to this base URL, followed by the input data.
The input data can be of three types:
src_compound_id (the molecule identifier)
src_id (the number for the datasource, ChEMBL is 1)
InChIKey
Since the different datasources will have different molecule identifiers for the same molecule, it is important to have both the ID and the corresponding datasource.
Since we have the ChEMBL ID, our URL will have the form:
https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1
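As a minimal sketch of how such a URL is put together, the helper below joins the base URL, the method name, the identifier and the datasource number. The name `build_unichem_url` is hypothetical and is not part of the Vortex script. It is written as plain Python 3 for illustration, whereas the Vortex script itself runs under Jython.

```python
from urllib.parse import quote

BASE_URL = "https://www.ebi.ac.uk/unichem/rest/"


def build_unichem_url(src_compound_id, src_id=1):
    """Return the src_compound_id query URL for one identifier.

    src_id is the UniChem number of the datasource the identifier
    comes from (1 is ChEMBL).
    """
    # Percent-encode the identifier so that characters such as '/' or
    # spaces cannot break the URL path.
    encoded_id = quote(str(src_compound_id), safe="")
    return "%ssrc_compound_id/%s/%d" % (BASE_URL, encoded_id, int(src_id))


print(build_unichem_url("CHEMBL1089"))
```

Passing a different `src_id` along with an identifier from that datasource gives the corresponding query for a non-ChEMBL starting point.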
By default the data is returned in JSON format, as a list of key-value pairs giving the datasource number (src_id) and the compound ID in that datasource (src_compound_id).
#Data format
[{"src_id":"1","src_compound_id":"CHEMBL1089"},
{"src_id":"2","src_compound_id":"DB00780"},
{"src_id":"4","src_compound_id":"7266"},
{"src_id":"6","src_compound_id":"C07430"},
{"src_id":"7","src_compound_id":"8060"},
{"src_id":"8","src_compound_id":"SAM002589985"},
{"src_id":"10","src_compound_id":"1987170"},
{"src_id":"11","src_compound_id":"F484C6DCFFC08118224D7D07C06DD841"},
{"src_id":"14","src_compound_id":"O408N561GF"},
{"src_id":"15","src_compound_id":"SCHEMBL34335"},
{"src_id":"17","src_compound_id":"PA450903"},
{"src_id":"18","src_compound_id":"HMDB14918"},
{"src_id":"21","src_compound_id":"15297289"},
{"src_id":"22","src_compound_id":"3675"},
{"src_id":"23","src_compound_id":"MCULE-2911295500"},
{"src_id":"25","src_compound_id":"LSM-5928"},
{"src_id":"26","src_compound_id":"51-71-8"},
{"src_id":"29","src_compound_id":"J4.125D"},
{"src_id":"31","src_compound_id":"50105417"}]
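To illustrate how a response of this shape can be turned into something easier to query, here is a small sketch in plain Python 3. The function name `ids_by_source` is hypothetical, and the sample string is a shortened copy of the response above. Identifiers are collected into lists on the assumption that a datasource might return more than one identifier for a structure.

```python
import json

# Shortened copy of the response shown above (three of the entries).
sample = (
    '[{"src_id":"1","src_compound_id":"CHEMBL1089"},'
    '{"src_id":"2","src_compound_id":"DB00780"},'
    '{"src_id":"22","src_compound_id":"3675"}]'
)


def ids_by_source(payload):
    """Group the compound identifiers in a JSON response by src_id."""
    grouped = {}
    for entry in json.loads(payload):
        # src_id arrives as a string ("2", not 2), so the keys are strings.
        grouped.setdefault(entry["src_id"], []).append(entry["src_compound_id"])
    return grouped


result = ids_by_source(sample)
print(result["2"])
```

Because the src_id values are strings in the JSON, any lookup table keyed on datasource number needs string keys as well.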
The first part of the script asks the user to select the column that contains the ChEMBL ID, then creates the output columns. We then loop through the workspace, calling the web service for each ID, parsing the returned JSON and populating the workspace as shown below.
It should be straightforward to modify the script to search any of the datasources with the appropriate list of molecule identifiers.
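The same loop can be sketched outside Vortex for an arbitrary starting datasource. This is an illustration in plain Python 3, not the Vortex script: the names `fetch` and `cross_reference` are hypothetical, and it assumes the REST endpoint behaves as described in this post. The download function is passed in as a parameter so the loop can be tried without network access.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url):
    """Download one response body as text (assumes UTF-8 encoded JSON)."""
    return urlopen(url, timeout=30).read().decode("utf-8")


def cross_reference(compound_ids, src_id, fetch=fetch):
    """Map each identifier to a {src_id: src_compound_id} dictionary."""
    results = {}
    for compound_id in compound_ids:
        url = "https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s" % (
            compound_id, src_id)
        try:
            entries = json.loads(fetch(url))
        except (URLError, ValueError):
            # Skip identifiers the service cannot resolve or answers
            # with something that is not valid JSON.
            continue
        if not isinstance(entries, list):
            continue
        # Keeps the last identifier if a datasource appears more than once.
        results[compound_id] = dict(
            (e["src_id"], e["src_compound_id"]) for e in entries)
    return results
```

For example, `cross_reference(["DB00780"], 2)` would start from a DrugBank identifier (src_id 2 in the response shown earlier) rather than from a ChEMBL ID.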
The Vortex Script
#Use ChEMBLid to search using Unichem to get all data
#http://www.macinchem.org
#All rights reserved.

# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json

# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os

# Ask the user which column holds the ChEMBL IDs
input_label = swing.JLabel("ChEMBLid column (for input)")
input_cb = workspace.getColumnComboBox()
panel = swing.JPanel()
layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb, 1, 0)
ret = vortex.showInDialog(panel, "Choose ChEMBLid column")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()
    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        chosen_col = vtable.getColumn(input_idx - 1)

        # One output column per datasource, keyed by src_id (as a string)
        #col names from here https://www.ebi.ac.uk/unichem/ucquery/listSources
        cols = {
            '2': vtable.findColumnWithName('Drugbank', 1), #2
            '3': vtable.findColumnWithName('PDB', 1), #3
            '4': vtable.findColumnWithName('Guide to Pharm', 1), #4
            '5': vtable.findColumnWithName('Drugs of the Future', 1), #5
            '6': vtable.findColumnWithName('Kegg Ligand', 1), #6
            '7': vtable.findColumnWithName('ChEBI', 1), #7
            '8': vtable.findColumnWithName('NIH Clinical', 1), #8
            '9': vtable.findColumnWithName('ZINC', 1), #9
            '10': vtable.findColumnWithName('eMolecules', 1), #10
            '11': vtable.findColumnWithName('IBM IP', 1), #11
            '12': vtable.findColumnWithName('Gene Expression', 1), #12
            '14': vtable.findColumnWithName('NFDA Substance', 1), #14
            '15': vtable.findColumnWithName('SureChEMBL Patents', 1), #15
            '17': vtable.findColumnWithName('PharmGKB', 1), #17
            '18': vtable.findColumnWithName('Human Metab', 1), #18
            '20': vtable.findColumnWithName('Selleck', 1), #20
            '21': vtable.findColumnWithName('Thomson Pharma', 1), #21
            '22': vtable.findColumnWithName('Pubchem', 1), #22
            '23': vtable.findColumnWithName('Mcule', 1), #23
            '24': vtable.findColumnWithName('NMR shift DB', 1), #24
            '25': vtable.findColumnWithName('Networks', 1), #25
            '26': vtable.findColumnWithName('Toxicology Resource', 1), #26
            # 27 was also labelled 'Human Metab', which made it share a column with 18
            '27': vtable.findColumnWithName('Recon', 1), #27
            '28': vtable.findColumnWithName('MolPort', 1), #28
            '29': vtable.findColumnWithName('Japanese Chemicals', 1), #29
            '31': vtable.findColumnWithName('BindingDB', 1), #31
        }

        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            chembl_id = chosen_col.getValueAsString(r)
            # "https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1"
            api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/1' % chembl_id
            try:
                molecule_record = urllib2.urlopen(api_url).read()
            except urllib2.HTTPError:
                # Skip rows whose ID the service cannot resolve
                continue
            j = json.loads(molecule_record)
            # Write each returned identifier into the column for its datasource
            for entry in j:
                src_id = entry['src_id']
                if src_id in cols:
                    cols[src_id].setValueFromString(r, entry['src_compound_id'])

        vtable.fireTableStructureChanged()
The script can be downloaded from here
Page Updated 15 February 2016