UniChem is a web resource provided by the EBI. It is a ‘Unified Chemical Identifier’ system, designed to assist in the rapid cross-referencing of chemical structures, and their identifiers, between multiple databases. UniChem currently contains data from 27 different data sources and provides links to 108,941,995 structures.
Chambers, J., Davies, M., Gaulton, A., Hersey, A., Velankar, S., Petryszak, R., Hastings, J., Bellis, L., McGlinchey, S. and Overington, J.P. UniChem: A Unified Chemical Structure Cross-Referencing and Identifier Tracking System. Journal of Cheminformatics 2013, 5:3 (January 2013). DOI: http://dx.doi.org/10.1186/1758-2946-5-3
The previous script showed how to search using ChEMBLID. However, one of the attractions of UniChem is that you can search with any molecule identifier if you know the corresponding datasource. This script allows the user to use any molecule identifier and then search a specified datasource using a common web service.
The first part of the script populates a dialog box that allows the user to select both the column that contains the molecule ID and the datasource that is to be searched.
All RESTful queries are constructed using the following base URL:
    https://www.ebi.ac.uk/unichem/rest/
Specific query URLs are then constructed by adding a method name to this base URL, followed by input data.
Input data may be of three types:
    src_compound_id (the molecule identifier)
    src_id (the number for the datasource, ChEMBL is 1)
    InChIKey
If the column contained ChEMBLID, the URL would have the form
    https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1
For other datasources we just need to change the src_id.
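As a minimal sketch of the URL scheme described above, the helper below joins the base URL, a method name and the input data. The function name `build_query_url` is hypothetical, and percent-encoding the inputs with `quote` is a precaution assumed here, not something the original script does.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2 / Jython, as used inside Vortex

BASE_URL = 'https://www.ebi.ac.uk/unichem/rest/'


def build_query_url(method, *inputs):
    """Return BASE_URL + method + '/' + each input, separated by '/'.

    Hypothetical helper: each input is percent-encoded so that an
    identifier containing a space or slash cannot break the path.
    """
    parts = [method] + [quote(str(value), safe='') for value in inputs]
    return BASE_URL + '/'.join(parts)


# The ChEMBL example from the text: identifier CHEMBL1089, src_id 1.
print(build_query_url('src_compound_id', 'CHEMBL1089', 1))
```

Only the last argument changes when the identifier comes from a different datasource, for example `build_query_url('src_compound_id', some_id, 2)` for a Drugbank identifier.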
One slight complication is that whilst there are 27 datasources, the numbering for the datasources goes up to 31. This is because 13, 16, 19 and 30 are missing. So whilst we can get the index position of the selected datasource with
    input_dbx = input_db.getSelectedIndex()
this might not correspond to the number of the src_id required for the URL, so we need to have a list of datasource numbers
    datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '14', '15', '17', '18', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '31']
and then use the index position of the datasource to get the src_id
    chosen_db = datasourceNumbers[input_dbx]
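The index-to-src_id lookup can be checked in isolation. The sketch below rebuilds the same list from the four missing numbers named in the text, which makes the gaps explicit. The names `MISSING_SRC_IDS` and `src_id_for_index` are hypothetical and do not appear in the script.

```python
# src_id values that UniChem skips, as listed in the text above.
MISSING_SRC_IDS = (13, 16, 19, 30)

# Same content as the hand-written list: '1'..'31' without the gaps.
datasourceNumbers = [str(n) for n in range(1, 32) if n not in MISSING_SRC_IDS]


def src_id_for_index(index):
    """Map a combo box index (0-based) to the UniChem src_id string."""
    return datasourceNumbers[index]


# Index 12 is the 13th entry in the dialog, but its src_id is '14'
# because 13 is one of the missing numbers.
print(len(datasourceNumbers), src_id_for_index(12))
```

Up to index 11 the src_id is simply the index plus one; after the first gap the two drift apart, which is why the script needs the explicit list.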
We can then construct the URL
    api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s' % (chembl_id, chosen_db)
The rest of the script is similar to the previous version.
The Vortex Script
    #Flexible Unichem search to get all ID
    #Authored by Chris Swain (http://www.macinchem.org)
    #All rights reserved.

    # Python imports
    import urllib2
    import urllib
    from com.xhaus.jyson import JysonCodec as json

    # Vortex imports
    import com.dotmatics.vortex.util.Util as Util
    import com.dotmatics.vortex.mol2img.jni.genImage as genImage
    import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
    import jarray
    import binascii
    import string
    import os

    columnNames = vtable.getColumnNames()

    # Display names for the dialog; position i pairs with datasourceNumbers[i]
    datasources = ['ChEMBL', 'Drugbank', 'PDB', 'Guide to Pharm', 'Drugs of the Future', 'Kegg Ligand', 'ChEBI', 'NIH Clinical', 'ZINC', 'eMolecules', 'IBM IP', 'Gene Expression', 'NFDA Substance', 'SureChEMBL Patents', 'PharmGKB', 'Human Metab', 'Selleck', 'Thomson Pharma', 'Pubchem', 'Mcule', 'NMR shift DB', 'Networks', 'Toxicology Resource', 'Human Metab', 'MolPort', 'Japanese Chemicals', 'BindingDB']
    datasourceNumbers = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '14', '15', '17', '18', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '31']

    # Build the dialog: one combo box for the ID column, one for the datasource
    input_label = swing.JLabel("ID column (for input)")
    input_cb = javax.swing.JComboBox(columnNames)
    input_db_label = swing.JLabel("Datasource (for input)")
    input_db = javax.swing.JComboBox(datasources)

    panel = swing.JPanel()
    layout.fill(panel, input_label, 0, 0)
    layout.fill(panel, input_cb, 1, 0)
    layout.fill(panel, input_db_label, 0, 2)
    layout.fill(panel, input_db, 1, 2)

    ret = vortex.showInDialog(panel, "Choose ID column and Datasource")

    if ret == vortex.OK:
        input_idx = input_cb.getSelectedIndex()
        input_dbx = input_db.getSelectedIndex()
        #vortex.alert(input_dbx)
        if input_idx == 0:
            vortex.alert("you must choose a column")
        else:
            chosen_col = vtable.getColumn(input_idx)
            # Translate the dialog index into the UniChem src_id
            chosen_db = datasourceNumbers[input_dbx]
            #vortex.alert(chosen_db)

            #col names from here https://www.ebi.ac.uk/unichem/ucquery/listSources
            # Keys are UniChem src_id values; trailing comments give the list position
            cols = {
                '1': vtable.findColumnWithName('ChEMBL', 1), #1
                '2': vtable.findColumnWithName('Drugbank', 1), #2
                '3': vtable.findColumnWithName('PDB', 1), #3
                '4': vtable.findColumnWithName('Guide to Pharm', 1), #4
                '5': vtable.findColumnWithName('Drugs of the Future', 1), #5
                '6': vtable.findColumnWithName('Kegg Ligand', 1), #6
                '7': vtable.findColumnWithName('ChEBI', 1), #7
                '8': vtable.findColumnWithName('NIH Clinical', 1), #8
                '9': vtable.findColumnWithName('ZINC', 1), #9
                '10': vtable.findColumnWithName('eMolecules', 1), #10
                '11': vtable.findColumnWithName('IBM IP', 1), #11
                '12': vtable.findColumnWithName('Gene Expression', 1), #12
                '14': vtable.findColumnWithName('NFDA Substance', 1), #13
                '15': vtable.findColumnWithName('SureChEMBL Patents', 1), #14
                '17': vtable.findColumnWithName('PharmGKB', 1), #15
                '18': vtable.findColumnWithName('Human Metab', 1), #16
                '20': vtable.findColumnWithName('Selleck', 1), #17
                '21': vtable.findColumnWithName('Thomson Pharma', 1), #18
                '22': vtable.findColumnWithName('Pubchem', 1), #19
                '23': vtable.findColumnWithName('Mcule', 1), #20
                '24': vtable.findColumnWithName('NMR shift DB', 1), #21
                '25': vtable.findColumnWithName('Networks', 1), #22
                '26': vtable.findColumnWithName('Toxicology Resource', 1), #23
                '27': vtable.findColumnWithName('Human Metab', 1), #24
                '28': vtable.findColumnWithName('MolPort', 1), #25
                '29': vtable.findColumnWithName('Japanese Chemicals', 1), #26
                '31': vtable.findColumnWithName('BindingDB', 1), #27
            }

            rows = vtable.getRealRowCount()

            for r in range(0, int(rows)):
                # Despite the name, this holds whichever identifier type was chosen
                chembl_id = chosen_col.getValueAsString(r)
                #vortex.alert(chembl_id)
                # "https://www.ebi.ac.uk/unichem/rest/src_compound_id/CHEMBL1089/1"
                #Need to add chosen_db
                api_url = 'https://www.ebi.ac.uk/unichem/rest/src_compound_id/%s/%s' % (chembl_id, chosen_db)
                try:
                    molecule_record = urllib2.urlopen(api_url).read()
                except urllib2.HTTPError:
                    # Skip rows for which the lookup fails
                    continue
                j = json.loads(molecule_record)
                # Write each returned identifier into the column for its datasource
                for entry in j:
                    src_id = entry['src_id']
                    if src_id in cols:
                        cols[src_id].setValueFromString(r, entry['src_compound_id'])

            vtable.fireTableStructureChanged()
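The final loop of the script, which copies each returned identifier into the column for its datasource, can be tried outside Vortex. This is only a sketch: it uses the standard `json` module in place of `JysonCodec`, the function name `extract_identifiers` is hypothetical, and the sample response is made up for illustration. It assumes, as the script's dictionary keys imply, that the service returns a JSON list of objects whose `src_id` is a string.

```python
import json


def extract_identifiers(response_text, wanted_src_ids):
    """Return {src_id: src_compound_id} for the datasources we track.

    Mirrors the script's inner loop: entries whose src_id is not in
    wanted_src_ids are ignored, just as the script ignores any src_id
    that has no column in its cols dictionary.
    """
    found = {}
    for entry in json.loads(response_text):
        src_id = entry['src_id']
        if src_id in wanted_src_ids:
            found[src_id] = entry['src_compound_id']
    return found


# Made-up response for illustration only; the identifiers are placeholders,
# not real UniChem data.
sample = (
    '[{"src_id": "1", "src_compound_id": "EXAMPLE-A"},'
    ' {"src_id": "2", "src_compound_id": "EXAMPLE-B"},'
    ' {"src_id": "99", "src_compound_id": "EXAMPLE-C"}]'
)

print(extract_identifiers(sample, ['1', '2', '31']))
```

In the Vortex script the returned values would be written with `setValueFromString` instead of being collected in a dictionary.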
The script can be downloaded from here
Page Updated 15 February 2016