ChEMBL is a manually curated chemical database of bioactive molecules. It is maintained by the European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL), based at the Wellcome Trust Genome Campus, Hinxton, UK. The database currently contains over 1.4 million unique structures with associated activity data at 10,579 different targets. It also acts as a repository for Open Access primary screening and medicinal chemistry data directed at neglected diseases.
Whilst the database can be downloaded, the data can also be accessed via a web interface (shown below) and a series of web services.
The currently available web services are:
General Methods
Check API status
Compound Methods
Get compound by ChEMBLID
Get compound by Standard InChIKey
Get list of compounds matching Canonical SMILES
Get list of compounds matching Canonical SMILES using HTTP POST
Get list of compounds containing the substructure represented by a given Canonical SMILES
Get list of compounds containing the substructure represented by a given Canonical SMILES using HTTP POST
Get list of compounds similar to the one represented by a given Canonical SMILES, at a given cutoff percentage
Get list of compounds similar to the one represented by a given Canonical SMILES, at a given cutoff percentage using HTTP POST
Get image of a ChEMBL compound by ChEMBLID
Get individual compound bioactivities
Get alternative compound forms (e.g. parent and salts) of a compound
Get mechanism of action details for compound (where compound is a drug)
Target Methods
Get all targets
Get target by ChEMBLID
Get target by UniProt Accession Identifier
Get individual target bioactivities
Get approved drugs for target
Assay Methods
Get assay by ChEMBLID
Get individual assay bioactivities
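Only four of these services are used in the scripts that follow. As a minimal sketch, the request URLs for those four can be built from the patterns that appear in the scripts, where a .json suffix requests JSON.

The helper names and BASE_URL are hypothetical. The sketch is plain Python 3 rather than the Jython that Vortex runs. It also uses https throughout, whereas the original scripts mix http and https.

```python
# Hypothetical helpers: the URL patterns are copied from the scripts in this article.
BASE_URL = "https://www.ebi.ac.uk/chemblws"


def target_by_uniprot_url(uniprot_id):
    """'Get target by UniProt Accession Identifier'."""
    # strip() guards against stray whitespace in table cells (an addition here)
    return "%s/targets/uniprot/%s.json" % (BASE_URL, uniprot_id.strip())


def target_bioactivities_url(target_chembl_id):
    """'Get individual target bioactivities'."""
    return "%s/targets/%s/bioactivities.json" % (BASE_URL, target_chembl_id.strip())


def compound_url(compound_chembl_id):
    """'Get compound by ChEMBLID'."""
    return "%s/compounds/%s.json" % (BASE_URL, compound_chembl_id.strip())


def compound_bioactivities_url(compound_chembl_id):
    """'Get individual compound bioactivities'."""
    return "%s/compounds/%s/bioactivities.json" % (BASE_URL, compound_chembl_id.strip())
```

For example, `target_by_uniprot_url("P26769")` gives the URL used to fetch the adenylate cyclase target record shown later in this article.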
We can use these web services to access ChEMBL data from within Vortex; the following scripts illustrate some of the ways to do this.
UniprotID to ChEMBL target information.
When reading interesting results in the literature it is often useful to find out more about a particular target. This script takes a UniProt ID and queries ChEMBL with the “Get target by UniProt Accession Identifier” web service to bring back the target information. Because we can’t be sure what the column containing the UniProt IDs will be titled (e.g. Uniprot ID, uniprot_id, UNIPROTid etc.), the first part of the script pops up a dialog asking the user to select the desired column.
We then construct the query string for the appropriate web service and pull back the data. There is a little error trapping, because some UniProt IDs may not be in ChEMBL.
```python
mystr = "http://www.ebi.ac.uk/chemblws/targets/uniprot/" + uniprotID + ".json"
```
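The “little error trapping” treats an HTTP error as “not in ChEMBL” and moves on to the next row. Below is a minimal sketch of the same idea in plain Python 3.

The download function is passed in as an argument, so it can be swapped out for testing. `fetch_json_or_none` and `_read_url` are hypothetical names; they are not part of Vortex or ChEMBL.

```python
import json
import urllib.error
import urllib.request


def _read_url(url):
    """Default fetcher: return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def fetch_json_or_none(url, fetch=_read_url):
    """Return the decoded JSON at url, or None if the server answers with an HTTP error.

    This mirrors the try/except in the script: an ID that ChEMBL does not know
    about produces an HTTP error, and the caller simply skips that row.
    """
    try:
        body = fetch(url)
    except urllib.error.HTTPError:
        return None
    return json.loads(body.decode("utf-8"))
```

In a loop over the table rows, `data = fetch_json_or_none(url)` followed by `if data is None: continue` reproduces the behaviour of the script.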
The data is returned in JSON format, as shown below.
```json
{
  "target": {
    "targetType": "PROTEIN FAMILY",
    "chemblId": "CHEMBL2095179",
    "geneNames": "Unspecified",
    "description": "Adenylate cyclase",
    "compoundCount": 75,
    "bioactivityCount": 137,
    "proteinAccession": "P26769",
    "synonyms": "ATP pyrophosphate-lyase 2,Adenylate cyclase type II,Adenylyl cyclase 2,4.6.1.1,Adcy2,Adenylate cyclase type 2",
    "organism": "Rattus norvegicus",
    "preferredName": "Adenylate cyclase"
  }
}
```
The last part of the script parses the data and populates the table.
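The script repeats a near-identical read/find-column/write block for each field. Keeping the mapping from JSON keys to table columns in one list makes it easier to add another field, such as organism.

Below is a minimal sketch of that parsing step in plain Python 3, checked against the sample response above. `TARGET_COLUMNS` and `target_columns` are hypothetical names.

```python
import json

# (output column, key inside the "target" object); the column names are
# the ones the Vortex script writes to.
TARGET_COLUMNS = [
    ("ChEMBLID", "chemblId"),
    ("Num Compds", "compoundCount"),
    ("BioactivityCount", "bioactivityCount"),
    ("target_Type", "targetType"),
    ("preferred_Name", "preferredName"),
]


def target_columns(json_text):
    """Map a 'Get target by UniProt accession' response to {column: string value}."""
    target = json.loads(json_text)["target"]
    # str() matches the script, which stores every value as text
    return {column: str(target[key]) for column, key in TARGET_COLUMNS}
```

Inside Vortex, each (column, value) pair would then be written with `vtable.findColumnWithName(column, 1).setValueFromString(r, value)`, exactly as the script does field by field.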
The Vortex Script
```python
# ChEMBL Target Search
# Search using uniprotID
# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json
# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os

# vortex, vtable, workspace, swing and layout are not imported here;
# they are supplied by the Vortex scripting environment.

# Ask the user which column holds the UniProt IDs
input_label = swing.JLabel("Uniprot column (for input)")
input_cb = workspace.getColumnComboBox()
panel = swing.JPanel()
layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb, 1, 0)
ret = vortex.showInDialog(panel, "Choose uniprot column")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()
    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        # Entry 0 of the combo box is blank, so column indices are offset by one
        col = vtable.getColumn(input_idx - 1)
        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            uniprotID = col.getValueAsString(r)
            mystr = "http://www.ebi.ac.uk/chemblws/targets/uniprot/" + uniprotID + ".json"
            try:
                myreturn = urllib2.urlopen(mystr).read()
            except urllib2.HTTPError:
                # some not found, e.g. "Target not found for accession:P55957"
                continue
            # if myreturn.find('Target not found') != -1:
            j = json.loads(myreturn)

            # Write each field into its own column; the second argument (1)
            # appears to create the column if it does not already exist
            TheData = str(j['target']['chemblId'])
            colChemblID = vtable.findColumnWithName('ChEMBLID', 1)
            colChemblID.setValueFromString(r, TheData)

            TheData = str(j['target']['compoundCount'])
            colCompounds = vtable.findColumnWithName('Num Compds', 1)
            colCompounds.setValueFromString(r, TheData)

            TheData = str(j['target']['bioactivityCount'])
            colBio = vtable.findColumnWithName('BioactivityCount', 1)
            colBio.setValueFromString(r, TheData)

            TheData = str(j['target']['targetType'])
            colType = vtable.findColumnWithName('target_Type', 1)
            colType.setValueFromString(r, TheData)

            TheData = str(j['target']['preferredName'])
            colName = vtable.findColumnWithName('preferred_Name', 1)
            colName.setValueFromString(r, TheData)

        # Tell Vortex the table layout has changed so new columns are shown
        vtable.fireTableStructureChanged()
```
Getting ChEMBL Target Data
After pulling back the target information associated with a particular Uniprot ID we may want to find out more about the compounds that have been tested against this target. The table now contains the ChEMBLID (highlighted in red) for the target and we can use this to interrogate ChEMBL to find all molecules that have been tested against this target.
To capture the desired ChEMBL ID we need to know both the column and the particular cell containing the ID. We do this by running the script as a context (right-click) action: when the user right-clicks on a cell, the script receives the row of that cell as cell_row and reads its contents.
```python
taskID = col.getValueAsString(cell_row)
```
We also capture the text in the “preferred_Name” column to use as the label for the new workspace that will contain the results.
```python
col1 = vtable.findColumnWithName('preferred_Name', 0)
TableName = col1.getValueAsString(cell_row)
```
We then construct the URL needed to access the web service and then pull back the data.
```python
mystr = "https://www.ebi.ac.uk/chemblws/targets/" + taskID + "/bioactivities.json"
```
The data, in JSON format, looks like this:
```json
{
  "bioactivities": [
    {
      "units": "nM",
      "reference": "Bioorg. Med. Chem. Lett., (2010) 20:19:5811",
      "target_chemblid": "CHEMBL2111430",
      "target_name": "MIF/CD74 (Macrophage migration inhibitory factor and HLA-DR antigens-associated invariant chain)",
      "bioactivity_type": "IC50",
      "ingredient_cmpd_chemblid": "CHEMBL1257355",
      "value": "7000",
      "operator": "=",
      "parent_cmpd_chemblid": "CHEMBL1257355",
      "assay_chemblid": "CHEMBL1259539",
      "activity_comment": "Unspecified",
      "name_in_reference": "10",
      "assay_description": "Inhibition of human recombinant biotinylated MIF/CD74 interaction after 30 mins",
      "organism": "Homo sapiens",
      "assay_type": "B",
      "target_confidence": 5
    }
  ]
}
```
The last part of the script parses the data into a list of rows (one per bioactivity record) and then creates the column headers.
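Here is a minimal plain Python 3 sketch of that parsing step, using the field names from the script. `BIOACTIVITY_FIELDS` and `bioactivity_rows` are hypothetical names.

The sketch differs from the script in one deliberate way: it fills a missing key with an empty string. The script would instead stop with a KeyError.

```python
import json

# Fields taken from the target bioactivities script; they double as column headers.
BIOACTIVITY_FIELDS = ["parent_cmpd_chemblid", "target_name", "bioactivity_type",
                      "value", "units", "assay_description", "organism"]


def bioactivity_rows(json_text, fields=BIOACTIVITY_FIELDS):
    """Turn a bioactivities.json response into (column_names, rows of strings)."""
    activities = json.loads(json_text)["bioactivities"]
    # One row per bioactivity record; absent keys become "" (a choice made here)
    rows = [[str(ba.get(field, "")) for field in fields] for ba in activities]
    return list(fields), rows
```

The returned pair corresponds to the `column_names` and `rows` arguments the script passes to `arrayToWorkspace`.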
We then create a new workspace from the rows, column names and table name built in the script.
```python
arrayToWorkspace(rows, column_names, TableName)
```
The result, shown below, is a new workspace listing all the molecules that have been assayed against that target.
You need to put this script in the “context” folder, which is inside the “Vortex_Add-ons” folder.
The Vortex Script
```python
# ChEMBL Targets Data Search
# Authored by Chris Swain (http://www.macinchem.org)
# All rights reserved.
# Python imports
import urllib2
import urllib
import csv
import sys
from com.xhaus.jyson import JysonCodec as json
# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import com.dotmatics.vortex.table.VortexTableModel as vtm
import jarray
import binascii
import string
import os

# Example search string
# http://www.ebi.ac.uk/chemblws/targets/CHEMBL2095179/bioactivities.json

# cell_row (the right-clicked row) and arrayToWorkspace are provided by
# Vortex when this runs as a context action
col = vtable.findColumnWithName('ChEMBLID', 0)
col1 = vtable.findColumnWithName('preferred_Name', 0)

if col is None:
    vortex.alert('Load a workspace with a ChEMBLID column please.')
    quit()
else:
    taskID = col.getValueAsString(cell_row)
    # taskID = "CHEMBL2095179"
    # Name the new workspace after the target, falling back to its ChEMBL ID
    if col1 is not None:
        TableName = col1.getValueAsString(cell_row)
    else:
        TableName = taskID
    mystr = "https://www.ebi.ac.uk/chemblws/targets/" + taskID + "/bioactivities.json"
    myreturn = urllib2.urlopen(mystr).read()
    j = json.loads(myreturn)

    # One row of strings per bioactivity record
    rows = []
    for ba in j['bioactivities']:
        values = [ba['parent_cmpd_chemblid'], ba['target_name'], ba['bioactivity_type'],
                  ba['value'], ba['units'], ba['assay_description'], ba['organism']]
        row = [str(i) for i in values]
        rows.append(row)

    #vortex.addTable("Bioactivities", csvstring, 0, 4, -1, 0)
    column_names = ['parent_cmpd_chemblid', 'target_name', 'bioactivity_type',
                    'value', 'units', 'assay_description', 'organism']
    # Open the results as a new workspace
    arrayToWorkspace(rows, column_names, TableName)
    vtable.fireTableStructureChanged()
```
ChEMBLID to SMILES script
Whilst the table above contains the textual information associated with each assay, it does not include the chemical structure. This script uses the parent_cmpd_chemblid field and the “Get compound by ChEMBLID” web service (e.g. https://www.ebi.ac.uk/chemblws/compounds/CHEMBL1.json) to access the chemical data.
The data, in JSON format, looks like this:
```json
{
  "compound": {
    "smiles": "COc1ccc2[C@@H]3[C@H](COc2c1)C(C)(C)OC4=C3C(=O)C(=O)C5=C4OC(C)(C)[C@@H]6COc7cc(OC)ccc7[C@H]56",
    "chemblId": "CHEMBL1",
    "passesRuleOfThree": "No",
    "molecularWeight": 544.59,
    "molecularFormula": "C32H32O8",
    "acdLogp": 7.67,
    "stdInChiKey": "GHBOEFUAGSHXPO-XZOTUCIWSA-N",
    "knownDrug": "No",
    "medChemFriendly": "Yes",
    "rotatableBonds": 2,
    "alogp": 3.63,
    "numRo5Violations": 1,
    "acdLogd": 7.67
  }
}
```
By parsing the data we can pull out the SMILES string and populate the table; Vortex then renders the SMILES to display the structure. The script could also be modified to pull out the calculated properties (e.g. molecularWeight, alogp) and add them to the table.
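Here is a minimal plain Python 3 sketch of that extension. It reads the SMILES plus a few calculated properties from a compound response.

The sketch assumes the property keys shown in the CHEMBL1 example above. It does not assume every record carries all of them, so absent keys are simply skipped. `PROPERTY_KEYS` and `compound_smiles_and_properties` are hypothetical names.

```python
import json

# Calculated properties that appear in the sample "Get compound by ChEMBLID" response.
PROPERTY_KEYS = ["molecularWeight", "alogp", "acdLogp", "rotatableBonds", "numRo5Violations"]


def compound_smiles_and_properties(json_text, keys=PROPERTY_KEYS):
    """Return (smiles, {property: value}) from a compound JSON response."""
    compound = json.loads(json_text)["compound"]
    # Properties missing from a record are skipped rather than raising KeyError
    props = {key: compound[key] for key in keys if key in compound}
    return compound["smiles"], props
```

Inside Vortex, each property would then go into its own column with `findColumnWithName(name, 1).setValueFromString(r, str(value))`, in the same way the script writes the SMILES column.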
The Vortex Script
```python
# Use ChEMBLid to get SMILES string
# Authored by Chris Swain (http://www.macinchem.org)
# All rights reserved.
# Python imports
import urllib2
import urllib
from com.xhaus.jyson import JysonCodec as json
# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import jarray
import binascii
import string
import os

# "https://www.ebi.ac.uk/chemblws/compounds/CHEMBL1.json"
colsmi = vtable.findColumnWithName('SMILES', 0)

# Ask the user which column holds the compound ChEMBL IDs
input_label = swing.JLabel("ChEMBLid column (for input)")
input_cb = workspace.getColumnComboBox()
panel = swing.JPanel()
layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb, 1, 0)
ret = vortex.showInDialog(panel, "Choose ChEMBLid column")

if ret == vortex.OK:
    input_idx = input_cb.getSelectedIndex()
    if input_idx == 0:
        vortex.alert("you must choose a column")
    else:
        col = vtable.getColumn(input_idx - 1)
        rows = vtable.getRealRowCount()
        for r in range(0, int(rows)):
            chemblId = col.getValueAsString(r)
            mystr = "http://www.ebi.ac.uk/chemblws/compounds/" + chemblId + ".json"
            try:
                myreturn = urllib2.urlopen(mystr).read()
            except urllib2.HTTPError:
                continue  # some not found
            j = json.loads(myreturn)
            # Write the SMILES into the SMILES column; Vortex renders it as a structure
            TheData = str(j['compound']['smiles'])
            colsmi = vtable.findColumnWithName('SMILES', 1)
            colsmi.setValueFromString(r, TheData)
        vtable.fireTableStructureChanged()
```
Getting ChEMBL Compound Data Search
Now that we have a workspace containing all the molecules tested against a particular target, the next step in the analysis might be to select a particularly interesting molecule and see whether ChEMBL holds any more biological data for it.
To capture the desired ChEMBL ID we need to know both the column and the particular cell containing the ID. As before, the script runs as a context (right-click) action: when the user right-clicks on a cell, the script receives the row of that cell as cell_row and reads its contents.
```python
taskID = col.getValueAsString(cell_row)
```
The ChEMBL ID itself, followed by “ BioProfile”, is used as the label for the new workspace that will contain the results.
The data is returned in this format and can be parsed to populate a new workspace:
```json
{
  "bioactivities": [
    {
      "reference": "Bioorg. Med. Chem. Lett., (2004) 14:9:2047",
      "target_chemblid": "CHEMBL1985",
      "target_name": "Glucagon receptor",
      "organism": "Homo sapiens",
      "ingredient_cmpd_chemblid": "CHEMBL63923",
      "value": "73",
      "operator": "=",
      "assay_chemblid": "CHEMBL680804",
      "parent_cmpd_chemblid": "CHEMBL63923",
      "units": "nM",
      "activity_comment": "Unspecified",
      "name_in_reference": "6k",
      "assay_description": "In vitro binding affinity against human glucagon receptor (h-GlucR) was determined",
      "bioactivity_type": "Ki",
      "assay_type": "B",
      "target_confidence": 8
    },
    {
      "reference": "Bioorg. Med. Chem. Lett., (2004) 14:9:2047",
      "target_chemblid": "CHEMBL2097167",
      "target_name": "Adenylate cyclase",
      "organism": "Homo sapiens",
      "ingredient_cmpd_chemblid": "CHEMBL63923",
      "value": "2000",
      "operator": ">",
      "assay_chemblid": "CHEMBL645297",
      "parent_cmpd_chemblid": "CHEMBL63923",
      "units": "nM",
      "activity_comment": "Unspecified",
      "name_in_reference": "6k",
      "assay_description": "In vitro inhibitory activity against glucagon induced human adenylate cyclase",
      "bioactivity_type": "Ki",
      "assay_type": "B",
      "target_confidence": 4
    }
  ]
}
```
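Unlike the target table, the compound profile keeps the operator (as a “qual” column) and the literature reference. The operator matters: a Ki reported as “>” 2000 nM only says the value lies above 2000 nM, so the qualifier has to travel with the number.

Below is a minimal plain Python 3 sketch of building the table name, headers and rows from such a response. The column order follows the compound data script. `PROFILE_COLUMNS` and `compound_profile` are hypothetical names.

```python
import json

# (JSON key, column header) pairs, in the order used by the compound data script.
PROFILE_COLUMNS = [
    ("parent_cmpd_chemblid", "parent_cmpd_chemblid"),
    ("target_name", "target_name"),
    ("bioactivity_type", "bioactivity_type"),
    ("operator", "qual"),  # "=", ">", "<" qualifies the reported value
    ("value", "value"),
    ("units", "units"),
    ("assay_description", "assay_description"),
    ("organism", "organism"),
    ("reference", "Reference"),
]


def compound_profile(chembl_id, json_text):
    """Return (table_name, column_names, rows) for one compound's bioactivities."""
    activities = json.loads(json_text)["bioactivities"]
    column_names = [header for _, header in PROFILE_COLUMNS]
    rows = [[str(ba[key]) for key, _ in PROFILE_COLUMNS] for ba in activities]
    return chembl_id + " BioProfile", column_names, rows
```

For the CHEMBL63923 response above, this yields a table named “CHEMBL63923 BioProfile” with two rows. The second row carries “>” and “2000” in its qual and value columns.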
The result is shown below.
The Vortex Script
```python
# ChEMBL Compound Data Search
# Authored by Chris Swain (http://www.macinchem.org)
# All rights reserved.
# Python imports
import urllib2
import urllib
import csv
import sys
from com.xhaus.jyson import JysonCodec as json
# Vortex imports
import com.dotmatics.vortex.util.Util as Util
import com.dotmatics.vortex.mol2img.jni.genImage as genImage
import com.dotmatics.vortex.mol2img.Mol2Img as mol2Img
import com.dotmatics.vortex.table.VortexTableModel as vtm
import jarray
import binascii
import string
import os

# Example search string
# http://www.ebi.ac.uk/chemblws/compounds/CHEMBL63923/bioactivities.json

input_label = swing.JLabel("ChEMBLid column (for input)")
input_cb = workspace.getColumnComboBox()
panel = swing.JPanel()
layout.fill(panel, input_label, 0, 0)
layout.fill(panel, input_cb, 1, 0)

# Get name of column containing the compound ChEMBLID
ret = vortex.showInDialog(panel, "Choose ChEMBL_ID column")
input_idx = input_cb.getSelectedIndex()
# Entry 0 is the blank choice; treat it (or Cancel) as "no column"
if ret != vortex.OK or input_idx == 0:
    col = None
else:
    col = vtable.getColumn(input_idx - 1)
#col = vtable.findColumnWithName('parent_cmpd_chemblid', 0)

if col is None:
    vortex.alert('Load a workspace with a parent_cmpd_chemblid column please.')
    quit()
else:
    # cell_row is the row the user right-clicked
    taskID = col.getValueAsString(cell_row)
    # taskID = "CHEMBL2095179"
    TableName = taskID + " BioProfile"
    # Use this string in console for testing
    # mystr = "http://www.ebi.ac.uk/chemblws/compounds/CHEMBL2095179/bioactivities.json"
    mystr = "https://www.ebi.ac.uk/chemblws/compounds/" + taskID + "/bioactivities.json"
    myreturn = urllib2.urlopen(mystr).read()
    j = json.loads(myreturn)

    # One row of strings per bioactivity record, keeping the operator as "qual"
    rows = []
    for ba in j['bioactivities']:
        values = [ba['parent_cmpd_chemblid'], ba['target_name'], ba['bioactivity_type'],
                  ba['operator'], ba['value'], ba['units'], ba['assay_description'],
                  ba['organism'], ba['reference']]
        row = [str(i) for i in values]
        rows.append(row)

    #vortex.addTable("Bioactivities", csvstring, 0, 4, -1, 0)
    column_names = ['parent_cmpd_chemblid', 'target_name', 'bioactivity_type', 'qual',
                    'value', 'units', 'assay_description', 'organism', 'Reference']
    arrayToWorkspace(rows, column_names, TableName)
    vtable.fireTableStructureChanged()
```
The four scripts can be downloaded from here.
These two scripts need to be added to the scripts folder.
ChEMBLid2SMILES.vpy
ChEMBLtargetfromUniprot.vpy
These two scripts need to be stored in the context folder, which is inside the Vortex_Add-ons folder.
ChEMBLTargetDataV1.vpy
ChEMBLCompoundDataV1.vpy
Page Updated 31 October 2014