Vortex script using predictive models on OCHEM


OCHEM is a free, open-access site of annotated models and chemical data. It contains 1,831,772 experimental records for about 477 properties, collected from 12,457 sources. You are free to upload your own data and to build predictive models using either the existing data or your own.

OCHEM is a platform for medicinal chemists, toxicologists and cheminformaticians. Using OCHEM, highly precise models for substance properties, namely quantitative structure-property and structure-activity relationship models (QSPR / QSAR), can be generated efficiently. It is a comprehensive software suite that integrates multiple software components into a single product. The platform is an evolution of the Virtual Computational Chemistry Laboratory (VCCLAB). The VCCLAB website is visited by more than 5,000 unique visitors per month and performs more than 136,000 analyses per year. OCHEM incorporates the machine-learning tools that were used to develop the “ePhysProp” suite, i.e. the same algorithms, which its developers report performed very well in benchmarking studies.

A number of modelling tools are available. These include linear and non-linear methods for model development and for estimating prediction accuracy:

  • ASNN (ASsociative Neural Networks)
  • FSMLR (Fast Stagewise Multiple Linear Regression)
  • KNN (K-Nearest Neighbors)
  • Library model (A model based on another ASNN model enriched with new compounds data)
  • LibSVM wrapper with grid-search parameter optimisation
  • MLR (Multiple Linear Regression)
  • PLS (Partial Least Squares)
  • WEKA-J48 (Weka-based implementation of C4.5 decision tree)
  • KNN (Weka implementation)
  • LADTree (Weka implementation)
  • Naive Bayes (Weka implementation)
  • REPTree (Weka implementation)
  • WEKA-RF (Weka-based implementation of Random Forest)

There are also a number of pre-built models that the public can access. These include:

  • Ames test: Model ID 94fb6ea7-c694-44e2-9561-1ba561327a15
  • CYP1A2 inhibition: Model ID 49583412-8a84-41c1-ab45-0658d7d3e869
  • LogP and Solubility: Model ID f1f2c89e-70b2-41fd-88f4-46605f7f1f09

Whilst you can access the models directly from the website, there are also alternative licensing options.

OCHEM Academia is a public, freely accessible online version for academic users.
OCHEM Lite is a standalone version. The standard configuration is based on a 27-inch iMac running Ubuntu. They deliver a fully installed server, including one day's service for introduction and integration into your IT environment.
OCHEM Flex is the configurable OCHEM standard version. They install OCHEM on one of your servers or provide you with a fully installed server. If you own a computer cluster or have a proven 3D-optimization process, they integrate OCHEM into your existing computer infrastructure. You decide on the maximum size of the computing cluster, and you can adjust it as needed at any time within minutes. They provide two days of service, including introduction and integration into your IT environment.
OCHEM Enterprise Edition is the version for large companies. This package comes without any restrictions. The version includes a complete month of installation support by an eADMET engineer plus all other necessary work to ensure a company-wide implementation and integration into the existing infrastructure.

Full documentation is available online

Accessing OCHEM via a RESTful web service

You can also run predictions on OCHEM using simple REST-like web services. A prediction is not instantaneous: it can take from several seconds to minutes, depending on the model, the number of compounds and the server load. The prediction is therefore performed asynchronously, in two steps:

  1. Start a prediction task and get a task ID.
  2. Fetch your prediction task using the task ID from step 1.
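The two-step pattern amounts to a polling loop. The following is a minimal sketch, not the OCHEM client itself: `fetch` is a hypothetical function that you supply to query the server, and the `"status": "pending"` marker is an assumption about how a running task is reported.

```python
import time


def wait_for_result(fetch, task_id, interval=2.0, max_tries=30, sleep=time.sleep):
    """Call fetch(task_id) repeatedly until the task is no longer pending.

    fetch is a caller-supplied function returning a dict; the "status"
    key and the "pending" value are assumptions made for illustration.
    """
    for _ in range(max_tries):
        response = fetch(task_id)
        if response.get("status") != "pending":
            return response
        # Wait before asking again so the server is not flooded.
        sleep(interval)
    raise TimeoutError("task %s did not finish in time" % task_id)
```

Passing `fetch` and `sleep` as arguments keeps the loop independent of any particular HTTP library and makes it easy to try out without a network connection.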

To post a task, you need to format the request as follows

Where the model ID can be the ID of one of the public models described above, or the ID of a private model you have built, provided by the OCHEM admin. The molecule can be in SMILES or SD format. For both formats, multiple molecules can be posted using the $$$$ separator. It is much more efficient to predict molecules in batches than to post a separate task for each molecule.
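As an illustration of batching, the sketch below joins several SMILES strings with the $$$$ separator and URL-encodes them into a single request. The endpoint address and the parameter names `modelId` and `mol` are assumptions made for this example; check the OCHEM documentation for the actual request format. It is written for Python 3, so a Jython-based Vortex script would likely need the Python 2 equivalents.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the address from the OCHEM documentation.
POST_URL = "https://ochem.example/modelservice/postModel.do"


def build_post_url(model_id, smiles_list, base_url=POST_URL):
    """Build one request URL for a whole batch of molecules."""
    # Multiple molecules are sent in one task, separated by $$$$.
    mol = "$$$$".join(smiles_list)
    # urlencode escapes characters such as '#', '=', '$' and '+' that
    # occur in SMILES but have a special meaning inside a URL.
    # "modelId" and "mol" are assumed parameter names.
    query = urlencode({"modelId": model_id, "mol": mol})
    return base_url + "?" + query


url = build_post_url("94fb6ea7-c694-44e2-9561-1ba561327a15", ["CCO", "C#N"])
```

Here the Ames model ID from the list of public models is used, and the two molecules travel in a single request rather than two separate tasks.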

The response will look like this

You can then use the taskId to check the progress of the calculation.

If the task is still running the response will be

When the task has finished, the response will look like this, in JSON format.

The Vortex Script, posting data

To access the web service we need two scripts: the first submits the data to the web service, and the second gets the data once the calculation is completed.

The following script submits the list of molecules as SMILES strings to the Ames prediction web service. The first part is fairly standard and warns the user that they are submitting data to a public site; if you have your own internal server, this would need to be amended. We then create a column to store the taskID and loop through the list of molecules, getting the SMILES and compiling a list with “$$$$” separating each SMILES string. We URL-encode the SMILES to avoid issues with some of the characters that might be present in SMILES strings. We then create the URL and post it to the server; the JSON response from the server is parsed to get the taskID.

The last part uses the taskID to see if the job is completed and puts the taskID into the column. If the task is very fast and has already completed, it pops up a message to tell you “Your job is finished”.
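The parsing step can be sketched in plain Python, independent of Vortex. This is only an illustration: the key name `taskId` is an assumption based on the name used in the text above, and the function returns `None` rather than raising an error if the server reply is not the expected JSON object.

```python
import json


def extract_task_id(response_text):
    """Return the task ID from a JSON reply, or None if it is missing."""
    try:
        reply = json.loads(response_text)
    except ValueError:
        # The reply was not valid JSON (for example an HTML error page).
        return None
    if not isinstance(reply, dict):
        return None
    # "taskId" is the assumed key name for the task identifier.
    return reply.get("taskId")
```

Returning `None` lets the calling script decide whether to warn the user or leave the taskID column empty.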

The Vortex Script, getting data

Once the task is complete, we need to get the results. The first part of the script gets the taskID and, if there is a valid taskID, creates the columns for the predicted data and gets the number of rows. We then request the model data in JSON format, parse the response, filling in the table row by row, and finally update the table. You need to put this script in the “context” folder, which is inside the “Vortex_Add-ons” folder. This means that you simply need to right-click on the taskID to get the option to retrieve the data.

The result should look something like this.

We can retrieve the results for the other models in a similar manner. A minor modification is needed for the LogP/Solubility script because it returns two predictions for each of the molecules.
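Handling several predictions per molecule comes down to giving each predicted property its own column. The sketch below shows the idea on a plain dictionary; the nested layout and the key names `predictions`, `property` and `value` are hypothetical, chosen for illustration, and should be adapted to the JSON that the service actually returns.

```python
def predictions_to_columns(result):
    """Turn per-molecule predictions into one list of values per property.

    Assumed (hypothetical) layout: result["predictions"] holds one entry
    per molecule, each with its own "predictions" list of
    {"property": ..., "value": ...} dictionaries.
    """
    columns = {}
    molecules = result.get("predictions", [])
    for row, molecule in enumerate(molecules):
        for pred in molecule.get("predictions", []):
            name = pred.get("property", "unknown")
            # Create the column on first sight, padded with None so that
            # molecules lacking this property keep an empty cell.
            column = columns.setdefault(name, [None] * len(molecules))
            column[row] = pred.get("value")
    return columns
```

A single-prediction model such as the Ames test simply yields one column, so the same routine can serve all three models.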

The results should look like this

You can download the pack of six scripts: a script to submit to each of the three models (CYP1A2, Ames, and LogP/LogS predictions), together with a separate script to retrieve data from each of the models, which will need to go into the “context” folder.


Page Updated 13 October 2013
