An interesting web app that fetches ChEMBL bioactivity data for a target (via UniProt ID), computes molecular descriptors, and trains a simple predictive model (regression, with an optional classification fallback). A great example of using the ChEMBL API and RDKit.
Code is on GitHub: https://github.com/agiani99/CHEMBL2ML
What it does
- Input: a UniProt ID (e.g. `P00533`).
- Fetches:
  - HGNC gene symbol (via genenames.org)
  - ChEMBL target, assays, and activities (via ChEMBL API)
- Builds a dataset with:
  - `pchembl_value` as the main label
  - RDKit physicochemical descriptors (e.g. MW, LogP, HBD/HBA, TPSA, rings, rotatable bonds)
  - RDKit fragment descriptors (`fr_*`)
  - Optional ErG-style features using `FixedPharmacophoreAnalyzer` from `erg_calc_fragments_topo.py`
- Trains:
  - Regression: Random Forest (default) or XGBoost (if installed)
  - Optional fallback: binary classification (Active if `pChEMBL ≥ 6.5`) with optional SMOTE
- Exports:
  - Pickled model + preprocessing objects
  - Predictions CSV
  - Top-features JSON
  - Full dataset CSV
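
To make the classification fallback concrete, here is a minimal sketch of how a pChEMBL ≥ 6.5 cutoff turns potencies into Active/Inactive labels. It is not taken from the repository: the function names `pchembl_from_nm` and `is_active` are hypothetical, and the nM conversion is included only to show what the scale means (pChEMBL is the negative log10 of a molar potency). The app itself reads `pchembl_value` directly from ChEMBL.

```python
import math

# Cutoff quoted in the feature list; the constant name is an assumption.
ACTIVE_THRESHOLD = 6.5


def pchembl_from_nm(value_nm: float) -> float:
    """Convert a potency in nM to the -log10(molar) scale (hypothetical helper)."""
    if value_nm <= 0:
        raise ValueError("potency must be positive")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


def is_active(pchembl: float, threshold: float = ACTIVE_THRESHOLD) -> bool:
    """Binary label for the fallback classifier: Active if pChEMBL >= threshold."""
    return pchembl >= threshold


if __name__ == "__main__":
    for nm in (10.0, 100.0, 1000.0):
        p = pchembl_from_nm(nm)
        label = "Active" if is_active(p) else "Inactive"
        print(f"{nm:g} nM -> pChEMBL {p:.2f} -> {label}")
```

On this scale a cutoff of 6.5 corresponds to a potency of roughly 316 nM, so a 100 nM compound would be labelled Active and a 1 µM compound Inactive.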