An interesting web app that fetches ChEMBL bioactivity data for a target (identified by its UniProt ID), computes molecular descriptors, and trains a simple predictive model (regression, with an optional classification fallback). It is a great example of combining the ChEMBL API with RDKit.

Code is on GitHub: https://github.com/agiani99/CHEMBL2ML

What it does

  • Input: a UniProt ID (e.g. P00533).
  • Fetches:
    • HGNC gene symbol (via genenames.org)
    • ChEMBL target, assays, and activities (via ChEMBL API)
  • Builds a dataset with:
    • pchembl_value as the main label
    • RDKit physicochemical descriptors (e.g. MW, LogP, HBD/HBA, TPSA, rings, rotatable bonds)
    • RDKit fragment descriptors (fr_*)
    • Optional ErG-style features using FixedPharmacophoreAnalyzer from erg_calc_fragments_topo.py
  • Trains:
    • Regression: Random Forest (default) or XGBoost (if installed)
    • Optional fallback: binary classification (Active if pChEMBL ≥ 6.5) with optional SMOTE
  • Exports:
    • Pickled model + preprocessing objects
    • Predictions CSV
    • Top-features JSON
    • Full dataset CSV
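
To make the fetch and labelling steps above more concrete, here is a minimal sketch. It is not the app's actual code. The function names are hypothetical, and the ChEMBL query parameters (target_components__accession, pchembl_value__isnull) and record fields (canonical_smiles, pchembl_value) are assumptions based on the public ChEMBL web services, so check them against the repository before relying on them. The sketch only builds the request URLs and makes no network calls. The 6.5 cutoff is the one stated in the list above.

```python
from urllib.parse import urlencode

# Assumed base URL of the public ChEMBL web services.
CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"
# From the post: a compound counts as Active if pChEMBL >= 6.5.
ACTIVE_THRESHOLD = 6.5


def target_search_url(uniprot_id: str) -> str:
    """Build a URL that looks up ChEMBL targets by UniProt accession."""
    query = urlencode(
        {"target_components__accession": uniprot_id, "format": "json"}
    )
    return f"{CHEMBL_API}/target?{query}"


def activity_url(target_chembl_id: str, limit: int = 1000) -> str:
    """Build a URL for activities of one target that carry a pChEMBL value."""
    query = urlencode(
        {
            "target_chembl_id": target_chembl_id,
            "pchembl_value__isnull": "false",
            "limit": limit,
            "format": "json",
        }
    )
    return f"{CHEMBL_API}/activity?{query}"


def label_activities(records, threshold: float = ACTIVE_THRESHOLD):
    """Convert raw activity records into rows with a regression label
    (pchembl_value) and a binary fallback label (active)."""
    rows = []
    for rec in records:
        smiles = rec.get("canonical_smiles")
        raw = rec.get("pchembl_value")
        if not smiles or raw is None:
            continue  # skip records without a structure or a label
        try:
            pchembl = float(raw)  # the API returns numbers as strings
        except (TypeError, ValueError):
            continue  # skip unparseable values
        rows.append(
            {
                "smiles": smiles,
                "pchembl_value": pchembl,
                "active": int(pchembl >= threshold),
            }
        )
    return rows


# Placeholder records for illustration only, not real measurements.
example_records = [
    {"canonical_smiles": "CCO", "pchembl_value": "7.20"},
    {"canonical_smiles": "CCN", "pchembl_value": "6.5"},
    {"canonical_smiles": "CCC", "pchembl_value": None},
    {"canonical_smiles": None, "pchembl_value": "8.0"},
    {"canonical_smiles": "c1ccccc1", "pchembl_value": "5.1"},
]
example_rows = label_activities(example_records)
```

In a real run, the records would come from paging through the JSON returned by the activity URL, and each labelled row would then be joined with its RDKit descriptors before training.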
