A recent guest post from NVIDIA described GPU-Accelerated Clustering with nvMolKit, which uses CUDA.
A newer post now describes a port of the nvMolKit (CUDA) molecular clustering pipeline to Apple Metal via MLX. All code is on GitHub: https://github.com/guillaume-osmo/mlxmolkit?tab=readme-ov-file
The port implements the same 3-step workflow as the RDKit blog post:
- Morgan Fingerprinting: RDKit GetMorganGenerator (CPU, multi-threaded)
- Pairwise Tanimoto Similarity: Custom Metal kernel (GPU)
- Butina Clustering: Greedy algorithm on CSR neighbour list (CPU)
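
To make the data flow between the three steps concrete, here is a minimal pure-Python sketch. It is not the mlxmolkit code: fingerprints are stood in for by plain Python integers used as bit sets (in the real pipeline they would come from RDKit's Morgan generator), the similarity loop runs on the CPU instead of in a Metal kernel, and the function names `tanimoto`, `neighbours_csr`, and `butina` are hypothetical. The clustering uses a static ordering by neighbour count, which is one common variant of the greedy Butina scheme.

```python
from typing import List, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two bit vectors stored as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return bin(a & b).count("1") / union


def neighbours_csr(fps: List[int], threshold: float) -> Tuple[List[int], List[int]]:
    """Build a CSR neighbour list from all pairs with similarity >= threshold.

    The neighbours of row i are indices[indptr[i]:indptr[i + 1]].
    Only pairs above the threshold are stored, never the full N x N matrix.
    """
    indptr: List[int] = [0]
    indices: List[int] = []
    for i, fi in enumerate(fps):
        for j, fj in enumerate(fps):
            if i != j and tanimoto(fi, fj) >= threshold:
                indices.append(j)
        indptr.append(len(indices))
    return indptr, indices


def butina(indptr: List[int], indices: List[int]) -> List[List[int]]:
    """Greedy clustering: the candidate with the most neighbours becomes a
    centroid and claims all of its still-unassigned neighbours."""
    n = len(indptr) - 1
    # Visit candidates by descending neighbour count, ties broken by index.
    order = sorted(range(n), key=lambda i: (-(indptr[i + 1] - indptr[i]), i))
    assigned = [False] * n
    clusters: List[List[int]] = []
    for c in order:
        if assigned[c]:
            continue
        row = indices[indptr[c]:indptr[c + 1]]
        members = [c] + [j for j in row if not assigned[j]]
        for m in members:
            assigned[m] = True
        clusters.append(members)  # centroid is the first member
    return clusters


# Toy input: three overlapping bit sets and two that only overlap each other.
fps = [0b1111, 0b1110, 0b0111, 0b110000, 0b100000]
indptr, indices = neighbours_csr(fps, threshold=0.5)
clusters = butina(indptr, indices)
```

On this toy input the first three fingerprints end up in one cluster and the last two in another. The CSR layout is what keeps the clustering step cheap in memory: its size grows with the number of neighbour pairs above the threshold, not with the square of the number of molecules.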
Key results
Tested on the Enamine REAL 10.4M subset (same dataset as the blog post) on an Apple M3 Max:
| N | Fused sim→CSR | Butina | Total | vs RDKit | Memory |
|---|---|---|---|---|---|
| 20k | 0.26s | 0.09s | 0.35s | 152x | 0.1 MB |
| 50k | 1.26s | 0.36s | 1.62s | — | 0.5 MB |
| 100k | 4.87s | 0.97s | 5.84s | — | 1.3 MB |