Looking at mlxmolkit — GPU-accelerated molecular clustering on Apple Silicon

I’ve written several posts on the various options for clustering molecules (https://macinchem.org/?s=clustering), and a recent post from NVIDIA described GPU-accelerated clustering with nvMolKit, which uses CUDA. This looks very interesting but relies on NVIDIA chips.

A recent post now describes a port of the nvMolKit (CUDA) molecular clustering pipeline to Apple Metal via MLX. All code is on GitHub https://github.com/guillaume-osmo/mlxmolkit?tab=readme-ov-file so I thought I’d have a look.

I first cloned the GitHub repo and created a conda virtual environment. Then, with the environment activated, I changed into the repo directory and installed the package using pip. The comparison was run using the Jupyter notebook below, which shows the code used; it is based on the scripts in the repo.

clustering

The first part imports the libraries and then loads a random set of 50K molecules taken from ZINC. We then generate the fingerprints needed for both the standard RDKit clustering and the MLX-accelerated clustering.

The first run used the RDKit clustering; the results are:

--- RDKit (BulkTanimotoSimilarity + ClusterData) ---
  Similarity:  179.870s
  Clustering:  55.766s
  Total:       235.636s  ->  11267 clusters (largest: 188)
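
To make the two timed stages concrete, here is a minimal pure-Python sketch of the same idea: pairwise Tanimoto similarity followed by greedy Butina clustering. It is not the RDKit implementation. Fingerprints are stored as plain Python integers, the names `tanimoto` and `butina` and the 0.4 distance cutoff are assumptions for illustration, and this simplified variant recounts neighbours at every step.

```python
def tanimoto(a: int, b: int) -> float:
    """Tanimoto similarity of two fingerprints stored as integer bitsets."""
    union = bin(a | b).count("1")
    return bin(a & b).count("1") / union if union else 0.0


def butina(fps, cutoff=0.4):
    """Greedy Butina-style clustering; cutoff is a distance (1 - similarity).

    Assumption: a simplified variant that recounts unassigned neighbours
    before each pick, with ties broken by the lowest index.
    """
    n = len(fps)
    neighbours = [set() for _ in range(n)]
    # Stage 1: all-pairs similarity (the O(N^2) step that dominates the timing).
    for i in range(n):
        for j in range(i):
            if 1.0 - tanimoto(fps[i], fps[j]) <= cutoff:
                neighbours[i].add(j)
                neighbours[j].add(i)
    # Stage 2: repeatedly take the molecule with the most unassigned neighbours.
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        centroid = max(sorted(unassigned),
                       key=lambda i: len(neighbours[i] & unassigned))
        members = [centroid] + sorted(neighbours[centroid] & unassigned)
        clusters.append(tuple(members))
        unassigned -= set(members)
    return clusters


# Five toy 6-bit fingerprints (hypothetical data).
toy_fps = [0b001111, 0b001110, 0b000111, 0b110000, 0b100000]
toy_clusters = butina(toy_fps, cutoff=0.4)
print(toy_clusters)  # [(0, 1, 2), (3,), (4,)]
```

Each cluster is a tuple with the centroid first, which is the same shape of result the later annotation step consumes.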

I then ran the MLX-accelerated clustering; the results are:

--- MLX/Metal (Fused Tanimoto→CSR + Butina CPU) ---
  Fused sim→CSR: 5.589s  (Metal, no N×N matrix)
  Butina:        0.253s  (CPU CSR greedy)
  Total:         5.842s  ->  11213 clusters (largest: 188)
  Edges: 809,760  |  Memory saved: 10000 MB (no sim matrix)
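
The "Memory saved" figure can be checked with simple arithmetic. The sketch below assumes the dense similarity matrix would hold 4-byte (float32) values, which reproduces the reported 10000 MB. The CSR estimate assumes one 4-byte column index per edge plus one 4-byte row pointer per molecule; the actual storage layout in mlxmolkit may differ.

```python
n_molecules = 50_000
n_edges = 809_760
bytes_per_value = 4  # assumption: float32 similarities, int32 indices

# Dense N x N similarity matrix that the fused kernel avoids building.
dense_mb = n_molecules ** 2 * bytes_per_value / 1e6

# Rough CSR size: column indices for each edge plus N + 1 row pointers.
csr_mb = (n_edges + n_molecules + 1) * bytes_per_value / 1e6

# Fraction of all unordered molecule pairs that appear as edges.
n_pairs = n_molecules * (n_molecules - 1) // 2
edge_fraction = n_edges / n_pairs

print(f"dense matrix: {dense_mb:.0f} MB")
print(f"CSR estimate: {csr_mb:.2f} MB")
print(f"pairs kept as edges: {edge_fraction:.4%}")
```

Under these assumptions only a tiny fraction of pairs is ever stored, which is why the sparse representation is so much smaller than the full matrix.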

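This is roughly a 40-fold improvement in the time taken (235.6 s versus 5.8 s)! Clearly a great piece of work by the author. The two methods give slightly different cluster counts (11267 versus 11213), although the largest cluster has 188 members in both cases.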
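
Breaking the speed-up down by stage shows where the time goes. This short calculation uses only the timings reported above; the variable names are just for illustration.

```python
# Timings in seconds, copied from the two runs above.
rdkit = {"similarity": 179.870, "clustering": 55.766, "total": 235.636}
mlx = {"similarity": 5.589, "clustering": 0.253, "total": 5.842}

# Speed-up factor for each stage: RDKit time divided by MLX time.
speedup = {stage: rdkit[stage] / mlx[stage] for stage in rdkit}

for stage, factor in speedup.items():
    print(f"{stage:>10}: {factor:.1f}x")
```

The similarity stage is about 32 times faster and the Butina step about 220 times faster, giving the overall figure of about 40.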

The last part simply annotates each molecule with its cluster number and then exports the results to an SDF file.
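
The annotation step amounts to inverting the cluster list into one label per molecule. Here is a minimal sketch, assuming the clusters arrive as tuples of molecule indices (the format RDKit's Butina clustering returns); the function name `cluster_labels` and the 1-based numbering are assumptions, not taken from the notebook.

```python
def cluster_labels(clusters, n_molecules):
    """Return a list where entry i is the cluster number of molecule i."""
    labels = [None] * n_molecules
    # Number clusters from 1 in the order they were produced.
    for cluster_id, members in enumerate(clusters, start=1):
        for idx in members:
            labels[idx] = cluster_id
    return labels


# Hypothetical result for six molecules.
example_clusters = [(2, 0, 5), (1, 4), (3,)]
labels = cluster_labels(example_clusters, 6)
print(labels)  # [1, 2, 1, 3, 2, 1]
```

With RDKit, each label could then be attached with `mol.SetProp("Cluster", str(label))` and the molecules written out with `Chem.SDWriter`; the property name "Cluster" is only an example.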

I tried clustering 150K molecules and got the following error.

--- MLX/Metal (Fused Tanimoto→CSR + Butina CPU) ---
libc++abi: terminating due to uncaught exception of type std::runtime_error: [METAL] Command buffer execution failed: Impacting Interactivity (0000000e:kIOGPUCommandBufferCallbackErrorImpactingInteractivity)

Unfortunately, solving this is beyond my capabilities, but I've posted an issue on the repo.
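
This Metal error is generally associated with a single GPU command buffer running long enough to affect the display, so one possible direction is to split the all-pairs comparison into smaller tiles, each submitted separately. That reading is an assumption, not a diagnosis of mlxmolkit, and the library may already tile its work. The sketch below only shows how such tiles could be enumerated; `row_blocks`, `tile_pairs` and the block size of 10,000 are hypothetical.

```python
def row_blocks(n_rows, block_size):
    """Yield (start, stop) index ranges that cover n_rows in order."""
    for start in range(0, n_rows, block_size):
        yield start, min(start + block_size, n_rows)


def tile_pairs(n_rows, block_size):
    """Yield (row_range, col_range) tiles covering the upper triangle.

    Tanimoto similarity is symmetric, so tiles below the diagonal
    are skipped.
    """
    blocks = list(row_blocks(n_rows, block_size))
    for i, rows in enumerate(blocks):
        for cols in blocks[i:]:
            yield rows, cols


tiles = list(tile_pairs(150_000, 10_000))
largest_tile = max((r[1] - r[0]) * (c[1] - c[0]) for r, c in tiles)
print(len(tiles), "tiles, at most", largest_tile, "comparisons each")
```

Each tile would then be a separate, much shorter unit of GPU work than one pass over all 150K × 150K pairs.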
