MS/MS library cleaning pipeline in matchms

30 July 2024 chris

The key to building any AI/ML model is data quality. This pipeline [DOI] for cleaning MS/MS spectral libraries is built on matchms, a widely used open-source Python package, and offers three levels of cleaning:

Basic filters. Runs basic metadata harmonization.
Default filters. Runs the basic metadata harmonization, but also derives missing metadata from other fields, requires ionmode and precursor m/z metadata, and normalizes intensities.
Library cleaning. Runs all default filters, but in addition repairs errors in the annotations and requires complete annotations once all repairs have been run.
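To make the cumulative structure of the three levels concrete, here is a toy Python sketch. It is not the matchms API: the function names, metadata keys, alias table and toy spectra are all hypothetical, chosen only to mirror the description of each level. Each tier is a list of filters that reuses the previous tier and adds its own steps, and a filter that returns None rejects the spectrum.

```python
# Toy sketch of three cumulative cleaning tiers (basic -> default -> library cleaning).
# NOT the matchms API: every function name, metadata key and alias here is hypothetical.
from typing import Callable, Dict, List, Optional

Spectrum = Dict  # {"metadata": {...}, "mz": [...], "intensities": [...]}
Filter = Callable[[Spectrum], Optional[Spectrum]]

# Hypothetical alias table mapping variant metadata keys onto one name.
KEY_ALIASES = {"precursormz": "precursor_mz", "pepmass": "precursor_mz", "ion_mode": "ionmode"}
PLACEHOLDERS = {"", "N/A", "NA", "NONE"}


def harmonize_keys(s: Spectrum) -> Spectrum:
    """Basic tier: lowercase keys and map known aliases to a single name."""
    meta = {}
    for key, value in s["metadata"].items():
        k = key.strip().lower().replace(" ", "_")
        meta[KEY_ALIASES.get(k, k)] = value
    return {**s, "metadata": meta}


def derive_ionmode_from_charge(s: Spectrum) -> Spectrum:
    """Default tier: fill a missing ionmode from the sign of the charge."""
    meta = dict(s["metadata"])
    charge = meta.get("charge")
    if "ionmode" not in meta and isinstance(charge, (int, float)) and charge != 0:
        meta["ionmode"] = "positive" if charge > 0 else "negative"
    return {**s, "metadata": meta}


def require_ionmode(s: Spectrum) -> Optional[Spectrum]:
    return s if s["metadata"].get("ionmode") in ("positive", "negative") else None


def require_precursor_mz(s: Spectrum) -> Optional[Spectrum]:
    return s if isinstance(s["metadata"].get("precursor_mz"), (int, float)) else None


def normalize_intensities(s: Spectrum) -> Optional[Spectrum]:
    """Default tier: scale intensities so the base peak is 1.0; drop empty spectra."""
    top = max(s["intensities"], default=0.0)
    if top <= 0:
        return None
    return {**s, "intensities": [i / top for i in s["intensities"]]}


def repair_annotation_placeholders(s: Spectrum) -> Spectrum:
    """Library-cleaning tier: strip whitespace and drop placeholder annotations."""
    meta = dict(s["metadata"])
    for key in ("smiles", "inchikey"):
        value = meta.get(key)
        if isinstance(value, str):
            value = value.strip()
            if value.upper() in PLACEHOLDERS:
                del meta[key]
            else:
                meta[key] = value
    return {**s, "metadata": meta}


def require_complete_annotation(s: Spectrum) -> Optional[Spectrum]:
    return s if all(s["metadata"].get(k) for k in ("smiles", "inchikey")) else None


# Each tier runs everything in the previous tier, then adds its own steps.
BASIC_TIER: List[Filter] = [harmonize_keys]
DEFAULT_TIER: List[Filter] = BASIC_TIER + [
    derive_ionmode_from_charge, require_ionmode, require_precursor_mz, normalize_intensities]
LIBRARY_CLEANING_TIER: List[Filter] = DEFAULT_TIER + [
    repair_annotation_placeholders, require_complete_annotation]


def run_pipeline(spectra: List[Spectrum], filters: List[Filter]) -> List[Spectrum]:
    """Apply filters in order; a filter returning None rejects the spectrum."""
    cleaned = []
    for s in spectra:
        for f in filters:
            s = f(s)
            if s is None:
                break
        else:
            cleaned.append(s)
    return cleaned


# Made-up toy spectra, for illustration only.
spectra = [
    {"metadata": {"PrecursorMZ": 47.049, "Charge": 1, "SMILES": " CCO ",
                  "InChIKey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"},
     "mz": [29.0, 31.0], "intensities": [50.0, 200.0]},
    {"metadata": {"pepmass": 300.1, "ionmode": "negative", "smiles": "N/A"},
     "mz": [100.0], "intensities": [10.0]},
    {"metadata": {"smiles": "CCO"}, "mz": [46.0], "intensities": [5.0]},
]

for name, tier in [("basic", BASIC_TIER), ("default", DEFAULT_TIER),
                   ("library cleaning", LIBRARY_CLEANING_TIER)]:
    print(name, len(run_pipeline(spectra, tier)))
```

Running the script should print 3, 2 and 1 surviving spectra for the basic, default and library-cleaning tiers respectively: the third toy spectrum lacks ionmode and precursor m/z, and the second loses its placeholder SMILES during repair and then fails the completeness check. For real libraries, use the filters and pipelines that ship with matchms rather than this sketch.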

All code is available on GitHub: https://github.com/matchms
