MS/MS library cleaning pipeline in matchms

Data quality is key to building any AI/ML model. This pipeline [DOI] is built on the widely used open-source Python package matchms. It provides three filter sets, each building on the previous one:

  • Basic filters. Runs basic metadata harmonization.
  • Default filters. Runs the basic metadata harmonization, derives missing metadata from other fields, requires ionmode and precursor m/z metadata, and normalizes intensities.
  • Library cleaning. Runs all default filters and, in addition, repairs errors in the annotations and requires complete annotations after all repairs have run.
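To make the nesting of the three filter sets concrete, here is a minimal plain-Python sketch. It is not the matchms API: the spectrum is a plain dict, and every name in it (harmonize_metadata, derive_ionmode, require_fields, scale_intensities, repair_inchikey, run_pipeline) is hypothetical and only illustrates the idea that each level reuses the previous one and that a spectrum failing a requirement is dropped.

```python
import re
from typing import Callable, Optional

# Hypothetical spectrum layout: {"metadata": {...}, "peaks": [(mz, intensity), ...]}
Spectrum = dict
Filter = Callable[[Spectrum], Optional[Spectrum]]


def harmonize_metadata(s: Spectrum) -> Optional[Spectrum]:
    """Basic step: lowercase and trim metadata keys, trim string values."""
    meta = {k.strip().lower(): (v.strip() if isinstance(v, str) else v)
            for k, v in s["metadata"].items()}
    if isinstance(meta.get("ionmode"), str):
        meta["ionmode"] = meta["ionmode"].lower()
    return {**s, "metadata": meta}


def derive_ionmode(s: Spectrum) -> Optional[Spectrum]:
    """Default step: fill a missing ionmode from the adduct's trailing charge sign."""
    meta = dict(s["metadata"])
    adduct = meta.get("adduct", "")
    if not meta.get("ionmode") and adduct.endswith(("+", "-")):
        meta["ionmode"] = "positive" if adduct.endswith("+") else "negative"
    return {**s, "metadata": meta}


def require_fields(*fields: str) -> Filter:
    """Build a filter that drops (returns None for) spectra missing any field."""
    def _filter(s: Spectrum) -> Optional[Spectrum]:
        ok = all(s["metadata"].get(f) not in (None, "") for f in fields)
        return s if ok else None
    return _filter


def scale_intensities(s: Spectrum) -> Optional[Spectrum]:
    """Default step: scale intensities so the largest peak equals 1.0."""
    peaks = s["peaks"]
    top = max((i for _, i in peaks), default=0.0)
    if top <= 0:
        return None  # no usable peaks
    return {**s, "peaks": [(mz, i / top) for mz, i in peaks]}


INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def repair_inchikey(s: Spectrum) -> Optional[Spectrum]:
    """Library-cleaning step: uppercase the InChIKey, discard it if malformed."""
    meta = dict(s["metadata"])
    key = str(meta.get("inchikey", "")).upper()
    meta["inchikey"] = key if INCHIKEY.match(key) else ""
    return {**s, "metadata": meta}


# Each level extends the previous one.
BASIC = [harmonize_metadata]
DEFAULT = BASIC + [derive_ionmode,
                   require_fields("ionmode", "precursor_mz"),
                   scale_intensities]
LIBRARY_CLEANING = DEFAULT + [repair_inchikey,
                              require_fields("smiles", "inchikey")]


def run_pipeline(s: Spectrum, filters: list) -> Optional[Spectrum]:
    """Apply filters in order; stop as soon as one drops the spectrum."""
    for f in filters:
        s = f(s)
        if s is None:
            return None
    return s


annotated = {
    "metadata": {" IonMode ": "Positive", "PRECURSOR_MZ": 301.14,
                 "smiles": "CCO", "inchikey": "lfqscwfljhtthz-uhfffaoysa-n"},
    "peaks": [(100.0, 50.0), (200.0, 200.0)],
}
unannotated = {
    "metadata": {"adduct": "[M+H]+", "precursor_mz": 150.0},
    "peaks": [(75.0, 10.0)],
}

cleaned = run_pipeline(annotated, LIBRARY_CLEANING)
kept_by_default = run_pipeline(unannotated, DEFAULT)
dropped_by_cleaning = run_pipeline(unannotated, LIBRARY_CLEANING)
```

In this sketch the unannotated spectrum survives the default level, because its ionmode can be derived from the adduct, but it is removed at the library-cleaning level, because it still lacks a structure annotation after the repair step.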

All code is available on GitHub: https://github.com/matchms
