Data quality is key to building any AI/ML model. This pipeline [DOI] is built on matchms, a widely used open-source Python package, and offers three filter sets of increasing strictness:
- Basic filters. Runs basic metadata harmonization.
- Default filters. Runs basic metadata harmonization, derives missing metadata from other fields, requires ionmode and precursor m/z metadata to be present, and normalizes intensities.
- Library cleaning. Runs all default filters, additionally repairs errors in the annotations, and requires complete annotations once all repairs have run.
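
The layering of the three filter sets can be illustrated with a minimal, library-free sketch. It does not use the matchms API: the dict-based spectrum layout, all function names, the adduct-based ionmode rule, and the choice of SMILES and InChIKey as the "complete annotation" fields are assumptions made purely for illustration. Each filter returns a cleaned spectrum, or `None` when the spectrum should be discarded.

```python
from typing import Callable, List, Optional

# Hypothetical spectrum layout: {"metadata": {...}, "intensities": [...]}
Filter = Callable[[dict], Optional[dict]]


def harmonize_keys(s: dict) -> dict:
    """Basic harmonization: lowercase metadata keys, spaces to underscores."""
    meta = {k.strip().lower().replace(" ", "_"): v for k, v in s["metadata"].items()}
    return {**s, "metadata": meta}


def derive_ionmode(s: dict) -> dict:
    """Derive a missing ionmode from the adduct charge sign (illustrative rule)."""
    meta = dict(s["metadata"])
    if not meta.get("ionmode"):
        adduct = str(meta.get("adduct", "")).strip()
        if adduct.endswith("+"):
            meta["ionmode"] = "positive"
        elif adduct.endswith("-"):
            meta["ionmode"] = "negative"
    return {**s, "metadata": meta}


def require_fields(*fields: str) -> Filter:
    """Build a filter that discards spectra lacking any of the given fields."""
    def _filter(s: dict) -> Optional[dict]:
        meta = s["metadata"]
        return s if all(meta.get(f) not in (None, "") for f in fields) else None
    return _filter


def normalize_intensities(s: dict) -> Optional[dict]:
    """Scale intensities so that the highest peak equals 1.0."""
    top = max(s["intensities"], default=0)
    if top <= 0:
        return None
    return {**s, "intensities": [i / top for i in s["intensities"]]}


def repair_inchikey(s: dict) -> dict:
    """Stand-in for an annotation repair: tidy the InChIKey string."""
    meta = dict(s["metadata"])
    if isinstance(meta.get("inchikey"), str):
        meta["inchikey"] = meta["inchikey"].strip().upper()
    return {**s, "metadata": meta}


# Each level extends the previous one, mirroring the list above.
BASIC: List[Filter] = [harmonize_keys]
DEFAULT: List[Filter] = BASIC + [
    derive_ionmode,
    require_fields("ionmode", "precursor_mz"),
    normalize_intensities,
]
LIBRARY: List[Filter] = DEFAULT + [
    repair_inchikey,
    require_fields("smiles", "inchikey"),
]


def run_pipeline(spectrum: dict, filters: List[Filter]) -> Optional[dict]:
    """Apply filters in order; stop as soon as one rejects the spectrum."""
    current: Optional[dict] = spectrum
    for f in filters:
        current = f(current)
        if current is None:
            return None
    return current


raw = {
    "metadata": {"Precursor MZ": 301.1, "Adduct": "[M+H]+"},
    "intensities": [10.0, 50.0, 25.0],
}
cleaned = run_pipeline(raw, DEFAULT)   # passes: ionmode is derived from the adduct
rejected = run_pipeline(raw, LIBRARY)  # None: no SMILES or InChIKey annotation
```

Expressing each level as a list that extends the previous one keeps the relationship between the levels explicit, and a spectrum that passes a stricter level has necessarily passed the looser ones.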
All code is available on GitHub: https://github.com/matchms