Open Source Cheminformatics Toolkits

When I wrote the article entitled "A few thoughts on scientific software", one of the responses I got was that people did not know about the existence of open-source chemistry toolkits, so I thought I’d publish a page that will hopefully stop people reinventing the wheel. Here are a few open-source toolkits that I’m aware of.

OpenBabel (http://openbabel.org/wiki/Main_Page)

Open Babel: An open chemical toolbox. Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. Cheminformatics nodes for KNIME are also available.

Authors: Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch and Geoffrey R Hutchison. Journal of Cheminformatics 2011, 3:33. DOI: https://doi.org/10.1186/1758-2946-3-33

Extensively used, in nearly 50 projects (http://openbabel.org/wiki/Related_Projects). Installs are available for Linux, MacOSX and Windows.

OpenBabel is written in C++ and the source code is available; bindings also allow scripting access from Java, .NET, Perl, Python or Ruby.

License: GNU GPL.
Source code: https://sourceforge.net/projects/openbabel/files/.
Mailing list: https://sourceforge.net/projects/openbabel/lists/openbabel-discuss

RDKit (http://www.rdkit.org)

The RDKit is an open-source toolkit for cheminformatics, 2D and 3D molecular operations, descriptor generation for machine learning, etc. There’s also a molecular database cartridge for PostgreSQL and cheminformatics nodes for KNIME (distributed from the KNIME community site: https://www.knime.org/rdkit).

Installs are available for Linux, MacOSX and Windows.

The RDKit core algorithms and data structures are written in C++. Wrappers are provided to use the toolkit from either Python (2.x and 3.x), Java, or C#.

License: BSD.
Source code: https://github.com/rdkit/rdkit.
Frequency of releases: new feature releases twice a year; bug-fix releases every 6-8 weeks.
Mailing lists: https://sourceforge.net/p/rdkit/mailman/

How to contribute: https://github.com/rdkit/rdkit/wiki/HowToContribute.

CDK (https://cdk.github.io)

The Chemistry Development Kit (CDK) is a collection of modular Java libraries for processing chemical information (cheminformatics). The modules are free and open-source and are easy to integrate with other open-source or in-house projects. Cheminformatics nodes for KNIME are also available.

Authors: Willighagen et al. The Chemistry Development Kit (CDK) v2.0: atom typing, depiction, molecular formulas, and substructure searching. J. Cheminform. 2017; 9(3). DOI: https://doi.org/10.1186/s13321-017-0220-4

The latest release JAR, with all dependencies included, is available from GitHub (https://github.com/cdk/cdk/releases/tag/cdk-2.9). CDK is written in Java.

License: GNU Lesser General Public License, version 2.1 (or later).
Source code: https://github.com/cdk/cdk

Indigo (http://lifescience.opensource.epam.com/indigo/)

Indigo is a universal molecular toolkit that can be used for molecular fingerprinting, substructure search, and molecular visualization. It can also perform molecular similarity searches, is 100% open source, and provides enhanced stereochemistry support for end users as well as a documented API for developers. Cheminformatics nodes for KNIME are also available.
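To show what fingerprint-based similarity searching boils down to, here is a minimal pure-Python sketch of the Tanimoto coefficient, the similarity measure most commonly used with binary fingerprints. It does not use Indigo's API: the fingerprints are assumed to be plain Python sets of "on" bit positions, and the function names and bit sets are made up for illustration.

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient: |A and B| / |A or B|."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(fp_a & fp_b)
    return common / (len(fp_a) + len(fp_b) - common)


def similarity_search(query: set[int], library: dict[str, set[int]],
                      threshold: float = 0.7) -> list[tuple[str, float]]:
    """Return (name, score) pairs scoring at or above threshold, best first."""
    hits = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical bit sets, not real molecular fingerprints.
    library = {
        "mol_1": {1, 4, 9, 16, 25},
        "mol_2": {1, 4, 9, 16, 30},
        "mol_3": {2, 3, 5, 7},
    }
    query = {1, 4, 9, 16, 25}
    # mol_1 scores 1.0, mol_2 scores 4/6; mol_3 shares no bits and is dropped.
    print(similarity_search(query, library, threshold=0.5))
```

Real toolkits generate the fingerprints from structures and use compact bit-vector types, but the scoring step is the same idea.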

Installs are available for Linux, MacOSX and Windows.

Indigo is written in C++ and the source code is available; bindings also allow scripting access from Java, .NET and Python.

License: GNU General Public License.
Source code: https://github.com/epam/Indigo

OpenChemLib (https://github.com/Actelion/openchemlib/)

Open-source Java-based chemistry library. There is also openchemlib-js, a JavaScript interface to the OpenChemLib Java library.

License: https://github.com/Actelion/openchemlib/blob/master/LICENSE.
Source code: https://github.com/Actelion/openchemlib

ChemDoodle Web Components (https://web.chemdoodle.com)

The ChemDoodle Web Components library is a pure Javascript chemical graphics and cheminformatics library derived from the ChemDoodle® application and produced by iChemLabs. ChemDoodle Web Components allow the developer to present publication quality 2D and 3D graphics and animations for chemical structures, reactions and spectra.

License: GNU General Public License (v3.0).
Source code: https://web.chemdoodle.com/installation/download/

Kekule.js (http://partridgejiang.github.io/Kekule.js/index.html)

Kekule.js is an open-source JavaScript library for chemoinformatics released under the MIT license. Currently it is molecule-centric, focusing on providing the ability to represent, draw, edit, compare and search molecule structures in web browsers.

License: MIT.
Source code: https://github.com/partridgejiang/Kekule.js

WebMolKit (https://github.com/aclarkxyz/web_molkit)

Cheminformatics toolkit built with TypeScript. It can be used to carry out some fairly sophisticated cheminformatics tasks in a contemporary web browser, such as rendering molecules for display, format conversions, calculations and interactive sketching. The library can be used within any JavaScript engine, including web browsers, NodeJS and Electron.

Demo of molecular sketcher.

Written in TypeScript; it requires the TypeScript compiler (tsc) to compile it into JavaScript.

License: Apache License 2.0.
Source code: https://github.com/aclarkxyz/web_molkit

Chempy (https://pypi.org/project/chempy/)

Chempy is a Python package useful for chemistry (mainly physical/inorganic/analytical chemistry). Features include:

  • Numerical integration routines for chemical kinetics (ODE solver front-end)
  • Integrated rate expressions (and convenience fitting routines)
  • Solver for equilibria (including multiphase systems)
  • Relations in physical chemistry
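As a rough illustration of the first two items (numerical integration versus an integrated rate expression), the sketch below treats a single first-order reaction A → B in plain Python, comparing a hand-written fourth-order Runge-Kutta integrator against the closed-form result. It is not ChemPy's API: the function names and values are hypothetical, and ChemPy itself acts as a front-end to dedicated ODE solvers rather than using a loop like this.

```python
from __future__ import annotations

import math


def first_order_analytic(a0: float, k: float, t: float) -> float:
    """Integrated first-order rate law: [A](t) = [A]0 * exp(-k * t)."""
    return a0 * math.exp(-k * t)


def first_order_rk4(a0: float, k: float, t_end: float, n_steps: int = 1000) -> float:
    """Integrate d[A]/dt = -k[A] from t = 0 to t_end with classical RK4."""
    def rate(a: float) -> float:
        return -k * a

    h = t_end / n_steps
    a = a0
    for _ in range(n_steps):
        k1 = rate(a)
        k2 = rate(a + 0.5 * h * k1)
        k3 = rate(a + 0.5 * h * k2)
        k4 = rate(a + h * k3)
        a += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return a


if __name__ == "__main__":
    a0, k, t = 1.0, 0.5, 4.0  # illustrative values: mol/L, 1/s, s
    print(first_order_analytic(a0, k, t))
    print(first_order_rk4(a0, k, t))  # agrees with the line above to many digits
```

For a simple first-order step the closed form is all you need; numerical integration earns its keep once several coupled reactions make an analytic solution impractical.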

Author: Bjoern I. Dahlgren

License: BSD.
Source code: https://github.com/bjodah/chempy

ChemmineR (https://www.bioconductor.org/packages/release/bioc/vignettes/ChemmineR/inst/doc/ChemmineR.html)

ChemmineR is a cheminformatics package for analyzing drug-like small molecule data in R. Its latest version contains functions for efficient processing of large numbers of small molecules, physicochemical/structural property predictions, structural similarity searching, and classification and clustering of compound libraries with a wide spectrum of algorithms.

Authors: Kevin Horan, Yiqun Cao, Tyler Backman, Thomas Girke

License: Artistic-2.0
Source code: http://cran.at.r-project.org/

MolecularGraph.jl (https://github.com/mojaie/MolecularGraph.jl)

MolecularGraph.jl is a graph-based molecule modeling and chemoinformatics analysis toolkit fully implemented in Julia.
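Treating a molecule as a graph (atoms as nodes, bonds as edges) makes some properties easy to compute. As a small Python sketch, not MolecularGraph.jl's Julia API, the ring count of a hydrogen-suppressed structure is the graph's cyclomatic number: bonds minus atoms plus connected components, which equals the size of the smallest set of smallest rings. The atom-count/bond-list input and the function name are assumptions chosen for brevity, and bond orders are ignored (a double bond is one edge).

```python
from __future__ import annotations

from collections import deque


def ring_count(n_atoms: int, bonds: list[tuple[int, int]]) -> int:
    """Cyclomatic number of a molecular graph: bonds - atoms + components."""
    adjacency: dict[int, list[int]] = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    # Count connected components (e.g. separate ions in a salt) with BFS.
    seen: set[int] = set()
    components = 0
    for start in range(n_atoms):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            atom = queue.popleft()
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return len(bonds) - n_atoms + components


if __name__ == "__main__":
    cyclohexane = [(i, (i + 1) % 6) for i in range(6)]
    naphthalene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                   (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
    print(ring_count(6, cyclohexane))   # prints 1
    print(ring_count(10, naphthalene))  # prints 2
```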

Author: Seiji Matsuoka

License: MIT
Source code: https://github.com/mojaie/MolecularGraph.jl

LillyMol (https://github.com/EliLillyCo/LillyMol)

LillyMol is a C++ library for Cheminformatics. This repo also contains a variety of useful command line tools that have been built with LillyMol.
LillyMol does only a subset of Cheminformatics tasks, but tries to do those tasks efficiently and correctly.
LillyMol has some novel approaches to substructure searching, reaction enumeration and chemical similarity. These have been developed over many years, driven by the needs of Computational and Medicinal Chemists at Lilly and elsewhere.
LillyMol is fast and scalable, with modest memory requirements.
This release includes a number of C++ unit tests. All tests can be run with address sanitizer, with no problems reported.
The file Molecule_Tools/introduction.cc provides an introduction to LillyMol for anyone wishing to develop with C++.

Authors: Xuyan Ru, Ian Watson, G-Huang

License: Apache 2.0
Source code: https://github.com/EliLillyCo/LillyMol

Chython (https://github.com/chython/chython)

Library for processing molecules and reactions in a Pythonic way.

Features:
Read/write/convert formats: MDL .RDF (.RXN) and .SDF (.MOL), .MRV, SMILES, InChI (InChI Trust library), .XYZ, .PDB
Standardize molecules and reactions, and check structure validity
Supported python-magic
Tetrahedral, allene and cis-trans stereo supported
Perform subgraph search
Build/edit molecules and reactions with Python API
Produce template based reactions and molecules
Atom-to-atom mapping, checking and rule-based fixing
Perform MCS search
2D coordinates generation (based on SmilesDrawer)
2D/3D depiction with Jupyter support
SMARTS parser with restrictions
Protective groups remover
Common reaction templates collection
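To make the format-reading item a little more concrete, here is a minimal pure-Python sketch of parsing the simple .XYZ format (atom count, a comment line, then one "element x y z" line per atom) and reporting the molecular formula in Hill order. It is not Chython's API: the function names are hypothetical, and it assumes a single well-formed frame with correctly capitalised element symbols.

```python
from __future__ import annotations

from collections import Counter


def read_xyz(text: str) -> list[tuple[str, float, float, float]]:
    """Parse a single-frame XYZ block into (element, x, y, z) tuples."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])
    atoms = []
    for line in lines[2:2 + n_atoms]:  # lines[1] is a free-text comment
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return atoms


def hill_formula(symbols: list[str]) -> str:
    """Hill order: C, then H, then the rest alphabetically; all alphabetical if no C."""
    counts = Counter(symbols)
    order: list[str] = []
    if "C" in counts:
        order = ["C"] + (["H"] if "H" in counts else [])
    order += sorted(s for s in counts if s not in order)
    return "".join(s + (str(counts[s]) if counts[s] > 1 else "") for s in order)


if __name__ == "__main__":
    water = """3
water
O  0.000  0.000  0.117
H  0.000  0.757 -0.469
H  0.000 -0.757 -0.469
"""
    atoms = read_xyz(water)
    print(hill_formula([atom[0] for atom in atoms]))  # prints H2O
```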

Chython is a fork of CGRtools, which has seen no development for several years, although that could be reinitiated.

Author: Ramil Nugmanov

License: GNU Lesser General Public License, version 3.
Source code: https://github.com/chython/chython

CDPkit (https://cdpkit.org)

CDPKit (short for Chemical Data Processing Toolkit) is an open-source cheminformatics toolkit implemented in C++. CDPKit comprises a suite of software tools and a programming library called the Chemical Data Processing Library (CDPL), which provides a high-quality, well-tested, modular implementation of the basic functionality typically required by any higher-level software application in the field of cheminformatics. In addition to the CDPL C++ API, an equivalent Python-interfacing layer is provided that allows all of CDPL’s functionality to be harnessed easily from Python code.

Author: Thomas Seidel

License: GNU Lesser General Public License, version 2 or later.
Source code: https://github.com/molinfo-vienna/CDPKit

Also worth reading

A curated list of Cheminformatics libraries and software https://github.com/hsiaoyi0504/awesome-cheminformatics.

A curated list of Python packages related to chemistry https://github.com/lmmentel/awesome-python-chemistry.

Last Updated 22 Jul 24
