OpenBabel
Open Babel: An open chemical toolbox
Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities, from conformer searching and 2D depiction to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. Open Babel functionality is also available as cheminformatics nodes for KNIME.
Authors: Noel M O’Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch and Geoffrey R Hutchison. Journal of Cheminformatics 2011, 3:33. DOI: https://doi.org/10.1186/1758-2946-3-33
It is used in nearly 50 related projects (http://openbabel.org/wiki/Related_Projects), and it can be installed on Linux, macOS and Windows.
OpenBabel is written in C++ and its source code is available. Bindings also allow scripting access from Java, .NET, Perl, Python and Ruby.
OpenBabel was installed using Miniconda:
```shell
conda install -c conda-forge openbabel
```
File conversion
To test file conversion I used a selection of 2D structures from ChEMBL in SDF format, with molecular weight 250 to 500 and calculated LogP 0 to 5. This is a 2.6 GB file containing 1,144,624 molecules.
The command used was:
```shell
time obabel -isdf 'ChEMBLsubset.sdf' -osmiles -O 'ChEMBLsubset.smiles'
```
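A quick sanity check on a conversion like this is to compare the number of molecules going in and coming out. The sketch below is a minimal illustration that assumes each SDF record ends with a `$$$$` line and that the SMILES output holds one molecule per line. The function names are illustrative and are not part of Open Babel.

```python
def count_sdf_records(lines):
    """Count molecules in SDF text by counting '$$$$' record terminators."""
    return sum(1 for line in lines if line.strip() == "$$$$")


def count_smiles_records(lines):
    """Count non-blank lines in a SMILES file (assumed one molecule per line)."""
    return sum(1 for line in lines if line.strip())


def check_conversion(sdf_path, smiles_path):
    """Return (molecules in the SDF input, molecules in the SMILES output)."""
    with open(sdf_path, encoding="utf-8", errors="replace") as sdf:
        n_in = count_sdf_records(sdf)
    with open(smiles_path, encoding="utf-8", errors="replace") as smi:
        n_out = count_smiles_records(smi)
    return n_in, n_out
```

For this test, check_conversion("ChEMBLsubset.sdf", "ChEMBLsubset.smiles") should report 1,144,624 input molecules. A smaller output count would suggest that some records failed to convert.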
Generating 3D structures
To test 3D structure generation I took 1,000 random structures from ChEMBL as 2D structures in SDF format and generated an SDF file containing 3D structures.
The command used was:
```shell
time obabel -isdf ChEMBL1000_2D.sdf -osdf -O ChEMBL1000_3D.sdf --gen3D
```
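The post does not say how the 1,000 random structures were drawn. One way to pick them from a large SDF file in a single pass, without loading the whole file, is reservoir sampling (Algorithm R). This is a sketch that assumes `$$$$`-terminated records. The helper names are hypothetical.

```python
import random


def iter_sdf_records(lines):
    """Yield each SDF record as a list of lines ending with the '$$$$' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []
    # Trailing lines with no '$$$$' terminator are not treated as a record.


def reservoir_sample(items, k, seed=None):
    """Pick k items uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(items):
        if i < k:
            sample.append(item)
        else:
            j = rng.randint(0, i)  # uniform over 0..i inclusive
            if j < k:
                sample[j] = item  # item i is kept with probability k/(i+1)
    return sample


def sample_sdf(in_path, out_path, k=1000, seed=42):
    """Write k randomly chosen records from in_path to out_path; return count."""
    with open(in_path, encoding="utf-8", errors="replace") as src:
        chosen = reservoir_sample(iter_sdf_records(src), k, seed)
    with open(out_path, "w", encoding="utf-8") as dst:
        for record in chosen:
            dst.writelines(record)
    return len(chosen)
```

Only k records are held in memory at any time. Fixing the seed makes the sample reproducible, which helps when rerunning the same benchmark on different machines.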
Generating conformations
The next test was to generate conformers using a genetic algorithm. This is a stochastic conformer generator that produces diverse conformers on either an energy or an RMSD basis. Here the --score rmsd option selects the RMSD basis.
The command used was:
```shell
time obabel myfile.sdf -O ga_conformers.sdf --conformer --nconf 100 --score rmsd --writeconformers
```
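RMSD measures how far apart the matching atoms of two conformers are. Scoring on an RMSD basis therefore favours conformers that differ from one another. The sketch below illustrates only that measure, combined with a simple greedy diversity filter. It is not Open Babel's genetic algorithm. It also assumes identical atom ordering and pre-aligned coordinates, whereas real tools usually superimpose conformers before comparing them.

```python
import math


def rmsd(coords_a, coords_b):
    """Plain coordinate RMSD between two conformers of the same molecule.

    Assumes atoms are listed in the same order and the conformers are
    already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("conformers must have the same, non-zero atom count")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


def pick_diverse(conformers, threshold):
    """Greedily keep conformers at least `threshold` RMSD from every kept one."""
    kept = []
    for conf in conformers:
        if all(rmsd(conf, other) >= threshold for other in kept):
            kept.append(conf)
    return kept
```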
Filter based on a calculated property
The command line option --filter restricts conversion to only those molecules that meet specified chemical (and other) criteria, which makes it easy to select a subset of molecules. The information used can come either from properties imported with the molecule, such as data fields in an SDF file, or from calculations made by OpenBabel on the molecule. The test was run on 10,000 random structures from ZINC.
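The command used was:
```shell
time obabel -isdf ZincRandom10K.sdf -osdf -O filtered.sdf --filter "MW<300"
```
OpenBabel reported "4495 molecules converted", so 4,495 of the 10,000 structures passed the MW < 300 filter.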
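In the command above, MW is a molecular-weight value that OpenBabel can calculate itself. The other route, filtering on a property already stored in the file, comes down to reading SDF data items (the `> <name>` lines after `M  END`). The pure-Python sketch below covers only that stored-property case. The field name "MW" used in the test is an assumption about what an input file might contain.

```python
def iter_sdf_records(lines):
    """Yield each SDF record as a list of lines ending with the '$$$$' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []


def sdf_data_fields(record):
    """Return {name: value} for the '> <name>' data items after 'M  END'."""
    fields = {}
    in_data = False  # data items only follow the molfile's 'M  END' line
    name = None
    for line in record:
        text = line.rstrip("\r\n")
        if not in_data:
            in_data = text.startswith("M  END")
            continue
        if text.strip() == "$$$$":
            break
        if text.startswith(">"):
            start = text.find("<")
            end = text.find(">", start + 1)
            name = text[start + 1:end] if 0 <= start < end else None
            if name is not None:
                fields[name] = ""
        elif name is not None:
            if not text.strip():
                name = None  # a blank line closes the current data item
            elif fields[name]:
                fields[name] += "\n" + text  # multi-line value
            else:
                fields[name] = text
    return fields


def filter_records(records, field, max_value):
    """Keep records whose numeric `field` value is below max_value."""
    kept = []
    for record in records:
        value = sdf_data_fields(record).get(field)
        try:
            if value is not None and float(value) < max_value:
                kept.append(record)
        except ValueError:
            pass  # a non-numeric value fails the filter
    return kept
```

Records missing the field, or holding a non-numeric value, are dropped. This is one reasonable choice for a sketch, not necessarily how OpenBabel treats them.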
Generating a Fastsearch file
OpenBabel provides a format called fs, a FastSearch index, which should be used when searching large datasets (such as ChEMBL) for molecules similar to a particular query. There are faster ways of searching (such as using a chemical database), but FastSearch is convenient and should give reasonable performance for most people. Generating the initial index takes a while, but subsequent searches are very fast.
The command used was:
```shell
time obabel -isdf 'ChEMBLsubset.sdf' -ofs -O 'chemblefastsearch.fs'
```
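An index like this speeds up searching because it stores a fingerprint (a fixed-length bit vector) for each molecule, so most comparisons become cheap bit operations instead of full structure comparisons. The sketch below uses small Python integers as toy fingerprints to show the two ideas involved. The first is Tanimoto similarity. The second is a substructure screen: a molecule can only contain the query if its fingerprint has every bit that the query's fingerprint has. It does not read the on-disk .fs format, and the function names are illustrative.

```python
def popcount(bits):
    """Number of set bits in an integer bit vector."""
    return bin(bits).count("1")


def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared bits / bits set in either fingerprint."""
    union = popcount(fp_a | fp_b)
    if union == 0:
        return 0.0
    return popcount(fp_a & fp_b) / union


def passes_substructure_screen(query_fp, target_fp):
    """True if the target has every query bit, so a full match is possible."""
    return (query_fp & target_fp) == query_fp


def similar(query_fp, index, cutoff):
    """Return (name, score) pairs with Tanimoto >= cutoff, best first."""
    hits = [(name, tanimoto(query_fp, fp)) for name, fp in index.items()]
    hits = [hit for hit in hits if hit[1] >= cutoff]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)
```

The screen only rules molecules out. Candidates that pass it still need a full substructure match, which is why it speeds up searching without changing the results.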
The timings are shown in the table below.
Task | Intel | M1 Max | M2 Air |
---|---|---|---|
File conversion | 5 min 45 s | 2 min 52 s | 2 min 37 s |
Convert to 3D | 2 min 15 s | 1 min 41 s | |
Generate conformations | 27 s | 14 s | 14 s |
Filter | 3.8 s | 1.7 s | 1.6 s |
Generate fs | 12.8 min | 6.6 min | 6.6 min |
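Expressing the table as speed-ups relative to the Intel machine makes the comparison easier to read. The sketch below re-enters the table's timings in seconds and divides the Intel time by each Apple Silicon time. The blank M2 Air entry for 3D conversion is kept as None.

```python
# Timings from the table, in seconds; None marks the blank M2 Air cell.
TIMINGS = {
    "File conversion": (5 * 60 + 45, 2 * 60 + 52, 2 * 60 + 37),
    "Convert to 3D": (2 * 60 + 15, 1 * 60 + 41, None),
    "Generate conformations": (27, 14, 14),
    "Filter": (3.8, 1.7, 1.6),
    "Generate fs": (768, 396, 396),  # 12.8 min and 6.6 min
}


def speedups(timings):
    """Return {task: (M1 Max, M2 Air)} speed-ups relative to the Intel time."""
    result = {}
    for task, (intel, m1_max, m2_air) in timings.items():
        result[task] = tuple(
            None if t is None else round(intel / t, 2) for t in (m1_max, m2_air)
        )
    return result


def fmt(ratio):
    """Format a speed-up such as 2.01 as '2.01x', or 'n/a' if missing."""
    return "n/a" if ratio is None else f"{ratio:.2f}x"


for task, (m1, m2) in speedups(TIMINGS).items():
    print(f"{task:<24} M1 Max: {fmt(m1):>6}  M2 Air: {fmt(m2):>6}")
```

On these figures both Apple Silicon machines are roughly twice as fast as the Intel machine for most tasks. The smallest gain is for 3D generation, about 1.3x on the M1 Max.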
OpenBabel is single-threaded, so these commands do not test multi-core performance.
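Because obabel uses a single core, one way to put a multi-core machine to work on a large conversion is to split the input and run several obabel processes at once. This approach was not tested here. The sketch below deals SDF records round-robin into chunk files and then launches one obabel process per chunk, using the same flags as the file conversion test. The chunk filenames are hypothetical, and obabel is assumed to be on PATH. Round-robin splitting also means the combined output is not in the original order.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def iter_sdf_records(lines):
    """Yield each SDF record as a list of lines ending with the '$$$$' line."""
    record = []
    for line in lines:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []


def assign_chunks(lines, n_chunks):
    """Yield (chunk_index, record) pairs, dealing records round-robin."""
    for i, record in enumerate(iter_sdf_records(lines)):
        yield i % n_chunks, record


def write_chunks(lines, chunk_paths):
    """Stream records into the chunk files without loading the whole input."""
    handles = [open(path, "w", encoding="utf-8") for path in chunk_paths]
    try:
        for index, record in assign_chunks(lines, len(handles)):
            handles[index].writelines(record)
    finally:
        for handle in handles:
            handle.close()


def obabel_commands(chunk_paths):
    """One SDF-to-SMILES obabel command per chunk, using the post's flags."""
    return [
        ["obabel", "-isdf", path, "-osmiles", "-O", path.rsplit(".", 1)[0] + ".smiles"]
        for path in chunk_paths
    ]


def run_parallel(commands, workers):
    """Run the obabel processes concurrently and return their exit codes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


# Example (hypothetical chunk names):
#   paths = [f"chunk_{i}.sdf" for i in range(8)]
#   with open("ChEMBLsubset.sdf", encoding="utf-8", errors="replace") as src:
#       write_chunks(src, paths)
#   print(run_parallel(obabel_commands(paths), workers=8))
```

Threads are sufficient here because the heavy work happens in the child obabel processes, not in Python.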
The full list of tools tested is at https://macinchem.co.uk/software-reviews/cheminformatics-and-compchem-on-apple-silicon/
Last updated 14 August 2022