Indexing the internet in a chemically intelligent manner
Some time ago I described a Safari extension that uses chemicalize.org to index a web page for chemical content.
For an example of a “chemicalized” page have a look at this
All molecules mentioned in the page become links that reveal the structure on mouseover. The service also adds a handy ribbon of structures across the top of the page, which is useful for quick scanning and navigation.

A recent publication by Southan and Stracz, "Extracting and connecting chemical structures from text sources using chemicalize.org" (Journal of Cheminformatics 2013, 5:20), describes how this information is being used to provide better indexing of the internet in a chemically intelligent manner. They include a demonstration of a number of web pages and document sources that were indexed in this manner, including PDFs from the patent office.
chemicalize.org now has 15,000 unique visitors a month, which is huge growth compared to spring 2012. These users contribute to the database every day, keeping it up to date and adding new areas of interest. The database today contains 327,000 structures that were converted from 545,000 names and identifiers coming from 367,000 web pages.
These structures and links have now been uploaded to PubChem. If you are interested in what sort of molecules have been registered via chemicalize.org, you can browse them on the PubChem website.
Marvin Updated
Marvin 5.12.2 has been released with a couple of bug fixes.
Molecule Representation
- Converting explicit hydrogens to implicit ones removed stereo centers that had no explicit hydrogen ligand.
Import/Export SMILES/SMARTS
- Non-ring bond information was imported as query strings from SMARTS.
- After SMARTS import, atoms that had no explicit aromatic property but had an aromatic bond were given a query aromaticity property.
Marvin 5.12.1 released
New features and improvements
Import/Export
- S orbitals and oval-shaped s or p orbitals are imported from CDX/CDXML.
Bugfixes
Painting
- Charge symbol on carbon atoms was missing when the atom numbers were visible and the display of carbon atom labels was turned off.
- When two atoms had more than one electron flow arrow between them, the arrows overlapped each other.
- The second electron flow arrow started from a wrong position when a single electron and an electron pair flow arrow started from an atom which had both a lone pair and a radical.
Editing
- Atom Lists and NOT Lists could not be created by typing atomic symbols separated with commas (e.g., "f,br,cl" or "!f,br,cl").
Import/Export
- MRV and CML export incorrectly wrote out characters that are not supported by the character set.
- SDF files with an invalid header could not be imported.
- Deuterium and tritium isotopes were converted to simple hydrogen atoms if a molecule was exported to ChemAxon compressed MOL format (CSMOL).
- MolExporter.exportToObject() added an extra newline to SMILES.
- Nitrogens connecting two aromatic rings had a radical after import if the nitrogen was bracketed in the SMILES representation.
- Absolute stereo flag was missing during InChI export/import and InChIKey export.
Molecule Representation
- The number of added implicit hydrogen atoms was incorrect in some cases for positively charged sulfur atoms.
Calculation
- After canonical tautomer generation, the "double cis or trans" bond type information might have been lost in certain cases.
Accessing the Chemical Identifier Resolver from Marvin
With the release of Marvin 5.12.0, users can now also access a custom web service to extend name-to-structure conversion, for instance with corporate IDs or common name dictionaries. I thought it might be useful to have a look at this new feature; however, I don’t have a corporate web service that I can use. This is where the Chemical Identifier Resolver (CIR) comes into play.
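To give a feel for what a resolver like CIR does, here is a minimal Python sketch that builds a lookup URL and fetches the result. It assumes the public CIR URL scheme of `/chemical/structure/<identifier>/<representation>` on cactus.nci.nih.gov; the function names `cir_url` and `resolve` are my own, and nothing here shows how Marvin itself is configured to call the service.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public Chemical Identifier Resolver.
CIR_BASE = "https://cactus.nci.nih.gov/chemical/structure"


def cir_url(identifier, representation="smiles"):
    """Build a CIR lookup URL for a name or ID (hypothetical helper)."""
    # Percent-encode the identifier so names with spaces or slashes are safe.
    return f"{CIR_BASE}/{quote(identifier, safe='')}/{representation}"


def resolve(identifier, representation="smiles", timeout=10):
    """Fetch the requested representation as text (needs network access)."""
    with urlopen(cir_url(identifier, representation), timeout=timeout) as response:
        return response.read().decode("utf-8").strip()
```

Calling `resolve("aspirin")` should return a SMILES string if the service is reachable; an unknown name results in an HTTP error that the caller would need to handle.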
Marvin 5.12.0 has been released
Marvin 5.12.0 has been released. This release has an important update for Mac OS X users: image-to-structure conversion using OSRA and text OCR for scanned documents are now supported on Mac OS X.

In addition, the Structure Checker configuration can be accessed via URL from MarvinSketch, from the Structure Checker application, and via a Structure Checker API call. Users can now also access a custom web service to extend name-to-structure conversion, for instance with corporate IDs or common name dictionaries. Typing abbreviated group names is now case sensitive. When an unrecognised format is pasted onto the canvas, an "Import as" dialog appears and the user can choose the correct format. Structures can be copied in "Daylight SMARTS" and "ChemAxon SMARTS (CXSMARTS)" formats. The MMFF94 forcefield has been added to Generate3D and can also be used in the Conformer Plugin and Molecular Dynamics Plugin.
The complete release notes are available here
Marvin Updated
Marvin from ChemAxon has been updated to version 5.11.
New features and improvements
- Image I/O
- Recently added rendering options can now be set from the MolPrinter API (Absolute label visibility, Peptide display type, R-group visibility, Any bond style, Lone pair rendering style, Charge rendering style).
- MSketch GUI
- A new "imageImportServiceURL=[URL]" program argument was added to the MarvinSketch application.
- MSketch applet
- A new "imageImportServiceURL" was added as an applet parameter.
- Graphical object handling
- When an MMidPoint object was set as an end point for an MPolyLine, getting the MMidPoint location caused a StackOverflowError.
- Import/Export
- Document to Structure (d2s)
- Names broken over two lines with a hyphen (-) are now recognized.
- Names followed by a superscript text, for instance, a reference or footnote number (e.g., "aspirin11") are now recognized.
- Name to Structure (n2s)
- In some cases, such as "4-methylthiophenylmethyl", there is an ambiguity whether "thiophenyl" refers to a compound derived from thiophene or thiophenol. Name to Structure now gives priority to the thiophenol related compound interpretation; though, "thiophenyl" by itself will still be supported as thiophene derivatives.
- Painting
- If R-group visibility was turned off and any of the bonds had label(s) to paint, an ArrayIndexOutOfBounds exception was thrown.
- Image I/O
- Display parameters of charge, lone pair, peptide could not be set for molexporter. The default values were charge "in a circle", lone pair "as line", peptide "three letter format". Image copy also used these values.
- Import/Export
- MOL, SDF, RXN, RDF
- Aliphatic query properties of atoms with query string were not read from MDL formats.
- After importing Extended MOL files that contain superatom S-groups the orientation of S-groups could be changed.
- Atoms containing both aliphatic and unsaturated query properties were exported incorrectly to MDL formats.
- SDF import returned structure with incorrect S-group embedding.
- SMILES/SMARTS
- SMILES T* option did not export all SDF fields, but only those which appeared in the first molecule.
- Molecule Representation
- S-groups
- Two superatom S-groups that were each other's parents caused an infinite loop. In these cases, a java.lang.IllegalStateException is now thrown.
- Valence Check
- The phosphorus atom in hexafluorophosphate was not accepted by Valence Check.
- Stereochemistry
- Cloning of BicyclostereoDescriptor in RxnMolecules threw java.lang.ArrayIndexOutOfBoundsException.
- Clean 2D
- Terminal methyl-group in phosphate-ester was cleaned incorrectly.
- Clean2D could not handle condensed adamantane derivatives.
- Calculations
- Other (HBDA, Huckel Analysis, ...)
- The --pH command line option did not work in hydrogen bond acceptor-donor calculation.
- Structure Checker
- If a fixer action was not defined, the default fixer was not applied in the structurechecker command line tool.
Scripting Vortex 3
ChemAxon's Calculator (cxcalc) is a really useful command line program in Marvin Beans and JChem that performs chemical calculations using calculator plugins. There are a lot of calculations provided by ChemAxon (e.g. charge, pKa, logP, logD), and others can be added by writing custom plugins. Perhaps one of the most useful is the ability to calculate the acidic and basic pKa. Calculation of pKa is essential to get a reasonable hold on the LogD of a molecule. LogD is probably the most critical physicochemical property in drug discovery: it has a major influence on absorption, cell penetration, metabolism, CYP450 inhibition and induction, P-gp transporter activity and activity at the hERG channel, and is often a critical component of any structure-activity relationship.
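To see why pKa matters so much for LogD, consider the textbook approximation for a molecule with a single ionisable group, assuming that only the neutral form partitions into the organic phase. The sketch below is that simplified relationship only, not ChemAxon's method, which may handle multiple ionisation sites and ion partitioning differently; the function name is hypothetical.

```python
import math


def logd_monoprotic(logp, pka, ph, acid=True):
    """Approximate logD for one ionisable group (neutral-form partitioning only).

    acid=True  : logD = logP - log10(1 + 10**(pH - pKa))
    acid=False : logD = logP - log10(1 + 10**(pKa - pH))
    """
    exponent = (ph - pka) if acid else (pka - ph)
    return logp - math.log10(1.0 + 10.0 ** exponent)


# An acid with pKa 4.4 at pH 7.4 is almost fully ionised,
# so logD falls roughly three log units below logP.
print(round(logd_monoprotic(2.0, 4.4, 7.4, acid=True), 2))
```

A one-unit error in the predicted pKa therefore shifts the estimated LogD by up to one log unit whenever the molecule is mostly ionised at the pH of interest.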
These scripts make use of cxcalc to generate data columns in Vortex.
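As a rough illustration of driving cxcalc from a script, the Python sketch below builds a command line and runs it with `subprocess`. This is not one of the Vortex scripts themselves. It assumes `cxcalc` is on the PATH and accepts the form `cxcalc <calculation> [options] <input file>`; the helper names and the file name `mols.smi` are hypothetical, and any option you pass should be checked against the help output of your installed version.

```python
import subprocess


def cxcalc_command(calculation, input_file, options=()):
    """Build the argument list for a cxcalc call (hypothetical helper)."""
    return ["cxcalc", calculation, *options, input_file]


def run_cxcalc(calculation, input_file, options=()):
    """Run cxcalc and return its standard output as text.

    Raises subprocess.CalledProcessError if cxcalc exits with an error.
    """
    result = subprocess.run(
        cxcalc_command(calculation, input_file, options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Example: the command for a logD calculation at pH 7.4
# (the --pH option is an assumption to verify locally).
print(cxcalc_command("logd", "mols.smi", ["--pH", "7.4"]))
```

The text returned by `run_cxcalc` can then be parsed and attached to the table as new columns.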