Calculating molecular properties using the ChemAxon cxcalc

ChemAxon’s Calculator (cxcalc) is a really useful command line program in Marvin Beans and JChem that performs chemical calculations using calculator plugins. ChemAxon provides many calculations (e.g. charge, pKa, logP, logD), and others can be added by writing custom plugins. Perhaps one of the most useful is the ability to calculate the acidic and basic pKa. Calculation of pKa is essential to get a reasonable hold on the LogD of a molecule. LogD is probably the most critical physicochemical property in drug discovery: it has a major influence on absorption, cell penetration, metabolism, CYP450 inhibition and induction, P-gp transporter activity and activity at the hERG channel, and is often a critical component of any structure-activity relationship.
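
To make the link between pKa and LogD concrete, here is a minimal sketch of the textbook approximation for a compound with a single ionisable group, assuming that only the neutral form partitions into the organic phase. This is not how cxcalc computes logD (its model handles multiple ionisation sites), and the function names and numeric values are purely illustrative.

```python
import math


def logd_monoprotic_acid(logp, pka, ph):
    """logD of a monoprotic acid, assuming only the neutral form partitions."""
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def logd_monoprotic_base(logp, pka, ph):
    """logD of a monoprotic base; pka is that of the conjugate acid."""
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))


# Hypothetical values chosen only for illustration
print(logd_monoprotic_acid(logp=2.0, pka=4.4, ph=7.4))
print(logd_monoprotic_base(logp=2.0, pka=9.4, ph=7.4))
```

An acid with a pKa three units below the pH loses roughly three log units relative to its logP, which is why a poor pKa estimate translates directly into a poor LogD estimate.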

Calculator performs plugin calculations in a uniform way: it processes general parameters referring to input, output, and SDF file tag names for storing calculation results, as well as plugin-specific parameters that are different for each plugin.

General Options

cxcalc -h, --help              this help message,
                               list of available calculations
cxcalc <plugin> -h, --help     plugin specific help message
-o, --output <filepath>        output file path (default: stdout)
-t, --tag                      name of the SDFile tag to store the
                               calculation results
                               default tag name: see plugin help
-i, --id <tag name|format>     SDFile tag that stores the molecule ID
                               if no such tag exists in the input molecule
                               then molecule ID is the molecule itself
                               converted to the specified format
                               (default: ID = molecule index)
-N, --do-not-display <i|h|ih>  do not display molecule ID and/or
                               table header (in table output form):
                               i  - no molecule ID
                               h  - no table header
                               ih - neither molecule ID nor table header
-S, --sdf-output               SDF output with results in SDF tags
-M, --mrv-output               result molecule output in MRV format
                               (if neither -S nor -M is specified then
                               plugin results are written in table form)
-g, --ignore-error             continue with next molecule on error
-v, --verbose                  print calculation warnings to the console

The general format from the command line to calculate the pKa is

'/Applications/ChemAxon/MarvinBeans/bin/cxcalc' /Users/username/Desktop/temp.sdf pka -b 1 -a 1

You will need to check the path to cxcalc, as it seems to vary depending on the version.
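
Since the install location varies, one option is to test a few likely locations before running anything. The sketch below is only an illustration: the helper name `find_cxcalc` is my own, and the second candidate path is a hypothetical example rather than a documented location.

```python
import os


def find_cxcalc(candidates):
    """Return the first candidate path that is an executable file, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


# Candidate locations are assumptions; only the first one appears in this post.
CANDIDATES = [
    '/Applications/ChemAxon/MarvinBeans/bin/cxcalc',
    '/Applications/ChemAxon/JChem/bin/cxcalc',  # hypothetical alternative
]

cxcalc = find_cxcalc(CANDIDATES)
if cxcalc is None:
    print('cxcalc not found; edit CANDIDATES to match your installation')
else:
    print('Using ' + cxcalc)
```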

Here -a 1 and -b 1 request the first acidic and the first basic pKa respectively. The tab-delimited text output looks like this

id  apKa1   bpKa1   atoms
1   3.61    4.97    5,2
2   12.07   -4.54   1,5
3   16.30   9.24    3,9
4   10.96   6.43    12,2
5   8.83        9

Here apKa1 is the most acidic pKa, bpKa1 is the most basic pKa, and atoms lists the atom numbers that are ionised. The Vortex script is shown below. The first part gets the path to the SDF file as before and constructs the cxcalc command. The output is then parsed, using \n to separate each line and \t to separate each value on each line. The first line contains the column names, which are used to create the Vortex columns; the remaining lines contain the data, which is used to populate the table.

import sys

# Uncomment the following 2 lines if running in console
#vortex = console.vortex
#vtable = console.vtable

sys.path.append(vortex.getVortexFolder() + '/modules/jythonlib')

import subprocess

# Get the path to the currently open sdf file
sdfFile = vortex.getFileForPropertyCalculation(vtable)

# Run cxcalc on the file
# '/Applications/ChemAxon/MarvinBeans/bin/cxcalc' /Users/username/Desktop/temp.sdf pka -b 1 -a 1
p = subprocess.Popen(['/Applications/ChemAxon/MarvinBeans/bin/cxcalc', sdfFile, 'pka', '-b', '1', '-a', '1'], stdout=subprocess.PIPE)
output = p.communicate()[0]

# Create new columns in table if needed
lines = output.split('\n')
colName = lines[0].split('\t')
for c in colName:
    column = vtable.findColumnWithName(c, 1)
vtable.fireTableStructureChanged()

# Collect the first field of any two-field line (not used further below)
keys = []
for i in lines:
    words = i.split('\t')
    if len(words) == 2:
        keys.append(words[0])

# Parse the output: one data line per table row, one value per column
rows = lines[1:len(lines)]
# Stop at the shorter of the table and the output to avoid an IndexError
for r in range(0, min(vtable.getRealRowCount(), len(rows))):
    vals = rows[r].split('\t')
    for j in range(0, len(vals)):
        column = vtable.findColumnWithName(colName[j], 0)
        column.setValueFromString(r, vals[j])
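
The parsing step can also be tried outside Vortex. The following is a minimal sketch, assuming the tab-delimited layout shown earlier; `parse_cxcalc_table` is a name I have made up for illustration, and the sample string reuses two rows from the example output, with the missing bpKa1 value represented as an empty field.

```python
def parse_cxcalc_table(output):
    """Split cxcalc table output into (column names, list of row dicts)."""
    # Drop blank lines, e.g. the empty string left after the final newline
    lines = [ln for ln in output.splitlines() if ln.strip()]
    if not lines:
        return [], []
    header = lines[0].split('\t')
    rows = []
    for line in lines[1:]:
        values = line.split('\t')
        # Pad short rows so every column name has a value
        values += [''] * (len(header) - len(values))
        rows.append(dict(zip(header, values)))
    return header, rows


sample = (
    'id\tapKa1\tbpKa1\tatoms\n'
    '1\t3.61\t4.97\t5,2\n'
    '5\t8.83\t\t9\n'
)
header, rows = parse_cxcalc_table(sample)
print(header)
print(rows[1])
```

Keeping the empty field as an empty string means a molecule with no basic centre still lines up with the right columns.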

The result can be seen in the image below.

cxcalc can also calculate other properties such as logP, logD, mass, acceptorcount, donorcount, polarsurfacearea, and rotatablebondcount, and the script can be modified to calculate all of these. Simply changing the part of the script that calls cxcalc, as shown below, calculates a new set of properties. Because the output follows a standard format, the rest of the script, which parses the output to generate the column headings and populate the data fields, does not need to be altered.

# Run cxcalc on the file
# '/Applications/ChemAxon/MarvinBeans/bin/cxcalc' /Users/swain/Desktop/temp.sdf logp logd -H 7.4
p = subprocess.Popen(['/Applications/ChemAxon/MarvinBeans/bin/cxcalc', sdfFile, 'logp', 'logd', '-H', '7.4', 'mass', 'acceptorcount', 'donorcount', 'polarsurfacearea', 'rotatablebondcount'], stdout=subprocess.PIPE)
output = p.communicate()[0]

The results can be seen in the table below.

One of the really nice benefits of having command line tools that give results in a consistent format is that it becomes trivial to add additional properties. Simply add them to the command below; the output will be parsed and additional columns added to Vortex without the need to modify the rest of the script.

p = subprocess.Popen(['/Applications/ChemAxon/MarvinBeans/bin/cxcalc', sdfFile, 'logp', 'logd', '-H', '7.4', 'mass', 'acceptorcount', 'donorcount', 'polarsurfacearea', 'rotatablebondcount'], stdout=subprocess.PIPE)
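
If the list of properties changes often, it may be easier to keep it as data and assemble the argument list from it. This is just one way to organise it: `build_cxcalc_command` is a hypothetical helper, and the file name `temp.sdf` stands in for whatever Vortex returns as sdfFile.

```python
def build_cxcalc_command(cxcalc_path, sdf_file, calculations):
    """Build the argument list for subprocess.Popen.

    calculations is a list of (plugin name, list of plugin options) pairs.
    """
    command = [cxcalc_path, sdf_file]
    for name, options in calculations:
        command.append(name)
        command.extend(options)
    return command


# Same calculations as the command above; add or remove entries as needed
calculations = [
    ('logp', []),
    ('logd', ['-H', '7.4']),
    ('mass', []),
    ('acceptorcount', []),
    ('donorcount', []),
    ('polarsurfacearea', []),
    ('rotatablebondcount', []),
]

command = build_cxcalc_command(
    '/Applications/ChemAxon/MarvinBeans/bin/cxcalc', 'temp.sdf', calculations)
print(' '.join(command))
```

The resulting list can be passed straight to subprocess.Popen in place of the literal list in the script.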

Updated Script

I often need to simply classify molecules as acid, base, neutral or zwitterion, so I’ve updated the script to create another column containing a text annotation. First we check whether each pKa exists, then score it based on the calculated acidic and basic pKa values: an acidic pKa below 7.0 scores as an acid, and a basic pKa above 7.5 scores as a base. We then annotate each molecule based on the resulting scores.

# Calculate abnz
# check if there is no pka

colapka = vtable.findColumnWithName('apKa1', 0)
colbpka = vtable.findColumnWithName('bpKa1', 0)
# Look up the annotation column once, before the loop
column = vtable.findColumnWithName('ABNZ', 1)
rows = vtable.getRealRowCount()
for r in range(0, int(rows)):
    # Default to 0 so a missing pKa, or one exactly on the cutoff,
    # never leaves a score unset or carried over from the previous row
    aScore = 0
    bScore = 0
    if colapka.isDefined(r):
        taskaID = colapka.getValue(r)
        if taskaID < 7.0:
            aScore = 1
    if colbpka.isDefined(r):
        taskbID = colbpka.getValue(r)
        if taskbID > 7.5:
            bScore = 1
    if aScore == 1 and bScore == 1:
        TheScore = 'Zwitterion'
    elif aScore == 1 and bScore == 0:
        TheScore = 'Acid'
    elif aScore == 0 and bScore == 1:
        TheScore = 'Base'
    else:
        TheScore = 'Neutral'
    column.setValueFromString(r, TheScore)
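
The classification rule itself does not depend on Vortex, so it can be pulled out and checked on its own. A minimal sketch, assuming the same cutoffs as the script (7.0 for acids, 7.5 for bases) and using None for a missing pKa; the function name `classify_abnz` is my own, and the test values come from the example pKa table earlier in this post.

```python
def classify_abnz(apka, bpka, acid_cutoff=7.0, base_cutoff=7.5):
    """Label a molecule from its strongest acidic and basic pKa (None if absent)."""
    is_acid = apka is not None and apka < acid_cutoff
    is_base = bpka is not None and bpka > base_cutoff
    if is_acid and is_base:
        return 'Zwitterion'
    if is_acid:
        return 'Acid'
    if is_base:
        return 'Base'
    return 'Neutral'


# (apKa1, bpKa1) pairs from the example output; None marks a missing value
examples = [(3.61, 4.97), (12.07, -4.54), (16.30, 9.24), (10.96, 6.43), (8.83, None)]
for apka, bpka in examples:
    print(apka, bpka, classify_abnz(apka, bpka))
```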

The scripts can be downloaded from here:

The original pKa calculation: chemaxon_pka.vpy.zip

Updated script to include acid/base/neutral/zwitterion annotation: chemaxon_pka2.vpy.zip

Script for logP etc.: chemaxon_logP_logD_etc.vpy.zip
