TMAP on Apple Silicon

tmap is a very fast visualisation library for large, high-dimensional data sets. It was published in 2020 and the code is available on GitHub (https://github.com/reymond-group/tmap). In theory it should be possible to install it using conda, but it seems that arm64 binaries for macOS are not being generated (https://github.com/reymond-group/tmap/issues/55). Fortunately, it is possible to build the binary locally on Apple Silicon.
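
Whether the local build is needed depends on the architecture your Python interpreter reports. The snippet below is a minimal sketch of that check using only the standard library; the helper name `is_native_arm64_mac` is my own and not part of tmap. A Python running under Rosetta reports `x86_64` rather than `arm64`, so it would not count as native here.

```python
import platform


def is_native_arm64_mac(system: str, machine: str) -> bool:
    """Return True when the values describe a native Apple Silicon interpreter."""
    return system == "Darwin" and machine == "arm64"


# Report what the current interpreter sees.
system, machine = platform.system(), platform.machine()
print(f"system={system} machine={machine}")
if is_native_arm64_mac(system, machine):
    print("Native arm64 macOS Python: tmap will probably need a local build.")
else:
    print("Not a native arm64 macOS Python.")
```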

Detailed instructions and a shell script are available here: https://gist.github.com/mjwen/0548a685412881f8802afcb31552b9f1

I’ve reproduced the script below and can confirm it all works fine on my MacBook Pro M1 Max. You do need to have conda installed (https://conda.io/projects/conda/en/latest/user-guide/install/macos.html).

#!/usr/bin/env bash

# This script builds and installs TMAP (https://github.com/reymond-group/tmap). 
# In particular, it targets ARM macOS, for which no official binary is provided.

# Requires:
# Conda is needed to obtain the prerequisites: cmake and openmp. 

# To install:
# Place this script in a directory where you want to store the TMAP repo, then
# $ bash install_tmap.sh
 
# To use:
# This script installs tmap in a conda environment named `tmap-env`. 
# After installing, activate this conda environment to use it. 


# Achieved via the below steps:
# 0. clone the repo
# 1. create a conda env named `tmap-env`
# 2. get `cmake` and `llvm-openmp` using conda
# 3. compile OGDF, which is shipped with tmap
# 4. export LIBOGDF_INSTALL_PATH
# 5. install tmap


# 0
GIT_REPO=https://github.com/reymond-group/tmap.git
git clone ${GIT_REPO}

# 1 and 2
source ${CONDA_PREFIX}/etc/profile.d/conda.sh # make sure conda activate works in a bash script
conda env remove --name tmap-env # remove env if it exists
conda create --name tmap-env -y
conda activate tmap-env
conda install -c conda-forge pip cmake llvm-openmp -y

# 3
cd tmap
pushd ogdf-conda/src/
mkdir build
mkdir installed
cd build
cmake ../ -DCMAKE_INSTALL_PREFIX=../installed
make -j 10
make install
popd

# 4
export LIBOGDF_INSTALL_PATH=$(pwd)/ogdf-conda/src/installed

# 5
# -I${CONDA_PREFIX}/include to use omp.h
CXXFLAGS=-I${CONDA_PREFIX}/include  pip install -e .
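
Before running the script, it may be worth checking that the command-line tools it calls are actually on your PATH. This is a small sketch using `shutil.which`; the helper `missing_tools` and the list of tools are my own reading of the script (it calls `git`, `conda` and `make` directly, while `cmake` is installed into the environment by the script itself), so adjust the list to suit.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


# Tools the install script invokes directly (an assumption based on reading it).
required = ["git", "conda", "make"]
missing = missing_tools(required)
print("Missing tools:", ", ".join(missing) if missing else "none")
```

The `which` parameter is only there so the lookup can be swapped out when testing; in normal use the default is fine.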

Put the install_tmap.sh script in the folder where you want to install TMAP, then type

bash install_tmap.sh

Once the script has completed, you can check that the env has been created by listing the conda environments

(base) ChrisM1MBP ~ % conda env list
# conda environments:
#
base                  *  /Users/chrisswain/miniconda3
chemprop                 /Users/chrisswain/miniconda3/envs/chemprop
memfast                  /Users/chrisswain/miniconda3/envs/memfast
myRDKitenv               /Users/chrisswain/miniconda3/envs/myRDKitenv
tmap-env                 /Users/chrisswain/miniconda3/envs/tmap-env
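
The same check can be scripted. The sketch below parses the output of `conda env list --json`, which returns the environment paths under an `envs` key, and looks for a path ending in `tmap-env`. The helper names are my own, and note that the base environment shows up under its folder name (for example `miniconda3`) rather than as `base`.

```python
import json
import os
import shutil
import subprocess


def env_names(conda_json: str) -> list:
    """Return the final path component of each environment path."""
    paths = json.loads(conda_json).get("envs", [])
    return [os.path.basename(path.rstrip("/")) for path in paths]


def tmap_env_exists(conda_json: str, name: str = "tmap-env") -> bool:
    """Return True if an environment called `name` is listed."""
    return name in env_names(conda_json)


# Only query conda when it is available on PATH.
if shutil.which("conda"):
    result = subprocess.run(
        ["conda", "env", "list", "--json"], capture_output=True, text=True
    )
    if result.returncode == 0 and result.stdout.strip():
        print("tmap-env found:", tmap_env_exists(result.stdout))
```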

Activate the conda environment by typing:

conda activate tmap-env
(tmap-env) chrisswain@ChrisM1MBP ~ %

I also installed RDKit, faerun and matplotlib for plotting.

conda install -c conda-forge rdkit
pip install faerun
pip install matplotlib
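
A quick way to confirm that everything is visible from inside the activated environment is to ask Python whether each package can be located. This is a minimal sketch with a helper name of my own choosing; I have included `mhfp` and `pandas` in the list only because the test example imports them, so treat the list as an assumption about what you need.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


# Top-level modules imported by the test example, plus the plotting extras.
wanted = ["tmap", "rdkit", "faerun", "matplotlib", "mhfp", "pandas"]
missing = missing_modules(wanted)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates a module without importing it, so this will not catch a package that is present but fails to load.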

I tested the installation with one of the examples on the tmap GitHub page.

import pandas as pd
import tmap as tm
from mhfp.encoder import MHFPEncoder
from faerun import Faerun

# Loading data from:
# PUBCHEM_BIOASSAY: Navigating the Kinome.
# (https://www.ebi.ac.uk/chembl/assay_report_card/CHEMBL1963834/)
# File is provided on GitHub in examples
df = pd.read_csv("CHEMBL25-chembl_activity-X98QJiCI4eAUAQSKQevT44ZjymoCjs8alCsnJir8aUU=.csv.gz", sep=";")

# The number of permutations used by the MinHashing algorithm
perm = 512

# Initializing the MHFP encoder with 512 permutations
enc = MHFPEncoder(perm)

# Initialize the LSH Forest
lf = tm.LSHForest(perm)

# Create MHFP fingerprints from SMILES
# The fingerprint vectors have to be of the tm.VectorUint data type
fingerprints = [tm.VectorUint(enc.encode(s)) for s in df["Smiles"]]

# Add the Fingerprints to the LSH Forest and index
lf.batch_add(fingerprints)
lf.index()

# Get the coordinates
x, y, s, t, _ = tm.layout_from_lsh_forest(lf)

# Let's color by active / inactive, which is set in the "Comment" column
# of the dataframe
active = [1 if a == "active" else 0 for a in df["Comment"]]

# Create the labels for SMILES with ID
df["SmilesID"] = df["Smiles"] + '__' + df["Molecule ChEMBL ID"]

# Now plot the data
faerun = Faerun(view="front", coords=False)
faerun.add_scatter(
    "Assay",
    {"x": x, "y": y, "c": [active, df["AlogP"]], "labels": df["SmilesID"]},
    point_scale=5,
    colormap=["tab10", "viridis"],
    has_legend=True,
    categorical=[True, False],
    legend_labels=[(0, "Inactive"), (1, "Active")],
    series_title=["Activity", "ALogP"],
    ondblclick="window.open(`https://www.ebi.ac.uk/chembl/compound_report_card/${labels[1]}/`, '_blank');"
)
faerun.add_tree("Assay_tree", {"from": s, "to": t}, point_helper="Assay")

# Choose the "smiles" template to display structure on hover
faerun.plot(template="smiles", notebook_height=750)
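
The `perm` value in the example is the number of permutations used for MinHashing. To give a feel for what that number controls, here is a toy, pure-Python MinHash that estimates the Jaccard similarity of two small sets. It is only an illustration of the general idea under my own assumptions (linear hash functions over CRC32 values, a fixed seed, non-empty sets); it is not how MHFP or tmap implement it.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # large prime modulus for the linear hash functions


def minhash_signature(items, n_perm, seed=42):
    """Keep the minimum hash value of `items` under each of n_perm hash functions."""
    rng = random.Random(seed)
    coeffs = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(n_perm)]
    values = [zlib.crc32(item.encode("utf-8")) for item in items]
    return [min((a * v + b) % PRIME for v in values) for a, b in coeffs]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where the two signatures agree."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


set_a = {"a", "b", "c", "d"}
set_b = {"c", "d", "e", "f"}
true_jaccard = len(set_a & set_b) / len(set_a | set_b)
estimate = estimated_jaccard(minhash_signature(set_a, 512), minhash_signature(set_b, 512))
print(f"true Jaccard = {true_jaccard:.3f}, MinHash estimate = {estimate:.3f}")
```

More permutations give a less noisy estimate at the cost of longer signatures, which is the trade-off behind choosing a value such as 512.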

Using MayaChemTools

MayaChemTools is a growing collection of Perl and Python scripts, modules, and classes to support a variety of day-to-day computational discovery needs. It includes a set of command-line Python scripts based on RDKit that provide functionality for a variety of tasks, and a command-line Python script based on TMAP that visualizes chemspaces.

The command below runs the script on the example file SampleChemspace.csv provided in the MayaChemTools download.

python VisualizeChemspaceUsingTMAP.py  --categoricalDataCols Source -i mayachemtools/data/SampleChemspace.csv  -o SampleChemspace.html
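
If you want to run the script over several files, it can be convenient to assemble the command in Python. The sketch below only builds and prints the argument list using the three options from the command above; the helper `build_tmap_command` is my own, and actually running the result (for example with `subprocess.run(cmd, check=True)`) assumes you are in the activated environment with the MayaChemTools script in the current folder.

```python
import shlex


def build_tmap_command(input_csv, output_html, categorical_cols,
                       script="VisualizeChemspaceUsingTMAP.py"):
    """Build the argument list for the MayaChemTools TMAP script."""
    return [
        "python", script,
        "--categoricalDataCols", categorical_cols,
        "-i", input_csv,
        "-o", output_html,
    ]


cmd = build_tmap_command(
    "mayachemtools/data/SampleChemspace.csv", "SampleChemspace.html", "Source"
)
# shlex.join renders the list as a shell-safe command line for inspection.
print(shlex.join(cmd))
```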

You can view the interactive page here:

https://macinchem.org/wp-content/uploads/2024/04/SampleChemspace.html
