Science

The importance of chemical diversity

Computational chemistry methods can significantly reduce experimental costs in the early stages of a drug development project by filtering out unsuitable candidates.

The chemical diversity of the selected compounds in a lead discovery project is critical for successful hit identification. In the lead discovery and lead optimization phases, multiple scaffolds need to be considered in order to maximize the chances that one or more of the selected families move forward to the next drug development stages and to increase the chances that candidate molecules are not protected by existing IP.

Field-based similarity models

To align and score two different compounds, most 3D in-silico software packages maximize the shape similarity of the compounds and use pharmacophoric points to represent specific physicochemical properties. This approach, however, relies heavily on atomic positions, which constrains the search space to similar molecular structures.

Nonetheless, the atoms themselves and their distribution within the molecule are not what determines the interaction of a ligand with the receptor. Interaction fields are what really govern the activity of a ligand at a receptor. Molecules with different atomic distributions can produce similar interaction fields and, therefore, interact similarly with the receptor. However, these chemotypes cannot be uncovered if the molecular alignment and scoring techniques used are tightly tied to atomic positions.

With the goal of finding more chemical diversity, Pharmacelera’s tools make use of all the relevant interaction fields of compounds in 3D and use them to optimize molecular alignment and similarity evaluation.

A similarity score is computed in virtual screening by combining the Tanimoto indexes of the selected fields of interaction. PharmScreen is capable of generating 7 different fields of interaction: Electrostatic, Steric, Hydrophobic, Electrostatic contributions to Hydrophobicity, Cavitational contributions to Hydrophobicity, Van der Waals contributions to Hydrophobicity, Hydrogen Bond Acceptors/Donors. Although a default configuration is used if no information from the binding site is available, the user can adjust the model taking into account experimental data and receptor information.
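The score combination described above can be illustrated with a minimal sketch. It assumes that each field is sampled on the same grid points for both aligned molecules, that the continuous form of the Tanimoto index is used, and that the per-field indexes are combined as a weighted average. The function names, the field names and the weighting scheme are hypothetical and are not taken from PharmScreen.

```python
from typing import Dict, Sequence


def field_tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Continuous Tanimoto index of two fields sampled on the same grid."""
    if len(a) != len(b):
        raise ValueError("fields must be sampled on the same grid points")
    ab = sum(x * y for x, y in zip(a, b))
    aa = sum(x * x for x in a)
    bb = sum(y * y for y in b)
    denom = aa + bb - ab
    # The denominator is zero only when both fields are zero everywhere;
    # returning 0.0 in that case is a convention chosen for this sketch.
    return ab / denom if denom else 0.0


def combined_similarity(
    fields_a: Dict[str, Sequence[float]],
    fields_b: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted average of per-field Tanimoto indexes (assumed combination)."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    score = sum(
        w * field_tanimoto(fields_a[name], fields_b[name])
        for name, w in weights.items()
    )
    return score / total


# Toy example: identical electrostatic fields, non-overlapping steric fields.
mol_a = {"electrostatic": [1.0, 2.0, 3.0], "steric": [1.0, 0.0]}
mol_b = {"electrostatic": [1.0, 2.0, 3.0], "steric": [0.0, 1.0]}
print(combined_similarity(mol_a, mol_b, {"electrostatic": 1.0, "steric": 1.0}))  # 0.5
```

In this sketch, adjusting the model to experimental data or receptor information would amount to changing the entries of the weights dictionary, for example by giving more weight to the fields believed to drive binding.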

High quality descriptors

Another important aspect of computational chemistry software is the quality of its molecular descriptors. In many cases, atom type-based approaches are used in order to increase the speed of calculations. However, these simplifications may lead to incorrect results.

The importance of using accurate descriptors can be seen in the example above, where two compounds show very different transfer free energies for the same atom type due to the effect of the rest of the molecule. These differences in transfer free energy translate into different lipophilic profiles and, therefore, different interactions with the receptor [DESC1].

Pharmacelera’s 3D interaction fields are generated with high quality molecular descriptors computed with semi-empirical Quantum-Mechanical (QM) methods [DESC2]. These accurate calculations take into account the influence of conformations and model new chemical groups not present in empirical databases, better representing the lipophilic profile of the molecules.

Field-based alignment

Molecular alignment is a critical step in 3D molecular modelling. Precise alignments are fundamental for obtaining correct and accurate results from virtual screening, scaffold hopping, pharmacophore generation and structure-activity relationships. Otherwise, the results are meaningless. Superposing two structures is often easy when parts of the molecules are identical. However, finding the correct alignment is not trivial when the structures are different.

Even bioactive overlays of structures that share a central scaffold do not necessarily overlay this central scaffold atom by atom, because small changes in the molecule might change its position in the binding site. Existing tools that build alignments by matching atom triplets or field extrema are not able to capture these subtle alignment differences, which later have a great impact on the accuracy of the results.

Pharmacelera’s tools use state-of-the-art algorithms to maximize field similarity using multiple expansion centres and tensors. This approach finds the alignments that maximize interaction field similarity regardless of the underlying molecular structure, better representing the way that compounds are positioned in the binding site.

The quality of Pharmacelera’s alignment technology was validated in a study [ALIGN1] that evaluated how well it reproduces the bioactive overlays of 1456 crystal structures of 121 different receptors from the AstraZeneca Overlays Validation Test Set (https://www.ccdc.cam.ac.uk/support-and-resources/downloads/), distributed by the Cambridge Crystallographic Data Centre.

Field-based alignment example. PharmScreen fields of the two aligned molecules superposed (first and second figures), proposed alignment of the green structure using as a reference the purple structure (third figure) and comparison of the proposed alignment (green) with the crystal structure (blue) (fourth figure).

Each crystal structure was used in turn as the reference onto which the remaining structures were overlaid, and the resulting alignments were then compared with the existing crystal overlays. The image above shows an example in which the green structure is aligned using the purple structure as the reference. As can be observed, aligning the molecules using the interaction fields reproduces the correct position in the crystal. Learn more about it in our blog publication
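One common way to quantify such a comparison is the root-mean-square deviation (RMSD) between the proposed pose and the crystal pose of the same ligand. The sketch below is an illustration under stated assumptions, not the protocol of the cited study: it assumes both poses are already in the frame of the reference structure, that atoms are listed in the same order, and that a 2.0 Å cutoff (a widely used convention, chosen here as a hypothetical default) marks a reproduced pose. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def pose_rmsd(predicted: Sequence[Point], crystal: Sequence[Point]) -> float:
    """RMSD between two poses of the same ligand, atoms in the same order.

    No superposition is performed: both poses are assumed to be expressed
    in the coordinate frame of the reference structure already.
    """
    if not predicted or len(predicted) != len(crystal):
        raise ValueError("poses must be non-empty and have the same atoms")
    squared = sum(
        (px - cx) ** 2 + (py - cy) ** 2 + (pz - cz) ** 2
        for (px, py, pz), (cx, cy, cz) in zip(predicted, crystal)
    )
    return math.sqrt(squared / len(predicted))


def is_reproduced(
    predicted: Sequence[Point],
    crystal: Sequence[Point],
    threshold: float = 2.0,  # assumed cutoff in angstroms, not from the source
) -> bool:
    """True when the proposed pose lies within the RMSD cutoff of the crystal."""
    return pose_rmsd(predicted, crystal) <= threshold


# Toy example: a two-atom ligand shifted by 1 angstrom along z.
proposed_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
crystal_pose = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(pose_rmsd(proposed_pose, crystal_pose))      # 1.0
print(is_reproduced(proposed_pose, crystal_pose))  # True
```

This simple version ignores molecular symmetry, so equivalent atoms that swap places would be penalised. A production comparison would need a symmetry-aware atom mapping.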

The following figure shows the alignment results over the AstraZeneca Overlays Validation Test Set, obtained by reproducing the methodology of previous studies [ALIGN2]. In those studies, the crystal sets were classified into four categories based on the alignment complexity: easy, moderate, hard and unfeasible.

It can be seen in the figure that for easy sets (where most ligands share a central scaffold) most alignment techniques work properly. However, using more accurate molecular descriptors is key to finding the correct alignments in complex sets (where multiple chemical families and different binding modes are present).

QSAR

Pharmacelera’s fields can also be used to generate Quantitative Structure-Activity Relationship (QSAR) models with PharmQSAR. PharmQSAR is capable of generating traditional CoMFA models with a PLS regression method, as well as additional QSAR models that use Pharmacelera’s advanced hydrophobic descriptors.

The use of Pharmacelera’s hydrophobic fields for QSAR analysis was validated in several studies, which demonstrated that the hydrophobic fields derived from the electrostatic and cavitational contributions to LogP are about as representative as CoMFA fields and provide additional insight into the lipophilic profile of the molecule [QSAR1][QSAR2].

The figure above shows the models generated for six different targets for the training (blue) and test sets (red). It can be seen that Pharmacelera’s descriptors are able to reproduce the experimental results and, therefore, are very suitable for representing ligand-receptor interactions.

High Performance Computing

Pharmacelera’s expertise also includes the use of High-Performance Computing (HPC) and hardware acceleration in order to perform complex calculations faster. All the software has been parallelized to take full advantage of current microprocessor designs and configurations, and it shows great scalability [HPC1]. For instance, Pharmacelera has been able to run PharmScreen on a library containing 95 million molecular structures.

In a previous collaboration, Pharmacelera was able to reduce the execution time of some kernels of the Protein Energy Landscape Exploration (PELE) software from the Barcelona Supercomputing Center - Centro Nacional de Supercomputación (BSC-CNS) by roughly 20X, from 84 hours to 4 hours.

Pharmacelera is a member of the Industrial Advisory Committee (IAC) of the European PRACE (Partnership for Advanced Computing in Europe) program as the pharmaceutical industry representative. The goal of PRACE is to promote the use of High-Performance Computing for future scientific discoveries and for solving complex industrial problems. Learn more about PRACE in our blog post

[DESC1] Muñoz-Muriedas et al., Journal of Computer-Aided Molecular Design, 19, 2005.
[DESC2] Luque et al., Journal of Computer-Aided Molecular Design, 13(139), 1999.
[ALIGN1] Vázquez et al., “Development and Validation of Molecular Overlays Derived From 3D Hydrophobic Similarity with PharmScreen”, Journal of Chemical Information and Modeling (JCIM), July 2018.
[ALIGN2] Giangreco et al., “Assessment of a Cambridge Structural Database-Driven Overlay Program”, Journal of Chemical Information and Modeling (JCIM), 54:3091-3098, 2014.
[QSAR1] Ginex et al., “Development and validation of hydrophobic molecular fields from the quantum mechanical IEF/PCM-MST solvation models in 3D-QSAR”, Journal of Computational Chemistry (JCC), January 2016.
[QSAR2] Ginex et al., “Application of the Quantum Mechanical IEF/PCM-MST Hydrophobic Descriptors to Selectivity in Ligand Binding”, Journal of Molecular Modelling (JMM), June 2016.
[HPC1] Kondratyev et al., “HPC Methodologies for PharmScreen”, PRACE White Paper, 2016.