TUTORIAL 2

Analysis of butyrophenones with serotoninergic affinities

 

Objectives

Obtain GRIND descriptors for a series of compounds taken from literature [1] and use them to build, validate and interpret 3D-QSAR models.

 

Sections

  1. Conventions
  2. Introduction
  3. Descriptor calculation
  4. PCA modeling
  5. PLS modeling
  6. Variable selection
  7. External prediction
  8. Bibliography

 

Conventions

[MENU] Means that you have to choose the menu option identified by this label. Usually you have to click with the mouse or type the label. The labels separated by the symbol >>> mean that you have to "navigate" some submenus. For instance A>>>B>>>C means "choose menu A, then a submenu appears where you have to choose option B, then a submenu appears where you have to choose option C.

[DIALOG] Means that a dialog window is open for the user choice. Select the option indicated.

[BUTTON] Press the button with this label

 

Introduction

The dataset contains a series of 25 "conformationally constrained" butyrophenones taken from literature [1]. These compounds possess binding affinity to the serotonergic 5-HT2A and 5-HT2B rat receptor. 5-HT2 antagonists have demonstrated activity for the treatment of anxiety and disthymic disorders [2] as well as for the reduction of extrapyramidal side effects produced by classic neuroleptic. Drugs with mixed dopamine D2 and 5-HT2 receptor-antagonist activities have been used for the treatment of schizophrenia (e.g. Risperidone, Janssen). This series was used also as an example of application of the GRIND in the original article [3] mentioned in the background section.

The name, structure and binding affinity of each compound are listed in table 1. In all the series, the conformation of the butyl chain is restricted due to the presence of a cycloalkanone ring fused to a benzene ring or an heterocycle (struct), every compound contains a central piperazine or piperidine ring (subst).

The structures of the 25 butyrophenones were initially built in 2D coordinates from the published structures and from them, suitable 3D structures were obtained using program CORINA. All compounds were considered to be protonated, with a formal charge +1 placed on one nitrogen of the piperazine or piperidine ring. Chiral descriptors were included in the 2D files in order to avoid divergences in the 3D structures. In this tutorial, the 3D structures of the compounds were generated using CORINA and they differ significantly from the conformations used in [3], therefore, the model obtained here is considerably different from the published one.

The structures of the compounds in the series can be found in directory tutor2 and their names are already included in the file list inhi25.lst.

IMPORTANT: This tutorial describes the method used by the authors of [1] to build and interpret an ALMOND model. However, due to the differences in the version of ALMOND used (3.0.1 vs 3.3), the aspect of the plots and some numerical values of the models might be different from those originaly published. Moreover, at the time the article was published, some of the latest ALMOND refinements (e.g. intra-group removal, and TIP probe were not used) were not developed. Therefore this tutorial intends to describe a practical application of ALMOND but should not be understood as the best possible approach to build ALMOND models.

 

Descriptor calculation

Descriptors will be computed using the 3D structures of the 25 butyrophenones. Copy the contents of the directory tutor2 to a local directory where you have reading and writing permission and start ALMOND typing in a UNIX shell:

 

almond

 

This command starts the program in interactive mode, using its graphical interface.

GRIND are calculated automatically when a series of 3D structures is imported. In the import dialog, a number of method parameters should also be defined. The meaning of each one of these parameters is thoroughly discussed in the manual.

The computation starts, a new window indicating the process state is displayed while the following information is printed on the main text window:

After this first step, you should be able to:

 

PCA modeling

The descriptors generated using ALMOND are now in memory and can be used for different purposes. For example, we will use them now to represent the similarities of the compounds in this series by building a PCA model:

The results of the PCA analysis are written by ALMOND in the main text window. A large part of the X variance is explained by the first component (XVarExp = 36%), additional components only contribute slightly to the variance explanation.

Before looking to the plots, we will start assigning colors to the objects according to the substituent present in the molecule (subst), in order to make easier its interpretation.

rule based pattern

Almost every compounds is assigned to the right cluster in this way, but for some compounds the assignment should be changed:

We will start looking to the first two PC´s, since these components explain most of the variance.

A new window with the PCA score plot appears in the screen. According to the coloring scheme, compounds may be separated in 3 groups. An obvious cluster of "red" compounds on the positive side of PC1, a slight cluster of "white" compounds on the negative side and, below it, a third group of molecules with various substituents. Therefore, the first PC is mainly devoted to explain the differences between red compounds and all the others

Let's look at the compound's name to identify the clusters: place the mouse pointer within the plot window and hold pressed the right mouse button, a pop-up menu will show in which we can select the command set symbol>>>Name (N). Resize the window if the names appear too much overlaid to be read.

The red names correspond to the molecules with e substituent (subst) which is the only substituent that contains a propyl linker. As CORINA generates extended conformations, molecules with the e substituent are much longer than the others, which may explain the strong separation from all the other compounds in the plot. To confirm this hypothesis, we can use the MACC2 and the PCA profile of PC1.

Placing both plot windows as shown in the figure below allow to appreciate the correspondence between the actual values of the variables (in the plot at the top) and the corresponding loadings in the first PC (in the plot at the bottom). It can be seen that some of the variables with higher loadings are those representing interactions at long distance (the last of every correlogram block), characterizing compounds with longer linkers (in red in the top plot), like the compound 35e.

MACC2 and PC1 profile

{short description of image}

Indeed, by looking at regions in the MACC2 profile corresponding to the highest positive peaks in the loading profile, you can find again the same color pattern as in the PC1 score plot. Regarding the lowest negative peak, you can also retrieve the color pattern, but upside-down.

Most highest positive peaks represent long distance interactions (i.e. 11-46, 12-32 and 13-42). In order to see in 3D the interactions represented by this variable, click on the highest loading (12-32) with the middle mouse button to activate it. A little red cross should appear on the top of the peak. Then open the filtered Grid plot corresponding to the correlogram of this peak.

The plot appears in a new window. Move the mouse pointer into the window and perform :

Look at molecules with the e substituent by playing with the object list. In all the e series, the activated couple of nodes represents the distance between a O probe (red nodes) interacting with the charged nitrogen and a DRY probe (ivory nodes) interacting with the aromatic ring of the e substituent. In compounds with other substituents, a couple of nodes may also be activated, however, the nodes interact with other zones of the molecule, and their energy is less intense. For example, the O probe in molecule 10a interacts with the aromatic ring of the butyrophenone with a low binding energy compared to the charged amine. Let´s check the MIF values for probe O:

Go back to the Grid filtered window, select compound 10a in the object list and then press Ctrl-C with the mouse pointer inside the window to copy the view, then move the mouse pointer into the Grid raw window and press Ctrl-P to paste the view. Now both window should show the same compound in the same orientation.

In the plot's window :

Move the slide control of the negative values and observe that the interaction energy with the charged nitrogen field is more than three times higher than the interaction field caused by the aromatic ring. If you wish, look at other molecules on the Grid filtered window, and if you have any doubt about the energy of interaction of a node, you can check it as you did for the former filtered field.

By now we have seen how to :

 

PLS modeling

Close all the plot windows opened to clean up your screen.

Load activity values using command [MENU] File>>>Import Activity..., and then enter the name of the activity list file activ25.txt. Since this file contains the name of the objects, select "skip first field". Press the OK button. The values of activity already loaded in ALMOND can be printed in the main window using the command [MENU] List>>>Activity

Create and validate the PLS model as following :

PLS models should be interpreted graphically, therefore start producing a plot of the r2 and q2 obtained for the different model dimensionality.

Compare the evolution of r2 and q2 according to the number of LVs included in the model. The r2 value always rises whereas q2 rises more slowly for LV4 and LV5. Hence, introduction of latent variables 4 and 5 probably contribute little to the model. Also, contribution of LV3 to the model is very weak and may be irrelevant, we will discard it to simplify the model. Notice that the q2 value of LV1 is particularly low, and how the introduction of LV2 into the model remarkably increases the predictive ability of the model.

As pKi have been imported previously, we can color the compounds according to their activity value to help interpreting the plots:

Open PCA and PLS score plots for the two firsts components.

In both plots, change the color of the compounds according to their activities: move the mouse pointer into the plot window, hold pressed the right mouse button and select symbol color>>>spectrum. Hold pressed the right button one more time and select set symbol>>>name.

Start looking to the PCA score plot, you can notice a partial clustering according to compound's activity. Even if pKis values are not used during PCA modeling active compounds and inactive compounds tend to cluster together. This is a clear indication that the descriptors are working well on recognizing molecular features relevant for describing the biological activity of the compounds.

Now observe the PLS score plot. This time, pKi values have been included into the model and the projection attempted to discriminate between compound with different activities. In this plot, 4 groups can be recognized:

  1. A first group of e compounds of middle activity on the up left and corner of the plot.
  2. A second group of low activity f compounds on the bottom left and corner of the plot.
  3. A third group of highly active compounds with a or b substituents on the right side of LV2.
  4. A last group of molecules with various activity situated between the clusters of very and poorly active compounds.

We will start focussing our attention on the first LV.

The main difference between the two plots is the predicted activity of the three least active compounds (i.e.. 35f, 10f and If) which decrease of one pKi unit from the model with one component to the model with two components. Therefore, the particularly low q2 of the model with only one LV is probably due to an inefficient prediction of the activity of these compounds.

To get a better understanding of the model, the variables contributing the most should be identified. This can be done using plots of PLS weights, which express the contribution of the original variables to the LV´s. Close the Pred. vs Exper. plots and open the profile plot for the LV1 weights.

The profile is particularly complex, we will focus only on the analysis of the highest peak located in the O-N1 correlogram.

Move the mouse pointer into the PLS profile, click on the bar 23-6 with the middle mouse button, then hold pressed the shift key of the keyboard and click on the bar 23-7 with the mouse middle button. You should see 2 little red crosses meaning that the distances 23-6 and 23-7 have been activated.

Observe the MIF of every compounds containing a or b substituent. For most of the molecules (e.g. 23a), the two couple of points activated represent the distance between a O probe (red nodes) interacting with the charged amine, and a N1 probe (blue nodes) interacting with an hydrogen-bond acceptor situated in the substituent moiety. However, for some compounds (e.g. 35b), N1 probes (blue nodes) interacting with the butyrophenone ketone are also localized near the charged amine, and so are involved in the activated distances, instead of the N1 probes interacting with the substituent moiety. This phenomenon is due to the MACC2 transform that only allows the back tracing of the maximum energy product for a given distance range.

Now look at the MACC2 profile, and observe the correlation between the energy product of distances 6 and 7 and the compound's activity: most compounds with a high value for these variables have red color (high binding affinity) but for the two main outliers, namely molecules 35f and If, which have blue color and correspond exactly with the two molecules with the worst calculated activity. See 2D plot Recalc. residuals for component 1. Besides, If you go back to the filtered plot and select one of these molecules, you will see N1 probes (blue nodes) interacting with the butyrophenone ketone near the charged amine, as for compound 35b (see above). This little example points out one important limitation of GRIND: the same variable can be produced by different interactions in different compounds. So far, ALMOND doesn´t include any method to prevent the presence of this kind of inconsistencies and these should be checked during the model interpretation phase.

If you wish, you can use the same method as above for the analysis of the other important peaks. For example, in correlograms DRY and DRY-O, notice that compounds with a, b and e substituent have all high energy products for distances corresponding to the highest positive peaks, but that only e compounds receive an important contribution of the large negative peaks since there are the only ones with very long distance interactions.

When you have finished, close all the plot windows.

After this part of the tutorial, you should be able to :

 

Variable selection

It is obvious that not all node-node interactions are relevant for the description of the activity of the compounds. Irrelevant variables only contribute to the addition of a noise in the description of the molecule and make the model interpretation more complicated. Their removal may be performed manually, according to the user interpretation of the model, or automatically using the FFD procedure. We will use both technics for this tutorial.

Manual exclusion

Manual exclusion can be performed either on complete correlograms or only on a subgroup of distances. Manual exclusion is highly recommended for the validation of user's hypotheses about the influence of a group of variables on the behavior of the model.

Open the Grid plot for the filtered O MIF:

The plot appears in a new window. Change the setting as before :

If you observe a few molecules you will see that the activated node-node interactions always involve at least one O probe interacting with an aromatic area. The relevance of these nodes are doubtful: their presence in the MIF is strongly dependant on the number of nodes selected during the filtering, besides the DRY probe is much more appropriated for this type of interaction. Consequently the O-O block looks like to a reduced copy of the DRY-O block, which contains peaks of clear relevance. Exclusion of the O-O block will not affect the model prediction and will simplify the model.

As a result of variable exclusion, a new PLS model need to be generated and validated.

q2 value increase slightly compared to the initial model, which means that the model simplification has not disturbed the model.

All the other remaining variables cannot be excluded at the block level, we will use the FFD procedure for the next selection step.

Automatic exclusion

IMPORTANT, select Execution in a Window. Press the OK button. The FFD variables selection can take a some time to compute. After a few seconds the window will show an estimated time of completion.

Once the selection is completed, proceed to exclude the unselected variables:

First notice that the predictive ability of the model is now optimum with 3 components and increased considerably with respect to the first model. If you want to make a detailed comparison using plots and profiles, you can retrieve the original model by clicking on [MENU] Pretreatment>>>Reload original, and recompute the PLS model.

After this part of the tutorial, you should be able to :

 

External prediction

An important use of a QSAR model is the prediction of the activity of new compounds. In order to do this, the model should be saved in a library and then used to predict the activity of two well known 5-HT2A inhibitors, namely haloperidol and risperidone (pred). However, since the structures of these compounds is rather different from the series studied, the results can not be expected to be good.

The structures of the two compounds in the external data set were also generated by CORINA starting from 2D structures. Hydrogen atoms were automatically added, and conformation were oriented according to their moment of inertia.

The same messages as during series importing are printed on the screen. Once the computation is finished, the prediction palette is displayed.

Compare the predicted activities with the experimental ones (see activity table). The pKi of haloperidol (7.70) is relatively well predicted whereas the risperidone activity (9,70) is under predicted by about 1 pKi unit. Notice that risperidone is the most active compounds, consequently it is out of the range of the pKis of the 25 butyrophenone model, which makes its activity prediction harder. However, the predicted activity of risperidone is roughly the same as the highest predicted activity of the butyrophenone model (i.e. 23b activity).

The tutorial is now finished, After this last part you should be able to :

 

Bibliography

  1. Conformationally constrained butyrophenones with mixed dopaminergic (D(2)) and serotoninergic (5-HT(2A), 5-HT(2C)) affinities: synthesis, pharmacology, 3D-QSAR, and molecular modeling of (aminoalkyl)benzo- and -thienocycloalkanones as putative atypical antipsychotics. E. Raviña, J. Negreira, J. Cid, C.F. Masaguer, E. Rosa, M.E. Rivas, J.A. Fontenla, M.I. Loza, H. Tristan, M.I. Cadavid, F. Sanz, E. Lozoya, A. Carotti, A. Carrieri J. Med Chem. 1999, 42, 2774-97.
  2. The year's drug news. J.R. Prous Prous Science, Inc 1995.
  3. M. Pastor, G. Cruciani, I. McLay,S. Pickett and Sergio Clementi. Grid Independent Descriptors (GRIND). A Novel Class of Alignment-Independent Three-Dimensional Molecular Descriptors. J.Med.Chem., 43, 3233-3243 (2000).

Latest versions

Login

Username

Password

Register | Lost password?