|
TUTORIAL 2 |
Analysis of butyrophenones with serotoninergic affinities |
Objectives
Obtain GRIND descriptors for a series of compounds taken from literature [1] and use them to build, validate and interpret 3D-QSAR models.
Sections
- Conventions
- Introduction
- Descriptor calculation
- PCA modeling
- PLS modeling
- Variable selection
- External prediction
- Bibliography
| [MENU] | Means that you have to choose the menu option
identified by this label. Usually you have to click with the mouse or
type the label. The labels separated by the symbol >>> mean
that you have to "navigate" some submenus. For instance
A>>>B>>>C means
"choose menu A, then a submenu appears where you have to choose
option B, then a submenu appears where you have to choose option C. |
| [DIALOG] | Means that a dialog window is open for the user choice.
Select the option indicated. |
| [BUTTON] | Press the button with this label |
The dataset contains a series of 25 "conformationally constrained" butyrophenones taken from literature [1]. These compounds possess binding affinity to the serotonergic 5-HT2A and 5-HT2B rat receptor. 5-HT2 antagonists have demonstrated activity for the treatment of anxiety and disthymic disorders [2] as well as for the reduction of extrapyramidal side effects produced by classic neuroleptic. Drugs with mixed dopamine D2 and 5-HT2 receptor-antagonist activities have been used for the treatment of schizophrenia (e.g. Risperidone, Janssen). This series was used also as an example of application of the GRIND in the original article [3] mentioned in the background section.
The name, structure and binding affinity of each compound are listed in table 1. In all the series, the conformation of the butyl chain is restricted due to the presence of a cycloalkanone ring fused to a benzene ring or an heterocycle (struct), every compound contains a central piperazine or piperidine ring (subst).
The structures of the 25 butyrophenones were initially built in 2D coordinates from the published structures and from them, suitable 3D structures were obtained using program CORINA. All compounds were considered to be protonated, with a formal charge +1 placed on one nitrogen of the piperazine or piperidine ring. Chiral descriptors were included in the 2D files in order to avoid divergences in the 3D structures. In this tutorial, the 3D structures of the compounds were generated using CORINA and they differ significantly from the conformations used in [3], therefore, the model obtained here is considerably different from the published one.
The structures of the compounds in the series can be found in directory tutor2 and their names are already included in the file list inhi25.lst.
IMPORTANT: This tutorial describes the method used by the authors of [1] to build and interpret an ALMOND model. However, due to the differences in the version of ALMOND used (3.0.1 vs 3.3), the aspect of the plots and some numerical values of the models might be different from those originaly published. Moreover, at the time the article was published, some of the latest ALMOND refinements (e.g. intra-group removal, and TIP probe were not used) were not developed. Therefore this tutorial intends to describe a practical application of ALMOND but should not be understood as the best possible approach to build ALMOND models.
Descriptors will be computed using the 3D structures of the 25 butyrophenones. Copy the contents of the directory tutor2 to a local directory where you have reading and writing permission and start ALMOND typing in a UNIX shell:
almond
This command starts the program in interactive mode, using its graphical interface.
GRIND are calculated automatically when a series of 3D structures is imported. In the import dialog, a number of method parameters should also be defined. The meaning of each one of these parameters is thoroughly discussed in the manual.
- Select command [MENU] File>>>Import series...
- [DIALOG] select SYBYL mol2 for the Series type and run ALMOND for the Task to perform.
- In the field Input series files/s write the file list name: inhi25.lst. In the field ALMOND new file (.alm) write the name of the ALMOND file to be created (e.g. inhi25.alm).
- Select the probes DRY, O, and N1 of the probe chooser. Probes are chosen to represent at best potential interactions with the receptor binding site. The DRY probe represents hydrophobic interactions, the O probe (carbonyl oxygen) represents hydrogen bond acceptor groups, and the N1 probe (amide nitrogen) represents hydrogen bond donor groups. Leave the grid spacing set to 0.5 Å.
- Choose to keep the KONT files. Usually these files are removed because of their huge size, but here we will need one of these files later in the tutorial
- Press OK
- [DIALOG] Press Define filtering Set the number of filtered nodes to 150 and the relative weight of the field to 35. Press OK
The computation starts, a new window indicating the process state is displayed while the following information is printed on the main text window:
- The parameters defined for the computation.
- A line that confirms the successful format conversion of the input file.
- A summary of The GRIN and the GRID process for every compound.
- The number of GRIND correlograms and variables.
After this first step, you should be able to:
- Import series of molecules.
- Define GRID and filtering parameters.
The descriptors generated using ALMOND are now in memory and can be used for different purposes. For example, we will use them now to represent the similarities of the compounds in this series by building a PCA model:
- [MENU] Modeling>>>Generate PCA model...
- [DIALOG] Select 5 components, press OK
The results of the PCA analysis are written by ALMOND in the main text window. A large part of the X variance is explained by the first component (XVarExp = 36%), additional components only contribute slightly to the variance explanation.
Before looking to the plots, we will start assigning colors to the objects according to the substituent present in the molecule (subst), in order to make easier its interpretation.
- [MENU] File>>>Object type>>>Modify (rule based)...
- [DIALOG] Insert the character # at the third position of the Rule definition to get the following pattern :

- [BUTTON] Apply
- [BUTTON] OK
Almost every compounds is assigned to the right cluster in this way, but for some compounds the assignment should be changed:
- [MENU] File>>>Object type>>>Modify (manual)...
- [DIALOG] Choose Object type white and select molecule IIIa and Ia. Choose Object type red and select molecule IIIe and Ie. Finally choose Object type green and select molecule IIIf and If. Press OK
We will start looking to the first two PC´s, since these components explain most of the variance.
- [MENU] Plot>>>2D plot>>>PCA scores...
- [DIALOG] select component 1 for the X and component 2 for the Y axis, press OK
A new window with the PCA score plot appears in the screen. According to the coloring scheme, compounds may be separated in 3 groups. An obvious cluster of "red" compounds on the positive side of PC1, a slight cluster of "white" compounds on the negative side and, below it, a third group of molecules with various substituents. Therefore, the first PC is mainly devoted to explain the differences between red compounds and all the others
Let's look at the compound's name to identify the clusters: place the mouse pointer within the plot window and hold pressed the right mouse button, a pop-up menu will show in which we can select the command set symbol>>>Name (N). Resize the window if the names appear too much overlaid to be read.
The red names correspond to the molecules with e substituent (subst) which is the only substituent that contains a propyl linker. As CORINA generates extended conformations, molecules with the e substituent are much longer than the others, which may explain the strong separation from all the other compounds in the plot. To confirm this hypothesis, we can use the MACC2 and the PCA profile of PC1.
- [MENU] Plot>>>Correlogram>>>ALMOND series...
- [DIALOG] Leave all the correlograms selected, press OK
- [MENU] Plot>>>Profile>>>PCA loadings...
- [DIALOG] Select component 1 for the Y axis and mark all the blocks. Press OK
Placing both plot windows as shown in the figure below allow to appreciate the correspondence between the actual values of the variables (in the plot at the top) and the corresponding loadings in the first PC (in the plot at the bottom). It can be seen that some of the variables with higher loadings are those representing interactions at long distance (the last of every correlogram block), characterizing compounds with longer linkers (in red in the top plot), like the compound 35e.


Indeed, by looking at regions in the MACC2 profile corresponding to the highest positive peaks in the loading profile, you can find again the same color pattern as in the PC1 score plot. Regarding the lowest negative peak, you can also retrieve the color pattern, but upside-down.
Most highest positive peaks represent long distance interactions (i.e. 11-46, 12-32 and 13-42). In order to see in 3D the interactions represented by this variable, click on the highest loading (12-32) with the middle mouse button to activate it. A little red cross should appear on the top of the peak. Then open the filtered Grid plot corresponding to the correlogram of this peak.
- [MENU] Plot>>>Grid plot>>>Grid filtered...
- [DIALOG] select field 12, press OK
The plot appears in a new window. Move the mouse pointer into the window and perform :
- [MENU] Data>>>Object list
Look at molecules with the e substituent by playing with the object list. In all the e series, the activated couple of nodes represents the distance between a O probe (red nodes) interacting with the charged nitrogen and a DRY probe (ivory nodes) interacting with the aromatic ring of the e substituent. In compounds with other substituents, a couple of nodes may also be activated, however, the nodes interact with other zones of the molecule, and their energy is less intense. For example, the O probe in molecule 10a interacts with the aromatic ring of the butyrophenone with a low binding energy compared to the charged amine. Let´s check the MIF values for probe O:
- [MENU] Plot>>>Grid plot>>>Grid raw...
- [DIALOG] select field 2, press OK
Go back to the Grid filtered window, select compound 10a in the object list and then press Ctrl-C with the mouse pointer inside the window to copy the view, then move the mouse pointer into the Grid raw window and press Ctrl-P to paste the view. Now both window should show the same compound in the same orientation.
In the plot's window :
- [MENU] Data>>>Data Levels
- [DIALOG] deselect positive value,
Move the slide control of the negative values and observe that the interaction energy with the charged nitrogen field is more than three times higher than the interaction field caused by the aromatic ring. If you wish, look at other molecules on the Grid filtered window, and if you have any doubt about the energy of interaction of a node, you can check it as you did for the former filtered field.
By now we have seen how to :
- Color molecules manually or automatically.
- Interpret a score plot.
- Compare a MACC2 and a loading profile.
- Use plots interactively to analyze a peak in a profile.
- Compare a raw and a filtered Grid field.
Close all the plot windows opened to clean up your screen.
Load activity values using command [MENU] File>>>Import Activity..., and then enter the name of the activity list file activ25.txt. Since this file contains the name of the objects, select "skip first field". Press the OK button. The values of activity already loaded in ALMOND can be printed in the main window using the command [MENU] List>>>Activity
Create and validate the PLS model as following :
- [MENU] Modeling>>>Generate PLS model...
- [DIALOG] Select 5 components and press OK
- [MENU] Modeling>>>Validate PLS model...
- [DIALOG] Select Random Groups as validation mode, press OK
PLS models should be interpreted graphically, therefore start producing a plot of the r2 and q2 obtained for the different model dimensionality.
- [MENU] Plot>>>2D plot>>>SDEC & SDEP
Compare the evolution of r2 and q2 according to the number of LVs included in the model. The r2 value always rises whereas q2 rises more slowly for LV4 and LV5. Hence, introduction of latent variables 4 and 5 probably contribute little to the model. Also, contribution of LV3 to the model is very weak and may be irrelevant, we will discard it to simplify the model. Notice that the q2 value of LV1 is particularly low, and how the introduction of LV2 into the model remarkably increases the predictive ability of the model.
As pKi have been imported previously, we can color the compounds according to their activity value to help interpreting the plots:
- [MENU] File>>>Object type>>>Define spectrum...
- [DIALOG] select last variable (labeled as "Y1") and press OK
- [DIALOG] Leave the spectrum range and colors by default, press OK
Open PCA and PLS score plots for the two firsts components.
- [MENU] Plot>>>2D plot>>>PCA scores...
- [DIALOG] press OK
- [MENU] Plot>>>2D plot>>>PLS scores...
- [DIALOG] press OK
In both plots, change the color of the compounds according to their activities: move the mouse pointer into the plot window, hold pressed the right mouse button and select symbol color>>>spectrum. Hold pressed the right button one more time and select set symbol>>>name.
Start looking to the PCA score plot, you can notice a partial clustering according to compound's activity. Even if pKis values are not used during PCA modeling active compounds and inactive compounds tend to cluster together. This is a clear indication that the descriptors are working well on recognizing molecular features relevant for describing the biological activity of the compounds.
Now observe the PLS score plot. This time, pKi values have been included into the model and the projection attempted to discriminate between compound with different activities. In this plot, 4 groups can be recognized:
- A first group of e compounds of middle activity on the up left and corner of the plot.
- A second group of low activity f compounds on the bottom left and corner of the plot.
- A third group of highly active compounds with a or b substituents on the right side of LV2.
- A last group of molecules with various activity situated between the clusters of very and poorly active compounds.
We will start focussing our attention on the first LV.
- [MENU] Plot>>>2D plot>>>Pred. vs Exper.
- [DIALOG] choose only one LV for the model dimension, press OK
- [MENU] Plot>>>2D plot>>>Pred. vs Exper.
- [DIALOG] choose two LVs for the model dimension, press OK
The main difference between the two plots is the predicted activity of the three least active compounds (i.e.. 35f, 10f and If) which decrease of one pKi unit from the model with one component to the model with two components. Therefore, the particularly low q2 of the model with only one LV is probably due to an inefficient prediction of the activity of these compounds.
To get a better understanding of the model, the variables contributing the most should be identified. This can be done using plots of PLS weights, which express the contribution of the original variables to the LV´s. Close the Pred. vs Exper. plots and open the profile plot for the LV1 weights.
- [MENU] Plot>>>profile>>>PLS weight
- [DIALOG] Select component 1 for the Y axis, press OK
The profile is particularly complex, we will focus only on the analysis of the highest peak located in the O-N1 correlogram.
Move the mouse pointer into the PLS profile, click on the bar 23-6 with the middle mouse button, then hold pressed the shift key of the keyboard and click on the bar 23-7 with the mouse middle button. You should see 2 little red crosses meaning that the distances 23-6 and 23-7 have been activated.
- [MENU] Plot>>>Correlogram>>>ALMOND series...
- [DIALOG] unselect all the correlograms apart from correlogram 23, press OK. Move the mouse pointer into the MACC2 profile window, hold pressed the right mouse button and select symbol color>>>spectrum
- [MENU] Plot>>>Grid plot>>>Grid filtered...
- [DIALOG] select field 23, press OK
Observe the MIF of every compounds containing a or b substituent. For most of the molecules (e.g. 23a), the two couple of points activated represent the distance between a O probe (red nodes) interacting with the charged amine, and a N1 probe (blue nodes) interacting with an hydrogen-bond acceptor situated in the substituent moiety. However, for some compounds (e.g. 35b), N1 probes (blue nodes) interacting with the butyrophenone ketone are also localized near the charged amine, and so are involved in the activated distances, instead of the N1 probes interacting with the substituent moiety. This phenomenon is due to the MACC2 transform that only allows the back tracing of the maximum energy product for a given distance range.
Now look at the MACC2 profile, and observe the correlation between the energy product of distances 6 and 7 and the compound's activity: most compounds with a high value for these variables have red color (high binding affinity) but for the two main outliers, namely molecules 35f and If, which have blue color and correspond exactly with the two molecules with the worst calculated activity. See 2D plot Recalc. residuals for component 1. Besides, If you go back to the filtered plot and select one of these molecules, you will see N1 probes (blue nodes) interacting with the butyrophenone ketone near the charged amine, as for compound 35b (see above). This little example points out one important limitation of GRIND: the same variable can be produced by different interactions in different compounds. So far, ALMOND doesn´t include any method to prevent the presence of this kind of inconsistencies and these should be checked during the model interpretation phase.
If you wish, you can use the same method as above for the analysis of the other important peaks. For example, in correlograms DRY and DRY-O, notice that compounds with a, b and e substituent have all high energy products for distances corresponding to the highest positive peaks, but that only e compounds receive an important contribution of the large negative peaks since there are the only ones with very long distance interactions.
When you have finished, close all the plot windows.
After this part of the tutorial, you should be able to :
- Color molecules according to their activity.
- Generate and Validate a PLS model.
- Compare predicted and experimental values.
- Analyze a PLS model by using simultaneously a MACC2 profile, a PLS profile, and a Grid filtered plot.
It is obvious that not all node-node interactions are relevant for the description of the activity of the compounds. Irrelevant variables only contribute to the addition of a noise in the description of the molecule and make the model interpretation more complicated. Their removal may be performed manually, according to the user interpretation of the model, or automatically using the FFD procedure. We will use both technics for this tutorial.
Manual exclusion
Manual exclusion can be performed either on complete correlograms or only on a subgroup of distances. Manual exclusion is highly recommended for the validation of user's hypotheses about the influence of a group of variables on the behavior of the model.
- [MENU] Plot>>>Profile>>>PLS coefficients...
- [DIALOG] Select the correlogram 22 and two components for the Y axis. Press OK
- In the profile window, activate distance 22-22 and 22-23 with the middle mouse button
Open the Grid plot for the filtered O MIF:
- [MENU] Plot>>>Grid plot>>>Grid filtered...
- [DIALOG] select field 2, press OK
The plot appears in a new window. Change the setting as before :
- [MENU] Data>>>Object list
If you observe a few molecules you will see that the activated node-node interactions always involve at least one O probe interacting with an aromatic area. The relevance of these nodes are doubtful: their presence in the MIF is strongly dependant on the number of nodes selected during the filtering, besides the DRY probe is much more appropriated for this type of interaction. Consequently the O-O block looks like to a reduced copy of the DRY-O block, which contains peaks of clear relevance. Exclusion of the O-O block will not affect the model prediction and will simplify the model.
- [MENU] Pretreatment>>>Exclude vars.>>>Exclude blocks...
- [DIALOG] select block 22, press OK
As a result of variable exclusion, a new PLS model need to be generated and validated.
- [MENU] Modeling>>>Generate PLS model...
- [DIALOG] Select 5 components and press OK
- [MENU] Modeling>>>Validate PLS model...
- [DIALOG] Select Random Groups as validation mode, press OK
q2 value increase slightly compared to the initial model, which means that the model simplification has not disturbed the model.
All the other remaining variables cannot be excluded at the block level, we will use the FFD procedure for the next selection step.
Automatic exclusion
- [MENU] Pretreatment>>>F. Factorial Selection...
- [DIALOG] FFD parameters must be defined as below (accept the defaults
for the rest of the options):
- Max dimensionality : 2
- Validation mode : Random Groups
- Retain uncertain variables: activated
IMPORTANT, select Execution in a Window. Press the OK button. The FFD variables selection can take a some time to compute. After a few seconds the window will show an estimated time of completion.
Once the selection is completed, proceed to exclude the unselected variables:
- [MENU] Pretreatment>>>Exclude vars>>>Exclude FFD...
- [DIALOG] select elimination 1 : 245 -> 191 Comp.=2, press OK
- [MENU] Modelling>>>Generate PLS model...
- [DIALOG] Select 5 components and press OK
- [MENU] Modelling>>>Validate PLS model...
- [DIALOG] Select Random Groups as validation mode, press OK
First notice that the predictive ability of the model is now optimum with 3 components and increased considerably with respect to the first model. If you want to make a detailed comparison using plots and profiles, you can retrieve the original model by clicking on [MENU] Pretreatment>>>Reload original, and recompute the PLS model.
After this part of the tutorial, you should be able to :
- Perform a manual block removal
- Perform an automatic FFD variable selection
An important use of a QSAR model is the prediction of the activity of new compounds. In order to do this, the model should be saved in a library and then used to predict the activity of two well known 5-HT2A inhibitors, namely haloperidol and risperidone (pred). However, since the structures of these compounds is rather different from the series studied, the results can not be expected to be good.
- [MENU] Preferences>>>Directories...
- [DIALOG] In the Local model library path field, enter the name of an existing directory in which you have writing permission. From now onwards, ALMOND local models will be stored here.
- [MENU] File>>>Model library...
- [BUTTON] Add
- [DIALOG] enter a name for your model, press OK
- [BUTTON] Exit
The structures of the two compounds in the external data set were also generated by CORINA starting from 2D structures. Hydrogen atoms were automatically added, and conformation were oriented according to their moment of inertia.
- [MENU] File>>>Direct projection...
- [DIALOG] In Series type, select SYBYL mol2.
- in the field Input series file/s type :pred.lst. This file contains a list with the name of the 2 molecules of interest.
- Select the library created previously, press OK
The same messages as during series importing are printed on the screen. Once the computation is finished, the prediction palette is displayed.
- [DIALOG] in the PLS domain of the prediction palette, press prediction.
Compare the predicted activities with the experimental ones (see activity table). The pKi of haloperidol (7.70) is relatively well predicted whereas the risperidone activity (9,70) is under predicted by about 1 pKi unit. Notice that risperidone is the most active compounds, consequently it is out of the range of the pKis of the 25 butyrophenone model, which makes its activity prediction harder. However, the predicted activity of risperidone is roughly the same as the highest predicted activity of the butyrophenone model (i.e. 23b activity).
The tutorial is now finished, After this last part you should be able to :
- Save a model in the library.
- Predict activity of compounds.
- Conformationally constrained butyrophenones with mixed dopaminergic (D(2)) and serotoninergic (5-HT(2A), 5-HT(2C)) affinities: synthesis, pharmacology, 3D-QSAR, and molecular modeling of (aminoalkyl)benzo- and -thienocycloalkanones as putative atypical antipsychotics. E. Raviña, J. Negreira, J. Cid, C.F. Masaguer, E. Rosa, M.E. Rivas, J.A. Fontenla, M.I. Loza, H. Tristan, M.I. Cadavid, F. Sanz, E. Lozoya, A. Carotti, A. Carrieri J. Med Chem. 1999, 42, 2774-97.
- The year's drug news. J.R. Prous Prous Science, Inc 1995.
- M. Pastor, G. Cruciani, I. McLay,S. Pickett and Sergio Clementi. Grid Independent Descriptors (GRIND). A Novel Class of Alignment-Independent Three-Dimensional Molecular Descriptors. J.Med.Chem., 43, 3233-3243 (2000).