TUTORIAL 1
ALMOND quick start
Objectives
Understand the basic functionality of ALMOND: generate GRIND descriptors and build a simple chemometric model.
Sections
| [MENU] | Choose the menu option identified by this label, usually by clicking it with the mouse or typing the label. Labels separated by the symbol >>> mean that you have to navigate through submenus. For instance, A>>>B>>>C means "choose menu A, then choose option B in the submenu that appears, then choose option C in the next submenu". |
| [DIALOG] | A dialog window opens for the user's choice. Select the option indicated. |
| [BUTTON] | Press the button with this label. |
This tutorial introduces GRIND descriptors using a series of compounds formed by two closely related aromatic xanthines: caffeine and theophylline. The example illustrates how the descriptors are computed, in a very simple way, starting from the 3D structures of the compounds.
In addition, in order to demonstrate that GRIND are fairly insensitive to the spatial orientation of the molecules, three differently oriented structures for each compound were included in the series. It will be shown that the descriptors obtained starting from different orientations are nearly identical and that a PCA performed on the matrix will show only two distinct clusters, one for caffeine and another for theophylline.
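The reason orientation matters so little can be illustrated with a minimal sketch in Python. It assumes only what the tutorial states, namely that GRIND variables are associated with distances between pairs of nodes. The coordinates, the rotation angle and the helper names (`pairwise_distances`, `rotation_about_z`) are hypothetical and are not part of ALMOND.

```python
import numpy as np

def pairwise_distances(points):
    """Return the sorted distances between every pair of points."""
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(points), k=1)  # each pair counted once
    return np.sort(dist[upper])

def rotation_about_z(angle_rad):
    """3x3 rotation matrix for a rotation about the Z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Hypothetical node coordinates in Å (not taken from tutor1.mol2).
nodes = np.array([
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [0.0, 2.0, 0.5],
    [3.0, 1.0, -1.0],
])

# The same nodes after an arbitrary rotation and translation.
rotated = nodes @ rotation_about_z(np.deg2rad(70.0)).T + np.array([4.0, -2.0, 1.0])

# The coordinates differ, but the set of node-node distances does not.
same = np.allclose(pairwise_distances(nodes), pairwise_distances(rotated))
print(same)
```

In this idealized sketch the distances are exactly conserved. In a real computation the nodes are taken from a sampled grid, so the descriptors for different orientations are only nearly identical.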
GRIND are calculated directly from the 3D structure of the molecules in a very simple way. First, start the program by typing the following in any UNIX shell:
almond
This command starts the program in interactive mode, using its graphical interface.
Descriptors are calculated automatically when a series of 3D structures is imported. In the import dialog, a number of method parameters should be defined. The meaning of each one of these parameters is thoroughly discussed in the manual.
- [MENU] File>>>Import series...
- [DIALOG] select Multi-mol2 for the Series type and choose run ALMOND for the Task to perform.
- In the field Input series files/s write the input file name: tutor1.mol2. In the field ALMOND new file (.alm) write the name of the ALMOND file to be created (e.g. tutor1.alm).
- Select the probes O and N1 in the probe chooser. The O probe (carbonyl oxygen) represents hydrogen bond acceptor groups, and the N1 probe (amide nitrogen) represents hydrogen bond donor groups. Set the GRID spacing to 0.5 Å.
- Unselect the Inter-atom only control.
- Press OK
- [DIALOG] Keep the default values of the filtering and the MACC2 parameters. Press OK
The computation starts and a new window showing the state of the process is displayed, while the following information is printed in the main text window:
- The parameters defined for the computation.
- A line that confirms the successful format conversion of the input file.
- A summary of the GRIN and GRID processes for every compound.
- The number of GRIND correlograms and variables.
Now that the descriptors have been generated, we can perform a Principal Component Analysis (PCA) to analyze the differences between the structures imported.
- [MENU] Modeling>>>Generate PCA model...
- [DIALOG] Select 3 components, press OK
The results of the PCA are written by ALMOND in the main text window. Almost all the X variance is explained by the first component; therefore, only one component is needed to plot the scores.
- [MENU] Plot>>>2D plot>>>PCA scores...
- [DIALOG] select component 1 for both the X and Y axis, press OK
A new PCA score plot is displayed on the screen. Compounds are distributed in two small clusters, one in the bottom-left-hand corner and one in the top-right-hand corner.

Click on the points of each cluster with the left mouse button to read the molecule's name. As you can see, all the points inside the same cluster have the same name. Hence, even if the orientation of a compound is strongly modified, the coordinates of this compound in the Principal Component space are virtually the same.
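The clustering seen in the score plot can be reproduced with a small numerical sketch. It assumes a descriptor matrix with two groups of three nearly identical rows, which mimics two compounds in three orientations each. The descriptor values are invented for illustration, and the mean-centring and SVD-based PCA shown here are a generic textbook choice, not necessarily the pretreatment or algorithm used by ALMOND.

```python
import numpy as np

def pca(X, n_components):
    """Mean-centre X, then return (scores, explained variance ratio)."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    ratio = (S ** 2) / (S ** 2).sum()
    return scores, ratio[:n_components]

rng = np.random.default_rng(0)

# Hypothetical descriptor profiles for two compounds (invented values).
profile_a = np.array([0.0, 2.0, 5.0, 1.0, 0.0, 3.0])
profile_b = np.array([1.0, 0.0, 0.5, 4.0, 2.0, 0.0])

# Three "orientations" per compound: the same profile plus small noise.
rows_a = [profile_a + rng.normal(0.0, 0.02, size=6) for _ in range(3)]
rows_b = [profile_b + rng.normal(0.0, 0.02, size=6) for _ in range(3)]
X = np.vstack(rows_a + rows_b)

scores, ratio = pca(X, n_components=3)
print("explained variance ratio:", np.round(ratio, 4))
print("PC1 scores:", np.round(scores[:, 0], 2))
```

With data of this kind the first component carries almost all the variance, and the PC1 scores fall into two tight groups of opposite sign, one per compound.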
One of the strengths of the GRIND is their interpretability: any variable can be associated with a distance between two MIF nodes via interactive plots. Interactive plots make the interpretation of PCA and PLS models very easy. We are going to see an example of their use.
- [MENU] Plot>>>profile>>>PCA loadings...
- [DIALOG] Select component 1 for the Y axis. Press OK
Click on the highest positive peak with the left mouse button. The name of the variable (12-5, i.e. variable 5 of correlogram 12) is printed on the plot. This peak represents the GRIND variable that best describes the differences between caffeine and theophylline. Now open the GRID plot that corresponds to correlogram 12.
- [MENU] Plot>>>Grid plot>>>Grid filtered...
- [DIALOG] select field 12, press OK
The plot appears in a new window. Move the mouse pointer into the window and perform:
- [MENU] Data>>>Object list. Select one of the THEOPHYLLINE compounds.
Click again on the highest peak, but this time with the middle mouse button. A little red cross should appear on top of the peak. Hold down the shift key and click again with the middle mouse button on some of the other high peaks of correlogram 12 (e.g. 12-5, 12-26, 12-27...) to highlight other important variables. Every time a variable is activated, a new distance is drawn on the Grid plot; this distance links the two nodes that correspond to the activated variable. You should obtain something like the following picture:
In the object list, click on the other THEOPHYLLINEs and observe how the orientations change while the distances remain conserved within the same clusters of nodes. Then click on the CAFFEINEs: the distances disappear, which means that CAFFEINE has no pair of nodes separated by the activated distances.
ALMOND can show the actual values of the correlograms, either for one compound or, superimposed, for all the compounds in the series. In this example, we will represent the 12 cross-correlogram, obtained from pairs of nodes computed with probes N1 and O.
This can be monitored by means of the MACC2 profile:
- [MENU] Plot>>>Correlogram>>>ALMOND series...
- [DIALOG] deselect correlograms 11 and 22, press OK
Each point on the plot represents a pair of nodes of a single compound. The X axis is the same as for the PCA profile, but the Y axis represents the energy product of the pair of nodes. Clicking a point with the left mouse button links all the points of the same object and displays the name of the selected variable.
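How such a profile could arise from node pairs is sketched below in Python. The sketch assumes that each variable corresponds to a range of node-node distances and that, for each range, only the pair with the highest energy product is kept. This is one plausible reading of the MACC2 encoding and is not taken from the ALMOND source. The function name, the coordinates, the energies and the 1.0 Å bin width are all hypothetical.

```python
import numpy as np

def max_product_correlogram(coords_a, energies_a, coords_b, energies_b,
                            bin_width, n_bins):
    """For each distance bin, keep the largest energy product over all
    pairs formed by one node from set A and one node from set B."""
    correlogram = np.zeros(n_bins)
    for pa, ea in zip(coords_a, energies_a):
        for pb, eb in zip(coords_b, energies_b):
            distance = np.linalg.norm(pa - pb)
            k = int(distance // bin_width)   # index of the distance bin
            if k < n_bins:
                correlogram[k] = max(correlogram[k], ea * eb)
    return correlogram

# Hypothetical nodes for two probes (coordinates in Å, energies in kcal/mol).
coords_n1 = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
energies_n1 = np.array([-3.0, -2.0])
coords_o = np.array([[1.5, 0.0, 0.0], [0.0, 3.5, 0.0]])
energies_o = np.array([-4.0, -1.0])

profile = max_product_correlogram(coords_n1, energies_n1,
                                  coords_o, energies_o,
                                  bin_width=1.0, n_bins=8)
print(profile)
```

In this toy case two pairs fall in the bin covering 3 to 4 Å, with products 3 and 8, and only the larger one is kept. A bin in which a compound has no node pair stays at zero, which is consistent with distances disappearing from the Grid plot for such a compound.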

Click on some unlinked points and compare the names of the points with a low energy product with the names of the points with a high energy product. The energy products of CAFFEINE are much lower than those of THEOPHYLLINE; this is due to the type of interaction involved. The O probe interacts strongly with a THEOPHYLLINE nitrogen that is a hydrogen bond donor, whereas CAFFEINE has no hydrogen bond donor, so the O probe interacts only weakly with the aromatic ring.
Please note that the values of the descriptors obtained for the different orientations of a compound are not 100% identical. Since the molecules are situated in different positions within the 3D grid, the interaction energies measured are slightly different. This is a consequence of the fact that GRID samples a continuous MIF only at regular intervals (the grid spacing), and therefore the sampled MIFs exhibit small differences. Provided that the grid spacing is short enough, these inaccuracies should not cause any inconvenience in practice.
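The effect of the grid spacing can be illustrated with a one-dimensional toy model in Python. The Gaussian energy well, its width and the helper names are assumptions made for this sketch and have nothing to do with the actual GRID energy function. The sketch only shows that a regular grid sees a slightly different minimum depending on where it lies relative to the well, and that a finer grid reduces this dependence.

```python
import numpy as np

def sampled_minimum(spacing, offset, centre=0.0, width=0.6):
    """Lowest value of a smooth 1D energy well as seen by a regular grid."""
    x = np.arange(-5.0 + offset, 5.0, spacing)
    energy = -np.exp(-((x - centre) ** 2) / (2.0 * width ** 2))
    return energy.min()

def orientation_spread(spacing, n_offsets=20):
    """Range of sampled minima when the grid is shifted relative to the well."""
    offsets = np.linspace(0.0, spacing, n_offsets, endpoint=False)
    minima = [sampled_minimum(spacing, o) for o in offsets]
    return max(minima) - min(minima)

coarse = orientation_spread(1.0)   # 1.0 Å grid spacing
fine = orientation_spread(0.5)     # 0.5 Å grid spacing, as in this tutorial

print(f"spread with 1.0 Å spacing: {coarse:.3f}")
print(f"spread with 0.5 Å spacing: {fine:.3f}")
```

Shifting the grid plays the role of placing the molecule in a different position within the grid. The spread of the sampled minima is clearly smaller for the finer spacing.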