Chapter 36. Studying a set of targets with Grid

A set of targets is a group of compounds that are studied one after the other. Several different probes can be used on each target, so N targets and M probes generate N x M separate Grid maps (GRIDKONT files). It is convenient to outline the procedure in reverse order: we begin with programme Grid and then consider Grin and Great.

36.1. Preparing a "file.list" for programme Grid

The normal input to programme Grid is a GRINKOUT file of atom coordinates and energy variables, previously prepared by programme Grin. However, an alternative procedure is also available. Instead of reading the actual GRINKOUT file itself into Grid through channel INPT, one can read a pointer to the GRINKOUT file. Programme Grid behaves differently when this is done, and its output can be printed in eye-readable ASCII characters or in computer-readable binary. The ASCII output can be sent over a network to another computer, if that is necessary for a statistical analysis of the Grid results. In fact channel INPT can accept several pointers each on a different line, each pointing to a different GRINKOUT file. The input file for Grid might therefore look like this:

  phenol.kout
  phenolate.kout
  pyridine.kout

This list of file names could be input to Grid through channel INPT, and would cause Grid to process the three files one after the other. The suffix .kout is assumed by programme Grid, and the Grid input could therefore be shortened to:

  phenol  
  phenolate  
  pyridine

This shorter input style without suffixes is recommended. Moreover, a MESSAGE of up to 11 characters can be associated with each filename. The MESSAGE should appear on the same line as the name, separated from it by at least one space. Since free format is used, it could appear like this:

  phenol     RED
     phenolate             WHITE
pyridine BLUE

in which each MESSAGE is a colour, but of course the input file would be much better arranged something like this:

  phenol     RED  
  phenolate  WHITE 
  pyridine   BLUE

This list of names is called a file.list. The "TAB" character should not be used to format it.

Note that there must be at least one space between the compound name and the MESSAGE, but spaces are NOT permitted in the MESSAGE itself. In practice, of course, the MESSAGE is normally a number such as the biological activity of the compound.
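The formatting rules above can be checked before a run with a short script. The following is a minimal sketch in Python, not part of the GRID distribution: the function name parse_file_list is hypothetical, and skipping blank lines is our assumption, since the manual does not say how Grid treats them.

```python
def parse_file_list(text):
    """Parse the text of a file.list into (name, message) pairs.

    message is None when a line carries only a target name.
    Rules taken from the manual: no TAB characters, at least one space
    between name and MESSAGE, no spaces inside the MESSAGE, and a MESSAGE
    of at most 11 characters.
    """
    entries = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            raise ValueError(f"line {lineno}: TAB characters are not allowed")
        fields = line.split()  # free format: any number of spaces
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) > 2:
            raise ValueError(f"line {lineno}: spaces are not permitted in the MESSAGE")
        name = fields[0]
        message = fields[1] if len(fields) == 2 else None
        if message is not None and len(message) > 11:
            raise ValueError(f"line {lineno}: MESSAGE is longer than 11 characters")
        entries.append((name, message))
    return entries


if __name__ == "__main__":
    example = "  phenol     RED\n  phenolate  WHITE\n  pyridine   BLUE\n"
    for name, message in parse_file_list(example):
        print(name, message)
```

The parser accepts both the untidy and the tidy layouts shown above, because only the order of the fields on each line matters.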

36.2. Using the "file.list" for programme Grin

In the above example, it would have been necessary to prepare the three ".kout" files with Programme Grin before running them as a set through Grid. The same type of input procedure can therefore be used with programme Grin, but it would be necessary to start with the three PDB (Brookhaven) files: phenol.pdb phenolate.pdb and pyridine.pdb as the Grin input.

The input to Grin would therefore consist of datafile grub.dat as usual, and the other input to Grin would be precisely the same file as shown above:

  phenol     RED  
  phenolate  WHITE  
  pyridine   BLUE

In this case however for programme Grin, the file would be input through channel INKO and the implied suffix would be .pdb. Thus the input files for Grin would be phenol.pdb phenolate.pdb and pyridine.pdb and the one list of filenames (without suffixes, but with messages if need be) would define the targets to be used by both programmes Grin and Grid.
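Because one list serves both programmes, it can be useful to see which files each programme will look for. The sketch below is illustrative only: the helper names implied_file and missing_inputs are hypothetical, and leaving a name unchanged when it already carries the suffix is an assumption on our part.

```python
from pathlib import Path

GRIN_SUFFIX = ".pdb"   # suffix implied by Grin (channel INKO)
GRID_SUFFIX = ".kout"  # suffix implied by Grid (channel INPT)


def implied_file(name, suffix):
    """Return the file name implied by a file.list entry.

    Assumption: an entry that already ends with the suffix is left as it is.
    """
    return name if name.endswith(suffix) else name + suffix


def missing_inputs(names, suffix, directory="."):
    """List the implied input files that do not exist in the given directory."""
    implied = [implied_file(name, suffix) for name in names]
    return [f for f in implied if not (Path(directory) / f).is_file()]


if __name__ == "__main__":
    targets = ["phenol", "phenolate", "pyridine"]
    print("Grin reads:", [implied_file(t, GRIN_SUFFIX) for t in targets])
    print("Grid reads:", [implied_file(t, GRID_SUFFIX) for t in targets])
    print("Missing for Grin:", missing_inputs(targets, GRIN_SUFFIX))
```

Running the check with GRIN_SUFFIX before Grin, and with GRID_SUFFIX before Grid, catches a missing target before the set is processed.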

Warning: molecule names in pdb files

Three columns of characters in a pdb file are reserved for the name of the molecule (eg: GLY for glycine). We recommend that each molecule in a set of targets should have a different three letter name. For example the molecule names in files phenol.pdb phenolate.pdb and pyridine.pdb are PHE and PHA and PYR. Note that phenolate has the molecule name PHA and not PHE which had already been used for phenol.
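A clash of molecule names can be detected before running Grin. The following sketch assumes the standard PDB layout, in which the molecule (residue) name occupies columns 18 to 20 of each ATOM or HETATM record. The function names are hypothetical, and the file contents are passed in as text so that the example stays self-contained.

```python
def residue_names(pdb_text):
    """Collect the three-letter molecule names used in ATOM/HETATM records."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            # Columns 18-20 (1-based) hold the residue name.
            names.add(line[17:20].strip())
    return names


def clashing_names(pdb_texts):
    """Report molecule names that occur in more than one file.

    pdb_texts maps a file name to the text of that PDB file.
    Returns a dict: molecule name -> sorted list of files that use it.
    """
    owners = {}
    for filename, text in pdb_texts.items():
        for name in residue_names(text):
            owners.setdefault(name, []).append(filename)
    return {name: sorted(files) for name, files in owners.items() if len(files) > 1}


if __name__ == "__main__":
    record = "HETATM%5d %-4s %3s"
    files = {
        "phenol.pdb": record % (1, " C1 ", "PHE"),
        "phenolate.pdb": record % (1, " C1 ", "PHE"),  # should have been PHA
        "pyridine.pdb": record % (1, " N1 ", "PYR"),
    }
    print(clashing_names(files))
```

An empty result means that every target in the set has its own three-letter name, as recommended above.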

36.3. Preparing a set of targets with programme Great

The ATOMS and HETATMS in a PDB file must have correct atom names, before the file can be used as input for programmes Grin and Grid. In fact the names must match the atom names in datafile GRUB, and this match can be assured by preprocessing the files through Programme GREAT. The procedure is described in Section 4.3.

It is important to note that GRID provides two new tools, gmol2 and gsdf, which assign atom types automatically, quickly, and with a very high degree of accuracy, without using the semiautomatic mode of Great. Gmol2 and gsdf convert molecular structures of small compounds from standard formats (Tripos mol2 and MDL SD files) into a PDB file in which the atom names contain appropriate atom types for GRID. The output file is suitable for direct use as input for programme Grin. See Section 44.8 for more information.

Great can also be controlled by a "file list" when it preprocesses the files, but its file list must have the special name great.list. You can normally prepare the file.list for programmes Grin and Grid by editing great.list, or by using the utility programme g2f (see Section 44.10). There is no limit on the number of targets in great.list or in a file.list for Grin.

In outline then the three programmes Great, Grin and Grid can be used one after the other in order to process several targets as a set. The detailed procedure will now be described.

36.4. General procedure for inputting a set

The procedure for studying several target molecules as a set is therefore as follows:

  1. Reorganise your coordinate input files so that they are all in the same directory, and use programme Great to pre-process the files as described in Section 4.3.

  2. You may need to adjust the target coordinates so that all the molecules are superimposed on top of each other, at the same place in a suitable coordinate framework.

  3. Prepare a file.list containing the names of the compounds which are to be studied and their associated "MESSAGES". You will often be able to prepare this file.list by editing a great.list file. We have supplied the three PDB input files called: phenol.pdb phenolate.pdb and pyridine.pdb on the tutorials directory, as input for your first trial run with a Set of Targets. Please make sure that you use these files exactly as they were supplied on the tutorials. We have also supplied a file called FILE.LIST which looks like this:

      phenol      1.5
      phenolate  -1.5  
      pyridine    0.5
    and the 'MESSAGES' for these compounds are the numbers 1.5 -1.5 and 0.5 (These are rough values for the octanol-water partition coefficients of the compounds). Note that the file extension (.pdb or .kout) is omitted from the FILE.LIST since it is to be used as the main input file for both Programmes GRIN and GRID.

  4. Edit the command file grin.in for Programme GRIN. FILE.LIST should be the input file assigned to channel INKO. You use FILE.LIST instead of the PDB input file which you would normally use for Programme GRIN, and you use datafile GRUB.DAT for the input to Programme GRIN as usual.

  5. Start GRIN by typing grin<grin.in under UNIX. GRIN will generate three .KOUT output files called phenol.kout, phenolate.kout and pyridine.kout. All error messages and warnings will go to a single lineprinter output file, GRINLOUT.DAT.

  6. Edit the command file grid.in for Programme GRID. Programme GRID also uses the same file FILE.LIST as its main input, instead of the GRINKOUT file normally used. In the present case FILE.LIST will tell Programme GRID to use phenol.kout phenolate.kout and pyridine.kout as three successive input files. FILE.LIST is the input file assigned by 'grid.in' under UNIX. Figure 36-1 below shows the command file used for this demonstration run.

  7. Type grid<grid.in in order to run programme Grid. You should get one output file GRIDKONT containing coordinates and energy values, and another lineprinter output file GRIDLONT. Both these files will normally be written in ASCII (See below).

  8. Note that each Target can be studied with several different Probes when you use a FILE.LIST as input. You would just enter the Probe symbols into the command file for GRID on separate lines as usual like this:

     C3 
     OH2 
     N3+ 
     O::
    Thus with three Targets (phenol phenolate and pyridine) and four Probes, you would generate twelve separate Grid maps. The coordinates of the grid points and the energy values for all of the maps would be in the one GRIDKONT output file.

  9. When Directive MOVE>1, some atoms of the Target are allowed to move under the influence of the Probe. However, each molecule would be influenced differently by each Probe, and would assume a different conformation. It might therefore be difficult or impossible to interpret results from several Targets, each studied with several Probes, so the values MOVE>1 ARE NOT ALLOWED when you are studying a set of compounds with a FILE.LIST.

  10. NPLA: Note that the list of directives in 'grid.in' can include fractional values of NPLA when you are studying several Targets as a Set. For instance NPLA 0.5 would give grid points 2 Angstrom apart, or NPLA 0.6667 would give a spacing of 1.5 Angstrom.

  11. LIST: Note that Directive LIST should normally take the value 1 when you are studying a Set of Targets. However, it may also be assigned the value -1 in order to reduce the size of the output. If you use LIST=2 instead of LIST=1 the output will be in computer-readable binary. Similarly, it will be in binary if you use -2 instead of -1. See above under the description of Directive LIST.

  12. POSI: Note that directive POSI can be used repeatedly in order to define a list of explicitly selected grid points for study. The maximum acceptable number of POSI positions is 100000. You may want to write a jiffy program which will generate the coordinates of suitable grid points as a set of POSI positions for your particular research problem. For instance, you could have the grid points distributed at spherically or cylindrically defined positions, or on the surface of a molecule, instead of an orthogonal grid.

  13. DWAT: The molecules in a Set of Targets are often studied in order to predict their interactions with an enzyme or other receptor site. Such a site may be relatively well shielded from the aqueous environment, and the presence of a ligand molecule bound in the site may further tend to exclude water. In these circumstances it may be appropriate to reduce the value of DWAT, which determines the effective dielectric constant of the surroundings. In the present demonstration run, DWAT was set to 20.0 as shown in Figure 36-1.

  14. LENG and NUMB: We recommend that you set these directives to small values (or zero) if you want to have a reasonably small GRIDLONT lineprinter output file.

  15. KWIK: Note that directive KWIK may not take the values 1 or 2 when the input to GRID comes from an input file list like FILE.LIST.

  16. KWIK: If you set KWIK=3 for a Set of Compounds, then the size of the grid will be automatically adjusted. It will be increased in order to take account of the fact that some of the compounds and some of the Probes may be charged, and may therefore have a far-reaching electrostatic field. In the present example phenolate is charged, and the size of the grid will be increased accordingly.

  17. KWIK: If you set KWIK=4 then the output to GRIDKONT will be like that with KWIK=3 but with one of the following additional effects:

    • If directive VALU has its default value of 0.0 then the output to GRIDKONT will be shortened by the elimination of 'wasted' grid points. By a 'wasted' grid point is meant one that is distant from all the Target molecules, in a region of space where the energy is mostly electrostatic, and where the energy is small and only varies gradually from point to point. You will obtain a shorter output to GRIDKONT which may therefore be easier to study, but of course you will no longer get results for all the grid points of a complete rectangular grid.

    • Alternatively you may set directive VALU to a non-zero number. In this case the GRIDKONT file will be of the normal size that it would have been with KWIK 3, but all the 'wasted' grid points will be given the non-zero value of VALU. For instance, if VALU = -99.99 then every 'wasted' point will be -99.99. You can then write a jiffy program to compute explicit numerical values to replace each missing value, so that subsequent statistical calculations will not be biased by the loss of the wasted points.

  18. KWIK: When KWIK is 3 or 4 then the size of the grid depends on the charges of the Targets; on the charges of the Probes; and on the dielectric environment. Two different situations can then occur:

    • If all the Targets were studied one after the other, using all the Probes one after the other in a single grid run, then programme Grid would make sure that all the grids were identical (Same size; same place; same spacing). This is the recommended method.

    • On the other hand if the run were broken down so that for example:

      • The Targets were studied as three separate subsets in three different Grid runs, or

      • The Targets were studied all together using the first Probe in the first Grid run, and then studied again using the second Probe in a second run and so on, then the Grid sizes in each run would most probably be different. For example, the Grid would be larger for a highly charged Probe and smaller if the Probe were neutral. This is NOT recommended.

  19. VALU: Note that rounding errors may occur under some circumstances, if you define the "missing value" VALU as a very small number like 0.00001. In particular, a "rounded missing value" of 0.0 will be printed to the GRIDKONT output file, if that file is written in eye-readable ASCII. That "rounded missing value" of 0.0 may or may not be what you wanted. We therefore recommend -99.99 as the most suitable "missing value", since this will not be rounded down and is not a physically possible energy value in Kcal/mole. On the other hand the GRIDKONT file will be written in binary, if you have set directive LIST to 2 or -2. In this case a missing value such as 0.00001 can be written to the binary file without significant rounding errors. However we still recommend VALU = -99.99 because this is not physically possible for a Grid Probe, whereas 0.00001 could be a genuine result from programme Grid.

  20. When studying a Set of compounds one after the other, it is essential to have them all in the same part of the coordinate space. The grid may be very big if some of the compounds are in different places, since it will surround all the compounds in the whole Set.

    From version 22 of the programmes, new procedures have been implemented to speed up the calculation, and new values of KWIK have been introduced. With KWIK set to 5, 6 or 7, each Target is considered individually for a fast zeroing of distant grid points (KWIK=5). In addition, grid points inside the Target are omitted from the calculation (KWIK=6). Lastly, the eight corners of the cube are left out of the calculation, because the grid cage is modified to an ellipsoid (KWIK=7).
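Items 10 and 12 above lend themselves to a small jiffy program. The sketch below is only an illustration. It assumes that the grid spacing in Angstrom is the reciprocal of NPLA, which agrees with the two examples in item 10, and it spreads points over a sphere with a Fibonacci lattice, which is our choice and not a method prescribed by GRID. The layout of a POSI line is not described in this chapter, so the program prints bare x, y, z coordinates and leaves the formatting of the directive to the User.

```python
import math

MAX_POSI = 100000  # maximum number of POSI positions quoted in item 12


def spacing_from_npla(npla):
    """Grid spacing in Angstrom, assuming spacing = 1 / NPLA."""
    if npla <= 0:
        raise ValueError("NPLA must be positive")
    return 1.0 / npla


def sphere_points(centre, radius, n):
    """Return n points spread roughly evenly over a sphere (Fibonacci lattice)."""
    if not 1 <= n <= MAX_POSI:
        raise ValueError(f"number of points must be between 1 and {MAX_POSI}")
    cx, cy, cz = centre
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n    # height on the unit sphere, in (-1, 1)
        r = math.sqrt(1.0 - z * z)       # radius of the circle at that height
        theta = golden_angle * i         # rotate by the golden angle each step
        points.append((cx + radius * r * math.cos(theta),
                       cy + radius * r * math.sin(theta),
                       cz + radius * z))
    return points


if __name__ == "__main__":
    print("NPLA 0.5 gives a spacing of", spacing_from_npla(0.5), "Angstrom")
    for x, y, z in sphere_points((0.0, 0.0, 0.0), 5.0, 10):
        print(f"{x:8.3f} {y:8.3f} {z:8.3f}")
```

The same pattern can be adapted to cylindrical positions or to points on a molecular surface by replacing sphere_points with a suitable generator.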

Warning: directories in file.list

Care is needed if you are working with a FILE.LIST in one directory, but the Files of the FILE.LIST are in another directory. In that case we suggest that you specify the directory name of each file IN THE FILE.LIST something like this:

  /heythere/phenol      1.5
  /heythere/phenolate  -1.5
  /heythere/pyridine    0.5

:               MOLECULAR DISCOVERY LIMITED
:               ***************************
:
:             Command File for Programme GRID
:
:
:  Assign Channel Numbers and Output File names:
:  ---------------------------------------------
:
 LONT      6
 LONT    gridlont.dat
 KONT     20
 KONT    gridkont.dat
:
:  Assign Channel Numbers and Input File names:
:  --------------------------------------------
:
 INPT     10
 INPT    file.list
:
:  Provide Control Parameters:
:  ---------------------------
:
 CLER      5.000
 DEEP      5.000
 DPRO      4.000
 DWAT     20.000
 EACH      5.000
 EMAX      5.000
 FARH      5.000
 FARR      8.000
 KWIK      4
 LEAU      0
 LENG     10
 LEVL      1
 LIST      1
 MOVE      0
 NETA      0
 NPLA      0.200
 NUMB      1
 VALU      0.000
 C3
 OH2
 N3+
 O::
 IEND
 First trial with file.list
       0    1

Figure 36-1. Demonstration unix-type command file grid.in for studying a set of compounds.
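Three restrictions apply to the directives when a FILE.LIST is used: values of MOVE greater than 1 are not allowed (item 9), KWIK may not be 1 or 2 (item 15), and LIST must not be zero (see the note on directive LIST in Section 36.4.1). The sketch below checks a set of directive values against these rules before Grid is started. It is a hypothetical helper, not part of GRID, and it only checks directives that are actually supplied, because the default values are not stated in this chapter.

```python
def check_set_directives(directives):
    """Return a list of problems for a run on a Set of Targets.

    directives maps directive names (as in the command file) to numbers.
    Directives that are absent are not checked.
    """
    problems = []
    move = directives.get("MOVE")
    if move is not None and move > 1:
        problems.append("MOVE>1 is not allowed with a FILE.LIST")
    if directives.get("KWIK") in (1, 2):
        problems.append("KWIK may not be 1 or 2 with a FILE.LIST")
    if directives.get("LIST") == 0:
        problems.append("LIST must not be zero for a Set of Targets")
    return problems


if __name__ == "__main__":
    # Values taken from the demonstration command file in Figure 36-1.
    demo = {"KWIK": 4, "LIST": 1, "MOVE": 0, "NPLA": 0.2, "VALU": 0.0}
    print(check_set_directives(demo) or "no problems found")
```

An empty list means that none of the three restrictions is violated. It does not guarantee that the command file is otherwise correct.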

36.4.1. Format of ASCII output files from a set

With the various arrangements described above, several different layouts of the GRIDKONT output file are possible. We have therefore arranged programme Grid so that GRIDKONT can be printed in ASCII when you are studying a Set of compounds. It will be printed in ASCII if:

  • Several Targets are being studied as a Set, and if

  • Directive LIST equals 1 or -1

The ASCII output can be transmitted across a network from one computer to another, and its layout can be examined in detail by the User, who is then advised to process the GRIDKONT output for statistical analysis as described below (see under CoMFA or GOLPE or SIMCA). Alternatively one can write a jiffy program to rearrange and/or select the output as required for the current research project. The jiffy program could also replace our recommended "missing value" of -99.99 with another value. When you are satisfied that everything is working correctly with the ASCII output, we suggest that you change to binary output instead. The binary output for a Set of Compounds is exactly equivalent to the ASCII format, and binary files may take up less storage space on disc. Binary output is obtained by setting Directive LIST equal to 2 or -2.
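A jiffy program for replacing missing values might look like the sketch below. The layout of the GRIDKONT file is not described in this section, so the example assumes that the energy values have already been read into a list of numbers. The function names are hypothetical, and the replacement value is supplied by the User.

```python
MISSING = -99.99  # the recommended "missing value" for directive VALU


def is_missing(energy, missing=MISSING, tolerance=1e-6):
    """True if an energy equals the missing value, allowing for rounding."""
    return abs(energy - missing) <= tolerance


def replace_missing(energies, replacement, missing=MISSING):
    """Return a copy of energies with every missing value replaced."""
    return [replacement if is_missing(e, missing) else e for e in energies]


def count_missing(energies, missing=MISSING):
    """Count the 'wasted' grid points in a list of energies."""
    return sum(1 for e in energies if is_missing(e, missing))


if __name__ == "__main__":
    energies = [-1.25, -99.99, 0.30, -99.99]
    print(count_missing(energies), "missing values")
    print(replace_missing(energies, 0.0))
```

The comparison uses a small tolerance because a value read back from an ASCII file may differ slightly from the number that was written.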

Note on directive LIST: Directive LIST must not have the value zero when several Target molecules are being studied as a SET one after the other. LIST should have the value 1 in order to obtain the above ASCII layout of the GRIDKONT file. If LIST = -1 then the list of grid points and coordinates will not be repeated in the GRIDKONT ASCII file for each subsequent Probe after the first. A shorter output will therefore be obtained. If LIST is 2 or -2 the output will be in computer-readable binary. See below, and see above under the instructions for Directive LIST.

36.4.1.1. Binary output files from a set

The GRIDKONT file is normally written in ASCII when many Targets are being studied as a Set, so that its layout can be inspected by the User. However, setting Directive LIST = 2 (instead of 1) will give a similar layout to LIST = 1 but in computer readable binary. The binary output file:

  1. may be shorter than ASCII;

  2. may be written faster;

  3. may require less disc space;

  4. can have very small "missing values";

  5. can be read more quickly by the computer which wrote it than an ASCII file.

  6. However binary cannot be read by eye, and

  7. the binary file may not be compatible with other computers or networks.

Setting Directive LIST = -2 will give a binary output equivalent to the ASCII output from LIST = -1.
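The four permitted values of Directive LIST for a Set of Targets can be summarised in a small lookup. This is a sketch of the rules stated in this section only, and the function name describe_list is hypothetical.

```python
def describe_list(value):
    """Describe the GRIDKONT output produced by a LIST value for a Set.

    Returns a dict with the file format and whether the grid points and
    coordinates are repeated for each Probe after the first.
    """
    table = {
        1: ("ASCII", True),
        -1: ("ASCII", False),   # shorter output
        2: ("binary", True),    # same layout as LIST = 1
        -2: ("binary", False),  # same layout as LIST = -1
    }
    if value not in table:
        raise ValueError("for a Set of Targets, LIST must be 1, -1, 2 or -2")
    output_format, repeated = table[value]
    return {"format": output_format, "coordinates_repeated": repeated}


if __name__ == "__main__":
    for value in (1, -1, 2, -2):
        print(value, describe_list(value))
```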

Figure 36-2. Flow chart for dealing with a set of compounds

Note: The flow-chart shows how the single big GRIDKONT output file containing all the results from all the Targets and all the Probes, can be split up by Programme GCNT into a set of .CNT files suitable as input for CoMFA. Several other Programmes (eg: GCHEM or GINS) are provided to do a similar job, dividing the big GRIDKONT file into a set of files suitable for Chem-X or INSIGHT, etc. Another Programme (GKONT) divides the Big file into a set of regular GRIDKONT files, each containing one regular Grid map for one Probe on one Target.
