GRID manual
Chapter 37. Output from GRID for CoMFA
CoMFA is a statistical package introduced by TRIPOS Associates Inc which can be used in order to analyse GRID maps. Full information about their products may be obtained directly from the supplier:
TRIPOS Associates Inc. 1699 S. Hanley Rd. St. Louis, Missouri 63144, USA.
Telephone: (314) 647-1099
Web Site: http://www.tripos.com
Users must have been supplied with Programme GRID by Molecular Discovery Limited, and must have valid authorities to use GRID and SYBYL and CoMFA. Unauthorized users are liable to prosecution.
CoMFA can be used to analyse the output from GRID when a Set of Targets has been studied with one or more Probes. In order to do this, it is necessary to:
postprocess the GRIDKONT output
prepare a "Molecular Database" for CoMFA. This database should contain precisely the same set of compounds as you want to study with GRID.
run a macro called G2C.
The following instructions show how to do this. In this description of the procedure it is assumed that all the files are in the same Working directory of a Silicon Graphics computer, and that version 6.1 of SYBYL/CoMFA is installed on the same machine. The following Directive values were used for this example GRID run:
DWAT 20.0 KWIK 4 LIST 1 NPLA 0.2 VALU 0.0
and the Probes were: C3 OH2 N3+ and O::
All other Directives had their default values as shown in Figure 36-1 above. FILE.LIST and the three Targets: phenol.pdb phenolate.pdb and pyridine.pdb were used exactly as supplied on the Molecular Discovery material.
PLEASE NOTE that this description is only intended to demonstrate how the procedure works. Dummy data is used. The manuals distributed by Tripos Associates Inc. describe the theory, and explain how to set up a real research problem.
PERMISSIONS
EVERY USER OF THE SYBYL/CoMFA SOFTWARE MUST HAVE OBTAINED DIRECTLY FROM THE SUPPLIER A CURRENT VALID AUTHORITY TO USE IT. Your supplier should also be consulted for the most up-to-date information.
37.1. Studying the GRIDKONT file
Run the demonstration job using the command file in Figure 36-1 and the procedure of Figure 36-2. We suggest that you then study the GRIDKONT output file. It is written in ASCII because the value of Directive LIST was 1 in the GRID command file. It is quite small, because Directive NPLA was 0.2 so that the grid points are 5 Angstrom apart! Of course this distant spacing would be quite inappropriate for a real research problem.
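The relationship between NPLA and the grid spacing quoted above can be checked with one line of arithmetic. The sketch below assumes, as the text implies, that NPLA is the number of grid planes per Angstrom, so that the spacing is its reciprocal. The function name is illustrative only and is not part of GRID.

```python
def grid_spacing_angstrom(npla: float) -> float:
    """Return the distance between grid planes, assuming spacing = 1 / NPLA."""
    if npla <= 0:
        raise ValueError("NPLA must be positive")
    return 1.0 / npla


if __name__ == "__main__":
    # NPLA 0.2 is the demonstration value used in this chapter.
    print(f"NPLA 0.2 -> {grid_spacing_angstrom(0.2):.1f} Angstrom between planes")
    # A denser setting, shown only for comparison (hypothetical value).
    print(f"NPLA 2.0 -> {grid_spacing_angstrom(2.0):.1f} Angstrom between planes")
```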
The file starts with a list of the coordinate positions of the grid points. Study this list carefully. Some expected grid points may be omitted, because the value of Directive KWIK was 4. These would be the so-called "wasted grid points" at places where the computed Grid energies do not differentiate between the compounds. Your Grid will not be complete if your output has "wasted grid points".
The list of coordinates is followed by a one-line Header and a list of energy values for the interaction of the first Probe with the first compound. The number of energies will be equal to the actual number of grid points. Further lists will follow one after the other as described above.
37.2. Post-processing the GRIDKONT file
When several Targets have been studied as a Set using several Probes as described above, the GRIDKONT output file must usually be post-processed. Several Programmes are provided for this job, and they all work like Programme GCNT which is described in detail below. The individual Programmes are:
PROGRAMME GCHEM. When several Targets have been studied together as a Set, using a FILE.LIST, this Programme converts the output from GRID into a Set of .CHE files which can be used as input to the Chem-X graphics software of Oxford Molecular.
Note: You must set Directive LEVL>0 if you are doing a Grid run, and want to postprocess the output through GCHEM.
PROGRAMME GCNT. When several Targets have been studied together as a Set, using a FILE.LIST, this Programme converts the output from GRID into a Set of .CNT files which can be used as input to the CoMFA package of TRIPOS Associates Inc.
PROGRAMME GINS. When several Targets have been studied together as a Set, using a FILE.LIST, this Programme converts the output from GRID into a Set of .GRD files which can be used as input to the INSIGHT graphics software of MSI.
Note: You must set Directive LEVL>0 if you are doing a Grid run, and want to postprocess the output through GINS.
PROGRAMME GKONT. When several Targets have been studied together as a Set, using a FILE.LIST, this Programme converts the output from GRID into a Set of standard GRIDKONT files each having one Grid Map for one Probe on one Target molecule.
Note: You must set Directive LEVL>0 if you are doing a Grid run, and want to postprocess the output through GKONT.
PROGRAMME GSIM. When several Targets have been studied together as a Set, using a FILE.LIST, this Programme converts the output from GRID into a format which can be used as input to the SIMCA package of UMETRI.
37.3. Post-processing GRIDKONT for CoMFA
When several Targets have been studied as a Set using several Probes as described above, the GRIDKONT output file must be postprocessed before it can be used as input for CoMFA. This is done with a Programme called GCNT, which runs directly from the keyboard. The executable is called gcnt, and this should be typed on a line by itself in response to the Unix prompt %:
% gcnt
One is asked to:
Type name of your GRIDKONT input file:
and one answers with the name of the GRIDKONT output file generated by GRID for a Set of Targets with several Probes as described above. The next question is:
Is this an ASCII file which you can read by eye?
and the answer is YES unless the binary output option was used (as described above) in which case the answer must be NO.
A set of standard SYBYL/CoMFA files is then automatically produced in the current working directory by Programme GCNT. There is one file for each Target/Probe combination, and these will be the input files for CoMFA. The name of each file is a combination of the Target name and the Probe number followed by the standard SYBYL/CoMFA extension .cnt. Thus if the Targets phenol and phenolate and pyridine had each been processed by GRID with the four Probes C3 OH2 N3+ and O:: the files for CoMFA would be called:
phenol1.cnt phenol2.cnt phenol3.cnt phenol4.cnt
phenolate1.cnt phenolate2.cnt phenolate3.cnt phenolate4.cnt
pyridine1.cnt pyridine2.cnt pyridine3.cnt pyridine4.cnt
These files are ready for input to CoMFA using the macro G2C.
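The naming rule described above (Target name, then Probe number, then the extension) is easy to reproduce when you want to check that GCNT has written every expected file. The following is a minimal sketch of that rule only. The function name is hypothetical, and the Probe numbers are assumed to run from 1 upwards in the order in which the Probes were given to GRID.

```python
from itertools import product


def cnt_file_names(targets, n_probes, extension=".cnt"):
    """Build Target-name + Probe-number file names, one per Target/Probe pair."""
    return [
        f"{target}{probe}{extension}"
        for target, probe in product(targets, range(1, n_probes + 1))
    ]


# The three Targets and four Probes of the demonstration run.
names = cnt_file_names(["phenol", "phenolate", "pyridine"], 4)
print(len(names), "files expected:")
print(" ".join(names))
```

Comparing this list with the contents of the working directory shows at a glance whether any Target/Probe combination is missing.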
NOTE ON FILE NAMES
You would prepare a similar batch of files if you used one of the other Programmes (GCHEM, GINS or GKONT) in order to post-process the big GRIDKONT file which was generated by using a FILE.LIST. However the file extension names would be different, in order to reflect the different format of the file:
GCHEM gives files for Chem-X with the extension .CHE
GINS gives files for INSIGHT with the extension .GRD
GKONT gives regular GRIDKONT files ending with .KONT
The output from GSIM is described in detail below.
37.4. Preparing a molecular database for CoMFA
CoMFA works on compounds which are defined in a special SYBYL/CoMFA file called a "Molecular Database". If you do not already have a Molecular Database for the compounds phenol, phenolate and pyridine in the present example, it is now necessary to prepare one. This can be done with the following commands at the SYBYL prompt:
Sybyl > brook in M1 phenol.pdb no
Sybyl > brook in M2 phenolate.pdb no
Sybyl > brook in M3 pyridine.pdb no
These commands place the three molecules (which are defined in Brookhaven PDB Format) into three separate "Molecular Areas" known as M1 M2 and M3 in SYBYL. The instruction "no" at the end of each line tells SYBYL not to move the positions of the molecules, which were initially chosen so that the three Targets would be correctly aligned with each other.
Sybyl will type three or four lines of output (which may normally be ignored) after each command, and you then type:
Sybyl > modify molecule name M1 phenol
Sybyl > modify molecule name M2 phenolate
Sybyl > modify molecule name M3 pyridine
These commands assign the names phenol, phenolate and pyridine to the appropriate Molecular Areas in SYBYL/CoMFA, and one then enters "Database Mode" in SYBYL with the following dialogue:
Sybyl > mode database
Database Command > create
Database Name > mols
A directory "mols.mdb" is created and opened in UPDATE mode
Database Command > add M1
phenol is added to Database
Database Command > add M2
phenolate is added to Database
Database Command > add M3
pyridine is added to Database
Database Command > |
Sybyl >
This dialogue first creates a new Molecular Database Directory called MOLS.MDB, and then adds our three compounds to it. (Note that .MDB is the default extension in SYBYL/CoMFA for a database directory). After this dialogue you should have a new directory called MOLS.MDB in your working directory. Note that the "end loop" character | causes SYBYL to leave "Database Mode" and return to the standard SYBYL prompt as shown above.
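For a larger Set of Targets it can help to write out the lines to be typed before starting SYBYL. The sketch below only prints the same keyboard replies that appear in the dialogues of this section, generalised to any list of PDB files. It assumes that the Molecular Areas are numbered M1, M2, ... in file order and that each compound is named after its file. The function name is hypothetical, and whether SYBYL will accept these lines from a script rather than from the keyboard is not established here.

```python
from pathlib import Path


def sybyl_database_dialogue(pdb_files, database="mols"):
    """Return the keyboard replies shown in this section for the given PDB files."""
    areas = [f"M{i}" for i in range(1, len(pdb_files) + 1)]
    lines = []
    # Read each PDB file into its own Molecular Area without moving it.
    for area, pdb in zip(areas, pdb_files):
        lines.append(f"brook in {area} {pdb} no")
    # Name each Molecular Area after the file it came from.
    for area, pdb in zip(areas, pdb_files):
        lines.append(f"modify molecule name {area} {Path(pdb).stem}")
    # Enter Database Mode, create the database and add every area.
    lines += ["mode database", "create", database]
    lines += [f"add {area}" for area in areas]
    lines.append("|")  # the "end loop" character leaves Database Mode
    return lines


for line in sybyl_database_dialogue(["phenol.pdb", "phenolate.pdb", "pyridine.pdb"]):
    print(line)
```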
MOLECULAR DATABASES
It is not necessary to create a Molecular Database if you already have one containing the compounds that you want to study. For your first trial run, however, we recommend that you have exactly the same compounds in your Molecular Database for CoMFA, as you have in your FILE.LIST for GRID.
RUNNING THE MACRO G2C
G2C is a Macro written in SYBYL Programming Language (SPL). The name G2C means: GRID to CoMFA, and the macro is most easily run if it is in the same directory as your working files.
For your first trial run we suggest that you use G2C immediately after preparing the Molecular Database as described above. Work through this whole GRID/CoMFA demonstration at one sitting.
Call up the SYBYL prompt and enter the following dialogue like this:
Sybyl > uims load g2c
Sybyl > g2c
Sybyl may give some messages and warnings at this point, but you can usually ignore them. It will then give the next prompts one by one, possibly interspaced by more messages. Note that some of the replies must be typed at the Sybyl prompt, and that others must be typed in an appropriate window:
Molecule database > mols
Region name > region1
Number of probes > 4
Table name > first
Table title > "First with CoMFA"
Molecule area > M1
Number of columns for bioactivity, etc > 1
NOTE ON NAMES OF FILES
In the above dialogue the macro G2C and all your working files would be in your current working directory. (This is not essential, but is convenient for your first few trial runs with a Set of Targets for CoMFA.) Then "mols" defines MOLS.MDB which was created earlier as the Molecular Database with the Set of compounds. "region1" and "first" are names which you can choose, and they define a region called REGION1.RGN and a table called FIRST.TBL. You can also choose the title of the table "First with CoMFA", but it must be in quotes as shown if it contains any blank characters. The "Molecular Area" must be defined by the letter M followed by a number. This is assigned automatically in a window by Version 6.1 of Sybyl. However, the M number has no relationship to the area M1 which you used previously in order to assign the name of phenol (see above).
NOTE ON THE COLUMNS FOR BIOACTIVITY
The "Columns for bioactivity, etc" each contain one type of information. For example you might have wanted two such Columns like this:
one for the activity of each compound in vitro; and one for the acute toxicity.
However, in the present example you have only one such Column with the three values: 1.5 -1.5 0.5 (representing approximate partition coefficients for each compound as shown in FILE.LIST). We have called this single column "logP", and have entered the explicit values 1.5 -1.5 and 0.5 into that one column.
At this point you may have to wait, if you have a lot of compounds and Probes. SYBYL may print more output, and then continue the dialogue:
Name of column 1 > logP
Once again you may have to wait. Then continue:
logP data for phenol > 1.5
logP data for phenolate > -1.5
logP data for pyridine > 0.5
SYBYL will start working as soon as you have completed the above dialogue, and you may have to wait some time until it finishes. Quite a lot of Sybyl > prompts and messages may be printed to the screen, interspaced with pauses for the Sybyl computations. Finally you should get a series of messages like this:
Field files should be checked for molecule phenol
. . .
Field files should be checked for molecule pyridine
These last messages are printed because SYBYL normally expects a pair of its own probes, and you have given it results from GRID Probes which it was not expecting. However these last messages also show that the preliminary computations by SYBYL have almost finished. When SYBYL has completed its calculations you should be left with the QSAR Command Prompt on the screen like this:
QSAR Command < TABLE >
and SYBYL is now waiting for you to start your PLS computations.
Please note that:
Sybyl may print many messages and prompts during its preparations for CoMFA, and all of these may normally be ignored. However, if a prompt appears and nothing else happens for some time, then it is possible that something has gone wrong.
The run will have completed correctly when the QSAR Command > prompt shows as described above, soon after the "Field files should be checked ..." messages. If everything worked correctly you should keep your new .TBL and .RGN and .EFS files. Do not delete them at this stage.
If something does go wrong while macro G2C is running, then some partly prepared files may have been left in your working directory. These may be .TBL or .RGN or .EFS type files, and they should be deleted before you try to rerun G2C. (However, make sure that you do not delete your Molecular Database which will be a .MDB directory file!).
It appears that SYBYL sometimes knows about the partly prepared files, even after you have deleted them. This can cause problems, if you are trying to follow the above instructions and use G2C for the SECOND time. In this case we suggest that you exit completely from SYBYL, and THEN check to be sure that any residual .TBL or .RGN or .EFS files have really been deleted from your directory. Then re-enter SYBYL and try to run G2C again.
In order to leave SYBYL when the QSAR Command > prompt is showing, you should type the "End Loop" character | which will restore the SYBYL > prompt. If you then type exit your regular System prompt should be restored.
The macro G2C is provided in order to help new Users to get started with Grid and CoMFA. Experienced Users will probably want to edit this macro in order to make it more compatible with their particular methods of working.
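The clean-up recommended in the notes above (remove residual .TBL, .RGN and .EFS files after leaving SYBYL, but never the .MDB database directory) can be scripted. The following is a minimal sketch, assuming that the residual items are ordinary files in the working directory and that their extensions may be in either letter case. The function names are hypothetical. By default it only reports what it would delete.

```python
from pathlib import Path

# Extensions of the partly prepared files named in the notes above.
LEFTOVER_SUFFIXES = {".tbl", ".rgn", ".efs"}


def find_leftovers(workdir="."):
    """List regular files ending in .TBL, .RGN or .EFS (any letter case).

    Directories such as the .MDB Molecular Database are never returned.
    """
    return sorted(
        p for p in Path(workdir).iterdir()
        if p.is_file() and p.suffix.lower() in LEFTOVER_SUFFIXES
    )


def delete_leftovers(workdir=".", dry_run=True):
    """Report the leftover files, and delete them only if dry_run is False."""
    found = find_leftovers(workdir)
    for path in found:
        print(("would delete " if dry_run else "deleting ") + path.name)
        if not dry_run:
            path.unlink()
    return found


if __name__ == "__main__":
    # Dry run on the current directory: nothing is removed.
    delete_leftovers(".")
```

Run it once as a dry run, check the list, and only then call `delete_leftovers(".", dry_run=False)`.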
PLS ANALYSIS UNDER COMFA
After successfully running G2C you can do a Partial Least Squares analysis under CoMFA using the results from GRID. Assuming that you have stayed in SYBYL, the above dialogue should be continued like this:
QSAR Command > analysis
QSAR Analysis > do
Analysis mode > interactive
Row expression > *
Column expression > *
Model option > PLS
Select dependent columns > logP
You will have been offered the correct defaults for many of the above questions, and can just hit the RETURN key. The dependent column contains the logP data values 1.5 -1.5 and 0.5 which you entered previously. It is the Y column of the table FIRST.TBL which you created above. The other four columns represent the interaction energies of your four Probes C3 OH2 N3+ and O:: with the Targets. These are the X columns.
Each Probe generated a GRID box of 3 x 3 x 3 grid points when we ran this trial demonstration. (The actual number of points, and the energy values at each point, and the number of "wasted points" if any, depend upon the particular releases of Grid and SYBYL which are being used). In our case we could have had up to 27 grid points, but directive KWIK was equal to 4 and we actually had 7 "wasted" points which were omitted from our GRID output. However in principle the four Probes could have given us 4 x 27 = 108 values for each compound. These are the X-values that may be used in order to make predictions about Y which, in this example, measures the partition coefficient of the compounds.
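The grid-point bookkeeping in the paragraph above is simple arithmetic, restated here so that you can substitute the numbers from your own run. The variable names are illustrative, and the count of points actually written per Probe is inferred from the figures quoted above rather than read from a GRIDKONT file.

```python
points_per_axis = 3   # the trial run used a 3 x 3 x 3 box
n_probes = 4          # C3, OH2, N3+ and O::
wasted_points = 7     # omitted from the output because KWIK was 4

box_points = points_per_axis ** 3              # full box: 27 points
max_x_values = n_probes * box_points           # 4 x 27 = 108 possible X-values
points_written = box_points - wasted_points    # 27 - 7 = 20 points kept per Probe

print("points in the full box:", box_points)
print("possible X-values per compound:", max_x_values)
print("points written per Probe:", points_written)
```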
At this stage of our computation CoMFA decided that 69 of the 108 X-columns might be dropped from the PLS computation, because they did not contain enough useful information. The remaining 39 values for each compound were retained by CoMFA, which next allows you to "Tailor" the PLS calculation. For this example, however, most of the CoMFA default values may be used (See your SYBYL/CoMFA manual) except for the Tailor options COMPONENTS and SCALING. The dialogue therefore continues:
Tailor Option > COMPONENTS
Number of components to use > 2
Tailor Option > SCALING_METHOD
Pre-analysis scaling > NONE
Tailor Option > |
No scaling was used in this example, because all the GRID energies were measured in the same units (kcal/mol). However, the choice of scaling method usually depends upon the User's working hypothesis.
The PLS computation is then carried out automatically by CoMFA, leading to predicted values of logP. In this trivial example we are predicting three Y values (logP values) with a three-term equation (two components and an intercept). The demonstration computation is therefore fast and the fit is quite unrealistic, giving perfect statistics with standard Error 0.000 and R-squared = 1.000. However, the present example is only intended to demonstrate the processes required for using GRID with CoMFA, and this is not the place to do a full research calculation or to describe PLS theory or to explain the role of cross-validation.
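The reason for the perfect statistics can be seen without any PLS software: three observations can always be reproduced exactly by an equation with three adjustable terms, provided the terms are independent. The sketch below is not PLS. It solves the ordinary linear system for an intercept and two coefficients, using the three logP values of this chapter and two invented "component score" columns whose values are purely hypothetical.

```python
from fractions import Fraction


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with exact fractions."""
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        # Pick a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


# Rows are [intercept term, score 1, score 2]; the scores are made-up numbers.
design = [[1, 2, 1],     # phenol
          [1, -3, 1],    # phenolate
          [1, 1, -2]]    # pyridine
logp = [Fraction("1.5"), Fraction("-1.5"), Fraction("0.5")]

coefficients = solve_3x3(design, logp)
predicted = [sum(c * x for c, x in zip(coefficients, row)) for row in design]

mean = sum(logp) / len(logp)
residual_ss = sum((y - p) ** 2 for y, p in zip(logp, predicted))
total_ss = sum((y - mean) ** 2 for y in logp)
r_squared = 1 - residual_ss / total_ss

print("predicted logP:", [float(p) for p in predicted])
print("R-squared:", float(r_squared))
```

With as many adjustable terms as observations the residuals vanish whatever the data, which is why such a fit says nothing about predictive power.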
At this stage the dialogue continues:
Name for Analysis > GRID_Results
and the SYBYL prompt then returns. One can now see the detailed results of the PLS computation by typing:
Sybyl > QSAR analysis list terminal all
However, as mentioned above, the results are misleadingly accurate in this example. Sybyl may invite you to save an updated copy of your table FIRST.TBL, but we suggest that you throw it away!
NOTE
The actual numbers (eg 108) in the above description may not agree with your results, because they may depend upon the Versions of the programs which you are using, and the way in which defaults have been set up on your system.
37.5. Column nomenclature
The word "column" is used with two different meanings in this description of the overall procedure. Each of these meanings will now be described in detail. However, you may prefer to skip these detailed column descriptions at a first reading, and come back to them later if necessary.
37.5.1. Column nomenclature in CoMFA tables
You have created a CoMFA table: FIRST.TBL which contains four columns corresponding to the four Probes, and another column containing the logP value of each compound. It has three rows corresponding to the three compounds. On the first row there is only one number in the logP column; i.e. 1.5 which is roughly the logP value for phenol. The fifth column has -1.5 on the second row for phenolate, and 0.5 on the third row for pyridine.
However column 1 in the phenol row of FIRST.TBL contains all the 108 GRID energies which were generated by the C3 Probe on phenol. Similarly column 2 of FIRST.TBL contains in the phenol row the 108 GRID energies which were generated by the OH2 Probe on phenol. FIRST.TBL is therefore laid out in columns like this:
| | COLUMN-1 | COLUMN-2 | COLUMN-3 | COLUMN-4 | COLUMN-5 |
|---|---|---|---|---|---|
| Probe type: | C3 | OH2 | N3+ | O:: | and logP |
| Phenol: | 108-values | 108-values | 108-values | 108-values | 1-value |
| Phenolate: | 108-values | 108-values | 108-values | 108-values | 1-value |
| Pyridine: | 108-values | 108-values | 108-values | 108-values | 1-value |
Hence, a "column" in a SYBYL/CoMFA "table" may contain either one or many values on each row, and this is the first sense in which the word "column" is used in the present overall description.
NOTE: The actual numbers (eg 108) in this description may not agree with your results, because they may depend upon the Versions of the programs which you are using, and the way in which defaults have been set up on your system.
37.5.2. Column nomenclature for PLS
The word column is used with a different meaning in describing a PLS analysis. Each of the 108 values in one column of the above table for CoMFA (FIRST.TBL) corresponds to a single grid point, and the GRID energies at each of those points can be used in the attempt to interpret the logP values of the three Target compounds. Each point corresponds to one "column" of X-values in the PLS computation, and the same data may therefore be described like this for PLS:
| | COLUMN-1 | COLUMN-2 | COLUMN-3 | ... | COLUMN-432 | COLUMN-433 |
|---|---|---|---|---|---|---|
| Probe Type: | C3 | C3 | C3 | ... | O:: | and logP |
| grid point: | 1 | 2 | 3 | ... | 108 | |
| X or Y | X | X | X | ... | X | Y |
| Phenol: | 1-value | 1-value | 1-value | ... | 1-value | 1-value |
| Phenolate: | 1-value | 1-value | 1-value | ... | 1-value | 1-value |
| Pyridine: | 1-value | 1-value | 1-value | ... | 1-value | 1-value |
Columns 1 to 108 now correspond to the first Probe (C3) at every grid point (Points 1-108). Column 109 corresponds to the same grid point as Column 1, but it relates to the second Probe (OH2) at that grid point. Column 216 represents the second (OH2) Probe at the last grid point (point 108 again). Column 217 corresponds to the first grid point but with the energy computed using an N3+ Probe (the third Probe). Column 324 has the N3+ energies at point 108, and column 325 corresponds to the first grid point again, but with the (fourth) O:: Probe. Column 432 has the O:: Probe at GRID point 108. All these columns contain X-values for the PLS analysis, while the final column 433 contains the single Y-value in this example.
Here is another way of representing this column layout for PLS:
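| COLUMN No: | 1 | .. | 108 | 109 | .. | 216 | 217 | .. | 324 | 325 | .. | 432 | 433 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| GRID POINTS | G1 | .. | G108 | G1 | .. | G108 | G1 | .. | G108 | G1 | .. | G108 | |
| PROBE NUMBER | P1 | .. | P1 | P2 | .. | P2 | P3 | .. | P3 | P4 | .. | P4 | |
| PROBE TYPE | C3 | .. | C3 | OH2 | .. | OH2 | N3+ | .. | N3+ | O:: | .. | O:: | |
| BIOL. ACTIV | | | | | | | | | | | | | logP |
| X or Y | X | .. | X | X | .. | X | X | .. | X | X | .. | X | Y |
| COMPOUNDS | | | | | | | | | | | | | |
| Comp1 | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
| Comp2 | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
| Comp3 | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |
In this example there are 108 points on the GRID. Then with three compounds and four Probes there are 3 Objects and 432 X-Variables.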
This layout is normally used for SIMCA and GOLPE.
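The PLS column numbering described in this section follows a simple rule: the Probes are laid end to end, each occupying one block of grid points, and the Y column comes last. The sketch below expresses that rule in both directions, assuming the 108 grid points and the four Probes of this description. The function names are hypothetical.

```python
N_GRID_POINTS = 108
PROBES = ["C3", "OH2", "N3+", "O::"]


def pls_column(probe_number, grid_point, n_points=N_GRID_POINTS):
    """Return the 1-based PLS X-column for a 1-based Probe number and grid point."""
    if probe_number < 1:
        raise ValueError("probe_number starts at 1")
    if not 1 <= grid_point <= n_points:
        raise ValueError("grid_point out of range")
    return (probe_number - 1) * n_points + grid_point


def describe_column(column, n_points=N_GRID_POINTS, probes=PROBES):
    """Return (probe type or 'logP', grid point or None, 'X' or 'Y') for a column."""
    n_x = n_points * len(probes)
    if column == n_x + 1:
        return ("logP", None, "Y")   # the single Y column follows the X columns
    if not 1 <= column <= n_x:
        raise ValueError("column out of range")
    probe_index, point_index = divmod(column - 1, n_points)
    return (probes[probe_index], point_index + 1, "X")


for col in (1, 108, 109, 216, 217, 324, 325, 432, 433):
    print(col, describe_column(col))
```

If your own run has a different number of grid points, pass it as `n_points` and the same rule applies.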
37.6. GRID/CoMFA summary
The CoMFA method is fully described in the SYBYL/CoMFA literature distributed by TRIPOS Associates Inc to authorised Users of SYBYL/CoMFA. In the present example you have prepared the CoMFA input for three compounds (phenol, phenolate and pyridine). These compounds have been entered into a new Molecular Database called MOLS.MDB. Each compound has been studied using four GRID Probes (C3 OH2 N3+ and O::) which provided the so-called X-values for a CoMFA analysis. One Y-value (called "logP") has also been supplied for each compound.
The single original ASCII output file (GRIDKONT.DAT) from GRID has been rewritten by Programme GCNT as 12 individual .CNT files for CoMFA, and 12 associated .EFS files have also been created (as you can see from studying your directory contents). A Region and Table have been defined (Regions and Tables are described in the CoMFA literature). You finally carried out a PLS analysis under CoMFA, and were given the option to update your table after the PLS run was completed.
We suggest that you now exit from SYBYL and delete all the new .EFS and .TBL and .RGN files that you have generated. (Do not delete the Molecular Database Directory MOLS.MDB). Then, if you want to, you can run through the same example again.
37.7. Atom names for display on Sybyl
You may want to study the structure of a molecule while working with CoMFA, but Sybyl cannot always recognise HETATM names in Brookhaven PDB format. You may then have problems when you try to display the structure, because some or all of the atoms may appear as isolated crosses without bonds to their neighbours. If this happens to you, we suggest you:
Either try typing the following instruction (all on one line) at the Sybyl prompt:
CRYSIN M1 CONNECT M1(*) M1(*) NO_SYMMETRY_SEARCH BOND_LENGTH_TABLE
Or hit "Build/Edit" in the main Sybyl menu; hit "Add" in the pull-down menu; and hit "Quick Bonds" in the sub-menu. You get the "MOLECULE AREA" menu and hit "OK", which displays the "Atom Expression" menu. Hit "All" and "OK" and the bonds should then be displayed.