Chapter 10. Protein Data Bank input

The PDB input file contains coordinate records for the atoms and hetero-atoms of the Target. This file must be in Protein Data Bank (Brookhaven) format in which each line is treated as one record. Different record types are permitted, and the type of each record is determined by the first six characters on the line. Files PDB.pdb and other files are provided as sample PDB files in the tutorials directory. Part of a PDB file is shown in Figure 1.

Note on the Protein Data Bank

The most up-to-date information about Brookhaven Protein Data Bank Format may be obtained free from:

The Protein Data Bank, Chemistry Department
Brookhaven National Laboratory, Upton
NY 11973
http://www.rcsb.org/pdb

Other sources of information are not always reliable.

Programme GREAT (see Diagram 3) may be used to convert other formats into standard PDB format if need be.

Programme GRIN makes particular use of five standard record types in PDB files:

'HEADER'

which is a brief description of the PDB file. This line may contain up to 60 characters, including spaces and the first six characters, which must be 'HEADER'.

'ATOM  '

which defines the properties of one atom.

'TER   '

which shows that the carboxy terminal of a protein chain has been reached.

'HETATM'

which defines the properties of one hetero atom.

'END   '

which marks the end of the PDB file.

In addition to the above five standard record types, Programme GRIN also responds to 'NAMIDE' and 'NTER  ', which are described below. Any other line in a standard PDB file is copied directly to the lineprinter file GRINLOUT for information. However, it has no influence on the GRINKOUT file which is being prepared as input for the next Programme, GRID.
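The record-type rule can be illustrated with a short Python sketch. It is not part of GRIN; the names record_type and classify are hypothetical, and the sketch assumes that short record names are blank-padded to six characters.

```python
# Record names that GRIN responds to, according to this chapter.
# Short names are padded with blanks to the six-character width.
STANDARD = ("HEADER", "ATOM  ", "TER   ", "HETATM", "END   ")
EXTRA = ("NAMIDE", "NTER  ")


def record_type(line: str) -> str:
    """Return the first six characters of a line, blank-padded to width six."""
    return line.rstrip("\r\n")[:6].ljust(6)


def classify(line: str) -> str:
    """Return 'standard', 'extra', or 'other' for one line of the PDB file."""
    kind = record_type(line)
    if kind in STANDARD:
        return "standard"
    if kind in EXTRA:
        return "extra"
    return "other"  # such lines are only copied to the listing file
```

A line such as REMARK would be classified as 'other' here, which corresponds to the lines that are copied to GRINLOUT without affecting GRINKOUT.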

10.1. HEADER records

HEADER records are copied from the PDB input file to lineprinter GRINLOUT for information. There should only be one HEADER in the PDB file, and it should be at the top of the file from where it is copied to the top of the GRINKOUT file. If more than one HEADER is present, the last one is copied to GRINKOUT.

The HEADER may contain up to 60 characters, including spaces and the first six characters, which must be 'HEADER'. If a more detailed description of the GRID run is required, comments may be included in the command file (grid.in).

It is suggested that the text of the Header should not begin immediately after the word HEADER. It should always start after at least 16 empty character positions, thus:

HEADER                There are 16 blanks first

This convention is proposed because the first 16 character positions are reserved for a Password in some old copies of Programme GRID.
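As an illustration only, a HEADER line that follows this convention could be assembled as in the Python sketch below. The function make_header is hypothetical, and the choice to raise an error for an over-long line (rather than truncate it) is an assumption of the sketch.

```python
HEADER_WIDTH = 60  # total length limit stated in this section
PAD = 16           # blank positions suggested after the word HEADER


def make_header(text: str) -> str:
    """Build a HEADER line with 16 blanks between 'HEADER' and the text."""
    line = "HEADER" + " " * PAD + text.strip()
    if len(line) > HEADER_WIDTH:
        raise ValueError(
            f"HEADER line is {len(line)} characters; the limit is {HEADER_WIDTH}"
        )
    return line
```

With 6 characters for the word HEADER and 16 blanks, this leaves at most 38 characters for the descriptive text.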

10.2. ATOM records

ATOM records for Programme GRIN should comply with the following input format where I is an integer number which identifies this particular record in Programme GRIN:

 100 READ (INKO,120) ATOM, NPDB(I),ATM(I),ALT(I),ACID(I),
    +ISUB(I),NRES(I),INSERT(I),X(I),Y(I),Z(I),OCCUP(I),BVAL(I)
 120 FORMAT (6A1,I5,1X,A4,A1,A3,1X,A1,I4,A1,3X,3F8.3,2F6.2,4X)

This is a standard Protein Data Bank format statement. In fact, however, the actual input format for GRIN is slightly different:

 FORMAT (6A1,I5,A5,A1,A4,A1,I4,A1,3X,3F8.3,2F6.2,4X)

but the difference should not be exploited, and A5 should not normally be the input format for ATM. It should be 1X,A4 as shown in Format statement 120 above.
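For readers who work outside Fortran, the column layout implied by Format statement 120 can be sketched in Python as follows. This is an illustration and not GRIN's own code: the names AtomRecord and parse_atom are hypothetical, blank numeric fields are assumed to read as zero, and a blank NPDB field is mapped to the dummy value -999 that this chapter describes for NPDB.

```python
from typing import NamedTuple


class AtomRecord(NamedTuple):
    record: str   # ATOM    (6A1)
    npdb: int     # NPDB    (I5)
    atm: str      # ATM     (1X,A4)
    alt: str      # ALT     (A1)
    acid: str     # ACID    (A3)
    isub: str     # ISUB    (1X,A1)
    nres: int     # NRES    (I4)
    insert: str   # INSERT  (A1)
    x: float      # X, Y, Z (3X,3F8.3)
    y: float
    z: float
    occup: float  # OCCUP, BVAL (2F6.2)
    bval: float


def _int(field: str, default: int) -> int:
    return int(field) if field.strip() else default


def _real(field: str) -> float:
    return float(field) if field.strip() else 0.0


def parse_atom(line: str) -> AtomRecord:
    """Slice one ATOM or HETATM line by the columns of Format statement 120."""
    s = line.rstrip("\r\n").ljust(66)  # pad short lines so every slice exists
    return AtomRecord(
        record=s[0:6], npdb=_int(s[6:11], -999), atm=s[12:16], alt=s[16],
        acid=s[17:20], isub=s[21], nres=_int(s[22:26], 0), insert=s[26],
        x=_real(s[30:38]), y=_real(s[38:46]), z=_real(s[46:54]),
        occup=_real(s[54:60]), bval=_real(s[60:66]),
    )


# An arginine backbone nitrogen record, built field by field so that the
# column widths are explicit.
example = ("ATOM  " + "   68" + " " + " N  " + " " + "ARG" + " " + " "
           + "   5" + " " + "   " + " -11.579" + "   9.499" + "   1.574"
           + "  1.00" + "  0.00")
record = parse_atom(example)  # record.npdb is 68 and record.atm is ' N  '
```

The sketch reads ATM as the standard four-character field (1X,A4) and ACID as three characters, in line with the recommendation not to exploit the wider A5 and A4 fields.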

The names of the variables in ATOM records have the following meanings:

ATOM

is a string of six CHARACTER*1 variables which identifies the input line as an atomic coordinate record. It has the value: 'ATOM  '

NPDB(I)

is an integer number which identifies this particular atom in the input file PDB. These numbers in PDB should increase from the N-terminal to the C-terminal of a protein sequence, according to certain rules. However the rules are not always followed, and NPDB is the number which actually appears in the PDB input file no matter how it originally got there. Thus NPDB is convenient for relating the final results of a GRIN/GRID computation to the original input data. A dummy value (-999) is inserted into the GRINKOUT file, if blank spaces fill the NPDB field of file PDB.

ATM(I)

is a CHARACTER*4 variable which defines the atom name. The first two characters are its chemical symbol right justified. The third character is an alphabetical remoteness indicator, and the fourth is a branch designator. Thus ' OE2' is an oxygen atom at remoteness five (E stands for Epsilon which is the fifth letter of the Greek alphabet) down the second branch of an amino-acid side chain; it is of course a carboxy oxygen on a glutamic acid residue.

The name of the atom generally follows IUPAC-IUB rules. However, remoteness is indicated by the letters A, B, G, D, E, Z and H instead of the first seven letters of the Greek alphabet, so that ' CA ' represents an alpha carbon atom and 'CA  ' represents calcium.

The letter A is also used for atoms where there might be ambiguity in the structure. Thus ' AE2' would be used if the atom might be an oxygen (OE2) or a nitrogen (NE2) at remoteness five of a glutamic acid or a glutamine side chain. The above are all standard Protein Data Bank conventions.
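The naming convention for heavy atoms can be made concrete with a small Python sketch. The function decode_atom_name is hypothetical and is not part of GRIN or GREAT; it assumes a name of exactly four characters, and it is not meant for hydrogen names, which do not necessarily follow the two-character symbol rule.

```python
REMOTENESS = "ABGDEZH"  # alpha, beta, gamma, delta, epsilon, zeta, eta


def decode_atom_name(atm: str):
    """Split a four-character atom name into (symbol, remoteness, branch)."""
    if len(atm) != 4:
        raise ValueError("atom name must be exactly four characters")
    symbol = atm[:2].strip()   # chemical symbol, right justified in two columns
    letter = atm[2].strip()    # remoteness indicator, if any
    remoteness = None
    if letter and letter in REMOTENESS:
        remoteness = REMOTENESS.index(letter) + 1
    branch = atm[3].strip() or None  # branch designator, if any
    return symbol, remoteness, branch
```

Under these assumptions ' OE2' decodes to oxygen at remoteness five on branch 2, while ' CA ' gives a carbon at remoteness one and 'CA  ' gives the two-character symbol CA for calcium.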

It is sometimes necessary to change the names of atoms in the PDB file, in order to comply with the Protein Data Bank Conventions. There is an editor in Programme GREAT which may be used in order to facilitate this when need be.

WARNING

IT IS ESSENTIAL TO FOLLOW THE PROTEIN DATA BANK CONVENTION BY WHICH THE FIRST TWO CHARACTERS OF AN ATOM NAME ARE ITS CHEMICAL SYMBOL. Moreover, this symbol must be right justified in the first two columns of the correct four-character input field to comply with the convention, as described above.

WARNING

Variable ATM is actually read by Programme GRIN as a CHARACTER*5 variable, with the extra character coming BEFORE the standard CHARACTER*4 atom name. However, the User should not normally exploit this extra character position. In the rest of this User Manual it will be assumed that the standard CHARACTER*4 conventions are used for PDB input files, as described above.

The extra (fifth) character position just before variable ATM, is reserved so that Programme GRID can print the character 'H' as a prefix to the atom name. This 'H' signifies that GRIN or GRID is printing the record for a hydrogen atom which is bonded to the named heavy atom. It may be noted that, in practice, Programme GRIN does not often print the 'H' prefix in the extra column in front of the standard four-character field, because most hydrogens are actually bonded to heavy atoms which have one-character atom symbols. Thus GRID would print a hydrogen bonded to carbon as: ' HC ' and would only exploit the initial field when, for example, a hydrogen was bound to an atom with a two-character symbol. For example, a hydrogen bound to chlorine would be: 'HCL '

ALT(I)

is a CHARACTER*1 variable which can be used to indicate alternative locations for the atom. It normally has the value ' ' and Programme GRIN will send a warning to GRINLOUT if there are alternative locations for an atom in the PDB input. When this occurs the User must decide on unequivocal coordinates for the ATOM or HETATM in question, and rerun GRIN after editing file PDB to remove the superfluous atom record.
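Before running GRIN it may be convenient to locate such records in advance. The following Python sketch (the function alternative_locations is hypothetical) simply reports coordinate records whose ALT column is not blank; it does not choose between the alternatives, which remains the User's decision.

```python
def alternative_locations(lines):
    """Return (line_number, ALT) for coordinate records with a non-blank ALT."""
    hits = []
    for number, line in enumerate(lines, start=1):
        is_coordinate = line[:6] in ("ATOM  ", "HETATM")
        # ALT is the single character that follows the four-character atom name.
        if is_coordinate and len(line) > 16 and line[16] != " ":
            hits.append((number, line[16]))
    return hits
```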

In the Molecular Discovery Programmes the exclamation mark '!' is a reserved character as a value for variable ALT. See under Exclamation Mark.

ACID(I)

is a CHARACTER*3 variable which identifies the name of the molecule to which the ATOM or HETATM belongs. According to standard Protein Data Bank conventions this is often an amino-acid, which is why this variable is called 'ACID'. Typical names are ALA or ARG or ASP for the amino-acids alanine or arginine or aspartic acid. In particular it will have the value 'HYD' for a hydrogen atom whose coordinates have been computed or checked by Programme GRIN. Other three-letter names must be used for Unknown Molecules and for isolated HETATMS.

WARNING

The use of the standard conventions is most strongly recommended. However, as mentioned above, ACID is actually read by Programme GRIN as a CHARACTER*4 variable. The extra Character comes AFTER the standard CHARACTER*3 acid name, and it is therefore possible to use four characters as the name of a molecule in GRIN. The next Programme GRID can deal with these 4 characters if need be. Throughout the rest of this User Manual, however, it will be assumed that the standard CHARACTER*3 conventions have been followed.

ISUB(I)

is a CHARACTER*1 variable normally used to identify the subunit to which the atom belongs. It usually has the value ' ' or 'A' for a single molecule or a single protein chain.

NRES(I)

(integer) normally identifies the position of an amino-acid residue in the overall protein sequence. If hetero molecules are present (e.g. ligands, cofactors, etc) they may be given any convenient NRES value which has not been used already. Negative values may be used.

This value is not carried forward by Programme GRIN to the GRINKOUT file which will be the input to GRID. The variable NRES is used by GRID for another purpose.

INSERT(I)

is a CHARACTER*1 variable which may be used to indicate if extra residues have been inserted into a macromolecular sequence, thus disturbing the regular order of residue numbers; e.g: ...34,35,36,36A,36B,37,38,39...

X(I), Y(I) and Z(I)

(reals) are the orthogonal coordinates of the atom in Angstrom.

OCCUP(I)

(real) measures the occupancy of the ATOM or HETATM as stated in the PDB input file. It is sometimes assigned the arbitrary value of 1.00 by the crystallographer. The OCCUP(I) value in the PDB file is not carried over by Programme GRIN into the GRINKOUT output file.

BVAL(I)

(real) measures the temperature factor of the ATOM or HETATM as stated in the PDB input file. It is sometimes assigned an arbitrary value by the crystallographer. The BVAL(I) value in the PDB file is not carried over by Programme GRIN into the GRINKOUT output file.

WARNING: FIRST AND LAST RECORDS

The first record in your file must not be a hydrogen record. Similarly, the last record in the file must not be a hydrogen.

10.3. ATOM records for hydrogens

The original conventions of the Brookhaven Protein Data Bank did not provide for the hydrogen atoms of biological macromolecules, because X-ray studies at that time could not observe those hydrogens. However, modern X-ray and NMR methods and molecular dynamics computations often yield the hydrogen coordinates of proteins. Programme GRID will therefore use this information when it is provided, as described below.

Extra ATOM records must be included in the PDB file for a macro-molecule, in order to define the positions of the hydrogens. It must be emphasised that these are ATOM records as defined by the conventions of the Brookhaven Data Bank. However, there is no particular Brookhaven convention for hydrogen ATOM names, and the Molecular Discovery Programmes will therefore regard any ATOM whose name includes the character 'H' as a potential hydrogen. The Programmes will then test the position of the ATOM in the macro-molecule, in order to establish if it really is a hydrogen atom.

Hydrogen ATOM records: Biological macromolecules are normally represented by ATOM records, and this procedure applies explicitly to ATOM records for hydrogens. If the Target is composed of HETATMS, then the hydrogens would be dealt with as HETATM records. See below under the heading HETATM RECORDS. The distinction between ATOMS and HETATMS is important (see under Atom Name Conventions).

By way of an example one may consider the first ATOM record for an arginine residue in a Brookhaven PDB file for a protein. This record would normally be:

 ATOM     68  N   ARG     5     -11.579   9.499   1.574  1.00  0.00

and it would be followed by ten more ATOM lines for the CA C O CB CG CD NE CZ NH1 and NH2 heavy atoms of arginine. However an additional thirteen ATOM records would be required in the PDB file if the hydrogen coordinates were provided, starting with:

 ATOM     69  HN  ARG     5     -11.634   9.851   2.485  1.00  0.00

for the hydrogen which is bonded to the protein backbone amide nitrogen, and finishing with:

 ATOM     89 HH22 ARG     5     -17.684  12.304   4.077  1.00  0.00

Users should note the following points:

  • The hydrogen ATOMS of each residue must be placed in the same part of the PDB file as the corresponding heavy ATOMS. They should come after the first heavy ATOM of their own residue (which is normally its protein backbone nitrogen) and before the last heavy ATOM of the same residue. In particular, a hydrogen should never be the first or the last record in the file as a whole. This is important.

  • Within these limits, the sequence of hydrogen ATOMS does not matter.

  • The ATOM name of each hydrogen must include the character 'H'. Note that hydrogen names are subject to the normal limitation in Brookhaven PDB files, that only two characters may come after the symbol for the element. Thus a name like HND2A would be unacceptable, because there are three characters after the N.

  • Programme GRIN will still detect any extra ATOM records which have been incorrectly added to the PDB file by mistake, even if the ATOM name of the incorrect record contains the character 'H'.

  • The Programmes will correctly recognise NH1 and NH2, for example, as the regular Brookhaven names of nitrogen ATOMS in arginine, and will not confuse these ATOM names for hydrogens just because they have names which include the character 'H'.

  • Every ATOM name in a residue must be unique. Because one of the side-chain nitrogen ATOMS of arginine is called NH2 according to the conventions of the Protein Data Bank, one must not call the pair of hydrogens which are bonded to this ATOM by the names NH1 and NH2, because these would be duplicate names.

  • PDB files may contain explicit coordinates for a potentially tautomeric hydrogen, such as a serine hydroxyl hydrogen. In this case GRID will assume that the hydrogen is mobile, and that it can rotate around its C-C-O-H torsion angle. (The 'Type' of the oxygen (OG) ATOM must be changed to Type 4 in the GRINKOUT file if the hydrogen is to be fixed in its reported position.)

  • Problems may arise if the PDB file contains some of the hydrogens in a protein as ATOM records, but other hydrogens are not represented. We have tried to cover all such possibilities by corrective procedures or Warning Messages. However, there are so many ways in which hydrogens can be selected for inclusion, that we cannot be certain if all such combinations have been dealt with.

  • Please note that the hydrogens of arginine, for example, are still ATOMS in the arginine residues. They must therefore have the same amino-acid name (ARG) and residue number and chain identifier as the heavy atoms of the arginine. Some Users have wrongly used the amino-acid name HYD (instead of ARG) for the hydrogens, or have omitted the residue number, and this could lead to errors. The chain identifier, of course, is only required if the heavy ATOMS of the whole macromolecule have an identifier; it is normally omitted when there is only one subunit in the Target.
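One of the points above, that every ATOM name in a residue must be unique, lends itself to a simple pre-check. The Python sketch below is an illustration under stated assumptions: the function duplicate_atom_names is hypothetical, it relies on the standard column positions of Format statement 120, and it identifies a residue by its chain identifier, residue number with insertion code, and residue name.

```python
from collections import Counter


def duplicate_atom_names(lines):
    """Return sorted (chain, residue number + insert, residue, atom) keys seen twice or more."""
    counts = Counter()
    for line in lines:
        if line[:6] != "ATOM  ":
            continue
        s = line.ljust(27)  # pad so that every slice below exists
        key = (s[21], s[22:27], s[17:20], s[12:16])
        counts[key] += 1
    return sorted(key for key, n in counts.items() if n > 1)
```

If the two hydrogens on the NH2 nitrogen of an arginine had wrongly been named NH1 and NH2, the residue would contain each of those names twice and both would be reported.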

10.4. TER records

'TER   ' records may just consist of the three capital letters TER at the start of a line. This record shows that the carboxy terminus of a protein chain has been reached. The amino-acid preceding the TER record should therefore contain an extra oxygen (OXT) atom which completes the carboxy terminal group. Provision is made for this OXT atom in datafile GRUB, but it is only called by Programme GRIN if the amino-acid in the PDB file is followed by a TER record.

There should not be a TER record at the start of the PDB file, before the first amino-acid. However, a second protein sequence may follow after a TER record, and the N-terminal nitrogen of this second protein will automatically be treated as a terminal amino group; i.e. it will be a cationic nitrogen, and therefore different from the other backbone amido nitrogens of the protein. Any number of proteins may follow one after the other in this way, but they should not have individual HEADER records.

TER records should not be used at the end of other types of macro-molecular sequence such as DNA. They should not be used after amide -CONH2 termini. They are restricted to the unprotected carboxy terminus of proteins alone.

TER records should not be used after hetero-atoms. Hetero-atoms must appear together at the end of the PDB file, after all the protein ATOMS have been input. If the protein sequence ends in a carboxy group it should be followed by the usual TER record which may then be followed by any HETATMS.

10.5. NAMIDE records

'NAMIDE' records are not part of the standard Protein Data Bank format. However, it may sometimes be necessary to force a backbone nitrogen into the amido form. This would be the case if some amino-acid residues were missing at the start of a sequence, because the first amino-acid is normally treated as an N-terminal unless an NAMIDE record is prefixed (see below).

The two consecutive records:

 TER
 NAMIDE

may be used if one protein sequence in the PDB file finishes correctly at the carboxy terminal, but the next sequence has missing residues at its N-terminal end. Please note that there should be no HEADER record at this point.

NAMIDE records should not be used with other types of macromolecular sequence. They are restricted to proteins and peptides alone.

10.5.1. Broken or incomplete protein sequences

The coordinates of a hydrogen atom bonded to a protein backbone nitrogen are determined by the positions of the nitrogen itself, and its bonded C-alpha and carbonyl-carbon neighbours. One of these neighbours may be absent if the sequence does not start at the true N-terminus, or if there is a break in mid-sequence, or if some amino-acids are missing. The hydrogen position will then be undetermined, and a warning message N220 will be sent to the lineprinter file GRINLOUT if this happens.

10.5.2. Misordered protein sequences

Programme GRIN will correct the sequence of ATOM records if they are in the wrong sequence within an amino-acid residue in the PDB input file. This corrective action is transparent to the User.

NAMIDE and NTER and END records can be used at the appropriate places, if the amino acids of a protein are not listed in the correct molecular sequence in the PDB input file. However, we do not recommend this procedure for dealing with a misordered protein sequence, because there are many different ways in which the sequence may be wrong, and it is easy to introduce more errors when one is trying to put things right. It is better practice to edit the PDB file, and restore the correct logical sequence with the amino-acid records running consecutively from N-terminus to C-terminus.

10.5.3. Cyclic peptides

With a cyclic peptide, there may be no N-terminal nitrogen atom because the C-terminal carboxy group is bonded back to the start of the sequence. In this case an NAMIDE record should precede the first amino-acid of the cyclic peptide, in the listing in the PDB input file.

Cyclic peptides do not have a carboxy terminus, and so their sequence should not (strictly speaking) finish with a TER record. An END record may be used if the end of the sequence is also the end of the file. An information message will be written to the lineprinter file GRINLOUT, if a TER record is used, but the correct GRINKOUT file will still be prepared.

10.6. NTER records

'NTER  ' records are not part of the standard PDB format, but may be used in a PDB file in order to force the next following amino-acid into the N-terminal amino form. Thus an NTER record has the opposite effect to NAMIDE (see above). Neither of these should be used in conjunction with hetero-atoms. At the start of the first protein in the file, the first amino-acid is always treated as an N-terminal unless prefixed by NAMIDE.
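The way TER, NAMIDE and NTER records interact can be read as a small state machine. The Python sketch below is one interpretation of the rules stated in this chapter and not a description of GRIN's internal logic. The function backbone_n_forms is hypothetical, and it assumes that every ATOM record belongs to an amino-acid residue.

```python
def backbone_n_forms(lines):
    """Return [(residue_key, form)] with form 'amino' or 'amido' per residue."""
    forms = []
    pending = "amino"   # the first amino-acid in the file is N-terminal by default
    last_key = None
    for line in lines:
        kind = line.rstrip("\r\n")[:6].ljust(6)
        if kind == "TER   ":
            pending = "amino"   # the next sequence starts with a terminal amino group
            last_key = None
        elif kind == "NAMIDE":
            pending = "amido"   # force the next backbone nitrogen into the amido form
        elif kind == "NTER  ":
            pending = "amino"   # force the next amino-acid into the N-terminal form
        elif kind == "ATOM  ":
            s = line.ljust(27)
            key = (s[17:20], s[21], s[22:27])  # residue name, chain, number + insert
            if key != last_key:
                # Residues inside a sequence default to the amido form.
                forms.append((key, pending or "amido"))
                pending = None
                last_key = key
    return forms
```

For the consecutive records TER and NAMIDE, the sketch first resets the state to 'amino' and then overrides it with 'amido', so the residue that follows is treated as having an amido nitrogen.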

NTER records should not be used at the end of other types of macro-molecular sequence. They are restricted to the amino terminus of proteins alone.

10.7. HETATM records

'HETATM' records can be used to define the properties of any molecule. Whilst 'ATOM  ' records are normally restricted to amino-acids, peptides, proteins, nucleic acids, other biological macromolecules such as sugars and a few other special cases, HETATM records can be used more generally. Drug molecules, water molecules, solvent molecules, ions, inhibitors, cofactors or other ligands may all be defined by HETATM records.

HETATMS and ATOMS can be mixed in the same file, so that water and other ligands can be associated with a chosen macromolecule as a single Target structure. In this case, however, the HETATM RECORDS MUST ALL COME AFTER THE ATOM RECORDS in the PDB file.

HETATMS in a PDB file have essentially the same format as ATOM records, but the value of the first variable is 'HETATM' instead of 'ATOM  '. The decision whether to describe a molecule as a collection of ATOMS or HETATMS will be considered in more detail below. ATOMS and HETATMS should never be used together IN THE SAME MOLECULE.

When you run Programme GRIN it will add Energy Variables to the HETATMS in your PDB file, and these Variables will come from a special 'HET' section at the end of Datafile GRUB. This section has Energy Variables for water molecules and other hetero-atom types. The User may add more 'HET' variables to datafile GRUB in order to deal with his or her own interests and applications. This is described in detail below.

It is often necessary to change the names of the HETATMS in the original PDB file, since there is no general agreement on the names to be used for atoms in Hetero-Molecules. There is an editor in Programme GREAT, which may be used in order to do this. It is Item 4 in the first menu of GREAT.

It is not essential to input any ATOMS at all, and the input for Programme GRIN may begin with HETATMS. In this case there should not be a TER record before the HETATMS, but an END record at the end of the file is required.

Warning

All ATOM records in the PDB input file must be listed before any HETATM records. The PDB file must be edited in order to ensure this sequence of records, before Programme GRIN is used. Programme GREAT will attend to this editing automatically.
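A file can be checked against this ordering rule before GRIN is run. The following Python sketch is an illustration only; the function atoms_after_hetatms is hypothetical, and it assumes that the file describes a single Target rather than a Set.

```python
def atoms_after_hetatms(lines):
    """Return 1-based line numbers of ATOM records that follow any HETATM record."""
    seen_hetatm = False
    misplaced = []
    for number, line in enumerate(lines, start=1):
        kind = line.rstrip("\r\n")[:6].ljust(6)
        if kind == "HETATM":
            seen_hetatm = True
        elif kind == "ATOM  " and seen_hetatm:
            misplaced.append(number)
    return misplaced
```

An empty result means that no ATOM record was found after a HETATM record.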

10.8. Water records

Water molecules associated with the Target in the PDB file may be treated in several different ways. This choice depends on the interests of the User. There is a Table summarising the different treatments at the end of this Section.

You should study the concept of Hydrogen Bonding Type before reading this Section, and should be acquainted with Format Statement 120 above. Note in particular the different and distinctive meanings of the terms ATOM and ATM and ACID:

  • ATOM means a whole line in a PDB file; an ATOM record as opposed to a HETATM record.

  • ATM is the name of the particular atom in the record; e.g. ' N  ' for nitrogen.

  • ACID is the name of the amino-acid residue or molecule containing the ATM; e.g. 'ALA ' for alanine.

10.8.1. Water as a recognised molecule

The names H2O, HOH, OH2 and WAT are often used by crystallographers to represent a water molecule. Any of these may appear as a Molecule Name for water in an ATOM record in the PDB file. The water will be treated as a Recognised Molecule, just as Glycine or Cytidine are treated.

In this case H2O, HOH, OH2 or WAT would be the name of the Recognised Water Molecule, and would be the ACID variable in Format Statement 120 above (See Index under "Format 120"), just as GLY is the name for glycine as a Recognised Molecule. The name of the ATM variable for a Recognised Water Molecule in the Format Statement must always be ' O  '. The water will be assigned as Hydrogen Bonding Type 95.

10.8.2. Water as a HETATM

Note on the molecule name for water: H2O, HOH, OH2 and WAT are all acceptable molecule names for water when ATOM records are being used. However, the molecule name for water as a HETATM should always be 'HOH '. If you use any other molecule name for water as a HETATM, it will normally be changed by Programme GRIN into 'HOH '. Message N180 may also be issued. Sometimes Programme GRIN cannot make all the necessary alterations, and you may then have problems when you run the next Programme GRID.

Please make sure that you understand the difference between the atom name for water, and its molecule name. Then re-read the above Note, and make certain that you always use 'HOH ' as the molecule name for water, whenever the water is represented as a HETATM.

When water (or any other atom) is represented as a HETATM, it will not be treated as a part of the Target for Programme GRID unless either:

  • It is explicitly called in Programme GRID by directive NETA (See Index under NETA), or

  • The Target consists entirely of HETATM records, and no ATOM records are present, or

  • Several Targets are being studied together in a single GRID run as a Set (see under Set of Targets for full details).

Three different ways of dealing with water as a HETATM will now be considered in detail:

10.8.2.1. Water as an extended HETATM (OH2)

The atom name ' OH2' is often used to represent water as an extended HETATM in the Target. This is the regular name for water as a HETATM in the Target, and is the normally recommended name. The record must start with the six characters HETATM, and like all other HETATM records it must come after all the ATOM records in the PDB file. The HETATM water will then be Hydrogen Bonding Type 95.

Note that ' OH2' is the name of the Extended Oxygen Atom in a HETATM record for a water molecule. ' OH2' is the ATM variable, but a molecule name is also required. The molecule name should always be 'HOH ' for a HETATM water, as noted above. This is the ACID variable of the HETATM record in Format statement number 120 above (See Index under "Format 120").

10.8.2.2. Water as a HETATM (OHH) with hydrogens

The atom name ' OHH' may alternatively be used to represent the oxygen of water as a HETATM. This is the recommended name for the oxygen of water, when the explicit positions of the hydrogen atoms bound to the oxygen of the water are known. The positions of these hydrogens in the Target are fixed. It is essential to include the one oxygen HETATM and its two bonded hydrogens as three consecutive HETATM records in the PDB input file. The oxygen record must come first, immediately before the hydrogens.

In this case the water is represented as a three atom molecule, and not as an Extended Atom. The hydrogen bonding Type of the oxygen HETATM must be 96 as shown at the end of datafile GRUB. The hydrogen positions are fixed, and so this water molecule of the Target cannot rotate. Note carefully that its molecule name must be HOH (as usual for a HETATM water).

Like all other HETATM records, this ' OHH' water record must come after all the ATOM records in the PDB file. ' OHH' is the atom name (ATM in Format 120) for the HETATM oxygen, and ' H  ' is the atom name for the HETATM hydrogens. All three HETATMS must have 'HOH ' as their molecule name.

10.8.2.3. Water as an extended HETATM (O2)

The name ' O2 ' is a third alternative for water as a HETATM. This name is recommended when the water is to be treated as part of the Target, and when account must be taken of the hydrogen bonds already made between the water ' O2 ' and the other Target ATOMS or HETATMS. As previously explained, the name ' O2 ' is the atom name (ATM) and the molecule name of the HETATM water must be 'HOH ' as usual.

When the name ' O2 ' is used, the Hydrogen Bonding Type of the water is adjusted by Programme GRIN, to take account of any hydrogen bonds that the water is already making. The Type is 95 by default in Datafile GRUB, but if the water ' O2 ' was located so that it made two hydrogen bonds to two carbonyl oxygens of the Target, then the ' O2 ' water would be reassigned as Type 28. (This Type would be assigned because the water could donate no more hydrogen bonds, but could still accept one or two. See Index under Type 28). The hydrogen bonding Type of the carbonyl oxygens of the Target would also be adjusted, since they would be making a hydrogen bond to the water.

These adjustments are also made by Programme GRIN, if one water ' O2 ' is making a hydrogen bond to another ' O2 ' water. To take an extreme case, a water ' O2 ' might have no free hydrogen bonding capacity if it was already surrounded by several similar waters ' O2 ' or other hydrogen bonding groups.

Water ' O2 ' is particularly appropriate if a layer of organised water molecules is being studied near the surface of a Target molecule. Superficial polar groups on the surface of the Target may determine their positions, but those polar interactions do not define the organisation of water near non-polar Target atoms. This is apparently determined by the sideways hydrogen bonding network from one water molecule to its neighbouring waters near the Target surface, and this network is best represented by using water ' O2 '.

10.8.3. Water summary

The following possibilities exist for the waters of the Target:

ATOM or HETATM   Atom Name (ATM)   Molecule Name (ACID)   Description
ATOM             ' O  '            'H2O '                 Extended water mol Type 95.
ATOM             ' O  '            'HOH '                 Extended water mol Type 95.
ATOM             ' O  '            'OH2 '                 Extended water mol Type 95.
ATOM             ' O  '            'WAT '                 Extended water mol Type 95.
HETATM           ' OH2'            'HOH '                 Extended water mol Type 95.
HETATM           ' OHH'            'HOH '                 Three-atom water mol Type 96. One oxygen and two hydrogen records are required.
HETATM           ' O2 '            'HOH '                 Extended water molecule. Type depends on Hydrogen bonding neighbours.

This table shows that water may be represented in the PDB file as a Recognised Molecule called either H2O, HOH, OH2 or WAT. It must then appear as an ATOM record, and the name of the oxygen atom itself must be ' O  '. The hydrogens of a Recognised Water Molecule are not explicitly represented. These Recognised waters, like all ATOM records, must precede the HETATMS in the PDB file.
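The table can also be expressed as a lookup, which may help when checking water records by script. The Python sketch below is illustrative: the function water_type is hypothetical, names are compared after stripping blanks, and None is used where the Type is decided by GRIN from the hydrogen bonding neighbours.

```python
ATOM_WATER_NAMES = {"H2O", "HOH", "OH2", "WAT"}

# Atom name of the oxygen in a HETATM water, mapped to its Hydrogen Bonding Type.
# None means that the Type depends on the neighbours of the water.
HETATM_WATER_TYPES = {"OH2": 95, "OHH": 96, "O2": None}


def water_type(record: str, atm: str, acid: str):
    """Return (is_water, hydrogen_bonding_type_or_None) for one oxygen record."""
    record, atm, acid = record.strip(), atm.strip(), acid.strip()
    if record == "ATOM" and atm == "O" and acid in ATOM_WATER_NAMES:
        return True, 95
    if record == "HETATM" and acid == "HOH" and atm in HETATM_WATER_TYPES:
        return True, HETATM_WATER_TYPES[atm]
    return False, None
```

A HETATM water with a molecule name other than HOH is not matched here, which reflects the requirement that HETATM waters always use 'HOH ' as their molecule name.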

Alternatively, water may be represented by a HETATM. The molecule name of this HETATM must be HOH, and the atom (ATM) name may be OH2, OHH or O2. The difference between these three names may be shown by a simple example. First consider the trimethylamine cation as a Target, interacting with a water Probe (the first Probe). The most favourable interaction will occur when the water accepts a hydrogen bond thus:

 H2O....H-N(CH3)3

Now start again considering this whole water:trimethylamine complex:

 H2O....H-N(CH3)3

as the Target. Then using a (second) water Probe:

  • The water of the Target complex (the first water) will be able to rotate freely if it is represented as an extended HETATM OH2. It will be able to make a hydrogen bond in any direction, and the NH group of the trimethylamine will also be able to donate to the (second) water Probe. In practice, of course, the NH group will be obstructed by the (first) water of the Target, and that water of the Target will be obstructed on one side by the neighbouring trimethylamine. These mutual obstructions within the Target will constrain the hydrogen bonding possibilities, and the overall Target system may be represented as two independent molecules like this:

     H2O    H-N(CH3)3

  • The (first) water of the Target will be unable to rotate at all if it is represented as a HETATM OHH with two bound hydrogens. The positions of the hydrogens (as defined by their own HETATM records) will determine the hydrogen bond donor directions of the water, and will also determine the orientations of the water's lone pairs. The NH group of the trimethylamine will still be able to donate, but will be obstructed as described above. The Target system may be represented as:

             H                                    
              \                                   
               O....H-N(CH3)3                     
              /                                   
             H                                    
    with the hydrogens in fixed positions.

  • The (first) water of the Target will have limited rotation about the O-N axis if it is represented as a HETATM O2. In this case Programme GRIN will alter the Type of hydrogen bonding by both the oxygen of the water and the nitrogen of the amine. It will take account of the fact that these have a pre-existing interaction with each other, and treat the Target as if it was something like this:

             H                                  
              \                                 
              :O---H-N(CH3)3                    
              /                                 
             H                                  
    in which ---H- represents a dummy bond. Thus:

    1. the nitrogen will behave as if it has four bonded neighbours, and therefore has no capacity to make further hydrogen bonds, and

    2. the water will behave as if it was like a hybrid amine/hydroxyl group bonded to the nitrogen. In this situation the water will still be able to donate two hydrogen bonds and accept one. However, the directions in which these hydrogen bonds can be made will be constrained by the pre-existing dummy bond. Torsional rotation about this dummy bond is allowed, but one lone pair of the water must always point towards the nitrogen of the amine.

10.9. END records

'END   ' records may just consist of the word END on a line by itself to show that the PDB file has all been input. There should only be one END record, and it should be at the end of the file!
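This rule is easy to verify by script. In the Python sketch below the function check_end is hypothetical; it assumes a file that holds a single Target, and it ignores blank lines when deciding which record is last.

```python
def check_end(lines):
    """Return a list of problems with END records; an empty list means none found."""
    def kind(line):
        return line.rstrip("\r\n")[:6].ljust(6)

    ends = [i for i, line in enumerate(lines, start=1) if kind(line) == "END   "]
    nonblank = [i for i, line in enumerate(lines, start=1) if line.strip()]
    problems = []
    if len(ends) != 1:
        problems.append(f"expected one END record, found {len(ends)}")
    if ends and nonblank and ends[-1] != nonblank[-1]:
        problems.append("END is not the last record")
    return problems
```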

10.10. Inputting a set to Programme GRIN

GRIN normally processes one Target at a time, and this Target may consist of one molecule or a mixture of molecules in PDB format; e.g.: a protein, a bound ligand, some waters and several counter-ions could all be parts of one Target. Alternatively, however, GRIN can be used to process several different Targets as a Set, one after the other in a single GRIN run.

Further information about processing a Set is given above under the headings Set of Targets, and PREPARING A SET OF TARGETS WITH GREAT. More information is given below under INPUTTING A FILE LIST and STUDYING A SET OF TARGETS.
