GRID manual
Chapter 12. Nomenclature of ATOMs and HETATMs
12.1. ATOMs in recognised molecules
Datafile GRUB is supplied with Energy Variables for the extended atoms in the 20 common amino-acids whose recognised 3-letter abbreviations are:
ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR and VAL.
Energy variables are also supplied for:
| ASX | which is used if a particular residue might be ASN or ASP. |
| GLX | which is used if a particular residue might be GLN or GLU. |
| HI0 | which is used for deprotonated histidine at high pH |
| HIP | which is used for doubly protonated histidine at low pH |
| *HID | Histidine with proton on ND1 |
| *HIE | Histidine with proton on NE2 |
and for the less-common, unnatural or synthetic acids and amino-acids:
| ABU | Amino butyric acid |
| ADS | Aminodeoxystatine |
| AIB | Amino iso-butyric acid |
| ASZ | Aspartic Acid (Neutral -COOH form. OD2 protonated) |
| ASZ1 | Aspartic Acid (Neutral -COOH form. OD1 protonated) |
| AZT | Azetidine |
| BAL | Beta Alanine |
| CHA | Cyclohexyl alanine |
| CYD | Cysteine with deprotonated sulphur eg: metal complex |
| CYO | Cysteine oxidised to sulphite |
| *CYX | Oxidised Cystine |
| FUM | Fumaric acid (also for Maleic) |
| GLA | Gamma-carboxy glutamic acid |
| GLZ | Glutamic Acid (Neutral -COOH form. OE2 protonated) |
| GLZ1 | Glutamic Acid (Neutral -COOH form. OE1 protonated) |
| *HCX | Oxidised homocystine |
| HCY | Homocystine |
| HPH | Homophenyl alanine |
| HSE | Homoserine |
| HYP | Hydroxyproline |
| LOV | Leucine hydroxy valine |
| LRV | Leucine reduced valine (Neutral form) |
| LYSC | Lysine (Carbamylated anionic side chain NH.COO) |
| LYZ | Lysine with unionised side chain (NH2:) |
| LRVH | Leucine reduced valine (Cationic form) |
| NAL | Naphthyl alanine |
| NLE | Norleucine |
| NVA | Norvaline |
| ORN | Ornithine |
| ORZ | Ornithine with unionised side chain (NH2:) |
| PCA | Pyroglutamic acid |
| PEN | Penicillamine |
| PHG | Phenyl glycine |
| PIP | Pipecolic acid |
| PLP | Pyridoxal phosphate bonded to lysine |
| PSE | Phosphoserine |
| PTH | Phosphothreonine |
| PTY | Phosphotyrosine |
| STA | Statine |
| SUC | Succinic acid |
* Residue supplied for compatibility with other force fields. Residues CYS, HCY and HIS are normally recommended.
and for adenosine, cytidine, guanosine, thymine and uridine:
A, C, G, T and U
and the sugars fructose-6-phosphate, fucose, mannose, N-acetylglucosamine and ribose:
F6P, FUC, MAN, NAG and RIB
and for the nicotinamide-adenine and flavine-adenine cofactors:
NAD-NADH, NAP-NAPH and FAD-FADH2
and for the ceramides:
CER, CE1, CUR and CU1
with normal or alpha-hydroxy fatty acids and saturated or unsaturated sphingosine
and for haem:
HEM
and water:
H2O or HOH or OH2 or TIP or TIP1 or TIP3 or WAT
and for some end-terminal groups:
| ACE | Acetyl |
| AMI | Amide |
| BOC | Butoxycarbonyl |
| CBZ | Benzoxycarbonyl |
| TCB | Trichloro-butoxycarbonyl |
and for counter-ions which can move in response to the Probe (see under METAL CATIONS OF THE TARGET):
| CHL | Chloride anion |
| POT | Potassium cation |
| SOD | Sodium cation |
These are the Recognised Molecules in the current version of datafile GRUB. When a Target structure is input from file PDB, the value of variable ACID will often be one of the above abbreviations for a Recognised Molecule. The names of the Target atoms in each Recognised Molecule should follow the Protein Data Bank conventions which are described above. Appropriate relationships will then be established between files PDB and GRUB, so that the correct energy variables are assigned to each ATOM.
In some cases Programme GRIN will automatically make appropriate adjustments to the Energy Variables tabulated in GRUB. For example if it has to deal with deoxythymine in DNA, it will start by using the Energy Variables for the ATOMS of Thymine itself, and will then make appropriate adjustments to allow for the missing ribose hydroxyl group. It will also decide if a sugar moiety is combined in a polysaccharide, and adjust the variables for the ether-type oxygen atoms appropriately. Atom Types 0, 2, 4, 28, 34, 64, 84 and 98 may be adjusted in this way. Further information about the adjustments is provided in the individual descriptions of each atom Type below, and in the notes appended to datafile GRUB itself.
12.2. ATOMs in unknown molecules
Provision is also made for Unknown Molecules which may be present in PDB files. Unknown Molecules are defined by ATOM records in the PDB file, not by HETATM records. The value of variable ACID will be some other 3-letter abbreviation which corresponds neither to one of the "Recognised Molecules" nor to 'HYD' or 'HET'. Programme GRIN will scan datafile GRUB for this unknown name, but will not find it among the Recognised Molecules.
In this case Programme GRIN will ignore the HET section at the end of datafile GRUB. Instead it will scan the "Recognised Molecules" in datafile GRUB for a second time, searching for an ATOM name which exactly matches the ATOM name in the "Unknown Molecule". If it finds a perfectly matching ATOM name in datafile GRUB, those GRUB variables will be "selected" for that ATOM in the "Unknown Molecule". The selected variables may or may not be acceptable, because this is only a default fall-back procedure which should never be used intentionally.
If the atom name does not occur in GRUB, then Programme GRIN will "guess" Energy Variables for the atom. If Programme GRIN "selects" or "guesses" Energy Variables, a message will normally be sent to the lineprinter file GRINLOUT.
"Unknown Molecules" need not necessarily be amino-acids, although care must be taken not to use TER, NAMIDE or NTER records with inappropriate molecules. Ligands, substrates, drug molecules, cofactors etc may all occur as "Unknown Molecules", but the selected Energy Variables may not be at all appropriate unless there is a suitable ATOM in one of the "Recognised Molecules" of datafile GRUB.
In summary, each PDB record in an Unknown Molecule begins with 'ATOM ' and has an "unknown" ACID name. It obtains Energy Variables from recognised ATOMS with the same ATOM name in datafile GRUB. It differs from hetero-atoms which get their Energy Variables from the HET list at the end of datafile GRUB.
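The assignment procedure described in this section and the next can be sketched in Python. This is only an illustration of the decision order, not the code of Programme GRIN: the dictionaries `RECOGNISED` and `HET_LIST` are tiny hypothetical stand-ins for datafile GRUB, the function name `assign` is invented, and the labels stand in for real Energy Variables. Taking the first matching ATOM name, and the handling of an unexpected atom name inside a Recognised Molecule, are assumptions which the text does not specify.

```python
from typing import Optional, Tuple

# Hypothetical, heavily abridged stand-ins for datafile GRUB.
# Each value is a descriptive label, not a real set of Energy Variables.
RECOGNISED = {
    "ALA": {" N  ": "ALA N", " CA ": "ALA CA", " CB ": "ALA CB"},
    "SER": {" N  ": "SER N", " CA ": "SER CA", " OG ": "SER OG"},
}
HET_LIST = {"CA+2": "calcium ion", " OC2": "ether oxygen"}


def assign(record: str, acid: str, atom: str) -> Tuple[str, Optional[str]]:
    """Return (outcome, variables) for one PDB record.

    outcome is 'found', 'selected', 'guessed' or 'assumed'.
    """
    if record == "HETATM":
        # Hetero-atoms use only the HET list; names must agree exactly.
        if atom in HET_LIST:
            return ("found", HET_LIST[atom])
        return ("assumed", None)

    # ATOM record in a Recognised Molecule.
    if acid in RECOGNISED:
        if atom in RECOGNISED[acid]:
            return ("found", RECOGNISED[acid][atom])
        # Not covered by the text; treated as a guess here (assumption).
        return ("guessed", None)

    # ATOM record in an Unknown Molecule: the HET list is ignored and the
    # Recognised Molecules are scanned for an exactly matching ATOM name.
    for atoms in RECOGNISED.values():
        if atom in atoms:
            return ("selected", atoms[atom])  # first match (assumption)
    return ("guessed", None)
```

For instance, `assign("ATOM", "XYZ", " OG ")` borrows the entry for the serine OG atom, which illustrates why "selected" variables may not suit the Unknown Molecule at all.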
12.3. Hetero atoms in hetero molecules
Finally, provision is made for hetero-atoms in the PDB file. Each line containing a hetero-atom begins with 'HETATM'. Isolated ions, water molecules, ligands, substrates, inhibitors and other compounds can all be defined as collections of one or more HETATMS. The three-letter name of such a compound should not correspond to a "Recognised Molecule" name, and should be neither 'HYD' nor 'HET'.
Every hetero-atom must be assigned to a hetero-molecule in the PDB file, and this hetero-molecule must have a unique name. This unique name may consist of two components:
Variable ACID which is the actual name as defined above.
Variable ISUB which qualifies ACID by defining its Sub-unit.
It should not be difficult for the User to use these two components in order to give each Hetero-Molecule a unique name. In practice it is often sufficient to use ACID alone, and users may want to use all four characters of the ACID field although only three are actually specified by Brookhaven PDB conventions (See above ATOM RECORDS). In some cases (eg water molecules) the hetero-atom symbol (eg OH2) may be an extended atom which represents a complete hetero-molecule.
We recommend that you do not normally use the residue number NRES in order to distinguish between different hetero-molecules, because the NRES field is used by this Version of the Programmes in order to define the conformational flexibility of Target atoms (see PARAMETERS FOR THE CONFORMATIONALLY FLEXIBLE MODEL). However, in certain cases where there are very many hetero-molecules (eg: water molecules) it may be necessary to use NRES to differentiate between them.
Datafile GRUB ends with a list of HET atoms, and Programme GRIN will scan this list for appropriate HETATM names in order to assign Energy Variables. The name of the HETATM in the Target Hetero-Molecule, and the name in the HET list of datafile GRUB must exactly agree, if the correct Energy Variables are to be selected from the datafile.
More HETATMS may be added by the User to the HET list at the end of GRUB. Furthermore, Programme GRIN will never confuse the names of these HETATMS with any ATOM names in Recognised Molecules in the first part of GRUB.
12.4. HETATM names
When a HETATM name is added to datafile GRUB by the User, the following points should be noted:
Four consecutive characters are available for each HETATM name.
The first two characters are reserved for the chemical symbol correctly justified according to the conventions of the Protein Data Bank so that ' CA ' represents an alpha carbon atom and 'CA  ' represents calcium. (See above).
When a number DIRECTLY follows the chemical symbol, it defines the number of hydrogen atoms which are bonded to the main atom. Thus ' C3 ' is the carbon of a methyl group, and ' N2=' is a nitrogen directly bonded to two hydrogens.
A HETATM name like ' N2=' refers to the heavy atom alone. It is one nitrogen atom and is not an extended atom entry. The correct number of hydrogen records must also be present in the final GRINKOUT file. These hydrogens may be included as separate records with the other HETATMS in the PDB input file. If they are not present in the PDB file, then appropriate hydrogen positions will be computed with standard geometry by Programme GRIN, and will be added to the GRINKOUT file.
When a plus or minus sign DIRECTLY follows the chemical symbol, it indicates the charge of a chemical group. Thus ' C+1' is the carbon in the centre of a guanidinium cation and ' C-1' is the carbon of a carboxy group.
When the final character is a number after a plus or minus sign, it is a charge multiplier. Thus ' S-2' is the sulphur in an anion such as sulphate with two negative charges, and 'FE+3' is a ferric iron ion.
The equals sign is used to indicate that a bond to the atom has significant double bond character. Thus ' C2=' is a carbon joined to two hydrogens, which also makes one double bond. It might be, for example, the CH2 of a vinyl group.
The hash symbol #, which may print as another symbol on some terminals, is used to indicate a triple bond.
The colon is used to indicate a lone pair. Thus ' O::' is a carboxy oxygen bearing two lone pairs.
Sometimes it has not been possible to follow these conventions exactly, and still obey the conventions of PDB format. Explicit PDB-compatible names have then been used. For example an ether oxygen is ' OC2'. All HETATM names MUST strictly follow the PDB convention for the chemical symbol, which is always right-justified. For example, the symbols for ether oxygen and for a calcium ion line up like this:
' OC2'
'CA+2'
The chemical symbols are O and CA, and they have the qualifiers C2 and +2 respectively.
If the HETATM in the PDB file has a HETATM name which is absent from the list of HET atoms in datafile GRUB, then Programme GRIN will "assume" HETATM variables as best it can and a message will be sent to GRINLOUT. The assumed values may or may not be acceptable, because this is only a default fall-back procedure which should never be used intentionally.
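The naming conventions listed in this section can be checked mechanically before a new name is added to the HET list. The following Python sketch decodes a four-character HETATM name using only the rules stated above. The class `HetName` and the function `parse_het_name` are hypothetical names introduced for illustration and are not part of GRIN. Anything outside the listed rules, such as the 'C2' of the explicit name ' OC2', is returned unchanged in the `unparsed` field rather than interpreted.

```python
from dataclasses import dataclass


@dataclass
class HetName:
    symbol: str                # chemical symbol from characters 1-2
    hydrogens: int = 0         # digit directly after the symbol
    charge: int = 0            # sign times the optional multiplier
    double_bond: bool = False  # '=' present
    triple_bond: bool = False  # '#' present
    lone_pairs: int = 0        # number of ':' characters
    unparsed: str = ""         # qualifier outside the listed conventions


def parse_het_name(name: str) -> HetName:
    """Decode a four-character HETATM name (illustrative sketch only)."""
    if len(name) != 4:
        raise ValueError("a HETATM name occupies exactly four characters")
    out = HetName(symbol=name[:2].strip())
    rest = name[2:].rstrip()
    i = 0
    if i < len(rest) and rest[i].isdigit():
        # A number directly after the symbol counts bonded hydrogens.
        out.hydrogens = int(rest[i])
        i += 1
    elif i < len(rest) and rest[i] in "+-":
        # A sign directly after the symbol gives the charge of the group;
        # a following digit is the charge multiplier (default 1).
        sign = 1 if rest[i] == "+" else -1
        i += 1
        multiplier = 1
        if i < len(rest) and rest[i].isdigit():
            multiplier = int(rest[i])
            i += 1
        out.charge = sign * multiplier
    while i < len(rest) and rest[i] in "=#:":
        if rest[i] == "=":
            out.double_bond = True
        elif rest[i] == "#":
            out.triple_bond = True
        else:
            out.lone_pairs += 1
        i += 1
    out.unparsed = rest[i:]
    return out
```

With this sketch, `parse_het_name("CA+2")` gives the symbol CA with charge +2, while `parse_het_name(" CA ")` gives the symbol C with the qualifier 'A' left in `unparsed`, which reflects the justification rule for the first two characters.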
12.5. ATOM/HETATM differences
The obvious distinction between an atom in an Unknown Molecule and a hetero atom is the name at the start of each PDB record ('ATOM ' or 'HETATM'). However Unknown Molecules are always processed by the next Programme GRID in exactly the same way as Recognised Molecules. The default assumption in GRID is that all ATOM records are part of the Target, whether they come from "Recognised" or "Unknown Molecules".
The normal default assumption for HETATMS in Programme GRID is that they are NOT part of the Target. This default prevents the many HETATM water molecules (called OH2, O2 or OHH) whose positions may be defined in a protein structure from being treated as part of the Target. However, it is easy to change this default procedure, and consider HETATMS as part of the Target structure if such a treatment would be appropriate (see directive NETA for Programme GRID). This option might well be taken for a water molecule which was particularly strongly bound to the Target, or for any other strongly bound ligand such as a cofactor.
A particular case arises if the Target consists entirely of HETATMS, i.e. no ATOMS are present. If this happens the default assumption is that the HETATMS are the Target.
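The default rules for Target membership can be summarised in a few lines of Python. This is a sketch of the logic as described in the text, not of Programme GRID itself. The function `target_records` and its flag `include_hetatms` are hypothetical: the flag merely stands in for the effect of changing the default with directive NETA, whose actual syntax and values are not reproduced here.

```python
def target_records(records, include_hetatms=False):
    """Select the records treated as the Target.

    records is a list of (kind, label) pairs, where kind is 'ATOM'
    or 'HETATM'. File order is preserved in the result.
    """
    has_atoms = any(kind == "ATOM" for kind, _ in records)
    # HETATMS are kept if the default is overridden, or if the file
    # contains no ATOM records at all.
    keep_het = include_hetatms or not has_atoms
    return [
        (kind, label)
        for kind, label in records
        if kind == "ATOM" or (kind == "HETATM" and keep_het)
    ]
```

Calling the function on a list containing one ATOM and one HETATM water returns only the ATOM by default, and both records when `include_hetatms=True`.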
Another special case is the processing of several Targets together one after the other as a Set (see under Set of Targets).
Note that the names OH2, OHH and O2 all mean an oxygen bonded to two hydrogens in the GRID nomenclature; i.e. they all mean water. However, the detailed treatment of each kind of water is different (see WATER RECORDS).
12.6. ATOM and HETATM hydrogens
Programme GRIN always calculates the coordinates of hydrogen-bonding hydrogen atoms which are bonded to the ATOMS of "Recognised" and "Unknown Molecules". However, directive IHVA is needed if the bulk and charge of these hydrogen-bonding hydrogens are to be explicitly considered.
Programme GRIN will accept the coordinates of all the hydrogen ATOMS of a macromolecule, if they are given as ATOM records in your PDB file (See ATOM RECORDS FOR HYDROGEN). It works on a residue-by-residue basis, so that the hydrogens for one residue should be listed in the file after the first heavy atom of the residue, and before the last heavy atom of the same residue. GRIN will check off the hydrogens one by one, and may get confused if some of the expected hydrogens are given and some are omitted from the file. For example, if you use the residue name ASZ to mean an aspartic acid residue with a protonated carboxy group, then GRIN will expect to find a hydrogen record corresponding to the proton on OD2, as specified in datafile GRUB.
Programme GRIN calculates the coordinates of all hydrogen atoms in Hetero-Molecules, and always considers their bulk and charge.
Note that the HET list in GRUB must always come at the end of the datafile, i.e. the datafile must not be edited so that the HET list comes elsewhere.
Note on HETATM charges: the HETATM "charges" in the HET section at the end of datafile GRUB are used by the algorithm which computes the distribution of charge in a hetero-molecule. They are NOT charges themselves, and should never be used directly as charge values. They are only input values for the algorithm.