VolSurf manual
Chapter 8. VolSurf Library models
The VolSurf library models help predict the pharmacokinetics of new chemical entities in humans. Although companies may build their own models, a user who wants to quickly define the pharmacokinetic profile of a compound or series of compounds can use the models described below. All the models contain 94 VolSurf descriptors, since three probes (OH2, DRY and O) were used for the calculation. The grid spacing was always set to 0.5 Angstrom. Lastly, two versions of each model are available, so that predictions can be made for 3D structures (kout files) prepared with either of the charge calculation methods available in VolSurf. We therefore recommend checking the charge preference setting (in the preferences->miscellaneous menu) before running a prediction.
References for Library models can be found in:
Crivori, P.; Cruciani, G.; Carrupt, P.-A.; Testa, B.; J. Med. Chem.; 2000; 43(11); 2204-2216.
Cruciani, G.; Meniconi, M.; et al.; in Drug Bioavailability; van de Waterbeemd, H. (Ed.); Wiley-VCH; 2003; vol. 18; p. 406.
Oprea, T. I.; Zamora, I.; Ungell, A.-L.; J. Comb. Chem.; 2002; 4(4); 258-266.
Lombardo, F.; Obach, R. S.; Shalaeva, M. Y.; Gao, F.; J. Med. Chem.; 2002; 45(13); 2867-2876.
Pearlstein, R.; Vaz, R.; Rampe, D.; J. Med. Chem.; 2003; 46(11); 2017-2022.
Crivori, P.; Zamora, I.; Speed, B.; Orrenius, C.; Poggesi, I.; J. Comput.-Aided Mol. Des.; 2004; in press.
8.1. Blood-Brain Barrier permeation model (BBB)

N. of compounds: 313
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type (Y): BBB
Response values: 1 for BB+
-1 for BB-
0 for BB±
Discrimination for external compounds:
if BBpred < -0.3 then BB-
if BBpred > 0.3 then BB+
if -0.3 < BBpred < 0.3 then BB±
To be effective as therapeutic agents, centrally acting drugs must cross the blood-brain barrier (BBB). Entry into the brain is a complex phenomenon which depends on multiple factors. The basic assumption of this model is passive permeation.
The BBB model is a qualitative model containing 313 related, but chemically diverse, compounds extracted from the literature and from in-house data. The compounds are either brain-penetrating (BB+, score 1), moderately permeating (BB±, score 0), or have little, if any, ability to cross the blood-brain barrier (BB-, score -1). PLS discriminant analysis was used to build the statistical model, and two significant latent variables emerged from the PLS model with cross-validation.
The 2D PLS score model offers good discrimination between the BB+ and BB- compounds, since it assigned a correct BBB profile to more than 90% of the compounds. When spectrum color is active, red points refer to brain-penetrating compounds, blue points to non-penetrating compounds, and white points to compounds with moderate permeation. The yellow line in the 2D score plot represents the best PLS discrimination between BB+ and BB- compounds. The line was drawn at a BB level equal to zero and divides the plot into two subspaces, populated mainly by BB- compounds (on the left) and by BB+ compounds (on the right). The space between the red line and the blue line is the interval where the BB prediction can be borderline and doubtful. These lines represent the SDEP error of the discriminant PLS and give a sort of confidence interval for the discrimination model. Accordingly, the PLS score space is divided into:
(left) a region in which BB ranges from negative values up to -0.3; this is the region in which compounds show no ability to cross the blood-brain barrier.
(central) a small region from -0.3 to +0.3 (between the red and blue lines) where compounds show moderate permeability.
(right) a region in which BB ranges from +0.3 to higher positive values; this is the region in which compounds show the ability to cross the blood-brain barrier.
The model can be used to project external compounds in the chemical space represented by the model in order to rank the BB behaviour of external compounds.
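The discrimination rule for external compounds can be sketched in a few lines of Python. This is only an illustration of the ±0.3 cut-offs given above, not VolSurf code: the function name `classify_bbb` is hypothetical, and assigning the exact boundary values to the borderline class is an assumption, since the manual does not say how they are handled.

```python
def classify_bbb(bb_pred: float, cutoff: float = 0.3) -> str:
    """Map a predicted BBB score to a class label (hypothetical helper)."""
    if bb_pred < -cutoff:
        return "BB-"   # little, if any, ability to cross the barrier
    if bb_pred > cutoff:
        return "BB+"   # brain-penetrating
    # Borderline band between the two lines; boundary values
    # (exactly -0.3 or +0.3) fall here by assumption.
    return "BB±"


for score in (-0.8, 0.1, 0.45):
    print(score, classify_bbb(score))
```

Compounds that fall in the central band are best treated as doubtful rather than as a firm "moderate" call, in line with the SDEP interpretation given above.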
8.2. Thermodynamic solubility model (SOLY)

N. of compounds: 1028
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type (Y): Y = log[Sol(Mol/liter)]
Response values: -8 < log[S] < +2
Aqueous solubility has long been recognized as a key molecular property in pharmaceutical science. Drug distribution, delivery and transport depend on solubility. Many groups have discussed the correlation between solubility and molecular properties. However, the majority of methods suffer from overfitting.
The SOLY model is a quantitative model for thermodynamic solubility containing 1028 diverse chemical structures. The structures were extracted from checked literature, and the dataset was completed with solubility data produced in-house. The solubility values are log[Soly], where Soly is expressed in mol/litre at 25°C. A three-component PLS model was used to correlate chemical structures and solubility values. The PLS plot shows the correlation obtained. From the pattern of the objects it can be seen that, while a good differentiation between compounds of very low, low, medium, high and very high solubility is possible, more quantitative predictions will be difficult to achieve.
The average error in external prediction is about ±0.7 log units. While this range is not suitable for predicting the solubility values of external compounds, it is still sufficient to rank compounds into different categories and to filter compounds in virtual databases. Overall, it seemed unlikely that this model could be improved upon, and every attempt to do so resulted in dangerous overfitting. Many factors can play a role in solubility, and most of these are virtually impossible to control.
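To see why an error of ±0.7 log units allows ranking but not precise prediction, it helps to convert it to a linear scale. The short Python sketch below does only that arithmetic; the function name `solubility_range` and the example value of -4.0 are illustrative assumptions, not part of VolSurf.

```python
ERROR_LOG_UNITS = 0.7  # average external prediction error quoted above


def solubility_range(pred_log_s: float, err: float = ERROR_LOG_UNITS):
    """Return (low, high) solubility in mol/L implied by pred_log_s +/- err."""
    return 10 ** (pred_log_s - err), 10 ** (pred_log_s + err)


# Multiplicative uncertainty on the linear scale (roughly a factor of 5).
fold = 10 ** ERROR_LOG_UNITS

low, high = solubility_range(-4.0)  # hypothetical predicted log[S]
print(f"fold uncertainty: {fold:.1f}x in each direction")
print(f"log[S] = -4.0 -> {low:.2e} to {high:.2e} mol/L")
```

A factor of about five in each direction is small compared with the ten orders of magnitude spanned by the model (-8 < log[S] < +2), which is why category-level ranking remains feasible.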
8.3. Caco2 permeation model (CACO2)

N. of compounds: 751
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type (Y): Caco2 permeability
Response values:
-1 for Papp. < 4*10-6 cm/s
0 for 4*10-6 < Papp. < 8*10-6 cm/s
+1 for Papp. > 8*10-6 cm/s
Discrimination for external compounds:
if predicted score < -0.4 then not permeable
if predicted score > 0.4 then permeable
Caco2 cell monolayers have gained popularity as a surrogate for in vivo human absorption. However, Caco2 cell permeability measurements have certain limitations due to the mechanisms involved. Both passive and active pathways exist. The unstirred water layer can appreciably modify the permeability coefficient. Variability between laboratories is a common problem.
Because of these problems, quantitative comparison and modelling are almost impossible. In order to work with consistent data, the Caco2 permeability values were transformed according to the following scheme:
Papp. < 4*10-6 cm/s ---> score -1
Papp. > 8*10-6 cm/s ---> score +1
However, different assumptions were made in special cases, when the experimental protocols were different or no internal standard compounds were used. A basic assumption of the model is passive permeation.
The CACO2 model is a qualitative model containing 751 related, but chemically diverse, compounds collected from the literature or experimentally measured in laboratories connected with our group. Compounds are either penetrating (score 1) or have little, if any, ability to penetrate the epithelial cells (score -1). PLS discriminant analysis was used to build the statistical model, and two significant latent variables emerged from the PLS model with cross-validation.
The 2D PLS score model offers a discrimination between the permeable and less permeable compounds. When spectrum color is active, red points refer to high permeability and blue points to low permeability. There is a region in the central part of the plot containing both red and blue compounds. In this region the permeability prediction can be less reliable.
The model can be used to project external compounds in the chemical space represented by the model in order to rank the Caco2 behaviour of external compounds.
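The two rules used in this section, the transformation of measured Papp values into scores and the ±0.4 discrimination of predicted values, can be sketched as follows. The function names are hypothetical, the label "uncertain" for the central band is an assumption based on the remark that predictions there are less reliable, and the treatment of exact boundary values is also assumed.

```python
LOW_PAPP = 4e-6   # cm/s, lower cut-off from the scheme above
HIGH_PAPP = 8e-6  # cm/s, upper cut-off from the scheme above


def papp_to_score(papp_cm_s: float) -> int:
    """Convert a measured Papp (cm/s) into the discrete training score."""
    if papp_cm_s < LOW_PAPP:
        return -1
    if papp_cm_s > HIGH_PAPP:
        return 1
    return 0  # intermediate permeability (boundaries included by assumption)


def classify_caco2(pred: float, cutoff: float = 0.4) -> str:
    """Interpret a predicted score for an external compound."""
    if pred < -cutoff:
        return "not permeable"
    if pred > cutoff:
        return "permeable"
    return "uncertain"  # assumed label for the mixed central region


print(papp_to_score(2.5e-6), papp_to_score(6e-6), papp_to_score(2e-5))
print(classify_caco2(-0.7), classify_caco2(0.1), classify_caco2(0.9))
```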
8.4. Biopharmaceutical classification model (SolyPerm)

N. of compounds: 1833
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS2
N. of components: 2
Response values:
Soly:
-1 for -8 < Log[S] < -5
0 for -5 < Log[S] < -3
+1 for -3 < Log[S] < +2
Perm:
-1 for Papp. < 4*10-6 cm/s
0 for 4*10-6 < Papp. < 8*10-6 cm/s
+1 for Papp. > 8*10-6 cm/s
Discrimination for external compounds:
Good candidates lie in the 1st quadrant (++ values)
Bad candidates lie in the 3rd quadrant (-- values)
Problematic candidates lie in the 2nd and 4th quadrants (+- and -+ values)
According to the FDA, a biopharmaceutical drug classification for bioavailability studies of solid drug products recognizes that drug dissolution and gastrointestinal permeability, in particular, are the fundamental parameters controlling the rate and extent of drug absorption. A drug with high solubility and high membrane permeability is considered practically exempt from bioavailability problems. A drug exhibiting low solubility and high permeability requires careful formulation work in order to improve its dissolution rate. A drug with high solubility and poor permeability is more difficult to formulate because absorption requires enhanced membrane permeability. Finally, a drug with poor solubility and poor permeability is a problematic candidate for administration. This classification is an important tool since it allows the selection of the best candidates among related compounds. In addition, it helps the galenical development of dosage forms to improve dissolution rate, permeability and stability, and to avoid first-pass effects. In fact, new polymeric materials and novel drug-delivery systems allow administration to be optimized in terms of route, rate of delivery, membrane transport and stability in hostile environments.
In VolSurf the thermodynamic solubility and Caco2 permeation models have been combined in order to facilitate the identification of compounds that are both soluble and permeable. Whenever one of the two experimental values was not available, the missing value was predicted using the corresponding VolSurf model already presented in this chapter. Overall, the SolyPerm model is composed of 1833 molecules. PLS2 discriminant analysis was used to build the statistical model, and two significant latent variables emerged from the PLS2 model with cross-validation.
Since two Y values are used simultaneously (solubility and permeability), the user must check which Y is currently used for colouring the 2D plot. When Y1 is used, red points refer to high solubility and blue points to low solubility. When Y2 is used, red points refer to high permeability and blue points to low permeability. The Y values are those reported individually in the thermodynamic solubility model and in the Caco2 permeation model.
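The quadrant rule for external compounds can be illustrated with a short Python sketch. It assumes that the quadrants are defined by the signs of the two predicted responses (solubility and permeability), which is how the "++", "--", "+-" and "-+" notation reads; the function names are hypothetical, and a value of exactly zero is treated as problematic by assumption.

```python
def soly_class(log_s: float) -> int:
    """Discrete solubility class from log[S], using the ranges listed above."""
    if log_s < -5:
        return -1
    if log_s > -3:
        return 1
    return 0  # boundary values fall here by assumption


def solyperm_quadrant(soly_pred: float, perm_pred: float) -> str:
    """Classify a compound from the signs of its two predicted responses."""
    if soly_pred > 0 and perm_pred > 0:
        return "good candidate"         # 1st quadrant (++)
    if soly_pred < 0 and perm_pred < 0:
        return "bad candidate"          # 3rd quadrant (--)
    return "problematic candidate"      # 2nd and 4th quadrants (+- and -+)


print(soly_class(-6.2), soly_class(-4.0), soly_class(-1.5))
print(solyperm_quadrant(0.6, 0.8))
print(solyperm_quadrant(-0.4, 0.7))
```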
8.5. Protein Binding model (Protein_Binding)

N. of compounds: 408
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type (Y): % protein binding
Response values:
0% < PB < 99%
"In silico" quantitative models that predict binding affinity to Human Serum Albumin (HSA) are very useful in the pharmaceutical industry for estimating pharmacokinetic properties in an early phase of drug discovery. Since HSA is the principal biological carrier of many drugs, it facilitates their conveyance to the target tissues through the circulatory system. The determination of protein binding depends on the analysis used (dialysis, ultra-centrifugation, ultra-filtration, NMR, UV, HPLC and other chromatographic methods), the instruments used (type of dialysis membrane, type of spectrometer, type of chromatographic equipment) and the experimental conditions chosen in different laboratories (type of albumin, its concentration, temperature and time of analysis). Variation of these parameters dramatically affects the final results and the experimental errors. Such large variability in experimental conditions produces noise and makes the interpretation of the data more complicated.
The Protein_Binding model is a qualitative model containing 408 related, but chemically diverse, compounds partially collected from the literature or experimentally measured in laboratories connected with our group. The data are mainly albumin protein binding values between 10% and 100% obtained with spectroscopic techniques. The average experimental error reported was 8%. Therefore, the model is not able to discriminate between protein binding values ranging from 95% to 100%. All compounds were modelled in their neutral form. PLS discriminant analysis was used to build the statistical model, and two significant latent variables emerged from the PLS model with cross-validation.
The 2D PLS score model offers a discrimination between the compounds with high protein binding values (between 90% and 100%) and low protein binding values (from 10% to 50%). When spectrum color is active, red points refer to high protein binding and blue points to low protein binding.
The model can be used to project external compounds in the chemical space represented by the model in order to rank the protein binding profile of external compounds.
8.6. Volume of Distribution model (Volume_Distribution)

N. of compounds: 118
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type (Y): Volume of distribution
Response values:
-2.1 < VD < 0.47
The volume of distribution (VD) of a drug is the volume that accounts for the total dose administered, based on the observed plasma concentration. The plasma volume of the average adult is approximately 3 litres. Therefore, an apparent volume of distribution larger than the plasma compartment (that is, greater than 3 litres) indicates that the drug is also present in tissue or fluid outside the plasma compartment. The volume of distribution represents a complex combination of multiple chemical and biochemical phenomena. It is a measure of the relative partitioning of the drug between plasma and the tissues. Although the volume of distribution cannot be used to determine the actual site of distribution of a drug in the body, it is of extreme importance in estimating the loading dose necessary to rapidly achieve a desired plasma concentration.
The Volume_Distribution model is a quantitative model containing 118 related, but chemically diverse, compounds collected from the literature. PLS discriminant analysis was used to build the statistical model, and two significant latent variables emerged from the PLS model with cross-validation. The 2D PLS score model offers a discrimination between compounds with high and low values of VD. The VD data (Litre/Kg) were converted into -Log[VD] values. When spectrum color is active, red points refer to low VD values (which means low drug distribution into tissues) and blue points to high VD values (which means high drug distribution into tissues).
To our knowledge, this is the first time that VD has been modelled using only "in silico" descriptors, without any experimental information from pharmacokinetic data (LogD, pKa ...).
In the plot, the compounds are coloured using a spectrum range from -1.5 to 0.4 on the -Log[VD] scale. The three lines produce four intervals with the following ranges on the -Log[VD] scale (or on the Litre/Kg scale):
-Log[VD] < -0.8 (or VD > 6.3 L/Kg)
-0.8 < -Log[VD] < -0.5 (or 3.2 < VD < 6.3 L/Kg)
-0.5 < -Log[VD] < -0.2 (or 1.6 < VD < 3.2 L/Kg)
-Log[VD] > -0.2 (or VD < 1.6 L/Kg)
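The correspondence between the two scales follows directly from the -Log[VD] transformation described above, and can be checked with a few lines of Python. The sketch assumes a base-10 logarithm, which reproduces the L/Kg limits quoted in the list; the function names are hypothetical and boundary values are assigned to the upper interval by assumption.

```python
import math


def vd_to_model_scale(vd_l_per_kg: float) -> float:
    """Convert a volume of distribution in L/Kg to the model's -log10 scale."""
    return -math.log10(vd_l_per_kg)


def vd_interval(neg_log_vd: float) -> str:
    """Return the plot interval for a value on the -log10(VD) scale."""
    if neg_log_vd < -0.8:
        return "VD > 6.3 L/Kg"
    if neg_log_vd < -0.5:
        return "3.2 < VD < 6.3 L/Kg"
    if neg_log_vd < -0.2:
        return "1.6 < VD < 3.2 L/Kg"
    return "VD < 1.6 L/Kg"


# The three dividing lines, converted back to L/Kg.
limits = [round(10 ** -y, 1) for y in (-0.8, -0.5, -0.2)]
print(limits)

for vd in (10.0, 5.0, 2.0, 1.0):
    print(vd, vd_interval(vd_to_model_scale(vd)))
```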
8.7. hERG model (HERG)

N. of compounds: 167
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type (Y): hERG inhibition class
Discrimination for external compounds:
hERG < -0.5 ---> hERG blockers area
-0.5 < hERG < 0 ---> hERG blockers / NON-blockers area
0 < hERG < 0.5 ---> hERG NON-blockers / blockers area
hERG > 0.5 ---> hERG NON-blockers area
QT prolongation is an important biomarker for the development of cardiac arrhythmias. As a consequence, a number of drugs associated with QT prolongation have been removed from the market over the past decade. All cases of drug-induced QT prolongation are associated with a particular ion channel known as hERG. Thus, the hERG inhibitory effect is today an important safety consideration in drug discovery. Although hERG inhibition seems to depend on the 3D arrangement of a drug's pharmacophoric features, VolSurf descriptors are widely used to simulate hERG inhibition, with surprisingly good results. Literature data, merged with company data, were therefore selected for a total of 167 compounds.
Compounds with IC50 values for inhibition of the hERG K+ channel expressed in mammalian cells (HEK, CHO, COS, neuroblastoma cells) were selected. Because of some diversity among experimental procedures, the IC50 values were converted into discrete classes: +1 for hERG NON-blockers and -1 for hERG blockers. The former are 80 compounds collected from the literature (IC50 > 10) or from company data (IC50 > 30). The latter are 87 compounds collected from the literature (IC50 < 1) or from company data (IC50 < 0.2). Therefore, the HERG model is a qualitative model. Molecules were built in their N+ charged form whenever possible.
PLS discriminant analysis was used to build the statistical model, and two significant latent variables emerged from the PLS model with cross-validation. When spectrum color is active, red points refer to hERG NON-blockers, whereas blue points refer to hERG blockers. The lines divide the plot into four areas with the following qualitative hERG ranges:
hERG < -0.5 ---> hERG blockers area
-0.5 < hERG < 0 ---> hERG blockers / NON-blockers area
0 < hERG < 0.5 ---> hERG NON-blockers / blockers area
hERG > 0.5 ---> hERG NON-blockers area
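The four areas can be expressed as a simple lookup on the predicted hERG value. The Python sketch below is only an illustration of the ranges listed above; the function name `herg_area` is hypothetical, and the assignment of the exact boundary values (-0.5, 0 and 0.5) is an assumption, since the manual uses strict inequalities throughout.

```python
def herg_area(herg_pred: float) -> str:
    """Return the qualitative area for a predicted hERG value."""
    if herg_pred < -0.5:
        return "hERG blockers area"
    if herg_pred < 0:
        return "hERG blockers / NON-blockers area"
    if herg_pred <= 0.5:
        return "hERG NON-blockers / blockers area"
    return "hERG NON-blockers area"


for value in (-0.9, -0.2, 0.3, 0.8):
    print(value, herg_area(value))
```

Only the two outer areas give a clear-cut call; the two inner areas are mixed, with the first-named class being the more likely one.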
8.8. Water / DMSO solubility model (solDMSO)

Compounds intended for biological screening are increasingly being distributed as solutions in dimethylsulphoxide (DMSO). In general (with a number of exceptions) compounds are more soluble in DMSO than in pure water. It is very difficult to predict thermodynamic solubility from DMSO solubility and vice versa, because when a compound is already dissolved in DMSO there is no crystal lattice to disrupt as part of the aqueous solubilization process. Solubility measurements in a solvent mixture of water (98%) and DMSO (2%) were carried out in our laboratory.
The solubility of 150 drug-like compounds, all in neutral form, was measured, and a chemometric model was developed.
In order to reduce the number of compounds to test, the solubility values were converted to a discrete scale, with the arbitrary ranges reported below:
0 for compounds with solubility < 10
1 for compounds with 10 < solubility < 100
2 for compounds with 100 < solubility < 180
3 for compounds with solubility > 180
The solDMSO model is a qualitative model. PLS discriminant analysis was used to build the statistical model and two significant latent variables emerged from the PLS model with cross validation. When spectrum color is active, red points refer to high solubility values and blue points to low solubility values.
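The conversion of measured solubility values to the discrete scale can be sketched with the standard-library `bisect` module. The function name `dmso_solubility_class` is hypothetical; the solubility units are not stated in this section, so the input is assumed to be in the same units as the ranges listed above, and values that fall exactly on a limit are assigned to the higher class by assumption.

```python
from bisect import bisect_right

# Class limits from the discrete scale above (units as in the original data).
CLASS_LIMITS = (10, 100, 180)


def dmso_solubility_class(solubility: float) -> int:
    """Return the discrete class (0 to 3) for a measured solubility value."""
    # bisect_right counts how many limits are <= the value, which is
    # exactly the class index for ascending limits.
    return bisect_right(CLASS_LIMITS, solubility)


for value in (5, 50, 150, 200):
    print(value, dmso_solubility_class(value))
```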
The following plot represents 150 chemicals projected into the solubility model. The projected compounds are coloured by their solubility in DMSO/water. Compounds in the lower-left corner show low thermodynamic solubility but medium-high DMSO solubility. Conversely, compounds in the upper-left corner show both low thermodynamic and low DMSO solubility. In summary, the problematic compounds seem to be those located in the upper-left part of the plot, where solubility in pure water and/or in the DMSO-water mixture appears to be quite poor.

8.9. CYP3A4 Metabolic Stability model (MetabolicStability)

N. of compounds: 1507
Probes used: OH2, DRY, O
N. of variables: 94
statistical tool: PLS
N. of components: 2
Response type: CYP3A4 metabolic stability (MS)
Response values: 1 for MS > 40%
-1 for MS < 40%
Metabolic stability in a human CYP3A4 cDNA-expressed microsomal preparation offers a suitable approach to predicting the metabolic stability of external compounds. The dataset consisted of about 1600 compounds from Pharmacia Corporation. Each compound was incubated at a fixed concentration for 60 min with a fixed concentration of protein at 37°C. The reaction was stopped by adding acetonitrile to the solution and, after centrifugation to remove the protein, the supernatant was analyzed using LC/MS and MS. Compounds with a final concentration ≥40% of the corresponding control sample were defined as stable, whereas compounds with final concentrations below 40% of the corresponding control were defined as unstable.
The solubility of stable compounds was used as a primary filter. All the compounds with a solubility lower than 10 μM were removed from the analysis, because insoluble compounds always appear to be metabolically stable.
Two significant principal components were extracted. The score plot of the two principal components shows the compounds color-coded according to their metabolic stability (red points represent stable compounds, blue points indicate unstable compounds). It should be noted that the majority of unstable compounds are clustered in the left region of the principal component space. A more detailed inspection of the score plot indicates that some compounds are misclassified. However, further evaluation carried out on some of these compounds revealed experimental problems. Thus, it appears that this model can be used to identify false-positive (or false-negative) experiments. Moreover, it can also be used to estimate metabolic stability from the 3D structure of a drug candidate prior to experimental measurements.
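The labelling of the training data described above (the 40% threshold plus the solubility filter) can be sketched in Python. The function name `metabolic_stability_label` is hypothetical. The sketch applies the solubility filter to every compound, which is one reading of the text, and it assumes that the solubility is given in μM.

```python
from typing import Optional

STABILITY_THRESHOLD = 40.0  # % of the control sample remaining after 60 min
MIN_SOLUBILITY_UM = 10.0    # compounds below this solubility were removed


def metabolic_stability_label(percent_remaining: float,
                              solubility_um: float) -> Optional[int]:
    """Return +1 (stable), -1 (unstable), or None if the compound is filtered out."""
    if solubility_um < MIN_SOLUBILITY_UM:
        # Poorly soluble compounds look stable regardless of their true
        # metabolism, so they are excluded rather than labelled.
        return None
    return 1 if percent_remaining >= STABILITY_THRESHOLD else -1


print(metabolic_stability_label(75.0, 120.0))
print(metabolic_stability_label(12.0, 85.0))
print(metabolic_stability_label(95.0, 2.0))
```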