Chapter 12. Menu Pretreatment (Alt-R)

This menu contains commands that modify the variables generated by VolSurf.

If pretreatment is applied to the data and the User quits the program, the same scaling and object/variable exclusions will be applied the next time the data is loaded using File->Open data file.

12.1. Pretreatment->Scale...

VolSurf can apply the following scaling to the data:

Raw

No transformation is performed.

Autoscale

Every variable is mean centered and scaled to give it unit variance.

Note: This is the default scaling applied by VolSurf.

After the scaling operation is applied, the data is used with this scaling in all instances (plots, modeling, export, etc.) until another scaling is applied or the original data is reloaded with the command Pretreatment->Reload original. Scalings are not cumulative: the original data is transparently reloaded before each scaling operation.
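The Autoscale option described above can be illustrated with a minimal Python sketch. It is not VolSurf's own code: the function name autoscale, the use of the sample standard deviation (n - 1 denominator) and the handling of constant columns are all assumptions made for the example.

```python
from statistics import mean, stdev

def autoscale(rows):
    """Mean-center every column and scale it to unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # Assumption: sample standard deviation (n - 1 denominator).
    stds = [stdev(col) for col in columns]
    scaled = []
    for row in rows:
        scaled.append([
            # Assumption: a constant column (zero deviation) is set to 0.
            (value - m) / s if s > 0 else 0.0
            for value, m, s in zip(row, means, stds)
        ])
    return scaled

# Hypothetical data: 3 objects (rows) described by 2 variables (columns).
data = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]
scaled = autoscale(data)
for row in scaled:
    print(row)
```

Because the function always starts from the unscaled rows it is given, calling it again on the original data reproduces the same result, which mirrors the non-cumulative behaviour described above.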

The command presents a dialog like this:

Scaling:

click on a radio button to select a scaling method.

Once the scaling method is selected, press the OK button to apply it to the data or Cancel to exit without applying any scaling.

12.2. Pretreatment->F. Factorial selection...

The dialog window is divided into three parts by horizontal lines:

12.2.1. Model Validation Parameters

this part is identical to that shown in Modeling->Validate PLS model..., the only difference being that the Specific Groups option is not present in the Validation mode. Please refer to the section which describes that command for details. The lower parts are new and are described here.

12.2.2. FFD Selection Parameters

FFD method parameters

this box includes check boxes that define some parameters of the variable selection methodology.

Use grouping of variables

this check box is always inactive in VolSurf.

Retain uncertain variables

when this option is active the variables with an uncertain effect on the predictive ability of the model will not be removed from the data file. See Background section for further explanation about the concept of 'uncertain variables'. In our experience and in the context of 3D-QSAR, it is advisable to retain the uncertain variables in the models.

Fold-over design

select this option to force the factorial design to "fold-over". This means that all the variable combinations (PLS models) will be repeated, inverting the pattern of signs in the combination matrix. As a result, the effect of the variables (or groups of variables) on the model predictive ability is evaluated in a much safer way, because the design contains fewer confoundings. However, the procedure takes twice as long as the standard one. The effect of fold-over on the quality of the variable selection is further discussed in the Background section.
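The idea of folding over a design can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name fold_over is hypothetical, the small matrix is an invented two-level design in which +1 and -1 mark the two levels of each variable, and nothing here reflects how VolSurf builds its combination matrix internally.

```python
def fold_over(design):
    """Return the design followed by a copy with every sign inverted."""
    mirrored = [[-level for level in row] for row in design]
    return design + mirrored

# Hypothetical half-fraction design for three variables (rows = combinations).
base = [
    [+1, +1, -1],
    [+1, -1, +1],
    [-1, +1, +1],
    [-1, -1, -1],
]

folded = fold_over(base)
print(len(base), "combinations before fold-over")
print(len(folded), "combinations after fold-over")
```

The folded design has twice as many rows as the original, which is why the procedure takes twice the time.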

Combinations/variables ratio

this scale controls the number of rows of the combination matrix. The number of PLS models to be tested can be calculated as the smallest power of 2 higher than the number of active and dummy variables multiplied by the value shown in this scale. So, if 500 variables are present (active variables plus dummies), a value of 2 in this scale will result in testing 1024 PLS models (2^10) and a value of 3 will result in testing 2048 PLS models (2^11).

Increasing the value of this scale produces better estimates of the effects of the variables on the model predictive ability, but also slows down the computation. The default value is 2.0.
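The rule for the number of PLS models can be checked with a short Python sketch. The function name n_pls_models is hypothetical, and "higher than" is read here as strictly greater, which is an assumption about the boundary case where the product is itself a power of 2.

```python
def n_pls_models(n_variables, ratio=2.0):
    """Smallest power of 2 strictly greater than n_variables * ratio."""
    target = n_variables * ratio
    n_models = 1
    while n_models <= target:
        n_models *= 2
    return n_models

# 500 variables (active plus dummies), as in the example in the text.
print(n_pls_models(500, 2))  # 1024
print(n_pls_models(500, 3))  # 2048
```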

% of dummies

some of the columns of the combination matrix are labelled as "dummy variables" in order to evaluate the noise level in the model. The radio buttons control the percentage of dummies to include in the combination matrix. The effect of these variables on the size of the combination matrix is described above. Our suggestion is to add a number of dummies equal to 20% of the number of active X-variables: this is a good choice in most cases.
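As a rough illustration of how the percentage of dummies affects the number of columns, consider the following Python sketch. The function name and the rounding to the nearest integer are assumptions; the manual does not state how VolSurf rounds the dummy count.

```python
def combination_matrix_columns(n_active, dummy_percent=20.0):
    """Return (number of dummies, total columns) for a dummy percentage."""
    # Assumption: the dummy count is rounded to the nearest integer.
    n_dummies = round(n_active * dummy_percent / 100.0)
    return n_dummies, n_active + n_dummies

# Hypothetical dataset with 417 active X-variables and 20% of dummies.
n_dummies, n_columns = combination_matrix_columns(417)
print(n_dummies, "dummies,", n_columns, "columns in total")
```

With these hypothetical numbers the total is 500 columns, the figure used in the Combinations/variables ratio example.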

12.2.3. Execution Parameters

CPU priority (->lower)

please move the scale to the right to execute the calculation with a lower CPU priority (more "nice", using UNIX terminology). A lower CPU priority might be preferable when the computer is running many other jobs in the background.

Execution

  • select Background to start the variable selection procedure as an independent background job. This is the best choice for time-consuming jobs, because the User can log out of the computer without stopping the job. The progress of the procedure can be inspected in the log file namefile.vol.FFDlog

  • select Window to start the variable selection procedure as an interactive job in an independent window (xterm). This is better for short jobs, because the progress of the selection can be followed in the window. However, if the User closes this window or logs out, the job will be stopped.

When all the settings are correct, press the OK button and the FFD selection will be started. Press the Cancel button to abort or the Defaults button to fill all the settings with the default values. A few seconds after the selection starts, the program gives an estimate of the time required to complete the calculations. If the process is running in background, its status can be inspected in the file namefile.vol.FFDlog; if the process is running in an independent window, the information will be displayed in that window. Moreover, a file named FFD.csh will be created, containing a shell script useful for running the F. Factorial selection on a different computer.

Once the calculation is finished the results must be applied to the data using the Pretreatment->Exclude vars.->FFD selections... command.

12.3. Pretreatment->Exclude vars.

Often, the models obtained with VolSurf benefit from the removal of variables not correlated with the activity. In other situations it is convenient to remove the VolSurf variables obtained with a certain probe. These submenus offer the possibility to temporarily exclude one or many variables from the analysis.

In any case, the original data (original objects and original variables) can be restored using the command Pretreatment->Reload original, without recomputing the data.

12.3.1. Pretreatment->Exclude vars.->Blocks...

This command allows the user to remove from the X variables one or many blocks of variables, each one representing the variables obtained with a certain probe:

Blocks to exclude: click on one or more buttons to mark the blocks to remove.

Once the blocks to remove are selected press OK to actually remove them from the X variables or Cancel to exit without modifying the data.

12.3.2. Pretreatment->Exclude vars.->Individual var...

This command allows the User to exclude one or many specific variables from the X matrix. The command shows a dialog like this, with a list of all the variables.

Click on any line to change its status from active to excluded or vice versa.

The Inverse button can be used to invert simultaneously the status of all the variables. When the list shows the desired active variables, press the OK button to apply the changes or Cancel to exit without modifying the X data.

12.3.3. Pretreatment->Exclude vars.->FFD selections...

This command should be used after an FFD variable selection analysis to actually remove from the X matrix the variables that decrease the predictive quality of the PLS models.

F. Factorial selection list

this list contains all the previous variable selection procedures performed on this data file. Each procedure is identified by a sequential number, the initial number of variables, the final number of variables, the number of components and the time and date when the procedure was finished. Click on any item to select it. The selected items will be included in the Selection input field.

Selection:

the selection from the above list is shown in this input field.

When the OK button is pressed, the chosen variable selection procedure will be read and all the variables not selected in this procedure will be excluded from the X matrix. Press Cancel to exit without applying any selection.

12.4. Pretreatment->Exclude objects

12.4.1. Pretreatment->Exclude objects->Manually...

This command can be used to remove one or more objects (molecules) from the analysis, for example to remove outliers or to split the dataset into a training and a prediction set.

Click on any line to change its status from active to excluded or vice versa.

The Inverse button can be used to invert simultaneously the status of all the objects. When the list shows the desired active objects press the OK button to apply the changes or Cancel to exit without modifying the X data.

Important: The removal of the objects can alter the scale of the variables. For example, when Autoscaling is applied, the removal of the objects can alter the weights applied to the variables. Therefore, after each object removal, the data is automatically re-scaled using the same scaling method.
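The effect described in this note can be seen in a small numerical sketch. The descriptor values below are invented for illustration, the helper name column_stats is hypothetical, and the sample standard deviation is an assumption about how the autoscaling weights are computed.

```python
from statistics import mean, stdev

def column_stats(values):
    """Return the mean and sample standard deviation used for autoscaling."""
    return mean(values), stdev(values)

# Hypothetical values of one variable for four objects; the last is an outlier.
descriptor = [1.0, 2.0, 3.0, 10.0]

full = column_stats(descriptor)          # all four objects active
reduced = column_stats(descriptor[:-1])  # outlier excluded

# The autoscaled value of the first object changes after the exclusion.
before = (descriptor[0] - full[0]) / full[1]
after = (descriptor[0] - reduced[0]) / reduced[1]
print("mean, std with all objects:", full)
print("mean, std without outlier: ", reduced)
print("first object, autoscaled:  ", before, "->", after)
```

Since both the mean and the deviation change when an object is removed, re-applying the same scaling method after each exclusion keeps the remaining data consistently scaled.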

12.4.2. Pretreatment->Exclude objects->From a plot...

This command can be used to remove one or more objects (molecules) from the analysis. It can be used to remove clusters of molecules, to select clusters of molecules, to split the dataset into a training and a test set, etc.

When the following dialog is shown, select the plot to use for the graphic selection of the objects to exclude and then press OK to accept or Cancel to close the dialog and exit:

Model:

the User can select one of the following options: PCA, PLS scores, PLS plot, PLS Recalc vs Experimntal or PLS Predicted vs Experimntal:

  • if PCA or PLS scores is selected the following dialog appears:

    press OK to work on the PC1-PC2 score plot. Alternatively, select the appropriate PC dimensions in the two fields and press OK.

  • if PLS plot, PLS Recalc vs Experimntal or PLS Predicted vs Experimntal is selected the following dialog appears:

    press OK to accept the dimensionality or select the appropriate PC dimensions in the field and press OK.

After the chosen plot appears, select the subregion of objects to exclude with the mouse on the 2D plot: click the central mouse button until a magenta cross symbol appears. Then move the mouse to another position and click the button again. A line will be drawn in the plot. Repeat the procedure until a polygon has been drawn surrounding the region to consider. Be sure to close the polygon (the line colour will change to red).

Click on Pretreatment->Exclude objects->Manually... and the list of objects window will appear, with the status of the graphically selected compounds shown as "excluded". Press OK if your graphic selection is correct.

The compounds are excluded only from this moment.
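Conceptually, the graphic selection marks the objects whose plot coordinates fall inside the closed polygon. The Python sketch below shows one common way to perform such a test (ray casting). It is only an illustration: the manual does not say which algorithm VolSurf uses, and the function name, object names and score values are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the closed polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # the last vertex connects to the first
        # Does this edge cross the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical score-plot coordinates (first and second dimension).
scores = {"mol_1": (0.5, 0.5), "mol_2": (2.0, 0.5), "mol_3": (0.2, 0.9)}

# Hypothetical polygon drawn by the user, as a list of vertices.
polygon = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

excluded = sorted(
    name for name, (x, y) in scores.items() if point_in_polygon(x, y, polygon)
)
print(excluded)  # ['mol_1', 'mol_3']
```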

The Inverse button can be used to invert simultaneously the status of all the objects. When the list shows the desired active objects, press the OK button to apply the changes or Cancel to exit without modifying the X data.

Important: The removal of the objects can alter the scale of the variables. For example, when Autoscaling is applied, the removal of the objects can alter the weights applied to the variables. Therefore, after each object removal, the data is automatically re-scaled using the same scaling method.

12.5. Pretreatment->Reload original

This option will reload the original variables and objects, reverting the effect of the commands which exclude variables and objects. The number of variables considered in the dataset after the reloading will be shown in the main window and in the status line for reference.
