Pretreatment

 

Pretreatment Menu

[Scale...][F.Factorial Selection...][Exclude vars.][Exclude objects...][Reload original...]

 

This menu contains commands that modify the variables generated by ALMOND.

 

Pretreatment settings persist between sessions: if a pretreatment has been applied to the data and the User quits the program, the same scaling and object/variable exclusions are applied the next time the data is loaded using File>>>Open data file.

 


 

Pretreatment>>>Scale...

 

ALMOND 2.0 can apply the following scaling to the data:

Raw: No transformation is performed.
Remove baseline: Often, some interactions are found only for some compounds in the series. The associated variables take a certain value for these compounds and zero for all the others, and therefore have a large variance that can be detrimental to the modeling. The Remove baseline scaling removes from the X matrix all the variables that take a zero value for one or more compounds.
Normalize block-wise: Values within each block are normalized between 0.5 and 2, using the maximum value of the product of interaction found in this block.

 

After the scaling operation is applied, the data is used with this scaling in all instances (in plots, modeling, export, etc.) until another scaling is applied or the original data is reloaded with the command Pretreatment>>>Reload original. Scalings are not cumulative: the original data is transparently reloaded before each scaling operation.
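The Remove baseline rule and the non-cumulative behaviour described above can be illustrated with a minimal Python sketch. This is not ALMOND code: the `Dataset` class, the `remove_baseline` function and the small example matrix are hypothetical names and values introduced only for illustration. The block-wise normalization is left out because its exact formula is not given here.

```python
def remove_baseline(rows):
    """Drop every column that is zero for at least one object.

    Returns (kept_column_indices, filtered_rows).
    """
    if not rows:
        return [], []
    n_cols = len(rows[0])
    kept = [j for j in range(n_cols) if all(row[j] != 0 for row in rows)]
    return kept, [[row[j] for j in kept] for row in rows]


class Dataset:
    """Hypothetical container that never modifies the original matrix."""

    def __init__(self, original):
        self.original = [list(row) for row in original]
        self.current = [list(row) for row in original]
        self.scaling = "raw"

    def scale(self, method):
        # Always restart from the original data, so scalings never stack.
        base = [list(row) for row in self.original]
        if method == "raw":
            self.current = base
        elif method == "remove_baseline":
            _, self.current = remove_baseline(base)
        else:
            raise ValueError(f"method not covered by this sketch: {method}")
        self.scaling = method


# Three objects (rows) and three variables (columns); values are made up.
X = [
    [1.2, 0.0, 3.1],
    [0.8, 2.5, 2.9],
    [1.5, 0.0, 3.3],
]
data = Dataset(X)
data.scale("remove_baseline")   # the second variable is removed
baseline_view = data.current
data.scale("raw")               # the original three variables are back
raw_view = data.current
```

Because every call to `scale` starts from the stored original, applying Raw after Remove baseline brings the removed variable back without any explicit reload.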

 

The command presents a dialog like this:

Scale Dialog

 

Scaling

Click on the radio-button to select the different scalings.

 

 

Once the scaling method is selected, press the OK button to apply it to the data or Cancel to exit without applying any scaling.

 


 

Pretreatment>>>F. Factorial selection...

 

FFD Variable Selection

 

The dialog window is divided into three parts by horizontal lines. The upper part is identical to that shown in Modeling>>>Validate PLS model; the only difference is that in the Validation mode the Specific Groups option is not present. Please refer to the section which describes these commands for details. The lower parts are new and will be described here.

FFD Selection Parameters

 

FFD method parameters. This box includes check boxes that define some parameters of the variable selection methodology.

 

Use grouping of variables. This check box is always insensitive in ALMOND.

 

Retain uncertain variables. When this option is ON the variables with an uncertain effect on the predictive ability of the model will not be removed from the data file. See Background section for further explanation about the concept of 'uncertain variables'. In our experience and in the context of 3D-QSAR, it is advisable to retain the uncertain variables in the models.

 

Fold-over design. Select this option to force the factorial design to "fold-over". This means that all the variable combinations (PLS models) will be repeated, inverting the pattern of signs in the combination matrix. As a result, the effect of the variables (or groups of variables) on the model predictive ability is evaluated in a much safer way, because the design contains fewer confoundings. However, the procedure will take twice as long as the standard procedure. The effect of fold-over on the quality of the variable selection is further discussed in the Background section.
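As an illustration of what "inverting the pattern of signs" means, the following Python sketch folds over a small two-level design. It is only a sketch under stated assumptions: the `fold_over` function and the +1/-1 coding of the combination matrix are hypothetical, and the actual matrix built by ALMOND is not shown in this section.

```python
def fold_over(design):
    """Return the design followed by its sign-inverted copy."""
    original = [list(row) for row in design]
    mirrored = [[-value for value in row] for row in design]
    return original + mirrored


# A half-fraction design for three variables (third column = product of
# the first two). Each row stands for one variable combination.
design = [
    [+1, +1, +1],
    [+1, -1, -1],
    [-1, +1, -1],
    [-1, -1, +1],
]
folded = fold_over(design)
```

In this toy case the folded design has twice as many rows and covers all eight sign combinations of the three variables, which is why the third column is no longer confounded with the product of the first two.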

 

Combinations/variables ratio This scale controls the number of rows of the combination matrix. The number of PLS models to be tested is the smallest power of 2 greater than the number of active and dummy variables multiplied by the value shown in this scale. So, if 500 variables are present (active variables plus dummies), a value of 2 in this scale will result in testing 1024 PLS models (2^10) and a value of 3 will result in testing 2048 PLS models (2^11).

Increasing the value of this scale produces better estimates of the effects of the variables on the model predictive ability, but also slows down the computation. The default value is 2.0.
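The rule above can be checked with a few lines of Python. The function name `n_pls_models` is hypothetical, and the sketch assumes that "greater than" is strict, so that a product exactly equal to a power of 2 moves up to the next power.

```python
def n_pls_models(n_variables, ratio=2.0):
    """Smallest power of 2 strictly greater than n_variables * ratio.

    n_variables counts active variables plus dummies.
    """
    target = n_variables * ratio
    models = 1
    while models <= target:
        models *= 2
    return models


models_ratio_2 = n_pls_models(500, 2)   # 500 * 2 = 1000
models_ratio_3 = n_pls_models(500, 3)   # 500 * 3 = 1500
print(models_ratio_2, models_ratio_3)
```

For 500 variables this gives 1024 and 2048 models, matching the figures quoted above.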

 

Dummies Some of the columns of the combination matrix are labelled as "dummy variables" in order to evaluate the noise level in the model. The radio buttons control the percentage of dummies to include in the combination matrix. The effect of these variables on the size of the combination matrix is described above. Our suggestion is to add a number of dummies equal to 20% of the number of active X-variables: this is a good choice in most cases.

 

Execution Parameters

 

CPU priority Move the scale to the right to execute the calculation with a lower CPU priority (more "nice", using UNIX terminology). A lower CPU priority might be preferable when the computer is running many other jobs in the background.

 

Execution

The options are:

 

When all the settings are correct, press the OK button to start the FFD selection. Press the Cancel button to abort or the Defaults button to fill all the settings with the default values. A few seconds after the selection starts, the program gives an estimate of the time required to complete the calculations. If the process is running in the background, its status can be inspected in the file namefile.alm.FFDlog; if the process is running in an independent window, the information will be displayed in that window. Moreover, a file named FFD.csh will be created, containing a shell script useful for running the F. Factorial selection on a different computer.

 

Once the calculation is finished, the results must be applied to the data using the Pretreatment>>>Exclude vars.>>>Exclude FFD... command.

 


 

Pretreatment>>>Exclude vars.

 

Exclude vars Menu

 

Often, the models obtained with ALMOND benefit from the removal of variables not correlated with the activity. In other situations it is even convenient to remove whole correlograms. These submenus offer the possibility to temporarily exclude one or many variables from the analysis.

 

In any case, the original data (original objects and original variables) can be restored using the command Pretreatment>>>Reload original, without recomputing the data.


 

Pretreatment>>>Exclude vars.>>>Exclude blocks...

 

This command allows the User to remove from the X variables one or many blocks of variables, each one representing an auto- or cross-correlogram.

Exclude Blocks Dialog

 

Blocks to exclude:

Click on one or more buttons to mark the blocks to remove.

 

Once the blocks to remove are selected, press OK to remove them from the X variables or Cancel to exit without modifying the data.

 


 

Pretreatment>>>Exclude vars.>>>Exclude individual var....

 

This command allows the User to exclude one or many specific variables from the X matrix. The command shows a dialog like this, with a list of all the variables.

 

Exclude Indiv Vars Dialog

 

Click on any line to change its status from active to excluded or vice versa.

 

The Inverse button can be used to invert the status of all the variables simultaneously. When the list shows the desired active variables, press the OK button to apply the changes or Cancel to exit without modifying the X data.


 

Pretreatment>>>Exclude vars.>>>Exclude FFD...

 

This command should be used after an FFD variable selection analysis to remove from the X matrix the variables that decrease the predictive quality of the PLS models.

 

Exclude FFD Dialog

 

F. Factorial selection list

This list contains all the previous variable selection procedures performed on this data file. Each procedure is identified by a sequential number, the initial number of variables, the final number of variables, the number of components, and the time and date when the procedure was finished. Click on any item to select it. The selected items will be included in the Selection input field.

 

Selection:

The selection from the above list is shown in the Selection input field.

 

When the OK button is pressed, the chosen variable selection procedure is read and all the variables not selected in this procedure are excluded from the X matrix.


 

Pretreatment>>>Exclude objects...

 

This command can be launched from the pull-down menu or from the prediction palette. It removes one or more objects (molecules) from the analysis, for example to remove outliers or to split the dataset into a training set and a prediction set.

 

Exclude Objects Dialog

 

Click on any line to change its status from active to excluded or vice versa.

 

Invert

This button can be used to change the status of all the objects from active to excluded or vice versa.

 

plot 2D

Offers the possibility to use an interactive 2D plot for selecting the objects to exclude. First, the User must choose the kind of 2D plot to use. The options are:

PCA-scores

Plots objects (molecules) in a two-dimensional space, using the PCA-scores vectors obtained from the PCA model. Click here for more information.

PLS-score

Plots objects (molecules) in a two-dimensional space, using the PLS-scores vectors obtained from the PLS model. Click here for more information.

Scatter

Plots the Recalculated Y-values (vertical axis) against the Experimental Y-values (horizontal axis). Click here for more information.

In any of these plots, the User can select a subregion of allowed compounds. To select the compounds, place the mouse on the 2D plot and click the central mouse button until a magenta cross symbol appears. Then move the mouse to another position and click the button again. A line will be drawn in the plot. Repeat the procedure until a polygon has been drawn. Be sure to close the polygon (the line color will change to red). When finished, press the capture clipboard button on the dialog and ALMOND will automatically select the compounds situated within the polygon, updating the list of active compounds accordingly.
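Selecting the compounds situated within a closed polygon is a point-in-polygon test. How ALMOND performs it internally is not documented here; the Python sketch below uses the common ray-casting method as one possible approach. The function names, the compound names and the score coordinates are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a horizontal ray
    starting at (x, y) and going right crosses; odd means inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]   # wrap around to close the polygon
        if (y1 > y) != (y2 > y):        # edge spans the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_inside(scores, polygon):
    """Return the names of the objects whose 2D coordinates fall inside."""
    return [name for name, (x, y) in scores.items()
            if point_in_polygon(x, y, polygon)]


# Polygon vertices in plot coordinates, in the order they were clicked.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
scores = {
    "mol1": (1.0, 1.0),
    "mol2": (5.0, 1.0),
    "mol3": (3.5, 2.5),
    "mol4": (-1.0, 2.0),
}
selected = select_inside(scores, polygon)
print(selected)
```

Here only `mol1` and `mol3` lie within the rectangle, so they would remain active while the other two would be marked as excluded.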

If the Exclude objects command has been launched from the prediction palette, only the objects of the external set (red objects) can be selected; other objects (in white) are ignored.

 

When the list shows the desired active objects, press the OK button to apply the changes or Cancel to exit without modifying the X data.

IMPORTANT: The removal of objects can alter the scaling of the variables. For example, when the Remove baseline scaling is applied, removing the objects that take a value of zero for some variables would make these variables active again. Therefore, after each object removal, the data is automatically re-scaled using the same scaling method.
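The following Python sketch shows why the re-scaling matters under the Remove baseline rule. It is an illustration only: `active_variables` is a hypothetical helper, and the matrix values are made up.

```python
def active_variables(rows, excluded_objects=()):
    """Indices of the variables that are non-zero for every object
    that is still active (the Remove baseline rule)."""
    kept_rows = [row for i, row in enumerate(rows)
                 if i not in excluded_objects]
    n_cols = len(rows[0]) if rows else 0
    return [j for j in range(n_cols)
            if all(row[j] != 0 for row in kept_rows)]


X = [
    [1.2, 0.0, 3.1],   # object 0 takes zero for variable 1
    [0.8, 2.5, 2.9],
    [1.5, 1.9, 3.3],
]
before = active_variables(X)                        # variable 1 is removed
after = active_variables(X, excluded_objects={0})   # variable 1 is active again
print(before, after)
```

With all three objects, variable 1 is dropped; once object 0 is excluded, no remaining object takes a zero value for it, so the re-scaling brings it back.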


 

Pretreatment>>>Reload original

 

This option reloads the original variables and objects, reverting the effect of the commands that exclude variables and objects. The number of variables in the dataset after reloading is shown in the main window and in the status line for reference.

Since ALMOND 3.0, some reloaded objects may have only missing activity values because they were excluded when the activity was imported. If this is the case, the names of these objects are printed in the main text window and two options are offered to the User: clear all the activity values and import new ones using the Import activity dialog, or exclude these objects once again.

 
