Chapter 6. Kibitzer: Extend pKa accuracy

Kibitzer is an automated expert tool for extending the MoKa internal database with a corporate database of pKa values. A fully automated training procedure makes it easy to create customized pKa models from your own experimental pKa values. Import an SD file containing experimental pKa values into Kibitzer, check the pKa assignments and, if necessary, correct them using the embedded molecular viewer. Then create customized models and verify the improvements in MoKa. To save your Kibitzer project, export it to an SD file.

6.1. The Interface and the Workflow

Building a custom pKa prediction model in the Kibitzer interface involves three steps:

  • Import SD file

  • Check assigned pKa

  • Build models

6.2. The Menus

The Kibitzer menu bar contains the following menus (from left to right) and commands.

File menu

  • Open - Opens a .kib file to import into Kibitzer

  • Save - Saves your current work

  • Save as... - Saves your current work in the file specified

  • Import SD data file - Imports an SD file

  • Export SD - Exports the current project to an SD file, including fields that store user-defined settings

  • Exit - Quits Kibitzer

Edit menu

  • Select All - Selects all the molecules listed

  • Unselect All - Unselects all the molecules listed

Tools menu

  • Compute - Opens the dialog box that starts the model building process

  • Load Custom Model - Selects the custom model used to assign experimental pKas. Select this option before importing the SD file

  • Restore Default Model - Selects the standard MoKa internal model to assign experimental pKas. Select this option before importing the SD file

Note: It is possible to load a custom model at program startup by using the --load-model option, like this: kibitzer --load-model=modelname

View menu

  • Optimize 3D view - Rotates the structure in the Molecule view window for better visualization

  • Toggle log window - Shows or hides the log window

  • Annotate depiction - Annotates the 2D depiction with atom numbers (in parentheses) and experimental pKa

Help menu

  • Manual - Opens the manual

  • About - Displays information about the Kibitzer version

6.3. Working with Kibitzer

This section provides basic information to get you started with Kibitzer. Before creating customized pKa prediction models, consider how they will be used. Are you going to add your whole database of pKa values? Are you going to add only part of your data set and keep the remainder for testing? Are you going to add only one molecular structure? Knowing your requirements will help you choose a suitable data set.

Kibitzer improves the accuracy of MoKa pKa calculations by expanding the chemical space of the internal database. If you want to test the capabilities of the software, bear in mind that Kibitzer works by automatically adding missing parameters to describe a molecular structure. When the structure that you are adding is already known, Kibitzer will account for your pKa and adjust the existing parameters accordingly.

By checking the QP value, you can easily see whether your structure contains structural features that are not parametrized in the internal database. The higher the absolute value of QP, the further your structure is from the chemical space covered by the model currently in use. To obtain the best results and avoid overfitting, include at least 2-3 structures of the same series when the QP for a pKa is significantly high (|QP| > 1.0).
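A rough way to apply this rule of thumb to your own data set before import is sketched below in Python. It assumes you already have a QP value for each pKa and that you label each molecule with a series name yourself; the tuple layout and the function name are hypothetical. The thresholds mirror the text: |QP| > 1.0 counts as high, and at least two structures per series are expected.

```python
from collections import defaultdict

QP_THRESHOLD = 1.0   # "significantly high" QP, as given in the text
MIN_PER_SERIES = 2   # lower end of the suggested 2-3 structures per series


def underrepresented_series(entries: list[tuple[str, str, float]]) -> list[str]:
    """Return series that contain a high-QP pKa but too few structures.

    Each entry is (series, molecule name, QP of one pKa). The series labels
    are assumed to come from your own data.
    """
    molecules = defaultdict(set)  # series -> distinct molecule names
    high_qp = set()               # series with at least one |QP| > threshold
    for series, name, qp in entries:
        molecules[series].add(name)
        if abs(qp) > QP_THRESHOLD:
            high_qp.add(series)
    return sorted(s for s in high_qp if len(molecules[s]) < MIN_PER_SERIES)
```

Series returned by this check are candidates for adding one or two more related structures before you build the model.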

To select a data set suited to your requirements, you need to consider the following:

  • Kibitzer cannot significantly improve the accuracy of the predictions if QP = 0. However, you will be able to see a shift towards your experimental data.

  • If you generate a model for structures that have nonzero QP values, Kibitzer will build a model that has a very good fit with your experimental data.

  • If you add only one pKa, you will be able to see approximately the same pKa shift for structures of the same series.

  • If the training library and the test library are unrelated, you will not see the effects of the training.

6.3.1. Step 1: Import SD file

Import an SD file containing experimental pKa values, and Kibitzer will automatically assign these pKas to their corresponding ionizable sites.

In the SD file, each pKa should be reported in a data field whose name contains "pka" (case-insensitive). For example:

> <PKA1>
3.490

> <PKA2>
5.320

It is also possible to import multiple pKa values in a single pKa field, as follows:

> <PKA1>
3.490
5.320

Kibitzer reads only the number at the beginning of each line, so lines that do not begin with a numeric character are disregarded. Likewise, any information following the first number is ignored. For example, in

> <PKA1>
<3.0
5.320  cosolvent

only the pKa 5.32 is accepted. Note that no information on which atom each pKa belongs to is needed.
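The reading rules above can be sketched in Python. This is a minimal illustration of the stated rules, not Kibitzer's own parser: it assumes records are separated by `$$$$` lines and that a data field ends at the first blank line. Following the text literally, it skips any value line that does not start with a digit, so a line beginning with a minus sign is skipped as well.

```python
import re

# Data-field header such as "> <PKA1>"; the field name is captured.
FIELD_HEADER = re.compile(r"^>\s*.*?<([^>]+)>")
# A value line is accepted only if it begins with a number; anything after
# that first number (e.g. "cosolvent") is ignored.
LEADING_NUMBER = re.compile(r"^(\d+(?:\.\d*)?)")


def read_pka_fields(sd_text: str) -> list[list[float]]:
    """Return the experimental pKa values found in each SD record."""
    records = []
    for block in sd_text.split("$$$$"):
        if not block.strip():
            continue  # trailing text after the last delimiter
        values = []
        in_pka_field = False
        for line in block.splitlines():
            header = FIELD_HEADER.match(line)
            if header:
                # Any field whose name contains "pka", in any case, is read.
                in_pka_field = "pka" in header.group(1).lower()
                continue
            if not line.strip():
                in_pka_field = False  # a blank line closes the data field
                continue
            if in_pka_field:
                number = LEADING_NUMBER.match(line)
                if number:
                    values.append(float(number.group(1)))
        records.append(values)
    return records
```

Applied to a record containing the last example field, the sketch keeps 5.32 and drops the `<3.0` line.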

6.3.2. Step 2: Check assigned pKas

After you import your SD file, the name of every molecule is listed in the left panel, together with the attributes Accuracy, Class and Info. You can sort the molecules by clicking an attribute. The Accuracy attribute is also shown as a colored arrow.

Be particularly careful with molecules marked by a red arrow. Kibitzer assigns experimental pKa values on the basis of MoKa calculations, and a significant difference (> SD + 1.5 pKa units) between the calculated and experimental pKas of a molecule is highlighted by a red arrow to the left of the molecule's name. If the assignments are reasonably accurate, the molecule is marked with a yellow arrow (better than 1.5 pKa units) or a green arrow (within the standard deviation). Molecules that have no experimental pKa value or no ionizable site do not have any arrow to the left of their names.
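The color rules above can be sketched as a small Python function. It is one reading of the thresholds, not Kibitzer's implementation: it assumes that calculated and experimental pKas are already paired site by site, that the largest deviation in a molecule decides the color, that `sd` is the model's standard deviation in pKa units, and that yellow covers deviations between SD and SD + 1.5 pKa units.

```python
from typing import Optional


def accuracy_arrow(calculated: list[float], experimental: list[float],
                   sd: float) -> Optional[str]:
    """Return "green", "yellow", "red", or None (no arrow) for one molecule."""
    if not calculated or not experimental:
        # No experimental pKa or no ionizable site: no arrow is shown.
        return None
    # Largest absolute deviation over the paired sites (assumed pairing).
    worst = max(abs(c - e) for c, e in zip(calculated, experimental))
    if worst > sd + 1.5:
        return "red"
    if worst <= sd:
        return "green"
    return "yellow"
```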

The Class attribute classifies molecules according to the relative numbers of predicted pKas and experimental pKa values reported in the SD file.

To simplify the assignment, predicted pKas in the extreme range are filtered out. For example, predicted acidic pKas above 12 (very weak acids) and predicted basic pKas below 2 (very weak bases) are removed, because such pKas cannot be measured under normal conditions.
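The filtering step can be illustrated with a short Python sketch. The `PredictedPka` record and its "acid"/"base" labels are hypothetical, and only the two example cut-offs from the text (acids above 12, bases below 2) are applied; Kibitzer's actual filter may use other criteria.

```python
from dataclasses import dataclass


@dataclass
class PredictedPka:
    atom: int     # atom number of the ionizable site
    kind: str     # "acid" or "base" (hypothetical labels)
    value: float  # predicted pKa


# Example cut-offs taken from the text.
ACID_MAX = 12.0
BASE_MIN = 2.0


def filter_measurable(predictions: list[PredictedPka]) -> list[PredictedPka]:
    """Drop predicted pKas that could not be measured under normal conditions."""
    kept = []
    for p in predictions:
        if p.kind == "acid" and p.value > ACID_MAX:
            continue  # very weak acid
        if p.kind == "base" and p.value < BASE_MIN:
            continue  # very weak base
        kept.append(p)
    return kept
```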

If a molecule has more experimental pKa values than ionizable sites, the experimental values regarded as least reliable are removed.

When Kibitzer finds more ionizable sites than experimental pKas, all the non-assigned sites are labeled "N/A" (not available). This does not cause any problem for the computation: these centers are simply ignored.

The attribute Class keeps track of such operations:

  • CLASS A: n. pred. pKa = n. exp. pKa (before filtering); n. pred. pKa = n. exp. pKa (after filtering)

  • CLASS B: n. pred. pKa > n. exp. pKa (before filtering); n. pred. pKa = n. exp. pKa (after filtering)

  • CLASS C: n. pred. pKa < n. exp. pKa (before filtering); n. pred. pKa < n. exp. pKa (after filtering)

  • CLASS D: n. pred. pKa < n. exp. pKa (before filtering); n. pred. pKa = n. exp. pKa (after filtering)
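Read as a decision rule, the four classes depend only on the counts of predicted and experimental pKas before and after filtering. The Python function below is a literal transcription of the class definitions, with hypothetical names; any combination those definitions do not cover returns None.

```python
from typing import Optional


def pka_class(pred_before: int, exp_before: int,
              pred_after: int, exp_after: int) -> Optional[str]:
    """Return the Class label (A-D) from pKa counts before/after filtering."""
    if pred_before == exp_before and pred_after == exp_after:
        return "A"
    if pred_before > exp_before and pred_after == exp_after:
        return "B"
    if pred_before < exp_before and pred_after < exp_after:
        return "C"
    if pred_before < exp_before and pred_after == exp_after:
        return "D"
    return None  # combination not covered by the four classes
```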

Warning: prevent noise in your custom model. Some of the assigned experimental pKa values may be accompanied by a warning. A warning indicates not only that the assigned experimental pKa is very different from the predicted one, but also that the model is already well parametrized for the structure you are submitting. Consequently, this pKa value might not benefit the training.

Figure 6-1. Check warnings in the assignment

Typically, the warning stems from one of the following:

  • The assignment is wrong; you can try to correct it manually

  • MoKa is not parametrized to predict that particular pKa correctly

  • The experimental pKa reported conflicts with the structure given

This problem can usually be solved by manually correcting the assignment. If this procedure does not work but you are confident that the experimental pKa is correct, disregard the warning. Otherwise, deselect the corresponding pKa value before building the custom models.

Add weight to your experimental pKa values. Ionizable centers that have QP = 0 are very well parametrized, so adding your experimental pKa values may bring little benefit to your customized model. To give more weight to your experimental pKa values, you can add replicas of the same structures. The number of replicas needed cannot be predicted, but a minimum of three structures is necessary to obtain a significant effect.
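If you choose to add replicas, one way to prepare them is to duplicate records in the SD file before importing it. The Python sketch below assumes a well-formed SD file with `$$$$` record delimiters, uses the first line of each record (the molecule title) as its name, and assumes Kibitzer treats duplicated records as separate entries; the function names are hypothetical.

```python
def split_records(sd_text: str) -> list[list[str]]:
    """Split SD text into records (lists of lines), dropping the $$$$ lines."""
    records, current = [], []
    for line in sd_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    return records  # any unterminated text after the last $$$$ is dropped


def add_replicas(sd_text: str, names: set[str], copies: int = 3) -> str:
    """Write each record whose title is in `names` `copies` times in total."""
    out = []
    for record in split_records(sd_text):
        title = record[0].strip() if record else ""
        times = copies if title in names else 1
        for _ in range(times):
            out.append("\n".join(record) + "\n$$$$\n")
    return "".join(out)
```

For example, `add_replicas(text, {"compound-17"})` writes the record titled compound-17 three times and every other record once.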

6.3.3. Step 3: Build Models

If you are happy with the current assignments, you can build a new model, which will be stored in a .mkd file. You can also run a full validation by selecting the full validation option in the Build model dialog box.

Validation helps improve the model's predictive ability and becomes increasingly important for large training sets (over 3000 pKa values). However, you can safely skip this step if your training library is small.

MoKa can import the .mkd file (Edit > Load custom model) and calculate pKas with the custom model, which is based on the MoKa internal database biased by the imported custom pKa database.

6.4. Tips and Troubleshooting

Here are a few tips for using Kibitzer.

  • Start by building a temporary custom model that includes one or two compounds per series. Then load this temporary custom model to ease the assignment of your whole pKa database.

  • Remember that pKa values with warnings might add noise to your models.

Some useful how-tos:

  • Merge multiple .kib files: with a .kib file open, click Save as... and select the .kib file you want to merge it with. When prompted to overwrite or append, click Append.

  • Check the benefits of training: export your Kibitzer project to an SD file, which will include all the assignments you have set. Then load the SD file into Versus and select pKa assignments from Kibitzer SD. You can now compare the results of the internal and custom models.

  • Check the effects of training on the pKa prediction models: choose Tools > Select Internal Model. Kibitzer loads the selected custom model and reports its differences from the internal model in terms of standard deviation. Differences of more than 0.05 may indicate inconsistencies in your custom model.

Upgrading from older versions. Models saved in .mkd files are NOT compatible across different versions of the software. To keep your work for future use, save it in a .kib file or an SD file; both can be safely transferred from one version of Kibitzer to another.

6.5. Capabilities and Limitations

Kibitzer allows you to expand the chemical space covered by MoKa, and the custom model gives more accurate pKa predictions, but only for molecules within the newly explored chemical space. To test the capabilities of Kibitzer, benchmark the predictivity of a custom model against a set of molecules from the same series as those added.
