Using Uncertainty in Predictions

This tutorial explains how to visualize prediction uncertainty in Nkululeko and how to use an uncertainty threshold to reject low-confidence predictions. Uncertainty helps you understand when your model is confident about its predictions and when it is unsure.

Overview

Since version 0.94, Nkululeko explicitly visualizes aleatoric uncertainty (high uncertainty means low model confidence). After running an experiment, you'll find an uncertainty distribution plot in the images folder. It shows how uncertainty relates to whether predictions are correct.

What is Uncertainty?

In classification, uncertainty measures how unsure the model is about its prediction. It is typically computed as the entropy of the predicted class-probability distribution:

Low uncertainty: Model is confident (probabilities concentrated on one class)
High uncertainty: Model is unsure (probabilities spread across multiple classes)
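The sketch below computes the entropy H(p) = -Σ_k p_k log p_k of a predicted probability vector p. It is an illustration of the idea, not Nkululeko's own code.

The function name `entropy` and the optional `normalize` flag are illustrative assumptions. The flag divides by log K, so values for K classes fall in [0, 1]. This tutorial does not state which logarithm base or normalization Nkululeko uses, so check that before comparing your own values against a threshold.

```python
import math
from typing import Sequence


def entropy(probs: Sequence[float], normalize: bool = False) -> float:
    """Shannon entropy (natural log) of a predicted class-probability vector.

    Illustrative sketch; Nkululeko's exact formula and scaling may differ.
    With normalize=True the value is divided by log(K), mapping it to [0, 1].
    """
    if not probs:
        raise ValueError("probability vector must not be empty")
    total = sum(probs)
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError(f"probabilities must sum to 1, got {total}")
    # Zero-probability classes contribute nothing, since p * log(p) -> 0 as p -> 0.
    h = -sum(p * math.log(p) for p in probs if p > 0)
    if normalize:
        k = len(probs)
        return h / math.log(k) if k > 1 else 0.0
    return h


confident = [0.97, 0.01, 0.01, 0.01]  # mass concentrated on one class
unsure = [0.25, 0.25, 0.25, 0.25]     # mass spread evenly over four classes

print(round(entropy(confident), 3))               # low, about 0.168
print(round(entropy(unsure), 3))                  # ln(4), about 1.386
print(round(entropy(unsure, normalize=True), 3))  # maximum after normalization: 1.0
```

The natural-log entropy of a uniform distribution grows with the number of classes. A fixed threshold therefore means something different for a 4-class and a 7-class task unless the values are normalized.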

Automatic Uncertainty Visualization

When you run any classification experiment, Nkululeko automatically generates an uncertainty distribution plot:

results/<exp_name>/images/uncertainty_distribution.png

This plot shows:

Distribution of uncertainty values for correct predictions (usually lower uncertainty)
Distribution of uncertainty values for incorrect predictions (usually higher uncertainty)

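If the uncertainty estimates are informative, the two distributions are clearly separated. Strong overlap means uncertainty says little about whether a given prediction is correct.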
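You can also check the separation numerically rather than by eye:

1. Group per-sample uncertainty by correctness.
2. Compute the probability that a randomly chosen correct prediction has lower uncertainty than a randomly chosen incorrect one.

This probability is the ROC AUC of uncertainty used as an error detector, with ties counting one half. The sketch assumes per-sample uncertainties, true labels and predicted labels are available as lists. The function names and toy data are hypothetical, and this is not how Nkululeko draws its plot.

```python
from statistics import mean
from typing import Dict, List, Sequence


def split_by_correctness(
    uncertainties: Sequence[float],
    y_true: Sequence[str],
    y_pred: Sequence[str],
) -> Dict[str, List[float]]:
    """Group per-sample uncertainty values by whether the prediction was correct."""
    if not (len(uncertainties) == len(y_true) == len(y_pred)):
        raise ValueError("all inputs must have the same length")
    groups: Dict[str, List[float]] = {"correct": [], "incorrect": []}
    for u, t, p in zip(uncertainties, y_true, y_pred):
        groups["correct" if t == p else "incorrect"].append(u)
    return groups


def separation_auc(correct: Sequence[float], incorrect: Sequence[float]) -> float:
    """P(uncertainty of a correct sample < uncertainty of an incorrect sample).

    1.0 means perfect separation, 0.5 means uncertainty carries no information.
    """
    if not correct or not incorrect:
        raise ValueError("need at least one correct and one incorrect prediction")
    wins = 0.0
    for c in correct:
        for i in incorrect:
            if c < i:
                wins += 1.0
            elif c == i:
                wins += 0.5  # ties count half, as in the usual ROC AUC definition
    return wins / (len(correct) * len(incorrect))


# Toy values, invented for illustration only.
unc = [0.05, 0.10, 0.12, 0.55, 0.70, 0.08, 0.62]
truth = ["anger", "sadness", "neutral", "happiness", "neutral", "anger", "sadness"]
pred = ["anger", "sadness", "neutral", "anger", "sadness", "anger", "neutral"]

groups = split_by_correctness(unc, truth, pred)
for name, values in groups.items():
    print(f"{name}: n={len(values)}, mean uncertainty={mean(values):.2f}")
print("separation AUC:", separation_auc(groups["correct"], groups["incorrect"]))
```

The pairwise loop is quadratic, which is fine for a few hundred test samples. For larger sets, scikit-learn's `roc_auc_score` gives the same number: pass 1 for incorrect predictions as the labels and the uncertainties as the scores.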

Using Uncertainty Threshold

You can use uncertainty to filter out low-confidence predictions. This is useful when:

It’s better to give no prediction than a wrong one
You’re working with critical applications (medical, safety)
You want to identify samples that need human review

Configuration

Add the uncertainty_threshold option to the [PLOT] section of the configuration file:

[PLOT]
uncertainty_threshold = 0.4

This will:

Generate the standard confusion matrix
Generate an additional confusion matrix that excludes samples whose uncertainty is above the threshold
Show how accuracy changes (typically improves) when uncertain samples are filtered out
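The filtering step can be sketched in plain Python. Consistent with the description above, the sketch assumes a sample is kept when its uncertainty is less than or equal to the threshold. The helper names and toy labels are hypothetical, and Nkululeko's internal implementation may differ in detail. The confusion matrix is a `Counter` of (true, predicted) pairs, so the diagonal entries are the correct predictions.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def filter_by_uncertainty(
    y_true: Sequence[str],
    y_pred: Sequence[str],
    uncertainties: Sequence[float],
    threshold: float,
) -> Tuple[List[str], List[str]]:
    """Keep only samples whose uncertainty does not exceed the threshold."""
    if not (len(y_true) == len(y_pred) == len(uncertainties)):
        raise ValueError("all inputs must have the same length")
    kept = [i for i, u in enumerate(uncertainties) if u <= threshold]
    return [y_true[i] for i in kept], [y_pred[i] for i in kept]


def confusion_counts(y_true: Sequence[str], y_pred: Sequence[str]) -> Counter:
    """Confusion matrix as counts of (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of correct predictions; NaN if no samples remain."""
    if not y_true:
        return float("nan")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy predictions, invented for illustration only.
truth = ["anger", "anger", "neutral", "sadness", "neutral", "sadness"]
pred = ["anger", "neutral", "neutral", "sadness", "sadness", "sadness"]
unc = [0.10, 0.55, 0.20, 0.15, 0.65, 0.30]

kept_true, kept_pred = filter_by_uncertainty(truth, pred, unc, threshold=0.4)
coverage = len(kept_true) / len(truth)
print(f"all samples: accuracy={accuracy(truth, pred):.2f}, n={len(truth)}")
print(f"filtered:    accuracy={accuracy(kept_true, kept_pred):.2f}, "
      f"n={len(kept_true)}, coverage={coverage:.2f}")
print(confusion_counts(kept_true, kept_pred))
```

In this toy data, both errors carry high uncertainty, so the filtered matrix contains only correct predictions. The cost is that a third of the samples go unanswered.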

Example Configuration: `exp_emodb_uncertainty.ini`

[EXP]
root = ./examples/results/
name = exp_emodb_uncertainty
runs = 1
epochs = 1
save = True

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
labels = ['anger', 'happiness', 'neutral', 'sadness']
target = emotion

[FEATS]
type = ['os']
scale = standard

[MODEL]
type = xgb

[PLOT]
# Uncertainty threshold: reject predictions whose uncertainty (entropy) is above this value
# Lower values = stricter filtering (more samples rejected)
# Higher values = more permissive (fewer samples rejected)
uncertainty_threshold = 0.4

Run the Experiment

python -m nkululeko.nkululeko --config examples/exp_emodb_uncertainty.ini

Choosing the Right Threshold

Threshold	Effect
`0.2`	Very strict - only very confident predictions
`0.4`	Moderate - balanced filtering
`0.6`	Permissive - most predictions included
`0.8`	Very permissive - almost all predictions

The optimal threshold depends on your use case:

High-stakes applications: Use a lower threshold (e.g., 0.2-0.3)
General analysis: Use a moderate threshold (e.g., 0.4-0.5)
Maximum coverage: Use a higher threshold or no filtering
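Instead of picking a value from the table, you can derive the threshold from a target coverage: the share of samples you want the model to answer automatically. The sketch below does this with a simple empirical quantile over validation uncertainties.

`threshold_for_coverage` is a hypothetical helper, not a Nkululeko option, and the validation values are invented. Choose the threshold on validation data and report accuracy on a separate test set.

```python
import math
from typing import Sequence


def threshold_for_coverage(uncertainties: Sequence[float], coverage: float) -> float:
    """Smallest observed uncertainty t such that at least `coverage` of the
    samples satisfy u <= t (an empirical quantile)."""
    if not uncertainties:
        raise ValueError("need at least one uncertainty value")
    if not 0.0 < coverage <= 1.0:
        raise ValueError("coverage must be in (0, 1]")
    ordered = sorted(uncertainties)
    # Number of samples that must be kept; the small epsilon guards against
    # floating-point products landing just above an integer.
    k = max(1, math.ceil(coverage * len(ordered) - 1e-9))
    return ordered[k - 1]


# Hypothetical uncertainties from a validation set.
val_unc = [0.05, 0.12, 0.18, 0.22, 0.31, 0.35, 0.41, 0.52, 0.63, 0.77]

t = threshold_for_coverage(val_unc, coverage=0.7)
kept = sum(u <= t for u in val_unc) / len(val_unc)
print(f"threshold={t}, coverage on validation={kept:.0%}")
```

If several samples share the threshold value, the achieved coverage can be slightly higher than the target.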

Output Files

After running with uncertainty_threshold, you’ll find:

results/exp_emodb_uncertainty/
├── images/
│   ├── confusion_matrix.png          # Standard confusion matrix
│   ├── confusion_matrix_filtered.png # Filtered by uncertainty
│   └── uncertainty_distribution.png  # Uncertainty histogram
└── results/
    └── results.csv                   # Includes uncertainty metrics

Interpreting Results

Uncertainty Distribution Plot

A good model shows:

Correct predictions clustered at low uncertainty (left side)
Incorrect predictions spread toward high uncertainty (right side)
Clear separation between the two distributions

Filtered Confusion Matrix

Compare the filtered and unfiltered confusion matrices:

Improved accuracy: Filtering removes uncertain (often wrong) predictions
Reduced samples: Fewer samples in the filtered matrix
Trade-off: Better accuracy vs. fewer predictions

Use Cases

Quality control: Identify samples the model struggles with
Active learning: Select uncertain samples for human labeling
Cascaded systems: Route uncertain samples to more powerful models
Safety-critical applications: Refuse to predict when unsure
Model debugging: Understand where the model lacks confidence
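Two of these use cases, active learning and cascaded systems, take only a few lines once per-sample uncertainties are available. The sketch is illustrative only: `most_uncertain`, `cascade_predict` and the two toy models are hypothetical stand-ins, not Nkululeko APIs. The fast model is assumed to return a (label, uncertainty) pair.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # any sample type, e.g. a feature vector or file path


def most_uncertain(uncertainties: Sequence[float], k: int) -> List[int]:
    """Indices of the k most uncertain samples, e.g. to send for human labeling."""
    order = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i], reverse=True)
    return order[:k]


def cascade_predict(
    samples: Sequence[S],
    fast_model: Callable[[S], Tuple[str, float]],
    strong_model: Callable[[S], str],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Answer with the fast model when it is confident enough, else defer.

    Returns (label, source) pairs so you can see which model answered.
    """
    results: List[Tuple[str, str]] = []
    for sample in samples:
        label, uncertainty = fast_model(sample)
        if uncertainty <= threshold:
            results.append((label, "fast"))
        else:
            results.append((strong_model(sample), "strong"))
    return results


# Hypothetical toy models on scalar inputs.
def toy_fast(x: float) -> Tuple[str, float]:
    # Confident only when the input is far from the decision boundary at 0.
    return ("high" if x > 0 else "low", 0.1 if abs(x) > 1 else 0.7)


def toy_strong(x: float) -> str:
    return "high" if x > 0 else "low"


print(most_uncertain([0.1, 0.8, 0.3, 0.6], k=2))
print(cascade_predict([2.0, 0.3, -1.5, -0.2], toy_fast, toy_strong, threshold=0.4))
```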

Practical Example

Consider a hypothetical medical diagnosis system:

Without threshold: 85% accuracy on all samples
With uncertainty_threshold = 0.3: 95% accuracy on 70% of samples

In this case, 30% of samples are flagged for human review, but the automatic predictions are much more reliable.
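Assume both accuracy figures in this hypothetical example refer to the same set of samples. Overall accuracy is then a coverage-weighted mix of the accuracy on accepted and on rejected samples. With coverage c = 0.70, this implies how poorly the model does on the samples it flags:

```latex
\begin{aligned}
\text{acc}_{\text{all}} &= c \cdot \text{acc}_{\text{kept}} + (1 - c) \cdot \text{acc}_{\text{rejected}} \\
0.85 &= 0.70 \cdot 0.95 + 0.30 \cdot \text{acc}_{\text{rejected}} \\
\text{acc}_{\text{rejected}} &= \frac{0.85 - 0.665}{0.30} \approx 0.62
\end{aligned}
```

The flagged samples are exactly the ones the model gets wrong most often: about 62% correct, versus 95% for the accepted ones.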

The following INI files in the Nkululeko examples directory also set an uncertainty threshold:

exp_emodb_textclassifier.ini (threshold: 0.4)
exp_emodb_stress.ini (threshold: 0.3)
exp_emodb_audmodel_xgb.ini (threshold: 0.5)

Uncertainty works well with:

Multiple runs (runs = 5): Get uncertainty estimates across runs
Ensemble models: Combine predictions from multiple models
Feature importance: Explore which features are associated with uncertain predictions
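Repeated runs or ensemble members give you several probability vectors for the same sample. A common approach, not necessarily the one Nkululeko uses, is to average them and take the entropy of the average as total uncertainty. The mean of the individual entropies then approximates the aleatoric part. The difference between the two, which is the mutual information, measures how much the runs disagree. The function names below are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def average_probs(prob_sets: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several probability vectors for the same sample."""
    if not prob_sets:
        raise ValueError("need at least one probability vector")
    k = len(prob_sets[0])
    if any(len(p) != k for p in prob_sets):
        raise ValueError("all probability vectors must have the same length")
    return [sum(p[j] for p in prob_sets) / len(prob_sets) for j in range(k)]


def decompose(prob_sets: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """Return (total, expected, disagreement) uncertainty for one sample.

    total        = entropy of the averaged prediction
    expected     = mean entropy of the individual predictions (aleatoric part)
    disagreement = total - expected, which is >= 0 (mutual information)
    """
    total = entropy(average_probs(prob_sets))
    expected = sum(entropy(p) for p in prob_sets) / len(prob_sets)
    return total, expected, total - expected


# Two runs that are each confident but disagree on the class.
runs = [[0.9, 0.1], [0.1, 0.9]]
total, expected, disagreement = decompose(runs)
print(f"total={total:.3f}, expected={expected:.3f}, disagreement={disagreement:.3f}")
```

Each run alone looks fairly confident, but their average is a coin flip. Only the disagreement term reveals this.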

Tips

Start with a moderate threshold (0.4) and adjust based on results
Monitor coverage: Check what percentage of samples pass the threshold
Analyze uncertain samples: They often reveal data quality issues
Use with test data: Evaluate on held-out data for realistic estimates

Reference

Blog: Nkululeko - Using Uncertainty