nkululeko.nkululeko
The experiment module (nkululeko.nkululeko) orchestrates the end-to-end lifecycle of an experiment: reading an INI configuration, preparing data splits, extracting features, training models, evaluating, and producing plots / reports. This is the
the main user interface to run experiments with Nkululeko.
Responsibilities
Parse configuration (
[EXP],[DATA],[FEATS],[MODEL],[EXPL],[PLOT]).Manage run directories and result caching.
Trigger feature extraction pipelines (opensmile, praat, wav2vec2, etc.).
Initialize and train selected model type (svm, xgb, mlp, cnn, tree, knn, regressor variants).
Compute metrics (accuracy, UAR, regression scores) and generate confusion matrices.
Coordinate optional explainability steps (feature importance, distributions, regplot, PCA/t-SNE/UMAP scatter).
Invocation
Usage:
python -m nkululeko.nkululeko --config config_file.ini
Example:
python -m nkululeko.nkululeko --config examples/exp_emodb_os_svm.ini
Key Concepts
Concept |
Description |
|---|---|
Runs |
Repeats with different seeds for robustness. |
Store Format |
Choice of cached feature file format (csv, feather, pickle). |
Scaling |
Feature normalization (standard, minmax, none). |
Augmentation |
Optional audio transforms before extraction. |
Common INI Snippet
[EXP]
name = results/exp_demo
runs = 1
[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb.csv
target = emotion
[FEATS]
type = ['praat']
scale = standard
[MODEL]
type = xgb
[EXPL]
conf_mat = True
feature_distributions = top
regplot = [['duration','meanF0Hz']]
Outputs
results/<exp>/images/– plots (confusion matrix, distributions, regplots).results/<exp>/results/– metrics summaries.Feature cache under experiment root.
Internals
Important classes/functions (high-level):
Experiment class – central coordinator.
Hooks for plotting via
plotsmodule.Label encoding/decoding abstraction to support consistent plotting.
Tips
Start with a single feature set and model to validate pipeline.
Enable caching to save time on subsequent runs.
Use balanced splits (
speaker_split) for speaker leakage prevention.Limit
max_featswhen exploring importance to keep plots readable.
Testing a New Database with an Existing Model
When DATA.tests is set in the config and a saved experiment .pkl
already exists, nkululeko.nkululeko skips training automatically and
evaluates the stored best model on the new test database instead. This
produces a confusion matrix, a per-class text report, and a predictions CSV
with all original test columns plus a predicted column.
See test_new_database.md for a step-by-step guide.