nkululeko.nkululeko

The experiment module (nkululeko.nkululeko) orchestrates the end-to-end lifecycle of an experiment: reading an INI configuration, preparing data splits, extracting features, training models, evaluating, and producing plots and reports. It is the main user interface for running experiments with Nkululeko.

Responsibilities

Parse configuration ([EXP], [DATA], [FEATS], [MODEL], [EXPL], [PLOT]).
Manage run directories and result caching.
Trigger feature extraction pipelines (opensmile, praat, wav2vec2, etc.).
Initialize and train the selected model type (svm, xgb, mlp, cnn, tree, knn, regressor variants).
Compute metrics (accuracy, UAR, regression scores) and generate confusion matrices.
Coordinate optional explainability steps (feature importance, distributions, regplot, PCA/t-SNE/UMAP scatter).

Invocation

Usage:

python -m nkululeko.nkululeko --config config_file.ini

Example:

python -m nkululeko.nkululeko --config examples/exp_emodb_os_svm.ini
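When experiments are launched from a script rather than a shell, the same command line can be assembled in Python. The following is a minimal sketch using only the standard library; the helper names build_command and run_experiment are hypothetical and not part of Nkululeko.

```python
import subprocess
import sys
from pathlib import Path


def build_command(config_path):
    """Return the argument list that runs an experiment with the given INI file."""
    # sys.executable keeps the call inside the currently active Python environment.
    return [sys.executable, "-m", "nkululeko.nkululeko", "--config", str(config_path)]


def run_experiment(config_path):
    """Run the experiment as a child process and return its exit code."""
    config_path = Path(config_path)
    if not config_path.is_file():
        raise FileNotFoundError(f"config file not found: {config_path}")
    completed = subprocess.run(build_command(config_path), check=False)
    return completed.returncode
```

Checking for the config file before spawning the process gives an early, readable error instead of a failure inside the child process.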

Key Concepts

Concept	Description
Runs	Repeats with different seeds for robustness.
Store Format	Choice of cached feature file format (csv, feather, pickle).
Scaling	Feature normalization (standard, minmax, none).
Augmentation	Optional audio transforms before extraction.
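To make the Scaling row concrete, the sketch below shows the usual textbook definitions of standard and min-max scaling on a single feature column. It is an illustration only, written with the standard library; it is not taken from Nkululeko's code, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = fmean(values)
    std = pstdev(values)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def minmax_scale(values):
    """Rescale linearly so the smallest value is 0 and the largest is 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


column = [1.0, 2.0, 3.0]
standardized = standard_scale(column)
rescaled = minmax_scale(column)
```

In practice the scaling statistics are normally estimated on the training split only and then applied to the test split, so that no information from the test data leaks into training.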

Common INI Snippet

[EXP]
name = results/exp_demo
runs = 1

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb.csv
target = emotion

[FEATS]
type = ['praat']
scale = standard

[MODEL]
type = xgb

[EXPL]
conf_mat = True
feature_distributions = top
regplot = [['duration','meanF0Hz']]
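Several values in the snippet above (databases, type, regplot) are written as Python-style list literals. A quick way to inspect such a file before running an experiment is sketched below with configparser and ast.literal_eval from the standard library. This is an assumption about how one might read the file by hand, not a description of Nkululeko's internal parser; parse_value is a hypothetical helper.

```python
import ast
import configparser

SNIPPET = """
[EXP]
name = results/exp_demo
runs = 1

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb.csv
target = emotion

[FEATS]
type = ['praat']
scale = standard

[MODEL]
type = xgb

[EXPL]
conf_mat = True
feature_distributions = top
regplot = [['duration','meanF0Hz']]
"""


def parse_value(raw):
    """Interpret Python literals (lists, numbers, booleans); keep other text as a string."""
    try:
        return ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        return raw


config = configparser.ConfigParser()
config.read_string(SNIPPET)

# Build a plain nested dict: section name -> {option: parsed value}.
settings = {
    section: {key: parse_value(value) for key, value in config[section].items()}
    for section in config.sections()
}

for section, options in settings.items():
    print(section, options)
```

Plain words such as standard or xgb are not valid literals, so they fall through to the except branch and stay strings.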

Outputs

results/<exp>/images/ – plots (confusion matrix, distributions, regplots).
results/<exp>/results/ – metrics summaries.
Feature cache under experiment root.

Internals

Important classes/functions (high-level):

Experiment class – central coordinator.
Hooks for plotting via plots module.
Label encoding/decoding abstraction to support consistent plotting.

Tips

Start with a single feature set and model to validate the pipeline.
Enable caching to save time on subsequent runs.
Use speaker-based splits (speaker_split) to prevent speaker leakage between training and test data.
Limit max_feats when exploring importance to keep plots readable.
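The speaker-leakage tip can be illustrated with a small sketch: every recording of a given speaker is assigned to the same side of the split, so the model is never tested on a voice it has seen in training. This is a generic illustration in plain Python, assuming samples are (file, speaker) pairs; it is not Nkululeko's speaker_split implementation, and the function name is hypothetical.

```python
import random


def speaker_disjoint_split(samples, test_fraction=0.2, seed=0):
    """Split (file, speaker) pairs so that no speaker appears in both parts."""
    speakers = sorted({speaker for _, speaker in samples})
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_fraction))
    test_speakers = set(speakers[:n_test])
    train = [s for s in samples if s[1] not in test_speakers]
    test = [s for s in samples if s[1] in test_speakers]
    return train, test


# Hypothetical data: five speakers with two recordings each.
samples = [(f"spk{i}_utt{j}.wav", f"spk{i}") for i in range(5) for j in range(2)]
train, test = speaker_disjoint_split(samples)
```

Because whole speakers are moved, the actual proportion of test samples can differ from test_fraction when speakers contribute unequal numbers of recordings.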

Testing a New Database with an Existing Model

When DATA.tests is set in the config and a saved experiment .pkl already exists, nkululeko.nkululeko skips training automatically and evaluates the stored best model on the new test database instead. This produces a confusion matrix, a per-class text report, and a predictions CSV with all original test columns plus a predicted column.
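The condition described above can be sketched as a small predicate: evaluation-only mode applies when a tests entry is configured and the saved experiment file is present. The function name, the dictionary layout, and the file name below are hypothetical and serve only to illustrate the rule; they do not mirror Nkululeko's source.

```python
import tempfile
from pathlib import Path


def should_skip_training(data_section, experiment_pkl):
    """Return True when a test database is configured and a saved experiment exists."""
    has_tests = bool(data_section.get("tests"))
    return has_tests and Path(experiment_pkl).is_file()


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp) / "exp_demo.pkl"  # hypothetical file name
    data = {"tests": "['new_db']"}      # hypothetical [DATA] entry

    before = should_skip_training(data, saved)  # no saved experiment yet
    saved.touch()
    after = should_skip_training(data, saved)   # tests set and file present
    no_tests = should_skip_training({}, saved)  # file present but no tests entry
```

Both conditions must hold: without a saved experiment there is no model to evaluate, and without a tests entry there is no new database to evaluate it on.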

See test_new_database.md for a step-by-step guide.