Using Split Train and Test Data

This tutorial shows how to configure separate train and test sets using the split_strategy option, and how to confirm results using the predict module.

Reference: How to use train, dev and test splits with Nkululeko

Overview

In machine learning, the typical workflow is:

Train your model on a training set
Tune hyperparameters on a development (validation) set
Evaluate on a held-out test set

Nkululeko can directly handle train/test splits in a single experiment using the split_strategy option.

Using Split Strategy

The simplest way to define train and test sets is to use separate databases, each with its own split_strategy:

[EXP]
root = ./examples/results/
name = exp_polish_splits
save = True

[DATA]
databases = ['train', 'dev', 'test']
train = ./data/polish/polish_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
dev = ./data/polish/polish_dev.csv
dev.type = csv
dev.absolute_path = False
dev.split_strategy = train
test = ./data/polish/polish_test.csv
test.type = csv
test.absolute_path = False
test.split_strategy = test
target = emotion

[FEATS]
type = ['os']
scale = standard

[MODEL]
type = xgb
save = True

Key Configuration

Option	Description
`<db>.split_strategy = train`	Use this database for training
`<db>.split_strategy = test`	Use this database for testing only
`<db>.split_strategy = dev`	Use this database for development/validation
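Before launching a run, it can help to check which role each database has been given. The following is a minimal sketch that uses only Python's standard configparser module, not any Nkululeko API. The function name split_roles and the "unset" placeholder are assumptions made for illustration, and the embedded config is a shortened copy of the [DATA] section shown earlier.

```python
import ast
import configparser


def split_roles(config_text: str) -> dict[str, str]:
    """Map each database listed in [DATA] to its split_strategy value."""
    # interpolation=None keeps any '%' characters in paths literal
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    data = parser["DATA"]
    # 'databases' is written as a Python-style list literal in the config
    names = ast.literal_eval(data["databases"])
    # "unset" is a placeholder chosen here, not a Nkululeko value
    return {
        name: data.get(f"{name}.split_strategy", fallback="unset")
        for name in names
    }


EXAMPLE = """
[DATA]
databases = ['train', 'dev', 'test']
train = ./data/polish/polish_train.csv
train.split_strategy = train
dev = ./data/polish/polish_dev.csv
dev.split_strategy = train
test = ./data/polish/polish_test.csv
test.split_strategy = test
target = emotion
"""

roles = split_roles(EXAMPLE)
for name, role in roles.items():
    print(f"{name}: {role}")
if "test" not in roles.values():
    print("warning: no database is marked as test")
```

In this example both the train and dev databases carry split_strategy = train, so the check reports two training databases and one test database.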

Running the Experiment

With split strategies defined, run a single command:

python -m nkululeko.nkululeko --config myconf.ini

This trains on every database marked with split_strategy = train (in the example above, both the train and dev CSV files) and evaluates on the test set in a single run.

Confirming Results with the Predict Module

After running your experiment, you can use the unified predict module to re-evaluate your saved model on a labeled test set. This is useful to:

Confirm the results from your experiment
Evaluate the model on additional test sets
Generate detailed test reports

Using the predict module in model mode

First, ensure your model is saved by setting save = True in both [EXP] and [MODEL] sections. Then run:

python -m nkululeko.predict \
    --config myconf.ini \
    --type model \
    --list ./data/polish/polish_test.csv \
    --outfile polish_test_predict.csv

The predict module will:

Load the saved best model from the experiment
Run it on every audio file listed in the CSV
Write the predictions, alongside the original columns, to the file given by --outfile
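Once the output file exists, the confirmation step amounts to comparing the predicted labels with the true ones. The sketch below computes accuracy and unweighted average recall (UAR) with the standard library only. The column names emotion and predicted are assumptions: the first follows target = emotion in the config, the second is hypothetical, so check the header of your own output file. The sample rows are invented for illustration and are not real results.

```python
import csv
import io
from collections import defaultdict


def recall_summary(rows, truth_col="emotion", pred_col="predicted"):
    """Return (accuracy, unweighted average recall) for labeled predictions."""
    if not rows:
        raise ValueError("no rows to evaluate")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for row in rows:
        total[row[truth_col]] += 1
        if row[pred_col] == row[truth_col]:
            correct[row[truth_col]] += 1
    accuracy = sum(correct.values()) / sum(total.values())
    # UAR: mean of the per-class recalls, so each class counts equally
    uar = sum(correct[c] / total[c] for c in total) / len(total)
    return accuracy, uar


# Invented sample standing in for the --outfile CSV (column names assumed)
SAMPLE = """file,emotion,predicted
a.wav,anger,anger
b.wav,neutral,neutral
c.wav,neutral,neutral
d.wav,neutral,anger
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
accuracy, uar = recall_summary(rows)
print(f"accuracy={accuracy:.3f} UAR={uar:.3f}")
```

To use it on a real file, replace the io.StringIO(SAMPLE) argument with an opened file handle for polish_test_predict.csv and adjust the two column names as needed.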

Defining Test Databases

You can also specify test databases using the tests option:

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'happiness', 'neutral', 'sadness']
; Define additional test databases
tests = ['crema-d']
crema-d = ./data/crema-d/crema-d
crema-d.split_strategy = test

Automatic fast path: when DATA.tests is set and a saved experiment already exists on disk, nkululeko.nkululeko skips training on subsequent runs and evaluates the stored best model on the test databases directly. The same config file therefore works for both the initial training run and all later test evaluations without any changes. See test_new_database.md for the full workflow.

Cross-Database Evaluation

A common use case is training on one database and testing on another:

[DATA]
databases = ['emodb', 'crema-d']
emodb = ./data/emodb/emodb
emodb.split_strategy = train
target = emotion
labels = ['anger', 'happiness']
; Test on a different database
crema-d = ./data/crema-d/crema-d
crema-d.split_strategy = test

This evaluates how well your model generalizes to unseen data from a different source.

Example Files

exp_emodb_os_xgb_test.ini: Cross-database evaluation example
exp_polish_flags.ini: Flags module for systematic comparison

Tips

Save your model: Set save = True in both [EXP] and [MODEL] sections so that the predict module can load it
Use split_strategy consistently: Set train, dev, or test for each database
Matching labels: Ensure the test database uses the same labels as the training data
Final evaluation: Evaluate on the test set only once, after all hyperparameter tuning is complete
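The matching-labels tip can be checked mechanically for CSV databases. The following is a small sketch, assuming the label column is named emotion (as in target = emotion) and that the files are plain comma-separated CSVs. The helper names and the in-memory sample data are hypothetical.

```python
import csv
import io


def label_set(csv_file, target="emotion"):
    """Collect the distinct values of the target column."""
    return {row[target] for row in csv.DictReader(csv_file)}


def unseen_test_labels(train_file, test_file, target="emotion"):
    """Return labels that occur in the test data but not in the training data."""
    return label_set(test_file, target) - label_set(train_file, target)


# Invented in-memory stand-ins for the train and test CSV files
TRAIN = "file,emotion\na.wav,anger\nb.wav,happiness\n"
TEST = "file,emotion\nc.wav,anger\nd.wav,sadness\n"

extra = unseen_test_labels(io.StringIO(TRAIN), io.StringIO(TEST))
print(sorted(extra))
```

A non-empty result means the test set contains classes the model never saw during training, which is worth resolving (for example with the labels option) before interpreting any scores.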

Comparing Multiple Configurations with Flags Module

To systematically compare different models, features, and preprocessing options, use the flags module:

[EXP]
root = ./examples/results/
name = exp_polish_flags

[DATA]
databases = ['train', 'dev', 'test']
train = ./data/polish/polish_train.csv
train.type = csv
train.split_strategy = train
dev = ./data/polish/polish_dev.csv
dev.type = csv
dev.split_strategy = train
test = ./data/polish/polish_test.csv
test.type = csv
test.split_strategy = test
target = emotion

[FEATS]
; Leave empty - will be set by FLAGS

[MODEL]
; Leave empty - will be set by FLAGS

[FLAGS]
models = ['xgb', 'svm']
features = ['praat', 'os']
balancing = ['none', 'ros', 'smote']
scale = ['none', 'standard', 'robust', 'minmax']

Running the Flags Module

python -m nkululeko.flags --config exp_polish_flags.ini

This will automatically run all combinations:

2 models × 2 feature sets × 3 balancing methods × 4 scalers = 48 experiments
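The count above is simply the size of the Cartesian product of the four lists. The sketch below reproduces it with itertools.product. It illustrates the arithmetic only and makes no claim about how the flags module enumerates or orders its runs internally.

```python
from itertools import product

# Values copied from the [FLAGS] section of the example config
flags = {
    "models": ["xgb", "svm"],
    "features": ["praat", "os"],
    "balancing": ["none", "ros", "smote"],
    "scale": ["none", "standard", "robust", "minmax"],
}

# One dict per combination, e.g. {'models': 'xgb', 'features': 'praat', ...}
combinations = [dict(zip(flags, values)) for values in product(*flags.values())]

print(len(combinations))
print(combinations[0])
```

Because the count grows multiplicatively, adding one more value to any list adds a whole slice of experiments: a third model here would raise the total from 48 to 72.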

Flags Options

Flag	Values	Description
`models`	`['svm', 'xgb', 'mlp', 'knn', ...]`	Model types to compare
`features`	`['os', 'praat', 'wav2vec2', ...]`	Feature extractors
`balancing`	`['none', 'ros', 'smote', ...]`	Class balancing methods
`scale`	`['none', 'standard', 'robust', 'minmax']`	Feature scaling