Train/Dev/Test Splits

This tutorial explains how to use three-way data splits (train, dev, test) in Nkululeko to obtain unbiased performance estimates and to avoid overfitting.

Reference: Nkululeko: How to use train/dev/test splits

Why Three Splits?

Supervised machine learning works as follows:

  1. Training phase: A learning algorithm adapts to a training dataset, producing a trained model

  2. Inference phase: The model makes predictions on a test set

The Overfitting Problem

Complex models may memorize training data rather than learning generalizable patterns. This means:

  • ✅ Great performance on training data

  • ❌ Poor performance on new data

This phenomenon is called overfitting.
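The toy example below makes the symptom concrete. It is a minimal sketch, not Nkululeko code, and it uses a 1-nearest-neighbour classifier, which effectively memorizes its training data. Because the labels are random, there is no pattern to learn. Training accuracy is perfect, while accuracy on new data stays near chance.

```python
import random

def one_nn_predict(train_x, train_y, x):
    """Return the label of the closest training point (1-nearest neighbour)."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[best]

def accuracy(train_x, train_y, xs, ys):
    """Fraction of points in (xs, ys) that the memorizing model labels correctly."""
    hits = sum(one_nn_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

rng = random.Random(0)
# Random features and random binary labels: there is nothing generalizable to learn.
train_x = [rng.random() for _ in range(200)]
train_y = [rng.randint(0, 1) for _ in range(200)]
new_x = [rng.random() for _ in range(200)]
new_y = [rng.randint(0, 1) for _ in range(200)]

train_acc = accuracy(train_x, train_y, train_x, train_y)
new_acc = accuracy(train_x, train_y, new_x, new_y)
print(f"train accuracy: {train_acc:.2f}")     # 1.00: each training point is its own nearest neighbour
print(f"new-data accuracy: {new_acc:.2f}")    # close to 0.5, i.e. chance level
```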

The Solution: Development Set

To prevent overfitting:

  • Hyperparameters are optimized using a held-out evaluation set (not used during training)

  • Training stops when performance on the evaluation set declines (early stopping)

  • The best-performing model on evaluation data is selected

However, this introduces a new problem: the model may now be overfitted to the evaluation data!
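The following toy simulation (an illustration only, not part of Nkululeko) shows why. It creates 50 "models" that guess at random, keeps the one with the best dev accuracy, and then scores that same model on a separate test set. Since the choice was made by looking at dev scores, the winner's dev accuracy is optimistically high, while its test accuracy falls back toward chance.

```python
import random

rng = random.Random(42)
n_dev, n_test, n_models = 100, 100, 50
dev_labels = [rng.randint(0, 1) for _ in range(n_dev)]
test_labels = [rng.randint(0, 1) for _ in range(n_test)]

def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Each "model" is a random guesser, so its true accuracy is 0.5 on any data.
# A model is represented by its (dev accuracy, test accuracy) pair.
models = []
for _ in range(n_models):
    dev_preds = [rng.randint(0, 1) for _ in range(n_dev)]
    test_preds = [rng.randint(0, 1) for _ in range(n_test)]
    models.append((accuracy(dev_preds, dev_labels), accuracy(test_preds, test_labels)))

# Model selection: keep the model with the best dev score.
best_dev, best_test = max(models, key=lambda m: m[0])
print(f"selected model, dev accuracy:  {best_dev:.2f}")   # inflated by the selection
print(f"selected model, test accuracy: {best_test:.2f}")  # back near 0.5
```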

The Final Solution: Test Set

A third dataset is needed for final testing—one that has not been used at any stage of model development.

The three splits are:

  • Train: Used for model training

  • Dev (Development): Used for hyperparameter tuning and early stopping

  • Test: Used only for final evaluation

Enabling Three-Way Splits

Enable train/dev/test splitting with a single option:

[EXP]
traindevtest = True

Example: EmoDB with MLP

Here’s a complete example using the emoDB dataset (which has no predefined splits):

[EXP]
root = ./examples/results/
name = exp_emodb_traindevtest
traindevtest = True
epochs = 100

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
labels = ['anger', 'happiness', 'neutral', 'sadness']
target = emotion

[FEATS]
type = ['os']
scale = standard

[MODEL]
type = mlp
layers = {'l1': 100, 'l2': 16}
patience = 10

[PLOT]
best_model = True
epoch_progression = True
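
Nkululeko configurations are INI files, and list- or dict-valued options such as databases, labels and layers are written in Python literal syntax. As a rough sketch of how such a file can be inspected outside Nkululeko (this uses the standard library's configparser and ast.literal_eval and is not Nkululeko's own loader), the snippet below parses a shortened copy of the example and reads back the settings discussed in this tutorial:

```python
import ast
import configparser

# Shortened copy of the example configuration above.
CONFIG_TEXT = """
[EXP]
root = ./examples/results/
name = exp_emodb_traindevtest
traindevtest = True
epochs = 100

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
labels = ['anger', 'happiness', 'neutral', 'sadness']
target = emotion

[MODEL]
type = mlp
layers = {'l1': 100, 'l2': 16}
patience = 10
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# getboolean accepts spellings such as True/true/yes/1.
use_traindevtest = config.getboolean("EXP", "traindevtest")
# List and dict values are Python literals; literal_eval parses them without executing code.
databases = ast.literal_eval(config["DATA"]["databases"])
labels = ast.literal_eval(config["DATA"]["labels"])
layers = ast.literal_eval(config["MODEL"]["layers"])
patience = config.getint("MODEL", "patience")

for db in databases:
    print(db, "->", config["DATA"][f"{db}.split_strategy"])  # emodb -> speaker_split
print("traindevtest:", use_traindevtest, "| classes:", len(labels),
      "| layers:", layers, "| patience:", patience)
```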

Key Options

  • traindevtest = True: Enables three-way splitting

  • emodb.split_strategy = speaker_split: Splits by speaker to avoid data leakage

  • patience = 10: Early stopping patience (stops if no improvement for 10 epochs)

  • epoch_progression = True: Plots training progress over epochs
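The effect of patience can be illustrated independently of Nkululeko: training continues while the dev score keeps improving and stops once patience epochs pass without a new best. The function below is a hypothetical helper (early_stop is not a Nkululeko function) that replays a list of per-epoch dev scores and reports which epoch's model would be kept and at which epoch training would stop.

```python
def early_stop(dev_scores, patience):
    """Replay per-epoch dev scores (higher is better).

    Returns (best_epoch, stop_epoch), both 1-based. Training stops after
    `patience` consecutive epochs without beating the best score so far.
    """
    best_score = float("-inf")
    best_epoch = 0
    for epoch, score in enumerate(dev_scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before patience was exhausted.
    return best_epoch, len(dev_scores)

# Toy dev accuracies: improve for a while, then plateau and decline.
scores = [0.50, 0.58, 0.63, 0.66, 0.65, 0.66, 0.64, 0.63, 0.62]
print(early_stop(scores, patience=3))  # → (4, 7): keep the epoch-4 model, stop at epoch 7
```

With patience = 10, as in the example configuration, the same logic would tolerate up to ten non-improving epochs before stopping.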

Split Strategies

When using traindevtest = True, you can use different split strategies:

Automatic Speaker Split

emodb.split_strategy = speaker_split

Automatically divides speakers into train/dev/test sets.
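Nkululeko's own speaker_split implementation is not reproduced here. To illustrate the idea only, the hypothetical function below shuffles the unique speaker IDs and assigns whole speakers to train, dev and test, so that no speaker's recordings end up in more than one split. The 60/20/20 proportions and the file names are assumptions made for the example.

```python
import random

def speaker_split(samples, dev_frac=0.2, test_frac=0.2, seed=0):
    """Split (speaker, item) pairs so that each speaker lands in exactly one split."""
    speakers = sorted({spk for spk, _ in samples})
    random.Random(seed).shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_frac))
    n_dev = max(1, round(len(speakers) * dev_frac))
    test_spk = set(speakers[:n_test])
    dev_spk = set(speakers[n_test:n_test + n_dev])
    splits = {"train": [], "dev": [], "test": []}
    for spk, item in samples:
        if spk in test_spk:
            splits["test"].append(item)
        elif spk in dev_spk:
            splits["dev"].append(item)
        else:
            splits["train"].append(item)
    return splits

# The ten EmoDB speaker IDs, with three made-up file names per speaker.
emodb_speakers = [3, 8, 9, 10, 11, 12, 13, 14, 15, 16]
samples = [(spk, f"{spk:02d}_utt{i}.wav") for spk in emodb_speakers for i in range(3)]
splits = speaker_split(samples)
print({name: len(items) for name, items in splits.items()})  # → {'train': 18, 'dev': 6, 'test': 6}
```

Splitting on speakers rather than on individual files is what prevents a model from scoring well simply by recognizing a voice it has already heard during training.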

Manual Speaker Assignment

emodb.split_strategy = speakers_stated
emodb.train = [3, 9, 10, 11, 13, 16]
emodb.dev = [14, 8]
; Test gets remaining speakers
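EmoDB has ten speakers (IDs 03, 08, 09, 10, 11, 12, 13, 14, 15 and 16), so the assignment above leaves two speakers for the test set. The small check below is plain Python, not Nkululeko code. It computes the remainder and confirms that the stated lists do not overlap.

```python
EMODB_SPEAKERS = {3, 8, 9, 10, 11, 12, 13, 14, 15, 16}
train = [3, 9, 10, 11, 13, 16]
dev = [14, 8]

# A speaker listed in both train and dev would leak across splits.
assert not set(train) & set(dev), "a speaker is listed in both train and dev"
test = sorted(EMODB_SPEAKERS - set(train) - set(dev))
print("test speakers:", test)  # → test speakers: [12, 15]
```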

Pre-defined Splits

For datasets with existing splits (like MELD):

[DATA]
databases = ['train', 'dev', 'test']
train = ./data/meld/meld_train.csv
train.split_strategy = train
dev = ./data/meld/meld_dev.csv
dev.split_strategy = train
test = ./data/meld/meld_test.csv
test.split_strategy = test

Output and Evaluation

With traindevtest = True, Nkululeko produces three evaluations:

  1. Best model on dev set: Model selected by early stopping

  2. Best model on test set: Same model, evaluated on held-out test data

  3. Last model on dev set: Final epoch model performance
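The relationship between these three numbers can be sketched as follows. This is a toy illustration, not Nkululeko's reporting code. It assumes per-epoch dev scores are already available and that a test evaluation can be requested for any saved checkpoint (the evaluate_on_test callback and all scores are hypothetical). The best epoch is chosen on dev scores alone, and the test set is consulted exactly once, for that epoch.

```python
def three_evaluations(dev_scores, evaluate_on_test):
    """Return the three reported values; epochs are 0-based indices here."""
    # Select the checkpoint on dev data only.
    best_epoch = max(range(len(dev_scores)), key=dev_scores.__getitem__)
    return {
        "best_model_dev": dev_scores[best_epoch],
        "best_model_test": evaluate_on_test(best_epoch),  # single look at the test set
        "last_model_dev": dev_scores[-1],
    }

dev = [0.55, 0.62, 0.68, 0.66, 0.64]                # hypothetical per-epoch dev accuracies
hypothetical_test = [0.52, 0.58, 0.61, 0.60, 0.59]  # what each checkpoint would score on test
calls = []

def evaluate_on_test(epoch):
    calls.append(epoch)  # record which checkpoints were evaluated on test
    return hypothetical_test[epoch]

print(three_evaluations(dev, evaluate_on_test))
# → {'best_model_dev': 0.68, 'best_model_test': 0.61, 'last_model_dev': 0.64}
```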

Interpreting Results

The test set performance is typically lower than dev set performance because:

  • The model was selected and tuned to perform well on the dev set

  • The test set represents truly unseen data

For the same reason, the test set score is the most realistic estimate of real-world performance.

Running the Experiment

python -m nkululeko.nkululeko --config examples/exp_emodb_traindevtest.ini

Tips

  1. Always use speaker splits: Avoid having the same speaker in more than one of the train, dev, and test sets

  2. Set patience appropriately: Too low may stop training too early; too high wastes computation

  3. Report test set results: Only the test set gives unbiased performance estimates

  4. Use with neural networks: Train/dev/test splits are most important for models that can overfit (MLP, CNN, Transformers)
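Tip 1 can be checked mechanically before training. The sketch below uses a hypothetical helper (speaker_overlaps is not a Nkululeko function): given the speaker ID of every sample in each split, it reports any speaker that occurs in more than one split.

```python
from itertools import combinations

def speaker_overlaps(splits):
    """Map each pair of split names to the speakers they share (pairs with no overlap are omitted)."""
    speaker_sets = {name: set(speakers) for name, speakers in splits.items()}
    overlaps = {}
    for a, b in combinations(sorted(speaker_sets), 2):
        shared = speaker_sets[a] & speaker_sets[b]
        if shared:
            overlaps[(a, b)] = sorted(shared)
    return overlaps

# Speaker ID of each sample, per split (toy data; speaker 15 appears in both dev and test).
splits = {
    "train": [3, 3, 9, 10, 11, 13, 16],
    "dev": [8, 14, 15],
    "test": [12, 15, 15],
}
print(speaker_overlaps(splits))  # → {('dev', 'test'): [15]}
```

An empty result means the splits are speaker-disjoint.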