Train/Dev/Test Splits
This tutorial explains how to use three-way data splits (train, dev, test) in Nkululeko for proper model evaluation and to avoid overfitting.
Reference: Nkululeko: How to use train/dev/test splits
Why Three Splits?
Supervised machine learning works as follows:
Training phase: A learning algorithm adapts to a training dataset, producing a trained model
Inference phase: The model makes predictions on a test set
The Overfitting Problem
Complex models may memorize training data rather than learning generalizable patterns. This means:
✅ Great performance on training data
❌ Poor performance on new data
This phenomenon is called overfitting.
The Solution: Development Set
To prevent overfitting:
Hyperparameters are optimized using a held-out evaluation set (not used during training)
Training stops when performance on the evaluation set declines (early stopping)
The best-performing model on evaluation data is selected
However, this introduces a new problem: the model may now be overfitted to the evaluation data!
The Final Solution: Test Set
A third dataset is needed for final testing—one that has not been used at any stage of model development.
The three splits are:
Train: Used for model training
Dev (Development): Used for hyperparameter tuning and early stopping
Test: Used only for final evaluation
Enabling Three-Way Splits
Enable train/dev/test splitting with a single option:
[EXP]
traindevtest = True
Example: EmoDB with MLP
Here’s a complete example using the emoDB dataset (which has no predefined splits):
[EXP]
root = ./examples/results/
name = exp_emodb_traindevtest
traindevtest = True
epochs = 100
[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
labels = ['anger', 'happiness', 'neutral', 'sadness']
target = emotion
[FEATS]
type = ['os']
scale = standard
[MODEL]
type = mlp
layers = {'l1': 100, 'l2': 16}
patience = 10
[PLOT]
best_model = True
epoch_progression = True
Key Options
traindevtest = True: Enables three-way splittingemodb.split_strategy = speaker_split: Splits by speaker to avoid data leakagepatience = 10: Early stopping patience (stops if no improvement for 10 epochs)epoch_progression = True: Plots training progress over epochs
Split Strategies
When using traindevtest = True, you can use different split strategies:
Automatic Speaker Split
emodb.split_strategy = speaker_split
Automatically divides speakers into train/dev/test sets.
Manual Speaker Assignment
emodb.split_strategy = speakers_stated
emodb.train = [3, 9, 10, 11, 13, 16]
emodb.dev = [14, 8]
; Test gets remaining speakers
Pre-defined Splits
For datasets with existing splits (like MELD):
[DATA]
databases = ['train', 'dev', 'test']
train = ./data/meld/meld_train.csv
train.split_strategy = train
dev = ./data/meld/meld_dev.csv
dev.split_strategy = train
test = ./data/meld/meld_test.csv
test.split_strategy = test
Output and Evaluation
With traindevtest = True, Nkululeko produces three evaluations:
Best model on dev set: Model selected by early stopping
Best model on test set: Same model, evaluated on held-out test data
Last model on dev set: Final epoch model performance
Interpreting Results
The test set performance is typically lower than dev set performance because:
The model was optimized for the dev set
The test set represents truly unseen data
This is the most realistic estimate of real-world performance
Example Files
exp_emodb_traindevtest.ini: Basic train/dev/test with XGBexp_emodb_traindevtest_split.ini: Manual speaker assignment with MLP
Running the Experiment
python -m nkululeko.nkululeko --config examples/exp_emodb_traindevtest.ini
Tips
Always use speaker splits: Avoid having the same speaker in train and test sets
Set patience appropriately: Too low may stop training too early; too high wastes computation
Report test set results: Only the test set gives unbiased performance estimates
Use with neural networks: Train/dev/test splits are most important for models that can overfit (MLP, CNN, Transformers)