# Train/Dev/Test Splits

This tutorial explains how to use three-way data splits (train, dev, test) in Nkululeko for proper model evaluation and to avoid overfitting.

**Reference**: [Nkululeko: How to use train/dev/test splits](http://blog.syntheticspeech.de/2025/03/31/nkululeko-how-to-use-train-dev-test-splits/)

## Why Three Splits?

Supervised machine learning works as follows:

1. **Training phase**: A learning algorithm adapts to a training dataset, producing a trained model
2. **Inference phase**: The model makes predictions on a test set

### The Overfitting Problem

Complex models may **memorize** the training data rather than learn generalizable patterns. This means:

- ✅ Great performance on training data
- ❌ Poor performance on new data

This phenomenon is called **overfitting**.

### The Solution: Development Set

To counter overfitting:

- Hyperparameters are optimized using a **held-out evaluation set** (not used during training)
- Training stops when performance on the evaluation set stops improving (**early stopping**)
- The model that performs best on the evaluation data is selected

However, this introduces a new problem: the model may now be overfitted to the evaluation data!

### The Final Solution: Test Set

A **third dataset** is needed for final testing, one that has not been used at any stage of model development.
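The protocol described so far can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Nkululeko's implementation. The `(speaker_id, item)` sample format, the 20% dev/test fractions, and the functions `speaker_split` and `train_with_early_stopping` are all hypothetical. The sketch assigns whole speakers to train/dev/test, picks the epoch with the best dev score using patience-based early stopping, and does not touch the test set until a single final evaluation.

```python
import random
from collections import defaultdict


def speaker_split(samples, dev_frac=0.2, test_frac=0.2, seed=42):
    """Assign whole speakers to train/dev/test so no speaker occurs in two splits.

    `samples` is a list of (speaker_id, item) pairs. This format is a
    hypothetical illustration, not Nkululeko's internal data structure.
    """
    speakers = sorted({spk for spk, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_frac))
    n_dev = max(1, round(len(speakers) * dev_frac))
    test_spk = set(speakers[:n_test])
    dev_spk = set(speakers[n_test:n_test + n_dev])

    splits = defaultdict(list)
    for spk, item in samples:
        if spk in test_spk:
            splits["test"].append((spk, item))
        elif spk in dev_spk:
            splits["dev"].append((spk, item))
        else:
            splits["train"].append((spk, item))
    return splits


def train_with_early_stopping(train_epoch, evaluate_dev, max_epochs=100, patience=10):
    """Keep the state with the best dev score and stop after `patience`
    consecutive epochs without improvement. The test set is never used here."""
    best_state, best_epoch, best_score = None, -1, float("-inf")
    last_state = None
    stale_epochs = 0
    for epoch in range(max_epochs):
        last_state = train_epoch(epoch)
        score = evaluate_dev(last_state)
        if score > best_score:
            best_state, best_epoch, best_score = last_state, epoch, score
            stale_epochs = 0
        else:
            stale_epochs += 1
            if stale_epochs >= patience:
                break
    return best_state, best_epoch, best_score, last_state


if __name__ == "__main__":
    # Synthetic data: 10 speakers with 5 utterances each.
    data = [(spk, f"utt{spk}_{i}") for spk in range(10) for i in range(5)]
    splits = speaker_split(data)
    for name in ("train", "dev", "test"):
        print(name, sorted({spk for spk, _ in splits[name]}))

    # Simulated dev accuracy per epoch: it improves, then declines (overfitting).
    dev_curve = [0.50, 0.60, 0.68, 0.72, 0.71, 0.70, 0.69, 0.69, 0.68]
    best, best_epoch, best_score, last = train_with_early_stopping(
        train_epoch=lambda epoch: epoch,  # the "model state" is just the epoch index
        evaluate_dev=lambda state: dev_curve[state],
        max_epochs=len(dev_curve),
        patience=3,
    )
    print(f"best epoch {best_epoch} (dev {best_score:.2f}), stopped at epoch {last}")
    # Only `best` would now be evaluated, once, on splits["test"].
```

Splitting by speaker rather than by utterance keeps one speaker's voice characteristics from appearing in both the training data and the evaluation data. The function returns both the best state and the last state, which loosely mirrors the "best model" and "last model" evaluations that Nkululeko reports.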
In summary, the three splits are:

- **Train**: Used for model training
- **Dev** (Development): Used for hyperparameter tuning and early stopping
- **Test**: Used only for final evaluation

## Enabling Three-Way Splits

Enable train/dev/test splitting with a single option:

```ini
[EXP]
traindevtest = True
```

## Example: EmoDB with MLP

Here's a complete example using the emoDB dataset (which has no predefined splits):

```ini
[EXP]
root = ./examples/results/
name = exp_emodb_traindevtest
traindevtest = True
epochs = 100

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
labels = ['anger', 'happiness', 'neutral', 'sadness']
target = emotion

[FEATS]
type = ['os']
scale = standard

[MODEL]
type = mlp
layers = {'l1': 100, 'l2': 16}
patience = 10

[PLOT]
best_model = True
epoch_progression = True
```

### Key Options

- `traindevtest = True`: Enables three-way splitting
- `emodb.split_strategy = speaker_split`: Splits by speaker to avoid data leakage
- `patience = 10`: Early stopping patience (training stops if the dev score does not improve for 10 epochs)
- `epoch_progression = True`: Plots training progress over epochs

## Split Strategies

When using `traindevtest = True`, you can choose between different split strategies:

### Automatic Speaker Split

```ini
emodb.split_strategy = speaker_split
```

Automatically divides the speakers into train/dev/test sets.

### Manual Speaker Assignment

```ini
emodb.split_strategy = speakers_stated
emodb.train = [3, 9, 10, 11, 13, 16]
emodb.dev = [14, 8]
; Test gets remaining speakers
```

### Predefined Splits

For datasets with existing splits (such as MELD):

```ini
[DATA]
databases = ['train', 'dev', 'test']
train = ./data/meld/meld_train.csv
train.split_strategy = train
dev = ./data/meld/meld_dev.csv
dev.split_strategy = train
test = ./data/meld/meld_test.csv
test.split_strategy = test
```

## Output and Evaluation

With `traindevtest = True`, Nkululeko produces three evaluations:

1. **Best model on dev set**: Model selected by early stopping
2. **Best model on test set**: Same model, evaluated on held-out test data
3. **Last model on dev set**: Performance of the model from the final epoch

### Interpreting Results

Test set performance is typically lower than dev set performance because the model was selected and tuned on the dev set, whereas the test set represents truly unseen data. The test score is therefore the most realistic estimate of real-world performance.

## Example Files

- [`exp_emodb_traindevtest.ini`](https://github.com/felixbur/nkululeko/blob/main/examples/exp_emodb_traindevtest.ini): Basic train/dev/test with XGB
- [`exp_emodb_traindevtest_split.ini`](https://github.com/felixbur/nkululeko/blob/main/examples/exp_emodb_traindevtest_split.ini): Manual speaker assignment with MLP

## Running the Experiment

```bash
python -m nkululeko.nkululeko --config examples/exp_emodb_traindevtest.ini
```

## Tips

1. **Always use speaker splits**: Avoid having the same speaker in more than one split (train, dev, or test)
2. **Set patience appropriately**: Too low may stop training too early; too high wastes computation
3. **Report test set results**: Only the test set gives unbiased performance estimates
4. **Use with neural networks**: Train/dev/test splits matter most for models that can overfit (MLP, CNN, Transformers)

## Related Tutorials

- [Comparing Runs](compare_runs.md): Statistical comparison across experiments
- [Hello World](hello_world_aud.md): Basic experiment setup