# Using Split Train and Test Data

This tutorial shows how to configure separate train and test sets using the `split_strategy` option, and how to confirm the results using the predict module.

**Reference**: [How to use train, dev and test splits with Nkululeko](http://blog.syntheticspeech.de/2024/04/26/how-to-use-train-dev-and-test-splits-with-nkululeko/)

## Overview

In machine learning, the typical workflow is:

1. **Train** your model on a training set
2. **Tune** hyperparameters on a development (validation) set
3. **Evaluate** on a held-out test set

Nkululeko can directly handle train/test splits in a single experiment using the `split_strategy` option.

## Using Split Strategy

The simplest way to define train and test sets is using separate databases with `split_strategy`:

```ini
[EXP]
root = ./examples/results/
name = exp_polish_splits
save = True

[DATA]
databases = ['train', 'dev', 'test']
train = ./data/polish/polish_train.csv
train.type = csv
train.absolute_path = False
train.split_strategy = train
dev = ./data/polish/polish_dev.csv
dev.type = csv
dev.absolute_path = False
dev.split_strategy = train
test = ./data/polish/polish_test.csv
test.type = csv
test.absolute_path = False
test.split_strategy = test
target = emotion

[FEATS]
type = ['os']
scale = standard

[MODEL]
type = xgb
save = True
```

### Key Configuration

| Option | Description |
|--------|-------------|
| `<db>.split_strategy = train` | Use this database for training |
| `<db>.split_strategy = test` | Use this database for testing only |
| `<db>.split_strategy = dev` | Use this database for development/validation |
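To double-check which databases end up in which split before launching a run, you can inspect the configuration with Python's standard `configparser`. The following is a minimal sketch and not part of Nkululeko: the helper name `summarize_splits` and the `"unspecified"` fallback label are hypothetical, and the sketch assumes that list values such as `databases` are written as Python literals, as in the example above.

```python
import ast
import configparser
from collections import defaultdict


def summarize_splits(config_text: str) -> dict:
    """Group the databases listed in [DATA] by their split_strategy value."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    data = parser["DATA"]
    # `databases` is written as a Python list literal, e.g. ['train', 'test']
    databases = ast.literal_eval(data["databases"])
    splits = defaultdict(list)
    for db in databases:
        # "unspecified" is a hypothetical label for databases without a strategy
        strategy = data.get(f"{db}.split_strategy", "unspecified")
        splits[strategy].append(db)
    return dict(splits)


# Reduced version of the [DATA] section shown above
EXAMPLE = """
[DATA]
databases = ['train', 'dev', 'test']
train.split_strategy = train
dev.split_strategy = train
test.split_strategy = test
"""

if __name__ == "__main__":
    print(summarize_splits(EXAMPLE))
```

For the example configuration, both `train` and `dev` are grouped under the `train` strategy, and `test` is the only database held out for testing.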

## Running the Experiment

With split strategies defined, run a single command:

```bash
python -m nkululeko.nkululeko --config myconf.ini
```

This trains on the combined train and dev data (both are assigned `split_strategy = train` in the configuration above) and evaluates on the test set in a single run.

## Confirming Results with the Predict Module

After running your experiment, you can use the unified **predict module** to
re-evaluate your saved model on a labeled test set. This is useful to:

- Confirm the results from your experiment
- Evaluate the model on additional test sets
- Generate detailed test reports

### Using the predict module in model mode

First, ensure your model is saved by setting `save = True` in both `[EXP]` and `[MODEL]` sections. Then run:

```bash
python -m nkululeko.predict \
    --config myconf.ini \
    --type model \
    --list ./data/polish/polish_test.csv \
    --outfile polish_test_predict.csv
```

The predict module will:

1. Load the saved best model from the experiment
2. Run it on every audio file listed in the CSV
3. Write predictions next to the original columns into `--outfile`
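
Once the predictions sit next to the original labels, you can recompute summary scores yourself as a sanity check. The sketch below uses only the standard library and computes accuracy and unweighted average recall (UAR, the mean of the per-class recalls). The column names are passed in as arguments because the exact header of the output file is not specified here; treat any names you pass as assumptions and check them against your own file.

```python
import csv
from collections import Counter


def unweighted_average_recall(truth, predicted):
    """Mean of the per-class recalls over the classes present in `truth`."""
    totals = Counter(truth)
    hits = Counter(t for t, p in zip(truth, predicted) if t == p)
    return sum(hits[label] / totals[label] for label in totals) / len(totals)


def score_prediction_file(path, truth_col, pred_col):
    """Read a predictions CSV and return (accuracy, UAR).

    `truth_col` and `pred_col` are the header names of the label and
    prediction columns; they depend on your file and are not fixed here.
    """
    with open(path, newline="") as handle:
        rows = list(csv.DictReader(handle))
    truth = [row[truth_col] for row in rows]
    predicted = [row[pred_col] for row in rows]
    accuracy = sum(t == p for t, p in zip(truth, predicted)) / len(truth)
    return accuracy, unweighted_average_recall(truth, predicted)
```

A call could look like `score_prediction_file("polish_test_predict.csv", "emotion", "predicted")`, where `"predicted"` is a placeholder for whatever the prediction column is called in your output file.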

### Defining Test Databases

You can also specify test databases using the `tests` option:

```ini
[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'happiness', 'neutral', 'sadness']
; Define additional test databases
tests = ['crema-d']
crema-d = ./data/crema-d/crema-d
crema-d.split_strategy = test
```

> **Automatic fast path**: when `DATA.tests` is set and a saved experiment
> already exists on disk, `nkululeko.nkululeko` skips training on subsequent
> runs and evaluates the stored best model on the test databases directly.
> The same config file therefore works for both the initial training run and
> all later test evaluations without any changes.
> See [test_new_database.md](test_new_database.md) for the full workflow.

## Cross-Database Evaluation

A common use case is training on one database and testing on another:

```ini
[DATA]
databases = ['emodb', 'crema-d']
emodb = ./data/emodb/emodb
emodb.split_strategy = train
target = emotion
labels = ['anger', 'happiness']
; Test on a different database
crema-d = ./data/crema-d/crema-d
crema-d.split_strategy = test
```

This evaluates how well your model generalizes to unseen data from a different source.

## Example Files

- [`exp_emodb_os_xgb_test.ini`](https://github.com/felixbur/nkululeko/blob/main/examples/exp_emodb_os_xgb_test.ini): Cross-database evaluation example
- [`exp_polish_flags.ini`](https://github.com/felixbur/nkululeko/blob/main/examples/exp_polish_flags.ini): Flags module for systematic comparison

## Tips

1. **Save your model**: Set `save = True` in both the `[EXP]` and `[MODEL]` sections so that the predict module can load it
2. **Use split_strategy consistently**: Set `train`, `dev`, or `test` for each database
3. **Matching labels**: Ensure the test database uses the same labels as the training data
4. **Final evaluation**: Evaluate on the test set only once, after all hyperparameter tuning is complete
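
The label check from tip 3 is easy to automate for CSV databases. This is a minimal sketch under the assumption that each CSV has a header row containing a column named after the target (`emotion` in the examples above); the function names are hypothetical and not part of Nkululeko.

```python
import csv


def label_set(path, target="emotion"):
    """Collect the distinct values of the target column in a CSV file."""
    with open(path, newline="") as handle:
        return {row[target] for row in csv.DictReader(handle)}


def unseen_test_labels(train_path, test_path, target="emotion"):
    """Return labels that occur in the test CSV but not in the training CSV."""
    return label_set(test_path, target) - label_set(train_path, target)
```

An empty result from `unseen_test_labels("./data/polish/polish_train.csv", "./data/polish/polish_test.csv")` means every test label was also seen during training.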

## Comparing Multiple Configurations with Flags Module

To systematically compare different models, features, and preprocessing options, use the **flags module**:

```ini
[EXP]
root = ./examples/results/
name = exp_polish_flags

[DATA]
databases = ['train', 'dev', 'test']
train = ./data/polish/polish_train.csv
train.type = csv
train.split_strategy = train
dev = ./data/polish/polish_dev.csv
dev.type = csv
dev.split_strategy = train
test = ./data/polish/polish_test.csv
test.type = csv
test.split_strategy = test
target = emotion

[FEATS]
; Leave empty - will be set by FLAGS

[MODEL]
; Leave empty - will be set by FLAGS

[FLAGS]
models = ['xgb', 'svm']
features = ['praat', 'os']
balancing = ['none', 'ros', 'smote']
scale = ['none', 'standard', 'robust', 'minmax']
```

### Running the Flags Module

```bash
python -m nkululeko.flags --config exp_polish_flags.ini
```

This will automatically run all combinations:
- 2 models × 2 feature sets × 3 balancing methods × 4 scalers = 48 experiments
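
The count follows from taking the Cartesian product of the four flag lists. The short sketch below reproduces the arithmetic with `itertools.product`; it only illustrates how the number of runs grows and makes no claim about how the flags module iterates internally.

```python
from itertools import product

# Flag values copied from the [FLAGS] section above
flags = {
    "models": ["xgb", "svm"],
    "features": ["praat", "os"],
    "balancing": ["none", "ros", "smote"],
    "scale": ["none", "standard", "robust", "minmax"],
}

# One dict per experiment: every combination of one value per flag
combinations = [dict(zip(flags, values)) for values in product(*flags.values())]

print(len(combinations))  # 48
print(combinations[0])
```

Because the total is a product, adding one more value to any list multiplies the number of runs, so it is worth estimating the count before starting a large sweep.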

### Flags Options

| Flag | Values | Description |
|------|--------|-------------|
| `models` | `['svm', 'xgb', 'mlp', 'knn', ...]` | Model types to compare |
| `features` | `['os', 'praat', 'wav2vec2', ...]` | Feature extractors |
| `balancing` | `['none', 'ros', 'smote', ...]` | Class balancing methods |
| `scale` | `['none', 'standard', 'robust', 'minmax']` | Feature scaling |

## Related Tutorials

- [Train/Dev/Test Splits](traindevtest.md): Automatic three-way splitting with `traindevtest = True`
- [Comparing Runs](compare_runs.md): Statistical comparison of multiple experiments