# Bundle: Export and Inference Nkululeko provides a **bundle** module to export a trained experiment as a self-contained, portable directory. This bundle can then be shared with others and used for inference on new audio files without requiring the original training data or configuration. ## Overview A bundle directory contains everything needed for inference: - `manifest.json` — metadata and artifact paths - `inference.ini` — sanitized config (no training paths or private data) - `model.pkl` — the trained model - `scaler.pkl` — the fitted scaler (if scaling was used) - `label_encoder.pkl` — the label encoder (for classification) - `feature_schema.json` — feature column names and dimensions - `README.md` — human-readable description ## Tutorial: Polish Emotion Recognition Bundle This tutorial walks through training a model on the Polish emotion recognition dataset, exporting it as a bundle, and using the bundle for inference on new audio files. ### Prerequisites 1. Install nkululeko (see [Installation](installation.rst)). 2. Download and prepare the Polish dataset (see `data/polish/README.md`): ```bash cd data/polish unzip polish_speech_emotions.zip python3 process_database.py cd ../.. ``` ### Step 1: Train the Model First, train a model using the Polish dataset. Use the tutorial config file `tutorials/tut_bundle_polish.ini`: ```ini [EXP] root = ./ name = results/exp_polish_bundle runs = 1 epochs = 1 save = True [DATA] databases = ['train', 'dev', 'test'] train = ./data/polish/polish_train.csv train.type = csv train.absolute_path = False train.split_strategy = train dev = ./data/polish/polish_dev.csv dev.type = csv dev.absolute_path = False dev.split_strategy = train test = ./data/polish/polish_test.csv test.type = csv test.absolute_path = False test.split_strategy = test target = emotion labels = ['anger', 'neutral', 'fear'] [FEATS] type = ['os'] scale = standard [MODEL] type = xgb save = True ``` Key settings: - `[EXP] save = True` — saves the experiment state so it can be exported. - `[MODEL] save = True` — saves the trained model file. - `[FEATS] scale = standard` — applies standard scaling (bundled with the export). Run the training: ```bash python3 -m nkululeko.nkululeko --config tutorials/tut_bundle_polish.ini ``` ### Step 2: Export the Bundle After training, export the model as a bundle: ```bash python3 -m nkululeko.bundle --config tutorials/tut_bundle_polish.ini ``` By default, the bundle is saved to `//export/`. You can specify a custom output directory: ```bash python3 -m nkululeko.bundle --config tutorials/tut_bundle_polish.ini --output ./my_polish_bundle ``` The exported bundle directory will contain: ``` my_polish_bundle/ ├── manifest.json ├── inference.ini ├── model.pkl ├── scaler.pkl ├── label_encoder.pkl ├── feature_schema.json └── README.md ``` ### Step 3: Run Inference with the Bundle Use the `nkululeko.infer` module to predict emotions on new audio files using the exported bundle. #### Predict a single file ```bash python3 -m nkululeko.infer ./my_polish_bundle --file /path/to/audio.wav ``` #### Predict multiple files ```bash python3 -m nkululeko.infer ./my_polish_bundle --file file1.wav file2.wav file3.wav ``` #### Predict all audio files in a folder ```bash python3 -m nkululeko.infer ./my_polish_bundle --folder /path/to/audio_folder --outfile predictions.csv ``` #### Predict files from a CSV list ```bash python3 -m nkululeko.infer ./my_polish_bundle --list files.csv --outfile predictions.csv ``` ### Step 4: Interpret Results For classification tasks, the output includes: - Per-class probabilities (e.g., `anger`, `neutral`, `fear`) - The `predicted` column with the most likely label Example output for a single file: ``` /path/to/audio.wav anger: 0.85 /path/to/audio.wav neutral: 0.10 /path/to/audio.wav fear: 0.05 /path/to/audio.wav predicted: anger ``` When using `--outfile`, results are saved as a CSV file with an audformat segmented index. ## Programmatic API You can also use the bundle for inference in Python scripts: ```python from nkululeko.infer import infer_from_bundle results = infer_from_bundle( bundle_dir="./my_polish_bundle", files=["audio1.wav", "audio2.wav"], ) print(results) ``` ## Notes - The bundle is self-contained and does not require the original training data or configuration. - Feature extraction is performed automatically during inference using the same settings used during training. - The bundle can be shared across machines as long as the same Python and nkululeko versions are available (check `manifest.json` for version info). - For the Polish dataset, the labels are: `anger`, `neutral`, `fear`. - Set `[EXP] save = True` and `[MODEL] save = True` in your training config before exporting.