# nkululeko.predict `nkululeko.predict` is the unified prediction module of Nkululeko. It replaces the previous `nkululeko.demo`, `nkululeko.feature_demo` and `nkululeko.testing` modules and bundles all of their functionality behind a single command-line interface. You can use it to predict labels for: - one or more individual audio files (`--file`) - every audio file inside a folder (`--folder`) - the audio paths listed in a CSV (`--list`, original columns are preserved) - a live microphone recording (`--mic`) - the dataframe defined by an experiment config — pass `--config` without any of the input flags above and the module loads the databases declared in `[DATA]` (subset via `EXP.sample_selection`, default `all`) …using one of two prediction sources: - a **feature extractor** or **autopredict target** such as `age`, `gender`, `emotion`, `mos`, `snr` (`--type feats`, the default) - the **best model from a previously trained experiment** (`--type model`, requires `--config`) ## Command-line interface ```text python -m nkululeko.predict [--file AUDIO [AUDIO ...] | --folder FOLDER | --list CSV | --mic] [--model MODEL] [--type {feats,model}] [--config CONFIG.ini] [--outfile OUTFILE] [--language LANG] [--no_playback] ``` | Argument | Description | |---|---| | `--file AUDIO [AUDIO ...]` | One or more audio files. A single space-separated string also works (e.g. `--file "a.wav b.wav"`). Writes a per-file `_result.txt` next to each input and prints results to stdout. | | `--folder FOLDER` | Folder to scan recursively for audio (`wav`, `mp3`, `flac`, `ogg`, `m4a`, `au`, `aac`). Writes a single CSV to `--outfile`. | | `--list CSV` | CSV with audio paths. Existing columns and the audformat index are preserved; prediction columns are appended. Writes a single CSV to `--outfile`. | | `--mic` | Record `5` seconds from the microphone in a loop and print predictions to stdout. | | `--model MODEL` | Either an autopredict target name (`age`, `gender`, `emotion`, `mos`, `snr`, `pesq`, `sdr`, `stoi`, `arousal`, `valence`, `dominance`, `speaker`, `text`, `textclassification`, `translation`) **or** a feature-extractor name (`wav2vec2-...`, `opensmile`, `audmodel`, `emotion2vec-...`, `praat`, `clap`, `spkrec`, `trill`, `agender`, `whisper-...`, `ast`, `hubert-...`, `wavlm-...`, `squim`, `mos`, `snr`). When `--type model`, `--model` is ignored — the trained model from the experiment is used. | | `--type {feats,model}` | `feats` (default): use `--model` as autopredict target or feature extractor. `model`: load the best model from the experiment defined by `--config`. | | `--config CONFIG.ini` | Optional INI file. Required for `--type model`. With `--type feats` it may supply `FEATS.type` so that `--model` can be omitted. When passed alone (without `--file`/`--folder`/`--list`/`--mic`), the dataframe defined by the experiment's `[DATA]` section is used; `EXP.sample_selection` (default `all`) selects `train` / `test` / `all`. | | `--outfile OUTFILE` | Output CSV path for `--list` and `--folder`. Default: `./prediction_result.csv`. | | `--language LANG` | ISO 639-1 code (`en`, `de`, `pl`, …) for the `text` and `translation` autopredict targets. For `--model text` it sets the Whisper source language (overrides `EXP.language`). For `--model translation` it sets the Google Translate target language (overrides `PREDICT.target_language`). | | `--no_playback` | In `--mic` mode, suppress the playback of the recording before prediction. | The four input arguments (`--file`, `--folder`, `--list`, `--mic`) are mutually exclusive. ## Examples ### Predict emotion for a couple of audio files ```bash python -m nkululeko.predict --file test.mp3 test2.wav --model emotion ``` This writes `test_result.txt` and `test2_result.txt` next to each input and also prints the predictions to stdout. With `--model emotion`, the `nkululeko.autopredict.ap_emotion` predictor is used. ### Predict SNR for every audio file in a folder ```bash python -m nkululeko.predict --folder ./recordings --model snr --outfile snr.csv ``` The output CSV contains the audformat segmented index plus the new `snr_pred` column. ### Add prediction columns to an existing CSV, keeping all original columns ```bash python -m nkululeko.predict \ --list testdata.csv \ --model mos \ --outfile testdata_with_mos.csv ``` If `testdata.csv` is a valid audformat CSV (segmented or filewise index), the index is preserved. Otherwise the first column is interpreted as the audio path. Any further columns are passed through to the output. ### Use the best model of a trained experiment ```bash python -m nkululeko.predict \ --list testdata.csv \ --config config.ini \ --type model ``` This loads the experiment specified in `config.ini` (which must have been trained with `MODEL.save = True`) and runs its best model on each file in the list. For classification, the output contains one column per class label with the probability/score and a `predicted` column with the top-1 label. For regression, a single `predicted` column is written. ### Loop over microphone input using the FEATS section of a config ```bash python -m nkululeko.predict --mic --config config.ini ``` Press *Enter* to record `5` seconds, *q* + *Enter* to quit. ### Transcribe German audio with Whisper ```bash python -m nkululeko.predict --file lecture.mp3 --model text --language de ``` `--language de` overrides `EXP.language` for the Whisper source language. ### Predict over the dataframe defined by a config When you only pass `--config`, the module loads the databases declared in the config's `[DATA]` section and runs over the selection from `EXP.sample_selection` (default `all`): ```bash python -m nkululeko.predict \ --config experiments/emodb/exp.ini \ --model snr \ --outfile emodb_snr.csv ``` Set `EXP.sample_selection = train` (or `test`) in the INI to restrict the run to that subset. ### Translate transcriptions to French ```bash python -m nkululeko.predict \ --list transcribed.csv \ --model translation \ --language fr \ --outfile translated.csv ``` `--language fr` overrides `PREDICT.target_language` for Google Translate. ## Autopredict targets When `--model NAME` matches one of the autopredict targets below, the matching `nkululeko.autopredict.*` predictor is used. The added column name follows the `_pred` convention. | Target | Predictor module | Added column | |---|---|---| | `speaker` | `ap_sid.SIDPredictor` | `speaker_pred` | | `gender` | `ap_gender.GenderPredictor` (audEERING agender) | `gender_pred` | | `age` | `ap_age.AgePredictor` (audEERING agender) | `age_pred` | | `emotion` | `ap_emotion.EmotionPredictor` (emotion2vec) | `emotion_pred` | | `arousal` | `ap_arousal.ArousalPredictor` (audEERING dim) | `arousal_pred` | | `valence` | `ap_valence.ValencePredictor` (audEERING dim) | `valence_pred` | | `dominance` | `ap_dominance.DominancePredictor` (audEERING dim) | `dominance_pred` | | `mos` | `ap_mos.MOSPredictor` | `mos_pred` | | `pesq` | `ap_pesq.PESQPredictor` (SQUIM) | `pesq_pred` | | `sdr` | `ap_sdr.SDRPredictor` (SQUIM) | `sdr_pred` | | `stoi` | `ap_stoi.STOIPredictor` (SQUIM) | `stoi_pred` | | `snr` | `ap_snr.SNRPredictor` | `snr_pred` | | `text` | `ap_text.TextPredictor` (whisper transcription) | `text` | | `textclassification` | `ap_textclassifier.TextClassificationPredictor` | `classification_winner` + one column per candidate label | | `translation` | `ap_translate.TextTranslator` | column named after `PREDICT.target_language` (default: `en`) | ## Feature extractors If `--model` does **not** match an autopredict target, it is interpreted as a feature-extractor name. The output columns are `feat_0`, `feat_1`, …. Examples: ```bash python -m nkululeko.predict --file test.wav --model praat python -m nkululeko.predict --folder ./voices --model wav2vec2-large-robust-ft-swbd-300h --outfile feats.csv python -m nkululeko.predict --list audio.csv --model audmodel --outfile feats.csv --config has_audmodel_id.ini ``` Recognized prefixes / names: `wav2vec2*`, `hubert*`, `wavlm*`, `whisper*`, `ast*`, `emotion2vec*`, `opensmile`/`gemaps`/`compare`, `clap*`, `spkrec*` / `xvect*` / `ecapa*`, `trill*`, `praat*`, `audmodel*`, `agender*`, `squim*` / `pesq*` / `sdr*`, `mos*`, `snr*`. > **Note on overlapping names.** `mos` and `snr` are both autopredict targets > *and* feature extractors. They resolve to the autopredict path. If you need > the raw feature extractor for these, use the lower-level extractor classes > directly. ## Output formats | Mode | Where the result is written | |---|---| | `--file` | `_result.txt` per input file (one `key: value` per line), plus stdout. | | `--folder` | Single CSV at `--outfile` with the audformat segmented index of the discovered files and the prediction columns. | | `--list` | Single CSV at `--outfile` with the original columns of the input CSV plus the prediction columns. The audformat index is preserved when the input is a valid audformat CSV. | | `--mic` | stdout only. | ## See also - [demo.md](demo.md) — tutorial for using a previously trained model (`--type model`). - [emotion_prediction.md](emotion_prediction.md) — predicting emotions on unlabeled audio. - [predict_speaker.md](predict_speaker.md) — predicting speaker identity. - [text_processing.md](text_processing.md) — transcription, translation and text classification.