nkululeko.predict
nkululeko.predict is the unified prediction module of Nkululeko. It replaces
the previous nkululeko.demo, nkululeko.feature_demo and nkululeko.testing
modules and bundles all of their functionality behind a single command-line
interface.
You can use it to predict labels for:
- one or more individual audio files (--file)
- every audio file inside a folder (--folder)
- the audio paths listed in a CSV (--list; original columns are preserved)
- a live microphone recording (--mic)
- the dataframe defined by an experiment config: pass --config without any of the input flags above, and the module loads the databases declared in [DATA] (subset via EXP.sample_selection, default all)
…using one of two prediction sources:
- a feature extractor or autopredict target such as age, gender, emotion, mos, snr (--type feats, the default)
- the best model from a previously trained experiment (--type model, requires --config)
Command-line interface
python -m nkululeko.predict
[--file AUDIO [AUDIO ...] | --folder FOLDER | --list CSV | --mic]
[--model MODEL] [--type {feats,model}]
[--config CONFIG.ini] [--outfile OUTFILE]
[--language LANG] [--no_playback]
| Argument | Description |
|---|---|
| --file | One or more audio files. A single space-separated string also works (e.g. … |
| --folder | Folder to scan recursively for audio (… |
| --list | CSV with audio paths. Existing columns and the audformat index are preserved; prediction columns are appended. Writes a single CSV to … |
| --mic | Record … |
| --model | Either an autopredict target name (… |
| --type | feats (the default) or model. |
| --config | Optional INI file. Required for … |
| --outfile | Output CSV path for … |
| --language | ISO 639-1 code (… |
| --no_playback | In … |
The four input arguments (--file, --folder, --list, --mic) are mutually
exclusive.
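The mutual exclusivity of the four input flags can be illustrated with argparse. This is a minimal sketch that mirrors the synopsis above; whether nkululeko.predict is actually built this way is an assumption, and build_parser is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented synopsis (not the real one)."""
    parser = argparse.ArgumentParser(prog="nkululeko.predict")
    # At most one of the four input flags may be given.
    inputs = parser.add_mutually_exclusive_group()
    inputs.add_argument("--file", nargs="+", metavar="AUDIO")
    inputs.add_argument("--folder")
    inputs.add_argument("--list", metavar="CSV")
    inputs.add_argument("--mic", action="store_true")
    # Remaining options from the synopsis.
    parser.add_argument("--model")
    parser.add_argument("--type", choices=["feats", "model"], default="feats")
    parser.add_argument("--config")
    parser.add_argument("--outfile")
    parser.add_argument("--language")
    parser.add_argument("--no_playback", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--file", "test.mp3", "test2.wav", "--model", "emotion"]
    )
    print(args.file, args.type)
```

With this sketch, combining two input flags (for example --file with --mic) makes argparse exit with a usage error, which is the behavior the sentence above describes.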
Examples
Predict emotion for a couple of audio files
python -m nkululeko.predict --file test.mp3 test2.wav --model emotion
This writes test_result.txt and test2_result.txt next to each input and
also prints the predictions to stdout. With --model emotion, the
nkululeko.autopredict.ap_emotion predictor is used.
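Judging from the file names in this example, each result file appears to be named after the input's stem with a _result.txt suffix and placed in the same directory. A small sketch of that naming rule, assuming it holds in general (result_path is a hypothetical helper, not part of Nkululeko):

```python
from pathlib import Path


def result_path(audio: str) -> Path:
    """Return '<stem>_result.txt' in the same directory as the input file."""
    p = Path(audio)
    return p.with_name(p.stem + "_result.txt")


print(result_path("test.mp3"))  # test_result.txt
print(result_path("data/test2.wav"))
```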
Predict SNR for every audio file in a folder
python -m nkululeko.predict --folder ./recordings --model snr --outfile snr.csv
The output CSV contains the audformat segmented index plus the new
snr_pred column.
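A CSV of this shape can be post-processed with the standard library alone. The sketch below assumes the index is serialized as the columns file, start and end (the audformat segmented-index layout); the rows and values are invented for illustration, and noisy_files is a hypothetical helper.

```python
import csv
import io

# Hypothetical excerpt of snr.csv; the values are invented.
snr_csv = """file,start,end,snr_pred
rec1.wav,0 days 00:00:00,0 days 00:00:03.500000,21.4
rec2.wav,0 days 00:00:00,0 days 00:00:02.100000,4.8
"""


def noisy_files(csv_text: str, min_snr: float) -> list:
    """Return the files whose predicted SNR falls below min_snr."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["file"] for row in rows if float(row["snr_pred"]) < min_snr]


print(noisy_files(snr_csv, min_snr=10.0))  # ['rec2.wav']
```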
Add prediction columns to an existing CSV, keeping all original columns
python -m nkululeko.predict \
--list testdata.csv \
--model mos \
--outfile testdata_with_mos.csv
If testdata.csv is a valid audformat CSV (segmented or filewise index), the
index is preserved. Otherwise the first column is interpreted as the audio
path. Any further columns are passed through to the output.
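The pass-through behavior for a plain (non-audformat) CSV can be pictured as follows. This is only a sketch of the idea, not Nkululeko's code: append_predictions is a hypothetical helper, and the MOS values are made up.

```python
import csv
import io


def append_predictions(csv_text: str, predictions: dict, column: str) -> str:
    """Append one prediction column; the first column is taken as the audio path."""
    reader = csv.DictReader(io.StringIO(csv_text))
    path_col = reader.fieldnames[0]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames + [column], lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Existing columns are kept untouched; only the new column is added.
        row[column] = predictions[row[path_col]]
        writer.writerow(row)
    return out.getvalue()


example = "file,speaker\na.wav,s1\nb.wav,s2\n"
# Made-up MOS values, for illustration only.
print(append_predictions(example, {"a.wav": 3.9, "b.wav": 4.2}, "mos_pred"))
```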
Use the best model of a trained experiment
python -m nkululeko.predict \
--list testdata.csv \
--config config.ini \
--type model
This loads the experiment specified in config.ini (which must have been
trained with MODEL.save = True) and runs its best model on each file in the
list. For classification, the output contains one column per class label with
the probability/score and a predicted column with the top-1 label. For
regression, a single predicted column is written.
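For a classification model, the relationship between the per-class columns and the predicted column can be checked with a few lines of Python. The class labels, scores and the file column name below are invented; only the predicted column name comes from the description above.

```python
import csv
import io

# Hypothetical output of a three-class model; labels and scores are invented.
output_csv = """file,angry,happy,neutral,predicted
a.wav,0.7,0.2,0.1,angry
b.wav,0.1,0.3,0.6,neutral
"""


def top1(row: dict, labels: list) -> str:
    """Return the label whose score column holds the largest value."""
    return max(labels, key=lambda label: float(row[label]))


labels = ["angry", "happy", "neutral"]
for row in csv.DictReader(io.StringIO(output_csv)):
    # The 'predicted' column should agree with the arg-max over the class columns.
    assert top1(row, labels) == row["predicted"]
    print(row["file"], row["predicted"])
```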
Loop over microphone input using the FEATS section of a config
python -m nkululeko.predict --mic --config config.ini
Press Enter to record 5 seconds, q + Enter to quit.
Transcribe German audio with Whisper
python -m nkululeko.predict --file lecture.mp3 --model text --language de
--language de overrides EXP.language for the Whisper source language.
Predict over the dataframe defined by a config
When you only pass --config, the module loads the databases declared in the
config’s [DATA] section and runs over the selection from
EXP.sample_selection (default all):
python -m nkululeko.predict \
--config experiments/emodb/exp.ini \
--model snr \
--outfile emodb_snr.csv
Set EXP.sample_selection = train (or test) in the INI to restrict the
run to that subset.
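To see how the subset setting sits in the INI file, here is a minimal sketch that parses a config with Python's configparser. Only the [DATA] section name and EXP.sample_selection come from the text above; the other keys are placeholders, and a real experiment config needs more entries than shown.

```python
import configparser
import io

# Minimal, partly hypothetical config: only EXP.sample_selection and the
# [DATA] section name come from the text; the other keys are placeholders.
ini_text = """
[EXP]
name = emodb_snr
sample_selection = test

[DATA]
databases = ['emodb']
"""

config = configparser.ConfigParser()
config.read_file(io.StringIO(ini_text))

# Fall back to "all", the documented default, when the key is absent.
selection = config.get("EXP", "sample_selection", fallback="all")
print(selection)  # test
```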
Translate transcriptions to French
python -m nkululeko.predict \
--list transcribed.csv \
--model translation \
--language fr \
--outfile translated.csv
--language fr overrides PREDICT.target_language for Google Translate.
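The override rule for --language (the command-line value wins over the config entry) can be sketched like this. effective_language is a hypothetical helper and the config values are invented; how Nkululeko resolves the setting internally is not shown here.

```python
import configparser


def effective_language(cli_value, config, section, key):
    """Prefer the CLI value; otherwise fall back to the config entry (or None)."""
    if cli_value:
        return cli_value
    return config.get(section, key, fallback=None)


config = configparser.ConfigParser()
config.read_string("[EXP]\nlanguage = en\n\n[PREDICT]\ntarget_language = es\n")

print(effective_language("de", config, "EXP", "language"))             # de
print(effective_language(None, config, "PREDICT", "target_language"))  # es
```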
Autopredict targets
When --model NAME matches one of the autopredict targets below, the matching
nkululeko.autopredict.* predictor is used. The added column name follows the
<target>_pred convention.
| Target | Predictor module | Added column |
|---|---|---|
| … | … | … |
| … | … | column named after … |
Feature extractors
If --model does not match an autopredict target, it is interpreted as a
feature-extractor name. The output columns are feat_0, feat_1, …. Examples:
python -m nkululeko.predict --file test.wav --model praat
python -m nkululeko.predict --folder ./voices --model wav2vec2-large-robust-ft-swbd-300h --outfile feats.csv
python -m nkululeko.predict --list audio.csv --model audmodel --outfile feats.csv --config has_audmodel_id.ini
Recognized prefixes / names: wav2vec2*, hubert*, wavlm*, whisper*,
ast*, emotion2vec*, opensmile/gemaps/compare, clap*, spkrec* /
xvect* / ecapa*, trill*, praat*, audmodel*, agender*, squim* /
pesq* / sdr*, mos*, snr*.
Note on overlapping names. mos and snr are both autopredict targets and feature extractors. They resolve to the autopredict path. If you need the raw feature extractor for these, use the lower-level extractor classes directly.
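The resolution order described in this section (autopredict target first, feature extractor second) can be sketched as follows. The target set contains only names mentioned in this document and may be incomplete, the glob-style matching is an assumption about how prefixes are compared, and resolve is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Targets named in this document; the real list may be longer.
AUTOPREDICT_TARGETS = {"age", "gender", "emotion", "mos", "snr"}

# Prefix patterns copied from the list of recognized names.
EXTRACTOR_PATTERNS = [
    "wav2vec2*", "hubert*", "wavlm*", "whisper*", "ast*", "emotion2vec*",
    "opensmile", "gemaps", "compare", "clap*", "spkrec*", "xvect*", "ecapa*",
    "trill*", "praat*", "audmodel*", "agender*", "squim*", "pesq*", "sdr*",
    "mos*", "snr*",
]


def resolve(model: str) -> str:
    """Classify a --model value; autopredict targets take precedence."""
    if model in AUTOPREDICT_TARGETS:
        return "autopredict"
    if any(fnmatchcase(model, pattern) for pattern in EXTRACTOR_PATTERNS):
        return "feature_extractor"
    return "unknown"


for name in ["snr", "praat", "wav2vec2-large-robust-ft-swbd-300h"]:
    print(name, "->", resolve(name))
```

Because the target check runs first, snr and mos never reach the extractor patterns, which matches the note on overlapping names.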
Output formats
| Mode | Where the result is written |
|---|---|
| --file | … |
| --folder | Single CSV at … |
| --list | Single CSV at … |
| --mic | stdout only. |
See also
- demo.md: tutorial for using a previously trained model (--type model).
- emotion_prediction.md: predicting emotions on unlabeled audio.
- predict_speaker.md: predicting speaker identity.
- text_processing.md: transcription, translation and text classification.