Emotion Prediction with Emotion2vec
This tutorial demonstrates how to use Nkululeko’s emotion prediction
capabilities with emotion2vec models.
Overview
The unified prediction module (nkululeko.predict) can
automatically predict emotions from audio. With --model emotion the
emotion autopredict target is used, which is currently backed by
emotion2vec. This is useful for:
Analyzing unlabeled audio data
Generating emotion annotations for new datasets
Comparing predicted vs. actual emotions
Building emotion-aware applications
Quick Start
1. Predict emotion for a few files
python -m nkululeko.predict --file sample1.wav sample2.wav --model emotion
This writes sample1_result.txt and sample2_result.txt next to each input
file (and prints the same predictions to stdout).
2. Predict for a whole folder
python -m nkululeko.predict \
--folder ./recordings \
--model emotion \
--outfile ./recordings_emotion.csv
3. Augment an existing CSV with an emotion_pred column
python -m nkululeko.predict \
--list ./your_dataset.csv \
--model emotion \
--outfile ./your_dataset_with_emotion.csv
All original columns of your_dataset.csv are preserved. A new emotion_pred
column is appended.
Output format
Example output CSV after --list ... --model emotion:
file,start,end,speaker,gender,emotion,emotion_pred
audio1.wav,0 days,,speaker1,male,anger,anger
audio2.wav,0 days,,speaker2,female,neutral,neutral
audio3.wav,0 days,,speaker1,male,fear,fear
The audformat segmented index (file, start, end) is preserved when the
input CSV is a valid audformat file. For a plain CSV, the first column is
interpreted as the audio path.
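When the output CSV holds both a reference emotion column and the emotion_pred column, as in the example above, the two can be compared row by row. The following is a minimal sketch using only the Python standard library. The helper name label_agreement is hypothetical and not part of Nkululeko, and the inline data only mirrors the example rows shown above.

```python
import csv
import io

# Inline copy of the example output shown above (illustrative data only).
EXAMPLE_CSV = """file,start,end,speaker,gender,emotion,emotion_pred
audio1.wav,0 days,,speaker1,male,anger,anger
audio2.wav,0 days,,speaker2,female,neutral,neutral
audio3.wav,0 days,,speaker1,male,fear,fear
"""


def label_agreement(csv_file, truth_col="emotion", pred_col="emotion_pred"):
    """Return (matches, total) over rows where both columns are non-empty.

    Hypothetical helper for illustration; not part of Nkululeko.
    """
    matches = 0
    total = 0
    for row in csv.DictReader(csv_file):
        truth = (row.get(truth_col) or "").strip()
        pred = (row.get(pred_col) or "").strip()
        if not truth or not pred:
            continue  # skip rows without a reference label or a prediction
        total += 1
        if truth == pred:
            matches += 1
    return matches, total


matches, total = label_agreement(io.StringIO(EXAMPLE_CSV))
print(f"{matches}/{total} predictions agree with the reference labels")
```

For a real run, pass an open file handle for the CSV written by --outfile instead of the inline string.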
Combining with other autopredict targets
Each invocation of nkululeko.predict runs one prediction target. To enrich
your CSV with multiple targets (emotion + arousal + age + gender), run the
command multiple times, threading the output of one run into the input of the
next:
python -m nkululeko.predict --list data.csv --model emotion --outfile step1.csv
python -m nkululeko.predict --list step1.csv --model arousal --outfile step2.csv
python -m nkululeko.predict --list step2.csv --model age --outfile step3.csv
python -m nkululeko.predict --list step3.csv --model gender --outfile final.csv
final.csv will contain the original columns plus emotion_pred,
arousal_pred, age_pred and gender_pred.
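The chained commands above can also be generated programmatically, which helps when the list of targets changes. This is a sketch under stated assumptions: build_chain and run_chain are hypothetical helpers, not Nkululeko functions, and they use only the --list, --model and --outfile flags shown above. The intermediate file names (step1.csv, step2.csv, ...) follow the example and are otherwise arbitrary.

```python
import subprocess
import sys


def build_chain(input_csv, targets, final_csv, prefix="step"):
    """Build one nkululeko.predict command per target, feeding each output
    file into the next command as its --list input.

    Hypothetical helper for illustration; not part of Nkululeko.
    """
    commands = []
    current = input_csv
    for i, target in enumerate(targets, start=1):
        # The last step writes the final file; earlier steps write intermediates.
        out = final_csv if i == len(targets) else f"{prefix}{i}.csv"
        commands.append([
            sys.executable, "-m", "nkululeko.predict",
            "--list", current,
            "--model", target,
            "--outfile", out,
        ])
        current = out
    return commands


def run_chain(commands):
    """Run the commands in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


chain = build_chain("data.csv", ["emotion", "arousal", "age", "gender"], "final.csv")
for cmd in chain:
    print(" ".join(cmd))
# run_chain(chain)  # uncomment to execute; requires nkululeko to be installed
```

Because check=True stops at the first failing step, a failed run leaves the intermediate files from the earlier steps in place for inspection.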
Supported models
The emotion predictor uses an emotion2vec feature extractor, currently
iic/emotion2vec_plus_large.
Technical details
Pipeline
emotion2vec-large extracts 768-dimensional embeddings for each segment.
The pretrained emotion head produces the predicted emotion label.
The predicted labels are written as a new emotion_pred column.
Performance
Processing time scales with audio duration and the number of files.
GPU acceleration is used automatically when available.
Large datasets may require running in batches by splitting the input CSV.
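Splitting the input CSV into batches, as suggested above, can be done with a few lines of standard-library Python. This is a minimal sketch, not a Nkululeko feature: split_csv, the chunk size and the _partNNN file naming are all assumptions chosen for illustration.

```python
import csv
from pathlib import Path


def _write_chunk(out_dir, stem, index, header, rows):
    """Write one chunk (header plus rows) and return its path."""
    path = out_dir / f"{stem}_part{index:03d}.csv"
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


def split_csv(src, out_dir, rows_per_chunk=1000):
    """Split src into CSV files of at most rows_per_chunk data rows each.

    The header row is repeated in every chunk. Returns the chunk paths in order.
    Hypothetical helper for illustration; not part of Nkululeko.
    """
    if rows_per_chunk < 1:
        raise ValueError("rows_per_chunk must be at least 1")
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    with src.open(newline="") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return chunk_paths  # empty input: nothing to split
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_chunk:
                chunk_paths.append(
                    _write_chunk(out_dir, src.stem, len(chunk_paths), header, chunk)
                )
                chunk = []
        if chunk:  # remaining rows that did not fill a whole chunk
            chunk_paths.append(
                _write_chunk(out_dir, src.stem, len(chunk_paths), header, chunk)
            )
    return chunk_paths
```

Each resulting chunk can then be passed to --list with its own --outfile, and the per-chunk outputs concatenated afterwards.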
Limitations
The bundled EmotionPredictor currently emits a placeholder label (neutral) until the emotion head is fully wired up.
Prediction quality depends on how closely your data resembles the emotion2vec training distribution.
Troubleshooting
ModuleNotFoundError: Run the command from an environment where nkululeko is
installed, or from a directory where it is available on PYTHONPATH:
python -m nkululeko.predict --file your.wav --model emotion
Empty predictions: Make sure the audio files exist and are readable:
ls -la ./recordings/
Memory errors: Process smaller subsets of data or run on a machine with more RAM/VRAM.
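For the empty-predictions case with a --list input, checking the listed paths can be quicker than inspecting folders by hand. The sketch below assumes a CSV whose first column is the audio path, as described under Output format. The helper find_unreadable is hypothetical, and resolving relative paths against base_dir is an assumption that may not match how your paths are stored.

```python
import csv
import os
from pathlib import Path


def find_unreadable(csv_path, base_dir="."):
    """Return paths from the first CSV column that are missing or unreadable.

    Assumption: relative paths are resolved against base_dir; adjust this to
    match how your paths are actually stored.
    Hypothetical helper for illustration; not part of Nkululeko.
    """
    base_dir = Path(base_dir)
    problems = []
    seen = set()
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            if not row or not row[0].strip():
                continue  # ignore blank lines and empty path cells
            rel = row[0].strip()
            if rel in seen:
                continue  # segmented lists can name the same file many times
            seen.add(rel)
            path = Path(rel)
            if not path.is_absolute():
                path = base_dir / path
            if not path.is_file() or not os.access(path, os.R_OK):
                problems.append(str(path))
    return problems
```

An empty return value means every listed file was found and is readable by the current user; it does not check that the files are valid audio.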
Next steps
Try other autopredict targets: age, gender, arousal, dominance, mos, snr, speaker. See predict.md.
Use the predict module in --type model mode to apply a Nkululeko model you trained yourself: see demo.md.
Combine predicted labels with traditional ML pipelines via nkululeko.nkululeko.