# Text Processing: Transcribe, Translate, and Classify

This tutorial demonstrates how to use Nkululeko's text processing pipeline to:

1. **Transcribe** audio to text using Whisper speech-to-text
2. **Translate** text between languages using Google Translate
3. **Classify** text topics using zero-shot classification

This is useful when you want to analyze the linguistic content of speech databases, especially for cross-lingual analysis.

## Overview

The pipeline consists of three steps, each invoking the unified [`nkululeko.predict`](predict.md) module with a different autopredict target:

```
Audio → [--model text] → Text
Text → [--model translation] → English Text
Text → [--model textclassification] → Topic Labels
```

The output CSV of one step is fed into the `--list` argument of the next.

## Prerequisites

- Nkululeko >= 1.6.0
- A speech database (we use Berlin EmoDB as an example)
- Required dependencies: `openai-whisper`, `googletrans`

## Step 1: Transcribe Audio to Text

Use Whisper (via `transformers`) to transcribe audio to text.

### Configuration (`exp_emodb_predict_text.ini`)

The config carries only the source-language setting; the input / output / model choice is on the command line.

```ini
[EXP]
root = ./examples/results
name = exp_emodb_predict_text
language = de
```

### Run transcription

```bash
python -m nkululeko.predict \
    --list ./data/emodb/emodb_files.csv \
    --model text \
    --config examples/exp_emodb_predict_text.ini \
    --outfile ./emodb_transcribed.csv
```

### Output

The output CSV preserves the original columns and adds a `text` column:

```csv
file,start,end,emotion,text
./data/emodb/wav/03a01Fa.wav,0 days,,happiness,Der Lappen liegt auf dem Eisschrank.
./data/emodb/wav/03a01Nc.wav,0 days,,neutral,Der Lappen liegt auf dem Eisschrank.
```

## Step 2: Translate Text to English

Translate the German transcriptions to English using Google Translate.
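
Because the translation predictor reads the `text` column, a quick check of the step 1 output can catch a failed transcription before any translation requests are sent. The following is a minimal sketch using only the Python standard library; the helper name `has_text_column` is hypothetical and not part of Nkululeko.

```python
import csv


def has_text_column(csv_path, column="text"):
    """Return True if the CSV has a `column` header, at least one data row,
    and no row where that column is empty."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # A missing header means the transcription step did not add the column.
        if reader.fieldnames is None or column not in reader.fieldnames:
            return False
        rows = list(reader)
    # Short rows yield None for missing fields, so normalise to "" first.
    return bool(rows) and all((row[column] or "").strip() for row in rows)
```

Calling `has_text_column("./emodb_transcribed.csv")` before the translation run gives an early signal that the input is usable.
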
### Configuration (`exp_emodb_translate.ini`)

```ini
[EXP]
root = ./examples/results
name = exp_emodb_translate
language = de
target_language = en
```

### Run translation

The input is the CSV produced in step 1 (it already contains the `text` column expected by the translation predictor):

```bash
python -m nkululeko.predict \
    --list ./emodb_transcribed.csv \
    --model translation \
    --config examples/exp_emodb_translate.ini \
    --language es \
    --outfile ./emodb_translated.csv
```

> **Note**: `--language es` overrides both `EXP.language` and
> `PREDICT.target_language` from the INI. For `--model translation` only the
> target language matters, so the output column is named after `--language`
> (`es` here). Drop `--language` to fall back to the INI's `target_language`.

### Output

```csv
file,start,end,emotion,text,es
./data/emodb/wav/03a01Fa.wav,0 days,,happiness,Der Lappen liegt auf dem Eisschrank.,El trapo está sobre la nevera.
```

## Step 3: Classify text topics

Zero-shot classification with a multilingual XLM-RoBERTa model.

### Configuration (`exp_emodb_textclassifier.ini`)

```ini
[EXP]
root = ./examples/results
name = emodb_textclassifier

[FEATS]
textclassifier.candidates = ["sadness", "anger", "neutral", "happiness", "fear", "disgust", "boredom"]
```

### Run classification

```bash
python -m nkululeko.predict \
    --list ./emodb_translated.csv \
    --model textclassification \
    --config examples/exp_emodb_textclassifier.ini \
    --outfile ./emodb_classified.csv
```

### Zero-shot classification

The text classifier uses [`joeddav/xlm-roberta-large-xnli`](https://huggingface.co/joeddav/xlm-roberta-large-xnli), a zero-shot model that can classify text into **any categories you define** without further training.
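
To illustrate what zero-shot scoring against user-defined candidates looks like, here is a minimal sketch that turns one classifier result into a row with a winner and per-candidate scores. It is not Nkululeko's implementation: the helper `classify_text` is hypothetical, and taking the highest score as the winner is an assumption. The `classifier` argument is expected to behave like a Hugging Face `zero-shot-classification` pipeline, which returns parallel `labels` and `scores` lists.

```python
from typing import Callable, Dict, List


def classify_text(text: str, candidates: List[str], classifier: Callable) -> Dict[str, object]:
    """Score `text` against `candidates` and build one output row.

    `classifier` is called as classifier(text, candidate_labels=[...]) and
    must return a dict with parallel "labels" and "scores" lists.
    """
    result = classifier(text, candidate_labels=candidates)
    scores = dict(zip(result["labels"], result["scores"]))
    # Assumption: the winner is simply the highest-scoring candidate.
    row: Dict[str, object] = {"classification_winner": max(scores, key=scores.get)}
    # Keep the candidate columns in the order given by the caller.
    for label in candidates:
        row[label] = round(scores[label], 3)
    return row


# To try it with the real model (downloads the weights on first use):
# from transformers import pipeline
# classifier = pipeline("zero-shot-classification",
#                       model="joeddav/xlm-roberta-large-xnli")
# print(classify_text("Der Lappen liegt auf dem Eisschrank.",
#                     ["neutral", "anger", "happiness"], classifier))
```

Passing the classifier in as an argument keeps the row-building logic testable without loading the model.
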

Customize the candidates for your use case:

```ini
# Sentiment analysis
textclassifier.candidates = ["positive", "negative", "neutral"]

# Topic classification
textclassifier.candidates = ["sports", "politics", "technology", "entertainment"]

# Intent detection
textclassifier.candidates = ["question", "statement", "command", "greeting"]
```

### Output

```csv
file,classification_winner,sadness,anger,neutral,happiness,fear,disgust,boredom
./data/emodb/wav/03a01Fa.wav,neutral,0.116,0.141,0.359,0.059,0.089,0.121,0.114
```

## Complete pipeline

Run all three steps in sequence, passing each step's output file to the next step's `--list`:

```bash
python -m nkululeko.predict \
    --list ./data/emodb/emodb_files.csv \
    --model text \
    --config examples/exp_emodb_predict_text.ini \
    --outfile transcribed.csv

python -m nkululeko.predict \
    --list transcribed.csv \
    --model translation \
    --config examples/exp_emodb_translate.ini \
    --outfile translated.csv

python -m nkululeko.predict \
    --list translated.csv \
    --model textclassification \
    --config examples/exp_emodb_textclassifier.ini \
    --outfile classified.csv
```

## Troubleshooting

### `KeyError: 'text'`

The translation step needs a `text` column in the input CSV. Make sure step 1 finished successfully and that you pass its output to step 2 via `--list`.

### Slow transcription

Whisper is slow on CPU. Use GPU if available:

```ini
[MODEL]
device = cuda
```

### Google Translate rate limits

For large datasets you may hit translation API rate limits. Consider:

- Splitting the list into smaller chunks
- Adding delays between requests
- Switching to an alternative translation service

## Use cases

1. **Cross-lingual emotion analysis**: analyse emotional content in non-English speech.
2. **Content analysis**: extract topics and themes from speech recordings.
3. **Dataset enrichment**: add linguistic features to audio datasets.
4. **Multilingual research**: compare linguistic patterns across languages.
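
The chunking suggestion under "Google Translate rate limits" can be sketched with the standard library alone. The helper `split_csv` and the `_partNNN` file naming are hypothetical and not part of Nkululeko; the sketch only shows one way to split a file list so that each translation run handles fewer rows.

```python
import csv
from pathlib import Path


def split_csv(csv_path, chunk_size, out_dir):
    """Split a CSV into files of at most `chunk_size` data rows each,
    repeating the header in every chunk. Returns the chunk paths."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(csv_path).stem
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return []
        rows = list(reader)
    chunk_paths = []
    for index, start in enumerate(range(0, len(rows), chunk_size)):
        chunk_path = out_dir / f"{stem}_part{index:03d}.csv"
        with open(chunk_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(rows[start:start + chunk_size])
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk can then be passed to `--list` in a separate translation run, with a pause between runs if needed; the per-chunk outputs would have to be concatenated again before classification.
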

## Related tutorials

- [Predict module](predict.md): full documentation of the unified prediction module and its autopredict targets.
- [Hello World](hello_world_aud.md): getting started with Nkululeko.
- [Explore module](explore.md): visualize and analyse your data.

## References

- [Whisper](https://github.com/openai/whisper): OpenAI's speech recognition model
- [XLM-RoBERTa-XNLI](https://huggingface.co/joeddav/xlm-roberta-large-xnli): zero-shot classification model
- [Blog: How to add textual transcriptions](https://blog.syntheticspeech.de/2025/06/26/nkululeko-how-to-add-textual-transcriptions-to-your-data/)
- [Blog: How to translate transcriptions](http://blog.syntheticspeech.de/2025/07/14/nkululelo-how-to-translate-your-textual-transcriptions/)
- [Blog: How to predict topics](http://blog.syntheticspeech.de/2025/10/16/nkululeko-how-to-predict-topics-for-your-texts/)