# Linguistic Features with BERT This tutorial shows how to use BERT embeddings to model linguistic (semantic) content of speech, either alone or combined with acoustic features. **Reference**: [Nkululeko: How to explicitly model linguistics](http://blog.syntheticspeech.de/2025/07/22/nkululeko-how-to-explicitly-model-linguistics/) ## Overview Speech emotion recognition typically relies on acoustic features like pitch, energy, and spectral characteristics. However, **what** is being said (the linguistic content) can be just as important as **how** it is said. Nkululeko supports BERT (Bidirectional Encoder Representations from Transformers) embeddings to capture the semantic meaning of transcribed speech. This is particularly useful when: - You have transcripts available in your dataset - The spoken content is relevant to your classification task - You want to combine linguistic and acoustic information ## Requirements Your dataset must have a `text` column containing transcripts. If your column has a different name (e.g., "Utterance", "transcript"), use the `colnames` option to rename it. ## Basic BERT Features To use only BERT linguistic features: ```ini [EXP] root = ./ name = exp_meld_bert ; Set language for BERT model language = en [DATA] databases = ['train', 'test'] train = ./data/meld/meld_train.csv train.type = csv train.split_strategy = train test = ./data/meld/meld_test.csv test.type = csv test.split_strategy = test ; Rename column to 'text' if needed colnames = {'Utterance': 'text'} target = emotion labels = ['anger', 'joy', 'neutral', 'sadness'] [FEATS] type = ['bert'] scale = standard [MODEL] type = svm ``` ## Combining BERT with Acoustic Features To leverage both linguistic and acoustic information: ```ini [FEATS] ; Combine BERT with OpenSMILE acoustic features type = ['bert', 'os'] os.set = eGeMAPSv02 scale = standard ``` This creates a feature vector combining: - **BERT embeddings** (768 dimensions from bert-base-uncased) - **OpenSMILE features** (88 features from eGeMAPSv02) ## BERT Model Selection By default, Nkululeko uses `bert-base-uncased`. You can specify a different model: ```ini [FEATS] type = ['bert'] ; Use multilingual BERT bert.model = bert-base-multilingual-cased ``` Common BERT models: - `bert-base-uncased`: English, 110M parameters (default) - `bert-base-cased`: English, case-sensitive - `bert-base-multilingual-cased`: 104 languages - `bert-large-uncased`: English, 340M parameters ## Language Setting The `language` option in `[EXP]` helps select appropriate models: ```ini [EXP] ; For German text language = de ; For English text language = en ``` ## Using with Transcription If you don't have transcripts, you can first use Whisper to transcribe: ```ini [DATA] ; First experiment: transcribe [PREDICT] targets = ['text'] ``` Then use the generated `text` column for BERT features in a subsequent experiment. ## Example Files - [`exp_meld_bert.ini`](https://github.com/felixbur/nkululeko/blob/main/examples/exp_meld_bert.ini): BERT-only features - [`exp_meld_bert_os.ini`](https://github.com/felixbur/nkululeko/blob/main/examples/exp_meld_bert_os.ini): BERT + OpenSMILE combined ## Running the Experiment ```bash python -m nkululeko.nkululeko --config examples/exp_meld_bert.ini ``` ## Tips 1. **Memory**: BERT models require significant GPU memory. Use `device = cpu` if needed. 2. **Text quality**: BERT performance depends on transcript quality. 3. **Feature scaling**: Always use `scale = standard` when combining different feature types. 4. **Combining features**: Multi-modal (linguistic + acoustic) often outperforms single modality. ## Related Tutorials - [Text Processing Pipeline](text_processing.md): Transcribe and translate speech - [Feature Correlations](regplot.md): Explore feature importance