How to Align Databases

This tutorial explains how to combine and align multiple databases that have different label schemes for related tasks. This is useful when you want to leverage data from one domain (e.g., emotion) to improve prediction in another related domain (e.g., stress).

Overview

Sometimes you want to combine databases that are similar but don’t label exactly the same phenomena. For example:

You have limited stress data but many emotion databases
You want to treat angry samples as stressed and happy or neutral samples as non-stressed

Nkululeko provides several configuration options to align databases:

Column renaming (`colnames`)
Label mapping (`mapping`)
Sample filtering (`filter`)
Target table selection (`target_tables`)

Configuration Options

Column Renaming: `colnames`

Rename columns to align with your target task:

emodb.colnames = {"emotion": "stress"}

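This renames the `emotion` column to `stress`. Only the column name changes; the values are still the original emotion names until a `mapping` converts them.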
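
The effect of `colnames` can be pictured with a minimal pure-Python sketch, assuming each sample is a plain dictionary. The `rename_columns` helper and the sample rows are hypothetical illustrations, not Nkululeko code:

```python
def rename_columns(rows, colnames):
    """Return copies of the rows with keys renamed according to colnames."""
    return [
        {colnames.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Hypothetical samples; the values keep their original emotion names.
rows = [
    {"file": "a.wav", "emotion": "anger"},
    {"file": "b.wav", "emotion": "neutral"},
]
renamed = rename_columns(rows, {"emotion": "stress"})
print(renamed[0])  # {'file': 'a.wav', 'stress': 'anger'}
```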

Label Mapping: `mapping`

Map original labels to new categories:

emodb.mapping = {"anger": "stress", "disgust": "stress", "neutral": "no stress", "sadness": "no stress"}
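
As a rough illustration of what such a mapping does to a label column, the sketch below applies the dictionary to a list of values and reports any value without an entry. The `map_labels` helper is hypothetical, and how Nkululeko itself treats unmapped values is not covered here:

```python
mapping = {
    "anger": "stress",
    "disgust": "stress",
    "neutral": "no stress",
    "sadness": "no stress",
}


def map_labels(values, mapping):
    """Map each value; values without an entry become None and are reported."""
    mapped = [mapping.get(value) for value in values]
    unmapped = sorted({value for value in values if value not in mapping})
    return mapped, unmapped


mapped, unmapped = map_labels(["anger", "sadness", "happiness"], mapping)
print(mapped)    # ['stress', 'no stress', None]
print(unmapped)  # ['happiness']
```

Values that show up as unmapped are candidates for either a new mapping entry or removal with `filter`.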

Sample Filtering: `filter`

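Select only specific samples based on column values. Each entry pairs a column name with the list of values to keep:

# Keep only anger, neutral, and happiness samples
# (the column is named "stress" here because colnames renamed it)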
emodb.filter = [["stress", ["anger", "neutral", "happiness"]]]
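
A filter of this shape can be read as "keep a row only if its value in the named column is in the allowed list". A minimal sketch of that reading, with a hypothetical `filter_rows` helper and made-up rows (this is not Nkululeko's implementation):

```python
def filter_rows(rows, filters):
    """Keep rows whose value in each named column is in the allowed list."""
    kept = rows
    for column, allowed in filters:
        kept = [row for row in kept if row[column] in allowed]
    return kept


# Hypothetical samples after the emotion column was renamed to "stress".
rows = [
    {"file": "a.wav", "stress": "anger"},
    {"file": "b.wav", "stress": "fear"},
    {"file": "c.wav", "stress": "happiness"},
]
kept = filter_rows(rows, [["stress", ["anger", "neutral", "happiness"]]])
print([row["file"] for row in kept])  # ['a.wav', 'c.wav']
```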

Target Tables: `target_tables`

Specify which tables contain the target labels:

emodb.target_tables = ["emotion"]

Example: Emotion to Stress Mapping

This example shows how to convert Berlin EmoDB emotion labels into binary stress labels.

Configuration: `exp_emodb_stress.ini`

[EXP]
root = ./examples/results/
name = emodb_stress
save_test = ./examples/results/emodb_stress/test.csv
epochs = 5

[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb
# Specify where target values come from
emodb.target_tables = ["emotion"]
# Rename emotion column to stress
emodb.colnames = {"emotion": "stress"}
# Keep only these emotion categories
emodb.filter = [["stress", ["anger", "neutral", "sadness", "disgust"]]]
# Map emotions to stress labels
emodb.mapping = {"anger": "stress", "disgust": "stress", "neutral": "no stress", "sadness": "no stress"}
emodb.split_strategy = speaker_split
# Define final labels
labels = ["stress", "no stress"]
target = stress

[FEATS]
type = ['os']

[MODEL]
type = mlp
layers = [64, 12]
drop = [.3, .4]

[PLOT]
uncertainty_threshold = 0.3
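
Before running, it can help to check that the alignment options agree with each other. The sketch below is a standalone sanity check, not part of Nkululeko: it parses a shortened copy of the `[DATA]` section with Python's standard `configparser` and `ast.literal_eval`, then verifies that the target column is produced by `colnames`, that every mapped value appears in `labels`, and that every value kept by the filter has a mapping entry. The `check_alignment` function is a hypothetical helper.

```python
import ast
import configparser

CONFIG_TEXT = """
[DATA]
databases = ['emodb']
emodb.colnames = {"emotion": "stress"}
emodb.filter = [["stress", ["anger", "neutral", "sadness", "disgust"]]]
emodb.mapping = {"anger": "stress", "disgust": "stress", "neutral": "no stress", "sadness": "no stress"}
labels = ["stress", "no stress"]
target = stress
"""


def check_alignment(config_text, database):
    """Return a list of problems; an empty list means the options agree."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    data = parser["DATA"]
    labels = set(ast.literal_eval(data["labels"]))
    colnames = ast.literal_eval(data[f"{database}.colnames"])
    filters = ast.literal_eval(data[f"{database}.filter"])
    mapping = ast.literal_eval(data[f"{database}.mapping"])

    problems = []
    if data["target"] not in colnames.values():
        problems.append("target is not produced by colnames")
    for value in sorted(set(mapping.values()) - labels):
        problems.append(f"mapped value {value!r} is not in labels")
    for column, allowed in filters:
        if column == data["target"]:
            for value in sorted(set(allowed) - set(mapping)):
                problems.append(f"filtered value {value!r} has no mapping")
    return problems


print(check_alignment(CONFIG_TEXT, "emodb"))  # []
```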

Run the Experiment

python -m nkululeko.nkululeko --config examples/exp_emodb_stress.ini

Advanced: Combining Multiple Databases

You can combine databases with different label schemes by aligning them to a common target.

Example: EmoDB + SUSAS for Stress Detection

[DATA]
databases = ['emodb', 'susas']

# EmoDB configuration
emodb = ./data/emodb/emodb
emodb.target_tables = ["emotion"]
emodb.colnames = {"emotion": "stress"}
emodb.filter = [["stress", ["anger", "neutral", "happiness"]]]
emodb.mapping = {"anger": "stress", "neutral": "no stress", "happiness": "no stress"}
# Use all emodb for training
emodb.split_strategy = train

# SUSAS configuration
susas = ./data/susas/
# Map ternary stress labels to binary
susas.mapping = {'0,1': 'no stress', '2': 'stress'}
susas.split_strategy = speaker_split

target = stress
labels = ["stress", "no stress"]

Key Points

EmoDB is used only for training (`split_strategy = train`)
SUSAS is split into train/test (`split_strategy = speaker_split`)
Both databases use the same target labels (`stress`, `no stress`)
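
The key `'0,1'` in the SUSAS mapping reads as shorthand for sending both `0` and `1` to the same label. The sketch below works under that assumption; `expand_mapping` is a hypothetical helper, not Nkululeko's implementation. It expands the shorthand and checks that both databases end up with the same label set:

```python
def expand_mapping(mapping):
    """Split comma-separated keys so each original label gets its own entry."""
    expanded = {}
    for key, target in mapping.items():
        for part in key.split(","):
            expanded[part.strip()] = target
    return expanded


susas_mapping = expand_mapping({"0,1": "no stress", "2": "stress"})
emodb_mapping = {"anger": "stress", "neutral": "no stress", "happiness": "no stress"}

# Both databases should produce exactly the labels named in the config.
labels = {"stress", "no stress"}
aligned = set(susas_mapping.values()) == labels == set(emodb_mapping.values())
print(susas_mapping)  # {'0': 'no stress', '1': 'no stress', '2': 'stress'}
print(aligned)        # True
```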

Multi-Database Alignment with Root Files

For complex multi-database setups, use a separate configuration file for database roots:

Root Configuration: `data_roots.ini`

[DATA]
emodb = ./data/emodb/emodb
emodb.split_strategy = specified
emodb.test_tables = ['emotion.categories.test.gold_standard']
emodb.train_tables = ['emotion.categories.train.gold_standard']
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}

crema-d = ./data/crema-d/crema-d/1.3.0/
crema-d.split_strategy = specified
crema-d.colnames = {'sex':'gender'}
crema-d.target_tables = ['emotion.categories.desired.test','emotion.categories.desired.train']
crema-d.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}

Main Configuration

[EXP]
root = ./examples/results/multidb

[DATA]
databases = ['emodb', 'crema-d']
root_folders = ./examples/data_roots.ini
target = emotion
labels = ['angry', 'happy', 'sad', 'neutral']

[FEATS]
type = ['os']
scale = standard

[MODEL]
type = mlp

Configuration Reference

Option	Description	Example
`colnames`	Rename columns	`{"emotion": "stress"}`
`mapping`	Map label values	`{"anger": "stress", "neutral": "no stress"}`
`filter`	Filter samples by column values	`[["column", ["val1", "val2"]]]`
`target_tables`	Tables containing target labels	`["emotion"]`
`split_strategy`	How to split data	`train`, `test`, `speaker_split`, `random`, `specified`

Use Cases

Cross-domain transfer: Use emotion data for stress detection
Label harmonization: Combine databases with different label schemes
Data augmentation: Add out-of-domain data to training
Multi-corpus experiments: Train on multiple databases with aligned labels

Tips

In-domain data usually works better: Adding out-of-domain data doesn’t always help
Use a third database for evaluation: When combining databases, evaluate on held-out data that was not used for training
Check label distributions: Verify that the classes are still reasonably balanced after mapping
Document your mappings: Keep track of how labels were aligned
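
For the label-distribution tip, a quick count of the mapped labels is often enough. A minimal sketch with made-up labels (replace `original` with the labels from your own data):

```python
from collections import Counter

mapping = {"anger": "stress", "neutral": "no stress", "happiness": "no stress"}

# Hypothetical original labels, for illustration only.
original = ["anger", "anger", "neutral", "happiness", "neutral", "happiness", "neutral"]

counts = Counter(mapping[label] for label in original)
total = sum(counts.values())
for label, count in counts.most_common():
    print(f"{label}: {count} ({count / total:.0%})")
# no stress: 5 (71%)
# stress: 2 (29%)
```

Mapping several emotions onto one class, as here, tends to skew the distribution toward that class.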

Reference

Blog: How to align databases