Overview of options for the nkululeko framework
To be specified in an .ini file, using Python configparser syntax.
Most (though not all) values have defaults.
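A minimal sketch of what such a file can look like and how Python's standard configparser reads it. The keys and values are examples taken from the options below. Reading list-valued options as Python literals with ast.literal_eval is an assumption made for illustration, not a description of nkululeko's internals.

```python
import ast
import configparser

# Hypothetical minimal experiment file; keys and values are examples
# taken from the option list in this document.
EXAMPLE_INI = """
[EXP]
root = ./results/
name = emodb_exp

[DATA]
databases = ['emodb']
emodb = ./emodb/
emodb.split_strategy = speaker_split
target = emotion
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']

[FEATS]
type = ['os']

[MODEL]
type = xgb
"""

config = configparser.ConfigParser()
config.read_string(EXAMPLE_INI)

# configparser returns every value as a string; list-like values are
# turned into Python lists here with ast.literal_eval (an assumption).
databases = ast.literal_eval(config["DATA"]["databases"])
labels = ast.literal_eval(config["DATA"]["labels"])
feature_types = ast.literal_eval(config["FEATS"]["type"])
model_type = config["MODEL"]["type"]

# An option missing from the file falls back to a default value.
runs = config["EXP"].getint("runs", fallback=1)

print(databases, feature_types, model_type, runs, len(labels))
```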
Contents
Sections
EXP
General experiment settings: paths, naming, run count, and output options.
root: experiment root folder
root = ./results/
type: the kind of experiment
type = classification
possible values:
classification: supervised learning experiment with a restricted set of categories (e.g., emotion categories).
regression: supervised learning experiment with continuous values (e.g., speaker age in years).
store: (relative to root) folder for caches
store = ./store/
name: a name for debugging output
name = emodb_exp
fig_dir: (relative to root) folder for plots
fig_dir = ./images/
res_dir: (relative to root) folder for result output
res_dir = ./results/
models_dir: (relative to root) folder to save models
models_dir = ./models/
runs: number of runs (e.g., to average over random initializations)
runs = 1
epochs: number of epochs for ANN training
epochs = 1
save: save the experiment as a pickle file to be restored again later (True or False)
save = False
save_test: save the test predictions as a new database in CSV format (default is False)
save_test = ./my_saved_test_predictions.csv
databases: names of the databases to compare for the multidb module
databases = ['emodb', 'timit']
use_splits: for the multidb module: use the original split sets when a database serves as train or test database. Otherwise, the whole database is used.
use_splits = True
traindevtest: set to True if you want to specify an extra dev set that will be used for early stopping (patience) in neural net experiments.
traindevtest = False
sample_selection: select the samples to process (e.g. for augmentation, re-sampling, etc.): either train, test, or all
sample_selection = all
export_onnx: export the best trained model in ONNX format.
export_onnx = False
DATA
Database loading, label mapping, and train/test split configuration.
type: currently only a flag to mark continuous data, so that it can be binned into categorical data (using bins and labels)
type = continuous
databases: list of databases to be used in the experiment
databases = ['emodb', 'timit']
tests: datasets to be used as test data for the stored best model. The databases listed here do not have to appear in the databases field. When nkululeko.nkululeko is run with this option set and a saved experiment file already exists on disk, training is skipped entirely: the module loads the stored best model, evaluates it on the listed test databases, and writes a confusion matrix, a per-class text report, and a predictions CSV (with all original test columns plus a predicted column) to the results directory. On the very first run (no saved file yet), the module trains normally and saves the experiment. See test_new_database.md for a step-by-step guide.
tests = ['emovo']
tests = ['ravdess', 'cremad'] ; multiple databases are concatenated
root_folders: an additional configuration file holding all entries that start with a database name; these act as global defaults.
root_folders = data_roots.ini
db_name: path to the audformat repository for each database listed in databases. If this path is not absolute, it is treated as relative to the experiment folder.
emodb = /home/data/audformat/emodb/
db_name.type: type of storage, e.g., an audformat database or 'csv' (needs header: file, speaker, task)
emodb.type = audformat
db_name.absolute_path: only for 'csv' databases: are the audio file paths relative or absolute? If not absolute, they are treated as relative to the database parent folder, not the experiment root folder.
my_data.absolute_path = True
db_name.audio_path: only for 'csv' databases: are the audio files in a special common folder?
my_data.audio_path = wav_files/
db_name.mapping: a Python dictionary to map between categories for cross-database experiments (format: {'target_emo':'source_emo'})
emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}
It can also be used for general mapping:
emodb.mapping = {'gender':{'male':0, 'female':1}, 'emotion':{'anger':'stress', 'neutral':'no stress'}}
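One way to read the nested form: each top-level key names a column, and its inner dictionary maps that column's original values to new ones. The sketch applies such a mapping to plain Python rows (nkululeko itself works on pandas DataFrames). Keeping values that have no mapping entry unchanged is an assumption, as is parsing the value with ast.literal_eval; the function name apply_mapping is hypothetical.

```python
import ast

# The general-mapping value from the example above, as it appears in the .ini file.
raw_mapping = "{'gender':{'male':0, 'female':1}, 'emotion':{'anger':'stress', 'neutral':'no stress'}}"
mapping = ast.literal_eval(raw_mapping)


def apply_mapping(rows, mapping):
    """Replace values per column; values without a mapping entry are kept (assumption)."""
    mapped = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay unchanged
        for column, value_map in mapping.items():
            if new_row.get(column) in value_map:
                new_row[column] = value_map[new_row[column]]
        mapped.append(new_row)
    return mapped


# Hypothetical samples.
rows = [
    {"gender": "female", "emotion": "anger"},
    {"gender": "male", "emotion": "neutral"},
    {"gender": "male", "emotion": "boredom"},
]
print(apply_mapping(rows, mapping))
```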
db_name.columns: names of the columns to load from the data (only for audformat databases)
my_data.columns = ["age", "gender", "speaker", "diagnosis"]
db_name.label: name of the target variable for this database (if different from DATA.target)
my_data.label = "expression"
db_name.colnames: mapping to rename columns to standard names
my_data.colnames = {'speaker':'Participant ID', 'sex':'gender', 'Age': 'age'}
db_name.split_strategy: how to identify the sets for train/development/test data splits within one database
emodb.split_strategy = reuse
Possible values:
database: default (task.train, task.dev and task.test)
specified: specify the tables (an opportunity to assign multiple or no tables to train or dev set)
emodb.train_tables = ['emotion.categories.train.gold_standard']
emodb.dev_tables = ['emotion.categories.dev.gold_standard']
emodb.test_tables = ['emotion.categories.test.gold_standard']
speaker_split: split samples randomly but speaker-disjoint, given a percentage of speakers for the test (and dev) set.
emodb.test_size = 50 (default:20)
emodb.dev_size = 20 # for train-dev-test experiments
list of test speakers: simply provide a list of test speaker IDs
emodb.split_strategy = [12, 14, 15, 16]
speakers_stated: explicitly state the speaker names for all splits (test and dev are required)
emodb.test = [14, 8]
emodb.dev = [12, 15]
emodb.train = [3, 9, 10, 11, 13, 16]
random: split samples randomly (not speaker-disjoint, e.g., when no speaker information is given or each sample has its own speaker), given a percentage of samples for the test set.
emodb.test_size = 50 (default:20)
reuse: reuse the splits after a speaker_split run to save time with feature extraction.
train: use the entire database for training
test: use the entire database for evaluation / testing
dev: use the entire database for evaluation / development
balanced: stratify the data splits
balance = {'emotion':2, 'age':1, 'gender':1}
age_bins = 2
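speaker_split differs from random in that whole speakers, not individual samples, are assigned to the test set, so no speaker appears in both train and test. A minimal sketch of that idea for a train/test split (the dev set is omitted). The rounding rule, the seed, and the function name speaker_split are assumptions and not nkululeko's implementation.

```python
import random


def speaker_split(sample_speakers, test_size=20, seed=42):
    """Split sample indices speaker-disjointly.

    sample_speakers: the speaker ID of each sample
    test_size: percentage of *speakers* (not samples) that go to the test set
    """
    speakers = sorted(set(sample_speakers))
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, round(len(speakers) * test_size / 100))
    test_speakers = set(speakers[:n_test])
    train_idx = [i for i, spk in enumerate(sample_speakers) if spk not in test_speakers]
    test_idx = [i for i, spk in enumerate(sample_speakers) if spk in test_speakers]
    return train_idx, test_idx, test_speakers


# The ten emodb speaker IDs from the speakers_stated example, three samples each.
speakers = [3, 8, 9, 10, 11, 12, 13, 14, 15, 16]
samples = [spk for spk in speakers for _ in range(3)]
train_idx, test_idx, test_speakers = speaker_split(samples, test_size=20)
print(sorted(test_speakers), len(train_idx), len(test_idx))
```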
db_name.target_tables: tables that contain the target / speaker / sex labels
emodb.target_tables = ['emotion']
target_tables_append: set this to True if the multiple tables should be combined row-wise, else they are combined column-wise
target_tables_append = False
db_name.files_tables: tables that contain the audio file names
emodb.files_tables = ['files']
db_name.test_tables: tables that should be used for testing
emodb.test_tables = ['emotion.categories.test.gold_standard']
db_name.train_tables: tables that should be used for training
emodb.train_tables = ['emotion.categories.train.gold_standard']
db_name.as_test: use only the test split (for automatic experiments)
emodb.as_test = False
db_name.as_train: use only the train split (for automatic experiments)
emodb.as_train = False
db_name.limit_samples: maximum number of N random samples per table (mainly for testing with very large data)
emodb.limit_samples = 20
db_name.required: force a dataset to have a specific feature, i.e., keep only samples where this column is labeled (e.g., select only samples with a gender label in a database where not all samples have one, such as Mozilla Common Voice)
emodb.required = gender
db_name.limit_samples_per_speaker: maximum number of samples per speaker (for leveling data where the same speakers have a large number of samples)
emodb.limit_samples_per_speaker = 20
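limit_samples_per_speaker levels the data so that a few prolific speakers do not dominate. A sketch of one way to do this. Drawing the kept samples at random is an assumption (the option only states a maximum), and limit_per_speaker is a hypothetical name.

```python
import random
from collections import defaultdict


def limit_per_speaker(sample_speakers, max_per_speaker, seed=42):
    """Return the indices of the samples kept, at most max_per_speaker per speaker."""
    by_speaker = defaultdict(list)
    for idx, spk in enumerate(sample_speakers):
        by_speaker[spk].append(idx)
    rng = random.Random(seed)
    kept = []
    for idxs in by_speaker.values():
        if len(idxs) > max_per_speaker:
            # draw a random subset for speakers with too many samples
            idxs = rng.sample(idxs, max_per_speaker)
        kept.extend(idxs)
    return sorted(kept)


# Hypothetical data: speaker "a" dominates with 50 samples, "b" has 5.
sample_speakers = ["a"] * 50 + ["b"] * 5
kept = limit_per_speaker(sample_speakers, max_per_speaker=20)
print(len(kept))
```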
db_name.min_duration_of_sample: limit the samples to a minimum length (in seconds)
emodb.min_duration_of_sample = 0.0
db_name.max_duration_of_sample: limit the samples to a maximum length (in seconds)
emodb.max_duration_of_sample = 0.0
db_name.rename_speakers: add the database name to the speaker names, e.g., because several databases use the same names
emodb.rename_speakers = False
db_name.filter: don't use all the data, but only samples with selected values in the given columns, as {column: [values]}
emodb.filter = {'gender': ['female', 'diverse']}
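A sketch of the filter semantics on plain Python rows: a sample is kept only if its value in the filtered column is one of the listed values. Requiring all columns to match when several are given is an assumption, and apply_filter is a hypothetical name.

```python
import ast

# The filter value from the example above.
filter_spec = ast.literal_eval("{'gender': ['female', 'diverse']}")


def apply_filter(rows, filter_spec):
    """Keep rows whose value in every filtered column is one of the allowed values."""
    return [
        row for row in rows
        if all(row.get(column) in allowed for column, allowed in filter_spec.items())
    ]


# Hypothetical samples.
rows = [
    {"file": "a.wav", "gender": "female"},
    {"file": "b.wav", "gender": "male"},
    {"file": "c.wav", "gender": "diverse"},
]
print([row["file"] for row in apply_filter(rows, filter_spec)])
```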
db_name.scale: scale (standard normalize) the target variable (if numeric)
my_data.scale = True
db_name.reverse: reverse the target variable (if numeric), i.e., f(x) = abs(x - max)
db_name.reverse.max: max value to be used in the formula above. If omitted, the reversed distribution starts at 0 (i.e., the maximum of the data is used).
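The reversal formula with and without an explicit maximum, as a small sketch. Using the data maximum when no max is given is inferred from "the distribution will start with 0"; reverse_target is a hypothetical name.

```python
def reverse_target(values, max_value=None):
    """Reverse numeric targets with f(x) = abs(x - max).

    If max_value is None, the data maximum is used (inferred default),
    so the reversed values start at 0.
    """
    if max_value is None:
        max_value = max(values)
    return [abs(x - max_value) for x in values]


# Hypothetical ratings on a 1..5 scale.
print(reverse_target([1, 3, 5]))                # data maximum used
print(reverse_target([1, 3, 5], max_value=10))  # explicit db_name.reverse.max
```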
target: the task name, e.g. age or emotion
target = emotion
labels: for classification experiments: the names of the categories (is also used for regression when binning the values)
labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']
bins: array of integers to be used for binning continuous data
bins = [-100, 40, 50, 60, 70, 100]
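With six edges, the bins example defines five intervals, so labels must contain five names for the binned target. A sketch of the value-to-label mapping. It assumes right-closed intervals as in pandas.cut's default; whether nkululeko closes intervals on the right is an assumption, and the label names are hypothetical.

```python
import bisect

bins = [-100, 40, 50, 60, 70, 100]
# Hypothetical category names: one label per interval, i.e. len(bins) - 1 labels.
labels = ["<=40", "41-50", "51-60", "61-70", ">70"]


def bin_value(x, bins, labels):
    """Return the label of the right-closed interval (bins[i-1], bins[i]] containing x."""
    if len(labels) != len(bins) - 1:
        raise ValueError("need exactly one label per interval")
    if not bins[0] < x <= bins[-1]:
        raise ValueError(f"{x} lies outside the bin edges")
    # bisect_left returns the index of the first edge >= x
    return labels[bisect.bisect_left(bins, x) - 1]


ages = [25, 40, 45, 70, 83]
print([bin_value(age, bins, labels) for age in ages])
```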
no_reuse: don’t re-use any tables, but start fresh
no_reuse = False
min_dur_test: specify a minimum duration for test samples (in seconds)
min_dur_test = 3.5
target_divide_by: divide the target values by some factor, e.g., to scale age in years to a range of 0 to 1
target_divide_by = 100
limit_samples: maximum number of random N samples per sample selection
limit_samples = 20
limit_samples_per_speaker: maximum number of samples per speaker per sample selection
limit_samples_per_speaker = 20
min_duration_of_sample: limit the samples to a minimum length (in seconds) per sample selection
min_duration_of_sample = 0.0
max_duration_of_sample: limit the samples to a maximum length (in seconds) per sample selection
max_duration_of_sample = 0.0
check_size: check the filesize of all samples in train and test splits in bytes
check_size = 1000
check_vad: check if the files contain speech, using silero VAD
check_vad = True
filter.sample_selection: restrict the filters to either [train, test, all]
filter.sample_selection=all
AUGMENT
Data augmentation options to artificially expand the training set.
augment: select the augmentation methods: traditional, auglib, and/or random_splice
augment = ['traditional', 'auglib', 'random_splice']
choices are:
traditional: uses the audiomentations package
auglib: uses audEERING’s auglib package
random_splice: randomly re-orders short splices (obfuscates the words)
p_reverse: for random_splice: probability of some samples to be in reverse order (default: 0.3)
top_db: for random_splice: top db level for silence to be recognized (default: 12)
result: file name to store the augmented data (can then be added to training)
result = augmented.csv
augmentations: select the augmentation methods for the audiomentations package (a default is provided)
augmentations = Compose([AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),Shift(p=0.5),BandPassFilter(min_center_freq=100.0, max_center_freq=6000),])
transformations: select the augmentation methods for the auglib package. Defaults to ["room", "music", "noise", "babble", "crop", "cough"]
transformations = ['music', 'room', 'cough']
SEGMENT
Audio segmentation settings for splitting recordings into smaller chunks (e.g., by silence or fixed duration).
result: name of the resulting segmented data table. Additionally, a segment file with the gaps is generated: segmented_silence.csv.
result = segmented.csv
method: select the segmentation model
method = silero
min_length: the minimum length of rest samples (in seconds)
min_length = 2
max_length: the maximum length of segments; longer ones are cut here. (in seconds)
max_length = 10 # if not set, original segmentation is used
output_audio: export actual audio files for each detected segment (default: False)
output_audio = True
audio_format: output audio format when output_audio is True (default: wav)
audio_format = wav # supported values: wav, flac, mp3
audio_dir: output directory for audio segments, relative to the experiment data directory ({root}/{name}; default: segments)
audio_dir = segments
sampling_rate: resample exported audio segments to this rate in Hz; omit to preserve the original sample rate
sampling_rate = 16000
include_silence_borders: for the result file that represents the gaps between speech: include the borders?
include_silence_borders = False
FEATS
Feature extraction settings. Multiple feature types can be combined by listing them together; they are concatenated column-wise.
type: a comma-separated list of types of features; they will be column-wise concatenated
type = ['os']
possible values:
import: already computed features
import_file: paths to files with features in CSV format
import_file = ['path1/file1.csv', 'path2/file1.csv2']
import_files_append: set this to False if you want the files to be concatenated column-wise; otherwise, they are concatenated row-wise
import_files_append = True
mld: MLD features
mld.model = path to the mld sources folder
mld.df = MLD class to use for feature extraction (default: Mld)
accepted values: Mld, MldSust, MldStruct
example:
mld.df = MldSust
min_syls = minimum number of syllables
os: openSMILE features
set = eGeMAPSv02 (feature set)
level = functionals (or lld: feature level)
os.features: list of selected features (disregard others)
praat: Praat features, selected using David R. Feinberg's scripts
praat.features: list of selected features (disregard others)
spectra: Mel spectrograms for convolutional networks
fft_win_dur = 25 (msec analysis frame/window length)
fft_hop_dur = 10 (msec hop duration)
fft_nbands = 64 (number of frequency bands)
ast: audio spectrogram transformer features from MIT
wav2vec variants: wav2vec2 embeddings from Facebook
"wav2vec2-large-robust-ft-swbd-300h"
wav2vec2.model = path to the wav2vec2 model folder
wav2vec2.layer = which last hidden layer to use
bert variants: Bert embeddings
bert.model = path to the bert model folder (without the google-bert/ prefix)
bert.layer = which last hidden layer to use
bert.text_column = which column to use for text analysis
bert.text_column = text
HuBERT variants: Facebook HuBERT models
"hubert-base-ls960", "hubert-large-ll60k", "hubert-large-ls960-ft", "hubert-xlarge-ll60k", "hubert-xlarge-ls960-ft"
WavLM:
"wavlm-base", "wavlm-base-plus", "wavlm-large"
Whisper: Whisper models
"whisper-base", "whisper-large", "whisper-medium", "whisper-tiny"
audmodel: generic audmodel format model import
audmodel.id = audmodel id
audmodel.embeddings_name = hidden_states
audwav2vec2: audEERING emotion model embeddings (wav2vec 2.0 model fine-tuned on MSP-Podcast emotions)
aud.model = ./audmodel/ (path to the audEERING model folder)
auddim: audEERING emotion model dimensions: arousal, dominance, and valence predicted by a wav2vec 2.0 model fine-tuned on MSP-Podcast
agender: audEERING age and gender model embeddings (wav2vec 2.0 model fine-tuned on several age databases)
agender.model = ./agender/ (path to the audEERING model folder)
agender_agender: audEERING age and gender model predictions: age, female, male, child (wav2vec 2.0 model fine-tuned on several age and gender databases)
clap: LAION's CLAP embeddings
xbow: openXBOW bag-of-words features, with a codebook computed from openSMILE features
xbow.model = path to xbow root folder (containing xbow.jar)
size = 500 (codebook size, rule of thumb: should grow with datasize)
assignments = 10 (number of words in the bag representation where the counter is increased for each input LLD, rule of thumb: should grow/shrink with codebook size)
snr: estimated SNR (signal-to-noise ratio)
mos: estimated MOS (mean opinion score)
pesq: estimated PESQ (Perceptual Evaluation of Speech Quality)
sdr: estimated SDR (Signal-to-Distortion Ratio)
spkrec: speaker ID: SpeechBrain embeddings
stoi: estimated STOI (Short-Time Objective Intelligibility)
squim: TorchAudio SQUIM (Speech Quality and Intelligibility Measures)
wav2vec2: Facebook’s wav2vec2 models
whisper: OpenAI’s Whisper ASR model
audmodel: audEERING’s models
balancing: balance the data with respect to class distribution
balancing = smote
possible values:
ros: Random Over Sampler
smote: SMOTE
adasyn: ADASYN
borderlinesmote: Borderline SMOTE
svmsmote: SVM SMOTE
smoteenn: SMOTE + Edited Nearest Neighbours
smotetomek: SMOTE + Tomek links
clustercentroids: Cluster Centroids
randomundersampler: Random Under Sampler
editednearestneighbours: Edited Nearest Neighbours
tomeklinks: Tomek Links
scale: scale (standardize/normalize) the features
scale = standard
possible values:
standard: z-transformation (mean of 0 and std of 1) based on the training set
robust: robust scaler
speaker: like standard but based on individual speaker sets (also for the test)
bins: convert feature values into 0, .5 and 1 (for low, mid and high)
minmax: rescales the data set such that all feature values are in the range [0, 1]
maxabs: scales each feature by its maximum absolute value, so that values lie in [-1, 1] (or in [0, 1] if only positive values are present)
normalizer: scales each sample (row) individually to have unit norm (e.g., L2 norm)
powertransformer: applies a power transformation to each feature to make the data more Gaussian-like in order to stabilize variance and minimize skewness
quantiletransformer: applies a non-linear transformation such that the probability density function of each feature will be mapped to a uniform or Gaussian distribution (range [0, 1])
set: name of opensmile feature set, e.g. eGeMAPSv02, ComParE_2016, GeMAPSv01a, eGeMAPSv01a
set = eGeMAPSv02
level: level of opensmile features
level = functional
possible values:
functional: aggregated over the whole utterance
lld: low-level descriptor: framewise
no_reuse: don’t re-use any feature files, but start fresh
no_reuse = False
features: disregard all other features and only use the ones stated here.
features = ['speechrate(nsyll / dur)', 'F0semitoneFrom27.5Hz_sma3nz_amean']
needs_feature_extraction: force the features to be freshly extracted
needs_feature_extraction = False
print_feats: set this to False if you don’t want os and praat feature names to be printed out
print_feats = True
store_format: in which format to store the feature data frames [pkl | csv]
store_format = pkl
MODEL
Model and training specifications. In general, default values should work for classification tasks.
type: select the model
type = xgb
possible values:
xgb: XGBoost
xgr: XGBoost for regression
svr: Support vector machine for regression
knn: k nearest neighbors
knn_reg: k nearest neighbors for regression
tree: Decision tree
tree_reg: Decision tree for regression
nb: Naive Bayes
mlp: Multi-layer perceptron (neural network)
finetune: fine-tuning of pre-trained models:
pretrained_model: Hugging Face model to use as the base model
push_to_hub: True
max_duration: 8 (in seconds; the rest is discarded)
balancing: smote (as in FEATS; for finetune, it needs to be defined here)
class_weight: add class_weight to the linear classifier (XGB, SVM) fit methods for imbalanced data (True or False)
class_weight = False
logo: leave-one-speaker-group-out. Disregards the train/dev splits, splits the speakers into logo groups, and then performs a LOGO evaluation. If you want LOSO (leave one speaker out), simply set the number to the number of speakers.
logo = 10
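A sketch of how speakers can be partitioned for a LOGO evaluation, and why setting logo to the number of speakers gives LOSO. The round-robin assignment of speakers to groups and the name logo_folds are assumptions; nkululeko may form the groups differently.

```python
def logo_folds(speakers, n_groups):
    """Yield (train_speakers, test_speakers) for leave-one-speaker-group-out.

    Speakers are assigned to groups round-robin after sorting; with
    n_groups equal to the number of speakers this becomes LOSO.
    """
    unique = sorted(set(speakers))
    if not 1 < n_groups <= len(unique):
        raise ValueError("n_groups must be between 2 and the number of speakers")
    groups = [unique[i::n_groups] for i in range(n_groups)]
    for test_group in groups:
        test = set(test_group)
        yield set(unique) - test, test


# Hypothetical: ten speakers, logo = 5 -> five folds with two test speakers each.
speakers = [3, 8, 9, 10, 11, 12, 13, 14, 15, 16]
for train, test in logo_folds(speakers, n_groups=5):
    print(sorted(test))
```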
k_fold_cross: k-fold cross-validation. Disregards the train/dev splits and performs a stratified cross-validation (meaning that classes are balanced across folds). Speaker IDs are ignored.
k_fold_cross = 10
learning_rate: learning rate for neural networks
learning_rate = 0.0001
optimizer: optimizer type for neural networks (case insensitive)
optimizer = adam
possible values:
adam: Adam optimizer (default)
adamw: AdamW optimizer with weight decay
sgd: SGD optimizer with momentum
related parameters:
weight_decay: weight decay for AdamW optimizer (default: 0.01)
weight_decay = 0.01
momentum: momentum for SGD optimizer (default: 0.9)
momentum = 0.9
scheduler: learning rate scheduler for neural networks (case insensitive)
scheduler = cosine
possible values:
cosine: cosine annealing with linear warmup (default); steps per batch
step: step decay — reduces LR by gamma every step_size epochs; steps per epoch
exponential: exponential decay — multiplies LR by gamma each epoch; steps per epoch
none / false: no scheduler
related parameters:
warmup_epochs: number of warmup epochs for cosine scheduler (default: 5)
warmup_epochs = 5
scheduler.step_size: epoch interval for step scheduler (default: 10)
scheduler.step_size = 10
scheduler.gamma: decay factor for step and exponential schedulers
step default: 0.5; exponential default: 0.95
scheduler.gamma = 0.5
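The step and exponential schedules have simple closed forms. PyTorch's StepLR and ExponentialLR follow the same forms when stepped once per epoch. A sketch using the documented defaults. The cosine-with-warmup schedule is left out because its per-batch details are not specified here. The function names are hypothetical.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """'step': multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def exponential_lr(base_lr, epoch, gamma=0.95):
    """'exponential': multiply the learning rate by gamma after every epoch."""
    return base_lr * gamma ** epoch


base_lr = 0.0001  # learning_rate from the example above
for epoch in (0, 9, 10, 20):
    print(epoch, step_lr(base_lr, epoch), exponential_lr(base_lr, epoch))
```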
drop: dropout rate for neural networks (0 to 1)
drop = 0.1
batch_size: batch size for neural networks
batch_size = 8
loss: loss function for neural networks
loss = cross
possible values:
bce: BinaryCrossEntropyLoss (for binary classification)
cross: CrossEntropyLoss
f1: F1 loss
focal: Focal loss (for imbalanced classification)
1-ccc: 1 minus the concordance correlation coefficient (for regression)
mse: Mean squared error (for regression)
mae: Mean absolute error (for regression)
weighted_bce: Weighted BinaryCrossEntropyLoss (for imbalanced binary classification)
label_smoothing: label smoothing for cross-entropy loss. Accepts either a boolean or a float in [0.0, 1.0]. Helps prevent overconfidence and can improve generalization.
label_smoothing = 0.1
Set to True to use the default value of 0.1, or to a float between 0.0 and 1.0 for a custom smoothing factor.
Invalid or out-of-range values fall back to 0.0 with a warning
Default: 0.0 (no smoothing)
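A sketch of the documented value handling, together with the smoothed target distribution that PyTorch's CrossEntropyLoss(label_smoothing=eps) trains against. With eps spread over K classes, the true class gets 1 - eps + eps/K and every other class gets eps/K. Passing the value on to PyTorch is an assumption, and parse_label_smoothing and smoothed_targets are hypothetical re-implementations, not nkululeko's code.

```python
import warnings


def parse_label_smoothing(raw):
    """Interpret the .ini value: True -> 0.1, False -> 0.0, else a float in [0.0, 1.0]."""
    text = str(raw).strip().lower()
    if text == "true":
        return 0.1
    if text == "false":
        return 0.0
    try:
        value = float(text)
    except ValueError:
        warnings.warn(f"invalid label_smoothing {raw!r}, using 0.0")
        return 0.0
    if not 0.0 <= value <= 1.0:
        warnings.warn(f"label_smoothing {value} out of range, using 0.0")
        return 0.0
    return value


def smoothed_targets(true_class, n_classes, smoothing):
    """Target distribution as in PyTorch's CrossEntropyLoss(label_smoothing=...)."""
    off_value = smoothing / n_classes
    return [
        1.0 - smoothing + off_value if k == true_class else off_value
        for k in range(n_classes)
    ]


eps = parse_label_smoothing("0.1")
print(smoothed_targets(true_class=0, n_classes=4, smoothing=eps))
```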
measure: A measure/metric to report progress with experiments. For classification, default is UAR. For regression, default is MSE.
measure = mse
possible values:
uar: Unweighted Average Recall (default for classification)
eer: Equal Error Rate (for binary classification, commonly used in biometric systems and deepfake detection)
mse: Mean Squared Error (default for regression)
mae: Mean Absolute Error (for regression)
ccc: Concordance Correlation Coefficient (for regression)
Note: When EER is specified, both EER and UAR will be reported
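UAR and CCC are less familiar than accuracy and MSE, so here is a small sketch of both (the 1-ccc loss above is simply 1 minus this CCC). UAR averages the per-class recalls, so a minority class counts as much as a majority class. CCC also penalizes offsets and scale differences that Pearson correlation ignores. These are standard textbook definitions written for illustration, not nkululeko's code; EER is not covered.

```python
from statistics import mean


def uar(y_true, y_pred):
    """Unweighted average recall: the mean of the per-class recalls."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return mean(recalls)


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population statistics)."""
    mx, my = mean(y_true), mean(y_pred)
    vx = mean((x - mx) ** 2 for x in y_true)
    vy = mean((y - my) ** 2 for y in y_pred)
    cov = mean((x - mx) * (y - my) for x, y in zip(y_true, y_pred))
    return 2 * cov / (vx + vy + (mx - my) ** 2)


# Imbalanced toy example: accuracy is 3/4, but UAR weights both classes equally.
print(uar(["a", "a", "a", "b"], ["a", "a", "b", "b"]))
# A constant offset lowers CCC although the Pearson correlation is 1.
print(ccc([1, 2, 3], [2, 3, 4]))
```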
activation: the activation function for MLPs. One of ["relu", "sigmoid", "tanh", "leaky_relu"]
activation = relu
layers: specify the layer architecture for MLP
layers = [64, 16]
C_val: regularization value for SVM
C_val = 1.0
gamma: gamma value for SVM (kernel coefficient)
gamma = scale
kernel: kernel type for SVM
kernel = rbf
possible values: linear, poly, rbf, sigmoid
K_val: number of neighbors for KNN
K_val = 5
weights: weight function for KNN
weights = uniform
possible values: uniform, distance
n_estimators: number of trees for tree-based models (XGBoost, Random Forest)
n_estimators = 100
max_depth: maximum depth of trees
max_depth = 6
subsample: subsample ratio for XGBoost
subsample = 1.0
colsample_bytree: subsample ratio of columns for XGBoost
colsample_bytree = 1.0
random_seed: random seed for reproducible results
random_seed = 42 # set this to False if runs > 1
device: device for neural network training
device = cpu
possible values: cpu, cuda
patience: early stopping patience for neural networks
patience = 5
save: set this to False if you don’t want models stored on disk
save = True
EXPL
Feature exploration and visualisation options, used by the explore module.
feature_distributions: plot distributions for features and analyze importance
feature_distributions = False
ignore_gender: ignore gender when plotting feature distribution
ignore_gender = False
model: Which model to use to estimate feature importance.
model = ['log_reg'] # can be any model from the MODEL section; if several are combined, the mean result is used.
max_feats: maximum number of important features
max_feats = 10
permutation: use feature permutation to determine the best features. Make sure to test the models first.
permutation = True
scatter: make a scatter plot of combined train and test data, colored by label.
scatter = ['tsne', 'umap', 'pca']
scatter.target: target for the scatter plot (defaults to target value).
scatter.target = ['age', 'gender', 'likability']
scatter.dim: dimension of reduction, can be 2 or 3.
scatter.dim = 2
plot_tree: Plot a decision tree for classification (Requires model = tree)
plot_tree = False
value_counts: plot distributions of target for the samples and speakers (in the image_dir)
value_counts = [['gender'], ['age'], ['age', 'duration']]
column.bin_reals: if the column contains real numbers (instead of categories), should it be binned? Applies to any value in value_counts as well as to the target variable.
age.bin_reals = True
dist_type: type of plot for value counts, either histogram (hist) or density estimation (kde)
dist_type = kde
spotlight: open a web-browser window to inspect the data with the spotlight software. Needs package renumics-spotlight to be installed!
spotlight = False
shap: compute SHAP values; the model needs to be run first.
shap = False
print_stats: whether (possibly extensive) results from statistical tests should be printed out on the debug channel
print_stats = False
print_colvals: print the unique values for all columns in the data
print_colvals = False
plot_features: plot distributions for these features in any case, irrespective of their importance
plot_features = ["speechrate", "mean_f0"]
regplot: make scatter plots for two features and show categories. When only two features are given, the target is used as category; otherwise, a category column can be stated as third value.
regplot = [["feat_a", "feat_b"], ["feat_a", "feat_b", "emotion"], ["feat_a", "feat_b", "age"]]
PREDICT
Automatic soft-label prediction using pre-trained models (e.g., age, gender, arousal).
targets: Speaker/speech characteristics to be predicted by some models
targets = ['text', 'translation', 'textclassification', 'speaker', 'gender', 'age', 'snr', 'arousal', 'valence', 'dominance', 'pesq', 'mos']
textclassifier.candidates: for target textclassification: the labels of the categories that should be predicted (using joeddav/xlm-roberta-large-xnli)
textclassifier.candidates = ["sadness", "anger", "neutral"]
target_language: target language for the translation prediction
target_language = en
EXPORT
Options for exporting the dataset (audio files and annotations) to a new location or format.
target_root: New root directory for the database, will be created
target_root = ./exported_data/
orig_root: Path to folder that is parent to the original audio files
orig_root = ../data/emodb/wav
data_name: Name for the CSV file
data_name = exported_database
segments_as_files: Whether original files should be used, or segments split (resulting potentially in many new files).
segments_as_files = False
bundle_path: output directory for the portable model bundle created by python -m nkululeko.bundle. Defaults to <root>/<name>/export. Overridden by the --output CLI flag.
bundle_path = ./my_polish_bundle
CROSSDB
Cross-database experiment settings for evaluating generalisation across datasets.
train_extra: add an additional training partition to all experiments in the cross-database series. This extra data should be described in a root_folders file.
train_extra = ['addtrain_db_1', 'addtrain_db_2']
PLOT
Plot styling and output options for result figures.
name: special name as a prefix for all plots (stored in img_dir).
name = my_special_config_within_the_experiment
epochs: whether to make a plot for every epoch result.
epochs = False
anim_progression: generate an animated GIF from the epoch plots
anim_progression = False
fps: frames per second for the animated GIF
fps = 1
epoch_progression: plot the progression of test, train and loss results over epochs
epoch_progression = False
best_model: search for the best performing model and plot conf matrix (needs MODEL.store to be turned on)
best_model = False
combine_per_speaker: print an extra confusion plot where the predictions per speaker are combined, with either the mode or the mean function
combine_per_speaker = mode
format: format for plots, either png or eps (for scalable graphics)
format = png
ccc: show concordance correlation coefficient in plot headings
ccc = False
fill_areas: should areas, e.g. in distribution plots, be filled?
fill_areas = False
uncertainty_threshold: plot a confusion matrix with samples removed that are less uncertain
uncertainty_threshold = .6
runs_compare: generate plots to compare the run results: compare features, models or databases
runs_compare = features
titles: if titles should be added to the plots
titles = True
kind: kind of plot for EXPL.feature_distributions: [violin, bar, box, swarm, strip]
kind = violin
RESAMPLE
Audio resampling settings for converting sample rates across a dataset.
replace: whether samples should be replaced right where they are, or copies done and a new dataframe given
replace = False
target: the name of the new dataframe, if replace = False
target = data_resampled.csv
REPORT
Controls how experiment results are collected, displayed, and persisted.
show: print the report at the end
show = False
fresh: start a new report
latex: generate a latex and PDF document: name of document
latex = my_latex_document
title: title for document
author: author for document
OPTIM
Hyperparameter optimisation settings for automated model tuning.
model: the model type to optimize (e.g., 'mlp', 'svm', 'xgb')
model = mlp
search_strategy: intelligent search strategy for faster optimization
search_strategy = random
possible values:
grid: exhaustive grid search (default, slowest but thorough)
random: random search with n_iter samples (faster, often as good as grid)
halving_random: successive halving random search (fastest, requires sklearn >= 0.24)
halving_grid: successive halving grid search (compromise between speed and thoroughness)
metric: evaluation metric for optimization
metric = uar
possible values:
uar: Unweighted Average Recall (balanced accuracy, good for imbalanced datasets)
accuracy: Standard accuracy (default)
f1: Macro-averaged F1-score (balance of precision and recall)
precision: Macro-averaged precision
recall: Macro-averaged recall
sensitivity: Sensitivity (same as recall)
specificity: Specificity (true negative rate)
n_iter: number of parameter combinations to try for random search
n_iter = 50
cv_folds: number of cross-validation folds for hyperparameter evaluation
cv_folds = 3
Parameter specifications: Define search spaces for hyperparameters using tuples for ranges and lists for discrete choices
nlayers: number of hidden layers for neural networks
nlayers = (1, 3) # search from 1 to 3 layers
nnodes: number of nodes per layer for neural networks
nnodes = (16, 256) # search powers of 2 from 16 to 256
lr: learning rate for neural networks
lr = [0.0001, 0.001, 0.01, 0.1] # discrete log-scale choices (recommended)
lr = (0.0001, 0.01) # or range with automatic log-scale sampling
bs: batch size for neural networks
bs = (2, 256) # search powers of 2 from 2 to 256
loss: loss function for neural networks
loss = ["cross", "f1"] # discrete choices
do: dropout rate for neural networks
do = (0.1, 0.5, 0.1) # search from 0.1 to 0.5 with step 0.1
Traditional ML parameters: For SVM, XGB, etc., use parameter names from sklearn
C = [0.1, 1.0, 10.0] # SVM regularization parameter
n_estimators = [50, 100, 200] # XGB number of estimators
max_depth = [3, 6, 9] # XGB maximum depth
Parameter specification formats:
(min, max): Range with automatic step selection based on parameter type
For learning rates: uses logarithmic sampling (5-8 values)
For dropout: uses linear sampling (5 values)
For integers: uses linear sampling
(min, max, step): Range with explicit step size
[val1, val2, …]: Discrete list of values to try (recommended for most cases)
value: Single value (equivalent to [value])
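The formats above can be read as rules for turning a value string into a list of candidate values. A sketch of one such reading. Several details are assumptions, not nkululeko's code: five values for log-scale learning-rate ranges (the text only says 5-8), step 1 for integer ranges, and the function name expand. The powers-of-2 rule for nnodes and bs follows the comments in this section.

```python
import ast
import math

# Per the comments in this section, nnodes and bs ranges are searched in powers of 2.
POWER_OF_TWO_PARAMS = {"nnodes", "bs"}
N_FLOAT_STEPS = 5  # assumption: the docs say "5-8 values" for learning rates


def expand(name, raw):
    """Turn an OPTIM value string into the list of candidate values."""
    spec = ast.literal_eval(raw)
    if isinstance(spec, list):                      # [val1, val2, ...]
        return spec
    if isinstance(spec, tuple) and len(spec) == 3:  # (min, max, step)
        lo, hi, step = spec
        n = int(round((hi - lo) / step))
        return [round(lo + i * step, 10) for i in range(n + 1)]
    if isinstance(spec, tuple) and len(spec) == 2:  # (min, max)
        lo, hi = spec
        if name in POWER_OF_TWO_PARAMS:
            values, v = [], lo
            while v <= hi:
                values.append(v)
                v *= 2
            return values
        if isinstance(lo, int) and isinstance(hi, int):
            return list(range(lo, hi + 1))
        if name == "lr":                            # logarithmic sampling
            ratio = hi / lo
            return [lo * ratio ** (i / (N_FLOAT_STEPS - 1)) for i in range(N_FLOAT_STEPS)]
        step = (hi - lo) / (N_FLOAT_STEPS - 1)      # linear sampling, e.g. dropout
        return [lo + i * step for i in range(N_FLOAT_STEPS)]
    return [spec]                                   # single value


search_space = {"nlayers": "(1, 3)", "nnodes": "(16, 256)", "do": "(0.1, 0.5, 0.1)"}
grid = {name: expand(name, raw) for name, raw in search_space.items()}
print(grid)
print("grid search runs:", math.prod(len(v) for v in grid.values()))
```

For grid search, the number of runs is the product of the list lengths (here 3 x 5 x 5), which is why random or halving strategies become attractive as the space grows.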
Recommended parameter ranges:
Learning rate: [0.0001, 0.001, 0.01, 0.1] (log-scale discrete values)
Dropout: [0.1, 0.3, 0.5, 0.7] (common dropout rates)
SVM C: [0.1, 1.0, 10.0, 100.0] (regularization parameter)
XGB n_estimators: [50, 100, 200] (number of trees)
XGB max_depth: [3, 6, 9, 12] (tree depth)
Usage: Run with python3 -m nkululeko.optim --config exp.ini
FLAGS
Run several parameter values at once. All listed parameters are combined via Cartesian product: one experiment is run per combination. Example:
models = ['xgb', 'svm']
features = ['praat', 'os']
balancing = ['none', 'ros', 'smote']
scale = ['none', 'standard', 'robust', 'minmax']
name_target = list of (EXP.name, DATA.target) pairs iterated as a unit
Each pair sets EXP.name and DATA.target together for one experiment slot; that slot is then combined via product with any other FLAGS parameters. Label DataFrames are reloaded per pair; audio features are extracted once and reused across all pairs.
example:
name_target = [("grade", "grade"), ("roughness", "roughness"), ("strain", "strain")]
models = ['xgb', 'mlp']
→ runs 3 × 2 = 6 experiments
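The run counts follow directly from the Cartesian product. A sketch with itertools.product that enumerates the combinations from both FLAGS examples. A name_target pair counts as one value, so it contributes a single factor. The dictionary keys used for each experiment are illustrative only.

```python
from itertools import product

# The first FLAGS example above, one list per parameter.
flags = {
    "models": ["xgb", "svm"],
    "features": ["praat", "os"],
    "balancing": ["none", "ros", "smote"],
    "scale": ["none", "standard", "robust", "minmax"],
}
combinations = [dict(zip(flags, values)) for values in product(*flags.values())]
print(len(combinations))  # 2 * 2 * 3 * 4 combinations

# name_target pairs move as one unit, so they contribute one factor to the product.
name_target = [("grade", "grade"), ("roughness", "roughness"), ("strain", "strain")]
models = ["xgb", "mlp"]
experiments = [
    {"EXP.name": name, "DATA.target": target, "model": model}
    for (name, target), model in product(name_target, models)
]
print(len(experiments))  # 3 * 2 experiments
```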
The FLAGS mechanism can also drive the explore module (feature analysis / visualisation) instead of model training. Pass --mod explore on the command line:
python -m nkululeko.flags --config exp.ini --mod explore
No result score is reported; output plots are stored per experiment under {EXP.root}/{EXP.name}/images/.