Overview of options for the nkululeko framework

  • Options are specified in an .ini file, using Python configparser syntax

  • Nearly all values have defaults
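
As a minimal sketch of what "configparser syntax with defaults" means in practice, the snippet below parses a small config with Python's standard configparser module and falls back to a default when an option is absent. The config text is a made-up example, and the fallback values are simply the example values listed in this overview; how nkululeko itself resolves defaults internally is not shown here.

```python
import configparser

# Hypothetical minimal config; section and option names follow this overview.
INI_TEXT = """
[EXP]
root = ./results/
name = emodb_exp

[DATA]
target = emotion
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Options present in the file are returned as strings.
name = config["EXP"]["name"]

# Options missing from the file fall back to a value chosen by the caller.
# The fallbacks below are assumptions taken from the examples in this overview.
runs = config.getint("EXP", "runs", fallback=1)
exp_type = config.get("EXP", "type", fallback="classification")

print(name, runs, exp_type)
```

Only the options that differ from the defaults need to appear in the file, which keeps experiment configs short.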

Sections

EXP

General experiment settings: paths, naming, run count, and output options.

  • root: experiment root folder

    • root = ./results/

  • type: the kind of experiment

    • type = classification

    • possible values:

      • classification: supervised learning experiment with a restricted set of categories (e.g., emotion categories).

      • regression: supervised learning experiment with continuous values (e.g., speaker age in years).

  • store: (relative to root) folder for caches

    • store = ./store/

  • name: a name for debugging output

    • name = emodb_exp

  • fig_dir: (relative to root) folder for plots

    • fig_dir = ./images/

  • res_dir: (relative to root) folder for result output

    • res_dir = ./results/

  • models_dir: (relative to root) folder to save models

    • models_dir = ./models/

  • runs: number of runs (e.g., to average over random initializations)

    • runs = 1

  • epochs: number of epochs for ANN training

    • epochs = 1

  • save: save the experiment as a pickle file to be restored again later (True or False)

    • save = False

  • save_test: save the test predictions as a new database in CSV format (default is False)

    • save_test = ./my_saved_test_predictions.csv

  • databases: names of the databases to compare in the multidb module

    • databases = ['emodb', 'timit']

  • use_splits: for the multidb module: use the original split sets of a database when it serves as train or test database. Otherwise, the whole database is used.

    • use_splits = True

  • traindevtest: set to true if you want to specify an extra dev set, that will be used for early stopping (patience) in neural net experiments.

    • traindevtest = False

  • sample_selection: select the samples to process (e.g. for augmentation, re-sampling, etc.): either train, test, or all

    • sample_selection = all

  • export_onnx: export the best trained model in ONNX format.

    • export_onnx = False
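
The folder options above are all relative paths. The sketch below shows one way such paths could be resolved, assuming that folders end up under {root}/{name}, as the SEGMENT and FLAGS descriptions further down suggest. The helper experiment_path is hypothetical and not part of nkululeko.

```python
import configparser
import posixpath

INI_TEXT = """
[EXP]
root = ./results/
name = emodb_exp
fig_dir = ./images/
models_dir = ./models/
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)
exp = config["EXP"]


def experiment_path(option, default):
    """Join root, experiment name and a folder option (assumed layout)."""
    folder = exp.get(option, fallback=default)
    return posixpath.normpath(posixpath.join(exp["root"], exp["name"], folder))


# fig_dir is set in the config; store is not, so the stated default is used.
fig_path = experiment_path("fig_dir", "./images/")
store_path = experiment_path("store", "./store/")
print(fig_path)
print(store_path)
```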

DATA

Database loading, label mapping, and train/test split configuration.

  • type: just a flag now to mark continuous data, so it can be binned to categorical data (using bins and labels)

    • type = continuous

  • databases: list of databases to be used in the experiment

    • databases = ['emodb', 'timit']

  • tests: Datasets to be used as test data for the stored best model. The databases listed here do not have to appear in the databases field. When nkululeko.nkululeko is run with this option set and a saved experiment file already exists on disk, training is skipped entirely: the module loads the stored best model, evaluates it on the listed test databases, and writes a confusion matrix, a per-class text report, and a predictions CSV (with all original test columns plus a predicted column) to the results directory. On the very first run (no saved file yet) the module trains normally and saves the experiment. See test_new_database.md for a step-by-step guide.

    • tests = ['emovo']

    • tests = ['ravdess', 'cremad'] ; multiple databases are concatenated

  • root_folders: specify an additional configuration specifically for all entries starting with a dataset name, acting as global defaults.

    • root_folders = data_roots.ini

  • db_name: path to the audformat repository for each database listed in ‘databases’. If this path is not absolute, it is treated as relative to the experiment folder.

    • emodb = /home/data/audformat/emodb/

  • db_name.type: type of storage, e.g., audformat database or ‘csv’ (needs header: file, speaker, task)

    • emodb.type = audformat

  • db_name.absolute_path: only for ‘csv’ databases: are the audio file paths relative or absolute? If not absolute, they are treated as relative to the database parent folder, not the experiment root folder.

    • my_data.absolute_path = True

  • db_name.audio_path: only for ‘csv’ databases: are the audio files in a special common folder?

    • my_data.audio_path = wav_files/

  • db_name.mapping: Python dictionary to map between categories for cross-database experiments (format: {'target_emo':'source_emo'})

    • emodb.mapping = {'anger':'angry', 'happiness':'happy', 'sadness':'sad', 'neutral':'neutral'}

    • can also be used for general mapping

    • emodb.mapping = {'gender':{'male':0, 'female':1}, 'emotion':{'anger':'stress', 'neutral':'no stress'}}

  • db_name.columns: names of the columns to load from the data (only for audformat databases)

    • my_data.columns = ["age", "gender", "speaker", "diagnosis"]

  • db_name.label: name of the target variable for this database (if different from DATA.target)

    • my_data.label = "expression"

  • db_name.colnames: mapping to rename columns to standard names

    • my_data.colnames = {'speaker':'Participant ID', 'sex':'gender', 'Age': 'age'}

  • db_name.split_strategy: How to identify sets for train/development data splits within one database

    • emodb.split_strategy = reuse

    • Possible values:

      • database: default (task.train, task.dev and task.test)

      • specified: specify the tables (an opportunity to assign multiple or no tables to train or dev set)

        • emodb.train_tables = ['emotion.categories.train.gold_standard']

        • emodb.dev_tables = ['emotion.categories.dev.gold_standard']

        • emodb.test_tables = ['emotion.categories.test.gold_standard']

      • speaker_split: split samples randomly but speaker-disjoint, given a percentage of speakers for the test (and dev) set.

        • emodb.test_size = 50 (default:20)

        • emodb.dev_size = 20 # for train-dev-test experiments

      • list of test speakers: you can simply provide a list of test ids

        • emodb.split_strategy = [12, 14, 15, 16]

      • speakers_stated: explicitly state the speaker names for all splits (test and dev are required)

        • emodb.test = [14, 8]

        • emodb.dev = [12, 15]

        • emodb.train = [3, 9, 10, 11, 13, 16]

      • random: split samples randomly (but NOT speaker-disjoint, e.g., when no speaker info is given or each sample has its own speaker), given a percentage of samples for the test set.

        • emodb.test_size = 50 (default:20)

      • reuse: reuse the splits after a speaker_split run to save time with feature extraction.

      • train: use the entire database for training

      • test: use the entire database for evaluation / testing

      • dev: use the entire database for evaluation / development

      • balanced: stratify the data splits

        • balance = {'emotion':2, 'age':1, 'gender':1}

        • age_bins = 2

  • db_name.target_tables: tables that contain the target / speaker / sex labels

    • emodb.target_tables = ['emotion']

  • target_tables_append: set this to True if multiple tables should be combined row-wise; otherwise they are combined column-wise

    • target_tables_append = False

  • db_name.files_tables: tables that contain the audio file names

    • emodb.files_tables = ['files']

  • db_name.test_tables: tables that should be used for testing

    • emodb.test_tables = ['emotion.categories.test.gold_standard']

  • db_name.train_tables: tables that should be used for training

    • emodb.train_tables = ['emotion.categories.train.gold_standard']

  • db_name.as_test: use only the test split (for automatic experiments)

    • emodb.as_test = False

  • db_name.as_train: use only the train split (for automatic experiments)

    • emodb.as_train = False

  • db_name.limit_samples: maximum number of random N samples per table (for testing with very large data mainly)

    • emodb.limit_samples = 20

  • db_name.required: force a data set to have a specific feature (for example, filter all sets that have gender labeled in a database where this is not the case for all samples, e.g. MozillaCommonVoice)

    • emodb.required = gender

  • db_name.limit_samples_per_speaker: maximum number of samples per speaker (for leveling data where the same speakers have a large number of samples)

    • emodb.limit_samples_per_speaker = 20

  • db_name.min_duration_of_sample: limit the samples to a minimum length (in seconds)

    • emodb.min_duration_of_sample = 0.0

  • db_name.max_duration_of_sample: limit the samples to a maximum length (in seconds)

    • emodb.max_duration_of_sample = 0.0

  • db_name.rename_speakers: add the database name to the speaker names, e.g., because several databases use the same names

    • emodb.rename_speakers = False

  • db_name.filter: don’t use all the data but only selected values from columns: [col, val]*

    • emodb.filter = {'gender': ['female', 'diverse']}

  • db_name.scale: scale (standard normalize) the target variable (if numeric)

    • my_data.scale = True

  • db_name.reverse: reverse the target variable (if numeric). I.e. f(x) = abs(x-max)

  • db_name.reverse.max: max value to be used in the formula above. If omitted, the distribution will start with 0.

  • target: the task name, e.g. age or emotion

    • target = emotion

  • labels: for classification experiments: the names of the categories (is also used for regression when binning the values)

    • labels = ['anger', 'boredom', 'disgust', 'fear', 'happiness', 'neutral', 'sadness']

  • bins: array of integers to be used for binning continuous data

    • bins = [-100, 40, 50, 60, 70, 100]

  • no_reuse: don’t re-use any tables, but start fresh

    • no_reuse = False

  • min_dur_test: specify a minimum duration for test samples (in seconds)

    • min_dur_test = 3.5

  • target_divide_by: divide the target values by some factor, e.g., to scale age in years down to a range from 0.0 to 1.0

    • target_divide_by = 100

  • limit_samples: maximum number of random N samples per sample selection

    • limit_samples = 20

  • limit_samples_per_speaker: maximum number of samples per speaker per sample selection

    • limit_samples_per_speaker = 20

  • min_duration_of_sample: limit the samples to a minimum length (in seconds) per sample selection

    • min_duration_of_sample = 0.0

  • max_duration_of_sample: limit the samples to a maximum length (in seconds) per sample selection

    • max_duration_of_sample = 0.0

  • check_size: check the filesize of all samples in train and test splits in bytes

    • check_size = 1000

  • check_vad: check if the files contain speech, using silero VAD

    • check_vad = True

  • filter.sample_selection: restrict the filters to either [train, test, all]

    • filter.sample_selection = all
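
Two of the target transformations described in this section can be illustrated with plain Python: binning continuous values with bins and labels, and reversing a numeric target with f(x) = abs(x - max). This is only a sketch. The label names are invented, and treating each interval as closed on the left is an assumption; nkululeko's own boundary handling may differ.

```python
import ast
from bisect import bisect_right

# Bin edges as they appear in the bins example above.
bins = ast.literal_eval("[-100, 40, 50, 60, 70, 100]")
# Hypothetical label names: one per interval, so len(labels) == len(bins) - 1.
labels = ["u40", "40s", "50s", "60s", "70plus"]


def bin_value(value, bins, labels):
    """Return the label of the interval [bins[i], bins[i+1]) containing value."""
    index = bisect_right(bins, value) - 1
    # Clamp so that values on or beyond the outer edges still get a label.
    index = max(0, min(index, len(labels) - 1))
    return labels[index]


def reverse_target(values, max_value=None):
    """Apply f(x) = abs(x - max) to every value.

    Without an explicit max, the largest value in the data is used,
    so the reversed distribution starts at 0.
    """
    if max_value is None:
        max_value = max(values)
    return [abs(x - max_value) for x in values]


ages = [23, 40, 65, 88]
binned = [bin_value(a, bins, labels) for a in ages]
reversed_ages = reverse_target(ages)
reversed_with_max = reverse_target(ages, max_value=100)
print(binned)
print(reversed_ages)
print(reversed_with_max)
```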

AUGMENT

Data augmentation options to artificially expand the training set.

  • augment: select the methods to augment: either traditional or random_splice

  • p_reverse: for random_splice: probability of some samples to be in reverse order (default: 0.3)

  • top_db: for random_splice: top db level for silence to be recognized (default: 12)

  • result: file name to store the augmented data (can then be added to training)

    • result = augmented.csv

  • augmentations: select the augmentation methods for the audiomentations package. A default is provided.

    • augmentations = Compose([AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.05),Shift(p=0.5),BandPassFilter(min_center_freq=100.0, max_center_freq=6000),])

  • transformations: select the augmentation methods for the auglib package. Defaults to ["room", "music", "noise", "babble", "crop", "cough"]

    • transformations = ['music', 'room', 'cough']

SEGMENT

Audio segmentation settings for splitting recordings into smaller chunks (e.g., by silence or fixed duration).

  • result: name of the segmented data table as a result. Additionally, a segment file with the gaps will be generated: segmented_silence.csv.

    • result = segmented.csv

  • method: select the model

  • min_length: the minimum length of rest samples (in seconds)

    • min_length = 2

  • max_length: the maximum length of segments; longer ones are cut here. (in seconds)

    • max_length = 10 # if not set, original segmentation is used

  • output_audio: export actual audio files for each detected segment (default: False)

    • output_audio = True

  • audio_format: output audio format when output_audio is True (default: wav)

    • audio_format = wav # supported values: wav, flac, mp3

  • audio_dir: output directory for audio segments, relative to the experiment data directory ({root}/{name}, default: segments)

    • audio_dir = segments

  • sampling_rate: resample exported audio segments to this rate in Hz; omit to preserve the original sample rate

    • sampling_rate = 16000

  • include_silence_borders: for the result file that represents the gaps between speech: include the borders?

    • include_silence_borders = False

FEATS

Feature extraction settings. Multiple feature types can be combined by listing them together; they are concatenated column-wise.

  • type: a comma-separated list of types of features; they will be column-wise concatenated

    • type = ['os']

    • possible values:

      • import: already computed features

        • import_file = paths to files with features in CSV format

          • import_file = ['path1/file1.csv', 'path2/file1.csv2']

        • import_files_append = set this to False if you want the files to be concatenated column-wise, else it’s done row-wise

          • import_files_append = True

      • mld: mid-level-descriptors

        • mld.model = path to the mld sources folder

        • mld.df = MLD class to use for feature extraction (default: Mld)

          • accepted values: Mld, MldSust, MldStruct

          • example: mld.df = MldSust

        • min_syls = minimum number of syllables

      • os: openSMILE features

        • set = eGeMAPSv02 (features set)

        • level = functionals (or lld: feature level)

        • os.features: list of selected features (disregard others)

      • praat: selected Praat features, based on David R. Feinberg's scripts

        • praat.features: list of selected features (disregard others)

      • spectra: Melspecs for convolutional networks

        • fft_win_dur = 25 (msec analysis frame/window length)

        • fft_hop_dur = 10 (msec hop duration)

        • fft_nbands = 64 (number of frequency bands)

      • ast: audio spectrogram transformer features from MIT

  • balancing: balance the data with respect to class distribution

    • balancing = smote

    • possible values:

      • ros: Random Over Sampler

      • smote: SMOTE

      • adasyn: ADASYN

      • borderlinesmote: Borderline SMOTE

      • svmsmote: SVM SMOTE

      • smoteenn: SMOTE + Edited Nearest Neighbours

      • smotetomek: SMOTE + Tomek links

      • clustercentroids: Cluster Centroids

      • randomundersampler: Random Under Sampler

      • editednearestneighbours: Edited Nearest Neighbours

      • tomeklinks: Tomek Links

  • scale: scale (standard/normalize) the features

    • scale = standard

    • possible values:

      • standard: z-transformation (mean of 0 and std of 1) based on the training set

      • robust: robust scaler

      • speaker: like standard but based on individual speaker sets (also for the test)

      • bins: convert feature values into 0, .5 and 1 (for low, mid and high)

      • minmax: rescales the data set such that all feature values are in the range [0, 1]

      • maxabs: similar to MinMaxScaler except that the values are mapped across several ranges depending on whether negative OR positive values are present

      • normalizer: scales each sample (row) individually to have unit norm (e.g., L2 norm)

      • powertransformer: applies a power transformation to each feature to make the data more Gaussian-like in order to stabilize variance and minimize skewness

      • quantiletransformer: applies a non-linear transformation such that the probability density function of each feature will be mapped to a uniform or Gaussian distribution (range [0, 1])

  • set: name of the openSMILE feature set, e.g. eGeMAPSv02, ComParE_2016, GeMAPSv01a, eGeMAPSv01a

    • set = eGeMAPSv02

  • level: level of the openSMILE features

    • level = functional

    • possible values:

      • functional: aggregated over the whole utterance

      • lld: low-level descriptor: framewise

  • no_reuse: don’t re-use any feature files, but start fresh

    • no_reuse = False

  • features: disregard all other features and only use the ones stated here.

    • features = ['speechrate(nsyll / dur)', 'F0semitoneFrom27.5Hz_sma3nz_amean']

  • needs_feature_extraction: force the features to be freshly extracted

    • needs_feature_extraction = False

  • print_feats: set this to False if you don’t want os and praat feature names to be printed out

    • print_feats = True

  • store_format: in which format to store the feature data frames [pkl | csv]

    • store_format = pkl
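
The key point of scale = standard is that the statistics come from the training set only and are then applied unchanged to the test set. A minimal sketch for a single feature column, using the population standard deviation (the convention used by scikit-learn's StandardScaler); the function names are illustrative and not nkululeko's API:

```python
from statistics import mean, pstdev


def fit_standard(train_column):
    """Estimate mean and standard deviation on the training data only."""
    mu = mean(train_column)
    sigma = pstdev(train_column)
    # Guard against constant features, which would cause a division by zero.
    return mu, sigma or 1.0


def apply_standard(column, mu, sigma):
    """z-transform a column with previously fitted statistics."""
    return [(x - mu) / sigma for x in column]


train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mu, sigma = fit_standard(train)
train_scaled = apply_standard(train, mu, sigma)
# The test set reuses the training statistics and is never fitted on its own.
test_scaled = apply_standard(test, mu, sigma)
print(train_scaled)
print(test_scaled)
```

The speaker option described above differs in that the statistics would be computed per speaker, which is why it also applies to the test set.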

MODEL

Model and training specifications. In general, default values should work for classification tasks.

  • type: select the model

    • type = xgb

    • possible values:

      • xgb: XGBoost

      • xgr: XGBoost for regression

      • svm: Support vector machine

      • svr: Support vector machine for regression

      • knn: k nearest neighbors

      • knn_reg: k nearest neighbors for regression

      • tree: Decision tree

      • tree_reg: Decision tree for regression

      • nb: Naive Bayes

      • mlp: Multi-layer perceptron (neural network)

      • cnn: Convolutional neural network

      • finetune: Fine-tuning for pre-trained models:

        • pretrained_model: Hugging Face (HF) name of the base model

        • push_to_hub: True

        • max_duration: 8 (in seconds, the rest is discarded)

        • balancing: smote (as in FEATS, only for finetune needs to be defined here)

  • class_weight: add class_weight to the linear classifier (XGB, SVM) fit methods for imbalanced data (True or False)

    • class_weight = False

  • logo: leave-one-speaker-group-out. Will disregard train/dev splits, split the speakers into the given number of groups, and then do a LOGO evaluation. If you want LOSO (leave one speaker out), simply set the number to the number of speakers.

    • logo = 10

  • k_fold_cross: k-fold cross-validation. Will disregard train/dev splits and do a stratified cross-validation (meaning that classes are balanced across folds). The speaker ID is ignored.

    • k_fold_cross = 10

  • learning_rate: learning rate for neural networks

    • learning_rate = 0.0001

  • optimizer: optimizer type for neural networks (case insensitive)

    • optimizer = adam

    • possible values:

      • adam: Adam optimizer (default)

      • adamw: AdamW optimizer with weight decay

      • sgd: SGD optimizer with momentum

    • related parameters:

      • weight_decay: weight decay for AdamW optimizer (default: 0.01)

        • weight_decay = 0.01

      • momentum: momentum for SGD optimizer (default: 0.9)

        • momentum = 0.9

  • scheduler: learning rate scheduler for neural networks (case insensitive)

    • scheduler = cosine

    • possible values:

      • cosine: cosine annealing with linear warmup (default); steps per batch

      • step: step decay — reduces LR by gamma every step_size epochs; steps per epoch

      • exponential: exponential decay — multiplies LR by gamma each epoch; steps per epoch

      • none / false: no scheduler

    • related parameters:

      • warmup_epochs: number of warmup epochs for cosine scheduler (default: 5)

        • warmup_epochs = 5

      • scheduler.step_size: epoch interval for step scheduler (default: 10)

        • scheduler.step_size = 10

      • scheduler.gamma: decay factor for step and exponential schedulers

        • step default: 0.5; exponential default: 0.95

        • scheduler.gamma = 0.5

  • drop: dropout rate for neural networks (0 to 1)

    • drop = 0.1

  • batch_size: batch size for neural networks

    • batch_size = 8

  • loss: loss function for neural networks

    • loss = cross

    • possible values:

      • bce: BinaryCrossEntropyLoss (for binary classification)

      • cross: CrossEntropyLoss

      • f1: F1 loss

      • focal: Focal loss (for imbalanced classification)

      • 1-ccc: 1 minus the concordance correlation coefficient

      • mse: Mean squared error (for regression)

      • mae: Mean absolute error (for regression)

      • weighted_bce: Weighted BinaryCrossEntropyLoss (for imbalanced binary classification)

  • label_smoothing: label smoothing for cross-entropy loss. Accepts either a boolean or a float in [0.0, 1.0]. Helps prevent overconfidence and can improve generalization.

    • label_smoothing = 0.1

    • Set to True to use the default value of 0.1

    • Set to a float between 0.0 and 1.0 for a custom smoothing factor

    • Invalid or out-of-range values fall back to 0.0 with a warning

    • Default: 0.0 (no smoothing)

  • measure: A measure/metric to report progress with experiments. For classification, default is UAR. For regression, default is MSE.

    • measure = mse

    • possible values:

      • uar: Unweighted Average Recall (default for classification)

      • eer: Equal Error Rate (for binary classification, commonly used in biometric systems and deepfake detection)

      • mse: Mean Squared Error (default for regression)

      • mae: Mean Absolute Error (for regression)

      • ccc: Concordance Correlation Coefficient (for regression)

    • Note: When EER is specified, both EER and UAR will be reported

  • activation: The activation function for MLPs. One of ["relu", "sigmoid", "tanh", "leaky_relu"]

    • activation = relu

  • layers: specify the layer architecture for MLP

    • layers = [64, 16]

  • C_val: regularization value for SVM

    • C_val = 1.0

  • gamma: gamma value for SVM (kernel coefficient)

    • gamma = scale

  • kernel: kernel type for SVM

    • kernel = rbf

    • possible values: linear, poly, rbf, sigmoid

  • K_val: number of neighbors for KNN

    • K_val = 5

  • weights: weight function for KNN

    • weights = uniform

    • possible values: uniform, distance

  • n_estimators: number of trees for tree-based models (XGBoost, Random Forest)

    • n_estimators = 100

  • max_depth: maximum depth of trees

    • max_depth = 6

  • subsample: subsample ratio for XGBoost

    • subsample = 1.0

  • colsample_bytree: subsample ratio of columns for XGBoost

    • colsample_bytree = 1.0

  • random_seed: random seed for reproducible results

    • random_seed = 42 # set this to False if runs > 1

  • device: device for neural network training

    • device = cpu

    • possible values: cpu, cuda

  • patience: early stopping patience for neural networks

    • patience = 5

  • save: set this to False if you don’t want models stored on disk

    • save = True
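
The step and exponential schedulers described above reduce to simple closed-form expressions. The sketch below uses the conventional definitions (the same closed forms as PyTorch's StepLR and ExponentialLR) with the default values listed in this section; that nkululeko follows exactly these formulas is an assumption. The cosine schedule with warmup is left out because its per-batch details are not specified here.

```python
def step_lr(base_lr, epoch, step_size=10, gamma=0.5):
    """Learning rate after `epoch` epochs: multiply by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


def exponential_lr(base_lr, epoch, gamma=0.95):
    """Learning rate after `epoch` epochs: multiply by gamma each epoch."""
    return base_lr * gamma ** epoch


base_lr = 0.0001  # learning_rate example value from this section
for epoch in (0, 9, 10, 25):
    print(epoch, step_lr(base_lr, epoch), exponential_lr(base_lr, epoch))
```

With the defaults, the step scheduler halves the learning rate at epochs 10, 20, 30 and so on, while the exponential scheduler shrinks it by 5 % every epoch.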

EXPL

Feature exploration and visualisation options, used by the explore module.

  • feature_distributions: plot distributions for features and analyze importance

    • feature_distributions = False

  • ignore_gender: ignore gender when plotting feature distribution

    • ignore_gender = False

  • model: Which model to use to estimate feature importance.

    • model = ['log_reg'] # can be any of the models from the MODEL section. If several are combined, the mean result is used.

  • max_feats: Maximal number of important features

    • max_feats = 10

  • permutation: use feature permutation to determine the best features. Make sure to test the models before.

    • permutation = True

  • scatter: make a scatter plot of combined train and test data, colored by label.

    • scatter = ['tsne', 'umap', 'pca']

  • scatter.target: target for the scatter plot (defaults to target value).

    • scatter.target = ['age', 'gender', 'likability']

  • scatter.dim: dimension of reduction, can be 2 or 3.

    • scatter.dim = 2

  • plot_tree: Plot a decision tree for classification (Requires model = tree)

    • plot_tree = False

  • value_counts: plot distributions of target for the samples and speakers (in the EXP.fig_dir folder)

    • value_counts = [['gender'], ['age'], ['age', 'duration']]

  • column.bin_reals: If the column variable is real numbers (instead of categories), should it be binned? for any value in value_counts as well as the target variable

    • age.bin_reals = True

  • dist_type: type of plot for value counts, either histogram (hist) or density estimation (kde)

    • dist_type = kde

  • spotlight: open a web-browser window to inspect the data with the spotlight software. Needs package renumics-spotlight to be installed!

    • spotlight = False

  • shap: compute SHAP values, need to run the model first.

    • shap = False

  • print_stats: whether (possibly extensive) results from statistical tests should be printed out on the debug channel

    • print_stats = False

  • print_colvals: print the unique values for all columns in the data

    • print_colvals = False

  • plot_features: plot distributions for these features in any case, irrespective of their importance

    • plot_features = ["speechrate", "mean_f0"]

  • regplot: do scatter plots for two features, and show categories. When two values are given, the target is used as the category; otherwise, a third value can state the category.

    • regplot = [["feat_a", "feat_b"], ["feat_a", "feat_b", "emotion"], ["feat_a", "feat_b", "age"]]

PREDICT

Automatic soft-label prediction using pre-trained models (e.g., age, gender, arousal).

  • targets: Speaker/speech characteristics to be predicted by some models

    • targets = ['text', 'translation', 'textclassification', 'speaker', 'gender', 'age', 'snr', 'arousal', 'valence', 'dominance', 'pesq', 'mos']

    • textclassifier.candidates = ["sadness", "anger", "neutral"]: for target textclassification: the labels for the categories that should be predicted (using joeddav/xlm-roberta-large-xnli)

  • target_language: target language for the translation prediction

    • target_language = en

EXPORT

Options for exporting the dataset (audio files and annotations) to a new location or format.

  • target_root: New root directory for the database, will be created

    • target_root = ./exported_data/

  • orig_root: Path to folder that is parent to the original audio files

    • orig_root = ../data/emodb/wav

  • data_name: Name for the CSV file

    • data_name = exported_database

  • segments_as_files: Whether original files should be used, or segments split (resulting potentially in many new files).

    • segments_as_files = False

  • bundle_path: Output directory for the portable model bundle created by python -m nkululeko.bundle. Defaults to <root>/<name>/export. Overridden by the --output CLI flag.

    • bundle_path = ./my_polish_bundle

CROSSDB

Cross-database experiment settings for evaluating generalisation across datasets.

PLOT

Plot styling and output options for result figures.

  • name: special name used as a prefix for all plots (stored in EXP.fig_dir).

    • name = my_special_config_within_the_experiment

  • epochs: whether to make a separate plot for the result of each epoch.

    • epochs = False

  • anim_progression: generate an animated GIF from the epoch plots

    • anim_progression = False

  • fps: frames per second for the animated GIF

    • fps = 1

  • epoch_progression: plot the progression of test, train and loss results over epochs

    • epoch_progression = False

  • best_model: search for the best performing model and plot its confusion matrix (needs MODEL.save to be turned on)

    • best_model = False

  • combine_per_speaker: print an extra confusion plot where the predictions per speaker are combined, with either the mode or the mean function

    • combine_per_speaker = mode

  • format: format for plots, either png or eps (for scalable graphics)

    • format = png

  • ccc: show concordance correlation coefficient in plot headings

    • ccc = False

  • fill_areas: should areas, e.g. in distribution plots, be filled?

    • fill_areas = False

  • uncertainty_threshold: plot a confusion matrix with samples removed that are less uncertain

    • uncertainty_threshold = .6

  • runs_compare: generate plots to compare the run results: compare features, models or databases

    • runs_compare = features

  • titles: if titles should be added to the plots

    • titles = True

  • kind: kind of plot for EXPL.feature_distributions: [violin, bar, box, swarm, strip]

    • kind = violin

RESAMPLE

Audio resampling settings for converting sample rates across a dataset.

  • replace: whether samples should be replaced in place, or copies made and a new dataframe created

    • replace = False

  • target: the name of the new dataframe, if replace = False

    • target = data_resampled.csv

REPORT

Controls how experiment results are collected, displayed, and persisted.

  • show: print the report at the end

    • show = False

  • fresh: start a new report

  • latex: generate a LaTeX and PDF document: name of the document

    • latex = my_latex_document

  • title: title for document

  • author: author for document

OPTIM

Hyperparameter optimisation settings for automated model tuning.

  • model: the model type to optimize (e.g., ‘mlp’, ‘svm’, ‘xgb’)

    • model = mlp

  • search_strategy: intelligent search strategy for faster optimization

    • search_strategy = random

    • possible values:

      • grid: exhaustive grid search (default, slowest but thorough)

      • random: random search with n_iter samples (faster, often as good as grid)

      • halving_random: successive halving random search (fastest, requires sklearn >= 0.24)

      • halving_grid: successive halving grid search (compromise between speed and thoroughness)

  • metric: evaluation metric for optimization

    • metric = uar

    • possible values:

      • uar: Unweighted Average Recall (balanced accuracy, good for imbalanced datasets)

      • accuracy: Standard accuracy (default)

      • f1: Macro-averaged F1-score (balance of precision and recall)

      • precision: Macro-averaged precision

      • recall: Macro-averaged recall

      • sensitivity: Sensitivity (same as recall)

      • specificity: Specificity (true negative rate)

  • n_iter: number of parameter combinations to try for random search

    • n_iter = 50

  • cv_folds: number of cross-validation folds for hyperparameter evaluation

    • cv_folds = 3

  • Parameter specifications: Define search spaces for hyperparameters using tuples for ranges and lists for discrete choices

    • nlayers: number of hidden layers for neural networks

      • nlayers = (1, 3) # search from 1 to 3 layers

    • nnodes: number of nodes per layer for neural networks

      • nnodes = (16, 256) # search powers of 2 from 16 to 256

    • lr: learning rate for neural networks

      • lr = [0.0001, 0.001, 0.01, 0.1] # discrete log-scale choices (recommended)

      • lr = (0.0001, 0.01) # or range with automatic log-scale sampling

    • bs: batch size for neural networks

      • bs = (2, 256) # search powers of 2 from 2 to 256

    • loss: loss function for neural networks

      • loss = ["cross", "f1"] # discrete choices

    • do: dropout rate for neural networks

      • do = (0.1, 0.5, 0.1) # search from 0.1 to 0.5 with step 0.1

    • Traditional ML parameters: For SVM, XGB, etc., use parameter names from sklearn

      • C = [0.1, 1.0, 10.0] # SVM regularization parameter

      • n_estimators = [50, 100, 200] # XGB number of estimators

      • max_depth = [3, 6, 9] # XGB maximum depth

Parameter specification formats:

  • (min, max): Range with automatic step selection based on parameter type

    • For learning rates: uses logarithmic sampling (5-8 values)

    • For dropout: uses linear sampling (5 values)

    • For integers: uses linear sampling

  • (min, max, step): Range with explicit step size

  • [val1, val2, …]: Discrete list of values to try (recommended for most cases)

  • value: Single value (equivalent to [value])

Recommended parameter ranges:

  • Learning rate: [0.0001, 0.001, 0.01, 0.1] (log-scale discrete values)

  • Dropout: [0.1, 0.3, 0.5, 0.7] (common dropout rates)

  • SVM C: [0.1, 1.0, 10.0, 100.0] (regularization parameter)

  • XGB n_estimators: [50, 100, 200] (number of trees)

  • XGB max_depth: [3, 6, 9, 12] (tree depth)

Usage: Run with python3 -m nkululeko.optim --config exp.ini
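
To make the specification formats concrete, the following sketch expands a (min, max, step) tuple, a list, or a single value into the candidate values to try. The function expand_spec is hypothetical and not nkululeko's implementation; treating the upper bound as inclusive is an assumption, and the automatic sampling for plain (min, max) ranges is deliberately not covered.

```python
import ast


def expand_spec(spec):
    """Turn one parameter specification into a list of candidate values."""
    if isinstance(spec, list):
        # [val1, val2, ...]: discrete choices are used as given.
        return list(spec)
    if isinstance(spec, tuple) and len(spec) == 3:
        # (min, max, step): explicit step size, upper bound assumed inclusive.
        low, high, step = spec
        count = int(round((high - low) / step)) + 1
        # Rounding hides floating-point noise such as 0.30000000000000004.
        return [round(low + i * step, 10) for i in range(count)]
    if isinstance(spec, tuple):
        raise ValueError("(min, max) ranges need a sampling rule not covered in this sketch")
    # value: a single value is equivalent to [value].
    return [spec]


# Values as they would be written in the [OPTIM] section.
dropout = expand_spec(ast.literal_eval("(0.1, 0.5, 0.1)"))
losses = expand_spec(ast.literal_eval('["cross", "f1"]'))
layers = expand_spec(ast.literal_eval("(1, 3, 1)"))
single = expand_spec(ast.literal_eval("0.01"))
print(dropout, losses, layers, single)
```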

FLAGS

Run several parameter values at once. All listed parameters are combined via Cartesian product, and one experiment is run per combination. Example:

  • models = ['xgb', 'svm']

  • features = ['praat', 'os']

  • balancing = ['none', 'ros', 'smote']

  • scale = ['none', 'standard', 'robust', 'minmax']

  • name_target = list of (EXP.name, DATA.target) pairs iterated as a unit

    • Each pair sets EXP.name and DATA.target together for one experiment slot, then that slot is combined via product with any other FLAGS parameters.

    • Label DataFrames are reloaded per pair; audio features are extracted once and reused across all pairs.

    • example:

      name_target = [("grade", "grade"), ("roughness", "roughness"), ("strain", "strain")]
      models = ['xgb', 'mlp']
      

      → runs 3 × 2 = 6 experiments

    The FLAGS mechanism can also drive the explore module (feature analysis / visualisation) instead of model training. Pass --mod explore on the command line:

    python -m nkululeko.flags --config exp.ini --mod explore
    

    No result score is reported; output plots are stored per experiment under {EXP.root}/{EXP.name}/images/.
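
The Cartesian-product combination can be sketched with itertools.product. The dictionary below mirrors the name_target example; the variable names and the way each combination is represented are illustrative assumptions, not nkululeko internals.

```python
from itertools import product

# FLAGS values from the name_target example above.
flags = {
    "name_target": [("grade", "grade"), ("roughness", "roughness"), ("strain", "strain")],
    "models": ["xgb", "mlp"],
}

# One dictionary per experiment: every value of one flag is paired with
# every value of the other flags.
combinations = [dict(zip(flags, values)) for values in product(*flags.values())]

for combo in combinations:
    exp_name, target = combo["name_target"]  # the pair is applied as a unit
    print(f"EXP.name={exp_name} DATA.target={target} MODEL.type={combo['models']}")

print(len(combinations), "experiments")
```

Adding a further flag with four values, such as the scale list above, would multiply the count to 3 × 2 × 4 = 24 experiments, so the number of runs grows quickly with each added flag.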