Nkululeko Overview

Nkululeko is a Python-based framework designed for speaker characteristics detection from audio data. Its primary purpose is to enable researchers and engineers to train, evaluate, and deploy machine learning models for detecting characteristics such as emotion, age, gender, or speech disorders from audio samples.

Purpose and Capabilities

The framework provides a high-level interface that allows users with limited programming experience to configure and run experiments through INI configuration files rather than writing extensive code. It enables users to:

  1. Load and preprocess audio data from various sources

  2. Extract a wide range of acoustic features using multiple techniques

  3. Train different machine learning models for classification or regression tasks

  4. Evaluate model performance using appropriate metrics

  5. Generate reports and visualizations — confusion matrices, per-class text reports, and prediction CSVs

  6. Test a trained model on a new database automatically: when DATA.tests is set in the configuration and a saved experiment already exists, nkululeko.nkululeko skips training and evaluates the best stored model on the new database directly (see test_new_database.md)

  7. Predict labels for arbitrary audio using the unified nkululeko.predict module — supports single files, folders, CSV lists, and live microphone input, using either built-in autopredict targets (age, gender, emotion, SNR, …) or the best model from a trained experiment (see predict.md)

  8. Investigate and mitigate potential biases in training data

Nkululeko serves speech processing researchers, machine learning practitioners, and developers working on audio-based applications, allowing them to focus on experimentation rather than implementation details.

Core Systems

The project is organized around a modular architecture with several core systems:

  1. Experiment Framework: The central system that orchestrates the entire experiment lifecycle.

  2. Data Processing: Handles loading, filtering, and splitting datasets.

  3. Feature Extraction: Extracts acoustic features from audio using various methods.

  4. Model Training: Trains and evaluates machine learning models.

  5. Reporting: Generates visualization and performance reports.

  6. Command-Line Interface: Provides multiple entry points for different functionalities.

Target Audience

Nkululeko is designed for:

  • Speech processing researchers

  • Machine learning practitioners

  • Audio application developers

  • Students learning about speech processing

  • Anyone interested in speaker characteristics detection without extensive programming knowledge

By providing a high-level interface through configuration files, Nkululeko makes sophisticated audio analysis accessible to users with varying levels of technical expertise.