Nkululeko Overview
Nkululeko is a Python-based framework designed for speaker characteristics detection from audio data. Its primary purpose is to enable researchers and engineers to train, evaluate, and deploy machine learning models for detecting characteristics such as emotion, age, gender, or speech disorders from audio samples.
Purpose and Capabilities
The framework provides a high-level interface that allows users with limited programming experience to configure and run experiments through INI configuration files rather than writing extensive code. It enables users to:
Load and preprocess audio data from various sources
Extract a wide range of acoustic features using multiple techniques
Train different machine learning models for classification or regression tasks
Evaluate model performance using appropriate metrics
Generate reports and visualizations — confusion matrices, per-class text reports, and prediction CSVs
Test a trained model on a new database automatically: when DATA.tests is set in the configuration and a saved experiment already exists, nkululeko.nkululeko skips training and evaluates the best stored model directly on the new database (see test_new_database.md)
Predict labels for arbitrary audio using the unified nkululeko.predict module, which supports single files, folders, CSV lists, and live microphone input, using either built-in autopredict targets (age, gender, emotion, SNR, …) or the best model from a trained experiment (see predict.md)
Investigate and mitigate potential biases in training data
Nkululeko serves speech processing researchers, machine learning practitioners, and developers working on audio-based applications, allowing them to focus on experimentation rather than implementation details.
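As a rough illustration of the configuration-driven workflow described above, the following sketch parses a small INI file with Python's standard configparser and mirrors the rule that training is skipped when DATA.tests is set and a saved experiment exists. The section names, keys, and values in the sample (EXP, FEATS, MODEL, train_db, new_db, os, xgb) and the helper functions are assumptions made for illustration, not a verified Nkululeko schema or its internal logic; the project's own documentation remains the reference for real option names.

```python
import ast
import configparser

# Illustrative configuration only: section and key names are assumptions,
# apart from DATA.tests, which the overview mentions explicitly.
SAMPLE_INI = """
[EXP]
root = ./results/
name = emotion_demo

[DATA]
databases = ['train_db']
train_db = ./data/train_db/
target = emotion
tests = ['new_db']
new_db = ./data/new_db/

[FEATS]
type = ['os']

[MODEL]
type = xgb
"""


def load_config(text):
    """Parse INI text into a ConfigParser object."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


def requested_tests(config):
    """Return the database names listed under DATA.tests, or [] if unset.

    Assumes list values are written as Python literals, e.g. ['new_db'].
    """
    raw = config.get("DATA", "tests", fallback="[]")
    return list(ast.literal_eval(raw))


def should_skip_training(config, saved_experiment_exists):
    """Hypothetical helper: test-only mode needs both conditions to hold."""
    return bool(requested_tests(config)) and saved_experiment_exists


config = load_config(SAMPLE_INI)
print("Model type:", config["MODEL"]["type"])
print("Test databases:", requested_tests(config))
print("Skip training:", should_skip_training(config, saved_experiment_exists=True))
```

Keeping the decision in a small pure function makes the "train or only test" rule easy to check without touching any audio data.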
Core Systems
The project is organized around a modular architecture with several core systems:
Experiment Framework: The central system that orchestrates the entire experiment lifecycle.
Data Processing: Handles loading, filtering, and splitting datasets.
Feature Extraction: Extracts acoustic features from audio using various methods.
Model Training: Trains and evaluates machine learning models.
Reporting: Generates visualizations and performance reports.
Command-Line Interface: Provides multiple entry points for different functionalities.
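The entry points are Python modules such as nkululeko.nkululeko, so an experiment can in principle be launched from a script as well as from a shell. The sketch below only builds the command line and prints it. The --config flag and the exp.ini file name are assumptions here, and build_command is a hypothetical helper; check each module's help output for the options it actually accepts.

```python
import shlex
import sys


def build_command(module, config_path):
    """Build an argument list that runs a module with a config file.

    The --config flag is an assumption; verify it against the module's help.
    """
    return [sys.executable, "-m", module, "--config", config_path]


cmd = build_command("nkululeko.nkululeko", "exp.ini")
print(shlex.join(cmd))

# To actually launch the experiment (requires Nkululeko to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when the configuration path contains spaces.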
Target Audience
Nkululeko is designed for:
Speech processing researchers
Machine learning practitioners
Audio application developers
Students learning about speech processing
Anyone without extensive programming knowledge who is interested in detecting speaker characteristics
By providing a high-level interface through configuration files, Nkululeko makes sophisticated audio analysis accessible to users with varying levels of technical expertise.