Architecture and Components

This page describes the architecture and main components of Nkululeko, providing an overview of how the different parts of the system work together.

Main Files and Directories

nkululeko/: Root directory containing the main package code
- nkululeko/nkululeko.py: Main entry point for running experiments
- nkululeko/experiment.py: Core Experiment class implementation
- nkululeko/feature_extractor.py: Handles feature extraction orchestration
- nkululeko/runmanager.py: Manages experiment runs and epochs
- nkululeko/data/: Dataset handling implementations
- nkululeko/feat_extract/: Feature extraction implementations
- nkululeko/models/: Machine learning model implementations
- nkululeko/reporting/: Reporting and visualization tools
- nkululeko/utils/: Utility functions and helpers
- nkululeko/explore.py, nkululeko/predict.py, etc.: Command-line module entry points
tests/: Contains configuration files for testing
data/: Storage for datasets or symbolic links to datasets
.github/workflows/: CI/CD configuration files

Main Classes and Components

Class Diagram

For a quick visual overview of how the core pieces interact, see the class diagram. It mirrors the components described below and can help when navigating the codebase.

Experiment

The central class that manages the entire experiment lifecycle:

# nkululeko/experiment.py
class Experiment:
    def __init__(self, config_obj):
        # Initialize experiment from configuration
    def load_datasets(self):
        # Load datasets specified in configuration
    def fill_train_and_tests(self):
        # Split data into training and testing sets
    def extract_feats(self):
        # Extract features from audio files
    def run(self):
        # Execute the experiment runs

FeatureExtractor

Orchestrates feature extraction from audio:

# nkululeko/feature_extractor.py
class FeatureExtractor:
    def __init__(self, data_df, feats_types, data_name, feats_designation):
        # Initialize feature extractor
    def extract(self):
        # Extract features from audio files
    def extract_sample(self, signal, sr):
        # Extract features from a single audio sample

Runmanager

Manages multiple runs of an experiment:

# nkululeko/runmanager.py
class Runmanager:
    def __init__(self, df_train, df_test, feats_train, feats_test, dev_x=None, dev_y=None):
        # Initialize run manager
    def do_runs(self):
        # Execute multiple experiment runs

Model

Base class for machine learning models:

# nkululeko/models/model.py
class Model:
    def __init__(self, df_train, df_test, feats_train, feats_test):
        # Initialize model
    def train(self):
        # Train the model
    def predict(self):
        # Generate predictions

Reporter

Generates reports and visualizations:

# nkululeko/reporting/reporter.py
class Reporter:
    def __init__(self, truths, preds, run, epoch, probas=None):
        # Initialize reporter
    def plot_confmatrix(self, plot_name, epoch=None):
        # Plot confusion matrix
    def print_results(self, epoch=None, file_name=None):
        # Print evaluation results

Command-Line Modules

Each module provides specific functionality:

nkululeko.nkululeko: Main experiment runner
nkululeko.explore: Data and feature exploration
nkululeko.predict: Unified prediction (single files, folder, CSV list, microphone; feature extractors, autopredict targets, or a trained model)
nkululeko.augment: Data augmentation
nkululeko.ensemble: Model ensemble creation
nkululeko.multidb: Cross-database experimentation
nkululeko.segment: Audio segmentation

Data Flow

The user creates an INI configuration file specifying the experiment parameters
The Experiment class loads the configuration and initializes the experiment
Datasets are loaded and split into training and testing sets
Features are extracted from the audio files using the specified feature extractors
The Runmanager executes multiple runs of the experiment
Models are trained and evaluated
Results are reported and visualized

This architecture allows for a high degree of flexibility and extensibility, enabling users to experiment with different combinations of datasets, features, and models without having to write extensive code.