Architecture and Components
This page describes the architecture and main components of Nkululeko, providing an overview of how the different parts of the system work together.
Main Files and Directories
- nkululeko/: Root directory containing the main package code
- nkululeko/nkululeko.py: Main entry point for running experiments
- nkululeko/experiment.py: Core Experiment class implementation
- nkululeko/feature_extractor.py: Orchestrates feature extraction
- nkululeko/runmanager.py: Manages experiment runs and epochs
- nkululeko/data/: Dataset handling implementations
- nkululeko/feat_extract/: Feature extraction implementations
- nkululeko/models/: Machine learning model implementations
- nkululeko/reporting/: Reporting and visualization tools
- nkululeko/utils/: Utility functions and helpers
- nkululeko/explore.py, nkululeko/predict.py, etc.: Command-line module entry points
- tests/: Configuration files for testing
- data/: Storage for datasets or symbolic links to datasets
- .github/workflows/: CI/CD configuration files
Main Classes and Components
Class Diagram
For a quick visual overview of how the core pieces interact, see the class diagram. It mirrors the components described below and can help when navigating the codebase.
Experiment
The central class that manages the entire experiment lifecycle:
```python
# nkululeko/experiment.py (outline only; method bodies omitted)
class Experiment:
    def __init__(self, config_obj):
        """Initialize the experiment from a configuration."""

    def load_datasets(self):
        """Load the datasets specified in the configuration."""

    def fill_train_and_tests(self):
        """Split the data into training and test sets."""

    def extract_feats(self):
        """Extract features from the audio files."""

    def run(self):
        """Execute the experiment runs."""
```
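The methods above are meant to be called in lifecycle order, matching the data flow described at the end of this page. The sketch below shows that order with a hypothetical stand-in class, `ExperimentSketch`, and a hypothetical driver, `run_experiment`; neither is Nkululeko code, and the stand-in only records which step ran.

```python
class ExperimentSketch:
    """Hypothetical stand-in mirroring the Experiment method names (not Nkululeko code)."""

    def __init__(self, config_obj):
        self.config = config_obj
        self.steps = []  # records the order in which lifecycle steps ran

    def load_datasets(self):
        self.steps.append("load_datasets")

    def fill_train_and_tests(self):
        self.steps.append("fill_train_and_tests")

    def extract_feats(self):
        self.steps.append("extract_feats")

    def run(self):
        self.steps.append("run")
        return self.steps


def run_experiment(config_obj):
    """Hypothetical driver: walks an experiment through its lifecycle."""
    expr = ExperimentSketch(config_obj)
    expr.load_datasets()         # 1. load the databases named in the config
    expr.fill_train_and_tests()  # 2. split into training and test sets
    expr.extract_feats()         # 3. compute features for both splits
    return expr.run()            # 4. execute the runs


print(run_experiment({"EXP": {"name": "demo"}}))
```

Each step depends on the output of the previous one (features cannot be extracted before the split exists), which is why the order is fixed.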
FeatureExtractor
Orchestrates feature extraction from audio:
```python
# nkululeko/feature_extractor.py (outline only; method bodies omitted)
class FeatureExtractor:
    def __init__(self, data_df, feats_types, data_name, feats_designation):
        """Initialize the feature extractor."""

    def extract(self):
        """Extract features from the audio files."""

    def extract_sample(self, signal, sr):
        """Extract features from a single audio sample."""
```
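To illustrate what "orchestrating" several feature types can mean, here is a minimal sketch: each requested type is looked up in a registry, applied per file, and the results are merged into one row per file. The feature functions (`rms_features`, `zcr_features`), the registry `EXTRACTORS`, the class `FeatureExtractorSketch`, and the `{file_id: (signal, sr)}` input format are all hypothetical and are not Nkululeko's extractors.

```python
import numpy as np
import pandas as pd


# Hypothetical feature functions for illustration only (not Nkululeko's
# extractors). Each maps (signal, sr) to a dict of named features.
def rms_features(signal, sr):
    return {"rms": float(np.sqrt(np.mean(signal ** 2)))}


def zcr_features(signal, sr):
    # Fraction of adjacent sample pairs whose sign differs
    crossings = np.count_nonzero(np.diff(np.signbit(signal)))
    return {"zcr": crossings / max(len(signal) - 1, 1)}


EXTRACTORS = {"rms": rms_features, "zcr": zcr_features}


class FeatureExtractorSketch:
    """Hypothetical orchestrator: runs each requested feature type per file."""

    def __init__(self, signals, feats_types):
        self.signals = signals          # assumption: {file_id: (signal, sr)}
        self.feats_types = feats_types  # e.g. ["rms", "zcr"]

    def extract_sample(self, signal, sr):
        row = {}
        for feats_type in self.feats_types:
            row.update(EXTRACTORS[feats_type](signal, sr))
        return row

    def extract(self):
        ids = list(self.signals)
        rows = [self.extract_sample(*self.signals[i]) for i in ids]
        return pd.DataFrame(rows, index=ids)  # one row per file


sr = 16000
t = np.arange(sr) / sr
signals = {"tone": (np.sin(2 * np.pi * 440 * t), sr), "silence": (np.zeros(sr), sr)}
feats = FeatureExtractorSketch(signals, ["rms", "zcr"]).extract()
print(feats)
```

Keeping the per-sample logic in `extract_sample` means the same code path can serve both batch extraction and single-signal use.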
Runmanager
Manages multiple runs of an experiment:
```python
# nkululeko/runmanager.py (outline only; method bodies omitted)
class Runmanager:
    def __init__(self, df_train, df_test, feats_train, feats_test, dev_x=None, dev_y=None):
        """Initialize the run manager."""

    def do_runs(self):
        """Execute multiple experiment runs."""
```
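One common reason to repeat runs is to see how much a result varies with random initialization. The sketch below assumes that purpose: a hypothetical `RunmanagerSketch` calls a run function once per seed and reports the mean and standard deviation of the scores. `train_and_evaluate` is a hypothetical placeholder that simulates variation instead of training a real model.

```python
import random
import statistics


def train_and_evaluate(seed):
    """Hypothetical stand-in for one run: returns a test score.

    A real run would train a model on the training features and score it on
    the test features; here the seed only simulates run-to-run variation.
    """
    rng = random.Random(seed)
    return 0.70 + rng.uniform(-0.05, 0.05)


class RunmanagerSketch:
    """Hypothetical run manager: repeats a run and summarizes the scores."""

    def __init__(self, run_fn, runs=3):
        self.run_fn = run_fn
        self.runs = runs

    def do_runs(self):
        # One seed per run, so each run is reproducible on its own
        scores = [self.run_fn(seed=run) for run in range(self.runs)]
        spread = statistics.stdev(scores) if len(scores) > 1 else 0.0
        return statistics.mean(scores), spread


mean, spread = RunmanagerSketch(train_and_evaluate, runs=5).do_runs()
print(f"score {mean:.3f} +/- {spread:.3f}")
```

Reporting the spread alongside the mean makes it easier to tell whether a difference between two configurations is larger than ordinary run-to-run noise.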
Model
Base class for machine learning models:
```python
# nkululeko/models/model.py (outline only; method bodies omitted)
class Model:
    def __init__(self, df_train, df_test, feats_train, feats_test):
        """Initialize the model."""

    def train(self):
        """Train the model."""

    def predict(self):
        """Generate predictions."""
```
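Because `Model` is a base class, concrete models override `train` and `predict`. The sketch below shows that subclassing pattern with a toy nearest-centroid classifier written in NumPy. `ModelSketch`, `NearestCentroidModel`, and the extra `target` parameter (the name of the label column, defaulting to the hypothetical value `"emotion"`) are assumptions for illustration, not Nkululeko's model classes.

```python
import numpy as np
import pandas as pd


class ModelSketch:
    """Hypothetical base class mirroring the Model outline above."""

    def __init__(self, df_train, df_test, feats_train, feats_test, target="emotion"):
        self.df_train, self.df_test = df_train, df_test
        self.feats_train, self.feats_test = feats_train, feats_test
        self.target = target  # assumption: label column in df_train/df_test

    def train(self):
        raise NotImplementedError

    def predict(self):
        raise NotImplementedError


class NearestCentroidModel(ModelSketch):
    """Toy subclass: predicts the class whose mean feature vector is closest."""

    def train(self):
        labels = self.df_train[self.target].to_numpy()
        x = self.feats_train.to_numpy(dtype=float)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack([x[labels == c].mean(axis=0) for c in self.classes_])

    def predict(self):
        x = self.feats_test.to_numpy(dtype=float)
        # Euclidean distance from every test sample to every class centroid
        dists = np.linalg.norm(x[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[dists.argmin(axis=1)]


df_train = pd.DataFrame({"emotion": ["neutral", "neutral", "angry", "angry"]})
feats_train = pd.DataFrame({"rms": [0.1, 0.2, 0.9, 1.0]})
df_test = pd.DataFrame({"emotion": ["neutral", "angry"]})
feats_test = pd.DataFrame({"rms": [0.15, 0.95]})

model = NearestCentroidModel(df_train, df_test, feats_train, feats_test)
model.train()
preds = model.predict()
print(list(preds))
```

The rest of the pipeline only calls `train` and `predict`, so any subclass that implements both can be swapped in.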
Reporter
Generates reports and visualizations:
```python
# nkululeko/reporting/reporter.py (outline only; method bodies omitted)
class Reporter:
    def __init__(self, truths, preds, run, epoch, probas=None):
        """Initialize the reporter."""

    def plot_confmatrix(self, plot_name, epoch=None):
        """Plot the confusion matrix."""

    def print_results(self, epoch=None, file_name=None):
        """Print the evaluation results."""
```
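A reporter like this works from the true and predicted labels. As a sketch of the kind of numbers behind a confusion-matrix plot and a results printout, the hypothetical helpers below build a confusion matrix and compute unweighted average recall (UAR), the mean of per-class recalls, which is common in speech emotion recognition. They are written from scratch in NumPy and are not the `Reporter` implementation.

```python
import numpy as np


def confusion_matrix(truths, preds, labels):
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(truths, preds):
        cm[index[t], index[p]] += 1
    return cm


def unweighted_average_recall(cm):
    """Mean of per-class recalls (diagonal over row sums), skipping empty rows."""
    row_sums = cm.sum(axis=1)
    recalls = np.diag(cm)[row_sums > 0] / row_sums[row_sums > 0]
    return float(recalls.mean())


truths = ["angry", "angry", "neutral", "neutral", "neutral", "happy"]
preds = ["angry", "neutral", "neutral", "neutral", "happy", "happy"]
cm = confusion_matrix(truths, preds, ["angry", "happy", "neutral"])
print(cm)
print(f"UAR: {unweighted_average_recall(cm):.3f}")  # (1/2 + 1/1 + 2/3) / 3
```

Unlike plain accuracy (4/6 here), UAR weights every class equally, so a rare class that is always misclassified pulls the score down noticeably.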
Command-Line Modules
Each module provides specific functionality:
- nkululeko.nkululeko: Main experiment runner
- nkululeko.explore: Data and feature exploration
- nkululeko.predict: Unified prediction (single files, a folder, a CSV list, or microphone input; using feature extractors, autopredict targets, or a trained model)
- nkululeko.augment: Data augmentation
- nkululeko.ensemble: Model ensemble creation
- nkululeko.multidb: Cross-database experimentation
- nkululeko.segment: Audio segmentation
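The modules are run with `python -m <module>`. The main runner takes the experiment file through a `--config` argument; the sketch below assumes the same argument for the other modules, which may not hold for all of them (ensemble, for example, combines several experiments and may expect different arguments), so check each module's documentation. `build_command` and `run_module` are hypothetical helpers.

```python
import subprocess
import sys

# Module names from the list above
MODULES = (
    "nkululeko.nkululeko", "nkululeko.explore", "nkululeko.predict",
    "nkululeko.augment", "nkululeko.ensemble", "nkululeko.multidb",
    "nkululeko.segment",
)


def build_command(module, config_path):
    """Hypothetical helper: argv for running a module on an INI file.

    Assumes the module accepts "--config <file>" like the main runner.
    """
    if module not in MODULES:
        raise ValueError(f"unknown module: {module}")
    return [sys.executable, "-m", module, "--config", config_path]


def run_module(module, config_path):
    # check=True raises CalledProcessError if the module exits non-zero
    return subprocess.run(build_command(module, config_path), check=True)


print(build_command("nkululeko.explore", "exp.ini"))
```

Using `sys.executable` runs the module with the same interpreter (and therefore the same installed Nkululeko) as the calling script.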
Data Flow
1. The user creates an INI configuration file specifying the experiment parameters.
2. The Experiment class loads the configuration and initializes the experiment.
3. Datasets are loaded and split into training and test sets.
4. Features are extracted from the audio files using the specified feature extractors.
5. The Runmanager executes multiple runs of the experiment.
6. Models are trained and evaluated.
7. Results are reported and visualized.
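Step 1 starts from an INI file. The abridged example below is illustrative: the section and key names follow the layout of Nkululeko's example configurations as far as we know, but the values are placeholders and a real experiment needs more settings (such as dataset paths), so verify against the INI documentation. The sketch reads the file with Python's standard `configparser`; list-valued options are written as Python literals, and `ast.literal_eval` is one safe way to turn them into lists.

```python
import ast
import configparser

# Illustrative, abridged experiment file (values are placeholders)
EXAMPLE_INI = """
[EXP]
root = ./results/
name = exp_demo

[DATA]
databases = ['emodb']
target = emotion

[FEATS]
type = ['os']

[MODEL]
type = xgb
"""


def load_config(text):
    config = configparser.ConfigParser()
    config.read_string(text)
    return config


config = load_config(EXAMPLE_INI)
# Parse list-valued options without executing arbitrary code
databases = ast.literal_eval(config["DATA"]["databases"])
feature_types = ast.literal_eval(config["FEATS"]["type"])
print(databases, feature_types, config["MODEL"]["type"])
```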
Because datasets, feature extractors, and models are all specified in the configuration file, this architecture is flexible and extensible: users can try different combinations of them without writing extensive code.