nkululeko.explore
The nkululeko.explore module focuses on exploratory data analysis and feature examination without full model training (unless requested). Use it early in an experiment to inspect feature distributions, correlations, and dimensionality-reduction plots.
Features
Feature importance from a quick fit of a tree / xgb model.
Feature distributions per category, with statistical tests such as Mann-Whitney U, t-test, and Levene's test.
Scatter plots (PCA, t-SNE, UMAP) to visualize feature-space structure.
Regression plots (regplot) between feature pairs, for categorical or continuous targets.
Bias / correlation plots between automatically predicted properties and target labels.
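To make the per-category tests more concrete, here is a minimal, standard-library-only sketch of the Mann-Whitney U statistic for one feature compared between two categories. It illustrates the test itself and is not nkululeko's code. A real analysis would normally use a library routine such as scipy.stats.mannwhitneyu, which also returns a p-value (omitted here). The function name and the sample values are hypothetical.

```python
from itertools import chain


def mann_whitney_u(x, y):
    """Return the Mann-Whitney U statistic of sample x versus sample y.

    The pooled samples are ranked (tied values get their average rank),
    then U_x = R_x - n_x * (n_x + 1) / 2, where R_x is the rank sum of x.
    """
    # Tag each value with its group (0 = x, 1 = y) and sort the pooled data.
    pooled = sorted(chain(((v, 0) for v in x), ((v, 1) for v in y)))
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x = len(x)
    return rank_sum_x - n_x * (n_x + 1) / 2


# Hypothetical mean F0 values (Hz) for two emotion categories.
anger = [210.0, 235.5, 198.2, 250.1]
neutral = [120.4, 131.0, 142.7, 125.9]
u = mann_whitney_u(anger, neutral)  # 16.0: every anger value exceeds every neutral value
```

U ranges from 0 to n_x * n_y; values near either end indicate that the feature separates the two categories well, which is what a per-category distribution plot shows visually.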
Invocation
python -m nkululeko.explore --config examples/exp_emodb_explore_features.ini
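When several exploration runs are driven from a Python script (for example, one per INI file), one option is to issue the same command through subprocess. This is a minimal sketch: only the command line comes from this page, and the helper names explore_command and run_explore are hypothetical.

```python
import subprocess
import sys


def explore_command(config_path):
    """Build the argv list for running nkululeko.explore on one INI file."""
    # sys.executable reuses the current interpreter (and its virtualenv),
    # so the installed nkululeko package is the one that gets imported.
    return [sys.executable, "-m", "nkululeko.explore", "--config", config_path]


def run_explore(config_path):
    """Run the explore module; raises CalledProcessError on a non-zero exit."""
    subprocess.run(explore_command(config_path), check=True)
```

Calling run_explore("examples/exp_emodb_explore_features.ini") is equivalent to the shell command above; check=True makes a failed run stop the script instead of passing silently.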
Minimal INI Example
[EXP]
name = results/exp_explore
[DATA]
databases = ['emodb']
emodb = ./data/emodb/emodb.csv
target = emotion
[FEATS]
type = ['praat']
scale = standard
[EXPL]
feature_distributions = all
scatter = ['pca','tsne']
regplot = [['duration','meanF0Hz']]
print_stats = True
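The list-valued options above (databases, type, scatter, regplot) are written as Python literals. A minimal sketch of reading such a file with the standard library, for example to sanity-check an INI before a long run, follows. It assumes the list values are parsed with ast.literal_eval, a common pattern that is not claimed to be nkululeko's exact loading code, and it uses only a subset of the example.

```python
import ast
import configparser

# A subset of the minimal INI example above.
INI_TEXT = """
[DATA]
databases = ['emodb']
target = emotion
[FEATS]
type = ['praat']
[EXPL]
scatter = ['pca','tsne']
regplot = [['duration','meanF0Hz']]
print_stats = True
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# List-valued options are stored as Python literal strings; literal_eval
# converts them to real lists without executing arbitrary code.
databases = ast.literal_eval(config["DATA"]["databases"])
feature_types = ast.literal_eval(config["FEATS"]["type"])
scatter = ast.literal_eval(config["EXPL"]["scatter"])
regplot_pairs = ast.literal_eval(config["EXPL"]["regplot"])

# Boolean flags can be read with the typed getter.
print_stats = config.getboolean("EXPL", "print_stats")
```

A malformed list (for example, a missing quote) makes literal_eval raise ValueError or SyntaxError immediately, which is cheaper to catch here than partway through feature extraction.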
Outputs
Plots are saved to results/<exp>/images/ (feat_importance, feat_dist, regplot_*, tsne.png, pca.png, etc.). Statistical summaries are written to the log output.
Tips
Use max_feats to cap how many features are plotted, which keeps heavy plots manageable on large feature sets.
Enable print_stats to output statistics for the regression plots.
Combine scatter methods (pca, tsne, umap) for complementary views of feature-space structure.
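Which statistics print_stats reports for a regression plot is not specified on this page. As an illustration only, the two quantities a regression plot is usually built on, the Pearson correlation and the least-squares line for a feature pair such as duration vs. meanF0Hz, can be computed as below. The function names and the data are hypothetical, and this is not nkululeko's implementation.

```python
from math import sqrt


def _centered_sums(x, y):
    """Return the means and centered sums of squares/products of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between two features."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant feature has no defined correlation")
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Slope and intercept of the ordinary least-squares fit y = slope * x + b."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    return slope, my - slope * mx
```

The sign and magnitude of r indicate how strongly the two features move together. The fitted line is the trend a regression plot draws through the scatter of points.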