streamline.postanalysis.dataset_compare module

class streamline.postanalysis.dataset_compare.CompareJob(output_path=None, experiment_name=None, experiment_path=None, algorithms=None, exclude=('XCS', 'eLCS'), class_label='Class', instance_label=None, sig_cutoff=0.05, show_plots=False)[source]

Bases: Job

This Job class is called by DataCompareMain.py. It runs non-parametric statistical analyses comparing ML algorithm performance between all target datasets included in the original Phase 1 data folder, for each evaluation metric. It also compares the best overall model for each target dataset, for each evaluation metric. This job runs once for the entire pipeline analysis.
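
For orientation, the job can also be constructed and run directly. A minimal usage sketch follows; the output path and experiment name are hypothetical, and the call assumes the earlier pipeline phases have already populated the experiment folder:

    from streamline.postanalysis.dataset_compare import CompareJob

    # Hypothetical path and experiment name; outputs of the earlier
    # pipeline phases are assumed to exist under output_path/experiment_name.
    job = CompareJob(
        output_path="./results",
        experiment_name="demo_experiment",
        class_label="Class",
        sig_cutoff=0.05,
        show_plots=False,
    )
    job.run()  # runs all dataset-comparison statistics and plots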

best_kruscall_wallis()[source]

For the best-performing algorithm on a given metric and dataset, apply the non-parametric Kruskal-Wallis one-way ANOVA on ranks. Determines whether there is a statistically significant difference in performance between the original target datasets across CV runs for the best algorithm on a given metric.
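
As a sketch of the underlying statistic, the same test can be run with scipy.stats.kruskal on made-up per-fold scores (whether this method calls SciPy internally is an assumption):

    from scipy import stats

    # Made-up balanced-accuracy scores for the best algorithm on each
    # target dataset, one value per CV fold.
    dataset_a = [0.81, 0.79, 0.83, 0.80, 0.82]
    dataset_b = [0.70, 0.72, 0.69, 0.71, 0.73]
    dataset_c = [0.80, 0.78, 0.82, 0.79, 0.81]

    # One-way ANOVA on ranks across all datasets at once.
    statistic, p_value = stats.kruskal(dataset_a, dataset_b, dataset_c)
    print(p_value < 0.05)  # True here: performance differs by dataset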

best_mann_whitney_u(global_data)[source]

For the best-performing algorithm on a given metric and dataset, apply the non-parametric Mann-Whitney U test (pairwise comparisons). The test compares dataset pairs (for each metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic is zero when all scores from one set are larger than those of the other.
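
The zero-statistic behavior is easy to see with scipy.stats.mannwhitneyu on made-up, non-overlapping score sets (whether this method calls SciPy internally is an assumption):

    from scipy import stats

    # Made-up per-CV-fold scores for one dataset pair; every score in
    # scores_b is below every score in scores_a.
    scores_a = [0.90, 0.92, 0.91, 0.93, 0.94]
    scores_b = [0.60, 0.62, 0.61, 0.63, 0.64]

    u_stat, p_value = stats.mannwhitneyu(scores_b, scores_a)
    print(u_stat)   # 0.0: the two samples do not overlap at all
    print(p_value)  # ~0.008 (two-sided), significant at sig_cutoff=0.05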

best_wilcoxon_rank(global_data)[source]

For the best-performing algorithm on a given metric and dataset, apply the non-parametric Wilcoxon rank test (pairwise comparisons). The test compares dataset pairs (for each metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic is zero when all scores from one set are larger than those of the other.
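
A paired sketch with scipy.stats.wilcoxon on made-up per-fold scores follows; which Wilcoxon variant (signed-rank vs. rank-sum) the class uses internally is an assumption:

    from scipy import stats

    # Made-up scores for the best algorithm on two datasets, paired by CV fold.
    scores_a = [0.88, 0.90, 0.87, 0.91, 0.89]
    scores_b = [0.85, 0.85, 0.81, 0.83, 0.80]

    # Signed-rank test on the paired per-fold differences.
    statistic, p_value = stats.wilcoxon(scores_a, scores_b)
    print(statistic)  # 0.0: every per-fold difference has the same sign
    print(p_value)    # 0.0625: with only 5 folds, the exact two-sided
                      # p-value cannot drop below this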

data_compare_bp()[source]

Generate a boxplot comparing average algorithm performance (for a given target metric) across all target datasets.

data_compare_bp_all()[source]

Generate a boxplot comparing algorithm performance (CV average of each target metric) across all target datasets.
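
The kind of figure both boxplot methods produce can be approximated with matplotlib; the datasets, scores, and metric label below are illustrative only:

    import matplotlib.pyplot as plt

    # Made-up CV-averaged metric scores, one value per algorithm, per dataset.
    data = {
        "dataset_a": [0.81, 0.74, 0.79, 0.83],
        "dataset_b": [0.71, 0.69, 0.73, 0.70],
    }

    fig, ax = plt.subplots()
    ax.boxplot(list(data.values()))
    ax.set_xticks(range(1, len(data) + 1), labels=list(data.keys()))
    ax.set_ylabel("Balanced Accuracy (CV average)")  # example target metric
    fig.savefig("data_compare_boxplot.png")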

inter_set_best_fn(fn, global_data)[source]
inter_set_fn(fn, algorithm)[source]
kruscall_wallis()[source]

For each algorithm, apply the non-parametric Kruskal-Wallis one-way ANOVA on ranks. Determines whether there is a statistically significant difference in performance between the original target datasets across CV runs. Completed for each standard metric separately.

mann_whitney_u()[source]

For each algorithm, apply the non-parametric Mann-Whitney U test (pairwise comparisons). The test compares dataset pairs (for each metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic is zero when all scores from one set are larger than those of the other.

run()[source]
save_runtime()[source]

Save phase runtime.

temp_summary(set1, set2, x, y, metric, fn)[source]
wilcoxon_rank()[source]

For each algorithm, apply the non-parametric Wilcoxon rank-sum test (pairwise comparisons). This tests pairs of original target datasets (for each algorithm and metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic is zero when all scores from one set are larger than those of the other.