streamline.postanalysis.dataset_compare module
- class streamline.postanalysis.dataset_compare.CompareJob(output_path=None, experiment_name=None, experiment_path=None, algorithms=None, exclude=('XCS', 'eLCS'), class_label='Class', instance_label=None, sig_cutoff=0.05, show_plots=False)[source]
Bases:
Job
This ‘Job’ script is called by DataCompareMain.py. It runs non-parametric statistical analyses comparing ML algorithm performance between all target datasets included in the original Phase 1 data folder, for each evaluation metric. It also compares the best overall model for each target dataset, for each evaluation metric. This job runs once for the entire pipeline analysis.
- best_kruscall_wallis()[source]
For the best performing algorithm on a given metric and dataset, apply the non-parametric Kruskal-Wallis one-way ANOVA on ranks. Determines whether there is a statistically significant difference in performance between the original target datasets across CV runs for the best algorithm on the given metric.
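A minimal sketch of the underlying test using `scipy.stats.kruskal`; the dataset names and CV scores below are hypothetical stand-ins for the per-CV-run metric values the pipeline collects, not StreamLine's actual data:

```python
from scipy import stats

# Hypothetical CV scores of the best algorithm on one metric,
# one list per target dataset (illustrative values only).
best_cv_scores = {
    "data_1": [0.82, 0.79, 0.85, 0.81, 0.80],
    "data_2": [0.70, 0.74, 0.69, 0.72, 0.71],
    "data_3": [0.78, 0.77, 0.80, 0.76, 0.79],
}

# Kruskal-Wallis one-way ANOVA on ranks across all datasets at once.
stat, p = stats.kruskal(*best_cv_scores.values())

sig_cutoff = 0.05  # default significance cutoff from the class signature
significant = p < sig_cutoff
```

With `data_2` clearly below the other two groups, the test reports a significant difference at the default `sig_cutoff` of 0.05.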
- best_mann_whitney_u(global_data)[source]
For the best performing algorithm on a given metric and dataset, apply the non-parametric Mann-Whitney U test (pairwise comparisons). This tests dataset pairs (for each metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic will be zero if all scores from one set are larger than those of the other.
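A sketch of the pairwise comparison using `scipy.stats.mannwhitneyu` (dataset names and scores are hypothetical). The smaller of U and its complement is zero exactly when every score in one sample exceeds every score in the other, which is the zero-statistic case the docstring describes:

```python
from itertools import combinations
from scipy import stats

# Hypothetical CV metric values for two target datasets.
cv_scores = {
    "data_1": [0.82, 0.79, 0.85, 0.81, 0.80],
    "data_2": [0.70, 0.74, 0.69, 0.72, 0.71],
}

results = {}
for (name_a, a), (name_b, b) in combinations(cv_scores.items(), 2):
    u, p = stats.mannwhitneyu(a, b, alternative="two-sided")
    # min(U, n_a*n_b - U) is 0 when the two samples do not overlap at all.
    u_min = min(u, len(a) * len(b) - u)
    results[(name_a, name_b)] = (u_min, p)
```

Here every `data_1` score exceeds every `data_2` score, so the minimal U statistic is 0 and the pair is flagged as significantly different.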
- best_wilcoxon_rank(global_data)[source]
For the best performing algorithm on a given metric and dataset, apply the non-parametric Wilcoxon Rank Sum test (pairwise comparisons). This tests dataset pairs (for each metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic will be zero if all scores from one set are larger than those of the other.
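The same pairwise idea can be sketched with the Wilcoxon rank-sum test via `scipy.stats.ranksums` (values are hypothetical; note that `ranksums` reports a z-statistic rather than a U value):

```python
from scipy import stats

# Hypothetical CV scores of the best algorithm on one metric,
# for one pair of target datasets.
scores_a = [0.82, 0.79, 0.85, 0.81, 0.80]
scores_b = [0.70, 0.74, 0.69, 0.72, 0.71]

# Wilcoxon rank-sum test for two independent samples.
z, p = stats.ranksums(scores_a, scores_b)
significant = p < 0.05
```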
- data_compare_bp()[source]
Generate a boxplot comparing average algorithm performance (for a given target metric) across all target datasets to be compared.
- data_compare_bp_all()[source]
Generate a boxplot comparing algorithm performance (CV average of each target metric) across all target datasets to be compared.
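A hedged sketch of how such a comparison boxplot might be produced with matplotlib; the dataset names, metric, metric values, and output filename are all illustrative, not StreamLine's actual output paths:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, mirroring show_plots=False behavior
import matplotlib.pyplot as plt

# Hypothetical CV metric values per target dataset.
metric_by_dataset = {
    "data_1": [0.82, 0.79, 0.85, 0.81, 0.80],
    "data_2": [0.70, 0.74, 0.69, 0.72, 0.71],
}

fig, ax = plt.subplots()
# One box per target dataset, placed at positions 1..n by default.
ax.boxplot(list(metric_by_dataset.values()))
ax.set_xticks(range(1, len(metric_by_dataset) + 1))
ax.set_xticklabels(metric_by_dataset.keys())
ax.set_ylabel("Balanced Accuracy")  # illustrative metric name
fig.savefig("data_compare_boxplot.png")
plt.close(fig)
```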
- kruscall_wallis()[source]
For each algorithm, apply the non-parametric Kruskal-Wallis one-way ANOVA on ranks. Determines whether there is a statistically significant difference in performance between the original target datasets across CV runs. Completed for each standard metric separately.
- mann_whitney_u()[source]
For each algorithm, apply the non-parametric Mann-Whitney U test (pairwise comparisons). This tests dataset pairs (for each metric) to determine whether there is a statistically significant difference in performance across CV runs. The test statistic will be zero if all scores from one set are larger than those of the other.
- wilcoxon_rank()[source]
For each algorithm, apply the non-parametric Wilcoxon Rank Sum test (pairwise comparisons). This tests pairs of original target datasets (for each metric) for an individual algorithm to determine whether there is a statistically significant difference in performance across CV runs. The test statistic will be zero if all scores from one set are larger than those of the other.