streamline.postanalysis.statistics module

class streamline.postanalysis.statistics.StatsJob(full_path, algorithms, class_label, instance_label, scoring_metric='balanced_accuracy', cv_partitions=5, top_features=40, sig_cutoff=0.05, metric_weight='balanced_accuracy', scale_data=True, exclude_plots=None, show_plots=False)[source]

Bases: Job

This ‘Job’ script creates summaries of ML classification evaluation statistics for a single dataset, including:

  • summaries of evaluation metrics (means and standard deviations) over all CV runs,

  • ROC and PRC plots comparing CV performance within each ML algorithm and comparing average performance between ML algorithms,

  • model feature importance averages over CV runs,

  • boxplots comparing ML algorithms on each metric,

  • Kruskal-Wallis and Mann-Whitney statistical comparisons between ML algorithms,

  • model feature importance boxplots for each algorithm, and

  • composite feature importance plots summarizing model feature importance across all ML algorithms.

It is run for a single dataset from the original target dataset folder (data_path) specified in Phase 1, i.e. the stats summary is completed over all CV datasets. (A usage sketch follows the parameter list below.)

Parameters:
  • full_path

  • algorithms

  • class_label

  • instance_label

  • scoring_metric

  • cv_partitions

  • top_features

  • sig_cutoff

  • metric_weight

  • scale_data

  • exclude_plots

  • show_plots
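
A minimal usage sketch, assuming the constructor signature above; the dataset path and algorithm names here are placeholders, not values prescribed by the pipeline:

    from streamline.postanalysis.statistics import StatsJob

    # Hypothetical values for illustration; full_path should point to the
    # output folder for a single target dataset from earlier pipeline phases.
    job = StatsJob(
        full_path='output/demo_dataset',
        algorithms=['Logistic Regression', 'Random Forest'],
        class_label='Class',
        instance_label='InstanceID',
        scoring_metric='balanced_accuracy',
        cv_partitions=5,
        top_features=40,
        sig_cutoff=0.05,
        metric_weight='balanced_accuracy',
        scale_data=True,
        exclude_plots=None,
        show_plots=False,
    )
    job.run()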

composite_fi_plot(fi_list, all_feature_list_to_viz, fig_name, y_label_text, metric_ranking, metric_weighting)[source]

Generate composite feature importance plot given list of feature names and associated feature importance scores for each algorithm. This is run for different transformations of the normalized feature importance scores.
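
Conceptually, the composite plot is a stacked bar chart with one segment per algorithm for each feature. A minimal matplotlib sketch of that idea (the data and names here are illustrative, not the actual implementation):

    import numpy as np
    import matplotlib.pyplot as plt

    # Illustrative inputs: one normalized importance vector per algorithm,
    # aligned to a shared list of top feature names.
    features = ['f1', 'f2', 'f3']
    fi_by_alg = {'LR': np.array([0.5, 0.3, 0.2]),
                 'RF': np.array([0.4, 0.4, 0.2])}

    bottom = np.zeros(len(features))
    for alg, scores in fi_by_alg.items():
        plt.bar(features, scores, bottom=bottom, label=alg)  # stack segments
        bottom += scores
    plt.ylabel('Normalized feature importance')
    plt.legend()
    plt.show()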

do_fi_boxplots(fi_df_list, fi_med_list, metric_ranking)[source]

Generate individual feature importance boxplot for each algorithm

do_fi_histogram(fi_med_list, metric_ranking)[source]

Generate histogram showing distribution of median feature importance scores for each algorithm.

do_model_prc(algorithm, precs, praucs, mean_recall, alg_result_table, rep_data=None, replicate=False)[source]
do_model_roc(algorithm, tprs, aucs, mean_fpr, alg_result_table)[source]
do_plot_prc(result_table, rep_data=None, replicate=False)[source]

Generate PRC plot comparing average ML algorithm performance (over all CV training/testing sets)

do_plot_roc(result_table)[source]

Generate ROC plot comparing average ML algorithm performance (over all CV training/testing sets)
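
The tprs/aucs/mean_fpr parameters of do_model_roc suggest the standard pattern of interpolating each CV fold's ROC curve onto a common FPR grid and averaging. A self-contained sketch of that pattern with dummy fold predictions (an assumption about the approach, not the exact implementation):

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    rng = np.random.default_rng(0)
    mean_fpr = np.linspace(0, 1, 100)
    tprs, aucs = [], []
    for _ in range(5):  # one ROC curve per CV fold (dummy predictions)
        y_true = rng.integers(0, 2, 100)
        y_prob = np.clip(y_true * 0.6 + rng.random(100) * 0.5, 0, 1)
        fpr, tpr, _ = roc_curve(y_true, y_prob)
        tprs.append(np.interp(mean_fpr, fpr, tpr))  # common FPR grid
        tprs[-1][0] = 0.0
        aucs.append(auc(fpr, tpr))
    mean_tpr = np.mean(tprs, axis=0)
    mean_tpr[-1] = 1.0
    mean_auc = auc(mean_fpr, mean_tpr)  # AUC of the averaged curve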

fi_stats(metric_dict)[source]
static frac_fi(top_fi_med_norm_list)[source]

Transforms feature importance scores so that they sum to 1 across all features for a given algorithm. This way the normalized and fractionated composite bar plot gives every algorithm the same total bar area. The intuition is that if an algorithm gives the same FI score to all top features, it won’t be over-represented in the resulting plot (i.e. all features could otherwise carry the same maximum feature importance, which might give the impression that an algorithm is working better than it is). Instead, that maximum ‘bar real estate’ is divided among the total number of features. Notably, this transformation can alter the ranking of features by total FI bar height across algorithms.
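
A sketch of the transformation described above, assuming one score vector per algorithm (the helper name is illustrative):

    import numpy as np

    def frac_fi_sketch(top_fi_med_norm_list):
        # Rescale each algorithm's score vector so its entries sum to 1,
        # giving every algorithm the same total bar area in the plot.
        frac_lists = []
        for scores in top_fi_med_norm_list:
            scores = np.asarray(scores, dtype=float)
            total = scores.sum()
            frac_lists.append(scores / total if total > 0 else scores)
        return frac_lists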

get_fi_to_viz_sorted(features_to_viz, all_feature_list, fi_med_norm_list)[source]

Takes a list of top feature names selected for visualization and gets their indexes. In every composite FI plot, features are ordered the same way they were selected for visualization (i.e. by normalized and performance-weighted importance). Because of this, feature bars appear in perfect descending order only in the normalized + performance-weighted composite plot.

kruskal_wallis(metrics, metric_dict)[source]

Apply the non-parametric Kruskal-Wallis one-way ANOVA on ranks to determine whether there is a statistically significant difference in algorithm performance across CV runs. Completed separately for each standard metric.
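
This corresponds to the standard SciPy test; a minimal sketch with hypothetical balanced-accuracy scores per CV run for three algorithms:

    from scipy import stats

    lr = [0.81, 0.79, 0.83, 0.80, 0.82]
    rf = [0.85, 0.86, 0.84, 0.88, 0.85]
    nb = [0.74, 0.76, 0.73, 0.75, 0.77]

    statistic, p_value = stats.kruskal(lr, rf, nb)
    significant = p_value < 0.05  # compared against sig_cutoff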

mann_whitney_u(metrics, metric_dict, kruskal_summary)[source]

Apply the non-parametric Mann-Whitney U test (pairwise comparisons). If a significant Kruskal-Wallis difference between algorithms was found for a given metric, Mann-Whitney tests individual algorithm pairs to determine whether there is a statistically significant difference in performance across CV runs. The test statistic is zero when all scores in one set are larger than all scores in the other.
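
A minimal SciPy sketch with hypothetical CV scores for one algorithm pair (run only after a significant Kruskal-Wallis result):

    from scipy import stats

    lr = [0.81, 0.79, 0.83, 0.80, 0.82]
    rf = [0.85, 0.86, 0.84, 0.88, 0.85]

    u_statistic, p_value = stats.mannwhitneyu(lr, rf)
    # Here every rf score exceeds every lr score, so u_statistic is 0.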

metric_boxplots(metrics, metric_dict)[source]

Export boxplots comparing algorithm performance for each standard metric

parse_runtime()[source]

Loads runtime summaries from the entire pipeline and parses them into a single summary file.

prep_fi(metric_dict, metric_ranking, metric_weighting)[source]

Organizes and prepares model feature importance data for boxplot and composite feature importance figure generation.

preparation()[source]

Creates the directory for all results files and decodes the list of ML modeling algorithms that were run.

primary_stats(master_list=None, rep_data=None)[source]

Combine classification metrics and model feature importance scores as well as ROC and PRC plot data across all CV datasets. Generate ROC and PRC plots comparing separate CV models for each individual modeling algorithm.

run()[source]
save_fi(fi_all, algorithm, global_feature_list)[source]

Creates directory to store model feature importance results and, for each algorithm, exports a file of feature importance scores from each CV.

save_metric_stats(metrics, metric_dict)[source]

Exports a CSV file with mean, median, and standard deviation metric values (over all CVs) for each ML modeling algorithm.
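
A sketch of the kind of summary this produces, using pandas with hypothetical per-CV scores (the output filename is illustrative):

    import pandas as pd

    scores = pd.DataFrame({'Balanced Accuracy': [0.81, 0.79, 0.83, 0.80, 0.82]})
    summary = scores.agg(['mean', 'median', 'std'])
    summary.to_csv('metric_summary.csv')  # one row each for mean/median/std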

save_runtime()[source]

Save phase runtime

select_for_composite_viz(non_zero_union_features, non_zero_union_indexes, ave_metric_list, fi_ave_norm_list)[source]

Identify the list of top features across all algorithms to visualize. Note that the best features to visualize are chosen using algorithm performance weighting and normalization; the fractionated transformation plays no useful role here and is applied only for visualization. All features are included if there are fewer than ‘top_model_features’. Top features are determined by the sum of performance-weighted (i.e. balanced accuracy weighted) feature importance over all algorithms.
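
A sketch of the selection rule described above, with hypothetical normalized FI vectors and mean metric values (names are illustrative):

    import numpy as np

    fi_ave_norm_list = [np.array([0.9, 0.5, 0.1]),   # algorithm 1
                        np.array([0.7, 0.6, 0.3])]   # algorithm 2
    ave_metric_list = [0.82, 0.88]                   # e.g. mean balanced accuracy
    top_model_features = 2

    # Sum performance-weighted importance over algorithms, then take the top features.
    weighted_sum = sum(w * fi for w, fi in zip(ave_metric_list, fi_ave_norm_list))
    top_indexes = np.argsort(weighted_sum)[::-1][:top_model_features]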

static weight_fi(med_metric_list, top_fi_med_norm_list)[source]

Weights the feature importance scores by algorithm performance (intuitive because, when interpreting feature importances, we want to place more weight on better-performing algorithms).
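
A sketch of this weighting, assuming one median metric value and one normalized FI vector per algorithm (the helper name is illustrative):

    import numpy as np

    def weight_fi_sketch(med_metric_list, top_fi_med_norm_list):
        # Scale each algorithm's FI vector by that algorithm's median
        # performance metric (e.g. balanced accuracy).
        return [np.asarray(fi, dtype=float) * metric
                for metric, fi in zip(med_metric_list, top_fi_med_norm_list)]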

static weight_frac_fi(frac_lists, weights)[source]

Weight normalized and fractionated feature importances.

wilcoxon_rank(metrics, metric_dict, kruskal_summary)[source]

Apply the non-parametric Wilcoxon signed-rank test (pairwise comparisons). If a significant Kruskal-Wallis difference between algorithms was found for a given metric, Wilcoxon tests individual algorithm pairs to determine whether there is a statistically significant difference in performance across CV runs. The test statistic is zero when one algorithm scores higher on every CV partition.
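
A minimal SciPy sketch with hypothetical paired CV scores (run only after a significant Kruskal-Wallis result):

    from scipy import stats

    lr = [0.81, 0.79, 0.83, 0.80, 0.82]
    rf = [0.85, 0.86, 0.84, 0.88, 0.85]

    w_statistic, p_value = stats.wilcoxon(lr, rf)
    # rf scores higher on every CV partition here, so w_statistic is 0.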