streamline.runners.replicate_runner module

class streamline.runners.replicate_runner.ReplicationRunner(rep_data_path, dataset_for_rep, output_path, experiment_name, class_label=None, instance_label=None, match_label=None, algorithms=None, load_algo=True, exclude=('XCS', 'eLCS'), exclude_plots=None, run_cluster=False, queue='defq', reserved_memory=4, show_plots=False)

Bases: object

Phase 9 of STREAMLINE (optional). This ‘Main’ script manages the Phase 9 run parameters and submits jobs to run either locally (serially) or on a cluster (in parallel).

Parameters:
  • rep_data_path – path to the directory containing the replication or hold-out testing datasets (each must include at least all features of the original training dataset, with the same labels)

  • dataset_for_rep – path to target original training dataset

  • output_path – path to output directory

  • experiment_name – name of experiment (no spaces)

  • match_label – applies if original training data included column with matched instance ids, default=None

  • exclude_plots – analyses to exclude from outputs, default=None. Possible options:

      • export_feature_correlations – run and export feature correlation analysis (yields correlation heatmap), default=True

      • plot_roc – plot ROC curves individually for each algorithm, including all CV results and averages, default=True

      • plot_prc – plot PRC curves individually for each algorithm, including all CV results and averages, default=True

      • plot_metric_boxplots – plot box plot summaries comparing algorithms for each metric, default=True
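As an illustration of how the parameters above fit together, the following is a minimal sketch of a helper that assembles constructor keyword arguments and checks the two constraints stated in the list (no spaces in experiment_name, and exclude_plots limited to the documented option names). The helper build_replication_kwargs, the KNOWN_PLOT_OPTIONS set, and all paths are hypothetical and not part of STREAMLINE. It also assumes that exclude_plots is passed as a list of option names spelled as on this page, which the library may handle differently.

```python
# Option names as listed in the exclude_plots description on this page.
KNOWN_PLOT_OPTIONS = {
    "export_feature_correlations",
    "plot_roc",
    "plot_prc",
    "plot_metric_boxplots",
}


def build_replication_kwargs(rep_data_path, dataset_for_rep, output_path,
                             experiment_name, exclude_plots=None, **extra):
    """Hypothetical helper: collect keyword arguments for ReplicationRunner."""
    # The docs require an experiment name without spaces.
    if any(ch.isspace() for ch in experiment_name):
        raise ValueError("experiment_name must not contain spaces")

    # Assumption: exclude_plots is a list of the documented option names.
    if exclude_plots is not None:
        unknown = sorted(set(exclude_plots) - KNOWN_PLOT_OPTIONS)
        if unknown:
            raise ValueError(f"unknown exclude_plots options: {unknown}")
        exclude_plots = list(exclude_plots)

    kwargs = {
        "rep_data_path": rep_data_path,
        "dataset_for_rep": dataset_for_rep,
        "output_path": output_path,
        "experiment_name": experiment_name,
        "exclude_plots": exclude_plots,
    }
    # Any other documented constructor arguments (class_label, queue, ...).
    kwargs.update(extra)
    return kwargs


# Hypothetical paths and names, for illustration only.
kwargs = build_replication_kwargs(
    rep_data_path="data/replication",
    dataset_for_rep="data/train/original.csv",
    output_path="output",
    experiment_name="demo_experiment",
    exclude_plots=["plot_roc", "plot_prc"],
    class_label="Class",
)
```

The resulting dictionary could then be unpacked into the constructor as ReplicationRunner(**kwargs), provided STREAMLINE is installed and the paths exist.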

get_algorithms()
get_cluster_params(dataset_filename)
run(run_parallel=False)
save_metadata()
submit_lsf_cluster_job(dataset_filename)
submit_slurm_cluster_job(dataset_filename)
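A typical call sequence is to construct the runner and then invoke run(). The sketch below wraps that in a small function. run_replication and its runner_cls argument are hypothetical conveniences (the latter lets a stand-in class be substituted when STREAMLINE is not installed), and only the constructor and the run(run_parallel=False) signature shown on this page are relied upon. How run_parallel interacts with run_cluster is not documented here, so the sketch simply forwards the flag.

```python
try:
    # Import path as given in the module name at the top of this page.
    from streamline.runners.replicate_runner import ReplicationRunner
except ImportError:
    ReplicationRunner = None  # STREAMLINE is not installed


def run_replication(runner_kwargs, run_parallel=False, runner_cls=None):
    """Hypothetical wrapper: build a runner and start Phase 9."""
    cls = runner_cls or ReplicationRunner
    if cls is None:
        raise RuntimeError("STREAMLINE is not installed")

    # Constructor arguments follow the class signature documented above.
    runner = cls(**runner_kwargs)

    # run() accepts run_parallel (default False) per the method listing.
    runner.run(run_parallel=run_parallel)
    return runner
```

With STREAMLINE installed, a call such as run_replication(kwargs), where kwargs holds the documented constructor arguments, would start a run with the default run_parallel=False.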