streamline.runners.feature_runner module

class streamline.runners.feature_runner.FeatureImportanceRunner(output_path, experiment_name, class_label='Class', instance_label=None, instance_subset=None, algorithms=('MI', 'MS'), use_turf=True, turf_pct=True, random_state=None, n_jobs=None, run_cluster=False, queue='defq', reserved_memory=4)[source]

Bases: object

Runner class that executes feature importance jobs for each cross-validation split.

Parameters:
  • output_path – path to the output folder

  • experiment_name – name for the current experiment

  • class_label – name of the class/outcome column, default='Class'

  • instance_label – name of the instance identifier column, if one exists, default=None

  • instance_subset

  • algorithms – feature importance algorithms to run, default=('MI', 'MS')

  • use_turf

  • turf_pct

  • random_state – random seed for reproducibility

  • n_jobs – n_jobs param for multiprocessing

Returns: None

get_cluster_params(cv_train_path, experiment_path, algorithm)[source]
run(run_parallel=False)[source]
save_metadata()[source]
submit_lsf_cluster_job(cv_train_path, experiment_path, algorithm)[source]
submit_slurm_cluster_job(cv_train_path, experiment_path, algorithm)[source]
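
Example (a minimal usage sketch; the output path, experiment name, and seed below are illustrative, and the experiment directory is assumed to already contain the cross-validation splits produced by the earlier data-processing phase):

    from streamline.runners.feature_runner import FeatureImportanceRunner

    # Illustrative values; adjust to an existing STREAMLINE experiment folder.
    fi_runner = FeatureImportanceRunner(
        output_path="DemoOutput",           # hypothetical output folder
        experiment_name="demo_experiment",  # hypothetical experiment name
        class_label="Class",
        instance_label=None,
        algorithms=("MI", "MS"),            # default algorithm codes from the constructor
        random_state=42,
        n_jobs=1,
    )
    fi_runner.run(run_parallel=False)  # serial run; True presumably enables local multiprocessing
    fi_runner.save_metadata()
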
class streamline.runners.feature_runner.FeatureSelectionRunner(output_path, experiment_name, algorithms, class_label='Class', instance_label=None, max_features_to_keep=2000, filter_poor_features=True, top_features=40, export_scores=True, overwrite_cv=True, random_state=None, n_jobs=None, run_cluster=False, queue='defq', reserved_memory=4, show_plots=False)[source]

Bases: object

Runner class that executes feature selection jobs for each cross-validation split.

Parameters:
  • output_path – path to the output folder

  • experiment_name – name for the current experiment

  • algorithms – feature importance algorithms run in the previous phase

  • max_features_to_keep – maximum number of features to keep (only applies if filter_poor_features is True), default=2000

  • filter_poor_features – filter out the worst-performing features prior to modeling, default=True

  • top_features – number of top features to illustrate in figures, default=40

  • export_scores – export a figure summarizing average feature importance scores over CV partitions, default=True

  • overwrite_cv – overwrite the working CV datasets with the new feature-subset datasets, default=True

  • random_state – random seed for reproducibility

  • n_jobs – n_jobs param for multiprocessing

Returns: None

get_cluster_params(full_path, n_datasets)[source]
run(run_parallel=False)[source]
save_metadata()[source]
submit_lsf_cluster_job(full_path, n_datasets)[source]
submit_slurm_cluster_job(full_path, n_datasets)[source]
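
Example (a minimal usage sketch; values are illustrative, and the algorithms tuple is assumed to match the feature importance algorithms run in the previous phase):

    from streamline.runners.feature_runner import FeatureSelectionRunner

    # Illustrative values; the experiment folder is assumed to hold the
    # feature importance results from the previous phase.
    fs_runner = FeatureSelectionRunner(
        output_path="DemoOutput",
        experiment_name="demo_experiment",
        algorithms=("MI", "MS"),
        class_label="Class",
        max_features_to_keep=2000,
        filter_poor_features=True,
        top_features=40,
        export_scores=True,
        overwrite_cv=True,
        random_state=42,
        n_jobs=1,
    )
    fs_runner.run(run_parallel=False)
    fs_runner.save_metadata()
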