streamline.runners.model_runner module
- class streamline.runners.model_runner.ModelExperimentRunner(output_path, experiment_name, algorithms=None, exclude=('XCS', 'eLCS'), class_label='Class', instance_label=None, scoring_metric='balanced_accuracy', metric_direction='maximize', training_subsample=0, use_uniform_fi=True, n_trials=200, timeout=900, save_plots=False, do_lcs_sweep=False, lcs_nu=1, lcs_n=2000, lcs_iterations=200000, lcs_timeout=1200, resubmit=False, random_state=None, n_jobs=None, run_cluster=False, queue='defq', reserved_memory=4)[source]
Bases: object
Runner Class for running all the model jobs for cross-validation splits.
- Parameters:
output_path – path to output directory
experiment_name – name of experiment (no spaces)
algorithms – list of str names of ML algorithms to run
scoring_metric – primary scikit-learn specified scoring metric used for hyperparameter optimization and permutation-based model feature importance evaluation, default=’balanced_accuracy’
metric_direction – direction to optimize the scoring metric in optuna, either ‘maximize’ or ‘minimize’, default=’maximize’
training_subsample – for long-running algorithms (XGB, SVM, ANN, KNN), option to subsample the training set (0 for no subsampling), default=0
use_uniform_fi – overrides the use of any model-specific feature importance estimates, instead uniformly using permutation_importance, default=True
n_trials – number of Bayesian hyperparameter optimization trials run with Optuna (specify an integer or None), default=200
timeout – seconds until the hyperparameter sweep stops running new trials (note: it may run longer to finish the last trial started); if set to None, STREAMLINE is completely replicable but will take longer to run, default=900 (i.e., 15 minutes)
save_plots – export Optuna-generated hyperparameter sweep plots, default=False
do_lcs_sweep – perform LCS hyperparameter tuning rather than using the fixed LCS parameters below, default=False
lcs_nu – fixed LCS nu parameter (recommended range 1-10); set to a larger value for data with little or no noise, default=1
lcs_n – fixed LCS maximum rule population size parameter, default=2000
lcs_iterations – fixed number of LCS learning iterations, default=200000
lcs_timeout – seconds until the hyperparameter sweep stops for LCS algorithms, default=1200
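A minimal usage sketch is shown below. It assumes the earlier STREAMLINE phases have already populated the experiment directory; the paths, experiment name, algorithm names, and the run(run_parallel=...) entry point are illustrative assumptions and may differ in your STREAMLINE version.

```python
# Usage sketch (assumed API): construct the runner for an existing STREAMLINE
# experiment directory, then launch the cross-validation modeling jobs.
from streamline.runners.model_runner import ModelExperimentRunner

model_runner = ModelExperimentRunner(
    output_path="./output",               # output directory created by earlier phases (assumed)
    experiment_name="demo_experiment",    # must match the name used in earlier phases (assumed)
    algorithms=["Logistic Regression", "Random Forest"],  # subset of supported algorithms (assumed names)
    class_label="Class",
    n_trials=50,                          # fewer Optuna trials for a quicker run
    timeout=300,                          # stop starting new trials after 5 minutes
    random_state=42,                      # fixed seed for reproducibility
)
model_runner.run(run_parallel=False)      # assumed entry point; run the model jobs serially
```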