streamline.runners.feature_runner module
- class streamline.runners.feature_runner.FeatureImportanceRunner(output_path, experiment_name, class_label='Class', instance_label=None, instance_subset=None, algorithms=('MI', 'MS'), use_turf=True, turf_pct=True, random_state=None, n_jobs=None, run_cluster=False, queue='defq', reserved_memory=4)
Bases: object
Runner Class for running feature importance jobs for cross-validation splits.
- Parameters:
output_path – path to the output folder
experiment_name – name for the current experiment
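class_label – default='Class'
instance_label – default=None
instance_subset – default=None
algorithms – feature importance algorithms to run, default=('MI', 'MS')
use_turf – default=True
turf_pct – default=True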
random_state – random seed for reproducibility
n_jobs – number of parallel jobs for multiprocessing
Returns: None
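The constructor arguments above can be collected as in the following minimal sketch. The helper names `build_fi_kwargs` and `make_importance_runner`, the `./output` path and the `demo_experiment` name are hypothetical and used only for illustration; the defaults are copied from the signature shown above. How a job is launched after construction is not covered in this section, so the sketch stops at building the runner.

```python
# Defaults copied from the documented FeatureImportanceRunner signature.
FI_DEFAULTS = dict(
    class_label="Class",
    instance_label=None,
    instance_subset=None,
    algorithms=("MI", "MS"),
    use_turf=True,
    turf_pct=True,
    random_state=None,
    n_jobs=None,
    run_cluster=False,
    queue="defq",
    reserved_memory=4,
)


def build_fi_kwargs(output_path, experiment_name, **overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(FI_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    return dict(
        FI_DEFAULTS,
        output_path=output_path,
        experiment_name=experiment_name,
        **overrides,
    )


def make_importance_runner(kwargs):
    """Hypothetical helper: construct the runner from prepared kwargs."""
    # Imported lazily so build_fi_kwargs works without streamline installed.
    from streamline.runners.feature_runner import FeatureImportanceRunner

    return FeatureImportanceRunner(**kwargs)


# Placeholder path and experiment name; fix the seed for reproducibility.
fi_kwargs = build_fi_kwargs("./output", "demo_experiment", random_state=42, n_jobs=1)
```

With streamline installed and the output folder in place, `make_importance_runner(fi_kwargs)` would return the configured runner object.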
- class streamline.runners.feature_runner.FeatureSelectionRunner(output_path, experiment_name, algorithms, class_label='Class', instance_label=None, max_features_to_keep=2000, filter_poor_features=True, top_features=40, export_scores=True, overwrite_cv=True, random_state=None, n_jobs=None, run_cluster=False, queue='defq', reserved_memory=4, show_plots=False)
Bases: object
Runner Class for running feature selection jobs for cross-validation splits.
- Parameters:
output_path – path to the output folder
experiment_name – name for the current experiment
algorithms – feature selection algorithms from the previous phase
max_features_to_keep – maximum number of features to keep (only applies if filter_poor_features is True), default=2000
filter_poor_features – filter out the worst-performing features prior to modeling, default=True
top_features – number of top features to illustrate in figures, default=40
export_scores – export figure summarizing average feature importance (FI) scores over CV partitions, default=True
overwrite_cv – overwrite working CV datasets with new feature subset datasets, default=True
random_state – random seed for reproducibility
n_jobs – number of parallel jobs for multiprocessing
Returns: None
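The parameter notes above say that max_features_to_keep only applies when filter_poor_features is True. A minimal sketch of preparing the arguments with that rule checked up front follows. The helper names `build_fs_kwargs` and `make_selection_runner`, the placeholder path and experiment name, and the choice to warn on an ignored setting are all assumptions for illustration, not part of the library; the algorithm names `'MI'` and `'MS'` are assumed to match those of FeatureImportanceRunner.

```python
import warnings

# Defaults copied from the documented FeatureSelectionRunner signature.
FS_DEFAULTS = dict(
    class_label="Class",
    instance_label=None,
    max_features_to_keep=2000,
    filter_poor_features=True,
    top_features=40,
    export_scores=True,
    overwrite_cv=True,
    random_state=None,
    n_jobs=None,
    run_cluster=False,
    queue="defq",
    reserved_memory=4,
    show_plots=False,
)


def build_fs_kwargs(output_path, experiment_name, algorithms, **overrides):
    """Hypothetical helper: merge overrides into the documented defaults."""
    unknown = set(overrides) - set(FS_DEFAULTS)
    if unknown:
        raise TypeError(f"unexpected arguments: {sorted(unknown)}")
    kwargs = dict(FS_DEFAULTS, **overrides)
    # max_features_to_keep is documented as applying only when
    # filter_poor_features is True, so flag a setting that would be ignored.
    if "max_features_to_keep" in overrides and not kwargs["filter_poor_features"]:
        warnings.warn(
            "max_features_to_keep has no effect when filter_poor_features is False"
        )
    kwargs.update(
        output_path=output_path,
        experiment_name=experiment_name,
        algorithms=algorithms,
    )
    return kwargs


def make_selection_runner(kwargs):
    """Hypothetical helper: construct the runner from prepared kwargs."""
    # Imported lazily so build_fs_kwargs works without streamline installed.
    from streamline.runners.feature_runner import FeatureSelectionRunner

    return FeatureSelectionRunner(**kwargs)


# Placeholder values; plot only the top 20 features instead of the default 40.
fs_kwargs = build_fs_kwargs("./output", "demo_experiment", ["MI", "MS"], top_features=20)
```

Checking the arguments before construction keeps an ineffective max_features_to_keep setting from going unnoticed.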