streamline.featurefns.importance module
- class streamline.featurefns.importance.FeatureImportance(cv_train_path, experiment_path, class_label, instance_label=None, instance_subset=2000, algorithm='MS', use_turf=True, turf_pct=True, random_state=None, n_jobs=None)[source]
Bases: Job
Initializer for Feature Importance Job
- Parameters:
cv_train_path – path to the cross-validation training dataset
experiment_path –
class_label –
instance_label –
instance_subset –
algorithm –
use_turf –
turf_pct –
random_state –
n_jobs –
- pickle_scores(output_name, scores, score_dict, score_sorted_features)[source]
Pickles the scores, the score dictionary, and the features sorted by score, primarily for use in phase 4 (feature selection) of the pipeline.
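The following is a minimal sketch of how three score objects could be written to a single pickle file and read back. It is not the module's implementation: the helper names (pickle_scores_sketch, load_scores_sketch), the extra output_dir argument, the ".pickle" file suffix, and the choice to store the objects as one list are all assumptions made for illustration.

```python
import os
import pickle
import tempfile


def pickle_scores_sketch(output_dir, output_name, scores, score_dict,
                         score_sorted_features):
    """Write the three score objects into one pickle file; return its path."""
    # Hypothetical file layout: <output_dir>/<output_name>.pickle
    path = os.path.join(output_dir, output_name + ".pickle")
    with open(path, "wb") as handle:
        pickle.dump([scores, score_dict, score_sorted_features], handle)
    return path


def load_scores_sketch(path):
    """Read back the list written by pickle_scores_sketch."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Round trip with made-up scores in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    saved_path = pickle_scores_sketch(
        tmp_dir, "example_scores",
        [0.2, 0.7], {"f1": 0.2, "f2": 0.7}, ["f2", "f1"],
    )
    scores, score_dict, score_sorted_features = load_scores_sketch(saved_path)
```

Storing the three objects together keeps the raw scores, the name-to-score mapping, and the ranking in sync for whichever later phase loads them.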
- prepare_data()[source]
Loads the target CV training dataset, separates the class column from the features, and removes instance labels.
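The data preparation step described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the method's actual code: the helper name split_features_and_class, the column names (InstanceID, Class, f1, f2), the in-memory CSV text, and the conversion of every feature to float are all hypothetical.

```python
import csv
import io


def split_features_and_class(rows, class_label, instance_label=None):
    """Separate the class column from the features and drop instance labels."""
    excluded = {class_label}
    if instance_label is not None:
        excluded.add(instance_label)
    # Every remaining column is treated as a feature, in file order.
    feature_names = [name for name in rows[0] if name not in excluded]
    x = [[float(row[name]) for name in feature_names] for row in rows]
    y = [row[class_label] for row in rows]
    return feature_names, x, y


# Tiny in-memory stand-in for a CV training file (made-up data).
CSV_TEXT = "InstanceID,f1,f2,Class\na,0.1,1,0\nb,0.4,0,1\n"
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
feature_names, x, y = split_features_and_class(rows, "Class", "InstanceID")
```

When instance_label is None, only the class column is removed, which mirrors the optional instance_label parameter in the class signature.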
- run()[source]
Run all elements of the feature importance evaluation: applies either mutual information or MultiSURF and saves a sorted dictionary of features with associated scores.
- run_multi_surf()[source]
Run multiSURF (a Relief-based feature importance algorithm able to detect both univariate and interaction effects) and return scores as well as file path/name information
- run_mutual_information()[source]
Run mutual information on target training dataset and return scores as well as file path/name information.
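To make the scoring idea concrete, here is a minimal sketch that computes mutual information between one discrete feature and the class labels directly from counts. It only illustrates the quantity being estimated; it is not the estimator this method uses, and the function name mutual_information and the toy data are assumptions.

```python
import math
from collections import Counter


def mutual_information(feature_values, class_values):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(class_values)
    joint = Counter(zip(feature_values, class_values))
    feature_counts = Counter(feature_values)
    class_counts = Counter(class_values)
    mi = 0.0
    for (f_val, c_val), count in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (count / n) * math.log(
            count * n / (feature_counts[f_val] * class_counts[c_val])
        )
    return mi


# Toy data: one feature copies the class, the other is independent of it.
y = [0, 0, 1, 1]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]

mi_informative = mutual_information(informative, y)
mi_uninformative = mutual_information(uninformative, y)
```

A feature that fully determines the class scores the class entropy (log 2 for two balanced classes), while an independent feature scores zero. Because each feature is scored on its own, this kind of score reflects univariate association only.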
- sort_save_fi_scores(scores, ordered_feature_names, alg_name)[source]
Creates a feature score dictionary and a dictionary sorted by decreasing feature importance scores.
- Parameters:
scores –
ordered_feature_names –
alg_name –
Returns: score_dict, score_sorted_features - a dictionary of scores and the feature names sorted by decreasing score
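A minimal sketch of the two structures described above, assuming scores and ordered_feature_names are parallel sequences and that the sorted result is a list of feature names. The function name sort_scores_sketch is hypothetical, and alg_name is left out because its use is not described here.

```python
def sort_scores_sketch(scores, ordered_feature_names):
    """Build a name-to-score dictionary and rank names by decreasing score."""
    score_dict = dict(zip(ordered_feature_names, scores))
    # Highest score first; ties keep their original relative order.
    score_sorted_features = sorted(score_dict, key=score_dict.get, reverse=True)
    return score_dict, score_sorted_features


# Made-up scores for three features.
score_dict, score_sorted_features = sort_scores_sketch(
    [0.1, 0.5, 0.3], ["a", "b", "c"]
)
```

Keeping both structures lets a later step look up a score by feature name and also iterate over features from most to least important.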