streamline.featurefns.importance module

class streamline.featurefns.importance.FeatureImportance(cv_train_path, experiment_path, class_label, instance_label=None, instance_subset=2000, algorithm='MS', use_turf=True, turf_pct=True, random_state=None, n_jobs=None)[source]

Bases: Job

Initializer for Feature Importance Job

Parameters:

cv_train_path – path to the created cross-validation training dataset
experiment_path
class_label
instance_label
instance_subset
algorithm
use_turf
turf_pct
random_state
n_jobs

pickle_scores(output_name, scores, score_dict, score_sorted_features)[source]: Pickle the scores, the score dictionary, and the features sorted by score, primarily for use in phase 4 (feature selection) of the pipeline.
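A minimal sketch of the pickling step described above, using only the standard library. The function name `pickle_scores_sketch`, the file name, the sample values, and the choice to store the three objects as one list are assumptions for illustration; the source does not specify the on-disk layout.

```python
import os
import pickle
import tempfile


def pickle_scores_sketch(path, scores, score_dict, score_sorted_features):
    """Hypothetical helper: write the three score objects to one pickle file."""
    with open(path, "wb") as handle:
        # Assumed layout: a single list holding all three objects.
        pickle.dump([scores, score_dict, score_sorted_features], handle)


# Illustrative values only; not taken from any real dataset.
scores = [0.12, 0.40, 0.05]
score_dict = {"age": 0.12, "bmi": 0.40, "height": 0.05}
score_sorted_features = ["bmi", "age", "height"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_scores.pickle")
    pickle_scores_sketch(path, scores, score_dict, score_sorted_features)
    # A later phase would read the file back like this.
    with open(path, "rb") as handle:
        restored = pickle.load(handle)
```

Keeping the raw scores, the name-to-score mapping, and the ranked names together lets a later phase reload everything it needs with one `pickle.load` call.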

prepare_data()[source]: Load the target CV training dataset, separate the class column from the features, and remove instance labels.
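The data preparation described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the module's implementation: the helper `split_dataset`, the column names, and the inline CSV text are hypothetical, and the sketch assumes all feature values are numeric.

```python
import csv
import io


def split_dataset(csv_text, class_label, instance_label=None):
    """Hypothetical helper: return (feature_names, X, y) from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Columns that are not features: the class column and, if present,
    # the instance identifier column.
    excluded = {class_label}
    if instance_label is not None:
        excluded.add(instance_label)
    feature_names = [name for name in reader.fieldnames if name not in excluded]
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(row[class_label])
    return feature_names, X, y


# Illustrative data only.
example_csv = "id,age,bmi,Class\n1,34,22.5,0\n2,51,30.1,1\n"
feature_names, X, y = split_dataset(example_csv, class_label="Class", instance_label="id")
```

Dropping the instance label before scoring matters because an identifier column would otherwise be treated as a feature.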

run()[source]: Run all elements of the feature importance evaluation: applies either mutual information or MultiSURF and saves a sorted dictionary of features with associated scores.

run_multi_surf()[source]: Run MultiSURF (a Relief-based feature importance algorithm able to detect both univariate and interaction effects) and return scores as well as file path/name information.

run_mutual_information()[source]: Run mutual information on target training dataset and return scores as well as file path/name information.
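To make the mutual information score concrete, here is a small pure-Python sketch for discrete features. It assumes a plug-in (count-based) estimator measured in nats; the source does not say which estimator the module uses, and the function name and sample data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(feature_values, class_values):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(feature_values)
    joint_counts = Counter(zip(feature_values, class_values))
    feature_counts = Counter(feature_values)
    class_counts = Counter(class_values)
    mi = 0.0
    for (f, c), n_fc in joint_counts.items():
        # p(f, c) * log( p(f, c) / (p(f) * p(c)) ), written with raw counts.
        mi += (n_fc / n) * math.log(n_fc * n / (feature_counts[f] * class_counts[c]))
    return mi


# Illustrative data: one feature that copies the class, one independent of it.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "informative": [0, 0, 0, 0, 1, 1, 1, 1],
    "noise": [0, 1, 0, 1, 0, 1, 0, 1],
}
mi_scores = {name: mutual_information(values, labels) for name, values in features.items()}
```

A feature that determines the class scores the full class entropy, while a feature independent of the class scores zero. Mutual information is computed one feature at a time, so it captures univariate association only.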

save_runtime(output_name)[source]: Save phase runtime.

Parameters:

output_name – name of the output tag

sort_save_fi_scores(scores, ordered_feature_names, alg_name)[source]

Creates a feature score dictionary and a dictionary sorted by decreasing feature importance scores.

Parameters:

scores
ordered_feature_names
alg_name

Returns: score_dict, score_sorted_features – the dictionary of scores and the feature names sorted by score
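The dictionary-building and sorting step can be sketched as follows. This is a hedged illustration: the helper name `build_score_dicts` and the sample values are hypothetical, the file-saving part of the method is omitted, and the sketch assumes `scores` and `ordered_feature_names` are aligned by position.

```python
def build_score_dicts(scores, ordered_feature_names):
    """Hypothetical helper: map names to scores and rank names by score."""
    # Pair each feature name with the score at the same position.
    score_dict = dict(zip(ordered_feature_names, scores))
    # Feature names ordered from highest to lowest importance score.
    score_sorted_features = sorted(score_dict, key=score_dict.get, reverse=True)
    return score_dict, score_sorted_features


# Illustrative values only.
example_scores = [0.12, 0.40, 0.05]
example_names = ["age", "bmi", "height"]
score_dict, score_sorted_features = build_score_dicts(example_scores, example_names)
```

Because Python's `sorted` is stable, features with tied scores keep their original relative order.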