streamline.featurefns.importance module

class streamline.featurefns.importance.FeatureImportance(cv_train_path, experiment_path, class_label, instance_label=None, instance_subset=2000, algorithm='MS', use_turf=True, turf_pct=True, random_state=None, n_jobs=None)

Bases: Job

Initializer for Feature Importance Job

Parameters:
  • cv_train_path – path to the cross-validation training dataset

  • experiment_path

  • class_label

  • instance_label

  • instance_subset

  • algorithm

  • use_turf

  • turf_pct

  • random_state

  • n_jobs

pickle_scores(output_name, scores, score_dict, score_sorted_features)

Pickle the scores, the score dictionary, and the features sorted by score, primarily for use in phase 4 (feature selection) of the pipeline.
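As a rough illustration of what pickling these three objects might look like, the sketch below writes them to a single file and reads them back. The helper names, the file name, and the choice to store the objects as one list are assumptions made for this example; the description above does not state the on-disk layout STREAMLINE uses.

```python
import os
import pickle
import tempfile


def pickle_scores_sketch(path, scores, score_dict, score_sorted_features):
    """Hypothetical helper: store the three score objects in one pickle file."""
    with open(path, "wb") as handle:
        # Assumption: the objects are stored together as a single list.
        pickle.dump([scores, score_dict, score_sorted_features], handle)


def load_scores_sketch(path):
    """Hypothetical helper: read the objects back, e.g. in a later phase."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Toy data standing in for real feature importance output.
scores = [0.12, 0.40, 0.03]
score_dict = {"age": 0.12, "dose": 0.40, "height": 0.03}
score_sorted_features = ["dose", "age", "height"]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores_demo.pickle")
    pickle_scores_sketch(path, scores, score_dict, score_sorted_features)
    restored = load_scores_sketch(path)
```

Storing all three objects in one file keeps them in sync for whichever phase loads them next, at the cost of always loading everything together.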

prepare_data()

Loads the target CV training dataset, separates the class column from the features, and removes instance labels.
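The three steps described above (load, split off the class column, drop instance labels) can be sketched with the standard library alone. The function name, the column names, and the in-memory CSV text are hypothetical; the real method presumably reads the file at cv_train_path and may use a different data structure.

```python
import csv
import io


def split_features_and_class(csv_text, class_label, instance_label=None):
    """Hypothetical helper: return (feature_names, X, y) from CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))

    # Columns that are not features: the class column and, if given,
    # the instance label column.
    excluded = {class_label}
    if instance_label is not None:
        excluded.add(instance_label)

    feature_names = [name for name in reader.fieldnames if name not in excluded]

    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in feature_names])
        y.append(row[class_label])
    return feature_names, X, y


# Toy dataset: "id" is the instance label and "Class" is the outcome.
demo_csv = "id,age,dose,Class\n1,30,0.5,0\n2,41,1.5,1\n"
names, X, y = split_features_and_class(
    demo_csv, class_label="Class", instance_label="id"
)
```

Keeping the feature names in their original column order matters here, because the scores returned by an importance algorithm are typically positional.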

run()

Run all elements of the feature importance evaluation: applies either mutual information or MultiSURF and saves a sorted dictionary of features with associated scores.

run_multi_surf()

Run MultiSURF (a Relief-based feature importance algorithm able to detect both univariate and interaction effects) and return scores as well as file path/name information.

run_mutual_information()

Run mutual information on the target training dataset and return scores as well as file path/name information.
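To make the scoring idea concrete, here is a minimal plug-in estimate of mutual information between one discrete feature and the class, in nats. This is only a sketch of the quantity being scored: the description above does not say which estimator the method uses, and a library estimator that also handles continuous features may well be used instead. The function and variable names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(x)
    count_x = Counter(x)
    count_y = Counter(y)
    count_xy = Counter(zip(x, y))

    mi = 0.0
    for (x_value, y_value), joint in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts.
        ratio = joint * n / (count_x[x_value] * count_y[y_value])
        mi += (joint / n) * math.log(ratio)
    return mi


labels = [0, 0, 1, 1]
informative = [0, 0, 1, 1]    # matches the class exactly
uninformative = [0, 1, 0, 1]  # independent of the class in this sample

mi_informative = mutual_information(informative, labels)
mi_uninformative = mutual_information(uninformative, labels)
```

Because each feature is scored against the class on its own, a score of this kind reflects univariate association only, which is the contrast with the interaction-aware MultiSURF option.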

save_runtime(output_name)

Save phase runtime.

Parameters:
  • output_name – name of the output tag

sort_save_fi_scores(scores, ordered_feature_names, alg_name)

Creates a feature score dictionary and a dictionary sorted by decreasing feature importance scores.

Parameters:
  • scores

  • ordered_feature_names

  • alg_name

Returns: score_dict, score_sorted_features – the dictionary of scores and the feature names sorted by score
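The two return values can be illustrated with a short sketch. It assumes that scores is positionally aligned with ordered_feature_names and that score_sorted_features is a list of feature names; both are assumptions read from the parameter and return names, and the function name is hypothetical. The saving step and the alg_name parameter are left out.

```python
def sort_scores_sketch(scores, ordered_feature_names):
    """Hypothetical helper: build a score dictionary and a ranked name list."""
    # Pair each feature name with the score at the same position.
    score_dict = dict(zip(ordered_feature_names, scores))

    # Rank feature names from highest to lowest score.
    score_sorted_features = sorted(score_dict, key=score_dict.get, reverse=True)
    return score_dict, score_sorted_features


score_dict, score_sorted_features = sort_scores_sketch(
    scores=[0.12, 0.40, 0.03],
    ordered_feature_names=["age", "dose", "height"],
)
```

Python's sort is stable, so features with equal scores keep their original column order in the ranked list.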