Code Documentation
skrare.rare module
- class skrare.rare.RARE(label_name='Class', duration_name='grf_yrs', given_starting_point=False, amino_acid_start_point=None, amino_acid_bins_start_point=None, iterations=1000, rare_variant_maf_cutoff=1, set_number_of_bins=1, min_features_per_group=1, max_number_of_groups_with_feature=1, informative_cutoff=0.2, crossover_probability=0.5, mutation_probability=0.05, elitism_parameter=0.2, scoring_method='Relief', score_based_on_sample=True, score_with_common_variables=False, instance_sample_size=500, random_seed=None, bin_size_variability_constraint=None, max_features_per_bin=None, multiprocessing=False)
- Bases: BaseEstimator, TransformerMixin
- A Scikit-Learn compatible framework for the RARE algorithm.
- Parameters:
- given_starting_point – whether or not expert knowledge is being inputted (True or False) 
- amino_acid_start_point – if RARE is starting with expert knowledge, input the list of features here; otherwise None 
- amino_acid_bins_start_point – if RARE is starting with expert knowledge, input the list of bins of features here; otherwise None 
- iterations – the number of evolutionary cycles RARE will run 
- label_name – label for the class/endpoint column in the dataset (e.g., ‘Class’) 
- rare_variant_maf_cutoff – the minor allele frequency cutoff separating common features from rare variant features 
- set_number_of_bins – the population size of candidate bins 
- min_features_per_group – the minimum number of features in a bin 
- max_number_of_groups_with_feature – the maximum number of bins containing a feature 
- scoring_method – ‘Univariate’, ‘Relief’, or ‘Relief only on bin and common features’ 
- score_based_on_sample – if Relief scoring is used, whether or not bin evaluation is done based on a sample of instances rather than the whole dataset 
- score_with_common_variables – if Relief scoring is used, whether or not common features should be used as context for evaluating rare variant bins 
- instance_sample_size – if bin evaluation is done based on a sample of instances, input the sample size here 
- crossover_probability – the probability that each feature in an offspring bin crosses over to the paired offspring bin (recommendation: 0.5 to 0.8)
- mutation_probability – the probability that each feature in a bin is deleted; a proportionate probability is automatically applied to each feature outside the bin to be added (recommendation: 0.05 to 0.5, depending on the situation and the number of iterations run)
- elitism_parameter – the proportion of elite bins in the current generation to be preserved for the next evolutionary cycle (recommendation: 0.2 to 0.8 depending on conservativeness of approach and number of iterations run) 
- random_seed – the seed value for the random number generator
- bin_size_variability_constraint – sets the max bin size of children to be n times the size of their sibling (recommendation: 2; with larger or smaller values, the population would trend heavily towards small or large bins without exploring the search space)
- max_features_per_bin – sets a max value for the number of features per bin 
- multiprocessing – flag for using multiprocessing implementation of RARE 
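The evolutionary parameters above (crossover_probability, mutation_probability, elitism_parameter) can be illustrated with a minimal standalone sketch. This is not the skrare implementation: the function names crossover, mutate, and select_elites are hypothetical, bins are modeled as plain sets of feature names, and the reading of "proportionate probability" (scaling the add probability so that expected additions match expected deletions) is an assumption.

```python
import random


def crossover(bin_a, bin_b, crossover_probability, rng):
    """Per-feature crossover between two paired offspring bins.

    Each feature moves to the paired offspring with probability
    crossover_probability; otherwise it stays where it is.
    """
    child_a, child_b = set(), set()
    for feature in sorted(bin_a):
        target = child_b if rng.random() < crossover_probability else child_a
        target.add(feature)
    for feature in sorted(bin_b):
        target = child_a if rng.random() < crossover_probability else child_b
        target.add(feature)
    return child_a, child_b


def mutate(bin_features, all_features, mutation_probability, rng):
    """Delete features inside the bin and add features from outside it.

    ASSUMPTION: the add probability is scaled by len(inside) / len(outside)
    so the expected number of additions equals the expected number of
    deletions. The library may define "proportionate" differently.
    """
    inside = sorted(bin_features)
    outside = sorted(set(all_features) - set(bin_features))
    if outside:
        add_probability = min(1.0, mutation_probability * len(inside) / len(outside))
    else:
        add_probability = 0.0
    kept = {f for f in inside if rng.random() >= mutation_probability}
    added = {f for f in outside if rng.random() < add_probability}
    return kept | added


def select_elites(bin_scores, elitism_parameter):
    """Return the top-scoring fraction of bins, best first."""
    n_elites = int(len(bin_scores) * elitism_parameter)
    ranked = sorted(bin_scores, key=bin_scores.get, reverse=True)
    return ranked[:n_elites]


# Illustrative run with made-up feature names and scores.
rng = random.Random(42)
features = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6"]
child_a, child_b = crossover({"rs1", "rs2", "rs3"}, {"rs4", "rs5"}, 0.5, rng)
mutated = mutate(child_a, features, 0.05, rng)
elites = select_elites({"bin_1": 0.9, "bin_2": 0.1, "bin_3": 0.5}, 0.4)
```

Passing an explicit random.Random instance keeps the sketch reproducible, which is the role random_seed plays in the parameter list.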
 
 - fit(original_feature_matrix, y=None)
- Scikit-learn compatible fit function for supervised training of RARE
- Parameters:
- original_feature_matrix – array-like {n_samples, n_features}. Training instances. All instance attributes must be numeric or NaN.
- y – array-like {n_samples}. Training labels. All instance phenotypes must be numeric (not NaN or any other type).
 
 - Returns: self
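The input requirements for fit (numeric or NaN features, numeric non-NaN labels) can be checked before training. The helper below is a hypothetical pre-flight check, not part of skrare; it assumes the data is available as plain Python sequences and uses only the standard library.

```python
import math
from numbers import Real


def check_fit_inputs(feature_rows, labels):
    """Raise if the data violates the documented requirements for fit.

    feature_rows: sequence of rows, each a sequence of feature values.
    labels: sequence of class labels, one per row.
    """
    if len(feature_rows) != len(labels):
        raise ValueError("number of rows and number of labels differ")

    # Feature values must be numeric; NaN is allowed for missing data.
    for i, row in enumerate(feature_rows):
        for j, value in enumerate(row):
            if not isinstance(value, Real):
                raise TypeError(f"feature value at row {i}, column {j} is not numeric: {value!r}")

    # Labels must be numeric and must not be NaN.
    for i, label in enumerate(labels):
        if not isinstance(label, Real):
            raise TypeError(f"label at row {i} is not numeric: {label!r}")
        if math.isnan(label):
            raise ValueError(f"label at row {i} is NaN")
```

For example, check_fit_inputs([[0, 1.0, float("nan")]], [1]) passes silently, while a string feature value or a NaN label raises.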
 - transform(original_feature_matrix, y=None)
- Scikit-learn compatible transform function for RARE
- Parameters:
- original_feature_matrix – the original feature matrix (pd.DataFrame)
- y – array-like {n_samples}. Training labels. All instance phenotypes must be numeric (not NaN or any other type).
 
 - Returns: self, bin_feature_matrix, common_features_and_bins_matrix, amino_acid_bins, amino_acid_bin_scores, rare_feature_maf_dict, common_feature_maf_dict, rare_feature_df, common_feature_df, maf_0_features
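Because transform returns ten values in a fixed order, it can help to give them names at the call site. The sketch below wraps such a tuple in a NamedTuple whose fields follow the documented return order. RareTransformResult and name_transform_output are hypothetical helpers, not part of skrare, and the field types are left as Any because this page does not state them.

```python
from typing import Any, NamedTuple


class RareTransformResult(NamedTuple):
    """Named view of the ten values listed in the transform return."""

    estimator: Any  # documented as "self"
    bin_feature_matrix: Any
    common_features_and_bins_matrix: Any
    amino_acid_bins: Any
    amino_acid_bin_scores: Any
    rare_feature_maf_dict: Any
    common_feature_maf_dict: Any
    rare_feature_df: Any
    common_feature_df: Any
    maf_0_features: Any


def name_transform_output(output):
    """Wrap the raw return tuple; raises TypeError if it does not have ten items."""
    return RareTransformResult(*output)
```

With a fitted estimator, this would be used as result = name_transform_output(rare.transform(X, y)), after which result.amino_acid_bin_scores is easier to read than a positional index.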