streamline.dataprep.kfold_partitioning module
- class streamline.dataprep.kfold_partitioning.KFoldPartitioner(dataset, partition_method, experiment_path, n_splits=10, random_state=None)
Bases: Job
Base class for k-fold cross-validation operations on a dataset. Initialization for the KFoldPartitioner base class.
- Parameters:
dataset – a streamline.utils.dataset.Dataset object or a path to a dataset text file
partition_method – KFold CV method used for partitioning, must be one of [“Random”, “Stratified”, “Group”]
experiment_path – path to the experiment logging directory folder
n_splits – number of splits (k) in k-fold cross-validation
random_state – random seed used to make the partitioning reproducible
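The three partition_method options appear to correspond to the usual k-fold variants. The sketch below assumes the conventional definitions (as used by scikit-learn's KFold, StratifiedKFold, and GroupKFold) and is not streamline's implementation; kfold_indices and its arguments are hypothetical names. "Random" shuffles rows into k roughly equal folds, "Stratified" keeps each fold's class proportions close to those of the whole dataset, and "Group" keeps all rows that share a group ID in the same fold. It returns row indices instead of dataframes to stay dependency-free.

```python
import random
from collections import defaultdict


def kfold_indices(labels, n_splits=10, method="Random", groups=None, random_state=None):
    """Hypothetical sketch: return one (train_idx, test_idx) pair per fold."""
    if method not in ("Random", "Stratified", "Group"):
        raise ValueError('method must be one of "Random", "Stratified", "Group"')
    rng = random.Random(random_state)  # seeded RNG for reproducible shuffles
    n = len(labels)
    folds = [[] for _ in range(n_splits)]

    if method == "Random":
        # Shuffle all row indices, then deal them out so fold sizes differ by at most one.
        order = list(range(n))
        rng.shuffle(order)
        for pos, i in enumerate(order):
            folds[pos % n_splits].append(i)
    elif method == "Stratified":
        # Shuffle within each class and deal round-robin (continuing across classes),
        # so every fold gets a near-equal share of each class.
        by_class = defaultdict(list)
        for i, y in enumerate(labels):
            by_class[y].append(i)
        pos = 0
        for y in sorted(by_class):
            members = by_class[y]
            rng.shuffle(members)
            for i in members:
                folds[pos % n_splits].append(i)
                pos += 1
    else:
        # Group: every row sharing a group ID lands in the same fold. Largest groups
        # are placed first, each into the currently smallest fold (no shuffling here,
        # so random_state has no effect in this branch).
        if groups is None:
            raise ValueError("groups is required for Group partitioning")
        by_group = defaultdict(list)
        for i, g in enumerate(groups):
            by_group[g].append(i)
        for g in sorted(by_group, key=lambda g: (-len(by_group[g]), str(g))):
            smallest = min(range(n_splits), key=lambda k: len(folds[k]))
            folds[smallest].extend(by_group[g])

    splits = []
    for k in range(n_splits):
        test = sorted(folds[k])
        test_set = set(test)
        train = [i for i in range(n) if i not in test_set]  # complement of the test fold
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    for train, test in kfold_indices(y, n_splits=2, method="Stratified", random_state=42):
        print("test rows:", test, "labels:", [y[i] for i in test])
```

Dealing shuffled indices round-robin is one simple way to keep fold sizes balanced; the largest-group-first placement in the Group branch resembles how scikit-learn's GroupKFold balances fold sizes, though other assignment strategies are possible.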
- cv_partitioner(return_dfs=True, save_dfs=True, partition_method=None)
Partitions the data frame (data) using the number of CV partitions, the partition method (R, S, or M), the class label, and the column name used for matched CV, and returns lists of training and testing dataframe partitions.
- Parameters:
return_dfs – flag to return the splits as lists of dataframes; empty lists are returned if set to False
save_dfs – flag to save the split dataframes in the experiment path folder
partition_method – overrides the partition method set at initialization
Returns: train_df, test_df – lists of dataframes holding the train and test splits, respectively
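Because cv_partitioner returns train_df and test_df as parallel lists, a quick sanity check on its output is that each train/test pair is disjoint, each pair together covers every row, and the test splits cover every row exactly once across folds. The helper below, check_kfold_partition (a hypothetical name, not part of streamline), is a stdlib sketch of that check on row identifiers. To apply it to the returned dataframes you might pass something like [list(df.index) for df in test_df], assuming the splits keep the original row index or an instance-ID column; whether cv_partitioner preserves either is not stated here.

```python
def check_kfold_partition(train_folds, test_folds, all_ids):
    """Hypothetical helper: raise ValueError unless the folds form a valid k-fold partition."""
    all_ids = set(all_ids)
    if len(train_folds) != len(test_folds):
        raise ValueError("expected one training split per testing split")
    seen = set()  # rows already used in some test fold
    for k, (train, test) in enumerate(zip(train_folds, test_folds)):
        train, test = set(train), set(test)
        if train & test:
            raise ValueError(f"fold {k}: rows appear in both train and test")
        if train | test != all_ids:
            raise ValueError(f"fold {k}: train and test together do not cover every row")
        if seen & test:
            raise ValueError(f"fold {k}: rows already used in an earlier test fold")
        seen |= test
    if seen != all_ids:
        raise ValueError("the test folds do not cover every row")
    return True


if __name__ == "__main__":
    # Two folds over rows 0..3: each test fold is the other fold's training set.
    print(check_kfold_partition([[2, 3], [0, 1]], [[0, 1], [2, 3]], range(4)))
```

Raising ValueError rather than using assert keeps the check active even when Python runs with optimizations (-O) enabled.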