streamline.dataprep.kfold_partitioning module

class streamline.dataprep.kfold_partitioning.KFoldPartitioner(dataset, partition_method, experiment_path, n_splits=10, random_state=None)

Bases: Job

Base class for k-fold cross-validation operations on a dataset. Initializes the KFoldPartitioner.

Parameters:
  • dataset – a streamline.utils.dataset.Dataset object or a path to a dataset text file

  • partition_method – k-fold CV method used for partitioning; must be one of [“Random”, “Stratified”, “Group”]

  • experiment_path – path to the experiment logging directory

  • n_splits – number of folds (k) in k-fold cross-validation (default 10)

  • random_state – random seed used to make the partitioning reproducible
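To make the three partition_method options concrete, here is a minimal pure-Python sketch of how instances could be assigned to n_splits folds under each method, with random_state seeding the shuffle. The function assign_folds and its labels/groups arguments are hypothetical illustrations, not part of the STREAMLINE API; the class itself may instead delegate to a library splitter such as scikit-learn's KFold, StratifiedKFold, or GroupKFold.

```python
import random
from collections import defaultdict


def assign_folds(labels, groups=None, method="Random", n_splits=10, random_state=None):
    """Return a fold index in range(n_splits) for every instance.

    Hypothetical helper for illustration only; not the STREAMLINE code.
    """
    if method not in ("Random", "Stratified", "Group"):
        raise ValueError('method must be one of "Random", "Stratified", "Group"')
    rng = random.Random(random_state)  # same seed -> same folds
    folds = [0] * len(labels)

    if method == "Random":
        # Shuffle instance indices, then deal them round-robin into the folds.
        order = list(range(len(labels)))
        rng.shuffle(order)
        for pos, idx in enumerate(order):
            folds[idx] = pos % n_splits

    elif method == "Stratified":
        # Deal each class separately so every fold keeps roughly the class ratio;
        # the running offset keeps the overall fold sizes balanced.
        by_class = defaultdict(list)
        for idx, y in enumerate(labels):
            by_class[y].append(idx)
        offset = 0
        for members in by_class.values():
            rng.shuffle(members)
            for pos, idx in enumerate(members):
                folds[idx] = (offset + pos) % n_splits
            offset += len(members)

    else:  # "Group"
        if groups is None or len(groups) != len(labels):
            raise ValueError("Group partitioning needs one group id per instance")
        # Assign whole groups to folds so matched instances never straddle train/test.
        by_group = defaultdict(list)
        for idx, g in enumerate(groups):
            by_group[g].append(idx)
        group_ids = list(by_group)
        rng.shuffle(group_ids)
        for pos, g in enumerate(group_ids):
            for idx in by_group[g]:
                folds[idx] = pos % n_splits

    return folds
```

In this sketch the Group branch balances the number of groups per fold, not the number of instances, so folds can differ in size when groups are unequal; with fewer groups than n_splits, some folds would be empty.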

cv_partitioner(return_dfs=True, save_dfs=True, partition_method=None)

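Partitions the dataset's data frame into n_splits cross-validation folds using the selected partition method (“Random”, “Stratified”, or “Group”). The class label and, for Group (matched) CV, the column identifying matched instances are taken from the partitioner's dataset rather than passed as arguments. Returns lists of training and testing dataframe partitions.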

Parameters:
  • return_dfs – if True, return the splits as lists of dataframes; if False, empty lists are returned

  • save_dfs – if True, save the split dataframes to the experiment path folder

  • partition_method – overrides the partition method set at initialization

Returns: train_df, test_df – lists of training and testing dataframes, one per CV split
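The return contract of cv_partitioner (one training and one testing partition per split, and empty lists when return_dfs=False) can be sketched as follows. split_by_folds is a hypothetical helper that works on plain Python rows and a precomputed fold index per row; the documented method returns pandas dataframes instead.

```python
def split_by_folds(rows, folds, n_splits, return_dfs=True):
    """Build per-fold train/test partitions from one fold index per row.

    Hypothetical helper: plain rows stand in for DataFrame rows.
    """
    if len(rows) != len(folds):
        raise ValueError("need exactly one fold index per row")
    train_parts, test_parts = [], []
    for k in range(n_splits):
        # Fold k is the test set; every other fold forms the training set.
        test_parts.append([r for r, f in zip(rows, folds) if f == k])
        train_parts.append([r for r, f in zip(rows, folds) if f != k])
    if not return_dfs:
        # Mirrors the documented behaviour: empty lists when return_dfs=False.
        return [], []
    return train_parts, test_parts
```

Because each row carries exactly one fold index, every row lands in exactly one test partition and in the training partition of every other split.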

run()

save_datasets(experiment_path=None, train_dfs=None, test_dfs=None)

Saves individual training and testing CV datasets as .csv files
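A minimal sketch of writing each fold to its own .csv file with the standard csv module. The helper name save_cv_datasets and the PREFIX_CV_k_Train.csv / PREFIX_CV_k_Test.csv naming scheme are assumptions for illustration; check save_datasets for the file names and folder layout it actually uses.

```python
import csv
import os


def save_cv_datasets(out_dir, header, train_sets, test_sets, prefix="data"):
    """Write each train/test fold to its own .csv file and return the paths.

    Hypothetical helper; the file-naming scheme is an assumption.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for k, (train, test) in enumerate(zip(train_sets, test_sets)):
        for kind, rows in (("Train", train), ("Test", test)):
            path = os.path.join(out_dir, f"{prefix}_CV_{k}_{kind}.csv")
            with open(path, "w", newline="") as fh:
                writer = csv.writer(fh)
                writer.writerow(header)  # column names first
                writer.writerows(rows)   # one line per instance
            written.append(path)
    return written
```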