Folds
This submodule provides functions to split, save and load folds.
- biom3d.utils.fold.get_folds_df(df: pandas.DataFrame, verbose: bool = True) → list[list[str]]
Extract folds from a DataFrame into a list of lists.
- Parameters:
df (pandas.DataFrame) – DataFrame with a ‘fold’ column indicating fold assignment.
verbose (bool, default=True) – If True, prints the number and size of the folds.
- Returns:
List of folds, each being a list of filenames (or sample IDs).
- Return type:
list of list
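The grouping described above can be sketched with plain pandas. This is a minimal illustration, not biom3d's implementation: the helper name `folds_from_df` is hypothetical, and it assumes the sample identifiers live in a ‘filename’ column and that folds are returned in ascending fold order.

```python
import pandas as pd


def folds_from_df(df: pd.DataFrame) -> list[list[str]]:
    """Hypothetical helper: group filenames by fold index (ascending)."""
    # Assumption: identifiers are stored in a 'filename' column.
    return [
        group["filename"].tolist()
        for _, group in df.groupby("fold", sort=True)
    ]


df = pd.DataFrame({
    "filename": ["a.tif", "b.tif", "c.tif", "d.tif"],
    "fold": [0, 1, 0, 1],
})

folds = folds_from_df(df)
print(folds)  # [['a.tif', 'c.tif'], ['b.tif', 'd.tif']]
```

Each inner list holds the samples of one fold, which matches the documented `list[list[str]]` return type.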
- biom3d.utils.fold.get_folds_train_test_df(df: pandas.DataFrame, verbose: bool = True, merge_test: bool = True) → tuple[list[list[str]], list[list[str]] | list[str]]
Extract fold groups from both train and test sets.
- Parameters:
df (pandas.DataFrame) – DataFrame with ‘hold_out’ and ‘fold’ columns.
verbose (bool, default=True) – If True, prints debug info.
merge_test (bool, default=True) – If True, test folds are merged into one list.
- Returns:
train_folds (list of list) – List of training folds, each being a list of filenames.
test_folds (list or list of list) – Test set either as a merged list or as a list of folds.
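The effect of `merge_test` on the shape of the test set can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the helpers `group_by_fold` and `train_test_folds` are hypothetical, identifiers are assumed to be in a ‘filename’ column, and ‘hold_out’ is assumed to use 0 for train and 1 for test, as documented for `get_train_test_df`.

```python
import pandas as pd


def group_by_fold(df: pd.DataFrame) -> list[list[str]]:
    """Hypothetical helper: one list of filenames per fold index."""
    return [g["filename"].tolist() for _, g in df.groupby("fold", sort=True)]


def train_test_folds(df: pd.DataFrame, merge_test: bool = True):
    """Hypothetical sketch: split on 'hold_out', then group each part by fold."""
    train_folds = group_by_fold(df[df["hold_out"] == 0])
    test_folds = group_by_fold(df[df["hold_out"] == 1])
    if merge_test:
        # Flatten the test folds into a single list of filenames.
        test_folds = [name for fold in test_folds for name in fold]
    return train_folds, test_folds


df = pd.DataFrame({
    "filename": ["a", "b", "c", "d", "e", "f"],
    "hold_out": [0, 0, 0, 0, 1, 1],
    "fold":     [0, 1, 0, 1, 0, 1],
})

train_folds, merged_test = train_test_folds(df, merge_test=True)
_, unmerged_test = train_test_folds(df, merge_test=False)

print(train_folds)    # [['a', 'c'], ['b', 'd']]
print(merged_test)    # ['e', 'f']
print(unmerged_test)  # [['e'], ['f']]
```

With `merge_test=True` the second return value is a flat list; otherwise it keeps one list per test fold, which is why the return type is a union.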
- biom3d.utils.fold.get_splits_train_val_test(df: pandas.DataFrame) → tuple[list[list[str]], list[str], list[str]]
Create dataset splits of different sizes, along with validation and test sets.
Assumes the following columns:
  - ‘split’: indicates split index (e.g., 0=50%, 1=25%, etc.)
  - ‘fold’: used to separate training and validation
  - ‘hold_out’: 0=train/val, 1=test
  - ‘filename’: sample identifier
The splits contain [100%, 50%, 25%, 10%, 5%, 2%, the rest] of the dataset.
- Returns:
train_splits (list of list) – List of training splits (first is the full training set, followed by reduced ones).
valset (list) – List of filenames used for validation.
testset (list) – List of filenames used for testing.
- biom3d.utils.fold.get_splits_train_val_test_overlapping(df: pandas.DataFrame) → tuple[list[list[str]], list[str], list[str]]
Create overlapping training splits plus validation and test sets.
Each smaller training subset is fully included in all larger ones. Used for dataset scaling experiments (e.g., 100%, 50%, 25%, etc.).
- Parameters:
df (pandas.DataFrame) – DataFrame with ‘split’, ‘fold’, ‘hold_out’, and ‘filename’ columns.
- Returns:
train_splits (list of list) – List of overlapping training subsets.
valset (list) – List of filenames used for validation.
testset (list) – List of filenames used for testing.
Notes
Only works if the splits follow descending powers of two.
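The nesting property can be illustrated with a small sketch. It shows one plausible way nested subsets could be built from disjoint chunks and is not taken from biom3d's source: the helper `nested_subsets` and the sample names are hypothetical. If each disjoint chunk is half the size of the previous one (with a final remainder chunk), the union of chunks from index i onward yields subsets of 100%, 50%, 25%, and so on, which may be why the note requires descending powers of two.

```python
def nested_subsets(chunks: list[list[str]]) -> list[list[str]]:
    """Hypothetical: subset i is the union of chunks i, i+1, ..., last."""
    return [
        [name for chunk in chunks[i:] for name in chunk]
        for i in range(len(chunks))
    ]


# Disjoint chunks of 8 samples with sizes 4, 2, 1 and a remainder of 1.
chunks = [["s0", "s1", "s2", "s3"], ["s4", "s5"], ["s6"], ["s7"]]

subsets = nested_subsets(chunks)
sizes = [len(s) for s in subsets]
print(sizes)  # [8, 4, 2, 1]

# Every smaller subset is fully contained in the next larger one.
is_nested = all(
    set(small) <= set(large) for large, small in zip(subsets, subsets[1:])
)
print(is_nested)  # True
```

With chunk sizes that do not halve at each step, the same construction would still produce nested subsets, but their sizes would no longer match the intended 100%, 50%, 25% fractions.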
- biom3d.utils.fold.get_train_test_df(df: pandas.DataFrame, verbose: bool = True) → tuple[ndarray, ndarray]
Extract train and test sets from a DataFrame based on the ‘hold_out’ column.
- Parameters:
df (pandas.DataFrame) – The dataset containing a ‘hold_out’ column with 0 (train) and 1 (test) labels.
verbose (bool, default=True) – If True, enables debug printing (currently unused).
- Returns:
train_set (numpy.ndarray) – Array of training filenames (or sample IDs).
test_set (numpy.ndarray) – Array of test filenames (or sample IDs).
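A hold-out split of this kind can be sketched with boolean masks. This is a minimal illustration rather than the library's implementation: the helper `train_test_arrays` is hypothetical and assumes the identifiers are stored in a ‘filename’ column.

```python
import numpy as np
import pandas as pd


def train_test_arrays(df: pd.DataFrame) -> tuple[np.ndarray, np.ndarray]:
    """Hypothetical sketch: select filenames by the 'hold_out' flag."""
    # 0 marks training samples, 1 marks held-out test samples.
    train_set = df.loc[df["hold_out"] == 0, "filename"].to_numpy()
    test_set = df.loc[df["hold_out"] == 1, "filename"].to_numpy()
    return train_set, test_set


df = pd.DataFrame({
    "filename": ["a.tif", "b.tif", "c.tif", "d.tif"],
    "hold_out": [0, 1, 0, 1],
})

train_set, test_set = train_test_arrays(df)
print(train_set.tolist())  # ['a.tif', 'c.tif']
print(test_set.tolist())   # ['b.tif', 'd.tif']
```

Both return values are NumPy arrays, in line with the documented `tuple[ndarray, ndarray]` return type.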