Encoding
Note
For the moment, this submodule is only used in preprocessing.
This module implements several versions of one-hot encoding.
- biom3d.utils.encoding.one_hot(values: ndarray, num_classes: int | None = None) → ndarray
Convert an integer array to one-hot encoding using NumPy.
- Parameters:
values (numpy.ndarray) – Integer array of labels to encode.
num_classes (int, optional) – Total number of classes. If None, inferred as max(values)+1.
- Returns:
One-hot encoded array of shape (num_classes, *values.shape), dtype int64.
- Return type:
numpy.ndarray
Notes
If the maximum value is 255, values are normalized to {0, 1}.
Unique values are re-indexed to consecutive integers before encoding.
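The behaviour described above can be sketched in plain NumPy. This is a minimal illustration, not the biom3d source: the name one_hot_sketch is hypothetical, and the sketch assumes that re-indexing happens before the class count is inferred, so that max(values)+1 equals the number of unique labels.

```python
import numpy as np

def one_hot_sketch(values, num_classes=None):
    """Illustrative channel-first one-hot encoding (hypothetical helper)."""
    values = np.asarray(values)
    # Re-index the unique labels to consecutive integers 0..N-1.
    uniques, inverse = np.unique(values, return_inverse=True)
    inverse = inverse.reshape(values.shape)
    if num_classes is None:
        # Assumption: after re-indexing, max + 1 == number of unique labels.
        num_classes = len(uniques)
    # Broadcast a column of class indices against the label array.
    classes = np.arange(num_classes).reshape((num_classes,) + (1,) * values.ndim)
    return (classes == inverse).astype(np.int64)

# A 0/255 binary mask ends up as two channels: background and foreground.
mask = np.array([[0, 255], [255, 0]])
encoded = one_hot_sketch(mask)
print(encoded.shape, encoded.dtype)
```

The class axis comes first, matching the documented shape (num_classes, *values.shape), and every pixel is set in exactly one channel.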
- biom3d.utils.encoding.one_hot_fast(values: ndarray, num_classes: int | None = None, mapping_mode: Literal['strict', 'remap', 'pad'] = 'strict')
Transform an integer array into a one-hot encoded array with robust mapping control.
This function is accelerated with Numba and designed to be a safe, standalone utility.
- Parameters:
values (numpy.ndarray) – The integer label array to be encoded.
num_classes (int, optional) – The total number of classes. If None, this is inferred from the unique values in the array, and mapping_mode is forced to 'remap'.
mapping_mode ('strict','remap' or 'pad', default='strict') –
Controls how input values are mapped to class channels:
'strict' (default): Safest mode. Requires all values to be within the range [0, num_classes-1]. Raises a ValueError if any value is outside this range.
'remap': For arbitrarily numbered labels. Remaps the N unique values in the input array to [0, 1, …, N-1]. Requires that the number of unique values equals num_classes.
'pad': For correctly numbered labels where some classes may be missing. Creates channels for all classes in range(num_classes) and populates the ones present in values. Raises a ValueError if any value is outside the [0, num_classes-1] range.
- Raises:
ValueError – If the input values are incompatible with the chosen mode, or if mapping_mode is not one of the supported values.
- Returns:
The one-hot encoded array of shape (num_classes, *values.shape) and dtype np.uint8.
- Return type:
numpy.ndarray
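To make the three mapping modes concrete, here is a rough pure-NumPy sketch of the semantics described above. It is not the Numba implementation, and the name one_hot_modes_sketch is hypothetical. The documentation does not spell out how 'strict' and 'pad' differ beyond the wording about missing classes, so the sketch treats them identically, which is an assumption.

```python
import numpy as np

def one_hot_modes_sketch(values, num_classes=None, mapping_mode="strict"):
    """Illustrative mapping-mode logic (hypothetical helper, not biom3d code)."""
    values = np.asarray(values)
    uniques = np.unique(values)  # sorted unique labels
    if num_classes is None:
        # As documented: infer the class count and force 'remap'.
        num_classes, mapping_mode = len(uniques), "remap"
    if mapping_mode == "remap":
        if len(uniques) != num_classes:
            raise ValueError("'remap' needs exactly num_classes unique values")
        # Rank of each value among the sorted uniques gives 0..N-1.
        indices = np.searchsorted(uniques, values)
    elif mapping_mode in ("strict", "pad"):
        # Assumption: both modes only check the [0, num_classes-1] range.
        if uniques.size and (uniques[0] < 0 or uniques[-1] >= num_classes):
            raise ValueError("values outside [0, num_classes-1]")
        indices = values
    else:
        raise ValueError(f"unknown mapping_mode: {mapping_mode!r}")
    out = np.zeros((num_classes,) + values.shape, dtype=np.uint8)
    for c in range(num_classes):
        out[c] = indices == c  # one binary channel per class
    return out

# 'pad': classes 1 and 3 are absent, so their channels stay empty.
padded = one_hot_modes_sketch(np.array([0, 2, 2]), num_classes=4, mapping_mode="pad")
# 'remap': arbitrary labels 10 and 50 become classes 0 and 1.
remapped = one_hot_modes_sketch(np.array([10, 50, 10]), num_classes=2, mapping_mode="remap")
```

With 'strict', a call such as one_hot_modes_sketch(np.array([0, 5]), num_classes=3) raises a ValueError in this sketch, because 5 lies outside [0, 2].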
- biom3d.utils.encoding.one_hot_fast_v1(values: ndarray, num_classes: int | None = None)
Numba-accelerated one-hot encoding with simple class heuristics.
- Parameters:
values (numpy.ndarray) – Integer array of labels to encode.
num_classes (int, optional) – Number of classes. If None, inferred from unique values.
- Returns:
One-hot encoded array of shape (num_classes, *values.shape), dtype uint8.
- Return type:
numpy.ndarray
Warning
If the number of unique values is less than num_classes, the missing classes are appended after the maximum value.
If the maximum value exceeds num_classes, the behavior may be unexpected.
For binary classes, thresholding is applied if the input is not in {0, 1}.
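Because these heuristics can silently change the result, it may help to inspect a label array before encoding it. The helper below is a hypothetical pre-check, not part of biom3d. It flags the three situations listed in the warning, using the slightly stricter assumption that valid labels lie in [0, num_classes-1].

```python
import numpy as np

def check_labels(values, num_classes):
    """Hypothetical pre-check: return a list of potential problems."""
    uniques = np.unique(np.asarray(values))
    problems = []
    # Assumption: valid labels are 0..num_classes-1.
    if uniques.size and (uniques.min() < 0 or uniques.max() >= num_classes):
        problems.append("labels fall outside [0, num_classes-1]")
    if len(uniques) < num_classes:
        problems.append("fewer unique labels than num_classes")
    if num_classes == 2 and not np.isin(uniques, [0, 1]).all():
        problems.append("binary input is not in {0, 1}")
    return problems

# A 0/255 mask triggers the range and binary warnings.
report = check_labels(np.array([0, 255]), num_classes=2)
# A clean binary mask yields an empty report.
clean = check_labels(np.array([0, 1, 1]), num_classes=2)
```

An empty list means none of the documented heuristics should apply to the input under these assumptions.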