Encoding

Note

For the moment, this submodule is only used in preprocessing.

This module implements several versions of one-hot encoding.

biom3d.utils.encoding.one_hot(values: ndarray, num_classes: int | None = None) → ndarray

Convert an integer array to one-hot encoding using NumPy.

Parameters:
  • values (numpy.ndarray) – Integer array of labels to encode.

  • num_classes (int, optional) – Total number of classes. If None, inferred as max(values)+1.

Returns:

One-hot encoded array of shape (num_classes, *values.shape), dtype int64.

Return type:

numpy.ndarray

Notes

  • If the maximum value is 255, values are normalized to {0, 1}.

  • Unique values are re-indexed to consecutive integers before encoding.
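The re-indexing described in the Notes can be illustrated with a minimal NumPy sketch. This is not the biom3d implementation. The function name one_hot_sketch is hypothetical, and the sketch omits the 255 normalization step. When num_classes is None, the sketch infers it from the re-indexed labels, which amounts to counting the unique values.

```python
from typing import Optional

import numpy as np


def one_hot_sketch(values: np.ndarray, num_classes: Optional[int] = None) -> np.ndarray:
    """Hypothetical illustration of one_hot; the 255 normalization is omitted."""
    # Re-index the unique labels to consecutive integers 0..N-1.
    uniques, inverse = np.unique(values, return_inverse=True)
    inverse = inverse.reshape(values.shape)
    if num_classes is None:
        # After re-indexing, max + 1 equals the number of unique values.
        num_classes = len(uniques)
    # Row i of the identity matrix is the one-hot vector for class i, so
    # fancy indexing yields shape (*values.shape, num_classes).
    encoded = np.eye(num_classes, dtype=np.int64)[inverse]
    # Move the class axis first: (num_classes, *values.shape).
    return np.moveaxis(encoded, -1, 0)


labels = np.array([[0, 3], [3, 7]])
print(one_hot_sketch(labels).shape)  # (3, 2, 2): labels 0, 3, 7 become channels 0, 1, 2
```

Because the labels are re-indexed first, gaps in the label values (here 1, 2 and 4 to 6) do not produce empty channels.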

biom3d.utils.encoding.one_hot_fast(values: ndarray, num_classes: int | None = None, mapping_mode: Literal['strict', 'remap', 'pad'] = 'strict')

Transform an integer array into a one-hot encoded array with robust mapping control.

This function is accelerated with Numba and designed to be a safe, standalone utility.

Parameters:
  • values (numpy.ndarray) – The integer label array to be encoded.

  • num_classes (int, optional) – The total number of classes. If None, this is inferred from the unique values in the array, and mapping_mode is forced to ‘remap’.

  • mapping_mode ('strict', 'remap' or 'pad', default='strict') –

    Controls how input values are mapped to class channels:

    • ‘strict’ (default): Safest mode. Requires all values to be within the range [0, num_classes-1]. Raises a ValueError if any value is outside this range.

    • ‘remap’: For arbitrarily numbered labels. Remaps the N unique values in the input array to [0, 1, …, N-1]. Requires that the number of unique values equals num_classes.

    • ‘pad’: For correctly numbered labels where some classes may be missing. Creates channels for all classes in range(num_classes) and populates the ones present in values. Raises a ValueError if any value is outside the [0, num_classes-1] range.

Raises:

ValueError – If the input values are incompatible with the chosen mode, or if mapping_mode is unknown.

Returns:

The one-hot encoded array of shape (num_classes, *values.shape) and dtype np.uint8.

Return type:

numpy.ndarray
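The three mapping modes can be sketched in plain NumPy, without Numba. This sketch illustrates the documented rules and is not the biom3d code. The name one_hot_modes_sketch is hypothetical. The documentation describes the same range check for 'strict' and 'pad', so the sketch treats those two modes identically; any further difference in the real function is not captured here.

```python
from typing import Literal, Optional

import numpy as np


def one_hot_modes_sketch(
    values: np.ndarray,
    num_classes: Optional[int] = None,
    mapping_mode: Literal["strict", "remap", "pad"] = "strict",
) -> np.ndarray:
    """Hypothetical NumPy illustration of one_hot_fast's mapping modes."""
    if mapping_mode not in ("strict", "remap", "pad"):
        raise ValueError(f"Unknown mapping_mode: {mapping_mode!r}")
    uniques = np.unique(values)
    if num_classes is None:
        # Documented behavior: infer from the unique values and force 'remap'.
        num_classes = len(uniques)
        mapping_mode = "remap"

    if mapping_mode == "remap":
        if len(uniques) != num_classes:
            raise ValueError(
                f"'remap' needs {num_classes} unique values, found {len(uniques)}"
            )
        # Map each value to the position of its sorted unique value (0..N-1).
        labels = np.searchsorted(uniques, values)
    else:
        # 'strict' and 'pad': every value must lie in [0, num_classes - 1].
        if values.size and (values.min() < 0 or values.max() >= num_classes):
            raise ValueError(f"values must lie in [0, {num_classes - 1}]")
        labels = values

    # Compare each label with every class index via broadcasting:
    # (1, *shape) vs (num_classes, 1, ..., 1) -> (num_classes, *shape).
    classes = np.arange(num_classes).reshape((num_classes,) + (1,) * values.ndim)
    return (labels[np.newaxis] == classes).astype(np.uint8)


labels = np.array([0, 2, 2])
pad = one_hot_modes_sketch(labels, num_classes=4, mapping_mode="pad")
print(pad.shape)  # (4, 3); channels 1 and 3 are all zeros
```

With biom3d installed, the documented counterpart of the last call would be one_hot_fast(labels, num_classes=4, mapping_mode='pad'). In 'remap' mode, labels such as [10, 20, 20] with num_classes=2 would instead become channels 0 and 1.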

biom3d.utils.encoding.one_hot_fast_v1(values: ndarray, num_classes: int | None = None)

Numba-accelerated one-hot encoding with simple class heuristics.

Parameters:
  • values (numpy.ndarray) – Integer array of labels to encode.

  • num_classes (int, optional) – Number of classes. If None, inferred from unique values.

Returns:

One-hot encoded array of shape (num_classes, *values.shape), dtype uint8.

Return type:

numpy.ndarray

Warning

  • If the number of unique values is less than num_classes, the missing classes are appended after the maximum value.

  • If the maximum value exceeds num_classes, the behavior may be unexpected.

  • For binary classes, thresholding is applied if the input is not in {0, 1}.
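These heuristics apply silently, so it may help to check inputs before calling one_hot_fast_v1. The helper below, v1_input_warnings, is hypothetical and not part of biom3d. It flags only the three situations listed in the Warning and does not reproduce the function's internals. As an assumption, it conservatively treats any value outside [0, num_classes-1] as out of range.

```python
from typing import Optional, Set

import numpy as np


def v1_input_warnings(values: np.ndarray, num_classes: Optional[int] = None) -> Set[str]:
    """Hypothetical pre-check for the cases listed in the one_hot_fast_v1 Warning."""
    uniques = np.unique(values)
    if num_classes is None:
        # Documented default: infer the class count from the unique values.
        num_classes = len(uniques)
    issues = set()
    if len(uniques) < num_classes:
        # Missing classes would be appended after the maximum value.
        issues.add("missing_classes")
    if uniques.size and (uniques.min() < 0 or uniques.max() > num_classes - 1):
        # Labels outside [0, num_classes - 1] may behave unexpectedly.
        issues.add("out_of_range")
    if num_classes == 2 and not set(uniques.tolist()) <= {0, 1}:
        # Binary inputs outside {0, 1} would be thresholded.
        issues.add("binary_threshold")
    return issues


print(sorted(v1_input_warnings(np.array([0, 255]))))  # ['binary_threshold', 'out_of_range']
```

An empty result means none of the documented heuristics should be triggered for that input.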