VGG3D Encoder

This encoder is currently used by UNet.

Adapted in 3D from the ResNet implementation at https://github.com/akamaster/pytorch_resnet_cifar10

class biom3d.models.encoder_vgg.EncoderBlock(*args: Any, **kwargs: Any)[source]

A 3D convolutional encoder block with optional InstanceNorm and LeakyReLU activation.

This block consists of two convolutional layers. The second normalization and activation are skipped if the block is marked as the last.

Variables:
  • conv1 (nn.Conv3d) – First 3D convolution layer.

  • bn1 (nn.InstanceNorm3d) – Instance normalization applied after the first convolution.

  • conv2 (nn.Conv3d) – Second 3D convolution layer.

  • bn2 (nn.InstanceNorm3d) – Instance normalization applied after the second convolution (only if this is not the last block).

  • is_last (bool) – Flag indicating if the block is the last in the sequence.

__init__(in_planes: int, planes: int, stride: int = 1, option: Literal['A', 'B'] = 'B', is_last: bool = False)[source]

3D convolutional encoder block with optional InstanceNorm and LeakyReLU activation.

Parameters:
  • in_planes (int) – Number of input channels.

  • planes (int) – Number of output channels.

  • stride (int or tuple, default=1) – Stride for the first convolution layer.

  • option (str, default='B') – Not used in this implementation, placeholder for possible variants.

  • is_last (bool, default=False) – Whether this block is the last one, which disables the second normalization and activation.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass through the EncoderBlock.

Parameters:

x (torch.Tensor) – Input tensor of shape (N, C, D, H, W).

Returns:

Output tensor after convolution, normalization, and activation.

Return type:

torch.Tensor
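
The block structure described above can be illustrated with a minimal sketch. This is not the biom3d source: the class name `EncoderBlockSketch`, the 3x3x3 kernels with padding 1, the `affine=True` setting, and the default LeakyReLU slope are all assumptions made for illustration. The unused `option` argument is left out.

```python
import torch
from torch import nn


class EncoderBlockSketch(nn.Module):
    """Hypothetical stand-in for EncoderBlock (assumed hyperparameters)."""

    def __init__(self, in_planes, planes, stride=1, is_last=False):
        super().__init__()
        self.is_last = is_last
        # Assumed: 3x3x3 kernels with padding 1, so only `stride` changes the size.
        self.conv1 = nn.Conv3d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.InstanceNorm3d(planes, affine=True)
        self.conv2 = nn.Conv3d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        if not is_last:
            # bn2 only exists when the block is not the last one.
            self.bn2 = nn.InstanceNorm3d(planes, affine=True)
        self.act = nn.LeakyReLU()

    def forward(self, x):
        out = self.act(self.bn1(self.conv1(x)))
        out = self.conv2(out)
        if not self.is_last:
            # Second normalization and activation are skipped for the last block.
            out = self.act(self.bn2(out))
        return out


x = torch.randn(1, 1, 8, 16, 16)  # (N, C, D, H, W)
y = EncoderBlockSketch(1, 4, stride=2)(x)
print(tuple(y.shape))  # (1, 4, 4, 8, 8)
```

Under these assumptions, a stride of 2 halves each spatial dimension while `planes` sets the output channel count.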

class biom3d.models.encoder_vgg.GlobalAvgPool3d(*args: Any, **kwargs: Any)[source]

Performs global average pooling over the last three dimensions.

This layer averages the input tensor over the depth, height, and width dimensions.

__init__()[source]

Perform global average pooling over the last three dimensions.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass computing the global average pooling.

Parameters:

x (torch.Tensor) – Input tensor of shape (N, C, D, H, W).

Returns:

Output tensor of shape (N, C) after global average pooling.

Return type:

torch.Tensor
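
As a rough illustration of the (N, C, D, H, W) to (N, C) reduction, the following sketch averages over the last three dimensions and compares the result with PyTorch's built-in adaptive pooling. The class name `GlobalAvgPool3dSketch` is hypothetical, and the actual biom3d layer may be written differently.

```python
import torch
from torch import nn


class GlobalAvgPool3dSketch(nn.Module):
    """Hypothetical stand-in for GlobalAvgPool3d."""

    def forward(self, x):
        # Average over D, H and W; the batch and channel axes are kept.
        return x.mean(dim=(-3, -2, -1))


x = torch.randn(2, 3, 4, 5, 6)  # (N, C, D, H, W)
pooled = GlobalAvgPool3dSketch()(x)

# Same reduction expressed with the built-in adaptive pooling.
reference = nn.functional.adaptive_avg_pool3d(x, 1).flatten(1)

print(tuple(pooled.shape))  # (2, 3)
print(torch.allclose(pooled, reference, atol=1e-6))
```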

class biom3d.models.encoder_vgg.LambdaLayer(*args: Any, **kwargs: Any)[source]

Applies a lambda function as a layer.

Variables:

lambd (callable) – Lambda function applied in the forward pass.

__init__(lambd: Callable)[source]

Apply a lambda function as a layer.

Parameters:

lambd (callable) – Lambda function to apply in forward pass.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass applying the lambda function.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output after applying the lambda function.

Return type:

torch.Tensor
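
A minimal sketch of how such a wrapper lets an arbitrary callable sit inside `nn.Sequential`. The class name `LambdaLayerSketch` and the subsampling lambda are illustrative assumptions, not taken from biom3d.

```python
import torch
from torch import nn


class LambdaLayerSketch(nn.Module):
    """Hypothetical stand-in for LambdaLayer."""

    def __init__(self, lambd):
        super().__init__()
        self.lambd = lambd  # any callable taking and returning a tensor

    def forward(self, x):
        return self.lambd(x)


# Illustrative use: keep every second voxel along D, H and W.
subsample = LambdaLayerSketch(lambda t: t[:, :, ::2, ::2, ::2])
model = nn.Sequential(subsample)

x = torch.zeros(1, 2, 8, 8, 8)
out = model(x)
print(tuple(out.shape))  # (1, 2, 4, 4, 4)
```

A module that stores a lambda generally cannot be pickled as a whole object with the standard `pickle` module; saving only the `state_dict` is not affected, since the wrapper holds no parameters.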

class biom3d.models.encoder_vgg.SmallEncoderBlock(*args: Any, **kwargs: Any)[source]

Small 3D encoder block with one convolution and optional normalization and activation.

Variables:
  • conv1 (nn.Conv3d) – 3D convolution layer.

  • bn1 (nn.InstanceNorm3d) – Instance normalization layer (only if is_last is False).

  • is_last (bool) – Indicates whether this is the last block (no normalization or activation).

__init__(in_planes: int, planes: int, stride: int = 1, option: Literal['A', 'B'] = 'B', is_last: bool = False)[source]

Small 3D encoder block with one convolution and optional normalization and activation.

Parameters:
  • in_planes (int) – Number of input channels.

  • planes (int) – Number of output channels.

  • stride (int, default=1) – Stride of the convolution.

  • option (str, default='B') – Not used in this implementation; placeholder, as in EncoderBlock.

  • is_last (bool, default=False) – If True, no normalization or activation is applied.

forward(x: torch.Tensor) → torch.Tensor[source]

Forward pass through the block.

Applies convolution, followed by instance normalization and LeakyReLU if not the last block. Otherwise, applies only convolution.

Parameters:

x (torch.Tensor) – Input tensor.

Returns:

Output tensor after block processing.

Return type:

torch.Tensor
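
The `is_last` branch can be checked with a short sketch. The class name `SmallEncoderBlockSketch`, the 3x3x3 kernel with padding 1, and the default LeakyReLU slope are assumptions for illustration and are not confirmed by the biom3d source.

```python
import torch
from torch import nn


class SmallEncoderBlockSketch(nn.Module):
    """Hypothetical stand-in for SmallEncoderBlock (assumed hyperparameters)."""

    def __init__(self, in_planes, planes, stride=1, is_last=False):
        super().__init__()
        self.is_last = is_last
        self.conv1 = nn.Conv3d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        if not is_last:
            # Normalization and activation exist only for non-final blocks.
            self.bn1 = nn.InstanceNorm3d(planes, affine=True)
            self.act = nn.LeakyReLU()

    def forward(self, x):
        out = self.conv1(x)
        if self.is_last:
            return out  # convolution only
        return self.act(self.bn1(out))


torch.manual_seed(0)
x = torch.randn(1, 2, 4, 8, 8)  # (N, C, D, H, W)

last = SmallEncoderBlockSketch(2, 6, is_last=True)
same_as_conv = torch.allclose(last(x), last.conv1(x))

hidden = SmallEncoderBlockSketch(2, 6, stride=2)
print(same_as_conv, tuple(hidden(x).shape))
```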

class biom3d.models.encoder_vgg.VGGEncoder(*args: Any, **kwargs: Any)[source]

VGG-style 3D encoder composed of multiple EncoderBlocks.

The architecture applies a sequence of blocks with progressively increasing number of channels, with configurable pooling and strides.

Variables:
  • in_planes (int) – Number of input channels to the current layer.

  • use_emb (bool) – Whether embedding is used.

  • use_head (bool) – Whether fully connected head is used.

  • layers (nn.ModuleList) – Sequence of encoder layers.

  • head (nn.Sequential) – Optional fully connected head for embedding (if use_head is True).

__init__(block: type[torch.nn.Module], num_pools: list[int], factor: int = 32, first_stride: list[int] = [1, 1, 1], flip_strides: bool = False, use_emb: bool = False, emb_dim: int = 320, use_head: bool = False, patch_size: Iterable[int] | None = None, in_planes: int = 1, roll_strides: bool = True)[source]

VGG-style 3D encoder composed of multiple EncoderBlocks.

Parameters:
  • block (nn.Module) – Encoder block class to use (e.g. EncoderBlock).

  • num_pools (list of int) – Number of pooling steps in each spatial dimension.

  • factor (int, default=32) – Base factor for channel scaling.

  • first_stride (list of int, default=[1,1,1]) – Stride for the first convolution layer.

  • flip_strides (bool, default=False) – Whether to flip the order of the computed strides. Flipped strides create larger feature maps.

  • use_emb (bool, default=False) – Whether to use an embedding layer on top of the last encoder output.

  • emb_dim (int, default=320) – Dimension of the embedding output.

  • use_head (bool, default=False) – Whether to use a fully connected head after flattening.

  • patch_size (iterable of int, optional) – Input patch size, needed if use_head is True.

  • in_planes (int, default=1) – Number of input channels.

  • roll_strides (bool, default=True) – Whether to roll strides when computing pooling (used for backward compatibility for models trained before commit f2ac9ee (August 2023)).
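
To make the relationship between `num_pools`, `flip_strides`, and feature-map size more concrete, here is a small pure-Python sketch of one plausible reading of these parameters. It is not the biom3d implementation: the helper names `stride_schedule` and `feature_sizes` are hypothetical, the schedule assumes axis `d` is downsampled by 2 at the first `num_pools[d]` levels, each stride-2 step is assumed to halve a dimension with rounding up, and `first_stride` and `roll_strides` are left out entirely.

```python
def stride_schedule(num_pools):
    """Assumed schedule: axis d gets stride 2 for its first num_pools[d] levels."""
    levels = max(num_pools)
    return [[2 if level < n else 1 for n in num_pools] for level in range(levels)]


def feature_sizes(patch_size, num_pools, flip=False):
    """Spatial size after each level, starting with the input patch size."""
    schedule = stride_schedule(num_pools)
    if flip:
        # Assumed meaning of flip_strides: apply the schedule in reverse order.
        schedule = schedule[::-1]
    size = list(patch_size)
    sizes = [size]
    for strides in schedule:
        # Ceil division: the output size of a stride-s convolution with "same"-style padding.
        size = [-(-dim // s) for dim, s in zip(size, strides)]
        sizes.append(size)
    return sizes


plain = feature_sizes((40, 224, 224), [3, 5, 5])
flipped = feature_sizes((40, 224, 224), [3, 5, 5], flip=True)

for a, b in zip(plain, flipped):
    print(a, b)
```

Under these assumptions both orderings end at the same final size, but the flipped schedule postpones the downsampling of the first axis, so its intermediate feature maps are larger, in line with the note on `flip_strides`.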

forward(x: torch.Tensor, use_encoder: bool = False) → torch.Tensor[source]

Forward pass through the VGGEncoder.

Parameters:
  • x (torch.Tensor) – Input tensor of shape (N, C, D, H, W).

  • use_encoder (bool, default=False) – Whether to apply the embedding head to the last output.

Returns:

If use_emb is False, a list of the intermediate feature maps. If use_emb is True, the embedding vector (after flattening and the head if use_encoder=True).

Return type:

list of torch.Tensor or torch.Tensor