VGG3D Encoder
This encoder is currently used by UNet.
3D ResNet adapted from: https://github.com/akamaster/pytorch_resnet_cifar10.
- class biom3d.models.encoder_vgg.EncoderBlock(*args: Any, **kwargs: Any)
A 3D convolutional encoder block with optional InstanceNorm and LeakyReLU activation.
This block consists of two convolutional layers. If the block is marked as the last one (is_last=True), the second normalization and activation are skipped.
- Variables:
conv1 (nn.Conv3d) – First 3D convolution layer.
bn1 (nn.InstanceNorm3d) – Instance normalization applied after the first convolution.
conv2 (nn.Conv3d) – Second 3D convolution layer.
bn2 (nn.InstanceNorm3d) – Instance normalization applied after the second convolution (only if is_last is False).
is_last (bool) – Flag indicating if the block is the last in the sequence.
- __init__(in_planes: int, planes: int, stride: int = 1, option: Literal['A', 'B'] = 'B', is_last: bool = False)
- Parameters:
in_planes (int) – Number of input channels.
planes (int) – Number of output channels.
stride (int or tuple, default=1) – Stride for the first convolution layer.
option (str, default='B') – Not used in this implementation; a placeholder for possible variants.
is_last (bool, default=False) – Whether this block is the last one. If True, the second normalization and activation are disabled.
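The following is a minimal sketch of the block described above. It is written as a standalone PyTorch module and is not biom3d's own code. The class name EncoderBlockSketch is hypothetical. kernel_size=3, padding=1, bias=False, affine InstanceNorm3d, and the default LeakyReLU slope are assumptions that this page does not state. The unused option argument is left out.

```python
import torch
from torch import nn


class EncoderBlockSketch(nn.Module):
    """Hypothetical re-implementation: two Conv3d layers, each followed by
    InstanceNorm3d + LeakyReLU, except that the second norm/activation is
    skipped when is_last=True. Kernel size, padding and bias are assumptions."""

    def __init__(self, in_planes: int, planes: int, stride=1, is_last: bool = False):
        super().__init__()
        self.is_last = is_last
        # Only the first convolution is strided, so downsampling happens here.
        self.conv1 = nn.Conv3d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.InstanceNorm3d(planes, affine=True)
        self.conv2 = nn.Conv3d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        if not is_last:
            self.bn2 = nn.InstanceNorm3d(planes, affine=True)
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.act(self.bn1(self.conv1(x)))
        out = self.conv2(out)
        if not self.is_last:
            out = self.act(self.bn2(out))
        return out


block = EncoderBlockSketch(in_planes=1, planes=32, stride=2)
y = block(torch.randn(2, 1, 16, 32, 32))
print(y.shape)  # torch.Size([2, 32, 8, 16, 16])
```

In this sketch, stride=2 on the first convolution halves each spatial dimension. The second convolution keeps the shape unchanged.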
- class biom3d.models.encoder_vgg.GlobalAvgPool3d(*args: Any, **kwargs: Any)
Performs global average pooling over the last three dimensions.
This layer averages the input tensor over the depth, height, and width dimensions.
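The same operation can be written as a single mean over the last three axes. This sketch assumes that the pooled dimensions are dropped, giving an output of shape (N, C). This page does not say whether biom3d keeps singleton dimensions. GlobalAvgPool3dSketch is a hypothetical name.

```python
import torch
from torch import nn


class GlobalAvgPool3dSketch(nn.Module):
    """Hypothetical sketch: average over depth, height and width."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (N, C, D, H, W) -> (N, C); dropping the reduced dims is an assumption.
        return x.mean(dim=(-3, -2, -1))


x = torch.ones(2, 4, 3, 5, 7)
print(GlobalAvgPool3dSketch()(x).shape)  # torch.Size([2, 4])
```

Under this assumption, the result matches nn.AdaptiveAvgPool3d(1) followed by flatten(1).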
- class biom3d.models.encoder_vgg.LambdaLayer(*args: Any, **kwargs: Any)
Applies a lambda function as a layer.
- Variables:
lambd (callable) – Function applied to the input in forward().
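A wrapper like this could look as follows. The sketch assumes that lambd takes and returns a single tensor. LambdaLayerSketch is a hypothetical name. Wrapping a callable this way lets it sit inside nn.Sequential next to ordinary layers.

```python
import torch
from torch import nn


class LambdaLayerSketch(nn.Module):
    """Hypothetical sketch: store a callable and apply it in forward()."""

    def __init__(self, lambd):
        super().__init__()
        self.lambd = lambd

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.lambd(x)


# Example: flatten everything except the batch dimension inside nn.Sequential.
model = nn.Sequential(nn.Conv3d(1, 2, kernel_size=1), LambdaLayerSketch(lambda t: t.flatten(1)))
print(model(torch.randn(3, 1, 2, 2, 2)).shape)  # torch.Size([3, 16])
```

A lambda cannot be pickled. Saving such a model with torch.save(model), rather than saving its state_dict, is therefore likely to fail.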
- class biom3d.models.encoder_vgg.SmallEncoderBlock(*args: Any, **kwargs: Any)
Small 3D encoder block with one convolution and optional normalization and activation.
- Variables:
conv1 (nn.Conv3d) – 3D convolution layer.
bn1 (nn.InstanceNorm3d) – Instance normalization layer (only if is_last is False).
is_last (bool) – Whether this is the last block (if True, no normalization or activation is applied).
- __init__(in_planes: int, planes: int, stride: int = 1, option: Literal['A', 'B'] = 'B', is_last: bool = False)
- Parameters:
in_planes (int) – Number of input channels.
planes (int) – Number of output channels.
stride (int, default=1) – Stride of the convolution.
option (str, default='B') – Not used in this implementation; a placeholder for possible variants.
is_last (bool, default=False) – If True, no normalization or activation is applied.
- forward(x: torch.Tensor) → torch.Tensor
Forward pass through the block.
Applies the convolution followed by instance normalization and LeakyReLU. If the block is the last one (is_last=True), only the convolution is applied.
- Parameters:
x (torch.Tensor) – Input tensor.
- Returns:
Output tensor after block processing.
- Return type:
torch.Tensor
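The following sketch mirrors the documented behavior: one convolution, then normalization and activation unless is_last is set. It is a hypothetical stand-in (SmallEncoderBlockSketch), not biom3d's implementation. kernel_size=3, padding=1, bias=False, and affine InstanceNorm3d are assumptions. The example passes an anisotropic stride tuple, which nn.Conv3d accepts.

```python
import torch
from torch import nn


class SmallEncoderBlockSketch(nn.Module):
    """Hypothetical sketch: Conv3d, then InstanceNorm3d + LeakyReLU unless is_last."""

    def __init__(self, in_planes: int, planes: int, stride=1, is_last: bool = False):
        super().__init__()
        self.is_last = is_last
        self.conv1 = nn.Conv3d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        if not is_last:
            self.bn1 = nn.InstanceNorm3d(planes, affine=True)
            self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.conv1(x)
        if self.is_last:
            return out  # last block: convolution only
        return self.act(self.bn1(out))


# Downsample height and width but not depth.
blk = SmallEncoderBlockSketch(1, 16, stride=(1, 2, 2))
print(blk(torch.randn(1, 1, 8, 32, 32)).shape)  # torch.Size([1, 16, 8, 16, 16])
```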
- class biom3d.models.encoder_vgg.VGGEncoder(*args: Any, **kwargs: Any)
VGG-style 3D encoder composed of multiple EncoderBlocks.
The architecture applies a sequence of blocks with a progressively increasing number of channels, with configurable pooling and strides.
- Variables:
in_planes (int) – Number of input channels to the current layer.
use_emb (bool) – Whether embedding is used.
use_head (bool) – Whether a fully connected head is used.
layers (nn.ModuleList) – Sequence of encoder layers.
head (nn.Sequential) – Optional fully connected head for embedding (if use_head is True).
- __init__(block: type[torch.nn.Module], num_pools: list[int], factor: int = 32, first_stride: list[int] = [1, 1, 1], flip_strides: bool = False, use_emb: bool = False, emb_dim: int = 320, use_head: bool = False, patch_size: Iterable[int] | None = None, in_planes: int = 1, roll_strides: bool = True)
- Parameters:
block (type[nn.Module]) – Encoder block class (not an instance) to use, e.g. EncoderBlock.
num_pools (list of int) – Number of pooling steps in each spatial dimension.
factor (int, default=32) – Base factor for channel scaling.
first_stride (list of int, default=[1,1,1]) – Stride for the first convolution layer.
flip_strides (bool, default=False) – Whether to flip the order of computed strides. Flipped strides create larger feature maps.
use_emb (bool, default=False) – Whether to use an embedding layer on top of the last encoder output.
emb_dim (int, default=320) – Dimension of the embedding output.
use_head (bool, default=False) – Whether to use a fully connected head after flattening.
patch_size (iterable of int, optional) – Input patch size, needed if use_head is True.
in_planes (int, default=1) – Number of input channels.
roll_strides (bool, default=True) – Whether to roll strides when computing pooling (used for backward compatibility for models trained before commit f2ac9ee (August 2023)).
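This page does not spell out how num_pools, first_stride, roll_strides, and flip_strides become per-stage strides. The pure-Python helper below is one plausible scheme, offered only to illustrate these parameters. compute_strides is a hypothetical name, and the rule it uses is an assumption, not biom3d's documented algorithm. Under that rule, an axis with p pools gets stride 2 in p of the max(num_pools) downsampling stages and stride 1 in the rest.

```python
def compute_strides(num_pools, first_stride=(1, 1, 1), roll_strides=True, flip_strides=False):
    """Hypothetical helper: one [s_d, s_h, s_w] stride triple per encoder stage.

    Assumption: an axis with p pools is downsampled (stride 2) in p of the
    max(num_pools) stages and keeps stride 1 in the others.
    """
    n_stages = max(num_pools)
    per_axis = []
    for p in num_pools:
        twos, ones = [2] * p, [1] * (n_stages - p)
        # Assumed meaning of roll_strides: True downsamples this axis in the
        # earliest stages, False delays its downsampling to the latest stages.
        per_axis.append(twos + ones if roll_strides else ones + twos)
    # Transpose from per-axis lists to per-stage triples.
    stages = [list(s) for s in zip(*per_axis)]
    if flip_strides:
        stages = stages[::-1]  # reverse the order of the downsampling stages
    return [list(first_stride)] + stages


print(compute_strides([3, 5, 5]))
# [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [1, 2, 2], [1, 2, 2]]
```

In every variant of this sketch, each axis is downsampled by 2**p in total. Only the order of the stages changes, and with it the size of the intermediate feature maps.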
- forward(x: torch.Tensor, use_encoder: bool = False) → torch.Tensor
Forward pass through the VGGEncoder.
- Parameters:
x (torch.Tensor) – Input tensor of shape (N, C, D, H, W).
use_encoder (bool, default=False) – Whether to apply the embedding head to the last output.
- Returns:
If use_emb is False, the list of intermediate feature maps. If use_emb is True, the embedding vector (flattened and passed through the head when use_encoder=True).
- Return type:
list of torch.Tensor or torch.Tensor
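The sketch below shows how a VGG-style encoder can return either a list of intermediate feature maps or a pooled vector. It is a hypothetical toy model (TinyVGGEncoderSketch), not VGGEncoder itself. The following are all assumptions: the simple conv_stage stand-in for a block, channels that double per stage and are capped at max_channels, and an embedding reduced to global average pooling with no head.

```python
import torch
from torch import nn


def conv_stage(cin, cout, stride):
    # Hypothetical stand-in for an encoder block: Conv3d -> InstanceNorm3d -> LeakyReLU.
    return nn.Sequential(
        nn.Conv3d(cin, cout, kernel_size=3, stride=stride, padding=1, bias=False),
        nn.InstanceNorm3d(cout, affine=True),
        nn.LeakyReLU(inplace=True),
    )


class TinyVGGEncoderSketch(nn.Module):
    """Hypothetical VGG-style encoder returning each stage's feature map."""

    def __init__(self, strides, in_planes=1, factor=8, max_channels=320, use_emb=False):
        super().__init__()
        self.use_emb = use_emb
        self.layers = nn.ModuleList()
        cin = in_planes
        for i, stride in enumerate(strides):
            # Assumption: channels double per stage, capped at max_channels.
            cout = min(factor * 2 ** i, max_channels)
            self.layers.append(conv_stage(cin, cout, stride))
            cin = cout

    def forward(self, x):
        features = []
        for layer in self.layers:
            x = layer(x)
            features.append(x)  # e.g. for a decoder's skip connections
        if self.use_emb:
            # Simplified embedding: global average pooling to shape (N, C).
            return x.mean(dim=(-3, -2, -1))
        return features


enc = TinyVGGEncoderSketch(strides=[[1, 1, 1], [2, 2, 2], [1, 2, 2]])
feats = enc(torch.randn(1, 1, 8, 16, 16))
print([tuple(f.shape) for f in feats])
# [(1, 8, 8, 16, 16), (1, 16, 4, 8, 8), (1, 32, 4, 4, 4)]
```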