Network architectures#

Blocks#

ADN#

class monai.networks.blocks.ADN(ordering='NDA', in_channels=None, act='RELU', norm=None, norm_dim=None, dropout=None, dropout_dim=None)[source]#

Constructs a sequential module of optional activation (A), dropout (D), and normalization (N) layers with an arbitrary order:

-- (Norm) -- (Dropout) -- (Acti) --

Parameters:

ordering – a string representing the ordering of activation, dropout, and normalization. Defaults to “NDA”.
in_channels – C from an expected input of size (N, C, H[, W, D]).
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
norm_dim – determine the spatial dimensions of the normalization layer. defaults to dropout_dim if unspecified.
dropout – dropout ratio. Defaults to no dropout.
dropout_dim –
determine the spatial dimensions of dropout. defaults to norm_dim if unspecified.
- When dropout_dim = 1, randomly zeroes some of the elements for each channel.
- When dropout_dim = 2, Randomly zeroes out entire channels (a channel is a 2D feature map).
- When dropout_dim = 3, Randomly zeroes out entire channels (a channel is a 3D feature map).

Examples:

# activation, group norm, dropout
>>> norm_params = ("GROUP", {"num_groups": 1, "affine": False})
>>> ADN(norm=norm_params, in_channels=1, dropout_dim=1, dropout=0.8, ordering="AND")
ADN(
    (A): ReLU()
    (N): GroupNorm(1, 1, eps=1e-05, affine=False)
    (D): Dropout(p=0.8, inplace=False)
)

# LeakyReLU, dropout
>>> act_params = ("leakyrelu", {"negative_slope": 0.1, "inplace": True})
>>> ADN(act=act_params, in_channels=1, dropout_dim=1, dropout=0.8, ordering="AD")
ADN(
    (A): LeakyReLU(negative_slope=0.1, inplace=True)
    (D): Dropout(p=0.8, inplace=False)
)

Convolution#

class monai.networks.blocks.Convolution(spatial_dims, in_channels, out_channels, strides=1, kernel_size=3, adn_ordering='NDA', act='PRELU', norm='INSTANCE', dropout=None, dropout_dim=1, dilation=1, groups=1, bias=True, conv_only=False, is_transposed=False, padding=None, output_padding=None)[source]#

Constructs a convolution with normalization, optional dropout, and optional activation layers:

-- (Conv|ConvTrans) -- (Norm -- Dropout -- Acti) --

if conv_only set to True:

-- (Conv|ConvTrans) --

For example:

from monai.networks.blocks import Convolution

conv = Convolution(
    spatial_dims=3,
    in_channels=1,
    out_channels=1,
    adn_ordering="ADN",
    act=("prelu", {"init": 0.2}),
    dropout=0.1,
    norm=("layer", {"normalized_shape": (10, 10, 10)}),
)
print(conv)

output:

Convolution(
  (conv): Conv3d(1, 1, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
  (adn): ADN(
    (A): PReLU(num_parameters=1)
    (D): Dropout(p=0.1, inplace=False)
    (N): LayerNorm((10, 10, 10), eps=1e-05, elementwise_affine=True)
  )
)

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
strides – convolution stride. Defaults to 1.
kernel_size – convolution kernel size. Defaults to 3.
adn_ordering – a string representing the ordering of activation, normalization, and dropout. Defaults to “NDA”.
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
dropout – dropout ratio. Defaults to no dropout.
dropout_dim –
determine the spatial dimensions of dropout. Defaults to 1.
- When dropout_dim = 1, randomly zeroes some of the elements for each channel.
- When dropout_dim = 2, Randomly zeroes out entire channels (a channel is a 2D feature map).
- When dropout_dim = 3, Randomly zeroes out entire channels (a channel is a 3D feature map).
The value of dropout_dim should be no larger than the value of spatial_dims.
dilation – dilation rate. Defaults to 1.
groups – controls the connections between inputs and outputs. Defaults to 1.
bias – whether to have a bias term. Defaults to True.
conv_only – whether to use the convolutional layer only. Defaults to False.
is_transposed – if True uses ConvTrans instead of Conv. Defaults to False.
padding – controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension. Defaults to None.
output_padding – controls the additional size added to one side of the output shape. Defaults to None.

CRF#

class monai.networks.blocks.CRF(iterations=5, bilateral_weight=1.0, gaussian_weight=1.0, bilateral_spatial_sigma=5.0, bilateral_color_sigma=0.5, gaussian_spatial_sigma=5.0, update_factor=3.0, compatibility_matrix=None)[source]#

Conditional Random Field: Combines message passing with a class compatibility convolution into an iterative process designed to successively minimise the energy of the class labeling.

In this implementation, the message passing step is a weighted combination of a gaussian filter and a bilateral filter. The bilateral term is included to respect existing structure within the reference tensor.

See:: https://arxiv.org/abs/1502.03240

__init__(iterations=5, bilateral_weight=1.0, gaussian_weight=1.0, bilateral_spatial_sigma=5.0, bilateral_color_sigma=0.5, gaussian_spatial_sigma=5.0, update_factor=3.0, compatibility_matrix=None)[source]#

Parameters:

iterations – the number of iterations.
bilateral_weight – the weighting of the bilateral term in the message passing step.
gaussian_weight – the weighting of the gaussian term in the message passing step.
bilateral_spatial_sigma – standard deviation in spatial coordinates for the bilateral term.
bilateral_color_sigma – standard deviation in color space for the bilateral term.
gaussian_spatial_sigma – standard deviation in spatial coordinates for the gaussian term.
update_factor – determines the magnitude of each update.
compatibility_matrix – a matrix describing class compatibility, should be NxN where N is the number of classes.

forward(input_tensor, reference_tensor)[source]#

Parameters:

input_tensor (Tensor) – tensor containing initial class logits.
reference_tensor (Tensor) – the reference tensor used to guide the message passing.

Returns:

output tensor.

Return type:

output (torch.Tensor)

ResidualUnit#

class monai.networks.blocks.ResidualUnit(spatial_dims, in_channels, out_channels, strides=1, kernel_size=3, subunits=2, adn_ordering='NDA', act='PRELU', norm='INSTANCE', dropout=None, dropout_dim=1, dilation=1, bias=True, last_conv_only=False, padding=None)[source]#

Residual module with multiple convolutions and a residual connection.

For example:

from monai.networks.blocks import ResidualUnit

convs = ResidualUnit(
    spatial_dims=3,
    in_channels=1,
    out_channels=1,
    adn_ordering="AN",
    act=("prelu", {"init": 0.2}),
    norm=("layer", {"normalized_shape": (10, 10, 10)}),
)
print(convs)

output:

ResidualUnit(
  (conv): Sequential(
    (unit0): Convolution(
      (conv): Conv3d(1, 1, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
      (adn): ADN(
        (A): PReLU(num_parameters=1)
        (N): LayerNorm((10, 10, 10), eps=1e-05, elementwise_affine=True)
      )
    )
    (unit1): Convolution(
      (conv): Conv3d(1, 1, kernel_size=(3, 3, 3), stride=(1, 1, 1), padding=(1, 1, 1))
      (adn): ADN(
        (A): PReLU(num_parameters=1)
        (N): LayerNorm((10, 10, 10), eps=1e-05, elementwise_affine=True)
      )
    )
  )
  (residual): Identity()
)

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
strides – convolution stride. Defaults to 1.
kernel_size – convolution kernel size. Defaults to 3.
subunits – number of convolutions. Defaults to 2.
adn_ordering – a string representing the ordering of activation, normalization, and dropout. Defaults to “NDA”.
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
dropout – dropout ratio. Defaults to no dropout.
dropout_dim –
determine the dimensions of dropout. Defaults to 1.
- When dropout_dim = 1, randomly zeroes some of the elements for each channel.
- When dropout_dim = 2, Randomly zero out entire channels (a channel is a 2D feature map).
- When dropout_dim = 3, Randomly zero out entire channels (a channel is a 3D feature map).
The value of dropout_dim should be no larger than the value of dimensions.
dilation – dilation rate. Defaults to 1.
bias – whether to have a bias term. Defaults to True.
last_conv_only – for the last subunit, whether to use the convolutional layer only. Defaults to False.
padding – controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension. Defaults to None.

See also

monai.networks.blocks.Convolution

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Return type:: Tensor

Swish#

class monai.networks.blocks.Swish(alpha=1.0)[source]#

Applies the element-wise function:

\[\text{Swish}(x) = x * \text{Sigmoid}(\alpha * x) ~~~~\text{for constant value}~ \alpha.\]

Citation: Searching for Activation Functions, Ramachandran et al., 2017, https://arxiv.org/abs/1710.05941.

Shape:

Input: \((N, *)\) where * means, any number of additional dimensions
Output: \((N, *)\), same shape as the input

Examples:

>>> import torch
>>> from monai.networks.layers.factories import Act
>>> m = Act['swish']()
>>> input = torch.randn(2)
>>> output = m(input)

forward(input)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

MemoryEfficientSwish#

class monai.networks.blocks.MemoryEfficientSwish(inplace=False)[source]#

Applies the element-wise function:

\[\text{Swish}(x) = x * \text{Sigmoid}(\alpha * x) ~~~~\text{for constant value}~ \alpha=1.\]

Memory efficient implementation for training following recommendation from: lukemelas/EfficientNet-PyTorch#18

Results in ~ 30% memory saving during training as compared to Swish()

Citation: Searching for Activation Functions, Ramachandran et al., 2017, https://arxiv.org/abs/1710.05941.

From Pytorch 1.7.0+, the optimized version of Swish named SiLU is implemented, this class will utilize torch.nn.functional.silu to do the calculation if meets the version.

Shape:

Input: \((N, *)\) where * means, any number of additional dimensions
Output: \((N, *)\), same shape as the input

Examples:

>>> import torch
>>> from monai.networks.layers.factories import Act
>>> m = Act['memswish']()
>>> input = torch.randn(2)
>>> output = m(input)

forward(input)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

FPN#

class monai.networks.blocks.ExtraFPNBlock(*args, **kwargs)[source]#

Base class for the extra block in the FPN.

Same code as pytorch/vision

forward(results, x, names)[source]#

Compute extended set of results of the FPN and their names.

Parameters:

results (list[Tensor]) – the result of the FPN
x (list[Tensor]) – the original feature maps
names (list[str]) – the names for each one of the original feature maps

Returns:

the extended set of results of the FPN
the extended set of names for the results

class monai.networks.blocks.FeaturePyramidNetwork(spatial_dims, in_channels_list, out_channels, extra_blocks=None)[source]#

Module that adds a FPN from on top of a set of feature maps. This is based on “Feature Pyramid Network for Object Detection”.

The feature maps are currently supposed to be in increasing depth order.

The input to the model is expected to be an OrderedDict[Tensor], containing the feature maps on top of which the FPN will be added.

Parameters:

spatial_dims – 2D or 3D images
in_channels_list – number of channels for each feature map that is passed to the module
out_channels – number of channels of the FPN representation
extra_blocks – if provided, extra operations will be performed. It is expected to take the fpn features, the original features and the names of the original features as input, and returns a new list of feature maps and their corresponding names

Examples:

>>> m = FeaturePyramidNetwork(2, [10, 20, 30], 5)
>>> # get some dummy data
>>> x = OrderedDict()
>>> x['feat0'] = torch.rand(1, 10, 64, 64)
>>> x['feat2'] = torch.rand(1, 20, 16, 16)
>>> x['feat3'] = torch.rand(1, 30, 8, 8)
>>> # compute the FPN on top of x
>>> output = m(x)
>>> print([(k, v.shape) for k, v in output.items()])
>>> # returns
>>>   [('feat0', torch.Size([1, 5, 64, 64])),
>>>    ('feat2', torch.Size([1, 5, 16, 16])),
>>>    ('feat3', torch.Size([1, 5, 8, 8]))]

forward(x)[source]#

Computes the FPN for a set of feature maps.

Parameters:: x (dict[str, Tensor]) – feature maps for each feature level.
Return type:: dict[str, Tensor]
Returns:: feature maps after FPN layers. They are ordered from highest resolution first.

get_result_from_inner_blocks(x, idx)[source]#

This is equivalent to self.inner_blocks[idx](x), but torchscript doesn’t support this yet

Return type:: Tensor

get_result_from_layer_blocks(x, idx)[source]#

This is equivalent to self.layer_blocks[idx](x), but torchscript doesn’t support this yet

Return type:: Tensor

class monai.networks.blocks.LastLevelMaxPool(spatial_dims)[source]#

Applies a max_pool2d or max_pool3d on top of the last feature map. Serves as an extra_blocks in FeaturePyramidNetwork .

forward(results, x, names)[source]#

Compute extended set of results of the FPN and their names.

Parameters:

results (list[Tensor]) – the result of the FPN
x (list[Tensor]) – the original feature maps
names (list[str]) – the names for each one of the original feature maps

Return type:

tuple[list[Tensor], list[str]]

Returns:

the extended set of results of the FPN
the extended set of names for the results

class monai.networks.blocks.LastLevelP6P7(spatial_dims, in_channels, out_channels)[source]#

This module is used in RetinaNet to generate extra layers, P6 and P7. Serves as an extra_blocks in FeaturePyramidNetwork .

forward(results, x, names)[source]#

Compute extended set of results of the FPN and their names.

Parameters:

results (list[Tensor]) – the result of the FPN
x (list[Tensor]) – the original feature maps
names (list[str]) – the names for each one of the original feature maps

Return type:

tuple[list[Tensor], list[str]]

Returns:

the extended set of results of the FPN
the extended set of names for the results

class monai.networks.blocks.BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels, spatial_dims=None, extra_blocks=None)[source]#

Adds an FPN on top of a model. Internally, it uses torchvision.models._utils.IntermediateLayerGetter to extract a submodel that returns the feature maps specified in return_layers. The same limitations of IntermediateLayerGetter apply here.

Same code as pytorch/vision Except that this class uses spatial_dims

Parameters:

backbone – backbone network
return_layers – a dict containing the names of the modules for which the activations will be returned as the key of the dict, and the value of the dict is the name of the returned activation (which the user can specify).
in_channels_list – number of channels for each feature map that is returned, in the order they are present in the OrderedDict
out_channels – number of channels in the FPN.
spatial_dims – 2D or 3D images

forward(x)[source]#

Computes the resulted feature maps of the network.

Parameters:: x (Tensor) – input images
Return type:: dict[str, Tensor]
Returns:: feature maps after FPN layers. They are ordered from highest resolution first.

Mish#

class monai.networks.blocks.Mish(inplace=False)[source]#

Applies the element-wise function:

\[\text{Mish}(x) = x * tanh(\text{softplus}(x)).\]

Citation: Mish: A Self Regularized Non-Monotonic Activation Function, Diganta Misra, 2019, https://arxiv.org/abs/1908.08681.

From Pytorch 1.9.0+, the optimized version of Mish is implemented, this class will utilize torch.nn.functional.mish to do the calculation if meets the version.

Shape:

Input: \((N, *)\) where * means, any number of additional dimensions
Output: \((N, *)\), same shape as the input

Examples:

>>> import torch
>>> from monai.networks.layers.factories import Act
>>> m = Act['mish']()
>>> input = torch.randn(2)
>>> output = m(input)

forward(input)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

GEGLU#

class monai.networks.blocks.GEGLU(*args, **kwargs)[source]#

Applies the element-wise function:

\[\text{GEGLU}(x) = x_1 * \text{Sigmoid}(x_2)\]

where \(x_1\) and \(x_2\) are split from the input tensor along the last dimension.

Citation: GLU Variants Improve Transformer, Noam Shazeer, 2020, https://arxiv.org/abs/2002.05202.

Shape:

Input: \((N, *, 2 * D)\)
Output: \((N, *, D)\), where * means, any number of additional dimensions

forward(input)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

GCN Module#

class monai.networks.blocks.GCN(inplanes, planes, ks=7)[source]#

The Global Convolutional Network module using large 1D Kx1 and 1xK kernels to represent 2D kernels.

__init__(inplanes, planes, ks=7)[source]#

Parameters:

inplanes (int) – number of input channels.
planes (int) – number of output channels.
ks (int) – kernel size for one dimension. Defaults to 7.

forward(x)[source]#

Parameters:: x (Tensor) – in shape (batch, inplanes, spatial_1, spatial_2).
Return type:: Tensor

FCN Module#

class monai.networks.blocks.FCN(out_channels=1, upsample_mode='bilinear', pretrained=True, progress=True)[source]#

2D FCN network with 3 input channels. The small decoder is built with the GCN and Refine modules. The code is adapted from lsqshr’s official 2D code.

Parameters:

out_channels (int) – number of output channels. Defaults to 1.
upsample_mode (str) –
["transpose", "bilinear"] The mode of upsampling manipulations. Using the second mode cannot guarantee the model’s reproducibility. Defaults to bilinear.
- transpose, uses transposed convolution layers.
- bilinear, uses bilinear interpolation.
pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr.

forward(x)[source]#

Parameters:: x (Tensor) – in shape (batch, 3, spatial_1, spatial_2).

Multi-Channel FCN Module#

class monai.networks.blocks.MCFCN(in_channels=3, out_channels=1, upsample_mode='bilinear', pretrained=True, progress=True)[source]#

The multi-channel version of the 2D FCN module. Adds a projection layer to take arbitrary number of inputs.

Parameters:

in_channels (int) – number of input channels. Defaults to 3.
out_channels (int) – number of output channels. Defaults to 1.
upsample_mode (str) –
["transpose", "bilinear"] The mode of upsampling manipulations. Using the second mode cannot guarantee the model’s reproducibility. Defaults to bilinear.
- transpose, uses transposed convolution layers.
- bilinear, uses bilinear interpolate.
pretrained (bool) – If True, returns a model pre-trained on ImageNet
progress (bool) – If True, displays a progress bar of the download to stderr.

forward(x)[source]#

Parameters:: x (Tensor) – in shape (batch, in_channels, spatial_1, spatial_2).

Dynamic-Unet Block#

class monai.networks.blocks.UnetResBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, norm_name, act_name=('leakyrelu', {'inplace': True, 'negative_slope': 0.01}), dropout=None)[source]#

A skip-connection based module that can be used for DynUNet, based on: Automated Design of Deep Learning Methods for Biomedical Image Segmentation. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation.

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
kernel_size – convolution kernel size.
stride – convolution stride.
norm_name – feature normalization type and arguments.
act_name – activation layer type and arguments.
dropout – dropout probability.

forward(inp)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

class monai.networks.blocks.UnetBasicBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, norm_name, act_name=('leakyrelu', {'inplace': True, 'negative_slope': 0.01}), dropout=None)[source]#

A CNN module that can be used for DynUNet, based on: Automated Design of Deep Learning Methods for Biomedical Image Segmentation. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation.

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
kernel_size – convolution kernel size.
stride – convolution stride.
norm_name – feature normalization type and arguments.
act_name – activation layer type and arguments.
dropout – dropout probability.

forward(inp)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

class monai.networks.blocks.UnetUpBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, upsample_kernel_size, norm_name, act_name=('leakyrelu', {'inplace': True, 'negative_slope': 0.01}), dropout=None, trans_bias=False)[source]#

An upsampling module that can be used for DynUNet, based on: Automated Design of Deep Learning Methods for Biomedical Image Segmentation. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation.

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
kernel_size – convolution kernel size.
stride – convolution stride.
upsample_kernel_size – convolution kernel size for transposed convolution layers.
norm_name – feature normalization type and arguments.
act_name – activation layer type and arguments.
dropout – dropout probability.
trans_bias – transposed convolution bias.

forward(inp, skip)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

class monai.networks.blocks.UnetOutBlock(spatial_dims, in_channels, out_channels, dropout=None)[source]#

forward(inp)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

DenseBlock#

class monai.networks.blocks.DenseBlock(layers)[source]#

A DenseBlock is a sequence of layers where each layer’s outputs are concatenated with their inputs. This has the effect of accumulating outputs from previous layers as inputs to later ones and as the final output of the block.

Parameters:: layers (Sequence[Module]) – sequence of nn.Module objects to define the individual layers of the dense block

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

SegResnet Block#

class monai.networks.blocks.ResBlock(spatial_dims, in_channels, norm, kernel_size=3, act=('RELU', {'inplace': True}))[source]#

ResBlock employs skip connection and two convolution blocks and is used in SegResNet based on 3D MRI brain tumor segmentation using autoencoder regularization.

__init__(spatial_dims, in_channels, norm, kernel_size=3, act=('RELU', {'inplace': True}))[source]#

Parameters:

spatial_dims – number of spatial dimensions, could be 1, 2 or 3.
in_channels – number of input channels.
norm – feature normalization type and arguments.
kernel_size – convolution kernel size, the value should be an odd number. Defaults to 3.
act – activation type and arguments. Defaults to RELU.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

SABlock Block#

class monai.networks.blocks.SABlock(hidden_size, num_heads, dropout_rate=0.0, qkv_bias=False, save_attn=False)[source]#

A self-attention block, based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”

__init__(hidden_size, num_heads, dropout_rate=0.0, qkv_bias=False, save_attn=False)[source]#

Parameters:

hidden_size (int) – dimension of hidden layer.
num_heads (int) – number of attention heads.
dropout_rate (float, optional) – fraction of the input units to drop. Defaults to 0.0.
qkv_bias (bool, optional) – bias term for the qkv linear layer. Defaults to False.
save_attn (bool, optional) – to make accessible the attention matrix. Defaults to False.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Squeeze-and-Excitation#

class monai.networks.blocks.ChannelSELayer(spatial_dims, in_channels, r=2, acti_type_1=('relu', {'inplace': True}), acti_type_2='sigmoid', add_residual=False)[source]#

Re-implementation of the Squeeze-and-Excitation block based on: “Hu et al., Squeeze-and-Excitation Networks, https://arxiv.org/abs/1709.01507”.

__init__(spatial_dims, in_channels, r=2, acti_type_1=('relu', {'inplace': True}), acti_type_2='sigmoid', add_residual=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions, could be 1, 2, or 3.
in_channels – number of input channels.
r – the reduction ratio r in the paper. Defaults to 2.
acti_type_1 – activation type of the hidden squeeze layer. Defaults to ("relu", {"inplace": True}).
acti_type_2 – activation type of the output squeeze layer. Defaults to “sigmoid”.

Raises:

ValueError – When r is nonpositive or larger than in_channels.

See also

monai.networks.layers.Act

forward(x)[source]#

Parameters:: x (Tensor) – in shape (batch, in_channels, spatial_1[, spatial_2, …]).
Return type:: Tensor

Transformer Block#

class monai.networks.blocks.TransformerBlock(hidden_size, mlp_dim, num_heads, dropout_rate=0.0, qkv_bias=False, save_attn=False)[source]#

A transformer block, based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”

__init__(hidden_size, mlp_dim, num_heads, dropout_rate=0.0, qkv_bias=False, save_attn=False)[source]#

Parameters:

hidden_size (int) – dimension of hidden layer.
mlp_dim (int) – dimension of feedforward layer.
num_heads (int) – number of attention heads.
dropout_rate (float, optional) – fraction of the input units to drop. Defaults to 0.0.
qkv_bias (bool, optional) – apply bias term for the qkv linear layer. Defaults to False.
save_attn (bool, optional) – to make accessible the attention matrix. Defaults to False.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

UNETR Block#

class monai.networks.blocks.UnetrBasicBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, norm_name, res_block=False)[source]#

A CNN module that can be used for UNETR, based on: “Hatamizadeh et al., UNETR: Transformers for 3D Medical Image Segmentation <https://arxiv.org/abs/2103.10504>”

__init__(spatial_dims, in_channels, out_channels, kernel_size, stride, norm_name, res_block=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
kernel_size – convolution kernel size.
stride – convolution stride.
norm_name – feature normalization type and arguments.
res_block – bool argument to determine if residual block is used.

forward(inp)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

class monai.networks.blocks.UnetrUpBlock(spatial_dims, in_channels, out_channels, kernel_size, upsample_kernel_size, norm_name, res_block=False)[source]#

An upsampling module that can be used for UNETR: “Hatamizadeh et al., UNETR: Transformers for 3D Medical Image Segmentation <https://arxiv.org/abs/2103.10504>”

__init__(spatial_dims, in_channels, out_channels, kernel_size, upsample_kernel_size, norm_name, res_block=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
kernel_size – convolution kernel size.
upsample_kernel_size – convolution kernel size for transposed convolution layers.
norm_name – feature normalization type and arguments.
res_block – bool argument to determine if residual block is used.

forward(inp, skip)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

class monai.networks.blocks.UnetrPrUpBlock(spatial_dims, in_channels, out_channels, num_layer, kernel_size, stride, upsample_kernel_size, norm_name, conv_block=False, res_block=False)[source]#

A projection upsampling module that can be used for UNETR: “Hatamizadeh et al., UNETR: Transformers for 3D Medical Image Segmentation <https://arxiv.org/abs/2103.10504>”

__init__(spatial_dims, in_channels, out_channels, num_layer, kernel_size, stride, upsample_kernel_size, norm_name, conv_block=False, res_block=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
num_layer – number of upsampling blocks.
kernel_size – convolution kernel size.
stride – convolution stride.
upsample_kernel_size – convolution kernel size for transposed convolution layers.
norm_name – feature normalization type and arguments.
conv_block – bool argument to determine if convolutional block is used.
res_block – bool argument to determine if residual block is used.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Residual Squeeze-and-Excitation#

class monai.networks.blocks.ResidualSELayer(spatial_dims, in_channels, r=2, acti_type_1='leakyrelu', acti_type_2='relu')[source]#

A “squeeze-and-excitation”-like layer with a residual connection:

--+-- SE --o--
  |        |
  +--------+

__init__(spatial_dims, in_channels, r=2, acti_type_1='leakyrelu', acti_type_2='relu')[source]#

Parameters:

spatial_dims – number of spatial dimensions, could be 1, 2, or 3.
in_channels – number of input channels.
r – the reduction ratio r in the paper. Defaults to 2.
acti_type_1 – defaults to “leakyrelu”.
acti_type_2 – defaults to “relu”.

See also

monai.networks.blocks.ChannelSELayer

Squeeze-and-Excitation Block#

class monai.networks.blocks.SEBlock(spatial_dims, in_channels, n_chns_1, n_chns_2, n_chns_3, conv_param_1=None, conv_param_2=None, conv_param_3=None, project=None, r=2, acti_type_1=('relu', {'inplace': True}), acti_type_2='sigmoid', acti_type_final=('relu', {'inplace': True}))[source]#

Residual module enhanced with Squeeze-and-Excitation:

----+- conv1 --  conv2 -- conv3 -- SE -o---
    |                                  |
    +---(channel project if needed)----+

Re-implementation of the SE-Resnet block based on: “Hu et al., Squeeze-and-Excitation Networks, https://arxiv.org/abs/1709.01507”.

__init__(spatial_dims, in_channels, n_chns_1, n_chns_2, n_chns_3, conv_param_1=None, conv_param_2=None, conv_param_3=None, project=None, r=2, acti_type_1=('relu', {'inplace': True}), acti_type_2='sigmoid', acti_type_final=('relu', {'inplace': True}))[source]#

Parameters:

spatial_dims – number of spatial dimensions, could be 1, 2, or 3.
in_channels – number of input channels.
n_chns_1 – number of output channels in the 1st convolution.
n_chns_2 – number of output channels in the 2nd convolution.
n_chns_3 – number of output channels in the 3rd convolution.
conv_param_1 – additional parameters to the 1st convolution. Defaults to {"kernel_size": 1, "norm": Norm.BATCH, "act": ("relu", {"inplace": True})}
conv_param_2 – additional parameters to the 2nd convolution. Defaults to {"kernel_size": 3, "norm": Norm.BATCH, "act": ("relu", {"inplace": True})}
conv_param_3 – additional parameters to the 3rd convolution. Defaults to {"kernel_size": 1, "norm": Norm.BATCH, "act": None}
project – in the case of residual chns and output chns doesn’t match, a project (Conv) layer/block is used to adjust the number of chns. In SENET, it is consisted with a Conv layer as well as a Norm layer. Defaults to None (chns are matchable) or a Conv layer with kernel size 1.
r – the reduction ratio r in the paper. Defaults to 2.
acti_type_1 – activation type of the hidden squeeze layer. Defaults to “relu”.
acti_type_2 – activation type of the output squeeze layer. Defaults to “sigmoid”.
acti_type_final – activation type of the end of the block. Defaults to “relu”.

See also

monai.networks.blocks.ChannelSELayer

forward(x)[source]#

Parameters:: x (Tensor) – in shape (batch, in_channels, spatial_1[, spatial_2, …]).
Return type:: Tensor

Squeeze-and-Excitation Bottleneck#

class monai.networks.blocks.SEBottleneck(spatial_dims, inplanes, planes, groups, reduction, stride=1, downsample=None)[source]#: Bottleneck for SENet154.

Squeeze-and-Excitation Resnet Bottleneck#

class monai.networks.blocks.SEResNetBottleneck(spatial_dims, inplanes, planes, groups, reduction, stride=1, downsample=None)[source]#: ResNet bottleneck with a Squeeze-and-Excitation module. It follows Caffe implementation and uses strides=stride in conv1 and not in conv2 (the latter is used in the torchvision implementation of ResNet).

Squeeze-and-Excitation ResNeXt Bottleneck#

class monai.networks.blocks.SEResNeXtBottleneck(spatial_dims, inplanes, planes, groups, reduction, stride=1, downsample=None, base_width=4)[source]#: ResNeXt bottleneck type C with a Squeeze-and-Excitation module.

Simple ASPP#

class monai.networks.blocks.SimpleASPP(spatial_dims, in_channels, conv_out_channels, kernel_sizes=(1, 3, 3, 3), dilations=(1, 2, 4, 6), norm_type='BATCH', acti_type='LEAKYRELU', bias=False)[source]#

A simplified version of the atrous spatial pyramid pooling (ASPP) module.

Chen et al., Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. https://arxiv.org/abs/1802.02611

Wang et al., A Noise-robust Framework for Automatic Segmentation of COVID-19 Pneumonia Lesions from CT Images. https://ieeexplore.ieee.org/document/9109297

__init__(spatial_dims, in_channels, conv_out_channels, kernel_sizes=(1, 3, 3, 3), dilations=(1, 2, 4, 6), norm_type='BATCH', acti_type='LEAKYRELU', bias=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions, could be 1, 2, or 3.
in_channels – number of input channels.
conv_out_channels – number of output channels of each atrous conv. The final number of output channels is conv_out_channels * len(kernel_sizes).
kernel_sizes – a sequence of four convolutional kernel sizes. Defaults to (1, 3, 3, 3) for four (dilated) convolutions.
dilations – a sequence of four convolutional dilation parameters. Defaults to (1, 2, 4, 6) for four (dilated) convolutions.
norm_type – final kernel-size-one convolution normalization type. Defaults to batch norm.
acti_type – final kernel-size-one convolution activation type. Defaults to leaky ReLU.
bias – whether to have a bias term in convolution blocks. Defaults to False. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.

Raises:

ValueError – When kernel_sizes length differs from dilations.

forward(x)[source]#

Parameters:: x (Tensor) – in shape (batch, channel, spatial_1[, spatial_2, …]).
Return type:: Tensor

MaxAvgPooling#

class monai.networks.blocks.MaxAvgPool(spatial_dims, kernel_size, stride=None, padding=0, ceil_mode=False)[source]#

Downsample with both maxpooling and avgpooling, double the channel size by concatenating the downsampled feature maps.

__init__(spatial_dims, kernel_size, stride=None, padding=0, ceil_mode=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions of the input image.
kernel_size – the kernel size of both pooling operations.
stride – the stride of the window. Default value is kernel_size.
padding – implicit zero padding to be added to both pooling operations.
ceil_mode – when True, will use ceil instead of floor to compute the output shape.

forward(x)[source]#

Parameters:: x (Tensor) – Tensor in shape (batch, channel, spatial_1[, spatial_2, …]).
Return type:: Tensor
Returns:: Tensor in shape (batch, 2*channel, spatial_1[, spatial_2, …]).

Upsampling#

class monai.networks.blocks.UpSample(spatial_dims, in_channels=None, out_channels=None, scale_factor=2, kernel_size=None, size=None, mode=deconv, pre_conv='default', interp_mode=linear, align_corners=True, bias=True, apply_pad_pool=True)[source]#

Upsamples data by scale_factor. Supported modes are:

“deconv”: uses a transposed convolution.

“deconvgroup”: uses a transposed group convolution.

“nontrainable”: uses torch.nn.Upsample.

“pixelshuffle”: uses monai.networks.blocks.SubpixelUpsample.

This operation will cause non-deterministic when mode is UpsampleMode.NONTRAINABLE. Please check the link below for more details: https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms This module can optionally take a pre-convolution (often used to map the number of features from in_channels to out_channels).

__init__(spatial_dims, in_channels=None, out_channels=None, scale_factor=2, kernel_size=None, size=None, mode=deconv, pre_conv='default', interp_mode=linear, align_corners=True, bias=True, apply_pad_pool=True)[source]#

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of channels of the input image.
out_channels – number of channels of the output image. Defaults to in_channels.
scale_factor – multiplier for spatial size. Has to match input size if it is a tuple. Defaults to 2.
kernel_size – kernel size used during transposed convolutions. Defaults to scale_factor.
size – spatial size of the output image. Only used when mode is UpsampleMode.NONTRAINABLE. In torch.nn.functional.interpolate, only one of size or scale_factor should be defined, thus if size is defined, scale_factor will not be used. Defaults to None.
mode – {"deconv", "deconvgroup", "nontrainable", "pixelshuffle"}. Defaults to "deconv".
pre_conv – a conv block applied before upsampling. Defaults to “default”. When conv_block is "default", one reserved conv layer will be utilized when Only used in the “nontrainable” or “pixelshuffle” mode.
interp_mode – {"nearest", "linear", "bilinear", "bicubic", "trilinear"} Only used in the “nontrainable” mode. If ends with "linear" will use spatial dims to determine the correct interpolation. This corresponds to linear, bilinear, trilinear for 1D, 2D, and 3D respectively. The interpolation mode. Defaults to "linear". See also: https://pytorch.org/docs/stable/generated/torch.nn.Upsample.html
align_corners – set the align_corners parameter of torch.nn.Upsample. Defaults to True. Only used in the “nontrainable” mode.
bias – whether to have a bias term in the default preconv and deconv layers. Defaults to True.
apply_pad_pool – if True the upsampled tensor is padded then average pooling is applied with a kernel the size of scale_factor with a stride of 1. See also: monai.networks.blocks.SubpixelUpsample. Only used in the “pixelshuffle” mode.

monai.networks.blocks.Upsample#: alias of UpSample

class monai.networks.blocks.SubpixelUpsample(spatial_dims, in_channels, out_channels=None, scale_factor=2, conv_block='default', apply_pad_pool=True, bias=True)[source]#

Upsample via using a subpixel CNN. This module supports 1D, 2D and 3D input images. The module is consisted with two parts. First of all, a convolutional layer is employed to increase the number of channels into: in_channels * (scale_factor ** dimensions). Secondly, a pixel shuffle manipulation is utilized to aggregates the feature maps from low resolution space and build the super resolution space. The first part of the module is not fixed, a sequential layers can be used to replace the default single layer.

See: Shi et al., 2016, “Real-Time Single Image and Video Super-Resolution Using a nEfficient Sub-Pixel Convolutional Neural Network.”

See: Aitken et al., 2017, “Checkerboard artifact free sub-pixel convolution”.

The idea comes from: https://arxiv.org/abs/1609.05158

The pixel shuffle mechanism refers to: https://pytorch.org/docs/stable/generated/torch.nn.PixelShuffle.html#torch.nn.PixelShuffle. and: pytorch/pytorch#6340.

__init__(spatial_dims, in_channels, out_channels=None, scale_factor=2, conv_block='default', apply_pad_pool=True, bias=True)[source]#

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of channels of the input image.
out_channels – optional number of channels of the output image.
scale_factor – multiplier for spatial size. Defaults to 2.
conv_block –
a conv block to extract feature maps before upsampling. Defaults to None.
- When conv_block is "default", one reserved conv layer will be utilized.
- When conv_block is an nn.module, please ensure the output number of channels is divisible (scale_factor ** dimensions).
apply_pad_pool – if True the upsampled tensor is padded then average pooling is applied with a kernel the size of scale_factor with a stride of 1. This implements the nearest neighbour resize convolution component of subpixel convolutions described in Aitken et al.
bias – whether to have a bias term in the default conv_block. Defaults to True.

forward(x)[source]#

Parameters:: x (Tensor) – Tensor in shape (batch, channel, spatial_1[, spatial_2, …).
Return type:: Tensor

monai.networks.blocks.Subpixelupsample#: alias of SubpixelUpsample

monai.networks.blocks.SubpixelUpSample#: alias of SubpixelUpsample

Registration Residual Conv Block#

class monai.networks.blocks.RegistrationResidualConvBlock(spatial_dims, in_channels, out_channels, num_layers=2, kernel_size=3)[source]#

A block with skip links and layer - norm - activation. Only changes the number of channels, the spatial size is kept same.

__init__(spatial_dims, in_channels, out_channels, num_layers=2, kernel_size=3)[source]#

Parameters:

spatial_dims (int) – number of spatial dimensions
in_channels (int) – number of input channels
out_channels (int) – number of output channels
num_layers (int) – number of layers inside the block
kernel_size (int) – kernel_size

forward(x)[source]#

Parameters:: x (Tensor) – Tensor in shape (batch, in_channels, insize_1, insize_2, [insize_3])
Return type:: Tensor
Returns:: Tensor in shape (batch, out_channels, insize_1, insize_2, [insize_3]), with the same spatial size as x

Registration Down Sample Block#

class monai.networks.blocks.RegistrationDownSampleBlock(spatial_dims, channels, pooling)[source]#

A down-sample module used in RegUNet to half the spatial size. The number of channels is kept same.

Adapted from:: DeepReg (DeepRegNet/DeepReg)

__init__(spatial_dims, channels, pooling)[source]#

Parameters:

spatial_dims (int) – number of spatial dimensions.
channels (int) – channels
pooling (bool) – use MaxPool if True, strided conv if False

forward(x)[source]#

Halves the spatial dimensions and keeps the same channel. output in shape (batch, channels, insize_1 / 2, insize_2 / 2, [insize_3 / 2]),

Parameters:: x (Tensor) – Tensor in shape (batch, channels, insize_1, insize_2, [insize_3])
Raises:: ValueError – when input spatial dimensions are not even.
Return type:: Tensor

Registration Extraction Block#

class monai.networks.blocks.RegistrationExtractionBlock(spatial_dims, extract_levels, num_channels, out_channels, kernel_initializer='kaiming_uniform', activation=None, mode='nearest', align_corners=None)[source]#

The Extraction Block used in RegUNet. Extracts feature from each extract_levels and takes the average.

__init__(spatial_dims, extract_levels, num_channels, out_channels, kernel_initializer='kaiming_uniform', activation=None, mode='nearest', align_corners=None)[source]#

Parameters:

spatial_dims – number of spatial dimensions
extract_levels – spatial levels to extract feature from, 0 refers to the input scale
num_channels – number of channels at each scale level, List or Tuple of length equals to depth of the RegNet
out_channels – number of output channels
kernel_initializer – kernel initializer
activation – kernel activation function
mode – feature map interpolation mode, default to “nearest”.
align_corners – whether to align corners for feature map interpolation.

forward(x, image_size)[source]#

Parameters:

x (list[Tensor]) – Decoded feature at different spatial levels, sorted from deep to shallow
image_size (list[int]) – output image size

Return type:

Tensor

Returns:

Tensor of shape (batch, out_channels, size1, size2, size3), where (size1, size2, size3) = image_size

LocalNet DownSample Block#

class monai.networks.blocks.LocalNetDownSampleBlock(spatial_dims, in_channels, out_channels, kernel_size)[source]#

A down-sample module that can be used for LocalNet, based on: Weakly-supervised convolutional neural networks for multimodal image registration. Label-driven weakly-supervised learning for multimodal deformable image registration.

Adapted from:: DeepReg (DeepRegNet/DeepReg)

__init__(spatial_dims, in_channels, out_channels, kernel_size)[source]#

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
kernel_size – convolution kernel size.

Raises:

NotImplementedError – when kernel_size is even

forward(x)[source]#

Halves the spatial dimensions. A tuple of (x, mid) is returned:

x is the downsample result, in shape (batch, out_channels, insize_1 / 2, insize_2 / 2, [insize_3 / 2]),

mid is the mid-level feature, in shape (batch, out_channels, insize_1, insize_2, [insize_3])

Parameters:: x – Tensor in shape (batch, in_channels, insize_1, insize_2, [insize_3])
Raises:: ValueError – when input spatial dimensions are not even.
Return type:: tuple[Tensor, Tensor]

LocalNet UpSample Block#

class monai.networks.blocks.LocalNetUpSampleBlock(spatial_dims, in_channels, out_channels, mode='nearest', align_corners=None)[source]#

An up-sample module that can be used for LocalNet, based on: Weakly-supervised convolutional neural networks for multimodal image registration. Label-driven weakly-supervised learning for multimodal deformable image registration.

Adapted from:: DeepReg (DeepRegNet/DeepReg)

__init__(spatial_dims, in_channels, out_channels, mode='nearest', align_corners=None)[source]#

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
mode – interpolation mode of the additive upsampling, default to ‘nearest’.
align_corners – whether to align corners for the additive upsampling, default to None.

Raises:

ValueError – when in_channels != 2 * out_channels

forward(x, mid)[source]#

Halves the channel and doubles the spatial dimensions.

Parameters:

x – feature to be up-sampled, in shape (batch, in_channels, insize_1, insize_2, [insize_3])
mid – mid-level feature saved during down-sampling, in shape (batch, out_channels, midsize_1, midsize_2, [midsize_3])

Raises:

ValueError – when midsize != insize * 2

Return type:

Tensor

LocalNet Feature Extractor Block#

class monai.networks.blocks.LocalNetFeatureExtractorBlock(spatial_dims, in_channels, out_channels, act='RELU', initializer='kaiming_uniform')[source]#

A feature-extraction module that can be used for LocalNet, based on: Weakly-supervised convolutional neural networks for multimodal image registration. Label-driven weakly-supervised learning for multimodal deformable image registration.

Adapted from:: DeepReg (DeepRegNet/DeepReg)

__init__(spatial_dims, in_channels, out_channels, act='RELU', initializer='kaiming_uniform')[source]#: Args: spatial_dims: number of spatial dimensions. in_channels: number of input channels. out_channels: number of output channels. act: activation type and arguments. Defaults to ReLU. kernel_initializer: kernel initializer. Defaults to None.

forward(x)[source]#

Parameters:: x – Tensor in shape (batch, in_channels, insize_1, insize_2, [insize_3])
Return type:: Tensor

MLP Block#

class monai.networks.blocks.MLPBlock(hidden_size, mlp_dim, dropout_rate=0.0, act='GELU', dropout_mode='vit')[source]#

A multi-layer perceptron block, based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”

__init__(hidden_size, mlp_dim, dropout_rate=0.0, act='GELU', dropout_mode='vit')[source]#

Parameters:

hidden_size – dimension of hidden layer.
mlp_dim – dimension of feedforward layer. If 0, hidden_size will be used.
dropout_rate – fraction of the input units to drop.
act – activation type and arguments. Defaults to GELU. Also supports “GEGLU” and others.
dropout_mode – dropout mode, can be “vit” or “swin”. “vit” mode uses two dropout instances as implemented in google-research/vision_transformer “swin” corresponds to one instance as implemented in microsoft/Swin-Transformer

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Patch Embedding Block#

class monai.networks.blocks.PatchEmbeddingBlock(in_channels, img_size, patch_size, hidden_size, num_heads, pos_embed='conv', proj_type='conv', pos_embed_type='learnable', dropout_rate=0.0, spatial_dims=3)[source]#

A patch embedding block, based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”

Example:

>>> from monai.networks.blocks import PatchEmbeddingBlock
>>> PatchEmbeddingBlock(in_channels=4, img_size=32, patch_size=8, hidden_size=32, num_heads=4,
>>>                     proj_type="conv", pos_embed_type="sincos")

__init__(in_channels, img_size, patch_size, hidden_size, num_heads, pos_embed='conv', proj_type='conv', pos_embed_type='learnable', dropout_rate=0.0, spatial_dims=3)[source]#

Parameters:

in_channels – dimension of input channels.
img_size – dimension of input image.
patch_size – dimension of patch size.
hidden_size – dimension of hidden layer.
num_heads – number of attention heads.
proj_type – patch embedding layer type.
pos_embed_type – position embedding layer type.
dropout_rate – fraction of the input units to drop.
spatial_dims – number of spatial dimensions.

Deprecated since version 1.4: pos_embed is deprecated in favor of proj_type.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

FactorizedIncreaseBlock#

class monai.networks.blocks.FactorizedIncreaseBlock(in_channel, out_channel, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

Up-sampling the features by two using linear interpolation and convolutions.

__init__(in_channel, out_channel, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

Parameters:

in_channel – number of input channels
out_channel – number of output channels
spatial_dims – number of spatial dimensions
act_name – activation layer type and arguments.
norm_name – feature normalization type and arguments.

FactorizedReduceBlock#

class monai.networks.blocks.FactorizedReduceBlock(in_channel, out_channel, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

Down-sampling the feature by 2 using stride. The length along each spatial dimension must be a multiple of 2.

__init__(in_channel, out_channel, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

Parameters:

in_channel – number of input channels
out_channel – number of output channels.
spatial_dims – number of spatial dimensions.
act_name – activation layer type and arguments.
norm_name – feature normalization type and arguments.

forward(x)[source]#

The length along each spatial dimension must be a multiple of 2.

Return type:: Tensor

P3DActiConvNormBlock#

class monai.networks.blocks.P3DActiConvNormBlock(in_channel, out_channel, kernel_size, padding, mode=0, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

– (act) – (conv) – (norm) –

__init__(in_channel, out_channel, kernel_size, padding, mode=0, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

Parameters:

in_channel – number of input channels.
out_channel – number of output channels.
kernel_size – kernel size to be expanded to 3D.
padding – padding size to be expanded to 3D.
mode –
mode for the anisotropic kernels:
- 0: (k, k, 1), (1, 1, k),
- 1: (k, 1, k), (1, k, 1),
- 2: (1, k, k). (k, 1, 1).
act_name – activation layer type and arguments.
norm_name – feature normalization type and arguments.

ActiConvNormBlock#

class monai.networks.blocks.ActiConvNormBlock(in_channel, out_channel, kernel_size=3, padding=1, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

– (Acti) – (Conv) – (Norm) –

__init__(in_channel, out_channel, kernel_size=3, padding=1, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}))[source]#

Parameters:

in_channel – number of input channels.
out_channel – number of output channels.
kernel_size – kernel size of the convolution.
padding – padding size of the convolution.
spatial_dims – number of spatial dimensions.
act_name – activation layer type and arguments.
norm_name – feature normalization type and arguments.

Warp#

class monai.networks.blocks.Warp(mode='bilinear', padding_mode='border', jitter=False)[source]#

Warp an image with given dense displacement field (DDF).

__init__(mode='bilinear', padding_mode='border', jitter=False)[source]#

For pytorch native APIs, the possible values are:

mode: "nearest", "bilinear", "bicubic".

padding_mode: "zeros", "border", "reflection"

For MONAI C++/CUDA extensions, the possible values are:

mode: "nearest", "bilinear", "bicubic", 0, 1, …

padding_mode: "zeros", "border", "reflection", 0, 1, …

jitter: bool, default=False
Define reference grid on non-integer values Reference: B. Likar and F. Pernus. A heirarchical approach to elastic registration based on mutual information. Image and Vision Computing, 19:33-44, 2001.

forward(image, ddf)[source]#

Parameters:

image (Tensor) – Tensor in shape (batch, num_channels, H, W[, D])
ddf (Tensor) – Tensor in the same spatial size as image, in shape (batch, spatial_dims, H, W[, D])

Returns:

warped_image in the same shape as image (batch, num_channels, H, W[, D])

DVF2DDF#

class monai.networks.blocks.DVF2DDF(num_steps=7, mode='bilinear', padding_mode='zeros')[source]#

Layer calculates a dense displacement field (DDF) from a dense velocity field (DVF) with scaling and squaring.

Adapted from:: DeepReg (DeepRegNet/DeepReg)

forward(dvf)[source]#

Parameters:: dvf (Tensor) – dvf to be transformed, in shape (batch, spatial_dims, H, W[,D])
Return type:: Tensor
Returns:: a dense displacement field

VarNetBlock#

class monai.apps.reconstruction.networks.blocks.varnetblock.VarNetBlock(refinement_model, spatial_dims=2)[source]#

A variational block based on Sriram et. al., “End-to-end variational networks for accelerated MRI reconstruction”. It applies data consistency and refinement to the intermediate kspace and combines those results.

Modified and adopted from: facebookresearch/fastMRI

Parameters:

refinement_model (Module) – the model used for refinement (typically a U-Net but can be any deep learning model that performs well when the input and output are in image domain (e.g., a convolutional network).
spatial_dims (int) – is 2 for 2D data and is 3 for 3D data

forward(current_kspace, ref_kspace, mask, sens_maps)[source]#

Parameters:

current_kspace (Tensor) – Predicted kspace from the previous block. It’s a 2D kspace (B,C,H,W,2) with the last dimension being 2 (for real/imaginary parts) and C denoting the coil dimension. 3D data will have the shape (B,C,H,W,D,2).
ref_kspace (Tensor) – reference kspace for applying data consistency (is the under-sampled kspace in MRI reconstruction). Its shape is the same as current_kspace.
mask (Tensor) – the under-sampling mask with shape (1,1,1,W,1) for 2D data or (1,1,1,1,D,1) for 3D data.
sens_maps (Tensor) – coil sensitivity maps with the same shape as current_kspace

Return type:

Tensor

Returns:

Output of VarNetBlock with the same shape as current_kspace

soft_dc(x, ref_kspace, mask)[source]#

Applies data consistency to input x. Suppose x is an intermediate estimate of the kspace and ref_kspace is the reference under-sampled measurement. This function returns mask * (x - ref_kspace). View this as the residual between the original under-sampled kspace and the estimate given by the network.

Parameters:

x (Tensor) – 2D kspace (B,C,H,W,2) with the last dimension being 2 (for real/imaginary parts) and C denoting the coil dimension. 3D data will have the shape (B,C,H,W,D,2).
ref_kspace (Tensor) – original under-sampled kspace with the same shape as x.
mask (Tensor) – the under-sampling mask with shape (1,1,1,W,1) for 2D data or (1,1,1,1,D,1) for 3D data.

Return type:

Tensor

Returns:

Output of DC block with the same shape as x

N-Dim Fourier Transform#

monai.networks.blocks.fft_utils_t.fftn_centered_t(im, spatial_dims, is_complex=True)[source]#

Pytorch-based fft for spatial_dims-dim signals. “centered” means this function automatically takes care of the required ifft and fft shifts. This is equivalent to do ifft in numpy based on numpy.fft.fftn, numpy.fft.fftshift, and numpy.fft.ifftshift

Parameters:

im (Tensor) – image that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
spatial_dims (int) – number of spatial dimensions (e.g., is 2 for an image, and is 3 for a volume)
is_complex (bool) – if True, then the last dimension of the input im is expected to be 2 (representing real and imaginary channels)

Return type:

Tensor

Returns:

“out” which is the output kspace (fourier of im)

Example

import torch
im = torch.ones(1,3,3,2) # the last dim belongs to real/imaginary parts
# output1 and output2 will be identical
output1 = torch.fft.fftn(torch.view_as_complex(torch.fft.ifftshift(im,dim=(-3,-2))), dim=(-2,-1), norm="ortho")
output1 = torch.fft.fftshift( torch.view_as_real(output1), dim=(-3,-2) )

output2 = fftn_centered(im, spatial_dims=2, is_complex=True)

monai.networks.blocks.fft_utils_t.ifftn_centered_t(ksp, spatial_dims, is_complex=True)[source]#

Pytorch-based ifft for spatial_dims-dim signals. “centered” means this function automatically takes care of the required ifft and fft shifts. This is equivalent to do fft in numpy based on numpy.fft.ifftn, numpy.fft.fftshift, and numpy.fft.ifftshift

Parameters:

ksp (Tensor) – k-space data that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
spatial_dims (int) – number of spatial dimensions (e.g., is 2 for an image, and is 3 for a volume)
is_complex (bool) – if True, then the last dimension of the input ksp is expected to be 2 (representing real and imaginary channels)

Return type:

Tensor

Returns:

“out” which is the output image (inverse fourier of ksp)

Example

import torch
ksp = torch.ones(1,3,3,2) # the last dim belongs to real/imaginary parts
# output1 and output2 will be identical
output1 = torch.fft.ifftn(torch.view_as_complex(torch.fft.ifftshift(ksp,dim=(-3,-2))), dim=(-2,-1), norm="ortho")
output1 = torch.fft.fftshift( torch.view_as_real(output1), dim=(-3,-2) )

output2 = ifftn_centered(ksp, spatial_dims=2, is_complex=True)

monai.networks.blocks.fft_utils_t.roll(x, shift, shift_dims)[source]#

Similar to np.roll but applies to PyTorch Tensors

Parameters:

x (Tensor) – input data (k-space or image) that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
shift (list[int]) – the amount of shift along each of shift_dims dimensions
shift_dims (list[int]) – dimensions over which the shift is applied

Return type:

Tensor

Returns:

shifted version of x

Note

This function is called when fftshift and ifftshift are not available in the running pytorch version

monai.networks.blocks.fft_utils_t.roll_1d(x, shift, shift_dim)[source]#

Similar to roll but for only one dim.

Parameters:

x (Tensor) – input data (k-space or image) that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
shift (int) – the amount of shift along each of shift_dims dimension
shift_dim (int) – the dimension over which the shift is applied

Return type:

Tensor

Returns:

1d-shifted version of x

Note

This function is called when fftshift and ifftshift are not available in the running pytorch version

monai.networks.blocks.fft_utils_t.fftshift(x, shift_dims)[source]#

Similar to np.fft.fftshift but applies to PyTorch Tensors

Parameters:

x (Tensor) – input data (k-space or image) that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
shift_dims (list[int]) – dimensions over which the shift is applied

Return type:

Tensor

Returns:

fft-shifted version of x

Note

This function is called when fftshift is not available in the running pytorch version

monai.networks.blocks.fft_utils_t.ifftshift(x, shift_dims)[source]#

Similar to np.fft.ifftshift but applies to PyTorch Tensors

Parameters:

x (Tensor) – input data (k-space or image) that can be 1) real-valued: the shape is (C,H,W) for 2D spatial inputs and (C,H,W,D) for 3D, or 2) complex-valued: the shape is (C,H,W,2) for 2D spatial data and (C,H,W,D,2) for 3D. C is the number of channels.
shift_dims (list[int]) – dimensions over which the shift is applied

Return type:

Tensor

Returns:

ifft-shifted version of x

Note

This function is called when ifftshift is not available in the running pytorch version

Layers#

Factories#

Defines factories for creating layers in generic, extensible, and dimensionally independent ways. A separate factory object is created for each type of layer, and factory functions keyed to names are added to these objects. Whenever a layer is requested the factory name and any necessary arguments are passed to the factory object. The return value is typically a type but can be any callable producing a layer object.

The factory objects contain functions keyed to names converted to upper case, these names can be referred to as members of the factory so that they can function as constant identifiers. eg. instance normalization is named Norm.INSTANCE.

For example, to get a transpose convolution layer the name is needed and then a dimension argument is provided which is passed to the factory function:

dimension = 3
name = Conv.CONVTRANS
conv = Conv[name, dimension]

This allows the dimension value to be set in the constructor, for example so that the dimensionality of a network is parameterizable. Not all factories require arguments after the name, the caller must be aware which are required.

Defining new factories involves creating the object then associating it with factory functions:

fact = LayerFactory()

@fact.factory_function('test')
def make_something(x, y):
    # do something with x and y to choose which layer type to return
    return SomeLayerType
...

# request object from factory TEST with 1 and 2 as values for x and y
layer = fact[fact.TEST, 1, 2]

Typically the caller of a factory would know what arguments to pass (ie. the dimensionality of the requested type) but can be parameterized with the factory name and the arguments to pass to the created type at instantiation time:

def use_factory(fact_args):
    fact_name, type_args = split_args
    layer_type = fact[fact_name, 1, 2]
    return layer_type(**type_args)
...

kw_args = {'arg0':0, 'arg1':True}
layer = use_factory( (fact.TEST, kwargs) )

class monai.networks.layers.LayerFactory(name, description)[source]#

Factory object for creating layers, this uses given factory functions to actually produce the types or constructing callables. These functions are referred to by name and can be added at any time.

add_factory_callable(name, func, desc=None)[source]#: Add the factory function to this object under the given name, with optional description.

add_factory_class(name, cls, desc=None)[source]#: Adds a factory function which returns the supplied class under the given name, with optional description.

factory_function(name)[source]#

Decorator for adding a factory function with the given name.

Return type:: Callable

get_constructor(factory_name, *args)[source]#

Get the constructor for the given factory name and arguments.

Raises:: TypeError – When factory_name is not a str.
Return type:: Any

split_args#

monai.networks.layers.split_args(args)[source]#

Split arguments in a way to be suitable for using with the factory types. If args is a string it’s interpreted as the type name.

Parameters:: args (str or a tuple of object name and kwarg dict) – input arguments to be parsed.
Raises:: TypeError – When args type is not in Union[str, Tuple[Union[str, Callable], dict]].

Examples:

>>> act_type, args = split_args("PRELU")
>>> monai.networks.layers.Act[act_type]
<class 'torch.nn.modules.activation.PReLU'>

>>> act_type, args = split_args(("PRELU", {"num_parameters": 1, "init": 0.25}))
>>> monai.networks.layers.Act[act_type](**args)
PReLU(num_parameters=1)

Dropout#

Layer Factory ‘Dropout layers’: Factory for creating dropout layers. Please see monai.networks.layers.split_args for additional args parsing.

The supported members are: dropout, alphadropout

Act#

Layer Factory ‘Activation layers’: Factory for creating activation layers. Please see monai.networks.layers.split_args for additional args parsing.

The supported members are: elu, relu, leakyrelu, prelu, relu6, selu, celu, gelu, sigmoid, tanh, softmax, logsoftmax, swish, memswish, mish, geglu

Norm#

Layer Factory ‘Normalization layers’: Factory for creating normalization layers. Please see monai.networks.layers.split_args for additional args parsing.

The supported members are: instance, batch, instance_nvfuser, group, layer, localresponse, syncbatch

Conv#

Layer Factory ‘Convolution layers’: Factory for creating convolution layers. Please see monai.networks.layers.split_args for additional args parsing.

The supported members are: conv, convtrans

Pad#

Layer Factory ‘Padding layers’: Factory for creating padding layers. Please see monai.networks.layers.split_args for additional args parsing.

The supported members are: replicationpad, constantpad

Pool#

Layer Factory ‘Pooling layers’: Factory for creating pooling layers. Please see monai.networks.layers.split_args for additional args parsing.

The supported members are: max, adaptivemax, avg, adaptiveavg

ChannelPad#

class monai.networks.layers.ChannelPad(spatial_dims, in_channels, out_channels, mode=pad)[source]#

Expand the input tensor’s channel dimension from length in_channels to out_channels, by padding or a projection.

__init__(spatial_dims, in_channels, out_channels, mode=pad)[source]#

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of input channels.
out_channels – number of output channels.
mode –
{"pad", "project"} Specifies handling residual branch and conv branch channel mismatches. Defaults to "pad".
- "pad": with zero padding.
- "project": with a trainable conv with kernel size one.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

SkipConnection#

class monai.networks.layers.SkipConnection(submodule, dim=1, mode='cat')[source]#

Combine the forward pass input with the result from the given submodule:

--+--submodule--o--
  |_____________|

The available modes are "cat", "add", "mul".

__init__(submodule, dim=1, mode='cat')[source]#

Parameters:

submodule – the module defines the trainable branch.
dim – the dimension over which the tensors are concatenated. Used when mode is "cat".
mode – "cat", "add", "mul". defaults to "cat".

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

Flatten#

class monai.networks.layers.Flatten(*args, **kwargs)[source]#

Flattens the given input in the forward pass to be [B,-1] in shape.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

Reshape#

class monai.networks.layers.Reshape(*shape)[source]#

Reshapes input tensors to the given shape (minus batch dimension), retaining original batch size.

__init__(*shape)[source]#

Given a shape list/tuple shape of integers (s0, s1, … , sn), this layer will reshape input tensors of shape (batch, s0 * s1 * … * sn) to shape (batch, s0, s1, … , sn).

Parameters:: shape (int) – list/tuple of integer shape dimensions

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

separable_filtering#

monai.networks.layers.separable_filtering(x, kernels, mode='zeros')[source]#

Apply 1-D convolutions along each spatial dimension of x.

Parameters:

x (Tensor) – the input image. must have shape (batch, channels, H[, W, …]).
kernels (list[Tensor]) – kernel along each spatial dimension. could be a single kernel (duplicated for all spatial dimensions), or a list of spatial_dims number of kernels.
mode (string, optional) – padding mode passed to convolution class. 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'. See torch.nn.Conv1d() for more information.

Raises:

TypeError – When x is not a torch.Tensor.

Examples:

>>> import torch
>>> from monai.networks.layers import separable_filtering
>>> img = torch.randn(2, 4, 32, 32)  # batch_size 2, channels 4, 32x32 2D images
# applying a [-1, 0, 1] filter along each of the spatial dimensions.
# the output shape is the same as the input shape.
>>> out = separable_filtering(img, torch.tensor((-1., 0., 1.)))
# applying `[-1, 0, 1]`, `[1, 0, -1]` filters along two spatial dimensions respectively.
# the output shape is the same as the input shape.
>>> out = separable_filtering(img, [torch.tensor((-1., 0., 1.)), torch.tensor((1., 0., -1.))])

Return type:: Tensor

apply_filter#

monai.networks.layers.apply_filter(x, kernel, **kwargs)[source]#

Filtering x with kernel independently for each batch and channel respectively.

Parameters:

x (Tensor) – the input image, must have shape (batch, channels, H[, W, D]).
kernel (Tensor) – kernel must at least have the spatial shape (H_k[, W_k, D_k]). kernel shape must be broadcastable to the batch and channels dimensions of x.
kwargs – keyword arguments passed to conv*d() functions.

Return type:

Tensor

Returns:

The filtered x.

Examples:

>>> import torch
>>> from monai.networks.layers import apply_filter
>>> img = torch.rand(2, 5, 10, 10)  # batch_size 2, channels 5, 10x10 2D images
>>> out = apply_filter(img, torch.rand(3, 3))   # spatial kernel
>>> out = apply_filter(img, torch.rand(5, 3, 3))  # channel-wise kernels
>>> out = apply_filter(img, torch.rand(2, 5, 3, 3))  # batch-, channel-wise kernels

GaussianFilter#

class monai.networks.layers.GaussianFilter(spatial_dims, sigma, truncated=4.0, approx='erf', requires_grad=False)[source]#

__init__(spatial_dims, sigma, truncated=4.0, approx='erf', requires_grad=False)[source]#

Parameters:

spatial_dims – number of spatial dimensions of the input image. must have shape (Batch, channels, H[, W, …]).
sigma – std. could be a single value, or spatial_dims number of values.
truncated – spreads how many stds.
approx –
discrete Gaussian kernel type, available options are “erf”, “sampled”, and “scalespace”.
- erf approximation interpolates the error function;
- sampled uses a sampled Gaussian kernel;
- scalespace corresponds to https://en.wikipedia.org/wiki/Scale_space_implementation#The_discrete_Gaussian_kernel based on the modified Bessel functions.
requires_grad – whether to store the gradients for sigma. if True, sigma will be the initial value of the parameters of this module (for example parameters() iterator could be used to get the parameters); otherwise this module will fix the kernels using sigma as the std.

forward(x)[source]#

Parameters:: x (Tensor) – in shape [Batch, chns, H, W, D].
Return type:: Tensor

MedianFilter#

class monai.networks.layers.MedianFilter(radius, spatial_dims=3, device='cpu')[source]#

Apply median filter to an image.

Parameters:: radius – the blurring kernel radius (radius of 1 corresponds to 3x3x3 kernel when spatial_dims=3).
Returns:: filtered input tensor.

Example:

>>> from monai.networks.layers import MedianFilter
>>> import torch
>>> in_tensor = torch.rand(4, 5, 7, 6)
>>> blur = MedianFilter([1, 1, 1])  # 3x3x3 kernel
>>> output = blur(in_tensor)
>>> output.shape
torch.Size([4, 5, 7, 6])

forward(in_tensor, number_of_passes=1)[source]#

Parameters:

in_tensor (Tensor) – input tensor, median filtering will be applied to the last spatial_dims dimensions.
number_of_passes – median filtering will be repeated this many times

Return type:

Tensor

median_filter#

class monai.networks.layers.median_filter(in_tensor, kernel_size=(3, 3, 3), spatial_dims=3, kernel=None, **kwargs)[source]#

Apply median filter to an image.

Parameters:

in_tensor – input tensor; median filtering will be applied to the last spatial_dims dimensions.
kernel_size – the convolution kernel size.
spatial_dims – number of spatial dimensions to apply median filtering.
kernel – an optional customized kernel.
kwargs – additional parameters to the conv.

Returns:

the filtered input tensor, shape remains the same as in_tensor

Example:

>>> from monai.networks.layers import median_filter
>>> import torch
>>> x = torch.rand(4, 5, 7, 6)
>>> output = median_filter(x, (3, 3, 3))
>>> output.shape
torch.Size([4, 5, 7, 6])

BilateralFilter#

class monai.networks.layers.BilateralFilter(*args, **kwargs)[source]#

Blurs the input tensor spatially whilst preserving edges. Can run on 1D, 2D, or 3D, tensors (on top of Batch and Channel dimensions). Two implementations are provided, an exact solution and a much faster approximation which uses a permutohedral lattice.

See:: https://en.wikipedia.org/wiki/Bilateral_filter https://graphics.stanford.edu/papers/permutohedral/

Parameters:

input – input tensor.
spatial_sigma – the standard deviation of the spatial blur. Higher values can hurt performance when not using the approximate method (see fast approx).
color_sigma – the standard deviation of the color blur. Lower values preserve edges better whilst higher values tend to a simple gaussian spatial blur.
approx (fast) – This flag chooses between two implementations. The approximate method may produce artifacts in some scenarios whereas the exact solution may be intolerably slow for high spatial standard deviations.

Returns:

output tensor.

Return type:

output (torch.Tensor)

static backward(ctx, grad_output)[source]#: autograd backward

static forward(ctx, input, spatial_sigma=5, color_sigma=0.5, fast_approx=True)[source]#: autograd forward

TrainableBilateralFilter#

class monai.networks.layers.TrainableBilateralFilter(spatial_sigma, color_sigma)[source]#

Implementation of a trainable bilateral filter layer as proposed in the corresponding publication. All filter parameters can be trained data-driven. The spatial filter kernels x, y, and z determine image smoothing whereas the color parameter specifies the amount of edge preservation. Can run on 1D, 2D, or 3D tensors (on top of Batch and Channel dimensions).

See:: F. Wagner, et al., Ultralow-parameter denoising: Trainable bilateral filter layers in computed tomography, Medical Physics (2022), https://doi.org/10.1002/mp.15718

Parameters:

input – input tensor to be filtered.
spatial_sigma – tuple (sigma_x, sigma_y, sigma_z) initializing the trainable standard deviations of the spatial filter kernels. Tuple length must equal the number of spatial input dimensions.
color_sigma – trainable standard deviation of the intensity range kernel. This filter parameter determines the degree of edge preservation.

Returns:

filtered tensor.

Return type:

output (torch.Tensor)

forward(input_tensor)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

TrainableJointBilateralFilter#

class monai.networks.layers.TrainableJointBilateralFilter(spatial_sigma, color_sigma)[source]#

Implementation of a trainable joint bilateral filter layer as proposed in the corresponding publication. The guidance image is used as additional (edge) information during filtering. All filter parameters and the guidance image can be trained data-driven. The spatial filter kernels x, y, and z determine image smoothing whereas the color parameter specifies the amount of edge preservation. Can run on 1D, 2D, or 3D tensors (on top of Batch and Channel dimensions). Input tensor shape must match guidance tensor shape.

See:: F. Wagner, et al., Trainable joint bilateral filters for enhanced prediction stability in low-dose CT, Scientific Reports (2022), https://doi.org/10.1038/s41598-022-22530-4

Parameters:

input – input tensor to be filtered.
guide – guidance image tensor to be used during filtering.
spatial_sigma – tuple (sigma_x, sigma_y, sigma_z) initializing the trainable standard deviations of the spatial filter kernels. Tuple length must equal the number of spatial input dimensions.
color_sigma – trainable standard deviation of the intensity range kernel. This filter parameter determines the degree of edge preservation.

Returns:

filtered tensor.

Return type:

output (torch.Tensor)

forward(input_tensor, guidance_tensor)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

PHLFilter#

class monai.networks.layers.PHLFilter(*args, **kwargs)[source]#

Filters input based on arbitrary feature vectors. Uses a permutohedral lattice data structure to efficiently approximate n-dimensional gaussian filtering. Complexity is broadly independent of kernel size. Most applicable to higher filter dimensions and larger kernel sizes.

See:: https://graphics.stanford.edu/papers/permutohedral/

Parameters:

input – input tensor to be filtered.
features – feature tensor used to filter the input.
sigmas – the standard deviations of each feature in the filter.

Returns:

output tensor.

Return type:

output (torch.Tensor)

GaussianMixtureModel#

class monai.networks.layers.GaussianMixtureModel(channel_count, mixture_count, mixture_size, verbose_build=False)[source]#

Takes an initial labeling and uses a mixture of Gaussians to approximate each classes distribution in the feature space. Each unlabeled element is then assigned a probability of belonging to each class based on it’s fit to each classes approximated distribution.

See:: https://en.wikipedia.org/wiki/Mixture_model

SavitzkyGolayFilter#

class monai.networks.layers.SavitzkyGolayFilter(window_length, order, axis=2, mode='zeros')[source]#

Convolve a Tensor along a particular axis with a Savitzky-Golay kernel.

Parameters:

window_length (int) – Length of the filter window, must be a positive odd integer.
order (int) – Order of the polynomial to fit to each window, must be less than window_length.
axis (optional) – Axis along which to apply the filter kernel. Default 2 (first spatial dimension).
mode (string, optional) – padding mode passed to convolution class. 'zeros', 'reflect', 'replicate' or
Default ('circular'.) – 'zeros'. See torch.nn.Conv1d() for more information.

forward(x)[source]#

Parameters:: x (Tensor) – Tensor or array-like to filter. Must be real, in shape [Batch, chns, spatial1, spatial2, ...] and have a device type of 'cpu'.
Returns:: x filtered by Savitzky-Golay kernel with window length self.window_length using polynomials of order self.order, along axis specified in self.axis.
Return type:: torch.Tensor

HilbertTransform#

class monai.networks.layers.HilbertTransform(axis=2, n=None)[source]#

Determine the analytical signal of a Tensor along a particular axis.

Parameters:

axis – Axis along which to apply Hilbert transform. Default 2 (first spatial dimension).
n – Number of Fourier components (i.e. FFT size). Default: x.shape[axis].

forward(x)[source]#

Parameters:: x (Tensor) – Tensor or array-like to transform. Must be real and in shape [Batch, chns, spatial1, spatial2, ...].
Returns:: Analytical signal of x, transformed along axis specified in self.axis using FFT of size self.N. The absolute value of x_ht relates to the envelope of x along axis self.axis.
Return type:: torch.Tensor

Affine Transform#

class monai.networks.layers.AffineTransform(spatial_size=None, normalized=False, mode=bilinear, padding_mode=zeros, align_corners=True, reverse_indexing=True, zero_centered=None)[source]#

__init__(spatial_size=None, normalized=False, mode=bilinear, padding_mode=zeros, align_corners=True, reverse_indexing=True, zero_centered=None)[source]#

Apply affine transformations with a batch of affine matrices.

When normalized=False and reverse_indexing=True, it does the commonly used resampling in the ‘pull’ direction following the scipy.ndimage.affine_transform convention. In this case theta is equivalent to (ndim+1, ndim+1) input matrix of scipy.ndimage.affine_transform, operates on homogeneous coordinates. See also: https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.affine_transform.html

When normalized=True and reverse_indexing=False, it applies theta to the normalized coordinates (coords. in the range of [-1, 1]) directly. This is often used with align_corners=False to achieve resolution-agnostic resampling, thus useful as a part of trainable modules such as the spatial transformer networks. See also: https://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html

Parameters:

spatial_size – output spatial shape, the full output shape will be [N, C, *spatial_size] where N and C are inferred from the src input of self.forward.
normalized – indicating whether the provided affine matrix theta is defined for the normalized coordinates. If normalized=False, theta will be converted to operate on normalized coordinates as pytorch affine_grid works with the normalized coordinates.
mode – {"bilinear", "nearest"} Interpolation mode to calculate output values. Defaults to "bilinear". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html
padding_mode – {"zeros", "border", "reflection"} Padding mode for outside grid values. Defaults to "zeros". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html
align_corners – see also https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html.
reverse_indexing – whether to reverse the spatial indexing of image and coordinates. set to False if theta follows pytorch’s default “D, H, W” convention. set to True if theta follows scipy.ndimage default “i, j, k” convention.
zero_centered – whether the affine is applied to coordinates in a zero-centered value range. With zero_centered=True, for example, the center of rotation will be the spatial center of the input; with zero_centered=False, the center of rotation will be the origin of the input. This option is only available when normalized=False, where the default behaviour is False if unspecified. See also: monai.networks.utils.normalize_transform().

forward(src, theta, spatial_size=None)[source]#

theta must be an affine transformation matrix with shape 3x3 or Nx3x3 or Nx2x3 or 2x3 for spatial 2D transforms, 4x4 or Nx4x4 or Nx3x4 or 3x4 for spatial 3D transforms, where N is the batch size. theta will be converted into float Tensor for the computation.

Parameters:

src (array_like) – image in spatial 2D or 3D (N, C, spatial_dims), where N is the batch dim, C is the number of channels.
theta (array_like) – Nx3x3, Nx2x3, 3x3, 2x3 for spatial 2D inputs, Nx4x4, Nx3x4, 3x4, 4x4 for spatial 3D inputs. When the batch dimension is omitted, theta will be repeated N times, N is the batch dim of src.
spatial_size – output spatial shape, the full output shape will be [N, C, *spatial_size] where N and C are inferred from the src.

Raises:

TypeError – When theta is not a torch.Tensor.
ValueError – When theta is not one of [Nxdxd, dxd].
ValueError – When theta is not one of [Nx3x3, Nx4x4].
TypeError – When src is not a torch.Tensor.
ValueError – When src spatially is not one of [2D, 3D].
ValueError – When affine and image batch dimension differ.

grid_pull#

monai.networks.layers.grid_pull(input, grid, interpolation='linear', bound='zero', extrapolate=True)[source]#

Sample an image with respect to a deformation field.

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- 5 or 'fifth'      or InterpolationType.fifth
- 6 or 'sixth'      or InterpolationType.sixth
- 7 or 'seventh'    or InterpolationType.seventh

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate' or 'nearest'      or BoundType.replicate or 'border'
- 1 or 'dct1'      or 'mirror'       or BoundType.dct1
- 2 or 'dct2'      or 'reflect'      or BoundType.dct2
- 3 or 'dst1'      or 'antimirror'   or BoundType.dst1
- 4 or 'dst2'      or 'antireflect'  or BoundType.dst2
- 5 or 'dft'       or 'wrap'         or BoundType.dft
- 7 or 'zero'      or 'zeros'        or BoundType.zero

A list of values can be provided, in the order [W, H, D], to specify dimension-specific boundary conditions. sliding is a specific condition than only applies to flow fields (with as many channels as dimensions). It cannot be dimension-specific. Note that:

dft corresponds to circular padding

dct2 corresponds to Neumann boundary conditions (symmetric)

dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

See also

https://en.wikipedia.org/wiki/Discrete_cosine_transform
https://en.wikipedia.org/wiki/Discrete_sine_transform
help(monai._C.BoundType)
help(monai._C.InterpolationType)

Parameters:

input (Tensor) – Input image. (B, C, Wi, Hi, Di).
grid (Tensor) – Deformation field. (B, Wo, Ho, Do, 1|2|3).
interpolation (int or list[int] , optional) – Interpolation order. Defaults to ‘linear’.
bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.
extrapolate (bool) – Extrapolate out-of-bound data. Defaults to True.

Returns:

Deformed image (B, C, Wo, Ho, Do).

Return type:

output (torch.Tensor)

grid_push#

monai.networks.layers.grid_push(input, grid, shape=None, interpolation='linear', bound='zero', extrapolate=True)[source]#

Splat an image with respect to a deformation field (pull adjoint).

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- 5 or 'fifth'      or InterpolationType.fifth
- 6 or 'sixth'      or InterpolationType.sixth
- 7 or 'seventh'    or InterpolationType.seventh

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate' or 'nearest'      or BoundType.replicate
- 1 or 'dct1'      or 'mirror'       or BoundType.dct1
- 2 or 'dct2'      or 'reflect'      or BoundType.dct2
- 3 or 'dst1'      or 'antimirror'   or BoundType.dst1
- 4 or 'dst2'      or 'antireflect'  or BoundType.dst2
- 5 or 'dft'       or 'wrap'         or BoundType.dft
- 7 or 'zero'                        or BoundType.zero

dft corresponds to circular padding

dct2 corresponds to Neumann boundary conditions (symmetric)

dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

See also

https://en.wikipedia.org/wiki/Discrete_cosine_transform
https://en.wikipedia.org/wiki/Discrete_sine_transform
help(monai._C.BoundType)
help(monai._C.InterpolationType)

Parameters:

input (Tensor) – Input image (B, C, Wi, Hi, Di).
grid (Tensor) – Deformation field (B, Wi, Hi, Di, 1|2|3).
shape – Shape of the source image.
interpolation (int or list[int] , optional) – Interpolation order. Defaults to ‘linear’.
bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.
extrapolate (bool) – Extrapolate out-of-bound data. Defaults to True.

Returns:

Splatted image (B, C, Wo, Ho, Do).

Return type:

output (torch.Tensor)

grid_count#

monai.networks.layers.grid_count(grid, shape=None, interpolation='linear', bound='zero', extrapolate=True)[source]#

Splatting weights with respect to a deformation field (pull adjoint).

This function is equivalent to applying grid_push to an image of ones.

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- 5 or 'fifth'      or InterpolationType.fifth
- 6 or 'sixth'      or InterpolationType.sixth
- 7 or 'seventh'    or InterpolationType.seventh

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate' or 'nearest'      or BoundType.replicate
- 1 or 'dct1'      or 'mirror'       or BoundType.dct1
- 2 or 'dct2'      or 'reflect'      or BoundType.dct2
- 3 or 'dst1'      or 'antimirror'   or BoundType.dst1
- 4 or 'dst2'      or 'antireflect'  or BoundType.dst2
- 5 or 'dft'       or 'wrap'         or BoundType.dft
- 7 or 'zero'                        or BoundType.zero

dft corresponds to circular padding

dct2 corresponds to Neumann boundary conditions (symmetric)

dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

See also

https://en.wikipedia.org/wiki/Discrete_cosine_transform
https://en.wikipedia.org/wiki/Discrete_sine_transform
help(monai._C.BoundType)
help(monai._C.InterpolationType)

Parameters:

grid (Tensor) – Deformation field (B, Wi, Hi, Di, 2|3).
shape – shape of the source image.
interpolation (int or list[int] , optional) – Interpolation order. Defaults to ‘linear’.
bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.
extrapolate (bool, optional) – Extrapolate out-of-bound data. Defaults to True.

Returns:

Splat weights (B, 1, Wo, Ho, Do).

Return type:

output (torch.Tensor)

grid_grad#

monai.networks.layers.grid_grad(input, grid, interpolation='linear', bound='zero', extrapolate=True)[source]#

Sample an image with respect to a deformation field.

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- 5 or 'fifth'      or InterpolationType.fifth
- 6 or 'sixth'      or InterpolationType.sixth
- 7 or 'seventh'    or InterpolationType.seventh

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate' or 'nearest'      or BoundType.replicate
- 1 or 'dct1'      or 'mirror'       or BoundType.dct1
- 2 or 'dct2'      or 'reflect'      or BoundType.dct2
- 3 or 'dst1'      or 'antimirror'   or BoundType.dst1
- 4 or 'dst2'      or 'antireflect'  or BoundType.dst2
- 5 or 'dft'       or 'wrap'         or BoundType.dft
- 7 or 'zero'                        or BoundType.zero

dft corresponds to circular padding

dct2 corresponds to Neumann boundary conditions (symmetric)

dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

See also

https://en.wikipedia.org/wiki/Discrete_cosine_transform
https://en.wikipedia.org/wiki/Discrete_sine_transform
help(monai._C.BoundType)
help(monai._C.InterpolationType)

Parameters:

input (Tensor) – Input image. (B, C, Wi, Hi, Di).
grid (Tensor) – Deformation field. (B, Wo, Ho, Do, 2|3).
interpolation (int or list[int] , optional) – Interpolation order. Defaults to ‘linear’.
bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.
extrapolate (bool) – Extrapolate out-of-bound data. Defaults to True.

Returns:

Sampled gradients (B, C, Wo, Ho, Do, 1|2|3).

Return type:

output (torch.Tensor)

LLTM#

class monai.networks.layers.LLTM(input_features, state_size)[source]#

This recurrent unit is similar to an LSTM, but differs in that it lacks a forget gate and uses an Exponential Linear Unit (ELU) as its internal activation function. Because this unit never forgets, call it LLTM, or Long-Long-Term-Memory unit. It has both C++ and CUDA implementation, automatically switch according to the target device where put this module to.

Parameters:

input_features (int) – size of input feature data
state_size (int) – size of the state of recurrent unit

Referring to: https://pytorch.org/tutorials/advanced/cpp_extension.html

forward(input, state)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

ConjugateGradient#

class monai.networks.layers.ConjugateGradient(linear_op, num_iter)[source]#

Congugate Gradient (CG) solver for linear systems Ax = y.

For linear_op that is positive definite and self-adjoint, CG is guaranteed to converge CG is often used to solve linear systems of the form Ax = y, where A is too large to store explicitly, but can be computed via a linear operator.

As a result, here we won’t set A explicitly as a matrix, but rather as a linear operator. For example, A could be a FFT/IFFT operation

__init__(linear_op, num_iter)[source]#

Parameters:

linear_op (Callable) – Linear operator
num_iter (int) – Number of iterations to run CG

forward(x, y)[source]#

run conjugate gradient for num_iter iterations to solve Ax = y

Parameters:

x (Tensor) – tensor (real or complex); Initial guess for linear system Ax = y.
For (The size of x should be applicable to the linear operator.)
example
FFT (if the linear operator is)
the (then x is HCHW; if)
multiplication (linear operator is a matrix)
vector (then x is a)
y (Tensor) – tensor (real or complex); Measurement. Same size as x

Returns:

Solution to Ax = y

Return type:

update(x, p, r, rsold)[source]#

perform one iteration of the CG method. It takes the current solution x, the current search direction p, the current residual r, and the old residual norm rsold as inputs. Then it computes the new solution, search direction, residual, and residual norm, and returns them.

Return type:: tuple[Tensor, Tensor, Tensor, Tensor]

Utilities#

monai.networks.layers.convutils.calculate_out_shape(in_shape, kernel_size, stride, padding)[source]#: Calculate the output tensor shape when applying a convolution to a tensor of shape inShape with kernel size kernel_size, stride value stride, and input padding value padding. All arguments can be scalars or multiple values, return value is a scalar if all inputs are scalars.

monai.networks.layers.convutils.gaussian_1d(sigma, truncated=4.0, approx='erf', normalize=False)[source]#

one dimensional Gaussian kernel.

Parameters:

sigma (Tensor) – std of the kernel
truncated (float) – tail length
approx (str) –
discrete Gaussian kernel type, available options are “erf”, “sampled”, and “scalespace”.
- erf approximation interpolates the error function;
- sampled uses a sampled Gaussian kernel;
- scalespace corresponds to https://en.wikipedia.org/wiki/Scale_space_implementation#The_discrete_Gaussian_kernel based on the modified Bessel functions.
normalize (bool) – whether to normalize the kernel with kernel.sum().

Raises:

ValueError – When truncated is non-positive.

Return type:

Tensor

Returns:

1D torch tensor

monai.networks.layers.convutils.polyval(coef, x)[source]#

Evaluates the polynomial defined by coef at x.

For a 1D sequence of coef (length n), evaluate:

y = coef[n-1] + x * (coef[n-2] + ... + x * (coef[1] + x * coef[0]))

Parameters:

coef – a sequence of floats representing the coefficients of the polynomial
x – float or a sequence of floats representing the variable of the polynomial

Return type:

Tensor

Returns:

1D torch tensor

monai.networks.layers.convutils.same_padding(kernel_size, dilation=1)[source]#

Return the padding value needed to ensure a convolution using the given kernel size produces an output of the same shape as the input for a stride of 1, otherwise ensure a shape of the input divided by the stride rounded down.

Raises:: NotImplementedError – When np.any((kernel_size - 1) * dilation % 2 == 1).

monai.networks.layers.utils.get_act_layer(name)[source]#

Create an activation layer instance.

For example, to create activation layers:

from monai.networks.layers import get_act_layer

s_layer = get_act_layer(name="swish")
p_layer = get_act_layer(name=("prelu", {"num_parameters": 1, "init": 0.25}))

Parameters:: name – an activation type string or a tuple of type string and parameters.

monai.networks.layers.utils.get_dropout_layer(name, dropout_dim=1)[source]#

Create a dropout layer instance.

For example, to create dropout layers:

from monai.networks.layers import get_dropout_layer

d_layer = get_dropout_layer(name="dropout")
a_layer = get_dropout_layer(name=("alphadropout", {"p": 0.25}))

Parameters:

name – a dropout ratio or a tuple of dropout type and parameters.
dropout_dim – the spatial dimension of the dropout operation.

monai.networks.layers.utils.get_norm_layer(name, spatial_dims=1, channels=1)[source]#

Create a normalization layer instance.

For example, to create normalization layers:

from monai.networks.layers import get_norm_layer

g_layer = get_norm_layer(name=("group", {"num_groups": 1}))
n_layer = get_norm_layer(name="instance", spatial_dims=2)

Parameters:

name – a normalization type string or a tuple of type string and parameters.
spatial_dims – number of spatial dimensions of the input.
channels – number of features/channels when the normalization layer requires this parameter but it is not specified in the norm parameters.

monai.networks.layers.utils.get_pool_layer(name, spatial_dims=1)[source]#

Create a pooling layer instance.

For example, to create adaptiveavg layer:

from monai.networks.layers import get_pool_layer

pool_layer = get_pool_layer(("adaptiveavg", {"output_size": (1, 1, 1)}), spatial_dims=3)

Parameters:

name – a pooling type string or a tuple of type string and parameters.
spatial_dims – number of spatial dimensions of the input.

Nets#

AHNet#

class monai.networks.nets.AHNet(layers=(3, 4, 6, 3), spatial_dims=3, in_channels=1, out_channels=1, psp_block_num=4, upsample_mode='transpose', pretrained=False, progress=True)[source]#

AHNet based on Anisotropic Hybrid Network. Adapted from lsqshr’s official code. Except from the original network that supports 3D inputs, this implementation also supports 2D inputs. According to the tests for deconvolutions, using "transpose" rather than linear interpolations is faster. Therefore, this implementation sets "transpose" as the default upsampling method.

To meet the requirements of the structure, the input size for each spatial dimension (except the last one) should be: divisible by 2 ** (psp_block_num + 3) and no less than 32 in transpose mode, and should be divisible by 32 and no less than 2 ** (psp_block_num + 3) in other upsample modes. In addition, the input size for the last spatial dimension should be divisible by 32, and at least one spatial size should be no less than 64.

Parameters:

layers (tuple) – number of residual blocks for 4 layers of the network (layer1…layer4). Defaults to (3, 4, 6, 3).
spatial_dims (int) – spatial dimension of the input data. Defaults to 3.
in_channels (int) – number of input channels for the network. Default to 1.
out_channels (int) – number of output channels for the network. Defaults to 1.
psp_block_num (int) – the number of pyramid volumetric pooling modules used at the end of the network before the final output layer for extracting multiscale features. The number should be an integer that belongs to [0,4]. Defaults to 4.
upsample_mode (str) –
["transpose", "bilinear", "trilinear", nearest] The mode of upsampling manipulations. Using the last two modes cannot guarantee the model’s reproducibility. Defaults to transpose.
- "transpose", uses transposed convolution layers.
- "bilinear", uses bilinear interpolate.
- "trilinear", uses trilinear interpolate.
- "nearest", uses nearest interpolate.
pretrained (bool) – whether to load pretrained weights from ResNet50 to initialize convolution layers, default to False.
progress (bool) – If True, displays a progress bar of the download of pretrained weights to stderr.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

DenseNet#

class monai.networks.nets.DenseNet(spatial_dims, in_channels, out_channels, init_features=64, growth_rate=32, block_config=(6, 12, 24, 16), bn_size=4, act=('relu', {'inplace': True}), norm='batch', dropout_prob=0.0)[source]#

Densenet based on: Densely Connected Convolutional Networks. Adapted from PyTorch Hub 2D version: https://pytorch.org/vision/stable/models.html#id16. This network is non-deterministic When spatial_dims is 3 and CUDA is enabled. Please check the link below for more details: https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html#torch.use_deterministic_algorithms

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of the input channel.
out_channels – number of the output classes.
init_features – number of filters in the first convolution layer.
growth_rate – how many filters to add each layer (k in paper).
block_config – how many layers in each pooling block.
bn_size – multiplicative factor for number of bottle neck layers. (i.e. bn_size * k features in the bottleneck layer)
act – activation type and arguments. Defaults to relu.
norm – feature normalization type and arguments. Defaults to batch norm.
dropout_prob – dropout rate after each dense layer.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

DenseNet121#

class monai.networks.nets.DenseNet121(spatial_dims, in_channels, out_channels, init_features=64, growth_rate=32, block_config=(6, 12, 24, 16), pretrained=False, progress=True, **kwargs)[source]#: DenseNet121 with optional pretrained support when spatial_dims is 2.

DenseNet169#

class monai.networks.nets.DenseNet169(spatial_dims, in_channels, out_channels, init_features=64, growth_rate=32, block_config=(6, 12, 32, 32), pretrained=False, progress=True, **kwargs)[source]#: DenseNet169 with optional pretrained support when spatial_dims is 2.

DenseNet201#

class monai.networks.nets.DenseNet201(spatial_dims, in_channels, out_channels, init_features=64, growth_rate=32, block_config=(6, 12, 48, 32), pretrained=False, progress=True, **kwargs)[source]#: DenseNet201 with optional pretrained support when spatial_dims is 2.

DenseNet264#

class monai.networks.nets.DenseNet264(spatial_dims, in_channels, out_channels, init_features=64, growth_rate=32, block_config=(6, 12, 64, 48), pretrained=False, progress=True, **kwargs)[source]#

EfficientNet#

class monai.networks.nets.EfficientNet(blocks_args_str, spatial_dims=2, in_channels=3, num_classes=1000, width_coefficient=1.0, depth_coefficient=1.0, dropout_rate=0.2, image_size=224, norm=('batch', {'eps': 0.001, 'momentum': 0.01}), drop_connect_rate=0.2, depth_divisor=8)[source]#

__init__(blocks_args_str, spatial_dims=2, in_channels=3, num_classes=1000, width_coefficient=1.0, depth_coefficient=1.0, dropout_rate=0.2, image_size=224, norm=('batch', {'eps': 0.001, 'momentum': 0.01}), drop_connect_rate=0.2, depth_divisor=8)[source]#

EfficientNet based on Rethinking Model Scaling for Convolutional Neural Networks. Adapted from EfficientNet-PyTorch.

Parameters:

blocks_args_str – block definitions.
spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
num_classes – number of output classes.
width_coefficient – width multiplier coefficient (w in paper).
depth_coefficient – depth multiplier coefficient (d in paper).
dropout_rate – dropout rate for dropout layers.
image_size – input image resolution.
norm – feature normalization type and arguments.
drop_connect_rate – dropconnect rate for drop connection (individual weights) layers.
depth_divisor – depth divisor for channel rounding.

forward(inputs)[source]#

Parameters:

inputs (Tensor) – input should have spatially N dimensions
`` (Batch, in_channels, dim_0[, dim_1, ..., dim_N])

Returns:

a torch Tensor of classification prediction in shape (Batch, num_classes).

set_swish(memory_efficient=True)[source]#

Sets swish function as memory efficient (for training) or standard (for JIT export).

Parameters:: memory_efficient (bool) – whether to use memory-efficient version of swish.
Return type:: None

BlockArgs#

class monai.networks.nets.BlockArgs(num_repeat: int, kernel_size: int, stride: int, expand_ratio: int, input_filters: int, output_filters: int, id_skip: bool, se_ratio: float | None = None)[source]#

BlockArgs object to assist in decoding string notation: of arguments for MBConvBlock definition.

expand_ratio: int#: Alias for field number 3

static from_string(block_string)[source]#

Get a BlockArgs object from a string notation of arguments.

Parameters:: block_string (str) – A string notation of arguments. Examples: “r1_k3_s11_e1_i32_o16_se0.25”.
Returns:: namedtuple defined at the top of this function.
Return type:: BlockArgs

id_skip: bool#: Alias for field number 6

input_filters: int#: Alias for field number 4

kernel_size: int#: Alias for field number 1

num_repeat: int#: Alias for field number 0

output_filters: int#: Alias for field number 5

se_ratio: float | None#: Alias for field number 7

stride: int#: Alias for field number 2

to_string()[source]#

Return a block string notation for current BlockArgs object

Returns:

A string notation of BlockArgs object arguments.: Example: “r1_k3_s11_e1_i32_o16_se0.25_noskip”.

EfficientNetBN#

class monai.networks.nets.EfficientNetBN(model_name, pretrained=True, progress=True, spatial_dims=2, in_channels=3, num_classes=1000, norm=('batch', {'eps': 0.001, 'momentum': 0.01}), adv_prop=False)[source]#

__init__(model_name, pretrained=True, progress=True, spatial_dims=2, in_channels=3, num_classes=1000, norm=('batch', {'eps': 0.001, 'momentum': 0.01}), adv_prop=False)[source]#

Generic wrapper around EfficientNet, used to initialize EfficientNet-B0 to EfficientNet-B7 models model_name is mandatory argument as there is no EfficientNetBN itself, it needs the N in [0, 1, 2, 3, 4, 5, 6, 7, 8] to be a model

Parameters:

model_name – name of model to initialize, can be from [efficientnet-b0, …, efficientnet-b8, efficientnet-l2].
pretrained – whether to initialize pretrained ImageNet weights, only available for spatial_dims=2 and batch norm is used.
progress – whether to show download progress for pretrained weights download.
spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
num_classes – number of output classes.
norm – feature normalization type and arguments.
adv_prop – whether to use weights trained with adversarial examples. This argument only works when pretrained is True.

Examples:

# for pretrained spatial 2D ImageNet
>>> image_size = get_efficientnet_image_size("efficientnet-b0")
>>> inputs = torch.rand(1, 3, image_size, image_size)
>>> model = EfficientNetBN("efficientnet-b0", pretrained=True)
>>> model.eval()
>>> outputs = model(inputs)

# create spatial 2D
>>> model = EfficientNetBN("efficientnet-b0", spatial_dims=2)

# create spatial 3D
>>> model = EfficientNetBN("efficientnet-b0", spatial_dims=3)

# create EfficientNetB7 for spatial 2D
>>> model = EfficientNetBN("efficientnet-b7", spatial_dims=2)

EfficientNetBNFeatures#

class monai.networks.nets.EfficientNetBNFeatures(model_name, pretrained=True, progress=True, spatial_dims=2, in_channels=3, num_classes=1000, norm=('batch', {'eps': 0.001, 'momentum': 0.01}), adv_prop=False)[source]#

__init__(model_name, pretrained=True, progress=True, spatial_dims=2, in_channels=3, num_classes=1000, norm=('batch', {'eps': 0.001, 'momentum': 0.01}), adv_prop=False)[source]#

Initialize EfficientNet-B0 to EfficientNet-B7 models as a backbone, the backbone can be used as an encoder for segmentation and objection models. Compared with the class EfficientNetBN, the only different place is the forward function.

This class refers to PyTorch image models.

forward(inputs)[source]#

Parameters:

inputs (Tensor) – input should have spatially N dimensions
`` (Batch, in_channels, dim_0[, dim_1, ..., dim_N])

Returns:

a list of torch Tensors.

SegResNet#

class monai.networks.nets.SegResNet(spatial_dims=3, init_filters=8, in_channels=1, out_channels=2, dropout_prob=None, act=('RELU', {'inplace': True}), norm=('GROUP', {'num_groups': 8}), norm_name='', num_groups=8, use_conv_final=True, blocks_down=(1, 2, 2, 4), blocks_up=(1, 1, 1), upsample_mode=nontrainable)[source]#

SegResNet based on 3D MRI brain tumor segmentation using autoencoder regularization. The module does not include the variational autoencoder (VAE). The model supports 2D or 3D inputs.

Parameters:

spatial_dims – spatial dimension of the input data. Defaults to 3.
init_filters – number of output channels for initial convolution layer. Defaults to 8.
in_channels – number of input channels for the network. Defaults to 1.
out_channels – number of output channels for the network. Defaults to 2.
dropout_prob – probability of an element to be zero-ed. Defaults to None.
act – activation type and arguments. Defaults to RELU.
norm – feature normalization type and arguments. Defaults to GROUP.
norm_name – deprecating option for feature normalization type.
num_groups – deprecating option for group norm. parameters.
use_conv_final – if add a final convolution block to output. Defaults to True.
blocks_down – number of down sample blocks in each layer. Defaults to [1,2,2,4].
blocks_up – number of up sample blocks in each layer. Defaults to [1,1,1].
upsample_mode –
["deconv", "nontrainable", "pixelshuffle"] The mode of upsampling manipulations. Using the nontrainable modes cannot guarantee the model’s reproducibility. Defaults to``nontrainable``.
- deconv, uses transposed convolution layers.
- nontrainable, uses non-trainable linear interpolation.
- pixelshuffle, uses monai.networks.blocks.SubpixelUpsample.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

SegResNetDS#

class monai.networks.nets.SegResNetDS(spatial_dims=3, init_filters=32, in_channels=1, out_channels=2, act='relu', norm='batch', blocks_down=(1, 2, 2, 4), blocks_up=None, dsdepth=1, preprocess=None, upsample_mode='deconv', resolution=None)[source]#

SegResNetDS based on 3D MRI brain tumor segmentation using autoencoder regularization. It is similar to https://docs.monai.io/en/stable/networks.html#segresnet, with several improvements including deep supervision and non-isotropic kernel support.

Parameters:

spatial_dims – spatial dimension of the input data. Defaults to 3.
init_filters – number of output channels for initial convolution layer. Defaults to 32.
in_channels – number of input channels for the network. Defaults to 1.
out_channels – number of output channels for the network. Defaults to 2.
act – activation type and arguments. Defaults to RELU.
norm – feature normalization type and arguments. Defaults to BATCH.
blocks_down – number of downsample blocks in each layer. Defaults to [1,2,2,4].
blocks_up – number of upsample blocks (optional).
dsdepth – number of levels for deep supervision. This will be the length of the list of outputs at each scale level. At dsdepth==1,only a single output is returned.
preprocess – optional callable function to apply before the model’s forward pass
resolution – optional input image resolution. When provided, the network will first use non-isotropic kernels to bring image spacing into an approximately isotropic space. Otherwise, by default, the kernel size and downsampling is always isotropic.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Union[None, Tensor, list[Tensor]]

is_valid_shape(x)[source]#: Calculate if the input shape is divisible by the minimum factors for the current network configuration

shape_factor()[source]#: Calculate the factors (divisors) that the input image shape must be divisible by

SegResNetVAE#

class monai.networks.nets.SegResNetVAE(input_image_size, vae_estimate_std=False, vae_default_std=0.3, vae_nz=256, spatial_dims=3, init_filters=8, in_channels=1, out_channels=2, dropout_prob=None, act=('RELU', {'inplace': True}), norm=('GROUP', {'num_groups': 8}), use_conv_final=True, blocks_down=(1, 2, 2, 4), blocks_up=(1, 1, 1), upsample_mode=nontrainable)[source]#

SegResNetVAE based on 3D MRI brain tumor segmentation using autoencoder regularization. The module contains the variational autoencoder (VAE). The model supports 2D or 3D inputs.

Parameters:

input_image_size – the size of images to input into the network. It is used to determine the in_features of the fc layer in VAE.
vae_estimate_std – whether to estimate the standard deviations in VAE. Defaults to False.
vae_default_std – if not to estimate the std, use the default value. Defaults to 0.3.
vae_nz – number of latent variables in VAE. Defaults to 256. Where, 128 to represent mean, and 128 to represent std.
spatial_dims – spatial dimension of the input data. Defaults to 3.
init_filters – number of output channels for initial convolution layer. Defaults to 8.
in_channels – number of input channels for the network. Defaults to 1.
out_channels – number of output channels for the network. Defaults to 2.
dropout_prob – probability of an element to be zero-ed. Defaults to None.
act – activation type and arguments. Defaults to RELU.
norm – feature normalization type and arguments. Defaults to GROUP.
use_conv_final – if add a final convolution block to output. Defaults to True.
blocks_down – number of down sample blocks in each layer. Defaults to [1,2,2,4].
blocks_up – number of up sample blocks in each layer. Defaults to [1,1,1].
upsample_mode –
["deconv", "nontrainable", "pixelshuffle"] The mode of upsampling manipulations. Using the nontrainable modes cannot guarantee the model’s reproducibility. Defaults to``nontrainable``.
- deconv, uses transposed convolution layers.
- nontrainable, uses non-trainable linear interpolation.
- pixelshuffle, uses monai.networks.blocks.SubpixelUpsample.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

ResNet#

class monai.networks.nets.ResNet(block, layers, block_inplanes, spatial_dims=3, n_input_channels=3, conv1_t_size=7, conv1_t_stride=1, no_max_pool=False, shortcut_type='B', widen_factor=1.0, num_classes=400, feed_forward=True, bias_downsample=True)[source]#

ResNet based on: Deep Residual Learning for Image Recognition and Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?. Adapted from kenshohara/3D-ResNets-PyTorch.

Parameters:

block – which ResNet block to use, either Basic or Bottleneck. ResNet block class or str. for Basic: ResNetBlock or ‘basic’ for Bottleneck: ResNetBottleneck or ‘bottleneck’
layers – how many layers to use.
block_inplanes – determine the size of planes at each step. Also tunable with widen_factor.
spatial_dims – number of spatial dimensions of the input image.
n_input_channels – number of input channels for first convolutional layer.
conv1_t_size – size of first convolution layer, determines kernel and padding.
conv1_t_stride – stride of first convolution layer.
no_max_pool – bool argument to determine if to use maxpool layer.
shortcut_type – which downsample block to use. Options are ‘A’, ‘B’, default to ‘B’. - ‘A’: using self._downsample_basic_block. - ‘B’: kernel_size 1 conv + norm.
widen_factor – widen output for each layer.
num_classes – number of output (classifications).
feed_forward – whether to add the FC layer for the output, default to True.
bias_downsample – whether to use bias term in the downsampling block when shortcut_type is ‘B’, default to True.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

ResNetFeatures#

class monai.networks.nets.ResNetFeatures(model_name, pretrained=True, spatial_dims=3, in_channels=1)[source]#

__init__(model_name, pretrained=True, spatial_dims=3, in_channels=1)[source]#

Initialize resnet18 to resnet200 models as a backbone, the backbone can be used as an encoder for segmentation and objection models.

Compared with the class ResNet, the only different place is the forward function.

Parameters:

model_name (str) – name of model to initialize, can be from [resnet10, …, resnet200].
pretrained (bool) – whether to initialize pretrained MedicalNet weights, only available for spatial_dims=3 and in_channels=1.
spatial_dims (int) – number of spatial dimensions of the input image.
in_channels (int) – number of input channels for first convolutional layer.

forward(inputs)[source]#

Parameters:

inputs (Tensor) – input should have spatially N dimensions
`` (Batch, in_channels, dim_0[, dim_1, ..., dim_N])

Returns:

a list of torch Tensors.

SENet#

class monai.networks.nets.SENet(spatial_dims, in_channels, block, layers, groups, reduction, dropout_prob=0.2, dropout_dim=1, inplanes=128, downsample_kernel_size=3, input_3x3=True, num_classes=1000)[source]#

SENet based on Squeeze-and-Excitation Networks. Adapted from Cadene Hub 2D version.

Parameters:

spatial_dims – spatial dimension of the input data.
in_channels – channel number of the input data.
block – SEBlock class or str. for SENet154: SEBottleneck or ‘se_bottleneck’ for SE-ResNet models: SEResNetBottleneck or ‘se_resnet_bottleneck’ for SE-ResNeXt models: SEResNeXtBottleneck or ‘se_resnetxt_bottleneck’
layers – number of residual blocks for 4 layers of the network (layer1…layer4).
groups – number of groups for the 3x3 convolution in each bottleneck block. for SENet154: 64 for SE-ResNet models: 1 for SE-ResNeXt models: 32
reduction – reduction ratio for Squeeze-and-Excitation modules. for all models: 16
dropout_prob – drop probability for the Dropout layer. if None the Dropout layer is not used. for SENet154: 0.2 for SE-ResNet models: None for SE-ResNeXt models: None
dropout_dim – determine the dimensions of dropout. Defaults to 1. When dropout_dim = 1, randomly zeroes some of the elements for each channel. When dropout_dim = 2, Randomly zeroes out entire channels (a channel is a 2D feature map). When dropout_dim = 3, Randomly zeroes out entire channels (a channel is a 3D feature map).
inplanes – number of input channels for layer1. for SENet154: 128 for SE-ResNet models: 64 for SE-ResNeXt models: 64
downsample_kernel_size – kernel size for downsampling convolutions in layer2, layer3 and layer4. for SENet154: 3 for SE-ResNet models: 1 for SE-ResNeXt models: 1
input_3x3 – If True, use three 3x3 convolutions instead of a single 7x7 convolution in layer0. - For SENet154: True - For SE-ResNet models: False - For SE-ResNeXt models: False
num_classes – number of outputs in last_linear layer. for all models: 1000

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

SENet154#

class monai.networks.nets.SENet154(layers=(3, 8, 36, 3), groups=64, reduction=16, pretrained=False, progress=True, **kwargs)[source]#: SENet154 based on Squeeze-and-Excitation Networks with optional pretrained support when spatial_dims is 2.

SEResNet50#

class monai.networks.nets.SEResNet50(layers=(3, 4, 6, 3), groups=1, reduction=16, dropout_prob=None, inplanes=64, downsample_kernel_size=1, input_3x3=False, pretrained=False, progress=True, **kwargs)[source]#: SEResNet50 based on Squeeze-and-Excitation Networks with optional pretrained support when spatial_dims is 2.

SEResNet101#

class monai.networks.nets.SEResNet101(layers=(3, 4, 23, 3), groups=1, reduction=16, inplanes=64, downsample_kernel_size=1, input_3x3=False, pretrained=False, progress=True, **kwargs)[source]#: SEResNet101 based on Squeeze-and-Excitation Networks with optional pretrained support when spatial_dims is 2.

SEResNet152#

class monai.networks.nets.SEResNet152(layers=(3, 8, 36, 3), groups=1, reduction=16, inplanes=64, downsample_kernel_size=1, input_3x3=False, pretrained=False, progress=True, **kwargs)[source]#: SEResNet152 based on Squeeze-and-Excitation Networks with optional pretrained support when spatial_dims is 2.

SEResNext50#

class monai.networks.nets.SEResNext50(layers=(3, 4, 6, 3), groups=32, reduction=16, dropout_prob=None, inplanes=64, downsample_kernel_size=1, input_3x3=False, pretrained=False, progress=True, **kwargs)[source]#: SEResNext50 based on Squeeze-and-Excitation Networks with optional pretrained support when spatial_dims is 2.

SEResNext101#

class monai.networks.nets.SEResNext101(layers=(3, 4, 23, 3), groups=32, reduction=16, dropout_prob=None, inplanes=64, downsample_kernel_size=1, input_3x3=False, pretrained=False, progress=True, **kwargs)[source]#: SEResNext101 based on Squeeze-and-Excitation Networks with optional pretrained support when spatial_dims is 2.

HighResNet#

class monai.networks.nets.HighResNet(spatial_dims=3, in_channels=1, out_channels=1, norm_type=('batch', {'affine': True}), acti_type=('relu', {'inplace': True}), dropout_prob=0.0, bias=False, layer_params=({'kernel_size': 3, 'n_features': 16, 'name': 'conv_0'}, {'kernels': (3, 3), 'n_features': 16, 'name': 'res_1', 'repeat': 3}, {'kernels': (3, 3), 'n_features': 32, 'name': 'res_2', 'repeat': 3}, {'kernels': (3, 3), 'n_features': 64, 'name': 'res_3', 'repeat': 3}, {'kernel_size': 1, 'n_features': 80, 'name': 'conv_1'}, {'kernel_size': 1, 'name': 'conv_2'}), channel_matching=pad)[source]#

Reimplementation of highres3dnet based on Li et al., “On the compactness, efficiency, and representation of 3D convolutional networks: Brain parcellation as a pretext task”, IPMI ‘17

Adapted from: NifTK/NiftyNet fepegar/highresnet

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of input channels.
out_channels – number of output channels.
norm_type – feature normalization type and arguments. Defaults to ("batch", {"affine": True}).
acti_type – activation type and arguments. Defaults to ("relu", {"inplace": True}).
dropout_prob – probability of the feature map to be zeroed (only applies to the penultimate conv layer).
bias –
whether to have a bias term in convolution blocks. Defaults to False. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
layer_params – specifying key parameters of each layer/block.
channel_matching –
{"pad", "project"} Specifies handling residual branch and conv branch channel mismatches. Defaults to "pad".
- "pad": with zero padding.
- "project": with a trainable conv with kernel size one.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

class monai.networks.nets.HighResBlock(spatial_dims, in_channels, out_channels, kernels=(3, 3), dilation=1, norm_type=('batch', {'affine': True}), acti_type=('relu', {'inplace': True}), bias=False, channel_matching=pad)[source]#

__init__(spatial_dims, in_channels, out_channels, kernels=(3, 3), dilation=1, norm_type=('batch', {'affine': True}), acti_type=('relu', {'inplace': True}), bias=False, channel_matching=pad)[source]#

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of input channels.
out_channels – number of output channels.
kernels – each integer k in kernels corresponds to a convolution layer with kernel size k.
dilation – spacing between kernel elements.
norm_type – feature normalization type and arguments. Defaults to ("batch", {"affine": True}).
acti_type – {"relu", "prelu", "relu6"} Non-linear activation using ReLU or PReLU. Defaults to "relu".
bias –
whether to have a bias term in convolution blocks. Defaults to False. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
channel_matching –
{"pad", "project"} Specifies handling residual branch and conv branch channel mismatches. Defaults to "pad".
- "pad": with zero padding.
- "project": with a trainable conv with kernel size one.

Raises:

ValueError – When channel_matching=pad and in_channels > out_channels. Incompatible values.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

DynUNet#

class monai.networks.nets.DynUNet(spatial_dims, in_channels, out_channels, kernel_size, strides, upsample_kernel_size, filters=None, dropout=None, norm_name=('INSTANCE', {'affine': True}), act_name=('leakyrelu', {'inplace': True, 'negative_slope': 0.01}), deep_supervision=False, deep_supr_num=1, res_block=False, trans_bias=False)[source]#

This reimplementation of a dynamic UNet (DynUNet) is based on: Automated Design of Deep Learning Methods for Biomedical Image Segmentation. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation. Optimized U-Net for Brain Tumor Segmentation.

This model is more flexible compared with monai.networks.nets.UNet in three places:

Residual connection is supported in conv blocks.

Anisotropic kernel sizes and strides can be used in each layers.

Deep supervision heads can be added.

The model supports 2D or 3D inputs and is consisted with four kinds of blocks: one input block, n downsample blocks, one bottleneck and n+1 upsample blocks. Where, n>0. The first and last kernel and stride values of the input sequences are used for input block and bottleneck respectively, and the rest value(s) are used for downsample and upsample blocks. Therefore, pleasure ensure that the length of input sequences (kernel_size and strides) is no less than 3 in order to have at least one downsample and upsample blocks.

To meet the requirements of the structure, the input size for each spatial dimension should be divisible by the product of all strides in the corresponding dimension. In addition, the minimal spatial size should have at least one dimension that has twice the size of the product of all strides. For example, if strides=((1, 2, 4), 2, 2, 1), the spatial size should be divisible by (4, 8, 16), and the minimal spatial size is (8, 8, 16) or (4, 16, 16) or (4, 8, 32).

The output size for each spatial dimension equals to the input size of the corresponding dimension divided by the stride in strides[0]. For example, if strides=((1, 2, 4), 2, 2, 1) and the input size is (64, 32, 32), the output size is (64, 16, 8).

For backwards compatibility with old weights, please set strict=False when calling load_state_dict.

Usage example with medical segmentation decathlon dataset is available at: Project-MONAI/tutorials.

Parameters:

spatial_dims (int) – number of spatial dimensions.
in_channels (int) – number of input channels.
out_channels (int) – number of output channels.
kernel_size (Sequence[Union[Sequence[int], int]]) – convolution kernel size.
strides (Sequence[Union[Sequence[int], int]]) – convolution strides for each blocks.
upsample_kernel_size (Sequence[Union[Sequence[int], int]]) – convolution kernel size for transposed convolution layers. The values should equal to strides[1:].
filters (Optional[Sequence[int]]) – number of output channels for each blocks. Different from nnU-Net, in this implementation we add this argument to make the network more flexible. As shown in the third reference, one way to determine this argument is like: [64, 96, 128, 192, 256, 384, 512, 768, 1024][: len(strides)]. The above way is used in the network that wins task 1 in the BraTS21 Challenge. If not specified, the way which nnUNet used will be employed. Defaults to None.
dropout (Union[Tuple, str, float, None]) – dropout ratio. Defaults to no dropout.
norm_name (Union[Tuple, str]) – feature normalization type and arguments. Defaults to INSTANCE. INSTANCE_NVFUSER is a faster version of the instance norm layer, it can be used when: 1) spatial_dims=3, 2) CUDA device is available, 3) apex is installed and 4) non-Windows OS is used.
act_name (Union[Tuple, str]) – activation layer type and arguments. Defaults to leakyrelu.
deep_supervision (bool) – whether to add deep supervision head before output. Defaults to False. If True, in training mode, the forward function will output not only the final feature map (from output_block), but also the feature maps that come from the intermediate up sample layers. In order to unify the return type (the restriction of TorchScript), all intermediate feature maps are interpolated into the same size as the final feature map and stacked together (with a new dimension in the first axis)into one single tensor. For instance, if there are two intermediate feature maps with shapes: (1, 2, 16, 12) and (1, 2, 8, 6), and the final feature map has the shape (1, 2, 32, 24), then all intermediate feature maps will be interpolated into (1, 2, 32, 24), and the stacked tensor will has the shape (1, 3, 2, 32, 24). When calculating the loss, you can use torch.unbind to get all feature maps can compute the loss one by one with the ground truth, then do a weighted average for all losses to achieve the final loss.
deep_supr_num (int) – number of feature maps that will output during deep supervision head. The value should be larger than 0 and less than the number of up sample layers. Defaults to 1.
res_block (bool) – whether to use residual connection based convolution blocks during the network. Defaults to False.
trans_bias (bool) – whether to set the bias parameter in transposed convolution layers. Defaults to False.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

monai.networks.nets.DynUnet#: alias of DynUNet

monai.networks.nets.Dynunet#: alias of DynUNet

UNet#

class monai.networks.nets.UNet(spatial_dims, in_channels, out_channels, channels, strides, kernel_size=3, up_kernel_size=3, num_res_units=0, act='PRELU', norm='INSTANCE', dropout=0.0, bias=True, adn_ordering='NDA')[source]#

Enhanced version of UNet which has residual units implemented with the ResidualUnit class. The residual part uses a convolution to change the input dimensions to match the output dimensions if this is necessary but will use nn.Identity if not. Refer to: https://link.springer.com/chapter/10.1007/978-3-030-12029-0_40.

Each layer of the network has a encode and decode path with a skip connection between them. Data in the encode path is downsampled using strided convolutions (if strides is given values greater than 1) and in the decode path upsampled using strided transpose convolutions. These down or up sampling operations occur at the beginning of each block rather than afterwards as is typical in UNet implementations.

To further explain this consider the first example network given below. This network has 3 layers with strides of 2 for each of the middle layers (the last layer is the bottom connection which does not down/up sample). Input data to this network is immediately reduced in the spatial dimensions by a factor of 2 by the first convolution of the residual unit defining the first layer of the encode part. The last layer of the decode part will upsample its input (data from the previous layer concatenated with data from the skip connection) in the first convolution. this ensures the final output of the network has the same shape as the input.

Padding values for the convolutions are chosen to ensure output sizes are even divisors/multiples of the input sizes if the strides value for a layer is a factor of the input sizes. A typical case is to use strides values of 2 and inputs that are multiples of powers of 2. An input can thus be downsampled evenly however many times its dimensions can be divided by 2, so for the example network inputs would have to have dimensions that are multiples of 4. In the second example network given below the input to the bottom layer will have shape (1, 64, 15, 15) for an input of shape (1, 1, 240, 240) demonstrating the input being reduced in size spatially by 2**4.

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
channels – sequence of channels. Top block first. The length of channels should be no less than 2.
strides – sequence of convolution strides. The length of stride should equal to len(channels) - 1.
kernel_size – convolution kernel size, the value(s) should be odd. If sequence, its length should equal to dimensions. Defaults to 3.
up_kernel_size – upsampling convolution kernel size, the value(s) should be odd. If sequence, its length should equal to dimensions. Defaults to 3.
num_res_units – number of residual units. Defaults to 0.
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
dropout – dropout ratio. Defaults to no dropout.
bias –
whether to have a bias term in convolution blocks. Defaults to True. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
adn_ordering – a string representing the ordering of activation (A), normalization (N), and dropout (D). Defaults to “NDA”. See also: monai.networks.blocks.ADN.

Examples:

from monai.networks.nets import UNet

# 3 layer network with down/upsampling by a factor of 2 at each layer with 2-convolution residual units
net = UNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,
    channels=(4, 8, 16),
    strides=(2, 2),
    num_res_units=2
)

# 5 layer network with simple convolution/normalization/dropout/activation blocks defining the layers
net=UNet(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,
    channels=(4, 8, 16, 32, 64),
    strides=(2, 2, 2, 2),
)

Note: The acceptable spatial size of input data depends on the parameters of the network,: to set appropriate spatial size, please check the tutorial for more details: Project-MONAI/tutorials. Typically, when using a stride of 2 in down / up sampling, the output dimensions are either half of the input when downsampling, or twice when upsampling. In this case with N numbers of layers in the network, the inputs must have spatial dimensions that are all multiples of 2^N. Usually, applying resize, pad or crop transforms can help adjust the spatial size of input data.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

monai.networks.nets.Unet#: alias of UNet

monai.networks.nets.unet#: alias of <module ‘monai.networks.nets.unet’ from ‘/home/docs/checkouts/readthedocs.org/user_builds/monai/checkouts/latest/monai/networks/nets/unet.py’>

AttentionUnet#

class monai.networks.nets.AttentionUnet(spatial_dims, in_channels, out_channels, channels, strides, kernel_size=3, up_kernel_size=3, dropout=0.0)[source]#

Attention Unet based on Otkay et al. “Attention U-Net: Learning Where to Look for the Pancreas” https://arxiv.org/abs/1804.03999

Parameters:

spatial_dims – number of spatial dimensions of the input image.
in_channels – number of the input channel.
out_channels – number of the output classes.
channels (Sequence[int]) – sequence of channels. Top block first. The length of channels should be no less than 2.
strides (Sequence[int]) – stride to use for convolutions.
kernel_size – convolution kernel size.
up_kernel_size – convolution kernel size for transposed convolution layers.
dropout – dropout ratio. Defaults to no dropout.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

UNETR#

class monai.networks.nets.UNETR(in_channels, out_channels, img_size, feature_size=16, hidden_size=768, mlp_dim=3072, num_heads=12, pos_embed='conv', proj_type='conv', norm_name='instance', conv_block=True, res_block=True, dropout_rate=0.0, spatial_dims=3, qkv_bias=False, save_attn=False)[source]#

UNETR based on: “Hatamizadeh et al., UNETR: Transformers for 3D Medical Image Segmentation <https://arxiv.org/abs/2103.10504>”

__init__(in_channels, out_channels, img_size, feature_size=16, hidden_size=768, mlp_dim=3072, num_heads=12, pos_embed='conv', proj_type='conv', norm_name='instance', conv_block=True, res_block=True, dropout_rate=0.0, spatial_dims=3, qkv_bias=False, save_attn=False)[source]#

Parameters:

in_channels – dimension of input channels.
out_channels – dimension of output channels.
img_size – dimension of input image.
feature_size – dimension of network feature size. Defaults to 16.
hidden_size – dimension of hidden layer. Defaults to 768.
mlp_dim – dimension of feedforward layer. Defaults to 3072.
num_heads – number of attention heads. Defaults to 12.
proj_type – patch embedding layer type. Defaults to “conv”.
norm_name – feature normalization type and arguments. Defaults to “instance”.
conv_block – if convolutional block is used. Defaults to True.
res_block – if residual block is used. Defaults to True.
dropout_rate – fraction of the input units to drop. Defaults to 0.0.
spatial_dims – number of spatial dims. Defaults to 3.
qkv_bias – apply the bias term for the qkv linear layer in self attention block. Defaults to False.
save_attn – to make accessible the attention in self attention block. Defaults to False.

Deprecated since version 1.4: pos_embed is deprecated in favor of proj_type.

Examples:

# for single channel input 4-channel output with image size of (96,96,96), feature size of 32 and batch norm
>>> net = UNETR(in_channels=1, out_channels=4, img_size=(96,96,96), feature_size=32, norm_name='batch')

 # for single channel input 4-channel output with image size of (96,96), feature size of 32 and batch norm
>>> net = UNETR(in_channels=1, out_channels=4, img_size=96, feature_size=32, norm_name='batch', spatial_dims=2)

# for 4-channel input 3-channel output with image size of (128,128,128), conv position embedding and instance norm
>>> net = UNETR(in_channels=4, out_channels=3, img_size=(128,128,128), proj_type='conv', norm_name='instance')

forward(x_in)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

SwinUNETR#

class monai.networks.nets.SwinUNETR(img_size, in_channels, out_channels, depths=(2, 2, 2, 2), num_heads=(3, 6, 12, 24), feature_size=24, norm_name='instance', drop_rate=0.0, attn_drop_rate=0.0, dropout_path_rate=0.0, normalize=True, use_checkpoint=False, spatial_dims=3, downsample='merging', use_v2=False)[source]#

Swin UNETR based on: “Hatamizadeh et al., Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images <https://arxiv.org/abs/2201.01266>”

__init__(img_size, in_channels, out_channels, depths=(2, 2, 2, 2), num_heads=(3, 6, 12, 24), feature_size=24, norm_name='instance', drop_rate=0.0, attn_drop_rate=0.0, dropout_path_rate=0.0, normalize=True, use_checkpoint=False, spatial_dims=3, downsample='merging', use_v2=False)[source]#

Parameters:

img_size – spatial dimension of input image. This argument is only used for checking that the input image size is divisible by the patch size. The tensor passed to forward() can have a dynamic shape as long as its spatial dimensions are divisible by 2**5. It will be removed in an upcoming version.
in_channels – dimension of input channels.
out_channels – dimension of output channels.
feature_size – dimension of network feature size.
depths – number of layers in each stage.
num_heads – number of attention heads.
norm_name – feature normalization type and arguments.
drop_rate – dropout rate.
attn_drop_rate – attention dropout rate.
dropout_path_rate – drop path rate.
normalize – normalize output intermediate features in each stage.
use_checkpoint – use gradient checkpointing for reduced memory usage.
spatial_dims – number of spatial dims.
downsample – module used for downsampling, available options are “mergingv2”, “merging” and a user-specified nn.Module following the API defined in monai.networks.nets.PatchMerging. The default is currently “merging” (the original version defined in v0.9.0).
use_v2 – using swinunetr_v2, which adds a residual convolution block at the beggining of each swin stage.

Examples:

# for 3D single channel input with size (96,96,96), 4-channel output and feature size of 48.
>>> net = SwinUNETR(img_size=(96,96,96), in_channels=1, out_channels=4, feature_size=48)

# for 3D 4-channel input with size (128,128,128), 3-channel output and (2,4,2,2) layers in each stage.
>>> net = SwinUNETR(img_size=(128,128,128), in_channels=4, out_channels=3, depths=(2,4,2,2))

# for 2D single channel input with size (96,96), 2-channel output and gradient checkpointing.
>>> net = SwinUNETR(img_size=(96,96), in_channels=3, out_channels=2, use_checkpoint=True, spatial_dims=2)

forward(x_in)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

BasicUNet#

class monai.networks.nets.BasicUNet(spatial_dims=3, in_channels=1, out_channels=2, features=(32, 32, 64, 128, 256, 32), act=('LeakyReLU', {'inplace': True, 'negative_slope': 0.1}), norm=('instance', {'affine': True}), bias=True, dropout=0.0, upsample='deconv')[source]#

__init__(spatial_dims=3, in_channels=1, out_channels=2, features=(32, 32, 64, 128, 256, 32), act=('LeakyReLU', {'inplace': True, 'negative_slope': 0.1}), norm=('instance', {'affine': True}), bias=True, dropout=0.0, upsample='deconv')[source]#

A UNet implementation with 1D/2D/3D supports.

Based on:

Falk et al. “U-Net – Deep Learning for Cell Counting, Detection, and Morphometry”. Nature Methods 16, 67–70 (2019), DOI: http://dx.doi.org/10.1038/s41592-018-0261-2

Parameters:

spatial_dims – number of spatial dimensions. Defaults to 3 for spatial 3D inputs.
in_channels – number of input channels. Defaults to 1.
out_channels – number of output channels. Defaults to 2.
features –
six integers as numbers of features. Defaults to (32, 32, 64, 128, 256, 32),
- the first five values correspond to the five-level encoder feature sizes.
- the last value corresponds to the feature size after the last upsampling.
act – activation type and arguments. Defaults to LeakyReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
bias –
whether to have a bias term in convolution blocks. Defaults to True. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
dropout – dropout ratio. Defaults to no dropout.
upsample – upsampling mode, available options are "deconv", "pixelshuffle", "nontrainable".

Examples:

# for spatial 2D
>>> net = BasicUNet(spatial_dims=2, features=(64, 128, 256, 512, 1024, 128))

# for spatial 2D, with group norm
>>> net = BasicUNet(spatial_dims=2, features=(64, 128, 256, 512, 1024, 128), norm=("group", {"num_groups": 4}))

# for spatial 3D
>>> net = BasicUNet(spatial_dims=3, features=(32, 32, 64, 128, 256, 32))

BasicUNetPlusPlus#

class monai.networks.nets.BasicUNetPlusPlus(spatial_dims=3, in_channels=1, out_channels=2, features=(32, 32, 64, 128, 256, 32), deep_supervision=False, act=('LeakyReLU', {'inplace': True, 'negative_slope': 0.1}), norm=('instance', {'affine': True}), bias=True, dropout=0.0, upsample='deconv')[source]#

__init__(spatial_dims=3, in_channels=1, out_channels=2, features=(32, 32, 64, 128, 256, 32), deep_supervision=False, act=('LeakyReLU', {'inplace': True, 'negative_slope': 0.1}), norm=('instance', {'affine': True}), bias=True, dropout=0.0, upsample='deconv')[source]#

A UNet++ implementation with 1D/2D/3D supports.

Based on:

Zhou et al. “UNet++: A Nested U-Net Architecture for Medical Image Segmentation”. 4th Deep Learning in Medical Image Analysis (DLMIA) Workshop, DOI: https://doi.org/10.48550/arXiv.1807.10165

Parameters:

spatial_dims – number of spatial dimensions. Defaults to 3 for spatial 3D inputs.
in_channels – number of input channels. Defaults to 1.
out_channels – number of output channels. Defaults to 2.
features –
six integers as numbers of features. Defaults to (32, 32, 64, 128, 256, 32),
- the first five values correspond to the five-level encoder feature sizes.
- the last value corresponds to the feature size after the last upsampling.
deep_supervision – whether to prune the network at inference time. Defaults to False. If true, returns a list, whose elements correspond to outputs at different nodes.
act – activation type and arguments. Defaults to LeakyReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
bias –
whether to have a bias term in convolution blocks. Defaults to True. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
dropout – dropout ratio. Defaults to no dropout.
upsample – upsampling mode, available options are "deconv", "pixelshuffle", "nontrainable".

Examples:

# for spatial 2D
>>> net = BasicUNetPlusPlus(spatial_dims=2, features=(64, 128, 256, 512, 1024, 128))

# for spatial 2D, with deep supervision enabled
>>> net = BasicUNetPlusPlus(spatial_dims=2, features=(64, 128, 256, 512, 1024, 128), deep_supervision=True)

# for spatial 2D, with group norm
>>> net = BasicUNetPlusPlus(spatial_dims=2, features=(64, 128, 256, 512, 1024, 128), norm=("group", {"num_groups": 4}))

# for spatial 3D
>>> net = BasicUNetPlusPlus(spatial_dims=3, features=(32, 32, 64, 128, 256, 32))

See Also

monai.networks.nets.BasicUNet
monai.networks.nets.DynUNet
monai.networks.nets.UNet

forward(x)[source]#

Parameters:: x (Tensor) – input should have spatially N dimensions (Batch, in_channels, dim_0[, dim_1, ..., dim_N-1]), N is defined by dimensions. It is recommended to have dim_n % 16 == 0 to ensure all maxpooling inputs have even edge lengths.
Returns:: A torch Tensor of “raw” predictions in shape (Batch, out_channels, dim_0[, dim_1, ..., dim_N-1]).

monai.networks.nets.BasicUnetPlusPlus#: alias of BasicUNetPlusPlus

monai.networks.nets.BasicunetPlusPlus#: alias of BasicUNetPlusPlus

FlexibleUNet#

class monai.networks.nets.FlexibleUNet(in_channels, out_channels, backbone, pretrained=False, decoder_channels=(256, 128, 64, 32, 16), spatial_dims=2, norm=('batch', {'eps': 0.001, 'momentum': 0.1}), act=('relu', {'inplace': True}), dropout=0.0, decoder_bias=False, upsample='nontrainable', pre_conv='default', interp_mode='nearest', is_pad=True)[source]#

A flexible implementation of UNet-like encoder-decoder architecture.

__init__(in_channels, out_channels, backbone, pretrained=False, decoder_channels=(256, 128, 64, 32, 16), spatial_dims=2, norm=('batch', {'eps': 0.001, 'momentum': 0.1}), act=('relu', {'inplace': True}), dropout=0.0, decoder_bias=False, upsample='nontrainable', pre_conv='default', interp_mode='nearest', is_pad=True)[source]#

A flexible implement of UNet, in which the backbone/encoder can be replaced with any efficient or residual network. Currently the input must have a 2 or 3 spatial dimension and the spatial size of each dimension must be a multiple of 32 if is_pad parameter is False. Please notice each output of backbone must be 2x downsample in spatial dimension of last output. For example, if given a 512x256 2D image and a backbone with 4 outputs. Spatial size of each encoder output should be 256x128, 128x64, 64x32 and 32x16.

Parameters:

in_channels – number of input channels.
out_channels – number of output channels.
backbone – name of backbones to initialize, only support efficientnet and resnet right now, can be from [efficientnet-b0, …, efficientnet-b8, efficientnet-l2, resnet10, …, resnet200].
pretrained – whether to initialize pretrained weights. ImageNet weights are available for efficient networks if spatial_dims=2 and batch norm is used. MedicalNet weights are available for residual networks if spatial_dims=3 and in_channels=1. Default to False.
decoder_channels – number of output channels for all feature maps in decoder. len(decoder_channels) should equal to len(encoder_channels) - 1,default to (256, 128, 64, 32, 16).
spatial_dims – number of spatial dimensions, default to 2.
norm – normalization type and arguments, default to (“batch”, {“eps”: 1e-3, “momentum”: 0.1}).
act – activation type and arguments, default to (“relu”, {“inplace”: True}).
dropout – dropout ratio, default to 0.0.
decoder_bias – whether to have a bias term in decoder’s convolution blocks.
upsample – upsampling mode, available options are``”deconv”, ``"pixelshuffle", "nontrainable".
pre_conv – a conv block applied before upsampling. Only used in the “nontrainable” or “pixelshuffle” mode, default to default.
interp_mode – {"nearest", "linear", "bilinear", "bicubic", "trilinear"} Only used in the “nontrainable” mode.
is_pad – whether to pad upsampling features to fit features from encoder. Default to True. If this parameter is set to “True”, the spatial dim of network input can be arbitrary size, which is not supported by TensorRT. Otherwise, it must be a multiple of 32.

forward(inputs)[source]#

Do a typical encoder-decoder-header inference.

Parameters:: inputs (Tensor) – input should have spatially N dimensions (Batch, in_channels, dim_0[, dim_1, ..., dim_N]), N is defined by dimensions.
Returns:: A torch Tensor of “raw” predictions in shape (Batch, out_channels, dim_0[, dim_1, ..., dim_N]).

VNet#

class monai.networks.nets.VNet(spatial_dims=3, in_channels=1, out_channels=1, act=('elu', {'inplace': True}), dropout_prob=0.5, dropout_prob_down=0.5, dropout_prob_up=(0.5, 0.5), dropout_dim=3, bias=False)[source]#

V-Net based on Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Adapted from the official Caffe implementation. and another pytorch implementation. The model supports 2D or 3D inputs.

Parameters:

spatial_dims – spatial dimension of the input data. Defaults to 3.
in_channels – number of input channels for the network. Defaults to 1. The value should meet the condition that 16 % in_channels == 0.
out_channels – number of output channels for the network. Defaults to 1.
act – activation type in the network. Defaults to ("elu", {"inplace": True}).
dropout_prob_down – dropout ratio for DownTransition blocks. Defaults to 0.5.
dropout_prob_up – dropout ratio for UpTransition blocks. Defaults to (0.5, 0.5).
dropout_dim –
determine the dimensions of dropout. Defaults to (0.5, 0.5).
- dropout_dim = 1, randomly zeroes some of the elements for each channel.
- dropout_dim = 2, Randomly zeroes out entire channels (a channel is a 2D feature map).
- dropout_dim = 3, Randomly zeroes out entire channels (a channel is a 3D feature map).
bias –
whether to have a bias term in convolution blocks. Defaults to False. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.

Deprecated since version 1.2: dropout_prob is deprecated in favor of dropout_prob_down and dropout_prob_up.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

RegUNet#

class monai.networks.nets.RegUNet(spatial_dims, in_channels, num_channel_initial, depth, out_kernel_initializer='kaiming_uniform', out_activation=None, out_channels=3, extract_levels=None, pooling=True, concat_skip=False, encode_kernel_sizes=3)[source]#

Class that implements an adapted UNet. This class also serve as the parent class of LocalNet and GlobalNet

Reference:: O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,”, Lecture Notes in Computer Science, 2015, vol. 9351, pp. 234–241. https://arxiv.org/abs/1505.04597
Adapted from:: DeepReg (DeepRegNet/DeepReg)

__init__(spatial_dims, in_channels, num_channel_initial, depth, out_kernel_initializer='kaiming_uniform', out_activation=None, out_channels=3, extract_levels=None, pooling=True, concat_skip=False, encode_kernel_sizes=3)[source]#

Parameters:

spatial_dims – number of spatial dims
in_channels – number of input channels
num_channel_initial – number of initial channels
depth – input is at level 0, bottom is at level depth.
out_kernel_initializer – kernel initializer for the last layer
out_activation – activation at the last layer
out_channels – number of channels for the output
extract_levels – list, which levels from net to extract. The maximum level must equal to depth
pooling – for down-sampling, use non-parameterized pooling if true, otherwise use conv
concat_skip – when up-sampling, concatenate skipped tensor if true, otherwise use addition
encode_kernel_sizes – kernel size for down-sampling

forward(x)[source]#

Parameters:: x – Tensor in shape (batch, in_channels, insize_1, insize_2, [insize_3])
Returns:: Tensor in shape (batch, out_channels, insize_1, insize_2, [insize_3]), with the same spatial size as x

GlobalNet#

class monai.networks.nets.GlobalNet(image_size, spatial_dims, in_channels, num_channel_initial, depth, out_kernel_initializer='kaiming_uniform', out_activation=None, pooling=True, concat_skip=False, encode_kernel_sizes=3, save_theta=False)[source]#

Build GlobalNet for image registration.

Reference:: Hu, Yipeng, et al. “Label-driven weakly-supervised learning for multimodal deformable image registration,” https://arxiv.org/abs/1711.01666

__init__(image_size, spatial_dims, in_channels, num_channel_initial, depth, out_kernel_initializer='kaiming_uniform', out_activation=None, pooling=True, concat_skip=False, encode_kernel_sizes=3, save_theta=False)[source]#

Parameters:

image_size – output displacement field spatial size
spatial_dims – number of spatial dims
in_channels – number of input channels
num_channel_initial – number of initial channels
depth – input is at level 0, bottom is at level depth.
out_kernel_initializer – kernel initializer for the last layer
out_activation – activation at the last layer
pooling – for down-sampling, use non-parameterized pooling if true, otherwise use conv
concat_skip – when up-sampling, concatenate skipped tensor if true, otherwise use addition
encode_kernel_sizes – kernel size for down-sampling
save_theta – whether to save the theta matrix estimation

LocalNet#

class monai.networks.nets.LocalNet(spatial_dims, in_channels, num_channel_initial, extract_levels, out_kernel_initializer='kaiming_uniform', out_activation=None, out_channels=3, pooling=True, use_additive_sampling=True, concat_skip=False, mode='nearest', align_corners=None)[source]#

Reimplementation of LocalNet, based on: Weakly-supervised convolutional neural networks for multimodal image registration. Label-driven weakly-supervised learning for multimodal deformable image registration.

Adapted from:: DeepReg (DeepRegNet/DeepReg)

__init__(spatial_dims, in_channels, num_channel_initial, extract_levels, out_kernel_initializer='kaiming_uniform', out_activation=None, out_channels=3, pooling=True, use_additive_sampling=True, concat_skip=False, mode='nearest', align_corners=None)[source]#

Parameters:

spatial_dims – number of spatial dims
in_channels – number of input channels
num_channel_initial – number of initial channels
out_kernel_initializer – kernel initializer for the last layer
out_activation – activation at the last layer
out_channels – number of channels for the output
extract_levels – list, which levels from net to extract. The maximum level must equal to depth
pooling – for down-sampling, use non-parameterized pooling if true, otherwise use conv3d
use_additive_sampling – whether use additive up-sampling layer for decoding.
concat_skip – when up-sampling, concatenate skipped tensor if true, otherwise use addition
mode – mode for interpolation when use_additive_sampling, default is “nearest”.
align_corners – align_corners for interpolation when use_additive_sampling, default is None.

AutoEncoder#

class monai.networks.nets.AutoEncoder(spatial_dims, in_channels, out_channels, channels, strides, kernel_size=3, up_kernel_size=3, num_res_units=0, inter_channels=None, inter_dilations=None, num_inter_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True, padding=None)[source]#

Simple definition of an autoencoder and base class for the architecture implementing monai.networks.nets.VarAutoEncoder. The network is composed of an encode sequence of blocks, followed by an intermediary sequence of blocks, and finally a decode sequence of blocks. The encode and decode blocks are default monai.networks.blocks.Convolution instances with the encode blocks having the given stride and the decode blocks having transpose convolutions with the same stride. If num_res_units is given residual blocks are used instead.

By default the intermediary sequence is empty but if inter_channels is given to specify the output channels of blocks then this will be become a sequence of Convolution blocks or of residual blocks if num_inter_units is given. The optional parameter inter_dilations can be used to specify the dilation values of the convolutions in these blocks, this allows a network to use dilated kernels in this middle section. Since the intermediary section isn’t meant to change the size of the output the strides for all these kernels is 1.

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of input channels.
out_channels – number of output channels.
channels – sequence of channels. Top block first. The length of channels should be no less than 2.
strides – sequence of convolution strides. The length of stride should equal to len(channels) - 1.
kernel_size – convolution kernel size, the value(s) should be odd. If sequence, its length should equal to dimensions. Defaults to 3.
up_kernel_size – upsampling convolution kernel size, the value(s) should be odd. If sequence, its length should equal to dimensions. Defaults to 3.
num_res_units – number of residual units. Defaults to 0.
inter_channels – sequence of channels defining the blocks in the intermediate layer between encode and decode.
inter_dilations – defines the dilation value for each block of the intermediate layer. Defaults to 1.
num_inter_units – number of residual units for each block of the intermediate layer. Defaults to 0.
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
dropout – dropout ratio. Defaults to no dropout.
bias –
whether to have a bias term in convolution blocks. Defaults to True. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
padding – controls the amount of implicit zero-paddings on both sides for padding number of points for each dimension in convolution blocks. Defaults to None.

Examples:

from monai.networks.nets import AutoEncoder

# 3 layers each down/up sampling their inputs by a factor 2 with no intermediate layer
net = AutoEncoder(
    spatial_dims=2,
    in_channels=1,
    out_channels=1,
    channels=(2, 4, 8),
    strides=(2, 2, 2)
)

# 1 layer downsampling by 2, followed by a sequence of residual units with 2 convolutions defined by
# progressively increasing dilations, then final upsample layer
net = AutoEncoder(
        spatial_dims=2,
        in_channels=1,
        out_channels=1,
        channels=(4,),
        strides=(2,),
        inter_channels=(8, 8, 8),
        inter_dilations=(1, 2, 4),
        num_inter_units=2
    )

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Any

VarAutoEncoder#

class monai.networks.nets.VarAutoEncoder(spatial_dims, in_shape, out_channels, latent_size, channels, strides, kernel_size=3, up_kernel_size=3, num_res_units=0, inter_channels=None, inter_dilations=None, num_inter_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True, use_sigmoid=True)[source]#

Variational Autoencoder based on the paper - https://arxiv.org/abs/1312.6114

Parameters:

spatial_dims – number of spatial dimensions.
in_shape – shape of input data starting with channel dimension.
out_channels – number of output channels.
latent_size – size of the latent variable.
channels – sequence of channels. Top block first. The length of channels should be no less than 2.
strides – sequence of convolution strides. The length of stride should equal to len(channels) - 1.
kernel_size – convolution kernel size, the value(s) should be odd. If sequence, its length should equal to dimensions. Defaults to 3.
up_kernel_size – upsampling convolution kernel size, the value(s) should be odd. If sequence, its length should equal to dimensions. Defaults to 3.
num_res_units – number of residual units. Defaults to 0.
inter_channels – sequence of channels defining the blocks in the intermediate layer between encode and decode.
inter_dilations – defines the dilation value for each block of the intermediate layer. Defaults to 1.
num_inter_units – number of residual units for each block of the intermediate layer. Defaults to 0.
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
dropout – dropout ratio. Defaults to no dropout.
bias –
whether to have a bias term in convolution blocks. Defaults to True. According to Performance Tuning Guide, if a conv layer is directly followed by a batch norm layer, bias should be False.
use_sigmoid – whether to use the sigmoid function on final output. Defaults to True.

Examples:

from monai.networks.nets import VarAutoEncoder

# 3 layer network accepting images with dimensions (1, 32, 32) and using a latent vector with 2 values
model = VarAutoEncoder(
    spatial_dims=2,
    in_shape=(32, 32),  # image spatial shape
    out_channels=1,
    latent_size=2,
    channels=(16, 32, 64),
    strides=(1, 2, 2),
)

See also

Variational autoencoder network with MedNIST Dataset Project-MONAI/tutorials

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: tuple[Tensor, Tensor, Tensor, Tensor]

ViT#

class monai.networks.nets.ViT(in_channels, img_size, patch_size, hidden_size=768, mlp_dim=3072, num_layers=12, num_heads=12, pos_embed='conv', proj_type='conv', pos_embed_type='learnable', classification=False, num_classes=2, dropout_rate=0.0, spatial_dims=3, post_activation='Tanh', qkv_bias=False, save_attn=False)[source]#

Vision Transformer (ViT), based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”

ViT supports Torchscript but only works for Pytorch after 1.8.

__init__(in_channels, img_size, patch_size, hidden_size=768, mlp_dim=3072, num_layers=12, num_heads=12, pos_embed='conv', proj_type='conv', pos_embed_type='learnable', classification=False, num_classes=2, dropout_rate=0.0, spatial_dims=3, post_activation='Tanh', qkv_bias=False, save_attn=False)[source]#

Parameters:

in_channels (int) – dimension of input channels.
img_size (Union[Sequence[int], int]) – dimension of input image.
patch_size (Union[Sequence[int], int]) – dimension of patch size.
hidden_size (int, optional) – dimension of hidden layer. Defaults to 768.
mlp_dim (int, optional) – dimension of feedforward layer. Defaults to 3072.
num_layers (int, optional) – number of transformer blocks. Defaults to 12.
num_heads (int, optional) – number of attention heads. Defaults to 12.
proj_type (str, optional) – patch embedding layer type. Defaults to “conv”.
pos_embed_type (str, optional) – position embedding type. Defaults to “learnable”.
classification (bool, optional) – bool argument to determine if classification is used. Defaults to False.
num_classes (int, optional) – number of classes if classification is used. Defaults to 2.
dropout_rate (float, optional) – fraction of the input units to drop. Defaults to 0.0.
spatial_dims (int, optional) – number of spatial dimensions. Defaults to 3.
post_activation (str, optional) – add a final acivation function to the classification head when classification is True. Default to “Tanh” for nn.Tanh(). Set to other values to remove this function.
qkv_bias (bool, optional) – apply bias to the qkv linear layer in self attention block. Defaults to False.
save_attn (bool, optional) – to make accessible the attention in self attention block. Defaults to False.

Deprecated since version 1.4: pos_embed is deprecated in favor of proj_type.

Examples:

# for single channel input with image size of (96,96,96), conv position embedding and segmentation backbone
>>> net = ViT(in_channels=1, img_size=(96,96,96), proj_type='conv', pos_embed_type='sincos')

# for 3-channel with image size of (128,128,128), 24 layers and classification backbone
>>> net = ViT(in_channels=3, img_size=(128,128,128), proj_type='conv', pos_embed_type='sincos', classification=True)

# for 3-channel with image size of (224,224), 12 layers and classification backbone
>>> net = ViT(in_channels=3, img_size=(224,224), proj_type='conv', pos_embed_type='sincos', classification=True,
>>>           spatial_dims=2)

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

ViTAutoEnc#

class monai.networks.nets.ViTAutoEnc(in_channels, img_size, patch_size, out_channels=1, deconv_chns=16, hidden_size=768, mlp_dim=3072, num_layers=12, num_heads=12, pos_embed='conv', proj_type='conv', dropout_rate=0.0, spatial_dims=3, qkv_bias=False, save_attn=False)[source]#

Vision Transformer (ViT), based on: “Dosovitskiy et al., An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>”

Modified to also give same dimension outputs as the input size of the image

__init__(in_channels, img_size, patch_size, out_channels=1, deconv_chns=16, hidden_size=768, mlp_dim=3072, num_layers=12, num_heads=12, pos_embed='conv', proj_type='conv', dropout_rate=0.0, spatial_dims=3, qkv_bias=False, save_attn=False)[source]#

Parameters:

in_channels – dimension of input channels or the number of channels for input.
img_size – dimension of input image.
patch_size – dimension of patch size
out_channels – number of output channels. Defaults to 1.
deconv_chns – number of channels for the deconvolution layers. Defaults to 16.
hidden_size – dimension of hidden layer. Defaults to 768.
mlp_dim – dimension of feedforward layer. Defaults to 3072.
num_layers – number of transformer blocks. Defaults to 12.
num_heads – number of attention heads. Defaults to 12.
proj_type – position embedding layer type. Defaults to “conv”.
dropout_rate – fraction of the input units to drop. Defaults to 0.0.
spatial_dims – number of spatial dimensions. Defaults to 3.
qkv_bias – apply bias to the qkv linear layer in self attention block. Defaults to False.
save_attn – to make accessible the attention in self attention block. Defaults to False. Defaults to False.

Deprecated since version 1.4: pos_embed is deprecated in favor of proj_type.

Examples:

# for single channel input with image size of (96,96,96), conv position embedding and segmentation backbone
# It will provide an output of same size as that of the input
>>> net = ViTAutoEnc(in_channels=1, patch_size=(16,16,16), img_size=(96,96,96), proj_type='conv')

# for 3-channel with image size of (128,128,128), output will be same size as of input
>>> net = ViTAutoEnc(in_channels=3, patch_size=(16,16,16), img_size=(128,128,128), proj_type='conv')

forward(x)[source]#

Parameters:: x – input tensor must have isotropic spatial dimensions, such as [batch_size, channels, sp_size, sp_size[, sp_size]].

FullyConnectedNet#

class monai.networks.nets.FullyConnectedNet(in_channels, out_channels, hidden_channels, dropout=None, act='PRELU', bias=True, adn_ordering=None)[source]#

Simple full-connected layer neural network composed of a sequence of linear layers with PReLU activation and dropout. The network accepts input with in_channels channels, has output with out_channels channels, and hidden layer output channels given in hidden_channels. If bias is True then linear units have a bias term.

Parameters:

in_channels – number of input channels.
out_channels – number of output channels.
hidden_channels – number of output channels for each hidden layer.
dropout – dropout ratio. Defaults to no dropout.
act – activation type and arguments. Defaults to PReLU.
bias – whether to have a bias term in linear units. Defaults to True.
adn_ordering – order of operations in monai.networks.blocks.ADN.

Examples:

# accepts 4 values and infers 3 values as output, has 3 hidden layers with 10, 20, 10 values as output
net = FullyConnectedNet(4, 3, [10, 20, 10], dropout=0.2)

__init__(in_channels, out_channels, hidden_channels, dropout=None, act='PRELU', bias=True, adn_ordering=None)[source]#: Defines a network accept input with in_channels channels, output of out_channels channels, and hidden layers with channels given in hidden_channels. If bias is True then linear units have a bias term.

VarFullyConnectedNet#

class monai.networks.nets.VarFullyConnectedNet(in_channels, out_channels, latent_size, encode_channels, decode_channels, dropout=None, act='PRELU', bias=True, adn_ordering=None)[source]#

Variational fully-connected network. This is composed of an encode layer, reparameterization layer, and then a decode layer.

Parameters:

in_channels – number of input channels.
out_channels – number of output channels.
latent_size – number of latent variables to use.
encode_channels – number of output channels for each hidden layer of the encode half.
decode_channels – number of output channels for each hidden layer of the decode half.
dropout – dropout ratio. Defaults to no dropout.
act – activation type and arguments. Defaults to PReLU.
bias – whether to have a bias term in linear units. Defaults to True.
adn_ordering – order of operations in monai.networks.blocks.ADN.

Examples:

# accepts inputs with 4 values, uses a latent space of 2 variables, and produces outputs of 3 values
net = VarFullyConnectedNet(4, 3, 2, [5, 10], [10, 5])

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: tuple[Tensor, Tensor, Tensor, Tensor]

Generator#

class monai.networks.nets.Generator(latent_shape, start_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True)[source]#

Defines a simple generator network accepting a latent vector and through a sequence of convolution layers constructs an output tensor of greater size and high dimensionality. The method _get_layer is used to create each of these layers, override this method to define layers beyond the default monai.networks.blocks.Convolution or monai.networks.blocks.ResidualUnit layers.

The layers are constructed using the values in the channels and strides arguments, the number being defined by the length of these (which must match). Input is first passed through a torch.nn.Linear layer to convert the input vector to an image tensor with dimensions start_shape. This passes through the convolution layers and is progressively upsampled if the strides values are greater than 1 using transpose convolutions. The size of the final output is defined by the start_shape dimension and the amount of upsampling done through strides. In the default definition the size of the output’s spatial dimensions will be that of start_shape multiplied by the product of strides, thus the example network below upsamples an starting size of (64, 8, 8) to (1, 64, 64) since its strides are (2, 2, 2).

Parameters:

latent_shape – tuple of integers stating the dimension of the input latent vector (minus batch dimension)
start_shape – tuple of integers stating the dimension of the tensor to pass to convolution subnetwork
channels – tuple of integers stating the output channels of each convolutional layer
strides – tuple of integers stating the stride (upscale factor) of each convolutional layer
kernel_size – integer or tuple of integers stating size of convolutional kernels
num_res_units – integer stating number of convolutions in residual units, 0 means no residual units
act – name or type defining activation layers
norm – name or type defining normalization layers
dropout – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout
bias – boolean stating if convolution layers should have a bias component

Examples:

# 3 layers, latent input vector of shape (42, 24), output volume of shape (1, 64, 64)
net = Generator((42, 24), (64, 8, 8), (32, 16, 1), (2, 2, 2))

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

Regressor#

class monai.networks.nets.Regressor(in_shape, out_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True)[source]#

This defines a network for relating large-sized input tensors to small output tensors, ie. regressing large values to a prediction. An output of a single dimension can be used as value regression or multi-label classification prediction, an output of a single value can be used as a discriminator or critic prediction.

The network is constructed as a sequence of layers, either monai.networks.blocks.Convolution or monai.networks.blocks.ResidualUnit, with a final fully-connected layer resizing the output from the blocks to the final size. Each block is defined with a stride value typically used to downsample the input using strided convolutions. In this way each block progressively condenses information from the input into a deep representation the final fully-connected layer relates to a final result.

Parameters:

in_shape – tuple of integers stating the dimension of the input tensor (minus batch dimension)
out_shape – tuple of integers stating the dimension of the final output tensor (minus batch dimension)
channels – tuple of integers stating the output channels of each convolutional layer
strides – tuple of integers stating the stride (downscale factor) of each convolutional layer
kernel_size – integer or tuple of integers stating size of convolutional kernels
num_res_units – integer stating number of convolutions in residual units, 0 means no residual units
act – name or type defining activation layers
norm – name or type defining normalization layers
dropout – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout
bias – boolean stating if convolution layers should have a bias component

Examples:

# infers a 2-value result (eg. a 2D cartesian coordinate) from a 64x64 image
net = Regressor((1, 64, 64), (2,), (2, 4, 8), (2, 2, 2))

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

Classifier#

class monai.networks.nets.Classifier(in_shape, classes, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True, last_act=None)[source]#

Defines a classification network from Regressor by specifying the output shape as a single dimensional tensor with size equal to the number of classes to predict. The final activation function can also be specified, eg. softmax or sigmoid.

Parameters:

in_shape – tuple of integers stating the dimension of the input tensor (minus batch dimension)
classes – integer stating the dimension of the final output tensor
channels – tuple of integers stating the output channels of each convolutional layer
strides – tuple of integers stating the stride (downscale factor) of each convolutional layer
kernel_size – integer or tuple of integers stating size of convolutional kernels
num_res_units – integer stating number of convolutions in residual units, 0 means no residual units
act – name or type defining activation layers
norm – name or type defining normalization layers
dropout – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout
bias – boolean stating if convolution layers should have a bias component
last_act – name defining the last activation layer

Discriminator#

class monai.networks.nets.Discriminator(in_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=0.25, bias=True, last_act='SIGMOID')[source]#

Defines a discriminator network from Classifier with a single output value and sigmoid activation by default. This is meant for use with GANs or other applications requiring a generic discriminator network.

Parameters:

in_shape – tuple of integers stating the dimension of the input tensor (minus batch dimension)
channels – tuple of integers stating the output channels of each convolutional layer
strides – tuple of integers stating the stride (downscale factor) of each convolutional layer
kernel_size – integer or tuple of integers stating size of convolutional kernels
num_res_units – integer stating number of convolutions in residual units, 0 means no residual units
act – name or type defining activation layers
norm – name or type defining normalization layers
dropout – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout
bias – boolean stating if convolution layers should have a bias component
last_act – name defining the last activation layer

Critic#

class monai.networks.nets.Critic(in_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=0.25, bias=True)[source]#

Defines a critic network from Classifier with a single output value and no final activation. The final layer is nn.Flatten instead of nn.Linear, the final result is computed as the mean over the first dimension. This is meant to be used with Wasserstein GANs.

Parameters:

in_shape – tuple of integers stating the dimension of the input tensor (minus batch dimension)
channels – tuple of integers stating the output channels of each convolutional layer
strides – tuple of integers stating the stride (downscale factor) of each convolutional layer
kernel_size – integer or tuple of integers stating size of convolutional kernels
num_res_units – integer stating number of convolutions in residual units, 0 means no residual units
act – name or type defining activation layers
norm – name or type defining normalization layers
dropout – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout
bias – boolean stating if convolution layers should have a bias component

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

Transchex#

class monai.networks.nets.Transchex(in_channels, img_size, patch_size, num_classes, num_language_layers, num_vision_layers, num_mixed_layers, hidden_size=768, drop_out=0.0, attention_probs_dropout_prob=0.1, gradient_checkpointing=False, hidden_act='gelu', hidden_dropout_prob=0.1, initializer_range=0.02, intermediate_size=3072, layer_norm_eps=1e-12, max_position_embeddings=512, model_type='bert', num_attention_heads=12, num_hidden_layers=12, pad_token_id=0, position_embedding_type='absolute', transformers_version='4.10.2', type_vocab_size=2, use_cache=True, vocab_size=30522, chunk_size_feed_forward=0, is_decoder=False, add_cross_attention=False, path_or_repo_id='bert-base-uncased', filename='pytorch_model.bin')[source]#

TransChex based on: “Hatamizadeh et al.,TransCheX: Self-Supervised Pretraining of Vision-Language Transformers for Chest X-ray Analysis”

__init__(in_channels, img_size, patch_size, num_classes, num_language_layers, num_vision_layers, num_mixed_layers, hidden_size=768, drop_out=0.0, attention_probs_dropout_prob=0.1, gradient_checkpointing=False, hidden_act='gelu', hidden_dropout_prob=0.1, initializer_range=0.02, intermediate_size=3072, layer_norm_eps=1e-12, max_position_embeddings=512, model_type='bert', num_attention_heads=12, num_hidden_layers=12, pad_token_id=0, position_embedding_type='absolute', transformers_version='4.10.2', type_vocab_size=2, use_cache=True, vocab_size=30522, chunk_size_feed_forward=0, is_decoder=False, add_cross_attention=False, path_or_repo_id='bert-base-uncased', filename='pytorch_model.bin')[source]#

Parameters:

in_channels – dimension of input channels.
img_size – dimension of input image.
patch_size – dimension of patch size.
num_classes – number of classes if classification is used.
num_language_layers – number of language transformer layers.
num_vision_layers – number of vision transformer layers.
num_mixed_layers – number of mixed transformer layers.
drop_out – fraction of the input units to drop.
path_or_repo_id – This can be either: - a string, the model id of a model repo on huggingface.co. - a path to a directory potentially containing the file.
filename – The name of the file to locate in path_or_repo.

The other parameters are part of the bert_config to MultiModal.from_pretrained.

Examples:

# for 3-channel with image size of (224,224), patch size of (32,32), 3 classes, 2 language layers,
# 2 vision layers, 2 mixed modality layers and dropout of 0.2 in the classification head
net = Transchex(in_channels=3,
                     img_size=(224, 224),
                     num_classes=3,
                     num_language_layers=2,
                     num_vision_layers=2,
                     num_mixed_layers=2,
                     drop_out=0.2)

forward(input_ids, token_type_ids=None, vision_feats=None)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

NetAdapter#

class monai.networks.nets.NetAdapter(model, num_classes=1, dim=2, in_channels=None, use_conv=False, pool=('avg', {'kernel_size': 7, 'stride': 1}), bias=True, fc_name='fc', node_name='')[source]#

Wrapper to replace the last layer of model by convolutional layer or FC layer.

Parameters:

model – a PyTorch model, which can be both 2D and 3D models. typically, it can be a pretrained model in Torchvision, like: resnet18, resnet34, resnet50, resnet101, resnet152, etc. more details: https://pytorch.org/vision/stable/models.html.
num_classes – number of classes for the last classification layer. Default to 1.
dim – number of supported spatial dimensions in the specified model, depends on the model implementation. default to 2 as most Torchvision models are for 2D image processing.
in_channels – number of the input channels of last layer. if None, get it from in_features of last layer.
use_conv – whether to use convolutional layer to replace the last layer, default to False.
pool – parameters for the pooling layer, it should be a tuple, the first item is name of the pooling layer, the second item is dictionary of the initialization args. if None, will not replace the layers[-2]. default to (“avg”, {“kernel_size”: 7, “stride”: 1}).
bias – the bias value when replacing the last layer. if False, the layer will not learn an additive bias, default to True.
fc_name – the corresponding layer attribute of the last fully connected layer. Defaults to "fc".
node_name – the corresponding feature extractor node name of model. Defaults to “”, the extractor is not in use.

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

TorchVisionFCModel#

class monai.networks.nets.TorchVisionFCModel(model_name='resnet18', num_classes=1, dim=2, in_channels=None, use_conv=False, pool=('avg', {'kernel_size': 7, 'stride': 1}), bias=True, pretrained=False, fc_name='fc', node_name='', weights=None, **kwargs)[source]#

Customize the fully connected layer of (pretrained) TorchVision model or replace it by convolutional layer.

This class supports two primary use cases:

use pool=None to indicate no modification in the pooling layers. It should be used with fc_name to locate the target FC layer to modify: In this case, the class will load a torchvision classification model, replace the last fully connected (FC) layer with a new FC layer with num_classes outputs, example input arguments: use_conv=False, pool=None, fc_name="heads.head". The heads.head specifies the target FC of the input model, could be found by model.named_modules(), for example:
from torchvision.models import vit_b_16
print([name[0] for name in vit_b_16().named_modules()])
use pool="" or set it to a tuple of pooling parameters to indicate modifications of both the pooling and the FC layer. It should be used with node_name to locate the model feature outputs: In this case, the class will load a torchvision model, remove the existing pooling and FC layers, and

append an additional convolution layer: use_conv=True, pool="", node_name="permute"

append an additional pooling and FC layers: use_conv=False, pool=("avg", {"kernel_size": 7, "stride": 1}), node_name="permute"

append an additional pooling and convolution layers: use_conv=True, pool=("avg", {"kernel_size": 7, "stride": 1}), node_name="permute"

The permute in the example is the target feature extraction node of the input model_name, could be found by using the torchvision feature extraction utilities, for example:
from torchvision.models.feature_extraction import get_graph_node_names
from torchvision.models import swin_t
print(get_graph_node_names(swin_t())[0])

Parameters:

model_name – name of any torchvision model with fully connected layer at the end. resnet18 (default), resnet34, resnet50, resnet101, resnet152, resnext50_32x4d, resnext101_32x8d, wide_resnet50_2, wide_resnet101_2, inception_v3. model details: https://pytorch.org/vision/stable/models.html.
num_classes – number of classes for the last classification layer. Default to 1.
dim – number of supported spatial dimensions in the specified model, depends on the model implementation. default to 2 as most Torchvision models are for 2D image processing.
in_channels – number of the input channels of last layer. if None, get it from in_features of last layer.
use_conv – whether to use convolutional layer to replace the last layer, default to False.
pool – parameters for the pooling layer, when it’s a tuple, the first item is name of the pooling layer, the second item is dictionary of the initialization args. If None, will not replace the layers[-2]. default to (“avg”, {“kernel_size”: 7, “stride”: 1}). "" indicates not adding a pooling layer.
bias – the bias value when replacing the last layer. if False, the layer will not learn an additive bias, default to True.
pretrained – whether to use the imagenet pretrained weights. Default to False.
fc_name – the corresponding layer attribute of the last fully connected layer. Defaults to "fc".
node_name – the corresponding feature extractor node name of model. Defaults to “”, not in use.
weights – additional weights enum for the torchvision model.
kwargs – additional parameters for the torchvision model.

Example:

import torch
from torchvision.models.inception import Inception_V3_Weights

from monai.networks.nets import TorchVisionFCModel

model = TorchVisionFCModel(
    "inception_v3",
    num_classes=4,
    weights=Inception_V3_Weights.IMAGENET1K_V1,
    use_conv=False,
    pool=None,
)
# model = TorchVisionFCModel("vit_b_16", num_classes=4, pool=None, in_channels=768, fc_name="heads")
output = model.forward(torch.randn(2, 3, 299, 299))
print(output.shape)  # torch.Size([2, 4])

MILModel#

class monai.networks.nets.MILModel(num_classes, mil_mode='att', pretrained=True, backbone=None, backbone_num_features=None, trans_blocks=4, trans_dropout=0.0)[source]#

Multiple Instance Learning (MIL) model, with a backbone classification model. Currently, it only works for 2D images, a typical use case is for classification of the digital pathology whole slide images. The expected shape of input data is [B, N, C, H, W], where B is the batch_size of PyTorch Dataloader and N is the number of instances extracted from every original image in the batch. A tutorial example is available at: Project-MONAI/tutorials.

Parameters:

num_classes – number of output classes.
mil_mode –
MIL algorithm, available values (Defaults to "att"):
- "mean" - average features from all instances, equivalent to pure CNN (non MIL).
- "max" - retain only the instance with the max probability for loss calculation.
- "att" - attention based MIL https://arxiv.org/abs/1802.04712.
- "att_trans" - transformer MIL https://arxiv.org/abs/2111.01556.
- "att_trans_pyramid" - transformer pyramid MIL https://arxiv.org/abs/2111.01556.
pretrained – init backbone with pretrained weights, defaults to True.
backbone – Backbone classifier CNN (either None, a nn.Module that returns features, or a string name of a torchvision model). Defaults to None, in which case ResNet50 is used.
backbone_num_features – Number of output features of the backbone CNN Defaults to None (necessary only when using a custom backbone)
trans_blocks – number of the blocks in TransformEncoder layer.
trans_dropout – dropout rate in TransformEncoder layer.

forward(x, no_head=False)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

DiNTS#

class monai.networks.nets.DiNTS(dints_space, in_channels, num_classes, act_name='RELU', norm_name=('INSTANCE', {'affine': True}), spatial_dims=3, use_downsample=True, node_a=None)[source]#

Reimplementation of DiNTS based on “DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation <https://arxiv.org/abs/2103.15954>”.

The model contains a pre-defined multi-resolution stem block (defined in this class) and a DiNTS space (defined in monai.networks.nets.TopologyInstance and monai.networks.nets.TopologySearch).

The stem block is for: 1) input downsample and 2) output upsample to original size. The model downsamples the input image by 2 (if use_downsample=True). The downsampled image is downsampled by [1, 2, 4, 8] times (num_depths=4) and used as input to the DiNTS search space (TopologySearch) or the DiNTS instance (TopologyInstance).

TopologyInstance is the final searched model. The initialization requires the searched architecture codes.

TopologySearch is a multi-path topology and cell operation search space. The architecture codes will be initialized as one.

TopologyConstruction is the parent class which constructs the instance and search space.

To meet the requirements of the structure, the input size for each spatial dimension should be: divisible by 2 ** (num_depths + 1).

Parameters:

dints_space – DiNTS search space. The value should be instance of TopologyInstance or TopologySearch.
in_channels – number of input image channels.
num_classes – number of output segmentation classes.
act_name – activation name, default to ‘RELU’.
norm_name – normalization used in convolution blocks. Default to InstanceNorm.
spatial_dims – spatial 2D or 3D inputs.
use_downsample – use downsample in the stem. If False, the search space will be in resolution [1, 1/2, 1/4, 1/8], if True, the search space will be in resolution [1/2, 1/4, 1/8, 1/16].
node_a – node activation numpy matrix. Its shape is (num_depths, num_blocks + 1). +1 for multi-resolution inputs. In model searching stage, node_a can be None. In deployment stage, node_a cannot be None.

forward(x)[source]#

Prediction based on dynamic arch_code.

Parameters:: x (Tensor) – input tensor.

TopologyConstruction for DiNTS#

class monai.networks.nets.TopologyConstruction(arch_code=None, channel_mul=1.0, cell=<class 'monai.networks.nets.dints.Cell'>, num_blocks=6, num_depths=3, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}), use_downsample=True, device='cpu')[source]#

The base class for TopologyInstance and TopologySearch.

Parameters:

arch_code –
[arch_code_a, arch_code_c], numpy arrays. The architecture codes defining the model. For example, for a num_depths=4, num_blocks=12 search space:
- arch_code_a is a 12x10 (10 paths) binary matrix representing if a path is activated.
- arch_code_c is a 12x10x5 (5 operations) binary matrix representing if a cell operation is used.
- arch_code in __init__() is used for creating the network and remove unused network blocks. If None,
all paths and cells operations will be used, and must be in the searching stage (is_search=True).
channel_mul – adjust intermediate channel number, default is 1.
cell – operation of each node.
num_blocks – number of blocks (depth in the horizontal direction) of the DiNTS search space.
num_depths – number of image resolutions of the DiNTS search space: 1, 1/2, 1/4 … in each dimension.
use_downsample – use downsample in the stem. If False, the search space will be in resolution [1, 1/2, 1/4, 1/8], if True, the search space will be in resolution [1/2, 1/4, 1/8, 1/16].
device – ‘cpu’, ‘cuda’, or device ID.

Predefined variables:

filter_nums: default to 32. Double the number of channels after downsample. topology related variables:

arch_code2in: path activation to its incoming node index (resolution). For depth = 4, arch_code2in = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3]. The first path outputs from node 0 (top resolution), the second path outputs from node 1 (second resolution in the search space), the third path outputs from node 0, etc.

arch_code2ops: path activation to operations of upsample 1, keep 0, downsample -1. For depth = 4, arch_code2ops = [0, 1, -1, 0, 1, -1, 0, 1, -1, 0]. The first path does not change resolution, the second path perform upsample, the third perform downsample, etc.

arch_code2out: path activation to its output node index. For depth = 4, arch_code2out = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3], the first and second paths connects to node 0 (top resolution), the 3,4,5 paths connects to node 1, etc.

forward(x)[source]#: This function to be implemented by the architecture instances or search spaces.

TopologyInstance for DiNTS#

class monai.networks.nets.TopologyInstance(arch_code=None, channel_mul=1.0, cell=<class 'monai.networks.nets.dints.Cell'>, num_blocks=6, num_depths=3, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}), use_downsample=True, device='cpu')[source]#

Instance of the final searched architecture. Only used in re-training/inference stage.

__init__(arch_code=None, channel_mul=1.0, cell=<class 'monai.networks.nets.dints.Cell'>, num_blocks=6, num_depths=3, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}), use_downsample=True, device='cpu')[source]#: Initialize DiNTS topology search space of neural architectures.

forward(x)[source]#

Parameters:: x (list[Tensor]) – input tensor.
Return type:: list[Tensor]

TopologySearch for DiNTS#

class monai.networks.nets.TopologySearch(channel_mul=1.0, cell=<class 'monai.networks.nets.dints.Cell'>, arch_code=None, num_blocks=6, num_depths=3, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}), use_downsample=True, device='cpu')[source]#

DiNTS topology search space of neural architectures.

Examples:

from monai.networks.nets.dints import TopologySearch

topology_search_space = TopologySearch(
    channel_mul=0.5, num_blocks=8, num_depths=4, use_downsample=True, spatial_dims=3)
topology_search_space.get_ram_cost_usage(in_size=(2, 16, 80, 80, 80), full=True)
multi_res_images = [
    torch.randn(2, 16, 80, 80, 80),
    torch.randn(2, 32, 40, 40, 40),
    torch.randn(2, 64, 20, 20, 20),
    torch.randn(2, 128, 10, 10, 10)]
prediction = topology_search_space(image)
for x in prediction: print(x.shape)
# torch.Size([2, 16, 80, 80, 80])
# torch.Size([2, 32, 40, 40, 40])
# torch.Size([2, 64, 20, 20, 20])
# torch.Size([2, 128, 10, 10, 10])

Class method overview:

get_prob_a(): convert learnable architecture weights to path activation probabilities.

get_ram_cost_usage(): get estimated ram cost.

get_topology_entropy(): get topology entropy loss in searching stage.

decode(): get final binarized architecture code.

gen_mtx(): generate variables needed for topology search.

Predefined variables:

tidx: index used to convert path activation matrix T = (depth,depth) in transfer_mtx to path activation arch_code (1,3*depth-2), for depth = 4, tidx = [0, 1, 4, 5, 6, 9, 10, 11, 14, 15], A tidx (10 binary values) represents the path activation.
transfer_mtx: feasible path activation matrix (denoted as T) given a node activation pattern. It is used to convert path activation pattern (1, paths) to node activation (1, nodes)
node_act_list: all node activation [2^num_depths-1, depth]. For depth = 4, there are 15 node activation patterns, each of length 4. For example, [1,1,0,0] means nodes 0, 1 are activated (with input paths).
all_connect: All possible path activations. For depth = 4, all_connection has 1024 vectors of length 10 (10 paths). The return value will exclude path activation of all 0.

__init__(channel_mul=1.0, cell=<class 'monai.networks.nets.dints.Cell'>, arch_code=None, num_blocks=6, num_depths=3, spatial_dims=3, act_name='RELU', norm_name=('INSTANCE', {'affine': True}), use_downsample=True, device='cpu')[source]#: Initialize DiNTS topology search space of neural architectures.

decode()[source]#

Decode network log_alpha_a/log_alpha_c using dijkstra shortest path algorithm.

[node_a, arch_code_a, arch_code_c, arch_code_a_max] is decoded when using self.decode().

For example, for a num_depths=4, num_blocks=12 search space:

node_a is a 4x13 binary matrix representing if a feature node is activated (13 because of multi-resolution inputs).

arch_code_a is a 12x10 (10 paths) binary matrix representing if a path is activated.

arch_code_c is a 12x10x5 (5 operations) binary matrix representing if a cell operation is used.

Returns:: arch_code with maximum probability

forward(x)[source]#

Prediction based on dynamic arch_code.

Parameters:: x – a list of num_depths input tensors as a multi-resolution input. tensor is of shape BCHW[D] where C must match self.filter_nums.

gen_mtx(depth)[source]#

Generate elements needed in decoding and topology.

transfer_mtx: feasible path activation matrix (denoted as T) given a node activation pattern.
It is used to convert path activation pattern (1, paths) to node activation (1, nodes)

node_act_list: all node activation [2^num_depths-1, depth]. For depth = 4, there are 15 node activation
patterns, each of length 4. For example, [1,1,0,0] means nodes 0, 1 are activated (with input paths).

all_connect: All possible path activations. For depth = 4, all_connection has 1024 vectors of length 10 (10 paths). The return value will exclude path activation of all 0.

get_prob_a(child=False)[source]#

Get final path and child model probabilities from architecture weights log_alpha_a. This is used in forward pass, getting training loss, and final decoding.

Parameters:

child (bool) – return child probability (used in decoding)

Returns:

the path activation probability of size:: [number of blocks, number of paths in each block]. For 12 blocks, 4 depths search space, the size is [12,10]
probs_a: The probability of all child models (size 1023x10). Each child model is a path activation pattern: (1D vector of length 10 for 10 paths). In total 1023 child models (2^10 -1)

Return type:

arch_code_prob_a

get_ram_cost_usage(in_size, full=False)[source]#

Get estimated output tensor size to approximate RAM consumption.

Parameters:

in_size – input image shape (4D/5D, [BCHW[D]]) at the highest resolution level.
full (bool) – full ram cost usage with all probability of 1.

get_topology_entropy(probs)[source]#

Get topology entropy loss at searching stage.

Parameters:: probs – path activation probabilities

ComplexUnet#

class monai.apps.reconstruction.networks.nets.complex_unet.ComplexUnet(spatial_dims=2, features=(32, 32, 64, 128, 256, 32), act=('LeakyReLU', {'inplace': True, 'negative_slope': 0.1}), norm=('instance', {'affine': True}), bias=True, dropout=0.0, upsample='deconv', pad_factor=16, conv_net=None)[source]#

This variant of U-Net handles complex-value input/output. It can be used as a model to learn sensitivity maps in multi-coil MRI data. It is built based on monai.networks.nets.BasicUNet by default but the user can input their convolutional model as well. ComplexUnet also applies default normalization to the input which makes it more stable to train.

The data being a (complex) 2-channel tensor is a requirement for using this model.

Modified and adopted from: facebookresearch/fastMRI

Parameters:

spatial_dims – number of spatial dimensions.
features – six integers as numbers of features. denotes number of channels in each layer.
act – activation type and arguments. Defaults to LeakyReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
bias – whether to have a bias term in convolution blocks. Defaults to True.
dropout – dropout ratio. Defaults to 0.0.
upsample – upsampling mode, available options are "deconv", "pixelshuffle", "nontrainable".
pad_factor – an integer denoting the number which each padded dimension will be divisible to. For example, 16 means each dimension will be divisible by 16 after padding
conv_net – the learning model used inside the ComplexUnet. The default is monai.networks.nets.basic_unet. The only requirement on the model is to have 2 as input and output number of channels.

forward(x)[source]#

Parameters:: x (Tensor) – input of shape (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data
Return type:: Tensor
Returns:: output of shape (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data

CoilSensitivityModel#

class monai.apps.reconstruction.networks.nets.coil_sensitivity_model.CoilSensitivityModel(spatial_dims=2, features=(32, 32, 64, 128, 256, 32), act=('LeakyReLU', {'inplace': True, 'negative_slope': 0.1}), norm=('instance', {'affine': True}), bias=True, dropout=0.0, upsample='deconv', coil_dim=1, conv_net=None)[source]#

This class uses a convolutional model to learn coil sensitivity maps for multi-coil MRI reconstruction. The convolutional model is monai.apps.reconstruction.networks.nets.complex_unet by default but can be specified by the user as well. Learning is done on the center of the under-sampled kspace (that region is fully sampled).

The data being a (complex) 2-channel tensor is a requirement for using this model.

Modified and adopted from: facebookresearch/fastMRI

Parameters:

spatial_dims – number of spatial dimensions.
features – six integers as numbers of features. denotes number of channels in each layer.
act – activation type and arguments. Defaults to LeakyReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
bias – whether to have a bias term in convolution blocks. Defaults to True.
dropout – dropout ratio. Defaults to 0.0.
upsample – upsampling mode, available options are "deconv", "pixelshuffle", "nontrainable".
coil_dim – coil dimension in the data
conv_net – the learning model used to estimate the coil sensitivity maps. default is monai.apps.reconstruction.networks.nets.complex_unet. The only requirement on the model is to have 2 as input and output number of channels.

forward(masked_kspace, mask)[source]#

Parameters:

masked_kspace (Tensor) – the under-sampled kspace (which is the input measurement). Its shape is (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data.
mask (Tensor) – the under-sampling mask with shape (1,1,1,W,1) for 2D data or (1,1,1,1,D,1) for 3D data.

Return type:

Tensor

Returns:

predicted coil sensitivity maps with shape (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data.

get_fully_sampled_region(mask)[source]#

Extracts the size of the fully-sampled part of the kspace. Note that when a kspace is under-sampled, a part of its center is fully sampled. This part is called the Auto Calibration Region (ACR). ACR is used for sensitivity map computation.

Parameters:

mask (Tensor) – the under-sampling mask of shape (…, S, 1) where S denotes the sampling dimension

Return type:

tuple[int, int]

Returns:

A tuple containing

left index of the region
right index of the region

Note

Suppose the mask is of shape (1,1,20,1). If this function returns 8,12 as left and right: indices, then it means that the fully-sampled center region has size 4 starting from 8 to 12.

e2e-VarNet#

class monai.apps.reconstruction.networks.nets.varnet.VariationalNetworkModel(coil_sensitivity_model, refinement_model, num_cascades=12, spatial_dims=2)[source]#

The end-to-end variational network (or simply e2e-VarNet) based on Sriram et. al., “End-to-end variational networks for accelerated MRI reconstruction”. It comprises several cascades each consisting of refinement and data consistency steps. The network takes in the under-sampled kspace and estimates the ground-truth reconstruction.

Modified and adopted from: facebookresearch/fastMRI

Parameters:

coil_sensitivity_model (Module) – A convolutional model for learning coil sensitivity maps. An example is monai.apps.reconstruction.networks.nets.coil_sensitivity_model.CoilSensitivityModel.
refinement_model (Module) – A convolutional network used in the refinement step of e2e-VarNet. An example is monai.apps.reconstruction.networks.nets.complex_unet.ComplexUnet.
num_cascades (int) – Number of cascades. Each cascade is a monai.apps.reconstruction.networks.blocks.varnetblock.VarNetBlock which consists of refinement and data consistency steps.
spatial_dims (int) – number of spatial dimensions.

forward(masked_kspace, mask)[source]#

Parameters:

masked_kspace (Tensor) – The under-sampled kspace. It’s a 2D kspace (B,C,H,W,2) with the last dimension being 2 (for real/imaginary parts) and C denoting the coil dimension. 3D data will have the shape (B,C,H,W,D,2).
mask (Tensor) – The under-sampling mask with shape (1,1,1,W,1) for 2D data or (1,1,1,1,D,1) for 3D data.

Return type:

Tensor

Returns:

The reconstructed image which is the root sum of squares (rss) of the absolute value: of the inverse fourier of the predicted kspace (note that rss combines coil images into one image).

DAF3D#

class monai.networks.nets.DAF3D(in_channels, out_channels, visual_output=False)[source]#

DAF3D network based on ‘Deep Attentive Features for Prostate Segmentation in 3D Transrectal Ultrasound’ <https://arxiv.org/pdf/1907.01743.pdf>. The network consists of a 3D Feature Pyramid Network which is applied on the feature maps of a 3D ResNet, followed by a custom Attention Module and an ASPP module. During training the supervised signal consists of the outputs of the FPN (four Single Layer Features, SLFs), the outputs of the attention module (four Attentive Features) and the final prediction. They are individually compared to the ground truth, the final loss consists of a weighted sum of all individual losses (see DAF3D tutorial for details). There is an additional possiblity to return all supervised signals as well as the Attentive Maps in validation mode to visualize inner functionality of the network.

Parameters:

in_channels – number of input channels.
out_channels – number of output channels.
visual_output – whether to return all SLFs, Attentive Maps, Refined SLFs in validation mode can be used to visualize inner functionality of the network

forward(x)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Quicknat#

class monai.networks.nets.Quicknat(num_classes=33, num_channels=1, num_filters=64, kernel_size=5, kernel_c=1, stride_conv=1, pool=2, stride_pool=2, se_block='None', drop_out=0, act='PRELU', norm='INSTANCE', adn_ordering='NA')[source]#

Model for “Quick segmentation of NeuroAnaTomy (QuickNAT) based on a deep fully convolutional neural network. Refer to: “QuickNAT: A Fully Convolutional Network for Quick and Accurate Segmentation of Neuroanatomy by Abhijit Guha Roya, Sailesh Conjetib, Nassir Navabb, Christian Wachingera”

QuickNAT has an encoder/decoder like 2D F-CNN architecture with 4 encoders and 4 decoders separated by a bottleneck layer. The final layer is a classifier block with softmax. The architecture includes skip connections between all encoder and decoder blocks of the same spatial resolution, similar to the U-Net architecture. All Encoder and Decoder consist of three convolutional layers all with a Batch Normalization and ReLU. The first two convolutional layers are followed by a concatenation layer that concatenates the input feature map with outputs of the current and previous convolutional blocks. The kernel size of the first two convolutional layers is 5*5, the third convolutional layer has a kernel size of 1*1.

Data in the encode path is downsampled using max pooling layers instead of upsamling like UNet and in the decode path upsampled using max un-pooling layers instead of transpose convolutions. The pooling is done at the beginning of the block and the unpool afterwards. The indices of the max pooling in the Encoder are forwarded through the layer to be available to the corresponding Decoder.

The bottleneck block consists of a 5 * 5 convolutional layer and a batch normalization layer to separate the encoder and decoder part of the network, restricting information flow between the encoder and decoder.

The output feature map from the last decoder block is passed to the classifier block, which is a convolutional layer with 1 * 1 kernel size that maps the input to an N channel feature map, where N is the number of segmentation classes.

The original QuickNAT implementation included a enable_test_dropout() mechanism for uncertainty estimation during testing. As the dropout layers are the only stochastic components of this network calling the train() method instead of eval() in testing or inference has the same effect.

Parameters:

num_classes – number of classes to segmentate (output channels).
num_channels – number of input channels.
num_filters – number of output channels for each convolutional layer in a Dense Block.
kernel_size – size of the kernel of each convolutional layer in a Dense Block.
kernel_c – convolution kernel size of classifier block kernel.
stride_convolution – convolution stride. Defaults to 1.
pool – kernel size of the pooling layer,
stride_pool – stride for the pooling layer.
se_block – Squeeze and Excite block type to be included, defaults to None. Valid options : NONE, CSE, SSE, CSSE,
droup_out – dropout ratio. Defaults to no dropout.
act – activation type and arguments. Defaults to PReLU.
norm – feature normalization type and arguments. Defaults to instance norm.
adn_ordering – a string representing the ordering of activation (A), normalization (N), and dropout (D). Defaults to “NA”. See also: monai.networks.blocks.ADN.

Examples:

from monai.networks.nets import QuickNAT

# network with max pooling by a factor of 2 at each layer with no se_block.
net = QuickNAT(
    num_classes=3,
    num_channels=1,
    num_filters=64,
    pool = 2,
    se_block = "None"
)

forward(input)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

get_selayer(n_filters, se_block_type='None')[source]#

Returns the SEBlock defined in the initialization of the QuickNAT model.

Parameters:

n_filters – encoding half of the layer
se_block_type – defaults to None. Valid options are None, CSE, SSE, CSSE

Returns: Appropriate SEBlock. SSE and CSSE not implemented in Monai yet.

property is_cuda#: Check if model parameters are allocated on the GPU.

VoxelMorph#

class monai.networks.nets.VoxelMorphUNet(spatial_dims, in_channels, unet_out_channels, channels, final_conv_channels, final_conv_act='LEAKYRELU', kernel_size=3, up_kernel_size=3, act='LEAKYRELU', norm=None, dropout=0.0, bias=True, use_maxpool=True, adn_ordering='NDA')[source]#

The backbone network used in VoxelMorph. See monai.networks.nets.VoxelMorph for more details.

A concatenated pair of images (moving and fixed) is first passed through a UNet. The output of the UNet is then passed through a series of convolution blocks to produce the final prediction of the displacement field (DDF) or the stationary velocity field (DVF).

In the original implementation, downsample is achieved through maxpooling, here one has the option to use either maxpooling or strided convolution for downsampling. The default is to use maxpooling as it is consistent with the original implementation. Note that for upsampling, the authors of VoxelMorph used nearest neighbor interpolation instead of transposed convolution. In this implementation, only nearest neighbor interpolation is supported in order to be consistent with the original implementation.

An instance of this class can be used as a backbone network for constructing a VoxelMorph network. See the documentation of monai.networks.nets.VoxelMorph for more details and an example on how to construct a VoxelMorph network.

Parameters:

spatial_dims – number of spatial dimensions.
in_channels – number of channels in the input volume after concatenation of moving and fixed images.
unet_out_channels – number of channels in the output of the UNet.
channels – number of channels in each layer of the UNet. See the following example for more details.
final_conv_channels – number of channels in each layer of the final convolution block.
final_conv_act – activation type for the final convolution block. Defaults to LeakyReLU. Since VoxelMorph was originally implemented in tensorflow where the default negative slope for LeakyReLU was 0.2, we use the same default value here.
kernel_size – kernel size for all convolution layers in the UNet. Defaults to 3.
up_kernel_size – kernel size for all convolution layers in the upsampling path of the UNet. Defaults to 3.
act – activation type for all convolution layers in the UNet. Defaults to LeakyReLU with negative slope 0.2.
norm – feature normalization type and arguments for all convolution layers in the UNet. Defaults to None.
dropout – dropout ratio for all convolution layers in the UNet. Defaults to 0.0 (no dropout).
bias – whether to use bias in all convolution layers in the UNet. Defaults to True.
use_maxpool – whether to use maxpooling in the downsampling path of the UNet. Defaults to True. Using maxpooling is the consistent with the original implementation of VoxelMorph. But one can optionally use strided convolution instead (i.e. set use_maxpool to False).
adn_ordering – ordering of activation, dropout, and normalization. Defaults to “NDA”.

forward(concatenated_pairs)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: Tensor

monai.networks.nets.voxelmorphunet#: alias of VoxelMorphUNet

class monai.networks.nets.VoxelMorph(backbone=None, integration_steps=7, half_res=False, spatial_dims=3)[source]#

A re-implementation of VoxelMorph framework for medical image registration as described in https://arxiv.org/pdf/1809.05231.pdf. For more details, please refer to VoxelMorph: A Learning Framework for Deformable Medical Image Registration, Guha Balakrishnan, Amy Zhao, Mert R. Sabuncu, John Guttag, Adrian V. Dalca IEEE TMI: Transactions on Medical Imaging. 2019. eprint arXiv:1809.05231.

This class is intended to be a general framework, based on which a deformable image registration network can be built. Given a user-specified backbone network (e.g., UNet in the original VoxelMorph paper), this class serves as a wrapper that concatenates the input pair of moving and fixed images, passes through the backbone network, integrate the predicted stationary velocity field (DVF) from the backbone network to obtain the displacement field (DDF), and, finally, warp the moving image using the DDF.

To construct a VoxelMorph network, one need to first construct a backbone network (e.g., a monai.networks.nets.VoxelMorphUNet) and pass it to the constructor of monai.networks.nets.VoxelMorph. The backbone network should be able to take a pair of moving and fixed images as input and produce a DVF (or DDF, details to be discussed later) as output.

When forward is called, the input moving and fixed images are first concatenated along the channel dimension and passed through the specified backbone network to produce the prediction of the displacement field (DDF) in the non-diffeomorphic variant (i.e. when integration_steps is set to 0) or the stationary velocity field (DVF) in the diffeomorphic variant (i.e. when integration_steps is set to a positive integer). The DVF is then integrated using a scaling-and-squaring approach via a monai.networks.blocks.warp.DVF2DDF module to produce the DDF. Finally, the DDF is used to warp the moving image to the fixed image using a monai.networks.blocks.warp.Warp module. Optionally, the integration from DVF to DDF can be performed on reduced resolution by specifying half_res to be True, in which case the output DVF from the backbone network is first linearly interpolated to half resolution before integration. The output DDF is then linearly interpolated again back to full resolution before being used to warp the moving image.

Parameters:

backbone – a backbone network.
integration_steps – number of integration steps used for obtaining DDF from DVF via scaling-and-squaring. Defaults to 7. If set to 0, the network will be non-diffeomorphic.
half_res – whether to perform integration on half resolution. Defaults to False.
spatial_dims – number of spatial dimensions, defaults to 3.

Example:

from monai.networks.nets import VoxelMorphUNet, VoxelMorph

# The following example construct an instance of VoxelMorph that matches the original VoxelMorph paper
# https://arxiv.org/pdf/1809.05231.pdf

# First, a backbone network is constructed. In this case, we use a VoxelMorphUNet as the backbone network.
backbone = VoxelMorphUNet(
    spatial_dims=3,
    in_channels=2,
    unet_out_channels=32,
    channels=(16, 32, 32, 32, 32, 32),  # this indicates the down block at the top takes 16 channels as
                                        # input, the corresponding up block at the top produces 32
                                        # channels as output, the second down block takes 32 channels as
                                        # input, and the corresponding up block at the same level
                                        # produces 32 channels as output, etc.
    final_conv_channels=(16, 16)
)

# Then, a full VoxelMorph network is constructed using the specified backbone network.
net = VoxelMorph(
    backbone=backbone,
    integration_steps=7,
    half_res=False
)

# A forward pass through the network would look something like this
moving = torch.randn(1, 1, 160, 192, 224)
fixed = torch.randn(1, 1, 160, 192, 224)
warped, ddf = net(moving, fixed)

forward(moving, fixed)[source]#

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Return type:: tuple[Tensor, Tensor]

monai.networks.nets.voxelmorph#: alias of <module ‘monai.networks.nets.voxelmorph’ from ‘/home/docs/checkouts/readthedocs.org/user_builds/monai/checkouts/latest/monai/networks/nets/voxelmorph.py’>

Utilities#

Utilities and types for defining networks, these depend on PyTorch.

monai.networks.utils.convert_to_onnx(model, inputs, input_names=None, output_names=None, opset_version=None, dynamic_axes=None, filename=None, verify=False, device=None, use_ort=False, ort_provider=None, rtol=0.0001, atol=0.0, use_trace=True, **kwargs)[source]#

Utility to convert a model into ONNX model and optionally verify with ONNX or onnxruntime. See also: https://pytorch.org/docs/stable/onnx.html for how to convert a PyTorch model to ONNX.

Parameters:

model – source PyTorch model to save.
inputs – input sample data used by pytorch.onnx.export. It is also used in ONNX model verification.
input_names – optional input names of the ONNX model.
output_names – optional output names of the ONNX model.
opset_version – version of the (ai.onnx) opset to target. Must be >= 7 and not exceed
PyTorch (the latest opset version supported by) – onnx/onnx and pytorch/pytorch
details (for more) – onnx/onnx and pytorch/pytorch
dynamic_axes – specifies axes of tensors as dynamic (i.e. known only at run-time). If set to None, the exported model will have the shapes of all input and output tensors set to match given ones, for more details: https://pytorch.org/docs/stable/onnx.html#torch.onnx.export.
filename – optional filename to save the ONNX model, if None, don’t save the ONNX model.
verify – whether to verify the ONNX model with ONNX or onnxruntime.
device – target PyTorch device to verify the model, if None, use CUDA if available.
use_ort – whether to use onnxruntime to verify the model.
ort_provider" – onnxruntime provider to use, default is [“CPUExecutionProvider”].
rtol – the relative tolerance when comparing the outputs of PyTorch model and TorchScript model.
atol – the absolute tolerance when comparing the outputs of PyTorch model and TorchScript model.
use_trace – whether to use torch.jit.trace to export the torchscript model.
kwargs – other arguments except obj for torch.jit.script() to convert model, for more details: https://pytorch.org/docs/master/generated/torch.jit.script.html.

monai.networks.utils.convert_to_torchscript(model, filename_or_obj=None, extra_files=None, verify=False, inputs=None, device=None, rtol=0.0001, atol=0.0, use_trace=False, **kwargs)[source]#

Utility to convert a model into TorchScript model and save to file, with optional input / output data verification.

Parameters:

model – source PyTorch model to save.
filename_or_obj – if not None, specify a file-like object (has to implement write and flush) or a string containing a file path name to save the TorchScript model.
extra_files – map from filename to contents which will be stored as part of the save model file. for more details: https://pytorch.org/docs/stable/generated/torch.jit.save.html.
verify – whether to verify the input and output of TorchScript model. if filename_or_obj is not None, load the saved TorchScript model and verify.
inputs – input test data to verify model, should be a sequence of data, every item maps to a argument of model() function.
device – target device to verify the model, if None, use CUDA if available.
rtol – the relative tolerance when comparing the outputs of PyTorch model and TorchScript model.
atol – the absolute tolerance when comparing the outputs of PyTorch model and TorchScript model.
use_trace – whether to use torch.jit.trace to export the TorchScript model.
kwargs – other arguments except obj for torch.jit.script() or torch.jit.trace() (if use_trace is True) to convert model, for more details: https://pytorch.org/docs/master/generated/torch.jit.script.html.

monai.networks.utils.convert_to_trt(model, precision, input_shape, dynamic_batchsize=None, use_trace=False, filename_or_obj=None, verify=False, device=None, use_onnx=False, onnx_input_names=('input_0',), onnx_output_names=('output_0',), rtol=0.01, atol=0.0, **kwargs)[source]#

Utility to export a model into a TensorRT engine-based TorchScript model with optional input / output data verification.

There are two ways to export a model: 1, Torch-TensorRT way: PyTorch module —> TorchScript module —> TensorRT engine-based TorchScript. 2, ONNX-TensorRT way: PyTorch module —> TorchScript module —> ONNX model —> TensorRT engine —> TensorRT engine-based TorchScript.

When exporting through the first way, some models suffer from the slowdown problem, since Torch-TensorRT may only convert a little part of the PyTorch model to the TensorRT engine. However when exporting through the second way, some Python data structures like dict are not supported. And some TorchScript models are not supported by the ONNX if exported through torch.jit.script.

Parameters:

model – a source PyTorch model to convert.
precision – the weight precision of the converted TensorRT engine based TorchScript models. Should be ‘fp32’ or ‘fp16’.
input_shape – the input shape that is used to convert the model. Should be a list like [N, C, H, W] or [N, C, H, W, D].
dynamic_batchsize – a sequence with three elements to define the batch size range of the input for the model to be converted. Should be a sequence like [MIN_BATCH, OPT_BATCH, MAX_BATCH]. After converted, the batchsize of model input should between MIN_BATCH and MAX_BATCH and the OPT_BATCH is the best performance batchsize that the TensorRT tries to fit. The OPT_BATCH should be the most frequently used input batchsize in the application, default to None.
use_trace – whether using torch.jit.trace to convert the PyTorch model to a TorchScript model and then convert to a TensorRT engine based TorchScript model or an ONNX model (if use_onnx is True), default to False.
filename_or_obj – if not None, specify a file-like object (has to implement write and flush) or a string containing a file path name to load the TensorRT engine based TorchScript model for verifying.
verify – whether to verify the input and output of the TensorRT engine based TorchScript model.
device – the target GPU index to convert and verify the model. If None, use #0 GPU.
use_onnx – whether to use the ONNX-TensorRT way to export the TensorRT engine-based TorchScript model.
onnx_input_names – optional input names of the ONNX model. This arg is only useful when use_onnx is True. Should be a sequence like (‘input_0’, ‘input_1’, …, ‘input_N’) where N equals to the number of the model inputs. If not given, will use (‘input_0’,), which supposes the model only has one input.
onnx_output_names – optional output names of the ONNX model. This arg is only useful when use_onnx is True. Should be a sequence like (‘output_0’, ‘output_1’, …, ‘output_N’) where N equals to the number of the model outputs. If not given, will use (‘output_0’,), which supposes the model only has one output.
rtol – the relative tolerance when comparing the outputs between the PyTorch model and TensorRT model.
atol – the absolute tolerance when comparing the outputs between the PyTorch model and TensorRT model.
kwargs – other arguments except module, inputs, enabled_precisions and device for torch_tensorrt.compile() to compile model, for more details: https://pytorch.org/TensorRT/py_api/torch_tensorrt.html#torch-tensorrt-py.

monai.networks.utils.copy_model_state(dst, src, dst_prefix='', mapping=None, exclude_vars=None, inplace=True, filter_func=None)[source]#

Compute a module state_dict, of which the keys are the same as dst. The values of dst are overwritten by the ones from src whenever their keys match. The method provides additional dst_prefix for the dst key when matching them. mapping can be a {“src_key”: “dst_key”} dict, indicating dst[dst_prefix + dst_key] = src[src_key]. This function is mainly to return a model state dict for loading the src model state into the dst model, src and dst can have different dict keys, but their corresponding values normally have the same shape.

Parameters:

dst – a pytorch module or state dict to be updated.
src – a pytorch module or state dict used to get the values used for the update.
dst_prefix – dst key prefix, so that dst[dst_prefix + src_key] will be assigned to the value of src[src_key].
mapping – a {“src_key”: “dst_key”} dict, indicating that dst[dst_prefix + dst_key] to be assigned to the value of src[src_key].
exclude_vars – a regular expression to match the dst variable names, so that their values are not overwritten by src.
inplace – whether to set the dst module with the updated state_dict via load_state_dict. This option is only available when dst is a torch.nn.Module.
filter_func – a filter function used to filter the weights to be loaded. See ‘filter_swinunetr’ in “monai.networks.nets.swin_unetr.py”.

Examples

from monai.networks.nets import BasicUNet
from monai.networks.utils import copy_model_state

model_a = BasicUNet(in_channels=1, out_channels=4)
model_b = BasicUNet(in_channels=1, out_channels=2)
model_a_b, changed, unchanged = copy_model_state(
    model_a, model_b, exclude_vars="conv_0.conv_0", inplace=False)
# dst model updated: 76 of 82 variables.
model_a.load_state_dict(model_a_b)
# <All keys matched successfully>

Returns: an OrderedDict of the updated dst state, the changed, and unchanged keys.

monai.networks.utils.eval_mode(*nets)[source]#

Set network(s) to eval mode and then return to original state at the end.

Parameters:: nets (Module) – Input network(s)

Examples

t=torch.rand(1,1,16,16)
p=torch.nn.Conv2d(1,1,3)
print(p.training)  # True
with eval_mode(p):
    print(p.training)  # False
    print(p(t).sum().backward())  # will correctly raise an exception as gradients are calculated

monai.networks.utils.get_state_dict(obj)[source]#

Get the state dict of input object if has state_dict, otherwise, return object directly. For data parallel model, automatically convert it to regular model first.

Parameters:: obj – input object to check and get the state_dict.

monai.networks.utils.has_nvfuser_instance_norm()[source]#: whether the current environment has InstanceNorm3dNVFuser NVIDIA/apex

monai.networks.utils.icnr_init(conv, upsample_factor, init=<function kaiming_normal_>)[source]#: ICNR initialization for 2D/3D kernels adapted from Aitken et al.,2017 , “Checkerboard artifact free sub-pixel convolution”.

monai.networks.utils.look_up_named_module(name, mod, print_all_options=False)[source]#

get the named module in mod by the attribute name, for example look_up_named_module(net, "features.3.1.attn")

Parameters:

name (str) – a string representing the module attribute.
mod – a pytorch module to be searched (in mod.named_modules()).
print_all_options – whether to print all named modules when name is not found in mod. Defaults to False.

Returns:

the corresponding pytorch module’s subcomponent such as net.features[3][1].attn

monai.networks.utils.normal_init(m, std=0.02, normal_func=<function normal_>)[source]#

Initialize the weight and bias tensors of m’ and its submodules to values from a normal distribution with a stddev of `std’. Weight tensors of convolution and linear modules are initialized with a mean of 0, batch norm modules with a mean of 1. The callable `normal_func’, used to assign values, should have the same arguments as its default normal_(). This can be used with `nn.Module.apply to visit submodules of a network.

Return type:: None

monai.networks.utils.normalize_transform(shape, device=None, dtype=None, align_corners=False, zero_centered=False)[source]#

Compute an affine matrix according to the input shape. The transform normalizes the homogeneous image coordinates to the range of [-1, 1]. Currently the following source coordinates are supported:

align_corners=False, zero_centered=False, normalizing from [-0.5, d-0.5].

align_corners=True, zero_centered=False, normalizing from [0, d-1].

align_corners=False, zero_centered=True, normalizing from [-(d-1)/2, (d-1)/2].

align_corners=True, zero_centered=True, normalizing from [-d/2, d/2].

Parameters:

shape – input spatial shape, a sequence of integers.
device – device on which the returned affine will be allocated.
dtype – data type of the returned affine
align_corners – if True, consider -1 and 1 to refer to the centers of the corner pixels rather than the image corners. See also: https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.grid_sample
zero_centered – whether the coordinates are normalized from a zero-centered range, default to False. Setting this flag and align_corners will jointly specify the normalization source range.

monai.networks.utils.one_hot(labels, num_classes, dtype=torch.float32, dim=1)[source]#

For every value v in labels, the value in the output will be either 1 or 0. Each vector along the dim-th dimension has the “one-hot” format, i.e., it has a total length of num_classes, with a one and num_class-1 zeros. Note that this will include the background label, thus a binary mask should be treated as having two classes.

Parameters:

labels (Tensor) – input tensor of integers to be converted into the ‘one-hot’ format. Internally labels will be converted into integers labels.long().
num_classes (int) – number of output channels, the corresponding length of labels[dim] will be converted to num_classes from 1.
dtype (dtype) – the data type of the output one_hot label.
dim (int) – the dimension to be converted to num_classes channels from 1 channel, should be non-negative number.

Example:

For a tensor labels of dimensions [B]1[spatial_dims], return a tensor of dimensions [B]N[spatial_dims] when num_classes=N number of classes and dim=1.

from monai.networks.utils import one_hot
import torch

a = torch.randint(0, 2, size=(1, 2, 2, 2))
out = one_hot(a, num_classes=2, dim=0)
print(out.shape)  # torch.Size([2, 2, 2, 2])

a = torch.randint(0, 2, size=(2, 1, 2, 2, 2))
out = one_hot(a, num_classes=2, dim=1)
print(out.shape)  # torch.Size([2, 2, 2, 2, 2])

Return type:: Tensor

monai.networks.utils.pixelshuffle(x, spatial_dims, scale_factor)[source]#

Apply pixel shuffle to the tensor x with spatial dimensions spatial_dims and scaling factor scale_factor.

See: Shi et al., 2016, “Real-Time Single Image and Video Super-Resolution Using a nEfficient Sub-Pixel Convolutional Neural Network.”

See: Aitken et al., 2017, “Checkerboard artifact free sub-pixel convolution”.

Parameters:

x (Tensor) – Input tensor
spatial_dims (int) – number of spatial dimensions, typically 2 or 3 for 2D or 3D
scale_factor (int) – factor to rescale the spatial dimensions by, must be >=1

Return type:

Tensor

Returns:

Reshuffled version of x.

Raises:

ValueError – When input channels of x are not divisible by (scale_factor ** spatial_dims)

monai.networks.utils.predict_segmentation(logits, mutually_exclusive=False, threshold=0.0)[source]#

Given the logits from a network, computing the segmentation by thresholding all values above 0 if multi-labels task, computing the argmax along the channel axis if multi-classes task, logits has shape BCHW[D].

Parameters:

logits (Tensor) – raw data of model output.
mutually_exclusive (bool) – if True, logits will be converted into a binary matrix using a combination of argmax, which is suitable for multi-classes task. Defaults to False.
threshold (float) – thresholding the prediction values if multi-labels task.

Return type:

Any

monai.networks.utils.replace_modules(parent, name, new_module, strict_match=True, match_device=True)[source]#

Replace sub-module(s) in a parent module.

The name of the module to be replace can be nested e.g., features.denseblock1.denselayer1.layers.relu1. If this is the case (there are “.” in the module name), then this function will recursively call itself.

Parameters:

parent (Module) – module that contains the module to be replaced
name (str) – name of module to be replaced. Can include “.”.
new_module (Module) – torch.nn.Module to be placed at position name inside parent. This will be deep copied if strict_match == False multiple instances are independent.
strict_match (bool) – if True, module name must == name. If false then name in named_modules() will be used. True can be used to change just one module, whereas False can be used to replace all modules with similar name (e.g., relu).
match_device (bool) – if True, the device of the new module will match the model. Requires all of parent to be on the same device.

Return type:

list[tuple[str, Module]]

Returns:

List of tuples of replaced modules. Element 0 is module name, element 1 is the replaced module.

Raises:

AttributeError – if strict_match is True and name is not a named module in parent.

monai.networks.utils.replace_modules_temp(parent, name, new_module, strict_match=True, match_device=True)[source]#

Temporarily replace sub-module(s) in a parent module (context manager).

See monai.networks.utils.replace_modules.

monai.networks.utils.save_state(src, path, **kwargs)[source]#

Save the state dict of input source data with PyTorch save. It can save nn.Module, state_dict, a dictionary of nn.Module or state_dict. And automatically convert the data parallel module to regular module. For example:

save_state(net, path)
save_state(net.state_dict(), path)
save_state({"net": net, "opt": opt}, path)
net_dp = torch.nn.DataParallel(net)
save_state(net_dp, path)

Refer to: https://pytorch.org/ignite/v0.4.8/generated/ignite.handlers.DiskSaver.html.

Parameters:

src – input data to save, can be nn.Module, state_dict, a dictionary of nn.Module or state_dict.
path – target file path to save the input object.
kwargs – other args for the save_obj except for the obj and path. default func is torch.save(), details of the args: https://pytorch.org/docs/stable/generated/torch.save.html.

monai.networks.utils.set_named_module(mod, name, new_layer)[source]#

look up name in mod and replace the layer with new_layer, return the updated mod.

Parameters:

mod – a pytorch module to be updated.
name (str) – a string representing the target module attribute.
new_layer – a new module replacing the corresponding layer at mod.name.

Returns:

an updated mod

monai.networks.utils.to_norm_affine(affine, src_size, dst_size, align_corners=False, zero_centered=False)[source]#

Given affine defined for coordinates in the pixel space, compute the corresponding affine for the normalized coordinates.

Parameters:

affine (Tensor) – Nxdxd batched square matrix
src_size (Sequence[int]) – source image spatial shape
dst_size (Sequence[int]) – target image spatial shape
align_corners (bool) – if True, consider -1 and 1 to refer to the centers of the corner pixels rather than the image corners. See also: https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.grid_sample
zero_centered (bool) – whether the coordinates are normalized from a zero-centered range, default to False. See also: monai.networks.utils.normalize_transform().

Raises:

TypeError – When affine is not a torch.Tensor.
ValueError – When affine is not Nxdxd.
ValueError – When src_size or dst_size dimensions differ from affine.

Return type:

Tensor

monai.networks.utils.train_mode(*nets)[source]#

Set network(s) to train mode and then return to original state at the end.

Parameters:: nets (Module) – Input network(s)

Examples

t=torch.rand(1,1,16,16)
p=torch.nn.Conv2d(1,1,3)
p.eval()
print(p.training)  # False
with train_mode(p):
    print(p.training)  # True
    print(p(t).sum().backward())  # No exception

This script contains utility functions for developing new networks/blocks in PyTorch.

monai.apps.reconstruction.networks.nets.utils.complex_normalize(x)[source]#

Performs layer mean-std normalization for complex data. Normalization is done for each batch member along each part (part refers to real and imaginary parts), separately.

Parameters:

x (Tensor) – input of shape (B,C,H,W) for 2D data or (B,C,H,W,D) for 3D data

Return type:

tuple[Tensor, Tensor, Tensor]

Returns:

A tuple containing

normalized output of shape (B,C,H,W) for 2D data or (B,C,H,W,D) for 3D data
mean
std

monai.apps.reconstruction.networks.nets.utils.divisible_pad_t(x, k=16)[source]#

Pad input to feed into the network (torch script compatible)

Parameters:

x (Tensor) – input of shape (B,C,H,W) for 2D data or (B,C,H,W,D) for 3D data
k (int) – padding factor. each padded dimension will be divisible by k.

Return type:

tuple[Tensor, tuple[tuple[int, int], tuple[int, int], tuple[int, int], int, int, int]]

Returns:

A tuple containing

padded input
pad sizes (in order to reverse padding if needed)

Example

import torch

# 2D data
x = torch.ones([3,2,50,70])
x_pad,padding_sizes = divisible_pad_t(x, k=16)
# the following line should print (3, 2, 64, 80)
print(x_pad.shape)

# 3D data
x = torch.ones([3,2,50,70,80])
x_pad,padding_sizes = divisible_pad_t(x, k=16)
# the following line should print (3, 2, 64, 80, 80)
print(x_pad.shape)

monai.apps.reconstruction.networks.nets.utils.floor_ceil(n)[source]#

Returns floor and ceil of the input

Parameters:

n (float) – input number

Returns:

floor(n)
ceil(n)

Return type:

A tuple containing

monai.apps.reconstruction.networks.nets.utils.inverse_divisible_pad_t(x, pad_sizes)[source]#

De-pad network output to match its original shape

Parameters:

x (Tensor) – input of shape (B,C,H,W) for 2D data or (B,C,H,W,D) for 3D data
pad_sizes (tuple[tuple[int, int], tuple[int, int], tuple[int, int], int, int, int]) – padding values

Return type:

Tensor

Returns:

de-padded input

monai.apps.reconstruction.networks.nets.utils.reshape_batch_channel_to_channel_dim(x, batch_size)[source]#

Detaches batch and channel dimensions.

Parameters:

x (Tensor) – input of shape (B*C,1,H,W,2) for 2D data or (B*C,1,H,W,D,2) for 3D data
batch_size (int) – batch size

Return type:

Tensor

Returns:

output of shape (B,C,…)

monai.apps.reconstruction.networks.nets.utils.reshape_channel_complex_to_last_dim(x)[source]#

Swaps the complex dimension with the channel dimension so that the network output has 2 as its last dimension

Parameters:: x (Tensor) – input of shape (B,C*2,H,W) for 2D data or (B,C*2,H,W,D) for 3D data
Return type:: Tensor
Returns:: output of shape (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data

monai.apps.reconstruction.networks.nets.utils.reshape_channel_to_batch_dim(x)[source]#

Combines batch and channel dimensions.

Parameters:

x (Tensor) – input of shape (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data

Returns:

output of shape (B*C,1,…)
batch size

Return type:

A tuple containing

monai.apps.reconstruction.networks.nets.utils.reshape_complex_to_channel_dim(x)[source]#

Swaps the complex dimension with the channel dimension so that the network treats real/imaginary parts as two separate channels.

Parameters:: x (Tensor) – input of shape (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data
Return type:: Tensor
Returns:: output of shape (B,C*2,H,W) for 2D data or (B,C*2,H,W,D) for 3D data

monai.apps.reconstruction.networks.nets.utils.sensitivity_map_expand(img, sens_maps, spatial_dims=2)[source]#

Expands an image to its corresponding coil images based on the given sens_maps. Let’s say there are C coils. This function multiples image img with each coil sensitivity map in sens_maps and stacks the resulting C coil images along the channel dimension which is reserved for coils.

Parameters:

img (Tensor) – 2D image (B,1,H,W,2) with the last dimension being 2 (for real/imaginary parts). 3D data will have the shape (B,1,H,W,D,2).
sens_maps (Tensor) – Sensitivity maps for combining coil images. The shape is (B,C,H,W,2) for 2D data or (B,C,H,W,D,2) for 3D data (C denotes the coil dimension).
spatial_dims (int) – is 2 for 2D data and is 3 for 3D data

Return type:

Tensor

Returns:

Expansion of x to (B,C,H,W,2) for 2D data and (B,C,H,W,D,2) for 3D data. The output is transferred: to the frequency domain to yield coil measurements.

monai.apps.reconstruction.networks.nets.utils.sensitivity_map_reduce(kspace, sens_maps, spatial_dims=2)[source]#

Reduces coil measurements to a corresponding image based on the given sens_maps. Let’s say there are C coil measurements inside kspace, then this function multiplies the conjugate of each coil sensitivity map with the corresponding coil image. The result of this process will be C images. Summing those images together gives the resulting “reduced image.”

Parameters:

kspace (Tensor) – 2D kspace (B,C,H,W,2) with the last dimension being 2 (for real/imaginary parts) and C denoting the coil dimension. 3D data will have the shape (B,C,H,W,D,2).
sens_maps (Tensor) – sensitivity maps of the same shape as input x.
spatial_dims (int) – is 2 for 2D data and is 3 for 3D data

Return type:

Tensor

Returns:

reduction of x to (B,1,H,W,2) for 2D data or (B,1,H,W,D,2) for 3D data.

Network architectures#

Blocks#

ADN#

Convolution#

CRF#

ResidualUnit#

Swish#

MemoryEfficientSwish#

FPN#

Mish#

GEGLU#

GCN Module#

Refinement Module#

FCN Module#

Multi-Channel FCN Module#

Dynamic-Unet Block#

DenseBlock#

SegResnet Block#

SABlock Block#

Squeeze-and-Excitation#

Transformer Block#

UNETR Block#

Residual Squeeze-and-Excitation#

Squeeze-and-Excitation Block#

Squeeze-and-Excitation Bottleneck#

Squeeze-and-Excitation Resnet Bottleneck#

Squeeze-and-Excitation ResNeXt Bottleneck#

Simple ASPP#

MaxAvgPooling#

Upsampling#

Registration Residual Conv Block#

Registration Down Sample Block#

Registration Extraction Block#

LocalNet DownSample Block#

LocalNet UpSample Block#

LocalNet Feature Extractor Block#

MLP Block#

Patch Embedding Block#

FactorizedIncreaseBlock#

FactorizedReduceBlock#

P3DActiConvNormBlock#

ActiConvNormBlock#

Warp#

DVF2DDF#

VarNetBlock#

N-Dim Fourier Transform#

Layers#

Factories#

split_args#

Dropout#

Act#

Norm#

Conv#

Pad#

Pool#

ChannelPad#

SkipConnection#

Flatten#

Reshape#

separable_filtering#

apply_filter#

GaussianFilter#

MedianFilter#

median_filter#

BilateralFilter#

TrainableBilateralFilter#

TrainableJointBilateralFilter#

PHLFilter#

GaussianMixtureModel#

SavitzkyGolayFilter#

HilbertTransform#

Affine Transform#

grid_pull#

grid_push#

grid_count#

grid_grad#

LLTM#

ConjugateGradient#

Utilities#

Nets#