Network architectures

Blocks

Convolution

class monai.networks.blocks.Convolution(dimensions, in_channels, out_channels, strides=1, kernel_size=3, act='PRELU', norm='INSTANCE', dropout=None, dropout_dim=1, dilation=1, groups=1, bias=True, conv_only=False, is_transposed=False, padding=None, output_padding=None)

Constructs a convolution with normalization, optional dropout, and optional activation layers:

-- (Conv|ConvTrans) -- Norm -- (Dropout) -- (Acti) --

If conv_only is set to True:

-- (Conv|ConvTrans) --

Parameters
  • dimensions (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • strides (Union[Sequence[int], int]) – convolution stride. Defaults to 1.

  • kernel_size (Union[Sequence[int], int]) – convolution kernel size. Defaults to 3.

  • act (Union[Tuple, str, None]) – activation type and arguments. Defaults to PReLU.

  • norm (Union[Tuple, str]) – feature normalization type and arguments. Defaults to instance norm.

  • dropout (Union[Tuple, str, float, None]) – dropout ratio. Defaults to no dropout.

  • dropout_dim (int) – determines the dimensionality of dropout. Defaults to 1. When dropout_dim = 1, randomly zeroes some of the elements for each channel. When dropout_dim = 2, randomly zeroes out entire channels (a channel is a 2D feature map). When dropout_dim = 3, randomly zeroes out entire channels (a channel is a 3D feature map). The value of dropout_dim should be no larger than the value of dimensions.

  • dilation (Union[Sequence[int], int]) – dilation rate. Defaults to 1.

  • groups (int) – controls the connections between inputs and outputs. Defaults to 1.

  • bias (bool) – whether to have a bias term. Defaults to True.

  • conv_only (bool) – whether to use the convolutional layer only. Defaults to False.

  • is_transposed (bool) – if True uses ConvTrans instead of Conv. Defaults to False.

  • padding (Union[Sequence[int], int, None]) – the amount of implicit zero-padding added to both sides of each spatial dimension. Defaults to None.

  • output_padding (Union[Sequence[int], int, None]) – controls the additional size added to one side of the output shape. Defaults to None.
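
The block's output spatial size follows the usual PyTorch convolution arithmetic. The pure Python sketch below computes it along one spatial dimension for the regular and the transposed case. It also maps dropout_dim to the torch.nn dropout class that matches the behaviour described above.

This is an illustration, not MONAI code. The helper names are hypothetical. Two things are assumptions:
- a padding of None is taken to mean "same"-style padding of (kernel_size - 1) * dilation // 2;
- dropout_dim is taken to select Dropout, Dropout2d, or Dropout3d.

```python
def same_padding(kernel_size: int, dilation: int = 1) -> int:
    # ASSUMPTION (hypothetical helper): padding used when `padding` is None, chosen so
    # that a stride-1 convolution with an odd kernel keeps the spatial size.
    return (kernel_size - 1) * dilation // 2


def conv_out_size(size, kernel_size=3, stride=1, padding=None, dilation=1):
    """Output length of a regular convolution along one spatial dimension."""
    if padding is None:
        padding = same_padding(kernel_size, dilation)
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_trans_out_size(size, kernel_size=3, stride=1, padding=None, dilation=1, output_padding=0):
    """Output length of a transposed convolution along one spatial dimension."""
    if padding is None:
        padding = same_padding(kernel_size, dilation)
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1


def dropout_class_name(dropout_dim: int, dimensions: int) -> str:
    """torch.nn dropout class ASSUMED to be selected by `dropout_dim`."""
    if dropout_dim > dimensions:
        raise ValueError("dropout_dim should be no larger than dimensions.")
    if dropout_dim not in (1, 2, 3):
        raise ValueError("dropout_dim must be 1, 2 or 3.")
    # 1: individual elements are zeroed; 2 and 3: whole 2D or 3D channels are zeroed.
    return {1: "Dropout", 2: "Dropout2d", 3: "Dropout3d"}[dropout_dim]


print(conv_out_size(64, kernel_size=3, stride=2))                          # 32
print(conv_trans_out_size(32, kernel_size=3, stride=2, output_padding=1))  # 64
print(dropout_class_name(2, dimensions=3))                                 # Dropout2d
```

A stride-2 convolution maps both 63 and 64 voxels to 32. The output_padding argument is what lets a transposed convolution pick which of the two sizes to restore.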

ResidualUnit

class monai.networks.blocks.ResidualUnit(dimensions, in_channels, out_channels, strides=1, kernel_size=3, subunits=2, act='PRELU', norm='INSTANCE', dropout=None, dropout_dim=1, dilation=1, bias=True, last_conv_only=False, padding=None)

Residual module with multiple convolutions and a residual connection.

Parameters
  • dimensions (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • strides (Union[Sequence[int], int]) – convolution stride. Defaults to 1.

  • kernel_size (Union[Sequence[int], int]) – convolution kernel size. Defaults to 3.

  • subunits (int) – number of convolutions. Defaults to 2.

  • act (Union[Tuple, str, None]) – activation type and arguments. Defaults to PReLU.

  • norm (Union[Tuple, str]) – feature normalization type and arguments. Defaults to instance norm.

  • dropout (Union[Tuple, str, float, None]) – dropout ratio. Defaults to no dropout.

  • dropout_dim (int) – determines the dimensionality of dropout. Defaults to 1. When dropout_dim = 1, randomly zeroes some of the elements for each channel. When dropout_dim = 2, randomly zeroes out entire channels (a channel is a 2D feature map). When dropout_dim = 3, randomly zeroes out entire channels (a channel is a 3D feature map). The value of dropout_dim should be no larger than the value of dimensions.

  • dilation (Union[Sequence[int], int]) – dilation rate. Defaults to 1.

  • bias (bool) – whether to have a bias term. Defaults to True.

  • last_conv_only (bool) – for the last subunit, whether to use the convolutional layer only. Defaults to False.

  • padding (Union[Sequence[int], int, None]) – the amount of implicit zero-padding added to both sides of each spatial dimension. Defaults to None.

GCN Module

class monai.networks.blocks.GCN(inplanes, planes, ks=7)

The Global Convolutional Network module, which uses large 1D Kx1 and 1xK kernels to represent 2D kernels.

Parameters
  • inplanes (int) – number of input channels.

  • planes (int) – number of output channels.

  • ks (int) – kernel size for one dimension. Defaults to 7.

forward(x)
Parameters

x (Tensor) – in shape (batch, inplanes, spatial_1, spatial_2).

Return type

Tensor
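
Counting weights shows why Kx1 and 1xK kernels are used in place of a dense KxK kernel. The sketch below assumes the two-branch layout commonly used for GCN:
- one branch is a Kx1 convolution followed by a 1xK convolution;
- the other branch applies the same two kernels in the opposite order;
- in each branch, the first convolution maps inplanes to planes and the second maps planes to planes;
- the two branches are summed, and biases are ignored.

Both function names are hypothetical.

```python
def full_kernel_weights(inplanes: int, planes: int, ks: int) -> int:
    # Weights of one dense ks x ks 2D convolution from inplanes to planes (bias ignored).
    return ks * ks * inplanes * planes


def gcn_weights(inplanes: int, planes: int, ks: int) -> int:
    # ASSUMED layout: two branches, each (inplanes -> planes) then (planes -> planes),
    # using one ks x 1 and one 1 x ks kernel per branch (bias ignored).
    per_branch = ks * inplanes * planes + ks * planes * planes
    return 2 * per_branch


print(full_kernel_weights(64, 64, 7))  # 200704
print(gcn_weights(64, 64, 7))          # 114688
```

The dense cost grows with the square of ks, while the separable cost grows linearly. The gap therefore widens for the large kernels GCN is designed for.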

Refinement Module

class monai.networks.blocks.Refine(planes)

Simple residual block to refine the details of the activation maps.

Parameters

planes (int) – number of input channels.

forward(x)
Parameters

x (Tensor) – in shape (batch, planes, spatial_1, spatial_2).

Return type

Tensor

FCN Module

class monai.networks.blocks.FCN(out_channels=1, upsample_mode='bilinear', pretrained=True, progress=True)

2D FCN network with 3 input channels. The small decoder is built with the GCN and Refine modules. The code is adapted from lsqshr’s official 2D code.

Parameters
  • out_channels (int) – number of output channels. Defaults to 1.

  • upsample_mode (str) –

    The upsampling mode, one of ["transpose", "bilinear"]. The "bilinear" mode cannot guarantee the model’s reproducibility. Defaults to bilinear.

    • transpose, uses transposed convolution layers.

    • bilinear, uses bilinear interpolation.

  • pretrained (bool) – If True, returns a model pre-trained on ImageNet.

  • progress (bool) – If True, displays a progress bar of the download to stderr.

forward(x)
Parameters

x (Tensor) – in shape (batch, 3, spatial_1, spatial_2).

Multi-Channel FCN Module

class monai.networks.blocks.MCFCN(in_channels=3, out_channels=1, upsample_mode='bilinear', pretrained=True, progress=True)

The multi-channel version of the 2D FCN module. It adds a projection layer to take an arbitrary number of input channels.

Parameters
  • in_channels (int) – number of input channels. Defaults to 3.

  • out_channels (int) – number of output channels. Defaults to 1.

  • upsample_mode (str) –

    The upsampling mode, one of ["transpose", "bilinear"]. The "bilinear" mode cannot guarantee the model’s reproducibility. Defaults to bilinear.

    • transpose, uses transposed convolution layers.

    • bilinear, uses bilinear interpolation.

  • pretrained (bool) – If True, returns a model pre-trained on ImageNet.

  • progress (bool) – If True, displays a progress bar of the download to stderr.

forward(x)
Parameters

x (Tensor) – in shape (batch, in_channels, spatial_1, spatial_2).

No New Unet Block

class monai.networks.blocks.UnetResBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, norm_name)

A skip-connection based module that can be used for DynUNet, based on: “Automated Design of Deep Learning Methods for Biomedical Image Segmentation” and “nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation”.

Parameters
  • spatial_dims (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (Union[Sequence[int], int]) – convolution kernel size.

  • stride (Union[Sequence[int], int]) – convolution stride.

  • norm_name (str) – ["batch", "instance", "group"] feature normalization type and arguments. In this module, if using "group", in_channels should be divisible by 16 (default value for num_groups).

class monai.networks.blocks.UnetBasicBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, norm_name)

A CNN module that can be used for DynUNet, based on: “Automated Design of Deep Learning Methods for Biomedical Image Segmentation” and “nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation”.

Parameters
  • spatial_dims (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (Union[Sequence[int], int]) – convolution kernel size.

  • stride (Union[Sequence[int], int]) – convolution stride.

  • norm_name (str) – ["batch", "instance", "group"] feature normalization type and arguments. In this module, if using "group", in_channels should be divisible by 16 (default value for num_groups).

class monai.networks.blocks.UnetUpBlock(spatial_dims, in_channels, out_channels, kernel_size, stride, upsample_kernel_size, norm_name)

An upsampling module that can be used for DynUNet, based on: “Automated Design of Deep Learning Methods for Biomedical Image Segmentation” and “nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation”.

Parameters
  • spatial_dims (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (Union[Sequence[int], int]) – convolution kernel size.

  • stride (Union[Sequence[int], int]) – convolution stride.

  • upsample_kernel_size (Union[Sequence[int], int]) – convolution kernel size for transposed convolution layers.

  • norm_name (str) – ["batch", "instance", "group"] feature normalization type and arguments. In this module, if using "group", in_channels should be divisible by 16 (default value for num_groups).
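
For the "group" option of norm_name, the channel count must be divisible by the default 16 groups, as stated above. Below is a hypothetical helper (not part of MONAI) that checks this before a network is built and reports the number of channels per group. Its num_groups default of 16 mirrors the descriptions above.

```python
def channels_per_group(channels: int, num_groups: int = 16) -> int:
    """Hypothetical check mirroring the documented group-norm constraint."""
    if channels % num_groups != 0:
        raise ValueError(f"{channels} channels cannot be split into {num_groups} groups.")
    return channels // num_groups


print(channels_per_group(32))  # 2
print(channels_per_group(64))  # 4
```

The same check applies to ResBlock below by passing its num_groups (8 by default).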

SegResnet Block

class monai.networks.blocks.ResBlock(spatial_dims, in_channels, kernel_size=3, norm_name='group', num_groups=8)

ResBlock employs a skip connection and two convolution blocks. It is used in SegResNet, based on “3D MRI brain tumor segmentation using autoencoder regularization”.

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2 or 3.

  • in_channels (int) – number of input channels.

  • kernel_size (int) – convolution kernel size, the value should be an odd number. Defaults to 3.

  • norm_name (str) – feature normalization type, this module only supports group norm, batch norm and instance norm. Defaults to group.

  • num_groups (int) – number of groups to separate the channels into. In this module, in_channels should be divisible by num_groups. Defaults to 8.

Squeeze-and-Excitation

class monai.networks.blocks.ChannelSELayer(spatial_dims, in_channels, r=2, acti_type_1=('relu', {'inplace': True}), acti_type_2='sigmoid')

Re-implementation of the Squeeze-and-Excitation block based on: “Hu et al., Squeeze-and-Excitation Networks, https://arxiv.org/abs/1709.01507”.

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • in_channels (int) – number of input channels.

  • r (int) – the reduction ratio r in the paper. Defaults to 2.

  • acti_type_1 (Union[Tuple[str, Dict], str]) – activation type of the hidden squeeze layer. Defaults to ("relu", {"inplace": True}).

  • acti_type_2 (Union[Tuple[str, Dict], str]) – activation type of the output squeeze layer. Defaults to “sigmoid”.

Raises

ValueError – When r is nonpositive or larger than in_channels.

forward(x)
Parameters

x (Tensor) – in shape (batch, in_channels, spatial_1[, spatial_2, …]).

Return type

Tensor
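
The squeeze-and-excitation computation can be written out on a toy input. It has three steps:
1. Squeeze: global average pooling of each channel.
2. Excitation: reduce to in_channels // r features with the default ReLU, then expand back with the default sigmoid.
3. Scale: multiply each channel by its gate.

This is a list-based pure Python sketch, not the library's implementation. It is simplified in two ways: the two fully connected layers have no biases, and the weights are passed in explicitly.

```python
import math


def relu(v):
    return max(v, 0.0)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def channel_se(x, w1, w2):
    """Squeeze-and-excitation on x given as a list of channels (flattened spatial values).

    w1: (C // r) x C weights of the squeeze layer; w2: C x (C // r) weights of the
    excitation layer. Biases are omitted in this sketch.
    """
    # Squeeze: global average pooling of each channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: reduce to C // r features with ReLU, expand back to C with sigmoid.
    hidden = [relu(sum(w * zi for w, zi in zip(row, z))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Scale: multiply every value of channel c by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


x = [[1.0, 3.0], [2.0, 2.0]]  # C = 2 channels with 2 spatial positions each
w1 = [[0.0, 0.0]]             # r = 2, so the hidden layer has one unit
w2 = [[0.0], [0.0]]
print(channel_se(x, w1, w2))  # [[0.5, 1.5], [1.0, 1.0]]
```

With all-zero weights every gate is sigmoid(0) = 0.5, so the output is simply half of the input.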

Residual Squeeze-and-Excitation

class monai.networks.blocks.ResidualSELayer(spatial_dims, in_channels, r=2, acti_type_1='leakyrelu', acti_type_2='relu')

A “squeeze-and-excitation”-like layer with a residual connection:

--+-- SE --o--
  |        |
  +--------+

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • in_channels (int) – number of input channels.

  • r (int) – the reduction ratio r in the paper. Defaults to 2.

  • acti_type_1 (Union[Tuple[str, Dict], str]) – activation type of the hidden squeeze layer. Defaults to “leakyrelu”.

  • acti_type_2 (Union[Tuple[str, Dict], str]) – activation type of the output squeeze layer. Defaults to “relu”.

forward(x)
Parameters

x (Tensor) – in shape (batch, in_channels, spatial_1[, spatial_2, …]).

Return type

Tensor

Squeeze-and-Excitation Block

class monai.networks.blocks.SEBlock(spatial_dims, in_channels, n_chns_1, n_chns_2, n_chns_3, conv_param_1=None, conv_param_2=None, conv_param_3=None, project=None, r=2, acti_type_1=('relu', {'inplace': True}), acti_type_2='sigmoid', acti_type_final=('relu', {'inplace': True}))

Residual module enhanced with Squeeze-and-Excitation:

----+- conv1 --  conv2 -- conv3 -- SE -o---
    |                                  |
    +---(channel project if needed)----+

Re-implementation of the SE-Resnet block based on: “Hu et al., Squeeze-and-Excitation Networks, https://arxiv.org/abs/1709.01507”.

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • in_channels (int) – number of input channels.

  • n_chns_1 (int) – number of output channels in the 1st convolution.

  • n_chns_2 (int) – number of output channels in the 2nd convolution.

  • n_chns_3 (int) – number of output channels in the 3rd convolution.

  • conv_param_1 (Optional[Dict]) – additional parameters to the 1st convolution. Defaults to {"kernel_size": 1, "norm": Norm.BATCH, "act": ("relu", {"inplace": True})}

  • conv_param_2 (Optional[Dict]) – additional parameters to the 2nd convolution. Defaults to {"kernel_size": 3, "norm": Norm.BATCH, "act": ("relu", {"inplace": True})}

  • conv_param_3 (Optional[Dict]) – additional parameters to the 3rd convolution. Defaults to {"kernel_size": 1, "norm": Norm.BATCH, "act": None}

  • project (Optional[Convolution]) – when the number of residual channels and the number of output channels do not match, a projection (Conv) layer/block is used to adjust the number of channels. In SENet, it consists of a Conv layer and a Norm layer. Defaults to None, in which case no projection is used if the channels already match, and a Conv layer with kernel size 1 is used otherwise.

  • r (int) – the reduction ratio r in the paper. Defaults to 2.

  • acti_type_1 (Union[Tuple[str, Dict], str]) – activation type of the hidden squeeze layer. Defaults to ("relu", {"inplace": True}).

  • acti_type_2 (Union[Tuple[str, Dict], str]) – activation type of the output squeeze layer. Defaults to “sigmoid”.

  • acti_type_final (Union[Tuple[str, Dict], str, None]) – activation type at the end of the block. Defaults to ("relu", {"inplace": True}).

forward(x)
Parameters

x (Tensor) – in shape (batch, in_channels, spatial_1[, spatial_2, …]).

Return type

Tensor

Squeeze-and-Excitation Bottleneck

class monai.networks.blocks.SEBottleneck(spatial_dims, inplanes, planes, groups, reduction, stride=1, downsample=None)

Bottleneck for SENet154.

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • inplanes – number of input channels.

  • planes – base number of channels of the bottleneck; the number of output channels is a fixed multiple of planes.

  • groups – number of groups of the grouped convolution.

  • reduction – the reduction ratio r of the Squeeze-and-Excitation module.

  • stride – convolution stride used to downsample the input. Defaults to 1.

  • downsample – an optional projection (Conv) layer/block applied to the residual path when the number of channels or the spatial size of the input and the block output do not match. Defaults to None.

Squeeze-and-Excitation Resnet Bottleneck

class monai.networks.blocks.SEResNetBottleneck(spatial_dims, inplanes, planes, groups, reduction, stride=1, downsample=None)

ResNet bottleneck with a Squeeze-and-Excitation module. It follows the Caffe implementation and applies strides=stride in conv1 rather than in conv2 (the latter is used in the torchvision implementation of ResNet).

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • inplanes – number of input channels.

  • planes – base number of channels of the bottleneck; the number of output channels is a fixed multiple of planes.

  • groups – number of groups of the grouped convolution.

  • reduction – the reduction ratio r of the Squeeze-and-Excitation module.

  • stride – convolution stride used to downsample the input. Defaults to 1.

  • downsample – an optional projection (Conv) layer/block applied to the residual path when the number of channels or the spatial size of the input and the block output do not match. Defaults to None.

Squeeze-and-Excitation ResneXt Bottleneck

class monai.networks.blocks.SEResNeXtBottleneck(spatial_dims, inplanes, planes, groups, reduction, stride=1, downsample=None, base_width=4)

ResNeXt bottleneck type C with a Squeeze-and-Excitation module.

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • inplanes – number of input channels.

  • planes – base number of channels of the bottleneck; the number of output channels is a fixed multiple of planes.

  • groups – number of groups (the cardinality) of the grouped convolution.

  • reduction – the reduction ratio r of the Squeeze-and-Excitation module.

  • stride – convolution stride used to downsample the input. Defaults to 1.

  • downsample – an optional projection (Conv) layer/block applied to the residual path when the number of channels or the spatial size of the input and the block output do not match. Defaults to None.

  • base_width – base width used, together with planes and groups, to compute the number of channels of the grouped convolution. Defaults to 4.

Simple ASPP

class monai.networks.blocks.SimpleASPP(spatial_dims, in_channels, conv_out_channels, kernel_sizes=(1, 3, 3, 3), dilations=(1, 2, 4, 6), norm_type='BATCH', acti_type='LEAKYRELU')

A simplified version of the atrous spatial pyramid pooling (ASPP) module.

Chen et al., Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. https://arxiv.org/abs/1802.02611

Wang et al., A Noise-robust Framework for Automatic Segmentation of COVID-19 Pneumonia Lesions from CT Images. https://ieeexplore.ieee.org/document/9109297

Parameters
  • spatial_dims (int) – number of spatial dimensions, could be 1, 2, or 3.

  • in_channels (int) – number of input channels.

  • conv_out_channels (int) – number of output channels of each atrous conv. The final number of output channels is conv_out_channels * len(kernel_sizes).

  • kernel_sizes (Sequence[int]) – a sequence of four convolutional kernel sizes. Defaults to (1, 3, 3, 3) for four (dilated) convolutions.

  • dilations (Sequence[int]) – a sequence of four convolutional dilation parameters. Defaults to (1, 2, 4, 6) for four (dilated) convolutions.

  • norm_type – final kernel-size-one convolution normalization type. Defaults to batch norm.

  • acti_type – final kernel-size-one convolution activation type. Defaults to leaky ReLU.

Raises

ValueError – When kernel_sizes length differs from dilations.

forward(x)
Parameters

x (Tensor) – in shape (batch, channel, spatial_1[, spatial_2, …]).

Return type

Tensor
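
The default kernel_sizes and dilations give branches with progressively larger receptive fields. The pure Python sketch below computes three things:
- how many input positions each dilated kernel spans;
- the padding that would keep the spatial size unchanged;
- the final channel count stated above.

The "same" padding per branch is an assumption of this sketch, and the function name is hypothetical.

```python
def aspp_summary(conv_out_channels, kernel_sizes=(1, 3, 3, 3), dilations=(1, 2, 4, 6)):
    """Per-branch kernel extent and padding, plus the concatenated output channels."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("kernel_sizes and dilations must have the same length.")
    # A kernel of size k with dilation d spans d * (k - 1) + 1 input positions.
    extents = [d * (k - 1) + 1 for k, d in zip(kernel_sizes, dilations)]
    # ASSUMPTION: each branch pads (extent - 1) // 2 so the spatial size is preserved,
    # which is what allows the branch outputs to be concatenated along channels.
    paddings = [(e - 1) // 2 for e in extents]
    out_channels = conv_out_channels * len(kernel_sizes)
    return extents, paddings, out_channels


print(aspp_summary(16))  # ([1, 5, 9, 13], [0, 2, 4, 6], 64)
```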

MaxAvgPooling

class monai.networks.blocks.MaxAvgPool(spatial_dims, kernel_size, stride=None, padding=0, ceil_mode=False)

Downsamples with both max pooling and average pooling, and doubles the number of channels by concatenating the two downsampled feature maps.

Parameters
  • spatial_dims (int) – number of spatial dimensions of the input image.

  • kernel_size (Union[Sequence[int], int]) – the kernel size of both pooling operations.

  • stride (Union[Sequence[int], int, None]) – the stride of the window. Default value is kernel_size.

  • padding (Union[Sequence[int], int]) – implicit zero padding to be added to both pooling operations.

  • ceil_mode (bool) – when True, will use ceil instead of floor to compute the output shape.

forward(x)
Parameters

x (Tensor) – Tensor in shape (batch, channel, spatial_1[, spatial_2, …]).

Return type

Tensor

Returns

Tensor in shape (batch, 2*channel, spatial_1[, spatial_2, …]).
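
The returned shape can be computed ahead of time. The pure Python sketch below applies PyTorch's pooling shape formula to each spatial dimension and doubles the channel count. It assumes:
- a single integer kernel_size, stride, and padding shared by all spatial dimensions;
- no pooling dilation.

The function names are hypothetical.

```python
def pool_out_size(size, kernel_size, stride=None, padding=0, ceil_mode=False):
    """Output length of max/avg pooling along one dimension (PyTorch convention)."""
    stride = kernel_size if stride is None else stride  # stride defaults to kernel_size
    numerator = size + 2 * padding - kernel_size + (stride - 1 if ceil_mode else 0)
    out = numerator // stride + 1
    # With ceil_mode, the last window must start inside the input or the left padding.
    if ceil_mode and (out - 1) * stride >= size + padding:
        out -= 1
    return out


def max_avg_pool_shape(shape, kernel_size, stride=None, padding=0, ceil_mode=False):
    """Shape after max and average pooling are concatenated: channels doubled."""
    batch, channels, *spatial = shape
    pooled = [pool_out_size(s, kernel_size, stride, padding, ceil_mode) for s in spatial]
    return (batch, 2 * channels, *pooled)


print(max_avg_pool_shape((2, 8, 7, 8), kernel_size=2))                  # (2, 16, 3, 4)
print(max_avg_pool_shape((2, 8, 7, 8), kernel_size=2, ceil_mode=True))  # (2, 16, 4, 4)
```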

Upsampling

class monai.networks.blocks.UpSample(dimensions, in_channels, out_channels=None, scale_factor=2, with_conv=False, mode=<UpsampleMode.LINEAR: 'linear'>, align_corners=True)

Upsamples with either a kernel-size-1 convolution followed by interpolation, or a transposed convolution.

Parameters
  • dimensions (int) – number of spatial dimensions of the input image.

  • in_channels (int) – number of channels of the input image.

  • out_channels (Optional[int]) – number of channels of the output image. Defaults to in_channels.

  • scale_factor (Union[Sequence[float], float]) – multiplier for spatial size. Has to match input size if it is a tuple. Defaults to 2.

  • with_conv (bool) – whether to use a transposed convolution for upsampling. Defaults to False.

  • mode (Union[UpsampleMode, str]) – {"nearest", "linear", "bilinear", "bicubic", "trilinear"} The interpolation mode. Defaults to "linear". If the mode ends with "linear", the number of spatial dimensions determines the actual interpolation: linear, bilinear, and trilinear for 1D, 2D, and 3D respectively. See also: https://pytorch.org/docs/stable/nn.html#upsample

  • align_corners (Optional[bool]) – set the align_corners parameter of torch.nn.Upsample. Defaults to True.

forward(x)
Parameters

x (Tensor) – Tensor in shape (batch, channel, spatial_1[, spatial_2, …]).

Return type

Tensor
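
The mode rule and the interpolation path (with_conv=False) can be sketched in pure Python:
- resolve_mode picks the interpolation actually used;
- upsampled_shape gives the output shape when out_channels is left at its default of in_channels.

Both names are hypothetical and this is not MONAI code.

```python
from typing import Sequence, Tuple, Union


def resolve_mode(mode: str, dimensions: int) -> str:
    """Interpolation actually used, following the rule described for `mode`."""
    if mode.endswith("linear"):
        if not 1 <= dimensions <= 3:
            raise ValueError("linear modes support 1D, 2D and 3D inputs only.")
        return ["linear", "bilinear", "trilinear"][dimensions - 1]
    return mode


def upsampled_shape(shape: Tuple[int, ...], scale_factor: Union[float, Sequence[float]] = 2) -> Tuple[int, ...]:
    """Output shape with the channel count unchanged (out_channels == in_channels)."""
    batch, channels, *spatial = shape
    if isinstance(scale_factor, (int, float)):
        factors = [scale_factor] * len(spatial)
    else:
        factors = list(scale_factor)
    if len(factors) != len(spatial):
        raise ValueError("a sequence scale_factor must match the number of spatial dimensions.")
    # Each spatial size is multiplied and rounded down, as torch.nn.Upsample does.
    return (batch, channels, *(int(s * f) for s, f in zip(spatial, factors)))


print(resolve_mode("linear", 3))           # trilinear
print(resolve_mode("nearest", 2))          # nearest
print(upsampled_shape((1, 4, 16, 16, 8)))  # (1, 4, 32, 32, 16)
```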

class monai.networks.blocks.SubpixelUpsample(dimensions, in_channels, scale_factor=2, conv_block=None, apply_pad_pool=True)

Upsamples using a subpixel CNN. This module supports 1D, 2D and 3D input images and consists of two parts:
- First, a convolutional layer increases the number of channels to in_channels * (scale_factor ** dimensions).
- Second, a pixel shuffle operation rearranges these feature maps from the low-resolution space to build the super-resolution space.

The first part of the module is not fixed: a sequence of layers can replace the default single layer.

See: Shi et al., 2016, “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network.”

See: Aitken et al., 2017, “Checkerboard artifact free sub-pixel convolution”.

The idea comes from: https://arxiv.org/abs/1609.05158

The pixel shuffle mechanism refers to: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/PixelShuffle.cpp and: https://github.com/pytorch/pytorch/pull/6340/files

Parameters
  • dimensions (int) – number of spatial dimensions of the input image.

  • in_channels (int) – number of channels of the input image.

  • scale_factor (int) – multiplier for spatial size. Defaults to 2.

  • conv_block (Optional[Module]) – a conv block to extract feature maps before upsampling. Defaults to None, in which case a single default convolutional layer is used.

  • apply_pad_pool (bool) – if True the upsampled tensor is padded then average pooling is applied with a kernel the size of scale_factor with a stride of 1. This implements the nearest neighbour resize convolution component of subpixel convolutions described in Aitken et al.

forward(x)
Parameters

x (Tensor) – Tensor in shape (batch, channel, spatial_1[, spatial_2, …]).

Return type

Tensor
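
The channel bookkeeping of subpixel upsampling is easiest to see in 1D. The convolution produces in_channels * scale_factor ** dimensions channels. Pixel shuffle then interleaves each group of scale_factor channels into one channel that is scale_factor times longer.

This pure Python sketch assumes the channel ordering of torch.nn.functional.pixel_shuffle, generalized to 1D: output channel c gathers input channels c * r to c * r + r - 1. The function names are hypothetical and this is not MONAI code.

```python
def conv_out_channels(in_channels: int, scale_factor: int, dimensions: int) -> int:
    # Channels produced by the convolution before the shuffle.
    return in_channels * scale_factor ** dimensions


def pixel_shuffle_1d(x, r):
    """x: list of C * r channels, each of length L -> list of C channels of length L * r."""
    if len(x) % r != 0:
        raise ValueError("the number of channels must be divisible by r.")
    out = []
    for c in range(len(x) // r):
        group = x[c * r:(c + 1) * r]
        length = len(group[0])
        # Interleave: out[c][l * r + i] = x[c * r + i][l]
        out.append([group[i][l] for l in range(length) for i in range(r)])
    return out


x = [[1, 2, 3], [10, 20, 30]]      # C * r = 2 channels (C = 1, r = 2), length 3
print(pixel_shuffle_1d(x, 2))      # [[1, 10, 2, 20, 3, 30]]
print(conv_out_channels(4, 2, 3))  # 32
```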

Layers

Factories

Defines factories for creating layers in generic, extensible, and dimensionally independent ways. A separate factory object is created for each type of layer, and factory functions keyed to names are added to these objects. Whenever a layer is requested the factory name and any necessary arguments are passed to the factory object. The return value is typically a type but can be any callable producing a layer object.

The factory objects contain functions keyed to names converted to upper case; these names can be referred to as members of the factory so that they can function as constant identifiers, e.g., instance normalisation is named Norm.INSTANCE.

For example, to get a transposed convolution layer, the name is given followed by a dimension argument, which is passed to the factory function:

from monai.networks.layers import Conv

dimension = 3
name = Conv.CONVTRANS
conv = Conv[name, dimension]

This allows the dimension value to be set in the constructor, for example so that the dimensionality of a network is parameterizable. Not all factories require arguments after the name; the caller must be aware which are required.

Defining new factories involves creating the object then associating it with factory functions:

from monai.networks.layers import LayerFactory

fact = LayerFactory()

@fact.factory_function('test')
def make_something(x, y):
    # do something with x and y to choose which layer type to return
    return SomeLayerType
...

# request object from factory TEST with 1 and 2 as values for x and y
layer = fact[fact.TEST, 1, 2]

Typically the caller of a factory would know what arguments to pass (i.e., the dimensionality of the requested type), but it can be parameterized with the factory name and the arguments to pass to the created type at instantiation time:

from monai.networks.layers import split_args

def use_factory(fact_args):
    fact_name, type_args = split_args(fact_args)
    layer_type = fact[fact_name, 1, 2]
    return layer_type(**type_args)
...

kw_args = {'arg0': 0, 'arg1': True}
layer = use_factory((fact.TEST, kw_args))

class monai.networks.layers.LayerFactory

Factory object for creating layers. It uses the given factory functions to actually produce the types or constructing callables. These functions are referred to by name and can be added at any time.

add_factory_callable(name, func)

Add the factory function to this object under the given name.

Return type

None

factory_function(name)

Decorator for adding a factory function with the given name.

Return type

Callable

get_constructor(factory_name, *args)

Get the constructor for the given factory name and arguments.

Raises

TypeError – When factory_name is not a str.

Return type

Any

property names

Produces all factory names.

Return type

Tuple[str, …]

split_args

monai.networks.layers.split_args(args)

Split arguments in a way to be suitable for using with the factory types. If args is a string it’s interpreted as the type name.

Parameters

args (str or a tuple of object name and kwarg dict) – input arguments to be parsed.

Raises

TypeError – When args type is not in Union[str, Tuple[Union[str, Callable], dict]].

Examples:

>>> act_type, args = split_args("PRELU")
>>> monai.networks.layers.Act[act_type]
<class 'torch.nn.modules.activation.PReLU'>

>>> act_type, args = split_args(("PRELU", {"num_parameters": 1, "init": 0.25}))
>>> monai.networks.layers.Act[act_type](**args)
PReLU(num_parameters=1)
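
The parsing rule can be summarized by a small re-implementation. It follows the documented behaviour:
- a bare string becomes the type name with no keyword arguments;
- a (name, kwarg dict) pair is passed through;
- anything else raises TypeError.

This is a hypothetical sketch (split_args_sketch) for illustration, not MONAI's source.

```python
from typing import Any, Callable, Dict, Tuple, Union


def split_args_sketch(args: Union[str, Tuple[Union[str, Callable], Dict[str, Any]]]):
    """Re-implementation sketch of the documented split_args behaviour (not MONAI code)."""
    if isinstance(args, str):
        # A bare string is the type name with no keyword arguments.
        return args, {}
    if isinstance(args, tuple) and len(args) == 2 and isinstance(args[1], dict):
        name, kwargs = args
        if isinstance(name, str) or callable(name):
            return name, kwargs
    raise TypeError("args must be a str or a tuple of (name or callable, kwarg dict).")


print(split_args_sketch("PRELU"))                    # ('PRELU', {})
print(split_args_sketch(("PRELU", {"init": 0.25})))  # ('PRELU', {'init': 0.25})
```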

Dropout

The supported member is: DROPOUT. Please see monai.networks.layers.split_args for additional args parsing.

Act

The supported members are: ELU, RELU, LEAKYRELU, PRELU, RELU6, SELU, CELU, GELU, SIGMOID, TANH, SOFTMAX, LOGSOFTMAX. Please see monai.networks.layers.split_args for additional args parsing.

Norm

The supported members are: INSTANCE, BATCH, GROUP. Please see monai.networks.layers.split_args for additional args parsing.

Conv

The supported members are: CONV, CONVTRANS. Please see monai.networks.layers.split_args for additional args parsing.

Pool

The supported members are: MAX, ADAPTIVEMAX, AVG, ADAPTIVEAVG. Please see monai.networks.layers.split_args for additional args parsing.

SkipConnection

class monai.networks.layers.SkipConnection(submodule, cat_dim=1)

Concatenates the forward pass input with the result of the given submodule along dimension cat_dim.

Flatten

class monai.networks.layers.Flatten

Flattens the given input in the forward pass to be [B,-1] in shape.

GaussianFilter

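class monai.networks.layers.GaussianFilter(spatial_dims, sigma, truncated=4.0)

Parameters
  • spatial_dims (int) – number of spatial dimensions of the input image. The input must have shape (Batch, channels, H[, W, …]).

  • sigma (Union[Sequence[float], float]) – standard deviation of the Gaussian kernel, either one value per spatial dimension or a single value for all dimensions.

  • truncated (float) – the number of standard deviations at which the kernel is truncated. Defaults to 4.0.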

forward(x)
Parameters

x (Tensor) – in shape [Batch, chns, H, W, D].

Raises

TypeError – When x is not a torch.Tensor.

Return type

Tensor
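
The roles of sigma and truncated can be illustrated with a normalized, truncated 1D Gaussian kernel. A Gaussian is separable, so a multi-dimensional filter can be applied as one 1D pass per spatial axis.

This pure Python sketch is hypothetical. In particular, its rule for turning sigma * truncated into an integer half-width is an assumption and may differ from GaussianFilter's exact rounding.

```python
import math


def gaussian_kernel_1d(sigma: float, truncated: float = 4.0):
    """Normalized 1D Gaussian kernel cut off at `truncated` standard deviations.

    ASSUMPTION: half-width = int(max(sigma * truncated, 0.5) + 0.5).
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive.")
    radius = int(max(sigma * truncated, 0.5) + 0.5)
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(weights)
    # Normalize so the kernel sums to 1 and filtering preserves the mean intensity.
    return [w / total for w in weights]


k = gaussian_kernel_1d(sigma=1.0)
print(len(k))  # 9 (half-width 4 for sigma=1.0, truncated=4.0)
```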

Affine Transform

class monai.networks.layers.AffineTransform(spatial_size=None, normalized=False, mode=<GridSampleMode.BILINEAR: 'bilinear'>, padding_mode=<GridSamplePadMode.ZEROS: 'zeros'>, align_corners=False, reverse_indexing=True)

Apply affine transformations with a batch of affine matrices.

When normalized=False and reverse_indexing=True, it performs the commonly used resampling in the ‘pull’ direction, following the scipy.ndimage.affine_transform convention. In this case theta is equivalent to the (ndim+1, ndim+1) input matrix of scipy.ndimage.affine_transform and operates on homogeneous coordinates. See also: https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.affine_transform.html

When normalized=True and reverse_indexing=False, it applies theta to the normalized coordinates (coords. in the range of [-1, 1]) directly. This is often used with align_corners=False to achieve resolution-agnostic resampling, thus useful as a part of trainable modules such as the spatial transformer networks. See also: https://pytorch.org/tutorials/intermediate/spatial_transformer_tutorial.html

Parameters
  • spatial_size (Union[Sequence[int], int, None]) – output spatial shape, the full output shape will be [N, C, *spatial_size] where N and C are inferred from the src input of self.forward.

  • normalized (bool) – indicates whether the provided affine matrix theta is defined for the normalized coordinates. If normalized=False, theta will be converted to operate on normalized coordinates, as PyTorch’s affine_grid works with normalized coordinates.

  • mode (Union[GridSampleMode, str]) – {"bilinear", "nearest"} Interpolation mode to calculate output values. Defaults to "bilinear". See also: https://pytorch.org/docs/stable/nn.functional.html#grid-sample

  • padding_mode (Union[GridSamplePadMode, str]) – {"zeros", "border", "reflection"} Padding mode for outside grid values. Defaults to "zeros". See also: https://pytorch.org/docs/stable/nn.functional.html#grid-sample

  • align_corners (bool) – see also https://pytorch.org/docs/stable/nn.functional.html#grid-sample.

  • reverse_indexing (bool) – whether to reverse the spatial indexing of image and coordinates. Set to False if theta follows PyTorch’s default “D, H, W” convention; set to True if theta follows the scipy.ndimage default “i, j, k” convention.

forward(src, theta, spatial_size=None)[source]

theta must be an affine transformation matrix with shape 3x3 or Nx3x3 or Nx2x3 or 2x3 for spatial 2D transforms, 4x4 or Nx4x4 or Nx3x4 or 3x4 for spatial 3D transforms, where N is the batch size. theta will be converted into float Tensor for the computation.

Parameters
  • src (array_like) – image in spatial 2D or 3D (N, C, spatial_dims), where N is the batch dim, C is the number of channels.

  • theta (array_like) – Nx3x3, Nx2x3, 3x3, 2x3 for spatial 2D inputs, Nx4x4, Nx3x4, 3x4, 4x4 for spatial 3D inputs. When the batch dimension is omitted, theta will be repeated N times, N is the batch dim of src.

  • spatial_size (Union[Sequence[int], int, None]) – output spatial shape, the full output shape will be [N, C, *spatial_size] where N and C are inferred from the src.

Raises
  • TypeError – When theta is not a torch.Tensor.

  • ValueError – When theta is not one of [Nxdxd, dxd].

  • ValueError – When theta is not one of [Nx3x3, Nx4x4].

  • TypeError – When src is not a torch.Tensor.

  • ValueError – When src spatially is not one of [2D, 3D].

  • ValueError – When affine and image batch dimension differ.

Return type

Tensor
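The accepted theta shapes and the related errors listed above can be summarized by a small NumPy sketch that promotes every accepted form to an (N, d+1, d+1) array. The helper normalize_theta is hypothetical and only mirrors the documented shape rules; it is not the code MONAI runs.

```python
import numpy as np


def normalize_theta(theta, batch_size, spatial_dims):
    """Return theta as an (N, d+1, d+1) float32 array.

    Accepts (d, d+1), (d+1, d+1), (N, d, d+1) or (N, d+1, d+1), where d is
    spatial_dims and N must equal batch_size.
    """
    theta = np.asarray(theta, dtype=np.float32)
    d = spatial_dims
    if theta.ndim == 2:
        # no batch dimension: repeat the same matrix for every sample
        theta = np.repeat(theta[None], batch_size, axis=0)
    if theta.ndim != 3 or theta.shape[2] != d + 1 or theta.shape[1] not in (d, d + 1):
        raise ValueError(f"unsupported theta shape {theta.shape} for {d}D")
    if theta.shape[1] == d:
        # append the homogeneous row [0, ..., 0, 1]
        last_row = np.zeros((theta.shape[0], 1, d + 1), dtype=theta.dtype)
        last_row[:, 0, d] = 1.0
        theta = np.concatenate([theta, last_row], axis=1)
    if theta.shape[0] != batch_size:
        raise ValueError("affine and image batch dimension differ")
    return theta


batched = normalize_theta([[1, 0, 0], [0, 1, 0]], batch_size=4, spatial_dims=2)
```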

grid_pull

monai.networks.layers.grid_pull(input, grid, interpolation='linear', bound='zero', extrapolate=True)[source]

Sample an image with respect to a deformation field.

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- etc.

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate'  or BoundType.replicate
- 1 or 'dct1'       or BoundType.dct1
- 2 or 'dct2'       or BoundType.dct2
- 3 or 'dst1'       or BoundType.dst1
- 4 or 'dst2'       or BoundType.dst2
- 5 or 'dft'        or BoundType.dft
- 6 or 'sliding'    or BoundType.sliding [not implemented]
- 7 or 'zero'       or BoundType.zero

A list of values can be provided, in the order [W, H, D], to specify dimension-specific boundary conditions. sliding is a specific condition that only applies to flow fields (with as many channels as dimensions) and cannot be dimension-specific. Note that:

  • dft corresponds to circular padding

  • dct2 corresponds to Neumann boundary conditions (symmetric)

  • dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

See:

https://en.wikipedia.org/wiki/Discrete_cosine_transform https://en.wikipedia.org/wiki/Discrete_sine_transform
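To make the DCT/DFT naming more concrete, here is a pure-Python sketch of how an out-of-range integer index could be folded back into a signal of length n under some of these boundary conditions. It illustrates the conventions only and is not MONAI code; dst1/dst2 additionally flip the sign of reflected values and, like sliding, are left out.

```python
def fold_index(i, n, bound):
    """Map integer index i into [0, n) for a 1D signal of length n,
    or return None when the sample should be treated as zero."""
    if 0 <= i < n:
        return i
    if bound == "zero":
        return None
    if bound == "replicate":  # clamp to the nearest edge sample
        return min(max(i, 0), n - 1)
    if bound == "dft":  # circular padding
        return i % n
    if bound == "dct2":  # symmetric, edge sample repeated: ... 1 0 | 0 1 ...
        period = 2 * n
        j = i % period
        return j if j < n else period - 1 - j
    if bound == "dct1":  # symmetric, edge sample not repeated: ... 2 1 | 0 1 2 ...
        if n == 1:
            return 0
        period = 2 * (n - 1)
        j = i % period
        return j if j < n else period - j
    raise ValueError(f"bound {bound!r} is not covered by this sketch")


folded = {b: [fold_index(i, 4, b) for i in range(-2, 6)]
          for b in ("zero", "replicate", "dft", "dct1", "dct2")}
```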

Parameters
  • input (Tensor) – Input image. (B, C, Wi, Hi, Di).

  • grid (Tensor) – Deformation field. (B, Wo, Ho, Do, 2|3).

  • interpolation (int or list[int], optional) – Interpolation order. Defaults to 1.

  • bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.

  • extrapolate (bool) – Extrapolate out-of-bound data. Defaults to True.

Returns

Deformed image (B, C, Wo, Ho, Do).

Return type

output (torch.Tensor)
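In one dimension, pulling with linear interpolation (order 1) and a zero boundary amounts to the sketch below. It assumes the grid holds absolute voxel coordinates into the input and uses plain Python lists instead of (B, C, ...) tensors; pull_linear_1d is a hypothetical name.

```python
import math


def pull_linear_1d(signal, grid):
    """Sample a 1D signal at fractional positions: out[k] = signal(grid[k]).

    Linear interpolation between the two neighbouring samples; neighbours
    outside [0, len(signal)) contribute zero (zero boundary).
    """
    n = len(signal)

    def value(i):
        return signal[i] if 0 <= i < n else 0.0

    out = []
    for x in grid:
        i0 = math.floor(x)
        w1 = x - i0  # weight of the right neighbour
        w0 = 1.0 - w1  # weight of the left neighbour
        out.append(w0 * value(i0) + w1 * value(i0 + 1))
    return out


sampled = pull_linear_1d([0.0, 10.0, 20.0, 30.0], [0.5, 1.25, 3.0, 3.5])
```

The output has one value per grid position, which is why the output spatial shape (Wo, Ho, Do) follows the grid rather than the input.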

grid_push

monai.networks.layers.grid_push(input, grid, shape=None, interpolation='linear', bound='zero', extrapolate=True)[source]

Splat an image with respect to a deformation field (pull adjoint).

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- etc.

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate'  or BoundType.replicate
- 1 or 'dct1'       or BoundType.dct1
- 2 or 'dct2'       or BoundType.dct2
- 3 or 'dst1'       or BoundType.dst1
- 4 or 'dst2'       or BoundType.dst2
- 5 or 'dft'        or BoundType.dft
- 6 or 'sliding'    or BoundType.sliding [not implemented]
- 7 or 'zero'       or BoundType.zero

A list of values can be provided, in the order [W, H, D], to specify dimension-specific boundary conditions. sliding is a specific condition that only applies to flow fields (with as many channels as dimensions) and cannot be dimension-specific. Note that:

  • dft corresponds to circular padding

  • dct2 corresponds to Neumann boundary conditions (symmetric)

  • dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

Parameters
  • input (Tensor) – Input image (B, C, Wi, Hi, Di).

  • grid (Tensor) – Deformation field (B, Wi, Hi, Di, 2|3).

  • shape – Shape of the source image.

  • interpolation (int or list[int], optional) – Interpolation order. Defaults to 1.

  • bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.

  • extrapolate (bool) – Extrapolate out-of-bound data. Defaults to True.

Returns

Splatted image (B, C, Wo, Ho, Do).

Return type

output (torch.Tensor)
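"Pull adjoint" means that pushing is the transpose of pulling: for any signal x and values y, <pull(x), y> equals <x, push(y)>. The 1D sketch below (linear interpolation, zero boundary, grid in voxel coordinates) checks this numerically; it is an illustration with hypothetical function names, not MONAI's kernel.

```python
import math


def _neighbours(x):
    """Left/right integer neighbours of position x and their linear weights."""
    i0 = math.floor(x)
    w1 = x - i0
    return ((i0, 1.0 - w1), (i0 + 1, w1))


def pull_linear_1d(signal, grid):
    """Linear 'pull' with zero boundary: out[k] = signal(grid[k])."""
    n = len(signal)
    return [sum(w * signal[i] for i, w in _neighbours(x) if 0 <= i < n) for x in grid]


def push_linear_1d(values, grid, n):
    """Linear 'push' (splat) of values[k] at grid[k] onto an output of length n."""
    out = [0.0] * n
    for v, x in zip(values, grid):
        for i, w in _neighbours(x):
            if 0 <= i < n:  # contributions landing outside are dropped
                out[i] += w * v
    return out


x = [1.0, 2.0, 3.0, 4.0]
grid = [0.5, 2.25, 3.5]
y = [1.0, -1.0, 2.0]
lhs = sum(a * b for a, b in zip(pull_linear_1d(x, grid), y))
rhs = sum(a * b for a, b in zip(x, push_linear_1d(y, grid, len(x))))
```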

grid_count

monai.networks.layers.grid_count(grid, shape=None, interpolation='linear', bound='zero', extrapolate=True)[source]

Compute splatting weights with respect to a deformation field (pull adjoint).

This function is equivalent to applying grid_push to an image of ones.

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- etc.

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate'  or BoundType.replicate
- 1 or 'dct1'       or BoundType.dct1
- 2 or 'dct2'       or BoundType.dct2
- 3 or 'dst1'       or BoundType.dst1
- 4 or 'dst2'       or BoundType.dst2
- 5 or 'dft'        or BoundType.dft
- 6 or 'sliding'    or BoundType.sliding [not implemented]
- 7 or 'zero'       or BoundType.zero

A list of values can be provided, in the order [W, H, D], to specify dimension-specific boundary conditions. sliding is a specific condition that only applies to flow fields (with as many channels as dimensions) and cannot be dimension-specific. Note that:

  • dft corresponds to circular padding

  • dct2 corresponds to Neumann boundary conditions (symmetric)

  • dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

Parameters
  • grid (Tensor) – Deformation field (B, Wi, Hi, Di, 2|3).

  • shape – shape of the source image.

  • interpolation (int or list[int], optional) – Interpolation order. Defaults to 1.

  • bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.

  • extrapolate (bool, optional) – Extrapolate out-of-bound data. Defaults to True.

Returns

Splat weights (B, 1, Wo, Ho, Do).

Return type

output (torch.Tensor)
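Since grid_count is described as grid_push applied to an image of ones, a 1D sketch (linear weights, zero boundary, grid in voxel coordinates) makes the result easy to read: each output voxel accumulates the interpolation weights of the samples that land next to it. The function names are hypothetical.

```python
import math


def push_linear_1d(values, grid, n):
    """Linear splat of values[k] at position grid[k] onto an output of length n."""
    out = [0.0] * n
    for v, x in zip(values, grid):
        i0 = math.floor(x)
        w1 = x - i0
        for i, w in ((i0, 1.0 - w1), (i0 + 1, w1)):
            if 0 <= i < n:  # zero boundary: weights falling outside are dropped
                out[i] += w * v
    return out


def count_linear_1d(grid, n):
    """Splatting weights: the push of an all-ones image."""
    return push_linear_1d([1.0] * len(grid), grid, n)


weights = count_linear_1d([0.25, 1.0, 1.5], 3)  # [0.75, 1.75, 0.5]
```

When every sample falls inside the output, the weights sum to the number of samples, because the linear weights of each sample sum to one.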

grid_grad

monai.networks.layers.grid_grad(input, grid, interpolation='linear', bound='zero', extrapolate=True)[source]

Sample the spatial gradients of an image with respect to a deformation field.

interpolation can be an int, a string or an InterpolationType. Possible values are:

- 0 or 'nearest'    or InterpolationType.nearest
- 1 or 'linear'     or InterpolationType.linear
- 2 or 'quadratic'  or InterpolationType.quadratic
- 3 or 'cubic'      or InterpolationType.cubic
- 4 or 'fourth'     or InterpolationType.fourth
- etc.

A list of values can be provided, in the order [W, H, D], to specify dimension-specific interpolation orders.

bound can be an int, a string or a BoundType. Possible values are:

- 0 or 'replicate'  or BoundType.replicate
- 1 or 'dct1'       or BoundType.dct1
- 2 or 'dct2'       or BoundType.dct2
- 3 or 'dst1'       or BoundType.dst1
- 4 or 'dst2'       or BoundType.dst2
- 5 or 'dft'        or BoundType.dft
- 6 or 'sliding'    or BoundType.sliding [not implemented]
- 7 or 'zero'       or BoundType.zero

A list of values can be provided, in the order [W, H, D], to specify dimension-specific boundary conditions. sliding is a specific condition that only applies to flow fields (with as many channels as dimensions) and cannot be dimension-specific. Note that:

  • dft corresponds to circular padding

  • dct2 corresponds to Neumann boundary conditions (symmetric)

  • dst2 corresponds to Dirichlet boundary conditions (antisymmetric)

Parameters
  • input (Tensor) – Input image. (B, C, Wi, Hi, Di).

  • grid (Tensor) – Deformation field. (B, Wo, Ho, Do, 2|3).

  • interpolation (int or list[int], optional) – Interpolation order. Defaults to 1.

  • bound (BoundType, or list[BoundType], optional) – Boundary conditions. Defaults to ‘zero’.

  • extrapolate (bool) – Extrapolate out-of-bound data. Defaults to True.

Returns

Sampled gradients (B, C, Wo, Ho, Do, 2|3).

Return type

output (torch.Tensor)
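The trailing 2|3 dimension of the result holds one gradient component per spatial axis. In 1D with linear interpolation there is a single component: the slope of the interpolant between the two neighbouring samples. The sketch below assumes a zero boundary and voxel-coordinate grid; grad_linear_1d is a hypothetical name.

```python
import math


def grad_linear_1d(signal, grid):
    """Derivative of the linearly interpolated signal with respect to the
    sampling position, with zero boundary (one component per sample)."""
    n = len(signal)

    def value(i):
        return signal[i] if 0 <= i < n else 0.0

    grads = []
    for x in grid:
        i0 = math.floor(x)
        # on [i0, i0 + 1) the interpolant is a straight line with this slope
        grads.append(value(i0 + 1) - value(i0))
    return grads


slopes = grad_linear_1d([0.0, 10.0, 40.0, 90.0], [0.5, 1.5, 2.25, 3.5])
```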

LLTM

class monai.networks.layers.LLTM(input_features, state_size)[source]

This recurrent unit is similar to an LSTM, but differs in that it lacks a forget gate and uses an Exponential Linear Unit (ELU) as its internal activation function. Because this unit never forgets, it is called LLTM, or Long-Long-Term-Memory unit. It has both C++ and CUDA implementations and automatically switches between them according to the device the module is placed on.

Parameters
  • input_features (int) – size of input feature data

  • state_size (int) – size of the state of recurrent unit

Referring to: https://pytorch.org/tutorials/advanced/cpp_extension.html
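For reference, one forward step of the cell from the linked PyTorch tutorial can be restated in NumPy as below. This is a readable sketch of the tutorial's Python prototype, not MONAI's C++/CUDA kernel; the function names and the (3 * state_size, state_size + input_features) weight layout follow the tutorial and are assumptions here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elu(z, alpha=1.0):
    # expm1 on the clipped input avoids overflow warnings for large positive z
    return np.where(z > 0, z, alpha * np.expm1(np.minimum(z, 0.0)))


def lltm_step(x, old_h, old_cell, weights, bias):
    """One LLTM step for a batch.

    x: (B, F), old_h and old_cell: (B, S), weights: (3*S, S+F), bias: (3*S,).
    """
    X = np.concatenate([old_h, x], axis=1)  # (B, S+F)
    gates = X @ weights.T + bias  # (B, 3*S)
    g_in, g_out, g_cand = np.split(gates, 3, axis=1)
    input_gate = sigmoid(g_in)
    output_gate = sigmoid(g_out)
    candidate_cell = elu(g_cand)
    new_cell = old_cell + candidate_cell * input_gate  # no forget gate
    new_h = np.tanh(new_cell) * output_gate
    return new_h, new_cell


rng = np.random.default_rng(0)
B, F, S = 2, 5, 4
x, h, c = rng.standard_normal((B, F)), rng.standard_normal((B, S)), rng.standard_normal((B, S))
new_h, new_c = lltm_step(x, h, c, rng.standard_normal((3 * S, S + F)), np.zeros(3 * S))
```

The missing forget gate shows up in new_cell = old_cell + ...: the previous cell state is always carried over in full.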

Initializes internal Module state, shared by both nn.Module and ScriptModule.

Utilities

monai.networks.layers.convutils.same_padding(kernel_size, dilation=1)[source]

Return the padding value needed to ensure that a convolution with the given kernel size produces an output of the same shape as the input when the stride is 1. For a stride greater than 1, the output shape is the input shape divided by the stride, rounded up.

Raises

NotImplementedError – When np.any((kernel_size - 1) * dilation % 2 == 1), i.e. when the effective kernel size is even and symmetric padding is not possible.

Return type

Union[Tuple[int, …], int]
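For a single (scalar) kernel size and dilation, the rule above reduces to half the dilated kernel span. The sketch below is a hypothetical scalar version for illustration; the documented function also accepts sequences.

```python
def same_padding_scalar(kernel_size, dilation=1):
    """Padding that keeps the spatial size unchanged for stride 1."""
    span = (kernel_size - 1) * dilation  # effective kernel size minus one
    if span % 2 == 1:
        # an even effective kernel cannot be padded symmetrically
        raise NotImplementedError(
            f"same padding not available for kernel_size={kernel_size}, dilation={dilation}"
        )
    return span // 2


pads = [same_padding_scalar(3), same_padding_scalar(5), same_padding_scalar(3, dilation=2)]  # [1, 2, 2]
```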

monai.networks.layers.convutils.calculate_out_shape(in_shape, kernel_size, stride, padding)[source]

Calculate the output tensor shape when applying a convolution to a tensor of shape in_shape with kernel size kernel_size, stride value stride, and input padding value padding. All arguments can be scalars or multiple values; the return value is a scalar if all inputs are scalars.

Return type

Union[Tuple[int, …], int]
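Assuming the usual convolution output-size formula (as documented for PyTorch convolution layers with dilation 1), a quick numerical check shows that "same" padding with stride s produces ceil(in_size / s) outputs. conv_out_size is a hypothetical scalar helper.

```python
import math


def conv_out_size(in_size, kernel_size, stride, padding):
    """Output length along one spatial dimension (dilation 1)."""
    return (in_size - kernel_size + 2 * padding) // stride + 1


# with kernel 3 and padding 1 the output is the input divided by the stride, rounded up
ceil_holds = all(
    conv_out_size(n, 3, s, 1) == math.ceil(n / s) for n in range(1, 41) for s in range(1, 5)
)
```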

monai.networks.layers.convutils.gaussian_1d(sigma, truncated=4.0)[source]

One-dimensional Gaussian kernel.

Parameters
  • sigma (Union[float, Tensor]) – std of the kernel

  • truncated (float) – tail length, in units of sigma.

Raises

ValueError – When sigma is non-positive.

Return type

ndarray

Returns

1D torch tensor
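A sampled, normalized Gaussian truncated at about truncated * sigma on each side can be sketched as follows. The radius rounding is an assumption and may differ from MONAI's exact implementation; gaussian_kernel_1d is a hypothetical name and returns a plain list.

```python
import math


def gaussian_kernel_1d(sigma, truncated=4.0):
    """Normalized Gaussian weights on integer offsets -tail..tail."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    tail = int(max(sigma * truncated, 0.5) + 0.5)  # kernel radius in samples
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in range(-tail, tail + 1)]
    total = sum(weights)
    return [w / total for w in weights]


kernel = gaussian_kernel_1d(1.0)  # 9 taps: radius 4 for sigma=1, truncated=4
```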

Nets

Ahnet

class monai.networks.nets.AHNet(layers=(3, 4, 6, 3), spatial_dims=3, out_channels=1, upsample_mode='transpose')[source]

AHNet based on Anisotropic Hybrid Network. Adapted from lsqshr’s official code. In addition to the 3D inputs supported by the original network, this implementation also supports 2D inputs. According to the tests for deconvolutions, using "transpose" is faster than linear interpolation, so this implementation sets "transpose" as the default upsampling method.

To meet the requirements of the structure, in "transpose" mode the input size of the first dim-1 dimensions should be divisible by 128. For linear-interpolation-based upsampling modes, the input size of the first dim-1 dimensions should be divisible by 32 and no less than 128. If you need smaller sizes, reduce the largest blocks in the PSP module and change num_input_features in the Final module.
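The size rules above can be checked before building the network. check_ahnet_spatial_size is a hypothetical helper that encodes those rules as stated (the last spatial dimension is not constrained by them); it is not part of MONAI.

```python
def check_ahnet_spatial_size(spatial_size, upsample_mode="transpose"):
    """Return True if the first dim-1 spatial sizes satisfy the stated rules."""
    for size in spatial_size[:-1]:
        if upsample_mode == "transpose":
            if size % 128 != 0:
                return False
        elif size % 32 != 0 or size < 128:
            # "bilinear" / "trilinear": divisible by 32 and at least 128
            return False
    return True


ok_transpose = check_ahnet_spatial_size((128, 128, 64))  # True
ok_linear = check_ahnet_spatial_size((160, 128, 64), "trilinear")  # True
```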

In order to use pretrained weights from 2D FCN/MCFCN, please call the copy_from function, for example:

import monai
model = monai.networks.nets.AHNet(out_channels=2, upsample_mode='transpose')
model2d = monai.networks.blocks.FCN()
model.copy_from(model2d)
Parameters
  • layers (tuple) – number of residual blocks for 4 layers of the network (layer1…layer4). Defaults to (3, 4, 6, 3).

  • spatial_dims (int) – spatial dimension of the input data. Defaults to 3.

  • out_channels (int) – number of output channels for the network. Defaults to 1.

  • upsample_mode (str) –

    ["transpose", "bilinear", "trilinear"] The mode of upsampling manipulations. Using the last two modes cannot guarantee the model’s reproducibility. Defaults to transpose.

    • "transpose", uses transposed convolution layers.

    • "bilinear", uses bilinear interpolation.

    • "trilinear", uses trilinear interpolation.

Densenet3D

class monai.networks.nets.DenseNet(spatial_dims, in_channels, out_channels, init_features=64, growth_rate=32, block_config=(6, 12, 24, 16), bn_size=4, dropout_prob=0.0)[source]

Densenet based on: Densely Connected Convolutional Networks. Adapted from PyTorch Hub 2D version.

Parameters
  • spatial_dims (int) – number of spatial dimensions of the input image.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output classes.

  • init_features (int) – number of filters in the first convolution layer.

  • growth_rate (int) – how many filters to add each layer (k in paper).

  • block_config (Sequence[int]) – how many layers in each pooling block.

  • bn_size (int) – multiplicative factor for the number of bottleneck layers (i.e. bn_size * k features in the bottleneck layer).

  • dropout_prob (float) – dropout rate after each dense layer.
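The interaction of init_features, growth_rate and block_config can be traced with simple arithmetic. The sketch assumes, as in the torchvision reference implementation this class is adapted from, that each dense layer adds growth_rate channels and each transition layer between blocks halves the channel count; densenet_channels is a hypothetical helper.

```python
def densenet_channels(init_features=64, growth_rate=32, block_config=(6, 12, 24, 16)):
    """Channel count at the end of each dense block."""
    channels = init_features
    per_block = []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth_rate  # each dense layer adds k channels
        per_block.append(channels)
        if i != len(block_config) - 1:
            channels //= 2  # transition layer (assumed to halve the channels)
    return per_block


default_channels = densenet_channels()  # [256, 512, 1024, 1024]
```

With the default bn_size=4 and growth_rate=32, each bottleneck layer works with bn_size * k = 128 features.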

monai.networks.nets.densenet121(**kwargs)[source]
Return type

DenseNet

monai.networks.nets.densenet169(**kwargs)[source]
Return type

DenseNet

monai.networks.nets.densenet201(**kwargs)[source]
Return type

DenseNet

monai.networks.nets.densenet264(**kwargs)[source]
Return type

DenseNet

SegResnet

class monai.networks.nets.SegResNet(spatial_dims=3, init_filters=8, in_channels=1, out_channels=2, dropout_prob=None, norm_name='group', num_groups=8, use_conv_final=True, blocks_down=(1, 2, 2, 4), blocks_up=(1, 1, 1), upsample_mode='trilinear')[source]

SegResNet based on 3D MRI brain tumor segmentation using autoencoder regularization. The module does not include the variational autoencoder (VAE). The model supports 2D or 3D inputs.

Parameters
  • spatial_dims (int) – spatial dimension of the input data. Defaults to 3.

  • init_filters (int) – number of output channels for initial convolution layer. Defaults to 8.

  • in_channels (int) – number of input channels for the network. Defaults to 1.

  • out_channels (int) – number of output channels for the network. Defaults to 2.

  • dropout_prob (Optional[float]) – probability of an element to be zeroed. Defaults to None.

  • norm_name (str) – feature normalization type; this module only supports group norm, batch norm and instance norm. Defaults to group.

  • num_groups (int) – number of groups to separate the channels into. Defaults to 8.

  • use_conv_final (bool) – whether to add a final convolution block to the output. Defaults to True.

  • blocks_down (tuple) – number of downsample blocks in each layer. Defaults to (1, 2, 2, 4).

  • blocks_up (tuple) – number of upsample blocks in each layer. Defaults to (1, 1, 1).

  • upsample_mode (str) –

    ["transpose", "bilinear", "trilinear"] The mode of upsampling manipulations. Using the last two modes cannot guarantee the model’s reproducibility. Defaults to "trilinear".

    • transpose, uses transposed convolution layers.

    • bilinear, uses bilinear interpolation.

    • trilinear, uses trilinear interpolation.

SegResnetVAE

class monai.networks.nets.SegResNetVAE(input_image_size, vae_estimate_std=False, vae_default_std=0.3, vae_nz=256, spatial_dims=3, init_filters=8, in_channels=1, out_channels=2, dropout_prob=None, norm_name='group', num_groups=8, use_conv_final=True, blocks_down=(1, 2, 2, 4), blocks_up=(1, 1, 1), upsample_mode='trilinear')[source]

SegResNetVAE based on 3D MRI brain tumor segmentation using autoencoder regularization. The module contains the variational autoencoder (VAE). The model supports 2D or 3D inputs.

Parameters
  • spatial_dims (int) – spatial dimension of the input data. Defaults to 3.

  • init_filters (int) – number of output channels for initial convolution layer. Defaults to 8.

  • in_channels (int) – number of input channels for the network. Defaults to 1.

  • out_channels (int) – number of output channels for the network. Defaults to 2.

  • dropout_prob (Optional[float]) – probability of an element to be zeroed. Defaults to None.

  • norm_name (str) – feature normalization type; this module only supports group norm, batch norm and instance norm. Defaults to group.

  • num_groups (int) – number of groups to separate the channels into. Defaults to 8.

  • use_conv_final (bool) – whether to add a final convolution block to the output. Defaults to True.

  • blocks_down (tuple) – number of downsample blocks in each layer. Defaults to (1, 2, 2, 4).

  • blocks_up (tuple) – number of upsample blocks in each layer. Defaults to (1, 1, 1).

  • upsample_mode (str) –

    ["transpose", "bilinear", "trilinear"] The mode of upsampling manipulations. Using the last two modes cannot guarantee the model’s reproducibility. Defaults to "trilinear".

    • transpose, uses transposed convolution layers.

    • bilinear, uses bilinear interpolation.

    • trilinear, uses trilinear interpolation.

  • use_vae – whether to use the variational autoencoder (VAE) during training. Defaults to False.

  • input_image_size (Sequence[int]) – the size of images to input into the network. It is used to determine the in_features of the fc layer in VAE. When use_vae == True, please ensure that this parameter is set. Defaults to None.

  • vae_estimate_std (bool) – whether to estimate the standard deviations in VAE. Defaults to False.

  • vae_default_std (float) – if not to estimate the std, use the default value. Defaults to 0.3.

  • vae_nz (int) – number of latent variables in VAE. Defaults to 256, of which 128 represent the mean and 128 represent the standard deviation.

Senet

class monai.networks.nets.SENet(spatial_dims, in_channels, block, layers, groups, reduction, dropout_prob=0.2, dropout_dim=1, inplanes=128, downsample_kernel_size=3, input_3x3=True, num_classes=1000)[source]

SENet based on Squeeze-and-Excitation Networks. Adapted from Cadene Hub 2D version.

Parameters
  • spatial_dims (int) – spatial dimension of the input data.

  • in_channels (int) – channel number of the input data.

  • block (Type[Union[SEBottleneck, SEResNetBottleneck, SEResNeXtBottleneck]]) – SEBlock class:

    • for SENet154: SEBottleneck

    • for SE-ResNet models: SEResNetBottleneck

    • for SE-ResNeXt models: SEResNeXtBottleneck

  • layers (List[int]) – number of residual blocks for 4 layers of the network (layer1…layer4).

  • groups (int) – number of groups for the 3x3 convolution in each bottleneck block:

    • for SENet154: 64

    • for SE-ResNet models: 1

    • for SE-ResNeXt models: 32

  • reduction (int) – reduction ratio for Squeeze-and-Excitation modules: 16 for all models.

  • dropout_prob (Optional[float]) – drop probability for the Dropout layer. If None, the Dropout layer is not used:

    • for SENet154: 0.2

    • for SE-ResNet models: None

    • for SE-ResNeXt models: None

  • dropout_dim (int) – determines the dimensions of dropout. Defaults to 1. When dropout_dim = 1, randomly zeroes some of the elements for each channel. When dropout_dim = 2, randomly zeroes out entire channels (a channel is a 2D feature map). When dropout_dim = 3, randomly zeroes out entire channels (a channel is a 3D feature map).

  • inplanes (int) – number of input channels for layer1:

    • for SENet154: 128

    • for SE-ResNet models: 64

    • for SE-ResNeXt models: 64

  • downsample_kernel_size (int) – kernel size for downsampling convolutions in layer2, layer3 and layer4:

    • for SENet154: 3

    • for SE-ResNet models: 1

    • for SE-ResNeXt models: 1

  • input_3x3 (bool) – if True, use three 3x3 convolutions instead of a single 7x7 convolution in layer0:

    • for SENet154: True

    • for SE-ResNet models: False

    • for SE-ResNeXt models: False

  • num_classes (int) – number of outputs in the last_linear layer: 1000 for all models.

monai.networks.nets.senet154(spatial_dims, in_channels, num_classes)[source]
Return type

SENet

monai.networks.nets.se_resnet50(spatial_dims, in_channels, num_classes)[source]
Return type

SENet

monai.networks.nets.se_resnet101(spatial_dims, in_channels, num_classes)[source]
Return type

SENet

monai.networks.nets.se_resnet152(spatial_dims, in_channels, num_classes)[source]
Return type

SENet

monai.networks.nets.se_resnext50_32x4d(spatial_dims, in_channels, num_classes)[source]
Return type

SENet

monai.networks.nets.se_resnext101_32x4d(spatial_dims, in_channels, num_classes)[source]
Return type

SENet

Highresnet

class monai.networks.nets.HighResNet(spatial_dims=3, in_channels=1, out_channels=1, norm_type=<Normalisation.BATCH: 'batch'>, acti_type=<Activation.RELU: 'relu'>, dropout_prob=None, layer_params=({'name': 'conv_0', 'n_features': 16, 'kernel_size': 3}, {'name': 'res_1', 'n_features': 16, 'kernels': (3, 3), 'repeat': 3}, {'name': 'res_2', 'n_features': 32, 'kernels': (3, 3), 'repeat': 3}, {'name': 'res_3', 'n_features': 64, 'kernels': (3, 3), 'repeat': 3}, {'name': 'conv_1', 'n_features': 80, 'kernel_size': 1}, {'name': 'conv_2', 'kernel_size': 1}))[source]

Reimplementation of highres3dnet based on Li et al., “On the compactness, efficiency, and representation of 3D convolutional networks: Brain parcellation as a pretext task”, IPMI ‘17

Adapted from: https://github.com/NifTK/NiftyNet/blob/v0.6.0/niftynet/network/highres3dnet.py https://github.com/fepegar/highresnet

Parameters
  • spatial_dims (int) – number of spatial dimensions of the input image.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • norm_type (Union[Normalisation, str]) – {"batch", "instance"} Feature normalisation with batchnorm or instancenorm. Defaults to "batch".

  • acti_type (Union[Activation, str]) – {"relu", "prelu", "relu6"} Non-linear activation using ReLU, PReLU or ReLU6. Defaults to "relu".

  • dropout_prob (Optional[float]) – probability of the feature map to be zeroed (only applies to the penultimate conv layer).

  • layer_params (Sequence[Dict]) – specifying key parameters of each layer/block.

class monai.networks.nets.HighResBlock(spatial_dims, in_channels, out_channels, kernels=(3, 3), dilation=1, norm_type=<Normalisation.INSTANCE: 'instance'>, acti_type=<Activation.RELU: 'relu'>, channel_matching=<ChannelMatching.PAD: 'pad'>)[source]
Parameters
  • spatial_dims (int) – number of spatial dimensions of the input image.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernels (Sequence[int]) – each integer k in kernels corresponds to a convolution layer with kernel size k.

  • dilation (Union[Sequence[int], int]) – spacing between kernel elements.

  • norm_type (Union[Normalisation, str]) – {"batch", "instance"} Feature normalisation with batchnorm or instancenorm. Defaults to "instance".

  • acti_type (Union[Activation, str]) – {"relu", "prelu", "relu6"} Non-linear activation using ReLU, PReLU or ReLU6. Defaults to "relu".

  • channel_matching (Union[ChannelMatching, str]) –

    {"pad", "project"} Specifies handling residual branch and conv branch channel mismatches. Defaults to "pad".

    • "pad": with zero padding.

    • "project": with a trainable convolution with kernel size 1.

Raises

ValueError – When channel_matching="pad" and in_channels > out_channels, since zero padding cannot reduce the number of channels.
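The "pad" strategy can be sketched in NumPy by zero-padding the channel axis of the residual branch up to out_channels. Splitting the padding between both ends of the channel axis is an assumption of this sketch, and match_channels_by_padding is a hypothetical name.

```python
import numpy as np


def match_channels_by_padding(residual, out_channels):
    """Zero-pad axis 1 of an (N, C, ...) array up to out_channels channels."""
    in_channels = residual.shape[1]
    if in_channels > out_channels:
        raise ValueError("channel_matching='pad' requires in_channels <= out_channels")
    pad = out_channels - in_channels
    pad_width = [(0, 0)] * residual.ndim
    pad_width[1] = (pad // 2, pad - pad // 2)  # assumed: split across both ends
    return np.pad(residual, pad_width)  # constant mode, zeros


r = np.arange(48, dtype=np.float32).reshape(1, 3, 4, 4)
padded = match_channels_by_padding(r, 8)  # shape (1, 8, 4, 4)
```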

DynUNet

class monai.networks.nets.DynUNet(spatial_dims, in_channels, out_channels, kernel_size, strides, upsample_kernel_size, norm_name='instance', deep_supervision=True, deep_supr_num=1, res_block=False)[source]

This reimplementation of a dynamic UNet (DynUNet) is based on: Automated Design of Deep Learning Methods for Biomedical Image Segmentation. nnU-Net: Self-adapting Framework for U-Net-Based Medical Image Segmentation.

This model is more flexible than monai.networks.nets.UNet in three ways:

  • Residual connection is supported in conv blocks.

  • Anisotropic kernel sizes and strides can be used in each layer.

  • Deep supervision heads can be added.

The model supports 2D or 3D inputs and consists of four kinds of blocks: one input block, n downsample blocks, one bottleneck and n+1 upsample blocks, where n > 0. The first and last values of the kernel_size and strides sequences are used for the input block and the bottleneck respectively, and the remaining values are used for the downsample and upsample blocks. Therefore, please ensure that the input sequences (kernel_size and strides) have a length of at least 3, so that there is at least one downsample and one upsample block.
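The way kernel_size and strides are split across blocks can be sketched as follows. dynunet_layout is a hypothetical helper; requiring both sequences to have the same length and using scalar strides for the total downsampling factor are simplifying assumptions of the sketch.

```python
from math import prod


def dynunet_layout(kernel_size, strides):
    """Assign (kernel, stride) pairs to blocks as described above."""
    if len(kernel_size) != len(strides):
        raise ValueError("kernel_size and strides are assumed to have the same length")
    if len(strides) < 3:
        raise ValueError("need at least 3 entries: input block, >= 1 downsample block, bottleneck")
    return {
        "input": (kernel_size[0], strides[0]),
        "downsample": list(zip(kernel_size[1:-1], strides[1:-1])),
        "bottleneck": (kernel_size[-1], strides[-1]),
        "num_upsample": len(strides) - 1,  # n + 1 upsample blocks for n downsample blocks
        "total_stride": prod(strides),  # overall downsampling factor (scalar strides)
    }


layout = dynunet_layout([3, 3, 3, 3, 3], [1, 2, 2, 2, 2])
```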

Parameters
  • spatial_dims (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • kernel_size (Sequence[Union[Sequence[int], int]]) – convolution kernel size.

  • strides (Sequence[Union[Sequence[int], int]]) – convolution strides for each block.

  • upsample_kernel_size (Sequence[Union[Sequence[int], int]]) – convolution kernel size for transposed convolution layers.

  • norm_name (str) – ["batch", "instance", "group"] feature normalization type and arguments.

  • deep_supervision (bool) – whether to add deep supervision head before output. Defaults to True. If added, in training mode, the network will output not only the last feature maps (after being converted via output block), but also the previous feature maps that come from the intermediate up sample layers.

  • deep_supr_num (int) – number of feature maps output by the deep supervision head. The value should be less than the number of upsample layers. Defaults to 1.

  • res_block (bool) – whether to use residual connection based convolution blocks in the network. Defaults to False.

Unet

class monai.networks.nets.UNet(dimensions, in_channels, out_channels, channels, strides, kernel_size=3, up_kernel_size=3, num_res_units=0, act='PRELU', norm='INSTANCE', dropout=0)[source]

Enhanced version of UNet which has residual units implemented with the ResidualUnit class. The residual part uses a convolution to change the input dimensions to match the output dimensions if this is necessary but will use nn.Identity if not. Refer to: https://link.springer.com/chapter/10.1007/978-3-030-12029-0_40.

Parameters
  • dimensions (int) – number of spatial dimensions.

  • in_channels (int) – number of input channels.

  • out_channels (int) – number of output channels.

  • channels (Sequence[int]) – sequence of channels. Top block first.

  • strides (Sequence[int]) – sequence of convolution strides, one per downsampling layer; its length should be len(channels) - 1.

  • kernel_size (Union[Sequence[int], int]) – convolution kernel size. Defaults to 3.

  • up_kernel_size (Union[Sequence[int], int]) – upsampling convolution kernel size. Defaults to 3.

  • num_res_units (int) – number of residual units. Defaults to 0.

  • act – activation type and arguments. Defaults to PReLU.

  • norm – feature normalization type and arguments. Defaults to instance norm.

  • dropout – dropout ratio. Defaults to no dropout.
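To see how channels and strides interact, the sketch below lists the channel count and spatial size at each encoder level. It assumes each stride belongs to the downsampling convolution that produces the matching entry of channels, that the last (bottom) entry keeps the spatial size, and that "same" padding maps a size n to ceil(n / s); unet_level_sizes is a hypothetical helper.

```python
def unet_level_sizes(in_size, channels, strides):
    """(channels, spatial size) per encoder level, top block first."""
    if len(strides) != len(channels) - 1:
        raise ValueError("expected len(strides) == len(channels) - 1")
    levels = []
    n = in_size
    for c, s in zip(channels[:-1], strides):
        n = -(-n // s)  # ceiling division: same padding with stride s
        levels.append((c, n))
    levels.append((channels[-1], n))  # bottom layer keeps the spatial size
    return levels


levels = unet_level_sizes(96, (16, 32, 64, 128, 256), (2, 2, 2, 2))
```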

monai.networks.nets.Unet

alias of monai.networks.nets.unet.UNet

monai.networks.nets.unet

alias of monai.networks.nets.unet.UNet

Vnet

class monai.networks.nets.VNet(spatial_dims=3, in_channels=1, out_channels=1, act=('elu', {'inplace': True}), dropout_prob=0.5, dropout_dim=3)[source]

V-Net based on Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation. Adapted from the official Caffe implementation and another PyTorch implementation. The model supports 2D or 3D inputs.

Parameters
  • spatial_dims (int) – spatial dimension of the input data. Defaults to 3.

  • in_channels (int) – number of input channels for the network. Defaults to 1. The value should meet the condition that 16 % in_channels == 0.

  • out_channels (int) – number of output channels for the network. Defaults to 1.

  • act (Union[Tuple[str, Dict], str]) – activation type in the network. Defaults to ("elu", {"inplace": True}).

  • dropout_prob (float) – dropout ratio. Defaults to 0.5.

  • dropout_dim (int) –

    determines the dimensions of dropout. Defaults to 3.

    • dropout_dim = 1, randomly zeroes some of the elements for each channel.

    • dropout_dim = 2, randomly zeroes out entire channels (a channel is a 2D feature map).

    • dropout_dim = 3, randomly zeroes out entire channels (a channel is a 3D feature map).

Generator

class monai.networks.nets.Generator(latent_shape, start_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True)[source]

Defines a simple generator network that accepts a latent vector and, through a sequence of convolution layers, constructs an output tensor of greater size and higher dimensionality. The method _get_layer is used to create each of these layers; override this method to define layers beyond the default Convolution or ResidualUnit layers.

For example, a generator accepting a latent vector of shape (42, 24) and producing an output volume of shape (1, 64, 64) can be constructed as:

from monai.networks.nets import Generator
gen = Generator((42, 24), (64, 8, 8), (32, 16, 1), (2, 2, 2))

Construct the generator network with the number of layers defined by channels and strides. In the forward pass, an nn.Linear layer maps the input latent vector to a tensor of shape start_shape, which is then fed forward through the sequence of convolutional layers. The number of layers is defined by the length of channels and strides, which must match; each layer has the number of output channels given in channels and an upsample factor given in strides (i.e. a transpose convolution with that stride size).
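The output shape of the example above follows from start_shape, channels and strides alone. The sketch assumes each layer multiplies every spatial dimension by its stride and emits the corresponding number of channels; generator_output_shape is a hypothetical helper.

```python
def generator_output_shape(start_shape, channels, strides):
    """Output shape (minus batch) of a Generator-style stack of layers."""
    if len(channels) != len(strides):
        raise ValueError("channels and strides must have the same length")
    spatial = list(start_shape[1:])  # start_shape is (C, *spatial)
    for s in strides:
        spatial = [d * s for d in spatial]  # each layer upsamples by its stride
    return (channels[-1], *spatial)


shape = generator_output_shape((64, 8, 8), (32, 16, 1), (2, 2, 2))  # (1, 64, 64)
```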

Parameters
  • latent_shape (Sequence[int]) – tuple of integers stating the dimension of the input latent vector (minus batch dimension)

  • start_shape (Sequence[int]) – tuple of integers stating the dimension of the tensor to pass to convolution subnetwork

  • channels (Sequence[int]) – tuple of integers stating the output channels of each convolutional layer

  • strides (Sequence[int]) – tuple of integers stating the stride (upscale factor) of each convolutional layer

  • kernel_size (Union[Sequence[int], int]) – integer or tuple of integers stating size of convolutional kernels

  • num_res_units (int) – integer stating number of convolutions in residual units, 0 means no residual units

  • act – name or type defining activation layers

  • norm – name or type defining normalization layers

  • dropout (Optional[float]) – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout

  • bias (bool) – boolean stating if convolution layers should have a bias component
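The shape bookkeeping described above can be checked with plain arithmetic. The following pure-Python sketch uses a hypothetical helper, generator_shapes, that is not part of MONAI. It assumes that the initial nn.Linear layer maps the flattened latent vector to the flattened start_shape, and that each transposed convolution multiplies every spatial size by exactly its stride. Under those assumptions it reproduces the (1, 64, 64) output of the example.

```python
from math import prod
from typing import Sequence, Tuple


def generator_shapes(
    latent_shape: Sequence[int],
    start_shape: Sequence[int],
    channels: Sequence[int],
    strides: Sequence[int],
) -> Tuple[Tuple[int, int], Tuple[int, ...]]:
    """Hypothetical helper: return the (in, out) features of the initial linear
    layer and the final output shape (minus batch dimension)."""
    if len(channels) != len(strides):
        raise ValueError("channels and strides must have the same length")
    # Assumed: nn.Linear maps prod(latent_shape) inputs to prod(start_shape) outputs.
    linear_features = (prod(latent_shape), prod(start_shape))
    spatial = list(start_shape[1:])  # start_shape is (channels, *spatial)
    for stride in strides:
        # Assumed: a transposed convolution upsamples each spatial size by its stride.
        spatial = [size * stride for size in spatial]
    return linear_features, (channels[-1], *spatial)


print(generator_shapes((42, 24), (64, 8, 8), (32, 16, 1), (2, 2, 2)))
```

Only the last entry of channels determines the output channel count; the earlier entries affect the intermediate feature maps.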

Regressor

class monai.networks.nets.Regressor(in_shape, out_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True)[source]

This defines a network for relating large-sized input tensors to small output tensors, i.e. regressing large values to a prediction. An output of a single dimension can be used as a value regression or multi-label classification prediction; an output of a single value can be used as a discriminator or critic prediction.

Construct the regressor network with the number of layers defined by channels and strides. In the forward pass, inputs are first passed through the convolutional layers, and the result is then passed through a fully connected layer that relates it to the final output tensor.

Parameters
  • in_shape (Sequence[int]) – tuple of integers stating the dimension of the input tensor (minus batch dimension)

  • out_shape (Sequence[int]) – tuple of integers stating the dimension of the final output tensor

  • channels (Sequence[int]) – tuple of integers stating the output channels of each convolutional layer

  • strides (Sequence[int]) – tuple of integers stating the stride (downscale factor) of each convolutional layer

  • kernel_size (Union[Sequence[int], int]) – integer or tuple of integers stating size of convolutional kernels

  • num_res_units (int) – integer stating number of convolutions in residual units, 0 means no residual units

  • act – name or type defining activation layers

  • norm – name or type defining normalization layers

  • dropout (Optional[float]) – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout

  • bias (bool) – boolean stating if convolution layers should have a bias component
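To see how large the fully connected layer becomes, the downsampling can be traced by hand. This pure-Python sketch uses hypothetical helpers (conv_out_size, regressor_shapes) and assumes "same"-style padding of (kernel_size - 1) // 2, so each strided convolution yields ceil(n / stride) for odd kernel sizes. It also assumes that residual units do not change the spatial shape.

```python
from math import prod
from typing import Sequence, Tuple


def conv_out_size(n: int, kernel_size: int = 3, stride: int = 1) -> int:
    """Standard convolution output size, assuming padding = (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (n + 2 * padding - (kernel_size - 1) - 1) // stride + 1


def regressor_shapes(
    in_shape: Sequence[int],
    out_shape: Sequence[int],
    channels: Sequence[int],
    strides: Sequence[int],
    kernel_size: int = 3,
) -> Tuple[Tuple[int, ...], int, int]:
    """Hypothetical helper: return the shape after the convolutional layers and
    the (in, out) feature counts of the final fully connected layer."""
    spatial = list(in_shape[1:])  # in_shape is (channels, *spatial)
    for stride in strides:
        spatial = [conv_out_size(size, kernel_size, stride) for size in spatial]
    conv_shape = (channels[-1], *spatial)
    return conv_shape, prod(conv_shape), prod(out_shape)


print(regressor_shapes((1, 64, 64), (1,), (8, 16, 32), (2, 2, 2)))
```

Here a (1, 64, 64) input is reduced to (32, 8, 8) features, so the final layer relates 2048 values to a single output.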

Classifier

class monai.networks.nets.Classifier(in_shape, classes, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=None, bias=True, last_act=None)[source]

Defines a classification network from Regressor by specifying the output shape as a single-dimensional tensor with size equal to the number of classes to predict. The final activation function can also be specified, e.g. softmax or sigmoid.

Parameters
  • in_shape (Sequence[int]) – tuple of integers stating the dimension of the input tensor (minus batch dimension)

  • classes (int) – integer stating the dimension of the final output tensor

  • channels (Sequence[int]) – tuple of integers stating the output channels of each convolutional layer

  • strides (Sequence[int]) – tuple of integers stating the stride (downscale factor) of each convolutional layer

  • kernel_size (Union[Sequence[int], int]) – integer or tuple of integers stating size of convolutional kernels

  • num_res_units (int) – integer stating number of convolutions in residual units, 0 means no residual units

  • act – name or type defining activation layers

  • norm – name or type defining normalization layers

  • dropout (Optional[float]) – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout

  • bias (bool) – boolean stating if convolution layers should have a bias component

  • last_act (Optional[str]) – name defining the last activation layer
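The choice of last_act changes how the class outputs are interpreted. The pure-Python sketch below is illustrative only; it implements the two usual activations on a plain list of logits and is not MONAI code. Softmax produces mutually exclusive class probabilities that sum to 1. Sigmoid scores each class independently, which suits multi-label problems.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Mutually exclusive class probabilities (they sum to 1)."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(logits: Sequence[float]) -> List[float]:
    """Independent per-class probabilities, e.g. for multi-label outputs."""
    return [1.0 / (1.0 + math.exp(-x)) for x in logits]


logits = [1.0, 2.0, 3.0]
print(softmax(logits))
print(sigmoid(logits))
```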

Discriminator

class monai.networks.nets.Discriminator(in_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=0.25, bias=True, last_act='SIGMOID')[source]

Defines a discriminator network from Classifier with a single output value and sigmoid activation by default. This is meant for use with GANs or other applications requiring a generic discriminator network.

Parameters
  • in_shape (Sequence[int]) – tuple of integers stating the dimension of the input tensor (minus batch dimension)

  • channels (Sequence[int]) – tuple of integers stating the output channels of each convolutional layer

  • strides (Sequence[int]) – tuple of integers stating the stride (downscale factor) of each convolutional layer

  • kernel_size (Union[Sequence[int], int]) – integer or tuple of integers stating size of convolutional kernels

  • num_res_units (int) – integer stating number of convolutions in residual units, 0 means no residual units

  • act – name or type defining activation layers

  • norm – name or type defining normalization layers

  • dropout (Optional[float]) – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout

  • bias (bool) – boolean stating if convolution layers should have a bias component

  • last_act – name defining the last activation layer

Critic

class monai.networks.nets.Critic(in_shape, channels, strides, kernel_size=3, num_res_units=2, act='PRELU', norm='INSTANCE', dropout=0.25, bias=True)[source]

Defines a critic network from Classifier with a single output value and no final activation. The final layer is nn.Flatten instead of nn.Linear, and the final result is computed as the mean over dimension 1, the first non-batch dimension. This is meant to be used with Wasserstein GANs.

Parameters
  • in_shape (Sequence[int]) – tuple of integers stating the dimension of the input tensor (minus batch dimension)

  • channels (Sequence[int]) – tuple of integers stating the output channels of each convolutional layer

  • strides (Sequence[int]) – tuple of integers stating the stride (downscale factor) of each convolutional layer

  • kernel_size (Union[Sequence[int], int]) – integer or tuple of integers stating size of convolutional kernels

  • num_res_units (int) – integer stating number of convolutions in residual units, 0 means no residual units

  • act – name or type defining activation layers

  • norm – name or type defining normalization layers

  • dropout (Optional[float]) – optional float value in range [0, 1] stating dropout probability for layers, None for no dropout

  • bias (bool) – boolean stating if convolution layers should have a bias component
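The final step described above flattens each sample's features and averages them, and it can be illustrated in pure Python. critic_head is a hypothetical helper that works on nested lists with the batch dimension first. It mirrors the described Flatten-then-mean computation and returns one unbounded score per sample.

```python
from typing import List, Union

Nested = Union[float, List["Nested"]]


def flatten(x: Nested) -> List[float]:
    """Recursively flatten a nested list of numbers (like nn.Flatten per sample)."""
    if isinstance(x, list):
        out: List[float] = []
        for item in x:
            out.extend(flatten(item))
        return out
    return [x]


def critic_head(batch: List[Nested]) -> List[List[float]]:
    """Hypothetical helper: mean of the flattened features of each sample."""
    results = []
    for sample in batch:
        values = flatten(sample)
        results.append([sum(values) / len(values)])
    return results


print(critic_head([[[1, 2], [3, 4]], [[0, 0], [0, 8]]]))
```

No sigmoid is applied. A Wasserstein critic outputs a raw score rather than a probability.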

Utilities

Utilities and types for defining networks; these depend on PyTorch.

monai.networks.utils.icnr_init(conv, upsample_factor, init=<function kaiming_normal_>)[source]

ICNR initialization for 2D/3D kernels adapted from Aitken et al., 2017, “Checkerboard artifact free sub-pixel convolution”.
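The idea behind ICNR, as described by Aitken et al., is to initialize the convolution that precedes a pixel shuffle so that every group of upsample_factor ** d output channels starts with identical weights. The shuffled result then begins as a nearest-neighbour upsampling, which avoids checkerboard artifacts at initialization. The sketch below shows only this channel-repetition layout in pure Python, with kernels represented as arbitrary list items. The helper name icnr_kernel_layout is hypothetical. The sketch assumes the channel grouping used by pixel shuffle, in which conv output channel k contributes to shuffled channel k // upsample_factor ** d. It does not reproduce the tensor operations of icnr_init.

```python
from typing import List, Sequence, TypeVar

K = TypeVar("K")


def icnr_kernel_layout(sub_kernels: Sequence[K], upsample_factor: int, spatial_dims: int) -> List[K]:
    """Hypothetical helper: expand out_channels // r**d freshly initialized
    sub-kernels into the full list of out_channels kernels, ICNR style."""
    scale = upsample_factor ** spatial_dims  # r**d conv channels fold into one shuffled channel
    # Every conv output channel in the same group reuses the same sub-kernel.
    return [sub_kernels[k // scale] for k in range(len(sub_kernels) * scale)]


print(icnr_kernel_layout(["a", "b"], upsample_factor=2, spatial_dims=2))
```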

monai.networks.utils.normal_init(m, std=0.02, normal_func=<function normal_>)[source]

Initialize the weight and bias tensors of m and its submodules to values from a normal distribution with a standard deviation of std. Weight tensors of convolution and linear modules are initialized with a mean of 0, and those of batch norm modules with a mean of 1. The callable normal_func, used to assign values, should have the same arguments as its default normal_(). This can be used with nn.Module.apply to visit submodules of a network.

Return type

None

monai.networks.utils.normalize_transform(shape, device=None, dtype=None, align_corners=False)[source]

Compute an affine matrix according to the input shape. The transform normalizes the homogeneous image coordinates to the range of [-1, 1].

Parameters
  • shape (Sequence[int]) – input spatial shape

  • device (Optional[device]) – device on which the returned affine will be allocated.

  • dtype (Optional[dtype]) – data type of the returned affine

  • align_corners (bool) – if True, consider -1 and 1 to refer to the centers of the corner pixels rather than the image corners. See also: https://pytorch.org/docs/stable/nn.functional.html#torch.nn.functional.grid_sample

Return type

Tensor
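The per-axis mapping can be written out explicitly. The pure-Python sketch below uses hypothetical helpers, normalize_affine and apply_affine. It builds a (d+1)x(d+1) homogeneous matrix following the grid_sample conventions linked above. With align_corners=True, the centers of pixels 0 and n - 1 map to -1 and 1. Otherwise, the outer image edges at coordinates -0.5 and n - 0.5 map to -1 and 1. The library function returns a Tensor, and this sketch only illustrates the arithmetic.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def normalize_affine(shape: Sequence[int], align_corners: bool = False) -> Matrix:
    """Hypothetical helper: homogeneous matrix mapping pixel coords to [-1, 1]."""
    d = len(shape)
    mat = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i, n in enumerate(shape):
        if align_corners:
            # centers of pixels 0 and n - 1 map to -1 and 1 (guard singleton axes)
            mat[i][i] = 2.0 / max(n - 1, 1)
            mat[i][d] = -1.0
        else:
            # edges -0.5 and n - 0.5 map to -1 and 1
            mat[i][i] = 2.0 / n
            mat[i][d] = 1.0 / n - 1.0
    mat[d][d] = 1.0
    return mat


def apply_affine(mat: Matrix, point: Sequence[float]) -> List[float]:
    """Apply a homogeneous matrix to a d-dimensional point."""
    d = len(point)
    homogeneous = list(point) + [1.0]
    return [sum(mat[i][j] * homogeneous[j] for j in range(d + 1)) for i in range(d)]


print(apply_affine(normalize_affine((5,), align_corners=True), [4.0]))
```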

monai.networks.utils.one_hot(labels, num_classes, dtype=torch.float32, dim=1)[source]

For a tensor labels of dimensions B1[spatial_dims], return a tensor of dimensions BN[spatial_dims], where N is num_classes.

Example

For every value v = labels[b,0,h,w], the value in the result at [b,v,h,w] will be 1 and all others 0. Note that this includes the background label, so a binary mask should be treated as having 2 classes.

Return type

Tensor
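The mapping in the example above can be traced on a tiny 2D input. This pure-Python sketch is illustrative only. one_hot_2d is a hypothetical helper that works on nested lists of shape [B][1][H][W] instead of tensors. The input is a binary mask, which, as noted, needs num_classes=2.

```python
from typing import List

Labels = List[List[List[List[int]]]]      # [B][1][H][W]
OneHot = List[List[List[List[float]]]]    # [B][N][H][W]


def one_hot_2d(labels: Labels, num_classes: int) -> OneHot:
    """Hypothetical helper: channel v of the result is 1 where the label equals v."""
    result = []
    for sample in labels:
        plane = sample[0]  # the single label channel, labels[b, 0]
        result.append([
            [[1.0 if v == n else 0.0 for v in row] for row in plane]
            for n in range(num_classes)
        ])
    return result


mask = [[[[0, 1], [1, 0]]]]  # B=1, C=1, H=2, W=2
print(one_hot_2d(mask, num_classes=2))
```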

monai.networks.utils.pixelshuffle(x, dimensions, scale_factor)[source]

Apply pixel shuffle to the tensor x with spatial dimensions dimensions and scaling factor scale_factor.

See: Shi et al., 2016, “Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network.”

See: Aitken et al., 2017, “Checkerboard artifact free sub-pixel convolution”.

Parameters
  • x (Tensor) – Input tensor

  • dimensions (int) – number of spatial dimensions, typically 2 or 3 for 2D or 3D

  • scale_factor (int) – factor to rescale the spatial dimensions by, must be >=1

Return type

Tensor

Returns

Reshuffled version of x.

Raises

ValueError – When input channels of x are not divisible by (scale_factor ** dimensions)
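For the 2D case, the rearrangement can be written as an explicit index mapping. The pure-Python sketch below works on a single sample stored as a nested list [C][H][W], with no batch dimension. pixelshuffle_2d is a hypothetical helper, and the mapping shown follows the common pixel-shuffle convention: output[c][h*r + i][w*r + j] = input[c*r*r + i*r + j][h][w]. Like the documented function, it rejects channel counts not divisible by scale_factor ** dimensions.

```python
from typing import List

Grid = List[List[List[float]]]  # [C][H][W]


def pixelshuffle_2d(x: Grid, scale_factor: int) -> Grid:
    """Hypothetical helper: (C*r*r, H, W) -> (C, H*r, W*r) for a single sample."""
    r = scale_factor
    channels = len(x)
    if channels % (r * r) != 0:
        raise ValueError("number of input channels must be divisible by scale_factor ** 2")
    out_channels = channels // (r * r)
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            # output pixel (oh, ow) reads sub-pixel offset (oh % r, ow % r) from its group
            [x[c * r * r + (oh % r) * r + (ow % r)][oh // r][ow // r] for ow in range(width * r)]
            for oh in range(height * r)
        ]
        for c in range(out_channels)
    ]


print(pixelshuffle_2d([[[0]], [[1]], [[2]], [[3]]], scale_factor=2))
```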

monai.networks.utils.predict_segmentation(logits, mutually_exclusive=False, threshold=0.0)[source]

Given the logits from a network, compute the segmentation either by keeping the values above threshold (0.0 by default) for a multi-label task, or by taking the argmax along the channel axis for a multi-class task. logits has shape BCHW[D].

Parameters
  • logits (Tensor) – raw data of model output.

  • mutually_exclusive (bool) – if True, logits are converted into a class-index map by taking the argmax along the channel axis, which is suitable for multi-class tasks. Defaults to False.

  • threshold (float) – threshold applied to the prediction values for multi-label tasks. Defaults to 0.0.

Return type

Tensor
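The two modes can be compared on a single sample. This pure-Python sketch is illustrative only. predict_segmentation_sketch is a hypothetical helper operating on a nested list [C][N] of per-pixel logits, with no batch dimension. Following the wording above, it keeps values strictly above threshold. Whether a value exactly equal to the threshold counts as foreground in the library is not stated here, so that boundary case is an assumption.

```python
from typing import List, Sequence


def predict_segmentation_sketch(
    logits: Sequence[Sequence[float]], mutually_exclusive: bool = False, threshold: float = 0.0
) -> List[List[int]]:
    """Hypothetical helper: logits is [C][N], one row of N pixel logits per channel."""
    if not mutually_exclusive:
        # multi-label: each channel is thresholded independently
        return [[1 if v > threshold else 0 for v in channel] for channel in logits]
    # multi-class: a single channel holding the argmax class index of each pixel
    num_pixels = len(logits[0])
    return [[max(range(len(logits)), key=lambda c: logits[c][p]) for p in range(num_pixels)]]


logits = [[0.5, -1.0], [2.0, -3.0]]  # 2 channels, 2 pixels
print(predict_segmentation_sketch(logits))
print(predict_segmentation_sketch(logits, mutually_exclusive=True))
```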

monai.networks.utils.to_norm_affine(affine, src_size, dst_size, align_corners=False)[source]

Given affine defined for coordinates in the pixel space, compute the corresponding affine for the normalized coordinates.

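Parameters
  • affine (Tensor) – Nxdxd batched square matrix

  • src_size (Sequence[int]) – source image spatial shape

  • dst_size (Sequence[int]) – destination image spatial shape

  • align_corners (bool) – if True, consider -1 and 1 to refer to the centers of the corner pixels rather than the image corners.

Raises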
  • TypeError – When affine is not a torch.Tensor.

  • ValueError – When affine is not Nxdxd.

  • ValueError – When src_size or dst_size dimensions differ from affine.

Return type

Tensor
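One way to derive the normalized affine is to chain three maps. First, normalized destination coordinates are converted back to pixels. Next, the pixel-space affine is applied. Finally, the result is converted to normalized source coordinates. This gives N_src · A · N_dst⁻¹. The pure-Python sketch below assumes that composition and assumes that A maps destination pixel coordinates to source pixel coordinates, as grid_sample expects. All helper names are hypothetical, and a batch of matrices is reduced to a single matrix. It shows that a pixel-space affine aligning the image edges becomes the identity in normalized space.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def norm_matrix(shape: Sequence[int], align_corners: bool = False) -> Matrix:
    """Hypothetical helper: homogeneous pixel -> [-1, 1] normalization matrix."""
    d = len(shape)
    m = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i, n in enumerate(shape):
        if align_corners:
            m[i][i], m[i][d] = 2.0 / max(n - 1, 1), -1.0
        else:
            m[i][i], m[i][d] = 2.0 / n, 1.0 / n - 1.0
    m[d][d] = 1.0
    return m


def invert_norm(m: Matrix) -> Matrix:
    """Inverse of a diagonal scale plus translation matrix."""
    d = len(m) - 1
    inv = [[0.0] * (d + 1) for _ in range(d + 1)]
    for i in range(d):
        inv[i][i] = 1.0 / m[i][i]
        inv[i][d] = -m[i][d] / m[i][i]
    inv[d][d] = 1.0
    return inv


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


def to_norm_affine_sketch(affine: Matrix, src_size, dst_size, align_corners: bool = False) -> Matrix:
    """Assumed composition: N_src @ affine @ inverse(N_dst)."""
    src = norm_matrix(src_size, align_corners)
    dst_inv = invert_norm(norm_matrix(dst_size, align_corners))
    return matmul(matmul(src, affine), dst_inv)


# dst pixel x -> src pixel 2x + 0.5 maps the edges of a 4-pixel image onto an 8-pixel image
print(to_norm_affine_sketch([[2.0, 0.5], [0.0, 1.0]], (8,), (4,)))
```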