Loss functions
Segmentation Losses
DiceLoss

class monai.losses.DiceLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, squared_pred=False, jaccard=False, reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)
Compute the average Dice loss between two tensors. It supports both multi-class and multi-label tasks. The input logits, input (BNHW[D], where N is the number of classes), are compared with the ground truth target (BNHW[D]). Axis N of input is expected to hold logit predictions for each class rather than image channels, while the same axis of target can have size 1 or N (one-hot format). The smooth_nr and smooth_dr parameters are small values added to the intersection and union terms of the intersection-over-union calculation, respectively, to smooth the result. The include_background class attribute can be set to False for an instance of DiceLoss to exclude the first category (channel index 0), which by convention is assumed to be background. If the non-background segmentations are small compared to the total image size, they can be overwhelmed by the signal from the background, so excluding it in such cases helps convergence.
Milletari, F. et al. (2016) V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, 3DV, 2016.
 Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction.
softmax (bool) – if True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – if you don't want to use sigmoid or softmax, use another callable function to execute other activation layers. Defaults to None. For example: other_act = torch.tanh.
squared_pred (bool) – whether to use squared versions of targets and predictions in the denominator.
jaccard (bool) – whether to compute the Jaccard index (soft IoU) instead of Dice.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a Dice loss value is computed independently for each item in the batch before any reduction.
 Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set; these options are incompatible.
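The soft Dice computation described above can be illustrated with a minimal NumPy sketch. It is not MONAI's implementation: it assumes pred already holds probabilities (the sigmoid/softmax step has been applied) and that target is one-hot, and the function name soft_dice_loss is hypothetical. It shows where include_background, jaccard, batch, smooth_nr and smooth_dr enter the formula 1 - (2 * intersection + smooth_nr) / (sum(pred) + sum(target) + smooth_dr).

```python
import numpy as np


def soft_dice_loss(pred, target, include_background=True, jaccard=False,
                   batch=False, smooth_nr=1e-5, smooth_dr=1e-5):
    """Soft Dice loss for arrays shaped (B, N, *spatial).

    pred is assumed to hold probabilities already (e.g. after a softmax) and
    target to be one-hot encoded along axis 1.
    """
    if not include_background:
        # drop channel 0, which by convention is the background class
        pred, target = pred[:, 1:], target[:, 1:]
    # sum over the spatial axes, and over the batch axis as well when batch=True
    axes = tuple(range(2, pred.ndim))
    if batch:
        axes = (0,) + axes
    intersection = np.sum(pred * target, axis=axes)
    denominator = np.sum(pred, axis=axes) + np.sum(target, axis=axes)
    if jaccard:
        # soft IoU: |A u B| = |A| + |B| - |A n B|, doubled to match 2 * intersection
        denominator = 2.0 * (denominator - intersection)
    loss = 1.0 - (2.0 * intersection + smooth_nr) / (denominator + smooth_dr)
    return float(np.mean(loss))  # "mean" reduction over the remaining entries


# two classes on a 2x2 image
target = np.zeros((1, 2, 2, 2))
target[0, 0] = [[1, 1], [0, 0]]
target[0, 1] = [[0, 0], [1, 1]]
print(soft_dice_loss(target, target))           # 0.0 for a perfect prediction
print(soft_dice_loss(target[:, ::-1], target))  # close to 1 when the classes are swapped
```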

monai.losses.Dice: alias of monai.losses.dice.DiceLoss

monai.losses.dice: alias of monai.losses.dice.DiceLoss
MaskedDiceLoss

class monai.losses.MaskedDiceLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, squared_pred=False, jaccard=False, reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)
Adds a masking step before DiceLoss. It accepts a binary mask ([0, 1]) indicating a region, and both input and target are masked by that region: where the mask is 1 the original values are kept, and where it is 0 they are set to 0. The masked input and target are then fed to the normal DiceLoss computation. As a result, only the masked region contributes to the loss and hence to the gradient calculation.
 Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction.
softmax (bool) – if True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – if you don't want to use sigmoid or softmax, use another callable function to execute other activation layers. Defaults to None. For example: other_act = torch.tanh.
squared_pred (bool) – whether to use squared versions of targets and predictions in the denominator.
jaccard (bool) – whether to compute the Jaccard index (soft IoU) instead of Dice.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a Dice loss value is computed independently for each item in the batch before any reduction.
 Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set; these options are incompatible.
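The masking step can be sketched in NumPy as follows. This is an illustration, not MONAI's code: the helper name masked_dice_loss is hypothetical, pred is assumed to hold probabilities, and the mask is assumed to be shaped (B, 1, *spatial) so that it broadcasts over the class axis. Zeroing both arrays outside the mask removes those voxels from every sum in the Dice formula.

```python
import numpy as np


def masked_dice_loss(pred, target, mask, smooth_nr=1e-5, smooth_dr=1e-5):
    """pred/target shaped (B, N, *spatial); mask of 0s and 1s shaped (B, 1, *spatial)."""
    # where the mask is 0, both arrays become 0 and drop out of every sum below
    pred = pred * mask
    target = target * mask
    axes = tuple(range(2, pred.ndim))
    intersection = np.sum(pred * target, axis=axes)
    denominator = np.sum(pred, axis=axes) + np.sum(target, axis=axes)
    loss = 1.0 - (2.0 * intersection + smooth_nr) / (denominator + smooth_dr)
    return float(np.mean(loss))


small_t = np.array([[1, 0], [0, 0]])  # class-1 ground truth on a 2x2 image
small_p = np.array([[1, 1], [0, 1]])  # prediction, wrong only in the right column
target = np.stack([1 - small_t, small_t])[None].astype(float)  # (1, 2, 2, 2) one-hot
pred = np.stack([1 - small_p, small_p])[None].astype(float)
left = np.array([[1, 0], [1, 0]])[None, None].astype(float)    # (1, 1, 2, 2) mask
print(masked_dice_loss(pred, target, left))                # 0.0: errors outside the mask are ignored
print(masked_dice_loss(pred, target, np.ones_like(left)))  # about 0.5 over the whole image
```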
GeneralizedDiceLoss

class monai.losses.GeneralizedDiceLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, w_type=<Weight.SQUARE: 'square'>, reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)
Compute the generalised Dice loss defined in:
Sudre, C. et al. (2017) Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. DLMIA 2017.
 Parameters
include_background (bool) – If False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – If True, apply a sigmoid function to the prediction.
softmax (bool) – If True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – if you don't want to use sigmoid or softmax, use another callable function to execute other activation layers. Defaults to None. For example: other_act = torch.tanh.
squared_pred – whether to use squared versions of targets and predictions in the denominator.
w_type (Union[Weight, str]) – {"square", "simple", "uniform"} Type of function used to transform the ground-truth volume into a weight factor. Defaults to "square".
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case intersection over union is computed for each item in the batch.
 Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set; these options are incompatible.
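The three w_type options can be illustrated with a NumPy sketch, assuming the usual reading of Sudre et al.: "square" weights each class by 1 / volume^2, "simple" by 1 / volume, and "uniform" by 1, where volume is the class's ground-truth voxel count. The function names are hypothetical, the sketch handles a single sample shaped (N, *spatial), and replacing infinite weights (classes absent from the ground truth) with the largest finite weight is an assumed safeguard, not documented behaviour.

```python
import numpy as np


def class_weights(volumes, w_type="square"):
    """Per-class weights from ground-truth volumes (voxel counts per class)."""
    v = np.asarray(volumes, dtype=float)
    if w_type == "uniform":
        return np.ones_like(v)
    if w_type not in ("square", "simple"):
        raise ValueError(f"unknown w_type: {w_type}")
    with np.errstate(divide="ignore"):
        w = 1.0 / (v * v) if w_type == "square" else 1.0 / v
    # classes absent from the ground truth get an infinite weight; replace it
    # with the largest finite weight (an assumed safeguard)
    finite = np.isfinite(w)
    w[~finite] = w[finite].max() if finite.any() else 1.0
    return w


def generalized_dice_loss(pred, target, w_type="square", smooth_nr=1e-5, smooth_dr=1e-5):
    """Single sample: pred (probabilities) and target (one-hot) shaped (N, *spatial)."""
    axes = tuple(range(1, pred.ndim))
    intersection = np.sum(pred * target, axis=axes)
    denominator = np.sum(pred, axis=axes) + np.sum(target, axis=axes)
    w = class_weights(np.sum(target, axis=axes), w_type)
    # class-weighted sums, so a small structure counts as much as a large one
    return float(1.0 - (2.0 * np.sum(w * intersection) + smooth_nr)
                 / (np.sum(w * denominator) + smooth_dr))


w = class_weights([90, 10])
print(w[1] / w[0])            # about 81, i.e. (90 / 10) ** 2
target = np.zeros((2, 10))
target[0, :9] = 1             # large class: 9 voxels
target[1, 9] = 1              # small class: 1 voxel
missed = target.copy()
missed[:, 9] = [1, 0]         # the small structure is predicted as class 0
print(generalized_dice_loss(missed, target))  # about 0.82, dominated by the missed small class
```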

monai.losses.generalized_dice: alias of monai.losses.dice.GeneralizedDiceLoss
GeneralizedWassersteinDiceLoss

class monai.losses.GeneralizedWassersteinDiceLoss(dist_matrix, weighting_mode='default', reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-05, smooth_dr=1e-05)
Compute the generalized Wasserstein Dice loss defined in:
Fidon L. et al. (2017) Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. BrainLes 2017.
Or its variant (use the option weighting_mode="GDL") defined in the Appendix of:
Tilborghs, S. et al. (2020) Comparative study of deep learning methods for the automatic segmentation of lung, lesion and lesion type in CT scans of COVID-19 patients. arXiv preprint arXiv:2007.15546
 Parameters
dist_matrix (Union[ndarray, Tensor]) – 2d tensor or 2d numpy array; matrix of distances between the classes. It must have dimension C x C, where C is the number of classes.
weighting_mode (str) – {"default", "GDL"} Specifies how to weight the class-specific sum of errors. Defaults to "default".
"default": (recommended) use the original weighting method as in: Fidon L. et al. (2017) Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. BrainLes 2017.
"GDL": use a GDL-like weighting method as in the Appendix of: Tilborghs, S. et al. (2020) Comparative study of deep learning methods for the automatic segmentation of lung, lesion and lesion type in CT scans of COVID-19 patients. arXiv preprint arXiv:2007.15546
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
 Raises
ValueError – When dist_matrix is not a square matrix.
Example
```python
import torch
import numpy as np
from monai.losses import GeneralizedWassersteinDiceLoss

# Example with 3 classes (including the background: label 0).
# The distance between the background class (label 0) and the other classes is the maximum, equal to 1.
# The distance between class 1 and class 2 is 0.5.
dist_mat = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.5], [1.0, 0.5, 0.0]], dtype=np.float32)
wass_loss = GeneralizedWassersteinDiceLoss(dist_matrix=dist_mat)

pred_score = torch.tensor([[1000, 0, 0], [0, 1000, 0], [0, 0, 1000]], dtype=torch.float32)
grnd = torch.tensor([0, 1, 2], dtype=torch.int64)
wass_loss(pred_score, grnd)  # 0
```

forward(input, target)
 Parameters
input (Tensor) – the shape should be BNH[WD].
target (Tensor) – the shape should be BNH[WD].
 Return type
Tensor

wasserstein_distance_map(flat_proba, flat_target)
Compute the voxel-wise Wasserstein distance between the flattened prediction and the flattened labels (ground truth) with respect to the distance matrix M on the label space. This corresponds to eq. 6 in:
Fidon L. et al. (2017) Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. BrainLes 2017.
 Parameters
flat_proba (Tensor) – the probabilities of the input (predicted) tensor.
flat_target (Tensor) – the target tensor.
 Return type
Tensor
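For a label map, eq. 6 reduces to a per-voxel sum of predicted class probabilities, each weighted by the entry of M in the row of the voxel's true class. A standalone NumPy sketch, assuming flat_proba is shaped (B, C, S) and flat_target holds integer labels shaped (B, S); the function mirrors the method's name for readability but is an illustration, not the method above. It reuses the distance matrix from the example above.

```python
import numpy as np


def wasserstein_distance_map(flat_proba, flat_target, dist_matrix):
    """flat_proba: (B, C, S) probabilities; flat_target: (B, S) integer labels -> (B, S)."""
    m = np.asarray(dist_matrix, dtype=float)
    rows = m[flat_target]  # (B, S, C): for each voxel, the row of M for its true label
    # per voxel: sum over classes l of M[true_label, l] * proba[l]
    return np.einsum("bsc,bcs->bs", rows, flat_proba)


dist_mat = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.5], [1.0, 0.5, 0.0]])
proba = np.array([[[0.0, 1.0],    # class 0 (background)
                   [0.0, 0.0],    # class 1
                   [1.0, 0.0]]])  # class 2; shape (B=1, C=3, S=2)
labels = np.array([[1, 1]])       # both voxels truly belong to class 1
print(wasserstein_distance_map(proba, labels, dist_mat))  # [[0.5 1. ]]
```

Confusing class 1 with class 2 costs 0.5, while confusing it with the background costs the full distance of 1, which is how the loss encodes that some mistakes are less severe than others.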

monai.losses.generalized_wasserstein_dice: alias of monai.losses.dice.GeneralizedWassersteinDiceLoss
DiceCELoss

class monai.losses.DiceCELoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, squared_pred=False, jaccard=False, reduction='mean', smooth_nr=1e-05, smooth_dr=1e-05, batch=False, ce_weight=None)
Compute both the Dice loss and the cross-entropy loss, and return the sum of the two. The input logits, input (BNHW[D], where N is the number of classes), are compared with the ground truth target (BNHW[D]). Axis N of input is expected to hold logit predictions for each class rather than image channels, while the same axis of target can have size 1 or N (one-hot format). For the Dice loss part, the smooth_nr and smooth_dr parameters are small values added to the intersection and union terms of the intersection-over-union calculation, respectively, to smooth the result. The include_background class attribute can be set to False for an instance of the loss to exclude the first category (channel index 0), which by convention is assumed to be background. If the non-background segmentations are small compared to the total image size, they can be overwhelmed by the signal from the background, so excluding it in such cases helps convergence.
 Parameters
ce_weight is only used for the cross-entropy loss, reduction is used for both losses, and the other parameters are only used for the Dice loss.
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction.
softmax (bool) – if True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – if you don't want to use sigmoid or softmax, use another callable function to execute other activation layers. Defaults to None. For example: other_act = torch.tanh.
squared_pred (bool) – whether to use squared versions of targets and predictions in the denominator.
jaccard (bool) – whether to compute the Jaccard index (soft IoU) instead of Dice.
reduction (str) – {"mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean". The Dice loss must at least reduce the spatial dimensions, unlike the cross-entropy loss, so the "none" option cannot be used here.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a Dice loss value is computed independently for each item in the batch before any reduction.
ce_weight (Optional[Tensor]) – a rescaling weight given to each class for the cross-entropy loss. See torch.nn.CrossEntropyLoss() for more information.

forward(input, target)
 Parameters
input (Tensor) – the shape should be BNH[WD].
target (Tensor) – the shape should be BNH[WD] or B1H[WD].
 Raises
ValueError – When the number of dimensions of input and target differ.
ValueError – When the number of channels of target is neither 1 nor the same as that of input.
 Return type
Tensor
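The "sum of these two losses" can be sketched in NumPy as below. This is a simplified illustration under stated assumptions, not MONAI's code: softmax activation, integer labels shaped (B, 1, *spatial) that are one-hot encoded internally (conceptually what to_onehot_y=True does), "mean" reduction for both terms, and no ce_weight. The function names log_softmax and dice_ce_loss are hypothetical.

```python
import numpy as np


def log_softmax(logits, axis=1):
    z = logits - logits.max(axis=axis, keepdims=True)  # shift for numerical stability
    return z - np.log(np.exp(z).sum(axis=axis, keepdims=True))


def dice_ce_loss(logits, labels, smooth_nr=1e-5, smooth_dr=1e-5):
    """logits: (B, N, *spatial); labels: (B, 1, *spatial) integer class indices."""
    n_classes = logits.shape[1]
    log_probs = log_softmax(logits)
    probs = np.exp(log_probs)
    # one-hot encode the labels along axis 1
    class_ids = np.arange(n_classes).reshape((1, n_classes) + (1,) * (logits.ndim - 2))
    onehot = (labels == class_ids).astype(float)
    # Dice part: reduced over the spatial axes, then averaged over batch and classes
    axes = tuple(range(2, logits.ndim))
    intersection = np.sum(probs * onehot, axis=axes)
    denominator = np.sum(probs, axis=axes) + np.sum(onehot, axis=axes)
    dice = np.mean(1.0 - (2.0 * intersection + smooth_nr) / (denominator + smooth_dr))
    # cross-entropy part: mean over voxels of -log p(true class)
    ce = -np.mean(np.sum(onehot * log_probs, axis=1))
    return float(dice + ce)


labels = np.array([[[[0, 1], [1, 0]]]])  # (B=1, 1, 2, 2)
uniform = np.zeros((1, 2, 2, 2))         # equal logits for both classes
print(dice_ce_loss(uniform, labels))     # about 0.5 (Dice) + log(2) (cross-entropy)
```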
FocalLoss

class monai.losses.FocalLoss(gamma=2.0, weight=None, reduction=<LossReduction.MEAN: 'mean'>)
Reimplementation of the Focal Loss described in:
“Focal Loss for Dense Object Detection”, T. Lin et al., ICCV 2017
“AnatomyNet: Deep learning for fast and fully automated whole‐volume segmentation of head and neck anatomy”, Zhu et al., Medical Physics 2018
 Parameters
gamma (float) – value of the exponent gamma in the definition of the Focal loss.
weight (Optional[Tensor]) – weights to apply to the voxels of each class. If None, no weights are applied. This corresponds to the weights alpha in Lin et al. (2017).
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
Example
```python
import torch
from monai.losses import FocalLoss

pred = torch.tensor([[1, 0], [0, 1], [1, 0]], dtype=torch.float32)
grnd = torch.tensor([[0], [1], [0]], dtype=torch.int64)
fl = FocalLoss()
fl(pred, grnd)
```

forward(logits, target)
 Parameters
logits (Tensor) – the shape should be BCH[WD], where C (greater than 1) is the number of classes. Softmax over the logits is integrated into this module for improved numerical stability.
target (Tensor) – the shape should be B1H[WD] or BCH[WD]. If the target's shape is B1H[WD], the target that this loss expects should be a class index in the range [0, C-1], where C is the number of classes.
 Raises
ValueError – When target ndim differs from logits.
ValueError – When target channel is not 1 and target shape differs from logits.
ValueError – When self.reduction is not one of ["mean", "sum", "none"].
 Return type
Tensor
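A NumPy sketch of the focal term -(1 - p_t)^gamma * log(p_t) from Lin et al., where p_t is the softmax probability of the true class and the optional per-class weights play the role of alpha. It assumes integer labels shaped (B, *spatial) without a channel axis and averages over all voxels; MONAI's exact normalisation may differ. The name focal_loss is hypothetical, and the data reuses the example above.

```python
import numpy as np


def focal_loss(logits, labels, gamma=2.0, weight=None):
    """logits: (B, C, *spatial); labels: (B, *spatial) integer class indices."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))  # log-softmax over classes
    # log-probability of the true class at every voxel: (B, *spatial)
    log_pt = np.take_along_axis(log_p, labels[:, None], axis=1)[:, 0]
    pt = np.exp(log_pt)
    loss = -((1.0 - pt) ** gamma) * log_pt  # easy voxels (pt near 1) are down-weighted
    if weight is not None:
        loss = np.asarray(weight, dtype=float)[labels] * loss  # per-class alpha
    return float(loss.mean())


pred = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])  # logits, (B=3, C=2)
grnd = np.array([0, 1, 0])                              # class indices, (B=3,)
print(focal_loss(pred, grnd, gamma=0.0))  # gamma=0 gives plain cross-entropy, log(1 + e**-1)
print(focal_loss(pred, grnd))             # gamma=2 shrinks the loss on these easy examples
```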
TverskyLoss

class monai.losses.TverskyLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, alpha=0.5, beta=0.5, reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)
Compute the Tversky loss defined in:
Sadegh et al. (2017) Tversky loss function for image segmentation using 3D fully convolutional deep networks. (https://arxiv.org/abs/1706.05721)
 Parameters
include_background (bool) – If False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – If True, apply a sigmoid function to the prediction.
softmax (bool) – If True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – if you don't want to use sigmoid or softmax, use another callable function to execute other activation layers. Defaults to None. For example: other_act = torch.tanh.
alpha (float) – weight of false positives.
beta (float) – weight of false negatives.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a loss value is computed independently for each item in the batch before any reduction.
 Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set; these options are incompatible.
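The Tversky index weights false positives by alpha and false negatives by beta; with alpha = beta = 0.5 and no smoothing it reduces to the Dice coefficient. A minimal NumPy sketch, assuming pred holds probabilities and target is one-hot, both shaped (B, N, *spatial); tversky_loss is a hypothetical name.

```python
import numpy as np


def tversky_loss(pred, target, alpha=0.5, beta=0.5, smooth_nr=1e-5, smooth_dr=1e-5):
    """pred: probabilities, target: one-hot labels, both shaped (B, N, *spatial)."""
    axes = tuple(range(2, pred.ndim))
    tp = np.sum(pred * target, axis=axes)
    fp = np.sum(pred * (1.0 - target), axis=axes)  # predicted, but not in the target
    fn = np.sum((1.0 - pred) * target, axis=axes)  # in the target, but missed
    index = (tp + smooth_nr) / (tp + alpha * fp + beta * fn + smooth_dr)
    return float(np.mean(1.0 - index))


target = np.array([[[1.0, 1.0, 0.0, 0.0]]])  # (B=1, N=1, 4 voxels)
under = np.array([[[1.0, 0.0, 0.0, 0.0]]])   # one false negative
over = np.array([[[1.0, 1.0, 1.0, 0.0]]])    # one false positive
# with beta > alpha, missing foreground costs more than over-segmenting
print(tversky_loss(under, target, alpha=0.3, beta=0.7))
print(tversky_loss(over, target, alpha=0.3, beta=0.7))
```

Raising beta above alpha is the usual way to trade precision for recall on small structures.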
Registration Losses
BendingEnergyLoss

class monai.losses.BendingEnergyLoss(reduction=<LossReduction.MEAN: 'mean'>)
Calculate the bending energy based on the second-order differentiation of pred using central finite differences.
 Adapted from:
DeepReg (https://github.com/DeepRegNet/DeepReg)
 Parameters
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
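For a 2-D displacement field u, the bending energy density is (d²u/dh²)² + (d²u/dw²)² + 2(d²u/dh dw)², here averaged over voxels and over the field's two components. The NumPy sketch below estimates the second derivatives with repeated central differences and crops them to a common interior before averaging; the (2, H, W) layout and the cropping are assumptions for illustration, and the function names are hypothetical. An affine field has zero bending energy.

```python
import numpy as np


def central_diff(x, axis):
    """(x[i + 1] - x[i - 1]) / 2 along `axis`; that axis shrinks by 2."""
    n = x.shape[axis]
    ahead = np.take(x, np.arange(2, n), axis=axis)
    behind = np.take(x, np.arange(0, n - 2), axis=axis)
    return (ahead - behind) / 2.0


def bending_energy_2d(ddf):
    """ddf: displacement field shaped (2, H, W) with H, W >= 5."""
    d_h = central_diff(ddf, axis=1)                  # (2, H-2, W)
    d_w = central_diff(ddf, axis=2)                  # (2, H, W-2)
    d_hh = central_diff(d_h, axis=1)[:, :, 2:-2]     # (2, H-4, W-4)
    d_ww = central_diff(d_w, axis=2)[:, 2:-2, :]     # (2, H-4, W-4)
    d_hw = central_diff(d_h, axis=2)[:, 1:-1, 1:-1]  # mixed term, (2, H-4, W-4)
    # squared second derivatives; the mixed derivative is counted twice (d_hw == d_wh)
    energy = d_hh ** 2 + d_ww ** 2 + 2.0 * d_hw ** 2
    return float(energy.mean())


rows, cols = np.meshgrid(np.arange(6.0), np.arange(6.0), indexing="ij")
affine = np.stack([0.5 * rows + 0.2 * cols, -0.3 * rows + cols])
bent = np.stack([rows ** 2, np.zeros_like(rows)])
print(bending_energy_2d(affine))  # zero up to rounding: affine maps do not bend
print(bending_energy_2d(bent))    # 2.0: d2/dh2 of h**2 is 2 in one of the two components
```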
LocalNormalizedCrossCorrelationLoss

class monai.losses.LocalNormalizedCrossCorrelationLoss(in_channels, ndim=3, kernel_size=3, kernel_type='rectangular', reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-07, smooth_dr=1e-07)
Local squared zero-normalized cross-correlation. The loss is based on a moving kernel/window over y_true/y_pred; within each window, the square of the ZNCC is calculated. The kernel can be a rectangular, triangular or Gaussian window. The final loss is the average over all windows.
 Adapted from:
https://github.com/voxelmorph/voxelmorph/blob/legacy/src/losses.py
DeepReg (https://github.com/DeepRegNet/DeepReg)
 Parameters
in_channels (int) – number of input channels.
ndim (int) – number of spatial dimensions, {1, 2, 3}. Defaults to 3.
kernel_size (int) – kernel spatial size; must be odd.
kernel_type (str) – {"rectangular", "triangular", "gaussian"}. Defaults to "rectangular".
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid nan.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
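A 1-D NumPy sketch of the windowed computation, using a rectangular kernel and only full windows (no padding), both simplifying assumptions. Within each window, cross² / (var_t * var_p) is the squared zero-normalized cross-correlation. The sketch returns the negated mean so that better alignment lowers the loss; that sign convention is an assumption, and lncc_loss_1d is a hypothetical name.

```python
import numpy as np


def lncc_loss_1d(pred, target, kernel_size=3, smooth_nr=1e-7, smooth_dr=1e-7):
    """Negative mean local squared ZNCC of two 1-D signals (rectangular window)."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    kernel = np.ones(kernel_size)
    n = float(kernel_size)

    def window_sum(x):
        # sum over every full window of length kernel_size (no padding)
        return np.convolve(x, kernel, mode="valid")

    t_sum, p_sum = window_sum(target), window_sum(pred)
    t2_sum, p2_sum = window_sum(target * target), window_sum(pred * pred)
    tp_sum = window_sum(target * pred)
    cross = tp_sum - p_sum * t_sum / n  # n * local covariance
    t_var = t2_sum - t_sum * t_sum / n  # n * local variance of target
    p_var = p2_sum - p_sum * p_sum / n  # n * local variance of pred
    # squared ZNCC per window; the factors of n cancel in the ratio
    ncc = (cross * cross + smooth_nr) / (t_var * p_var + smooth_dr)
    return float(-np.mean(ncc))  # assumed sign: better alignment -> lower loss


rng = np.random.default_rng(0)
signal = rng.normal(size=50)
print(lncc_loss_1d(2.0 * signal + 3.0, signal, kernel_size=5))  # about -1: ZNCC ignores gain and offset
print(lncc_loss_1d(rng.normal(size=50), signal, kernel_size=5))  # much closer to 0 for unrelated signals
```

Because each window is normalized independently, the measure tolerates slowly varying intensity differences between the two images, which is why it is popular for registration.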
GlobalMutualInformationLoss

class monai.losses.GlobalMutualInformationLoss(num_bins=23, sigma_ratio=0.5, reduction=<LossReduction.MEAN: 'mean'>, smooth_nr=1e-07, smooth_dr=1e-07)
Differentiable global mutual information loss via the Parzen windowing method.
 Reference:
https://dspace.mit.edu/handle/1721.1/123142, Section 3.1, equations 3.1-3.5, Algorithm 1
 Parameters
num_bins (int) – number of bins for intensity.
sigma_ratio (float) – a hyperparameter for the Gaussian function.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid nan.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
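A NumPy sketch of Parzen-window mutual information: each intensity is softly assigned to evenly spaced bin centres with a Gaussian kernel whose width is sigma_ratio times the bin spacing, the soft assignments give marginal and joint histograms, and MI = sum over bins of p_ab * log(p_ab / (p_a * p_b)). It assumes intensities already scaled to [0, 1]; the function names and the way the smoothing constants enter the logarithm are illustrative assumptions, not a description of MONAI's code.

```python
import numpy as np


def parzen_weights(values, bin_centers, sigma):
    """Gaussian soft assignment of each value to every bin, normalised per value: (S, bins)."""
    w = np.exp(-((values[:, None] - bin_centers[None, :]) ** 2) / (2.0 * sigma ** 2))
    return w / w.sum(axis=1, keepdims=True)


def global_mi_loss(pred, target, num_bins=23, sigma_ratio=0.5,
                   smooth_nr=1e-7, smooth_dr=1e-7):
    """Negative mutual information of two images with intensities in [0, 1]."""
    bin_centers = np.linspace(0.0, 1.0, num_bins)
    sigma = np.mean(np.diff(bin_centers)) * sigma_ratio
    wa = parzen_weights(np.ravel(pred), bin_centers, sigma)
    wb = parzen_weights(np.ravel(target), bin_centers, sigma)
    pa, pb = wa.mean(axis=0), wb.mean(axis=0)  # marginal (soft) histograms
    pab = wa.T @ wb / wa.shape[0]              # joint histogram, (bins, bins)
    papb = np.outer(pa, pb)                    # joint histogram under independence
    mi = np.sum(pab * np.log((pab + smooth_nr) / (papb + smooth_dr) + smooth_dr))
    return float(-mi)  # maximising MI is minimising its negative


rng = np.random.default_rng(0)
fixed = rng.uniform(size=2000)
print(global_mi_loss(fixed, fixed))                   # strongly negative: identical images
print(global_mi_loss(rng.uniform(size=2000), fixed))  # close to 0: independent images
```

The Gaussian soft binning is what keeps the histogram, and hence the loss, differentiable with respect to the intensities.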