Loss functions
Segmentation Losses
DiceLoss
- class monai.losses.DiceLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, squared_pred=False, jaccard=False, reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)
Compute average Dice loss between two tensors. It supports both multi-class and multi-label tasks. The input (BNHW[D], where N is the number of classes) is compared with the ground-truth target (BNHW[D]).
Axis N of input is expected to hold logits or probabilities for each class. When passing logits, set sigmoid=True or softmax=True, or specify other_act. The same axis of target can have size 1 or N (one-hot format).
The smooth_nr and smooth_dr parameters are values added to the intersection and union components, respectively, of the intersection-over-union calculation to smooth results. These values should be small.
The original paper: Milletari, F. et al. (2016) V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation, 3DV, 2016.
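The arithmetic described above can be sketched in plain Python for a single channel of a single item. This is a minimal illustration under stated assumptions, not MONAI's implementation: the helper name soft_dice_loss is hypothetical, inputs are flat lists of probabilities, and the way squared_pred and jaccard modify the denominator follows the parameter descriptions in this section.

```python
def soft_dice_loss(pred, target, smooth_nr=1e-5, smooth_dr=1e-5,
                   squared_pred=False, jaccard=False):
    """Soft Dice (or Jaccard) loss for one channel of one item.

    Hypothetical helper for illustration; `pred` and `target` are flat
    sequences of floats with the same length.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    if squared_pred:
        # Use squared values in the denominator terms.
        pred_sum = sum(p * p for p in pred)
        target_sum = sum(t * t for t in target)
    else:
        pred_sum = sum(pred)
        target_sum = sum(target)
    denominator = pred_sum + target_sum
    if jaccard:
        # union = |A| + |B| - intersection; the factor 2 keeps the same
        # "1 - 2 * overlap / denominator" form as the Dice case.
        denominator = 2.0 * (denominator - intersection)
    # smooth_nr keeps the numerator away from zero, smooth_dr avoids 0 / 0.
    return 1.0 - (2.0 * intersection + smooth_nr) / (denominator + smooth_dr)


perfect = soft_dice_loss([1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 1.0, 0.0])
disjoint = soft_dice_loss([1.0, 0.0], [0.0, 1.0])
empty = soft_dice_loss([0.0, 0.0], [0.0, 0.0])
jaccard_loss = soft_dice_loss([1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], jaccard=True)
```

With both smoothing constants equal, an empty prediction against an empty target gives a loss of zero rather than a division by zero, which is the practical reason for keeping the constants small but non-zero.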
- Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation. If the non-background segmentations are small compared to the total image size, they can be overwhelmed by the signal from the background, so excluding it in such cases helps convergence.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction.
softmax (bool) – if True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – callable activation function to apply instead of sigmoid or softmax, for example other_act = torch.tanh. Defaults to None.
squared_pred (bool) – whether to use squared versions of targets and predictions in the denominator.
jaccard (bool) – whether to compute the Jaccard index (soft IoU) instead of Dice.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
    "none": no reduction will be applied.
    "mean": the sum of the output will be divided by the number of elements in the output.
    "sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a Dice loss value is computed independently for each item in the batch before any reduction.
- Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set. These values are incompatible.
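The effect of the batch option can be illustrated with a small plain-Python sketch. It assumes the simple Dice form (no squared_pred, no jaccard) and uses hypothetical helper names; it only shows the difference between dividing per item and pooling the sums over the batch first.

```python
def dice_terms(pred, target):
    """Return (intersection, denominator) for one item, as flat lists."""
    intersection = sum(p * t for p, t in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return intersection, denominator


def dice_from_terms(intersection, denominator, smooth_nr=1e-5, smooth_dr=1e-5):
    return 1.0 - (2.0 * intersection + smooth_nr) / (denominator + smooth_dr)


def dice_loss_over_batch(preds, targets, batch=False):
    """Hypothetical helper: one loss per item, or a single pooled loss."""
    terms = [dice_terms(p, t) for p, t in zip(preds, targets)]
    if batch:
        # Sum intersections and denominators over the batch, then divide once.
        pooled_intersection = sum(i for i, _ in terms)
        pooled_denominator = sum(d for _, d in terms)
        return [dice_from_terms(pooled_intersection, pooled_denominator)]
    # Default: divide independently for every item in the batch.
    return [dice_from_terms(i, d) for i, d in terms]


preds = [[1.0, 1.0], [1.0, 0.0]]
targets = [[1.0, 1.0], [0.0, 0.0]]
per_item = dice_loss_over_batch(preds, targets, batch=False)
pooled = dice_loss_over_batch(preds, targets, batch=True)
mean_per_item = sum(per_item) / len(per_item)
```

In this toy batch the second item has an empty target and one false positive, so its per-item loss is close to 1 and the mean is close to 0.5, whereas pooling over the batch gives about 0.2. Pooling makes items with empty targets less dominant.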
- forward(input, target)
- Parameters
input (Tensor) – the shape should be BNH[WD], where N is the number of classes.
target (Tensor) – the shape should be BNH[WD] or B1H[WD], where N is the number of classes.
- Raises
AssertionError – When input and target (after one hot transform if set) have different shapes.
ValueError – When self.reduction is not one of ["mean", "sum", "none"].
Example
>>> import numpy as np
>>> import torch
>>> from monai.losses import DiceLoss
>>> from monai.networks import one_hot
>>> B, C, H, W = 7, 5, 3, 2
>>> input = torch.rand(B, C, H, W)
>>> # Class-index target of shape (B, H, W), converted to one-hot (B, C, H, W)
>>> target_idx = torch.randint(low=0, high=C - 1, size=(B, H, W)).long()
>>> target = one_hot(target_idx[:, None, ...], num_classes=C)
>>> criterion = DiceLoss(reduction='none')
>>> loss = criterion(input, target)
>>> # The unreduced loss must be broadcastable to the input shape
>>> assert np.broadcast_shapes(loss.shape, input.shape) == input.shape
- Return type
Tensor
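The two accepted target layouts, B1H[WD] with class indices and BNH[WD] in one-hot format, are related by a one-hot expansion. The following plain-Python sketch shows that expansion for one 2D item; the helper to_one_hot is hypothetical and operates on nested lists rather than tensors.

```python
def to_one_hot(labels, num_classes):
    """Expand an H x W grid of class indices into N x H x W one-hot channels.

    Hypothetical helper; `labels` plays the role of the single channel of a
    B1HW target for one batch item.
    """
    return [
        [[1.0 if value == c else 0.0 for value in row] for row in labels]
        for c in range(num_classes)
    ]


labels = [[0, 2],
          [1, 1]]
one_hot = to_one_hot(labels, num_classes=3)
# Every pixel belongs to exactly one class, so the channels sum to 1.
channel_sums = [
    [sum(one_hot[c][i][j] for c in range(3)) for j in range(2)]
    for i in range(2)
]
```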
- monai.losses.Dice
alias of monai.losses.dice.DiceLoss
MaskedDiceLoss
- class monai.losses.MaskedDiceLoss(*args, **kwargs)
Adds a masking step before DiceLoss. It accepts a binary mask ([0, 1]) indicating a region, and input and target are masked by that region: where the mask is 1 the original value is kept, and where the mask is 0 the value is set to 0. The masked input and target are then fed to the normal DiceLoss computation. This ensures that only the masked region contributes to the loss computation and hence to the gradient calculation.
Args follow monai.losses.DiceLoss.
- __init__(*args, **kwargs)
Args follow monai.losses.DiceLoss.
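A minimal plain-Python sketch of the masking idea follows. It assumes the simple Dice form and flat lists for one channel; masked_dice_loss is a hypothetical helper, not the library class.

```python
def masked_dice_loss(pred, target, mask, smooth_nr=1e-5, smooth_dr=1e-5):
    """Zero out everything outside the mask, then compute a plain Dice loss."""
    masked_pred = [p * m for p, m in zip(pred, mask)]
    masked_target = [t * m for t, m in zip(target, mask)]
    intersection = sum(p * t for p, t in zip(masked_pred, masked_target))
    denominator = sum(masked_pred) + sum(masked_target)
    return 1.0 - (2.0 * intersection + smooth_nr) / (denominator + smooth_dr)


pred = [1.0, 1.0, 0.0, 1.0]
target = [1.0, 1.0, 1.0, 0.0]
full_loss = masked_dice_loss(pred, target, mask=[1, 1, 1, 1])
roi_loss = masked_dice_loss(pred, target, mask=[1, 1, 0, 0])
```

The two disagreements between pred and target lie outside the region selected by the second mask, so they no longer contribute and the loss drops to zero.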
GeneralizedDiceLoss
- class monai.losses.GeneralizedDiceLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, w_type=Weight.SQUARE, reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)
Compute the generalised Dice loss defined in:
Sudre, C. et al. (2017) Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. DLMIA 2017.
- Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction.
softmax (bool) – if True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – callable activation function to apply instead of sigmoid or softmax, for example other_act = torch.tanh. Defaults to None.
w_type (Union[Weight, str]) – {"square", "simple", "uniform"} Type of function to transform ground truth volume to a weight factor. Defaults to "square".
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
    "none": no reduction will be applied.
    "mean": the sum of the output will be divided by the number of elements in the output.
    "sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case intersection over union is computed for each item in the batch.
- Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set. These values are incompatible.
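The role of w_type can be sketched in plain Python. The sketch assumes that "square" means the reciprocal of the squared ground-truth volume, "simple" the reciprocal of the volume, and "uniform" a constant weight of 1, following Sudre et al. Assigning weight 0 to a class with zero volume is a simplification made here and is not necessarily how the library treats that case. All helper names are hypothetical.

```python
def class_weight(volume, w_type="square"):
    """Map a ground-truth volume to a class weight (illustrative only)."""
    if w_type == "uniform":
        return 1.0
    if volume == 0:
        return 0.0  # assumption for this sketch: ignore absent classes
    if w_type == "simple":
        return 1.0 / volume
    if w_type == "square":
        return 1.0 / (volume * volume)
    raise ValueError(f"unsupported w_type: {w_type}")


def generalized_dice_loss(preds, targets, w_type="square",
                          smooth_nr=1e-5, smooth_dr=1e-5):
    """preds and targets hold one flat list per class."""
    numerator = 0.0
    denominator = 0.0
    for pred, target in zip(preds, targets):
        weight = class_weight(sum(target), w_type)
        numerator += weight * sum(p * t for p, t in zip(pred, target))
        denominator += weight * (sum(pred) + sum(target))
    return 1.0 - (2.0 * numerator + smooth_nr) / (denominator + smooth_dr)


# Class 0 covers three voxels, class 1 (the rare class) covers one.
targets = [[1.0, 1.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# A prediction that assigns every voxel to class 0 and misses class 1.
missed = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0]]

loss_perfect = generalized_dice_loss(targets, targets, w_type="square")
loss_missed_square = generalized_dice_loss(missed, targets, w_type="square")
loss_missed_uniform = generalized_dice_loss(missed, targets, w_type="uniform")
```

In this toy case, missing the rare class costs noticeably more under "square" weighting than under "uniform" weighting, which is the motivation for volume-based weights on unbalanced segmentations.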
GeneralizedWassersteinDiceLoss
- class monai.losses.GeneralizedWassersteinDiceLoss(dist_matrix, weighting_mode='default', reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05)
Compute the generalized Wasserstein Dice Loss defined in:
Fidon L. et al. (2017) Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. BrainLes 2017.
Or its variant (use the option weighting_mode="GDL") defined in the Appendix of:
Tilborghs, S. et al. (2020) Comparative study of deep learning methods for the automatic segmentation of lung, lesion and lesion type in CT scans of COVID-19 patients. arXiv preprint arXiv:2007.15546
- Parameters
dist_matrix (Union[ndarray, Tensor]) – 2d tensor or 2d numpy array; matrix of distances between the classes. It must have dimension C x C, where C is the number of classes.
weighting_mode (str) – {"default", "GDL"} Specifies how to weight the class-specific sum of errors. Defaults to "default".
    "default": (recommended) use the original weighting method as in: Fidon L. et al. (2017) Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. BrainLes 2017.
    "GDL": use a GDL-like weighting method as in the Appendix of: Tilborghs, S. et al. (2020) Comparative study of deep learning methods for the automatic segmentation of lung, lesion and lesion type in CT scans of COVID-19 patients. arXiv preprint arXiv:2007.15546
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
    "none": no reduction will be applied.
    "mean": the sum of the output will be divided by the number of elements in the output.
    "sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
- Raises
ValueError – When dist_matrix is not a square matrix.
Example
    import torch
    import numpy as np
    from monai.losses import GeneralizedWassersteinDiceLoss

    # Example with 3 classes (including the background: label 0).
    # The distance between the background class (label 0) and the other classes is the maximum, equal to 1.
    # The distance between class 1 and class 2 is 0.5.
    dist_mat = np.array([[0.0, 1.0, 1.0], [1.0, 0.0, 0.5], [1.0, 0.5, 0.0]], dtype=np.float32)
    wass_loss = GeneralizedWassersteinDiceLoss(dist_matrix=dist_mat)

    pred_score = torch.tensor([[1000, 0, 0], [0, 1000, 0], [0, 0, 1000]], dtype=torch.float32)
    grnd = torch.tensor([0, 1, 2], dtype=torch.int64)
    wass_loss(pred_score, grnd)  # 0
- forward(input, target)
- Parameters
input (Tensor) – the shape should be BNH[WD].
target (Tensor) – the shape should be BNH[WD].
- Return type
Tensor
- wasserstein_distance_map(flat_proba, flat_target)
Compute the voxel-wise Wasserstein distance between the flattened prediction and the flattened labels (ground truth) with respect to the distance matrix M on the label space. This corresponds to eq. 6 in:
Fidon L. et al. (2017) Generalised Wasserstein Dice Score for Imbalanced Multi-class Segmentation using Holistic Convolutional Networks. BrainLes 2017.
- Parameters
flat_proba (Tensor) – the probabilities of the input (predicted) tensor.
flat_target (Tensor) – the target tensor.
- Return type
Tensor
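When the ground truth at a voxel is a single label, the voxel-wise Wasserstein distance reduces to a weighted sum: each predicted class probability is multiplied by the distance between that class and the true label. The plain-Python sketch below illustrates this under that assumption. The function voxelwise_wasserstein is hypothetical and takes the distance matrix as an explicit argument, unlike the method documented above.

```python
def voxelwise_wasserstein(flat_proba, flat_target, dist_matrix):
    """Per-voxel distance: sum over classes of M[true_label][c] * p[c].

    flat_proba: one list of class probabilities per voxel.
    flat_target: one integer label per voxel.
    """
    return [
        sum(dist_matrix[label][c] * prob for c, prob in enumerate(probs))
        for probs, label in zip(flat_proba, flat_target)
    ]


# Same 3-class distance matrix as in the class example above.
dist_matrix = [[0.0, 1.0, 1.0],
               [1.0, 0.0, 0.5],
               [1.0, 0.5, 0.0]]
flat_proba = [[1.0, 0.0, 0.0],     # correct: predicts class 0, truth 0
              [0.0, 0.0, 1.0],     # confuses class 2 with class 1
              [1.0, 0.0, 0.0],     # confuses background with class 1
              [0.5, 0.25, 0.25]]   # uncertain prediction, truth 2
flat_target = [0, 1, 1, 2]
distances = voxelwise_wasserstein(flat_proba, flat_target, dist_matrix)
```

Confusing two foreground classes that are close in the distance matrix costs 0.5 here, while confusing a foreground class with the background costs 1.0, which is how the distance matrix encodes which mistakes are more severe.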
DiceCELoss
- class monai.losses.DiceCELoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, squared_pred=False, jaccard=False, reduction='mean', smooth_nr=1e-05, smooth_dr=1e-05, batch=False, ce_weight=None, lambda_dice=1.0, lambda_ce=1.0)
Compute both Dice loss and Cross Entropy Loss, and return the weighted sum of these two losses. The details of Dice loss are given in monai.losses.DiceLoss. The details of Cross Entropy Loss are given in torch.nn.CrossEntropyLoss. In this implementation, the two deprecated parameters size_average and reduce, and the parameter ignore_index, are not supported.
- Parameters
reduction is used for both losses; the other parameters are used only for the Dice loss unless their description says otherwise.
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction. Only used by the DiceLoss; no activation function needs to be specified for CrossEntropyLoss.
softmax (bool) – if True, apply a softmax function to the prediction. Only used by the DiceLoss; no activation function needs to be specified for CrossEntropyLoss.
other_act (Optional[Callable]) – callable activation function to apply instead of sigmoid or softmax, for example other_act = torch.tanh. Defaults to None. Only used by the DiceLoss; no activation function needs to be specified for CrossEntropyLoss.
squared_pred (bool) – whether to use squared versions of targets and predictions in the denominator.
jaccard (bool) – whether to compute the Jaccard index (soft IoU) instead of Dice.
reduction (str) – {"mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean". The Dice loss should at least reduce the spatial dimensions, unlike cross entropy loss, so the "none" option cannot be used here.
    "mean": the sum of the output will be divided by the number of elements in the output.
    "sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a Dice loss value is computed independently for each item in the batch before any reduction.
ce_weight (Optional[Tensor]) – a rescaling weight given to each class for cross entropy loss. See torch.nn.CrossEntropyLoss() for more information.
lambda_dice (float) – the trade-off weight for the Dice loss. The value should be no less than 0.0. Defaults to 1.0.
lambda_ce (float) – the trade-off weight for the cross entropy loss. The value should be no less than 0.0. Defaults to 1.0.
- ce(input, target)
Compute CrossEntropy loss for the input and target. The channel dim is removed to match the format expected by PyTorch CrossEntropyLoss: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html?#torch.nn.CrossEntropyLoss.
- forward(input, target)
- Parameters
input (Tensor) – the shape should be BNH[WD].
target (Tensor) – the shape should be BNH[WD] or B1H[WD].
- Raises
ValueError – When number of dimensions for input and target are different.
ValueError – When number of channels for target is neither 1 nor the same as input.
- Return type
Tensor
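The weighted sum controlled by lambda_dice and lambda_ce can be sketched in plain Python. This is an illustration under stated assumptions: cross_entropy handles the logits of a single voxel, combine_losses is a hypothetical helper, and rejecting negative weights with a ValueError is an assumption based on the "no less than 0.0" wording in the parameter list.

```python
import math


def cross_entropy(logits, label):
    """Cross entropy for one voxel: log-sum-exp of the logits minus the true logit."""
    largest = max(logits)
    log_sum_exp = largest + math.log(sum(math.exp(z - largest) for z in logits))
    return log_sum_exp - logits[label]


def combine_losses(dice, ce, lambda_dice=1.0, lambda_ce=1.0):
    """Weighted sum of two already-reduced loss values."""
    if lambda_dice < 0.0 or lambda_ce < 0.0:
        # Assumption: negative trade-off weights are treated as an error.
        raise ValueError("trade-off weights should be no less than 0.0")
    return lambda_dice * dice + lambda_ce * ce


ce_value = cross_entropy([0.0, 0.0], label=0)  # two equally likely classes
total = combine_losses(dice=0.25, ce=ce_value, lambda_dice=1.0, lambda_ce=0.5)
```

Setting one of the weights to 0.0 recovers the other loss on its own, which is a convenient way to compare the two terms during experiments.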
DiceFocalLoss
- class monai.losses.DiceFocalLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, squared_pred=False, jaccard=False, reduction='mean', smooth_nr=1e-05, smooth_dr=1e-05, batch=False, gamma=2.0, focal_weight=None, lambda_dice=1.0, lambda_focal=1.0)
Compute both Dice loss and Focal Loss, and return the weighted sum of these two losses. The details of Dice loss are given in monai.losses.DiceLoss. The details of Focal Loss are given in monai.losses.FocalLoss.
- Parameters
include_background, to_onehot_y and reduction are used for both losses; gamma, focal_weight and lambda_focal apply to the focal loss; the other parameters are used only for the Dice loss.
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction. Only used by the DiceLoss; no activation function needs to be specified for FocalLoss.
softmax (bool) – if True, apply a softmax function to the prediction. Only used by the DiceLoss; no activation function needs to be specified for FocalLoss.
other_act (Optional[Callable]) – callable activation function to apply instead of sigmoid or softmax, for example other_act = torch.tanh. Defaults to None. Only used by the DiceLoss; no activation function needs to be specified for FocalLoss.
squared_pred (bool) – whether to use squared versions of targets and predictions in the denominator.
jaccard (bool) – whether to compute the Jaccard index (soft IoU) instead of Dice.
reduction (str) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
    "none": no reduction will be applied.
    "mean": the sum of the output will be divided by the number of elements in the output.
    "sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a Dice loss value is computed independently for each item in the batch before any reduction.
gamma (float) – value of the exponent gamma in the definition of the Focal loss.
focal_weight (Union[Sequence[float], float, int, Tensor, None]) – weights to apply to the voxels of each class. If None, no weights are applied. The input can be a single value (same weight for all classes) or a sequence of values (the length of the sequence should be the same as the number of classes).
lambda_dice (float) – the trade-off weight for the Dice loss. The value should be no less than 0.0. Defaults to 1.0.
lambda_focal (float) – the trade-off weight for the focal loss. The value should be no less than 0.0. Defaults to 1.0.
- forward(input, target)
- Parameters
input (Tensor) – the shape should be BNH[WD]. The input should be the original logits due to the restriction of monai.losses.FocalLoss.
target (Tensor) – the shape should be BNH[WD] or B1H[WD].
- Raises
ValueError – When number of dimensions for input and target are different.
ValueError – When number of channels for target is neither 1 nor the same as input.
- Return type
Tensor
FocalLoss
- class monai.losses.FocalLoss(include_background=True, to_onehot_y=False, gamma=2.0, weight=None, reduction=LossReduction.MEAN)
FocalLoss is an extension of BCEWithLogitsLoss that down-weights the loss from high-confidence correct predictions.
Reimplementation of the Focal Loss (with a built-in sigmoid activation) described in:
"Focal Loss for Dense Object Detection", T. Lin et al., ICCV 2017
"AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy", Zhu et al., Medical Physics 2018
Example
>>> import torch
>>> from monai.losses import FocalLoss
>>> from torch.nn import BCEWithLogitsLoss
>>> shape = B, N, *DIMS = 2, 3, 5, 7, 11
>>> input = torch.rand(*shape)
>>> target = torch.rand(*shape)
>>> # Demonstrate equivalence to BCE when gamma=0
>>> fl_g0_criterion = FocalLoss(reduction='none', gamma=0)
>>> fl_g0_loss = fl_g0_criterion(input, target)
>>> bce_criterion = BCEWithLogitsLoss(reduction='none')
>>> bce_loss = bce_criterion(input, target)
>>> assert torch.allclose(fl_g0_loss, bce_loss)
>>> # Demonstrate "focus" by setting gamma > 0.
>>> fl_g2_criterion = FocalLoss(reduction='none', gamma=2)
>>> fl_g2_loss = fl_g2_criterion(input, target)
>>> # Mark easy and hard cases
>>> is_easy = (target > 0.7) & (input > 0.7)
>>> is_hard = (target > 0.7) & (input < 0.3)
>>> easy_loss_g0 = fl_g0_loss[is_easy].mean()
>>> hard_loss_g0 = fl_g0_loss[is_hard].mean()
>>> easy_loss_g2 = fl_g2_loss[is_easy].mean()
>>> hard_loss_g2 = fl_g2_loss[is_hard].mean()
>>> # Gamma > 0 causes the loss function to "focus" on the hard
>>> # cases. IE, easy cases are downweighted, so hard cases
>>> # receive a higher proportion of the loss.
>>> hard_to_easy_ratio_g2 = hard_loss_g2 / easy_loss_g2
>>> hard_to_easy_ratio_g0 = hard_loss_g0 / easy_loss_g0
>>> assert hard_to_easy_ratio_g2 > hard_to_easy_ratio_g0
- Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
gamma (float) – value of the exponent gamma in the definition of the Focal loss.
weight (Union[Sequence[float], float, int, Tensor, None]) – weights to apply to the voxels of each class. If None, no weights are applied. This corresponds to the weights alpha in [1]. The input can be a single value (same weight for all classes) or a sequence of values (the length of the sequence should be the same as the number of classes; if include_background is False, the number should not include class 0). The value or values should be no less than 0. Defaults to None.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
    "none": no reduction will be applied.
    "mean": the sum of the output will be divided by the number of elements in the output.
    "sum": the output will be summed.
Example
>>> import torch
>>> from monai.losses import FocalLoss
>>> pred = torch.tensor([[1, 0], [0, 1], [1, 0]], dtype=torch.float32)
>>> grnd = torch.tensor([[0], [1], [0]], dtype=torch.int64)
>>> fl = FocalLoss(to_onehot_y=True)
>>> fl(pred, grnd)
- __init__(include_background=True, to_onehot_y=False, gamma=2.0, weight=None, reduction=LossReduction.MEAN)[source]¶
- Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
gamma (float) – value of the exponent gamma in the definition of the Focal loss.
weight (Union[Sequence[float], float, int, Tensor, None]) – weights to apply to the voxels of each class. If None, no weights are applied. This corresponds to the weights alpha in [1]. The input can be a single value (same weight for all classes) or a sequence of values whose length equals the number of classes (if include_background is False, the sequence should not include class 0). Each value must be no less than 0. Defaults to None.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
- forward(input, target)[source]¶
- Parameters
input (Tensor) – the shape should be BNH[WD], where N is the number of classes. The input should be the original logits, since a sigmoid is applied to it in the forward function.
target (Tensor) – the shape should be BNH[WD] or B1H[WD], where N is the number of classes.
- Raises
ValueError – When input and target (after the one-hot transform, if set) have different shapes.
ValueError – When self.reduction is not one of [“mean”, “sum”, “none”].
ValueError – When self.weight is a sequence whose length is not equal to the number of classes.
ValueError – When self.weight is or contains a value that is less than 0.
- Return type
Tensor
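To make the roles of gamma and weight more concrete, the following is a minimal plain-Python sketch of the focal term for a single logit and a 0/1 target, assuming the sigmoid form mentioned in the input description above. The helper name binary_focal_term and its alpha argument are hypothetical and are not part of MONAI; the library itself operates on batched tensors.

```python
import math


def binary_focal_term(logit, target, gamma=2.0, alpha=1.0):
    """Focal term for one logit and a 0/1 target (illustrative helper, not a MONAI API)."""
    p = 1.0 / (1.0 + math.exp(-logit))      # sigmoid probability of the positive class
    p_t = p if target == 1 else 1.0 - p     # probability assigned to the true class
    cross_entropy = -math.log(p_t)
    # (1 - p_t) ** gamma shrinks the contribution of well-classified elements
    return alpha * (1.0 - p_t) ** gamma * cross_entropy


easy = binary_focal_term(4.0, 1)               # confident and correct
hard = binary_focal_term(-1.0, 1)              # wrong side of the decision boundary
plain = binary_focal_term(4.0, 1, gamma=0.0)   # gamma=0 reduces to binary cross-entropy
```

With gamma=0 the modulating factor is 1, so the term equals ordinary cross-entropy; increasing gamma reduces the contribution of easy elements far more than that of hard ones.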
TverskyLoss¶
- class monai.losses.TverskyLoss(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, alpha=0.5, beta=0.5, reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)[source]¶
Compute the Tversky loss defined in:
Salehi et al. (2017) Tversky loss function for image segmentation using 3D fully convolutional deep networks. (https://arxiv.org/abs/1706.05721)
- Parameters
include_background (bool) – if False, channel index 0 (background category) is excluded from the calculation.
to_onehot_y (bool) – whether to convert y into the one-hot format. Defaults to False.
sigmoid (bool) – if True, apply a sigmoid function to the prediction.
softmax (bool) – if True, apply a softmax function to the prediction.
other_act (Optional[Callable]) – a callable that applies another activation when neither sigmoid nor softmax is wanted, for example other_act = torch.tanh. Defaults to None.
alpha (float) – weight of false positives.
beta (float) – weight of false negatives.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid zero.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
batch (bool) – whether to sum the intersection and union areas over the batch dimension before dividing. Defaults to False, in which case a loss value is computed independently for each item in the batch before any reduction.
- Raises
TypeError – When other_act is not an Optional[Callable].
ValueError – When more than one of [sigmoid=True, softmax=True, other_act is not None] is set; these options are mutually exclusive.
- __init__(include_background=True, to_onehot_y=False, sigmoid=False, softmax=False, other_act=None, alpha=0.5, beta=0.5, reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05, batch=False)[source]¶
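The effect of alpha and beta can be seen in a small plain-Python sketch of the Tversky index on flat lists of probabilities and 0/1 labels. This is only an illustration under that simplifying assumption; the function name tversky_loss is hypothetical and the code is not MONAI's tensor implementation.

```python
def tversky_loss(pred, target, alpha=0.5, beta=0.5, smooth_nr=1e-5, smooth_dr=1e-5):
    """1 - Tversky index for flat lists of probabilities and 0/1 labels (illustrative)."""
    tp = sum(p * t for p, t in zip(pred, target))          # soft true positives
    fp = sum(p * (1 - t) for p, t in zip(pred, target))    # soft false positives
    fn = sum((1 - p) * t for p, t in zip(pred, target))    # soft false negatives
    index = (tp + smooth_nr) / (tp + alpha * fp + beta * fn + smooth_dr)
    return 1.0 - index


pred = [0.9, 0.8, 0.1, 0.4]
target = [1, 1, 0, 0]
balanced = tversky_loss(pred, target)                         # alpha = beta = 0.5
penalise_fp = tversky_loss(pred, target, alpha=0.9, beta=0.1)
```

With alpha = beta = 0.5 the index coincides with the soft Dice coefficient; raising alpha relative to beta makes false positives more costly than false negatives.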
ContrastiveLoss¶
- class monai.losses.ContrastiveLoss(temperature=0.5, batch_size=1, reduction='sum')[source]¶
Compute the Contrastive loss defined in:
Chen, Ting, et al. “A simple framework for contrastive learning of visual representations.” International conference on machine learning. PMLR, 2020. (http://proceedings.mlr.press/v119/chen20j.html)
- Adapted from:
https://github.com/Sara-Ahmed/SiT/blob/1aacd6adcd39b71efc903d16b4e9095b97dda76f/losses.py#L5
- Parameters
temperature (float) – Can be scaled between 0 and 1 for learning from negative samples, ideally set to 0.5.
batch_size (int) – The number of samples.
- Raises
ValueError – When an input of dimension length > 2 is passed
ValueError – When input and target are of different shapes
Deprecated since version 0.8.0: reduction is no longer supported.
- __init__(temperature=0.5, batch_size=1, reduction='sum')[source]¶
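For orientation, the sketch below computes the normalized temperature-scaled cross-entropy (NT-Xent) loss described in the cited paper, using plain Python lists of 2D shape (samples, features). It is an assumption-laden illustration rather than MONAI's implementation: the function names cosine and nt_xent are hypothetical, and the loss is averaged over all anchors here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent(view_a, view_b, temperature=0.5):
    """NT-Xent loss averaged over all 2N anchors (illustrative, plain Python)."""
    n = len(view_a)
    z = list(view_a) + list(view_b)     # 2N embeddings; i and i + N form a positive pair
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        # similarities to every other embedding, scaled by the temperature
        logits = [cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[positive]) / temperature
        total += -pos_logit + math.log(sum(math.exp(value) for value in logits))
    return total / (2 * n)


view_a = [[1.0, 0.0], [0.0, 1.0]]
view_b = [[1.0, 0.1], [0.1, 1.0]]
aligned = nt_xent(view_a, view_b)            # each sample paired with its own second view
shuffled = nt_xent(view_a, view_b[::-1])     # pairs deliberately mismatched
```

The loss is small when the two views of each sample are more similar to each other than to the other samples, and a lower temperature sharpens that contrast.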
Registration Losses¶
BendingEnergyLoss¶
- class monai.losses.BendingEnergyLoss(reduction=LossReduction.MEAN)[source]¶
Calculate the bending energy based on second-order differentiation of pred using central finite difference.
- Adapted from:
DeepReg (https://github.com/DeepRegNet/DeepReg)
- Parameters
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
- __init__(reduction=LossReduction.MEAN)[source]¶
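The idea of second-order differentiation by central finite differences can be sketched in one dimension as follows. This is a simplified illustration that assumes unit spacing and applies a first-order central difference twice; the function names are hypothetical, and an actual implementation for displacement fields would also handle several spatial dimensions and mixed partial derivatives.

```python
def central_diff(values):
    """First derivative by central differences on interior points (unit spacing)."""
    return [(values[i + 1] - values[i - 1]) / 2.0 for i in range(1, len(values) - 1)]


def bending_energy_1d(displacement):
    """Mean squared second derivative of a 1D displacement field (illustrative)."""
    second = central_diff(central_diff(displacement))
    return sum(d * d for d in second) / len(second)


linear = [2.0 * x for x in range(8)]            # constant gradient, no bending
quadratic = [float(x * x) for x in range(8)]    # second derivative is 2 everywhere
energy_linear = bending_energy_1d(linear)
energy_quadratic = bending_energy_1d(quadratic)
```

A linear displacement has zero bending energy, so the term penalizes only curvature and leaves smooth affine-like deformations unpenalized.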
LocalNormalizedCrossCorrelationLoss¶
- class monai.losses.LocalNormalizedCrossCorrelationLoss(spatial_dims=3, kernel_size=3, kernel_type='rectangular', reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05, ndim=None)[source]¶
Local squared zero-normalized cross-correlation (ZNCC). The loss is based on a kernel/window that moves over y_true and y_pred; within each window, the square of the ZNCC is calculated. The kernel can be a rectangular, triangular, or Gaussian window. The final loss is the average over all windows.
- Adapted from:
https://github.com/voxelmorph/voxelmorph/blob/legacy/src/losses.py
DeepReg (https://github.com/DeepRegNet/DeepReg)
- Parameters
spatial_dims (int) – number of spatial dimensions, {1, 2, 3}. Defaults to 3.
kernel_size (int) – kernel spatial size, must be odd.
kernel_type (str) – {"rectangular", "triangular", "gaussian"}. Defaults to "rectangular".
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid nan.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
Deprecated since version 0.6.0: ndim is deprecated, use spatial_dims.
- __init__(spatial_dims=3, kernel_size=3, kernel_type='rectangular', reduction=LossReduction.MEAN, smooth_nr=1e-05, smooth_dr=1e-05, ndim=None)[source]¶
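The windowed computation can be illustrated with a plain-Python sketch on 1D signals using a rectangular window. It is a simplification under stated assumptions: the function name local_squared_zncc is hypothetical, only valid (fully overlapping) windows are used, and the smoothing constants are applied as in the parameter descriptions above.

```python
def local_squared_zncc(a, b, kernel_size=3, smooth_nr=1e-5, smooth_dr=1e-5):
    """Mean squared ZNCC over sliding rectangular windows of two 1D signals (illustrative)."""
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    scores = []
    for start in range(len(a) - kernel_size + 1):
        wa = a[start:start + kernel_size]
        wb = b[start:start + kernel_size]
        mean_a = sum(wa) / kernel_size
        mean_b = sum(wb) / kernel_size
        cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(wa, wb))
        var_a = sum((x - mean_a) ** 2 for x in wa)
        var_b = sum((y - mean_b) ** 2 for y in wb)
        # squared correlation within this window, smoothed to avoid 0 / 0
        scores.append((cross * cross + smooth_nr) / (var_a * var_b + smooth_dr))
    return sum(scores) / len(scores)


signal = [0.0, 1.0, 4.0, 2.0, 7.0, 3.0]
rescaled = [2.0 * v + 5.0 for v in signal]    # same structure, different intensity scale
similarity = local_squared_zncc(signal, rescaled)
loss = -similarity                             # higher correlation gives a lower loss
```

Because each window is mean-centered and variance-normalized, an affine change of intensity leaves the score at about 1, which is why this kind of measure is often chosen when images differ in brightness or contrast.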
GlobalMutualInformationLoss¶
- class monai.losses.GlobalMutualInformationLoss(kernel_type='gaussian', num_bins=23, sigma_ratio=0.5, reduction=LossReduction.MEAN, smooth_nr=1e-07, smooth_dr=1e-07)[source]¶
Differentiable global mutual information loss via Parzen windowing method.
- Reference:
https://dspace.mit.edu/handle/1721.1/123142, Section 3.1, equation 3.1-3.5, Algorithm 1
- Parameters
kernel_type (str) – {"gaussian", "b-spline"}
"gaussian": adapted from DeepReg. Reference: https://dspace.mit.edu/handle/1721.1/123142, Section 3.1, equation 3.1-3.5, Algorithm 1.
"b-spline": based on the method of Mattes et al. [1, 2] and adapted from ITK.
References
[1] “Nonrigid multimodality image registration”, D. Mattes, D. R. Haynor, H. Vesselle, T. Lewellen and W. Eubank, Medical Imaging 2001: Image Processing, 2001, pp. 1609-1620.
[2] “PET-CT Image Registration in the Chest Using Free-form Deformations”, D. Mattes, D. R. Haynor, H. Vesselle, T. Lewellen and W. Eubank, IEEE Transactions in Medical Imaging. Vol.22, No.1, January 2003. pp.120-128.
num_bins (int) – number of bins for intensity.
sigma_ratio (float) – a hyperparameter for the Gaussian function.
reduction (Union[LossReduction, str]) – {"none", "mean", "sum"} Specifies the reduction to apply to the output. Defaults to "mean".
"none": no reduction will be applied.
"mean": the sum of the output will be divided by the number of elements in the output.
"sum": the output will be summed.
smooth_nr (float) – a small constant added to the numerator to avoid nan.
smooth_dr (float) – a small constant added to the denominator to avoid nan.
- __init__(kernel_type='gaussian', num_bins=23, sigma_ratio=0.5, reduction=LossReduction.MEAN, smooth_nr=1e-07, smooth_dr=1e-07)[source]¶
- forward(pred, target)[source]¶
- Parameters
pred (Tensor) – the shape should be B[NDHW].
target (Tensor) – the shape should be the same as the pred shape.
- Raises
ValueError – When self.reduction is not one of [“mean”, “sum”, “none”].
- Return type
Tensor
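A rough plain-Python sketch of mutual information estimated with Gaussian Parzen windowing is given below for flat lists of intensities in [0, 1]. The function names, the way sigma is derived from sigma_ratio and the bin spacing, and the placement of eps are assumptions made for illustration; they are not taken from the MONAI source.

```python
import math


def parzen_weights(values, num_bins, sigma_ratio=0.5):
    """Soft-assign each intensity in [0, 1] to histogram bins with a Gaussian kernel."""
    centers = [k / (num_bins - 1) for k in range(num_bins)]
    sigma = sigma_ratio / (num_bins - 1)    # kernel width relative to bin spacing (assumption)
    rows = []
    for v in values:
        w = [math.exp(-((v - c) ** 2) / (2.0 * sigma ** 2)) for c in centers]
        total = sum(w)
        rows.append([x / total for x in w])  # each sample's weights sum to 1
    return rows


def mutual_information(pred, target, num_bins=8, eps=1e-7):
    """Mutual information from a soft joint histogram (illustrative)."""
    wa = parzen_weights(pred, num_bins)
    wb = parzen_weights(target, num_bins)
    n = len(pred)
    joint = [[sum(wa[s][i] * wb[s][j] for s in range(n)) / n for j in range(num_bins)]
             for i in range(num_bins)]
    pa = [sum(row) for row in joint]                                        # marginal of pred
    pb = [sum(joint[i][j] for i in range(num_bins)) for j in range(num_bins)]  # marginal of target
    return sum(joint[i][j] * math.log((joint[i][j] + eps) / (pa[i] * pb[j] + eps))
               for i in range(num_bins) for j in range(num_bins))


image = [0.0, 0.1, 0.2, 0.5, 0.6, 0.9, 1.0, 0.3]
constant = [0.5] * len(image)
mi_same = mutual_information(image, image)
mi_const = mutual_information(image, constant)
```

Since every step is a smooth function of the intensities, the estimate is differentiable; a loss built on it would typically negate the value so that minimizing the loss increases mutual information.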
Loss Wrappers¶
MultiScaleLoss¶
- class monai.losses.MultiScaleLoss(loss, scales=None, kernel='gaussian', reduction=LossReduction.MEAN)[source]¶
This is a wrapper class. It smooths the input and target at different scales before passing them into the wrapped loss function.
- Adapted from:
DeepReg (https://github.com/DeepRegNet/DeepReg)
- Parameters
loss (_Loss) – loss function to be wrapped.
scales (Optional[List]) – list of scalars or None; if None, no scaling is applied.
kernel (str) – gaussian or cauchy.
- __init__(loss, scales=None, kernel='gaussian', reduction=LossReduction.MEAN)[source]¶
- forward(y_true, y_pred)[source]¶
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- Return type
Tensor
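The wrapper's behavior can be approximated in plain Python for 1D signals as shown below. This sketch assumes a Gaussian kernel, border clamping, a scale of 0 meaning no smoothing, and a simple average over scales; all function names are hypothetical and the details may differ from the MONAI implementation.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma):
    """Convolve a 1D signal with a Gaussian kernel, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    last = len(signal) - 1
    return [
        sum(k * signal[min(max(i + offset - radius, 0), last)] for offset, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def multi_scale(loss_fn, y_true, y_pred, scales=None):
    """Average loss_fn over smoothed copies of both inputs; scale 0 means no smoothing."""
    if scales is None:
        return loss_fn(y_true, y_pred)
    losses = []
    for s in scales:
        if s == 0:
            losses.append(loss_fn(y_true, y_pred))
        else:
            losses.append(loss_fn(smooth(y_true, s), smooth(y_pred, s)))
    return sum(losses) / len(losses)


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


y_true = [0.0, 0.0, 1.0, 0.0, 0.0]
y_pred = [0.0, 0.0, 0.0, 1.0, 0.0]
single = multi_scale(mse, y_true, y_pred)                  # no scales: plain loss
blended = multi_scale(mse, y_true, y_pred, scales=[0, 1])  # raw and smoothed, averaged
```

Smoothing spreads each peak over its neighbours, so a prediction that is off by one position is penalized less at coarser scales than at the original resolution.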
MaskedLoss¶
- class monai.losses.MaskedLoss(loss, *loss_args, **loss_kwargs)[source]¶
This is a wrapper class for the loss functions. It allows for additional weighting masks to be applied to both input and target.
- Parameters
loss (Union[Callable, _Loss]) – loss function to be wrapped; this can be a loss class or an instance of a loss class.
loss_args – arguments to the loss function’s constructor if loss is a class.
loss_kwargs – keyword arguments to the loss function’s constructor if loss is a class.
- __init__(loss, *loss_args, **loss_kwargs)[source]¶
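As a minimal illustration of applying a weighting mask to both input and target, the plain-Python sketch below multiplies the two lists element-wise by the mask before calling the wrapped loss. The names masked_loss and sum_abs_error are hypothetical, and the shape handling of the actual wrapper is not reproduced here.

```python
def masked_loss(loss_fn, pred, target, mask=None):
    """Apply loss_fn after multiplying both inputs element-wise by mask (illustrative)."""
    if mask is None:
        return loss_fn(pred, target)
    if len(mask) != len(pred):
        raise ValueError("mask must have the same length as the inputs")
    masked_pred = [p * m for p, m in zip(pred, mask)]
    masked_target = [t * m for t, m in zip(target, mask)]
    return loss_fn(masked_pred, masked_target)


def sum_abs_error(a, b):
    """Sum of absolute differences between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b))


pred = [0.9, 0.2, 0.7, 0.1]
target = [1.0, 0.0, 0.0, 0.0]
mask = [1.0, 1.0, 0.0, 0.0]      # ignore the last two elements
unmasked = masked_loss(sum_abs_error, pred, target)
masked = masked_loss(sum_abs_error, pred, target, mask)
```

Elements with a mask value of 0 contribute nothing to the wrapped loss, while fractional mask values act as per-element weights.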