Metrics#
FROC#
- monai.metrics.compute_fp_tp_probs(probs, y_coord, x_coord, evaluation_mask, labels_to_exclude=None, resolution_level=0)[source]#
This function is modified from the official evaluation code of CAMELYON 16 Challenge, and used to distinguish true positive and false positive predictions. A true positive prediction is defined when the detection point is within the annotated ground truth region.
- Parameters:
probs – an array with shape (n,) that represents the probabilities of the detections, where n is the number of predicted detections.
y_coord – an array with shape (n,) that represents the Y-coordinates of the detections.
x_coord – an array with shape (n,) that represents the X-coordinates of the detections.
evaluation_mask – the ground truth mask for evaluation.
labels_to_exclude – labels in this list will not be counted for metric calculation.
resolution_level – the level at which the evaluation mask is made.
- Returns:
fp_probs: an array that contains the probabilities of the false positive detections.
tp_probs: an array that contains the probabilities of the true positive detections.
num_targets: the total number of targets (excluding labels_to_exclude) for all images under evaluation.
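The true/false positive rule described above can be illustrated with a small, dependency-free sketch. It is not MONAI's implementation: the helper name `split_fp_tp` is hypothetical, the mask is a plain nested list in which label 0 is assumed to mean background, and true positives are returned as a dict (label to highest hitting probability) for readability rather than as an array.

```python
def split_fp_tp(probs, y_coord, x_coord, evaluation_mask,
                labels_to_exclude=None, resolution_level=0):
    """Hypothetical sketch: a detection is a true positive when it lands on a labelled region."""
    excluded = set(labels_to_exclude or [])
    scale = 2 ** resolution_level  # assumed: the mask is downsampled by this factor
    fp_probs = []
    tp_probs = {}  # label -> highest probability among detections hitting it
    for p, y, x in zip(probs, y_coord, x_coord):
        label = evaluation_mask[int(y / scale)][int(x / scale)]
        if label == 0:
            fp_probs.append(p)  # landed on background
        elif label not in excluded:
            tp_probs[label] = max(p, tp_probs.get(label, 0.0))
    all_labels = {v for row in evaluation_mask for v in row if v != 0}
    num_targets = len(all_labels - excluded)
    return fp_probs, tp_probs, num_targets


mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 0, 0, 0],
]
fp, tp, n = split_fp_tp([0.9, 0.4, 0.7], [0, 1, 2], [2, 0, 0], mask)
print(fp, tp, n)
```

Detections that land on an excluded label are counted neither as false positives nor as true positives in this sketch.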
- monai.metrics.compute_froc_curve_data(fp_probs, tp_probs, num_targets, num_images)[source]#
This function is modified from the official evaluation code of CAMELYON 16 Challenge, and used to compute the required data for plotting the Free Response Operating Characteristic (FROC) curve.
- Parameters:
fp_probs – an array that contains the probabilities of the false positive detections for all images under evaluation.
tp_probs – an array that contains the probabilities of the True positive detections for all images under evaluation.
num_targets – the total number of targets (excluding labels_to_exclude) for all images under evaluation.
num_images – the number of images under evaluation.
- monai.metrics.compute_froc_score(fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8))[source]#
This function is modified from the official evaluation code of CAMELYON 16 Challenge, and used to compute the challenge’s second evaluation metric, which is defined as the average sensitivity at the predefined false positive rates per whole slide image.
- Parameters:
fps_per_image (ndarray) – the average number of false positives per image for different thresholds.
total_sensitivity (ndarray) – sensitivities (true positive rates) for different thresholds.
eval_thresholds (tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8), which is the same as the CAMELYON 16 Challenge.
- Return type:
Any
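As a rough illustration of "average sensitivity at the predefined false positive rates", the sketch below interpolates the sensitivity at each evaluation threshold and averages the results. The helper names `interp` and `froc_score` are hypothetical, and linear interpolation clamped at both ends of the curve is an assumption, not a statement about how MONAI computes the score.

```python
def interp(x, xs, ys):
    """Piecewise-linear interpolation over increasing xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1 and x1 > x0:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    return ys[-1]


def froc_score(fps_per_image, total_sensitivity,
               eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)):
    """Average the interpolated sensitivity at each false-positive rate."""
    pairs = sorted(zip(fps_per_image, total_sensitivity))  # increasing FP rate
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    values = [interp(t, xs, ys) for t in eval_thresholds]
    return sum(values) / len(values)


# Curve data given from the strictest to the loosest threshold is sorted internally.
score = froc_score([8.0, 4.0, 2.0, 1.0, 0.0], [1.0, 0.8, 0.6, 0.5, 0.0])
print(round(score, 4))
```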
Metric#
Variance#
- monai.metrics.compute_variance(y_pred, include_background=True, spatial_map=False, scalar_reduction='mean', threshold=0.0005)[source]#
- Parameters:
y_pred – [N, C, H, W, D] or [N, C, H, W] or [N, C, H] where N is repeats, C is channels and H, W, D stand for Height, Width & Depth
include_background – Whether to include the background of the spatial image or channel 0 of the 1-D vector
spatial_map – Boolean, if set to True, a spatial map of variance matching the input image dimensions will be returned
scalar_reduction – reduction type of the metric, either 'sum' or 'mean' can be used
threshold – To avoid NaNs, zeros are replaced with this threshold value
- Returns:
A single scalar uncertainty/variance value or the spatial map of uncertainty/variance
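The idea of a variance computed across N repeated predictions can be sketched without any tensor library. The function name `variance_score` is hypothetical, each repeat is assumed to be a flattened list of equal length, and population variance is an assumption; the `include_background` and `threshold` handling described above is omitted.

```python
from statistics import pvariance


def variance_score(repeats, scalar_reduction="mean", spatial_map=False):
    """repeats: list of N flattened predictions for the same input."""
    # Variance of each element across the N repeats (the "spatial map").
    per_element = [pvariance(values) for values in zip(*repeats)]
    if spatial_map:
        return per_element
    if scalar_reduction == "mean":
        return sum(per_element) / len(per_element)
    if scalar_reduction == "sum":
        return sum(per_element)
    raise ValueError(f"unsupported scalar_reduction: {scalar_reduction}")


# Four repeats over two elements: the first element is uncertain, the second is stable.
repeats = [[0.0, 1.0], [1.0, 1.0], [0.0, 1.0], [1.0, 1.0]]
print(variance_score(repeats, spatial_map=True))
print(variance_score(repeats, "mean"), variance_score(repeats, "sum"))
```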
- class monai.metrics.VarianceMetric(include_background=True, spatial_map=False, scalar_reduction='sum', threshold=0.0005)[source]#
Compute the Variance of a given T-repeats N-dimensional array/tensor. The primary usage is as an uncertainty based metric for Active Learning.
It can return the spatial variance/uncertainty map based on user choice or a single scalar value via mean/sum of the variance for scoring purposes
- Parameters:
include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector
spatial_map (bool) – Boolean, if set to True, a spatial map of variance matching the input image dimensions will be returned
scalar_reduction (str) – reduction type of the metric, either 'sum' or 'mean' can be used
threshold (float) – To avoid NaNs, zeros are replaced with this threshold value
LabelQualityScore#
- monai.metrics.label_quality_score(y_pred, y, include_background=True, scalar_reduction='mean')[source]#
The assumption is that the DL model makes better predictions than the provided label quality, hence the difference can be treated as a label quality score
- Parameters:
y_pred – Input data of dimension [B, C, H, W, D] or [B, C, H, W] or [B, C, H] where B is Batch-size, C is channels and H, W, D stand for Height, Width & Depth
y – Ground Truth of dimension [B, C, H, W, D] or [B, C, H, W] or [B, C, H] where B is Batch-size, C is channels and H, W, D stand for Height, Width & Depth
include_background – Whether to include the background of the spatial image or channel 0 of the 1-D vector
scalar_reduction – reduction type of the metric, either ‘sum’ or ‘mean’ can be used to retrieve a single scalar value, if set to ‘none’ a spatial map will be returned
- Returns:
A single scalar absolute difference value as score with a reduction based on sum/mean or the spatial map of absolute difference
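A minimal sketch of the absolute-difference score, assuming flattened inputs of equal length. The function name `label_quality` is hypothetical and the `include_background` option is left out.

```python
def label_quality(y_pred, y, scalar_reduction="mean"):
    """Absolute difference between prediction and label, optionally reduced."""
    diff = [abs(p - t) for p, t in zip(y_pred, y)]
    if scalar_reduction == "none":
        return diff  # element-wise map of absolute differences
    if scalar_reduction == "sum":
        return sum(diff)
    if scalar_reduction == "mean":
        return sum(diff) / len(diff)
    raise ValueError(f"unsupported scalar_reduction: {scalar_reduction}")


y_pred = [0.9, 0.2, 0.6, 0.0]
y = [1.0, 0.0, 0.0, 0.0]
print(label_quality(y_pred, y, "sum"), label_quality(y_pred, y, "mean"))
```

A larger score means the label disagrees more with the prediction, which under the stated assumption points to a lower-quality label.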
- class monai.metrics.LabelQualityScore(include_background=True, scalar_reduction='sum')[source]#
The assumption is that the DL model makes better predictions than the provided label quality, hence the difference can be treated as a label quality score
It can be combined with variance/uncertainty for active learning frameworks to factor in the quality of label along with uncertainty.
- Parameters:
include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector
scalar_reduction (str) – reduction type of the metric, either 'sum' or 'mean' can be used
IterationMetric#
- class monai.metrics.IterationMetric[source]#
Base class for metrics computation at the iteration level, that is, on a mini-batch of samples, usually using the model outcome of one iteration.
__call__ is designed to handle y_pred and y (optional) in torch tensors or a list/tuple of tensors.
Subclasses typically implement the _compute_tensor function for the actual tensor computation logic.
Cumulative#
- class monai.metrics.Cumulative[source]#
Utility class for the typical cumulative computation process based on PyTorch Tensors. It provides interfaces to accumulate values in the local buffers, synchronize buffers across distributed nodes, and aggregate the buffered values.
In multi-processing, PyTorch programs usually distribute data to multiple nodes. Each node runs with a subset of the data and adds values to its local buffers. Calling get_buffer gathers the results from all nodes, and aggregate can then process the gathered results to generate the final outcomes.
Users can implement their own aggregate method to handle the results, using get_buffer to get the buffered contents.
Note: the data list should have the same length every time append() or extend() is called in a round; buffers are created automatically according to the length of the data list.
Typically, this class is expected to execute the following steps:
    from monai.metrics import Cumulative

    c = Cumulative()
    c.append(1)  # adds a value
    c.extend([2, 3])  # adds a batch of values
    c.extend([4, 5, 6])  # adds a batch of values
    print(c.get_buffer())  # tensor([1, 2, 3, 4, 5, 6])
    print(len(c))  # 6
    c.reset()
    print(len(c))  # 0
The following is an example of maintaining two internal buffers:
    from monai.metrics import Cumulative

    c = Cumulative()
    c.append(1, 2)  # adds a value to two buffers respectively
    c.extend([3, 4], [5, 6])  # adds batches of values
    print(c.get_buffer())  # [tensor([1, 3, 4]), tensor([2, 5, 6])]
    print(len(c))
The following is an example of extending with variable length data:
    import torch
    from monai.metrics import Cumulative

    c = Cumulative()
    c.extend(torch.zeros((8, 2)), torch.zeros((6, 2)))  # adds batches
    c.append(torch.zeros((2, )))  # adds a value
    print(c.get_buffer())  # [torch.zeros((9, 2)), torch.zeros((6, 2))]
    print(len(c))
- __init__()[source]#
Initialize the internal buffers. self._buffers are local buffers, they are not usually used directly. self._sync_buffers are the buffers with all the results across all the nodes.
- abstract aggregate(*args, **kwargs)[source]#
Aggregate final results based on the gathered buffers. This method is expected to use get_buffer to gather the local buffer contents.
- Return type:
Any
- append(*data)[source]#
Add samples to the local cumulative buffers. A buffer will be allocated for each data item. Compared with self.extend, this method adds a single sample (instead of a “batch”) to the local buffers.
- Parameters:
data (Any) – each item will be converted into a torch tensor. They will be stacked at the 0-th dim with a new dimension when get_buffer() is called.
- Return type:
None
- extend(*data)[source]#
Extend the local buffers with new (“batch-first”) data. A buffer will be allocated for each data item. Compared with self.append, this method adds a “batch” of data to the local buffers.
- Parameters:
data (Any) – each item can be a "batch-first" tensor or a list of "channel-first" tensors. They will be concatenated at the 0-th dimension when get_buffer() is called.
- Return type:
None
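The difference between append (one sample) and extend (a batch), and the role of a user-defined aggregate, can be illustrated with a toy analogue built on plain Python lists. `ToyCumulative` and `MeanOfBuffer` are hypothetical names; this is not MONAI's class, and it deliberately ignores tensors and cross-node synchronization.

```python
class ToyCumulative:
    """Toy stand-in: one list per data item, no distributed gathering."""

    def __init__(self):
        self._buffers = None

    def _ensure(self, n):
        # Buffers are created lazily, one per positional data item.
        if self._buffers is None:
            self._buffers = [[] for _ in range(n)]

    def append(self, *data):
        self._ensure(len(data))
        for buf, item in zip(self._buffers, data):
            buf.append(item)  # a single sample

    def extend(self, *data):
        self._ensure(len(data))
        for buf, batch in zip(self._buffers, data):
            buf.extend(batch)  # a batch of samples

    def get_buffer(self):
        if self._buffers is None:
            return None
        return self._buffers[0] if len(self._buffers) == 1 else self._buffers

    def reset(self):
        self._buffers = None

    def __len__(self):
        return 0 if self._buffers is None else len(self._buffers[0])


class MeanOfBuffer(ToyCumulative):
    def aggregate(self):
        # User-defined aggregation over the gathered buffer contents.
        values = self.get_buffer()
        return sum(values) / len(values)


m = MeanOfBuffer()
m.append(1)
m.extend([2, 3])
m.extend([4, 5, 6])
print(m.get_buffer(), len(m), m.aggregate())
```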
CumulativeIterationMetric#
- class monai.metrics.CumulativeIterationMetric[source]#
Base class of cumulative metric which collects metrics on each mini-batch data at the iteration level.
Typically, it computes some intermediate results for each iteration and adds them to the buffers; the buffer contents can then be gathered and aggregated for the final result when the epoch completes. Currently, Cumulative.aggregate() and IterationMetric._compute_tensor() are expected to be implemented.
For example, DiceMetric inherits this class and the usage is as follows:
    from monai.data import decollate_batch
    from monai.metrics import DiceMetric

    # model, val_loader and postprocessing_transform are assumed to be defined elsewhere
    dice_metric = DiceMetric(include_background=True, reduction="mean")

    for val_data in val_loader:
        val_outputs = model(val_data["img"])
        val_outputs = [postprocessing_transform(i) for i in decollate_batch(val_outputs)]
        # compute metric for current iteration
        dice_metric(y_pred=val_outputs, y=val_data["seg"])  # callable to add metric to the buffer

    # aggregate the final mean dice result
    metric = dice_metric.aggregate().item()
    # reset the status for next computation round
    dice_metric.reset()
And to load predictions and labels from files, then compute metrics with multi-processing, please refer to: Project-MONAI/tutorials.
LossMetric#
- class monai.metrics.LossMetric(loss_fn, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
A wrapper to make loss_fn available as a cumulative metric. That is, the loss values computed from mini-batches can be combined in the reduction mode across multiple iterations, as a quantitative measurement of a model.
Example:
    import torch
    from monai.losses import DiceLoss
    from monai.metrics import LossMetric

    dice_loss = DiceLoss(include_background=True)
    loss_metric = LossMetric(loss_fn=dice_loss)

    # first iteration
    y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 1.0]]]])  # shape [batch=1, channel=1, 2, 2]
    y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])  # shape [batch=1, channel=1, 2, 2]
    loss_metric(y_pred, y)

    # second iteration
    y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])  # shape [batch=1, channel=1, 2, 2]
    y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])  # shape [batch=1, channel=1, 2, 2]
    loss_metric(y_pred, y)

    # aggregate
    print(loss_metric.aggregate(reduction="none"))  # tensor([[0.2000], [0.5000]]) (shape [batch=2, channel=1])

    # reset
    loss_metric.reset()
    print(loss_metric.aggregate())
- Parameters:
loss_fn – a callable function that takes y_pred and optionally y as input (in the "batch-first" format), and returns a "batch-first" tensor of loss values.
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", no reduction is performed.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values for the metric, so its shape equals the shape of the metric.
- aggregate(reduction=None)[source]#
Returns the aggregated loss value across multiple iterations.
- Parameters:
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If "none", no reduction is performed.
Mean Dice#
- class monai.metrics.DiceMetric(include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, ignore_empty=True, num_classes=None, return_with_label=False)[source]#
Compute average Dice score for a set of pairs of prediction-groundtruth segmentations.
It supports both multi-classes and multi-labels tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y can be single-channel class indices or in the one-hot format. The include_background parameter can be set to False to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]); y can also be in the format of B1HW[D].
Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.
- Parameters:
include_background – whether to include Dice computation on the first channel of the predicted output. Defaults to True.
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", no reduction is performed.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values for the metric, so its shape equals the shape of the metric.
ignore_empty – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.
num_classes – number of input channels (always including the background). When this is None, y_pred.shape[1] will be used. This option is useful when both y_pred and y are single-channel class indices and the number of classes is not automatically inferred from data.
return_with_label – whether to return the metrics with label, only works when reduction is "mean_batch". If True, use "label_{index}" as the key corresponding to C channels; if include_background is True, the index begins at "0", otherwise at "1". It can also take a list of label names. The outcome will then be returned as a dictionary.
- aggregate(reduction=None)[source]#
Execute reduction and aggregation logic for the output of compute_dice.
- Parameters:
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If "none", no reduction is performed.
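For a single binarized channel, the Dice score and the ignore_empty behaviour described for DiceMetric can be sketched as follows. The function name `dice` is hypothetical and the inputs are assumed to be flattened 0/1 lists; this is an illustration of the definition, not the library code.

```python
import math


def dice(pred, truth, ignore_empty=True):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for one binary channel."""
    inter = sum(p * t for p, t in zip(pred, truth))
    pred_sum, truth_sum = sum(pred), sum(truth)
    if truth_sum == 0:
        if ignore_empty:
            return math.nan  # empty ground truth is excluded from later reduction
        # Empty ground truth: 1 if the prediction is empty too, else the formula gives 0.
        return 1.0 if pred_sum == 0 else 0.0
    return 2.0 * inter / (pred_sum + truth_sum)


truth = [1, 0, 1, 1]
print(dice([1, 0, 0, 1], truth))  # two of three foreground pixels found
print(dice([1, 0, 0, 0], truth))  # one of three foreground pixels found
```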
- class monai.metrics.DiceHelper(include_background=None, sigmoid=False, softmax=None, activate=False, get_not_nans=True, reduction=MetricReduction.MEAN_BATCH, ignore_empty=True, num_classes=None)[source]#
Compute Dice score between two tensors y_pred and y. y_pred and y can be single-channel class indices or in the one-hot format.
Example:
    import torch
    from monai.metrics import DiceHelper

    n_classes, batch_size = 5, 16
    spatial_shape = (128, 128, 128)

    y_pred = torch.rand(batch_size, n_classes, *spatial_shape).float()  # predictions
    y = torch.randint(0, n_classes, size=(batch_size, 1, *spatial_shape)).long()  # ground truth

    score, not_nans = DiceHelper(include_background=False, sigmoid=True, softmax=True)(y_pred, y)
    print(score, not_nans)
- __init__(include_background=None, sigmoid=False, softmax=None, activate=False, get_not_nans=True, reduction=MetricReduction.MEAN_BATCH, ignore_empty=True, num_classes=None)[source]#
- Parameters:
include_background – whether to include the score on the first channel (default to the value of sigmoid, False).
sigmoid – whether y_pred are/will be sigmoid activated outputs. If True, thresholding at 0.5 will be performed to get the discrete prediction. Defaults to False.
softmax – whether y_pred are softmax activated outputs. If True, argmax will be performed to get the discrete prediction. Defaults to the value of not sigmoid.
activate – whether to apply sigmoid to y_pred if sigmoid is True. Defaults to False. This option is only valid when sigmoid is True.
get_not_nans – whether to return the number of not-nan values.
reduction – define mode of reduction to the metrics
ignore_empty – if True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the union of y_pred and y is empty.
num_classes – number of input channels (always including the background). When this is None, y_pred.shape[1] will be used. This option is useful when both y_pred and y are single-channel class indices and the number of classes is not automatically inferred from data.
Mean IoU#
- monai.metrics.compute_iou(y_pred, y, include_background=True, ignore_empty=True)[source]#
Computes Intersection over Union (IoU) score metric from a batch of predictions.
- Parameters:
y_pred (Tensor) – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.
y (Tensor) – ground truth to compute mean IoU metric. It must be one-hot format and first dim is batch. The values should be binarized.
include_background (bool) – whether to include IoU computation on the first channel of the predicted output. Defaults to True.
ignore_empty (bool) – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.
- Return type:
Tensor
- Returns:
IoU scores per batch and per class, (shape [batch_size, num_classes]).
- Raises:
ValueError – when y_pred and y have different shapes.
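The IoU definition for one binarized channel can be sketched in the same spirit. The function name `iou` is hypothetical and the inputs are assumed to be flattened 0/1 lists of equal length.

```python
import math


def iou(pred, truth, ignore_empty=True):
    """IoU = |pred AND truth| / |pred OR truth| for one binary channel."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    inter = sum(p * t for p, t in zip(pred, truth))
    union = sum(pred) + sum(truth) - inter
    if sum(truth) == 0:
        if ignore_empty:
            return math.nan
        return 1.0 if union == 0 else 0.0
    return inter / union


print(iou([1, 0, 0, 1], [1, 0, 1, 1]))  # intersection 2, union 3
```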
- class monai.metrics.MeanIoU(include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, ignore_empty=True)[source]#
Compute average Intersection over Union (IoU) score between two tensors. It supports both multi-classes and multi-labels tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]).
Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.
- Parameters:
include_background – whether to include IoU computation on the first channel of the predicted output. Defaults to True.
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", no reduction is performed.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values for the metric, so its shape equals the shape of the metric.
ignore_empty – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.
- aggregate(reduction=None)[source]#
Execute reduction logic for the output of compute_iou.
- Parameters:
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If "none", no reduction is performed.
Generalized Dice Score#
- monai.metrics.compute_generalized_dice(y_pred, y, include_background=True, weight_type=Weight.SQUARE)[source]#
Computes the Generalized Dice Score and returns a tensor with its per image values.
- Parameters:
y_pred (torch.Tensor) – binarized segmentation model output. It should be binarized, in one-hot format and in the NCHW[D] format, where N is the batch dimension, C is the channel dimension, and the remaining are the spatial dimensions.
y (torch.Tensor) – binarized ground-truth. It should be binarized, in one-hot format and have the same shape as y_pred.
include_background (bool, optional) – whether to include score computation on the first channel of the predicted output. Defaults to True.
weight_type (Union[Weight, str], optional) – {"square", "simple", "uniform"}. Type of function to transform ground truth volume into a weight factor. Defaults to "square".
- Returns:
per batch and per class Generalized Dice Score, i.e., with the shape [batch_size, num_classes].
- Return type:
torch.Tensor
- Raises:
ValueError – if y_pred or y are not PyTorch tensors, if y_pred and y have less than three dimensions, or y_pred and y don’t have the same shape.
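To make the role of weight_type concrete, the sketch below follows the class-weighted formulation of Sudre et al. for a single image: each class is weighted by a function of its ground-truth volume before intersections and sums are pooled. The function name `generalized_dice`, the flattened list-of-channels input, and giving empty classes a zero weight are all assumptions; MONAI's handling of empty classes and its output shape may differ.

```python
import math


def generalized_dice(pred, truth, weight_type="square"):
    """pred, truth: one-hot lists of channels, each a flat 0/1 list."""
    num = den = 0.0
    for p, t in zip(pred, truth):
        vol = sum(t)  # ground-truth volume of this class
        if weight_type == "uniform":
            w = 1.0
        elif vol == 0:
            w = 0.0  # assumption: empty classes do not contribute
        elif weight_type == "square":
            w = 1.0 / (vol * vol)
        elif weight_type == "simple":
            w = 1.0 / vol
        else:
            raise ValueError(f"unsupported weight_type: {weight_type}")
        inter = sum(a * b for a, b in zip(p, t))
        num += w * inter
        den += w * (sum(p) + sum(t))
    return 2.0 * num / den if den else math.nan


truth = [[1, 1, 1, 0], [0, 0, 0, 1]]  # large class, small class
pred = [[1, 1, 0, 0], [0, 0, 1, 1]]
for wt in ("uniform", "simple", "square"):
    print(wt, round(generalized_dice(pred, truth, wt), 4))
```

With "square" weighting the small class dominates the pooled score, which is the intended effect for highly unbalanced segmentations.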
- class monai.metrics.GeneralizedDiceScore(include_background=True, reduction=MetricReduction.MEAN_BATCH, weight_type=Weight.SQUARE)[source]#
Compute the Generalized Dice Score metric between tensors, as the complement of the Generalized Dice Loss defined in:
- Sudre, C. et al. (2017) Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. DLMIA 2017.
The inputs y_pred and y are expected to be one-hot, binarized channel-first or batch-first tensors, i.e., CHW[D] or BCHW[D].
Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.
- Parameters:
include_background (bool, optional) – whether to include the background class (assumed to be in channel 0) in the score computation. Defaults to True.
reduction (str, optional) – define mode of reduction to the metrics. Available reduction modes: {"none", "mean_batch", "sum_batch"}. Defaults to "mean_batch". If "none", will not do reduction.
weight_type (Union[Weight, str], optional) – {"square", "simple", "uniform"}. Type of function to transform ground truth volume into a weight factor. Defaults to "square".
- Raises:
ValueError – when the weight_type is not one of {"square", "simple", "uniform"}.
- aggregate(reduction=None)[source]#
Execute reduction logic for the output of compute_generalized_dice.
- Parameters:
reduction (Union[MetricReduction, str, None], optional) – define mode of reduction to the metrics. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch"}. Defaults to "mean". If "none", will not do reduction.
Area under the ROC curve#
- monai.metrics.compute_roc_auc(y_pred, y, average=Average.MACRO)[source]#
Computes Area Under the Receiver Operating Characteristic Curve (ROC AUC). Referring to: sklearn.metrics.roc_auc_score.
- Parameters:
y_pred – input data to compute, typical classification model output. the first dim must be batch, if multi-classes, it must be in One-Hot format. for example: shape [16] or [16, 1] for a binary data, shape [16, 2] for 2 classes data.
y – ground truth to compute ROC AUC metric, the first dim must be batch. if multi-classes, it must be in One-Hot format. for example: shape [16] or [16, 1] for a binary data, shape [16, 2] for 2 classes data.
average – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".
"macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
"weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).
"micro": calculate metrics globally by considering each element of the label indicator matrix as a label.
"none": the scores for each class are returned.
- Raises:
ValueError – When y_pred dimension is not one of [1, 2].
ValueError – When y dimension is not one of [1, 2].
ValueError – When average is not one of ["macro", "weighted", "micro", "none"].
Note
ROCAUC expects y to consist of 0s and 1s. y_pred must be either probability estimates or confidence values.
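For the binary case, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise form; the function name `binary_roc_auc` is hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def binary_roc_auc(y_pred, y):
    """Pairwise (Mann-Whitney) form of ROC AUC for 0/1 labels."""
    pos = [s for s, t in zip(y_pred, y) if t == 1]
    neg = [s for s, t in zip(y_pred, y) if t == 0]
    if not pos or not neg:
        raise ValueError("y must contain both 0s and 1s")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


auc = binary_roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)
```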
- class monai.metrics.ROCAUCMetric(average=Average.MACRO)[source]#
Computes Area Under the Receiver Operating Characteristic Curve (ROC AUC). Referring to: sklearn.metrics.roc_auc_score. The input y_pred and y can be a list of channel-first Tensor or a batch-first Tensor.
Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.
- Parameters:
average – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".
"macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
"weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).
"micro": calculate metrics globally by considering each element of the label indicator matrix as a label.
"none": the scores for each class are returned.
- aggregate(average=None)[source]#
Typically y_pred and y are stored in the cumulative buffers at each iteration. This function reads the buffers and computes the area under the ROC curve.
- Parameters:
average – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to self.average.
Confusion matrix#
- monai.metrics.get_confusion_matrix(y_pred, y, include_background=True)[source]#
Compute confusion matrix. A tensor with the shape [B, C, 4] will be returned, where B is the batch size, C is the number of classes to be computed, and the third dimension holds the number of true positive, false positive, true negative and false negative values for each channel of each sample within the input batch.
- Parameters:
y_pred (Tensor) – input data to compute. It must be one-hot format and first dim is batch. The values should be binarized.
y (Tensor) – ground truth to compute the metric. It must be one-hot format and first dim is batch. The values should be binarized.
include_background (bool) – whether to include metric computation on the first channel of the predicted output. Defaults to True.
- Raises:
ValueError – when y_pred and y have different shapes.
- Return type:
Tensor
- monai.metrics.compute_confusion_matrix_metric(metric_name, confusion_matrix)[source]#
This function is used to compute confusion matrix related metric.
- Parameters:
metric_name (str) – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as shown in the Wikipedia page on the confusion matrix), and you can also input those names instead.
confusion_matrix (Tensor) – Please see the doc string of the function get_confusion_matrix for more details.
- Raises:
ValueError – when the size of the last dimension of confusion_matrix is not 4.
NotImplementedError – when specify a not implemented metric_name.
- Return type:
Tensor
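A few of the listed metrics can be derived from the four counts in the order used above (true positive, false positive, true negative, false negative). The helper names `confusion_counts` and `confusion_metric` are hypothetical, only five metric names are covered, and returning NaN for a zero denominator is an assumption.

```python
import math


def confusion_counts(pred, truth):
    """Count (tp, fp, tn, fn) for one binarized channel given as flat 0/1 lists."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, tn, fn


def _ratio(num, den):
    return num / den if den else math.nan


def confusion_metric(metric_name, cm):
    tp, fp, tn, fn = cm
    if metric_name == "sensitivity":
        return _ratio(tp, tp + fn)
    if metric_name == "specificity":
        return _ratio(tn, tn + fp)
    if metric_name == "precision":
        return _ratio(tp, tp + fp)
    if metric_name == "accuracy":
        return _ratio(tp + tn, tp + fp + tn + fn)
    if metric_name == "f1 score":
        return _ratio(2 * tp, 2 * tp + fp + fn)
    raise NotImplementedError(f"metric not covered by this sketch: {metric_name}")


cm = confusion_counts([1, 0, 0, 1, 1, 0], [1, 0, 1, 1, 0, 0])
print(cm, confusion_metric("sensitivity", cm), confusion_metric("f1 score", cm))
```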
- class monai.metrics.ConfusionMatrixMetric(include_background=True, metric_name='hit_rate', compute_sample=False, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute confusion matrix related metrics. This class supports calculating all metrics mentioned in: Confusion matrix. It can support both multi-classes and multi-labels classification and segmentation tasks. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background.
Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.
- Parameters:
include_background – whether to include metric computation on the first channel of the predicted output. Defaults to True.
metric_name – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as shown in the Wikipedia page mentioned above), and you can also input those names instead. Besides a single metric, multiple metrics are supported by passing a sequence of metric names, such as ("sensitivity", "precision", "recall"). If compute_sample is True, multiple f and not_nans will be returned in the same order as the input names when calling the class.
compute_sample – when reducing, if True, each sample's metric will be computed based on each confusion matrix first. If False, compute reduction on the confusion matrices first. Defaults to False.
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", no reduction is performed.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns [(metric, not_nans), …]. If False, aggregate() returns [metric, …]. Here not_nans counts the non-NaN values for True Positive, False Positive, True Negative and False Negative. Its shape depends on the shape of the metric, and it has one more dimension with size 4. For example, if the shape of the metric is [3, 3], not_nans has the shape [3, 3, 4].
- aggregate(compute_sample=False, reduction=None)[source]#
Execute reduction for the confusion matrix values.
- Parameters:
compute_sample – when reducing, if True, each sample's metric will be computed based on each confusion matrix first. If False, compute reduction on the confusion matrices first. Defaults to False.
reduction – the reduction mode applied to the metric; reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If "none", no reduction is performed.
Hausdorff distance#
- monai.metrics.compute_hausdorff_distance(y_pred, y, include_background=False, distance_metric='euclidean', percentile=None, directed=False, spacing=None)[source]#
Compute the Hausdorff distance.
- Parameters:
y_pred – input data to compute, typical segmentation model output. It must be in one-hot format with batch as the first dimension, example shape: [16, 3, 32, 32]. The values should be binarized.
y – ground truth used to compute the distance. It must be in one-hot format with batch as the first dimension. The values should be binarized.
include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.
distance_metric – the metric used to compute surface distance, one of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".
percentile – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff distance is returned rather than the maximum. Defaults to None.
directed – whether to calculate the directed Hausdorff distance. Defaults to False.
spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". If a single number, isotropic spacing with that value is used for all images in the batch. If a sequence of numbers, the length of the sequence must be equal to the image dimensions. This spacing will be used for all images in the batch. If a sequence of sequences, the length of the outer sequence must be equal to the batch size. If an inner sequence has length 1, isotropic spacing with that value is used for all images in the batch, else the inner sequence length must be equal to the image dimensions. If None, spacing of unity is used for all images in the batch. Defaults to None.
- monai.metrics.compute_percent_hausdorff_distance(edges_pred, edges_gt, distance_metric='euclidean', percentile=None, spacing=None)[source]#
This function is used to compute the directed Hausdorff distance.
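As an illustration of the difference between the directed and the non-directed distance, here is a minimal pure-Python sketch on two small sets of edge points. It is not MONAI's implementation: the function names and the point coordinates are hypothetical, only the Euclidean metric with unit spacing is covered, and the percentile option is left out.

```python
import math


def directed_hausdorff(src, dst):
    """Largest distance from any point in src to its nearest point in dst."""
    return max(min(math.dist(p, q) for q in dst) for p in src)


def hausdorff(a, b):
    """Non-directed distance: the larger of the two directed distances."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical edge pixels (row, col) of a prediction and a ground truth.
pred_edge = [(0, 0), (0, 1), (0, 2)]
gt_edge = [(0, 0), (0, 1), (3, 1)]

d_pred_to_gt = directed_hausdorff(pred_edge, gt_edge)  # 1.0
d_gt_to_pred = directed_hausdorff(gt_edge, pred_edge)  # 3.0
d_symmetric = hausdorff(pred_edge, gt_edge)            # 3.0
```

The directed distance depends on the direction chosen, which is why the non-directed variant takes the maximum of both.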
- class monai.metrics.HausdorffDistanceMetric(include_background=False, distance_metric='euclidean', percentile=None, directed=False, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute Hausdorff Distance between two tensors. It can support both multi-class and multi-label tasks. It supports both directed and non-directed Hausdorff distance calculation. In addition, specifying the percentile parameter returns the corresponding percentile of the distance. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensors (CHW[D]) or a batch-first Tensor (BCHW[D]). The implementation refers to DeepMind’s implementation.
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.
distance_metric – the metric used to compute surface distance, one of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".
percentile – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff distance is returned rather than the maximum. Defaults to None.
directed – whether to calculate the directed Hausdorff distance. Defaults to False.
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of not-nan values for the metric, thus its shape equals the shape of the metric.
- aggregate(reduction=None)[source]#
Execute reduction logic for the output of compute_hausdorff_distance.
- Parameters:
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If “none”, will not do reduction.
Average surface distance#
- monai.metrics.compute_average_surface_distance(y_pred, y, include_background=False, symmetric=False, distance_metric='euclidean', spacing=None)[source]#
This function is used to compute the Average Surface Distance from y_pred to y under the default setting. In addition, if symmetric = True is set, the average symmetric surface distance between these two inputs will be returned. The implementation refers to DeepMind’s implementation.
- Parameters:
y_pred – input data to compute, typical segmentation model output. It must be in one-hot format with batch as the first dimension, example shape: [16, 3, 32, 32]. The values should be binarized.
y – ground truth used to compute the distance. It must be in one-hot format with batch as the first dimension. The values should be binarized.
include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.
symmetric – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.
distance_metric – the metric used to compute surface distance, one of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".
spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". If a single number, isotropic spacing with that value is used for all images in the batch. If a sequence of numbers, the length of the sequence must be equal to the image dimensions. This spacing will be used for all images in the batch. If a sequence of sequences, the length of the outer sequence must be equal to the batch size. If an inner sequence has length 1, isotropic spacing with that value is used for all images in the batch, else the inner sequence length must be equal to the image dimensions. If None, spacing of unity is used for all images in the batch. Defaults to None.
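The following pure-Python sketch shows one way to read the asymmetric and symmetric variants on two small sets of edge points. It is an illustration only, not MONAI's code: the function names and coordinates are hypothetical, spacing is taken as unity, and the symmetric variant is assumed to average the pooled distances from both directions.

```python
import math


def nearest_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def average_surface_distance(pred_edge, gt_edge, symmetric=False):
    distances = nearest_distances(pred_edge, gt_edge)
    if symmetric:
        # Assumption: pool the distances of both directions, then average.
        distances = distances + nearest_distances(gt_edge, pred_edge)
    return sum(distances) / len(distances)


# Hypothetical edge pixels (row, col).
pred_edge = [(0, 0), (0, 1), (0, 2)]
gt_edge = [(0, 0), (0, 1), (3, 1)]

asd = average_surface_distance(pred_edge, gt_edge)                      # (0 + 0 + 1) / 3
assd = average_surface_distance(pred_edge, gt_edge, symmetric=True)    # (0 + 0 + 1 + 0 + 0 + 3) / 6
```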
- class monai.metrics.SurfaceDistanceMetric(include_background=False, symmetric=False, distance_metric='euclidean', reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute Surface Distance between two tensors. It can support both multi-class and multi-label tasks. It supports both symmetric and asymmetric surface distance calculation. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensors (CHW[D]) or a batch-first Tensor (BCHW[D]).
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.
symmetric – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.
distance_metric – the metric used to compute surface distance, one of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of not-nan values for the metric, thus its shape equals the shape of the metric.
- aggregate(reduction=None)[source]#
Execute reduction logic for the output of compute_average_surface_distance.
- Parameters:
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If “none”, will not do reduction.
Surface dice#
- monai.metrics.compute_surface_dice(y_pred, y, class_thresholds, include_background=False, distance_metric='euclidean', spacing=None, use_subvoxels=False)[source]#
This function computes the (Normalized) Surface Dice (NSD) between the two tensors y_pred (referred to as \(\hat{Y}\)) and y (referred to as \(Y\)). This metric determines which fraction of a segmentation boundary is correctly predicted. A boundary element is considered correctly predicted if the closest distance to the reference boundary is smaller than or equal to the specified threshold related to the acceptable amount of deviation in pixels. The NSD is bounded between 0 and 1.
This implementation supports multi-class tasks with an individual threshold \(\tau_c\) for each class \(c\). The class-specific NSD for batch index \(b\), \(\operatorname {NSD}_{b,c}\), is computed using the function:
\[\operatorname {NSD}_{b,c} \left(Y_{b,c}, \hat{Y}_{b,c}\right) = \frac{\left|\mathcal{D}_{Y_{b,c}}^{'}\right| + \left| \mathcal{D}_{\hat{Y}_{b,c}}^{'} \right|}{\left|\mathcal{D}_{Y_{b,c}}\right| + \left|\mathcal{D}_{\hat{Y}_{b,c}}\right|} \tag{1}\]
with \(\mathcal{D}_{Y_{b,c}}\) and \(\mathcal{D}_{\hat{Y}_{b,c}}\) being two sets of nearest-neighbor distances. \(\mathcal{D}_{Y_{b,c}}\) is computed from the predicted segmentation boundary towards the reference segmentation boundary and vice-versa for \(\mathcal{D}_{\hat{Y}_{b,c}}\). \(\mathcal{D}_{Y_{b,c}}^{'}\) and \(\mathcal{D}_{\hat{Y}_{b,c}}^{'}\) refer to the subsets of distances that are smaller than or equal to the acceptable distance \(\tau_c\):
\[\mathcal{D}_{Y_{b,c}}^{'} = \{ d \in \mathcal{D}_{Y_{b,c}} \, | \, d \leq \tau_c \}.\]In the case of a class neither being present in the predicted segmentation, nor in the reference segmentation, a nan value will be returned for this class. In the case of a class being present in only one of predicted segmentation or reference segmentation, the class NSD will be 0.
This implementation is based on https://arxiv.org/abs/2111.05408 and supports 2D and 3D images. The computation of boundaries follows DeepMind’s implementation deepmind/surface-distance when use_subvoxels=True; otherwise, the length of a segmentation boundary is interpreted as the number of its edge pixels.
- Parameters:
y_pred – Predicted segmentation, typically segmentation model output. It must be a one-hot encoded, batch-first tensor [B,C,H,W] or [B,C,H,W,D].
y – Reference segmentation. It must be a one-hot encoded, batch-first tensor [B,C,H,W] or [B,C,H,W,D].
class_thresholds – List of class-specific thresholds. The thresholds relate to the acceptable amount of deviation in the segmentation boundary in pixels. Each threshold needs to be a finite, non-negative number.
include_background – Whether to include the surface dice computation on the first channel of the predicted output. Defaults to False.
distance_metric – The metric used to compute surface distances. One of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".
spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". If a single number, isotropic spacing with that value is used for all images in the batch. If a sequence of numbers, the length of the sequence must be equal to the image dimensions. This spacing will be used for all images in the batch. If a sequence of sequences, the length of the outer sequence must be equal to the batch size. If an inner sequence has length 1, isotropic spacing with that value is used for all images in the batch, else the inner sequence length must be equal to the image dimensions. If None, spacing of unity is used for all images in the batch. Defaults to None.
use_subvoxels – Whether to use subvoxel distances. Defaults to False.
- Raises:
ValueError – If y_pred and/or y are not PyTorch tensors.
ValueError – If y_pred and/or y do not have four ([B,C,H,W]) or five ([B,C,H,W,D]) dimensions.
ValueError – If y_pred and y have different shapes.
ValueError – If y_pred and/or y are not one-hot encoded.
ValueError – If the number of channels of y_pred and/or y is different from the number of class thresholds.
ValueError – If any class threshold is not finite.
ValueError – If any class threshold is negative.
- Returns:
Pytorch Tensor of shape [B,C], containing the NSD values \(\operatorname {NSD}_{b,c}\) for each batch index \(b\) and class \(c\).
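Equation (1) can be sketched for a single batch index and class once the two sets of nearest-neighbor distances are available. The pure-Python function below is a hypothetical illustration, not the library's implementation; the name `surface_dice` and the distance values are made up, and the boundary extraction itself is skipped.

```python
import math


def surface_dice(dist_pred_to_gt, dist_gt_to_pred, tau):
    """NSD for one class: fraction of boundary distances that are <= tau."""
    total = len(dist_pred_to_gt) + len(dist_gt_to_pred)
    if total == 0:
        return math.nan  # class absent from both prediction and reference
    if not dist_pred_to_gt or not dist_gt_to_pred:
        return 0.0  # class present in only one of the two segmentations
    within = sum(d <= tau for d in dist_pred_to_gt)
    within += sum(d <= tau for d in dist_gt_to_pred)
    return within / total


# Hypothetical nearest-neighbor distances in pixels.
d_pred = [0.0, 0.0, 1.0]
d_gt = [0.0, 0.0, 3.0]

nsd_strict = surface_dice(d_pred, d_gt, tau=1.0)  # 5 of 6 distances are within tau
nsd_loose = surface_dice(d_pred, d_gt, tau=3.0)   # all 6 distances are within tau
```

A larger threshold can only keep or increase the NSD, since more boundary elements count as correctly predicted.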
- class monai.metrics.SurfaceDiceMetric(class_thresholds, include_background=False, distance_metric='euclidean', reduction=MetricReduction.MEAN, get_not_nans=False, use_subvoxels=False)[source]#
Computes the Normalized Surface Dice (NSD) for each batch sample and class of predicted segmentations y_pred and corresponding reference segmentations y according to equation (1). This implementation is based on https://arxiv.org/abs/2111.05408 and supports 2D and 3D images. Be aware that by default (use_subvoxels=False), the computation of boundaries is different from DeepMind’s implementation deepmind/surface-distance. In this implementation, the length/area of a segmentation boundary is interpreted as the number of its edge pixels. In DeepMind’s implementation, the length of a segmentation boundary depends on the local neighborhood (cf. https://arxiv.org/abs/1809.04430). This issue is discussed here: Project-MONAI/MONAI#4103.
The class- and batch sample-wise NSD values can be aggregated with the function aggregate.
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
class_thresholds – List of class-specific thresholds. The thresholds relate to the acceptable amount of deviation in the segmentation boundary in pixels. Each threshold needs to be a finite, non-negative number.
include_background – Whether to include NSD computation on the first channel of the predicted output. Defaults to False.
distance_metric – The metric used to compute surface distances. One of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. Defaults to False. not_nans is the number of batch samples for which not all class-specific NSD values were nan values. If set to True, the function aggregate will return both the aggregated NSD and the not_nans count. If set to False, aggregate will only return the aggregated NSD.
use_subvoxels – Whether to use subvoxel distances. Defaults to False.
- aggregate(reduction=None)[source]#
Aggregates the output of _compute_tensor.
- Parameters:
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If “none”, will not do reduction.
- Returns:
If get_not_nans is set to True, this function returns the aggregated NSD and the not_nans count. If get_not_nans is set to False, this function returns only the aggregated NSD.
PanopticQualityMetric#
- monai.metrics.compute_panoptic_quality(pred, gt, metric_name='pq', remap=True, match_iou_threshold=0.5, smooth_numerator=1e-06, output_confusion_matrix=False)[source]#
Computes Panoptic Quality (PQ). If metric_name is set to “SQ” or “RQ”, Segmentation Quality (SQ) or Recognition Quality (RQ) will be returned instead.
In addition, if output_confusion_matrix is True, the function will return a tensor of 4 values, which represent the true positives, false positives, false negatives and the sum of IoU. These four values are used to calculate PQ, and returning them directly enables further calculation over all images.
- Parameters:
pred (Tensor) – input data to compute. It must be in the form of HW and have integer type.
gt (Tensor) – ground truth. It must have the same shape as pred and have integer type.
metric_name (str) – output metric. The value can be “pq”, “sq” or “rq”.
remap (bool) – whether to remap pred and gt to ensure contiguous ordering of instance ids.
match_iou_threshold (float) – IOU threshold to determine the pairing between pred and gt. Usually it should be >= 0.5, in which case the pairing between instances of pred and gt is unique. If match_iou_threshold < 0.5, this function uses Munkres assignment to find the maximal number of unique pairings.
smooth_numerator (float) – a small constant added to the numerator to avoid zero.
- Raises:
ValueError – when pred and gt have different shapes.
ValueError – when match_iou_threshold <= 0.0 or > 1.0.
- Return type:
Tensor
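How the four confusion values combine into SQ, RQ and PQ can be sketched as follows. This uses the plain definitions from the panoptic segmentation paper and deliberately omits the smoothing constant, so it is an illustration rather than this function's exact arithmetic; the function name and the input numbers are hypothetical.

```python
def panoptic_quality(tp, fp, fn, iou_sum):
    """Return (pq, sq, rq) from matched-pair statistics, without smoothing."""
    if tp == 0:
        return 0.0, 0.0, 0.0  # no matched pairs: all three scores are zero
    sq = iou_sum / tp                      # mean IoU over matched pairs
    rq = tp / (tp + 0.5 * fp + 0.5 * fn)   # F1-like detection score
    return sq * rq, sq, rq


# Hypothetical statistics: 3 matches with IoUs summing to 2.4, 1 FP, 1 FN.
pq, sq, rq = panoptic_quality(tp=3, fp=1, fn=1, iou_sum=2.4)
print(pq, sq, rq)  # roughly 0.6, 0.8, 0.75
```

Because the raw counts are additive, summing them over many images before calling such a function gives a dataset-level score.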
- class monai.metrics.PanopticQualityMetric(num_classes, metric_name='pq', reduction=MetricReduction.MEAN_BATCH, match_iou_threshold=0.5, smooth_numerator=1e-06)[source]#
Compute Panoptic Quality between two instance segmentation masks. If metric_name is set to “SQ” or “RQ”, Segmentation Quality (SQ) or Recognition Quality (RQ) will be returned instead.
Panoptic Quality is a metric used in panoptic segmentation tasks. This task unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). Compared with semantic segmentation, panoptic segmentation distinguishes different instances that belong to the same class. Compared with instance segmentation, panoptic segmentation does not allow overlap, and only one semantic label and one instance id can be assigned to each pixel. Please refer to the following paper for more details: https://openaccess.thecvf.com/content_CVPR_2019/papers/Kirillov_Panoptic_Segmentation_CVPR_2019_paper.pdf
This class also refers to the following implementation: TissueImageAnalytics/CoNIC
- Parameters:
num_classes – number of classes. The number should not count the background.
metric_name – output metric. The value can be “pq”, “sq” or “rq”. Besides a single metric name, multiple metrics are also supported by passing a sequence of metric names such as (“pq”, “sq”, “rq”). If a sequence is given, a list of results in the same order as the input names will be returned.
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean_batch". If “none”, will not do reduction.
match_iou_threshold – IOU threshold to determine the pairing between y_pred and y. Usually it should be >= 0.5, in which case the pairing between instances of y_pred and y is unique. If match_iou_threshold < 0.5, this function uses Munkres assignment to find the maximal number of unique pairings.
smooth_numerator – a small constant added to the numerator to avoid zero.
- aggregate(reduction=None)[source]#
Execute reduction logic for the output of compute_panoptic_quality.
- Parameters:
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to self.reduction. If “none”, will not do reduction.
Mean squared error#
- class monai.metrics.MSEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute Mean Squared Error between two tensors using function:
\[\operatorname {MSE}\left(Y, \hat{Y}\right) =\frac {1}{n}\sum _{i=1}^{n}\left(y_i-\hat{y_i} \right)^{2}.\]More info: https://en.wikipedia.org/wiki/Mean_squared_error
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
reduction – define the mode to reduce metrics, will only execute reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
Mean absolute error#
- class monai.metrics.MAEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute Mean Absolute Error between two tensors using function:
\[\operatorname {MAE}\left(Y, \hat{Y}\right) =\frac {1}{n}\sum _{i=1}^{n}\left|y_i-\hat{y_i}\right|.\]More info: https://en.wikipedia.org/wiki/Mean_absolute_error
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
reduction – define the mode to reduce metrics, will only execute reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
Root mean squared error#
- class monai.metrics.RMSEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute Root Mean Squared Error between two tensors using function:
\[\operatorname {RMSE}\left(Y, \hat{Y}\right) ={ \sqrt{ \frac {1}{n}\sum _{i=1}^{n}\left(y_i-\hat{y_i}\right)^2 } } \ = \sqrt {\operatorname{MSE}\left(Y, \hat{Y}\right)}.\]More info: https://en.wikipedia.org/wiki/Root-mean-square_deviation
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
reduction – define the mode to reduce metrics, will only execute reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
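The three regression formulas above (MSE, MAE, RMSE) can be checked by hand on a tiny example. The sketch below is plain Python on flat lists and is only meant to illustrate the formulas; the function names and the sample values are not part of the library.

```python
import math


def mse(y, y_pred):
    """Mean of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)


def mae(y, y_pred):
    """Mean of absolute differences."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)


def rmse(y, y_pred):
    """Square root of the MSE."""
    return math.sqrt(mse(y, y_pred))


# Hypothetical ground truth and regression output; only the last value differs.
y = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

print(mse(y, y_pred))   # 1.0  (one squared error of 4, averaged over 4 values)
print(mae(y, y_pred))   # 0.5  (one absolute error of 2, averaged over 4 values)
print(rmse(y, y_pred))  # 1.0
```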
Peak signal to noise ratio#
- class monai.metrics.PSNRMetric(max_val, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Compute Peak Signal To Noise Ratio between two tensors using function:
\[\operatorname{PSNR}\left(Y, \hat{Y}\right) = 20 \cdot \log_{10} \left({\mathit{MAX}}_Y\right) - 10 \cdot \log_{10}\left(\operatorname{MSE}\left(Y, \hat{Y}\right)\right)\]More info: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
Help taken from: tensorflow/tensorflow line 4139
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
The typical execution steps of this metric class follow monai.metrics.metric.Cumulative.
- Parameters:
max_val – The dynamic range of the images/volumes (i.e., the difference between the maximum and the minimum allowed values, e.g. 255 for a uint8 image).
reduction – define the mode to reduce metrics, will only execute reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
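As a quick numeric check of the PSNR formula, the sketch below evaluates it in plain Python for a flat list of values. The function name `psnr` and the sample data are hypothetical; the code mirrors the formula only and does not guard against an MSE of zero.

```python
import math


def psnr(y, y_pred, max_val):
    """PSNR = 20 * log10(max_val) - 10 * log10(MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)
    return 20.0 * math.log10(max_val) - 10.0 * math.log10(mse)


# Hypothetical uint8-range values; the MSE here is exactly 1.
y = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]

value = psnr(y, y_pred, max_val=255.0)
print(round(value, 2))  # 48.13, i.e. 20 * log10(255) because log10(1) == 0
```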
Structural similarity index measure#
- class monai.metrics.regression.SSIMMetric(spatial_dims, data_range=1.0, kernel_type=KernelType.GAUSSIAN, win_size=11, kernel_sigma=1.5, k1=0.01, k2=0.03, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Computes the Structural Similarity Index Measure (SSIM).
\[\operatorname {SSIM}(x,y) =\frac {(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}\]
- For more info, visit
- SSIM reference paper:
Wang, Zhou, et al. “Image quality assessment: from error visibility to structural similarity.” IEEE transactions on image processing 13.4 (2004): 600-612.
- Parameters:
spatial_dims – number of spatial dimensions of the input images.
data_range – value range of input images. (usually 1.0 or 255)
kernel_type – type of kernel, can be “gaussian” or “uniform”.
win_size – window size of kernel
kernel_sigma – standard deviation for Gaussian kernel.
k1 – stability constant used in the luminance denominator
k2 – stability constant used in the contrast denominator
reduction – define the mode to reduce metrics, will only execute reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
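The role of k1, k2 and data_range in the SSIM formula can be illustrated with a simplified, single-window version that uses global statistics instead of a sliding Gaussian or uniform kernel. This is a sketch under that simplifying assumption, not what SSIMMetric computes; the function name and inputs are hypothetical, and the constants follow the reference paper, c1 = (k1 * L)^2 and c2 = (k2 * L)^2 with L the data range.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM over one window covering the whole (flattened) signal."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical flattened intensities in [0, 1].
x = [0.1, 0.5, 0.9, 0.3]
y = [0.3, 0.9, 0.5, 0.1]  # same values in a different order

same = global_ssim(x, x)       # identical inputs give 1
different = global_ssim(x, y)  # lower, because the structure differs
```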
Multi-scale structural similarity index measure#
- class monai.metrics.MultiScaleSSIMMetric(spatial_dims, data_range=1.0, kernel_type=KernelType.GAUSSIAN, kernel_size=11, kernel_sigma=1.5, k1=0.01, k2=0.03, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333), reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Computes the Multi-Scale Structural Similarity Index Measure (MS-SSIM).
- MS-SSIM reference paper:
Wang, Z., Simoncelli, E.P. and Bovik, A.C., 2003, November. “Multiscale structural similarity for image quality assessment.” In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE
- Parameters:
spatial_dims – number of spatial dimensions of the input images.
data_range – value range of input images. (usually 1.0 or 255)
kernel_type – type of kernel, can be “gaussian” or “uniform”.
kernel_size – size of kernel
kernel_sigma – standard deviation for Gaussian kernel.
k1 – stability constant used in the luminance denominator
k2 – stability constant used in the contrast denominator
weights – parameters for image similarity and contrast sensitivity at different resolution scales.
reduction – define the mode to reduce metrics, will only execute reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
Fréchet Inception Distance#
- monai.metrics.compute_frechet_distance(mu_x, sigma_x, mu_y, sigma_y, epsilon=1e-06)[source]#
The Frechet distance between multivariate normal distributions.
- Return type:
Tensor
- class monai.metrics.FIDMetric[source]#
Frechet Inception Distance (FID). The FID calculates the distance between two distributions of feature vectors. Based on: Heusel M. et al. “Gans trained by a two time-scale update rule converge to a local nash equilibrium.” https://arxiv.org/abs/1706.08500. The inputs for this metric should be two groups of feature vectors (with format (number images, number of features)) extracted from a pretrained network.
Originally, it was proposed to use the activations of the pool_3 layer of an Inception v3 pretrained with ImageNet. However, other networks pretrained on medical datasets can be used as well (for example, RadImageNet for 2D and MedicalNet for 3D images). If the chosen model output is not a scalar, a global spatial average pooling should be used.
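For intuition, the Fréchet distance between two Gaussians, \(\lVert \mu_x - \mu_y \rVert^2 + \operatorname{Tr}\left(\Sigma_x + \Sigma_y - 2 (\Sigma_x \Sigma_y)^{1/2}\right)\), reduces to a simple scalar expression when the features are one-dimensional. The sketch below covers only that special case and is not a substitute for compute_frechet_distance; the function name and the numbers are hypothetical.

```python
import math


def frechet_distance_1d(mu_x, var_x, mu_y, var_y):
    """Squared Fréchet distance between N(mu_x, var_x) and N(mu_y, var_y)."""
    mean_term = (mu_x - mu_y) ** 2
    cov_term = var_x + var_y - 2.0 * math.sqrt(var_x * var_y)
    return mean_term + cov_term


# Hypothetical feature statistics of a real and a generated image set.
distance = frechet_distance_1d(mu_x=0.0, var_x=1.0, mu_y=3.0, var_y=4.0)
print(distance)  # 10.0 == 9 (means) + 1 (variances: 1 + 4 - 2 * 2)

identical = frechet_distance_1d(0.0, 1.0, 0.0, 1.0)
print(identical)  # 0.0 for identical distributions
```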
Maximum Mean Discrepancy#
- monai.metrics.compute_mmd(y, y_pred, y_mapping)[source]#
- Parameters:
y – first sample (e.g., the reference image). Its shape is (B,C,W,H) for 2D data and (B,C,W,H,D) for 3D.
y_pred – second sample (e.g., the reconstructed image). It has similar shape as y.
y_mapping – Callable to transform the y tensors before computing the metric.
- class monai.metrics.MMDMetric(y_mapping=None)[source]#
Unbiased Maximum Mean Discrepancy (MMD) is a kernel-based method for measuring the similarity between two distributions. It is a non-negative metric where a smaller value indicates a closer match between the two distributions.
Gretton, A., et al., 2012. A kernel two-sample test. The Journal of Machine Learning Research, 13(1), pp.723-773.
- Parameters:
y_mapping – Callable to transform the y tensors before computing the metric. It is usually a Gaussian or Laplace filter, but it can be any function that takes a tensor as input and returns a tensor as output, such as a feature extractor or an identity function, e.g. y_mapping = lambda x: x.square().
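The unbiased MMD estimate from the cited paper can be sketched in plain Python as within-sample similarities minus cross-sample similarities. The version below assumes a linear kernel (a dot product of flattened samples) purely for illustration; the kernel choice, the function names and the data are assumptions and not taken from compute_mmd.

```python
def dot(u, v):
    """Linear kernel on two flattened samples (an assumption of this sketch)."""
    return sum(a * b for a, b in zip(u, v))


def mmd_unbiased(xs, ys, kernel=dot):
    """Unbiased MMD^2 estimate between two samples of equal-length vectors."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal (i == j) to stay unbiased.
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


# Hypothetical one-feature samples with clearly different means.
xs = [[0.0], [1.0]]
ys = [[3.0], [4.0]]

value = mmd_unbiased(xs, ys)
print(value)  # 8.5 == 0 + 12 - 2 * 1.75
```

Applying a mapping such as y_mapping to every sample before calling the kernel is what turns this linear statistic into a comparison in feature space.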
Cumulative average#
- class monai.metrics.CumulativeAverage[source]#
A utility class to keep track of average values. For example during training/validation loop, we need to accumulate the per-batch metrics and calculate the final average value for the whole dataset. When training in multi-gpu environment, with DistributedDataParallel, it will average across the processes.
Example:
    from monai.metrics import CumulativeAverage

    run_avg = CumulativeAverage()
    batch_size = 8
    for i in range(len(train_set)):
        ...
        val = calc_metric(x, y)  # some metric value
        run_avg.append(val, count=batch_size)

    val_avg = run_avg.aggregate()  # average value
- aggregate(to_numpy=True)[source]#
returns the total average value (averaged across processes)
- Parameters:
to_numpy (bool) – whether to convert to numpy array. Defaults to True.
- Return type:
Union[ndarray, Tensor]
- append(val, count=1)[source]#
Append a new value, and an optional count. Any data type that is convertible with torch.as_tensor() is supported, e.g. number, list, numpy array, or Tensor.
- Parameters:
val – value (e.g. number, list, numpy array or Tensor) to keep track of
count – count (e.g. number, list, numpy array or Tensor), to update the contribution count
- For example:
    # a simple constant tracking
    avg = CumulativeAverage()
    avg.append(0.6)
    avg.append(0.8)
    print(avg.aggregate())  # prints 0.7

    # an array tracking, e.g. metrics from 3 classes
    avg = CumulativeAverage()
    avg.append([0.2, 0.4, 0.4])
    avg.append([0.4, 0.6, 0.4])
    print(avg.aggregate())  # prints [0.3, 0.5, 0.4]

    # different contributions / counts
    avg = CumulativeAverage()
    avg.append(1, count=4)  # avg metric 1 coming from a batch of 4
    avg.append(2, count=6)  # avg metric 2 coming from a batch of 6
    print(avg.aggregate())  # prints 1.6 == (1*4 + 2*6) / (4 + 6)

    # different contributions / counts per element
    avg = CumulativeAverage()
    avg.append([0.5, 0.5, 0], count=[1, 1, 0])  # last element's count is zero to ignore it
    avg.append([0.5, 0.5, 0.5], count=[1, 1, 1])
    print(avg.aggregate())  # prints [0.5, 0.5, 0.5] == ([0.5, 0.5, 0] + [0.5, 0.5, 0.5]) / ([1, 1, 0] + [1, 1, 1])
Metrics reloaded binary#
- class monai.metrics.MetricsReloadedBinary(metric_name, include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#
Wraps the binary pairwise metrics of MetricsReloaded.
- Parameters:
metric_name – Name of a binary metric from the MetricsReloaded package.
include_background – whether to include computation on the first channel of the predicted output. Defaults to True.
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If “none”, will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of not-nan values for the metric, thus its shape equals the shape of the metric.
Example:
    import torch
    from monai.metrics import MetricsReloadedBinary

    metric_name = "Cohens Kappa"
    metric = MetricsReloadedBinary(metric_name=metric_name)

    # first iteration
    # shape [batch=1, channel=1, 2, 2]
    y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 1.0]]]])
    y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])
    print(metric(y_pred, y))

    # second iteration
    # shape [batch=1, channel=1, 2, 2]
    y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])
    y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])
    print(metric(y_pred, y))

    # aggregate
    # shape ([batch=2, channel=1])
    print(metric.aggregate(reduction="none"))  # tensor([[0.5], [0.2]])

    # reset
    metric.reset()
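The two values reported in the example, 0.5 and 0.2, can be reproduced by hand from the standard definition of Cohen's kappa, (p_o - p_e) / (1 - p_e). The following plain-Python sketch works on the flattened 2x2 masks from the example; the function name is hypothetical and the code does not use the MetricsReloaded package.

```python
def cohens_kappa(y_pred, y):
    """Cohen's kappa for two flat binary label lists."""
    n = len(y)
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y))
    tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y))
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal frequencies of both label sets.
    p_expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


y = [1, 0, 1, 1]  # flattened ground truth from both iterations
kappa_first = cohens_kappa([1, 0, 0, 1], y)   # first iteration
kappa_second = cohens_kappa([1, 0, 0, 0], y)  # second iteration
print(kappa_first, kappa_second)  # approximately 0.5 and 0.2
```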
Metrics reloaded categorical#
- class monai.metrics.MetricsReloadedCategorical(metric_name, include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, smooth_dr=1e-05)[source]#
Wraps the categorical pairwise metrics of MetricsReloaded.
- Parameters:
metric_name – Name of a categorical metric from the MetricsReloaded package.
include_background – whether to include computation on the first channel of the predicted output. Defaults to True.
reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, defaults to "mean". If "none", will not do reduction.
get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of not-nan values for the metric, so its shape equals the shape of the metric.
smooth_dr – a small constant added to the denominator to avoid nan. Note: it must be greater than zero.
Example:
import torch
from monai.metrics import MetricsReloadedCategorical

metric_name = "Weighted Cohens Kappa"
metric = MetricsReloadedCategorical(metric_name=metric_name)

# first iteration
# shape [batch=1, channel=3, 2, 2]
y_pred = torch.tensor([[[[0, 0], [0, 1]], [[0, 0], [0, 0]], [[1, 1], [1, 0]]]])
y = torch.tensor([[[[1, 0], [0, 1]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]]])
print(metric(y_pred, y))

# second iteration
# shape [batch=1, channel=3, 2, 2]
y_pred = torch.tensor([[[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[0, 0], [0, 0]]]])
y = torch.tensor([[[[1, 0], [0, 1]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]]])
print(metric(y_pred, y))

# aggregate
# shape ([batch=2, channel=1])
print(metric.aggregate(reduction="none"))  # tensor([[0.2727], [0.6000]])

# reset
metric.reset()
Utilities#
- monai.metrics.utils.do_metric_reduction(f, reduction=MetricReduction.MEAN)[source]#
This function performs the metric reduction over the calculated not-nan metrics for each class of each sample. It also returns not_nans, which counts the number of not-nan values for the metric.
- Parameters:
f – a tensor that contains the calculated metric scores per batch and per class. The first two dims should be batch and class.
reduction – define the mode to reduce metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if "none", return the input f tensor and not_nans.
- Raises:
ValueError – When reduction is not one of ["mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel", "none"].
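To make "will only apply reduction on not-nan values" concrete, here is a minimal plain-Python sketch of a batch-wise mean that skips NaN entries and reports the per-channel not_nans count. `mean_batch_ignoring_nans` is a hypothetical name, and reporting 0.0 for a channel with no valid entries is an assumption of this sketch. It illustrates the idea only and is not the library's code.

```python
import math


def mean_batch_ignoring_nans(scores):
    """Average a [batch][channel] nested list over the batch axis, skipping NaNs.

    Returns (means, not_nans) with one entry per channel.
    """
    num_channels = len(scores[0])
    means, not_nans = [], []
    for c in range(num_channels):
        valid = [row[c] for row in scores if not math.isnan(row[c])]
        not_nans.append(len(valid))
        # Assumption: a channel without any valid entry is reported as 0.0.
        means.append(sum(valid) / len(valid) if valid else 0.0)
    return means, not_nans


nan = float("nan")
scores = [[0.8, nan], [0.6, 0.4], [nan, nan]]  # 3 samples, 2 classes
means, not_nans = mean_batch_ignoring_nans(scores)
print(means, not_nans)
```

The not_nans list has the same shape as the reduced metric, which matches the description of not_nans given for the metric classes above.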
- monai.metrics.utils.get_code_to_measure_table(spacing, device=None)[source]#
Returns a table mapping neighbourhood code to the surface area or contour length.
- Parameters:
spacing – a sequence of 2 or 3 numbers, indicating the spacing in the spatial dimensions.
device – device to put the table on.
- monai.metrics.utils.get_mask_edges(seg_pred, seg_gt, label_idx=1, crop=True, spacing=None, always_return_as_numpy=True)[source]#
Compute edges from binary segmentation masks. This function is helpful to further calculate metrics such as Average Surface Distance and Hausdorff Distance. The input images can be binary or labelfield images. If labelfield images are supplied, they are converted to binary images using label_idx.
To improve computing efficiency, the images are cropped to keep only the foreground before the edges are computed, unless crop = False is specified. We require that images are the same size, and assume that they occupy the same space (spacing, orientation, etc.).
- Parameters:
seg_pred – the predicted binary or labelfield image.
seg_gt – the actual binary or labelfield image.
label_idx – for labelfield images, convert to binary with seg_pred = seg_pred == label_idx.
crop – crop input images and only keep the foregrounds. In order to maintain two inputs’ shapes, here the bounding box is achieved by (seg_pred | seg_gt), which represents the union set of two images. Defaults to True.
spacing – the input spacing. If not None, the subvoxel edges and areas will be computed; otherwise scipy’s binary erosion is used to calculate the edges.
always_return_as_numpy – whether to return a numpy array regardless of the input type. If False, return the same type as inputs.
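As a rough illustration of the erosion-based path (the one described for spacing=None), the sketch below converts a labelfield to a binary mask with label_idx and marks the foreground pixels that a cross-shaped erosion would remove. It is plain Python on a 2D nested list, with the hypothetical helper names `to_binary` and `mask_edges`. Cropping and the subvoxel computation are not covered, and this is not the library's implementation.

```python
def to_binary(seg, label_idx=1):
    """Convert a labelfield (nested lists of ints) to a 0/1 mask."""
    return [[int(v == label_idx) for v in row] for row in seg]


def mask_edges(mask):
    """Edge map of a 2D binary mask: the mask minus its 4-neighbour erosion.

    A foreground pixel is an edge pixel if any of its 4-neighbours is
    background or lies outside the image.
    """
    rows, cols = len(mask), len(mask[0])

    def is_foreground(r, c):
        return 0 <= r < rows and 0 <= c < cols and mask[r][c] == 1

    edges = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1:
                continue
            neighbours = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            if not all(is_foreground(nr, nc) for nr, nc in neighbours):
                edges[r][c] = 1
    return edges


seg = [
    [1, 0, 0, 0, 0],
    [0, 2, 2, 2, 0],
    [0, 2, 2, 2, 0],
    [0, 2, 2, 2, 0],
    [0, 0, 0, 0, 0],
]
edges = mask_edges(to_binary(seg, label_idx=2))
for row in edges:
    print(row)
```

Only the centre of the 3x3 block is interior, so the edge map is the ring of eight pixels around it.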
- monai.metrics.utils.get_surface_distance(seg_pred, seg_gt, distance_metric='euclidean', spacing=None)[source]#
This function is used to compute the surface distances from seg_pred to seg_gt.
- Parameters:
seg_pred – the edge of the predictions.
seg_gt – the edge of the ground truth.
distance_metric – ["euclidean", "chessboard", "taxicab"], the metric used to compute surface distance. Defaults to "euclidean".
"euclidean", uses Exact Euclidean distance transform.
"chessboard", uses chessboard metric in chamfer type of transform.
"taxicab", uses taxicab metric in chamfer type of transform.
spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". Several input options are allowed: (1) If a single number, isotropic spacing with that value is used. (2) If a sequence of numbers, the length of the sequence must be equal to the image dimensions. (3) If None, spacing of unity is used. Defaults to None.
Note
If seg_pred or seg_gt is all 0, may result in nan/inf distance.
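The three distance metrics differ only in how coordinate differences are combined. The brute-force sketch below computes, for each predicted edge point, the distance to the nearest ground-truth edge point. It is a plain-Python illustration using the hypothetical helpers `point_distance` and `surface_distances`; the library relies on distance transforms instead. Returning inf when the ground truth has no edge points is an assumption made here to mirror the note above.

```python
import math


def point_distance(p, q, metric="euclidean", spacing=None):
    """Distance between two index tuples under the chosen metric."""
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if metric == "euclidean":
        # Spacing is only applied to the Euclidean metric.
        spacing = spacing or [1.0] * len(p)
        return math.sqrt(sum((d * s) ** 2 for d, s in zip(diffs, spacing)))
    if metric == "chessboard":
        return float(max(diffs))
    if metric == "taxicab":
        return float(sum(diffs))
    raise ValueError(f"unsupported distance metric: {metric}")


def surface_distances(pred_points, gt_points, metric="euclidean", spacing=None):
    """For every predicted edge point, the distance to the nearest gt edge point."""
    if not gt_points:
        # Assumption: an empty ground truth yields infinite distances.
        return [math.inf] * len(pred_points)
    return [
        min(point_distance(p, q, metric, spacing) for q in gt_points)
        for p in pred_points
    ]


pred_edge = [(0, 0)]
gt_edge = [(3, 4), (10, 10)]
print(surface_distances(pred_edge, gt_edge, "euclidean"))   # [5.0]
print(surface_distances(pred_edge, gt_edge, "chessboard"))  # [4.0]
print(surface_distances(pred_edge, gt_edge, "taxicab"))     # [7.0]
```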
- monai.metrics.utils.ignore_background(y_pred, y)[source]#
This function is used to remove background (the first channel) for y_pred and y.
- Parameters:
y_pred (~NdarrayTensor) – predictions. For classification tasks, y_pred should have the shape [BN], where N is larger than 1. For segmentation tasks, the shape should be [BNHW] or [BNHWD].
y (~NdarrayTensor) – ground truth, the first dim is batch.
- Return type:
tuple[~NdarrayTensor, ~NdarrayTensor]
- monai.metrics.utils.is_binary_tensor(input, name)[source]#
Determines whether the input tensor is a binary torch tensor.
- Parameters:
input (torch.Tensor) – tensor to validate.
name (str) – name of the tensor being checked.
- Raises:
ValueError – if input is not a PyTorch Tensor.
Note
A warning message is printed, if the tensor is not binary.
- Return type:
None
- monai.metrics.utils.prepare_spacing(spacing, batch_size, img_dim)[source]#
This function is used to prepare the spacing parameter to include the batch dimension for the computation of surface distance, Hausdorff distance or surface dice.
An example with batch_size = 4 and img_dim = 3:
input spacing = None -> output spacing = [None, None, None, None]
input spacing = 0.8 -> output spacing = [0.8, 0.8, 0.8, 0.8]
input spacing = [0.8, 0.5, 0.9] -> output spacing = [[0.8, 0.5, 0.9], [0.8, 0.5, 0.9], [0.8, 0.5, 0.9], [0.8, 0.5, 0.9]]
input spacing = [0.8, 0.7, 1.2, 0.8] -> output spacing = [0.8, 0.7, 1.2, 0.8] (same as input)
An example with batch_size = 3 and img_dim = 3:
input spacing = [0.8, 0.5, 0.9] -> output spacing = [[0.8, 0.5, 0.9], [0.8, 0.5, 0.9], [0.8, 0.5, 0.9]]
- Parameters:
spacing – can be a float, a sequence of length img_dim, or a sequence with length batch_size that includes floats or sequences of length img_dim.
- Raises:
ValueError – when spacing is a sequence of sequences, where the outer sequence length does not equal batch_size or the inner sequence length does not equal img_dim.
- Returns:
spacing: a sequence with length batch_size that includes integers, floats or sequences of length img_dim.
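The input-to-output mapping in the examples above can be written down as a small plain-Python sketch. `prepare_spacing_sketch` is a hypothetical re-statement of the documented behaviour, not the library function. When batch_size equals img_dim, it treats a flat list as per-dimension spacing, following the second example.

```python
from numbers import Real


def prepare_spacing_sketch(spacing, batch_size, img_dim):
    """Expand spacing so that the result has one entry per batch item."""
    # None or a single number: repeat it for every item in the batch.
    if spacing is None or isinstance(spacing, Real):
        return [spacing] * batch_size
    spacing = list(spacing)
    if all(isinstance(s, Real) for s in spacing):
        # One number per spatial dimension: share it across the batch.
        if len(spacing) == img_dim:
            return [list(spacing) for _ in range(batch_size)]
        # One number per batch item: already in the target layout.
        if len(spacing) == batch_size:
            return spacing
        raise ValueError("flat spacing must have length img_dim or batch_size")
    if not all(isinstance(s, (list, tuple)) for s in spacing):
        raise ValueError("spacing must not mix numbers and sequences")
    # Sequence of sequences: one per-dimension spacing for each batch item.
    if len(spacing) != batch_size or any(len(s) != img_dim for s in spacing):
        raise ValueError("expected batch_size sequences of length img_dim")
    return [list(s) for s in spacing]


print(prepare_spacing_sketch(None, 4, 3))
print(prepare_spacing_sketch(0.8, 4, 3))
print(prepare_spacing_sketch([0.8, 0.5, 0.9], 3, 3))
print(prepare_spacing_sketch([0.8, 0.7, 1.2, 0.8], 4, 3))
```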
- monai.metrics.utils.remap_instance_id(pred, by_size=False)[source]#
This function is used to rename all instance ids of pred so that the ids are contiguous. For example, all ids of the input can be [0, 1, 2] rather than [0, 2, 5]. This function is helpful for calculating metrics like Panoptic Quality (PQ). The implementation refers to:
- Parameters:
pred (Tensor) – segmentation predictions in the form of torch tensor. Each value of the tensor should be an integer, and represents the prediction of its corresponding instance id.
by_size (bool) – if True, largest instance will be assigned a smaller id.
- Return type:
Tensor
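The relabelling can be sketched in plain Python on a flattened prediction. `remap_ids` is a hypothetical helper for illustration. Treating id 0 as background that keeps its value is an assumption consistent with the [0, 2, 5] to [0, 1, 2] example, and ties in size are broken by the original id.

```python
from collections import Counter


def remap_ids(pred, by_size=False):
    """Relabel instance ids in a flat list of ints so they become contiguous.

    Assumption: 0 is background and is left unchanged.
    """
    sizes = Counter(v for v in pred if v != 0)
    if by_size:
        # Largest instance first, so it receives the smallest new id.
        ordered = sorted(sizes, key=lambda i: (-sizes[i], i))
    else:
        ordered = sorted(sizes)
    mapping = {old: new for new, old in enumerate(ordered, start=1)}
    mapping[0] = 0
    return [mapping[v] for v in pred]


pred = [0, 2, 5, 5, 0, 2, 5]
print(remap_ids(pred))                # [0, 1, 2, 2, 0, 1, 2]
print(remap_ids(pred, by_size=True))  # [0, 2, 1, 1, 0, 2, 1]
```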