Metrics#

FROC#

monai.metrics.compute_fp_tp_probs(probs, y_coord, x_coord, evaluation_mask, labels_to_exclude=None, resolution_level=0)[source]#

This function is adapted from the official evaluation code of the CAMELYON 16 Challenge and is used to distinguish true positive from false positive predictions. A prediction counts as a true positive when the detection point lies within an annotated ground truth region.

Parameters:
  • probs – an array with shape (n,) that represents the probabilities of the detections, where n is the number of predicted detections.

  • y_coord – an array with shape (n,) that represents the Y-coordinates of the detections.

  • x_coord – an array with shape (n,) that represents the X-coordinates of the detections.

  • evaluation_mask – the ground truth mask for evaluation.

  • labels_to_exclude – labels in this list will not be counted for metric calculation.

  • resolution_level – the level at which the evaluation mask is made.

Returns:

  • fp_probs – an array that contains the probabilities of the false positive detections.

  • tp_probs – an array that contains the probabilities of the true positive detections.

  • num_targets – the total number of targets (excluding labels_to_exclude) for all images under evaluation.
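The classification rule described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: the helper name `split_fp_tp`, the use of nested lists for the mask, the division of coordinates by `2 ** resolution_level`, and keeping only the highest probability per annotated region are all assumptions made for the example.

```python
def split_fp_tp(probs, y_coord, x_coord, evaluation_mask,
                labels_to_exclude=(), resolution_level=0):
    """Split detections into false positive and true positive probabilities."""
    scale = 2 ** resolution_level  # assumption: coordinates are given at level 0
    fp_probs = []
    best_tp = {}  # label -> highest probability among detections that hit it
    for prob, y, x in zip(probs, y_coord, x_coord):
        label = evaluation_mask[int(y // scale)][int(x // scale)]
        if label == 0:
            fp_probs.append(prob)  # outside every annotated region
        elif label not in labels_to_exclude:
            best_tp[label] = max(prob, best_tp.get(label, 0.0))
    targets = {v for row in evaluation_mask for v in row if v != 0}
    num_targets = len(targets - set(labels_to_exclude))
    return fp_probs, [best_tp[k] for k in sorted(best_tp)], num_targets


mask = [[0, 1],
        [2, 0]]  # two annotated regions labelled 1 and 2
fp, tp, n = split_fp_tp([0.9, 0.8, 0.3], [0, 0, 1], [1, 0, 1], mask)
print(fp, tp, n)  # [0.8, 0.3] [0.9] 2
```

Only the first detection falls inside an annotated region, so it is the single true positive; the region labelled 2 is never hit but still counts towards `num_targets`.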

monai.metrics.compute_froc_curve_data(fp_probs, tp_probs, num_targets, num_images)[source]#

This function is adapted from the official evaluation code of the CAMELYON 16 Challenge and is used to compute the data required for plotting the Free Response Operating Characteristic (FROC) curve.

Parameters:
  • fp_probs – an array that contains the probabilities of the false positive detections for all images under evaluation.

  • tp_probs – an array that contains the probabilities of the True positive detections for all images under evaluation.

  • num_targets – the total number of targets (excluding labels_to_exclude) for all images under evaluation.

  • num_images – the number of images under evaluation.
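As a rough illustration of what "the required data" for a FROC curve looks like, the sketch below sweeps a threshold over all observed probabilities and records, for each threshold, the average number of false positives per image and the sensitivity. The function name `froc_curve_data` and the closing (0, 0) point are assumptions for this example; the library function may differ in detail.

```python
def froc_curve_data(fp_probs, tp_probs, num_targets, num_images):
    """Return (fps_per_image, total_sensitivity), one entry per threshold."""
    thresholds = sorted(set(fp_probs) | set(tp_probs))
    fps_per_image, total_sensitivity = [], []
    for t in thresholds:
        # detections with probability >= t are kept at this operating point
        fps_per_image.append(sum(p >= t for p in fp_probs) / num_images)
        total_sensitivity.append(sum(p >= t for p in tp_probs) / num_targets)
    # closing point: a threshold above every score keeps no detections
    fps_per_image.append(0.0)
    total_sensitivity.append(0.0)
    return fps_per_image, total_sensitivity


fps, sens = froc_curve_data([0.8, 0.3], [0.9], num_targets=2, num_images=1)
print(fps)   # [2.0, 1.0, 0.0, 0.0]
print(sens)  # [0.5, 0.5, 0.5, 0.0]
```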

monai.metrics.compute_froc_score(fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8))[source]#

This function is adapted from the official evaluation code of the CAMELYON 16 Challenge and is used to compute the challenge’s second evaluation metric, which is defined as the average sensitivity at the predefined false positive rates per whole slide image.

Parameters:
  • fps_per_image (ndarray) – the average number of false positives per image for different thresholds.

  • total_sensitivity (ndarray) – sensitivities (true positive rates) for different thresholds.

  • eval_thresholds (tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8) which is the same as the CAMELYON 16 Challenge.

Return type:

Any
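The "average sensitivity at predefined false positive rates" can be illustrated with linear interpolation along the FROC curve. The sketch below is a pure-Python approximation under the assumption that sensitivities are linearly interpolated between curve points and clamped at the ends; `froc_score` here is a stand-in written for this example, not the library function itself.

```python
def froc_score(fps_per_image, total_sensitivity,
               eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)):
    """Average the interpolated sensitivity at each evaluation threshold."""
    points = sorted(zip(fps_per_image, total_sensitivity))  # ascending in FPs
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]

    def interp(x):
        if x <= xs[0]:
            return ys[0]
        if x >= xs[-1]:
            return ys[-1]
        for i in range(1, len(xs)):
            if x <= xs[i]:
                span = xs[i] - xs[i - 1]
                if span == 0:
                    return ys[i]
                return ys[i - 1] + (x - xs[i - 1]) / span * (ys[i] - ys[i - 1])

    return sum(interp(t) for t in eval_thresholds) / len(eval_thresholds)


score = froc_score([8, 4, 2, 1, 0], [1.0, 0.8, 0.6, 0.5, 0.0])
print(round(score, 4))  # 0.5458
```

Here the interpolated sensitivities at 0.25, 0.5, 1, 2, 4 and 8 false positives per image are 0.125, 0.25, 0.5, 0.6, 0.8 and 1.0, whose mean is about 0.5458.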

Metric#

class monai.metrics.Metric[source]#

Base class for metric computation for evaluating the performance of a model. __call__ is designed to execute the computation.

Variance#

monai.metrics.compute_variance(y_pred, include_background=True, spatial_map=False, scalar_reduction='mean', threshold=0.0005)[source]#

Parameters:
  • y_pred – [N, C, H, W, D] or [N, C, H, W] or [N, C, H] where N is repeats, C is channels and H, W, D stand for Height, Width & Depth

  • include_background – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • spatial_map – Boolean; if set to True, a spatial map of variance is returned, matching the input image dimensions

  • scalar_reduction – reduction type of the metric, either ‘sum’ or ‘mean’ can be used

  • threshold – a value used to replace zeros in order to avoid NaNs

Returns:

A single scalar uncertainty/variance value or the spatial map of uncertainty/variance
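The idea of variance across N repeats can be sketched without any tensor library. The example below treats each repeat as a flattened list of values and is only an approximation of the description above: the name `variance_score`, the use of population variance, and the exact way zeros are replaced by `threshold` are assumptions.

```python
from statistics import pvariance


def variance_score(repeats, spatial_map=False, scalar_reduction="mean",
                   threshold=0.0005):
    """repeats: N flattened predictions of equal length (one per repeat)."""
    # replace exact zeros by the threshold, as the parameter description suggests
    cleaned = [[v if v != 0 else threshold for v in rep] for rep in repeats]
    # variance over the repeat axis, computed independently per position
    per_position = [pvariance(vals) for vals in zip(*cleaned)]
    if spatial_map:
        return per_position
    if scalar_reduction == "mean":
        return sum(per_position) / len(per_position)
    if scalar_reduction == "sum":
        return sum(per_position)
    raise ValueError(f"unsupported scalar_reduction: {scalar_reduction}")


repeats = [[0.2, 0.4], [0.4, 0.4], [0.6, 0.4]]  # N=3 repeats, 2 positions
var_map = variance_score(repeats, spatial_map=True)
var_mean = variance_score(repeats, scalar_reduction="mean")
var_sum = variance_score(repeats, scalar_reduction="sum")
```

The first position varies across repeats and contributes all of the variance; the second is constant and contributes none.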

class monai.metrics.VarianceMetric(include_background=True, spatial_map=False, scalar_reduction='sum', threshold=0.0005)[source]#

Compute the Variance of a given T-repeats N-dimensional array/tensor. The primary usage is as an uncertainty based metric for Active Learning.

It can return either the spatial variance/uncertainty map or a single scalar value (the mean or sum of the variance) for scoring purposes, depending on user choice.

Parameters:
  • include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • spatial_map (bool) – Boolean; if set to True, a spatial map of variance is returned, matching the input image dimensions

  • scalar_reduction (str) – reduction type of the metric, either ‘sum’ or ‘mean’ can be used

  • threshold (float) – a value used to replace zeros in order to avoid NaNs

LabelQualityScore#

monai.metrics.label_quality_score(y_pred, y, include_background=True, scalar_reduction='mean')[source]#

The assumption is that the DL model makes better predictions than the provided label quality, hence the difference can be treated as a label quality score

Parameters:
  • y_pred – Input data of dimension [B, C, H, W, D] or [B, C, H, W] or [B, C, H] where B is Batch-size, C is channels and H, W, D stand for Height, Width & Depth

  • y – Ground Truth of dimension [B, C, H, W, D] or [B, C, H, W] or [B, C, H] where B is Batch-size, C is channels and H, W, D stand for Height, Width & Depth

  • include_background – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • scalar_reduction – reduction type of the metric, either ‘sum’ or ‘mean’ can be used to retrieve a single scalar value, if set to ‘none’ a spatial map will be returned

Returns:

A single scalar absolute difference value as score with a reduction based on sum/mean or the spatial map of absolute difference
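A minimal sketch of the absolute-difference score, assuming flattened inputs of equal length. The function name `label_quality` is hypothetical and the sketch ignores `include_background`; it only illustrates the three reduction choices described above.

```python
def label_quality(y_pred, y, scalar_reduction="mean"):
    """Absolute difference between prediction and label, optionally reduced."""
    diff = [abs(p - t) for p, t in zip(y_pred, y)]
    if scalar_reduction == "none":
        return diff  # the unreduced map of absolute differences
    if scalar_reduction == "sum":
        return sum(diff)
    if scalar_reduction == "mean":
        return sum(diff) / len(diff)
    raise ValueError(f"unsupported scalar_reduction: {scalar_reduction}")


y_pred = [0.9, 0.2, 0.6]
y = [1.0, 0.0, 0.0]
lq_map = label_quality(y_pred, y, "none")
lq_sum = label_quality(y_pred, y, "sum")
lq_mean = label_quality(y_pred, y, "mean")
```

A large score means the label disagrees strongly with the model, which under the stated assumption points to a questionable label.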

class monai.metrics.LabelQualityScore(include_background=True, scalar_reduction='sum')[source]#

The assumption is that the DL model makes better predictions than the provided label quality, hence the difference can be treated as a label quality score

It can be combined with variance/uncertainty in active learning frameworks to factor in the quality of the label along with the uncertainty.

Parameters:
  • include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • spatial_map – Boolean, if set to True, spatial map of variance will be returned corresponding to input image dimensions (listed in the docstring, although it is not part of the constructor signature above)

  • scalar_reduction (str) – reduction type of the metric, either ‘sum’ or ‘mean’ can be used

IterationMetric#

class monai.metrics.IterationMetric[source]#

Base class for metrics computation at the iteration level, that is, on a mini-batch of samples, usually using the model outcome of one iteration.

__call__ is designed to handle y_pred and y (optional) in torch tensors or a list/tuple of tensors.

Subclasses typically implement the _compute_tensor function for the actual tensor computation logic.

Cumulative#

class monai.metrics.Cumulative[source]#

Utility class for the typical cumulative computation process based on PyTorch Tensors. It provides interfaces to accumulate values in the local buffers, synchronize buffers across distributed nodes, and aggregate the buffered values.

In multi-processing, PyTorch programs usually distribute data to multiple nodes. Each node runs on a subset of the data and adds values to its local buffers. Calling get_buffer gathers all the results, and aggregate can then process them to generate the final outcomes.

Users can implement their own aggregate method to handle the results, using get_buffer to get the buffered contents.

Note: the data list should have the same length on every call to append() or extend() within a round; buffers are created automatically according to the length of the data list.

Typically, this class is expected to execute the following steps:

from monai.metrics import Cumulative

c = Cumulative()
c.append(1)  # adds a value
c.extend([2, 3])  # adds a batch of values
c.extend([4, 5, 6])  # adds a batch of values
print(c.get_buffer())  # tensor([1, 2, 3, 4, 5, 6])
print(len(c))  # 6
c.reset()
print(len(c))  # 0

The following is an example of maintaining two internal buffers:

from monai.metrics import Cumulative

c = Cumulative()
c.append(1, 2)  # adds a value to two buffers respectively
c.extend([3, 4], [5, 6])  # adds batches of values
print(c.get_buffer())  # [tensor([1, 3, 4]), tensor([2, 5, 6])]
print(len(c))

The following is an example of extending with variable length data:

import torch
from monai.metrics import Cumulative

c = Cumulative()
c.extend(torch.zeros((8, 2)), torch.zeros((6, 2)))  # adds batches
c.append(torch.zeros((2, )))  # adds a value
print(c.get_buffer())  # [torch.zeros((9, 2)), torch.zeros((6, 2))]
print(len(c))

__init__()[source]#

Initialize the internal buffers. self._buffers are local buffers, they are not usually used directly. self._sync_buffers are the buffers with all the results across all the nodes.

abstract aggregate(*args, **kwargs)[source]#

Aggregate final results based on the gathered buffers. This method is expected to use get_buffer to gather the local buffer contents.

Return type:

Any

append(*data)[source]#

Add samples to the local cumulative buffers. A buffer will be allocated for each data item. Compared with self.extend, this method adds a single sample (instead of a “batch”) to the local buffers.

Parameters:

data (Any) – each item will be converted into a torch tensor. They will be stacked at the 0-th dim with a new dimension when get_buffer() is called.

Return type:

None

extend(*data)[source]#

Extend the local buffers with new (“batch-first”) data. A buffer will be allocated for each data item. Compared with self.append, this method adds a “batch” of data to the local buffers.

Parameters:

data (Any) – each item can be a “batch-first” tensor or a list of “channel-first” tensors. They will be concatenated at the 0-th dimension when get_buffer() is called.

Return type:

None

get_buffer()[source]#

Get the synchronized list of buffers. A typical usage is to generate the metrics report based on the raw metric details. Each buffer is a PyTorch Tensor.

reset()[source]#

Reset the buffers for cumulative tensors and the synced results.

CumulativeIterationMetric#

class monai.metrics.CumulativeIterationMetric[source]#

Base class of cumulative metric which collects metrics on each mini-batch data at the iteration level.

Typically, it computes some intermediate results for each iteration and adds them to the buffers; the buffer contents can then be gathered and aggregated for the final result when the epoch completes. Currently, Cumulative.aggregate() and IterationMetric._compute_tensor() are expected to be implemented.

For example, DiceMetric inherits this class and the usage is as follows:

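from monai.data import decollate_batch
from monai.metrics import DiceMetric

# `model`, `val_loader` and `postprocessing_transform` are assumed to be defined elsewhere
dice_metric = DiceMetric(include_background=True, reduction="mean")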

for val_data in val_loader:
    val_outputs = model(val_data["img"])
    val_outputs = [postprocessing_transform(i) for i in decollate_batch(val_outputs)]
    # compute metric for current iteration
    dice_metric(y_pred=val_outputs, y=val_data["seg"])  # callable to add metric to the buffer

# aggregate the final mean dice result
metric = dice_metric.aggregate().item()

# reset the status for next computation round
dice_metric.reset()

To load predictions and labels from files and then compute metrics with multi-processing, please refer to: Project-MONAI/tutorials.

LossMetric#

class monai.metrics.LossMetric(loss_fn, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

A wrapper to make loss_fn available as a cumulative metric. That is, the loss values computed from mini-batches can be combined in the reduction mode across multiple iterations, as a quantitative measurement of a model.

Example:

import torch
from monai.losses import DiceLoss
from monai.metrics import LossMetric

dice_loss = DiceLoss(include_background=True)
loss_metric = LossMetric(loss_fn=dice_loss)

# first iteration
y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 1.0]]]])  # shape [batch=1, channel=1, 2, 2]
y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])  # shape [batch=1, channel=1, 2, 2]
loss_metric(y_pred, y)

# second iteration
y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])  # shape [batch=1, channel=1, 2, 2]
y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])  # shape [batch=1, channel=1, 2, 2]
loss_metric(y_pred, y)

# aggregate
print(loss_metric.aggregate(reduction="none"))  # tensor([[0.2000], [0.5000]]) (shape [batch=2, channel=1])

# reset
loss_metric.reset()
print(loss_metric.aggregate())

Parameters:
  • loss_fn – a callable function that takes y_pred and optionally y as input (in the “batch-first” format), returns a “batch-first” tensor of loss values.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

aggregate(reduction=None)[source]#

Returns the aggregated loss value across multiple iterations.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.
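To make the reduction modes more concrete, the sketch below applies a few of them to a small [batch, channel] table of metric values while skipping NaN entries. It is a plain-Python illustration, not the library's reduction routine: the helper `reduce_not_nan` is hypothetical, only the "none", "mean", "mean_batch" and "mean_channel" modes are shown, and treating "mean" as a two-step average (channels first, then batch) is an assumption.

```python
import math


def reduce_not_nan(values, reduction="mean"):
    """values: B rows, each with C per-channel metric values (may hold NaN)."""

    def nanmean(xs):
        kept = [x for x in xs if not math.isnan(x)]
        return sum(kept) / len(kept) if kept else math.nan

    if reduction == "none":
        return values
    if reduction == "mean_channel":  # average over channels, one value per sample
        return [nanmean(row) for row in values]
    if reduction == "mean_batch":  # average over the batch, one value per channel
        return [nanmean(col) for col in zip(*values)]
    if reduction == "mean":  # assumption: channels first, then batch
        return nanmean([nanmean(row) for row in values])
    raise ValueError(f"unsupported reduction: {reduction}")


table = [[1.0, math.nan],
         [0.5, 0.5]]
r_mean = reduce_not_nan(table, "mean")
r_batch = reduce_not_nan(table, "mean_batch")
r_channel = reduce_not_nan(table, "mean_channel")
print(r_mean, r_batch, r_channel)  # 0.75 [0.75, 0.5] [1.0, 0.5]
```

Note that under the two-step assumption the overall mean is 0.75, which differs from a flat average of the three non-NaN values (about 0.667).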

Mean Dice#

class monai.metrics.DiceMetric(include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, ignore_empty=True, num_classes=None, return_with_label=False)[source]#

Compute average Dice score for a set of pairs of prediction-groundtruth segmentations.

It supports both multi-class and multi-label tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y can be single-channel class indices or in the one-hot format. The include_background parameter can be set to False to exclude the first category (channel index 0) which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]); y can also be in the format of B1HW[D].

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • include_background – whether to include Dice computation on the first channel of the predicted output. Defaults to True.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

  • ignore_empty – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

  • num_classes – number of input channels (always including the background). When this is None, y_pred.shape[1] will be used. This option is useful when both y_pred and y are single-channel class indices and the number of classes is not automatically inferred from data.

  • return_with_label – whether to return the metrics with label, only works when reduction is “mean_batch”. If True, use “label_{index}” as the key corresponding to C channels; if ‘include_background’ is True, the index begins at “0”, otherwise at “1”. It can also take a list of label names. The outcome will then be returned as a dictionary.
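The effect of ignore_empty is easiest to see on a single binary channel. The following sketch computes the Dice score for flattened binary masks in plain Python; `dice_score` is a hypothetical helper written for this example, and returning 0 for an empty ground truth with a non-empty prediction is an assumption consistent with the Dice formula.

```python
import math


def dice_score(y_pred, y, ignore_empty=True):
    """Dice score for one binarized channel given as flat 0/1 lists."""
    inter = sum(p * t for p, t in zip(y_pred, y))
    pred_sum, gt_sum = sum(y_pred), sum(y)
    if gt_sum > 0:
        return 2.0 * inter / (pred_sum + gt_sum)
    if ignore_empty:
        return math.nan  # empty ground truth is excluded from later reduction
    return 1.0 if pred_sum == 0 else 0.0


d_regular = dice_score([1, 0, 0, 1], [1, 0, 1, 1])
d_ignored = dice_score([0, 0, 0, 0], [0, 0, 0, 0], ignore_empty=True)
d_both_empty = dice_score([0, 0, 0, 0], [0, 0, 0, 0], ignore_empty=False)
print(d_regular, d_ignored, d_both_empty)  # 0.8 nan 1.0
```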

aggregate(reduction=None)[source]#

Execute reduction and aggregation logic for the output of compute_dice.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

class monai.metrics.DiceHelper(include_background=None, sigmoid=False, softmax=None, activate=False, get_not_nans=True, reduction=MetricReduction.MEAN_BATCH, ignore_empty=True, num_classes=None)[source]#

Compute Dice score between two tensors y_pred and y. y_pred and y can be single-channel class indices or in the one-hot format.

Example:

import torch
from monai.metrics import DiceHelper

n_classes, batch_size = 5, 16
spatial_shape = (128, 128, 128)

y_pred = torch.rand(batch_size, n_classes, *spatial_shape).float()  # predictions
y = torch.randint(0, n_classes, size=(batch_size, 1, *spatial_shape)).long()  # ground truth

score, not_nans = DiceHelper(include_background=False, sigmoid=True, softmax=True)(y_pred, y)
print(score, not_nans)

__init__(include_background=None, sigmoid=False, softmax=None, activate=False, get_not_nans=True, reduction=MetricReduction.MEAN_BATCH, ignore_empty=True, num_classes=None)[source]#

Parameters:
  • include_background – whether to include the score on the first channel (default to the value of sigmoid, False).

  • sigmoid – whether y_pred are/will be sigmoid activated outputs. If True, thresholding at 0.5 will be performed to get the discrete prediction. Defaults to False.

  • softmax – whether y_pred are softmax activated outputs. If True, argmax will be performed to get the discrete prediction. Defaults to the value of not sigmoid.

  • activate – whether to apply sigmoid to y_pred if sigmoid is True. Defaults to False. This option is only valid when sigmoid is True.

  • get_not_nans – whether to return the number of not-nan values.

  • reduction – define mode of reduction to the metrics

  • ignore_empty – if True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the Union of y_pred and y is empty.

  • num_classes – number of input channels (always including the background). When this is None, y_pred.shape[1] will be used. This option is useful when both y_pred and y are single-channel class indices and the number of classes is not automatically inferred from data.

Mean IoU#

monai.metrics.compute_iou(y_pred, y, include_background=True, ignore_empty=True)[source]#

Computes Intersection over Union (IoU) score metric from a batch of predictions.

Parameters:
  • y_pred (Tensor) – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y (Tensor) – ground truth to compute mean IoU metric. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background (bool) – whether to include IoU computation on the first channel of the predicted output. Defaults to True.

  • ignore_empty (bool) – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

Return type:

Tensor

Returns:

IoU scores per batch and per class, (shape [batch_size, num_classes]).

Raises:

ValueError – when y_pred and y have different shapes.
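The [batch_size, num_classes] output shape can be illustrated with nested lists. The sketch below assumes one-hot, binarized inputs arranged as [batch][class][flattened pixels]; `iou_per_class` is a hypothetical stand-in for this example and its shape check is deliberately simplistic.

```python
import math


def iou_per_class(y_pred, y, ignore_empty=True):
    """Return IoU scores arranged as [batch_size][num_classes]."""
    if len(y_pred) != len(y) or any(len(p) != len(t) for p, t in zip(y_pred, y)):
        raise ValueError("y_pred and y should have the same shape")
    scores = []
    for pred_sample, gt_sample in zip(y_pred, y):
        row = []
        for pred_c, gt_c in zip(pred_sample, gt_sample):
            inter = sum(p * t for p, t in zip(pred_c, gt_c))
            union = sum(pred_c) + sum(gt_c) - inter
            if sum(gt_c) > 0:
                row.append(inter / union)
            elif ignore_empty:
                row.append(math.nan)  # empty ground truth is ignored
            else:
                row.append(1.0 if union == 0 else 0.0)
        scores.append(row)
    return scores


y_pred = [[[1, 0, 0, 1], [0, 1, 1, 0]],
          [[1, 1, 0, 0], [0, 0, 1, 1]]]
y = [[[1, 0, 1, 1], [0, 1, 0, 0]],
     [[1, 1, 0, 0], [0, 0, 0, 0]]]
scores = iou_per_class(y_pred, y)
```

In this example the second sample has an empty ground truth for class 1, so with ignore_empty=True its entry is NaN and would be skipped by a not-NaN reduction.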

class monai.metrics.MeanIoU(include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, ignore_empty=True)[source]#

Compute average Intersection over Union (IoU) score between two tensors. It supports both multi-classes and multi-labels tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False to exclude the first category (channel index 0) which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]).

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • include_background – whether to include IoU computation on the first channel of the predicted output. Defaults to True.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

  • ignore_empty – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_iou.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Generalized Dice Score#

monai.metrics.compute_generalized_dice(y_pred, y, include_background=True, weight_type=Weight.SQUARE)[source]#

Computes the Generalized Dice Score and returns a tensor with its per image values.

Parameters:
  • y_pred (torch.Tensor) – binarized segmentation model output. It should be binarized, in one-hot format and in the NCHW[D] format, where N is the batch dimension, C is the channel dimension, and the remaining are the spatial dimensions.

  • y (torch.Tensor) – binarized ground-truth. It should be binarized, in one-hot format and have the same shape as y_pred.

  • include_background (bool, optional) – whether to include score computation on the first channel of the predicted output. Defaults to True.

  • weight_type (Union[Weight, str], optional) – {"square", "simple", "uniform"}. Type of function to transform ground truth volume into a weight factor. Defaults to "square".

Returns:

per batch and per class Generalized Dice Score, i.e., with the shape [batch_size, num_classes].

Return type:

torch.Tensor

Raises:

ValueError – if y_pred or y are not PyTorch tensors, if y_pred and y have less than three dimensions, or y_pred and y don’t have the same shape.
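The role of weight_type can be sketched for a single image. The example below follows the general form of the Generalized Dice overlap (a weighted sum of intersections over a weighted sum of volumes) and reduces over classes to a single value; this reduction, the helper name `generalized_dice`, and the requirement that every class has a non-empty ground truth are assumptions of the sketch, so the library's output shape and edge-case handling may differ.

```python
def generalized_dice(y_pred, y, weight_type="square"):
    """y_pred, y: per-class binary masks for ONE image, as [C][num_pixels]."""
    numerator = denominator = 0.0
    for pred_c, gt_c in zip(y_pred, y):
        volume = sum(gt_c)  # assumed > 0 for every class in this sketch
        if weight_type == "square":
            weight = 1.0 / volume ** 2
        elif weight_type == "simple":
            weight = 1.0 / volume
        elif weight_type == "uniform":
            weight = 1.0
        else:
            raise ValueError(f"unsupported weight_type: {weight_type}")
        inter = sum(p * t for p, t in zip(pred_c, gt_c))
        numerator += weight * inter
        denominator += weight * (sum(pred_c) + volume)
    return 2.0 * numerator / denominator


y_pred = [[1, 1, 0, 0], [0, 0, 1, 1]]
y = [[1, 0, 0, 0], [0, 1, 1, 1]]
gds_square = generalized_dice(y_pred, y, "square")
gds_simple = generalized_dice(y_pred, y, "simple")
gds_uniform = generalized_dice(y_pred, y, "uniform")
print(round(gds_square, 4), round(gds_simple, 4), round(gds_uniform, 4))
# 0.6875 0.7143 0.75
```

With "square" weights the small class (one ground truth pixel) dominates the score, which is the intended behaviour for highly unbalanced segmentations.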

class monai.metrics.GeneralizedDiceScore(include_background=True, reduction=MetricReduction.MEAN_BATCH, weight_type=Weight.SQUARE)[source]#

Compute the Generalized Dice Score metric between tensors, as the complement of the Generalized Dice Loss defined in:

Sudre, C. et al. (2017) Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. DLMIA 2017.

The inputs y_pred and y are expected to be one-hot, binarized channel-first or batch-first tensors, i.e., CHW[D] or BCHW[D].

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • include_background (bool, optional) – whether to include the background class (assumed to be in channel 0), in the score computation. Defaults to True.

  • reduction (str, optional) – define mode of reduction to the metrics. Available reduction modes: {"none", "mean_batch", "sum_batch"}. Default to "mean_batch". If “none”, will not do reduction.

  • weight_type (Union[Weight, str], optional) – {"square", "simple", "uniform"}. Type of function to transform ground truth volume into a weight factor. Defaults to "square".

Raises:

ValueError – when the weight_type is not one of {"square", "simple", "uniform"}.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_generalized_dice.

Parameters:

reduction (Union[MetricReduction, str, None], optional) – define mode of reduction to the metrics. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch"}. Defaults to "mean". If “none”, will not do reduction.

Area under the ROC curve#

monai.metrics.compute_roc_auc(y_pred, y, average=Average.MACRO)[source]#

Computes Area Under the Receiver Operating Characteristic Curve (ROC AUC). Referring to: sklearn.metrics.roc_auc_score.

Parameters:
  • y_pred – input data to compute, typical classification model output. the first dim must be batch, if multi-classes, it must be in One-Hot format. for example: shape [16] or [16, 1] for a binary data, shape [16, 2] for 2 classes data.

  • y – ground truth to compute ROC AUC metric, the first dim must be batch. if multi-classes, it must be in One-Hot format. for example: shape [16] or [16, 1] for a binary data, shape [16, 2] for 2 classes data.

  • average – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".

    • "macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

    • "weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).

    • "micro": calculate metrics globally by considering each element of the label indicator matrix as a label.

    • "none": the scores for each class are returned.

Raises:
  • ValueError – When y_pred dimension is not one of [1, 2].

  • ValueError – When y dimension is not one of [1, 2].

  • ValueError – When average is not one of [“macro”, “weighted”, “micro”, “none”].

Note

ROCAUC expects y to consist of 0s and 1s. y_pred must be either probability estimates or confidence values.
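For intuition, ROC AUC in the binary case equals the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one, and "macro" averaging is the unweighted mean of per-class values. The sketch below uses this pairwise formulation in plain Python; `binary_auc` and `macro_auc` are hypothetical helpers for this example and are not how the library computes the metric.

```python
def binary_auc(y_score, y_true):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    # ties between a positive and a negative score count as half a win
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(y_score, y_true):
    """Unweighted mean of per-class AUCs for one-hot labels of shape [B][C]."""
    aucs = [binary_auc(s, t) for s, t in zip(zip(*y_score), zip(*y_true))]
    return sum(aucs) / len(aucs)


auc = binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
print(auc)  # 0.75

one_hot = [[1, 0], [1, 0], [0, 1], [0, 1]]
scores = [[0.9, 0.1], [0.6, 0.4], [0.65, 0.35], [0.2, 0.8]]
auc_macro = macro_auc(scores, one_hot)
```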

class monai.metrics.ROCAUCMetric(average=Average.MACRO)[source]#

Computes Area Under the Receiver Operating Characteristic Curve (ROC AUC). Referring to: sklearn.metrics.roc_auc_score. The input y_pred and y can be a list of channel-first Tensor or a batch-first Tensor.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:

average – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".

  • "macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

  • "weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).

  • "micro": calculate metrics globally by considering each element of the label indicator matrix as a label.

  • "none": the scores for each class are returned.

aggregate(average=None)[source]#

Typically y_pred and y are stored in the cumulative buffers at each iteration. This function reads the buffers and computes the area under the ROC curve.

Parameters:

average – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to self.average.

Confusion matrix#

monai.metrics.get_confusion_matrix(y_pred, y, include_background=True)[source]#

Compute confusion matrix. A tensor with the shape [B, C, 4] will be returned, where B is the batch size and C is the number of classes to be computed. The third dimension holds the number of true positive, false positive, true negative and false negative values for each channel of each sample within the input batch.

Parameters:
  • y_pred (Tensor) – input data to compute. It must be one-hot format and first dim is batch. The values should be binarized.

  • y (Tensor) – ground truth to compute the metric. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background (bool) – whether to include metric computation on the first channel of the predicted output. Defaults to True.

Raises:

ValueError – when y_pred and y have different shapes.

Return type:

Tensor

monai.metrics.compute_confusion_matrix_metric(metric_name, confusion_matrix)[source]#

This function is used to compute confusion matrix related metric.

Parameters:
  • metric_name (str) – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as listed on the Wikipedia page on the confusion matrix), and you can also input those names instead.

  • confusion_matrix (Tensor) – Please see the doc string of the function get_confusion_matrix for more details.

Raises:
  • ValueError – when the size of the last dimension of confusion_matrix is not 4.

  • NotImplementedError – when specify a not implemented metric_name.

Return type:

Tensor
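The relationship between the four counts and a named metric can be sketched for a single channel. The example below counts [TP, FP, TN, FN] in that order, matching the description of get_confusion_matrix, and derives a handful of metrics from them. The helper names and the small set of supported metric names are assumptions for illustration; zero denominators are not handled.

```python
def confusion_counts(y_pred, y):
    """Return [TP, FP, TN, FN] for one binarized channel (flat 0/1 lists)."""
    tp = sum(p == 1 and t == 1 for p, t in zip(y_pred, y))
    fp = sum(p == 1 and t == 0 for p, t in zip(y_pred, y))
    tn = sum(p == 0 and t == 0 for p, t in zip(y_pred, y))
    fn = sum(p == 0 and t == 1 for p, t in zip(y_pred, y))
    return [tp, fp, tn, fn]


def confusion_metric(metric_name, counts):
    if len(counts) != 4:
        raise ValueError("the last dimension of the confusion matrix must be 4")
    tp, fp, tn, fn = counts
    if metric_name == "sensitivity":
        return tp / (tp + fn)
    if metric_name == "specificity":
        return tn / (tn + fp)
    if metric_name == "precision":
        return tp / (tp + fp)
    if metric_name == "accuracy":
        return (tp + tn) / (tp + fp + tn + fn)
    if metric_name == "f1 score":
        return 2 * tp / (2 * tp + fp + fn)
    raise NotImplementedError(f"metric not implemented in this sketch: {metric_name}")


counts = confusion_counts([1, 0, 0, 1, 1], [1, 0, 1, 1, 0])
print(counts)  # [2, 1, 1, 1]
print(confusion_metric("accuracy", counts))  # 0.6
```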

class monai.metrics.ConfusionMatrixMetric(include_background=True, metric_name='hit_rate', compute_sample=False, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute confusion matrix related metrics. This class supports all metrics mentioned in: Confusion matrix. It supports both multi-class and multi-label classification and segmentation tasks. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False for an instance to exclude the first category (channel index 0) which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • include_background – whether to include metric computation on the first channel of the predicted output. Defaults to True.

  • metric_name – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as listed on the Wikipedia page on the confusion matrix), and you can also input those names instead. In addition to a single metric name, a sequence of metric names is also supported, such as (“sensitivity”, “precision”, “recall”); if compute_sample is True, multiple f and not_nans values will be returned in the same order as the input names when calling the class.

  • compute_sample – when reducing, if True, each sample’s metric will be computed based on each confusion matrix first. if False, compute reduction on the confusion matrices first, defaults to False.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns [(metric, not_nans), …]. If False, aggregate() returns [metric, …]. Here not_nans count the number of not nans for True Positive, False Positive, True Negative and False Negative. Its shape depends on the shape of the metric, and it has one more dimension with size 4. For example, if the shape of the metric is [3, 3], not_nans has the shape [3, 3, 4].

aggregate(compute_sample=False, reduction=None)[source]#

Execute reduction for the confusion matrix values.

Parameters:
  • compute_sample – when reducing, if True, each sample’s metric will be computed based on each confusion matrix first. if False, compute reduction on the confusion matrices first, defaults to False.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.
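The order of operations selected by compute_sample can change the result. The sketch below, in plain Python and under the assumption of two hypothetical samples described only by their (TP, FN) counts, contrasts "metric per sample, then reduce" with "reduce the counts, then compute the metric". It illustrates the idea only and does not call the library.

```python
def sensitivity(tp, fn):
    """Sensitivity from true-positive and false-negative counts."""
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical (TP, FN) counts for two samples of one class.
samples = [(1, 1), (9, 1)]

# Analogue of compute_sample=True: one metric per sample, then the mean.
per_sample = [sensitivity(tp, fn) for tp, fn in samples]
metric_first = sum(per_sample) / len(per_sample)

# Analogue of compute_sample=False: reduce the counts first, then one metric.
# Summing is used here; averaging the counts gives the same ratio.
total_tp = sum(tp for tp, _ in samples)
total_fn = sum(fn for _, fn in samples)
counts_first = sensitivity(total_tp, total_fn)

print(metric_first)   # mean of 0.5 and 0.9
print(counts_first)   # 10 / 12
```

Reducing the counts first weights each sample by the number of positive elements it contains, whereas the per-sample variant gives every sample equal weight.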

Hausdorff distance#

monai.metrics.compute_hausdorff_distance(y_pred, y, include_background=False, distance_metric='euclidean', percentile=None, directed=False, spacing=None)[source]#

Compute the Hausdorff distance.

Parameters:
  • y_pred – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y – ground truth to compute the distance against. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • distance_metric – one of ["euclidean", "chessboard", "taxicab"], the metric used to compute surface distance. Defaults to "euclidean".

  • percentile – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff distance is returned rather than the maximum. Defaults to None.

  • directed – whether to calculate directed Hausdorff distance. Defaults to False.

  • spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". If a single number, isotropic spacing with that value is used for all images in the batch. If a sequence of numbers, the length of the sequence must be equal to the image dimensions. This spacing will be used for all images in the batch. If a sequence of sequences, the length of the outer sequence must be equal to the batch size. If inner sequence has length 1, isotropic spacing with that value is used for all images in the batch, else the inner sequence length must be equal to the image dimensions. If None, spacing of unity is used for all images in batch. Defaults to None.
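To make the difference between the directed and the non-directed distance concrete, here is a minimal sketch on two small sets of edge coordinates. It assumes Euclidean distance with unit spacing, ignores the percentile option, and works on hand-written point lists rather than one-hot tensors; the function names are hypothetical and this is not the library's implementation.

```python
import math


def directed_hausdorff(points_a, points_b):
    """Largest distance from any point in A to its nearest neighbour in B."""
    return max(min(math.dist(a, b) for b in points_b) for a in points_a)


def hausdorff(points_a, points_b):
    """Non-directed distance: the larger of the two directed distances."""
    return max(
        directed_hausdorff(points_a, points_b),
        directed_hausdorff(points_b, points_a),
    )


# Hypothetical edge pixels of a prediction and of the ground truth.
pred_edge = [(0, 0), (0, 1)]
gt_edge = [(0, 0), (0, 4)]

forward = directed_hausdorff(pred_edge, gt_edge)   # prediction -> ground truth
backward = directed_hausdorff(gt_edge, pred_edge)  # ground truth -> prediction
both = hausdorff(pred_edge, gt_edge)
print(forward, backward, both)
```

The directed value depends on which set is used as the starting point, which is why the two directions differ here.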

monai.metrics.compute_percent_hausdorff_distance(edges_pred, edges_gt, distance_metric='euclidean', percentile=None, spacing=None)[source]#

This function is used to compute the directed Hausdorff distance.

class monai.metrics.HausdorffDistanceMetric(include_background=False, distance_metric='euclidean', percentile=None, directed=False, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Hausdorff Distance between two tensors. It supports both multi-class and multi-label tasks, and both directed and non-directed Hausdorff distance calculation. In addition, specifying the percentile parameter returns the given percentile of the distance instead of the maximum. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]). The implementation refers to DeepMind’s implementation.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • distance_metric – one of ["euclidean", "chessboard", "taxicab"], the metric used to compute surface distance. Defaults to "euclidean".

  • percentile – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff distance is returned rather than the maximum. Defaults to None.

  • directed – whether to calculate directed Hausdorff distance. Defaults to False.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_hausdorff_distance.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Average surface distance#

monai.metrics.compute_average_surface_distance(y_pred, y, include_background=False, symmetric=False, distance_metric='euclidean', spacing=None)[source]#

This function is used to compute the Average Surface Distance from y_pred to y under the default setting. In addition, if symmetric=True is set, the average symmetric surface distance between these two inputs will be returned. The implementation refers to DeepMind’s implementation.

Parameters:
  • y_pred – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y – ground truth to compute the distance against. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • symmetric – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.

  • distance_metric – one of ["euclidean", "chessboard", "taxicab"], the metric used to compute surface distance. Defaults to "euclidean".

  • spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". If a single number, isotropic spacing with that value is used for all images in the batch. If a sequence of numbers, the length of the sequence must be equal to the image dimensions. This spacing will be used for all images in the batch. If a sequence of sequences, the length of the outer sequence must be equal to the batch size. If inner sequence has length 1, isotropic spacing with that value is used for all images in the batch, else the inner sequence length must be equal to the image dimensions. If None, spacing of unity is used for all images in batch. Defaults to None.
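A small sketch may help to see what the symmetric option changes. It assumes Euclidean distance with unit spacing and hand-written edge coordinates, and it assumes that the symmetric variant pools both directed distance sets before averaging. The function names are hypothetical and the code is an illustration, not the library's implementation.

```python
import math


def nearest_distances(points_a, points_b):
    """Distance from every point in A to its nearest neighbour in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def average_surface_distance(pred_edge, gt_edge, symmetric=False):
    """Mean nearest-neighbour distance from pred_edge to gt_edge."""
    distances = nearest_distances(pred_edge, gt_edge)
    if symmetric:
        # Assumption: both directed distance sets are pooled, then averaged.
        distances = distances + nearest_distances(gt_edge, pred_edge)
    return sum(distances) / len(distances)


# Hypothetical edge pixels of a prediction and of the ground truth.
pred_edge = [(0, 0), (0, 1)]
gt_edge = [(0, 0), (0, 4)]

asd = average_surface_distance(pred_edge, gt_edge)
assd = average_surface_distance(pred_edge, gt_edge, symmetric=True)
print(asd, assd)
```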

class monai.metrics.SurfaceDistanceMetric(include_background=False, symmetric=False, distance_metric='euclidean', reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Surface Distance between two tensors. It supports both multi-class and multi-label tasks, and both symmetric and asymmetric surface distance calculation. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]).

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • include_background – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • symmetric – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.

  • distance_metric – one of ["euclidean", "chessboard", "taxicab"], the metric used to compute surface distance. Defaults to "euclidean".

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_average_surface_distance.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Surface dice#

monai.metrics.compute_surface_dice(y_pred, y, class_thresholds, include_background=False, distance_metric='euclidean', spacing=None, use_subvoxels=False)[source]#

This function computes the (Normalized) Surface Dice (NSD) between the two tensors y_pred (referred to as \(\hat{Y}\)) and y (referred to as \(Y\)). This metric determines which fraction of a segmentation boundary is correctly predicted. A boundary element is considered correctly predicted if the closest distance to the reference boundary is smaller than or equal to the specified threshold related to the acceptable amount of deviation in pixels. The NSD is bounded between 0 and 1.

This implementation supports multi-class tasks with an individual threshold \(\tau_c\) for each class \(c\). The class-specific NSD for batch index \(b\), \(\operatorname {NSD}_{b,c}\), is computed using the function:

(1)#\[\operatorname {NSD}_{b,c} \left(Y_{b,c}, \hat{Y}_{b,c}\right) = \frac{\left|\mathcal{D}_{Y_{b,c}}^{'}\right| + \left| \mathcal{D}_{\hat{Y}_{b,c}}^{'} \right|}{\left|\mathcal{D}_{Y_{b,c}}\right| + \left|\mathcal{D}_{\hat{Y}_{b,c}}\right|}\]

with \(\mathcal{D}_{Y_{b,c}}\) and \(\mathcal{D}_{\hat{Y}_{b,c}}\) being two sets of nearest-neighbor distances. \(\mathcal{D}_{Y_{b,c}}\) is computed from the predicted segmentation boundary towards the reference segmentation boundary and vice-versa for \(\mathcal{D}_{\hat{Y}_{b,c}}\). \(\mathcal{D}_{Y_{b,c}}^{'}\) and \(\mathcal{D}_{\hat{Y}_{b,c}}^{'}\) refer to the subsets of distances that are smaller or equal to the acceptable distance \(\tau_c\):

\[\mathcal{D}_{Y_{b,c}}^{'} = \{ d \in \mathcal{D}_{Y_{b,c}} \, | \, d \leq \tau_c \}.\]

In the case of a class neither being present in the predicted segmentation, nor in the reference segmentation, a nan value will be returned for this class. In the case of a class being present in only one of predicted segmentation or reference segmentation, the class NSD will be 0.
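Equation (1) and the two special cases can be sketched in a few lines of plain Python. The sketch starts from two precomputed lists of nearest-neighbor distances for one class and one batch element; representing an absent class by an empty list is an assumption made for this illustration, and `surface_dice` is a hypothetical helper, not the library function.

```python
import math


def surface_dice(dist_pred_to_ref, dist_ref_to_pred, threshold):
    """NSD for one class from two lists of nearest-neighbour distances."""
    if not dist_pred_to_ref and not dist_ref_to_pred:
        # Class absent in both segmentations: undefined, reported as NaN.
        return math.nan
    if not dist_pred_to_ref or not dist_ref_to_pred:
        # Class present in only one segmentation.
        return 0.0
    within = sum(d <= threshold for d in dist_pred_to_ref)
    within += sum(d <= threshold for d in dist_ref_to_pred)
    return within / (len(dist_pred_to_ref) + len(dist_ref_to_pred))


# Hypothetical boundary distances (in pixels) and a threshold of one pixel.
d_pred = [0.0, 1.0, 2.5]
d_ref = [0.0, 0.5, 1.0, 3.0]

nsd = surface_dice(d_pred, d_ref, threshold=1.0)
print(nsd)  # 5 of the 7 distances are within the threshold
```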

This implementation is based on https://arxiv.org/abs/2111.05408 and supports 2D and 3D images. The computation of boundaries follows DeepMind’s implementation deepmind/surface-distance when use_subvoxels=True; otherwise, the length of a segmentation boundary is interpreted as the number of its edge pixels.

Parameters:
  • y_pred – Predicted segmentation, typically segmentation model output. It must be a one-hot encoded, batch-first tensor [B,C,H,W] or [B,C,H,W,D].

  • y – Reference segmentation. It must be a one-hot encoded, batch-first tensor [B,C,H,W] or [B,C,H,W,D].

  • class_thresholds – List of class-specific thresholds. The thresholds relate to the acceptable amount of deviation in the segmentation boundary in pixels. Each threshold needs to be a finite, non-negative number.

  • include_background – Whether to include the surface dice computation on the first channel of the predicted output. Defaults to False.

  • distance_metric – The metric used to compute surface distances. One of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".

  • spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". If a single number, isotropic spacing with that value is used for all images in the batch. If a sequence of numbers, the length of the sequence must be equal to the image dimensions. This spacing will be used for all images in the batch. If a sequence of sequences, the length of the outer sequence must be equal to the batch size. If inner sequence has length 1, isotropic spacing with that value is used for all images in the batch, else the inner sequence length must be equal to the image dimensions. If None, spacing of unity is used for all images in batch. Defaults to None.

  • use_subvoxels – Whether to use subvoxel distances. Defaults to False.

Raises:
  • ValueError – If y_pred and/or y are not PyTorch tensors.

  • ValueError – If y_pred and/or y do not have four or five dimensions.

  • ValueError – If y_pred and y have different shapes.

  • ValueError – If y_pred and/or y are not one-hot encoded.

  • ValueError – If the number of channels of y_pred and y is different from the number of class thresholds.

  • ValueError – If any class threshold is not finite.

  • ValueError – If any class threshold is negative.

Returns:

Pytorch Tensor of shape [B,C], containing the NSD values \(\operatorname {NSD}_{b,c}\) for each batch index \(b\) and class \(c\).

class monai.metrics.SurfaceDiceMetric(class_thresholds, include_background=False, distance_metric='euclidean', reduction=MetricReduction.MEAN, get_not_nans=False, use_subvoxels=False)[source]#

Computes the Normalized Surface Dice (NSD) for each batch sample and class of predicted segmentations y_pred and corresponding reference segmentations y according to equation (1). This implementation is based on https://arxiv.org/abs/2111.05408 and supports 2D and 3D images. Be aware that by default (use_subvoxels=False), the computation of boundaries is different from DeepMind’s implementation deepmind/surface-distance. In this implementation, the length/area of a segmentation boundary is interpreted as the number of its edge pixels. In DeepMind’s implementation, the length of a segmentation boundary depends on the local neighborhood (cf. https://arxiv.org/abs/1809.04430). This issue is discussed here: Project-MONAI/MONAI#4103.

The class- and batch sample-wise NSD values can be aggregated with the function aggregate.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • class_thresholds – List of class-specific thresholds. The thresholds relate to the acceptable amount of deviation in the segmentation boundary in pixels. Each threshold needs to be a finite, non-negative number.

  • include_background – Whether to include NSD computation on the first channel of the predicted output. Defaults to False.

  • distance_metric – The metric used to compute surface distances. One of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count. Defaults to False. not_nans is the number of batch samples for which not all class-specific NSD values were nan values. If set to True, the function aggregate will return both the aggregated NSD and the not_nans count. If set to False, aggregate will only return the aggregated NSD.

  • use_subvoxels – Whether to use subvoxel distances. Defaults to False.

aggregate(reduction=None)[source]#

Aggregates the output of _compute_tensor.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Returns:

If get_not_nans is set to True, this function returns the aggregated NSD and the not_nans count. If get_not_nans is set to False, this function returns only the aggregated NSD.

PanopticQualityMetric#

monai.metrics.compute_panoptic_quality(pred, gt, metric_name='pq', remap=True, match_iou_threshold=0.5, smooth_numerator=1e-06, output_confusion_matrix=False)[source]#

Computes Panoptic Quality (PQ). If metric_name is set to “SQ” or “RQ”, Segmentation Quality (SQ) or Recognition Quality (RQ) will be returned instead.

In addition, if output_confusion_matrix is True, the function will return a tensor with shape 4, which represents the true positive, false positive, false negative and the sum of iou. These four values are used to calculate PQ, and returning them directly enables further calculation over all images.
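The way these four values combine can be sketched as follows, using the definitions from the Panoptic Segmentation paper: RQ = TP / (TP + FP/2 + FN/2), SQ = (sum of IoU) / TP, and PQ = SQ · RQ. The helper `panoptic_scores` is hypothetical, and the sketch leaves out the smoothing constant, returning 0 when a denominator is zero instead.

```python
def panoptic_scores(tp, fp, fn, iou_sum):
    """PQ, SQ and RQ from matched-instance counts and the summed IoU."""
    denom = tp + 0.5 * fp + 0.5 * fn
    rq = tp / denom if denom else 0.0   # recognition quality
    sq = iou_sum / tp if tp else 0.0    # segmentation quality
    return {"pq": sq * rq, "sq": sq, "rq": rq}


# Hypothetical counts: 3 matched instances with a total IoU of 2.4,
# one unmatched prediction and one unmatched ground-truth instance.
scores = panoptic_scores(tp=3, fp=1, fn=1, iou_sum=2.4)
print(scores)
```

Because the counts and the IoU sum are additive, they can be accumulated over many images before the final division, which is the use case for returning them directly.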

Parameters:
  • pred (Tensor) – input data to compute, it must be in the form of HW and have integer type.

  • gt (Tensor) – ground truth. It must have the same shape as pred and have integer type.

  • metric_name (str) – output metric. The value can be “pq”, “sq” or “rq”.

  • remap (bool) – whether to remap pred and gt to ensure contiguous ordering of instance id.

  • match_iou_threshold (float) – IOU threshold to determine the pairing between pred and gt. Usually, it should be >= 0.5, in which case the pairing between instances of pred and gt is unique. If match_iou_threshold < 0.5, this function uses Munkres assignment to find the maximal number of unique pairings.

  • smooth_numerator (float) – a small constant added to the numerator to avoid zero.

Raises:
  • ValueError – when pred and gt have different shapes.

  • ValueError – when match_iou_threshold <= 0.0 or > 1.0.

Return type:

Tensor

class monai.metrics.PanopticQualityMetric(num_classes, metric_name='pq', reduction=MetricReduction.MEAN_BATCH, match_iou_threshold=0.5, smooth_numerator=1e-06)[source]#

Compute Panoptic Quality between two instance segmentation masks. If metric_name is set to “SQ” or “RQ”, Segmentation Quality (SQ) or Recognition Quality (RQ) will be returned instead.

Panoptic Quality is a metric used in panoptic segmentation tasks. This task unifies the typically distinct tasks of semantic segmentation (assign a class label to each pixel) and instance segmentation (detect and segment each object instance). Compared with semantic segmentation, panoptic segmentation distinguishes different instances that belong to the same class. Compared with instance segmentation, panoptic segmentation does not allow overlap, and only one semantic label and one instance id can be assigned to each pixel. Please refer to the following paper for more details: https://openaccess.thecvf.com/content_CVPR_2019/papers/Kirillov_Panoptic_Segmentation_CVPR_2019_paper.pdf

This class also refers to the following implementation: TissueImageAnalytics/CoNIC

Parameters:
  • num_classes – number of classes. The number should not count the background.

  • metric_name – output metric. The value can be “pq”, “sq” or “rq”. Besides a single metric name, a sequence of metric names such as (“pq”, “sq”, “rq”) is also supported. If a sequence is given, a list of results in the same order as the input names will be returned.

  • reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean_batch". if “none”, will not do reduction.

  • match_iou_threshold – IOU threshold to determine the pairing between y_pred and y. Usually, it should be >= 0.5, in which case the pairing between instances of y_pred and y is unique. If match_iou_threshold < 0.5, this function uses Munkres assignment to find the maximal number of unique pairings.

  • smooth_numerator – a small constant added to the numerator to avoid zero.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_panoptic_quality.

Parameters:

reduction – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Mean squared error#

class monai.metrics.MSEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Mean Squared Error between two tensors using function:

\[\operatorname {MSE}\left(Y, \hat{Y}\right) =\frac {1}{n}\sum _{i=1}^{n}\left(y_i-\hat{y_i} \right)^{2}.\]

More info: https://en.wikipedia.org/wiki/Mean_squared_error

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • reduction – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).

Mean absolute error#

class monai.metrics.MAEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Mean Absolute Error between two tensors using function:

\[\operatorname {MAE}\left(Y, \hat{Y}\right) =\frac {1}{n}\sum _{i=1}^{n}\left|y_i-\hat{y_i}\right|.\]

More info: https://en.wikipedia.org/wiki/Mean_absolute_error

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • reduction – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).

Root mean squared error#

class monai.metrics.RMSEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Root Mean Squared Error between two tensors using function:

\[\operatorname {RMSE}\left(Y, \hat{Y}\right) ={ \sqrt{ \frac {1}{n}\sum _{i=1}^{n}\left(y_i-\hat{y_i}\right)^2 } } \ = \sqrt {\operatorname{MSE}\left(Y, \hat{Y}\right)}.\]

More info: https://en.wikipedia.org/wiki/Root-mean-square_deviation

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • reduction – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).
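The MSE, MAE and RMSE formulas can be checked by hand on a tiny example. The following plain-Python sketch works on flat lists of hypothetical values rather than tensors; the function names are illustrative and are not library functions.

```python
import math


def mse(y, y_pred):
    """Mean of the squared differences."""
    return sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)


def mae(y, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(y, y_pred)) / len(y)


def rmse(y, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mse(y, y_pred))


# Hypothetical ground truth and regression output.
y = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

print(mse(y, y_pred))   # 0.375
print(mae(y, y_pred))   # 0.5
print(rmse(y, y_pred))  # sqrt(0.375)
```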

Peak signal to noise ratio#

class monai.metrics.PSNRMetric(max_val, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Peak Signal To Noise Ratio between two tensors using function:

\[\operatorname{PSNR}\left(Y, \hat{Y}\right) = 20 \cdot \log_{10} \left({\mathit{MAX}}_Y\right) \ -10 \cdot \log_{10}\left(\operatorname{MSE\left(Y, \hat{Y}\right)}\right)\]

More info: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

Help taken from: tensorflow/tensorflow line 4139

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters:
  • max_val – The dynamic range of the images/volumes (i.e., the difference between the maximum and the minimum allowed values e.g. 255 for a uint8 image).

  • reduction – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).
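As a worked instance of the PSNR formula, the sketch below evaluates it on flat lists of hypothetical uint8-range values with max_val = 255. The function `psnr` is illustrative only, and returning infinity for identical inputs is a choice made for this sketch, not a statement about the library's behavior.

```python
import math


def psnr(y, y_pred, max_val):
    """PSNR in decibels: 20*log10(max_val) - 10*log10(MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_pred)) / len(y)
    if mse == 0:
        # Assumption of this sketch: identical inputs give infinite PSNR.
        return math.inf
    return 20 * math.log10(max_val) - 10 * math.log10(mse)


# Every value is off by 10, so the MSE is 100.
y = [0.0, 128.0, 255.0, 64.0]
y_pred = [10.0, 118.0, 245.0, 74.0]

value = psnr(y, y_pred, max_val=255.0)
print(round(value, 2))  # about 28.13 dB
```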

Structural similarity index measure#

class monai.metrics.regression.SSIMMetric(spatial_dims, data_range=1.0, kernel_type=KernelType.GAUSSIAN, win_size=11, kernel_sigma=1.5, k1=0.01, k2=0.03, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Computes the Structural Similarity Index Measure (SSIM).

\[\operatorname {SSIM}(x,y) =\frac {(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}\]

For more info, visit

https://vicuesoft.com/glossary/term/ssim-ms-ssim/

SSIM reference paper:

Wang, Zhou, et al. “Image quality assessment: from error visibility to structural similarity.” IEEE transactions on image processing 13.4 (2004): 600-612.

Parameters:
  • spatial_dims – number of spatial dimensions of the input images.

  • data_range – value range of input images. (usually 1.0 or 255)

  • kernel_type – type of kernel, can be “gaussian” or “uniform”.

  • win_size – window size of kernel

  • kernel_sigma – standard deviation for Gaussian kernel.

  • k1 – stability constant used in the luminance denominator

  • k2 – stability constant used in the contrast denominator

  • reduction – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans)
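To see how the terms of the SSIM formula fit together, the sketch below evaluates it once over a whole signal instead of over sliding windows. This is a deliberate simplification: it uses no Gaussian or uniform kernel, it uses population (divide-by-n) statistics, and it takes c1 = (k1 · data_range)² and c2 = (k2 · data_range)² as in the reference paper. The function `global_ssim` is hypothetical.

```python
def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM evaluated over a single window that covers the whole signal."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilises the luminance term
    c2 = (k2 * data_range) ** 2  # stabilises the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# Hypothetical intensities in [0, 1].
x = [0.1, 0.5, 0.9, 0.3]
x_reversed = [0.3, 0.9, 0.5, 0.1]

same = global_ssim(x, x)
different = global_ssim(x, x_reversed)
print(same, different)
```

Identical inputs give a value of 1, while the reversed signal keeps the same mean and variance but has a lower covariance, which lowers the score.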

Multi-scale structural similarity index measure#

class monai.metrics.MultiScaleSSIMMetric(spatial_dims, data_range=1.0, kernel_type=KernelType.GAUSSIAN, kernel_size=11, kernel_sigma=1.5, k1=0.01, k2=0.03, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333), reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Computes the Multi-Scale Structural Similarity Index Measure (MS-SSIM).

MS-SSIM reference paper:

Wang, Z., Simoncelli, E.P. and Bovik, A.C., 2003, November. “Multiscale structural similarity for image quality assessment.” In The Thirty-Seventh Asilomar Conference on Signals, Systems & Computers, 2003 (Vol. 2, pp. 1398-1402). IEEE

Parameters:
  • spatial_dims – number of spatial dimensions of the input images.

  • data_range – value range of input images. (usually 1.0 or 255)

  • kernel_type – type of kernel, can be “gaussian” or “uniform”.

  • kernel_size – size of kernel

  • kernel_sigma – standard deviation for Gaussian kernel.

  • k1 – stability constant used in the luminance denominator

  • k2 – stability constant used in the contrast denominator

  • weights – parameters for image similarity and contrast sensitivity at different resolution scores.

  • reduction – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction

  • get_not_nans – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans)

Fréchet Inception Distance#

monai.metrics.compute_frechet_distance(mu_x, sigma_x, mu_y, sigma_y, epsilon=1e-06)[source]#

The Frechet distance between multivariate normal distributions.

Return type:

Tensor
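For intuition, the quantity used by FID has a simple closed form when both covariance matrices are diagonal: the squared difference of the means plus, per dimension, var_x + var_y − 2·sqrt(var_x · var_y). The sketch below covers only that special case, so it needs no matrix square root; the full-covariance case is not shown. The function name is hypothetical and this is not the library routine.

```python
import math


def frechet_distance_diagonal(mu_x, var_x, mu_y, var_y):
    """Squared Frechet distance between Gaussians with diagonal covariances."""
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_x, mu_y))
    cov_term = sum(
        vx + vy - 2 * math.sqrt(vx * vy) for vx, vy in zip(var_x, var_y)
    )
    return mean_term + cov_term


# Hypothetical two-dimensional feature statistics.
distance = frechet_distance_diagonal(
    mu_x=[0.0, 0.0], var_x=[1.0, 4.0],
    mu_y=[1.0, 2.0], var_y=[1.0, 1.0],
)
print(distance)  # mean term 5 plus covariance term 1
```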

class monai.metrics.FIDMetric[source]#

Frechet Inception Distance (FID). The FID calculates the distance between two distributions of feature vectors. Based on: Heusel M. et al. “Gans trained by a two time-scale update rule converge to a local nash equilibrium.” https://arxiv.org/abs/1706.08500. The inputs for this metric should be two groups of feature vectors (with format (number images, number of features)) extracted from a pretrained network.

Originally, it was proposed to use the activations of the pool_3 layer of an Inception v3 pretrained with ImageNet. However, other networks pretrained on medical datasets can be used as well (for example, RadImageNet for 2D and MedicalNet for 3D images). If the chosen model output is not a scalar, a global spatial average pooling should be used.

Maximum Mean Discrepancy#

monai.metrics.compute_mmd(y, y_pred, y_mapping)[source]#

Parameters:
  • y – first sample (e.g., the reference image). Its shape is (B,C,W,H) for 2D data and (B,C,W,H,D) for 3D.

  • y_pred – second sample (e.g., the reconstructed image). It has the same shape as y.

  • y_mapping – Callable to transform the y tensors before computing the metric.

class monai.metrics.MMDMetric(y_mapping=None)[source]#

Unbiased Maximum Mean Discrepancy (MMD) is a kernel-based method for measuring the similarity between two distributions. It is a non-negative metric where a smaller value indicates a closer match between the two distributions.

Gretton, A., et al., 2012. A kernel two-sample test. The Journal of Machine Learning Research, 13(1), pp.723-773.

Parameters:

y_mapping – Callable to transform the y tensors before computing the metric. It is usually a Gaussian or Laplace filter, but it can be any function that takes a tensor as input and returns a tensor as output, such as a feature extractor or an identity function, e.g. y_mapping = lambda x: x.square().

Cumulative average#

class monai.metrics.CumulativeAverage[source]#

A utility class to keep track of average values. For example during training/validation loop, we need to accumulate the per-batch metrics and calculate the final average value for the whole dataset. When training in multi-gpu environment, with DistributedDataParallel, it will average across the processes.

Example:

from monai.metrics import CumulativeAverage

run_avg = CumulativeAverage()
batch_size = 8
for i in range(len(train_set)):
    ...
    val = calc_metric(x, y)  # some metric value
    run_avg.append(val, count=batch_size)

val_avg = run_avg.aggregate()  # average value

aggregate(to_numpy=True)[source]#

returns the total average value (averaged across processes)

Parameters:

to_numpy (bool) – whether to convert to numpy array. Defaults to True

Return type:

Union[ndarray, Tensor]

append(val, count=1)[source]#

Append with a new value, and an optional count. Any data type that is convertible with torch.as_tensor() is supported, e.g. number, list, numpy array, or Tensor.

Parameters:
  • val – value (e.g. number, list, numpy array or Tensor) to keep track of

  • count – count (e.g. number, list, numpy array or Tensor), to update the contribution count

For example:

# a simple constant tracking
avg = CumulativeAverage()
avg.append(0.6)
avg.append(0.8)
print(avg.aggregate())  # prints 0.7

# an array tracking, e.g. metrics from 3 classes
avg = CumulativeAverage()
avg.append([0.2, 0.4, 0.4])
avg.append([0.4, 0.6, 0.4])
print(avg.aggregate())  # prints [0.3, 0.5, 0.4]

# different contributions / counts
avg = CumulativeAverage()
avg.append(1, count=4)  # avg metric 1 coming from a batch of 4
avg.append(2, count=6)  # avg metric 2 coming from a batch of 6
print(avg.aggregate())  # prints 1.6 == (1*4 + 2*6) / (4 + 6)

# different contributions / counts
avg = CumulativeAverage()
avg.append([0.5, 0.5, 0], count=[1, 1, 0])  # last element's count is zero to ignore it
avg.append([0.5, 0.5, 0.5], count=[1, 1, 1])
print(avg.aggregate())  # prints [0.5, 0.5, 0.5] == ([0.5, 0.5, 0] + [0.5, 0.5, 0.5]) / ([1, 1, 0] + [1, 1, 1])

get_current(to_numpy=True)[source]#

returns the most recent value (averaged across processes)

Parameters:

to_numpy (bool) – whether to convert to numpy array. Defaults to True

Return type:

Union[ndarray, Tensor]

reset()[source]#

Reset all stats

Return type:

None

Metrics reloaded binary#

class monai.metrics.MetricsReloadedBinary(metric_name, include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Wraps the binary pairwise metrics of MetricsReloaded.

Parameters:
  • metric_name – Name of a binary metric from the MetricsReloaded package.

  • include_background – whether to include computation on the first channel of the predicted output. Defaults to True.

  • reduction – the reduction mode applied to the metrics. Reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", no reduction is performed.

  • get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values for the metric, so its shape equals the shape of the metric.

Example:

import torch
from monai.metrics import MetricsReloadedBinary

metric_name = "Cohens Kappa"
metric = MetricsReloadedBinary(metric_name=metric_name)

# first iteration
# shape [batch=1, channel=1, 2, 2]
y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 1.0]]]])
y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])
print(metric(y_pred, y))

# second iteration
# shape [batch=1, channel=1, 2, 2]
y_pred = torch.tensor([[[[1.0, 0.0], [0.0, 0.0]]]])
y = torch.tensor([[[[1.0, 0.0], [1.0, 1.0]]]])
print(metric(y_pred, y))

# aggregate
# shape ([batch=2, channel=1])
print(metric.aggregate(reduction="none"))  # tensor([[0.5], [0.2]])

# reset
metric.reset()
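The values in the example above can be checked by hand with the textbook Cohen's kappa formula. The sketch below is plain Python on flattened 0/1 labels and does not call MetricsReloaded; the helper name cohens_kappa is hypothetical, and the formula is the standard two-class definition rather than a copy of the package's code.

```python
def cohens_kappa(pred, truth):
    """Cohen's kappa for two equally long flat sequences of 0/1 labels."""
    n = len(truth)
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal label frequencies.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Note: undefined (division by zero) when expected agreement is exactly 1.
    return (observed - expected) / (1 - expected)


# Same data as the two iterations above, flattened row by row.
kappa_first = cohens_kappa([1, 0, 0, 1], [1, 0, 1, 1])
kappa_second = cohens_kappa([1, 0, 0, 0], [1, 0, 1, 1])
print(kappa_first, kappa_second)
```

For the first iteration the observed agreement is 3/4 and the chance agreement is 1/2, giving (0.75 - 0.5) / 0.5 = 0.5; for the second they are 1/2 and 3/8, giving 0.2.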

Metrics reloaded categorical#

class monai.metrics.MetricsReloadedCategorical(metric_name, include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, smooth_dr=1e-05)[source]#

Wraps the categorical pairwise metrics of MetricsReloaded.

Parameters:
  • metric_name – Name of a categorical metric from the MetricsReloaded package.

  • include_background – whether to include computation on the first channel of the predicted output. Defaults to True.

  • reduction – the reduction mode applied to the metrics. Reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", no reduction is performed.

  • get_not_nans – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values for the metric, so its shape equals the shape of the metric.

  • smooth_dr – a small constant added to the denominator to avoid nan. Note: should be greater than zero.

Example:

import torch
from monai.metrics import MetricsReloadedCategorical

metric_name = "Weighted Cohens Kappa"
metric = MetricsReloadedCategorical(metric_name=metric_name)

# first iteration
# shape [batch=1, channel=3, 2, 2]
y_pred = torch.tensor([[[[0, 0], [0, 1]], [[0, 0], [0, 0]], [[1, 1], [1, 0]]]])
y = torch.tensor([[[[1, 0], [0, 1]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]]])
print(metric(y_pred, y))

# second iteration
# shape [batch=1, channel=3, 2, 2]
y_pred = torch.tensor([[[[1, 0], [0, 1]], [[0, 1], [1, 0]], [[0, 0], [0, 0]]]])
y = torch.tensor([[[[1, 0], [0, 1]], [[0, 1], [0, 0]], [[0, 0], [1, 0]]]])
print(metric(y_pred, y))

# aggregate
# shape ([batch=2, channel=1])
print(metric.aggregate(reduction="none"))  # tensor([[0.2727], [0.6000]])

# reset
metric.reset()

Utilities#

monai.metrics.utils.do_metric_reduction(f, reduction=MetricReduction.MEAN)[source]#

This function performs the metric reduction over the non-NaN metric values computed for each class of each sample. It also returns not_nans, which counts the non-NaN values for the metric.

Parameters:
  • f – a tensor that contains the calculated metric scores per batch and per class. The first two dims should be batch and class.

  • reduction – the mode used to reduce the metrics. Reduction is applied only to non-NaN values. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}. Defaults to "mean". If "none", the input f tensor and not_nans are returned.

Raises:

ValueError – When reduction is not one of ["mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel", "none"].
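As a rough illustration of a NaN-aware reduction, the sketch below averages over the batch dimension for each class while skipping NaN entries, and reports how many entries were counted. It is plain Python on nested lists, not MONAI's tensor code; the helper name mean_batch_ignoring_nan is hypothetical, and returning 0.0 for a class with no valid entries is an assumption of the sketch.

```python
import math


def mean_batch_ignoring_nan(scores):
    """Average over the batch dimension per class, skipping NaN entries.

    scores is a nested list indexed as [batch][class].
    Returns (means, not_nans), each with one entry per class.
    """
    num_classes = len(scores[0])
    means, not_nans = [], []
    for c in range(num_classes):
        valid = [row[c] for row in scores if not math.isnan(row[c])]
        not_nans.append(len(valid))
        # Assumption of this sketch: 0.0 when a class has no valid entries.
        means.append(sum(valid) / len(valid) if valid else 0.0)
    return means, not_nans


nan = float("nan")
scores = [[0.8, nan], [0.6, 0.4], [nan, 0.2]]  # 3 samples, 2 classes
means, not_nans = mean_batch_ignoring_nan(scores)
print(means, not_nans)
```

The not_nans counts tell a caller how many samples actually contributed to each reduced value, which is useful when some samples lack a given class.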

monai.metrics.utils.get_code_to_measure_table(spacing, device=None)[source]#

Returns a table mapping neighbourhood code to the surface area or contour length.

Parameters:
  • spacing – a sequence of 2 or 3 numbers, indicating the spacing in the spatial dimensions.

  • device – device to put the table on.

monai.metrics.utils.get_mask_edges(seg_pred, seg_gt, label_idx=1, crop=True, spacing=None, always_return_as_numpy=True)[source]#

Compute edges from binary segmentation masks. This function is helpful to further calculate metrics such as Average Surface Distance and Hausdorff Distance. The input images can be binary or labelfield images. If labelfield images are supplied, they are converted to binary images using label_idx.

To improve computational efficiency, the images are cropped to keep only the foreground before the edges are computed, unless crop=False is specified.

We require that images are the same size, and assume that they occupy the same space (spacing, orientation, etc.).

Parameters:
  • seg_pred – the predicted binary or labelfield image.

  • seg_gt – the actual binary or labelfield image.

  • label_idx – for labelfield images, convert to binary with seg_pred = seg_pred == label_idx.

  • crop – crop input images and only keep the foregrounds. In order to maintain two inputs’ shapes, here the bounding box is achieved by (seg_pred | seg_gt) which represents the union set of two images. Defaults to True.

  • spacing – the input spacing. If not None, the subvoxel edges and areas will be computed. Otherwise, scipy’s binary erosion is used to calculate the edges.

  • always_return_as_numpy – whether to return a numpy array regardless of the input type. If False, return the same type as the inputs.
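The erosion-based edge definition (mask XOR eroded mask) can be sketched without scipy for a small 2D mask. This is a minimal illustration assuming a 4-connected neighbourhood and treating pixels outside the image as background; the helper name binary_edges is hypothetical and the code is not MONAI's implementation.

```python
def binary_edges(mask):
    """Return a boolean grid marking foreground pixels removed by erosion.

    A foreground pixel survives erosion only if all four of its 4-connected
    neighbours are foreground; pixels outside the image count as background.
    """
    rows, cols = len(mask), len(mask[0])

    def is_foreground(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue  # background pixels are never edges
            neighbours = ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
            survives = all(is_foreground(nr, nc) for nr, nc in neighbours)
            edges[r][c] = not survives  # edge = mask minus eroded mask
    return edges


# A 3x3 square of foreground inside a 5x5 image.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
edges = binary_edges(mask)
num_edge_pixels = sum(cell for row in edges for cell in row)
print(num_edge_pixels)
```

Only the centre pixel of the square has four foreground neighbours, so the eight pixels around it form the edge.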

monai.metrics.utils.get_surface_distance(seg_pred, seg_gt, distance_metric='euclidean', spacing=None)[source]#

This function is used to compute the surface distances from seg_pred to seg_gt.

Parameters:
  • seg_pred – the edge of the predictions.

  • seg_gt – the edge of the ground truth.

  • distance_metric – one of ["euclidean", "chessboard", "taxicab"], the metric used to compute surface distance. Defaults to "euclidean".

    • "euclidean", uses Exact Euclidean distance transform.

    • "chessboard", uses chessboard metric in chamfer type of transform.

    • "taxicab", uses taxicab metric in chamfer type of transform.

  • spacing – spacing of pixel (or voxel). This parameter is relevant only if distance_metric is set to "euclidean". Several input options are allowed: (1) If a single number, isotropic spacing with that value is used. (2) If a sequence of numbers, the length of the sequence must be equal to the image dimensions. (3) If None, spacing of unity is used. Defaults to None.

Note

If seg_pred or seg_gt is all 0, may result in nan/inf distance.
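To make the three distance metrics concrete, the sketch below computes, by brute force, the distance from each predicted edge point to its nearest ground-truth edge point. It works on lists of coordinate tuples rather than edge masks and uses no distance transform, so it only illustrates the metric definitions; the helper names point_distance and surface_distances are hypothetical. Following the parameter description above, spacing is applied only for "euclidean".

```python
import math


def point_distance(p, q, metric="euclidean", spacing=None):
    """Distance between two grid points under the chosen metric."""
    spacing = spacing or [1.0] * len(p)  # unit spacing when None
    diffs = [abs(a - b) for a, b in zip(p, q)]
    if metric == "euclidean":
        return math.sqrt(sum((d * s) ** 2 for d, s in zip(diffs, spacing)))
    if metric == "chessboard":
        return float(max(diffs))  # largest per-axis step count
    if metric == "taxicab":
        return float(sum(diffs))  # sum of per-axis step counts
    raise ValueError(f"unsupported distance_metric: {metric}")


def surface_distances(pred_points, gt_points, metric="euclidean", spacing=None):
    """For each predicted edge point, the distance to the nearest GT edge point."""
    return [
        # An empty ground truth yields inf here (an assumption of this sketch).
        min((point_distance(p, q, metric, spacing) for q in gt_points), default=math.inf)
        for p in pred_points
    ]


pred = [(0, 0)]
gt = [(3, 4), (10, 10)]
d_euclidean = surface_distances(pred, gt, "euclidean")
d_chessboard = surface_distances(pred, gt, "chessboard")
d_taxicab = surface_distances(pred, gt, "taxicab")
print(d_euclidean, d_chessboard, d_taxicab)
```

The same pair of points gives 5.0, 4.0 and 7.0 under the three metrics, which shows how much the choice of distance_metric can change a surface-distance score.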

monai.metrics.utils.ignore_background(y_pred, y)[source]#

This function is used to remove background (the first channel) for y_pred and y.

Parameters:
  • y_pred (~NdarrayTensor) – predictions. For classification tasks, y_pred should have the shape [BN], where N is larger than 1. For segmentation tasks, the shape should be [BNHW] or [BNHWD].

  • y (~NdarrayTensor) – ground truth, the first dim is batch.

Return type:

tuple[~NdarrayTensor, ~NdarrayTensor]
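Removing the background amounts to dropping index 0 along the channel axis of both inputs. The sketch below shows this on nested Python lists instead of tensors or arrays; the helper name drop_background is hypothetical and the code is an illustration, not the library function.

```python
def drop_background(batch):
    """Remove channel 0 from every sample of a nested list indexed [B][N]..."""
    return [sample[1:] for sample in batch]


# One sample with three channels, each of spatial shape 1 x 2.
y_pred = [[[[0, 1]], [[1, 0]], [[0, 0]]]]
y = [[[[1, 0]], [[0, 1]], [[0, 0]]]]
y_pred_fg = drop_background(y_pred)
y_fg = drop_background(y)
print(len(y_pred_fg[0]), len(y_fg[0]))  # channels remaining per sample
```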

monai.metrics.utils.is_binary_tensor(input, name)[source]#

Determines whether the input tensor is a binary torch tensor.

Parameters:
  • input (torch.Tensor) – tensor to validate.

  • name (str) – name of the tensor being checked.

Raises:

ValueError – if input is not a PyTorch Tensor.

Note

A warning message is printed, if the tensor is not binary.

Return type:

None

monai.metrics.utils.prepare_spacing(spacing, batch_size, img_dim)[source]#

This function is used to prepare the spacing parameter to include batch dimension for the computation of surface distance, hausdorff distance or surface dice.

An example with batch_size = 4 and img_dim = 3:

input spacing = None -> output spacing = [None, None, None, None]

input spacing = 0.8 -> output spacing = [0.8, 0.8, 0.8, 0.8]

input spacing = [0.8, 0.5, 0.9] -> output spacing = [[0.8, 0.5, 0.9], [0.8, 0.5, 0.9], [0.8, 0.5, 0.9], [0.8, 0.5, 0.9]]

input spacing = [0.8, 0.7, 1.2, 0.8] -> output spacing = [0.8, 0.7, 1.2, 0.8] (same as input)

An example with batch_size = 3 and img_dim = 3:

input spacing = [0.8, 0.5, 0.9] -> output spacing = [[0.8, 0.5, 0.9], [0.8, 0.5, 0.9], [0.8, 0.5, 0.9]]

Parameters:
  • spacing – can be a float, a sequence of length img_dim, or a sequence with length batch_size that includes floats or sequences of length img_dim.

Raises:
  • ValueError – when spacing is a sequence of sequences, where the outer sequence length does not equal batch_size or an inner sequence length does not equal img_dim.

Returns:

spacing: a sequence with length batch_size that includes integers, floats or sequences of length img_dim.
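The documented input and output pairs for batch_size = 4 and img_dim = 3 can be reproduced with a small plain-Python sketch. The helper name broadcast_spacing is hypothetical, and the branch order (a flat list of length img_dim is read as per-axis spacing before a length of batch_size is considered) is an assumption inferred from the examples, not a copy of the library code.

```python
from numbers import Real


def broadcast_spacing(spacing, batch_size, img_dim):
    """Hypothetical helper: expand spacing to one entry per batch item."""
    # None or a single number: the same spacing for every batch item.
    if spacing is None or isinstance(spacing, Real):
        return [spacing] * batch_size
    items = list(spacing)
    if all(isinstance(s, Real) for s in items):
        # Flat list of numbers. A length of img_dim is read as per-axis
        # spacing shared by the whole batch; this check comes first, so it
        # also wins when img_dim == batch_size (assumption of this sketch).
        if len(items) == img_dim:
            return [list(items) for _ in range(batch_size)]
        if len(items) == batch_size:
            return items  # one isotropic spacing per batch item
        raise ValueError("flat spacing must have length img_dim or batch_size")
    if any(s is None or isinstance(s, Real) for s in items):
        raise ValueError("spacing mixes numbers and sequences")
    # Sequence of sequences: check the outer and inner lengths.
    if len(items) != batch_size or any(len(s) != img_dim for s in items):
        raise ValueError("expected batch_size sequences of length img_dim")
    return [list(s) for s in items]


print(broadcast_spacing(None, 4, 3))
print(broadcast_spacing(0.8, 4, 3))
print(broadcast_spacing([0.8, 0.5, 0.9], 4, 3))
print(broadcast_spacing([0.8, 0.7, 1.2, 0.8], 4, 3))
```

The result always has one entry per batch item, so downstream code can pair each image with its own spacing.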

monai.metrics.utils.remap_instance_id(pred, by_size=False)[source]#

This function is used to relabel all instance ids of pred so that the ids are contiguous. For example, ids [0, 2, 5] are remapped to [0, 1, 2]. This function is helpful for calculating metrics like Panoptic Quality (PQ). The implementation is based on:

vqdang/hover_net

Parameters:
  • pred (Tensor) – segmentation predictions in the form of torch tensor. Each value of the tensor should be an integer, and represents the prediction of its corresponding instance id.

  • by_size (bool) – if True, larger instances are assigned smaller ids.

Return type:

Tensor
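The remapping can be sketched in plain Python on a nested list of integer ids. This is an illustration only: the helper name remap_ids is hypothetical, id 0 is assumed to be background and kept as 0, and ties in instance size are broken by the original id.

```python
from collections import Counter


def remap_ids(pred, by_size=False):
    """Relabel nonzero ids in a 2D nested list so they run 1, 2, 3, ..."""
    # Pixel count per instance; 0 is treated as background (assumption).
    sizes = Counter(v for row in pred for v in row if v != 0)
    ids = sorted(sizes)  # ascending original ids
    if by_size:
        # Largest instance first; the sort is stable, so ties keep id order.
        ids = sorted(ids, key=lambda i: sizes[i], reverse=True)
    mapping = {old: new for new, old in enumerate(ids, start=1)}
    mapping[0] = 0  # background stays 0
    return [[mapping[v] for v in row] for row in pred]


pred = [
    [0, 2, 0],
    [5, 5, 5],
    [0, 0, 2],
]
remapped = remap_ids(pred)
remapped_by_size = remap_ids(pred, by_size=True)
print(remapped)
print(remapped_by_size)
```

With by_size=False the ids 2 and 5 become 1 and 2; with by_size=True the larger instance (original id 5, three pixels) receives id 1.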