Metrics#

FROC#

monai.metrics.compute_fp_tp_probs(probs, y_coord, x_coord, evaluation_mask, labels_to_exclude=None, resolution_level=0)[source]#

This function is modified from the official evaluation code of the CAMELYON 16 Challenge and is used to distinguish true positive from false positive predictions. A prediction counts as a true positive when its detection point lies within an annotated ground truth region.

Parameters
  • probs (Union[ndarray, Tensor]) – an array with shape (n,) that represents the probabilities of the detections, where n is the number of predicted detections.

  • y_coord (Union[ndarray, Tensor]) – an array with shape (n,) that represents the Y-coordinates of the detections.

  • x_coord (Union[ndarray, Tensor]) – an array with shape (n,) that represents the X-coordinates of the detections.

  • evaluation_mask (Union[ndarray, Tensor]) – the ground truth mask for evaluation.

  • labels_to_exclude (Optional[List]) – labels in this list will not be counted for metric calculation.

  • resolution_level (int) – the level at which the evaluation mask is made.

Returns
  • fp_probs – an array that contains the probabilities of the false positive detections.

  • tp_probs – an array that contains the probabilities of the true positive detections.

  • num_targets – the total number of targets (excluding labels_to_exclude) for all images under evaluation.
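The true/false positive rule described above can be sketched in plain Python, independent of MONAI. This is only an illustration: the helper name `split_fp_tp` is hypothetical, the mask is assumed to hold integer target labels with 0 as background, and the choice to keep the highest-probability hit per target is an assumption rather than something stated in this reference. Resolution scaling is left out.

```python
def split_fp_tp(probs, y_coord, x_coord, mask, labels_to_exclude=()):
    """Split detections into false/true positive probabilities (illustrative only)."""
    excluded = set(labels_to_exclude)
    fp_probs = []
    best_hit = {}  # target label -> highest probability that landed inside it
    for p, y, x in zip(probs, y_coord, x_coord):
        label = mask[y][x]
        if label == 0:
            fp_probs.append(p)  # detection falls outside every annotated region
        elif label not in excluded:
            best_hit[label] = max(p, best_hit.get(label, 0.0))
    # Every non-background, non-excluded label in the mask is a target.
    targets = sorted({v for row in mask for v in row if v != 0 and v not in excluded})
    tp_probs = [best_hit.get(t, 0.0) for t in targets]  # 0.0 marks a missed target
    return fp_probs, tp_probs, len(targets)


mask = [[0, 1],
        [2, 0]]
fp, tp, n = split_fp_tp([0.9, 0.3, 0.8, 0.5], [0, 0, 0, 1], [1, 1, 0, 1], mask)
print(fp, tp, n)  # [0.8, 0.5] [0.9, 0.0] 2
```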

monai.metrics.compute_froc_curve_data(fp_probs, tp_probs, num_targets, num_images)[source]#

This function is modified from the official evaluation code of CAMELYON 16 Challenge, and used to compute the required data for plotting the Free Response Operating Characteristic (FROC) curve.

Parameters
  • fp_probs (Union[ndarray, Tensor]) – an array that contains the probabilities of the false positive detections for all images under evaluation.

  • tp_probs (Union[ndarray, Tensor]) – an array that contains the probabilities of the True positive detections for all images under evaluation.

  • num_targets (int) – the total number of targets (excluding labels_to_exclude) for all images under evaluation.

  • num_images (int) – the number of images under evaluation.
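As a rough illustration of the data behind an FROC curve, the sketch below sweeps every observed probability as a threshold and records the average false positives per image and the sensitivity at each one. It is plain Python rather than the MONAI implementation; the function name `froc_curve` and the exact threshold handling are assumptions.

```python
def froc_curve(fp_probs, tp_probs, num_targets, num_images):
    """Return (thresholds, fps_per_image, sensitivity) for a threshold sweep."""
    thresholds = sorted(set(fp_probs) | set(tp_probs))
    fps_per_image, sensitivity = [], []
    for t in thresholds:
        # Count detections that survive the threshold t.
        fps_per_image.append(sum(p >= t for p in fp_probs) / num_images)
        sensitivity.append(sum(p >= t for p in tp_probs) / num_targets)
    return thresholds, fps_per_image, sensitivity


thresholds, fps, sens = froc_curve([0.2, 0.6], [0.4, 0.9], num_targets=4, num_images=2)
print(fps)   # [1.0, 0.5, 0.5, 0.0]
print(sens)  # [0.5, 0.5, 0.25, 0.25]
```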

monai.metrics.compute_froc_score(fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8))[source]#

This function is modified from the official evaluation code of CAMELYON 16 Challenge, and used to compute the challenge’s second evaluation metric, which is defined as the average sensitivity at the predefined false positive rates per whole slide image.

Parameters
  • fps_per_image (ndarray) – the average number of false positives per image for different thresholds.

  • total_sensitivity (ndarray) – sensitivities (true positive rates) for different thresholds.

  • eval_thresholds (Tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8) which is the same as the CAMELYON 16 Challenge.
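The "average sensitivity at predefined false positive rates" can be sketched as a linear interpolation of the curve at each evaluation threshold, followed by a mean. The code below is a plain-Python sketch under that assumption; `interpolate` and `froc_score` are illustrative names, and values outside the sampled range are simply clamped to the nearest end point.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; xs must be sorted in ascending order."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def froc_score(fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)):
    """Mean sensitivity at the requested false-positives-per-image rates."""
    pairs = sorted(zip(fps_per_image, total_sensitivity))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    values = [interpolate(t, xs, ys) for t in eval_thresholds]
    return sum(values) / len(values)


score = froc_score([0.0, 1.0, 2.0, 8.0], [0.0, 0.5, 0.6, 0.9])
print(round(score, 4))  # 0.5125
```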

Metric#

class monai.metrics.Metric[source]#

Base class for metric computation for evaluating the performance of a model. __call__ is designed to execute the computation.

Variance#

monai.metrics.compute_variance(y_pred, include_background=True, spatial_map=False, scalar_reduction='mean', threshold=0.0005)[source]#

Parameters
  • y_pred (Tensor) – [N, C, H, W, D] or [N, C, H, W] or [N, C, H] where N is repeats, C is channels and H, W, D stand for Height, Width & Depth

  • include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • spatial_map (bool) – Boolean, if set to True, a spatial map of variance matching the input image dimensions will be returned

  • scalar_reduction (str) – reduction type of the metric, either ‘sum’ or ‘mean’ can be used

  • threshold (float) – To avoid NaNs, zeros are replaced with this threshold

Returns

A single scalar uncertainty/variance value or the spatial map of uncertainty/variance

class monai.metrics.VarianceMetric(include_background=True, spatial_map=False, scalar_reduction='sum', threshold=0.0005)[source]#

Compute the Variance of a given T-repeats N-dimensional array/tensor. The primary usage is as an uncertainty-based metric for Active Learning.

It can return the spatial variance/uncertainty map based on user choice or a single scalar value via mean/sum of the variance for scoring purposes.

Parameters
  • include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • spatial_map (bool) – Boolean, if set to True, a spatial map of variance matching the input image dimensions will be returned

  • scalar_reduction (str) – reduction type of the metric, either ‘sum’ or ‘mean’ can be used

  • threshold (float) – To avoid NaNs, zeros are replaced with this threshold
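The idea of scoring uncertainty as variance across repeated predictions can be illustrated without MONAI. In the sketch below each repeat is a flattened list of predicted values; the variance is taken per position across repeats and then reduced. The function name `repeat_variance` is hypothetical, the use of population variance is an assumption, and the zero-replacement threshold is omitted.

```python
from statistics import pvariance


def repeat_variance(repeats, scalar_reduction="mean"):
    """Variance across repeats for each position, reduced to a single scalar."""
    # zip(*repeats) groups the values predicted for the same position.
    per_position = [pvariance(values) for values in zip(*repeats)]
    if scalar_reduction == "mean":
        return sum(per_position) / len(per_position)
    if scalar_reduction == "sum":
        return sum(per_position)
    raise ValueError("scalar_reduction must be 'mean' or 'sum'")


repeats = [[0.0, 1.0],
           [1.0, 1.0]]
print(repeat_variance(repeats, "mean"))  # 0.125
print(repeat_variance(repeats, "sum"))   # 0.25
```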

LabelQualityScore#

monai.metrics.label_quality_score(y_pred, y, include_background=True, scalar_reduction='mean')[source]#

This score assumes that the DL model makes better predictions than the provided labels, so the difference between prediction and label can be treated as a label quality score.

Parameters
  • y_pred (Tensor) – Input data of dimension [B, C, H, W, D] or [B, C, H, W] or [B, C, H] where B is Batch-size, C is channels and H, W, D stand for Height, Width & Depth

  • y (Tensor) – Ground Truth of dimension [B, C, H, W, D] or [B, C, H, W] or [B, C, H] where B is Batch-size, C is channels and H, W, D stand for Height, Width & Depth

  • include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • scalar_reduction (str) – reduction type of the metric, either ‘sum’ or ‘mean’ can be used to retrieve a single scalar value, if set to ‘none’ a spatial map will be returned

Returns

A single scalar absolute difference value as score with a reduction based on sum/mean or the spatial map of absolute difference
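A minimal plain-Python sketch of the absolute-difference score described above, operating on flattened lists instead of tensors. The name `label_quality` is illustrative and the background-exclusion option is left out.

```python
def label_quality(y_pred, y, scalar_reduction="mean"):
    """Absolute difference between prediction and label, optionally reduced."""
    diffs = [abs(p - t) for p, t in zip(y_pred, y)]
    if scalar_reduction == "none":
        return diffs  # element-wise map, no reduction
    if scalar_reduction == "sum":
        return sum(diffs)
    if scalar_reduction == "mean":
        return sum(diffs) / len(diffs)
    raise ValueError("scalar_reduction must be 'sum', 'mean' or 'none'")


print(round(label_quality([0.9, 0.2], [1, 0]), 6))         # 0.15
print(round(label_quality([0.9, 0.2], [1, 0], "sum"), 6))  # 0.3
```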

class monai.metrics.LabelQualityScore(include_background=True, scalar_reduction='sum')[source]#

This score assumes that the DL model makes better predictions than the provided labels, so the difference between prediction and label can be treated as a label quality score.

It can be combined with variance/uncertainty in active learning frameworks to factor in the quality of the label along with uncertainty.

Parameters
  • include_background (bool) – Whether to include the background of the spatial image or channel 0 of the 1-D vector

  • scalar_reduction (str) – reduction type of the metric, either ‘sum’ or ‘mean’ can be used

IterationMetric#

class monai.metrics.IterationMetric[source]#

Base class for metrics computation at the iteration level, that is, on a mini-batch of samples, usually using the model outcome of one iteration.

__call__ is designed to handle y_pred and y (optional) in torch tensors or a list/tuple of tensors.

Subclasses typically implement the _compute_tensor function for the actual tensor computation logic.

Cumulative#

class monai.metrics.Cumulative[source]#

Utility class for the typical cumulative computation process based on PyTorch Tensors. It provides interfaces to accumulate values in the local buffers, synchronize buffers across distributed nodes, and aggregate the buffered values.

In multi-processing, PyTorch programs usually distribute data to multiple nodes. Each node runs with a subset of the data and adds values to its local buffers. Calling get_buffer gathers the results from all nodes, and aggregate can then process them to generate the final outcome.

Users can implement their own aggregate method to handle the results, using get_buffer to get the buffered contents.

Note: the data list should have the same length on every call to append() or extend() within a round; buffers are created automatically according to the length of the data list.

Typically, this class is expected to execute the following steps:

from monai.metrics import Cumulative

c = Cumulative()
c.append(1)  # adds a value
c.extend([2, 3])  # adds a batch of values
c.extend([4, 5, 6])  # adds a batch of values
print(c.get_buffer())  # tensor([1, 2, 3, 4, 5, 6])
print(len(c))  # 6
c.reset()
print(len(c))  # 0

The following is an example of maintaining two internal buffers:

from monai.metrics import Cumulative

c = Cumulative()
c.append(1, 2)  # adds a value to two buffers respectively
c.extend([3, 4], [5, 6])  # adds batches of values
print(c.get_buffer())  # [tensor([1, 3, 4]), tensor([2, 5, 6])]
print(len(c))

The following is an example of extending with variable length data:

import torch
from monai.metrics import Cumulative

c = Cumulative()
c.extend(torch.zeros((8, 2)), torch.zeros((6, 2)))  # adds batches
c.append(torch.zeros((2, )))  # adds a value
print(c.get_buffer())  # [torch.zeros((9, 2)), torch.zeros((6, 2))]
print(len(c))

__init__()[source]#

Initialize the internal buffers. self._buffers are local buffers, they are not usually used directly. self._sync_buffers are the buffers with all the results across all the nodes.

abstract aggregate(*args, **kwargs)[source]#

Aggregate final results based on the gathered buffers. This method is expected to use get_buffer to gather the local buffer contents.

append(*data)[source]#

Add samples to the local cumulative buffers. A buffer will be allocated for each data item. Compared with self.extend, this method adds a single sample (instead of a “batch”) to the local buffers.

Parameters

data – each item will be converted into a torch tensor. they will be stacked at the 0-th dim with a new dimension when get_buffer() is called.

Return type

None

extend(*data)[source]#

Extend the local buffers with new (“batch-first”) data. A buffer will be allocated for each data item. Compared with self.append, this method adds a “batch” of data to the local buffers.

Parameters

data – each item can be a “batch-first” tensor or a list of “channel-first” tensors. they will be concatenated at the 0-th dimension when get_buffer() is called.

Return type

None

get_buffer()[source]#

Get the synchronized list of buffers. A typical usage is to generate the metrics report based on the raw metric details. Each buffer is a PyTorch Tensor.

reset()[source]#

Reset the buffers for cumulative tensors and the synced results.

CumulativeIterationMetric#

class monai.metrics.CumulativeIterationMetric[source]#

Base class of cumulative metric which collects metrics on each mini-batch data at the iteration level.

Typically, it computes some intermediate results for each iteration and adds them to the buffers; the buffer contents can then be gathered and aggregated for the final result when the epoch completes.

For example, DiceMetric inherits this class and the usage is as follows:

dice_metric = DiceMetric(include_background=True, reduction="mean")

for val_data in val_loader:
    val_outputs = model(val_data["img"])
    val_outputs = [postprocessing_transform(i) for i in decollate_batch(val_outputs)]
    # compute metric for current iteration
    dice_metric(y_pred=val_outputs, y=val_data["seg"])  # callable to add metric to the buffer

# aggregate the final mean dice result
metric = dice_metric.aggregate().item()

# reset the status for next computation round
dice_metric.reset()

To load predictions and labels from files and then compute metrics with multi-processing, please refer to: https://github.com/Project-MONAI/tutorials/blob/master/modules/compute_metric.py.

Mean Dice#

monai.metrics.compute_meandice(y_pred, y, include_background=True, ignore_empty=True)[source]#

Computes Dice score metric from full size Tensor and collects average.

Parameters
  • y_pred (Tensor) – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y (Tensor) – ground truth to compute mean dice metric. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background (bool) – whether to include Dice computation on the first channel of the predicted output. Defaults to True.

  • ignore_empty (bool) – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

Return type

Tensor

Returns

Dice scores per batch and per class, (shape [batch_size, num_classes]).

Raises

ValueError – when y_pred and y have different shapes.
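For a single channel, the Dice score and the empty ground truth rule described for ignore_empty can be sketched in plain Python. This is an illustration on flattened binary lists, not the library code; the function name `dice_score` is hypothetical, and returning 0.0 when only the ground truth is empty is an assumption that follows from the Dice formula.

```python
import math


def dice_score(pred, truth, ignore_empty=True):
    """Dice = 2 * |pred AND truth| / (|pred| + |truth|) for binary 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    pred_sum, truth_sum = sum(pred), sum(truth)
    if truth_sum == 0:
        if ignore_empty:
            return math.nan  # empty ground truth is ignored
        return 1.0 if pred_sum == 0 else 0.0
    return 2.0 * intersection / (pred_sum + truth_sum)


print(round(dice_score([1, 1, 0, 0], [1, 0, 0, 0]), 4))          # 0.6667
print(dice_score([0, 0, 0, 0], [0, 0, 0, 0], ignore_empty=False))  # 1.0
```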

class monai.metrics.DiceMetric(include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, ignore_empty=True)[source]#

Compute average Dice score between two tensors. It supports both multi-class and multi-label tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size, they can get overwhelmed by the signal from the background. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]).

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • include_background (bool) – whether to include Dice computation on the first channel of the predicted output. Defaults to True.

  • reduction (Union[MetricReduction, str]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

  • ignore_empty (bool) – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_meandice.

Parameters

reduction (Union[MetricReduction, str, None]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Mean IoU#

monai.metrics.compute_meaniou(y_pred, y, include_background=True, ignore_empty=True)[source]#

Computes IoU score metric from full size Tensor and collects average.

Parameters
  • y_pred (Tensor) – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y (Tensor) – ground truth to compute mean IoU metric. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background (bool) – whether to include IoU computation on the first channel of the predicted output. Defaults to True.

  • ignore_empty (bool) – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

Return type

Tensor

Returns

IoU scores per batch and per class, (shape [batch_size, num_classes]).

Raises

ValueError – when y_pred and y have different shapes.
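IoU for one channel is the intersection divided by the union of the two binary masks. The plain-Python sketch below shows this on flattened 0/1 lists; the name `iou_score` is illustrative, and the handling of empty ground truth mirrors the ignore_empty description above as an assumption.

```python
import math


def iou_score(pred, truth, ignore_empty=True):
    """IoU = |pred AND truth| / |pred OR truth| for binary 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    union = sum(pred) + sum(truth) - intersection
    if sum(truth) == 0:
        if ignore_empty:
            return math.nan  # empty ground truth is ignored
        return 1.0 if sum(pred) == 0 else 0.0
    return intersection / union


print(iou_score([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```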

class monai.metrics.MeanIoU(include_background=True, reduction=MetricReduction.MEAN, get_not_nans=False, ignore_empty=True)[source]#

Compute average IoU score between two tensors. It can support both multi-classes and multi-labels tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False to exclude the first category (channel index 0) which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size they can get overwhelmed by the signal from the background. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]).

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • include_background (bool) – whether to include IoU computation on the first channel of the predicted output. Defaults to True.

  • reduction (Union[MetricReduction, str]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans). Here not_nans count the number of not nans for the metric, thus its shape equals to the shape of the metric.

  • ignore_empty (bool) – whether to ignore empty ground truth cases during calculation. If True, NaN value will be set for empty ground truth cases. If False, 1 will be set if the predictions of empty ground truth cases are also empty.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_meaniou.

Parameters

reduction (Union[MetricReduction, str, None]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Generalized Dice Score#

monai.metrics.compute_generalized_dice(y_pred, y, include_background=True, weight_type=Weight.SQUARE)[source]#

Computes the Generalized Dice Score and returns a tensor with its per image values.

Parameters
  • y_pred (torch.Tensor) – binarized segmentation model output. It should be binarized, in one-hot format and in the NCHW[D] format, where N is the batch dimension, C is the channel dimension, and the remaining are the spatial dimensions.

  • y (torch.Tensor) – binarized ground-truth. It should be binarized, in one-hot format and have the same shape as y_pred.

  • include_background (bool, optional) – whether to include score computation on the first channel of the predicted output. Defaults to True.

  • weight_type (Union[Weight, str], optional) – {"square", "simple", "uniform"}. Type of function to transform ground truth volume into a weight factor. Defaults to "square".

Returns

per batch and per class Generalized Dice Score, i.e., with the shape [batch_size, num_classes].

Return type

torch.Tensor

Raises

ValueError – if y_pred or y are not PyTorch tensors, if y_pred and y have less than three dimensions, or y_pred and y don’t have the same shape.
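The role of weight_type can be illustrated with a small plain-Python sketch of the Generalized Dice formulation from Sudre et al., where each class is weighted by a function of its ground truth volume. The weight formulas used here (1/volume² for "square", 1/volume for "simple", 1 for "uniform") and the helper names are assumptions for illustration; the sketch does not handle classes with empty ground truth and is not the MONAI implementation.

```python
def class_weight(volume, weight_type="square"):
    """Map a ground truth volume to a class weight (assumed formulas)."""
    if weight_type == "uniform":
        return 1.0
    if volume == 0:
        raise ValueError("this sketch does not handle empty ground truth classes")
    if weight_type == "simple":
        return 1.0 / volume
    if weight_type == "square":
        return 1.0 / volume ** 2
    raise ValueError("weight_type must be 'square', 'simple' or 'uniform'")


def generalized_dice(pred, truth, weight_type="square"):
    """pred and truth are lists of per-class flattened binary masks for one image."""
    numerator, denominator = 0.0, 0.0
    for p_c, t_c in zip(pred, truth):
        w = class_weight(sum(t_c), weight_type)
        numerator += w * sum(p * t for p, t in zip(p_c, t_c))
        denominator += w * (sum(p_c) + sum(t_c))
    return 2.0 * numerator / denominator


pred = [[1, 1, 0, 0], [0, 0, 1, 1]]
truth = [[1, 1, 0, 0], [0, 0, 1, 0]]
print(generalized_dice(pred, truth, "square"))  # 0.75
print(generalized_dice(pred, truth, "simple"))  # 0.8
```

With "square" weights the small second class dominates the score, which is the intended effect for highly unbalanced segmentations.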

class monai.metrics.GeneralizedDiceScore(include_background=True, reduction=MetricReduction.MEAN_BATCH, weight_type=Weight.SQUARE)[source]#

Compute the Generalized Dice Score metric between tensors, as the complement of the Generalized Dice Loss defined in:

Sudre, C. et al. (2017) Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. DLMIA 2017.

The inputs y_pred and y are expected to be one-hot, binarized channel-first or batch-first tensors, i.e., CHW[D] or BCHW[D].

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • include_background (bool, optional) – whether to include the background class (assumed to be in channel 0), in the score computation. Defaults to True.

  • reduction (str, optional) – define mode of reduction to the metrics. Available reduction modes: {"none", "mean_batch", "sum_batch"}. Default to "mean_batch". If “none”, will not do reduction.

  • weight_type (Union[Weight, str], optional) – {"square", "simple", "uniform"}. Type of function to transform ground truth volume into a weight factor. Defaults to "square".

Raises

ValueError – when the reduction is not one of {"none", "mean_batch", "sum_batch"}.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_generalized_dice.

Parameters

reduction (Union[MetricReduction, str, None], optional) – define mode of reduction to the metrics. Available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch"}. Defaults to "mean". If “none”, will not do reduction.

Area under the ROC curve#

monai.metrics.compute_roc_auc(y_pred, y, average=Average.MACRO)[source]#

Computes Area Under the Receiver Operating Characteristic Curve (ROC AUC). Referring to: sklearn.metrics.roc_auc_score.

Parameters
  • y_pred (Tensor) – input data to compute, typical classification model output. the first dim must be batch, if multi-classes, it must be in One-Hot format. for example: shape [16] or [16, 1] for a binary data, shape [16, 2] for 2 classes data.

  • y (Tensor) – ground truth to compute ROC AUC metric, the first dim must be batch. if multi-classes, it must be in One-Hot format. for example: shape [16] or [16, 1] for a binary data, shape [16, 2] for 2 classes data.

  • average (Union[Average, str]) – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".

    • "macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

    • "weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).

    • "micro": calculate metrics globally by considering each element of the label indicator matrix as a label.

    • "none": the scores for each class are returned.

Raises
  • ValueError – When y_pred dimension is not one of [1, 2].

  • ValueError – When y dimension is not one of [1, 2].

  • ValueError – When average is not one of [“macro”, “weighted”, “micro”, “none”].

Note

ROCAUC expects y to be composed of 0’s and 1’s. y_pred must be either probability estimates or confidence values.
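As a sanity check on what ROC AUC measures, the sketch below computes it as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, counting ties as half. This pairwise form gives the same value as the area under the ROC curve but is not how the library computes it; `binary_auc` and `macro_auc` are illustrative names.

```python
def binary_auc(y_pred, y):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [p for p, t in zip(y_pred, y) if t == 1]
    neg = [p for p, t in zip(y_pred, y) if t == 0]
    if not pos or not neg:
        raise ValueError("y must contain both 0's and 1's")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_auc(pred_per_class, y_per_class):
    """"macro" averaging: unweighted mean of the per-class AUC values."""
    scores = [binary_auc(p, t) for p, t in zip(pred_per_class, y_per_class)]
    return sum(scores) / len(scores)


print(binary_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```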

class monai.metrics.ROCAUCMetric(average=Average.MACRO)[source]#

Computes Area Under the Receiver Operating Characteristic Curve (ROC AUC). Referring to: sklearn.metrics.roc_auc_score. The input y_pred and y can be a list of channel-first Tensor or a batch-first Tensor.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters

average (Union[Average, str]) – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".

  • "macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.

  • "weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).

  • "micro": calculate metrics globally by considering each element of the label indicator matrix as a label.

  • "none": the scores for each class are returned.

aggregate(average=None)[source]#

Typically, y_pred and y are stored in the cumulative buffers at each iteration. This function reads the buffers and computes the area under the ROC curve.

Parameters

average (Union[Average, str, None]) – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to self.average.

Confusion matrix#

monai.metrics.get_confusion_matrix(y_pred, y, include_background=True)[source]#

Compute confusion matrix. A tensor with the shape [B, C, 4] will be returned, where B is the batch size and C is the number of classes to be computed. The third dimension holds the number of true positive, false positive, true negative and false negative values for each channel of each sample within the input batch.

Parameters
  • y_pred (Tensor) – input data to compute. It must be one-hot format and first dim is batch. The values should be binarized.

  • y (Tensor) – ground truth to compute the metric. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background (bool) – whether to include metric computation on the first channel of the predicted output. Defaults to True.

Raises

ValueError – when y_pred and y have different shapes.

monai.metrics.compute_confusion_matrix_metric(metric_name, confusion_matrix)[source]#

This function is used to compute confusion matrix related metric.

Parameters
  • metric_name (str) – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as shown on the Wikipedia page on the confusion matrix), and you can also input those names instead.

  • confusion_matrix (Tensor) – Please see the doc string of the function get_confusion_matrix for more details.

Raises
  • ValueError – when the size of the last dimension of confusion_matrix is not 4.

  • NotImplementedError – when the specified metric_name is not implemented.
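The [true positive, false positive, true negative, false negative] layout and a few of the derived metrics can be sketched for a single channel in plain Python. The helper names below are illustrative and the code does not use MONAI; the metric formulas are the standard confusion matrix definitions.

```python
def confusion_counts(pred, truth):
    """Return [tp, fp, tn, fn] for two flattened binary 0/1 lists."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return [tp, fp, tn, fn]


def sensitivity(cm):
    tp, fp, tn, fn = cm
    return tp / (tp + fn)


def specificity(cm):
    tp, fp, tn, fn = cm
    return tn / (tn + fp)


def precision(cm):
    tp, fp, tn, fn = cm
    return tp / (tp + fp)


cm = confusion_counts([1, 1, 0, 0, 1], [1, 0, 1, 0, 1])
print(cm)               # [2, 1, 1, 1]
print(specificity(cm))  # 0.5
```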

class monai.metrics.ConfusionMatrixMetric(include_background=True, metric_name='hit_rate', compute_sample=False, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute confusion matrix related metrics. This class can calculate all metrics mentioned in: Confusion matrix. It supports both multi-class and multi-label classification and segmentation tasks. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False for an instance to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size, they can get overwhelmed by the signal from the background.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • include_background (bool) – whether to include metric computation on the first channel of the predicted output. Defaults to True.

  • metric_name (Union[Sequence[str], str]) – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as shown on the Wikipedia page on the confusion matrix), and you can also input those names instead. Besides a single metric name, a sequence of metric names is also supported, such as (“sensitivity”, “precision”, “recall”). If compute_sample is True, multiple f and not_nans will be returned in the same order as the input names when calling the class.

  • compute_sample (bool) – when reducing, if True, each sample’s metric will be computed based on each confusion matrix first. if False, compute reduction on the confusion matrices first, defaults to False.

  • reduction (Union[MetricReduction, str]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns [(metric, not_nans), …]. If False, aggregate() returns [metric, …]. Here not_nans count the number of not nans for True Positive, False Positive, True Negative and False Negative. Its shape depends on the shape of the metric, and it has one more dimension with size 4. For example, if the shape of the metric is [3, 3], not_nans has the shape [3, 3, 4].

aggregate(compute_sample=False, reduction=None)[source]#

Execute reduction for the confusion matrix values.

Parameters
  • compute_sample (bool) – when reducing, if True, each sample’s metric will be computed based on each confusion matrix first. if False, compute reduction on the confusion matrices first, defaults to False.

  • reduction (Union[MetricReduction, str, None]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Hausdorff distance#

monai.metrics.compute_hausdorff_distance(y_pred, y, include_background=False, distance_metric='euclidean', percentile=None, directed=False)[source]#

Compute the Hausdorff distance.

Parameters
  • y_pred (Union[ndarray, Tensor]) – input data to compute, typical segmentation model output. It must be one-hot format and first dim is batch, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y (Union[ndarray, Tensor]) – ground truth to compute the distance. It must be one-hot format and first dim is batch. The values should be binarized.

  • include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • distance_metric (str) – ["euclidean", "chessboard", "taxicab"] the metric used to compute surface distance. Defaults to "euclidean".

  • percentile (Optional[float]) – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff Distance will be returned rather than the maximum. Defaults to None.

  • directed (bool) – whether to calculate directed Hausdorff distance. Defaults to False.

monai.metrics.compute_percent_hausdorff_distance(edges_pred, edges_gt, distance_metric='euclidean', percentile=None)[source]#

This function is used to compute the directed Hausdorff distance.
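The difference between the directed and undirected distance, and the effect of percentile, can be sketched on two small point sets in plain Python. This is an illustration only: the library works on the edges extracted from binary masks, whereas the sketch takes lists of coordinates directly, uses Euclidean distance, and assumes a linearly interpolated percentile. All function names are hypothetical.

```python
import math


def nearest_distances(a, b):
    """For each point in a, the Euclidean distance to its nearest point in b."""
    return [min(math.dist(p, q) for q in b) for p in a]


def percentile_of(values, q):
    """Linearly interpolated percentile, q in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def hausdorff(a, b, percentile=None, directed=False):
    def reduce(dists):
        return max(dists) if percentile is None else percentile_of(dists, percentile)

    forward = reduce(nearest_distances(a, b))  # directed distance from a to b
    if directed:
        return forward
    return max(forward, reduce(nearest_distances(b, a)))


a = [(0, 0)]
b = [(0, 0), (3, 4)]
print(hausdorff(a, b, directed=True))  # 0.0
print(hausdorff(a, b))                 # 5.0
```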

class monai.metrics.HausdorffDistanceMetric(include_background=False, distance_metric='euclidean', percentile=None, directed=False, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Hausdorff Distance between two tensors. It supports both multi-class and multi-label tasks, and both directed and non-directed Hausdorff distance calculation. In addition, specifying the percentile parameter returns the corresponding percentile of the distance. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensor (CHW[D]) or a batch-first Tensor (BCHW[D]). The implementation refers to DeepMind’s implementation.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • distance_metric (str) – ["euclidean", "chessboard", "taxicab"] the metric used to compute surface distance. Defaults to "euclidean".

  • percentile (Optional[float]) – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff Distance will be returned rather than the maximum. Defaults to None.

  • directed (bool) – whether to calculate directed Hausdorff distance. Defaults to False.

  • reduction (Union[MetricReduction, str]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values of the metric, so its shape equals the shape of the metric.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_hausdorff_distance.

Parameters

reduction (Union[MetricReduction, str, None]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.
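The directed, non-directed, and percentile variants described above can be illustrated with a small pure-Python sketch on explicit edge coordinates. This is not MONAI's implementation, which works on edge masks; the function names are hypothetical, and the linear interpolation used for the percentile is an assumption.

```python
import math


def directed_hausdorff(points_a, points_b, percentile=None):
    """Distance from each point in points_a to its nearest point in points_b,
    reduced with max (default) or with the requested percentile."""
    nearest = sorted(min(math.dist(p, q) for q in points_b) for p in points_a)
    if percentile is None:
        return nearest[-1]
    # Assumed percentile rule: linear interpolation between closest ranks.
    rank = (len(nearest) - 1) * percentile / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return nearest[low] + (nearest[high] - nearest[low]) * (rank - low)


def hausdorff(points_a, points_b, percentile=None, directed=False):
    """Non-directed distance is the larger of the two directed distances."""
    d_ab = directed_hausdorff(points_a, points_b, percentile)
    if directed:
        return d_ab
    return max(d_ab, directed_hausdorff(points_b, points_a, percentile))


pred_edge = [(0, 0), (0, 1), (0, 4)]  # hypothetical edge pixel coordinates
gt_edge = [(0, 0), (0, 1)]

print(hausdorff(pred_edge, gt_edge))                 # non-directed
print(hausdorff(gt_edge, pred_edge, directed=True))  # directed, gt -> pred
print(hausdorff(pred_edge, gt_edge, percentile=50, directed=True))
```

The outlier at (0, 4) dominates the maximum but not the 50th percentile, which is the usual motivation for setting percentile.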

Average surface distance#

monai.metrics.compute_average_surface_distance(y_pred, y, include_background=False, symmetric=False, distance_metric='euclidean')[source]#

This function computes the Average Surface Distance from y_pred to y under the default setting. If symmetric = True, the average symmetric surface distance between the two inputs is returned instead. The implementation refers to DeepMind’s implementation.

Parameters
  • y_pred (Union[ndarray, Tensor]) – input data to compute, typically a segmentation model output. It must be in one-hot format with the batch as the first dimension, example shape: [16, 3, 32, 32]. The values should be binarized.

  • y (Union[ndarray, Tensor]) – ground truth against which the distance is computed. It must be in one-hot format with the batch as the first dimension. The values should be binarized.

  • include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • symmetric (bool) – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.

  • distance_metric (str) – one of ["euclidean", "chessboard", "taxicab"]; the metric used to compute surface distance. Defaults to "euclidean".

class monai.metrics.SurfaceDistanceMetric(include_background=False, symmetric=False, distance_metric='euclidean', reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute the Surface Distance between two tensors. It supports both multi-class and multi-label tasks, and both symmetric and asymmetric surface distance calculation. Input y_pred is compared with ground truth y. y_pred is expected to contain binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensors (CHW[D]) or a batch-first Tensor (BCHW[D]).

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.

  • symmetric (bool) – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.

  • distance_metric (str) – one of ["euclidean", "chessboard", "taxicab"]; the metric used to compute surface distance. Defaults to "euclidean".

  • reduction (Union[MetricReduction, str]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the non-NaN values of the metric, so its shape equals the shape of the metric.

aggregate(reduction=None)[source]#

Execute reduction logic for the output of compute_average_surface_distance.

Parameters

reduction (Union[MetricReduction, str, None]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.
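As a rough illustration of the asymmetric and symmetric variants, the sketch below averages nearest-neighbor distances between two sets of edge coordinates. It is a simplified stand-in, not MONAI's code: the helper names are hypothetical, and it assumes the symmetric variant pools the distances from both directions before averaging.

```python
import math


def nearest_distances(source, target):
    """Distance from every source point to its closest target point."""
    return [min(math.dist(p, q) for q in target) for p in source]


def average_surface_distance(pred_edge, gt_edge, symmetric=False):
    distances = nearest_distances(pred_edge, gt_edge)
    if symmetric:
        # Assumption: both directions are pooled into one set, then averaged.
        distances = distances + nearest_distances(gt_edge, pred_edge)
    return sum(distances) / len(distances)


pred_edge = [(0, 0), (0, 4)]  # hypothetical edge pixel coordinates
gt_edge = [(0, 0)]

asd = average_surface_distance(pred_edge, gt_edge)
assd = average_surface_distance(pred_edge, gt_edge, symmetric=True)
print(asd, assd)
```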

Surface dice#

monai.metrics.compute_surface_dice(y_pred, y, class_thresholds, include_background=False, distance_metric='euclidean')[source]#

This function computes the (Normalized) Surface Dice (NSD) between the two tensors y_pred (referred to as \(\hat{Y}\)) and y (referred to as \(Y\)). This metric determines which fraction of a segmentation boundary is correctly predicted. A boundary element is considered correctly predicted if the closest distance to the reference boundary is smaller than or equal to the specified threshold related to the acceptable amount of deviation in pixels. The NSD is bounded between 0 and 1.

This implementation supports multi-class tasks with an individual threshold \(\tau_c\) for each class \(c\). The class-specific NSD for batch index \(b\), \(\operatorname {NSD}_{b,c}\), is computed using the function:

(1)#\[\operatorname {NSD}_{b,c} \left(Y_{b,c}, \hat{Y}_{b,c}\right) = \frac{\left|\mathcal{D}_{Y_{b,c}}^{'}\right| + \left| \mathcal{D}_{\hat{Y}_{b,c}}^{'} \right|}{\left|\mathcal{D}_{Y_{b,c}}\right| + \left|\mathcal{D}_{\hat{Y}_{b,c}}\right|}\]

with \(\mathcal{D}_{Y_{b,c}}\) and \(\mathcal{D}_{\hat{Y}_{b,c}}\) being two sets of nearest-neighbor distances. \(\mathcal{D}_{Y_{b,c}}\) is computed from the predicted segmentation boundary towards the reference segmentation boundary and vice-versa for \(\mathcal{D}_{\hat{Y}_{b,c}}\). \(\mathcal{D}_{Y_{b,c}}^{'}\) and \(\mathcal{D}_{\hat{Y}_{b,c}}^{'}\) refer to the subsets of distances that are smaller or equal to the acceptable distance \(\tau_c\):

\[\mathcal{D}_{Y_{b,c}}^{'} = \{ d \in \mathcal{D}_{Y_{b,c}} \, | \, d \leq \tau_c \}.\]

In the case of a class neither being present in the predicted segmentation, nor in the reference segmentation, a nan value will be returned for this class. In the case of a class being present in only one of predicted segmentation or reference segmentation, the class NSD will be 0.

This implementation is based on https://arxiv.org/abs/2111.05408 and supports 2D images. Be aware that the computation of boundaries is different from DeepMind’s implementation https://github.com/deepmind/surface-distance. In this implementation, the length of a segmentation boundary is interpreted as the number of its edge pixels. In DeepMind’s implementation, the length of a segmentation boundary depends on the local neighborhood (cf. https://arxiv.org/abs/1809.04430).

Parameters
  • y_pred (Tensor) – Predicted segmentation, typically segmentation model output. It must be a one-hot encoded, batch-first tensor [B,C,H,W].

  • y (Tensor) – Reference segmentation. It must be a one-hot encoded, batch-first tensor [B,C,H,W].

  • class_thresholds (List[float]) – List of class-specific thresholds. The thresholds relate to the acceptable amount of deviation in the segmentation boundary in pixels. Each threshold needs to be a finite, non-negative number.

  • include_background (bool) – Whether to include the surface dice computation on the first channel of the predicted output. Defaults to False.

  • distance_metric (str) – The metric used to compute surface distances. One of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".

Raises
  • ValueError – If y_pred and/or y are not PyTorch tensors.

  • ValueError – If y_pred and/or y do not have four dimensions.

  • ValueError – If y_pred and y have different shapes.

  • ValueError – If y_pred and/or y are not one-hot encoded.

  • ValueError – If the number of channels of y_pred and y is different from the number of class thresholds.

  • ValueError – If any class threshold is not finite.

  • ValueError – If any class threshold is negative.

Returns

PyTorch tensor of shape [B,C], containing the NSD values \(\operatorname {NSD}_{b,c}\) for each batch index \(b\) and class \(c\).
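Equation (1) and the edge cases described above can be sketched for a single batch item and class, given the two sets of nearest-neighbor distances as plain lists. The function name is hypothetical, and representing an absent class by an empty distance list is an assumption of this sketch, not a statement about MONAI internals.

```python
def normalized_surface_dice(dist_pred_to_ref, dist_ref_to_pred, threshold):
    """NSD for one (batch, class) pair from two lists of boundary distances."""
    if not dist_pred_to_ref and not dist_ref_to_pred:
        return float("nan")  # class absent from both segmentations
    if not dist_pred_to_ref or not dist_ref_to_pred:
        return 0.0  # class present in only one segmentation
    within = sum(d <= threshold for d in dist_pred_to_ref)
    within += sum(d <= threshold for d in dist_ref_to_pred)
    return within / (len(dist_pred_to_ref) + len(dist_ref_to_pred))


# Hypothetical distances in pixels, with a tolerance of one pixel.
nsd = normalized_surface_dice([0.0, 1.0, 2.5], [0.0, 3.0], threshold=1.0)
print(nsd)
```

Three of the five boundary elements lie within the tolerance, so the value is 3/5.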

class monai.metrics.SurfaceDiceMetric(class_thresholds, include_background=False, distance_metric='euclidean', reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Computes the Normalized Surface Dice (NSD) for each batch sample and class of predicted segmentations y_pred and corresponding reference segmentations y according to equation (1). This implementation supports 2D images. For 3D images, please refer to DeepMind’s implementation https://github.com/deepmind/surface-distance.

The class- and batch sample-wise NSD values can be aggregated with the function aggregate.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • class_thresholds (List[float]) – List of class-specific thresholds. The thresholds relate to the acceptable amount of deviation in the segmentation boundary in pixels. Each threshold needs to be a finite, non-negative number.

  • include_background (bool) – Whether to include NSD computation on the first channel of the predicted output. Defaults to False.

  • distance_metric (str) – The metric used to compute surface distances. One of ["euclidean", "chessboard", "taxicab"]. Defaults to "euclidean".

  • reduction (Union[MetricReduction, str]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count. Defaults to False. not_nans is the number of batch samples for which not all class-specific NSD values were nan values. If set to True, the function aggregate will return both the aggregated NSD and the not_nans count. If set to False, aggregate will only return the aggregated NSD.

aggregate(reduction=None)[source]#

Aggregates the output of _compute_tensor.

Parameters

reduction (Union[MetricReduction, str, None]) – define mode of reduction to the metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to self.reduction. if “none”, will not do reduction.

Returns

If get_not_nans is set to True, this function returns the aggregated NSD and the not_nans count. If get_not_nans is set to False, this function returns only the aggregated NSD.

Mean squared error#

class monai.metrics.MSEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Mean Squared Error between two tensors using function:

\[\operatorname {MSE}\left(Y, \hat{Y}\right) =\frac {1}{n}\sum _{i=1}^{n}\left(y_i-\hat{y_i} \right)^{2}.\]

More info: https://en.wikipedia.org/wiki/Mean_squared_error

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • reduction (Union[MetricReduction, str]) – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).
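For reference, the MSE formula above can be evaluated on two flat sequences with a few lines of plain Python. This only illustrates the arithmetic; it is not how MSEMetric is implemented, and the function name is hypothetical.

```python
def mean_squared_error(y, y_hat):
    """Average of squared element-wise differences."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)


mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mse)  # (0 + 0 + 4) / 3
```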

Mean absolute error#

class monai.metrics.MAEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Mean Absolute Error between two tensors using function:

\[\operatorname {MAE}\left(Y, \hat{Y}\right) =\frac {1}{n}\sum _{i=1}^{n}\left|y_i-\hat{y_i}\right|.\]

More info: https://en.wikipedia.org/wiki/Mean_absolute_error

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • reduction (Union[MetricReduction, str]) – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).
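The MAE formula can be checked in the same way on flat sequences. The sketch below is illustrative arithmetic only, with a hypothetical function name.

```python
def mean_absolute_error(y, y_hat):
    """Average of absolute element-wise differences."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 1.0])
print(mae)  # (0.5 + 0 + 2) / 3
```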

Root mean squared error#

class monai.metrics.RMSEMetric(reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Root Mean Squared Error between two tensors using function:

\[\operatorname {RMSE}\left(Y, \hat{Y}\right) ={ \sqrt{ \frac {1}{n}\sum _{i=1}^{n}\left(y_i-\hat{y_i}\right)^2 } } \ = \sqrt {\operatorname{MSE}\left(Y, \hat{Y}\right)}.\]

More info: https://en.wikipedia.org/wiki/Root-mean-square_deviation

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • reduction (Union[MetricReduction, str]) – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).
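Since RMSE is the square root of MSE, a minimal sketch only needs one extra step. The function name is hypothetical and the code illustrates the formula rather than RMSEMetric itself.

```python
import math


def root_mean_squared_error(y, y_hat):
    """Square root of the mean of squared element-wise differences."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return math.sqrt(mse)


rmse = root_mean_squared_error([0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0])
print(rmse)  # every error is 2, so MSE is 4 and RMSE is 2
```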

Peak signal to noise ratio#

class monai.metrics.PSNRMetric(max_val, reduction=MetricReduction.MEAN, get_not_nans=False)[source]#

Compute Peak Signal To Noise Ratio between two tensors using function:

\[\operatorname{PSNR}\left(Y, \hat{Y}\right) = 20 \cdot \log_{10} \left({\mathit{MAX}}_Y\right) -10 \cdot \log_{10}\left(\operatorname{MSE}\left(Y, \hat{Y}\right)\right)\]

More info: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

Help taken from: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/image_ops_impl.py line 4139

Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.

Example of the typical execution steps of this metric class follows monai.metrics.metric.Cumulative.

Parameters
  • max_val (Union[int, float]) – The dynamic range of the images/volumes (i.e., the difference between the maximum and the minimum allowed values e.g. 255 for a uint8 image).

  • reduction (Union[MetricReduction, str]) – define the mode to reduce metrics, will only execute reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, will not do reduction.

  • get_not_nans (bool) – whether to return the not_nans count, if True, aggregate() returns (metric, not_nans).
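The PSNR formula combines max_val with the MSE of the two inputs. The following sketch evaluates it on flat sequences; the function name is hypothetical, and identical inputs (MSE of zero) are deliberately left unhandled.

```python
import math


def peak_signal_to_noise_ratio(y, y_hat, max_val):
    """PSNR in decibels: 20*log10(max_val) - 10*log10(MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)
    return 20.0 * math.log10(max_val) - 10.0 * math.log10(mse)


# Values in [0, 1], so the dynamic range is 1; every error is about 0.1.
psnr = peak_signal_to_noise_ratio([0.0, 0.5], [0.1, 0.4], max_val=1.0)
print(psnr)
```

With an MSE of about 0.01 and a dynamic range of 1, the result is close to 20 dB.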

Structural similarity index measure#

class monai.metrics.regression.SSIMMetric(data_range, win_size=7, k1=0.01, k2=0.03, spatial_dims=2)[source]#

Build a PyTorch version of the SSIM metric based on the original formula of SSIM

\[\operatorname {SSIM}(x,y) =\frac {(2 \mu_x \mu_y + c_1)(2 \sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}\]

For more info, visit

https://vicuesoft.com/glossary/term/ssim-ms-ssim/

Modified and adopted from:

https://github.com/facebookresearch/fastMRI/blob/main/banding_removal/fastmri/ssim_loss_mixin.py

SSIM reference paper:

Wang, Zhou, et al. “Image quality assessment: from error visibility to structural similarity.” IEEE transactions on image processing 13.4 (2004): 600-612.

Parameters
  • data_range (Tensor) – dynamic range of the data

  • win_size (int) – gaussian weighting window size

  • k1 (float) – stability constant used in the luminance denominator

  • k2 (float) – stability constant used in the contrast denominator

  • spatial_dims (int) – if 2, input shape is expected to be (B,C,W,H). if 3, it is expected to be (B,C,W,H,D)
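To make the roles of data_range, k1 and k2 concrete, the sketch below evaluates the SSIM formula once over a whole 1D signal. This is a simplification and an assumption of the example: SSIMMetric computes local statistics over windows of size win_size, whereas here the entire signal is treated as a single window with population statistics. The constants follow the reference paper, c1 = (k1 · L)² and c2 = (k2 · L)², where L is the dynamic range.

```python
def global_ssim(x, y, data_range, k1=0.01, k2=0.03):
    """SSIM formula evaluated once over two equally long flat sequences."""
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


signal = [0.1, 0.4, 0.7, 0.9]
same = global_ssim(signal, signal, data_range=1.0)
flipped = global_ssim(signal, signal[::-1], data_range=1.0)
print(same, flipped)
```

Identical signals score 1, while the reversed signal has negative covariance and scores well below 1.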

Cumulative average#

class monai.metrics.CumulativeAverage[source]#

Cumulatively record data values and aggregate them into an average. It supports single-class or multi-class data, for example, a value can be 0.44 (a loss value) or [0.3, 0.4] (metrics of two classes). It also supports distributed data parallel and syncs data across ranks when aggregating. For example, to record loss values and compute the overall average every 5 iterations:

average = CumulativeAverage()
for i, d in enumerate(dataloader):
    loss = ...
    average.append(loss)
    if i % 5 == 0:
        print(f"cumulative average of loss: {average.aggregate()}")
average.reset()

aggregate()[source]#

Sync data from all ranks and compute the average value using the accumulated sum.

reset()[source]#

Reset all the running status, including buffers, sum, not nans count, etc.
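The bookkeeping behind a cumulative average can be sketched with a running sum and a count. The class below is a hypothetical single-process, scalar-only illustration, not the CumulativeAverage implementation: it has no distributed sync, and it assumes NaN values are skipped, as the "not nans count" mentioned above suggests. Returning NaN for an empty average is a choice of this sketch.

```python
import math


class RunningAverage:
    """Scalar running average that ignores NaN entries."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.total = 0.0
        self.count = 0  # number of non-NaN values seen so far

    def append(self, value):
        if not math.isnan(value):
            self.total += value
            self.count += 1

    def aggregate(self):
        return self.total / self.count if self.count else float("nan")


average = RunningAverage()
for loss in [0.5, float("nan"), 0.3]:
    average.append(loss)
mean_loss = average.aggregate()
print(mean_loss)
average.reset()
```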

Utilities#

monai.metrics.utils.do_metric_reduction(f, reduction=MetricReduction.MEAN)[source]#

This function reduces the per-sample, per-class metric values, taking only non-NaN values into account. It also returns not_nans, which counts the non-NaN values of the metric.

Parameters
  • f (Tensor) – a tensor that contains the calculated metric scores per batch and per class. The first two dims should be batch and class.

  • reduction (Union[MetricReduction, str]) – define the mode to reduce metrics, will only apply reduction on not-nan values, available reduction modes: {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}, default to "mean". if “none”, return the input f tensor and not_nans.

Raises

ValueError – When reduction is not one of [“mean”, “sum”, “mean_batch”, “sum_batch”, “mean_channel”, “sum_channel”, “none”].
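A NaN-aware "mean" reduction over a [batch, class] table of scores can be sketched as follows. The two-step order (average over classes for each sample, then over samples that have at least one valid class) is an assumption of this sketch, and the helper names are hypothetical; the real function operates on tensors and supports all listed modes.

```python
import math


def nan_mean(values):
    """Mean of the non-NaN entries and how many there were."""
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        return float("nan"), 0
    return sum(valid) / len(valid), len(valid)


def reduce_mean(scores):
    """scores[b][c] holds the metric for batch item b and class c."""
    per_sample = [nan_mean(row)[0] for row in scores]
    return nan_mean(per_sample)


nan = float("nan")
scores = [[0.8, nan], [nan, nan], [0.4, 0.6]]
metric, not_nans = reduce_mean(scores)
print(metric, not_nans)
```

The second sample contributes nothing because all of its class scores are NaN, so only two samples are counted.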

monai.metrics.utils.get_mask_edges(seg_pred, seg_gt, label_idx=1, crop=True)[source]#

Do binary erosion and use XOR for input to get the edges. This function is helpful to further calculate metrics such as Average Surface Distance and Hausdorff Distance. The input images can be binary or labelfield images. If labelfield images are supplied, they are converted to binary images using label_idx.

scipy’s binary erosion is used to calculate the edges of the binary labelfield.

To improve computational efficiency, the images are cropped to keep only the foreground before the edges are computed, unless crop = False is specified.

We require that images are the same size, and assume that they occupy the same space (spacing, orientation, etc.).

Parameters
  • seg_pred – the predicted binary or labelfield image.

  • seg_gt – the actual binary or labelfield image.

  • label_idx (int) – for labelfield images, convert to binary with seg_pred = seg_pred == label_idx.

  • crop (bool) – crop input images and only keep the foregrounds. To keep the shapes of the two inputs identical, the bounding box is obtained from (seg_pred | seg_gt), the union of the two images. Defaults to True.

Return type

Tuple[ndarray, ndarray]
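The erosion-then-XOR idea can be illustrated on a small 2D binary grid without any dependencies. The sketch below uses a hand-written erosion with a 4-connected (cross-shaped) neighborhood and treats everything outside the image as background; both choices are assumptions made for illustration, and the helper names are hypothetical. The library itself relies on scipy's binary erosion.

```python
def binary_erosion(mask):
    """Keep a pixel only if it and its 4 direct neighbors are foreground."""
    height, width = len(mask), len(mask[0])

    def is_set(r, c):
        return 0 <= r < height and 0 <= c < width and bool(mask[r][c])

    return [
        [
            is_set(r, c) and is_set(r - 1, c) and is_set(r + 1, c)
            and is_set(r, c - 1) and is_set(r, c + 1)
            for c in range(width)
        ]
        for r in range(height)
    ]


def mask_edges(mask):
    """Edge pixels are those removed by erosion: mask XOR eroded mask."""
    eroded = binary_erosion(mask)
    return [
        [bool(mask[r][c]) != eroded[r][c] for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


# A 3x3 square of foreground centered in a 5x5 image.
mask = [[1 if 1 <= r <= 3 and 1 <= c <= 3 else 0 for c in range(5)] for r in range(5)]
edges = mask_edges(mask)
for row in edges:
    print("".join("#" if e else "." for e in row))
```

Only the center pixel survives erosion, so the edge is the ring of eight pixels around it.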

monai.metrics.utils.get_surface_distance(seg_pred, seg_gt, distance_metric='euclidean')[source]#

This function is used to compute the surface distances from seg_pred to seg_gt.

Parameters
  • seg_pred (ndarray) – the edge of the predictions.

  • seg_gt (ndarray) – the edge of the ground truth.

  • distance_metric (str) –

    One of ["euclidean", "chessboard", "taxicab"]; the metric used to compute surface distance. Defaults to "euclidean".

    • "euclidean", uses Exact Euclidean distance transform.

    • "chessboard", uses chessboard metric in chamfer type of transform.

    • "taxicab", uses taxicab metric in chamfer type of transform.

Note

If seg_pred or seg_gt is all 0, the resulting distance may be nan/inf.

Return type

ndarray
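The three distance_metric options differ only in how the offset between two pixels is turned into a distance. The sketch below shows the point-to-point definitions; it does not reproduce the distance transforms used by the library, and the function name is hypothetical.

```python
import math


def point_distance(p, q, distance_metric="euclidean"):
    """Distance between two pixel coordinates under the chosen metric."""
    offsets = [abs(a - b) for a, b in zip(p, q)]
    if distance_metric == "euclidean":
        return math.sqrt(sum(o * o for o in offsets))
    if distance_metric == "chessboard":
        return max(offsets)  # diagonal steps cost the same as straight ones
    if distance_metric == "taxicab":
        return sum(offsets)  # only axis-aligned steps are allowed
    raise ValueError(f"unsupported distance_metric: {distance_metric}")


for name in ("euclidean", "chessboard", "taxicab"):
    print(name, point_distance((0, 0), (3, 4), name))
```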

monai.metrics.utils.ignore_background(y_pred, y)[source]#

This function is used to remove background (the first channel) for y_pred and y.

Parameters
  • y_pred (Union[ndarray, Tensor]) – predictions. For classification tasks, y_pred should have the shape [BN] where N is larger than 1. For segmentation tasks, the shape should be [BNHW] or [BNHWD].

  • y (Union[ndarray, Tensor]) – ground truth, the first dim is batch.
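Removing the background amounts to dropping index 0 along the channel dimension. The sketch below shows this on nested lists in [B, N] layout; the function name is hypothetical, and leaving single-channel inputs untouched is an assumption of the sketch.

```python
def drop_background(batch):
    """Remove the first channel of every sample in a [B, N, ...] nested list."""
    return [sample[1:] if len(sample) > 1 else sample for sample in batch]


y_pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # two samples, three classes
foreground = drop_background(y_pred)
print(foreground)
```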

monai.metrics.utils.is_binary_tensor(input, name)[source]#

Determines whether the input is a binary PyTorch tensor.

Parameters
  • input (torch.Tensor) – tensor to validate.

  • name (str) – name of the tensor being checked.

Raises

ValueError – if input is not a PyTorch Tensor.

Returns

warning message, if the tensor is not binary. Otherwise, None.

Return type

Union[str, None]