Metrics
FROC

monai.metrics.compute_froc_score(fps_per_image, total_sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8))
This function is adapted from the official evaluation code of the CAMELYON 16 Challenge and computes the challenge’s second evaluation metric, defined as the average sensitivity at predefined false positive rates per whole slide image.
 Parameters
fps_per_image (ndarray) – the average number of false positives per image for different thresholds.
total_sensitivity (ndarray) – sensitivities (true positive rates) for different thresholds.
eval_thresholds (Tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8), the same as in the CAMELYON 16 Challenge.
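As a rough illustration of "average sensitivity at predefined false positive rates", the sketch below linearly interpolates the sensitivity at each evaluation threshold and averages the results. It is a dependency-free approximation written for this page, not MONAI's code; the names `interp_clamped` and `froc_score_sketch` and the sample curve are made up for the example.

```python
def interp_clamped(x, xs, ys):
    """Linearly interpolate y at x over points (xs, ys); xs must be increasing.

    Values of x outside [xs[0], xs[-1]] are clamped to the end points.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def froc_score_sketch(fps_per_image, total_sensitivity,
                      eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)):
    """Average the interpolated sensitivity at each false positive rate."""
    # Sort the curve by false positives per image so interpolation is well defined.
    pairs = sorted(zip(fps_per_image, total_sensitivity))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    sens = [interp_clamped(t, xs, ys) for t in eval_thresholds]
    return sum(sens) / len(sens)


# Hypothetical FROC curve: more false positives per image, higher sensitivity.
fps = [8.0, 4.0, 2.0, 1.0, 0.5, 0.0]
sensitivity = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
score = froc_score_sketch(fps, sensitivity)
```

With this curve the interpolated sensitivities at the six default thresholds are 0.1, 0.2, 0.4, 0.6, 0.8 and 1.0, so the score is their mean.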
Metric
IterationMetric

class monai.metrics.IterationMetric
Base class of the Metrics interface for computation on a batch of tensors, usually the data of one iteration. __call__ is supposed to compute independent logic for several samples of y_pred and y (optional). Usually, a subclass only needs to implement the _compute_tensor function for the computation process. The input data should be a list of channel-first tensors or a batch-first tensor.
Cumulative

class monai.metrics.Cumulative
Utility class for the typical cumulative computation process based on PyTorch Tensors. It accumulates tensors in buffers, then syncs them across distributed ranks and aggregates.
To speed up computation with multiprocessing, PyTorch programs usually split the data across distributed ranks with DistributedSampler before an epoch. Every rank then computes only on its own part of the data and adds the results to the buffers in its own process. Finally, the values of all ranks are synced to compute the final results.
Note: the data list should have the same length every time add() is called in a round; buffers are created automatically according to the length of the data list.
Typically, this class is expected to execute the following steps:
    cum = Cumulative()
    cum.add(x, y)
    cum.add(a, b)
    cum.add(c, d)
    cum.aggregate()
    result = cum.get_buffer()
    cum.reset()

add(*data)
Add samples to the cumulative buffers.
 Parameters
data (Tensor) – list of input tensors. Make sure the order of the input data is always the same within a round. Every item of data will be added to the corresponding buffer.
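The buffering pattern described for add() can be sketched with plain Python lists. `ToyCumulative` is a hypothetical stand-in written for this example, not the MONAI class: it runs in a single process, skips the sync across ranks, and has no aggregate() step.

```python
class ToyCumulative:
    """Toy stand-in for the buffering pattern; not the MONAI class."""

    def __init__(self):
        self._buffers = None

    def add(self, *data):
        # Create one buffer per positional item on the first call of a round.
        if self._buffers is None:
            self._buffers = [[] for _ in data]
        if len(data) != len(self._buffers):
            raise ValueError("every add() call in a round needs the same number of items")
        # Item i always goes to buffer i, so the argument order must stay fixed.
        for buffer, item in zip(self._buffers, data):
            buffer.extend(item)

    def get_buffer(self):
        # Return a single list when only one buffer exists, else a list of buffers.
        if self._buffers is None:
            return None
        return self._buffers[0] if len(self._buffers) == 1 else self._buffers

    def reset(self):
        self._buffers = None


cum = ToyCumulative()
cum.add([1.0, 2.0], [0, 1])  # first batch: two data items, so two buffers
cum.add([3.0], [1])          # second batch, same item order
result = cum.get_buffer()
cum.reset()
```

Calling add() with a different number of items within the same round raises an error here, which mirrors the note that the data list must keep the same length.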

CumulativeIterationMetric

class monai.metrics.CumulativeIterationMetric
Base class of cumulative metrics, which compute on the batch data of every iteration and then aggregate. Typically, such a metric computes some intermediate results for every iteration, accumulates them in buffers, then syncs across all distributed ranks and aggregates the final result when the epoch is completed.
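A minimal single-process sketch of this "compute per iteration, accumulate, aggregate at epoch end" flow is shown below. `ToyMeanAbsError` is a hypothetical class for illustration only; it works on Python lists and omits the distributed sync that the real base class performs.

```python
class ToyMeanAbsError:
    """Toy cumulative metric: per-sample error each iteration, mean at epoch end."""

    def __init__(self):
        self._buffer = []

    def __call__(self, y_pred, y):
        # Intermediate result for this iteration: one value per sample.
        batch_errors = [abs(p - t) for p, t in zip(y_pred, y)]
        self._buffer.extend(batch_errors)
        return batch_errors

    def aggregate(self):
        # Final result over everything accumulated since the last reset.
        return sum(self._buffer) / len(self._buffer)

    def reset(self):
        self._buffer = []


metric = ToyMeanAbsError()
metric([1.0, 2.0], [1.0, 4.0])  # iteration 1: errors [0.0, 2.0]
metric([3.0], [2.0])            # iteration 2: errors [1.0]
epoch_result = metric.aggregate()
metric.reset()
```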
Mean Dice

monai.metrics.compute_meandice(y_pred, y, include_background=True)
Computes the Dice score metric from full-size Tensors and collects the average.
 Parameters
y_pred (Tensor) – input data to compute, typically segmentation model output. It must be in one-hot format with the batch as the first dimension, example shape: [16, 3, 32, 32]. The values should be binarized.
y (Tensor) – ground truth to compute the mean Dice metric. It must be in one-hot format with the batch as the first dimension. The values should be binarized.
include_background (bool) – whether to include Dice computation on the first channel of the predicted output. Defaults to True.
 Return type
Tensor
 Returns
Dice scores per batch and per class, (shape [batch_size, n_classes]).
 Raises
ValueError – when y_pred and y have different shapes.
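To make the per-batch, per-class output concrete, here is a small sketch of the Dice formula, 2|P ∩ G| / (|P| + |G|), on nested Python lists standing in for one-hot tensors. The function names are hypothetical, and returning NaN for an empty ground-truth channel is an assumption of this sketch rather than a statement about the library.

```python
def dice_per_channel(pred_mask, true_mask):
    """Dice = 2 * |P ∩ G| / (|P| + |G|) for two flat binary masks."""
    intersection = sum(p * g for p, g in zip(pred_mask, true_mask))
    pred_sum, true_sum = sum(pred_mask), sum(true_mask)
    if true_sum == 0:
        # Assumption for this sketch: undefined when the ground truth is empty.
        return float("nan")
    return 2.0 * intersection / (pred_sum + true_sum)


def dice_scores(y_pred, y, include_background=True):
    """Return a [batch_size, n_classes] nested list of Dice scores."""
    if len(y_pred) != len(y) or any(len(p) != len(g) for p, g in zip(y_pred, y)):
        raise ValueError("y_pred and y should have the same shapes")
    first = 0 if include_background else 1  # optionally drop channel 0
    return [
        [dice_per_channel(p_ch, g_ch) for p_ch, g_ch in zip(p[first:], g[first:])]
        for p, g in zip(y_pred, y)
    ]


# One sample, two channels (background, foreground), four flattened pixels each.
y_pred = [[[1, 1, 0, 0], [0, 0, 1, 1]]]
y_true = [[[1, 0, 0, 0], [0, 1, 1, 1]]]
scores = dice_scores(y_pred, y_true)
foreground_only = dice_scores(y_pred, y_true, include_background=False)
```

Here the background channel scores 2/3 and the foreground channel 4/5; with include_background=False only the foreground column remains.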

class monai.metrics.DiceMetric(include_background=True, reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute the average Dice score between two tensors. It supports both multi-class and multi-label tasks. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False for an instance of DiceMetric to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size, they can get overwhelmed by the signal from the background, so excluding it in such cases helps convergence. y_pred and y can be a list of channel-first Tensors (CHW[D]) or a batch-first Tensor (BCHW[D]).
 Parameters
include_background (bool) – whether to include Dice computation on the first channel of the predicted output. Defaults to True.
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of non-NaN values for the metric, so its shape equals the shape of the metric.
Area under the ROC curve

monai.metrics.compute_roc_auc(y_pred, y, average=<Average.MACRO: 'macro'>)
Computes the Area Under the Receiver Operating Characteristic Curve (ROC AUC). Refers to: sklearn.metrics.roc_auc_score.
 Parameters
y_pred (Tensor) – input data to compute, typically classification model output. It must be in one-hot format with the batch as the first dimension, example shape: [16] or [16, 2].
y (Tensor) – ground truth to compute the ROC AUC metric; the first dimension is the batch. Example shape: [16, 1] will be converted into [16, 2] (where 2 is inferred from y_pred).
average (Union[Average, str]) – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".
"macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
"weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).
"micro": calculate metrics globally by considering each element of the label indicator matrix as a label.
"none": the scores for each class are returned.
 Raises
ValueError – When y_pred dimension is not one of [1, 2].
ValueError – When y dimension is not one of [1, 2].
ValueError – When average is not one of ["macro", "weighted", "micro", "none"].
Note
ROCAUC expects y to consist of 0’s and 1’s. y_pred must be either probability estimates or confidence values.
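For intuition, ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition together with "macro" averaging. It is not how MONAI or scikit-learn compute the value internally, and the function names are made up for this example.

```python
def binary_roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def macro_roc_auc(y_pred, y):
    """Unweighted mean of per-class AUCs; inputs are [batch, n_classes] lists."""
    n_classes = len(y_pred[0])
    per_class = [
        binary_roc_auc([row[c] for row in y_pred], [row[c] for row in y])
        for c in range(n_classes)
    ]
    return sum(per_class) / n_classes


# Binary case: three of the four positive/negative pairs are ordered correctly.
auc = binary_roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])

# Two-class case with one-hot labels.
macro_auc = macro_roc_auc(
    [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]],
    [[1, 0], [0, 1], [1, 0]],
)
```

The pairwise loop is quadratic in the number of samples, so it is only meant to show the definition, not to be used on large datasets.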

class monai.metrics.ROCAUCMetric(average=<Average.MACRO: 'macro'>)
Computes the Area Under the Receiver Operating Characteristic Curve (ROC AUC). Refers to: sklearn.metrics.roc_auc_score. The input y_pred and y can be a list of channel-first Tensors or a batch-first Tensor.
 Parameters
average (Union[Average, str]) – {"macro", "weighted", "micro", "none"} Type of averaging performed if not binary classification. Defaults to "macro".
"macro": calculate metrics for each label, and find their unweighted mean. This does not take label imbalance into account.
"weighted": calculate metrics for each label, and find their average, weighted by support (the number of true instances for each label).
"micro": calculate metrics globally by considering each element of the label indicator matrix as a label.
"none": the scores for each class are returned.
Confusion matrix

monai.metrics.get_confusion_matrix(y_pred, y, include_background=True)
Compute the confusion matrix. A tensor with the shape [B, C, 4] is returned, where B is the batch size and C is the number of classes to be computed. The third dimension holds the number of true positive, false positive, true negative and false negative values for each channel of each sample within the input batch.
 Parameters
y_pred (Tensor) – input data to compute. It must be in one-hot format with the batch as the first dimension. The values should be binarized.
y (Tensor) – ground truth to compute the metric. It must be in one-hot format with the batch as the first dimension. The values should be binarized.
include_background (bool) – whether to include metric computation on the first channel of the predicted output. Defaults to True.
 Raises
ValueError – when y_pred and y have different shapes.
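The layout of the returned counts can be sketched as follows, using nested lists in place of tensors. The order [tp, fp, tn, fn] follows the description above; the function names are hypothetical and the code is an illustration rather than the library implementation.

```python
def confusion_counts(pred_mask, true_mask):
    """Return [tp, fp, tn, fn] for two flat binary masks of equal length."""
    pairs = list(zip(pred_mask, true_mask))
    tp = sum(1 for p, g in pairs if p == 1 and g == 1)
    fp = sum(1 for p, g in pairs if p == 1 and g == 0)
    tn = sum(1 for p, g in pairs if p == 0 and g == 0)
    fn = sum(1 for p, g in pairs if p == 0 and g == 1)
    return [tp, fp, tn, fn]


def confusion_matrix_sketch(y_pred, y, include_background=True):
    """Return a nested list shaped [B][C][4] of confusion counts."""
    if len(y_pred) != len(y) or any(len(p) != len(g) for p, g in zip(y_pred, y)):
        raise ValueError("y_pred and y should have the same shapes")
    first = 0 if include_background else 1  # optionally drop channel 0
    return [
        [confusion_counts(p_ch, g_ch) for p_ch, g_ch in zip(p[first:], g[first:])]
        for p, g in zip(y_pred, y)
    ]


# One sample, two channels, four flattened pixels per channel.
y_pred = [[[1, 1, 0, 0], [0, 0, 1, 1]]]
y_true = [[[1, 0, 0, 0], [0, 1, 1, 1]]]
matrix = confusion_matrix_sketch(y_pred, y_true)
```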

class monai.metrics.ConfusionMatrixMetric(include_background=True, metric_name='hit_rate', compute_sample=False, reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute confusion matrix related metrics. This class supports calculating all metrics mentioned in: Confusion matrix. It supports both multi-class and multi-label classification and segmentation tasks. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. The include_background parameter can be set to False for an instance to exclude the first category (channel index 0), which is by convention assumed to be background. If the non-background segmentations are small compared to the total image size, they can get overwhelmed by the signal from the background, so excluding it in such cases helps convergence.
 Parameters
include_background (bool) – whether to include metric computation on the first channel of the predicted output. Defaults to True.
metric_name (Union[Sequence[str], str]) – ["sensitivity", "specificity", "precision", "negative predictive value", "miss rate", "fall out", "false discovery rate", "false omission rate", "prevalence threshold", "threat score", "accuracy", "balanced accuracy", "f1 score", "matthews correlation coefficient", "fowlkes mallows index", "informedness", "markedness"] Some of the metrics have multiple aliases (as shown in the Wikipedia page mentioned above), and you can also input those names instead. Besides a single metric, multiple metrics are also supported by passing a sequence of metric names, such as ("sensitivity", "precision", "recall"). If compute_sample is True, multiple f and not_nans will be returned in the same order as the input names when calling the class.
compute_sample (bool) – when reducing, if True, each sample’s metric is computed from its own confusion matrix first. If False, the reduction is computed on the confusion matrices first. Defaults to False.
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"}
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns [(metric, not_nans), …]. If False, aggregate() returns [metric, …]. Here not_nans counts the number of non-NaN values for True Positive, False Positive, True Negative and False Negative. Its shape depends on the shape of the metric, and it has one more dimension with size 4. For example, if the shape of the metric is [3, 3], not_nans has the shape [3, 3, 4].
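The difference that compute_sample describes, computing a metric per sample and then reducing versus reducing the confusion counts first, can be seen on a small example. The sketch below uses the standard textbook formulas for a handful of the listed metrics. The helper name and the sample counts are made up, and whether the pooling step matches MONAI's exact reduction is an assumption.

```python
def metrics_from_counts(tp, fp, tn, fn):
    """Derive a few common metrics from one [tp, fp, tn, fn] entry."""
    nan = float("nan")  # used when a denominator is zero

    def ratio(num, den):
        return num / den if den > 0 else nan

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "f1 score": ratio(2 * tp, 2 * tp + fp + fn),
    }


samples = [[2, 0, 1, 1], [1, 1, 2, 0]]  # hypothetical counts for two samples

# Metric per sample first, then average over the samples.
per_sample = [metrics_from_counts(*c)["sensitivity"] for c in samples]
mean_of_samples = sum(per_sample) / len(per_sample)

# Pool the counts over the samples first, then compute the metric once.
pooled = [sum(col) for col in zip(*samples)]
pooled_sensitivity = metrics_from_counts(*pooled)["sensitivity"]
```

The two orders give 5/6 and 3/4 respectively, so the choice matters whenever samples differ in size or class balance. Summing and averaging the counts yield the same pooled ratio, because the sample count cancels out.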
Hausdorff distance

monai.metrics.compute_hausdorff_distance(y_pred, y, include_background=False, distance_metric='euclidean', percentile=None, directed=False)
Compute the Hausdorff distance.
 Parameters
y_pred (Union[ndarray, Tensor]) – input data to compute, typically segmentation model output. It must be in one-hot format with the batch as the first dimension, example shape: [16, 3, 32, 32]. The values should be binarized.
y (Union[ndarray, Tensor]) – ground truth to compute the distance. It must be in one-hot format with the batch as the first dimension. The values should be binarized.
include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.
distance_metric (str) – ["euclidean", "chessboard", "taxicab"] the metric used to compute surface distance. Defaults to "euclidean".
percentile (Optional[float]) – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff Distance is returned rather than the maximum. Defaults to None.
directed (bool) – whether to calculate directed Hausdorff distance. Defaults to False.
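The roles of distance_metric and directed can be illustrated on explicit point sets. This sketch assumes the surface points have already been extracted as coordinate tuples, treats the directed variant as going from the prediction to the ground truth (an assumption), and leaves out the percentile option. All function names are hypothetical.

```python
import math


def point_distance(a, b, distance_metric="euclidean"):
    """Distance between two coordinate tuples under the chosen metric."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if distance_metric == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if distance_metric == "chessboard":
        return max(diffs)
    if distance_metric == "taxicab":
        return sum(diffs)
    raise ValueError(f"unsupported distance_metric: {distance_metric}")


def directed_hausdorff(src, dst, distance_metric="euclidean"):
    """Largest distance from any point in src to its nearest point in dst."""
    return max(min(point_distance(s, d, distance_metric) for d in dst) for s in src)


def hausdorff(points_pred, points_gt, distance_metric="euclidean", directed=False):
    forward = directed_hausdorff(points_pred, points_gt, distance_metric)
    if directed:
        return forward
    backward = directed_hausdorff(points_gt, points_pred, distance_metric)
    return max(forward, backward)


pred_points = [(0, 0), (0, 1)]
gt_points = [(0, 0), (3, 4)]
d_directed = hausdorff(pred_points, gt_points, directed=True)
d_euclidean = hausdorff(pred_points, gt_points)
d_chessboard = hausdorff(pred_points, gt_points, distance_metric="chessboard")
d_taxicab = hausdorff(pred_points, gt_points, distance_metric="taxicab")
```

In this example the directed distance is 1, because every predicted point lies close to a ground-truth point, while the undirected distance is dominated by the ground-truth point (3, 4), which has no nearby prediction.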

class monai.metrics.HausdorffDistanceMetric(include_background=False, distance_metric='euclidean', percentile=None, directed=False, reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute the Hausdorff Distance between two tensors. It supports both multi-class and multi-label tasks, and both directed and non-directed Hausdorff distance calculation. In addition, specifying the percentile parameter returns the given percentile of the distance. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensors (CHW[D]) or a batch-first Tensor (BCHW[D]). The implementation refers to DeepMind’s implementation.
 Parameters
include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.
distance_metric (str) – ["euclidean", "chessboard", "taxicab"] the metric used to compute surface distance. Defaults to "euclidean".
percentile (Optional[float]) – an optional float number between 0 and 100. If specified, the corresponding percentile of the Hausdorff Distance is returned rather than the maximum. Defaults to None.
directed (bool) – whether to calculate directed Hausdorff distance. Defaults to False.
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of non-NaN values for the metric, so its shape equals the shape of the metric.
Average surface distance

monai.metrics.compute_average_surface_distance(y_pred, y, include_background=False, symmetric=False, distance_metric='euclidean')
This function computes the Average Surface Distance from y_pred to y under the default setting. In addition, if symmetric = True is set, the average symmetric surface distance between these two inputs is returned. The implementation refers to DeepMind’s implementation.
 Parameters
y_pred (Union[ndarray, Tensor]) – input data to compute, typically segmentation model output. It must be in one-hot format with the batch as the first dimension, example shape: [16, 3, 32, 32]. The values should be binarized.
y (Union[ndarray, Tensor]) – ground truth to compute the distance. It must be in one-hot format with the batch as the first dimension. The values should be binarized.
include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.
symmetric (bool) – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.
distance_metric (str) – ["euclidean", "chessboard", "taxicab"] the metric used to compute surface distance. Defaults to "euclidean".
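The effect of the symmetric flag can be sketched on explicit surface points. The example assumes Euclidean distance, point sets that are already extracted as coordinate tuples, and a symmetric variant that pools the nearest-neighbour distances of both directions before averaging (an assumption of this sketch). The function names are hypothetical, and `math.dist` requires Python 3.8 or later.

```python
import math


def nearest_distances(src, dst):
    """Euclidean distance from each point in src to its closest point in dst."""
    return [min(math.dist(s, d) for d in dst) for s in src]


def average_surface_distance(points_pred, points_gt, symmetric=False):
    """Mean nearest-neighbour distance from prediction to ground truth."""
    distances = nearest_distances(points_pred, points_gt)
    if symmetric:
        # Pool the distances of both directions before averaging.
        distances = distances + nearest_distances(points_gt, points_pred)
    return sum(distances) / len(distances)


pred_points = [(0, 0), (0, 1)]
gt_points = [(0, 0), (0, 3)]
asd = average_surface_distance(pred_points, gt_points)
assd = average_surface_distance(pred_points, gt_points, symmetric=True)
```

From prediction to ground truth the distances are 0 and 1, giving 0.5. Adding the reverse direction contributes 0 and 2, so the symmetric value is 0.75.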

class monai.metrics.SurfaceDistanceMetric(include_background=False, symmetric=False, distance_metric='euclidean', reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute the Surface Distance between two tensors. It supports both multi-class and multi-label tasks, and both symmetric and asymmetric surface distance calculation. Input y_pred is compared with ground truth y. y_pred is expected to have binarized predictions and y should be in one-hot format. You can use suitable transforms in monai.transforms.post first to achieve binarized values. y_pred and y can be a list of channel-first Tensors (CHW[D]) or a batch-first Tensor (BCHW[D]).
 Parameters
include_background (bool) – whether to include distance computation on the first channel of the predicted output. Defaults to False.
symmetric (bool) – whether to calculate the symmetric average surface distance between y_pred and y. Defaults to False.
distance_metric (str) – ["euclidean", "chessboard", "taxicab"] the metric used to compute surface distance. Defaults to "euclidean".
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans). Here not_nans counts the number of non-NaN values for the metric, so its shape equals the shape of the metric.
Mean squared error

class monai.metrics.MSEMetric(reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute Mean Squared Error between two tensors using function:
\[\operatorname{MSE}\left(Y, \hat{Y}\right) = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y_i}\right)^{2}.\]
More info: https://en.wikipedia.org/wiki/Mean_squared_error
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
 Parameters
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result of 1 batch data. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
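The MSE formula can be checked by hand with a few lines of plain Python. This is a sketch of the formula on flat lists, not the class above; `mse` is a hypothetical helper name.

```python
def mse(y_pred, y):
    """Mean of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for p, t in zip(y_pred, y)) / len(y)


# Squared errors are 0.25, 0.25, 0.0 and 1.0, so the mean is 0.375.
value = mse([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0])
```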
Mean absolute error

class monai.metrics.MAEMetric(reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute Mean Absolute Error between two tensors using function:
\[\operatorname{MAE}\left(Y, \hat{Y}\right) = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y_i}\right|.\]
More info: https://en.wikipedia.org/wiki/Mean_absolute_error
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
 Parameters
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result of 1 batch data. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
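A hand-checkable sketch of the MAE formula on flat Python lists follows; `mae` is a hypothetical helper, not part of the library.

```python
def mae(y_pred, y):
    """Mean of the absolute differences between targets and predictions."""
    return sum(abs(t - p) for p, t in zip(y_pred, y)) / len(y)


# Absolute errors are 0.5, 0.5, 0.0 and 1.0, so the mean is 0.5.
value = mae([2.5, 0.0, 2.0, 8.0], [3.0, -0.5, 2.0, 7.0])
```

Unlike the squared error, the absolute error weights the single large deviation (1.0) linearly, which makes MAE less sensitive to outliers.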
Root mean squared error

class monai.metrics.RMSEMetric(reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute Root Mean Squared Error between two tensors using function:
\[\operatorname{RMSE}\left(Y, \hat{Y}\right) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y_i}\right)^{2}} = \sqrt{\operatorname{MSE}\left(Y, \hat{Y}\right)}.\]
More info: https://en.wikipedia.org/wiki/Root-mean-square_deviation
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
 Parameters
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result of 1 batch data. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
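The relation RMSE = sqrt(MSE) can be sketched directly; `rmse` is a hypothetical helper on flat lists and is not the class documented above.

```python
import math


def rmse(y_pred, y):
    """Square root of the mean squared difference."""
    mean_squared = sum((t - p) ** 2 for p, t in zip(y_pred, y)) / len(y)
    return math.sqrt(mean_squared)


# Both squared errors are 4.0, so MSE is 4.0 and RMSE is 2.0.
value = rmse([0.0, 0.0], [2.0, -2.0])
```

Because of the square root, the result is expressed in the same unit as the inputs.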
Peak signal to noise ratio

class monai.metrics.PSNRMetric(max_val, reduction=<MetricReduction.MEAN: 'mean'>, get_not_nans=False)
Compute Peak Signal To Noise Ratio between two tensors using function:
\[\operatorname{PSNR}\left(Y, \hat{Y}\right) = 20 \cdot \log_{10}\left(\mathit{MAX}_Y\right) - 10 \cdot \log_{10}\left(\operatorname{MSE}\left(Y, \hat{Y}\right)\right)\]
More info: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
Help taken from: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/ops/image_ops_impl.py line 4139
Input y_pred is compared with ground truth y. Both y_pred and y are expected to be real-valued, where y_pred is output from a regression model.
 Parameters
max_val (Union[int, float]) – The dynamic range of the images/volumes (i.e., the difference between the maximum and the minimum allowed values, e.g. 255 for a uint8 image).
reduction (Union[MetricReduction, str]) – {"none", "mean", "sum", "mean_batch", "sum_batch", "mean_channel", "sum_channel"} Define the mode to reduce the computation result of 1 batch data. Defaults to "mean".
get_not_nans (bool) – whether to return the not_nans count. If True, aggregate() returns (metric, not_nans).
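A small sketch of the PSNR formula, 20·log10(max_val) − 10·log10(MSE), on flat Python lists is given below. `psnr` is a hypothetical helper, and returning infinity for identical inputs is a choice made for this sketch, not a statement about the library.

```python
import math


def psnr(y_pred, y, max_val):
    """Peak signal-to-noise ratio in decibels for two flat lists."""
    mean_squared = sum((t - p) ** 2 for p, t in zip(y_pred, y)) / len(y)
    if mean_squared == 0:
        # Assumption for this sketch: identical inputs give an infinite ratio.
        return float("inf")
    return 20 * math.log10(max_val) - 10 * math.log10(mean_squared)


# Values in [0, 1], so max_val is 1.0; both errors are 0.1, MSE is about 0.01.
value = psnr([0.5, 0.5], [0.6, 0.4], max_val=1.0)
```

With an MSE of about 0.01 and a dynamic range of 1.0, the result is about 20 dB; every tenfold reduction in MSE adds 10 dB.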