Applications#

Datasets#

class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#

Dataset that automatically downloads the MedNIST data and generates items for training, validation or test. It is based on CacheDataset to accelerate the training process.

Parameters:

root_dir – target directory to download and load MedNIST dataset.
section – expected data section, can be: training, validation or test.
transform – transforms to execute operations on input data.
download – whether to download and extract MedNIST from the resource link, default is False. If the expected file already exists, downloading is skipped even when this is set to True. Users can manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.
seed – random seed to randomly split training, validation and test datasets, default is 0.
val_frac – fraction of the whole dataset used for validation, default is 0.1.
test_frac – fraction of the whole dataset used for testing, default is 0.1.
cache_num – number of items to be cached. Default is sys.maxsize. The number actually cached is the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate – fraction of the total data to cache, default is 1.0 (cache all). The number actually cached is the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress – whether to display a progress bar when downloading dataset and computing the transform cache content.
copy_cache – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, you may set copy_cache=False for better performance.
as_contiguous – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. This may help improve the performance of subsequent operations.
runtime_cache – whether to compute the cache at runtime, default to False to prepare the cache content at initialization. See: monai.data.CacheDataset.

Raises:

ValueError – When root_dir is not a directory.
RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
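
The interaction between cache_num and cache_rate described above can be sketched in a few lines. This is a minimal illustration of the documented "minimum of" rule only; the helper name effective_cache_num is hypothetical and not part of MONAI.

```python
import sys


def effective_cache_num(data_length, cache_num=sys.maxsize, cache_rate=1.0):
    """Hypothetical helper: min(cache_num, data_length x cache_rate, data_length)."""
    return min(int(cache_num), int(data_length * cache_rate), data_length)


print(effective_cache_num(100))                   # defaults: every item is cached
print(effective_cache_num(100, cache_rate=0.25))  # limited by cache_rate
print(effective_cache_num(100, cache_num=10))     # limited by cache_num
```

With the defaults (cache_num=sys.maxsize, cache_rate=1.0) the whole dataset is cached, so either parameter can be lowered on its own to limit memory use.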

get_num_classes()[source]#

Get number of classes.

Return type: int

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises: NotImplementedError – When the subclass does not override this method.
Return type: None
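
The pattern described for randomize can be sketched as follows. This is a toy illustration, not MONAI code: the class RandomOffset is hypothetical, and the standard-library random.Random is swapped in as a stand-in for the NumPy random state that the documentation calls self.R.

```python
import random


class RandomOffset:
    """Hypothetical toy transform: every random draw goes through self.R in randomize()."""

    def __init__(self, seed=None):
        # Stand-in for the random state the documentation refers to as self.R.
        self.R = random.Random(seed)
        self.offset = 0

    def randomize(self, data):
        # The random factor depends on a property of the input (its length).
        self.offset = self.R.randint(0, len(data) - 1)

    def __call__(self, data):
        self.randomize(data)
        return data[self.offset:]


items = list(range(10))
a, b = RandomOffset(seed=0), RandomOffset(seed=0)
# Two instances with the same seed stay in sync call after call.
assert [a(items) for _ in range(3)] == [b(items) for _ in range(3)]
```

Keeping all draws inside one method means that two objects seeded identically can only diverge in that method, which narrows down where to look when results stop being reproducible.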

class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#

The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation or test. It also loads the dataset properties from the JSON config file of the dataset. Users can call get_properties() to get specified properties or all the loaded properties. It’s based on monai.data.CacheDataset to accelerate the training process.

Parameters:

root_dir – user’s local directory for caching and loading the MSD datasets.
task – which task to download and execute: one of list (“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”).
section – expected data section, can be: training, validation or test.
transform – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D].
download – whether to download and extract the Decathlon data from the resource link, default is False. If the expected file already exists, downloading is skipped even when this is set to True. Users can manually copy the tar file or dataset folder to the root directory.
val_frac – fraction of the whole dataset used for validation, default is 0.2.
seed – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Make sure to set the same seed for the training and validation sections.
cache_num – number of items to be cached. Default is sys.maxsize. The number actually cached is the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate – fraction of the total data to cache, default is 1.0 (cache all). The number actually cached is the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress – whether to display a progress bar when downloading dataset and computing the transform cache content.
copy_cache – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, you may set copy_cache=False for better performance.
as_contiguous – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. This may help improve the performance of subsequent operations.
runtime_cache – whether to compute the cache at runtime, default to False to prepare the cache content at initialization. See: monai.data.CacheDataset.

Raises:

ValueError – When root_dir is not a directory.
ValueError – When task is not one of [“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”].
RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).

Example:

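from monai.apps import DecathlonDataset
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ScaleIntensityd, ToTensord

transform = Compose(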
    [
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys=["image", "label"]),
        ScaleIntensityd(keys="image"),
        ToTensord(keys=["image", "label"]),
    ]
)

val_data = DecathlonDataset(
    root_dir="./", task="Task09_Spleen", transform=transform, section="validation", seed=12345, download=True
)

print(val_data[0]["image"], val_data[0]["label"])
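
The note on using the same seed for the training and validation sections can be illustrated with a small sketch. It assumes a plain seeded shuffle followed by a slice, which may differ from the library's actual splitting logic; split_indices is a hypothetical helper, not a MONAI function.

```python
import random


def split_indices(length, section, val_frac=0.2, seed=0):
    """Hypothetical helper: shuffle indices with a seed, then slice off the validation part."""
    indices = list(range(length))
    random.Random(seed).shuffle(indices)
    val_length = int(length * val_frac)
    if section == "training":
        return indices[val_length:]
    if section == "validation":
        return indices[:val_length]
    raise ValueError(f"unsupported section: {section}")


train = split_indices(10, "training", seed=12345)
val = split_indices(10, "validation", seed=12345)
# Same seed: the two sections are disjoint and together cover every index.
assert not set(train) & set(val)
assert sorted(train + val) == list(range(10))
```

If the two sections were created with different seeds, each would slice a differently shuffled list, so the same case could end up in both training and validation.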

get_indices()[source]#

Get the indices of datalist used in this dataset.

Return type: ndarray

get_properties(keys=None)[source]#

Get the loaded properties of dataset with specified keys. If no keys specified, return all the loaded properties.

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises: NotImplementedError – When the subclass does not override this method.
Return type: None

class monai.apps.TciaDataset(root_dir, collection, section, transform=(), download=False, download_len=-1, seg_type='SEG', modality_tag=(8, 96), ref_series_uid_tag=(32, 14), ref_sop_uid_tag=(8, 4437), specific_tags=((8, 4373), (8, 4416), (12294, 16), (32, 13), (16, 16), (16, 32), (32, 17), (32, 18)), fname_regex='^(?!.*LICENSE).*', seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=0.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#

The Dataset to automatically download the data from a public dataset of The Cancer Imaging Archive (TCIA) and generate items for training, validation or test.

The Highdicom library is used to load DICOM data with modality “SEG”, but only some collections are supported, such as “C4KC-KiTS”, “NSCLC-Radiomics”, “NSCLC-Radiomics-Interobserver1”, “QIN-PROSTATE-Repeatability” and “PROSTATEx”. Therefore, if “seg” is included in the keys of the LoadImaged transform when loading other collections, errors may be raised. For supported collections, the original “SEG” information may not be consistent across DICOM files. To avoid creating labels in different formats, use the label_dict argument of PydicomReader when calling the LoadImaged transform. The prepared label dicts for the collections mentioned above are also saved in monai.apps.tcia.TCIA_LABEL_DICT. See also the second example below.

This class is based on monai.data.CacheDataset to accelerate the training process.

Parameters:

root_dir – user’s local directory for caching and loading the TCIA dataset.
collection – name of a TCIA collection; a TCIA dataset is defined as a collection. Browse the available collections at the following link (only public collections can be downloaded): https://www.cancerimagingarchive.net/collections/
section – expected data section, can be: training, validation or test.
transform – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D]. If not specified, LoadImaged(reader="PydicomReader", keys=["image"]) will be used as the default transform. In addition, we suggest setting the label_dict argument of PydicomReader if segmentations need to be loaded. The original labels of each DICOM series may differ; this argument unifies the label format.
download – whether to download and extract the dataset, default is False. If the expected file already exists, downloading is skipped even when this is set to True. Users can manually copy the tar file or dataset folder to the root directory.
download_len – number of series to download. The value should be greater than 0, or -1 to download all series. Default is -1.
seg_type – modality type of segmentation that is used to do the first step download. Default is “SEG”.
modality_tag – tag of modality. Default is (0x0008, 0x0060).
ref_series_uid_tag – tag of referenced Series Instance UID. Default is (0x0020, 0x000e).
ref_sop_uid_tag – tag of referenced SOP Instance UID. Default is (0x0008, 0x1155).
specific_tags – tags that will be loaded for “SEG” series. This argument will be used in monai.data.PydicomReader. Default is [(0x0008, 0x1115), (0x0008,0x1140), (0x3006, 0x0010), (0x0020,0x000D), (0x0010,0x0010), (0x0010,0x0020), (0x0020,0x0011), (0x0020,0x0012)].
fname_regex – a regular expression to match the file names when the input is a folder. If provided, only the matched files will be included. For example, to include the file name “image_0001.dcm”, the regular expression could be “.*image_(\d+).dcm”. Default to “^(?!.*LICENSE).*”, ignoring any file name containing “LICENSE”.
val_frac – fraction of the whole dataset used for validation, default is 0.2.
seed – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Make sure to set the same seed for the training and validation sections.
cache_num – number of items to be cached. Default is sys.maxsize. The number actually cached is the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate – fraction of the total data to cache, default is 0.0 (no cache). The number actually cached is the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress – whether to display a progress bar when downloading dataset and computing the transform cache content.
copy_cache – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, you may set copy_cache=False for better performance.
as_contiguous – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. This may help improve the performance of subsequent operations.
runtime_cache – whether to compute the cache at runtime, default to False to prepare the cache content at initialization. See: monai.data.CacheDataset.
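
The two fname_regex patterns mentioned in the parameter list can be tried out with the standard re module. This sketch assumes the pattern is applied with re.match to each file name, which may not be exactly how the reader applies it; the file names are hypothetical.

```python
import re

# Default pattern from the signature: keep names that do not contain "LICENSE".
default_regex = r"^(?!.*LICENSE).*"
# Example pattern from the parameter description.
custom_regex = r".*image_(\d+).dcm"

names = ["1-001.dcm", "LICENSE", "image_0001.dcm", "LICENSE.txt"]  # hypothetical file names

kept_default = [n for n in names if re.match(default_regex, n)]
kept_custom = [n for n in names if re.match(custom_regex, n)]
print(kept_default)
print(kept_custom)
```

The negative lookahead (?!.*LICENSE) rejects a name as soon as "LICENSE" appears anywhere in it, so license files shipped with a collection are not treated as DICOM slices.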

Example:

# collection is "Pancreatic-CT-CBCT-SEG", seg_type is "RTSTRUCT"
# section is a required argument; "training" is used here for illustration
data = TciaDataset(
    root_dir="./", collection="Pancreatic-CT-CBCT-SEG", section="training", seg_type="RTSTRUCT", download=True
)

# collection is "C4KC-KiTS", seg_type is "SEG", and load both images and segmentations
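from monai.apps import TciaDataset
from monai.apps.tcia import TCIA_LABEL_DICT
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ResampleToMatchd

transform = Compose(
    [
        LoadImaged(reader="PydicomReader", keys=["image", "seg"], label_dict=TCIA_LABEL_DICT["C4KC-KiTS"]),
        EnsureChannelFirstd(keys=["image", "seg"]),
        ResampleToMatchd(keys="image", key_dst="seg"),
    ]
)
# pass the transform so that the "seg" key is loaded
data = TciaDataset(
    root_dir="./", collection="C4KC-KiTS", section="validation", transform=transform, seed=12345, download=True
)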

print(data[0]["seg"].shape)

get_indices()[source]#

Get the indices of datalist used in this dataset.

Return type: ndarray

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises: NotImplementedError – When the subclass does not override this method.
Return type: None

class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)[source]#

Cross-validation dataset built on a general dataset class, which must provide the _split_datalist API.

Parameters:

dataset_cls (object) – dataset class to be used to create the cross-validation partitions. It must provide the _split_datalist API.
nfolds (int) – number of folds to split the data for cross validation.
seed (int) – random seed to randomly shuffle the datalist before splitting into N folds, default is 0.
dataset_params (Any) – other additional parameters for the dataset_cls base class.

Example of 5 folds cross validation training:

cvdataset = CrossValidation(
    dataset_cls=DecathlonDataset,
    nfolds=5,
    seed=12345,
    root_dir="./",
    task="Task09_Spleen",
    section="training",
    transform=train_transform,
    download=True,
)
dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
dataset_fold0_val = cvdataset.get_dataset(folds=0, transform=val_transform, download=False)
# execute training for fold 0 ...

dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
dataset_fold1_val = cvdataset.get_dataset(folds=1, transform=val_transform, download=False)
# execute training for fold 1 ...

...

dataset_fold4_train = ...
# execute training for fold 4 ...

get_dataset(folds, **dataset_params)[source]#

Generate dataset based on the specified fold indices in the cross validation group.

Parameters:

folds – index of the folds for training or validation. If a list of values is given, the data of those folds is concatenated.
dataset_params – other additional parameters for the dataset_cls base class. These override the same parameters in self.dataset_params.
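
How fold indices map to data can be sketched without any dataset class. This is an illustration under assumptions: the partitioning scheme (seeded shuffle, then every nfolds-th item) is chosen for simplicity and is not necessarily what the library does, and both helper names are hypothetical.

```python
import random


def split_into_folds(datalist, nfolds=5, seed=0):
    """Hypothetical helper: shuffle with a seed, then deal items into nfolds groups."""
    items = list(datalist)
    random.Random(seed).shuffle(items)
    # Fold i takes every nfolds-th item starting at i, so sizes differ by at most one.
    return [items[i::nfolds] for i in range(nfolds)]


def select_folds(folds_data, folds):
    """Hypothetical helper: a single index selects one fold, a list concatenates several."""
    if isinstance(folds, int):
        folds = [folds]
    return [item for i in folds for item in folds_data[i]]


data = [f"case_{i:02d}" for i in range(10)]
parts = split_into_folds(data, nfolds=5, seed=12345)
fold0_train = select_folds(parts, [1, 2, 3, 4])
fold0_val = select_folds(parts, 0)
print(len(fold0_train), len(fold0_val))
```

Because the shuffle is seeded once, every fold combination is drawn from the same partition, which is what keeps the training and validation sets of a given round disjoint.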

Clara MMARs#

monai.apps.download_mmar(item, mmar_dir=None, progress=True, api=True, version=-1)[source]#

Download and extract a Medical Model Archive (MMAR) from NVIDIA Clara Train.

Utilities#

monai.apps.check_hash(filepath, val=None, hash_type='md5')[source]#

Verify hash signature of specified file.

Parameters:

filepath – path of source file to verify hash value.
val – expected hash value of the file.
hash_type – type of hash algorithm to use, default is “md5”. The supported hash types are “md5”, “sha1”, “sha256”, “sha512”. See also: monai.apps.utils.SUPPORTED_HASH_TYPES.
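
A file hash check of this kind can be sketched with the standard hashlib module. This is not the library's implementation: file_hash_matches is a hypothetical helper, and treating val=None as "nothing to verify" is an assumption made for the sketch.

```python
import hashlib
import os
import tempfile

SUPPORTED = ("md5", "sha1", "sha256", "sha512")  # hash types listed in the documentation


def file_hash_matches(filepath, val=None, hash_type="md5"):
    """Hypothetical helper: compare the file's digest with the expected hex string."""
    if val is None:
        return True  # assumption: no expected value means nothing to verify
    if hash_type not in SUPPORTED:
        raise ValueError(f"unsupported hash type: {hash_type}")
    hasher = hashlib.new(hash_type)
    with open(filepath, "rb") as f:
        # Read in chunks so that large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == val


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    expected = hashlib.md5(b"hello").hexdigest()
    print(file_hash_matches(path, expected))            # digest matches
    print(file_hash_matches(path, "0" * 32))            # digest differs
    print(file_hash_matches(path, hashlib.sha256(b"hello").hexdigest(), "sha256"))
```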

monai.apps.download_url(url, filepath='', hash_val=None, hash_type='md5', progress=True, **gdown_kwargs)[source]#

Download a file from the specified URL, with support for a progress bar and hash check.

Parameters:

url – source URL link to download file.
filepath – target filepath to save the downloaded file (including the filename). If undefined, os.path.basename(url) will be used.
hash_val – expected hash value to validate the downloaded file. if None, skip hash validation.
hash_type – ‘md5’ or ‘sha1’, defaults to ‘md5’.
progress – whether to display a progress bar.
gdown_kwargs – other arguments for gdown, except for url, output and quiet. These arguments are only used when downloading from Google Drive. For details of the arguments, see wkentaro/gdown.

Raises:

RuntimeError – When the hash validation of the filepath existing file fails.
RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.
URLError – See urllib.request.urlretrieve.
HTTPError – See urllib.request.urlretrieve.
ContentTooShortError – See urllib.request.urlretrieve.
IOError – See urllib.request.urlretrieve.
RuntimeError – When the hash validation of the url downloaded file fails.

monai.apps.extractall(filepath, output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True)[source]#

Extract file to the output directory. Expected file types are: zip, tar.gz and tar.

Parameters:

filepath – the file path of compressed file.
output_dir – target directory to save extracted files.
hash_val – expected hash value to validate the compressed file. if None, skip hash validation.
hash_type – ‘md5’ or ‘sha1’, defaults to ‘md5’.
file_type – string of file type for decompressing. Leave it empty to infer the type from the filepath basename.
has_base – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.

Raises:

RuntimeError – When the hash validation of the filepath compressed file fails.
NotImplementedError – When the filepath file extension is not one of [“zip”, “tar.gz”, “tar”].
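
The way an empty file_type falls back to the file name can be sketched as below. This is a simplified illustration, not the library's code; infer_archive_type is a hypothetical helper that only covers the three documented types.

```python
import os


def infer_archive_type(filepath, file_type=""):
    """Hypothetical helper: use file_type if given, otherwise the basename of filepath."""
    name = (file_type or os.path.basename(filepath)).lower()
    # Check "tar.gz" before "tar" so the longer suffix is reported for .tar.gz files.
    for suffix in ("tar.gz", "zip", "tar"):
        if name.endswith(suffix):
            return suffix
    raise NotImplementedError(f"unsupported archive type for: {filepath}")


print(infer_archive_type("downloads/MedNIST.tar.gz"))
print(infer_archive_type("downloads/archive", file_type="zip"))
```

Passing file_type explicitly is useful when the download path carries no recognizable extension, as in the second call.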

monai.apps.download_and_extract(url, filepath='', output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True, progress=True)[source]#

Download file from URL and extract it to the output directory.

Parameters:

url – source URL link to download file.
filepath – the file path of the downloaded compressed file. use this option to keep the directly downloaded compressed file, to avoid further repeated downloads.
output_dir – target directory to save extracted files. default is the current directory.
hash_val – expected hash value to validate the downloaded file. if None, skip hash validation.
hash_type – ‘md5’ or ‘sha1’, defaults to ‘md5’.
file_type – string of file type for decompressing. Leave it empty to infer the type from url’s base file name.
has_base – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.
progress – whether to display progress bar.

Deepgrow#

monai.apps.deepgrow.dataset.create_dataset(datalist, output_dir, dimension, pixdim, image_key='image', label_key='label', base_dir=None, limit=0, relative_path=False, transforms=None)[source]#

Utility to pre-process and create a dataset list for Deepgrow training from an existing one. The input data list is normally a list of images and labels (3D volumes) that need pre-processing for the Deepgrow training pipeline.

Parameters:

datalist –
A list of data dictionaries. Each entry should contain at least ‘image_key’: <image filename>. For example, typical input data can be a list of dictionaries:
```
[{'image': <image filename>, 'label': <label filename>}]
```
output_dir – target directory to store the training data for Deepgrow Training
pixdim – output voxel spacing.
dimension – dimension for Deepgrow training. It can be 2 or 3.
image_key – image key in input datalist. Defaults to ‘image’.
label_key – label key in input datalist. Defaults to ‘label’.
base_dir – base directory in case relative paths are used for the keys in datalist. Defaults to None.
limit – limit number of inputs for pre-processing. Defaults to 0 (no limit).
relative_path – output keys values should be based on relative path. Defaults to False.
transforms – explicit transforms to execute operations on input data.

Raises:

ValueError – When dimension is not one of [2, 3]
ValueError – When datalist is empty.

Returns:

A new datalist that contains path to the images/labels after pre-processing.

Example:

datalist = create_dataset(
    datalist=[{'image': 'img1.nii', 'label': 'label1.nii'}],
    base_dir=None,
    output_dir=output_2d,
    dimension=2,
    image_key='image',
    label_key='label',
    pixdim=(1.0, 1.0),
    limit=0,
    relative_path=True
)

print(datalist[0]["image"], datalist[0]["label"])

class monai.apps.deepgrow.interaction.Interaction(transforms, max_interactions, train, key_probability='probability')[source]#

Ignite process_function used to introduce interactions (simulation of clicks) for Deepgrow Training/Evaluation. For more details please refer to: https://pytorch.org/ignite/generated/ignite.engine.engine.Engine.html. This implementation is based on:

Sakinis et al., Interactive segmentation of medical images through fully convolutional neural networks. (2019) https://arxiv.org/abs/1903.08205

Parameters:

transforms – execute additional transformation during every iteration (before train). Typically, several Tensor based transforms composed by Compose.
max_interactions – maximum number of interactions per iteration
train – training or evaluation
key_probability – field name to fill probability for every interaction

class monai.apps.deepgrow.transforms.AddInitialSeedPointd(label='label', guidance='guidance', sids='sids', sid='sid', connected_regions=5)[source]#

Add random guidance as initial seed point for a given label.

Note that the label is of size (C, D, H, W) or (C, H, W)

The guidance is of size (2, N, # of dims), where N is the number of guidance points added. # of dims = 4 when the label is (C, D, H, W); # of dims = 3 when the label is (C, H, W).

Parameters:

label (str) – label source.
guidance (str) – key to store guidance.
sids (str) – key that represents list of valid slice indices for the given label.
sid (str) – key that represents the slice to add initial seed point. If not present, random sid will be chosen.
connected_regions (int) – maximum connected regions to use for adding initial points.

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises: NotImplementedError – When the subclass does not override this method.

class monai.apps.deepgrow.transforms.AddGuidanceSignald(image='image', guidance='guidance', sigma=2, number_intensity_ch=1)[source]#

Add Guidance signal for input image.

Based on the “guidance” points, apply gaussian to them and add them as new channel for input image.

Parameters:

image (str) – key to the image source.
guidance (str) – key to store guidance.
sigma (int) – standard deviation for Gaussian kernel.
number_intensity_ch (int) – channel index.

class monai.apps.deepgrow.transforms.AddRandomGuidanced(guidance='guidance', discrepancy='discrepancy', probability='probability')[source]#

Add random guidance based on discrepancies that were found between label and prediction. Input shapes are as below:

Guidance is of shape (2, N, # of dim)
Discrepancy is of shape (2, C, D, H, W) or (2, C, H, W)
Probability is of shape (1)

Parameters:

guidance (str) – key to guidance source.
discrepancy (str) – key that represents discrepancies found between label and prediction.
probability (str) – key that represents click/interaction probability.

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises: NotImplementedError – When the subclass does not override this method.

class monai.apps.deepgrow.transforms.AddGuidanceFromPointsd(ref_image, guidance='guidance', foreground='foreground', background='background', axis=0, depth_first=True, spatial_dims=2, slice_key='slice', meta_keys=None, meta_key_postfix='meta_dict')[source]#

Add guidance based on user clicks.

We assume the input is loaded by LoadImaged and has the shape of (H, W, D) originally. Clicks always specify the coordinates in (H, W, D)

If depth_first is True:

Input is now of shape (D, H, W), will return guidance that specifies the coordinates in (D, H, W)

else:

Input is now of shape (H, W, D), will return guidance that specifies the coordinates in (H, W, D)

Parameters:

ref_image – key to reference image to fetch current and original image details.
guidance – output key to store guidance.
foreground – key that represents user foreground (+ve) clicks.
background – key that represents user background (-ve) clicks.
axis – axis that represents slices in 3D volume. (axis to Depth)
depth_first – if depth (slices) is positioned at first dimension.
spatial_dims – dimensions based on model used for deepgrow (2D vs 3D).
slice_key – key that represents applicable slice to add guidance.
meta_keys – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
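
The effect of depth_first on click coordinates can be sketched as a simple reordering. This illustration covers only the axis order and ignores any rescaling between the original and current image shape that the transform may apply; reorder_clicks is a hypothetical helper.

```python
def reorder_clicks(clicks, depth_first=True):
    """Hypothetical helper: clicks arrive as (H, W, D); return them in the image's axis order."""
    if depth_first:
        # The image is laid out as (D, H, W), so move the depth coordinate to the front.
        return [(d, h, w) for h, w, d in clicks]
    return [tuple(c) for c in clicks]


foreground = [(10, 20, 3), (40, 50, 7)]  # hypothetical user clicks in (H, W, D)
print(reorder_clicks(foreground, depth_first=True))
print(reorder_clicks(foreground, depth_first=False))
```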

class monai.apps.deepgrow.transforms.SpatialCropForegroundd(keys, source_key, spatial_size, select_fn=<function is_positive>, channel_indices=None, margin=0, allow_smaller=True, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Crop only the foreground object of the expected images.

Differences from monai.transforms.CropForegroundd:

If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.

This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]

The typical usage is to help training and evaluation if the valid part is small in the whole medical image. The valid part can be determined by any field in the data with source_key, for example:

Select values > 0 in image field as the foreground and crop on all fields specified by keys.
Select label = 3 in label field as the foreground to crop on all fields specified by keys.
Select label > 0 in the third channel of a One-Hot label field as the foreground to crop all keys fields.

Users can define an arbitrary function to select the expected foreground from the whole source image or from specified channels. A margin can also be added to every dimension of the foreground bounding box.

Parameters:

keys – keys of the corresponding items to be transformed. See also: monai.transforms.MapTransform
source_key – data source to generate the bounding box of foreground, can be image or label, etc.
spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.
select_fn – function to select expected foreground, default is to select values > 0.
channel_indices – if defined, select foreground only on the specified channels of image. if None, select foreground on the whole image.
margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
allow_smaller – when computing the box size with margin, whether to allow the image size to be smaller than the box size, default to True. If the margined size is bigger than the image size, the image will be padded with the specified mode.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch/store the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key – key to record the start coordinate of spatial bounding box for foreground.
end_coord_key – key to record the end coordinate of spatial bounding box for foreground.
original_shape_key – key to record original shape for foreground.
cropped_shape_key – key to record cropped shape for foreground.
allow_missing_keys – don’t raise exception if key is missing.
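
The center-crop rule that distinguishes this transform from CropForegroundd can be sketched as below. It is a simplified illustration: crop_bounds is a hypothetical helper, clamping the start at zero is an assumption, and padding or clamping at the far image border is not modeled.

```python
def crop_bounds(box_start, box_end, spatial_size):
    """Hypothetical helper: return (start, end) of the crop for a foreground bounding box."""
    box_size = [e - s for s, e in zip(box_start, box_end)]
    if all(b < t for b, t in zip(box_size, spatial_size)):
        # The box is smaller than spatial_size in every dimension:
        # crop a spatial_size window around the box center instead.
        center = [(s + e) // 2 for s, e in zip(box_start, box_end)]
        start = [max(c - t // 2, 0) for c, t in zip(center, spatial_size)]
        end = [s + t for s, t in zip(start, spatial_size)]
        return start, end
    # Otherwise the foreground box itself is used.
    return list(box_start), list(box_end)


print(crop_bounds((10, 12), (20, 18), (16, 16)))  # small box: window around its center
print(crop_bounds((0, 0), (40, 6), (16, 16)))     # box exceeds spatial_size in one dimension
```

Cropping around the center keeps small objects inside a patch of at least spatial_size, so the network always sees a minimum amount of context.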

class monai.apps.deepgrow.transforms.SpatialCropGuidanced(keys, guidance, spatial_size, margin=20, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Crop image based on guidance with minimal spatial size.

If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.
This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]

Input data is of shape (C, spatial_1, [spatial_2, …])

Parameters:

keys – keys of the corresponding items to be transformed.
guidance – key to the guidance. It is used to generate the bounding box of foreground
spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.
margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key – key to record the start coordinate of spatial bounding box for foreground.
end_coord_key – key to record the end coordinate of spatial bounding box for foreground.
original_shape_key – key to record original shape for foreground.
cropped_shape_key – key to record cropped shape for foreground.
allow_missing_keys – don’t raise exception if key is missing.
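
The cropping rule described above can be illustrated with a small, library-free sketch. It is not MONAI's implementation: the helper name `crop_box_from_guidance`, the plain-list inputs, and the clipping behaviour at the image border are assumptions made for illustration. The box is left unchanged when it already exceeds spatial_size in some dimension.

```python
def crop_box_from_guidance(points, image_shape, spatial_size, margin=20):
    """Return (start, end) of a crop box around the guidance points.

    Hypothetical helper: `points` is a list of spatial coordinates and
    `image_shape` excludes the channel dimension.
    """
    ndim = len(image_shape)
    # Bounding box of the guidance points, grown by the margin and clipped to the image.
    start = [max(min(p[d] for p in points) - margin, 0) for d in range(ndim)]
    end = [min(max(p[d] for p in points) + margin + 1, image_shape[d]) for d in range(ndim)]

    # If the box is smaller than spatial_size in all dimensions,
    # crop around the box center using spatial_size instead.
    if all(end[d] - start[d] < spatial_size[d] for d in range(ndim)):
        for d in range(ndim):
            center = (start[d] + end[d]) // 2
            lo = max(center - spatial_size[d] // 2, 0)
            hi = min(lo + spatial_size[d], image_shape[d])
            lo = max(hi - spatial_size[d], 0)  # shift back if clipped at the far edge
            start[d], end[d] = lo, hi
    return start, end


start, end = crop_box_from_guidance(
    points=[(40, 50, 60), (44, 52, 61)],
    image_shape=(100, 100, 100),
    spatial_size=(64, 64, 64),
    margin=5,
)
```

In this example the guidance box is only a few voxels wide, so the result is a 64-voxel cube centered on it.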

class monai.apps.deepgrow.transforms.RestoreLabeld(keys, ref_image, slice_only=False, mode=nearest, align_corners=None, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Restores label based on the ref image.

The ref_image is assumed to have gone through the following transforms:

Fetch2DSliced (If 2D)

Spacingd

SpatialCropGuidanced

Resized

And its shape is assumed to be (C, D, H, W)

This transform tries to undo these operations so that the resulting label can be overlaid on the original volume. It performs the following operations:

Undo Resized

Undo SpatialCropGuidanced

Undo Spacingd

Undo Fetch2DSliced

The resulting label is of shape (D, H, W)

Parameters:

keys – keys of the corresponding items to be transformed.
ref_image – reference image to fetch current and original image details
slice_only – apply only to an applicable slice, in case of 2D model/prediction
mode – {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"} One of the listed string values or a user supplied function for padding. Defaults to "constant". See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html
align_corners – Geometrically, we consider the pixels of the input as squares rather than points. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html It also can be a sequence of bool, each element corresponds to a key in keys.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key – key that records the start coordinate of spatial bounding box for foreground.
end_coord_key – key that records the end coordinate of spatial bounding box for foreground.
original_shape_key – key that records original shape for foreground.
cropped_shape_key – key that records cropped shape for foreground.
allow_missing_keys – don’t raise exception if key is missing.
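
As a rough illustration of the "Undo SpatialCropGuidanced" step only, the sketch below pastes a cropped 2D label back into an all-zero array of the original shape, using recorded start and end coordinates. The function name and the nested-list representation are assumptions for illustration. The resize, spacing, and slice steps are not shown.

```python
def paste_crop_back(cropped, original_shape, start, end):
    """Place a cropped 2D label into a zero array of the original shape (hypothetical helper)."""
    rows, cols = original_shape
    restored = [[0] * cols for _ in range(rows)]
    for i in range(start[0], end[0]):
        for j in range(start[1], end[1]):
            restored[i][j] = cropped[i - start[0]][j - start[1]]
    return restored


cropped_label = [[1, 1], [0, 1]]
restored = paste_crop_back(cropped_label, original_shape=(4, 5), start=(1, 2), end=(3, 4))
```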

class monai.apps.deepgrow.transforms.ResizeGuidanced(guidance, ref_image, meta_keys=None, meta_key_postfix='meta_dict', cropped_shape_key='foreground_cropped_shape')[source]#

Resize the guidance based on cropped vs resized image.

This transform assumes that the images have been cropped and resized, and that the shape after cropping is stored inside the meta dict of the ref image.

Parameters:

guidance – key to guidance
ref_image – key to reference image to fetch current and original image details
meta_keys – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
cropped_shape_key – key that records cropped shape for foreground.
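
One way to picture the resizing is as a per-dimension rescaling of the click coordinates by the ratio of the resized shape to the cropped shape. The sketch below assumes that interpretation. The helper name and the truncation to integers are illustrative choices, not a statement of the library's exact behaviour.

```python
def resize_guidance(points, cropped_shape, current_shape):
    """Scale click coordinates from the cropped shape to the current (resized) shape."""
    factor = [cur / crop for cur, crop in zip(current_shape, cropped_shape)]
    # int() truncates toward zero; a rounding policy is an assumption here.
    return [[int(p[d] * factor[d]) for d in range(len(factor))] for p in points]


resized = resize_guidance(
    points=[(10, 20, 5), (31, 0, 15)],
    cropped_shape=(64, 64, 32),
    current_shape=(128, 128, 64),
)
```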

class monai.apps.deepgrow.transforms.FindDiscrepancyRegionsd(label='label', pred='pred', discrepancy='discrepancy')[source]#

Find discrepancies between the prediction and the ground-truth label during click interactions in training.

Parameters:

label (str) – key to label source.
pred (str) – key to prediction source.
discrepancy (str) – key to store discrepancies found between label and prediction.
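
A minimal sketch of what a discrepancy between binary label and prediction can look like, assuming it is split into a positive part (label set, prediction missing) and a negative part (prediction set, label missing). The function name and the two-map return value are assumptions for illustration.

```python
def find_discrepancy(label, pred):
    """Return (positive, negative) discrepancy maps for 2D binary inputs."""
    # Positive: foreground in the label that the prediction missed.
    positive = [[1 if lab > prd else 0 for lab, prd in zip(lr, pr)] for lr, pr in zip(label, pred)]
    # Negative: predicted foreground that is background in the label.
    negative = [[1 if lab < prd else 0 for lab, prd in zip(lr, pr)] for lr, pr in zip(label, pred)]
    return positive, negative


label = [[1, 1, 0], [0, 1, 0]]
pred = [[1, 0, 0], [1, 1, 0]]
positive, negative = find_discrepancy(label, pred)
```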

class monai.apps.deepgrow.transforms.FindAllValidSlicesd(label='label', sids='sids')[source]#

Find/List all valid slices in the label. Label is assumed to be a 4D Volume with shape CDHW, where C=1.

Parameters:

label (str) – key to the label source.
sids (str) – key to store the indices of slices that have a valid label map.
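
The idea can be sketched without any library, assuming that a "valid" slice is one containing at least one non-zero label value. The nested-list CDHW layout and the helper name are illustrative.

```python
def find_valid_slices(label):
    """Return indices along D of slices that contain any non-zero value (label is CDHW, C=1)."""
    volume = label[0]  # drop the single channel dimension
    return [d for d, plane in enumerate(volume) if any(v != 0 for row in plane for v in row)]


label = [[
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
    [[2, 0], [0, 0]],
]]
sids = find_valid_slices(label)
```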

class monai.apps.deepgrow.transforms.Fetch2DSliced(keys, guidance='guidance', axis=0, meta_keys=None, meta_key_postfix='meta_dict', allow_missing_keys=False)[source]#

Fetch one slice in case of a 3D volume.

The volume only contains spatial coordinates.

Parameters:

keys – keys of the corresponding items to be transformed.
guidance – key that represents guidance.
axis – axis that represents slice in 3D volume.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
allow_missing_keys – don’t raise exception if key is missing.

Pathology#

class monai.apps.pathology.inferers.SlidingWindowHoVerNetInferer(roi_size, sw_batch_size=1, overlap=0.25, mode=constant, sigma_scale=0.125, padding_mode=constant, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False, cpu_thresh=None, extra_input_padding=None)[source]#

Sliding window method for HoVerNet model inference, with sw_batch_size windows for every model.forward(). Usage example can be found in the monai.inferers.Inferer base class.

Parameters:

roi_size – the window size to execute SlidingWindow evaluation. If any component of roi_size is non-positive, the corresponding component of the img size is used instead. For example, roi_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.
sw_batch_size – the batch size to run window slices.
overlap – Amount of overlap between scans.
mode – {"constant", "gaussian"} How to blend output of overlapping windows. Defaults to "constant".
- "constant": gives equal weight to all predictions.
- "gaussian": gives less weight to predictions on edges of windows.
sigma_scale – the standard deviation coefficient of the Gaussian window when mode is "gaussian". Default: 0.125. Actual window sigma is sigma_scale * dim_size. When sigma_scale is a sequence of floats, the values denote sigma_scale at the corresponding spatial dimensions.
padding_mode – {"constant", "reflect", "replicate", "circular"} Padding mode when roi_size is larger than inputs. Defaults to "constant" See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
cval – fill value for ‘constant’ padding mode. Default: 0
sw_device – device for the window data. By default the device (and accordingly the memory) of the inputs is used. Normally sw_device should be consistent with the device where predictor is defined.
device – device for the stitched output prediction. By default the device (and accordingly the memory) of the inputs is used. If for example set to device=torch.device('cpu') the gpu memory consumption is less and independent of the inputs and roi_size. Output is on the device.
progress – whether to print a tqdm progress bar.
cache_roi_weight_map – whether to pre-compute the ROI weight map.
cpu_thresh – when provided, dynamically switch to stitching on cpu (to save gpu memory) when input image volume is larger than this threshold (in pixels/voxels). Otherwise use "device". Thus, the output may end-up on either cpu or gpu.
extra_input_padding – the amount of padding for the input image, which is a tuple of even number of pads. Refer to the pad argument of torch.nn.functional.pad for more details.

Note

sw_batch_size denotes the max number of windows per network inference iteration, not the batch size of inputs.
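
Two of the parameter rules above are simple enough to sketch in plain Python: replacing non-positive roi_size components with the image size, and deriving the Gaussian window sigma as sigma_scale * dim_size. The helper names are hypothetical and this is not the inferer's own code.

```python
def adapt_roi_size(roi_size, image_size):
    """Replace non-positive roi_size components with the matching image dimension."""
    return tuple(i if r <= 0 else r for r, i in zip(roi_size, image_size))


def gaussian_sigmas(roi_size, sigma_scale=0.125):
    """Per-dimension window sigma; sigma_scale may be a float or a sequence of floats."""
    if isinstance(sigma_scale, (list, tuple)):
        scales = list(sigma_scale)
    else:
        scales = [sigma_scale] * len(roi_size)
    return tuple(s * r for s, r in zip(scales, roi_size))


roi = adapt_roi_size((32, -1), (128, 64))
sigmas = gaussian_sigmas(roi)
```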

class monai.apps.pathology.losses.hovernet_loss.HoVerNetLoss(lambda_hv_mse=2.0, lambda_hv_mse_grad=1.0, lambda_np_ce=1.0, lambda_np_dice=1.0, lambda_nc_ce=1.0, lambda_nc_dice=1.0)[source]#

Loss function for HoVerNet pipeline, which is a combination of losses across the three branches. The NP (nucleus prediction) branch uses Dice + CrossEntropy. The HV (Horizontal and Vertical) distance from centroid branch uses MSE + MSE of the gradient. The NC (Nuclear Class prediction) branch uses Dice + CrossEntropy. The result is a weighted sum of these losses.

Parameters:

lambda_hv_mse (float) – Weight factor to apply to the HV regression MSE part of the overall loss
lambda_hv_mse_grad (float) – Weight factor to apply to the MSE of the HV gradient part of the overall loss
lambda_np_ce (float) – Weight factor to apply to the nuclei prediction CrossEntropyLoss part of the overall loss
lambda_np_dice (float) – Weight factor to apply to the nuclei prediction DiceLoss part of overall loss
lambda_nc_ce (float) – Weight factor to apply to the nuclei class prediction CrossEntropyLoss part of the overall loss
lambda_nc_dice (float) – Weight factor to apply to the nuclei class prediction DiceLoss part of the overall loss

forward(prediction, target)[source]#

Parameters:

prediction (dict[str, Tensor]) – dictionary of predicted outputs for three branches, each of which should have the shape of BNHW.
target (dict[str, Tensor]) – dictionary of ground truths for three branches, each of which should have the shape of BNHW.

Return type:

Tensor
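
The weighted sum can be written out explicitly. The sketch below assumes the six component losses have already been computed as plain floats. The dictionary keys and the function name are hypothetical, chosen only to mirror the lambda_* parameter names.

```python
def combine_hovernet_losses(
    parts,
    lambda_hv_mse=2.0,
    lambda_hv_mse_grad=1.0,
    lambda_np_ce=1.0,
    lambda_np_dice=1.0,
    lambda_nc_ce=1.0,
    lambda_nc_dice=1.0,
):
    """Weighted sum of per-branch loss values (illustrative keys)."""
    return (
        lambda_hv_mse * parts["hv_mse"]
        + lambda_hv_mse_grad * parts["hv_mse_grad"]
        + lambda_np_ce * parts["np_ce"]
        + lambda_np_dice * parts["np_dice"]
        + lambda_nc_ce * parts["nc_ce"]
        + lambda_nc_dice * parts["nc_dice"]
    )


parts = {
    "hv_mse": 0.5,
    "hv_mse_grad": 0.25,
    "np_ce": 1.0,
    "np_dice": 0.5,
    "nc_ce": 0.75,
    "nc_dice": 0.25,
}
total = combine_hovernet_losses(parts)
```

With the default weights, the HV regression MSE term counts twice as much as each of the other terms.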

class monai.apps.pathology.metrics.LesionFROC(data, grow_distance=75, itc_diameter=200, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8), nms_sigma=0.0, nms_prob_threshold=0.5, nms_box_size=48, image_reader_name='cuCIM')[source]#

Evaluate with Free Response Operating Characteristic (FROC) score.

Parameters:

data (list[dict]) – either the list of dictionaries containing probability maps (inference result) and tumor mask (ground truth), as below, or the path to a json file containing such list. { "prob_map": "path/to/prob_map_1.npy", "tumor_mask": "path/to/ground_truth_1.tiff", "level": 6, "pixel_spacing": 0.243 }
grow_distance (int) – Euclidean distance (in micrometer) by which to grow the labels of the ground truth’s tumors. Defaults to 75, which is the equivalent size of 5 tumor cells.
itc_diameter (int) – the maximum diameter of a region (in micrometer) to be considered as an isolated tumor cell. Defaults to 200.
eval_thresholds (tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8) which is the same as the CAMELYON 16 Challenge.
nms_sigma (float) – the standard deviation for gaussian filter of non-maximal suppression. Defaults to 0.0.
nms_prob_threshold (float) – the probability threshold of non-maximal suppression. Defaults to 0.5.
nms_box_size (int) – the box size (in pixel) to be removed around the pixel for non-maximal suppression.
image_reader_name (str) – the name of library to be used for loading whole slide imaging, either cuCIM or OpenSlide. Defaults to cuCIM.

Note

For more info on nms_* parameters look at monai.utils.prob_nms.ProbNMS.

compute_fp_tp()[source]#: Compute false positive and true positive probabilities for tumor detection, by comparing the model outputs with the prepared ground truths for all samples

evaluate()[source]#: Evaluate the detection performance of a model based on the model probability map output, the ground truth tumor mask, and their associated metadata (e.g., pixel_spacing, level)

prepare_ground_truth(sample)[source]#: Prepare the ground truth for evaluation based on the binary tumor mask

prepare_inference_result(sample)[source]#

Prepare the probability map for detection evaluation.

Return type: tuple[ndarray, ndarray, ndarray]
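
To make the role of eval_thresholds concrete, the sketch below averages the sensitivity of a FROC curve at the given false-positive rates, using linear interpolation between curve points. It assumes the curve is supplied as ascending false positives per image with matching sensitivities. The function names and the sample curve are made up for illustration and are not taken from the library.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends (xs strictly ascending)."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)


def froc_score(fps_per_image, sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)):
    """Average sensitivity at the requested false-positive rates."""
    values = [interpolate(t, fps_per_image, sensitivity) for t in eval_thresholds]
    return sum(values) / len(values)


# Made-up curve: false positives per image and the sensitivity reached at each.
score = froc_score([0.0, 1.0, 2.0, 8.0], [0.0, 0.5, 0.75, 1.0])
```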

monai.apps.pathology.utils.compute_multi_instance_mask(mask, threshold)[source]#

This method computes the segmentation mask according to the binary tumor mask.

Parameters:

mask (ndarray) – the binary mask array
threshold (float) – the threshold to fill holes

Return type:

Any

monai.apps.pathology.utils.compute_isolated_tumor_cells(tumor_mask, threshold)[source]#

This method identifies Isolated Tumor Cells (ITC) and returns their labels.

Parameters:

tumor_mask (ndarray) – the tumor mask.
threshold (float) – the threshold (at the mask level) to define an isolated tumor cell (ITC). A region with the longest diameter less than this threshold is considered as an ITC.

Return type:

list[int]

class monai.apps.pathology.utils.PathologyProbNMS(spatial_dims=2, sigma=0.0, prob_threshold=0.5, box_size=48)[source]#: This class extends monai.utils.ProbNMS and adds the resolution option for Pathology.

class monai.apps.pathology.transforms.stain.array.ExtractHEStains(tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308))[source]#

Class to extract a target stain from an image, using stain deconvolution (see Note).

Parameters:

tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

Note

For more information refer to:
- the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf
- the previous implementations:
  - MATLAB: mitkovetta/staining-normalization
  - Python: schaugf/HEnorm_python
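
The roles of tli and beta can be illustrated for a single RGB pixel. The sketch converts intensities to absorbance (optical density) relative to the transmitted light intensity and treats a pixel as transparent unless every channel exceeds beta. The +1 offset and the clipping to tli are assumptions used here to avoid log(0) and negative values. The helper names are hypothetical.

```python
import math


def absorbance(rgb, tli=240):
    """Optical density of one RGB pixel relative to the transmitted light intensity."""
    # The +1 offset and the clip to tli are illustrative guards, not a documented rule.
    return [-math.log(min(c + 1, tli) / tli) for c in rgb]


def is_transparent(od, beta=0.15):
    """A pixel counts as transparent unless all channels exceed the absorbance threshold."""
    return not all(v > beta for v in od)


background = absorbance((255, 255, 255))
tissue = absorbance((100, 50, 120))
```

Transparent pixels would be excluded before estimating the stain vectors, so that background does not influence the result.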

class monai.apps.pathology.transforms.stain.array.NormalizeHEStains(tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308))[source]#

Class to normalize patches/images to a reference or target image stain (see Note).

Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.

Parameters:

tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15.
target_he – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

Note

For more information refer to:
- the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf
- the previous implementations:
  - MATLAB: mitkovetta/staining-normalization
  - Python: schaugf/HEnorm_python
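
The inverse Beer-Lambert step can be sketched for one pixel: given H and E concentrations, the absorbance is the product of the 3x2 target stain matrix with the concentration vector, and the intensity is tli * exp(-absorbance). The helper name, the rounding, and the cap at 255 are illustrative assumptions. The concentration values are made up.

```python
import math

# Default target stain matrix from the signature above (rows: R, G, B; columns: H, E).
TARGET_HE = ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581))


def reconstruct_pixel(concentrations, target_he=TARGET_HE, tli=240):
    """Rebuild an RGB pixel from (H, E) concentrations via the inverse Beer-Lambert transform."""
    od = [sum(h * c for h, c in zip(row, concentrations)) for row in target_he]
    return [min(255, round(tli * math.exp(-v))) for v in od]


unstained = reconstruct_pixel((0.0, 0.0))
stained = reconstruct_pixel((1.0, 0.5))
```

Zero concentration reproduces the transmitted light intensity, and any positive concentration darkens every channel.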

A collection of dictionary-based wrappers around the pathology transforms defined in monai.apps.pathology.transforms.array.

Class names end with ‘d’ to denote dictionary-based transforms.

class monai.apps.pathology.transforms.stain.dictionary.ExtractHEStainsd(keys, tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.ExtractHEStains. Class to extract a target stain from an image, using stain deconvolution.

Parameters:

keys – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform
tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
allow_missing_keys – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.stain.dictionary.NormalizeHEStainsd(keys, tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.NormalizeHEStains.

Class to normalize patches/images to a reference or target image stain.

Parameters:

keys – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform
tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15.
target_he – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
allow_missing_keys – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.array.GenerateSuccinctContour(height, width)[source]#

Converts SciPy-style contours (generated by skimage.measure.find_contours) to a more succinct version which only includes the pixels to which lines need to be drawn (i.e. not the intervening pixels along each line).

Parameters:

height (int) – height of bounding box, used to detect direction of line segment.
width (int) – width of bounding box, used to detect direction of line segment.

Returns:

the pixels that need to be joined by straight lines to describe the outmost pixels of the foreground similar to: OpenCV’s cv.CHAIN_APPROX_SIMPLE (counterclockwise)

class monai.apps.pathology.transforms.post.array.GenerateInstanceContour(min_num_points=3, contour_level=None)[source]#

Generate contour for each instance in a 2D array. Use GenerateSuccinctContour to only include the pixels to which lines need to be drawn

Parameters:

min_num_points – a created contour is assumed not to form a valid contour unless it contains more points than this value. Defaults to 3.
contour_level – an optional value for skimage.measure.find_contours to find contours in the array. If not provided, the level is set to (max(image) + min(image)) / 2.

class monai.apps.pathology.transforms.post.array.GenerateInstanceCentroid(dtype=<class 'int'>)[source]#

Generate instance centroid using skimage.measure.centroid.

Parameters: dtype – the data type of output centroid.

class monai.apps.pathology.transforms.post.array.GenerateInstanceType[source]#: Generate instance type and probability for each instance.

class monai.apps.pathology.transforms.post.array.Watershed(connectivity=1, dtype=<class 'numpy.int64'>)[source]#

Use skimage.segmentation.watershed to get instance segmentation results from images. See: https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.watershed.

Parameters:

connectivity – an array with the same number of dimensions as image whose non-zero elements indicate neighbors for connection. Following the scipy convention, default is a one-connected array of the dimension of the image.
dtype – target data content type to convert, default is np.int64.

class monai.apps.pathology.transforms.post.array.GenerateWatershedMask(activation='softmax', threshold=None, min_object_size=10, dtype=<class 'numpy.uint8'>)[source]#

Generate the mask used in watershed. Only points at which mask == True will be labeled.

Parameters:

activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
threshold – an optional float value to threshold to binarize probability map. If not provided, defaults to 0.5 when activation is not “softmax”, otherwise None.
min_object_size – objects smaller than this size (in pixel) are removed. Defaults to 10.
dtype – target data content type to convert, default is np.uint8.

class monai.apps.pathology.transforms.post.array.GenerateInstanceBorder(kernel_size=5, dtype=<class 'numpy.float32'>)[source]#

Generate instance border by hover map. The more parts of the image that cannot be identified as foreground areas, the larger the grey scale value. The grey value of the instance’s border will be larger.

Parameters:

kernel_size (int) – the size of the Sobel kernel. Defaults to 5.
dtype (Union[dtype, type, str, None]) – target data type to convert to. Defaults to np.float32.

Raises:

ValueError – when the mask shape is not [1, H, W].
ValueError – when the hover_map shape is not [2, H, W].

class monai.apps.pathology.transforms.post.array.GenerateDistanceMap(smooth_fn=None, dtype=<class 'numpy.float32'>)[source]#

Generate distance map. In general, the instance map is calculated from the distance to the background. Here, we use 1 - “instance border map” to generate the distance map. Nuclei values form mountains so invert them to get basins.

Parameters:

smooth_fn – smoothing function for distance map, which can be any callable object. If not provided monai.transforms.GaussianSmooth() is used.
dtype – target data type to convert to. Defaults to np.float32.
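
A minimal sketch of the inversion described above, assuming the distance map is (1 - border) restricted to the mask and then negated so that nuclei become basins. The smoothing step is omitted, and the helper name and nested-list inputs are illustrative.

```python
def distance_map(mask, border):
    """Negated (1 - border) inside the mask; smoothing is deliberately left out."""
    return [
        [-(1.0 - b) * m for m, b in zip(mask_row, border_row)]
        for mask_row, border_row in zip(mask, border)
    ]


mask = [[0, 1, 1], [0, 1, 1]]
border = [[0.0, 1.0, 0.25], [0.0, 0.5, 0.0]]
dist = distance_map(mask, border)
```

Pixels far from any border inside the mask get the lowest values, which is where flooding would start.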

class monai.apps.pathology.transforms.post.array.GenerateWatershedMarkers(threshold=0.4, radius=2, min_object_size=10, postprocess_fn=None, dtype=<class 'numpy.int64'>)[source]#

Generate markers to be used in watershed. The watershed algorithm treats pixels values as a local topography (elevation). The algorithm floods basins from the markers until basins attributed to different markers meet on watershed lines. Generally, markers are chosen as local minima of the image, from which basins are flooded. Here is the implementation from HoVerNet paper. For more details refer to papers: https://arxiv.org/abs/1812.06499.

Parameters:

threshold – a float value to threshold to binarize instance border map. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
radius – the radius of the disk-shaped footprint used in opening. Defaults to 2.
min_object_size – objects smaller than this size (in pixel) are removed. Defaults to 10.
postprocess_fn – additional post-process function on the markers. If not provided, monai.transforms.post.FillHoles() will be used.
dtype – target data type to convert to. Defaults to np.int64.
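
The thresholding part of marker generation can be sketched as follows: border values at or above the threshold are treated as uncertain and removed from the mask, and what remains seeds the markers. The >= comparison and the helper name are assumptions. The opening, hole filling, and small-object removal steps are not shown.

```python
def marker_seeds(mask, border, threshold=0.4):
    """Mask minus the thresholded border map, clipped at zero (illustrative only)."""
    return [
        [max(m - (1 if b >= threshold else 0), 0) for m, b in zip(mask_row, border_row)]
        for mask_row, border_row in zip(mask, border)
    ]


mask = [[1, 1, 1, 0]]
border = [[0.9, 0.1, 0.5, 0.8]]
seeds = marker_seeds(mask, border)
```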

class monai.apps.pathology.transforms.post.array.HoVerNetNuclearTypePostProcessing(activation='softmax', threshold=None, return_type_map=True, device=None)[source]#

The post-processing transform for HoVerNet model to generate nuclear type information. It updates the input instance info dictionary with information about types of the nuclei (value and probability). Also if requested (return_type_map=True), it generates a pixel-level type map.

Parameters:

activation – the activation layer to be applied on nuclear type branch. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
threshold – an optional float value to threshold to binarize probability map. If not provided, defaults to 0.5 when activation is not “softmax”, otherwise None.
return_type_map – whether to calculate and return pixel-level type map.
device – target device to put the output Tensor data.

class monai.apps.pathology.transforms.post.array.HoVerNetInstanceMapPostProcessing(activation='softmax', mask_threshold=None, min_object_size=10, sobel_kernel_size=5, distance_smooth_fn=None, marker_threshold=0.4, marker_radius=2, marker_postprocess_fn=None, watershed_connectivity=1, min_num_points=3, contour_level=None, device=None)[source]#

The post-processing transform for HoVerNet model to generate instance segmentation map. It generates an instance segmentation map as well as a dictionary containing centroids, bounding boxes, and contours for each instance.

Parameters:

activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
mask_threshold – a float value to threshold to binarize probability map to generate mask.
min_object_size – objects smaller than this size (in pixel) are removed. Defaults to 10.
sobel_kernel_size – the size of the Sobel kernel used in GenerateInstanceBorder. Defaults to 5.
distance_smooth_fn – smoothing function for distance map. If not provided, monai.transforms.intensity.GaussianSmooth() will be used.
marker_threshold – a float value to threshold to binarize instance border map for markers. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
marker_radius – the radius of the disk-shaped footprint used in opening of markers. Defaults to 2.
marker_postprocess_fn – post-process function for watershed markers. If not provided, monai.transforms.post.FillHoles() will be used.
watershed_connectivity – connectivity argument of skimage.segmentation.watershed.
min_num_points – minimum number of points to be considered as a contour. Defaults to 3.
contour_level – an optional value for skimage.measure.find_contours to find contours in the array. If not provided, the level is set to (max(image) + min(image)) / 2.
device – target device to put the output Tensor data.

class monai.apps.pathology.transforms.post.dictionary.GenerateSuccinctContourd(keys, height, width, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateSuccinctContour. Converts SciPy-style contours (generated by skimage.measure.find_contours) to a more succinct version which only includes the pixels to which lines need to be drawn (i.e. not the intervening pixels along each line).

Parameters:

keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.
height (int) – height of bounding box, used to detect direction of line segment.
width (int) – width of bounding box, used to detect direction of line segment.
allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceContourd(keys, contour_key_postfix='contour', offset_key=None, min_num_points=3, level=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateInstanceContour. Generate contour for each instance in a 2D array. Use GenerateSuccinctContour to only include the pixels to which lines need to be drawn

Parameters:

keys – keys of the corresponding items to be transformed.
contour_key_postfix – the output contour coordinates will be written to the value of {key}_{contour_key_postfix}.
offset_key – keys of offset used in GenerateInstanceContour.
min_num_points – a created contour is assumed not to form a valid contour unless it contains more points than this value. Defaults to 3.
level – optional. Value along which to find contours in the array. By default, the level is set to (max(image) + min(image)) / 2.
allow_missing_keys – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceCentroidd(keys, centroid_key_postfix='centroid', offset_key=None, dtype=<class 'int'>, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateInstanceCentroid. Generate instance centroid using skimage.measure.centroid.

Parameters:

keys – keys of the corresponding items to be transformed.
centroid_key_postfix – the output centroid coordinates will be written to the value of {key}_{centroid_key_postfix}.
offset_key – keys of offset used in GenerateInstanceCentroid.
dtype – the data type of output centroid.
allow_missing_keys – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceTyped(keys, type_info_key='type_info', bbox_key='bbox', seg_pred_key='seg', instance_id_key='id', allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateInstanceType. Generate instance type and probability for each instance.

Parameters:

keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.
type_info_key (str) – the output instance type and probability will be written to the value of {type_info_key}.
bbox_key (str) – keys of bounding box.
seg_pred_key (str) – keys of segmentation prediction map.
instance_id_key (str) – keys of instance id.
allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.dictionary.Watershedd(keys, mask_key='mask', markers_key=None, connectivity=1, dtype=<class 'numpy.uint8'>, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.array.Watershed. Use skimage.segmentation.watershed to get instance segmentation results from images. See: https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.watershed.

Parameters:

keys – keys of the corresponding items to be transformed. See also: monai.transforms.MapTransform
mask_key – keys of mask used in watershed. Only points at which mask == True will be labeled.
markers_key – keys of markers used in watershed. If None (no markers given), the local minima of the image are used as markers.
connectivity – An array with the same number of dimensions as image whose non-zero elements indicate neighbors for connection. Following the scipy convention, default is a one-connected array of the dimension of the image.
dtype – target data content type to convert. Defaults to np.uint8.
allow_missing_keys – don’t raise exception if key is missing.

Raises:

ValueError – when the image shape is not [1, H, W].
ValueError – when the mask shape is not [1, H, W].

class monai.apps.pathology.transforms.post.dictionary.GenerateWatershedMaskd(keys, mask_key='mask', activation='softmax', threshold=None, min_object_size=10, dtype=<class 'numpy.uint8'>, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateWatershedMask.

Parameters:

keys – keys of the corresponding items to be transformed.
mask_key – the mask will be written to the value of {mask_key}.
activation – the activation layer to be applied on nuclear type branch. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
threshold – if not None, threshold the float values to int number 0 or 1 with specified threshold.
min_object_size – objects smaller than this size are removed. Defaults to 10.
dtype – target data content type to convert, default is np.uint8.
allow_missing_keys – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceBorderd(mask_key='mask', hover_map_key='hover_map', border_key='border', kernel_size=21, dtype=<class 'numpy.float32'>)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateInstanceBorder.

Parameters:

mask_key (str) – the input key where the watershed mask is stored. Defaults to “mask”.
hover_map_key (str) – the input key where hover map is stored. Defaults to “hover_map”.
border_key (str) – the output key where instance border map is written. Defaults to “border”.
kernel_size (int) – the size of the Sobel kernel. Defaults to 21.
dtype (Union[dtype, type, str, None]) – target data content type to convert, default is np.float32.
allow_missing_keys – don’t raise exception if key is missing.

Raises:

ValueError – when the hover_map has only one value.
ValueError – when the sobel gradient map has only one value.

class monai.apps.pathology.transforms.post.dictionary.GenerateDistanceMapd(mask_key='mask', border_key='border', dist_map_key='dist_map', smooth_fn=None, dtype=<class 'numpy.float32'>)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateDistanceMap.

Parameters:

mask_key – the input key where the watershed mask is stored. Defaults to “mask”.
border_key – the input key where instance border map is stored. Defaults to “border”.
dist_map_key – the output key where distance map is written. Defaults to “dist_map”.
smooth_fn – smoothing function for distance map, which can be any callable object. If not provided monai.transforms.GaussianSmooth() is used.
dtype – target data content type to convert, default is np.float32.

class monai.apps.pathology.transforms.post.dictionary.GenerateWatershedMarkersd(mask_key='mask', border_key='border', markers_key='markers', threshold=0.4, radius=2, min_object_size=10, postprocess_fn=None, dtype=<class 'numpy.uint8'>)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateWatershedMarkers.

Parameters:

mask_key – the input key where the watershed mask is stored. Defaults to “mask”.
border_key – the input key where instance border map is stored. Defaults to “border”.
markers_key – the output key where markers is written. Defaults to “markers”.
threshold – threshold the float values of instance border map to int 0 or 1 with specified threshold. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
radius – the radius of the disk-shaped footprint used in opening. Defaults to 2.
min_object_size – objects smaller than this size are removed. Defaults to 10.
postprocess_fn – execute additional post transformation on marker. Defaults to None.
dtype – target data content type to convert, default is np.uint8.
allow_missing_keys – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.post.dictionary.HoVerNetInstanceMapPostProcessingd(nuclear_prediction_key='nucleus_prediction', hover_map_key='horizontal_vertical', instance_info_key='instance_info', instance_map_key='instance_map', activation='softmax', mask_threshold=None, min_object_size=10, sobel_kernel_size=5, distance_smooth_fn=None, marker_threshold=0.4, marker_radius=2, marker_postprocess_fn=None, watershed_connectivity=1, min_num_points=3, contour_level=None, device=None)[source]#

Dictionary-based wrapper for monai.apps.pathology.transforms.post.array.HoVerNetInstanceMapPostProcessing. The post-processing transform for HoVerNet model to generate instance segmentation map. It generates an instance segmentation map as well as a dictionary containing centroids, bounding boxes, and contours for each instance.

Parameters:

nuclear_prediction_key – the key for HoVerNet NP (nuclear prediction) branch. Defaults to HoVerNetBranch.NP.
hover_map_key – the key for HoVerNet HV (horizontal and vertical map) branch. Defaults to HoVerNetBranch.HV.
instance_info_key – the output key where instance information (contour, bounding boxes, and centroids) is written. Defaults to “instance_info”.
instance_map_key – the output key where instance map is written. Defaults to “instance_map”.
activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
mask_threshold – a float value to threshold to binarize probability map to generate mask.
min_object_size – objects smaller than this size are removed. Defaults to 10.
sobel_kernel_size – the size of the Sobel kernel used in GenerateInstanceBorder. Defaults to 5.
distance_smooth_fn – smoothing function for distance map. If not provided, monai.transforms.intensity.GaussianSmooth() will be used.
marker_threshold – a float value to threshold to binarize instance border map for markers. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
marker_radius – the radius of the disk-shaped footprint used in opening of markers. Defaults to 2.
marker_postprocess_fn – post-process function for watershed markers. If not provided, monai.transforms.post.FillHoles() will be used.
watershed_connectivity – connectivity argument of skimage.segmentation.watershed.
min_num_points – minimum number of points to be considered as a contour. Defaults to 3.
contour_level – an optional value for skimage.measure.find_contours to find contours in the array. If not provided, the level is set to (max(image) + min(image)) / 2.
device – target device to put the output Tensor data.
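
The default for contour_level described above is the midpoint of the value range of the input. A minimal sketch in plain Python, where default_contour_level is a hypothetical helper and not part of MONAI or scikit-image:

```python
def default_contour_level(image, contour_level=None):
    """Return contour_level, or the midpoint of the value range if it is None.

    `image` is a 2D nested list of numbers. This helper is hypothetical and
    only mirrors the rule (max(image) + min(image)) / 2 stated in the text.
    """
    if contour_level is not None:
        return contour_level
    values = [v for row in image for v in row]
    return (max(values) + min(values)) / 2


instance_mask = [[0, 0, 1], [0, 1, 1]]
level = default_contour_level(instance_mask)
```

For a binary instance mask this places the contour halfway between background and foreground.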

class monai.apps.pathology.transforms.post.dictionary.HoVerNetNuclearTypePostProcessingd(type_prediction_key='type_prediction', instance_info_key='instance_info', instance_map_key='instance_map', type_map_key='type_map', activation='softmax', threshold=None, return_type_map=True, device=None)[source]#

Dictionary-based wrapper for monai.apps.pathology.transforms.post.array.HoVerNetNuclearTypePostProcessing. It updates the input instance info dictionary with information about types of the nuclei (value and probability). Also if requested (return_type_map=True), it generates a pixel-level type map.

Parameters:

type_prediction_key – the key for HoVerNet NC (type prediction) branch. Defaults to HoVerNetBranch.NC.
instance_info_key – the key where instance information (contour, bounding boxes, and centroids) is stored. Defaults to “instance_info”.
instance_map_key – the key where instance map is stored. Defaults to “instance_map”.
type_map_key – the output key where type map is written. Defaults to “type_map”.
device – target device to put the output Tensor data.

Detection#

Hard Negative Sampler#

The functions in this script are adapted from nnDetection, MIC-DKFZ/nnDetection

class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#

HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all the samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute the classification loss for the selected samples; 4) do back propagation.

Parameters:

batch_size_per_image (int) – number of training samples to be randomly selected per image
positive_fraction (float) – percentage of positive elements in the selected samples
min_neg (int) – minimum number of negative samples to select if possible.
pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
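
The role of pool_size can be illustrated with a small sketch in plain Python. It assumes scores are held in a list indexed by sample; pick_hard_negatives is a hypothetical helper, not the MONAI implementation, which operates on tensors.

```python
import random


def pick_hard_negatives(negative_idx, scores, num_neg, pool_size, rng=random):
    """Pick num_neg negatives at random from the highest-scoring pool.

    negative_idx: indices of the negative samples.
    scores: prediction score of every sample, indexed by sample index.
    The pool holds the num_neg * pool_size negatives with the highest scores.
    """
    pool_len = min(len(negative_idx), int(num_neg * pool_size))
    num_neg = min(num_neg, pool_len)
    # Sort negatives from highest to lowest score and keep the top of the list.
    pool = sorted(negative_idx, key=lambda i: scores[i], reverse=True)[:pool_len]
    return sorted(rng.sample(pool, num_neg))


scores = [0.9, 0.1, 0.8, 0.2, 0.7, 0.3]
negatives = [0, 1, 2, 3, 4, 5]
chosen = pick_hard_negatives(negatives, scores, num_neg=2, pool_size=2)
```

With pool_size=1 the selection is deterministic (the num_neg highest-scoring negatives); larger values trade hardness for randomness.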

get_num_neg(negative, num_pos)[source]#

Sample enough negatives to fill up self.batch_size_per_image

Parameters:

negative (Tensor) – indices of negative samples
num_pos (int) – number of positive samples to draw

Return type:

int

Returns:

number of negative samples

get_num_pos(positive)[source]#

Number of positive samples to draw

Parameters:

positive (Tensor) – indices of positive samples

Return type:

int

Returns:

number of positive samples
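
As a rough illustration of how the two counts could be derived from batch_size_per_image, positive_fraction and min_neg, the sketch below uses one plausible rule ("fill up the batch"). Both function names are hypothetical, and the exact formula used by the library may differ.

```python
def num_pos_to_draw(n_positive, batch_size_per_image, positive_fraction):
    # Target number of positives, capped by how many are available.
    return min(n_positive, int(batch_size_per_image * positive_fraction))


def num_neg_to_draw(n_negative, num_pos, batch_size_per_image, min_neg):
    # Assumed rule: fill the rest of the batch, ask for at least min_neg,
    # and never ask for more negatives than exist.
    wanted = max(batch_size_per_image - num_pos, min_neg)
    return min(n_negative, wanted)


num_pos = num_pos_to_draw(n_positive=2, batch_size_per_image=6, positive_fraction=0.5)
num_neg = num_neg_to_draw(n_negative=10, num_pos=num_pos, batch_size_per_image=6, min_neg=1)
```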

select_positives(positive, num_pos, labels)[source]#

Select positive samples

Parameters:

positive (Tensor) – indices of positive samples, sized (P,), where P is the number of positive samples
num_pos (int) – number of positive samples to sample
labels (Tensor) – labels for all samples, sized (A,), where A is the number of samples.

Return type:

Tensor

Returns:

binary mask of positive samples to choose, sized (A,), where A is the number of samples in one image

select_samples_img_list(target_labels, fg_probs)[source]#

Select positives and hard negatives from list samples per image. Hard negative sampler will be applied to each image independently.

Parameters:

target_labels (list[Tensor]) – list of labels per image. For image i in the batch, target_labels[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i. Positive samples have positive labels, negative samples have label 0.
fg_probs (list[Tensor]) – list of maximum foreground probability per image. For image i in the batch, fg_probs[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i.

Return type:

tuple[list[Tensor], list[Tensor]]

Returns:

list of binary masks for positive samples
list of binary masks for negative samples

Example

import torch
from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# two images with different number of samples
target_labels = [torch.tensor([0, 1]), torch.tensor([1, 0, 2, 1])]
fg_probs = [torch.rand(2), torch.rand(4)]
pos_idx_list, neg_idx_list = sampler.select_samples_img_list(target_labels, fg_probs)

select_samples_per_img(labels_per_img, fg_probs_per_img)[source]#

Select positives and hard negatives from samples.

Parameters:

labels_per_img (Tensor) – labels, sized (A,). Positive samples have positive labels, negative samples have label 0.
fg_probs_per_img (Tensor) – maximum foreground probability, sized (A,)

Return type:

tuple[Tensor, Tensor]

Returns:

binary mask for positive samples, sized (A,)
binary mask for negative samples, sized (A,)

Example

import torch
from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# one image with four samples
target_labels = torch.tensor([1, 0, 2, 1])
fg_probs = torch.rand(4)
pos_idx, neg_idx = sampler.select_samples_per_img(target_labels, fg_probs)
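
Step 3 of the training workflow computes the loss only on the selected samples. The sketch below shows the idea in plain Python with a binary foreground/background cross-entropy. bce_on_selected is a hypothetical helper, and the loss actually used in a detector may be different.

```python
import math


def bce_on_selected(probs, labels, pos_mask, neg_mask):
    """Mean binary cross-entropy over the samples flagged in either mask."""
    losses = []
    for p, y, is_pos, is_neg in zip(probs, labels, pos_mask, neg_mask):
        if not (is_pos or is_neg):
            continue  # sample was not selected, so it does not contribute
        target = 1.0 if y > 0 else 0.0
        p = min(max(p, 1e-7), 1 - 1e-7)  # avoid log(0)
        losses.append(-(target * math.log(p) + (1 - target) * math.log(1 - p)))
    return sum(losses) / max(len(losses), 1)


probs = [0.5, 0.5, 0.9, 0.1]
labels = [1, 0, 2, 0]
loss = bce_on_selected(probs, labels, pos_mask=[1, 0, 0, 0], neg_mask=[0, 1, 0, 0])
```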

class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSamplerBase(pool_size=10)[source]#

Base class of hard negative sampler.

Hard negative sampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

Parameters:

pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.

select_negatives(negative, num_neg, fg_probs)[source]#

Select hard negative samples.

Parameters:

negative (Tensor) – indices of all the negative samples, sized (P,), where P is the number of negative samples
num_neg (int) – number of negative samples to sample
fg_probs (Tensor) – maximum foreground prediction scores (probability) across all the classes for each sample, sized (A,), where A is the number of samples.

Return type:

Tensor

Returns:

binary mask of negative samples to choose, sized (A,), where A is the number of samples in one image

RetinaNet Network#

Part of this script is adapted from pytorch/vision

class monai.apps.detection.networks.retinanet_network.RetinaNet(spatial_dims, num_classes, num_anchors, feature_extractor, size_divisible=1, use_list_output=False)[source]#

The network used in RetinaNet.

It takes an image tensor as inputs, and outputs either 1) a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor. or 2) a list of 2N tensors head_outputs, with first N tensors being the predicted classification maps and second N tensors being the predicted box regression maps.

Parameters:

spatial_dims – number of spatial dimensions of the images. We support both 2D and 3D images.
num_classes – number of output classes of the model (excluding the background).
num_anchors – number of anchors at each location.
feature_extractor – a network that outputs feature maps from the input images, each feature map corresponds to a different resolution. Its output can have a format of Tensor, Dict[Any, Tensor], or Sequence[Tensor]. It can be the output of resnet_fpn_feature_extractor(*args, **kwargs).
size_divisible – the spatial size of the network input should be divisible by size_divisible, decided by the feature_extractor.
use_list_output – default False. If False, the network outputs a dictionary head_outputs, head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor. If True, the network outputs a list of 2N tensors head_outputs, with first N tensors being the predicted classification maps and second N tensors being the predicted box regression maps.

Example

import torch
from monai.networks.nets import resnet
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
spatial_dims = 3  # 3D network
conv1_t_stride = (2,2,1)  # stride of first convolutional layer in backbone
backbone = resnet.ResNet(
    spatial_dims = spatial_dims,
    block = resnet.ResNetBottleneck,
    layers = [3, 4, 6, 3],
    block_inplanes = resnet.get_inplanes(),
    n_input_channels= 1,
    conv1_t_stride = conv1_t_stride,
    conv1_t_size = (7,7,7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
returned_layers = [1,2,3]  # returned layer from feature pyramid network
feature_extractor = resnet_fpn_feature_extractor(
    backbone = backbone,
    spatial_dims = spatial_dims,
    pretrained_backbone = False,
    trainable_backbone_layers = None,
    returned_layers = returned_layers,
)
# This feature_extractor requires input image spatial size
# to be divisible by (32, 32, 16).
size_divisible = tuple(2*s*2**max(returned_layers) for s in conv1_t_stride)
model = RetinaNet(
    spatial_dims = spatial_dims,
    num_classes = 5,
    num_anchors = 6,
    feature_extractor=feature_extractor,
    size_divisible = size_divisible,
).to(device)
result = model(torch.rand(2, 1, 128,128,128).to(device))
cls_logits_maps = result["classification"]  # a list of len(returned_layers)+1 Tensor
box_regression_maps = result["box_regression"]  # a list of len(returned_layers)+1 Tensor
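
The size_divisible arithmetic from the example can be checked in isolation, together with the padded size an input would need. This is a plain-Python sketch; both helper names are hypothetical and the padding rule (round up per axis) is an assumption about how one would satisfy the divisibility requirement.

```python
import math


def fpn_size_divisible(conv1_t_stride, returned_layers):
    # Same arithmetic as in the example: 2 * stride * 2 ** max(returned_layers) per axis.
    return tuple(2 * s * 2 ** max(returned_layers) for s in conv1_t_stride)


def padded_spatial_size(spatial_size, size_divisible):
    # Round every axis up to the next multiple of its divisor.
    return tuple(math.ceil(n / d) * d for n, d in zip(spatial_size, size_divisible))


divisors = fpn_size_divisible((2, 2, 1), [1, 2, 3])
padded = padded_spatial_size((100, 128, 40), divisors)
```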

forward(images)[source]#

It takes an image tensor as inputs, and outputs predicted classification maps and predicted box regression maps in head_outputs.

Parameters:

images (Tensor) – input images, sized (B, img_channels, H, W) or (B, img_channels, H, W, D).

Return type:

Any

Returns:

1) If self.use_list_output is False, outputs a dictionary head_outputs with keys including self.cls_key and self.box_reg_key. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.
2) If self.use_list_output is True, outputs a list of 2N tensors head_outputs, with the first N tensors being the predicted classification maps and the second N tensors being the predicted box regression maps.
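
The two output layouts carry the same information. A minimal sketch of converting the list layout into the dictionary layout, assuming the key names "classification" and "box_regression" (list_to_head_outputs is a hypothetical helper):

```python
def list_to_head_outputs(outputs, cls_key="classification", box_reg_key="box_regression"):
    """Split a list of 2N maps into the dictionary layout described above."""
    if len(outputs) % 2 != 0:
        raise ValueError("expected N classification maps followed by N box regression maps")
    n = len(outputs) // 2
    # First N entries are classification maps, last N are box regression maps.
    return {cls_key: list(outputs[:n]), box_reg_key: list(outputs[n:])}


# Strings stand in for tensors to keep the sketch dependency-free.
head_outputs = list_to_head_outputs(["cls0", "cls1", "box0", "box1"])
```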

class monai.apps.detection.networks.retinanet_network.RetinaNetClassificationHead(in_channels, num_anchors, num_classes, spatial_dims, prior_probability=0.01)[source]#

A classification head for use in RetinaNet.

This head takes a list of feature maps as inputs, and outputs a list of classification maps. Each output map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.

Parameters:

in_channels (int) – number of channels of the input feature
num_anchors (int) – number of anchors to be predicted
num_classes (int) – number of classes to be predicted
spatial_dims (int) – spatial dimension of the network, should be 2 or 3.
prior_probability (float) – prior probability to initialize classification convolutional layers.

forward(x)[source]#

It takes a list of feature maps as inputs, and outputs a list of classification maps. Each output classification map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.

Parameters:

x (list[Tensor]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.

Return type:

list[Tensor]

Returns:

cls_logits_maps, list of classification maps. cls_logits_maps[i] is a (B, num_anchors * num_classes, H_i, W_i) or (B, num_anchors * num_classes, H_i, W_i, D_i) Tensor.

class monai.apps.detection.networks.retinanet_network.RetinaNetRegressionHead(in_channels, num_anchors, spatial_dims)[source]#

A regression head for use in RetinaNet.

This head takes a list of feature maps as inputs, and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.

Parameters:

in_channels (int) – number of channels of the input feature
num_anchors (int) – number of anchors to be predicted
spatial_dims (int) – spatial dimension of the network, should be 2 or 3.

forward(x)[source]#

It takes a list of feature maps as inputs, and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.

Parameters:

x (list[Tensor]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.

Return type:

list[Tensor]

Returns:

box_regression_maps, list of box regression maps. box_regression_maps[i] is a (B, num_anchors * 2 * spatial_dims, H_i, W_i) or (B, num_anchors * 2 * spatial_dims, H_i, W_i, D_i) Tensor.
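
The channel counts of the two heads follow directly from num_anchors, num_classes and the number of spatial dimensions. A small sketch of the expected output shapes for one feature level (head_output_shapes is a hypothetical helper that works on shape tuples only):

```python
def head_output_shapes(batch_size, num_anchors, num_classes, feature_spatial_size):
    """Return (classification map shape, box regression map shape) for one level."""
    spatial_dims = len(feature_spatial_size)
    cls_shape = (batch_size, num_anchors * num_classes, *feature_spatial_size)
    # Each box needs 2 * spatial_dims numbers (min and max corner per axis).
    box_shape = (batch_size, num_anchors * 2 * spatial_dims, *feature_spatial_size)
    return cls_shape, box_shape


cls_shape, box_shape = head_output_shapes(
    batch_size=2, num_anchors=6, num_classes=5, feature_spatial_size=(16, 16, 32)
)
```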

monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(backbone, spatial_dims, pretrained_backbone=False, returned_layers=(1, 2, 3), trainable_backbone_layers=None)[source]#

Constructs a feature extractor network with a ResNet-FPN backbone, used as feature_extractor in RetinaNet.

Reference: “Focal Loss for Dense Object Detection”.

The returned feature_extractor network takes an image tensor as inputs, and outputs a dictionary that maps string to the extracted feature maps (Tensor).

The input to the returned feature_extractor is expected to be a list of tensors, each of shape [C, H, W] or [C, H, W, D], one for each image. Different images can have different sizes.

Parameters:

backbone – a ResNet model, used as backbone.
spatial_dims – number of spatial dimensions of the images. We support both 2D and 3D images.
pretrained_backbone – whether the backbone has been pre-trained.
returned_layers – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.
trainable_backbone_layers – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. When pretrained_backbone is False, this value is set to be 5. When pretrained_backbone is True, if None is passed (the default) this value is set to 3.
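
The defaulting rule for trainable_backbone_layers and the feature-map count for returned_layers can be written out as follows. This is a sketch of the rules as stated in the parameter descriptions; both function names are hypothetical.

```python
def resolve_trainable_layers(pretrained_backbone, trainable_backbone_layers=None):
    """Apply the defaults described for trainable_backbone_layers."""
    if not pretrained_backbone:
        return 5  # without pre-training, all backbone layers are trainable
    if trainable_backbone_layers is None:
        return 3  # default for a pre-trained backbone
    if not 0 <= trainable_backbone_layers <= 5:
        raise ValueError("trainable_backbone_layers must be between 0 and 5")
    return trainable_backbone_layers


def num_feature_maps(returned_layers):
    """One map per returned layer plus the extra max-pooling level."""
    if any(not 1 <= layer <= 4 for layer in returned_layers):
        raise ValueError("each returned layer should be in the range [1, 4]")
    return len(returned_layers) + 1


n_maps = num_feature_maps([1, 2, 3])
```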

Example

import torch
from monai.networks.nets import resnet
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
spatial_dims = 3 # 3D network
backbone = resnet.ResNet(
    spatial_dims = spatial_dims,
    block = resnet.ResNetBottleneck,
    layers = [3, 4, 6, 3],
    block_inplanes = resnet.get_inplanes(),
    n_input_channels= 1,
    conv1_t_stride = (2,2,1),
    conv1_t_size = (7,7,7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
feature_extractor = resnet_fpn_feature_extractor(
    backbone = backbone,
    spatial_dims = spatial_dims,
    pretrained_backbone = False,
    trainable_backbone_layers = None,
    returned_layers = [1,2,3],
)
model = RetinaNet(
    spatial_dims = spatial_dims,
    num_classes = 5,
    num_anchors = 6,
    feature_extractor=feature_extractor,
    size_divisible = 32,
).to(device)

RetinaNet Detector#

Part of this script is adapted from pytorch/vision

class monai.apps.detection.networks.retinanet_detector.RetinaNetDetector(network, anchor_generator, box_overlap_metric=<function box_iou>, spatial_dims=None, num_classes=None, size_divisible=1, cls_key='classification', box_reg_key='box_regression', debug=False)[source]#

RetinaNet detector, expandable to other one-stage anchor-based box detectors in the future. An example of construction can be found in the source code of retinanet_resnet50_fpn_detector().

The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D). In this case, all images have same size.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors, as well as a targets (list of dictionary), containing:

boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the ground-truth boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.
labels: the class label for each ground-truth box
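
The StandardMode constraints on boxes can be checked before training. Below is a minimal plain-Python sketch that validates one box against the image size; is_valid_standard_box is a hypothetical helper, not a MONAI function.

```python
def is_valid_standard_box(box, spatial_size):
    """Check one box in StandardMode against the image spatial size.

    box: [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].
    spatial_size: (H, W) or (H, W, D).
    """
    dims = len(spatial_size)
    if len(box) != 2 * dims:
        return False
    mins, maxs = box[:dims], box[dims:]
    # Every axis must satisfy 0 <= min < max <= size.
    return all(0 <= lo < hi <= size for lo, hi, size in zip(mins, maxs, spatial_size))


ok_2d = is_valid_standard_box([0, 0, 10, 10], (64, 64))
degenerate = is_valid_standard_box([5, 0, 5, 10], (64, 64))
out_of_bounds = is_valid_standard_box([0, 0, 0, 10, 10, 70], (64, 64, 64))
```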

The model returns a Dict[str, Tensor] during training, containing the classification and regression losses. When saving the model, only self.network contains trainable parameters and needs to be saved.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:

boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the predicted boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.
labels (Int64Tensor[N]): the predicted labels for each image
labels_scores (Tensor[N]): the scores for each prediction

Parameters:

network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].
anchor_generator – anchor generator.
box_overlap_metric – func that compute overlap between two sets of boxes, default is Intersection over Union (IoU).
debug – whether to print out internal parameters, used for debugging and parameter tuning.

Notes

Input argument network can be a monai.apps.detection.networks.retinanet_network.RetinaNet(*) object, but any network that meets the following rules is a valid input network.

It should have attributes including spatial_dims, num_classes, cls_key, box_reg_key, num_anchors, size_divisible.
- spatial_dims (int) is the spatial dimension of the network, we support both 2D and 3D.
- num_classes (int) is the number of classes, excluding the background.
- size_divisible (int or Sequence[int]) is the expectation on the input image shape. The network needs the input spatial_size to be divisible by size_divisible, length should be 2 or 3.
- cls_key (str) is the key to represent classification in the output dict.
- box_reg_key (str) is the key to represent box regression in the output dict.
- num_anchors (int) is the number of anchor shapes at each location. It should equal self.anchor_generator.num_anchors_per_location()[0].
If network does not have these attributes, user needs to provide them for the detector.
Its input should be an image Tensor sized (B, C, H, W) or (B, C, H, W, D).
About its output head_outputs, it should be either a list of tensors or a dictionary of str: List[Tensor]:
- If it is a dictionary, it needs to have at least two keys: network.cls_key and network.box_reg_key, representing predicted classification maps and box regression maps. head_outputs[network.cls_key] should be List[Tensor] or Tensor. Each Tensor represents classification logits map at one resolution level, sized (B, num_classes*num_anchors, H_i, W_i) or (B, num_classes*num_anchors, H_i, W_i, D_i). head_outputs[network.box_reg_key] should be List[Tensor] or Tensor. Each Tensor represents box regression map at one resolution level, sized (B, 2*spatial_dims*num_anchors, H_i, W_i) or (B, 2*spatial_dims*num_anchors, H_i, W_i, D_i). len(head_outputs[network.cls_key]) == len(head_outputs[network.box_reg_key]).
- If it is a list of 2N tensors, the first N tensors should be the predicted classification maps, and the second N tensors should be the predicted box regression maps.
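
Before passing a custom network to the detector, the attribute requirements listed above can be checked with a few lines of plain Python. missing_network_attributes and PartialNet are hypothetical names used only for this sketch.

```python
REQUIRED_ATTRIBUTES = (
    "spatial_dims",
    "num_classes",
    "cls_key",
    "box_reg_key",
    "num_anchors",
    "size_divisible",
)


def missing_network_attributes(network):
    """Return the required attribute names that `network` does not define."""
    return [name for name in REQUIRED_ATTRIBUTES if not hasattr(network, name)]


class PartialNet:
    # Stand-in for a network that defines only some of the attributes.
    spatial_dims = 3
    num_classes = 5


missing = missing_network_attributes(PartialNet())
```

Any names reported as missing would have to be supplied to the detector by the user instead.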

Example

# define a naive network
import torch
import monai
from monai.apps.detection.networks.retinanet_detector import RetinaNetDetector

class NaiveNet(torch.nn.Module):
    def __init__(self, spatial_dims: int, num_classes: int):
        super().__init__()
        self.spatial_dims = spatial_dims
        self.num_classes = num_classes
        self.size_divisible = 2
        self.cls_key = "cls"
        self.box_reg_key = "box_reg"
        self.num_anchors = 1
    def forward(self, images: torch.Tensor):
        spatial_size = images.shape[-self.spatial_dims:]
        out_spatial_size = tuple(s//self.size_divisible for s in spatial_size)  # half size of input
        out_cls_shape = (images.shape[0],self.num_classes*self.num_anchors) + out_spatial_size
        out_box_reg_shape = (images.shape[0],2*self.spatial_dims*self.num_anchors) + out_spatial_size
        return {self.cls_key: [torch.randn(out_cls_shape)], self.box_reg_key: [torch.randn(out_box_reg_shape)]}

# create a RetinaNetDetector detector
spatial_dims = 3
num_classes = 5
# one anchor shape per location, matching num_anchors = 1 in NaiveNet
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, ), base_anchor_shapes=((8,) * spatial_dims,)
)
net = NaiveNet(spatial_dims, num_classes)
detector = RetinaNetDetector(net, anchor_generator)

# only detector.network may contain trainable parameters.
optimizer = torch.optim.SGD(
    detector.network.parameters(),
    1e-3,
    momentum=0.9,
    weight_decay=3e-5,
    nesterov=True,
)
torch.save(detector.network.state_dict(), 'model.pt')  # save model
detector.network.load_state_dict(torch.load('model.pt'))  # load model

compute_anchor_matched_idxs(anchors, targets, num_anchor_locs_per_level)[source]#

Compute the matched indices between anchors and ground truth (gt) boxes in targets. output[k][i] represents the matched gt index for anchor[i] in image k. Suppose there are M gt boxes for image k. The possible values of output[k][i] are [-2, -1, 0, …, M-1]. A value in [0, M - 1] indicates that this anchor is matched with a gt box, while a negative value indicates that it is not matched.

Parameters:

anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
targets (list[dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
num_anchor_locs_per_level (Sequence[int]) – each element represents HW or HWD at this level.

Return type:

list[Tensor]

Returns:

a list of matched index matched_idxs_per_image (Tensor[int64]), Tensor sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2
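
The encoding of matched indices can be made concrete with a short sketch that sorts anchors into three groups. The constant values come from the description above; summarize_matches and the group names "background" and "ignored" are assumptions borrowed from common RetinaNet practice.

```python
BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2


def summarize_matches(matched_idxs):
    """Group anchor indices by the meaning of their matched value."""
    summary = {"matched": [], "background": [], "ignored": []}
    for anchor, gt in enumerate(matched_idxs):
        if gt >= 0:
            summary["matched"].append((anchor, gt))  # (anchor index, gt box index)
        elif gt == BELOW_LOW_THRESHOLD:
            summary["background"].append(anchor)
        elif gt == BETWEEN_THRESHOLDS:
            summary["ignored"].append(anchor)
        else:
            raise ValueError(f"unexpected matched index {gt}")
    return summary


summary = summarize_matches([0, -1, -2, 1, -1])
```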

compute_box_loss(box_regression, targets, anchors, matched_idxs)[source]#

Compute box regression losses.

Parameters:

box_regression (Tensor) – box regression results, sized (B, sum(HWA), 2*self.spatial_dims)
targets (list[dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
matched_idxs (list[Tensor]) – a list of matched index. each element is sized (sum(HWA),) or (sum(HWDA),)

Return type:

Tensor

Returns:

box regression losses.

compute_cls_loss(cls_logits, targets, matched_idxs)[source]#

Compute classification losses.

Parameters:

cls_logits (Tensor) – classification logits, sized (B, sum(HW(D)A), self.num_classes)
targets (list[dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
matched_idxs (list[Tensor]) – a list of matched index. each element is sized (sum(HWA),) or (sum(HWDA),)

Return type:

Tensor

Returns:

classification losses.

compute_loss(head_outputs_reshape, targets, anchors, num_anchor_locs_per_level)[source]#

Compute losses.

Parameters:

head_outputs_reshape (dict[str, Tensor]) – reshaped head_outputs. head_outputs_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_outputs_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)
targets (list[dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

Return type:

dict[str, Tensor]

Returns:

a dict of several kinds of losses.
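
The sizes written as sum(HWA) or sum(HWDA) in these methods are totals over all feature levels. A small sketch of the arithmetic, assuming the feature-map spatial sizes are known (anchor_counts is a hypothetical helper):

```python
from math import prod


def anchor_counts(feature_map_sizes, num_anchors_per_loc):
    """Return (num_anchor_locs_per_level, total number of anchors per image)."""
    # HW or HWD for every level.
    num_anchor_locs_per_level = [prod(size) for size in feature_map_sizes]
    # sum(HWA) or sum(HWDA): A anchors at every location of every level.
    total = sum(n * num_anchors_per_loc for n in num_anchor_locs_per_level)
    return num_anchor_locs_per_level, total


locs, total_anchors = anchor_counts([(32, 32, 16), (16, 16, 8)], num_anchors_per_loc=3)
```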

forward(input_images, targets=None, use_inferer=False)[source]#

Returns a dict of losses during training, or a list of dicts of predicted boxes and labels during inference.

Parameters:

input_images – The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D). In this case, all images have same size.
targets – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image (optional).
use_inferer – whether to use self.inferer, a sliding window inferer, to do the inference. If False, will simply forward the network. If True, will use self.inferer, and requires self.set_sliding_window_inferer(*args) to have been called before.

Returns:

If training mode, will return a dict with at least two keys, including self.cls_key and self.box_reg_key, representing classification loss and box regression loss.

If evaluation mode, will return a list of detection results. Each element corresponds to an image in input_images and is a dict with at least three keys, including self.target_box_key, self.target_label_key, self.pred_score_key, representing predicted boxes, classification labels, and classification scores.

generate_anchors(images, head_outputs)[source]#

Generate anchors and store them in self.anchors: List[Tensor]. Anchors are generated only when none are stored, or when the incoming images have a different shape from self.previous_image_shape.

Parameters:

images (Tensor) – input images, a (B, C, H, W) or (B, C, H, W, D) Tensor.
head_outputs (dict[str, list[Tensor]]) – head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)

Return type:

None
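
The regeneration rule amounts to a one-entry cache keyed on the image shape. A minimal sketch of that pattern in plain Python, where AnchorCache and make_anchors are hypothetical stand-ins for the detector and its anchor generator:

```python
class AnchorCache:
    """Recompute anchors only when the image shape changes."""

    def __init__(self, make_anchors):
        self.make_anchors = make_anchors  # callable: image_shape -> anchors
        self.anchors = None
        self.previous_image_shape = None

    def get(self, image_shape):
        if self.anchors is None or self.previous_image_shape != image_shape:
            self.anchors = self.make_anchors(image_shape)
            self.previous_image_shape = image_shape
        return self.anchors


calls = []
cache = AnchorCache(lambda shape: calls.append(shape) or f"anchors for {shape}")
cache.get((2, 1, 64, 64))
cache.get((2, 1, 64, 64))  # same shape: reuses the stored anchors
cache.get((2, 1, 96, 96))  # new shape: anchors are generated again
```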

get_box_train_sample_per_image(box_regression_per_image, targets_per_image, anchors_per_image, matched_idxs_per_image)[source]#

Get samples from one image for box regression losses computation.

Parameters:

box_regression_per_image (Tensor) – box regression result for one image, (sum(HWA), 2*self.spatial_dims)
targets_per_image (dict[str, Tensor]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors_per_image (Tensor) – anchors of one image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
matched_idxs_per_image (Tensor) – matched index, sized (sum(HWA),) or (sum(HWDA),)

Return type:

tuple[Tensor, Tensor]

Returns:

paired predicted and GT samples from one image for box regression losses computation

get_cls_train_sample_per_image(cls_logits_per_image, targets_per_image, matched_idxs_per_image)[source]#

Get samples from one image for classification losses computation.

Parameters:

cls_logits_per_image (Tensor) – classification logits for one image, (sum(HWA), self.num_classes)
targets_per_image (dict[str, Tensor]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
matched_idxs_per_image (Tensor) – matched index, Tensor sized (sum(HWA),) or (sum(HWDA),) Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2

Return type:

tuple[Tensor, Tensor]

Returns:

paired predicted and GT samples from one image for classification losses computation
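
The matched-index convention described above can be illustrated with a small pure-Python sketch. The helper name build_cls_targets is hypothetical and not part of MONAI. It assumes that anchors marked BETWEEN_THRESHOLDS are left out of the loss and that anchors marked BELOW_LOW_THRESHOLD become all-zero (background) targets.

```python
# Hypothetical helper (not a MONAI API): builds one-hot classification targets
# from matched indices, following the convention documented above.
BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2


def build_cls_targets(matched_idxs, gt_labels, num_classes):
    """Return (kept anchor indices, one-hot targets for those anchors)."""
    keep, targets = [], []
    for anchor_idx, matched in enumerate(matched_idxs):
        if matched == BETWEEN_THRESHOLDS:
            continue  # assumption: ambiguous anchors are ignored
        row = [0.0] * num_classes
        if matched >= 0:
            # anchor matched ground-truth box `matched`; mark its class
            row[gt_labels[matched]] = 1.0
        keep.append(anchor_idx)
        targets.append(row)
    return keep, targets


# Four anchors, two ground-truth boxes with labels 2 and 0.
keep, targets = build_cls_targets([0, -1, -2, 1], gt_labels=[2, 0], num_classes=3)
```

Here anchor 2 is dropped, anchor 1 becomes a pure background target, and anchors 0 and 3 take the labels of their matched boxes.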

postprocess_detections(head_outputs_reshape, anchors, image_sizes, num_anchor_locs_per_level, need_sigmoid=True)[source]#

Postprocessing to generate detection result from classification logits and box regression. Use self.box_selector to select the final output boxes for each image.

Parameters:

head_outputs_reshape (dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)
targets – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

Return type:

list[dict[str, Tensor]]

Returns:

a list of dict, each dict corresponds to detection result on image.

set_atss_matcher(num_candidates=4, center_in_gt=False)[source]#

Used for training. Set the ATSS matcher that matches anchors with ground truth boxes.

Parameters:

num_candidates (int) – number of positions to select candidates from. A smaller value results in a higher matcher threshold and fewer matched candidates.
center_in_gt (bool) – If False (default), matched anchor center points do not need to lie within the ground truth box. False is recommended for small objects. If True, the matcher is stricter and yields fewer matched candidates.

Return type:

None

set_balanced_sampler(batch_size_per_image, positive_fraction)[source]#

Used for training. Set the torchvision balanced sampler that samples part of the anchors for training.

Parameters:

batch_size_per_image (int) – number of elements to be selected per image
positive_fraction (float) – percentage of positive elements per batch

Return type:

None

set_box_coder_weights(weights)[source]#

Set the weights for box coder.

Parameters:: weights (tuple[float]) – a list/tuple with length of 2*self.spatial_dims
Return type:: None

set_box_regression_loss(box_loss, encode_gt, decode_pred)[source]#

Using for training. Set loss for box regression.

Parameters:

box_loss (Module) – loss module for box regression
encode_gt (bool) – if True, will encode ground truth boxes to target box regression before computing the losses. Should be True for L1 loss and False for GIoU loss.
decode_pred (bool) – if True, will decode predicted box regression into predicted boxes before computing losses. Should be False for L1 loss and True for GIoU loss.

Example

detector.set_box_regression_loss(
    torch.nn.SmoothL1Loss(beta=1.0 / 9, reduction="mean"),
    encode_gt = True, decode_pred = False
)
detector.set_box_regression_loss(
    monai.losses.giou_loss.BoxGIoULoss(reduction="mean"),
    encode_gt = False, decode_pred = True
)

Return type:: None

set_box_selector_parameters(score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300, apply_sigmoid=True)[source]#

Used for inference. Set the parameters that are used for box selection during inference. The box selection is performed with the following steps:

For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.

Parameters:

score_thresh (float) – no box with scores less than score_thresh will be kept
topk_candidates_per_level (int) – max number of boxes to keep for each level
nms_thresh (float) – box overlapping threshold for NMS
detections_per_img (int) – max number of boxes to keep for each image

Return type:

None
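
The four selection steps listed above can be sketched in plain Python for 2D boxes. This is an illustration under stated assumptions, not the detector's implementation: select_boxes and iou_2d are hypothetical names, NMS is applied without regard to class, and a score equal to score_thresh is kept.

```python
def iou_2d(a, b):
    """IoU of two boxes in [xmin, ymin, xmax, ymax] format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_boxes(levels, score_thresh=0.05, topk_candidates_per_level=1000,
                 nms_thresh=0.5, detections_per_img=300):
    """levels: one list of (box, score) pairs per feature level."""
    candidates = []
    for level in levels:
        # Step 1: discard low-scoring boxes on this level.
        kept = [(box, score) for box, score in level if score >= score_thresh]
        # Step 2: keep the top-scoring boxes on this level.
        kept.sort(key=lambda pair: pair[1], reverse=True)
        candidates.extend(kept[:topk_candidates_per_level])
    # Step 3: greedy NMS over the whole image (class-agnostic in this sketch).
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    selected = []
    for box, score in candidates:
        if all(iou_2d(box, kept_box) <= nms_thresh for kept_box, _ in selected):
            selected.append((box, score))
    # Step 4: keep the top-scoring boxes for the image.
    return selected[:detections_per_img]


levels = [
    [([0, 0, 10, 10], 0.9), ([1, 1, 10, 10], 0.8), ([0, 0, 1, 1], 0.01)],
    [([20, 20, 30, 30], 0.7)],
]
detections = select_boxes(levels)
```

In this toy input the second box overlaps the first with IoU 0.81 and is suppressed, and the third box falls below the score threshold.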

set_cls_loss(cls_loss)[source]#

Used for training. Set the classification loss, which takes logits as inputs; make sure sigmoid/softmax is built into the loss.

Parameters:: cls_loss (Module) – loss module for classification

Example

detector.set_cls_loss(torch.nn.BCEWithLogitsLoss(reduction="mean"))
detector.set_cls_loss(FocalLoss(reduction="mean", gamma=2.0))

Return type:: None

set_hard_negative_sampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#

Used for training. Set the hard negative sampler that samples part of the anchors for training.

HardNegativeSampler is used to reduce the false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

Parameters:

batch_size_per_image (int) – number of elements to be selected per image
positive_fraction (float) – percentage of positive elements in the selected samples
min_neg (int) – minimum number of negative samples to select if possible.
pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.

Return type:

None
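
The role of pool_size can be sketched as follows. The function sample_hard_negatives is a hypothetical, simplified stand-in for the sampler: it takes the scores of negative anchors only and ignores batch_size_per_image, positive_fraction, and min_neg.

```python
import random


def sample_hard_negatives(neg_scores, num_neg, pool_size=10, rng=None):
    """Pick num_neg indices at random from the num_neg * pool_size highest-scoring negatives."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    # Rank negatives from highest (hardest) to lowest prediction score.
    order = sorted(range(len(neg_scores)), key=lambda i: neg_scores[i], reverse=True)
    pool = order[: min(len(order), int(num_neg * pool_size))]
    return sorted(rng.sample(pool, min(num_neg, len(pool))))


scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.05]
hardest = sample_hard_negatives(scores, num_neg=2, pool_size=1)    # pool equals the top 2
relaxed = sample_hard_negatives(scores, num_neg=2, pool_size=1.5)  # pool is the top 3
```

With pool_size=1 the selection is deterministic (the two highest-scoring negatives); a larger pool trades hardness for randomness, as described above.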

set_regular_matcher(fg_iou_thresh, bg_iou_thresh, allow_low_quality_matches=True)[source]#

Used for training. Set the torchvision matcher that matches anchors with ground truth boxes.

Parameters:

fg_iou_thresh (float) – foreground IoU threshold for Matcher, considered as matched if IoU > fg_iou_thresh
bg_iou_thresh (float) – background IoU threshold for Matcher, considered as not matched if IoU < bg_iou_thresh
allow_low_quality_matches (bool) – if True, produce additional matches for predictions that have only low-quality match candidates.

Return type:

None
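
A minimal sketch of threshold-based matching is shown below. match_anchors is a hypothetical name; the sketch omits allow_low_quality_matches and assumes an anchor whose best IoU equals fg_iou_thresh counts as matched. The negative codes follow the BELOW_LOW_THRESHOLD = -1 and BETWEEN_THRESHOLDS = -2 convention used elsewhere in this class.

```python
BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2


def match_anchors(iou_matrix, fg_iou_thresh, bg_iou_thresh):
    """iou_matrix[m][a] is the IoU between ground-truth box m and anchor a."""
    matched = []
    for a in range(len(iou_matrix[0])):
        column = [row[a] for row in iou_matrix]
        best_iou = max(column)
        if best_iou < bg_iou_thresh:
            matched.append(BELOW_LOW_THRESHOLD)   # background anchor
        elif best_iou < fg_iou_thresh:
            matched.append(BETWEEN_THRESHOLDS)    # ambiguous anchor
        else:
            matched.append(column.index(best_iou))  # index of the best ground-truth box
    return matched


# Two ground-truth boxes (rows) and three anchors (columns).
ious = [[0.9, 0.45, 0.1],
        [0.2, 0.30, 0.05]]
matched_idxs = match_anchors(ious, fg_iou_thresh=0.5, bg_iou_thresh=0.4)
```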

set_sliding_window_inferer(roi_size, sw_batch_size=1, overlap=0.5, mode=constant, sigma_scale=0.125, padding_mode=constant, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False)[source]#: Define sliding window inferer and store it to self.inferer.

set_target_keys(box_key, label_key)[source]#

Set keys for the training targets and inference outputs. During training, both box_key and label_key should be keys in the targets when performing self.forward(input_images, targets). During inference, they will be the keys in the output dict of self.forward(input_images).

Return type:: None

monai.apps.detection.networks.retinanet_detector.retinanet_resnet50_fpn_detector(num_classes, anchor_generator, returned_layers=(1, 2, 3), pretrained=False, progress=True, **kwargs)[source]#

Returns a RetinaNet detector using a ResNet-50 as backbone, which can be pretrained from Med3D: Transfer Learning for 3D Medical Image Analysis (https://arxiv.org/pdf/1904.00625.pdf).

Parameters:

num_classes (int) – number of output classes of the model (excluding the background).
anchor_generator (AnchorGenerator) – the AnchorGenerator used by the detector.
returned_layers (Sequence[int]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.
pretrained (bool) – If True, returns a backbone pre-trained on 23 medical datasets
progress (bool) – If True, displays a progress bar of the download to stderr

Return type:

RetinaNetDetector

Returns:

A RetinaNetDetector object with resnet50 as backbone

Example

# define a naive network
resnet_param = {
    "pretrained": False,
    "spatial_dims": 3,
    "n_input_channels": 2,
    "num_classes": 3,
    "conv1_t_size": 7,
    "conv1_t_stride": (2, 2, 2)
}
returned_layers = [1]
# base_anchor_shapes is a sequence of shapes, so the single shape is wrapped in a tuple
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, 2), base_anchor_shapes=((8,) * resnet_param["spatial_dims"],)
)
detector = retinanet_resnet50_fpn_detector(
    **resnet_param, anchor_generator=anchor_generator, returned_layers=returned_layers
)

Transforms#

monai.apps.detection.transforms.box_ops.apply_affine_to_boxes(boxes, affine)[source]#

This function applies affine matrices to the boxes

Parameters:

boxes (~NdarrayTensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
affine (Union[ndarray, Tensor]) – affine matrix to be applied to the box coordinates, sized (spatial_dims+1,spatial_dims+1)

Return type:

~NdarrayTensor

Returns:

returned affine transformed boxes, with same data type as boxes, does not share memory with boxes
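
As an illustration of what applying an affine to StandardMode boxes means, the sketch below maps every corner of each box through a homogeneous affine matrix and takes the axis-aligned bounds of the result. affine_boxes is a hypothetical pure-Python helper; using all corners (rather than a subset) is an assumption of this sketch and may differ from the library's implementation for affines that include rotation or shear.

```python
from itertools import product


def affine_boxes(boxes, affine):
    """boxes: [min_0..min_{d-1}, max_0..max_{d-1}] lists; affine: (d+1) x (d+1) nested list."""
    d = len(affine) - 1
    out = []
    for box in boxes:
        mins, maxs = box[:d], box[d:]
        # Enumerate all 2**d corners of the box.
        corners = product(*[(mins[i], maxs[i]) for i in range(d)])
        moved = [
            [sum(affine[r][c] * p[c] for c in range(d)) + affine[r][d] for r in range(d)]
            for p in corners
        ]
        new_min = [min(p[i] for p in moved) for i in range(d)]
        new_max = [max(p[i] for p in moved) for i in range(d)]
        out.append(new_min + new_max)
    return out


# Scale x by 2 and shift by 1; flip y and shift by 5.
affine = [[2, 0, 1],
          [0, -1, 5],
          [0, 0, 1]]
transformed = affine_boxes([[1, 1, 3, 2]], affine)
```

Taking the minimum and maximum after the transform keeps the output in corner-corner order even when an axis is flipped.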

monai.apps.detection.transforms.box_ops.convert_box_to_mask(boxes, labels, spatial_size, bg_label=-1, ellipse_mask=False)[source]#

Convert boxes to an int16 mask image, which has the same size as the input image.

Parameters:

boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
labels – classification foreground(fg) labels corresponding to boxes, dtype should be int, sized (N,).
spatial_size – image spatial size.
bg_label – background labels for the output mask image, make sure it is smaller than any fg labels.
ellipse_mask – bool.
- If True, it assumes the object shape is close to ellipse or ellipsoid.
- If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.
- If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

Returns:

int16 array, sized (num_box, H, W). Each channel represents a box.
The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.
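
The output layout described above (one channel per box, filled with that box's label on a bg_label background) can be sketched for the rectangular 2D case. box_to_mask_2d is a hypothetical helper using nested lists; it assumes box axis 0 indexes the first spatial dimension and that coordinates are truncated to integers.

```python
def box_to_mask_2d(boxes, labels, spatial_size, bg_label=-1):
    """Return one (H, W) nested-list mask per box, ellipse_mask=False style."""
    height, width = spatial_size
    masks = []
    for box, label in zip(boxes, labels):
        start0, start1, end0, end1 = (int(v) for v in box)
        mask = [[bg_label] * width for _ in range(height)]
        # Fill the part of the box that lies inside the image with the label.
        for i in range(max(start0, 0), min(end0, height)):
            for j in range(max(start1, 0), min(end1, width)):
                mask[i][j] = label
        masks.append(mask)
    return masks


masks = box_to_mask_2d([[1, 0, 3, 2]], labels=[5], spatial_size=(4, 3))
```

Because the background value must be distinguishable from every label, bg_label has to be smaller than any foreground label, as the parameter description notes.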

monai.apps.detection.transforms.box_ops.convert_mask_to_box(boxes_mask, bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#

Convert an int16 mask image, which has the same size as the input image, to boxes.

Parameters:

boxes_mask – int16 array, sized (num_box, H, W). Each channel represents a box. The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.
bg_label – background labels for the boxes_mask
box_dtype – output dtype for boxes
label_dtype – output dtype for labels

Returns:

bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
classification foreground(fg) labels, dtype should be int, sized (N,).

monai.apps.detection.transforms.box_ops.flip_boxes(boxes, spatial_size, flip_axes=None)[source]#

Flip boxes when the corresponding image is flipped

Parameters:

boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
spatial_size – image spatial size.
flip_axes – spatial axes along which to flip over. Default is None. The default axis=None will flip over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.

Returns:

flipped boxes, with same data type as boxes, does not share memory with boxes
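
Flipping a StandardMode box along an axis mirrors both of its bounds about the image extent, so the old maximum becomes the new minimum. The sketch below shows this for nested lists; flip_boxes_sketch is a hypothetical name, and the continuous-coordinate convention (new_min = size - old_max) is an assumption of this sketch.

```python
def flip_boxes_sketch(boxes, spatial_size, flip_axes=None):
    """Flip [min_0.., max_0..] boxes along the given spatial axes (all axes if None)."""
    d = len(spatial_size)
    if flip_axes is None:
        axes = list(range(d))
    elif isinstance(flip_axes, int):
        axes = [flip_axes]
    else:
        axes = list(flip_axes)
    out = [list(box) for box in boxes]  # copy, so the input is not modified
    for axis in axes:
        axis = axis % d  # negative axes count from the last axis
        for src, dst in zip(boxes, out):
            dst[axis] = spatial_size[axis] - src[axis + d]
            dst[axis + d] = spatial_size[axis] - src[axis]
    return out


boxes = [[1, 2, 4, 6]]
flipped_axis0 = flip_boxes_sketch(boxes, spatial_size=[10, 20], flip_axes=0)
flipped_all = flip_boxes_sketch(boxes, spatial_size=[10, 20])
```

Flipping twice along the same axes returns the original boxes, which is a quick sanity check for any implementation.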

monai.apps.detection.transforms.box_ops.resize_boxes(boxes, src_spatial_size, dst_spatial_size)[source]#

Resize boxes when the corresponding image is resized

Parameters:

boxes – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
src_spatial_size – source image spatial size.
dst_spatial_size – target image spatial size.

Returns:

resized boxes, with same data type as boxes, does not share memory with boxes

Example

boxes = torch.ones(1,4)
src_spatial_size = [100, 100]
dst_spatial_size = [128, 256]
resize_boxes(boxes, src_spatial_size, dst_spatial_size) #  will return tensor([[1.28, 2.56, 1.28, 2.56]])

monai.apps.detection.transforms.box_ops.rot90_boxes(boxes, spatial_size, k=1, axes=(0, 1))[source]#

Rotate boxes by 90 degrees in the plane specified by axes. Rotation direction is from the first towards the second axis.

Parameters:

boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
spatial_size – image spatial size.
k – number of times the array is rotated by 90 degrees.
axes – (2,) array_like The array is rotated in the plane defined by the axes. Axes must be different.

Returns:

A rotated view of boxes.

Notes

rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is the reverse of rot90_boxes(boxes, spatial_size, k=1, axes=(0,1)).
rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is equivalent to rot90_boxes(boxes, spatial_size, k=-1, axes=(0,1)).
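
The reversal property in the notes can be checked with a small sketch that models a single 90-degree rotation as a flip along the second axis followed by swapping the two axes. rot90_once is a hypothetical helper, and this decomposition is an assumption of the sketch rather than a statement about the library's internals.

```python
def rot90_once(box, spatial_size, axes=(0, 1)):
    """Rotate one [min_0.., max_0..] box by 90 degrees; return (box, new spatial size)."""
    d = len(spatial_size)
    a0, a1 = axes
    out = list(box)
    # Flip along the second axis of the rotation plane.
    out[a1], out[a1 + d] = spatial_size[a1] - box[a1 + d], spatial_size[a1] - box[a1]
    # Swap the two axes of the rotation plane.
    out[a0], out[a1] = out[a1], out[a0]
    out[a0 + d], out[a1 + d] = out[a1 + d], out[a0 + d]
    new_size = list(spatial_size)
    new_size[a0], new_size[a1] = spatial_size[a1], spatial_size[a0]
    return out, new_size


box, size = [1, 2, 4, 6], [10, 20]
rotated, rotated_size = rot90_once(box, size, axes=(0, 1))
restored, restored_size = rot90_once(rotated, rotated_size, axes=(1, 0))
```

Rotating with axes=(0, 1) and then with axes=(1, 0) returns both the box and the spatial size to their starting values.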

monai.apps.detection.transforms.box_ops.select_labels(labels, keep)[source]#

For each element in labels, select the entries at indices keep.

Parameters:

labels – Sequence of array. Each element represents classification labels or scores corresponding to boxes, sized (N,).
keep – the indices to keep, with the same length as each element in labels.

Returns:

selected labels, does not share memory with original labels.

monai.apps.detection.transforms.box_ops.swapaxes_boxes(boxes, axis1, axis2)[source]#

Interchange two axes of boxes.

Parameters:

boxes (~NdarrayTensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
axis1 (int) – First axis.
axis2 (int) – Second axis.

Return type:

~NdarrayTensor

Returns:

boxes with two axes interchanged.

monai.apps.detection.transforms.box_ops.zoom_boxes(boxes, zoom)[source]#

Zoom boxes

Parameters:

boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
zoom – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

Returns:

zoomed boxes, with same data type as boxes, does not share memory with boxes

Example

boxes = torch.ones(1,4)
zoom_boxes(boxes, zoom=[0.5,2.2]) #  will return tensor([[0.5, 2.2, 0.5, 2.2]])

A collection of “vanilla” transforms for box operations.

class monai.apps.detection.transforms.array.AffineBox[source]#: Applies affine matrix to the boxes

class monai.apps.detection.transforms.array.BoxToMask(bg_label=-1, ellipse_mask=False)[source]#

Convert boxes to an int16 mask image, which has the same size as the input image.

Parameters:

bg_label (int) – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.
ellipse_mask (bool) – bool.
- If True, it assumes the object shape is close to ellipse or ellipsoid.
- If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.
- If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

class monai.apps.detection.transforms.array.ClipBoxToImage(remove_empty=False)[source]#

Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple arrays of labels/scores associated with one array of boxes.

Parameters:: remove_empty (bool) – whether to remove the boxes and corresponding labels that are actually empty
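
The clipping behaviour can be sketched as follows. clip_boxes is a hypothetical helper: it clamps each coordinate to the image extent and, when remove_empty is True, drops boxes with no positive extent along some axis together with their labels. The exact emptiness criterion is an assumption of this sketch.

```python
def clip_boxes(boxes, labels, spatial_size, remove_empty=True):
    """Clamp [min_0.., max_0..] boxes to [0, spatial_size] and filter labels to match."""
    d = len(spatial_size)
    kept_boxes, kept_labels = [], []
    for box, label in zip(boxes, labels):
        clipped = [min(max(box[i], 0), spatial_size[i % d]) for i in range(2 * d)]
        is_empty = any(clipped[i + d] <= clipped[i] for i in range(d))
        if remove_empty and is_empty:
            continue  # drop the box and its label together
        kept_boxes.append(clipped)
        kept_labels.append(label)
    return kept_boxes, kept_labels


boxes = [[-5, 2, 8, 30], [12, 0, 15, 5]]
clipped_boxes, clipped_labels = clip_boxes(boxes, labels=[1, 2], spatial_size=[10, 20])
```

The second box lies entirely outside the image along the first axis, so it collapses to zero width and is removed along with its label.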

class monai.apps.detection.transforms.array.ConvertBoxMode(src_mode=None, dst_mode=None)[source]#

This transform converts the boxes in src_mode to the dst_mode.

Parameters:

src_mode – source box mode. If it is not given, this func will assume it is StandardMode().
dst_mode – target box mode. If it is not given, this func will assume it is StandardMode().

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” for 2D and “xyzxyz” for 3D.

src_mode and dst_mode can be:

str: choose from BoxModeName, for example,
- “xyxy”: boxes has format [xmin, ymin, xmax, ymax]
- “xyzxyz”: boxes has format [xmin, ymin, zmin, xmax, ymax, zmax]
- “xxyy”: boxes has format [xmin, xmax, ymin, ymax]
- “xxyyzz”: boxes has format [xmin, xmax, ymin, ymax, zmin, zmax]
- “xyxyzz”: boxes has format [xmin, ymin, xmax, ymax, zmin, zmax]
- “xywh”: boxes has format [xmin, ymin, xsize, ysize]
- “xyzwhd”: boxes has format [xmin, ymin, zmin, xsize, ysize, zsize]
- “ccwh”: boxes has format [xcenter, ycenter, xsize, ysize]
- “cccwhd”: boxes has format [xcenter, ycenter, zcenter, xsize, ysize, zsize]
BoxMode class: choose from the subclasses of BoxMode, for example,
- CornerCornerModeTypeA: equivalent to “xyxy” or “xyzxyz”
- CornerCornerModeTypeB: equivalent to “xxyy” or “xxyyzz”
- CornerCornerModeTypeC: equivalent to “xyxy” or “xyxyzz”
- CornerSizeMode: equivalent to “xywh” or “xyzwhd”
- CenterSizeMode: equivalent to “ccwh” or “cccwhd”
BoxMode object: choose from the subclasses of BoxMode, for example,
- CornerCornerModeTypeA(): equivalent to “xyxy” or “xyzxyz”
- CornerCornerModeTypeB(): equivalent to “xxyy” or “xxyyzz”
- CornerCornerModeTypeC(): equivalent to “xyxy” or “xyxyzz”
- CornerSizeMode(): equivalent to “xywh” or “xyzwhd”
- CenterSizeMode(): equivalent to “ccwh” or “cccwhd”
None: will assume mode is StandardMode()

Example

boxes = torch.ones(10,4)
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxMode(src_mode="xyxy", dst_mode="ccwh")
box_converter(boxes)
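
To make the mode names concrete, the arithmetic behind a few 2D conversions is sketched below in plain Python. The function names are hypothetical and operate on a single box; they illustrate the formats listed above rather than the transform's implementation.

```python
def xyxy_to_ccwh(box):
    """[xmin, ymin, xmax, ymax] -> [xcenter, ycenter, xsize, ysize]"""
    xmin, ymin, xmax, ymax = box
    return [(xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin]


def ccwh_to_xyxy(box):
    """[xcenter, ycenter, xsize, ysize] -> [xmin, ymin, xmax, ymax]"""
    xc, yc, w, h = box
    return [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]


def xxyy_to_xyxy(box):
    """[xmin, xmax, ymin, ymax] -> [xmin, ymin, xmax, ymax]"""
    xmin, xmax, ymin, ymax = box
    return [xmin, ymin, xmax, ymax]


center_size = xyxy_to_ccwh([2, 4, 6, 10])
round_trip = ccwh_to_xyxy(center_size)
standard = xxyy_to_xyxy([2, 6, 4, 10])
```

Converting to another mode and back should reproduce the original box, which makes a round trip a convenient check when adding a new mode.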

class monai.apps.detection.transforms.array.ConvertBoxToStandardMode(mode=None)[source]#

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Parameters:: mode – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

Example

boxes = torch.ones(10,6)
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardMode(mode="xxyyzz")
box_converter(boxes)

class monai.apps.detection.transforms.array.FlipBox(spatial_axis=None)[source]#

Reverses the box coordinates along the given spatial axis. Preserves shape.

Parameters:: spatial_axis – spatial axes along which to flip over. Default is None. The default axis=None will flip over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.

class monai.apps.detection.transforms.array.MaskToBox(bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#

Convert an int16 mask image, which has the same size as the input image, to boxes. Pairs with monai.apps.detection.transforms.array.BoxToMask. Please make sure the same min_fg_label is used when using the two transforms in pairs.

Parameters:

bg_label – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.
box_dtype – output dtype for boxes
label_dtype – output dtype for labels

class monai.apps.detection.transforms.array.ResizeBox(spatial_size, size_mode='all', **kwargs)[source]#

Resize the input boxes when the corresponding image is resized to given spatial size (with scaling, not cropping/padding).

Parameters:

spatial_size – expected shape of spatial dimensions after resize operation. if some components of the spatial_size are non-positive values, the transform will use the corresponding components of img size. For example, spatial_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.
size_mode – should be “all” or “longest”. If “all”, will use spatial_size for all the spatial dims. If “longest”, rescale the image so that only the longest side is equal to the specified spatial_size, which must be an int number in this case, keeping the aspect ratio of the initial image. Refer to: https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#albumentations.augmentations.geometric.resize.LongestMaxSize.
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

class monai.apps.detection.transforms.array.RotateBox90(k=1, spatial_axes=(0, 1), lazy=False)[source]#

Rotate boxes by 90 degrees in the plane specified by axes. See box_ops.rot90_boxes for additional details.

Parameters:

k (int) – number of times to rotate by 90 degrees.
spatial_axes (tuple[int, int]) – 2 int numbers that define the plane to rotate with 2 spatial axes. Default: (0, 1), the first two axes in spatial dimensions. If an axis is negative it counts from the last to the first axis.

class monai.apps.detection.transforms.array.SpatialCropBox(roi_center=None, roi_size=None, roi_start=None, roi_end=None, roi_slices=None)[source]#

General purpose box cropper when the corresponding image is cropped by SpatialCrop(*) with the same ROI. The difference is that we do not support negative indexing for roi_slices.

If a dimension of the expected ROI size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected ROI, and the cropped results of several images may not have exactly the same shape. Cropping of ND spatial boxes is supported.

The cropped region can be parameterised in various ways:

a list of slices for each spatial dimension (do not allow for use of negative indexing)
a spatial center and size
the start and end coordinates of the ROI

Parameters:

roi_center – voxel coordinates for center of the crop ROI.
roi_size – size of the crop ROI, if a dimension of ROI size is bigger than image size, will not crop that dimension of the image.
roi_start – voxel coordinates for start of the crop ROI.
roi_end – voxel coordinates for end of the crop ROI, if a coordinate is out of image, use the end coordinate of image.
roi_slices – list of slices for each of the spatial dimensions.
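
The effect of a crop on boxes can be sketched by shifting them into the ROI's coordinate frame and clamping to the ROI extent. Both helpers below are hypothetical; deriving the ROI start as center minus half the size (integer division) and dropping boxes that become empty are assumptions of this sketch.

```python
def roi_from_center(roi_center, roi_size):
    """Convert a center/size ROI into start/end coordinates."""
    start = [c - s // 2 for c, s in zip(roi_center, roi_size)]
    end = [st + s for st, s in zip(start, roi_size)]
    return start, end


def crop_boxes(boxes, roi_start, roi_end):
    """Shift [min_0.., max_0..] boxes into the ROI frame and clamp them to the ROI."""
    d = len(roi_start)
    size = [e - s for s, e in zip(roi_start, roi_end)]
    out = []
    for box in boxes:
        shifted = [box[i] - roi_start[i % d] for i in range(2 * d)]
        clipped = [min(max(shifted[i], 0), size[i % d]) for i in range(2 * d)]
        if all(clipped[i + d] > clipped[i] for i in range(d)):
            out.append(clipped)  # keep only boxes that still overlap the ROI
    return out


roi_start, roi_end = roi_from_center([50, 50], [20, 30])
cropped = crop_boxes([[45, 30, 70, 50], [0, 0, 10, 10]], roi_start, roi_end)
```

The first box is truncated to the part that overlaps the ROI; the second lies outside the ROI and is dropped.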

class monai.apps.detection.transforms.array.StandardizeEmptyBox(spatial_dims)[source]#

When boxes are empty, this transform standardizes them to a shape of (0,4) or (0,6).

Parameters:: spatial_dims (int) – number of spatial dimensions of the bounding boxes.

class monai.apps.detection.transforms.array.ZoomBox(zoom, keep_size=False, **kwargs)[source]#

Zooms an ND box with the same padding or slicing settings as Zoom().

Parameters:

zoom – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
keep_size – whether to keep the original size (padding/slicing if needed), default is False.
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

A collection of dictionary-based wrappers around the “vanilla” transforms for box operations defined in monai.apps.detection.transforms.array.

Class names end with ‘d’ to denote dictionary-based transforms.

monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateD#: alias of AffineBoxToImageCoordinated

monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateDict#: alias of AffineBoxToImageCoordinated

class monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinated(box_keys, box_ref_image_keys, allow_missing_keys=False, image_meta_key=None, image_meta_key_postfix='meta_dict', affine_lps_to_ras=False)[source]#

Dictionary-based transform that converts box in world coordinate to image coordinate.

Parameters:

box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys – The single key that represents the reference image to which box_keys are attached.
remove_empty – whether to remove the boxes that are actually empty
allow_missing_keys – don’t raise exception if key is missing.
image_meta_key – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, affine, original_shape, etc. it is a string, map to the box_ref_image_key. if None, will try to construct meta_keys by box_ref_image_key_{meta_key_postfix}.
image_meta_key_postfix – if image_meta_keys=None, use box_ref_image_key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
affine_lps_to_ras – default False. Yet if 1) the image is read by ITKReader, and 2) the ITKReader has affine_lps_to_ras=True, and 3) the box is in world coordinate, then set affine_lps_to_ras=True.

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.BoxToMaskD#: alias of BoxToMaskd

monai.apps.detection.transforms.dictionary.BoxToMaskDict#: alias of BoxToMaskd

class monai.apps.detection.transforms.dictionary.BoxToMaskd(box_keys, box_mask_keys, label_keys, box_ref_image_keys, min_fg_label, ellipse_mask=False, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.BoxToMask. Pairs with monai.apps.detection.transforms.dictionary.MaskToBoxd . Please make sure the same min_fg_label is used when using the two transforms in pairs. The output d[box_mask_key] will have background intensity 0, since the following operations may pad 0 on the border.

This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.

use BoxToMaskd to convert boxes and labels to box_masks;

do transforms, e.g., rotation or cropping, on images and box_masks together;

use MaskToBoxd to convert box_masks back to boxes and labels.

Parameters:

box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
min_fg_label (int) – min foreground box label.
ellipse_mask (bool) – bool.
- If True, it assumes the object shape is close to ellipse or ellipsoid.
- If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.
- If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and image together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(keys=["image","box_mask"],mode=["nearest","nearest"],
            prob=0.2,range_x=np.pi/6,range_y=np.pi/6,range_z=np.pi/6,
            keep_size=True,padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image","box_mask"],roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)

monai.apps.detection.transforms.dictionary.ClipBoxToImageD#: alias of ClipBoxToImaged

monai.apps.detection.transforms.dictionary.ClipBoxToImageDict#: alias of ClipBoxToImaged

class monai.apps.detection.transforms.dictionary.ClipBoxToImaged(box_keys, label_keys, box_ref_image_keys, remove_empty=True, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ClipBoxToImage.

Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple keys of labels/scores associated with one key of boxes.

Parameters:

box_keys (Union[Collection[Hashable], Hashable]) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – The single key that represents the reference image to which box_keys and label_keys are attached.
remove_empty (bool) – whether to remove the boxes that are actually empty
allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

ClipBoxToImaged(
    box_keys="boxes", box_ref_image_keys="image", label_keys=["labels", "scores"], remove_empty=True
)

monai.apps.detection.transforms.dictionary.ConvertBoxModeD#: alias of ConvertBoxModed

monai.apps.detection.transforms.dictionary.ConvertBoxModeDict#: alias of ConvertBoxModed

class monai.apps.detection.transforms.dictionary.ConvertBoxModed(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxMode.

This transform converts the boxes in src_mode to the dst_mode.

Example

data = {"boxes": torch.ones(10,4)}
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxModed(box_keys=["boxes"], src_mode="xyxy", dst_mode="ccwh")
box_converter(data)

__init__(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#

Parameters:

box_keys – Keys to pick data for transformation.
src_mode – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.
dst_mode – target box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.
allow_missing_keys – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.ConvertBoxMode

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeD#: alias of ConvertBoxToStandardModed

monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeDict#: alias of ConvertBoxToStandardModed

class monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModed(box_keys, mode=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxToStandardMode.

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Example

data = {"boxes": torch.ones(10,6)}
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardModed(box_keys=["boxes"], mode="xxyyzz")
box_converter(data)
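The reordering described in the comment above can be sketched for a single box in plain Python. The function name xxyyzz_to_xyzxyz is hypothetical and the sketch is independent of MONAI; it only shows which coordinate moves where.

```python
def xxyyzz_to_xyzxyz(box):
    """Reorder [xmin, xmax, ymin, ymax, zmin, zmax] into standard mode."""
    xmin, xmax, ymin, ymax, zmin, zmax = box
    return [xmin, ymin, zmin, xmax, ymax, zmax]


# Hypothetical helper; min values are 1, 2, 3 and max values are 4, 5, 6.
standard = xxyyzz_to_xyzxyz([1, 4, 2, 5, 3, 6])
```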

__init__(box_keys, mode=None, allow_missing_keys=False)[source]#

Parameters:

box_keys – Keys to pick data for transformation.
mode – source box mode. If it is not given, this function assumes it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.
allow_missing_keys – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.ConvertBoxToStandardMode

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.FlipBoxD#: alias of FlipBoxd

monai.apps.detection.transforms.dictionary.FlipBoxDict#: alias of FlipBoxd

class monai.apps.detection.transforms.dictionary.FlipBoxd(image_keys, box_keys, box_ref_image_keys, spatial_axis=None, allow_missing_keys=False)[source]#

Dictionary-based transform that flips boxes and images.

Parameters:

image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys – Keys that represent the reference images to which box_keys are attached.
spatial_axis – Spatial axes along which to flip over. Default is None.
allow_missing_keys – don’t raise exception if key is missing.
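Flipping a StandardMode box along one axis mirrors its two coordinates on that axis about the image extent, which is why a reference image is needed. A minimal sketch in plain Python, assuming continuous coordinates with no pixel offset; flip_box is a hypothetical helper, not the MONAI function.

```python
def flip_box(box, image_size, axis):
    """Mirror a StandardMode box ([mins..., maxs...]) along one spatial axis."""
    spatial_dims = len(image_size)
    flipped = list(box)
    # The old maximum becomes the new minimum, and vice versa.
    flipped[axis] = image_size[axis] - box[axis + spatial_dims]
    flipped[axis + spatial_dims] = image_size[axis] - box[axis]
    return flipped


# Hypothetical 2D example: a 100 x 200 image and one box in "xyxy" mode.
flipped = flip_box([10, 20, 30, 60], image_size=(100, 200), axis=0)
flipped_twice = flip_box(flipped, image_size=(100, 200), axis=0)
```

Flipping twice along the same axis returns the original box, which matches the idea of an inverse transform.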

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.MaskToBoxD#: alias of MaskToBoxd

monai.apps.detection.transforms.dictionary.MaskToBoxDict#: alias of MaskToBoxd

class monai.apps.detection.transforms.dictionary.MaskToBoxd(box_keys, box_mask_keys, label_keys, min_fg_label, box_dtype=torch.float32, label_dtype=torch.int64, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.MaskToBox. Pairs with monai.apps.detection.transforms.dictionary.BoxToMaskd . Please make sure the same min_fg_label is used when using the two transforms in pairs.

This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.

1. use BoxToMaskd to convert boxes and labels to box_masks;

2. do transforms, e.g., rotation or cropping, on images and box_masks together;

3. use MaskToBoxd to convert box_masks back to boxes and labels.

Parameters:

box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_mask_keys – Keys to store output box mask results for transformation. Same length with box_keys.
label_keys – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.
min_fg_label – min foreground box label.
box_dtype – output dtype for box_keys
label_dtype – output dtype for label_keys
allow_missing_keys – don’t raise exception if key is missing.

Example

# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and images together.
import numpy as np
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(keys=["image","box_mask"],mode=["nearest","nearest"],
            prob=0.2,range_x=np.pi/6,range_y=np.pi/6,range_z=np.pi/6,
            keep_size=True,padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image","box_mask"],roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
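To see why the box mask round trip works, the sketch below rasterizes one 2D box into a mask, crops the mask, and recovers the box from the remaining foreground. It is plain Python and not the MONAI code: box_to_mask and mask_to_box are hypothetical helpers, boxes are assumed to have integer corners, and the background value min_fg_label - 1 is an assumption made for this illustration.

```python
def box_to_mask(box, label, image_size, bg_label):
    """Rasterize one [xmin, ymin, xmax, ymax] box into a mask indexed as mask[x][y]."""
    xmin, ymin, xmax, ymax = box
    width, height = image_size
    return [
        [label if (xmin <= x < xmax and ymin <= y < ymax) else bg_label for y in range(height)]
        for x in range(width)
    ]


def mask_to_box(mask, bg_label):
    """Recover the box and its label from the foreground extent of a mask."""
    fg = [(x, y) for x, row in enumerate(mask) for y, v in enumerate(row) if v != bg_label]
    xs = [x for x, _ in fg]
    ys = [y for _, y in fg]
    box = [min(xs), min(ys), max(xs) + 1, max(ys) + 1]
    return box, mask[xs[0]][ys[0]]


min_fg_label = 0
bg_label = min_fg_label - 1  # assumed background value for this sketch
mask = box_to_mask([2, 3, 5, 7], label=0, image_size=(8, 10), bg_label=bg_label)
recovered_box, recovered_label = mask_to_box(mask, bg_label)

# Cropping the mask (drop 1 row in x, 2 columns in y) shifts the recovered box.
cropped = [row[2:] for row in mask[1:]]
cropped_box, _ = mask_to_box(cropped, bg_label)
```

Because the mask travels through the same spatial transform as the image, the recovered box follows the image content without any box-specific geometry code.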

monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelD#: alias of RandCropBoxByPosNegLabeld

monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelDict#: alias of RandCropBoxByPosNegLabeld

class monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabeld(image_keys, box_keys, label_keys, spatial_size, pos=1.0, neg=1.0, num_samples=1, whole_box=True, thresh_image_key=None, image_threshold=0.0, fg_indices_key=None, bg_indices_key=None, meta_keys=None, meta_key_postfix='meta_dict', allow_smaller=False, allow_missing_keys=False)[source]#

Crop random fixed-size regions that contain foreground boxes. All fields specified by image_keys are assumed to have the same shape, and patch_index is added to the corresponding metadata. The transform returns a list of dictionaries, one for each cropped image. If a dimension of the expected spatial size is bigger than the input image size, that dimension is not cropped. The cropped result may therefore be smaller than the expected size, and the cropped results of several images may not have exactly the same shape.

Parameters:

image_keys – Keys to pick image data for transformation. They need to have the same spatial size.
box_keys – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.
label_keys – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.
spatial_size – the spatial size of the crop region e.g. [224, 224, 128]. if a dimension of ROI size is bigger than image size, will not crop that dimension of the image. if its components have non-positive values, the corresponding size of data[label_key] will be used. for example: if the spatial size of input data is [40, 40, 40] and spatial_size=[32, 64, -1], the spatial size of output data will be [32, 40, 40].
pos – used with neg together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.
neg – used with pos together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.
num_samples – number of samples (crop regions) to take in each list.
whole_box – Bool, default True, whether we prefer to contain at least one whole box in the cropped foreground patch. Even if True, it is still possible to get partial box if there are multiple boxes in the image.
thresh_image_key – if thresh_image_key is not None, use label == 0 & thresh_image > image_threshold to select the negative sample(background) center. so the crop center will only exist on valid image area.
image_threshold – if enabled thresh_image_key, use thresh_image > image_threshold to determine the valid image content area.
fg_indices_key – if provided pre-computed foreground indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.
bg_indices_key – if provided pre-computed background indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. used to add patch_index to the meta dict. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. used to add patch_index to the meta dict.
allow_smaller – if False, an exception will be raised if the image is smaller than the requested ROI in any dimension. If True, any smaller dimensions will be set to match the cropped size (i.e., no cropping in that dimension).
allow_missing_keys – don’t raise exception if key is missing.
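Two of the rules above are simple arithmetic and can be sketched in plain Python: how non-positive or oversized entries of spatial_size resolve against the image size, and how pos and neg combine into a foreground probability. The helper names are hypothetical, and the sketch assumes the permissive behavior (no exception for oversized dimensions), so it does not model allow_smaller=False.

```python
def resolve_crop_size(spatial_size, image_size):
    """Resolve the effective crop size for each spatial dimension."""
    return [
        min(roi, img) if roi > 0 else img  # non-positive means "use the image size"
        for roi, img in zip(spatial_size, image_size)
    ]


def foreground_probability(pos, neg):
    """Probability of picking a foreground center rather than a background one."""
    return pos / (pos + neg)


# Values taken from the spatial_size description above.
crop_size = resolve_crop_size([32, 64, -1], [40, 40, 40])
fg_prob = foreground_probability(pos=1.0, neg=1.0)
```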

randomize(boxes, image_size, fg_indices=None, bg_indices=None, thresh_image=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, which makes it easier to identify errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises:: NotImplementedError – When the subclass does not override this method.

monai.apps.detection.transforms.dictionary.RandFlipBoxD#: alias of RandFlipBoxd

monai.apps.detection.transforms.dictionary.RandFlipBoxDict#: alias of RandFlipBoxd

class monai.apps.detection.transforms.dictionary.RandFlipBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, spatial_axis=None, allow_missing_keys=False)[source]#

Dictionary-based transform that randomly flips boxes and images with the given probability.

Parameters:

image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys – Keys that represent the reference images to which box_keys are attached.
prob – Probability of flipping.
spatial_axis – Spatial axes along which to flip over. Default is None.
allow_missing_keys – don’t raise exception if key is missing.

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Tensor]

set_random_state(seed=None, state=None)[source]#

Set the random state locally, to control the randomness, the derived classes should use self.R instead of np.random to introduce random factors.

Parameters:

seed – set the random state with an integer seed.
state – set the random state with a np.random.RandomState object.

Raises:

TypeError – When state is not an Optional[np.random.RandomState].

Returns:

a Randomizable instance.

monai.apps.detection.transforms.dictionary.RandRotateBox90D#: alias of RandRotateBox90d

monai.apps.detection.transforms.dictionary.RandRotateBox90Dict#: alias of RandRotateBox90d

class monai.apps.detection.transforms.dictionary.RandRotateBox90d(image_keys, box_keys, box_ref_image_keys, prob=0.1, max_k=3, spatial_axes=(0, 1), allow_missing_keys=False)[source]#

With probability prob, input boxes and images are rotated by 90 degrees in the plane specified by spatial_axes.

Parameters:

image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
prob (float) – probability of rotating. (Default 0.1, with 10% probability it returns a rotated array.)
max_k (int) – number of rotations will be sampled from np.random.randint(max_k) + 1. (Default 3)
spatial_axes (tuple[int, int]) – two integers that define the plane of rotation. Default: (0, 1), the first two axes of the spatial dimensions.
allow_missing_keys (bool) – don’t raise exception if key is missing.
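The max_k rule means the number of 90-degree rotations is drawn uniformly from 1 to max_k inclusive, never 0. A small sketch using Python's random module as a stand-in for the transform's own random state (self.R); sample_num_rotations is a hypothetical helper.

```python
import random


def sample_num_rotations(max_k, rng):
    """Mimic randint(max_k) + 1: an integer in [1, max_k]."""
    return rng.randrange(max_k) + 1


rng = random.Random(0)  # stand-in for the transform's random state
samples = [sample_num_rotations(3, rng) for _ in range(1000)]
```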

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Tensor]

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, which makes it easier to identify errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

monai.apps.detection.transforms.dictionary.RandZoomBoxD#: alias of RandZoomBoxd

monai.apps.detection.transforms.dictionary.RandZoomBoxDict#: alias of RandZoomBoxd

class monai.apps.detection.transforms.dictionary.RandZoomBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, min_zoom=0.9, max_zoom=1.1, mode=area, padding_mode=edge, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#

Dictionary-based transform that randomly zooms input boxes and images with given probability within given zoom range.

Parameters:

image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys – Keys that represent the reference images to which box_keys are attached.
prob – Probability of zooming.
min_zoom – Min zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, min_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.
max_zoom – Max zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, max_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.
mode – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key in keys.
padding_mode – available modes for numpy array: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}; available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "edge", as shown in the signature above. The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
align_corners – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in keys.
keep_size – Should keep original size (pad if needed), default is True.
allow_missing_keys – don’t raise exception if key is missing.
kwargs – other args for np.pad API, note that np.pad treats channel dimension as the first dimension. more details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Tensor]

set_random_state(seed=None, state=None)[source]#

Set the random state locally, to control the randomness, the derived classes should use self.R instead of np.random to introduce random factors.

Parameters:

seed – set the random state with an integer seed.
state – set the random state with a np.random.RandomState object.

Raises:

TypeError – When state is not an Optional[np.random.RandomState].

Returns:

a Randomizable instance.

monai.apps.detection.transforms.dictionary.RotateBox90D#: alias of RotateBox90d

monai.apps.detection.transforms.dictionary.RotateBox90Dict#: alias of RotateBox90d

class monai.apps.detection.transforms.dictionary.RotateBox90d(image_keys, box_keys, box_ref_image_keys, k=1, spatial_axes=(0, 1), allow_missing_keys=False)[source]#

Input boxes and images are rotated by 90 degrees, k times, in the plane specified by spatial_axes.

Parameters:

image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
k (int) – number of times to rotate by 90 degrees.
spatial_axes (tuple[int, int]) – two integers that define the plane of rotation. Default: (0, 1), the first two axes of the spatial dimensions.
allow_missing_keys (bool) – don’t raise exception if key is missing.

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.StandardizeEmptyBoxD#: alias of StandardizeEmptyBoxd

monai.apps.detection.transforms.dictionary.StandardizeEmptyBoxDict#: alias of StandardizeEmptyBoxd

class monai.apps.detection.transforms.dictionary.StandardizeEmptyBoxd(box_keys, box_ref_image_keys, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.StandardizeEmptyBox.

When boxes are empty, this transform standardizes them to a shape of (0, 4) or (0, 6).

Example

data = {"boxes": torch.ones(0,), "image": torch.ones(1, 128, 128, 128)}
box_converter = StandardizeEmptyBoxd(box_keys=["boxes"], box_ref_image_keys="image")
box_converter(data)

__init__(box_keys, box_ref_image_keys, allow_missing_keys=False)[source]#

Parameters:

box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.
box_ref_image_keys (str) – The single key that represents the reference image to which box_keys are attached.
allow_missing_keys (bool) – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.StandardizeEmptyBox

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.ZoomBoxD#: alias of ZoomBoxd

monai.apps.detection.transforms.dictionary.ZoomBoxDict#: alias of ZoomBoxd

class monai.apps.detection.transforms.dictionary.ZoomBoxd(image_keys, box_keys, box_ref_image_keys, zoom, mode=area, padding_mode=edge, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#

Dictionary-based transform that zooms input boxes and images with the given zoom scale.

Parameters:

image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys – Keys that represent the reference images to which box_keys are attached.
zoom – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
mode – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key in keys.
padding_mode – available modes for numpy array: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}; available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "edge", as shown in the signature above. The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
align_corners – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in keys.
keep_size – Should keep original size (pad if needed), default is True.
allow_missing_keys – don’t raise exception if key is missing.
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.
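The effect of zoom on box coordinates can be sketched as a per-axis scaling. This plain-Python illustration covers only the scaling step: zoom_box is a hypothetical helper, and the extra shift introduced by padding or cropping when keep_size is True is deliberately left out.

```python
def zoom_box(box, zoom):
    """Scale a StandardMode box ([mins..., maxs...]) by a float or per-axis zoom."""
    spatial_dims = len(box) // 2
    factors = [zoom] * spatial_dims if isinstance(zoom, (int, float)) else list(zoom)
    # Coordinate i belongs to spatial axis i % spatial_dims.
    return [coord * factors[i % spatial_dims] for i, coord in enumerate(box)]


zoomed_uniform = zoom_box([10.0, 20.0, 30.0, 60.0], 2.0)
zoomed_per_axis = zoom_box([10.0, 20.0, 30.0, 60.0], (0.5, 2.0))
```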

inverse(data)[source]#

Inverse of __call__.

Raises:: NotImplementedError – When the subclass does not override this method.
Return type:: dict[Hashable, Tensor]

Anchor#

This script is adapted from pytorch/vision

class monai.apps.detection.utils.anchor_utils.AnchorGenerator(sizes=((20, 30, 40),), aspect_ratios=(((0.5, 1), (1, 0.5)),), indexing='ij')[source]#

This module is modified from torchvision to support both 2D and 3D images.

Module that generates anchors for a set of feature maps and image sizes.

The module supports computing anchors at multiple sizes and aspect ratios per feature map.

sizes and aspect_ratios should have the same number of elements, and it should correspond to the number of feature maps.

sizes[i] and aspect_ratios[i] can have an arbitrary number of elements. For 2D images, anchor width and height follow w:h = 1:aspect_ratios[i,j]. For 3D images, anchor width, height, and depth follow w:h:d = 1:aspect_ratios[i,j,0]:aspect_ratios[i,j,1].

AnchorGenerator will output a set of len(sizes[i]) * len(aspect_ratios[i]) anchors per spatial location for feature map i.

Parameters:

sizes (Sequence[Sequence[int]]) – base size of each anchor. len(sizes) is the number of feature maps, i.e., the number of output levels for the feature pyramid network (FPN). Each element of sizes is a Sequence which represents several anchor sizes for each feature map.
aspect_ratios (Sequence) – the aspect ratios of anchors. len(aspect_ratios) = len(sizes). For 2D images, each element of aspect_ratios[i] is a Sequence of float. For 3D images, each element of aspect_ratios[i] is a Sequence of 2 value Sequence.
indexing (str) –
choose from {'ij', 'xy'}, optional, Matrix ('ij', default and recommended) or Cartesian ('xy') indexing of output.
- Matrix ('ij', default and recommended) indexing keeps the original axis not changed.
- To use other monai detection components, please set indexing = 'ij'.
- Cartesian ('xy') indexing swaps axis 0 and 1.
- For 2D cases, monai AnchorGenerator(sizes, aspect_ratios, indexing='xy') and torchvision.models.detection.anchor_utils.AnchorGenerator(sizes, aspect_ratios) are equivalent.

Reference: pytorch/vision

Example

# 2D example inputs for a 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = (1., 0.5,  2.)
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)

# 3D example inputs for a 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = ((1., 1.), (1., 0.5), (0.5, 1.), (2., 2.))
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)
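For the example inputs above, the number of anchors per spatial location follows from the number of sizes and aspect ratios at each level. A plain-Python sketch of that count; anchors_per_location is a hypothetical helper, not the class method num_anchors_per_location.

```python
def anchors_per_location(sizes, aspect_ratios):
    """One anchor per (size, aspect ratio) pair at each feature map level."""
    return [len(s) * len(r) for s, r in zip(sizes, aspect_ratios)]


sizes = ((10, 12, 14, 16), (20, 24, 28, 32))
ratios_2d = ((1.0, 0.5, 2.0), (1.0, 0.5, 2.0))
base_3d = ((1.0, 1.0), (1.0, 0.5), (0.5, 1.0), (2.0, 2.0))
ratios_3d = (base_3d, base_3d)

counts_2d = anchors_per_location(sizes, ratios_2d)
counts_3d = anchors_per_location(sizes, ratios_3d)
```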

forward(images, feature_maps)[source]#

Generate anchor boxes for each image.

Parameters:

images (Tensor) – sized (B, C, W, H) or (B, C, W, H, D)
feature_maps (list[Tensor]) – for FPN level i, feature_maps[i] is sized (B, C_i, W_i, H_i) or (B, C_i, W_i, H_i, D_i). This input argument does not have to be the actual feature maps. Any list variable with the same (C_i, W_i, H_i) or (C_i, W_i, H_i, D_i) as feature maps works.

Return type:

list[Tensor]

Returns:

A list with length of B. Each element represents the anchors for this image. The B elements are identical.

Example

images = torch.zeros((3,1,128,128,128))
feature_maps = [torch.zeros((3,6,64,64,32)), torch.zeros((3,6,32,32,16))]
anchor_generator(images, feature_maps)
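For the shapes in this example, the stride of each level and the total number of anchors per image can be worked out by hand. The sketch below assumes strides are the integer ratio of image size to feature map size and uses 16 anchors per location purely as an illustrative value; level_strides is a hypothetical helper.

```python
import math


def level_strides(image_size, grid_sizes):
    """Stride of each feature map level along each spatial axis."""
    return [[img // g for img, g in zip(image_size, grid)] for grid in grid_sizes]


image_size = (128, 128, 128)
grid_sizes = [(64, 64, 32), (32, 32, 16)]
strides = level_strides(image_size, grid_sizes)

anchors_per_loc = 16  # illustrative assumption
total_anchors = sum(math.prod(grid) * anchors_per_loc for grid in grid_sizes)
```

Under these assumptions each image gets a few million anchors, which is worth keeping in mind when choosing sizes and aspect ratios for 3D inputs.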

generate_anchors(scales, aspect_ratios, dtype=torch.float32, device=None)[source]#

Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.

Parameters:

scales – a sequence which represents several anchor sizes for the current feature map.
aspect_ratios – a sequence which represents several aspect_ratios for the current feature map. For 2D images, it is a Sequence of float aspect_ratios[j], anchor width and height w:h = 1:aspect_ratios[j]. For 3D images, it is a Sequence of 2 value Sequence aspect_ratios[j,0] and aspect_ratios[j,1], anchor width, height, and depth w:h:d = 1:aspect_ratios[j,0]:aspect_ratios[j,1]
dtype – target data type of the output Tensor.
device – target device to put the output Tensor data.
Returns – For each s in scales, returns [s, s*aspect_ratios[j]] for 2D images, and [s, s*aspect_ratios[j,0],s*aspect_ratios[j,1]] for 3D images.

grid_anchors(grid_sizes, strides)[source]#

Every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:spatial_dims) corresponds to a feature map. It outputs g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.

Parameters:

grid_sizes (list[list[int]]) – spatial size of the feature maps
strides (list[list[Tensor]]) – strides of the feature maps relative to the original image

Example

grid_sizes = [[100,100],[50,50]]
strides = [[torch.tensor(2),torch.tensor(2)], [torch.tensor(4),torch.tensor(4)]]

Return type:: list[Tensor]
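The grid layout described above, g[i] positions that are s[i] apart along axis i, can be sketched with itertools.product. The helper anchor_positions is hypothetical; the sketch assumes positions start at 0, and the ordering of the result follows itertools.product rather than any particular indexing mode.

```python
from itertools import product


def anchor_positions(grid_size, stride):
    """All grid positions for one feature map, spaced by the stride on each axis."""
    per_axis = [[k * s for k in range(g)] for g, s in zip(grid_size, stride)]
    return list(product(*per_axis))


# Small hypothetical grid: 3 x 2 locations with strides 2 and 4.
positions = anchor_positions([3, 2], [2, 4])
```

Each position is then combined with every cell anchor of that level to produce the final anchor boxes.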

num_anchors_per_location()[source]#: Return number of anchor shapes for each feature map.

set_cell_anchors(dtype, device)[source]#

Convert each element in self.cell_anchors to dtype and send to device.

Return type:: None

class monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(feature_map_scales=(1, 2, 4, 8), base_anchor_shapes=((32, 32, 32), (48, 20, 20), (20, 48, 20), (20, 20, 48)), indexing='ij')[source]#

Module that generates anchors for a set of feature maps and image sizes, inherited from AnchorGenerator

The module supports computing anchors at multiple base anchor shapes per feature map.

feature_map_scales should have the same number of elements as the number of feature maps.

base_anchor_shapes can have an arbitrary number of elements. For 2D images, each element represents anchor width and height [w,h]. For 3D images, each element represents anchor width, height, and depth [w,h,d].

AnchorGenerator will output a set of len(base_anchor_shapes) anchors per spatial location for feature map i.

Parameters:

feature_map_scales – scale of anchors for each feature map, i.e., each output level of the feature pyramid network (FPN). len(feature_map_scales) is the number of feature maps. scale[i]*base_anchor_shapes represents the anchor shapes for feature map i.
base_anchor_shapes – a sequence which represents several anchor shapes for one feature map. For N-D images, it is a Sequence of N value Sequence.
indexing – choose from {‘xy’, ‘ij’}, optional Cartesian (‘xy’) or matrix (‘ij’, default) indexing of output. Cartesian (‘xy’) indexing swaps axis 0 and 1, which is the setting inside torchvision. matrix (‘ij’, default) indexing keeps the original axis not changed. See also indexing in https://pytorch.org/docs/stable/generated/torch.meshgrid.html

Example

# 2D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10), (6, 12), (12, 6))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)

# 3D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10, 10), (12, 12, 8), (10, 10, 6), (16, 16, 10))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)

static generate_anchors_using_shape(anchor_shapes, dtype=torch.float32, device=None)[source]#

Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.

Parameters:

anchor_shapes – [w, h] or [w, h, d], sized (N, spatial_dims), represents N anchor shapes for the current feature map.
dtype – target data type of the output Tensor.
device – target device to put the output Tensor data.

Returns:

For 2D images, returns [-w/2, -h/2, w/2, h/2]; For 3D images, returns [-w/2, -h/2, -d/2, w/2, h/2, d/2]
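The return format above, combined with the scale[i]*base_anchor_shapes rule from the class description, can be sketched in plain Python. The helpers centered_anchor and level_anchors are hypothetical and only illustrate the arithmetic.

```python
def centered_anchor(shape):
    """Turn [w, h] or [w, h, d] into a box centered at the origin."""
    half = [side / 2.0 for side in shape]
    return [-h for h in half] + half


def level_anchors(scale, base_anchor_shapes):
    """Scale every base shape for one feature map level, then center it."""
    return [centered_anchor([scale * side for side in shape]) for shape in base_anchor_shapes]


# Second level of the 2D example above (feature_map_scales[1] == 2).
anchors_level_1 = level_anchors(2, ((10, 10), (6, 12), (12, 6)))
```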

Matcher#

The functions in this script are adapted from nnDetection, MIC-DKFZ/nnDetection which is adapted from torchvision.

These are the changes compared with nndetection: 1) comments and docstrings; 2) reformat; 3) add a debug option to ATSSMatcher to help the users to tune parameters; 4) add a corner case return in ATSSMatcher.compute_matches; 5) add support for float16 cpu

class monai.apps.detection.utils.ATSS_matcher.ATSSMatcher(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#

__init__(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#

Compute matching based on ATSS https://arxiv.org/abs/1912.02424 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Parameters:

num_candidates (int) – number of positions to select candidates from. A smaller value results in a higher matcher threshold and fewer matched candidates.
similarity_fn (Callable[Tensor, Tensor, Tensor]) – function for similarity computation between boxes and anchors
center_in_gt (bool) – If False, matched anchor center points do not need to lie within the ground truth box. False is recommended for small objects. If True (the default, as shown in the signature above), the matcher is stricter and yields fewer matched candidates.
debug (bool) – if True, will print the matcher threshold in order to tune num_candidates and center_in_gt.

compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#

Compute matches according to ATSS for a single image. Adapted from sfzhang15/ATSS.

Parameters:

boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
anchors (Tensor) – anchors to match Mx4 or Mx6, also assumed to be StandardMode.
num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level
num_anchors_per_loc (int) – number of anchors per position

Return type:

tuple[Tensor, Tensor]

Returns:

matrix which contains the similarity from each box to each anchor [N, M]
vector which contains the matched box index for all anchors (BELOW_LOW_THRESHOLD is used for background anchors and BETWEEN_THRESHOLDS for anchors that should be ignored) [M]

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” ([xmin, ymin, xmax, ymax]) for 2D and “xyzxyz” ([xmin, ymin, zmin, xmax, ymax, zmax]) for 3D.
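The adaptive threshold at the heart of ATSS can be illustrated for one ground truth box: the IoU threshold is the mean plus the standard deviation of the candidate IoUs. The sketch below is plain Python and covers only this thresholding step; candidate selection by center distance and the center_in_gt check are omitted, and the use of the sample standard deviation and of >= are assumptions of this illustration.

```python
from statistics import mean, stdev


def box_iou_2d(a, b):
    """IoU of two [xmin, ymin, xmax, ymax] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def atss_select(gt_box, candidate_anchors):
    """Keep candidates whose IoU reaches mean + std of all candidate IoUs."""
    ious = [box_iou_2d(gt_box, anchor) for anchor in candidate_anchors]
    threshold = mean(ious) + stdev(ious)
    keep = [i for i, iou in enumerate(ious) if iou >= threshold]
    return threshold, keep


gt = [0.0, 0.0, 10.0, 10.0]
candidates = [
    [0.0, 0.0, 10.0, 10.0],    # perfect overlap
    [5.0, 0.0, 15.0, 10.0],    # half overlap along x
    [10.0, 10.0, 20.0, 20.0],  # no overlap
    [0.0, 5.0, 10.0, 15.0],    # half overlap along y
]
threshold, keep = atss_select(gt, candidates)
```

With debug=True the real matcher prints its threshold, which can be compared against this kind of hand calculation when tuning num_candidates.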

class monai.apps.detection.utils.ATSS_matcher.Matcher(similarity_fn=<function box_iou>)[source]#

Base class of Matcher, which matches boxes and anchors to each other

Parameters:: similarity_fn (Callable[Tensor, Tensor, Tensor]) – function for similarity computation between boxes and anchors

abstract compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#

Compute matches

Parameters:

boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
anchors (Tensor) – anchors to match Mx4 or Mx6, also assumed to be StandardMode.
num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level
num_anchors_per_loc (int) – number of anchors per position

Return type:

tuple[Tensor, Tensor]

Returns:

matrix which contains the similarity from each box to each anchor [N, M]
vector which contains the matched box index for all anchors (if background BELOW_LOW_THRESHOLD is used and if it should be ignored BETWEEN_THRESHOLDS is used) [M]

Box coder#

This script is modified from torchvision to support N-D images,

pytorch/vision

class monai.apps.detection.utils.box_coder.BoxCoder(weights, boxes_xform_clip=None)[source]#

This class encodes and decodes a set of bounding boxes into the representation used for training the regressors.

Parameters:

weights – 4-element tuple or 6-element tuple
boxes_xform_clip – high threshold to prevent sending too large values into torch.exp()

Example

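import torch
from monai.apps.detection.utils.box_coder import BoxCoder

box_coder = BoxCoder(weights=[1., 1., 1., 1., 1., 1.])
# use floating point boxes so that encoding and decoding do not truncate values
gt_boxes = torch.tensor([[1., 2., 1., 4., 5., 6.], [1., 3., 2., 7., 8., 9.]])
proposals = gt_boxes + torch.rand(gt_boxes.shape)
rel_gt_boxes = box_coder.encode_single(gt_boxes, proposals)
gt_back = box_coder.decode_single(rel_gt_boxes, proposals)
# We expect gt_back to be (approximately) equal to gt_boxes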

decode(rel_codes, reference_boxes)[source]#

Decode boxes from a set of original reference_boxes and encoded relative box offsets.

Parameters:

rel_codes (Tensor) – encoded boxes, Nx4 or Nx6 torch tensor.
reference_boxes (Sequence[Tensor]) – a list of reference boxes, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode

Return type:

Tensor

Returns:

decoded boxes, Nx1x4 or Nx1x6 torch tensor. The box mode will be StandardMode

decode_single(rel_codes, reference_boxes)[source]#

Decode boxes from a set of original boxes and encoded relative box offsets.

Parameters:

rel_codes (Tensor) – encoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor.
reference_boxes (Tensor) – reference boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

Return type:

Tensor

Returns:

decoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor. The box mode will be StandardMode

encode(gt_boxes, proposals)[source]#

Encode a set of proposals with respect to some ground truth (gt) boxes.

Parameters:

gt_boxes (Sequence[Tensor]) – list of gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode
proposals (Sequence[Tensor]) – list of boxes to be encoded, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode

Return type:

tuple[Tensor]

Returns:

A tuple of encoded gt, the target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.

encode_single(gt_boxes, proposals)[source]#

Encode proposals with respect to ground truth (gt) boxes.

Parameters:

gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode
proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

Return type:

Tensor

Returns:

encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.

monai.apps.detection.utils.box_coder.encode_boxes(gt_boxes, proposals, weights)[source]#

Encode a set of proposals with respect to some reference ground truth (gt) boxes.

Parameters:

gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode
proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode
weights (Tensor) – the weights for (cx, cy, w, h) or (cx,cy,cz, w,h,d)

Return type:

Tensor

Returns:

encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
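The following is a minimal 2D sketch of what encoding and decoding box regression targets can look like, assuming the torchvision-style parameterisation (center offsets scaled by the proposal size, log-ratios for width and height). The functions encode_box and decode_box are hypothetical stand-ins written in plain Python, not the MONAI implementation, and the clip argument only mimics the role of boxes_xform_clip.

```python
import math
from typing import List, Optional, Sequence


def encode_box(gt: Sequence[float], proposal: Sequence[float],
               weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0)) -> List[float]:
    """Encode a gt box relative to a proposal; both are [xmin, ymin, xmax, ymax]."""
    wx, wy, ww, wh = weights
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    pcx, pcy = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    gw, gh = gt[2] - gt[0], gt[3] - gt[1]
    gcx, gcy = gt[0] + 0.5 * gw, gt[1] + 0.5 * gh
    return [wx * (gcx - pcx) / pw, wy * (gcy - pcy) / ph,
            ww * math.log(gw / pw), wh * math.log(gh / ph)]


def decode_box(code: Sequence[float], proposal: Sequence[float],
               weights: Sequence[float] = (1.0, 1.0, 1.0, 1.0),
               clip: Optional[float] = None) -> List[float]:
    """Invert encode_box; `clip` caps the log-size terms before exponentiation."""
    wx, wy, ww, wh = weights
    dx, dy, dw, dh = code[0] / wx, code[1] / wy, code[2] / ww, code[3] / wh
    if clip is not None:
        dw, dh = min(dw, clip), min(dh, clip)
    pw, ph = proposal[2] - proposal[0], proposal[3] - proposal[1]
    pcx, pcy = proposal[0] + 0.5 * pw, proposal[1] + 0.5 * ph
    cx, cy = dx * pw + pcx, dy * ph + pcy
    w, h = math.exp(dw) * pw, math.exp(dh) * ph
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


gt = [1.0, 2.0, 4.0, 6.0]
proposal = [1.5, 2.5, 4.2, 6.9]
code = encode_box(gt, proposal)
gt_back = decode_box(code, proposal)  # round trip recovers gt up to float error
```

A proposal that already equals the ground truth box encodes to all zeros, which is why these offsets are convenient regression targets.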

Detection Utilities#

monai.apps.detection.utils.detector_utils.check_input_images(input_images, spatial_dims)[source]#

Validate the input dimensionality (raise a ValueError if invalid).

Parameters:

input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims – number of spatial dimensions of the images, 2 or 3.

monai.apps.detection.utils.detector_utils.check_training_targets(input_images, targets, spatial_dims, target_label_key, target_box_key)[source]#

Validate the input images/targets during training (raise a ValueError if invalid).

Parameters:

input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
targets – a list of dict. Each dict with two keys: target_box_key and target_label_key, ground-truth boxes present in the image.
spatial_dims – number of spatial dimensions of the images, 2 or 3.
target_label_key – the expected key of target labels.
target_box_key – the expected key of target boxes.

monai.apps.detection.utils.detector_utils.pad_images(input_images, spatial_dims, size_divisible, mode=constant, **kwargs)[source]#

Pad the input images, so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0

Parameters:

input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims – number of spatial dimensions of the images, 2 or 3.
size_divisible – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.
mode – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
kwargs – other arguments for the torch.nn.functional.pad function.

Returns:

images, a (B, C, H, W) or (B, C, H, W, D) Tensor
image_sizes, the original spatial size of each image
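To make the divisibility rule concrete, here is a small sketch of how a common padded spatial size could be derived for a list of images of different sizes. The helper padded_spatial_size is hypothetical and works on plain size tuples only. It assumes that the padded size is the per-dimension maximum rounded up to the next multiple of size_divisible.

```python
import math
from typing import List, Sequence, Union


def padded_spatial_size(image_sizes: Sequence[Sequence[int]],
                        size_divisible: Union[int, Sequence[int]]) -> List[int]:
    """Smallest spatial size that fits every image and is divisible by size_divisible."""
    n_dims = len(image_sizes[0])
    if isinstance(size_divisible, int):
        # the same divisor is applied to all spatial dimensions
        size_divisible = [size_divisible] * n_dims
    max_size = [max(size[d] for size in image_sizes) for d in range(n_dims)]
    return [math.ceil(m / k) * k for m, k in zip(max_size, size_divisible)]


# two 2D images of different (H, W); both would be padded at the end to (128, 160)
sizes_2d = [(100, 120), (90, 130)]
target_2d = padded_spatial_size(sizes_2d, 32)
```

The original sizes (here sizes_2d) are what the image_sizes return value keeps track of, so that predictions can later be related back to the unpadded images.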

monai.apps.detection.utils.detector_utils.preprocess_images(input_images, spatial_dims, size_divisible, mode=constant, **kwargs)[source]#

Preprocess the input images, including

validate the inputs
pad the inputs so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0

Parameters:

input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims – number of spatial dimensions of the images, 2 or 3.
size_divisible – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.
mode – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
kwargs – other arguments for the torch.nn.functional.pad function.

Returns:

images, a (B, C, H, W) or (B, C, H, W, D) Tensor
image_sizes, the original spatial size of each image

monai.apps.detection.utils.predict_utils.check_dict_values_same_length(head_outputs, keys=None)[source]#

We expect the values in head_outputs: Dict[str, List[Tensor]] to have the same length. Will raise ValueError if not.

Parameters:

head_outputs – a Dict[str, List[Tensor]] or Dict[str, Tensor]
keys – the keys in head_outputs that need to have values (List) with same length. If not provided, will use head_outputs.keys().

monai.apps.detection.utils.predict_utils.ensure_dict_value_to_list_(head_outputs, keys=None)[source]#

An in-place function. We expect head_outputs to be Dict[str, List[Tensor]]. Yet if it is Dict[str, Tensor], this func converts it to Dict[str, List[Tensor]]. It will be modified in-place.

Parameters:

head_outputs – a Dict[str, List[Tensor]] or Dict[str, Tensor], will be modified in-place
keys – the keys in head_outputs that need to have value type List[Tensor]. If not provided, will use head_outputs.keys().
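The behaviour of the two dictionary helpers above can be sketched in plain Python as follows. The names to_list_values_ and check_same_length are hypothetical, and floats stand in for Tensors so that the snippet has no dependencies. It is only an approximation of the documented behaviour.

```python
from typing import Any, Dict, List, Optional


def to_list_values_(head_outputs: Dict[str, Any], keys: Optional[List[str]] = None) -> None:
    """In-place: wrap every non-list value in a single-element list."""
    keys = list(head_outputs.keys()) if keys is None else keys
    for key in keys:
        value = head_outputs[key]
        head_outputs[key] = list(value) if isinstance(value, (list, tuple)) else [value]


def check_same_length(head_outputs: Dict[str, List[Any]], keys: Optional[List[str]] = None) -> None:
    """Raise ValueError if the listed values do not all have the same length."""
    keys = list(head_outputs.keys()) if keys is None else keys
    lengths = {len(head_outputs[key]) for key in keys}
    if len(lengths) > 1:
        raise ValueError(f"values of {keys} have different lengths: {sorted(lengths)}")


outputs = {"cls": 0.7, "box_reg": [0.1]}  # floats stand in for Tensors
to_list_values_(outputs)   # "cls" becomes [0.7]
check_same_length(outputs)  # passes: both lists have length 1
```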

monai.apps.detection.utils.predict_utils.predict_with_inferer(images, network, keys, inferer=None)[source]#

Predict network dict output with an inferer. Compared with directly calling network(images), it enables a sliding window inferer that can be used to handle large inputs.

Parameters:

images – input of the network, Tensor sized (B, C, H, W) or (B, C, H, W, D)
network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].
keys – the keys in the output dict, should be network output keys or a subset of them.
inferer – a SlidingWindowInferer to handle large inputs.

Returns:

The predicted head_output from network, a Dict[str, List[Tensor]]

Example

# define a naive network
import torch
import monai
from monai.apps.detection.utils.predict_utils import predict_with_inferer
class NaiveNet(torch.nn.Module):
    def __init__(self, ):
        super().__init__()

    def forward(self, images: torch.Tensor):
        return {"cls": torch.randn(images.shape), "box_reg": [torch.randn(images.shape)]}

# create a predictor
network = NaiveNet()
inferer = monai.inferers.SlidingWindowInferer(
    roi_size = (128, 128, 128),
    overlap = 0.25,
    cache_roi_weight_map = True,
)
network_output_keys=["cls", "box_reg"]
images = torch.randn((2, 3, 512, 512, 512))  # a large input
head_outputs = predict_with_inferer(images, network, network_output_keys, inferer)

Inference box selector#

Part of this script is adapted from pytorch/vision

class monai.apps.detection.utils.box_selector.BoxSelector(box_overlap_metric=<function box_iou>, apply_sigmoid=True, score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300)[source]#

Box selector which selects the predicted boxes. The box selection is performed with the following steps:

For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.

Parameters:

apply_sigmoid (bool) – whether to apply sigmoid to get scores from classification logits
score_thresh (float) – no box with scores less than score_thresh will be kept
topk_candidates_per_level (int) – max number of boxes to keep for each level
nms_thresh (float) – box overlapping threshold for NMS
detections_per_img (int) – max number of boxes to keep for each image

Example

import torch
from monai.apps.detection.utils.box_selector import BoxSelector

input_param = {
    "apply_sigmoid": True,
    "score_thresh": 0.1,
    "topk_candidates_per_level": 2,
    "nms_thresh": 0.1,
    "detections_per_img": 5,
}
box_selector = BoxSelector(**input_param)
boxes = [torch.randn([3,6]), torch.randn([7,6])]
logits = [torch.randn([3,3]), torch.randn([7,3])]
spatial_size = (8,8,8)
selected_boxes, selected_scores, selected_labels = box_selector.select_boxes_per_image(
    boxes, logits, spatial_size
)

select_boxes_per_image(boxes_list, logits_list, spatial_size)[source]#

Postprocessing to generate detection result from classification logits and boxes.

The box selection is performed with the following steps:

For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.

Parameters:

boxes_list – list of predicted boxes from a single image, each element i is a Tensor sized (N_i, 2*spatial_dims)
logits_list – list of predicted classification logits from a single image, each element i is a Tensor sized (N_i, num_classes)
spatial_size – spatial size of the image

Returns:

selected boxes, Tensor sized (P, 2*spatial_dims)
selected_scores, Tensor sized (P, )
selected_labels, Tensor sized (P, )
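The four selection steps listed above can be sketched in plain Python for a single class. This is an illustrative approximation only: the function select_boxes and its list-based inputs are hypothetical, the NMS is a simple greedy variant using 2D IoU, and class labels are left out for brevity.

```python
import math
from typing import List, Sequence, Tuple

Box = List[float]  # [xmin, ymin, xmax, ymax]


def iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two 2D boxes in "xyxy" mode."""
    w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = w * h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_boxes(boxes_list: List[List[Box]], logits_list: List[List[float]],
                 score_thresh: float = 0.05, topk_candidates_per_level: int = 1000,
                 nms_thresh: float = 0.5, detections_per_img: int = 300) -> List[Tuple[Box, float]]:
    candidates: List[Tuple[Box, float]] = []
    for boxes, logits in zip(boxes_list, logits_list):
        scored = [(box, 1.0 / (1.0 + math.exp(-logit))) for box, logit in zip(boxes, logits)]
        scored = [bs for bs in scored if bs[1] >= score_thresh]       # step 1: score threshold
        scored.sort(key=lambda bs: bs[1], reverse=True)
        candidates.extend(scored[:topk_candidates_per_level])         # step 2: top-k per level
    candidates.sort(key=lambda bs: bs[1], reverse=True)
    kept: List[Tuple[Box, float]] = []
    for box, score in candidates:                                     # step 3: greedy NMS
        if all(iou(box, kept_box) <= nms_thresh for kept_box, _ in kept):
            kept.append((box, score))
    return kept[:detections_per_img]                                  # step 4: cap per image


level_0 = [[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]]
level_1 = [[20.0, 20.0, 30.0, 30.0], [40.0, 40.0, 50.0, 50.0]]
selected = select_boxes([level_0, level_1], [[2.0, 1.0], [0.5, -5.0]])
```

In this toy input the second box of the first level is suppressed by NMS because it overlaps a higher-scoring box, and the last box is discarded by the score threshold.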

select_top_score_idx_per_level(logits)[source]#

Select indices with highest scores.

The indices selection is performed with the following steps:

If self.apply_sigmoid, get scores by applying sigmoid to logits. Otherwise, use logits as scores.
Discard indices with scores less than self.score_thresh
Keep indices with top self.topk_candidates_per_level scores

Parameters:

logits (Tensor) – predicted classification logits, Tensor sized (N, num_classes)

Returns:

topk_idxs: selected M indices, Tensor sized (M, )
selected_scores: selected M scores, Tensor sized (M, )
selected_labels: selected M labels, Tensor sized (M, )
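A plain-Python sketch of the index selection is given below. It assumes that the (N, num_classes) scores are treated as one flat pool, so that a single box can contribute more than one (box index, label) pair. That is how torchvision-style detectors commonly do it, but it is an assumption here, and select_top_score_idx is a hypothetical helper.

```python
import math
from typing import List, Tuple


def select_top_score_idx(logits: List[List[float]], score_thresh: float = 0.05,
                         topk: int = 1000, apply_sigmoid: bool = True
                         ) -> Tuple[List[int], List[float], List[int]]:
    """Return (box indices, scores, labels) of the highest scoring entries."""
    entries = []
    for box_idx, row in enumerate(logits):
        for label, logit in enumerate(row):
            score = 1.0 / (1.0 + math.exp(-logit)) if apply_sigmoid else logit
            if score >= score_thresh:            # discard low scores
                entries.append((score, box_idx, label))
    entries.sort(key=lambda e: e[0], reverse=True)
    entries = entries[:topk]                      # keep the top-k scores
    return ([e[1] for e in entries], [e[0] for e in entries], [e[2] for e in entries])


# two boxes, two classes
topk_idxs, selected_scores, selected_labels = select_top_score_idx(
    [[2.0, -4.0], [0.0, 1.0]], score_thresh=0.1, topk=2
)
```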

Detection metrics#

This script is almost the same as MIC-DKFZ/nnDetection. The changes include 1) code reformatting, 2) docstrings.

This script is almost the same as MIC-DKFZ/nnDetection. The changes include 1) code reformatting, 2) docstrings, 3) allowing the input argument gt_ignore to be optional. (If it is not given, no GT boxes will be ignored.)

monai.apps.detection.metrics.matching.matching_batch(iou_fn, iou_thresholds, pred_boxes, pred_classes, pred_scores, gt_boxes, gt_classes, gt_ignore=None, max_detections=100)[source]#

Match boxes of a batch to corresponding ground truth for each category independently.

Parameters:

iou_fn – compute overlap for each pair
iou_thresholds – defines which IoU thresholds should be evaluated
pred_boxes – predicted boxes from single batch; List[[D, dim * 2]], D number of predictions
pred_classes – predicted classes from a single batch; List[[D]], D number of predictions
pred_scores – predicted score for each bounding box; List[[D]], D number of predictions
gt_boxes – ground truth boxes; List[[G, dim * 2]], G number of ground truth
gt_classes – ground truth classes; List[[G]], G number of ground truth
gt_ignore – specifies which ground truth boxes are not counted as true positives (detections which match these boxes are not counted as false positives either). If not given, all the gt_boxes are used; List[[G]], G number of ground truth
max_detections – maximum number of detections which should be evaluated

Returns:

List[Dict[int, Dict[str, np.ndarray]]], each Dict[str, np.ndarray] corresponds to an image. Dict has the following keys.

dtMatches: matched detections [T, D], where T = number of thresholds, D = number of detections
gtMatches: matched ground truth boxes [T, G], where T = number of thresholds, G = number of ground truth
dtScores: prediction scores [D] detection scores
gtIgnore: ground truth boxes which should be ignored [G] indicate whether ground truth should be ignored
dtIgnore: detections which should be ignored [T, D], indicate which detections should be ignored

Example

import torch
from monai.data.box_utils import box_iou
from monai.apps.detection.metrics.coco import COCOMetric
from monai.apps.detection.metrics.matching import matching_batch
# 3D example outputs of one image from detector
val_outputs_all = [
        {"boxes": torch.tensor([[1,1,1,3,4,5]],dtype=torch.float16),
        "labels": torch.randint(3,(1,)),
        "scores": torch.randn((1,)).absolute()},
]
val_targets_all = [
        {"boxes": torch.tensor([[1,1,1,2,6,4]],dtype=torch.float16),
        "labels": torch.randint(3,(1,))},
]

coco_metric = COCOMetric(
    classes=['c0','c1','c2'], iou_list=[0.1], max_detection=[10]
)
results_metric = matching_batch(
    iou_fn=box_iou,
    iou_thresholds=coco_metric.iou_thresholds,
    pred_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_outputs_all],
    pred_classes=[val_data_i["labels"].numpy() for val_data_i in val_outputs_all],
    pred_scores=[val_data_i["scores"].numpy() for val_data_i in val_outputs_all],
    gt_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_targets_all],
    gt_classes=[val_data_i["labels"].numpy() for val_data_i in val_targets_all],
)
val_metric_dict = coco_metric(results_metric)
print(val_metric_dict)
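For intuition about what the dtMatches and gtMatches arrays encode, the following sketch performs a simplified COCO-style greedy matching for one image, one class and one IoU threshold. The function match_detections is hypothetical, takes a precomputed IoU matrix, ignores gt_ignore and max_detections, and reports results in the original detection order.

```python
from typing import List, Tuple


def match_detections(ious: List[List[float]], scores: List[float],
                     iou_thresh: float) -> Tuple[List[bool], List[bool]]:
    """Greedy matching; ious[d][g] is the IoU of detection d and ground truth g."""
    n_gt = len(ious[0]) if ious else 0
    dt_matched = [False] * len(scores)
    gt_matched = [False] * n_gt
    # detections are processed from the highest to the lowest score
    for d in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True):
        best_g, best_iou = -1, iou_thresh
        for g in range(n_gt):
            if gt_matched[g]:
                continue  # each ground truth box can be matched only once
            if ious[d][g] >= best_iou:
                best_g, best_iou = g, ious[d][g]
        if best_g >= 0:
            dt_matched[d] = True
            gt_matched[best_g] = True
    return dt_matched, gt_matched


iou_matrix = [[0.6, 0.0], [0.7, 0.2], [0.0, 0.05]]  # 3 detections x 2 ground truth boxes
det_scores = [0.9, 0.8, 0.3]
dt_matches, gt_matches = match_detections(iou_matrix, det_scores, iou_thresh=0.5)
```

Here the second detection overlaps the first ground truth box more than the first detection does, but it loses the match because the higher-scoring detection is processed first. It therefore counts as a false positive at this threshold.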

Reconstruction#

FastMRIReader#

class monai.apps.reconstruction.fastmri_reader.FastMRIReader(*args, **kwargs)[source]#

Load fastMRI files with ‘.h5’ suffix. fastMRI files, when loaded with “h5py”, are HDF5 dictionary-like datasets. The keys are:

kspace: contains the fully-sampled kspace
reconstruction_rss: contains the root sum of squares of ifft of kspace. This is the ground-truth image.

It also has several attributes with the following keys:

acquisition (str): acquisition mode of the data (e.g., AXT2 denotes T2 brain MRI scans)
max (float): dynamic range of the data
norm (float): norm of the kspace
patient_id (str): the patient’s id whose measurements were recorded

get_data(dat)[source]#

Extract data array and metadata from the loaded data and return them. This function returns two objects, first is numpy array of image data, second is dict of metadata.

Parameters:: dat (dict) – a dictionary loaded from an h5 file
Return type:: tuple[ndarray, dict]

read(data)[source]#

Read data from specified h5 file. Note that the returned object is a dictionary.

Parameters:: data – file name to read.

verify_suffix(filename)[source]#

Verify whether the specified file format is supported by h5py reader.

Parameters:: filename – file name

ConvertToTensorComplex#

monai.apps.reconstruction.complex_utils.convert_to_tensor_complex(data, dtype=None, device=None, wrap_sequence=True, track_meta=False)[source]#

Convert complex-valued data to a 2-channel PyTorch tensor. The real and imaginary parts are stacked along the last dimension. This function relies on ‘monai.utils.type_conversion.convert_to_tensor’

Parameters:

data – input data can be PyTorch Tensor, numpy array, list, int, and float. will convert Tensor, Numpy array, float, int, bool to Tensor, strings and objects keep the original. for list, convert every item to a Tensor if applicable.
dtype – target data type when converting to Tensor.
device – target device to put the converted Tensor data.
wrap_sequence – if False, then lists will recursively call this function. E.g., [1, 2] -> [tensor(1), tensor(2)]. If True, then [1, 2] -> tensor([1, 2]).
track_meta – whether to track the meta information, if True, will convert to MetaTensor. default to False.

Returns:

PyTorch version of the data

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import convert_to_tensor_complex
data = np.array([ [1+1j, 1-1j], [2+2j, 2-2j] ])
# the following line prints (2,2)
print(data.shape)
# the following line prints torch.Size([2, 2, 2])
print(convert_to_tensor_complex(data).shape)

ComplexAbs#

monai.apps.reconstruction.complex_utils.complex_abs(x)[source]#

Compute the absolute value of a complex array.

Parameters:: x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
Return type:: Union[ndarray, Tensor]
Returns:: Absolute value along the last dimension

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import complex_abs
x = np.array([3,4])[np.newaxis]
# the following line prints 5 (as a one-element array)
print(complex_abs(x))

RootSumOfSquares#

monai.apps.reconstruction.mri_utils.root_sum_of_squares(x, spatial_dim)[source]#

Compute the root sum of squares (rss) of the data (typically done for multi-coil MRI samples)

Parameters:

x (Union[ndarray, Tensor]) – Input array/tensor
spatial_dim (int) – dimension along which rss is applied

Return type:

Union[ndarray, Tensor]

Returns:

rss of x along spatial_dim

Example

import numpy as np
from monai.apps.reconstruction.mri_utils import root_sum_of_squares
x = np.ones([2,3])
# the following line prints array([1.41421356, 1.41421356, 1.41421356])
print(root_sum_of_squares(x,spatial_dim=0))
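The computation itself is small enough to write out by hand. The sketch below is a dependency-free illustration for a 2D nested list reduced along its first dimension (for example, the coil dimension). The name rss_axis0 is hypothetical and the function is not the MONAI implementation.

```python
import math
from typing import List


def rss_axis0(x: List[List[float]]) -> List[float]:
    """Root sum of squares along the first dimension of a 2D nested list."""
    n_cols = len(x[0])
    return [math.sqrt(sum(row[j] ** 2 for row in x)) for j in range(n_cols)]


# two "coils" with three samples each, matching the example above
combined = rss_axis0([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])  # each entry is sqrt(2)
```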

ComplexMul#

monai.apps.reconstruction.complex_utils.complex_mul(x, y)[source]#

Compute complex-valued multiplication. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)

Parameters:

x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
y (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type:

Union[ndarray, Tensor]

Returns:

Complex multiplication of x and y

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import complex_mul
x = np.array([[1,2],[3,4]])
y = np.array([[1,1],[1,1]])
# the following line prints array([[-1,  3], [-1,  7]])
print(complex_mul(x,y))

ComplexConj#

monai.apps.reconstruction.complex_utils.complex_conj(x)[source]#

Compute complex conjugate of an/a array/tensor. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)

Parameters:: x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
Return type:: Union[ndarray, Tensor]
Returns:: Complex conjugate of x

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import complex_conj
x = np.array([[1,2],[3,4]])
# the following line prints array([[ 1, -2], [ 3, -4]])
print(complex_conj(x))
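The two-channel convention used by complex_abs, complex_mul and complex_conj (real part in channel 0, imaginary part in channel 1 of the last dimension) can be checked against Python's built-in complex type. The helpers below are hypothetical plain-Python equivalents operating on lists of [real, imag] pairs, intended only to spell out the arithmetic.

```python
import math
from typing import List

Pair = List[float]  # [real, imag]


def mul_pairs(x: List[Pair], y: List[Pair]) -> List[Pair]:
    """(a + bi)(c + di) = (ac - bd) + (ad + bc)i, element by element."""
    return [[a * c - b * d, a * d + b * c] for (a, b), (c, d) in zip(x, y)]


def conj_pairs(x: List[Pair]) -> List[Pair]:
    """Negate the imaginary channel."""
    return [[a, -b] for a, b in x]


def abs_pairs(x: List[Pair]) -> List[float]:
    """Magnitude sqrt(a^2 + b^2) of each pair."""
    return [math.hypot(a, b) for a, b in x]


x = [[1, 2], [3, 4]]
y = [[1, 1], [1, 1]]
product = mul_pairs(x, y)      # same values as the complex_mul example above
conjugate = conj_pairs(x)      # same values as the complex_conj example above
magnitude = abs_pairs([[3, 4]])

# cross-check the first product against Python's native complex arithmetic
native = complex(1, 2) * complex(1, 1)
```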

Auto3DSeg#

class monai.apps.auto3dseg.AlgoEnsemble[source]#

The base class of Ensemble methods

__call__(pred_param=None)[source]#

Use the ensembled model to predict result.

Parameters:

pred_param –

prediction parameter dictionary. The key has two groups: the first one will be consumed in this function, and the second group will be passed to the InferClass to override the parameters of the class functions. The first group contains:

"infer_files": file paths to the images to read in a list.

"files_slices": a value type of slice. The files_slices will slice the "infer_files" and only make prediction on the infer_files[files_slices].

"mode": ensemble mode. Currently “mean” and “vote” (majority voting) schemes are supported.

"image_save_func": a dictionary used to instantiate the SaveImage transform. When specified, the ensemble prediction will save the prediction files, instead of keeping the files in the memory. Example: {“_target_”: “SaveImage”, “output_dir”: “./”}

"sigmoid": use the sigmoid function (e.g. x > 0.5) to convert the prediction probability map to the label class prediction, otherwise argmax(x) is used.

"algo_spec_params": a dictionary to add pred_params that are specific to a model. The dict has a format of {“<name of algo>”: “<pred_params for that algo>”}.

The parameters in the second group are defined in the config of each Algo template. Please check: Project-MONAI/research-contributions

Returns:

A list of tensors or file paths, depending on whether "image_save_func" is set.

ensemble_pred(preds, sigmoid=False)[source]#

Ensemble the results using either the “mean” or the “vote” method.

Parameters:

preds – a list of probability prediction in Tensor-Like format.
sigmoid – use the sigmoid function to threshold probability one-hot map, otherwise argmax is used. Defaults to False

Returns:

a tensor which is the ensembled prediction.
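The difference between the two ensemble modes can be illustrated with a small plain-Python sketch. It assumes the argmax path (sigmoid=False) and represents each model's prediction as a nested list shaped [num_classes][num_voxels]. The functions ensemble_mean and ensemble_vote are hypothetical simplifications, not the MONAI implementation.

```python
from collections import Counter
from typing import List

ProbMap = List[List[float]]  # [num_classes][num_voxels]


def argmax_labels(pred: ProbMap) -> List[int]:
    """Label with the highest probability at every voxel."""
    n_cls, n_vox = len(pred), len(pred[0])
    return [max(range(n_cls), key=lambda c: pred[c][v]) for v in range(n_vox)]


def ensemble_mean(preds: List[ProbMap]) -> List[int]:
    """Average the probability maps over models, then take the argmax."""
    n_cls, n_vox = len(preds[0]), len(preds[0][0])
    mean = [[sum(p[c][v] for p in preds) / len(preds) for v in range(n_vox)] for c in range(n_cls)]
    return argmax_labels(mean)


def ensemble_vote(preds: List[ProbMap]) -> List[int]:
    """Take the argmax per model, then the majority label at every voxel."""
    labels = [argmax_labels(p) for p in preds]
    n_vox = len(labels[0])
    return [Counter(l[v] for l in labels).most_common(1)[0][0] for v in range(n_vox)]


# three models, two classes, two voxels (illustrative values)
model_preds = [
    [[0.9, 0.4], [0.1, 0.6]],
    [[0.6, 0.45], [0.4, 0.55]],
    [[0.2, 0.95], [0.8, 0.05]],
]
mean_result = ensemble_mean(model_preds)
vote_result = ensemble_vote(model_preds)
```

For the second voxel the two modes disagree: one very confident model dominates the mean, while the majority vote follows the two less confident models.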

get_algo(identifier)[source]#

Get a model by identifier.

Parameters:: identifier – the name of the bundleAlgo

get_algo_ensemble()[source]#

Get the algo ensemble after ranking or an empty list if ranking was not started.

Returns:: A list of Algo

set_algos(infer_algos)[source]#: Register model in the ensemble

set_infer_files(dataroot, data_list_or_path, data_key='testing')[source]#

Set the files to perform model inference.

Parameters:

dataroot – the path of the files
data_list_or_path – the data source file path

class monai.apps.auto3dseg.AlgoEnsembleBestByFold(n_fold=5)[source]#

Ensemble method that selects the best model in each fold.

Parameters:: n_fold (int) – number of cross-validation folds used in training

collect_algos()[source]#

Rank the algos by finding the best model in each cross-validation fold

Return type:: None

class monai.apps.auto3dseg.AlgoEnsembleBestN(n_best=5)[source]#

Ensemble method that selects N models out of all using the models’ best_metric scores

Parameters:: n_best (int) – number of models to pick for ensemble (N).

collect_algos(n_best=-1)[source]#

Rank the algos by finding the top N (n_best) validation scores.

Return type:: None

sort_score()[source]#: Sort the best_metrics
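The two ranking strategies can be sketched as follows. The record layout (dictionaries with "identifier", "fold" and "best_metric" keys) and the helper names are assumptions made for illustration. The real classes operate on trained algorithm objects rather than on plain dictionaries.

```python
from typing import Any, Dict, List

Algo = Dict[str, Any]


def select_best_n(algos: List[Algo], n_best: int) -> List[Algo]:
    """Keep the n_best records with the highest validation score."""
    ranked = sorted(algos, key=lambda a: a["best_metric"], reverse=True)
    return ranked[:n_best]


def select_best_by_fold(algos: List[Algo], n_fold: int) -> List[Algo]:
    """Keep the highest scoring record of every cross-validation fold."""
    best = []
    for fold in range(n_fold):
        in_fold = [a for a in algos if a["fold"] == fold]
        if in_fold:
            best.append(max(in_fold, key=lambda a: a["best_metric"]))
    return best


# hypothetical training history: two algorithms trained on two folds each
history = [
    {"identifier": "segresnet_0", "fold": 0, "best_metric": 0.81},
    {"identifier": "segresnet_1", "fold": 1, "best_metric": 0.78},
    {"identifier": "dints_0", "fold": 0, "best_metric": 0.84},
    {"identifier": "dints_1", "fold": 1, "best_metric": 0.75},
]
top_two = [a["identifier"] for a in select_best_n(history, 2)]
per_fold = [a["identifier"] for a in select_best_by_fold(history, 2)]
```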

class monai.apps.auto3dseg.AlgoEnsembleBuilder(history, data_src_cfg_name=None)[source]#

Build ensemble workflow from configs and arguments.

Parameters:

history – a collection of trained bundleAlgo algorithms.
data_src_cfg_name – filename of the data source.

Examples

builder = AlgoEnsembleBuilder(history, data_src_cfg)
builder.set_ensemble_method(AlgoEnsembleBestN(3))
ensemble = builder.get_ensemble()

add_inferer(identifier, gen_algo, best_metric=None)[source]#

Add model inferer to the builder.

Parameters:

identifier – name of the bundleAlgo.
gen_algo – a trained BundleAlgo model object.
best_metric – the best metric in validation of the trained model.

get_ensemble()[source]#: Get the ensemble

set_ensemble_method(ensemble, *args, **kwargs)[source]#

Set the ensemble method.

Parameters:: ensemble (AlgoEnsemble) – the AlgoEnsemble to build.
Return type:: None

class monai.apps.auto3dseg.AutoRunner(work_dir='./work_dir', input=None, algos=None, analyze=None, algo_gen=None, train=None, hpo=False, hpo_backend='nni', ensemble=True, not_use_cache=False, templates_path_or_url=None, allow_skip=True, mlflow_tracking_uri=None, mlflow_experiment_name=None, **kwargs)[source]#

An interface for handling Auto3Dseg with minimal inputs and understanding of the internal states in Auto3Dseg. The users can run the Auto3Dseg with default settings in one line of code. They can also customize the advanced features of Auto3Dseg in a few additional lines. Examples of customization include

change cross-validation folds

change training/prediction parameters

change ensemble methods

automatic hyperparameter optimization.

The output of the interface is a directory that contains

data statistics analysis report

algorithm definition files (scripts, configs, pickle objects) and training results (checkpoints, accuracies)

the predictions on the testing datasets from the final algorithm ensemble

a copy of the input arguments in form of YAML

cached intermediate results

Parameters:

work_dir – working directory to save the intermediate and final results.
input – the configuration dictionary or the file path to the configuration in form of YAML. The configuration should contain datalist, dataroot, modality, multigpu, and class_names info.
algos – optionally specify algorithms to use. If a dictionary, must be in the form {“algname”: dict(_target_=”algname.scripts.algo.AlgnameAlgo”, template_path=”algname”), …} If a list or a string, defines a subset of names of the algorithms to use, e.g. ‘segresnet’ or [‘segresnet’, ‘dints’] out of the full set of algorithm templates provided by templates_path_or_url. Defaults to None, to use all available algorithms.
analyze – on/off switch to run DataAnalyzer and generate a datastats report. Defaults to None, to automatically decide based on cache, and run data analysis only if we have not completed this step yet.
algo_gen – on/off switch to run AlgoGen and generate templated BundleAlgos. Defaults to None, to automatically decide based on cache, and run algorithm folders generation only if we have not completed this step yet.
train – on/off switch to run training and generate algorithm checkpoints. Defaults to None, to automatically decide based on cache, and run training only if we have not completed this step yet.
hpo – use hyperparameter optimization (HPO) in the training phase. Users can provide a list of hyper-parameters, and a search will be performed to investigate the algorithm performance.
hpo_backend – a string that indicates the backend of the HPO. Currently, only NNI Grid-search mode is supported
ensemble – on/off switch to run model ensemble and use the ensemble to predict outputs in testing datasets.
not_use_cache – if the value is True, it will ignore all cached results in data analysis, algorithm generation, or training, and start the pipeline from scratch.
templates_path_or_url – the folder with the algorithm templates or a url. If None provided, the default template zip url will be downloaded and extracted into the work_dir.
allow_skip – a switch passed to BundleGen process which determines if some Algo in the default templates can be skipped based on the analysis on the dataset from Auto3DSeg DataAnalyzer.
mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.
mlflow_experiment_name – the name of the experiment in MLflow server.
kwargs – image writing parameters for the ensemble inference. The kwargs format follows the SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage.

Examples

User can use the one-liner to start the Auto3Dseg workflow

python -m monai.apps.auto3dseg AutoRunner run --input '{"modality": "ct", "datalist": "dl.json", "dataroot": "/dr", "multigpu": true, "class_names": ["A", "B"]}'

User can also save the input dictionary as an input YAML file and use the following one-liner

python -m monai.apps.auto3dseg AutoRunner run --input=./input.yaml

User can specify work_dir and data source config input and run AutoRunner:

work_dir = "./work_dir"
input = "path/to/input_yaml"
runner = AutoRunner(work_dir=work_dir, input=input)
runner.run()

User can specify a subset of algorithms to use and run AutoRunner:

work_dir = "./work_dir"
input = "path/to/input_yaml"
algos = ["segresnet", "dints"]
runner = AutoRunner(work_dir=work_dir, input=input, algos=algos)
runner.run()

User can specify a local folder with algorithms templates and run AutoRunner:

work_dir = "./work_dir"
input = "path/to/input_yaml"
algos = "segresnet"
templates_path_or_url = "./local_path_to/algorithm_templates"
runner = AutoRunner(work_dir=work_dir, input=input, algos=algos, templates_path_or_url=templates_path_or_url)
runner.run()

User can specify training parameters by:

input = "path/to/input_yaml"
runner = AutoRunner(input=input)
train_param = {
    "num_epochs_per_validation": 1,
    "num_images_per_batch": 2,
    "num_epochs": 2,
}
runner.set_training_params(params=train_param)  # 2 epochs
runner.run()

User can specify the fold number of cross validation

input = "path/to/input_yaml"
runner = AutoRunner(input=input)
runner.set_num_fold(n_fold = 2)
runner.run()

User can specify the prediction parameters during algo ensemble inference:

input = "path/to/input_yaml"
pred_params = {
    'files_slices': slice(0,2),
    'mode': "vote",
    'sigmoid': True,
}
runner = AutoRunner(input=input)
runner.set_prediction_params(params=pred_params)
runner.run()

User can define a grid search space and use the HPO during training.

input = "path/to/input_yaml"
runner = AutoRunner(input=input, hpo=True)
runner.set_nni_search_space({"learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}})
runner.run()

Notes

Expected results in the work_dir as below:

work_dir/
├── algorithm_templates # bundle algo templates (scripts/configs)
├── cache.yaml          # Autorunner will automatically cache results to save time
├── datastats.yaml      # datastats of the dataset
├── dints_0             # network scripts/configs/checkpoints and pickle object of the algo
├── ensemble_output     # the prediction of testing datasets from the ensemble of the algos
├── input.yaml          # copy of the input data source configs
├── segresnet_0         # network scripts/configs/checkpoints and pickle object of the algo
├── segresnet2d_0       # network scripts/configs/checkpoints and pickle object of the algo
└── swinunetr_0         # network scripts/configs/checkpoints and pickle object of the algo

export_cache(**kwargs)[source]#: Save the cache state as cache.yaml in the working directory

inspect_datalist_folds(datalist_filename)[source]#

Returns number of folds in the datalist file, and assigns fold numbers if not provided.

Parameters:: datalist_filename (str) – path to the datalist file.

Notes

If the fold key is not provided, it automatically generates 5 fold assignments in the training key list. If a validation key list is available, it assumes a single-fold validation.

Return type:: int
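
A minimal sketch of what assigning folds to a training list can look like, assuming a simple round-robin split. The helper assign_folds is hypothetical; the actual assignment strategy used by MONAI (for example, whether items are shuffled first) may differ.

```python
def assign_folds(training_items, num_fold=5):
    """Return copies of the items with a "fold" key assigned round-robin.

    Items that already carry a "fold" key keep their value.
    """
    assigned = []
    for index, item in enumerate(training_items):
        item = dict(item)  # copy so the input list is not modified
        item.setdefault("fold", index % num_fold)
        assigned.append(item)
    return assigned


training = [{"image": f"image_{i:03d}.nii.gz", "label": f"label_{i:03d}.nii.gz"} for i in range(7)]
with_folds = assign_folds(training)
fold_count = len({item["fold"] for item in with_folds})
```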

read_cache()[source]#

Check if the intermediate result is cached after each step in the current working directory

Returns:: a dict of cache results. If not_use_cache is set to True, or there is no cache file in the working directory, the result will be empty_cache in which all has_cache keys are set to False.

run()[source]#: Run the AutoRunner pipeline

set_analyze_params(params=None)[source]#

Set the data analysis extra params.

Parameters:: params – a dict that defines the overriding key-value pairs during training. The overriding method is defined by the algo class.

set_device_info(cuda_visible_devices=None, num_nodes=None, mn_start_method=None, cmd_prefix=None)[source]#

Set the device related info

Parameters:

cuda_visible_devices – define GPU ids for data analyzer, training, and ensembling. List of GPU ids [0,1,2,3] or a string “0,1,2,3”. Default using env “CUDA_VISIBLE_DEVICES” or all devices available.
num_nodes – number of nodes for training and ensembling. Default using env “NUM_NODES” or 1 if “NUM_NODES” is unset.
mn_start_method – multi-node start method. AutoRunner will use the method to start multi-node processes. Default using env “MN_START_METHOD” or ‘bcprun’ if “MN_START_METHOD” is unset.
cmd_prefix –
command line prefix for subprocess running in BundleAlgo and EnsembleRunner. Default using env “CMD_PREFIX” or None, examples are:
- single GPU/CPU or multinode bcprun: “python “ or “/opt/conda/bin/python3.8 “,
- single node multi-GPU running “torchrun --nnodes=1 --nproc_per_node=2 “
If the user defines this prefix, please make sure --nproc_per_node matches cuda_visible_devices or os.environ[‘CUDA_VISIBLE_DEVICES’]. Also always set --nnodes=1. Set num_nodes for multi-node.
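
The defaults described above (an explicit argument first, then an environment variable, then a fallback) can be sketched in plain Python. The two helper functions are hypothetical and only mirror the documented behaviour for cuda_visible_devices and num_nodes; they are not taken from the MONAI source.

```python
import os


def resolve_cuda_visible_devices(cuda_visible_devices=None):
    """Normalize a list like [0, 1] or a string like "0,1" to a comma-separated string.

    Falls back to the CUDA_VISIBLE_DEVICES environment variable and returns None
    when nothing is set (the caller would then use all available devices).
    """
    if cuda_visible_devices is None:
        return os.environ.get("CUDA_VISIBLE_DEVICES")
    if isinstance(cuda_visible_devices, str):
        return cuda_visible_devices
    return ",".join(str(gpu_id) for gpu_id in cuda_visible_devices)


def resolve_num_nodes(num_nodes=None):
    """Use the explicit value, else the NUM_NODES environment variable, else 1."""
    if num_nodes is not None:
        return int(num_nodes)
    return int(os.environ.get("NUM_NODES", 1))


devices = resolve_cuda_visible_devices([0, 1, 2, 3])
```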

set_ensemble_method(ensemble_method_name='AlgoEnsembleBestByFold', **kwargs)[source]#

Set the bundle ensemble method name and the parameters of the ensemble method.

Parameters:

ensemble_method_name (str) – the name of the ensemble method. Only two methods are supported “AlgoEnsembleBestN” and “AlgoEnsembleBestByFold”.
kwargs (Any) – the keyword arguments used to define the ensemble method. Currently only n_best for AlgoEnsembleBestN is supported.

Return type:

AutoRunner

set_gpu_customization(gpu_customization=False, gpu_customization_specs=None)[source]#

Set options for GPU-based parameter customization/optimization.

Parameters:

gpu_customization – whether to automatically customize/optimize bundle script/config parameters for each BundleAlgo based on the available GPUs. Custom parameters are obtained through dummy training to simulate the actual model training process and hyperparameter optimization (HPO) experiments.

gpu_customization_specs (optional) –

the dictionary that lets users overwrite the HPO settings. Users can overwrite some or all of the variables. The structure is as follows.

gpu_customization_specs = {
    'ALGO': {
        'num_trials': 6,
        'range_num_images_per_batch': [1, 20],
        'range_num_sw_batch_size': [1, 20]
    }
}

ALGO – the name of the algorithm. It can be one of the algorithm names (e.g., ‘dints’) or ‘universal’, which applies the changes to all algorithms. Possible options are {"universal", "dints", "segresnet", "segresnet2d", "swinunetr"}.
num_trials – the number of HPO trials/experiments to run.
range_num_images_per_batch – the range of number of images per mini-batch.
range_num_sw_batch_size – the range of batch size in sliding-window inferer.
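
As an illustration of how a ‘universal’ entry and a per-algorithm entry could be combined, the sketch below merges the two dictionaries. The helper specs_for_algo is hypothetical, and the precedence rule (the algorithm's own entry wins over ‘universal’) is an assumption of this sketch rather than documented behaviour.

```python
def specs_for_algo(gpu_customization_specs, algo_name):
    """Pick the HPO settings for one algorithm.

    Assumption: an entry under the algorithm's own name takes
    precedence over the 'universal' entry.
    """
    settings = dict(gpu_customization_specs.get("universal", {}))
    settings.update(gpu_customization_specs.get(algo_name, {}))
    return settings


gpu_customization_specs = {
    "universal": {"num_trials": 6, "range_num_images_per_batch": [1, 20]},
    "dints": {"num_trials": 3},
}
dints_settings = specs_for_algo(gpu_customization_specs, "dints")
segresnet_settings = specs_for_algo(gpu_customization_specs, "segresnet")
```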

set_hpo_params(params=None)[source]#

Set parameters for the HPO module and the algos before the training. It will attempt to (1) override bundle templates with the key-value pairs in params (2) change the config of the HPO module (e.g. NNI) if the key is found to be one of:

“trialCodeDirectory”

“trialGpuNumber”

“trialConcurrency”

“maxTrialNumber”

“maxExperimentDuration”

“tuner”

“trainingService”

and (3) enable the dry-run mode if the user wants to generate the NNI configs without starting the NNI service.

Parameters:: params – a dict that defines the overriding key-value pairs during instantiation of the algo. For BundleAlgo, it will override the template config filling.

Notes

Users can set nni_dry_run to True in the params to enable the dry-run mode for the NNI backend.

set_image_save_transform(**kwargs)[source]#

Set the ensemble output transform.

Parameters:: kwargs (Any) – image writing parameters for the ensemble inference. The kwargs format follows SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage.
Return type:: AutoRunner

set_nni_search_space(search_space)[source]#

Set the search space for NNI parameter search.

Parameters:: search_space (dict[str, Any]) – hyper parameter search space in the form of dict. For more information, please check NNI documentation: https://nni.readthedocs.io/en/v2.2/Tutorial/SearchSpaceSpec.html .
Return type:: AutoRunner

set_num_fold(num_fold=5)[source]#

Set the number of cross validation folds for all algos.

Parameters:: num_fold (int) – a positive integer to define the number of folds.
Return type:: AutoRunner

set_prediction_params(params=None)[source]#

Set the prediction params for all algos.

Parameters:: params – a dict that defines the overriding key-value pairs during prediction. The overriding method is defined by the algo class.

Examples

For BundleAlgo objects, this set of params restricts the algo ensemble to run inference only on the first two files in the testing datalist: {“files_slices”: slice(0, 2)}
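
The effect of slice(0, 2) on a file list can be checked with ordinary Python indexing. The file names below are made up for illustration.

```python
# hypothetical testing datalist entries
testing_files = [
    {"image": "image_003.nii.gz"},
    {"image": "image_005.nii.gz"},
    {"image": "image_007.nii.gz"},
]
first_two = slice(0, 2)
selected = testing_files[first_two]  # keeps the entries at index 0 and 1
```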

set_training_params(params=None)[source]#

Set the training params for all algos.

Parameters:: params – a dict that defines the overriding key-value pairs during training. The overriding method is defined by the algo class.

Examples

For BundleAlgo objects, the training parameter to shorten the training time to a few epochs can be: {“num_epochs”: 2, “num_epochs_per_validation”: 1}

class monai.apps.auto3dseg.BundleAlgo(template_path)[source]#

An algorithm represented by a set of bundle configurations and scripts.

BundleAlgo.cfg is a monai.bundle.ConfigParser instance.

from monai.apps.auto3dseg import BundleAlgo

data_stats_yaml = "../datastats.yaml"
algo = BundleAlgo(template_path="../algorithm_templates")
algo.set_data_stats(data_stats_yaml)
# algo.set_data_source("../data_src.json")
algo.export_to_disk(".", algo_name="segresnet2d_1")

This class creates MONAI bundles from a directory of ‘bundle templates’. Unlike the regular MONAI bundle format, a bundle template may contain placeholders that must be filled using fill_template_config during export_to_disk. The created bundle keeps the same file structure as the template.

__init__(template_path)[source]#

Create an Algo instance based on the predefined Algo template.

Parameters:: template_path (Union[str, PathLike]) – path to a folder that contains the algorithm templates. Please check Project-MONAI/research-contributions

export_to_disk(output_path, algo_name, **kwargs)[source]#

Fill the configuration templates, write the bundle (configs + scripts) to folder output_path/algo_name.

Parameters:

output_path (str) – Path to export the ‘scripts’ and ‘configs’ directories.
algo_name (str) – the identifier of the algorithm (usually contains the name and extra info like fold ID).
kwargs (Any) – other parameters, including: “copy_dirs=True/False” means whether to copy the template as output instead of inplace operation, “fill_template=True/False” means whether to fill the placeholders in the template. other parameters are for fill_template_config function.

Return type:

None

fill_template_config(data_stats_filename, algo_path, **kwargs)[source]#

The configuration files defined when constructing this Algo instance might not have complete training and validation pipelines. Some configuration components and hyperparameters of the pipelines depend on the training data and other factors. This API is provided to allow the creation of fully functioning config files. It returns the records of filling the template config: {“<config name>”: {“<placeholder key>”: value, …}, …}.

Parameters:: data_stats_filename (str) – filename of the data stats report (generated by DataAnalyzer)

Notes

Template filling is optional. The user can construct a set of pre-filled configs without replacing values by using the data analysis results. It is also intended to be re-implemented in subclasses of BundleAlgo if the user wants their own way of auto-configured template filling.

Return type:: dict
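
The shape of the returned record can be illustrated with a small stand-alone sketch. Everything here is hypothetical: the helper fill_placeholders, the config file names, and the convention that a placeholder is a key whose value is None. Real bundle templates and BundleAlgo subclasses define their own placeholder handling.

```python
def fill_placeholders(configs, values):
    """Fill None-valued entries in each config from `values` and record what was filled.

    Assumption of this sketch: a placeholder is a key whose value is None.
    """
    record = {}
    for config_name, config in configs.items():
        for key, current in config.items():
            if current is None and key in values:
                config[key] = values[key]
                record.setdefault(config_name, {})[key] = values[key]
    return record


# hypothetical config names and keys
configs = {
    "hyper_parameters.yaml": {"patch_size": None, "num_epochs": 300},
    "network.yaml": {"output_classes": None},
}
filled = fill_placeholders(configs, {"patch_size": [96, 96, 96], "output_classes": 3})
```

The record has one entry per config that was touched, keyed by placeholder name, which matches the {“<config name>”: {“<placeholder key>”: value}} layout described above.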

get_inferer(*args, **kwargs)[source]#

Load the InferClass from infer.py. The InferClass should be defined in the template under the path “scripts/infer.py”. The class must be named “InferClass” (the name is fixed) and define at least two methods (__init__ and infer). The __init__ method accepts override kwargs that can optionally be used to override parameters at run time.

Examples:

from typing import Optional, Sequence, Union

import torch


class InferClass:
    def __init__(self, config_file: Optional[Union[str, Sequence[str]]] = None, **override):
        # read configs from config_file (sequence)
        # set up transforms
        # set up model
        # set up other hyper parameters
        return

    @torch.no_grad()
    def infer(self, image_file):
        # infer the model and save the results to output
        output = None  # placeholder for the prediction
        return output

get_output_path()[source]#: Returns the algo output paths to find the algo scripts and configs.

get_score(*args, **kwargs)[source]#: Returns validation scores of the model trained by the current Algo.

pre_check_skip_algo(skip_bundlegen=False, skip_info='')[source]#

Analyse the data analysis report and check if the algorithm needs to be skipped. This function is overridden within algo.

Parameters:

skip_bundlegen (bool) – skip generating bundles for this algo if true.
skip_info (str) – info to print when skipped.

Return type:: tuple[bool, str]

predict(predict_files, predict_params=None)[source]#

Use the trained model to predict the outputs with a given input image.

Parameters:

predict_files – a list of paths to files to run inference on [“path_to_image_1”, “path_to_image_2”]
predict_params – a dict to override the parameters in the bundle config (including the files to predict).

set_data_source(data_src_cfg)[source]#

Set the data source configuration file

Parameters:: data_src_cfg (str) – path to a configuration file (yaml) that contains datalist, dataroot, and other params. The config will be in a form of {“modality”: “ct”, “datalist”: “path_to_json_datalist”, “dataroot”: “path_dir_data”}
Return type:: None

set_data_stats(data_stats_files)[source]#

Set the data analysis report (generated by DataAnalyzer).

Parameters:: data_stats_files (str) – path to the datastats yaml file
Return type:: None

set_mlflow_experiment_name(mlflow_experiment_name)[source]#

Set the experiment name for MLflow server

Parameters:: mlflow_experiment_name – a string to specify the experiment name for MLflow server.

set_mlflow_tracking_uri(mlflow_tracking_uri)[source]#

Set the tracking URI for MLflow server

Parameters:: mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.

train(train_params=None, device_setting=None)[source]#

Load the run function in the training script of each model. The training parameters are predefined by the algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.

Parameters:

train_params – training parameters
device_setting – device related settings, should follow the device_setting in auto_runner.set_device_info. ‘CUDA_VISIBLE_DEVICES’ should be a string e.g. ‘0,1,2,3’

class monai.apps.auto3dseg.BundleGen(algo_path='.', algos=None, templates_path_or_url=None, data_stats_filename=None, data_src_cfg_name=None, mlflow_tracking_uri=None, mlflow_experiment_name=None)[source]#

This class generates a set of bundles according to the cross-validation folds. Each bundle can run independently.

Parameters:

algo_path – the directory path to save the algorithm templates. Default is the current working dir.
algos – If dictionary, it outlines the algorithm to use. If a list or a string, defines a subset of names of the algorithms to use, e.g. (‘segresnet’, ‘dints’) out of the full set of algorithm templates provided by templates_path_or_url. Defaults to None - to use all available algorithms.
templates_path_or_url – the folder with the algorithm templates or a url. If None provided, the default template zip url will be downloaded and extracted into the algo_path. The current default options are released at: Project-MONAI/research-contributions.
data_stats_filename – the path to the data stats file (generated by DataAnalyzer).
data_src_cfg_name – the path to the data source config YAML file. The config will be in a form of {“modality”: “ct”, “datalist”: “path_to_json_datalist”, “dataroot”: “path_dir_data”}.
mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.
mlflow_experiment_name – a string to specify the experiment name for MLflow server.

python -m monai.apps.auto3dseg BundleGen generate --data_stats_filename="../algorithms/datastats.yaml"

generate(output_folder='.', num_fold=5, gpu_customization=False, gpu_customization_specs=None, allow_skip=True)[source]#

Generate the bundle scripts/configs for each bundleAlgo

Parameters:

output_folder – the output folder to save each algorithm.
num_fold – the number of cross-validation folds.
gpu_customization – whether to automatically customize/optimize bundle script/config parameters for each BundleAlgo based on the available GPUs. Custom parameters are obtained through dummy training to simulate the actual model training process and hyperparameter optimization (HPO) experiments.
gpu_customization_specs –

the dictionary that lets users overwrite the HPO settings. Users can overwrite some or all of the variables. The structure is as follows.

gpu_customization_specs = {
    'ALGO': {
        'num_trials': 6,
        'range_num_images_per_batch': [1, 20],
        'range_num_sw_batch_size': [1, 20]
    }
}

ALGO – the name of the algorithm. It can be one of the algorithm names (e.g., ‘dints’) or ‘universal’, which applies the changes to all algorithms. Possible options are {"universal", "dints", "segresnet", "segresnet2d", "swinunetr"}.
num_trials – the number of HPO trials/experiments to run.
range_num_images_per_batch – the range of number of images per mini-batch.
range_num_sw_batch_size – the range of batch size in sliding-window inferer.

allow_skip – a switch to determine if some Algo in the default templates can be skipped based on the analysis of the dataset from the Auto3DSeg DataAnalyzer.

get_data_src()[source]#: Get the data source filename

get_data_stats()[source]#: Get the filename of the data stats

get_history()[source]#

Get the history of the bundleAlgo object with their names/identifiers

Return type:: list

get_mlflow_experiment_name()[source]#: Get the experiment name for MLflow server

get_mlflow_tracking_uri()[source]#: Get the tracking URI for MLflow server

set_data_src(data_src_cfg_name)[source]#

Set the data source filename

Parameters:: data_src_cfg_name – filename of data_source file

set_data_stats(data_stats_filename)[source]#

Set the data stats filename

Parameters:: data_stats_filename (str) – filename of datastats
Return type:: None

set_mlflow_experiment_name(mlflow_experiment_name)[source]#

Set the experiment name for MLflow server

Parameters:: mlflow_experiment_name – a string to specify the experiment name for MLflow server.

set_mlflow_tracking_uri(mlflow_tracking_uri)[source]#

Set the tracking URI for MLflow server

Parameters:: mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.

class monai.apps.auto3dseg.DataAnalyzer(datalist, dataroot='', output_path='./datastats.yaml', average=True, do_ccp=False, device='cuda', worker=4, image_key='image', label_key='label', hist_bins=0, hist_range=None, fmt='yaml', histogram_only=False, **extra_params)[source]#

The DataAnalyzer automatically analyzes a given medical image dataset and reports the statistics. The module expects file paths to the image data and utilizes the LoadImaged transform to read the files, which supports nii, nii.gz, png, jpg, bmp, npz, npy, and dcm formats. Currently, only the segmentation task is supported, so the user needs to provide paths to the image and label files (if available). The preferred label data format is (1,H,W,D), with the label index in the first dimension. If the label is in one-hot format, it will be converted to the preferred format.

Parameters:

datalist – a Python dictionary storing group, fold, and other information of the medical image dataset, or a string to the JSON file storing the dictionary.
dataroot – user’s local directory containing the datasets.
output_path – path to save the analysis result.
average – whether to average the statistical value across different image modalities.
do_ccp – apply the connected component algorithm to process the labels/images
device – a string specifying hardware (CUDA/CPU) utilized for the operations.
worker – number of workers to use for loading datasets in each GPU/CPU sub-process.
image_key – a string that the user specifies as the image key. The DataAnalyzer will look it up in the datalist to locate the image files of the dataset.
label_key – a string that the user specifies as the label key. The DataAnalyzer will look it up in the datalist to locate the label files of the dataset. If label_key is NoneType or “None”, the DataAnalyzer will skip looking for labels and all label-related operations.
hist_bins – bins to compute histogram for each image channel.
hist_range – ranges to compute histogram for each image channel.
fmt – format used to save the analysis results. Currently supports "json" and "yaml", defaults to “yaml”.
histogram_only – whether to only compute histograms. Defaults to False.
extra_params – other optional arguments. Currently supported arguments are: ‘allowed_shape_difference’ (default 5), which changes the tolerance of the allowed shape difference between the image and label items. If the shape mismatch is below the tolerance, the label image will be resized to match the image using nearest interpolation.
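
The tolerance check behind ‘allowed_shape_difference’ can be pictured as follows. This is a sketch under stated assumptions, not the DataAnalyzer code: the helper check_shape_difference is hypothetical, the difference is measured per dimension, and a difference exactly equal to the tolerance is treated as acceptable.

```python
def check_shape_difference(image_shape, label_shape, allowed_shape_difference=5):
    """Classify an image/label spatial shape pair.

    Returns "match" when the shapes are equal, "resize" when every dimension
    differs by at most the tolerance, and "error" otherwise.
    """
    if tuple(image_shape) == tuple(label_shape):
        return "match"
    if len(image_shape) != len(label_shape):
        return "error"
    largest = max(abs(i - l) for i, l in zip(image_shape, label_shape))
    return "resize" if largest <= allowed_shape_difference else "error"


small_mismatch = check_shape_difference((128, 128, 64), (128, 128, 62))
large_mismatch = check_shape_difference((128, 128, 64), (128, 128, 40))
```

In the "resize" case the label would be resampled to the image shape with nearest interpolation, as described for extra_params.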

Examples

from monai.apps.auto3dseg.data_analyzer import DataAnalyzer

datalist = {
    "testing": [{"image": "image_003.nii.gz"}],
    "training": [
        {"fold": 0, "image": "image_001.nii.gz", "label": "label_001.nii.gz"},
        {"fold": 0, "image": "image_002.nii.gz", "label": "label_002.nii.gz"},
        {"fold": 1, "image": "image_001.nii.gz", "label": "label_001.nii.gz"},
        {"fold": 1, "image": "image_004.nii.gz", "label": "label_004.nii.gz"},
    ],
}

dataroot = '/datasets' # the directory where you have the image files (nii.gz)
DataAnalyzer(datalist, dataroot)

Notes

The module can also be called from the command line interface (CLI).

For example:

python -m monai.apps.auto3dseg \
    DataAnalyzer \
    get_all_case_stats \
    --datalist="my_datalist.json" \
    --dataroot="my_dataroot_dir"

get_all_case_stats(key='training', transform_list=None)[source]#

Get all case stats. This is the entry point of the DataAnalyzer class. The function initiates multiple GPU or CPU processes of the internal _get_all_case_stats function, which iterates over the datalist and calls SegSummarizer to generate stats for each case. After all case stats are generated, SegSummarizer is called to combine the results.

Parameters:

key – dataset key
transform_list – optional list of transforms to apply before SegSummarizer

Returns:

A data statistics dictionary containing:

“stats_summary” – summary statistics of the entire dataset. Within stats_summary there are “image_stats” (summary of shape, channel, spacing, etc., using operations_summary), “image_foreground_stats” (intensity information for the non-zero labeled voxels), and “label_stats” (information about the labels, pixel percentage, image_intensity, and each individual label in a list).
“stats_by_cases” – a list in which each element holds the statistics of one image-label pair. Each element contains: “image” (path to the image), “label” (path to the corresponding label), “image_stats” (summary of shape, channel, spacing, etc., using operations), “image_foreground_stats” (similar to image_stats but computed on the foreground image), and “label_stats” (statistics of the individual labels).

Notes

Since the backend of the statistics computation is torch/numpy, nan/inf values may be generated and carried over in the computation. In such cases, the output dictionary will include .nan/.inf in the statistics.
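
The way a single nan propagates into a summary statistic can be reproduced with plain Python floats. The values below are made up; the sketch only shows why a downstream consumer of the statistics may want to filter non-finite entries.

```python
import math

intensities = [0.5, float("nan"), 1.5]
mean_value = sum(intensities) / len(intensities)  # nan propagates through the sum

# keep only finite values if a finite summary is needed
finite = [v for v in intensities if math.isfinite(v)]
finite_mean = sum(finite) / len(finite)
```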

class monai.apps.auto3dseg.EnsembleRunner(data_src_cfg_name, work_dir='./work_dir', num_fold=5, ensemble_method_name='AlgoEnsembleBestByFold', mgpu=True, **kwargs)[source]#

The runner for the ensembler. It ensembles predictions and saves them to disk, with support for multi-GPU execution.

Parameters:

data_src_cfg_name (str) – filename of the data source.
work_dir (str) – working directory to save the intermediate and final results. Default is ./work_dir.
num_fold (int) – number of folds. Default is 5.
ensemble_method_name (str) – method to ensemble predictions from different models. Default is AlgoEnsembleBestByFold. Supported methods: [“AlgoEnsembleBestN”, “AlgoEnsembleBestByFold”].
mgpu (bool) – if using multi-gpu. Default is True.
kwargs (Any) – additional image writing, ensembling parameters and prediction parameters for the ensemble inference. - for image saving, please check the supported parameters in SaveImage transform. - for prediction parameters, please check the supported parameters in the AlgoEnsemble callables. - for ensemble parameters, please check the documentation of the selected AlgoEnsemble callable.

Example

ensemble_runner = EnsembleRunner(data_src_cfg_name=data_src_cfg_name,
                                 work_dir=work_dir,
                                 ensemble_method_name=ensemble_method_name,
                                 mgpu=device_setting['n_devices']>1,
                                 **kwargs,
                                 **pred_params)
ensemble_runner.run(device_setting)

run(device_setting=None)[source]#

Parameters:: device_setting – device related settings, should follow the device_setting in auto_runner.set_device_info. ‘CUDA_VISIBLE_DEVICES’ should be a string e.g. ‘0,1,2,3’

set_ensemble_method(ensemble_method_name='AlgoEnsembleBestByFold', **kwargs)[source]#

Set the bundle ensemble method

Parameters:

ensemble_method_name (str) – the name of the ensemble method. Only two methods are supported “AlgoEnsembleBestN” and “AlgoEnsembleBestByFold”.
kwargs (Any) – the keyword arguments used to define the ensemble method. Currently only n_best for AlgoEnsembleBestN is supported.

Return type:

None

set_image_save_transform(**kwargs)[source]#

Set the ensemble output transform.

Parameters:: kwargs (Any) – image writing parameters for the ensemble inference. The kwargs format follows SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage .
Return type:: None

set_num_fold(num_fold=5)[source]#

Set the number of cross validation folds for all algos.

Parameters:: num_fold (int) – a positive integer to define the number of folds.
Return type:: None

class monai.apps.auto3dseg.NNIGen(algo=None, params=None)[source]#

Generate algorithms for the NNI to automate hyperparameter tuning. The module has two major interfaces: __init__, which prints out how to set up the NNI, and a trialCommand function run_algo for the NNI library to start the trial of the algo. More about the trialCommand function can be found in the trial code section of the NNI webpage https://nni.readthedocs.io/en/latest/tutorials/hpo_quickstart_pytorch/main.html .

Parameters:

algo – an Algo object (e.g. BundleAlgo) with defined methods: get_output_path and train and supports saving to and loading from pickle files via algo_from_pickle and algo_to_pickle.
params – a set of parameters to override the algo if override is supported by the Algo subclass.

Examples:

The experiment will keep generating new folders to save the model checkpoints, scripts, and configs if available.
├── algorithm_templates
│   └── unet
├── unet_0
│   ├── algo_object.pkl
│   ├── configs
│   └── scripts
├── unet_0_learning_rate_0.01
│   ├── algo_object.pkl
│   ├── configs
│   ├── model_fold0
│   └── scripts
└── unet_0_learning_rate_0.1
    ├── algo_object.pkl
    ├── configs
    ├── model_fold0
    └── scripts

from monai.apps.auto3dseg import NNIGen, import_bundle_algo_history
from monai.utils.enums import AlgoKeys

# Bundle Algorithms are already generated by BundleGen in work_dir
history = import_bundle_algo_history(work_dir, only_trained=False)
algo_dict = history[0]  # pick the first algorithm
algo_name = algo_dict[AlgoKeys.ID]
onealgo = algo_dict[AlgoKeys.ALGO]
nni_gen = NNIGen(algo=onealgo)
nni_gen.print_bundle_algo_instruction()

Notes

The NNIGen will prepare the algorithms in a folder and suggest a command to replace trialCommand in the experiment config. However, NNIGen will not trigger NNI. User needs to write their NNI experiment configs, and then run the NNI command manually.

generate(output_folder='.')[source]#

Generate the record for each Algo. If it is a BundleAlgo, it will generate the config files.

Parameters:: output_folder (str) – the directory nni will save the results to.
Return type:: None

get_hyperparameters()[source]#: Get parameter for next round of training from NNI server.

get_obj_filename()[source]#: Return the filename of the dumped pickle algo object.

get_task_id()[source]#: Get the identifier of the current experiment. In the format of listing the searching parameter name and values connected by underscore in the file name.
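
The identifier format can be illustrated with a few lines of Python. The helper make_task_id is hypothetical and only reproduces the naming visible in the folder tree above (for example unet_0_learning_rate_0.01); the library's own formatting of values may differ.

```python
def make_task_id(params):
    """Join parameter names and values with underscores, e.g. "learning_rate_0.01"."""
    return "_".join(f"{name}_{value}" for name, value in params.items())


task_id = make_task_id({"learning_rate": 0.01})
folder_name = f"unet_0_{task_id}"
```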

print_bundle_algo_instruction()[source]#: Print how to write the trial commands for Bundle Algo.

run_algo(obj_filename, output_folder='.', template_path=None)[source]#

The python interface for NNI to run.

Parameters:

obj_filename – the pickle-exported Algo object.
output_folder – the root path of the algorithms templates.
template_path – the algorithm_template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py

set_score(acc)[source]#: Report the acc to NNI server.

update_params(params)[source]#

Translate the parameter from monai bundle to meet NNI requirements.

Parameters:: params (dict) – a dict of parameters.
Return type:: None

class monai.apps.auto3dseg.OptunaGen(algo=None, params=None)[source]#

Generate algorithms for Optuna to automate hyperparameter tuning. Please refer to NNI and Optuna (https://optuna.readthedocs.io/en/stable/) for more information. Optuna has a different running scheme compared to NNI. The hyperparameter samples come from a trial object (trial.suggest…) created by Optuna, so OptunaGen needs to accept this trial object as input. Meanwhile, Optuna calls OptunaGen, thus OptunaGen.__call__() should return the accuracy. Use functools.partial to wrap OptunaGen for additional input arguments.

Parameters:

algo – an Algo object (e.g. BundleAlgo). The object must at least define two methods: get_output_path and train and supports saving to and loading from pickle files via algo_from_pickle and algo_to_pickle.
params – a set of parameters to override the algo if override is supported by the Algo subclass.

Examples:

The experiment will keep generating new folders to save the model checkpoints, scripts, and configs if available.
├── algorithm_templates
│   └── unet
├── unet_0
│   ├── algo_object.pkl
│   ├── configs
│   └── scripts
├── unet_0_learning_rate_0.01
│   ├── algo_object.pkl
│   ├── configs
│   ├── model_fold0
│   └── scripts
└── unet_0_learning_rate_0.1
    ├── algo_object.pkl
    ├── configs
    ├── model_fold0
    └── scripts

Notes

Unlike NNI and NNIGen, OptunaGen and Optuna can be run within the Python process.
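
The functools.partial pattern mentioned in the class description can be sketched without Optuna installed. FakeTrial and objective are hypothetical stand-ins: FakeTrial mimics only the suggest_float method of an Optuna trial, and objective plays the role of a callable that takes the trial first and extra arguments after it.

```python
from functools import partial


class FakeTrial:
    """Hypothetical stand-in for an Optuna trial; it returns the middle of the range."""

    def suggest_float(self, name, low, high):
        return (low + high) / 2


def objective(trial, obj_filename, output_folder="."):
    """Stand-in objective: takes the trial first, extra arguments after."""
    learning_rate = trial.suggest_float("learning_rate", 0.001, 0.1)
    # a real objective would train the algo stored in obj_filename and return its accuracy
    return 1.0 - learning_rate


# bind the extra arguments so that only the trial remains to be supplied
wrapped = partial(objective, obj_filename="algo_object.pkl", output_folder="./work_dir")
score = wrapped(FakeTrial())
```

With Optuna itself, a callable wrapped this way is what would be handed to a study's optimize method, since Optuna passes only the trial object.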

__call__(trial, obj_filename, output_folder='.', template_path=None)[source]#

Callable that Optuna will use to optimize the hyper-parameters

Parameters:

obj_filename – the pickle-exported Algo object.
output_folder – the root path of the algorithms templates.
template_path – the algorithm_template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py

generate(output_folder='.')[source]#

Generate the record for each Algo. If it is a BundleAlgo, it will generate the config files.

Parameters:: output_folder (str) – the directory nni will save the results to.
Return type:: None

get_hyperparameters()[source]#: Get parameter for next round of training from optuna trial object. This function requires user rewrite during usage for different search space.

get_obj_filename()[source]#: Return the dumped pickle object of algo.

get_task_id()[source]#: Get the identifier of the current experiment. In the format of listing the searching parameter name and values connected by underscore in the file name.

run_algo(obj_filename, output_folder='.', template_path=None)[source]#

The python interface for NNI to run.

Parameters:

obj_filename – the pickle-exported Algo object.
output_folder – the root path of the algorithms templates.
template_path – the algorithm_template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py

set_score(acc)[source]#: Set the accuracy score

set_trial(trial)[source]#: Set the Optuna trial

update_params(params)[source]#

Translate the parameter from monai bundle.

Parameters:: params (dict) – a dict of parameters.
Return type:: None

monai.apps.auto3dseg.export_bundle_algo_history(history)[source]#

Save all the BundleAlgo in the history to algo_object.pkl in each individual folder

Parameters:: history (list[dict[str, BundleAlgo]]) – a List of Bundle. Typically, the history can be obtained from BundleGen get_history method
Return type:: None

monai.apps.auto3dseg.get_name_from_algo_id(id)[source]#

Get the name of Algo from the identifier of the Algo.

Parameters:: id (str) – identifier which follows a convention of “name_fold_other”.
Return type:: str
Returns:: name of the Algo.
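
Given the “name_fold_other” convention, extracting the name amounts to taking the text before the first underscore. The sketch below assumes exactly that; name_from_algo_id is a hypothetical re-implementation for illustration, not a copy of the library function.

```python
def name_from_algo_id(algo_id):
    """Return the part of the identifier before the first underscore."""
    return algo_id.split("_")[0]


names = [name_from_algo_id(i) for i in ("segresnet_0", "segresnet2d_0", "swinunetr_3_extra")]
```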

monai.apps.auto3dseg.import_bundle_algo_history(output_folder='.', template_path=None, only_trained=True)[source]#

Import the history of the BundleAlgo objects as a list of algo dicts. Each algo_dict has the keys name (folder name), algo (BundleAlgo), and is_trained (bool).

Parameters:

output_folder – the root path of the algorithms templates.
template_path – the algorithm_template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py.
only_trained – only read the algo history if the algo is trained.

nnUNet#

class monai.apps.nnunet.nnUNetV2Runner(input_config, trainer_class_name='nnUNetTrainer', work_dir='work_dir', export_validation_probabilities=True)[source]#

nnUNetV2Runner provides an interface in MONAI to use nnU-Net V2 library to analyze, train, and evaluate neural networks for medical image segmentation tasks. A version of nnunetv2 higher than 2.2 is needed for this class.

nnUNetV2Runner can be used in two ways:

with one line of code to execute the complete pipeline.
with a series of commands to run each module in the pipeline.

The output of the interface is a directory that contains:

converted dataset that meets the requirements of nnU-Net V2
data analysis results
checkpoints from the trained U-Net models
validation accuracy in each fold of cross-validation
the predictions on the testing datasets from the final algorithm ensemble and potential post-processing

Parameters:

input_config (Any) –
the configuration dictionary or the file path to the configuration in the form of YAML.

The keys required in the configuration are:
- "datalist": File path to the datalist for the train/testing splits
- "dataroot": File path to the dataset
- "modality": Imaging modality, e.g. “CT”, [“T2”, “ADC”]

Currently, the configuration supports these optional keys:
- "nnunet_raw": File path that will be written to env variable for nnU-Net
- "nnunet_preprocessed": File path that will be written to env variable for nnU-Net
- "nnunet_results": File path that will be written to env variable for nnU-Net
- "nnUNet_trained_models"
- "dataset_name_or_id": Name or Integer ID of the dataset

If an optional key is not specified, then the pipeline will use the default values.

trainer_class_name (str) – the trainer class names offered by nnUNetV2 exhibit variations in training duration. Default: “nnUNetTrainer”. Other options: “nnUNetTrainer_Xepoch”. X could be one of 1, 5, 10, 20, 50, 100, 250, 2000, 4000, 8000.
export_validation_probabilities (bool) – True to save softmax predictions from final validation as npz files (in addition to predicted segmentations). Needed for finding the best ensemble. Default: True.
work_dir (str) – working directory to save the intermediate and final results.
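
As a rough illustration of the input_config layout, the sketch below builds a configuration dictionary and checks it for the three required keys. The helper `missing_required_keys` is hypothetical, and the paths and dataset ID are placeholders rather than values from the library.

```python
REQUIRED_KEYS = ("datalist", "dataroot", "modality")


def missing_required_keys(config: dict) -> list[str]:
    """Return the required keys that are absent from the configuration."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder paths; replace with real locations.
input_config = {
    "datalist": "./msd_task09_spleen_folds.json",
    "dataroot": "/workspace/data/Task09_Spleen",
    "modality": "CT",
    "dataset_name_or_id": 9,  # optional key
}

print(missing_required_keys(input_config))
print(missing_required_keys({"modality": "CT"}))
```

A dictionary of this shape could be passed as input_config directly, or written to a YAML file and referenced by path.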

Examples

Use the one-liner to start the nnU-Net workflow

python -m monai.apps.nnunet nnUNetV2Runner run --input_config ./input.yaml

Use convert_dataset to prepare the data to meet nnU-Net requirements, generate dataset JSON file,
and copy the dataset to a location specified by nnunet_raw in the input config file

python -m monai.apps.nnunet nnUNetV2Runner convert_dataset --input_config="./input.yaml"

convert_msd_dataset is an alternative option to prepare the data if the dataset is MSD.

python -m monai.apps.nnunet nnUNetV2Runner convert_msd_dataset \
    --input_config "./input.yaml" --data_dir "/path/to/Task09_Spleen"

experiment planning and data pre-processing

python -m monai.apps.nnunet nnUNetV2Runner plan_and_process --input_config "./input.yaml"

training all 20 models using all GPUs available.
“CUDA_VISIBLE_DEVICES” environment variable is not supported.

python -m monai.apps.nnunet nnUNetV2Runner train --input_config "./input.yaml"

training a single model on a single GPU for 5 epochs. Here config is used to specify the configuration.

python -m monai.apps.nnunet nnUNetV2Runner train_single_model --input_config "./input.yaml" \
    --config "3d_fullres" \
    --fold 0 \
    --gpu_id 0 \
    --trainer_class_name "nnUNetTrainer_5epochs" \
    --export_validation_probabilities True

training for all 20 models (4 configurations by 5 folds) on 2 GPUs

python -m monai.apps.nnunet nnUNetV2Runner train --input_config "./input.yaml" --gpu_id_for_all "0,1"

5-fold training for a single model on 2 GPUs. Here configs is used to specify the configurations.

python -m monai.apps.nnunet nnUNetV2Runner train --input_config "./input.yaml" \
    --configs "3d_fullres" \
    --trainer_class_name "nnUNetTrainer_5epochs" \
    --export_validation_probabilities True \
    --gpu_id_for_all "0,1"

find the best configuration

python -m monai.apps.nnunet nnUNetV2Runner find_best_configuration --input_config "./input.yaml"

predict, ensemble, and post-process

python -m monai.apps.nnunet nnUNetV2Runner predict_ensemble_postprocessing --input_config "./input.yaml"
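
The command lines above can also be assembled programmatically, for example before handing them to subprocess.run. This is a sketch only: `build_runner_command` is a hypothetical helper, and it simply mirrors the `--flag value` pattern used in the examples without checking that a flag is valid for the chosen method.

```python
import shlex
import sys


def build_runner_command(method: str, **flags: object) -> list[str]:
    """Assemble `python -m monai.apps.nnunet nnUNetV2Runner <method> --flag value ...`."""
    cmd = [sys.executable, "-m", "monai.apps.nnunet", "nnUNetV2Runner", method]
    for name, value in flags.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_runner_command(
    "train_single_model",
    input_config="./input.yaml",
    config="3d_fullres",
    fold=0,
    gpu_id=0,
)
# Print the command instead of running it; pass `cmd` to subprocess.run to execute.
print(shlex.join(cmd))
```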

convert_dataset()[source]#: Convert and make a copy of the dataset to meet the requirements of the nnU-Net workflow.

convert_msd_dataset(data_dir, overwrite_id=None, n_proc=-1)[source]#

Convert and make a copy of the MSD dataset to meet the requirements of the nnU-Net workflow.

Parameters:

data_dir – downloaded and extracted MSD dataset folder. CANNOT be nnUNetv1 dataset! Example: “/workspace/downloads/Task05_Prostate”.
overwrite_id – Overwrite the dataset id. If not set then use the id of the MSD task (inferred from the folder name). Only use this if you already have an equivalently numbered dataset!
n_proc – Number of processes used.
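
When overwrite_id is not set, the dataset id is inferred from the folder name. One way this inference could look is sketched below, assuming MSD folders are named “TaskNN_Name”. The helper `msd_task_id` is hypothetical and is not part of the runner.

```python
import re
from pathlib import PurePosixPath


def msd_task_id(data_dir: str) -> int:
    """Extract the numeric task id from a folder named like "Task05_Prostate"."""
    folder = PurePosixPath(data_dir).name
    match = re.fullmatch(r"Task(\d+)_.+", folder)
    if match is None:
        raise ValueError(f"not an MSD-style folder name: {folder!r}")
    return int(match.group(1))


print(msd_task_id("/workspace/downloads/Task05_Prostate"))
```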

extract_fingerprints(fpe='DatasetFingerprintExtractor', npfp=-1, verify_dataset_integrity=False, clean=False, verbose=False)[source]#

Extracts the dataset fingerprint used for experiment planning.

Parameters:

fpe (str) – [OPTIONAL] Name of the Dataset Fingerprint Extractor class that should be used. Default is “DatasetFingerprintExtractor”.
npfp (int) – [OPTIONAL] Number of processes used for fingerprint extraction.
verify_dataset_integrity (bool) – [RECOMMENDED] set this flag to check the dataset integrity. This is useful and should be done once for each dataset!
clean (bool) – [OPTIONAL] Set this flag to overwrite existing fingerprints. If this flag is not set and a fingerprint already exists, the fingerprint extractor will not run.
verbose (bool) – set this to print a lot of stuff. Useful for debugging. Will disable progress bar! Recommended for cluster environments.

Return type:

None

find_best_configuration(plans='nnUNetPlans', configs=(2d, 3d_fullres, 3d_lowres, 3d_cascade_fullres), trainers=None, allow_ensembling=True, num_processes=-1, overwrite=True, folds=(0, 1, 2, 3, 4), strict=False)[source]#

Find the best model configurations.

Parameters:

plans – list of plan identifiers. Default: nnUNetPlans.
configs – list of configurations. Default: [“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”].
trainers – list of trainers. Default: nnUNetTrainer.
allow_ensembling – set this flag to enable ensembling.
num_processes – number of processes to use for ensembling, postprocessing, etc.
overwrite – if set, already ensembled files etc. will be overwritten. Disabling this may speed up consecutive runs of this command (not recommended) at the risk of not updating outdated results.
folds – folds to use. Default: (0, 1, 2, 3, 4).
strict – a switch that triggers RuntimeError if the logging folder cannot be found. Default: False.

plan_and_process(fpe='DatasetFingerprintExtractor', npfp=8, verify_dataset_integrity=False, no_pp=False, clean=False, pl='ExperimentPlanner', gpu_memory_target=8, preprocessor_name='DefaultPreprocessor', overwrite_target_spacing=None, overwrite_plans_name='nnUNetPlans', c=(2d, 3d_fullres, 3d_lowres), n_proc=(8, 8, 8), verbose=False)[source]#

Performs experiment planning and preprocessing before the training.

Parameters:

fpe (str) – [OPTIONAL] Name of the Dataset Fingerprint Extractor class that should be used. Default is “DatasetFingerprintExtractor”.
npfp (int) – [OPTIONAL] Number of processes used for fingerprint extraction. Default: 8.
verify_dataset_integrity (bool) – [RECOMMENDED] set this flag to check the dataset integrity. This is useful and should be done once for each dataset!
no_pp (bool) – [OPTIONAL] Set this to only run fingerprint extraction and experiment planning (no preprocessing). Useful for debugging.
clean (bool) – [OPTIONAL] Set this flag to overwrite existing fingerprints. If this flag is not set and a fingerprint already exists, the fingerprint extractor will not run. REQUIRED IF YOU CHANGE THE DATASET FINGERPRINT EXTRACTOR OR MAKE CHANGES TO THE DATASET!
pl (str) – [OPTIONAL] Name of the Experiment Planner class that should be used. Default is “ExperimentPlanner”. Note: There is no longer a distinction between 2d and 3d planner. It’s an all-in-one solution now.
gpu_memory_target (int) – [OPTIONAL] DANGER ZONE! Sets a custom GPU memory target. Default: 8 [GB]. Changing this will affect patch and batch size and will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
preprocessor_name (str) – [OPTIONAL] DANGER ZONE! Sets a custom preprocessor class. This class must be located in nnunetv2.preprocessing. Default: “DefaultPreprocessor”. Changing this may affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
overwrite_target_spacing (Optional[Any]) – [OPTIONAL] DANGER ZONE! Sets a custom target spacing for the 3d_fullres and 3d_cascade_fullres configurations. Default: None [no changes]. Changing this will affect image size and potentially patch and batch size. This will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline). Changing the target spacing for the other configurations is currently not implemented. New target spacing must be a list of three numbers!
overwrite_plans_name (str) – [OPTIONAL] USE A CUSTOM PLANS IDENTIFIER. If you used -gpu_memory_target, -preprocessor_name or -overwrite_target_spacing it is best practice to use -overwrite_plans_name to generate a differently named plans file such that the nnunet default plans are not overwritten. You will then need to specify your custom plans file with -p whenever running other nnunet commands (training, inference, etc)
c (tuple) – [OPTIONAL] Configurations for which the preprocessing should be run. Default: 2d 3d_fullres 3d_lowres. 3d_cascade_fullres does not need to be specified because it uses the data from 3d_fullres. Configurations that do not exist for some datasets will be skipped.
n_proc (tuple) – [OPTIONAL] Use this to define how many processes are to be used. If this is just one number then this number of processes is used for all configurations specified with -c. If it’s a list of numbers this list must have as many elements as there are configurations. We then iterate over zip(configs, num_processes) to determine the number of processes used for each configuration. More processes are always faster (up to the number of threads your PC can support, so 8 for a 4-core CPU with hyperthreading. If you don’t know what that is then don’t touch it, or at least don’t increase it!). DANGER: More often than not the number of processes that can be used is limited by the amount of RAM available. Image resampling takes up a lot of RAM. MONITOR RAM USAGE AND DECREASE -n_proc IF YOUR RAM FILLS UP TOO MUCH! Default: 8 4 8 (=8 processes for 2d, 4 for 3d_fullres and 8 for 3d_lowres if -c is at its default).
verbose (bool) – Set this to print a lot of stuff. Useful for debugging. Will disable progress bar! (Recommended for cluster environments).

Return type:

None
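
The n_proc rule described above (one number for all configurations, or one number per configuration paired through zip) can be sketched as follows. The helper `processes_per_config` is hypothetical and only illustrates the pairing; it is not the library's code.

```python
from typing import Sequence, Union


def processes_per_config(
    configs: Sequence[str], n_proc: Union[int, Sequence[int]]
) -> dict[str, int]:
    """Pair each configuration with its process count."""
    if isinstance(n_proc, int):
        # A single number applies to every configuration.
        n_proc = [n_proc] * len(configs)
    if len(n_proc) != len(configs):
        raise ValueError("n_proc must have one entry per configuration")
    return dict(zip(configs, n_proc))


print(processes_per_config(("2d", "3d_fullres", "3d_lowres"), (8, 4, 8)))
print(processes_per_config(("2d", "3d_fullres"), 4))
```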

plan_experiments(pl='ExperimentPlanner', gpu_memory_target=8, preprocessor_name='DefaultPreprocessor', overwrite_target_spacing=None, overwrite_plans_name='nnUNetPlans')[source]#

Generate a configuration file that specifies the details of the experiment.

Parameters:

pl (str) – [OPTIONAL] Name of the Experiment Planner class that should be used. Default is “ExperimentPlanner”. Note: There is no longer a distinction between 2d and 3d planner. It’s an all-in-one solution now.
gpu_memory_target (float) – [OPTIONAL] DANGER ZONE! Sets a custom GPU memory target. Default: 8 [GB]. Changing this will affect patch and batch size and will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
preprocessor_name (str) – [OPTIONAL] DANGER ZONE! Sets a custom preprocessor class. This class must be located in nnunetv2.preprocessing. Default: “DefaultPreprocessor”. Changing this may affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
overwrite_target_spacing (Optional[Any]) – [OPTIONAL] DANGER ZONE! Sets a custom target spacing for the 3d_fullres and 3d_cascade_fullres configurations. Default: None [no changes]. Changing this will affect image size and potentially patch and batch size. This will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline). Changing the target spacing for the other configurations is currently not implemented. New target spacing must be a list of three numbers!
overwrite_plans_name (str) – [OPTIONAL] DANGER ZONE! If you used -gpu_memory_target, -preprocessor_name or -overwrite_target_spacing it is best practice to use -overwrite_plans_name to generate a differently named plans file such that the nnunet default plans are not overwritten. You will then need to specify your custom plan.

Return type:

None

predict(list_of_lists_or_source_folder, output_folder, model_training_output_dir, use_folds=None, tile_step_size=0.5, use_gaussian=True, use_mirroring=True, perform_everything_on_gpu=True, verbose=True, save_probabilities=False, overwrite=True, checkpoint_name='checkpoint_final.pth', folder_with_segs_from_prev_stage=None, num_parts=1, part_id=0, num_processes_preprocessing=-1, num_processes_segmentation_export=-1, gpu_id=0)[source]#

Use this to run inference with nnU-Net. This function is used when you want to manually specify a folder containing a trained nnU-Net model. This is useful when the nnunet environment variables (nnUNet_results) are not set.

Parameters:

list_of_lists_or_source_folder – input folder. Remember to use the correct channel numberings for your files (_0000 etc). File endings must be the same as the training dataset!
output_folder – Output folder. If it does not exist it will be created. Predicted segmentations will have the same name as their source images.
model_training_output_dir – folder in which the trained model is. Must have subfolders fold_X for the different folds you trained.
use_folds – specify the folds of the trained model that should be used for prediction. Default: (0, 1, 2, 3, 4).
tile_step_size – step size for sliding window prediction. The larger it is the faster but less accurate the prediction. Default: 0.5. Cannot be larger than 1. We recommend the default.
use_gaussian – use Gaussian smoothing as test-time augmentation.
use_mirroring – use mirroring/flipping as test-time augmentation.
verbose – set this if you like being talked to. You will have to be a good listener/reader.
save_probabilities – set this to export predicted class “probabilities”. Required if you want to ensemble multiple configurations.
overwrite – overwrite an existing previous prediction (will not overwrite existing files)
checkpoint_name – name of the checkpoint you want to use. Default: checkpoint_final.pth.
folder_with_segs_from_prev_stage – folder containing the predictions of the previous stage. Required for cascaded models.
num_parts – number of separate nnUNetv2_predict calls that you will be making. Default: 1 (= this one call predicts everything).
part_id – if multiple nnUNetv2_predict calls exist, which one is this? IDs start at 0 and end at num_parts - 1. So when you submit 5 nnUNetv2_predict calls you need to set -num_parts 5 and use -part_id 0, 1, 2, 3 and 4.
num_processes_preprocessing – Number of processes used for preprocessing. More is not always better. Beware of out-of-RAM issues.
num_processes_segmentation_export – Number of processes used for segmentation export. More is not always better. Beware of out-of-RAM issues.
gpu_id – which GPU to use for prediction.
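
To illustrate how num_parts and part_id divide the work between several prediction calls, the sketch below splits a case list with an interleaved slice. The interleaving scheme is an assumption, and `cases_for_part` and the case names are hypothetical.

```python
def cases_for_part(cases: list[str], num_parts: int, part_id: int) -> list[str]:
    """Return the subset of cases handled by one of `num_parts` prediction calls."""
    if not 0 <= part_id < num_parts:
        raise ValueError("part_id must be in [0, num_parts - 1]")
    # Assumption: an interleaved split; every case lands in exactly one part.
    return cases[part_id::num_parts]


cases = [f"case_{i:03d}" for i in range(7)]
for part_id in range(3):
    print(part_id, cases_for_part(cases, num_parts=3, part_id=part_id))
```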

predict_ensemble_postprocessing(folds=(0, 1, 2, 3, 4), run_ensemble=True, run_predict=True, run_postprocessing=True, **kwargs)[source]#

Run prediction, ensemble, and/or postprocessing optionally.

Parameters:

folds (tuple) – which folds to use
run_ensemble (bool) – whether to run ensembling.
run_predict (bool) – whether to predict using trained checkpoints
run_postprocessing (bool) – whether to conduct post-processing
kwargs (Any) – this optional parameter allows you to specify additional arguments defined in the predict method.

Return type:

None

preprocess(c=(2d, 3d_fullres, 3d_lowres), n_proc=(8, 8, 8), overwrite_plans_name='nnUNetPlans', verbose=False)[source]#

Apply a set of preprocessing operations to the input data before the training.

Parameters:

overwrite_plans_name (str) – [OPTIONAL] You can use this to specify a custom plans file that you may have generated.
c (tuple) – [OPTIONAL] Configurations for which the preprocessing should be run. Default: 2d 3d_fullres 3d_lowres. 3d_cascade_fullres does not need to be specified because it uses the data from 3d_fullres. Configurations that do not exist for some datasets will be skipped.
n_proc (tuple) – [OPTIONAL] Use this to define how many processes are to be used. If this is just one number then this number of processes is used for all configurations specified with -c. If it’s a list of numbers this list must have as many elements as there are configurations. We then iterate over zip(configs, num_processes) to determine the number of processes used for each configuration. More processes are always faster (up to the number of threads your PC can support, so 8 for a 4-core CPU with hyperthreading. If you don’t know what that is then don’t touch it, or at least don’t increase it!). DANGER: More often than not the number of processes that can be used is limited by the amount of RAM available. Image resampling takes up a lot of RAM. MONITOR RAM USAGE AND DECREASE -n_proc IF YOUR RAM FILLS UP TOO MUCH! Default: 8 4 8 (=8 processes for 2d, 4 for 3d_fullres and 8 for 3d_lowres if -c is at its default).
verbose (bool) – Set this to print a lot of stuff. Useful for debugging. Will disable the progress bar! Recommended for cluster environments.

Return type:

None

run(run_convert_dataset=True, run_plan_and_process=True, run_train=True, run_find_best_configuration=True, run_predict_ensemble_postprocessing=True)[source]#

Run the nnU-Net pipeline.

Parameters:

run_convert_dataset (bool) – whether to convert datasets, defaults to True.
run_plan_and_process (bool) – whether to preprocess and analyze the dataset, defaults to True.
run_train (bool) – whether to train models, defaults to True.
run_find_best_configuration (bool) – whether to find the best model (ensemble) configurations, defaults to True.
run_predict_ensemble_postprocessing (bool) – whether to make predictions on test datasets, defaults to True.

Return type:

None

train(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), gpu_id_for_all=None, **kwargs)[source]#

Run the training for all the models specified by the configurations. Note: to set the number of GPUs to use, use gpu_id_for_all instead of the CUDA_VISIBLE_DEVICES environment variable.

Parameters:

configs – configurations that should be trained. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
gpu_id_for_all – a tuple/list/integer of GPU device ID(s) to use for the training. Default: None (all available GPUs).
kwargs – this optional parameter allows you to specify additional arguments defined in the train_single_model method.
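
The “20 models” mentioned in the examples come from 4 configurations times 5 folds. The sketch below enumerates those jobs and spreads them over the given GPU IDs in round-robin order. The round-robin policy and the helper `assign_jobs` are assumptions for illustration; the actual scheduling (for instance, any ordering between 3d_lowres and 3d_cascade_fullres) may differ.

```python
from itertools import product

CONFIGS = ("2d", "3d_fullres", "3d_lowres", "3d_cascade_fullres")
FOLDS = (0, 1, 2, 3, 4)


def assign_jobs(gpu_ids: list[int]) -> dict[int, list[tuple[str, int]]]:
    """Distribute every (configuration, fold) pair over the GPUs in round-robin order."""
    jobs = list(product(CONFIGS, FOLDS))
    assignment: dict[int, list[tuple[str, int]]] = {gpu: [] for gpu in gpu_ids}
    for index, job in enumerate(jobs):
        assignment[gpu_ids[index % len(gpu_ids)]].append(job)
    return assignment


assignment = assign_jobs([0, 1])
print(sum(len(jobs) for jobs in assignment.values()))  # total number of models
print(assignment[0][:3])
```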

train_parallel(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), gpu_id_for_all=None, **kwargs)[source]#

Create the command line for a subprocess call for parallel training. Note: to set the number of GPUs to use, use gpu_id_for_all instead of the CUDA_VISIBLE_DEVICES environment variable.

Parameters:

configs – configurations that should be trained. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
gpu_id_for_all – a tuple/list/integer of GPU device ID(s) to use for the training. Default: None (all available GPUs).
kwargs – this optional parameter allows you to specify additional arguments defined in the train_single_model method.

train_parallel_cmd(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), gpu_id_for_all=None, **kwargs)[source]#

Create the command line for a subprocess call for parallel training.

Parameters:

configs – configurations that should be trained. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
gpu_id_for_all – a tuple/list/integer of GPU device ID(s) to use for the training. Default: None (all available GPUs).
kwargs – this optional parameter allows you to specify additional arguments defined in the train_single_model method.

train_single_model(config, fold, gpu_id=0, **kwargs)[source]#

Run the training on a single GPU with one specified configuration provided. Note: this will override the environment variable CUDA_VISIBLE_DEVICES.

Parameters:

config – configuration that should be trained. Examples: “2d”, “3d_fullres”, “3d_lowres”.
fold – fold of the 5-fold cross-validation. Should be an int between 0 and 4.
gpu_id – an integer to select the device to use, or a tuple/list of GPU device indices used for multi-GPU training (e.g., (0,1)). Default: 0.
kwargs –
this optional parameter allows you to specify additional arguments in nnunetv2.run.run_training.run_training_entry.

Currently supported args are:
- p: custom plans identifier. Default: “nnUNetPlans”.
- pretrained_weights: path to nnU-Net checkpoint file to be used as pretrained model. Will only be
  used when actually training. Beta. Use with caution. Default: False.
- use_compressed: True to use compressed data for training. Reading compressed data is much
  more CPU and (potentially) RAM intensive and should only be used if you know what you are doing. Default: False.
- c: continue training from latest checkpoint. Default: False.
- val: True to run the validation only. Requires training to have finished.
  Default: False.
- disable_checkpointing: True to disable checkpointing. Ideal when you are testing things out and
  don’t want to flood your hard drive with checkpoints. Default: False.
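
The supported keyword arguments and their documented defaults can be collected in one place, as in the sketch below. `resolve_training_kwargs` is a hypothetical helper; rejecting unknown names is an assumption made for this example and may not match how the runner treats extra arguments.

```python
SUPPORTED_DEFAULTS = {
    "p": "nnUNetPlans",
    "pretrained_weights": False,
    "use_compressed": False,
    "c": False,
    "val": False,
    "disable_checkpointing": False,
}


def resolve_training_kwargs(**kwargs: object) -> dict[str, object]:
    """Merge user kwargs over the documented defaults and reject unknown names."""
    unknown = sorted(set(kwargs) - set(SUPPORTED_DEFAULTS))
    if unknown:
        raise TypeError(f"unsupported arguments: {unknown}")
    return {**SUPPORTED_DEFAULTS, **kwargs}


options = resolve_training_kwargs(c=True, disable_checkpointing=True)
print(options["c"], options["p"])
```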

validate(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), **kwargs)[source]#

Perform validation on all models defined by the configurations over 5 folds.

Parameters:

configs (tuple) – configurations that should be validated. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
kwargs (Any) – this optional parameter allows you to specify additional arguments defined in the train_single_model method.

Return type:

None

validate_single_model(config, fold, **kwargs)[source]#

Perform validation on a single model.

Parameters:

config (str) – configuration that should be validated.
fold (int) – fold of the 5-fold cross-validation. Should be an int between 0 and 4.
kwargs (Any) – this optional parameter allows you to specify additional arguments defined in the train_single_model method.

Return type:

None