Applications#

Datasets#

class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#

The Dataset to automatically download MedNIST data and generate items for training, validation or test. It’s based on CacheDataset to accelerate the training process.

Parameters
  • root_dir (Union[str, PathLike]) – target directory to download and load MedNIST dataset.

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data.

  • download (bool) – whether to download and extract the MedNIST data from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. The user can manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.

  • seed (int) – random seed to randomly split training, validation and test datasets, default is 0.

  • val_frac (float) – percentage of validation fraction in the whole dataset, default is 0.1.

  • test_frac (float) – percentage of test fraction in the whole dataset, default is 0.1.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_workers (Optional[int]) – the number of worker threads to use. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when downloading dataset and computing the transform cache content.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, defaults to True. If the random transforms don’t modify the cached content (for example, randomly cropping from the cached image and deepcopying the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent logic.

Raises
  • ValueError – When root_dir is not a directory.

  • RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
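
Example (a minimal sketch; the transform pipeline is illustrative and assumes the default item keys "image" and "label"):

from monai.apps import MedNISTDataset
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ScaleIntensityd

transform = Compose(
    [
        LoadImaged(keys="image"),
        EnsureChannelFirstd(keys="image"),
        ScaleIntensityd(keys="image"),
    ]
)

train_data = MedNISTDataset(
    root_dir="./", section="training", transform=transform, download=True, seed=0
)

print(len(train_data), train_data.get_num_classes())
print(train_data[0]["image"].shape, train_data[0]["label"])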

get_num_classes()[source]#

Get number of classes.

Return type

int

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#

The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation or test. It also loads the dataset properties from the JSON config file of the dataset; the user can call get_properties() to get specified properties or all the loaded properties. It’s based on monai.data.CacheDataset to accelerate the training process.

Parameters
  • root_dir (Union[str, PathLike]) – user’s local directory for caching and loading the MSD datasets.

  • task (str) – which task to download and execute: one of list (“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”).

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. for further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D].

  • download (bool) – whether to download and extract the Decathlon data from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. The user can manually copy the tar file or dataset folder to the root directory.

  • val_frac (float) – percentage of validation fraction in the whole dataset, default is 0.2.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. note to set same seed for training and validation sections.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_workers (int) – the number of worker threads to use. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when downloading dataset and computing the transform cache content.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, defaults to True. If the random transforms don’t modify the cached content (for example, randomly cropping from the cached image and deepcopying the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent logic.

Raises
  • ValueError – When root_dir is not a directory.

  • ValueError – When task is not one of [“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”].

  • RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).

Example:

transform = Compose(
    [
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys=["image", "label"]),
        ScaleIntensityd(keys="image"),
        ToTensord(keys=["image", "label"]),
    ]
)

val_data = DecathlonDataset(
    root_dir="./", task="Task09_Spleen", transform=transform, section="validation", seed=12345, download=True
)

print(val_data[0]["image"], val_data[0]["label"])
get_indices()[source]#

Get the indices of datalist used in this dataset.

Return type

ndarray

get_properties(keys=None)[source]#

Get the loaded properties of dataset with specified keys. If no keys specified, return all the loaded properties.
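
For example, continuing the val_data example above (a hedged sketch; the available property names depend on the task's dataset.json, e.g. "labels" and "modality"):

properties = val_data.get_properties(keys=["labels", "modality"])
print(properties)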

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.TciaDataset(root_dir, collection, section, transform=(), download=False, download_len=-1, seg_type='SEG', modality_tag=(8, 96), ref_series_uid_tag=(32, 14), ref_sop_uid_tag=(8, 4437), specific_tags=((8, 4373), (8, 4416), (12294, 16), (32, 13), (16, 16), (16, 32), (32, 17), (32, 18)), seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=0.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#

The Dataset to automatically download the data from a public The Cancer Imaging Archive (TCIA) dataset and generate items for training, validation or test.

The Highdicom library is used to load DICOM data with modality “SEG”, but only a subset of collections is supported, such as: “C4KC-KiTS”, “NSCLC-Radiomics”, “NSCLC-Radiomics-Interobserver1”, “QIN-PROSTATE-Repeatability” and “PROSTATEx”. Therefore, if “seg” is included in the keys of the LoadImaged transform while loading some other collections, errors may be raised. For supported collections, the original “SEG” information may not always be consistent for each DICOM file, so to avoid creating labels in different formats, please use the label_dict argument of PydicomReader when calling the LoadImaged transform. The prepared label dicts of the collections mentioned above are also saved in monai.apps.tcia.TCIA_LABEL_DICT. You can also refer to the second example below.

This class is based on monai.data.CacheDataset to accelerate the training process.

Parameters
  • root_dir (Union[str, PathLike]) – user’s local directory for caching and loading the TCIA dataset.

  • collection (str) – name of a TCIA collection. a TCIA dataset is defined as a collection. Please check the following list to browse the collection list (only public collections can be downloaded): https://www.cancerimagingarchive.net/collections/

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D]. If not specified, LoadImaged(reader=”PydicomReader”, keys=[“image”]) will be used as the default transform. In addition, we suggest setting the argument labels for PydicomReader if segmentations need to be loaded. The original labels for each DICOM series may differ; using this argument unifies the format of the labels.

  • download (bool) – whether to download and extract the dataset, default is False. If the expected file already exists, downloading is skipped even if this is set to True. The user can manually copy the tar file or dataset folder to the root directory.

  • download_len (int) – number of series that will be downloaded; the value should be larger than 0, or -1, where -1 means all series will be downloaded. Default is -1.

  • seg_type (str) – modality type of segmentation that is used to do the first step download. Default is “SEG”.

  • modality_tag (Tuple) – tag of modality. Default is (0x0008, 0x0060).

  • ref_series_uid_tag (Tuple) – tag of referenced Series Instance UID. Default is (0x0020, 0x000e).

  • ref_sop_uid_tag (Tuple) – tag of referenced SOP Instance UID. Default is (0x0008, 0x1155).

  • specific_tags (Tuple) – tags that will be loaded for “SEG” series. This argument will be used in monai.data.PydicomReader. Default is [(0x0008, 0x1115), (0x0008,0x1140), (0x3006, 0x0010), (0x0020,0x000D), (0x0010,0x0010), (0x0010,0x0020), (0x0020,0x0011), (0x0020,0x0012)].

  • val_frac (float) – percentage of validation fraction in the whole dataset, default is 0.2.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. note to set same seed for training and validation sections.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – percentage of cached data in total, default is 0.0 (no cache). will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_workers (int) – the number of worker threads to use. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when downloading dataset and computing the transform cache content.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, defaults to True. If the random transforms don’t modify the cached content (for example, randomly cropping from the cached image and deepcopying the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent logic.

Example:

# collection is "Pancreatic-CT-CBCT-SEG", seg_type is "RTSTRUCT"
data = TciaDataset(
    root_dir="./", collection="Pancreatic-CT-CBCT-SEG", seg_type="RTSTRUCT", download=True
)

# collection is "C4KC-KiTS", seg_type is "SEG", and load both images and segmentations
from monai.apps.tcia import TCIA_LABEL_DICT
transform = Compose(
    [
        LoadImaged(reader="PydicomReader", keys=["image", "seg"], label_dict=TCIA_LABEL_DICT["C4KC-KiTS"]),
        EnsureChannelFirstd(keys=["image", "seg"]),
        ResampleToMatchd(keys="image", key_dst="seg"),
    ]
)
data = TciaDataset(
    root_dir="./", collection="C4KC-KiTS", section="validation", seed=12345, download=True
)

print(data[0]["seg"].shape)
get_indices()[source]#

Get the indices of datalist used in this dataset.

Return type

ndarray

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)[source]#

Cross validation dataset based on the general dataset which must have _split_datalist API.

Parameters
  • dataset_cls – dataset class to be used to create the cross validation partitions. It must have _split_datalist API.

  • nfolds (int) – number of folds to split the data for cross validation.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into N folds, default is 0.

  • dataset_params – other additional parameters for the dataset_cls base class.

Example of 5 folds cross validation training:

cvdataset = CrossValidation(
    dataset_cls=DecathlonDataset,
    nfolds=5,
    seed=12345,
    root_dir="./",
    task="Task09_Spleen",
    section="training",
    transform=train_transform,
    download=True,
)
dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
dataset_fold0_val = cvdataset.get_dataset(folds=0, transform=val_transform, download=False)
# execute training for fold 0 ...

dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
dataset_fold1_val = cvdataset.get_dataset(folds=1, transform=val_transform, download=False)
# execute training for fold 1 ...

...

dataset_fold4_train = ...
# execute training for fold 4 ...
get_dataset(folds, **dataset_params)[source]#

Generate dataset based on the specified fold indices in the cross validation group.

Parameters
  • folds (Union[Sequence[int], int]) – index of folds for training or validation, if a list of values, concatenate the data.

  • dataset_params – other additional parameters for the dataset_cls base class, will override the same parameters in self.dataset_params.

Clara MMARs#

monai.apps.download_mmar(item, mmar_dir=None, progress=True, api=True, version=-1)[source]#

Download and extract Medical Model Archive (MMAR) from Nvidia Clara Train.

Parameters
  • item – the corresponding model item from MODEL_DESC, or, when api is True, a substring to query against NGC’s model name field.

  • mmar_dir (Union[str, PathLike, None]) – target directory to store the MMAR, default is mmars subfolder under torch.hub get_dir().

  • progress (bool) – whether to display a progress bar.

  • api (bool) – whether to query NGC and download via api

  • version (int) – which version of the MMAR to download. -1 means the latest version from NGC.

Examples:
>>> from monai.apps import download_mmar
>>> download_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".")
>>> download_mmar("prostate_mri_segmentation", mmar_dir=".", api=True)
Returns

The local directory of the downloaded model. If api is True, a list of local directories of downloaded models.

monai.apps.load_from_mmar(item, mmar_dir=None, progress=True, version=-1, map_location=None, pretrained=True, weights_only=False, model_key='model', api=True, model_file=None)[source]#

Download and extract Medical Model Archive (MMAR) model weights from Nvidia Clara Train.

Parameters
  • item – the corresponding model item from MODEL_DESC.

  • mmar_dir (Union[str, PathLike, None]) – target directory to store the MMAR, default is mmars subfolder under torch.hub get_dir().

  • progress (bool) – whether to display a progress bar when downloading the content.

  • version (int) – version number of the MMAR. Set it to -1 to use item[Keys.VERSION].

  • map_location – pytorch API parameter for torch.load or torch.jit.load.

  • pretrained – whether to load the pretrained weights after initializing a network module.

  • weights_only – whether to load only the weights instead of initializing the network module and assign weights.

  • model_key (str) – a key to search in the model file or config file for the model dictionary. Currently this function assumes that the model dictionary has {“[name|path]”: “test.module”, “args”: {‘kw’: ‘test’}}.

  • api (bool) – whether to query the NGC API to get model information.

  • model_file – the relative path to the model file within an MMAR.

Examples:
>>> from monai.apps import load_from_mmar
>>> unet_model = load_from_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".", map_location="cpu")
>>> print(unet_model)
monai.apps.MODEL_DESC#

The built-in tuple of model description dictionaries for the Clara Train MMARs known to MONAI. Each entry can be passed as the item argument of download_mmar or load_from_mmar.

Utilities#

monai.apps.check_hash(filepath, val=None, hash_type='md5')[source]#

Verify hash signature of specified file.

Parameters
  • filepath (Union[str, PathLike]) – path of source file to verify hash value.

  • val (Optional[str]) – expected hash value of the file.

  • hash_type (str) – type of hash algorithm to use, default is “md5”. The supported hash types are “md5”, “sha1”, “sha256”, “sha512”. See also: monai.apps.utils.SUPPORTED_HASH_TYPES.

Return type

bool
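
Example (a hedged sketch; the file path and expected hash value are hypothetical placeholders):

from monai.apps import check_hash

# the file path and expected MD5 value below are hypothetical placeholders
if not check_hash("./MedNIST.tar.gz", val="<expected-md5-hex>", hash_type="md5"):
    raise RuntimeError("hash check failed for ./MedNIST.tar.gz")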

monai.apps.download_url(url, filepath='', hash_val=None, hash_type='md5', progress=True, **gdown_kwargs)[source]#

Download a file from the specified URL link, with support for a progress bar and hash check.

Parameters
  • url (str) – source URL link to download file.

  • filepath (Union[str, PathLike]) – target filepath to save the downloaded file (including the filename). If undefined, os.path.basename(url) will be used.

  • hash_val (Optional[str]) – expected hash value to validate the downloaded file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

  • progress (bool) – whether to display a progress bar.

  • gdown_kwargs – other arguments for gdown, except for url, output and quiet. These arguments are only used when downloading from Google Drive. For details of the arguments, see: https://github.com/wkentaro/gdown/blob/main/gdown/download.py

Raises
  • RuntimeError – When the hash validation of the filepath existing file fails.

  • RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.

  • URLError – See urllib.request.urlretrieve.

  • HTTPError – See urllib.request.urlretrieve.

  • ContentTooShortError – See urllib.request.urlretrieve.

  • IOError – See urllib.request.urlretrieve.

  • RuntimeError – When the hash validation of the url downloaded file fails.

Return type

None
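
Example (a hedged sketch; the URL is a hypothetical placeholder):

from monai.apps import download_url

download_url(
    url="https://example.com/MedNIST.tar.gz",  # hypothetical URL
    filepath="./MedNIST.tar.gz",
    hash_val=None,  # skip hash validation in this sketch
)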

monai.apps.extractall(filepath, output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True)[source]#

Extract file to the output directory. Expected file types are: zip, tar.gz and tar.

Parameters
  • filepath (Union[str, PathLike]) – the file path of compressed file.

  • output_dir (Union[str, PathLike]) – target directory to save extracted files.

  • hash_val (Optional[str]) – expected hash value to validate the compressed file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

  • file_type (str) – string of file type for decompressing. Leave it empty to infer the type from the filepath basename.

  • has_base (bool) – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.

Raises
  • RuntimeError – When the hash validation of the filepath compressed file fails.

  • NotImplementedError – When the filepath file extension is not one of [“zip”, “tar.gz”, “tar”].

Return type

None
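
Example (a hedged sketch; the archive path is a hypothetical placeholder):

from monai.apps import extractall

extractall(filepath="./MedNIST.tar.gz", output_dir="./data", file_type="tar.gz")  # hypothetical archive path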

monai.apps.download_and_extract(url, filepath='', output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True, progress=True)[source]#

Download file from URL and extract it to the output directory.

Parameters
  • url (str) – source URL link to download file.

  • filepath (Union[str, PathLike]) – the file path of the downloaded compressed file. use this option to keep the directly downloaded compressed file, to avoid further repeated downloads.

  • output_dir (Union[str, PathLike]) – target directory to save extracted files. default is the current directory.

  • hash_val (Optional[str]) – expected hash value to validate the downloaded file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

  • file_type (str) – string of file type for decompressing. Leave it empty to infer the type from url’s base file name.

  • has_base (bool) – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.

  • progress (bool) – whether to display progress bar.

Return type

None
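
Example (a hedged sketch; the URL is a hypothetical placeholder):

from monai.apps import download_and_extract

download_and_extract(
    url="https://example.com/MedNIST.tar.gz",  # hypothetical URL
    filepath="./MedNIST.tar.gz",  # keep the downloaded archive to avoid repeated downloads
    output_dir="./data",
)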

Deepgrow#

monai.apps.deepgrow.dataset.create_dataset(datalist, output_dir, dimension, pixdim, image_key='image', label_key='label', base_dir=None, limit=0, relative_path=False, transforms=None)[source]#

Utility to pre-process an existing datalist and create a dataset list for Deepgrow training. The input datalist is normally a list of images and labels (3D volumes) that need pre-processing for the Deepgrow training pipeline.

Parameters
  • datalist

    A list of data dictionary. Each entry should at least contain ‘image_key’: <image filename>. For example, typical input data can be a list of dictionaries:

    [{'image': <image filename>, 'label': <label filename>}]
    

  • output_dir (str) – target directory to store the training data for Deepgrow Training

  • pixdim – output voxel spacing.

  • dimension (int) – dimension for Deepgrow training. It can be 2 or 3.

  • image_key (str) – image key in input datalist. Defaults to ‘image’.

  • label_key (str) – label key in input datalist. Defaults to ‘label’.

  • base_dir – base directory in case relative paths are used for the keys in datalist. Defaults to None.

  • limit (int) – limit number of inputs for pre-processing. Defaults to 0 (no limit).

  • relative_path (bool) – whether the output key values should be relative paths. Defaults to False.

  • transforms – explicit transforms to execute operations on input data.

Raises
  • ValueError – When dimension is not one of [2, 3]

  • ValueError – When datalist is empty

Return type

List[Dict]

Returns

A new datalist that contains path to the images/labels after pre-processing.

Example:

datalist = create_dataset(
    datalist=[{'image': 'img1.nii', 'label': 'label1.nii'}],
    base_dir=None,
    output_dir=output_2d,
    dimension=2,
    image_key='image',
    label_key='label',
    pixdim=(1.0, 1.0),
    limit=0,
    relative_path=True
)

print(datalist[0]["image"], datalist[0]["label"])
class monai.apps.deepgrow.interaction.Interaction(transforms, max_interactions, train, key_probability='probability')[source]#

Ignite process_function used to introduce interactions (simulation of clicks) for Deepgrow Training/Evaluation. For more details please refer to: https://pytorch.org/ignite/generated/ignite.engine.engine.Engine.html. This implementation is based on:

Sakinis et al., Interactive segmentation of medical images through fully convolutional neural networks. (2019) https://arxiv.org/abs/1903.08205

Parameters
  • transforms (Union[Sequence[Callable], Callable]) – execute additional transformation during every iteration (before train). Typically, several Tensor based transforms composed by Compose.

  • max_interactions (int) – maximum number of interactions per iteration

  • train (bool) – training or evaluation

  • key_probability (str) – field name to fill probability for every interaction
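
Example (a hedged sketch of plugging the handler into monai.engines.SupervisedTrainer through its iteration_update argument; device, train_loader, network, optimizer, loss_function and click_transforms are placeholders):

from monai.apps.deepgrow.interaction import Interaction
from monai.engines import SupervisedTrainer

trainer = SupervisedTrainer(
    device=device,                    # placeholder torch.device
    max_epochs=50,
    train_data_loader=train_loader,   # placeholder DataLoader
    network=network,                  # placeholder model
    optimizer=optimizer,              # placeholder optimizer
    loss_function=loss_function,      # placeholder loss
    iteration_update=Interaction(
        transforms=click_transforms,  # placeholder Compose of click-simulation transforms
        max_interactions=15,
        train=True,
    ),
)
trainer.run()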

class monai.apps.deepgrow.transforms.AddInitialSeedPointd(label='label', guidance='guidance', sids='sids', sid='sid', connected_regions=5)[source]#

Add random guidance as initial seed point for a given label.

Note that the label is of size (C, D, H, W) or (C, H, W)

The guidance is of size (2, N, # of dims), where N is the number of guidance points added; # of dims is 4 for (C, D, H, W) labels and 3 for (C, H, W) labels.

Parameters
  • label (str) – label source.

  • guidance (str) – key to store guidance.

  • sids (str) – key that represents list of valid slice indices for the given label.

  • sid (str) – key that represents the slice to add initial seed point. If not present, random sid will be chosen.

  • connected_regions (int) – maximum connected regions to use for adding initial points.

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

class monai.apps.deepgrow.transforms.AddGuidanceSignald(image='image', guidance='guidance', sigma=2, number_intensity_ch=1)[source]#

Add Guidance signal for input image.

Based on the “guidance” points, apply a Gaussian to them and add the result as a new channel of the input image.

Parameters
  • image (str) – key to the image source.

  • guidance (str) – key to store guidance.

  • sigma (int) – standard deviation for Gaussian kernel.

  • number_intensity_ch (int) – channel index.
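
Example (a hedged sketch chaining this transform with AddInitialSeedPointd above and FindAllValidSlicesd documented later in this section; key names follow the documented defaults):

from monai.apps.deepgrow.transforms import (
    AddGuidanceSignald,
    AddInitialSeedPointd,
    FindAllValidSlicesd,
)
from monai.transforms import Compose

click_transforms = Compose(
    [
        FindAllValidSlicesd(label="label", sids="sids"),
        AddInitialSeedPointd(label="label", guidance="guidance", sids="sids"),
        AddGuidanceSignald(image="image", guidance="guidance", sigma=2),
    ]
)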

class monai.apps.deepgrow.transforms.AddRandomGuidanced(guidance='guidance', discrepancy='discrepancy', probability='probability')[source]#

Add random guidance based on the discrepancies that were found between the label and the prediction. The input shapes are as follows: guidance is of shape (2, N, # of dims); discrepancy is of shape (2, C, D, H, W) or (2, C, H, W); probability is of shape (1).

Parameters
  • guidance (str) – key to guidance source.

  • discrepancy (str) – key that represents discrepancies found between label and prediction.

  • probability (str) – key that represents click/interaction probability.

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

class monai.apps.deepgrow.transforms.AddGuidanceFromPointsd(ref_image, guidance='guidance', foreground='foreground', background='background', axis=0, depth_first=True, spatial_dims=2, slice_key='slice', meta_keys=None, meta_key_postfix='meta_dict', dimensions=None)[source]#

Add guidance based on user clicks.

We assume the input is loaded by LoadImaged and has the shape of (H, W, D) originally. Clicks always specify the coordinates in (H, W, D)

If depth_first is True:

Input is now of shape (D, H, W), will return guidance that specifies the coordinates in (D, H, W)

else:

Input is now of shape (H, W, D), will return guidance that specifies the coordinates in (H, W, D)

Parameters
  • ref_image – key to reference image to fetch current and original image details.

  • guidance (str) – output key to store guidance.

  • foreground (str) – key that represents user foreground (+ve) clicks.

  • background (str) – key that represents user background (-ve) clicks.

  • axis (int) – axis that represents slices in 3D volume. (axis to Depth)

  • depth_first (bool) – if depth (slices) is positioned at first dimension.

  • spatial_dims (int) – dimensions based on model used for deepgrow (2D vs 3D).

  • slice_key (str) – key that represents applicable slice to add guidance.

  • meta_keys (Optional[str]) – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_key is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

Deprecated since version 0.6.0: dimensions is deprecated, use spatial_dims instead.

class monai.apps.deepgrow.transforms.SpatialCropForegroundd(keys, source_key, spatial_size, select_fn=<function is_positive>, channel_indices=None, margin=0, allow_smaller=True, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Crop only the foreground object of the expected images.

Differences vs. monai.transforms.CropForegroundd:

  1. If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.

  2. This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]

The typical usage is to help training and evaluation if the valid part is small in the whole medical image. The valid part can be determined by any field in the data with source_key, for example:

  • Select values > 0 in image field as the foreground and crop on all fields specified by keys.

  • Select label = 3 in label field as the foreground to crop on all fields specified by keys.

  • Select label > 0 in the third channel of a One-Hot label field as the foreground to crop all keys fields.

Users can define an arbitrary function to select the expected foreground from the whole source image or from specified channels. A margin can also be added to every dimension of the bounding box of the foreground object.

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed. See also: monai.transforms.MapTransform

  • source_key (str) – data source to generate the bounding box of foreground, can be image or label, etc.

  • spatial_size (Union[Sequence[int], ndarray]) – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.

  • select_fn (Callable) – function to select expected foreground, default is to select values > 0.

  • channel_indices (Union[Iterable[int], int, None]) – if defined, select foreground only on the specified channels of image. if None, select foreground on the whole image.

  • margin (int) – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.

  • allow_smaller (bool) – when computing the box size with margin, whether to allow the image size to be smaller than the box size, defaults to True. If the margined size is bigger than the image size, it will pad with the specified mode.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch/store the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • start_coord_key (str) – key to record the start coordinate of spatial bounding box for foreground.

  • end_coord_key (str) – key to record the end coordinate of spatial bounding box for foreground.

  • original_shape_key (str) – key to record original shape for foreground.

  • cropped_shape_key (str) – key to record cropped shape for foreground.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.deepgrow.transforms.SpatialCropGuidanced(keys, guidance, spatial_size, margin=20, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Crop image based on guidance with minimal spatial size.

  • If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.

  • This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]

Input data is of shape (C, spatial_1, [spatial_2, …])

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.

  • guidance (str) – key to the guidance. It is used to generate the bounding box of foreground

  • spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.

  • margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • start_coord_key (str) – key to record the start coordinate of spatial bounding box for foreground.

  • end_coord_key (str) – key to record the end coordinate of spatial bounding box for foreground.

  • original_shape_key (str) – key to record original shape for foreground.

  • cropped_shape_key (str) – key to record cropped shape for foreground.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.deepgrow.transforms.RestoreLabeld(keys, ref_image, slice_only=False, mode=InterpolateMode.NEAREST, align_corners=None, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Restores label based on the ref image.

The ref_image is assumed to have gone through the following transforms:

  1. Fetch2DSliced (If 2D)

  2. Spacingd

  3. SpatialCropGuidanced

  4. Resized

And its shape is assumed to be (C, D, H, W)

This transform tries to undo these operations so that the resulting label can be overlapped with the original volume. It does the following operations:

  1. Undo Resized

  2. Undo SpatialCropGuidanced

  3. Undo Spacingd

  4. Undo Fetch2DSliced

The resulting label is of shape (D, H, W)

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.

  • ref_image (str) – reference image to fetch current and original image details

  • slice_only (bool) – apply only to an applicable slice, in case of 2D model/prediction

  • mode (Union[Sequence[Union[InterpolateMode, str]], InterpolateMode, str]) – interpolation mode used when resizing the label back, e.g. "nearest", "linear", "bilinear", "bicubic", "trilinear", "area". Defaults to "nearest". It also can be a sequence of modes, each element corresponding to a key in keys. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html

  • align_corners (Union[Sequence[Optional[bool]], bool, None]) – Geometrically, we consider the pixels of the input as squares rather than points. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html It also can be a sequence of bool, each element corresponds to a key in keys.

  • meta_keys (Optional[str]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_key is None, use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • start_coord_key (str) – key that records the start coordinate of spatial bounding box for foreground.

  • end_coord_key (str) – key that records the end coordinate of spatial bounding box for foreground.

  • original_shape_key (str) – key that records original shape for foreground.

  • cropped_shape_key (str) – key that records cropped shape for foreground.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.deepgrow.transforms.ResizeGuidanced(guidance, ref_image, meta_keys=None, meta_key_postfix='meta_dict', cropped_shape_key='foreground_cropped_shape')[source]#

Resize the guidance based on cropped vs resized image.

This transform assumes that the images have been cropped and resized, and that the shape after cropping is stored inside the meta dict of the ref image.

Parameters
  • guidance (str) – key to guidance

  • ref_image (str) – key to reference image to fetch current and original image details

  • meta_keys (Optional[str]) – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_key is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • cropped_shape_key (str) – key that records cropped shape for foreground.

class monai.apps.deepgrow.transforms.FindDiscrepancyRegionsd(label='label', pred='pred', discrepancy='discrepancy')[source]#

Find discrepancies between the prediction and the ground truth during click interactions in training.

Parameters
  • label (str) – key to label source.

  • pred (str) – key to prediction source.

  • discrepancy (str) – key to store discrepancies found between label and prediction.

class monai.apps.deepgrow.transforms.FindAllValidSlicesd(label='label', sids='sids')[source]#

Find/List all valid slices in the label. Label is assumed to be a 4D Volume with shape CDHW, where C=1.

Parameters
  • label (str) – key to the label source.

  • sids (str) – key to store the indices of slices that have a valid label map.

class monai.apps.deepgrow.transforms.Fetch2DSliced(keys, guidance='guidance', axis=0, meta_keys=None, meta_key_postfix='meta_dict', allow_missing_keys=False)[source]#

Fetch one slice in case of a 3D volume.

The volume only contains spatial coordinates.

Parameters
  • keys – keys of the corresponding items to be transformed.

  • guidance – key that represents guidance.

  • axis (int) – axis that represents slice in 3D volume.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix (str) – use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Pathology#

class monai.apps.pathology.data.PatchWSIDataset(data, region_size, grid_shape, patch_size, transform=None, image_reader_name='cuCIM', **kwargs)[source]#

This dataset reads whole slide images, extracts regions, and creates patches. It also reads labels for each patch and provides each patch with its associated class labels.

Parameters
  • data (List) – the list of input samples including image, location, and label (see the note below for more details).

  • region_size (Union[int, Tuple[int, int]]) – the size of regions to be extracted from the whole slide image.

  • grid_shape (Union[int, Tuple[int, int]]) – the grid shape on which the patches should be extracted.

  • patch_size (Union[int, Tuple[int, int]]) – the size of patches extracted from the region on the grid.

  • transform (Optional[Callable]) – transforms to be executed on input data.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

  • kwargs – additional parameters for WSIReader

Note

The input data has the following form as an example: [{“image”: “path/to/image1.tiff”, “location”: [200, 500], “label”: [0,0,0,1]}].

This means: from “image1.tiff”, extract a region centered at the given location with the size of region_size, and then extract patches with the size of patch_size from a grid with the shape of grid_shape. Be aware that grid_shape should construct a grid with the same number of elements as the label, so for this example grid_shape should be (2, 2).
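
Example (a hedged sketch mirroring the input format described in the note above; the image path is a hypothetical placeholder):

from monai.apps.pathology.data import PatchWSIDataset

dataset = PatchWSIDataset(
    data=[{"image": "path/to/image1.tiff", "location": [200, 500], "label": [0, 0, 0, 1]}],  # hypothetical path
    region_size=(256, 256),
    grid_shape=(2, 2),
    patch_size=128,
    image_reader_name="cuCIM",
)
patches = dataset[0]  # patches extracted from the region on the grid, with their labels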

class monai.apps.pathology.data.SmartCachePatchWSIDataset(data, region_size, grid_shape, patch_size, transform, image_reader_name='cuCIM', replace_rate=0.5, cache_num=9223372036854775807, cache_rate=1.0, num_init_workers=1, num_replace_workers=1, progress=True, copy_cache=True, as_contiguous=True, **kwargs)[source]#

Add SmartCache functionality to PatchWSIDataset.

Parameters
  • data (List) – the list of input samples including image, location, and label (see PatchWSIDataset for more details)

  • region_size (Union[int, Tuple[int, int]]) – the region to be extracted from the whole slide image.

  • grid_shape (Union[int, Tuple[int, int]]) – the grid shape on which the patches should be extracted.

  • patch_size (Union[int, Tuple[int, int]]) – the size of patches extracted from the region on the grid.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

  • transform (Union[Sequence[Callable], Callable]) – transforms to be executed on input data.

  • replace_rate (float) – percentage of the cached items to be replaced in every epoch.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_init_workers (Optional[int]) – the number of worker threads to initialize the cache for first epoch. If num_init_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • num_replace_workers (Optional[int]) – the number of worker threads to prepare the replacement cache for every epoch. If num_replace_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when caching for the first epoch.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, defaults to True. If the random transforms don’t modify the cache content or every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent logic.

  • kwargs – additional parameters for WSIReader

class monai.apps.pathology.data.MaskedInferenceWSIDataset(data, patch_size, transform=None, image_reader_name='cuCIM', **kwargs)[source]#

This dataset loads the provided foreground masks at an arbitrary resolution level, and extracts patches based on those masks from the associated whole slide images.

Parameters
  • data (List[Dict[str, str]]) – a list of samples including the path to the whole slide image and the path to the mask, like this: [{“image”: “path/to/image1.tiff”, “mask”: “path/to/mask1.npy”}, …].

  • patch_size (Union[int, Tuple[int, int]]) – the size of patches to be extracted from the whole slide image for inference.

  • transform (Optional[Callable]) – transforms to be executed on extracted patches.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

  • kwargs – additional parameters for WSIReader

Note

The resulting output (probability maps) after performing inference using this dataset is supposed to be the same size as the foreground mask, not the original WSI image size.

class monai.apps.pathology.handlers.ProbMapProducer(output_dir='./', output_postfix='', dtype=<class 'numpy.float64'>, name=None)[source]#

Event handler triggered on completing every iteration to save the probability map

__init__(output_dir='./', output_postfix='', dtype=<class 'numpy.float64'>, name=None)[source]#
Parameters
  • output_dir (str) – output directory to save probability maps.

  • output_postfix (str) – a string appended to all output file names.

  • dtype (Union[dtype, type, str, None]) – the data type in which the probability map is stored. Default np.float64.

  • name (Optional[str]) – identifier of logging.logger to use, defaulting to engine.logger.

attach(engine)[source]#
Parameters

engine (Engine) – Ignite Engine, it can be a trainer, validator or evaluator.

Return type

None

save_prob_map(name)[source]#

This method saves the probability map for an image when its inference is finished, and deletes that probability map from memory.

Parameters

name (str) – the name of image to be saved.

Return type

None

class monai.apps.pathology.metrics.LesionFROC(data, grow_distance=75, itc_diameter=200, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8), nms_sigma=0.0, nms_prob_threshold=0.5, nms_box_size=48, image_reader_name='cuCIM')[source]#

Evaluate with Free Response Operating Characteristic (FROC) score.

Parameters
  • data (List[Dict]) – either the list of dictionaries containing probability maps (inference result) and tumor mask (ground truth), as below, or the path to a json file containing such list. { “prob_map”: “path/to/prob_map_1.npy”, “tumor_mask”: “path/to/ground_truth_1.tiff”, “level”: 6, “pixel_spacing”: 0.243 }

  • grow_distance (int) – Euclidean distance (in micrometers) by which to grow the ground truth’s tumor labels. Defaults to 75, which is the equivalent size of 5 tumor cells.

  • itc_diameter (int) – the maximum diameter of a region (in micrometer) to be considered as an isolated tumor cell. Defaults to 200.

  • eval_thresholds (Tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8) which is the same as the CAMELYON 16 Challenge.

  • nms_sigma (float) – the standard deviation for gaussian filter of non-maximal suppression. Defaults to 0.0.

  • nms_prob_threshold (float) – the probability threshold of non-maximal suppression. Defaults to 0.5.

  • nms_box_size (int) – the box size (in pixel) to be removed around the pixel for non-maximal suppression.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

Note

For more info on the nms_* parameters look at monai.utils.prob_nms.ProbNMS.
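
Example (a hedged sketch; the file paths are hypothetical placeholders following the data format described above):

from monai.apps.pathology.metrics import LesionFROC

froc = LesionFROC(
    data=[
        {
            "prob_map": "path/to/prob_map_1.npy",         # hypothetical path
            "tumor_mask": "path/to/ground_truth_1.tiff",  # hypothetical path
            "level": 6,
            "pixel_spacing": 0.243,
        }
    ]
)
froc_score = froc.evaluate()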

compute_fp_tp()[source]#

Compute false positive and true positive probabilities for tumor detection, by comparing the model outputs with the prepared ground truths for all samples

evaluate()[source]#

Evaluate the detection performance of a model based on the model probability map output, the ground truth tumor mask, and their associated metadata (e.g., pixel_spacing, level)

prepare_ground_truth(sample)[source]#

Prepare the ground truth for evaluation based on the binary tumor mask

prepare_inference_result(sample)[source]#

Prepare the probability map for detection evaluation.

monai.apps.pathology.utils.compute_multi_instance_mask(mask, threshold)[source]#

This method computes the segmentation mask according to the binary tumor mask.

Parameters
  • mask (ndarray) – the binary mask array

  • threshold (float) – the threshold to fill holes

monai.apps.pathology.utils.compute_isolated_tumor_cells(tumor_mask, threshold)[source]#

This method identifies Isolated Tumor Cells (ITC) and returns their labels.

Parameters
  • tumor_mask (ndarray) – the tumor mask.

  • threshold (float) – the threshold (at the mask level) to define an isolated tumor cell (ITC). A region whose longest diameter is less than this threshold is considered an ITC.

Return type

List[int]

class monai.apps.pathology.utils.PathologyProbNMS(spatial_dims=2, sigma=0.0, prob_threshold=0.5, box_size=48)[source]#

This class extends monai.utils.ProbNMS and adds the resolution option for pathology.

class monai.apps.pathology.transforms.stain.array.ExtractHEStains(tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308))[source]#

Class to extract a target stain from an image, using stain deconvolution (see Note).

Parameters
  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

Note

For more information refer to the original paper: Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf

class monai.apps.pathology.transforms.stain.array.NormalizeHEStains(tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308))[source]#

Class to normalize patches/images to a reference or target image stain (see Note).

Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain is provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentration matrix is used.

Parameters
  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15.

  • target_he (Union[tuple, ndarray]) – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to [1.9705, 1.0308].

Note

For more information refer to the original paper: Macenko et al., 2009, http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf
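
Example (a hedged sketch; the input is assumed to be an RGB patch with channels last and uint8 values):

import numpy as np

from monai.apps.pathology.transforms.stain.array import NormalizeHEStains

normalizer = NormalizeHEStains(tli=240, alpha=1, beta=0.15)
rgb_patch = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)  # placeholder RGB patch
normalized_patch = normalizer(rgb_patch)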

A collection of dictionary-based wrappers around the pathology transforms defined in monai.apps.pathology.transforms.array.

Class names end with ‘d’ to denote dictionary-based transforms.

class monai.apps.pathology.transforms.stain.dictionary.ExtractHEStainsd(keys, tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.ExtractHEStains. Class to extract a target stain from an image, using stain deconvolution.

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform

  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.stain.dictionary.NormalizeHEStainsd(keys, tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.NormalizeHEStains.

Class to normalize patches/images to a reference or target image stain.

Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain is provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentration matrix is used.

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform

  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15.

  • target_he (Union[tuple, ndarray]) – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.spatial.array.SplitOnGrid(grid_size=(2, 2), patch_size=None)[source]#

Split the image into patches based on the provided grid shape. This transform works only with torch.Tensor inputs.

Parameters
  • grid_size (Union[int, Tuple[int, int]]) – a tuple or an integer that defines the shape of the grid upon which to extract patches. If it’s an integer, the value will be repeated for each dimension. Default is 2x2.

  • patch_size (Union[int, Tuple[int, int], None]) – a tuple or an integer that defines the output patch sizes. If it’s an integer, the value will be repeated for each dimension. Defaults to None, in which case the patch size is inferred from the grid shape.

Note: the shape of the input image is inferred based on the first image used.
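
A short usage sketch (assuming a channel-first (C, H, W) tensor and that the patches are returned stacked along a new first dimension):

import torch
from monai.apps.pathology.transforms.spatial.array import SplitOnGrid

img = torch.rand(3, 256, 256)           # channel-first RGB-like tensor
splitter = SplitOnGrid(grid_size=2)     # 2x2 grid, patch size inferred from the grid
patches = splitter(img)                 # expected shape: (4, 3, 128, 128)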

class monai.apps.pathology.transforms.spatial.array.TileOnGrid(tile_count=None, tile_size=256, step=None, random_offset=False, pad_full=False, background_val=255, filter_mode='min')[source]#

Tile the 2D image into patches on a grid and keep a subset of them. This transform works only with np.ndarray inputs for 2D images.

Parameters
  • tile_count (Optional[int]) – number of tiles to extract; if None, extracts all non-background tiles. Defaults to None.

  • tile_size (int) – size of the square tile. Defaults to 256.

  • step (Optional[int]) – step size. Defaults to None (same as tile_size).

  • random_offset (bool) – randomize the position of the grid instead of starting from the top-left corner. Defaults to False.

  • pad_full (bool) – pad the image to a size evenly divisible by tile_size. Defaults to False.

  • background_val (int) – the background constant (e.g. 255 for a white background). Defaults to 255.

  • filter_mode (str) – mode must be in [“min”, “max”, “random”]. If the total number of tiles is more than tile_count, sort tiles by intensity sum and take the smallest (for “min”), largest (for “max”) or a random (for “random”) subset. Defaults to “min” (which assumes the background has high intensity).
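
A brief usage sketch (assuming a channel-first (C, H, W) uint8 array and that tiles are returned stacked along a new first dimension):

import numpy as np
from monai.apps.pathology.transforms.spatial.array import TileOnGrid

img = np.random.randint(0, 256, size=(3, 1024, 1024), dtype=np.uint8)  # channel-first RGB-like array
tiler = TileOnGrid(tile_count=4, tile_size=256, filter_mode="min")
tiles = tiler(img)  # expected shape: (4, 3, 256, 256)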

randomize(img_size)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.pathology.transforms.spatial.dictionary.SplitOnGridd(keys, grid_size=(2, 2), patch_size=None, allow_missing_keys=False)[source]#

Split the image into patches based on the provided grid shape. This transform works only with torch.Tensor inputs.

Parameters
  • grid_size (Union[int, Tuple[int, int]]) – a tuple or an integer that defines the shape of the grid upon which to extract patches. If it’s an integer, the value will be repeated for each dimension. Default is 2x2.

  • patch_size (Union[int, Tuple[int, int], None]) – a tuple or an integer that defines the output patch sizes. If it’s an integer, the value will be repeated for each dimension. Defaults to None, in which case the patch size is inferred from the grid shape.

Note: the shape of the input image is inferred based on the first image used.
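
The dictionary version follows the usual MapTransform pattern; a minimal sketch (same channel-first (C, H, W) assumption as above):

import torch
from monai.apps.pathology.transforms.spatial.dictionary import SplitOnGridd

data = {"image": torch.rand(3, 256, 256)}
patches = SplitOnGridd(keys="image", grid_size=2)(data)["image"]  # expected shape: (4, 3, 128, 128)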

class monai.apps.pathology.transforms.spatial.dictionary.TileOnGridd(keys, tile_count=None, tile_size=256, step=None, random_offset=False, pad_full=False, background_val=255, filter_mode='min', allow_missing_keys=False, return_list_of_dicts=False)[source]#

Tile the 2D image into patches on a grid and keep a subset of them. This transform works only with np.ndarray inputs for 2D images.

Parameters
  • tile_count (Optional[int]) – number of tiles to extract; if None, extracts all non-background tiles. Defaults to None.

  • tile_size (int) – size of the square tile. Defaults to 256.

  • step (Optional[int]) – step size. Defaults to None (same as tile_size).

  • random_offset (bool) – randomize the position of the grid instead of starting from the top-left corner. Defaults to False.

  • pad_full (bool) – pad the image to a size evenly divisible by tile_size. Defaults to False.

  • background_val (int) – the background constant (e.g. 255 for a white background). Defaults to 255.

  • filter_mode (str) – mode must be in [“min”, “max”, “random”]. If the total number of tiles is more than tile_count, sort tiles by intensity sum and take the smallest (for “min”), largest (for “max”) or a random (for “random”) subset. Defaults to “min” (which assumes the background has high intensity).

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors in synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

Detection#

Hard Negative Sampler#

The functions in this script are adapted from nnDetection, https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/core/boxes/sampler.py

class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#

HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute the classification loss for the selected samples; 4) do back propagation.

Parameters
  • batch_size_per_image (int) – number of training samples to be randomly selected per image

  • positive_fraction (float) – percentage of positive elements in the selected samples

  • min_neg (int) – minimum number of negative samples to select if possible.

  • pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.

get_num_neg(negative, num_pos)[source]#

Sample enough negatives to fill up self.batch_size_per_image

Parameters
  • negative (Tensor) – indices of negative samples

  • num_pos (int) – number of positive samples to draw

Return type

int

Returns

number of negative samples

get_num_pos(positive)[source]#

Number of positive samples to draw

Parameters

positive (Tensor) – indices of positive samples

Return type

int

Returns

number of positive samples

select_positives(positive, num_pos, labels)[source]#

Select positive samples

Parameters
  • positive (Tensor) – indices of positive samples, sized (P,), where P is the number of positive samples

  • num_pos (int) – number of positive samples to sample

  • labels (Tensor) – labels for all samples, sized (A,), where A is the number of samples.

Return type

Tensor

Returns

binary mask of positive samples to choose, sized (A,),

where A is the number of samples in one image

select_samples_img_list(target_labels, fg_probs)[source]#

Select positives and hard negatives from lists of samples, one list per image. The hard negative sampler is applied to each image independently.

Parameters
  • target_labels (List[Tensor]) – list of labels per image. For image i in the batch, target_labels[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i. Positive samples have positive labels, negative samples have label 0.

  • fg_probs (List[Tensor]) – list of maximum foreground probabilities per image. For image i in the batch, fg_probs[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i.

Return type

Tuple[List[Tensor], List[Tensor]]

Returns

  • list of binary masks for positive samples

  • list of binary masks for negative samples

Example

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# two images with different number of samples
target_labels = [ torch.tensor([0,1]), torch.tensor([1,0,2,1])]
fg_probs = [ torch.rand(2), torch.rand(4)]
pos_idx_list, neg_idx_list = sampler.select_samples_img_list(target_labels, fg_probs)
select_samples_per_img(labels_per_img, fg_probs_per_img)[source]#

Select positives and hard negatives from samples.

Parameters
  • labels_per_img (Tensor) – labels, sized (A,). Positive samples have positive labels, negative samples have label 0.

  • fg_probs_per_img (Tensor) – maximum foreground probability, sized (A,)

Return type

Tuple[Tensor, Tensor]

Returns

  • binary mask for positive samples, sized (A,)

  • binary mask for negative samples, sized (A,)

Example

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# a single image with four samples
target_labels = torch.tensor([1,0,2,1])
fg_probs = torch.rand(4)
pos_idx, neg_idx = sampler.select_samples_per_img(target_labels, fg_probs)
class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSamplerBase(pool_size=10)[source]#

Base class of hard negative sampler.

Hard negative sampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute the classification loss for the selected samples; 4) do back propagation.

Parameters

pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.

select_negatives(negative, num_neg, fg_probs)[source]#

Select hard negative samples.

Parameters
  • negative (Tensor) – indices of all the negative samples, sized (P,), where P is the number of negative samples

  • num_neg (int) – number of negative samples to sample

  • fg_probs (Tensor) – maximum foreground prediction scores (probability) across all the classes for each sample, sized (A,), where A is the number of samples.

Return type

Tensor

Returns

binary mask of negative samples to choose, sized (A,),

where A is the number of samples in one image

RetinaNet Network#

Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py

class monai.apps.detection.networks.retinanet_network.RetinaNet(spatial_dims, num_classes, num_anchors, feature_extractor, size_divisible=1)[source]#

The network used in RetinaNet.

It takes an image tensor as inputs, and outputs a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.

Parameters
  • spatial_dims (int) – number of spatial dimensions of the images. We support both 2D and 3D images.

  • num_classes (int) – number of output classes of the model (excluding the background).

  • num_anchors (int) – number of anchors at each location.

  • feature_extractor – a network that outputs feature maps from the input images, each feature map corresponds to a different resolution. Its output can have a format of Tensor, Dict[Any, Tensor], or Sequence[Tensor]. It can be the output of resnet_fpn_feature_extractor(*args, **kwargs).

  • size_divisible (Union[Sequence[int], int]) – the spatial size of the network input should be divisible by size_divisible, decided by the feature_extractor.

Example

import torch

from monai.networks.nets import resnet
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
spatial_dims = 3  # 3D network
conv1_t_stride = (2,2,1)  # stride of first convolutional layer in backbone
backbone = resnet.ResNet(
    spatial_dims = spatial_dims,
    block = resnet.ResNetBottleneck,
    layers = [3, 4, 6, 3],
    block_inplanes = resnet.get_inplanes(),
    n_input_channels= 1,
    conv1_t_stride = conv1_t_stride,
    conv1_t_size = (7,7,7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
returned_layers = [1,2,3]  # returned layer from feature pyramid network
feature_extractor = resnet_fpn_feature_extractor(
    backbone = backbone,
    spatial_dims = spatial_dims,
    pretrained_backbone = False,
    trainable_backbone_layers = None,
    returned_layers = returned_layers,
)
# This feature_extractor requires input image spatial size
# to be divisible by (32, 32, 16).
size_divisible = tuple(2*s*2**max(returned_layers) for s in conv1_t_stride)
model = RetinaNet(
    spatial_dims = spatial_dims,
    num_classes = 5,
    num_anchors = 6,
    feature_extractor=feature_extractor,
    size_divisible = size_divisible,
).to(device)
result = model(torch.rand(2, 1, 128,128,128))
cls_logits_maps = result["cls_logits"]  # a list of len(returned_layers)+1 Tensor
box_regression_maps = result["box_regression"]  # a list of len(returned_layers)+1 Tensor
forward(images)[source]#

It takes an image tensor as inputs, and outputs a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.

Parameters

images (Tensor) – input images, sized (B, img_channels, H, W) or (B, img_channels, H, W, D).

Return type

Dict[str, List[Tensor]]

Returns

a dictionary head_outputs with keys including self.cls_key and self.box_reg_key. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.

class monai.apps.detection.networks.retinanet_network.RetinaNetClassificationHead(in_channels, num_anchors, num_classes, spatial_dims, prior_probability=0.01)[source]#

A classification head for use in RetinaNet.

This head takes a list of feature maps as input and outputs a list of classification maps. Each output map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.

Parameters
  • in_channels (int) – number of channels of the input feature

  • num_anchors (int) – number of anchors to be predicted

  • num_classes (int) – number of classes to be predicted

  • spatial_dims (int) – spatial dimension of the network, should be 2 or 3.

  • prior_probability (float) – prior probability to initialize classification convolutional layers.

forward(x)[source]#

It takes a list of feature maps as input and outputs a list of classification maps. Each output classification map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.

Parameters

x (List[Tensor]) – list of feature map, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.

Return type

List[Tensor]

Returns

cls_logits_maps, list of classification map. cls_logits_maps[i] is a (B, num_anchors * num_classes, H_i, W_i) or (B, num_anchors * num_classes, H_i, W_i, D_i) Tensor.

class monai.apps.detection.networks.retinanet_network.RetinaNetRegressionHead(in_channels, num_anchors, spatial_dims)[source]#

A regression head for use in RetinaNet.

This head takes a list of feature maps as input and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.

Parameters
  • in_channels (int) – number of channels of the input feature

  • num_anchors (int) – number of anchors to be predicted

  • spatial_dims (int) – spatial dimension of the network, should be 2 or 3.

forward(x)[source]#

It takes a list of feature maps as input and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.

Parameters

x (List[Tensor]) – list of feature map, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.

Return type

List[Tensor]

Returns

box_regression_maps, list of box regression maps. box_regression_maps[i] is a (B, num_anchors * 2 * spatial_dims, H_i, W_i) or (B, num_anchors * 2 * spatial_dims, H_i, W_i, D_i) Tensor.

monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(backbone, spatial_dims, pretrained_backbone=False, returned_layers=(1, 2, 3), trainable_backbone_layers=None)[source]#

Constructs a feature extractor network with a ResNet-FPN backbone, used as feature_extractor in RetinaNet.

Reference: “Focal Loss for Dense Object Detection”.

The returned feature_extractor network takes an image tensor as inputs, and outputs a dictionary that maps string to the extracted feature maps (Tensor).

The input to the returned feature_extractor is expected to be a list of tensors, each of shape [C, H, W] or [C, H, W, D], one for each image. Different images can have different sizes.

Parameters
  • backbone (ResNet) – a ResNet model, used as backbone.

  • spatial_dims (int) – number of spatial dimensions of the images. We support both 2D and 3D images.

  • pretrained_backbone (bool) – whether the backbone has been pre-trained.

  • returned_layers (Sequence[int]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.

  • trainable_backbone_layers (Optional[int]) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. When pretrained_backbone is False, this value is set to be 5. When pretrained_backbone is True, if None is passed (the default) this value is set to 3.

Example

import torch

from monai.networks.nets import resnet
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
spatial_dims = 3 # 3D network
backbone = resnet.ResNet(
    spatial_dims = spatial_dims,
    block = resnet.ResNetBottleneck,
    layers = [3, 4, 6, 3],
    block_inplanes = resnet.get_inplanes(),
    n_input_channels= 1,
    conv1_t_stride = (2,2,1),
    conv1_t_size = (7,7,7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
feature_extractor = resnet_fpn_feature_extractor(
    backbone = backbone,
    spatial_dims = spatial_dims,
    pretrained_backbone = False,
    trainable_backbone_layers = None,
    returned_layers = [1,2,3],
)
model = RetinaNet(
    spatial_dims = spatial_dims,
    num_classes = 5,
    num_anchors = 6,
    feature_extractor=feature_extractor,
    size_divisible = 32,
).to(device)

RetinaNet Detector#

Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py

class monai.apps.detection.networks.retinanet_detector.RetinaNetDetector(network, anchor_generator, box_overlap_metric=<function box_iou>, debug=False)[source]#

RetinaNet detector, expandable to other one-stage anchor-based box detectors in the future. An example of construction can be found in the source code of retinanet_resnet50_fpn_detector().

The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, with values in the 0-1 range. Different images can have different sizes. Alternatively, it can be a Tensor sized (B, C, H, W) or (B, C, H, W, D), in which case all images have the same size.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors, as well as a targets (list of dictionary), containing:

  • boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the ground-truth boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.

  • labels: the class label for each ground-truth box

The model returns a Dict[str, Tensor] during training, containing the classification and regression losses. When saving the model, only self.network contains trainable parameters and needs to be saved.

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:

  • boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the predicted boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.

  • labels (Int64Tensor[N]): the predicted labels for each image

  • labels_scores (Tensor[N]): the scores for each prediction

Parameters
  • network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].

  • anchor_generator (AnchorGenerator) – anchor generator.

  • box_overlap_metric (Callable) – function that computes the overlap between two sets of boxes, default is Intersection over Union (IoU).

  • debug (bool) – whether to print out internal parameters, used for debugging and parameter tuning.

Notes

Input argument network can be a monai.apps.detection.networks.retinanet_network.RetinaNet(*) object, but any network that meets the following rules is a valid input network.

  1. It should have attributes including spatial_dims, num_classes, cls_key, box_reg_key, num_anchors, size_divisible.

    • spatial_dims (int) is the spatial dimension of the network, we support both 2D and 3D.

    • num_classes (int) is the number of classes, excluding the background.

    • size_divisible (int or Sequence[int]) is the expectation on the input image shape. The network needs the input spatial_size to be divisible by size_divisible, length should be 2 or 3.

    • cls_key (str) is the key to represent classification in the output dict.

    • box_reg_key (str) is the key to represent box regression in the output dict.

    • num_anchors (int) is the number of anchor shapes at each location. It should be equal to self.anchor_generator.num_anchors_per_location()[0].

  2. Its input should be an image Tensor sized (B, C, H, W) or (B, C, H, W, D).

  3. About its output head_outputs:

    • It should be a dictionary with at least two keys: network.cls_key and network.box_reg_key.

    • head_outputs[network.cls_key] should be List[Tensor] or Tensor. Each Tensor represents a classification logits map at one resolution level, sized (B, num_classes*num_anchors, H_i, W_i) or (B, num_classes*num_anchors, H_i, W_i, D_i).

    • head_outputs[network.box_reg_key] should be List[Tensor] or Tensor. Each Tensor represents a box regression map at one resolution level, sized (B, 2*spatial_dims*num_anchors, H_i, W_i) or (B, 2*spatial_dims*num_anchors, H_i, W_i, D_i).

    • len(head_outputs[network.cls_key]) == len(head_outputs[network.box_reg_key]).

Example

# define a naive network that satisfies the rules above
import torch
import monai
from monai.apps.detection.networks.retinanet_detector import RetinaNetDetector
class NaiveNet(torch.nn.Module):
    def __init__(self, spatial_dims: int, num_classes: int):
        super().__init__()
        self.spatial_dims = spatial_dims
        self.num_classes = num_classes
        self.size_divisible = 2
        self.cls_key = "cls"
        self.box_reg_key = "box_reg"
        self.num_anchors = 1
    def forward(self, images: torch.Tensor):
        spatial_size = images.shape[-self.spatial_dims:]
        out_spatial_size = tuple(s//self.size_divisible for s in spatial_size)  # half size of input
        out_cls_shape = (images.shape[0],self.num_classes*self.num_anchors) + out_spatial_size
        out_box_reg_shape = (images.shape[0],2*self.spatial_dims*self.num_anchors) + out_spatial_size
        return {self.cls_key: [torch.randn(out_cls_shape)], self.box_reg_key: [torch.randn(out_box_reg_shape)]}

# create a RetinaNetDetector detector
spatial_dims = 3
num_classes = 5
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, ), base_anchor_shapes=((8,) * spatial_dims)
)
net = NaiveNet(spatial_dims, num_classes)
detector = RetinaNetDetector(net, anchor_generator)

# only detector.network may contain trainable parameters.
optimizer = torch.optim.SGD(
    detector.network.parameters(),
    1e-3,
    momentum=0.9,
    weight_decay=3e-5,
    nesterov=True,
)
torch.save(detector.network.state_dict(), 'model.pt')  # save model
detector.network.load_state_dict(torch.load('model.pt'))  # load model
compute_anchor_matched_idxs(anchors, targets, num_anchor_locs_per_level)[source]#

Compute the matched indices between anchors and ground truth (gt) boxes in targets. output[k][i] represents the matched gt index for anchor[i] in image k. Suppose there are M gt boxes for image k; then output[k][i] takes a value in [-2, -1, 0, …, M-1]. A value in [0, M-1] indicates that the anchor is matched with a gt box, while a negative value indicates that it is not matched.

Parameters
  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • targets (List[Dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • num_anchor_locs_per_level (Sequence[int]) – each element represents HW or HWD at this level.

Return type

List[Tensor]

Returns

a list of matched index matched_idxs_per_image (Tensor[int64]), Tensor sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2

compute_box_loss(box_regression, targets, anchors, matched_idxs)[source]#

Compute box regression losses.

Parameters
  • box_regression (Tensor) – box regression results, sized (B, sum(HWA), 2*self.spatial_dims)

  • targets (List[Dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • matched_idxs (List[Tensor]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),).

Return type

Tensor

Returns

box regression losses.

compute_cls_loss(cls_logits, targets, matched_idxs)[source]#

Compute classification losses.

Parameters
  • cls_logits (Tensor) – classification logits, sized (B, sum(HW(D)A), self.num_classes)

  • targets (List[Dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • matched_idxs (List[Tensor]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),).

Return type

Tensor

Returns

classification losses.

compute_loss(head_outputs_reshape, targets, anchors, num_anchor_locs_per_level)[source]#

Compute losses.

Parameters
  • head_outputs_reshape (Dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)

  • targets (List[Dict[str, Tensor]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

Return type

Dict[str, Tensor]

Returns

a dict of several kinds of losses.

forward(input_images, targets=None, use_inferer=False)[source]#

Returns a dict of losses during training, or a list predicted dict of boxes and labels during inference.

Parameters
  • input_images (Union[List[Tensor], Tensor]) – The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, with values in the 0-1 range. Different images can have different sizes. Alternatively, it can be a Tensor sized (B, C, H, W) or (B, C, H, W, D), in which case all images have the same size.

  • targets (Optional[List[Dict[str, Tensor]]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image (optional).

  • use_inferer (bool) – whether to use self.inferer, a sliding window inferer, to do the inference. If False, will simply forward the network. If True, will use self.inferer, and requires self.set_sliding_window_inferer(*args) to have been called before.

Return type

Union[Dict[str, Tensor], List[Dict[str, Tensor]]]

Returns

If in training mode, will return a dict with at least two keys, including self.cls_key and self.box_reg_key, representing classification loss and box regression loss.

If in evaluation mode, will return a list of detection results. Each element corresponds to an image in input_images and is a dict with at least three keys, including self.target_box_key, self.target_label_key, and self.pred_score_key, representing predicted boxes, classification labels, and classification scores.

generate_anchors(images, head_outputs)[source]#

Generate anchors and store them in self.anchors: List[Tensor]. We generate anchors only when there are no stored anchors, or when the incoming images have a different shape from self.previous_image_shape.

Parameters
  • images (Tensor) – input images, a (B, C, H, W) or (B, C, H, W, D) Tensor.

  • head_outputs (Dict[str, List[Tensor]]) – the network head outputs. head_outputs[self.cls_key] is the list of predicted classification maps and head_outputs[self.box_reg_key] is the list of predicted box regression maps, one Tensor per resolution level.

get_box_train_sample_per_image(box_regression_per_image, targets_per_image, anchors_per_image, matched_idxs_per_image)[source]#

Get samples from one image for box regression losses computation.

Parameters
  • box_regression_per_image (Tensor) – box regression result for one image, (sum(HWA), 2*self.spatial_dims)

  • targets_per_image (Dict[str, Tensor]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • anchors_per_image (Tensor) – anchors of one image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • matched_idxs_per_image (Tensor) – matched index, sized (sum(HWA),) or (sum(HWDA),)

Return type

Tuple[Tensor, Tensor]

Returns

paired predicted and GT samples from one image for box regression losses computation

get_cls_train_sample_per_image(cls_logits_per_image, targets_per_image, matched_idxs_per_image)[source]#

Get samples from one image for classification losses computation.

Parameters
  • cls_logits_per_image (Tensor) – classification logits for one image, (sum(HWA), self.num_classes)

  • targets_per_image (Dict[str, Tensor]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • matched_idxs_per_image (Tensor) – matched index, Tensor sized (sum(HWA),) or (sum(HWDA),) Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2

Return type

Tuple[Tensor, Tensor]

Returns

paired predicted and GT samples from one image for classification losses computation

postprocess_detections(head_outputs_reshape, anchors, image_sizes, num_anchor_locs_per_level, need_sigmoid=True)[source]#

Postprocessing to generate detection results from classification logits and box regression. Use self.box_selector to select the final output boxes for each image.

Parameters
  • head_outputs_reshape (Dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)

  • targets – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

Return type

List[Dict[str, Tensor]]

Returns

a list of dict, each dict corresponds to detection result on image.

set_atss_matcher(num_candidates=4, center_in_gt=False)[source]#

Used for training. Set the ATSS matcher that matches anchors with ground truth boxes.

Parameters
  • num_candidates (int) – number of positions to select candidates from. A smaller value will result in a higher matcher threshold and fewer matched candidates.

  • center_in_gt (bool) – If False (default), matched anchor center points do not need to lie within the ground truth box. Recommended False for small objects. If True, will result in a stricter matcher and fewer matched candidates.
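
A usage sketch, following the pattern of the other setter examples (detector is assumed to be a constructed RetinaNetDetector):

detector.set_atss_matcher(num_candidates=4, center_in_gt=False)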

Return type

None

set_balanced_sampler(batch_size_per_image, positive_fraction)[source]#

Used for training. Set the torchvision balanced sampler that samples part of the anchors for training.

Parameters
  • batch_size_per_image (int) – number of elements to be selected per image

  • positive_fraction (float) – percentage of positive elements per batch
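
A usage sketch with illustrative values (detector is assumed to be a constructed RetinaNetDetector):

detector.set_balanced_sampler(batch_size_per_image=64, positive_fraction=0.5)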

set_box_coder_weights(weights)[source]#

Set the weights for box coder.

Parameters

weights (Tuple[float]) – a list/tuple with length of 2*self.spatial_dims

set_box_regression_loss(box_loss, encode_gt, decode_pred)[source]#

Used for training. Set the loss for box regression.

Parameters
  • box_loss (Module) – loss module for box regression

  • encode_gt (bool) – if True, will encode ground truth boxes to target box regression before computing the losses. Should be True for L1 loss and False for GIoU loss.

  • decode_pred (bool) – if True, will decode predicted box regression into predicted boxes before computing losses. Should be False for L1 loss and True for GIoU loss.

Example

detector.set_box_regression_loss(
    torch.nn.SmoothL1Loss(beta=1.0 / 9, reduction="mean"),
    encode_gt = True, decode_pred = False
)
detector.set_box_regression_loss(
    monai.losses.giou_loss.BoxGIoULoss(reduction="mean"),
    encode_gt = False, decode_pred = True
)
Return type

None

set_box_selector_parameters(score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300, apply_sigmoid=True)[source]#

Used for inference. Set the parameters that are used for box selection during inference. The box selection is performed with the following steps:

  1. For each level, discard boxes with scores less than self.score_thresh.

  2. For each level, keep boxes with top self.topk_candidates_per_level scores.

  3. For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.

  4. For the whole image, keep boxes with top self.detections_per_img scores.

Parameters
  • score_thresh (float) – no box with scores less than score_thresh will be kept

  • topk_candidates_per_level (int) – max number of boxes to keep for each level

  • nms_thresh (float) – box overlapping threshold for NMS

  • detections_per_img (int) – max number of boxes to keep for each image
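
A usage sketch with illustrative values (detector is assumed to be a constructed RetinaNetDetector):

detector.set_box_selector_parameters(
    score_thresh=0.02,
    topk_candidates_per_level=1000,
    nms_thresh=0.22,
    detections_per_img=100,
)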

set_cls_loss(cls_loss)[source]#

Used for training. Set the loss for classification. It takes logits as inputs, so make sure sigmoid/softmax is built into the loss.

Parameters

cls_loss (Module) – loss module for classification

Example

detector.set_cls_loss(torch.nn.BCEWithLogitsLoss(reduction="mean"))
detector.set_cls_loss(FocalLoss(reduction="mean", gamma=2.0))
Return type

None

set_hard_negative_sampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#

Used for training. Set the hard negative sampler that samples part of the anchors for training.

HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

Parameters
  • batch_size_per_image (int) – number of elements to be selected per image

  • positive_fraction (float) – percentage of positive elements in the selected samples

  • min_neg (int) – minimum number of negative samples to select if possible.

  • pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
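
A usage sketch with illustrative values (detector is assumed to be a constructed RetinaNetDetector):

detector.set_hard_negative_sampler(
    batch_size_per_image=64, positive_fraction=0.3, min_neg=16, pool_size=20
)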

set_regular_matcher(fg_iou_thresh, bg_iou_thresh, allow_low_quality_matches=True)[source]#

Used for training. Set the torchvision matcher that matches anchors with ground truth boxes.

Parameters
  • fg_iou_thresh (float) – foreground IoU threshold for Matcher, considered as matched if IoU > fg_iou_thresh

  • bg_iou_thresh (float) – background IoU threshold for Matcher, considered as not matched if IoU < bg_iou_thresh
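
A usage sketch with illustrative thresholds (detector is assumed to be a constructed RetinaNetDetector):

detector.set_regular_matcher(fg_iou_thresh=0.5, bg_iou_thresh=0.4)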

Return type

None

set_sliding_window_inferer(roi_size, sw_batch_size=1, overlap=0.5, mode=BlendMode.CONSTANT, sigma_scale=0.125, padding_mode=PytorchPadMode.CONSTANT, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False)[source]#

Define sliding window inferer and store it to self.inferer.
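
A usage sketch with illustrative values (detector is assumed to be a constructed RetinaNetDetector):

detector.set_sliding_window_inferer(roi_size=(128, 128, 128), sw_batch_size=1, overlap=0.25)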

set_target_keys(box_key, label_key)[source]#

Set keys for the training targets and inference outputs. During training, both box_key and label_key should be keys in the targets when performing self.forward(input_images, targets). During inference, they will be the keys in the output dict of self.forward(input_images).
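
A usage sketch (the key names are illustrative):

detector.set_target_keys(box_key="boxes", label_key="labels")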

monai.apps.detection.networks.retinanet_detector.retinanet_resnet50_fpn_detector(num_classes, anchor_generator, returned_layers=(1, 2, 3), pretrained=False, progress=True, **kwargs)[source]#

Returns a RetinaNet detector using a ResNet-50 backbone, which can be pretrained from Med3D: Transfer Learning for 3D Medical Image Analysis (https://arxiv.org/pdf/1904.00625.pdf).

Parameters
  • num_classes (int) – number of output classes of the model (excluding the background).

  • anchor_generator (AnchorGenerator) – anchor generator.

  • returned_layers (Sequence[int]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.

  • pretrained (bool) – If True, returns a backbone pre-trained on 23 medical datasets

  • progress (bool) – If True, displays a progress bar of the download to stderr

Return type

RetinaNetDetector

Returns

A RetinaNetDetector object with resnet50 as backbone

Example

import monai
from monai.apps.detection.networks.retinanet_detector import retinanet_resnet50_fpn_detector

# define the backbone parameters and anchor generator, then build the detector
resnet_param = {
    "pretrained": False,
    "spatial_dims": 3,
    "n_input_channels": 2,
    "num_classes": 3,
    "conv1_t_size": 7,
    "conv1_t_stride": (2, 2, 2)
}
returned_layers = [1]
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, 2), base_anchor_shapes=((8,) * resnet_param["spatial_dims"])
)
detector = retinanet_resnet50_fpn_detector(
    **resnet_param, anchor_generator=anchor_generator, returned_layers=returned_layers
)

Transforms#

monai.apps.detection.transforms.box_ops.apply_affine_to_boxes(boxes, affine)[source]#

This function applies affine matrices to the boxes

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • affine (Union[ndarray, Tensor]) – affine matrix to be applied to the box coordinates, sized (spatial_dims+1,spatial_dims+1)

Return type

Union[ndarray, Tensor]

Returns

returned affine transformed boxes, with same data type as boxes, does not share memory with boxes
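
A small numeric sketch (assuming the affine is applied to the box corners and the result is re-boxed; the affine below scales the first spatial axis by 2 and shifts the second by 5):

import torch
from monai.apps.detection.transforms.box_ops import apply_affine_to_boxes

boxes = torch.tensor([[1.0, 1.0, 3.0, 3.0]])     # one 2D box in StandardMode (xyxy)
affine = torch.tensor([[2.0, 0.0, 0.0],
                       [0.0, 1.0, 5.0],
                       [0.0, 0.0, 1.0]])
apply_affine_to_boxes(boxes, affine=affine)      # expected: tensor([[2., 6., 6., 8.]])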

monai.apps.detection.transforms.box_ops.convert_box_to_mask(boxes, labels, spatial_size, bg_label=-1, ellipse_mask=False)[source]#

Convert boxes to an int16 mask image, which has the same size as the input image.

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.

  • labels (Union[ndarray, Tensor]) – classification foreground(fg) labels corresponding to boxes, dtype should be int, sized (N,).

  • spatial_size (Union[Sequence[int], int]) – image spatial size.

  • bg_label (int) – background labels for the output mask image, make sure it is smaller than any fg labels.

  • ellipse_mask (bool) –

    bool.

    • If True, it assumes the object shape is close to ellipse or ellipsoid.

    • If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.

    • If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

Return type

Union[ndarray, Tensor]

Returns

  • int16 array, sized (num_box, H, W). Each channel represents a box.

    The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.

monai.apps.detection.transforms.box_ops.convert_mask_to_box(boxes_mask, bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#

Convert an int16 mask image, which has the same size as the input image, back to boxes.

Parameters
  • boxes_mask (Union[ndarray, Tensor]) – int16 array, sized (num_box, H, W). Each channel represents a box. The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.

  • bg_label (int) – background labels for the boxes_mask

  • box_dtype – output dtype for boxes

  • label_dtype – output dtype for labels

Return type

Tuple[Union[ndarray, Tensor], Union[ndarray, Tensor]]

Returns

  • bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.

  • classification foreground(fg) labels, dtype should be int, sized (N,).
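
A round-trip sketch pairing convert_box_to_mask with convert_mask_to_box (the exact mask layout is an assumption; a 4x4 image with one 2x2 foreground box is used):

import torch
from monai.apps.detection.transforms.box_ops import convert_box_to_mask, convert_mask_to_box

boxes = torch.tensor([[0, 0, 2, 2]])   # StandardMode (xyxy)
labels = torch.tensor([1])
mask = convert_box_to_mask(boxes, labels, spatial_size=(4, 4), bg_label=-1)  # sized (1, 4, 4)
recovered_boxes, recovered_labels = convert_mask_to_box(mask, bg_label=-1)   # expected to recover boxes/labels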

monai.apps.detection.transforms.box_ops.flip_boxes(boxes, spatial_size, flip_axes=None)[source]#

Flip boxes when the corresponding image is flipped

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • spatial_size (Union[Sequence[int], int]) – image spatial size.

  • flip_axes (Union[Sequence[int], int, None]) – spatial axes along which to flip over. Default is None. The default axis=None will flip over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.

Returns

flipped boxes, with same data type as boxes, does not share memory with boxes
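
A small numeric sketch (assuming a 10x10 image and a flip along the first spatial axis):

import torch
from monai.apps.detection.transforms.box_ops import flip_boxes

boxes = torch.tensor([[0.0, 0.0, 2.0, 2.0]])
flip_boxes(boxes, spatial_size=(10, 10), flip_axes=0)  # expected: tensor([[8., 0., 10., 2.]])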

monai.apps.detection.transforms.box_ops.resize_boxes(boxes, src_spatial_size, dst_spatial_size)[source]#

Resize boxes when the corresponding image is resized

Parameters
  • boxes (Union[ndarray, Tensor]) – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • src_spatial_size (Union[Sequence[int], int]) – source image spatial size.

  • dst_spatial_size (Union[Sequence[int], int]) – target image spatial size.

Returns

resized boxes, with same data type as boxes, does not share memory with boxes

Example

boxes = torch.ones(1,4)
src_spatial_size = [100, 100]
dst_spatial_size = [128, 256]
resize_boxes(boxes, src_spatial_size, dst_spatial_size) #  will return tensor([[1.28, 2.56, 1.28, 2.56]])
monai.apps.detection.transforms.box_ops.rot90_boxes(boxes, spatial_size, k=1, axes=(0, 1))[source]#

Rotate boxes by 90 degrees in the plane specified by axes. Rotation direction is from the first towards the second axis.

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • spatial_size (Union[Sequence[int], int]) – image spatial size.

  • k (int) – number of times the array is rotated by 90 degrees.

  • axes (Tuple[int, int]) – (2,) array_like The array is rotated in the plane defined by the axes. Axes must be different.

Returns

A rotated view of boxes.

Notes

rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is the reverse of rot90_boxes(boxes, spatial_size, k=1, axes=(0,1)).

rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is equivalent to rot90_boxes(boxes, spatial_size, k=-1, axes=(0,1)).

monai.apps.detection.transforms.box_ops.select_labels(labels, keep)[source]#

For each element in labels, select the indices keep from it.

Parameters
  • labels (Union[Sequence[Union[ndarray, Tensor]], ndarray, Tensor]) – Sequence of array. Each element represents classification labels or scores corresponding to boxes, sized (N,).

  • keep (Union[ndarray, Tensor]) – the indices to keep, with the same length as each element in labels.

Return type

Union[Tuple, ndarray, Tensor]

Returns

selected labels, does not share memory with original labels.

monai.apps.detection.transforms.box_ops.swapaxes_boxes(boxes, axis1, axis2)[source]#

Interchange two axes of boxes.

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • axis1 (int) – First axis.

  • axis2 (int) – Second axis.

Returns

boxes with two axes interchanged.
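
A small numeric sketch (assuming swapping the two spatial axes simply exchanges the corresponding box coordinates):

import torch
from monai.apps.detection.transforms.box_ops import swapaxes_boxes

boxes = torch.tensor([[1.0, 2.0, 3.0, 4.0]])       # [xmin, ymin, xmax, ymax]
swapaxes_boxes(boxes, axis1=0, axis2=1)            # expected: tensor([[2., 1., 4., 3.]])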

monai.apps.detection.transforms.box_ops.zoom_boxes(boxes, zoom)[source]#

Zoom boxes

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

Returns

zoomed boxes, with same data type as boxes, does not share memory with boxes

Example

boxes = torch.ones(1,4)
zoom_boxes(boxes, zoom=[0.5,2.2]) #  will return tensor([[0.5, 2.2, 0.5, 2.2]])

A collection of “vanilla” transforms for box operations https://github.com/Project-MONAI/MONAI/wiki/MONAI_Design

class monai.apps.detection.transforms.array.AffineBox[source]#

Applies affine matrix to the boxes

class monai.apps.detection.transforms.array.BoxToMask(bg_label=-1, ellipse_mask=False)[source]#

Convert boxes to an int16 mask image, which has the same size as the input image.

Parameters
  • bg_label (int) – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.

  • ellipse_mask (bool) –

    bool.

    • If True, it assumes the object shape is close to ellipse or ellipsoid.

    • If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.

    • If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

class monai.apps.detection.transforms.array.ClipBoxToImage(remove_empty=False)[source]#

Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple arrays of labels/scores associated with one array of boxes.

Parameters

remove_empty (bool) – whether to remove the boxes and corresponding labels that are actually empty

class monai.apps.detection.transforms.array.ConvertBoxMode(src_mode=None, dst_mode=None)[source]#

This transform converts the boxes in src_mode to the dst_mode.

Parameters
  • src_mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode().

  • dst_mode (Union[str, BoxMode, Type[BoxMode], None]) – target box mode. If it is not given, this func will assume it is StandardMode().

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” for 2D and “xyzxyz” for 3D.

src_mode and dst_mode can be:
  1. str: choose from BoxModeName, for example,
    • “xyxy”: boxes has format [xmin, ymin, xmax, ymax]

    • “xyzxyz”: boxes has format [xmin, ymin, zmin, xmax, ymax, zmax]

    • “xxyy”: boxes has format [xmin, xmax, ymin, ymax]

    • “xxyyzz”: boxes has format [xmin, xmax, ymin, ymax, zmin, zmax]

    • “xyxyzz”: boxes has format [xmin, ymin, xmax, ymax, zmin, zmax]

    • “xywh”: boxes has format [xmin, ymin, xsize, ysize]

    • “xyzwhd”: boxes has format [xmin, ymin, zmin, xsize, ysize, zsize]

    • “ccwh”: boxes has format [xcenter, ycenter, xsize, ysize]

    • “cccwhd”: boxes has format [xcenter, ycenter, zcenter, xsize, ysize, zsize]

  2. BoxMode class: choose from the subclasses of BoxMode, for example,
    • CornerCornerModeTypeA: equivalent to “xyxy” or “xyzxyz”

    • CornerCornerModeTypeB: equivalent to “xxyy” or “xxyyzz”

    • CornerCornerModeTypeC: equivalent to “xyxy” or “xyxyzz”

    • CornerSizeMode: equivalent to “xywh” or “xyzwhd”

    • CenterSizeMode: equivalent to “ccwh” or “cccwhd”

  3. BoxMode object: choose from the subclasses of BoxMode, for example,
    • CornerCornerModeTypeA(): equivalent to “xyxy” or “xyzxyz”

    • CornerCornerModeTypeB(): equivalent to “xxyy” or “xxyyzz”

    • CornerCornerModeTypeC(): equivalent to “xyxy” or “xyxyzz”

    • CornerSizeMode(): equivalent to “xywh” or “xyzwhd”

    • CenterSizeMode(): equivalent to “ccwh” or “cccwhd”

  4. None: will assume mode is StandardMode()

Example

boxes = torch.ones(10,4)
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxMode(src_mode="xyxy", dst_mode="ccwh")
box_converter(boxes)
class monai.apps.detection.transforms.array.ConvertBoxToStandardMode(mode=None)[source]#

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Parameters

mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format with src_mode in ConvertBoxMode .

Example

boxes = torch.ones(10,6)
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardMode(mode="xxyyzz")
box_converter(boxes)
class monai.apps.detection.transforms.array.FlipBox(spatial_axis=None)[source]#

Reverses the box coordinates along the given spatial axis. Preserves shape.

Parameters

spatial_axis (Union[Sequence[int], int, None]) – spatial axes along which to flip over. Default is None. The default axis=None will flip over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.

class monai.apps.detection.transforms.array.MaskToBox(bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#

Convert an int16 mask image, which has the same size as the input image, back to boxes. Pairs with monai.apps.detection.transforms.array.BoxToMask. Please make sure the same min_fg_label is used when using the two transforms in pairs.

Parameters
  • bg_label (int) – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.

  • box_dtype – output dtype for boxes

  • label_dtype – output dtype for labels

class monai.apps.detection.transforms.array.ResizeBox(spatial_size, size_mode='all', **kwargs)[source]#

Resize the input boxes when the corresponding image is resized to given spatial size (with scaling, not cropping/padding).

Parameters
  • spatial_size (Union[Sequence[int], int]) – expected shape of spatial dimensions after resize operation. if some components of the spatial_size are non-positive values, the transform will use the corresponding components of img size. For example, spatial_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.

  • size_mode (str) – should be “all” or “longest”; if “all”, will use spatial_size for all the spatial dims, if “longest”, rescale the image so that only the longest side equals the specified spatial_size, which must be an int number in this case, keeping the aspect ratio of the initial image, refer to: https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#albumentations.augmentations.geometric.resize.LongestMaxSize.

  • kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

class monai.apps.detection.transforms.array.RotateBox90(k=1, spatial_axes=(0, 1))[source]#

Rotate boxes by 90 degrees in the plane specified by spatial_axes. See box_ops.rot90_boxes for additional details.

Parameters
  • k (int) – number of times to rotate by 90 degrees.

  • spatial_axes (Tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), this is the first two axis in spatial dimensions. If axis is negative it counts from the last to the first axis.

class monai.apps.detection.transforms.array.SpatialCropBox(roi_center=None, roi_size=None, roi_start=None, roi_end=None, roi_slices=None)[source]#

General purpose box cropper when the corresponding image is cropped by SpatialCrop(*) with the same ROI. The difference is that we do not support negative indexing for roi_slices.

If a dimension of the expected ROI size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected ROI, and the cropped results of several images may not have exactly the same shape. It supports cropping ND spatial boxes.

The cropped region can be parameterised in various ways:
  • a list of slices for each spatial dimension (do not allow for use of negative indexing)

  • a spatial center and size

  • the start and end coordinates of the ROI

Parameters
  • roi_center (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for center of the crop ROI.

  • roi_size (Union[Sequence[int], ndarray, Tensor, None]) – size of the crop ROI, if a dimension of ROI size is bigger than image size, will not crop that dimension of the image.

  • roi_start (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for start of the crop ROI.

  • roi_end (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for end of the crop ROI, if a coordinate is out of image, use the end coordinate of image.

  • roi_slices (Optional[Sequence[slice]]) – list of slices for each of the spatial dimensions.

class monai.apps.detection.transforms.array.ZoomBox(zoom, keep_size=False, **kwargs)[source]#

Zooms an ND box with the same padding or slicing settings as Zoom().

Parameters
  • zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

  • keep_size (bool) – Should keep original size (padding/slicing if needed), default is False.

  • kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

A collection of dictionary-based wrappers around the “vanilla” transforms for box operations defined in monai.apps.detection.transforms.array.

Class names end with ‘d’ to denote dictionary-based transforms.

monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateD#

alias of AffineBoxToImageCoordinated

monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateDict#

alias of AffineBoxToImageCoordinated

class monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinated(box_keys, box_ref_image_keys, allow_missing_keys=False, image_meta_key=None, image_meta_key_postfix='meta_dict', affine_lps_to_ras=False)[source]#

Dictionary-based transform that converts box in world coordinate to image coordinate.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (str) – The single key that represents the reference image to which box_keys are attached.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

  • image_meta_key (Optional[str]) – explicitly indicate the key of the corresponding metadata dictionary. For example, for data with key image, the metadata is by default in image_meta_dict. The metadata is a dictionary object which contains: filename, affine, original_shape, etc. It is a string that maps to the box_ref_image_key. If None, will try to construct the meta key as box_ref_image_key_{meta_key_postfix}.

  • image_meta_key_postfix (Optional[str]) – if image_meta_key is None, use box_ref_image_key_{postfix} to fetch the metadata according to the key data, default is meta_dict; the metadata is a dictionary object. For example, to handle key image, read/write the affine matrix from the metadata image_meta_dict dictionary’s affine field.

  • affine_lps_to_ras – default is False. Set affine_lps_to_ras=True if 1) the image was read by ITKReader, 2) the ITKReader has affine_lps_to_ras=True, and 3) the box is in world coordinates.
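
Conceptually, the conversion sends the box corners through the inverse of the image affine stored in the metadata. The sketch below shows that corner mapping for a 2D box with a made-up affine; it is an illustration of the geometry, not the transform’s code, and taking the min/max keeps the result axis-aligned for non-axis-aligned affines.

import numpy as np

affine = np.array([[2.0, 0.0, 10.0],
                   [0.0, 2.0, 20.0],
                   [0.0, 0.0, 1.0]])            # hypothetical image-to-world affine
box_world = np.array([12.0, 24.0, 18.0, 30.0])  # [xmin, ymin, xmax, ymax] in world space

inv = np.linalg.inv(affine)                     # world-to-image mapping
corners = np.array([[box_world[0], box_world[1], 1.0],
                    [box_world[2], box_world[3], 1.0]])
corners_image = (inv @ corners.T).T[:, :2]
box_image = np.concatenate([corners_image.min(axis=0), corners_image.max(axis=0)])
print(box_image)                                # -> [1. 2. 4. 5.] in image (voxel) coordinates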

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.BoxToMaskD#

alias of BoxToMaskd

monai.apps.detection.transforms.dictionary.BoxToMaskDict#

alias of BoxToMaskd

class monai.apps.detection.transforms.dictionary.BoxToMaskd(box_keys, box_mask_keys, label_keys, box_ref_image_keys, min_fg_label, ellipse_mask=False, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.BoxToMask. Pairs with monai.apps.detection.transforms.dictionary.MaskToBoxd . Please make sure the same min_fg_label is used when using the two transforms in pairs. The output d[box_mask_key] will have background intensity 0, since the following operations may pad 0 on the border.

This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.

  1. use BoxToMaskd to convert boxes and labels to box_masks;

  2. do transforms, e.g., rotation or cropping, on images and box_masks together;

  3. use MaskToBoxd to convert box_masks back to boxes and labels.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • min_fg_label (int) – min foreground box label.

  • ellipse_mask (bool) –

    • If True, it assumes the object shape is close to an ellipse or ellipsoid.

    • If False, it assumes the object shape is close to a rectangle or cube and occupies the bounding box well.

    • If random rotation is going to be applied as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al., “Towards Rotation Invariance in Object Detection”, ICCV 2021.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and image together.
import numpy as np
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(
            keys=["image", "box_mask"], mode=["nearest", "nearest"],
            prob=0.2, range_x=np.pi/6, range_y=np.pi/6, range_z=np.pi/6,
            keep_size=True, padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image", "box_mask"], roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
monai.apps.detection.transforms.dictionary.ClipBoxToImageD#

alias of ClipBoxToImaged

monai.apps.detection.transforms.dictionary.ClipBoxToImageDict#

alias of ClipBoxToImaged

class monai.apps.detection.transforms.dictionary.ClipBoxToImaged(box_keys, label_keys, box_ref_image_keys, remove_empty=True, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ClipBoxToImage.

Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple keys of labels/scores associated with one key of boxes.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – The single key that represents the reference image to which box_keys and label_keys are attached.

  • remove_empty (bool) – whether to remove the boxes that are actually empty

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

ClipBoxToImaged(
    box_keys="boxes", box_ref_image_keys="image", label_keys=["labels", "scores"], remove_empty=True
)
monai.apps.detection.transforms.dictionary.ConvertBoxModeD#

alias of ConvertBoxModed

monai.apps.detection.transforms.dictionary.ConvertBoxModeDict#

alias of ConvertBoxModed

class monai.apps.detection.transforms.dictionary.ConvertBoxModed(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxMode.

This transform converts the boxes in src_mode to the dst_mode.

Example

data = {"boxes": torch.ones(10,4)}
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxModed(box_keys=["boxes"], src_mode="xyxy", dst_mode="ccwh")
box_converter(data)
__init__(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#
Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.

  • src_mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If not given, this function assumes StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

  • dst_mode (Union[str, BoxMode, Type[BoxMode], None]) – target box mode. If not given, this function assumes StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.ConvertBoxMode

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeD#

alias of ConvertBoxToStandardModed

monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeDict#

alias of ConvertBoxToStandardModed

class monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModed(box_keys, mode=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxToStandardMode.

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Example

data = {"boxes": torch.ones(10,6)}
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardModed(box_keys=["boxes"], mode="xxyyzz")
box_converter(data)
__init__(box_keys, mode=None, allow_missing_keys=False)[source]#
Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.

  • mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If not given, this function assumes StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.ConvertBoxToStandardMode

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.FlipBoxD#

alias of FlipBoxd

monai.apps.detection.transforms.dictionary.FlipBoxDict#

alias of FlipBoxd

class monai.apps.detection.transforms.dictionary.FlipBoxd(image_keys, box_keys, box_ref_image_keys, spatial_axis=None, allow_missing_keys=False)[source]#

Dictionary-based transform that flips boxes and images.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • spatial_axis (Union[Sequence[int], int, None]) – Spatial axes along which to flip over. Default is None.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
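
The effect on a box of flipping along a spatial axis is a reflection of its coordinate interval within the image extent; below is a hand-rolled sketch of that mapping (whether the library uses size or size - 1 as the reflection reference is an implementation detail not asserted here):

import torch

box = torch.tensor([[2.0, 3.0, 1.0, 8.0, 9.0, 5.0]])   # one 3D box in StandardMode
spatial_size = (32, 32, 16)                             # hypothetical reference image size

def flip_box_sketch(boxes, spatial_size, axis):
    # reflect [min, max] along `axis`: [min, max] -> [size - max, size - min]
    ndim = len(spatial_size)
    flipped = boxes.clone()
    flipped[:, axis] = spatial_size[axis] - boxes[:, axis + ndim]
    flipped[:, axis + ndim] = spatial_size[axis] - boxes[:, axis]
    return flipped

print(flip_box_sketch(box, spatial_size, axis=0))       # -> [[24., 3., 1., 30., 9., 5.]]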

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.MaskToBoxD#

alias of MaskToBoxd

monai.apps.detection.transforms.dictionary.MaskToBoxDict#

alias of MaskToBoxd

class monai.apps.detection.transforms.dictionary.MaskToBoxd(box_keys, box_mask_keys, label_keys, min_fg_label, box_dtype=torch.float32, label_dtype=torch.int64, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.MaskToBox. Pairs with monai.apps.detection.transforms.dictionary.BoxToMaskd . Please make sure the same min_fg_label is used when using the two transforms in pairs.

This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.

  1. use BoxToMaskd to convert boxes and labels to box_masks;

  2. do transforms, e.g., rotation or cropping, on images and box_masks together;

  3. use MaskToBoxd to convert box_masks back to boxes and labels.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.

  • min_fg_label (int) – min foreground box label.

  • box_dtype – output dtype for box_keys

  • label_dtype – output dtype for label_keys

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and images together.
import numpy as np
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(
            keys=["image", "box_mask"], mode=["nearest", "nearest"],
            prob=0.2, range_x=np.pi/6, range_y=np.pi/6, range_z=np.pi/6,
            keep_size=True, padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image", "box_mask"], roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelD#

alias of RandCropBoxByPosNegLabeld

monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelDict#

alias of RandCropBoxByPosNegLabeld

class monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabeld(image_keys, box_keys, label_keys, spatial_size, pos=1.0, neg=1.0, num_samples=1, whole_box=True, thresh_image_key=None, image_threshold=0.0, fg_indices_key=None, bg_indices_key=None, meta_keys=None, meta_key_postfix='meta_dict', allow_smaller=False, allow_missing_keys=False)[source]#

Crop random fixed-sized regions that contain foreground boxes. It assumes all the expected fields specified by image_keys have the same shape, and adds patch_index to the corresponding metadata. It returns a list of dictionaries for all the cropped images. If a dimension of the expected spatial size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected size, and the cropped results of several images may not have exactly the same shape.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation. They need to have the same spatial size.

  • box_keys (str) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.

  • spatial_size (Union[Sequence[int], int]) – the spatial size of the crop region e.g. [224, 224, 128]. if a dimension of ROI size is bigger than image size, will not crop that dimension of the image. if its components have non-positive values, the corresponding size of data[label_key] will be used. for example: if the spatial size of input data is [40, 40, 40] and spatial_size=[32, 64, -1], the spatial size of output data will be [32, 40, 40].

  • pos (float) – used with neg together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.

  • neg (float) – used with pos together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.

  • num_samples (int) – number of samples (crop regions) to take in each list.

  • whole_box (bool) – Bool, default True, whether we prefer to contain at least one whole box in the cropped foreground patch. Even if True, it is still possible to get partial box if there are multiple boxes in the image.

  • thresh_image_key (Optional[str]) – if thresh_image_key is not None, use label == 0 & thresh_image > image_threshold to select the negative sample(background) center. so the crop center will only exist on valid image area.

  • image_threshold (float) – if enabled thresh_image_key, use thresh_image > image_threshold to determine the valid image content area.

  • fg_indices_key (Optional[str]) – if provided pre-computed foreground indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.

  • bg_indices_key (Optional[str]) – if provided pre-computed background indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. used to add patch_index to the meta dict. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. used to add patch_index to the meta dict.

  • allow_smaller (bool) – if False, an exception will be raised if the image is smaller than the requested ROI in any dimension. If True, any smaller dimensions will be set to match the cropped size (i.e., no cropping in that dimension).

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
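
A usage sketch that only constructs the transform with the documented arguments; the key names ("image", "boxes", "labels") and the sizes are placeholders, not values prescribed by the library:

from monai.apps.detection.transforms.dictionary import RandCropBoxByPosNegLabeld

crop_transform = RandCropBoxByPosNegLabeld(
    image_keys=["image"],
    box_keys="boxes",
    label_keys="labels",
    spatial_size=[128, 128, 64],
    pos=1.0,
    neg=1.0,
    num_samples=4,   # applying it to one data dict returns a list of 4 cropped dicts
    whole_box=True,
)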

randomize(boxes, image_size, fg_indices=None, bg_indices=None, thresh_image=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance to identify errors in syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

monai.apps.detection.transforms.dictionary.RandFlipBoxD#

alias of RandFlipBoxd

monai.apps.detection.transforms.dictionary.RandFlipBoxDict#

alias of RandFlipBoxd

class monai.apps.detection.transforms.dictionary.RandFlipBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, spatial_axis=None, allow_missing_keys=False)[source]#

Dictionary-based transform that randomly flips boxes and images with the given probability.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • prob (float) – Probability of flipping.

  • spatial_axis (Union[Sequence[int], int, None]) – Spatial axes along which to flip over. Default is None.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

set_random_state(seed=None, state=None)[source]#

Set the random state locally to control the randomness; the derived classes should use self.R instead of np.random to introduce random factors.

Parameters
  • seed (Optional[int]) – set the random state with an integer seed.

  • state (Optional[RandomState]) – set the random state with a np.random.RandomState object.

Raises

TypeError – When state is not an Optional[np.random.RandomState].

Return type

RandFlipBoxd

Returns

a Randomizable instance.

monai.apps.detection.transforms.dictionary.RandRotateBox90D#

alias of RandRotateBox90d

monai.apps.detection.transforms.dictionary.RandRotateBox90Dict#

alias of RandRotateBox90d

class monai.apps.detection.transforms.dictionary.RandRotateBox90d(image_keys, box_keys, box_ref_image_keys, prob=0.1, max_k=3, spatial_axes=(0, 1), allow_missing_keys=False)[source]#

With probability prob, input boxes and images are rotated by 90 degrees in the plane specified by spatial_axes.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • prob (float) – probability of rotating. (Default 0.1, with 10% probability it returns a rotated array.)

  • max_k (int) – number of rotations will be sampled from np.random.randint(max_k) + 1. (Default 3)

  • spatial_axes (Tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), i.e., the first two axes of the spatial dimensions.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.RandZoomBoxD#

alias of RandZoomBoxd

monai.apps.detection.transforms.dictionary.RandZoomBoxDict#

alias of RandZoomBoxd

class monai.apps.detection.transforms.dictionary.RandZoomBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, min_zoom=0.9, max_zoom=1.1, mode=InterpolateMode.AREA, padding_mode=NumpyPadMode.EDGE, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#

Dictionary-based transform that randomly zooms input boxes and images with given probability within given zoom range.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • prob (float) – Probability of zooming.

  • min_zoom (Union[Sequence[float], float]) – Min zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, min_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.

  • max_zoom (Union[Sequence[float], float]) – Max zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, max_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.

  • mode (Union[Sequence[str], str]) – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key in keys.

  • padding_mode (Union[Sequence[str], str]) – available modes for numpy arrays: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}; available modes for PyTorch Tensors: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "edge". The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • align_corners (Union[Sequence[Optional[bool]], bool, None]) – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in keys.

  • keep_size (bool) – Should keep original size (pad if needed), default is True.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

  • kwargs – other args for np.pad API, note that np.pad treats channel dimension as the first dimension. more details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

set_random_state(seed=None, state=None)[source]#

Set the random state locally to control the randomness; the derived classes should use self.R instead of np.random to introduce random factors.

Parameters
  • seed (Optional[int]) – set the random state with an integer seed.

  • state (Optional[RandomState]) – set the random state with a np.random.RandomState object.

Raises

TypeError – When state is not an Optional[np.random.RandomState].

Return type

RandZoomBoxd

Returns

a Randomizable instance.

monai.apps.detection.transforms.dictionary.RotateBox90D#

alias of RotateBox90d

monai.apps.detection.transforms.dictionary.RotateBox90Dict#

alias of RotateBox90d

class monai.apps.detection.transforms.dictionary.RotateBox90d(image_keys, box_keys, box_ref_image_keys, k=1, spatial_axes=(0, 1), allow_missing_keys=False)[source]#

Input boxes and images are rotated by 90 degrees in the plane specified by spatial_axes, k times.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • k (int) – number of times to rotate by 90 degrees.

  • spatial_axes (Tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), i.e., the first two axes of the spatial dimensions.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.ZoomBoxD#

alias of ZoomBoxd

monai.apps.detection.transforms.dictionary.ZoomBoxDict#

alias of ZoomBoxd

class monai.apps.detection.transforms.dictionary.ZoomBoxd(image_keys, box_keys, box_ref_image_keys, zoom, mode=InterpolateMode.AREA, padding_mode=NumpyPadMode.EDGE, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#

Dictionary-based transform that zooms input boxes and images with the given zoom scale.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

  • mode (Union[Sequence[str], str]) – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key in keys.

  • padding_mode (Union[Sequence[str], str]) – available modes for numpy arrays: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}; available modes for PyTorch Tensors: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "edge". The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • align_corners (Union[Sequence[Optional[bool]], bool, None]) – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in keys.

  • keep_size (bool) – Should keep original size (pad if needed), default is True.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

  • kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

Anchor#

This script is adapted from https://github.com/pytorch/vision/blob/release/0.12/torchvision/models/detection/anchor_utils.py

class monai.apps.detection.utils.anchor_utils.AnchorGenerator(sizes=((20, 30, 40),), aspect_ratios=(((0.5, 1), (1, 0.5)),), indexing='ij')[source]#

This module is modified from torchvision to support both 2D and 3D images.

Module that generates anchors for a set of feature maps and image sizes.

The module supports computing anchors at multiple sizes and aspect ratios per feature map.

sizes and aspect_ratios should have the same number of elements, and it should correspond to the number of feature maps.

sizes[i] and aspect_ratios[i] can have an arbitrary number of elements. For 2D images, anchor width and height are w:h = 1:aspect_ratios[i,j]. For 3D images, anchor width, height, and depth are w:h:d = 1:aspect_ratios[i,j,0]:aspect_ratios[i,j,1].

AnchorGenerator will output a set of len(sizes[i]) * len(aspect_ratios[i]) anchors per spatial location for feature map i.

Parameters
  • sizes (Sequence[Sequence[int]]) – base size of each anchor. len(sizes) is the number of feature maps, i.e., the number of output levels for the feature pyramid network (FPN). Each element of sizes is a Sequence which represents several anchor sizes for each feature map.

  • aspect_ratios (Sequence) – the aspect ratios of anchors. len(aspect_ratios) = len(sizes). For 2D images, each element of aspect_ratios[i] is a Sequence of float. For 3D images, each element of aspect_ratios[i] is a Sequence of 2 value Sequence.

  • indexing (str) –

    choose from {'ij', 'xy'}, optional, Matrix ('ij', default and recommended) or Cartesian ('xy') indexing of output.

    • Matrix ('ij', default and recommended) indexing keeps the original axis not changed.

    • To use other monai detection components, please set indexing = 'ij'.

    • Cartesian ('xy') indexing swaps axis 0 and 1.

    • For 2D cases, monai AnchorGenerator(sizes, aspect_ratios, indexing='xy') and torchvision.models.detection.anchor_utils.AnchorGenerator(sizes, aspect_ratios) are equivalent.

Reference:

https://github.com/pytorch/vision/blob/release/0.12/torchvision/models/detection/anchor_utils.py

Example

# 2D example inputs for a 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = (1., 0.5,  2.)
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)

# 3D example inputs for a 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = ((1., 1.), (1., 0.5), (0.5, 1.), (2., 2.))
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)
forward(images, feature_maps)[source]#

Generate anchor boxes for each image.

Parameters
  • images (Tensor) – sized (B, C, W, H) or (B, C, W, H, D)

  • feature_maps (List[Tensor]) – for FPN level i, feature_maps[i] is sized (B, C_i, W_i, H_i) or (B, C_i, W_i, H_i, D_i). This input argument does not have to be the actual feature maps. Any list variable with the same (C_i, W_i, H_i) or (C_i, W_i, H_i, D_i) as feature maps works.

Return type

List[Tensor]

Returns

A list with length of B. Each element represents the anchors for this image. The B elements are identical.

Example

images = torch.zeros((3,1,128,128,128))
feature_maps = [torch.zeros((3,6,64,64,32)), torch.zeros((3,6,32,32,16))]
anchor_generator(images, feature_maps)
generate_anchors(scales, aspect_ratios, dtype=torch.float32, device=None)[source]#

Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.

Parameters
  • scales (Sequence) – a sequence which represents several anchor sizes for the current feature map.

  • aspect_ratios (Sequence) – a sequence which represents several aspect_ratios for the current feature map. For 2D images, it is a Sequence of float aspect_ratios[j], anchor width and height w:h = 1:aspect_ratios[j]. For 3D images, it is a Sequence of 2 value Sequence aspect_ratios[j,0] and aspect_ratios[j,1], anchor width, height, and depth w:h:d = 1:aspect_ratios[j,0]:aspect_ratios[j,1]

  • dtype (dtype) – target data type of the output Tensor.

  • device (Optional[device]) – target device to put the output Tensor data.

Return type

Tensor

Returns

For each s in scales, returns [s, s*aspect_ratios[j]] for 2D images, and [s, s*aspect_ratios[j,0], s*aspect_ratios[j,1]] for 3D images.

grid_anchors(grid_sizes, strides)[source]#

Every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:spatial_dims) corresponds to a feature map. It outputs g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.

Parameters
  • grid_sizes (List[List[int]]) – spatial size of the feature maps

  • strides (List[List[Tensor]]) – strides of the feature maps regarding to the original image

Example

grid_sizes = [[100,100],[50,50]]
strides = [[torch.tensor(2), torch.tensor(2)], [torch.tensor(4), torch.tensor(4)]]

Return type

List[Tensor]

num_anchors_per_location()[source]#

Return number of anchor shapes for each feature map.

set_cell_anchors(dtype, device)[source]#

Convert each element in self.cell_anchors to dtype and send to device.

class monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(feature_map_scales=(1, 2, 4, 8), base_anchor_shapes=((32, 32, 32), (48, 20, 20), (20, 48, 20), (20, 20, 48)), indexing='ij')[source]#

Module that generates anchors for a set of feature maps and image sizes, inherited from AnchorGenerator

The module supports computing anchors at multiple base anchor shapes per feature map.

feature_map_scales should have the same number of elements with the number of feature maps.

base_anchor_shapes can have an arbitrary number of elements. For 2D images, each element represents anchor width and height [w,h]. For 3D images, each element represents anchor width, height, and depth [w,h,d].

AnchorGenerator will output a set of len(base_anchor_shapes) anchors per spatial location for feature map i.

Parameters
  • feature_map_scales (Union[Sequence[int], Sequence[float]]) – scale of anchors for each feature map, i.e., each output level of the feature pyramid network (FPN). len(feature_map_scales) is the number of feature maps. scale[i]*base_anchor_shapes represents the anchor shapes for feature map i.

  • base_anchor_shapes (Union[Sequence[Sequence[int]], Sequence[Sequence[float]]]) – a sequence which represents several anchor shapes for one feature map. For N-D images, it is a Sequence of N value Sequence.

  • indexing (str) – choose from {‘xy’, ‘ij’}, optional Cartesian (‘xy’) or matrix (‘ij’, default) indexing of output. Cartesian (‘xy’) indexing swaps axis 0 and 1, which is the setting inside torchvision. matrix (‘ij’, default) indexing keeps the original axis not changed. See also indexing in https://pytorch.org/docs/stable/generated/torch.meshgrid.html

Example

# 2D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10), (6, 12), (12, 6))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)

# 3D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10, 10), (12, 12, 8), (10, 10, 6), (16, 16, 10))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)
static generate_anchors_using_shape(anchor_shapes, dtype=torch.float32, device=None)[source]#

Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.

Parameters
  • anchor_shapes (Tensor) – [w, h] or [w, h, d], sized (N, spatial_dims), represents N anchor shapes for the current feature map.

  • dtype (dtype) – target data type of the output Tensor.

  • device (Optional[device]) – target device to put the output Tensor data.

Return type

Tensor

Returns

For 2D images, returns [-w/2, -h/2, w/2, h/2]; For 3D images, returns [-w/2, -h/2, -d/2, w/2, h/2, d/2]
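
The documented return value can be reproduced with a couple of tensor operations; the snippet below is a sketch of that formula (centering each anchor shape at the origin), not the method itself:

import torch

anchor_shapes = torch.tensor([[32.0, 32.0, 32.0],
                              [48.0, 20.0, 20.0]])   # [w, h, d] per anchor
half = anchor_shapes / 2.0
cell_anchors = torch.cat([-half, half], dim=1)       # [-w/2, -h/2, -d/2, w/2, h/2, d/2]
print(cell_anchors[0])                               # tensor([-16., -16., -16., 16., 16., 16.])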

Matcher#

The functions in this script are adapted from nnDetection, https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/core/boxes/matcher.py which is adapted from torchvision.

These are the changes compared with nnDetection: 1) comments and docstrings; 2) reformatting; 3) add a debug option to ATSSMatcher to help users tune parameters; 4) add a corner case return in ATSSMatcher.compute_matches; 5) add support for float16 on CPU.

class monai.apps.detection.utils.ATSS_matcher.ATSSMatcher(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#
__init__(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#

Compute matching based on ATSS (https://arxiv.org/abs/1912.02424): “Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection”.

Parameters
  • num_candidates (int) – number of positions to select candidates from. Smaller value will result in a higher matcher threshold and less matched candidates.

  • similarity_fn (Callable[[Tensor, Tensor], Tensor]) – function for similarity computation between boxes and anchors

  • center_in_gt (bool) – If False, matched anchor center points do not need to lie within the ground truth box; recommend False for small objects. If True, will result in a stricter matcher and fewer matched candidates.

  • debug (bool) – if True, will print the matcher threshold in order to tune num_candidates and center_in_gt.
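
A small usage sketch with synthetic boxes and anchors, following the compute_matches signature documented below; the tensor shapes and values are arbitrary placeholders:

import torch
from monai.apps.detection.utils.ATSS_matcher import ATSSMatcher

matcher = ATSSMatcher(num_candidates=4, center_in_gt=False)

# two 3D ground-truth boxes and 24 anchors (16 on level 0, 8 on level 1), all StandardMode
boxes = torch.tensor([[2.0, 2.0, 2.0, 10.0, 10.0, 10.0],
                      [20.0, 20.0, 4.0, 30.0, 28.0, 12.0]])
anchor_min = torch.rand(24, 3) * 32.0
anchors = torch.cat([anchor_min, anchor_min + 8.0], dim=1)   # ensure max > min

match_quality_matrix, matches = matcher.compute_matches(
    boxes, anchors, num_anchors_per_level=[16, 8], num_anchors_per_loc=1
)
print(match_quality_matrix.shape)   # (2, 24): similarity from each box to each anchor
print(matches.shape)                # (24,): matched box index or a negative flag per anchor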

compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#

Compute matches according to ATSS for a single image. Adapted from https://github.com/sfzhang15/ATSS/blob/79dfb28bd1/atss_core/modeling/rpn/atss/loss.py#L180-L184

Parameters
  • boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • anchors (Tensor) – anchors to match Mx4 or Mx6, also assumed to be StandardMode.

  • num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level

  • num_anchors_per_loc (int) – number of anchors per position

Return type

Tuple[Tensor, Tensor]

Returns

  • matrix which contains the similarity from each box to each anchor [N, M]

  • vector which contains the matched box index for each anchor (BELOW_LOW_THRESHOLD for background, BETWEEN_THRESHOLDS if the anchor should be ignored) [M]

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” ([xmin, ymin, xmax, ymax]) for 2D and “xyzxyz” ([xmin, ymin, zmin, xmax, ymax, zmax]) for 3D.

class monai.apps.detection.utils.ATSS_matcher.Matcher(similarity_fn=<function box_iou>)[source]#

Base class of Matcher, which matches boxes and anchors to each other

Parameters

similarity_fn (Callable[[Tensor, Tensor], Tensor]) – function for similarity computation between boxes and anchors

abstract compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#

Compute matches

Parameters
  • boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • anchors (Tensor) – anchors to match Mx4 or Mx6, also assumed to be StandardMode.

  • num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level

  • num_anchors_per_loc (int) – number of anchors per position

Return type

Tuple[Tensor, Tensor]

Returns

  • matrix which contains the similarity from each box to each anchor [N, M]

  • vector which contains the matched box index for each anchor (BELOW_LOW_THRESHOLD for background, BETWEEN_THRESHOLDS if the anchor should be ignored) [M]

Box coder#

This script is modified from torchvision to support N-D images,

https://github.com/pytorch/vision/blob/main/torchvision/models/detection/_utils.py

class monai.apps.detection.utils.box_coder.BoxCoder(weights, boxes_xform_clip=None)[source]#

This class encodes and decodes a set of bounding boxes into the representation used for training the regressors.

Parameters
  • weights (Tuple[float]) – 4-element tuple or 6-element tuple

  • boxes_xform_clip (Optional[float]) – high threshold to prevent sending too large values into torch.exp()

Example

box_coder = BoxCoder(weights=[1., 1., 1., 1., 1., 1.])
gt_boxes = torch.tensor([[1,2,1,4,5,6],[1,3,2,7,8,9]])
proposals = gt_boxes + torch.rand(gt_boxes.shape)
rel_gt_boxes = box_coder.encode_single(gt_boxes, proposals)
gt_back = box_coder.decode_single(rel_gt_boxes, proposals)
# We expect gt_back to be equal to gt_boxes
decode(rel_codes, reference_boxes)[source]#

From a set of original reference_boxes and encoded relative box offsets, get the decoded boxes.

Parameters
  • rel_codes (Tensor) – encoded boxes, Nx4 or Nx6 torch tensor.

  • reference_boxes (Sequence[Tensor]) – a list of reference boxes, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tensor

Returns

decoded boxes, Nx1x4 or Nx1x6 torch tensor. The box mode will be StandardMode

decode_single(rel_codes, reference_boxes)[source]#

From a set of original boxes and encoded relative box offsets, get the decoded boxes.

Parameters
  • rel_codes (Tensor) – encoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor.

  • reference_boxes (Tensor) – reference boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tensor

Returns

decoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor. The box mode will be StandardMode

encode(gt_boxes, proposals)[source]#

Encode a set of proposals with respect to some ground truth (gt) boxes.

Parameters
  • gt_boxes (Sequence[Tensor]) – list of gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • proposals (Sequence[Tensor]) – list of boxes to be encoded, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tuple[Tensor]

Returns

A tuple of encoded gt, the target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.

encode_single(gt_boxes, proposals)[source]#

Encode proposals with respect to ground truth (gt) boxes.

Parameters
  • gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tensor

Returns

encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.

monai.apps.detection.utils.box_coder.encode_boxes(gt_boxes, proposals, weights)[source]#

Encode a set of proposals with respect to some reference ground truth (gt) boxes.

Parameters
  • gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • weights (Tensor) – the weights for (cx, cy, w, h) or (cx,cy,cz, w,h,d)

Return type

Tensor

Returns

encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
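
For reference, the classic center/size parameterization that this kind of box-regression encoding is based on can be written out by hand for the 2D case; this is a standalone sketch of that formula, not a call into the module:

import torch

def encode_boxes_2d_sketch(gt_boxes, proposals, weights):
    # gt_boxes, proposals: (N, 4) in [xmin, ymin, xmax, ymax]; weights: (wx, wy, ww, wh)
    wx, wy, ww, wh = weights
    pw, ph = proposals[:, 2] - proposals[:, 0], proposals[:, 3] - proposals[:, 1]
    px, py = proposals[:, 0] + 0.5 * pw, proposals[:, 1] + 0.5 * ph
    gw, gh = gt_boxes[:, 2] - gt_boxes[:, 0], gt_boxes[:, 3] - gt_boxes[:, 1]
    gx, gy = gt_boxes[:, 0] + 0.5 * gw, gt_boxes[:, 1] + 0.5 * gh
    dx = wx * (gx - px) / pw            # normalized center offsets
    dy = wy * (gy - py) / ph
    dw = ww * torch.log(gw / pw)        # log-space size ratios
    dh = wh * torch.log(gh / ph)
    return torch.stack([dx, dy, dw, dh], dim=1)

gt = torch.tensor([[2.0, 2.0, 10.0, 10.0]])
proposal = torch.tensor([[1.0, 3.0, 9.0, 11.0]])
print(encode_boxes_2d_sketch(gt, proposal, (1.0, 1.0, 1.0, 1.0)))   # -> [[0.125, -0.125, 0., 0.]]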

Detection Utilities#

monai.apps.detection.utils.detector_utils.check_input_images(input_images, spatial_dims)[source]#

Validate the input dimensionality (raise a ValueError if invalid).

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

Return type

None

monai.apps.detection.utils.detector_utils.check_training_targets(input_images, targets, spatial_dims, target_label_key, target_box_key)[source]#

Validate the input images/targets during training (raise a ValueError if invalid).

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • targets (Optional[List[Dict[str, Tensor]]]) – a list of dict. Each dict with two keys: target_box_key and target_label_key, ground-truth boxes present in the image.

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

  • target_label_key (str) – the expected key of target labels.

  • target_box_key (str) – the expected key of target boxes.

Return type

None

monai.apps.detection.utils.detector_utils.pad_images(input_images, spatial_dims, size_divisible, mode=PytorchPadMode.CONSTANT, **kwargs)[source]#

Pad the input images, so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • spatial_dims (int) – number of spatial dimensions of the images, 2D or 3D.

  • size_divisible (Union[int, Sequence[int]]) – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.

  • mode (Union[PytorchPadMode, str]) – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • kwargs – other arguments for torch.pad function.

Return type

Tuple[Tensor, List[List[int]]]

Returns

  • images, a (B, C, H, W) or (B, C, H, W, D) Tensor

  • image_sizes, the original spatial size of each image
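
A usage sketch with two differently sized synthetic 3D images; the expected output shape in the comment assumes each spatial dimension is padded up to the next multiple of size_divisible:

import torch
from monai.apps.detection.utils.detector_utils import pad_images

image_list = [torch.zeros(1, 30, 41, 12), torch.zeros(1, 28, 40, 17)]
padded, image_sizes = pad_images(image_list, spatial_dims=3, size_divisible=16)
print(padded.shape)    # expected torch.Size([2, 1, 32, 48, 32])
print(image_sizes)     # original spatial sizes: [[30, 41, 12], [28, 40, 17]]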

monai.apps.detection.utils.detector_utils.preprocess_images(input_images, spatial_dims, size_divisible, mode=PytorchPadMode.CONSTANT, **kwargs)[source]#

Preprocess the input images, including

  • validate the inputs

  • pad the inputs so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

  • size_divisible (Union[int, Sequence[int]]) – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.

  • mode (Union[PytorchPadMode, str]) – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • kwargs – other arguments for torch.pad function.

Return type

Tuple[Tensor, List[List[int]]]

Returns

  • images, a (B, C, H, W) or (B, C, H, W, D) Tensor

  • image_sizes, the original spatial size of each image

monai.apps.detection.utils.predict_utils.check_dict_values_same_length(head_outputs, keys=None)[source]#

We expect the values in head_outputs: Dict[str, List[Tensor]] to have the same length. Will raise ValueError if not.

Parameters
  • head_outputs (Dict[str, List[Tensor]]) – a Dict[str, List[Tensor]] or Dict[str, Tensor]

  • keys (Optional[List[str]]) – the keys in head_output that need to have values (List) with same length. If not provided, will use head_outputs.keys().

Return type

None
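
A small sketch: both values below are lists of length two, so the check passes silently; mismatched lengths would raise a ValueError.

import torch
from monai.apps.detection.utils.predict_utils import check_dict_values_same_length

head_outputs = {
    "cls": [torch.randn(3, 4), torch.randn(5, 4)],
    "box_reg": [torch.randn(3, 6), torch.randn(5, 6)],
}
check_dict_values_same_length(head_outputs)   # no error: both lists have length 2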

monai.apps.detection.utils.predict_utils.ensure_dict_value_to_list_(head_outputs, keys=None)[source]#

An in-place function. We expect head_outputs to be Dict[str, List[Tensor]]; if it is Dict[str, Tensor], this function converts it to Dict[str, List[Tensor]] in place.

Parameters
  • head_outputs (Dict[str, List[Tensor]]) – a Dict[str, List[Tensor]] or Dict[str, Tensor], will be modified in-place

  • keys (Optional[List[str]]) – the keys in head_output that need to have value type List[Tensor]. If not provided, will use head_outputs.keys().

Return type

None

monai.apps.detection.utils.predict_utils.predict_with_inferer(images, network, keys, inferer=None)[source]#

Predict the network dict output with an inferer. Compared with directly calling network(images), it enables a sliding window inferer that can be used to handle large inputs.

Parameters
  • images (Tensor) – input of the network, Tensor sized (B, C, H, W) or (B, C, H, W, D)

  • network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].

  • keys (List[str]) – the keys in the output dict, should be network output keys or a subset of them.

  • inferer (Optional[SlidingWindowInferer]) – a SlidingWindowInferer to handle large inputs.

Return type

Dict[str, List[Tensor]]

Returns

The predicted head_output from network, a Dict[str, List[Tensor]]

Example

# define a naive network
import torch
import monai
class NaiveNet(torch.nn.Module):
    def __init__(self, ):
        super().__init__()

    def forward(self, images: torch.Tensor):
        return {"cls": torch.randn(images.shape), "box_reg": [torch.randn(images.shape)]}

# create a predictor
network = NaiveNet()
inferer = monai.inferers.SlidingWindowInferer(
    roi_size = (128, 128, 128),
    overlap = 0.25,
    cache_roi_weight_map = True,
)
network_output_keys=["cls", "box_reg"]
images = torch.randn((2, 3, 512, 512, 512))  # a large input
head_outputs = predict_with_inferer(images, network, network_output_keys, inferer)

Inference box selector#

Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py

class monai.apps.detection.utils.box_selector.BoxSelector(box_overlap_metric=<function box_iou>, apply_sigmoid=True, score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300)[source]#

Box selector which selects the predicted boxes. The box selection is performed with the following steps:

  1. For each level, discard boxes with scores less than self.score_thresh.

  2. For each level, keep boxes with top self.topk_candidates_per_level scores.

  3. For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.

  4. For the whole image, keep boxes with top self.detections_per_img scores.

Parameters
  • apply_sigmoid (bool) – whether to apply sigmoid to get scores from classification logits

  • score_thresh (float) – no box with scores less than score_thresh will be kept

  • topk_candidates_per_level (int) – max number of boxes to keep for each level

  • nms_thresh (float) – box overlapping threshold for NMS

  • detections_per_img (int) – max number of boxes to keep for each image

Example

input_param = {
    "apply_sigmoid": True,
    "score_thresh": 0.1,
    "topk_candidates_per_level": 2,
    "nms_thresh": 0.1,
    "detections_per_img": 5,
}
box_selector = BoxSelector(**input_param)
boxes = [torch.randn([3,6]), torch.randn([7,6])]
logits = [torch.randn([3,3]), torch.randn([7,3])]
spatial_size = (8,8,8)
selected_boxes, selected_scores, selected_labels = box_selector.select_boxes_per_image(
    boxes, logits, spatial_size
)
select_boxes_per_image(boxes_list, logits_list, spatial_size)[source]#

Postprocessing to generate detection result from classification logits and boxes.

The box selection is performed with the following steps:

  1. For each level, discard boxes with scores less than self.score_thresh.

  2. For each level, keep boxes with top self.topk_candidates_per_level scores.

  3. For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.

  4. For the whole image, keep boxes with top self.detections_per_img scores.

Parameters
  • boxes_list (List[Tensor]) – list of predicted boxes from a single image, each element i is a Tensor sized (N_i, 2*spatial_dims)

  • logits_list (List[Tensor]) – list of predicted classification logits from a single image, each element i is a Tensor sized (N_i, num_classes)

  • spatial_size (Union[List[int], Tuple[int]]) – spatial size of the image

Return type

Tuple[Tensor, Tensor, Tensor]

Returns

  • selected boxes, Tensor sized (P, 2*spatial_dims)

  • selected_scores, Tensor sized (P, )

  • selected_labels, Tensor sized (P, )

select_top_score_idx_per_level(logits)[source]#

Select indices with highest scores.

The indices selection is performed with the following steps:

  1. If self.apply_sigmoid, get scores by applying sigmoid to logits. Otherwise, use logits as scores.

  2. Discard indices with scores less than self.score_thresh

  3. Keep indices with top self.topk_candidates_per_level scores

Parameters

logits (Tensor) – predicted classification logits, Tensor sized (N, num_classes)

Returns

  • topk_idxs: selected M indices, Tensor sized (M, )

  • selected_scores: selected M scores, Tensor sized (M, )

  • selected_labels: selected M labels, Tensor sized (M, )

Detection metrics#

This script is almost the same as https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/evaluator/detection/coco.py. The changes include 1) code reformatting, 2) docstrings.

This script is almost the same as https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/evaluator/detection/matching.py. The changes include 1) code reformatting, 2) docstrings, 3) allowing the input arg gt_ignore to be optional (if so, no GT boxes will be ignored).

monai.apps.detection.metrics.matching.matching_batch(iou_fn, iou_thresholds, pred_boxes, pred_classes, pred_scores, gt_boxes, gt_classes, gt_ignore=None, max_detections=100)[source]#

Match boxes of a batch to corresponding ground truth for each category independently.

Parameters
  • iou_fn (Callable[[ndarray, ndarray], ndarray]) – compute overlap for each pair

  • iou_thresholds (Sequence[float]) – defined which IoU thresholds should be evaluated

  • pred_boxes (Sequence[ndarray]) – predicted boxes from single batch; List[[D, dim * 2]], D number of predictions

  • pred_classes (Sequence[ndarray]) – predicted classes from a single batch; List[[D]], D number of predictions

  • pred_scores (Sequence[ndarray]) – predicted score for each bounding box; List[[D]], D number of predictions

  • gt_boxes (Sequence[ndarray]) – ground truth boxes; List[[G, dim * 2]], G number of ground truth

  • gt_classes (Sequence[ndarray]) – ground truth classes; List[[G]], G number of ground truth

  • gt_ignore (Union[Sequence[Sequence[bool]], Sequence[ndarray], None]) – specifies which ground truth boxes should not be counted as true positives (detections which match these boxes are not counted as false positives either); List[[G]], G number of ground truth. If not given, all gt_boxes are used.

  • max_detections (int) – maximum number of detections which should be evaluated

Return type

List[Dict[int, Dict[str, ndarray]]]

Returns

List[Dict[int, Dict[str, np.ndarray]]], each Dict[str, np.ndarray] corresponds to an image. Dict has the following keys.

  • dtMatches: matched detections [T, D], where T = number of thresholds, D = number of detections

  • gtMatches: matched ground truth boxes [T, G], where T = number of thresholds, G = number of ground truth

  • dtScores: prediction scores [D], the score of each detection

  • gtIgnore: ignore flags [G], indicating which ground truth boxes should be ignored

  • dtIgnore: ignore flags [T, D], indicating which detections should be ignored

Example

import torch

from monai.data.box_utils import box_iou
from monai.apps.detection.metrics.coco import COCOMetric
from monai.apps.detection.metrics.matching import matching_batch

# 3D example outputs of one image from detector
val_outputs_all = [
    {
        "boxes": torch.tensor([[1, 1, 1, 3, 4, 5]], dtype=torch.float16),
        "labels": torch.randint(3, (1,)),
        "scores": torch.randn((1,)).absolute(),
    },
]
val_targets_all = [
    {
        "boxes": torch.tensor([[1, 1, 1, 2, 6, 4]], dtype=torch.float16),
        "labels": torch.randint(3, (1,)),
    },
]

coco_metric = COCOMetric(
    classes=['c0','c1','c2'], iou_list=[0.1], max_detection=[10]
)
results_metric = matching_batch(
    iou_fn=box_iou,
    iou_thresholds=coco_metric.iou_thresholds,
    pred_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_outputs_all],
    pred_classes=[val_data_i["labels"].numpy() for val_data_i in val_outputs_all],
    pred_scores=[val_data_i["scores"].numpy() for val_data_i in val_outputs_all],
    gt_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_targets_all],
    gt_classes=[val_data_i["labels"].numpy() for val_data_i in val_targets_all],
)
val_metric_dict = coco_metric(results_metric)
print(val_metric_dict)

Reconstruction#

ConvertToTensorComplex#

monai.apps.reconstruction.complex_utils.convert_to_tensor_complex(data, dtype=None, device=None, wrap_sequence=True, track_meta=False)[source]#

Convert complex-valued data to a 2-channel PyTorch tensor. The real and imaginary parts are stacked along the last dimension. This function relies on monai.utils.type_conversion.convert_to_tensor.

Parameters
  • data – input data; can be a PyTorch Tensor, numpy array, list, int, or float. Tensor, numpy array, float, int, and bool inputs are converted to Tensor; strings and objects keep their original form. For a list, every item is converted to a Tensor if applicable.

  • dtype (Optional[dtype]) – target data type when converting to Tensor.

  • device (Optional[device]) – target device to put the converted Tensor data.

  • wrap_sequence (bool) – if False, then lists will recursively call this function. E.g., [1, 2] -> [tensor(1), tensor(2)]. If True, then [1, 2] -> tensor([1, 2]).

  • track_meta (bool) – whether to track the meta information, if True, will convert to MetaTensor. default to False.

Return type

Tensor

Returns

PyTorch version of the data

Example

import numpy as np

from monai.apps.reconstruction.complex_utils import convert_to_tensor_complex

data = np.array([[1 + 1j, 1 - 1j], [2 + 2j, 2 - 2j]])
# the following line prints (2, 2)
print(data.shape)
# the following line prints torch.Size([2, 2, 2])
print(convert_to_tensor_complex(data).shape)

ComplexAbs#

monai.apps.reconstruction.complex_utils.complex_abs(x)[source]#

Compute the absolute value of a complex array.

Parameters

x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type

Union[ndarray, Tensor]

Returns

Absolute value along the last dimension

Example

import numpy as np

from monai.apps.reconstruction.complex_utils import complex_abs

x = np.array([3, 4])[np.newaxis]
# the following line prints 5
print(complex_abs(x))

RootSumOfSquares#

monai.apps.reconstruction.mri_utils.root_sum_of_squares(x, spatial_dim)[source]#

Compute the root sum of squares (rss) of the data (typically done for multi-coil MRI samples)

Parameters
  • x (Union[ndarray, Tensor]) – Input array/tensor

  • spatial_dim (int) – dimension along which rss is applied

Return type

Union[ndarray, Tensor]

Returns

rss of x along spatial_dim

Example

import numpy as np

from monai.apps.reconstruction.mri_utils import root_sum_of_squares

x = np.ones([2, 3])
# the following line prints array([1.41421356, 1.41421356, 1.41421356])
print(root_sum_of_squares(x, spatial_dim=0))

ComplexMul#

monai.apps.reconstruction.complex_utils.complex_mul(x, y)[source]#

Compute complex-valued multiplication. Supports N-dim inputs with the last dim equal to 2 (real/imaginary channels).

Parameters
  • x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

  • y (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type

Union[ndarray, Tensor]

Returns

Complex multiplication of x and y

Example

import numpy as np

from monai.apps.reconstruction.complex_utils import complex_mul

x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 1], [1, 1]])
# the following line prints array([[-1,  3], [-1,  7]])
print(complex_mul(x, y))

ComplexConj#

monai.apps.reconstruction.complex_utils.complex_conj(x)[source]#

Compute the complex conjugate of an array/tensor. Supports N-dim inputs with the last dim equal to 2 (real/imaginary channels).

Parameters

x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type

Union[ndarray, Tensor]

Returns

Complex conjugate of x

Example

import numpy as np

from monai.apps.reconstruction.complex_utils import complex_conj

x = np.array([[1, 2], [3, 4]])
# the following line prints array([[ 1, -2], [ 3, -4]])
print(complex_conj(x))

auto3dseg#

class monai.apps.auto3dseg.AlgoEnsemble[source]#

The base class of Ensemble methods

ensemble_pred(preds, sigmoid=True)[source]#

Ensemble the results using either the “mean” or “vote” method (a sketch of both follows this entry).

Parameters
  • preds – a list of probability predictions in Tensor-like format.

  • sigmoid – use the sigmoid function to threshold probability one-hot map.

Returns

a tensor which is the ensembled prediction.
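
For intuition, here is a minimal, hedged sketch of what “mean” and “vote” ensembling can look like for a list of channel-first per-class probability maps; the channel layout, the 0.5 threshold, and the helper names are illustrative assumptions, not the exact MONAI implementation.

import torch

def ensemble_mean(preds, sigmoid=True):
    # average the probability maps across models, then binarize (sigmoid) or take argmax
    stacked = torch.stack([torch.as_tensor(p, dtype=torch.float32) for p in preds])
    prob = stacked.mean(dim=0)
    return (prob > 0.5).float() if sigmoid else prob.argmax(dim=0, keepdim=True)

def ensemble_vote(preds):
    # per-voxel majority vote over the discrete label maps predicted by each model
    labels = torch.stack([torch.as_tensor(p).argmax(dim=0) for p in preds])
    return labels.mode(dim=0).values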

get_algo(identifier)[source]#

Get a model by identifier.

Parameters

identifier – the name of the bundleAlgo

get_algo_ensemble()[source]#

Get the algo ensemble after ranking, or an empty list if ranking was not started.

Returns

A list of Algo

set_algos(infer_algos)[source]#

Register models in the ensemble.

set_infer_files(dataroot, data_list_file_path, data_key='testing')[source]#

Set the files to perform model inference.

Parameters
  • dataroot (str) – the path of the files

  • data_list_file_path – path to the data list file

  • data_key – the key in the data list file that points to the inference data, default to “testing”

class monai.apps.auto3dseg.AlgoEnsembleBestByFold(n_fold=5)[source]#

Ensemble method that selects the best model from each cross-validation fold.

Parameters

n_fold (int) – number of cross-validation folds used in training

collect_algos()[source]#

Rank the algos by finding the best model in each cross-validation fold

class monai.apps.auto3dseg.AlgoEnsembleBestN(n_best=5)[source]#

Ensemble method that selects the top N models out of all candidates using the models’ best_metric scores (a ranking sketch follows this entry).

Parameters

n_best (int) – number of models to pick for ensemble (N).

collect_algos(n_best=-1)[source]#

Rank the algos by finding the top N (n_best) validation scores.

sort_score()[source]#

Sort the best_metrics
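
A hedged, minimal sketch of the ranking idea behind AlgoEnsembleBestN: sort the candidates by their best_metric and keep the top N. The record structure (a dict with a “best_metric” key) and the helper name are purely illustrative.

def top_n_by_metric(algo_records, n_best):
    # sort descending by validation metric and keep the N best entries
    ranked = sorted(algo_records, key=lambda r: r.get("best_metric", float("-inf")), reverse=True)
    return ranked[:n_best]

# usage with illustrative records
records = [{"name": "segresnet_0", "best_metric": 0.82}, {"name": "dints_0", "best_metric": 0.79}]
print(top_n_by_metric(records, n_best=1))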

class monai.apps.auto3dseg.AlgoEnsembleBuilder(history, data_src_cfg_filename=None)[source]#

Build ensemble workflow from configs and arguments.

Parameters
  • history (Sequence[Dict]) – a collection of trained bundleAlgo algorithms.

  • data_src_cfg_filename (Optional[str]) – filename of the data source.

Examples

builder = AlgoEnsembleBuilder(history, data_src_cfg)
builder.set_ensemble_method(AlgoEnsembleBestN(n_best=3))
ensemble = builder.get_ensemble()

result = ensemble.predict()
add_inferer(identifier, gen_algo, best_metric=None)[source]#

Add model inferer to the builder.

Parameters
  • identifier (str) – name of the bundleAlgo.

  • gen_algo (BundleAlgo) – a trained BundleAlgo model object.

  • best_metric (Optional[float]) – the best metric in validation of the trained model.

get_ensemble()[source]#

Get the ensemble

set_ensemble_method(ensemble, *args, **kwargs)[source]#

Set the ensemble method.

Parameters

ensemble (AlgoEnsemble) – the AlgoEnsemble to build.

class monai.apps.auto3dseg.AutoRunner(work_dir='./work_dir', input=None, analyze=True, algo_gen=True, train=True, hpo=False, hpo_backend='nni', ensemble=True, not_use_cache=False, **kwargs)[source]#

An interface for handling Auto3Dseg with minimal inputs and understanding of the internal states in Auto3Dseg. Users can run Auto3Dseg with default settings in one line of code. They can also customize the advanced features of Auto3Dseg in a few additional lines. Examples of customization include:

  • change cross-validation folds

  • change training/prediction parameters

  • change ensemble methods

  • automatic hyperparameter optimization.

The output of the interface is a directory that contains

  • data statistics analysis report

  • algorithm definition files (scripts, configs, pickle objects) and training results (checkpoints, accuracies)

  • the predictions on the testing datasets from the final algorithm ensemble

  • a copy of the input arguments in the form of YAML

  • cached intermediate results

Parameters
  • work_dir (str) – working directory to save the intermediate and final results.

  • input (Union[Dict[str, Any], str, None]) – the configuration dictionary or the file path to the configuration in the form of YAML. The configuration should contain datalist, dataroot, modality, multigpu, and class_names info.

  • analyze (bool) – on/off switch to run DataAnalyzer and generate a datastats report. If it is set to False, the AutoRunner will attempt to skip the datastats analysis and use cached results. If there is no such cache, the AutoRunner will report an error and stop.

  • algo_gen (bool) – on/off switch to run AlgoGen and generate templated BundleAlgos. If it is set to False, the AutoRunner will attempt to skip the algorithm generation and stop if there is no cache to load.

  • train (bool) – on/off switch to run training and generate algorithm checkpoints. If it is set to False, the AutoRunner will attempt to skip training for all algorithms. If there are zero trained algorithms but train is set to False, the AutoRunner will stop.

  • hpo (bool) – use hyperparameter optimization (HPO) in the training phase. Users can provide a list of hyper-parameters, and a search will be performed to investigate the algorithm performance.

  • hpo_backend (str) – a string that indicates the backend of the HPO. Currently, only NNI Grid-search mode is supported

  • ensemble (bool) – on/off switch to run model ensemble and use the ensemble to predict outputs in testing datasets.

  • not_use_cache (bool) – if the value is True, it will ignore all cached results in data analysis, algorithm generation, or training, and start the pipeline from scratch.

  • kwargs – image writing parameters for the ensemble inference. The kwargs format follows the SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage.

Examples

  • User can use the one-liner to start the Auto3Dseg workflow

python -m monai.apps.auto3dseg AutoRunner run --input '{"modality": "ct", "datalist": "dl.json", "dataroot": "/dr", "multigpu": true, "class_names": ["A", "B"]}'
  • User can also save the input dictionary as an input YAML file and use the following one-liner

python -m monai.apps.auto3dseg AutoRunner run --input=./input.yaml
  • User can specify work_dir and data source config input and run AutoRunner:

work_dir = "./work_dir"
input = "path_to_yaml_data_cfg"
runner = AutoRunner(work_dir=work_dir, input=input)
runner.run()
  • User can specify training parameters by:

input = "path_to_yaml_data_cfg"
runner = AutoRunner(input=input)
train_param = {
    "CUDA_VISIBLE_DEVICES": [0],
    "num_iterations": 8,
    "num_iterations_per_validation": 4,
    "num_images_per_batch": 2,
    "num_epochs": 2,
}
runner.set_training_params(params=train_param)  # 2 epochs
runner.run()
  • User can specify the number of cross-validation folds

input = "path_to_yaml_data_cfg"
runner = AutoRunner(input=input)
runner.set_num_fold(num_fold=2)
runner.run()
  • User can specify the prediction parameters during algo ensemble inference:

input = "path_to_yaml_data_cfg"
pred_params = {
    'files_slices': slice(0,2),
    'mode': "vote",
    'sigmoid': True,
}
runner = AutoRunner(input=input)
runner.set_prediction_params(params=pred_params)
runner.run()
  • User can define a grid search space and use the HPO during training.

input = "path_to_yaml_data_cfg"
train_param = {
    "CUDA_VISIBLE_DEVICES": [0],
    "num_iterations": 8,
    "num_iterations_per_validation": 4,
    "num_images_per_batch": 2,
    "num_epochs": 2,
}
runner = AutoRunner(input=input, hpo=True)
runner.set_training_params(params=train_param)
runner.set_nni_search_space({"learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}})
runner.run()

Notes

Expected results in the work_dir as below:

work_dir/
├── algorithm_templates # bundle algo templates (scripts/configs)
├── cache.yaml          # AutoRunner will automatically cache results to save time
├── datastats.yaml      # datastats of the dataset
├── dints_0             # network scripts/configs/checkpoints and pickle object of the algo
├── ensemble_output     # the prediction of testing datasets from the ensemble of the algos
├── input.yaml          # copy of the input data source configs
├── segresnet_0         # network scripts/configs/checkpoints and pickle object of the algo
├── segresnet2d_0       # network scripts/configs/checkpoints and pickle object of the algo
└── swinunetr_0         # network scripts/configs/checkpoints and pickle object of the algo
check_algo_gen(algo_gen)[source]#

Check if the AutoRunner can skip AlgoGen/BundleGen.

check_analyze(analyze)[source]#

Check if the AutoRunner can skip data analysis.

check_cache()[source]#

Check if the intermediate result is cached after each step in the current working directory

Returns

a dict of cache results. If not_use_cache is set to True, or there is no cache file in the working directory, the result will be empty_cache in which all has_cache keys are set to False.

check_train(train)[source]#

Check if the AutoRunner can skip training.

export_cache()[source]#

Save the cache state as cache.yaml in the working directory

run()[source]#

Run the AutoRunner pipeline

set_ensemble_method(ensemble_method_name='AlgoEnsembleBestN', **kwargs)[source]#

Set the bundle ensemble method

Parameters
  • ensemble_method_name (str) – the name of the ensemble method. Only two methods are supported: “AlgoEnsembleBestN” and “AlgoEnsembleBestByFold”.

  • kwargs – the keyword arguments used to define the ensemble method. Currently only n_best for AlgoEnsembleBestN is supported.

set_hpo_params(params=None)[source]#

Set parameters for the HPO module and the algos before training (a usage example follows this entry). It will attempt to (1) override bundle templates with the key-value pairs in params and (2) change the config of the HPO module (e.g. NNI) if the key is found to be one of:

  • “trialCodeDirectory”

  • “trialGpuNumber”

  • “trialConcurrency”

  • “maxTrialNumber”

  • “maxExperimentDuration”

  • “tuner”

  • “trainingService”

Parameters

params (Optional[Dict[str, Any]]) – a dict that defines the overriding key-value pairs during instantiation of the algo. For BundleAlgo, it will override the template config filling.
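
A hedged usage example: the HPO-config keys below come from the list above, while num_epochs is an illustrative algo override borrowed from the training examples earlier on this page.

runner = AutoRunner(input="path_to_yaml_data_cfg", hpo=True)
# keys recognized by the HPO module (e.g. maxTrialNumber, trialGpuNumber) adjust the NNI config;
# other keys (e.g. num_epochs) override the bundle templates
runner.set_hpo_params(params={"maxTrialNumber": 20, "trialGpuNumber": 1, "num_epochs": 2})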

set_image_save_transform(kwargs)[source]#

Set the ensemble output transform.

Parameters

kwargs – image writing parameters for the ensemble inference. The kwargs format follows SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage .
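
A hedged example of calling it with SaveImage-style keys (output_dir, output_postfix, and output_ext are standard SaveImage parameters; the values here are illustrative):

runner = AutoRunner(input="path_to_yaml_data_cfg")
# the dict follows the SaveImage transform parameters linked above
runner.set_image_save_transform({"output_dir": "./ensemble_output", "output_postfix": "seg", "output_ext": ".nii.gz"})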

set_nni_search_space(search_space)[source]#

Set the search space for NNI parameter search.

Parameters

search_space – hyper parameter search space in the form of dict. For more information, please check NNI documentation: https://nni.readthedocs.io/en/v2.2/Tutorial/SearchSpaceSpec.html .

set_num_fold(num_fold=5)[source]#

Set the number of cross validation folds for all algos.

Parameters

num_fold (int) – a positive integer to define the number of folds.

set_prediction_params(params=None)[source]#

Set the prediction params for all algos.

Parameters

params (Optional[Dict[str, Any]]) – a dict that defines the overriding key-value pairs during prediction. The overriding method is defined by the algo class.

Examples

For BundleAlgo objects, this set of params tells the algo ensemble to run inference on only the first two files in the testing datalist: {“file_slices”: slice(0, 2)}
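
The hedged code equivalent of the example above (the key name follows the prose here):

runner = AutoRunner(input="path_to_yaml_data_cfg")
runner.set_prediction_params(params={"file_slices": slice(0, 2)})
runner.run()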

set_training_params(params=None)[source]#

Set the training params for all algos.

Parameters

params (Optional[Dict[str, Any]]) – a dict that defines the overriding key-value pairs during training. The overriding method is defined by the algo class.

Examples

For BundleAlgo objects, the training parameters to shorten the training time to a few epochs can be: {“num_iterations”: 8, “num_iterations_per_validation”: 4}
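
And its hedged code equivalent, reusing the keys shown above:

runner = AutoRunner(input="path_to_yaml_data_cfg")
runner.set_training_params(params={"num_iterations": 8, "num_iterations_per_validation": 4})
runner.run()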

class monai.apps.auto3dseg.BundleAlgo(template_path)[source]#

An algorithm represented by a set of bundle configurations and scripts.

BundleAlgo.cfg is a monai.bundle.ConfigParser instance.

from monai.apps.auto3dseg import BundleAlgo

data_stats_yaml = "/workspace/data_stats.yaml"
algo = BundleAlgo(template_path="../algorithms/templates/segresnet2d/configs")
algo.set_data_stats(data_stats_yaml)
# algo.set_data_src("../data_src.json")
algo.export_to_disk(".", algo_name="segresnet2d_1")

This class creates MONAI bundles from a directory of ‘bundle templates’. Different from the regular MONAI bundle format, a bundle template may contain placeholders that must be filled using fill_template_config during export_to_disk. The created bundle keeps the same file structure as the template.

__init__(template_path)[source]#

Create an Algo instance based on the predefined Algo template.

Parameters

template_path (str) – path to the root of the algo template.

export_to_disk(output_path, algo_name, **kwargs)[source]#

Fill the configuration templates, write the bundle (configs + scripts) to folder output_path/algo_name.

Parameters
  • output_path (str) – Path to export the ‘scripts’ and ‘configs’ directories.

  • algo_name (str) – the identifier of the algorithm (usually contains the name and extra info like fold ID).

  • kwargs – other parameters, including: “copy_dirs=True/False”, whether to copy the template as output instead of operating in place; “fill_template=True/False”, whether to fill the placeholders in the template. Other parameters are passed to the fill_template_config function.

fill_template_config(data_stats_filename, algo_path, **kwargs)[source]#

The configuration files defined when constructing this Algo instance might not have complete training and validation pipelines. Some configuration components and hyperparameters of the pipelines depend on the training data and other factors. This API is provided to allow the creation of fully functioning config files. Returns the records of the template-config filling: {“<config name>”: {“<placeholder key>”: value, …}, …}.

Parameters

data_stats_filename (str) – filename of the data stats report (generated by DataAnalyzer)

Notes

Template filling is optional. The user can construct a set of pre-filled configs without replacing values by using the data analysis results. This method is also intended to be re-implemented in subclasses of BundleAlgo if the user wants their own way of auto-configured template filling (a sketch follows this entry).

Return type

dict
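
A hedged sketch of a BundleAlgo subclass that fills template placeholders from the data stats report; the config file name (hyper_parameters.yaml), the placeholder key (patch_size), and the exact nesting of the stats dict are illustrative assumptions, not the actual template contents.

import yaml

from monai.apps.auto3dseg import BundleAlgo

class MyAlgo(BundleAlgo):
    def fill_template_config(self, data_stats_filename, algo_path, **kwargs):
        with open(data_stats_filename) as f:
            stats = yaml.safe_load(f)
        # derive a hyperparameter from the summary image stats (illustrative key path)
        shape = stats.get("stats_summary", {}).get("image_stats", {}).get("shape", [96, 96, 96])
        # return the records of filled placeholders: {"<config name>": {"<placeholder key>": value}}
        return {"hyper_parameters.yaml": {"patch_size": shape}}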

get_inferer(*args, **kwargs)[source]#

Load the InferClass from infer.py. The InferClass should be defined in the template under the path “scripts/infer.py”. It is required to define the “InferClass” (the name is fixed) with at least two functions, __init__ and infer. The __init__ method accepts override kwargs that can optionally be used to override parameters at run time.

Examples:

from typing import Optional, Sequence, Union

import torch

class InferClass:
    def __init__(self, config_file: Optional[Union[str, Sequence[str]]] = None, **override):
        # read configs from config_file (sequence)
        # set up transforms
        # set up model
        # set up other hyper parameters
        return

    @torch.no_grad()
    def infer(self, image_file):
        # infer the model and save the results to output
        return output
get_output_path()[source]#

Returns the algo output paths to find the algo scripts and configs.

get_score(*args, **kwargs)[source]#

Returns validation scores of the model trained by the current Algo.

predict(predict_params=None)[source]#

Use the trained model to predict outputs for given input images. Paths to the input images are given in the params dict in the form {“files”: [“path_to_image_1”, “path_to_image_2”]}. If not specified, the prediction will use the test images predefined in the bundle config.

Parameters

predict_params – a dict to override the parameters in the bundle config (including the files to predict).

set_data_source(data_src_cfg)[source]#

Set the data source configuration file

Parameters

data_src_cfg (str) – path to a configuration file (yaml) that contains datalist, dataroot, and other params. The config will be in the form {“modality”: “ct”, “datalist”: “path_to_json_datalist”, “dataroot”: “path_dir_data”}.

set_data_stats(data_stats_files)[source]#

Set the data analysis report (generated by DataAnalyzer).

Parameters

data_stats_files (str) – path to the datastats yaml file

train(train_params=None)[source]#

Load the run function in the training script of each model. The training parameters are predefined in the algo_config.yaml file, which is pre-filled by the fill_template_config function of the same instance.

Parameters

train_params – to specify the devices using a list of integers: {"CUDA_VISIBLE_DEVICES": [1,2,3]}.

class monai.apps.auto3dseg.BundleGen(algo_path='.', algos=None, data_stats_filename=None, data_src_cfg_name=None)[source]#

This class generates a set of bundles according to the cross-validation folds, each of which can run independently (see the usage sketch after the get_history entry below).

Parameters
  • algo_path (str) – the directory path to save the algorithm templates. Default is the current working dir.

  • algos – if a dictionary, it outlines the algorithms to use. If None, automatically download the zip file from the default link. If a string, it represents the download link. The current default options are released at: https://github.com/Project-MONAI/research-contributions/tree/main/auto3dseg

  • data_stats_filename – the path to the data stats file (generated by DataAnalyzer)

  • data_src_cfg_name – the path to the data source config YAML file. The config will be in the form {“modality”: “ct”, “datalist”: “path_to_json_datalist”, “dataroot”: “path_dir_data”}

python -m monai.apps.auto3dseg BundleGen generate --data_stats_filename="../algorithms/data_stats.yaml"
generate(output_folder='.', num_fold=5)[source]#

Generate the bundle scripts/configs for each bundleAlgo

Parameters
  • output_folder – the output folder to save each algorithm.

  • num_fold (int) – the number of cross validation fold

get_data_src()[source]#

Get the data source filename

get_data_stats()[source]#

Get the filename of the data stats

get_history()[source]#

Get the history of the bundleAlgo objects with their names/identifiers.

Return type

List
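
A hedged end-to-end sketch of BundleGen, assuming a work directory with a datastats report and a data source config already exists at the illustrative paths below.

from monai.apps.auto3dseg import BundleGen

bundle_generator = BundleGen(
    algo_path="./work_dir",                            # where the algorithm templates are saved
    data_stats_filename="./work_dir/datastats.yaml",   # produced by DataAnalyzer
    data_src_cfg_name="./work_dir/input.yaml",         # data source config (modality/datalist/dataroot)
)
bundle_generator.generate(output_folder="./work_dir", num_fold=5)
history = bundle_generator.get_history()               # a list of {name: BundleAlgo} records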

set_data_src(data_src_cfg_filename)[source]#

Set the data source filename

Parameters

data_src_cfg_filename – filename of data_source file

set_data_stats(data_stats_filename)[source]#

Set the data stats filename

Parameters

data_stats_filename (str) – filename of datastats

class monai.apps.auto3dseg.DataAnalyzer(datalist, dataroot='', output_path='./data_stats.yaml', average=True, do_ccp=True, device='cuda', worker=0, image_key='image', label_key='label')[source]#

The DataAnalyzer automatically analyzes a given medical image dataset and reports the statistics. The module expects file paths to the image data and utilizes the LoadImaged transform to read the files, which supports nii, nii.gz, png, jpg, bmp, npz, npy, and dcm formats. Currently, only the segmentation task is supported, so the user needs to provide paths to the image and label files (if available). Also, the label data format is preferred to be (1, H, W, D), with the label index in the first dimension. If it is in one-hot format, it will be converted to the preferred format.

Parameters
  • datalist (Union[str, Dict]) – a Python dictionary storing group, fold, and other information of the medical image dataset, or a string to the JSON file storing the dictionary.

  • dataroot (str) – user’s local directory containing the datasets.

  • output_path (str) – path to save the analysis result.

  • average (bool) – whether to average the statistical value across different image modalities.

  • do_ccp (bool) – apply the connected component algorithm to process the labels/images

  • device (Union[str, device]) – a string specifying hardware (CUDA/CPU) utilized for the operations.

  • worker (int) – number of workers to use for parallel processing. If device is cuda/GPU, worker has to be 0.

  • image_key (str) – a string that the user specifies for the image. The DataAnalyzer will look it up in the datalist to locate the image files of the dataset.

  • label_key (Optional[str]) – a string that the user specifies for the label. The DataAnalyzer will look it up in the datalist to locate the label files of the dataset. If label_key is None, the DataAnalyzer will skip looking for labels and all label-related operations.

Raises

ValueError – if device is GPU and worker > 0.

Examples

from monai.apps.auto3dseg.data_analyzer import DataAnalyzer

datalist = {
    "testing": [{"image": "image_003.nii.gz"}],
    "training": [
        {"fold": 0, "image": "image_001.nii.gz", "label": "label_001.nii.gz"},
        {"fold": 0, "image": "image_002.nii.gz", "label": "label_002.nii.gz"},
        {"fold": 1, "image": "image_001.nii.gz", "label": "label_001.nii.gz"},
        {"fold": 1, "image": "image_004.nii.gz", "label": "label_004.nii.gz"},
    ],
}

dataroot = '/datasets'  # the directory where you have the image files (nii.gz)
analyzer = DataAnalyzer(datalist, dataroot)
datastat = analyzer.get_all_case_stats()

Notes

The module can also be called from the command line interface (CLI).

For example:

python -m monai.apps.auto3dseg \
    DataAnalyzer \
    get_all_case_stats \
    --datalist="my_datalist.json" \
    --dataroot="my_dataroot_dir"
get_all_case_stats()[source]#

Get all case stats. Caller of the DataAnalyzer class. The function iterates over the datalist and calls get_case_stats to generate stats for each case, then get_case_summary is called to combine the results.

Returns

A data statistics dictionary containing:

  • “stats_summary” – summary statistics of the entire dataset. Within stats_summary there are “image_stats” (summarizing info of shape, channel, spacing, etc. using operations_summary), “image_foreground_stats” (info of the intensity for the non-zero labeled voxels), and “label_stats” (info of the labels, pixel percentage, image_intensity, and each individual label in a list).

  • “stats_by_cases” – a list, where each element is the statistics of an image-label pair. Within each element there are: “image” (the path to an image), “label” (the path to the corresponding label), “image_stats” (summarizing info of shape, channel, spacing, etc. using operations), “image_foreground_stats” (similar to the previous one but computed on the foreground image), and “label_stats” (stats of the individual labels).

Notes

Since the backend of the statistics computation is torch/numpy, nan/inf values may be generated and carried over in the computation. In such cases, the output dictionary will include .nan/.inf in the statistics.

class monai.apps.auto3dseg.NNIGen(algo=None, params=None)[source]#

Generate algorithms for NNI to automate hyperparameter tuning. The module has two major interfaces: __init__, which prints out how to set up NNI, and a trialCommand function run_algo for the NNI library to start a trial of the algo. More about the trialCommand function can be found in the trial code section on the NNI webpage https://nni.readthedocs.io/en/latest/tutorials/hpo_quickstart_pytorch/main.html .

Parameters
  • algo – an Algo object (e.g. BundleAlgo) with defined methods get_output_path and train, which supports saving to and loading from pickle files via algo_from_pickle and algo_to_pickle.

  • params – a set of parameters to override the algo if overriding is supported by the Algo subclass.

Examples:

The experiment will keep generating new folders to save the model checkpoints, scripts, and configs if available.
├── algorithm_templates
│   └── unet
├── unet_0
│   ├── algo_object.pkl
│   ├── configs
│   └── scripts
├── unet_0_learning_rate_0.01
│   ├── algo_object.pkl
│   ├── configs
│   ├── model_fold0
│   └── scripts
└── unet_0_learning_rate_0.1
    ├── algo_object.pkl
    ├── configs
    ├── model_fold0
    └── scripts

Notes

The NNIGen will prepare the algorithms in a folder and suggest a command to replace trialCommand in the experiment config. However, NNIGen will not trigger NNI itself; the user needs to write their NNI experiment config and then run the NNI command manually.
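
A hedged usage sketch: load a generated BundleAlgo from the work directory, wrap it in NNIGen, and print the suggested trialCommand. The work_dir path and the num_epochs override are illustrative; as noted above, NNIGen does not launch NNI itself.

from monai.apps.auto3dseg import NNIGen, import_bundle_algo_history

history = import_bundle_algo_history(output_folder="./work_dir", only_trained=False)
name, algo = next(iter(history[0].items()))      # pick one {name: BundleAlgo} record
nni_gen = NNIGen(algo=algo, params={"num_epochs": 2})
nni_gen.print_bundle_algo_instruction()          # prints how to set trialCommand in the NNI config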

generate(output_folder='.')[source]#

Generate the record for each Algo. If it is a BundleAlgo, it will generate the config files.

Parameters

output_folder (str) – the directory NNI will save the results to.

Return type

None

get_hyperparameters()[source]#

Get parameters for the next round of training from the NNI server.

get_obj_filename()[source]#

Return the filename of the dumped pickle algo object.

get_task_id()[source]#

Get the identifier of the current experiment. The identifier lists the searched parameter names and values, connected by underscores, in the file name.

print_bundle_algo_instruction()[source]#

Print how to write the trial commands for Bundle Algo.

run_algo(obj_filename, output_folder='.', template_path=None)[source]#

The python interface for NNI to run.

Parameters
  • obj_filename (str) – the pickle-exported Algo object.

  • output_folder (str) – the root path of the algorithms templates.

  • template_path – the algorithm template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py

Return type

None

set_score(acc)[source]#

Report the acc to NNI server.

update_params(params)[source]#

Translate the parameters from the MONAI bundle to meet the NNI requirements.

Parameters

params (dict) – a dict of parameters.

class monai.apps.auto3dseg.OptunaGen(algo=None, params=None)[source]#

Generate algorithms for Optuna to automate hyperparameter tuning. Please refer to NNI and Optuna (https://optuna.readthedocs.io/en/stable/) for more information. Optuna has a different running scheme compared to NNI: the hyperparameter samples come from a trial object (trial.suggest…) created by Optuna, so OptunaGen needs to accept this trial object as input. Meanwhile, Optuna calls OptunaGen, thus OptunaGen.__call__() should return the accuracy. Use functools.partial to wrap OptunaGen for additional input arguments.

Parameters
  • algo – an Algo object (e.g. BundleAlgo). The object must at least define two methods, get_output_path and train, and support saving to and loading from pickle files via algo_from_pickle and algo_to_pickle.

  • params – a set of parameters to override the algo if overriding is supported by the Algo subclass.

Examples:

The experiment will keep generating new folders to save the model checkpoints, scripts, and configs if available.
├── algorithm_templates
│   └── unet
├── unet_0
│   ├── algo_object.pkl
│   ├── configs
│   └── scripts
├── unet_0_learning_rate_0.01
│   ├── algo_object.pkl
│   ├── configs
│   ├── model_fold0
│   └── scripts
└── unet_0_learning_rate_0.1
    ├── algo_object.pkl
    ├── configs
    ├── model_fold0
    └── scripts

Notes

Different from NNI and NNIGen, OptunaGen and Optuna can be run within the Python process.
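
A hedged sketch of driving OptunaGen with an Optuna study, following the description above: Optuna calls OptunaGen with a trial object, and functools.partial supplies the extra arguments. The keyword names passed through partial (obj_filename, output_folder), the paths, and the num_epochs override are assumptions for illustration.

from functools import partial

import optuna

from monai.apps.auto3dseg import OptunaGen, import_bundle_algo_history

history = import_bundle_algo_history(output_folder="./work_dir", only_trained=False)
_, algo = next(iter(history[0].items()))             # pick one generated BundleAlgo
optuna_gen = OptunaGen(algo=algo, params={"num_epochs": 2})
objective = partial(optuna_gen, obj_filename=optuna_gen.get_obj_filename(), output_folder="./work_dir")
study = optuna.create_study(direction="maximize")    # OptunaGen.__call__() returns the accuracy
study.optimize(objective, n_trials=2)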

generate(output_folder='.')[source]#

Generate the record for each Algo. If it is a BundleAlgo, it will generate the config files.

Parameters

output_folder (str) – the directory to save the results to.

Return type

None

get_hyperparameters()[source]#

Get parameters for the next round of training from the Optuna trial object. This function requires the user to rewrite it for different search spaces.

get_obj_filename()[source]#

Return the dumped pickle object of algo.

get_task_id()[source]#

Get the identifier of the current experiment. The identifier lists the searched parameter names and values, connected by underscores, in the file name.

run_algo(obj_filename, output_folder='.', template_path=None)[source]#

The Python interface for Optuna to run.

Parameters
  • obj_filename (str) – the pickle-exported Algo object.

  • output_folder (str) – the root path of the algorithms templates.

  • template_path – the algorithm template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py

Return type

None

set_score(acc)[source]#

Set the accuracy score

set_trial(trial)[source]#

Set the Optuna trial

update_params(params)[source]#

Translate the parameters from the MONAI bundle.

Parameters

params (dict) – a dict of parameters.

monai.apps.auto3dseg.export_bundle_algo_history(history)[source]#

Save all the BundleAlgo objects in the history to algo_object.pkl in each individual algorithm folder.

Parameters

history (List[Dict[str, BundleAlgo]]) – a list of BundleAlgo records. Typically, the history can be obtained from the BundleGen get_history method.

monai.apps.auto3dseg.import_bundle_algo_history(output_folder='.', template_path=None, only_trained=True)[source]#

Import the history of the bundleAlgo objects with their names/identifiers.

Parameters
  • output_folder (str) – the root path of the algorithms templates.

  • template_path (Optional[str]) – the algorithm template. It must contain algo.py in the following path: {algorithm_templates_dir}/{network}/scripts/algo.py.

  • only_trained (bool) – only read the algo history if the algo is trained.

Return type

List
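
A hedged usage sketch of the two helpers above, reusing the BundleGen setup sketched earlier; the work_dir path is illustrative.

from monai.apps.auto3dseg import BundleGen, export_bundle_algo_history, import_bundle_algo_history

bundle_generator = BundleGen(
    algo_path="./work_dir",
    data_stats_filename="./work_dir/datastats.yaml",
    data_src_cfg_name="./work_dir/input.yaml",
)
bundle_generator.generate(output_folder="./work_dir", num_fold=5)

# write algo_object.pkl into each algorithm folder so the algos can be reloaded later
export_bundle_algo_history(bundle_generator.get_history())

# reload only the algorithms that have finished training
trained_history = import_bundle_algo_history(output_folder="./work_dir", only_trained=True)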