Applications#

Datasets#

class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#

The Dataset to automatically download MedNIST data and generate items for training, validation or test. It’s based on CacheDataset to accelerate the training process.

Parameters
  • root_dir (Union[str, PathLike]) – target directory to download and load MedNIST dataset.

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data.

  • download (bool) – whether to download and extract the MedNIST dataset from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.

  • seed (int) – random seed to randomly split training, validation and test datasets, default is 0.

  • val_frac (float) – fraction of the whole dataset used for validation, default is 0.1.

  • test_frac (float) – fraction of the whole dataset used for testing, default is 0.1.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) is used.

  • cache_rate (float) – fraction of the total data to cache, default is 1.0 (cache all). The minimum of (cache_num, data_length x cache_rate, data_length) is used.

  • num_workers (Optional[int]) – the number of worker threads to use. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when downloading dataset and computing the transform cache content.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default is True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent operations.

Raises
  • ValueError – When root_dir is not a directory.

  • RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
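
The cache_num, cache_rate and num_workers rules described in the parameter list can be sketched in plain Python. This is only an illustration of the documented arithmetic, not MONAI's implementation; resolve_cache_num and resolve_num_workers are hypothetical helper names introduced for this example.

```python
import os
import sys


def resolve_cache_num(data_length, cache_num=sys.maxsize, cache_rate=1.0):
    """Number of items cached: min(cache_num, data_length x cache_rate, data_length)."""
    return min(int(cache_num), int(data_length * cache_rate), data_length)


def resolve_num_workers(num_workers=1):
    """None falls back to os.cpu_count(); values below 1 are raised to 1."""
    if num_workers is None:
        # os.cpu_count() can itself return None, so fall back to 1 in that case.
        return os.cpu_count() or 1
    return max(int(num_workers), 1)


print(resolve_cache_num(1000))                                 # 1000 (defaults cache everything)
print(resolve_cache_num(1000, cache_rate=0.25))                # 250
print(resolve_cache_num(1000, cache_num=100, cache_rate=0.5))  # 100
print(resolve_num_workers(0))                                  # 1
```

With the defaults (cache_num=sys.maxsize, cache_rate=1.0) the whole dataset is cached, so lowering either argument is the way to bound memory use.
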

get_num_classes()[source]#

Get number of classes.

Return type

int

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#

The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation or test. It also loads the dataset properties from the JSON config file of the dataset. Users can call get_properties() to get specified properties or all the loaded properties. It’s based on monai.data.CacheDataset to accelerate the training process.

Parameters
  • root_dir (Union[str, PathLike]) – user’s local directory for caching and loading the MSD datasets.

  • task (str) – which task to download and execute: one of list (“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”).

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D].

  • download (bool) – whether to download and extract the Decathlon dataset from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the tar file or the dataset folder to the root directory.

  • val_frac (float) – fraction of the whole dataset used for validation, default is 0.2.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: set the same seed for the training and validation sections.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) is used.

  • cache_rate (float) – fraction of the total data to cache, default is 1.0 (cache all). The minimum of (cache_num, data_length x cache_rate, data_length) is used.

  • num_workers (int) – the number of worker threads to use. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when downloading dataset and computing the transform cache content.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default is True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent operations.

Raises
  • ValueError – When root_dir is not a directory.

  • ValueError – When task is not one of [“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”].

  • RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).

Example:

from monai.apps import DecathlonDataset
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ScaleIntensityd, ToTensord

transform = Compose(
    [
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys=["image", "label"]),
        ScaleIntensityd(keys="image"),
        ToTensord(keys=["image", "label"]),
    ]
)

val_data = DecathlonDataset(
    root_dir="./", task="Task09_Spleen", transform=transform, section="validation", seed=12345, download=True
)

print(val_data[0]["image"], val_data[0]["label"])

get_indices()[source]#

Get the indices of datalist used in this dataset.

Return type

ndarray
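
To illustrate why the same seed should be used for the training and validation sections, here is a minimal sketch of a seeded shuffle followed by a val_frac split. It uses only the standard library and a hypothetical helper named split_indices; the random number generator and the exact slicing are assumptions for illustration and need not match the library's code. The test section is not modelled here.

```python
import random


def split_indices(length, section, seed=0, val_frac=0.2):
    """Shuffle range(length) with a seeded RNG, then slice off the requested section."""
    indices = list(range(length))
    random.Random(seed).shuffle(indices)
    val_length = int(length * val_frac)
    if section == "training":
        return indices[val_length:]
    if section == "validation":
        return indices[:val_length]
    raise ValueError(f"Unsupported section in this sketch: {section}")


train = split_indices(10, "training", seed=12345)
val = split_indices(10, "validation", seed=12345)

# With the same seed, the two sections are disjoint and together cover every item.
print(sorted(train + val) == list(range(10)))  # True
```

If the two sections were built with different seeds, each would shuffle the datalist differently and the same item could land in both training and validation.
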

get_properties(keys=None)[source]#

Get the loaded properties of dataset with specified keys. If no keys specified, return all the loaded properties.

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.TciaDataset(root_dir, collection, section, transform=(), download=False, download_len=-1, seg_type='SEG', modality_tag=(8, 96), ref_series_uid_tag=(32, 14), ref_sop_uid_tag=(8, 4437), specific_tags=((8, 4373), (8, 4416), (12294, 16), (32, 13), (16, 16), (16, 32), (32, 17), (32, 18)), seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=0.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#

The Dataset to automatically download the data from a public The Cancer Imaging Archive (TCIA) dataset and generate items for training, validation or test.

The Highdicom library is used to load DICOM data with modality “SEG”, but only some collections are supported, such as: “C4KC-KiTS”, “NSCLC-Radiomics”, “NSCLC-Radiomics-Interobserver1”, “QIN-PROSTATE-Repeatability” and “PROSTATEx”. Therefore, if “seg” is included in the keys of the LoadImaged transform while loading other collections, errors may be raised. For supported collections, the original “SEG” information may not be consistent across DICOM files. To avoid creating labels in different formats, please use the label_dict argument of PydicomReader when calling the LoadImaged transform. The prepared label dicts of the collections mentioned above are also saved in monai.apps.tcia.TCIA_LABEL_DICT. You can also refer to the second example below.

This class is based on monai.data.CacheDataset to accelerate the training process.

Parameters
  • root_dir (Union[str, PathLike]) – user’s local directory for caching and loading the TCIA dataset.

  • collection (str) – name of a TCIA collection. A TCIA dataset is defined as a collection. The available collections are listed at the following link (only public collections can be downloaded): https://www.cancerimagingarchive.net/collections/

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D]. If not specified, LoadImaged(reader=”PydicomReader”, keys=[“image”]) will be used as the default transform. In addition, we suggest setting the label_dict argument of PydicomReader if segmentations need to be loaded. The original labels of each DICOM series may differ, and this argument unifies the format of the labels.

  • download (bool) – whether to download and extract the dataset, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the tar file or the dataset folder to the root directory.

  • download_len (int) – number of series to download. The value should be either greater than 0 or -1, where -1 means that all series are downloaded. Default is -1.

  • seg_type (str) – modality type of segmentation that is used to do the first step download. Default is “SEG”.

  • modality_tag (Tuple) – tag of modality. Default is (0x0008, 0x0060).

  • ref_series_uid_tag (Tuple) – tag of referenced Series Instance UID. Default is (0x0020, 0x000e).

  • ref_sop_uid_tag (Tuple) – tag of referenced SOP Instance UID. Default is (0x0008, 0x1155).

  • specific_tags (Tuple) – tags that will be loaded for “SEG” series. This argument will be used in monai.data.PydicomReader. Default is [(0x0008, 0x1115), (0x0008,0x1140), (0x3006, 0x0010), (0x0020,0x000D), (0x0010,0x0010), (0x0010,0x0020), (0x0020,0x0011), (0x0020,0x0012)].

  • val_frac (float) – fraction of the whole dataset used for validation, default is 0.2.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: set the same seed for the training and validation sections.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) is used.

  • cache_rate (float) – fraction of the total data to cache, default is 0.0 (no cache). The minimum of (cache_num, data_length x cache_rate, data_length) is used.

  • num_workers (int) – the number of worker threads to use. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when downloading dataset and computing the transform cache content.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default is True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent operations.
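
The tag defaults appear as decimal integers in the class signature and in hexadecimal in the parameter descriptions; both notations denote the same (group, element) pairs. The short sketch below makes that correspondence explicit. format_tag is a hypothetical helper written for this example and is not part of the library.

```python
def format_tag(tag):
    """Render a (group, element) DICOM tag pair in 4-digit hexadecimal notation."""
    group, element = tag
    return f"(0x{group:04x}, 0x{element:04x})"


# Defaults as they appear in the class signature (decimal integers).
modality_tag = (8, 96)
ref_series_uid_tag = (32, 14)
ref_sop_uid_tag = (8, 4437)

print(format_tag(modality_tag))          # (0x0008, 0x0060)
print(format_tag(ref_series_uid_tag))    # (0x0020, 0x000e)
print(format_tag(ref_sop_uid_tag))       # (0x0008, 0x1155)
print(modality_tag == (0x0008, 0x0060))  # True
```
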

Example:

from monai.apps import TciaDataset
from monai.apps.tcia import TCIA_LABEL_DICT
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ResampleToMatchd

# collection is "Pancreatic-CT-CBCT-SEG", seg_type is "RTSTRUCT"
# section is a required argument; "training" is chosen here for illustration
data = TciaDataset(
    root_dir="./", collection="Pancreatic-CT-CBCT-SEG", section="training", seg_type="RTSTRUCT", download=True
)

# collection is "C4KC-KiTS", seg_type is "SEG", and load both images and segmentations
transform = Compose(
    [
        LoadImaged(reader="PydicomReader", keys=["image", "seg"], label_dict=TCIA_LABEL_DICT["C4KC-KiTS"]),
        EnsureChannelFirstd(keys=["image", "seg"]),
        ResampleToMatchd(keys="image", key_dst="seg"),
    ]
)
# pass the transform explicitly so that the "seg" key is loaded
data = TciaDataset(
    root_dir="./", collection="C4KC-KiTS", section="validation", transform=transform, seed=12345, download=True
)

print(data[0]["seg"].shape)

get_indices()[source]#

Get the indices of datalist used in this dataset.

Return type

ndarray

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)[source]#

Cross-validation dataset based on a general dataset class, which must provide the _split_datalist API.

Parameters
  • dataset_cls – dataset class to be used to create the cross validation partitions. It must have _split_datalist API.

  • nfolds (int) – number of folds to split the data for cross validation.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into N folds, default is 0.

  • dataset_params – other additional parameters for the dataset_cls base class.

Example of 5 folds cross validation training:

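from monai.apps import CrossValidation, DecathlonDataset

# train_transform and val_transform are assumed to be defined elsewhere
cvdataset = CrossValidation(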
    dataset_cls=DecathlonDataset,
    nfolds=5,
    seed=12345,
    root_dir="./",
    task="Task09_Spleen",
    section="training",
    transform=train_transform,
    download=True,
)
dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
dataset_fold0_val = cvdataset.get_dataset(folds=0, transform=val_transform, download=False)
# execute training for fold 0 ...

dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
dataset_fold1_val = cvdataset.get_dataset(folds=1, transform=val_transform, download=False)
# execute training for fold 1 ...

...

dataset_fold4_train = ...
# execute training for fold 4 ...

get_dataset(folds, **dataset_params)[source]#

Generate a dataset based on the specified fold indices in the cross-validation group.

Parameters
  • folds (Union[Sequence[int], int]) – index of folds for training or validation, if a list of values, concatenate the data.

  • dataset_params – other additional parameters for the dataset_cls base class, will override the same parameters in self.dataset_params.
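
The fold bookkeeping behind the 5-fold example can be written as a loop instead of being spelled out fold by fold. The sketch below uses only the standard library: partition_into_folds and select_folds are hypothetical helpers that stand in for whatever the dataset's _split_datalist does, and the round-robin partitioning is an assumption made for illustration.

```python
import random


def partition_into_folds(length, nfolds=5, seed=0):
    """Shuffle the item indices with a seeded RNG and deal them into nfolds groups."""
    indices = list(range(length))
    random.Random(seed).shuffle(indices)
    # Fold k takes every nfolds-th shuffled index, starting at position k.
    return [indices[k::nfolds] for k in range(nfolds)]


def select_folds(partitions, folds):
    """Concatenate the items of one fold (int) or several folds (sequence of int)."""
    if isinstance(folds, int):
        folds = [folds]
    return [idx for fold in folds for idx in partitions[fold]]


nfolds = 5
partitions = partition_into_folds(23, nfolds=nfolds, seed=12345)

for val_fold in range(nfolds):
    # Train on every fold except the one held out for validation.
    train_folds = [fold for fold in range(nfolds) if fold != val_fold]
    train_idx = select_folds(partitions, train_folds)
    val_idx = select_folds(partitions, val_fold)
    print(val_fold, train_folds, len(train_idx), len(val_idx))
```

With a real CrossValidation object, the same loop would pass train_folds and val_fold as the folds argument of get_dataset.
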

Clara MMARs#

monai.apps.download_mmar(item, mmar_dir=None, progress=True, api=True, version=-1)[source]#

Download and extract Medical Model Archive (MMAR) from Nvidia Clara Train.

Parameters
  • item – the corresponding model item from MODEL_DESC. Or when api is True, the substring to query NGC’s model name field.

  • mmar_dir (Union[str, PathLike, None]) – target directory to store the MMAR, default is mmars subfolder under torch.hub get_dir().

  • progress (bool) – whether to display a progress bar.

  • api (bool) – whether to query NGC and download via the API.

  • version (int) – which version of the MMAR to download. -1 means the latest from NGC.

Examples:

>>> from monai.apps import download_mmar
>>> download_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".")
>>> download_mmar("prostate_mri_segmentation", mmar_dir=".", api=True)

Returns

The local directory of the downloaded model. If api is True, a list of local directories of downloaded models.

monai.apps.load_from_mmar(item, mmar_dir=None, progress=True, version=-1, map_location=None, pretrained=True, weights_only=False, model_key='model', api=True, model_file=None)[source]#

Download and extract Medical Model Archive (MMAR) model weights from Nvidia Clara Train.

Parameters
  • item – the corresponding model item from MODEL_DESC.

  • mmar_dir (Union[str, PathLike, None]) – target directory to store the MMAR, default is mmars subfolder under torch.hub get_dir().

  • progress (bool) – whether to display a progress bar when downloading the content.

  • version (int) – version number of the MMAR. Set it to -1 to use item[Keys.VERSION].

  • map_location – pytorch API parameter for torch.load or torch.jit.load.

  • pretrained – whether to load the pretrained weights after initializing a network module.

  • weights_only – whether to load only the weights instead of initializing the network module and assign weights.

  • model_key (str) – a key to search in the model file or config file for the model dictionary. Currently this function assumes that the model dictionary has {“[name|path]”: “test.module”, “args”: {‘kw’: ‘test’}}.

  • api (bool) – whether to query the NGC API to get model information.

  • model_file – the relative path to the model file within an MMAR.

Examples:

>>> from monai.apps import load_from_mmar
>>> unet_model = load_from_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".", map_location="cpu")
>>> print(unet_model)

monai.apps.MODEL_DESC#

A tuple (built-in immutable sequence) of model items. An item from this tuple can be passed as the item argument of download_mmar() and load_from_mmar().

Utilities#

monai.apps.check_hash(filepath, val=None, hash_type='md5')[source]#

Verify hash signature of specified file.

Parameters
  • filepath (Union[str, PathLike]) – path of source file to verify hash value.

  • val (Optional[str]) – expected hash value of the file.

  • hash_type (str) – type of hash algorithm to use, default is “md5”. The supported hash types are “md5”, “sha1”, “sha256”, “sha512”. See also: monai.apps.utils.SUPPORTED_HASH_TYPES.

Return type

bool
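
A hash check of this kind can be sketched with the standard hashlib module. The helper below, file_matches_hash, is a hypothetical name, and its behaviour when val is None (treating the check as passed) is an assumption of this sketch rather than a statement about check_hash itself.

```python
import hashlib
import os
import tempfile


def file_matches_hash(filepath, val=None, hash_type="md5", chunk_size=1024 * 1024):
    """Return True if the file's digest equals val, reading the file in chunks."""
    if val is None:
        # Assumption in this sketch: nothing to compare against, so treat it as a pass.
        return True
    hasher = hashlib.new(hash_type)
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == val


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    expected = hashlib.md5(b"hello").hexdigest()
    matches = file_matches_hash(path, expected, "md5")
    mismatch = file_matches_hash(path, "0" * 32, "md5")

print(matches, mismatch)  # True False
```

Reading in chunks keeps memory use flat for large archives, which matters for multi-gigabyte dataset downloads.
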

monai.apps.download_url(url, filepath='', hash_val=None, hash_type='md5', progress=True, **gdown_kwargs)[source]#

Download a file from the specified URL, with support for a progress bar and hash checking.

Parameters
  • url (str) – source URL link to download file.

  • filepath (Union[str, PathLike]) – target filepath to save the downloaded file (including the filename). If undefined, os.path.basename(url) will be used.

  • hash_val (Optional[str]) – expected hash value to validate the downloaded file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

  • progress (bool) – whether to display a progress bar.

  • gdown_kwargs – other arguments for gdown, except for url, output and quiet. These arguments are only used when downloading from Google Drive. For details of the arguments, see: https://github.com/wkentaro/gdown/blob/main/gdown/download.py

Raises
  • RuntimeError – When the hash validation of the filepath existing file fails.

  • RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.

  • URLError – See urllib.request.urlretrieve.

  • HTTPError – See urllib.request.urlretrieve.

  • ContentTooShortError – See urllib.request.urlretrieve.

  • IOError – See urllib.request.urlretrieve.

  • RuntimeError – When the hash validation of the url downloaded file fails.

Return type

None

monai.apps.extractall(filepath, output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True)[source]#

Extract file to the output directory. Expected file types are: zip, tar.gz and tar.

Parameters
  • filepath (Union[str, PathLike]) – the file path of compressed file.

  • output_dir (Union[str, PathLike]) – target directory to save extracted files.

  • hash_val (Optional[str]) – expected hash value to validate the compressed file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

  • file_type (str) – string of file type for decompressing. Leave it empty to infer the type from the filepath basename.

  • has_base (bool) – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.

Raises
  • RuntimeError – When the hash validation of the filepath compressed file fails.

  • NotImplementedError – When the filepath file extension is not one of [“zip”, “tar.gz”, “tar”].

Return type

None
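
The has_base distinction (A.zip unpacking to A/*.png versus B.zip unpacking to *.png) can be checked by inspecting the member names of an archive. The sketch below does this for in-memory zip files; has_single_base_folder and make_zip are hypothetical helpers for illustration only, and this is not how extractall itself decides whether to skip extraction.

```python
import io
import zipfile


def has_single_base_folder(archive_bytes):
    """True if every member of the zip archive lives under one top-level folder."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = [name for name in zf.namelist() if name]
    top_level = {name.split("/", 1)[0] for name in names}
    all_nested = all("/" in name for name in names)
    return len(top_level) == 1 and all_nested


def make_zip(names):
    """Build a small in-memory zip archive containing empty files with the given names."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    return buffer.getvalue()


with_base = make_zip(["A/1.png", "A/2.png"])
without_base = make_zip(["1.png", "2.png"])

print(has_single_base_folder(with_base))     # True  -> has_base=True is appropriate
print(has_single_base_folder(without_base))  # False -> has_base=False is appropriate
```
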

monai.apps.download_and_extract(url, filepath='', output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True, progress=True)[source]#

Download file from URL and extract it to the output directory.

Parameters
  • url (str) – source URL link to download file.

  • filepath (Union[str, PathLike]) – the file path of the downloaded compressed file. use this option to keep the directly downloaded compressed file, to avoid further repeated downloads.

  • output_dir (Union[str, PathLike]) – target directory to save extracted files. default is the current directory.

  • hash_val (Optional[str]) – expected hash value to validate the downloaded file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

  • file_type (str) – string of file type for decompressing. Leave it empty to infer the type from url’s base file name.

  • has_base (bool) – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.

  • progress (bool) – whether to display a progress bar.

Return type

None

Deepgrow#

monai.apps.deepgrow.dataset.create_dataset(datalist, output_dir, dimension, pixdim, image_key='image', label_key='label', base_dir=None, limit=0, relative_path=False, transforms=None)[source]#

Utility to pre-process and create a dataset list for Deepgrow training from an existing one. The input data list is normally a list of images and labels (3D volumes) that need pre-processing for the Deepgrow training pipeline.

Parameters
  • datalist

    A list of data dictionaries. Each entry should contain at least ‘image_key’: <image filename>. For example, typical input data can be a list of dictionaries:

    [{'image': <image filename>, 'label': <label filename>}]
    

  • output_dir (str) – target directory to store the training data for Deepgrow Training

  • pixdim – output voxel spacing.

  • dimension (int) – dimension for Deepgrow training. It can be 2 or 3.

  • image_key (str) – image key in input datalist. Defaults to ‘image’.

  • label_key (str) – label key in input datalist. Defaults to ‘label’.

  • base_dir – base directory in case relative paths are used for the keys in datalist. Defaults to None.

  • limit (int) – limit number of inputs for pre-processing. Defaults to 0 (no limit).

  • relative_path (bool) – output keys values should be based on relative path. Defaults to False.

  • transforms – explicit transforms to execute operations on input data.

Raises
  • ValueError – When dimension is not one of [2, 3].

  • ValueError – When datalist is empty.

Return type

List[Dict]

Returns

A new datalist that contains path to the images/labels after pre-processing.
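
The argument handling described above (the two ValueError conditions, limit=0 meaning no limit, and base_dir for relative paths) can be sketched as follows. prepare_entries is a hypothetical helper that only mirrors the documented rules; it performs no image processing and is not the library's implementation.

```python
import os


def prepare_entries(datalist, dimension, base_dir=None, limit=0,
                    image_key="image", label_key="label"):
    """Validate the inputs, apply the limit and resolve paths against base_dir."""
    if dimension not in (2, 3):
        raise ValueError("dimension must be 2 or 3")
    if not datalist:
        raise ValueError("datalist is empty")

    # limit=0 means "no limit"; any positive value keeps only the first entries.
    selected = datalist[:limit] if limit else datalist

    resolved = []
    for item in selected:
        entry = dict(item)
        for key in (image_key, label_key):
            if base_dir and key in entry:
                entry[key] = os.path.join(base_dir, entry[key])
        resolved.append(entry)
    return resolved


entries = prepare_entries(
    [
        {"image": "img1.nii", "label": "label1.nii"},
        {"image": "img2.nii", "label": "label2.nii"},
    ],
    dimension=2,
    base_dir="data",
    limit=1,
)
print(len(entries), entries[0]["image"])
```
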

Example:

datalist = create_dataset(
    datalist=[{'image': 'img1.nii', 'label': 'label1.nii'}],
    base_dir=None,
    output_dir=output_2d,
    dimension=2,
    image_key='image',
    label_key='label',
    pixdim=(1.0, 1.0),
    limit=0,
    relative_path=True
)

print(datalist[0]["image"], datalist[0]["label"])

class monai.apps.deepgrow.interaction.Interaction(transforms, max_interactions, train, key_probability='probability')[source]#

Ignite process_function used to introduce interactions (simulation of clicks) for Deepgrow Training/Evaluation. For more details please refer to: https://pytorch.org/ignite/generated/ignite.engine.engine.Engine.html. This implementation is based on:

Sakinis et al., Interactive segmentation of medical images through fully convolutional neural networks. (2019) https://arxiv.org/abs/1903.08205

Parameters
  • transforms (Union[Sequence[Callable], Callable]) – execute additional transformation during every iteration (before train). Typically, several Tensor based transforms composed by Compose.

  • max_interactions (int) – maximum number of interactions per iteration

  • train (bool) – training or evaluation

  • key_probability (str) – field name to fill probability for every interaction

class monai.apps.deepgrow.transforms.AddInitialSeedPointd(label='label', guidance='guidance', sids='sids', sid='sid', connected_regions=5)[source]#

Add random guidance as initial seed point for a given label.

Note that the label is of size (C, D, H, W) or (C, H, W).

The guidance is of size (2, N, # of dims), where N is the number of guidance points added. # of dims = 4 for (C, D, H, W); # of dims = 3 for (C, H, W).

Parameters
  • label (str) – label source.

  • guidance (str) – key to store guidance.

  • sids (str) – key that represents list of valid slice indices for the given label.

  • sid (str) – key that represents the slice to add initial seed point. If not present, random sid will be chosen.

  • connected_regions (int) – maximum connected regions to use for adding initial points.

randomize(data)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

class monai.apps.deepgrow.transforms.AddGuidanceSignald(image='image', guidance='guidance', sigma=2, number_intensity_ch=1)[source]#

Add Guidance signal for input image.

Based on the “guidance” points, apply a Gaussian to them and add the result as a new channel of the input image.

Parameters
  • image (str) – key to the image source.

  • guidance (str) – key to store guidance.

  • sigma (int) – standard deviation for Gaussian kernel.

  • number_intensity_ch (int) – number of intensity channels in the image; the guidance signal is added after these channels.
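
The idea of turning click points into a Gaussian signal and appending it to the image can be sketched in pure Python for a 2D case. This is a simplified illustration: it evaluates a Gaussian centred on each click directly, and it assumes one signal channel per click group (positive and negative), following the (2, N, # of dims) guidance shape described for AddInitialSeedPointd. gaussian_signal and add_guidance_channels are hypothetical names, and the transform's actual filtering and normalisation may differ.

```python
import math


def gaussian_signal(shape, points, sigma=2.0):
    """Build a 2D map that peaks at 1.0 on each click and decays as a Gaussian."""
    height, width = shape
    signal = [[0.0] * width for _ in range(height)]
    for py, px in points:
        for y in range(height):
            for x in range(width):
                dist_sq = (y - py) ** 2 + (x - px) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma**2))
                signal[y][x] = max(signal[y][x], value)
    return signal


def add_guidance_channels(image, guidance, sigma=2.0):
    """image: list of 2D channels; guidance: [positive_points, negative_points]."""
    shape = (len(image[0]), len(image[0][0]))
    return image + [gaussian_signal(shape, points, sigma) for points in guidance]


image = [[[0.5] * 8 for _ in range(8)]]  # one intensity channel of size 8 x 8
guidance = [[(2, 3)], [(6, 6)]]          # one positive and one negative click
result = add_guidance_channels(image, guidance)

print(len(result))      # 3 channels: image, positive signal, negative signal
print(result[1][2][3])  # 1.0 at the positive click
```

A larger sigma spreads each click over a wider neighbourhood, which changes how much spatial context the network receives from a single click.
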

class monai.apps.deepgrow.transforms.AddRandomGuidanced(guidance='guidance', discrepancy='discrepancy', probability='probability')[source]#

Add random guidance based on discrepancies that were found between label and prediction. The input shapes are as follows:

  • Guidance is of shape (2, N, # of dim).

  • Discrepancy is of shape (2, C, D, H, W) or (2, C, H, W).

  • Probability is of shape (1).

Parameters
  • guidance (str) – key to guidance source.

  • discrepancy (str) – key that represents discrepancies found between label and prediction.

  • probability (str) – key that represents click/interaction probability.

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when synchronizing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

class monai.apps.deepgrow.transforms.AddGuidanceFromPointsd(ref_image, guidance='guidance', foreground='foreground', background='background', axis=0, depth_first=True, spatial_dims=2, slice_key='slice', meta_keys=None, meta_key_postfix='meta_dict', dimensions=None)[source]#

Add guidance based on user clicks.

We assume the input is loaded by LoadImaged and has the shape of (H, W, D) originally. Clicks always specify the coordinates in (H, W, D)

If depth_first is True:

Input is now of shape (D, H, W), will return guidance that specifies the coordinates in (D, H, W)

else:

Input is now of shape (H, W, D), will return guidance that specifies the coordinates in (H, W, D)

Parameters
  • ref_image – key to reference image to fetch current and original image details.

  • guidance (str) – output key to store guidance.

  • foreground (str) – key that represents user foreground (+ve) clicks.

  • background (str) – key that represents user background (-ve) clicks.

  • axis (int) – axis that represents slices in 3D volume. (axis to Depth)

  • depth_first (bool) – if depth (slices) is positioned at first dimension.

  • spatial_dims (int) – dimensions based on model used for deepgrow (2D vs 3D).

  • slice_key (str) – key that represents applicable slice to add guidance.

  • meta_keys (Optional[str]) – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_keys is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

Deprecated since version 0.6.0: dimensions is deprecated, use spatial_dims instead.
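
The effect of depth_first on click coordinates can be illustrated with a small reordering function. This sketch covers only the axis reordering from (H, W, D) to (D, H, W); it assumes the depth axis is the last one and ignores any rescaling between the original and the current image shape. convert_clicks is a hypothetical helper, not part of the transform.

```python
def convert_clicks(clicks, depth_first=True):
    """Reorder (H, W, D) click coordinates to (D, H, W) when depth_first is True."""
    converted = []
    for h, w, d in clicks:
        converted.append((d, h, w) if depth_first else (h, w, d))
    return converted


foreground = [(120, 64, 7)]  # a single click given as (H, W, D)

print(convert_clicks(foreground, depth_first=True))   # [(7, 120, 64)]
print(convert_clicks(foreground, depth_first=False))  # [(120, 64, 7)]
```
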

class monai.apps.deepgrow.transforms.SpatialCropForegroundd(keys, source_key, spatial_size, select_fn=<function is_positive>, channel_indices=None, margin=0, allow_smaller=True, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Crop only the foreground object of the expected images.

Differences from monai.transforms.CropForegroundd:

  1. If the bounding box is smaller than spatial_size in all dimensions, this transform crops the object using the box’s center and spatial_size.

  2. This transform sets “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}].

The typical usage is to help training and evaluation if the valid part is small in the whole medical image. The valid part can be determined by any field in the data with source_key, for example:

  • Select values > 0 in image field as the foreground and crop on all fields specified by keys.

  • Select label = 3 in label field as the foreground to crop on all fields specified by keys.

  • Select label > 0 in the third channel of a One-Hot label field as the foreground to crop all keys fields.

Users can define an arbitrary function to select the expected foreground from the whole source image or from specified channels. A margin can also be added to every dimension of the bounding box of the foreground object.

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed. See also: monai.transforms.MapTransform

  • source_key (str) – data source to generate the bounding box of foreground, can be image or label, etc.

  • spatial_size (Union[Sequence[int], ndarray]) – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.

  • select_fn (Callable) – function to select expected foreground, default is to select values > 0.

  • channel_indices (Union[Iterable[int], int, None]) – if defined, select foreground only on the specified channels of image. if None, select foreground on the whole image.

  • margin (int) – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.

  • allow_smaller (bool) – when computing the box size with margin, whether to allow the image size to be smaller than the box size, default to True. if the margined size is bigger than the image size, will pad with the specified mode.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch/store the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • start_coord_key (str) – key to record the start coordinate of spatial bounding box for foreground.

  • end_coord_key (str) – key to record the end coordinate of spatial bounding box for foreground.

  • original_shape_key (str) – key to record original shape for foreground.

  • cropped_shape_key (str) – key to record cropped shape for foreground.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
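
The first difference from CropForegroundd listed above can be sketched in plain Python. This is only an illustration of the rule, not MONAI's implementation: the helper name crop_bounds is hypothetical, and clamping the window start at zero is an assumption about boundary handling.

```python
def crop_bounds(box_start, box_end, spatial_size):
    """Return (start, end) crop coordinates for a foreground bounding box.

    Hypothetical helper: if the box is smaller than `spatial_size` in all
    dimensions, crop a `spatial_size` window around the box center;
    otherwise keep the bounding box unchanged.
    """
    box_size = [e - s for s, e in zip(box_start, box_end)]
    if all(b < size for b, size in zip(box_size, spatial_size)):
        center = [(s + e) // 2 for s, e in zip(box_start, box_end)]
        # Assumption: clamp at 0 so the window never starts outside the image.
        start = [max(c - size // 2, 0) for c, size in zip(center, spatial_size)]
        end = [s + size for s, size in zip(start, spatial_size)]
        return start, end
    return list(box_start), list(box_end)


# A 10x10 box is smaller than 32x32 in every dimension: use a centered window.
small = crop_bounds([10, 10], [20, 20], [32, 32])
# A 50x10 box exceeds 32 in one dimension: keep the box itself.
large = crop_bounds([0, 0], [50, 10], [32, 32])
```

The returned start and end coordinates correspond to the values a transform of this kind would record under start_coord_key and end_coord_key.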

class monai.apps.deepgrow.transforms.SpatialCropGuidanced(keys, guidance, spatial_size, margin=20, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Crop image based on guidance with minimal spatial size.

  • If the bounding box is smaller than spatial_size in all dimensions, this transform crops the object using the box’s center and spatial_size.

  • This transform sets “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}].

Input data is of shape (C, spatial_1, [spatial_2, …])

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.

  • guidance (str) – key to the guidance. It is used to generate the bounding box of foreground

  • spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.

  • margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • start_coord_key (str) – key to record the start coordinate of spatial bounding box for foreground.

  • end_coord_key (str) – key to record the end coordinate of spatial bounding box for foreground.

  • original_shape_key (str) – key to record original shape for foreground.

  • cropped_shape_key (str) – key to record cropped shape for foreground.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.deepgrow.transforms.RestoreLabeld(keys, ref_image, slice_only=False, mode=InterpolateMode.NEAREST, align_corners=None, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#

Restores label based on the ref image.

The ref_image is assumed to have gone through the following transforms:

  1. Fetch2DSliced (If 2D)

  2. Spacingd

  3. SpatialCropGuidanced

  4. Resized

Its shape is assumed to be (C, D, H, W).

This transform tries to undo these operations so that the resulting label can be overlaid on the original volume. It performs the following operations:

  1. Undo Resized

  2. Undo SpatialCropGuidanced

  3. Undo Spacingd

  4. Undo Fetch2DSliced

The resulting label is of shape (D, H, W)

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.

  • ref_image (str) – reference image to fetch current and original image details

  • slice_only (bool) – apply only to an applicable slice, in case of 2D model/prediction

  • mode (Union[Sequence[Union[InterpolateMode, str]], InterpolateMode, str]) – interpolation mode used when resizing the label, one of {"nearest", "linear", "bilinear", "bicubic", "trilinear", "area"}. Defaults to "nearest", matching the InterpolateMode.NEAREST default in the signature. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It can also be a sequence of strings, each element corresponding to a key in keys.

  • align_corners (Union[Sequence[Optional[bool]], bool, None]) – Geometrically, we consider the pixels of the input as squares rather than points. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html It also can be a sequence of bool, each element corresponds to a key in keys.

  • meta_keys (Optional[str]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_keys is None, use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • start_coord_key (str) – key that records the start coordinate of spatial bounding box for foreground.

  • end_coord_key (str) – key that records the end coordinate of spatial bounding box for foreground.

  • original_shape_key (str) – key that records original shape for foreground.

  • cropped_shape_key (str) – key that records cropped shape for foreground.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
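
The "undo SpatialCropGuidanced" step can be pictured as pasting the cropped label back into an empty array of the original shape, using the recorded start and end coordinates. The sketch below shows the idea on a 2D nested list; the helper name paste_back is hypothetical, and the resizing and respacing steps of the real transform are left out.

```python
def paste_back(cropped, original_shape, start, end):
    """Place a cropped 2D label back into a zero-filled array of the original shape.

    Hypothetical helper: `start` and `end` play the role of the values stored
    under start_coord_key and end_coord_key.
    """
    rows, cols = original_shape
    restored = [[0] * cols for _ in range(rows)]
    for r in range(start[0], end[0]):
        for c in range(start[1], end[1]):
            restored[r][c] = cropped[r - start[0]][c - start[1]]
    return restored


# A 2x2 crop taken from rows 1..2 and columns 1..2 of a 4x4 label.
restored = paste_back([[1, 2], [3, 4]], (4, 4), start=(1, 1), end=(3, 3))
```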

class monai.apps.deepgrow.transforms.ResizeGuidanced(guidance, ref_image, meta_keys=None, meta_key_postfix='meta_dict', cropped_shape_key='foreground_cropped_shape')[source]#

Resize the guidance based on cropped vs resized image.

This transform assumes that the images have been cropped and resized, and that the shape after cropping is stored in the meta dict of the ref image.

Parameters
  • guidance (str) – key to guidance

  • ref_image (str) – key to reference image to fetch current and original image details

  • meta_keys (Optional[str]) – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_keys is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • cropped_shape_key (str) – key that records cropped shape for foreground.
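
Resizing guidance amounts to rescaling click coordinates by the ratio between the current (resized) image shape and the cropped shape. The sketch below illustrates this under stated assumptions: scale_guidance is a hypothetical helper, clicks are given as spatial coordinates only, and truncation to integers is an assumed rounding rule.

```python
def scale_guidance(clicks, cropped_shape, current_shape):
    """Rescale click coordinates from the cropped image to the resized image.

    Hypothetical helper: `cropped_shape` stands for the value stored under
    cropped_shape_key, `current_shape` for the spatial shape of ref_image.
    """
    factor = [cur / crop for cur, crop in zip(current_shape, cropped_shape)]
    # Assumption: coordinates are truncated to integers after scaling.
    return [[int(p * f) for p, f in zip(click, factor)] for click in clicks]


# The image was cropped to 100x200 and then resized to 50x100.
scaled = scale_guidance([[10, 20], [99, 199]], (100, 200), (50, 100))
```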

class monai.apps.deepgrow.transforms.FindDiscrepancyRegionsd(label='label', pred='pred', discrepancy='discrepancy')[source]#

Find discrepancies between the prediction and the ground truth label during click interactions in training.

Parameters
  • label (str) – key to label source.

  • pred (str) – key to prediction source.

  • discrepancy (str) – key to store discrepancies found between label and prediction.
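
A discrepancy between a binary label and a binary prediction splits naturally into two regions: foreground the prediction missed and background it wrongly marked. The following is a minimal sketch on flat lists; the function name and the two-part return value are assumptions for illustration, not the transform's documented output format.

```python
def find_discrepancy(label, pred):
    """Return (missed, extra) binary masks for a binary label and prediction.

    missed: label is foreground but the prediction is background.
    extra:  label is background but the prediction is foreground.
    """
    missed = [int(l == 1 and p == 0) for l, p in zip(label, pred)]
    extra = [int(l == 0 and p == 1) for l, p in zip(label, pred)]
    return missed, extra


missed, extra = find_discrepancy([1, 1, 0, 0], [1, 0, 1, 0])
```

In an interactive training loop, regions like these are where simulated corrective clicks would be placed.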

class monai.apps.deepgrow.transforms.FindAllValidSlicesd(label='label', sids='sids')[source]#

Find/List all valid slices in the label. Label is assumed to be a 4D Volume with shape CDHW, where C=1.

Parameters
  • label (str) – key to the label source.

  • sids (str) – key to store the indices of slices that have a valid label map.
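
As a rough illustration, a slice can be treated as valid when it contains at least one non-zero label value. The sketch below applies that rule to a CDHW label given as nested lists; find_valid_slices is a hypothetical helper, and the "any non-zero value" criterion is an assumption.

```python
def find_valid_slices(label):
    """Return indices of slices along D that contain any non-zero label value.

    `label` is a nested list with shape (C, D, H, W) where C == 1.
    """
    volume = label[0]  # drop the single channel dimension
    return [
        sid
        for sid, slice_2d in enumerate(volume)
        if any(value != 0 for row in slice_2d for value in row)
    ]


# Four 2x2 slices; only slices 1 and 3 contain labelled pixels.
label = [[
    [[0, 0], [0, 0]],
    [[0, 1], [0, 0]],
    [[0, 0], [0, 0]],
    [[2, 0], [0, 0]],
]]
sids = find_valid_slices(label)
```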

class monai.apps.deepgrow.transforms.Fetch2DSliced(keys, guidance='guidance', axis=0, meta_keys=None, meta_key_postfix='meta_dict', allow_missing_keys=False)[source]#

Fetch one slice in case of a 3D volume.

The volume only contains spatial coordinates.

Parameters
  • keys – keys of the corresponding items to be transformed.

  • guidance – key that represents guidance.

  • axis (int) – axis that represents slice in 3D volume.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix (str) – use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Pathology#

class monai.apps.pathology.data.PatchWSIDataset(data, region_size, grid_shape, patch_size, transform=None, image_reader_name='cuCIM', **kwargs)[source]#

This dataset reads whole slide images, extracts regions, and creates patches. It also reads labels for each patch and provides each patch with its associated class labels.

Parameters
  • data (List) – the list of input samples including image, location, and label (see the note below for more details).

  • region_size (Union[int, Tuple[int, int]]) – the size of regions to be extracted from the whole slide image.

  • grid_shape (Union[int, Tuple[int, int]]) – the grid shape on which the patches should be extracted.

  • patch_size (Union[int, Tuple[int, int]]) – the size of patches extracted from the region on the grid.

  • transform (Optional[Callable]) – transforms to be executed on input data.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

  • kwargs – additional parameters for WSIReader

Note

The input data has the following form as an example: [{“image”: “path/to/image1.tiff”, “location”: [200, 500], “label”: [0,0,0,1]}].

This means: from “image1.tiff”, extract a region centered at the given location with the size of region_size, and then extract patches with the size of patch_size from a grid with the shape of grid_shape. Be aware that the grid_shape should define a grid with the same number of elements as there are labels, so for this example the grid_shape should be (2, 2).
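
The constraints in this note can be checked before building the dataset. The sketch below uses two hypothetical helpers (they are not part of MONAI): one derives the top-left corner of a region centered at location, the other verifies that the number of labels matches the number of grid cells.

```python
def region_top_left(location, region_size):
    """Top-left corner of a region of `region_size` centered at `location`."""
    return [loc - size // 2 for loc, size in zip(location, region_size)]


def labels_match_grid(sample, grid_shape):
    """True if the sample provides exactly one label per grid cell."""
    return len(sample["label"]) == grid_shape[0] * grid_shape[1]


sample = {"image": "path/to/image1.tiff", "location": [200, 500], "label": [0, 0, 0, 1]}
origin = region_top_left(sample["location"], (100, 100))
ok_2x2 = labels_match_grid(sample, (2, 2))
ok_3x3 = labels_match_grid(sample, (3, 3))
```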

class monai.apps.pathology.data.SmartCachePatchWSIDataset(data, region_size, grid_shape, patch_size, transform, image_reader_name='cuCIM', replace_rate=0.5, cache_num=9223372036854775807, cache_rate=1.0, num_init_workers=1, num_replace_workers=1, progress=True, copy_cache=True, as_contiguous=True, **kwargs)[source]#

Add SmartCache functionality to PatchWSIDataset.

Parameters
  • data (List) – the list of input samples including image, location, and label (see PatchWSIDataset for more details)

  • region_size (Union[int, Tuple[int, int]]) – the region to be extracted from the whole slide image.

  • grid_shape (Union[int, Tuple[int, int]]) – the grid shape on which the patches should be extracted.

  • patch_size (Union[int, Tuple[int, int]]) – the size of patches extracted from the region on the grid.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

  • transform (Union[Sequence[Callable], Callable]) – transforms to be executed on input data.

  • replace_rate (float) – percentage of the cached items to be replaced in every epoch.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_init_workers (Optional[int]) – the number of worker threads to initialize the cache for first epoch. If num_init_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • num_replace_workers (Optional[int]) – the number of worker threads to prepare the replacement cache for every epoch. If num_replace_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.

  • progress (bool) – whether to display a progress bar when caching for the first epoch.

  • copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default to True. if the random transforms don’t modify the cache content or every cache item is only used once in a multi-processing environment, may set copy_cache=False for better performance.

  • as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. it may help improve the performance of following logic.

  • kwargs – additional parameters for WSIReader
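
The interaction between cache_num and cache_rate described above reduces to a single minimum. A minimal sketch of that rule follows; the function name is hypothetical, and truncating the product to an integer is an assumption.

```python
import sys


def effective_cache_size(data_length, cache_num=sys.maxsize, cache_rate=1.0):
    """Number of cached items: min(cache_num, data_length x cache_rate, data_length)."""
    return min(cache_num, int(data_length * cache_rate), data_length)


all_items = effective_cache_size(100)                                # defaults cache everything
half = effective_cache_size(100, cache_rate=0.5)                     # limited by cache_rate
capped = effective_cache_size(100, cache_num=10, cache_rate=0.5)     # limited by cache_num
```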

class monai.apps.pathology.data.MaskedInferenceWSIDataset(data, patch_size, transform=None, image_reader_name='cuCIM', **kwargs)[source]#

This dataset loads the provided foreground masks at an arbitrary resolution level, and extracts patches based on that mask from the associated whole slide image.

Parameters
  • data (List[Dict[str, str]]) – a list of samples, each including the path to the whole slide image and the path to the mask, for example: [{“image”: “path/to/image1.tiff”, “mask”: “path/to/mask1.npy”}, …].

  • patch_size (Union[int, Tuple[int, int]]) – the size of patches to be extracted from the whole slide image for inference.

  • transform (Optional[Callable]) – transforms to be executed on extracted patches.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

  • kwargs – additional parameters for WSIReader

Note

The resulting output (probability maps) after performing inference using this dataset is supposed to be the same size as the foreground mask and not the original WSI image size.

class monai.apps.pathology.handlers.ProbMapProducer(output_dir='./', output_postfix='', dtype=<class 'numpy.float64'>, name=None)[source]#

Event handler triggered on completing every iteration to save the probability map.

__init__(output_dir='./', output_postfix='', dtype=<class 'numpy.float64'>, name=None)[source]#
Parameters
  • output_dir (str) – output directory to save probability maps.

  • output_postfix (str) – a string appended to all output file names.

  • dtype (Union[dtype, type, str, None]) – the data type in which the probability map is stored. Default np.float64.

  • name (Optional[str]) – identifier of logging.logger to use, defaulting to engine.logger.

attach(engine)[source]#
Parameters

engine (Engine) – Ignite Engine, it can be a trainer, validator or evaluator.

Return type

None

save_prob_map(name)[source]#

This method saves the probability map for an image when its inference is finished, and deletes that probability map from memory.

Parameters

name (str) – the name of image to be saved.

Return type

None

class monai.apps.pathology.metrics.LesionFROC(data, grow_distance=75, itc_diameter=200, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8), nms_sigma=0.0, nms_prob_threshold=0.5, nms_box_size=48, image_reader_name='cuCIM')[source]#

Evaluate with Free Response Operating Characteristic (FROC) score.

Parameters
  • data (List[Dict]) – either the list of dictionaries containing probability maps (inference result) and tumor mask (ground truth), as below, or the path to a json file containing such list. { “prob_map”: “path/to/prob_map_1.npy”, “tumor_mask”: “path/to/ground_truth_1.tiff”, “level”: 6, “pixel_spacing”: 0.243 }

  • grow_distance (int) – Euclidean distance (in micrometers) by which to grow the labels of the ground truth’s tumors. Defaults to 75, which is the equivalent size of 5 tumor cells.

  • itc_diameter (int) – the maximum diameter of a region (in micrometer) to be considered as an isolated tumor cell. Defaults to 200.

  • eval_thresholds (Tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8) which is the same as the CAMELYON 16 Challenge.

  • nms_sigma (float) – the standard deviation for gaussian filter of non-maximal suppression. Defaults to 0.0.

  • nms_prob_threshold (float) – the probability threshold of non-maximal suppression. Defaults to 0.5.

  • nms_box_size (int) – the box size (in pixels) to be removed around the pixel for non-maximal suppression.

  • image_reader_name (str) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.

Note

For more info on nms_* parameters look at monai.utils.prob_nms.ProbNMS.

compute_fp_tp()[source]#

Compute false positive and true positive probabilities for tumor detection, by comparing the model outputs with the prepared ground truths for all samples

evaluate()[source]#

Evaluate the detection performance of a model based on the model probability map output, the ground truth tumor mask, and their associated metadata (e.g., pixel_spacing, level)

prepare_ground_truth(sample)[source]#

Prepare the ground truth for evaluation based on the binary tumor mask

prepare_inference_result(sample)[source]#

Prepare the probability map for detection evaluation.
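
The role of eval_thresholds can be illustrated with a small FROC scoring sketch: sensitivity is read off the FROC curve at each false positive rate and the values are averaged. This is an illustration in plain Python, not the class's implementation; the function names are hypothetical and the curve points are assumed to be sorted by increasing false positives per image.

```python
def interpolate(x, xs, ys):
    """Piecewise-linear interpolation; `xs` must be strictly increasing."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def froc_score(fps_per_image, sensitivity, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8)):
    """Average sensitivity at the given false positive rates per image."""
    values = [interpolate(t, fps_per_image, sensitivity) for t in eval_thresholds]
    return sum(values) / len(values)


# Toy curve: sensitivity rises linearly from 0 to 1 between 0 and 8 FPs per image.
score = froc_score([0.0, 8.0], [0.0, 1.0])
```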

monai.apps.pathology.utils.compute_multi_instance_mask(mask, threshold)[source]#

This method computes the segmentation mask according to the binary tumor mask.

Parameters
  • mask (ndarray) – the binary mask array

  • threshold (float) – the threshold to fill holes

monai.apps.pathology.utils.compute_isolated_tumor_cells(tumor_mask, threshold)[source]#

This method identifies Isolated Tumor Cells (ITC) and returns their labels.

Parameters
  • tumor_mask (ndarray) – the tumor mask.

  • threshold (float) – the threshold (at the mask level) to define an isolated tumor cell (ITC). A region with the longest diameter less than this threshold is considered as an ITC.

Return type

List[int]

class monai.apps.pathology.utils.PathologyProbNMS(spatial_dims=2, sigma=0.0, prob_threshold=0.5, box_size=48)[source]#

This class extends monai.utils.ProbNMS and adds the resolution option for Pathology.

class monai.apps.pathology.transforms.stain.array.ExtractHEStains(tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308))[source]#

Class to extract a target stain from an image, using stain deconvolution (see Note).

Parameters
  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
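
The tli and beta parameters relate to the Beer-Lambert law, under which absorbance is the negative logarithm of transmitted intensity relative to tli. The sketch below shows how transparent pixels could be filtered out with beta. It is a simplified illustration: the helper names are hypothetical, and both the +1 offset (to avoid log of zero) and the "all channels above beta" rule are assumptions.

```python
import math


def absorbance(pixel, tli=240):
    """Per-channel absorbance (optical density) of an RGB pixel.

    Assumption: intensities are offset by 1 and clipped at `tli`.
    """
    return [-math.log(min(value + 1, tli) / tli) for value in pixel]


def is_stained(pixel, tli=240, beta=0.15):
    """True if every channel's absorbance exceeds the transparency threshold."""
    return all(a > beta for a in absorbance(pixel, tli))


dark = absorbance((23, 23, 23))          # strongly absorbing pixel
white_kept = is_stained((255, 255, 255))  # near-white background
dark_kept = is_stained((23, 23, 23))
```

Only pixels that pass this filter would contribute to estimating the stain vectors.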

Note

For more information refer to:

  • the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf

  • the previous implementations:

class monai.apps.pathology.transforms.stain.array.NormalizeHEStains(tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308))[source]#

Class to normalize patches/images to a reference or target image stain (see Note).

Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.

Parameters
  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15.

  • target_he (Union[tuple, ndarray]) – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

Note

For more information refer to:

  • the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf

  • the previous implementations:

A collection of dictionary-based wrappers around the pathology transforms defined in monai.apps.pathology.transforms.array.

Class names end with ‘d’ to denote dictionary-based transforms.

class monai.apps.pathology.transforms.stain.dictionary.ExtractHEStainsd(keys, tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.ExtractHEStains. Class to extract a target stain from an image, using stain deconvolution.

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform

  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.stain.dictionary.NormalizeHEStainsd(keys, tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.pathology.transforms.NormalizeHEStains.

Class to normalize patches/images to a reference or target image stain.

Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.

Parameters
  • keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform

  • tli (float) – transmitted light intensity. Defaults to 240.

  • alpha (float) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.

  • beta (float) – absorbance threshold for transparent pixels. Defaults to 0.15.

  • target_he (Union[tuple, ndarray]) – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).

  • max_cref (Union[tuple, ndarray]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

class monai.apps.pathology.transforms.spatial.array.SplitOnGrid(grid_size=(2, 2), patch_size=None)[source]#

Split the image into patches based on the provided grid shape. This transform works only with torch.Tensor inputs.

Parameters
  • grid_size (Union[int, Tuple[int, int]]) – a tuple or an integer that defines the shape of the grid upon which to extract patches. If it’s an integer, the value will be repeated for each dimension. Default is 2x2.

  • patch_size (Union[int, Tuple[int, int], None]) – a tuple or an integer that defines the output patch sizes. If it’s an integer, the value will be repeated for each dimension. The default is None, in which case the patch size will be inferred from the grid shape.

Note: the shape of the input image is inferred based on the first image used.
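
When patch_size is not given, inferring it from the grid shape comes down to dividing the image size by the grid size. The sketch below shows this on a 2D nested list rather than a torch.Tensor; split_on_grid is a hypothetical stand-in, and row-major patch ordering is an assumption.

```python
def split_on_grid(image, grid_size=(2, 2)):
    """Split a 2D image (list of rows) into grid_size[0] x grid_size[1] patches."""
    rows, cols = len(image), len(image[0])
    # Patch size inferred from the grid shape.
    patch_h, patch_w = rows // grid_size[0], cols // grid_size[1]
    patches = []
    for gi in range(grid_size[0]):
        for gj in range(grid_size[1]):
            patch = [
                row[gj * patch_w:(gj + 1) * patch_w]
                for row in image[gi * patch_h:(gi + 1) * patch_h]
            ]
            patches.append(patch)
    return patches


# A 4x4 image with values 0..15 split into four 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_on_grid(image)
```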

class monai.apps.pathology.transforms.spatial.array.TileOnGrid(tile_count=None, tile_size=256, step=None, random_offset=False, pad_full=False, background_val=255, filter_mode='min')[source]#

Tile the 2D image into patches on a grid and maintain a subset of it. This transform works only with np.ndarray inputs for 2D images.

Parameters
  • tile_count (Optional[int]) – number of tiles to extract. If None, extracts all non-background tiles. Defaults to None.

  • tile_size (int) – size of the square tile. Defaults to 256.

  • step (Optional[int]) – step size. Defaults to None (same as tile_size).

  • random_offset (bool) – randomize the position of the grid instead of starting from the top-left corner. Defaults to False.

  • pad_full (bool) – pad the image to a size evenly divisible by tile_size. Defaults to False.

  • background_val (int) – the background constant (e.g. 255 for a white background). Defaults to 255.

  • filter_mode (str) – must be one of [“min”, “max”, “random”]. If the total number of tiles is more than tile_count, the tiles are sorted by intensity sum and the smallest (for min), largest (for max) or a random (for random) subset is taken. Defaults to min (which assumes the background has high values).
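
The effect of filter_mode can be sketched by selecting tile indices from their intensity sums. The helper below is hypothetical and works on precomputed sums; the seed argument exists only to make the random case reproducible in this illustration.

```python
import random


def select_tiles(tile_sums, tile_count, filter_mode="min", seed=0):
    """Pick `tile_count` tile indices based on each tile's intensity sum."""
    indices = list(range(len(tile_sums)))
    if len(indices) <= tile_count:
        return indices  # nothing to filter
    if filter_mode == "min":
        # Lowest sums first: darkest tiles, assuming a bright background.
        return sorted(indices, key=lambda i: tile_sums[i])[:tile_count]
    if filter_mode == "max":
        return sorted(indices, key=lambda i: tile_sums[i], reverse=True)[:tile_count]
    if filter_mode == "random":
        return random.Random(seed).sample(indices, tile_count)
    raise ValueError(f"unsupported filter_mode: {filter_mode}")


sums = [900, 100, 500, 300]
darkest = select_tiles(sums, 2, "min")
brightest = select_tiles(sums, 2, "max")
sampled = select_tiles(sums, 2, "random")
```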

randomize(img_size)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.pathology.transforms.spatial.dictionary.SplitOnGridd(keys, grid_size=(2, 2), patch_size=None, allow_missing_keys=False)[source]#

Split the image into patches based on the provided grid shape. This transform works only with torch.Tensor inputs.

Parameters
  • grid_size (Union[int, Tuple[int, int]]) – a tuple or an integer that defines the shape of the grid upon which to extract patches. If it’s an integer, the value will be repeated for each dimension. Default is 2x2.

  • patch_size (Union[int, Tuple[int, int], None]) – a tuple or an integer that defines the output patch sizes. If it’s an integer, the value will be repeated for each dimension. The default is None, in which case the patch size will be inferred from the grid shape.

Note: the shape of the input image is inferred based on the first image used.

class monai.apps.pathology.transforms.spatial.dictionary.TileOnGridd(keys, tile_count=None, tile_size=256, step=None, random_offset=False, pad_full=False, background_val=255, filter_mode='min', allow_missing_keys=False, return_list_of_dicts=False)[source]#

Tile the 2D image into patches on a grid and maintain a subset of it. This transform works only with np.ndarray inputs for 2D images.

Parameters
  • tile_count (Optional[int]) – number of tiles to extract. If None, extracts all non-background tiles. Defaults to None.

  • tile_size (int) – size of the square tile. Defaults to 256.

  • step (Optional[int]) – step size. Defaults to None (same as tile_size).

  • random_offset (bool) – randomize the position of the grid instead of starting from the top-left corner. Defaults to False.

  • pad_full (bool) – pad the image to a size evenly divisible by tile_size. Defaults to False.

  • background_val (int) – the background constant (e.g. 255 for a white background). Defaults to 255.

  • filter_mode (str) – must be one of [“min”, “max”, “random”]. If the total number of tiles is more than tile_count, the tiles are sorted by intensity sum and the smallest (for min), largest (for max) or a random (for random) subset is taken. Defaults to min (which assumes the background has high values).

randomize(data=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here, so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

Detection#

Hard Negative Sampler#

The functions in this script are adapted from nnDetection, https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/core/boxes/sampler.py

class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#

HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

The training workflow is as follows:

  1. Run the network forward and get prediction scores (classification probabilities/logits) for all samples.

  2. Use the hard negative sampler to choose negative samples with high prediction scores, plus some positive samples.

  3. Compute the classification loss for the selected samples.

  4. Run back-propagation.

Parameters
  • batch_size_per_image (int) – number of training samples to be randomly selected per image

  • positive_fraction (float) – percentage of positive elements in the selected samples

  • min_neg (int) – minimum number of negative samples to select if possible.

  • pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
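
The role of pool_size can be illustrated without torch: rank the negatives by prediction score, keep the top num_neg * pool_size as a pool, and draw num_neg of them at random. The sketch below follows that description only; the helper name, the seed argument, and the use of plain lists are assumptions, and it is not the sampler's actual code.

```python
import random


def select_hard_negatives(scores, negative_indices, num_neg, pool_size=10, seed=0):
    """Randomly pick `num_neg` negatives from the highest-scoring pool."""
    pool_count = min(len(negative_indices), int(num_neg * pool_size))
    # Highest prediction scores first: these are the "hardest" negatives.
    ranked = sorted(negative_indices, key=lambda i: scores[i], reverse=True)
    pool = ranked[:pool_count]
    rng = random.Random(seed)
    return sorted(rng.sample(pool, min(num_neg, len(pool))))


scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
negatives = [0, 1, 2, 3, 4, 5]
# The pool holds the 2 * 2 = 4 highest-scoring negatives: indices 0, 2, 4, 3.
chosen = select_hard_negatives(scores, negatives, num_neg=2, pool_size=2)
```

With pool_size=1 the pool is exactly the num_neg hardest negatives and the draw is deterministic; larger values trade hardness for variety.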

get_num_neg(negative, num_pos)[source]#

Sample enough negatives to fill up self.batch_size_per_image

Parameters
  • negative (Tensor) – indices of negative samples

  • num_pos (int) – number of positive samples to draw

Return type

int

Returns

number of negative samples

get_num_pos(positive)[source]#

Number of positive samples to draw

Parameters

positive (Tensor) – indices of positive samples

Return type

int

Returns

number of positive sample

select_positives(positive, num_pos, labels)[source]#

Select positive samples

Parameters
  • positive (Tensor) – indices of positive samples, sized (P,), where P is the number of positive samples

  • num_pos (int) – number of positive samples to sample

  • labels (Tensor) – labels for all samples, sized (A,), where A is the number of samples.

Return type

Tensor

Returns

binary mask of positive samples to choose, sized (A,), where A is the number of samples in one image

select_samples_img_list(target_labels, fg_probs)[source]#

Select positives and hard negatives from list samples per image. Hard negative sampler will be applied to each image independently.

Parameters
  • target_labels (List[Tensor]) – list of labels per image. For image i in the batch, target_labels[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i. Positive samples have positive labels, negative samples have label 0.

  • fg_probs (List[Tensor]) – list of maximum foreground probability per image. For image i in the batch, fg_probs[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i.

Return type

Tuple[List[Tensor], List[Tensor]]

Returns

  • list of binary masks for positive samples

  • list of binary masks for negative samples

Example

import torch
from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# two images with different number of samples
target_labels = [torch.tensor([0, 1]), torch.tensor([1, 0, 2, 1])]
fg_probs = [torch.rand(2), torch.rand(4)]
pos_idx_list, neg_idx_list = sampler.select_samples_img_list(target_labels, fg_probs)

select_samples_per_img(labels_per_img, fg_probs_per_img)[source]#

Select positives and hard negatives from samples.

Parameters
  • labels_per_img (Tensor) – labels, sized (A,). Positive samples have positive labels, negative samples have label 0.

  • fg_probs_per_img (Tensor) – maximum foreground probability, sized (A,)

Return type

Tuple[Tensor, Tensor]

Returns

  • binary mask for positive samples, sized (A,)

  • binary mask for negative samples, sized (A,)

Example

import torch
from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# one image with four samples
target_labels = torch.tensor([1, 0, 2, 1])
fg_probs = torch.rand(4)
pos_idx, neg_idx = sampler.select_samples_per_img(target_labels, fg_probs)

class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSamplerBase(pool_size=10)[source]#

Base class of hard negative sampler.

Hard negative sampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

The training workflow is as follows: 1) run the network forward to get prediction scores (classification prob/logits) for all the samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute the classification loss for the selected samples; 4) do back propagation.

Parameters

pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.

select_negatives(negative, num_neg, fg_probs)[source]#

Select hard negative samples.

Parameters
  • negative (Tensor) – indices of all the negative samples, sized (P,), where P is the number of negative samples

  • num_neg (int) – number of negative samples to sample

  • fg_probs (Tensor) – maximum foreground prediction scores (probability) across all the classes for each sample, sized (A,), where A is the number of samples.

Return type

Tensor

Returns

binary mask of negative samples to choose, sized (A,), where A is the number of samples in one image
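
The pool-based selection described for pool_size can be sketched in plain PyTorch. This is a minimal illustration under stated assumptions, not the MONAI implementation: the function name select_hard_negatives is hypothetical, and the mask is returned as a boolean tensor for simplicity.

```python
import torch


def select_hard_negatives(negative, num_neg, fg_probs, pool_size=2.0):
    """Hypothetical sketch: return a binary mask over all samples marking chosen negatives."""
    # Size of the candidate pool: the num_neg * pool_size highest-scoring negatives.
    pool = min(int(num_neg * pool_size), negative.numel())
    num_neg = min(num_neg, pool)
    # Indices (into `negative`) of the negatives with the highest scores.
    _, top_idx = fg_probs[negative].topk(pool)
    # Randomly keep num_neg of the pooled candidates.
    keep = top_idx[torch.randperm(pool)[:num_neg]]
    mask = torch.zeros_like(fg_probs, dtype=torch.bool)
    mask[negative[keep]] = True
    return mask


fg_probs = torch.tensor([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
negative = torch.tensor([1, 2, 3, 4, 5])  # sample 0 is a positive
neg_mask = select_hard_negatives(negative, num_neg=2, fg_probs=fg_probs, pool_size=1.5)
```

Here the pool holds the three highest-scoring negatives (samples 2, 4 and 5), and two of them are drawn at random. A larger pool_size would admit lower-scoring negatives into the draw.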

RetinaNet Network#

Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py

class monai.apps.detection.networks.retinanet_network.RetinaNet(spatial_dims, num_classes, num_anchors, feature_extractor, size_divisible=1)[source]#

The network used in RetinaNet.

It takes an image tensor as inputs, and outputs a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.

Parameters
  • spatial_dims (int) – number of spatial dimensions of the images. We support both 2D and 3D images.

  • num_classes (int) – number of output classes of the model (excluding the background).

  • num_anchors (int) – number of anchors at each location.

  • feature_extractor – a network that outputs feature maps from the input images, each feature map corresponds to a different resolution. Its output can have a format of Tensor, Dict[Any, Tensor], or Sequence[Tensor]. It can be the output of resnet_fpn_feature_extractor(*args, **kwargs).

  • size_divisible (Union[Sequence[int], int]) – the spatial size of the network input should be divisible by size_divisible, decided by the feature_extractor.
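
To make the divisibility requirement concrete, the following pure-Python sketch computes the smallest spatial size that satisfies a given size_divisible. The helper padded_spatial_size is hypothetical and only illustrates the arithmetic; how an image is actually padded or resized to that size is not covered here.

```python
import math


def padded_spatial_size(spatial_size, size_divisible):
    """Hypothetical helper: smallest size >= spatial_size that is divisible by size_divisible."""
    if isinstance(size_divisible, int):
        size_divisible = (size_divisible,) * len(spatial_size)
    return tuple(math.ceil(s / d) * d for s, d in zip(spatial_size, size_divisible))


# A 3D volume checked against per-axis divisors (32, 32, 16).
new_size = padded_spatial_size((100, 128, 40), (32, 32, 16))
```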

Example

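import torch
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor
from monai.networks.nets import resnet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")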
spatial_dims = 3  # 3D network
conv1_t_stride = (2,2,1)  # stride of first convolutional layer in backbone
backbone = resnet.ResNet(
    spatial_dims = spatial_dims,
    block = resnet.ResNetBottleneck,
    layers = [3, 4, 6, 3],
    block_inplanes = resnet.get_inplanes(),
    n_input_channels= 1,
    conv1_t_stride = conv1_t_stride,
    conv1_t_size = (7,7,7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
returned_layers = [1,2,3]  # returned layer from feature pyramid network
feature_extractor = resnet_fpn_feature_extractor(
    backbone = backbone,
    spatial_dims = spatial_dims,
    pretrained_backbone = False,
    trainable_backbone_layers = None,
    returned_layers = returned_layers,
)
# This feature_extractor requires input image spatial size
# to be divisible by (32, 32, 16).
size_divisible = tuple(2*s*2**max(returned_layers) for s in conv1_t_stride)
model = RetinaNet(
    spatial_dims = spatial_dims,
    num_classes = 5,
    num_anchors = 6,
    feature_extractor=feature_extractor,
    size_divisible = size_divisible,
).to(device)
result = model(torch.rand(2, 1, 128, 128, 128, device=device))
cls_logits_maps = result[model.cls_key]  # a list of len(returned_layers)+1 Tensor
box_regression_maps = result[model.box_reg_key]  # a list of len(returned_layers)+1 Tensor

forward(images)[source]#

It takes an image tensor as inputs, and outputs a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.

Parameters

images (Tensor) – input images, sized (B, img_channels, H, W) or (B, img_channels, H, W, D).

Return type

Dict[str, List[Tensor]]

Returns

a dictionary head_outputs with keys including self.cls_key and self.box_reg_key. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.

class monai.apps.detection.networks.retinanet_network.RetinaNetClassificationHead(in_channels, num_anchors, num_classes, spatial_dims, prior_probability=0.01)[source]#

A classification head for use in RetinaNet.

This head takes a list of feature maps as inputs, and outputs a list of classification maps. Each output map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.

Parameters
  • in_channels (int) – number of channels of the input feature

  • num_anchors (int) – number of anchors to be predicted

  • num_classes (int) – number of classes to be predicted

  • spatial_dims (int) – spatial dimension of the network, should be 2 or 3.

  • prior_probability (float) – prior probability to initialize classification convolutional layers.

forward(x)[source]#

It takes a list of feature maps as inputs, and outputs a list of classification maps. Each output classification map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.

Parameters

x (List[Tensor]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.

Return type

List[Tensor]

Returns

cls_logits_maps, list of classification maps. cls_logits_maps[i] is a (B, num_anchors * num_classes, H_i, W_i) or (B, num_anchors * num_classes, H_i, W_i, D_i) Tensor.
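
One common use of a prior probability, following the torchvision RetinaNet code that this module is adapted from, is to set the bias of the final classification convolution to -log((1 - p) / p), so that every anchor initially predicts foreground with probability close to p. Whether MONAI initializes its layers in exactly this way is an assumption here; the snippet only checks the arithmetic.

```python
import math

prior_probability = 0.01
# Bias b such that sigmoid(b) == prior_probability.
bias_init = -math.log((1 - prior_probability) / prior_probability)
# Applying the sigmoid recovers the prior (up to floating-point error).
initial_score = 1.0 / (1.0 + math.exp(-bias_init))
```

Starting from a low foreground probability keeps the large number of background anchors from dominating the loss in the first iterations.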

class monai.apps.detection.networks.retinanet_network.RetinaNetRegressionHead(in_channels, num_anchors, spatial_dims)[source]#

A regression head for use in RetinaNet.

This head takes a list of feature maps as inputs, and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.

Parameters
  • in_channels (int) – number of channels of the input feature

  • num_anchors (int) – number of anchors to be predicted

  • spatial_dims (int) – spatial dimension of the network, should be 2 or 3.

forward(x)[source]#

It takes a list of feature maps as inputs, and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.

Parameters

x (List[Tensor]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.

Return type

List[Tensor]

Returns

box_regression_maps, list of box regression maps. box_regression_maps[i] is a (B, num_anchors * 2 * spatial_dims, H_i, W_i) or (B, num_anchors * 2 * spatial_dims, H_i, W_i, D_i) Tensor.
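
The output shapes can be checked with a toy stand-in for the head. The sketch below uses a single torch.nn.Conv3d rather than the actual RetinaNetRegressionHead, so the layer structure is an assumption; only the channel count num_anchors * 2 * spatial_dims and the preserved spatial size mirror the description above.

```python
import torch

spatial_dims, in_channels, num_anchors = 3, 8, 2
# A single 3D convolution standing in for the head; padding=1 keeps the spatial size.
toy_head = torch.nn.Conv3d(in_channels, num_anchors * 2 * spatial_dims, kernel_size=3, padding=1)

# Two feature maps at different resolutions, as a feature pyramid would produce.
features = [torch.rand(1, in_channels, 16, 16, 8), torch.rand(1, in_channels, 8, 8, 4)]
box_maps = [toy_head(f) for f in features]
shapes = [tuple(m.shape) for m in box_maps]
```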

monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(backbone, spatial_dims, pretrained_backbone=False, returned_layers=(1, 2, 3), trainable_backbone_layers=None)[source]#

Constructs a feature extractor network with a ResNet-FPN backbone, used as feature_extractor in RetinaNet.

Reference: “Focal Loss for Dense Object Detection”.

The returned feature_extractor network takes an image tensor as inputs, and outputs a dictionary that maps string to the extracted feature maps (Tensor).

The input to the returned feature_extractor is expected to be a list of tensors, each of shape [C, H, W] or [C, H, W, D], one for each image. Different images can have different sizes.

Parameters
  • backbone (ResNet) – a ResNet model, used as backbone.

  • spatial_dims (int) – number of spatial dimensions of the images. We support both 2D and 3D images.

  • pretrained_backbone (bool) – whether the backbone has been pre-trained.

  • returned_layers (Sequence[int]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.

  • trainable_backbone_layers (Optional[int]) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. When pretrained_backbone is False, this value is set to be 5. When pretrained_backbone is True, if None is passed (the default) this value is set to 3.

Example

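import torch
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor
from monai.networks.nets import resnet

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")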
spatial_dims = 3 # 3D network
backbone = resnet.ResNet(
    spatial_dims = spatial_dims,
    block = resnet.ResNetBottleneck,
    layers = [3, 4, 6, 3],
    block_inplanes = resnet.get_inplanes(),
    n_input_channels= 1,
    conv1_t_stride = (2,2,1),
    conv1_t_size = (7,7,7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
feature_extractor = resnet_fpn_feature_extractor(
    backbone = backbone,
    spatial_dims = spatial_dims,
    pretrained_backbone = False,
    trainable_backbone_layers = None,
    returned_layers = [1,2,3],
)
model = RetinaNet(
    spatial_dims = spatial_dims,
    num_classes = 5,
    num_anchors = 6,
    feature_extractor=feature_extractor,
    size_divisible = 32,
).to(device)

RetinaNet Detector#

Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py

class monai.apps.detection.networks.retinanet_detector.RetinaNetDetector(network, anchor_generator, box_overlap_metric=<function box_iou>, debug=False)[source]#

RetinaNet detector, expandable to other one-stage anchor-based box detectors in the future. An example of construction can be found in the source code of retinanet_resnet50_fpn_detector().

The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D). In this case, all images have same size.

The behavior of the model changes depending on whether it is in training or evaluation mode.

During training, the model expects both the input tensors and targets (a list of dictionaries), each containing:

  • boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the ground-truth boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.

  • labels: the class label for each ground-truth box

The model returns a Dict[str, Tensor] during training, containing the classification and regression losses. When saving the model, only self.network contains trainable parameters and needs to be saved.
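
As an illustration of the expected targets structure, the sketch below builds a one-image list for a 3D volume and checks that the boxes are valid in StandardMode. The key names "boxes" and "labels" follow the bullet names above; the actual keys are whatever was configured with set_target_keys, and the helper is_valid_standard_mode is hypothetical.

```python
import torch

# One dict per image; this image has two ground-truth boxes in a 3D volume.
# Box layout: [xmin, ymin, zmin, xmax, ymax, zmax].
targets = [
    {
        "boxes": torch.tensor([[10.0, 12.0, 4.0, 30.0, 40.0, 9.0],
                               [50.0, 60.0, 1.0, 70.0, 90.0, 6.0]]),
        "labels": torch.tensor([0, 2]),
    }
]


def is_valid_standard_mode(boxes):
    """Hypothetical check: every min coordinate is strictly below its max coordinate."""
    spatial_dims = boxes.shape[1] // 2
    return bool((boxes[:, :spatial_dims] < boxes[:, spatial_dims:]).all())


valid = all(is_valid_standard_mode(t["boxes"]) for t in targets)
```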

During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[str, Tensor]], one for each input image. The fields of the Dict are as follows:

  • boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the predicted boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.

  • labels (Int64Tensor[N]): the predicted label for each predicted box

  • labels_scores (Tensor[N]): the scores for each prediction

Parameters
  • network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].

  • anchor_generator (AnchorGenerator) – anchor generator.

  • box_overlap_metric (Callable) – function that computes the overlap between two sets of boxes, default is Intersection over Union (IoU).

  • debug (bool) – whether to print out internal parameters, used for debugging and parameter tuning.

Notes

Input argument network can be a monai.apps.detection.networks.retinanet_network.RetinaNet(*) object, but any network that meets the following rules is a valid input network.

  1. It should have attributes including spatial_dims, num_classes, cls_key, box_reg_key, num_anchors, size_divisible.

    • spatial_dims (int) is the spatial dimension of the network, we support both 2D and 3D.

    • num_classes (int) is the number of classes, excluding the background.

    • size_divisible (int or Sequence[int]) is the expectation on the input image shape. The network needs the input spatial_size to be divisible by size_divisible; if it is a sequence, its length should be 2 or 3.

    • cls_key (str) is the key to represent classification in the output dict.

    • box_reg_key (str) is the key to represent box regression in the output dict.

    • num_anchors (int) is the number of anchor shapes at each location. It should equal self.anchor_generator.num_anchors_per_location()[0].

  2. Its input should be an image Tensor sized (B, C, H, W) or (B, C, H, W, D).

  3. About its output head_outputs:

    • It should be a dictionary with at least two keys: network.cls_key and network.box_reg_key.

    • head_outputs[network.cls_key] should be List[Tensor] or Tensor. Each Tensor represents classification logits map at one resolution level, sized (B, num_classes*num_anchors, H_i, W_i) or (B, num_classes*num_anchors, H_i, W_i, D_i).

    • head_outputs[network.box_reg_key] should be List[Tensor] or Tensor. Each Tensor represents box regression map at one resolution level, sized (B, 2*spatial_dims*num_anchors, H_i, W_i) or (B, 2*spatial_dims*num_anchors, H_i, W_i, D_i).

    • len(head_outputs[network.cls_key]) == len(head_outputs[network.box_reg_key]).

Example

# define a naive network
import monai
import torch
from monai.apps.detection.networks.retinanet_detector import RetinaNetDetector

class NaiveNet(torch.nn.Module):
    def __init__(self, spatial_dims: int, num_classes: int):
        super().__init__()
        self.spatial_dims = spatial_dims
        self.num_classes = num_classes
        self.size_divisible = 2
        self.cls_key = "cls"
        self.box_reg_key = "box_reg"
        self.num_anchors = 1
    def forward(self, images: torch.Tensor):
        spatial_size = images.shape[-self.spatial_dims:]
        out_spatial_size = tuple(s//self.size_divisible for s in spatial_size)  # half size of input
        out_cls_shape = (images.shape[0],self.num_classes*self.num_anchors) + out_spatial_size
        out_box_reg_shape = (images.shape[0],2*self.spatial_dims*self.num_anchors) + out_spatial_size
        return {self.cls_key: [torch.randn(out_cls_shape)], self.box_reg_key: [torch.randn(out_box_reg_shape)]}

# create a RetinaNetDetector detector
spatial_dims = 3
num_classes = 5
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, ), base_anchor_shapes=((8,) * spatial_dims,)  # one anchor shape per location
)
net = NaiveNet(spatial_dims, num_classes)
detector = RetinaNetDetector(net, anchor_generator)

# only detector.network may contain trainable parameters.
optimizer = torch.optim.SGD(
    detector.network.parameters(),
    1e-3,
    momentum=0.9,
    weight_decay=3e-5,
    nesterov=True,
)
torch.save(detector.network.state_dict(), 'model.pt')  # save model
detector.network.load_state_dict(torch.load('model.pt'))  # load model

compute_anchor_matched_idxs(anchors, targets, num_anchor_locs_per_level)[source]#

Compute the matched indices between anchors and ground truth (gt) boxes in targets. output[k][i] represents the matched gt index for anchor[i] in image k. Suppose there are M gt boxes for image k. Then output[k][i] takes a value in [-2, -1, 0, …, M-1]: a value in [0, M - 1] indicates that the anchor is matched with a gt box, while a negative value indicates that it is not matched.

Parameters
  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • targets (List[Dict[str, Tensor]]) – a list of dicts, one per image. Each dict has two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • num_anchor_locs_per_level (Sequence[int]) – each element represents HW or HWD at this level.

Return type

List[Tensor]

Returns

a list of matched indices matched_idxs_per_image (Tensor[int64]), each a Tensor sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched: BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2.
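
The meaning of the two negative codes can be illustrated with a simplified 2D threshold matcher in plain PyTorch. This is a sketch under assumptions, not the matcher used by the detector: the functions box_iou_2d and match_anchors are hypothetical, the thresholds are example values, and options such as low-quality matches are left out.

```python
import torch

BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2


def box_iou_2d(a, b):
    """Pairwise IoU between boxes a (N, 4) and b (M, 4) in [xmin, ymin, xmax, ymax] format."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])  # min corner of each intersection
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])  # max corner of each intersection
    inter = (rb - lt).clamp(min=0).prod(dim=-1)
    return inter / (area_a[:, None] + area_b[None, :] - inter)


def match_anchors(anchors, gt_boxes, fg_iou_thresh=0.5, bg_iou_thresh=0.4):
    """Hypothetical threshold matcher: one matched gt index (or negative code) per anchor."""
    best_iou, matched = box_iou_2d(anchors, gt_boxes).max(dim=1)
    matched = matched.clone()
    matched[best_iou < bg_iou_thresh] = BELOW_LOW_THRESHOLD
    matched[(best_iou >= bg_iou_thresh) & (best_iou < fg_iou_thresh)] = BETWEEN_THRESHOLDS
    return matched


anchors = torch.tensor([[0.0, 0.0, 10.0, 10.0], [0.0, 0.0, 10.0, 5.5], [20.0, 20.0, 30.0, 30.0]])
gt_boxes = torch.tensor([[0.0, 0.0, 10.0, 12.0]])
matched_idxs = match_anchors(anchors, gt_boxes)
```

The first anchor overlaps the gt box strongly and is matched to index 0, the second falls between the two thresholds, and the third does not overlap at all.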

compute_box_loss(box_regression, targets, anchors, matched_idxs)[source]#

Compute box regression losses.

Parameters
  • box_regression (Tensor) – box regression results, sized (B, sum(HWA), 2*self.spatial_dims)

  • targets (List[Dict[str, Tensor]]) – a list of dicts, one per image. Each dict has two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • matched_idxs (List[Tensor]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),)

Return type

Tensor

Returns

box regression losses.

compute_cls_loss(cls_logits, targets, matched_idxs)[source]#

Compute classification losses.

Parameters
  • cls_logits (Tensor) – classification logits, sized (B, sum(HW(D)A), self.num_classes)

  • targets (List[Dict[str, Tensor]]) – a list of dicts, one per image. Each dict has two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • matched_idxs (List[Tensor]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),)

Return type

Tensor

Returns

classification losses.

compute_loss(head_outputs_reshape, targets, anchors, num_anchor_locs_per_level)[source]#

Compute losses.

Parameters
  • head_outputs_reshape (Dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)

  • targets (List[Dict[str, Tensor]]) – a list of dicts, one per image. Each dict has two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • num_anchor_locs_per_level (Sequence[int]) – each element represents HW or HWD at this level.

Return type

Dict[str, Tensor]

Returns

a dict of several kinds of losses.
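
The reshaped layout (B, sum(HW(D)A), C) used by the loss functions can be obtained from per-level maps roughly as follows. This is a sketch, not the detector's own reshaping code: the helper reshape_maps is hypothetical, and it assumes the channel axis is ordered anchor-first, then class.

```python
import torch


def reshape_maps(maps, num_channels):
    """Hypothetical helper: flatten per-level maps (B, A*C, H, W[, D]) into (B, sum(HW(D)A), C)."""
    flat = []
    for m in maps:
        batch = m.shape[0]
        # Move channels last: (B, H, W[, D], A*C).
        m = m.movedim(1, -1)
        # Split the A*C channels into rows of C values per anchor.
        flat.append(m.reshape(batch, -1, num_channels))
    return torch.cat(flat, dim=1)


num_classes, num_anchors = 5, 3
cls_maps = [
    torch.rand(2, num_anchors * num_classes, 8, 8),
    torch.rand(2, num_anchors * num_classes, 4, 4),
]
cls_logits = reshape_maps(cls_maps, num_classes)
```

The second dimension is 8*8*3 + 4*4*3 = 240, one row per anchor across both levels.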

forward(input_images, targets=None, use_inferer=False)[source]#

Returns a dict of losses during training, or a list of dicts of predicted boxes and labels during inference.

Parameters
  • input_images (Union[List[Tensor], Tensor]) – The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D). In this case, all images have same size.

  • targets (Optional[List[Dict[str, Tensor]]]) – a list of dicts, one per image (optional). Each dict has two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • use_inferer (bool) – whether to use self.inferer, a sliding window inferer, to do the inference. If False, will simply forward the network. If True, will use self.inferer, and requires self.set_sliding_window_inferer(*args) to have been called before.

Return type

Union[Dict[str, Tensor], List[Dict[str, Tensor]]]

Returns

In training mode, returns a dict with at least two keys, including self.cls_key and self.box_reg_key, representing the classification loss and the box regression loss.

In evaluation mode, returns a list of detection results. Each element corresponds to an image in input_images and is a dict with at least three keys, including self.target_box_key, self.target_label_key, and self.pred_score_key, representing the predicted boxes, classification labels, and classification scores.

generate_anchors(images, head_outputs)[source]#

Generate anchors and store them in self.anchors: List[Tensor]. Anchors are generated only when none are stored, or when the incoming images have a different shape from self.previous_image_shape.

Parameters
  • images (Tensor) – input images, a (B, C, H, W) or (B, C, H, W, D) Tensor.

  • head_outputs (Dict[str, List[Tensor]]) – head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)

get_box_train_sample_per_image(box_regression_per_image, targets_per_image, anchors_per_image, matched_idxs_per_image)[source]#

Get samples from one image for box regression losses computation.

Parameters
  • box_regression_per_image (Tensor) – box regression result for one image, (sum(HWA), 2*self.spatial_dims)

  • targets_per_image (Dict[str, Tensor]) – a dict with at least two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • anchors_per_image (Tensor) – anchors of one image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

  • matched_idxs_per_image (Tensor) – matched index, sized (sum(HWA),) or (sum(HWDA),)

Return type

Tuple[Tensor, Tensor]

Returns

paired predicted and GT samples from one image for box regression losses computation

get_cls_train_sample_per_image(cls_logits_per_image, targets_per_image, matched_idxs_per_image)[source]#

Get samples from one image for classification losses computation.

Parameters
  • cls_logits_per_image (Tensor) – classification logits for one image, (sum(HWA), self.num_classes)

  • targets_per_image (Dict[str, Tensor]) – a dict with at least two keys, self.target_box_key and self.target_label_key, holding the ground-truth boxes and labels present in the image.

  • matched_idxs_per_image (Tensor) – matched indices, a Tensor sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched: BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2.

Return type

Tuple[Tensor, Tensor]

Returns

paired predicted and GT samples from one image for classification losses computation

postprocess_detections(head_outputs_reshape, anchors, image_sizes, num_anchor_locs_per_level, need_sigmoid=True)[source]#

Postprocessing to generate detection results from classification logits and box regression. Uses self.box_selector to select the final output boxes for each image.

Parameters
  • head_outputs_reshape (Dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims)

  • targets – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.

  • anchors (List[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.

Return type

List[Dict[str, Tensor]]

Returns

a list of dicts, where each dict corresponds to the detection result on one image.

set_atss_matcher(num_candidates=4, center_in_gt=False)[source]#

Used for training. Set the ATSS matcher that matches anchors with ground truth boxes.

Parameters
  • num_candidates (int) – number of positions to select candidates from. A smaller value results in a higher matcher threshold and fewer matched candidates.

  • center_in_gt (bool) – If False (default), matched anchor center points do not need to lie within the ground truth box. False is recommended for small objects. If True, the matcher is stricter and yields fewer matched candidates.

Return type

None

set_balanced_sampler(batch_size_per_image, positive_fraction)[source]#

Used for training. Set the torchvision balanced sampler that samples part of the anchors for training.

Parameters
  • batch_size_per_image (int) – number of elements to be selected per image

  • positive_fraction (float) – percentage of positive elements per batch

set_box_coder_weights(weights)[source]#

Set the weights for box coder.

Parameters

weights (Tuple[float]) – a list/tuple with length of 2*self.spatial_dims

set_box_regression_loss(box_loss, encode_gt, decode_pred)[source]#

Used for training. Set the loss for box regression.

Parameters
  • box_loss (Module) – loss module for box regression

  • encode_gt (bool) – if True, will encode ground truth boxes to target box regression before computing the losses. Should be True for L1 loss and False for GIoU loss.

  • decode_pred (bool) – if True, will decode predicted box regression into predicted boxes before computing losses. Should be False for L1 loss and True for GIoU loss.

Example

detector.set_box_regression_loss(
    torch.nn.SmoothL1Loss(beta=1.0 / 9, reduction="mean"),
    encode_gt = True, decode_pred = False
)
detector.set_box_regression_loss(
    monai.losses.giou_loss.BoxGIoULoss(reduction="mean"),
    encode_gt = False, decode_pred = True
)

Return type

None
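
The difference between encode_gt and decode_pred can be seen in a small 2D sketch of a center-and-size box coder, written in the style of the torchvision box coder. It is an assumption that the detector's box coder follows the same scheme; the functions encode_boxes and decode_boxes are hypothetical, and the weights argument plays the role of the values passed to set_box_coder_weights.

```python
import torch


def encode_boxes(gt, anchors, weights=(1.0, 1.0, 1.0, 1.0)):
    """Hypothetical encoder: [xmin, ymin, xmax, ymax] boxes -> (dx, dy, dw, dh) regression targets."""
    wx, wy, ww, wh = weights
    a_size = anchors[:, 2:] - anchors[:, :2]
    a_ctr = anchors[:, :2] + 0.5 * a_size
    g_size = gt[:, 2:] - gt[:, :2]
    g_ctr = gt[:, :2] + 0.5 * g_size
    d_ctr = (g_ctr - a_ctr) / a_size * torch.tensor([wx, wy])
    d_size = torch.log(g_size / a_size) * torch.tensor([ww, wh])
    return torch.cat([d_ctr, d_size], dim=1)


def decode_boxes(deltas, anchors, weights=(1.0, 1.0, 1.0, 1.0)):
    """Inverse of encode_boxes: regression values -> [xmin, ymin, xmax, ymax] boxes."""
    wx, wy, ww, wh = weights
    a_size = anchors[:, 2:] - anchors[:, :2]
    a_ctr = anchors[:, :2] + 0.5 * a_size
    ctr = deltas[:, :2] / torch.tensor([wx, wy]) * a_size + a_ctr
    size = torch.exp(deltas[:, 2:] / torch.tensor([ww, wh])) * a_size
    return torch.cat([ctr - 0.5 * size, ctr + 0.5 * size], dim=1)


anchors = torch.tensor([[0.0, 0.0, 8.0, 8.0]])
gt = torch.tensor([[1.0, 2.0, 9.0, 12.0]])
encoded_gt = encode_boxes(gt, anchors)         # compared with raw predictions when encode_gt=True
recovered = decode_boxes(encoded_gt, anchors)  # decoded boxes are compared with gt when decode_pred=True
```

An L1-type loss works in the encoded space, so the ground truth is encoded and the predictions are left as they are. A GIoU-type loss needs actual box coordinates, so the predictions are decoded and the ground truth is left as it is.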

set_box_selector_parameters(score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300, apply_sigmoid=True)[source]#

Used for inference. Set the parameters that are used for box selection during inference. The box selection is performed with the following steps:

  1. For each level, discard boxes with scores less than self.score_thresh.

  2. For each level, keep boxes with top self.topk_candidates_per_level scores.

  3. For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.

  4. For the whole image, keep boxes with top self.detections_per_img scores.

Parameters
  • score_thresh (float) – no box with scores less than score_thresh will be kept

  • topk_candidates_per_level (int) – max number of boxes to keep for each level

  • nms_thresh (float) – box overlapping threshold for NMS

  • detections_per_img (int) – max number of boxes to keep for each image
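
The four selection steps can be sketched for 2D boxes and a single class in plain PyTorch. This is a simplified illustration, not the box selector used by the detector: the functions iou_one_to_many and select_boxes are hypothetical, per-class handling is omitted, and NMS is implemented as a simple greedy loop.

```python
import torch


def iou_one_to_many(box, boxes):
    """IoU between one [xmin, ymin, xmax, ymax] box and a set of boxes (N, 4)."""
    lt = torch.max(box[:2], boxes[:, :2])
    rb = torch.min(box[2:], boxes[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area = (box[2:] - box[:2]).prod()
    areas = (boxes[:, 2:] - boxes[:, :2]).prod(dim=1)
    return inter / (area + areas - inter)


def select_boxes(boxes_per_level, scores_per_level, score_thresh=0.05,
                 topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300):
    """Hypothetical single-class box selection following the four steps above."""
    kept_boxes, kept_scores = [], []
    for boxes, scores in zip(boxes_per_level, scores_per_level):
        keep = scores >= score_thresh                       # step 1: score threshold
        boxes, scores = boxes[keep], scores[keep]
        k = min(topk_candidates_per_level, scores.numel())  # step 2: top-k per level
        scores, idx = scores.topk(k)
        kept_boxes.append(boxes[idx])
        kept_scores.append(scores)
    boxes, scores = torch.cat(kept_boxes), torch.cat(kept_scores)

    order = scores.argsort(descending=True)                 # step 3: greedy NMS
    selected = []
    while order.numel() > 0:
        best = order[0]
        selected.append(int(best))
        rest = order[1:]
        order = rest[iou_one_to_many(boxes[best], boxes[rest]) <= nms_thresh]
    selected = selected[:detections_per_img]                # step 4: cap detections per image
    return boxes[selected], scores[selected]


level0_boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 10.0, 10.0], [20.0, 20.0, 30.0, 30.0]])
level0_scores = torch.tensor([0.9, 0.8, 0.01])
level1_boxes = torch.tensor([[40.0, 40.0, 60.0, 60.0]])
level1_scores = torch.tensor([0.6])
final_boxes, final_scores = select_boxes([level0_boxes, level1_boxes], [level0_scores, level1_scores])
```

In this example the low-scoring box is dropped in step 1, and the second box is suppressed in step 3 because it overlaps the highest-scoring box with IoU 0.81.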

set_cls_loss(cls_loss)[source]#

Used for training. Set the loss for classification. The loss takes logits as inputs, so make sure sigmoid/softmax is built in.

Parameters

cls_loss (Module) – loss module for classification

Example

from monai.losses import FocalLoss

detector.set_cls_loss(torch.nn.BCEWithLogitsLoss(reduction="mean"))
detector.set_cls_loss(FocalLoss(reduction="mean", gamma=2.0))

Return type

None
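
To show what "sigmoid built in" means for a loss that takes logits, here is a minimal sigmoid focal loss in plain PyTorch. It is a sketch for illustration rather than the MONAI FocalLoss implementation; the function name sigmoid_focal_loss and the example values are assumptions.

```python
import torch
import torch.nn.functional as F


def sigmoid_focal_loss(logits, targets, gamma=2.0):
    """Hypothetical mean-reduced focal loss on raw logits (sigmoid applied internally)."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)  # probability assigned to the true class
    return ((1 - p_t) ** gamma * bce).mean()


logits = torch.tensor([[2.0, -1.0], [-3.0, 0.5]])
targets = torch.tensor([[1.0, 0.0], [0.0, 1.0]])
focal = sigmoid_focal_loss(logits, targets)
plain_bce = F.binary_cross_entropy_with_logits(logits, targets)
```

With gamma set to 0 the focal term disappears and the loss reduces to binary cross-entropy with logits. Larger gamma values down-weight samples that are already classified well.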

set_hard_negative_sampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#

Used for training. Set the hard negative sampler that samples part of the anchors for training.

HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.

Parameters
  • batch_size_per_image (int) – number of elements to be selected per image

  • positive_fraction (float) – percentage of positive elements in the selected samples

  • min_neg (int) – minimum number of negative samples to select if possible.

  • pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.

set_regular_matcher(fg_iou_thresh, bg_iou_thresh, allow_low_quality_matches=True)[source]#

Used for training. Set the torchvision matcher that matches anchors with ground truth boxes.

Parameters
  • fg_iou_thresh (float) – foreground IoU threshold for Matcher, considered as matched if IoU > fg_iou_thresh

  • bg_iou_thresh (float) – background IoU threshold for Matcher, considered as not matched if IoU < bg_iou_thresh

Return type

None

set_sliding_window_inferer(roi_size, sw_batch_size=1, overlap=0.5, mode=BlendMode.CONSTANT, sigma_scale=0.125, padding_mode=PytorchPadMode.CONSTANT, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False)[source]#

Define sliding window inferer and store it to self.inferer.

set_target_keys(box_key, label_key)[source]#

Set keys for the training targets and inference outputs. During training, both box_key and label_key should be keys in the targets when performing self.forward(input_images, targets). During inference, they will be the keys in the output dict of self.forward(input_images).

monai.apps.detection.networks.retinanet_detector.retinanet_resnet50_fpn_detector(num_classes, anchor_generator, returned_layers=(1, 2, 3), pretrained=False, progress=True, **kwargs)[source]#

Returns a RetinaNet detector using a ResNet-50 as backbone, which can be pretrained from Med3D: Transfer Learning for 3D Medical Image Analysis (https://arxiv.org/pdf/1904.00625.pdf).

Parameters
  • num_classes (int) – number of output classes of the model (excluding the background).

  • anchor_generator (AnchorGenerator) – the anchor generator.

  • returned_layers (Sequence[int]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.

  • pretrained (bool) – If True, returns a backbone pre-trained on 23 medical datasets

  • progress (bool) – If True, displays a progress bar of the download to stderr

Return type

RetinaNetDetector

Returns

A RetinaNetDetector object with resnet50 as backbone

Example

# define a naive network
import monai
from monai.apps.detection.networks.retinanet_detector import retinanet_resnet50_fpn_detector

resnet_param = {
    "pretrained": False,
    "spatial_dims": 3,
    "n_input_channels": 2,
    "num_classes": 3,
    "conv1_t_size": 7,
    "conv1_t_stride": (2, 2, 2)
}
returned_layers = [1]
# base_anchor_shapes is a sequence of shapes, so the single (8, 8, 8) shape is wrapped in an outer tuple
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, 2), base_anchor_shapes=((8,) * resnet_param["spatial_dims"],)
)
detector = retinanet_resnet50_fpn_detector(
    **resnet_param, anchor_generator=anchor_generator, returned_layers=returned_layers
)

Transforms#

monai.apps.detection.transforms.box_ops.apply_affine_to_boxes(boxes, affine)[source]#

This function applies affine matrices to the boxes

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • affine (Union[ndarray, Tensor]) – affine matrix to be applied to the box coordinates, sized (spatial_dims+1,spatial_dims+1)

Return type

Union[ndarray, Tensor]

Returns

returned affine transformed boxes, with same data type as boxes, does not share memory with boxes
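
To make the operation concrete, here is a minimal 2D sketch in plain Python. The name affine_boxes_2d is hypothetical and this is not the library code; it assumes the affine contains only scaling, axis flips and translation, so that transforming the two stored corners and re-ordering min/max is enough.

```python
def affine_boxes_2d(boxes, affine):
    """Apply a 3x3 homogeneous affine to [xmin, ymin, xmax, ymax] boxes (illustrative only)."""
    out = []
    for xmin, ymin, xmax, ymax in boxes:
        moved = [
            (
                affine[0][0] * x + affine[0][1] * y + affine[0][2],
                affine[1][0] * x + affine[1][1] * y + affine[1][2],
            )
            for x, y in ((xmin, ymin), (xmax, ymax))
        ]
        xs = [p[0] for p in moved]
        ys = [p[1] for p in moved]
        # A flip can swap which corner is the minimum, so re-order the coordinates.
        out.append([min(xs), min(ys), max(xs), max(ys)])
    return out  # a new list, so the input boxes are left untouched


# Scale x by 2 and shift by 1; flip y and shift by 10.
affine = [[2, 0, 1], [0, -1, 10], [0, 0, 1]]
moved_boxes = affine_boxes_2d([[1, 2, 3, 4]], affine)
```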

monai.apps.detection.transforms.box_ops.convert_box_to_mask(boxes, labels, spatial_size, bg_label=-1, ellipse_mask=False)[source]#

Convert boxes to an int16 mask image, which has the same size as the input image.

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.

  • labels (Union[ndarray, Tensor]) – classification foreground(fg) labels corresponding to boxes, dtype should be int, sized (N,).

  • spatial_size (Union[Sequence[int], int]) – image spatial size.

  • bg_label (int) – background labels for the output mask image, make sure it is smaller than any fg labels.

  • ellipse_mask (bool) –

    bool.

    • If True, it assumes the object shape is close to ellipse or ellipsoid.

    • If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.

    • If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

Return type

Union[ndarray, Tensor]

Returns

  • int16 array, sized (num_box, H, W). Each channel represents a box.

    The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.
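
The output layout (one channel per box, foreground filled with the box label) can be sketched in plain Python for the 2D rectangular case. The name boxes_to_mask_2d is hypothetical; the sketch assumes integer box coordinates, half-open [min, max) extents, and ellipse_mask=False, and it uses nested lists instead of an int16 array.

```python
def boxes_to_mask_2d(boxes, labels, spatial_size, bg_label=-1):
    """Return one channel per box, filled with that box's label (illustrative only)."""
    size0, size1 = spatial_size
    masks = []
    for (xmin, ymin, xmax, ymax), label in zip(boxes, labels):
        channel = [[bg_label] * size1 for _ in range(size0)]
        for i in range(xmin, xmax):      # first spatial axis
            for j in range(ymin, ymax):  # second spatial axis
                channel[i][j] = label
        masks.append(channel)
    return masks


masks = boxes_to_mask_2d([[1, 0, 3, 2]], labels=[5], spatial_size=(4, 3))
```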

monai.apps.detection.transforms.box_ops.convert_mask_to_box(boxes_mask, bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#

Convert an int16 mask image, which has the same size as the input image, to boxes.

Parameters
  • boxes_mask (Union[ndarray, Tensor]) – int16 array, sized (num_box, H, W). Each channel represents a box. The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.

  • bg_label (int) – background labels for the boxes_mask

  • box_dtype – output dtype for boxes

  • label_dtype – output dtype for labels

Return type

Tuple[Union[ndarray, Tensor], Union[ndarray, Tensor]]

Returns

  • bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.

  • classification foreground(fg) labels, dtype should be int, sized (N,).
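
The reverse direction can be sketched the same way: each channel yields the tightest box around its non-background entries. The name mask_to_boxes_2d is hypothetical; the sketch assumes half-open [min, max) extents and skips channels with no foreground, which is an assumption rather than documented behaviour.

```python
def mask_to_boxes_2d(boxes_mask, bg_label=-1):
    """Recover [xmin, ymin, xmax, ymax] boxes and labels from per-box channels (illustrative only)."""
    boxes, labels = [], []
    for channel in boxes_mask:
        fg = [
            (i, j, value)
            for i, row in enumerate(channel)
            for j, value in enumerate(row)
            if value != bg_label
        ]
        if not fg:
            continue  # assumption: a channel without foreground produces no box
        xs = [i for i, _, _ in fg]
        ys = [j for _, j, _ in fg]
        boxes.append([min(xs), min(ys), max(xs) + 1, max(ys) + 1])
        labels.append(fg[0][2])  # the foreground intensity is the label
    return boxes, labels


mask = [[[-1, -1, -1], [5, 5, -1], [5, 5, -1], [-1, -1, -1]]]
boxes, labels = mask_to_boxes_2d(mask)
```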

monai.apps.detection.transforms.box_ops.flip_boxes(boxes, spatial_size, flip_axes=None)[source]#

Flip boxes when the corresponding image is flipped

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • spatial_size (Union[Sequence[int], int]) – image spatial size.

  • flip_axes (Union[Sequence[int], int, None]) – spatial axes along which to flip over. Default is None, which flips over all of the axes of the input array. If an axis is negative, it counts from the last to the first axis. If flip_axes is a tuple of ints, flipping is performed on all of the axes specified in the tuple.

Returns

flipped boxes, with same data type as boxes, does not share memory with boxes
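
A minimal 2D sketch of the coordinate arithmetic behind a flip is shown below. The name flip_boxes_2d is hypothetical, and the sketch assumes continuous coordinates (no one-pixel offset), so a coordinate c along a flipped axis maps to size - c and the min/max roles swap.

```python
def flip_boxes_2d(boxes, spatial_size, flip_axes=(0, 1)):
    """Flip [xmin, ymin, xmax, ymax] boxes along the given spatial axes (illustrative only)."""
    flipped = []
    for box in boxes:
        new_box = list(box)  # copy, so the input boxes are not modified
        for axis in flip_axes:
            old_min, old_max = box[axis], box[axis + 2]
            # After mirroring, the old maximum becomes the new minimum and vice versa.
            new_box[axis] = spatial_size[axis] - old_max
            new_box[axis + 2] = spatial_size[axis] - old_min
        flipped.append(new_box)
    return flipped


one_axis = flip_boxes_2d([[1, 2, 4, 6]], spatial_size=(10, 20), flip_axes=(0,))
both_axes = flip_boxes_2d([[1, 2, 4, 6]], spatial_size=(10, 20))
```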

monai.apps.detection.transforms.box_ops.resize_boxes(boxes, src_spatial_size, dst_spatial_size)[source]#

Resize boxes when the corresponding image is resized

Parameters
  • boxes (Union[ndarray, Tensor]) – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • src_spatial_size (Union[Sequence[int], int]) – source image spatial size.

  • dst_spatial_size (Union[Sequence[int], int]) – target image spatial size.

Returns

resized boxes, with same data type as boxes, does not share memory with boxes

Example

import torch
from monai.apps.detection.transforms.box_ops import resize_boxes

boxes = torch.ones(1,4)
src_spatial_size = [100, 100]
dst_spatial_size = [128, 256]
resize_boxes(boxes, src_spatial_size, dst_spatial_size) #  will return tensor([[1.28, 2.56, 1.28, 2.56]])

monai.apps.detection.transforms.box_ops.rot90_boxes(boxes, spatial_size, k=1, axes=(0, 1))[source]#

Rotate boxes by 90 degrees in the plane specified by axes. Rotation direction is from the first towards the second axis.

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • spatial_size (Union[Sequence[int], int]) – image spatial size.

  • k (int) – number of times the array is rotated by 90 degrees.

  • axes (Tuple[int, int]) – (2,) array_like The array is rotated in the plane defined by the axes. Axes must be different.

Returns

A rotated view of boxes.

Notes

rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is the reverse of rot90_boxes(boxes, spatial_size, k=1, axes=(0,1)).

rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is equivalent to rot90_boxes(boxes, spatial_size, k=-1, axes=(0,1)).

monai.apps.detection.transforms.box_ops.select_labels(labels, keep)[source]#

For each element in labels, select the entries at the indices given by keep.

Parameters
  • labels (Union[Sequence[Union[ndarray, Tensor]], ndarray, Tensor]) – Sequence of array. Each element represents classification labels or scores corresponding to boxes, sized (N,).

  • keep (Union[ndarray, Tensor]) – the indices to keep, same length with each element in labels.

Return type

Union[Tuple, ndarray, Tensor]

Returns

selected labels, does not share memory with original labels.

monai.apps.detection.transforms.box_ops.swapaxes_boxes(boxes, axis1, axis2)[source]#

Interchange two axes of boxes.

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • axis1 (int) – First axis.

  • axis2 (int) – Second axis.

Returns

boxes with two axes interchanged.

monai.apps.detection.transforms.box_ops.zoom_boxes(boxes, zoom)[source]#

Zoom boxes

Parameters
  • boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

Returns

zoomed boxes, with same data type as boxes, does not share memory with boxes

Example

import torch
from monai.apps.detection.transforms.box_ops import zoom_boxes
boxes = torch.ones(1,4)
zoom_boxes(boxes, zoom=[0.5,2.2]) #  will return tensor([[0.5, 2.2, 0.5, 2.2]])

A collection of “vanilla” transforms for box operations. See https://github.com/Project-MONAI/MONAI/wiki/MONAI_Design

class monai.apps.detection.transforms.array.AffineBox[source]#

Applies affine matrix to the boxes

class monai.apps.detection.transforms.array.BoxToMask(bg_label=-1, ellipse_mask=False)[source]#

Convert boxes to an int16 mask image, which has the same size as the input image.

Parameters
  • bg_label (int) – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.

  • ellipse_mask (bool) –

    bool.

    • If True, it assumes the object shape is close to ellipse or ellipsoid.

    • If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.

    • If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

class monai.apps.detection.transforms.array.ClipBoxToImage(remove_empty=False)[source]#

Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple arrays of labels/scores associated with one array of boxes.

Parameters

remove_empty (bool) – whether to remove the boxes and corresponding labels that are actually empty
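
The clipping rule can be sketched in plain Python for 2D boxes with a single label list. The name clip_boxes_2d is hypothetical; the sketch assumes StandardMode boxes, an image extent of [0, size] along each axis, and that a box counts as empty when any side has non-positive length after clipping.

```python
def clip_boxes_2d(boxes, labels, spatial_size, remove_empty=True):
    """Clip [xmin, ymin, xmax, ymax] boxes to the image and filter labels alike (illustrative only)."""
    kept_boxes, kept_labels = [], []
    for box, label in zip(boxes, labels):
        # Coordinate k belongs to spatial axis k % 2.
        clipped = [min(max(box[k], 0), spatial_size[k % 2]) for k in range(4)]
        is_empty = clipped[2] <= clipped[0] or clipped[3] <= clipped[1]
        if remove_empty and is_empty:
            continue  # drop the box together with its label
        kept_boxes.append(clipped)
        kept_labels.append(label)
    return kept_boxes, kept_labels


clipped_boxes, clipped_labels = clip_boxes_2d(
    [[-2, 3, 5, 12], [11, 1, 15, 4]], labels=[1, 2], spatial_size=(10, 10)
)
```

The second box lies entirely outside the image, so it collapses to zero width and is removed along with its label.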

class monai.apps.detection.transforms.array.ConvertBoxMode(src_mode=None, dst_mode=None)[source]#

This transform converts the boxes in src_mode to the dst_mode.

Parameters
  • src_mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode().

  • dst_mode (Union[str, BoxMode, Type[BoxMode], None]) – target box mode. If it is not given, this func will assume it is StandardMode().

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” for 2D and “xyzxyz” for 3D.

src_mode and dst_mode can be:
  1. str: choose from BoxModeName, for example,
    • “xyxy”: boxes has format [xmin, ymin, xmax, ymax]

    • “xyzxyz”: boxes has format [xmin, ymin, zmin, xmax, ymax, zmax]

    • “xxyy”: boxes has format [xmin, xmax, ymin, ymax]

    • “xxyyzz”: boxes has format [xmin, xmax, ymin, ymax, zmin, zmax]

    • “xyxyzz”: boxes has format [xmin, ymin, xmax, ymax, zmin, zmax]

    • “xywh”: boxes has format [xmin, ymin, xsize, ysize]

    • “xyzwhd”: boxes has format [xmin, ymin, zmin, xsize, ysize, zsize]

    • “ccwh”: boxes has format [xcenter, ycenter, xsize, ysize]

    • “cccwhd”: boxes has format [xcenter, ycenter, zcenter, xsize, ysize, zsize]

  2. BoxMode class: choose from the subclasses of BoxMode, for example,
    • CornerCornerModeTypeA: equivalent to “xyxy” or “xyzxyz”

    • CornerCornerModeTypeB: equivalent to “xxyy” or “xxyyzz”

    • CornerCornerModeTypeC: equivalent to “xyxy” or “xyxyzz”

    • CornerSizeMode: equivalent to “xywh” or “xyzwhd”

    • CenterSizeMode: equivalent to “ccwh” or “cccwhd”

  3. BoxMode object: choose from the subclasses of BoxMode, for example,
    • CornerCornerModeTypeA(): equivalent to “xyxy” or “xyzxyz”

    • CornerCornerModeTypeB(): equivalent to “xxyy” or “xxyyzz”

    • CornerCornerModeTypeC(): equivalent to “xyxy” or “xyxyzz”

    • CornerSizeMode(): equivalent to “xywh” or “xyzwhd”

    • CenterSizeMode(): equivalent to “ccwh” or “cccwhd”

  4. None: will assume mode is StandardMode()
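
The formats listed above differ only in coordinate order and in whether extents are stored as corners or sizes. The plain-Python helpers below (hypothetical names, 2D only, not the library code) sketch three of the conversions, assuming continuous coordinates where size equals max minus min.

```python
def xyxy_to_ccwh(box):
    """[xmin, ymin, xmax, ymax] -> [xcenter, ycenter, xsize, ysize]."""
    xmin, ymin, xmax, ymax = box
    return [(xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin]


def xxyy_to_xyxy(box):
    """[xmin, xmax, ymin, ymax] -> [xmin, ymin, xmax, ymax] (a pure reordering)."""
    xmin, xmax, ymin, ymax = box
    return [xmin, ymin, xmax, ymax]


def xywh_to_xyxy(box):
    """[xmin, ymin, xsize, ysize] -> [xmin, ymin, xmax, ymax]."""
    xmin, ymin, xsize, ysize = box
    return [xmin, ymin, xmin + xsize, ymin + ysize]


center_size = xyxy_to_ccwh([2, 4, 6, 10])
```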

Example

import torch
from monai.apps.detection.transforms.array import ConvertBoxMode

boxes = torch.ones(10,4)
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxMode(src_mode="xyxy", dst_mode="ccwh")
box_converter(boxes)

class monai.apps.detection.transforms.array.ConvertBoxToStandardMode(mode=None)[source]#

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Parameters

mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

Example

import torch
from monai.apps.detection.transforms.array import ConvertBoxToStandardMode

boxes = torch.ones(10,6)
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardMode(mode="xxyyzz")
box_converter(boxes)

class monai.apps.detection.transforms.array.FlipBox(spatial_axis=None)[source]#

Reverses the box coordinates along the given spatial axis. Preserves shape.

Parameters

spatial_axis (Union[Sequence[int], int, None]) – spatial axes along which to flip over. Default is None, which flips over all of the axes of the input array. If an axis is negative, it counts from the last to the first axis. If spatial_axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.

class monai.apps.detection.transforms.array.MaskToBox(bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#

Convert an int16 mask image, which has the same size as the input image, to boxes. Pairs with monai.apps.detection.transforms.array.BoxToMask. Please make sure the same bg_label is used when using the two transforms in pairs.

Parameters
  • bg_label (int) – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.

  • box_dtype – output dtype for boxes

  • label_dtype – output dtype for labels

class monai.apps.detection.transforms.array.ResizeBox(spatial_size, size_mode='all', **kwargs)[source]#

Resize the input boxes when the corresponding image is resized to given spatial size (with scaling, not cropping/padding).

Parameters
  • spatial_size (Union[Sequence[int], int]) – expected shape of spatial dimensions after resize operation. if some components of the spatial_size are non-positive values, the transform will use the corresponding components of img size. For example, spatial_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.

  • size_mode (str) – should be “all” or “longest”, if “all”, will use spatial_size for all the spatial dims, if “longest”, rescale the image so that only the longest side is equal to specified spatial_size, which must be an int number in this case, keeping the aspect ratio of the initial image, refer to: https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/ #albumentations.augmentations.geometric.resize.LongestMaxSize.

  • kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.
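
How spatial_size and size_mode determine the target size can be sketched as follows. The function name resized_spatial_size is hypothetical and the rounding in the “longest” branch is an assumption; boxes would then be scaled per axis by the ratio of target size to source size.

```python
def resized_spatial_size(img_size, spatial_size, size_mode="all"):
    """Compute the target spatial size for a resize (illustrative only)."""
    if size_mode == "all":
        if isinstance(spatial_size, int):
            sizes = (spatial_size,) * len(img_size)
        else:
            sizes = tuple(spatial_size)
        # Non-positive components fall back to the corresponding image size.
        return tuple(s if s > 0 else i for s, i in zip(sizes, img_size))
    # "longest": scale every axis so that the longest side equals spatial_size.
    scale = spatial_size / max(img_size)
    return tuple(int(round(s * scale)) for s in img_size)


all_mode = resized_spatial_size((48, 64), (32, -1))                  # second axis keeps the image size
longest_mode = resized_spatial_size((100, 50), 64, size_mode="longest")  # aspect ratio is preserved
```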

class monai.apps.detection.transforms.array.RotateBox90(k=1, spatial_axes=(0, 1))[source]#

Rotate boxes by 90 degrees in the plane specified by axes. See box_ops.rot90_boxes for additional details.

Parameters
  • k (int) – number of times to rotate by 90 degrees.

  • spatial_axes (Tuple[int, int]) – 2 int numbers that define the plane to rotate with 2 spatial axes. Default: (0, 1), the first two axes in the spatial dimensions. If an axis is negative, it counts from the last to the first axis.

class monai.apps.detection.transforms.array.SpatialCropBox(roi_center=None, roi_size=None, roi_start=None, roi_end=None, roi_slices=None)[source]#

General purpose box cropper when the corresponding image is cropped by SpatialCrop(*) with the same ROI. The difference is that we do not support negative indexing for roi_slices.

If a dimension of the expected ROI size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected ROI, and the cropped results of several images may not have exactly the same shape. Cropping of ND spatial boxes is supported.

The cropped region can be parameterised in various ways:
  • a list of slices for each spatial dimension (do not allow for use of negative indexing)

  • a spatial center and size

  • the start and end coordinates of the ROI

Parameters
  • roi_center (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for center of the crop ROI.

  • roi_size (Union[Sequence[int], ndarray, Tensor, None]) – size of the crop ROI, if a dimension of ROI size is bigger than image size, will not crop that dimension of the image.

  • roi_start (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for start of the crop ROI.

  • roi_end (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for end of the crop ROI, if a coordinate is out of image, use the end coordinate of image.

  • roi_slices (Optional[Sequence[slice]]) – list of slices for each of the spatial dimensions.
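
The effect of a crop on box coordinates can be sketched in 2D with the ROI given by its start and end coordinates. The name crop_boxes_2d is hypothetical, and dropping boxes that become empty is an assumption of this sketch rather than documented behaviour.

```python
def crop_boxes_2d(boxes, roi_start, roi_end):
    """Express [xmin, ymin, xmax, ymax] boxes in the coordinates of a cropped image (illustrative only)."""
    roi_size = [end - start for start, end in zip(roi_start, roi_end)]
    cropped = []
    for box in boxes:
        # Shift into the ROI frame, then clip to the ROI extent.
        shifted = [box[k] - roi_start[k % 2] for k in range(4)]
        clipped = [min(max(shifted[k], 0), roi_size[k % 2]) for k in range(4)]
        if clipped[2] > clipped[0] and clipped[3] > clipped[1]:
            cropped.append(clipped)  # keep only boxes that still overlap the ROI
    return cropped


cropped = crop_boxes_2d([[5, 5, 15, 15], [30, 30, 40, 40]], roi_start=(10, 0), roi_end=(20, 20))
```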

class monai.apps.detection.transforms.array.ZoomBox(zoom, keep_size=False, **kwargs)[source]#

Zooms an ND Box with same padding or slicing setting with Zoom().

Parameters
  • zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

  • keep_size (bool) – Should keep original size (padding/slicing if needed), default is False.

  • kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.

A collection of dictionary-based wrappers around the “vanilla” transforms for box operations defined in monai.apps.detection.transforms.array.

Class names end with ‘d’ to denote dictionary-based transforms.

monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateD#

alias of AffineBoxToImageCoordinated

monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateDict#

alias of AffineBoxToImageCoordinated

class monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinated(box_keys, box_ref_image_keys, allow_missing_keys=False, image_meta_key=None, image_meta_key_postfix='meta_dict', affine_lps_to_ras=False)[source]#

Dictionary-based transform that converts boxes from world coordinates to image coordinates.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (str) – The single key that represents the reference image to which box_keys are attached.

  • remove_empty – whether to remove the boxes that are actually empty

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

  • image_meta_key (Optional[str]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, affine, original_shape, etc. it is a string, map to the box_ref_image_key. if None, will try to construct meta_keys by box_ref_image_key_{meta_key_postfix}.

  • image_meta_key_postfix (Optional[str]) – if image_meta_keys=None, use box_ref_image_key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.

  • affine_lps_to_ras – default False. Yet if 1) the image is read by ITKReader, and 2) the ITKReader has affine_lps_to_ras=True, and 3) the box is in world coordinate, then set affine_lps_to_ras=True.
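
For intuition, the world-to-image conversion reduces to a per-axis shift and scale when the image affine holds only voxel spacing and origin. The sketch below makes exactly that assumption (no rotation, positive spacing); the name world_to_image_box and its arguments are hypothetical, and the transform itself reads the full affine from the image metadata.

```python
def world_to_image_box(box, spacing, origin):
    """Map a StandardMode box from world to image coordinates (illustrative only)."""
    n = len(spacing)  # number of spatial dimensions
    # Coordinate k belongs to spatial axis k % n; invert world = origin + spacing * index.
    return [(box[k] - origin[k % n]) / spacing[k % n] for k in range(2 * n)]


image_box = world_to_image_box(
    [10.0, 20.0, 30.0, 14.0, 28.0, 42.0], spacing=(2.0, 2.0, 4.0), origin=(0.0, 0.0, 10.0)
)
```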

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.BoxToMaskD#

alias of BoxToMaskd

monai.apps.detection.transforms.dictionary.BoxToMaskDict#

alias of BoxToMaskd

class monai.apps.detection.transforms.dictionary.BoxToMaskd(box_keys, box_mask_keys, label_keys, box_ref_image_keys, min_fg_label, ellipse_mask=False, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.BoxToMask. Pairs with monai.apps.detection.transforms.dictionary.MaskToBoxd. Please make sure the same min_fg_label is used when using the two transforms in pairs. The output d[box_mask_key] will have background intensity 0, since the following operations may pad 0 on the border.

This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.

  1. use BoxToMaskd to convert boxes and labels to box_masks;

  2. do transforms, e.g., rotation or cropping, on images and box_masks together;

  3. use MaskToBoxd to convert box_masks back to boxes and labels.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • min_fg_label (int) – min foreground box label.

  • ellipse_mask (bool) –

    bool.

    • If True, it assumes the object shape is close to ellipse or ellipsoid.

    • If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.

    • If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and image together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(keys=["image","box_mask"],mode=["nearest","nearest"],
            prob=0.2,range_x=np.pi/6,range_y=np.pi/6,range_z=np.pi/6,
            keep_size=True,padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image","box_mask"],roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)

monai.apps.detection.transforms.dictionary.ClipBoxToImageD#

alias of ClipBoxToImaged

monai.apps.detection.transforms.dictionary.ClipBoxToImageDict#

alias of ClipBoxToImaged

class monai.apps.detection.transforms.dictionary.ClipBoxToImaged(box_keys, label_keys, box_ref_image_keys, remove_empty=True, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ClipBoxToImage.

Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple keys of labels/scores associated with one key of boxes.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – The single key that represents the reference image to which box_keys and label_keys are attached.

  • remove_empty (bool) – whether to remove the boxes that are actually empty

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

from monai.apps.detection.transforms.dictionary import ClipBoxToImaged

ClipBoxToImaged(
    box_keys="boxes", box_ref_image_keys="image", label_keys=["labels", "scores"], remove_empty=True
)

monai.apps.detection.transforms.dictionary.ConvertBoxModeD#

alias of ConvertBoxModed

monai.apps.detection.transforms.dictionary.ConvertBoxModeDict#

alias of ConvertBoxModed

class monai.apps.detection.transforms.dictionary.ConvertBoxModed(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxMode.

This transform converts the boxes in src_mode to the dst_mode.

Example

import torch
from monai.apps.detection.transforms.dictionary import ConvertBoxModed

data = {"boxes": torch.ones(10,4)}
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxModed(box_keys=["boxes"], src_mode="xyxy", dst_mode="ccwh")
box_converter(data)

__init__(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#
Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.

  • src_mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

  • dst_mode (Union[str, BoxMode, Type[BoxMode], None]) – target box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.ConvertBoxMode

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeD#

alias of ConvertBoxToStandardModed

monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeDict#

alias of ConvertBoxToStandardModed

class monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModed(box_keys, mode=None, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxToStandardMode.

Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].

Example

import torch
from monai.apps.detection.transforms.dictionary import ConvertBoxToStandardModed

data = {"boxes": torch.ones(10,6)}
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardModed(box_keys=["boxes"], mode="xxyyzz")
box_converter(data)

__init__(box_keys, mode=None, allow_missing_keys=False)[source]#
Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.

  • mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format as src_mode in ConvertBoxMode.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

See also monai.apps.detection.transforms.array.ConvertBoxToStandardMode

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Union[ndarray, Tensor]]

monai.apps.detection.transforms.dictionary.FlipBoxD#

alias of FlipBoxd

monai.apps.detection.transforms.dictionary.FlipBoxDict#

alias of FlipBoxd

class monai.apps.detection.transforms.dictionary.FlipBoxd(image_keys, box_keys, box_ref_image_keys, spatial_axis=None, allow_missing_keys=False)[source]#

Dictionary-based transform that flips boxes and images.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • spatial_axis (Union[Sequence[int], int, None]) – Spatial axes along which to flip over. Default is None.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.MaskToBoxD#

alias of MaskToBoxd

monai.apps.detection.transforms.dictionary.MaskToBoxDict#

alias of MaskToBoxd

class monai.apps.detection.transforms.dictionary.MaskToBoxd(box_keys, box_mask_keys, label_keys, min_fg_label, box_dtype=torch.float32, label_dtype=torch.int64, allow_missing_keys=False)[source]#

Dictionary-based wrapper of monai.apps.detection.transforms.array.MaskToBox. Pairs with monai.apps.detection.transforms.dictionary.BoxToMaskd. Please make sure the same min_fg_label is used when using the two transforms in pairs.

This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.

  1. use BoxToMaskd to convert boxes and labels to box_masks;

  2. do transforms, e.g., rotation or cropping, on images and box_masks together;

  3. use MaskToBoxd to convert box_masks back to boxes and labels.

Parameters
  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.

  • min_fg_label (int) – min foreground box label.

  • box_dtype – output dtype for box_keys

  • label_dtype – output dtype for label_keys

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

Example

# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and images together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(keys=["image","box_mask"],mode=["nearest","nearest"],
            prob=0.2,range_x=np.pi/6,range_y=np.pi/6,range_z=np.pi/6,
            keep_size=True,padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image","box_mask"],roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)

monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelD#

alias of RandCropBoxByPosNegLabeld

monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelDict#

alias of RandCropBoxByPosNegLabeld

class monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabeld(image_keys, box_keys, label_keys, spatial_size, pos=1.0, neg=1.0, num_samples=1, whole_box=True, thresh_image_key=None, image_threshold=0.0, fg_indices_key=None, bg_indices_key=None, meta_keys=None, meta_key_postfix='meta_dict', allow_smaller=False, allow_missing_keys=False)[source]#

Crop random fixed-size regions that contain foreground boxes. All the fields specified by image_keys are assumed to have the same shape, and patch_index is added to the corresponding metadata. Returns a list of dictionaries for all the cropped images. If a dimension of the expected spatial size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected size, and the cropped results of several images may not have exactly the same shape.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation. They need to have the same spatial size.

  • box_keys (str) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.

  • label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.

  • spatial_size (Union[Sequence[int], int]) – the spatial size of the crop region e.g. [224, 224, 128]. if a dimension of ROI size is bigger than image size, will not crop that dimension of the image. if its components have non-positive values, the corresponding size of data[label_key] will be used. for example: if the spatial size of input data is [40, 40, 40] and spatial_size=[32, 64, -1], the spatial size of output data will be [32, 40, 40].

  • pos (float) – used with neg together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.

  • neg (float) – used with pos together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.

  • num_samples (int) – number of samples (crop regions) to take in each list.

  • whole_box (bool) – Bool, default True, whether we prefer to contain at least one whole box in the cropped foreground patch. Even if True, it is still possible to get partial box if there are multiple boxes in the image.

  • thresh_image_key (Optional[str]) – if thresh_image_key is not None, use label == 0 & thresh_image > image_threshold to select the negative sample (background) center, so that the crop center only falls within the valid image area.

  • image_threshold (float) – if enabled thresh_image_key, use thresh_image > image_threshold to determine the valid image content area.

  • fg_indices_key (Optional[str]) – if provided pre-computed foreground indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.

  • bg_indices_key (Optional[str]) – if provided pre-computed background indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.

  • meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. used to add patch_index to the meta dict. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.

  • meta_key_postfix (str) – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. used to add patch_index to the meta dict.

  • allow_smaller (bool) – if False, an exception will be raised if the image is smaller than the requested ROI in any dimension. If True, any smaller dimensions will be set to match the cropped size (i.e., no cropping in that dimension).

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
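
The spatial_size and pos / neg rules described above can be sketched in plain Python. This is only an illustration of the documented behaviour, not MONAI's implementation; resolve_crop_size and foreground_probability are hypothetical helper names introduced for this example.

```python
from typing import List, Sequence


def resolve_crop_size(spatial_size: Sequence[int], image_size: Sequence[int]) -> List[int]:
    """Hypothetical helper: effective crop size for each spatial dimension."""
    resolved = []
    for roi, img in zip(spatial_size, image_size):
        if roi <= 0:
            # Non-positive component: fall back to the size of the input data.
            resolved.append(img)
        else:
            # ROI larger than the image: that dimension is left uncropped.
            resolved.append(min(roi, img))
    return resolved


def foreground_probability(pos: float, neg: float) -> float:
    """Probability of picking a foreground voxel as the crop center."""
    return pos / (pos + neg)


crop_size = resolve_crop_size([32, 64, -1], [40, 40, 40])
fg_prob = foreground_probability(pos=1.0, neg=1.0)
print(crop_size)  # [32, 40, 40]
print(fg_prob)    # 0.5
```

With pos=3.0 and neg=1.0, the same formula gives a foreground probability of 0.75.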

randomize(boxes, image_size, fg_indices=None, bg_indices=None, thresh_image=None)[source]#

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that errors in synchronizing the random state are easier to identify.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

monai.apps.detection.transforms.dictionary.RandFlipBoxD#

alias of RandFlipBoxd

monai.apps.detection.transforms.dictionary.RandFlipBoxDict#

alias of RandFlipBoxd

class monai.apps.detection.transforms.dictionary.RandFlipBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, spatial_axis=None, allow_missing_keys=False)[source]#

Dictionary-based transform that randomly flips boxes and images with the given probability.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • prob (float) – Probability of flipping.

  • spatial_axis (Union[Sequence[int], int, None]) – Spatial axes along which to flip over. Default is None.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
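
As a rough illustration of what flipping a StandardMode box means geometrically, the sketch below mirrors one box along one spatial axis. It assumes continuous coordinates (the old maximum maps to size minus itself and becomes the new minimum); flip_box is a hypothetical helper, not a MONAI function, and the library may handle details differently.

```python
from typing import List, Sequence


def flip_box(box: Sequence[float], spatial_size: Sequence[int], axis: int) -> List[float]:
    """Hypothetical helper: flip one [mins..., maxs...] box along a spatial axis."""
    dims = len(spatial_size)
    flipped = list(box)
    # After mirroring, the old maximum becomes the new minimum and vice versa.
    flipped[axis] = spatial_size[axis] - box[axis + dims]
    flipped[axis + dims] = spatial_size[axis] - box[axis]
    return flipped


box = [1.0, 2.0, 3.0, 5.0]  # [xmin, ymin, xmax, ymax]
flipped = flip_box(box, spatial_size=(10, 20), axis=1)
print(flipped)  # [1.0, 15.0, 3.0, 18.0]
```

Flipping twice along the same axis returns the original box, which is a quick sanity check for any flip implementation.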

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

set_random_state(seed=None, state=None)[source]#

Set the random state locally, to control the randomness, the derived classes should use self.R instead of np.random to introduce random factors.

Parameters
  • seed (Optional[int]) – set the random state with an integer seed.

  • state (Optional[RandomState]) – set the random state with a np.random.RandomState object.

Raises

TypeError – When state is not an Optional[np.random.RandomState].

Return type

RandFlipBoxd

Returns

a Randomizable instance.

monai.apps.detection.transforms.dictionary.RandRotateBox90D#

alias of RandRotateBox90d

monai.apps.detection.transforms.dictionary.RandRotateBox90Dict#

alias of RandRotateBox90d

class monai.apps.detection.transforms.dictionary.RandRotateBox90d(image_keys, box_keys, box_ref_image_keys, prob=0.1, max_k=3, spatial_axes=(0, 1), allow_missing_keys=False)[source]#

With probability prob, input boxes and images are rotated by 90 degrees in the plane specified by spatial_axes.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • prob (float) – probability of rotating. (Default 0.1, with 10% probability it returns a rotated array.)

  • max_k (int) – number of rotations will be sampled from np.random.randint(max_k) + 1. (Default 3)

  • spatial_axes (Tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), which are the first two axes in spatial dimensions.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
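
The prob and max_k parameters can be read as a two-step random draw: first decide whether to rotate at all, then pick the number of 90-degree rotations from 1 to max_k. The sketch below uses Python's random module in place of np.random purely for illustration; sample_rotation is a hypothetical helper and returning 0 for "no rotation" is an assumption of this example.

```python
import random


def sample_rotation(prob: float, max_k: int, rng: random.Random) -> int:
    """Hypothetical helper: return 0 (no rotation) or a count in 1..max_k."""
    if rng.random() >= prob:
        return 0
    # Mirrors randint(max_k) + 1: an integer from 1 to max_k inclusive.
    return rng.randrange(max_k) + 1


rng = random.Random(0)
samples = [sample_rotation(prob=0.1, max_k=3, rng=rng) for _ in range(1000)]
always = [sample_rotation(prob=1.0, max_k=3, rng=rng) for _ in range(100)]
```

Here samples mostly contains 0 because prob is 0.1, while always contains only values from 1 to 3.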

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.RandZoomBoxD#

alias of RandZoomBoxd

monai.apps.detection.transforms.dictionary.RandZoomBoxDict#

alias of RandZoomBoxd

class monai.apps.detection.transforms.dictionary.RandZoomBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, min_zoom=0.9, max_zoom=1.1, mode=InterpolateMode.AREA, padding_mode=NumpyPadMode.EDGE, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#

Dictionary-based transform that randomly zooms input boxes and images with given probability within given zoom range.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • prob (float) – Probability of zooming.

  • min_zoom (Union[Sequence[float], float]) – Min zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, min_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.

  • max_zoom (Union[Sequence[float], float]) – Max zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, max_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.

  • mode (Union[Sequence[str], str]) – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key in keys.

  • padding_mode (Union[Sequence[str], str]) – The mode to pad data after zooming. Available modes for numpy array: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}. Available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user-supplied function. Defaults to "edge". See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • align_corners (Union[Sequence[Optional[bool]], bool, None]) – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in keys.

  • keep_size (bool) – Should keep original size (pad if needed), default is True.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

  • kwargs – other args for np.pad API, note that np.pad treats channel dimension as the first dimension. more details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

set_random_state(seed=None, state=None)[source]#

Set the random state locally, to control the randomness, the derived classes should use self.R instead of np.random to introduce random factors.

Parameters
  • seed (Optional[int]) – set the random state with an integer seed.

  • state (Optional[RandomState]) – set the random state with a np.random.RandomState object.

Raises

TypeError – When state is not an Optional[np.random.RandomState].

Return type

RandZoomBoxd

Returns

a Randomizable instance.

monai.apps.detection.transforms.dictionary.RotateBox90D#

alias of RotateBox90d

monai.apps.detection.transforms.dictionary.RotateBox90Dict#

alias of RotateBox90d

class monai.apps.detection.transforms.dictionary.RotateBox90d(image_keys, box_keys, box_ref_image_keys, k=1, spatial_axes=(0, 1), allow_missing_keys=False)[source]#

Input boxes and images are rotated by 90 degrees k times in the plane specified by spatial_axes.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • k (int) – number of times to rotate by 90 degrees.

  • spatial_axes (Tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), which are the first two axes in spatial dimensions.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.
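
A 90-degree rotation of a box can be sketched as a flip along the second axis of the plane followed by a swap of the two axes, which is the same decomposition np.rot90 uses for k=1. The helper below is hypothetical and assumes continuous StandardMode coordinates; it is meant to show the geometry, not to reproduce MONAI's code.

```python
from typing import List, Sequence, Tuple


def rotate_box_90(
    box: Sequence[float],
    spatial_size: Sequence[int],
    k: int = 1,
    axes: Tuple[int, int] = (0, 1),
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: rotate a [mins..., maxs...] box k times by 90 degrees."""
    dims = len(spatial_size)
    out, size = list(box), list(spatial_size)
    a0, a1 = axes
    for _ in range(k % 4):
        # Step 1: flip along the second axis of the rotation plane.
        lo, hi = out[a1], out[a1 + dims]
        out[a1], out[a1 + dims] = size[a1] - hi, size[a1] - lo
        # Step 2: swap the two axes (box coordinates and image size).
        out[a0], out[a1] = out[a1], out[a0]
        out[a0 + dims], out[a1 + dims] = out[a1 + dims], out[a0 + dims]
        size[a0], size[a1] = size[a1], size[a0]
    return out, size


rotated, new_size = rotate_box_90([1, 2, 3, 5], (10, 20), k=1)
print(rotated, new_size)  # [15, 1, 18, 3] [20, 10]
```

Four successive rotations bring both the box and the image size back to their starting values.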

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

monai.apps.detection.transforms.dictionary.ZoomBoxD#

alias of ZoomBoxd

monai.apps.detection.transforms.dictionary.ZoomBoxDict#

alias of ZoomBoxd

class monai.apps.detection.transforms.dictionary.ZoomBoxd(image_keys, box_keys, box_ref_image_keys, zoom, mode=InterpolateMode.AREA, padding_mode=NumpyPadMode.EDGE, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#

Dictionary-based transform that zooms input boxes and images with the given zoom scale.

Parameters
  • image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.

  • box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.

  • box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.

  • zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.

  • mode (Union[Sequence[str], str]) – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key in keys.

  • padding_mode (Union[Sequence[str], str]) – The mode to pad data after zooming. Available modes for numpy array: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}. Available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user-supplied function. Defaults to "edge". See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • align_corners (Union[Sequence[Optional[bool]], bool, None]) – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in keys.

  • keep_size (bool) – Should keep original size (pad if needed), default is True.

  • allow_missing_keys (bool) – don’t raise exception if key is missing.

  • kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.
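
To make the zoom parameter concrete, the sketch below scales box coordinates by a zoom factor, either a single float for all axes or one value for each spatial axis. It deliberately ignores the padding or cropping that keep_size=True implies, which is a simplifying assumption; zoom_box is a hypothetical helper, not part of MONAI.

```python
from typing import List, Sequence, Union


def zoom_box(box: Sequence[float], zoom: Union[float, Sequence[float]]) -> List[float]:
    """Hypothetical helper: scale a [mins..., maxs...] box by per-axis zoom factors."""
    dims = len(box) // 2
    factors = [float(zoom)] * dims if isinstance(zoom, (int, float)) else list(zoom)
    if len(factors) != dims:
        raise ValueError("zoom must be a float or contain one value per spatial axis")
    # Coordinate i belongs to spatial axis i % dims (mins first, then maxs).
    return [coord * factors[i % dims] for i, coord in enumerate(box)]


uniform = zoom_box([1, 2, 3, 5], 2.0)
per_axis = zoom_box([1, 2, 3, 5], (2.0, 0.5))
print(uniform)   # [2.0, 4.0, 6.0, 10.0]
print(per_axis)  # [2.0, 1.0, 6.0, 2.5]
```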

inverse(data)[source]#

Inverse of __call__.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

Dict[Hashable, Tensor]

Anchor#

This script is adapted from https://github.com/pytorch/vision/blob/release/0.12/torchvision/models/detection/anchor_utils.py

class monai.apps.detection.utils.anchor_utils.AnchorGenerator(sizes=((20, 30, 40),), aspect_ratios=(((0.5, 1), (1, 0.5)),), indexing='ij')[source]#

This module is modified from torchvision to support both 2D and 3D images.

Module that generates anchors for a set of feature maps and image sizes.

The module supports computing anchors at multiple sizes and aspect ratios per feature map.

sizes and aspect_ratios should have the same number of elements, and it should correspond to the number of feature maps.

sizes[i] and aspect_ratios[i] can have an arbitrary number of elements. For 2D images, anchor width and height w:h = 1:aspect_ratios[i,j]. For 3D images, anchor width, height, and depth w:h:d = 1:aspect_ratios[i,j,0]:aspect_ratios[i,j,1].

AnchorGenerator will output a set of len(sizes[i]) * len(aspect_ratios[i]) anchors per spatial location for feature map i.

Parameters
  • sizes (Sequence[Sequence[int]]) – base size of each anchor. len(sizes) is the number of feature maps, i.e., the number of output levels for the feature pyramid network (FPN). Each element of sizes is a Sequence which represents several anchor sizes for each feature map.

  • aspect_ratios (Sequence) – the aspect ratios of anchors. len(aspect_ratios) = len(sizes). For 2D images, each element of aspect_ratios[i] is a Sequence of float. For 3D images, each element of aspect_ratios[i] is a Sequence of 2 value Sequence.

  • indexing (str) –

    choose from {'ij', 'xy'}, optional, Matrix ('ij', default and recommended) or Cartesian ('xy') indexing of output.

    • Matrix ('ij', default and recommended) indexing keeps the original axis not changed.

    • To use other monai detection components, please set indexing = 'ij'.

    • Cartesian ('xy') indexing swaps axis 0 and 1.

    • For 2D cases, monai AnchorGenerator(sizes, aspect_ratios, indexing='xy') and torchvision.models.detection.anchor_utils.AnchorGenerator(sizes, aspect_ratios) are equivalent.

Reference:

https://github.com/pytorch/vision/blob/release/0.12/torchvision/models/detection/anchor_utils.py

Example

from monai.apps.detection.utils.anchor_utils import AnchorGenerator

# 2D example inputs for 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = (1., 0.5,  2.)
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)

# 3D example inputs for 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = ((1., 1.), (1., 0.5), (0.5, 1.), (2., 2.))
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)
forward(images, feature_maps)[source]#

Generate anchor boxes for each image.

Parameters
  • images (Tensor) – sized (B, C, W, H) or (B, C, W, H, D)

  • feature_maps (List[Tensor]) – for FPN level i, feature_maps[i] is sized (B, C_i, W_i, H_i) or (B, C_i, W_i, H_i, D_i). This input argument does not have to be the actual feature maps. Any list variable with the same (C_i, W_i, H_i) or (C_i, W_i, H_i, D_i) as feature maps works.

Return type

List[Tensor]

Returns

A list with length of B. Each element represents the anchors for this image. The B elements are identical.

Example

import torch

images = torch.zeros((3,1,128,128,128))
feature_maps = [torch.zeros((3,6,64,64,32)), torch.zeros((3,6,32,32,16))]
anchor_generator(images, feature_maps)
generate_anchors(scales, aspect_ratios, dtype=torch.float32, device=None)[source]#

Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.

Parameters
  • scales (Sequence) – a sequence which represents several anchor sizes for the current feature map.

  • aspect_ratios (Sequence) – a sequence which represents several aspect_ratios for the current feature map. For 2D images, it is a Sequence of float aspect_ratios[j], anchor width and height w:h = 1:aspect_ratios[j]. For 3D images, it is a Sequence of 2 value Sequence aspect_ratios[j,0] and aspect_ratios[j,1], anchor width, height, and depth w:h:d = 1:aspect_ratios[j,0]:aspect_ratios[j,1]

  • dtype (dtype) – target data type of the output Tensor.

  • device (Optional[device]) – target device to put the output Tensor data.

Return type

Tensor

Returns

For each s in scales, returns [s, s*aspect_ratios[j]] for 2D images, and [s, s*aspect_ratios[j,0], s*aspect_ratios[j,1]] for 3D images.

grid_anchors(grid_sizes, strides)[source]#

Every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:spatial_dims) corresponds to a feature map. It outputs g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.

Parameters
  • grid_sizes (List[List[int]]) – spatial size of the feature maps

  • strides (List[List[Tensor]]) – strides of the feature maps relative to the original image

Example

grid_sizes = [[100,100],[50,50]]
strides = [[torch.tensor(2),torch.tensor(2)], [torch.tensor(4),torch.tensor(4)]]
Return type

List[Tensor]
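
The idea behind grid_anchors can be sketched for a single 2D feature map: every cell anchor is copied to each grid position, shifted by the stride along each axis. The function below is a hypothetical pure-Python illustration; the ordering of the output anchors is an assumption and may differ from the library's.

```python
from itertools import product
from typing import List, Sequence


def grid_anchors_2d(
    cell_anchors: Sequence[Sequence[float]],
    grid_size: Sequence[int],
    stride: Sequence[int],
) -> List[List[float]]:
    """Hypothetical helper: tile [xmin, ymin, xmax, ymax] cell anchors over a grid."""
    anchors = []
    for i, j in product(range(grid_size[0]), range(grid_size[1])):
        # Grid position (i, j) maps to an image-space shift of (i*s0, j*s1).
        shift = (i * stride[0], j * stride[1])
        for a in cell_anchors:
            anchors.append([a[0] + shift[0], a[1] + shift[1], a[2] + shift[0], a[3] + shift[1]])
    return anchors


cell = [[-5.0, -5.0, 5.0, 5.0]]
anchors = grid_anchors_2d(cell, grid_size=(2, 3), stride=(4, 4))
print(len(anchors))  # 6
print(anchors[-1])   # [-1.0, 3.0, 9.0, 13.0]
```

The total number of anchors for one feature map is the number of grid positions multiplied by the number of cell anchors.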

num_anchors_per_location()[source]#

Return number of anchor shapes for each feature map.
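
For the 2D example inputs used earlier in this class, the expected count for each feature map is the number of sizes multiplied by the number of aspect ratios. The snippet below only reproduces that arithmetic in plain Python and does not call MONAI.

```python
# Same 2D inputs as the AnchorGenerator example: two feature maps.
sizes = ((10, 12, 14, 16), (20, 24, 28, 32))
base_aspect_ratios = (1.0, 0.5, 2.0)
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)

# One entry per feature map: len(sizes[i]) * len(aspect_ratios[i]).
anchors_per_location = [len(s) * len(a) for s, a in zip(sizes, aspect_ratios)]
print(anchors_per_location)  # [12, 12]
```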

set_cell_anchors(dtype, device)[source]#

Convert each element in self.cell_anchors to dtype and send to device.

class monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(feature_map_scales=(1, 2, 4, 8), base_anchor_shapes=((32, 32, 32), (48, 20, 20), (20, 48, 20), (20, 20, 48)), indexing='ij')[source]#

Module that generates anchors for a set of feature maps and image sizes, inherited from AnchorGenerator

The module supports computing anchors at multiple base anchor shapes per feature map.

feature_map_scales should have the same number of elements as the number of feature maps.

base_anchor_shapes can have an arbitrary number of elements. For 2D images, each element represents anchor width and height [w,h]. For 3D images, each element represents anchor width, height, and depth [w,h,d].

AnchorGenerator will output a set of len(base_anchor_shapes) anchors per spatial location for feature map i.

Parameters
  • feature_map_scales (Union[Sequence[int], Sequence[float]]) – scale of anchors for each feature map, i.e., each output level of the feature pyramid network (FPN). len(feature_map_scales) is the number of feature maps. scale[i]*base_anchor_shapes represents the anchor shapes for feature map i.

  • base_anchor_shapes (Union[Sequence[Sequence[int]], Sequence[Sequence[float]]]) – a sequence which represents several anchor shapes for one feature map. For N-D images, it is a Sequence of N value Sequence.

  • indexing (str) – choose from {‘xy’, ‘ij’}, optional Cartesian (‘xy’) or matrix (‘ij’, default) indexing of output. Cartesian (‘xy’) indexing swaps axis 0 and 1, which is the setting inside torchvision. matrix (‘ij’, default) indexing keeps the original axis not changed. See also indexing in https://pytorch.org/docs/stable/generated/torch.meshgrid.html

Example

from monai.apps.detection.utils.anchor_utils import AnchorGeneratorWithAnchorShape

# 2D example inputs for 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10), (6, 12), (12, 6))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)

# 3D example inputs for 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10, 10), (12, 12, 8), (10, 10, 6), (16, 16, 10))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)
static generate_anchors_using_shape(anchor_shapes, dtype=torch.float32, device=None)[source]#

Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.

Parameters
  • anchor_shapes (Tensor) – [w, h] or [w, h, d], sized (N, spatial_dims), represents N anchor shapes for the current feature map.

  • dtype (dtype) – target data type of the output Tensor.

  • device (Optional[device]) – target device to put the output Tensor data.

Return type

Tensor

Returns

For 2D images, returns [-w/2, -h/2, w/2, h/2]; For 3D images, returns [-w/2, -h/2, -d/2, w/2, h/2, d/2]
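
The return value described above, combined with the scale[i]*base_anchor_shapes rule, can be sketched as follows. anchors_from_shapes is a hypothetical helper written for this example; it follows the documented formula literally and does not apply any rounding that the library might perform.

```python
from typing import List, Sequence


def anchors_from_shapes(anchor_shapes: Sequence[Sequence[float]], scale: float = 1.0) -> List[List[float]]:
    """Hypothetical helper: origin-centered anchors [-half..., +half...] for each shape."""
    anchors = []
    for shape in anchor_shapes:
        # Scale the base shape for this feature map, then halve each side.
        half = [scale * side / 2.0 for side in shape]
        anchors.append([-h for h in half] + half)
    return anchors


level_0 = anchors_from_shapes(((10, 10), (6, 12)), scale=1)
level_1 = anchors_from_shapes(((10, 10), (6, 12)), scale=2)
print(level_0)  # [[-5.0, -5.0, 5.0, 5.0], [-3.0, -6.0, 3.0, 6.0]]
print(level_1)  # [[-10.0, -10.0, 10.0, 10.0], [-6.0, -12.0, 6.0, 12.0]]
```

The same function handles 3D shapes, since it treats each shape as a sequence of side lengths.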

Matcher#

The functions in this script are adapted from nnDetection, https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/core/boxes/matcher.py which is adapted from torchvision.

These are the changes compared with nndetection: 1) comments and docstrings; 2) reformat; 3) add a debug option to ATSSMatcher to help the users to tune parameters; 4) add a corner case return in ATSSMatcher.compute_matches; 5) add support for float16 cpu

class monai.apps.detection.utils.ATSS_matcher.ATSSMatcher(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#
__init__(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#

Compute matching based on ATSS https://arxiv.org/abs/1912.02424 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection

Parameters
  • num_candidates (int) – number of positions to select candidates from. A smaller value will result in a higher matcher threshold and fewer matched candidates.

  • similarity_fn (Callable[[Tensor, Tensor], Tensor]) – function for similarity computation between boxes and anchors

  • center_in_gt (bool) – If False, matched anchor center points do not need to lie within the ground truth box. Recommend False for small objects. If True (default), will result in a stricter matcher and fewer matched candidates.

  • debug (bool) – if True, will print the matcher threshold in order to tune num_candidates and center_in_gt.

compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#

Compute matches according to ATSS for a single image. Adapted from (https://github.com/sfzhang15/ATSS/blob/79dfb28bd1/atss_core/modeling/rpn/atss/loss.py#L180-L184)

Parameters
  • boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • anchors (Tensor) – anchors to match Mx4 or Mx6, also assumed to be StandardMode.

  • num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level

  • num_anchors_per_loc (int) – number of anchors per position

Return type

Tuple[Tensor, Tensor]

Returns

  • matrix which contains the similarity from each box to each anchor [N, M]

  • vector which contains the matched box index for each anchor [M] (BELOW_LOW_THRESHOLD marks background anchors, and BETWEEN_THRESHOLDS marks anchors that should be ignored)

Note

StandardMode = CornerCornerModeTypeA, also represented as “xyxy” ([xmin, ymin, xmax, ymax]) for 2D and “xyzxyz” ([xmin, ymin, zmin, xmax, ymax, zmax]) for 3D.
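
The matcher threshold that debug=True prints is, in the ATSS paper, the mean plus the standard deviation of the IoUs between a ground-truth box and its candidate anchors. The sketch below shows that computation for 2D StandardMode boxes. It omits candidate selection by center distance and the center_in_gt check, and using the sample standard deviation is an assumption; box_iou_2d and atss_threshold are hypothetical helpers, not MONAI functions.

```python
import statistics
from typing import List, Sequence


def box_iou_2d(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two [xmin, ymin, xmax, ymax] boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def atss_threshold(candidate_ious: Sequence[float]) -> float:
    """Adaptive threshold: mean + standard deviation of the candidate IoUs."""
    return statistics.mean(candidate_ious) + statistics.stdev(candidate_ious)


gt_box = [0, 0, 10, 10]
candidates = [[0, 0, 10, 10], [5, 0, 15, 10], [20, 20, 30, 30], [0, 5, 10, 15]]
ious: List[float] = [box_iou_2d(gt_box, c) for c in candidates]
threshold = atss_threshold(ious)
positives = [i for i, iou in enumerate(ious) if iou >= threshold]
print(positives)  # [0]
```

Only the first candidate, which coincides with the ground-truth box, exceeds the threshold in this small example.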

class monai.apps.detection.utils.ATSS_matcher.Matcher(similarity_fn=<function box_iou>)[source]#

Base class of Matcher, which matches boxes and anchors to each other

Parameters

similarity_fn (Callable[[Tensor, Tensor], Tensor]) – function for similarity computation between boxes and anchors

compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#

Compute matches

Parameters
  • boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode

  • anchors (Tensor) – anchors to match Mx4 or Mx6, also assumed to be StandardMode.

  • num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level

  • num_anchors_per_loc (int) – number of anchors per position

Return type

Tuple[Tensor, Tensor]

Returns

  • matrix which contains the similarity from each box to each anchor [N, M]

  • vector which contains the matched box index for each anchor [M] (BELOW_LOW_THRESHOLD marks background anchors, and BETWEEN_THRESHOLDS marks anchors that should be ignored)

Box coder#

This script is modified from torchvision to support N-D images,

https://github.com/pytorch/vision/blob/main/torchvision/models/detection/_utils.py

class monai.apps.detection.utils.box_coder.BoxCoder(weights, boxes_xform_clip=None)[source]#

This class encodes and decodes a set of bounding boxes into the representation used for training the regressors.

Parameters
  • weights (Tuple[float]) – 4-element tuple or 6-element tuple

  • boxes_xform_clip (Optional[float]) – high threshold to prevent sending too large values into torch.exp()

Example

import torch
from monai.apps.detection.utils.box_coder import BoxCoder

box_coder = BoxCoder(weights=[1., 1., 1., 1., 1., 1.])
gt_boxes = torch.tensor([[1,2,1,4,5,6],[1,3,2,7,8,9]])
proposals = gt_boxes + torch.rand(gt_boxes.shape)
rel_gt_boxes = box_coder.encode_single(gt_boxes, proposals)
gt_back = box_coder.decode_single(rel_gt_boxes, proposals)
# We expect gt_back to be equal to gt_boxes
decode(rel_codes, reference_boxes)[source]#

Decode boxes from a set of original reference_boxes and encoded relative box offsets.

Parameters
  • rel_codes (Tensor) – encoded boxes, Nx4 or Nx6 torch tensor.

  • reference_boxes – a list of reference boxes, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tensor

Returns

decoded boxes, Nx1x4 or Nx1x6 torch tensor. The box mode will be StandardMode

decode_single(rel_codes, reference_boxes)[source]#

Decode boxes from a set of original boxes and encoded relative box offsets.

Parameters
  • rel_codes (Tensor) – encoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor.

  • reference_boxes (Tensor) – reference boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tensor

Returns

decoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor. The box mode will be StandardMode

encode(gt_boxes, proposals)[source]#

Encode a set of proposals with respect to some ground truth (gt) boxes.

Parameters
  • gt_boxes (Sequence[Tensor]) – list of gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • proposals (Sequence[Tensor]) – list of boxes to be encoded, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tuple[Tensor]

Returns

A tuple of encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.

encode_single(gt_boxes, proposals)[source]#

Encode proposals with respect to ground truth (gt) boxes.

Parameters
  • gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

Return type

Tensor

Returns

encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.

monai.apps.detection.utils.box_coder.encode_boxes(gt_boxes, proposals, weights)[source]#

Encode a set of proposals with respect to some reference ground truth (gt) boxes.

Parameters
  • gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode

  • weights (Tensor) – the weights for (cx, cy, w, h) or (cx,cy,cz, w,h,d)

Return type

Tensor

Returns

encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
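
The encoding follows the usual torchvision-style parameterization: center offsets normalized by the proposal size, and log ratios of the sizes, each multiplied by a weight in the order (cx, cy, w, h). The pure-Python sketch below shows this for one 2D box together with the matching decode step. encode_box_2d and decode_box_2d are hypothetical helpers written for this example, and any clipping via boxes_xform_clip is left out.

```python
import math
from typing import List, Sequence

ONES = (1.0, 1.0, 1.0, 1.0)


def _center_size(box: Sequence[float]) -> List[float]:
    """Convert [xmin, ymin, xmax, ymax] to [cx, cy, w, h]."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [box[0] + 0.5 * w, box[1] + 0.5 * h, w, h]


def encode_box_2d(gt: Sequence[float], proposal: Sequence[float], weights: Sequence[float] = ONES) -> List[float]:
    """Regression target that maps the proposal onto the ground-truth box."""
    gcx, gcy, gw, gh = _center_size(gt)
    pcx, pcy, pw, ph = _center_size(proposal)
    return [
        weights[0] * (gcx - pcx) / pw,
        weights[1] * (gcy - pcy) / ph,
        weights[2] * math.log(gw / pw),
        weights[3] * math.log(gh / ph),
    ]


def decode_box_2d(code: Sequence[float], proposal: Sequence[float], weights: Sequence[float] = ONES) -> List[float]:
    """Invert encode_box_2d: recover a StandardMode box from the offsets."""
    pcx, pcy, pw, ph = _center_size(proposal)
    cx = code[0] / weights[0] * pw + pcx
    cy = code[1] / weights[1] * ph + pcy
    w = math.exp(code[2] / weights[2]) * pw
    h = math.exp(code[3] / weights[3]) * ph
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


gt_box = [1.0, 2.0, 4.0, 6.0]
proposal = [1.5, 2.5, 4.5, 7.0]
code = encode_box_2d(gt_box, proposal)
recovered = decode_box_2d(code, proposal)
```

Decoding the encoded offsets against the same proposal recovers the ground-truth box up to floating-point error, and a proposal identical to the ground truth encodes to all zeros.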

Detection Utilities#

monai.apps.detection.utils.detector_utils.check_input_images(input_images, spatial_dims)[source]#

Validate the input dimensionality (raise a ValueError if invalid).

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

Return type

None

monai.apps.detection.utils.detector_utils.check_training_targets(input_images, targets, spatial_dims, target_label_key, target_box_key)[source]#

Validate the input images/targets during training (raise a ValueError if invalid).

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • targets (Optional[List[Dict[str, Tensor]]]) – a list of dict. Each dict with two keys: target_box_key and target_label_key, ground-truth boxes present in the image.

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

  • target_label_key (str) – the expected key of target labels.

  • target_box_key (str) – the expected key of target boxes.

Return type

None

monai.apps.detection.utils.detector_utils.pad_images(input_images, spatial_dims, size_divisible, mode=PytorchPadMode.CONSTANT, **kwargs)[source]#

Pad the input images, so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

  • size_divisible (Union[int, Sequence[int]]) – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.

  • mode (Union[PytorchPadMode, str]) – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • kwargs – other arguments for the torch.nn.functional.pad function.

Return type

Tuple[Tensor, List[List[int]]]

Returns

  • images, a (B, C, H, W) or (B, C, H, W, D) Tensor

  • image_sizes, the original spatial size of each image
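The arithmetic behind the padded size can be sketched without any tensors. This is a minimal sketch, assuming the batch takes the per-dimension maximum over all images and rounds it up to the next multiple of size_divisible; `padded_spatial_size` is a hypothetical helper, not part of MONAI.

```python
import math
from typing import List, Sequence, Union


def padded_spatial_size(
    image_sizes: Sequence[Sequence[int]],
    size_divisible: Union[int, Sequence[int]],
) -> List[int]:
    """Hypothetical helper: common spatial size that fits every image and is divisible by size_divisible."""
    spatial_dims = len(image_sizes[0])
    # An int applies to every spatial dimension.
    if isinstance(size_divisible, int):
        size_divisible = [size_divisible] * spatial_dims
    # Largest extent along each spatial dimension.
    max_size = [max(size[d] for size in image_sizes) for d in range(spatial_dims)]
    # Round each extent up to the next multiple of its divisor.
    return [math.ceil(m / k) * k for m, k in zip(max_size, size_divisible)]


sizes = [(100, 120), (90, 130)]  # (H_i, W_i) of two 2D images
print(padded_spatial_size(sizes, 32))  # [128, 160]
```

Every image would then be padded at the end of each spatial dimension up to this common size, while image_sizes keeps the original (100, 120) and (90, 130).
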

monai.apps.detection.utils.detector_utils.preprocess_images(input_images, spatial_dims, size_divisible, mode=PytorchPadMode.CONSTANT, **kwargs)[source]#

Preprocess the input images, including

  • validating the inputs

  • padding the inputs so that the output spatial sizes are divisible by size_divisible. Images are padded at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor whose spatial size (H, W) or (H, W, D) is divisible by size_divisible. The default is constant padding with value 0.0.

Parameters
  • input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).

  • spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.

  • size_divisible (Union[int, Sequence[int]]) – int or Sequence[int], the value(s) that each output spatial size must be divisible by. If an int, the same size_divisible will be applied to all the input spatial dimensions.

  • mode (Union[PytorchPadMode, str]) – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html

  • kwargs – other arguments for the torch.nn.functional.pad function.

Return type

Tuple[Tensor, List[List[int]]]

Returns

  • images, a (B, C, H, W) or (B, C, H, W, D) Tensor

  • image_sizes, the original spatial size of each image

monai.apps.detection.utils.predict_utils.check_dict_values_same_length(head_outputs, keys=None)[source]#

We expect the values in head_outputs: Dict[str, List[Tensor]] to have the same length. Will raise ValueError if not.

Parameters
  • head_outputs (Dict[str, List[Tensor]]) – a Dict[str, List[Tensor]] or Dict[str, Tensor]

  • keys (Optional[List[str]]) – the keys in head_outputs whose values (List) need to have the same length. If not provided, will use head_outputs.keys().

Return type

None
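The length check can be pictured with plain Python containers. This is a minimal sketch of the idea only; `check_same_length` is a hypothetical stand-in, and strings take the place of the per-level tensors.

```python
from typing import Dict, List, Optional, Sequence


def check_same_length(head_outputs: Dict[str, Sequence], keys: Optional[List[str]] = None) -> None:
    """Hypothetical stand-in: raise ValueError unless all selected values have equal length."""
    if keys is None:
        keys = list(head_outputs.keys())
    lengths = {len(head_outputs[k]) for k in keys}
    if len(lengths) > 1:
        raise ValueError(f"Values for keys {keys} have different lengths: {sorted(lengths)}")


# Plain lists stand in for the per-level tensors.
outputs = {"cls": ["level0", "level1"], "box_reg": ["level0", "level1"], "aux": ["level0"]}
check_same_length(outputs, keys=["cls", "box_reg"])  # passes: both have 2 levels

try:
    check_same_length(outputs)  # "aux" has only 1 level
except ValueError as err:
    print(err)
```
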

monai.apps.detection.utils.predict_utils.ensure_dict_value_to_list_(head_outputs, keys=None)[source]#

An in-place function. head_outputs is expected to be Dict[str, List[Tensor]]. If it is Dict[str, Tensor] instead, this function converts it to Dict[str, List[Tensor]], modifying it in-place.

Parameters
  • head_outputs (Dict[str, List[Tensor]]) – a Dict[str, List[Tensor]] or Dict[str, Tensor], will be modified in-place

  • keys (Optional[List[str]]) – the keys in head_outputs that need to have value type List[Tensor]. If not provided, will use head_outputs.keys().

Return type

None
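The in-place conversion amounts to wrapping bare values in one-element lists. This is a minimal sketch with plain Python objects; `ensure_values_are_lists_` is a hypothetical stand-in (the trailing underscore follows the PyTorch convention for in-place operations), and strings take the place of tensors.

```python
from typing import Any, Dict, List, Optional


def ensure_values_are_lists_(head_outputs: Dict[str, Any], keys: Optional[List[str]] = None) -> None:
    """Hypothetical stand-in: wrap non-list values in a one-element list, in place."""
    if keys is None:
        keys = list(head_outputs.keys())
    for k in keys:
        if not isinstance(head_outputs[k], list):
            head_outputs[k] = [head_outputs[k]]


outputs = {"cls": "tensor_a", "box_reg": ["tensor_b"]}
ensure_values_are_lists_(outputs)
print(outputs)  # {'cls': ['tensor_a'], 'box_reg': ['tensor_b']}
```
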

monai.apps.detection.utils.predict_utils.predict_with_inferer(images, network, keys, inferer=None)[source]#

Predict the network's dict output with an inferer. Compared with calling network(images) directly, this allows a sliding window inferer to be used, so that large inputs can be handled.

Parameters
  • images (Tensor) – input of the network, Tensor sized (B, C, H, W) or (B, C, H, W, D)

  • network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].

  • keys (List[str]) – the keys in the output dict, should be network output keys or a subset of them.

  • inferer (Optional[SlidingWindowInferer]) – a SlidingWindowInferer to handle large inputs.

Return type

Dict[str, List[Tensor]]

Returns

The predicted head_output from network, a Dict[str, List[Tensor]]

Example

# define a naive network
import torch
import monai
from monai.apps.detection.utils.predict_utils import predict_with_inferer

class NaiveNet(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, images: torch.Tensor):
        return {"cls": torch.randn(images.shape), "box_reg": [torch.randn(images.shape)]}

# create a predictor
network = NaiveNet()
inferer = monai.inferers.SlidingWindowInferer(
    roi_size=(128, 128, 128),
    overlap=0.25,
    cache_roi_weight_map=True,
)
network_output_keys = ["cls", "box_reg"]
images = torch.randn((2, 3, 512, 512, 512))  # a large input
head_outputs = predict_with_inferer(images, network, network_output_keys, inferer)

Inference box selector#

Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py

class monai.apps.detection.utils.box_selector.BoxSelector(box_overlap_metric=<function box_iou>, apply_sigmoid=True, score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300)[source]#

Box selector which selects the predicted boxes. The box selection is performed with the following steps:

  1. For each level, discard boxes with scores less than self.score_thresh.

  2. For each level, keep boxes with top self.topk_candidates_per_level scores.

  3. For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.

  4. For the whole image, keep boxes with top self.detections_per_img scores.

Parameters
  • apply_sigmoid (bool) – whether to apply sigmoid to get scores from classification logits

  • score_thresh (float) – no box with scores less than score_thresh will be kept

  • topk_candidates_per_level (int) – max number of boxes to keep for each level

  • nms_thresh (float) – box overlapping threshold for NMS

  • detections_per_img (int) – max number of boxes to keep for each image

Example

import torch
from monai.apps.detection.utils.box_selector import BoxSelector

input_param = {
    "apply_sigmoid": True,
    "score_thresh": 0.1,
    "topk_candidates_per_level": 2,
    "nms_thresh": 0.1,
    "detections_per_img": 5,
}
box_selector = BoxSelector(**input_param)
boxes = [torch.randn([3, 6]), torch.randn([7, 6])]
logits = [torch.randn([3, 3]), torch.randn([7, 3])]
spatial_size = (8, 8, 8)
selected_boxes, selected_scores, selected_labels = box_selector.select_boxes_per_image(
    boxes, logits, spatial_size
)

select_boxes_per_image(boxes_list, logits_list, spatial_size)[source]#

Postprocessing to generate detection result from classification logits and boxes.

The box selection is performed with the following steps:

  1. For each level, discard boxes with scores less than self.score_thresh.

  2. For each level, keep boxes with top self.topk_candidates_per_level scores.

  3. For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.

  4. For the whole image, keep boxes with top self.detections_per_img scores.

Parameters
  • boxes_list (List[Tensor]) – list of predicted boxes from a single image, each element i is a Tensor sized (N_i, 2*spatial_dims)

  • logits_list (List[Tensor]) – list of predicted classification logits from a single image, each element i is a Tensor sized (N_i, num_classes)

  • spatial_size (Union[List[int], Tuple[int]]) – spatial size of the image

Return type

Tuple[Tensor, Tensor, Tensor]

Returns

  • selected_boxes, Tensor sized (P, 2*spatial_dims)

  • selected_scores, Tensor sized (P, )

  • selected_labels, Tensor sized (P, )
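The four selection steps can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the BoxSelector implementation: it handles a single class, takes ready-made scores instead of logits, uses 2D boxes in an assumed (xmin, ymin, xmax, ymax) layout, and `iou` and `select_boxes` are hypothetical helpers.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed 2D layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_boxes(
    boxes_per_level: Sequence[Sequence[Box]],
    scores_per_level: Sequence[Sequence[float]],
    score_thresh: float = 0.05,
    topk_candidates_per_level: int = 1000,
    nms_thresh: float = 0.5,
    detections_per_img: int = 300,
) -> List[Tuple[Box, float]]:
    candidates: List[Tuple[Box, float]] = []
    for boxes, scores in zip(boxes_per_level, scores_per_level):
        # Step 1: discard low-scoring boxes at this level.
        kept = [(b, s) for b, s in zip(boxes, scores) if s >= score_thresh]
        # Step 2: keep the top-k scores at this level.
        kept.sort(key=lambda bs: bs[1], reverse=True)
        candidates.extend(kept[:topk_candidates_per_level])

    # Step 3: greedy NMS over the whole image, highest score first.
    candidates.sort(key=lambda bs: bs[1], reverse=True)
    selected: List[Tuple[Box, float]] = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= nms_thresh for kept_box, _ in selected):
            selected.append((box, score))

    # Step 4: cap the number of detections per image.
    return selected[:detections_per_img]


levels = [[(0, 0, 10, 10), (1, 1, 11, 11)], [(20, 20, 30, 30), (0, 0, 10, 10)]]
scores = [[0.9, 0.8], [0.7, 0.01]]
print(select_boxes(levels, scores))
# [((0, 0, 10, 10), 0.9), ((20, 20, 30, 30), 0.7)]
```

In this toy input the 0.01 box is dropped by the score threshold, and the 0.8 box is suppressed because its IoU with the 0.9 box is 81/119, which exceeds nms_thresh.
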

select_top_score_idx_per_level(logits)[source]#

Select indices with highest scores.

The index selection is performed with the following steps:

  1. If self.apply_sigmoid, get scores by applying sigmoid to logits. Otherwise, use logits as scores.

  2. Discard indices with scores less than self.score_thresh

  3. Keep indices with top self.topk_candidates_per_level scores

Parameters

logits (Tensor) – predicted classification logits, Tensor sized (N, num_classes)

Return type

Tuple[Tensor, Tensor, Tensor]

Returns

  • topk_idxs: selected M indices, Tensor sized (M, )

  • selected_scores: selected M scores, Tensor sized (M, )

  • selected_labels: selected M labels, Tensor sized (M, )
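The three steps above can be sketched with nested lists in place of tensors. This sketch assumes that scores are flattened over (box, class) pairs before thresholding, as in the torchvision RetinaNet post-processing this module is adapted from, so that a box index and a class label can be recovered from each flat index; `select_top_scores` is a hypothetical helper.

```python
import math
from typing import List, Sequence, Tuple


def select_top_scores(
    logits: Sequence[Sequence[float]],
    apply_sigmoid: bool = True,
    score_thresh: float = 0.05,
    topk: int = 1000,
) -> Tuple[List[int], List[float], List[int]]:
    num_classes = len(logits[0])
    # Step 1: one score per (box, class) pair, flattened row by row.
    flat = [x for row in logits for x in row]
    if apply_sigmoid:
        flat = [1.0 / (1.0 + math.exp(-x)) for x in flat]
    # Step 2: drop pairs scoring below the threshold.
    kept = [(s, i) for i, s in enumerate(flat) if s >= score_thresh]
    # Step 3: keep the top-k remaining pairs.
    kept.sort(key=lambda si: si[0], reverse=True)
    kept = kept[:topk]
    # Recover the box index and class label from each flat index.
    idxs = [i // num_classes for _, i in kept]
    scores = [s for s, _ in kept]
    labels = [i % num_classes for _, i in kept]
    return idxs, scores, labels


logits = [[2.0, -3.0], [-1.0, 0.5], [-4.0, -5.0]]  # 3 boxes, 2 classes
idxs, scores, labels = select_top_scores(logits, score_thresh=0.3, topk=2)
print(idxs, labels)  # [0, 1] [0, 1]
```
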

Detection metrics#

This script is almost the same as https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/evaluator/detection/coco.py. The changes include 1) code reformatting, 2) docstrings.

This script is almost the same as https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/evaluator/detection/matching.py. The changes include 1) code reformatting, 2) docstrings, 3) allowing the input argument gt_ignore to be optional. (If it is omitted, no GT boxes will be ignored.)

monai.apps.detection.metrics.matching.matching_batch(iou_fn, iou_thresholds, pred_boxes, pred_classes, pred_scores, gt_boxes, gt_classes, gt_ignore=None, max_detections=100)[source]#

Match boxes of a batch to corresponding ground truth for each category independently.

Parameters
  • iou_fn (Callable[[ndarray, ndarray], ndarray]) – compute overlap for each pair

  • iou_thresholds (Sequence[float]) – defines which IoU thresholds should be evaluated

  • pred_boxes (Sequence[ndarray]) – predicted boxes from a single batch; List[[D, dim * 2]], D number of predictions

  • pred_classes (Sequence[ndarray]) – predicted classes from a single batch; List[[D]], D number of predictions

  • pred_scores (Sequence[ndarray]) – predicted score for each bounding box; List[[D]], D number of predictions

  • gt_boxes (Sequence[ndarray]) – ground truth boxes; List[[G, dim * 2]], G number of ground truth

  • gt_classes (Sequence[ndarray]) – ground truth classes; List[[G]], G number of ground truth

  • gt_ignore (Union[Sequence[Sequence[bool]], Sequence[ndarray], None]) – specifies which ground truth boxes are not counted as true positives (detections which match these boxes are not counted as false positives either). If not given, all the gt_boxes are used; List[[G]], G number of ground truth

  • max_detections (int) – maximum number of detections which should be evaluated

Return type

List[Dict[int, Dict[str, ndarray]]]

Returns

List[Dict[int, Dict[str, np.ndarray]]], each Dict[str, np.ndarray] corresponds to an image. Dict has the following keys.

  • dtMatches: matched detections [T, D], where T = number of thresholds, D = number of detections

  • gtMatches: matched ground truth boxes [T, G], where T = number of thresholds, G = number of ground truth

  • dtScores: detection scores [D]

  • gtIgnore: indicates which ground truth boxes should be ignored [G]

  • dtIgnore: indicates which detections should be ignored [T, D]

Example

import torch
from monai.data.box_utils import box_iou
from monai.apps.detection.metrics.coco import COCOMetric
from monai.apps.detection.metrics.matching import matching_batch

# 3D example outputs of one image from detector
val_outputs_all = [
    {
        "boxes": torch.tensor([[1, 1, 1, 3, 4, 5]], dtype=torch.float16),
        "labels": torch.randint(3, (1,)),
        "scores": torch.randn((1,)).absolute(),
    },
]
val_targets_all = [
    {
        "boxes": torch.tensor([[1, 1, 1, 2, 6, 4]], dtype=torch.float16),
        "labels": torch.randint(3, (1,)),
    },
]

coco_metric = COCOMetric(
    classes=['c0','c1','c2'], iou_list=[0.1], max_detection=[10]
)
results_metric = matching_batch(
    iou_fn=box_iou,
    iou_thresholds=coco_metric.iou_thresholds,
    pred_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_outputs_all],
    pred_classes=[val_data_i["labels"].numpy() for val_data_i in val_outputs_all],
    pred_scores=[val_data_i["scores"].numpy() for val_data_i in val_outputs_all],
    gt_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_targets_all],
    gt_classes=[val_data_i["labels"].numpy() for val_data_i in val_targets_all],
)
val_metric_dict = coco_metric(results_metric)
print(val_metric_dict)
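The [T, D] and [T, G] layouts of dtMatches and gtMatches can be pictured with a simplified greedy matcher. This is a sketch under stated assumptions rather than the matching_batch algorithm: it covers one image and one class, ignores gt_ignore and max_detections, expects a precomputed IoU matrix with detections already sorted by descending score, and `match_single_class` is a hypothetical helper.

```python
from typing import Dict, List, Sequence


def match_single_class(
    iou_matrix: Sequence[Sequence[float]],  # [D][G], detections sorted by descending score
    iou_thresholds: Sequence[float],
) -> Dict[str, List[List[int]]]:
    num_dt = len(iou_matrix)
    num_gt = len(iou_matrix[0]) if num_dt else 0
    dt_matches = [[0] * num_dt for _ in iou_thresholds]  # [T][D]
    gt_matches = [[0] * num_gt for _ in iou_thresholds]  # [T][G]
    for t, thresh in enumerate(iou_thresholds):
        for d in range(num_dt):
            best_iou, best_g = thresh, -1
            for g in range(num_gt):
                if gt_matches[t][g]:
                    continue  # each ground truth box is matched at most once
                if iou_matrix[d][g] >= best_iou:
                    best_iou, best_g = iou_matrix[d][g], g
            if best_g >= 0:
                dt_matches[t][d] = 1
                gt_matches[t][best_g] = 1
    return {"dtMatches": dt_matches, "gtMatches": gt_matches}


overlaps = [[0.8, 0.1], [0.6, 0.2]]  # 2 detections x 2 ground truth boxes
result = match_single_class(overlaps, iou_thresholds=[0.5, 0.15])
print(result["dtMatches"])  # [[1, 0], [1, 1]]
print(result["gtMatches"])  # [[1, 0], [1, 1]]
```

At threshold 0.5 the second detection stays unmatched because its best remaining overlap is 0.2. At threshold 0.15 it is matched to the second ground truth box.
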

Reconstruction#

ConvertToTensorComplex#

monai.apps.reconstruction.complex_utils.convert_to_tensor_complex(data, dtype=None, device=None, wrap_sequence=True, track_meta=False)[source]#

Convert complex-valued data to a 2-channel PyTorch tensor. The real and imaginary parts are stacked along the last dimension. This function relies on ‘monai.utils.type_conversion.convert_to_tensor’

Parameters
  • data – input data, which can be a PyTorch Tensor, numpy array, list, int, or float. Tensor, numpy array, float, int, and bool inputs are converted to Tensor; strings and objects are kept as they are. For a list, every item is converted to a Tensor if applicable.

  • dtype (Optional[dtype]) – target data type when converting to Tensor.

  • device (Optional[device]) – target device to put the converted Tensor data.

  • wrap_sequence (bool) – if False, then lists will recursively call this function. E.g., [1, 2] -> [tensor(1), tensor(2)]. If True, then [1, 2] -> tensor([1, 2]).

  • track_meta (bool) – whether to track the meta information, if True, will convert to MetaTensor. default to False.

Return type

Tensor

Returns

PyTorch version of the data

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import convert_to_tensor_complex
data = np.array([ [1+1j, 1-1j], [2+2j, 2-2j] ])
# the following line prints (2, 2)
print(data.shape)
# the following line prints torch.Size([2, 2, 2])
print(convert_to_tensor_complex(data).shape)
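The 2-channel layout itself can be illustrated without PyTorch. This is a minimal sketch with nested lists; `stack_real_imag` is a hypothetical helper that only shows how the real and imaginary parts end up in the last dimension.

```python
from typing import List, Sequence


def stack_real_imag(data: Sequence[Sequence[complex]]) -> List[List[List[float]]]:
    """Hypothetical helper: turn an (H, W) complex grid into an (H, W, 2) real grid."""
    return [[[z.real, z.imag] for z in row] for row in data]


data = [[1 + 1j, 1 - 1j], [2 + 2j, 2 - 2j]]
out = stack_real_imag(data)
print(out[0][1])  # [1.0, -1.0], the real and imaginary parts of 1-1j
```
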

ComplexAbs#

monai.apps.reconstruction.complex_utils.complex_abs(x)[source]#

Compute the absolute value of a complex array.

Parameters

x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type

Union[ndarray, Tensor]

Returns

Absolute value along the last dimension

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import complex_abs
x = np.array([3,4])[np.newaxis]
# the following line prints [5.]
print(complex_abs(x))
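For a single (real, imaginary) pair, the absolute value is the Euclidean norm sqrt(re^2 + im^2). A minimal sketch with the standard library, where `complex_abs_pair` is a hypothetical helper, can be cross-checked against Python's built-in complex type:

```python
import math
from typing import Sequence


def complex_abs_pair(x: Sequence[float]) -> float:
    """Hypothetical helper: magnitude of one (real, imag) pair."""
    return math.hypot(x[0], x[1])


print(complex_abs_pair((3.0, 4.0)))  # 5.0
# Cross-check against Python's built-in complex type.
assert complex_abs_pair((3.0, 4.0)) == abs(complex(3.0, 4.0))
```
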

RootSumOfSquares#

monai.apps.reconstruction.mri_utils.root_sum_of_squares(x, spatial_dim)[source]#

Compute the root sum of squares (rss) of the data (typically done for multi-coil MRI samples)

Parameters
  • x (Union[ndarray, Tensor]) – Input array/tensor

  • spatial_dim (int) – dimension along which rss is applied

Return type

Union[ndarray, Tensor]

Returns

rss of x along spatial_dim

Example

import numpy as np
from monai.apps.reconstruction.mri_utils import root_sum_of_squares
x = np.ones([2,3])
# the following line prints array([1.41421356, 1.41421356, 1.41421356])
print(root_sum_of_squares(x, spatial_dim=0))
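The reduction can also be written out by hand. This is a minimal sketch for a 2D grid reduced along its first axis, for example a coil axis; `rss_over_first_axis` is a hypothetical helper using only the standard library.

```python
import math
from typing import List, Sequence


def rss_over_first_axis(x: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical helper: sqrt of the sum of squares down each column of a 2D grid."""
    return [math.sqrt(sum(v * v for v in column)) for column in zip(*x)]


coils = [[3.0, 1.0], [4.0, 1.0]]  # 2 coils, 2 pixels each
print(rss_over_first_axis(coils))  # [5.0, 1.4142135623730951]
```
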

ComplexMul#

monai.apps.reconstruction.complex_utils.complex_mul(x, y)[source]#

Compute complex-valued multiplication. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)

Parameters
  • x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

  • y (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type

Union[ndarray, Tensor]

Returns

Complex multiplication of x and y

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import complex_mul
x = np.array([[1,2],[3,4]])
y = np.array([[1,1],[1,1]])
# the following line prints array([[-1,  3], [-1,  7]])
print(complex_mul(x,y))
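The numbers in the example follow from the identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i applied to each (real, imaginary) pair. A minimal sketch for one pair, where `complex_mul_pair` is a hypothetical helper:

```python
from typing import Sequence, Tuple


def complex_mul_pair(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical helper: (a + bi)(c + di) = (ac - bd) + (ad + bc)i on (real, imag) pairs."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)


print(complex_mul_pair((1, 2), (1, 1)))  # (-1, 3)
print(complex_mul_pair((3, 4), (1, 1)))  # (-1, 7)
# Cross-check against Python's built-in complex type.
z = complex(3, 4) * complex(1, 1)
assert (z.real, z.imag) == complex_mul_pair((3, 4), (1, 1))
```
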

ComplexConj#

monai.apps.reconstruction.complex_utils.complex_conj(x)[source]#

Compute the complex conjugate of an array/tensor. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)

Parameters

x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.

Return type

Union[ndarray, Tensor]

Returns

Complex conjugate of x

Example

import numpy as np
from monai.apps.reconstruction.complex_utils import complex_conj
x = np.array([[1,2],[3,4]])
# the following line prints array([[ 1, -2], [ 3, -4]])
print(complex_conj(x))