Applications
Datasets
- class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#
The Dataset to automatically download MedNIST data and generate items for training, validation or test. It’s based on CacheDataset to accelerate the training process.
- Parameters
root_dir (Union[str, PathLike]) – target directory to download and load the MedNIST dataset.
section (str) – expected data section, can be: training, validation or test.
transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data.
download (bool) – whether to download and extract MedNIST from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.
seed (int) – random seed used to randomly split the training, validation and test datasets, default is 0.
val_frac (float) – fraction of the whole dataset used for validation, default is 0.1.
test_frac (float) – fraction of the whole dataset used for testing, default is 0.1.
cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) is used.
cache_rate (float) – fraction of the total data to cache, default is 1.0 (cache all). The minimum of (cache_num, data_length x cache_rate, data_length) is used.
num_workers (Optional[int]) – the number of worker threads to use. If num_workers is None, the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 is used instead.
progress (bool) – whether to display a progress bar when downloading the dataset and computing the transform cache content.
copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default is True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.
as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. This may improve the performance of subsequent logic.
- Raises
ValueError – When root_dir is not a directory.
RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
None
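The cache-size rule quoted for cache_num and cache_rate can be illustrated with a small stand-alone sketch. The helper name effective_cache_num is hypothetical and the truncation of data_length x cache_rate with int() is an assumption; this is not the library's own code.

```python
import sys


def effective_cache_num(cache_num: int, cache_rate: float, data_length: int) -> int:
    """Number of items cached under the documented min() rule (illustrative only)."""
    # Assumption: the fractional product is truncated to an integer.
    return min(int(cache_num), int(data_length * cache_rate), data_length)


# Defaults (cache_num=sys.maxsize, cache_rate=1.0): everything is cached.
print(effective_cache_num(sys.maxsize, 1.0, 1000))   # 1000
# cache_num is the binding limit.
print(effective_cache_num(200, 1.0, 1000))           # 200
# cache_rate is the binding limit.
print(effective_cache_num(sys.maxsize, 0.25, 1000))  # 250
```

Whichever of the three values is smallest decides how many items end up in the cache, so either argument can be used on its own to cap memory use.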
- class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#
The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation or test. It also loads the dataset properties from the JSON config file of the dataset. Users can call get_properties() to get specified properties or all the loaded properties. It’s based on monai.data.CacheDataset to accelerate the training process.
- Parameters
root_dir (Union[str, PathLike]) – user’s local directory for caching and loading the MSD datasets.
task (str) – which task to download and execute: one of (“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”).
section (str) – expected data section, can be: training, validation or test.
transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D].
download (bool) – whether to download and extract the Decathlon data from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the tar file or the dataset folder to the root directory.
val_frac (float) – fraction of the whole dataset used for validation, default is 0.2.
seed (int) – random seed used to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: use the same seed for the training and validation sections.
cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) is used.
cache_rate (float) – fraction of the total data to cache, default is 1.0 (cache all). The minimum of (cache_num, data_length x cache_rate, data_length) is used.
num_workers (int) – the number of worker threads to use. If num_workers is None, the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 is used instead.
progress (bool) – whether to display a progress bar when downloading the dataset and computing the transform cache content.
copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default is True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.
as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. This may improve the performance of subsequent logic.
- Raises
ValueError – When root_dir is not a directory.
ValueError – When task is not one of [“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”].
RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
Example:
    from monai.apps import DecathlonDataset
    from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ScaleIntensityd, ToTensord

    transform = Compose(
        [
            LoadImaged(keys=["image", "label"]),
            EnsureChannelFirstd(keys=["image", "label"]),
            ScaleIntensityd(keys="image"),
            ToTensord(keys=["image", "label"]),
        ]
    )

    val_data = DecathlonDataset(
        root_dir="./", task="Task09_Spleen", transform=transform, section="validation", seed=12345, download=True
    )

    print(val_data[0]["image"], val_data[0]["label"])
- get_properties(keys=None)[source]#
Get the loaded properties of the dataset for the specified keys. If no keys are specified, return all the loaded properties.
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
None
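The note about using the same seed for the training and validation sections of DecathlonDataset can be illustrated with a simplified sketch. The function split_datalist below is hypothetical, uses Python's random module instead of the library's random state, and handles only the training and validation sections; it shows the idea, not the actual implementation.

```python
import random


def split_datalist(datalist, section, val_frac=0.2, seed=0):
    """Shuffle indices with a fixed seed, then slice off the validation part."""
    indices = list(range(len(datalist)))
    random.Random(seed).shuffle(indices)  # same seed -> same permutation
    val_length = int(len(datalist) * val_frac)
    if section == "training":
        chosen = indices[val_length:]
    elif section == "validation":
        chosen = indices[:val_length]
    else:
        raise ValueError(f"unsupported section: {section}")
    return [datalist[i] for i in chosen]


cases = [f"case_{i:02d}" for i in range(10)]
train = split_datalist(cases, "training", val_frac=0.2, seed=12345)
val = split_datalist(cases, "validation", val_frac=0.2, seed=12345)

# With a shared seed the two sections never overlap and together cover all cases.
print(set(train).isdisjoint(val), len(train), len(val))
```

If the two sections were built with different seeds, each would slice a different permutation, so a case could appear in both the training and the validation data.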
- class monai.apps.TciaDataset(root_dir, collection, section, transform=(), download=False, download_len=-1, seg_type='SEG', modality_tag=(8, 96), ref_series_uid_tag=(32, 14), ref_sop_uid_tag=(8, 4437), specific_tags=((8, 4373), (8, 4416), (12294, 16), (32, 13), (16, 16), (16, 32), (32, 17), (32, 18)), seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=0.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True)[source]#
The Dataset to automatically download the data of a public dataset from The Cancer Imaging Archive (TCIA) and generate items for training, validation or test.
The Highdicom library is used to load DICOM data with modality “SEG”, but only some collections are supported, such as: “C4KC-KiTS”, “NSCLC-Radiomics”, “NSCLC-Radiomics-Interobserver1”, “QIN-PROSTATE-Repeatability” and “PROSTATEx”. Therefore, if “seg” is included in the keys of the LoadImaged transform when loading other collections, errors may be raised. For supported collections, the original “SEG” information may not be consistent across DICOM files. To avoid creating labels in different formats, use the label_dict argument of PydicomReader when calling the LoadImaged transform. The prepared label dicts of the collections mentioned above are also saved in monai.apps.tcia.TCIA_LABEL_DICT. See also the second example below.
This class is based on monai.data.CacheDataset to accelerate the training process.
- Parameters
root_dir (Union[str, PathLike]) – user’s local directory for caching and loading the TCIA dataset.
collection (str) – name of a TCIA collection. A TCIA dataset is defined as a collection. Browse the collection list at the following link (only public collections can be downloaded): https://www.cancerimagingarchive.net/collections/
section (str) – expected data section, can be: training, validation or test.
transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. For further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D]. If not specified, LoadImaged(reader="PydicomReader", keys=["image"]) will be used as the default transform. In addition, we suggest setting the label_dict argument of PydicomReader if segmentations need to be loaded. The original labels for each DICOM series may be different; using this argument unifies the format of the labels.
download (bool) – whether to download and extract the dataset, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the tar file or the dataset folder to the root directory.
download_len (int) – number of series that will be downloaded. The value should be greater than 0, or -1, where -1 means all series will be downloaded. Default is -1.
seg_type (str) – modality type of segmentation that is used to do the first step download. Default is “SEG”.
modality_tag (Tuple) – tag of modality. Default is (0x0008, 0x0060).
ref_series_uid_tag (Tuple) – tag of referenced Series Instance UID. Default is (0x0020, 0x000e).
ref_sop_uid_tag (Tuple) – tag of referenced SOP Instance UID. Default is (0x0008, 0x1155).
specific_tags (Tuple) – tags that will be loaded for “SEG” series. This argument will be used in monai.data.PydicomReader. Default is [(0x0008, 0x1115), (0x0008, 0x1140), (0x3006, 0x0010), (0x0020, 0x000D), (0x0010, 0x0010), (0x0010, 0x0020), (0x0020, 0x0011), (0x0020, 0x0012)].
val_frac (float) – fraction of the whole dataset used for validation, default is 0.2.
seed (int) – random seed used to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: use the same seed for the training and validation sections.
cache_num (int) – number of items to be cached. Default is sys.maxsize. The minimum of (cache_num, data_length x cache_rate, data_length) is used.
cache_rate (float) – fraction of the total data to cache, default is 0.0 (no cache). The minimum of (cache_num, data_length x cache_rate, data_length) is used.
num_workers (int) – the number of worker threads to use. If num_workers is None, the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 is used instead.
progress (bool) – whether to display a progress bar when downloading the dataset and computing the transform cache content.
copy_cache (bool) – whether to deepcopy the cache content before applying the random transforms, default is True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, copy_cache=False may be set for better performance.
as_contiguous (bool) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. This may improve the performance of subsequent logic.
Example:
    from monai.apps import TciaDataset
    from monai.apps.tcia import TCIA_LABEL_DICT
    from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ResampleToMatchd

    # collection is "Pancreatic-CT-CBCT-SEG", seg_type is "RTSTRUCT"
    # (section is a required argument in the signature above)
    data = TciaDataset(
        root_dir="./", collection="Pancreatic-CT-CBCT-SEG", section="training", seg_type="RTSTRUCT", download=True
    )

    # collection is "C4KC-KiTS", seg_type is "SEG", and load both images and segmentations
    transform = Compose(
        [
            LoadImaged(reader="PydicomReader", keys=["image", "seg"], label_dict=TCIA_LABEL_DICT["C4KC-KiTS"]),
            EnsureChannelFirstd(keys=["image", "seg"]),
            ResampleToMatchd(keys="image", key_dst="seg"),
        ]
    )

    # pass the transform so that the "seg" key is actually loaded
    data = TciaDataset(
        root_dir="./", collection="C4KC-KiTS", section="validation", transform=transform, seed=12345, download=True
    )

    print(data[0]["seg"].shape)
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
None
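The TciaDataset signature prints its DICOM tag defaults as decimal pairs such as (8, 96), while the parameter descriptions use the hexadecimal (group, element) form such as (0x0008, 0x0060). The two are the same values. The small helper below (format_tag is a hypothetical name, not part of the library) converts one form to the other.

```python
def format_tag(tag):
    """Render a (group, element) DICOM tag as zero-padded hexadecimal."""
    group, element = tag
    return f"(0x{group:04x}, 0x{element:04x})"


print(format_tag((8, 96)))      # modality_tag       -> (0x0008, 0x0060)
print(format_tag((32, 14)))     # ref_series_uid_tag -> (0x0020, 0x000e)
print(format_tag((8, 4437)))    # ref_sop_uid_tag    -> (0x0008, 0x1155)
print(format_tag((12294, 16)))  # one specific_tag   -> (0x3006, 0x0010)
```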
- class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)[source]#
Cross-validation dataset based on a general dataset class, which must provide the _split_datalist API.
- Parameters
dataset_cls – dataset class to be used to create the cross-validation partitions. It must have the _split_datalist API.
nfolds (int) – number of folds to split the data into for cross validation.
seed (int) – random seed used to randomly shuffle the datalist before splitting into N folds, default is 0.
dataset_params – other additional parameters for the dataset_cls base class.
Example of 5 folds cross validation training:
    from monai.apps import CrossValidation, DecathlonDataset

    cvdataset = CrossValidation(
        dataset_cls=DecathlonDataset,
        nfolds=5,
        seed=12345,
        root_dir="./",
        task="Task09_Spleen",
        section="training",
        transform=train_transform,
        download=True,
    )
    dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
    dataset_fold0_val = cvdataset.get_dataset(folds=0, transform=val_transform, download=False)
    # execute training for fold 0 ...

    dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
    dataset_fold1_val = cvdataset.get_dataset(folds=1, transform=val_transform, download=False)
    # execute training for fold 1 ...

    ...

    dataset_fold4_train = ...
    # execute training for fold 4 ...
- get_dataset(folds, **dataset_params)[source]#
Generate a dataset based on the specified fold indices in the cross-validation group.
- Parameters
folds (Union[Sequence[int], int]) – index of the folds for training or validation. If a list of values is given, the data of those folds is concatenated.
dataset_params – other additional parameters for the dataset_cls base class. These override the same parameters in self.dataset_params.
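How a seeded shuffle followed by an N-fold partition yields the training and validation subsets can be sketched without any library code. The helpers partition_folds and select_folds are hypothetical, and the interleaved slicing is only one possible way to form folds; the real dataset classes supply their own _split_datalist logic.

```python
import random


def partition_folds(data, nfolds=5, seed=0):
    """Shuffle indices with a fixed seed and deal them into nfolds groups."""
    indices = list(range(len(data)))
    random.Random(seed).shuffle(indices)
    return [[data[i] for i in indices[k::nfolds]] for k in range(nfolds)]


def select_folds(folds_data, folds):
    """Concatenate the data of one fold (int) or several folds (sequence)."""
    if isinstance(folds, int):
        folds = [folds]
    return [item for k in folds for item in folds_data[k]]


cases = list(range(23))
folds_data = partition_folds(cases, nfolds=5, seed=12345)

fold0_train = select_folds(folds_data, [1, 2, 3, 4])
fold0_val = select_folds(folds_data, 0)
print(len(fold0_train), len(fold0_val))  # 18 5
```

Because every call reuses the same seed, the fold membership stays fixed across the five training runs, which is what keeps the validation folds disjoint.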
Clara MMARs
- monai.apps.download_mmar(item, mmar_dir=None, progress=True, api=True, version=-1)[source]#
Download and extract Medical Model Archive (MMAR) from Nvidia Clara Train.
See also
Nvidia NGC Registry CLI
- Parameters
item – the corresponding model item from MODEL_DESC. Or, when api is True, the substring to query NGC’s model name field.
mmar_dir (Union[str, PathLike, None]) – target directory to store the MMAR, default is the mmars subfolder under torch.hub get_dir().
progress (bool) – whether to display a progress bar.
api (bool) – whether to query NGC and download via the API.
version (int) – which version of the MMAR to download. -1 means the latest from NGC.
- Examples::
>>> from monai.apps import download_mmar
>>> download_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".")
>>> download_mmar("prostate_mri_segmentation", mmar_dir=".", api=True)
- Returns
The local directory of the downloaded model. If api is True, a list of local directories of downloaded models.
- monai.apps.load_from_mmar(item, mmar_dir=None, progress=True, version=-1, map_location=None, pretrained=True, weights_only=False, model_key='model', api=True, model_file=None)[source]#
Download and extract Medical Model Archive (MMAR) model weights from Nvidia Clara Train.
- Parameters
item – the corresponding model item from MODEL_DESC.
mmar_dir (Union[str, PathLike, None]) – target directory to store the MMAR, default is the mmars subfolder under torch.hub get_dir().
progress (bool) – whether to display a progress bar when downloading the content.
version (int) – version number of the MMAR. Set it to -1 to use item[Keys.VERSION].
map_location – PyTorch API parameter for torch.load or torch.jit.load.
pretrained – whether to load the pretrained weights after initializing a network module.
weights_only – whether to load only the weights instead of initializing the network module and assigning weights.
model_key (str) – a key to search in the model file or config file for the model dictionary. Currently this function assumes that the model dictionary has {"[name|path]": "test.module", "args": {'kw': 'test'}}.
api (bool) – whether to query the NGC API to get model information.
model_file – the relative path to the model file within an MMAR.
- Examples::
>>> from monai.apps import load_from_mmar
>>> unet_model = load_from_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".", map_location="cpu")
>>> print(unet_model)
- monai.apps.MODEL_DESC#
Built-in immutable sequence.
If no argument is given, the constructor returns an empty tuple. If iterable is specified the tuple is initialized from iterable’s items.
If the argument is a tuple, the return value is the same object.
Utilities
- monai.apps.check_hash(filepath, val=None, hash_type='md5')[source]#
Verify hash signature of specified file.
- Parameters
filepath (Union[str, PathLike]) – path of the source file whose hash value is verified.
val (Optional[str]) – expected hash value of the file.
hash_type (str) – type of hash algorithm to use, default is “md5”. The supported hash types are “md5”, “sha1”, “sha256”, “sha512”. See also: monai.apps.utils.SUPPORTED_HASH_TYPES.
- Return type
bool
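The kind of check that check_hash describes can be sketched with the standard library alone. The function file_matches_hash below is a hypothetical stand-in built on hashlib, not the library's implementation; treating a missing expected value as a pass is an assumption.

```python
import hashlib
import os
import tempfile


def file_matches_hash(filepath, expected=None, hash_type="md5"):
    """Return True when the file digest equals the expected hex string."""
    if expected is None:
        return True  # assumption: nothing to compare against counts as a pass
    hasher = hashlib.new(hash_type)
    with open(filepath, "rb") as f:
        # Read in chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"hello")
    good = hashlib.sha256(b"hello").hexdigest()
    print(file_matches_hash(path, good, hash_type="sha256"))      # True
    print(file_matches_hash(path, "0" * 64, hash_type="sha256"))  # False
```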
- monai.apps.download_url(url, filepath='', hash_val=None, hash_type='md5', progress=True, **gdown_kwargs)[source]#
Download a file from the specified URL, with support for a progress bar and hash check.
- Parameters
url (str) – source URL link to download the file from.
filepath (Union[str, PathLike]) – target filepath to save the downloaded file (including the filename). If undefined, os.path.basename(url) will be used.
hash_val (Optional[str]) – expected hash value to validate the downloaded file. If None, hash validation is skipped.
hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.
progress (bool) – whether to display a progress bar.
gdown_kwargs – other args for gdown except for the url, output and quiet. These args are only used when downloading from Google Drive. For details of the args, see: https://github.com/wkentaro/gdown/blob/main/gdown/download.py
- Raises
RuntimeError – When the hash validation of the existing file at filepath fails.
RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.
URLError – See urllib.request.urlretrieve.
HTTPError – See urllib.request.urlretrieve.
ContentTooShortError – See urllib.request.urlretrieve.
IOError – See urllib.request.urlretrieve.
RuntimeError – When the hash validation of the file downloaded from url fails.
- Return type
None
- monai.apps.extractall(filepath, output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True)[source]#
Extract file to the output directory. Expected file types are: zip, tar.gz and tar.
- Parameters
filepath (Union[str, PathLike]) – the file path of the compressed file.
output_dir (Union[str, PathLike]) – target directory to save the extracted files.
hash_val (Optional[str]) – expected hash value to validate the compressed file. If None, hash validation is skipped.
hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.
file_type (str) – string of the file type for decompressing. Leave it empty to infer the type from the filepath basename.
has_base (bool) – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall; if it is, the extraction is skipped. For example, if A.zip is unzipped to the folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.
- Raises
RuntimeError – When the hash validation of the compressed file at filepath fails.
NotImplementedError – When the file extension of filepath is not one of [“zip”, “tar.gz”, “tar”].
- Return type
None
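The file_type rule for extractall (an explicit value, or a type inferred from the basename, with anything other than zip, tar.gz or tar rejected) can be sketched as below. The helper infer_file_type is hypothetical and only mirrors the documented behaviour; the library's own matching logic may differ.

```python
import os


def infer_file_type(filepath, file_type=""):
    """Pick "zip", "tar.gz" or "tar" from file_type or the file name."""
    # An explicit file_type wins; otherwise fall back to the basename.
    name = (file_type or os.path.basename(filepath)).lower()
    for suffix in ("tar.gz", "zip", "tar"):
        if name.endswith(suffix):
            return suffix
    raise NotImplementedError(f"unsupported file type for: {filepath}")


print(infer_file_type("/data/MedNIST.tar.gz"))             # tar.gz
print(infer_file_type("A.zip"))                            # zip
print(infer_file_type("download.bin", file_type="tar"))    # tar
```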
- monai.apps.download_and_extract(url, filepath='', output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True, progress=True)[source]#
Download file from URL and extract it to the output directory.
- Parameters
url (str) – source URL link to download the file from.
filepath (Union[str, PathLike]) – the file path of the downloaded compressed file. Use this option to keep the directly downloaded compressed file, to avoid further repeated downloads.
output_dir (Union[str, PathLike]) – target directory to save the extracted files. Default is the current directory.
hash_val (Optional[str]) – expected hash value to validate the downloaded file. If None, hash validation is skipped.
hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.
file_type (str) – string of the file type for decompressing. Leave it empty to infer the type from the URL’s base file name.
has_base (bool) – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall; if it is, the extraction is skipped. For example, if A.zip is unzipped to the folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.
progress (bool) – whether to display a progress bar.
- Return type
None
Deepgrow
- monai.apps.deepgrow.dataset.create_dataset(datalist, output_dir, dimension, pixdim, image_key='image', label_key='label', base_dir=None, limit=0, relative_path=False, transforms=None)[source]#
Utility to pre-process and create a dataset list for Deepgrow training based on an existing one. The input data list is normally a list of images and labels (3D volumes) that need pre-processing for the Deepgrow training pipeline.
- Parameters
datalist – A list of data dictionaries. Each entry should at least contain ‘image_key’: <image filename>. For example, typical input data can be a list of dictionaries:
[{'image': <image filename>, 'label': <label filename>}]
output_dir (str) – target directory to store the training data for Deepgrow training.
pixdim – output voxel spacing.
dimension (int) – dimension for Deepgrow training. It can be 2 or 3.
image_key (str) – image key in the input datalist. Defaults to ‘image’.
label_key (str) – label key in the input datalist. Defaults to ‘label’.
base_dir – base directory in case relative paths are used for the keys in the datalist. Defaults to None.
limit (int) – limit on the number of inputs for pre-processing. Defaults to 0 (no limit).
relative_path (bool) – whether the output key values should be based on relative paths. Defaults to False.
transforms – explicit transforms to execute operations on input data.
- Raises
ValueError – When dimension is not one of [2, 3].
ValueError – When datalist is empty.
- Return type
List[Dict]
- Returns
A new datalist that contains the paths to the images/labels after pre-processing.
Example:
    from monai.apps.deepgrow.dataset import create_dataset

    datalist = create_dataset(
        datalist=[{'image': 'img1.nii', 'label': 'label1.nii'}],
        base_dir=None,
        output_dir=output_2d,
        dimension=2,
        image_key='image',
        label_key='label',
        pixdim=(1.0, 1.0),
        limit=0,
        relative_path=True
    )

    print(datalist[0]["image"], datalist[0]["label"])
- class monai.apps.deepgrow.interaction.Interaction(transforms, max_interactions, train, key_probability='probability')[source]#
Ignite process_function used to introduce interactions (simulation of clicks) for Deepgrow Training/Evaluation. For more details please refer to: https://pytorch.org/ignite/generated/ignite.engine.engine.Engine.html. This implementation is based on:
Sakinis et al., Interactive segmentation of medical images through fully convolutional neural networks. (2019) https://arxiv.org/abs/1903.08205
- Parameters
transforms (Union[Sequence[Callable], Callable]) – execute additional transformation during every iteration (before train). Typically, several Tensor based transforms composed by Compose.
max_interactions (int) – maximum number of interactions per iteration.
train (bool) – training or evaluation.
key_probability (str) – field name to fill probability for every interaction.
- class monai.apps.deepgrow.transforms.AddInitialSeedPointd(label='label', guidance='guidance', sids='sids', sid='sid', connected_regions=5)[source]#
Add random guidance as initial seed point for a given label.
Note that the label is of size (C, D, H, W) or (C, H, W).
The guidance is of size (2, N, # of dims), where N is the number of guidance points added; # of dims = 4 for (C, D, H, W) and # of dims = 3 for (C, H, W).
- Parameters
label (str) – label source.
guidance (str) – key to store guidance.
sids (str) – key that represents the list of valid slice indices for the given label.
sid (str) – key that represents the slice to add the initial seed point to. If not present, a random sid will be chosen.
connected_regions (int) – maximum connected regions to use for adding initial points.
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- class monai.apps.deepgrow.transforms.AddGuidanceSignald(image='image', guidance='guidance', sigma=2, number_intensity_ch=1)[source]#
Add Guidance signal for input image.
Based on the “guidance” points, apply a Gaussian to them and add the result as a new channel of the input image.
- Parameters
image (str) – key to the image source.
guidance (str) – key to store guidance.
sigma (int) – standard deviation for Gaussian kernel.
number_intensity_ch (int) – channel index.
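The idea behind AddGuidanceSignald, turning click points into a smooth extra channel, can be illustrated in pure Python on a small 2D grid. The function guidance_channel is hypothetical and evaluates a Gaussian bump directly around each point; the actual transform may instead filter a point mask and normalise the result, so treat this as a simplified picture only.

```python
import math


def guidance_channel(points, shape, sigma=2.0):
    """Build a 2D map that is 1.0 at each click and decays with distance."""
    height, width = shape
    channel = [[0.0] * width for _ in range(height)]
    for py, px in points:
        for y in range(height):
            for x in range(width):
                dist_sq = (y - py) ** 2 + (x - px) ** 2
                value = math.exp(-dist_sq / (2.0 * sigma ** 2))
                # Keep the strongest response when several clicks overlap.
                channel[y][x] = max(channel[y][x], value)
    return channel


signal = guidance_channel([(2, 3)], shape=(5, 7), sigma=2.0)
print(signal[2][3])                      # 1.0 at the click itself
print(signal[2][4] > signal[0][0])       # True: closer pixels are stronger
```

A map like this would be stacked onto the image as an additional channel, one for foreground clicks and one for background clicks, matching the leading dimension of 2 in the guidance shape.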
- class monai.apps.deepgrow.transforms.AddRandomGuidanced(guidance='guidance', discrepancy='discrepancy', probability='probability')[source]#
Add random guidance based on discrepancies that were found between label and prediction. The input shapes are as follows:
Guidance is of shape (2, N, # of dim)
Discrepancy is of shape (2, C, D, H, W) or (2, C, H, W)
Probability is of shape (1)
- Parameters
guidance (str) – key to guidance source.
discrepancy (str) – key that represents discrepancies found between label and prediction.
probability (str) – key that represents click/interaction probability.
- randomize(data=None)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- class monai.apps.deepgrow.transforms.AddGuidanceFromPointsd(ref_image, guidance='guidance', foreground='foreground', background='background', axis=0, depth_first=True, spatial_dims=2, slice_key='slice', meta_keys=None, meta_key_postfix='meta_dict', dimensions=None)[source]#
Add guidance based on user clicks.
We assume the input is loaded by LoadImaged and has the shape of (H, W, D) originally. Clicks always specify the coordinates in (H, W, D)
If depth_first is True:
Input is now of shape (D, H, W), will return guidance that specifies the coordinates in (D, H, W)
else:
Input is now of shape (H, W, D), will return guidance that specifies the coordinates in (H, W, D)
- Parameters
ref_image – key to reference image to fetch current and original image details.
guidance (str) – output key to store guidance.
foreground (str) – key that represents user foreground (+ve) clicks.
background (str) – key that represents user background (-ve) clicks.
axis (int) – axis that represents slices in the 3D volume (the depth axis).
depth_first (bool) – whether depth (slices) is positioned at the first dimension.
spatial_dims (int) – dimensions based on the model used for deepgrow (2D vs 3D).
slice_key (str) – key that represents the applicable slice to add guidance to.
meta_keys (Optional[str]) – explicitly indicate the key of the metadata dictionary of ref_image. For example, for data with key image, the metadata by default is in image_meta_dict. The metadata is a dictionary object which contains: filename, original_shape, etc. If None, meta_keys is constructed as {ref_image}_{meta_key_postfix}.
meta_key_postfix (str) – if meta_keys is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict. The metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
Deprecated since version 0.6.0: dimensions is deprecated, use spatial_dims instead.
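The depth_first convention described for AddGuidanceFromPointsd amounts to reordering each click from (H, W, D) to (D, H, W). The sketch below shows only that reordering. The helper to_guidance_coords is hypothetical, and any rescaling between the original and the current image shape is deliberately left out.

```python
def to_guidance_coords(click_hwd, depth_first=True):
    """Reorder a click given in (H, W, D) to match the input layout."""
    h, w, d = click_hwd
    # depth_first inputs are laid out as (D, H, W); otherwise keep (H, W, D).
    return (d, h, w) if depth_first else (h, w, d)


click = (10, 20, 5)  # row 10, column 20, slice 5
print(to_guidance_coords(click, depth_first=True))   # (5, 10, 20)
print(to_guidance_coords(click, depth_first=False))  # (10, 20, 5)
```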
- class monai.apps.deepgrow.transforms.SpatialCropForegroundd(keys, source_key, spatial_size, select_fn=<function is_positive>, channel_indices=None, margin=0, allow_smaller=True, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#
Crop only the foreground object of the expected images.
Difference from monai.transforms.CropForegroundd:
If the bounding box is smaller than spatial_size in all dimensions, this transform crops the object using the box’s center and spatial_size.
This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]
The typical usage is to help training and evaluation if the valid part is small in the whole medical image. The valid part can be determined by any field in the data with source_key, for example:
Select values > 0 in image field as the foreground and crop on all fields specified by keys.
Select label = 3 in label field as the foreground to crop on all fields specified by keys.
Select label > 0 in the third channel of a One-Hot label field as the foreground to crop all keys fields.
Users can define an arbitrary function to select the expected foreground from the whole source image or from specified channels. A margin can also be added to every dimension of the bounding box of the foreground object.
- Parameters
keys (
Union
[Collection
[Hashable
],Hashable
]) – keys of the corresponding items to be transformed. See also:monai.transforms.MapTransform
source_key (
str
) – data source to generate the bounding box of foreground, can be image or label, etc.spatial_size (
Union
[Sequence
[int
],ndarray
]) – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.select_fn (
Callable
) – function to select expected foreground, default is to select values > 0.channel_indices (
Union
[Iterable
[int
],int
,None
]) – if defined, select foreground only on the specified channels of image. if None, select foreground on the whole image.margin (
int
) – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.allow_smaller (
bool
) – when computing box size with margin, whether allow the image size to be smaller than box size, default to True. if the margined size is bigger than image size, will pad with specified mode.meta_keys (
Union
[Collection
[Hashable
],Hashable
,None
]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch/store the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key (
str
) – key to record the start coordinate of spatial bounding box for foreground.end_coord_key (
str
) – key to record the end coordinate of spatial bounding box for foreground.original_shape_key (
str
) – key to record original shape for foreground.cropped_shape_key (
str
) – key to record cropped shape for foreground.allow_missing_keys (
bool
) – don’t raise exception if key is missing.
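The bounding-box step described above can be sketched in plain Python. This is a minimal illustration, not the library implementation: `foreground_bbox` is a hypothetical helper, the image is a 2D nested list without a channel dimension, and the box is simply clipped to the image bounds (padding and `allow_smaller` are not modelled).

```python
from typing import Callable, List, Tuple


def foreground_bbox(
    image: List[List[float]],
    select_fn: Callable[[float], bool] = lambda v: v > 0,
    margin: int = 0,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return (start, end) of the foreground box; end is exclusive."""
    rows = [r for r, row in enumerate(image) if any(select_fn(v) for v in row)]
    cols = [c for c in range(len(image[0])) if any(select_fn(row[c]) for row in image)]
    if not rows:
        raise ValueError("no foreground found")
    # Grow the box by `margin` on each side, then clip to the image bounds.
    start = (max(rows[0] - margin, 0), max(cols[0] - margin, 0))
    end = (
        min(rows[-1] + 1 + margin, len(image)),
        min(cols[-1] + 1 + margin, len(image[0])),
    )
    return start, end


label = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
start, end = foreground_bbox(label, margin=1)
# Crop every "key" with the same box, here just the label itself.
cropped = [row[start[1]:end[1]] for row in label[start[0]:end[0]]]
```

The same `start` and `end` would be applied to every item listed in keys, which is why the box is computed once from the source item.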
- class monai.apps.deepgrow.transforms.SpatialCropGuidanced(keys, guidance, spatial_size, margin=20, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#
Crop image based on guidance with minimal spatial size.
If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.
This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]
Input data is of shape (C, spatial_1, [spatial_2, …])
- Parameters
keys (
Union
[Collection
[Hashable
],Hashable
]) – keys of the corresponding items to be transformed.guidance (
str
) – key to the guidance. It is used to generate the bounding box of the foreground.spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.
margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
meta_keys (
Union
[Collection
[Hashable
],Hashable
,None
]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.meta_key_postfix – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key (
str
) – key to record the start coordinate of spatial bounding box for foreground.end_coord_key (
str
) – key to record the end coordinate of spatial bounding box for foreground.original_shape_key (
str
) – key to record original shape for foreground.cropped_shape_key (
str
) – key to record cropped shape for foreground.allow_missing_keys (
bool
) – don’t raise exception if key is missing.
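The rule "crop around the box centre when the box is smaller than spatial_size in all dimensions" can be sketched as follows. This is an assumption-laden illustration: `crop_window` is a hypothetical helper, and clipping the window to the image is a choice made here, not something the text above specifies.

```python
from typing import List, Sequence, Tuple


def crop_window(
    box_start: Sequence[int],
    box_end: Sequence[int],
    spatial_size: Sequence[int],
    image_shape: Sequence[int],
) -> Tuple[List[int], List[int]]:
    """Pick the crop window (start, exclusive end) for one bounding box."""
    box_size = [e - s for s, e in zip(box_start, box_end)]
    if all(b < m for b, m in zip(box_size, spatial_size)):
        # Box is smaller than the minimal size in every dimension:
        # centre a window of `spatial_size` on the box.
        center = [(s + e) // 2 for s, e in zip(box_start, box_end)]
        start = [c - m // 2 for c, m in zip(center, spatial_size)]
        end = [s + m for s, m in zip(start, spatial_size)]
    else:
        # Otherwise keep the box itself.
        start, end = list(box_start), list(box_end)
    # Clip to the image so the window never leaves the volume (assumption).
    start = [max(s, 0) for s in start]
    end = [min(e, n) for e, n in zip(end, image_shape)]
    return start, end


small_box = crop_window((10, 10), (14, 16), (8, 8), (64, 64))
large_box = crop_window((10, 10), (30, 14), (8, 8), (64, 64))
```

Note that near the image border the clipped window can end up smaller than spatial_size in this sketch.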
- class monai.apps.deepgrow.transforms.RestoreLabeld(keys, ref_image, slice_only=False, mode=InterpolateMode.NEAREST, align_corners=None, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#
Restores label based on the ref image.
The ref_image is assumed to have gone through the following transforms:
Fetch2DSliced (If 2D)
Spacingd
SpatialCropGuidanced
Resized
And its shape is assumed to be (C, D, H, W)
This transform tries to undo these operations so that the resulting label can be overlaid on the original volume. It performs the following operations:
Undo Resized
Undo SpatialCropGuidanced
Undo Spacingd
Undo Fetch2DSliced
The resulting label is of shape (D, H, W)
- Parameters
keys (
Union
[Collection
[Hashable
],Hashable
]) – keys of the corresponding items to be transformed.ref_image (
str
) – reference image to fetch current and original image detailsslice_only (
bool
) – apply only to an applicable slice, in case of 2D model/predictionmode (
Union
[Sequence
[Union
[InterpolateMode
,str
]],InterpolateMode
,str
]) – interpolation mode used when resizing the label back, given as an InterpolateMode value or its string name. Defaults to InterpolateMode.NEAREST. It can also be a sequence, where each element corresponds to a key in keys.align_corners (
Union
[Sequence
[Optional
[bool
]],bool
,None
]) – Geometrically, we consider the pixels of the input as squares rather than points. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html It also can be a sequence of bool, each element corresponds to a key in keys
.meta_keys (
Optional
[str
]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.meta_key_postfix (
str
) – if meta_keys is None, use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.start_coord_key (
str
) – key that records the start coordinate of spatial bounding box for foreground.end_coord_key (
str
) – key that records the end coordinate of spatial bounding box for foreground.original_shape_key (
str
) – key that records original shape for foreground.cropped_shape_key (
str
) – key that records cropped shape for foreground.allow_missing_keys (
bool
) – don’t raise exception if key is missing.
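The "Undo SpatialCropGuidanced" step relies on the recorded start and end coordinates and the original shape. A minimal sketch of that one step, assuming a 2D label stored as nested lists and a hypothetical helper `paste_back` (the resize, spacing and slice steps are not modelled):

```python
from typing import List, Sequence


def paste_back(
    cropped: List[List[int]],
    original_shape: Sequence[int],
    start: Sequence[int],
    end: Sequence[int],
) -> List[List[int]]:
    """Place a 2D cropped label into a zero array of the original shape."""
    restored = [[0] * original_shape[1] for _ in range(original_shape[0])]
    for r in range(start[0], end[0]):
        for c in range(start[1], end[1]):
            # Shift indices by the crop start to read from the cropped label.
            restored[r][c] = cropped[r - start[0]][c - start[1]]
    return restored


cropped_label = [[1, 1], [0, 1]]
restored = paste_back(cropped_label, original_shape=(4, 5), start=(1, 2), end=(3, 4))
```

Everything outside the recorded box is background, so the restored label lines up with the uncropped image.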
- class monai.apps.deepgrow.transforms.ResizeGuidanced(guidance, ref_image, meta_keys=None, meta_key_postfix='meta_dict', cropped_shape_key='foreground_cropped_shape')[source]#
Resize the guidance based on cropped vs resized image.
This transform assumes that the images have been cropped and resized, and that the shape after cropping is stored in the meta dict of the ref image.
- Parameters
guidance (
str
) – key to guidanceref_image (
str
) – key to reference image to fetch current and original image detailsmeta_keys (
Optional
[str
]) – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.meta_key_postfix (
str
) – if meta_keys is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.cropped_shape_key (
str
) – key that records cropped shape for foreground.
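Resizing guidance amounts to scaling click coordinates by the ratio between the resized shape and the cropped shape. The sketch below assumes clicks are integer coordinates and truncates after scaling; `resize_guidance` is a hypothetical helper and the rounding rule is an assumption.

```python
from typing import List, Sequence


def resize_guidance(
    clicks: Sequence[Sequence[int]],
    cropped_shape: Sequence[int],
    resized_shape: Sequence[int],
) -> List[List[int]]:
    """Scale click coordinates from the cropped grid to the resized grid."""
    factors = [r / c for r, c in zip(resized_shape, cropped_shape)]
    # int() truncates toward zero, so coordinates stay valid indices.
    return [[int(x * f) for x, f in zip(click, factors)] for click in clicks]


clicks = [[10, 20, 30], [0, 5, 7]]
scaled = resize_guidance(clicks, cropped_shape=(20, 40, 60), resized_shape=(40, 40, 30))
```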
- class monai.apps.deepgrow.transforms.FindDiscrepancyRegionsd(label='label', pred='pred', discrepancy='discrepancy')[source]#
Find discrepancies between the prediction and the label during click interactions in training.
- Parameters
label (
str
) – key to label source.pred (
str
) – key to prediction source.discrepancy (
str
) – key to store discrepancies found between label and prediction.
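One way to picture the discrepancy is as two binary masks: regions the prediction missed and regions it added. The sketch below assumes binary 2D masks stored as nested lists; `find_discrepancy` is a hypothetical helper, not the transform itself.

```python
from typing import List, Tuple

Mask = List[List[int]]


def find_discrepancy(label: Mask, pred: Mask) -> Tuple[Mask, Mask]:
    """Return (missed, extra) masks for two binary 2D masks of equal shape."""
    # missed: labelled foreground that the prediction left out (false negatives)
    missed = [[int(l == 1 and p == 0) for l, p in zip(lr, pr)] for lr, pr in zip(label, pred)]
    # extra: predicted foreground that is not in the label (false positives)
    extra = [[int(l == 0 and p == 1) for l, p in zip(lr, pr)] for lr, pr in zip(label, pred)]
    return missed, extra


label = [[1, 1, 0], [0, 1, 0]]
pred = [[1, 0, 0], [0, 1, 1]]
missed, extra = find_discrepancy(label, pred)
```

During simulated interaction, a positive click would be placed in a missed region and a negative click in an extra region.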
- class monai.apps.deepgrow.transforms.FindAllValidSlicesd(label='label', sids='sids')[source]#
Find/List all valid slices in the label. Label is assumed to be a 4D Volume with shape CDHW, where C=1.
- Parameters
label (
str
) – key to the label source.sids (
str
) – key to store the indices of slices that have a valid label map.
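A minimal sketch of finding valid slices, assuming that "valid" means the slice contains at least one non-zero label value and that the label is a nested list shaped (C=1, D, H, W); `find_valid_slices` is a hypothetical helper.

```python
from typing import List


def find_valid_slices(label: List[List[List[List[int]]]]) -> List[int]:
    """Return indices along D whose slice has any non-zero label value."""
    volume = label[0]  # drop the single channel dimension
    return [
        d
        for d, slice_2d in enumerate(volume)
        if any(v != 0 for row in slice_2d for v in row)
    ]


empty = [[0, 0], [0, 0]]
label = [[empty, [[0, 1], [0, 0]], empty, [[1, 1], [1, 1]]]]
sids = find_valid_slices(label)
```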
- class monai.apps.deepgrow.transforms.Fetch2DSliced(keys, guidance='guidance', axis=0, meta_keys=None, meta_key_postfix='meta_dict', allow_missing_keys=False)[source]#
Fetch one slice in case of a 3D volume.
The volume only contains spatial coordinates.
- Parameters
keys – keys of the corresponding items to be transformed.
guidance – key that represents guidance.
axis (
int
) – axis that represents slice in 3D volume.meta_keys (
Union
[Collection
[Hashable
],Hashable
,None
]) – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.meta_key_postfix (
str
) – use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.allow_missing_keys (
bool
) – don’t raise exception if key is missing.
Pathology#
- class monai.apps.pathology.data.PatchWSIDataset(data, region_size, grid_shape, patch_size, transform=None, image_reader_name='cuCIM', **kwargs)[source]#
This dataset reads whole slide images, extracts regions, and creates patches. It also reads labels for each patch and provides each patch with its associated class labels.
- Parameters
data (
List
) – the list of input samples including image, location, and label (see the note below for more details).region_size (
Union
[int
,Tuple
[int
,int
]]) – the size of regions to be extracted from the whole slide image.grid_shape (
Union
[int
,Tuple
[int
,int
]]) – the grid shape on which the patches should be extracted.patch_size (
Union
[int
,Tuple
[int
,int
]]) – the size of patches extracted from the region on the grid.transform (
Optional
[Callable
]) – transforms to be executed on input data.image_reader_name (
str
) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.kwargs – additional parameters for
WSIReader
Note
The input data has the following form as an example: [{“image”: “path/to/image1.tiff”, “location”: [200, 500], “label”: [0,0,0,1]}].
This means from “image1.tiff” extract a region centered at the given location with the size of region_size, and then extract patches with the size of patch_size from a grid with the shape of grid_shape. Be aware that the grid_shape should construct a grid with the same number of elements as labels, so for this example the grid_shape should be (2, 2).
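The relation between region, grid and labels in the note can be sketched as below. This is one plausible layout, not the dataset's actual code: `grid_patch_centers` is a hypothetical helper, the region size of 256 is an assumed value, and the order of the two location coordinates is left as given.

```python
from typing import List, Sequence, Tuple


def grid_patch_centers(
    location: Sequence[int],
    region_size: Sequence[int],
    grid_shape: Sequence[int],
) -> List[Tuple[int, int]]:
    """Centres of the grid cells of a region centred at `location`."""
    first = location[0] - region_size[0] // 2  # region corner, first axis
    second = location[1] - region_size[1] // 2  # region corner, second axis
    cell_0 = region_size[0] // grid_shape[0]
    cell_1 = region_size[1] // grid_shape[1]
    return [
        (first + i * cell_0 + cell_0 // 2, second + j * cell_1 + cell_1 // 2)
        for i in range(grid_shape[0])
        for j in range(grid_shape[1])
    ]


sample = {"image": "path/to/image1.tiff", "location": [200, 500], "label": [0, 0, 0, 1]}
grid_shape = (2, 2)
centers = grid_patch_centers(sample["location"], region_size=(256, 256), grid_shape=grid_shape)
# One label per grid cell: the counts must match.
patch_labels = list(zip(centers, sample["label"]))
```

A patch of patch_size would then be read around each centre and paired with the label at the same position in the list.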
- class monai.apps.pathology.data.SmartCachePatchWSIDataset(data, region_size, grid_shape, patch_size, transform, image_reader_name='cuCIM', replace_rate=0.5, cache_num=9223372036854775807, cache_rate=1.0, num_init_workers=1, num_replace_workers=1, progress=True, copy_cache=True, as_contiguous=True, **kwargs)[source]#
Add SmartCache functionality to PatchWSIDataset.
- Parameters
data (
List
) – the list of input samples including image, location, and label (see PatchWSIDataset for more details)region_size (
Union
[int
,Tuple
[int
,int
]]) – the region to be extracted from the whole slide image.grid_shape (
Union
[int
,Tuple
[int
,int
]]) – the grid shape on which the patches should be extracted.patch_size (
Union
[int
,Tuple
[int
,int
]]) – the size of patches extracted from the region on the grid.image_reader_name (
str
) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.transform (
Union
[Sequence
[Callable
],Callable
]) – transforms to be executed on input data.replace_rate (
float
) – percentage of the cached items to be replaced in every epoch.cache_num (
int
) – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).cache_rate (
float
) – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).num_init_workers (
Optional
[int
]) – the number of worker threads to initialize the cache for first epoch. If num_init_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.num_replace_workers (
Optional
[int
]) – the number of worker threads to prepare the replacement cache for every epoch. If num_replace_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.progress (
bool
) – whether to display a progress bar when caching for the first epoch.copy_cache (
bool
) – whether to deepcopy the cache content before applying the random transforms, default to True. if the random transforms don’t modify the cache content or every cache item is only used once in a multi-processing environment, may set copy=False for better performance.as_contiguous (
bool
) – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. it may help improve the performance of following logic.kwargs – additional parameters for
WSIReader
- class monai.apps.pathology.data.MaskedInferenceWSIDataset(data, patch_size, transform=None, image_reader_name='cuCIM', **kwargs)[source]#
This dataset loads the provided foreground masks at an arbitrary resolution level, and extracts patches based on that mask from the associated whole slide image.
- Parameters
data (
List
[Dict
[str
,str
]]) – a list of samples, each including the path to the whole slide image and the path to the mask. Like this: [{“image”: “path/to/image1.tiff”, “mask”: “path/to/mask1.npy”}, …].patch_size (
Union
[int
,Tuple
[int
,int
]]) – the size of patches to be extracted from the whole slide image for inference.transform (
Optional
[Callable
]) – transforms to be executed on extracted patches.image_reader_name (
str
) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.kwargs – additional parameters for
WSIReader
Note
- The resulting output (probability maps) after performing inference using this dataset is supposed to be the same size as the foreground mask and not the original wsi image size.
- class monai.apps.pathology.handlers.ProbMapProducer(output_dir='./', output_postfix='', dtype=<class 'numpy.float64'>, name=None)[source]#
Event handler triggered on completing every iteration to save the probability map
- __init__(output_dir='./', output_postfix='', dtype=<class 'numpy.float64'>, name=None)[source]#
- Parameters
output_dir (
str
) – output directory to save probability maps.output_postfix (
str
) – a string appended to all output file names.dtype (
Union
[dtype
,type
,str
,None
]) – the data type in which the probability map is stored. Default np.float64.name (
Optional
[str
]) – identifier of logging.logger to use, defaulting to engine.logger.
- class monai.apps.pathology.metrics.LesionFROC(data, grow_distance=75, itc_diameter=200, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8), nms_sigma=0.0, nms_prob_threshold=0.5, nms_box_size=48, image_reader_name='cuCIM')[source]#
Evaluate with Free Response Operating Characteristic (FROC) score.
- Parameters
data (
List
[Dict
]) – either the list of dictionaries containing probability maps (inference result) and tumor mask (ground truth), as below, or the path to a json file containing such list. { “prob_map”: “path/to/prob_map_1.npy”, “tumor_mask”: “path/to/ground_truth_1.tiff”, “level”: 6, “pixel_spacing”: 0.243 }grow_distance (
int
) – Euclidean distance (in micrometer) by which to grow the labels of the ground truth’s tumors. Defaults to 75, which is the equivalent size of 5 tumor cells.itc_diameter (
int
) – the maximum diameter of a region (in micrometer) to be considered as an isolated tumor cell. Defaults to 200.eval_thresholds (
Tuple
) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8) which is the same as the CAMELYON 16 Challenge.nms_sigma (
float
) – the standard deviation for gaussian filter of non-maximal suppression. Defaults to 0.0.nms_prob_threshold (
float
) – the probability threshold of non-maximal suppression. Defaults to 0.5.nms_box_size (
int
) – the box size (in pixel) to be removed around the pixel for non-maximal suppression.image_reader_name (
str
) – the name of library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.
Note
For more info on nms_* parameters look at monai.utils.prob_nms.ProbNMS.
- compute_fp_tp()[source]#
Compute false positive and true positive probabilities for tumor detection, by comparing the model outputs with the prepared ground truths for all samples
- evaluate()[source]#
Evaluate the detection performance of a model based on the model probability map output, the ground truth tumor mask, and their associated metadata (e.g., pixel_spacing, level)
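The "average sensitivity at the given false positive rates" behind eval_thresholds can be sketched with linear interpolation over an FROC curve. The curve values below are made-up toy numbers, and `interp` and `froc_score` are hypothetical helpers; the library may build and interpolate the curve differently.

```python
from typing import Sequence


def interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Linear interpolation over ascending xs; clamps outside the range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            if x1 == x0:
                return ys[i + 1]
            return ys[i] + (ys[i + 1] - ys[i]) * (x - x0) / (x1 - x0)
    return ys[-1]


def froc_score(
    fps_per_image: Sequence[float],
    sensitivity: Sequence[float],
    eval_thresholds: Sequence[float] = (0.25, 0.5, 1, 2, 4, 8),
) -> float:
    """Mean sensitivity at the requested false-positives-per-image rates."""
    values = [interp(t, fps_per_image, sensitivity) for t in eval_thresholds]
    return sum(values) / len(values)


# Toy FROC curve (illustrative numbers only).
fps = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]
sens = [0.0, 0.4, 0.6, 0.7, 0.8, 0.9]
score = froc_score(fps, sens)
```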
- monai.apps.pathology.utils.compute_multi_instance_mask(mask, threshold)[source]#
This method computes the segmentation mask according to the binary tumor mask.
- Parameters
mask (
ndarray
) – the binary mask arraythreshold (
float
) – the threshold to fill holes
- monai.apps.pathology.utils.compute_isolated_tumor_cells(tumor_mask, threshold)[source]#
This method identifies Isolated Tumor Cells (ITC) and returns their labels.
- Parameters
tumor_mask (
ndarray
) – the tumor mask.threshold (
float
) – the threshold (at the mask level) to define an isolated tumor cell (ITC). A region with the longest diameter less than this threshold is considered as an ITC.
- Return type
List
[int
]
- class monai.apps.pathology.utils.PathologyProbNMS(spatial_dims=2, sigma=0.0, prob_threshold=0.5, box_size=48)[source]#
This class extends monai.utils.ProbNMS and adds the resolution option for Pathology.
- class monai.apps.pathology.transforms.stain.array.ExtractHEStains(tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308))[source]#
Class to extract a target stain from an image, using stain deconvolution (see Note).
- Parameters
tli (
float
) – transmitted light intensity. Defaults to 240.alpha (
float
) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.beta (
float
) – absorbance threshold for transparent pixels. Defaults to 0.15max_cref (
Union
[tuple
,ndarray
]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
Note
For more information refer to: - the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf - the previous implementations:
- class monai.apps.pathology.transforms.stain.array.NormalizeHEStains(tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308))[source]#
Class to normalize patches/images to a reference or target image stain (see Note).
Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.
- Parameters
tli (
float
) – transmitted light intensity. Defaults to 240.alpha (
float
) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.beta (
float
) – absorbance threshold for transparent pixels. Defaults to 0.15.target_he (
Union
[tuple
,ndarray
]) – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).max_cref (
Union
[tuple
,ndarray
]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
Note
For more information refer to: - the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf - the previous implementations:
A collection of dictionary-based wrappers around the pathology transforms defined in monai.apps.pathology.transforms.array.
Class names end with ‘d’ to denote dictionary-based transforms.
- class monai.apps.pathology.transforms.stain.dictionary.ExtractHEStainsd(keys, tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.ExtractHEStains. Class to extract a target stain from an image, using stain deconvolution.
- Parameters
keys (
Union
[Collection
[Hashable
],Hashable
]) – keys of the corresponding items to be transformed. See also:monai.transforms.compose.MapTransform
tli (
float
) – transmitted light intensity. Defaults to 240.alpha (
float
) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.beta (
float
) – absorbance threshold for transparent pixels. Defaults to 0.15max_cref (
Union
[tuple
,ndarray
]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).allow_missing_keys (
bool
) – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.stain.dictionary.NormalizeHEStainsd(keys, tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.NormalizeHEStains. Class to normalize patches/images to a reference or target image stain.
Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.
- Parameters
keys (
Union
[Collection
[Hashable
],Hashable
]) – keys of the corresponding items to be transformed. See also:monai.transforms.compose.MapTransform
tli (
float
) – transmitted light intensity. Defaults to 240.alpha (
float
) – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.beta (
float
) – absorbance threshold for transparent pixels. Defaults to 0.15.target_he (
Union
[tuple
,ndarray
]) – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).max_cref (
Union
[tuple
,ndarray
]) – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).allow_missing_keys (
bool
) – don’t raise exception if key is missing.
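The Beer-Lambert transform mentioned for the stain classes relates pixel intensity to absorbance through the transmitted light intensity tli, and beta separates transparent pixels from stained ones. The sketch below works on a single RGB pixel; the helper names and the guard against log(0) are assumptions, and it does not perform the stain deconvolution itself.

```python
import math
from typing import Sequence, Tuple


def to_absorbance(rgb: Sequence[float], tli: float = 240.0) -> Tuple[float, ...]:
    """Beer-Lambert: absorbance = -log(I / tli), per channel."""
    # Clamp to [1, tli] so the logarithm is defined and never negative.
    return tuple(-math.log(min(max(c, 1.0), tli) / tli) for c in rgb)


def from_absorbance(absorbance: Sequence[float], tli: float = 240.0) -> Tuple[float, ...]:
    """Inverse Beer-Lambert: I = tli * exp(-absorbance)."""
    return tuple(tli * math.exp(-a) for a in absorbance)


def is_transparent(absorbance: Sequence[float], beta: float = 0.15) -> bool:
    """Treat a pixel as transparent unless every channel exceeds beta."""
    return not all(a > beta for a in absorbance)


background = to_absorbance((240, 240, 240))
tissue = to_absorbance((60, 30, 120))
round_trip = from_absorbance(tissue)
```

Normalization then replaces the source stain matrix with the target one in absorbance space before applying the inverse transform.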
- class monai.apps.pathology.transforms.spatial.array.SplitOnGrid(grid_size=(2, 2), patch_size=None)[source]#
Split the image into patches based on the provided grid shape. This transform works only with torch.Tensor inputs.
- Parameters
grid_size (
Union
[int
,Tuple
[int
,int
]]) – a tuple or an integer that defines the shape of the grid upon which to extract patches. If it’s an integer, the value will be repeated for each dimension. Default is 2x2.patch_size (
Union
[int
,Tuple
[int
,int
],None
]) – a tuple or an integer that defines the output patch sizes. If it’s an integer, the value will be repeated for each dimension. The default is (0, 0), where the patch size will be inferred from the grid shape.
Note: the shape of the input image is inferred based on the first image used.
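Splitting on a grid can be sketched in plain Python for a 2D image without a channel dimension. `split_on_grid` is a hypothetical helper that infers the patch size from the grid shape; the transform itself operates on torch.Tensor inputs, which this sketch does not use.

```python
from typing import List, Sequence


def split_on_grid(
    image: List[List[int]], grid_size: Sequence[int] = (2, 2)
) -> List[List[List[int]]]:
    """Split a 2D list into grid_size[0] * grid_size[1] patches, row-major."""
    patch_h = len(image) // grid_size[0]
    patch_w = len(image[0]) // grid_size[1]
    return [
        [row[j * patch_w:(j + 1) * patch_w] for row in image[i * patch_h:(i + 1) * patch_h]]
        for i in range(grid_size[0])
        for j in range(grid_size[1])
    ]


image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_on_grid(image)
```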
- class monai.apps.pathology.transforms.spatial.array.TileOnGrid(tile_count=None, tile_size=256, step=None, random_offset=False, pad_full=False, background_val=255, filter_mode='min')[source]#
Tile the 2D image into patches on a grid and maintain a subset of it. This transform works only with np.ndarray inputs for 2D images.
- Parameters
tile_count (
Optional
[int
]) – number of tiles to extract; if None, extracts all non-background tiles. Defaults to None
.tile_size (
int
) – size of the square tile. Defaults to 256
.step (
Optional
[int
]) – step size. Defaults to None
(same as tile_size).random_offset (
bool
) – randomize the position of the grid, instead of starting from the top-left corner. Defaults to False
.pad_full (
bool
) – pad the image to a size evenly divisible by tile_size. Defaults to False
.background_val (
int
) – the background constant (e.g. 255 for white background). Defaults to 255
.filter_mode (
str
) – mode must be in [“min”, “max”, “random”]. If the total number of tiles is more than tile_count, then sort by intensity sum, and take the smallest (for min), largest (for max) or random (for random) subset. Defaults to min
(which assumes background is high value).
- randomize(img_size)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance to identify errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
None
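The effect of filter_mode on tile selection can be sketched as follows. `select_tiles` is a hypothetical helper that only models the sorting and subsetting; tiling, padding and the removal of pure-background tiles are not shown.

```python
import random
from typing import List, Optional

Tile = List[List[int]]


def select_tiles(
    tiles: List[Tile],
    tile_count: Optional[int],
    filter_mode: str = "min",
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return indices of the tiles kept for the given filter_mode."""
    if tile_count is None or len(tiles) <= tile_count:
        return list(range(len(tiles)))
    sums = [sum(sum(row) for row in tile) for tile in tiles]
    order = sorted(range(len(tiles)), key=lambda i: sums[i])  # ascending intensity
    if filter_mode == "min":
        return order[:tile_count]  # darkest tiles, i.e. least background
    if filter_mode == "max":
        return order[-tile_count:]
    if filter_mode == "random":
        rng = rng or random.Random()
        return sorted(rng.sample(range(len(tiles)), tile_count))
    raise ValueError(f"unsupported filter_mode: {filter_mode}")


white = [[255, 255], [255, 255]]   # pure background, sum 1020
tissue = [[10, 20], [30, 40]]      # dense tissue, sum 100
mixed = [[255, 100], [255, 100]]   # partly background, sum 710
tiles = [white, tissue, mixed]
kept_min = select_tiles(tiles, tile_count=2, filter_mode="min")
kept_max = select_tiles(tiles, tile_count=2, filter_mode="max")
```

With a white background of 255, "min" keeps the tiles with the most tissue, which is why it is the default.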
- class monai.apps.pathology.transforms.spatial.dictionary.SplitOnGridd(keys, grid_size=(2, 2), patch_size=None, allow_missing_keys=False)[source]#
Split the image into patches based on the provided grid shape. This transform works only with torch.Tensor inputs.
- Parameters
grid_size (
Union
[int
,Tuple
[int
,int
]]) – a tuple or an integer that defines the shape of the grid upon which to extract patches. If it’s an integer, the value will be repeated for each dimension. Default is 2x2.patch_size (
Union
[int
,Tuple
[int
,int
],None
]) – a tuple or an integer that defines the output patch sizes. If it’s an integer, the value will be repeated for each dimension. The default is (0, 0), where the patch size will be inferred from the grid shape.
Note: the shape of the input image is inferred based on the first image used.
- class monai.apps.pathology.transforms.spatial.dictionary.TileOnGridd(keys, tile_count=None, tile_size=256, step=None, random_offset=False, pad_full=False, background_val=255, filter_mode='min', allow_missing_keys=False, return_list_of_dicts=False)[source]#
Tile the 2D image into patches on a grid and maintain a subset of it. This transform works only with np.ndarray inputs for 2D images.
- Parameters
tile_count (
Optional
[int
]) – number of tiles to extract; if None, extracts all non-background tiles. Defaults to None
.tile_size (
int
) – size of the square tile. Defaults to 256
.step (
Optional
[int
]) – step size. Defaults to None
(same as tile_size).random_offset (
bool
) – randomize the position of the grid, instead of starting from the top-left corner. Defaults to False
.pad_full (
bool
) – pad the image to a size evenly divisible by tile_size. Defaults to False
.background_val (
int
) – the background constant (e.g. 255 for white background). Defaults to 255
.filter_mode (
str
) – mode must be in [“min”, “max”, “random”]. If the total number of tiles is more than tile_count, then sort by intensity sum, and take the smallest (for min), largest (for max) or random (for random) subset. Defaults to min
(which assumes background is high value).
- randomize(data=None)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance to identify errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
None
Detection#
Hard Negative Sampler#
The functions in this script are adapted from nnDetection, https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/core/boxes/sampler.py
- class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#
HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.
The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all the samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute the classification loss for the selected samples; 4) do back propagation.
- Parameters
batch_size_per_image (
int
) – number of training samples to be randomly selected per imagepositive_fraction (
float
) – percentage of positive elements in the selected samplesmin_neg (
int
) – minimum number of negative samples to select if possible.pool_size (
float
) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
- get_num_neg(negative, num_pos)[source]#
Sample enough negatives to fill up
self.batch_size_per_image
- Parameters
negative (
Tensor
) – indices of negative samples.num_pos (
int
) – number of positive samples to draw
- Return type
int
- Returns
number of negative samples
- get_num_pos(positive)[source]#
Number of positive samples to draw
- Parameters
positive (
Tensor
) – indices of positive samples
- Return type
int
- Returns
number of positive samples
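A sketch of how the two counts could be derived from the constructor arguments, following the parameter descriptions literally (fill the batch with negatives, but take at least min_neg if available). The function names are hypothetical and the library may use a different formula.

```python
def num_pos_to_draw(
    num_available_pos: int, batch_size_per_image: int, positive_fraction: float
) -> int:
    """Positives are capped by the requested fraction of the batch."""
    return min(num_available_pos, int(batch_size_per_image * positive_fraction))


def num_neg_to_draw(
    num_available_neg: int, num_pos: int, batch_size_per_image: int, min_neg: int = 1
) -> int:
    """Negatives fill the rest of the batch, with at least min_neg if possible."""
    wanted = max(batch_size_per_image - num_pos, min_neg)
    return min(num_available_neg, wanted)


# Labels [1, 0, 2, 1]: three positives and one negative.
pos_a = num_pos_to_draw(3, batch_size_per_image=6, positive_fraction=0.5)
neg_a = num_neg_to_draw(1, pos_a, batch_size_per_image=6, min_neg=1)
# Ten positives and twenty negatives available.
pos_b = num_pos_to_draw(10, batch_size_per_image=6, positive_fraction=0.5)
neg_b = num_neg_to_draw(20, pos_b, batch_size_per_image=6, min_neg=1)
```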
- select_positives(positive, num_pos, labels)[source]#
Select positive samples
- Parameters
positive (
Tensor
) – indices of positive samples, sized (P,), where P is the number of positive samplesnum_pos (
int
) – number of positive samples to samplelabels (
Tensor
) – labels for all samples, sized (A,), where A is the number of samples.
- Return type
Tensor
- Returns
- binary mask of positive samples to choose, sized (A,),
where A is the number of samples in one image
- select_samples_img_list(target_labels, fg_probs)[source]#
Select positives and hard negatives from a list of samples per image. The hard negative sampler is applied to each image independently.
- Parameters
target_labels (
List
[Tensor
]) – list of labels per image. For image i in the batch, target_labels[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i. Positive samples have positive labels, negative samples have label 0.fg_probs (
List
[Tensor
]) – list of maximum foreground probability per image. For image i in the batch, fg_probs[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i.
- Return type
Tuple
[List
[Tensor
],List
[Tensor
]]
- Returns
list of binary masks for positive samples
list of binary masks for negative samples
Example
    import torch
    from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

    sampler = HardNegativeSampler(
        batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
    )
    # two images with different number of samples
    target_labels = [torch.tensor([0, 1]), torch.tensor([1, 0, 2, 1])]
    fg_probs = [torch.rand(2), torch.rand(4)]
    pos_idx_list, neg_idx_list = sampler.select_samples_img_list(target_labels, fg_probs)
- select_samples_per_img(labels_per_img, fg_probs_per_img)[source]#
Select positives and hard negatives from samples.
- Parameters
labels_per_img (
Tensor
) – labels, sized (A,). Positive samples have positive labels, negative samples have label 0.fg_probs_per_img (
Tensor
) – maximum foreground probability, sized (A,)
- Return type
Tuple
[Tensor
,Tensor
]
- Returns
binary mask for positive samples, sized (A,)
binary mask for negative samples, sized (A,)
Example
    import torch
    from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

    sampler = HardNegativeSampler(
        batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
    )
    # one image with four samples
    target_labels = torch.tensor([1, 0, 2, 1])
    fg_probs = torch.rand(4)
    pos_idx, neg_idx = sampler.select_samples_per_img(target_labels, fg_probs)
- class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSamplerBase(pool_size=10)[source]#
Base class of hard negative sampler.
Hard negative sampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.
The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all the samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute the classification loss for the selected samples; 4) do back propagation.
- Parameters
pool_size (
float
) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
- select_negatives(negative, num_neg, fg_probs)[source]#
Select hard negative samples.
- Parameters
negative (
Tensor
) – indices of all the negative samples, sized (P,), where P is the number of negative samplesnum_neg (
int
) – number of negative samples to samplefg_probs (
Tensor
) – maximum foreground prediction scores (probability) across all the classes for each sample, sized (A,), where A is the number of samples.
- Return type
Tensor
- Returns
binary mask of negative samples to choose, sized (A,), where A is the number of samples in one image
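The pool-based selection described above can be illustrated with a small pure-Python sketch. It is not MONAI's implementation: the function name `select_hard_negatives`, the use of plain lists instead of Tensors, and the `rng` argument are assumptions made for illustration only.

```python
import random


def select_hard_negatives(negative, num_neg, fg_probs, pool_size=10, rng=None):
    """Return a 0/1 mask over all samples marking the chosen hard negatives."""
    rng = rng or random.Random()
    # Rank the negative indices by foreground score, highest first.
    ranked = sorted(negative, key=lambda i: fg_probs[i], reverse=True)
    # The candidate pool holds the num_neg * pool_size highest-scoring negatives.
    pool = ranked[: min(len(ranked), int(num_neg * pool_size))]
    # Draw num_neg negatives at random from the pool.
    chosen = rng.sample(pool, min(num_neg, len(pool)))
    mask = [0] * len(fg_probs)
    for i in chosen:
        mask[i] = 1
    return mask


fg_probs = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
negative = [1, 2, 3, 4, 5]  # sample 0 is a positive
mask = select_hard_negatives(
    negative, num_neg=2, fg_probs=fg_probs, pool_size=1.5, rng=random.Random(0)
)
```

With pool_size=1.5 the pool holds the three highest-scoring negatives (indices 2, 4, 3), so the two lowest-scoring negatives can never be selected. A larger pool_size admits them, which is the trade-off between randomness and hardness noted in the parameter description.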
RetinaNet Network#
Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py
- class monai.apps.detection.networks.retinanet_network.RetinaNet(spatial_dims, num_classes, num_anchors, feature_extractor, size_divisible=1)[source]#
The network used in RetinaNet.
It takes an image tensor as input and outputs a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.
- Parameters
spatial_dims (
int
) – number of spatial dimensions of the images. We support both 2D and 3D images.num_classes (
int
) – number of output classes of the model (excluding the background).num_anchors (
int
) – number of anchors at each location.feature_extractor – a network that outputs feature maps from the input images, each feature map corresponds to a different resolution. Its output can have a format of Tensor, Dict[Any, Tensor], or Sequence[Tensor]. It can be the output of
resnet_fpn_feature_extractor(*args, **kwargs)
.size_divisible (
Union
[Sequence
[int
],int
]) – the spatial size of the network input should be divisible by size_divisible, decided by the feature_extractor.
Example
    import torch
    from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor
    from monai.networks.nets import resnet

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    spatial_dims = 3  # 3D network
    conv1_t_stride = (2, 2, 1)  # stride of first convolutional layer in backbone
    backbone = resnet.ResNet(
        spatial_dims=spatial_dims,
        block=resnet.ResNetBottleneck,
        layers=[3, 4, 6, 3],
        block_inplanes=resnet.get_inplanes(),
        n_input_channels=1,
        conv1_t_stride=conv1_t_stride,
        conv1_t_size=(7, 7, 7),
    )
    # This feature_extractor outputs 4-level feature maps.
    # number of output feature maps is len(returned_layers)+1
    returned_layers = [1, 2, 3]  # returned layer from feature pyramid network
    feature_extractor = resnet_fpn_feature_extractor(
        backbone=backbone,
        spatial_dims=spatial_dims,
        pretrained_backbone=False,
        trainable_backbone_layers=None,
        returned_layers=returned_layers,
    )
    # This feature_extractor requires input image spatial size
    # to be divisible by (32, 32, 16).
    size_divisible = tuple(2 * s * 2 ** max(returned_layers) for s in conv1_t_stride)
    model = RetinaNet(
        spatial_dims=spatial_dims,
        num_classes=5,
        num_anchors=6,
        feature_extractor=feature_extractor,
        size_divisible=size_divisible,
    ).to(device)
    # the input must live on the same device as the model
    result = model(torch.rand(2, 1, 128, 128, 128, device=device))
    cls_logits_maps = result["cls_logits"]  # a list of len(returned_layers)+1 Tensor
    box_regression_maps = result["box_regression"]  # a list of len(returned_layers)+1 Tensor
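The size_divisible expression in the example above can be checked in isolation. The sketch below reuses that expression and adds a hypothetical helper, `pad_to_divisible`, that rounds a spatial size up to the next valid multiple. The helper is an illustration only and is not part of MONAI.

```python
def compute_size_divisible(conv1_t_stride, returned_layers):
    # Same expression as in the example above.
    return tuple(2 * s * 2 ** max(returned_layers) for s in conv1_t_stride)


def pad_to_divisible(spatial_size, size_divisible):
    # Round each spatial dimension up to the next multiple of its divisor.
    return tuple(-(-s // d) * d for s, d in zip(spatial_size, size_divisible))


size_divisible = compute_size_divisible((2, 2, 1), [1, 2, 3])
padded = pad_to_divisible((130, 128, 50), size_divisible)
```

Here size_divisible evaluates to (32, 32, 16), and an image of spatial size (130, 128, 50) would need to be padded to (160, 128, 64) before being passed to the network.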
- forward(images)[source]#
It takes an image tensor as input and outputs a dictionary head_outputs. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.
- Parameters
images (
Tensor
) – input images, sized (B, img_channels, H, W) or (B, img_channels, H, W, D).- Return type
Dict
[str
,List
[Tensor
]]- Returns
a dictionary head_outputs with keys including self.cls_key and self.box_reg_key. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor.
- class monai.apps.detection.networks.retinanet_network.RetinaNetClassificationHead(in_channels, num_anchors, num_classes, spatial_dims, prior_probability=0.01)[source]#
A classification head for use in RetinaNet.
This head takes a list of feature maps as input and outputs a list of classification maps. Each output map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.
- Parameters
in_channels (
int
) – number of channels of the input featurenum_anchors (
int
) – number of anchors to be predictednum_classes (
int
) – number of classes to be predictedspatial_dims (
int
) – spatial dimension of the network, should be 2 or 3.prior_probability (
float
) – prior probability to initialize classification convolutional layers.
- forward(x)[source]#
It takes a list of feature maps as input and outputs a list of classification maps. Each output classification map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.
- Parameters
x (
List
[Tensor
]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.
- Return type
List
[Tensor
]- Returns
cls_logits_maps, list of classification maps. cls_logits_maps[i] is a (B, num_anchors * num_classes, H_i, W_i) or (B, num_anchors * num_classes, H_i, W_i, D_i) Tensor.
- class monai.apps.detection.networks.retinanet_network.RetinaNetRegressionHead(in_channels, num_anchors, spatial_dims)[source]#
A regression head for use in RetinaNet.
This head takes a list of feature maps as input and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.
- Parameters
in_channels (
int
) – number of channels of the input featurenum_anchors (
int
) – number of anchors to be predictedspatial_dims (
int
) – spatial dimension of the network, should be 2 or 3.
- forward(x)[source]#
It takes a list of feature maps as input and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.
- Parameters
x (
List
[Tensor
]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.
- Return type
List
[Tensor
]- Returns
box_regression_maps, list of box regression maps. box_regression_maps[i] is a (B, num_anchors * 2 * spatial_dims, H_i, W_i) or (B, num_anchors * 2 * spatial_dims, H_i, W_i, D_i) Tensor.
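The channel counts stated for the two heads can be made concrete with a shape-only sketch. The helper `head_output_shapes` is hypothetical and works on plain tuples rather than Tensors. It only restates the shape rules given above and does not run any network.

```python
def head_output_shapes(feature_shapes, num_anchors, num_classes, spatial_dims):
    """Expected output shapes of the classification and regression heads."""
    cls_shapes, box_shapes = [], []
    for shape in feature_shapes:
        batch, spatial = shape[0], tuple(shape[2:])
        if len(spatial) != spatial_dims:
            raise ValueError("feature map does not match spatial_dims")
        # Classification: one logit per anchor and class at each location.
        cls_shapes.append((batch, num_anchors * num_classes) + spatial)
        # Regression: 2 * spatial_dims box parameters per anchor at each location.
        box_shapes.append((batch, num_anchors * 2 * spatial_dims) + spatial)
    return cls_shapes, box_shapes


feature_shapes = [(2, 256, 32, 32, 16), (2, 256, 16, 16, 8)]
cls_shapes, box_shapes = head_output_shapes(
    feature_shapes, num_anchors=6, num_classes=5, spatial_dims=3
)
```

For 6 anchors, 5 classes, and 3 spatial dimensions, each classification map has 30 channels and each box regression map has 36 channels, while the spatial sizes follow the input feature maps.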
- monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(backbone, spatial_dims, pretrained_backbone=False, returned_layers=(1, 2, 3), trainable_backbone_layers=None)[source]#
Constructs a feature extractor network with a ResNet-FPN backbone, used as feature_extractor in RetinaNet.
Reference: “Focal Loss for Dense Object Detection”.
The returned feature_extractor network takes an image tensor as input and outputs a dictionary that maps strings to the extracted feature maps (Tensor).
The input to the returned feature_extractor is expected to be a list of tensors, each of shape [C, H, W] or [C, H, W, D], one for each image. Different images can have different sizes.
- Parameters
backbone (
ResNet
) – a ResNet model, used as backbone.spatial_dims (
int
) – number of spatial dimensions of the images. We support both 2D and 3D images.pretrained_backbone (
bool
) – whether the backbone has been pre-trained.returned_layers (
Sequence
[int
]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.trainable_backbone_layers (
Optional
[int
]) – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. When pretrained_backbone is False, this value is set to be 5. When pretrained_backbone is True, if None is passed (the default), this value is set to 3.
Example
    import torch
    from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor
    from monai.networks.nets import resnet

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    spatial_dims = 3  # 3D network
    backbone = resnet.ResNet(
        spatial_dims=spatial_dims,
        block=resnet.ResNetBottleneck,
        layers=[3, 4, 6, 3],
        block_inplanes=resnet.get_inplanes(),
        n_input_channels=1,
        conv1_t_stride=(2, 2, 1),
        conv1_t_size=(7, 7, 7),
    )
    # This feature_extractor outputs 4-level feature maps.
    # number of output feature maps is len(returned_layers)+1
    feature_extractor = resnet_fpn_feature_extractor(
        backbone=backbone,
        spatial_dims=spatial_dims,
        pretrained_backbone=False,
        trainable_backbone_layers=None,
        returned_layers=[1, 2, 3],
    )
    model = RetinaNet(
        spatial_dims=spatial_dims,
        num_classes=5,
        num_anchors=6,
        feature_extractor=feature_extractor,
        size_divisible=32,
    ).to(device)
RetinaNet Detector#
Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py
- class monai.apps.detection.networks.retinanet_detector.RetinaNetDetector(network, anchor_generator, box_overlap_metric=<function box_iou>, debug=False)[source]#
RetinaNet detector, expandable to other one-stage anchor-based box detectors in the future. An example of construction can be found in the source code of retinanet_resnet50_fpn_detector().
The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. It can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D). In this case, all images have the same size.
The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors and targets (a list of dictionaries), each containing:
boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the ground-truth boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.
labels: the class label for each ground-truth box
The model returns a Dict[str, Tensor] during training, containing the classification and regression losses. When saving the model, only self.network contains trainable parameters and needs to be saved.
During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[str, Tensor]], one for each input image. The fields of the Dict are as follows:
boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the predicted boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.
labels (Int64Tensor[N]): the predicted labels for each image
labels_scores (Tensor[N]): the scores for each prediction
- Parameters
network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].
anchor_generator (
AnchorGenerator
) – anchor generator.box_overlap_metric (
Callable
) – function that computes the overlap between two sets of boxes, default is Intersection over Union (IoU).
debug (
bool
) – whether to print out internal parameters, used for debugging and parameter tuning.
Notes
Input argument network can be a monai.apps.detection.networks.retinanet_network.RetinaNet(*) object, but any network that meets the following rules is a valid input network.
It should have attributes including spatial_dims, num_classes, cls_key, box_reg_key, num_anchors, size_divisible.
spatial_dims (int) is the spatial dimension of the network, we support both 2D and 3D.
num_classes (int) is the number of classes, excluding the background.
size_divisible (int or Sequence[int]) is the expectation on the input image shape. The network needs the input spatial_size to be divisible by size_divisible. If it is a sequence, its length should be 2 or 3.
cls_key (str) is the key to represent classification in the output dict.
box_reg_key (str) is the key to represent box regression in the output dict.
num_anchors (int) is the number of anchor shapes at each location. It should equal self.anchor_generator.num_anchors_per_location()[0].
Its input should be an image Tensor sized (B, C, H, W) or (B, C, H, W, D).
About its output head_outputs:
It should be a dictionary with at least two keys: network.cls_key and network.box_reg_key.
head_outputs[network.cls_key] should be List[Tensor] or Tensor. Each Tensor represents the classification logits map at one resolution level, sized (B, num_classes*num_anchors, H_i, W_i) or (B, num_classes*num_anchors, H_i, W_i, D_i).
head_outputs[network.box_reg_key] should be List[Tensor] or Tensor. Each Tensor represents the box regression map at one resolution level, sized (B, 2*spatial_dims*num_anchors, H_i, W_i) or (B, 2*spatial_dims*num_anchors, H_i, W_i, D_i).
len(head_outputs[network.cls_key]) == len(head_outputs[network.box_reg_key]).
Example
    import torch
    from monai.apps.detection.networks.retinanet_detector import RetinaNetDetector
    from monai.apps.detection.utils.anchor_utils import AnchorGeneratorWithAnchorShape

    # define a naive network
    class NaiveNet(torch.nn.Module):
        def __init__(self, spatial_dims: int, num_classes: int):
            super().__init__()
            self.spatial_dims = spatial_dims
            self.num_classes = num_classes
            self.size_divisible = 2
            self.cls_key = "cls"
            self.box_reg_key = "box_reg"
            self.num_anchors = 1

        def forward(self, images: torch.Tensor):
            spatial_size = images.shape[-self.spatial_dims:]
            out_spatial_size = tuple(s // self.size_divisible for s in spatial_size)  # half size of input
            out_cls_shape = (images.shape[0], self.num_classes * self.num_anchors) + out_spatial_size
            out_box_reg_shape = (images.shape[0], 2 * self.spatial_dims * self.num_anchors) + out_spatial_size
            return {self.cls_key: [torch.randn(out_cls_shape)], self.box_reg_key: [torch.randn(out_box_reg_shape)]}

    # create a RetinaNetDetector detector
    spatial_dims = 3
    num_classes = 5
    anchor_generator = AnchorGeneratorWithAnchorShape(
        feature_map_scales=(1,),
        base_anchor_shapes=((8,) * spatial_dims,),  # a sequence holding one anchor shape
    )
    net = NaiveNet(spatial_dims, num_classes)
    detector = RetinaNetDetector(net, anchor_generator)

    # only detector.network may contain trainable parameters.
    # NaiveNet defines no parameters, so use a real network before building an optimizer like this.
    optimizer = torch.optim.SGD(
        detector.network.parameters(),
        1e-3,
        momentum=0.9,
        weight_decay=3e-5,
        nesterov=True,
    )
    torch.save(detector.network.state_dict(), 'model.pt')  # save model
    detector.network.load_state_dict(torch.load('model.pt'))  # load model
- compute_anchor_matched_idxs(anchors, targets, num_anchor_locs_per_level)[source]#
Compute the matched indices between anchors and ground truth (gt) boxes in targets. output[k][i] represents the matched gt index for anchor[i] in image k. Suppose there are M gt boxes for image k. output[k][i] takes values in [-2, -1, 0, …, M-1]. A value in [0, M - 1] indicates that the anchor is matched with a gt box, while a negative value indicates that it is not matched.
- Parameters
anchors (
List
[Tensor
]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.targets (
List
[Dict
[str
,Tensor
]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.num_anchor_locs_per_level (
Sequence
[int
]) – each element represents HW or HWD at this level.
- Return type
List
[Tensor
]- Returns
a list of matched indices. Each element matched_idxs_per_image is a Tensor[int64] sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched: BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2.
- compute_box_loss(box_regression, targets, anchors, matched_idxs)[source]#
Compute box regression losses.
- Parameters
box_regression (
Tensor
) – box regression results, sized (B, sum(HWA), 2*self.spatial_dims)targets (
List
[Dict
[str
,Tensor
]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.anchors (
List
[Tensor
]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.matched_idxs (
List
[Tensor
]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),)
- Return type
Tensor
- Returns
box regression losses.
- compute_cls_loss(cls_logits, targets, matched_idxs)[source]#
Compute classification losses.
- Parameters
cls_logits (
Tensor
) – classification logits, sized (B, sum(HW(D)A), self.num_classes)targets (
List
[Dict
[str
,Tensor
]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.matched_idxs (
List
[Tensor
]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),)
- Return type
Tensor
- Returns
classification losses.
- compute_loss(head_outputs_reshape, targets, anchors, num_anchor_locs_per_level)[source]#
Compute losses.
- Parameters
head_outputs_reshape (
Dict
[str
,Tensor
]) – reshaped head_outputs. head_outputs_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_outputs_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims).
targets (
List
[Dict
[str
,Tensor
]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.anchors (
List
[Tensor
]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
- Return type
Dict
[str
,Tensor
]- Returns
a dict of several kinds of losses.
- forward(input_images, targets=None, use_inferer=False)[source]#
Returns a dict of losses during training, or a list of dicts of predicted boxes and labels during inference.
- Parameters
input_images (
Union
[List
[Tensor
],Tensor
]) – The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D). In this case, all images have same size.targets (
Optional
[List
[Dict
[str
,Tensor
]]]) – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image (optional).use_inferer (
bool
) – whether to use self.inferer, a sliding window inferer, to do the inference. If False, will simply forward the network. If True, will use self.inferer, which requires self.set_sliding_window_inferer(*args) to have been called beforehand.
- Return type
Union
[Dict
[str
,Tensor
],List
[Dict
[str
,Tensor
]]]- Returns
In training mode, returns a dict with at least two keys, including self.cls_key and self.box_reg_key, representing the classification loss and the box regression loss.
In evaluation mode, returns a list of detection results. Each element corresponds to an image in input_images and is a dict with at least three keys, including self.target_box_key, self.target_label_key, and self.pred_score_key, representing predicted boxes, classification labels, and classification scores.
- generate_anchors(images, head_outputs)[source]#
Generate anchors and store them in self.anchors: List[Tensor]. Anchors are generated only when no anchors are stored, or when the incoming images have a different shape from self.previous_image_shape.
- Parameters
images (
Tensor
) – input images, a (B, C, H, W) or (B, C, H, W, D) Tensor.head_outputs (
Dict
[str
,List
[Tensor
]) – head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims).
- get_box_train_sample_per_image(box_regression_per_image, targets_per_image, anchors_per_image, matched_idxs_per_image)[source]#
Get samples from one image for box regression losses computation.
- Parameters
box_regression_per_image (
Tensor
) – box regression result for one image, (sum(HWA), 2*self.spatial_dims)targets_per_image (
Dict
[str
,Tensor
]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.anchors_per_image (
Tensor
) – anchors of one image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.matched_idxs_per_image (
Tensor
) – matched index, sized (sum(HWA),) or (sum(HWDA),)
- Return type
Tuple
[Tensor
,Tensor
]- Returns
paired predicted and GT samples from one image for box regression losses computation
- get_cls_train_sample_per_image(cls_logits_per_image, targets_per_image, matched_idxs_per_image)[source]#
Get samples from one image for classification losses computation.
- Parameters
cls_logits_per_image (
Tensor
) – classification logits for one image, (sum(HWA), self.num_classes)targets_per_image (
Dict
[str
,Tensor
]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.matched_idxs_per_image (
Tensor
) – matched index, Tensor sized (sum(HWA),) or (sum(HWDA),) Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2
- Return type
Tuple
[Tensor
,Tensor
]- Returns
paired predicted and GT samples from one image for classification losses computation
- postprocess_detections(head_outputs_reshape, anchors, image_sizes, num_anchor_locs_per_level, need_sigmoid=True)[source]#
Postprocessing to generate detection results from classification logits and box regression. Uses self.box_selector to select the final output boxes for each image.
- Parameters
head_outputs_reshape (
Dict
[str
,Tensor
]) – reshaped head_outputs. head_outputs_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_outputs_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims).
targets – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (
List
[Tensor
]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
- Return type
List
[Dict
[str
,Tensor
]]- Returns
a list of dict, each dict corresponds to the detection result on one image.
- set_atss_matcher(num_candidates=4, center_in_gt=False)[source]#
Used for training. Set the ATSS matcher that matches anchors with ground truth boxes.
- Parameters
num_candidates (
int
) – number of positions to select candidates from. A smaller value results in a higher matcher threshold and fewer matched candidates.
center_in_gt (
bool
) – If False (default), matched anchor center points do not need to lie within the ground truth box. False is recommended for small objects. If True, the matcher is stricter and yields fewer matched candidates.
- Return type
None
- set_balanced_sampler(batch_size_per_image, positive_fraction)[source]#
Used for training. Set the torchvision balanced sampler that samples part of the anchors for training.
- Parameters
batch_size_per_image (
int
) – number of elements to be selected per imagepositive_fraction (
float
) – percentage of positive elements per batch
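How batch_size_per_image and positive_fraction interact can be sketched in pure Python. This is an illustration modeled on the usual balanced positive/negative sampling scheme, not the code the detector runs. The function name `balanced_sample`, the list-based labels, and the `rng` argument are assumptions.

```python
import random


def balanced_sample(labels, batch_size_per_image, positive_fraction, rng=None):
    """Pick positive and negative sample indices for one image."""
    rng = rng or random.Random()
    positive = [i for i, label in enumerate(labels) if label >= 1]
    negative = [i for i, label in enumerate(labels) if label == 0]
    # Cap the positives at the requested fraction of the batch.
    num_pos = min(len(positive), int(batch_size_per_image * positive_fraction))
    # Fill the remainder of the batch with negatives.
    num_neg = min(len(negative), batch_size_per_image - num_pos)
    pos_idx = sorted(rng.sample(positive, num_pos))
    neg_idx = sorted(rng.sample(negative, num_neg))
    return pos_idx, neg_idx


labels = [1, 0, 0, 2, 0, 0, 0, 1, 0, 0]
pos_idx, neg_idx = balanced_sample(
    labels, batch_size_per_image=4, positive_fraction=0.5, rng=random.Random(0)
)
```

With three positives available and a batch of four at positive_fraction=0.5, two positives and two negatives are drawn.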
- set_box_coder_weights(weights)[source]#
Set the weights for box coder.
- Parameters
weights (
Tuple
[float
]) – a list/tuple with length of 2*self.spatial_dims
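To show why the weights have length 2*self.spatial_dims, the sketch below encodes a box relative to an anchor and decodes it again, using the common R-CNN-style parameterization (center offsets and log size ratios). Whether MONAI's box coder matches this exactly, and the assumption that the first spatial_dims weights scale the center offsets while the rest scale the sizes, are not confirmed by this page. `encode_box` and `decode_box` are hypothetical names.

```python
import math


def encode_box(gt, anchor, weights):
    """Encode a ground-truth box relative to an anchor; boxes are [mins..., maxs...]."""
    dims = len(gt) // 2
    center_deltas, size_deltas = [], []
    for d in range(dims):
        a_size = anchor[d + dims] - anchor[d]
        g_size = gt[d + dims] - gt[d]
        a_ctr = anchor[d] + 0.5 * a_size
        g_ctr = gt[d] + 0.5 * g_size
        center_deltas.append(weights[d] * (g_ctr - a_ctr) / a_size)
        size_deltas.append(weights[d + dims] * math.log(g_size / a_size))
    return center_deltas + size_deltas


def decode_box(deltas, anchor, weights):
    """Invert encode_box: recover the box from regression deltas and the anchor."""
    dims = len(anchor) // 2
    mins, maxs = [], []
    for d in range(dims):
        a_size = anchor[d + dims] - anchor[d]
        a_ctr = anchor[d] + 0.5 * a_size
        ctr = deltas[d] / weights[d] * a_size + a_ctr
        size = math.exp(deltas[d + dims] / weights[d + dims]) * a_size
        mins.append(ctr - 0.5 * size)
        maxs.append(ctr + 0.5 * size)
    return mins + maxs


anchor = [0.0, 0.0, 10.0, 10.0]
gt = [2.0, 1.0, 8.0, 11.0]
weights = (10.0, 10.0, 5.0, 5.0)  # 2 * spatial_dims entries for a 2D box
deltas = encode_box(gt, anchor, weights)
decoded = decode_box(deltas, anchor, weights)
```

Encoding followed by decoding returns the original box, which is the property that lets a loss be computed either on encoded targets or on decoded predictions.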
- set_box_regression_loss(box_loss, encode_gt, decode_pred)[source]#
Used for training. Set the loss for box regression.
- Parameters
box_loss (
Module
) – loss module for box regressionencode_gt (
bool
) – if True, will encode ground truth boxes to target box regression before computing the losses. Should be True for L1 loss and False for GIoU loss.decode_pred (
bool
) – if True, will decode predicted box regression into predicted boxes before computing losses. Should be False for L1 loss and True for GIoU loss.
Example
    # option 1: L1 loss on encoded ground truth
    detector.set_box_regression_loss(
        torch.nn.SmoothL1Loss(beta=1.0 / 9, reduction="mean"), encode_gt=True, decode_pred=False
    )
    # option 2: GIoU loss on decoded predictions
    detector.set_box_regression_loss(
        monai.losses.giou_loss.BoxGIoULoss(reduction="mean"), encode_gt=False, decode_pred=True
    )
- Return type
None
- set_box_selector_parameters(score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300, apply_sigmoid=True)[source]#
Used for inference. Set the parameters that are used for box selection during inference. The box selection is performed with the following steps:
1) For each level, discard boxes with scores less than self.score_thresh.
2) For each level, keep boxes with top self.topk_candidates_per_level scores.
3) For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
4) For the whole image, keep boxes with top self.detections_per_img scores.
- Parameters
score_thresh (
float
) – no box with scores less than score_thresh will be kepttopk_candidates_per_level (
int
) – max number of boxes to keep for each levelnms_thresh (
float
) – box overlapping threshold for NMSdetections_per_img (
int
) – max number of boxes to keep for each image
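The four selection steps listed above can be sketched for 2D boxes in pure Python. This is a simplified illustration, not the detector's box selector: it ignores class labels during NMS, uses plain lists, and the helper names `iou_2d` and `select_boxes` are assumptions.

```python
def iou_2d(a, b):
    """IoU of two boxes in [xmin, ymin, xmax, ymax] format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def select_boxes(levels, score_thresh=0.05, topk_candidates_per_level=1000,
                 nms_thresh=0.5, detections_per_img=300):
    """levels holds one list of (box, score) pairs per resolution level."""
    candidates = []
    for level in levels:
        # Step 1: discard low-scoring boxes on this level.
        kept = [(box, score) for box, score in level if score >= score_thresh]
        # Step 2: keep the top-scoring candidates on this level.
        kept.sort(key=lambda pair: pair[1], reverse=True)
        candidates.extend(kept[:topk_candidates_per_level])
    # Step 3: greedy NMS over the whole image, highest score first.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    selected = []
    for box, score in candidates:
        if all(iou_2d(box, other) <= nms_thresh for other, _ in selected):
            selected.append((box, score))
    # Step 4: keep the top detections for the image.
    return selected[:detections_per_img]


levels = [
    [([0, 0, 10, 10], 0.9), ([1, 1, 11, 11], 0.8), ([50, 50, 60, 60], 0.03)],
    [([20, 20, 30, 30], 0.6), ([0, 0, 10, 10], 0.4)],
]
detections = select_boxes(levels)
scores = [score for _, score in detections]
```

In this toy input the 0.03 box falls below score_thresh, and the 0.8 and 0.4 boxes are suppressed because they overlap the 0.9 box with IoU above nms_thresh, leaving two detections.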
- set_cls_loss(cls_loss)[source]#
Used for training. Set the loss for classification. The loss takes logits as inputs, so make sure sigmoid/softmax is built in.
- Parameters
cls_loss (
Module
) – loss module for classification
Example
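    from monai.losses import FocalLoss

    # option 1: binary cross-entropy on logits
    detector.set_cls_loss(torch.nn.BCEWithLogitsLoss(reduction="mean"))
    # option 2: focal loss
    detector.set_cls_loss(FocalLoss(reduction="mean", gamma=2.0))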
- Return type
None
- set_hard_negative_sampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#
Used for training. Set the hard negative sampler that samples part of the anchors for training.
HardNegativeSampler is used to reduce the false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.
- Parameters
batch_size_per_image (
int
) – number of elements to be selected per imagepositive_fraction (
float
) – percentage of positive elements in the selected samplesmin_neg (
int
) – minimum number of negative samples to select if possible.pool_size (
float
) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
- set_regular_matcher(fg_iou_thresh, bg_iou_thresh, allow_low_quality_matches=True)[source]#
Used for training. Set the torchvision matcher that matches anchors with ground truth boxes.
- Parameters
fg_iou_thresh (
float
) – foreground IoU threshold for Matcher, considered as matched if IoU > fg_iou_threshbg_iou_thresh (
float
) – background IoU threshold for Matcher, considered as not matched if IoU < bg_iou_thresh
- Return type
None
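The role of the two thresholds can be illustrated with a pure-Python sketch of IoU-threshold matching. It follows the general scheme of torchvision's Matcher but is not that implementation: allow_low_quality_matches is ignored, the handling of IoU values exactly at a threshold is an assumption, and `match_anchors` is a hypothetical name.

```python
BELOW_LOW_THRESHOLD = -1
BETWEEN_THRESHOLDS = -2


def match_anchors(iou_matrix, fg_iou_thresh, bg_iou_thresh):
    """iou_matrix[g][a] is the IoU between ground-truth box g and anchor a."""
    num_anchors = len(iou_matrix[0])
    matched = []
    for a in range(num_anchors):
        column = [row[a] for row in iou_matrix]
        best_iou = max(column)
        if best_iou < bg_iou_thresh:
            matched.append(BELOW_LOW_THRESHOLD)  # clearly not matched
        elif best_iou < fg_iou_thresh:
            matched.append(BETWEEN_THRESHOLDS)  # ambiguous overlap
        else:
            matched.append(column.index(best_iou))  # index of the matched gt box
    return matched


iou_matrix = [
    [0.80, 0.45, 0.10, 0.05],  # gt box 0 against four anchors
    [0.10, 0.30, 0.65, 0.20],  # gt box 1 against four anchors
]
matched = match_anchors(iou_matrix, fg_iou_thresh=0.5, bg_iou_thresh=0.4)
```

The result uses the same encoding as matched_idxs_per_image: non-negative entries are gt indices, and the two negative codes mark anchors that fall below bg_iou_thresh or between the two thresholds.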
- set_sliding_window_inferer(roi_size, sw_batch_size=1, overlap=0.5, mode=BlendMode.CONSTANT, sigma_scale=0.125, padding_mode=PytorchPadMode.CONSTANT, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False)[source]#
Define sliding window inferer and store it to self.inferer.
- set_target_keys(box_key, label_key)[source]#
Set keys for the training targets and inference outputs. During training, both box_key and label_key should be keys in the targets when performing self.forward(input_images, targets). During inference, they will be the keys in the output dict of self.forward(input_images).
- monai.apps.detection.networks.retinanet_detector.retinanet_resnet50_fpn_detector(num_classes, anchor_generator, returned_layers=(1, 2, 3), pretrained=False, progress=True, **kwargs)[source]#
Returns a RetinaNet detector using a ResNet-50 as backbone, which can be pretrained from Med3D: Transfer Learning for 3D Medical Image Analysis (https://arxiv.org/pdf/1904.00625.pdf).
- Parameters
num_classes (
int
) – number of output classes of the model (excluding the background).anchor_generator (
AnchorGenerator
) – the anchor generator.
returned_layers (
Sequence
[int
]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.pretrained (
bool
) – If True, returns a backbone pre-trained on 23 medical datasetsprogress (
bool
) – If True, displays a progress bar of the download to stderr
- Return type
- Returns
A RetinaNetDetector object with resnet50 as backbone
Example
    from monai.apps.detection.networks.retinanet_detector import retinanet_resnet50_fpn_detector
    from monai.apps.detection.utils.anchor_utils import AnchorGeneratorWithAnchorShape

    # parameters for the ResNet-50 backbone and the detector
    resnet_param = {
        "pretrained": False,
        "spatial_dims": 3,
        "n_input_channels": 2,
        "num_classes": 3,
        "conv1_t_size": 7,
        "conv1_t_stride": (2, 2, 2),
    }
    returned_layers = [1]
    anchor_generator = AnchorGeneratorWithAnchorShape(
        feature_map_scales=(1, 2),
        base_anchor_shapes=((8,) * resnet_param["spatial_dims"],),  # a sequence holding one anchor shape
    )
    detector = retinanet_resnet50_fpn_detector(
        **resnet_param, anchor_generator=anchor_generator, returned_layers=returned_layers
    )
Transforms#
- monai.apps.detection.transforms.box_ops.apply_affine_to_boxes(boxes, affine)[source]#
This function applies affine matrices to the boxes
- Parameters
boxes (
Union
[ndarray
,Tensor
]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
affine (
Union
[ndarray
,Tensor
]) – affine matrix to be applied to the box coordinates, sized (spatial_dims+1,spatial_dims+1)
- Return type
Union
[ndarray
,Tensor
]- Returns
affine-transformed boxes, with the same data type as boxes; does not share memory with boxes
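Applying a (spatial_dims+1, spatial_dims+1) affine matrix to a box can be sketched as follows: transform the box corners in homogeneous coordinates, then take the per-axis minimum and maximum. This is a pure-Python illustration with the hypothetical name `apply_affine_to_box`. The library function may differ, for example in which corners it transforms.

```python
from itertools import product


def apply_affine_to_box(box, affine):
    """box is [mins..., maxs...]; affine is a (dims+1) x (dims+1) nested list."""
    dims = len(box) // 2
    mins, maxs = box[:dims], box[dims:]
    moved = []
    # Visit every corner of the box.
    for corner in product(*[(mins[d], maxs[d]) for d in range(dims)]):
        homogeneous = list(corner) + [1.0]
        moved.append(
            [sum(affine[r][c] * homogeneous[c] for c in range(dims + 1)) for r in range(dims)]
        )
    # The transformed box is the axis-aligned hull of the moved corners.
    new_mins = [min(point[d] for point in moved) for d in range(dims)]
    new_maxs = [max(point[d] for point in moved) for d in range(dims)]
    return new_mins + new_maxs


# Scale x by 2 and shift by 1; mirror y and shift by 5.
affine = [[2.0, 0.0, 1.0], [0.0, -1.0, 5.0], [0.0, 0.0, 1.0]]
box = [1.0, 1.0, 3.0, 2.0]
transformed = apply_affine_to_box(box, affine)
```

Taking the minimum and maximum after the transform keeps the result in StandardMode even when the affine flips an axis, as the mirrored y axis does here.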
- monai.apps.detection.transforms.box_ops.convert_box_to_mask(boxes, labels, spatial_size, bg_label=-1, ellipse_mask=False)[source]#
Convert box to int16 mask image, which has the same size as the input image.
- Parameters
boxes (
Union
[ndarray
,Tensor
]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
labels (
Union
[ndarray
,Tensor
]) – classification foreground(fg) labels corresponding to boxes, dtype should be int, sized (N,).spatial_size (
Union
[Sequence
[int
],int
]) – image spatial size.bg_label (
int
) – background labels for the output mask image, make sure it is smaller than any fg labels.ellipse_mask (
bool
) – bool.
If True, it assumes the object shape is close to ellipse or ellipsoid.
If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.
If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
- Return type
Union
[ndarray
,Tensor
]- Returns
- int16 array, sized (num_box, H, W). Each channel represents a box.
The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.
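The mask layout described above, one channel per box filled with its label on a bg_label background, can be sketched for the rectangular 2D case. The function `box_to_mask_2d` is hypothetical, uses nested lists instead of arrays, and assumes integer box coordinates with exclusive maxima. It does not cover ellipse_mask=True.

```python
def box_to_mask_2d(boxes, labels, spatial_size, bg_label=-1):
    """Return one (H, W) mask per box, as nested lists."""
    height, width = spatial_size
    masks = []
    for box, label in zip(boxes, labels):
        xmin, ymin, xmax, ymax = box
        # Start from a background-only channel.
        mask = [[bg_label] * width for _ in range(height)]
        # Fill the box region with the foreground label.
        for i in range(xmin, xmax):
            for j in range(ymin, ymax):
                mask[i][j] = label
        masks.append(mask)
    return masks


masks = box_to_mask_2d(boxes=[[1, 0, 3, 2]], labels=[2], spatial_size=(4, 3))
```

Because bg_label is smaller than any foreground label, the background can always be told apart from the box region, which is why the parameter description asks for that ordering.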
- monai.apps.detection.transforms.box_ops.convert_mask_to_box(boxes_mask, bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#
Convert int16 mask image to box, which has the same size as the input image.
- Parameters
boxes_mask (
Union
[ndarray
,Tensor
]) – int16 array, sized (num_box, H, W). Each channel represents a box. The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.bg_label (
int
) – background labels for the boxes_maskbox_dtype – output dtype for boxes
label_dtype – output dtype for labels
- Return type
Tuple[Union[ndarray, Tensor], Union[ndarray, Tensor]]
- Returns
bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
classification foreground (fg) labels, dtype should be int, sized (N,).
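As a rough illustration of the reverse direction, the sketch below recovers one box and one label per mask channel in 2D. It is a simplified stand-in, not MONAI's code: the name `mask_to_boxes_2d`, the nested-list input, and the "max index + 1" upper bound are assumptions.

```python
def mask_to_boxes_2d(boxes_mask, bg_label=-1):
    """Hypothetical sketch: tight box around the non-background pixels of each channel."""
    boxes, labels = [], []
    for channel in boxes_mask:
        # collect coordinates of all foreground pixels in this channel
        fg = [(i, j) for i, row in enumerate(channel)
              for j, value in enumerate(row) if value != bg_label]
        if not fg:
            continue  # channel holds no box
        rows = [i for i, _ in fg]
        cols = [j for _, j in fg]
        # assumed half-open convention: upper bound is last index + 1
        boxes.append([float(min(rows)), float(min(cols)),
                      float(max(rows) + 1), float(max(cols) + 1)])
        labels.append(channel[fg[0][0]][fg[0][1]])
    return boxes, labels


channel = [[-1] * 5 for _ in range(4)]
for i in (1, 2):
    for j in (1, 2, 3):
        channel[i][j] = 2
empty_channel = [[-1] * 5 for _ in range(4)]
boxes, labels = mask_to_boxes_2d([channel, empty_channel])
```

Channels that contain only bg_label produce no box in this sketch.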
- monai.apps.detection.transforms.box_ops.flip_boxes(boxes, spatial_size, flip_axes=None)[source]#
Flip boxes when the corresponding image is flipped
- Parameters
boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
spatial_size (Union[Sequence[int], int]) – image spatial size.
flip_axes (Union[Sequence[int], int, None]) – spatial axes along which to flip over. Default is None. The default flip_axes=None will flip over all of the axes of the input array. If an axis is negative it counts from the last to the first axis. If flip_axes is a tuple of ints, flipping is performed on all of the axes specified in the tuple.
- Returns
flipped boxes, with same data type as boxes, does not share memory with boxes
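The flip can be sketched in pure Python as mirroring each box's bounds about the image extent. This is an illustration under assumptions, not the library's code: the name `flip_boxes_sketch` is hypothetical, boxes are plain lists in StandardMode, and spatial_size is assumed to be a sequence rather than a single int.

```python
def flip_boxes_sketch(boxes, spatial_size, flip_axes=None):
    """Hypothetical sketch: mirror box bounds along the chosen spatial axes."""
    spatial_dims = len(spatial_size)
    if flip_axes is None:
        flip_axes = range(spatial_dims)  # default: flip every spatial axis
    elif isinstance(flip_axes, int):
        flip_axes = [flip_axes]
    flipped = [list(box) for box in boxes]  # copy, so the input is not modified
    for axis in flip_axes:
        axis = axis % spatial_dims  # negative axes count from the last axis
        for src, dst in zip(boxes, flipped):
            # the old upper bound becomes the new lower bound, and vice versa
            dst[axis] = spatial_size[axis] - src[axis + spatial_dims]
            dst[axis + spatial_dims] = spatial_size[axis] - src[axis]
    return flipped


boxes = [[1, 2, 3, 6]]
flipped_axis0 = flip_boxes_sketch(boxes, (10, 20), flip_axes=0)
flipped_all = flip_boxes_sketch(boxes, (10, 20))
flipped_last = flip_boxes_sketch(boxes, (10, 20), flip_axes=-1)
```

Flipping twice along the same axes returns the original coordinates, which is a quick sanity check for any implementation.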
- monai.apps.detection.transforms.box_ops.resize_boxes(boxes, src_spatial_size, dst_spatial_size)[source]#
Resize boxes when the corresponding image is resized
- Parameters
boxes (Union[ndarray, Tensor]) – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
src_spatial_size (Union[Sequence[int], int]) – source image spatial size.
dst_spatial_size (Union[Sequence[int], int]) – target image spatial size.
- Returns
resized boxes, with same data type as boxes, does not share memory with boxes
Example
import torch
from monai.apps.detection.transforms.box_ops import resize_boxes
boxes = torch.ones(1,4)
src_spatial_size = [100, 100]
dst_spatial_size = [128, 256]
resize_boxes(boxes, src_spatial_size, dst_spatial_size) # will return tensor([[1.28, 2.56, 1.28, 2.56]])
- monai.apps.detection.transforms.box_ops.rot90_boxes(boxes, spatial_size, k=1, axes=(0, 1))[source]#
Rotate boxes by 90 degrees in the plane specified by axes. Rotation direction is from the first towards the second axis.
- Parameters
boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
spatial_size (Union[Sequence[int], int]) – image spatial size.
k (int) – number of times the array is rotated by 90 degrees.
axes (Tuple[int, int]) – (2,) array_like. The array is rotated in the plane defined by the axes. Axes must be different.
- Returns
A rotated view of boxes.
Notes
rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is the reverse of rot90_boxes(boxes, spatial_size, k=1, axes=(0,1))
rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is equivalent to rot90_boxes(boxes, spatial_size, k=-1, axes=(0,1))
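The two relations in the notes can be checked with a small pure-Python sketch that builds a quarter turn from a flip along axes[1] followed by an axis swap, mirroring how numpy.rot90 is defined. The function `rot90_boxes_sketch` is hypothetical and only illustrates the geometry; it assumes spatial_size is the image size before rotation and that boxes are plain lists in StandardMode.

```python
def rot90_boxes_sketch(boxes, spatial_size, k=1, axes=(0, 1)):
    """Hypothetical sketch: rotate boxes by k quarter turns in the plane `axes`."""
    dims = len(spatial_size)
    size = list(spatial_size)
    a0, a1 = axes
    out = [list(box) for box in boxes]
    for _ in range(k % 4):  # k=-1 is treated as three quarter turns
        for box in out:
            # step 1: flip along axes[1]
            lo, hi = box[a1], box[a1 + dims]
            box[a1], box[a1 + dims] = size[a1] - hi, size[a1] - lo
            # step 2: swap the two axes (both lower and upper bounds)
            box[a0], box[a1] = box[a1], box[a0]
            box[a0 + dims], box[a1 + dims] = box[a1 + dims], box[a0 + dims]
        # the image extent along the two axes is swapped as well
        size[a0], size[a1] = size[a1], size[a0]
    return out


boxes = [[1, 2, 3, 6]]
forward = rot90_boxes_sketch(boxes, (10, 20), k=1, axes=(0, 1))
restored = rot90_boxes_sketch(forward, (20, 10), k=1, axes=(1, 0))
reverse_axes = rot90_boxes_sketch(boxes, (10, 20), k=1, axes=(1, 0))
negative_k = rot90_boxes_sketch(boxes, (10, 20), k=-1, axes=(0, 1))
```

Note that after an odd number of quarter turns the image extent along the two axes is swapped, so the reverse call has to be given the rotated size.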
- monai.apps.detection.transforms.box_ops.select_labels(labels, keep)[source]#
For each element in labels, select the entries at the indices given by keep.
- Parameters
labels (Union[Sequence[Union[ndarray, Tensor]], ndarray, Tensor]) – Sequence of array. Each element represents classification labels or scores corresponding to boxes, sized (N,).
keep (Union[ndarray, Tensor]) – the indices to keep, same length with each element in labels.
- Return type
Union[Tuple, ndarray, Tensor]
- Returns
selected labels, does not share memory with original labels.
- monai.apps.detection.transforms.box_ops.swapaxes_boxes(boxes, axis1, axis2)[source]#
Interchange two axes of boxes.
- Parameters
boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
axis1 (int) – First axis.
axis2 (int) – Second axis.
- Returns
boxes with two axes interchanged.
- monai.apps.detection.transforms.box_ops.zoom_boxes(boxes, zoom)[source]#
Zoom boxes
- Parameters
boxes (Union[ndarray, Tensor]) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
- Returns
zoomed boxes, with same data type as boxes, does not share memory with boxes
Example
import torch
from monai.apps.detection.transforms.box_ops import zoom_boxes
boxes = torch.ones(1,4)
zoom_boxes(boxes, zoom=[0.5,2.2]) # will return tensor([[0.5, 2.2, 0.5, 2.2]])
A collection of “vanilla” transforms for box operations https://github.com/Project-MONAI/MONAI/wiki/MONAI_Design
- class monai.apps.detection.transforms.array.BoxToMask(bg_label=-1, ellipse_mask=False)[source]#
Convert box to int16 mask image, which has the same size as the input image.
- Parameters
bg_label (int) – background label for the output mask image, make sure it is smaller than any foreground (fg) labels.
ellipse_mask (bool) – If True, it assumes the object shape is close to ellipse or ellipsoid. If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box. If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
- class monai.apps.detection.transforms.array.ClipBoxToImage(remove_empty=False)[source]#
Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple arrays of labels/scores associated with one array of boxes.
- Parameters
remove_empty (bool) – whether to remove the boxes and corresponding labels that are actually empty
- class monai.apps.detection.transforms.array.ConvertBoxMode(src_mode=None, dst_mode=None)[source]#
This transform converts the boxes in src_mode to the dst_mode.
- Parameters
Note
StandardMode = CornerCornerModeTypeA, also represented as “xyxy” for 2D and “xyzxyz” for 3D.
- src_mode and dst_mode can be:
- str: choose from BoxModeName, for example,
“xyxy”: boxes has format [xmin, ymin, xmax, ymax]
“xyzxyz”: boxes has format [xmin, ymin, zmin, xmax, ymax, zmax]
“xxyy”: boxes has format [xmin, xmax, ymin, ymax]
“xxyyzz”: boxes has format [xmin, xmax, ymin, ymax, zmin, zmax]
“xyxyzz”: boxes has format [xmin, ymin, xmax, ymax, zmin, zmax]
“xywh”: boxes has format [xmin, ymin, xsize, ysize]
“xyzwhd”: boxes has format [xmin, ymin, zmin, xsize, ysize, zsize]
“ccwh”: boxes has format [xcenter, ycenter, xsize, ysize]
“cccwhd”: boxes has format [xcenter, ycenter, zcenter, xsize, ysize, zsize]
- BoxMode class: choose from the subclasses of BoxMode, for example,
CornerCornerModeTypeA: equivalent to “xyxy” or “xyzxyz”
CornerCornerModeTypeB: equivalent to “xxyy” or “xxyyzz”
CornerCornerModeTypeC: equivalent to “xyxy” or “xyxyzz”
CornerSizeMode: equivalent to “xywh” or “xyzwhd”
CenterSizeMode: equivalent to “ccwh” or “cccwhd”
- BoxMode object: choose from the subclasses of BoxMode, for example,
CornerCornerModeTypeA(): equivalent to “xyxy” or “xyzxyz”
CornerCornerModeTypeB(): equivalent to “xxyy” or “xxyyzz”
CornerCornerModeTypeC(): equivalent to “xyxy” or “xyxyzz”
CornerSizeMode(): equivalent to “xywh” or “xyzwhd”
CenterSizeMode(): equivalent to “ccwh” or “cccwhd”
None: will assume mode is StandardMode()
Example
import torch
from monai.apps.detection.transforms.array import ConvertBoxMode
boxes = torch.ones(10,4)
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxMode(src_mode="xyxy", dst_mode="ccwh")
box_converter(boxes)
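To make the mode names concrete, here is a small pure-Python sketch of the arithmetic behind three of the 2D formats listed above. The helper names are hypothetical and operate on a single box given as a list; they only illustrate the coordinate layouts and are not the library's conversion code.

```python
def xyxy_to_ccwh(box):
    """[xmin, ymin, xmax, ymax] -> [xcenter, ycenter, xsize, ysize]."""
    xmin, ymin, xmax, ymax = box
    return [(xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin]


def ccwh_to_xyxy(box):
    """[xcenter, ycenter, xsize, ysize] -> [xmin, ymin, xmax, ymax]."""
    xc, yc, w, h = box
    return [xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2]


def xxyy_to_xyxy(box):
    """[xmin, xmax, ymin, ymax] -> [xmin, ymin, xmax, ymax] (pure reordering)."""
    xmin, xmax, ymin, ymax = box
    return [xmin, ymin, xmax, ymax]


center_size = xyxy_to_ccwh([0, 2, 4, 10])
round_trip = ccwh_to_xyxy(center_size)
reordered = xxyy_to_xyxy([0, 4, 2, 10])
```

Corner-based modes differ only in element order, while size-based modes replace the upper corner with an extent.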
- class monai.apps.detection.transforms.array.ConvertBoxToStandardMode(mode=None)[source]#
Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].
- Parameters
mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format with src_mode in ConvertBoxMode.
Example
import torch
from monai.apps.detection.transforms.array import ConvertBoxToStandardMode
boxes = torch.ones(10,6)
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardMode(mode="xxyyzz")
box_converter(boxes)
- class monai.apps.detection.transforms.array.FlipBox(spatial_axis=None)[source]#
Reverses the box coordinates along the given spatial axis. Preserves shape.
- Parameters
spatial_axis (Union[Sequence[int], int, None]) – spatial axes along which to flip over. Default is None. The default spatial_axis=None will flip over all of the axes of the input array. If an axis is negative it counts from the last to the first axis. If spatial_axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.
- class monai.apps.detection.transforms.array.MaskToBox(bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#
Convert int16 mask image to box, which has the same size as the input image. Pairs with monai.apps.detection.transforms.array.BoxToMask. Please make sure the same min_fg_label is used when using the two transforms in pairs.
- Parameters
bg_label (int) – background label for the output mask image, make sure it is smaller than any foreground (fg) labels.
box_dtype – output dtype for boxes
label_dtype – output dtype for labels
- class monai.apps.detection.transforms.array.ResizeBox(spatial_size, size_mode='all', **kwargs)[source]#
Resize the input boxes when the corresponding image is resized to given spatial size (with scaling, not cropping/padding).
- Parameters
spatial_size (Union[Sequence[int], int]) – expected shape of spatial dimensions after resize operation. If some components of the spatial_size are non-positive values, the transform will use the corresponding components of img size. For example, spatial_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.
size_mode (str) – should be “all” or “longest”. If “all”, will use spatial_size for all the spatial dims. If “longest”, rescale the image so that only the longest side is equal to specified spatial_size, which must be an int number in this case, keeping the aspect ratio of the initial image, refer to: https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/#albumentations.augmentations.geometric.resize.LongestMaxSize.
kwargs – other arguments for the np.pad or torch.pad function. Note that np.pad treats channel dimension as the first dimension.
- class monai.apps.detection.transforms.array.RotateBox90(k=1, spatial_axes=(0, 1))[source]#
Rotate boxes by 90 degrees in the plane specified by spatial_axes. See box_ops.rot90_boxes for additional details.
- Parameters
k (int) – number of times to rotate by 90 degrees.
spatial_axes (Tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), this is the first two axes in spatial dimensions. If an axis is negative it counts from the last to the first axis.
- class monai.apps.detection.transforms.array.SpatialCropBox(roi_center=None, roi_size=None, roi_start=None, roi_end=None, roi_slices=None)[source]#
General purpose box cropper when the corresponding image is cropped by SpatialCrop(*) with the same ROI. The difference is that we do not support negative indexing for roi_slices.
If a dimension of the expected ROI size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected ROI, and the cropped results of several images may not have exactly the same shape. It supports cropping ND spatial boxes.
- The cropped region can be parameterised in various ways:
a list of slices for each spatial dimension (do not allow for use of negative indexing)
a spatial center and size
the start and end coordinates of the ROI
- Parameters
roi_center (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for center of the crop ROI.
roi_size (Union[Sequence[int], ndarray, Tensor, None]) – size of the crop ROI, if a dimension of ROI size is bigger than image size, will not crop that dimension of the image.
roi_start (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for start of the crop ROI.
roi_end (Union[Sequence[int], ndarray, Tensor, None]) – voxel coordinates for end of the crop ROI, if a coordinate is out of image, use the end coordinate of image.
roi_slices (Optional[Sequence[slice]]) – list of slices for each of the spatial dimensions.
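The relationship between the center/size and start/end parameterisations, and the effect of a crop on box coordinates, can be sketched as follows. Everything here is an illustrative assumption rather than the library's code: the helper names are hypothetical, the start is assumed to be center minus half the size (integer division), and boxes are shifted into ROI coordinates and clipped to the ROI extent.

```python
def roi_from_center_size(roi_center, roi_size):
    """Hypothetical helper: derive ROI start/end from a center and a size."""
    roi_start = [c - s // 2 for c, s in zip(roi_center, roi_size)]
    roi_end = [start + s for start, s in zip(roi_start, roi_size)]
    return roi_start, roi_end


def crop_boxes_sketch(boxes, roi_start, roi_end):
    """Hypothetical helper: express StandardMode boxes in the cropped image's coordinates."""
    dims = len(roi_start)
    extent = [end - start for start, end in zip(roi_start, roi_end)]
    cropped = []
    for box in boxes:
        # shift so that roi_start becomes the origin
        shifted = [box[i] - roi_start[i % dims] for i in range(2 * dims)]
        # clip to the ROI so that no coordinate leaves the cropped image
        cropped.append([min(max(v, 0), extent[i % dims]) for i, v in enumerate(shifted)])
    return cropped


roi_start, roi_end = roi_from_center_size([10, 10], [4, 6])
cropped = crop_boxes_sketch([[9, 8, 15, 12]], roi_start, roi_end)
```

Boxes that fall entirely outside the ROI collapse to zero extent in this sketch and would need to be filtered separately.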
- class monai.apps.detection.transforms.array.ZoomBox(zoom, keep_size=False, **kwargs)[source]#
Zooms an ND box with the same padding or slicing setting as Zoom().
- Parameters
zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
keep_size (bool) – Should keep original size (padding/slicing if needed), default is False.
kwargs – other arguments for the np.pad or torch.pad function. Note that np.pad treats channel dimension as the first dimension.
A collection of dictionary-based wrappers around the “vanilla” transforms for box operations defined in monai.apps.detection.transforms.array.
Class names end with ‘d’ to denote dictionary-based transforms.
- monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateD#
alias of
AffineBoxToImageCoordinated
- monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateDict#
alias of
AffineBoxToImageCoordinated
- class monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinated(box_keys, box_ref_image_keys, allow_missing_keys=False, image_meta_key=None, image_meta_key_postfix='meta_dict', affine_lps_to_ras=False)[source]#
Dictionary-based transform that converts box in world coordinate to image coordinate.
- Parameters
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (str) – The single key that represents the reference image to which box_keys are attached.
remove_empty – whether to remove the boxes that are actually empty
allow_missing_keys (bool) – don’t raise exception if key is missing.
image_meta_key (Optional[str]) – explicitly indicate the key of the corresponding metadata dictionary. For example, for data with key image, the metadata by default is in image_meta_dict. The metadata is a dictionary object which contains: filename, affine, original_shape, etc. It is a string, map to the box_ref_image_key. If None, will try to construct meta_keys by box_ref_image_key_{meta_key_postfix}.
image_meta_key_postfix (Optional[str]) – if image_meta_keys=None, use box_ref_image_key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
affine_lps_to_ras – default False. Yet if 1) the image is read by ITKReader, and 2) the ITKReader has affine_lps_to_ras=True, and 3) the box is in world coordinate, then set affine_lps_to_ras=True.
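The fallback rule for locating the metadata dictionary can be sketched in a few lines. The helper `resolve_meta_key` is hypothetical and only illustrates the naming rule described in the parameter list; it is not part of the library.

```python
def resolve_meta_key(box_ref_image_key, image_meta_key=None, image_meta_key_postfix="meta_dict"):
    """Hypothetical helper: pick the metadata key for the reference image."""
    if image_meta_key is not None:
        # an explicit key always wins
        return image_meta_key
    # otherwise build "<image key>_<postfix>", e.g. "image_meta_dict"
    return f"{box_ref_image_key}_{image_meta_key_postfix}"


default_key = resolve_meta_key("image")
explicit_key = resolve_meta_key("image", image_meta_key="my_meta")
```

With the defaults, a reference image stored under "image" is expected to have its affine in the dictionary stored under "image_meta_dict".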
- monai.apps.detection.transforms.dictionary.BoxToMaskD#
alias of
BoxToMaskd
- monai.apps.detection.transforms.dictionary.BoxToMaskDict#
alias of
BoxToMaskd
- class monai.apps.detection.transforms.dictionary.BoxToMaskd(box_keys, box_mask_keys, label_keys, box_ref_image_keys, min_fg_label, ellipse_mask=False, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.BoxToMask. Pairs with monai.apps.detection.transforms.dictionary.MaskToBoxd. Please make sure the same min_fg_label is used when using the two transforms in pairs. The output d[box_mask_key] will have background intensity 0, since the following operations may pad 0 on the border.
This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.
use BoxToMaskd to convert boxes and labels to box_masks;
do transforms, e.g., rotation or cropping, on images and box_masks together;
use MaskToBoxd to convert box_masks back to boxes and labels.
- Parameters
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
min_fg_label (int) – min foreground box label.
ellipse_mask (bool) – If True, it assumes the object shape is close to ellipse or ellipsoid. If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box. If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
allow_missing_keys (bool) – don’t raise exception if key is missing.
Example
# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and image together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels", box_mask_keys="box_mask",
            box_ref_image_keys="image", min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(keys=["image","box_mask"],mode=["nearest","nearest"],
            prob=0.2,range_x=np.pi/6,range_y=np.pi/6,range_z=np.pi/6,
            keep_size=True,padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image","box_mask"],roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
- monai.apps.detection.transforms.dictionary.ClipBoxToImageD#
alias of
ClipBoxToImaged
- monai.apps.detection.transforms.dictionary.ClipBoxToImageDict#
alias of
ClipBoxToImaged
- class monai.apps.detection.transforms.dictionary.ClipBoxToImaged(box_keys, label_keys, box_ref_image_keys, remove_empty=True, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.ClipBoxToImage.
Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple keys of labels/scores associated with one key of boxes.
- Parameters
box_keys (Union[Collection[Hashable], Hashable]) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – The single key that represents the reference image to which box_keys and label_keys are attached.
remove_empty (bool) – whether to remove the boxes that are actually empty
allow_missing_keys (bool) – don’t raise exception if key is missing.
Example
ClipBoxToImaged(
    box_keys="boxes", box_ref_image_keys="image", label_keys=["labels", "scores"], remove_empty=True
)
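What clipping does to the boxes and to several associated label arrays can be sketched in pure Python. This is an illustration, not the library's code: the name `clip_boxes_to_image` is hypothetical, and treating a box as empty when any side has zero or negative extent after clipping is an assumption made for this sketch.

```python
def clip_boxes_to_image(boxes, label_arrays, spatial_size, remove_empty=True):
    """Hypothetical sketch: clip StandardMode boxes and filter every label array alike."""
    dims = len(spatial_size)
    clipped, kept = [], []
    for index, box in enumerate(boxes):
        # clamp every coordinate into [0, image size] along its axis
        new_box = [min(max(box[i], 0), spatial_size[i % dims]) for i in range(2 * dims)]
        # assumed emptiness rule: no extent left along at least one axis
        is_empty = any(new_box[a + dims] <= new_box[a] for a in range(dims))
        if remove_empty and is_empty:
            continue
        clipped.append(new_box)
        kept.append(index)
    # keep the same entries in every associated labels/scores array
    filtered = [[array[i] for i in kept] for array in label_arrays]
    return clipped, filtered


boxes = [[-2, 3, 5, 30], [12, 0, 15, 4]]
labels_and_scores = [[1, 2], [0.9, 0.8]]
clipped, filtered = clip_boxes_to_image(boxes, labels_and_scores, (10, 20))
```

The second box lies fully outside the 10x20 image, so it and its label and score are dropped together.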
- monai.apps.detection.transforms.dictionary.ConvertBoxModeD#
alias of
ConvertBoxModed
- monai.apps.detection.transforms.dictionary.ConvertBoxModeDict#
alias of
ConvertBoxModed
- class monai.apps.detection.transforms.dictionary.ConvertBoxModed(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxMode.
This transform converts the boxes in src_mode to the dst_mode.
Example
import torch
from monai.apps.detection.transforms.dictionary import ConvertBoxModed
data = {"boxes": torch.ones(10,4)}
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxModed(box_keys=["boxes"], src_mode="xyxy", dst_mode="ccwh")
box_converter(data)
- __init__(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#
- Parameters
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.
src_mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format with src_mode in ConvertBoxMode.
dst_mode (Union[str, BoxMode, Type[BoxMode], None]) – target box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format with src_mode in ConvertBoxMode.
allow_missing_keys (bool) – don’t raise exception if key is missing.
See also
monai.apps.detection.transforms.array.ConvertBoxMode
- monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeD#
alias of
ConvertBoxToStandardModed
- monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeDict#
alias of
ConvertBoxToStandardModed
- class monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModed(box_keys, mode=None, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxToStandardMode.
Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].
Example
import torch
from monai.apps.detection.transforms.dictionary import ConvertBoxToStandardModed
data = {"boxes": torch.ones(10,6)}
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardModed(box_keys=["boxes"], mode="xxyyzz")
box_converter(data)
- __init__(box_keys, mode=None, allow_missing_keys=False)[source]#
- Parameters
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.
mode (Union[str, BoxMode, Type[BoxMode], None]) – source box mode. If it is not given, this func will assume it is StandardMode(). It follows the same format with src_mode in ConvertBoxMode.
allow_missing_keys (bool) – don’t raise exception if key is missing.
See also
monai.apps.detection.transforms.array.ConvertBoxToStandardMode
- class monai.apps.detection.transforms.dictionary.FlipBoxd(image_keys, box_keys, box_ref_image_keys, spatial_axis=None, allow_missing_keys=False)[source]#
Dictionary-based transform that flips boxes and images.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
spatial_axis (Union[Sequence[int], int, None]) – Spatial axes along which to flip over. Default is None.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- monai.apps.detection.transforms.dictionary.MaskToBoxD#
alias of
MaskToBoxd
- monai.apps.detection.transforms.dictionary.MaskToBoxDict#
alias of
MaskToBoxd
- class monai.apps.detection.transforms.dictionary.MaskToBoxd(box_keys, box_mask_keys, label_keys, min_fg_label, box_dtype=torch.float32, label_dtype=torch.int64, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.MaskToBox. Pairs with monai.apps.detection.transforms.dictionary.BoxToMaskd. Please make sure the same min_fg_label is used when using the two transforms in pairs.
This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps.
use BoxToMaskd to convert boxes and labels to box_masks;
do transforms, e.g., rotation or cropping, on images and box_masks together;
use MaskToBoxd to convert box_masks back to boxes and labels.
- Parameters
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.
min_fg_label (int) – min foreground box label.
box_dtype – output dtype for box_keys
label_dtype – output dtype for label_keys
allow_missing_keys (bool) – don’t raise exception if key is missing.
Example
# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and images together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels", box_mask_keys="box_mask",
            box_ref_image_keys="image", min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(keys=["image","box_mask"],mode=["nearest","nearest"],
            prob=0.2,range_x=np.pi/6,range_y=np.pi/6,range_z=np.pi/6,
            keep_size=True,padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image","box_mask"],roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
- monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelD#
alias of
RandCropBoxByPosNegLabeld
- monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelDict#
alias of
RandCropBoxByPosNegLabeld
- class monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabeld(image_keys, box_keys, label_keys, spatial_size, pos=1.0, neg=1.0, num_samples=1, whole_box=True, thresh_image_key=None, image_threshold=0.0, fg_indices_key=None, bg_indices_key=None, meta_keys=None, meta_key_postfix='meta_dict', allow_smaller=False, allow_missing_keys=False)[source]#
Crop random fixed-size regions that contain foreground boxes. All the expected fields specified by image_keys are assumed to have the same shape, and patch_index is added to the corresponding metadata. Returns a list of dictionaries for all the cropped images. If a dimension of the expected spatial size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected size, and the cropped results of several images may not have exactly the same shape.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation. They need to have the same spatial size.
box_keys (str) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.
spatial_size (Union[Sequence[int], int]) – the spatial size of the crop region e.g. [224, 224, 128]. If a dimension of ROI size is bigger than image size, will not crop that dimension of the image. If its components have non-positive values, the corresponding size of data[label_key] will be used. For example: if the spatial size of input data is [40, 40, 40] and spatial_size=[32, 64, -1], the spatial size of output data will be [32, 40, 40].
pos (float) – used with neg together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.
neg (float) – used with pos together to calculate the ratio pos / (pos + neg) for the probability to pick a foreground voxel as a center rather than a background voxel.
num_samples (int) – number of samples (crop regions) to take in each list.
whole_box (bool) – Bool, default True, whether we prefer to contain at least one whole box in the cropped foreground patch. Even if True, it is still possible to get partial box if there are multiple boxes in the image.
thresh_image_key (Optional[str]) – if thresh_image_key is not None, use label == 0 & thresh_image > image_threshold to select the negative sample (background) center, so the crop center will only exist on valid image area.
image_threshold (float) – if enabled thresh_image_key, use thresh_image > image_threshold to determine the valid image content area.
fg_indices_key (Optional[str]) – if provided pre-computed foreground indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. A typical usage is to call FgBgToIndicesd transform first and cache the results.
bg_indices_key (Optional[str]) – if provided pre-computed background indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. A typical usage is to call FgBgToIndicesd transform first and cache the results.
meta_keys (Union[Collection[Hashable], Hashable, None]) – explicitly indicate the key of the corresponding metadata dictionary. Used to add patch_index to the meta dict. For example, for data with key image, the metadata by default is in image_meta_dict. The metadata is a dictionary object which contains: filename, original_shape, etc. It can be a sequence of string, map to the keys. If None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix (str) – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. Used to add patch_index to the meta dict.
allow_smaller (bool) – if False, an exception will be raised if the image is smaller than the requested ROI in any dimension. If True, any smaller dimensions will be set to match the cropped size (i.e., no cropping in that dimension).
allow_missing_keys (bool) – don’t raise exception if key is missing.
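The role of pos and neg can be illustrated with a short sketch of the sampling decision. The function `pick_foreground_center` is hypothetical and uses Python's random module for illustration; the transform itself draws from its own random state.

```python
import random


def pick_foreground_center(pos, neg, rng):
    """Hypothetical sketch: True means the crop center is drawn from the foreground."""
    if pos < 0 or neg < 0:
        raise ValueError("pos and neg must be non-negative")
    if pos + neg == 0:
        raise ValueError("pos and neg cannot both be zero")
    # probability of a foreground center is pos / (pos + neg)
    return rng.random() < pos / (pos + neg)


rng = random.Random(0)
always_fg = all(pick_foreground_center(1.0, 0.0, rng) for _ in range(100))
never_fg = not any(pick_foreground_center(0.0, 1.0, rng) for _ in range(100))
```

With the defaults pos=1.0 and neg=1.0, roughly half of the crop centers would be drawn from foreground locations.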
- randomize(boxes, image_size, fg_indices=None, bg_indices=None, thresh_image=None)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance of identifying errors in syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
None
- monai.apps.detection.transforms.dictionary.RandFlipBoxD#
alias of
RandFlipBoxd
- monai.apps.detection.transforms.dictionary.RandFlipBoxDict#
alias of
RandFlipBoxd
- class monai.apps.detection.transforms.dictionary.RandFlipBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, spatial_axis=None, allow_missing_keys=False)[source]#
Dictionary-based transform that randomly flips boxes and images with the given probability.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
prob (float) – Probability of flipping.
spatial_axis (Union[Sequence[int], int, None]) – Spatial axes along which to flip over. Default is None.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- inverse(data)[source]#
Inverse of __call__.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
Dict[Hashable, Tensor]
- set_random_state(seed=None, state=None)[source]#
Set the random state locally to control the randomness. The derived classes should use self.R instead of np.random to introduce random factors.
- Parameters
seed (Optional[int]) – set the random state with an integer seed.
state (Optional[RandomState]) – set the random state with a np.random.RandomState object.
- Raises
TypeError – When state is not an Optional[np.random.RandomState].
- Return type
- Returns
a Randomizable instance.
- monai.apps.detection.transforms.dictionary.RandRotateBox90D#
alias of
RandRotateBox90d
- monai.apps.detection.transforms.dictionary.RandRotateBox90Dict#
alias of
RandRotateBox90d
- class monai.apps.detection.transforms.dictionary.RandRotateBox90d(image_keys, box_keys, box_ref_image_keys, prob=0.1, max_k=3, spatial_axes=(0, 1), allow_missing_keys=False)[source]#
With probability prob, input boxes and images are rotated by 90 degrees in the plane specified by spatial_axes.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
prob (float) – probability of rotating. (Default 0.1, with 10% probability it returns a rotated array.)
max_k (int) – the number of rotations will be sampled from np.random.randint(max_k) + 1. (Default 3)
spatial_axes (Tuple[int, int]) – 2 int numbers that define the plane to rotate with 2 spatial axes. Default: (0, 1), the first two axes in spatial dimensions.
allow_missing_keys (bool) – don’t raise exception if key is missing.
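The interaction between prob and max_k can be sketched with the standard library alone. The function name `sample_num_rotations` is hypothetical; the sketch only mirrors the documented rule that the rotation count is drawn as `randint(max_k) + 1`, and uses Python's `random` module in place of NumPy.

```python
import random


def sample_num_rotations(prob: float, max_k: int, rng: random.Random) -> int:
    """Return 0 (no rotation) or a rotation count in 1..max_k.

    Hypothetical helper for illustration only.
    """
    if rng.random() >= prob:
        return 0  # the transform is skipped with probability 1 - prob
    # randrange(max_k) yields 0..max_k-1, so adding 1 gives 1..max_k.
    return rng.randrange(max_k) + 1


rng = random.Random(0)
counts = [sample_num_rotations(prob=0.5, max_k=3, rng=rng) for _ in range(1000)]
always = [sample_num_rotations(prob=1.0, max_k=3, rng=rng) for _ in range(100)]
```

With the default max_k=3, an applied transform therefore rotates by 90, 180, or 270 degrees.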
- monai.apps.detection.transforms.dictionary.RandZoomBoxD#
alias of
RandZoomBoxd
- monai.apps.detection.transforms.dictionary.RandZoomBoxDict#
alias of
RandZoomBoxd
- class monai.apps.detection.transforms.dictionary.RandZoomBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, min_zoom=0.9, max_zoom=1.1, mode=InterpolateMode.AREA, padding_mode=NumpyPadMode.EDGE, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#
Dictionary-based transform that randomly zooms input boxes and images with given probability within given zoom range.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
prob (float) – Probability of zooming.
min_zoom (Union[Sequence[float], float]) – Min zoom factor. Can be a float or a sequence with the same size as the image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, min_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.
max_zoom (Union[Sequence[float], float]) – Max zoom factor. Can be a float or a sequence with the same size as the image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, max_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.
mode (Union[Sequence[str], str]) – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It can also be a sequence of strings, each element corresponding to a key in keys.
padding_mode (Union[Sequence[str], str]) – available modes for numpy array: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}; available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "edge". The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
align_corners (Union[Sequence[Optional[bool]], bool, None]) – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It can also be a sequence of bool or None, each element corresponding to a key in keys.
keep_size (bool) – Whether to keep the original size (pad if needed), default is True.
allow_missing_keys (bool) – don’t raise exception if key is missing.
kwargs – other args for np.pad API, note that np.pad treats channel dimension as the first dimension. More details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html
- inverse(data)[source]#
Inverse of __call__.
- Raises
NotImplementedError – When the subclass does not override this method.
- Return type
Dict[Hashable, Tensor]
- set_random_state(seed=None, state=None)[source]#
Set the random state locally to control the randomness. Derived classes should use self.R instead of np.random to introduce random factors.
- Parameters
seed (Optional[int]) – set the random state with an integer seed.
state (Optional[RandomState]) – set the random state with a np.random.RandomState object.
- Raises
TypeError – When state is not an Optional[np.random.RandomState].
- Return type
Randomizable
- Returns
a Randomizable instance.
- monai.apps.detection.transforms.dictionary.RotateBox90D#
alias of
RotateBox90d
- monai.apps.detection.transforms.dictionary.RotateBox90Dict#
alias of
RotateBox90d
- class monai.apps.detection.transforms.dictionary.RotateBox90d(image_keys, box_keys, box_ref_image_keys, k=1, spatial_axes=(0, 1), allow_missing_keys=False)[source]#
Input boxes and images are rotated by 90 degrees k times in the plane specified by spatial_axes.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
k (int) – number of times to rotate by 90 degrees.
spatial_axes (Tuple[int, int]) – 2 int numbers that define the plane to rotate with 2 spatial axes. Default (0, 1), the first two axes in spatial dimensions.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- class monai.apps.detection.transforms.dictionary.ZoomBoxd(image_keys, box_keys, box_ref_image_keys, zoom, mode=InterpolateMode.AREA, padding_mode=NumpyPadMode.EDGE, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#
Dictionary-based transform that zooms input boxes and images with the given zoom scale.
- Parameters
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
zoom (Union[Sequence[float], float]) – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
mode (Union[Sequence[str], str]) – {"nearest", "nearest-exact", "linear", "bilinear", "bicubic", "trilinear", "area"} The interpolation mode. Defaults to "area". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It can also be a sequence of strings, each element corresponding to a key in keys.
padding_mode (Union[Sequence[str], str]) – available modes for numpy array: {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"}; available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "edge". The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
align_corners (Union[Sequence[Optional[bool]], bool, None]) – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It can also be a sequence of bool or None, each element corresponding to a key in keys.
keep_size (bool) – Whether to keep the original size (pad if needed), default is True.
allow_missing_keys (bool) – don’t raise exception if key is missing.
kwargs – other arguments for the np.pad or torch.pad function. Note that np.pad treats channel dimension as the first dimension.
Anchor#
This script is adapted from https://github.com/pytorch/vision/blob/release/0.12/torchvision/models/detection/anchor_utils.py
- class monai.apps.detection.utils.anchor_utils.AnchorGenerator(sizes=((20, 30, 40),), aspect_ratios=(((0.5, 1), (1, 0.5)),), indexing='ij')[source]#
This module is modified from torchvision to support both 2D and 3D images.
Module that generates anchors for a set of feature maps and image sizes.
The module supports computing anchors at multiple sizes and aspect ratios per feature map.
sizes and aspect_ratios should have the same number of elements, which should correspond to the number of feature maps.
sizes[i] and aspect_ratios[i] can have an arbitrary number of elements. For 2D images, anchor width and height are w:h = 1:aspect_ratios[i,j]. For 3D images, anchor width, height, and depth are w:h:d = 1:aspect_ratios[i,j,0]:aspect_ratios[i,j,1].
AnchorGenerator will output a set of len(sizes[i]) * len(aspect_ratios[i]) anchors per spatial location for feature map i.
- Parameters
sizes (Sequence[Sequence[int]]) – base size of each anchor. len(sizes) is the number of feature maps, i.e., the number of output levels for the feature pyramid network (FPN). Each element of sizes is a Sequence which represents several anchor sizes for each feature map.
aspect_ratios (Sequence) – the aspect ratios of anchors. len(aspect_ratios) = len(sizes). For 2D images, each element of aspect_ratios[i] is a Sequence of float. For 3D images, each element of aspect_ratios[i] is a Sequence of 2 value Sequence.
indexing (str) – choose from {'ij', 'xy'}, optional. Matrix ('ij', default and recommended) or Cartesian ('xy') indexing of output.
Matrix ('ij', default and recommended) indexing keeps the original axis not changed. To use other monai detection components, please set indexing = 'ij'.
Cartesian ('xy') indexing swaps axis 0 and 1. For 2D cases, monai AnchorGenerator(sizes, aspect_ratios, indexing='xy') and torchvision.models.detection.anchor_utils.AnchorGenerator(sizes, aspect_ratios) are equivalent.
- Reference:
https://github.com/pytorch/vision/blob/release/0.12/torchvision/models/detection/anchor_utils.py
Example
from monai.apps.detection.utils.anchor_utils import AnchorGenerator

# 2D example inputs for a 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = (1., 0.5, 2.)
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)

# 3D example inputs for a 2-level feature maps
sizes = ((10,12,14,16), (20,24,28,32))
base_aspect_ratios = ((1., 1.), (1., 0.5), (0.5, 1.), (2., 2.))
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)
- forward(images, feature_maps)[source]#
Generate anchor boxes for each image.
- Parameters
images (Tensor) – sized (B, C, W, H) or (B, C, W, H, D)
feature_maps (List[Tensor]) – for FPN level i, feature_maps[i] is sized (B, C_i, W_i, H_i) or (B, C_i, W_i, H_i, D_i). This input argument does not have to be the actual feature maps. Any list variable with the same (C_i, W_i, H_i) or (C_i, W_i, H_i, D_i) as feature maps works.
- Return type
List[Tensor]
- Returns
A list with length of B. Each element represents the anchors for this image. The B elements are identical.
Example
import torch

# anchor_generator is the 3D AnchorGenerator built in the class example above
images = torch.zeros((3,1,128,128,128))
feature_maps = [torch.zeros((3,6,64,64,32)), torch.zeros((3,6,32,32,16))]
anchor_generator(images, feature_maps)
- generate_anchors(scales, aspect_ratios, dtype=torch.float32, device=None)[source]#
Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.
- Parameters
scales (Sequence) – a sequence which represents several anchor sizes for the current feature map.
aspect_ratios (Sequence) – a sequence which represents several aspect_ratios for the current feature map. For 2D images, it is a Sequence of float aspect_ratios[j], anchor width and height w:h = 1:aspect_ratios[j]. For 3D images, it is a Sequence of 2 value Sequence aspect_ratios[j,0] and aspect_ratios[j,1], anchor width, height, and depth w:h:d = 1:aspect_ratios[j,0]:aspect_ratios[j,1].
dtype (dtype) – target data type of the output Tensor.
device (Optional[device]) – target device to put the output Tensor data.
- Returns
For each s in scales, returns [s, s*aspect_ratios[j]] for 2D images, and [s, s*aspect_ratios[j,0], s*aspect_ratios[j,1]] for 3D images.
- Return type
Tensor
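To make the relation between scales and aspect ratios concrete, here is a minimal plain-Python sketch for the 2D case. It follows only the rule stated above (each scale s paired with each ratio gives a shape [s, s*ratio]) and then centres each shape at the origin. The name `cell_anchors_2d` is hypothetical, and the output ordering and any area normalisation the library may apply are not reproduced.

```python
from typing import List, Sequence


def cell_anchors_2d(scales: Sequence[float], aspect_ratios: Sequence[float]) -> List[List[float]]:
    """Build origin-centred 2D anchors [-w/2, -h/2, w/2, h/2].

    Hypothetical helper for illustration only.
    """
    anchors = []
    for ratio in aspect_ratios:
        for s in scales:
            w, h = s, s * ratio  # w:h = 1:ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return anchors


anchors = cell_anchors_2d(scales=(10, 20), aspect_ratios=(1.0, 0.5, 2.0))
```

Two scales and three aspect ratios give six anchors per spatial location, which is the count a detection head has to predict for that feature map.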
- grid_anchors(grid_sizes, strides)[source]#
Every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:spatial_dims) corresponds to a feature map. It outputs g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.
- Parameters
grid_sizes (List[List[int]]) – spatial size of the feature maps
strides (List[List[Tensor]]) – strides of the feature maps relative to the original image
Example
grid_sizes = [[100,100],[50,50]]
strides = [[torch.tensor(2),torch.tensor(2)], [torch.tensor(4),torch.tensor(4)]]
- Return type
List[Tensor]
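The tiling described for grid_anchors can be sketched for one 2D feature map as follows. The helper `grid_anchors_2d` is hypothetical and uses plain lists instead of tensors; it assumes 'ij' indexing and that grid position (i, j) maps to image offset (i * stride[0], j * stride[1]).

```python
from itertools import product
from typing import List, Sequence


def grid_anchors_2d(
    cell_anchors: Sequence[Sequence[float]],
    grid_size: Sequence[int],
    stride: Sequence[int],
) -> List[List[float]]:
    """Shift every origin-centred cell anchor to every grid location.

    Hypothetical helper for illustration only.
    """
    out = []
    for i, j in product(range(grid_size[0]), range(grid_size[1])):
        shift = (i * stride[0], j * stride[1])
        for a in cell_anchors:
            # Boxes are [min_0, min_1, max_0, max_1]; both corners move by the same shift.
            out.append([a[0] + shift[0], a[1] + shift[1], a[2] + shift[0], a[3] + shift[1]])
    return out


tiled = grid_anchors_2d(cell_anchors=[[-5, -5, 5, 5]], grid_size=(2, 3), stride=(4, 4))
```

The result holds grid_size[0] * grid_size[1] * len(cell_anchors) anchors for this feature map.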
- class monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(feature_map_scales=(1, 2, 4, 8), base_anchor_shapes=((32, 32, 32), (48, 20, 20), (20, 48, 20), (20, 20, 48)), indexing='ij')[source]#
Module that generates anchors for a set of feature maps and image sizes, inherited from AnchorGenerator.
The module supports computing anchors at multiple base anchor shapes per feature map.
feature_map_scales should have the same number of elements as the number of feature maps.
base_anchor_shapes can have an arbitrary number of elements. For 2D images, each element represents anchor width and height [w,h]. For 3D images, each element represents anchor width, height, and depth [w,h,d].
AnchorGenerator will output a set of len(base_anchor_shapes) anchors per spatial location for feature map i.
- Parameters
feature_map_scales (Union[Sequence[int], Sequence[float]]) – scale of anchors for each feature map, i.e., each output level of the feature pyramid network (FPN). len(feature_map_scales) is the number of feature maps. scale[i]*base_anchor_shapes represents the anchor shapes for feature map i.
base_anchor_shapes (Union[Sequence[Sequence[int]], Sequence[Sequence[float]]]) – a sequence which represents several anchor shapes for one feature map. For N-D images, it is a Sequence of N value Sequence.
indexing (str) – choose from {‘xy’, ‘ij’}, optional. Cartesian (‘xy’) or matrix (‘ij’, default) indexing of output. Cartesian (‘xy’) indexing swaps axis 0 and 1, which is the setting inside torchvision. Matrix (‘ij’, default) indexing keeps the original axis not changed. See also indexing in https://pytorch.org/docs/stable/generated/torch.meshgrid.html
Example
from monai.apps.detection.utils.anchor_utils import AnchorGeneratorWithAnchorShape

# 2D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10), (6, 12), (12, 6))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)

# 3D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10, 10), (12, 12, 8), (10, 10, 6), (16, 16, 10))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)
- static generate_anchors_using_shape(anchor_shapes, dtype=torch.float32, device=None)[source]#
Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.
- Parameters
anchor_shapes (Tensor) – [w, h] or [w, h, d], sized (N, spatial_dims), represents N anchor shapes for the current feature map.
dtype (dtype) – target data type of the output Tensor.
device (Optional[device]) – target device to put the output Tensor data.
- Return type
Tensor
- Returns
For 2D images, returns [-w/2, -h/2, w/2, h/2]; for 3D images, returns [-w/2, -h/2, -d/2, w/2, h/2, d/2].
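A small plain-Python sketch of how feature_map_scales and base_anchor_shapes combine, following the scale[i]*base_anchor_shapes rule and the origin-centred output format described above. The function name `anchors_from_shapes` is hypothetical and works on lists rather than tensors.

```python
from typing import List, Sequence


def anchors_from_shapes(
    feature_map_scales: Sequence[float],
    base_anchor_shapes: Sequence[Sequence[float]],
) -> List[List[List[float]]]:
    """Return, per feature map, origin-centred anchors [-s/2 ..., s/2 ...].

    Hypothetical helper for illustration only; works for 2D and 3D shapes.
    """
    levels = []
    for scale in feature_map_scales:
        level = []
        for shape in base_anchor_shapes:
            scaled = [scale * v for v in shape]
            # All minimum corners first, then all maximum corners (StandardMode layout).
            level.append([-v / 2 for v in scaled] + [v / 2 for v in scaled])
        levels.append(level)
    return levels


levels = anchors_from_shapes(feature_map_scales=(1, 2), base_anchor_shapes=((10, 10), (6, 12)))
```

Each level has len(base_anchor_shapes) anchors per location, and deeper levels simply use proportionally larger boxes.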
Matcher#
The functions in this script are adapted from nnDetection, https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/core/boxes/matcher.py which is adapted from torchvision.
These are the changes compared with nnDetection: 1) comments and docstrings; 2) reformatting; 3) a debug option added to ATSSMatcher to help users tune parameters; 4) a corner-case return added in ATSSMatcher.compute_matches; 5) support for float16 on CPU.
- class monai.apps.detection.utils.ATSS_matcher.ATSSMatcher(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#
- __init__(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#
Compute matching based on ATSS https://arxiv.org/abs/1912.02424 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
- Parameters
num_candidates (int) – number of positions to select candidates from. A smaller value will result in a higher matcher threshold and fewer matched candidates.
similarity_fn (Callable[[Tensor, Tensor], Tensor]) – function for similarity computation between boxes and anchors
center_in_gt (bool) – If False, matched anchor center points do not need to lie within the ground truth box. Recommend False for small objects. If True (default), will result in a strict matcher and fewer matched candidates.
debug (bool) – if True, will print the matcher threshold in order to tune num_candidates and center_in_gt.
- compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#
Compute matches according to ATSS for a single image. Adapted from (https://github.com/sfzhang15/ATSS/blob/79dfb28bd1/atss_core/modeling/rpn/atss/loss.py#L180-L184)
- Parameters
boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
anchors (Tensor) – anchors to match, Mx4 or Mx6, also assumed to be StandardMode.
num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level
num_anchors_per_loc (int) – number of anchors per position
- Return type
Tuple[Tensor, Tensor]
- Returns
matrix which contains the similarity from each box to each anchor [N, M]
vector which contains the matched box index for all anchors (if background BELOW_LOW_THRESHOLD is used and if it should be ignored BETWEEN_THRESHOLDS is used) [M]
Note
StandardMode = CornerCornerModeTypeA, also represented as “xyxy” ([xmin, ymin, xmax, ymax]) for 2D and “xyzxyz” ([xmin, ymin, zmin, xmax, ymax, zmax]) for 3D.
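The adaptive threshold at the heart of ATSS can be sketched for a single ground-truth box. In the ATSS paper the threshold is the mean plus the standard deviation of the IoUs between the ground-truth box and its candidate anchors. The function name `atss_positive_candidates` is hypothetical, and the use of the sample standard deviation is an assumption; the center_in_gt check is not modelled here.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def atss_positive_candidates(candidate_ious: Sequence[float]) -> Tuple[float, List[int]]:
    """Return the adaptive IoU threshold and the indices of candidates that pass it.

    Hypothetical helper for illustration only.
    """
    # Assumption: sample standard deviation (n - 1 in the denominator).
    threshold = mean(candidate_ious) + stdev(candidate_ious)
    positives = [i for i, iou in enumerate(candidate_ious) if iou >= threshold]
    return threshold, positives


ious = [0.10, 0.15, 0.20, 0.70]  # IoUs between one ground-truth box and its candidates
threshold, positives = atss_positive_candidates(ious)
```

Because the threshold is derived from the candidates themselves, changing num_candidates changes which IoUs enter the statistics, which is why the debug option prints the resulting threshold.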
- class monai.apps.detection.utils.ATSS_matcher.Matcher(similarity_fn=<function box_iou>)[source]#
Base class of Matcher, which matches boxes and anchors to each other
- Parameters
similarity_fn (Callable[[Tensor, Tensor], Tensor]) – function for similarity computation between boxes and anchors
- compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#
Compute matches
- Parameters
boxes (Tensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
anchors (Tensor) – anchors to match, Mx4 or Mx6, also assumed to be StandardMode.
num_anchors_per_level (Sequence[int]) – number of anchors per feature pyramid level
num_anchors_per_loc (int) – number of anchors per position
- Return type
Tuple[Tensor, Tensor]
- Returns
matrix which contains the similarity from each box to each anchor [N, M]
vector which contains the matched box index for all anchors (if background BELOW_LOW_THRESHOLD is used and if it should be ignored BETWEEN_THRESHOLDS is used) [M]
Box coder#
This script is modified from torchvision to support N-D images,
https://github.com/pytorch/vision/blob/main/torchvision/models/detection/_utils.py
- class monai.apps.detection.utils.box_coder.BoxCoder(weights, boxes_xform_clip=None)[source]#
This class encodes and decodes a set of bounding boxes into the representation used for training the regressors.
- Parameters
weights (Tuple[float]) – 4-element tuple or 6-element tuple
boxes_xform_clip (Optional[float]) – high threshold to prevent sending too large values into torch.exp()
Example
import torch
from monai.apps.detection.utils.box_coder import BoxCoder

box_coder = BoxCoder(weights=[1., 1., 1., 1., 1., 1.])
gt_boxes = torch.tensor([[1,2,1,4,5,6],[1,3,2,7,8,9]])
proposals = gt_boxes + torch.rand(gt_boxes.shape)
rel_gt_boxes = box_coder.encode_single(gt_boxes, proposals)
gt_back = box_coder.decode_single(rel_gt_boxes, proposals)
# We expect gt_back to be equal to gt_boxes
- decode(rel_codes, reference_boxes)[source]#
From a set of original reference_boxes and encoded relative box offsets, get the decoded boxes.
- Parameters
rel_codes (Tensor) – encoded boxes, Nx4 or Nx6 torch tensor.
reference_boxes – a list of reference boxes, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode.
- Return type
Tensor
- Returns
decoded boxes, Nx1x4 or Nx1x6 torch tensor. The box mode will be StandardMode.
- decode_single(rel_codes, reference_boxes)[source]#
From a set of original boxes and encoded relative box offsets, get the decoded boxes.
- Parameters
rel_codes (Tensor) – encoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor.
reference_boxes (Tensor) – reference boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode.
- Return type
Tensor
- Returns
decoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor. The box mode will be StandardMode.
- encode(gt_boxes, proposals)[source]#
Encode a set of proposals with respect to some ground truth (gt) boxes.
- Parameters
gt_boxes (Sequence[Tensor]) – list of gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode.
proposals (Sequence[Tensor]) – list of boxes to be encoded, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to be StandardMode.
- Return type
Tuple[Tensor]
- Returns
A tuple of encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
- encode_single(gt_boxes, proposals)[source]#
Encode proposals with respect to ground truth (gt) boxes.
- Parameters
gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode.
proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode.
- Return type
Tensor
- Returns
encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
- monai.apps.detection.utils.box_coder.encode_boxes(gt_boxes, proposals, weights)[source]#
Encode a set of proposals with respect to some reference ground truth (gt) boxes.
- Parameters
gt_boxes (Tensor) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode.
proposals (Tensor) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to be StandardMode.
weights (Tensor) – the weights for (cx, cy, w, h) or (cx, cy, cz, w, h, d)
- Return type
Tensor
- Returns
encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
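The encoding can be illustrated for one 2D box with the standard-library `math` module. The sketch below assumes the torchvision-style parameterisation (centre offsets normalised by the proposal size, log-ratios for the sizes) with weights ordered (cx, cy, w, h). The function names are hypothetical, and the sketch does not claim to match the library line for line.

```python
import math
from typing import List, Optional, Sequence


def _center_size(box: Sequence[float]):
    """Convert [xmin, ymin, xmax, ymax] to (cx, cy, w, h)."""
    w, h = box[2] - box[0], box[3] - box[1]
    return box[0] + 0.5 * w, box[1] + 0.5 * h, w, h


def encode_box_2d(gt_box, proposal, weights=(1.0, 1.0, 1.0, 1.0)) -> List[float]:
    """Hypothetical helper: regression target that maps `proposal` onto `gt_box`."""
    gcx, gcy, gw, gh = _center_size(gt_box)
    pcx, pcy, pw, ph = _center_size(proposal)
    wx, wy, ww, wh = weights
    return [wx * (gcx - pcx) / pw, wy * (gcy - pcy) / ph,
            ww * math.log(gw / pw), wh * math.log(gh / ph)]


def decode_box_2d(code, proposal, weights=(1.0, 1.0, 1.0, 1.0),
                  clip: Optional[float] = None) -> List[float]:
    """Hypothetical helper: apply an encoded offset to `proposal`."""
    pcx, pcy, pw, ph = _center_size(proposal)
    wx, wy, ww, wh = weights
    dw, dh = code[2] / ww, code[3] / wh
    if clip is not None:
        # Same purpose as boxes_xform_clip: keep exp() from overflowing.
        dw, dh = min(dw, clip), min(dh, clip)
    cx, cy = code[0] / wx * pw + pcx, code[1] / wy * ph + pcy
    w, h = math.exp(dw) * pw, math.exp(dh) * ph
    return [cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h]


gt = [1.0, 2.0, 5.0, 8.0]
proposal = [1.5, 2.5, 6.0, 7.0]
code = encode_box_2d(gt, proposal)
decoded = decode_box_2d(code, proposal)
```

Decoding an encoded box against the same proposal recovers the ground-truth box up to floating-point error.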
Detection Utilities#
- monai.apps.detection.utils.detector_utils.check_input_images(input_images, spatial_dims)[source]#
Validate the input dimensionality (raise a ValueError if invalid).
- Parameters
input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.
- Return type
None
- monai.apps.detection.utils.detector_utils.check_training_targets(input_images, targets, spatial_dims, target_label_key, target_box_key)[source]#
Validate the input images/targets during training (raise a ValueError if invalid).
- Parameters
input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
targets (Optional[List[Dict[str, Tensor]]]) – a list of dict. Each dict has two keys, target_box_key and target_label_key, holding the ground-truth boxes present in the image and their labels.
spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.
target_label_key (str) – the expected key of target labels.
target_box_key (str) – the expected key of target boxes.
- Return type
None
- monai.apps.detection.utils.detector_utils.pad_images(input_images, spatial_dims, size_divisible, mode=PytorchPadMode.CONSTANT, **kwargs)[source]#
Pad the input images, so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0
- Parameters
input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.
size_divisible (Union[int, Sequence[int]]) – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.
mode (Union[PytorchPadMode, str]) – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
kwargs – other arguments for torch.pad function.
- Return type
Tuple[Tensor, List[List[int]]]
- Returns
images, a (B, C, H, W) or (B, C, H, W, D) Tensor
image_sizes, the original spatial size of each image
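The target spatial size of the padded batch can be worked out with a few lines of arithmetic. The helper `padded_spatial_size` below is hypothetical; it assumes the batch is padded to the per-axis maximum over all images, rounded up to the next multiple of size_divisible.

```python
import math
from typing import List, Sequence, Union


def padded_spatial_size(
    image_sizes: Sequence[Sequence[int]],
    size_divisible: Union[int, Sequence[int]],
) -> List[int]:
    """Smallest spatial size that fits every image and is divisible by size_divisible.

    Hypothetical helper for illustration only.
    """
    n = len(image_sizes[0])
    if isinstance(size_divisible, int):
        size_divisible = [size_divisible] * n  # same divisor on every axis
    max_size = [max(size[d] for size in image_sizes) for d in range(n)]
    return [math.ceil(m / k) * k for m, k in zip(max_size, size_divisible)]


padded = padded_spatial_size([(100, 130), (120, 90)], size_divisible=32)
```

Here the per-axis maxima (120, 130) are rounded up to (128, 160), so both images are padded at the end to that shape.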
- monai.apps.detection.utils.detector_utils.preprocess_images(input_images, spatial_dims, size_divisible, mode=PytorchPadMode.CONSTANT, **kwargs)[source]#
Preprocess the input images, including
validating the inputs
padding the inputs so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0
- Parameters
input_images (Union[List[Tensor], Tensor]) – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims (int) – number of spatial dimensions of the images, 2 or 3.
size_divisible (Union[int, Sequence[int]]) – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.
mode (Union[PytorchPadMode, str]) – available modes for PyTorch Tensor: {"constant", "reflect", "replicate", "circular"}. One of the listed string values or a user supplied function. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
kwargs – other arguments for torch.pad function.
- Return type
Tuple[Tensor, List[List[int]]]
- Returns
images, a (B, C, H, W) or (B, C, H, W, D) Tensor
image_sizes, the original spatial size of each image
- monai.apps.detection.utils.predict_utils.check_dict_values_same_length(head_outputs, keys=None)[source]#
We expect the values in head_outputs: Dict[str, List[Tensor]] to have the same length. Will raise ValueError if not.
- Parameters
head_outputs (Dict[str, List[Tensor]]) – a Dict[str, List[Tensor]] or Dict[str, Tensor]
keys (Optional[List[str]]) – the keys in head_outputs that need to have values (List) with the same length. If not provided, will use head_outputs.keys().
- Return type
None
- monai.apps.detection.utils.predict_utils.ensure_dict_value_to_list_(head_outputs, keys=None)[source]#
An in-place function. We expect head_outputs to be Dict[str, List[Tensor]]. If it is Dict[str, Tensor], this function converts it to Dict[str, List[Tensor]]. It will be modified in-place.
- Parameters
head_outputs (Dict[str, List[Tensor]]) – a Dict[str, List[Tensor]] or Dict[str, Tensor], will be modified in-place
keys (Optional[List[str]]) – the keys in head_outputs that need to have value type List[Tensor]. If not provided, will use head_outputs.keys().
- Return type
None
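The two dictionary helpers above amount to a normalisation step followed by a consistency check. A plain-Python sketch of that behaviour, using strings as stand-ins for tensors, might look like this. Both function names are hypothetical and only mirror the documented behaviour.

```python
from typing import Any, Dict, List, Optional


def ensure_values_are_lists_(outputs: Dict[str, Any], keys: Optional[List[str]] = None) -> None:
    """Wrap non-list values in a one-element list, modifying `outputs` in place."""
    for key in list(outputs.keys()) if keys is None else keys:
        if not isinstance(outputs[key], list):
            outputs[key] = [outputs[key]]


def check_same_length(outputs: Dict[str, List[Any]], keys: Optional[List[str]] = None) -> None:
    """Raise ValueError unless all selected values have the same length."""
    selected = list(outputs.keys()) if keys is None else keys
    lengths = {len(outputs[key]) for key in selected}
    if len(lengths) > 1:
        raise ValueError(f"Values have different lengths: {sorted(lengths)}")


# "cls" mimics a single-tensor output, "box_reg" a per-level list with one level.
head_outputs = {"cls": "cls_map", "box_reg": ["box_map"]}
ensure_values_are_lists_(head_outputs)
check_same_length(head_outputs)
```

After normalisation every value is a list with one entry per feature-map level, so downstream code can iterate over levels without special cases.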
- monai.apps.detection.utils.predict_utils.predict_with_inferer(images, network, keys, inferer=None)[source]#
Predict network dict output with an inferer. Compared with calling network(images) directly, it enables a sliding window inferer that can be used to handle large inputs.
- Parameters
images (Tensor) – input of the network, Tensor sized (B, C, H, W) or (B, C, H, W, D)
network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].
keys (List[str]) – the keys in the output dict, should be network output keys or a subset of them.
inferer (Optional[SlidingWindowInferer]) – a SlidingWindowInferer to handle large inputs.
- Return type
Dict[str, List[Tensor]]
- Returns
The predicted head_output from network, a Dict[str, List[Tensor]]
Example
# define a naive network
import torch
import monai
from monai.apps.detection.utils.predict_utils import predict_with_inferer

class NaiveNet(torch.nn.Module):
    def __init__(self, ):
        super().__init__()

    def forward(self, images: torch.Tensor):
        return {"cls": torch.randn(images.shape), "box_reg": [torch.randn(images.shape)]}

# create a predictor
network = NaiveNet()
inferer = monai.inferers.SlidingWindowInferer(
    roi_size = (128, 128, 128),
    overlap = 0.25,
    cache_roi_weight_map = True,
)
network_output_keys=["cls", "box_reg"]
images = torch.randn((2, 3, 512, 512, 512))  # a large input
head_outputs = predict_with_inferer(images, network, network_output_keys, inferer)
Inference box selector#
Part of this script is adapted from https://github.com/pytorch/vision/blob/main/torchvision/models/detection/retinanet.py
- class monai.apps.detection.utils.box_selector.BoxSelector(box_overlap_metric=<function box_iou>, apply_sigmoid=True, score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300)[source]#
Box selector which selects the predicted boxes. The box selection is performed with the following steps:
For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.
- Parameters
apply_sigmoid (bool) – whether to apply sigmoid to get scores from classification logits
score_thresh (float) – no box with scores less than score_thresh will be kept
topk_candidates_per_level (int) – max number of boxes to keep for each level
nms_thresh (float) – box overlapping threshold for NMS
detections_per_img (int) – max number of boxes to keep for each image
Example
import torch
from monai.apps.detection.utils.box_selector import BoxSelector

input_param = {
    "apply_sigmoid": True,
    "score_thresh": 0.1,
    "topk_candidates_per_level": 2,
    "nms_thresh": 0.1,
    "detections_per_img": 5,
}
box_selector = BoxSelector(**input_param)
boxes = [torch.randn([3,6]), torch.randn([7,6])]
logits = [torch.randn([3,3]), torch.randn([7,3])]
spatial_size = (8,8,8)
selected_boxes, selected_scores, selected_labels = box_selector.select_boxes_per_image(
    boxes, logits, spatial_size
)
- select_boxes_per_image(boxes_list, logits_list, spatial_size)[source]#
Postprocessing to generate detection result from classification logits and boxes.
The box selection is performed with the following steps:
For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.
- Parameters
boxes_list (List[Tensor]) – list of predicted boxes from a single image, each element i is a Tensor sized (N_i, 2*spatial_dims)
logits_list (List[Tensor]) – list of predicted classification logits from a single image, each element i is a Tensor sized (N_i, num_classes)
spatial_size (Union[List[int], Tuple[int]]) – spatial size of the image
- Return type
Tuple[Tensor, Tensor, Tensor]
- Returns
selected boxes, Tensor sized (P, 2*spatial_dims)
selected_scores, Tensor sized (P, )
selected_labels, Tensor sized (P, )
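The four selection steps listed above can be sketched in plain Python. This is a simplified illustration, not MONAI's implementation: it assumes 2D boxes in (xmin, ymin, xmax, ymax) order, a single class with one score per box, and greedy NMS. The names `iou_2d` and `select_boxes` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax), an assumed 2D layout


def iou_2d(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def select_boxes(
    levels: Sequence[Sequence[Tuple[Box, float]]],
    score_thresh: float,
    topk_per_level: int,
    nms_thresh: float,
    detections_per_img: int,
) -> List[Tuple[Box, float]]:
    """Hypothetical helper: each level is a list of (box, score) pairs."""
    candidates: List[Tuple[Box, float]] = []
    for level in levels:
        # Steps 1 and 2: per-level score threshold, then per-level top-k.
        kept = [bs for bs in level if bs[1] >= score_thresh]
        kept.sort(key=lambda bs: bs[1], reverse=True)
        candidates.extend(kept[:topk_per_level])

    # Step 3: greedy NMS over the whole image, highest score first.
    candidates.sort(key=lambda bs: bs[1], reverse=True)
    selected: List[Tuple[Box, float]] = []
    for box, score in candidates:
        if all(iou_2d(box, other) <= nms_thresh for other, _ in selected):
            selected.append((box, score))

    # Step 4: cap the number of detections per image.
    return selected[:detections_per_img]


levels = [
    [((0, 0, 10, 10), 0.9), ((1, 1, 11, 11), 0.8), ((50, 50, 60, 60), 0.02)],
    [((20, 20, 30, 30), 0.7), ((0, 0, 10, 10), 0.6)],
]
result = select_boxes(
    levels, score_thresh=0.05, topk_per_level=2, nms_thresh=0.5, detections_per_img=5
)
print(result)
```

In this toy input, the low-score box is dropped by the threshold, and the two boxes that heavily overlap the top-scoring box are suppressed by NMS, leaving two detections.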
- select_top_score_idx_per_level(logits)[source]#
Select indices with highest scores.
The index selection is performed with the following steps:
If self.apply_sigmoid, get scores by applying sigmoid to logits. Otherwise, use logits as scores.
Discard indices with scores less than self.score_thresh
Keep indices with top self.topk_candidates_per_level scores
- Parameters
logits (Tensor) – predicted classification logits, Tensor sized (N, num_classes)
- Returns
topk_idxs: selected M indices, Tensor sized (M, )
selected_scores: selected M scores, Tensor sized (M, )
selected_labels: selected M labels, Tensor sized (M, )
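The three steps above (optional sigmoid, score threshold, top-k) can be illustrated with a small pure-Python sketch. It assumes that every (box, class) pair is scored as a separate candidate, which is how the RetinaNet-style post-processing this script is adapted from usually works; the helper name `select_top_score_idx` is hypothetical and this is not the library code.

```python
import math
from typing import List, Sequence, Tuple


def select_top_score_idx(
    logits: Sequence[Sequence[float]],
    score_thresh: float,
    topk: int,
    apply_sigmoid: bool = True,
) -> Tuple[List[int], List[float], List[int]]:
    """Return (box indices, scores, labels) of the top-scoring (box, class) pairs."""
    scored = []
    for box_idx, row in enumerate(logits):
        for label, logit in enumerate(row):
            score = 1.0 / (1.0 + math.exp(-logit)) if apply_sigmoid else logit
            if score >= score_thresh:  # discard scores below the threshold
                scored.append((score, box_idx, label))
    # Keep only the top-k highest scores.
    scored.sort(key=lambda item: item[0], reverse=True)
    top = scored[:topk]
    return [b for _, b, _ in top], [s for s, _, _ in top], [c for _, _, c in top]


logits = [[2.0, -3.0], [-1.0, 0.5], [-4.0, -5.0]]  # 3 boxes, 2 classes
idxs, scores, labels = select_top_score_idx(logits, score_thresh=0.3, topk=2)
print(idxs, labels)  # [0, 1] [0, 1]
```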
Detection metrics#
This script is almost the same as https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/evaluator/detection/coco.py The changes include 1) code reformatting, 2) docstrings.
This script is almost the same as https://github.com/MIC-DKFZ/nnDetection/blob/main/nndet/evaluator/detection/matching.py The changes include 1) code reformatting, 2) docstrings, 3) allowing the input argument gt_ignore to be optional. (If it is omitted, no GT boxes will be ignored.)
- monai.apps.detection.metrics.matching.matching_batch(iou_fn, iou_thresholds, pred_boxes, pred_classes, pred_scores, gt_boxes, gt_classes, gt_ignore=None, max_detections=100)[source]#
Match boxes of a batch to corresponding ground truth for each category independently.
- Parameters
iou_fn (Callable[[ndarray, ndarray], ndarray]) – compute overlap for each pair
iou_thresholds (Sequence[float]) – defines which IoU thresholds should be evaluated
pred_boxes (Sequence[ndarray]) – predicted boxes from a single batch; List[[D, dim * 2]], D number of predictions
pred_classes (Sequence[ndarray]) – predicted classes from a single batch; List[[D]], D number of predictions
pred_scores (Sequence[ndarray]) – predicted score for each bounding box; List[[D]], D number of predictions
gt_boxes (Sequence[ndarray]) – ground truth boxes; List[[G, dim * 2]], G number of ground truth
gt_classes (Sequence[ndarray]) – ground truth classes; List[[G]], G number of ground truth
gt_ignore (Union[Sequence[Sequence[bool]], Sequence[ndarray], None]) – specifies which ground truth boxes are not counted as true positives (detections which match these boxes are not counted as false positives either). If not given, all the gt_boxes are used; List[[G]], G number of ground truth
max_detections (int) – maximum number of detections which should be evaluated
- Return type
List[Dict[int, Dict[str, ndarray]]]
- Returns
List[Dict[int, Dict[str, np.ndarray]]], each Dict[str, np.ndarray] corresponds to an image. Dict has the following keys.
dtMatches: matched detections [T, D], where T = number of thresholds, D = number of detections
gtMatches: matched ground truth boxes [T, G], where T = number of thresholds, G = number of ground truth
dtScores: prediction scores of the detections [D]
gtIgnore: indicates which ground truth boxes should be ignored [G]
dtIgnore: indicates which detections should be ignored [T, D]
Example
import torch
from monai.data.box_utils import box_iou
from monai.apps.detection.metrics.coco import COCOMetric
from monai.apps.detection.metrics.matching import matching_batch

# 3D example outputs of one image from detector
val_outputs_all = [
    {"boxes": torch.tensor([[1,1,1,3,4,5]], dtype=torch.float16),
     "labels": torch.randint(3, (1,)),
     "scores": torch.randn((1,)).absolute()},
]
val_targets_all = [
    {"boxes": torch.tensor([[1,1,1,2,6,4]], dtype=torch.float16),
     "labels": torch.randint(3, (1,))},
]

coco_metric = COCOMetric(
    classes=['c0','c1','c2'], iou_list=[0.1], max_detection=[10]
)
results_metric = matching_batch(
    iou_fn=box_iou,
    iou_thresholds=coco_metric.iou_thresholds,
    pred_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_outputs_all],
    pred_classes=[val_data_i["labels"].numpy() for val_data_i in val_outputs_all],
    pred_scores=[val_data_i["scores"].numpy() for val_data_i in val_outputs_all],
    gt_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_targets_all],
    gt_classes=[val_data_i["labels"].numpy() for val_data_i in val_targets_all],
)
val_metric_dict = coco_metric(results_metric)
print(val_metric_dict)
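To make the dtMatches and gtMatches outputs more concrete, here is a minimal pure-Python sketch of COCO-style greedy matching for one image, one class, and one IoU threshold. It is an illustration under simplifying assumptions (no gt_ignore, no max_detections, a precomputed IoU matrix) and not the code used by matching_batch; the name `greedy_match` is hypothetical.

```python
from typing import List, Sequence, Tuple


def greedy_match(
    ious: Sequence[Sequence[float]],
    scores: Sequence[float],
    iou_thresh: float,
) -> Tuple[List[bool], List[bool]]:
    """ious[d][g] is the overlap between detection d and ground truth g."""
    num_gt = len(ious[0]) if ious else 0
    dt_matched = [False] * len(scores)
    gt_matched = [False] * num_gt
    # Visit detections from the highest to the lowest score.
    for d in sorted(range(len(scores)), key=lambda i: scores[i], reverse=True):
        best_g, best_iou = -1, iou_thresh
        for g in range(num_gt):
            # Each ground truth box can be claimed by one detection only.
            if not gt_matched[g] and ious[d][g] >= best_iou:
                best_g, best_iou = g, ious[d][g]
        if best_g >= 0:
            dt_matched[d] = True
            gt_matched[best_g] = True
    return dt_matched, gt_matched


ious = [
    [0.6, 0.0],  # detection 0
    [0.8, 0.1],  # detection 1
    [0.0, 0.3],  # detection 2
]
scores = [0.9, 0.7, 0.5]
dt_matched, gt_matched = greedy_match(ious, scores, iou_thresh=0.5)
print(dt_matched, gt_matched)
```

Detection 1 overlaps ground truth 0 more than detection 0 does, but detection 0 has the higher score and claims the box first, so detection 1 stays unmatched.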
Reconstruction#
ConvertToTensorComplex#
- monai.apps.reconstruction.complex_utils.convert_to_tensor_complex(data, dtype=None, device=None, wrap_sequence=True, track_meta=False)[source]#
Convert complex-valued data to a 2-channel PyTorch tensor. The real and imaginary parts are stacked along the last dimension. This function relies on ‘monai.utils.type_conversion.convert_to_tensor’
- Parameters
data – input data can be a PyTorch Tensor, numpy array, list, int, or float. Tensor, numpy array, float, int, and bool are converted to Tensor; strings and objects keep their original type. For a list, every item is converted to a Tensor if applicable.
dtype (Optional[dtype]) – target data type when converting to Tensor.
device (Optional[device]) – target device to put the converted Tensor data.
wrap_sequence (bool) – if False, then lists will recursively call this function. E.g., [1, 2] -> [tensor(1), tensor(2)]. If True, then [1, 2] -> tensor([1, 2]).
track_meta (bool) – whether to track the meta information; if True, will convert to MetaTensor. Defaults to False.
- Return type
Tensor
- Returns
PyTorch version of the data
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import convert_to_tensor_complex

data = np.array([[1+1j, 1-1j], [2+2j, 2-2j]])
# the following line prints (2, 2)
print(data.shape)
# the following line prints torch.Size([2, 2, 2])
print(convert_to_tensor_complex(data).shape)
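The 2-channel layout described above (real and imaginary parts stacked along the last dimension) can be illustrated without any tensor library. The sketch below uses nested Python lists and a hypothetical helper `to_two_channel`; it only demonstrates the layout and does not reproduce the dtype, device, or metadata handling of the real function.

```python
from typing import Any


def to_two_channel(data: Any) -> Any:
    """Recursively replace each complex scalar with a [real, imag] pair."""
    if isinstance(data, (list, tuple)):
        return [to_two_channel(item) for item in data]
    value = complex(data)
    return [value.real, value.imag]


data = [[1 + 1j, 1 - 1j], [2 + 2j, 2 - 2j]]  # "shape" (2, 2)
stacked = to_two_channel(data)               # "shape" (2, 2, 2)
print(stacked)
```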
ComplexAbs#
- monai.apps.reconstruction.complex_utils.complex_abs(x)[source]#
Compute the absolute value of a complex array.
- Parameters
x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
- Return type
Union[ndarray, Tensor]
- Returns
Absolute value along the last dimension
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import complex_abs

x = np.array([3,4])[np.newaxis]
# the following line prints 5
print(complex_abs(x))
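For a single (real, imaginary) pair, the absolute value is the Euclidean norm of the two channels. A minimal sketch with a hypothetical helper `two_channel_abs`, cross-checked against Python's built-in complex type:

```python
import math
from typing import Sequence


def two_channel_abs(pair: Sequence[float]) -> float:
    """Magnitude of a complex number stored as (real, imag)."""
    real, imag = pair
    return math.sqrt(real * real + imag * imag)


magnitude = two_channel_abs([3.0, 4.0])
print(magnitude)  # 5.0
print(magnitude == abs(complex(3.0, 4.0)))  # True
```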
RootSumOfSquares#
- monai.apps.reconstruction.mri_utils.root_sum_of_squares(x, spatial_dim)[source]#
Compute the root sum of squares (rss) of the data (typically done for multi-coil MRI samples)
- Parameters
x (Union[ndarray, Tensor]) – Input array/tensor
spatial_dim (int) – dimension along which rss is applied
- Return type
Union[ndarray, Tensor]
- Returns
rss of x along spatial_dim
Example
import numpy as np
from monai.apps.reconstruction.mri_utils import root_sum_of_squares

x = np.ones([2,3])
# the following line prints array([1.41421356, 1.41421356, 1.41421356])
print(root_sum_of_squares(x, spatial_dim=0))
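The root sum of squares along a dimension is the square root of the sum of squared values along that dimension. A minimal pure-Python sketch for a 2D nested list whose first axis is assumed to index coils (the helper name `rss_over_coils` is hypothetical):

```python
import math
from typing import List, Sequence


def rss_over_coils(coils: Sequence[Sequence[float]]) -> List[float]:
    """Root sum of squares across the first (coil) axis of a 2D nested list."""
    return [math.sqrt(sum(v * v for v in column)) for column in zip(*coils)]


x = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # 2 coils, 3 samples each
combined = rss_over_coils(x)
print(combined)  # three values, each equal to sqrt(2)
```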
ComplexMul#
- monai.apps.reconstruction.complex_utils.complex_mul(x, y)[source]#
Compute complex-valued multiplication. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)
- Parameters
x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
y (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
- Return type
Union[ndarray, Tensor]
- Returns
Complex multiplication of x and y
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import complex_mul

x = np.array([[1,2],[3,4]])
y = np.array([[1,1],[1,1]])
# the following line prints array([[-1, 3], [-1, 7]])
print(complex_mul(x,y))
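The result in the example follows from the identity (a + bi)(c + di) = (ac - bd) + (ad + bc)i applied to each (real, imaginary) pair. A small sketch with a hypothetical helper `two_channel_mul` reproduces the same numbers:

```python
from typing import Sequence, Tuple


def two_channel_mul(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Multiply two complex numbers stored as (real, imag) pairs."""
    a, b = x
    c, d = y
    return (a * c - b * d, a * d + b * c)


products = [two_channel_mul(x, y) for x, y in zip([(1, 2), (3, 4)], [(1, 1), (1, 1)])]
print(products)  # [(-1, 3), (-1, 7)]
```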
ComplexConj#
- monai.apps.reconstruction.complex_utils.complex_conj(x)[source]#
Compute the complex conjugate of an array/tensor. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)
- Parameters
x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
- Return type
Union[ndarray, Tensor]
- Returns
Complex conjugate of x
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import complex_conj

x = np.array([[1,2],[3,4]])
# the following line prints array([[ 1, -2], [ 3, -4]])
print(complex_conj(x))
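In the 2-channel representation, conjugation keeps the real channel and negates the imaginary channel. A minimal sketch with a hypothetical helper `two_channel_conj`, checked against Python's built-in complex type:

```python
from typing import Sequence, Tuple


def two_channel_conj(x: Sequence[float]) -> Tuple[float, float]:
    """Conjugate of a complex number stored as (real, imag): negate the imaginary part."""
    real, imag = x
    return (real, -imag)


conjugates = [two_channel_conj(p) for p in [(1, 2), (3, 4)]]
print(conjugates)  # [(1, -2), (3, -4)]
```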