Applications#
Datasets#
- class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#
The Dataset to automatically download MedNIST data and generate items for training, validation or test. It’s based on CacheDataset to accelerate the training process.
- Parameters:
root_dir – target directory to download and load MedNIST dataset.
section – expected data section, can be: training, validation or test.
transform – transforms to execute operations on input data.
download – whether to download and extract the MedNIST dataset from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.
seed – random seed to randomly split training, validation and test datasets, default is 0.
val_frac – percentage of validation fraction in the whole dataset, default is 0.1.
test_frac – percentage of test fraction in the whole dataset, default is 0.1.
cache_num – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress – whether to display a progress bar when downloading dataset and computing the transform cache content.
copy_cache – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, you may set copy_cache=False for better performance.
as_contiguous – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent operations.
runtime_cache – whether to compute cache at the runtime, default to False to prepare the cache content at initialization. See: monai.data.CacheDataset.
- Raises:
ValueError – When root_dir is not a directory.
RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises:
NotImplementedError – When the subclass does not override this method.
- Return type:
None
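Example (a minimal usage sketch mirroring the DecathlonDataset example below; the transform choices are illustrative):
from monai.apps import MedNISTDataset
from monai.transforms import Compose, EnsureChannelFirstd, LoadImaged, ScaleIntensityd

transform = Compose(
    [
        LoadImaged(keys="image"),
        EnsureChannelFirstd(keys="image"),
        ScaleIntensityd(keys="image"),
    ]
)
train_data = MedNISTDataset(
    root_dir="./", section="training", transform=transform, download=True, seed=0
)
print(train_data[0]["image"].shape, train_data[0]["label"])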
- class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#
The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation or test. It also loads the dataset properties from the JSON config file; users can call get_properties() to get specific properties or all the loaded properties. It’s based on monai.data.CacheDataset to accelerate the training process.
- Parameters:
root_dir – user’s local directory for caching and loading the MSD datasets.
task – which task to download and execute: one of list (“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”).
section – expected data section, can be: training, validation or test.
transform – transforms to execute operations on input data. for further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D].
download – whether to download and extract the Decathlon dataset from the resource link, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the tar file or the dataset folder to the root directory.
val_frac – percentage of validation fraction in the whole dataset, default is 0.2.
seed – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: set the same seed for the training and validation sections.
cache_num – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate – percentage of cached data in total, default is 1.0 (cache all). will take the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress – whether to display a progress bar when downloading dataset and computing the transform cache content.
copy_cache – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, you may set copy_cache=False for better performance.
as_contiguous – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent operations.
runtime_cache – whether to compute cache at the runtime, default to False to prepare the cache content at initialization. See: monai.data.CacheDataset.
- Raises:
ValueError – When root_dir is not a directory.
ValueError – When task is not one of [“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”].
RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).
Example:
transform = Compose(
    [
        LoadImaged(keys=["image", "label"]),
        EnsureChannelFirstd(keys=["image", "label"]),
        ScaleIntensityd(keys="image"),
        ToTensord(keys=["image", "label"]),
    ]
)
val_data = DecathlonDataset(
    root_dir="./", task="Task09_Spleen", transform=transform, section="validation", seed=12345, download=True
)
print(val_data[0]["image"], val_data[0]["label"])
- get_properties(keys=None)[source]#
Get the loaded properties of dataset with specified keys. If no keys specified, return all the loaded properties.
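For example, continuing from the val_data object constructed in the example above (the property keys shown here, such as “name” and “numTraining”, are typical entries of the MSD dataset.json and are given as an illustration):
properties = val_data.get_properties(keys=["name", "description", "numTraining"])
all_properties = val_data.get_properties()  # every property loaded from dataset.json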
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises:
NotImplementedError – When the subclass does not override this method.
- Return type:
None
- class monai.apps.TciaDataset(root_dir, collection, section, transform=(), download=False, download_len=-1, seg_type='SEG', modality_tag=(8, 96), ref_series_uid_tag=(32, 14), ref_sop_uid_tag=(8, 4437), specific_tags=((8, 4373), (8, 4416), (12294, 16), (32, 13), (16, 16), (16, 32), (32, 17), (32, 18)), fname_regex='^(?!.*LICENSE).*', seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=0.0, num_workers=1, progress=True, copy_cache=True, as_contiguous=True, runtime_cache=False)[source]#
The Dataset to automatically download the data from a public The Cancer Imaging Archive (TCIA) dataset and generate items for training, validation or test.
The Highdicom library is used to load DICOM data with modality “SEG”, but only some collections are supported, such as: “C4KC-KiTS”, “NSCLC-Radiomics”, “NSCLC-Radiomics-Interobserver1”, “QIN-PROSTATE-Repeatability” and “PROSTATEx”. Therefore, if “seg” is included in the keys of the LoadImaged transform and some other collection is loaded, errors may be raised. For supported collections, the original “SEG” information may not always be consistent for each DICOM file. Therefore, to avoid creating labels in different formats, please use the label_dict argument of PydicomReader when calling the LoadImaged transform. The prepared label dicts of the collections mentioned above are also saved in: monai.apps.tcia.TCIA_LABEL_DICT. You can also refer to the second example below.
This class is based on monai.data.CacheDataset to accelerate the training process.
- Parameters:
root_dir – user’s local directory for caching and loading the TCIA dataset.
collection – name of a TCIA collection. a TCIA dataset is defined as a collection. Please check the following list to browse the collection list (only public collections can be downloaded): https://www.cancerimagingarchive.net/collections/
section – expected data section, can be: training, validation or test.
transform – transforms to execute operations on input data. for further usage, use EnsureChannelFirstd to convert the shape to [C, H, W, D]. If not specified, LoadImaged(reader=”PydicomReader”, keys=[“image”]) will be used as the default transform. In addition, we suggest setting the labels argument for PydicomReader if segmentations need to be loaded. The original labels for each DICOM series may be different; using this argument can unify the format of labels.
download – whether to download and extract the dataset, default is False. If the expected file already exists, downloading is skipped even if this is set to True. Users can manually copy the tar file or the dataset folder to the root directory.
download_len – number of series that will be downloaded, the value should be larger than 0 or -1, where -1 means all series will be downloaded. Default is -1.
seg_type – modality type of segmentation that is used to do the first step download. Default is “SEG”.
modality_tag – tag of modality. Default is (0x0008, 0x0060).
ref_series_uid_tag – tag of referenced Series Instance UID. Default is (0x0020, 0x000e).
ref_sop_uid_tag – tag of referenced SOP Instance UID. Default is (0x0008, 0x1155).
specific_tags – tags that will be loaded for “SEG” series. This argument will be used in monai.data.PydicomReader. Default is [(0x0008, 0x1115), (0x0008,0x1140), (0x3006, 0x0010), (0x0020,0x000D), (0x0010,0x0010), (0x0010,0x0020), (0x0020,0x0011), (0x0020,0x0012)].
fname_regex – a regular expression to match the file names when the input is a folder. If provided, only the matched files will be included. For example, to include the file name “image_0001.dcm”, the regular expression could be “.*image_(\d+).dcm”. Default to “^(?!.*LICENSE).*”, ignoring any file name containing “LICENSE”.
val_frac – percentage of validation fraction in the whole dataset, default is 0.2.
seed – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: set the same seed for the training and validation sections.
cache_num – number of items to be cached. Default is sys.maxsize. will take the minimum of (cache_num, data_length x cache_rate, data_length).
cache_rate – percentage of cached data in total, default is 0.0 (no cache). will take the minimum of (cache_num, data_length x cache_rate, data_length).
num_workers – the number of worker threads if computing cache in the initialization. If num_workers is None then the number returned by os.cpu_count() is used. If a value less than 1 is specified, 1 will be used instead.
progress – whether to display a progress bar when downloading dataset and computing the transform cache content.
copy_cache – whether to deepcopy the cache content before applying the random transforms, default to True. If the random transforms don’t modify the cached content (for example, randomly crop from the cached image and deepcopy the crop region) or if every cache item is only used once in a multi-processing environment, you may set copy_cache=False for better performance.
as_contiguous – whether to convert the cached NumPy array or PyTorch tensor to be contiguous. It may help improve the performance of subsequent operations.
runtime_cache – whether to compute cache at the runtime, default to False to prepare the cache content at initialization. See: monai.data.CacheDataset.
Example:
# collection is "Pancreatic-CT-CBCT-SEG", seg_type is "RTSTRUCT" data = TciaDataset( root_dir="./", collection="Pancreatic-CT-CBCT-SEG", seg_type="RTSTRUCT", download=True ) # collection is "C4KC-KiTS", seg_type is "SEG", and load both images and segmentations from monai.apps.tcia import TCIA_LABEL_DICT transform = Compose( [ LoadImaged(reader="PydicomReader", keys=["image", "seg"], label_dict=TCIA_LABEL_DICT["C4KC-KiTS"]), EnsureChannelFirstd(keys=["image", "seg"]), ResampleToMatchd(keys="image", key_dst="seg"), ] ) data = TciaDataset( root_dir="./", collection="C4KC-KiTS", section="validation", seed=12345, download=True ) print(data[0]["seg"].shape)
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises:
NotImplementedError – When the subclass does not override this method.
- Return type:
None
- class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)[source]#
Cross validation dataset based on the general dataset which must have _split_datalist API.
- Parameters:
dataset_cls (object) – dataset class to be used to create the cross validation partitions. It must have the _split_datalist API.
nfolds (int) – number of folds to split the data for cross validation.
seed (int) – random seed to randomly shuffle the datalist before splitting into N folds, default is 0.
dataset_params (Any) – other additional parameters for the dataset_cls base class.
Example of 5 folds cross validation training:
cvdataset = CrossValidation(
    dataset_cls=DecathlonDataset,
    nfolds=5,
    seed=12345,
    root_dir="./",
    task="Task09_Spleen",
    section="training",
    transform=train_transform,
    download=True,
)
dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
dataset_fold0_val = cvdataset.get_dataset(folds=0, transform=val_transform, download=False)
# execute training for fold 0 ...

dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
dataset_fold1_val = cvdataset.get_dataset(folds=1, transform=val_transform, download=False)
# execute training for fold 1 ...

...

dataset_fold4_train = ...
# execute training for fold 4 ...
- get_dataset(folds, **dataset_params)[source]#
Generate dataset based on the specified fold indices in the cross validation group.
- Parameters:
folds – index of folds for training or validation, if a list of values, concatenate the data.
dataset_params – other additional parameters for the dataset_cls base class, will override the same parameters in self.dataset_params.
Clara MMARs#
- monai.apps.download_mmar(item, mmar_dir=None, progress=True, api=True, version=-1)[source]#
Download and extract Medical Model Archive (MMAR) from Nvidia Clara Train.
See also
Nvidia NGC Registry CLI
- Parameters:
item – the corresponding model item from MODEL_DESC. Or when api is True, the substring to query NGC’s model name field.
mmar_dir – target directory to store the MMAR, default is mmars subfolder under torch.hub get_dir().
progress – whether to display a progress bar.
api – whether to query NGC and download via the API.
version – which version of the MMAR to download. -1 means the latest from NGC.
- Examples:
>>> from monai.apps import download_mmar
>>> download_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".")
>>> download_mmar("prostate_mri_segmentation", mmar_dir=".", api=True)
- Returns:
The local directory of the downloaded model. If api is True, a list of local directories of downloaded models.
- monai.apps.load_from_mmar(item, mmar_dir=None, progress=True, version=-1, map_location=None, pretrained=True, weights_only=False, model_key='model', api=True, model_file=None)[source]#
Download and extract Medical Model Archive (MMAR) model weights from Nvidia Clara Train.
- Parameters:
item – the corresponding model item from MODEL_DESC.
mmar_dir – target directory to store the MMAR, default is mmars subfolder under torch.hub get_dir().
progress – whether to display a progress bar when downloading the content.
version – version number of the MMAR. Set it to -1 to use item[Keys.VERSION].
map_location – pytorch API parameter for torch.load or torch.jit.load.
pretrained – whether to load the pretrained weights after initializing a network module.
weights_only – whether to load only the weights instead of initializing the network module and assigning the weights.
model_key – a key to search in the model file or config file for the model dictionary. Currently this function assumes that the model dictionary has {“[name|path]”: “test.module”, “args”: {‘kw’: ‘test’}}.
api – whether to query the NGC API to get model information.
model_file – the relative path to the model file within an MMAR.
- Examples:
>>> from monai.apps import load_from_mmar
>>> unet_model = load_from_mmar("clara_pt_prostate_mri_segmentation_1", mmar_dir=".", map_location="cpu")
>>> print(unet_model)
See also
- monai.apps.MODEL_DESC#
A tuple of dictionaries describing the built-in MMAR models; an entry from this sequence (or its name) can be passed as the item argument of download_mmar and load_from_mmar.
Utilities#
- monai.apps.check_hash(filepath, val=None, hash_type='md5')[source]#
Verify hash signature of specified file.
- Parameters:
filepath – path of source file to verify hash value.
val – expected hash value of the file.
hash_type – type of hash algorithm to use, default is “md5”. The supported hash types are “md5”, “sha1”, “sha256”, “sha512”. See also: monai.apps.utils.SUPPORTED_HASH_TYPES.
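Example (a small sketch; the file path and expected hash string are placeholders):
from monai.apps import check_hash

# returns True when the computed digest matches `val`; when val is None the check is skipped
if not check_hash("./MedNIST.tar.gz", val="<expected-md5-hex-digest>", hash_type="md5"):
    raise RuntimeError("hash check failed")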
- monai.apps.download_url(url, filepath='', hash_val=None, hash_type='md5', progress=True, **gdown_kwargs)[source]#
Download a file from the specified URL link; supports a progress bar and hash check.
- Parameters:
url – source URL link to download file.
filepath – target filepath to save the downloaded file (including the filename). If undefined, os.path.basename(url) will be used.
hash_val – expected hash value to validate the downloaded file. if None, skip hash validation.
hash_type – ‘md5’ or ‘sha1’, defaults to ‘md5’.
progress – whether to display a progress bar.
gdown_kwargs – other args for gdown except for the url, output and quiet. These args will only be used when downloading from Google Drive. For details of these args, see: wkentaro/gdown
- Raises:
RuntimeError – When the hash validation of the existing file at filepath fails.
RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.
URLError – See urllib.request.urlretrieve.
HTTPError – See urllib.request.urlretrieve.
ContentTooShortError – See urllib.request.urlretrieve.
IOError – See urllib.request.urlretrieve.
RuntimeError – When the hash validation of the file downloaded from url fails.
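Example (a short sketch; the URL is a placeholder and hash_val=None skips validation):
from monai.apps import download_url

download_url(
    url="https://example.com/dataset.tar.gz",  # placeholder URL
    filepath="./dataset.tar.gz",
    hash_val=None,
    hash_type="md5",
    progress=True,
)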
- monai.apps.extractall(filepath, output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True)[source]#
Extract file to the output directory. Expected file types are: zip, tar.gz and tar.
- Parameters:
filepath – the file path of compressed file.
output_dir – target directory to save extracted files.
hash_val – expected hash value to validate the compressed file. if None, skip hash validation.
hash_type – ‘md5’ or ‘sha1’, defaults to ‘md5’.
file_type – string of file type for decompressing. Leave it empty to infer the type from the filepath basename.
has_base – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.
- Raises:
RuntimeError – When the hash validation of the compressed file at filepath fails.
NotImplementedError – When the filepath file extension is not one of [“zip”, “tar.gz”, “tar”].
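Example (a minimal sketch for extracting a previously downloaded archive; the path is a placeholder):
from monai.apps import extractall

# file_type can be left empty to infer the type from the file name
extractall(filepath="./MedNIST.tar.gz", output_dir="./", file_type="tar.gz")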
- monai.apps.download_and_extract(url, filepath='', output_dir='.', hash_val=None, hash_type='md5', file_type='', has_base=True, progress=True)[source]#
Download file from URL and extract it to the output directory.
- Parameters:
url – source URL link to download file.
filepath – the file path of the downloaded compressed file. use this option to keep the directly downloaded compressed file, to avoid further repeated downloads.
output_dir – target directory to save extracted files. default is the current directory.
hash_val – expected hash value to validate the downloaded file. if None, skip hash validation.
hash_type – ‘md5’ or ‘sha1’, defaults to ‘md5’.
file_type – string of file type for decompressing. Leave it empty to infer the type from url’s base file name.
has_base – whether the extracted files have a base folder. This flag is used when checking if the existing folder is a result of extractall, if it is, the extraction is skipped. For example, if A.zip is unzipped to folder structure A/*.png, this flag should be True; if B.zip is unzipped to *.png, this flag should be False.
progress – whether to display progress bar.
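Example (a typical usage sketch; the URL is a placeholder and hash_val=None skips the hash check):
from monai.apps import download_and_extract

resource = "https://example.com/MedNIST.tar.gz"  # placeholder URL
download_and_extract(
    url=resource,
    filepath="./MedNIST.tar.gz",  # keep the compressed file to avoid repeated downloads
    output_dir="./",
    hash_val=None,
)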
Deepgrow#
- monai.apps.deepgrow.dataset.create_dataset(datalist, output_dir, dimension, pixdim, image_key='image', label_key='label', base_dir=None, limit=0, relative_path=False, transforms=None)[source]#
Utility to pre-process and create a dataset list for Deepgrow training from an existing one. The input data list is normally a list of images and labels (3D volumes) that need pre-processing for the Deepgrow training pipeline.
- Parameters:
datalist –
A list of data dictionary. Each entry should at least contain ‘image_key’: <image filename>. For example, typical input data can be a list of dictionaries:
[{'image': <image filename>, 'label': <label filename>}]
output_dir – target directory to store the training data for Deepgrow Training
pixdim – output voxel spacing.
dimension – dimension for Deepgrow training. It can be 2 or 3.
image_key – image key in input datalist. Defaults to ‘image’.
label_key – label key in input datalist. Defaults to ‘label’.
base_dir – base directory in case relative paths are used for the keys in datalist. Defaults to None.
limit – limit number of inputs for pre-processing. Defaults to 0 (no limit).
relative_path – whether the output key values should be based on relative paths. Defaults to False.
transforms – explicit transforms to execute operations on input data.
- Raises:
ValueError – When dimension is not one of [2, 3].
ValueError – When datalist is empty.
- Returns:
A new datalist that contains path to the images/labels after pre-processing.
Example:
datalist = create_dataset(
    datalist=[{'image': 'img1.nii', 'label': 'label1.nii'}],
    base_dir=None,
    output_dir=output_2d,
    dimension=2,
    image_key='image',
    label_key='label',
    pixdim=(1.0, 1.0),
    limit=0,
    relative_path=True
)
print(datalist[0]["image"], datalist[0]["label"])
- class monai.apps.deepgrow.interaction.Interaction(transforms, max_interactions, train, key_probability='probability')[source]#
Ignite process_function used to introduce interactions (simulation of clicks) for Deepgrow Training/Evaluation. For more details please refer to: https://pytorch.org/ignite/generated/ignite.engine.engine.Engine.html. This implementation is based on:
Sakinis et al., Interactive segmentation of medical images through fully convolutional neural networks. (2019) https://arxiv.org/abs/1903.08205
- Parameters:
transforms – execute additional transformation during every iteration (before train). Typically, several Tensor based transforms composed by Compose.
max_interactions – maximum number of interactions per iteration
train – training or evaluation
key_probability – field name to fill probability for every interaction
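Example (a hedged sketch of wiring the click-simulation transforms into Interaction for training; the exact transform composition is an assumption based on the Deepgrow transforms documented below, and the resulting object is typically passed as the iteration_update argument of a MONAI engine such as SupervisedTrainer):
from monai.apps.deepgrow.interaction import Interaction
from monai.apps.deepgrow.transforms import (
    AddGuidanceSignald,
    AddRandomGuidanced,
    FindDiscrepancyRegionsd,
)
from monai.transforms import Compose, ToTensord

# transforms executed before each simulated click during training
click_transforms = Compose(
    [
        FindDiscrepancyRegionsd(label="label", pred="pred", discrepancy="discrepancy"),
        AddRandomGuidanced(guidance="guidance", discrepancy="discrepancy", probability="probability"),
        AddGuidanceSignald(image="image", guidance="guidance"),
        ToTensord(keys=("image", "label")),
    ]
)
interaction = Interaction(transforms=click_transforms, max_interactions=5, train=True)
# e.g. SupervisedTrainer(..., iteration_update=interaction, ...)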
- class monai.apps.deepgrow.transforms.AddInitialSeedPointd(label='label', guidance='guidance', sids='sids', sid='sid', connected_regions=5)[source]#
Add random guidance as initial seed point for a given label.
Note that the label is of size (C, D, H, W) or (C, H, W)
The guidance is of size (2, N, # of dims) where N is number of guidance added. # of dims = 4 when C, D, H, W; # of dims = 3 when (C, H, W)
- Parameters:
label (str) – label source.
guidance (str) – key to store guidance.
sids (str) – key that represents the list of valid slice indices for the given label.
sid (str) – key that represents the slice to add the initial seed point. If not present, a random sid will be chosen.
connected_regions (int) – maximum connected regions to use for adding initial points.
- randomize(data)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises:
NotImplementedError – When the subclass does not override this method.
- class monai.apps.deepgrow.transforms.AddGuidanceSignald(image='image', guidance='guidance', sigma=2, number_intensity_ch=1)[source]#
Add Guidance signal for input image.
Based on the “guidance” points, apply a Gaussian to them and add the result as a new channel of the input image.
- Parameters:
image (str) – key to the image source.
guidance (str) – key to store guidance.
sigma (int) – standard deviation for the Gaussian kernel.
number_intensity_ch (int) – channel index.
- class monai.apps.deepgrow.transforms.AddRandomGuidanced(guidance='guidance', discrepancy='discrepancy', probability='probability')[source]#
Add random guidance based on discrepancies that were found between label and prediction. The input shapes are as follows: Guidance is of shape (2, N, # of dims); Discrepancy is of shape (2, C, D, H, W) or (2, C, H, W); Probability is of shape (1).
- Parameters:
guidance (str) – key to guidance source.
discrepancy (str) – key that represents discrepancies found between label and prediction.
probability (str) – key that represents click/interaction probability.
- randomize(data=None)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises:
NotImplementedError – When the subclass does not override this method.
- class monai.apps.deepgrow.transforms.AddGuidanceFromPointsd(ref_image, guidance='guidance', foreground='foreground', background='background', axis=0, depth_first=True, spatial_dims=2, slice_key='slice', meta_keys=None, meta_key_postfix='meta_dict')[source]#
Add guidance based on user clicks.
We assume the input is loaded by LoadImaged and has the shape of (H, W, D) originally. Clicks always specify the coordinates in (H, W, D)
If depth_first is True:
Input is now of shape (D, H, W), will return guidance that specifies the coordinates in (D, H, W)
else:
Input is now of shape (H, W, D), will return guidance that specifies the coordinates in (H, W, D)
- Parameters:
ref_image – key to reference image to fetch current and original image details.
guidance – output key to store guidance.
foreground – key that represents user foreground (+ve) clicks.
background – key that represents user background (-ve) clicks.
axis – axis that represents slices in 3D volume. (axis to Depth)
depth_first – if depth (slices) is positioned at first dimension.
spatial_dims – dimensions based on model used for deepgrow (2D vs 3D).
slice_key – key that represents applicable slice to add guidance.
meta_keys – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.
meta_key_postfix – if meta_key is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
- class monai.apps.deepgrow.transforms.SpatialCropForegroundd(keys, source_key, spatial_size, select_fn=<function is_positive>, channel_indices=None, margin=0, allow_smaller=True, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#
Crop only the foreground object of the expected images.
Difference vs. monai.transforms.CropForegroundd: If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.
This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]
The typical usage is to help training and evaluation if the valid part is small in the whole medical image. The valid part can be determined by any field in the data with source_key, for example:
Select values > 0 in image field as the foreground and crop on all fields specified by keys.
Select label = 3 in label field as the foreground to crop on all fields specified by keys.
Select label > 0 in the third channel of a One-Hot label field as the foreground to crop all keys fields.
Users can define an arbitrary function to select the expected foreground from the whole source image or specified channels. It can also add a margin to every dimension of the bounding box of the foreground object.
- Parameters:
keys – keys of the corresponding items to be transformed. See also: monai.transforms.MapTransform
source_key – data source to generate the bounding box of foreground, can be image or label, etc.
spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.
select_fn – function to select expected foreground, default is to select values > 0.
channel_indices – if defined, select foreground only on the specified channels of image. if None, select foreground on the whole image.
margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
allow_smaller – when computing box size with margin, whether to allow the image size to be smaller than the box size, default to True. If the margined size is bigger than the image size, it will pad with the specified mode.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use {key}_{meta_key_postfix} to fetch/store the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key – key to record the start coordinate of spatial bounding box for foreground.
end_coord_key – key to record the end coordinate of spatial bounding box for foreground.
original_shape_key – key to record original shape for foreground.
cropped_shape_key – key to record cropped shape for foreground.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.deepgrow.transforms.SpatialCropGuidanced(keys, guidance, spatial_size, margin=20, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#
Crop image based on guidance with minimal spatial size.
If the bounding box is smaller than spatial size in all dimensions then this transform will crop the object using box’s center and spatial_size.
This transform will set “start_coord_key”, “end_coord_key”, “original_shape_key” and “cropped_shape_key” in data[{key}_{meta_key_postfix}]
Input data is of shape (C, spatial_1, [spatial_2, …])
- Parameters:
keys – keys of the corresponding items to be transformed.
guidance – key to the guidance. It is used to generate the bounding box of foreground
spatial_size – minimal spatial size of the image patch e.g. [128, 128, 128] to fit in.
margin – add margin value to spatial dims of the bounding box, if only 1 value provided, use it for all dims.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key – key to record the start coordinate of spatial bounding box for foreground.
end_coord_key – key to record the end coordinate of spatial bounding box for foreground.
original_shape_key – key to record original shape for foreground.
cropped_shape_key – key to record cropped shape for foreground.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.deepgrow.transforms.RestoreLabeld(keys, ref_image, slice_only=False, mode=nearest, align_corners=None, meta_keys=None, meta_key_postfix='meta_dict', start_coord_key='foreground_start_coord', end_coord_key='foreground_end_coord', original_shape_key='foreground_original_shape', cropped_shape_key='foreground_cropped_shape', allow_missing_keys=False)[source]#
Restores label based on the ref image.
The ref_image is assumed to have gone through the following transforms:
Fetch2DSliced (If 2D)
Spacingd
SpatialCropGuidanced
Resized
And its shape is assumed to be (C, D, H, W)
This transform tries to undo these operations so that the resulting label can be overlapped with the original volume. It does the following operations:
Undo Resized
Undo SpatialCropGuidanced
Undo Spacingd
Undo Fetch2DSliced
The resulting label is of shape (D, H, W)
- Parameters:
keys – keys of the corresponding items to be transformed.
ref_image – reference image to fetch current and original image details
slice_only – apply only to an applicable slice, in case of 2D model/prediction
mode – {"constant", "edge", "linear_ramp", "maximum", "mean", "median", "minimum", "reflect", "symmetric", "wrap", "empty"} One of the listed string values or a user supplied function for padding. Defaults to "constant". See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html
align_corners – Geometrically, we consider the pixels of the input as squares rather than points. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.grid_sample.html It can also be a sequence of bool, where each element corresponds to a key in keys.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
start_coord_key – key that records the start coordinate of spatial bounding box for foreground.
end_coord_key – key that records the end coordinate of spatial bounding box for foreground.
original_shape_key – key that records original shape for foreground.
cropped_shape_key – key that records cropped shape for foreground.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.deepgrow.transforms.ResizeGuidanced(guidance, ref_image, meta_keys=None, meta_key_postfix='meta_dict', cropped_shape_key='foreground_cropped_shape')[source]#
Resize the guidance based on cropped vs resized image.
This transform assumes that the images have been cropped and resized, and that the shape after cropping is stored inside the meta dict of the reference image.
- Parameters:
guidance – key to guidance
ref_image – key to reference image to fetch current and original image details
meta_keys – explicitly indicate the key of the metadata dictionary of ref_image. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. if None, will try to construct meta_keys by {ref_image}_{meta_key_postfix}.
meta_key_postfix – if meta_key is None, use {ref_image}_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
cropped_shape_key – key that records cropped shape for foreground.
- class monai.apps.deepgrow.transforms.FindDiscrepancyRegionsd(label='label', pred='pred', discrepancy='discrepancy')[source]#
Find discrepancies between the prediction and the ground truth during click interactions in training.
- Parameters:
label (str) – key to label source.
pred (str) – key to prediction source.
discrepancy (str) – key to store discrepancies found between label and prediction.
- class monai.apps.deepgrow.transforms.FindAllValidSlicesd(label='label', sids='sids')[source]#
Find/List all valid slices in the label. Label is assumed to be a 4D Volume with shape CDHW, where C=1.
- Parameters:
label (str) – key to the label source.
sids (str) – key to store slice indices that have a valid label map.
- class monai.apps.deepgrow.transforms.Fetch2DSliced(keys, guidance='guidance', axis=0, meta_keys=None, meta_key_postfix='meta_dict', allow_missing_keys=False)[source]#
Fetch one slice in case of a 3D volume.
The volume only contains spatial coordinates.
- Parameters:
keys – keys of the corresponding items to be transformed.
guidance – key that represents guidance.
axis – axis that represents slice in 3D volume.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – use key_{meta_key_postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
allow_missing_keys – don’t raise exception if key is missing.
Pathology#
- class monai.apps.pathology.inferers.SlidingWindowHoVerNetInferer(roi_size, sw_batch_size=1, overlap=0.25, mode=constant, sigma_scale=0.125, padding_mode=constant, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False, cpu_thresh=None, extra_input_padding=None)[source]#
Sliding window method for HoVerNet model inference, with sw_batch_size windows for every model.forward(). A usage example can be found in the monai.inferers.Inferer base class.
- Parameters:
roi_size – the window size to execute SlidingWindow evaluation. If it has non-positive components, the corresponding inputs size will be used. if the components of the roi_size are non-positive values, the transform will use the corresponding components of img size. For example, roi_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.
sw_batch_size – the batch size to run window slices.
overlap – Amount of overlap between scans.
mode – {"constant", "gaussian"} How to blend output of overlapping windows. Defaults to "constant". "constant": gives equal weight to all predictions. "gaussian": gives less weight to predictions on edges of windows.
sigma_scale – the standard deviation coefficient of the Gaussian window when mode is "gaussian". Default: 0.125. Actual window sigma is sigma_scale * dim_size. When sigma_scale is a sequence of floats, the values denote sigma_scale at the corresponding spatial dimensions.
padding_mode – {"constant", "reflect", "replicate", "circular"} Padding mode when roi_size is larger than inputs. Defaults to "constant". See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html
cval – fill value for 'constant' padding mode. Default: 0.
sw_device – device for the window data. By default the device (and accordingly the memory) of the inputs is used. Normally sw_device should be consistent with the device where predictor is defined.
device – device for the stitched output prediction. By default the device (and accordingly the memory) of the inputs is used. If for example set to device=torch.device(‘cpu’) the gpu memory consumption is less and independent of the inputs and roi_size. Output is on the device.
progress – whether to print a tqdm progress bar.
cache_roi_weight_map – whether to pre-compute the ROI weight map.
cpu_thresh – when provided, dynamically switch to stitching on cpu (to save gpu memory) when the input image volume is larger than this threshold (in pixels/voxels). Otherwise use "device". Thus, the output may end up on either cpu or gpu.
extra_input_padding – the amount of padding for the input image, which is a tuple of an even number of pads. Refer to the pad argument of torch.nn.functional.pad for more details.
Note
sw_batch_size denotes the max number of windows per network inference iteration, not the batch size of inputs.
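Example (a minimal construction sketch; network is assumed to be a trained HoVerNet-style model whose forward returns a dictionary of branch outputs):
import torch
from monai.apps.pathology.inferers import SlidingWindowHoVerNetInferer

inferer = SlidingWindowHoVerNetInferer(roi_size=(256, 256), sw_batch_size=4, overlap=0.25)
image = torch.rand(1, 3, 1024, 1024)  # placeholder input batch
# outputs = inferer(inputs=image, network=network)  # network: a trained HoVerNet model (assumption)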
- class monai.apps.pathology.losses.hovernet_loss.HoVerNetLoss(lambda_hv_mse=2.0, lambda_hv_mse_grad=1.0, lambda_np_ce=1.0, lambda_np_dice=1.0, lambda_nc_ce=1.0, lambda_nc_dice=1.0)[source]#
Loss function for the HoVerNet pipeline, which is a combination of losses across the three branches. The NP (nucleus prediction) branch uses Dice + CrossEntropy. The HV (horizontal and vertical distance from centroid) branch uses MSE + MSE of the gradient. The NC (nuclear class prediction) branch uses Dice + CrossEntropy. The result is a weighted sum of these losses.
- Parameters:
lambda_hv_mse (float) – Weight factor to apply to the HV regression MSE part of the overall loss.
lambda_hv_mse_grad (float) – Weight factor to apply to the MSE of the HV gradient part of the overall loss.
lambda_np_ce (float) – Weight factor to apply to the nuclei prediction CrossEntropyLoss part of the overall loss.
lambda_np_dice (float) – Weight factor to apply to the nuclei prediction DiceLoss part of the overall loss.
lambda_nc_ce (float) – Weight factor to apply to the nuclei class prediction CrossEntropyLoss part of the overall loss.
lambda_nc_dice (float) – Weight factor to apply to the nuclei class prediction DiceLoss part of the overall loss.
- forward(prediction, target)[source]#
- Parameters:
prediction (dict[str, Tensor]) – dictionary of predicted outputs for three branches, each of which should have the shape of BNHW.
target (dict[str, Tensor]) – dictionary of ground truths for three branches, each of which should have the shape of BNHW.
- Return type:
Tensor
- class monai.apps.pathology.metrics.LesionFROC(data, grow_distance=75, itc_diameter=200, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8), nms_sigma=0.0, nms_prob_threshold=0.5, nms_box_size=48, image_reader_name='cuCIM')[source]#
Evaluate with Free Response Operating Characteristic (FROC) score.
- Parameters:
data (list[dict]) – either the list of dictionaries containing probability maps (inference result) and tumor mask (ground truth), as below, or the path to a json file containing such a list. { “prob_map”: “path/to/prob_map_1.npy”, “tumor_mask”: “path/to/ground_truth_1.tiff”, “level”: 6, “pixel_spacing”: 0.243 }
grow_distance (int) – Euclidean distance (in micrometer) by which to grow the ground truth’s tumors. Defaults to 75, which is the equivalent size of 5 tumor cells.
itc_diameter (int) – the maximum diameter of a region (in micrometer) to be considered as an isolated tumor cell. Defaults to 200.
eval_thresholds (tuple) – the false positive rates for calculating the average sensitivity. Defaults to (0.25, 0.5, 1, 2, 4, 8), which is the same as the CAMELYON 16 Challenge.
nms_sigma (float) – the standard deviation for the Gaussian filter of non-maximal suppression. Defaults to 0.0.
nms_prob_threshold (float) – the probability threshold of non-maximal suppression. Defaults to 0.5.
nms_box_size (int) – the box size (in pixel) to be removed around the pixel for non-maximal suppression.
image_reader_name (str) – the name of the library to be used for loading whole slide imaging, either CuCIM or OpenSlide. Defaults to CuCIM.
Note
For more info on the nms_* parameters, see monai.utils.prob_nms.ProbNMS.
- compute_fp_tp()[source]#
Compute false positive and true positive probabilities for tumor detection, by comparing the model outputs with the prepared ground truths for all samples
- evaluate()[source]#
Evaluate the detection performance of a model based on the model probability map output, the ground truth tumor mask, and their associated metadata (e.g., pixel_spacing, level)
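Example (a usage sketch; the file paths are placeholders, and each entry pairs a probability map with its ground-truth tumor mask and slide metadata):
from monai.apps.pathology.metrics import LesionFROC

data = [
    {
        "prob_map": "path/to/prob_map_1.npy",
        "tumor_mask": "path/to/ground_truth_1.tiff",
        "level": 6,
        "pixel_spacing": 0.243,
    },
]
froc = LesionFROC(data=data, eval_thresholds=(0.25, 0.5, 1, 2, 4, 8), image_reader_name="cuCIM")
score = froc.evaluate()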
- monai.apps.pathology.utils.compute_multi_instance_mask(mask, threshold)[source]#
This method computes the segmentation mask according to the binary tumor mask.
- Parameters:
mask (ndarray) – the binary mask array.
threshold (float) – the threshold to fill holes.
- Return type:
Any
- monai.apps.pathology.utils.compute_isolated_tumor_cells(tumor_mask, threshold)[source]#
This method identifies Isolated Tumor Cells (ITC) and returns their labels.
- Parameters:
tumor_mask (ndarray) – the tumor mask.
threshold (float) – the threshold (at the mask level) to define an isolated tumor cell (ITC). A region with the longest diameter less than this threshold is considered as an ITC.
- Return type:
list[int]
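Example (a small sketch on a synthetic binary mask; the threshold values are illustrative, and real masks typically come from thresholded probability maps):
import numpy as np
from monai.apps.pathology.utils import compute_isolated_tumor_cells, compute_multi_instance_mask

mask = np.zeros((64, 64), dtype=bool)
mask[10:20, 10:20] = True  # one synthetic tumor region
multi_mask = compute_multi_instance_mask(mask, threshold=0.5)
itc_labels = compute_isolated_tumor_cells(multi_mask, threshold=15)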
- class monai.apps.pathology.utils.PathologyProbNMS(spatial_dims=2, sigma=0.0, prob_threshold=0.5, box_size=48)[source]#
This class extends monai.utils.ProbNMS and adds the resolution option for Pathology.
- class monai.apps.pathology.transforms.stain.array.ExtractHEStains(tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308))[source]#
Class to extract a target stain from an image, using stain deconvolution (see Note).
- Parameters:
tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
Note
For more information refer to: - the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf - the previous implementations:
Python: schaugf/HEnorm_python
- class monai.apps.pathology.transforms.stain.array.NormalizeHEStains(tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308))[source]#
Class to normalize patches/images to a reference or target image stain (see Note).
Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.
- Parameters:
tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15.
target_he – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to [1.9705, 1.0308].
Note
For more information refer to: - the original paper: Macenko et al., 2009 http://wwwx.cs.unc.edu/~mn/sites/default/files/macenko2009.pdf - the previous implementations:
Python: schaugf/HEnorm_python
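Example (a brief sketch on a synthetic RGB patch; real inputs are H&E-stained image patches, and the random image here only illustrates the call pattern):
import numpy as np
from monai.apps.pathology.transforms.stain.array import ExtractHEStains, NormalizeHEStains

image = np.random.randint(0, 255, size=(256, 256, 3), dtype=np.uint8)
he_matrix = ExtractHEStains(tli=240, alpha=1, beta=0.15)(image)  # estimated H&E stain matrix
normalized = NormalizeHEStains(tli=240)(image)                   # stain-normalized patch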
A collection of dictionary-based wrappers around the pathology transforms defined in monai.apps.pathology.transforms.array.
Class names end with ‘d’ to denote dictionary-based transforms.
- class monai.apps.pathology.transforms.stain.dictionary.ExtractHEStainsd(keys, tli=240, alpha=1, beta=0.15, max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.ExtractHEStains. Class to extract a target stain from an image, using stain deconvolution.
- Parameters:
keys – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform
tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.stain.dictionary.NormalizeHEStainsd(keys, tli=240, alpha=1, beta=0.15, target_he=((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)), max_cref=(1.9705, 1.0308), allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.NormalizeHEStains. Class to normalize patches/images to a reference or target image stain.
Performs stain deconvolution of the source image using the ExtractHEStains class, to obtain the stain matrix and calculate the stain concentration matrix for the image. Then, performs the inverse Beer-Lambert transform to recreate the patch using the target H&E stain matrix provided. If no target stain provided, a default reference stain is used. Similarly, if no maximum stain concentrations are provided, a reference maximum stain concentrations matrix is used.
- Parameters:
keys – keys of the corresponding items to be transformed. See also: monai.transforms.compose.MapTransform
tli – transmitted light intensity. Defaults to 240.
alpha – tolerance in percentile for the pseudo-min (alpha percentile) and pseudo-max (100 - alpha percentile). Defaults to 1.
beta – absorbance threshold for transparent pixels. Defaults to 0.15.
target_he – target stain matrix. Defaults to ((0.5626, 0.2159), (0.7201, 0.8012), (0.4062, 0.5581)).
max_cref – reference maximum stain concentrations for Hematoxylin & Eosin (H&E). Defaults to (1.9705, 1.0308).
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.array.GenerateSuccinctContour(height, width)[source]#
Converts SciPy-style contours (generated by skimage.measure.find_contours) to a more succinct version which only includes the pixels to which lines need to be drawn (i.e. not the intervening pixels along each line).
- Parameters:
height (int) – height of bounding box, used to detect direction of line segment.
width (int) – width of bounding box, used to detect direction of line segment.
- Returns:
- the pixels that need to be joined by straight lines to describe the outermost pixels of the foreground, similar to OpenCV’s cv.CHAIN_APPROX_SIMPLE (counterclockwise).
- class monai.apps.pathology.transforms.post.array.GenerateInstanceContour(min_num_points=3, contour_level=None)[source]#
Generate contour for each instance in a 2D array. Use GenerateSuccinctContour to only include the pixels to which lines need to be drawn
- Parameters:
min_num_points – the created contour is not considered a valid contour if it does not contain more points than the specified value. Defaults to 3.
contour_level – an optional value for skimage.measure.find_contours to find contours in the array. If not provided, the level is set to (max(image) + min(image)) / 2.
- class monai.apps.pathology.transforms.post.array.GenerateInstanceCentroid(dtype=<class 'int'>)[source]#
Generate instance centroid using skimage.measure.centroid.
- Parameters:
dtype – the data type of output centroid.
- class monai.apps.pathology.transforms.post.array.GenerateInstanceType[source]#
Generate instance type and probability for each instance.
- class monai.apps.pathology.transforms.post.array.Watershed(connectivity=1, dtype=<class 'numpy.int64'>)[source]#
Use skimage.segmentation.watershed to get instance segmentation results from images. See: https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.watershed.
- Parameters:
connectivity – an array with the same number of dimensions as image whose non-zero elements indicate neighbors for connection. Following the scipy convention, default is a one-connected array of the dimension of the image.
dtype – target data content type to convert, default is np.int64.
- class monai.apps.pathology.transforms.post.array.GenerateWatershedMask(activation='softmax', threshold=None, min_object_size=10, dtype=<class 'numpy.uint8'>)[source]#
Generate the mask used in watershed. Only points at which mask == True will be labeled.
- Parameters:
activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
threshold – an optional float value to threshold to binarize probability map. If not provided, defaults to 0.5 when activation is not “softmax”, otherwise None.
min_object_size – objects smaller than this size (in pixel) are removed. Defaults to 10.
dtype – target data content type to convert, default is np.uint8.
- class monai.apps.pathology.transforms.post.array.GenerateInstanceBorder(kernel_size=5, dtype=<class 'numpy.float32'>)[source]#
Generate instance border by hover map. The more parts of the image that cannot be identified as foreground areas, the larger the grey scale value. The grey value of the instance’s border will be larger.
- Parameters:
kernel_size (int) – the size of the Sobel kernel. Defaults to 5.
dtype (Union[dtype, type, str, None]) – target data type to convert to. Defaults to np.float32.
- Raises:
ValueError – when the mask shape is not [1, H, W].
ValueError – when the hover_map shape is not [2, H, W].
- class monai.apps.pathology.transforms.post.array.GenerateDistanceMap(smooth_fn=None, dtype=<class 'numpy.float32'>)[source]#
Generate distance map. In general, the instance map is calculated from the distance to the background. Here, we use 1 - “instance border map” to generate the distance map. Nuclei values form mountains so invert them to get basins.
- Parameters:
smooth_fn – smoothing function for distance map, which can be any callable object. If not provided, monai.transforms.GaussianSmooth() is used.
dtype – target data type to convert to. Defaults to np.float32.
- class monai.apps.pathology.transforms.post.array.GenerateWatershedMarkers(threshold=0.4, radius=2, min_object_size=10, postprocess_fn=None, dtype=<class 'numpy.int64'>)[source]#
Generate markers to be used in watershed. The watershed algorithm treats pixel values as a local topography (elevation). The algorithm floods basins from the markers until basins attributed to different markers meet on watershed lines. Generally, markers are chosen as local minima of the image, from which basins are flooded. This is the implementation from the HoVerNet paper. For more details refer to: https://arxiv.org/abs/1812.06499.
- Parameters:
threshold – a float value to threshold to binarize instance border map. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
radius – the radius of the disk-shaped footprint used in opening. Defaults to 2.
min_object_size – objects smaller than this size (in pixel) are removed. Defaults to 10.
postprocess_fn – additional post-process function on the markers. If not provided, monai.transforms.post.FillHoles() will be used.
dtype – target data type to convert to. Defaults to np.int64.
- class monai.apps.pathology.transforms.post.array.HoVerNetNuclearTypePostProcessing(activation='softmax', threshold=None, return_type_map=True, device=None)[source]#
The post-processing transform for HoVerNet model to generate nuclear type information. It updates the input instance info dictionary with information about types of the nuclei (value and probability). Also if requested (return_type_map=True), it generates a pixel-level type map.
- Parameters:
activation – the activation layer to be applied on nuclear type branch. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
threshold – an optional float value to threshold to binarize probability map. If not provided, defaults to 0.5 when activation is not “softmax”, otherwise None.
return_type_map – whether to calculate and return pixel-level type map.
device – target device to put the output Tensor data.
- class monai.apps.pathology.transforms.post.array.HoVerNetInstanceMapPostProcessing(activation='softmax', mask_threshold=None, min_object_size=10, sobel_kernel_size=5, distance_smooth_fn=None, marker_threshold=0.4, marker_radius=2, marker_postprocess_fn=None, watershed_connectivity=1, min_num_points=3, contour_level=None, device=None)[source]#
The post-processing transform for HoVerNet model to generate instance segmentation map. It generates an instance segmentation map as well as a dictionary containing centroids, bounding boxes, and contours for each instance.
- Parameters:
activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
mask_threshold – a float value to threshold to binarize probability map to generate mask.
min_object_size – objects smaller than this size (in pixel) are removed. Defaults to 10.
sobel_kernel_size – the size of the Sobel kernel used in GenerateInstanceBorder. Defaults to 5.
distance_smooth_fn – smoothing function for distance map. If not provided, monai.transforms.intensity.GaussianSmooth() will be used.
marker_threshold – a float value to threshold to binarize instance border map for markers. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
marker_radius – the radius of the disk-shaped footprint used in opening of markers. Defaults to 2.
marker_postprocess_fn – post-process function for watershed markers. If not provided, monai.transforms.post.FillHoles() will be used.
watershed_connectivity – connectivity argument of skimage.segmentation.watershed.
min_num_points – minimum number of points to be considered as a contour. Defaults to 3.
contour_level – an optional value for skimage.measure.find_contours to find contours in the array. If not provided, the level is set to (max(image) + min(image)) / 2.
device – target device to put the output Tensor data.
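A minimal sketch of calling this transform, assuming the array-based call takes the NP-branch probability map followed by the HV-branch hover map (tensor shapes and values are illustrative, not real model outputs):
import torch
from monai.apps.pathology.transforms.post.array import HoVerNetInstanceMapPostProcessing

nuclear_prediction = torch.softmax(torch.rand(2, 256, 256), dim=0)  # NP branch, two-class probability map
hover_map = torch.rand(2, 256, 256) * 2 - 1                         # HV branch, horizontal/vertical maps
post = HoVerNetInstanceMapPostProcessing(sobel_kernel_size=5, marker_threshold=0.4)
instance_info, instance_map = post(nuclear_prediction, hover_map)
# instance_info: per-instance centroids, bounding boxes and contours
# instance_map: labelled instance segmentation map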
- class monai.apps.pathology.transforms.post.dictionary.GenerateSuccinctContourd(keys, height, width, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateSuccinctContour. Converts SciPy-style contours (generated by skimage.measure.find_contours) to a more succinct version which only includes the pixels to which lines need to be drawn (i.e. not the intervening pixels along each line).
- Parameters:
keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.
height (int) – height of bounding box, used to detect direction of line segment.
width (int) – width of bounding box, used to detect direction of line segment.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceContourd(keys, contour_key_postfix='contour', offset_key=None, min_num_points=3, level=None, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateInstanceContour. Generate contour for each instance in a 2D array. Use GenerateSuccinctContour to only include the pixels to which lines need to be drawn.
- Parameters:
keys – keys of the corresponding items to be transformed.
contour_key_postfix – the output contour coordinates will be written to the value of {key}_{contour_key_postfix}.
offset_key – keys of offset used in GenerateInstanceContour.
min_num_points – minimum number of points a contour must contain to be considered valid; contours with fewer points are discarded. Defaults to 3.
level – optional. Value along which to find contours in the array. By default, the level is set to (max(image) + min(image)) / 2.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceCentroidd(keys, centroid_key_postfix='centroid', offset_key=None, dtype=<class 'int'>, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateInstanceCentroid. Generate instance centroid using skimage.measure.centroid.
- Parameters:
keys – keys of the corresponding items to be transformed.
centroid_key_postfix – the output centroid coordinates will be written to the value of {key}_{centroid_key_postfix}.
offset_key – keys of offset used in GenerateInstanceCentroid.
dtype – the data type of output centroid.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceTyped(keys, type_info_key='type_info', bbox_key='bbox', seg_pred_key='seg', instance_id_key='id', allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.post.array.GenerateInstanceType. Generate instance type and probability for each instance.
- Parameters:
keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.
type_info_key (str) – the output instance type and probability will be written to the value of {type_info_key}.
bbox_key (str) – keys of bounding box.
seg_pred_key (str) – keys of segmentation prediction map.
instance_id_key (str) – keys of instance id.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.dictionary.Watershedd(keys, mask_key='mask', markers_key=None, connectivity=1, dtype=<class 'numpy.uint8'>, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.array.Watershed. Use skimage.segmentation.watershed to get instance segmentation results from images. See: https://scikit-image.org/docs/stable/api/skimage.segmentation.html#skimage.segmentation.watershed.
- Parameters:
keys – keys of the corresponding items to be transformed. See also: monai.transforms.MapTransform
mask_key – keys of mask used in watershed. Only points at which mask == True will be labeled.
markers_key – keys of markers used in watershed. If None (no markers given), the local minima of the image are used as markers.
connectivity – An array with the same number of dimensions as image whose non-zero elements indicate neighbors for connection. Following the scipy convention, default is a one-connected array of the dimension of the image.
dtype – target data content type to convert. Defaults to np.uint8.
allow_missing_keys – don’t raise exception if key is missing.
- Raises:
ValueError – when the image shape is not [1, H, W].
ValueError – when the mask shape is not [1, H, W].
- class monai.apps.pathology.transforms.post.dictionary.GenerateWatershedMaskd(keys, mask_key='mask', activation='softmax', threshold=None, min_object_size=10, dtype=<class 'numpy.uint8'>, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateWatershedMask.
- Parameters:
keys – keys of the corresponding items to be transformed.
mask_key – the mask will be written to the value of {mask_key}.
activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
threshold – if not None, threshold the float values to int number 0 or 1 with specified threshold.
min_object_size – objects smaller than this size are removed. Defaults to 10.
dtype – target data content type to convert, default is np.uint8.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.dictionary.GenerateInstanceBorderd(mask_key='mask', hover_map_key='hover_map', border_key='border', kernel_size=21, dtype=<class 'numpy.float32'>)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateInstanceBorder.
- Parameters:
mask_key (str) – the input key where the watershed mask is stored. Defaults to “mask”.
hover_map_key (str) – the input key where hover map is stored. Defaults to “hover_map”.
border_key (str) – the output key where instance border map is written. Defaults to “border”.
kernel_size (int) – the size of the Sobel kernel. Defaults to 21.
dtype (Union[dtype, type, str, None]) – target data content type to convert, default is np.float32.
allow_missing_keys – don’t raise exception if key is missing.
- Raises:
ValueError – when the hover_map has only one value.
ValueError – when the sobel gradient map has only one value.
- class monai.apps.pathology.transforms.post.dictionary.GenerateDistanceMapd(mask_key='mask', border_key='border', dist_map_key='dist_map', smooth_fn=None, dtype=<class 'numpy.float32'>)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateDistanceMap.
- Parameters:
mask_key – the input key where the watershed mask is stored. Defaults to “mask”.
border_key – the input key where instance border map is stored. Defaults to “border”.
dist_map_key – the output key where distance map is written. Defaults to “dist_map”.
smooth_fn – smoothing function for distance map, which can be any callable object. If not provided, monai.transforms.GaussianSmooth() is used.
dtype – target data content type to convert, default is np.float32.
- class monai.apps.pathology.transforms.post.dictionary.GenerateWatershedMarkersd(mask_key='mask', border_key='border', markers_key='markers', threshold=0.4, radius=2, min_object_size=10, postprocess_fn=None, dtype=<class 'numpy.uint8'>)[source]#
Dictionary-based wrapper of monai.apps.pathology.transforms.array.GenerateWatershedMarkers.
- Parameters:
mask_key – the input key where the watershed mask is stored. Defaults to “mask”.
border_key – the input key where instance border map is stored. Defaults to “border”.
markers_key – the output key where markers is written. Defaults to “markers”.
threshold – threshold the float values of instance border map to int 0 or 1 with specified threshold. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
radius – the radius of the disk-shaped footprint used in opening. Defaults to 2.
min_object_size – objects smaller than this size are removed. Defaults to 10.
postprocess_fn – execute additional post transformation on marker. Defaults to None.
dtype – target data content type to convert, default is np.uint8.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.pathology.transforms.post.dictionary.HoVerNetInstanceMapPostProcessingd(nuclear_prediction_key='nucleus_prediction', hover_map_key='horizontal_vertical', instance_info_key='instance_info', instance_map_key='instance_map', activation='softmax', mask_threshold=None, min_object_size=10, sobel_kernel_size=5, distance_smooth_fn=None, marker_threshold=0.4, marker_radius=2, marker_postprocess_fn=None, watershed_connectivity=1, min_num_points=3, contour_level=None, device=None)[source]#
Dictionary-based wrapper for monai.apps.pathology.transforms.post.array.HoVerNetInstanceMapPostProcessing. The post-processing transform for HoVerNet model to generate instance segmentation map. It generates an instance segmentation map as well as a dictionary containing centroids, bounding boxes, and contours for each instance.
- Parameters:
nuclear_prediction_key – the key for HoVerNet NP (nuclear prediction) branch. Defaults to HoVerNetBranch.NP.
hover_map_key – the key for HoVerNet HV (horizontal-vertical hover map) branch. Defaults to HoVerNetBranch.HV.
instance_info_key – the output key where instance information (contour, bounding boxes, and centroids) is written. Defaults to “instance_info”.
instance_map_key – the output key where instance map is written. Defaults to “instance_map”.
activation – the activation layer to be applied on the input probability map. It can be “softmax” or “sigmoid” string, or any callable. Defaults to “softmax”.
mask_threshold – a float value to threshold to binarize probability map to generate mask.
min_object_size – objects smaller than this size are removed. Defaults to 10.
sobel_kernel_size – the size of the Sobel kernel used in GenerateInstanceBorder. Defaults to 5.
distance_smooth_fn – smoothing function for distance map. If not provided, monai.transforms.intensity.GaussianSmooth() will be used.
marker_threshold – a float value to threshold to binarize instance border map for markers. It turns uncertain area to 1 and other area to 0. Defaults to 0.4.
marker_radius – the radius of the disk-shaped footprint used in opening of markers. Defaults to 2.
marker_postprocess_fn – post-process function for watershed markers. If not provided, monai.transforms.post.FillHoles() will be used.
watershed_connectivity – connectivity argument of skimage.segmentation.watershed.
min_num_points – minimum number of points to be considered as a contour. Defaults to 3.
contour_level – an optional value for skimage.measure.find_contours to find contours in the array. If not provided, the level is set to (max(image) + min(image)) / 2.
device – target device to put the output Tensor data.
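A minimal dictionary-based sketch, assuming the default key names shown in the signature above (“nucleus_prediction” and “horizontal_vertical”) and illustrative tensor shapes:
import torch
from monai.apps.pathology.transforms.post.dictionary import HoVerNetInstanceMapPostProcessingd

sample = {
    "nucleus_prediction": torch.softmax(torch.rand(2, 256, 256), dim=0),  # NP branch
    "horizontal_vertical": torch.rand(2, 256, 256) * 2 - 1,               # HV branch
}
post = HoVerNetInstanceMapPostProcessingd()
out = post(sample)
# out["instance_info"] holds per-instance contours, bounding boxes and centroids
# out["instance_map"] is the labelled instance segmentation map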
- class monai.apps.pathology.transforms.post.dictionary.HoVerNetNuclearTypePostProcessingd(type_prediction_key='type_prediction', instance_info_key='instance_info', instance_map_key='instance_map', type_map_key='type_map', activation='softmax', threshold=None, return_type_map=True, device=None)[source]#
Dictionary-based wrapper for monai.apps.pathology.transforms.post.array.HoVerNetNuclearTypePostProcessing. It updates the input instance info dictionary with information about types of the nuclei (value and probability). Also if requested (return_type_map=True), it generates a pixel-level type map.
- Parameters:
type_prediction_key – the key for HoVerNet NC (type prediction) branch. Defaults to HoVerNetBranch.NC.
instance_info_key – the key where instance information (contour, bounding boxes, and centroids) is stored. Defaults to “instance_info”.
instance_map_key – the key where instance map is stored. Defaults to “instance_map”.
type_map_key – the output key where type map is written. Defaults to “type_map”.
device – target device to put the output Tensor data.
Detection#
Hard Negative Sampler#
The functions in this script are adapted from nnDetection, MIC-DKFZ/nnDetection
- class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#
HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.
The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all the samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute classification loss for the selected samples; 4) do back propagation.
- Parameters:
batch_size_per_image (int) – number of training samples to be randomly selected per image.
positive_fraction (float) – percentage of positive elements in the selected samples.
min_neg (int) – minimum number of negative samples to select if possible.
pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
- get_num_neg(negative, num_pos)[source]#
Sample enough negatives to fill up self.batch_size_per_image.
- Parameters:
negative (Tensor) – indices of negative samples
num_pos (int) – number of positive samples to draw
- Return type:
int
- Returns:
number of negative samples
- get_num_pos(positive)[source]#
Number of positive samples to draw
- Parameters:
positive (Tensor) – indices of positive samples
- Return type:
int
- Returns:
number of positive samples
- select_positives(positive, num_pos, labels)[source]#
Select positive samples
- Parameters:
positive (Tensor) – indices of positive samples, sized (P,), where P is the number of positive samples.
num_pos (int) – number of positive samples to sample.
labels (Tensor) – labels for all samples, sized (A,), where A is the number of samples.
- Return type:
Tensor
- Returns:
binary mask of positive samples to choose, sized (A,), where A is the number of samples in one image
- select_samples_img_list(target_labels, fg_probs)[source]#
Select positives and hard negatives from list samples per image. Hard negative sampler will be applied to each image independently.
- Parameters:
target_labels (list[Tensor]) – list of labels per image. For image i in the batch, target_labels[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i. Positive samples have positive labels, negative samples have label 0.
fg_probs (list[Tensor]) – list of maximum foreground probability per image. For image i in the batch, fg_probs[i] is a Tensor sized (A_i,), where A_i is the number of samples in image i.
- Return type:
tuple[list[Tensor], list[Tensor]]
- Returns:
list of binary masks for positive samples
list of binary masks for negative samples
Example
import torch
from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# two images with different number of samples
target_labels = [torch.tensor([0, 1]), torch.tensor([1, 0, 2, 1])]
fg_probs = [torch.rand(2), torch.rand(4)]
pos_idx_list, neg_idx_list = sampler.select_samples_img_list(target_labels, fg_probs)
- select_samples_per_img(labels_per_img, fg_probs_per_img)[source]#
Select positives and hard negatives from samples.
- Parameters:
labels_per_img (Tensor) – labels, sized (A,). Positive samples have positive labels, negative samples have label 0.
fg_probs_per_img (Tensor) – maximum foreground probability, sized (A,)
- Return type:
tuple[Tensor, Tensor]
- Returns:
binary mask for positive samples, sized (A,)
binary mask for negative samples, sized (A,)
Example
import torch
from monai.apps.detection.utils.hard_negative_sampler import HardNegativeSampler

sampler = HardNegativeSampler(
    batch_size_per_image=6, positive_fraction=0.5, min_neg=1, pool_size=2
)
# one image with four samples
target_labels = torch.tensor([1, 0, 2, 1])
fg_probs = torch.rand(4)
pos_idx, neg_idx = sampler.select_samples_per_img(target_labels, fg_probs)
- class monai.apps.detection.utils.hard_negative_sampler.HardNegativeSamplerBase(pool_size=10)[source]#
Base class of hard negative sampler.
Hard negative sampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.
The training workflow is as follows: 1) forward the network and get prediction scores (classification prob/logits) for all the samples; 2) use the hard negative sampler to choose negative samples with high prediction scores and some positive samples; 3) compute classification loss for the selected samples; 4) do back propagation.
- Parameters:
pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
- select_negatives(negative, num_neg, fg_probs)[source]#
Select hard negative samples.
- Parameters:
negative (Tensor) – indices of all the negative samples, sized (P,), where P is the number of negative samples.
num_neg (int) – number of negative samples to sample.
fg_probs (Tensor) – maximum foreground prediction scores (probability) across all the classes for each sample, sized (A,), where A is the number of samples.
- Return type:
Tensor
- Returns:
binary mask of negative samples to choose, sized (A,), where A is the number of samples in one image
RetinaNet Network#
Part of this script is adapted from pytorch/vision
- class monai.apps.detection.networks.retinanet_network.RetinaNet(spatial_dims, num_classes, num_anchors, feature_extractor, size_divisible=1, use_list_output=False)[source]#
The network used in RetinaNet.
It takes an image tensor as input, and outputs either 1) a dictionary head_outputs, where head_outputs[self.cls_key] is the predicted classification maps (a list of Tensor) and head_outputs[self.box_reg_key] is the predicted box regression maps (a list of Tensor); or 2) a list of 2N tensors head_outputs, with the first N tensors being the predicted classification maps and the second N tensors being the predicted box regression maps.
- Parameters:
spatial_dims – number of spatial dimensions of the images. We support both 2D and 3D images.
num_classes – number of output classes of the model (excluding the background).
num_anchors – number of anchors at each location.
feature_extractor – a network that outputs feature maps from the input images; each feature map corresponds to a different resolution. Its output can have a format of Tensor, Dict[Any, Tensor], or Sequence[Tensor]. It can be the output of resnet_fpn_feature_extractor(*args, **kwargs).
size_divisible – the spatial size of the network input should be divisible by size_divisible, decided by the feature_extractor.
use_list_output – default False. If False, the network outputs a dictionary head_outputs, where head_outputs[self.cls_key] is the predicted classification maps (a list of Tensor) and head_outputs[self.box_reg_key] is the predicted box regression maps (a list of Tensor). If True, the network outputs a list of 2N tensors head_outputs, with the first N tensors being the predicted classification maps and the second N tensors being the predicted box regression maps.
Example
import torch
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor
from monai.networks.nets import resnet

device = torch.device("cpu")

spatial_dims = 3  # 3D network
conv1_t_stride = (2, 2, 1)  # stride of first convolutional layer in backbone
backbone = resnet.ResNet(
    spatial_dims=spatial_dims,
    block=resnet.ResNetBottleneck,
    layers=[3, 4, 6, 3],
    block_inplanes=resnet.get_inplanes(),
    n_input_channels=1,
    conv1_t_stride=conv1_t_stride,
    conv1_t_size=(7, 7, 7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
returned_layers = [1, 2, 3]  # returned layers from feature pyramid network
feature_extractor = resnet_fpn_feature_extractor(
    backbone=backbone,
    spatial_dims=spatial_dims,
    pretrained_backbone=False,
    trainable_backbone_layers=None,
    returned_layers=returned_layers,
)
# This feature_extractor requires input image spatial size
# to be divisible by (32, 32, 16).
size_divisible = tuple(2 * s * 2 ** max(returned_layers) for s in conv1_t_stride)
model = RetinaNet(
    spatial_dims=spatial_dims,
    num_classes=5,
    num_anchors=6,
    feature_extractor=feature_extractor,
    size_divisible=size_divisible,
).to(device)
result = model(torch.rand(2, 1, 128, 128, 128))
cls_logits_maps = result["classification"]  # a list of len(returned_layers)+1 Tensor
box_regression_maps = result["box_regression"]  # a list of len(returned_layers)+1 Tensor
- forward(images)[source]#
It takes an image tensor as inputs, and outputs predicted classification maps and predicted box regression maps in head_outputs.
- Parameters:
images (Tensor) – input images, sized (B, img_channels, H, W) or (B, img_channels, H, W, D).
- Return type:
Any
- Returns:
1) If self.use_list_output is False, output a dictionary head_outputs with keys including self.cls_key and self.box_reg_key. head_outputs[self.cls_key] is the predicted classification maps, a list of Tensor. head_outputs[self.box_reg_key] is the predicted box regression maps, a list of Tensor. 2) If self.use_list_output is True, outputs a list of 2N tensors head_outputs, with the first N tensors being the predicted classification maps and the second N tensors being the predicted box regression maps.
- class monai.apps.detection.networks.retinanet_network.RetinaNetClassificationHead(in_channels, num_anchors, num_classes, spatial_dims, prior_probability=0.01)[source]#
A classification head for use in RetinaNet.
This head takes a list of feature maps as inputs and outputs a list of classification maps. Each output map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.
- Parameters:
in_channels (int) – number of channels of the input feature.
num_anchors (int) – number of anchors to be predicted.
num_classes (int) – number of classes to be predicted.
spatial_dims (int) – spatial dimension of the network, should be 2 or 3.
prior_probability (float) – prior probability to initialize classification convolutional layers.
- forward(x)[source]#
It takes a list of feature maps as inputs and outputs a list of classification maps. Each output classification map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * num_classes.
- Parameters:
x (list[Tensor]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.
- Return type:
list[Tensor]
- Returns:
cls_logits_maps, list of classification map. cls_logits_maps[i] is a (B, num_anchors * num_classes, H_i, W_i) or (B, num_anchors * num_classes, H_i, W_i, D_i) Tensor.
- class monai.apps.detection.networks.retinanet_network.RetinaNetRegressionHead(in_channels, num_anchors, spatial_dims)[source]#
A regression head for use in RetinaNet.
This head takes a list of feature maps as inputs and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.
- Parameters:
in_channels (int) – number of channels of the input feature.
num_anchors (int) – number of anchors to be predicted.
spatial_dims (int) – spatial dimension of the network, should be 2 or 3.
- forward(x)[source]#
It takes a list of feature maps as inputs and outputs a list of box regression maps. Each output box regression map has the same spatial size as the corresponding input feature map, and the number of output channels is num_anchors * 2 * spatial_dims.
- Parameters:
x (list[Tensor]) – list of feature maps, x[i] is a (B, in_channels, H_i, W_i) or (B, in_channels, H_i, W_i, D_i) Tensor.
- Return type:
list[Tensor]
- Returns:
box_regression_maps, list of box regression maps. box_regression_maps[i] is a (B, num_anchors * 2 * spatial_dims, H_i, W_i) or (B, num_anchors * 2 * spatial_dims, H_i, W_i, D_i) Tensor.
- monai.apps.detection.networks.retinanet_network.resnet_fpn_feature_extractor(backbone, spatial_dims, pretrained_backbone=False, returned_layers=(1, 2, 3), trainable_backbone_layers=None)[source]#
Constructs a feature extractor network with a ResNet-FPN backbone, used as feature_extractor in RetinaNet.
Reference: “Focal Loss for Dense Object Detection”.
The returned feature_extractor network takes an image tensor as inputs, and outputs a dictionary that maps string to the extracted feature maps (Tensor).
The input to the returned feature_extractor is expected to be a list of tensors, each of shape [C, H, W] or [C, H, W, D], one for each image. Different images can have different sizes.
- Parameters:
backbone – a ResNet model, used as backbone.
spatial_dims – number of spatial dimensions of the images. We support both 2D and 3D images.
pretrained_backbone – whether the backbone has been pre-trained.
returned_layers – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.
trainable_backbone_layers – number of trainable (not frozen) resnet layers starting from final block. Valid values are between 0 and 5, with 5 meaning all backbone layers are trainable. When pretrained_backbone is False, this value is set to be 5. When pretrained_backbone is True, if None is passed (the default) this value is set to 3.
Example
import torch
from monai.apps.detection.networks.retinanet_network import RetinaNet, resnet_fpn_feature_extractor
from monai.networks.nets import resnet

device = torch.device("cpu")

spatial_dims = 3  # 3D network
backbone = resnet.ResNet(
    spatial_dims=spatial_dims,
    block=resnet.ResNetBottleneck,
    layers=[3, 4, 6, 3],
    block_inplanes=resnet.get_inplanes(),
    n_input_channels=1,
    conv1_t_stride=(2, 2, 1),
    conv1_t_size=(7, 7, 7),
)
# This feature_extractor outputs 4-level feature maps.
# number of output feature maps is len(returned_layers)+1
feature_extractor = resnet_fpn_feature_extractor(
    backbone=backbone,
    spatial_dims=spatial_dims,
    pretrained_backbone=False,
    trainable_backbone_layers=None,
    returned_layers=[1, 2, 3],
)
model = RetinaNet(
    spatial_dims=spatial_dims,
    num_classes=5,
    num_anchors=6,
    feature_extractor=feature_extractor,
    size_divisible=32,
).to(device)
RetinaNet Detector#
Part of this script is adapted from pytorch/vision
- class monai.apps.detection.networks.retinanet_detector.RetinaNetDetector(network, anchor_generator, box_overlap_metric=<function box_iou>, spatial_dims=None, num_classes=None, size_divisible=1, cls_key='classification', box_reg_key='box_regression', debug=False)[source]#
RetinaNet detector, expandable to other one-stage anchor-based box detectors in the future. An example of construction can be found in the source code of retinanet_resnet50_fpn_detector().
The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D); in this case, all images have the same size.
The behavior of the model changes depending on whether it is in training or evaluation mode.
During training, the model expects both the input tensors, as well as a targets (list of dictionary), containing:
boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the ground-truth boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.
labels: the class label for each ground-truth box
The model returns a Dict[str, Tensor] during training, containing the classification and regression losses. When saving the model, only self.network contains trainable parameters and needs to be saved.
During inference, the model requires only the input tensors, and returns the post-processed predictions as a List[Dict[Tensor]], one for each input image. The fields of the Dict are as follows:
boxes (FloatTensor[N, 4] or FloatTensor[N, 6]): the predicted boxes in StandardMode, i.e., [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax] format, with 0 <= xmin < xmax <= H, 0 <= ymin < ymax <= W, 0 <= zmin < zmax <= D.
labels (Int64Tensor[N]): the predicted labels for each image
labels_scores (Tensor[N]): the scores for each prediction
- Parameters:
network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].
anchor_generator – anchor generator.
box_overlap_metric – function that computes the overlap between two sets of boxes; default is Intersection over Union (IoU).
debug – whether to print out internal parameters, used for debugging and parameter tuning.
Notes
Input argument network can be a monai.apps.detection.networks.retinanet_network.RetinaNet(*) object, but any network that meets the following rules is a valid input network.
It should have attributes including spatial_dims, num_classes, cls_key, box_reg_key, num_anchors, size_divisible.
spatial_dims (int) is the spatial dimension of the network, we support both 2D and 3D.
num_classes (int) is the number of classes, excluding the background.
size_divisible (int or Sequence[int]) is the expectation on the input image shape. The network needs the input spatial_size to be divisible by size_divisible, length should be 2 or 3.
cls_key (str) is the key to represent classification in the output dict.
box_reg_key (str) is the key to represent box regression in the output dict.
num_anchors (int) is the number of anchor shapes at each location. It should equal self.anchor_generator.num_anchors_per_location()[0].
If network does not have these attributes, user needs to provide them for the detector.
Its input should be an image Tensor sized (B, C, H, W) or (B, C, H, W, D).
About its output head_outputs, it should be either a list of tensors or a dictionary of str: List[Tensor]:
If it is a dictionary, it needs to have at least two keys: network.cls_key and network.box_reg_key, representing predicted classification maps and box regression maps. head_outputs[network.cls_key] should be List[Tensor] or Tensor. Each Tensor represents a classification logits map at one resolution level, sized (B, num_classes*num_anchors, H_i, W_i) or (B, num_classes*num_anchors, H_i, W_i, D_i). head_outputs[network.box_reg_key] should be List[Tensor] or Tensor. Each Tensor represents a box regression map at one resolution level, sized (B, 2*spatial_dims*num_anchors, H_i, W_i) or (B, 2*spatial_dims*num_anchors, H_i, W_i, D_i). len(head_outputs[network.cls_key]) == len(head_outputs[network.box_reg_key]).
If it is a list of 2N tensors, the first N tensors should be the predicted classification maps, and the second N tensors should be the predicted box regression maps.
Example
# define a naive network
import torch
import monai.apps.detection.utils.anchor_utils
from monai.apps.detection.networks.retinanet_detector import RetinaNetDetector


class NaiveNet(torch.nn.Module):
    def __init__(self, spatial_dims: int, num_classes: int):
        super().__init__()
        self.spatial_dims = spatial_dims
        self.num_classes = num_classes
        self.size_divisible = 2
        self.cls_key = "cls"
        self.box_reg_key = "box_reg"
        self.num_anchors = 1

    def forward(self, images: torch.Tensor):
        spatial_size = images.shape[-self.spatial_dims:]
        out_spatial_size = tuple(s // self.size_divisible for s in spatial_size)  # half size of input
        out_cls_shape = (images.shape[0], self.num_classes * self.num_anchors) + out_spatial_size
        out_box_reg_shape = (images.shape[0], 2 * self.spatial_dims * self.num_anchors) + out_spatial_size
        return {self.cls_key: [torch.randn(out_cls_shape)], self.box_reg_key: [torch.randn(out_box_reg_shape)]}


# create a RetinaNetDetector detector
spatial_dims = 3
num_classes = 5
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1,), base_anchor_shapes=((8,) * spatial_dims)
)
net = NaiveNet(spatial_dims, num_classes)
detector = RetinaNetDetector(net, anchor_generator)

# only detector.network may contain trainable parameters.
optimizer = torch.optim.SGD(
    detector.network.parameters(), 1e-3, momentum=0.9, weight_decay=3e-5, nesterov=True,
)
torch.save(detector.network.state_dict(), "model.pt")  # save model
detector.network.load_state_dict(torch.load("model.pt"))  # load model
- compute_anchor_matched_idxs(anchors, targets, num_anchor_locs_per_level)[source]#
Compute the matched indices between anchors and ground truth (gt) boxes in targets. output[k][i] represents the matched gt index for anchor[i] in image k. Suppose there are M gt boxes for image k. The range of output[k][i] is [-2, -1, 0, …, M-1]. A value in [0, M - 1] indicates this anchor is matched with a gt box, while a negative value indicates that it is not matched.
- Parameters:
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
targets (list[dict[str, Tensor]]) – a list of dict. Each dict has two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
num_anchor_locs_per_level (Sequence[int]) – each element represents HW or HWD at this level.
- Return type:
list[Tensor]
- Returns:
a list of matched index matched_idxs_per_image (Tensor[int64]), Tensor sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2
- compute_box_loss(box_regression, targets, anchors, matched_idxs)[source]#
Compute box regression losses.
- Parameters:
box_regression (Tensor) – box regression results, sized (B, sum(HWA), 2*self.spatial_dims)
targets (list[dict[str, Tensor]]) – a list of dict. Each dict has two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
matched_idxs (list[Tensor]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),)
- Return type:
Tensor
- Returns:
box regression losses.
- compute_cls_loss(cls_logits, targets, matched_idxs)[source]#
Compute classification losses.
- Parameters:
cls_logits (Tensor) – classification logits, sized (B, sum(HW(D)A), self.num_classes)
targets (list[dict[str, Tensor]]) – a list of dict. Each dict has two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
matched_idxs (list[Tensor]) – a list of matched indices. Each element is sized (sum(HWA),) or (sum(HWDA),)
- Return type:
Tensor
- Returns:
classification losses.
- compute_loss(head_outputs_reshape, targets, anchors, num_anchor_locs_per_level)[source]#
Compute losses.
- Parameters:
head_outputs_reshape (dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims).
targets (list[dict[str, Tensor]]) – a list of dict. Each dict has two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
- Return type:
dict[str, Tensor]
- Returns:
a dict of several kinds of losses.
- forward(input_images, targets=None, use_inferer=False)[source]#
Returns a dict of losses during training, or a list predicted dict of boxes and labels during inference.
- Parameters:
input_images – The input to the model is expected to be a list of tensors, each of shape (C, H, W) or (C, H, W, D), one for each image, and should be in 0-1 range. Different images can have different sizes. Or it can also be a Tensor sized (B, C, H, W) or (B, C, H, W, D); in this case, all images have the same size.
targets – a list of dict. Each dict with two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image (optional).
use_inferer – whether to use self.inferer, a sliding window inferer, to do the inference. If False, will simply forward the network. If True, will use self.inferer, and requires self.set_sliding_window_inferer(*args) to have been called before.
- Returns:
If training mode, will return a dict with at least two keys, including self.cls_key and self.box_reg_key, representing classification loss and box regression loss.
If evaluation mode, will return a list of detection results. Each element corresponds to an image in input_images and is a dict with at least three keys, including self.target_box_key, self.target_label_key, and self.pred_score_key, representing predicted boxes, classification labels, and classification scores.
- generate_anchors(images, head_outputs)[source]#
Generate anchors and store them in self.anchors: List[Tensor]. Anchors are generated only when there are no stored anchors, or when the incoming images have a different shape from self.previous_image_shape.
- Parameters:
images (Tensor) – input images, a (B, C, H, W) or (B, C, H, W, D) Tensor.
head_outputs (dict[str, list[Tensor]]) – head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims).
- Return type:
None
- get_box_train_sample_per_image(box_regression_per_image, targets_per_image, anchors_per_image, matched_idxs_per_image)[source]#
Get samples from one image for box regression losses computation.
- Parameters:
box_regression_per_image (Tensor) – box regression result for one image, (sum(HWA), 2*self.spatial_dims)
targets_per_image (dict[str, Tensor]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors_per_image (Tensor) – anchors of one image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
matched_idxs_per_image (Tensor) – matched index, sized (sum(HWA),) or (sum(HWDA),)
- Return type:
tuple[Tensor, Tensor]
- Returns:
paired predicted and GT samples from one image for box regression losses computation
- get_cls_train_sample_per_image(cls_logits_per_image, targets_per_image, matched_idxs_per_image)[source]#
Get samples from one image for classification losses computation.
- Parameters:
cls_logits_per_image (Tensor) – classification logits for one image, (sum(HWA), self.num_classes)
targets_per_image (dict[str, Tensor]) – a dict with at least two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
matched_idxs_per_image (Tensor) – matched index, Tensor sized (sum(HWA),) or (sum(HWDA),). Suppose there are M gt boxes. matched_idxs_per_image[i] is a matched gt index in [0, M - 1] or a negative value indicating that anchor i could not be matched. BELOW_LOW_THRESHOLD = -1, BETWEEN_THRESHOLDS = -2
- Return type:
tuple[Tensor, Tensor]
- Returns:
paired predicted and GT samples from one image for classification losses computation
- postprocess_detections(head_outputs_reshape, anchors, image_sizes, num_anchor_locs_per_level, need_sigmoid=True)[source]#
Postprocessing to generate detection result from classification logits and box regression. Use self.box_selector to select the final output boxes for each image.
- Parameters:
head_outputs_reshape (dict[str, Tensor]) – reshaped head_outputs. head_output_reshape[self.cls_key] is a Tensor sized (B, sum(HW(D)A), self.num_classes). head_output_reshape[self.box_reg_key] is a Tensor sized (B, sum(HW(D)A), 2*self.spatial_dims).
targets – a list of dict. Each dict has two keys: self.target_box_key and self.target_label_key, ground-truth boxes present in the image.
anchors (list[Tensor]) – a list of Tensor. Each Tensor represents anchors for each image, sized (sum(HWA), 2*spatial_dims) or (sum(HWDA), 2*spatial_dims). A = self.num_anchors_per_loc.
- Return type:
list[dict[str, Tensor]]
- Returns:
a list of dict; each dict corresponds to the detection result of one image.
- set_atss_matcher(num_candidates=4, center_in_gt=False)[source]#
Using for training. Set ATSS matcher that matches anchors with ground truth boxes
- Parameters:
num_candidates (int) – number of positions to select candidates from. A smaller value will result in a higher matcher threshold and fewer matched candidates.
center_in_gt (bool) – If False (default), matched anchor center points do not need to lie within the ground truth box. Recommend False for small objects. If True, will result in a stricter matcher and fewer matched candidates.
- Return type:
None
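For example, a typical training-time configuration might combine this matcher with one of the samplers documented below (values are illustrative; detector is assumed to be a constructed RetinaNetDetector):
# assumes `detector` was built as in the RetinaNetDetector example above
detector.set_atss_matcher(num_candidates=4, center_in_gt=False)
detector.set_hard_negative_sampler(
    batch_size_per_image=64, positive_fraction=0.3, min_neg=16, pool_size=20
)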
- set_balanced_sampler(batch_size_per_image, positive_fraction)[source]#
Using for training. Set torchvision balanced sampler that samples part of the anchors for training.
- Parameters:
batch_size_per_image (int) – number of elements to be selected per image
positive_fraction (float) – percentage of positive elements per batch
- Return type:
None
- set_box_coder_weights(weights)[source]#
Set the weights for box coder.
- Parameters:
weights (tuple[float]) – a list/tuple with length of 2*self.spatial_dims
- Return type:
None
- set_box_regression_loss(box_loss, encode_gt, decode_pred)[source]#
Using for training. Set loss for box regression.
- Parameters:
box_loss (Module) – loss module for box regression
encode_gt (bool) – if True, will encode ground truth boxes to target box regression before computing the losses. Should be True for L1 loss and False for GIoU loss.
decode_pred (bool) – if True, will decode predicted box regression into predicted boxes before computing losses. Should be False for L1 loss and True for GIoU loss.
Example
detector.set_box_regression_loss(
    torch.nn.SmoothL1Loss(beta=1.0 / 9, reduction="mean"),
    encode_gt=True,
    decode_pred=False,
)
detector.set_box_regression_loss(
    monai.losses.giou_loss.BoxGIoULoss(reduction="mean"),
    encode_gt=False,
    decode_pred=True,
)
- Return type:
None
- set_box_selector_parameters(score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300, apply_sigmoid=True)[source]#
Using for inference. Set the parameters that are used for box selection during inference. The box selection is performed with the following steps:
For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.
- Parameters:
score_thresh (float) – no box with scores less than score_thresh will be kept
topk_candidates_per_level (int) – max number of boxes to keep for each level
nms_thresh (float) – box overlapping threshold for NMS
detections_per_img (int) – max number of boxes to keep for each image
- Return type:
None
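A small sketch with illustrative values (detector is assumed to be a constructed RetinaNetDetector):
# keep fewer, higher-scoring boxes at inference time
detector.set_box_selector_parameters(
    score_thresh=0.1,
    topk_candidates_per_level=500,
    nms_thresh=0.4,
    detections_per_img=100,
)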
- set_cls_loss(cls_loss)[source]#
Using for training. Set loss for classification that takes logits as inputs, make sure sigmoid/softmax is built in.
- Parameters:
cls_loss (Module) – loss module for classification
Example
detector.set_cls_loss(torch.nn.BCEWithLogitsLoss(reduction="mean"))
detector.set_cls_loss(FocalLoss(reduction="mean", gamma=2.0))
- Return type:
None
- set_hard_negative_sampler(batch_size_per_image, positive_fraction, min_neg=1, pool_size=10)[source]#
Using for training. Set hard negative sampler that samples part of the anchors for training.
HardNegativeSampler is used to suppress false positive rate in classification tasks. During training, it selects negative samples with high prediction scores.
- Parameters:
batch_size_per_image (int) – number of elements to be selected per image
positive_fraction (float) – percentage of positive elements in the selected samples
min_neg (int) – minimum number of negative samples to select if possible.
pool_size (float) – when we need num_neg hard negative samples, they will be randomly selected from num_neg * pool_size negative samples with the highest prediction scores. Larger pool_size gives more randomness, yet selects negative samples that are less ‘hard’, i.e., negative samples with lower prediction scores.
- Return type:
None
- set_regular_matcher(fg_iou_thresh, bg_iou_thresh, allow_low_quality_matches=True)[source]#
Using for training. Set torchvision matcher that matches anchors with ground truth boxes.
- Parameters:
fg_iou_thresh (float) – foreground IoU threshold for Matcher, considered as matched if IoU > fg_iou_thresh
bg_iou_thresh (float) – background IoU threshold for Matcher, considered as not matched if IoU < bg_iou_thresh
allow_low_quality_matches (bool) – if True, produce additional matches for predictions that have only low-quality match candidates.
- Return type:
None
- set_sliding_window_inferer(roi_size, sw_batch_size=1, overlap=0.5, mode=constant, sigma_scale=0.125, padding_mode=constant, cval=0.0, sw_device=None, device=None, progress=False, cache_roi_weight_map=False)[source]#
Define sliding window inferer and store it to self.inferer.
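A small sketch with illustrative values, following the signature above (detector is assumed to be a constructed RetinaNetDetector):
# enable sliding-window inference on large volumes, then request it in forward()
detector.set_sliding_window_inferer(roi_size=(128, 128, 128), sw_batch_size=1, overlap=0.25)
# detector(input_images, use_inferer=True) will now use self.inferer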
- set_target_keys(box_key, label_key)[source]#
Set keys for the training targets and inference outputs. During training, both box_key and label_key should be keys in the targets when performing self.forward(input_images, targets). During inference, they will be the keys in the output dict of self.forward(input_images).
- Return type:
None
- monai.apps.detection.networks.retinanet_detector.retinanet_resnet50_fpn_detector(num_classes, anchor_generator, returned_layers=(1, 2, 3), pretrained=False, progress=True, **kwargs)[source]#
Returns a RetinaNet detector using a ResNet-50 as backbone, which can be pretrained from Med3D: Transfer Learning for 3D Medical Image Analysis (https://arxiv.org/pdf/1904.00625.pdf).
- Parameters:
num_classes (int) – number of output classes of the model (excluding the background).
anchor_generator (AnchorGenerator) – anchor generator.
returned_layers (Sequence[int]) – returned layers to extract feature maps. Each returned layer should be in the range [1,4]. len(returned_layers)+1 will be the number of extracted feature maps. There is an extra maxpooling layer LastLevelMaxPool() appended.
pretrained (bool) – If True, returns a backbone pre-trained on 23 medical datasets
progress (bool) – If True, displays a progress bar of the download to stderr
- Return type:
RetinaNetDetector
- Returns:
A RetinaNetDetector object with resnet50 as backbone
Example
# define a RetinaNet detector with a ResNet-50 FPN backbone
import monai.apps.detection.utils.anchor_utils
from monai.apps.detection.networks.retinanet_detector import retinanet_resnet50_fpn_detector

resnet_param = {
    "pretrained": False,
    "spatial_dims": 3,
    "n_input_channels": 2,
    "num_classes": 3,
    "conv1_t_size": 7,
    "conv1_t_stride": (2, 2, 2),
}
returned_layers = [1]
anchor_generator = monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(
    feature_map_scales=(1, 2), base_anchor_shapes=((8,) * resnet_param["spatial_dims"])
)
detector = retinanet_resnet50_fpn_detector(
    **resnet_param, anchor_generator=anchor_generator, returned_layers=returned_layers
)
Transforms#
- monai.apps.detection.transforms.box_ops.apply_affine_to_boxes(boxes, affine)[source]#
This function applies affine matrices to the boxes
- Parameters:
boxes (~NdarrayTensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
affine (Union[ndarray, Tensor]) – affine matrix to be applied to the box coordinates, sized (spatial_dims+1, spatial_dims+1)
- Return type:
~NdarrayTensor
- Returns:
returned affine transformed boxes, with same data type as boxes; does not share memory with boxes
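A small sketch (the expected values assume the first box coordinate corresponds to the first row of the affine matrix):
import torch
from monai.apps.detection.transforms.box_ops import apply_affine_to_boxes

boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]])    # one 2D box in StandardMode (xyxy)
affine = torch.tensor([[2.0, 0.0, 5.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])             # scale the first axis by 2 and shift it by 5
new_boxes = apply_affine_to_boxes(boxes, affine)
# expected under this assumption: tensor([[25., 20., 65., 40.]])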
- monai.apps.detection.transforms.box_ops.convert_box_to_mask(boxes, labels, spatial_size, bg_label=-1, ellipse_mask=False)[source]#
Convert boxes to an int16 mask image, which has the same size as the input image.
- Parameters:
boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
labels – classification foreground (fg) labels corresponding to boxes, dtype should be int, sized (N,).
spatial_size – image spatial size.
bg_label – background labels for the output mask image, make sure it is smaller than any fg labels.
ellipse_mask – bool. If True, it assumes the object shape is close to ellipse or ellipsoid. If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box. If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
- Returns:
int16 array, sized (num_box, H, W). Each channel represents a box. The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.
- monai.apps.detection.transforms.box_ops.convert_mask_to_box(boxes_mask, bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#
Convert an int16 mask image to boxes; the mask has the same size as the input image.
- Parameters:
boxes_mask – int16 array, sized (num_box, H, W). Each channel represents a box. The foreground region in channel c has intensity of labels[c]. The background intensity is bg_label.
bg_label – background labels for the boxes_mask
box_dtype – output dtype for boxes
label_dtype – output dtype for labels
- Returns:
bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
classification foreground (fg) labels, dtype should be int, sized (N,).
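A round-trip sketch combining convert_box_to_mask and convert_mask_to_box (shapes and values are illustrative):
import torch
from monai.apps.detection.transforms.box_ops import convert_box_to_mask, convert_mask_to_box

boxes = torch.tensor([[2.0, 3.0, 6.0, 7.0]])   # one 2D box in StandardMode
labels = torch.tensor([1])
mask = convert_box_to_mask(boxes, labels, spatial_size=(10, 10), bg_label=-1)
# mask is expected to be sized (1, 10, 10); the box region carries label 1, everywhere else -1
boxes_back, labels_back = convert_mask_to_box(mask, bg_label=-1)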
- monai.apps.detection.transforms.box_ops.flip_boxes(boxes, spatial_size, flip_axes=None)[source]#
Flip boxes when the corresponding image is flipped
- Parameters:
boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
spatial_size – image spatial size.
flip_axes – spatial axes along which to flip over. Default is None. The default axis=None will flip over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.
- Returns:
flipped boxes, with same data type as boxes; does not share memory with boxes
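A small sketch; the mirrored coordinates below assume the usual convention that flipping axis 0 maps xmin/xmax to spatial_size[0] - xmax / spatial_size[0] - xmin:
import torch
from monai.apps.detection.transforms.box_ops import flip_boxes

boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]])   # one 2D box in StandardMode
flipped = flip_boxes(boxes, spatial_size=(100, 100), flip_axes=0)
# expected under this convention: tensor([[70., 20., 90., 40.]])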
- monai.apps.detection.transforms.box_ops.resize_boxes(boxes, src_spatial_size, dst_spatial_size)[source]#
Resize boxes when the corresponding image is resized
- Parameters:
boxes – source bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
src_spatial_size – source image spatial size.
dst_spatial_size – target image spatial size.
- Returns:
resized boxes, with same data type as boxes; does not share memory with boxes
Example
boxes = torch.ones(1, 4)
src_spatial_size = [100, 100]
dst_spatial_size = [128, 256]
resize_boxes(boxes, src_spatial_size, dst_spatial_size)  # will return tensor([[1.28, 2.56, 1.28, 2.56]])
- monai.apps.detection.transforms.box_ops.rot90_boxes(boxes, spatial_size, k=1, axes=(0, 1))[source]#
Rotate boxes by 90 degrees in the plane specified by axes. Rotation direction is from the first towards the second axis.
- Parameters:
boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
spatial_size – image spatial size.
k – number of times the array is rotated by 90 degrees.
axes – (2,) array_like The array is rotated in the plane defined by the axes. Axes must be different.
- Returns:
A rotated view of boxes.
Notes
rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is the reverse of rot90_boxes(boxes, spatial_size, k=1, axes=(0,1))
rot90_boxes(boxes, spatial_size, k=1, axes=(1,0)) is equivalent to rot90_boxes(boxes, spatial_size, k=-1, axes=(0,1))
- monai.apps.detection.transforms.box_ops.select_labels(labels, keep)[source]#
For element in labels, select indices keep from it.
- Parameters:
labels – Sequence of arrays. Each element represents classification labels or scores corresponding to boxes, sized (N,).
keep – the indices to keep, same length as each element in labels.
- Returns:
selected labels, does not share memory with original labels.
- monai.apps.detection.transforms.box_ops.swapaxes_boxes(boxes, axis1, axis2)[source]#
Interchange two axes of boxes.
- Parameters:
boxes (~NdarrayTensor) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode.
axis1 (int) – First axis.
axis2 (int) – Second axis.
- Return type:
~NdarrayTensor
- Returns:
boxes with two axes interchanged.
- monai.apps.detection.transforms.box_ops.zoom_boxes(boxes, zoom)[source]#
Zoom boxes
- Parameters:
boxes – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to be StandardMode
zoom – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
- Returns:
zoomed boxes, with same data type as boxes; does not share memory with boxes
Example
boxes = torch.ones(1, 4)
zoom_boxes(boxes, zoom=[0.5, 2.2])  # will return tensor([[0.5, 2.2, 0.5, 2.2]])
A collection of “vanilla” transforms for box operations Project-MONAI/MONAI
- class monai.apps.detection.transforms.array.BoxToMask(bg_label=-1, ellipse_mask=False)[source]#
Convert boxes to an int16 mask image, which has the same size as the input image.
- Parameters:
bg_label (int) – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.
ellipse_mask (bool) –
If True, it assumes the object shape is close to ellipse or ellipsoid.
If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box.
If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
- class monai.apps.detection.transforms.array.ClipBoxToImage(remove_empty=False)[source]#
Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple arrays of labels/scores associated with one array of boxes.
- Parameters:
remove_empty (
bool
) – whether to remove the boxes and corresponding labels that are actually empty
- class monai.apps.detection.transforms.array.ConvertBoxMode(src_mode=None, dst_mode=None)[source]#
This transform converts the boxes in src_mode to the dst_mode.
- Parameters:
src_mode – source box mode. If it is not given, this func will assume it is
StandardMode()
.dst_mode – target box mode. If it is not given, this func will assume it is
StandardMode()
.
Note
StandardMode = CornerCornerModeTypeA, also represented as “xyxy” for 2D and “xyzxyz” for 3D.
- src_mode and dst_mode can be:
- str: choose from
BoxModeName
, for example, “xyxy”: boxes has format [xmin, ymin, xmax, ymax]
“xyzxyz”: boxes has format [xmin, ymin, zmin, xmax, ymax, zmax]
“xxyy”: boxes has format [xmin, xmax, ymin, ymax]
“xxyyzz”: boxes has format [xmin, xmax, ymin, ymax, zmin, zmax]
“xyxyzz”: boxes has format [xmin, ymin, xmax, ymax, zmin, zmax]
“xywh”: boxes has format [xmin, ymin, xsize, ysize]
“xyzwhd”: boxes has format [xmin, ymin, zmin, xsize, ysize, zsize]
“ccwh”: boxes has format [xcenter, ycenter, xsize, ysize]
“cccwhd”: boxes has format [xcenter, ycenter, zcenter, xsize, ysize, zsize]
- BoxMode class: choose from the subclasses of
BoxMode
, for example, CornerCornerModeTypeA: equivalent to “xyxy” or “xyzxyz”
CornerCornerModeTypeB: equivalent to “xxyy” or “xxyyzz”
CornerCornerModeTypeC: equivalent to “xyxy” or “xyxyzz”
CornerSizeMode: equivalent to “xywh” or “xyzwhd”
CenterSizeMode: equivalent to “ccwh” or “cccwhd”
- BoxMode object: choose from the subclasses of
BoxMode
, for example, CornerCornerModeTypeA(): equivalent to “xyxy” or “xyzxyz”
CornerCornerModeTypeB(): equivalent to “xxyy” or “xxyyzz”
CornerCornerModeTypeC(): equivalent to “xyxy” or “xyxyzz”
CornerSizeMode(): equivalent to “xywh” or “xyzwhd”
CenterSizeMode(): equivalent to “ccwh” or “cccwhd”
None: will assume mode is
StandardMode()
Example
boxes = torch.ones(10, 4)
# convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize].
box_converter = ConvertBoxMode(src_mode="xyxy", dst_mode="ccwh")
box_converter(boxes)
- class monai.apps.detection.transforms.array.ConvertBoxToStandardMode(mode=None)[source]#
Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].
- Parameters:
mode – source box mode. If it is not given, this func will assume it is
StandardMode()
. It follows the same format withsrc_mode
inConvertBoxMode
.
Example
boxes = torch.ones(10, 6)
# convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax]
box_converter = ConvertBoxToStandardMode(mode="xxyyzz")
box_converter(boxes)
- class monai.apps.detection.transforms.array.FlipBox(spatial_axis=None)[source]#
Reverses the box coordinates along the given spatial axis. Preserves shape.
- Parameters:
spatial_axis – spatial axes along which to flip over. Default is None. The default axis=None will flip over all of the axes of the input array. If axis is negative it counts from the last to the first axis. If axis is a tuple of ints, flipping is performed on all of the axes specified in the tuple.
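A minimal usage sketch (not part of the original docstring); the call form, which passes the reference image spatial size together with the boxes, is an assumption:
import torch
from monai.apps.detection.transforms.array import FlipBox

boxes = torch.tensor([[10.0, 20.0, 30.0, 40.0]])  # StandardMode, 2D
flipper = FlipBox(spatial_axis=0)
flipped = flipper(boxes, spatial_size=[100, 100])  # assumed call form: (boxes, spatial_size)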
- class monai.apps.detection.transforms.array.MaskToBox(bg_label=-1, box_dtype=torch.float32, label_dtype=torch.int64)[source]#
Convert int16 mask image to box, which has the same size as the input image. Pairs with monai.apps.detection.transforms.array.BoxToMask. Please make sure the same min_fg_label is used when using the two transforms in pairs.
- Parameters:
bg_label – background labels for the output mask image, make sure it is smaller than any foreground(fg) labels.
box_dtype – output dtype for boxes
label_dtype – output dtype for labels
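A minimal round-trip sketch pairing BoxToMask and MaskToBox (not part of the original docstring; the call forms shown are assumptions):
import torch
from monai.apps.detection.transforms.array import BoxToMask, MaskToBox

boxes = torch.tensor([[2.0, 2.0, 6.0, 6.0]])   # one 2D box in StandardMode
labels = torch.tensor([1])                      # foreground label, larger than bg_label
# assumed call forms: BoxToMask()(boxes, labels, spatial_size) and MaskToBox()(boxes_mask)
boxes_mask = BoxToMask(bg_label=-1)(boxes, labels, spatial_size=(10, 10))
boxes_back, labels_back = MaskToBox(bg_label=-1)(boxes_mask)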
- class monai.apps.detection.transforms.array.ResizeBox(spatial_size, size_mode='all', **kwargs)[source]#
Resize the input boxes when the corresponding image is resized to given spatial size (with scaling, not cropping/padding).
- Parameters:
spatial_size – expected shape of spatial dimensions after resize operation. if some components of the spatial_size are non-positive values, the transform will use the corresponding components of img size. For example, spatial_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of img is 64.
size_mode – should be “all” or “longest”, if “all”, will use spatial_size for all the spatial dims, if “longest”, rescale the image so that only the longest side is equal to specified spatial_size, which must be an int number in this case, keeping the aspect ratio of the initial image, refer to: https://albumentations.ai/docs/api_reference/augmentations/geometric/resize/ #albumentations.augmentations.geometric.resize.LongestMaxSize.
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.
- class monai.apps.detection.transforms.array.RotateBox90(k=1, spatial_axes=(0, 1), lazy=False)[source]#
Rotate boxes by 90 degrees in the plane specified by axes. See box_ops.rot90_boxes for additional details.
- Parameters:
k (int) – number of times to rotate by 90 degrees.
spatial_axes (tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), this is the first two axis in spatial dimensions. If axis is negative it counts from the last to the first axis.
- class monai.apps.detection.transforms.array.SpatialCropBox(roi_center=None, roi_size=None, roi_start=None, roi_end=None, roi_slices=None)[source]#
General purpose box cropper when the corresponding image is cropped by SpatialCrop(*) with the same ROI. The difference is that we do not support negative indexing for roi_slices.
If a dimension of the expected ROI size is bigger than the input image size, will not crop that dimension. So the cropped result may be smaller than the expected ROI, and the cropped results of several images may not have exactly the same shape. It supports cropping ND spatial boxes.
- The cropped region can be parameterised in various ways:
a list of slices for each spatial dimension (do not allow for use of negative indexing)
a spatial center and size
the start and end coordinates of the ROI
- Parameters:
roi_center – voxel coordinates for center of the crop ROI.
roi_size – size of the crop ROI, if a dimension of ROI size is bigger than image size, will not crop that dimension of the image.
roi_start – voxel coordinates for start of the crop ROI.
roi_end – voxel coordinates for end of the crop ROI, if a coordinate is out of image, use the end coordinate of image.
roi_slices – list of slices for each of the spatial dimensions.
- class monai.apps.detection.transforms.array.StandardizeEmptyBox(spatial_dims)[source]#
When boxes are empty, this transform standardizes them to shape (0, 4) or (0, 6).
- Parameters:
spatial_dims (
int
) – number of spatial dimensions of the bounding boxes.
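A minimal usage sketch (illustrative, not part of the original docstring; the single-argument call form is an assumption):
import torch
from monai.apps.detection.transforms.array import StandardizeEmptyBox

empty_boxes = torch.ones(0)                      # an empty box array
standardizer = StandardizeEmptyBox(spatial_dims=3)
standardized = standardizer(empty_boxes)         # expected shape: (0, 6)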
- class monai.apps.detection.transforms.array.ZoomBox(zoom, keep_size=False, **kwargs)[source]#
Zooms an ND box, using the same padding or slicing setting as Zoom().
- Parameters:
zoom – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
keep_size – Should keep original size (padding/slicing if needed), default is False.
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.
A collection of dictionary-based wrappers around the “vanilla” transforms for box operations
defined in monai.apps.detection.transforms.array
.
Class names end with ‘d’ to denote dictionary-based transforms.
- monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateD[source]#
alias of
AffineBoxToImageCoordinated
- monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinateDict[source]#
alias of
AffineBoxToImageCoordinated
- class monai.apps.detection.transforms.dictionary.AffineBoxToImageCoordinated(box_keys, box_ref_image_keys, allow_missing_keys=False, image_meta_key=None, image_meta_key_postfix='meta_dict', affine_lps_to_ras=False)[source]#
Dictionary-based transform that converts box in world coordinate to image coordinate.
- Parameters:
box_keys – Keys to pick box data for transformation. The box mode is assumed to be
StandardMode
box_ref_image_keys – The single key that represents the reference image to which box_keys are attached.
remove_empty – whether to remove the boxes that are actually empty.
allow_missing_keys – don’t raise exception if key is missing.
image_meta_key – explicitly indicate the key of the corresponding metadata dictionary. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, affine, original_shape, etc. it is a string, map to the box_ref_image_key. if None, will try to construct meta_keys by box_ref_image_key_{meta_key_postfix}.
image_meta_key_postfix – if image_meta_keys=None, use box_ref_image_key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. For example, to handle key image, read/write affine matrices from the metadata image_meta_dict dictionary’s affine field.
affine_lps_to_ras – default
False
. Yet if 1) the image is read by ITKReader, and 2) the ITKReader has affine_lps_to_ras=True, and 3) the box is in world coordinate, then setaffine_lps_to_ras=True
.
- monai.apps.detection.transforms.dictionary.BoxToMaskD[source]#
alias of
BoxToMaskd
- monai.apps.detection.transforms.dictionary.BoxToMaskDict[source]#
alias of
BoxToMaskd
- class monai.apps.detection.transforms.dictionary.BoxToMaskd(box_keys, box_mask_keys, label_keys, box_ref_image_keys, min_fg_label, ellipse_mask=False, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.BoxToMask. Pairs with monai.apps.detection.transforms.dictionary.MaskToBoxd. Please make sure the same min_fg_label is used when using the two transforms in pairs. The output d[box_mask_key] will have background intensity 0, since the following operations may pad 0 on the border.
This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps:
use
BoxToMaskd
to convert boxes and labels to box_masks;
do transforms, e.g., rotation or cropping, on images and box_masks together;
use
MaskToBoxd
to convert box_masks back to boxes and labels.
- Parameters:
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_mask_keys (Union[Collection[Hashable], Hashable]) – Keys to store output box mask results for transformation. Same length with box_keys.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
min_fg_label (int) – min foreground box label.
ellipse_mask (bool) – If True, it assumes the object shape is close to ellipse or ellipsoid. If False, it assumes the object shape is close to rectangle or cube and well occupies the bounding box. If the users are going to apply random rotation as data augmentation, we suggest setting ellipse_mask=True. See also Kalra et al. “Towards Rotation Invariance in Object Detection”, ICCV 2021.
allow_missing_keys (bool) – don’t raise exception if key is missing.
Example
# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and image together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(
            keys=["image", "box_mask"], mode=["nearest", "nearest"],
            prob=0.2, range_x=np.pi / 6, range_y=np.pi / 6, range_z=np.pi / 6,
            keep_size=True, padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image", "box_mask"], roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
- monai.apps.detection.transforms.dictionary.ClipBoxToImageD[source]#
alias of
ClipBoxToImaged
- monai.apps.detection.transforms.dictionary.ClipBoxToImageDict[source]#
alias of
ClipBoxToImaged
- class monai.apps.detection.transforms.dictionary.ClipBoxToImaged(box_keys, label_keys, box_ref_image_keys, remove_empty=True, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.ClipBoxToImage.
Clip the bounding boxes and the associated labels/scores to make sure they are within the image. There might be multiple keys of labels/scores associated with one key of boxes.
- Parameters:
box_keys (Union[Collection[Hashable], Hashable]) – The single key to pick box data for transformation. The box mode is assumed to be StandardMode.
label_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the labels corresponding to the box_keys. Multiple keys are allowed.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – The single key that represents the reference image to which box_keys and label_keys are attached.
remove_empty (bool) – whether to remove the boxes that are actually empty.
allow_missing_keys (bool) – don’t raise exception if key is missing.
Example
ClipBoxToImaged( box_keys="boxes", box_ref_image_keys="image", label_keys=["labels", "scores"], remove_empty=True )
- monai.apps.detection.transforms.dictionary.ConvertBoxModeD[source]#
alias of
ConvertBoxModed
- monai.apps.detection.transforms.dictionary.ConvertBoxModeDict[source]#
alias of
ConvertBoxModed
- class monai.apps.detection.transforms.dictionary.ConvertBoxModed(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxMode.
This transform converts the boxes in src_mode to the dst_mode.
Example
data = {"boxes": torch.ones(10,4)} # convert boxes with format [xmin, ymin, xmax, ymax] to [xcenter, ycenter, xsize, ysize]. box_converter = ConvertBoxModed(box_keys=["boxes"], src_mode="xyxy", dst_mode="ccwh") box_converter(data)
- __init__(box_keys, src_mode=None, dst_mode=None, allow_missing_keys=False)[source]#
- Parameters:
box_keys – Keys to pick data for transformation.
src_mode – source box mode. If it is not given, this func will assume it is
StandardMode()
. It follows the same format withsrc_mode
inConvertBoxMode
.dst_mode – target box mode. If it is not given, this func will assume it is
StandardMode()
. It follows the same format withsrc_mode
inConvertBoxMode
.allow_missing_keys – don’t raise exception if key is missing.
See also
monai.apps.detection.transforms.array.ConvertBoxMode
- monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeD[source]#
alias of
ConvertBoxToStandardModed
- monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModeDict[source]#
alias of
ConvertBoxToStandardModed
- class monai.apps.detection.transforms.dictionary.ConvertBoxToStandardModed(box_keys, mode=None, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.ConvertBoxToStandardMode.
Convert given boxes to standard mode. Standard mode is “xyxy” or “xyzxyz”, representing box format of [xmin, ymin, xmax, ymax] or [xmin, ymin, zmin, xmax, ymax, zmax].
Example
data = {"boxes": torch.ones(10,6)} # convert boxes with format [xmin, xmax, ymin, ymax, zmin, zmax] to [xmin, ymin, zmin, xmax, ymax, zmax] box_converter = ConvertBoxToStandardModed(box_keys=["boxes"], mode="xxyyzz") box_converter(data)
- __init__(box_keys, mode=None, allow_missing_keys=False)[source]#
- Parameters:
box_keys – Keys to pick data for transformation.
mode – source box mode. If it is not given, this func will assume it is
StandardMode()
. It follows the same format withsrc_mode
inConvertBoxMode
.allow_missing_keys – don’t raise exception if key is missing.
See also
monai.apps.detection.transforms.array.ConvertBoxToStandardMode
- class monai.apps.detection.transforms.dictionary.FlipBoxd(image_keys, box_keys, box_ref_image_keys, spatial_axis=None, allow_missing_keys=False)[source]#
Dictionary-based transform that flips boxes and images.
- Parameters:
image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be
StandardMode
.box_ref_image_keys – Keys that represent the reference images to which
box_keys
are attached.spatial_axis – Spatial axes along which to flip over. Default is None.
allow_missing_keys – don’t raise exception if key is missing.
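A minimal usage sketch (illustrative keys and values, not part of the original docstring):
import torch
from monai.apps.detection.transforms.dictionary import FlipBoxd

data = {
    "image": torch.randn(1, 64, 64, 64),                         # channel-first image
    "boxes": torch.tensor([[8.0, 8.0, 8.0, 24.0, 24.0, 24.0]]),  # StandardMode boxes
}
flip = FlipBoxd(image_keys="image", box_keys="boxes", box_ref_image_keys="image", spatial_axis=0)
data = flip(data)  # image and boxes are flipped together along the first spatial axis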
- monai.apps.detection.transforms.dictionary.MaskToBoxD[source]#
alias of
MaskToBoxd
- monai.apps.detection.transforms.dictionary.MaskToBoxDict[source]#
alias of
MaskToBoxd
- class monai.apps.detection.transforms.dictionary.MaskToBoxd(box_keys, box_mask_keys, label_keys, min_fg_label, box_dtype=torch.float32, label_dtype=torch.int64, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.MaskToBox. Pairs with monai.apps.detection.transforms.dictionary.BoxToMaskd. Please make sure the same min_fg_label is used when using the two transforms in pairs.
This is the general solution for transforms that need to be applied on images and boxes simultaneously. It is performed with the following steps:
use
BoxToMaskd
to convert boxes and labels to box_masks;
do transforms, e.g., rotation or cropping, on images and box_masks together;
use
MaskToBoxd
to convert box_masks back to boxes and labels.
- Parameters:
box_keys – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_mask_keys – Keys to store output box mask results for transformation. Same length with box_keys.
label_keys – Keys that represent the labels corresponding to the box_keys. Same length with box_keys.
min_fg_label – min foreground box label.
box_dtype – output dtype for box_keys
label_dtype – output dtype for label_keys
allow_missing_keys – don’t raise exception if key is missing.
Example
# This code snippet creates transforms (random rotation and cropping) on boxes, labels, and images together.
import numpy as np
from monai.transforms import Compose, RandRotated, RandSpatialCropd, DeleteItemsd
from monai.apps.detection.transforms.dictionary import BoxToMaskd, MaskToBoxd
transforms = Compose(
    [
        BoxToMaskd(
            box_keys="boxes", label_keys="labels",
            box_mask_keys="box_mask", box_ref_image_keys="image",
            min_fg_label=0, ellipse_mask=True
        ),
        RandRotated(
            keys=["image", "box_mask"], mode=["nearest", "nearest"],
            prob=0.2, range_x=np.pi / 6, range_y=np.pi / 6, range_z=np.pi / 6,
            keep_size=True, padding_mode="zeros"
        ),
        RandSpatialCropd(keys=["image", "box_mask"], roi_size=128, random_size=False),
        MaskToBoxd(
            box_mask_keys="box_mask", box_keys="boxes",
            label_keys="labels", min_fg_label=0
        ),
        DeleteItemsd(keys=["box_mask"]),
    ]
)
- monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelD[source]#
alias of
RandCropBoxByPosNegLabeld
- monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabelDict[source]#
alias of
RandCropBoxByPosNegLabeld
- class monai.apps.detection.transforms.dictionary.RandCropBoxByPosNegLabeld(image_keys, box_keys, label_keys, spatial_size, pos=1.0, neg=1.0, num_samples=1, whole_box=True, thresh_image_key=None, image_threshold=0.0, fg_indices_key=None, bg_indices_key=None, meta_keys=None, meta_key_postfix='meta_dict', allow_smaller=False, allow_missing_keys=False)[source]#
Crop random fixed-sized regions that contain foreground boxes. Suppose all the expected fields specified by image_keys have the same shape, and add patch_index to the corresponding metadata. It will return a list of dictionaries for all the cropped images. If a dimension of the expected spatial size is bigger than the input image size, that dimension will not be cropped. So the cropped result may be smaller than the expected size, and the cropped results of several images may not have exactly the same shape.
- Parameters:
image_keys – Keys to pick image data for transformation. They need to have the same spatial size.
box_keys – The single key to pick box data for transformation. The box mode is assumed to be
StandardMode
.label_keys – Keys that represent the labels corresponding to the
box_keys
. Multiple keys are allowed.spatial_size – the spatial size of the crop region e.g. [224, 224, 128]. if a dimension of ROI size is bigger than image size, will not crop that dimension of the image. if its components have non-positive values, the corresponding size of data[label_key] will be used. for example: if the spatial size of input data is [40, 40, 40] and spatial_size=[32, 64, -1], the spatial size of output data will be [32, 40, 40].
pos – used with neg together to calculate the ratio
pos / (pos + neg)
for the probability to pick a foreground voxel as a center rather than a background voxel.neg – used with pos together to calculate the ratio
pos / (pos + neg)
for the probability to pick a foreground voxel as a center rather than a background voxel.num_samples – number of samples (crop regions) to take in each list.
whole_box – Bool, default True, whether we prefer to contain at least one whole box in the cropped foreground patch. Even if True, it is still possible to get partial box if there are multiple boxes in the image.
thresh_image_key – if thresh_image_key is not None, use
label == 0 & thresh_image > image_threshold
to select the negative sample(background) center. so the crop center will only exist on valid image area.image_threshold – if enabled thresh_image_key, use
thresh_image > image_threshold
to determine the valid image content area.fg_indices_key – if provided pre-computed foreground indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.
bg_indices_key – if provided pre-computed background indices of label, will ignore above image_key and image_threshold, and randomly select crop centers based on them, need to provide fg_indices_key and bg_indices_key together, expect to be 1 dim array of spatial indices after flattening. a typical usage is to call FgBgToIndicesd transform first and cache the results.
meta_keys – explicitly indicate the key of the corresponding metadata dictionary. used to add patch_index to the meta dict. for example, for data with key image, the metadata by default is in image_meta_dict. the metadata is a dictionary object which contains: filename, original_shape, etc. it can be a sequence of string, map to the keys. if None, will try to construct meta_keys by key_{meta_key_postfix}.
meta_key_postfix – if meta_keys is None, use key_{postfix} to fetch the metadata according to the key data, default is meta_dict, the metadata is a dictionary object. used to add patch_index to the meta dict.
allow_smaller – if False, an exception will be raised if the image is smaller than the requested ROI in any dimension. If True, any smaller dimensions will be set to match the cropped size (i.e., no cropping in that dimension).
allow_missing_keys – don’t raise exception if key is missing.
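A minimal usage sketch (illustrative keys and sizes, not part of the original docstring):
import torch
from monai.apps.detection.transforms.dictionary import RandCropBoxByPosNegLabeld

data = {
    "image": torch.randn(1, 96, 96, 96),
    "boxes": torch.tensor([[20.0, 20.0, 20.0, 50.0, 50.0, 50.0]]),  # StandardMode
    "labels": torch.tensor([0]),
}
cropper = RandCropBoxByPosNegLabeld(
    image_keys=["image"], box_keys="boxes", label_keys="labels",
    spatial_size=[64, 64, 64], pos=1.0, neg=1.0, num_samples=2,
)
samples = cropper(data)  # a list of num_samples cropped dictionaries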
- randomize(boxes, image_size, fg_indices=None, bg_indices=None, thresh_image=None)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance to identify errors of syncing the random state.
This method can generate the random factors based on properties of the input data.
- Raises:
NotImplementedError – When the subclass does not override this method.
- monai.apps.detection.transforms.dictionary.RandFlipBoxD[source]#
alias of
RandFlipBoxd
- monai.apps.detection.transforms.dictionary.RandFlipBoxDict[source]#
alias of
RandFlipBoxd
- class monai.apps.detection.transforms.dictionary.RandFlipBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, spatial_axis=None, allow_missing_keys=False)[source]#
Dictionary-based transform that randomly flips boxes and images with the given probabilities.
- Parameters:
image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be
StandardMode
.box_ref_image_keys – Keys that represent the reference images to which
box_keys
are attached.prob – Probability of flipping.
spatial_axis – Spatial axes along which to flip over. Default is None.
allow_missing_keys – don’t raise exception if key is missing.
- inverse(data)[source]#
Inverse of
__call__
.- Raises:
NotImplementedError – When the subclass does not override this method.
- Return type:
dict
[Hashable
,Tensor
]
- set_random_state(seed=None, state=None)[source]#
Set the random state locally, to control the randomness, the derived classes should use
self.R
instead of np.random to introduce random factors.- Parameters:
seed – set the random state with an integer seed.
state – set the random state with a np.random.RandomState object.
- Raises:
TypeError – When
state
is not anOptional[np.random.RandomState]
.- Returns:
a Randomizable instance.
- monai.apps.detection.transforms.dictionary.RandRotateBox90D[source]#
alias of
RandRotateBox90d
- monai.apps.detection.transforms.dictionary.RandRotateBox90Dict[source]#
alias of
RandRotateBox90d
- class monai.apps.detection.transforms.dictionary.RandRotateBox90d(image_keys, box_keys, box_ref_image_keys, prob=0.1, max_k=3, spatial_axes=(0, 1), allow_missing_keys=False)[source]#
With probability prob, input boxes and images are rotated by 90 degrees in the plane specified by spatial_axes.
- Parameters:
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
prob (float) – probability of rotating. (Default 0.1, with 10% probability it returns a rotated array.)
max_k (int) – number of rotations will be sampled from np.random.randint(max_k) + 1. (Default 3)
spatial_axes (tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default: (0, 1), this is the first two axis in spatial dimensions.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- inverse(data)[source]#
Inverse of
__call__
.- Raises:
NotImplementedError – When the subclass does not override this method.
- Return type:
dict
[Hashable
,Tensor
]
- randomize(data=None)[source]#
Within this method, self.R should be used, instead of np.random, to introduce random factors.
All self.R calls happen here so that we have a better chance to identify errors of syncing the random state.
This method can generate the random factors based on properties of the input data.
- monai.apps.detection.transforms.dictionary.RandZoomBoxD[source]#
alias of
RandZoomBoxd
- monai.apps.detection.transforms.dictionary.RandZoomBoxDict[source]#
alias of
RandZoomBoxd
- class monai.apps.detection.transforms.dictionary.RandZoomBoxd(image_keys, box_keys, box_ref_image_keys, prob=0.1, min_zoom=0.9, max_zoom=1.1, mode=area, padding_mode=edge, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#
Dictionary-based transform that randomly zooms input boxes and images with given probability within given zoom range.
- Parameters:
image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be
StandardMode
.box_ref_image_keys – Keys that represent the reference images to which
box_keys
are attached.prob – Probability of zooming.
min_zoom – Min zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, min_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.
max_zoom – Max zoom factor. Can be float or sequence same size as image. If a float, select a random factor from [min_zoom, max_zoom] then apply to all spatial dims to keep the original spatial shape ratio. If a sequence, max_zoom should contain one value for each spatial axis. If 2 values provided for 3D data, use the first value for both H & W dims to keep the same zoom ratio.
mode – {
"nearest"
,"nearest-exact"
,"linear"
,"bilinear"
,"bicubic"
,"trilinear"
,"area"
} The interpolation mode. Defaults to"area"
. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key inkeys
.padding_mode – available modes for numpy array:{
"constant"
,"edge"
,"linear_ramp"
,"maximum"
,"mean"
,"median"
,"minimum"
,"reflect"
,"symmetric"
,"wrap"
,"empty"
} available modes for PyTorch Tensor: {"constant"
,"reflect"
,"replicate"
,"circular"
}. One of the listed string values or a user supplied function. Defaults to"constant"
. The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.htmlalign_corners – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in
keys
.keep_size – Should keep original size (pad if needed), default is True.
allow_missing_keys – don’t raise exception if key is missing.
kwargs – other args for np.pad API, note that np.pad treats channel dimension as the first dimension. more details: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html
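A minimal usage sketch (illustrative keys and values, not part of the original docstring):
import torch
from monai.apps.detection.transforms.dictionary import RandZoomBoxd

data = {
    "image": torch.randn(1, 64, 64, 64),
    "boxes": torch.tensor([[8.0, 8.0, 8.0, 24.0, 24.0, 24.0]]),  # StandardMode
}
rand_zoom = RandZoomBoxd(
    image_keys="image", box_keys="boxes", box_ref_image_keys="image",
    prob=0.5, min_zoom=0.9, max_zoom=1.2, keep_size=True,
)
data = rand_zoom(data)  # with probability prob, image and boxes are zoomed by the same random factor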
- inverse(data)[source]#
Inverse of
__call__
.- Raises:
NotImplementedError – When the subclass does not override this method.
- Return type:
dict
[Hashable
,Tensor
]
- set_random_state(seed=None, state=None)[source]#
Set the random state locally, to control the randomness, the derived classes should use
self.R
instead of np.random to introduce random factors.- Parameters:
seed – set the random state with an integer seed.
state – set the random state with a np.random.RandomState object.
- Raises:
TypeError – When
state
is not anOptional[np.random.RandomState]
.- Returns:
a Randomizable instance.
- monai.apps.detection.transforms.dictionary.RotateBox90D[source]#
alias of
RotateBox90d
- monai.apps.detection.transforms.dictionary.RotateBox90Dict[source]#
alias of
RotateBox90d
- class monai.apps.detection.transforms.dictionary.RotateBox90d(image_keys, box_keys, box_ref_image_keys, k=1, spatial_axes=(0, 1), allow_missing_keys=False)[source]#
Input boxes and images are rotated by 90 degrees in the plane specified by
spatial_axes
fork
times- Parameters:
image_keys (Union[Collection[Hashable], Hashable]) – Keys to pick image data for transformation.
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick box data for transformation. The box mode is assumed to be StandardMode.
box_ref_image_keys (Union[Collection[Hashable], Hashable]) – Keys that represent the reference images to which box_keys are attached.
k (int) – number of times to rotate by 90 degrees.
spatial_axes (tuple[int, int]) – 2 int numbers, defines the plane to rotate with 2 spatial axes. Default (0, 1), this is the first two axis in spatial dimensions.
allow_missing_keys (bool) – don’t raise exception if key is missing.
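A minimal usage sketch (illustrative keys and values, not part of the original docstring):
import torch
from monai.apps.detection.transforms.dictionary import RotateBox90d

data = {
    "image": torch.randn(1, 64, 64, 32),
    "boxes": torch.tensor([[8.0, 8.0, 8.0, 24.0, 24.0, 16.0]]),  # StandardMode
}
rotate = RotateBox90d(image_keys="image", box_keys="boxes", box_ref_image_keys="image", k=1, spatial_axes=(0, 1))
data = rotate(data)  # image and boxes are rotated together by 90 degrees in the (0, 1) plane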
- monai.apps.detection.transforms.dictionary.StandardizeEmptyBoxD[source]#
alias of
StandardizeEmptyBoxd
- monai.apps.detection.transforms.dictionary.StandardizeEmptyBoxDict[source]#
alias of
StandardizeEmptyBoxd
- class monai.apps.detection.transforms.dictionary.StandardizeEmptyBoxd(box_keys, box_ref_image_keys, allow_missing_keys=False)[source]#
Dictionary-based wrapper of monai.apps.detection.transforms.array.StandardizeEmptyBox.
When boxes are empty, this transform standardizes them to shape (0, 4) or (0, 6).
Example
data = {"boxes": torch.ones(0,), "image": torch.ones(1, 128, 128, 128)} box_converter = StandardizeEmptyBoxd(box_keys=["boxes"], box_ref_image_keys="image") box_converter(data)
- __init__(box_keys, box_ref_image_keys, allow_missing_keys=False)[source]#
- Parameters:
box_keys (Union[Collection[Hashable], Hashable]) – Keys to pick data for transformation.
box_ref_image_keys (str) – The single key that represents the reference image to which box_keys are attached.
allow_missing_keys (bool) – don’t raise exception if key is missing.
See also
monai.apps.detection.transforms.array.ConvertBoxToStandardMode
- class monai.apps.detection.transforms.dictionary.ZoomBoxd(image_keys, box_keys, box_ref_image_keys, zoom, mode=area, padding_mode=edge, align_corners=None, keep_size=True, allow_missing_keys=False, **kwargs)[source]#
Dictionary-based transform that zooms input boxes and images with the given zoom scale.
- Parameters:
image_keys – Keys to pick image data for transformation.
box_keys – Keys to pick box data for transformation. The box mode is assumed to be
StandardMode
.box_ref_image_keys – Keys that represent the reference images to which
box_keys
are attached.zoom – The zoom factor along the spatial axes. If a float, zoom is the same for each spatial axis. If a sequence, zoom should contain one value for each spatial axis.
mode – {
"nearest"
,"nearest-exact"
,"linear"
,"bilinear"
,"bicubic"
,"trilinear"
,"area"
} The interpolation mode. Defaults to"area"
. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of string, each element corresponds to a key inkeys
.padding_mode – available modes for numpy array:{
"constant"
,"edge"
,"linear_ramp"
,"maximum"
,"mean"
,"median"
,"minimum"
,"reflect"
,"symmetric"
,"wrap"
,"empty"
} available modes for PyTorch Tensor: {"constant"
,"reflect"
,"replicate"
,"circular"
}. One of the listed string values or a user supplied function. Defaults to"constant"
. The mode to pad data after zooming. See also: https://numpy.org/doc/1.18/reference/generated/numpy.pad.html https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.htmlalign_corners – This only has an effect when mode is ‘linear’, ‘bilinear’, ‘bicubic’ or ‘trilinear’. Default: None. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.interpolate.html It also can be a sequence of bool or None, each element corresponds to a key in
keys
.keep_size – Should keep original size (pad if needed), default is True.
allow_missing_keys – don’t raise exception if key is missing.
kwargs – other arguments for the np.pad or torch.pad function. note that np.pad treats channel dimension as the first dimension.
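A minimal usage sketch (illustrative keys and values, not part of the original docstring):
import torch
from monai.apps.detection.transforms.dictionary import ZoomBoxd

data = {
    "image": torch.randn(1, 64, 64, 64),
    "boxes": torch.tensor([[8.0, 8.0, 8.0, 24.0, 24.0, 24.0]]),  # StandardMode
}
zoomer = ZoomBoxd(
    image_keys="image", box_keys="boxes", box_ref_image_keys="image",
    zoom=1.5, keep_size=True,
)
data = zoomer(data)  # image and boxes are zoomed consistently; keep_size pads/crops back to the original size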
Anchor#
This script is adapted from pytorch/vision
- class monai.apps.detection.utils.anchor_utils.AnchorGenerator(sizes=((20, 30, 40),), aspect_ratios=(((0.5, 1), (1, 0.5)),), indexing='ij')[source]#
This module is modified from torchvision to support both 2D and 3D images.
Module that generates anchors for a set of feature maps and image sizes.
The module supports computing anchors at multiple sizes and aspect ratios per feature map.
sizes and aspect_ratios should have the same number of elements, and it should correspond to the number of feature maps.
sizes[i] and aspect_ratios[i] can have an arbitrary number of elements. For 2D images, anchor width and height w:h = 1:aspect_ratios[i,j] For 3D images, anchor width, height, and depth w:h:d = 1:aspect_ratios[i,j,0]:aspect_ratios[i,j,1]
AnchorGenerator will output a set of sizes[i] * aspect_ratios[i] anchors per spatial location for feature map i.
- Parameters:
sizes (Sequence[Sequence[int]]) – base size of each anchor. len(sizes) is the number of feature maps, i.e., the number of output levels for the feature pyramid network (FPN). Each element of sizes is a Sequence which represents several anchor sizes for each feature map.
aspect_ratios (Sequence) – the aspect ratios of anchors. len(aspect_ratios) = len(sizes). For 2D images, each element of aspect_ratios[i] is a Sequence of float. For 3D images, each element of aspect_ratios[i] is a Sequence of 2 value Sequence.
indexing (str) – choose from {'ij', 'xy'}, optional, Matrix ('ij', default and recommended) or Cartesian ('xy') indexing of output.
Matrix ('ij', default and recommended) indexing keeps the original axis not changed.
To use other monai detection components, please set indexing = 'ij'.
Cartesian ('xy') indexing swaps axis 0 and 1.
For 2D cases, monai AnchorGenerator(sizes, aspect_ratios, indexing='xy') and torchvision.models.detection.anchor_utils.AnchorGenerator(sizes, aspect_ratios) are equivalent.
- Reference:.
Example
# 2D example inputs for a 2-level feature maps
sizes = ((10, 12, 14, 16), (20, 24, 28, 32))
base_aspect_ratios = (1., 0.5, 2.)
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)

# 3D example inputs for a 2-level feature maps
sizes = ((10, 12, 14, 16), (20, 24, 28, 32))
base_aspect_ratios = ((1., 1.), (1., 0.5), (0.5, 1.), (2., 2.))
aspect_ratios = (base_aspect_ratios, base_aspect_ratios)
anchor_generator = AnchorGenerator(sizes, aspect_ratios)
- forward(images, feature_maps)[source]#
Generate anchor boxes for each image.
- Parameters:
images (
Tensor
) – sized (B, C, W, H) or (B, C, W, H, D)feature_maps (
list
[Tensor
]) – for FPN level i, feature_maps[i] is sized (B, C_i, W_i, H_i) or (B, C_i, W_i, H_i, D_i). This input argument does not have to be the actual feature maps. Any list variable with the same (C_i, W_i, H_i) or (C_i, W_i, H_i, D_i) as feature maps works.
- Return type:
list
[Tensor
]- Returns:
A list with length of B. Each element represents the anchors for this image. The B elements are identical.
Example
images = torch.zeros((3, 1, 128, 128, 128))
feature_maps = [torch.zeros((3, 6, 64, 64, 32)), torch.zeros((3, 6, 32, 32, 16))]
anchor_generator(images, feature_maps)
- generate_anchors(scales, aspect_ratios, dtype=torch.float32, device=None)[source]#
Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.
- Parameters:
scales – a sequence which represents several anchor sizes for the current feature map.
aspect_ratios – a sequence which represents several aspect_ratios for the current feature map. For 2D images, it is a Sequence of float aspect_ratios[j], anchor width and height w:h = 1:aspect_ratios[j]. For 3D images, it is a Sequence of 2 value Sequence aspect_ratios[j,0] and aspect_ratios[j,1], anchor width, height, and depth w:h:d = 1:aspect_ratios[j,0]:aspect_ratios[j,1]
dtype – target data type of the output Tensor.
device – target device to put the output Tensor data.
Returns – For each s in scales, returns [s, s*aspect_ratios[j]] for 2D images, and [s, s*aspect_ratios[j,0],s*aspect_ratios[j,1]] for 3D images.
- grid_anchors(grid_sizes, strides)[source]#
Every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:spatial_dims) corresponds to a feature map. It outputs g[i] anchors that are s[i] distance apart in direction i, with the same dimensions as a.
- Parameters:
grid_sizes (
list
[list
[int
]]) – spatial size of the feature mapsstrides (
list
[list
[Tensor
]]) – strides of the feature maps regarding to the original image
Example
grid_sizes = [[100, 100], [50, 50]]
strides = [[torch.tensor(2), torch.tensor(2)], [torch.tensor(4), torch.tensor(4)]]
- Return type:
list
[Tensor
]
- class monai.apps.detection.utils.anchor_utils.AnchorGeneratorWithAnchorShape(feature_map_scales=(1, 2, 4, 8), base_anchor_shapes=((32, 32, 32), (48, 20, 20), (20, 48, 20), (20, 20, 48)), indexing='ij')[source]#
Module that generates anchors for a set of feature maps and image sizes, inherited from
AnchorGenerator
The module supports computing anchors at multiple base anchor shapes per feature map.
feature_map_scales
should have the same number of elements with the number of feature maps.base_anchor_shapes can have an arbitrary number of elements. For 2D images, each element represents anchor width and height [w,h]. For 2D images, each element represents anchor width, height, and depth [w,h,d].
AnchorGenerator will output a set of
len(base_anchor_shapes)
anchors per spatial location for feature mapi
.- Parameters:
feature_map_scales – scale of anchors for each feature map, i.e., each output level of the feature pyramid network (FPN).
len(feature_map_scales)
is the number of feature maps.scale[i]*base_anchor_shapes
represents the anchor shapes for feature mapi
.base_anchor_shapes – a sequence which represents several anchor shapes for one feature map. For N-D images, it is a Sequence of N value Sequence.
indexing – choose from {‘xy’, ‘ij’}, optional Cartesian (‘xy’) or matrix (‘ij’, default) indexing of output. Cartesian (‘xy’) indexing swaps axis 0 and 1, which is the setting inside torchvision. matrix (‘ij’, default) indexing keeps the original axis not changed. See also indexing in https://pytorch.org/docs/stable/generated/torch.meshgrid.html
Example
# 2D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10), (6, 12), (12, 6))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)

# 3D example inputs for a 2-level feature maps
feature_map_scales = (1, 2)
base_anchor_shapes = ((10, 10, 10), (12, 12, 8), (10, 10, 6), (16, 16, 10))
anchor_generator = AnchorGeneratorWithAnchorShape(feature_map_scales, base_anchor_shapes)
- static generate_anchors_using_shape(anchor_shapes, dtype=torch.float32, device=None)[source]#
Compute cell anchor shapes at multiple sizes and aspect ratios for the current feature map.
- Parameters:
anchor_shapes – [w, h] or [w, h, d], sized (N, spatial_dims), represents N anchor shapes for the current feature map.
dtype – target data type of the output Tensor.
device – target device to put the output Tensor data.
- Returns:
For 2D images, returns [-w/2, -h/2, w/2, h/2]; For 3D images, returns [-w/2, -h/2, -d/2, w/2, h/2, d/2]
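A minimal usage sketch of the static method (illustrative; the expected output follows directly from the description above):
import torch
from monai.apps.detection.utils.anchor_utils import AnchorGeneratorWithAnchorShape

anchor_shapes = torch.tensor([[10.0, 10.0, 8.0]])   # one 3D anchor shape [w, h, d]
cell_anchors = AnchorGeneratorWithAnchorShape.generate_anchors_using_shape(anchor_shapes)
# expected result: tensor([[-5., -5., -4., 5., 5., 4.]])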
Matcher#
The functions in this script are adapted from nnDetection, MIC-DKFZ/nnDetection which is adapted from torchvision.
These are the changes compared with nnDetection: 1) comments and docstrings; 2) reformat; 3) add a debug option to ATSSMatcher to help the users tune parameters; 4) add a corner case return in ATSSMatcher.compute_matches; 5) add support for float16 cpu.
- class monai.apps.detection.utils.ATSS_matcher.ATSSMatcher(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#
- __init__(num_candidates=4, similarity_fn=<function box_iou>, center_in_gt=True, debug=False)[source]#
Compute matching based on ATSS https://arxiv.org/abs/1912.02424 Bridging the Gap Between Anchor-based and Anchor-free Detection via Adaptive Training Sample Selection
- Parameters:
num_candidates (int) – number of positions to select candidates from. Smaller value will result in a higher matcher threshold and less matched candidates.
similarity_fn (Callable[Tensor, Tensor, Tensor]) – function for similarity computation between boxes and anchors.
center_in_gt (bool) – If False, matched anchor center points do not need to lie within the ground truth box; False is recommended for small objects. If True (the default shown in the signature), will result in a strict matcher and less matched candidates.
debug (bool) – if True, will print the matcher threshold in order to tune num_candidates and center_in_gt.
- compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#
Compute matches according to ATSS for a single image. Adapted from sfzhang15/ATSS.
- Parameters:
boxes (
Tensor
) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to beStandardMode
anchors (
Tensor
) – anchors to match Mx4 or Mx6, also assumed to beStandardMode
.num_anchors_per_level (
Sequence
[int
]) – number of anchors per feature pyramid levelnum_anchors_per_loc (
int
) – number of anchors per position
- Return type:
tuple
[Tensor
,Tensor
]- Returns:
matrix which contains the similarity from each box to each anchor [N, M]
vector which contains the matched box index for all anchors (BELOW_LOW_THRESHOLD is used for background anchors; BETWEEN_THRESHOLDS is used for anchors that should be ignored) [M]
Note
StandardMode
=CornerCornerModeTypeA
, also represented as “xyxy” ([xmin, ymin, xmax, ymax]) for 2D and “xyzxyz” ([xmin, ymin, zmin, xmax, ymax, zmax]) for 3D.
- class monai.apps.detection.utils.ATSS_matcher.Matcher(similarity_fn=<function box_iou>)[source]#
Base class of Matcher, which matches boxes and anchors to each other
- Parameters:
similarity_fn (
Callable
[Tensor
,Tensor
,Tensor
]) – function for similarity computation between boxes and anchors
- abstract compute_matches(boxes, anchors, num_anchors_per_level, num_anchors_per_loc)[source]#
Compute matches
- Parameters:
boxes (
Tensor
) – bounding boxes, Nx4 or Nx6 torch tensor or ndarray. The box mode is assumed to beStandardMode
anchors (
Tensor
) – anchors to match Mx4 or Mx6, also assumed to beStandardMode
.num_anchors_per_level (
Sequence
[int
]) – number of anchors per feature pyramid levelnum_anchors_per_loc (
int
) – number of anchors per position
- Return type:
tuple
[Tensor
,Tensor
]- Returns:
matrix which contains the similarity from each box to each anchor [N, M]
vector which contains the matched box index for all anchors (BELOW_LOW_THRESHOLD is used for background anchors; BETWEEN_THRESHOLDS is used for anchors that should be ignored) [M]
Box coder#
This script is modified from torchvision to support N-D images.
- class monai.apps.detection.utils.box_coder.BoxCoder(weights, boxes_xform_clip=None)[source]#
This class encodes and decodes a set of bounding boxes into the representation used for training the regressors.
- Parameters:
weights – 4-element tuple or 6-element tuple
boxes_xform_clip – high threshold to prevent sending too large values into torch.exp()
Example
box_coder = BoxCoder(weights=[1., 1., 1., 1., 1., 1.])
gt_boxes = torch.tensor([[1, 2, 1, 4, 5, 6], [1, 3, 2, 7, 8, 9]])
proposals = gt_boxes + torch.rand(gt_boxes.shape)
rel_gt_boxes = box_coder.encode_single(gt_boxes, proposals)
gt_back = box_coder.decode_single(rel_gt_boxes, proposals)
# We expect gt_back to be equal to gt_boxes
- decode(rel_codes, reference_boxes)[source]#
From a set of original reference_boxes and encoded relative box offsets, get the decoded boxes.
- Parameters:
rel_codes (
Tensor
) – encoded boxes, Nx4 or Nx6 torch tensor.reference_boxes (
Sequence
[Tensor
]) – a list of reference boxes, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to beStandardMode
- Return type:
Tensor
- Returns:
decoded boxes, Nx1x4 or Nx1x6 torch tensor. The box mode will be
StandardMode
- decode_single(rel_codes, reference_boxes)[source]#
From a set of original boxes and encoded relative box offsets, get the decoded boxes.
- Parameters:
rel_codes (
Tensor
) – encoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor.reference_boxes (
Tensor
) – reference boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to beStandardMode
- Return type:
Tensor
- Returns:
decoded boxes, Nx(4*num_box_reg) or Nx(6*num_box_reg) torch tensor. The box mode will be
StandardMode
- encode(gt_boxes, proposals)[source]#
Encode a set of proposals with respect to some ground truth (gt) boxes.
- Parameters:
gt_boxes (
Sequence
[Tensor
]) – list of gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to beStandardMode
proposals (
Sequence
[Tensor
]) – list of boxes to be encoded, each element is Mx4 or Mx6 torch tensor. The box mode is assumed to beStandardMode
- Return type:
tuple
[Tensor
]- Returns:
- A tuple of encoded gt, target of box regression that is used to
convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
- encode_single(gt_boxes, proposals)[source]#
Encode proposals with respect to ground truth (gt) boxes.
- Parameters:
gt_boxes (
Tensor
) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to beStandardMode
proposals (
Tensor
) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to beStandardMode
- Return type:
Tensor
- Returns:
encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
- monai.apps.detection.utils.box_coder.encode_boxes(gt_boxes, proposals, weights)[source]#
Encode a set of proposals with respect to some reference ground truth (gt) boxes.
- Parameters:
gt_boxes (
Tensor
) – gt boxes, Nx4 or Nx6 torch tensor. The box mode is assumed to beStandardMode
proposals (
Tensor
) – boxes to be encoded, Nx4 or Nx6 torch tensor. The box mode is assumed to beStandardMode
weights (
Tensor
) – the weights for(cx, cy, w, h) or (cx,cy,cz, w,h,d)
- Return type:
Tensor
- Returns:
encoded gt, target of box regression that is used to convert proposals into gt_boxes, Nx4 or Nx6 torch tensor.
Detection Utilities#
- monai.apps.detection.utils.detector_utils.check_input_images(input_images, spatial_dims)[source]#
Validate the input dimensionality (raise a ValueError if invalid).
- Parameters:
input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims – number of spatial dimensions of the images, 2 or 3.
- monai.apps.detection.utils.detector_utils.check_training_targets(input_images, targets, spatial_dims, target_label_key, target_box_key)[source]#
Validate the input images/targets during training (raise a ValueError if invalid).
- Parameters:
input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
targets – a list of dict. Each dict with two keys: target_box_key and target_label_key, ground-truth boxes present in the image.
spatial_dims – number of spatial dimensions of the images, 2 or 3.
target_label_key – the expected key of target labels.
target_box_key – the expected key of target boxes.
- monai.apps.detection.utils.detector_utils.pad_images(input_images, spatial_dims, size_divisible, mode=constant, **kwargs)[source]#
Pad the input images, so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0
- Parameters:
input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims – number of spatial dimensions of the images, 2D or 3D.
size_divisible – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.
mode – available modes for PyTorch Tensor: {
"constant"
,"reflect"
,"replicate"
,"circular"
}. One of the listed string values or a user supplied function. Defaults to"constant"
. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.htmlkwargs – other arguments for torch.pad function.
- Returns:
images, a (B, C, H, W) or (B, C, H, W, D) Tensor
image_sizes, the original spatial size of each image
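A minimal usage sketch (illustrative sizes, not part of the original docstring):
import torch
from monai.apps.detection.utils.detector_utils import pad_images

# two 2D images with different spatial sizes, each channel-first
images = [torch.randn(1, 100, 120), torch.randn(1, 90, 80)]
padded, image_sizes = pad_images(images, spatial_dims=2, size_divisible=32)
# padded is a single (B, C, H, W) tensor whose H and W are divisible by 32;
# image_sizes keeps the original spatial size of each input image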
- monai.apps.detection.utils.detector_utils.preprocess_images(input_images, spatial_dims, size_divisible, mode=constant, **kwargs)[source]#
Preprocess the input images, including
validation of the inputs
pad the inputs so that the output spatial sizes are divisible by size_divisible. It pads them at the end to create a (B, C, H, W) or (B, C, H, W, D) Tensor. Padded size (H, W) or (H, W, D) is divisible by size_divisible. Default padding uses constant padding with value 0.0
- Parameters:
input_images – It can be 1) a tensor sized (B, C, H, W) or (B, C, H, W, D), or 2) a list of image tensors, each image i may have different size (C, H_i, W_i) or (C, H_i, W_i, D_i).
spatial_dims – number of spatial dimensions of the images, 2 or 3.
size_divisible – int or Sequence[int], is the expected pattern on the input image shape. If an int, the same size_divisible will be applied to all the input spatial dimensions.
mode – available modes for PyTorch Tensor: {
"constant"
,"reflect"
,"replicate"
,"circular"
}. One of the listed string values or a user supplied function. Defaults to"constant"
. See also: https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.htmlkwargs – other arguments for torch.pad function.
- Returns:
images, a (B, C, H, W) or (B, C, H, W, D) Tensor
image_sizes, the original spatial size of each image
- monai.apps.detection.utils.predict_utils.check_dict_values_same_length(head_outputs, keys=None)[source]#
We expect the values in head_outputs: Dict[str, List[Tensor]] to have the same length. Will raise ValueError if not.
- Parameters:
head_outputs – a Dict[str, List[Tensor]] or Dict[str, Tensor]
keys – the keys in head_output that need to have values (List) with same length. If not provided, will use head_outputs.keys().
- monai.apps.detection.utils.predict_utils.ensure_dict_value_to_list_(head_outputs, keys=None)[source]#
An in-place function. We expect head_outputs to be Dict[str, List[Tensor]]. If it is Dict[str, Tensor], this func converts it to Dict[str, List[Tensor]]. It will be modified in-place.
- Parameters:
head_outputs – a Dict[str, List[Tensor]] or Dict[str, Tensor], will be modified in-place
keys – the keys in head_output that need to have value type List[Tensor]. If not provided, will use head_outputs.keys().
- monai.apps.detection.utils.predict_utils.predict_with_inferer(images, network, keys, inferer=None)[source]#
Predict network dict output with an inferer. Compared with directly output network(images), it enables a sliding window inferer that can be used to handle large inputs.
- Parameters:
images – input of the network, Tensor sized (B, C, H, W) or (B, C, H, W, D)
network – a network that takes an image Tensor sized (B, C, H, W) or (B, C, H, W, D) as input and outputs a dictionary Dict[str, List[Tensor]] or Dict[str, Tensor].
keys – the keys in the output dict, should be network output keys or a subset of them.
inferer – a SlidingWindowInferer to handle large inputs.
- Returns:
The predicted head_output from network, a Dict[str, List[Tensor]]
Example
# define a naive network
import torch
import monai

class NaiveNet(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, images: torch.Tensor):
        return {"cls": torch.randn(images.shape), "box_reg": [torch.randn(images.shape)]}

# create a predictor
network = NaiveNet()
inferer = monai.inferers.SlidingWindowInferer(
    roi_size=(128, 128, 128),
    overlap=0.25,
    cache_roi_weight_map=True,
)
network_output_keys = ["cls", "box_reg"]
images = torch.randn((2, 3, 512, 512, 512))  # a large input
head_outputs = predict_with_inferer(images, network, network_output_keys, inferer)
Inference box selector#
Part of this script is adapted from pytorch/vision
- class monai.apps.detection.utils.box_selector.BoxSelector(box_overlap_metric=<function box_iou>, apply_sigmoid=True, score_thresh=0.05, topk_candidates_per_level=1000, nms_thresh=0.5, detections_per_img=300)[source]#
Box selector which selects the predicted boxes. The box selection is performed with the following steps:
For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.
- Parameters:
apply_sigmoid (bool) – whether to apply sigmoid to get scores from classification logits
score_thresh (float) – no box with scores less than score_thresh will be kept
topk_candidates_per_level (int) – max number of boxes to keep for each level
nms_thresh (float) – box overlapping threshold for NMS
detections_per_img (int) – max number of boxes to keep for each image
Example
input_param = {
    "apply_sigmoid": True,
    "score_thresh": 0.1,
    "topk_candidates_per_level": 2,
    "nms_thresh": 0.1,
    "detections_per_img": 5,
}
box_selector = BoxSelector(**input_param)
boxes = [torch.randn([3, 6]), torch.randn([7, 6])]
logits = [torch.randn([3, 3]), torch.randn([7, 3])]
spatial_size = (8, 8, 8)
selected_boxes, selected_scores, selected_labels = box_selector.select_boxes_per_image(
    boxes, logits, spatial_size
)
- select_boxes_per_image(boxes_list, logits_list, spatial_size)[source]#
Postprocessing to generate detection result from classification logits and boxes.
The box selection is performed with the following steps:
For each level, discard boxes with scores less than self.score_thresh.
For each level, keep boxes with top self.topk_candidates_per_level scores.
For the whole image, perform non-maximum suppression (NMS) on boxes, with overlapping threshold nms_thresh.
For the whole image, keep boxes with top self.detections_per_img scores.
- Parameters:
boxes_list – list of predicted boxes from a single image, each element i is a Tensor sized (N_i, 2*spatial_dims)
logits_list – list of predicted classification logits from a single image, each element i is a Tensor sized (N_i, num_classes)
spatial_size – spatial size of the image
- Returns:
selected boxes, Tensor sized (P, 2*spatial_dims)
selected_scores, Tensor sized (P, )
selected_labels, Tensor sized (P, )
- select_top_score_idx_per_level(logits)[source]#
Select indices with highest scores.
The indices selection is performed with the following steps:
If self.apply_sigmoid, get scores by applying sigmoid to logits. Otherwise, use logits as scores.
Discard indices with scores less than self.score_thresh
Keep indices with top self.topk_candidates_per_level scores
- Parameters:
logits (Tensor) – predicted classification logits, Tensor sized (N, num_classes)
- Returns:
topk_idxs: selected M indices, Tensor sized (M, )
selected_scores: selected M scores, Tensor sized (M, )
selected_labels: selected M labels, Tensor sized (M, )
- Return type:
topk_idxs
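A minimal sketch of calling this method on random logits; the three returned values follow the order documented above, and the BoxSelector settings are illustrative:
import torch
from monai.apps.detection.utils.box_selector import BoxSelector

selector = BoxSelector(score_thresh=0.05, topk_candidates_per_level=10)
logits = torch.randn(50, 3)  # 50 candidate boxes, 3 classes
topk_idxs, selected_scores, selected_labels = selector.select_top_score_idx_per_level(logits)
print(topk_idxs.shape, selected_scores.shape, selected_labels.shape)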
Detection metrics#
This script is almost the same as MIC-DKFZ/nnDetection. The changes include 1) code reformatting and 2) docstrings.
This script is almost the same as MIC-DKFZ/nnDetection. The changes include 1) code reformatting, 2) docstrings, and 3) allowing the input arg gt_ignore to be optional (if not given, no GT boxes will be ignored).
- monai.apps.detection.metrics.matching.matching_batch(iou_fn, iou_thresholds, pred_boxes, pred_classes, pred_scores, gt_boxes, gt_classes, gt_ignore=None, max_detections=100)[source]#
Match boxes of a batch to corresponding ground truth for each category independently.
- Parameters:
iou_fn – compute overlap for each pair
iou_thresholds – defines which IoU thresholds should be evaluated
pred_boxes – predicted boxes from single batch; List[[D, dim * 2]], D number of predictions
pred_classes – predicted classes from a single batch; List[[D]], D number of predictions
pred_scores – predicted score for each bounding box; List[[D]], D number of predictions
gt_boxes – ground truth boxes; List[[G, dim * 2]], G number of ground truth
gt_classes – ground truth classes; List[[G]], G number of ground truth
gt_ignore – specifies which ground truth boxes should not be counted as true positives. If not given, all gt_boxes are used. (Detections which match these boxes are not counted as false positives either); List[[G]], G number of ground truth
max_detections – maximum number of detections which should be evaluated
- Returns:
List[Dict[int, Dict[str, np.ndarray]]], each Dict[str, np.ndarray] corresponds to an image. Dict has the following keys.
dtMatches: matched detections [T, D], where T = number of thresholds, D = number of detections
gtMatches: matched ground truth boxes [T, G], where T = number of thresholds, G = number of ground truth
dtScores: prediction scores [D] detection scores
gtIgnore: ground truth boxes which should be ignored [G] indicate whether ground truth should be ignored
dtIgnore: detections which should be ignored [T, D], indicate which detections should be ignored
Example
import torch
from monai.data.box_utils import box_iou
from monai.apps.detection.metrics.coco import COCOMetric
from monai.apps.detection.metrics.matching import matching_batch

# 3D example outputs of one image from detector
val_outputs_all = [
    {"boxes": torch.tensor([[1, 1, 1, 3, 4, 5]], dtype=torch.float16),
     "labels": torch.randint(3, (1,)),
     "scores": torch.randn((1,)).absolute()},
]
val_targets_all = [
    {"boxes": torch.tensor([[1, 1, 1, 2, 6, 4]], dtype=torch.float16),
     "labels": torch.randint(3, (1,))},
]

coco_metric = COCOMetric(classes=['c0', 'c1', 'c2'], iou_list=[0.1], max_detection=[10])
results_metric = matching_batch(
    iou_fn=box_iou,
    iou_thresholds=coco_metric.iou_thresholds,
    pred_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_outputs_all],
    pred_classes=[val_data_i["labels"].numpy() for val_data_i in val_outputs_all],
    pred_scores=[val_data_i["scores"].numpy() for val_data_i in val_outputs_all],
    gt_boxes=[val_data_i["boxes"].numpy() for val_data_i in val_targets_all],
    gt_classes=[val_data_i["labels"].numpy() for val_data_i in val_targets_all],
)
val_metric_dict = coco_metric(results_metric)
print(val_metric_dict)
Reconstruction#
FastMRIReader#
- class monai.apps.reconstruction.fastmri_reader.FastMRIReader(*args, **kwargs)[source]#
Load fastMRI files with ‘.h5’ suffix. fastMRI files, when loaded with “h5py”, are HDF5 dictionary-like datasets. The keys are:
kspace: contains the fully-sampled kspace
- reconstruction_rss: contains the root sum of squares of ifft of kspace. This
is the ground-truth image.
It also has several attributes with the following keys:
acquisition (str): acquisition mode of the data (e.g., AXT2 denotes T2 brain MRI scans)
max (float): dynamic range of the data
norm (float): norm of the kspace
patient_id (str): the patient’s id whose measurements were recorded
- get_data(dat)[source]#
Extract data array and metadata from the loaded data and return them. This function returns two objects, first is numpy array of image data, second is dict of metadata.
- Parameters:
dat (dict) – a dictionary loaded from an h5 file
- Return type:
tuple[ndarray, dict]
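A minimal usage sketch, assuming a local fastMRI HDF5 file is available (the file name below is a placeholder):
from monai.apps.reconstruction.fastmri_reader import FastMRIReader

reader = FastMRIReader()
dat = reader.read("file1.h5")      # placeholder path to a fastMRI '.h5' file
img, meta = reader.get_data(dat)   # kspace array and metadata dict
print(img.shape, meta.keys())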
ConvertToTensorComplex#
- monai.apps.reconstruction.complex_utils.convert_to_tensor_complex(data, dtype=None, device=None, wrap_sequence=True, track_meta=False)[source]#
Convert complex-valued data to a 2-channel PyTorch tensor. The real and imaginary parts are stacked along the last dimension. This function relies on ‘monai.utils.type_conversion.convert_to_tensor’
- Parameters:
data – input data can be PyTorch Tensor, numpy array, list, int, and float. will convert Tensor, Numpy array, float, int, bool to Tensor, strings and objects keep the original. for list, convert every item to a Tensor if applicable.
dtype – target data type when converting to Tensor.
device – target device to put the converted Tensor data.
wrap_sequence – if False, then lists will recursively call this function. E.g., [1, 2] -> [tensor(1), tensor(2)]. If True, then [1, 2] -> tensor([1, 2]).
track_meta – whether to track the meta information, if True, will convert to MetaTensor. default to False.
- Returns:
PyTorch version of the data
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import convert_to_tensor_complex

data = np.array([[1 + 1j, 1 - 1j], [2 + 2j, 2 - 2j]])
# the following line prints (2, 2)
print(data.shape)
# the following line prints torch.Size([2, 2, 2])
print(convert_to_tensor_complex(data).shape)
ComplexAbs#
- monai.apps.reconstruction.complex_utils.complex_abs(x)[source]#
Compute the absolute value of a complex array.
- Parameters:
x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
- Return type:
Union[ndarray, Tensor]
- Returns:
Absolute value along the last dimension
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import complex_abs

x = np.array([3, 4])[np.newaxis]
# the following line prints 5
print(complex_abs(x))
RootSumOfSquares#
- monai.apps.reconstruction.mri_utils.root_sum_of_squares(x, spatial_dim)[source]#
Compute the root sum of squares (rss) of the data (typically done for multi-coil MRI samples)
- Parameters:
x (Union[ndarray, Tensor]) – Input array/tensor
spatial_dim (int) – dimension along which rss is applied
- Return type:
Union[ndarray, Tensor]
- Returns:
rss of x along spatial_dim
Example
import numpy as np
from monai.apps.reconstruction.mri_utils import root_sum_of_squares

x = np.ones([2, 3])
# the following line prints array([1.41421356, 1.41421356, 1.41421356])
print(root_sum_of_squares(x, spatial_dim=0))
ComplexMul#
- monai.apps.reconstruction.complex_utils.complex_mul(x, y)[source]#
Compute complex-valued multiplication. Supports Ndim inputs with last dim equal to 2 (real/imaginary channels)
- Parameters:
x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
y (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
- Return type:
Union[ndarray, Tensor]
- Returns:
Complex multiplication of x and y
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import complex_mul

x = np.array([[1, 2], [3, 4]])
y = np.array([[1, 1], [1, 1]])
# the following line prints array([[-1, 3], [-1, 7]])
print(complex_mul(x, y))
ComplexConj#
- monai.apps.reconstruction.complex_utils.complex_conj(x)[source]#
Compute the complex conjugate of an array/tensor. Supports N-dim inputs with the last dim equal to 2 (real/imaginary channels).
- Parameters:
x (Union[ndarray, Tensor]) – Input array/tensor with 2 channels in the last dimension representing real and imaginary parts.
- Return type:
Union[ndarray, Tensor]
- Returns:
Complex conjugate of x
Example
import numpy as np
from monai.apps.reconstruction.complex_utils import complex_conj

x = np.array([[1, 2], [3, 4]])
# the following line prints array([[ 1, -2], [ 3, -4]])
print(complex_conj(x))
Vista3d#
- monai.apps.vista3d.inferer.point_based_window_inferer(inputs, roi_size, predictor, point_coords, point_labels, class_vector=None, prompt_class=None, prev_mask=None, point_start=0, center_only=True, margin=5, **kwargs)[source]#
Point-based window inferer that takes an input image, a set of points, and a model, and returns a segmented image. The inferer crops the input image into patches centered at the given points, runs patch inference, stitches the outputs by averaging, and finally returns the segmented mask.
- Parameters:
inputs – [1CHWD], input image to be processed.
roi_size – the spatial window size for inference. If any component is None or non-positive, the corresponding dimension of the input image will be used instead. For example, roi_size=(32, -1) will be adapted to (32, 64) if the second spatial dimension size of the image is 64.
sw_batch_size – the batch size to run window slices.
predictor – the model. For vista3D, the output is [B, 1, H, W, D] which needs to be transposed to [1, B, H, W, D]. Add transpose=True in kwargs for vista3d.
point_coords – [B, N, 3]. Point coordinates for B foreground objects, each has N points.
point_labels – [B, N]. Point labels. 0/1 means negative/positive points for regular supported or zero-shot classes. 2/3 means negative/positive points for special supported classes (e.g. tumor, vessel).
class_vector – [B]. Used for class-head automatic segmentation. Can be None value.
prompt_class – [B]. The same as class_vector representing the point class and inform point head about supported class or zeroshot, not used for automatic segmentation. If None, point head is default to supported class segmentation.
prev_mask – [1, B, H, W, D]. The value is before sigmoid. An optional tensor of previously segmented masks.
point_start – only use points starting from this index. All points before this index are used to generate prev_mask. This avoids re-computing points from previous iterations when prev_mask is given.
center_only – for each point, only crop the patch centered at this point. If false, crop 3 patches for each point.
margin – if center_only is False, this value is the distance from the point to the patch boundary.
- Returns:
[1, B, H, W, D]. The value is before sigmoid.
- Return type:
stitched_output
Notice: The function only supports SINGLE OBJECT INFERENCE with B=1.
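A minimal sketch of single-object inference, assuming a trained VISTA-3D network is available as model (the image, point coordinates, and roi_size below are placeholders):
import torch
from monai.apps.vista3d.inferer import point_based_window_inferer

image = torch.randn(1, 1, 128, 128, 128)        # [1, C, H, W, D] input volume
point_coords = torch.tensor([[[64, 64, 64]]])   # one foreground object with one click
point_labels = torch.tensor([[1]])              # 1 = positive point
logits = point_based_window_inferer(
    inputs=image,
    roi_size=(96, 96, 96),
    predictor=model,            # placeholder: a trained VISTA-3D network
    point_coords=point_coords,
    point_labels=point_labels,
    transpose=True,             # per the docstring, vista3d outputs need transposing
)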
- class monai.apps.vista3d.transforms.VistaPreTransformd(keys, allow_missing_keys=False, special_index=(25, 26, 27, 28, 29, 117), labels_dict=None, subclass=None)[source]#
- __init__(keys, allow_missing_keys=False, special_index=(25, 26, 27, 28, 29, 117), labels_dict=None, subclass=None)[source]#
Pre-transform for Vista3d.
It performs two functionalities:
If label prompt shows the points belong to special class (defined by special index, e.g. tumors, vessels), convert point labels from 0 (negative), 1 (positive) to special 2 (negative), 3 (positive).
If label prompt is within the keys in subclass, convert the label prompt to its subclasses defined by subclass[key]. e.g. “lung” label is converted to [“left lung”, “right lung”].
Here label_prompt is a list of length B containing int values, and point_labels is a list of length B, where each element is a list of N int point labels.
- Parameters:
keys – keys of the corresponding items to be transformed.
special_index – the index that defines the special class.
subclass – a dictionary that maps a label prompt to its subclasses.
allow_missing_keys – don’t raise exception if key is missing.
- class monai.apps.vista3d.transforms.VistaPostTransformd(keys, allow_missing_keys=False)[source]#
- __init__(keys, allow_missing_keys=False)[source]#
Post-transform for Vista3d. It converts the model output logits into final segmentation masks. If label_prompt is None, the output will be thresholded to sequential indexes [0, 1, 2, …]; otherwise the indexes will be [0, label_prompt[0], label_prompt[1], …]. If label_prompt is None while points are provided, the model will perform postprocessing to remove regions that do not contain positive points.
- Parameters:
keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.
dataset_transforms – a dictionary that specifies the transforms for the corresponding dataset: key: dataset name, value: list of data transforms.
dataset_key – key to get the dataset name from the data dictionary, default to “dataset_name”.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- class monai.apps.vista3d.transforms.Relabeld(keys, label_mappings, dtype=<class 'numpy.int16'>, dataset_key='dataset_name', allow_missing_keys=False)[source]#
- __init__(keys, label_mappings, dtype=<class 'numpy.int16'>, dataset_key='dataset_name', allow_missing_keys=False)[source]#
Remap the voxel labels in the input data dictionary based on the specified mapping.
This list of local -> global label mappings will be applied to each input data[keys]. If data[dataset_key] is not in label_mappings, label_mappings['default'] will be used. If label_mappings[data[dataset_key]] is None, no relabeling will be performed.
- Parameters:
keys (Union[Collection[Hashable], Hashable]) – keys of the corresponding items to be transformed.
label_mappings (dict[str, list[tuple[int, int]]]) – a dictionary that specifies how local dataset class indices are mapped to the global class indices. The dictionary keys are dataset names and the values are lists of (local label, global label) pairs. This list of local -> global label mappings will be applied to each input data[keys]. If data[dataset_key] is not in label_mappings, label_mappings['default'] will be used. If label_mappings[data[dataset_key]] is None, no relabeling will be performed. Please set label_mappings={} to completely skip this transform.
dtype (Union[dtype, type, str, None]) – convert the output data to dtype, default to int16.
dataset_key (str) – key to get the dataset name from the data dictionary, default to “dataset_name”.
allow_missing_keys (bool) – don’t raise exception if key is missing.
- monai.apps.vista3d.sampler.sample_prompt_pairs(labels, label_set, max_prompt=None, max_foreprompt=None, max_backprompt=1, max_point=20, include_background=False, drop_label_prob=0.2, drop_point_prob=0.2, point_sampler=None, **point_sampler_kwargs)[source]#
Sample training pairs for VISTA3D training.
- Parameters:
labels – [1, 1, H, W, D], ground truth labels.
label_set – the label list for the specific dataset. Note that if 0 is included in label_set, it will be added into automatic branch training. It is recommended to remove 0 from label_set for multi-partially-labeled-dataset training, and to add 0 when finetuning a specific dataset, because a region labeled 0 in one partially labeled dataset may contain foreground in another dataset.
max_prompt – int, max number of total prompts, including foreground and background.
max_foreprompt – int, max number of foreground prompts.
max_backprompt – int, max number of background prompts.
max_point – maximum number of points for each object.
include_background – whether to include label 0 in the training prompt. If included, background 0 is treated the same as foreground and points will be sampled. Set this to True only if the user wants to segment background 0 with point clicks; otherwise keep it False.
drop_label_prob – probability to drop label prompt.
drop_point_prob – probability to drop point prompt.
point_sampler – sampler to augment masks with supervoxel.
point_sampler_kwargs – arguments for point_sampler.
- Returns:
label_prompt (Tensor | None): Tensor of shape [B, 1] containing the classes used for training automatic segmentation.
point (Tensor | None): Tensor of shape [B, N, 3] representing the corresponding points for each class. Note that background label prompts require matching points as well (e.g., [0, 0, 0] is used).
point_label (Tensor | None): Tensor of shape [B, N] representing the corresponding point labels for each point (negative or positive). -1 is used for padding the background label prompt and will be ignored.
prompt_class (Tensor | None): Tensor of shape [B, 1], exactly the same as label_prompt for label indexing during training. If label_prompt is None, prompt_class is used to identify point classes.
- Return type:
tuple
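A minimal sketch with a random label volume (label values and label_set are illustrative; any of the returned values may be None, e.g. when a prompt type is randomly dropped):
import torch
from monai.apps.vista3d.sampler import sample_prompt_pairs

labels = torch.randint(0, 3, (1, 1, 64, 64, 64))  # [1, 1, H, W, D] ground-truth labels
label_prompt, point, point_label, prompt_class = sample_prompt_pairs(
    labels, label_set=[1, 2], max_point=5
)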
Auto3DSeg#
- class monai.apps.auto3dseg.AlgoEnsemble[source]#
The base class of Ensemble methods
- __call__(pred_param=None)[source]#
Use the ensembled model to predict result.
- Parameters:
pred_param –
prediction parameter dictionary. The keys fall into two groups: the first group is consumed in this function, and the second group is passed to the InferClass to override the parameters of the class functions. The first group contains:
"infer_files": file paths to the images to read, in a list.
"files_slices": a value of type slice. The files_slices will slice the "infer_files" and only make predictions on infer_files[files_slices].
"mode": ensemble mode. Currently “mean” and “vote” (majority voting) schemes are supported.
"image_save_func": a dictionary used to instantiate the SaveImage transform. When specified, the ensemble prediction will save the prediction files instead of keeping them in memory. Example: {“_target_”: “SaveImage”, “output_dir”: “./”}
"sigmoid": use the sigmoid function (e.g. x > 0.5) to convert the prediction probability map to the label class prediction, otherwise argmax(x) is used.
"algo_spec_params": a dictionary to add pred_params that are specific to a model. The dict has the format {“<name of algo>”: “<pred_params for that algo>”}.
The parameters in the second group are defined in the config of each Algo template. Please check: Project-MONAI/research-contributions
- Returns:
A list of tensors or file paths, depending on whether "image_save_func" is set.
- ensemble_pred(preds, sigmoid=False)[source]#
Ensemble the results using either the “mean” or the “vote” method.
- Parameters:
preds – a list of probability prediction in Tensor-Like format.
sigmoid – use the sigmoid function to threshold probability one-hot map, otherwise argmax is used. Defaults to False
- Returns:
a tensor which is the ensembled prediction.
- get_algo(identifier)[source]#
Get a model by identifier.
- Parameters:
identifier – the name of the bundleAlgo
- class monai.apps.auto3dseg.AlgoEnsembleBestByFold(n_fold=5)[source]#
Ensemble method that selects the best model (the top performer) in each fold.
- Parameters:
n_fold (int) – number of cross-validation folds used in training
- class monai.apps.auto3dseg.AlgoEnsembleBestN(n_best=5)[source]#
Ensemble method that selects N models out of all candidates using the models’ best_metric scores.
- Parameters:
n_best (int) – number of models to pick for the ensemble (N).
- class monai.apps.auto3dseg.AlgoEnsembleBuilder(history, data_src_cfg_name=None)[source]#
Build ensemble workflow from configs and arguments.
- Parameters:
history – a collection of trained bundleAlgo algorithms.
data_src_cfg_name – filename of the data source.
Examples
builder = AlgoEnsembleBuilder(history, data_src_cfg)
builder.set_ensemble_method(AlgoEnsembleBestN(n_best=3))
ensemble = builder.get_ensemble()
- add_inferer(identifier, gen_algo, best_metric=None)[source]#
Add model inferer to the builder.
- Parameters:
identifier – name of the bundleAlgo.
gen_algo – a trained BundleAlgo model object.
best_metric – the best metric in validation of the trained model.
- set_ensemble_method(ensemble, *args, **kwargs)[source]#
Set the ensemble method.
- Parameters:
ensemble (AlgoEnsemble) – the AlgoEnsemble to build.
- Return type:
None
- class monai.apps.auto3dseg.AutoRunner(work_dir='./work_dir', input=None, algos=None, analyze=None, algo_gen=None, train=None, hpo=False, hpo_backend='nni', ensemble=True, not_use_cache=False, templates_path_or_url=None, allow_skip=True, mlflow_tracking_uri=None, mlflow_experiment_name=None, **kwargs)[source]#
An interface for handling Auto3Dseg with minimal inputs and understanding of the internal states in Auto3Dseg. The users can run the Auto3Dseg with default settings in one line of code. They can also customize the advanced features Auto3Dseg in a few additional lines. Examples of customization include
change cross-validation folds
change training/prediction parameters
change ensemble methods
automatic hyperparameter optimization.
The output of the interface is a directory that contains
data statistics analysis report
algorithm definition files (scripts, configs, pickle objects) and training results (checkpoints, accuracies)
the predictions on the testing datasets from the final algorithm ensemble
a copy of the input arguments in form of YAML
cached intermediate results
- Parameters:
work_dir – working directory to save the intermediate and final results.
input – the configuration dictionary or the file path to the configuration in form of YAML. The configuration should contain datalist, dataroot, modality, multigpu, and class_names info.
algos – optionally specify algorithms to use. If a dictionary, must be in the form {“algname”: dict(_target_=”algname.scripts.algo.AlgnameAlgo”, template_path=”algname”), …} If a list or a string, defines a subset of names of the algorithms to use, e.g. ‘segresnet’ or [‘segresnet’, ‘dints’] out of the full set of algorithm templates provided by templates_path_or_url. Defaults to None, to use all available algorithms.
analyze – on/off switch to run DataAnalyzer and generate a datastats report. Defaults to None, to automatically decide based on cache, and run data analysis only if we have not completed this step yet.
algo_gen – on/off switch to run AlgoGen and generate templated BundleAlgos. Defaults to None, to automatically decide based on cache, and run algorithm folders generation only if we have not completed this step yet.
train – on/off switch to run training and generate algorithm checkpoints. Defaults to None, to automatically decide based on cache, and run training only if we have not completed this step yet.
hpo – use hyperparameter optimization (HPO) in the training phase. Users can provide a list of hyper-parameter and a search will be performed to investigate the algorithm performances.
hpo_backend – a string that indicates the backend of the HPO. Currently, only NNI Grid-search mode is supported
ensemble – on/off switch to run model ensemble and use the ensemble to predict outputs in testing datasets.
not_use_cache – if the value is True, it will ignore all cached results in data analysis, algorithm generation, or training, and start the pipeline from scratch.
templates_path_or_url – the folder with the algorithm templates or a url. If None provided, the default template zip url will be downloaded and extracted into the work_dir.
allow_skip – a switch passed to BundleGen process which determines if some Algo in the default templates can be skipped based on the analysis on the dataset from Auto3DSeg DataAnalyzer.
mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.
mlflow_experiment_name – the name of the experiment in MLflow server.
kwargs – image writing parameters for the ensemble inference. The kwargs format follows the SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage.
Examples
User can use the one-liner to start the Auto3Dseg workflow
python -m monai.apps.auto3dseg AutoRunner run --input '{"modality": "ct", "datalist": "dl.json", "dataroot": "/dr", "multigpu": true, "class_names": ["A", "B"]}'
User can also save the input dictionary as a input YAML file and use the following one-liner
python -m monai.apps.auto3dseg AutoRunner run --input=./input.yaml
User can specify work_dir and data source config input and run AutoRunner:
work_dir = "./work_dir"
input = "path/to/input_yaml"
runner = AutoRunner(work_dir=work_dir, input=input)
runner.run()
User can specify a subset of algorithms to use and run AutoRunner:
work_dir = "./work_dir"
input = "path/to/input_yaml"
algos = ["segresnet", "dints"]
runner = AutoRunner(work_dir=work_dir, input=input, algos=algos)
runner.run()
User can specify a local folder with algorithms templates and run AutoRunner:
work_dir = "./work_dir"
input = "path/to/input_yaml"
algos = "segresnet"
templates_path_or_url = "./local_path_to/algorithm_templates"
runner = AutoRunner(work_dir=work_dir, input=input, algos=algos, templates_path_or_url=templates_path_or_url)
runner.run()
User can specify training parameters by:
input = "path/to/input_yaml"
runner = AutoRunner(input=input)
train_param = {
    "num_epochs_per_validation": 1,
    "num_images_per_batch": 2,
    "num_epochs": 2,
}
runner.set_training_params(params=train_param)  # 2 epochs
runner.run()
User can specify the fold number of cross validation
input = "path/to/input_yaml"
runner = AutoRunner(input=input)
runner.set_num_fold(n_fold=2)
runner.run()
User can specify the prediction parameters during algo ensemble inference:
input = "path/to/input_yaml"
pred_params = {
    'files_slices': slice(0, 2),
    'mode': "vote",
    'sigmoid': True,
}
runner = AutoRunner(input=input)
runner.set_prediction_params(params=pred_params)
runner.run()
User can define a grid search space and use the HPO during training.
input = "path/to/input_yaml"
runner = AutoRunner(input=input, hpo=True)
runner.set_nni_search_space({"learning_rate": {"_type": "choice", "_value": [0.0001, 0.001, 0.01, 0.1]}})
runner.run()
Notes
Expected results in the work_dir as below:
work_dir/
├── algorithm_templates   # bundle algo templates (scripts/configs)
├── cache.yaml            # AutoRunner will automatically cache results to save time
├── datastats.yaml        # datastats of the dataset
├── dints_0               # network scripts/configs/checkpoints and pickle object of the algo
├── ensemble_output       # the prediction of testing datasets from the ensemble of the algos
├── input.yaml            # copy of the input data source configs
├── segresnet_0           # network scripts/configs/checkpoints and pickle object of the algo
├── segresnet2d_0         # network scripts/configs/checkpoints and pickle object of the algo
└── swinunetr_0           # network scripts/configs/checkpoints and pickle object of the algo
- inspect_datalist_folds(datalist_filename)[source]#
Returns number of folds in the datalist file, and assigns fold numbers if not provided.
- Parameters:
datalist_filename (str) – path to the datalist file.
Notes
If the fold key is not provided, it automatically generates 5-fold assignments in the training key list. If a validation key list is available, it assumes a single-fold validation.
- Return type:
int
- read_cache()[source]#
Check if the intermediate result is cached after each step in the current working directory
- Returns:
a dict of cache results. If not_use_cache is set to True, or there is no cache file in the working directory, the result will be empty_cache, in which all has_cache keys are set to False.
- set_analyze_params(params=None)[source]#
Set the data analysis extra params.
- Parameters:
params – a dict that defines the overriding key-value pairs during training. The overriding method is defined by the algo class.
- set_device_info(cuda_visible_devices=None, num_nodes=None, mn_start_method=None, cmd_prefix=None)[source]#
Set the device related info
- Parameters:
cuda_visible_devices – define GPU ids for data analyzer, training, and ensembling. List of GPU ids [0,1,2,3] or a string “0,1,2,3”. Default using env “CUDA_VISIBLE_DEVICES” or all devices available.
num_nodes – number of nodes for training and ensembling. Default using env “NUM_NODES” or 1 if “NUM_NODES” is unset.
mn_start_method – multi-node start method. Autorunner will use the method to start multi-node processes. Default using env “MN_START_METHOD” or ‘bcprun’ if “MN_START_METHOD” is unset.
cmd_prefix –
command line prefix for subprocess running in BundleAlgo and EnsembleRunner. Default using env “CMD_PREFIX” or None, examples are:
single GPU/CPU or multinode bcprun: “python “ or “/opt/conda/bin/python3.9 “,
single node multi-GPU running “torchrun –nnodes=1 –nproc_per_node=2 “
If the user defines this prefix, please make sure that –nproc_per_node matches cuda_visible_devices or os.environ[‘CUDA_VISIBLE_DEVICES’], and always set –nnodes=1. Set num_nodes for multi-node runs.
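For instance, a minimal sketch of setting the device info before running (the GPU ids and input path are placeholders):
runner = AutoRunner(input="path/to/input_yaml")
runner.set_device_info(cuda_visible_devices="0,1", num_nodes=1)
runner.run()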
- set_ensemble_method(ensemble_method_name='AlgoEnsembleBestByFold', **kwargs)[source]#
Set the bundle ensemble method name and parameters for save image transform parameters.
- Parameters:
ensemble_method_name (str) – the name of the ensemble method. Only two methods are supported: “AlgoEnsembleBestN” and “AlgoEnsembleBestByFold”.
kwargs (Any) – the keyword arguments used to define the ensemble method. Currently only n_best for AlgoEnsembleBestN is supported.
- Return type:
- set_gpu_customization(gpu_customization=False, gpu_customization_specs=None)[source]#
Set options for GPU-based parameter customization/optimization.
- Parameters:
gpu_customization – the switch to determine automatically customize/optimize bundle script/config parameters for each bundleAlgo based on gpus. Custom parameters are obtained through dummy training to simulate the actual model training process and hyperparameter optimization (HPO) experiments.
gpu_customization_specs (optional) –
the dictionary to enable users overwrite the HPO settings. user can overwrite part of variables as follows or all of them. The structure is as follows.
gpu_customization_specs = {
    'ALGO': {
        'num_trials': 6,
        'range_num_images_per_batch': [1, 20],
        'range_num_sw_batch_size': [1, 20],
    }
}
ALGO – the name of the algorithm. It could be one of the algorithm names (e.g., ‘dints’) or ‘universal’, which would apply the changes to all algorithms. Possible options are {"universal", "dints", "segresnet", "segresnet2d", "swinunetr"}.
num_trials – the number of HPO trials/experiments to run.
range_num_images_per_batch – the range of number of images per mini-batch.
range_num_sw_batch_size – the range of batch size in sliding-window inferer.
- set_hpo_params(params=None)[source]#
Set parameters for the HPO module and the algos before the training. It will attempt to (1) override bundle templates with the key-value pairs in params, (2) change the config of the HPO module (e.g. NNI) if the key is found to be one of:
“trialCodeDirectory”
“trialGpuNumber”
“trialConcurrency”
“maxTrialNumber”
“maxExperimentDuration”
“tuner”
“trainingService”
and (3) enable the dry-run mode if the user would generate the NNI configs without starting the NNI service.
- Parameters:
params – a dict that defines the overriding key-value pairs during instantiation of the algo. For BundleAlgo, it will override the template config filling.
Notes
Users can set nni_dry_run to True in the params to enable the dry-run mode for the NNI backend.
- set_image_save_transform(**kwargs)[source]#
Set the ensemble output transform.
- Parameters:
kwargs (Any) – image writing parameters for the ensemble inference. The kwargs format follows the SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage.
- Return type:
- set_nni_search_space(search_space)[source]#
Set the search space for NNI parameter search.
- Parameters:
search_space (dict[str, Any]) – hyperparameter search space in the form of a dict. For more information, please check the NNI documentation: https://nni.readthedocs.io/en/v2.2/Tutorial/SearchSpaceSpec.html
- Return type:
- set_num_fold(num_fold=5)[source]#
Set the number of cross validation folds for all algos.
- Parameters:
num_fold (int) – a positive integer to define the number of folds.
- Return type:
- set_prediction_params(params=None)[source]#
Set the prediction params for all algos.
- Parameters:
params – a dict that defines the overriding key-value pairs during prediction. The overriding method is defined by the algo class.
Examples
- For BundleAlgo objects, this set of params specifies that the algo ensemble should only run inference on the first two files in the testing datalist: {“files_slices”: slice(0, 2)}
- set_training_params(params=None)[source]#
Set the training params for all algos.
- Parameters:
params – a dict that defines the overriding key-value pairs during training. The overriding method is defined by the algo class.
Examples
- For BundleAlgo objects, the training parameter to shorten the training time to a few epochs can be
{“num_epochs”: 2, “num_epochs_per_validation”: 1}
- class monai.apps.auto3dseg.BundleAlgo(template_path)[source]#
An algorithm represented by a set of bundle configurations and scripts.
BundleAlgo.cfg is a monai.bundle.ConfigParser instance.

from monai.apps.auto3dseg import BundleAlgo

data_stats_yaml = "../datastats.yaml"
algo = BundleAlgo(template_path="../algorithm_templates")
algo.set_data_stats(data_stats_yaml)
# algo.set_data_src("../data_src.json")
algo.export_to_disk(".", algo_name="segresnet2d_1")
This class creates MONAI bundles from a directory of ‘bundle templates’. Different from the regular MONAI bundle format, the bundle template may contain placeholders that must be filled using fill_template_config during export_to_disk. The created bundle keeps the same file structure as the template.
- __init__(template_path)[source]#
Create an Algo instance based on the predefined Algo template.
- Parameters:
template_path (Union[str, PathLike]) – path to a folder that contains the algorithm templates. Please check Project-MONAI/research-contributions
- export_to_disk(output_path, algo_name, **kwargs)[source]#
Fill the configuration templates, write the bundle (configs + scripts) to folder output_path/algo_name.
- Parameters:
output_path (str) – path to export the ‘scripts’ and ‘configs’ directories.
algo_name (str) – the identifier of the algorithm (usually contains the name and extra info like fold ID).
kwargs (Any) – other parameters, including: “copy_dirs=True/False” means whether to copy the template as output instead of an in-place operation, “fill_template=True/False” means whether to fill the placeholders in the template. Other parameters are for the fill_template_config function.
- Return type:
None
- fill_template_config(data_stats_filename, algo_path, **kwargs)[source]#
The configuration files defined when constructing this Algo instance might not have a complete training and validation pipelines. Some configuration components and hyperparameters of the pipelines depend on the training data and other factors. This API is provided to allow the creation of fully functioning config files. Return the records of filling template config: {“<config name>”: {“<placeholder key>”: value, …}, …}.
- Parameters:
data_stats_filename (str) – filename of the data stats report (generated by DataAnalyzer)
Notes
Template filling is optional. The user can construct a set of pre-filled configs without replacing values by using the data analysis results. It is also intended to be re-implemented in subclasses of BundleAlgo if the user wants their own way of auto-configured template filling.
- Return type:
dict
- get_inferer(*args, **kwargs)[source]#
Load the InferClass from infer.py. The InferClass should be defined in the template under the path “scripts/infer.py”. It is required to define the “InferClass” (the name is fixed) with at least two functions (__init__ and infer). The __init__ method accepts override kwargs that can optionally be used to override parameters at run time.
Examples:
class InferClass:
    def __init__(self, config_file: Optional[Union[str, Sequence[str]]] = None, **override):
        # read configs from config_file (sequence)
        # set up transforms
        # set up model
        # set up other hyper parameters
        return

    @torch.no_grad()
    def infer(self, image_file):
        # infer the model and save the results to output
        return output
- get_score(*args, **kwargs)[source]#
Returns validation scores of the model trained by the current Algo.
- pre_check_skip_algo(skip_bundlegen=False, skip_info='')[source]#
Analyse the data analysis report and check if the algorithm needs to be skipped. This function is overridden within each algo.
- Parameters:
skip_bundlegen (bool) – skip generating bundles for this algo if True.
skip_info (str) – info to print when skipped.
- Return type:
tuple[bool, str]
- predict(predict_files, predict_params=None)[source]#
Use the trained model to predict the outputs with a given input image.
- Parameters:
predict_files – a list of paths to files to run inference on [“path_to_image_1”, “path_to_image_2”]
predict_params – a dict to override the parameters in the bundle config (including the files to predict).
- set_data_source(data_src_cfg)[source]#
Set the data source configuration file
- Parameters:
data_src_cfg (str) – path to a configuration file (yaml) that contains datalist, dataroot, and other params. The config will be in the form {“modality”: “ct”, “datalist”: “path_to_json_datalist”, “dataroot”: “path_dir_data”}
- Return type:
None
- set_data_stats(data_stats_files)[source]#
Set the data analysis report (generated by DataAnalyzer).
- Parameters:
data_stats_files (str) – path to the datastats yaml file
- Return type:
None
- set_mlflow_experiment_name(mlflow_experiment_name)[source]#
Set the experiment name for MLflow server
- Parameters:
mlflow_experiment_name – a string to specify the experiment name for MLflow server.
- set_mlflow_tracking_uri(mlflow_tracking_uri)[source]#
Set the tracking URI for MLflow server
- Parameters:
mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.
- train(train_params=None, device_setting=None)[source]#
Load the run function in the training script of each model. Training parameter is predefined by the algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
- Parameters:
train_params – training parameters
device_setting – device related settings, should follow the device_setting in auto_runner.set_device_info. ‘CUDA_VISIBLE_DEVICES’ should be a string e.g. ‘0,1,2,3’
- class monai.apps.auto3dseg.BundleGen(algo_path='.', algos=None, templates_path_or_url=None, data_stats_filename=None, data_src_cfg_name=None, mlflow_tracking_uri=None, mlflow_experiment_name=None)[source]#
This class generates a set of bundles according to the cross-validation folds, each of them can run independently.
- Parameters:
algo_path – the directory path to save the algorithm templates. Default is the current working dir.
algos – If dictionary, it outlines the algorithm to use. If a list or a string, defines a subset of names of the algorithms to use, e.g. (‘segresnet’, ‘dints’) out of the full set of algorithm templates provided by templates_path_or_url. Defaults to None - to use all available algorithms.
templates_path_or_url – the folder with the algorithm templates or a url. If None provided, the default template zip url will be downloaded and extracted into the algo_path. The current default options are released at: Project-MONAI/research-contributions.
data_stats_filename – the path to the data stats file (generated by DataAnalyzer).
data_src_cfg_name – the path to the data source config YAML file. The config will be in a form of {“modality”: “ct”, “datalist”: “path_to_json_datalist”, “dataroot”: “path_dir_data”}.
mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.
mlflow_experiment_name – a string to specify the experiment name for the MLflow server.
python -m monai.apps.auto3dseg BundleGen generate --data_stats_filename="../algorithms/datastats.yaml"
- generate(output_folder='.', num_fold=5, gpu_customization=False, gpu_customization_specs=None, allow_skip=True)[source]#
Generate the bundle scripts/configs for each bundleAlgo
- Parameters:
output_folder – the output folder to save each algorithm.
num_fold – the number of cross validation fold.
gpu_customization – the switch to determine automatically customize/optimize bundle script/config parameters for each bundleAlgo based on gpus. Custom parameters are obtained through dummy training to simulate the actual model training process and hyperparameter optimization (HPO) experiments.
gpu_customization_specs – the dictionary to enable users overwrite the HPO settings. user can overwrite part of variables as follows or all of them. The structure is as follows.
allow_skip –
a switch to determine if some Algo in the default templates can be skipped based on the analysis on the dataset from Auto3DSeg DataAnalyzer.
gpu_customization_specs = {
    'ALGO': {
        'num_trials': 6,
        'range_num_images_per_batch': [1, 20],
        'range_num_sw_batch_size': [1, 20],
    }
}
ALGO – the name of the algorithm. It could be one of the algorithm names (e.g., ‘dints’) or ‘universal’, which would apply the changes to all algorithms. Possible options are {"universal", "dints", "segresnet", "segresnet2d", "swinunetr"}.
num_trials – the number of HPO trials/experiments to run.
range_num_images_per_batch – the range of number of images per mini-batch.
range_num_sw_batch_size – the range of batch size in sliding-window inferer.
- get_history()[source]#
Get the history of the bundleAlgo object with their names/identifiers
- Return type:
list
- set_data_src(data_src_cfg_name)[source]#
Set the data source filename
- Parameters:
data_src_cfg_name – filename of data_source file
- set_data_stats(data_stats_filename)[source]#
Set the data stats filename
- Parameters:
data_stats_filename (str) – filename of datastats
- Return type:
None
- set_mlflow_experiment_name(mlflow_experiment_name)[source]#
Set the experiment name for MLflow server
- Parameters:
mlflow_experiment_name – a string to specify the experiment name for MLflow server.
- set_mlflow_tracking_uri(mlflow_tracking_uri)[source]#
Set the tracking URI for MLflow server
- Parameters:
mlflow_tracking_uri – a tracking URI for MLflow server which could be local directory or address of the remote tracking Server; MLflow runs will be recorded locally in algorithms’ model folder if the value is None.
- class monai.apps.auto3dseg.DataAnalyzer(datalist, dataroot='', output_path='./datastats.yaml', average=True, do_ccp=False, device='cuda', worker=4, image_key='image', label_key='label', hist_bins=0, hist_range=None, fmt='yaml', histogram_only=False, **extra_params)[source]#
The DataAnalyzer automatically analyzes given medical image dataset and reports the statistics. The module expects file paths to the image data and utilizes the LoadImaged transform to read the files, which supports nii, nii.gz, png, jpg, bmp, npz, npy, and dcm formats. Currently, only segmentation task is supported, so the user needs to provide paths to the image and label files (if have). Also, label data format is preferred to be (1,H,W,D), with the label index in the first dimension. If it is in onehot format, it will be converted to the preferred format.
- Parameters:
datalist – a Python dictionary storing group, fold, and other information of the medical image dataset, or a string to the JSON file storing the dictionary.
dataroot – user’s local directory containing the datasets.
output_path – path to save the analysis result.
average – whether to average the statistical value across different image modalities.
do_ccp – apply the connected component algorithm to process the labels/images
device – a string specifying hardware (CUDA/CPU) utilized for the operations.
worker – number of workers to use for loading datasets in each GPU/CPU sub-process.
image_key – a string that user specify for the image. The DataAnalyzer will look it up in the datalist to locate the image files of the dataset.
label_key – a string that user specify for the label. The DataAnalyzer will look it up in the datalist to locate the label files of the dataset. If label_key is NoneType or “None”, the DataAnalyzer will skip looking for labels and all label-related operations.
hist_bins – bins to compute histogram for each image channel.
hist_range – ranges to compute histogram for each image channel.
fmt – format used to save the analysis results. Currently supports "json" and "yaml", defaults to “yaml”.
histogram_only – whether to only compute histograms. Defaults to False.
extra_params – other optional arguments. Currently supported arguments are : ‘allowed_shape_difference’ (default 5) can be used to change the default tolerance of the allowed shape differences between the image and label items. In case of shape mismatch below the tolerance, the label image will be resized to match the image using nearest interpolation.
Examples
from monai.apps.auto3dseg.data_analyzer import DataAnalyzer

datalist = {
    "testing": [{"image": "image_003.nii.gz"}],
    "training": [
        {"fold": 0, "image": "image_001.nii.gz", "label": "label_001.nii.gz"},
        {"fold": 0, "image": "image_002.nii.gz", "label": "label_002.nii.gz"},
        {"fold": 1, "image": "image_001.nii.gz", "label": "label_001.nii.gz"},
        {"fold": 1, "image": "image_004.nii.gz", "label": "label_004.nii.gz"},
    ],
}
dataroot = '/datasets'  # the directory where you have the image files (nii.gz)
DataAnalyzer(datalist, dataroot)
Notes
The module can also be called from the command line interface (CLI).
For example:
python -m monai.apps.auto3dseg \
    DataAnalyzer \
    get_all_case_stats \
    --datalist="my_datalist.json" \
    --dataroot="my_dataroot_dir"
- get_all_case_stats(key='training', transform_list=None)[source]#
Get all case stats. Caller of the DataAnalyzer class. The function initiates multiple GPU or CPU processes of the internal _get_all_case_stats function, which iterates over the datalist and calls SegSummarizer to generate stats for each case. After all case stats are generated, SegSummarizer is called again to combine the results.
- Parameters:
key – dataset key
transform_list – optional list of transforms applied before SegSummarizer
- Returns:
A data statistics dictionary containing:
“stats_summary” (summary statistics of the entire dataset). Within stats_summary there are “image_stats” (summarizing info of shape, channel, spacing, etc. using operations_summary), “image_foreground_stats” (info of the intensity for the non-zero labeled voxels), and “label_stats” (info of the labels, pixel percentage, image_intensity, and each individual label in a list).
“stats_by_cases” (a list, where each element contains the statistics of one image-label pair). Within each element there are: “image” (the path to an image), “label” (the path to the corresponding label), “image_stats” (summarizing info of shape, channel, spacing, etc. using operations), “image_foreground_stats” (similar to the previous one but for the foreground image), and “label_stats” (stats of the individual labels).
Notes
Since the backend of the statistics computation is torch/numpy, nan/inf values may be generated and carried over in the computation. In such cases, the output dictionary will include .nan/.inf in the statistics.
- class monai.apps.auto3dseg.EnsembleRunner(data_src_cfg_name, work_dir='./work_dir', num_fold=5, ensemble_method_name='AlgoEnsembleBestByFold', mgpu=True, **kwargs)[source]#
The Runner for ensembler. It ensembles predictions and saves them to the disk with a support of using multi-GPU.
- Parameters:
data_src_cfg_name (str) – filename of the data source.
work_dir (str) – working directory to save the intermediate and final results. Default is ./work_dir.
num_fold (int) – number of folds. Default is 5.
ensemble_method_name (str) – method to ensemble predictions from different models. Default is AlgoEnsembleBestByFold. Supported methods: [“AlgoEnsembleBestN”, “AlgoEnsembleBestByFold”].
mgpu (bool) – whether to use multi-GPU. Default is True.
kwargs (Any) – additional image writing, ensembling, and prediction parameters for the ensemble inference. For image saving, please check the supported parameters in the SaveImage transform. For prediction parameters, please check the supported parameters in the AlgoEnsemble callables. For ensemble parameters, please check the documentation of the selected AlgoEnsemble callable.
Example
ensemble_runner = EnsembleRunner(
    data_src_cfg_name,
    work_dir,
    ensemble_method_name,
    mgpu=device_setting['n_devices'] > 1,
    **kwargs,
    **pred_params,
)
ensemble_runner.run(device_setting)
- run(device_setting=None)[source]#
Load the run function in the training script of each model. Training parameter is predefined by the algo_config.yaml file, which is pre-filled by the fill_template_config function in the same instance.
- Parameters:
device_setting – device related settings, should follow the device_setting in auto_runner.set_device_info. ‘CUDA_VISIBLE_DEVICES’ should be a string e.g. ‘0,1,2,3’
- set_ensemble_method(ensemble_method_name='AlgoEnsembleBestByFold', **kwargs)[source]#
Set the bundle ensemble method
- Parameters:
ensemble_method_name (str) – the name of the ensemble method. Only two methods are supported: “AlgoEnsembleBestN” and “AlgoEnsembleBestByFold”.
kwargs (Any) – the keyword arguments used to define the ensemble method. Currently only n_best for AlgoEnsembleBestN is supported.
- Return type:
None
- set_image_save_transform(**kwargs)[source]#
Set the ensemble output transform.
- Parameters:
kwargs (Any) – image writing parameters for the ensemble inference. The kwargs format follows the SaveImage transform. For more information, check https://docs.monai.io/en/stable/transforms.html#saveimage.
- Return type:
None
- class monai.apps.auto3dseg.NNIGen(algo=None, params=None)[source]#
Generate algorithms for NNI to automate hyperparameter tuning. The module has two major interfaces: __init__, which prints out how to set up NNI, and a trialCommand function run_algo for the NNI library to start the trial of the algo. More about the trialCommand function can be found in the “trial code” section of the NNI webpage https://nni.readthedocs.io/en/latest/tutorials/hpo_quickstart_pytorch/main.html
- Parameters:
algo – an Algo object (e.g. BundleAlgo) with defined methods get_output_path and train, which supports saving to and loading from pickle files via algo_from_pickle and algo_to_pickle.
params – a set of parameters to override the algo if overriding is supported by the Algo subclass.
Examples:
The experiment will keep generating new folders to save the model checkpoints, scripts, and configs if available.

├── algorithm_templates
│   └── unet
├── unet_0
│   ├── algo_object.pkl
│   ├── configs
│   └── scripts
├── unet_0_learning_rate_0.01
│   ├── algo_object.pkl
│   ├── configs
│   ├── model_fold0
│   └── scripts
└── unet_0_learning_rate_0.1
    ├── algo_object.pkl
    ├── configs
    ├── model_fold0
    └── scripts

# Bundle Algorithms are already generated by BundleGen in work_dir
import_bundle_algo_history(work_dir, only_trained=False)
algo_dict = self.history[0]  # pick the first algorithm
algo_name = algo_dict[AlgoKeys.ID]
onealgo = algo_dict[AlgoKeys.ALGO]
nni_gen = NNIGen(algo=onealgo)
nni_gen.print_bundle_algo_instruction()
Notes
The NNIGen will prepare the algorithms in a folder and suggest a command to replace trialCommand in the experiment config. However, NNIGen will not trigger NNI. User needs to write their NNI experiment configs, and then run the NNI command manually.
- generate(output_folder='.')[source]#
Generate the record for each Algo. If it is a BundleAlgo, it will generate the config files.
- Parameters:
output_folder (str) – the directory NNI will save the results to.
- Return type:
None
- get_task_id()[source]#
Get the identifier of the current experiment. In the format of listing the searching parameter name and values connected by underscore in the file name.
- run_algo(obj_filename, output_folder='.', template_path=None)[source]#
The python interface for NNI to run.
- Parameters:
obj_filename – the pickle-exported Algo object.
output_folder – the root path of the algorithms templates.
template_path – the algorithm template. It must contain algo.py at the following path:
{algorithm_templates_dir}/{network}/scripts/algo.py
- class monai.apps.auto3dseg.OptunaGen(algo=None, params=None)[source]#
Generate algorithms for Optuna to automate hyperparameter tuning. Please refer to NNI and Optuna (https://optuna.readthedocs.io/en/stable/) for more information. Optuna has a different running scheme compared to NNI. The hyperparameter samples come from a trial object (trial.suggest…) created by Optuna, so OptunaGen needs to accept this trial object as input. Meanwhile, Optuna calls OptunaGen, thus OptunaGen.__call__() should return the accuracy. Use functools.partial to wrap OptunaGen for additional input arguments.
- Parameters:
algo – an Algo object (e.g. BundleAlgo). The object must at least define two methods, get_output_path and train, and support saving to and loading from pickle files via algo_from_pickle and algo_to_pickle.
params – a set of parameters to override the algo if overriding is supported by the Algo subclass.
Examples:
The experiment will keep generating new folders to save the model checkpoints, scripts, and configs if available.

├── algorithm_templates
│   └── unet
├── unet_0
│   ├── algo_object.pkl
│   ├── configs
│   └── scripts
├── unet_0_learning_rate_0.01
│   ├── algo_object.pkl
│   ├── configs
│   ├── model_fold0
│   └── scripts
└── unet_0_learning_rate_0.1
    ├── algo_object.pkl
    ├── configs
    ├── model_fold0
    └── scripts
Notes
Different from NNI and NNIGen, OptunaGen and Optuna can be run within the Python process.
- __call__(trial, obj_filename, output_folder='.', template_path=None)[source]#
Callable that Optuna will use to optimize the hyper-parameters
- Parameters:
obj_filename – the pickle-exported Algo object.
output_folder – the root path of the algorithms templates.
template_path – the algorithm template. It must contain algo.py at the following path:
{algorithm_templates_dir}/{network}/scripts/algo.py
- generate(output_folder='.')[source]#
Generate the record for each Algo. If it is a BundleAlgo, it will generate the config files.
- Parameters:
output_folder (str) – the directory the results will be saved to.
- Return type:
None
- get_hyperparameters()[source]#
Get parameter for next round of training from optuna trial object. This function requires user rewrite during usage for different search space.
- get_task_id()[source]#
Get the identifier of the current experiment. In the format of listing the searching parameter name and values connected by underscore in the file name.
- run_algo(obj_filename, output_folder='.', template_path=None)[source]#
The python interface for NNI to run.
- Parameters:
obj_filename – the pickle-exported Algo object.
output_folder – the root path of the algorithms templates.
template_path – the algorithm template. It must contain algo.py at the following path:
{algorithm_templates_dir}/{network}/scripts/algo.py
- monai.apps.auto3dseg.export_bundle_algo_history(history)[source]#
Save all the BundleAlgo objects in the history to algo_object.pkl in each individual folder.
- Parameters:
history (list[dict[str, BundleAlgo]]) – a list of BundleAlgo dictionaries. Typically, the history can be obtained from the BundleGen get_history method.
- Return type:
None
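A short usage sketch, assuming a configured BundleGen instance bundle_gen (not shown here):

from monai.apps.auto3dseg import export_bundle_algo_history

history = bundle_gen.get_history()   # per the docstring, the history typically comes from BundleGen
export_bundle_algo_history(history)  # writes algo_object.pkl into each algorithm folder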
- monai.apps.auto3dseg.get_name_from_algo_id(id)[source]#
Get the name of the Algo from its identifier.
- Parameters:
id (str) – identifier which follows the convention of “name_fold_other”.
- Return type:
str
- Returns:
name of the Algo.
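For illustration (the identifier below is hypothetical):

from monai.apps.auto3dseg import get_name_from_algo_id

print(get_name_from_algo_id("segresnet_0"))  # expected to yield "segresnet" for a "name_fold_other" id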
- monai.apps.auto3dseg.import_bundle_algo_history(output_folder='.', template_path=None, only_trained=True)[source]#
Import the history of BundleAlgo objects as a list of algo dicts. Each algo dict has the keys name (folder name), algo (BundleAlgo), and is_trained (bool).
- Parameters:
output_folder – the root path of the algorithm templates.
template_path – the algorithm template. It must contain algo.py at the following path: {algorithm_templates_dir}/{network}/scripts/algo.py.
only_trained – only read the algo history if the algo is trained.
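A short usage sketch; the output folder is a placeholder:

from monai.apps.auto3dseg import import_bundle_algo_history

history = import_bundle_algo_history(output_folder="./work_dir", only_trained=True)
for algo_dict in history:
    print(algo_dict["name"], algo_dict["is_trained"])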
nnUNet#
- class monai.apps.nnunet.nnUNetV2Runner(input_config, trainer_class_name='nnUNetTrainer', work_dir='work_dir', export_validation_probabilities=True)[source]#
nnUNetV2Runner
provides an interface in MONAI to use the nnU-Net V2 library to analyze, train, and evaluate neural networks for medical image segmentation tasks. A version of nnunetv2 higher than 2.2 is needed for this class. nnUNetV2Runner can be used in two ways:
with one line of code to execute the complete pipeline.
with a series of commands to run each module in the pipeline.
The output of the interface is a directory that contains:
a converted dataset that meets the requirements of nnU-Net V2
data analysis results
checkpoints from the trained U-Net models
validation accuracy in each fold of cross-validation
the predictions on the testing datasets from the final algorithm ensemble and potential post-processing
- Parameters:
input_config (Any) – the configuration dictionary or the file path to the configuration in the form of YAML. The keys required in the configuration are:
- "datalist": file path to the datalist for the train/testing splits
- "dataroot": file path to the dataset
- "modality": imaging modality, e.g. “CT”, [“T2”, “ADC”]
Currently, the configuration supports these optional keys:
- "nnunet_raw": file path that will be written to the env variable for nnU-Net
- "nnunet_preprocessed": file path that will be written to the env variable for nnU-Net
- "nnunet_results": file path that will be written to the env variable for nnU-Net
- "nnUNet_trained_models"
- "dataset_name_or_id": name or integer ID of the dataset
If an optional key is not specified, then the pipeline will use the default values.
trainer_class_name (str) – the trainer class names offered by nnUNetV2 exhibit variations in training duration. Default: “nnUNetTrainer”. Other options: “nnUNetTrainer_Xepochs”, where X could be one of 1, 5, 10, 20, 50, 100, 250, 2000, 4000, 8000.
export_validation_probabilities (bool) – True to save softmax predictions from the final validation as npz files (in addition to predicted segmentations). Needed for finding the best ensemble. Default: True.
work_dir (str) – working directory to save the intermediate and final results.
Examples
Use the one-liner to start the nnU-Net workflow
python -m monai.apps.nnunet nnUNetV2Runner run --input_config ./input.yaml
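Equivalently, the runner can be configured and launched from Python; a minimal sketch in which the datalist and dataroot values are illustrative placeholders:

from monai.apps.nnunet import nnUNetV2Runner

# the required keys per the documentation above; values are hypothetical placeholders
input_config = {
    "datalist": "./msd_task09_spleen_folds.json",
    "dataroot": "/data/Task09_Spleen",
    "modality": "CT",
}
runner = nnUNetV2Runner(input_config=input_config, work_dir="./work_dir")
runner.run()  # executes the complete pipeline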
- Use convert_dataset to prepare the data to meet nnU-Net requirements, generate the dataset JSON file, and copy the dataset to a location specified by nnunet_raw in the input config file
python -m monai.apps.nnunet nnUNetV2Runner convert_dataset --input_config="./input.yaml"
convert_msd_dataset is an alternative option to prepare the data if the dataset is MSD.
python -m monai.apps.nnunet nnUNetV2Runner convert_msd_dataset \
    --input_config "./input.yaml" --data_dir "/path/to/Task09_Spleen"
experiment planning and data pre-processing
python -m monai.apps.nnunet nnUNetV2Runner plan_and_process --input_config "./input.yaml"
- training all 20 models using all GPUs available.
“CUDA_VISIBLE_DEVICES” environment variable is not supported.
python -m monai.apps.nnunet nnUNetV2Runner train --input_config "./input.yaml"
training a single model on a single GPU for 5 epochs. Here config is used to specify the configuration.
python -m monai.apps.nnunet nnUNetV2Runner train_single_model --input_config "./input.yaml" \
    --config "3d_fullres" \
    --fold 0 \
    --gpu_id 0 \
    --trainer_class_name "nnUNetTrainer_5epochs" \
    --export_validation_probabilities True
training for all 20 models (4 configurations by 5 folds) on 2 GPUs
python -m monai.apps.nnunet nnUNetV2Runner train --input_config "./input.yaml" --gpu_id_for_all "0,1"
5-fold training for a single model on 2 GPUs. Here configs is used to specify the configurations.
python -m monai.apps.nnunet nnUNetV2Runner train --input_config "./input.yaml" \
    --configs "3d_fullres" \
    --trainer_class_name "nnUNetTrainer_5epochs" \
    --export_validation_probabilities True \
    --gpu_id_for_all "0,1"
find the best configuration
python -m monai.apps.nnunet nnUNetV2Runner find_best_configuration --input_config "./input.yaml"
predict, ensemble, and post-process
python -m monai.apps.nnunet nnUNetV2Runner predict_ensemble_postprocessing --input_config "./input.yaml"
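The staged commands above also have direct Python counterparts on the runner object; a sketch using only the methods documented below, training a single configuration for brevity:

from monai.apps.nnunet import nnUNetV2Runner

runner = nnUNetV2Runner(input_config="./input.yaml")
runner.convert_dataset()
runner.plan_and_process()
runner.train(configs=("3d_fullres",))
runner.find_best_configuration()
runner.predict_ensemble_postprocessing()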
- convert_dataset()[source]#
Convert and make a copy of the dataset to meet the requirements of the nnU-Net workflow.
- convert_msd_dataset(data_dir, overwrite_id=None, n_proc=-1)[source]#
Convert and make a copy of the MSD dataset to meet the requirements of the nnU-Net workflow.
- Parameters:
data_dir – downloaded and extracted MSD dataset folder. CANNOT be nnUNetv1 dataset! Example: “/workspace/downloads/Task05_Prostate”.
overwrite_id – Overwrite the dataset id. If not set then use the id of the MSD task (inferred from the folder name). Only use this if you already have an equivalently numbered dataset!
n_proc – Number of processes used.
- extract_fingerprints(fpe='DatasetFingerprintExtractor', npfp=-1, verify_dataset_integrity=False, clean=False, verbose=False)[source]#
Extracts the dataset fingerprint used for experiment planning.
- Parameters:
fpe (str) – [OPTIONAL] Name of the Dataset Fingerprint Extractor class that should be used. Default is “DatasetFingerprintExtractor”.
npfp (int) – [OPTIONAL] Number of processes used for fingerprint extraction.
verify_dataset_integrity (bool) – [RECOMMENDED] set this flag to check the dataset integrity. This is useful and should be done once for each dataset!
clean (bool) – [OPTIONAL] Set this flag to overwrite existing fingerprints. If this flag is not set and a fingerprint already exists, the fingerprint extractor will not run.
verbose (bool) – set this to print a lot of stuff. Useful for debugging. Will disable the progress bar! Recommended for cluster environments.
- Return type:
None
- find_best_configuration(plans='nnUNetPlans', configs=(2d, 3d_fullres, 3d_lowres, 3d_cascade_fullres), trainers=None, allow_ensembling=True, num_processes=-1, overwrite=True, folds=(0, 1, 2, 3, 4), strict=False)[source]#
Find the best model configurations.
- Parameters:
plans – list of plan identifiers. Default: nnUNetPlans.
configs – list of configurations. Default: [“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”].
trainers – list of trainers. Default: nnUNetTrainer.
allow_ensembling – set this flag to enable ensembling.
num_processes – number of processes to use for ensembling, postprocessing, etc.
overwrite – if set we will overwrite already ensembled files etc. May speed up consecutive runs of this command (not recommended) at the risk of not updating outdated results.
folds – folds to use. Default: (0, 1, 2, 3, 4).
strict – a switch that triggers a RuntimeError if the logging folder cannot be found. Default: False.
- plan_and_process(fpe='DatasetFingerprintExtractor', npfp=8, verify_dataset_integrity=False, no_pp=False, clean=False, pl='ExperimentPlanner', gpu_memory_target=8, preprocessor_name='DefaultPreprocessor', overwrite_target_spacing=None, overwrite_plans_name='nnUNetPlans', c=(2d, 3d_fullres, 3d_lowres), n_proc=(8, 8, 8), verbose=False)[source]#
Performs experiment planning and preprocessing before the training.
- Parameters:
fpe (str) – [OPTIONAL] Name of the Dataset Fingerprint Extractor class that should be used. Default is “DatasetFingerprintExtractor”.
npfp (int) – [OPTIONAL] Number of processes used for fingerprint extraction. Default: 8.
verify_dataset_integrity (bool) – [RECOMMENDED] set this flag to check the dataset integrity. This is useful and should be done once for each dataset!
no_pp (bool) – [OPTIONAL] Set this to only run fingerprint extraction and experiment planning (no preprocessing). Useful for debugging.
clean (bool) – [OPTIONAL] Set this flag to overwrite existing fingerprints. If this flag is not set and a fingerprint already exists, the fingerprint extractor will not run. REQUIRED IF YOU CHANGE THE DATASET FINGERPRINT EXTRACTOR OR MAKE CHANGES TO THE DATASET!
pl (str) – [OPTIONAL] Name of the Experiment Planner class that should be used. Default is “ExperimentPlanner”. Note: There is no longer a distinction between 2d and 3d planner. It’s an all-in-one solution now.
gpu_memory_target (int) – [OPTIONAL] DANGER ZONE! Sets a custom GPU memory target. Default: 8 [GB]. Changing this will affect patch and batch size and will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
preprocessor_name (str) – [OPTIONAL] DANGER ZONE! Sets a custom preprocessor class. This class must be located in nnunetv2.preprocessing. Default: “DefaultPreprocessor”. Changing this may affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
overwrite_target_spacing (Optional[Any]) – [OPTIONAL] DANGER ZONE! Sets a custom target spacing for the 3d_fullres and 3d_cascade_fullres configurations. Default: None [no changes]. Changing this will affect image size and potentially patch and batch size. This will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline). Changing the target spacing for the other configurations is currently not implemented. New target spacing must be a list of three numbers!
overwrite_plans_name (str) – [OPTIONAL] USE A CUSTOM PLANS IDENTIFIER. If you used -gpu_memory_target, -preprocessor_name or -overwrite_target_spacing it is best practice to use -overwrite_plans_name to generate a differently named plans file such that the nnU-Net default plans are not overwritten. You will then need to specify your custom plans file with -p whenever running other nnU-Net commands (training, inference, etc.).
c (tuple) – [OPTIONAL] Configurations for which the preprocessing should be run. Default: 2d 3d_fullres 3d_lowres. 3d_cascade_fullres does not need to be specified because it uses the data from 3d_fullres. Configurations that do not exist for some datasets will be skipped.
n_proc (tuple) – [OPTIONAL] Use this to define how many processes are to be used. If this is just one number then this number of processes is used for all configurations specified with -c. If it’s a list of numbers this list must have as many elements as there are configurations. We then iterate over zip(configs, num_processes) to determine the number of processes used for each configuration. More processes are always faster (up to the number of threads your PC can support, so 8 for a 4-core CPU with hyperthreading; if you don’t know what that is then don’t touch it, or at least don’t increase it!). DANGER: More often than not the number of processes that can be used is limited by the amount of RAM available. Image resampling takes up a lot of RAM. MONITOR RAM USAGE AND DECREASE -n_proc IF YOUR RAM FILLS UP TOO MUCH! Default: 8 4 8 (= 8 processes for 2d, 4 for 3d_fullres and 8 for 3d_lowres if -c is at its default).
verbose (bool) – Set this to print a lot of stuff. Useful for debugging. Will disable the progress bar! (Recommended for cluster environments.)
- Return type:
None
- plan_experiments(pl='ExperimentPlanner', gpu_memory_target=8, preprocessor_name='DefaultPreprocessor', overwrite_target_spacing=None, overwrite_plans_name='nnUNetPlans')[source]#
Generate a configuration file that specifies the details of the experiment.
- Parameters:
pl (str) – [OPTIONAL] Name of the Experiment Planner class that should be used. Default is “ExperimentPlanner”. Note: There is no longer a distinction between 2d and 3d planner. It’s an all-in-one solution now.
gpu_memory_target (float) – [OPTIONAL] DANGER ZONE! Sets a custom GPU memory target. Default: 8 [GB]. Changing this will affect patch and batch size and will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
preprocessor_name (str) – [OPTIONAL] DANGER ZONE! Sets a custom preprocessor class. This class must be located in nnunetv2.preprocessing. Default: “DefaultPreprocessor”. Changing this may affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline).
overwrite_target_spacing (Optional[Any]) – [OPTIONAL] DANGER ZONE! Sets a custom target spacing for the 3d_fullres and 3d_cascade_fullres configurations. Default: None [no changes]. Changing this will affect image size and potentially patch and batch size. This will definitely affect your models’ performance! Only use this if you really know what you are doing and NEVER use this without running the default nnU-Net first (as a baseline). Changing the target spacing for the other configurations is currently not implemented. New target spacing must be a list of three numbers!
overwrite_plans_name (str) – [OPTIONAL] DANGER ZONE! If you used -gpu_memory_target, -preprocessor_name or -overwrite_target_spacing it is best practice to use -overwrite_plans_name to generate a differently named plans file such that the nnU-Net default plans are not overwritten. You will then need to specify your custom plan.
- Return type:
None
- predict(list_of_lists_or_source_folder, output_folder, model_training_output_dir, use_folds=None, tile_step_size=0.5, use_gaussian=True, use_mirroring=True, perform_everything_on_gpu=True, verbose=True, save_probabilities=False, overwrite=True, checkpoint_name='checkpoint_final.pth', folder_with_segs_from_prev_stage=None, num_parts=1, part_id=0, num_processes_preprocessing=-1, num_processes_segmentation_export=-1, gpu_id=0)[source]#
Use this to run inference with nnU-Net. This function is used when you want to manually specify a folder containing a trained nnU-Net model. This is useful when the nnU-Net environment variables (nnUNet_results) are not set.
- Parameters:
list_of_lists_or_source_folder – input folder. Remember to use the correct channel numberings for your files (_0000 etc). File endings must be the same as the training dataset!
output_folder – Output folder. If it does not exist it will be created. Predicted segmentations will have the same name as their source images.
model_training_output_dir – folder in which the trained model is. Must have subfolders fold_X for the different folds you trained.
use_folds – specify the folds of the trained model that should be used for prediction. Default: (0, 1, 2, 3, 4).
tile_step_size – step size for sliding window prediction. The larger it is the faster but less accurate the prediction. Default: 0.5. Cannot be larger than 1. We recommend the default.
use_gaussian – use Gaussian smoothing as test-time augmentation.
use_mirroring – use mirroring/flipping as test-time augmentation.
verbose – set this if you like being talked to. You will have to be a good listener/reader.
save_probabilities – set this to export predicted class “probabilities”. Required if you want to ensemble multiple configurations.
overwrite – overwrite an existing previous prediction (will not overwrite existing files)
checkpoint_name – name of the checkpoint you want to use. Default: checkpoint_final.pth.
folder_with_segs_from_prev_stage – folder containing the predictions of the previous stage. Required for cascaded models.
num_parts – number of separate nnUNetv2_predict calls that you will be making. Default: 1 (= this one call predicts everything).
part_id – if multiple nnUNetv2_predict calls exist, which one is this? IDs start with 0 and end with num_parts - 1. So when you submit 5 nnUNetv2_predict calls you need to set -num_parts 5 and use -part_id 0, 1, 2, 3 and 4.
num_processes_preprocessing – number of processes used for preprocessing. More is not always better. Beware of out-of-RAM issues.
num_processes_segmentation_export – Number of processes used for segmentation export. More is not always better. Beware of out-of-RAM issues.
gpu_id – which GPU to use for prediction.
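An illustrative call on a runner instance; all paths below are hypothetical placeholders:

runner.predict(
    list_of_lists_or_source_folder="/data/imagesTs",           # hypothetical input folder
    output_folder="./predictions",                             # created if it does not exist
    model_training_output_dir="/path/to/trained_model_folder", # must contain fold_X subfolders
    use_folds=(0, 1, 2, 3, 4),
    save_probabilities=False,
)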
- predict_ensemble_postprocessing(folds=(0, 1, 2, 3, 4), run_ensemble=True, run_predict=True, run_postprocessing=True, **kwargs)[source]#
Run prediction, ensemble, and/or postprocessing optionally.
- Parameters:
folds (tuple) – which folds to use.
run_ensemble (bool) – whether to run ensembling.
run_predict (bool) – whether to predict using trained checkpoints.
run_postprocessing (bool) – whether to conduct post-processing.
kwargs (Any) – this optional parameter allows you to specify additional arguments defined in the predict method.
- Return type:
None
- preprocess(c=(2d, 3d_fullres, 3d_lowres), n_proc=(8, 8, 8), overwrite_plans_name='nnUNetPlans', verbose=False)[source]#
Apply a set of preprocessing operations to the input data before the training.
- Parameters:
overwrite_plans_name (str) – [OPTIONAL] You can use this to specify a custom plans file that you may have generated.
c (tuple) – [OPTIONAL] Configurations for which the preprocessing should be run. Default: 2d 3d_fullres 3d_lowres. 3d_cascade_fullres does not need to be specified because it uses the data from 3d_fullres. Configurations that do not exist for some datasets will be skipped.
n_proc (tuple) – [OPTIONAL] Use this to define how many processes are to be used. If this is just one number then this number of processes is used for all configurations specified with -c. If it’s a list of numbers this list must have as many elements as there are configurations. We then iterate over zip(configs, num_processes) to determine the number of processes used for each configuration. More processes are always faster (up to the number of threads your PC can support, so 8 for a 4-core CPU with hyperthreading; if you don’t know what that is then don’t touch it, or at least don’t increase it!). DANGER: More often than not the number of processes that can be used is limited by the amount of RAM available. Image resampling takes up a lot of RAM. MONITOR RAM USAGE AND DECREASE -n_proc IF YOUR RAM FILLS UP TOO MUCH! Default: 8 4 8 (= 8 processes for 2d, 4 for 3d_fullres and 8 for 3d_lowres if -c is at its default).
verbose (bool) – Set this to print a lot of stuff. Useful for debugging. Will disable the progress bar! Recommended for cluster environments.
- Return type:
None
- run(run_convert_dataset=True, run_plan_and_process=True, run_train=True, run_find_best_configuration=True, run_predict_ensemble_postprocessing=True)[source]#
Run the nnU-Net pipeline.
- Parameters:
run_convert_dataset (bool) – whether to convert datasets, defaults to True.
run_plan_and_process (bool) – whether to preprocess and analyze the dataset, defaults to True.
run_train (bool) – whether to train models, defaults to True.
run_find_best_configuration (bool) – whether to find the best model (ensemble) configurations, defaults to True.
run_predict_ensemble_postprocessing (bool) – whether to make predictions on test datasets, defaults to True.
- Return type:
None
- train(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), gpu_id_for_all=None, **kwargs)[source]#
Run the training for all the models specified by the configurations. Note: to set the number of GPUs to use, use gpu_id_for_all instead of the CUDA_VISIBLE_DEVICES environment variable.
- Parameters:
configs – configurations that should be trained. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
gpu_id_for_all – a tuple/list/integer of GPU device ID(s) to use for the training. Default: None (all available GPUs).
kwargs – this optional parameter allows you to specify additional arguments defined in the train_single_model method.
- train_parallel(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), gpu_id_for_all=None, **kwargs)[source]#
Create the command line for the subprocess call for parallel training. Note: to set the number of GPUs to use, use gpu_id_for_all instead of the CUDA_VISIBLE_DEVICES environment variable.
- Parameters:
configs – configurations that should be trained. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
gpu_id_for_all – a tuple/list/integer of GPU device ID(s) to use for the training. Default: None (all available GPUs).
kwargs – this optional parameter allows you to specify additional arguments defined in the train_single_model method.
- train_parallel_cmd(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), gpu_id_for_all=None, **kwargs)[source]#
Create the command line for the subprocess call for parallel training.
- Parameters:
configs – configurations that should be trained. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
gpu_id_for_all – a tuple/list/integer of GPU device ID(s) to use for the training. Default: None (all available GPUs).
kwargs – this optional parameter allows you to specify additional arguments defined in the train_single_model method.
- train_single_model(config, fold, gpu_id=0, **kwargs)[source]#
Run the training on a single GPU with one specified configuration provided. Note: this will override the environment variable CUDA_VISIBLE_DEVICES.
- Parameters:
config – configuration that should be trained. Examples: “2d”, “3d_fullres”, “3d_lowres”.
fold – fold of the 5-fold cross-validation. Should be an int between 0 and 4.
gpu_id – an integer to select the device to use, or a tuple/list of GPU device indices used for multi-GPU training (e.g., (0,1)). Default: 0.
kwargs – this optional parameter allows you to specify additional arguments in nnunetv2.run.run_training.run_training_entry. Currently supported args are:
- p: custom plans identifier. Default: “nnUNetPlans”.
- pretrained_weights: path to nnU-Net checkpoint file to be used as pretrained model. Will only be used when actually training. Beta. Use with caution. Default: False.
- use_compressed: True to use compressed data for training. Reading compressed data is much more CPU and (potentially) RAM intensive and should only be used if you know what you are doing. Default: False.
- c: continue training from latest checkpoint. Default: False.
- val: True to run the validation only. Requires training to have finished. Default: False.
- disable_checkpointing: True to disable checkpointing. Ideal for testing things out when you don’t want to flood your hard drive with checkpoints. Default: False.
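For illustration, a call on a runner instance passing one of the supported kwargs listed above:

runner.train_single_model(
    config="3d_fullres",
    fold=0,
    gpu_id=0,
    c=True,  # continue training from the latest checkpoint
)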
- validate(configs=(3d_fullres, 2d, 3d_lowres, 3d_cascade_fullres), **kwargs)[source]#
Perform validation on all models defined by the configurations over 5 folds.
- Parameters:
configs (tuple) – configurations that should be validated. Default: (“2d”, “3d_fullres”, “3d_lowres”, “3d_cascade_fullres”).
kwargs (Any) – this optional parameter allows you to specify additional arguments defined in the train_single_model method.
- Return type:
None
- validate_single_model(config, fold, **kwargs)[source]#
Perform validation on a single model.
- Parameters:
config (str) – configuration that should be validated.
fold (int) – fold of the 5-fold cross-validation. Should be an int between 0 and 4.
kwargs (Any) – this optional parameter allows you to specify additional arguments defined in the train_single_model method.
- Return type:
None