Applications

Datasets

class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=0)[source]

The Dataset to automatically download MedNIST data and generate items for training, validation or test. It’s based on CacheDataset to accelerate the training process.

Parameters
  • root_dir (str) – target directory to download and load MedNIST dataset.

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. The default transform is LoadPNGd, which loads the data into a numpy array with [H, W] shape. For further usage, apply AddChanneld to convert the shape to [C, H, W].

  • download (bool) – whether to download and extract the MedNIST dataset from the resource link, default is False. If the expected file already exists, downloading is skipped even when this is set to True. The user can instead manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.

  • seed (int) – random seed to randomly split training, validation and test datasets, default is 0.

  • val_frac (float) – fraction of the whole dataset to use for validation, default is 0.1.

  • test_frac (float) – fraction of the whole dataset to use for testing, default is 0.1.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. The actual cache size is the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – fraction of the data to cache, default is 1.0 (cache all). The actual cache size is the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_workers (int) – the number of worker threads to use. if 0 a single thread will be used. Default is 0.
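The interplay of seed, val_frac, test_frac, cache_num and cache_rate can be sketched with plain arithmetic. The function names below (effective_cache_num, split_counts) are illustrative, not part of the MONAI API; this is a rough model of the rules the parameter descriptions state, not the library's implementation:

```python
import sys

def effective_cache_num(data_length, cache_num=sys.maxsize, cache_rate=1.0):
    # Per the docs, the cached count is the minimum of
    # (cache_num, data_length x cache_rate, data_length).
    return min(cache_num, int(data_length * cache_rate), data_length)

def split_counts(n, val_frac=0.1, test_frac=0.1):
    # Illustrative split of n items into (training, validation, test) counts
    # using the default fractions of MedNISTDataset.
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    return n - n_val - n_test, n_val, n_test

print(effective_cache_num(1000, cache_rate=0.5))  # 500
print(split_counts(1000))                         # (800, 100, 100)
```

With the defaults (cache_rate=1.0, cache_num=sys.maxsize) the whole section is cached; lowering either bound trades memory for first-epoch loading time.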

Raises
  • ValueError – When root_dir is not a directory.

  • RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).

randomize(data=None)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=0)[source]

The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation or test. It also loads the dataset properties from the JSON config file of the dataset; users can call get_properties() to get specific properties or all the loaded properties. It’s based on monai.data.CacheDataset to accelerate the training process.

Parameters
  • root_dir (str) – user’s local directory for caching and loading the MSD datasets.

  • task (str) – which task to download and execute: one of list (“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”).

  • section (str) – expected data section, can be: training, validation or test.

  • transform (Union[Sequence[Callable], Callable]) – transforms to execute operations on input data. The default transform is LoadNiftid, which loads NIfTI format data into a numpy array with [H, W, D] or [H, W, D, C] shape. For further usage, apply AddChanneld or AsChannelFirstd to convert the shape to [C, H, W, D].

  • download (bool) – whether to download and extract the Decathlon data from the resource link, default is False. If the expected file already exists, downloading is skipped even when this is set to True. The user can instead manually copy the tar file or the dataset folder to the root directory.

  • val_frac (float) – fraction of the whole dataset to use for validation, default is 0.2.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: use the same seed for the training and validation sections so the splits are consistent.

  • cache_num (int) – number of items to be cached. Default is sys.maxsize. The actual cache size is the minimum of (cache_num, data_length x cache_rate, data_length).

  • cache_rate (float) – fraction of the data to cache, default is 1.0 (cache all). The actual cache size is the minimum of (cache_num, data_length x cache_rate, data_length).

  • num_workers (int) – the number of worker threads to use. if 0 a single thread will be used. Default is 0.
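The reason the same seed must be used for both sections can be illustrated with a shuffle-then-split sketch. This is not the MONAI implementation (which draws randomness through self.R, a numpy RandomState); split_section is a hypothetical stdlib stand-in:

```python
import random

def split_section(datalist, section, seed=0, val_frac=0.2):
    # Shuffle a copy of the datalist with a fixed seed, then take the first
    # val_frac of items for validation and the rest for training. Only when
    # both sections shuffle with the same seed are the two splits disjoint.
    data = list(datalist)
    random.Random(seed).shuffle(data)
    n_val = int(len(data) * val_frac)
    return data[:n_val] if section == "validation" else data[n_val:]

items = [f"case_{i:02d}.nii.gz" for i in range(10)]
val = split_section(items, "validation", seed=0)
train = split_section(items, "training", seed=0)
assert len(val) == 2 and len(train) == 8
assert not set(val) & set(train)  # disjoint because the seeds match
```

With mismatched seeds the two shuffles differ, so some cases would silently appear in both training and validation.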

Raises
  • ValueError – When root_dir is not a directory.

  • ValueError – When task is not one of [“Task01_BrainTumour”, “Task02_Heart”, “Task03_Liver”, “Task04_Hippocampus”, “Task05_Prostate”, “Task06_Lung”, “Task07_Pancreas”, “Task08_HepaticVessel”, “Task09_Spleen”, “Task10_Colon”].

  • RuntimeError – When dataset_dir doesn’t exist and downloading is not selected (download=False).

Example:

transform = Compose(
    [
        LoadNiftid(keys=["image", "label"]),
        AddChanneld(keys=["image", "label"]),
        ScaleIntensityd(keys="image"),
        ToTensord(keys=["image", "label"]),
    ]
)

val_data = DecathlonDataset(
    root_dir="./", task="Task09_Spleen", transform=transform, section="validation", seed=12345, download=True
)

print(val_data[0]["image"], val_data[0]["label"])
get_indices()[source]

Get the indices of datalist used in this dataset.

Return type

ndarray

get_properties(keys=None)[source]

Get the loaded properties of the dataset for the specified keys. If no keys are specified, return all the loaded properties.
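The key-filtering behaviour amounts to a dictionary subset. The sketch below is illustrative only (the property names are invented examples in the spirit of the Decathlon JSON config, and get_properties here is a stand-in, not the method itself):

```python
def get_properties(properties, keys=None):
    # Return all loaded properties when keys is None, otherwise only the
    # requested subset; accept a single key or a sequence of keys.
    if keys is None:
        return dict(properties)
    if isinstance(keys, str):
        keys = [keys]
    return {k: properties[k] for k in keys}

# hypothetical properties, shaped like a Decathlon dataset.json
props = {"name": "Spleen", "modality": {"0": "CT"}, "numTraining": 41}
subset = get_properties(props, "name")
everything = get_properties(props)
```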

randomize(data)[source]

Within this method, self.R should be used, instead of np.random, to introduce random factors.

All self.R calls happen here so that we have a better chance of identifying errors when syncing the random state.

This method can generate the random factors based on properties of the input data.

Raises

NotImplementedError – When the subclass does not override this method.

Return type

None

class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)[source]

Cross-validation dataset wrapper for a general dataset class that provides the _split_datalist API.

Parameters
  • dataset_cls – dataset class to be used to create the cross validation partitions. It must have _split_datalist API.

  • nfolds (int) – number of folds to split the data for cross validation.

  • seed (int) – random seed to randomly shuffle the datalist before splitting into N folds, default is 0.

  • dataset_params – other additional parameters for the dataset_cls base class.

Example of 5 folds cross validation training:

cvdataset = CrossValidation(
    dataset_cls=DecathlonDataset,
    nfolds=5,
    seed=12345,
    root_dir="./",
    task="Task09_Spleen",
    section="training",
    download=True,
)
dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
dataset_fold0_val = cvdataset.get_dataset(folds=0)
# execute training for fold 0 ...

dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
dataset_fold1_val = cvdataset.get_dataset(folds=1)
# execute training for fold 1 ...

...

dataset_fold4_train = ...
# execute training for fold 4 ...
get_dataset(folds)[source]

Generate a dataset based on the specified fold indices in the cross-validation group.

Parameters

folds (Union[Sequence[int], int]) – index of folds for training or validation. If a list of values is given, the data from those folds is concatenated.
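The fold mechanics can be sketched in a few lines. This is a rough model, not the MONAI implementation: partition_folds and get_fold_data are hypothetical helpers, and the round-robin partition shown is just one way to split a datalist into N folds:

```python
def partition_folds(datalist, nfolds=5):
    # Round-robin partition of the datalist into nfolds folds.
    return [datalist[i::nfolds] for i in range(nfolds)]

def get_fold_data(datalist, folds, nfolds=5):
    # Concatenate the items of the requested fold indices (int or list),
    # mirroring the folds argument of get_dataset.
    if isinstance(folds, int):
        folds = [folds]
    parts = partition_folds(datalist, nfolds)
    return [item for i in folds for item in parts[i]]

data = list(range(10))
train = get_fold_data(data, [1, 2, 3, 4])  # four folds for training
val = get_fold_data(data, 0)               # the held-out fold
assert sorted(train + val) == data
assert not set(train) & set(val)
```

Each of the 5 training runs holds out a different fold, so every item is used for validation exactly once across the whole cross-validation.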

Utilities

monai.apps.check_hash(filepath, val=None, hash_type='md5')[source]

Verify hash signature of specified file.

Parameters
  • filepath (str) – path of source file to verify hash value.

  • val (Optional[str]) – expected hash value of the file.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

Return type

bool

monai.apps.download_url(url, filepath, hash_val=None, hash_type='md5')[source]

Download a file from the specified URL, with progress bar and hash check support.

Parameters
  • url (str) – source URL link to download file.

  • filepath (str) – target filepath to save the downloaded file.

  • hash_val (Optional[str]) – expected hash value to validate the downloaded file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

Raises
  • RuntimeError – When hash validation of an existing file at filepath fails.

  • RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.

  • URLError – See urllib.request.urlretrieve.

  • HTTPError – See urllib.request.urlretrieve.

  • ContentTooShortError – See urllib.request.urlretrieve.

  • IOError – See urllib.request.urlretrieve.

  • RuntimeError – When hash validation of the file downloaded from url fails.

Return type

None
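The skip-if-present and validate-after-download behaviour can be sketched with urllib. fetch is a hypothetical simplification (no progress bar, whole file read into memory for hashing), and the demo uses a file:// URL so it runs offline:

```python
import hashlib
import os
import tempfile
import urllib.request

def fetch(url, filepath, hash_val=None, hash_type="md5"):
    # Skip the download if the target already exists, fetch it otherwise,
    # then verify the digest when an expected value is given.
    if not os.path.exists(filepath):
        urllib.request.urlretrieve(url, filepath)
    if hash_val is not None:
        with open(filepath, "rb") as f:
            digest = hashlib.new(hash_type, f.read()).hexdigest()
        if digest != hash_val:
            raise RuntimeError(f"hash of {filepath} does not match {hash_val}")

# offline demo with a file:// URL standing in for a remote resource
src = os.path.join(tempfile.mkdtemp(), "src.txt")
dst = os.path.join(tempfile.mkdtemp(), "dst.txt")
with open(src, "w") as f:
    f.write("hello")
fetch("file://" + src, dst, hash_val="5d41402abc4b2a76b9719d911017c592")
```

Because a pre-existing file skips the download but not the hash check, a corrupted partial download is still caught on the next call.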

monai.apps.extractall(filepath, output_dir, hash_val=None, hash_type='md5')[source]

Extract file to the output directory. Expected file types are: zip, tar.gz and tar.

Parameters
  • filepath (str) – the file path of compressed file.

  • output_dir (str) – target directory to save extracted files.

  • hash_val (Optional[str]) – expected hash value to validate the compressed file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

Raises
  • RuntimeError – When the hash validation of the filepath compressed file fails.

  • ValueError – When the filepath file extension is not one of [“zip”, “tar.gz”, “tar”].

Return type

None
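Extraction for the supported archive types is a dispatch on the file extension over the standard zipfile and tarfile modules. The extract function below is an illustrative sketch, not the MONAI source, shown with a tar.gz round trip:

```python
import os
import tarfile
import tempfile
import zipfile

def extract(filepath, output_dir):
    # Dispatch on extension: zip via zipfile, tar/tar.gz via tarfile,
    # mirroring the documented supported types.
    if filepath.endswith(".zip"):
        with zipfile.ZipFile(filepath) as z:
            z.extractall(output_dir)
    elif filepath.endswith((".tar", ".tar.gz")):
        with tarfile.open(filepath) as t:
            t.extractall(output_dir)
    else:
        raise ValueError(f"unsupported archive type: {filepath}")

# round-trip demo: pack a file into a tar.gz, then extract it
workdir = tempfile.mkdtemp()
member = os.path.join(workdir, "hello.txt")
with open(member, "w") as f:
    f.write("hi")
archive = os.path.join(workdir, "demo.tar.gz")
with tarfile.open(archive, "w:gz") as t:
    t.add(member, arcname="hello.txt")
out = os.path.join(workdir, "out")
extract(archive, out)
```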

monai.apps.download_and_extract(url, filepath, output_dir, hash_val=None, hash_type='md5')[source]

Download file from URL and extract it to the output directory.

Parameters
  • url (str) – source URL link to download file.

  • filepath (str) – the file path of compressed file.

  • output_dir (str) – target directory to save the extracted files.

  • hash_val (Optional[str]) – expected hash value to validate the downloaded file. if None, skip hash validation.

  • hash_type (str) – ‘md5’ or ‘sha1’, defaults to ‘md5’.

Return type

None