Applications
Datasets
class monai.apps.MedNISTDataset(root_dir, section, transform=(), download=False, seed=0, val_frac=0.1, test_frac=0.1, cache_num=9223372036854775807, cache_rate=1.0, num_workers=0)
The Dataset to automatically download MedNIST data and generate items for training, validation, or test. It is based on CacheDataset to accelerate the training process.
- Parameters
  - root_dir (str) – target directory to download and load the MedNIST dataset.
  - section (str) – expected data section, can be: training, validation, or test.
  - transform (Union[Sequence[Callable], Callable]) – transforms to execute on the input data. The default transform is LoadPNGd, which loads data into a numpy array of shape [H, W]. For further usage, use AddChanneld to convert the shape to [C, H, W].
  - download (bool) – whether to download and extract MedNIST from the resource link, default is False. If the expected file already exists, downloading is skipped even when set to True. The user can manually copy the MedNIST.tar.gz file or the MedNIST folder to the root directory.
  - seed (int) – random seed to randomly split the training, validation, and test datasets, default is 0.
  - val_frac (float) – fraction of the whole dataset used for validation, default is 0.1.
  - test_frac (float) – fraction of the whole dataset used for test, default is 0.1.
  - cache_num (int) – number of items to be cached. Default is sys.maxsize. Will take the minimum of (cache_num, data_length x cache_rate, data_length).
  - cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). Will take the minimum of (cache_num, data_length x cache_rate, data_length).
  - num_workers (int) – the number of worker threads to use. If 0, a single thread will be used. Default is 0.
- Raises
  - ValueError – When root_dir is not a directory.
  - RuntimeError – When dataset_dir doesn't exist and downloading is not selected (download=False).
randomize(data=None)
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here, so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises
  - NotImplementedError – When the subclass does not override this method.
- Return type
  None
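The effect of seed, val_frac, and test_frac on the section split can be sketched outside of MONAI. The helper below, split_section, is hypothetical (not part of monai.apps): it draws one seeded uniform number per item and routes each item to exactly one section, which is the general shape of the behavior described above.

```python
import numpy as np

def split_section(data, section, seed=0, val_frac=0.1, test_frac=0.1):
    # Hypothetical sketch: one uniform draw per item from a seeded
    # generator decides which section the item belongs to.
    rng = np.random.RandomState(seed)
    picks = rng.random_sample(len(data))
    out = []
    for item, p in zip(data, picks):
        if p < test_frac:
            chosen = "test"
        elif p < test_frac + val_frac:
            chosen = "validation"
        else:
            chosen = "training"
        if chosen == section:
            out.append(item)
    return out

data = list(range(1000))
train = split_section(data, "training", seed=0)
val = split_section(data, "validation", seed=0)
test = split_section(data, "test", seed=0)
# Every item lands in exactly one section, so the split is disjoint
# and reproducible for a fixed seed.
assert len(train) + len(val) + len(test) == len(data)
```

Because the split is driven only by the seed, re-creating the dataset with the same seed yields the same partition, which is what makes the training/validation/test sections consistent across runs.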
class monai.apps.DecathlonDataset(root_dir, task, section, transform=(), download=False, seed=0, val_frac=0.2, cache_num=9223372036854775807, cache_rate=1.0, num_workers=0)
The Dataset to automatically download the data of the Medical Segmentation Decathlon challenge (http://medicaldecathlon.com/) and generate items for training, validation, or test. It also loads the properties from the JSON config file of the dataset; the user can call get_properties() to get specified properties or all the loaded properties. It is based on monai.data.CacheDataset to accelerate the training process.
- Parameters
  - root_dir (str) – user's local directory for caching and loading the MSD datasets.
  - task (str) – which task to download and execute: one of ("Task01_BrainTumour", "Task02_Heart", "Task03_Liver", "Task04_Hippocampus", "Task05_Prostate", "Task06_Lung", "Task07_Pancreas", "Task08_HepaticVessel", "Task09_Spleen", "Task10_Colon").
  - section (str) – expected data section, can be: training, validation, or test.
  - transform (Union[Sequence[Callable], Callable]) – transforms to execute on the input data. The default transform is LoadNiftid, which loads Nifti format data into a numpy array of shape [H, W, D] or [H, W, D, C]. For further usage, use AddChanneld or AsChannelFirstd to convert the shape to [C, H, W, D].
  - download (bool) – whether to download and extract the Decathlon data from the resource link, default is False. If the expected file already exists, downloading is skipped even when set to True. The user can manually copy the tar file or the dataset folder to the root directory.
  - seed (int) – random seed to randomly shuffle the datalist before splitting into training and validation, default is 0. Note: set the same seed for the training and validation sections.
  - val_frac (float) – fraction of the whole dataset used for validation, default is 0.2.
  - cache_num (int) – number of items to be cached. Default is sys.maxsize. Will take the minimum of (cache_num, data_length x cache_rate, data_length).
  - cache_rate (float) – percentage of cached data in total, default is 1.0 (cache all). Will take the minimum of (cache_num, data_length x cache_rate, data_length).
  - num_workers (int) – the number of worker threads to use. If 0, a single thread will be used. Default is 0.
- Raises
  - ValueError – When root_dir is not a directory.
  - ValueError – When task is not one of ("Task01_BrainTumour", "Task02_Heart", "Task03_Liver", "Task04_Hippocampus", "Task05_Prostate", "Task06_Lung", "Task07_Pancreas", "Task08_HepaticVessel", "Task09_Spleen", "Task10_Colon").
  - RuntimeError – When dataset_dir doesn't exist and downloading is not selected (download=False).
Example:

    transform = Compose(
        [
            LoadNiftid(keys=["image", "label"]),
            AddChanneld(keys=["image", "label"]),
            ScaleIntensityd(keys="image"),
            ToTensord(keys=["image", "label"]),
        ]
    )

    val_data = DecathlonDataset(
        root_dir="./",
        task="Task09_Spleen",
        transform=transform,
        section="validation",
        seed=12345,
        download=True,
    )

    print(val_data[0]["image"], val_data[0]["label"])
get_properties(keys=None)
Get the loaded properties of the dataset with the specified keys. If no keys are specified, return all the loaded properties.
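The key-filtering behavior of get_properties can be sketched with a plain dict; the property names below are illustrative stand-ins, not the actual Decathlon JSON keys, and get_properties_sketch is a hypothetical helper rather than the MONAI implementation.

```python
def get_properties_sketch(properties, keys=None):
    # If no keys are specified, return all loaded properties;
    # otherwise return only the requested subset.
    if keys is None:
        return dict(properties)
    if isinstance(keys, str):
        keys = [keys]
    return {k: properties[k] for k in keys}

props = {"name": "Spleen", "modality": {"0": "CT"}, "numTraining": 41}
print(get_properties_sketch(props, "name"))   # {'name': 'Spleen'}
print(get_properties_sketch(props) == props)  # True
```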
randomize(data)
Within this method, self.R should be used, instead of np.random, to introduce random factors. All self.R calls happen here, so that we have a better chance of identifying errors in syncing the random state. This method can generate the random factors based on properties of the input data.
- Raises
  - NotImplementedError – When the subclass does not override this method.
- Return type
  None
class monai.apps.CrossValidation(dataset_cls, nfolds=5, seed=0, **dataset_params)
Cross-validation dataset based on a general dataset, which must have a _split_datalist API.
- Parameters
  - dataset_cls – dataset class to be used to create the cross-validation partitions. It must have a _split_datalist API.
  - nfolds (int) – number of folds to split the data for cross validation.
  - seed (int) – random seed to randomly shuffle the datalist before splitting into N folds, default is 0.
  - dataset_params – other additional parameters for the dataset_cls base class.
Example of 5-fold cross-validation training:

    cvdataset = CrossValidation(
        dataset_cls=DecathlonDataset,
        nfolds=5,
        seed=12345,
        root_dir="./",
        task="Task09_Spleen",
        section="training",
        download=True,
    )
    dataset_fold0_train = cvdataset.get_dataset(folds=[1, 2, 3, 4])
    dataset_fold0_val = cvdataset.get_dataset(folds=0)
    # execute training for fold 0 ...

    dataset_fold1_train = cvdataset.get_dataset(folds=[0, 2, 3, 4])
    dataset_fold1_val = cvdataset.get_dataset(folds=1)
    # execute training for fold 1 ...

    ...

    dataset_fold4_train = ...
    # execute training for fold 4 ...
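The fold arithmetic behind get_dataset can be sketched independently of MONAI. Both helpers below, fold_indices and get_fold, are hypothetical: they show how a seeded shuffle is split into nfolds nearly equal partitions, and how a training set is assembled from the complement of the held-out fold.

```python
import numpy as np

def fold_indices(length, nfolds=5, seed=0):
    # Shuffle the indices once with a fixed seed, then split them into
    # nfolds nearly equal partitions.
    indices = np.arange(length)
    np.random.RandomState(seed).shuffle(indices)
    return [part.tolist() for part in np.array_split(indices, nfolds)]

def get_fold(folds_list, folds):
    # Concatenate the items of the selected fold(s); mirrors
    # get_dataset(folds=...) accepting an int or a sequence of ints.
    if isinstance(folds, int):
        folds = [folds]
    return [i for f in folds for i in folds_list[f]]

parts = fold_indices(100, nfolds=5, seed=12345)
val = get_fold(parts, 0)                 # held-out fold
train = get_fold(parts, [1, 2, 3, 4])    # the other four folds
assert len(val) == 20 and len(train) == 80
assert set(val).isdisjoint(train)
```

Because the shuffle happens once with a fixed seed before partitioning, every fold pair (train, val) covers the whole dataset without overlap, which is the invariant cross validation depends on.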
Utilities
monai.apps.check_hash(filepath, val=None, hash_type='md5')
Verify the hash signature of the specified file.
- Parameters
  - filepath (str) – path of the source file whose hash value is verified.
  - val (Optional[str]) – expected hash value of the file.
  - hash_type (str) – 'md5' or 'sha1', defaults to 'md5'.
- Return type
  bool
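The digest comparison behind check_hash can be sketched with only the standard library. check_hash_sketch below is a hypothetical stand-in; details such as treating a missing expected value as a pass and lower-casing the expected digest are assumptions of this sketch, not guarantees about the MONAI implementation.

```python
import hashlib
import tempfile

def check_hash_sketch(filepath, val=None, hash_type="md5"):
    # Compute the file digest in chunks and compare it with the
    # expected value; with no expected value, the check passes.
    hasher = hashlib.new(hash_type)
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hasher.update(chunk)
    return val is None or hasher.hexdigest() == val.lower()

# Demo on a temporary file with a known md5 digest.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
    path = tmp.name

md5_hello = "5eb63bbbe01eeed093cb22bb8f5acdc3"  # md5 of b"hello world"
print(check_hash_sketch(path, md5_hello))   # True
print(check_hash_sketch(path, "0" * 32))    # False
```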
monai.apps.download_url(url, filepath, hash_val=None, hash_type='md5')
Download a file from the specified URL link, with progress bar and hash check support.
- Parameters
  - url (str) – source URL link to download the file from.
  - filepath (str) – target filepath to save the downloaded file.
  - hash_val (Optional[str]) – expected hash value to validate the downloaded file. If None, skip hash validation.
  - hash_type (str) – 'md5' or 'sha1', defaults to 'md5'.
- Raises
  - RuntimeError – When the hash validation of the existing filepath file fails.
  - RuntimeError – When a network issue or denied permission prevents the file download from url to filepath.
  - URLError – See urllib.request.urlretrieve.
  - HTTPError – See urllib.request.urlretrieve.
  - ContentTooShortError – See urllib.request.urlretrieve.
  - IOError – See urllib.request.urlretrieve.
  - RuntimeError – When the hash validation of the file downloaded from url fails.
- Return type
  None
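The skip-if-valid, download, then verify flow described by the Raises list can be sketched with the standard library. download_url_sketch is a hypothetical helper, and a file:// URL stands in for a remote link so the sketch runs offline; the real function additionally shows a progress bar.

```python
import hashlib
import os
import pathlib
import tempfile
import urllib.request

def download_url_sketch(url, filepath, hash_val=None, hash_type="md5"):
    def file_hash(path):
        h = hashlib.new(hash_type)
        h.update(pathlib.Path(path).read_bytes())
        return h.hexdigest()

    # If the target already exists, it must match the expected hash;
    # a valid existing file skips the download entirely.
    if os.path.exists(filepath):
        if hash_val is not None and file_hash(filepath) != hash_val:
            raise RuntimeError(f"hash check of existing file failed: {filepath}")
        return
    urllib.request.urlretrieve(url, filepath)
    if hash_val is not None and file_hash(filepath) != hash_val:
        raise RuntimeError(f"hash check of downloaded file failed: {filepath}")

# Demo with a local file:// URL so no network access is needed.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".bin")
src.write(b"payload")
src.close()
expected = hashlib.md5(b"payload").hexdigest()
dst = src.name + ".copy"
download_url_sketch(pathlib.Path(src.name).as_uri(), dst, hash_val=expected)
assert pathlib.Path(dst).read_bytes() == b"payload"
```

The skip-on-existing-valid-file behavior is what makes repeated dataset setup idempotent: rerunning a script does not re-download an archive that already verified.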
monai.apps.extractall(filepath, output_dir, hash_val=None, hash_type='md5')
Extract a file to the output directory. Expected file types are: zip, tar.gz, and tar.
- Parameters
  - filepath (str) – the file path of the compressed file.
  - output_dir (str) – target directory to save the extracted files.
  - hash_val (Optional[str]) – expected hash value to validate the compressed file. If None, skip hash validation.
  - hash_type (str) – 'md5' or 'sha1', defaults to 'md5'.
- Raises
  - RuntimeError – When the hash validation of the filepath compressed file fails.
  - ValueError – When the filepath file extension is not one of ["zip", "tar.gz", "tar"].
- Return type
  None
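The extension dispatch that the ValueError above implies can be sketched with the standard library zipfile and tarfile modules. extractall_sketch is a hypothetical helper (it omits the hash validation step), and the demo builds a tiny tar.gz so it runs self-contained.

```python
import os
import tarfile
import tempfile
import zipfile

def extractall_sketch(filepath, output_dir):
    # Dispatch on the file extension, as documented: zip, tar.gz, or tar.
    if filepath.endswith(".zip"):
        with zipfile.ZipFile(filepath, "r") as zf:
            zf.extractall(output_dir)
    elif filepath.endswith((".tar.gz", ".tar")):
        # mode "r" lets tarfile auto-detect gzip compression.
        with tarfile.open(filepath, "r") as tf:
            tf.extractall(output_dir)
    else:
        raise ValueError('unsupported file extension, expected one of ["zip", "tar.gz", "tar"].')

# Demo: build a small tar.gz and extract it.
workdir = tempfile.mkdtemp()
member = os.path.join(workdir, "hello.txt")
with open(member, "w") as f:
    f.write("hi")
archive = os.path.join(workdir, "demo.tar.gz")
with tarfile.open(archive, "w:gz") as tf:
    tf.add(member, arcname="hello.txt")
out = os.path.join(workdir, "out")
extractall_sketch(archive, out)
assert open(os.path.join(out, "hello.txt")).read() == "hi"
```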
monai.apps.download_and_extract(url, filepath, output_dir, hash_val=None, hash_type='md5')
Download a file from a URL and extract it to the output directory.
- Parameters
  - url (str) – source URL link to download the file from.
  - filepath (str) – the file path to save the downloaded compressed file to.
  - output_dir (str) – target directory to save the extracted files.
  - hash_val (Optional[str]) – expected hash value to validate the downloaded file. If None, skip hash validation.
  - hash_type (str) – 'md5' or 'sha1', defaults to 'md5'.
- Return type
  None
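download_and_extract chains the download and extraction steps. A minimal offline sketch, under the assumptions of a hypothetical download_and_extract_sketch helper, a file:// URL standing in for a remote resource, and a tar.gz archive (the real function also handles zip and hash validation):

```python
import os
import pathlib
import tarfile
import tempfile
import urllib.request

def download_and_extract_sketch(url, filepath, output_dir):
    # Fetch the archive to filepath, then unpack it into output_dir.
    urllib.request.urlretrieve(url, filepath)
    with tarfile.open(filepath, "r") as tf:
        tf.extractall(output_dir)

workdir = tempfile.mkdtemp()
# Build a tiny tar.gz to stand in for a remote resource.
src_txt = os.path.join(workdir, "data.txt")
with open(src_txt, "w") as f:
    f.write("42")
archive = os.path.join(workdir, "remote.tar.gz")
with tarfile.open(archive, "w:gz") as tf:
    tf.add(src_txt, arcname="data.txt")

local_copy = os.path.join(workdir, "downloaded.tar.gz")
out_dir = os.path.join(workdir, "extracted")
download_and_extract_sketch(pathlib.Path(archive).as_uri(), local_copy, out_dir)
assert open(os.path.join(out_dir, "data.txt")).read() == "42"
```

This is the convenience wrapper the dataset classes above rely on when download=True: fetch once to filepath, verify, and unpack next to the cache directory.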