Optimizers

Novograd

class monai.optimizers.Novograd(params, lr=0.001, betas=(0.9, 0.98), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False)[source]

Novograd optimizer, based on "Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks". The code is adapted from the implementations in Jasper for PyTorch and OpenSeq2Seq.
- Parameters
  - params (Iterable) – iterable of parameters to optimize or dicts defining parameter groups.
  - lr (float) – learning rate. Defaults to 1e-3.
  - betas (Tuple[float, float]) – coefficients used for computing running averages of the gradient and its square. Defaults to (0.9, 0.98).
  - eps (float) – term added to the denominator to improve numerical stability. Defaults to 1e-8.
  - weight_decay (float) – weight decay (L2 penalty). Defaults to 0.
  - grad_averaging (bool) – gradient averaging. Defaults to False.
  - amsgrad (bool) – whether to use the AMSGrad variant of this algorithm, from the paper "On the Convergence of Adam and Beyond". Defaults to False.
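The distinguishing idea of Novograd is that the second-moment estimate is a single scalar per layer, computed from the squared norm of that layer's whole gradient, rather than Adam's per-element estimate. The following is a minimal pure-Python sketch of that update rule under the default hyperparameters, not MONAI's actual implementation (the `grad_averaging` and `amsgrad` options are omitted, and parameters/gradients are plain lists of floats for illustration):

```python
import math

def novograd_step(params, grads, state, lr=1e-3, betas=(0.9, 0.98),
                  eps=1e-8, weight_decay=0.0):
    """One illustrative Novograd update over per-layer parameter lists.

    params: list of layers, each a list of floats.
    grads:  gradients with the same nesting as params.
    state:  dict carrying per-layer moments across steps (starts empty).
    """
    beta1, beta2 = betas
    for i, (p, g) in enumerate(zip(params, grads)):
        g_norm_sq = sum(x * x for x in g)  # layer-wise ||g||^2, a scalar
        if i not in state:
            # First step: initialize v from the current gradient norm.
            state[i] = {"v": g_norm_sq, "m": [0.0] * len(p)}
        st = state[i]
        st["v"] = beta2 * st["v"] + (1 - beta2) * g_norm_sq
        denom = math.sqrt(st["v"]) + eps
        for j in range(len(p)):
            # Gradient normalized by the layer-wise second moment,
            # plus decoupled-style L2 on the parameter itself.
            d = g[j] / denom + weight_decay * p[j]
            st["m"][j] = beta1 * st["m"][j] + d
            p[j] -= lr * st["m"][j]
    return params
```

Because `v` is one scalar per layer, Novograd's optimizer state is much smaller than Adam's, which keeps a second-moment value per parameter element.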
Generate parameter groups

monai.optimizers.generate_param_groups(network, layer_matches, match_types, lr_values, include_others=True)[source]

Utility function to generate parameter groups with different LR values for an optimizer. The output parameter groups have the same order as the layer_matches functions.
- Parameters
  - network (Module) – source network to generate parameter groups from.
  - layer_matches (Sequence[Callable]) – a list of callable functions to select or filter out network layer groups. For the "select" type, the input is the network; for the "filter" type, the input is every item of network.named_parameters().
  - match_types (Sequence[str]) – a list of tags identifying the matching type of each layer_matches function; can be "select" or "filter".
  - lr_values (Sequence[float]) – a list of LR values corresponding to the layer_matches functions.
  - include_others (bool) – whether to include the remaining layers as the last group. Defaults to True.
It’s mainly used to set different LR values for different network elements, for example:
net = Unet(dimensions=3, in_channels=1, out_channels=3, channels=[2, 2, 2], strides=[1, 1, 1])
print(net)  # print out network components to select expected items
print(net.named_parameters())  # print out all the named parameters to filter out expected items
params = generate_param_groups(
    network=net,
    layer_matches=[lambda x: x.model[-1], lambda x: "conv.weight" in x],
    match_types=["select", "filter"],
    lr_values=[1e-2, 1e-3],
)
# the groups will be a list of dictionaries:
# [{'params': <generator object Module.parameters at 0x7f9090a70bf8>, 'lr': 0.01},
#  {'params': <filter object at 0x7f9088fd0dd8>, 'lr': 0.001},
#  {'params': <filter object at 0x7f9088fd0da0>}]
optimizer = torch.optim.Adam(params, 1e-4)
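The "select" versus "filter" dispatch can be sketched without torch, using a plain dict of name → parameter as a stand-in for a network. This is a simplified illustration, not MONAI's code: the function name `generate_param_groups_sketch`, the dict-based network, and having a "select" callable return parameter names are all assumptions made to keep the sketch self-contained:

```python
def generate_param_groups_sketch(named_params, layer_matches, match_types,
                                 lr_values, include_others=True):
    """Illustrative grouping logic over a dict of name -> parameter."""
    groups, used = [], set()
    for match, mtype, lr in zip(layer_matches, match_types, lr_values):
        if mtype == "select":
            # "select": the callable sees the whole network and returns
            # the names of the layers it chose (simplified here).
            names = match(named_params)
        elif mtype == "filter":
            # "filter": the callable is applied to each named parameter.
            names = [n for n in named_params if match(n)]
        else:
            raise ValueError(f"unsupported match type: {mtype}")
        used.update(names)
        groups.append({"params": [named_params[n] for n in names], "lr": lr})
    if include_others:
        # Remaining parameters form a last group with no explicit LR,
        # so the optimizer's default LR applies to them.
        rest = [p for n, p in named_params.items() if n not in used]
        groups.append({"params": rest})
    return groups
```

Parameters left in the final catch-all group fall back to the LR passed directly to the optimizer constructor (1e-4 in the example above), which is why `include_others=True` is the safe default: no parameter is silently excluded from optimization.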