class monai.optimizers.Novograd(params, lr=0.001, betas=(0.9, 0.98), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False)[source]

The Novograd optimizer, based on Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks. The code is adapted from the implementations in Jasper for PyTorch and OpenSeq2Seq.

  • params (Iterable) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – learning rate. Defaults to 1e-3.

  • betas (Tuple[float, float]) – coefficients used for computing running averages of gradient and its square. Defaults to (0.9, 0.98).

  • eps (float) – term added to the denominator to improve numerical stability. Defaults to 1e-8.

  • weight_decay (float) – weight decay (L2 penalty). Defaults to 0.

  • grad_averaging (bool) – gradient averaging. Defaults to False.

  • amsgrad (bool) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond. Defaults to False.


step(closure=None)[source]

Performs a single optimization step.


closure (Optional[Callable]) – A closure that reevaluates the model and returns the loss. Defaults to None.

Generate parameter groups

monai.optimizers.generate_param_groups(network, layer_matches, match_types, lr_values, include_others=True)[source]

Utility function to generate parameter groups with different learning rate (LR) values for an optimizer. The output parameter groups have the same order as the layer_matches functions.

  • network (Module) – source network to generate parameter groups from.

  • layer_matches (Sequence[Callable]) – a list of callable functions to select or filter out network layer groups. For the “select” type, the input is the network itself and the group’s parameters are select_func(network).parameters(). For the “filter” type, the input is every item of network.named_parameters() and the group’s parameters are map(lambda x: x[1], filter(filter_func, network.named_parameters())).

  • match_types (Sequence[str]) – a list of tags to identify the matching type corresponding to the layer_matches functions, can be “select” or “filter”.

  • lr_values (Sequence[float]) – a list of LR values corresponding to the layer_matches functions.

  • include_others (bool) – whether to include the remaining layers as the last parameter group. Defaults to True.

It’s mainly used to set different LR values for different network elements, for example:

import torch
from monai.networks.nets import UNet
from monai.optimizers import generate_param_groups

net = UNet(spatial_dims=3, in_channels=1, out_channels=3, channels=[2, 2, 2], strides=[1, 1, 1])
print(net)  # print out network components to select expected items
print(net.named_parameters())  # print out all the named parameters to filter out expected items
params = generate_param_groups(
    network=net,
    layer_matches=[lambda x: x.model[0], lambda x: "2.0.conv" in x[0]],
    match_types=["select", "filter"],
    lr_values=[1e-2, 1e-3],
)
# the groups will be a list of dictionaries:
# [{'params': <generator object Module.parameters at 0x7f9090a70bf8>, 'lr': 0.01},
#  {'params': <filter object at 0x7f9088fd0dd8>, 'lr': 0.001},
#  {'params': <filter object at 0x7f9088fd0da0>}]
optimizer = torch.optim.Adam(params, 1e-4)