class monai.optimizers.Novograd(params, lr=0.001, betas=(0.9, 0.98), eps=1e-08, weight_decay=0, grad_averaging=False, amsgrad=False)[source]

Novograd optimizer, based on Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks. The code is adapted from the implementations in Jasper for PyTorch and in OpenSeq2Seq.

Parameters

  • params (Iterable) – iterable of parameters to optimize or dicts defining parameter groups.

  • lr (float) – learning rate. Defaults to 1e-3.

  • betas (Tuple[float, float]) – coefficients used for computing running averages of gradient and its square. Defaults to (0.9, 0.98).

  • eps (float) – term added to the denominator to improve numerical stability. Defaults to 1e-8.

  • weight_decay (float) – weight decay (L2 penalty). Defaults to 0.

  • grad_averaging (bool) – gradient averaging. Defaults to False.

  • amsgrad (bool) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond. Defaults to False.
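To make the parameters above concrete, here is a minimal pure-Python sketch of the Novograd update rule for one step. It is an illustration of the algorithm's key idea (a *scalar* second moment per layer, computed from the squared gradient norm, rather than an element-wise one as in Adam), not the library implementation; the function name `novograd_step`, the `state` dictionary layout, and the handling of the first step are assumptions for this sketch, and the `amsgrad` branch is omitted.

```python
import math

def novograd_step(params, grads, state, lr=1e-3, betas=(0.9, 0.98),
                  eps=1e-8, weight_decay=0.0, grad_averaging=False):
    """Apply one Novograd update to per-layer weight/gradient lists.

    Sketch only: each layer keeps a scalar second moment v (a running
    average of the squared gradient norm), which is what makes the
    adaptive moments "layer-wise"."""
    beta1, beta2 = betas
    for i, (w, g) in enumerate(zip(params, grads)):
        g_sq = sum(x * x for x in g)          # squared L2 norm of this layer's gradient
        st = state.setdefault(i, {})
        if "v" not in st:
            st["v"] = g_sq                    # first step: v0 = ||g||^2 (assumed init)
            st["m"] = [0.0] * len(w)
        else:
            st["v"] = beta2 * st["v"] + (1 - beta2) * g_sq
        denom = math.sqrt(st["v"]) + eps      # eps guards against division by zero
        # Gradient normalized by the layer norm, plus decoupled-style L2 term.
        update = [x / denom + weight_decay * wi for x, wi in zip(g, w)]
        if grad_averaging:
            update = [(1 - beta1) * u for u in update]
        st["m"] = [beta1 * m + u for m, u in zip(st["m"], update)]
        for j, m in enumerate(st["m"]):
            w[j] -= lr * m
    return params
```

Because `v` is a single scalar per layer, Novograd's optimizer state is much smaller than Adam's, which keeps a full tensor of second moments.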


step(closure=None)[source]

Performs a single optimization step.

Parameters

  • closure (Optional[Callable]) – a closure that reevaluates the model and returns the loss. Defaults to None.
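The closure argument follows the standard PyTorch optimizer contract: if supplied, it is called inside `step` to recompute the loss (and typically the gradients) before the update is applied, and the loss is returned. The toy optimizer below is a hypothetical stand-in written to illustrate that calling pattern only; it is not Novograd's update rule, and the names `ToyOptimizer` and `sgd`-style update are assumptions for this sketch.

```python
class ToyOptimizer:
    """Hypothetical minimal optimizer illustrating the step(closure)
    contract shared by PyTorch-style optimizers such as Novograd."""

    def __init__(self, params, lr=0.1):
        self.params = params               # list of scalar weights
        self.grads = [0.0] * len(params)   # gradients written by the closure
        self.lr = lr

    def step(self, closure=None):
        loss = None
        if closure is not None:
            loss = closure()               # re-evaluate model, refresh gradients
        for i, g in enumerate(self.grads):
            self.params[i] -= self.lr * g  # plain gradient-descent update (sketch)
        return loss                        # the closure's loss is returned to the caller


params = [5.0]
opt = ToyOptimizer(params)

def closure():
    # loss = w^2, so the gradient is 2w; store it where step() reads it
    opt.grads[0] = 2.0 * opt.params[0]
    return opt.params[0] ** 2

for _ in range(50):
    loss = opt.step(closure)
```

Closures are mainly needed by algorithms that evaluate the loss more than once per step (e.g. LBFGS); for ordinary training loops, passing `None` and calling `step()` after `loss.backward()` is the common pattern.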