Adamax
class Adamax(learningRate: Float, beta1: Float, beta2: Float, epsilon: Float, clipGradient: ClipGradientAction) : Optimizer
Adamax optimizer from Section 7 of the Adam paper.
Updates the variable according to the following formula:
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- max(beta2 * v_{t-1}, abs(g))
variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
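The update above is applied element-wise. As a purely illustrative Kotlin sketch (the helper adamaxStep, its arrays, and its default hyperparameters are hypothetical and not part of the KotlinDL API):

import kotlin.math.abs
import kotlin.math.max
import kotlin.math.pow

// One Adamax step over flat parameter/gradient arrays; m and v hold the
// running first moment and the infinity-norm-based second moment.
fun adamaxStep(
    variable: FloatArray, gradient: FloatArray,
    m: FloatArray, v: FloatArray, t: Int,
    learningRate: Float = 0.001f, beta1: Float = 0.9f,
    beta2: Float = 0.999f, epsilon: Float = 1e-7f
) {
    for (i in variable.indices) {
        m[i] = beta1 * m[i] + (1 - beta1) * gradient[i]     // m_t
        v[i] = max(beta2 * v[i], abs(gradient[i]))          // v_t (infinity norm)
        variable[i] -= learningRate / (1 - beta1.pow(t)) * m[i] / (v[i] + epsilon)
    }
}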
It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper.
NOTE: This optimizer works on CPU only. It has a known bug on GPU that produces NaN values instead of gradients: https://github.com/tensorflow/tensorflow/issues/26256
It is recommended to leave the parameters of this optimizer at their default values.
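In KotlinDL the optimizer is passed to a model at compile time. A minimal usage sketch with default parameters, assuming the Sequential model API; the layers, loss, and metric chosen here are illustrative, and import paths may differ between KotlinDL versions:

import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.layer.core.Dense
import org.jetbrains.kotlinx.dl.api.core.layer.core.Input
import org.jetbrains.kotlinx.dl.api.core.loss.Losses
import org.jetbrains.kotlinx.dl.api.core.metric.Metrics
import org.jetbrains.kotlinx.dl.api.core.optimizer.Adamax

val model = Sequential.of(
    Input(784),
    Dense(64),
    Dense(10)
)

model.compile(
    optimizer = Adamax(),   // defaults are recommended, as noted above
    loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
    metric = Metrics.ACCURACY
)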
Constructors
Properties
clipGradient
Gradient clipping strategy, given as a subclass of ClipGradientAction (see the sketch at the end of this section).
optimizerName
Name of the optimizer.
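A hedged sketch of overriding the default clipping strategy via clipGradient; ClipGradientByValue is assumed to be one of the ClipGradientAction subclasses available in your KotlinDL version, and the clip value is arbitrary:

import org.jetbrains.kotlinx.dl.api.core.optimizer.Adamax
import org.jetbrains.kotlinx.dl.api.core.optimizer.ClipGradientByValue

// Keep the recommended defaults for the moment estimates, but clip each
// gradient component to the range [-0.1, 0.1] before the update.
val optimizer = Adamax(
    clipGradient = ClipGradientByValue(0.1f)
)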