Adamax
class Adamax(learningRate: Float, beta1: Float, beta2: Float, epsilon: Float, clipGradient: ClipGradientAction) : Optimizer
Adamax optimizer from Section 7 of the Adam paper.
Updates the variable according to the following formulas:
m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- max(beta2 * v_{t-1}, abs(g))
variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper.
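The following is a minimal sketch of a single Adamax update step for one variable, written directly from the formulas above. It is illustrative only and is not the library's internal implementation (which builds the update into the underlying TensorFlow graph); the default hyperparameter values shown are the usual Adam-family defaults and are assumed here, not taken from this class.

```kotlin
import kotlin.math.abs
import kotlin.math.max
import kotlin.math.pow

// One Adamax update step, following:
//   m_t <- beta1 * m_{t-1} + (1 - beta1) * g
//   v_t <- max(beta2 * v_{t-1}, abs(g))
//   variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)
fun adamaxStep(
    variable: FloatArray,
    gradient: FloatArray,
    m: FloatArray,              // first-moment estimate m_{t-1}, updated in place
    v: FloatArray,              // infinity-norm estimate v_{t-1}, updated in place
    t: Int,                     // time step, starting at 1
    learningRate: Float = 0.001f,  // assumed defaults, typical for Adam-family optimizers
    beta1: Float = 0.9f,
    beta2: Float = 0.999f,
    epsilon: Float = 1e-7f
) {
    val correctedLr = learningRate / (1 - beta1.pow(t))
    for (i in variable.indices) {
        m[i] = beta1 * m[i] + (1 - beta1) * gradient[i]
        v[i] = max(beta2 * v[i], abs(gradient[i]))
        variable[i] -= correctedLr * m[i] / (v[i] + epsilon)
    }
}
```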
NOTE: This optimizer works on CPU only. It has a known bug on GPU: NaN values appear instead of gradient values (see https://github.com/tensorflow/tensorflow/issues/26256).
It is recommended to leave the parameters of this optimizer at their default values.
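Below is a hedged usage sketch showing how the optimizer might be constructed with the constructor parameters listed in the signature above. The package path org.jetbrains.kotlinx.dl.api.core.optimizer, the NoClipGradient no-op clipping strategy, and the numeric values are assumptions based on a typical KotlinDL setup, not guaranteed defaults of this class.

```kotlin
import org.jetbrains.kotlinx.dl.api.core.optimizer.Adamax
import org.jetbrains.kotlinx.dl.api.core.optimizer.NoClipGradient

fun main() {
    // Construct Adamax with explicit hyperparameters; as noted above, it is
    // recommended to keep the defaults. NoClipGradient() is assumed to be the
    // no-op subclass of ClipGradientAction.
    val optimizer = Adamax(
        learningRate = 0.001f,
        beta1 = 0.9f,
        beta2 = 0.999f,
        epsilon = 1e-7f,
        clipGradient = NoClipGradient()
    )

    // The optimizer instance is then passed to the model's compile step,
    // e.g. model.compile(optimizer = optimizer, ...).
    println(optimizer)
}
```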
Constructors
Properties
clipGradient
Gradient clipping strategy, defined as a subclass of ClipGradientAction.
optimizerName