Adamax

class Adamax(learningRate: Float, beta1: Float, beta2: Float, epsilon: Float, clipGradient: ClipGradientAction) : Optimizer

Adamax optimizer from Section 7 of the Adam paper.

Updates the variable according to the following update rule:

m_t <- beta1 * m_{t-1} + (1 - beta1) * g
v_t <- max(beta2 * v_{t-1}, abs(g))
variable <- variable - learning_rate / (1 - beta1^t) * m_t / (v_t + epsilon)

It is a variant of Adam based on the infinity norm. Default parameters follow those provided in the paper.
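The update rule can be traced in plain Kotlin. The sketch below applies the three formulas above to a single scalar parameter minimizing f(x) = x^2; it is illustrative only and is not the library's internal implementation.

fun main() {
    // Default hyperparameters, matching the constructor defaults below.
    val learningRate = 0.001f
    val beta1 = 0.9f
    val beta2 = 0.999f
    val epsilon = 1e-7f

    var variable = 1.0f   // the parameter being optimized
    var m = 0.0f          // first-moment estimate m_t
    var v = 0.0f          // exponentially weighted infinity norm v_t

    for (t in 1..1000) {
        val g = 2 * variable                        // gradient of f(x) = x^2
        m = beta1 * m + (1 - beta1) * g             // m_t <- beta1 * m_{t-1} + (1 - beta1) * g
        v = kotlin.math.max(beta2 * v, kotlin.math.abs(g))   // v_t <- max(beta2 * v_{t-1}, abs(g))
        variable -= learningRate / (1 - kotlin.math.pow(beta1.toDouble(), t.toDouble()).toFloat()) * m / (v + epsilon)
    }
    // The variable moves toward the minimum at 0.
    println("variable after 1000 steps: $variable")
}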

NOTE: This optimizer works on CPU only. There is a known bug on GPU where NaN values appear instead of gradients: https://github.com/tensorflow/tensorflow/issues/26256

It is recommended to leave the parameters of this optimizer at their default values.
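A typical way to use the optimizer is to construct it with its defaults and pass it to a model's compile step. The sketch below assumes a KotlinDL Sequential model and the usual Losses and Metrics enums; the layer sizes are arbitrary, and the package paths and compile signature follow common KotlinDL usage and may differ between library versions.

import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.layer.core.Dense
import org.jetbrains.kotlinx.dl.api.core.layer.core.Input
import org.jetbrains.kotlinx.dl.api.core.loss.Losses
import org.jetbrains.kotlinx.dl.api.core.metric.Metrics
import org.jetbrains.kotlinx.dl.api.core.optimizer.Adamax

fun main() {
    // A small model; the layer sizes only illustrate the wiring.
    val model = Sequential.of(
        Input(4),
        Dense(16),
        Dense(3)
    )

    // Leave the optimizer parameters at their defaults, as recommended above.
    model.compile(
        optimizer = Adamax(),
        loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
        metric = Metrics.ACCURACY
    )
}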

Constructors

Adamax
fun Adamax(learningRate: Float = 0.001f, beta1: Float = 0.9f, beta2: Float = 0.999f, epsilon: Float = 1e-07f, clipGradient: ClipGradientAction = NoClipGradient())

Properties

clipGradient
val clipGradient: ClipGradientAction

Strategy for gradient clipping, as a subclass of ClipGradientAction.
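A clipping strategy other than the default NoClipGradient can be supplied through the constructor. The ClipGradientByValue class named below is assumed to be one of the library's ClipGradientAction subclasses; check the available subclasses in your library version.

import org.jetbrains.kotlinx.dl.api.core.optimizer.Adamax
import org.jetbrains.kotlinx.dl.api.core.optimizer.ClipGradientByValue

// Keep the default hyperparameters but clip each gradient component to [-1, 1].
val clippedAdamax = Adamax(clipGradient = ClipGradientByValue(1.0f))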

optimizerName
open override val optimizerName: String

Returns the optimizer name.