AdaGradDA

class AdaGradDA(learningRate: Float, initialAccumulatorValue: Float, l1Strength: Float, l2Strength: Float, clipGradient: ClipGradientAction) : Optimizer

Adagrad Dual Averaging algorithm for sparse linear models.

This optimizer handles regularization of features that are unseen in a mini-batch by updating them, when they are next seen, with a closed-form update rule that is equivalent to having updated them on every mini-batch.
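As a sketch (following the standard AdaGrad Dual Averaging formulation; the exact form used by the backing kernel may differ), the closed-form update for weight $i$ at step $t$ can be written as

$$w_{t+1,i} = -\,\frac{\eta\,\operatorname{sign}(\bar g_{t,i})\,\max\big(|\bar g_{t,i}| - t\lambda_1,\ 0\big)}{t\,\eta\,\lambda_2 + \sqrt{\sum_{s=1}^{t} g_{s,i}^2}}$$

where $\bar g_{t,i} = \sum_{s=1}^{t} g_{s,i}$ is the accumulated gradient, $\eta$ is the learning rate, and $\lambda_1$, $\lambda_2$ are the L1 and L2 strengths. Because the update depends only on these accumulated sums, a feature that was absent for several mini-batches ends up in the same state as if it had been regularized on each of them.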

AdaGradDA is typically used when a high degree of sparsity is needed in the trained model. This optimizer only guarantees sparsity for linear models. Be careful when using AdaGradDA for deep networks, as it requires careful initialization of the gradient accumulators to train.

It is recommended to leave the parameters of this optimizer at their default values.
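A minimal usage sketch, assuming KotlinDL's Sequential model and compile API (the layer sizes, loss, and metric are placeholders):

import org.jetbrains.kotlinx.dl.api.core.Sequential
import org.jetbrains.kotlinx.dl.api.core.activation.Activations
import org.jetbrains.kotlinx.dl.api.core.layer.core.Dense
import org.jetbrains.kotlinx.dl.api.core.layer.core.Input
import org.jetbrains.kotlinx.dl.api.core.loss.Losses
import org.jetbrains.kotlinx.dl.api.core.metric.Metrics
import org.jetbrains.kotlinx.dl.api.core.optimizer.AdaGradDA

// A small linear model: AdaGradDA only guarantees sparsity for linear models.
val model = Sequential.of(
    Input(784),
    Dense(outputSize = 10, activation = Activations.Linear)
)

model.compile(
    optimizer = AdaGradDA(),  // default hyperparameters, as recommended above
    loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
    metric = Metrics.ACCURACY
)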

Constructors

AdaGradDA
fun AdaGradDA(learningRate: Float = 0.1f, initialAccumulatorValue: Float = 0.01f, l1Strength: Float = 0.01f, l2Strength: Float = 0.01f, clipGradient: ClipGradientAction = NoClipGradient())
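For example, a non-default configuration might look like the following (ClipGradientByValue is assumed to be one of the available ClipGradientAction subclasses):

val optimizer = AdaGradDA(
    learningRate = 0.05f,
    initialAccumulatorValue = 0.01f,
    l1Strength = 0.001f,
    l2Strength = 0.001f,
    clipGradient = ClipGradientByValue(1.0f)  // assumed clipping strategy; NoClipGradient() is the default
)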

Properties

clipGradient
val clipGradient: ClipGradientAction

The gradient clipping strategy, a subclass of ClipGradientAction.

optimizerName
open override val optimizerName: String

Returns the optimizer name.