AdaDelta
class AdaDelta(learningRate: Float, rho: Float, epsilon: Float, clipGradient: ClipGradientAction) : Optimizer
Adadelta optimizer.
Updates the variable according to the following formula:
accum = rho * accum + (1 - rho) * grad.square();
update = (update_accum + epsilon).sqrt() * (accum + epsilon).rsqrt() * grad;
update_accum = rho * update_accum + (1 - rho) * update.square();
var -= update;
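As a minimal, self-contained sketch of the update rule above, the following plain-Kotlin function applies one AdaDelta step to a FloatArray. The function name, array-based signature, and default values are hypothetical, and scaling the final step by learningRate is an assumption about how this class uses its learning rate; the formula above writes the update unscaled.

```kotlin
import kotlin.math.sqrt

// Hypothetical stand-alone sketch of one AdaDelta step over plain arrays;
// it mirrors the formula above and is not this class's actual implementation.
fun adaDeltaStep(
    variable: FloatArray,     // parameters to update (modified in place)
    grad: FloatArray,         // gradient of the loss w.r.t. the parameters
    accum: FloatArray,        // running average of squared gradients
    updateAccum: FloatArray,  // running average of squared updates
    learningRate: Float = 1.0f,  // illustrative value, not a documented default
    rho: Float = 0.95f,          // illustrative value, not a documented default
    epsilon: Float = 1e-7f       // illustrative value, not a documented default
) {
    for (i in variable.indices) {
        // accum = rho * accum + (1 - rho) * grad^2
        accum[i] = rho * accum[i] + (1 - rho) * grad[i] * grad[i]
        // update = sqrt(update_accum + epsilon) / sqrt(accum + epsilon) * grad
        val update = sqrt(updateAccum[i] + epsilon) / sqrt(accum[i] + epsilon) * grad[i]
        // update_accum = rho * update_accum + (1 - rho) * update^2
        updateAccum[i] = rho * updateAccum[i] + (1 - rho) * update * update
        // var -= update, scaled here by learningRate (an assumption; the formula
        // above applies the update unscaled)
        variable[i] -= learningRate * update
    }
}
```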
Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done. Compared to Adagrad, the original version of Adadelta does not require an initial learning rate to be set. In this version, the initial learning rate and decay factor can be set, as in most other Keras optimizers.
It is recommended to leave the parameters of this optimizer at their default values.
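As a hedged usage sketch built only from the constructor signature above: the import paths, the NoClipGradient class, and the specific values are assumptions for illustration, not documented defaults.

```kotlin
// Import paths are assumed and should be checked against the actual library.
import org.jetbrains.kotlinx.dl.api.core.optimizer.AdaDelta
import org.jetbrains.kotlinx.dl.api.core.optimizer.NoClipGradient

// Explicit values are shown only to illustrate the parameters; per the note above,
// the recommended usage is to keep the defaults, i.e. simply AdaDelta().
val optimizer = AdaDelta(
    learningRate = 0.1f,             // initial learning rate
    rho = 0.95f,                     // decay factor for the moving averages
    epsilon = 1e-7f,                 // small constant for numerical stability
    clipGradient = NoClipGradient()  // gradient clipping strategy (here: none; assumed class)
)
```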
See also
Constructors
Properties
clipGradient
Strategy for gradient clipping, given as a subclass of ClipGradientAction.
optimizerName