Adagrad Optimiser
adagrad_optimiser_type
adagrad_optimiser_type(
learning_rate=0.01,
epsilon=1.0e-8,
num_params=...,
regulariser=...,
clip_dict=...,
lr_decay=...
)
The Adaptive Gradient (Adagrad) optimiser adapts the learning rate of each parameter based on that parameter's historical gradients.
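The update rule, in its standard form:
\[ G_t = G_{t-1} + g_t^2, \qquad \theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{G_t} + \epsilon} \, g_t \]
where \(\theta_t\) are the parameters and \(g_t\) is the gradient at step \(t\), \(\eta\) is the learning rate, \(G_t\) accumulates squared gradients, and \(\epsilon\) prevents division by zero. All operations are element-wise; some implementations place \(\epsilon\) inside the square root instead.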
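A minimal sketch of one Adagrad step in plain Python may help make the rule concrete. It is not this library's implementation: the function name `adagrad_step` and the list-based storage are hypothetical, and placing `epsilon` outside the square root is an assumption.

```python
import math


def adagrad_step(params, grads, accum, learning_rate=0.01, epsilon=1.0e-8):
    """Apply one Adagrad update in place and return the parameters.

    accum holds the running sum of squared gradients (G_t),
    one entry per parameter (hypothetical storage layout).
    """
    for i, g in enumerate(grads):
        # Accumulate the squared gradient for this parameter.
        accum[i] += g * g
        # Scale the step by the inverse root of the accumulated sum.
        # Assumption: epsilon is added outside the square root.
        params[i] -= learning_rate * g / (math.sqrt(accum[i]) + epsilon)
    return params


params = [1.0, -2.0]
accum = [0.0, 0.0]  # one accumulator per parameter, initially zero
adagrad_step(params, [0.5, -4.0], accum)
print(params, accum)
```

On the very first step the accumulated sum equals the squared gradient, so each parameter moves by roughly `learning_rate` regardless of the gradient's magnitude.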
Arguments
learning_rate (real): Step size for parameter updates. Default: 0.01.
epsilon (real): Small constant for numerical stability. Default: 1.0e-8.
num_params (integer): Number of parameters to optimise.
regulariser (class(base_regulariser_type)): Regularisation method (e.g., L2 regularisation).
clip_dict (type(clip_type)): Gradient clipping configuration.
lr_decay (class(base_lr_decay_type)): Learning rate decay schedule.
Notes:
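Adagrad performs larger updates for parameters that rarely receive gradients and smaller updates for parameters that receive them frequently, which makes it well suited to sparse data. Because the accumulated squared gradients only ever grow, the effective learning rate decreases monotonically and can become too small for training to keep making progress, so optimisation may stall before reaching a good solution.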
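The effect on frequent versus infrequent parameters can be illustrated with a small numerical sketch. The setup is hypothetical: two parameters with unit gradients, one updated on every step and one only on every 50th step, using the default `learning_rate` and `epsilon` listed above and assuming `epsilon` is added outside the square root.

```python
import math

learning_rate = 0.01
epsilon = 1.0e-8

# Accumulated squared gradients for a frequently updated parameter
# (index 0) and a rarely updated one (index 1).
accum = [0.0, 0.0]

for step in range(1, 101):
    # The second parameter only receives a gradient every 50th step.
    grads = [1.0, 1.0 if step % 50 == 0 else 0.0]
    for i, g in enumerate(grads):
        accum[i] += g * g

# Effective per-parameter step size after 100 steps.
effective_lr = [learning_rate / (math.sqrt(a) + epsilon) for a in accum]
print(effective_lr)
```

After 100 steps the frequently updated parameter has an effective learning rate of about 0.001, while the rarely updated one retains about 0.007. Neither value can increase again, which is the source of the stalling behaviour noted above.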