Learning Rate Decay¶
Gradually reduces the learning rate during training.
Available Types¶
base_lr_decay_type: No decay (constant learning rate)
exp_lr_decay_type: Exponential decay
step_lr_decay_type: Step-wise decay at fixed intervals
inv_lr_decay_type: Inverse time decay
Exponential Decay¶
exp_lr_decay_type
exp_lr_decay_type(decay_rate=...)
Smooth exponential decay: \(\eta_t = \eta_0 \cdot e^{-t \cdot r}\)
Arguments¶
decay_rate (real, optional): Decay rate \(r\) in the exponent of the formula above. Default: 0.9.
Usage¶
use athena
type(exp_lr_decay_type) :: lr_schedule
lr_schedule = exp_lr_decay_type(decay_rate=0.01)
call network%compile( &
     optimiser_type=adam_optimiser_type( &
          learning_rate=0.001, &
          lr_decay=lr_schedule), &
     loss_method="categorical_crossentropy")
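To get a feel for the curve this schedule produces, the formula can be evaluated on its own. The following is a minimal standalone sketch that does not call athena; it only tabulates \(\eta_0 \cdot e^{-t \cdot r}\) for the values used in the usage example, and it assumes \(t\) counts epochs starting from zero. The program and variable names are illustrative.

```fortran
program exp_decay_sketch
  implicit none
  ! Illustrative values only; they mirror the usage example.
  real, parameter :: eta0 = 0.001   ! initial learning rate
  real, parameter :: r = 0.01       ! decay_rate
  integer :: t                      ! step counter (assumed to be the epoch index)
  real :: eta

  do t = 0, 100, 25
     ! eta_t = eta_0 * exp(-t * r)
     eta = eta0 * exp(-real(t) * r)
     print '(a, i4, a, es11.4)', 't = ', t, '   eta = ', eta
  end do
end program exp_decay_sketch
```

With decay_rate=0.01 the learning rate falls to \(e^{-1} \approx 0.368\) of its initial value at \(t = 100\).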
Step Decay¶
step_lr_decay_type
step_lr_decay_type(decay_rate=..., decay_steps=...)
Discrete drops every \(s\) epochs, where \(s\) is decay_steps and \(r\) is decay_rate: \(\eta_t = \eta_0 \cdot r^{\lfloor t / s \rfloor}\)
Arguments¶
decay_rate (real, optional): Multiplicative decay factor. Default: 0.1.
decay_steps (integer, optional): Number of epochs between decays. Default: 100.
Usage¶
type(step_lr_decay_type) :: lr_schedule
! Reduce by half every 10 epochs
lr_schedule = step_lr_decay_type( &
     decay_rate=0.5, &
     decay_steps=10)
call network%compile( &
     optimiser_type=sgd_optimiser_type( &
          learning_rate=0.1, &
          lr_decay=lr_schedule), &
     loss_method="mse")
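The staircase shape of this schedule can be checked by evaluating the formula directly. The sketch below is standalone Fortran that does not use athena; it assumes \(t\) is a zero-based epoch index and uses the same values as the usage example. The program and variable names are illustrative.

```fortran
program step_decay_sketch
  implicit none
  ! Illustrative values only; they mirror the usage example.
  real, parameter :: eta0 = 0.1     ! initial learning rate
  real, parameter :: r = 0.5        ! decay_rate
  integer, parameter :: s = 10      ! decay_steps
  integer :: t                      ! epoch index (assumed to start at 0)
  real :: eta

  do t = 0, 30, 5
     ! Integer division t / s truncates, which equals floor(t / s) for t >= 0.
     eta = eta0 * r ** (t / s)
     print '(a, i3, a, f8.5)', 'epoch ', t, '   eta = ', eta
  end do
end program step_decay_sketch
```

Under these assumptions the rate stays at 0.1 for epochs 0 to 9, drops to 0.05 for epochs 10 to 19, and to 0.025 for epochs 20 to 29.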
Inverse Time Decay¶
inv_lr_decay_type
inv_lr_decay_type(decay_rate=..., decay_power=...)
Inverse time decay: \(\eta_t = \frac{\eta_0}{(1 + r \cdot t)^p}\)
Arguments¶
decay_rate (real, optional): Decay rate coefficient. Default: 0.001.
decay_power (real, optional): Exponent for decay. Default: 1.0.
Usage¶
type(inv_lr_decay_type) :: lr_schedule
lr_schedule = inv_lr_decay_type( &
     decay_rate=0.001, &
     decay_power=1.0)
call network%compile( &
     optimiser_type=adam_optimiser_type( &
          learning_rate=0.01, &
          lr_decay=lr_schedule), &
     loss_method="binary_crossentropy")
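Inverse time decay falls off far more slowly than the exponential schedule, which is easiest to see by tabulating the formula. The following standalone sketch does not call athena; it assumes \(t\) is a zero-based step counter and reuses the values from the usage example. The program and variable names are illustrative.

```fortran
program inv_decay_sketch
  implicit none
  ! Illustrative values only; they mirror the usage example.
  real, parameter :: eta0 = 0.01    ! initial learning rate
  real, parameter :: r = 0.001      ! decay_rate
  real, parameter :: p = 1.0        ! decay_power
  integer :: t                      ! step counter (assumed to start at 0)
  real :: eta

  do t = 0, 2000, 500
     ! eta_t = eta_0 / (1 + r * t)**p ; ** binds tighter than /
     eta = eta0 / (1.0 + r * real(t)) ** p
     print '(a, i5, a, es11.4)', 't = ', t, '   eta = ', eta
  end do
end program inv_decay_sketch
```

With these values the learning rate is halved to 0.005 at \(t = 1000\), since \(1 + r \cdot t = 2\) there.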
Typical Values¶
Exponential Decay¶
Small datasets (<1k samples): decay_rate = 0.001 to 0.01
Medium datasets (1k-100k): decay_rate = 0.01 to 0.05
Large datasets (>100k): decay_rate = 0.05 to 0.1
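One rough way to read these ranges, assuming the schedule applies \(\eta_t = \eta_0 \cdot e^{-t \cdot r}\) exactly as written, is through the number of steps needed to halve the learning rate. Setting \(e^{-t \cdot r} = 1/2\) gives:

\[
t_{1/2} = \frac{\ln 2}{r} \approx \frac{0.693}{r}
\]

So decay_rate = 0.001 halves the rate after about 693 steps of \(t\), 0.01 after about 69, and 0.1 after about 7.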
Step Decay¶
Conservative: decay_rate = 0.5, decay_steps = 20-50
Aggressive: decay_rate = 0.1, decay_steps = 10-30
Fine-tuning: decay_rate = 0.3, decay_steps = 5-10
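To compare these settings, it can help to work out the total reduction after a fixed training length \(T\), which follows directly from \(\eta_t = \eta_0 \cdot r^{\lfloor t / s \rfloor}\). The choice of \(T = 100\) epochs below is only an illustration:

\[
\frac{\eta_T}{\eta_0} = r^{\lfloor T / s \rfloor}, \qquad
0.5^{\lfloor 100 / 20 \rfloor} = 0.5^{5} \approx 0.031, \qquad
0.1^{\lfloor 100 / 30 \rfloor} = 0.1^{3} = 10^{-3}
\]

The conservative setting (decay_rate = 0.5, decay_steps = 20) therefore leaves about 3% of the initial rate after 100 epochs, while the aggressive setting (decay_rate = 0.1, decay_steps = 30) leaves 0.1%.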
Inverse Time Decay¶
Long training: decay_rate = 0.0001 to 0.001, decay_power = 0.5 to 1.0
Medium training: decay_rate = 0.001 to 0.01, decay_power = 1.0
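As a sanity check on these ranges, the point at which the learning rate is halved can be derived from \(\eta_t = \frac{\eta_0}{(1 + r \cdot t)^p}\) by solving \((1 + r \cdot t)^p = 2\), assuming the formula is applied as written:

\[
t_{1/2} = \frac{2^{1/p} - 1}{r}
\]

For decay_rate = 0.001 and decay_power = 1.0 this gives \(t_{1/2} = 1000\); for decay_rate = 0.0001 and decay_power = 0.5 it gives \(t_{1/2} = (2^{2} - 1) / 0.0001 = 30000\).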
See Also¶
Training Configuration: Overview
Gradient Clipping: Preventing exploding gradients
Regularisation: Preventing overfitting