Gradient Clipping

clip_type

Prevents exploding gradients by limiting the magnitude of the gradients computed during backpropagation before the optimiser applies its parameter update.

clip_type(
  clip_min=...,
  clip_max=...,
  clip_norm=...
)

Arguments

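clip_min (real, optional): Minimum allowed gradient value; any smaller component is raised to clip_min
clip_max (real, optional): Maximum allowed gradient value; any larger component is lowered to clip_max
clip_norm (real, optional): Maximum allowed L2-norm; a gradient whose L2-norm exceeds this is rescaled so that its norm equals clip_norm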

Value clipping (clip_min, clip_max) and norm clipping (clip_norm) can be used independently or together.
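The two modes follow the standard definitions sketched below, with g a gradient array and g_i its components. This page does not say whether the L2-norm is taken per parameter array or over all parameters together, so the scope of ‖g‖₂ here is an assumption.

```latex
\text{value clipping:}\quad g_i \leftarrow \min\bigl(\max(g_i,\ \texttt{clip\_min}),\ \texttt{clip\_max}\bigr)

\text{norm clipping:}\quad g \leftarrow g \cdot \min\!\left(1,\ \frac{\texttt{clip\_norm}}{\lVert g \rVert_2}\right),
\qquad \lVert g \rVert_2 = \sqrt{\textstyle\sum_i g_i^2}
```

For g = (3, 4), ‖g‖₂ = 5: norm clipping with clip_norm = 1.0 gives (0.6, 0.8) and keeps the direction of the gradient, while value clipping with clip_min = -0.5 and clip_max = 0.5 gives (0.5, 0.5) and changes it.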

Usage

Norm Clipping

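use athena
implicit none

type(network_type) :: network   ! layers are assumed to have been added already
type(clip_type) :: clipper

! Rescale any gradient whose L2-norm exceeds 1.0
clipper = clip_type(clip_norm=1.0)

call network%compile( &
     optimiser_type=adam_optimiser_type( &
          learning_rate=0.001, &
          clip_dict=clipper), &
     loss_method="mse")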
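To make the effect of clip_norm concrete, here is a minimal standalone sketch of norm clipping applied to a single gradient array. The module and subroutine names (norm_clip_sketch, clip_by_norm) are hypothetical and not part of athena's API, and the sketch assumes the norm is taken over one array at a time, which may differ from how athena groups parameters.

```fortran
! Hypothetical sketch of L2-norm clipping; not athena's internal code.
module norm_clip_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  private
  public :: clip_by_norm

contains

  !> Rescale gradient in place so that its L2-norm does not exceed max_norm.
  !> Gradients already within the limit (including all-zero ones) are left unchanged.
  pure subroutine clip_by_norm(gradient, max_norm)
    real(real32), intent(inout) :: gradient(:)
    real(real32), intent(in) :: max_norm
    real(real32) :: norm

    norm = norm2(gradient)   ! Fortran 2008 intrinsic: sqrt(sum(gradient**2))
    if (norm > max_norm) gradient = gradient * (max_norm / norm)
  end subroutine clip_by_norm

end module norm_clip_sketch
```

Calling clip_by_norm on g = [3.0, 4.0] with max_norm = 1.0 yields [0.6, 0.8]: the direction is kept and only the length shrinks. Because the subroutine is pure and works in place, it could be applied to each gradient array just before an optimiser step.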

Value Clipping

! Clamp each gradient component into [-0.5, 0.5]
clipper = clip_type(clip_min=-0.5, clip_max=0.5)

call network%compile( &
     optimiser_type=sgd_optimiser_type( &
          learning_rate=0.01, &
          clip_dict=clipper), &
     loss_method="categorical_crossentropy")
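Value clipping is an element-wise clamp. The sketch below shows that operation on one gradient array; the names value_clip_sketch and clip_by_value are hypothetical and not part of athena's API.

```fortran
! Hypothetical sketch of element-wise value clipping; not athena's internal code.
module value_clip_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  private
  public :: clip_by_value

contains

  !> Clamp every gradient component into the interval [min_val, max_val].
  pure subroutine clip_by_value(gradient, min_val, max_val)
    real(real32), intent(inout) :: gradient(:)
    real(real32), intent(in) :: min_val, max_val

    ! min and max are elemental, so the scalars are applied to every component
    gradient = min(max(gradient, min_val), max_val)
  end subroutine clip_by_value

end module value_clip_sketch
```

With the bounds used in the example above, [-2.0, -0.25, 0.25, 3.0] becomes [-0.5, -0.25, 0.25, 0.5]. Each component is clamped independently, so unlike norm clipping a large gradient can change direction as well as size.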

Combined Clipping

! Clamp each component to [-1.0, 1.0] and cap the L2-norm at 5.0
clipper = clip_type( &
     clip_min=-1.0, &
     clip_max=1.0, &
     clip_norm=5.0)
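This page does not state the order in which athena applies the two limits. The sketch below assumes value clipping first and norm clipping second, which is a common choice; both the order and the names combined_clip_sketch and clip_combined are assumptions for illustration only.

```fortran
! Hypothetical sketch of combined clipping; the order (value first, then norm)
! is an assumption, not documented athena behaviour.
module combined_clip_sketch
  use, intrinsic :: iso_fortran_env, only: real32
  implicit none
  private
  public :: clip_combined

contains

  !> Clamp each component to [min_val, max_val], then cap the L2-norm at max_norm.
  pure subroutine clip_combined(gradient, min_val, max_val, max_norm)
    real(real32), intent(inout) :: gradient(:)
    real(real32), intent(in) :: min_val, max_val, max_norm
    real(real32) :: norm

    ! Step 1: element-wise value clipping
    gradient = min(max(gradient, min_val), max_val)

    ! Step 2: rescale only if the clamped gradient is still too long
    norm = norm2(gradient)
    if (norm > max_norm) gradient = gradient * (max_norm / norm)
  end subroutine clip_combined

end module combined_clip_sketch
```

Under this ordering and the settings above, a 36-element gradient filled with 2.0 is first clamped to all 1.0, whose L2-norm is 6.0, and is then rescaled by 5/6. After value clipping an n-element array has norm at most sqrt(n) times max(|clip_min|, |clip_max|), so with these settings the norm limit can only take effect on arrays with more than 25 elements.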

Typical Values

RNNs/LSTMs: clip_norm = 0.5 to 2.0
GRUs: clip_norm = 1.0 to 5.0
CNNs: clip_norm = 5.0 to 10.0
GNNs: clip_norm = 0.5 to 2.0
PINNs: clip_min = -0.1, clip_max = 0.1

See Also

Training Configuration: Overview
Learning Rate Decay: Gradually reducing learning rate
Regularisation: Preventing overfitting