Regularisation¶

Adds penalty terms to the loss function to prevent overfitting by encouraging simpler models.

Available Types¶

l1_regulariser_type: L1 regularisation (Lasso)
l2_regulariser_type: L2 regularisation (Ridge)
l1l2_regulariser_type: Combined L1 and L2 regularisation

L1 Regularisation¶

l1_regulariser_type

Adds penalty based on absolute values of weights: \(L_{total} = L + \lambda_1 \sum_i |w_i|\)

Encourages sparsity (many weights become exactly zero).

l1_regulariser_type()

Attributes¶

l1 (real): L1 regularisation parameter. Default: 0.01.

Usage¶

use athena

type(l1_regulariser_type) :: regulariser

regulariser = l1_regulariser_type()
regulariser%l1 = 0.001

call network%compile( &
     optimiser_type=adam_optimiser_type( &
          learning_rate=0.001, &
          regulariser=regulariser), &
     loss_method="mse")

L2 Regularisation¶

l2_regulariser_type

Adds penalty based on squared weights: \(L_{total} = L + \lambda_2 \sum_i w_i^2\)

Encourages small weights distributed across all parameters.

l2_regulariser_type()

Attributes¶

l2 (real): L2 regularisation parameter. Default: 0.01.
l2_decoupled (real): Decoupled weight decay parameter (AdamW). Default: 0.01.
decoupled (logical): Use decoupled weight decay. Default: .true..

Usage¶

type(l2_regulariser_type) :: regulariser

regulariser = l2_regulariser_type()
regulariser%l2 = 0.0001

call network%compile( &
     optimiser_type=adam_optimiser_type( &
          learning_rate=0.001, &
          regulariser=regulariser), &
     loss_method="categorical_crossentropy")

Decoupled Weight Decay (AdamW)¶

For Adam optimiser, decoupled weight decay is recommended:

regulariser = l2_regulariser_type()
regulariser%l2_decoupled = 0.01
regulariser%decoupled = .true.  ! Default

call network%compile( &
     optimiser_type=adam_optimiser_type( &
          learning_rate=0.001, &
          regulariser=regulariser), &
     loss_method="mse")

Combined L1/L2 Regularisation¶

l1l2_regulariser_type

Also known as Elastic Net regularisation. Combines both L1 and L2 penalties.

l1l2_regulariser_type()

Attributes¶

l1 (real): L1 regularisation parameter. Default: 0.01.
l2 (real): L2 regularisation parameter. Default: 0.01.

Usage¶

type(l1l2_regulariser_type) :: regulariser

regulariser = l1l2_regulariser_type()
regulariser%l1 = 0.0001
regulariser%l2 = 0.001

call network%compile( &
     optimiser_type=sgd_optimiser_type( &
          learning_rate=0.01, &
          regulariser=regulariser), &
     loss_method="mse")

Typical Values¶

L1 Regularisation¶

Weak: l1 = 0.00001 to 0.0001
Moderate: l1 = 0.0001 to 0.001
Strong: l1 = 0.001 to 0.01

Use L1 when you want feature selection or sparse models.

L2 Regularisation¶

Weak: l2 = 0.00001 to 0.0001
Moderate: l2 = 0.0001 to 0.001
Strong: l2 = 0.001 to 0.01

Use L2 as the default choice for most problems.

L1/L2 Combined¶

Typically use l1 << l2 (e.g., l1 = 0.0001, l2 = 0.001)
Start with L2 alone, add L1 if you need sparsity

When to Use¶

Small datasets: Use stronger regularisation (higher λ values)
Large models: Models with many parameters benefit from regularisation
Overfitting symptoms: Large gap between training and validation performance
Less with dropout: If using dropout layers, reduce regularisation strength
Less with batch norm: Batch normalisation provides some regularisation effect

Regularisation¶

Available Types¶

L1 Regularisation¶

Attributes¶

Usage¶

L2 Regularisation¶

Attributes¶

Usage¶

Decoupled Weight Decay (AdamW)¶

Combined L1/L2 Regularisation¶

Attributes¶

Usage¶

Typical Values¶

L1 Regularisation¶

L2 Regularisation¶

L1/L2 Combined¶

When to Use¶

See Also¶