Huber Loss

huber_loss_type

huber_loss_type(gamma=1.0)

Huber loss combines properties of mean squared error (MSE) and mean absolute error (MAE). It is quadratic for small errors and linear for large ones, which makes it robust to outliers while keeping a continuous gradient.

\[\begin{split}L_\gamma(y, \hat{y}) = \begin{cases} \frac{1}{2}(y - \hat{y})^2 & \text{if } |y - \hat{y}| \leq \gamma \\ \gamma |y - \hat{y}| - \frac{1}{2}\gamma^2 & \text{otherwise} \end{cases}\end{split}\]

where \(\gamma > 0\) is the threshold at which the loss switches from quadratic to linear.
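As an illustration of the piecewise definition, here is a minimal standalone Fortran sketch that evaluates the loss elementwise. It does not use the athena API: the function `huber` and the constant `gamma_val` are hypothetical names chosen only for this example.

```fortran
program huber_loss_demo
  implicit none
  ! Hypothetical threshold, matching the documented default of 1.0
  real, parameter :: gamma_val = 1.0
  real :: y(3), y_hat(3)

  y     = [1.0, 2.0, 3.0]
  y_hat = [1.5, 2.0, 6.0]   ! absolute residuals: 0.5, 0.0, 3.0

  ! Elementwise losses: 0.125, 0.0, 2.5
  print *, huber(y, y_hat, gamma_val)

contains

  ! Elementwise Huber loss (illustrative helper, not part of athena)
  elemental function huber(y, y_hat, gamma) result(loss)
    real, intent(in) :: y, y_hat, gamma
    real :: loss, r
    r = abs(y - y_hat)
    if (r <= gamma) then
      loss = 0.5 * r**2                   ! quadratic branch
    else
      loss = gamma * r - 0.5 * gamma**2   ! linear branch
    end if
  end function huber

end program huber_loss_demo
```

The third element falls in the linear regime: a residual of 3 costs 2.5, where a purely quadratic loss would assign 4.5.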

Arguments

  • gamma (real): Threshold parameter. Default: 1.0.

    • Smaller values make the loss more similar to MAE

    • Larger values make it more similar to MSE

Use Cases

  • Regression with outliers

  • Robust regression problems

  • Time series with anomalies

  • When you want a balance between MSE and MAE properties

Example

use athena__loss

type(huber_loss_type) :: loss
type(array_type), dimension(:,:), allocatable :: predicted, expected
type(array_type), pointer :: loss_value

! Initialise loss function with custom gamma
loss = huber_loss_type()
loss%gamma = 0.5  ! Adjust sensitivity to outliers

! Compute loss
loss_value => loss%compute(predicted, expected)

Properties

Advantages:

  • Robust to outliers (like MAE)

  • Continuous gradient everywhere (like MSE)

  • Adjustable sensitivity via the gamma parameter

  • Convex function

Characteristics:

  • Differentiable everywhere

  • Less sensitive to outliers than MSE

  • Typically converges faster than MAE near the optimum, because the gradient shrinks with the error

  • Cheap to evaluate

Choosing Gamma

The gamma parameter controls the trade-off:

  • Small gamma (e.g., 0.1-0.5): More robust to outliers, behaves more like MAE

  • Medium gamma (e.g., 1.0-2.0): Balanced behaviour; the default of 1.0 falls in this range

  • Large gamma (e.g., 5.0+): Less robust to outliers, behaves more like MSE

Rule of thumb: Set gamma to approximately the expected scale of typical residuals.
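To make the trade-off concrete, consider a single residual \(|y - \hat{y}| = 3\) (an illustrative value, not taken from any dataset) and evaluate the loss defined above for three choices of gamma:

\[\begin{split}\gamma = 0.5: &\quad L_\gamma = 0.5 \cdot 3 - \tfrac{1}{2}(0.5)^2 = 1.375 \\ \gamma = 1: &\quad L_\gamma = 1 \cdot 3 - \tfrac{1}{2}(1)^2 = 2.5 \\ \gamma = 5: &\quad L_\gamma = \tfrac{1}{2} \cdot 3^2 = 4.5\end{split}\]

With gamma = 5 the residual is still inside the quadratic region, so it is penalised exactly as half the squared error would penalise it. With the smaller values it is treated as an outlier and its contribution grows only linearly.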

Mathematical Properties

Gradient:

\[\begin{split}\frac{\partial L_\gamma}{\partial \hat{y}} = \begin{cases} \hat{y} - y & \text{if } |y - \hat{y}| \leq \gamma \\ \gamma \cdot \text{sign}(\hat{y} - y) & \text{otherwise} \end{cases}\end{split}\]
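The gradient can be sketched in standalone Fortran in the same way. This is a sketch, not the athena implementation: `huber_grad` and `gamma_val` are hypothetical names used only for illustration.

```fortran
program huber_grad_demo
  implicit none
  ! Hypothetical threshold, matching the documented default of 1.0
  real, parameter :: gamma_val = 1.0
  real :: y(4), y_hat(4)

  y     = [1.0, 2.0, 3.0, 3.0]
  y_hat = [1.5, 2.0, 6.0, 0.0]   ! y_hat - y: 0.5, 0.0, 3.0, -3.0

  ! Elementwise gradients w.r.t. y_hat: 0.5, 0.0, 1.0, -1.0
  print *, huber_grad(y, y_hat, gamma_val)

contains

  ! Derivative of the Huber loss with respect to the prediction
  ! (illustrative helper, not part of athena)
  elemental function huber_grad(y, y_hat, gamma) result(grad)
    real, intent(in) :: y, y_hat, gamma
    real :: grad, d
    d = y_hat - y
    if (abs(d) <= gamma) then
      grad = d                 ! quadratic branch: gradient equals the error
    else
      grad = sign(gamma, d)    ! linear branch: magnitude capped at gamma
    end if
  end function huber_grad

end program huber_grad_demo
```

At the transition point, where the absolute error equals gamma, both branches give a gradient of magnitude gamma, so the gradient is continuous. Large errors contribute a gradient of at most gamma, which is what limits the influence of outliers.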

Notes

  • Closely related to smooth L1 loss, which is the Huber loss divided by gamma; the two are identical for gamma = 1

  • Reduces to MSE (up to the factor 1/2) when gamma → ∞

  • Approaches gamma times the absolute error when gamma → 0, so it reduces to MAE after dividing by gamma

  • Commonly used in robust statistics and reinforcement learning

  • Particularly effective for regression with heterogeneous noise
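The two limiting cases can be checked directly from the definition. Writing \(r = y - \hat{y}\) for a fixed nonzero residual:

\[\begin{split}\gamma \geq |r|: &\quad L_\gamma = \tfrac{1}{2} r^2 \\ \gamma < |r|: &\quad \frac{L_\gamma}{\gamma} = |r| - \tfrac{1}{2}\gamma \;\longrightarrow\; |r| \quad \text{as } \gamma \to 0\end{split}\]

Once gamma exceeds every residual in the data, the loss is exactly half the squared error. For very small gamma the loss itself shrinks towards zero, and it is the loss divided by gamma that approaches the absolute error.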

See Also

  • MSE Loss - For problems without outliers

  • MAE Loss - For maximum outlier robustness