Orthogonal Neural Operator Block

orthogonal_nop_block_type

orthogonal_nop_block_type(
  num_outputs,
  num_basis,
  num_inputs=...,
  use_bias=.true.,
  activation="none",
  kernel_initialiser=...,
  bias_initialiser=...
)

The orthogonal_nop_block_type derived type provides an orthogonal neural operator block. It combines a learned orthonormal basis, a spectral mixing path, and a local bypass:

\[\mathbf{v} = \sigma\left(\mathbf{W}\,\mathbf{\Phi}\,\mathbf{R}\,\mathbf{\Phi}^T\,\mathbf{u} + \mathbf{W}\,\mathbf{u} + \mathbf{b}\right)\]

where:

  • \(\mathbf{u} \in \mathbb{R}^{n_{in}}\) is the input sampled on a grid

  • \(\mathbf{\Phi} \in \mathbb{R}^{n_{in} \times k}\) is the learned orthonormal basis obtained from basis weights \(\mathbf{B}\)

  • \(\mathbf{R} \in \mathbb{R}^{k \times k}\) is the learnable spectral mixing matrix

  • \(\mathbf{W} \in \mathbb{R}^{n_{out} \times n_{in}}\) is the learnable bypass and output projection matrix

  • \(\mathbf{b} \in \mathbb{R}^{n_{out}}\) is the bias vector when use_bias=.true.

  • \(k\) is num_basis

  • \(\sigma\) is the activation function

The basis matrix \(\mathbf{\Phi}\) is formed by orthonormalising the columns of the learnable basis weights \(\mathbf{B}\) with modified Gram-Schmidt. The spectral path \(\mathbf{\Phi}\,\mathbf{R}\,\mathbf{\Phi}^T\) therefore has rank at most \(k\), so the size of the non-local mixing scales with the chosen basis size rather than with the full grid resolution.
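
As a rough illustration of the orthogonalisation step, the sketch below applies modified Gram-Schmidt to the columns of a small basis weight matrix using only Fortran intrinsics. The module and function names (mgs_sketch, modified_gram_schmidt) are hypothetical and are not part of athena; the library's own routine may differ, for example in how it handles nearly dependent columns, which this sketch assumes do not occur.

```fortran
module mgs_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Orthonormalise the columns of b with modified Gram-Schmidt.
  !! Hypothetical helper: assumes the columns are linearly independent.
  function modified_gram_schmidt(b) result(phi)
    real(dp), intent(in) :: b(:,:)          ! basis weights B, shape (n_in, k)
    real(dp) :: phi(size(b,1), size(b,2))   ! orthonormal basis Phi, same shape
    integer :: i, j

    phi = b
    do j = 1, size(b,2)
       ! Normalise column j
       phi(:,j) = phi(:,j) / norm2(phi(:,j))
       ! Remove the component along column j from every later column,
       ! working on the already-updated columns (the "modified" variant)
       do i = j + 1, size(b,2)
          phi(:,i) = phi(:,i) - dot_product(phi(:,j), phi(:,i)) * phi(:,j)
       end do
    end do
  end function modified_gram_schmidt
end module mgs_sketch

program mgs_demo
  use mgs_sketch
  implicit none
  real(dp) :: b(3,2), phi(3,2), gram(2,2)

  ! Two linearly independent, non-orthogonal columns
  b = reshape([1.0_dp, 1.0_dp, 0.0_dp, 1.0_dp, 0.0_dp, 1.0_dp], [3, 2])
  phi = modified_gram_schmidt(b)

  ! Phi^T Phi should be close to the 2x2 identity
  gram = matmul(transpose(phi), phi)
  print '(2f12.8)', gram
end program mgs_demo
```

Because each later column is corrected against columns that have already been updated, rounding errors accumulate more slowly than in the classical variant, which is the usual reason for preferring the modified form.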

Arguments

  • num_outputs (integer): Number of output discretisation points.

  • num_basis (integer): Number of orthogonal basis functions.

  • num_inputs (integer): Number of input discretisation points. If not provided, it is inferred when the block is initialised.

  • use_bias (logical): If .false., the block will not use a bias term. Default: .true..

  • activation (class(*)): Activation function for the block.

    • Accepts character(*) or class(base_actv_type).

    • See Activation Functions for available options.

    • Default: none_actv_type.

  • kernel_initialiser (class(*)): Initialiser for the spectral matrix \(\mathbf{R}\), basis weights \(\mathbf{B}\), and bypass weights \(\mathbf{W}\) (see Initialisers).

    • If activation is selu_actv_type, default: lecun_normal_init_type.

    • If activation is a version of relu_actv_type, default: he_normal_init_type.

    • For all other activations, default: glorot_uniform_init_type.

  • bias_initialiser (class(*)): Initialiser for the biases (see Initialisers). Default: zeros_init_type.

Shape

  • Input: (num_inputs, batch_size).

  • Output: (num_outputs, batch_size).
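
To make the shapes above concrete, the following sketch evaluates the block's formula for a small batch with plain Fortran intrinsics. It is an illustration under stated assumptions, not athena's implementation: the names nop_forward_sketch and nop_forward are hypothetical, \(\mathbf{\Phi}\) is assumed to be orthonormal already, and no activation is applied (as with activation="none").

```fortran
module nop_forward_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Evaluate W*Phi*R*Phi^T*u + W*u + b for a batch (hypothetical helper).
  function nop_forward(u, phi, r, w, bias) result(v)
    real(dp), intent(in) :: u(:,:)     ! input, (num_inputs, batch_size)
    real(dp), intent(in) :: phi(:,:)   ! (num_inputs, num_basis), assumed orthonormal
    real(dp), intent(in) :: r(:,:)     ! (num_basis, num_basis)
    real(dp), intent(in) :: w(:,:)     ! (num_outputs, num_inputs)
    real(dp), intent(in) :: bias(:)    ! (num_outputs)
    real(dp) :: v(size(w,1), size(u,2))             ! (num_outputs, batch_size)
    real(dp) :: coeffs(size(phi,2), size(u,2))      ! (num_basis, batch_size)
    real(dp) :: mixed(size(phi,1), size(u,2))       ! (num_inputs, batch_size)

    coeffs = matmul(transpose(phi), u)        ! project the input onto the basis
    mixed  = matmul(phi, matmul(r, coeffs))   ! mix coefficients, map back to the grid
    ! W is shared by both paths, so W*(mixed + u) = W*Phi*R*Phi^T*u + W*u
    v = matmul(w, mixed + u) + spread(bias, dim=2, ncopies=size(u,2))
  end function nop_forward
end module nop_forward_sketch

program nop_forward_demo
  use nop_forward_sketch
  implicit none
  ! num_inputs = 4, num_basis = 2, num_outputs = 5, batch_size = 3
  real(dp) :: u(4,3), phi(4,2), r(2,2), w(5,4), bias(5), v(5,3)

  u = 1.0_dp
  phi = 0.0_dp
  phi(1,1) = 1.0_dp            ! basis columns are the first two unit vectors
  phi(2,2) = 1.0_dp
  r = 0.0_dp
  r(1,1) = 1.0_dp              ! identity mixing matrix
  r(2,2) = 1.0_dp
  w = 1.0_dp
  bias = 0.5_dp

  v = nop_forward(u, phi, r, w, bias)
  print *, "output shape:", shape(v)
  print *, "v(1,1) =", v(1,1)
end program nop_forward_demo
```

In this toy setup the spectral path contributes 2 and the bypass contributes 4 to every output entry, so each element of v equals 6.5 once the bias is added.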

Parameters

The block contains the following learnable parameters:

  • R: Spectral mixing matrix of shape (num_basis, num_basis).

  • B: Basis weight matrix of shape (num_inputs, num_basis).

  • W: Local bypass and output projection matrix of shape (num_outputs, num_inputs).

  • b: Bias vector of shape (num_outputs) when use_bias=.true..

The following tensors are derived from the basis weights and rebuilt during forward propagation:

  • Phi: Orthonormal basis of shape (num_inputs, num_basis).

  • Phi_T: Transposed orthonormal basis of shape (num_basis, num_inputs).

Total learnable parameters:

  • With bias: num_basis * num_basis + num_inputs * num_basis + num_outputs * num_inputs + num_outputs

  • Without bias: num_basis * num_basis + num_inputs * num_basis + num_outputs * num_inputs
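
As a worked example of these formulas, take num_inputs=128, num_outputs=128 and num_basis=16, the values used in the basic example in this section. Without a bias the count is:

\[\underbrace{16 \times 16}_{\mathbf{R}} + \underbrace{128 \times 16}_{\mathbf{B}} + \underbrace{128 \times 128}_{\mathbf{W}} = 256 + 2048 + 16384 = 18688\]

With use_bias=.true., the bias vector adds num_outputs further parameters:

\[18688 + \underbrace{128}_{\mathbf{b}} = 18816\]

For these sizes the count is dominated by \(\mathbf{W}\), while the spectral terms \(\mathbf{R}\) and \(\mathbf{B}\) grow only with num_basis.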

Examples

Basic orthogonal neural operator block:

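program orthogonal_nop_example
  use athena
  implicit none
  type(network_type) :: network

  ! One block mapping 128 grid points to 128 grid points
  ! through 16 learned basis functions
  call network%add(orthogonal_nop_block_type( &
       num_inputs=128, &
       num_outputs=128, &
       num_basis=16, &
       activation="relu" &
  ))
end program orthogonal_nop_example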

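Stacked orthogonal neural operator blocks:

! Assumes network is declared as type(network_type), as in the basic example
call network%add(orthogonal_nop_block_type( &
     num_inputs=256, &
     num_outputs=256, &
     num_basis=32, &
     activation="swish" &
))
! num_inputs is omitted here, so it is inferred when the block is initialised
call network%add(orthogonal_nop_block_type( &
     num_outputs=128, &
     num_basis=16, &
     activation="swish" &
))
! Final fully connected layer reducing to a single output
call network%add(full_layer_type( &
     num_outputs=1, &
     activation="none" &
))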

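Heat-equation operator example:

! N_grid, N_hidden and k_basis are integers that the caller must define beforehand
call network%add(orthogonal_nop_block_type( &
     num_inputs=N_grid, num_outputs=N_hidden, &
     num_basis=k_basis, activation="relu"))
! num_inputs and activation are omitted: the input size is inferred
! and the activation falls back to its default ("none")
call network%add(orthogonal_nop_block_type( &
     num_outputs=N_grid, &
     num_basis=k_basis))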

Notes

  • Smaller num_basis gives a cheaper low-rank operator but can limit spectral expressiveness.

  • The basis is learned from data rather than fixed analytically, unlike fixed_lno_layer_type.

  • Orthogonality can be monitored through the block’s get_orthogonality_metric() method, which reports \(\max |\mathbf{\Phi}^T\mathbf{\Phi} - \mathbf{I}|\); values close to zero indicate a well-orthonormalised basis.

  • In the current implementation, the same matrix \(\mathbf{W}\) is used for both the local bypass and the decoded spectral projection.
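
The orthogonality metric mentioned in the notes above can be reproduced outside the library for inspection or testing. The sketch below is one plausible reading of \(\max |\mathbf{\Phi}^T\mathbf{\Phi} - \mathbf{I}|\), taking the largest absolute entry of the matrix; the names orthogonality_sketch and orthogonality_metric are hypothetical, and the block's own get_orthogonality_metric() may be computed differently.

```fortran
module orthogonality_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains
  !> Largest absolute entry of Phi^T Phi - I (hypothetical helper).
  function orthogonality_metric(phi) result(metric)
    real(dp), intent(in) :: phi(:,:)              ! (num_inputs, num_basis)
    real(dp) :: metric
    real(dp) :: gram(size(phi,2), size(phi,2))    ! (num_basis, num_basis)
    integer :: j

    gram = matmul(transpose(phi), phi)
    ! Subtract the identity from the diagonal
    do j = 1, size(phi,2)
       gram(j,j) = gram(j,j) - 1.0_dp
    end do
    metric = maxval(abs(gram))
  end function orthogonality_metric
end module orthogonality_sketch

program orthogonality_demo
  use orthogonality_sketch
  implicit none
  real(dp) :: phi(3,2), exact_metric, perturbed_metric

  ! Columns are the first two unit vectors: exactly orthonormal
  phi = 0.0_dp
  phi(1,1) = 1.0_dp
  phi(2,2) = 1.0_dp
  exact_metric = orthogonality_metric(phi)

  ! Tilt the second column towards the first to break orthogonality
  phi(1,2) = 0.1_dp
  perturbed_metric = orthogonality_metric(phi)

  print *, "orthonormal basis:", exact_metric
  print *, "perturbed basis:  ", perturbed_metric
end program orthogonality_demo
```

In the perturbed case the off-diagonal overlap of 0.1 between the two columns is the largest deviation, so the metric rises from zero to roughly 0.1.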

See Also