Recurrent Layer¶

recurrent_layer_type

recurrent_layer_type(
  hidden_size,
  input_size=...,
  use_bias=.true.,
  activation="none",
  kernel_initialiser=...,
  bias_initialiser=...
)

The recurrent_layer_type derived type provides a simple recurrent neural network (RNN) layer. The operation performed by this layer is given by:

\[\mathbf{h}_t = \text{activation}(\mathbf{W}_{ih} \mathbf{x}_t + \mathbf{W}_{hh} \mathbf{h}_{t-1} + \mathbf{b}_h)\]

where:

\(\mathbf{x}_t\) is the input at time step \(t\)
\(\mathbf{h}_t\) is the hidden state at time step \(t\)
\(\mathbf{W}_{ih}\) is the input-to-hidden weight matrix
\(\mathbf{W}_{hh}\) is the hidden-to-hidden weight matrix
\(\mathbf{b}_h\) is the bias vector (if used)
\(\text{activation}\) is the activation function applied element-wise

The layer maintains a hidden state across time steps, allowing it to capture temporal dependencies in sequential data.

Arguments¶

hidden_size (integer): Number of features in the hidden state
input_size (integer): Number of features in the input. If not provided, it will be inferred when the layer is initialised.
use_bias (logical): If .false., the layer will not use bias terms. Default: .true..
activation (class(*)): Activation function for the hidden state.
- Accepts character(*) or class(base_actv_type).
- See Activation Functions for available options.
- Default: tanh_actv_type.
- Common choices: tanh, relu, sigmoid
kernel_initialiser (class(*)): Initialiser for the weight matrices \(\mathbf{W}_{ih}\) and \(\mathbf{W}_{hh}\) (see Initialisers).
- If activation is selu_actv_type, default: lecun_normal_init_type.
- If activation is a version of relu_actv_type, default: he_normal_init_type.
- For all other activations, default: glorot_uniform_init_type.
bias_initialiser (class(*)): Initialiser for the biases (see Initialisers). Default: zeros_init_type.

Shape¶

Input: (input_size, batch_size)
Output: (hidden_size, batch_size)

The layer maintains an internal hidden state of shape (hidden_size, batch_size) that persists across forward passes.

Parameters¶

The layer contains the following learnable parameters:

W_ih: Input-to-hidden weight matrix of shape (hidden_size, input_size)
W_hh: Hidden-to-hidden weight matrix of shape (hidden_size, hidden_size)
b_h: Hidden bias vector of shape (hidden_size) (if use_bias=.true.)
b_o: Output bias vector of shape (hidden_size) (if use_bias=.true.)

Total parameters: hidden_size * input_size + hidden_size * hidden_size + 2 * hidden_size (with bias)

Examples¶

Basic RNN layer for sequence processing:

use athena
type(network_type) :: network

! Create RNN layer with 10 hidden units processing 5-dimensional input
call network%add(recurrent_layer_type( &
     input_size=5, &
     hidden_size=10, &
     activation="tanh" &
))

RNN layer with custom initialisation:

call network%add(recurrent_layer_type( &
     input_size=3, &
     hidden_size=20, &
     activation="relu", &
     kernel_initialiser="he_uniform", &
     bias_initialiser="zeros" &
))

Multi-layer RNN network:

! Stack multiple RNN layers
call network%add(recurrent_layer_type( &
     input_size=8, &
     hidden_size=32, &
     activation="tanh" &
))
call network%add(recurrent_layer_type( &
     hidden_size=16, &
     activation="tanh" &
))
call network%add(full_layer_type( &
     num_outputs=1, &
     activation="sigmoid" &
))

Notes¶

The hidden state is initialised to zero at the start of training
The hidden state persists across forward passes within an epoch
For proper sequence processing, consider resetting the hidden state between different sequences
The time_step counter tracks the number of forward passes