Add Layer

add_layer_type

add_layer_type(
  input_layer_ids,
  input_rank=...,
  verbose=0
)

The add_layer_type layer performs element-wise addition of the outputs of two or more layers. It is commonly used to implement skip (residual) connections in deep networks.

The operation is defined as:

\[\text{output} = \sum_{i=1}^{N} \text{input}_i\]

where \(N\) is the number of input layers. All inputs must have the same shape.
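The sum above can be sketched in plain Fortran as follows. This is a minimal illustration, not athena's implementation: the module name add_forward_sketch and the function add_forward are hypothetical, and each input is assumed to be flattened into one column of a rank-2 array so that all inputs share one shape by construction.

```fortran
module add_forward_sketch
  implicit none
  private
  public :: add_forward

contains

  !> Element-wise sum of several inputs (hypothetical helper, not part of athena).
  !> inputs(:, i) holds the flattened i-th input tensor.
  function add_forward(inputs) result(output)
    real, intent(in) :: inputs(:, :)   ! shape: (n_elem, n_inputs)
    real :: output(size(inputs, 1))

    ! output = sum over i of input_i, applied element by element
    output = sum(inputs, dim=2)
  end function add_forward

end module add_forward_sketch
```

For example, with inputs(:, 1) = [1.0, 2.0, 3.0] and inputs(:, 2) = [10.0, 20.0, 30.0], add_forward(inputs) returns [11.0, 22.0, 33.0].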

Arguments

  • input_layer_ids (integer, dimension(:)): Array of layer indices to add together.

    • Layer IDs refer to the position in the network (0-indexed)

    • Special value -1 refers to the immediately previous layer

    • Example: [0, -1] adds layer 0 with the previous layer

    • All specified layers must have identical output shapes

  • input_rank (integer, optional): Rank (number of dimensions) of input tensors.

    • Used to determine proper shape handling

    • Typically inferred automatically from input layers

    • Specify manually when the rank must be stated explicitly, for example to distinguish graph data (rank 2) from image data (rank 3)

  • verbose (integer, optional): Verbosity level for initialisation. Default: 0.

Shape

  • Input: Multiple tensors of shape (d1, d2, ..., dn, batch_size)

    • All inputs must have identical shapes

    • Number of inputs determined by length of input_layer_ids

  • Output: Same shape as inputs (d1, d2, ..., dn, batch_size)
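The shape requirement above amounts to comparing the shape vectors of the inputs before summing them. A minimal sketch, assuming shapes are available as integer arrays; the module shape_check_sketch and the function shapes_match are hypothetical names and do not come from athena:

```fortran
module shape_check_sketch
  implicit none
  private
  public :: shapes_match

contains

  !> True if two shape vectors have the same rank and the same extents.
  !> Hypothetical helper for illustration only.
  pure function shapes_match(shape_a, shape_b) result(match)
    integer, intent(in) :: shape_a(:), shape_b(:)
    logical :: match

    ! Compare ranks first. Fortran does not guarantee short-circuit
    ! evaluation, so the element-wise comparison is only performed
    ! once the two arrays are known to have the same length.
    match = .false.
    if (size(shape_a) == size(shape_b)) match = all(shape_a == shape_b)
  end function shapes_match

end module shape_check_sketch
```

With this helper, shapes [26, 26, 32] and [24, 24, 32] do not match, which is the situation that arises when a convolution on one path shrinks the spatial size and the other path does not.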

Key Features

  • Element-wise: Adds corresponding elements from each input

  • Shape preservation: Output has same shape as inputs

  • Gradient flow: Passes the upstream gradient unchanged to every input during backpropagation

  • No learnable parameters: Pure operation layer

Usage Example

Basic Residual Connection

use athena

type(network_type) :: network

! Layer 0: Input layer
call network%add(input_layer_type(input_shape=[28, 28, 1]))

! Layer 1: Convolutional layer
call network%add(conv2d_layer_type( &
     num_filters=32, kernel_size=[3,3], activation="relu"))

! Layer 2: Another convolutional layer
call network%add(conv2d_layer_type( &
     num_filters=32, kernel_size=[3,3], activation="relu"))

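! Layer 3: Add layer - creates residual connection
! Adds output of layer 1 to output of layer 2 (-1 is the previous layer).
! Both outputs must have the same shape, so layer 2 must preserve
! the spatial size of its input.
call network%add(add_layer_type(input_layer_ids=[1, -1]))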

! Continue network...
call network%add(conv2d_layer_type( &
     num_filters=64, kernel_size=[3,3], activation="relu"))

Multi-Input Addition

More than two layers can be summed by a single add layer.

! Add three different layers together
call network%add(add_layer_type(input_layer_ids=[2, 5, 7]))

Implicit inclusion

An add_layer_type can also be inserted implicitly when another layer is given a skip connection. To do so, pass multiple input layer IDs together with operator="+" to the add() method of network_type.

! Implicit add layer by specifying multiple inputs and operator
call network%add(conv2d_layer_type( &
     num_filters=64, kernel_size=[3,3], activation="relu"), &
     input_layer_ids=[0, 3], operator="+")

Notes

Shape Requirements

All inputs must have exactly the same shape. If shapes differ, consider:

  • Using projection layers to match dimensions

  • Using concat_layer_type instead

  • Applying reshape/pooling operations first

Layer ID Indexing

  • Layer IDs are 0-indexed positions in the network (in the usage example above, the input layer is layer 0)

  • Use -1 to refer to the immediately previous layer

  • Use -2 for two layers back, etc.

  • Forward references (future layers) are not allowed
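The indexing rules above can be sketched as a small resolution routine. This is an illustration under stated assumptions, not athena's actual logic: resolve_layer_id, invalid_layer_id and the module layer_id_sketch are hypothetical, and negative IDs are assumed to be counted back from current_id, the position the new layer will occupy.

```fortran
module layer_id_sketch
  implicit none
  private
  public :: resolve_layer_id, invalid_layer_id

  !> Sentinel returned for references that cannot be resolved
  integer, parameter :: invalid_layer_id = -huge(0)

contains

  !> Map a possibly negative layer ID to an absolute, 0-indexed position.
  !> Hypothetical helper for illustration only.
  pure function resolve_layer_id(id, current_id) result(absolute_id)
    integer, intent(in) :: id          ! ID as given in input_layer_ids
    integer, intent(in) :: current_id  ! position of the layer being added
    integer :: absolute_id

    if (id < 0) then
       ! Relative reference: -1 is the previous layer, -2 is two layers back
       absolute_id = current_id + id
    else
       absolute_id = id
    end if

    ! Reject references before layer 0 and forward references
    if (absolute_id < 0 .or. absolute_id >= current_id) then
       absolute_id = invalid_layer_id
    end if
  end function resolve_layer_id

end module layer_id_sketch
```

Under these assumptions, an add layer placed at position 3 with input_layer_ids=[1, -1] resolves to layers 1 and 2, matching the basic residual example.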

Gradient Distribution

During backpropagation, gradients are copied (not split) to all inputs:

\[\frac{\partial L}{\partial \text{input}_i} = \frac{\partial L}{\partial \text{output}}\]

Each input path therefore receives the full upstream gradient, unscaled by the number of inputs.
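The backward rule can be sketched in plain Fortran as follows. The module add_backward_sketch and the function add_backward are hypothetical names used for illustration, and the upstream gradient is assumed to be flattened to rank 1.

```fortran
module add_backward_sketch
  implicit none
  private
  public :: add_backward

contains

  !> Backward pass of an element-wise sum (hypothetical helper, not part of athena).
  !> Column i of the result is the gradient with respect to input i.
  function add_backward(grad_output, n_inputs) result(grad_inputs)
    real, intent(in) :: grad_output(:)   ! dL/d(output), flattened
    integer, intent(in) :: n_inputs      ! number of summed inputs
    real :: grad_inputs(size(grad_output), n_inputs)

    ! d(output)/d(input_i) is the identity, so each input gets a full copy
    grad_inputs = spread(grad_output, dim=2, ncopies=n_inputs)
  end function add_backward

end module add_backward_sketch
```

Because the copy is not divided by n_inputs, the gradient reaching a skip path is as large as the one reaching the main path.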

Comparison with Concatenation

| Aspect | add_layer_type | concat_layer_type |
|---|---|---|
| Output shape | Same as inputs | Larger along concatenation dimension |
| Shape requirement | All inputs identical | Can differ along concat dimension |
| Use case | Residual connections | Feature combination |
| Parameters | None | None |
| Gradient flow | Copied to all inputs | Split along concat dimension |

When to Use

  • Residual connections: Add input to output of processing block

  • Deep networks: Help gradient flow in very deep architectures

  • Skip connections: Connect distant layers in encoder-decoder architectures

  • Ensemble-like behaviour: Combine predictions from parallel paths

  • Identity mappings: Allow network to learn when to bypass transformations

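When Not to Use

  • Inputs have different shapes (project or reshape them first, or use concat_layer_type)

  • Inputs should be combined with unequal or learned weights (this layer weights every input equally; consider a learned weighted sum)

  • Features need to be kept side by side rather than summed (use concat_layer_type)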

Benefits for Training

  • Gradient flow: Provides a direct path for gradients to reach early layers

  • Training stability: Helps mitigate vanishing gradients in deep networks

  • Faster convergence: Deep networks with residual connections often train in fewer iterations

  • Better optimisation: A residual network can be viewed as an implicit ensemble of shallower networks

Typical Hyperparameters

There are no hyperparameters to tune for this layer, only architectural choices:

| Consideration | Guidelines |
|---|---|
| Number of inputs | Usually 2 (main path + skip connection); rarely more than 3 |
| Skip distance | 2-4 layers for CNNs; 1-2 layers for GNNs |
| Activation placement | Often apply activation after add layer |
| Batch norm placement | Apply before add layer in both paths |

See Also