Concatenate Layer

concat_layer_type

concat_layer_type(
  input_layer_ids,
  input_rank=...,
  verbose=0
)

The concat_layer_type concatenates outputs from multiple layers along a specified dimension. This is commonly used to combine features from different processing paths or to implement skip connections that preserve information.

The operation concatenates tensors along the feature dimension:

\[\text{output} = [\text{input}_1 \parallel \text{input}_2 \parallel \cdots \parallel \text{input}_N]\]

where \([\cdot \parallel \cdot]\) denotes concatenation and \(N\) is the number of input layers.

Arguments

  • input_layer_ids (integer, dimension(:)): Array of layer indices to concatenate.

    • Layer IDs refer to the position in the network (0-indexed)

    • Special value -1 refers to the immediately previous layer

    • Example: [0, -1] concatenates layer 0 with the previous layer

    • Inputs must have the same shape except along the concatenation dimension

  • input_rank (integer, optional): Rank (number of dimensions) of input tensors.

    • Used to determine concatenation dimension

    • For rank 2 (graphs): concatenates along first dimension (features)

    • For rank 3 (images): concatenates along third dimension (channels)

    • Typically inferred automatically from input layers

  • verbose (integer, optional): Verbosity level for initialisation. Default: 0.

Shape

  • Input: Multiple tensors of shape (input_shape, batch_size)

  • Output: (input_shape(:-1), sum(input_shape(end)), batch_size)

  • Concatenation dimension: last dimension before batch size

Key Features

  • Feature combination: Combines features from multiple sources

  • Information preservation: Retains all information from inputs

  • Shape expansion: Output size grows with number of inputs

  • Gradient splitting: Distributes gradients to corresponding input portions

  • No learnable parameters: Pure operation layer

Usage Example

Basic Skip Connection

use athena

type(network_type) :: network

! Layer 0: Input layer
call network%add(input_layer_type(input_shape=[100]))

! Layer 1: First dense layer (100 → 64)
call network%add(full_layer_type( &
     num_inputs=100, num_outputs=64, activation="relu"))

! Layer 2: Second dense layer (64 → 32)
call network%add(full_layer_type( &
     num_outputs=32, activation="relu"))

! Layer 3: Concatenate with original input
! Output will be (32 + 100, batch_size)
call network%add(concat_layer_type(input_layer_ids=[0, -1]))

! Layer 4: Process combined features (132 → 10)
call network%add(full_layer_type( &
     num_inputs=132, num_outputs=10, activation="softmax"))

Multi-Input Concatenation

Multiple layers can be concatenated together using this layer.

! Concatenate three different layers together
call network%add(concat_layer_type(input_layer_ids=[2, 5, 7]))

Implicit inclusion

The concat_layer_type can be implicitly included when specifying skip connections in other layers. This can be achieved by providing multiple input layer IDs and the operator="||" argument when using the add() method of the network_type.

! Implicit add layer by specifying multiple inputs and operator
call network%add(conv2d_layer_type( &
    num_filters=64, kernel_size=[3,3], activation="relu"), &
    input_layer_ids=[0, 3], operator="||")

Notes

Concatenation Dimension

The dimension along which concatenation occurs depends on input rank:

! Rank 2 (graphs, dense): concatenate along dimension 1 (features)
! Input: [(f1, batch), (f2, batch), ...]
! Output: [(f1+f2+..., batch)]

! Rank 3 (images): concatenate along dimension 3 (channels)
! Input: [(w, h, c1), (w, h, c2), ...]
! Output: [(w, h, c1+c2+...)]
Shape Compatibility

Inputs must have matching shapes except along concatenation dimension:

! Valid: same width, height, different channels
! Input 1: (28, 28, 32)
! Input 2: (28, 28, 64)
! Output:  (28, 28, 96) ✓

! Invalid: different width
! Input 1: (28, 28, 32)
! Input 2: (14, 14, 32)  ✗
! Need to resize first
Gradient Distribution

During backpropagation, gradients are split among inputs:

\[\frac{\partial L}{\partial \text{input}_i} = \frac{\partial L}{\partial \text{output}_{[\text{slice}_i]}}\]

Each input receives gradients only for its corresponding slice of the output.

Comparison with Addition

Aspect

concat_layer_type

add_layer_type

Output shape

Larger along concat dimension

Same as inputs

Shape requirement

Can differ along concat dimension

All inputs identical

Information

Preserves all information

Combines information

Use case

Feature combination

Residual connections

Memory

Increases

Same as inputs

When to Use
  • Skip connections: Preserve early features in deep networks

  • Multi-scale features: Combine information at different scales

  • Feature fusion: Merge outputs from parallel processing paths

  • U-Net architectures: Connect encoder and decoder paths

  • Dense connections: DenseNet-style architectures

  • Graph neural networks: Concatenate input features at each layer

When Not to Use
  • Memory is constrained (consider add_layer_type instead)

  • Features should be combined additively (use add or learned attention)

  • Output dimension growth becomes problematic for subsequent layers

  • Simple residual connections suffice (add is more efficient)

Benefits
  • Information preservation: No information loss from inputs

  • Gradient flow: Provides direct path for gradients

  • Feature reuse: Allows later layers to access early features

  • Flexibility: Combines features without learning parameters

Typical Hyperparameters

Architectural considerations:

Consideration

Guidelines

Number of inputs

2-4 typical; >5 may cause dimension explosion

Skip distance

Connect complementary feature levels

Channel growth

Monitor total channels after concatenation

Subsequent layers

May need more parameters to process larger inputs

See Also