Residual Networks (ResNet)
This tutorial covers building Residual Networks (ResNets) for image processing and computer vision tasks using skip connections.
What are ResNets?
Residual Networks solve the degradation problem in very deep neural networks by introducing skip connections (residual connections). The key concepts of ResNets include:
Residual blocks: Add input directly to output via skip connections [1]
Skip connections: Allow gradients to flow directly through the network
Identity mappings: Enable training of very deep networks (100+ layers) [2]
Convolutional layers: Extract features at each level
Building a Basic ResNet
A residual block performs: output = F(x) + x
Where:
* F(x) is the transformation (conv layers)
* x is the identity (skip connection)
* + is element-wise addition
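The formula above can be illustrated numerically with a minimal, library-independent sketch. Here transform is a hypothetical stand-in for the conv layers F(x) (a simple scaling, not athena code), and residual_block adds the input back element-wise. When F(x) is zero, the block reduces to the identity mapping, which is the property that lets extra blocks be added without degrading the signal.

```fortran
module residual_sketch
  implicit none
contains
  !> Toy stand-in for the conv layers F(x): scales every element by weight.
  !> This is an illustrative assumption, not an athena routine.
  pure function transform(x, weight) result(fx)
    real, intent(in) :: x(:)
    real, intent(in) :: weight
    real :: fx(size(x))
    fx = weight * x
  end function transform

  !> Residual block: output = F(x) + x (element-wise addition).
  pure function residual_block(x, weight) result(y)
    real, intent(in) :: x(:)
    real, intent(in) :: weight
    real :: y(size(x))
    y = transform(x, weight) + x
  end function residual_block
end module residual_sketch

program residual_demo
  use residual_sketch
  implicit none
  real :: x(3)
  x = [1.0, 2.0, 3.0]
  ! With F(x) = 0 the block reduces to the identity mapping.
  print *, residual_block(x, 0.0)
  ! With a non-zero F(x) the block adds a correction on top of x.
  print *, residual_block(x, 0.5)
end program residual_demo
```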
Note
The examples in this tutorial use simplified ResNet architectures that omit batch normalisation from the residual blocks for clarity. The example in the repository at example/resnet/src/main.f90 contains a more complete implementation.
For image classification, we can build a simple ResNet as follows:
program basic_resnet
  use athena
  implicit none
  type(network_type) :: net
  type(adam_optimiser_type) :: optimiser
  type(cce_loss_type) :: loss
  integer :: layer_id
  ! Image dimensions: 28x28 grayscale images
  ! Data format: [width, height, channels]
  integer, parameter :: width = 28, height = 28, channels = 1
  integer, parameter :: num_classes = 10
  ! Build ResNet architecture
  ! Initial conv layer
  call net%add(input_layer_type(input_shape=[width, height, channels]))
  call net%add(conv2d_layer_type( &
       num_filters=64, kernel_size=3, padding="same", activation="relu"))
  ! First residual block (64 filters)
  layer_id = net%num_layers ! Save for skip connection
  call net%add(conv2d_layer_type( &
       num_filters=64, kernel_size=3, padding="same", activation="relu"))
  call net%add(conv2d_layer_type( &
       num_filters=64, kernel_size=3, padding="same"))
  ! Add skip connection
  call net%add(add_layer_type( &
       input_layer_ids=[layer_id, net%num_layers], input_rank=3), &
       input_list=[layer_id, net%num_layers], operator="+")
  call net%add(actv_layer_type(activation="relu"))
  ! Pooling and output
  call net%add(maxpool2d_layer_type(pool_size=2))
  call net%add(flatten_layer_type(input_rank=3))
  call net%add(full_layer_type(num_outputs=num_classes, activation="softmax"))
  ! Compile
  optimiser = adam_optimiser_type(learning_rate=0.001_real32)
  loss = cce_loss_type()
  call net%compile(optimiser=optimiser, loss_method=loss)
  call net%print_summary()
end program basic_resnet
The skip connection can be introduced in one of two ways: either by saving the layer ID before the residual path and passing it to add_layer_type, or by using the input_list parameter to specify which layers to combine.
The example above uses the former, more explicit method; the latter is more concise.
The latter method automates the tracking of layer IDs and the construction of the addition layer, and is covered in the Understanding Skip Connections section.
Direct use of add_layer_type requires the input_rank parameter, which gives the rank of the input tensors (3 for 2D images: two spatial dimensions plus channels).
Building Residual Blocks
For building deeper ResNets, we can create a subroutine to add residual blocks. This enables us to easily define repeatable stacks and add them to the network multiple times.
Helper Subroutine for Residual Blocks
Create a reusable subroutine for adding residual blocks:
subroutine add_residual_block(net, num_filters, stride)
  type(network_type), intent(inout) :: net
  integer, intent(in) :: num_filters
  integer, optional, intent(in) :: stride
  integer :: stride_, skip_id
  ! Note: the identity shortcut adds the block input to its output unchanged,
  ! so the two shapes must match. Use this block only when stride is 1 and
  ! num_filters equals the incoming channel count; otherwise use a
  ! projection shortcut.
  stride_ = 1
  if (present(stride)) stride_ = stride
  ! Save layer ID for skip connection
  skip_id = net%num_layers
  ! First conv layer in block
  call net%add(conv2d_layer_type( &
       num_filters=num_filters, kernel_size=3, &
       stride=stride_, padding="same", activation="relu"))
  ! Second conv layer in block
  call net%add(conv2d_layer_type( &
       num_filters=num_filters, kernel_size=3, padding="same"))
  ! Skip connection with addition
  call net%add(add_layer_type( &
       input_layer_ids=[skip_id, net%num_layers], input_rank=3), &
       input_list=[skip_id, net%num_layers], operator="+")
  ! Final activation
  call net%add(actv_layer_type(activation="relu"))
end subroutine add_residual_block
Deeper ResNet Architecture
ResNet-18 style network:
program resnet18_style
  use athena
  implicit none
  type(network_type) :: net
  type(adam_optimiser_type) :: optimiser
  type(cce_loss_type) :: loss
  ! Initial convolution
  call net%add(input_layer_type(input_shape=[224, 224, 3]))
  call net%add(conv2d_layer_type( &
       num_filters=64, kernel_size=7, stride=2, padding="same"))
  call net%add(batchnorm2d_layer_type(num_channels=64))
  call net%add(actv_layer_type(activation="relu"))
  call net%add(maxpool2d_layer_type(pool_size=3, stride=2))
  ! Residual blocks - Stage 1 (64 filters, shape unchanged)
  call add_residual_block(net, 64)
  call add_residual_block(net, 64)
  ! Residual blocks - Stage 2 (128 filters)
  ! The first block of each later stage changes both the spatial size and the
  ! channel count, so it needs the projection shortcut described under
  ! Projection Shortcuts rather than an identity shortcut.
  call add_residual_block_projection(net, 128, stride=2)
  call add_residual_block(net, 128)
  ! Residual blocks - Stage 3 (256 filters)
  call add_residual_block_projection(net, 256, stride=2)
  call add_residual_block(net, 256)
  ! Residual blocks - Stage 4 (512 filters)
  call add_residual_block_projection(net, 512, stride=2)
  call add_residual_block(net, 512)
  ! Global average pooling and classifier
  call net%add(avgpool2d_layer_type(pool_size=7))
  call net%add(flatten_layer_type(input_rank=3))
  call net%add(full_layer_type(num_outputs=1000, activation="softmax"))
  optimiser = adam_optimiser_type(learning_rate=0.001_real32)
  loss = cce_loss_type()
  call net%compile(optimiser=optimiser, loss_method=loss)
contains
  ! Include the add_residual_block and add_residual_block_projection
  ! subroutines here
end program resnet18_style
Projection Shortcuts
When dimensions change, use 1x1 convolutions for the skip connection:
subroutine add_residual_block_projection(net, num_filters, stride)
  type(network_type), intent(inout) :: net
  integer, intent(in) :: num_filters, stride
  integer :: skip_id, main_path_id
  skip_id = net%num_layers
  ! Main path
  call net%add(conv2d_layer_type( &
       num_filters=num_filters, kernel_size=3, &
       stride=stride, padding="same", activation="relu"))
  call net%add(conv2d_layer_type( &
       num_filters=num_filters, kernel_size=3, padding="same"))
  main_path_id = net%num_layers
  ! Projection shortcut (1x1 conv to match dimensions)
  call net%add(conv2d_layer_type( &
       num_filters=num_filters, kernel_size=1, stride=stride), &
       input_list=[skip_id])
  ! Add skip connection
  call net%add(add_layer_type( &
       input_layer_ids=[net%num_layers, main_path_id], input_rank=3), &
       input_list=[net%num_layers, main_path_id], operator="+")
  call net%add(actv_layer_type(activation="relu"))
end subroutine add_residual_block_projection
ResNet for Small Images (CIFAR-10)
Adapted architecture for 32x32 images:
program resnet_cifar10
  use athena
  implicit none
  type(network_type) :: net
  integer :: i
  ! Initial conv (no pooling for small images)
  call net%add(input_layer_type(input_shape=[32, 32, 3]))
  call net%add(conv2d_layer_type( &
       num_filters=16, kernel_size=3, padding="same", activation="relu"))
  ! Stage 1: 16 filters
  do i = 1, 3
     call add_residual_block(net, 16)
  end do
  ! Stage 2: 32 filters with stride
  ! The shape changes here, so the first block uses a projection shortcut
  call add_residual_block_projection(net, 32, stride=2)
  do i = 1, 2
     call add_residual_block(net, 32)
  end do
  ! Stage 3: 64 filters with stride
  call add_residual_block_projection(net, 64, stride=2)
  do i = 1, 2
     call add_residual_block(net, 64)
  end do
  ! Global average pooling
  call net%add(avgpool2d_layer_type(pool_size=8))
  call net%add(flatten_layer_type(input_rank=3))
  call net%add(full_layer_type(num_outputs=10, activation="softmax"))
contains
  ! Include the add_residual_block and add_residual_block_projection
  ! subroutines here
end program resnet_cifar10
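To see why the final average pooling uses pool_size=8, it can help to trace the feature-map width through the three stages. The sketch below is independent of athena and assumes the common convention that a "same"-padded convolution with stride s maps a width n to ceil(n / s); whether athena follows exactly this convention is an assumption. The helper names same_conv_size and stage_sizes are hypothetical.

```fortran
module cifar_shape_sketch
  implicit none
contains
  !> Width after a "same"-padded convolution, assuming ceil(n / stride).
  !> (Assumed convention; not taken from the athena source.)
  pure function same_conv_size(n, stride) result(n_out)
    integer, intent(in) :: n, stride
    integer :: n_out
    n_out = (n + stride - 1) / stride
  end function same_conv_size

  !> Feature-map width at the end of each of the three stages.
  !> Stage 1 keeps the width; stages 2 and 3 each start with a stride-2 block.
  pure function stage_sizes(n_in) result(sizes)
    integer, intent(in) :: n_in
    integer :: sizes(3)
    sizes(1) = same_conv_size(n_in, 1)
    sizes(2) = same_conv_size(sizes(1), 2)
    sizes(3) = same_conv_size(sizes(2), 2)
  end function stage_sizes
end module cifar_shape_sketch

program cifar_shape_demo
  use cifar_shape_sketch
  implicit none
  integer :: sizes(3)
  sizes = stage_sizes(32)
  print '(a,3(1x,i0))', "width after stages 1-3:", sizes
end program cifar_shape_demo
```

Under this assumption a 32-wide input reaches the pooling layer as an 8-wide map, so an 8x8 average pool reduces each of the 64 channels to a single value before the classifier.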
Understanding Skip Connections
The following sections explain how to implement skip connections within athena.
These rely on the merge_layer_type derived type and the input_list argument of the add() method of the network_type derived type.
There are currently two options of merge_layer_type that can be used for skip connections: addition (add_layer_type) and concatenation (concat_layer_type).
Additionally, the output of any layer can be easily broadcast to multiple subsequent layers using the input_list argument.
How Skip Connections Work
In athena, there are two ways to implement ResNet skip connections. The first saves the layer ID and passes it explicitly to add_layer_type:
! Save the layer ID before the residual path
skip_id = net%num_layers
! Add transformation layers (F(x))
call net%add(conv2d_layer_type(...))
! Add skip connection: output = F(x) + x
call net%add(add_layer_type( &
input_layer_ids=[skip_id, net%num_layers], input_rank=3), &
input_list=[skip_id, net%num_layers], operator="+")
Or, we can rely on the input_list and operator arguments of add() alone to specify which layers to merge, which leaves athena to track the layer IDs and build the addition layer.
This latter approach is the more concise and preferred method.
The input_list accepts values between -num_layers + 1 and num_layers to specify which layers to merge.
Negative indices count backwards from the most recently added layer (-1 refers to the last added layer, -2 the second last, etc.).
Positive indices refer to absolute layer IDs (i.e. the order in which layers have been added to the network via the add() method).
0 refers to the input layer of the network (i.e. the data input); if multiple input layers exist, this refers to the input to the first layer added.
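The indexing rules above can be sketched as a small lookup. This is not athena's implementation: resolve_layer_index and invalid_index are hypothetical names, and the mapping of -1 to the current value of num_layers is an assumption drawn from the description above. The range check mirrors the stated bounds (-num_layers + 1 to num_layers).

```fortran
module input_list_sketch
  implicit none
  !> Sentinel returned for entries outside the documented range (hypothetical).
  integer, parameter :: invalid_index = -huge(0)
contains
  !> Resolve one input_list entry to an absolute layer ID.
  !> Illustrative only; athena's internal handling may differ.
  pure function resolve_layer_index(idx, num_layers) result(layer_id)
    integer, intent(in) :: idx         ! entry from input_list
    integer, intent(in) :: num_layers  ! layers added so far
    integer :: layer_id
    if (idx < -num_layers + 1 .or. idx > num_layers) then
      layer_id = invalid_index          ! outside the documented range
    else if (idx < 0) then
      layer_id = num_layers + 1 + idx   ! -1 is the most recently added layer
    else
      layer_id = idx                    ! 0 is the network input, >0 an absolute ID
    end if
  end function resolve_layer_index
end module input_list_sketch

program input_list_demo
  use input_list_sketch
  implicit none
  ! With five layers added, compare relative and absolute references.
  print '(a,i0)', "-1 resolves to layer ", resolve_layer_index(-1, 5)
  print '(a,i0)', "-2 resolves to layer ", resolve_layer_index(-2, 5)
  print '(a,i0)', " 3 resolves to layer ", resolve_layer_index(3, 5)
  print '(a,i0)', " 0 resolves to layer ", resolve_layer_index(0, 5)
end program input_list_demo
```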
Key Points
Layer IDs: Track net%num_layers to reference previous layers
Add layer: Combines outputs element-wise from specified layers
Input list: Specifies which layers to merge (typically start and end of block)
Operator: Use "+" for addition (residual connections) and "||" for concatenation
Dimension Matching
Skip connections require matching dimensions:
Option 1: Same dimensions (identity shortcut)
! Both input and output have same shape
! Can directly add
call net%add(add_layer_type(...), input_list=[skip_id, current_id], operator="+")
Option 2: Different dimensions (projection shortcut)
! Use 1x1 convolution to match dimensions
skip_id = net%num_layers
! Main path ("same" padding keeps its output size in step with the projection)
call net%add(conv2d_layer_type( &
     num_filters=128, kernel_size=3, stride=2, padding="same"))
main_id = net%num_layers
! Projection path for skip connection
call net%add(conv2d_layer_type(num_filters=128, kernel_size=1, stride=2), &
     input_list=[skip_id])
! Combine
call net%add(add_layer_type( &
     input_layer_ids=[net%num_layers, main_id], input_rank=3), &
     input_list=[net%num_layers, main_id], operator="+")
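The arithmetic behind matching the two paths can be checked with a short, library-independent sketch. It uses the standard output-size formulas for convolutions: ceil(n / stride) for "same" padding (an assumed convention) and (n - kernel) / stride + 1 with integer division for unpadded ("valid") convolutions. The function names are hypothetical and nothing here calls athena.

```fortran
module conv_shape_sketch
  implicit none
contains
  !> Width after a "same"-padded convolution, assuming ceil(n / stride).
  pure function same_conv_size(n, stride) result(n_out)
    integer, intent(in) :: n, stride
    integer :: n_out
    n_out = (n + stride - 1) / stride
  end function same_conv_size

  !> Width after an unpadded convolution: floor((n - kernel) / stride) + 1.
  pure function valid_conv_size(n, kernel, stride) result(n_out)
    integer, intent(in) :: n, kernel, stride
    integer :: n_out
    n_out = (n - kernel) / stride + 1
  end function valid_conv_size
end module conv_shape_sketch

program conv_shape_demo
  use conv_shape_sketch
  implicit none
  integer, parameter :: n = 56  ! example input width
  ! Main path: 3x3 kernel, stride 2, "same" padding.
  print '(a,i0)', "main path, same padding:    ", same_conv_size(n, 2)
  ! Projection path: 1x1 kernel, stride 2 (padding makes no difference).
  print '(a,i0)', "projection, 1x1 kernel:     ", valid_conv_size(n, 1, 2)
  ! A 3x3 kernel without padding would not line up with the projection.
  print '(a,i0)', "main path, without padding: ", valid_conv_size(n, 3, 2)
end program conv_shape_demo
```

For a 1x1 kernel the two formulas agree for every input width, so the projection always matches a "same"-padded main path with the same stride; an unpadded 3x3 main path comes out one element short at this width.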
Next Steps
Implement your own ResNet for image classification
Try the MNIST example and adapt it for ResNet
Experiment with different residual block designs
Learn about custom layers for advanced architectures
See Also
conv2d_layer - Convolutional layers
batchnorm2d_layer - Batch normalisation
Activation layers - Activation functions
Basic Network Tutorial - Foundation concepts
Training Guide - Training deep networks
Footnotes