Duvenaud Message Passing Layer¶

duvenaud_msgpass_layer_type

duvenaud_msgpass_layer_type(
  num_vertex_features,
  num_edge_features,
  num_time_steps,
  max_vertex_degree,
  num_outputs,
  min_vertex_degree=1,
  message_activation="none",
  readout_activation="none",
  kernel_initialiser=...,
  verbose=0
)

The duvenaud_msgpass_layer_type implements the message passing architecture from Duvenaud et al. (2015) for learning on molecular graphs. This layer aggregates information from neighboring nodes through message passing iterations and performs graph-level readout to produce a fixed-size vector representation.

The operation is performed in two stages:

Message Passing: For each time step \(t\), update node features by aggregating neighbor information:

\[h_v^{(t+1)} = \text{activation}(h_v^{(t)} + \sum_{u \in N(v)} M(h_v^{(t)}, h_u^{(t)}, e_{vu}))\]

where \(h_v\) is the feature vector for node \(v\), \(N(v)\) is the set of neighbors, \(e_{vu}\) are edge features, and \(M\) is a learned message function.

Graph Readout: Aggregate all node features into a graph-level representation:

\[h_{\text{graph}} = \text{readout\_activation}\left(\sum_{d=1}^{D} \sum_{v: \deg(v)=d} W_d h_v^{(T)}\right)\]

where \(D\) is the maximum vertex degree, \(T\) is the number of time steps, and \(W_d\) are degree-specific weight matrices.

Arguments¶

num_vertex_features (integer, dimension(:)): Dimensionality of vertex features at each layer.
- If single value: all time steps use same feature dimension
- If array: specifies input and output dimensions (e.g., [16, 32] means 16→32)
- Defines the evolution of feature dimensions through message passing
num_edge_features (integer, dimension(:)): Dimensionality of edge features.
- Similar to num_vertex_features
- Edge features are incorporated into message function
- Essential for molecular graphs (bond types, bond orders)
num_time_steps (integer): Number of message passing iterations.
- Controls how far information propagates across the graph
- Typical values: 2-6 for molecular graphs
- More steps = larger receptive field but slower computation
max_vertex_degree (integer): Maximum degree of any vertex in the graph.
- Used to allocate degree-specific weight matrices for readout
- Should be set to maximum degree in your dataset
- Example: for molecules, often 4-6 (typical valence)
num_outputs (integer): Dimensionality of graph-level output after readout.
- Output size of the graph representation vector
- This output typically feeds into dense layers for prediction
- Does not affect node-level features during message passing
min_vertex_degree (integer, optional): Minimum degree of vertices to consider in readout. Default: 1.
- Vertices with degree less than this are excluded from readout
- Useful for ignoring isolated nodes
- Typically set to 1 (include all connected nodes)
message_activation (class(*)): Activation function for message updates.
- Accepts character(*) or class(base_actv_type)
- See Activation Functions for available options
- Default: "none" (linear)
- Common choices: "relu", "leaky_relu", "tanh"
readout_activation (class(*)): Activation function for graph readout aggregation.
- Applied to the aggregated graph representation
- Default: "none" (linear)
- Common choices: "softmax" (weighted sum), "sigmoid"
- "softmax" creates an attention-like mechanism over node degrees
kernel_initialiser (character(*)): Initialiser for the weight matrices (see Initialisers).
- Default depends on activation function (Glorot/He initialisation)
- All weight matrices (message and readout) use the same initialiser
verbose (integer, optional): Verbosity level for initialisation. Default: 0.

Shape¶

Input: Graph structure represented by graph_type:
- vertex_features: (num_vertex_features(1), num_vertices) - Node features
- edge_features: (num_edge_features(1), num_edges) - Edge features
- adjacency_matrix: Sparse or dense connectivity matrix
- Batch dimension handled internally
Output: (num_outputs, batch_size) - Graph-level representation vector

Key Features¶

Degree-aware readout: Uses separate weights for nodes of different degrees
Edge feature incorporation: Considers both node and edge attributes in message passing
Permutation invariant: Graph-level output is invariant to node ordering
Differentiable: Supports backpropagation through the graph structure

Usage Example¶

Molecular Property Prediction¶

use athena

type(network_type) :: network
type(graph_type), allocatable :: molecules(:,:)

! Add Duvenaud message passing layer
call network%add(duvenaud_msgpass_layer_type( &
     num_vertex_features=[6, 16, 16, 16], &  ! 4 time steps
     num_edge_features=[4, 8, 8, 8], &
     num_time_steps=4, &
     max_vertex_degree=6, &
     num_outputs=32, &
     min_vertex_degree=1, &
     message_activation="relu", &
     readout_activation="softmax", &
     kernel_initialiser="glorot_normal", &
     batch_size=8))

! Add dense layers for prediction
call network%add(full_layer_type( &
     num_inputs=32, num_outputs=128, activation="relu"))
call network%add(full_layer_type( &
     num_outputs=64, activation="relu"))
call network%add(full_layer_type( &
     num_outputs=1, activation="none"))

! Compile and train
call network%compile( &
     optimiser=adam_optimiser_type(), &
     loss_method="mse", &
     batch_size=8)

call network%train(molecules, target_energies, num_epochs=100)

Notes¶

Graph Data Requirements

Graphs should have add_self_loops() called and be converted to sparse format for efficiency:

call molecule%add_self_loops()
call molecule%convert_to_sparse()

Computational Complexity

Time complexity: \(O(T \cdot |E| \cdot F)\) where \(T\) is time steps, \(|E|\) is edges, \(F\) is feature dimension
Space complexity: \(O(D \cdot F \cdot F_{\text{out}})\) for readout weights

When to Use

Molecular property prediction (energy, solubility, toxicity)
Chemical reaction prediction
Graph classification tasks
When edge features are important
When graph-level (not node-level) predictions are needed

References¶

Duvenaud, D. K., et al. (2015). “Convolutional Networks on Graphs for Learning Molecular Fingerprints.” Advances in Neural Information Processing Systems, 28, DOI: 10.48550/arXiv.1509.09292.