Graph Learning Benchmark

This repository benchmarks the performance of graph learning methods across a variety of tasks (for the UCSD 2026 DSC180 Capstone).

Environment

Refer to this documentation for the environment setup of each method.

Running the Codebase

The documentation folder contains information about the models we train here. All training scripts use YAML configuration files for reproducibility and ease of use. Each method has a default config file in the configs/ directory:

  • MPNN: configs/mpnn_graph_token.yaml
  • IBTT (Index-Based Tokenization + Transformer): configs/ibtt_graph_token.yaml
  • AGTT (AutoGraph Trail Tokenization + Transformer): configs/agtt_graph_token.yaml
  • GraphGPS: configs/gps_graph_token.yaml

Config Structure: Each config file contains the following sections (see the sketch after this list):

  • dataset or data: Task, algorithm, and data paths
  • model: Architecture hyperparameters
  • train: Training settings (batch size, epochs, learning rate)
  • output: Output directory and run naming
  • wandb: Weights & Biases logging configuration
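For concreteness, here is a minimal sketch of loading one of these configs and inspecting its sections from Python. It assumes PyYAML is installed and that the file follows the section layout listed above; the fields inside each section vary per config, so check the actual YAML files in configs/:

# Minimal sketch: load a config and print its top-level sections.
# Section names are taken from the list above; contents are config-specific.
import yaml

with open("configs/mpnn_graph_token.yaml") as f:
    cfg = yaml.safe_load(f)

for section in ("dataset", "data", "model", "train", "output", "wandb"):
    if section in cfg:
        print(section, "->", cfg[section])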

Index-Based Tokenization Transformer (IBTT)

First, cd into the correct directory to generate synthetic graphs and tokenize them according to a given rule set (refer to the synthetic data documentation for our specific graph + task setup):

# auto setup environment + make graphs
bash graph_generator.sh

# auto setup environment + tokenize for tasks
bash task_generator.sh

These scripts automatically set up the environment for the graph-token repository and generate the training graphs and tasks. To generate the test directory, switch the path name and the split argument in the .sh files to test and rerun them. Then run the following to train a simple transformer on this task:

conda activate glearning_180a
python train.py --model ibtt                                  # default config (configs/ibtt_graph_token.yaml)
python train.py --model ibtt --config configs/ibtt_zinc.yaml  # or an explicit config

AutoGraph Trail Tokenization Transformer (AGTT)

This method uses AutoGraph's trail-based tokenization (the SENT algorithm) to convert native graphs into sequences, then trains a vanilla transformer on them. This allows a direct comparison with IBTT that isolates the effect of the tokenization strategy:

conda activate autograph
python train.py --model agtt                                  # default config (configs/agtt_graph_token.yaml)
python train.py --model agtt --config configs/agtt_zinc.yaml  # or an explicit config

Note: AGTT uses the same transformer architecture as IBTT but a different tokenization: it loads native graphs (like MPNN/GraphGPS) and applies AutoGraph's trail-based tokenization instead of index-based tokenization. This keeps the model architecture constant while varying only how graphs become token sequences.
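To illustrate the difference, here is a toy sketch (not AutoGraph's actual code; SENT's trail construction is more involved) serializing the same 4-cycle two ways:

# Toy sketch: two serializations of one small graph.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]  # a 4-cycle

# Index-based (IBTT-style): emit each edge as a pair of node indices.
ibtt_tokens = [tok for (u, v) in edges for tok in (u, v)]
print(ibtt_tokens)  # [0, 1, 1, 2, 2, 3, 3, 0]

# Trail-based (AGTT-style): walk a trail that covers every edge exactly
# once and emit the visited nodes in order; for this cycle, 0-1-2-3-0.
trail_tokens = [0, 1, 2, 3, 0]
print(trail_tokens)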

Graph Native GPS

Similarly, we can run a GPS model on the native graph as follows:

conda activate graphgps
python train.py --model ggps                                 # default config (configs/gps_graph_token.yaml)
python train.py --model ggps --config configs/gps_zinc.yaml  # or an explicit config

Note: GraphGPS uses a more complex config structure with sections like gt (graph transformer), gnn, and optim. See docs/ggps.md for details on the architecture.
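A quick way to see those extra sections (a sketch; assumes PyYAML and the config path from the list above, with section names taken from the note):

import yaml

cfg = yaml.safe_load(open("configs/gps_graph_token.yaml"))
for section in ("gt", "gnn", "optim"):
    print(section, "->", cfg.get(section))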

Graph Native MPNN (Message Passing Neural Network)

Finally, using a setup similar to GPS, we can run an MPNN on the native graph as well:

conda activate glearning_180a
python train.py --model mpnn                                  # default config (configs/mpnn_graph_token.yaml)
python train.py --model mpnn --config configs/mpnn_zinc.yaml  # or an explicit config

Note: Our MPNN implementation uses GIN (Graph Isomorphism Network) layers, which are provably as expressive as the Weisfeiler-Leman graph isomorphism test. See docs/mpnn.md for detailed architecture information.
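For intuition, here is a minimal GIN-style layer in PyTorch (an illustrative sketch, not our exact implementation; assumes torch is installed). Per the GIN paper, each node's new feature is an MLP applied to (1 + eps) times its own feature plus the sum of its neighbors' features:

import torch
import torch.nn as nn

class ToyGINLayer(nn.Module):
    # Illustrative GIN update: h_v' = MLP((1 + eps) * h_v + sum_{u in N(v)} h_u)
    def __init__(self, dim, eps=0.0):
        super().__init__()
        self.eps = eps
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, h, adj):
        # h: [num_nodes, dim] node features; adj: dense 0/1 adjacency matrix
        neighbor_sum = adj @ h  # sum-aggregate neighbor features
        return self.mlp((1 + self.eps) * h + neighbor_sum)

h = torch.randn(4, 8)
adj = torch.tensor([[0., 1., 0., 1.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [1., 0., 1., 0.]])
print(ToyGINLayer(8)(h, adj).shape)  # torch.Size([4, 8])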

Notebooks

Acknowledgements
