This repository benchmarks the performance of graph learning methods across a variety of tasks (UCSD 2026 DSC180 Capstone).
- We provide this WandB report showcasing the latest benchmark results.
Refer to this documentation for setting up the various environments.
The documentation folder contains information about the models we train here. All training scripts use YAML configuration files for reproducibility and ease of use. Each method has a default config file in the configs/ directory:
- MPNN: configs/mpnn_graph_token.yaml
- IBTT (Index-Based Tokenization + Transformer): configs/ibtt_graph_token.yaml
- AGTT (AutoGraph Trail Tokenization + Transformer): configs/agtt_graph_token.yaml
- GraphGPS: configs/gps_graph_token.yaml
Config Structure: Each config file contains sections for:
- dataset or data: Task, algorithm, and data paths
- model: Architecture hyperparameters
- train: Training settings (batch size, epochs, learning rate)
- output: Output directory and run naming
- wandb: Weights & Biases logging configuration
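
For a quick programmatic look at these sections, one option is to load a config with PyYAML. The snippet below is a minimal sketch that assumes PyYAML is installed and that the top-level keys match the sections listed above; the exact keys vary per file:

```python
import yaml

# Load the default MPNN config and list its top-level sections
# (e.g. data/dataset, model, train, output, wandb).
with open("configs/mpnn_graph_token.yaml") as f:
    cfg = yaml.safe_load(f)

print(sorted(cfg))            # top-level sections of this config
print(cfg.get("train", {}))   # batch size, epochs, learning rate, ...

# Write a variant config with a shorter run for quick debugging.
cfg.setdefault("train", {})["epochs"] = 5
with open("configs/mpnn_graph_token_debug.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```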
First, cd into the correct directory to generate synthetic graphs and tokenize them according to the chosen rule set (refer to the synthetic data documentation for our specific graph + task setup):
# auto setup environment + make graphs
bash graph_generator.sh
# auto setup environment + tokenize for tasks
bash task_generator.sh

This should automatically set up the environment for the graph-token repository and generate the training graphs and tasks. Then change the path name and the split argument in the .sh file to test to generate the test directory. Finally, run the following to train a simple transformer for this task:
conda activate glearning_180a
python train.py --model ibtt
python train.py --model ibtt --config configs/ibtt_zinc.yaml

This method uses AutoGraph's trail-based tokenization (SENT algorithm) to convert native graphs into sequences, then trains a vanilla transformer. This allows direct comparison with IBTT to isolate the effect of the tokenization strategy:
conda activate autograph
python train.py --model agtt
python train.py --model agtt --config configs/agtt_zinc.yaml

Note: AGTT uses the same transformer architecture as IBTT but with different tokenization. It loads native graphs (like MPNN/GraphGPS) and applies AutoGraph's trail-based tokenization instead of index-based tokenization. This enables comparing tokenization strategies while keeping the model architecture constant.
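
To make the contrast concrete, here is a toy, purely illustrative comparison of the two tokenization styles on a four-node graph. This is not the repository's tokenizer code, and the real IBTT and AGTT/SENT vocabularies and ordering rules are richer than shown:

```python
# Toy illustration of the two tokenization styles (not the actual tokenizers).
edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # a small undirected graph

# Index-based (IBTT-style): serialize the edge list as pairs of node indices.
ibtt_tokens = [str(i) for u, v in edges for i in (u, v)]

# Trail-based (AGTT/SENT-style): walk a trail that uses every edge exactly once
# and emit the node indices visited, which yields a shorter sequence.
trail = [3, 2, 0, 1, 2]  # traverses (3,2), (2,0), (0,1), (1,2) with no repeats
agtt_tokens = [str(v) for v in trail]

print("index-based:", ibtt_tokens)  # ['0', '1', '1', '2', '2', '0', '2', '3']
print("trail-based:", agtt_tokens)  # ['3', '2', '0', '1', '2']
```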
Similarly, we can run a GraphGPS model on the native graphs as follows:
conda activate graphgps
python train.py --model ggps
python train.py --model ggps --config configs/gps_zinc.yaml

Note: GraphGPS uses a more complex config structure with sections like gt (graph transformer), gnn, and optim. See docs/ggps.md for details on the architecture.
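
Roughly speaking, gt configures the global attention block of each GPS layer, gnn the local message-passing block, and optim the optimizer. As a simplified sketch of how the two blocks are combined in a GPS-style layer (for orientation only; it is not the repository's or the official GraphGPS implementation, and it assumes PyTorch and PyTorch Geometric are installed):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv

class ToyGPSLayer(nn.Module):
    """Simplified GPS-style layer: a local message-passing block and a global
    self-attention block run in parallel and their outputs are summed.
    Residual connections, dropout, and the feed-forward block are omitted."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.local = GINConv(nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)))
        self.attn = nn.MultiheadAttention(dim, num_heads=heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor, edge_index: torch.Tensor) -> torch.Tensor:
        h_local = self.local(x, edge_index)                  # neighborhood aggregation
        h_global, _ = self.attn(x[None], x[None], x[None])   # attention over all nodes
        return self.norm(h_local + h_global[0])              # combine both views
```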
Finally, using a setup similar to GraphGPS, we can run an MPNN on the native graphs as well:
conda activate glearning_180a
python train.py --model mpnn
python train.py --model mpnn --config configs/mpnn_zinc.yaml

Note: Our MPNN implementation uses GIN (Graph Isomorphism Network) layers, which are provably as expressive as the Weisfeiler-Leman graph isomorphism test. See docs/mpnn.md for detailed architecture information.
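
For orientation, a GIN layer can be expressed in a few lines with PyTorch Geometric's GINConv. The sketch below shows the generic layer type rather than the exact model used in this repository (see docs/mpnn.md for that):

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_add_pool

class ToyGIN(nn.Module):
    """Generic two-layer GIN for graph-level prediction (illustration only).
    Each GINConv sums neighbor features and feeds the result through an MLP,
    which is what gives GIN its Weisfeiler-Leman-level expressiveness."""

    def __init__(self, in_dim: int, hidden: int, out_dim: int):
        super().__init__()

        def mlp(a, b):
            return nn.Sequential(nn.Linear(a, b), nn.ReLU(), nn.Linear(b, b))

        self.conv1 = GINConv(mlp(in_dim, hidden))
        self.conv2 = GINConv(mlp(hidden, hidden))
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, x, edge_index, batch):
        x = torch.relu(self.conv1(x, edge_index))
        x = torch.relu(self.conv2(x, edge_index))
        return self.head(global_add_pool(x, batch))  # sum-pool nodes per graph
```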
- Official graph-token repository
- Official AutoGraph repository
- Official GraphGPS repository
- Official PyTorch Geometric documentation