Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows
Runze Mao1,†, Rui Zhang2,†, Xuan Bai3, Tianhao Wu3, Teng Zhang3, Zhenyi Chen1, Minqi Lin1, Bocheng Zeng2, Yangchen Xu1, Yingxuan Xiang1, Haoze Zhang1, Shubham Goswami4, Pierre A. Dawe4, Yifan Xu1, Zhenhua An5, Mengtao Yan2, Xiaoyi Lu6, Yi Wang6, Rongbo Bai7, Haobu Gao8, Xiaohang Fang4, Han Li1,3, Hao Sun2,*, Zhi X. Chen1,3,*
1Peking University, 2Renmin University of China, 3AI for Science Institute, Beijing, 4University of Calgary, 5Kyoto University, 6FM Global, 7LandSpace Technology, 8Aero Engine Academy of China
†Equal contribution, *Corresponding authors
REALM (REalistic AI Learning for Multiphysics) addresses a critical gap in scientific machine learning: while neural surrogates show promise for accelerating multiphysics simulations, current evaluations rely heavily on simplified benchmarks that fail to expose model limitations in realistic regimes.
- 11 High-Fidelity Datasets: Ranging from canonical problems to complex propulsion and fire-safety scenarios
- Rigorous Protocol: Standardized preprocessing, training, and evaluation for fair comparison
- Comprehensive Benchmark: Systematic evaluation of 12+ representative model families
- Three Key Findings:
  - A scaling barrier governed by dimensionality, stiffness, and mesh irregularity
  - Performance controlled by architectural inductive biases rather than parameter count
  - A persistent gap between nominal accuracy and physically trustworthy behavior
| Category | Cases | Description |
|---|---|---|
| Canonical Problems (CP) | IgnitHIT, ReactTGV | Fundamental multiphysics configurations |
| High-Mach Flows (HF) | PlanarDet, PropHIT | Detonation and supersonic combustion |
| Propulsion Engines (PE) | SupCavityFlame, SymmCoaxFlame, MultiCoaxFlame | Scramjet and rocket applications |
| Fire Hazards (FH) | PoolFire, FacadeFire, EvolveJet | Building fire safety scenarios |
- Total Size: ~15 TB
- Mesh Types: Regular (2D/3D) and irregular meshes
- Grid Sizes: 2×10⁴ to 1.2×10⁷ cells
- Variables: 6-40 physical fields per case
- Trajectories: Multiple operating conditions per case
- Time Steps: 20-50 snapshots per trajectory
- Box-Cox Transformation: Compress species dynamic range from O(10⁻ᵏ) to O(1)
- Z-score Normalization: Standardize every variable to zero mean and unit variance
- Autoregressive Training: Short-horizon rollout with stable backpropagation
- Spectral Operators: FNO, FFNO, CROP, DPOT, UNO, LSM
- Convolutional Models: CNext
- Transformer-Style: FactFormer, Transolver, ONO, GNOT
- Pointwise Models: DeepONet, PointNet
- Graph/Mesh Networks: MGN, GraphUNet, GraphSAGE
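The Box-Cox and z-score preprocessing steps listed above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, λ = 0.1 is taken from the Preprocessing entry of the methodology summary, and the small floor eps that keeps mass fractions strictly positive is our own assumption. In practice the mean and standard deviation would normally be computed per field on the training split only.

```python
import math
from statistics import fmean, pstdev

def box_cox(x, lam=0.1, eps=1e-12):
    """Box-Cox transform of a non-negative quantity such as a mass fraction.

    eps (an assumed floor) keeps the argument strictly positive.
    """
    x = max(x, 0.0) + eps
    if lam == 0.0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

def inv_box_cox(y, lam=0.1, eps=1e-12):
    """Invert box_cox to recover the physical value."""
    x = math.exp(y) if lam == 0.0 else (lam * y + 1.0) ** (1.0 / lam)
    return x - eps

def zscore_stats(values):
    """Mean and population standard deviation of one field."""
    mu = fmean(values)
    sigma = pstdev(values, mu)
    if sigma == 0.0:
        sigma = 1.0  # constant field: avoid division by zero
    return mu, sigma

def zscore(values, mu, sigma):
    """Shift and scale a field to zero mean and unit variance."""
    return [(v - mu) / sigma for v in values]

# A species mass fraction spanning eight orders of magnitude.
y_species = [1e-10, 1e-7, 1e-4, 1e-2]
transformed = [box_cox(v) for v in y_species]  # roughly -9.0 ... -3.7
mu, sigma = zscore_stats(transformed)
normalized = zscore(transformed, mu, sigma)
```

After the transform, values that originally differed by eight orders of magnitude sit within a range of about 5, which is what makes a single z-score afterwards meaningful; predictions are mapped back with the inverse z-score followed by inv_box_cox.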
2D Regular Cases
- FFNO and DPOT achieve the slowest error growth
- CNext shows competitive performance with minimal artifacts
- Transformer models limited by memory at high resolutions
3D Regular Cases
- All models struggle with fine-scale structure preservation
- FFNO and DPOT maintain best performance
- Faster error accumulation than in 2D cases
Irregular Mesh Cases
- DeepONet most robust across irregular geometries
- Graph models prone to over-smoothing
- Spectral methods struggle with non-uniform grids
This demo shows how to download the REALM-Bench dataset and run training/evaluation on a sample 2D dataset.
# Clone the repository
git clone https://github.com/deepflame-ai/REALM.git
cd REALM
# Install dependencies
pip install -r requirements.txt

Download the dataset from Hugging Face:

# Install huggingface-hub if not already installed
pip install huggingface-hub
# Download the dataset (Python)
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TianhaoWu/realm-bench-IgnitHIT",
    repo_type="dataset",
    local_dir="./data",
)

Or download manually from: https://huggingface.co/datasets/TianhaoWu/realm-bench-IgnitHIT
Navigate to the tutorial folder, configure your training setup, and run the launcher that matches your dataset:

cd tutorial
python multi_gpu_launcher.py           # 2D regular grids
python multi_gpu_launcher_rollout.py   # 2D regular grids, autoregressive rollout
python multi_gpu_launcher_3d.py        # 3D regular grids
python multi_gpu_launcher_U.py         # unstructured/irregular grids
python run_deeponetTrainer.py          # DeepONet models

Configuration: Before running, modify the following in the launcher file:
- gpus = [0, 1, 2]: set your available GPU IDs
- data_path: path to your downloaded dataset
- model_list: models to train (e.g., FNO, FFNO, Transolver)
- Hyperparameters: batch_size, width, n_layers, lr, etc.
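For readers new to multi-GPU launchers, the sketch below shows one common pattern for combining model_list and gpus: pair each model with a GPU in round-robin order and give each run its own CUDA_VISIBLE_DEVICES setting. This is an assumption about how such a launcher may work, not the contents of multi_gpu_launcher.py; the function plan_runs, the script name train_one.py and its --model/--data_path flags are hypothetical placeholders.

```python
import os
from itertools import cycle

def plan_runs(model_list, gpus, data_path):
    """Assign each model to a GPU in round-robin order (hypothetical scheme).

    Returns (gpu, command, env) triples; nothing is launched here.
    """
    plan = []
    for model, gpu in zip(model_list, cycle(gpus)):
        # Restrict the child process to a single visible GPU.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        # "train_one.py" and its flags are hypothetical placeholders.
        cmd = ["python", "train_one.py", "--model", model, "--data_path", data_path]
        plan.append((gpu, cmd, env))
    return plan

plan = plan_runs(["FNO", "FFNO", "Transolver", "CNext"], gpus=[0, 1, 2], data_path="./data")
for gpu, cmd, _ in plan:
    print(f"GPU {gpu}: {' '.join(cmd)}")
```

Each (cmd, env) pair could then be started with subprocess.Popen(cmd, env=env). With four models and three GPUs, GPU 0 receives two runs; whether those run concurrently or one after another is a design choice this sketch leaves open.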
After training, evaluate the model performance:

python run_evaluator.py

Configuration: Edit run_evaluator.py to set:

data_path = "./data/2dHIT"   # Path to your dataset
experiment_name = "hit"      # Experiment name

The evaluator will:
- Extract best results from training runs
- Evaluate model performance on the test set
- Generate performance metrics
| File | Purpose |
|---|---|
| multi_gpu_launcher.py | Train models on 2D regular grid datasets |
| multi_gpu_launcher_rollout.py | Train models with rollout (autoregressive) on 2D data |
| multi_gpu_launcher_3d.py | Train models on 3D regular grid datasets |
| multi_gpu_launcher_U.py | Train models on unstructured/irregular grid datasets |
| run_deeponetTrainer.py | Train DeepONet models |
| run_evaluator.py | Evaluate trained models and extract results |
Visit our live leaderboard to view up-to-date model rankings across all cases.
| Category | Best Model | Test Error | Correlation |
|---|---|---|---|
| 2D Regular | FFNO | 1.87 | 0.973 |
| 3D Regular | FFNO | 18.45 | 0.896 |
| 2D Irregular | DeepONet | 29.56 | 0.796 |
| 3D Irregular | DeepONet | 23.24 | 0.768 |
IgnitHIT²ᵈ: Hydrogen ignition kernels in homogeneous isotropic turbulence
- Domain: 50×50 mm², 1024×1024 grid
- Physics: Premixed flame propagation, turbulence-flame interaction
- Trajectories: 36 (varying kernel geometry and turbulence intensity)
ReactTGV³ᵈ: Reacting Taylor-Green vortex
- Domain: 2π×2π×2π mm³, 256³ grid
- Physics: Flame-vortex interaction, extinction/reignition
- Trajectories: 16 (varying Reynolds number and mixing length)
PlanarDet²ᵈ: Planar cellular detonation
- Domain: 200×10 mm², 840×400 grid
- Physics: Shock-reaction coupling, cellular structure
- Trajectories: 9 (varying equivalence ratio and temperature)
PropHIT³ᵈ: Propagating flame in turbulence
- Domain: 42.4×5.3×5.3 δₗ, 1536×128×128 grid
- Physics: Turbulent premixed combustion at elevated pressure
- Trajectories: 8 (varying pressure and turbulence intensity)
SupCavityFlame²ᵈ: Supersonic cavity flame
- Domain: ~3M irregular cells
- Physics: Scramjet combustion, shock-shear-flame interaction
- Trajectories: 9 (varying injection velocity and location)
SymmCoaxFlame²ᵈ/MultiCoaxFlame³ᵈ: Rocket combustors
- Domains: 295K (2D) / 13.5M (3D) irregular cells
- Physics: Shear-coaxial injection, chamber acoustics
- Trajectories: 12 (2D), 6 (3D) varying mixture ratio and thrust
PoolFire³ᵈ: Buoyancy-driven pool fire
- Domain: 3×3×3 m³, 80×80×200 grid
- Physics: Plume dynamics, McCaffrey regimes
- Trajectories: 15 (varying heat release rate and pool size)
FacadeFire³ᵈ: Building facade fire
- Domain: ~2.5M irregular cells
- Physics: Compartment-facade coupling, external flame spread
- Trajectories: 9 (varying heat release rate)
Multiphysics reactive flows are governed by:
∂q/∂t + ∇·F(q) - ∇·D(q,∇q) + S(q) = 0
where:
- q: Conservative variables [ρ, ρu, ρe, ρY₁, ..., ρYₙ]
- F: Convective fluxes
- D: Diffusive fluxes
- S: Chemical source terms, which form a stiff ODE system
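A toy one-dimensional, single-scalar analogue of the equation above shows how F, D and S enter a finite-volume update. This is an illustrative assumption, not REALM's or any production solver's discretization: F = a·q with first-order upwinding (a > 0), D = ν ∂q/∂x with central differences, S = k·q, periodic boundaries, and a single explicit Euler step. The function name fv_step is hypothetical.

```python
import math

def fv_step(q, dt, dx, a=1.0, nu=0.01, k=0.0):
    """One explicit Euler step of dq/dt + d(a*q)/dx - d(nu*dq/dx)/dx + k*q = 0.

    Toy 1D scalar stand-in for F (convection), D (diffusion) and S (source).
    Periodic boundaries; a > 0 so the upwind face value comes from the left cell.
    """
    n = len(q)
    # Face fluxes at x_{i+1/2}, stored at index i.
    conv = [a * q[i] for i in range(n)]                           # F: first-order upwind
    diff = [nu * (q[(i + 1) % n] - q[i]) / dx for i in range(n)]  # D: central gradient
    q_new = []
    for i in range(n):
        # For i = 0, index i - 1 = -1 wraps to the last face (periodicity).
        div_f = (conv[i] - conv[i - 1]) / dx
        div_d = (diff[i] - diff[i - 1]) / dx
        q_new.append(q[i] - dt * (div_f - div_d + k * q[i]))
    return q_new

n, dx, dt = 50, 1.0 / 50, 0.005  # a*dt/dx = 0.25, nu*dt/dx**2 = 0.125
q0 = [math.exp(-(((i + 0.5) * dx - 0.5) / 0.1) ** 2) for i in range(n)]
q_cons = fv_step(q0, dt, dx)              # k = 0: total sum(q) is conserved
q_decay = fv_step(q0, dt, dx, k=0.5)      # mild source: total shrinks by (1 - k*dt)
q_stiff = fv_step(q0, dt, dx, k=1000.0)   # stiff source: k*dt = 5, explicit step overshoots
```

The flux divergences telescope to zero on a periodic grid, so only S changes the total amount of q. The last call illustrates why the chemical source term is the stiff part: with a fast reaction rate, the explicit update flips the sign of q, so practical solvers either shrink dt drastically or integrate S with an implicit, stiff ODE method.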
- Preprocessing:
  - Box-Cox transform for species (λ=0.1)
  - Z-score normalization across all fields
- Training:
  - Short-horizon autoregressive rollout
  - Grouped loss by physical variable type
  - OneCycle learning rate schedule
- Evaluation:
  - Full-horizon autoregressive rollout
  - Metrics: MSE, correlation, SSIM, inference time
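The rollout-based training and evaluation above can be sketched without any deep-learning framework. Assumptions: a state is a flat list of floats, the model is any callable mapping the state at step t to step t+1, and "correlation" is taken to be the Pearson coefficient. The function names and toy dynamics are hypothetical rather than the REALM API, and gradients are omitted; in an actual training loop the short-horizon loss would be backpropagated through the unrolled steps.

```python
import math
from statistics import fmean

def rollout(model, q0, n_steps):
    """Autoregressive rollout: each prediction is fed back as the next input."""
    states = [q0]
    for _ in range(n_steps):
        states.append(model(states[-1]))
    return states[1:]

def mse(pred, ref):
    return fmean((p - r) ** 2 for p, r in zip(pred, ref))

def pearson(pred, ref):
    """Pearson correlation between a predicted and a reference field."""
    mp, mr = fmean(pred), fmean(ref)
    cov = sum((p - mp) * (r - mr) for p, r in zip(pred, ref))
    var = sum((p - mp) ** 2 for p in pred) * sum((r - mr) ** 2 for r in ref)
    return cov / math.sqrt(var) if var > 0 else float("nan")

def rollout_loss(model, trajectory, start, horizon):
    """Short-horizon training loss: mean per-step MSE over `horizon` steps."""
    preds = rollout(model, trajectory[start], horizon)
    targets = trajectory[start + 1 : start + 1 + horizon]
    return fmean(mse(p, r) for p, r in zip(preds, targets))

def evaluate(model, trajectory):
    """Full-horizon rollout from the first snapshot: per-step (MSE, correlation)."""
    preds = rollout(model, trajectory[0], len(trajectory) - 1)
    return [(mse(p, r), pearson(p, r)) for p, r in zip(preds, trajectory[1:])]

def true_step(q):
    return [1.01 * v for v in q]  # toy "ground truth": 1% growth per step

def surrogate(q):
    return [1.03 * v for v in q]  # toy surrogate with a 2% amplitude bias

q0 = [1.0, -0.5, 2.0, 0.3]
trajectory = [q0] + rollout(true_step, q0, 20)
train_loss = rollout_loss(surrogate, trajectory, start=0, horizon=3)
results = evaluate(surrogate, trajectory)
```

In this toy case the correlation stays at 1.0 for every step, because the surrogate's error is a pure amplitude bias, while the MSE grows monotonically with the horizon. This is one reason to report several complementary metrics rather than a single score.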
If you use REALM in your research, please cite:
@article{mao2025realm,
title={Benchmarking neural surrogates on realistic spatiotemporal multiphysics flows},
author={Mao, Runze and Zhang, Rui and Bai, Xuan and others},
journal={arXiv preprint arXiv:2506.10862},
year={2025}
}

We welcome contributions! Please see our contribution guidelines for details on:
- Adding new models
- Submitting to the leaderboard
- Reporting issues
- Improving documentation
- Zhi X. Chen: chenzhi@pku.edu.cn
- Hao Sun: haosun@ruc.edu.cn
- Project Website: realm-bench.org
- GitHub Issues: github.com/deepflame-ai/REALM/issues
This project is licensed under the MIT License - see the LICENSE file for details.
This work is supported by:
- National Natural Science Foundation of China (92270203, 52441603, 523B2062, 52276096, 62276269, 6250636, 92270118)
- China Postdoctoral Science Foundation (2025M771582)
- Postdoctoral Fellowship Program of CPSF (GZB20250408)
Special thanks to all institutions and collaborators who contributed to dataset generation and validation.
- DeepFlame: github.com/deepmodeling/deepflame-dev
- PDEBench: github.com/pdebench/PDEBench
- Neural Operator Resources: neuraloperator.github.io





