ADStereo: Efficient Stereo Matching with Adaptive Downsampling and Disparity Alignment (TIP 2025)
This paper presents two sampling strategies, the Adaptive Downsampling Module (ADM) and the Disparity Alignment Module (DAM), designed to enable real-time inference without sacrificing accuracy. The ADM leverages local features to learn adaptive weights, enabling more effective downsampling while preserving crucial structural information. The DAM employs a learnable interpolation strategy that predicts per-pixel transformation offsets, thereby mitigating spatial misalignment. Building upon these modules, we introduce ADStereo, a real-time yet accurate network that achieves highly competitive performance on multiple public benchmarks.
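As a rough illustration of the adaptive downsampling idea described above, the PyTorch sketch below predicts one logit per pixel from local features, normalizes the logits inside each non-overlapping window, and uses them as pooling weights. The class name `AdaptiveDownsample`, the single 3x3 convolution, and the softmax normalization are assumptions made for illustration; the actual ADM in this repository may be structured differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveDownsample(nn.Module):
    """Hypothetical sketch of content-adaptive downsampling (not the official ADM)."""

    def __init__(self, channels: int, stride: int = 2):
        super().__init__()
        self.stride = stride
        # Assumption: a single 3x3 conv predicts one importance logit per pixel.
        self.weight_net = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        s = self.stride
        if h % s or w % s:
            raise ValueError("height and width must be divisible by the stride")
        logits = self.weight_net(x)  # (B, 1, H, W)
        # Split features and logits into non-overlapping s x s windows.
        feats = F.unfold(x, kernel_size=s, stride=s).view(b, c, s * s, -1)
        logits = F.unfold(logits, kernel_size=s, stride=s).view(b, 1, s * s, -1)
        # Normalize the weights inside each window, then take the weighted sum.
        weights = torch.softmax(logits, dim=2)
        out = (feats * weights).sum(dim=2)  # (B, C, num_windows)
        return out.view(b, c, h // s, w // s)
```

When the convolution weights are all zero, every logit equals the bias, the window weights become uniform, and the module reduces to plain average pooling. Learned weights let it deviate from that baseline where local structure matters.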
The demo loads the pretrained KITTI model from the './fined/KITTI/' folder.
Run demo_video.py to perform stereo matching on a raw KITTI sequence.
Here is an example result on our system with an RTX a5000ada GPU on Ubuntu 20.04.
We also introduce a more lightweight model, ADStereo_fast (highly competitive performance and faster speed), which is included in this repo.
All pretrained models are available on Google Drive: ADStereo and ADStereo_fast.
We assume the downloaded weights are located under the ./trained directory.
Otherwise, you may need to change the corresponding paths in the scripts.
Python 3.9
PyTorch 2.4.0
conda create -n ADStereo python=3.9
conda activate ADStereo
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch -c nvidia
pip install opencv-python
pip install scikit-image
pip install tensorboard
pip install matplotlib
pip install tqdm
pip install chardet
pip install imageio
pip install thop
pip install timm==0.5.4
To evaluate/train ADStereo, you will need to download the required datasets.
By default datasets.py will search for the datasets in these locations.
DATA
├── KITTI
│   ├── kitti_2012
│   │   ├── training
│   │   └── testing
│   └── kitti_2015
│       ├── training
│       └── testing
├── SceneFlow
│   ├── Driving
│   │   ├── disparity
│   │   └── frames_finalpass
│   ├── FlyingThings3D
│   │   ├── disparity
│   │   └── frames_finalpass
│   └── Monkaa
│       ├── disparity
│       └── frames_finalpass
├── Middlebury
│   ├── trainingH
│   └── trainingH_GT
└── ETH3D
    ├── two_view_training
    └── two_view_training_gt
Run main.py to train on the SceneFlow dataset. Please update datapath in main.py to point to your training data.
Run finetune.py to finetune on real-world datasets such as KITTI 2012, KITTI 2015, and ETH3D. Please update datapath in finetune.py to point to your training data.
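Before starting a long training run, it can help to confirm that the dataset folders match the layout listed above. The helper below is a minimal sketch, not part of the repository: the function name `missing_dataset_dirs`, the `EXPECTED_DIRS` list, and the `DATA` root are assumptions based on that layout.

```python
from pathlib import Path

# Sub-directories taken from the layout above, relative to the dataset root.
EXPECTED_DIRS = [
    "KITTI/kitti_2012/training",
    "KITTI/kitti_2012/testing",
    "KITTI/kitti_2015/training",
    "KITTI/kitti_2015/testing",
    "SceneFlow/Driving/disparity",
    "SceneFlow/Driving/frames_finalpass",
    "SceneFlow/FlyingThings3D/disparity",
    "SceneFlow/FlyingThings3D/frames_finalpass",
    "SceneFlow/Monkaa/disparity",
    "SceneFlow/Monkaa/frames_finalpass",
    "Middlebury/trainingH",
    "Middlebury/trainingH_GT",
    "ETH3D/two_view_training",
    "ETH3D/two_view_training_gt",
]


def missing_dataset_dirs(root):
    """Return the expected sub-directories that do not exist under `root`."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    # Assumption: the datasets live under ./DATA; adjust to your own datapath.
    for d in missing_dataset_dirs("DATA"):
        print(f"missing: {d}")
```

An empty result means every expected folder is present. Datasets you do not plan to use can simply be removed from the list.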
Run counts_op.py to measure the model's FLOPs.
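As a sanity check on reported numbers, the cost of a single convolution layer can be worked out by hand. The sketch below counts multiply-accumulate operations (MACs) with the standard formula; the function name `conv2d_macs` is hypothetical and is not something counts_op.py exposes.

```python
def conv2d_macs(in_ch, out_ch, kernel, out_h, out_w, groups=1):
    """Multiply-accumulate count of one 2D convolution (bias ignored)."""
    kh, kw = (kernel, kernel) if isinstance(kernel, int) else kernel
    # Each output element needs (in_ch / groups) * kh * kw multiply-accumulates.
    return out_ch * out_h * out_w * (in_ch // groups) * kh * kw


# Example: a 3x3 conv from 3 to 32 channels producing a 256x512 output.
macs = conv2d_macs(3, 32, 3, 256, 512)
print(f"{macs / 1e9:.3f} GMACs")
```

Profilers differ in whether they report MACs or FLOPs (roughly 2 x MACs), so it is worth checking which convention a given tool uses before comparing against published figures.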
To generate predictions on the KITTI test set, run evaluate_kitti.py.
The script also prints the inference time.
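For measuring inference time independently of the script, a common pattern is to discard a few warm-up runs and to synchronize the GPU before reading the clock. The helper below is a generic sketch under those assumptions; `average_runtime` and its parameters are hypothetical names, and it does not reproduce how evaluate_kitti.py measures time.

```python
import time


def average_runtime(fn, warmup=5, runs=20, sync=None):
    """Average wall-clock seconds per call of `fn`, after `warmup` untimed calls.

    `sync` is an optional callable (e.g. torch.cuda.synchronize) invoked before
    reading the clock so that asynchronous GPU work is included in the timing.
    """
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / runs
```

With a PyTorch model on a GPU, one might call `average_runtime(lambda: model(left, right), sync=torch.cuda.synchronize)` inside a `torch.no_grad()` block, where `model`, `left`, and `right` are placeholders for your own network and inputs.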
The resulting predictions can be submitted directly to the KITTI online evaluation server for benchmarking.
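The KITTI benchmark stores disparity maps as 16-bit PNGs in which each value is the disparity multiplied by 256 and 0 marks an invalid pixel. evaluate_kitti.py presumably handles this already; the NumPy sketch below only illustrates the conversion, and the function names are hypothetical.

```python
import numpy as np


def encode_kitti_disparity(disp):
    """Convert a float disparity map (in pixels) to KITTI's uint16 representation."""
    disp = np.asarray(disp, dtype=np.float64)
    # Negative disparities are clamped to 0, which KITTI treats as invalid.
    scaled = np.round(np.clip(disp, 0.0, None) * 256.0)
    # Clamp to the uint16 range before casting to avoid wrap-around.
    return np.clip(scaled, 0, 65535).astype(np.uint16)


def decode_kitti_disparity(png):
    """Inverse mapping: uint16 PNG values back to float disparities in pixels."""
    return np.asarray(png, dtype=np.float32) / 256.0
```

The encoded array can then be written with any image library that supports 16-bit PNGs, for example `cv2.imwrite("000000_10.png", encoded)`, where the file name is a placeholder.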
If you find our work useful in your research, please consider citing our paper:
@article{wang2025ad,
author={Wang, Yun and Li, Kunhong and Wang, Longguang and Hu, Junjie and Wu, Dapeng Oliver and Guo, Yulan},
journal={IEEE Transactions on Image Processing},
title={ADStereo: Efficient Stereo Matching with Adaptive Downsampling and Disparity Alignment},
year={2025},
publisher={IEEE}
}
This project is based on GwcNet, IGEV-Stereo, and CoEx. We thank the original authors for their excellent work.




