- [2025/06/23] We add support for the occupancy prediction task
- [2025/06/23] We add support for the Waymo dataset
- [2025/01/23] We release the code and pretrained weights for the nuScenes dataset
- [2024/12/04] PolarBEVDet is on arXiv
Multi-view 3D object detection built upon the Lift-Splat-Shoot (LSS) mechanism provides an economical and deployment-friendly solution for autonomous driving. However, all existing LSS-based methods transform multi-view image features into a Cartesian Bird’s-Eye-View (BEV) representation without considering the non-uniform distribution of image information. Consequently, this leads to information loss in the near and computational redundancy in the far. Furthermore, these methods struggle to exploit view symmetry, increasing the difficulty of representation learning. In this paper, to fundamentally remove these limitations, we propose to replace the Cartesian BEV representation with the polar BEV representation, which naturally adapts to the image information distribution and effortlessly preserves view symmetry by regular convolution. To achieve this, we elaborately tailor three modules: a polar view transformer to generate the polar BEV representation, a polar temporal fusion module for fusing historical polar BEV features and a polar detection head to predict the polar-parameterized representation of the object. In addition, we design a 2D auxiliary detection head and a spatial attention enhancement module to improve the quality of feature extraction in perspective view and BEV, respectively. Finally, we integrate the above improvements into a novel multi-view 3D object detector, PolarBEVDet. Experiments on nuScenes and Waymo show that PolarBEVDet achieves superior performance, and the polar BEV representation can be seamlessly substituted into different LSS-based detectors with consistent performance improvement.
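To make the polar grid idea concrete, here is a minimal sketch of how a ground-plane point could be binned into a (radius, azimuth) cell, and how the area covered by a cell grows with distance. It is not the repository's implementation: the function names `cartesian_to_polar_cell` and `polar_cell_area`, the uniform radial spacing, and the grid sizes (`r_max`, `num_r`, `num_theta`) are all assumptions chosen for illustration.

```python
import math


def cartesian_to_polar_cell(x, y, r_max=50.0, num_r=64, num_theta=256):
    """Map a BEV point (x, y) in metres to a (radial, azimuth) cell index.

    Returns None when the point lies outside the radial range.
    The grid sizes are illustrative defaults, not the paper's settings.
    """
    r = math.hypot(x, y)
    if r >= r_max:
        return None
    theta = math.atan2(y, x) % (2.0 * math.pi)  # azimuth in [0, 2*pi)
    r_idx = int(r / r_max * num_r)
    # The trailing modulo guards against theta rounding up to exactly 2*pi.
    theta_idx = int(theta / (2.0 * math.pi) * num_theta) % num_theta
    return r_idx, theta_idx


def polar_cell_area(r_idx, r_max=50.0, num_r=64, num_theta=256):
    """Ground-plane area (m^2) of one polar cell in radial ring r_idx."""
    dr = r_max / num_r
    r_inner = r_idx * dr
    r_outer = r_inner + dr
    d_theta = 2.0 * math.pi / num_theta
    # Area of an annular sector: 0.5 * d_theta * (r_outer^2 - r_inner^2).
    return 0.5 * d_theta * (r_outer ** 2 - r_inner ** 2)


if __name__ == "__main__":
    print(cartesian_to_polar_cell(10.0, 0.0))
    print(cartesian_to_polar_cell(0.0, 10.0))
    print(polar_cell_area(0), polar_cell_area(63))
```

Under these assumptions, cells close to the ego vehicle cover far less ground than distant ones, so nearby regions are sampled densely and distant regions coarsely. A rotation about the ego vehicle also becomes a shift along the azimuth axis, which is one way to read the abstract's claim that regular convolution preserves view symmetry.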
| Setting | Pretrain | NDS | mAP | Weights |
|---|---|---|---|---|
| r50_704x256_24e | ImageNet | 53.0 | 43.2 | gdrive |
| r50_704x256_60e | ImageNet | 55.3 | 45.0 | gdrive |
| r50_704x256_nuImg_60e | nuImg | 56.7 | 46.9 | gdrive |
| Setting | Pretrain | mAPL | mAPH | mAP |
|---|---|---|---|---|
| r101_832x1920_24e | nuImg | 43.8 | 54.9 | 59.6 |
Regrettably, we are unable to provide the model weights due to the Waymo Dataset License Agreement.
| Setting | mIoU | RayIoU | Weights |
|---|---|---|---|
| r50_704x256_24e | 33.7 | 39.4 | gdrive |
Many thanks to these excellent open-source projects:
- 3D Detection: BEVDet, SOLOFusion, StreamPETR
- Codebase: MMDetection3D
If this work is helpful for your research, please consider citing the following BibTeX entry.
@article{yu2024polarbevdet,
title={PolarBEVDet: Exploring Polar Representation for Multi-View 3D Object Detection in Bird's-Eye-View},
author={Yu, Zichen and Liu, Quanli and Wang, Wei and Zhang, Liyong and Zhao, Xiaoguang},
journal={arXiv preprint arXiv:2408.16200},
year={2024}
}

