Patrick Schmidt¹, Vasileios Belagiannis², and Lazaros Nalpantidis¹

- ¹ DTU Electro, Perception & Cognition for Autonomous Systems Group
- ² FAU Erlangen-Nürnberg, Machine Learning & Perception Group
Abstract: Autonomous robotic systems applied to new domains require an abundance of expensive, pixel-level dense labels to train robust semantic segmentation models under full supervision. This study proposes a model-agnostic Depth Edge Alignment Loss to improve Weakly Supervised Semantic Segmentation models across different datasets. The methodology generates pixel-level semantic labels from image-level supervision, avoiding expensive annotation processes. While weak supervision is widely explored in traditional computer vision, our approach adds supervision with pixel-level depth information, a modality commonly available on robotic systems. We demonstrate that our approach not only improves segmentation performance across datasets and models, but can also be combined with other losses for even better performance, with improvements of up to +5.439, +1.274, and +16.416 points in mean Intersection over Union on the PASCAL VOC and MS COCO validation splits and the HOPE static onboarding split, respectively.
To reproduce our experiments, please set up the repo following the instructions in setup.md. Afterwards, run the experiments_{voc,coco,bop}.sh scripts from the repo root. Note that by default, the BOP experiments use the original RealSense depth maps. If you instead want to run experiments on the pre-computed depth maps we provide, change `depth` to `depth_estimate` in line 62 of `deal/datasets/bop_dataset.py`.
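The scripts can be launched in sequence; the loop below is a dry run that just prints the three commands (remove the `echo` to actually execute them, script names as above):

```shell
# Dry run: list the experiment entry points to launch from the repo root.
for exp in voc coco bop; do
  echo "bash experiments_${exp}.sh"
done
```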
We recommend using an NVIDIA GPU with at least 40 GB of memory for the experiments. The following parameters can be modified in the scripts:
| Parameter | Default Value | Description |
|---|---|---|
| `seeds` | `(11 42 55 128)` | Seeds for the experiments. The script runs all experiments step by step in a loop, then repeats with each seed. |
| `NJOBS` | `12` | Number of threads used for parallel processing. |
| `VAL` | `1` | `1` enables the validation steps, i.e. generation of CAMs for the validation split of each dataset. Takes a long time on COCO. |
| `FT` | `1` | By default, WeakTr trains on the full train set but evaluates only on a subset of the COCO train set during training. After training completes, evaluating on the full dataset requires generating CAMs for the full train split and repeating the evaluation on them. Takes a long time; set to `0` to deactivate. |
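These flags are shell variables set near the top of each `experiments_*.sh` script; a quick-run configuration might look like the fragment below (variable names taken from the table above; their exact placement in the scripts may differ):

```shell
seeds=(11 42)   # fewer seeds for a faster sweep
NJOBS=8         # match your CPU core count
VAL=1           # still generate CAMs for the validation splits
FT=0            # skip CAM generation on the full COCO train split
```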
| Model | Dataset | Variant | mIoU Train | mIoU Val |
|---|---|---|---|---|
| WeakTr | VOC | Baseline | 63.461 | 60.635 |
| WeakTr | VOC | DEAL | 64.792 | 61.908 |
| WeakTr | VOC | ISL/FSL | 66.545 | 63.818 |
| WeakTr | VOC | DEAL + ISL/FSL | 66.876 | 64.671 |
| WeakTr | COCO | Baseline | 40.331 | 39.701 |
| WeakTr | COCO | DEAL | 40.805 | 40.312 |
| WeakTr | COCO | ISL/FSL | 41.719 | 41.258 |
| WeakTr | COCO | DEAL + ISL/FSL | 41.583 | 41.186 |
| SEAM | VOC | Baseline | 57.556 | 54.045 |
| SEAM | VOC | DEAL | 58.380 | 55.198 |
| SEAM | VOC | ISL/FSL | 61.615 | 58.296 |
| SEAM | VOC | DEAL + ISL/FSL | 62.579 | 59.391 |
| WeakTr | HOPE | Baseline | 40.104 | 18.909 |
| WeakTr | HOPE | DEAL (RealSense) | 51.306 | 33.715 |
| WeakTr | HOPE | DEAL (DA-v2b) | 50.145 | 32.066 |
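All numbers above are mean Intersection over Union in percentage points. For reference, here is a minimal NumPy sketch of the metric (the toy label maps and class count are illustrative, not tied to the repo's evaluation code):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection over Union over classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:  # class absent from both masks: skip it
            continue
        inter = np.logical_and(p, t).sum()
        ious.append(inter / union)
    return 100.0 * float(np.mean(ious))  # reported in percentage points

# Toy example: two 2x2 label maps with classes {0, 1}
pred   = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, target, num_classes=2))  # ~58.33
```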
If you find this work helpful, please cite us as follows:
```bibtex
@misc{schmidt2025deal,
      title={Depth Edge Alignment Loss: DEALing with Depth in Weakly Supervised Semantic Segmentation},
      author={Patrick Schmidt and Vasileios Belagiannis and Lazaros Nalpantidis},
      year={2025},
      eprint={2509.17702},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.17702},
}
```
