conda create -n pvsm python=3.11
conda activate pvsm
# Install torch torchvision based on your environment configurations
pip install -r requirements.txt

There is a known issue with the current release of gsplat==1.5.3, so please install gsplat from source for now:
# Install gsplat from source
pip install git+https://github.com/nerfstudio-project/gsplat.git
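Optionally, you can sanity-check the environment before moving on. The snippet below is only a convenience sketch, not part of the official setup; it assumes gsplat exposes a __version__ attribute.

# check_env.py -- optional environment sanity check (not part of the official setup)
import torch
import gsplat  # should import cleanly after the source install

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("gsplat:", gsplat.__version__)  # assumes gsplat exposes __version__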
Download DINOv3-ViT-B and place it under metric_checkpoints/.
Download our pre-trained model checkpoints.
After downloading, organize your checkpoints directory as follows:
metric_checkpoints/
├── pvsm_finetuned_full.pt # Our trained full 24-layer model
├── pvsm_finetuned_small.pt # Our trained smaller 12-layer model
├── dinov3-vitb16-pretrain-lvd1689m # DINOv3 Checkpoint
│ ├── config.json
│ ├── LICENSE.md
│ ├── model.safetensors
│ ├── preprocessor_config.json
│ └── README.md
├── imagenet-vgg-verydeep-19.mat # (Optional) for training
└── map-anything # (Optional) for dataset generation
    ├── config.json
    ├── model.safetensors
    └── README.md
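The dinov3-vitb16-pretrain-lvd1689m folder looks like a Hugging Face checkpoint (config.json, model.safetensors, preprocessor_config.json), so a quick local-load check along the following lines should work, assuming a transformers version recent enough to support DINOv3. This is only an illustrative sketch, not part of the official pipeline.

# check_dinov3.py -- illustrative sanity check for the local DINOv3 checkpoint
from transformers import AutoImageProcessor, AutoModel

ckpt_dir = "metric_checkpoints/dinov3-vitb16-pretrain-lvd1689m"
processor = AutoImageProcessor.from_pretrained(ckpt_dir)
model = AutoModel.from_pretrained(ckpt_dir)
print(model.config.model_type, "loaded with",
      sum(p.numel() for p in model.parameters()), "parameters")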
For a quick interactive demo, please follow the instructions and unzip the downloaded example data (22.3 MB) to your local machine.
To launch the interactive web-based demo:
torchrun --nproc_per_node 1 --standalone viser_demo.py --config-name runs/pvsm_finetuned_small

The demo will start a web server. Open your browser and navigate to the displayed URL to interact with the model.
System Requirements:
- Small model: ~2.5GB VRAM
- Full model: ~3.0GB VRAM
Note: the renderings shown in the demo viewer are compressed, so the displayed quality is lower than the raw gsplat output.
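If you want to check whether your GPU meets these requirements, a quick one-off query like the following works; it is just a convenience sketch, not part of the repository.

# check_vram.py -- optional check that the GPU has enough memory for the demo
import torch

total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU 0: {total_gib:.1f} GiB total VRAM "
      "(small model needs ~2.5 GB, full model ~3.0 GB)")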
To run inference on a dataset:
python inference.py --config-name runs/pvsm_finetuned_small

Or for the full model:
python inference.py --config-name runs/pvsm_finetuned_full

To train the model:
torchrun --nproc_per_node <num_gpus> train.py --config-name runs/pvsm_finetuned_small

Configuration:
- Training configurations are located in configs/runs/
- Model configurations are in configs/model/
- Dataset configurations are in configs/dataset/
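The --config-name flag suggests Hydra/OmegaConf-style YAML configs. If that is the case, a run config can be inspected offline with a small script like the one below; the exact file name used here is an assumption, not a documented path.

# inspect_config.py -- sketch for browsing a run config, assuming the configs
# are OmegaConf/Hydra-style YAML (as the --config-name flag suggests); the
# file name below is an assumption.
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/runs/pvsm_finetuned_small.yaml")
print(OmegaConf.to_yaml(cfg))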
API Keys:
Before training, create configs/api_keys.yaml with your WandB API key:
wandb: YOUR_WANDB_KEY

You can use configs/api_keys_example.yaml as a template.
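The training script presumably reads this file itself; for reference, here is a minimal sketch of how the key could be consumed, assuming PyYAML and the wandb client are installed.

# wandb_login.py -- hypothetical illustration of consuming configs/api_keys.yaml
import yaml
import wandb

with open("configs/api_keys.yaml") as f:
    keys = yaml.safe_load(f)

wandb.login(key=keys["wandb"])  # assumes a top-level "wandb" entry, as in the template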
If you find this work useful in your research, please consider citing:
@article{wu_pvsm_2026,
title={From Rays to Projections: Better Inputs for Feed-Forward View Synthesis},
author={Wu, Zirui and Jiang, Zeren and Oswald, Martin R. and Song, Jie},
journal={arXiv preprint arXiv:2601.05116},
year={2026}
}

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. See LICENSE.md for details.
This work is built upon LVSM's code base.
