
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness

arXiv Project Page

🌟 Overview

  • A state-of-the-art image-to-3D model like TRELLIS often fails to reconstruct 3D objects that can stand under gravity, even when prompted with images of stable objects.
  • Our method, DSO, improves the image-to-3D model via Direct Simulation Optimization, significantly increasing the likelihood that generated 3D objects can stand, both in physical simulation and in real life when 3D printed.
  • The method incurs no additional cost at test time and thus generates such objects in seconds.

📦 Installation

  1. Install the TRELLIS dependencies:
conda create -n dso python=3.10
conda activate dso
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
. ./setup.sh --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast
pip install kaolin==0.17.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu124.html

If you run into issues, please consult the TRELLIS installation guide.

  2. Install the remaining dependencies:
pip install -r requirements.txt
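
After both steps, you can sanity-check the environment with a short Python snippet (a minimal sketch; the expected values follow the pinned versions in the install commands above):

# Quick sanity check of the installed environment; expected values follow
# the pinned versions above (adjust if you used a different CUDA build).
import torch

print(torch.__version__)          # expected: 2.4.0+cu124
print(torch.cuda.is_available())  # should print True on a CUDA machine
print(torch.version.cuda)         # expected: 12.4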

🤖 Pretrained Models

We provide two checkpoints: one trained with direct preference optimization (DPO) and the other with our proposed direct reward optimization (DRO).

You can download them here:

git lfs install
git clone https://huggingface.co/rayli/DSO-finetuned-TRELLIS
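
Alternatively, you can fetch the checkpoints with the huggingface_hub Python API instead of git-lfs (a minimal sketch; the repository id matches the clone command above):

# Alternative download via huggingface_hub instead of git-lfs; places the
# dpo-*.safetensors and dro-*.safetensors checkpoints under
# ./DSO-finetuned-TRELLIS, matching the paths used in the Quick Start below.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="rayli/DSO-finetuned-TRELLIS",
    local_dir="./DSO-finetuned-TRELLIS",
)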

Quick Start

You can generate physically-sound 3D assets conditioned on a single image using DSO-finetuned models with example.py:

from peft import LoraConfig, get_peft_model
from PIL import Image
from safetensors.torch import load_file

from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import postprocessing_utils

ckpt_path = "./DSO-finetuned-TRELLIS/dro-4000iters.safetensors"  # "./DSO-finetuned-TRELLIS/dpo-8000iters.safetensors"
image_path = "./image-to-3D-eval-stability-under-gravity/clock-eval/01.jpg"  # Or your own image

pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["to_q", "to_kv", "to_out", "to_qkv"]
)
pipeline.models["sparse_structure_flow_model"] = get_peft_model(pipeline.models["sparse_structure_flow_model"], peft_config)
pipeline.models["sparse_structure_flow_model"].load_state_dict(load_file(ckpt_path))

image = Image.open(image_path)
image = pipeline.preprocess_image(image)

outputs = pipeline.run(
    image,
    seed=0,
    preprocess_image=False,
    formats=["gaussian", "mesh"],
    sparse_structure_sampler_params={
        "steps": 12,
        "cfg_strength": 7.5,
    },
    slat_sampler_params={
        "steps": 12,
        "cfg_strength": 3,
    },
)[0]

glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    simplify=0.95,          # Ratio of triangles to remove in the simplification process
    texture_size=1024,      # Size of the texture used for the GLB
    with_texture=True,      # Set to False to skip texturing (faster, e.g. for stability-only evaluation)
)
glb.export("./output.glb")
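
To visually inspect a result, you can also render a turntable video of the generated Gaussians (a short sketch assuming the render_utils helpers used in the upstream TRELLIS example):

# Optional: render a turntable video for a quick visual check. This assumes
# the render_utils API from the upstream TRELLIS example is unchanged.
import imageio
from trellis.utils import render_utils

video = render_utils.render_video(outputs['gaussian'][0])['color']
imageio.mimsave("./output.mp4", video, fps=30)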

📊 Evaluation

  1. Download the evaluation dataset:
git lfs install
git clone https://huggingface.co/datasets/rayli/image-to-3D-eval-stability-under-gravity
  2. Generate 3D models using the trained checkpoints:
python finetune.py --eval --config configs/eval-objaverse.yaml
python finetune.py --eval --config configs/eval-real.yaml
  3. Compute stability and geometry metrics (a simplified geometric intuition for the stability check is sketched after this list):
python evaluation/evaluate_stability.py --mesh_paths "./samples/objaverse_samples/*.glb"
python evaluation/evaluate_stability.py --mesh_paths "./samples/motorcycle_samples/*.glb"
python evaluation/evaluate_geometry.py --mesh_paths "./samples/objaverse_samples/*.glb" --ground_truth_paths "/path/to/objaverse/hf-objaverse-v1/glbs"
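
For intuition, stability under gravity can be approximated quasi-statically: rest the mesh on its lowest plane and check whether the center of mass projects inside the convex hull of the contact points. The sketch below uses trimesh and shapely, assumes a z-up mesh, and picks an arbitrary contact tolerance; it is only an illustration, not the simulation-based check in evaluate_stability.py:

# Illustrative quasi-static stability proxy, NOT evaluate_stability.py:
# a resting object is stable if its center of mass projects inside the
# convex hull of its contact points. Assumes z is the up axis.
import numpy as np
import trimesh
from shapely.geometry import MultiPoint, Point

def is_quasi_statically_stable(mesh_path, contact_tol=1e-3):
    mesh = trimesh.load(mesh_path, force="mesh")
    z_min = mesh.vertices[:, 2].min()
    # Treat vertices within contact_tol of the lowest point as contacts.
    contacts = mesh.vertices[np.abs(mesh.vertices[:, 2] - z_min) < contact_tol]
    if len(contacts) < 3:
        return False  # fewer than 3 contacts cannot span a support polygon
    support = MultiPoint(contacts[:, :2].tolist()).convex_hull
    return support.contains(Point(mesh.center_mass[:2]))

print(is_quasi_statically_stable("./samples/objaverse_samples/example.glb"))  # hypothetical file name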

⚙️ Training

  1. Generate synthetic training data.

First, generate or obtain a set of images saved in this structure:

📁 category/
|--- 📁 prompt1/
|    |--- image1.png
|    |--- image2.png
|    |--- ...
|--- 📁 prompt2/
|    |--- image1.png
|    |--- image2.png
|    |--- ...
|--- ...

Then, generate the 3D models conditioned on these images:

python -m data_preprocessing.generate_synthetic_data --num_samples 4 --save_extra --output_dir /path/to/data/root --image_paths "/path/to/the/above/dir/*/*/*.png"

Finally, augment the 3D models with simulation feedback and save their physical soundness scores (see the illustrative sketch at the end of this section):

python -m data_preprocessing.augment_with_simulation_feedback --root_dir /path/to/data/root --num_models_per_image 4
  2. Launch the training job:
# This runs on 4 NVIDIA A100 80GB GPUs.

CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --num_processes 4 --mixed_precision bf16 finetune.py --config configs/dpo.yaml 

Note that you need to specify the data directory in the configuration file.
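
For intuition on how the simulation scores feed the DPO objective, here is a toy, hypothetical sketch of turning per-image physical soundness scores into (preferred, rejected) pairs; the function and score format are illustrative, not the repo's actual data loader:

# Hypothetical illustration only: with several models generated per image
# (--num_models_per_image 4 above), each carrying a physical soundness
# score from simulation, DPO-style training consumes (winner, loser) pairs.
# The actual data pipeline in this repo may differ.
from itertools import combinations

def build_preference_pairs(scores):
    """scores: dict mapping a generated model's path to its soundness score.
    Returns (winner, loser) pairs where the winner is strictly sounder."""
    pairs = []
    for a, b in combinations(scores, 2):
        if scores[a] != scores[b]:
            winner, loser = (a, b) if scores[a] > scores[b] else (b, a)
            pairs.append((winner, loser))
        # ties carry no preference signal and are skipped
    return pairs

# Example with 4 models generated for one image:
print(build_preference_pairs({"m0.glb": 1.0, "m1.glb": 0.0, "m2.glb": 1.0, "m3.glb": 0.2}))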

Citation

@article{li2025dso,
    title   = {DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness},
    author  = {Li, Ruining and Zheng, Chuanxia and Rupprecht, Christian and Vedaldi, Andrea},
    journal = {arXiv preprint arXiv:2503.22677},
    year    = {2025}
}

About

[ICCV 2025] DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness
