- A state-of-the-art image-to-3D model like TRELLIS often fails to reconstruct 3D objects that can stand under gravity, even when prompted with images of stable objects.
- Our method, DSO, improves the image-to-3D model via Direct Simulation Optimization, significantly increasing the likelihood that generated 3D objects can stand, both in physical simulation and in real life when 3D printed.
- The method incurs no additional cost at test time and thus generates such objects in seconds.
- Install the TRELLIS dependencies:
```
conda create -n dso python=3.10
conda activate dso
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
. ./setup.sh --basic --xformers --flash-attn --diffoctreerast --spconv --mipgaussian --kaolin --nvdiffrast
pip install kaolin==0.17.0 -f https://nvidia-kaolin.s3.us-east-2.amazonaws.com/torch-2.4.0_cu124.html
```
If you run into issues, please consult the installation guide.
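As a quick sanity check (not part of the repository, just a minimal sketch assuming the pinned versions above), you can confirm that the installed PyTorch build sees your GPU:

```python
# Minimal environment check; the expected values are assumptions based on the
# pinned install commands above, not requirements enforced by the repository.
import torch

print(torch.__version__)          # expected: 2.4.0 (+cu124)
print(torch.cuda.is_available())  # should print True on a CUDA-capable machine
```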
- Install the remaining dependencies:
```
pip install -r requirements.txt
```
We provide two checkpoints: one trained with direct preference optimization (DPO) and the other trained with the direct reward optimization (DRO) we introduce.
You can download them here:
```
git lfs install
git clone https://huggingface.co/rayli/DSO-finetuned-TRELLIS
```
You can generate physically sound 3D assets conditioned on a single image using the DSO-finetuned models with `example.py`:
```python
from peft import LoraConfig, get_peft_model
from PIL import Image
from safetensors.torch import load_file

from trellis.pipelines import TrellisImageTo3DPipeline
from trellis.utils import postprocessing_utils

ckpt_path = "./DSO-finetuned-TRELLIS/dro-4000iters.safetensors"  # or "./DSO-finetuned-TRELLIS/dpo-8000iters.safetensors"
image_path = "./image-to-3D-eval-stability-under-gravity/clock-eval/01.jpg"  # or your own image

# Load the base TRELLIS pipeline.
pipeline = TrellisImageTo3DPipeline.from_pretrained("JeffreyXiang/TRELLIS-image-large")
pipeline.cuda()

# Wrap the sparse structure flow model with LoRA adapters and load the DSO-finetuned weights.
peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["to_q", "to_kv", "to_out", "to_qkv"],
)
pipeline.models["sparse_structure_flow_model"] = get_peft_model(
    pipeline.models["sparse_structure_flow_model"], peft_config
)
pipeline.models["sparse_structure_flow_model"].load_state_dict(load_file(ckpt_path))

image = Image.open(image_path)
image = pipeline.preprocess_image(image)
outputs = pipeline.run(
    image,
    seed=0,
    preprocess_image=False,
    formats=["gaussian", "mesh"],
    sparse_structure_sampler_params={
        "steps": 12,
        "cfg_strength": 7.5,
    },
    slat_sampler_params={
        "steps": 12,
        "cfg_strength": 3,
    },
)

glb = postprocessing_utils.to_glb(
    outputs['gaussian'][0],
    outputs['mesh'][0],
    simplify=0.95,      # ratio of triangles to remove in the simplification process
    texture_size=1024,  # size of the texture used for the GLB
    with_texture=True,  # set to False to skip texturing for faster stability evaluation
)
glb.export("./output.glb")
```
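To get a feel for what "stable under gravity" means before running the full evaluation below, here is a minimal sketch (not the repository's evaluation code) that drops the exported mesh into a PyBullet simulation and checks whether it tips over; the file names, the 5-degree threshold, and the up-axis convention are assumptions:

```python
# Hedged stability-check sketch; evaluation/evaluate_stability.py is the
# authoritative implementation. Requires: pip install pybullet trimesh
import math
import pybullet as p
import trimesh

# PyBullet cannot load GLB directly, so convert the export to OBJ first.
# Depending on export conventions, you may need to rotate the mesh so +Z is up.
mesh = trimesh.load("./output.glb", force="mesh")
mesh.export("./output.obj")

p.connect(p.DIRECT)
p.setGravity(0, 0, -9.81)
p.createMultiBody(0, p.createCollisionShape(p.GEOM_PLANE))  # static ground plane

# Note: for dynamic bodies PyBullet approximates the mesh by its convex hull,
# which is only a rough proxy for the true contact geometry.
col = p.createCollisionShape(p.GEOM_MESH, fileName="./output.obj")
start_z = -mesh.bounds[0][2] + 1e-3  # rest the object just above the ground
body = p.createMultiBody(baseMass=1.0, baseCollisionShapeIndex=col,
                         basePosition=[0, 0, start_z])

for _ in range(240 * 5):  # five simulated seconds at PyBullet's default 240 Hz
    p.stepSimulation()

_, orn = p.getBasePositionAndOrientation(body)
roll, pitch, _ = p.getEulerFromQuaternion(orn)
tilt = math.degrees(max(abs(roll), abs(pitch)))
print("stable" if tilt < 5.0 else f"tipped over ({tilt:.1f} deg)")
```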
- Download the evaluation dataset:
```
git lfs install
git clone https://huggingface.co/datasets/rayli/image-to-3D-eval-stability-under-gravity
```
- Generate 3D models using a trained checkpoint:
```
python finetune.py --eval --config configs/eval-objaverse.yaml
python finetune.py --eval --config configs/eval-real.yaml
```
- Compute stability and geometry metrics:
```
python evaluation/evaluate_stability.py --mesh_paths "./samples/objaverse_samples/*.glb"
python evaluation/evaluate_stability.py --mesh_paths "./samples/motorcycle_samples/*.glb"
python evaluation/evaluate_geometry.py --mesh_paths "./samples/objaverse_samples/*.glb" --ground_truth_paths "/path/to/objaverse/hf-objaverse-v1/glbs"
```
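For intuition, the geometry metric can be thought of as a Chamfer-style distance between points sampled on the generated and ground-truth surfaces; the following is a simplified sketch (the function name and point count are assumptions, not the script's actual implementation):

```python
# Simplified Chamfer-distance sketch; see evaluation/evaluate_geometry.py for
# the actual metric. Assumes the two meshes are already aligned and normalized.
import trimesh
from scipy.spatial import cKDTree

def chamfer_distance(mesh_a: trimesh.Trimesh, mesh_b: trimesh.Trimesh,
                     n_points: int = 10_000) -> float:
    # Sample points uniformly on each surface.
    pts_a, _ = trimesh.sample.sample_surface(mesh_a, n_points)
    pts_b, _ = trimesh.sample.sample_surface(mesh_b, n_points)
    # Average nearest-neighbor distance, symmetrized over both directions.
    d_ab = cKDTree(pts_b).query(pts_a)[0].mean()
    d_ba = cKDTree(pts_a).query(pts_b)[0].mean()
    return float(d_ab + d_ba)
```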
- Generate synthetic training data. First, generate or obtain a set of images saved with the following directory structure:
```
📁 category/
|--- 📁 prompt1/
|    |--- image1.png
|    |--- image2.png
|    |--- ...
|--- 📁 prompt2/
|    |--- image1.png
|    |--- image2.png
|    |--- ...
|--- ...
```
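The `--image_paths` glob in the next command maps onto this layout as `category/prompt/image.png`; a quick way to check that your images are picked up (the root path is a placeholder):

```python
# List every image the generation script would see; the root is a placeholder.
import glob

for path in sorted(glob.glob("/path/to/the/above/dir/*/*/*.png")):
    print(path)  # e.g. .../category/prompt1/image1.png
```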
Then, generate the 3D models conditioned on these images:
```
python -m data_preprocessing.generate_synthetic_data --num_samples 4 --save_extra --output_dir /path/to/data/root --image_paths "/path/to/the/above/dir/*/*/*.png"
```
Finally, augment the 3D models with simulation feedback and save their physical soundness scores:
```
python -m data_preprocessing.augment_with_simulation_feedback --root_dir /path/to/data/root --num_models_per_image 4
```
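For intuition on how these scores feed into DPO training, here is a hedged sketch of turning per-sample soundness scores into (winner, loser) pairs; `make_preference_pairs` and the data layout are illustrative assumptions, not the repository's actual pairing code:

```python
# Illustrative only: build DPO-style preference pairs from soundness scores.
# The real pairing logic lives in the training code configured by configs/dpo.yaml.
def make_preference_pairs(samples: list[tuple[str, float]]) -> list[tuple[str, str]]:
    """samples: (sample_path, soundness_score) tuples for one conditioning image."""
    pairs = []
    for win_path, win_score in samples:
        for lose_path, lose_score in samples:
            if win_score > lose_score:  # the more stable sample is preferred
                pairs.append((win_path, lose_path))
    return pairs

# Example: four generated models for one image, scored by the simulator.
print(make_preference_pairs([("a.glb", 1.0), ("b.glb", 0.2),
                             ("c.glb", 0.9), ("d.glb", 0.0)]))
```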
- Launch the training job:
```
# This runs on 4 NVIDIA A100 80GB GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch --multi_gpu --num_processes 4 --mixed_precision bf16 finetune.py --config configs/dpo.yaml
```
Note that you need to specify the data directory in the configuration file.
```bibtex
@article{li2025dso,
  title   = {DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness},
  author  = {Li, Ruining and Zheng, Chuanxia and Rupprecht, Christian and Vedaldi, Andrea},
  journal = {arXiv preprint arXiv:2503.22677},
  year    = {2025}
}
```