This project implements a training-free 4D human reconstruction and optimization pipeline:
Single Human Image → HunyuanVideo-I2V 2D Video Generation → SAM3D Body 3D Reconstruction → Video-SDS Motion Refinement → 4D Human Sequence
We propose a Video-SDS (Score Distillation Sampling) strategy that refines 4D human sequences reconstructed by SAM3D using pre-trained video diffusion models, in an unsupervised manner (see the sketch after this list):
- Map 4D motion parameters to differentiable video representations
- Push motion towards high-probability regions of the video diffusion model via Video-SDS gradients
- Significantly improve physical plausibility and text alignment without 4D ground-truth data
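At its core, each refinement step is a standard SDS update applied to a rendered video. The following is a minimal, illustrative sketch of one such step; `render_video`, `encode_latents`, `denoiser`, and `text_emb` are placeholders standing in for the components under `scripts/video_sds/` (`renderer.py`, `diffusion_wrapper.py`), not their actual APIs:

```python
import torch

def video_sds_step(motion_params, render_video, encode_latents, denoiser,
                   text_emb, alphas_cumprod, optimizer, t_min=200, t_max=800):
    """One Video-SDS update: nudge rendered motion toward the video prior."""
    video = render_video(motion_params)              # differentiable render, e.g. (T, 3, H, W)
    z = encode_latents(video)                        # video latents; gradient must flow through
    t = torch.randint(t_min, t_max, (1,), device=z.device)
    eps = torch.randn_like(z)
    a_t = alphas_cumprod[t]                          # noise schedule value in (0, 1)
    z_t = a_t.sqrt() * z + (1.0 - a_t).sqrt() * eps  # forward-diffuse the latents
    with torch.no_grad():                            # the diffusion model stays frozen
        eps_hat = denoiser(z_t, t, text_emb)         # text-conditioned noise prediction
    w = 1.0 - a_t                                    # a common SDS weighting choice
    grad = w * (eps_hat - eps)                       # SDS gradient w.r.t. z
    loss = (grad.detach() * z).sum()                 # surrogate loss: d(loss)/dz == grad
    optimizer.zero_grad()
    loss.backward()                                  # backprop through renderer into motion params
    optimizer.step()
    return loss.item()
```

In the actual module, `optimizer.py` and `losses.py` presumably combine this SDS term with additional regularizers; the term above is what pulls the motion toward high-probability regions of the video prior.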
Pipeline outputs:
- (T, V, 3) dynamic human mesh vertex sequences
- (T, J, 3) 3D keypoint sequences
- (T, J, 3) Video-SDS refined keypoint sequences
- Physical plausibility metrics (velocity, acceleration, foot sliding, jitter, etc.)
sam4d/
├── sam-3d-body/ # SAM3D Body repository (requires git clone)
├── HunyuanVideo-I2V/ # HunyuanVideo-I2V repository (requires git clone)
├── data/
│ ├── input_images/ # Input images
│ ├── videos/ # Generated videos
│ ├── frames/ # Extracted video frames
│ ├── sam3d_seq/ # SAM3D outputs
│ └── sam4d_sds/ # Video-SDS optimization results
├── scripts/
│ ├── setup_env_hunyuan.sh # HunyuanVideo-I2V environment setup
│ ├── setup_env_sam3d.sh # SAM3D Body environment setup
│ ├── run_hunyuan_i2v.sh # Run I2V video generation
│ ├── extract_frames.py # Video frame extraction
│ ├── run_sam3d_on_frames.py # SAM3D batch reconstruction
│ ├── build_4d_numpy.py # Build 4D sequences
│ ├── eval_motion.py # Physical evaluation
│ ├── refine_sam4d_with_sds.py # Video-SDS motion refinement
│ ├── demo_video_sds.py # Video-SDS demo script
│ ├── run_full_pipeline.sh # Basic pipeline script
│ ├── run_full_pipeline_with_sds.sh # Full pipeline (with SDS)
│ └── video_sds/ # Video-SDS module
│ ├── __init__.py
│ ├── config.py # Configuration
│ ├── params.py # Optimizable parameters
│ ├── renderer.py # Differentiable renderer
│ ├── diffusion_wrapper.py # Diffusion model wrapper
│ ├── losses.py # Loss functions
│ └── optimizer.py # SDS optimizer
├── requirements.txt
└── README.md
This project uses two separate conda environments that communicate via files.
cd sam4d
bash scripts/setup_env_hunyuan.sh

Or install manually:
conda create -n hunyuan_i2v python=3.11.9
conda activate hunyuan_i2v
git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V.git
cd HunyuanVideo-I2V
# PyTorch + CUDA
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Other dependencies
pip install -r requirements.txt
pip install ninja
pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
pip install xfuser==0.4.0

cd sam4d
bash scripts/setup_env_sam3d.sh

Or install manually:
conda create -n sam3d python=3.10
conda activate sam3d
git clone https://github.com/facebookresearch/sam-3d-body.git
cd sam-3d-body
pip install -r requirements.txt
pip install opencv-python

Set environment variables before downloading models:
export HF_ENDPOINT=https://hf-mirror.com
export HF_TOKEN=your_hf_token_here

conda activate hunyuan_i2v
bash scripts/run_hunyuan_i2v.sh \
data/input_images/person.jpg \
"An Asian man in black clothes slowly walks forward, realistic, stable motion." \
data/videos
# Rename the generated video
mv data/videos/xxx.mp4 data/videos/person_walk.mp4

conda activate sam3d
# Extract video frames
python scripts/extract_frames.py \
--video data/videos/person_walk.mp4 \
--output data/frames/person_walk
# SAM3D 3D reconstruction
python scripts/run_sam3d_on_frames.py \
--frames data/frames/person_walk \
--output data/sam3d_seq/person_walk
# Build 4D sequence
python scripts/build_4d_numpy.py \
--input data/sam3d_seq/person_walk \
--output data/
# Physical evaluation + smoothing
python scripts/eval_motion.py \
--kpts data/sam4d_keypoints3d.npy \
--verts data/sam4d_vertices.npy \
--fps 25 \
--smooth

# First generate the video in the hunyuan_i2v environment, then switch to the sam3d environment
conda activate sam3d
bash scripts/run_full_pipeline.sh \
data/input_images/person.jpg \
"A man walking forward" \
person_walk

| File | Shape | Description |
|---|---|---|
| `sam4d_vertices.npy` | (T, V, 3) | Dynamic human mesh vertices, T = frames, V = vertices |
| `sam4d_keypoints3d.npy` | (T, J, 3) | 3D keypoints, J = joints |
| `sam4d_keypoints2d.npy` | (T, J, 2) | 2D keypoint projections |
| `sam4d_params.npz` | - | SMPL pose/shape parameters |
| `sam4d_faces.npy` | (F, 3) | Mesh triangle face indices |
| `sam4d_keypoints3d_eval.json` | - | Physical evaluation results |
| `sam4d_keypoints3d_smooth.npy` | (T, J, 3) | Smoothed keypoints |
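For example, the arrays can be sanity-checked from Python (paths assume `--output data/` as in the commands above):

```python
import numpy as np

verts = np.load("data/sam4d_vertices.npy")      # (T, V, 3) mesh vertex sequence
kpts = np.load("data/sam4d_keypoints3d.npy")    # (T, J, 3) 3D keypoints
faces = np.load("data/sam4d_faces.npy")         # (F, 3) triangle indices
params = np.load("data/sam4d_params.npz")       # SMPL pose/shape parameters
assert verts.shape[0] == kpts.shape[0]          # same number of frames T
print(verts.shape, kpts.shape, faces.shape, list(params.keys()))
```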
`eval_motion.py` provides the following evaluation metrics (an illustrative computation follows the list):
- Velocity/Acceleration Statistics - Mean velocity, max velocity, mean acceleration
- Jitter - Rate of acceleration change, lower is smoother
- Foot Sliding Score - Foot sliding metric, lower is more physically plausible
- Penetration Score - Mesh self-penetration degree
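The sketch below shows one plausible way to compute several of these quantities from the (T, J, 3) keypoint array. It is illustrative only; the joint indices, up-axis, ground threshold, and exact formulas in `eval_motion.py` may differ:

```python
import numpy as np

def motion_stats(kpts, fps=25.0):
    """Velocity/acceleration statistics and jitter from (T, J, 3) keypoints."""
    dt = 1.0 / fps
    vel = np.diff(kpts, axis=0) / dt          # (T-1, J, 3) per-joint velocity
    acc = np.diff(vel, axis=0) / dt           # (T-2, J, 3) acceleration
    jerk = np.diff(acc, axis=0) / dt          # (T-3, J, 3) rate of acceleration change
    speed = np.linalg.norm(vel, axis=-1)
    return {
        "mean_velocity": float(speed.mean()),
        "max_velocity": float(speed.max()),
        "mean_acceleration": float(np.linalg.norm(acc, axis=-1).mean()),
        "jitter": float(np.linalg.norm(jerk, axis=-1).mean()),  # lower = smoother
    }

def foot_sliding(kpts, foot_joints, ground_eps=0.02, fps=25.0):
    """Mean horizontal foot speed on frames where the foot is near the ground.

    Assumes y is the up-axis and takes the lowest foot height as the floor;
    foot_joints is the list of ankle/toe joint indices for the skeleton used.
    """
    feet = kpts[:, foot_joints]                    # (T, F, 3)
    ground = feet[..., 1].min()
    contact = feet[..., 1] - ground < ground_eps   # (T, F) contact mask
    vel_xz = np.diff(feet[:, :, [0, 2]], axis=0) * fps
    slide = np.linalg.norm(vel_xz, axis=-1)        # (T-1, F) horizontal speed
    mask = contact[:-1] & contact[1:]              # foot in contact across the step
    return float(slide[mask].mean()) if mask.any() else 0.0
```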
This project is for research purposes only. Please comply with the licenses of all dependencies.