SAM4D: Zero-Training SAM3D Body + HunyuanVideo-I2V + Video-SDS → 4D Human Reconstruction

Overview

This project implements a training-free 4D human reconstruction and optimization pipeline:

Single Human Image → HunyuanVideo-I2V 2D Video Generation → SAM3D Body 3D Reconstruction → Video-SDS Motion Refinement → 4D Human Sequence

🔥 Core Innovation: Video-SDS Motion Refinement

We propose a Video-SDS (Score Distillation Sampling) strategy that refines the 4D human sequences reconstructed by SAM3D Body in an unsupervised manner, using a pre-trained video diffusion model:

  • Map 4D motion parameters to differentiable video representations
  • Push motion towards high-probability regions of diffusion models via Video-SDS gradients
  • Significantly improve physical plausibility and text alignment without 4D GT data

Outputs

  • (T, V, 3) Dynamic human mesh vertex sequences
  • (T, J, 3) 3D keypoint sequences
  • (T, J, 3) Video-SDS refined keypoint sequences
  • Physical plausibility metrics (velocity, acceleration, foot sliding, jitter, etc.)

Project Structure

sam4d/
├── sam-3d-body/              # SAM3D Body repository (requires git clone)
├── HunyuanVideo-I2V/         # HunyuanVideo-I2V repository (requires git clone)
├── data/
│   ├── input_images/         # Input images
│   ├── videos/               # Generated videos
│   ├── frames/               # Extracted video frames
│   ├── sam3d_seq/            # SAM3D outputs
│   └── sam4d_sds/            # Video-SDS optimization results
├── scripts/
│   ├── setup_env_hunyuan.sh       # HunyuanVideo-I2V environment setup
│   ├── setup_env_sam3d.sh         # SAM3D Body environment setup
│   ├── run_hunyuan_i2v.sh         # Run I2V video generation
│   ├── extract_frames.py          # Video frame extraction
│   ├── run_sam3d_on_frames.py     # SAM3D batch reconstruction
│   ├── build_4d_numpy.py          # Build 4D sequences
│   ├── eval_motion.py             # Physical evaluation
│   ├── refine_sam4d_with_sds.py   # Video-SDS motion refinement
│   ├── demo_video_sds.py          # Video-SDS demo script
│   ├── run_full_pipeline.sh       # Basic pipeline script
│   ├── run_full_pipeline_with_sds.sh  # Full pipeline (with SDS)
│   └── video_sds/                 # Video-SDS module
│       ├── __init__.py
│       ├── config.py              # Configuration
│       ├── params.py              # Optimizable parameters
│       ├── renderer.py            # Differentiable renderer
│       ├── diffusion_wrapper.py   # Diffusion model wrapper
│       ├── losses.py              # Loss functions
│       └── optimizer.py           # SDS optimizer
├── requirements.txt
└── README.md

Installation

This project uses two separate conda environments that communicate via files.

1. HunyuanVideo-I2V Environment

cd sam4d
bash scripts/setup_env_hunyuan.sh

Or install manually:

conda create -n hunyuan_i2v python=3.11.9
conda activate hunyuan_i2v

git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V.git
cd HunyuanVideo-I2V

# PyTorch + CUDA
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia

# Other dependencies
pip install -r requirements.txt
pip install ninja
pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
pip install xfuser==0.4.0

2. SAM3D Body Environment

cd sam4d
bash scripts/setup_env_sam3d.sh

Or install manually:

conda create -n sam3d python=3.10
conda activate sam3d

git clone https://github.com/facebookresearch/sam-3d-body.git
cd sam-3d-body

pip install -r requirements.txt
pip install opencv-python

HuggingFace Mirror Setup

Set environment variables before downloading models:

export HF_ENDPOINT=https://hf-mirror.com
export HF_TOKEN=your_hf_token_here
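If you trigger downloads from a Python script rather than the shell, the same variables must be set before `huggingface_hub`/`transformers` are imported, since the endpoint is typically read at import time. A minimal sketch (the token value is a placeholder):

```python
import os

# Must run before importing huggingface_hub or transformers,
# which read HF_ENDPOINT when first imported.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
os.environ["HF_TOKEN"] = "your_hf_token_here"  # placeholder, not a real token
```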

Usage

Method 1: Step-by-Step Execution

Step 1: Generate Video (hunyuan_i2v environment)

conda activate hunyuan_i2v
bash scripts/run_hunyuan_i2v.sh \
    data/input_images/person.jpg \
    "An Asian man in black clothes slowly walks forward, realistic, stable motion." \
    data/videos

# Rename the generated video
mv data/videos/xxx.mp4 data/videos/person_walk.mp4

Steps 2-5: Reconstruction and Evaluation (sam3d environment)

conda activate sam3d

# Extract video frames
python scripts/extract_frames.py \
    --video data/videos/person_walk.mp4 \
    --output data/frames/person_walk

# SAM3D 3D reconstruction
python scripts/run_sam3d_on_frames.py \
    --frames data/frames/person_walk \
    --output data/sam3d_seq/person_walk

# Build 4D sequence
python scripts/build_4d_numpy.py \
    --input data/sam3d_seq/person_walk \
    --output data/

# Physical evaluation + smoothing
python scripts/eval_motion.py \
    --kpts data/sam4d_keypoints3d.npy \
    --verts data/sam4d_vertices.npy \
    --fps 25 \
    --smooth
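The `--smooth` flag applies temporal smoothing to the keypoints. As a rough illustration of this kind of filter, here is a pure-numpy moving average over the frame axis (a sketch; the filter actually used by `eval_motion.py` may differ):

```python
import numpy as np

def smooth_keypoints(kpts, window=5):
    """Moving-average smoothing over the frame axis.

    kpts   : (T, J, 3) keypoint sequence
    window : odd number of frames to average over
    """
    assert window % 2 == 1, "window must be odd"
    pad = window // 2
    # Edge-pad in time so the output keeps T frames.
    padded = np.pad(kpts, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    out = np.empty(kpts.shape, dtype=float)
    for j in range(kpts.shape[1]):
        for d in range(kpts.shape[2]):
            out[:, j, d] = np.convolve(padded[:, j, d], kernel, mode="valid")
    return out
```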

Method 2: Using Full Pipeline Script

# First generate video in hunyuan_i2v environment, then switch to sam3d environment
conda activate sam3d
bash scripts/run_full_pipeline.sh \
    data/input_images/person.jpg \
    "A man walking forward" \
    person_walk

Output Files

| File | Shape | Description |
| --- | --- | --- |
| sam4d_vertices.npy | (T, V, 3) | Dynamic human mesh vertices; T = frames, V = vertices |
| sam4d_keypoints3d.npy | (T, J, 3) | 3D keypoints; J = joints |
| sam4d_keypoints2d.npy | (T, J, 2) | 2D keypoint projections |
| sam4d_params.npz | – | SMPL pose/shape parameters |
| sam4d_faces.npy | (F, 3) | Mesh triangle face indices |
| sam4d_keypoints3d_eval.json | – | Physical evaluation results |
| sam4d_keypoints3d_smooth.npy | (T, J, 3) | Smoothed keypoints |
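After a run, the expected ranks and trailing axes can be verified in a few lines of numpy. This is a sanity-check sketch: V, J, and F depend on the body model, so only the rank and the last axis are checked.

```python
import numpy as np

# (expected ndim, expected size of last axis) per output file
EXPECTED = {
    "sam4d_vertices.npy": (3, 3),     # (T, V, 3)
    "sam4d_keypoints3d.npy": (3, 3),  # (T, J, 3)
    "sam4d_keypoints2d.npy": (3, 2),  # (T, J, 2)
    "sam4d_faces.npy": (2, 3),        # (F, 3)
}

def check_outputs(arrays):
    """arrays: dict mapping filename -> np.ndarray,
    e.g. {name: np.load(f"data/{name}") for name in EXPECTED}."""
    for name, (ndim, last) in EXPECTED.items():
        a = arrays[name]
        assert a.ndim == ndim and a.shape[-1] == last, \
            f"{name}: unexpected shape {a.shape}"
```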

Physical Evaluation Metrics

eval_motion.py provides the following evaluation metrics:

  1. Velocity/Acceleration Statistics - Mean velocity, max velocity, mean acceleration
  2. Jitter - Rate of change of acceleration (jerk); lower is smoother
  3. Foot Sliding Score - Foot motion during ground contact; lower is more physically plausible
  4. Penetration Score - Degree of mesh self-penetration
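The velocity, acceleration, and jitter statistics follow from finite differences of the keypoint sequence. A minimal numpy sketch under assumed definitions (`eval_motion.py` may define or normalize these differently):

```python
import numpy as np

def motion_metrics(kpts, fps=25):
    """Finite-difference motion statistics.

    kpts : (T, J, 3) keypoint sequence, positions in meters
    fps  : frame rate used to convert frame deltas to seconds
    """
    dt = 1.0 / fps
    vel = np.diff(kpts, axis=0) / dt   # (T-1, J, 3)
    acc = np.diff(vel, axis=0) / dt    # (T-2, J, 3)
    jerk = np.diff(acc, axis=0) / dt   # (T-3, J, 3), basis of the jitter score
    speed = np.linalg.norm(vel, axis=-1)
    return {
        "mean_velocity": float(speed.mean()),
        "max_velocity": float(speed.max()),
        "mean_acceleration": float(np.linalg.norm(acc, axis=-1).mean()),
        "jitter": float(np.linalg.norm(jerk, axis=-1).mean()),
    }
```

For a perfectly constant-velocity motion, the acceleration and jitter terms vanish, which is why lower values indicate smoother, more plausible motion.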

License

This project is for research purposes only. Please comply with the licenses of all dependencies.
