This project implements a training-free 4D human reconstruction and optimization pipeline:
Single Human Image → HunyuanVideo-I2V 2D Video Generation → SAM3D Body 3D Reconstruction → Video-SDS Motion Refinement → 4D Human Sequence
We propose a Video-SDS (Score Distillation Sampling) strategy that refines 4D human sequences reconstructed by SAM3D using pre-trained video diffusion models, in an unsupervised manner (see the sketch after this list):
- Map 4D motion parameters to differentiable video representations
- Push motion towards high-probability regions of the video diffusion model via Video-SDS gradients
- Significantly improve physical plausibility and text alignment without 4D ground-truth data
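At its core, each refinement step is a standard SDS update applied to a rendered video. The following is a minimal, illustrative sketch of one such step; `render_video`, `encode_latents`, `denoiser`, and `text_emb` are placeholders standing in for the components under `scripts/video_sds/` (`renderer.py`, `diffusion_wrapper.py`), not their actual APIs:

```python
import torch

def video_sds_step(motion_params, render_video, encode_latents, denoiser,
                   text_emb, alphas_cumprod, optimizer, t_min=200, t_max=800):
    """One Video-SDS update: nudge rendered motion toward the video prior."""
    video = render_video(motion_params)              # differentiable render, e.g. (T, 3, H, W)
    z = encode_latents(video)                        # video latents; gradient must flow through
    t = torch.randint(t_min, t_max, (1,), device=z.device)
    eps = torch.randn_like(z)
    a_t = alphas_cumprod[t]                          # noise schedule value in (0, 1)
    z_t = a_t.sqrt() * z + (1.0 - a_t).sqrt() * eps  # forward-diffuse the latents
    with torch.no_grad():                            # the diffusion model stays frozen
        eps_hat = denoiser(z_t, t, text_emb)         # text-conditioned noise prediction
    w = 1.0 - a_t                                    # a common SDS weighting choice
    grad = w * (eps_hat - eps)                       # SDS gradient w.r.t. z
    loss = (grad.detach() * z).sum()                 # surrogate loss: d(loss)/dz == grad
    optimizer.zero_grad()
    loss.backward()                                  # backprop through renderer into motion params
    optimizer.step()
    return loss.item()
```

In the actual module, `optimizer.py` and `losses.py` presumably combine this SDS term with additional regularizers; the term above is what pulls the motion toward high-probability regions of the video prior.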
Pipeline outputs:
- (T, V, 3) dynamic human mesh vertex sequences
- (T, J, 3) 3D keypoint sequences
- (T, J, 3) Video-SDS refined keypoint sequences
- Physical plausibility metrics (velocity, acceleration, foot sliding, jitter, etc.)
sam4d/
├── sam-3d-body/ # SAM3D Body repository (requires git clone)
├── HunyuanVideo-I2V/ # HunyuanVideo-I2V repository (requires git clone)
├── data/
│ ├── input_images/ # Input images
│ ├── videos/ # Generated videos
│ ├── frames/ # Extracted video frames
│ ├── sam3d_seq/ # SAM3D outputs
│ └── sam4d_sds/ # Video-SDS optimization results
├── scripts/
│ ├── setup_env_hunyuan.sh # HunyuanVideo-I2V environment setup
│ ├── setup_env_sam3d.sh # SAM3D Body environment setup
│ ├── run_hunyuan_i2v.sh # Run I2V video generation
│ ├── extract_frames.py # Video frame extraction
│ ├── run_sam3d_on_frames.py # SAM3D batch reconstruction
│ ├── build_4d_numpy.py # Build 4D sequences
│ ├── eval_motion.py # Physical evaluation
│ ├── refine_sam4d_with_sds.py # Video-SDS motion refinement
│ ├── demo_video_sds.py # Video-SDS demo script
│ ├── run_full_pipeline.sh # Basic pipeline script
│ ├── run_full_pipeline_with_sds.sh # Full pipeline (with SDS)
│ └── video_sds/ # Video-SDS module
│ ├── __init__.py
│ ├── config.py # Configuration
│ ├── params.py # Optimizable parameters
│ ├── renderer.py # Differentiable renderer
│ ├── diffusion_wrapper.py # Diffusion model wrapper
│ ├── losses.py # Loss functions
│ └── optimizer.py # SDS optimizer
├── requirements.txt
└── README.md
This project uses two separate conda environments that communicate via files.
cd sam4d
bash scripts/setup_env_hunyuan.sh

Or install manually:
conda create -n hunyuan_i2v python=3.11.9
conda activate hunyuan_i2v
git clone https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V.git
cd HunyuanVideo-I2V
# PyTorch + CUDA
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# Other dependencies
pip install -r requirements.txt
pip install ninja
pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
pip install xfuser==0.4.0

cd sam4d
bash scripts/setup_env_sam3d.sh

Or install manually:
conda create -n sam3d python=3.10
conda activate sam3d
git clone https://github.com/facebookresearch/sam-3d-body.git
cd sam-3d-body
pip install -r requirements.txt
pip install opencv-python

Set environment variables before downloading models:
export HF_ENDPOINT=https://hf-mirror.com
export HF_TOKEN=your_hf_token_here

conda activate hunyuan_i2v
bash scripts/run_hunyuan_i2v.sh \
data/input_images/person.jpg \
"An Asian man in black clothes slowly walks forward, realistic, stable motion." \
data/videos
# Rename the generated video
mv data/videos/xxx.mp4 data/videos/person_walk.mp4

conda activate sam3d
# Extract video frames
python scripts/extract_frames.py \
--video data/videos/person_walk.mp4 \
--output data/frames/person_walk
# SAM3D 3D reconstruction
python scripts/run_sam3d_on_frames.py \
--frames data/frames/person_walk \
--output data/sam3d_seq/person_walk
# Build 4D sequence
python scripts/build_4d_numpy.py \
--input data/sam3d_seq/person_walk \
--output data/
# Physical evaluation + smoothing
python scripts/eval_motion.py \
--kpts data/sam4d_keypoints3d.npy \
--verts data/sam4d_vertices.npy \
--fps 25 \
--smooth

# First generate the video in the hunyuan_i2v environment, then switch to the sam3d environment
conda activate sam3d
bash scripts/run_full_pipeline.sh \
data/input_images/person.jpg \
"A man walking forward" \
person_walk

| File | Shape | Description |
|---|---|---|
| `sam4d_vertices.npy` | (T, V, 3) | Dynamic human mesh vertices, T = frames, V = vertices |
| `sam4d_keypoints3d.npy` | (T, J, 3) | 3D keypoints, J = joints |
| `sam4d_keypoints2d.npy` | (T, J, 2) | 2D keypoint projections |
| `sam4d_params.npz` | - | SMPL pose/shape parameters |
| `sam4d_faces.npy` | (F, 3) | Mesh triangle face indices |
| `sam4d_keypoints3d_eval.json` | - | Physical evaluation results |
| `sam4d_keypoints3d_smooth.npy` | (T, J, 3) | Smoothed keypoints |
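For example, the arrays can be sanity-checked from Python (paths assume `--output data/` as in the commands above):

```python
import numpy as np

verts = np.load("data/sam4d_vertices.npy")      # (T, V, 3) mesh vertex sequence
kpts = np.load("data/sam4d_keypoints3d.npy")    # (T, J, 3) 3D keypoints
faces = np.load("data/sam4d_faces.npy")         # (F, 3) triangle indices
params = np.load("data/sam4d_params.npz")       # SMPL pose/shape parameters
assert verts.shape[0] == kpts.shape[0]          # same number of frames T
print(verts.shape, kpts.shape, faces.shape, list(params.keys()))
```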
`eval_motion.py` provides the following evaluation metrics (an illustrative computation follows the list):
- Velocity/Acceleration Statistics - Mean velocity, max velocity, mean acceleration
- Jitter - Rate of acceleration change, lower is smoother
- Foot Sliding Score - Foot sliding metric, lower is more physically plausible
- Penetration Score - Mesh self-penetration degree
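The sketch below shows one plausible way to compute several of these quantities from the (T, J, 3) keypoint array. It is illustrative only; the joint indices, up-axis, ground threshold, and exact formulas in `eval_motion.py` may differ:

```python
import numpy as np

def motion_stats(kpts, fps=25.0):
    """Velocity/acceleration statistics and jitter from (T, J, 3) keypoints."""
    dt = 1.0 / fps
    vel = np.diff(kpts, axis=0) / dt          # (T-1, J, 3) per-joint velocity
    acc = np.diff(vel, axis=0) / dt           # (T-2, J, 3) acceleration
    jerk = np.diff(acc, axis=0) / dt          # (T-3, J, 3) rate of acceleration change
    speed = np.linalg.norm(vel, axis=-1)
    return {
        "mean_velocity": float(speed.mean()),
        "max_velocity": float(speed.max()),
        "mean_acceleration": float(np.linalg.norm(acc, axis=-1).mean()),
        "jitter": float(np.linalg.norm(jerk, axis=-1).mean()),  # lower = smoother
    }

def foot_sliding(kpts, foot_joints, ground_eps=0.02, fps=25.0):
    """Mean horizontal foot speed on frames where the foot is near the ground.

    Assumes y is the up-axis and takes the lowest foot height as the floor;
    foot_joints is the list of ankle/toe joint indices for the skeleton used.
    """
    feet = kpts[:, foot_joints]                    # (T, F, 3)
    ground = feet[..., 1].min()
    contact = feet[..., 1] - ground < ground_eps   # (T, F) contact mask
    vel_xz = np.diff(feet[:, :, [0, 2]], axis=0) * fps
    slide = np.linalg.norm(vel_xz, axis=-1)        # (T-1, F) horizontal speed
    mask = contact[:-1] & contact[1:]              # foot in contact across the step
    return float(slide[mask].mean()) if mask.any() else 0.0
```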
This project is for research purposes only. Please comply with the licenses of all dependencies.