LayerPeeler is a framework for layer-wise image vectorization that decomposes images into structured vector representations. The system uses a vision-language model (VLM) to analyze an input image and construct a hierarchical layer graph by identifying topmost visual elements. These detected elements are then passed to a fine-tuned image diffusion model, which generates clean background images with the specified elements removed. This process enables precise, layer-by-layer vectorization.
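Conceptually, the pipeline repeats a detect-remove-vectorize cycle until no foreground elements remain. The sketch below is only an illustration of that loop; the helper functions are hypothetical placeholders passed in as arguments, not part of the repository's actual API.

```python
# High-level sketch of the layer-peeling loop. All callables are hypothetical
# placeholders for the VLM, diffusion-model, and vectorization components.
def peel_image(image, analyze_top_layers, remove_elements,
               vectorize_difference, merge_svg_layers, max_layers=10):
    svg_layers = []
    for _ in range(max_layers):
        # 1. VLM builds a layer graph and reports the topmost visual elements.
        elements = analyze_top_layers(image)
        if not elements:
            break
        # 2. Fine-tuned diffusion model repaints the image with those elements removed.
        peeled = remove_elements(image, elements)
        # 3. The difference between the two images isolates the peeled layer,
        #    which is vectorized and stored.
        svg_layers.append(vectorize_difference(image, peeled))
        image = peeled
    # 4. Stack the per-layer SVGs into a single layered SVG.
    return merge_svg_layers(svg_layers)
```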
- [2025.12.11]: Dataset Released
- [2025.12.11]: Training Code Released
- [2025.12.14]: Inference Code Released
- [2025.12.29]: Vectorization Code Released
```bash
conda create -n layerpeeler python=3.10
conda activate layerpeeler
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
```

As described in the paper, we use a pretrained LoRA model trained on the SeedEdit dataset. Before training or inference, you must merge this LoRA checkpoint with the base model.
```bash
python merge_pretrain.py
```

> [!TIP]
> You can update the path to the pretrained model inside merge_pretrain.py if needed.
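For reference, fusing a LoRA checkpoint into a base model with diffusers typically looks like the sketch below. The model IDs and paths are placeholders, not the values used by merge_pretrain.py; consult the script for the actual configuration.

```python
# Minimal LoRA-merge sketch with diffusers; all paths/IDs below are placeholders.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "path/to/base-model",                        # placeholder: base diffusion model
    torch_dtype=torch.bfloat16,
)
pipe.load_lora_weights("path/to/seededit-lora")  # placeholder: pretrained LoRA checkpoint
pipe.fuse_lora()                                 # bake the LoRA deltas into the base weights
pipe.unload_lora_weights()                       # drop the now-redundant adapter
pipe.save_pretrained("path/to/merged-model")     # placeholder: merged checkpoint location
```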
The inference code is located in the inference directory.
> [!CAUTION]
> Gemini is not accessible in Hong Kong. Therefore, we use a third-party API from V-API to forward requests to Gemini. If you are in a region with direct access to Gemini, you can modify the `BASE_URL` (line 26) in `inference/utils/vlm_util.py` to use the official Gemini API. Additional changes may be required. We apologize for the inconvenience.
Paste your Gemini API key into the `.env` file:

```
GEMINI_API_KEY=<your_key>
```

We provide a test set containing 250 images (with corresponding ground-truth SVG files) in the `data` directory. The inference script processes all images in this set.
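The API keys in this README are read from `.env` at runtime. If you are adapting the scripts, loading a key with python-dotenv looks like the following minimal sketch (an assumption about the common pattern, not a transcript of the repository's code):

```python
# Minimal sketch: read the Gemini API key from a local .env file.
import os
from dotenv import load_dotenv

load_dotenv()  # loads variables from .env in the current working directory
api_key = os.getenv("GEMINI_API_KEY")
if not api_key:
    raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file.")
```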
Run the inference script without attention control:

```bash
cd inference
bash inference.sh
```

If you have full access to Gemini, you can run the inference script with attention control:

```bash
bash inference_attn.sh
```

> [!IMPORTANT]
> - LoRA Weights: The LoRA weights are hosted on Hugging Face: LayerPeeler. They will be downloaded automatically during inference (see the sketch after this callout for a manual alternative).
> - Attention Control: When using V-API to access Gemini, we found bounding box and mask detection to be unstable. As a result, `inference.sh` does not use the attention control mechanism described in the paper.
> - Output: Results are saved in the `outputs` directory, including the VLM responses, layer graphs, and generated images in their respective subdirectories.
> - VLM Model: We use Gemini 2.5 Pro for VLM reasoning because it provides reliable mask and bounding box detection. If attention control is disabled, other VLM models may also be suitable.
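If the automatic download is not possible in your environment, the weights can also be fetched ahead of time with `huggingface_hub`. The repository ID and target directory below are placeholders; use the actual Hugging Face link referenced above.

```python
# Manual download sketch; repo_id and local_dir are placeholders.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="<org-or-user>/LayerPeeler",       # placeholder: see the Hugging Face page
    local_dir="checkpoints/layerpeeler_lora",  # placeholder: where to store the weights
)
print(f"LoRA weights downloaded to {local_dir}")
```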
For complex input images (e.g., those with multiple top layers to remove), the inference script may fail to produce the desired output. In such cases, trying different random seeds or adjusting the number of inference steps may help.
We use Recraft to vectorize the visual differences between layered PNGs and merge them into a final SVG. We chose Recraft because it offers superior accuracy and stability compared to other vectorizers.
> [!NOTE]
> This API is not free. You must sign up for an account to obtain an API key.
Add your Recraft API key to the .env file:
```
RECRAFT_API_KEY=<your_key>
```

The vectorization code is located in the inference directory.
```bash
cd inference
python vectorize.py --base_dir outputs/testset
```

This command processes all subfolders within `base_dir`. For each folder, it calculates the difference between consecutive layer PNGs (isolating the object removed in each step), vectorizes these differences, and merges them into a final combined SVG.
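The merge step conceptually stacks each vectorized layer on top of the previous ones in a single SVG. The sketch below illustrates that idea with `xml.etree.ElementTree`; it is an assumption about the approach, not the actual implementation in vectorize.py.

```python
# Sketch of stacking per-layer SVGs (bottom first, topmost last) into one file.
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

def merge_svgs(layer_paths, out_path):
    # Reuse the first (bottom) layer's root so width/height/viewBox are preserved.
    combined = ET.parse(layer_paths[0]).getroot()
    for path in layer_paths[1:]:
        layer_root = ET.parse(path).getroot()
        group = ET.SubElement(combined, f"{{{SVG_NS}}}g")
        for child in list(layer_root):
            group.append(child)  # later layers are appended last, so they render on top
    ET.ElementTree(combined).write(out_path, xml_declaration=True, encoding="utf-8")
```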
> [!IMPORTANT]
> **Tuning Extraction Parameters**
> - Morphological Opening (`--morph_kernel_size`): Since the image "before removal" and "after removal" are not guaranteed to be perfectly pixel-aligned, we use morphological opening to clean up small noise artifacts.
> - Difference Threshold (`--diff_threshold`): You may need to adjust this if the layer being removed has a color very similar to the background. A higher threshold reduces noise but might miss faint details.
>
> See the sketch after this callout for how these two parameters come into play.
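For intuition, the difference-extraction step these flags control looks roughly like the OpenCV sketch below; it is an illustration of the technique, not the exact code in vectorize.py.

```python
# Sketch: isolate the object removed between two consecutive layer PNGs.
import cv2
import numpy as np

def extract_removed_layer(before_path, after_path, diff_threshold=30, morph_kernel_size=3):
    before = cv2.imread(before_path)
    after = cv2.imread(after_path)

    # Per-pixel absolute difference, collapsed to a single channel.
    diff = cv2.absdiff(before, after).max(axis=2)

    # Pixels that changed more than the threshold belong to the removed layer.
    mask = (diff > diff_threshold).astype(np.uint8) * 255

    # Morphological opening removes small speckles caused by imperfect pixel alignment.
    kernel = np.ones((morph_kernel_size, morph_kernel_size), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

    # Copy only the removed object's pixels from the "before" image onto white.
    removed = np.full_like(before, 255)
    removed[mask > 0] = before[mask > 0]
    return removed, mask
```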
The training code and dataset are available in the train directory. Detailed training instructions can be found in the corresponding README.
