Welcome to Prompt2Sign!
This repository stores the preprocessed data for the paper:
SignLLM: Sign Languages Production Large Language Models.
Note: Please prioritize using the DWPose extraction and preprocessing data on the homepage, as this is compatible with almost all Pose2Vid models currently available. I believe this will contribute to the development of the field.
[2025.07.30] In May, we developed a faster tool. However, some beginners found it difficult to quickly perform various video processing tasks. We have now added a new "Pipeline" folder here, which is designed to handle all sign language videos more smoothly. This will be our new processing standard. The previous dataset page has been deprecated.
[2025.07.10] Our paper has been accepted by the ICCV Workshop! In addition, we provide the original DWPose keypoint npz file for your use!
[2025.05.24] We have recently developed a minimal tool named fast_dwpose for extracting and visualizing DWPose keypoints, and we hope it will be helpful to everyone.
[2025.04.18] Surprise: We have released new compressed How2Sign data based on DWPose, and an upgraded version of the SignLLM-based application will be launched in the future.
[2025.04.01] IMPORTANT: We will try to provide a new compression solution (possibly based on DWPose) at some point. Therefore, for unreleased preprocessed data and for existing data processing, the best approach is to download the original dataset and then process it using our processing tools.
[2025.03.31] The prompt template and additional data information have been updated. For a long time I wanted to optimize filtering, re-normalize according to body type, and improve data quality, which led to severe procrastination. I later noticed that DWPose might work better for training, so unreleased data will not be maintained; our time should be spent on better data formats.
[2024.06.30] The Jupyter Notebook and Docker image for data processing have been released.
[2024.05.17] The arXiv version of the paper is now available.
[2024.01.16] The Prompt2Sign homepage is available, and the data is expected to be released after the paper is accepted (possibly at the end of 2024, so don't rush).
[2023.12.14] We have made supplementary materials and demo available at this page.
[2023.11.04] We have made Prompt2Sign and Tools available at GitHub. Check out here.
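The DWPose keypoint npz file mentioned in the 2025.07.10 update can be inspected before it is used in a pipeline. The sketch below is not part of the official tooling. It assumes only that the file is a standard NumPy `.npz` archive, and the function name `summarize_npz` is hypothetical. The array names and shapes inside the released file are not documented on this page, so the code lists whatever it finds rather than assuming a layout.

```python
import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array in an .npz archive.

    Hypothetical helper for illustration; not part of the Prompt2Sign tools.
    """
    summary = {}
    # allow_pickle=False refuses pickled object arrays, which is the safer
    # choice for files downloaded from the internet.
    with np.load(path, allow_pickle=False) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary
```

Calling `summarize_npz("path/to/file.npz")` (the path is a placeholder) returns a dictionary that can be printed to see which arrays exist and how they are shaped before writing any loading code.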
For further questions and suggestions, please only contact Sen Fang or SignLLM.
For commercial collaborations, funding arrangements, or sign language cooperation projects, please email Sen's current advisor to discuss the details (and cc Sen).
Prompt2Sign is the first comprehensive multilingual sign language dataset. It uses tools to automate the acquisition and processing of sign language videos from the web, and it is an evolving dataset that is efficient and lightweight and reduces the shortcomings of previous datasets. Details are available at https://signllm.github.io/Prompt2Sign/.
Current languages include: American Sign Language (ASL), German Sign Language (GSL, alias DGS), Swiss German Sign Language (DSGS), French Sign Language of Switzerland (LSF-CH), Italian Sign Language of Switzerland (LIS-CH), Argentine Sign Language (Lengua de Señas Argentina, LSA), Korean Sign Language (KSL), and Turkish Sign Language (TSL).
Dataset Summary
| Name | Language | Vocab. | Duration (h) | Signers | Multiview | Transcription | Gloss | Pose | Depth | Speech | Prompt | Compress |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Video-Based CSL | CSL | 178 | 100 | 50 | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ |
| SIGNUM | GSL | 450 | 55 | 25 | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
| RWTH-Phoenix-2014T | GSL | 3k | 11 | 9 | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Public DGS Corpus | GSL | -- | 50 | 327 | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ |
| BSL Corpus | BSL | 5k | -- | 249 | ❌ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
| NCSLGR | ASL | 1.8k | 5.3 | 4 | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ | ❌ | ❌ |
| How2Sign | ASL | 16k | 79 | 11 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ |
| Prompt2Sign (ours) | Multilingual | 40k | 200 | 40 | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Please cite the following paper when using Prompt2Sign in your research:
@misc{fang2025signllmsignlanguageproduction,
title={SignLLM: Sign Language Production Large Language Models},
author={Sen Fang and Chen Chen and Lei Wang and Ce Zheng and Chunyu Sui and Yapeng Tian},
year={2025},
eprint={2405.10718},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2405.10718},
}
@misc{fang2025signdiffdiffusionmodelamerican,
title={SignDiff: Diffusion Model for American Sign Language Production},
author={Sen Fang and Chunyu Sui and Yanghao Zhou and Xuedong Zhang and Hongbin Zhong and Yapeng Tian and Chen Chen},
year={2025},
eprint={2308.16082},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2308.16082},
}