RT-Pose

Real-time (GPU) pose estimation pipeline with 🤗 Transformers

Notebooks

🚀🚀🚀 Walkthrough of the optimizations that speed up the pipeline from 9 to 47 FPS - notebook
🎥 Run inference on video - notebook

Installation

[Optional] Installing with uv is recommended because it is faster. First, install uv:

pip install uv

Install rt-pose (drop the `uv` prefix if you prefer to install with plain pip):

uv pip install rt-pose        # with minimal dependencies
uv pip install rt-pose[demo]  # with additional dependencies to run `scripts/` and `notebooks/`

Python snippet

import torch
from rt_pose import PoseEstimationPipeline

# Load pose estimation pipeline
pipeline = PoseEstimationPipeline(
    object_detection_checkpoint="PekingU/rtdetr_r50vd_coco_o365",
    pose_estimation_checkpoint="usyd-community/vitpose-plus-small",
    device="cuda",
    dtype=torch.bfloat16,
    compile=False,  # or True to get more speedup
)

# Run pose estimation on image
output = pipeline(image)

# output.person_boxes_xyxy (`torch.Tensor`):
#   of shape `(N, 4)`, the boxes of the `N` persons detected in the image, in (x_min, y_min, x_max, y_max) format
# output.keypoints_xy (`torch.Tensor`):
#   of shape `(N, 17, 2)`, 17 keypoints for each person
# output.scores (`torch.Tensor`):
#   of shape `(N, 17)`, the confidence score of each keypoint

# Visualize with supervision/matplotlib/opencv
# see ./scripts/run_on_image.py
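
The output fields described above can be post-processed with a few lines of plain Python. The following is a minimal sketch, not part of rt-pose: `visible_keypoints` and the `0.3` threshold are hypothetical, and it assumes the tensors were first converted to nested lists (for example with `output.keypoints_xy.float().cpu().tolist()`).

```python
def visible_keypoints(keypoints_xy, scores, threshold=0.3):
    """Keep only keypoints whose score is at least `threshold`.

    keypoints_xy: nested list of shape (N, 17, 2), e.g. from output.keypoints_xy
    scores: nested list of shape (N, 17), e.g. from output.scores
    Returns one list per person of (keypoint_index, x, y) tuples.
    """
    people = []
    for person_xy, person_scores in zip(keypoints_xy, scores):
        kept = [
            (idx, x, y)
            for idx, ((x, y), score) in enumerate(zip(person_xy, person_scores))
            if score >= threshold
        ]
        people.append(kept)
    return people


# Toy data standing in for one detected person (only 3 keypoints for brevity)
keypoints_xy = [[[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]]]
scores = [[0.9, 0.1, 0.5]]
print(visible_keypoints(keypoints_xy, scores))
# → [[(0, 10.0, 20.0), (2, 50.0, 60.0)]]
```

Keeping the keypoint index in each tuple preserves which joint a point belongs to after low-confidence ones are dropped.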

Other object detection checkpoints on the Hub:

Other pose estimation checkpoints on the Hub:

ViTPose and ViTPose++

Run pose estimation on image

--input can be a URL or a path

python scripts/run_on_image.py \
    --input "https://res-3.cloudinary.com/dostuff-media/image/upload//w_1200,q_75,c_limit,f_auto/v1511369692/page-image-10656-892d1842-b089-4a7a-80f1-5be99b2b3454.png" \
    --output "results/image.png" \
    --device "cuda:0"
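
Since `--input` accepts either a URL or a path, a caller may want to tell the two apart before deciding whether to download. The sketch below shows one way to do that with the standard library; `classify_input` is a hypothetical helper and is not claimed to be how `scripts/run_on_image.py` handles its input.

```python
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Return "url" for http(s) links and "path" for anything else."""
    scheme = urlparse(value).scheme.lower()
    return "url" if scheme in ("http", "https") else "path"


print(classify_input("https://example.com/person.png"))  # → url
print(classify_input("results/image.png"))               # → path
```

Checking the scheme explicitly, rather than looking for a colon, keeps Windows-style paths such as `C:\images\person.png` classified as paths.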

Run pose estimation on video

--input can be a URL or a path
--dtype running in bfloat16 is recommended for the best precision/speed tradeoff
--compile compiles the models in the pipeline for a further speedup (about x2); compilation can take a long time, so it is worth enabling only for long videos

python scripts/run_on_video.py \
    --input "https://huggingface.co/datasets/qubvel-hf/assets/blob/main/rt_pose_break_dance_v1.mp4" \
    --output "results/rt_pose_break_dance_v1_annotated.mp4" \
    --device "cuda:0" \
    --dtype bfloat16
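
To judge whether `--compile` pays off for a given video, compare the total time with and without compilation. Without it, `n` frames take `n / fps` seconds; with it, they take `compile_seconds + n / (speedup * fps)`. The two are equal at `n = compile_seconds * fps * speedup / (speedup - 1)`. The sketch below evaluates that formula; the x2 speedup comes from the note above, while the 60 s compile time and 20 FPS baseline are made-up numbers for illustration only.

```python
def break_even_frames(compile_seconds, base_fps, speedup=2.0):
    """Number of frames above which compiling first is faster overall."""
    if speedup <= 1.0:
        raise ValueError("compilation never pays off without a speedup > 1")
    return compile_seconds * base_fps * speedup / (speedup - 1.0)


# Hypothetical numbers: 60 s of compilation, 20 FPS without compilation
frames = break_even_frames(compile_seconds=60.0, base_fps=20.0)
print(frames)  # → 2400.0
```

With these assumed numbers, videos longer than 2400 frames would finish sooner with compilation enabled; measure the actual compile time and frame rate on your hardware before relying on the result.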
