This project shows how to track objects in videos using the powerful DINOv3 model. Let's dive in! 🏊♂️
DINOv3 is a self-supervised vision transformer (ViT) model created by Meta.
It can:
- Understand images without needing labeled data
- Produce super-robust feature embeddings for image patches
- Be used for image segmentation, object tracking, zero-shot classification, and more
- Match objects even when they rotate, scale, or change appearance
🤓 In short: DINOv3 just knows. Everything. Period.
This project is a fun demo of object tracking on videos using DINOv3.
How it works:
- Take the first frame of your video.
- Click on the object you want to track using your mouse.
- Pass the frame through DINOv3, which splits the image into patches. Each patch gets its own feature vector. We are interested in the feature vector of the user's selected patch.
- Compute the cosine similarity between the selected patch's feature vector and the feature vectors of all patches in every other frame.
- Use these similarities to draw a similarity heatmap. The more 🟠 orange, the more similar!
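The similarity step above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function name is my own, and it assumes a 14×14 patch grid (a 224×224 input with 16-pixel patches, as in ViT-S/16) and row-major patch order.

```python
import numpy as np

def similarity_heatmap(ref_vec, patch_feats, grid=14):
    """Cosine similarity between the selected patch's feature vector
    and every patch feature of another frame.

    ref_vec:     (D,)   feature of the user-selected patch
    patch_feats: (N, D) features of all N patches in a frame (N = grid*grid)
    Returns a (grid, grid) heatmap with values in [-1, 1].
    """
    ref = ref_vec / np.linalg.norm(ref_vec)
    feats = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    sims = feats @ ref                     # (N,) cosine similarities
    return sims.reshape(grid, grid)
```

The heatmap can then be colored (e.g. with matplotlib's `hot` colormap) and blended over each frame.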
```shell
python -m venv dino-venv
source dino-venv/Scripts/activate
pip install onnxruntime
pip install opencv-python
pip install tqdm matplotlib
```
- Go to the HuggingFace ONNX community and download a DINOv3 model.
- Place it in the model/ folder.
- We used the fp16 ViT-S, but you can try any other variant.
- Open `config.py` and set `path_to_input_video` to your video file
- The video will be cropped to a square and resized to 224×224 for convenience
```shell
python run.py
```
- The first frame will appear
- Click on the object you want to track
- The script will process all frames and save a tracked video 🎥
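One detail glossed over above is how the mouse click becomes a patch index. A minimal sketch, assuming a 224×224 frame and 16-pixel ViT patches in row-major order (the helper name is hypothetical):

```python
def click_to_patch(x, y, image_size=224, patch_size=16):
    """Map a pixel click (x, y) on the resized frame to the index of the
    ViT patch containing it, counting patches row by row."""
    grid = image_size // patch_size        # e.g. 14 patches per side
    row, col = y // patch_size, x // patch_size
    return row * grid + col
```

That index selects one row of the model's patch-feature matrix, which becomes the reference vector for every subsequent frame.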
- Code in this repository is licensed under the MIT License.
- The DINOv3 model weights are licensed under the DINOv3 License by Meta.
- Weights were downloaded from HuggingFace ONNX community.
- By using the model weights, you agree to the terms of the DINOv3 License.
- Images/Videos used in this project are sourced from Pixabay and Unsplash under their respective licenses.