Vitgracer/DinoV3-Object-Tracking

Blogpost

🎉 DINOv3 Object Tracking Demo

⚠️ This is not an official Meta product.

This project shows how to track objects in videos using the powerful DINOv3 model. Let's dive in! 🏊‍♂️


🦖 What is DINOv3?

DINOv3 is a self-supervised vision transformer (ViT) model created by Meta.
It can:

  • Learn to understand images without labeled data
  • Produce robust feature embeddings for individual image patches
  • Serve as a backbone for image segmentation, object tracking, zero-shot classification, and more
  • Keep working even if the object rotates, changes scale, or changes appearance

🤓 In short: DINOv3 just knows. Everything. Period.


💪 What does this project do?

This project is a fun demo of object tracking on videos using DINOv3.

How it works:

  1. Take the first frame of your video.
  2. Click on the object you want to track using your mouse.
  3. Pass the frame through DINOv3, which splits the image into patches. Each patch gets its own feature vector. We are interested in the feature vector of the patch the user clicked on.
  4. For every following frame, compute the cosine similarity between the feature vector of the selected patch and the feature vector of each patch in that frame.
  5. Use these similarities to create a similarity heatmap. The more 🟠 "orange" a patch is, the more similar it is!
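
Steps 2 to 5 can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from run.py: the function names are hypothetical, random vectors stand in for real DINOv3 outputs, and it assumes a 16-pixel patch size (so a 224×224 frame gives a 14×14 grid) and 384-dimensional ViT-S features.

```python
import numpy as np

PATCH_SIZE = 16                  # assumption: ViT patch size in pixels
IMAGE_SIZE = 224                 # frames are resized to 224x224
GRID = IMAGE_SIZE // PATCH_SIZE  # 14 patches per side

def click_to_patch_index(x, y, patch_size=PATCH_SIZE, grid=GRID):
    """Map a pixel click (x, y) to the row-major index of the patch containing it."""
    col = min(x // patch_size, grid - 1)
    row = min(y // patch_size, grid - 1)
    return row * grid + col

def similarity_heatmap(query, patch_features, grid=GRID):
    """Cosine similarity between one query vector and every patch vector.

    query: shape (D,); patch_features: shape (grid * grid, D).
    Returns a (grid, grid) array with values in [-1, 1].
    """
    eps = 1e-8
    # Cast to float32 in case the model produced fp16 features.
    query = query.astype(np.float32)
    patch_features = patch_features.astype(np.float32)
    q = query / (np.linalg.norm(query) + eps)
    p = patch_features / (np.linalg.norm(patch_features, axis=1, keepdims=True) + eps)
    return (p @ q).reshape(grid, grid)

# Random stand-ins for the per-patch features of the first frame.
rng = np.random.default_rng(0)
first_frame_features = rng.normal(size=(GRID * GRID, 384))

idx = click_to_patch_index(x=100, y=50)      # the user's click
query = first_frame_features[idx]            # feature vector of the clicked patch
heatmap = similarity_heatmap(query, first_frame_features)
```

On the first frame the clicked patch matches itself with similarity 1. For later frames, you would pass that frame's patch features as `patch_features` and keep `query` fixed. If the model output also contains non-patch tokens (for example a class token), they would need to be dropped before this step.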

🏃‍♂️ How to install and run

Step 1: Create a virtual environment

python -m venv dino-venv
source dino-venv/Scripts/activate   # Windows (Git Bash); on Linux/macOS use: source dino-venv/bin/activate

Step 2: Install dependencies

pip install onnxruntime
pip install opencv-python
pip install tqdm matplotlib

Step 3: Download the model

  • Go to the ONNX community on Hugging Face and download a DINOv3 model.
  • Place it in the model/ folder.
  • We used the fp16 ViT-S variant, but you can try any other variant.

Step 4: Config is "all you need" 😅

  • Open config.py and set path_to_input_video to your video file
  • The video will be cropped to a square and resized to 224×224 for convenience
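
The crop-and-resize step might look roughly like the sketch below. It assumes a center crop (the project may crop differently), and it uses a NumPy-only nearest-neighbour resize so the example runs without extra dependencies. With OpenCV installed, `cv2.resize(square, (224, 224))` would be the usual choice. Both function names are hypothetical.

```python
import numpy as np

def center_crop_square(frame):
    """Crop the largest centered square out of an (H, W, C) frame."""
    h, w = frame.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return frame[top:top + side, left:left + side]

def resize_nearest(frame, size=224):
    """Nearest-neighbour resize to (size, size); a stand-in for cv2.resize."""
    h, w = frame.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return frame[rows[:, None], cols]

# A dummy 640x480 frame in place of a real video frame.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
square = center_crop_square(frame)   # shape (480, 480, 3)
small = resize_nearest(square)       # shape (224, 224, 3)
```

A fixed 224×224 input keeps the patch grid the same size for every frame, which makes it easy to map a click to a patch.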

Step 5: RUN!

python run.py
  • The first frame will appear
  • Click on the object you want to track
  • The script will process all frames and save a tracked video 🎥
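
To give an idea of how a heatmap can turn into a tracked frame, here is a hedged sketch that blends an orange overlay onto a frame and reads off the most similar patch as the object position. The helper names, the blending weight, and the 16-pixel patch size are assumptions for illustration, not necessarily what run.py does.

```python
import numpy as np

def overlay_heatmap(frame, heatmap, alpha=0.5):
    """Blend an orange similarity overlay onto a BGR uint8 frame."""
    h, w = frame.shape[:2]
    gh, gw = heatmap.shape
    # Upsample: repeat each patch value over its block of pixels.
    up = np.repeat(np.repeat(heatmap, h // gh, axis=0), w // gw, axis=1)
    # Normalise to [0, 1] so the most similar patch gets the strongest tint.
    lo, hi = up.min(), up.max()
    weight = (alpha * (up - lo) / (hi - lo + 1e-8))[..., None]
    orange_bgr = np.array([0, 165, 255], dtype=np.float32)
    blended = (1 - weight) * frame.astype(np.float32) + weight * orange_bgr
    return blended.clip(0, 255).astype(np.uint8)

def peak_location(heatmap, patch_size=16):
    """Return the pixel center (x, y) of the most similar patch."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return (int(col) * patch_size + patch_size // 2,
            int(row) * patch_size + patch_size // 2)

# Dummy data: a black frame and a heatmap with a single hot patch.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
heatmap = np.zeros((14, 14))
heatmap[3, 6] = 1.0

tracked = overlay_heatmap(frame, heatmap)
x, y = peak_location(heatmap)
```

Each `tracked` frame could then be written to the output video, for example with OpenCV's `cv2.VideoWriter`.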

🔑 Licenses

  • Code in this repository is licensed under the MIT License.
  • The DINOv3 model weights are licensed under the DINOv3 License by Meta.
  • Weights were downloaded from the ONNX community on Hugging Face.
  • By using the model weights, you agree to the terms of the DINOv3 License.
  • Images/Videos used in this project are sourced from Pixabay and Unsplash under their respective licenses.