
⏱️ TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning

A model that learns to actively search for relevant temporal clips through end-to-end reinforcement learning.

[📄 Paper] [🤗 Model]

📰 News

🔥 [2025/11/13] Our model checkpoint has been uploaded!

👁️ Overview

TimeSearch-R reformulates temporal search as interleaved text–video thinking, seamlessly integrating searching video clips into the reasoning process through reinforcement learning (RL).

[Figure: TimeSearch-R teaser]

We introduce GRPO with Completeness Self-Verification (GRPO-CSV), which gathers searched video frames from the interleaved reasoning process and utilizes the same policy model to verify the adequacy of searched frames, thereby improving the completeness of video reasoning.

[Figure: GRPO-CSV overview]

🚀 Quick Start

🏝️ Environmental Setup

Step 1: Prepare the running environment.

Prepare the environment with CUDA and PyTorch (CUDA 12.4 and PyTorch 2.6.0 in our experiments), and install the dependencies with pip.

pip install -r requirements.txt
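
Optionally, you can verify that the installed PyTorch build matches the CUDA version used in our experiments before moving on. This is only a quick sanity-check sketch, not a required step.

# Optional sanity check: confirm the PyTorch / CUDA versions used in our experiments.
import torch

print("PyTorch:", torch.__version__)        # 2.6.0 in our experiments
print("CUDA build:", torch.version.cuda)    # 12.4 in our experiments
print("CUDA available:", torch.cuda.is_available())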

Step 2: Run the clip server for video frame retrieval.

Download the pre-trained SigLIP model.

huggingface-cli download google/siglip-so400m-patch14-384 --local-dir /path/to/your/local/filedir

Modify clip_as_service/server/clip_server/torch-flow.yml so that it points to the downloaded local model path, then install and start the SigLIP server.

cd clip_as_service/server && pip3 install .

export CUDA_VISIBLE_DEVICES=0  # set to the GPU index you want the SigLIP server to use
export GRPC_VERBOSITY=debug
export HF_HUB_OFFLINE=1
export PYTHONPATH=$PYTHONPATH:.

python3 -m clip_server
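
Once the server is up, a quick connectivity check can save debugging time later. The snippet below is a minimal sketch assuming the vendored server keeps the standard clip-client gRPC interface and the default port 51000 (the same address used for SIGLIP_URL in the inference step below); install the clip-client package if it is not already available.

# Minimal connectivity check for the SigLIP server (assumes the standard
# clip-client interface from clip-as-service and the default port 51000).
from clip_client import Client

client = Client("grpc://127.0.0.1:51000")
embeddings = client.encode(["a person opening a door"])  # text query -> embedding
print(embeddings.shape)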

📦️ Dataset & Model

We provide the preprocessed JSON files for Haystack-LVBench. The corresponding .mp4 video files can be downloaded from the original LongVideoBench dataset.

Download the pre-trained TimeSearch-R model.

huggingface-cli download --resume-download Time-Search/TimeSearch-R --local-dir /path/to/your/local/filedir

(Recommended) Prepare the frame cache and feature cache. To accelerate inference and training, we recommend extracting frames and features for all videos in advance.

python3 scripts/converts/prepare_frame_cache.py /path/to/your/local/data_root /path/to/your/local/haystack_lvbench_input.jsonl --num_workers 16 --target_fps 2
python3 scripts/converts/prepare_feature_cache.py /path/to/your/local/data_root /path/to/your/local/haystack_lvbench_input.jsonl --num_workers 16

📋️ Inference & Evaluation

Step 1: Run the TimeSearch-R inference.

# gRPC address of the SigLIP server started above
export SIGLIP_URL=grpc://127.0.0.1:51000

torchrun \
    --nproc_per_node=8 \
    --master_port=24137 \
    time_r1/inference.py \
    --input_path /path/to/your/local/haystack_lvbench_input.jsonl \
    --save_path /path/to/your/local/haystack_lvbench_output \
    --data_root /path/to/your/local/data_root \
    --model_base /path/to/your/local/checkpoint \
    --prompt_template v4 \
    --use_env True \
    --use_vllm True \
    --batch_size 1 \
    --num_data_workers 2 \
    --total_video_tokens 24000 \
    --max_frames 768 \
    --max_tokens 256
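
Before running the evaluation scripts, it can help to spot-check the saved predictions. The sketch below only assumes the output is one JSON object per line; it prints whatever keys the inference script actually wrote, so inspect the schema rather than relying on any particular field name.

# Sketch: spot-check the inference output (assumes one JSON object per line).
import json

with open("/path/to/your/local/haystack_lvbench_output.jsonl") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(len(records), "records")
print(sorted(records[0].keys()))  # inspect the actual schema before further processing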

Step 2: Evaluate the temporal search and QA performance.

The temporal search evaluation script is modified from T*.

# Temporal search evaluation
python time_r1/eval/eval_temporal_search.py --search_result_path /path/to/your/local/haystack_lvbench_output.jsonl

# QA evaluation
python time_r1/eval/longvideobench_eval.py /path/to/your/local/haystack_lvbench_output.jsonl

🏗️ GRPO-CSV Training

Step 1: Prepare the reward model.

We use Qwen2.5-72B-Instruct as the reward model for LLM-as-a-judge verification.

# download the Qwen2.5-72B-Instruct model
huggingface-cli download --resume-download Qwen/Qwen2.5-72B-Instruct --local-dir /path/to/your/local/filedir

Start a vLLM server serving Qwen2.5-72B-Instruct for LLM-as-a-judge verification.

vllm serve /path/to/your/local/filedir \
    --port 18901 \
    --gpu-memory-utilization 0.8 \
    --max-model-len 32768 \
    --tensor-parallel-size 8 \
    --served-model-name "judge" \
    --trust-remote-code \
    --disable-log-requests
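
Once the server is running, you can confirm that the judge endpoint responds before launching training. The snippet below is a minimal sketch that talks to vLLM's OpenAI-compatible API using the served model name and port from the command above; the prompt is only a placeholder.

# Sketch: ping the LLM-as-a-judge server through vLLM's OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:18901/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="judge",  # matches --served-model-name above
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=8,
)
print(response.choices[0].message.content)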

Step 2: Train TimeSearch-R with GRPO-CSV.

We recommend at least 16 GPUs (2 nodes × 8 GPUs) for 7B training, and at least 1024 GB of CPU RAM per node, as the long-form videos in the training datasets can consume a large amount of memory.

We provide the training script for TimeSearch-R with GRPO-CSV in scripts/train.sh.

bash scripts/train.sh

🔖 Citation

If you find TimeSearch-R useful for your research and applications, please cite using this BibTeX:

@article{timesearch-r,
  title={TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning},
  author={Pan, Junwen and Zhang, Qizhe and Zhang, Rui and Lu, Ming and Wan, Xin and Zhang, Yuan and Liu, Chang and She, Qi},
  journal={arXiv preprint arXiv:2511.05489},
  year={2025}
}

🎟️ License

This project is released under the Apache 2.0 license.

🏅 Acknowledgements

We thank the authors of the open-source projects that TimeSearch-R builds on for their contributions.
