
[Feature request] Efficient text detection #50

@Morv55555

Description


Currently the start and end of a subtitle are determined by the frames set by the user. Why not make the process more efficient by scanning frames at a coarser interval (e.g. for a 30 fps video, check only every 15th frame)? Only when text is detected, do a more thorough search around that frame to get the correct timestamps. This way the whole process should be much faster.

My suggestion would be a hybrid approach that combines a coarse sampling pass with your SSIM filtering:

Coarse sampling pass

  • From a 30fps video, pick every Nth frame (e.g. every 15th = 2 fps).
  • Run OCR only on these frames.
  • This gives a rough map of where subtitles appear/disappear.
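The coarse pass can be sketched in a few lines. This is a minimal illustration, not your implementation: `ocr` here is a hypothetical callback that returns whether text was found in a given frame (in practice it would decode the frame and run the real OCR engine).

```python
def coarse_scan(num_frames, stride, ocr):
    """Run OCR only on every `stride`-th frame.

    `ocr(frame_index)` is a placeholder for the real per-frame OCR call;
    returns the indices of sampled frames where text was detected.
    """
    return [i for i in range(0, num_frames, stride) if ocr(i)]

# Toy stand-in for OCR: pretend a subtitle is visible on frames 40..80
# of a 120-frame clip, sampled every 15th frame (30 fps -> 2 fps).
detections = coarse_scan(120, 15, lambda i: 40 <= i <= 80)
# -> [45, 60, 75]
```

For a 30 fps video this cuts the number of OCR calls by 15x up front; the cost moves to the (much smaller) refinement windows around each detection.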

Trigger expansion around detections

  • If OCR finds text in a sampled frame, look at its neighbors (e.g. ±10 frames).
  • Run OCR on those frames to pinpoint the exact subtitle start/end timestamps, using SSIM to avoid re-OCRing identical frames.
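One way to sharpen the neighbor scan is a binary search over the window, which needs only O(log stride) OCR calls per boundary instead of checking every neighbor. A sketch under two assumptions: `has_text(frame)` is a hypothetical predicate standing in for "OCR found text" (with SSIM used to skip frames identical to ones already checked), and the subtitle is present contiguously within the search window.

```python
def refine_start(hit, stride, has_text):
    """Find the first frame with text in (hit - stride, hit] by binary search.

    `hit` is a sampled frame where text was detected; the previous sampled
    frame (hit - stride) had none, so the true start lies in between.
    """
    lo, hi = max(hit - stride, 0), hit
    while lo < hi:
        mid = (lo + hi) // 2
        if has_text(mid):
            hi = mid          # text already present: start is at or before mid
        else:
            lo = mid + 1      # no text yet: start is after mid
    return lo

def refine_end(hit, stride, num_frames, has_text):
    """Find the last frame with text in [hit, hit + stride) by binary search."""
    lo, hi = hit, min(hit + stride, num_frames - 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_text(mid):
            lo = mid          # text still present: end is at or after mid
        else:
            hi = mid - 1      # text gone: end is before mid
    return lo

# Same toy clip as above: text on frames 40..80, detections at 45 and 75.
visible = lambda i: 40 <= i <= 80
refine_start(45, 15, visible)       # -> 40 (exact first subtitle frame)
refine_end(75, 15, 120, visible)    # -> 80 (exact last subtitle frame)
```

If a subtitle can appear and disappear within one window, a plain linear scan of the neighbors (as suggested above) is the safer choice; binary search only works when presence is monotone across the window.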

Merging step
Combine text blocks that remain the same across multiple consecutive frames into a single subtitle event.
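The merging step is a simple run-length grouping over the per-frame OCR results. A minimal sketch, assuming the refinement pass produced a list of `(frame_index, text)` pairs (hypothetical shape, not your data model):

```python
def merge_events(frame_texts, fps):
    """Collapse consecutive frames carrying identical text into single events.

    frame_texts: list of (frame_index, text) pairs, sorted by frame index.
    Returns (start_seconds, end_seconds, text) tuples; the end timestamp is
    the end of the last frame, i.e. (last_frame + 1) / fps.
    """
    events = []  # each entry: [first_frame, last_frame, text]
    for idx, text in frame_texts:
        if events and events[-1][2] == text and idx == events[-1][1] + 1:
            events[-1][1] = idx  # same text on the next frame: extend the run
        else:
            events.append([idx, idx, text])
    return [(start / fps, (end + 1) / fps, text) for start, end, text in events]
```

A run-length pass like this also absorbs OCR jitter cheaply if you compare texts with a small edit-distance tolerance instead of strict equality, at the cost of occasionally merging two genuinely different but similar lines.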
