
[Feature request] Efficient text detection #50

@Morv55555

Description


Currently the start and end of a subtitle are determined by the frames set by the user. Why not make the process more efficient by scanning frames at a coarser interval (e.g. for a 30 fps video, check only every 15th frame)? Only when text is detected, do a more thorough search around that frame to get the correct timestamps. This way the whole process should be much faster.

My suggestion would be a hybrid approach that combines a coarse sampling pass with your SSIM filtering:

Coarse sampling pass

  • From a 30fps video, pick every Nth frame (e.g. every 15th = 2 fps).
  • Run OCR only on these frames.
  • This gives a rough map of where subtitles appear/disappear.
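The coarse pass can be sketched in a few lines. This is a minimal illustration, not your implementation: `ocr` here is a hypothetical callback that returns whether text was found in a given frame (in practice it would decode the frame and run the real OCR engine).

```python
def coarse_scan(num_frames, stride, ocr):
    """Run OCR only on every `stride`-th frame.

    `ocr(frame_index)` is a placeholder for the real per-frame OCR call;
    returns the indices of sampled frames where text was detected.
    """
    return [i for i in range(0, num_frames, stride) if ocr(i)]

# Toy stand-in for OCR: pretend a subtitle is visible on frames 40..80
# of a 120-frame clip, sampled every 15th frame (30 fps -> 2 fps).
detections = coarse_scan(120, 15, lambda i: 40 <= i <= 80)
# -> [45, 60, 75]
```

For a 30 fps video this cuts the number of OCR calls by 15x up front; the cost moves to the (much smaller) refinement windows around each detection.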

Trigger expansion around detections

  • If OCR finds text in a sampled frame, look at its neighbors (e.g. ±10 frames).
  • Run OCR on those frames to pinpoint the exact subtitle start/end timestamps, using SSIM to avoid re-OCRing identical frames.
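One way to sharpen the neighbor scan is a binary search over the window, which needs only O(log stride) OCR calls per boundary instead of checking every neighbor. A sketch under two assumptions: `has_text(frame)` is a hypothetical predicate standing in for "OCR found text" (with SSIM used to skip frames identical to ones already checked), and the subtitle is present contiguously within the search window.

```python
def refine_start(hit, stride, has_text):
    """Find the first frame with text in (hit - stride, hit] by binary search.

    `hit` is a sampled frame where text was detected; the previous sampled
    frame (hit - stride) had none, so the true start lies in between.
    """
    lo, hi = max(hit - stride, 0), hit
    while lo < hi:
        mid = (lo + hi) // 2
        if has_text(mid):
            hi = mid          # text already present: start is at or before mid
        else:
            lo = mid + 1      # no text yet: start is after mid
    return lo

def refine_end(hit, stride, num_frames, has_text):
    """Find the last frame with text in [hit, hit + stride) by binary search."""
    lo, hi = hit, min(hit + stride, num_frames - 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if has_text(mid):
            lo = mid          # text still present: end is at or after mid
        else:
            hi = mid - 1      # text gone: end is before mid
    return lo

# Same toy clip as above: text on frames 40..80, detections at 45 and 75.
visible = lambda i: 40 <= i <= 80
refine_start(45, 15, visible)       # -> 40 (exact first subtitle frame)
refine_end(75, 15, 120, visible)    # -> 80 (exact last subtitle frame)
```

If a subtitle can appear and disappear within one window, a plain linear scan of the neighbors (as suggested above) is the safer choice; binary search only works when presence is monotone across the window.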

Merging step
Combine text blocks that remain the same across multiple consecutive frames into a single subtitle event.
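The merging step is a simple run-length grouping over the per-frame OCR results. A minimal sketch, assuming the refinement pass produced a list of `(frame_index, text)` pairs (hypothetical shape, not your data model):

```python
def merge_events(frame_texts, fps):
    """Collapse consecutive frames carrying identical text into single events.

    frame_texts: list of (frame_index, text) pairs, sorted by frame index.
    Returns (start_seconds, end_seconds, text) tuples; the end timestamp is
    the end of the last frame, i.e. (last_frame + 1) / fps.
    """
    events = []  # each entry: [first_frame, last_frame, text]
    for idx, text in frame_texts:
        if events and events[-1][2] == text and idx == events[-1][1] + 1:
            events[-1][1] = idx  # same text on the next frame: extend the run
        else:
            events.append([idx, idx, text])
    return [(start / fps, (end + 1) / fps, text) for start, end, text in events]
```

A run-length pass like this also absorbs OCR jitter cheaply if you compare texts with a small edit-distance tolerance instead of strict equality, at the cost of occasionally merging two genuinely different but similar lines.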
