forked from devmaxxing/videocr-PaddleOCR
Open
Labels: enhancement (New feature or request)
Description
Efficient text detection
Currently the start and end of a subtitle are determined by the frames set by the user. Why not make the process more efficient by scanning frames at a coarser interval (e.g. for a 30 fps video, check only every 15th frame)? Only when text is detected, do a more thorough search around that frame to get the correct timestamps. This way the whole process should be much faster.
My suggestion would be a hybrid approach, combining a coarse sampling pass with your SSIM filtering:
Coarse sampling pass
- From a 30fps video, pick every Nth frame (e.g. every 15th = 2 fps).
- Run OCR only on these frames.
- This gives a rough map of where subtitles appear/disappear.
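The coarse pass could be sketched roughly like this (a minimal sketch, not videocr's actual API; `ocr_frame` is a hypothetical stand-in for a per-frame PaddleOCR text-detection call):

```python
def coarse_scan(total_frames, step, ocr_frame):
    """Sample every `step`-th frame and return the indices where text was
    detected. ocr_frame(idx) -> bool is assumed to run the detector on
    frame `idx` and report whether it found any text."""
    hits = []
    for idx in range(0, total_frames, step):
        if ocr_frame(idx):
            hits.append(idx)
    return hits

# Example with a fake detector that "sees" text only on frames 100-199:
print(coarse_scan(900, 15, lambda i: 100 <= i < 200))
# → [105, 120, 135, 150, 165, 180, 195]
```

With step=15 on a 30 fps video this runs OCR at an effective 2 fps, so the expensive detector touches only 1/15th of the frames.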
Trigger expansion around detections
- If OCR finds text in a sampled frame:
  - Look at its neighbors (e.g. ±10 frames).
  - Run OCR on those frames to pinpoint the exact subtitle start/end timestamps, using SSIM to skip re-OCRing near-identical frames.
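The expansion step might look like the sketch below. Both callbacks are hypothetical, not part of videocr: `ocr_frame(idx)` reports whether frame `idx` contains text, and `same_frame(a, b)` stands in for an SSIM comparison, so that when two neighbouring frames are near-identical the previous OCR result is reused instead of running OCR again:

```python
def refine_bounds(hit, radius, ocr_frame, same_frame):
    """Walk outward from a coarse hit (up to `radius` frames each way) to
    find the first and last frame that still show the detected text."""
    start = end = hit

    # Walk backwards to find the first frame with the text.
    prev_has_text = True
    for idx in range(hit - 1, hit - radius - 1, -1):
        if same_frame(idx, idx + 1):
            has_text = prev_has_text  # SSIM says near-identical: skip OCR
        else:
            has_text = ocr_frame(idx)
        if not has_text:
            break
        prev_has_text = has_text
        start = idx

    # Walk forwards to find the last frame with the text.
    prev_has_text = True
    for idx in range(hit + 1, hit + radius + 1):
        if same_frame(idx, idx - 1):
            has_text = prev_has_text
        else:
            has_text = ocr_frame(idx)
        if not has_text:
            break
        prev_has_text = has_text
        end = idx

    return start, end

# Fake subtitle visible on frames 100-119, coarse hit at frame 105:
print(refine_bounds(105, 10, lambda i: 100 <= i < 120, lambda a, b: False))
# → (100, 115)  (the forward walk is capped by radius=10)
```

If a subtitle extends past the radius, a real implementation would keep expanding until OCR stops finding the text, or rely on the next coarse hit to cover the remainder.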
Merging step
Combine text blocks that remain the same across multiple consecutive frames into a single subtitle event.
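The merging step could be sketched as follows (assuming the refinement pass already produced per-frame `(frame_index, text)` pairs sorted by index; names and the event shape are illustrative):

```python
def merge_events(frame_texts, fps):
    """Collapse runs of consecutive frames carrying identical text into
    (start_sec, end_sec, text) subtitle events.

    frame_texts: list of (frame_index, text) pairs, sorted by frame index.
    """
    events = []  # each entry: [start_sec, end_sec, text, last_frame_idx]
    for idx, text in frame_texts:
        if events and events[-1][2] == text and idx == events[-1][3] + 1:
            # Same text on the immediately following frame: extend the event.
            events[-1][1] = (idx + 1) / fps
            events[-1][3] = idx
        else:
            events.append([idx / fps, (idx + 1) / fps, text, idx])
    return [(start, end, text) for start, end, text, _ in events]

# Frames 30-32 show "Hi", frames 60-61 show "Bye", at 30 fps:
frames = [(30, "Hi"), (31, "Hi"), (32, "Hi"), (60, "Bye"), (61, "Bye")]
print(merge_events(frames, 30))  # two events, one per distinct run of text
```

A production version would likely also fuzzy-match the text (OCR output jitters slightly between frames) and tolerate small gaps instead of requiring strictly consecutive frame indices.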