A powerful video analysis and retrieval system that processes videos to extract meaningful information and enables natural language queries to find relevant video segments.
- Video Processing: Extract frames, audio, and generate video clips
- Object Detection: Identify objects in video frames using YOLOv8
- Audio Transcription: Convert speech to text using OpenAI's Whisper
- OCR (Optical Character Recognition): Extract text from video frames using Tesseract
- Semantic Search: Find relevant video segments using natural language queries
- Web Interface: User-friendly Gradio interface for easy interaction
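The features above feed a single indexing pipeline: the object labels, transcript text, and OCR text that overlap a time window can be merged into one timestamped text document, which is later embedded and stored in the FAISS index. A minimal sketch of such a record (the field names and layout here are illustrative assumptions, not the project's actual schema):

```python
from dataclasses import dataclass, field

@dataclass
class SegmentRecord:
    """One timestamped chunk of video metadata (hypothetical layout)."""
    start: float                                 # segment start time, seconds
    end: float                                   # segment end time, seconds
    objects: list = field(default_factory=list)  # YOLO labels seen in frames
    transcript: str = ""                         # Whisper text in this window
    ocr_text: str = ""                           # Tesseract text from frames

    def to_document(self) -> str:
        """Flatten the record into one text blob for embedding/indexing."""
        parts = []
        if self.objects:
            parts.append("objects: " + ", ".join(sorted(set(self.objects))))
        if self.transcript:
            parts.append("speech: " + self.transcript)
        if self.ocr_text:
            parts.append("on-screen text: " + self.ocr_text)
        return " | ".join(parts)

seg = SegmentRecord(0.0, 5.0, objects=["person", "car", "person"],
                    transcript="welcome to the demo", ocr_text="INTRO")
print(seg.to_document())
# → objects: car, person | speech: welcome to the demo | on-screen text: INTRO
```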
- Python 3.8+
- FFmpeg (for audio/video processing)
- Tesseract OCR (for text recognition)
- Clone the repository:

  ```shell
  git clone <repository-url>
  cd ANN-Video-RAG
  ```
- Create and activate a virtual environment:

  ```shell
  python -m venv venv
  .\venv\Scripts\activate   # On Windows
  source venv/bin/activate  # On Linux/Mac
  ```
- Install the required packages:

  ```shell
  pip install -r requirements.txt
  ```
- Install system dependencies:
  - FFmpeg: Download from ffmpeg.org and add to PATH
  - Tesseract OCR: Download from GitHub - tesseract-ocr/tesseract and add to PATH
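After adding FFmpeg and Tesseract to PATH, it can help to confirm they are actually reachable before processing a video. A small check like the following (a generic sketch, not part of the project's scripts) works in any POSIX shell:

```shell
# check_tool prints whether a command is available on PATH
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: NOT found - install it and add to PATH"
  fi
}

check_tool ffmpeg
check_tool tesseract
```

On Windows, `where ffmpeg` and `where tesseract` in Command Prompt serve the same purpose.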
- Run the application:

  ```shell
  python main.py
  ```
- Open your web browser and go to http://127.0.0.1:7860
- Upload a video file and click "Process Video"
- Once processed, enter natural language queries to find relevant video segments
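Query matching works by embedding both the query and each indexed segment as vectors and taking nearest neighbors; the project uses FAISS for the neighbor search. The sketch below substitutes a tiny bag-of-words cosine similarity so it runs with no model downloads — the tokenizer, scoring, and segment texts are illustrative, not the app's actual embedding:

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, segments: dict) -> str:
    """Return the id of the segment whose text best matches the query."""
    q = embed(query)
    return max(segments, key=lambda sid: cosine(q, embed(segments[sid])))

segments = {
    "00:00-00:05": "speech: welcome to the demo | objects: person",
    "00:05-00:10": "objects: car, road | on-screen text: highway exit",
    "00:10-00:15": "speech: thanks for watching | objects: person",
}
print(search("car on the highway", segments))  # → 00:05-00:10
```

In the real app, the matched segment's timestamps are then used to cut a clip from the source video.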
- main.py: Main application code
- requirements.txt: Python dependencies
- uploads/: Directory for uploaded videos
- processed/: Directory for processed data and FAISS index
- clips/: Directory for generated video clips
This project is licensed under the MIT License - see the LICENSE file for details.
- YOLOv8 for object detection
- OpenAI Whisper for speech recognition
- Tesseract OCR for text recognition
- FAISS for efficient similarity search
- Gradio for the web interface