A web application that converts comic PDFs into interactive audio experiences. The app analyzes comic panels with the OpenAI Vision API, extracts the text, and generates natural-sounding audio narration with Murf AI text-to-speech, with support for multiple languages.
- AI Vision Analysis: Advanced computer vision technology analyzes comic panels, identifies speech bubbles, and determines the correct reading order
- Natural Voice Synthesis: High-quality text-to-speech with 50+ voice options and automatic gender detection
- Interactive Reading: Navigate panel by panel or page by page with synchronized audio playback
- Professional UI: Modern, responsive design with intuitive controls
- Text Translation: Automatically translate comic text into any of the 10 supported language variants
- Native Audio Narration: Generate audio in the target language with native speaker voices
- Language Selection: Choose your preferred language at the start of each session
- Real-time Translation: Text is translated and audio is generated in your selected language
- Background Processing: Pages are analyzed and audio is generated in the background
- Seamless Navigation: Upcoming pages are preloaded, so moving between pages does not wait on analysis or audio generation
- Progress Tracking: Real-time status updates showing preload progress
- Resource Optimization: Intelligent preloading of upcoming pages while current page plays
- English: US, UK
- Spanish: Spain, Mexico
- French: France
- German: Germany
- Italian: Italy
- Portuguese: Brazil
- Chinese: China
- Hindi: India
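The language list above can be represented as a small lookup table. The following is a minimal sketch, not the app's actual data structure: `SUPPORTED_LOCALES`, `all_locales`, and `is_supported` are hypothetical names, and the codes are standard BCP 47 tags, which may differ from the voice or locale identifiers the app or Murf AI uses internally.

```python
from typing import Dict, List

# Hypothetical registry: one BCP 47 tag per regional variant listed above.
# The identifiers actually used by the app or by Murf AI may differ.
SUPPORTED_LOCALES: Dict[str, List[str]] = {
    "English": ["en-US", "en-GB"],
    "Spanish": ["es-ES", "es-MX"],
    "French": ["fr-FR"],
    "German": ["de-DE"],
    "Italian": ["it-IT"],
    "Portuguese": ["pt-BR"],
    "Chinese": ["zh-CN"],
    "Hindi": ["hi-IN"],
}


def all_locales() -> List[str]:
    """Flatten the registry into a single list of locale tags."""
    return [tag for tags in SUPPORTED_LOCALES.values() for tag in tags]


def is_supported(tag: str) -> bool:
    """Case-insensitive check that a locale tag is in the registry."""
    wanted = tag.strip().lower()
    return any(wanted == known.lower() for known in all_locales())
```

A check like `is_supported("pt-br")` could run before a session starts, so that an unsupported language is rejected before any translation or audio request is made.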
- Python 3.8+
- Murf AI API key (for text-to-speech and translation)
- OpenAI API key (for vision analysis)
- Clone the repository
    git clone <repository-url>
    cd audio-comic-reader
- Install dependencies
    pip install -r requirements.txt
- Set up environment variables
    cp env.example .env
    # Edit .env with your API keys
- Run the application
    python main.py
- Open your browser and navigate to
    http://localhost:8000
- Upload a Comic: Drag and drop or select a PDF comic file
- Choose Language: Select your preferred language for translation and audio
- Start Reading: The app will automatically analyze and process your comic
- Enjoy: Navigate through panels with synchronized audio narration
- Before Upload: Select your preferred language from the dropdown menu
- Translation: All comic text will be automatically translated to your chosen language
- Audio: Narration will be generated in your selected language with native speaker voices
- Display: The current language is shown in the reader header
- Auto-play: Automatically play audio when selecting panels
- Auto-advance: Automatically move to the next panel when audio finishes
- Voice Settings: Adjust speed and pitch of audio narration
- Page Summary: Generate audio summaries of entire pages
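The auto-advance behavior described above amounts to finding the panel that follows the current one, moving to the next page when the current page runs out of panels. The sketch below illustrates that logic under assumptions: `next_position` and the `panels_per_page` list are hypothetical and not taken from the app's code.

```python
from typing import List, Optional, Tuple


def next_position(
    page: int, panel: int, panels_per_page: List[int]
) -> Optional[Tuple[int, int]]:
    """Return the (page, panel) that follows the given one, or None at the end.

    panels_per_page[i] is the number of panels detected on page i (assumed input).
    """
    if panel + 1 < panels_per_page[page]:
        return page, panel + 1
    # Current page is finished: skip forward to the next page that has panels.
    for nxt in range(page + 1, len(panels_per_page)):
        if panels_per_page[nxt] > 0:
            return nxt, 0
    return None
```

A player could call this when the current audio clip ends and stop playback when the result is `None`.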
# Required API Keys
OPENAI_API_KEY=your_openai_api_key_here
MURF_API_KEY=your_murf_api_key_here
# Optional Settings
DEBUG=true
HOST=0.0.0.0
PORT=8000
MAX_FILE_SIZE_MB=50
- OpenAI API Key: Get from OpenAI Platform
  - Used for vision analysis and text extraction
- Murf AI API Key: Get from Murf AI
  - Used for text-to-speech and translation services
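One way the settings above might be read at startup is sketched below. This is an illustration, not the project's actual configuration code: `Settings` and `load_settings` are hypothetical names, the defaults for the optional values are assumptions based on the sample file, and the sketch assumes the `.env` values have already been loaded into the process environment (for example by a loader such as python-dotenv).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    murf_api_key: str
    debug: bool = False          # assumed default; the sample file sets DEBUG=true
    host: str = "0.0.0.0"
    port: int = 8000
    max_file_size_mb: int = 50


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from environment variables, failing fast on missing keys."""
    env = os.environ if env is None else env
    missing = [k for k in ("OPENAI_API_KEY", "MURF_API_KEY") if not env.get(k)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        murf_api_key=env["MURF_API_KEY"],
        debug=env.get("DEBUG", "false").strip().lower() in ("1", "true", "yes"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "50")),
    )
```

Failing at startup when a required key is missing gives a clearer error than a failed API call on the first upload.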
python test_translation.py
This will test:
- Supported languages listing
- Language validation
- Voice mapping
- Translation API connectivity
- PDFProcessor: Extracts pages from PDF files
- VisionAnalyzer: Analyzes comic panels using OpenAI Vision API
- MurfTTSService: Generates speech using Murf AI
- TranslationService: Translates text using Murf AI Translation API
- ComicReader: Orchestrates the reading experience
- PreloadManager: Handles background processing and preloading of upcoming pages
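The background preloading handled by PreloadManager can be illustrated with `asyncio` tasks. The sketch below is an assumption about how such a component might work, not the project's implementation: `PreloadSketch`, `fake_process`, and the `lookahead` parameter are hypothetical, and `fake_process` stands in for the real analysis and audio generation.

```python
import asyncio
from typing import Awaitable, Callable, Dict


class PreloadSketch:
    """Keeps one background task per page so each page is processed only once."""

    def __init__(
        self,
        process_page: Callable[[int], Awaitable[str]],
        total_pages: int,
        lookahead: int = 2,
    ) -> None:
        self._process_page = process_page
        self._total_pages = total_pages
        self._lookahead = lookahead
        self._tasks: Dict[int, asyncio.Task] = {}

    def _ensure_task(self, page: int) -> asyncio.Task:
        # Reuse an existing task if the page is already queued or finished.
        if page not in self._tasks:
            self._tasks[page] = asyncio.create_task(self._process_page(page))
        return self._tasks[page]

    async def get_page(self, page: int) -> str:
        # Start upcoming pages first so their work overlaps with this page.
        last = min(page + self._lookahead, self._total_pages - 1)
        for upcoming in range(page + 1, last + 1):
            self._ensure_task(upcoming)
        return await self._ensure_task(page)

    def status(self) -> Dict[int, str]:
        return {
            page: ("ready" if task.done() else "processing")
            for page, task in self._tasks.items()
        }


async def fake_process(page: int) -> str:
    await asyncio.sleep(0.01)  # stand-in for vision analysis and TTS calls
    return f"audio-for-page-{page}"


async def demo():
    manager = PreloadSketch(fake_process, total_pages=5, lookahead=2)
    first = await manager.get_page(0)
    return first, manager.status()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Requesting page 0 also schedules pages 1 and 2, so they are likely finished or in progress by the time the reader navigates forward. The `status()` method mirrors the kind of information a preload status endpoint could report.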
- POST /upload: Upload a comic with a language preference
- POST /analyze-page/{session_id}/{page_num}: Analyze a comic page
- POST /translate-panels/{session_id}: Translate panel text
- POST /translate-and-generate-audio/{session_id}: Translate and generate audio
- GET /languages: Get supported languages
- GET /session/{session_id}/status: Get session status with preload statistics
- GET /session/{session_id}/preload-status/{page_num}: Get preload status for a specific page
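The GET endpoints listed above can be queried from a small client. The sketch below uses only the standard library and rests on assumptions: the helper names are hypothetical, the base URL matches the default local address, and responses are assumed to be JSON, which the source does not confirm. Request bodies for the POST endpoints are not documented here, so they are left out.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address from the setup steps


def session_status_url(session_id: str, base_url: str = BASE_URL) -> str:
    """Build the URL for GET /session/{session_id}/status."""
    return f"{base_url}/session/{quote(session_id, safe='')}/status"


def preload_status_url(
    session_id: str, page_num: int, base_url: str = BASE_URL
) -> str:
    """Build the URL for GET /session/{session_id}/preload-status/{page_num}."""
    return (
        f"{base_url}/session/{quote(session_id, safe='')}"
        f"/preload-status/{int(page_num)}"
    )


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the body as JSON (assumed response format)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `get_json(f"{BASE_URL}/languages")` would fetch the supported languages, and `get_json(preload_status_url(session_id, 3))` would poll the preload state of a single page.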
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Murf AI for text-to-speech and translation services
- OpenAI for vision analysis capabilities
- FastAPI for the web framework
- Bootstrap for the UI components