An intelligent EPUB chapter summarization tool that leverages OpenAI's GPT models to generate detailed, interview-ready notes from system design books and technical literature.
Note: The tool currently supports only the "System Design Interview" books by Alex Xu. Support for general EPUB books, and for a wider range of technical literature, is under development.
- Automatic Chapter Detection - Intelligently detects and extracts chapters from EPUB files
- AI-Powered Summarization - Generates comprehensive, interview-ready notes using OpenAI GPT models
- Smart Text Chunking - Handles large chapters by intelligently splitting text while respecting token limits
- Detailed Logging - Comprehensive logging system for tracking the summarization pipeline
- Interactive CLI - User-friendly command-line interface with chapter selection
- Markdown Output - Summaries saved as clean, readable markdown files
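The chunking feature above can be illustrated with a minimal sketch. It is not the tool's actual `chunker.py`: the function name `chunk_text`, the `max_tokens` default, and the rule that one whitespace-separated word counts as one token are all assumptions made for illustration. A real tokenizer would give different counts.

```python
from typing import List


def chunk_text(text: str, max_tokens: int = 200) -> List[str]:
    """Split text into chunks of at most `max_tokens` approximate tokens.

    Assumption: one whitespace-separated word counts as one token.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    chunks: List[str] = []
    current: List[str] = []  # paragraphs collected for the chunk in progress
    current_len = 0          # approximate token count of `current`

    # Prefer paragraph boundaries so that chunks stay readable.
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue

        if len(words) > max_tokens:
            # Flush what we have, then hard-split the oversized paragraph.
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            for start in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[start:start + max_tokens]))
            continue

        # Start a new chunk if this paragraph would exceed the limit.
        if current and current_len + len(words) > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0

        current.append(" ".join(words))
        current_len += len(words)

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries first keeps related sentences together. Only a paragraph that is itself longer than the limit is cut mid-paragraph.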
- Python 3.8+
- OpenAI API key
- Clone the repository:
git clone <repository-url>
cd ChapterSummarizer
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip install -r requirements.txt
- Create a .env file in the project root:
OPENAI_API_KEY=your_api_key_here

Run the tool and select a chapter interactively:
python main.py path/to/book.epub
For example:
python main.py data/system_design_interview_alex_xu_1.epub

View all detected chapters without summarizing:
python main.py path/to/book.epub --list

Directly summarize a specific chapter by number:
python main.py path/to/book.epub --chapter 3

Specify where to save summaries:
python main.py path/to/book.epub --chapter 3 --output-dir my_summaries

Control verbosity of logging output:
python main.py path/to/book.epub --log-level DEBUG

The project follows a modular pipeline architecture:
┌─────────────┐
│   main.py   │  Entry point & CLI
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ pipeline.py │  Orchestrates the workflow
└──────┬──────┘
       │
       ├─► epub_loader.py ──────────► Load EPUB & extract HTML
       │
       ├─► chapter_extractor.py ────► Detect & extract chapters
       │
       ├─► chunker.py ──────────────► Split text into token-limited chunks
       │
       ├─► summarizer.py ───────────► Generate AI summaries
       │
       └─► config.py ───────────────► Configuration & prompts
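The flow in the diagram could be wired together roughly as in the sketch below. The function `run_pipeline` and its callable parameters are hypothetical stand-ins for the real modules, whose actual signatures are not shown in this README. The sketch also assumes that chapter numbers are 1-based.

```python
from typing import Callable, List


def run_pipeline(
    epub_path: str,
    chapter_number: int,
    load_html: Callable[[str], List[str]],               # stand-in for epub_loader.py
    extract_chapters: Callable[[List[str]], List[str]],  # stand-in for chapter_extractor.py
    chunk: Callable[[str], List[str]],                   # stand-in for chunker.py
    summarize: Callable[[str], str],                     # stand-in for summarizer.py
) -> str:
    """Load a book, pick one chapter, and summarize it chunk by chunk."""
    html_documents = load_html(epub_path)
    chapters = extract_chapters(html_documents)

    # Assumption: chapters are numbered from 1 on the command line.
    if not 1 <= chapter_number <= len(chapters):
        raise ValueError(f"chapter must be between 1 and {len(chapters)}")

    chapter_text = chapters[chapter_number - 1]
    partial_summaries = [summarize(piece) for piece in chunk(chapter_text)]
    return "\n\n".join(partial_summaries)


if __name__ == "__main__":
    # Tiny fake stages, so the flow can be exercised without an EPUB or an API key.
    summary = run_pipeline(
        "book.epub",
        1,
        load_html=lambda path: ["<h1>One</h1>", "<h1>Two</h1>"],
        extract_chapters=lambda docs: ["First part. Second part.", "Other chapter."],
        chunk=lambda text: text.split(". "),
        summarize=lambda piece: f"- {piece.strip('.')}",
    )
    print(summary)
```

Passing each stage in as a callable keeps the orchestrator free of EPUB and API details, which makes it easy to test with fakes like the ones in the usage block.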
The generated summaries include:
- Clear, descriptive title
- Comprehensive summary with technical details
- Keywords, terms, and acronyms
- Bullet-point breakdowns
- Diagrams and code snippets (where relevant)
- Real-life examples and case studies
- Common pitfalls and best practices
- Interview tips and takeaways
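One plausible way to get these sections into every summary is to list them in the prompt sent to the model. The sketch below is an assumption about how that could look. The names `SUMMARY_SECTIONS` and `build_prompt` are hypothetical, and the wording is not taken from the project's `config.py`.

```python
from typing import List, Optional

# Section names copied from the list above.
SUMMARY_SECTIONS: List[str] = [
    "Clear, descriptive title",
    "Comprehensive summary with technical details",
    "Keywords, terms, and acronyms",
    "Bullet-point breakdowns",
    "Diagrams and code snippets (where relevant)",
    "Real-life examples and case studies",
    "Common pitfalls and best practices",
    "Interview tips and takeaways",
]


def build_prompt(chunk_text: str, sections: Optional[List[str]] = None) -> str:
    """Build a prompt asking for markdown notes with the given sections."""
    if sections is None:
        sections = SUMMARY_SECTIONS
    lines = [
        "Summarize the following text as interview-ready notes in markdown.",
        "Include these sections:",
    ]
    lines += [f"{number}. {name}" for number, name in enumerate(sections, start=1)]
    lines += ["", "Text:", chunk_text]
    return "\n".join(lines)
```

Keeping the section list as data rather than hard-coding it into the prompt string would also make it straightforward to swap in a different template later.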
ChapterSummarizer/
├── main.py # CLI entry point
├── pipeline.py # Summarization orchestrator
├── epub_loader.py # EPUB parsing
├── chapter_extractor.py # Chapter detection
├── chunker.py # Text chunking
├── summarizer.py # OpenAI interface
├── config.py # Configuration
├── logger_config.py # Logging setup
├── .env # Environment variables (not in git)
├── .gitignore # Git ignore rules
├── data/ # EPUB files (optional)
├── summaries/ # Generated summaries
└── logs/ # Application logs
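As an illustration of what the CLI entry point might contain, the sketch below defines the flags shown in the usage examples (`--list`, `--chapter`, `--output-dir`, `--log-level`) with the standard-library `argparse` module. The function name `build_parser`, the help texts, and the default values (`summaries`, `INFO`) are assumptions, not taken from the real `main.py`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Summarize a chapter of an EPUB book.",
    )
    parser.add_argument("epub_path", help="Path to the EPUB file")
    parser.add_argument(
        "--list",
        action="store_true",
        help="List detected chapters and exit without summarizing",
    )
    parser.add_argument(
        "--chapter",
        type=int,
        default=None,
        help="Chapter number to summarize (omit to choose interactively)",
    )
    parser.add_argument(
        "--output-dir",
        default="summaries",  # assumed default, matching the summaries/ folder
        help="Directory in which to save the summary",
    )
    parser.add_argument(
        "--log-level",
        default="INFO",  # assumed default
        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
        help="Verbosity of logging output",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```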
Contributions are welcome! Areas for improvement:
- Support for additional ebook formats (PDF, MOBI)
- Parallel chunk processing
- Summary quality metrics
- Custom summarization templates
- Export to different formats (PDF, HTML)
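For anyone picking up the parallel chunk processing item, a thread pool is one possible starting point, since summarizing a chunk is mostly waiting on a network call. The sketch below is only a suggestion. The function name `summarize_chunks_parallel` is hypothetical, and `summarize` stands in for whatever function sends one chunk to the model.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def summarize_chunks_parallel(
    chunks: List[str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Summarize chunks concurrently, returning results in input order."""
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order of the inputs,
        # even if later chunks finish first.
        return list(pool.map(summarize, chunks))
```

A real implementation would probably also need to respect API rate limits and retry failed requests, which this sketch does not attempt.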