An intelligent EPUB chapter summarization tool that leverages OpenAI's GPT models to generate detailed, interview-ready notes from system design books and technical literature.

sahanmndl/ChapterSummarizer

ChapterSummarizer

Note: This tool currently supports only the "System Design Interview" books by Alex Xu. Support for general EPUB books is under development, and future updates will extend compatibility to a wider range of technical literature.

Features

  • Automatic Chapter Detection - Intelligently detects and extracts chapters from EPUB files
  • AI-Powered Summarization - Generates comprehensive, interview-ready notes using OpenAI GPT models
  • Smart Text Chunking - Handles large chapters by intelligently splitting text while respecting token limits
  • Detailed Logging - Comprehensive logging system for tracking the summarization pipeline
  • Interactive CLI - User-friendly command-line interface with chapter selection
  • Markdown Output - Summaries saved as clean, readable markdown files
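
The chunking feature can be pictured with a minimal sketch. It is not the code in chunker.py: the function names are hypothetical, and token counts are approximated as one token per whitespace-separated word, which is an assumption (a real tokenizer would give different counts).

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate: one token per whitespace-separated word (an assumption)."""
    return len(text.split())


def chunk_text(text: str, max_tokens: int = 1000) -> List[str]:
    """Group paragraphs into chunks whose estimated size stays within max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for paragraph in text.split("\n\n"):
        words = paragraph.split()
        if not words:
            continue
        # A paragraph larger than the limit is split on word boundaries.
        pieces = [
            " ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)
        ]
        for piece in pieces:
            size = estimate_tokens(piece)
            # Close the current chunk if this piece would push it over the limit.
            if current and current_tokens + size > max_tokens:
                chunks.append("\n\n".join(current))
                current, current_tokens = [], 0
            current.append(piece)
            current_tokens += size
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries first keeps related sentences together, and the word-level split is only a fallback for paragraphs that exceed the limit on their own.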

Quick Start

Prerequisites

  • Python 3.8+
  • OpenAI API key

Installation

  1. Clone the repository:
git clone <repository-url>
cd ChapterSummarizer
  2. Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt
  4. Create a .env file in the project root:
OPENAI_API_KEY=your_api_key_here
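
As a rough illustration of how the key in .env might be picked up at runtime, here is a standard-library sketch. The project may well rely on a dedicated library for this; `load_env_file` and `get_api_key` are hypothetical helpers and not part of the repository.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or load_env_file(path).get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or the environment")
    return key
```

Failing early with a clear message when the key is missing avoids a less obvious error later, at the first API call.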

Usage

Interactive Mode

Run the tool and select a chapter interactively:

python main.py path/to/book.epub

For example:
python main.py data/system_design_interview_alex_xu_1.epub

List All Chapters

View all detected chapters without summarizing:

python main.py path/to/book.epub --list

Summarize Specific Chapter

Directly summarize a specific chapter by number:

python main.py path/to/book.epub --chapter 3

Custom Output Directory

Specify where to save summaries:

python main.py path/to/book.epub --chapter 3 --output-dir my_summaries

Adjust Logging Level

Control verbosity of logging output:

python main.py path/to/book.epub --log-level DEBUG
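
The options shown above could be declared with argparse roughly as follows. This is a sketch, not the contents of main.py: the default values ("summaries" and "INFO") and the help strings are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional EPUB path and the four documented options."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Summarize chapters of an EPUB book",
    )
    parser.add_argument("epub_path", help="Path to the EPUB file")
    parser.add_argument("--list", action="store_true",
                        help="List detected chapters without summarizing")
    parser.add_argument("--chapter", type=int,
                        help="Number of the chapter to summarize")
    # Default directory is an assumption based on the summaries/ folder.
    parser.add_argument("--output-dir", default="summaries",
                        help="Directory where summaries are saved")
    # Default level is an assumption; the choices are the standard logging levels.
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Logging verbosity")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

When `--chapter` is omitted, `args.chapter` is `None`, which a caller could use as the signal to fall back to interactive selection.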

Architecture

The project follows a modular pipeline architecture:

┌─────────────┐
│  main.py    │  Entry point & CLI
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ pipeline.py │  Orchestrates the workflow
└──────┬──────┘
       │
       ├─► epub_loader.py ────► Load EPUB & extract HTML
       │
       ├─► chapter_extractor.py ──► Detect & extract chapters
       │
       ├─► chunker.py ──────────► Split text into token-limited chunks
       │
       ├─► summarizer.py ───────► Generate AI summaries
       │
       └─► config.py ────────────► Configuration & prompts
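
The flow in the diagram can be sketched as a single orchestration function. This is an illustration under stated assumptions, not the code in pipeline.py: each stage is passed in as a callable, chapter numbers are assumed to be 1-based, and partial summaries are simply joined.

```python
from typing import Callable, List


def run_pipeline(
    epub_path: str,
    chapter_number: int,
    load_html: Callable[[str], List[str]],
    extract_chapters: Callable[[List[str]], List[str]],
    chunk: Callable[[str], List[str]],
    summarize: Callable[[str], str],
) -> str:
    """Load the book, pick one chapter, chunk it, and summarize each chunk."""
    html_docs = load_html(epub_path)          # epub_loader stage
    chapters = extract_chapters(html_docs)    # chapter_extractor stage
    if not 1 <= chapter_number <= len(chapters):
        raise ValueError(f"chapter {chapter_number} not found")
    chunks = chunk(chapters[chapter_number - 1])  # chunker stage
    partials = [summarize(c) for c in chunks]     # summarizer stage
    # Joining partial summaries is an assumption about the final merge step.
    return "\n\n".join(partials)
```

Passing the stages as parameters keeps the sketch self-contained and makes each stage easy to replace with a stub in tests.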

Summary Output Format

The generated summaries include:

  1. Clear, descriptive title
  2. Comprehensive summary with technical details
  3. Keywords, terms, and acronyms
  4. Bullet-point breakdowns
  5. Diagrams and code snippets (where relevant)
  6. Real-life examples and case studies
  7. Common pitfalls and best practices
  8. Interview tips and takeaways
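
One way such a format could be requested from the model is a prompt that enumerates the eight sections. The wording below is a hypothetical template, not the prompt stored in config.py, and `build_prompt` is an illustrative name.

```python
from typing import List

# Section names taken from the list above.
SUMMARY_SECTIONS: List[str] = [
    "Clear, descriptive title",
    "Comprehensive summary with technical details",
    "Keywords, terms, and acronyms",
    "Bullet-point breakdowns",
    "Diagrams and code snippets (where relevant)",
    "Real-life examples and case studies",
    "Common pitfalls and best practices",
    "Interview tips and takeaways",
]


def build_prompt(chapter_title: str, chunk_text: str) -> str:
    """Assemble a prompt asking for markdown notes with the listed sections."""
    numbered = "\n".join(
        f"{i}. {section}" for i, section in enumerate(SUMMARY_SECTIONS, start=1)
    )
    return (
        f"Write interview-ready notes in markdown for the chapter "
        f"'{chapter_title}'.\n"
        f"Include the following sections:\n{numbered}\n\n"
        f"Chapter text:\n{chunk_text}"
    )
```

Keeping the section list in one constant means the prompt and any later validation of the output can share the same source.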

Project Structure

ChapterSummarizer/
├── main.py                  # CLI entry point
├── pipeline.py              # Summarization orchestrator
├── epub_loader.py           # EPUB parsing
├── chapter_extractor.py     # Chapter detection
├── chunker.py               # Text chunking
├── summarizer.py            # OpenAI interface
├── config.py                # Configuration
├── logger_config.py         # Logging setup
├── .env                     # Environment variables (not in git)
├── .gitignore               # Git ignore rules
├── data/                    # EPUB files (optional)
├── summaries/               # Generated summaries
└── logs/                    # Application logs

Contributing

Contributions are welcome! Areas for improvement:

  • Support for additional ebook formats (PDF, MOBI)
  • Parallel chunk processing
  • Summary quality metrics
  • Custom summarization templates
  • Export to different formats (PDF, HTML)
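
For contributors looking at parallel chunk processing, a thread pool is one possible starting point, since summarization calls are network-bound. The sketch below is not existing project code; `summarize_chunks_parallel` and its `max_workers` default are hypothetical, and a real version would also need to respect API rate limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def summarize_chunks_parallel(
    chunks: List[str],
    summarize: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Summarize chunks concurrently while keeping the original chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(summarize, chunks))
```

Because results come back in input order, the partial summaries can be merged exactly as in the sequential case.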
