ChapterSummarizer

An EPUB chapter summarization tool that uses OpenAI's GPT models to generate detailed, interview-ready notes from system design books and other technical literature.

Note: The tool currently supports only the "System Design Interview" books by Alex Xu. Support for general EPUB books, and with it a wider range of technical literature, is under development.

Features

Automatic Chapter Detection - Intelligently detects and extracts chapters from EPUB files
AI-Powered Summarization - Generates comprehensive, interview-ready notes using OpenAI GPT models
Smart Text Chunking - Handles large chapters by intelligently splitting text while respecting token limits
Detailed Logging - Comprehensive logging system for tracking the summarization pipeline
Interactive CLI - User-friendly command-line interface with chapter selection
Markdown Output - Summaries saved as clean, readable markdown files
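
To make the chunking feature concrete, here is a minimal sketch of splitting text under a token limit. It assumes a crude one-word-per-token estimate and paragraph-level packing; the project's chunker.py may count tokens with a real tokenizer and split differently. The names `estimate_tokens`, `chunk_text`, and `max_tokens` are illustrative, not taken from the code base.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Crude estimate: one token per whitespace-separated word (assumption)."""
    return len(text.split())


def chunk_text(text: str, max_tokens: int = 1000) -> List[str]:
    """Greedily pack paragraphs into chunks of at most max_tokens estimated tokens.

    Hypothetical helper: a paragraph longer than max_tokens is split on
    word boundaries so that no chunk exceeds the limit.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    chunks: List[str] = []
    current: List[str] = []
    current_tokens = 0

    for paragraph in (p.strip() for p in text.split("\n\n")):
        if not paragraph:
            continue
        words = paragraph.split()
        # Break an oversized paragraph into word windows that fit the limit.
        pieces = [
            " ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)
        ]
        for piece in pieces:
            size = estimate_tokens(piece)
            # Start a new chunk when adding this piece would exceed the limit.
            if current and current_tokens + size > max_tokens:
                chunks.append("\n\n".join(current))
                current, current_tokens = [], 0
            current.append(piece)
            current_tokens += size

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Packing whole paragraphs first keeps related sentences together, which tends to give the model more coherent input than cutting at a fixed character offset.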

Quick Start

Prerequisites

Python 3.8+
OpenAI API key

Installation

Clone the repository:

git clone <repository-url>
cd ChapterSummarizer

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Create a .env file in the project root:

OPENAI_API_KEY=your_api_key_here
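
How the tool reads this file is not shown here, so the following is only a standard-library sketch of one way a KEY=VALUE file could be loaded into the environment. The helpers `load_env_file` and `get_api_key` are hypothetical; the project may rely on a dedicated package for this instead.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines and copy them into os.environ.

    Hypothetical helper: blank lines and lines starting with '#' are
    skipped, and variables that are already set are not overwritten.
    """
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


def get_api_key() -> str:
    """Return the OpenAI API key or fail with a readable message."""
    key = os.environ.get("OPENAI_API_KEY", "")
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear message avoids a less obvious authentication error later, once the first summarization request is sent.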

Usage

Interactive Mode

Run the tool and select a chapter interactively:

python main.py path/to/book.epub

For example:

python main.py data/system_design_interview_alex_xu_1.epub

List All Chapters

View all detected chapters without summarizing:

python main.py path/to/book.epub --list
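
For intuition about what a chapter listing involves, here is a rough standard-library sketch. An EPUB file is a ZIP archive of (X)HTML documents, so the sketch opens the archive and takes the first h1 or h2 heading of each document as a chapter title. This is an assumption-laden approximation: it visits files in name order rather than the book's declared reading order, and the project's epub_loader.py and chapter_extractor.py may work quite differently. `FirstHeadingParser` and `list_chapter_titles` are hypothetical names.

```python
import zipfile
from html.parser import HTMLParser
from typing import List, Optional


class FirstHeadingParser(HTMLParser):
    """Collect the text of the first non-empty <h1> or <h2> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_heading = False
        self._parts: List[str] = []
        self.heading: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if self.heading is None and tag in ("h1", "h2"):
            self._in_heading = True

    def handle_data(self, data):
        if self._in_heading:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if self._in_heading and tag in ("h1", "h2"):
            # Collapse internal whitespace so titles print on one line.
            text = " ".join("".join(self._parts).split())
            self._in_heading = False
            self._parts = []
            if text:
                self.heading = text


def list_chapter_titles(epub_path: str) -> List[str]:
    """Return the first heading of each (X)HTML file in the EPUB archive."""
    titles: List[str] = []
    with zipfile.ZipFile(epub_path) as archive:
        # Name order is only an approximation of reading order (assumption).
        for name in sorted(archive.namelist()):
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            parser = FirstHeadingParser()
            parser.feed(archive.read(name).decode("utf-8", errors="replace"))
            if parser.heading:
                titles.append(parser.heading)
    return titles
```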

Summarize Specific Chapter

Directly summarize a specific chapter by number:

python main.py path/to/book.epub --chapter 3

Custom Output Directory

Specify where to save summaries:

python main.py path/to/book.epub --chapter 3 --output-dir my_summaries
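
As an illustration of writing a summary into a chosen directory, the sketch below creates the directory if needed and saves one markdown file per chapter. The file naming scheme and the helpers `slugify` and `save_summary` are assumptions made for this example, not the tool's documented behavior.

```python
import re
from pathlib import Path


def slugify(title: str) -> str:
    """Lowercase the title and replace runs of other characters with '_'."""
    slug = re.sub(r"[^a-z0-9]+", "_", title.lower()).strip("_")
    return slug or "chapter"


def save_summary(title: str, markdown_body: str, output_dir: str = "summaries") -> Path:
    """Write the summary to <output_dir>/<slug>.md and return the file path.

    Hypothetical helper: the default directory name mirrors the
    summaries/ folder of the repository, the file name is an assumption.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slugify(title)}.md"
    path.write_text(f"# {title}\n\n{markdown_body.strip()}\n", encoding="utf-8")
    return path
```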

Adjust Logging Level

Control verbosity of logging output:

python main.py path/to/book.epub --log-level DEBUG
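
The options shown in this section could be declared with argparse roughly as follows. The flag names come from the commands above; the default values, help texts, and the function names `build_parser` and `parse_args` are assumptions, so treat this as a sketch rather than the contents of main.py.

```python
import argparse
import logging
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Declare the positional EPUB path and the four optional flags."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Summarize chapters of an EPUB book.",
    )
    parser.add_argument("epub_path", help="Path to the EPUB file")
    parser.add_argument("--list", action="store_true",
                        help="List detected chapters and exit")
    parser.add_argument("--chapter", type=int, default=None,
                        help="Chapter number; omit to choose interactively")
    # Both defaults below are assumptions made for this sketch.
    parser.add_argument("--output-dir", default="summaries",
                        help="Directory for generated summaries")
    parser.add_argument("--log-level", default="INFO",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
                        help="Logging verbosity")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse arguments and configure the root logger at the requested level."""
    args = build_parser().parse_args(argv)
    logging.basicConfig(level=getattr(logging, args.log_level))
    return args
```

Restricting `--log-level` to a fixed set of choices makes a typo fail at parse time instead of surfacing later as an attribute error.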

Architecture

The project follows a modular pipeline architecture:

┌─────────────┐
│  main.py    │  Entry point & CLI
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ pipeline.py │  Orchestrates the workflow
└──────┬──────┘
       │
       ├─► epub_loader.py ────► Load EPUB & extract HTML
       │
       ├─► chapter_extractor.py ──► Detect & extract chapters
       │
       ├─► chunker.py ──────────► Split text into token-limited chunks
       │
       ├─► summarizer.py ───────► Generate AI summaries
       │
       └─► config.py ────────────► Configuration & prompts
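
The flow in the diagram can be sketched as a single orchestration function. Each stage is passed in as a callable so the example stays self-contained; the `Chapter` type, the function `run_pipeline`, and all parameter names are hypothetical and do not reflect the real signatures in pipeline.py.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chapter:
    """Hypothetical container for one detected chapter."""
    number: int
    title: str
    text: str


def run_pipeline(
    epub_path: str,
    chapter_number: int,
    *,
    load_html: Callable[[str], List[str]],
    extract_chapters: Callable[[List[str]], List[Chapter]],
    chunk_text: Callable[[str], List[str]],
    summarize_chunk: Callable[[str], str],
) -> str:
    """Load, extract, chunk, and summarize one chapter; return the joined notes."""
    html_documents = load_html(epub_path)          # epub_loader stage
    chapters = extract_chapters(html_documents)    # chapter_extractor stage
    matches = [c for c in chapters if c.number == chapter_number]
    if not matches:
        raise ValueError(f"Chapter {chapter_number} not found")
    chapter = matches[0]
    pieces = chunk_text(chapter.text)              # chunker stage
    partial_summaries = [summarize_chunk(p) for p in pieces]  # summarizer stage
    return "\n\n".join(partial_summaries)
```

Passing the stages in as arguments also makes the flow easy to test with stubs, without an EPUB file or an API key.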

Summary Output Format

The generated summaries include:

Clear, descriptive title
Comprehensive summary with technical details
Keywords, terms, and acronyms
Bullet-point breakdowns
Diagrams and code snippets (where relevant)
Real-life examples and case studies
Common pitfalls and best practices
Interview tips and takeaways
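
One way to keep this format consistent is to hold the list of parts in one place and build the instruction text from it. The sketch below does that; the prompt wording and the names `SUMMARY_SECTIONS` and `build_prompt` are assumptions, and the actual prompts live in config.py.

```python
from typing import List

# The parts listed above, kept in one place (the constant name is hypothetical).
SUMMARY_SECTIONS: List[str] = [
    "Clear, descriptive title",
    "Comprehensive summary with technical details",
    "Keywords, terms, and acronyms",
    "Bullet-point breakdowns",
    "Diagrams and code snippets (where relevant)",
    "Real-life examples and case studies",
    "Common pitfalls and best practices",
    "Interview tips and takeaways",
]


def build_prompt(chapter_title: str, chunk: str) -> str:
    """Compose an instruction that requests every part, followed by the text."""
    numbered = "\n".join(
        f"{index}. {section}"
        for index, section in enumerate(SUMMARY_SECTIONS, start=1)
    )
    return (
        f"Summarize the following excerpt from the chapter '{chapter_title}' "
        "as markdown notes. Include these parts:\n"
        f"{numbered}\n\n"
        f"Excerpt:\n{chunk.strip()}\n"
    )
```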

Project Structure

ChapterSummarizer/
├── main.py                  # CLI entry point
├── pipeline.py              # Summarization orchestrator
├── epub_loader.py           # EPUB parsing
├── chapter_extractor.py     # Chapter detection
├── chunker.py               # Text chunking
├── summarizer.py            # OpenAI interface
├── config.py                # Configuration
├── logger_config.py         # Logging setup
├── requirements.txt         # Python dependencies
├── README.md                # Project documentation
├── .env                     # Environment variables (not in git)
├── .gitignore               # Git ignore rules
├── data/                    # EPUB files (optional)
├── summaries/               # Generated summaries
└── logs/                    # Application logs

Contributing

Contributions are welcome! Areas for improvement:

Support for additional ebook formats (PDF, MOBI)
Parallel chunk processing
Summary quality metrics
Custom summarization templates
Export to different formats (PDF, HTML)
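
As a starting point for the parallel chunk processing idea, the sketch below summarizes chunks concurrently with a thread pool. It assumes each chunk can be summarized independently of the others; `summarize_chunks_parallel` and its parameters are hypothetical names, and rate limits on the API side may call for a small `max_workers`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def summarize_chunks_parallel(
    chunks: List[str],
    summarize_chunk: Callable[[str], str],
    max_workers: int = 4,
) -> List[str]:
    """Summarize chunks concurrently; results keep the order of the input."""
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, not completion order.
        return list(pool.map(summarize_chunk, chunks))
```

Threads suit this workload because the time is spent waiting on network responses rather than on computation.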

Repository: sahanmndl/ChapterSummarizer
