
Extract, process, and export YouTube podcast transcripts with AI-powered summaries. Built with Next.js, TypeScript, and yt-dlp. Features speaker detection, deduplication, TXT export format, and multi-LLM summary generation.


shrimpy8/youtube-transcript-processor


YouTube Podcast Transcript Processor

A Next.js application for extracting, processing, and exporting YouTube podcast transcripts with advanced features including speaker detection, deduplication, and TXT export format.

📋 Project Documentation

  • docs/PRD-UI.md - Product Requirements Document for the User Interface
  • docs/MILESTONES.md - Development milestones with validation criteria and success metrics

📸 Screenshots

Main Interface

Main application interface with URL input and processing options

AI Summary

AI-powered summary generation from multiple LLM providers

Transcript Viewer

Interactive transcript viewer with search and export options

🤖 AI Summary Examples

The ai_summary folder contains example summaries generated by different LLM providers for podcast episodes:

  • AI Summary Folder - Contains summaries from Anthropic Sonnet 4.5, Google Gemini 2.5 Flash, and Perplexity Sonar Online

⚡ Performance Optimizations

The application includes comprehensive performance optimizations:

Runtime Optimizations

  • Session-based caching: Channel data is cached in memory for 5 minutes, enabling instant tab switching
  • Request deduplication: Prevents duplicate concurrent API requests
  • Component memoization: React.memo and useMemo prevent unnecessary re-renders
  • Optimized video enrichment: Parallel processing for video metadata fetching
  • Tab persistence: Channel tab stays mounted once viewed for faster subsequent access
  • Debounce & throttle: Optimized user input handling and API calls
  • Lazy loading: Images and heavy components loaded on demand
  • Code splitting: Automatic bundle splitting for optimal loading
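The first two items above (a 5-minute in-memory cache and deduplication of concurrent requests) can be sketched together. This is a minimal illustration, not the project's channel-cache.ts: the class name `ChannelCache`, the method `load`, and the injectable clock are all assumptions made for the example.

```typescript
// Hypothetical sketch: names and shape are illustrative, not the project's API.
type CacheEntry<T> = { value: T; expiresAt: number };

class ChannelCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly inFlight = new Map<string, Promise<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // Default TTL is 5 minutes; the clock is injectable so expiry can be tested.
  constructor(ttlMs: number = 5 * 60 * 1000, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Returns a cached value, joins a request already in flight for the same
  // key, or starts a new request. Concurrent callers share one promise.
  load(key: string, loader: () => Promise<T>): Promise<T> {
    const cached = this.get(key);
    if (cached !== undefined) return Promise.resolve(cached);

    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const request = loader()
      .then((value) => {
        this.set(key, value);
        return value;
      })
      .finally(() => {
        this.inFlight.delete(key);
      });
    this.inFlight.set(key, request);
    return request;
  }
}
```

Keeping the in-flight map separate from the value map means a failed request is never cached, and the next caller simply retries.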

Build Optimizations

  • Bundle optimization: Webpack code splitting with vendor/common chunks
  • Image optimization: AVIF and WebP format support with caching
  • Font optimization: Font display swap for faster rendering
  • Tree shaking: Unused code elimination
  • SWC minification: Fast JavaScript minification

Performance Monitoring

  • Web Vitals tracking: FCP, LCP, FID, CLS, and TTFB monitoring
  • Performance metrics: Page load time, DOM content loaded time
  • Memory usage tracking: JavaScript heap size monitoring
  • Bundle size analysis: Resource size tracking and optimization
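To make the tracked metrics actionable, each reading is usually bucketed against a threshold. The sketch below uses the "good" and "poor" thresholds published on web.dev for these five metrics; whether the project's performance-monitor.ts applies the same cut-offs is an assumption, and `rateMetric` is a hypothetical helper name.

```typescript
// Hypothetical helper; thresholds follow the published web.dev guidance.
type MetricName = "FCP" | "LCP" | "FID" | "CLS" | "TTFB";
type Rating = "good" | "needs-improvement" | "poor";

// [good upper bound, needs-improvement upper bound].
// Times are in milliseconds; CLS is a unitless score.
const THRESHOLDS: Record<MetricName, [number, number]> = {
  FCP: [1800, 3000],
  LCP: [2500, 4000],
  FID: [100, 300],
  CLS: [0.1, 0.25],
  TTFB: [800, 1800],
};

function rateMetric(name: MetricName, value: number): Rating {
  const [good, poor] = THRESHOLDS[name];
  if (value <= good) return "good";
  return value <= poor ? "needs-improvement" : "poor";
}
```

For example, `rateMetric("LCP", 2000)` falls in the "good" bucket, while a TTFB of 2000 ms is rated "poor".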

🎨 User Interface

The application features a clean, modern interface with:

  • Tabbed interface: Video tab shows preview and transcript, Channel tab shows top 10 videos
  • Real-time processing: Visual feedback during transcript processing
  • Search functionality: Search within transcripts with highlighting
  • Export options: TXT format export with customizable options (metadata, timestamps)
  • Dark mode: Full dark mode support with system preference detection
  • Responsive design: Works seamlessly on mobile, tablet, and desktop
  • Loading skeletons: Smooth loading states for async content
  • Smooth animations: CSS transitions with reduced motion support
  • Micro-interactions: Visual feedback for all user actions
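The TXT export with optional metadata and timestamps could look roughly like the following. The interfaces, field names, and output layout are assumptions chosen for illustration; the actual export format may differ.

```typescript
// Hypothetical types and layout; only the two options come from the README.
interface Segment {
  startSeconds: number;
  speaker?: string;
  text: string;
}

interface VideoMeta {
  title: string;
  channel: string;
  url: string;
}

interface ExportOptions {
  includeMetadata: boolean;
  includeTimestamps: boolean;
}

// Formats seconds as MM:SS, or H:MM:SS once the video passes one hour.
function formatTimestamp(totalSeconds: number): string {
  const s = Math.floor(totalSeconds);
  const hours = Math.floor(s / 3600);
  const minutes = Math.floor((s % 3600) / 60);
  const seconds = s % 60;
  const pad = (n: number) => String(n).padStart(2, "0");
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${pad(minutes)}:${pad(seconds)}`;
}

function exportToTxt(
  meta: VideoMeta,
  segments: Segment[],
  options: ExportOptions,
): string {
  const lines: string[] = [];
  if (options.includeMetadata) {
    // Header block followed by a blank separator line.
    lines.push(`Title: ${meta.title}`, `Channel: ${meta.channel}`, `URL: ${meta.url}`, "");
  }
  for (const segment of segments) {
    const stamp = options.includeTimestamps
      ? `[${formatTimestamp(segment.startSeconds)}] `
      : "";
    const who = segment.speaker ? `${segment.speaker}: ` : "";
    lines.push(`${stamp}${who}${segment.text}`);
  }
  return lines.join("\n");
}
```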

♿ Accessibility

The application is built with accessibility in mind:

  • WCAG 2.1 AA compliant: Meets accessibility standards
  • Keyboard navigation: Full keyboard support for all interactions
  • Screen reader support: ARIA labels and semantic HTML
  • Focus management: Proper focus trapping and restoration
  • Color contrast: Meets WCAG contrast requirements
  • Skip links: Quick navigation for keyboard users
  • Reduced motion: Respects user's motion preferences

🎯 Current Status

Project Status: ✅ 100% Complete - All milestones achieved!

Backend Logic: ✅ 100% Complete

  • Transcript processing library with deduplication
  • Speaker detection (Host/Guest patterns)
  • TXT format export with customizable options
  • Utility functions for YouTube URL handling
  • yt-dlp integration for transcript fetching
  • Channel and playlist video discovery
  • Comprehensive error handling and edge case coverage
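Auto-generated captions often repeat text across consecutive cues, which is why deduplication is part of the processing library. The sketch below shows one simple approach under stated assumptions: it drops exact repeats and, for "rolling" cues that restate the previous cue before adding new words, keeps only the new words. The `Cue` shape and `dedupeCues` name are hypothetical, and the project's actual algorithm may be more elaborate.

```typescript
// Hypothetical sketch of caption deduplication; not the project's algorithm.
interface Cue {
  start: number;
  text: string;
}

// Case-insensitive comparison with collapsed whitespace.
const normalize = (text: string): string =>
  text.toLowerCase().replace(/\s+/g, " ").trim();

function dedupeCues(cues: Cue[]): Cue[] {
  const result: Cue[] = [];
  let previous = "";
  for (const cue of cues) {
    const current = normalize(cue.text);
    if (current === "" || current === previous) continue; // empty or exact repeat

    if (previous !== "" && current.startsWith(previous + " ")) {
      // Rolling caption: the cue restates the previous one, then adds words.
      const words = cue.text.trim().split(/\s+/);
      const repeatedCount = previous.split(" ").length;
      result.push({ start: cue.start, text: words.slice(repeatedCount).join(" ") });
    } else {
      result.push({ start: cue.start, text: cue.text.trim() });
    }
    previous = current;
  }
  return result;
}
```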

Frontend UI: ✅ 100% Complete

  • Complete UI with shadcn/ui components
  • URL input with real-time validation
  • Video preview with tabbed interface (Video/Channel tabs)
  • Processing options panel with localStorage persistence
  • Real-time transcript processing with progress tracking
  • Interactive transcript viewer with search functionality
  • Export controls for TXT format
  • Channel details with top 10 videos display
  • Performance optimizations (caching, memoization, request deduplication)
  • Dark mode support
  • Responsive design
  • Accessibility improvements (WCAG 2.1 AA compliant)
  • Mobile optimization with touch support
  • Performance monitoring and Web Vitals tracking
  • Cross-browser compatibility
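Search highlighting in a transcript viewer typically starts by splitting each line into matching and non-matching parts, which the UI then renders with or without a highlight. A minimal sketch, assuming case-insensitive literal matching; `splitByQuery` and `TextPart` are hypothetical names, not taken from TranscriptViewer.tsx.

```typescript
// Hypothetical helper for highlight rendering.
interface TextPart {
  text: string;
  match: boolean;
}

// Escape regex metacharacters so the query is matched literally.
function escapeRegExp(value: string): string {
  return value.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function splitByQuery(text: string, query: string): TextPart[] {
  const needle = query.trim();
  if (needle === "") return [{ text, match: false }];

  // A capturing group makes String.prototype.split keep the matched pieces.
  const pattern = new RegExp(`(${escapeRegExp(needle)})`, "gi");
  return text
    .split(pattern)
    .filter((part) => part !== "")
    .map((part) => ({
      text: part,
      match: part.toLowerCase() === needle.toLowerCase(),
    }));
}
```

A component can then map over the parts and wrap those with `match: true` in a `<mark>` element.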

🚀 Getting Started

Environment Setup

Before running the development server, you need to configure your environment variables. Create a .env.local file in the root directory:

# Copy the example and add your API keys
cp .env.example .env.local  # If .env.example exists
# Or create .env.local manually

Add your API keys to .env.local:

# Anthropic API Configuration (Required for AI Summary feature)
ANTHROPIC_API_KEY=sk-ant-your-api-key-here
ANTHROPIC_MODEL=claude-sonnet-4-5-20250929
ANTHROPIC_MODEL_NAME=Anthropic Sonnet 4.5

# Google Gemini API Configuration (Optional)
GOOGLE_GEMINI_API_KEY=your_google_gemini_api_key_here
GOOGLE_GEMINI_MODEL=gemini-2.5-flash
GOOGLE_GEMINI_MODEL_NAME=Google Gemini 2.5 Flash

# Perplexity API Configuration (Optional)
PERPLEXITY_API_KEY=your_perplexity_api_key_here
PERPLEXITY_MODEL=sonar-online
PERPLEXITY_MODEL_NAME=Perplexity Sonar Online

Note: The ANTHROPIC_API_KEY is required if you want to use the AI Summary feature. You can get your API key from Anthropic's Console.

For more details, see docs/ENV_VARIABLES.md.
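Since only some providers may be configured, server code needs a way to tell which ones are usable. The sketch below reads the variable names shown above from an environment object and returns the providers that have both an API key and a model set. The function name, the provider ids, and the fallback of the display name to the model id are assumptions for illustration.

```typescript
// Hypothetical helper; only the variable names come from the README.
type Env = Record<string, string | undefined>;

interface ProviderConfig {
  id: string;
  apiKey: string;
  model: string;
  displayName: string;
}

// Each provider's variables share a prefix: <PREFIX>_API_KEY, _MODEL, _MODEL_NAME.
const PROVIDER_PREFIXES: Array<{ id: string; prefix: string }> = [
  { id: "anthropic", prefix: "ANTHROPIC" },
  { id: "gemini", prefix: "GOOGLE_GEMINI" },
  { id: "perplexity", prefix: "PERPLEXITY" },
];

function getConfiguredProviders(env: Env): ProviderConfig[] {
  const configured: ProviderConfig[] = [];
  for (const { id, prefix } of PROVIDER_PREFIXES) {
    const apiKey = env[`${prefix}_API_KEY`]?.trim();
    const model = env[`${prefix}_MODEL`]?.trim();
    if (!apiKey || !model) continue; // skip providers that are not fully configured
    const displayName = env[`${prefix}_MODEL_NAME`]?.trim() || model;
    configured.push({ id, apiKey, model, displayName });
  }
  return configured;
}
```

On the server this would be called as `getConfiguredProviders(process.env)`; the keys should never be exposed to client-side code.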

Running the Development Server

npm run dev
# or
yarn dev
# or
pnpm dev
# or
bun dev

Open http://localhost:3000 with your browser to see the result.

You can start editing the page by modifying src/app/page.tsx. The page auto-updates as you edit the file.

This project uses next/font to automatically optimize and load Geist, a new font family for Vercel.

🛠️ Tech Stack

  • Framework: Next.js 15+ (App Router)
  • Language: TypeScript 5+
  • Styling: Tailwind CSS 4+
  • UI Components: shadcn/ui
  • React: 19+

📦 Features

✅ Core Features

  • ✅ YouTube URL validation and parsing (multiple formats)
  • ✅ Transcript processing with deduplication
  • ✅ Automatic speaker detection (Host/Guest)
  • ✅ TXT format export with customizable options
  • ✅ Single video transcript processing
  • ✅ Channel and playlist video browsing
  • ✅ Interactive transcript viewer with search
  • ✅ Real-time processing options with persistence
  • ✅ Channel information display with top 10 videos
  • ✅ AI-powered transcript summaries (Anthropic, Google Gemini, Perplexity)
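Host/Guest speaker detection can be approximated with simple text patterns. The sketch below is one possible heuristic, not the project's implementation: it honors explicit "Host:" or "Guest:" prefixes and otherwise treats a leading ">>" (a marker some caption tracks use for a speaker change) as a switch between the two roles. It also assumes the host speaks first. All names here are hypothetical.

```typescript
// Hypothetical heuristic; the real detection patterns are not documented here.
type Speaker = "Host" | "Guest";

interface LabeledLine {
  speaker: Speaker;
  text: string;
}

function labelSpeakers(lines: string[]): LabeledLine[] {
  const labeled: LabeledLine[] = [];
  let current: Speaker = "Host"; // assumption: the host opens the episode
  let isFirstLine = true;

  for (const raw of lines) {
    let text = raw.trim();
    const explicit = /^(host|guest)\s*:\s*/i.exec(text);

    if (explicit) {
      // An explicit label always wins over the alternation heuristic.
      current = (explicit[1] ?? "").toLowerCase() === "host" ? "Host" : "Guest";
      text = text.slice(explicit[0].length);
    } else if (text.startsWith(">>")) {
      // A speaker-change marker flips the role, except on the very first line.
      if (!isFirstLine) current = current === "Host" ? "Guest" : "Host";
      text = text.replace(/^>>\s*/, "");
    }

    isFirstLine = false;
    if (text !== "") labeled.push({ speaker: current, text });
  }
  return labeled;
}
```

A two-role alternation breaks down with three or more speakers, which is presumably what ML-based identification would address.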

✅ Performance & Optimization

  • ✅ Session-based caching for instant tab switching
  • ✅ Request deduplication to prevent duplicate API calls
  • ✅ Component memoization (React.memo, useMemo, useCallback)
  • ✅ Code splitting and lazy loading
  • ✅ Bundle optimization and tree shaking
  • ✅ Image optimization (AVIF, WebP)
  • ✅ Performance monitoring (Web Vitals tracking)
  • ✅ Debounce and throttle utilities
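For readers unfamiliar with the last item, debounce delays a call until input has been quiet for a set time, while throttle lets at most one call through per interval. The following is a generic sketch of both, not the contents of performance-utils.ts; the signatures and the injectable clock on `throttle` are assumptions.

```typescript
// Generic sketches of the two utilities; signatures are illustrative.

// Runs fn only after waitMs have passed without another call.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  waitMs: number,
): (...args: Args) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => {
      timer = undefined;
      fn(...args);
    }, waitMs);
  };
}

// Runs fn immediately, then ignores calls until intervalMs have elapsed.
function throttle<Args extends unknown[]>(
  fn: (...args: Args) => void,
  intervalMs: number,
  now: () => number = Date.now,
): (...args: Args) => void {
  let lastRun = Number.NEGATIVE_INFINITY;
  return (...args: Args) => {
    const time = now();
    if (time - lastRun >= intervalMs) {
      lastRun = time;
      fn(...args);
    }
  };
}
```

Debounce suits URL validation while the user is typing; throttle suits scroll or resize handlers where periodic updates are still wanted.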

✅ User Experience

  • ✅ Dark mode support with system preference detection
  • ✅ Responsive mobile design with touch optimization
  • ✅ Loading skeletons for smooth loading states
  • ✅ Smooth animations with reduced motion support
  • ✅ Micro-interactions and visual feedback
  • ✅ Error handling with recovery options
  • ✅ Empty states with helpful messages
  • ✅ Comprehensive error boundaries

✅ Accessibility

  • ✅ WCAG 2.1 AA compliance
  • ✅ Full keyboard navigation support
  • ✅ Screen reader optimization
  • ✅ ARIA labels on all interactive elements
  • ✅ Focus management and trapping
  • ✅ Color contrast compliance
  • ✅ Skip links for quick navigation

🚧 Future Enhancements

  • Transcript history and local storage
  • Advanced speaker identification (ML-based)
  • Transcript summarization
  • Multi-language support
  • Browser extension

📖 Development Milestones

See docs/MILESTONES.md for the detailed development plan. All 9 milestones are now complete:

  1. ✅ Foundation & Setup
  2. ✅ URL Input & Validation
  3. ✅ Transcript Fetching & API Integration
  4. ✅ Processing Options UI
  5. ✅ Processing Integration
  6. ✅ Transcript Viewer
  7. ✅ Export Functionality
  8. ✅ Error Handling & Edge Cases
  9. ✅ Polish & Optimization

Status: 🎉 100% Complete - All milestones achieved with comprehensive testing!

πŸ—οΈ Project Structure

src/
β”œβ”€β”€ app/                    # Next.js App Router
β”‚   β”œβ”€β”€ api/               # API routes
β”‚   β”‚   β”œβ”€β”€ transcript/    # Transcript fetching endpoints
β”‚   β”‚   β”œβ”€β”€ channel/       # Channel information endpoint
β”‚   β”‚   └── discover/      # Video discovery endpoint
β”‚   β”œβ”€β”€ layout.tsx         # Root layout with theme provider
β”‚   └── page.tsx           # Home page with main UI
β”œβ”€β”€ components/            # React components
β”‚   β”œβ”€β”€ ui/               # shadcn/ui components
β”‚   β”‚   └── skeleton.tsx  # Loading skeleton component
β”‚   β”œβ”€β”€ layout/           # Layout components (Header, Footer, Container)
β”‚   β”œβ”€β”€ features/         # Feature-specific components
β”‚   β”‚   β”œβ”€β”€ VideoPreview.tsx      # Video metadata and tabs
β”‚   β”‚   β”œβ”€β”€ ChannelDetails.tsx    # Channel info and top videos
β”‚   β”‚   β”œβ”€β”€ TranscriptViewer.tsx  # Transcript display with search
β”‚   β”‚   β”œβ”€β”€ ProcessingOptions.tsx # Processing configuration
β”‚   β”‚   β”œβ”€β”€ ExportControls.tsx    # Export functionality
β”‚   β”‚   β”œβ”€β”€ ErrorDisplay.tsx      # Error display component
β”‚   β”‚   β”œβ”€β”€ EmptyState.tsx        # Empty state components
β”‚   β”‚   └── RetryButton.tsx      # Retry action component
β”‚   └── ErrorBoundary.tsx # React error boundary
β”œβ”€β”€ lib/                   # Utility functions
β”‚   β”œβ”€β”€ transcript-processor.ts  # Processing logic
β”‚   β”œβ”€β”€ ytdlp-service.ts         # yt-dlp integration
β”‚   β”œβ”€β”€ api-client.ts            # API client with caching
β”‚   β”œβ”€β”€ channel-cache.ts          # Session-based caching
β”‚   β”œβ”€β”€ youtube-validator.ts    # URL validation
β”‚   β”œβ”€β”€ performance-utils.ts     # Performance utilities
β”‚   β”œβ”€β”€ accessibility-utils.ts   # Accessibility helpers
β”‚   β”œβ”€β”€ mobile-utils.ts          # Mobile optimization
β”‚   β”œβ”€β”€ performance-monitor.ts   # Performance monitoring
β”‚   β”œβ”€β”€ animations.ts            # Animation utilities
β”‚   └── utils.ts                # General utilities
β”œβ”€β”€ hooks/                 # Custom React hooks
β”‚   β”œβ”€β”€ useChannelData.ts        # Channel data with caching
β”‚   β”œβ”€β”€ useTranscriptProcessing.ts # Transcript processing
β”‚   β”œβ”€β”€ useProcessingOptions.ts  # Options management
β”‚   └── useUrlValidation.ts      # URL validation
└── types/                 # TypeScript definitions
    └── index.ts          # Type definitions
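As an illustration of what a module such as youtube-validator.ts might contain, here is a sketch that extracts a video ID from several common YouTube URL shapes. The exact set of formats the project accepts is not listed in this README, so the formats handled below (watch, youtu.be, embed, shorts, live) and the function name `extractVideoId` are assumptions.

```typescript
// Hypothetical sketch; the supported URL formats are assumed, not documented.

// YouTube video IDs are 11 characters from this alphabet.
const VIDEO_ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

function extractVideoId(input: string): string | null {
  let url: URL;
  try {
    url = new URL(input.trim());
  } catch {
    return null; // not a parseable absolute URL
  }

  const host = url.hostname.replace(/^(www\.|m\.)/, "");
  let candidate: string | null = null;

  if (host === "youtu.be") {
    // Short links: https://youtu.be/<id>
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      // Standard links: https://www.youtube.com/watch?v=<id>
      candidate = url.searchParams.get("v");
    } else {
      // Path-based links: /embed/<id>, /shorts/<id>, /live/<id>
      const match = /^\/(?:embed|shorts|live)\/([^/?#]+)/.exec(url.pathname);
      candidate = match ? match[1] ?? null : null;
    }
  }

  return candidate !== null && VIDEO_ID_PATTERN.test(candidate) ? candidate : null;
}
```

Parsing with the `URL` class rather than one large regular expression keeps host checks explicit, so look-alike domains are rejected.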

🧪 Testing

The project includes comprehensive testing:

  • Unit Tests: Vitest + React Testing Library (80%+ coverage)
  • Integration Tests: API routes and utility functions
  • E2E Tests: Playwright for user flows and cross-browser testing
  • Performance Tests: Web Vitals and bundle size monitoring
  • Accessibility Tests: WCAG compliance and keyboard navigation

Run tests:

npm test              # Unit tests
npm run test:coverage # With coverage report
npm run test:e2e      # E2E tests

📚 Documentation

📝 Learn More

🚒 Deployment

Deploy on Vercel

The easiest way to deploy your Next.js app is to use the Vercel Platform.

Pre-Deployment Checklist

  • Set all required environment variables in Vercel dashboard
  • Ensure yt-dlp binary is available in deployment environment
  • Verify API keys are configured correctly
  • Run npm run build locally to verify build succeeds
  • Run npm run test:e2e to verify E2E tests pass
  • Check bundle size meets performance targets (< 1MB initial JS)
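The yt-dlp item in the checklist matters because transcript fetching depends on the binary being on the PATH at runtime. The sketch below shows the kind of argument list a subtitle-only invocation might use. The flags are standard yt-dlp options, but the particular combination, the `SubtitleRequest` shape, and the function name are assumptions rather than what ytdlp-service.ts actually does.

```typescript
// Hypothetical sketch of a subtitle-only yt-dlp invocation.
interface SubtitleRequest {
  videoUrl: string;
  language: string; // e.g. "en"
  outputDir: string;
}

function buildSubtitleArgs(request: SubtitleRequest): string[] {
  return [
    "--skip-download",   // fetch subtitles only, not the media file
    "--write-subs",      // uploader-provided subtitles, if any
    "--write-auto-subs", // fall back to auto-generated captions
    "--sub-langs", request.language,
    "--sub-format", "vtt",
    "--no-playlist",     // treat the URL as a single video
    "--output", `${request.outputDir}/%(id)s.%(ext)s`,
    request.videoUrl,
  ];
}

// Usage (Node.js, server side only):
//   import { execFile } from "node:child_process";
//   execFile("yt-dlp", buildSubtitleArgs({ ... }), callback);
// execFile passes arguments without a shell, so the URL is not shell-interpreted.
```

Running `yt-dlp --version` in the deployment environment is a quick way to confirm the binary is present before relying on a call like this.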

Performance Targets

  • ✅ Page load time < 2 seconds
  • ✅ Lighthouse Performance score > 90
  • ✅ Lighthouse Accessibility score > 95
  • ✅ Bundle size < 1MB initial JavaScript
  • ✅ Memory usage < 100MB typical operations

🎉 Project Completion

This project has successfully completed all 9 development milestones with:

  • ✅ Comprehensive error handling and edge case coverage
  • ✅ Full accessibility compliance (WCAG 2.1 AA)
  • ✅ Performance optimizations and monitoring
  • ✅ Mobile-first responsive design
  • ✅ Cross-browser compatibility
  • ✅ Extensive test coverage (unit, integration, E2E)

Ready for production deployment! 🚀
