Real-time audio transcription and AI-powered meeting summaries using Next.js, Socket.io, and Google Gemini
ScribeAI transforms live audio into searchable, summarized transcripts in real-time. Capture meeting audio from your microphone or browser tabs (Google Meet, Zoom, YouTube) and receive instant AI-powered transcriptions with automatic summaries.
Built for: AttackCapital Technical Assignment
Timeline: 4 days
Status: ✅ Production-ready prototype
- Real-time Transcription - Streams live audio to the Gemini API in 5-second chunks
- Dual Input Sources - Microphone recording or browser tab audio capture (Google Meet, Zoom, Spotify, YouTube)
- Pause/Resume - Control recording flow with state preservation
- AI-Powered Summaries - Automatic generation of key points, action items, and decisions
- Session Management - Complete history with search, filter, and export capabilities
- Live UI Updates - Real-time transcript display via Socket.io
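To illustrate how pause/resume with state preservation could work alongside 5-second chunking, here is a minimal sketch of a recording state machine. It is not taken from the ScribeAI source: every name in it (`RecordingState`, `RecordingEvent`, `reduceRecording`, `CHUNK_INTERVAL_MS`) is hypothetical, and the real logic in `useRecording.ts` and `recordingStore.ts` may be organized quite differently.

```typescript
// Hypothetical sketch; names and structure are assumptions, not ScribeAI's actual code.
type RecordingStatus = "idle" | "recording" | "paused" | "stopped";

interface RecordingState {
  status: RecordingStatus;
  chunksTranscribed: number; // audio chunks that have come back as text
  transcript: string[];      // transcript segments received so far
}

type RecordingEvent =
  | { type: "start" }
  | { type: "pause" }
  | { type: "resume" }
  | { type: "stop" }
  | { type: "chunkTranscribed"; text: string };

// The feature list mentions 5s chunk intervals.
const CHUNK_INTERVAL_MS = 5_000;

const initialRecordingState: RecordingState = {
  status: "idle",
  chunksTranscribed: 0,
  transcript: [],
};

// Pure transition function: invalid transitions leave the state unchanged,
// so pausing never discards the transcript collected so far.
function reduceRecording(state: RecordingState, event: RecordingEvent): RecordingState {
  switch (event.type) {
    case "start":
      return state.status === "idle" ? { ...state, status: "recording" } : state;
    case "pause":
      return state.status === "recording" ? { ...state, status: "paused" } : state;
    case "resume":
      return state.status === "paused" ? { ...state, status: "recording" } : state;
    case "stop":
      return state.status === "recording" || state.status === "paused"
        ? { ...state, status: "stopped" }
        : state;
    case "chunkTranscribed":
      // A chunk sent just before a pause may still be transcribed afterwards.
      if (state.status !== "recording" && state.status !== "paused") return state;
      return {
        ...state,
        chunksTranscribed: state.chunksTranscribed + 1,
        transcript: [...state.transcript, event.text],
      };
  }
}

// Approximate amount of audio covered by the transcript so far.
function transcribedAudioMs(state: RecordingState): number {
  return state.chunksTranscribed * CHUNK_INTERVAL_MS;
}
```

Keeping the transitions in a pure function like this makes them easy to unit test and to plug into a store such as Zustand, though that wiring is left out here.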
Frontend
- Next.js 16.0.1
- Tailwind CSS + shadcn/ui
- Zustand (State Management)
- Socket.io Client
Backend
- Node.js 20+ with TypeScript
- Socket.io Server
- Express
- Prisma ORM
AI & APIs
- Google Gemini 2.5 Flash (Transcription)
- Gemini via Vercel AI SDK (Structured Summaries)
Database
- PostgreSQL
Authentication
- Better Auth
DevOps
- pnpm (Package Manager)
- Turbopack (Next.js Dev Server)
- Node.js 20+
- PostgreSQL 15+ (local or cloud)
- pnpm installed (npm install -g pnpm)
- Google Gemini API Key (Get one free)
- Clone Repository
git clone https://github.com/RitamPal26/ScribeAI.git
cd ScribeAI
- Install Dependencies
pnpm install
- Set Up Database
a. Create Supabase Project:
- Go to Supabase Dashboard
- Click "New Project"
- Choose a name and set a secure database password
- Wait for project initialization (~2 minutes)
b. Get Database URL:
- Go to Project Settings → Database
- Scroll to Connection String section
- Copy the Connection pooling URI (recommended for production)
- Format:
postgresql://postgres.[project-ref]:[password]@aws-0-[region].pooler.supabase.com:6543/postgres
- Configure Environment Variables
cp .env.example .env
Then fill in all the required variables in .env.
- Start Development Server
pnpm dev
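If the dev server fails on startup, a missing or malformed environment variable is a likely cause. The sketch below shows one way to sanity-check the configuration before running it. The variable names `DATABASE_URL` and `GEMINI_API_KEY` are assumptions (the authoritative list is in .env.example), the helper names are hypothetical, and the connection string in the usage note is a made-up placeholder.

```typescript
// Hypothetical helper; variable names are assumptions, check .env.example for the real ones.
const REQUIRED_VARS = ["DATABASE_URL", "GEMINI_API_KEY"];

type Env = Record<string, string | undefined>;

// Returns the names of required variables that are unset or blank.
function missingVars(env: Env, required: string[] = REQUIRED_VARS): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

interface DatabaseUrlInfo {
  host: string;
  port: number;
  database: string;
}

// Parses a PostgreSQL connection string and returns its non-secret parts.
function describeDatabaseUrl(raw: string): DatabaseUrlInfo {
  const url = new URL(raw);
  if (url.protocol !== "postgresql:" && url.protocol !== "postgres:") {
    throw new Error(`Expected a postgresql:// URL, got ${url.protocol}//`);
  }
  return {
    host: url.hostname,
    // PostgreSQL defaults to 5432 when the URL omits a port.
    port: Number(url.port || "5432"),
    database: url.pathname.replace(/^\//, ""),
  };
}
```

For example, `describeDatabaseUrl("postgresql://postgres.abcd1234:s3cret@aws-0-us-east-1.pooler.supabase.com:6543/postgres")` reports port 6543, which matches the pooled connection format shown in the database step, and `missingVars(process.env)` lists anything still left blank.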
ScribeAI/
├── src/
│ ├── app/ # Next.js App Router
│ │ ├── (dashboard)/
│ │ │ └── dashboard/
│ │ │ ├── page.tsx # Dashboard home
│ │ │ ├── record/ # Recording interface
│ │ │ └── sessions/ # Session history
│ │ ├── api/ # API routes
│ │ ├── login/ # Auth pages
│ │ └── signup/
│ │
│ ├── components/
│ │ ├── auth/ # Auth forms
│ │ ├── dashboard/ # Dashboard components
│ │ ├── recording/ # Recording UI
│ │ ├── sessions/ # Session cards
│ │ └── ui/ # shadcn/ui components
│ │
│ ├── hooks/
│ │ ├── useRecording.ts # Core recording logic
│ │ └── useSocket.ts # Socket.io client hook
│ │
│ ├── lib/
│ │ ├── auth.ts # Better Auth config
│ │ ├── prisma.ts # Database client
│ │ └── socket-client.ts # Socket.io setup
│ │
│ └── stores/
│ └── recordingStore.ts # Zustand state
│
├── server/
│ ├── services/
│ │ ├── audioProcessor.ts # Gemini integration
│ │ └── sessionService.ts # Database operations
│ │
│ ├── socket/
│ │ └── recording.ts # Socket.io handlers
│ │
│ └── index.ts # Server entry point
│
├── prisma/
│ └── schema.prisma # Database schema
│
└── server.js # Socket.io server
View complete transcript and download options
Automatic summary with key points and action items
Browse and manage all recorded sessions
Real-time text appears as you speak
Main dashboard showing session statistics and recent recordings
Simple landing page
For complete system architecture, data flow diagrams, and technical decisions:
👉 View Architecture Documentation
Demonstration includes:
- ✅ Microphone recording with live transcription
- ✅ Tab audio capture from YouTube video
- ✅ Pause/Resume functionality
- ✅ Stop recording & AI summary generation
- ✅ Session management & transcript export
- ✅ 5-minute continuous microphone recording
- ✅ Tab audio from Google Meet call
- ✅ Pause/Resume mid-recording
- ✅ Network interruption recovery
- ✅ 1-hour marathon session
- TypeScript strict mode enabled
- ESLint + Prettier formatting
- JSDoc comments on key functions
- Prisma type safety throughout
MIT License - See LICENSE file
Ritam Pal
- GitHub: @RitamPal26
- Email: ritamjunior26@gmail.com
Built as part of AttackCapital technical assignment (November 2025)
- Google Gemini API for powerful audio transcription
- Vercel AI SDK for structured output generation
- Better Auth for authentication
- shadcn/ui for beautiful components
- AttackCapital for the project idea