AIE7 Certification Challenge - Advanced RAG system for federal student loan customer service
An intelligent assistant that combines official federal loan policies with real customer experiences to provide comprehensive guidance on student loan questions, repayment options, forgiveness programs, and servicer issues.
Get the entire RAG system running with a single command! All services (Vector DB + Backend API + Jupyter + Frontend) are now fully containerized with automated service management, health checks, and persistent volumes.
Written report can be found here
Loom video can be found here
- Quick Start
- Core Features
- Project Structure
- API Usage
- Development
- Requirements
Get the entire RAG system running in 2 simple steps:
# Copy environment template and add your API keys
cp .env-example .env
# Edit .env file with your required API keys:
# OPENAI_API_KEY=your_key_here (Required)
# COHERE_API_KEY=your_key_here (Required)
# TAVILY_API_KEY=your_key_here (Required)
# LANGCHAIN_API_KEY=your_key_here (Optional - for tracing)

# Interactive Menu (Default) - Choose your startup mode
./start-services.sh
# The script presents 4 startup options:
# 1. Full startup (recommended)
#    - Stops existing containers
#    - Cleans up dangling images and build cache
#    - Rebuilds and starts all services
#    - Includes: Backend + Jupyter + Frontend + Qdrant
#
# 2. Quick restart (development)
#    - Skips Docker cleanup (faster)
#    - Rebuilds and starts all services
#    - Best for active development
#
# 3. Backend + Jupyter only
#    - Skips the frontend service
#    - Ideal for notebook experiments
#    - Faster startup
#
# 4. Custom configuration
#    - Choose individual options interactively
#    - Skip cleanup? Skip frontend?

# Non-Interactive Mode (for automation/scripts)
./start-services.sh --mode=full # Full startup with cleanup
./start-services.sh --mode=quick # Quick restart (no cleanup)
./start-services.sh --mode=backend # Backend + Jupyter only
./start-services.sh --non-interactive # Non-interactive full startup
# Additional flags
./start-services.sh --skip-cleanup # Custom: skip cleanup
./start-services.sh --no-frontend # Custom: skip frontend
./start-services.sh --help # Show all options
# Alternative: Manual Docker Compose
docker compose up --build -d

Single Command Deployment! All services start automatically with:
- Service Dependencies - Proper startup ordering
- Health Checks - Automated service validation
- Data Persistence - Volumes for cache and data
- Network Isolation - Dedicated Docker network
- Multi-stage Builds - Optimized container images
- Automatic Cleanup - Docker image/cache management
Services Available:
- Qdrant Vector Database: http://localhost:6333/dashboard
- Backend RAG API: http://localhost:8000
- Jupyter Lab: http://localhost:8888
- API Documentation: http://localhost:8000/docs
- Frontend Dashboard: http://localhost:3000
# Interactive Menu (Default) - Choose your stop method
./stop-services.sh
# The script presents 4 stop options:
# 1. Standard stop (recommended for daily use)
#    - Stops all containers
#    - Cleans up dangling images and build cache
#    - Preserves stopped containers, volumes, and used images
#
# 2. Quick pause (fastest restart)
#    - Stops containers only
#    - No cleanup performed
#    - Next startup will be faster
#
# 3. Deep cleanup (reclaim disk space)
#    - Stops and removes containers
#    - Cleans up dangling images and build cache
#    - Preserves volumes (your data)
#
# 4. Nuclear reset (WARNING: DATA LOSS)
#    - Removes containers AND volumes
#    - DELETES: Vector DB, cache, notebooks
#    - Use only when starting completely fresh

# Non-Interactive Mode (for automation/scripts)
./stop-services.sh --mode=standard # Standard stop with cleanup
./stop-services.sh --mode=quick # Quick pause (no cleanup)
./stop-services.sh --mode=deep # Deep cleanup
./stop-services.sh --mode=nuclear # Nuclear reset (WARNING: DATA LOSS)
./stop-services.sh --non-interactive # Non-interactive standard
# Legacy flags (backward compatible)
./stop-services.sh --skip-cleanup # Maps to --mode=quick
./stop-services.sh --remove # Maps to --mode=deep
./stop-services.sh --clean # Maps to --mode=nuclear
./stop-services.sh --help # Show all options
# Alternative: Direct Docker Compose
docker compose down # Stop and remove containers (keep volumes)

The system loads a hybrid dataset (749 documents → 2,172 chunks → vector embeddings):
- Startup Time: 60-90 seconds for full RAG agent initialization
- Progress Monitoring: watch logs via docker compose logs -f backend
- Ready Indicator: the backend health endpoint returns "status": "healthy"
# View logs for all services
docker compose logs -f
# View logs for specific service
docker compose logs -f backend
docker compose logs -f jupyter
docker compose logs -f frontend
docker compose logs -f qdrant
# Check service health status
docker compose ps
# Restart specific service
docker compose restart backend
# Service Management with Interactive Menus
./start-services.sh # Interactive startup menu (4 options)
./stop-services.sh # Interactive stop menu (4 options)
./start-services.sh --help # Show all startup options
./stop-services.sh --help # Show all shutdown options
# Non-interactive examples (for scripts/automation):
./start-services.sh --mode=quick # Quick restart for development
./stop-services.sh --mode=quick # Fast stop without cleanup
# Scale services (if needed)
docker compose up --scale backend=2 -d

Once all services are running, you can access:
- Frontend Dashboard: http://localhost:3000 - Interactive chat with persona-based interactions
- Jupyter Notebooks: http://localhost:8888 - RAG experiments and analysis
- API Documentation: http://localhost:8000/docs - REST API endpoints
- Qdrant Dashboard: http://localhost:6333/dashboard - Vector database monitoring
The web interface includes advanced persona-based interactions:
- Multi-Persona Support: Student, Parent, Financial Counselor, Loan Servicer roles
- Context-Aware Questions: Pre-built question templates per persona
- Session Management: Persistent chat sessions across role changes
- Performance Transparency: Response times, token usage, and source relevance scores
- Professional UI: Clean design with role-specific styling and tooltips
- Real-time Responses: Live streaming of responses with cancel functionality
# Jupyter is already running at http://localhost:8888
# Open the notebooks directly in your browser

# Install dependencies locally
uv sync
# Start Jupyter from project root
uv run jupyter lab

When inside Jupyter Lab, you can access the main evaluation notebook:
- Federal Loan Expert - Trained on official policies + real customer complaints
- Multi-Persona Interface - Role-based interactions (Student, Parent, Counselor, etc.)
- Context-Aware Responses - Understands user focus and provides targeted guidance
- Source Transparency - Shows relevance scores and document sources for all answers
- Hybrid Dataset - PDF policies + CSV complaints for comprehensive knowledge
- Multiple Retrieval Methods - Naive, Multi-Query, Parent-Document, Contextual Compression
- Agent Orchestration - LangGraph-based tool selection and workflow management
- Performance Evaluation - RAGAS metrics with comprehensive benchmarking
- Complete Docker Orchestration - Multi-service containerization with health checks
- One-Command Deployment - Automated service management and startup
- Auto-Scaling Architecture - Horizontal scaling with load balancing support
- Real-time Monitoring - Qdrant dashboard, health endpoints, and comprehensive logging
- Interactive Chat Interface - Clean, responsive web interface with session management
- Role-Based Personas - Tailored question templates and response styles
- Performance Metrics - Real-time response times, token usage, and source tracking
- RESTful API - Production-ready /ask endpoint with comprehensive metrics
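Of the retrieval methods listed above, the naive retriever is conceptually just dense top-k similarity search. A dependency-free sketch of that idea (toy 2-dimensional vectors for illustration; the real system uses OpenAI embeddings and Qdrant):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def naive_retrieve(query_vec, chunks, k=3):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]

# Toy corpus: (chunk text, precomputed embedding)
chunks = [
    ("IDR plans cap payments at a share of income.", [1.0, 0.1]),
    ("PSLF forgives loans after 120 payments.",      [0.2, 1.0]),
    ("Servicers must respond to complaints.",        [0.5, 0.5]),
]
top = naive_retrieve([0.9, 0.2], chunks, k=1)
```

Multi-query, parent-document, and contextual-compression retrieval all build on this same top-k core with extra steps before or after the search.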
├── Core Application
│   ├── src/backend/           # FastAPI server with RAG endpoint
│   ├── src/core/              # RAG retrieval implementations
│   ├── src/agents/            # LangGraph agent orchestration
│   └── src/utils/             # Utilities and helper functions
├── Frontend
│   └── frontend/              # Next.js chat interface
├── Data & Analysis
│   ├── data/                  # Federal loan PDFs + complaints CSV
│   ├── notebooks/             # Jupyter research & evaluation
│   ├── golden-masters/        # Generated test datasets
│   └── metrics/               # Performance evaluation results
├── Docker Infrastructure
│   ├── docker-compose.yml     # Multi-service orchestration
│   ├── start-services.sh      # Automated deployment script
│   ├── stop-services.sh       # Graceful shutdown script
│   └── setup.sh               # Development setup utilities
├── Documentation
│   ├── docs/                  # Project documentation
│   ├── README.md              # Main project documentation
│   └── CLAUDE.md              # Development guidelines
└── Configuration
    ├── .env-example           # Environment variables template
    ├── pyproject.toml         # Python dependencies (uv)
    └── uv.lock                # Locked dependency versions
POST /ask - Ask any federal student loan question
{
"question": "What are income-driven repayment plans?",
"max_response_length": 2000
}

Response includes:
- Generated answer with contextual sources and relevance scores
- Comprehensive performance metrics (response time, tokens used, retrieval method)
- Source document transparency with relevance scoring
- Tool usage tracking and agent decision logs
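A minimal Python client for the /ask endpoint could be sketched as follows (urllib and the localhost URL are assumptions about your deployment; the response's field names are not guaranteed by this README):

```python
import json
import urllib.request

API_URL = "http://localhost:8000/ask"  # default backend port from this README

def build_payload(question, max_response_length=2000):
    """Serialize a request body matching the /ask schema shown above."""
    return json.dumps({
        "question": question,
        "max_response_length": max_response_length,
    }).encode("utf-8")

def ask(question):
    """POST a question and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=build_payload(question),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)  # answer, sources, metrics (names per deployment)

# Example (requires the backend to be running):
# print(ask("What are income-driven repayment plans?"))
```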
This project implements cutting-edge RAG techniques with comprehensive evaluation:
- Hybrid Dataset: Official policies + real customer scenarios (4,547 → 825 → 480 quality-filtered records)
- Multiple Retrievers: Naive (best performer), Multi-Query, Parent-Document, Contextual Compression
- Agent Framework: LangGraph with StateGraph orchestration and tool selection
- Evaluation Pipeline: RAGAS metrics with 6 core measurements (all on a higher-is-better scale)
- Retrieval Ranking: Comprehensive benchmarking across all methods
- Visualization Tools: Heatmap generation for metric pattern analysis
- Golden Master datasets: Cached evaluation datasets to avoid regeneration
- Performance Tracking: Response times, token usage, and retrieval quality metrics
- Agentic RAG Evaluation: Main notebook for agent-based experiments
- Retriever Comparison: Traditional retrieval method benchmarking
- Performance Visualization: Heatmap and metric analysis tools
- Backend Details: src/backend/README.md
- Frontend Setup: frontend/README.md
Multi-Service Architecture:
- Qdrant (Vector Database) - Persistent storage for embeddings
- Backend (FastAPI + RAG Agent) - Python API with LangGraph orchestration
- Jupyter (Analysis Environment) - Notebook server for experiments
- Frontend (Next.js) - React-based chat interface
Advanced Features:
- Multi-stage Docker builds with uv for optimized Python dependencies
- Service health checks with automatic restart policies
- Persistent volumes for data, cache, and Jupyter notebooks
- Network isolation with dedicated Docker network
- Environment-based configuration for local/production deployment
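The multi-stage build with uv mentioned above might look roughly like this (a sketch, not the project's actual Dockerfile; the base images, the `src.backend.main:app` module path, and file layout are assumptions):

```dockerfile
# Build stage: resolve dependencies with uv into a virtualenv
FROM python:3.11-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
WORKDIR /app
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-dev

# Runtime stage: copy only the built environment and source
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /app/.venv /app/.venv
COPY src/ ./src/
ENV PATH="/app/.venv/bin:$PATH"
CMD ["uvicorn", "src.backend.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

Keeping uv and the lockfile resolution in the builder stage is what keeps the runtime image small.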
Deployment Options:
# Interactive Deployment (Recommended)
./start-services.sh # Interactive menu with 4 startup modes
./stop-services.sh # Interactive menu with 4 stop methods
# Non-Interactive Deployment (for automation)
./start-services.sh --mode=full # Full startup with cleanup
./start-services.sh --mode=quick # Quick restart (no cleanup)
./start-services.sh --mode=backend # Backend + Jupyter only
./start-services.sh --mode=custom --skip-cleanup --no-frontend # Custom flags
./start-services.sh --help # See all options
# Non-Interactive Shutdown
./stop-services.sh --mode=standard # Standard stop with cleanup
./stop-services.sh --mode=quick # Quick pause (no cleanup)
./stop-services.sh --mode=deep # Deep cleanup (remove containers)
./stop-services.sh --mode=nuclear # Nuclear reset (WARNING: DELETES ALL DATA)
./stop-services.sh --help # See all options
# Manual: Individual services
docker compose up qdrant backend jupyter frontend -d
# Development: Backend + Qdrant only
docker compose up qdrant backend -d

- Docker (20.10+) - For containerized deployment
- Docker Compose (2.0+) - For multi-container orchestration
- Git (2.25+) - For cloning the repository
- Modern Browser - Chrome, Firefox, Safari, or Edge
- Memory: 4GB+ RAM (8GB+ recommended for optimal performance)
- Storage: 3GB+ free space (for images, data, and dependencies)
- Python 3.11+ (tested with 3.11, minimum 3.8)
- System Dependencies (for native installation): gcc and g++ (build tools), curl (for health checks), build-essential (Linux/WSL)
- Node.js 18+ with npm
- Next.js 14.2+ (React framework)
- TypeScript 5.1+ (for type safety)
Create a .env file in the project root with:
# Required API Keys
OPENAI_API_KEY=your_openai_key_here # For LLM and embeddings
COHERE_API_KEY=your_cohere_key_here # For reranking functionality
# Optional API Keys
TAVILY_API_KEY=your_tavily_key_here # For external search (optional)
LANGCHAIN_API_KEY=your_langsmith_key_here # For tracing/monitoring (optional)

- Git (2.25+) - Version control and repository cloning
- Docker (20.10+) - Container runtime and orchestration
- Docker Compose (2.0+) - Multi-container application management
- curl or wget - For API testing and health checks
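Missing keys tend to surface only at the first API call; a small sketch that checks the .env values early, treating OPENAI_API_KEY and COHERE_API_KEY as the hard requirements per the template above (the helper itself is hypothetical):

```python
import os

REQUIRED = ("OPENAI_API_KEY", "COHERE_API_KEY")
OPTIONAL = ("TAVILY_API_KEY", "LANGCHAIN_API_KEY")

def check_env(env=os.environ):
    """Return the list of missing required keys (empty means ready)."""
    return [key for key in REQUIRED if not env.get(key)]

missing = check_env({"OPENAI_API_KEY": "sk-test"})
# missing -> ["COHERE_API_KEY"]
```

Calling this at backend startup (or in start-services.sh via a one-liner) turns a confusing runtime failure into an immediate, readable error.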
The backend requires 50+ Python packages including:
- AI/ML: langchain, langgraph, openai, cohere, ragas
- Vector DB: qdrant-client, langchain-qdrant
- Document Processing: pypdf2, pymupdf, unstructured
- Web Framework: fastapi, uvicorn, pydantic
- Data Science: numpy, pandas, matplotlib, seaborn
- Search: tavily-python, rank-bm25
- Utilities: python-dotenv, joblib, tqdm
- React 18.2+ with Next.js framework
- UI Components: lucide-react (icons)
- Styling: tailwindcss, autoprefixer, postcss
- Development: TypeScript, ESLint, development server
- Memory: 4GB+ RAM (8GB+ recommended for better performance)
- Storage: 2GB+ free space for dependencies and data
- CPU: Multi-core processor recommended for faster RAG processing
- Network: Stable internet connection for API calls
| Service | Port | URL | Purpose |
|---|---|---|---|
| Qdrant | 6333 | http://localhost:6333/dashboard | Vector database dashboard |
| Backend API | 8000 | http://localhost:8000 | RAG API endpoints |
| API Documentation | 8000 | http://localhost:8000/docs | OpenAPI/Swagger docs |
| Jupyter Lab | 8888 | http://localhost:8888 | Notebook environment |
| Frontend | 3000 | http://localhost:3000 | Web interface |
Network Configuration:
- All services run on the isolated Docker network student-loan-network
- Only necessary ports exposed to the host machine
- Internal service communication via Docker DNS
Hybrid Dataset Pipeline:

PDF Documents (4 files, ~4MB)
  → DirectoryLoader + PyMuPDFLoader
  → 269 PDF pages
  → RecursiveCharacterTextSplitter (750 chars)
  → 615 PDF chunks

CSV Complaints (~12MB)
  → CSVLoader + Quality Filtering
  → 4,547 raw → 825 loaded → 480 filtered (58% retention)
  → RecursiveCharacterTextSplitter (750 chars)
  → 1,557 CSV chunks

Combined Hybrid Dataset
  → 2,172 total chunks → OpenAI Embeddings → Qdrant Vector Store
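The 750-character split can be approximated without LangChain. An illustrative recursive splitter in the spirit of RecursiveCharacterTextSplitter (not its actual implementation; the separator order is an assumption):

```python
def split_text(text, chunk_size=750, separators=("\n\n", "\n", " ", "")):
    """Greedy recursive split: coarse separators first, hard cut as last resort."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep = next((s for s in separators if s and s in text), "")
    if not sep:  # no separator applies: hard cut at the size limit
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    rest = separators[separators.index(sep) + 1:]
    chunks, buf = [], ""
    for part in text.split(sep):
        candidate = buf + sep + part if buf else part
        if len(candidate) <= chunk_size:
            buf = candidate
            continue
        if buf:
            chunks.append(buf)
        buf = ""
        if len(part) > chunk_size:  # a single piece may still be oversized
            chunks.extend(split_text(part, chunk_size, rest))
        else:
            buf = part
    if buf:
        chunks.append(buf)
    return chunks
```

Paragraph breaks are preferred over line breaks over word breaks, so chunks stay semantically coherent while respecting the 750-character ceiling.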
Quality Filtering (Complaints):
- Dropped: narratives < 100 characters (34 removed)
- Dropped: excessive redaction (> 5 XXXX tokens) (311 removed)
- Dropped: empty/None/N/A content (0 removed)
- Final retention: 58.2% (480/825 loaded records)
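The three filter rules translate directly to code; a sketch (the predicate name and the shape of the narrative field are hypothetical):

```python
def passes_quality_filter(narrative):
    """Apply the complaint-filter rules: empty content, length, redaction."""
    if narrative is None:
        return False
    text = narrative.strip()
    if text in ("", "None", "N/A"):
        return False
    if len(text) < 100:          # too short to be a useful narrative
        return False
    if text.count("XXXX") > 5:   # excessively redacted
        return False
    return True

records = [
    "Too short.",
    "XXXX " * 10,
    "A detailed complaint about my servicer " * 4,
]
kept = [r for r in records if passes_quality_filter(r)]
# kept contains only the detailed complaint
```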
Storage & Memory:
- Vector embeddings: ~39.2MB in Qdrant (in-memory)
- Docker volumes: Persistent cache, data, and notebooks
- Total footprint: ~2GB including all container images
Response Times (Typical):
- RAG Query Processing: 3-8 seconds
- Backend Initialization: 60-90 seconds
- Container Startup: 10-30 seconds per service
Throughput Capabilities:
- Concurrent Users: 10-50 (single backend instance)
- Vector Search: Sub-second retrieval from 2,172 chunks
- Memory Usage: ~500MB per backend container
Scaling & Maintenance:
# Scale backend for higher throughput
docker compose up --scale backend=3 -d
# Production: External Qdrant cluster
export QDRANT_URL=http://your-qdrant-cluster:6333
./start-services.sh
# Docker disk space management with interactive menus
./start-services.sh # Interactive menu: choose Full (with cleanup) or Quick (no cleanup)
./stop-services.sh # Interactive menu: 4 cleanup levels
# Non-interactive cleanup options
./start-services.sh --mode=full # Cleans before starting
./start-services.sh --mode=quick # Skip cleanup (faster)
./stop-services.sh --mode=standard # Cleans dangling images/cache
./stop-services.sh --mode=quick # No cleanup (fastest)
./stop-services.sh --mode=deep # Aggressive cleanup
./stop-services.sh --mode=nuclear # Full cleanup (WARNING: DELETES ALL DATA)
# Manual cleanup if needed
docker system prune -f # Remove unused images/containers
docker builder prune -f # Clear build cache
# Monitor disk usage
docker system df # Show Docker disk usage
docker images | grep student-loan # Show project images

Ready to help students navigate federal loan complexities with AI-powered guidance!

