Skip to content

SenayYakut/sql-query-buddy

Repository files navigation

πŸ€– SQL Query Buddy

AI-Powered Conversational Database Assistant with Hybrid Memory System

Python React FastAPI Redis LangChain

Transform natural language questions into SQL queries with AI-powered insights, conversation memory, and intelligent optimization suggestions.


🌟 Features

Core Capabilities

  • 🧠 RAG-Powered SQL Generation - Semantic schema retrieval using ChromaDB vector database
  • πŸ’¬ Hybrid Memory System - Redis for short-term + Mem0/Qdrant for long-term semantic memory
  • πŸ”„ Conversation Context - Understands follow-up questions like "filter them to California"
  • πŸ“Š AI-Driven Insights - GPT-4 powered data analysis and business recommendations
  • ⚑ Query Optimization - Automatic performance suggestions and indexing recommendations
  • πŸ“– Beginner-Friendly Explanations - Plain English SQL explanations
  • 🎨 Modern UI - Beautiful React interface with real-time results

Technical Highlights

  • Two-Tier Memory Architecture: Redis (fast, recent) + Mem0/Qdrant (semantic, permanent)
  • RAG Implementation: Retrieves only relevant table schemas using semantic search
  • Multi-Step AI Pipeline: SQL generation β†’ Execution β†’ Analysis β†’ Insights
  • Production-Ready: Error handling, logging, session management, TTL caching

πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Frontend (React)               β”‚
β”‚           Natural Language Interface             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                  β”‚
                  β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              FastAPI Backend                     β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   1. Memory Retrieval (Redis + Mem0)     β”‚  β”‚
β”‚  β”‚      β€’ Short-term: Recent conversation   β”‚  β”‚
β”‚  β”‚      β€’ Long-term: Semantic search        β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                 β–Ό                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   2. Schema Retrieval (RAG/ChromaDB)     β”‚  β”‚
β”‚  β”‚      β€’ Semantic search for tables        β”‚  β”‚
β”‚  β”‚      β€’ Top-k relevant schemas            β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                 β–Ό                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   3. SQL Generation (GPT-4 + LangChain)  β”‚  β”‚
β”‚  β”‚      β€’ Context-aware query creation      β”‚  β”‚
β”‚  β”‚      β€’ SQLite syntax optimization        β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                 β–Ό                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   4. Query Execution (SQLite)            β”‚  β”‚
β”‚  β”‚      β€’ Parameterized queries             β”‚  β”‚
β”‚  β”‚      β€’ Performance timing                β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                 β–Ό                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   5. AI Analysis (GPT-4)                 β”‚  β”‚
β”‚  β”‚      β€’ Explanation generation            β”‚  β”‚
β”‚  β”‚      β€’ Optimization suggestions          β”‚  β”‚
β”‚  β”‚      β€’ Business insights                 β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                 β–Ό                                β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚   6. Memory Storage (Redis + Mem0)       β”‚  β”‚
β”‚  β”‚      β€’ Store for future reference        β”‚  β”‚
β”‚  β”‚      β€’ TTL management                    β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ› οΈ Technology Stack

Backend

  • FastAPI - Modern Python web framework
  • LangChain - LLM orchestration and RAG implementation
  • OpenAI GPT-4 - SQL generation and analysis
  • ChromaDB - Vector database for schema embeddings
  • Redis - Fast in-memory short-term conversation cache
  • Mem0 - Intelligent long-term memory with semantic search
  • Qdrant - Vector database backend for Mem0
  • SQLite - Sample retail database

Frontend

  • React 18 - Modern UI library
  • TypeScript - Type-safe JavaScript
  • CSS-in-JS - Styled components

AI/ML

  • text-embedding-3-large - Schema embeddings (OpenAI)
  • text-embedding-3-small - Memory embeddings (OpenAI)
  • gpt-4 - SQL generation and analysis
  • gpt-4o-mini - Memory extraction

πŸ“‹ Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Redis (via Homebrew or Docker)
  • OpenAI API Key

πŸš€ Installation & Setup

1. Clone Repository

git clone https://github.com/yourusername/sql-query-buddy.git
cd sql-query-buddy

2. Backend Setup

# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Create .env file
cat > .env << EOF
OPENAI_API_KEY=your_openai_api_key_here
EOF

3. Initialize Vector Databases

# Embed database schemas into ChromaDB
python backend/rag/embed_schema.py

Expected output:

πŸ“₯ Loading schema.sql...
πŸ”ͺ Splitting schema into table chunks...
πŸ“„ Found 4 tables to embed.
βœ… Embedded table: customers
βœ… Embedded table: products
βœ… Embedded table: orders
βœ… Embedded table: order_items
πŸŽ‰ All schema embeddings stored successfully!

4. Start Redis

# Using Homebrew (Mac)
brew install redis
redis-server

# Using Docker
docker run -d -p 6379:6379 redis:latest

5. Start Backend

uvicorn backend.main:app --reload

Backend runs on: http://localhost:8000

6. Frontend Setup

# Open new terminal
cd frontend

# Install dependencies
npm install

# Start development server
npm start

Frontend runs on: http://localhost:3000


🎯 Usage Examples

Basic Query

User: "Show me the top 5 customers by total purchase amount"

Response:
βœ… SQL: SELECT c.name, SUM(o.total_amount) as total...
βœ… Results: 5 rows
βœ… Explanation: "This query finds the top 5 customers..."
βœ… Insights: "Alice Chen is your top customer with $5,000..."
βœ… Optimization: "Consider adding an index on orders.customer_id..."

Follow-up with Context (Memory in Action!)

User: "Now filter them to California only"

Response:
βœ… SQL: WITH top_customers AS (SELECT...) WHERE region='California'
βœ… Understands "them" refers to previous top 5 customers
βœ… Uses Redis short-term + Mem0 long-term memory

Complex Multi-Table Query

User: "Which product category made the most revenue last month?"

Response:
βœ… Automatically retrieves: products, orders, order_items schemas
βœ… Generates proper JOIN query
βœ… Provides revenue breakdown and trends

πŸ”Œ API Endpoints

Query Endpoint

POST /rag/query
Content-Type: application/json

{
  "question": "Show me the top 5 customers",
  "session_id": "default",
  "user_id": "anonymous"
}

Response:

{
  "sql": "SELECT ...",
  "results": [...],
  "insights": "Key findings...",
  "explanation": "This query...",
  "optimization": "Performance tips...",
  "execution_time_ms": 15.42,
  "memory_context": {
    "short_term": "Recent conversation...",
    "long_term": "Relevant past context...",
    "combined": "Full context..."
  }
}

Memory Stats

GET /rag/memory/stats?session_id=default&user_id=anonymous

Response:

{
  "redis": {
    "recent_exchanges": 3,
    "expires_in_seconds": 3421
  },
  "mem0": {
    "total_memories": 5
  },
  "total": 8
}

Clear Session Memory

DELETE /rag/memory/redis/{session_id}

Clear User Long-term Memory

DELETE /rag/memory/mem0/{user_id}

View All Memories

GET /rag/memory/all/{user_id}

πŸ“Š Database Schema

The project includes a sample retail database with:

Tables

  • customers - Customer information (id, name, email, region)
  • products - Product catalog (id, name, category, price, stock)
  • orders - Order records (id, customer_id, order_date, total_amount)
  • order_items - Order line items (id, order_id, product_id, quantity, price)

Sample Data

  • 10+ customers across different regions
  • 15+ products in various categories
  • 20+ orders with multiple line items

πŸ§ͺ Testing with Postman

Import Collection

  1. Open Postman
  2. Click Import β†’ Raw Text
  3. Paste this collection:
{
  "info": {
    "name": "SQL Query Buddy API",
    "schema": "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"
  },
  "item": [
    {
      "name": "Query - Basic",
      "request": {
        "method": "POST",
        "header": [{"key": "Content-Type", "value": "application/json"}],
        "body": {
          "mode": "raw",
          "raw": "{\n  \"question\": \"Show me the top 5 customers by total purchase amount\",\n  \"session_id\": \"test-session\",\n  \"user_id\": \"test-user\"\n}"
        },
        "url": "http://localhost:8000/rag/query"
      }
    },
    {
      "name": "Query - Follow-up",
      "request": {
        "method": "POST",
        "header": [{"key": "Content-Type", "value": "application/json"}],
        "body": {
          "mode": "raw",
          "raw": "{\n  \"question\": \"Now filter them to California only\",\n  \"session_id\": \"test-session\",\n  \"user_id\": \"test-user\"\n}"
        },
        "url": "http://localhost:8000/rag/query"
      }
    },
    {
      "name": "Memory Stats",
      "request": {
        "method": "GET",
        "url": "http://localhost:8000/rag/memory/stats?session_id=test-session&user_id=test-user"
      }
    },
    {
      "name": "Get All Memories",
      "request": {
        "method": "GET",
        "url": "http://localhost:8000/rag/memory/all/test-user"
      }
    },
    {
      "name": "Clear Redis Memory",
      "request": {
        "method": "DELETE",
        "url": "http://localhost:8000/rag/memory/redis/test-session"
      }
    }
  ]
}

Test Sequence

  1. Test Basic Query - Run "Query - Basic"
  2. Check Memory - Run "Memory Stats" (should show 1 exchange)
  3. Test Context Memory - Run "Query - Follow-up"
  4. Verify Memory - Run "Memory Stats" (should show 2 exchanges)
  5. View Memories - Run "Get All Memories"
  6. Clean Up - Run "Clear Redis Memory"

🎨 Screenshots

Main Interface

Main Interface

Query Results with AI Insights

Query Results

Conversation Memory in Action

Memory Demo


🧠 How It Works

1. Memory System (Redis + Mem0)

Short-term (Redis):

  • Stores last 10 conversation exchanges per session
  • TTL: 1 hour
  • Use case: Immediate follow-up questions

Long-term (Mem0/Qdrant):

  • Semantic memory with automatic extraction
  • Persistent storage with vector embeddings
  • Use case: Historical patterns, user preferences

2. RAG Implementation

# Semantic search for relevant schemas
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# Only retrieves top 3 most relevant tables
# Reduces token usage and improves accuracy

3. SQL Generation Pipeline

User Question 
  β†’ Memory Retrieval (Redis + Mem0)
  β†’ Schema Retrieval (ChromaDB RAG)
  β†’ LLM Prompt Construction
  β†’ GPT-4 SQL Generation
  β†’ Query Execution
  β†’ Multi-step Analysis
  β†’ Memory Storage

πŸ”’ Security Considerations

  • API Keys: Never commit .env files
  • SQL Injection: Uses parameterized queries
  • Input Validation: Pydantic models validate all inputs
  • Rate Limiting: Redis-based rate limiting ready
  • Error Handling: Comprehensive exception handling

🚧 Future Enhancements

  • Support for multiple database types (PostgreSQL, MySQL)
  • Query history visualization
  • Export results to CSV/Excel
  • Collaborative query sharing
  • Advanced analytics dashboard
  • Natural language to database schema generation
  • Multi-tenant support
  • API authentication (OAuth2/JWT)

πŸ“ Project Structure

sql-query-buddy/
β”œβ”€β”€ backend/
β”‚   β”œβ”€β”€ db/
β”‚   β”‚   β”œβ”€β”€ retail.db          # SQLite database
β”‚   β”‚   └── schema.sql         # Database schema
β”‚   β”œβ”€β”€ rag/
β”‚   β”‚   β”œβ”€β”€ router.py          # Main API endpoints
β”‚   β”‚   β”œβ”€β”€ redis_mem0_memory.py  # Hybrid memory manager
β”‚   β”‚   β”œβ”€β”€ embed_schema.py    # Schema embedding script
β”‚   β”‚   └── vectorstore/       # ChromaDB storage
β”‚   └── main.py               # FastAPI app
β”œβ”€β”€ frontend/
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ RagQuery.tsx      # Main UI component
β”‚   β”‚   └── App.tsx           # App entry point
β”‚   └── package.json
β”œβ”€β”€ qdrant_storage/           # Mem0 vector database
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ .env.example
└── README.md

🀝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit changes (git commit -m 'Add amazing feature')
  4. Push to branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • OpenAI - GPT-4 and embedding models
  • LangChain - RAG orchestration framework
  • Redis - High-performance in-memory database
  • Mem0 - Intelligent memory management
  • ChromaDB - Vector database for embeddings
  • FastAPI - Modern Python web framework

πŸ“ž Contact

Your Name


⭐ Star History

If you find this project helpful, please give it a star! ⭐


Built with ❀️ using AI, RAG, and Modern Web Technologies

About

AI-powered SQL assistant with RAG, Redis memory, and intelligent insights

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published