🧠 Document-aware chatbot using RAG, FAISS, and LLMs. Upload, ask, and get grounded answers.

EmmanuelM-A/rag_chatbot

RAG Chatbot 🧠💬

A document-based chatbot using Retrieval-Augmented Generation (RAG) built with Python, LangChain, FAISS, and OpenAI. Ask questions about your uploaded documents and get precise answers with source citations.

✨ Features

  • Document Processing: Upload and index .txt, .pdf, .docx, .md files
  • Smart Retrieval: Uses FAISS for fast vector similarity search
  • Embedding Cache: Optimizes performance by caching embeddings
  • Web Search Fallback: Optional web search when documents don't contain answers
  • Source Citations: Shows which documents were used to generate answers
  • Modular Architecture: Easy to extend or integrate into APIs

📁 Project Structure

rag_chatbot/
├── src/
│   ├── components/
│   │   ├── chatbot/          # Query handling and response generation
│   │   ├── config/           # Settings and logging configuration
│   │   ├── ingestion/        # Document loading and processing
│   │   ├── retrieval/        # Vector storage, embeddings, web search
│   │   └── prompts/          # LLM prompt templates
│   ├── utils/                # Helper functions and exceptions
│   └── main.py               # Application entry point
├── data/
│   ├── raw_docs/             # Place your documents here
│   ├── db/                   # Vector index and metadata storage
│   └── prompts/              # YAML prompt files
└── logs/                     # Application logs (if file logging enabled)

βš™οΈ Installation

  1. Clone the repository
git clone https://github.com/yourusername/rag_chatbot.git
cd rag_chatbot
  1. Install dependencies
pip install -r requirements.txt
  1. Setup environment variables
cp .env.example .env
# Edit .env and add your OpenAI API key
  1. Add your documents
# Place your documents in data/raw_docs/
# Supported formats: .pdf, .docx, .txt, .md
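
To check which files in data/raw_docs/ have a supported extension before starting the chatbot, a small script along these lines may help. It is only a sketch: `find_supported_documents` is a hypothetical helper, not a function from this repository, and the project's own loader may apply additional rules.

```python
from pathlib import Path

# Extensions listed in this README as supported formats.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def find_supported_documents(root: str) -> list[Path]:
    """Return files under `root` with a supported extension, sorted by path.

    Hypothetical helper for illustration; returns an empty list when
    `root` does not exist or is not a directory.
    """
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        path
        for path in base.rglob("*")
        if path.is_file() and path.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```

Calling `find_supported_documents("data/raw_docs")` from the repository root lists the candidate files; an empty result usually means the documents are in the wrong folder or use an unsupported extension.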

🚀 Usage

Start the chatbot:

python src/main.py

The application will:

  • Process documents from data/raw_docs/
  • Create vector embeddings (cached for performance)
  • Start an interactive chat session
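
The embedding cache mentioned above can be pictured as a lookup keyed by a hash of each chunk's text, so unchanged text is never embedded twice. The sketch below is an in-memory illustration under that assumption; `EmbeddingCache` and `fake_embed` are hypothetical names, and the project's actual cache (including whether it persists to disk) may work differently.

```python
import hashlib
from typing import Callable, Dict, List


class EmbeddingCache:
    """In-memory cache that embeds each distinct text at most once."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self._embed_fn = embed_fn
        self._store: Dict[str, List[float]] = {}
        self.misses = 0  # number of times embed_fn was actually called

    @staticmethod
    def _key(text: str) -> str:
        # Hash the content so identical text maps to the same entry.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get(self, text: str) -> List[float]:
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]


def fake_embed(text: str) -> List[float]:
    """Placeholder for a real embedding call (assumption for the demo)."""
    return [float(len(text)), float(text.count(" "))]
```

In a real setup `fake_embed` would be replaced by a call to the embedding model, which is where the cache saves both time and API cost.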

Example interaction:

🔍 Ask me: What are the main topics covered in the documents?
📝 Response: Based on the documents, the main topics include...
📚 Sources (documents):
  1. ../data/raw_docs/document1.pdf
  2. ../data/raw_docs/document2.txt

βš™οΈ Configuration

Key settings in your .env file:

| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | OpenAI API key for embeddings and LLM | ✅ |
| SEARCH_API_KEY | Google Custom Search API key | ❌ |
| SEARCH_ENGINE_ID | Google Custom Search Engine ID | ❌ |
| LOG_LEVEL | Logging level (DEBUG, INFO, WARNING, ERROR) | ❌ |
| IS_FILE_LOGGING_ENABLED | Enable file-based logging | ❌ |
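
As a rough illustration of how these variables could be read and validated, the sketch below uses only the standard library. The `Settings` class, `load_settings` function, the `INFO` default, and the accepted truthy strings are all assumptions made for this example; the project's own config module may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    search_api_key: Optional[str]
    search_engine_id: Optional[str]
    log_level: str
    is_file_logging_enabled: bool


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        # The only variable this README marks as required.
        raise ValueError("OPENAI_API_KEY is required")
    file_logging = env.get("IS_FILE_LOGGING_ENABLED", "false").strip().lower()
    return Settings(
        openai_api_key=api_key,
        search_api_key=env.get("SEARCH_API_KEY") or None,
        search_engine_id=env.get("SEARCH_ENGINE_ID") or None,
        log_level=env.get("LOG_LEVEL", "INFO").upper(),  # default is an assumption
        is_file_logging_enabled=file_logging in {"1", "true", "yes"},
    )
```

Note that this sketch reads variables already present in the process environment; loading them from the .env file itself is left to whatever mechanism the project uses.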

🔧 Advanced Features

  • Embedding Cache: Automatically caches embeddings to avoid recomputation
  • Web Search: Falls back to web search when documents don't contain answers
  • Configurable Chunking: Adjust chunk size and overlap for optimal retrieval
  • Source Attribution: Always shows which documents were used for answers
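
To make the effect of chunk size and overlap concrete, here is a minimal character-based splitter. It is a sketch, not the project's splitter: `chunk_text` and its default values are hypothetical, and real splitters often break on sentence or paragraph boundaries instead of fixed character offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split `text` into windows of `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that content near a
    boundary appears in both neighbours.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=1)` yields `["abcd", "defg", "ghij"]`. Larger chunks keep more context per retrieved passage, while more overlap reduces the chance of splitting an answer across two chunks at the cost of a larger index.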

📋 Commands

  • Type quit, exit, or bye to exit the chatbot
  • Ctrl+C for immediate shutdown
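
The exit behaviour described above could be implemented with a loop like the following. This is a sketch under assumptions: `run_chat`, `answer_fn`, and the printed messages are hypothetical and are not taken from src/main.py.

```python
from typing import Callable

# Exit words listed in this README.
EXIT_COMMANDS = {"quit", "exit", "bye"}


def run_chat(
    answer_fn: Callable[[str], str],
    input_fn: Callable[[str], str] = input,
    output_fn: Callable[[str], None] = print,
) -> int:
    """Run an interactive loop and return the number of questions answered."""
    answered = 0
    while True:
        try:
            question = input_fn("Ask me: ").strip()
        except (KeyboardInterrupt, EOFError):
            # Ctrl+C (or end of input) stops the session immediately.
            output_fn("Shutting down.")
            break
        if question.lower() in EXIT_COMMANDS:
            output_fn("Goodbye!")
            break
        if not question:
            continue  # ignore empty lines
        output_fn(answer_fn(question))
        answered += 1
    return answered
```

Passing `input_fn` and `output_fn` as parameters keeps the loop testable without a terminal; in normal use only `answer_fn` needs to be supplied.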

🛠️ Development

The codebase uses a modular architecture:

  • Document Processing: Handles loading and chunking of various file formats
  • Vector Storage: FAISS-based similarity search with metadata
  • Embedding Cache: Redis-like caching for embeddings
  • Query Processing: LLM-based response generation with retrieval
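
For readers new to vector retrieval, the idea behind similarity search with metadata can be shown with a brute-force stand-in. The sketch below is not the FAISS API and not the project's code: `cosine_similarity` and `top_k` are hypothetical helpers, and cosine similarity is assumed here only for illustration, since this README does not state which metric the index uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Entry = Tuple[Vector, Dict[str, str]]  # (embedding, metadata such as the source path)


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: Vector, index: List[Entry], k: int = 2) -> List[Tuple[float, Dict[str, str]]]:
    """Score every entry against the query and return the k best matches."""
    scored = [(cosine_similarity(query, vector), meta) for vector, meta in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

A library such as FAISS performs the same kind of nearest-neighbour lookup far more efficiently on large collections; the metadata returned with each hit is what makes source citations possible.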

📝 Logging

Logs are written to:

  • Console (colored output for development)
  • Files in logs/ directory (when IS_FILE_LOGGING_ENABLED=True)
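
A comparable setup can be reproduced with the standard `logging` module. The sketch below is an assumption-laden illustration: `configure_logging`, the logger name `rag_chatbot`, the file name `app.log`, and the fallback to `INFO` are hypothetical, and colored console output is omitted for brevity.

```python
import logging
from pathlib import Path

# Levels listed for LOG_LEVEL in this README.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
}


def configure_logging(
    level_name: str = "INFO",
    file_logging: bool = False,
    log_dir: str = "logs",
) -> logging.Logger:
    """Attach a console handler and, optionally, a file handler."""
    logger = logging.getLogger("rag_chatbot")  # logger name is an assumption
    logger.setLevel(LEVELS.get(level_name.upper(), logging.INFO))
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    if file_logging:
        Path(log_dir).mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(Path(log_dir) / "app.log", encoding="utf-8")
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger
```

With `file_logging=False` only the console handler is attached, which mirrors the default described above.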

🚨 Troubleshooting

No documents processed?

  • Check that documents are in data/raw_docs/
  • Verify file formats are supported (.pdf, .docx, .txt, .md)

API errors?

  • Ensure OPENAI_API_KEY is set in .env
  • Check API quota and billing

Web search not working?

  • Set SEARCH_API_KEY and SEARCH_ENGINE_ID for Google Custom Search
  • Or leave unset to use fallback search methods