This project lets users run Retrieval-Augmented Generation (RAG) with locally hosted models.
RAG combines retrieval and generation so that the chatbot can give accurate, context-aware responses. The pipeline runs semantic search and keyword search, then applies a reranker to select the most relevant documents as context for question answering.
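As a rough illustration of how semantic and keyword results might be combined, the sketch below fuses two score maps with a weighted sum. The `fuse_scores` helper and the sample scores are hypothetical, and the weights simply mirror the `SEMANTIC_WEIGHT` and `KEYWORD_WEIGHT` settings; the project's actual ensemble retriever may combine results differently, for example by rank rather than by raw score.

```python
from typing import Dict, List, Tuple

# Assumed to match the configuration values used by this project.
SEMANTIC_WEIGHT = 0.6
KEYWORD_WEIGHT = 0.4


def fuse_scores(
    semantic: Dict[str, float],
    keyword: Dict[str, float],
    semantic_weight: float = SEMANTIC_WEIGHT,
    keyword_weight: float = KEYWORD_WEIGHT,
) -> List[Tuple[str, float]]:
    """Combine two maps of doc id -> score (assumed in [0, 1]) by weighted sum.

    A document missing from one retriever contributes 0.0 for that retriever.
    Returns (doc id, fused score) pairs sorted from best to worst.
    """
    doc_ids = set(semantic) | set(keyword)
    fused = {
        doc_id: semantic_weight * semantic.get(doc_id, 0.0)
        + keyword_weight * keyword.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three documents.
ranked = fuse_scores(
    semantic={"a": 0.9, "b": 0.5},
    keyword={"b": 1.0, "c": 0.5},
)
print([doc_id for doc_id, _ in ranked])  # ['b', 'a', 'c']
```

Document `b` ranks first because both retrievers return it, which is the main benefit of combining the two searches before reranking.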
- Use Docker to run the Ollama service:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
Alternatively, install Ollama directly from https://ollama.com/download to host models locally.
- Use Docker to run the Qdrant vector database service:
docker run -d \
  --name qdrant \
  -p 6333:6333 \
  -v qdrant_data:/qdrant/storage \
  qdrant/qdrant
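Once both containers are running, it can help to confirm that the services answer on the published ports before starting the app. The following is a minimal sketch using only the Python standard library; `is_reachable` is a hypothetical helper, not part of this project, and the URLs assume the default port mappings from the commands above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP GET to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or malformed URL.
        return False


if __name__ == "__main__":
    # Assumed default ports from the docker commands in this README.
    services = {
        "ollama": "http://localhost:11434",
        "qdrant": "http://localhost:6333",
    }
    for name, url in services.items():
        state = "up" if is_reachable(url) else "unreachable"
        print(f"{name}: {state} ({url})")
```

If a service is reported as unreachable, checking `docker ps` and the `-p` port mappings is a reasonable first step.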
- Using conda
To install the required dependencies, run:
conda create -n rag python==3.9.21 -y
conda activate rag
pip install -r requirements.txt
Then start the software with
streamlit run src\app.py
- Using uv
To install the required dependencies, run:
uv sync
Then start the software with
uv run streamlit run src\app.py
Set the following configuration parameters:
# Model
OLLAMA_URL = "http://localhost:11434"
MODEL_NAME="qwen3_30B_A3B_ctx_8K" # LLM model for chat
OLLAMA_EMBED_URL = "http://localhost:11435"
EMBED_MODEL_NAME="nomic-embed-text" # Embedding model for documents
# Qdrant vector database
QDRANT_URL = "http://localhost:6333"
COLLECTION_NAME = "test"
# DocumentLoader
CHUNK_SIZE = 800
CHUNK_OVERLAP = 50
# Semantic Retriever
TOP_K = 12
SEMANTIC_SCORE = 0.6
# Ensemble Retriever
SEMANTIC_WEIGHT = 0.6
KEYWORD_WEIGHT = 0.4
# Reranker
TOP_N = 6
RERANKER_SCORE = 0.7
- Upload Documents:
  - Navigate to the "Upload Documents" page.
  - Select and upload one or more files (PDF, TXT, MD, HTML).
  - Choose the chunk size and overlap size.
  - Split the documents into chunks and preview them before uploading to the database.
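To make the chunk size and overlap settings concrete, here is a minimal character-based sliding-window splitter. It is only a sketch: `split_text` is a hypothetical helper, and the project's document loader may split on tokens, sentences, or separators instead of raw characters. The defaults mirror `CHUNK_SIZE` and `CHUNK_OVERLAP` from the configuration.

```python
from typing import List

# Assumed to match the configuration values used by this project.
CHUNK_SIZE = 800
CHUNK_OVERLAP = 50


def split_text(
    text: str,
    chunk_size: int = CHUNK_SIZE,
    overlap: int = CHUNK_OVERLAP,
) -> List[str]:
    """Split `text` into chunks of up to `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at
    a chunk boundary still appears with some context in the next chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
    return chunks


print(split_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

A larger overlap reduces the chance of losing context at chunk boundaries, at the cost of storing more duplicated text in the vector database.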
- View and Delete Documents:
  - Navigate to the "Knowledge Base" page.
  - View all the chunks available in the vector database.
  - Delete documents as needed.
- Chat with the Content:
  - Navigate to the "Chat" page.
  - Enter your query in the chat input.
  - The chatbot will respond based on the retrieved documents.
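The last step of the chat flow can be pictured as filtering the reranked chunks and placing the survivors into the prompt. The sketch below assumes that `RERANKER_SCORE` acts as a minimum score and `TOP_N` as a cap on the number of chunks; `select_context`, `build_prompt`, the prompt wording, and the sample chunks are all hypothetical and may not match what the app actually does.

```python
from typing import List, Tuple

# Assumed to match the configuration values used by this project.
TOP_N = 6
RERANKER_SCORE = 0.7


def select_context(
    scored_chunks: List[Tuple[str, float]],
    top_n: int = TOP_N,
    min_score: float = RERANKER_SCORE,
) -> List[str]:
    """Keep chunks scoring at least `min_score`, best first, at most `top_n`."""
    kept = [(text, score) for text, score in scored_chunks if score >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in kept[:top_n]]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Assemble a simple question-answering prompt from numbered chunks."""
    context = "\n\n".join(
        f"[{index}] {chunk}" for index, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical reranker output: (chunk text, relevance score).
reranked = [
    ("Qdrant stores vectors.", 0.92),
    ("Unrelated text.", 0.31),
    ("Ollama serves models.", 0.75),
]
context = select_context(reranked)
print(build_prompt("Which service stores vectors?", context))
```

Under these assumptions the low-scoring chunk is dropped, so the model only sees context that the reranker judged relevant.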