A local Retrieval-Augmented Generation (RAG) application built with Kotlin and Ktor that allows you to upload PDF documents and query them using local LLMs via Ollama. Document embeddings are stored in PostgreSQL with pgVector for fast similarity search.
- PDF document ingestion and processing
- Vector similarity search using PostgreSQL pgVector
- Local LLM inference with Ollama
- Fast embedding generation with mxbai-embed-large
- Query expansion for better retrieval
- Fully local deployment - no external API calls
- Java 17 or higher
- Docker and Docker Compose
- Gradle (or use included wrapper)
This option runs Ollama in Docker alongside the database.
```bash
# 1. Start all services including Ollama
docker compose --profile ollama up -d

# 2. Wait for Ollama to download models (first time only)
docker logs -f local-rag-ollama-setup

# 3. Build and run the application
./gradlew run
```

If you have Ollama running elsewhere (e.g., locally installed or on a remote server):
```bash
# 1. Start only the database
docker compose up -d

# 2. Update src/main/resources/application.yaml
#    Change ollamaBaseUrl to your Ollama instance URL

# 3. Build and run the application
./gradlew run
```

The application will be available at http://localhost:8080.
Upload a PDF file to be processed and stored in the vector database.
Endpoint: POST /embed
Example using curl:
```bash
curl -X POST http://localhost:8080/embed \
  -F "file=@/path/to/your/document.pdf"
```

Success Response (200 OK):
```json
{
  "message": "File embedded successfully"
}
```

Error Response (400 Bad Request):

```json
{
  "error": "No valid PDF file provided"
}
```

What happens behind the scenes:
- PDF text is extracted using Apache PDFBox
- Text is split into chunks (7,500 characters with a 100-character overlap; see the sketch after this list)
- Each chunk is embedded using Ollama's mxbai-embed-large model (1024 dimensions)
- Document chunks and embeddings are stored in PostgreSQL with metadata
- IVFFlat index enables fast similarity search
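For illustration, here is a minimal Kotlin sketch of the chunking and embedding steps, assuming Ollama's `/api/embeddings` endpoint and the chunk sizes listed above. The helper names (`chunkText`, `embedRaw`, `jsonString`) are hypothetical, not the application's actual code:

```kotlin
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Split extracted PDF text into overlapping chunks (7,500 chars, 100-char overlap),
// mirroring the chunking strategy described above.
fun chunkText(text: String, chunkSize: Int = 7_500, overlap: Int = 100): List<String> {
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + chunkSize, text.length)
        chunks += text.substring(start, end)
        if (end == text.length) break
        start = end - overlap // step back so consecutive chunks share `overlap` characters
    }
    return chunks
}

// Request an embedding for one chunk from Ollama's /api/embeddings endpoint using
// the mxbai-embed-large model. Returns the raw JSON response as a string; a real
// client would parse out the "embedding" array with a JSON library.
fun embedRaw(chunk: String, baseUrl: String = "http://localhost:11434"): String {
    val body = """{"model": "mxbai-embed-large", "prompt": ${jsonString(chunk)}}"""
    val request = HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/api/embeddings"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body))
        .build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body() // contains {"embedding": [ ... 1024 floats ... ]}
}

// Minimal JSON string escaping, sufficient for this sketch only.
fun jsonString(s: String): String =
    "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\""
```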
Ask questions about your uploaded documents.
Endpoint: POST /query
Example using curl:
```bash
curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What are the main findings in the document?"
  }'
```

Success Response (200 OK):
```json
{
  "message": "Based on the document, the main findings include... [AI-generated answer based on retrieved context]"
}
```

Error Response (400 Bad Request):

```json
{
  "error": "Query cannot be empty"
}
```

If you receive a timeout error, retry the request; the first request can take longer because the model must first be loaded into memory.
What happens behind the scenes:
- Query is expanded into 5 variations using the LLM for better retrieval
- Each variation is embedded using mxbai-embed-large
- Top 3 similar document chunks are retrieved for each variation using cosine similarity (see the sketch after this list)
- Retrieved chunks are deduplicated and combined into context
- Context is sent to Ollama's qwen3 LLM to generate an answer
- Answer is returned to the client
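Below is a minimal Kotlin sketch of the retrieval step, using pgVector's cosine-distance operator (`<=>`) over JDBC. The table and column names (`documents`, `embedding`, `content`) and the `embedAsVectorLiteral` helper are illustrative assumptions, not necessarily the application's actual schema:

```kotlin
import java.sql.DriverManager

// Retrieve the top-3 most similar chunks for each query variation using
// pgVector's cosine-distance operator (<=>), then deduplicate the results.
fun retrieveContext(
    variations: List<String>,
    jdbcUrl: String,
    user: String,
    password: String
): List<String> {
    val sql = "SELECT content FROM documents ORDER BY embedding <=> ?::vector LIMIT 3"
    val results = linkedSetOf<String>() // preserves order, drops duplicate chunks
    DriverManager.getConnection(jdbcUrl, user, password).use { conn ->
        conn.prepareStatement(sql).use { stmt ->
            for (variation in variations) {
                stmt.setString(1, embedAsVectorLiteral(variation))
                stmt.executeQuery().use { rs ->
                    while (rs.next()) results += rs.getString("content")
                }
            }
        }
    }
    return results.toList()
}

// Placeholder: embed the text with mxbai-embed-large (see the ingestion sketch above)
// and format the floats as a pgvector literal such as "[0.12, -0.03, ...]".
fun embedAsVectorLiteral(text: String): String =
    TODO("call Ollama and format the embedding as a pgvector literal")
```

The deduplicated chunks would then be concatenated into a context string and sent to the qwen3 model to generate the final answer.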
Interactive API documentation is available via Swagger UI:
http://localhost:8080/swagger
OpenAPI specification:
http://localhost:8080/openapi
Edit src/main/resources/application.yaml to customize.
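For example, to point the application at a different Ollama instance, adjust the `ollamaBaseUrl` value (shown here with Ollama's default local URL; the other keys and defaults in the file may differ):

```yaml
# Illustrative snippet - only ollamaBaseUrl is referenced in this README
ollamaBaseUrl: "http://localhost:11434"
```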
```bash
# Start only the database
docker compose up db -d
```
### Ollama (Optional)
```bash
# Start Ollama with the ollama profile
docker compose --profile ollama up ollama -d
# View available models
docker exec local-rag-ollama ollama list
# Pull additional models
docker exec local-rag-ollama ollama pull llama3.2
# Test Ollama directly
curl http://localhost:11434/api/tags
```

```bash
# Build the project
./gradlew build
# Run tests
./gradlew test
# Run the application
./gradlew run
# Build executable JAR
./gradlew shadowJar
```

Error: Connection refused to http://localhost:11434
Solution:
- If using Docker Ollama: Ensure the service is running with `docker ps | grep ollama`
- If using external Ollama: Update `ollamaBaseUrl` in `application.yaml`
- Check Ollama is responding: `curl http://localhost:11434/api/tags`
Error: Connection to localhost:5432 refused
Solution:
```bash
# Start PostgreSQL
docker compose up db -d

# Check it's running
docker ps | grep postgres

# Check logs
docker logs postgres-pgvector
```

Solution: The application is configured with 5-minute timeouts. If you still experience timeouts:
- First query takes longer as the model loads into memory
- Consider using a smaller/faster model (e.g., `llama3.2:1b` instead of `qwen3`)
- Check Ollama logs: `docker logs local-rag-ollama`
Solution:
- Ensure you've uploaded at least one PDF document via `/embed` first
- Check that embeddings were generated successfully in the logs
- Verify documents are in the database:

```bash
docker exec -it postgres-pgvector psql -U dev_user -d embedding_db \
  -c "SELECT COUNT(*) FROM documents;"
```
- Vector Index: IVFFlat index trades some accuracy for speed (~100 lists; see the sketch below)
- Connection Pool: 2-10 concurrent database connections
- Chunk Size: 7500 characters balances context vs. precision
- Timeout: 5-minute timeout accommodates slower LLM inference
- Embedding Model: mxbai-embed-large provides good accuracy at 1024 dimensions
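For reference, a hedged Kotlin/JDBC sketch of how such an IVFFlat index can be created with pgVector (100 lists, cosine distance). The index normally already exists in the database, and the table/column names here are the same illustrative ones used in the retrieval sketch:

```kotlin
import java.sql.DriverManager

// Sketch: create an IVFFlat index for cosine similarity with 100 lists.
// Table/column names are illustrative; the application normally sets this up itself.
fun createIvfFlatIndex(jdbcUrl: String, user: String, password: String) {
    DriverManager.getConnection(jdbcUrl, user, password).use { conn ->
        conn.createStatement().use { stmt ->
            stmt.execute(
                "CREATE INDEX IF NOT EXISTS documents_embedding_idx " +
                "ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)"
            )
        }
    }
}
```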
MIT