FastAPI RAG Gateway

A production-ready Retrieval-Augmented Generation (RAG) API built with FastAPI, LangChain, and ChromaDB.

✅ Memory Optimized: Uses OpenAI API embeddings for minimal memory footprint (~200MB). Fully compatible with Render's 512MB free tier!

Features

  • FastAPI-based REST API
  • Multi-document retrieval using ChromaDB
  • LLM integration via OpenRouter (DeepSeek)
  • Async query processing
  • Comprehensive testing with DeepEval
  • Memory-efficient: OpenAI embeddings instead of local models (fits in 512MB RAM)
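
Under the hood, these features come together in a single retrieve-then-generate flow. A minimal sketch of that flow, assuming LangChain's langchain_openai and langchain_chroma packages; names such as QueryRequest, the prompt wording, and the embedding model are illustrative, not the repo's actual code:

import os

from fastapi import FastAPI
from pydantic import BaseModel
from langchain_chroma import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

app = FastAPI()

class QueryRequest(BaseModel):
    question: str

# Persisted ChromaDB index built by indexing.py (the path is an assumption).
store = Chroma(
    persist_directory="chroma_store",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
)

# OpenRouter exposes an OpenAI-compatible API, so ChatOpenAI works with a custom base_url.
llm = ChatOpenAI(
    model=os.getenv("RAG_LLM_MODEL", "deepseek/deepseek-chat"),
    base_url=os.getenv("RAG_LLM_BASE_URL", "https://openrouter.ai/api/v1"),
    api_key=os.getenv("OPENROUTER_API_KEY"),
)

@app.post("/query")
async def query(req: QueryRequest) -> dict:
    # Async retrieval and generation keep the event loop free under concurrent requests.
    docs = await store.asimilarity_search(req.question, k=4)
    context = [d.page_content for d in docs]
    prompt = "Answer using only this context:\n\n" + "\n\n".join(context)
    prompt += f"\n\nQuestion: {req.question}"
    reply = await llm.ainvoke(prompt)
    return {"answer": reply.content, "context": context}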

Local Setup

  1. Clone the repository
git clone https://github.com/DevaanshKathuria/FastAPI_RAG_Gateway.git
cd FastAPI_RAG_Gateway
  2. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies
pip install -r requirements.txt
  4. Set up environment variables. Create a .env file:
OPENAI_API_KEY=your_openai_api_key
OPENROUTER_API_KEY=your_openrouter_api_key
RAG_LLM_MODEL=deepseek/deepseek-chat
RAG_LLM_BASE_URL=https://openrouter.ai/api/v1
UVICORN_PORT=8000
  5. Index the documents (see the indexing sketch below)
python indexing.py
  6. Run the server
python main.py

The API will be available at http://localhost:8000
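
For reference, indexing.py presumably loads the files in data/, splits them into chunks, embeds them via the OpenAI API, and persists a ChromaDB index to chroma_store/. A minimal sketch under those assumptions (the loader, glob pattern, chunk sizes, and embedding model are guesses, not the repo's actual choices):

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_chroma import Chroma

# Load every .txt file under data/ (the file type is an assumption).
docs = DirectoryLoader("data", glob="**/*.txt", loader_cls=TextLoader).load()

# Overlapping chunks avoid cutting sentences at chunk boundaries.
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

# Embed remotely via the OpenAI API (this is what keeps local memory usage low)
# and persist the index so the server can reopen it without re-embedding.
Chroma.from_documents(
    chunks,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="chroma_store",
)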

API Usage

Query Endpoint

POST /query

curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is machine learning?"}'

Response:

{
  "answer": "Machine learning is...",
  "context": ["Retrieved document chunk 1", "Retrieved document chunk 2"]
}
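
The same call from Python, assuming the requests package is installed:

import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"question": "What is machine learning?"},
    timeout=60,
)
resp.raise_for_status()
body = resp.json()
print(body["answer"])
for chunk in body["context"]:
    print("-", chunk)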

Deployment

See DEPLOYMENT.md for detailed deployment guides and MEMORY_OPTIMIZATION.md for details on how the memory footprint was reduced.

Deploy to Render (Free Tier - 512MB RAM)

  1. Push your code to GitHub
  2. Go to Render Dashboard
  3. Click "New +" → "Web Service"
  4. Connect your GitHub repository
  5. Render will auto-detect the render.yaml configuration
  6. Add your environment variables:
    • OPENAI_API_KEY (required for embeddings)
    • OPENROUTER_API_KEY (required for LLM)
  7. Click "Create Web Service"
  8. Wait 5-10 minutes for deployment ✅

Deploy to Railway

  1. Install Railway CLI: npm i -g @railway/cli
  2. Login: railway login
  3. Initialize: railway init
  4. Add environment variables: railway variables set OPENAI_API_KEY=your_key
  5. Deploy: railway up

Deploy to Fly.io

See fly.toml for configuration. Run:

fly launch
fly secrets set OPENAI_API_KEY=your_key
fly deploy

Project Structure

├── main.py              # FastAPI application
├── rag_core.py          # RAG logic and LLM integration
├── indexing.py          # Document loading and vector store
├── requirements.txt     # Python dependencies
├── data/                # Source documents
├── eval/                # Testing suite
└── chroma_store/        # Vector database (gitignored)

Testing

Run tests with pytest:

pytest eval/test_rag.py -v
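
A case in eval/test_rag.py might look roughly like the sketch below, which queries a running server and scores the answer with DeepEval; the metrics and thresholds actually used in the repo may differ:

import requests
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

def test_query_relevancy():
    # Assumes the server from "python main.py" is running locally.
    resp = requests.post(
        "http://localhost:8000/query",
        json={"question": "What is machine learning?"},
        timeout=120,
    )
    resp.raise_for_status()
    body = resp.json()
    test_case = LLMTestCase(
        input="What is machine learning?",
        actual_output=body["answer"],
        retrieval_context=body["context"],
    )
    # DeepEval's default judge model calls the OpenAI API; the threshold here is illustrative.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])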

Environment Variables

Variable             Description                                 Required
OPENAI_API_KEY       OpenAI API key for embeddings               Yes
OPENROUTER_API_KEY   OpenRouter API key for LLM                  Yes
RAG_LLM_MODEL        Model name (e.g., deepseek/deepseek-chat)   Yes
RAG_LLM_BASE_URL     OpenRouter base URL                         Yes
UVICORN_PORT         Server port (default: 8000)                 No
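
A typical way to load these at startup, assuming python-dotenv (which the .env-based setup implies); this is a sketch, not necessarily the repo's actual main.py:

import os

import uvicorn
from dotenv import load_dotenv

load_dotenv()  # read .env from the working directory

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]          # required; raises KeyError if missing
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]  # required
RAG_LLM_MODEL = os.environ["RAG_LLM_MODEL"]            # e.g. deepseek/deepseek-chat
RAG_LLM_BASE_URL = os.environ["RAG_LLM_BASE_URL"]

if __name__ == "__main__":
    # UVICORN_PORT is optional and falls back to 8000, matching the table above.
    uvicorn.run("main:app", host="0.0.0.0", port=int(os.getenv("UVICORN_PORT", "8000")))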

License

MIT
