A production-ready Retrieval-Augmented Generation (RAG) API built with FastAPI, LangChain, and ChromaDB.
✅ Memory Optimized: Uses OpenAI API embeddings for minimal memory footprint (~200MB). Fully compatible with Render's 512MB free tier!
- FastAPI-based REST API
- Multi-document retrieval using ChromaDB
- LLM integration via OpenRouter (DeepSeek)
- Async query processing
- Comprehensive testing with DeepEval
- Memory-efficient: OpenAI embeddings instead of local models (fits in 512MB RAM)
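Because embedding happens through the OpenAI API, documents only need to be split into request-sized chunks locally; no embedding model is held in RAM. As a rough illustration of that chunking step (a hypothetical sketch — the project's actual `indexing.py` may use LangChain's text splitters instead):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks sized for embedding requests.

    Hypothetical helper, not the project's actual code: overlapping windows
    preserve context across chunk boundaries for retrieval.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break
    return chunks
```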
- Clone the repository

```bash
git clone https://github.com/DevaanshKathuria/FastAPI_RAG_Gateway.git
cd FastAPI_RAG_Gateway
```

- Create virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up environment variables

Create a `.env` file:

```env
OPENAI_API_KEY=your_openai_api_key
OPENROUTER_API_KEY=your_openrouter_api_key
RAG_LLM_MODEL=deepseek/deepseek-chat
RAG_LLM_BASE_URL=https://openrouter.ai/api/v1
UVICORN_PORT=8000
```

- Index documents

```bash
python indexing.py
```

- Run the server

```bash
python main.py
```

The API will be available at http://localhost:8000
POST /query
```bash
curl -X POST "http://localhost:8000/query" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is machine learning?"}'
```

Response:

```json
{
  "answer": "Machine learning is...",
  "context": ["Retrieved document chunk 1", "Retrieved document chunk 2"]
}
```

See DEPLOYMENT.md for detailed deployment guides and MEMORY_OPTIMIZATION.md for memory optimization details.
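The same request/response contract can be exercised from Python. A small standard-library sketch for building the payload and pulling fields out of the response (helper names are illustrative, not part of the project):

```python
import json

def build_query_payload(question: str) -> str:
    """Serialize the JSON body expected by POST /query."""
    return json.dumps({"question": question})

def parse_query_response(body: str) -> tuple[str, list[str]]:
    """Extract the answer and retrieved context chunks from a /query response."""
    data = json.loads(body)
    return data["answer"], data.get("context", [])
```

In practice you would send the payload with `requests` or `httpx` against `http://localhost:8000/query`.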
- Push your code to GitHub
- Go to Render Dashboard
- Click "New +" → "Web Service"
- Connect your GitHub repository
- Render will auto-detect the `render.yaml` configuration
- Add your environment variables:
  - `OPENAI_API_KEY` (required for embeddings)
  - `OPENROUTER_API_KEY` (required for LLM)
- Click "Create Web Service"
- Wait 5-10 minutes for deployment ✅
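For reference, a Render blueprint for a service like this typically takes the following shape (a sketch under assumptions — check the repository's actual `render.yaml` for the authoritative values):

```yaml
services:
  - type: web
    name: fastapi-rag-gateway
    env: python
    buildCommand: pip install -r requirements.txt
    startCommand: python main.py
    envVars:
      - key: OPENAI_API_KEY
        sync: false   # entered in the Render dashboard, not committed
      - key: OPENROUTER_API_KEY
        sync: false
```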
- Install Railway CLI: `npm i -g @railway/cli`
- Login: `railway login`
- Initialize: `railway init`
- Add environment variables: `railway variables set OPENAI_API_KEY=your_key`
- Deploy: `railway up`
See fly.toml for configuration. Run:

```bash
fly launch
fly secrets set OPENAI_API_KEY=your_key
fly deploy
```

Project structure:

```
├── main.py            # FastAPI application
├── rag_core.py        # RAG logic and LLM integration
├── indexing.py        # Document loading and vector store
├── requirements.txt   # Python dependencies
├── data/              # Source documents
├── eval/              # Testing suite
└── chroma_store/      # Vector database (gitignored)
```
Run tests with pytest:
```bash
pytest eval/test_rag.py -v
```

| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key for embeddings | Yes |
| `OPENROUTER_API_KEY` | OpenRouter API key for LLM | Yes |
| `RAG_LLM_MODEL` | Model name (e.g., `deepseek/deepseek-chat`) | Yes |
| `RAG_LLM_BASE_URL` | OpenRouter base URL | Yes |
| `UVICORN_PORT` | Server port (default: 8000) | No |
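Failing fast on missing variables at startup is usually friendlier than a mid-request error. A small sketch of validating the table above (a hypothetical helper — the project may read its configuration differently):

```python
import os

REQUIRED_VARS = ["OPENAI_API_KEY", "OPENROUTER_API_KEY", "RAG_LLM_MODEL", "RAG_LLM_BASE_URL"]

def load_config(env=None) -> dict:
    """Validate required environment variables and apply the port default."""
    env = os.environ if env is None else env
    missing = [k for k in REQUIRED_VARS if not env.get(k)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    config = {k: env[k] for k in REQUIRED_VARS}
    # UVICORN_PORT is optional and defaults to 8000, per the table above
    config["UVICORN_PORT"] = int(env.get("UVICORN_PORT", "8000"))
    return config
```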
MIT