This project demonstrates how to build a Retrieval-Augmented Generation (RAG) pipeline for question answering over a collection of cardiology research articles in PDF format. The full LinkedIn article is "Building a Retrieval-Augmented Generation (RAG) System for Cardiology Research: A Step-by-Step Guide".
The system uses LangChain, ChromaDB, and a local Hugging Face model (Qwen/Qwen2.5-7B-Instruct by default) to retrieve relevant text chunks and generate answers grounded in them.
- Parse multiple PDF documents (e.g., cardiology articles)
- Split text into chunks for efficient retrieval
- Store embeddings in a persistent ChromaDB vector database
- Run a local LLM (no API key required)
- Answer questions with sources included for transparency
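To illustrate the chunking step listed above, here is a minimal sketch of fixed-size splitting with overlap, using only the standard library. The function name `split_into_chunks` and the `chunk_size` and `overlap` values are assumptions made for illustration. The actual script may rely on a LangChain text splitter with different settings.

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "heart " * 200  # placeholder text, not real article content
    pieces = split_into_chunks(sample, chunk_size=200, overlap=40)
    print(f"{len(pieces)} chunks, first chunk has {len(pieces[0])} characters")
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.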
cardiology_rag/
├── articles/            # Folder with cardiology PDFs
│   ├── atrial_fibrillation.pdf
│   ├── heart_failure.pdf
│   └── coronary_disease.pdf
├── chroma_db/           # Vector DB (auto-created)
└── main.py              # Main RAG pipeline script
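As a quick check on the layout above, the following sketch lists the PDF files found in the articles/ folder. The helper name `find_pdfs` is hypothetical and is not taken from main.py. It only shows one way to confirm the folder is populated before running the pipeline.

```python
from pathlib import Path


def find_pdfs(articles_dir: str = "articles") -> list[Path]:
    """Return the PDF files directly inside articles_dir, sorted by path."""
    folder = Path(articles_dir)
    if not folder.is_dir():
        return []  # a missing folder is treated as "no PDFs"
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )


if __name__ == "__main__":
    pdfs = find_pdfs()
    print(f"Found {len(pdfs)} PDF file(s) in articles/")
    for pdf in pdfs:
        print(" -", pdf.name)
```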
Create a clean environment and install dependencies:
# Clone repository
git clone https://github.com/yourusername/cardiology_rag.git
cd cardiology_rag
# Create virtual environment
python -m venv rag_env
source rag_env/bin/activate # Linux / Mac
rag_env\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
If you don't have a requirements.txt yet, install manually:
pip install langchain langchain-community langchain-huggingface langchain-chroma chromadb pypdf sentence-transformers transformers torch accelerate
- Place your cardiology PDFs inside the articles/ folder.
- Run the main script:
python main.py
Example output:
Loaded 90 pages from 10 PDF files.
Total chunks: 634
=== Answer ===
Atrial fibrillation can cause stroke, heart failure, and increased mortality.
=== Sources ===
[1] atrial_fibrillation.pdf | p.12
[2] heart_failure.pdf | p.4
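The source lines in the output above can be produced from the metadata attached to each retrieved chunk. The sketch below assumes each chunk carries a `source` path and a zero-based `page` number, which is how LangChain's PyPDFLoader labels pages. The function name `format_sources` and the de-duplication step are illustrative assumptions rather than the script's confirmed behavior.

```python
from pathlib import Path


def format_sources(metadatas: list[dict]) -> list[str]:
    """Build numbered, de-duplicated source lines from chunk metadata."""
    lines = []
    seen = set()
    for meta in metadatas:
        name = Path(str(meta.get("source", "unknown"))).name
        page = meta.get("page")
        key = (name, page)
        if key in seen:
            continue  # several chunks can come from the same page
        seen.add(key)
        # Assumption: page numbers are zero-based, so add 1 for display.
        page_label = f"p.{page + 1}" if isinstance(page, int) else "p.?"
        lines.append(f"[{len(lines) + 1}] {name} | {page_label}")
    return lines


if __name__ == "__main__":
    retrieved = [
        {"source": "articles/atrial_fibrillation.pdf", "page": 11},
        {"source": "articles/heart_failure.pdf", "page": 3},
    ]
    print("=== Sources ===")
    print("\n".join(format_sources(retrieved)))
```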
- Change model: The default is Qwen/Qwen2.5-7B-Instruct. You can try smaller models like google/flan-t5-base or similarly sized alternatives like mistralai/Mistral-7B-Instruct-v0.2.
- Adjust retrieval: Modify k in db.as_retriever(search_kwargs={"k": 4}) to return more or fewer chunks.
- Expand knowledge base: Add more PDFs (guidelines, textbooks, papers) into articles/.
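To make the effect of k concrete, here is a small standard-library sketch of top-k retrieval over toy embedding vectors. It is a conceptual illustration only. The names `cosine_similarity` and `top_k` are hypothetical, the vectors are made up, and the similarity metric ChromaDB actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 4) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best match first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings"; real embeddings have hundreds of dimensions.
    chunks = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [-1.0, 0.0]}
    for chunk_id, score in top_k([1.0, 0.0], chunks, k=2):
        print(chunk_id, round(score, 3))
```

A larger k gives the model more context to draw on but also lengthens the prompt and may pull in less relevant chunks.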
- Medical students preparing for exams
- Clinicians reviewing updated guidelines
- Researchers summarizing across papers
- Startups building intelligent assistants
MIT License. See LICENSE for details.
Copyright (c) 2025 pvrego