This is a Streamlit-based chatbot that lets users upload PDFs and ask questions about their content. It runs entirely locally: no API costs, no internet connection needed once the models are downloaded, and full privacy.
- 📂 Upload multiple PDFs
- 🔍 Vector search using FAISS
- 🧠 Embeddings with Hugging Face models
- 🤖 Uses Ollama (Llama 2) for local inference
- ⚡ No API keys required
- 🔒 Runs 100% on your local machine
```bash
python -m venv pdf-chat-env
source pdf-chat-env/bin/activate  # On Windows: pdf-chat-env\Scripts\activate
pip install -r requirements.txt
```

Alternatively, with conda:

```bash
conda install -c conda-forge pytorch transformers sentence-transformers faiss-cpu
pip install -r requirements.txt
```
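The app also needs a local Ollama server with the Llama 2 model available (for example, pulled with `ollama pull llama2`). As a rough pre-flight check, the sketch below asks Ollama's REST API (`GET /api/tags`, which lists locally pulled models) whether `llama2` is present. It assumes Ollama's default address `http://localhost:11434`; the helper names are hypothetical and not part of this project.

```python
import http.client
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434"


def local_models(payload: dict) -> list:
    """Return the model names (e.g. "llama2:latest") listed in an /api/tags response."""
    return [m.get("name", "") for m in payload.get("models", [])]


def has_model(payload: dict, model: str) -> bool:
    """True if `model` is listed, matching either the full name or the name without its ":tag"."""
    return any(
        name == model or name.split(":", 1)[0] == model
        for name in local_models(payload)
    )


def check_ollama(model: str = "llama2", base_url: str = OLLAMA_URL, timeout: float = 3.0) -> bool:
    """Return True if the Ollama server answers and `model` has been pulled locally."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            payload = json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, refused connections and timeouts;
        # ValueError covers a response body that is not valid JSON.
        return False
    if not isinstance(payload, dict):
        return False
    return has_model(payload, model)
```

Calling `check_ollama()` before `streamlit run app.py` returns `False` if the server is not running or the model has not been pulled yet.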
```text
📂 pdf-chatbot
├── 📄 app.py               # Main Streamlit app
├── 📂 modules
│   ├── process_pdf.py      # PDF processing (text extraction, chunking)
│   ├── vector_store.py     # FAISS vector database management
│   ├── llm_inference.py    # Running Ollama for answering queries
│   ├── config.py           # Configurations (chunk size, model settings)
│   └── __init__.py         # Module initialization
├── 📂 faiss_index          # Saved FAISS index (generated after processing PDFs)
├── 📄 requirements.txt     # Dependencies
└── 📄 README.md            # This file
```
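The tree lists `modules/config.py` as the home for chunk size and model settings. A minimal sketch of what such a module might contain is below. Apart from the default chunk size of 300, the `llama2`/`tinyllama` model names, and the `faiss_index` directory mentioned in this README, every field name and value (including the overlap, `top_k`, and the embedding model `sentence-transformers/all-MiniLM-L6-v2`) is a hypothetical placeholder, not the project's actual configuration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AppConfig:
    """Hypothetical settings object; field names are illustrative only."""

    chunk_size: int = 300            # README default; assumed here to count characters
    chunk_overlap: int = 50          # hypothetical overlap between consecutive chunks
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"  # assumed HF model
    llm_model: str = "llama2"        # "tinyllama" trades answer quality for speed
    index_dir: str = "faiss_index"   # where the FAISS index is saved
    top_k: int = 4                   # hypothetical number of chunks retrieved per question

    def __post_init__(self) -> None:
        # Reject settings that would make chunking or retrieval meaningless.
        if self.chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        if self.top_k <= 0:
            raise ValueError("top_k must be positive")


DEFAULT_CONFIG = AppConfig()
# Example variant for slower machines: same settings, smaller model.
FAST_CONFIG = replace(DEFAULT_CONFIG, llm_model="tinyllama")
```

A frozen dataclass keeps settings immutable at runtime, and `dataclasses.replace` re-runs validation when deriving variants.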
```bash
streamlit run app.py
```

Then open http://localhost:8501 in your browser.
- Upload one or more PDF files from the sidebar.
- Set the chunk size (default: 300).
- Click "Process Documents" to generate embeddings.
- Enter your question in the text box.
- Get an AI-powered answer based on the PDFs!
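The chunk size controls how the extracted text is split before embedding. The project uses LangChain for this; the sketch below is a deliberately simplified, dependency-free stand-in that cuts fixed-size character windows with an overlap (the `overlap` parameter is an assumption, not a documented setting of this app). Unlike LangChain's splitters, it does not try to break on paragraph or sentence boundaries.

```python
def chunk_text(text: str, chunk_size: int = 300, overlap: int = 50) -> list:
    """Split `text` into windows of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters, which lowers the chance that
    a sentence is cut in a way no single chunk contains it whole.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    text = " ".join(text.split())  # collapse newlines/extra spaces left by PDF extraction
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

Larger windows give the LLM more surrounding context per retrieved chunk; smaller windows make retrieval more targeted but may split an answer across chunks.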
- Streamlit (Web UI)
- PyPDF2 (PDF text extraction)
- LangChain (Text chunking, LLM inference)
- FAISS (Vector search)
- Hugging Face Embeddings (Text representations)
- Ollama (Llama 2) (Local LLM for answering queries)
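At question time these pieces follow the usual retrieval-augmented pattern: embed the question, ask FAISS for the nearest chunk vectors, and pass those chunks to Llama 2 as context. The sketch below mimics that flow without dependencies. It swaps FAISS for an exact L2 nearest-neighbour search in plain Python (the same result a flat index such as `IndexFlatL2` returns, without the speed), uses toy 2-D vectors in place of Hugging Face embeddings, and builds a JSON body for Ollama's `/api/generate` endpoint. The prompt wording and function names are hypothetical; the project's own prompt, handled through LangChain, may differ.

```python
import json
import math


def l2_distance(a: list, b: list) -> float:
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list, chunk_vecs: list, k: int = 4) -> list:
    """Indices of the k chunk vectors nearest to query_vec, closest first (exact search)."""
    ranked = sorted(range(len(chunk_vecs)), key=lambda i: l2_distance(query_vec, chunk_vecs[i]))
    return ranked[:k]


def build_prompt(question: str, context_chunks: list) -> str:
    """Assemble a grounded prompt: numbered context chunks followed by the question."""
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def ollama_request_body(prompt: str, model: str = "llama2") -> str:
    """JSON body for Ollama's POST /api/generate; stream=False asks for a single reply."""
    return json.dumps({"model": model, "prompt": prompt, "stream": False})


# Toy 2-D vectors standing in for real embeddings (real models typically use hundreds of dimensions).
chunks = ["FAISS stores the chunk vectors.", "Streamlit draws the UI.", "Ollama runs Llama 2 locally."]
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
question_vec = [1.0, 0.0]

best = top_k(question_vec, vectors, k=2)
prompt = build_prompt("What stores the vectors?", [chunks[i] for i in best])
body = ollama_request_body(prompt)
```

POSTing `body` to `http://localhost:11434/api/generate` returns a JSON object whose `response` field holds the generated answer when `stream` is false. Exact search scans every vector, which is fine for a few thousand chunks; FAISS exists to keep this fast as the index grows.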
- No answer or incorrect response?
  - Ensure the PDFs contain selectable text (scanned images won’t work).
  - Increase the chunk size if responses lack context.
- Performance issues?
  - Reduce the chunk size for faster processing.
  - Use `tinyllama` instead of `llama2` for quicker inference.
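One way to check for selectable text is to look at how much text the extractor returns per page: scanned pages typically come back empty or nearly so. The helpers below are a hypothetical heuristic over already-extracted page strings, and the `min_chars` and `ratio` thresholds are arbitrary assumptions. With PyPDF2, the input list can be produced with `[page.extract_text() for page in PdfReader(path).pages]`.

```python
from typing import List, Optional


def pages_without_text(page_texts: List[Optional[str]], min_chars: int = 20) -> List[int]:
    """1-based numbers of pages whose extracted text has fewer than `min_chars` characters.

    Such pages are often scanned images, which need OCR before they can be chunked.
    """
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len((text or "").strip()) < min_chars
    ]


def looks_scanned(page_texts: List[Optional[str]], min_chars: int = 20, ratio: float = 0.5) -> bool:
    """Heuristic: flag the document if at least `ratio` of its pages yield (almost) no text."""
    if not page_texts:
        return False
    return len(pages_without_text(page_texts, min_chars)) / len(page_texts) >= ratio
```

If many pages are flagged, the PDF needs OCR or a re-export with a text layer before the chatbot can answer questions about it.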
- ✅ Add support for PDFs with scanned text (OCR)
- ✅ Implement multi-document summarization
- ✅ Improve LLM response formatting and citations
Developed by Swagath Babu. Free for personal and research use.
