This project implements a PDF question-answering system using LangChain, Ollama, and Streamlit. Users upload PDF documents and ask questions about their content, receiving AI-powered answers grounded in the uploaded document.
## Why Ollama?

This project uses Ollama instead of cloud-based APIs for several key advantages:

- **Privacy & Security**
  - All processing happens locally on your machine
  - No data is sent to external servers
  - Complete control over your documents and queries
- **Cost-Effectiveness**
  - No API usage fees or subscription costs
  - No token-based pricing
  - Free to use without limitations
- **Offline Capability**
  - Works without an internet connection
  - No dependency on external services
  - Consistent performance regardless of network status
- **Customization**
  - Ability to run different models locally
  - Possibility of custom model fine-tuning
  - Full control over model parameters
- **Performance**
  - Lower latency, since everything runs locally
  - No rate limiting or API quotas
  - Consistent response times
## Features

- PDF document upload and processing
- Document chunking and vectorization using GPT4All embeddings
- Question answering with the Llama2 model through Ollama
- Interactive web interface built with Streamlit
- History of previous queries and answers
- Concise, accurate responses
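The chunking step above can be sketched in plain Python. This is a simplified stand-in for LangChain's text splitter, assuming fixed-size chunks with a small overlap so that context straddling a boundary still appears intact in at least one chunk:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:  # skip a possible empty tail
            chunks.append(chunk)
    return chunks
```

The overlap trades a little storage for better retrieval: a sentence cut in half at a chunk boundary is still fully contained in the neighboring chunk.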
## Prerequisites

- Python 3.8 or higher
- Ollama installed locally
- A virtual environment (recommended)
## Installation

1. Clone the repository:

   ```bash
   git clone <your-repository-url>
   cd <repository-name>
   ```

2. Create and activate a virtual environment:

   For macOS/Linux:

   ```bash
   python -m venv env
   source env/bin/activate
   ```

   For Windows:

   ```bash
   python -m venv env
   .\env\Scripts\activate
   ```

3. Install Ollama:

   - Visit Ollama's official website and download the appropriate version for your OS
   - Follow the installation instructions for your platform
   - Pull the Llama2 model:

     ```bash
     ollama pull llama2:7b
     ```

4. Install project dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

1. Start the Streamlit application:

   ```bash
   streamlit run app.py
   ```

2. Open your web browser and navigate to the URL shown in the terminal (typically http://localhost:8501)

3. Upload a PDF file using the file uploader

4. Click "Process PDF" to initialize the document processing

5. Enter your questions in the query input field and click "Submit Query"

6. View the answers and previous conversation history below
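Behind the query step, answering a question means embedding it and retrieving the most similar chunks from the vector store before they are passed to the model. A minimal stand-in for that lookup, assuming embeddings are plain lists of floats (the real app uses GPT4All embeddings and a LangChain vector store):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k_chunks(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose embeddings are most similar to the query."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

A real vector store uses an indexed search rather than a full sort, but the ranking criterion is the same.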
## Project Structure

- `app.py`: Main Streamlit application
- `requirements.txt`: Project dependencies
- `uploaded_file.pdf`: Temporary storage for uploaded PDFs
- `my_vectorstore.pkl`: Vector store for document embeddings
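`my_vectorstore.pkl` holds the pickled vector store, so a processed PDF does not need re-embedding on every run. In outline, the save/load logic might look like this (a sketch with a plain dict standing in for the real store object):

```python
import os
import pickle

STORE_PATH = "my_vectorstore.pkl"

def save_store(store, path: str = STORE_PATH) -> None:
    # Persist the embeddings so the PDF is only processed once.
    with open(path, "wb") as f:
        pickle.dump(store, f)

def load_store(path: str = STORE_PATH):
    # Reuse a previously built store if one exists; None signals a rebuild.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return None
```

Note that pickle files are tied to the Python objects that produced them; upgrading dependencies may require deleting the file and reprocessing the PDF.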
## Future Enhancements

- **Enhanced Document Processing**
  - Support for multiple document formats (DOCX, TXT, etc.)
  - Batch processing of multiple documents
  - Improved text chunking strategies
- **Advanced Features**
  - Document summarization
  - Key-point extraction
  - Citation tracking for answers
  - Support for images and tables in documents
- **UI/UX Improvements**
  - Dark mode support
  - Better error handling and user feedback
  - Export of conversation history
  - Customizable response length and style
- **Performance Optimization**
  - Caching mechanisms for faster responses
  - Optimized vector storage
  - Support for larger documents
- **Model Enhancements**
  - Support for multiple LLM models
  - Fine-tuning capabilities
  - Custom model integration
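The planned response caching could be as simple as memoizing answers per (document, query) pair with the standard library. A sketch, using a counting stub where the real app would call the Ollama-backed QA chain:

```python
from functools import lru_cache

CALLS = {"count": 0}  # tracks how often the expensive path actually runs

@lru_cache(maxsize=128)
def cached_answer(doc_id: str, query: str) -> str:
    """Memoize answers per (document, query) pair.

    In the app, the body would invoke the QA chain; repeated identical
    queries against the same document are then served from the cache.
    """
    CALLS["count"] += 1  # stand-in for the expensive model call
    return f"answer for {query!r} on {doc_id}"
```

Keying on the document as well as the query matters: the same question against a different PDF must not hit a stale cache entry.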
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.