An AI-powered application that allows users to upload documents and ask questions to get accurate, context-aware answers using LangChain, Pinecone, and Ollama.
- Multi-format Support: Upload PDFs, DOCX, TXT, and Markdown files
- Smart Text Processing: Automatic text extraction and chunking
- Vector Search: Fast semantic search using Pinecone
- Local AI Models: Uses Ollama for privacy and cost-free LLM inference
- User-friendly UI: Clean Streamlit interface
- Source Tracking: See which documents and pages were used for answers
- Real-time Processing: Live progress tracking during document processing
- LangChain: LLM orchestration and retrieval
- Ollama: Local LLM inference (supports Llama3, Qwen, and other models)
- Sentence Transformers: Open-source embeddings for semantic search
- Pinecone: Vector database for similarity search
- Streamlit: Web interface
- PyMuPDF: PDF processing
- python-docx: DOCX processing
- Python 3.8+
- Pinecone account and API key
- Ollama installed and running locally
- At least one Ollama model downloaded (e.g., llama3:latest)
First, install Ollama on your system:

```bash
# macOS
curl -fsSL https://ollama.ai/install.sh | sh

# Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Windows: download the installer from https://ollama.ai/download
```

Start the Ollama service and download a model:

```bash
# Start the Ollama service
ollama serve

# Download a model (in a new terminal)
ollama pull llama3:latest
```

Next, clone the repository and install the Python dependencies:

```bash
git clone <your-repo-url>
cd TalkDoc
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
# Required
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX=talkdoc-index
```

Then start the app:

```bash
streamlit run main.py
```

The app will be available at http://localhost:8501.
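Before the first run, make sure the Pinecone index referenced in `.env` exists. The snippet below is only a sketch of a one-time setup, assuming the pinecone-client v3+ API; the cloud/region values are placeholders, and the 1024 dimension matches the padded embeddings described in the configuration notes further down.

```python
# One-time Pinecone index setup (sketch; cloud/region values are placeholders).
import os
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index_name = os.environ.get("PINECONE_INDEX", "talkdoc-index")

if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1024,  # padded embedding size (see Configuration notes below)
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```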
- Upload Documents: Use the sidebar to upload PDF, DOCX, TXT, or MD files
- Process Documents: Click "Process Documents" to extract, chunk, and embed your files
- Ask Questions: Enter questions in natural language and get AI-powered answers
- View Sources: See which documents and pages were used to generate answers
Document Upload → Text Extraction → Chunking → Embedding → Pinecone Storage

User Question → Embedding → Similarity Search → Context Retrieval → Ollama LLM → Answer
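For orientation, a stripped-down version of the question-answering path might look like the sketch below. It assumes the components named in this README (sentence-transformers embeddings zero-padded to 1024 dimensions, a Pinecone index storing chunk text under a `text` metadata key, and a local Ollama model); the actual implementation lives in `app/qa_chain.py` and uses LangChain's RetrievalQA.

```python
# Simplified question-answering path (sketch, not the actual app code).
# Assumes chunk text is stored in Pinecone under the "text" metadata key.
import os
import numpy as np
from sentence_transformers import SentenceTransformer
from pinecone import Pinecone
from langchain_ollama import OllamaLLM

embedder = SentenceTransformer("all-mpnet-base-v2")
index = Pinecone(api_key=os.environ["PINECONE_API_KEY"]).Index(os.environ["PINECONE_INDEX"])
llm = OllamaLLM(model="llama3:latest")

def answer(question: str, top_k: int = 4) -> str:
    # Embed the question and zero-pad from 768 to 1024 dims to match the index.
    vec = embedder.encode(question)
    vec = np.pad(vec, (0, 1024 - vec.shape[0])).tolist()
    # Retrieve the most similar chunks and hand their text to the local LLM as context.
    matches = index.query(vector=vec, top_k=top_k, include_metadata=True).matches
    context = "\n\n".join((m.metadata or {}).get("text", "") for m in matches)
    prompt = f"Answer the question using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
    return llm.invoke(prompt)
```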
- `main.py`: Application entry point and Streamlit interface
- `app/ingestion.py`: Document parsing and text extraction (sketched below)
- `app/chunking.py`: Text chunking with metadata preservation
- `app/embeddings.py`: Vector embedding generation using sentence-transformers
- `app/vector_store.py`: Pinecone integration and similarity search
- `app/qa_chain.py`: LangChain RetrievalQA orchestration with Ollama
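As an illustration of what `app/ingestion.py` does for PDFs, a minimal extraction loop with PyMuPDF could look like this; the function name and returned fields are illustrative, not the project's actual interface.

```python
# Minimal PDF extraction sketch using PyMuPDF; page numbers are kept so answers
# can cite their sources later. Function name and dict fields are illustrative.
from typing import Dict, List

import fitz  # PyMuPDF

def extract_pdf_pages(path: str) -> List[Dict]:
    doc = fitz.open(path)
    pages = []
    for page_number, page in enumerate(doc, start=1):
        pages.append({"text": page.get_text(), "page": page_number, "source": path})
    doc.close()
    return pages
```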
Modify in `app/chunking.py`:

- `chunk_size`: Number of characters per chunk (default: 500)
- `chunk_overlap`: Overlapping characters between chunks (default: 50)
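For reference, a character-based splitter with those defaults can be set up as in the sketch below, using LangChain's `RecursiveCharacterTextSplitter`; `app/chunking.py` may differ in detail (for example, in how it preserves metadata).

```python
# Chunking sketch with the defaults mentioned above (500 chars, 50-char overlap).
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_text("long document text ...")
```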
Change in `app/embeddings.py`:

- Model: `all-mpnet-base-v2` (768 dimensions, high quality)
- Vectors are automatically padded to 1024 dimensions for Pinecone compatibility
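A rough sketch of that embedding step, assuming zero-padding from 768 to 1024 dimensions (the batch size is an arbitrary example, not the project's setting):

```python
# Embedding sketch: encode with all-mpnet-base-v2 (768 dims) and zero-pad to the
# 1024 dims expected by the Pinecone index. Batch size is an arbitrary example.
from typing import List

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-mpnet-base-v2")

def embed(texts: List[str]) -> np.ndarray:
    vectors = model.encode(texts, batch_size=32)  # shape: (n, 768)
    return np.pad(vectors, ((0, 0), (0, 1024 - vectors.shape[1])))
```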
Modify in `app/qa_chain.py`:

- Current model: `llama3:latest`
- Other available models: `qwen2:7b`, `qwen2.5:latest`, `llama3.2:latest`, `tinyllama:latest`
To use a different model:

```python
from langchain_ollama import OllamaLLM

llm = OllamaLLM(
    model="qwen2.5:latest",  # Change this to any model from `ollama list`
    temperature=0.7,
    top_p=0.9,
    repeat_penalty=1.1,
)
```

Run the app locally with:

```bash
streamlit run main.py
```

For container deployment, use a Dockerfile such as:

```dockerfile
FROM python:3.9-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .
EXPOSE 8501
CMD ["streamlit", "run", "main.py", "--server.port=8501"]
```

Deployment options:

- Streamlit Cloud: Direct deployment from GitHub
- Heroku: Use Procfile and requirements.txt
- AWS/GCP: Container deployment
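For the container deployment option, a typical build-and-run sequence might look like the following (the image name is arbitrary, and the container still needs network access to the host's Ollama service):

```bash
# Build the image and run it with the credentials from the .env file described above.
docker build -t talkdoc .
docker run --env-file .env -p 8501:8501 talkdoc
```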
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama Connection Error: make sure Ollama is running (`ollama serve`) and check the available models (`ollama list`)
- Pinecone Connection Error: verify the API key in `.env` and check the Pinecone dashboard for index status
- Model Not Found: download the model with `ollama pull llama3:latest`
- Memory Issues: use smaller models like `tinyllama:latest` and reduce the batch size in `app/embeddings.py`
- Use GPU if available for faster embedding generation
- Adjust chunk size based on document complexity
- Consider using Pinecone's serverless option for cost optimization
- Use smaller Ollama models for faster inference
For issues and questions:
- Create an issue on GitHub
- Check the troubleshooting section above
- Review Pinecone and Ollama documentation
- Ollama Integration: Switched from local HuggingFace models to Ollama for better performance
- Improved Embeddings: Using all-mpnet-base-v2 for higher quality semantic search
- Better Prompt Templates: Enhanced answer generation with custom prompts
- Source Tracking: Improved metadata storage and retrieval
- Error Handling: Better error messages and fallback mechanisms