PDF-Based AI Chat System

A Node.js application that answers questions about PDF documents using AI embeddings and vector search.

📺 Tutorial

This project was built following this YouTube tutorial: PDF-Based AI Chat System Tutorial

🚀 Features

PDF Processing: Load and chunk PDF documents for better processing
Vector Embeddings: Convert text chunks into vector embeddings using Google's Generative AI
Vector Database: Store embeddings in Pinecone for efficient similarity search
Intelligent Chat: Interactive chat system that answers questions based on PDF content
Context-Aware: Maintains conversation history for better context understanding
Query Transformation: Rewrites follow-up questions for better search results

📋 Prerequisites

Node.js (v16 or higher)
npm or yarn
Google Gemini API key
Pinecone account and API key

🛠️ Installation

Clone this repository:

git clone <your-repo-url>
cd Day08

Install dependencies:

npm install

Set up environment variables: Create a .env file in the root directory with:

GEMINI_API_KEY=your_gemini_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
PINECONE_ENVIRONMENT=your_pinecone_environment
PINECONE_INDEX_NAME=your_pinecone_index_name
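A missing or blank variable tends to surface later as an opaque API error, so it can help to check the four variables up front. The following is a minimal sketch, not code from the project: the helper names `findMissingEnvVars` and `assertEnv` are hypothetical, and it assumes `process.env` has already been populated (for example by dotenv).

```javascript
// Hypothetical helpers, not part of the original project.
// The variable names are the ones listed in the .env example above.
const REQUIRED_ENV_VARS = [
  "GEMINI_API_KEY",
  "PINECONE_API_KEY",
  "PINECONE_ENVIRONMENT",
  "PINECONE_INDEX_NAME",
];

// Returns the names of required variables that are unset or blank.
function findMissingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter(
    (name) => typeof env[name] !== "string" || env[name].trim() === ""
  );
}

// Throws a readable error if any required variable is missing.
function assertEnv(env = process.env) {
  const missing = findMissingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv()` at the top of index.js and query.js would stop the script early with the list of missing names.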

📂 Project Structure

├── index.js          # PDF processing and vector storage script
├── query.js          # Interactive chat interface
├── dsa.pdf           # Sample PDF document (Data Structures & Algorithms)
├── package.json      # Project dependencies
├── .env              # Environment variables (not tracked in git)
└── README.md         # Project documentation

🎯 Usage

1. Process PDF and Store Embeddings

First, run the indexing script to process your PDF and store it in Pinecone:

node index.js

This will:

Load the PDF document (dsa.pdf)
Split it into manageable chunks
Generate vector embeddings
Store the embeddings and their metadata in a Pinecone index
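To illustrate the "split into manageable chunks" step, here is a simplified sliding-window chunker. It is only a sketch: the project is described as using LangChain's RecursiveCharacterTextSplitter, which also tries to break on separators such as paragraphs and sentences, whereas this version cuts at fixed character offsets. The function name `chunkText` is hypothetical; the defaults mirror the 1000/200 values listed under Key Features.

```javascript
// Illustrative only: a fixed-width chunker with overlap.
// The real splitter in the project is separator-aware; this one is not.
function chunkText(text, chunkSize = 1000, chunkOverlap = 200) {
  if (chunkOverlap >= chunkSize) {
    throw new RangeError("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // how far the window advances each time
  const chunks = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a chunk has reached the end of the text,
    // otherwise a redundant tail chunk would be emitted.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}
```

With the defaults, each chunk starts 800 characters after the previous one, so the last 200 characters of one chunk reappear at the start of the next. That repetition is what keeps a sentence that straddles a boundary retrievable from at least one chunk.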

2. Start Interactive Chat

Once indexing is complete, start the chat interface:

node query.js

The system will prompt you to ask questions about the PDF content.

🔧 Dependencies

Core Dependencies

@google/genai: Google Generative AI client
@langchain/community: LangChain community modules
@langchain/google-genai: Google AI integration for LangChain
@langchain/pinecone: Pinecone integration for LangChain
@langchain/textsplitters: Text splitting utilities
@pinecone-database/pinecone: Pinecone database client
dotenv: Environment variable management
readline-sync: Synchronous user input

🏗️ How It Works

Document Processing (index.js):
- Loads PDF using PDFLoader
- Splits text into chunks with overlap for better context
- Converts chunks to vector embeddings using Google's text-embedding-004 model
- Stores embeddings in Pinecone with metadata

Query Processing (query.js):
- Takes user input and transforms follow-up questions for clarity
- Converts questions to vector embeddings
- Performs similarity search in Pinecone to find relevant content
- Uses retrieved context to generate answers via Gemini AI
- Maintains conversation history for context-aware responses
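The history and query-rewriting steps can be sketched without any API calls. The snippet below is an assumption about how such bookkeeping might look, not the project's actual code: `appendTurn`, `buildRewriteContents`, and the instruction text are hypothetical, and the message shape (`role` of "user" or "model" with a `parts` array) follows the content format used by Gemini chat requests.

```javascript
// Hypothetical instruction text for the rewrite step (not from the project).
const REWRITE_INSTRUCTION =
  "Rewrite the user's latest question as a standalone question, " +
  "using the earlier conversation for context. Return only the question.";

// Returns a new history array with one more turn; the input is not mutated.
function appendTurn(history, role, text) {
  if (role !== "user" && role !== "model") {
    throw new Error(`Unsupported role: ${role}`);
  }
  return [...history, { role, parts: [{ text }] }];
}

// Builds the messages for a rewrite call: prior turns, then the follow-up.
// REWRITE_INSTRUCTION would be sent alongside as the system instruction.
function buildRewriteContents(history, followUp) {
  return appendTurn(history, "user", followUp);
}
```

A follow-up such as "How do I implement it?" only makes sense next to the earlier turns, which is why the rewrite call receives the full history while the vector search receives just the rewritten, standalone question.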

🔑 Key Features

Chunking Strategy: Uses RecursiveCharacterTextSplitter with 1000-character chunks and a 200-character overlap
Embedding Model: Google's text-embedding-004 for high-quality vector representations
Vector Search: Top-K similarity search (K=10) for relevant context retrieval
AI Model: Gemini 2.0 Flash (gemini-2.0-flash) for generating human-like responses
Error Handling: Graceful handling when answers aren't found in the document
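To make the top-K search concrete, the sketch below ranks a handful of in-memory vectors against a query vector. It is a stand-in for what Pinecone does server-side, not a replacement for it: it assumes the index uses the cosine metric, the function names `cosineSimilarity` and `topK` are hypothetical, and the record shape (`id`, `values`) is modeled on Pinecone records.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force top-K: score every record, sort descending, keep the first k.
function topK(queryVector, records, k = 10) {
  return records
    .map((record) => ({
      id: record.id,
      score: cosineSimilarity(queryVector, record.values),
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A vector database avoids this full scan by using an approximate nearest-neighbor index, but the ranking it returns serves the same purpose: the K highest-scoring chunks become the context passed to the model.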

🚨 Important Notes

Ensure your PDF file is placed in the root directory as dsa.pdf or update the path in index.js
Run index.js before query.js to populate the vector database
The system is configured for Data Structures and Algorithms content but can be adapted to other domains
Keep your API keys secure and never commit them to version control

🤝 Contributing

Fork the repository
Create a feature branch
Make your changes
Test thoroughly
Submit a pull request


🐛 Troubleshooting

Common Issues

Module Type Error: Ensure "type": "module" is set in package.json
API Key Issues: Verify your environment variables are correctly set
Dependency Conflicts: Use npm install --legacy-peer-deps if needed
Pinecone Connection: Ensure your Pinecone index exists and is accessible

Support

If you encounter any issues, please check:

Node.js version compatibility
API key validity
Network connectivity
Pinecone index configuration
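The first item on this checklist can be automated. The snippet below is a small sketch, with the hypothetical name `isSupportedNode`, that compares the running Node.js major version against the v16 minimum stated in the prerequisites.

```javascript
// Minimum major version, taken from the Prerequisites section.
const MIN_NODE_MAJOR = 16;

// Returns true when the given version string (e.g. "18.17.0") meets the minimum.
// Defaults to the version of the running Node.js process.
function isSupportedNode(version = process.versions.node) {
  const major = Number.parseInt(version.split(".")[0], 10);
  return Number.isInteger(major) && major >= MIN_NODE_MAJOR;
}
```

A script could call `isSupportedNode()` before doing any work and print a clear message when it returns false, rather than failing later on unsupported syntax.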
