An iterative multi-agent RAG (Retrieval-Augmented Generation) system built with LangGraph for generating diverse, comprehensive, and factually accurate responses.
This system implements a 6-agent pipeline that iteratively refines responses through:
- Planner (Agent 1): Decomposes queries into aspects/perspectives
- Retriever (Agent 2): Retrieves relevant documents for each aspect
- Synthesizer (Agent 3): Generates comprehensive answers
- Fact Extractor (Agent 4): Extracts atomic facts from answers
- Verifier (Agent 5): Verifies facts against evidence
- Coverage Evaluator (Agent 6): Evaluates topic coverage
Agents 5 and 6 run in parallel for improved performance (40-50% faster iterations).
Termination: The system automatically terminates when:
- Quality criteria are met (no refuted facts, minimal unclear facts, comprehensive coverage), OR
- Memory limits are exceeded (RAM/GPU usage thresholds)
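The concrete workflow is assembled in graph.py. As a rough, illustrative sketch of how such an iterative loop can be wired with LangGraph (the node bodies, state fields, and quality check below are placeholders, not the project's actual implementation):

```python
# Illustrative sketch only -- the real graph lives in graph.py / nodes/.
from typing import List, TypedDict
from langgraph.graph import END, START, StateGraph


class RAGState(TypedDict, total=False):
    query: str
    aspects: List[str]
    documents: List[dict]
    answer: str
    facts: List[str]
    refuted_facts: List[str]
    coverage_complete: bool
    iteration: int


# Placeholder node bodies; the real nodes wrap the agents in agents/.
def planner(state):        return {"aspects": []}              # Agent 1
def retriever(state):      return {"documents": []}            # Agent 2
def synthesizer(state):    return {"answer": ""}               # Agent 3
def fact_extractor(state): return {"facts": []}                # Agent 4
def verifier(state):       return {"refuted_facts": []}        # Agent 5
def coverage(state):       return {"coverage_complete": True}  # Agent 6
def iteration_gate(state): return {"iteration": state.get("iteration", 0) + 1}


def should_continue(state) -> str:
    # Quality-based termination; memory checks would also be consulted here.
    done = not state.get("refuted_facts") and state.get("coverage_complete")
    return "end" if done else "continue"


graph = StateGraph(RAGState)
for name, fn in [("planner", planner), ("retriever", retriever),
                 ("synthesizer", synthesizer), ("fact_extractor", fact_extractor),
                 ("verifier", verifier), ("coverage", coverage),
                 ("iteration_gate", iteration_gate)]:
    graph.add_node(name, fn)

graph.add_edge(START, "planner")
graph.add_edge("planner", "retriever")
graph.add_edge("retriever", "synthesizer")
graph.add_edge("synthesizer", "fact_extractor")
# Fan out: verifier and coverage evaluator run in the same step (in parallel),
# then join at the iteration gate.
graph.add_edge("fact_extractor", "verifier")
graph.add_edge("fact_extractor", "coverage")
graph.add_edge(["verifier", "coverage"], "iteration_gate")
graph.add_conditional_edges("iteration_gate", should_continue,
                            {"continue": "planner", "end": END})
app = graph.compile()
```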
diverseTextGen/
├── state.py                  # LangGraph state schema
├── graph.py                  # LangGraph workflow construction
├── requirements.txt          # Python dependencies
├── env.example               # Environment variable template
│
├── config/                   # Configuration module
│   ├── __init__.py
│   └── settings.py           # All configurable settings
│
├── agents/                   # Agent implementations (logic)
│   ├── planner.py
│   ├── retriever.py
│   ├── synthesizer.py
│   ├── fact_extractor.py
│   ├── verifier.py
│   └── coverage_evaluator.py
│
├── nodes/                    # LangGraph node wrappers
│   ├── planner.py
│   ├── retriever.py
│   ├── synthesizer.py
│   ├── fact_extractor.py
│   ├── parallel_verification.py
│   └── iteration_gate.py
│
├── llm/                      # LLM clients and prompts
│   ├── server_llm.py         # vLLM server client
│   ├── hf_llm.py             # HuggingFace direct inference
│   └── prompts/              # Prompt templates (optional)
│
├── retrieval/                # Dense retrieval system
│   └── retriever.py
│
├── data/                     # Data handling
│   ├── formatters.py
│   └── dataset.py
│
├── eval/                     # Evaluation modules
│   ├── icat.py               # ICAT-A evaluation
│   ├── llm_evaluator.py
│   ├── retriever.py
│   ├── experiment_tracker.py
│   └── visualizer.py
│
├── scripts/                  # Entry point scripts
│   ├── run_langgraph.py      # Main RAG runner
│   ├── run_baseline_experiment.py
│   ├── run_full_experiment.py
│   ├── evaluate_icat.py
│   ├── compare_runs.py
│   └── *.sh                  # SLURM job scripts
│
├── artifacts/                # Generated outputs (gitignored)
│   ├── runs/
│   ├── outputs/
│   └── logs/
│
└── docs/                     # Documentation
# Using conda (recommended)
conda create -n rag python=3.10
conda activate rag
# Or use an existing environment
source /path/to/conda/etc/profile.d/conda.sh
conda activate /path/to/env

pip install -r requirements.txt

Copy the example environment file and customize it:
cp env.example .env
# Edit .env with your paths

Or set environment variables directly:
export RAG_DATA_DIR="/path/to/data"
export RAG_CACHE_DIR="/path/to/cache"
export RAG_CORPUS_PATH="/path/to/corpus.jsonl"
export RAG_SERVER_LOGS_DIR="/path/to/server_logs"

The system requires a running vLLM server:
# Start vLLM server
python -m vllm.entrypoints.openai.api_server \
--model Qwen/Qwen3-4B-Instruct-2507 \
--host 0.0.0.0 \
--port 8000
# Create server log file with host and port
echo "localhost" > server_logs/log.txt
echo "8000" >> server_logs/log.txtAll scripts should be run from the project root directory.
All scripts should be run from the project root directory.

Run a single query:

python scripts/run_langgraph.py \
--query "What causes headaches and how can they be treated?" \
--query_id "test_001" \
--output ./output/test_result.json

Process a batch of queries from a JSONL file:

python scripts/run_langgraph.py \
--input_file ./data/queries.jsonl \
-n 10 \
--output ./output/batch_results

Run with custom memory thresholds:

python scripts/run_langgraph.py \
--query "Explain quantum computing" \
--query_id "quantum_001" \
--max_ram_percent 85 \
--max_gpu_percent 85 \
--output ./output/result.json

Run the baseline experiment:

python scripts/run_baseline_experiment.py \
--queries_path data/antique/train.jsonl \
-n 10

Evaluate generated outputs with ICAT:

python scripts/evaluate_icat.py \
--output_path ./output/batch_results \
--corpus_path ./data/antique/corpus_filtered_50.jsonl

Compare experiment runs:

python scripts/compare_runs.py --list   # List all runs
python scripts/compare_runs.py --runs run1 run2 --output comparisons/

Input queries should be in JSONL format, one object per line:
{"query_id": "q001", "query_description": "What causes headaches?"}
{"query_id": "q002", "query_description": "How to learn programming?"}Results are saved as JSON with structure:
{
"query_id": "q001",
"query": "What causes headaches?",
"final_answer": "...",
"total_iterations": 3,
"termination_reason": "quality_complete",
"iteration_history": [...],
"total_runtime_seconds": 45.2,
"timestamps": {...},
"memory_config": {...}
}

Termination Reasons:
- quality_complete: All quality metrics met (primary)
- quality_complete_by_agents: Both verifier and coverage agents indicate completion
- memory_exceeded: ...: RAM or GPU memory limit reached
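As an example of consuming these files, a short script can tally termination reasons and iteration counts across a batch output directory (the per-query JSON layout and directory path below are assumptions based on the structure shown above):

```python
# Aggregate termination reasons and iteration counts over saved results.
import json
from collections import Counter
from pathlib import Path

reasons, iterations = Counter(), []
for path in Path("./output/batch_results").glob("*.json"):
    result = json.loads(path.read_text())
    reasons[result["termination_reason"]] += 1
    iterations.append(result["total_iterations"])

print("Termination reasons:", dict(reasons))
if iterations:
    print("Average iterations:", sum(iterations) / len(iterations))
```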
See env.example for all available environment variables. Key options:
| Parameter | Environment Variable | Default | Description |
|---|---|---|---|
| Data directory | RAG_DATA_DIR | ./data | Data storage path |
| Cache directory | RAG_CACHE_DIR | See config | Embedding cache path |
| Server log file | RAG_SERVER_LOG_FILE | ./server_logs/log.txt | vLLM server info |
| Default model | RAG_DEFAULT_MODEL | Qwen/Qwen3-4B-Instruct-2507 | LLM model name |
| Max RAM % | RAG_MAX_RAM_PERCENT | 90 | RAM usage termination threshold |
| Max GPU % | RAG_MAX_GPU_PERCENT | 90 | GPU memory termination threshold |
| Top-K retrieval | RAG_DEFAULT_TOP_K | 5 | Documents per aspect |
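These options boil down to environment-variable lookups with defaults. A simplified sketch of what config/settings.py presumably does (names follow the table above; exact defaults may differ):

```python
# Simplified sketch of environment-driven settings with defaults.
import os

DATA_DIR = os.environ.get("RAG_DATA_DIR", "./data")
SERVER_LOG_FILE = os.environ.get("RAG_SERVER_LOG_FILE", "./server_logs/log.txt")
DEFAULT_MODEL = os.environ.get("RAG_DEFAULT_MODEL", "Qwen/Qwen3-4B-Instruct-2507")
MAX_RAM_PERCENT = float(os.environ.get("RAG_MAX_RAM_PERCENT", "90"))
MAX_GPU_PERCENT = float(os.environ.get("RAG_MAX_GPU_PERCENT", "90"))
DEFAULT_TOP_K = int(os.environ.get("RAG_DEFAULT_TOP_K", "5"))
```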
Query → Planner → Retriever → Synthesizer → Fact Extractor
                                           ┌───────┴───────┐
                                           ↓               ↓
                                       Verifier   Coverage Evaluator
                                           └───────┬───────┘
                                                   ↓
                                            Iteration Gate
                                                   ↓
                                  Check Quality & Memory Constraints
                                                   ↓
                                        [Continue or Terminate]
Iteration Control:
- Primary: Quality-based termination (comprehensive, factual answers)
- Safety: Memory-based termination (RAM/GPU thresholds)
- No fixed iteration limit: the loop runs until quality criteria are met or memory thresholds are exceeded
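A hedged sketch of what this decision amounts to (the real logic lives in nodes/iteration_gate.py; the psutil/torch checks and thresholds below are illustrative, not the project's exact criteria):

```python
# Illustrative termination check: quality first, memory as a safety valve.
import psutil


def check_termination(state, max_ram_percent=90.0, max_gpu_percent=90.0) -> str:
    # Safety: stop if system RAM usage crosses the configured threshold.
    if psutil.virtual_memory().percent >= max_ram_percent:
        return "memory_exceeded: RAM"

    # Safety: stop if GPU memory usage crosses the threshold (GPU optional).
    try:
        import torch
        if torch.cuda.is_available():
            used = torch.cuda.memory_allocated(0)
            total = torch.cuda.get_device_properties(0).total_memory
            if 100.0 * used / total >= max_gpu_percent:
                return "memory_exceeded: GPU"
    except ImportError:
        pass

    # Primary: quality-based termination (no refuted facts, few unclear
    # facts, comprehensive coverage); the "few" threshold is illustrative.
    if (not state.get("refuted_facts")
            and len(state.get("unclear_facts", [])) <= 1
            and state.get("coverage_complete")):
        return "quality_complete"

    return "continue"
```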
📚 Complete documentation is available in the docs/ directory:
- ARCHITECTURE.md - Complete technical documentation
  - 6-agent pipeline architecture
  - LangGraph workflow details
  - State management and termination logic
  - Technical implementation details
- USAGE.md - How to run experiments
  - Quick start guide
  - Single query and batch processing
  - Command reference
  - Output structure and monitoring
- OBSERVABILITY.md - LLM decision tracking
  - Real-time decision logging
  - Quality metrics tracking
  - Plateau detection
  - Log analysis examples
MIT License
If you use this code, please cite:
@software{diverse_text_gen,
title={Diverse Text Generation - Multi-Agent RAG System},
author={...},
year={2024},
url={https://github.com/...}
}