Diagram Annotator for Technical Documentation

A powerful Python tool that automatically identifies, categorizes, and generates detailed technical descriptions for diagrams in markdown documentation using vision-capable Large Language Models (LLMs) through Ollama.

Since the software is intended for use in academic settings, we use Ollama to run multimodal LLMs (MLLMs) locally, such as qwen3-vl:32b (the model we tested).

The system performs well with .md files generated by PDF OCR tools such as https://github.com/granludo/deepseekocr-mlx (the one we tested).

By Marc Alier & Juanan Pereira https://lamb-project.org

🎯 Overview

This tool processes markdown files containing software engineering diagrams and:

Automatically categorizes diagrams into 35+ types (UML, C4, ERD, flowcharts, etc.)
Generates detailed technical descriptions tailored to each diagram type
Uses context-aware analysis to improve categorization accuracy
Produces annotated documentation with inline technical descriptions
Creates comprehensive summaries of all diagrams found
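
As an illustration of the first processing step, the following minimal sketch locates image references in a markdown string and collects the text around each one. It is not the code from annotate_images_enhanced.py: the names IMAGE_PATTERN and find_images_with_context are hypothetical, and it assumes the context size applies to each side of the image.

```python
import re

# Matches standard markdown images: ![alt](path "optional title")
IMAGE_PATTERN = re.compile(r'!\[(?P<alt>[^\]]*)\]\((?P<path>[^)\s]+)(?:\s+"[^"]*")?\)')


def find_images_with_context(markdown: str, context_size: int = 500) -> list:
    """Return the alt text, path, and surrounding text of each image reference."""
    results = []
    for match in IMAGE_PATTERN.finditer(markdown):
        # Assumption: context_size characters are taken on each side of the image.
        start = max(0, match.start() - context_size)
        end = min(len(markdown), match.end() + context_size)
        results.append({
            "alt": match.group("alt"),
            "path": match.group("path"),
            "context_before": markdown[start:match.start()],
            "context_after": markdown[match.end():end],
        })
    return results


sample = (
    "The login flow is shown below.\n\n"
    "![Login sequence](img/login.png)\n\n"
    "Each call is synchronous."
)
images = find_images_with_context(sample, context_size=20)
print(images[0]["alt"], images[0]["path"])
```

The collected context can then be passed to the categorization step together with the image itself.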

✨ Key Features

Context-Aware Categorization

Analyzes surrounding text to predict diagram types before visual inspection
Combines textual context with visual analysis for higher accuracy
Tracks prediction accuracy to measure context usefulness
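
One simple way to predict a diagram type from the surrounding text is to count how often each category's indicator terms appear. The sketch below assumes this counting approach and a hypothetical function name, predict_from_context; the actual tool may weigh context differently.

```python
from collections import Counter


def predict_from_context(context: str, context_indicators: dict):
    """Return the category whose indicator terms appear most often, or None."""
    text = context.lower()
    scores = Counter()
    for category, indicators in context_indicators.items():
        # Plain substring counting: crude, but enough to show the idea.
        scores[category] = sum(text.count(term.lower()) for term in indicators)
    if not scores:
        return None
    category, score = scores.most_common(1)[0]
    return category if score > 0 else None


# Illustrative indicator lists, shaped like "context_indicators" in the JSON config.
indicators = {
    "class diagram": ["UML", "inheritance", "class"],
    "sequence diagram": ["message", "lifeline", "sequence"],
}
prediction = predict_from_context(
    "The sequence of messages between each lifeline is shown below.", indicators
)
print(prediction)
```

Comparing this prediction with the category chosen after visual inspection gives the prediction accuracy mentioned above.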

Extensive Diagram Support

Supports 35+ diagram types including:

UML Diagrams: Class, Sequence, Use Case, State, Activity, Component, etc.
Architecture: C4 Model, System Architecture, Cloud Architecture, Microservices
Data Modeling: ERD, Database Schema, Data Flow Diagrams
Process: Flowcharts, BPMN, Gantt Charts
Technical: Network Diagrams, Git Workflows, API Specifications
Design: UI Mockups, Wireframes
Analysis: Decision Trees, Fault Trees, Mind Maps

Intelligent Description Generation

Custom prompts for each diagram type focusing on relevant details
Structured analysis based on diagram-specific elements
Technical accuracy in terminology and notation identification
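
To show how a per-category prompt could reach the model, here is a minimal sketch that builds the request body for Ollama's /api/generate endpoint, which accepts base64-encoded images in an "images" list. The helper names prompt_for and build_generate_payload and the fallback prompt text are assumptions for illustration, not the tool's own code.

```python
import base64


def prompt_for(category: str, category_prompts: dict) -> str:
    """Pick the category-specific prompt, falling back to a generic one."""
    entry = category_prompts.get(category)
    if entry and entry.get("prompt"):
        return entry["prompt"]
    # Hypothetical fallback wording, not taken from the project.
    return f"Describe this {category} in technical detail."


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build the JSON body for a non-streaming POST to Ollama's /api/generate."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


prompts = {"class diagram": {"prompt": "Describe this UML Class Diagram..."}}
payload = build_generate_payload(
    "qwen2-vl:7b", prompt_for("class diagram", prompts), b"fake-image-bytes"
)
print(payload["prompt"])
```

With the requests package, a payload like this could be sent with requests.post("http://localhost:11434/api/generate", json=payload), and the generated text read from the "response" field of the returned JSON.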

📋 Requirements

Python 3.8+
Ollama installed and running locally
A vision-capable model installed in Ollama (e.g., qwen2-vl:7b, llava, bakllava)
uv for dependency management (recommended)

🚀 Installation

1. Install Ollama

# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
ollama serve

2. Pull a Vision Model

# Recommended: Qwen2-VL (good balance of quality and speed)
ollama pull qwen2-vl:7b

# Alternative options:
# ollama pull llava:13b
# ollama pull bakllava

3. Install Python Dependencies

# Using uv (recommended)
uv add requests pillow rich

# Or using pip
pip install requests pillow rich

💻 Usage

Basic Usage

uv run annotate_images_enhanced.py \
    --input docs/architecture.md \
    --output docs/architecture_annotated.md \
    --summary docs/diagram_summary.md \
    --categories image_categories_enhanced.json \
    --model qwen3-vl:8b

Advanced Options

# --context-size: amount of surrounding text to analyze
# --verbose: show detailed progress
uv run annotate_images_enhanced.py \
    --input docs/architecture.md \
    --output docs/architecture_annotated.md \
    --summary docs/diagram_summary.md \
    --categories image_categories_enhanced.json \
    --model qwen3-vl:32b \
    --context-size 750 \
    --verbose

Command-Line Arguments

| Argument | Description | Required | Default |
|---|---|---|---|
| `--input` | Path to source markdown file | Yes | - |
| `--output` | Path for annotated markdown output | Yes | - |
| `--summary` | Path for diagram summary output | Yes | - |
| `--categories` | JSON file with diagram categories | Yes | - |
| `--model` | Ollama vision model to use | No | `qwen2-vl:7b` |
| `--context-size` | Characters of context to analyze | No | 500 |
| `--verbose` | Show detailed progress | No | False |
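
For readers who want to wrap or extend the script, the arguments above map naturally onto argparse. The following is a sketch that mirrors the table, not the script's actual parser; build_parser is a hypothetical name and the help strings are copied from the descriptions above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the same flags and defaults as the table above."""
    parser = argparse.ArgumentParser(description="Annotate diagrams in a markdown file.")
    parser.add_argument("--input", required=True, help="Path to source markdown file")
    parser.add_argument("--output", required=True, help="Path for annotated markdown output")
    parser.add_argument("--summary", required=True, help="Path for diagram summary output")
    parser.add_argument("--categories", required=True, help="JSON file with diagram categories")
    parser.add_argument("--model", default="qwen2-vl:7b", help="Ollama vision model to use")
    parser.add_argument("--context-size", type=int, default=500,
                        help="Characters of context to analyze")
    parser.add_argument("--verbose", action="store_true", help="Show detailed progress")
    return parser


# Parse an explicit argument list instead of sys.argv so the example runs anywhere.
args = build_parser().parse_args([
    "--input", "docs/architecture.md",
    "--output", "docs/architecture_annotated.md",
    "--summary", "docs/diagram_summary.md",
    "--categories", "image_categories_enhanced.json",
    "--context-size", "750",
])
print(args.model, args.context_size, args.verbose)
```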

📁 Project Structure

diagram-annotator/
├── annotate_images_enhanced.py    # Main script
├── image_categories_enhanced.json # Diagram categories & prompts
├── README.md                       # This file
├── examples/                       # Example documents
│   ├── input/                     # Sample markdown files
│   └── output/                    # Generated outputs
└── tests/                         # Test documents

🔧 Configuration

Customizing Categories

Edit image_categories_enhanced.json to:

Add new diagram types
Modify categorization prompts
Adjust context indicators
Customize description generation prompts

Example structure:

{
  "categories": ["class diagram", "sequence diagram", ...],
  "category_prompts": {
    "class diagram": {
      "prompt": "Describe this UML Class Diagram...",
      "focus_areas": ["classes", "methods", ...],
      "keywords": ["class", "inheritance", ...]
    }
  },
  "context_indicators": {
    "class diagram": ["UML", "inheritance", "class", ...]
  }
}
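
After editing the file, a quick consistency check can catch prompts or indicators that refer to a category missing from the "categories" list. This is a minimal sketch under the structure shown above; validate_categories is a hypothetical helper and the tool itself may not perform these checks.

```python
import json


def validate_categories(config: dict) -> list:
    """Return a list of problems; an empty list means the config looks consistent."""
    problems = []
    categories = config.get("categories", [])
    if not categories:
        problems.append("'categories' is missing or empty")
    known = set(categories)
    for section in ("category_prompts", "context_indicators"):
        for name in config.get(section, {}):
            if name not in known:
                problems.append(f"{section} refers to unknown category: {name!r}")
    for name, entry in config.get("category_prompts", {}).items():
        if not entry.get("prompt"):
            problems.append(f"category_prompts[{name!r}] has no 'prompt'")
    return problems


# A small, complete config in the same shape as the example above.
config = json.loads("""
{
  "categories": ["class diagram"],
  "category_prompts": {
    "class diagram": {
      "prompt": "Describe this UML Class Diagram...",
      "focus_areas": ["classes", "methods"],
      "keywords": ["class", "inheritance"]
    }
  },
  "context_indicators": {
    "class diagram": ["UML", "inheritance", "class"]
  }
}
""")
print(validate_categories(config))
```

To check a real file, load it with json.load(open("image_categories_enhanced.json")) and pass the result to the same function.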

Model Selection

Different models offer different trade-offs:

| Model | Quality | Speed | Memory | Best For |
|---|---|---|---|---|
| `qwen2-vl:7b` | Good | Fast | 8GB | General use |
| `qwen2-vl:72b` | Excellent | Slow | 40GB+ | High accuracy |
| `llava:13b` | Good | Medium | 16GB | Balanced |
| `bakllava` | Fair | Fast | 8GB | Quick processing |

📊 Output Examples

Annotated Markdown

The tool inserts technical descriptions after each diagram:

![System Architecture](diagrams/architecture.png)

**Diagram Type:** Architecture Diagram

**Technical Description:**
This architecture diagram shows a microservices-based system with:
1. API Gateway serving as the entry point
2. Three microservices: User Service, Order Service, Payment Service
3. PostgreSQL database for User Service
4. MongoDB for Order Service
5. Redis cache layer
6. RabbitMQ message broker for inter-service communication
7. All services deployed in Docker containers
...
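
The insertion step behind output like this can be sketched as a plain string operation. The functions render_annotation and annotate below are hypothetical and only mimic the layout shown above; they are not taken from the tool.

```python
def render_annotation(diagram_type: str, description: str) -> str:
    """Format the annotation block in the layout shown above."""
    return (
        f"**Diagram Type:** {diagram_type}\n\n"
        f"**Technical Description:**\n{description.strip()}\n"
    )


def annotate(markdown: str, image_markup: str, diagram_type: str, description: str) -> str:
    """Insert the annotation directly after the first occurrence of image_markup."""
    index = markdown.find(image_markup)
    if index == -1:
        return markdown  # Image not found: leave the document unchanged.
    insert_at = index + len(image_markup)
    block = "\n\n" + render_annotation(diagram_type, description)
    return markdown[:insert_at] + block + markdown[insert_at:]


doc = "Intro.\n\n![System Architecture](diagrams/architecture.png)\n\nNext section.\n"
annotated = annotate(
    doc,
    "![System Architecture](diagrams/architecture.png)",
    "Architecture Diagram",
    "This architecture diagram shows a microservices-based system.",
)
print(annotated)
```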

Summary Document

Generates a comprehensive summary with:

Total diagram count
Category distribution statistics
Context prediction accuracy
Detailed entry for each diagram with description
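
The statistics in the summary can be derived from a list of per-diagram results. The sketch below assumes each result records the final category and the context-based prediction; the field names, the summarize function, and the sample data are all made up for illustration.

```python
from collections import Counter


def summarize(results: list) -> dict:
    """Compute total count, category distribution, and context prediction accuracy."""
    distribution = Counter(r["category"] for r in results)
    # Only diagrams that received a context-based prediction count toward accuracy.
    predicted = [r for r in results if r.get("predicted") is not None]
    correct = sum(1 for r in predicted if r["predicted"] == r["category"])
    accuracy = correct / len(predicted) if predicted else None
    return {
        "total": len(results),
        "distribution": dict(distribution),
        "context_accuracy": accuracy,
    }


# Made-up results, not output from a real run.
results = [
    {"image": "a.png", "category": "class diagram", "predicted": "class diagram"},
    {"image": "b.png", "category": "sequence diagram", "predicted": "sequence diagram"},
    {"image": "c.png", "category": "class diagram", "predicted": "flowchart"},
    {"image": "d.png", "category": "flowchart", "predicted": "flowchart"},
    {"image": "e.png", "category": "flowchart", "predicted": None},
]
summary = summarize(results)
print(summary)
```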

🎯 Use Cases

Documentation Generation: Automatically document existing diagrams
Documentation Validation: Verify diagrams match their descriptions
Knowledge Extraction: Extract technical details from visual documentation
Accessibility: Generate text descriptions for screen readers
Documentation Migration: Convert visual-heavy docs to text-searchable format
Quality Assurance: Ensure diagram completeness and clarity

License

This project is licensed under the GNU General Public License v3.0. See the full license text in the LICENSE file.

For a concise summary of the GPL‑3.0 terms, you can also refer to the SPDX license identifier.

🐛 Troubleshooting

Common Issues

Ollama Connection Error

# Check if Ollama is running
curl http://localhost:11434/api/tags

# Start Ollama if needed
ollama serve
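
The same check can be scripted. The /api/tags endpoint returns a JSON object with a "models" list whose entries carry a "name" field; the sketch below reads it with the standard library. The function names model_names and list_local_models are hypothetical.

```python
import json
import urllib.error
import urllib.request


def model_names(tags_response: dict) -> list:
    """Extract model names from a parsed /api/tags response."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_local_models(base_url: str = "http://localhost:11434", timeout: float = 5.0):
    """Return installed model names, or None if Ollama cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            return model_names(json.load(response))
    except (urllib.error.URLError, OSError, ValueError):
        return None


# Parsing a sample response, so the example runs without a live server.
sample_response = {"models": [{"name": "qwen2-vl:7b"}, {"name": "llava:13b"}]}
print(model_names(sample_response))
```

Calling list_local_models() returns None when the service is not running, which distinguishes a connection problem from a missing model.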

Model Not Found

# List available models
ollama list

# Pull the required model
ollama pull qwen2-vl:7b

Image Processing Errors

Ensure images are in supported formats (PNG, JPG, GIF, WebP)
Check image file sizes (default limit: 5MB)
Verify image paths are relative to the markdown file
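
These three checks can be run ahead of time on a document's images. The following is a minimal sketch: check_image is a hypothetical helper, the 5MB limit is interpreted as 5 * 1024 * 1024 bytes, and accepting the .jpeg extension alongside .jpg is an assumption.

```python
import tempfile
from pathlib import Path

# Formats listed above; ".jpeg" is an assumed alias for JPG.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
MAX_IMAGE_BYTES = 5 * 1024 * 1024  # Assumed reading of the 5MB default limit.


def check_image(markdown_file: str, image_ref: str) -> list:
    """Return a list of problems for one image reference; empty means it looks fine."""
    problems = []
    # Image paths are resolved relative to the markdown file's directory.
    image_path = Path(markdown_file).parent / image_ref
    if image_path.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {image_path.suffix or '(none)'}")
    if not image_path.is_file():
        problems.append(f"file not found: {image_path}")
    elif image_path.stat().st_size > MAX_IMAGE_BYTES:
        problems.append(f"file too large: {image_path.stat().st_size} bytes")
    return problems


# Demonstration in a temporary directory with one small placeholder file.
with tempfile.TemporaryDirectory() as tmp:
    markdown_file = str(Path(tmp) / "doc.md")
    (Path(tmp) / "ok.png").write_bytes(b"\x89PNG")
    ok = check_image(markdown_file, "ok.png")
    missing = check_image(markdown_file, "diagrams/missing.png")
    unsupported = check_image(markdown_file, "ok.bmp")

print(ok, missing, unsupported)
```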

Low Accuracy

Try a larger model (e.g., qwen3-vl:32b, the model we tested)
Increase context size with --context-size 1000
Ensure diagram images are clear and high-resolution

🤝 Contributing

Contributions are welcome! Areas for improvement:

Additional Diagram Types: Add support for more specialized diagrams
Improved Prompts: Refine categorization and description prompts
Performance Optimization: Batch processing, caching
Output Formats: Support for different output formats (HTML, PDF)
Integration: GitHub Actions, documentation pipelines

📄 License

This software is licensed under GPL 3.0. (c) Marc Alier and Juanan Pereira, LAMB project (https://lamb-project.org), Universitat Politècnica de Catalunya (www.upc.edu) and Universidad del País Vasco / Euskal Herriko Unibertsitatea (www.ehu.eus).

🙏 Acknowledgments

Built with Ollama for local LLM inference
Uses vision models like Qwen3-VL
Grial Research Group - Universidad de Salamanca

📧 Support

For issues, questions, or suggestions:

Open an issue on GitHub
Check existing issues for solutions
Consult the troubleshooting section

Note: This tool requires significant computational resources for vision model inference. Performance will vary based on your hardware capabilities and chosen model size.
