A powerful Python tool that automatically identifies, categorizes, and generates detailed technical descriptions for diagrams in markdown documentation using vision-capable Large Language Models (LLMs) through Ollama.
Because the software is intended for academic settings, it uses Ollama to run multimodal LLMs locally, such as qwen3-vl:32b (the model we tested).
The system performs well with .md files generated by PDF OCR tools such as https://github.com/granludo/deepseekocr-mlx (the one we tested).
By Marc Alier & Juanan Pereira https://lamb-project.org
This tool processes markdown files containing software engineering diagrams and:
- Automatically categorizes diagrams into 35+ types (UML, C4, ERD, flowcharts, etc.)
- Generates detailed technical descriptions tailored to each diagram type
- Uses context-aware analysis to improve categorization accuracy
- Produces annotated documentation with inline technical descriptions
- Creates comprehensive summaries of all diagrams found
- Analyzes surrounding text to predict diagram types before visual inspection
- Combines textual context with visual analysis for higher accuracy
- Tracks prediction accuracy to measure context usefulness
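The context-aware step described above can be illustrated with a small sketch. It is not the tool's actual implementation: the function names, the regular expression, and the choice to take half of the context window before the image and half after it are all assumptions made for illustration. The keyword lists mirror the `context_indicators` idea from the categories file.

```python
import re
from typing import Dict, List, Optional, Tuple

# Matches markdown image references such as ![alt](path/to/img.png "title")
IMAGE_REF = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def extract_contexts(markdown: str, context_size: int = 500) -> List[Tuple[str, str]]:
    """Return (image_path, surrounding_text) for every image reference.

    Assumption: the window is split evenly before and after the image.
    """
    half = context_size // 2
    results = []
    for match in IMAGE_REF.finditer(markdown):
        before = markdown[max(0, match.start() - half):match.start()]
        after = markdown[match.end():match.end() + half]
        results.append((match.group(2), before + " " + after))
    return results


def predict_category(context: str, indicators: Dict[str, List[str]]) -> Optional[str]:
    """Pick the category whose indicator keywords occur most often in the context.

    Returns None when no keyword matches, so the caller can fall back to
    purely visual categorization.
    """
    text = context.lower()
    scores = {
        category: sum(text.count(keyword.lower()) for keyword in keywords)
        for category, keywords in indicators.items()
    }
    best = max(scores, key=scores.get, default=None)
    return best if best is not None and scores[best] > 0 else None
```

A prediction made this way can later be compared with the category chosen after visual inspection, which is one plausible way to obtain the prediction accuracy figure mentioned above.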
Supports 35+ diagram types including:
- UML Diagrams: Class, Sequence, Use Case, State, Activity, Component, etc.
- Architecture: C4 Model, System Architecture, Cloud Architecture, Microservices
- Data Modeling: ERD, Database Schema, Data Flow Diagrams
- Process: Flowcharts, BPMN, Gantt Charts
- Technical: Network Diagrams, Git Workflows, API Specifications
- Design: UI Mockups, Wireframes
- Analysis: Decision Trees, Fault Trees, Mind Maps
- Custom prompts for each diagram type focusing on relevant details
- Structured analysis based on diagram-specific elements
- Technical accuracy in terminology and notation identification
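As a rough illustration of how a type-specific prompt and a diagram image might be sent to a local model, the sketch below targets Ollama's `/api/generate` endpoint, which accepts base64-encoded images in an `images` list. The function names are hypothetical, and the sketch uses the standard library's `urllib` to stay self-contained, whereas the tool itself depends on `requests`.

```python
import base64
import json
import urllib.request
from typing import Any, Dict

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint


def build_payload(model: str, prompt: str, image_bytes: bytes) -> Dict[str, Any]:
    """Build a non-streaming generate request carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(model: str, prompt: str, image_path: str, timeout: float = 300.0) -> str:
    """Send the image and prompt to Ollama and return the generated text."""
    with open(image_path, "rb") as handle:
        payload = build_payload(model, prompt, handle.read())
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Supplying data makes this a POST; large models may need a long timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["response"]
```

The prompt passed in would be the one configured for the detected diagram type, for example the `prompt` entry of that category in the categories file.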
- Python 3.8+
- Ollama installed and running locally
- A vision-capable model installed in Ollama (e.g., qwen2-vl:7b, llava, bakllava)
- uv for dependency management (recommended)
# macOS/Linux
curl -fsSL https://ollama.ai/install.sh | sh
# Start Ollama service
ollama serve
# Recommended: Qwen2-VL (good balance of quality and speed)
ollama pull qwen2-vl:7b
# Alternative options:
# ollama pull llava:13b
# ollama pull bakllava
# Using uv (recommended)
uv add requests pillow rich
# Or using pip
pip install requests pillow rich
uv run annotate_images_enhanced.py \
--input docs/architecture.md \
--output docs/architecture_annotated.md \
--summary docs/diagram_summary.md \
--categories image_categories_enhanced.json \
--model qwen3-vl:8b
uv run annotate_images_enhanced.py \
--input docs/architecture.md \
--output docs/architecture_annotated.md \
--summary docs/diagram_summary.md \
--categories image_categories_enhanced.json \
--model qwen3-vl:32b \
--context-size 750 \
--verbose
# --context-size sets the amount of surrounding text to analyze; --verbose shows detailed progress
| Argument | Description | Required | Default |
|---|---|---|---|
| --input | Path to source markdown file | Yes | - |
| --output | Path for annotated markdown output | Yes | - |
| --summary | Path for diagram summary output | Yes | - |
| --categories | JSON file with diagram categories | Yes | - |
| --model | Ollama vision model to use | No | qwen2-vl:7b |
| --context-size | Characters of context to analyze | No | 500 |
| --verbose | Show detailed progress | No | False |
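For readers who want to script around the tool, the argument table could map onto an `argparse` parser roughly as follows. This is a sketch built only from the table, not the script's actual code; `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        description="Annotate diagrams in a markdown file."
    )
    parser.add_argument("--input", required=True,
                        help="Path to source markdown file")
    parser.add_argument("--output", required=True,
                        help="Path for annotated markdown output")
    parser.add_argument("--summary", required=True,
                        help="Path for diagram summary output")
    parser.add_argument("--categories", required=True,
                        help="JSON file with diagram categories")
    parser.add_argument("--model", default="qwen2-vl:7b",
                        help="Ollama vision model to use")
    parser.add_argument("--context-size", type=int, default=500,
                        help="Characters of context to analyze")
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed progress")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse exposes --context-size as args.context_size
    print(args.input, args.model, args.context_size, args.verbose)
```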
diagram-annotator/
├── annotate_images_enhanced.py # Main script
├── image_categories_enhanced.json # Diagram categories & prompts
├── README.md # This file
├── examples/ # Example documents
│   ├── input/ # Sample markdown files
│   └── output/ # Generated outputs
└── tests/ # Test documents
Edit image_categories_enhanced.json to:
- Add new diagram types
- Modify categorization prompts
- Adjust context indicators
- Customize description generation prompts
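After editing the file, a quick consistency check can catch mismatched names before a long run. The sketch below assumes the three top-level keys used in this README's example structure (`categories`, `category_prompts`, `context_indicators`). It also assumes that every category should have both a prompt and indicators, which the tool itself may not require. `check_categories` is a hypothetical helper, not part of the tool.

```python
import json
from typing import Any, Dict, List


def check_categories(config: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems: List[str] = []
    categories = config.get("categories", [])
    prompts = config.get("category_prompts", {})
    indicators = config.get("context_indicators", {})

    if not categories:
        problems.append("no categories defined")
    for name in categories:
        if "prompt" not in prompts.get(name, {}):
            problems.append(f"{name!r} has no description prompt")
        if not indicators.get(name):
            problems.append(f"{name!r} has no context indicators")
    # Entries configured for a name that is not a listed category are
    # usually typos, so report each such name once.
    for name in sorted(set(prompts) | set(indicators)):
        if name not in categories:
            problems.append(f"{name!r} is configured but not listed in categories")
    return problems


def check_file(path: str) -> List[str]:
    """Load a categories JSON file and run the checks on it."""
    with open(path, encoding="utf-8") as handle:
        return check_categories(json.load(handle))
```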
Example structure:
{
  "categories": ["class diagram", "sequence diagram", ...],
  "category_prompts": {
    "class diagram": {
      "prompt": "Describe this UML Class Diagram...",
      "focus_areas": ["classes", "methods", ...],
      "keywords": ["class", "inheritance", ...]
    }
  },
  "context_indicators": {
    "class diagram": ["UML", "inheritance", "class", ...]
  }
}
Different models offer different trade-offs:
| Model | Quality | Speed | Memory | Best For |
|---|---|---|---|---|
| qwen2-vl:7b | Good | Fast | 8GB | General use |
| qwen2-vl:72b | Excellent | Slow | 40GB+ | High accuracy |
| llava:13b | Good | Medium | 16GB | Balanced |
| bakllava | Fair | Fast | 8GB | Quick processing |
The tool inserts technical descriptions after each diagram:

**Diagram Type:** Architecture Diagram
**Technical Description:**
This architecture diagram shows a microservices-based system with:
1. API Gateway serving as the entry point
2. Three microservices: User Service, Order Service, Payment Service
3. PostgreSQL database for User Service
4. MongoDB for Order Service
5. Redis cache layer
6. RabbitMQ message broker for inter-service communication
7. All services deployed in Docker containers
...
The tool also generates a comprehensive summary with:
- Total diagram count
- Category distribution statistics
- Context prediction accuracy
- Detailed entry for each diagram with description
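The statistics in such a summary can be derived from per-diagram results along these lines. The `DiagramResult` record and the `summarize` function are hypothetical names for illustration, and the accuracy definition (share of context predictions that matched the final category, ignoring diagrams with no prediction) is an assumption about how the tool measures it.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class DiagramResult:
    image_path: str
    predicted: Optional[str]  # category guessed from surrounding text, if any
    final: str                # category chosen after visual analysis
    description: str


def summarize(results: List[DiagramResult]) -> Dict[str, Any]:
    """Compute total count, category distribution, and context prediction accuracy."""
    distribution = Counter(result.final for result in results)
    with_prediction = [r for r in results if r.predicted is not None]
    correct = sum(1 for r in with_prediction if r.predicted == r.final)
    # Accuracy is undefined when no context prediction was made at all.
    accuracy = correct / len(with_prediction) if with_prediction else None
    return {
        "total": len(results),
        "distribution": dict(distribution),
        "context_accuracy": accuracy,
    }
```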
- Documentation Generation: Automatically document existing diagrams
- Documentation Validation: Verify diagrams match their descriptions
- Knowledge Extraction: Extract technical details from visual documentation
- Accessibility: Generate text descriptions for screen readers
- Documentation Migration: Convert visual-heavy docs to text-searchable format
- Quality Assurance: Ensure diagram completeness and clarity
This project is licensed under the GNU General Public License v3.0. See the full license text in the LICENSE file.
For a concise summary of the GPL-3.0 terms, you can also refer to the SPDX license identifier.
Ollama Connection Error
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Start Ollama if needed
ollama serve
Model Not Found
# List available models
ollama list
# Pull the required model
ollama pull qwen2-vl:7b
Image Processing Errors
- Ensure images are in supported formats (PNG, JPG, GIF, WebP)
- Check image file sizes (default limit: 5MB)
- Verify image paths are relative to the markdown file
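The three checks above can be run ahead of time with a short script. This is a sketch under stated assumptions: the suffix list is inferred from the formats named above, the 5MB threshold is taken from the stated default, and `check_image` is a hypothetical helper rather than a function of the tool.

```python
from pathlib import Path
from typing import List

SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}
MAX_BYTES = 5 * 1024 * 1024  # 5MB, the default limit quoted above


def check_image(markdown_file: str, image_ref: str) -> List[str]:
    """Return a list of problems for one image reference; empty means it looks fine."""
    # Image paths are resolved relative to the markdown file's directory.
    image_path = Path(markdown_file).parent / image_ref
    problems: List[str] = []
    if image_path.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported format: {image_path.suffix or '(none)'}")
    if not image_path.is_file():
        problems.append(f"not found relative to markdown file: {image_ref}")
    elif image_path.stat().st_size > MAX_BYTES:
        problems.append(f"larger than {MAX_BYTES} bytes")
    return problems
```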
Low Accuracy
- Try a larger model (e.g., qwen3-vl:72b)
- Increase context size with --context-size 1000
- Ensure diagram images are clear and high-resolution
Contributions are welcome! Areas for improvement:
- Additional Diagram Types: Add support for more specialized diagrams
- Improved Prompts: Refine categorization and description prompts
- Performance Optimization: Batch processing, caching
- Output Formats: Support for different output formats (HTML, PDF)
- Integration: GitHub Actions, documentation pipelines
This software is licensed under GPL 3.0. (c) Marc Alier and Juanan Pereira, LAMB project (https://lamb-project.org), Universitat Politècnica de Catalunya (www.upc.edu), Universidad del País Vasco / Euskal Herriko Unibertsitatea (www.ehu.eus).
- Built with Ollama for local LLM inference
- Uses vision models like Qwen3-VL
- Grial Research Group - Universidad de Salamanca
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check existing issues for solutions
- Consult the troubleshooting section
Note: This tool requires significant computational resources for vision model inference. Performance will vary based on your hardware capabilities and chosen model size.