Closed
82 commits
f870d73
Update LLM cache [skip ci]
actions-user Sep 29, 2025
a5f7dc3
Update non-browsable doc map [skip ci]
actions-user Sep 29, 2025
57d943d
Conversion of HRL Croplands PUM + media
Stina-Gremme Sep 30, 2025
af06cfa
Update LLM cache [skip ci]
actions-user Sep 30, 2025
da0623c
Media url fix in HRL_Croplands doc
mckeea Sep 30, 2025
b6bab21
Update LLM cache [skip ci]
actions-user Sep 30, 2025
0e5d969
Added category to yaml header
Stina-Gremme Sep 30, 2025
58dbaba
Update LLM cache [skip ci]
actions-user Sep 30, 2025
01f9def
Moved HRL Cropland in its own project folder
Stina-Gremme Oct 1, 2025
03d0a4d
Update LLM cache [skip ci]
actions-user Oct 1, 2025
c51e776
converted CLC+ Backbone (in respective project folder)
Stina-Gremme Oct 1, 2025
1cfa522
Update LLM cache [skip ci]
actions-user Oct 1, 2025
5bb9fc0
moved HRL Grasslands to specific project folder + renamed existing pr…
Stina-Gremme Oct 1, 2025
ace96d6
Update LLM cache [skip ci]
actions-user Oct 1, 2025
37a1695
moved the converted documents into their respective project folders
Stina-Gremme Oct 1, 2025
bcb2aea
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
Stina-Gremme Oct 1, 2025
1bfa222
changed date of document from 2025 to 2024
Stina-Gremme Oct 2, 2025
3e761ba
Update LLM cache [skip ci]
actions-user Oct 2, 2025
9bfd4ce
Update non-browsable doc map [skip ci]
actions-user Oct 2, 2025
f50aa61
moved VLCC ATBD to respective product category (HRL Cropland)
Stina-Gremme Oct 2, 2025
f29b67b
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
Stina-Gremme Oct 2, 2025
bcf73f0
Update LLM cache [skip ci]
actions-user Oct 2, 2025
3f1a9ff
specified file name of CLC+BB User Manual 2021
Stina-Gremme Oct 6, 2025
503ed01
Update LLM cache [skip ci]
actions-user Oct 6, 2025
58c0e19
embedded media correctly
Stina-Gremme Oct 6, 2025
9193da7
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
Stina-Gremme Oct 6, 2025
f1aeda9
Update LLM cache [skip ci]
actions-user Oct 6, 2025
27c63bb
DOCS dir reorganization
mckeea Oct 8, 2025
66e41a4
cleanup
mckeea Oct 8, 2025
c6156d2
Update LLM cache [skip ci]
actions-user Oct 8, 2025
48ce4dc
config update
mckeea Oct 8, 2025
0de3437
Config update
mckeea Oct 8, 2025
c435120
corrected media embedding
Stina-Gremme Oct 9, 2025
9c92335
Update LLM cache [skip ci]
actions-user Oct 9, 2025
917aba5
corrected media embedding
Stina-Gremme Oct 9, 2025
0fd4a88
Update LLM cache [skip ci]
actions-user Oct 9, 2025
8b8ca4e
retrieved lost pictures in Riparian Zones Nomenclature Guideline
Stina-Gremme Oct 10, 2025
a21b791
test commit
mckeea Oct 13, 2025
c6a918c
corrected file name Coastal Zones version 1
Stina-Gremme Oct 21, 2025
982c329
Update LLM cache [skip ci]
actions-user Oct 21, 2025
0196d2d
embedded the media following the new named media path
Stina-Gremme Oct 21, 2025
e587217
Update LLM cache [skip ci]
actions-user Oct 21, 2025
401f696
added the major version digit to all document file names that were al…
Stina-Gremme Oct 27, 2025
69c66bb
Removed the duplicated contact line
Stina-Gremme Oct 29, 2025
f428f49
Update LLM cache [skip ci]
actions-user Oct 29, 2025
76003e2
Update non-browsable doc map [skip ci]
actions-user Oct 29, 2025
9c41b47
converted CLC+BB 2023 & fixed title of CLC+BB 2021
Stina-Gremme Oct 29, 2025
72cc099
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
Stina-Gremme Oct 29, 2025
45be6e0
Update LLM cache [skip ci]
actions-user Oct 29, 2025
5c5e22d
Updated category "products"
Stina-Gremme Oct 29, 2025
bf2df50
Update LLM cache [skip ci]
actions-user Oct 30, 2025
4bcee90
correcting document name "+" to "plus"
Stina-Gremme Oct 30, 2025
5ad928a
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
Stina-Gremme Oct 30, 2025
e230574
Update LLM cache [skip ci]
actions-user Oct 30, 2025
ba7b297
Cleaned up folder structure + file naming following the newest decisi…
Stina-Gremme Nov 10, 2025
b65d6e4
Update LLM cache [skip ci]
actions-user Nov 10, 2025
604b1b9
uploaded original PDF files to convert
Stina-Gremme Nov 19, 2025
16b1443
Update LLM cache [skip ci]
actions-user Nov 19, 2025
06890ec
Renamed folder name
Stina-Gremme Nov 20, 2025
9e578b3
converted pdf to qmd for QC
Stina-Gremme Nov 21, 2025
836cae3
Update LLM cache [skip ci]
actions-user Nov 21, 2025
dd5fbd3
updated python script to include category in Yaml header + converted …
Stina-Gremme Nov 21, 2025
265e9e7
Quality checked CLCplus BB 2023 ATBD
Stina-Gremme Nov 21, 2025
3531970
moved original pdf file in converted folder
Stina-Gremme Nov 21, 2025
25de9cb
Update LLM cache [skip ci]
actions-user Nov 21, 2025
43209ac
Cleaned up document names + file names
Stina-Gremme Nov 21, 2025
6846770
deleted Grassland PUM, to prepare for conversion of new version
Stina-Gremme Nov 21, 2025
9c22933
Update LLM cache [skip ci]
actions-user Nov 21, 2025
d9bd406
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
mckeea Nov 24, 2025
6f75868
Few files renamed
mckeea Nov 30, 2025
be08b78
Update LLM cache [skip ci]
actions-user Nov 30, 2025
6056d8e
New and upgraded logic for version change detection and changelogs ge…
mckeea Dec 1, 2025
b42c9d0
Merge branch 'develop' of https://github.com/eea/CLMS_documents into …
mckeea Dec 1, 2025
38375e2
Merge branch 'test' into develop
mckeea Dec 1, 2025
f9b040d
Update non-browsable doc map [skip ci]
actions-user Dec 1, 2025
eb05d11
Better naming for PR message/title & squash merge for PR added
mckeea Dec 1, 2025
3f815e7
converted pdf to qmd for quality check
Stina-Gremme Dec 3, 2025
042f5a0
Worked up until some point
Dec 3, 2025
a6108bc
improvement: smart batching for keywords generation
mckeea Dec 4, 2025
64dbe2e
fix: .gitattributes [skip ci]
mckeea Dec 4, 2025
87fd073
Merge branch 'test' into develop
mckeea Dec 4, 2025
1d1a7d0
Update LLM cache [skip ci]
actions-user Dec 4, 2025
486 changes: 415 additions & 71 deletions .github/scripts/generate_intros_and_keywords.py

Large diffs are not rendered by default.
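The diff for this script is not rendered on this page, so the way it fills the prompt templates cannot be confirmed here. The sketch below is an illustration only. It assumes the `{{NUM_FILES}}` and `{{FILE_LIST}}` placeholders are filled by plain string substitution and that files are grouped into fixed-size batches. `make_batches`, `render_prompt`, `batch_size` and the numbered file-list layout are hypothetical names and choices, not the script's actual API:

```python
from typing import Iterator, List, Sequence


def make_batches(files: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Split the file list into consecutive batches of at most batch_size items (hypothetical)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        yield list(files[start:start + batch_size])


def render_prompt(template: str, files: Sequence[str]) -> str:
    """Fill the {{NUM_FILES}} and {{FILE_LIST}} placeholders (hypothetical layout)."""
    # Number the files so the prompt's "file #1 ... file #N" checklist lines up.
    file_list = "\n".join(f"{i}. {name}" for i, name in enumerate(files, start=1))
    # str.replace substitutes every occurrence of each placeholder.
    return (
        template.replace("{{NUM_FILES}}", str(len(files)))
        .replace("{{FILE_LIST}}", file_list)
    )


if __name__ == "__main__":
    template = "Process exactly {{NUM_FILES}} file(s).\nFiles:\n{{FILE_LIST}}"
    docs = ["a_v1.qmd", "b_v1.qmd", "c_v2.qmd"]
    for batch in make_batches(docs, batch_size=2):
        print(render_prompt(template, batch))
        print("---")
```

`str.replace` substitutes every occurrence, and that matters here. The changelog template repeats both `{{FILE_LIST}}` and `{{NUM_FILES}}` in its verification checklist.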

@@ -0,0 +1,48 @@
You are an AI assistant enriching technical documents for the Copernicus Land Monitoring Service (CLMS).

**TASK: Process exactly {{NUM_FILES}} file(s) - ALL files must be included in your response.**

Files to process:
{{FILE_LIST}}

**INSTRUCTIONS FOR EACH FILE:**

1. **Read the entire document** (ignore YAML frontmatter, focus on content)
2. **Write an Introduction** (60-100 words):
- Professional, engaging, single paragraph
- Clearly explain: document purpose, scope, and technical focus
- Use British English spelling
3. **Extract exactly 10 keywords**:
- Focus on SPECIFIC concepts: methods, indicators, systems, algorithms, data types
- Use multi-word phrases for precision (e.g., "Synthetic Aperture Radar", "land cover classification")
- AVOID generic terms: "documentation", "metadata", "nomenclature", "report", "Urban Atlas"
- Think like a scientific indexer for semantic search

**OUTPUT FORMAT (strict JSON):**

{
"filename1.qmd": {
"introduction": "Single paragraph introduction here...",
"keywords": ["keyword1", "keyword2", "keyword3", "keyword4", "keyword5", "keyword6", "keyword7", "keyword8", "keyword9", "keyword10"]
},
"filename2.qmd": {
"introduction": "Single paragraph introduction here...",
"keywords": ["keyword1", "keyword2", "keyword3", "keyword4", "keyword5", "keyword6", "keyword7", "keyword8", "keyword9", "keyword10"]
}
}

**CRITICAL REQUIREMENTS:**
✓ Include ALL {{NUM_FILES}} files in response
✓ Each file needs exactly 10 keywords
✓ Return ONLY valid JSON (no markdown, no explanations, no code fences)
✓ Use exact filenames as keys
✓ Process files in order listed above

**VERIFICATION CHECKLIST BEFORE RESPONDING:**
□ Did I process file #1?
□ Did I process file #{{NUM_FILES}}?
□ Do ALL files have introductions?
□ Do ALL files have exactly 10 keywords?
□ Is my JSON valid (no trailing commas)?

Begin your response with { and end with }
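The output rules in the template above can be checked mechanically before a response is trusted. This is a minimal sketch, assuming the response arrives as a raw string and the caller knows the expected file names. `validate_keyword_response` is a hypothetical helper. The word count uses simple whitespace splitting, which only approximates the 60-100 word rule:

```python
import json
from typing import List, Sequence


def validate_keyword_response(raw: str, expected_files: Sequence[str]) -> List[str]:
    """Return a list of problems in an intro/keyword response (empty list = OK).

    Checks the rules stated in the prompt: valid JSON, one key per listed file,
    a 60-100 word introduction and exactly 10 keywords per file.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems: List[str] = []
    for name in expected_files:
        if name not in data:
            problems.append(f"missing file: {name}")
            continue
        entry = data[name]
        if not isinstance(entry, dict):
            problems.append(f"{name}: entry is not an object")
            continue

        intro = entry.get("introduction")
        if not isinstance(intro, str):
            problems.append(f"{name}: introduction is missing")
        else:
            # Whitespace splitting is only an approximation of a word count.
            words = len(intro.split())
            if not 60 <= words <= 100:
                problems.append(f"{name}: introduction has {words} words (expected 60-100)")

        keywords = entry.get("keywords")
        if not isinstance(keywords, list) or len(keywords) != 10:
            problems.append(f"{name}: expected exactly 10 keywords")
        elif not all(isinstance(k, str) and k.strip() for k in keywords):
            problems.append(f"{name}: keywords must be non-empty strings")

    # Keys the model invented (e.g. a misspelt filename) are also errors.
    problems += [f"unexpected key: {key}" for key in data if key not in expected_files]
    return problems
```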
198 changes: 198 additions & 0 deletions .github/scripts/prompt_templates/prompt_template.txt
@@ -0,0 +1,198 @@
You are analyzing git diffs of technical documentation files.

**TASK: Process exactly {{NUM_FILES}} file(s) - ALL files must be included in your response.**

Files to analyze:
{{FILE_LIST}}

**For EACH file, provide:**
1. Semantic version bump decision (MINOR or PATCH)
2. Changelog summary of changes

═══════════════════════════════════════════════════════════════════════
PART 1: VERSION BUMP ANALYSIS
═══════════════════════════════════════════════════════════════════════

**Understanding Git Diffs:**
- Lines with `+` = additions
- Lines with `-` = deletions
- Lines with `-` then `+` = modifications
- `@@` lines = line numbers (ignore)
- Context lines (no prefix) = unchanged content
- Focus on content changes, not diff syntax

**Version Bump Rules:**

**MINOR (Y++) when:**
- New chapters/sections/subsections (## or ### headers)
- New features, APIs, or functionality documented
- Significant content additions (>20% new content or >50 lines)
- Major restructuring or reorganization
- New diagrams, tables, or substantial examples
- Breaking changes in documented procedures
- New requirements or specifications
- Substantial updates to existing sections

**PATCH (Z++) when:**
- Typo/grammar fixes
- Formatting improvements (spacing, styling, markdown)
- Clarifications without new information
- Minor wording improvements
- Link updates or corrections
- Small additions (<20 lines, <5 sentences)
- Metadata-only updates (YAML frontmatter)
- Fixing broken references or cross-links
- Minor corrections to existing content

**Decision Guidelines:**
- Mixed changes: MINOR if ANY significant change
- Only YAML/metadata: PATCH
- Only formatting/whitespace: PATCH
- If uncertain: lean towards MINOR
- Empty/no changes: PATCH

═══════════════════════════════════════════════════════════════════════
PART 2: CHANGELOG SUMMARY GENERATION
═══════════════════════════════════════════════════════════════════════

**Format Selection Rules:**

Use **PARAGRAPH format** when:
- 1-3 related changes that flow naturally together
- Changes are to a single section or closely related topics
- Simple modifications or updates
- Format: Write 1-3 clear sentences in prose

Use **BULLET POINT format** when:
- 4 or more distinct changes
- Changes span multiple unrelated sections or topics
- Mix of additions, modifications, and removals
- Each change is independent and actionable
- Format: Use HTML unordered list format: `<ul><li>Change 1</li><li>Change 2</li></ul>`, maximum 10 items

**Changelog Writing Guidelines:**
- **Be concise but informative**: Changelogs appear in document version tables - balance brevity with clarity
- **Length limits**:
- Paragraph format: Maximum 3 sentences (60-100 words)
- Bullet format: Maximum 10 items, each item 8-20 words
- **Be specific**: Instead of "updated information", write "updated processing algorithm from version 2.1 to 2.3"
- **Use concrete details**: Include version numbers, dates, parameter names, section titles, dataset names
- **Quantify when relevant**: "added 5 new validation metrics" not "added validation information"
- **Prioritize**: Lead with the most significant changes (new features > modifications > removals)
- **User-focused**: Explain the practical impact, not just the technical change
- **Active voice**: "Added new quality flags" not "New quality flags were added"
- **British English** spelling and terminology

**Changelog Examples:**

*Paragraph format (1-3 related changes) - CONCISE:*
"Updated processing algorithm from version 2.1 to 2.3, improving cloud detection accuracy by 15%. Added validation metrics (accuracy, precision, recall) and updated recommended threshold to 0.8."

*Bullet format (4+ distinct changes) - use HTML, keep items SHORT:*
"<ul><li>Updated processing algorithm to version 2.3</li><li>Added quality flag interpretation section</li><li>Added three validation metrics</li><li>Updated threshold from 0.7 to 0.8</li><li>Replaced deprecated v1 endpoints with v2</li></ul>"

*BAD - Too verbose:*
"<ul><li>Updated processing algorithm from version 2.1 to 2.3 which resulted in a 15% improvement in cloud detection accuracy according to validation tests</li><li>Added a completely new section on quality flag interpretation that includes detailed examples and explanations for each flag type</li></ul>"

*Good specificity examples (but keep BRIEF):*
✓ "Corrected Sentinel-2 spatial resolution from 20m to 10m for visible bands"
✗ "Fixed technical details about resolution"

✓ "Added troubleshooting section covering authentication, timeout, and format errors"
✗ "Added troubleshooting information"

*Remember: BREVITY is critical - these appear in version history tables*

**For minor changes only:**
- Use "Minor formatting and metadata updates" if ONLY markdown formatting, spacing, or metadata changed
- Use "Document maintenance updates" if changes are purely technical (fixing typos, broken links, formatting consistency) with no content impact

**SECURITY: HTML restrictions (CRITICAL):**
- ONLY use `<ul>` and `<li>` tags with their closing `</ul>` and `</li>` tags; NO other HTML tags allowed
- NO JavaScript, CSS, style attributes, or event handlers
- NO external links, images, or embedded content
- NO script tags, iframe, object, embed, or similar elements
- NO HTML attributes except standard list structure
- Keep content as plain text within `<li>` tags

═══════════════════════════════════════════════════════════════════════
OUTPUT FORMAT REQUIREMENTS
═══════════════════════════════════════════════════════════════════════

**CRITICAL REQUIREMENT:**
You MUST return a result for EVERY file in the batch, even if you cannot analyze it.
If a file's diff is unreadable, empty, or causes analysis issues, use the following object as that file's value:
```json
{
"version": {
"bump": "error",
"reason": "Cannot analyze: [specific reason]",
"confidence": "none"
},
"changelog": {
"format": "error",
"summary": "Unable to generate changelog due to analysis error"
}
}
```

**Required JSON Structure:**
Return a JSON object with ALL file paths as keys (one entry per file in the batch):

```json
{
"DOCS/path/to/file1_v1.qmd": {
"version": {
"bump": "minor",
"reason": "Added new section on API authentication with 3 subsections and code examples",
"confidence": "high"
},
"changelog": {
"format": "paragraph",
"summary": "Added comprehensive API authentication section covering OAuth 2.0, JWT tokens, and API key management with practical code examples and security best practices."
}
},
"DOCS/path/to/file2_v2.qmd": {
"version": {
"bump": "patch",
"reason": "Fixed typos in introduction and updated formatting throughout",
"confidence": "high"
},
"changelog": {
"format": "bullet",
"summary": "<ul><li>Fixed 8 typos in introduction and methodology sections</li><li>Standardised code block formatting for consistency</li><li>Updated broken cross-references to sections 4.2 and 5.1</li><li>Corrected unit notation from 'meters' to 'm' throughout document</li></ul>"
}
}
}
```

**Field Definitions:**
- `version.bump`: "minor", "patch", or "error"
- `version.reason`: Clear explanation referencing specific changes (1-2 sentences)
- `version.confidence`: "high", "medium", "low", or "none"
- `changelog.format`: "paragraph", "bullet", or "error"
- `changelog.summary`: The actual changelog text (plain text for paragraph, HTML list for bullet)

**CRITICAL REQUIREMENTS:**
✓ Include ALL {{NUM_FILES}} files in response
✓ Each file needs both version AND changelog
✓ Version bumps must match changelog significance
✓ HTML only: `<ul>`, `<li>` tags (properly closed)
✓ Changelogs: specific and concise
✓ Use British English
✓ Return ONLY valid JSON (no markdown, no explanations)

**VERIFICATION CHECKLIST BEFORE RESPONDING:**
□ Did I process file #1 in the list?
□ Did I process file #{{NUM_FILES}} in the list?
□ Do ALL files have version.bump, version.reason, version.confidence?
□ Do ALL files have changelog.format and changelog.summary?
□ Are MINOR bumps justified by significant changes in changelog?
□ Are PATCH bumps justified by minor changes in changelog?
□ Is my JSON valid (no trailing commas, all braces closed)?
□ Did I count: do I have exactly {{NUM_FILES}} keys?

**Files to include in response (verify each one):**
{{FILE_LIST}}

Begin your response with { and end with }
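The changelog response can be validated in the same way against the field definitions and HTML restrictions above. This check would run before anything is written into a document's version table. The sketch is hypothetical: `validate_version_response`, `html_is_allowed` and `bump_version` are illustrative names. It assumes versions are plain `X.Y.Z` strings. It also follows the Semantic Versioning convention that a MINOR bump resets the patch number to 0:

```python
import json
import re
from typing import Any, Dict, List, Sequence

# Bare opening/closing <ul>/<li> tags only: no attributes, no other elements.
ALLOWED_TAG = re.compile(r"<(/?)(ul|li)>")
ANY_TAG = re.compile(r"<[^>]*>")

VALID_BUMPS = ("minor", "patch", "error")
VALID_CONFIDENCE = ("high", "medium", "low", "none")
VALID_FORMATS = ("paragraph", "bullet", "error")


def html_is_allowed(summary: str) -> bool:
    """True if every tag is a bare <ul>/<li> (or closing) tag and all are properly nested."""
    stack: List[str] = []
    for tag in ANY_TAG.findall(summary):
        match = ALLOWED_TAG.fullmatch(tag)
        if match is None:
            return False  # attributes, scripts or any other element
        closing, name = match.groups()
        if closing:
            if not stack or stack.pop() != name:
                return False  # closing tag without a matching opening tag
        else:
            stack.append(name)
    return not stack  # every opened tag must be closed


def bump_version(version: str, bump: str) -> str:
    """Apply a MINOR (Y++, Z reset to 0) or PATCH (Z++) bump to an X.Y.Z string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"cannot apply bump {bump!r}")


def _as_dict(value: Any) -> Dict[str, Any]:
    return value if isinstance(value, dict) else {}


def validate_version_response(raw: str, expected_files: Sequence[str]) -> List[str]:
    """Return a list of problems in a version/changelog response (empty list = OK)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems: List[str] = []
    for name in expected_files:
        entry = data.get(name)
        if not isinstance(entry, dict):
            problems.append(f"missing or malformed entry: {name}")
            continue
        version = _as_dict(entry.get("version"))
        changelog = _as_dict(entry.get("changelog"))
        if version.get("bump") not in VALID_BUMPS:
            problems.append(f"{name}: invalid version.bump")
        if not version.get("reason"):
            problems.append(f"{name}: empty version.reason")
        if version.get("confidence") not in VALID_CONFIDENCE:
            problems.append(f"{name}: invalid version.confidence")
        fmt = changelog.get("format")
        summary = changelog.get("summary")
        if fmt not in VALID_FORMATS:
            problems.append(f"{name}: invalid changelog.format")
        if not isinstance(summary, str) or not summary.strip():
            problems.append(f"{name}: empty changelog.summary")
        elif not html_is_allowed(summary):
            problems.append(f"{name}: disallowed or unbalanced HTML in summary")
        elif fmt == "paragraph" and ANY_TAG.search(summary):
            problems.append(f"{name}: paragraph summary should be plain text")
        elif fmt == "bullet" and summary.count("<li>") > 10:
            problems.append(f"{name}: more than 10 bullet items")
    problems += [f"unexpected key: {key}" for key in data if key not in expected_files]
    return problems
```

The tag check is deliberately strict. Any `<...>` sequence other than a bare `<ul>`, `</ul>`, `<li>` or `</li>` is rejected. As a result, plain text such as `< 0.8 and >` inside a summary would also be flagged. An HTML parser or an allowlist-based sanitiser would be a more robust choice where that matters.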


Binary file not shown.