Copilot AI commented Dec 21, 2025

Relative image paths in markdown (e.g., ./images/logo.png, ../assets/icon.png) break when the HTML is rendered in a temp directory. This is common with GitHub repository markdown files that use both markdown image syntax and HTML inline images.

Changes

  • Token processing: Walk parsed markdown tokens to find Image tokens with relative paths
  • HTML image processing: Process HTMLInline and HTMLBlock tokens to handle <img src="..."> tags with relative paths
  • Base64 encoding: Read image files relative to the markdown file's directory and encode as data URIs
  • Path detection: Skip http://, https://, //, data:, and absolute paths
  • Format support: PNG, JPEG, GIF, SVG, WebP, BMP, ICO
  • Performance: Package-level regex compilation to avoid recompilation overhead
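The path detection rule listed above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `isRelativeImagePath` is hypothetical, and the prefix list is taken directly from the bullet.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// isRelativeImagePath is a hypothetical helper (not taken from the PR).
// It reports whether src looks like a relative path that should be embedded,
// skipping http://, https://, protocol-relative //, data: URIs, and absolute paths.
func isRelativeImagePath(src string) bool {
	if src == "" {
		return false
	}
	lower := strings.ToLower(src)
	for _, prefix := range []string{"http://", "https://", "//", "data:"} {
		if strings.HasPrefix(lower, prefix) {
			return false
		}
	}
	// Absolute filesystem paths are left untouched as well.
	if strings.HasPrefix(src, "/") || filepath.IsAbs(src) {
		return false
	}
	return true
}

func main() {
	for _, src := range []string{
		"./images/logo.png",
		"../assets/icon.png",
		"https://example.com/image.png",
		"data:image/png;base64,AAAA",
		"/usr/share/icons/app.png",
	} {
		fmt.Printf("%-32s relative=%v\n", src, isRelativeImagePath(src))
	}
}
```

Checking prefixes before touching the filesystem keeps external URLs and already-embedded images on a fast path that never reads a file.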

Security

  • 10MB file size limit to prevent memory exhaustion
  • Path validation with max 3 parent directory levels to prevent traversal attacks
  • Proper path component counting (not substring matching)
  • Quote matching validation for HTML tags
  • Graceful degradation preserves original paths on errors
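To illustrate the difference between counting path components and substring matching, here is a minimal sketch. The names `countParentLevels`, `isAllowedPath`, and `maxParentLevels` are assumptions made for this example; the PR's own validation may be structured differently.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// maxParentLevels mirrors the "max 3 parent directory levels" rule above.
// The constant name is an assumption for this sketch.
const maxParentLevels = 3

// countParentLevels counts ".." path components after cleaning the path.
// Comparing whole components means a file name such as "my..file.png"
// is not mistaken for a traversal attempt.
func countParentLevels(relPath string) int {
	cleaned := filepath.ToSlash(filepath.Clean(relPath))
	count := 0
	for _, part := range strings.Split(cleaned, "/") {
		if part == ".." {
			count++
		}
	}
	return count
}

// isAllowedPath reports whether relPath stays within the permitted number
// of parent directory levels.
func isAllowedPath(relPath string) bool {
	return countParentLevels(relPath) <= maxParentLevels
}

func main() {
	for _, p := range []string{
		"../assets/icon.png",
		"my..file.png",
		"../../../shared/logo.png",
		"../../../../etc/passwd",
	} {
		fmt.Printf("%-28s levels=%d allowed=%v\n", p, countParentLevels(p), isAllowedPath(p))
	}
}
```

A plain `strings.Contains(path, "..")` check would reject `my..file.png`, which is why component counting is the safer choice here.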

Example

Markdown syntax:

# My Project
![Logo](assets/logo.png)
![Screenshot](docs/screenshot.png)
![External](https://example.com/image.png)

HTML inline images:

<img src="assets/logo.png" alt="Logo">
<img src="docs/screenshot.png" alt="Screenshot" width="100">
<img src="https://example.com/image.png" alt="External">

Both markdown and HTML relative paths are embedded as ... while external URLs remain unchanged.

Testing

Tested with real-world examples, including the deb-get README.md, which makes extensive use of HTML inline images.

Markdown images:

HTML inline images (deb-get README):

Original prompt

This section details the original issue you should resolve

<issue_title>in-line html and relative links are not ideal</issue_title>
<issue_description>Whilst acknowledging that the "problems" are mostly due to the source markdown, I think some enhancements might be possible. For example, if you throw some GitHub markdown from repositories at it, you might find relative links to in-line images etc. Perhaps a recursive option (with limits, maybe) to partially rebuild the tree in the output would be useful. I also noticed that in-line HTML comments (e.g. when cog is being used) are rendered as text, e.g.

<p>&lt;!– [[[cog
import subprocess
import cog</p>

whereas I think mdview would do better to catch these and leave them as in-line HTML, so that they are treated as comments.</issue_description>

<agent_instructions>The inline-HTML issue is already fixed. Focus on the relative links. If the links are to images, either download the images and store them next to the temporary HTML file or if it is possible, encode them into the resulting HTML. The second option would be better. </agent_instructions>

Comments on the Issue (you are @copilot in this section)

@mapitman I'm using a third-party library to handle the conversion. Maybe there's a better option out there?
@mapitman Do you have a link to a markdown file that exhibits this behavior? That would help me figure out what I can do about this.
@mapitman Created #26 for the in-line HTML issue.
@mapitman In-line HTML fixed by #27.

Copilot AI and others added 3 commits December 21, 2025 00:20
Co-authored-by: mapitman <4205286+mapitman@users.noreply.github.com>
Co-authored-by: mapitman <4205286+mapitman@users.noreply.github.com>
Co-authored-by: mapitman <4205286+mapitman@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Enhance handling of relative links in markdown processing" to "Convert relative image links to base64 data URIs" Dec 21, 2025
Copilot AI requested a review from mapitman December 21, 2025 00:28
@mapitman
Owner

@copilot Use this markdown file to test https://github.com/wimpysworld/deb-get/blob/main/README.md

Co-authored-by: mapitman <4205286+mapitman@users.noreply.github.com>
Contributor Author

Copilot AI commented Dec 21, 2025

@copilot Use this markdown file to test https://github.com/wimpysworld/deb-get/blob/main/README.md

Tested with the deb-get README.md. Added support for HTML inline images (<img src="...">) in addition to markdown image syntax. All relative images are now converted to data URIs and display correctly.

Commit: 5d19cae

deb-get rendered output

Co-authored-by: mapitman <4205286+mapitman@users.noreply.github.com>
Copilot AI changed the title from "Convert relative image links to base64 data URIs" to "Convert relative image links to base64 data URIs (markdown and HTML)" Dec 21, 2025
Owner

@mapitman mapitman left a comment

Looks good. Tested locally and was able to render images.

@mapitman marked this pull request as ready for review December 21, 2025 01:06
Copilot AI review requested due to automatic review settings December 21, 2025 01:06
@mapitman merged commit bc87e96 into main Dec 21, 2025
5 checks passed
@mapitman deleted the copilot/enhance-relative-links-handling branch December 21, 2025 01:07

Copilot AI left a comment

Pull request overview

This PR adds automatic conversion of relative image paths to base64-encoded data URIs in both markdown image syntax and HTML inline images. This solves the issue where relative image paths break when HTML is rendered in a temporary directory, which is common with GitHub repository markdown files.

Key Changes

  • Token processing system that walks parsed markdown tokens to find and convert relative image paths to base64 data URIs
  • HTML image processing using regex to handle <img src="..."> tags with relative paths
  • Security measures including 10MB file size limit and path traversal validation with up to 3 parent directory levels

return true
}

// imageToDataURI reads an image file and converts it to a base64 data URI

Copilot AI Dec 21, 2025

The function comment should specify what happens when image conversion fails. Consider adding that the function returns an empty string on failure, which causes the original path to be preserved (graceful degradation).

Suggested change
- // imageToDataURI reads an image file and converts it to a base64 data URI
+ // imageToDataURI reads an image file and converts it to a base64 data URI.
+ // On any failure it logs a warning and returns an empty string so callers can
+ // gracefully fall back to using the original image path.

Copilot uses AI. Check for mistakes.
return fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
}

// getMimeType returns the MIME type based on file extension

Copilot AI Dec 21, 2025

The function comment should document the return value more clearly. Consider adding that it returns the appropriate MIME type string for the image format, or "image/*" for unknown formats.

Suggested change
- // getMimeType returns the MIME type based on file extension
+ // getMimeType returns the appropriate image MIME type string based on the file
+ // extension, or "image/*" for unknown or unsupported image formats.

// Clean and validate the path to prevent path traversal attacks
cleanedPath, err := filepath.Abs(fullPath)
if err != nil {
log.Printf("Warning: Invalid image path %s: %v", fullPath, err)

Copilot AI Dec 21, 2025

The error message could be more helpful by suggesting potential causes. Consider rephrasing to something like "Warning: Invalid image path %s (failed to resolve absolute path): %v" to clarify what operation failed.

Suggested change
- log.Printf("Warning: Invalid image path %s: %v", fullPath, err)
+ log.Printf("Warning: Invalid image path %s (failed to resolve absolute path): %v", fullPath, err)

// Ensure the resolved path is within or relative to the base directory
cleanedBase, err := filepath.Abs(baseDir)
if err != nil {
log.Printf("Warning: Invalid base directory %s: %v", baseDir, err)

Copilot AI Dec 21, 2025

The error message could be more helpful by suggesting potential causes. Consider rephrasing to something like "Warning: Invalid base directory %s (failed to resolve absolute path): %v" to clarify what operation failed.

Suggested change
- log.Printf("Warning: Invalid base directory %s: %v", baseDir, err)
+ log.Printf("Warning: Invalid base directory %s (failed to resolve absolute path): %v", baseDir, err)

Comment on lines +357 to +359
// For unknown extensions, log a warning but try with generic image type
log.Printf("Warning: Unknown image extension %s for file %s, using image/* MIME type", ext, path)
return "image/*"

Copilot AI Dec 21, 2025

Using "image/*" as a fallback MIME type for unknown extensions is not standard and may not work correctly in all browsers. The asterisk is typically used in HTTP Accept headers, not in data URIs. Consider using a more specific default like "application/octet-stream" or "image/png", or simply skip conversion for unknown formats by returning an empty string.

Suggested change
- // For unknown extensions, log a warning but try with generic image type
- log.Printf("Warning: Unknown image extension %s for file %s, using image/* MIME type", ext, path)
- return "image/*"
+ // For unknown extensions, log a warning but fall back to a generic binary MIME type
+ log.Printf("Warning: Unknown image extension %s for file %s, using application/octet-stream MIME type", ext, path)
+ return "application/octet-stream"
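The comment above also mentions a third option: skipping conversion entirely for unknown formats. A minimal sketch of that alternative follows, using a lookup table keyed by extension. The names `imageMimeTypes` and `mimeTypeFor` are hypothetical and are not taken from the PR; the extension list follows the formats named in the PR description.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// imageMimeTypes maps lowercase file extensions to MIME types for the formats
// listed in the PR description. The map name is an assumption for this sketch.
var imageMimeTypes = map[string]string{
	".png":  "image/png",
	".jpg":  "image/jpeg",
	".jpeg": "image/jpeg",
	".gif":  "image/gif",
	".svg":  "image/svg+xml",
	".webp": "image/webp",
	".bmp":  "image/bmp",
	".ico":  "image/x-icon", // widely used; IANA registers image/vnd.microsoft.icon
}

// mimeTypeFor returns the MIME type for path and true, or "" and false when
// the extension is unknown, so the caller can leave the original src untouched.
func mimeTypeFor(path string) (string, bool) {
	ext := strings.ToLower(filepath.Ext(path))
	mimeType, ok := imageMimeTypes[ext]
	return mimeType, ok
}

func main() {
	for _, p := range []string{"assets/logo.PNG", "docs/diagram.svg", "notes/photo.tiff"} {
		if mimeType, ok := mimeTypeFor(p); ok {
			fmt.Printf("%s -> %s\n", p, mimeType)
		} else {
			fmt.Printf("%s -> unknown extension, keep original path\n", p)
		}
	}
}
```

Returning a boolean instead of a fallback MIME type keeps the decision with the caller and fits the graceful-degradation behavior described in the PR, where the original path is preserved on failure.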

Comment on lines +24 to +26
// imgSrcRegex matches <img> tags with src attributes
// Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["']?)([^"'\s>]+)(["']?)`)

Copilot AI Dec 21, 2025

The regex pattern for src paths uses [^"'\s>]+ which excludes spaces. This means unquoted src attributes containing spaces (e.g., <img src=my image.png>) would only capture the first part before the space ("my"), potentially causing incorrect path processing. While unquoted attributes with spaces are invalid HTML, consider adding validation to ensure src attributes are properly quoted, or update the regex to only match properly quoted or space-free unquoted values.

Suggested change
- // imgSrcRegex matches <img> tags with src attributes
- // Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
- var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["']?)([^"'\s>]+)(["']?)`)
+ // imgSrcRegex matches <img> tags with quoted src attributes
+ // Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
+ var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["'])([^"']+)(["'])`)
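The behavioral difference between the two patterns can be checked with a small program. Both regular expressions are copied from the suggestion above; the variable names `looseImgSrc` and `strictImgSrc` and the helper `srcPath` are hypothetical. Because Go's `regexp` package does not support backreferences, the sketch compares the opening and closing quote captures explicitly.

```go
package main

import (
	"fmt"
	"regexp"
)

// looseImgSrc is the original pattern: quotes are optional and the path
// stops at the first whitespace character.
var looseImgSrc = regexp.MustCompile(`(<img[^>]*\ssrc=)(["']?)([^"'\s>]+)(["']?)`)

// strictImgSrc is the suggested pattern: quotes are required on both sides.
var strictImgSrc = regexp.MustCompile(`(<img[^>]*\ssrc=)(["'])([^"']+)(["'])`)

// srcPath returns the captured src path, or "" when the tag does not match
// or the opening and closing quotes differ.
func srcPath(re *regexp.Regexp, tag string) string {
	m := re.FindStringSubmatch(tag)
	if m == nil || m[2] != m[4] {
		return ""
	}
	return m[3]
}

func main() {
	tags := []string{
		`<img src="assets/logo.png" alt="Logo">`,
		`<img src=my image.png>`,
		`<img src="my image.png">`,
	}
	for _, tag := range tags {
		fmt.Printf("%-42s loose=%q strict=%q\n", tag, srcPath(looseImgSrc, tag), srcPath(strictImgSrc, tag))
	}
}
```

With the loose pattern, the unquoted tag yields the truncated path `my`; the strict pattern rejects that tag and still accepts quoted paths that contain spaces.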
