Convert relative image links to base64 data URIs (markdown and HTML) #47
Co-authored-by: mapitman <4205286+mapitman@users.noreply.github.com>
@copilot Use this markdown file to test https://github.com/wimpysworld/deb-get/blob/main/README.md
Tested with the deb-get README.md. Added support for HTML inline images (`<img src="...">` tags). Commit: 5d19cae
mapitman left a comment
Looks good. Tested locally and was able to render images.
Pull request overview
This PR adds automatic conversion of relative image paths to base64-encoded data URIs in both markdown image syntax and HTML inline images. This solves the issue where relative image paths break when HTML is rendered in a temporary directory, which is common with GitHub repository markdown files.
Key Changes
- Token processing system that walks parsed markdown tokens to find and convert relative image paths to base64 data URIs
- HTML image processing using regex to handle `<img src="...">` tags with relative paths
- Security measures including a 10MB file size limit and path traversal validation that allows up to 3 parent directory levels
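The token walk described above can be sketched as a recursive pass over a token tree. This is a minimal sketch, not the PR's code: the `Token` struct, its `Kind` strings, and `walkTokens` are hypothetical stand-ins for whatever parser token types the PR actually uses, and the two conversion callbacks are supplied by the caller.

```go
package main

import "fmt"

// Token is a hypothetical, simplified stand-in for a markdown parser's tokens.
// The real parser's type names and fields may differ.
type Token struct {
	Kind     string  // "image", "html_inline", "html_block", "inline", ...
	Src      string  // image source, used when Kind == "image"
	Content  string  // raw HTML, used for "html_inline" / "html_block"
	Children []Token // nested inline tokens
}

// walkTokens rewrites image sources and raw HTML in place and recurses into
// child tokens. convertSrc returns "" to signal "keep the original path".
func walkTokens(tokens []Token, convertSrc func(string) string, convertHTML func(string) string) {
	for i := range tokens {
		t := &tokens[i] // pointer so edits land in the caller's slice
		switch t.Kind {
		case "image":
			if uri := convertSrc(t.Src); uri != "" {
				t.Src = uri
			}
		case "html_inline", "html_block":
			t.Content = convertHTML(t.Content)
		}
		if len(t.Children) > 0 {
			walkTokens(t.Children, convertSrc, convertHTML)
		}
	}
}

func main() {
	doc := []Token{
		{Kind: "inline", Children: []Token{{Kind: "image", Src: "./images/logo.png"}}},
		{Kind: "html_block", Content: `<img src="./a.png">`},
	}
	walkTokens(doc,
		func(src string) string { return "data:image/png;base64,AAAA" },
		func(html string) string { return html + "<!-- processed -->" })
	fmt.Println(doc[0].Children[0].Src)
	fmt.Println(doc[1].Content)
}
```

Because `Children` is a slice, the recursive call edits the same backing array, so no tokens need to be returned or reassigned.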

        return true
    }

    // imageToDataURI reads an image file and converts it to a base64 data URI

The function comment should specify what happens when image conversion fails. Consider adding that the function returns an empty string on failure, which causes the original path to be preserved (graceful degradation).
Suggested change:

    - // imageToDataURI reads an image file and converts it to a base64 data URI
    + // imageToDataURI reads an image file and converts it to a base64 data URI.
    + // On any failure it logs a warning and returns an empty string so callers can
    + // gracefully fall back to using the original image path.

        return fmt.Sprintf("data:%s;base64,%s", mimeType, encoded)
    }

    // getMimeType returns the MIME type based on file extension

The function comment should document the return value more clearly. Consider adding that it returns the appropriate MIME type string for the image format, or "image/*" for unknown formats.
Suggested change:

    - // getMimeType returns the MIME type based on file extension
    + // getMimeType returns the appropriate image MIME type string based on the file
    + // extension, or "image/*" for unknown or unsupported image formats.

        // Clean and validate the path to prevent path traversal attacks
        cleanedPath, err := filepath.Abs(fullPath)
        if err != nil {
            log.Printf("Warning: Invalid image path %s: %v", fullPath, err)

The error message could be more helpful by suggesting potential causes. Consider rephrasing to something like "Warning: Invalid image path %s (failed to resolve absolute path): %v" to clarify what operation failed.
Suggested change:

    - log.Printf("Warning: Invalid image path %s: %v", fullPath, err)
    + log.Printf("Warning: Invalid image path %s (failed to resolve absolute path): %v", fullPath, err)

        // Ensure the resolved path is within or relative to the base directory
        cleanedBase, err := filepath.Abs(baseDir)
        if err != nil {
            log.Printf("Warning: Invalid base directory %s: %v", baseDir, err)

The error message could be more helpful by suggesting potential causes. Consider rephrasing to something like "Warning: Invalid base directory %s (failed to resolve absolute path): %v" to clarify what operation failed.
Suggested change:

    - log.Printf("Warning: Invalid base directory %s: %v", baseDir, err)
    + log.Printf("Warning: Invalid base directory %s (failed to resolve absolute path): %v", baseDir, err)
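One possible reading of the "up to 3 parent directory levels" rule: resolve both paths with `filepath.Abs`, walk the base directory up three levels, and reject anything that `filepath.Rel` reports as lying outside that ancestor. This interpretation is an assumption, and `resolveImagePath` and `maxParentLevels` are hypothetical names, not the PR's code.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// maxParentLevels is a hypothetical name for the "up to 3 parent levels" rule.
const maxParentLevels = 3

// resolveImagePath joins relPath onto baseDir and reports whether the result
// stays inside the tree rooted maxParentLevels directories above baseDir.
func resolveImagePath(baseDir, relPath string) (string, bool) {
	cleanedBase, err := filepath.Abs(baseDir)
	if err != nil {
		return "", false
	}
	// filepath.Join also cleans the result, collapsing any ".." segments.
	cleanedPath, err := filepath.Abs(filepath.Join(cleanedBase, relPath))
	if err != nil {
		return "", false
	}
	// Walk up from the base directory to the highest allowed ancestor.
	root := cleanedBase
	for i := 0; i < maxParentLevels; i++ {
		root = filepath.Dir(root)
	}
	rel, err := filepath.Rel(root, cleanedPath)
	if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", false
	}
	return cleanedPath, true
}

func main() {
	base := "/home/user/project/docs"
	for _, p := range []string{"images/logo.png", "../assets/icon.png", "../../../../../etc/passwd"} {
		resolved, ok := resolveImagePath(base, p)
		fmt.Println(p, ok, resolved)
	}
}
```

This check is purely lexical: a symlink inside the allowed tree could still point elsewhere, and `filepath.EvalSymlinks` would be needed to catch that.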

        // For unknown extensions, log a warning but try with generic image type
        log.Printf("Warning: Unknown image extension %s for file %s, using image/* MIME type", ext, path)
        return "image/*"

Using "image/*" as a fallback MIME type for unknown extensions is not standard and may not work correctly in all browsers. The asterisk is typically used in HTTP Accept headers, not in data URIs. Consider using a more specific default like "application/octet-stream" or "image/png", or simply skip conversion for unknown formats by returning an empty string.
Suggested change:

    - // For unknown extensions, log a warning but try with generic image type
    - log.Printf("Warning: Unknown image extension %s for file %s, using image/* MIME type", ext, path)
    - return "image/*"
    + // For unknown extensions, log a warning but fall back to a generic binary MIME type
    + log.Printf("Warning: Unknown image extension %s for file %s, using application/octet-stream MIME type", ext, path)
    + return "application/octet-stream"
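A sketch of what `getMimeType` could look like with the suggested `application/octet-stream` fallback. The extension table is illustrative and may not match the PR's actual list.

```go
package main

import (
	"fmt"
	"log"
	"path/filepath"
	"strings"
)

// imageMIMETypes is an illustrative extension table; the PR's list may differ.
var imageMIMETypes = map[string]string{
	".png":  "image/png",
	".jpg":  "image/jpeg",
	".jpeg": "image/jpeg",
	".gif":  "image/gif",
	".svg":  "image/svg+xml",
	".webp": "image/webp",
	".bmp":  "image/bmp",
	".ico":  "image/x-icon",
}

// getMimeType returns the image MIME type for path's extension (matched
// case-insensitively), or application/octet-stream for unknown extensions.
func getMimeType(path string) string {
	ext := strings.ToLower(filepath.Ext(path))
	if mimeType, ok := imageMIMETypes[ext]; ok {
		return mimeType
	}
	log.Printf("Warning: Unknown image extension %s for file %s, using application/octet-stream MIME type", ext, path)
	return "application/octet-stream"
}

func main() {
	for _, p := range []string{"logo.PNG", "photo.jpeg", "diagram.svg", "archive.xyz"} {
		fmt.Println(p, getMimeType(p))
	}
}
```

The reviewer's other option, returning an empty string for unknown extensions, would pair naturally with `imageToDataURI`'s empty-string fallback and leave the original path in place.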

    // imgSrcRegex matches <img> tags with src attributes
    // Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
    var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["']?)([^"'\s>]+)(["']?)`)

The regex pattern for src paths uses [^"'\s>]+ which excludes spaces. This means unquoted src attributes containing spaces (e.g., <img src=my image.png>) would only capture the first part before the space ("my"), potentially causing incorrect path processing. While unquoted attributes with spaces are invalid HTML, consider adding validation to ensure src attributes are properly quoted, or update the regex to only match properly quoted or space-free unquoted values.
Suggested change:

    - // imgSrcRegex matches <img> tags with src attributes
    - // Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
    - var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["']?)([^"'\s>]+)(["']?)`)
    + // imgSrcRegex matches <img> tags with quoted src attributes
    + // Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
    + var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["'])([^"']+)(["'])`)
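A sketch of how the suggested quoted-only pattern could drive the rewrite with `ReplaceAllStringFunc`. Go's `regexp` (RE2) has no backreferences, so the pattern cannot require the closing quote to match the opening one; the sketch checks groups 2 and 4 in code instead. `isRelative`, `rewriteImgSrcs`, and the `convert` callback are hypothetical, and skipping root-absolute paths (`/...`) is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// imgSrcRegex is the quoted-src pattern from the suggestion above.
// Captures: 1=prefix, 2=opening quote, 3=src path, 4=closing quote
var imgSrcRegex = regexp.MustCompile(`(<img[^>]*\ssrc=)(["'])([^"']+)(["'])`)

// isRelative reports whether src looks like a local relative path rather than
// an external URL, a data URI, or a root-absolute path.
func isRelative(src string) bool {
	lower := strings.ToLower(src)
	for _, prefix := range []string{"http://", "https://", "data:", "/"} {
		if strings.HasPrefix(lower, prefix) {
			return false
		}
	}
	return true
}

// rewriteImgSrcs replaces relative src values with convert(src). An empty
// result from convert leaves the original tag untouched.
func rewriteImgSrcs(html string, convert func(string) string) string {
	return imgSrcRegex.ReplaceAllStringFunc(html, func(match string) string {
		m := imgSrcRegex.FindStringSubmatch(match)
		// RE2 cannot enforce matching quotes, so compare them here.
		if m == nil || m[2] != m[4] || !isRelative(m[3]) {
			return match
		}
		uri := convert(m[3])
		if uri == "" {
			return match
		}
		return m[1] + m[2] + uri + m[4]
	})
}

func main() {
	in := `<p><img src="./images/logo.png" width="64"> <img src='https://example.com/a.png'></p>`
	fake := func(path string) string { return "data:image/png;base64,AAAA" }
	// Prints: <p><img src="data:image/png;base64,AAAA" width="64"> <img src='https://example.com/a.png'></p>
	fmt.Println(rewriteImgSrcs(in, fake))
}
```

Re-running `FindStringSubmatch` on each match is a common Go idiom for getting capture groups inside `ReplaceAllStringFunc`, which only passes the whole match.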

Relative image paths in markdown (e.g., `./images/logo.png`, `../assets/icon.png`) break when the HTML is rendered in a temp directory. This is common with GitHub repository markdown files that use both markdown image syntax and HTML inline images.

Changes
- `Image` tokens with relative paths are converted to base64 data URIs
- `HTMLInline` and `HTMLBlock` tokens are processed to handle `<img src="...">` tags with relative paths

Security
- 10MB file size limit
- Path traversal validation that allows up to 3 parent directory levels
Example
Markdown syntax:
HTML inline images:
Both markdown and HTML relative paths are embedded as `data:image/png;base64,iVBORw0KG...` while external URLs remain unchanged.

Testing
Tested with real-world examples including the deb-get README.md which uses extensive HTML inline images.
Markdown images:
HTML inline images (deb-get README):