Skip to content

Fix php-transformer dropping inline SVG icons/diagrams as decorative#331

Merged
chubes4 merged 1 commit into
trunkfrom
fix/svg-allowlist-complete
Jun 29, 2026
Merged

Fix php-transformer dropping inline SVG icons/diagrams as decorative#331
chubes4 merged 1 commit into
trunkfrom
fix/svg-allowlist-complete

Conversation

@chubes4

@chubes4 chubes4 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Inline SVG artwork (service icons, hero pipe-diagrams, boiler illustrations) was still being stripped to empty during HTML-to-blocks conversion despite #311 — service icons surfaced as empty core/html blocks and diagrams as core/html with only whitespace + HTML comments, every drawing element gone.

Root cause (the hypothesis was inverted)

The investigation hypothesis was that #311's safe-SVG element allowlist was too narrow and stripped shapes like line/polyline/ellipse/defs/linearGradient/stop. That is not what happens. The sanitizer (sanitizeInlineSvgMarkupsafeFallbackHtml) preserves all shape markup verbatim and strips only script/style/foreignObject/event-handlers/javascript:. Reproductions confirmed diagrams using the full geometry set survive the sanitizer untouched.

The real data loss is in HtmlTransformer::convertElement()'s SVG branch. An SVG classified safe-decorative (isSafeDecorativeSvgElement = passive-shape allowlist + aria-hidden/role=presentation) was:

  • dropped entirely (return null) when it was neither icon-like nor a visual layer — e.g. a <svg aria-hidden="true"> inside <div class="svc">, the empty-content service-icon case; or
  • converted to an empty core/group when isVisualLayerElement() matched a fuzzy class token like hero — the comments/whitespace-only diagram case.

aria-hidden="true" hides a graphic from the accessibility tree; it does not mean the artwork is visually disposable. Real icons and diagrams routinely set it, so they were silently erased. Expanding the allowlist as hypothesized would have made this worse (more SVGs classified passive-decorative → more dropped).

Fix (SVG sanitize/classification code only)

  • Preserve drawable artwork. A safe-decorative SVG that has real drawable content is now emitted faithfully as inline core/html instead of being dropped. The only SVGs that still collapse to a styleable group / are dropped are genuine decorative chrome the materialized CSS recreates: a positioned visual layer or a preserveAspectRatio="none" stretched band (distorts geometry → never used for meaningful icons/diagrams). This keeps the intentional decorative wave-layer/wave-divider behavior (html-hero-decorative-wave-layer, html-inline-svg-decorative) green.
  • Restore camelCase SVG element names. libxml lowercases tag names, so <linearGradient> was emitted as <lineargradient> — an unknown element the browser ignores, so the gradient fill disappeared even when "preserved". restoreSvgAttributeCasing only fixed attributes; it now also restores element names (linearGradient, radialGradient, clipPath, filter primitives, ...) and is renamed restoreSvgCasing.
  • External-sprite <use> verdict. <use href="sprite.svg#id"> references a symbol in another file that does not travel with the imported markup, so it cannot be inlined. Such SVGs are now reported via the bounded inline-SVG fallback rather than emitted as broken external markup. Local <use href="#id"> against an in-document <symbol>/<defs> is preserved.
  • Expanded the passive-SVG safe allowlist to the full set of safe structure/geometry/text tags + attributes (text, tspan, marker, pattern, symbol, use, font-*, text-anchor, marker-*, pattern*, ref{x,y}, ...).

Security stripping is unchanged<script>, on* handlers, <foreignObject>, javascript: URLs, and external url() references are still removed. Verified a decorative SVG carrying an inline <script> + onclick keeps the artwork while both unsafe parts are scrubbed.

Before / after (real diagram)

Input: <section class="system-diagram"><svg aria-hidden="true" viewBox="0 0 400 300">…defs/linearGradient/stop, line, polyline, polygon, ellipse, circle, rect, comments, script, onclick…</svg></section>

  • Before: SVG dropped → core/group with empty innerBlocks (no core/html; drawing gone).
  • After: faithful core/html containing every shape, with <linearGradient> correctly cased, <script>/onclick stripped.

Tests

  • New parity fixtures: html-inline-svg-diagram-preserved (aria-hidden diagram survives, unsafe parts scrubbed) and html-inline-svg-local-use-symbol (local use(#id)+symbol preserved; external-sprite use reported via fallback).
  • Full parity suite green: 170 fixtures. php -l clean. Existing SVG/security fixtures unchanged and green.
  • Pre-existing, unrelated baseline failures on trunk (corpus-detectors classed/styled-span assertions + html-paragraph-heading-classed-span-hoist) are in the span/paragraph-hoist path and untouched by this PR.

AI assistance

  • AI assistance: Yes
  • Tool(s): Claude Opus 4.8 via Claude Code
  • Used for: Root-cause investigation, code changes, and parity fixtures.

Inline SVG artwork (service icons, pipe/boiler diagrams) was silently
discarded during HTML-to-blocks conversion, surfacing as empty
`core/html` blocks and diagrams collapsed to whitespace + comments.

Root cause was NOT the safe-SVG element allowlist stripping shapes: the
sanitizer (`sanitizeInlineSvgMarkup`/`safeFallbackHtml`) already preserves
all shape markup and strips only script/style/foreignObject/handlers. The
data loss was in `convertElement`'s SVG branch: any SVG classified
"safe decorative" (passive shapes + `aria-hidden`/`role=presentation`)
that was neither icon-like nor a positioned visual layer was dropped
entirely (`return null`), and visual-layer matches became an empty group.
`aria-hidden` hides a graphic from assistive tech; it does not make the
artwork visually disposable, so real icons/diagrams marked `aria-hidden`
vanished.

Changes:
- Preserve any safe-decorative SVG that carries real drawable artwork as
  faithful inline `core/html`, instead of dropping it. The only SVGs that
  still collapse to a styleable group / are dropped are genuine decorative
  chrome the materialized CSS recreates: a positioned visual layer or a
  `preserveAspectRatio="none"` stretched band (which distorts geometry and
  is never used for meaningful icons/diagrams). Keeps the existing
  decorative wave-layer/divider behavior intact.
- Restore camelCase casing of SVG element names (linearGradient,
  radialGradient, clipPath, filter primitives, ...) the HTML parser
  lowercases. A lowercased `<lineargradient>` is an unknown element the
  browser ignores, so the gradient fill disappeared even when preserved.
- External-sprite `<use href="sprite.svg#id">` cannot be inlined (the
  symbol lives in another file), so an SVG whose only artwork is an
  external reference is reported through the bounded inline-SVG fallback
  rather than emitted as broken markup. Local `<use href="#id">` against
  an in-document `<symbol>`/`<defs>` is preserved.
- Expand the passive-SVG safe element/attribute allowlist to the full set
  of safe structure/geometry/text tags and attributes. Security stripping
  (script, on* handlers, foreignObject, javascript:/external url()) is
  unchanged.

Adds parity fixtures html-inline-svg-diagram-preserved (aria-hidden
diagram with defs/linearGradient/stop/line/polyline/polygon/ellipse/
circle/rect survives; script + onclick stripped) and
html-inline-svg-local-use-symbol (local use(#id)+symbol preserved;
external-sprite use reported via fallback). Full parity suite green
(170 fixtures); php -l clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chubes4 chubes4 merged commit 906e2fb into trunk Jun 29, 2026
1 check failed
@chubes4 chubes4 deleted the fix/svg-allowlist-complete branch June 29, 2026 01:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant