Harden corpus-diagnostics harness: severity-ranked, defect-faithful worklist #337

Merged
chubes4 merged 1 commit into trunk from feat/harness-detector-hardening on Jun 29, 2026

chubes4 (Contributor) commented Jun 29, 2026

Summary

Hardens the php-transformer corpus-diagnostics harness (added in #327) so its numbers are trustworthy. This closes four known blind spots where the harness either under-reported real defects or inflated the worklist with working behavior. Reporting/detector-only — no transformer conversion logic is touched. All changes stay within php-transformer/src/CorpusDiagnostics/ and its tests.

Clusters now rank by severity tier first, then occurrence count, so the actionable worklist leads with real, editor-visible defects.
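
A minimal sketch of severity-first ordering, assuming each cluster is an array with `severity` and `count` keys. The `SEVERITY_TIER` map and `rank_clusters()` helper are hypothetical names for illustration and are not taken from the harness; the `info` count is a made-up value chosen to show that a large count cannot outrank a higher tier.

```php
<?php
// Hypothetical tier order (lower sorts first); the harness's real constants are not shown in this PR.
const SEVERITY_TIER = ['high' => 0, 'medium' => 1, 'info' => 2];

/**
 * Sort clusters by severity tier first, then by descending occurrence count.
 *
 * @param array<int, array{cluster: string, severity: string, count: int}> $clusters
 * @return array<int, array{cluster: string, severity: string, count: int}>
 */
function rank_clusters(array $clusters): array
{
    usort($clusters, static function (array $a, array $b): int {
        $tierA = SEVERITY_TIER[$a['severity']] ?? PHP_INT_MAX;
        $tierB = SEVERITY_TIER[$b['severity']] ?? PHP_INT_MAX;
        // Arrays compare element by element: tier ascending, then count descending.
        return [$tierA, $b['count']] <=> [$tierB, $a['count']];
    });
    return $clusters;
}

$ranked = rank_clusters([
    ['cluster' => 'preserve_runtime_island :: runtime_script', 'severity' => 'medium', 'count' => 1224],
    ['cluster' => 'svg_content_lost :: inline_svg_dropped', 'severity' => 'high', 'count' => 16],
    ['cluster' => 'informational_var_density', 'severity' => 'info', 'count' => 9999], // illustrative count
    ['cluster' => 'richtext_invalid_content_risk :: core/paragraph', 'severity' => 'high', 'count' => 1280],
]);

foreach ($ranked as $position => $row) {
    printf("%d %s %d %s\n", $position + 1, strtoupper($row['severity']), $row['count'], $row['cluster']);
}
```

With this comparator the 16-occurrence SVG-loss cluster sorts above the 1224-occurrence runtime-script cluster, which is the behavior the ranking change describes.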

Blind spots closed

1. Validity headline no longer lies (RichText invalidity is now the headline)

The structural wp_block_validity proxy reports invalid_blocks=0 even when the editor would flag content invalid, because it does not model RichText stripping class/style off inline <span>/<a> in content. The classed-span detector is promoted to the authoritative editor-invalid-risk signal (richtext_invalid_content_risk, HIGH severity), extended to also cover <a> and core/list-item content, surfaced via a new richtext_invalid_risk_count metric, and the summary stops presenting structural invalid_blocks=0 as "no invalid content."
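
To illustrate the signal, here is a rough sketch that counts inline `<span>`/`<a>` tags carrying `class` or `style` in a block's content. The function name `count_richtext_invalid_risk()` is hypothetical, and the regex is a simplification chosen to keep the example dependency-free; the actual detector may well use a proper HTML parser.

```php
<?php
/**
 * Count opening <span>/<a> tags that carry a class or style attribute.
 * Hypothetical helper for illustration only; not the harness's detector.
 */
function count_richtext_invalid_risk(string $innerHtml): int
{
    // Match <span ...> or <a ...> whose attribute list contains class= or style=.
    // \b after the tag name keeps <abbr> or <article> from matching as <a>.
    $pattern = '~<(?:span|a)\b(?=[^>]*\s(?:class|style)\s*=)[^>]*>~i';
    return (int) preg_match_all($pattern, $innerHtml);
}

$risky = 'Read <a class="btn" href="/x">more</a> or <span style="color:red">this</span>, <em>not this</em>.';
$clean = '<strong>Plain</strong> <a href="/x">link</a>';

echo count_richtext_invalid_risk($risky), "\n"; // 2
echo count_richtext_invalid_risk($clean), "\n"; // 0
```

A nonzero count for paragraph, heading, or list-item content would feed a metric like richtext_invalid_risk_count, independent of what the structural validity proxy reports.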

2. Layout-direction faithfulness

New layout_direction_misrecognition :: columns_from_vertical_flex detector: a core/columns emitted from a display:flex; flex-direction:column source (a vertical stack rendered as horizontal columns) is a misrecognition. Conservative — only inline column-direction flex on container elements with 2+ children, confirmed by a verifier that the fragment actually converts to core/columns. Genuine horizontal flex / grid is never flagged.
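
The conservative conditions above can be sketched as a single predicate. Both function names are hypothetical, and the emitted block name is passed in as a plain argument as a stand-in for the verifier step, since the verifier's interface is not shown in this PR.

```php
<?php
/** Parse an inline style attribute into a lowercase property => value map. */
function parse_inline_style(string $style): array
{
    $declarations = [];
    foreach (explode(';', $style) as $declaration) {
        $parts = explode(':', $declaration, 2);
        if (count($parts) === 2) {
            $declarations[strtolower(trim($parts[0]))] = strtolower(trim($parts[1]));
        }
    }
    return $declarations;
}

/**
 * Flag a core/columns block emitted from an inline vertical-flex container.
 * $emittedBlock stands in for the verifier result (assumption for this sketch).
 */
function is_columns_from_vertical_flex(string $style, int $childCount, string $emittedBlock): bool
{
    $css = parse_inline_style($style);
    $isVerticalFlex = ($css['display'] ?? '') === 'flex'
        && ($css['flex-direction'] ?? '') === 'column';

    return $isVerticalFlex && $childCount >= 2 && $emittedBlock === 'core/columns';
}

var_dump(is_columns_from_vertical_flex('display:flex; flex-direction:column', 3, 'core/columns')); // bool(true)
var_dump(is_columns_from_vertical_flex('display:flex', 3, 'core/columns'));                        // bool(false)
```

Requiring all three conditions together is what keeps horizontal flex, grid, and single-child wrappers out of the cluster.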

3. SVG-loss surfaced as HIGH severity

New svg_content_lost lane routes the transformer's inline-SVG fallback diagnostics and empty/comment-only core/html blocks that carry an SVG remnant into one signal, while keeping the distinction from SVG preserved as core/html with real shape elements (acceptable, not flagged). Previously this hid at rank ~61 under generic asset findings.
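
A rough sketch of the lost-versus-preserved distinction for emitted core/html content. The function name, the return labels other than svg_content_lost, the shape-element list, and the reading of "SVG remnant" as a comment that still mentions svg are all assumptions made for illustration.

```php
<?php
/**
 * Classify the inner HTML of an emitted core/html block.
 * Hypothetical helper; the labels 'svg_preserved' and 'other' are invented for this sketch.
 */
function classify_svg_outcome(string $emittedHtml): string
{
    // Strip HTML comments to see whether any real markup survives.
    $withoutComments = preg_replace('/<!--.*?-->/s', '', $emittedHtml) ?? '';

    if (trim($withoutComments) === '') {
        // Empty or comment-only: only a loss if an SVG remnant is still mentioned (assumption).
        return stripos($emittedHtml, 'svg') !== false ? 'svg_content_lost' : 'other';
    }

    // An <svg> that still contains at least one basic shape element counts as preserved.
    $shapes = '~<svg\b.*<(?:path|rect|circle|ellipse|line|polyline|polygon)\b~is';
    return preg_match($shapes, $withoutComments) === 1 ? 'svg_preserved' : 'other';
}

echo classify_svg_outcome('<!-- svg icon removed -->'), "\n";                                 // svg_content_lost
echo classify_svg_outcome('<svg viewBox="0 0 10 10"><path d="M0 0h10v10z"/></svg>'), "\n";   // svg_preserved
echo classify_svg_outcome('<p>hello</p>'), "\n";                                              // other
```

Only the first outcome would be routed into the HIGH-severity lane; a shape-bearing SVG kept as core/html stays unflagged.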

4. CSS var() false-positive down-ranked to informational

var() references are materialized downstream by SSI (verified end-to-end), so resolved var density is not a repair gap. Relabeled informational_var_density (severity info) and ranked below all actionable clusters, instead of flooding the top of the worklist with 233 css clusters.
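
As a sketch of the relabeling, assuming clusters are arrays with a `name` of the form `type :: key`: the helper name, the prefix match on the old cluster type, and the sample custom property and count are hypothetical and only illustrate the idea.

```php
<?php
/**
 * Relabel a resolved var() density cluster as informational.
 * Matching on the old type prefix is an assumption made for this sketch.
 */
function relabel_var_density(array $cluster): array
{
    $oldType = 'css_custom_property_materialization';
    if (strpos($cluster['name'], $oldType) === 0) {
        // Keep the " :: --property" suffix, swap the type, and mark it non-actionable.
        $cluster['name'] = 'informational_var_density' . substr($cluster['name'], strlen($oldType));
        $cluster['severity'] = 'info';
    }
    return $cluster;
}

// '--brand-color' and the count are made-up sample values.
$relabeled = relabel_var_density([
    'name'  => 'css_custom_property_materialization :: --brand-color',
    'count' => 12,
]);

echo $relabeled['name'], ' [', $relabeled['severity'], "]\n";
// informational_var_density :: --brand-color [info]
```

Once a cluster carries the info tier, a severity-first sort places it below every actionable cluster regardless of its count.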

Before / after ranking

Full corpus: 368 documents / 77 fixtures, 54,123 blocks.

Before (count-only ranking):

  1. preserve_runtime_island :: runtime_script — 1224
  2. native_block_recognition :: <svg> — 696
  3. richtext_inline_span_normalization :: core/paragraph — 308
  4. preserve_runtime_island :: interactive_form — 206
  5. semantic_structure_parity_restoration :: navigation_menu — 95
  6–30. css_custom_property_materialization :: --* fills all 25 remaining slots (working behavior)
  • materialize_static_asset :: inline_svg (the SVG-loss signal): rank 61
  • headline: invalid_blocks=0 (the lie)

After (severity-first ranking) — new actionable top-15:

 #  sev     count  files  cluster
 1  HIGH     1280    350  richtext_invalid_content_risk :: core/paragraph
 2  HIGH      311     38  richtext_invalid_content_risk :: core/list-item
 3  HIGH       36     13  richtext_invalid_content_risk :: core/heading
 4  HIGH       21     17  layout_direction_misrecognition :: columns_from_vertical_flex
 5  HIGH       16     15  svg_content_lost :: inline_svg_dropped
 6  MEDIUM   1224    347  preserve_runtime_island :: runtime_script
 7  MEDIUM    696    162  native_block_recognition :: <svg> (svg preserved, acceptable)
 8  MEDIUM    206     95  preserve_runtime_island :: interactive_form
 9  MEDIUM     95     82  semantic_structure_parity_restoration :: navigation_menu
10  MEDIUM     68     25  restore_interactive_behavior :: interactive_control
11  MEDIUM     46     46  semantic_structure_parity_restoration :: semantic_landmark
12  MEDIUM     38     33  typography_parity_restoration :: typography
13  MEDIUM     15      3  preserve_runtime_island :: html_template
14  MEDIUM     15      3  preserve_runtime_island :: runtime_template
15  MEDIUM     14      7  materialize_commerce_products :: commerce_product_grid

The 233 informational_var_density clusters now begin at rank 18 (severity info), out of the actionable worklist. New headline surfaces richtext_invalid_risk=1627, svg_content_lost=16, columns_from_vertical_flex=21, and labels var density as informational.
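
The richtext_invalid_risk headline matches the sum of the three HIGH richtext clusters in the table. A quick check, assuming the headline is computed as that sum (the variable names are illustrative):

```php
<?php
// Counts copied from the HIGH richtext_invalid_content_risk rows of the "After" table.
$richtextRiskByBlock = [
    'core/paragraph' => 1280,
    'core/list-item' => 311,
    'core/heading'   => 36,
];

$richtextInvalidRisk = array_sum($richtextRiskByBlock);

echo "richtext_invalid_risk={$richtextInvalidRisk}\n"; // richtext_invalid_risk=1627
```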

Tests

  • tests/unit/corpus-detectors.php extended for all four cases (classed <span>/<a> in paragraph/list-item = richtext invalid risk; vertical-flex→columns flags layout misrecognition while horizontal flex does not; <svg>→empty/comment core/html flags svg_content_lost while a shape-bearing svg core/html does not; var density is informational, not top-ranked). 23 assertions pass.
  • composer test green (canonical + 171 parity fixtures + packaging); php -l clean.

AI assistance

  • AI assistance: Yes
  • Tool(s): Claude Opus 4.8 via Claude Code
  • Used for: Detector/reporting implementation, tests, corpus run analysis, and this PR description. All changes reviewed by the submitter.

…orklist

Close four blind spots in the php-transformer corpus-diagnostics harness so its
numbers reflect real, editor-visible defects instead of structural proxies and
working behavior. Reporting/detector-only — no transformer conversion logic is
touched.

1. RichText invalidity is now the headline signal. The structural
   wp_block_validity round-trip reports invalid_blocks=0 even when the editor
   would mark content invalid, because it does not model RichText stripping
   class/style off inline <span>/<a> in paragraph/heading/list-item content. The
   classed-span detector is promoted to the authoritative editor-invalid-risk
   signal (richtext_invalid_content_risk, HIGH), extended to cover <a> and
   list-item content, surfaced via a richtext_invalid_risk_count metric, and the
   summary no longer presents structural invalid_blocks=0 as "no invalid
   content".

2. Layout-direction faithfulness. New layout_direction_misrecognition detector
   flags a core/columns emitted from a display:flex;flex-direction:column source
   (a vertical stack rendered as horizontal columns). Conservative: only inline
   column-direction flex on container elements with 2+ children, confirmed by a
   verifier that the fragment actually converts to core/columns. Horizontal flex
   and grid are never flagged.

3. SVG-loss is now HIGH-severity and surfaced. svg_content_lost routes inline-svg
   fallback diagnostics plus empty/comment-only core/html that bears an SVG
   remnant into one lane, while preserving the distinction from svg kept as
   core/html with real shape elements (acceptable, not flagged).

4. CSS var() density is informational. var() references are materialized
   downstream by SSI, so resolved var density is relabeled
   informational_var_density (severity=info) and down-ranked below all actionable
   clusters instead of inflating the worklist with 233 css clusters.

Clusters now rank by severity tier first, then count, so real defects lead the
worklist. Adds/extends tests/unit/corpus-detectors.php for all four cases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
chubes4 merged commit 5c72bf1 into trunk on Jun 29, 2026
1 check passed
chubes4 deleted the feat/harness-detector-hardening branch on June 29, 2026 01:14