Credit collapsed-wrapper equivalence in static-parity coverage (honest score) #347

Merged: chubes4 merged 1 commit into trunk from cook/parity-coverage-matching-refinement on Jun 29, 2026

chubes4 commented Jun 29, 2026

What

Makes the deterministic static-parity score honest: stop counting collapsed presentational wrapper <div>s as coverage loss. The transformer intentionally merges wrappers, so a source wrapper has no 1:1 candidate and was counted as a drop — deflating coverage with alignment noise rather than real divergence.

Fix (src/VisualParity/StaticStyleParityComparator.php)

  • New non-consuming collapsed-wrapper equivalence pass: a source element with no 1:1 candidate earns coverage credit only if some candidate is a style superset (reproduces every declared, non-empty tracked value) AND content-subsumes it (candidate text contains source text, or source has none). Content subsumption is directional so a short candidate (e.g. a one-letter icon) can't spuriously absorb a text-bearing wrapper.
  • coverage = (matched + absorbed) / source_total. Report gains absorbed_source / covered_total for audit.
  • Property comparison untouched → no property regression is masked.
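
The two conditions in the equivalence pass can be sketched as a small predicate. This is a minimal Python illustration of the rule as described above, not the code in StaticStyleParityComparator.php; the `Element` record and the function names are hypothetical, and exact string equality for style values is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """Hypothetical element record: tracked style declarations plus visible text."""
    styles: dict = field(default_factory=dict)
    text: str = ""


def is_style_superset(source: Element, candidate: Element) -> bool:
    """True if the candidate reproduces every declared, non-empty source value."""
    return all(
        candidate.styles.get(prop) == value
        for prop, value in source.styles.items()
        if value != ""  # empty declarations are not required on the candidate
    )


def subsumes_content(source: Element, candidate: Element) -> bool:
    """Directional check: the candidate text must contain the source text."""
    src = source.text.strip()
    return src == "" or src in candidate.text


def absorbs(source: Element, candidate: Element) -> bool:
    """A candidate absorbs a source only if both conditions hold."""
    return is_style_superset(source, candidate) and subsumes_content(source, candidate)


wrapper = Element({"padding": "16px", "color": ""}, "Pricing plans")
merged = Element({"padding": "16px", "display": "flex"}, "Pricing plans for teams")
icon = Element({"padding": "16px"}, "P")

print(absorbs(wrapper, merged))  # True: styles reproduced, text contained
print(absorbs(wrapper, icon))    # False: "Pricing plans" is not inside "P"
```

The icon case shows why the direction matters: "P" occurs inside "Pricing plans", so a symmetric containment test would wrongly let the icon absorb the wrapper.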

Verification

  • Determinism: gate --json 2× on both fixtures → byte-identical; full 15-saas report 2× → byte-identical (353,291 bytes).
  • Honest rise (property_parity unchanged): 15-saas score 0.733→0.811 (coverage 0.752→0.833, property_parity 0.974 flat); 38-medical 0.620→0.775 (coverage 0.639→0.799, property_parity 0.971 flat).
  • Real, not masked: absorbed elements spot-checked as genuinely preserved (div.colp.lead, nav <li>s into their containing sections, decorative empty dots); real divergences stayed counted (nav-link typography change, dropped logo, display:none menus). The mismatch fixture still fails (0.5455).
  • composer test + composer parity: 183 fixtures green (+1 collapsed-wrapper fixture; strengthened match/mismatch to assert absorbed_source_total: 0).
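
The determinism check amounts to producing the same report twice and comparing the raw bytes. A minimal sketch of that idea, assuming the report is available as bytes from some callable; `produce_report` is a hypothetical stand-in for whatever invokes the gate, and the sample payloads are placeholders rather than real report content.

```python
import hashlib
import itertools
import json
from typing import Callable


def is_byte_identical(produce_report: Callable[[], bytes], runs: int = 2) -> bool:
    """Run the producer several times and require a single distinct digest."""
    digests = {hashlib.sha256(produce_report()).hexdigest() for _ in range(runs)}
    return len(digests) == 1


# Placeholder producers: one stable, one that changes on every call.
def stable() -> bytes:
    return json.dumps({"b": 1, "a": 2}, sort_keys=True).encode("utf-8")


_counter = itertools.count()


def unstable() -> bytes:
    return json.dumps({"run": next(_counter)}).encode("utf-8")


print(is_byte_identical(stable))    # True
print(is_byte_identical(unstable))  # False
```

Hashing the raw output, rather than comparing parsed JSON, also catches differences in key order or float formatting that a structural comparison would hide.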

Honest limit

The structural-tier 1:1 matching artifact is left unchanged. It produces about 40 false display:flex→'' deltas and affects property_parity, which is already at 0.97. This is a defensible scope cut: coverage was the dominant clean lever.

AI assistance

  • AI assistance: Yes
  • Tool(s): Claude Code (Claude Opus 4.8, 1M context)
  • Used for: Root-cause diagnosis, the equivalence pass, determinism + not-masked verification under human review.

The deterministic static-style parity comparator scored coverage as
matched/source, where source elements include the presentational wrapper
divs the transformer intentionally collapses. A collapsed wrapper owns no
1:1 candidate, so it fell to misaligned structural matches or counted as
an outright drop — deflating coverage (15-saas 0.752, 38-medical 0.639)
with false "no candidate" loss even when the wrapper's styling and content
were faithfully preserved on the merged element.

Add a non-consuming collapsed-wrapper equivalence pass. When a source
element finds no 1:1 candidate, it earns coverage credit only if some
candidate is a style superset (every declared, non-empty tracked-style
value reproduced) AND subsumes its content (candidate text contains the
source text, or the source has no text). This credits faithfully-absorbed
wrappers while keeping genuine divergence as loss: a dropped or restyled
element whose style is absent from every candidate, or whose content has
no home, finds no absorbing candidate and stays counted, so the score
still falls for real regressions. Property comparison is untouched, so no
property regression is masked (property_parity unchanged: 0.9744 / 0.9707).

Coverage = (matched + absorbed) / source. Content subsumption is
directional so a short candidate (e.g. a one-letter icon) cannot
spuriously absorb a text-bearing wrapper.
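
The coverage accounting can be sketched as follows. This is an illustrative Python version under stated assumptions, not the comparator's code: it reads "non-consuming" as meaning an absorbing candidate stays available to absorb other sources, it takes the absorption test as a caller-supplied predicate, and it returns 1.0 for an empty source set. The report keys mirror the names used in this commit message; the function name and signature are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

S = TypeVar("S")
C = TypeVar("C")


def coverage_report(
    matched: int,
    unmatched_sources: Sequence[S],
    candidates: Sequence[C],
    absorbs: Callable[[S, C], bool],
) -> dict:
    """Coverage = (matched + absorbed) / source_total."""
    # Non-consuming: candidates are never removed, so one merged element
    # may absorb several collapsed wrappers.
    absorbed = sum(
        1
        for source in unmatched_sources
        if any(absorbs(source, candidate) for candidate in candidates)
    )
    source_total = matched + len(unmatched_sources)
    covered = matched + absorbed
    return {
        "absorbed_source_total": absorbed,
        "covered_total": covered,
        "source_total": source_total,
        # Assumption: an empty source set counts as fully covered.
        "coverage": covered / source_total if source_total else 1.0,
    }


# Toy data: strings stand in for elements, substring containment for absorption.
report = coverage_report(
    matched=7,
    unmatched_sources=["Plans", "teams", "Logo"],
    candidates=["Plans for teams"],
    absorbs=lambda source, candidate: source in candidate,
)
print(report["absorbed_source_total"], report["covered_total"], report["coverage"])
# 2 9 0.9
```

In the toy run, one candidate absorbs two sources while "Logo" finds no home and stays counted as loss.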

Results — 15-saas 0.7328 -> 0.8113 (cov 0.7521 -> 0.8326), 38-medical
0.6201 -> 0.7752 (cov 0.6389 -> 0.7986); the compare-mismatch fixture
still fails (coverage already 1.0, dropped hero bg / button radius still
surface). Deterministic: same inputs -> byte-identical report.
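
As a quick arithmetic cross-check, the reported figures are consistent with score = coverage × property_parity. That relation is inferred from the numbers in this message and is not stated anywhere in the PR, so treat it as an assumption; the tolerance below only accounts for four-decimal rounding.

```python
import math

# (coverage, property_parity, reported score), copied from the results above.
reported = {
    "15-saas before": (0.7521, 0.9744, 0.7328),
    "15-saas after": (0.8326, 0.9744, 0.8113),
    "38-medical before": (0.6389, 0.9707, 0.6201),
    "38-medical after": (0.7986, 0.9707, 0.7752),
}


def product_matches(coverage: float, property_parity: float, score: float,
                    tol: float = 5e-4) -> bool:
    """True if coverage * property_parity equals the score within rounding."""
    return math.isclose(coverage * property_parity, score, abs_tol=tol)


for label, (cov, parity, score) in reported.items():
    print(label, round(cov * parity, 4), score, product_matches(cov, parity, score))
```

If the relation holds, the whole score increase comes from the coverage term, which matches the claim that property_parity is unchanged.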

Report adds absorbed_source plus absorbed_source_total / covered_total.
New fixture locks collapsed-wrapper equivalence (preserved wrapper is
credited, dropped styled element still counts); match/mismatch fixtures
assert absorption never fires spuriously.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
chubes4 merged commit d9b69df into trunk on Jun 29, 2026
1 check passed
chubes4 deleted the cook/parity-coverage-matching-refinement branch on June 29, 2026 13:32