php-transformer: detect div-based pseudo-forms as materializable forms#339

Merged: chubes4 merged 1 commit into trunk from fix/div-based-form-detection on Jun 29, 2026
chubes4 (Contributor) commented Jun 29, 2026

Summary

Extends the issue #315 form-detection path in HtmlTransformer to div-based pseudo-forms — signup/contact widgets that pair input controls with a submit button inside a plain <div> and never wrap them in a <form>.

The bug

A newsletter signup built without a <form>:

<div id="newsletterForm"><input type="email" ><button>Subscribe</button></div>

The merged form detection (#315, formRequiresRuntimePreservation / formHasDataEntryControls, emitting html_form_fallback for SSI) keys off the <form> element, so a <div>-based form is missed entirely. It imports as a plain paragraph (Email address: you@example.com (required)) plus a dead Subscribe button — the form is lost.

The detection signal (generic, structural)

A div-based pseudo-form is the tightest non-<form> container that:

  • holds at least one data-entry control: a non-search <input>, <select>, or <textarea> (reuses isDataEntryControl from #315, "Surface data-entry forms as materializable form findings"; search inputs are excluded because they have dedicated standalone-search handling), AND
  • holds a submit-like control — a button/input whose type is submit/image, or whose text/value/class/id/name/aria carries submit / subscribe / sign up / signup / send semantics (a plain <button> defaults to type="submit" and qualifies directly), AND
  • is not owned by a real <form> (no <form> ancestor or descendant).

The signal is purely structural — container + data-entry control + submit-like control. No fixture ids/classes/names (newsletterForm etc.) are referenced.
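
The three conditions above can be sketched roughly as follows. This is an illustration only, not the HtmlTransformer code: the function names (isDataEntry, isSubmitLike, looksLikePseudoForm, firstDivIsPseudoForm) are hypothetical, and the handling of hidden or button-like inputs and the exact keyword pattern are assumptions made for the sketch.

```php
<?php
// Illustrative sketch only: helper names are hypothetical and are not the
// HtmlTransformer implementation.

/** True for a non-search <input>, <select>, or <textarea>. */
function isDataEntry(DOMElement $el): bool
{
    $tag = strtolower($el->tagName);
    if ($tag === 'select' || $tag === 'textarea') {
        return true;
    }
    if ($tag !== 'input') {
        return false;
    }
    $type = strtolower($el->getAttribute('type'));
    // Assumption: hidden and button-like inputs do not count as data entry.
    return !in_array($type, ['search', 'hidden', 'submit', 'image', 'button', 'reset'], true);
}

/** True for a button/input with a submit type or submit-like wording. */
function isSubmitLike(DOMElement $el): bool
{
    $tag = strtolower($el->tagName);
    if ($tag !== 'button' && $tag !== 'input') {
        return false;
    }
    $type = strtolower($el->getAttribute('type'));
    if ($type === 'submit' || $type === 'image' || ($tag === 'button' && $type === '')) {
        return true; // a <button> without a type defaults to submit
    }
    if ($tag === 'input' && $type !== 'button') {
        return false; // assumption: only button-like inputs are checked for wording
    }
    $haystack = strtolower(implode(' ', [
        $el->textContent,
        $el->getAttribute('value'),
        $el->getAttribute('class'),
        $el->getAttribute('id'),
        $el->getAttribute('name'),
        $el->getAttribute('aria-label'),
    ]));
    return preg_match('/submit|subscribe|sign[\s_-]?up|send/', $haystack) === 1;
}

/** Container + data-entry control + submit-like control, with no real <form> involved. */
function looksLikePseudoForm(DOMElement $container): bool
{
    if (strtolower($container->tagName) === 'form'
        || $container->getElementsByTagName('form')->length > 0) {
        return false; // a real <form> (self or descendant) owns the subtree
    }
    for ($p = $container->parentNode; $p instanceof DOMElement; $p = $p->parentNode) {
        if (strtolower($p->tagName) === 'form') {
            return false; // a real <form> ancestor owns the subtree
        }
    }
    $hasData = false;
    $hasSubmit = false;
    foreach ($container->getElementsByTagName('*') as $el) {
        $hasData = $hasData || isDataEntry($el);
        $hasSubmit = $hasSubmit || isSubmitLike($el);
    }
    return $hasData && $hasSubmit;
}

/** Parses the markup and applies the check to its first <div>. */
function firstDivIsPseudoForm(string $html): bool
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // libxml warns on some HTML5 markup
    $doc->loadHTML($html);
    libxml_clear_errors();
    $div = $doc->getElementsByTagName('div')->item(0);
    return $div instanceof DOMElement && looksLikePseudoForm($div);
}

var_dump(firstDivIsPseudoForm('<div><input type="email"><button>Subscribe</button></div>')); // bool(true)
var_dump(firstDivIsPseudoForm('<div><input type="search"><button>Search</button></div>'));   // bool(false)
```

The sketch deliberately never looks at ids or class names of the container itself, only at the controls it holds, which mirrors the "purely structural" claim.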

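How it reuses #315

A qualifying container emits the same html_form_fallback finding a real <form> produces, shared through a new formFallbackFinding helper, so the downstream materializer handles both identically. Control detection reuses the #315 helpers formControlElements and isDataEntryControl rather than duplicating them.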

Conservative bounds (no false positives, no double emit)

  • A lone search box, a stray input with no submit control, and ordinary text/link containers never qualify.
  • Tightest-container scoping: if a descendant container also pairs the controls, the wrapper defers to it, so nested wrappers emit once and sibling pseudo-forms each emit their own finding.
  • A real <form> (as ancestor or descendant) always owns the subtree, so the finding fires exactly once and is never double-counted.
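
Tightest-container scoping can be pictured with a small sketch. It assumes a deliberately simplified pairing test (pairsControls, a hypothetical name) and leaves out the real-<form> ownership rule; tightestPseudoFormIds is likewise invented for the demo and is not part of the transformer.

```php
<?php
// Simplified, hypothetical pairing test: one <input> that is not a search box
// plus one <button>. The real signal is broader; this only drives the scoping demo.
function pairsControls(DOMElement $el): bool
{
    $hasInput = false;
    foreach ($el->getElementsByTagName('input') as $input) {
        if (strtolower($input->getAttribute('type')) !== 'search') {
            $hasInput = true;
            break;
        }
    }
    return $hasInput && $el->getElementsByTagName('button')->length > 0;
}

/** Ids of the <div>s that pair the controls and have no descendant <div> that also does. */
function tightestPseudoFormIds(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // libxml warns on some HTML5 markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $ids = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        if (!pairsControls($div)) {
            continue;
        }
        $deferToInner = false;
        foreach ($div->getElementsByTagName('div') as $inner) {
            if (pairsControls($inner)) {
                $deferToInner = true; // a tighter container owns the controls
                break;
            }
        }
        if (!$deferToInner) {
            $ids[] = $div->getAttribute('id');
        }
    }
    return $ids;
}

$html = '<div id="page">'
      . '<div id="signup"><input type="email"><button>Subscribe</button></div>'
      . '<div id="contact"><input type="text"><button>Send</button></div>'
      . '</div>';

print_r(tightestPseudoFormIds($html)); // lists signup and contact, but not page
```

The outer wrapper pairs the controls too, but it defers because a descendant already does, so each sibling widget is reported once and the wrapper never is.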

Before / after

| Input | Before | After |
| --- | --- | --- |
| <div><input type="email"><button>Subscribe</button></div> | paragraph + dead button, no finding | readable content + one html_form_fallback finding (tag div, 2 controls) → materializes as a working form |
| <form>…</form> | one html_form_fallback | one html_form_fallback (unchanged) |
| <div> with only text/links | no finding | no finding (unchanged) |
| <div> search box (input[type=search] + Search button) | search block, no finding | search block, no finding (unchanged) |

Tests

  • Baseline composer test:canonical + composer parity captured green first (173 fixtures).
  • Added 4 parity fixtures: div pseudo-form emits html_form_fallback; div text/links → not a form; div search box → not a form; real <form> (in a wrapper) still emits exactly once.
  • Full composer test green (177 parity fixtures, all contract/unit/packaging suites pass); php -l clean.

AI assistance

  • AI assistance: Yes
  • Tool(s): Claude Opus 4.8 via Claude Code
  • Used for: Studying the #315 detection path (Surface data-entry forms as materializable form findings), implementing the div-based pseudo-form detection and the shared finding helper, and authoring parity fixtures. All changes were reviewed against the existing form-detection contract.

Extend the issue #315 form-detection path to non-<form> containers. Some
signup/contact widgets pair data-entry controls with a submit-like control
inside a plain <div> and never wrap them in a <form>; the existing detection
keys off the <form> element, so these flatten into a paragraph plus a dead
button and the form is lost.

Recognize a div-based pseudo-form structurally: the tightest non-<form>
container that pairs at least one data-entry control (a non-search input,
<select>, or <textarea>) with a submit-like control (a button/input whose
type or text/class/id/name/aria carries submit/subscribe/sign-up/send
semantics), where no real <form> owns the subtree. Such a container emits the
SAME html_form_fallback finding a real <form> produces (shared via a new
formFallbackFinding helper), so the downstream materializer handles it
identically. Reuses the #315 control-detection helpers (formControlElements,
isDataEntryControl) rather than duplicating them.

Conservative bounds: a lone search box, a stray input with no submit control,
and ordinary text/link containers never qualify; tightest-container scoping
keeps wrappers from swallowing nested pseudo-forms and avoids double emission
when a real <form> is present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chubes4 chubes4 merged commit f7d7459 into trunk Jun 29, 2026
1 check passed
@chubes4 chubes4 deleted the fix/div-based-form-detection branch June 29, 2026 01:24