php-transformer: detect div-based pseudo-forms as materializable forms #339
Merged
Extend the issue #315 form-detection path to non-<form> containers. Some signup/contact widgets pair data-entry controls with a submit-like control inside a plain <div> and never wrap them in a <form>; the existing detection keys off the <form> element, so these flatten into a paragraph plus a dead button and the form is lost.

Recognize a div-based pseudo-form structurally: the tightest non-<form> container that pairs at least one data-entry control (a non-search input, <select>, or <textarea>) with a submit-like control (a button/input whose type or text/class/id/name/aria carries submit/subscribe/sign-up/send semantics), where no real <form> owns the subtree. Such a container emits the SAME html_form_fallback finding a real <form> produces (shared via a new formFallbackFinding helper), so the downstream materializer handles it identically. Reuses the #315 control-detection helpers (formControlElements, isDataEntryControl) rather than duplicating them.

Conservative bounds: a lone search box, a stray input with no submit control, and ordinary text/link containers never qualify; tightest-container scoping keeps wrappers from swallowing nested pseudo-forms and avoids double emission when a real <form> is present.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
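The structural test in the commit message above can be sketched as a predicate over one container. This is a minimal sketch in Python for brevity, not the project's PHP code. The `El` element model, all function names, the exact set of non-data-entry input types, and the use of `aria-label` as the "aria" attribute are assumptions made for illustration. Tightest-container scoping is left out here. The sketch only checks whether a single container pairs the two kinds of control with no `<form>` above or below it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical minimal element model; the real transformer walks a PHP DOM.
@dataclass
class El:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


# Submit keywords named in the PR ("sign-up" from the commit message).
SUBMIT_WORDS = ("submit", "subscribe", "sign up", "sign-up", "signup", "send")
# Assumption: input types not treated as data entry (search is excluded per the PR).
NON_DATA_ENTRY_TYPES = {"search", "submit", "image", "button", "reset", "hidden"}


def descendants(el: El):
    """Yield every element below el in document order."""
    for child in el.children:
        yield child
        yield from descendants(child)


def is_data_entry_control(el: El) -> bool:
    """Non-search <input>, <select>, or <textarea> (analogue of isDataEntryControl)."""
    if el.tag in ("select", "textarea"):
        return True
    if el.tag == "input":
        return el.attrs.get("type", "text").lower() not in NON_DATA_ENTRY_TYPES
    return False


def is_submit_like(el: El) -> bool:
    """Button/input whose type, or text/value/class/id/name/aria, signals submission."""
    if el.tag == "button":
        # A <button> without a type attribute defaults to type="submit".
        if el.attrs.get("type", "submit").lower() == "submit":
            return True
    elif el.tag == "input":
        kind = el.attrs.get("type", "text").lower()
        if kind in ("submit", "image"):
            return True
        if kind != "button":
            # Assumption: keyword matching applies only to button-like inputs,
            # so an email field named "signup-email" is not a submit control.
            return False
    else:
        return False
    parts = [el.text] + [el.attrs.get(k, "") for k in ("value", "class", "id", "name", "aria-label")]
    haystack = " ".join(parts).lower()
    return any(word in haystack for word in SUBMIT_WORDS)


def is_pseudo_form_candidate(container: El, inside_form: bool = False) -> bool:
    """True if a non-<form> container pairs data entry with a submit-like control
    and no real <form> sits above it (inside_form) or below it."""
    if container.tag == "form" or inside_form:
        return False
    below = list(descendants(container))
    if any(d.tag == "form" for d in below):
        return False
    return any(is_data_entry_control(d) for d in below) and any(is_submit_like(d) for d in below)
```

Under this sketch, `<div><input type="email"><button>Subscribe</button></div>` qualifies. The same `<div>` with `type="search"` on the input does not, because nothing in it counts as data entry.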
Summary
Extends the issue #315 form-detection path in `HtmlTransformer` to div-based pseudo-forms: signup/contact widgets that pair input controls with a submit button inside a plain `<div>` and never wrap them in a `<form>`.

The bug

Take a newsletter signup built without a `<form>`. The merged form detection (#315, `formRequiresRuntimePreservation`/`formHasDataEntryControls`, emitting `html_form_fallback` for SSI) keys off the `<form>` element, so a `<div>`-based form is missed entirely. It imports as a plain paragraph (`Email address: you@example.com (required)`) plus a dead Subscribe button, and the form is lost.

The detection signal (generic, structural)
A div-based pseudo-form is the tightest non-`<form>` container that:

- contains at least one data-entry control: an `<input>`, `<select>`, or `<textarea>` (reuses #315's `isDataEntryControl`; search inputs are excluded because they have dedicated standalone-search handling), AND
- contains a submit-like control: a `<button>` or `<input>` whose type is `submit`/`image`, or whose text/value/class/id/name/aria carries `submit`/`subscribe`/`sign up`/`signup`/`send` semantics (a plain `<button>` defaults to `type="submit"` and qualifies directly), AND
- is not owned by a real `<form>` (no `<form>` ancestor or descendant).

The signal is purely structural: container + data-entry control + submit-like control. No fixture ids/classes/names (`newsletterForm` etc.) are referenced.

How it reuses #315
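One way to picture the shared `formFallbackFinding()` helper is a single builder that both paths call, so a `<div>` pseudo-form and a real `<form>` cannot drift apart in shape. This Python sketch is hypothetical throughout. The dictionary keys, the control fields, the `classification` value, and the 2,000-character HTML bound are placeholders, not the real finding schema. `form_control_elements` stands in for #315's `formControlElements`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical minimal element model; the real transformer walks a PHP DOM.
@dataclass
class El:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


CONTROL_TAGS = ("input", "select", "textarea", "button")
VOID_TAGS = {"input"}
MAX_HTML = 2000  # hypothetical bound on the embedded HTML


def descendants(el: El):
    for child in el.children:
        yield child
        yield from descendants(child)


def form_control_elements(container: El) -> list[El]:
    """Stand-in for formControlElements: every control below the container."""
    return [d for d in descendants(container) if d.tag in CONTROL_TAGS]


def outer_html(el: El) -> str:
    """Tiny serializer so the sketch can show a bounded HTML field."""
    attrs = "".join(f' {k}="{v}"' for k, v in el.attrs.items())
    if el.tag in VOID_TAGS:
        return f"<{el.tag}{attrs}>"
    inner = el.text + "".join(outer_html(c) for c in el.children)
    return f"<{el.tag}{attrs}>{inner}</{el.tag}>"


def form_fallback_finding(container: El) -> dict:
    """Single builder shared by the <form> path and the pseudo-form path."""
    default_type = {"button": "submit", "input": "text"}
    controls = [
        {"tag": c.tag, "type": c.attrs.get("type", default_type.get(c.tag)), "name": c.attrs.get("name")}
        for c in form_control_elements(container)
    ]
    return {
        "kind": "html_form_fallback",
        "tag": container.tag,
        "form": {"action": container.attrs.get("action"), "method": container.attrs.get("method", "get")},
        "controls": controls,
        "classification": "data_entry",  # hypothetical label
        "html": outer_html(container)[:MAX_HTML],
    }
```

Both paths go through one function, so a test can compare the key sets (or the `controls` lists) of a `<form>` finding and a `<div>` finding to guard against drift.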
- The `html_form_fallback` finding is now built by a shared `formFallbackFinding()` helper used by both the real `<form>` path and the new pseudo-form path, so the emitted finding (controls, form metadata, classification, bounded HTML) is byte-for-byte the same shape #315 produces. SSI's existing `interactive_form` materializer handles a `<div>` pseudo-form identically to a real form.
- Control detection reuses #315's `formControlElements` and `isDataEntryControl`; no duplication.

Conservative bounds (no false positives, no double emit)
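The scoping rule (a real `<form>` owns its whole subtree, and only the tightest qualifying container is emitted) amounts to a post-order tree walk. In this Python sketch, `find_pseudo_forms` and the deliberately simplified `pairs_entry_and_submit` predicate are hypothetical. In the transformer, the predicate would be the full data-entry plus submit-like test described earlier.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


# Hypothetical minimal element model; the real transformer walks a PHP DOM.
@dataclass
class El:
    tag: str
    attrs: dict = field(default_factory=dict)
    text: str = ""
    children: list = field(default_factory=list)


def descendants(el: El):
    for child in el.children:
        yield child
        yield from descendants(child)


def pairs_entry_and_submit(el: El) -> bool:
    """Simplified stand-in: any input/select/textarea plus any <button> below el."""
    tags = {d.tag for d in descendants(el)}
    return bool(tags & {"input", "select", "textarea"}) and "button" in tags


def find_pseudo_forms(root: El, qualifies: Callable[[El], bool]) -> list[El]:
    """Return the tightest qualifying non-<form> containers, in document order."""
    found: list[El] = []

    def visit(el: El) -> bool:
        # Returns True when this subtree is already claimed: it contains a real
        # <form> or an emitted container, so no ancestor may emit as well.
        if el.tag == "form":
            return True  # the real <form> path owns this subtree
        claimed = False
        for child in el.children:
            claimed = visit(child) or claimed  # visit every child, no short-circuit
        if not claimed and qualifies(el):
            found.append(el)
            return True
        return claimed

    visit(root)
    return found
```

Post-order matters here. Children are examined before their parent, so a wrapper around two signup widgets yields two findings rather than one, and a `<form>` anywhere below a container suppresses that container.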
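- A lone search box, a stray input with no submit control, and ordinary text/link containers never qualify.
- Tightest-container scoping keeps wrappers from swallowing nested pseudo-forms.
- A real `<form>` (as ancestor or descendant) always owns the subtree, so the finding fires exactly once and is never double-counted.

Before / after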
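| Input | Before | After |
|---|---|---|
| `<div><input type="email"><button>Subscribe</button></div>` | no finding (paragraph + dead button) | `html_form_fallback` finding (tag `div`, 2 controls) → materializes as a working form |
| `<form>…</form>` | `html_form_fallback` | `html_form_fallback` (unchanged) |
| `<div>` with only text/links | no finding | no finding (not a form) |
| `<div>` search box (`input[type=search]` + Search button) | no finding | no finding (not a form) |

Tests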
- `composer test:canonical` + `composer parity` captured green first (173 fixtures).
- New cases: `<div>` signup → `html_form_fallback`; `<div>` text/links → not a form; `<div>` search box → not a form; real `<form>` (in a wrapper) still emits exactly once.
- `composer test` green (177 parity fixtures, all contract/unit/packaging suites pass); `php -l` clean.

AI assistance