Skip to content

fix: make repair tests portable on Windows#376

Open
brokemac79 wants to merge 3 commits into
openclaw:mainfrom
brokemac79:codex/windows-test-portability
Open

fix: make repair tests portable on Windows#376
brokemac79 wants to merge 3 commits into
openclaw:mainfrom
brokemac79:codex/windows-test-portability

Conversation

@brokemac79

@brokemac79 brokemac79 commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

What Problem This Solves

ClawSweeper's repair validation path and several test fixtures assumed POSIX command lookup, LF-only source reads, executable fake gh shims, and /-only path containment. That made Windows local validation fail even when the runtime behavior being tested was otherwise portable.

Why This Change Was Made

This updates repair command execution and GitHub CLI helpers to support explicit tool shims on Windows, makes .cmd/.bat validation commands run through cmd.exe with double-escaped batch arguments, fixes state-root publish containment with platform-native path checks, and normalizes source-fixture reads where assertions inspect tracked files. The test fixtures now use explicit Node-backed fake tools instead of depending on extensionless executables being discoverable through PATH.

The follow-up quoting hardening replaces regex-based Windows argument quoting with an explicit scanner that doubles trailing backslashes and backslashes before embedded quotes before applying the existing cmd.exe metacharacter escaping.

User Impact

Maintainers can run the ClawSweeper repair and unit suites from Windows without patching local tool shims or line endings. The state publish path also no longer rejects valid state-root paths on Windows, and Windows batch validation launchers preserve spaces, metacharacters, embedded quotes, and trailing backslashes in arguments.

Evidence

Head proofed: 3f76fb935334e82131ce043c360bcca9903895f9.

Exact-head note: the top commit is the session-only empty ClawSweeper re-review trigger. git diff --exit-code HEAD~1 HEAD --stat produced no tree diff, so the proofed code tree is the PR head tree. The code fix commit is c9b398bd2071b958bf8d8d2f48006f2c2d5bdb0c.

  • pnpm install --frozen-lockfile passed earlier on this branch; dependencies did not change in the follow-up.
  • pnpm exec oxfmt --write src/repair/command-runner.ts test/repair/command-runner.test.ts passed.
  • node --test test\repair\command-runner.test.ts passed on Windows after the quoting hardening: 4 tests, 4 pass.
  • A real .cmd launcher probe round-tripped arguments containing embedded quotes, trailing backslashes, double trailing backslashes, spaces plus trailing backslashes, ^, %PATH%, and ] with equal: true.
  • pnpm run check passed after the quoting hardening, including check:active-surface, check:limits, build:all, lint, test:unit, test:repair, changed/full coverage, and format:check. Final summary: 1118 tests, 1117 pass, 1 skipped; coverage 100% lines, branches, and functions for the covered report.
  • codex review -c service_tier='fast' --uncommitted passed with no actionable correctness issues. The review also reran pnpm run build:repair, node --test test/repair/command-runner.test.ts, and a 116-case real .cmd argument probe with 0 mismatches.

Redacted Windows terminal excerpt for the focused batch launcher path:

> node --test test\repair\command-runner.test.ts
? runCommand handles validation output larger than Node's sync spawn default (69.8391ms)
? runCommand reports command timeouts with the rendered command (26.8365ms)
? runCommand double-escapes Windows batch launcher arguments (2.9253ms)
? runCommand preserves Windows batch launcher arguments at runtime (99.3071ms)
? tests 4
? suites 0
? pass 4
? fail 0
? cancelled 0
? skipped 0
? todo 0
? duration_ms 341.6374

Additional .cmd edge-case probe after the fix:

{
  "equal": true,
  "args": [
    "quote\"x",
    "trailing\\",
    "double\\\\",
    "space trailing\\",
    "a b\\",
    "a b\\\"",
    "caret^x",
    "percent%PATH%",
    "brack]"
  ]
}

Full check closeout:

pnpm run check
...
? tests 1118
? pass 1117
? fail 0
? skipped 1
? coverage: 100.00% lines, 100.00% branches, 100.00% functions
All matched files use the correct format.
PNPM_CHECK_EXIT=0

@clawsweeper

clawsweeper Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Codex review: needs maintainer review before merge. Reviewed June 27, 2026, 5:21 PM ET / 21:21 UTC.

Summary
The PR updates repair command execution, GitHub CLI shim routing, state publish path containment, and Windows-portable repair/unit test fixtures.

Reproducibility: yes. from source inspection and contributor Windows proof, but I did not run current main on Windows in this Linux review environment. Current main lacks the Windows command-resolution and batch-launch path, while the PR body shows the repaired .cmd path passing.

Review metrics: 3 noteworthy metrics.

  • Changed surface: 20 files, +508/-194. The PR is broader than fixture-only cleanup and changes runtime helpers plus tests.
  • Repair runtime helpers: 3 src/repair files changed. Command execution, GitHub CLI invocation, and state publishing are automation-sensitive paths.
  • Exact-head checks: pnpm check and Windows Codex launcher succeeded. The previous pending-check risk is resolved for the current PR head.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Maintainers should explicitly accept the cmd.exe validation-launcher risk before merge.

Risk before merge

  • [P1] The PR changes repair command execution, GitHub CLI invocation, and state publish path handling, so a regression could affect ClawSweeper automation rather than only Windows tests.
  • [P1] The .cmd/.bat path intentionally sends validation commands through cmd.exe with custom escaping; contributor proof covers important edge cases, but maintainers still need to accept that shell-boundary behavior before merge.

Maintainer options:

  1. Land with shell-boundary acceptance (recommended)
    The exact-head checks and Windows terminal proof are strong enough to merge if maintainers accept the cmd.exe batch-launcher strategy.
  2. Add more Windows edge coverage first
    Maintainers can ask for one more focused test pass around command paths with spaces and PATH/PATHEXT resolution if they want extra assurance before landing.
  3. Pause for command-launch strategy
    If cmd.exe wrapping is not the desired long-term validation path, keep the PR open until a maintainer chooses the permanent Windows command strategy.

Next step before merge

  • No automated repair is needed; the next action is maintainer merge review and acceptance of the automation-sensitive shell-launch behavior.

Security
Cleared: No concrete security or supply-chain defect was found; the cmd.exe launcher remains a maintainer-visible shell-boundary merge risk rather than a line-level vulnerability.

Review details

Best possible solution:

Merge after maintainer review accepts the cmd.exe validation-launcher boundary and the automation-sensitive helper changes.

Do we have a high-confidence way to reproduce the issue?

Yes from source inspection and contributor Windows proof, but I did not run current main on Windows in this Linux review environment. Current main lacks the Windows command-resolution and batch-launch path, while the PR body shows the repaired .cmd path passing.

Is this the best way to solve the issue?

Yes, with maintainer review: the patch follows the existing adjacent Windows launcher pattern, adds focused coverage, and keeps the fix scoped to repair portability. The remaining question is maintainer acceptance of the cmd.exe shell boundary, not a concrete code defect.

AGENTS.md: found and applied where relevant.

Codex review notes: model internal, reasoning high; reviewed against ae63b16d6c74.

Label changes

Label justifications:

  • P2: This is a normal-priority Windows repair automation portability fix with limited blast radius and no blocking correctness finding.
  • merge-risk: 🚨 automation: The diff changes repair command execution, GitHub CLI shim routing, and state publish handling used by repair automation.
  • merge-risk: 🚨 security-boundary: The diff introduces a cmd.exe batch-launch path for validation commands where quoting mistakes could turn arguments into shell syntax.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body includes redacted Windows terminal output for the focused command-runner test and an after-fix .cmd argument probe with equal true.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body includes redacted Windows terminal output for the focused command-runner test and an after-fix .cmd argument probe with equal true.
Evidence reviewed

What I checked:

  • Repository policy read: AGENTS.md was read fully; its conservative repair-lane and automation-safety guidance applies because this PR touches src/repair runtime helpers and repair tests. (AGENTS.md:1, ae63b16d6c74)
  • Current main lacks Windows command invocation handling: Current main's runCommand calls spawnSync directly, with no Windows PATHEXT resolution or .cmd/.bat wrapper path. (src/repair/command-runner.ts:18, ae63b16d6c74)
  • PR adds Windows batch launcher handling: The PR resolves Windows commands, wraps .cmd/.bat launchers through cmd.exe, uses windowsVerbatimArguments, and quotes/double-escapes validation arguments. (src/repair/command-runner.ts:49, 3f76fb935334)
  • PR adds focused Windows runtime coverage: The command-runner test includes a Windows-only .cmd launcher runtime check for spaces, shell metacharacters, quotes, and trailing backslashes. (test/repair/command-runner.test.ts:64, 3f76fb935334)
  • PR extends repair GH shim routing: The repair GitHub CLI helper now honors GH_BIN and GH_BIN_ARGS for sync, async, and spawn helper paths. (src/repair/github-cli.ts:143, 3f76fb935334)
  • PR fixes native state-root containment: The state publish path check switches from POSIX string-prefix containment to path.relative/isAbsolute containment and normalizes preserved paths to POSIX separators. (src/repair/git-publish.ts:209, 3f76fb935334)

Likely related people:

  • Dallin Romney: git blame shows the current repair command runner, state publish helper, and adjacent Windows launcher pattern largely date to the repository import commit. (role: introduced repair helper surface; confidence: high; commits: 4e5c4d47c83f; files: src/repair/command-runner.ts, src/repair/git-publish.ts, src/codex-spawn.ts)
  • Vincent Koc: Commit 1566ad6 recently changed retry behavior in src/repair/github-cli.ts adjacent to this PR's GH shim routing changes. (role: recent GitHub CLI helper contributor; confidence: medium; commits: 1566ad6a0a5b; files: src/repair/github-cli.ts)
  • brokemac79: Prior merged local-review work touched Windows Codex launcher handling and local command execution, making the author relevant beyond this proposal. (role: adjacent Windows/local review contributor; confidence: medium; commits: 79e768844291; files: src/codex-spawn.ts, src/command.ts, test/command.test.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. P2 Normal priority bug or improvement with limited blast radius. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. labels Jun 27, 2026
@brokemac79 brokemac79 force-pushed the codex/windows-test-portability branch from a183094 to ca6ff16 Compare June 27, 2026 18:58
@brokemac79

Copy link
Copy Markdown
Contributor Author

@clawsweeper re-review

@clawsweeper

clawsweeper Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

@clawsweeper clawsweeper Bot added proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🧂 unranked krab Not merge-ready due to missing proof or serious correctness/safety concerns. status: 📣 needs proof The PR needs real behavior proof before ClawSweeper can clear the contributor ask. labels Jun 27, 2026
@brokemac79 brokemac79 force-pushed the codex/windows-test-portability branch from ca2f607 to 3f76fb9 Compare June 27, 2026 19:38
@brokemac79

Copy link
Copy Markdown
Contributor Author

@clawsweeper review

@clawsweeper

clawsweeper Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

🦞🧹
ClawSweeper re-review requested.

I asked ClawSweeper to review this item again.
Action: item re-review queued (workflow sweep.yml, event repository_dispatch).
Result: the existing ClawSweeper review comment will be edited in place when the review finishes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

merge-risk: 🚨 automation 🚨 Merging this PR could break CI, automerge, proof capture, label sync, or automation. merge-risk: 🚨 security-boundary 🚨 Merging this PR could weaken sandboxing, authorization, credentials, or sensitive data. P2 Normal priority bug or improvement with limited blast radius. proof: sufficient Contributor real behavior proof is sufficient. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant