fix(ci): re-run only failed specs on Windows test retries (PER-9011) by AkashBrowserStack · Pull Request #2267 · percy/cli

AkashBrowserStack · 2026-06-05T10:26:26Z

Problem

The Windows Test @percy/core check takes ~241 min, vs ~19 min for the same check on Linux (Linux even runs with coverage). This slows every PR's required Windows checks. Tracked in PER-9011.

Root cause

The 241 min is not one slow run — it's the same @percy/core suite run 4 times by the retry loop in .github/workflows/windows.yml:

Step	Duration	True outcome
`Run tests`	~62 min	failed → retried
`Retry (1/4)`	~59 min	failed → retried
`Retry (2/4)`	~59 min	failed → retried
`Retry (3/4)`	~59 min	passed (retry 4 skipped)

continue-on-error: true rewrites each failed step's conclusion to success, so the check looks green even though 3 of 4 attempts failed. Only ~1 spec out of 1078 flakes per run (browser/local-server timing on Windows) — and it's a different spec each time (e.g. Discovery > captures favicon when the server provides one, ... promise only for sync snapshot). Re-running the entire ~60-min suite to recover one flaky spec is the real waste.

Fix: make retries cheap instead of fighting every flaky spec

scripts/test.js — the Jasmine node runner records failed spec names to PERCY_NODE_FAILURES_FILE; when run with PERCY_ONLY_FAILED_SPECS=1 it uses a specFilter to re-run only those specs. A filtered run reports incomplete (the rest are intentionally skipped), so success there means the targeted specs passed; full runs keep Jasmine's strict status. No behavior change when the env vars are unset (Linux/local/coverage all unaffected).
.github/workflows/windows.yml — retry0 runs the full suite once; retries 1-4 set PERCY_ONLY_FAILED_SPECS=1 so they re-run only the spec(s) that just failed (seconds, not ~60 min).
Honest check — a Flag flaky tests step emits a ::warning:: whenever a retry recovered a flake, so "green via retry" stays visible.

Verification (local, against the real `@percy/core` suite)

Retry re-runs only the failed subset — proven on real failures: 28 specs in 12s vs the full 1078 specs in 1304s (~108× cheaper retry).
Retry of passing spec → exit 0; retry of failing spec → exit 1 with the name recorded for the next retry.
Caught and fixed a real bug along the way: filtered runs report incomplete, so the exit logic now treats "targeted specs passed" as success (otherwise every retry would have looked like a failure).

Expected impact

~241 min → ~62 min for @percy/core on Windows (one full pass + cheap retries), full coverage preserved.

The real Windows wall-clock is validated by this PR's Windows CI run. The ~3× single-pass gap (Windows ~60 vs Linux ~19 min) is intentionally out of scope here — it's a separate per-run-speedup follow-up.

Draft until Windows CI confirms the timing.

🤖 Generated with Claude Code

Windows @percy/core CI took ~241 min vs ~19 min on Linux. The cause was not a single slow run: ~1 of 1078 node specs flakes per run on Windows (browser/local-server timing), and windows.yml retried the entire ~60-min suite up to 4x, with continue-on-error masking the failures as green. Make retries cheap instead of fighting every flaky spec: - scripts/test.js records failed spec names to PERCY_NODE_FAILURES_FILE and, when PERCY_ONLY_FAILED_SPECS=1, uses a jasmine specFilter to re-run only those specs. A filtered run reports "incomplete" (rest skipped), so success there means the targeted specs passed; full runs keep jasmine's strict status. No behavior change when the env vars are unset (Linux/local). - windows.yml: retry0 runs the full suite; retries 1-4 re-run only the specs that just failed (seconds, not ~60 min). - Add a ::warning:: when a retry recovers a flake so "green via retry" stays visible. Verified locally against the real @percy/core suite: a 28-spec failure set re-ran in 12s vs 1304s for the full 1078-spec suite. Expected on Windows: ~241 min -> ~62 min, full coverage kept. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cli-doctor's env-audit check flags any process.env key starting with PERCY_, so injecting PERCY_NODE_FAILURES_FILE at the job level broke its "no Percy vars are set" specs on Windows CI. Rename to CLI_TEST_FAILURES_FILE and CLI_TEST_ONLY_FAILED (non-PERCY_) so the retry plumbing stays invisible to Percy-env detection. Verified locally: cli-doctor 508/508 pass with the renamed vars (both specs fail when a PERCY_-prefixed var is injected, confirming the cause); spec-level retry behavior unchanged. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

rishigupta1599 · 2026-06-08T06:42:58Z

+    # specs that flaked instead of the whole ~60-min suite. See PER-9011.
+    # Name is deliberately non-PERCY_ so it doesn't trip env-audit tests.
+    env:
+      CLI_TEST_FAILURES_FILE: ${{ github.workspace }}/.cli-test-failures.json


why only implement for windows?, lets do for linux as well will help in improvement overall

Good call — done in 9c81399. The runner support in scripts/test.js was already platform-agnostic; only the wiring was Windows-only, so I've extended it to test.yml (Linux) too. Linux flakes on the same timing-sensitive specs and had no retry at all.

One Linux-specific nuance worth flagging: Linux runs test:coverage with a 100% coverage gate (.nycrc), and a retry that re-runs only a few specs can't reach 100%. So I structured it as:

retry0 → test:coverage (full suite + coverage gate, records any failed specs)

retries 1–4 → plain test on only the failed specs, without the coverage gate (a subset can't hit 100%)

if retry0 fails with no recorded spec failures (i.e. a genuine coverage drop or a crash, not a flaky spec), the runner now preserves that failure instead of masking it — so real coverage regressions still fail the build.

Verified locally before pushing: empty-failures retry → exit 1 (preserved); real-failure retry → runs only that spec → exit 0; and the coverage path correctly writes the failures file the retries consume. CI is re-running now.

Per review feedback: the spec-level retry was only wired into windows.yml, but the runner support in scripts/test.js is platform-agnostic. Wire it into test.yml (Linux) too — Linux flakes on the same timing-sensitive specs and has no retry today. Linux runs test:coverage with a 100% coverage gate (.nycrc), which needs care: - retry0 runs `test:coverage` (full suite + coverage gate, records failed specs) - retries 1-4 run plain `test` on ONLY the failed specs, WITHOUT coverage (a subset can't hit the 100% threshold) - if retry0 fails with no recorded spec failures (a real coverage drop or a crash, not a flaky spec), the runner now preserves the failure instead of masking it Verified locally: empty-failures retry -> exit 1 (preserved); real-failure retry -> runs only that spec, exit 0; the coverage path writes the failures file that retries consume. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AkashBrowserStack and others added 2 commits June 5, 2026 15:55

AkashBrowserStack marked this pull request as ready for review June 5, 2026 17:07

AkashBrowserStack requested a review from a team as a code owner June 5, 2026 17:07

AkashBrowserStack requested review from ninadbstack and rishigupta1599 June 5, 2026 17:07

rishigupta1599 reviewed Jun 8, 2026

View reviewed changes

rishigupta1599 approved these changes Jun 8, 2026

View reviewed changes

AkashBrowserStack merged commit 02acd47 into master Jun 8, 2026
45 checks passed

AkashBrowserStack deleted the PER-9011_windows-spec-level-retry branch June 8, 2026 10:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(ci): re-run only failed specs on Windows test retries (PER-9011)#2267

fix(ci): re-run only failed specs on Windows test retries (PER-9011)#2267
AkashBrowserStack merged 3 commits into
masterfrom
PER-9011_windows-spec-level-retry

AkashBrowserStack commented Jun 5, 2026

Uh oh!

rishigupta1599 Jun 8, 2026

Uh oh!

AkashBrowserStack Jun 8, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

AkashBrowserStack commented Jun 5, 2026

Problem

Root cause

Fix: make retries cheap instead of fighting every flaky spec

Verification (local, against the real @percy/core suite)

Expected impact

Uh oh!

rishigupta1599 Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

AkashBrowserStack Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Verification (local, against the real `@percy/core` suite)