Skip to content

fix(ci): re-run only failed specs on Windows test retries (PER-9011)#2267

Merged
AkashBrowserStack merged 3 commits into
masterfrom
PER-9011_windows-spec-level-retry
Jun 8, 2026
Merged

fix(ci): re-run only failed specs on Windows test retries (PER-9011)#2267
AkashBrowserStack merged 3 commits into
masterfrom
PER-9011_windows-spec-level-retry

Conversation

@AkashBrowserStack

Copy link
Copy Markdown
Contributor

Problem

The Windows Test @percy/core check takes ~241 min, vs ~19 min for the same check on Linux (Linux even runs with coverage). This slows every PR's required Windows checks. Tracked in PER-9011.

Root cause

The 241 min is not one slow run — it's the same @percy/core suite run 4 times by the retry loop in .github/workflows/windows.yml:

Step Duration True outcome
Run tests ~62 min failed → retried
Retry (1/4) ~59 min failed → retried
Retry (2/4) ~59 min failed → retried
Retry (3/4) ~59 min passed (retry 4 skipped)

continue-on-error: true rewrites each failed step's conclusion to success, so the check looks green even though 3 of 4 attempts failed. Only ~1 spec out of 1078 flakes per run (browser/local-server timing on Windows) — and it's a different spec each time (e.g. Discovery > captures favicon when the server provides one, ... promise only for sync snapshot). Re-running the entire ~60-min suite to recover one flaky spec is the real waste.

Fix: make retries cheap instead of fighting every flaky spec

  • scripts/test.js — the Jasmine node runner records failed spec names to PERCY_NODE_FAILURES_FILE; when run with PERCY_ONLY_FAILED_SPECS=1 it uses a specFilter to re-run only those specs. A filtered run reports incomplete (the rest are intentionally skipped), so success there means the targeted specs passed; full runs keep Jasmine's strict status. No behavior change when the env vars are unset (Linux/local/coverage all unaffected).
  • .github/workflows/windows.ymlretry0 runs the full suite once; retries 1-4 set PERCY_ONLY_FAILED_SPECS=1 so they re-run only the spec(s) that just failed (seconds, not ~60 min).
  • Honest check — a Flag flaky tests step emits a ::warning:: whenever a retry recovered a flake, so "green via retry" stays visible.

Verification (local, against the real @percy/core suite)

  • Retry re-runs only the failed subset — proven on real failures: 28 specs in 12s vs the full 1078 specs in 1304s (~108× cheaper retry).
  • Retry of passing spec → exit 0; retry of failing spec → exit 1 with the name recorded for the next retry.
  • Caught and fixed a real bug along the way: filtered runs report incomplete, so the exit logic now treats "targeted specs passed" as success (otherwise every retry would have looked like a failure).

Expected impact

~241 min → ~62 min for @percy/core on Windows (one full pass + cheap retries), full coverage preserved.

The real Windows wall-clock is validated by this PR's Windows CI run. The ~3× single-pass gap (Windows ~60 vs Linux ~19 min) is intentionally out of scope here — it's a separate per-run-speedup follow-up.

Draft until Windows CI confirms the timing.


🤖 Generated with Claude Code

AkashBrowserStack and others added 2 commits June 5, 2026 15:55
Windows @percy/core CI took ~241 min vs ~19 min on Linux. The cause was not
a single slow run: ~1 of 1078 node specs flakes per run on Windows
(browser/local-server timing), and windows.yml retried the entire ~60-min
suite up to 4x, with continue-on-error masking the failures as green.

Make retries cheap instead of fighting every flaky spec:
- scripts/test.js records failed spec names to PERCY_NODE_FAILURES_FILE and,
  when PERCY_ONLY_FAILED_SPECS=1, uses a jasmine specFilter to re-run only
  those specs. A filtered run reports "incomplete" (rest skipped), so success
  there means the targeted specs passed; full runs keep jasmine's strict
  status. No behavior change when the env vars are unset (Linux/local).
- windows.yml: retry0 runs the full suite; retries 1-4 re-run only the specs
  that just failed (seconds, not ~60 min).
- Add a ::warning:: when a retry recovers a flake so "green via retry" stays
  visible.

Verified locally against the real @percy/core suite: a 28-spec failure set
re-ran in 12s vs 1304s for the full 1078-spec suite. Expected on Windows:
~241 min -> ~62 min, full coverage kept.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cli-doctor's env-audit check flags any process.env key starting with PERCY_,
so injecting PERCY_NODE_FAILURES_FILE at the job level broke its
"no Percy vars are set" specs on Windows CI. Rename to CLI_TEST_FAILURES_FILE
and CLI_TEST_ONLY_FAILED (non-PERCY_) so the retry plumbing stays invisible to
Percy-env detection.

Verified locally: cli-doctor 508/508 pass with the renamed vars (both specs
fail when a PERCY_-prefixed var is injected, confirming the cause); spec-level
retry behavior unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AkashBrowserStack AkashBrowserStack marked this pull request as ready for review June 5, 2026 17:07
@AkashBrowserStack AkashBrowserStack requested a review from a team as a code owner June 5, 2026 17:07
# specs that flaked instead of the whole ~60-min suite. See PER-9011.
# Name is deliberately non-PERCY_ so it doesn't trip env-audit tests.
env:
CLI_TEST_FAILURES_FILE: ${{ github.workspace }}/.cli-test-failures.json

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why only implement for windows?, lets do for linux as well will help in improvement overall

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call — done in 9c81399. The runner support in scripts/test.js was already platform-agnostic; only the wiring was Windows-only, so I've extended it to test.yml (Linux) too. Linux flakes on the same timing-sensitive specs and had no retry at all.

One Linux-specific nuance worth flagging: Linux runs test:coverage with a 100% coverage gate (.nycrc), and a retry that re-runs only a few specs can't reach 100%. So I structured it as:

  • retry0test:coverage (full suite + coverage gate, records any failed specs)
  • retries 1–4 → plain test on only the failed specs, without the coverage gate (a subset can't hit 100%)
  • if retry0 fails with no recorded spec failures (i.e. a genuine coverage drop or a crash, not a flaky spec), the runner now preserves that failure instead of masking it — so real coverage regressions still fail the build.

Verified locally before pushing: empty-failures retry → exit 1 (preserved); real-failure retry → runs only that spec → exit 0; and the coverage path correctly writes the failures file the retries consume. CI is re-running now.

Per review feedback: the spec-level retry was only wired into windows.yml, but
the runner support in scripts/test.js is platform-agnostic. Wire it into
test.yml (Linux) too — Linux flakes on the same timing-sensitive specs and has
no retry today.

Linux runs test:coverage with a 100% coverage gate (.nycrc), which needs care:
- retry0 runs `test:coverage` (full suite + coverage gate, records failed specs)
- retries 1-4 run plain `test` on ONLY the failed specs, WITHOUT coverage
  (a subset can't hit the 100% threshold)
- if retry0 fails with no recorded spec failures (a real coverage drop or a
  crash, not a flaky spec), the runner now preserves the failure instead of
  masking it

Verified locally: empty-failures retry -> exit 1 (preserved); real-failure
retry -> runs only that spec, exit 0; the coverage path writes the failures
file that retries consume.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AkashBrowserStack AkashBrowserStack merged commit 02acd47 into master Jun 8, 2026
45 checks passed
@AkashBrowserStack AkashBrowserStack deleted the PER-9011_windows-spec-level-retry branch June 8, 2026 10:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants