test(console-ui): Playwright E2E — shell, hydration + authenticated user flows#465
anupsv wants to merge 7 commits into
Found 1 test failure on Blacksmith runners: Failure
…guardrails)
Adds a real-browser E2E layer (Playwright + Chromium) for the console UI. Unlike the jsdom Vitest unit tests, these boot the app and drive an actual browser, so they catch SSR-hydration and client-navigation regressions jsdom cannot.
- playwright.config.ts: hermetic mock-auth dev server (Privy unconfigured); no coordinator or secrets needed.
- e2e/navigation.spec.ts: every shell route loads with no React hydration error; a persisted verification-mode preference hydrates cleanly; sidebar links and the provider-dashboard tabs switch routes.
- vitest.config.ts: exclude e2e/ so Vitest doesn't pick up the Playwright specs.
- npm scripts: test:e2e / test:e2e:ui.
- CI: new "Console UI E2E (Playwright)" job (installs chromium, runs the suite).
Honest scope note: this hermetic mock-auth harness does NOT reproduce the production #463 hydration break (the verification-mode consumers need real authenticated trust data to render divergent DOM), so the suite is a broad hydration + navigation guardrail rather than proof of that specific fix.
Verified locally: 11/11 pass.
Co-authored-by: Cursor <cursoragent@cursor.com>
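The "no React hydration error" assertion can be sketched as a small message classifier. This is a minimal illustration, not the code in e2e/navigation.spec.ts: the helper names and the pattern list are assumptions based on common React error wording.

```typescript
// Hypothetical helper: decide whether a console or page-error message looks
// like a React hydration failure. The patterns below are assumptions drawn
// from typical React wording, not the list used by the real spec.
const HYDRATION_PATTERNS: RegExp[] = [
  /hydration failed/i,
  /error while hydrating/i,
  /does not match server-rendered html/i,
  /did not match\. server:/i,
];

function isHydrationError(message: string): boolean {
  return HYDRATION_PATTERNS.some((pattern) => pattern.test(message));
}

// Filter a batch of captured messages down to the hydration-related ones.
function collectHydrationErrors(messages: string[]): string[] {
  return messages.filter(isHydrationError);
}
```

In a Playwright spec, the messages would typically be gathered with `page.on("console", ...)` and `page.on("pageerror", ...)` before navigating, and the test would assert that `collectHydrationErrors(...)` is empty once the route has loaded.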
e398b04 to 538003c
Expand the browser E2E from shell/hydration guardrails to real authenticated flows:
- provider onboarding (empty fleet onboarding + "set up" nav; linked fleet renders a machine and unlocks the Setup/Earnings tabs)
- API-key creation (open form, submit, one-time secret reveal, list)
- chat send (streamed SSE assistant response)
Hermetic: an env-gated mock-auth hook (NEXT_PUBLIC_E2E_AUTH, set only by the Playwright dev server) returns a usable token+user, and all /api/* coordinator calls are route-mocked in e2e/fixtures.ts. No Privy tenant, coordinator, or secrets required; production builds leave the hook off.
Also raise assertion timeouts to absorb dev-server on-demand route compilation under parallel workers.
Co-authored-by: Cursor <cursoragent@cursor.com>
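A streamed SSE chat response is easy to mock once the body is built as plain text. The sketch below is illustrative only: `buildChatSse` and `parseChatSse` are hypothetical names, and the OpenAI-style chunk shape (`choices[0].delta.content`) is an assumption; the real `chatSse` fixture in e2e/fixtures.ts may use a different format.

```typescript
// Hypothetical SSE body builder for a mocked chat endpoint.
// Assumption: each event carries an OpenAI-style delta chunk.
function buildChatSse(tokens: string[]): string {
  const events = tokens.map((token) => {
    const chunk = { choices: [{ delta: { content: token } }] };
    // JSON.stringify escapes newlines, so each event stays on one line.
    return `data: ${JSON.stringify(chunk)}\n\n`;
  });
  // Terminal sentinel used by many streaming chat APIs (assumed here).
  return events.join("") + "data: [DONE]\n\n";
}

// Reassemble the assistant text the way a simple client would.
function parseChatSse(body: string): string {
  let text = "";
  for (const line of body.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length);
    if (payload === "[DONE]") break;
    text += JSON.parse(payload).choices[0].delta.content;
  }
  return text;
}
```

With Playwright, such a body could be served from a route handler via `route.fulfill({ status: 200, contentType: "text/event-stream", body })`, which keeps the test independent of any real coordinator.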
This PR improves E2E test infrastructure for the mock-auth path but introduces a new production attack surface via a client-side env-var gate that has the same shape as the existing T-020 misconfiguration risk.
Trust boundaries touched
Threat analysis
T-020: Mock auth active in production due to missing Privy config
The original The concern is two-fold:
The mitigating factor noted in the comment ("unset in every real build") is a process control, not a code control: there is no
Recommended hardening at
    const E2E_AUTH =
      process.env.NEXT_PUBLIC_E2E_AUTH === "1" &&
      process.env.NODE_ENV !== "production";
This makes the gate fail-closed in production builds regardless of what env vars CI accidentally passes, and costs nothing at runtime.
New attack surface not covered by an existing threat
Hardcoded credential string in a client-side bundle.
SEC-* findings resolved
None. This PR does not close any open SEC-* findings. SEC-026 (the tracking issue for T-020) remains open; this change makes the code path more capable when accidentally triggered, not less.
Summary of recommended changes
🔐 Threat model:
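The recommended gate can be expressed as a pure function so its fail-closed behaviour is checkable in isolation. This is a sketch only: `E2eEnv` and `isE2eAuthEnabled` are hypothetical names, and the real code reads `process.env` directly.

```typescript
// Hypothetical pure-function form of the recommended gate.
interface E2eEnv {
  NEXT_PUBLIC_E2E_AUTH?: string;
  NODE_ENV?: string;
}

function isE2eAuthEnabled(env: E2eEnv): boolean {
  // Both conditions must hold; a production NODE_ENV always wins.
  return env.NEXT_PUBLIC_E2E_AUTH === "1" && env.NODE_ENV !== "production";
}
```

One caveat worth weighing: `next build` and `next start` run with `NODE_ENV` set to "production", so if the E2E web server itself runs a production build, this exact condition would also switch the hook off for the test suite. A dedicated build-time flag might then be needed instead; that trade-off is not settled by the comment above.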
ethenotethan left a comment
Automated Code Review — Layr-Labs/d-inference#
Verdict: COMMENT
Security — ✅ No issues found
Performance — ✅ No issues found
Type_diligence — ✅ No issues found
Additive_complexity — ✅ No issues found
✅ All four passes clean. No issues found.
🤖 Automated review by Centaur · DAR-186
Adds authenticated-flow coverage on top of the existing suite:
- API-key management: edit (spend cap), disable, rotate (secret reveal),
revoke (confirm dialog), and the empty-list state — backed by a full
stateful CRUD mock (GET/POST/PATCH/DELETE/rotate).
- Error + retry paths: provider fleet load failure -> error state -> Retry
recovers; chat send failure -> inline error bubble -> Retry streams ok.
The E2E surfaced a real defect: the /models page hard-crashed (root error
boundary, "prices is not iterable") whenever the pricing payload lacked a
`prices` array. Harden buildPricingLookup to tolerate it, and correct the
test mock to the real { prices: [] } shape.
Switch the Playwright webServer from `next dev` to a production build + start.
Dev compiled routes on-demand and served SSR single-threaded, which raced the
heavy /models page under parallel workers; a prod build pre-compiles and serves
static/optimized output, so the suite is deterministic (72/72 under
--repeat-each=3). Also decouple the empty-fleet tests from the fixture default.
Co-authored-by: Cursor <cursoragent@cursor.com>
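A stateful CRUD mock of the kind described above can be as small as an in-memory map that route handlers delegate to. The following is a hedged sketch, not the store in e2e/fixtures.ts: the class name, the field names (`spendCap`, `disabled`), and the secret format are all assumptions.

```typescript
// Hypothetical in-memory API-key store that route mocks could delegate to.
interface ApiKey {
  id: string;
  name: string;
  spendCap: number | null;
  disabled: boolean;
}

class MockKeyStore {
  private keys = new Map<string, ApiKey>();
  private nextId = 1;
  private rotations = 0;

  // GET: list all keys.
  list(): ApiKey[] {
    return Array.from(this.keys.values());
  }

  // POST: create a key and return its secret once.
  create(name: string): { key: ApiKey; secret: string } {
    const id = `key_${this.nextId++}`;
    const key: ApiKey = { id, name, spendCap: null, disabled: false };
    this.keys.set(id, key);
    return { key, secret: `sk-test-${id}` };
  }

  // PATCH: edit the spend cap, name, or disabled flag.
  patch(id: string, changes: Partial<Omit<ApiKey, "id">>): ApiKey | undefined {
    const existing = this.keys.get(id);
    if (!existing) return undefined;
    const updated: ApiKey = { ...existing, ...changes };
    this.keys.set(id, updated);
    return updated;
  }

  // rotate: issue a fresh secret for an existing key.
  rotate(id: string): string | undefined {
    if (!this.keys.has(id)) return undefined;
    this.rotations += 1;
    return `sk-test-${id}-r${this.rotations}`;
  }

  // DELETE: revoke a key; returns false if it did not exist.
  remove(id: string): boolean {
    return this.keys.delete(id);
  }
}
```

Creating a fresh store per test keeps the cases isolated, so an edit in one test cannot leak into the empty-list assertion of another.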
…ng guards
Review follow-ups from the E2E expansion:
- earn/calc.ts had the same unguarded `pricing.prices.map(...)` crash the
prior commit fixed in /models — a malformed pricing payload (200 body
without `prices`) threw "reading 'map'" inside a fetch .then(), an
unhandled rejection that left the earnings calculator broken. Guard it.
- Strengthen both guards to Array.isArray (covers null/undefined AND any
non-array shape, matching the comments).
- Add regression tests (e2e/flows.spec.ts "page resilience") that seed a
malformed { } pricing payload and assert /models and /earn still render —
covering both failure modes (root error boundary for the render-path crash
on /models, unhandled-rejection capture for the async crash on /earn).
Verified red without the guards, green with them.
- Refresh the stale playwright.config timeout comment (no longer dev-server).
Co-authored-by: Cursor <cursoragent@cursor.com>
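The guard described above can be sketched as follows. This is an illustration under assumptions: the `PriceEntry` fields and the `Map` return type are hypothetical, and the real `buildPricingLookup` in src/app/models/page.tsx and src/app/earn/calc.ts may be shaped differently.

```typescript
// Hypothetical entry shape; the real pricing payload fields may differ.
interface PriceEntry {
  model: string;
  inputPrice: number;
}

// Build a model -> price lookup, tolerating any malformed payload.
function buildPricingLookup(payload: unknown): Map<string, PriceEntry> {
  const lookup = new Map<string, PriceEntry>();
  const prices = (payload as { prices?: unknown } | null | undefined)?.prices;
  // Array.isArray rejects null, undefined, and any non-array shape alike.
  if (!Array.isArray(prices)) return lookup;
  for (const entry of prices as PriceEntry[]) {
    lookup.set(entry.model, entry);
  }
  return lookup;
}
```

Returning an empty lookup instead of throwing lets both the render path and the fetch `.then()` path degrade to "no pricing available" rather than crashing the page.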
- billing: assert the balance renders from /api/payments/balance, then drive Buy Credits → Continue; the mocked Stripe checkout "redirects" back with the success flag so the full round-trip (checkout → success toast) is exercised.
- chat: seed two models and switch via the composer's model selector, asserting the active model flips.
Adds default payments route mocks (balance/usage/stripe-status/checkout) and a seedModels() helper to the fixture.
Co-authored-by: Cursor <cursoragent@cursor.com>
import {
  test,
  expect,
  makeProvider,
  makeProvidersResponse,
  seedProviders,
  seedKeys,
  seedModels,
  chatSse,
  CHAT_REPLY,
} from "./fixtures";
…omposer buttons
- chat: hold the chat request open, assert the composer shows Stop, click it,
and assert generation cancels back to the idle Send state with no error bubble.
- invite: redeem a valid code (asserts the credited-success message) and an
invalid code (asserts the error toast), mocking /api/invite/redeem.
The Send/Stop composer buttons were icon-only with no accessible name; add
aria-labels ("Send message" / "Stop generating") — an a11y fix that also gives
the tests a stable, semantic selector.
Co-authored-by: Cursor <cursoragent@cursor.com>
…y case
The existing "page resilience" tests seed a missing `prices` payload, which a
weaker `?? []` guard would also survive — so they didn't pin the Array.isArray
strengthening (review note). Parametrize over a second malformed shape,
`{ prices: {} }` (non-array, truthy): with `?? []` it still throws
("is not iterable" on /models, "map is not a function" on /earn), with
Array.isArray it renders cleanly. Verified red against `?? []`, green with the
guards. Now 33 tests.
Co-authored-by: Cursor <cursoragent@cursor.com>
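The difference between the two guards is easy to reproduce outside the browser. The snippet below is a minimal sketch with hypothetical helper names (`weakGuard`, `strictGuard`, `survives`); it only mirrors the shape of the problem, not the real page code.

```typescript
// Weaker guard: only protects against null/undefined `prices`.
const weakGuard = (payload: any): string[] =>
  (payload?.prices ?? []).map((p: any) => p.model);

// Stronger guard: rejects every non-array shape.
const strictGuard = (payload: any): string[] =>
  (Array.isArray(payload?.prices) ? payload.prices : []).map((p: any) => p.model);

// Report whether a guard handles a payload without throwing.
function survives(guard: (payload: any) => string[], payload: unknown): boolean {
  try {
    guard(payload);
    return true;
  } catch {
    return false;
  }
}

// The two malformed shapes the regression tests parametrize over.
const MALFORMED_PAYLOADS: unknown[] = [{}, { prices: {} }];
```

Here `survives(weakGuard, {})` holds, but `survives(weakGuard, { prices: {} })` does not, because a truthy non-array value passes `??` and then has no `map` method. `strictGuard` survives both, which is why the second shape is the one that pins the `Array.isArray` strengthening.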

Summary
Adds the real-browser test layer for console-ui. It didn't exist before (no Playwright/Cypress/Puppeteer; only jsdom Vitest unit tests, which can't see SSR hydration or App Router navigation, which is exactly why the nav/hydration regressions rotted silently).
This boots the app and drives Chromium, asserting both the shell and real authenticated user flows (incl. error/retry paths) actually work in a browser.
Hermetic auth + API mocking
- NEXT_PUBLIC_E2E_AUTH=1 (set only by the Playwright server), so mock-auth returns a usable token+user.
- /api/*) is route-mocked with seeded data in e2e/fixtures.ts, including a stateful API-key store (GET/POST/PATCH/DELETE/rotate) and SSE chat.
- NEXT_PUBLIC_E2E_AUTH unset → behaviour identical to today).
Coverage (33 tests)
e2e/navigation.spec.ts (shell + hydration):
- /, /stats, /providers, /providers/setup, /providers/earnings, /earn, /api-console, /models, /billing, /settings) loads with no React hydration error;
- technical verification-mode preference;
e2e/flows.spec.ts (authenticated user flows):
- /api/payments/balance, and Buy Credits → Continue completes a (mocked) Stripe checkout round-trip through to the success toast;
Real defect found by the E2E
A pricing payload lacking a prices array crashed two pages from one endpoint's shape: /models hard-crashed the root error boundary (prices is not iterable, during render), and /earn threw the same way inside a fetch .then() (an unhandled rejection that broke the earnings calculator). Hardened both buildPricingLookups with Array.isArray (src/app/models/page.tsx, src/app/earn/calc.ts), and added "page resilience" regression tests (each page × two malformed shapes: missing prices and a non-array prices), verified red without the guards, green with them, covering both the render-boundary and async-rejection failure modes.
Determinism
The Playwright webServer runs a production build + start (not next dev). Dev compiled routes on-demand and served SSR single-threaded, which raced the heavy /models page under parallel workers; a prod build pre-compiles + serves static/optimized output. Result: 72/72 under --repeat-each=3, zero flakes.
Plumbing
- vitest.config.ts excludes e2e/; npm scripts test:e2e / test:e2e:ui.
- Console UI E2E (Playwright) job (installs Chromium, builds, runs the suite on PRs).
Test plan
- npm run test:e2e: 33/33 in Chromium; stable under --repeat-each=3.
- npx eslint src/ clean; npm run build passes (hook gated off in prod).
- npx vitest run: e2e specs correctly excluded.
Made with Cursor