test(console-ui): Playwright E2E — shell, hydration + authenticated user flows#465

Open
anupsv wants to merge 7 commits into master from test/console-ui-playwright
@anupsv anupsv commented Jun 24, 2026

Contributor

Summary

Adds the real-browser test layer for console-ui, which did not exist before: there was no Playwright, Cypress, or Puppeteer, only jsdom Vitest unit tests. jsdom cannot see SSR hydration or App Router navigation, which is exactly why the nav/hydration regressions rotted silently.

This boots the app and drives Chromium, asserting both the shell and real authenticated user flows (incl. error/retry paths) actually work in a browser.

Hermetic auth + API mocking

  • The app runs with Privy unconfigured + NEXT_PUBLIC_E2E_AUTH=1 (set only by the Playwright server), so mock-auth returns a usable token+user.
  • Every coordinator call (/api/*) is route-mocked with seeded data in e2e/fixtures.ts, including a stateful API-key store (GET/POST/PATCH/DELETE/rotate) and SSE chat.
  • No Privy tenant, coordinator, or secrets required. Production builds leave the hook off (NEXT_PUBLIC_E2E_AUTH unset → behaviour identical to today).
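A minimal sketch of what a stateful API-key store behind the route mocks might look like. The `/api/keys` path, the `ApiKey` fields, and the `ApiKeyStore` name are assumptions for illustration; the actual contents of e2e/fixtures.ts are not shown in this PR.

```typescript
// Hypothetical shapes: the real fixture's routes and field names may differ.
type ApiKey = { id: string; name: string; disabled: boolean; spendCap: number | null };
type MockResponse = { status: number; body: unknown };

class ApiKeyStore {
  private keys = new Map<string, ApiKey>();
  private nextId = 1;

  // Dispatch on method + path, the way a single route-mock handler would.
  handle(method: string, path: string, payload: Partial<ApiKey> = {}): MockResponse {
    const match = /^\/api\/keys(?:\/([^/]+))?(\/rotate)?$/.exec(path);
    if (!match) return { status: 404, body: { error: "not found" } };
    const id = match[1];
    const rotate = match[2];

    if (!id) {
      if (method === "GET") return { status: 200, body: Array.from(this.keys.values()) };
      if (method === "POST") {
        const key: ApiKey = {
          id: `key_${this.nextId++}`,
          name: payload.name ?? "unnamed",
          disabled: false,
          spendCap: null,
        };
        this.keys.set(key.id, key);
        // The secret is returned only here and on rotate (one-time reveal).
        return { status: 201, body: { ...key, secret: `sk_test_${key.id}` } };
      }
      return { status: 405, body: { error: "method not allowed" } };
    }

    const existing = this.keys.get(id);
    if (!existing) return { status: 404, body: { error: "unknown key" } };

    if (rotate && method === "POST") {
      return { status: 200, body: { ...existing, secret: `sk_test_${id}_rotated` } };
    }
    if (!rotate && method === "PATCH") {
      const updated: ApiKey = { ...existing, ...payload, id };
      this.keys.set(id, updated);
      return { status: 200, body: updated };
    }
    if (!rotate && method === "DELETE") {
      this.keys.delete(id);
      return { status: 204, body: null };
    }
    return { status: 405, body: { error: "method not allowed" } };
  }
}
```

In a Playwright fixture, a `page.route` handler could read the method and path from `route.request()` and pass the result of `handle` to `route.fulfill`. Keeping the store as plain data means each test can start from a fresh instance.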

Coverage (33 tests)

e2e/navigation.spec.ts — shell + hydration:

e2e/flows.spec.ts — authenticated user flows:

  • provider onboarding — empty fleet → onboarding + "Set up a provider" nav; a linked machine renders and unlocks the Setup/Earnings tabs;
  • API-key management — create, edit (spend cap), disable, rotate (one-time secret reveal), revoke (confirm dialog), and the empty-list state;
  • chat — typing + Enter renders the streamed SSE assistant response; switching the model via the composer selector flips the active model; Stop cancels an in-flight generation back to the idle state;
  • billing — the balance renders from /api/payments/balance, and Buy Credits → Continue completes a (mocked) Stripe checkout round-trip through to the success toast;
  • invite codes — redeeming a valid code shows the credited confirmation; an invalid code surfaces an error;
  • error + retry — provider fleet load failure → error state → Retry recovers; chat send failure → inline error bubble → Retry streams successfully.
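The streamed chat response in the flows above depends on a mock SSE body. The following sketch shows one way such a body could be built and read back; the OpenAI-style chunk shape and the `[DONE]` sentinel are assumptions, as are the function names, since the real `chatSse` helper is not shown in this PR.

```typescript
// Assumption: OpenAI-style chat-completion chunks, terminated by a [DONE] sentinel.
type ChatChunk = { choices: { delta: { content?: string } }[] };

// Serialize each token as one SSE event ("data: <json>" followed by a blank line).
function buildSseBody(tokens: string[]): string {
  const events = tokens.map((token) => {
    const chunk: ChatChunk = { choices: [{ delta: { content: token } }] };
    return `data: ${JSON.stringify(chunk)}\n\n`;
  });
  return events.join("") + "data: [DONE]\n\n";
}

// Reassemble the assistant text the way a streaming client would.
function parseSseBody(body: string): string {
  let text = "";
  for (const event of body.split("\n\n")) {
    if (!event.startsWith("data: ")) continue;
    const data = event.slice("data: ".length);
    if (data === "[DONE]") break;
    const chunk = JSON.parse(data) as ChatChunk;
    text += chunk.choices[0]?.delta.content ?? "";
  }
  return text;
}
```

A route mock could return the built string with a `text/event-stream` content type, and the test would then assert that the joined tokens appear in the assistant bubble.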

Real defect found by the E2E

A pricing payload lacking a prices array crashed two pages from one endpoint's shape: /models hard-crashed the root error boundary (prices is not iterable, during render), and /earn threw the same way inside a fetch .then() (an unhandled rejection that broke the earnings calculator). Hardened both buildPricingLookups with Array.isArray (src/app/models/page.tsx, src/app/earn/calc.ts), and added "page resilience" regression tests (each page × two malformed shapes: missing prices and a non-array prices) — verified red without the guards, green with them, covering both the render-boundary and async-rejection failure modes.
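As a rough illustration of the guard described above, a lookup builder hardened with Array.isArray might look like this. The `PriceEntry` fields and the `Map` return type are hypothetical; only the `prices` key and the Array.isArray check come from the PR text.

```typescript
// Hypothetical entry shape: the real pricing fields are not shown in this PR.
type PriceEntry = { model: string; inputPrice: number; outputPrice: number };

function buildPricingLookup(pricing: unknown): Map<string, PriceEntry> {
  const prices = (pricing as { prices?: unknown } | null | undefined)?.prices;
  // Array.isArray rejects undefined, null, and truthy non-arrays such as {}.
  const entries: PriceEntry[] = Array.isArray(prices) ? prices : [];
  const lookup = new Map<string, PriceEntry>();
  for (const entry of entries) {
    lookup.set(entry.model, entry);
  }
  return lookup;
}
```

With this shape, a malformed payload degrades to an empty lookup instead of throwing during render or inside a promise callback.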

Determinism

The Playwright webServer runs a production build + start (not next dev). Dev compiled routes on-demand and served SSR single-threaded, which raced the heavy /models page under parallel workers; a prod build pre-compiles + serves static/optimized output. Result: 72/72 under --repeat-each=3, zero flakes.
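For orientation, a production-build `webServer` block in playwright.config.ts could look roughly like the sketch below. The port, timeout, and npm script names are assumptions; the PR states only that the server runs a production build plus start and that it alone sets NEXT_PUBLIC_E2E_AUTH.

```typescript
import { defineConfig, devices } from "@playwright/test";

export default defineConfig({
  testDir: "./e2e",
  use: { baseURL: "http://localhost:3000" }, // assumed port
  projects: [{ name: "chromium", use: { ...devices["Desktop Chrome"] } }],
  webServer: {
    // Build once, then serve the precompiled output instead of `next dev`.
    command: "npm run build && npm run start", // assumed script names
    url: "http://localhost:3000",
    reuseExistingServer: !process.env.CI,
    timeout: 180_000, // assumed; leaves room for the build step
    // Set only for the process Playwright spawns. NEXT_PUBLIC_* values are
    // inlined at build time, so the flag must be present during the build.
    env: { NEXT_PUBLIC_E2E_AUTH: "1" },
  },
});
```

Because the build and the start share one command, the environment passed through `webServer.env` applies to both steps.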

Plumbing

  • vitest.config.ts excludes e2e/; npm scripts test:e2e / test:e2e:ui.
  • CI: Console UI E2E (Playwright) job (installs Chromium, builds, runs the suite on PRs).

Test plan

  • npm run test:e2e — 33/33 in Chromium; stable under --repeat-each=3.
  • npx eslint src/ clean; npm run build passes (hook gated off in prod).
  • npx vitest run — e2e specs correctly excluded.

Made with Cursor

vercel Bot commented Jun 24, 2026


The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
d-inference | Ready | Preview | Jun 25, 2026 4:27am
d-inference-console-ui-dev | Ready | Preview | Jun 25, 2026 4:27am
d-inference-landing | Ready | Preview | Jun 25, 2026 4:27am

blacksmith-sh Bot commented Jun 24, 2026

Contributor

Found 1 test failure on Blacksmith runners:

Failure

Test: github.com/eigeninference/d-inference/e2e/TestIntegration_ConcurrentRequests


…guardrails)

Adds a real-browser E2E layer (Playwright + Chromium) for the console UI. Unlike
the jsdom Vitest unit tests, these boot the app and drive an actual browser, so
they catch SSR-hydration and client-navigation regressions jsdom cannot.

- playwright.config.ts: hermetic mock-auth dev server (Privy unconfigured) — no
  coordinator or secrets needed.
- e2e/navigation.spec.ts: every shell route loads with no React hydration error;
  a persisted verification-mode preference hydrates cleanly; sidebar links and
  the provider-dashboard tabs switch routes.
- vitest.config.ts: exclude e2e/ so Vitest doesn't pick up the Playwright specs.
- npm scripts: test:e2e / test:e2e:ui.
- CI: new "Console UI E2E (Playwright)" job (installs chromium, runs the suite).

Honest scope note: this hermetic mock-auth harness does NOT reproduce the
production #463 hydration break (the verification-mode consumers need real
authenticated trust data to render divergent DOM), so the suite is a broad
hydration + navigation guardrail rather than proof of that specific fix.
Verified locally: 11/11 pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
Expand the browser E2E from shell/hydration guardrails to real
authenticated flows:
- provider onboarding (empty fleet onboarding + "set up" nav; linked
  fleet renders a machine and unlocks the Setup/Earnings tabs)
- API-key creation (open form, submit, one-time secret reveal, list)
- chat send (streamed SSE assistant response)

Hermetic: an env-gated mock-auth hook (NEXT_PUBLIC_E2E_AUTH, set only by
the Playwright dev server) returns a usable token+user, and all /api/*
coordinator calls are route-mocked in e2e/fixtures.ts. No Privy tenant,
coordinator, or secrets required; production builds leave the hook off.

Also raise assertion timeouts to absorb dev-server on-demand route
compilation under parallel workers.

Co-authored-by: Cursor <cursoragent@cursor.com>
@anupsv changed the title from "test(console-ui): add Playwright browser E2E (navigation + hydration guardrails)" to "test(console-ui): Playwright E2E — shell, hydration + authenticated user flows" on Jun 24, 2026
github-actions Bot commented Jun 24, 2026


This PR improves E2E test infrastructure for the mock-auth path but introduces a new production attack surface via a client-side env-var gate that has the same shape as the existing T-020 misconfiguration risk.


Trust boundaries touched

  • TB-004 – Browser ↔ Coordinator (console UI auth state)

Threat analysis

T-020 — Mock auth active in production due to missing Privy config

⚠️ Weakens the mitigation (partially)

The original MOCK_AUTH was unconditionally authenticated: true with a null user and a noopToken. This PR keeps authenticated: true unconditionally in MOCK_AUTH but now conditionally populates a non-null user object and a non-null token string when NEXT_PUBLIC_E2E_AUTH === "1".

The concern is two-fold:

  1. NEXT_PUBLIC_ prefix means the value is baked into the JS bundle at build time by Next.js and shipped to the browser. It is not a server-side secret. If a CI/CD pipeline, staging environment, or developer accidentally produces a build with NEXT_PUBLIC_E2E_AUTH=1 set, every visitor will receive authenticated: true with a valid-looking token string ("e2e-mock-token") and a user object. The coordinator will reject that token, but the UI will present a fully authenticated state — exactly the T-020 scenario.

  2. "e2e-mock-token" is a static, hardcoded string. If any route or component passes getAccessToken() output to a coordinator request without checking the coordinator's response, or if a future BFF route is added that does something with the token before forwarding it, the blast radius of a mistaken build grows.

The mitigating factor noted in the comment ("unset in every real build") is a process control, not a code control — there is no if (process.env.NODE_ENV === 'production') throw guard, no CI enforcement visible in this diff, and no assertion at startup that prevents a production Next.js bundle from carrying this flag.

Recommended hardening at PrivyClientProvider.tsx line 24:

const E2E_AUTH =
  process.env.NEXT_PUBLIC_E2E_AUTH === "1" &&
  process.env.NODE_ENV !== "production";

This makes the gate fail-closed in production builds regardless of what env vars CI accidentally passes, and costs nothing at runtime.
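To make the fail-closed behaviour concrete, here is a small sketch that expresses the recommended gate as a pure function over an environment object. The `Env` and `MockAuth` types, the user shape, and a no-op token that returns null are assumptions; only the flag name, the "1" value, the NODE_ENV check, and the "e2e-mock-token" string come from the review above.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical shape of the mock auth object; the real one is not shown here.
type MockAuth = {
  authenticated: true;
  user: { id: string } | null;
  getAccessToken: () => Promise<string | null>;
};

// Fail closed: the hook is on only when the flag is "1" AND this is not a production build.
function isE2eAuthEnabled(env: Env): boolean {
  return env.NEXT_PUBLIC_E2E_AUTH === "1" && env.NODE_ENV !== "production";
}

function buildMockAuth(env: Env): MockAuth {
  const e2e = isE2eAuthEnabled(env);
  return {
    authenticated: true, // unchanged from the original MOCK_AUTH
    user: e2e ? { id: "e2e-user" } : null,
    getAccessToken: async () => (e2e ? "e2e-mock-token" : null),
  };
}
```

One caveat worth checking before adopting the guard as written: `next build` and `next start` run with NODE_ENV set to production, so if the Playwright server itself runs a production build, this condition would also switch the hook off for the E2E run. In that setup a different build-time signal would likely be needed.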


New attack surface not covered by an existing threat

Hardcoded credential string in a client-side bundle.
"e2e-mock-token" at line 35 is a static string that ships in the browser bundle whenever E2E_AUTH is true. If any logging, analytics, or error-reporting integration (Sentry, Datadog RUM, etc.) captures getAccessToken() output, this string will appear in those systems. More importantly, it creates a false sense that a "token" is present, which can mask missing-auth bugs in new routes during E2E development. This isn't covered by T-020 (which is about the UI presenting an authenticated state, not about a concrete token string being emitted). No existing threat ID maps to "hardcoded test credential leaking into production bundle analytics." Severity is low in isolation but worth noting.


SEC-* findings resolved

None — this PR does not close any open SEC-* findings. SEC-026 (the tracking issue for T-020) remains open; this change makes the code path more capable when accidentally triggered, not less.


Summary of recommended changes

Location | Change
PrivyClientProvider.tsx line 24 | Add && process.env.NODE_ENV !== "production" guard
PrivyClientProvider.tsx line 35 | Consider a non-token-shaped value (e.g. "e2e-mock-token-not-valid") to make misuse obvious
CI (.github/workflows/ci.yml) | Verify NEXT_PUBLIC_E2E_AUTH is absent from any workflow that produces or deploys a production artifact
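The CI item could be backed by a code control rather than a process control. A hedged sketch of a pre-build check follows; the function name and the idea of calling it from a deploy workflow's pre-build step are assumptions, not something this PR implements.

```typescript
type Env = Record<string, string | undefined>;

// Returns an error message when a deployable build would carry the E2E flag,
// or null when the environment is clean.
function checkDeployEnv(env: Env): string | null {
  const flag = env.NEXT_PUBLIC_E2E_AUTH;
  if (flag !== undefined && flag !== "") {
    return `NEXT_PUBLIC_E2E_AUTH=${flag} must not be set for a deployable build`;
  }
  return null;
}
```

A deploy script could call `checkDeployEnv(process.env)` before building and exit non-zero when a message comes back, so a stray flag stops the pipeline instead of shipping in the bundle.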

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

@ethenotethan ethenotethan left a comment

Contributor

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186


Adds authenticated-flow coverage on top of the existing suite:
- API-key management: edit (spend cap), disable, rotate (secret reveal),
  revoke (confirm dialog), and the empty-list state — backed by a full
  stateful CRUD mock (GET/POST/PATCH/DELETE/rotate).
- Error + retry paths: provider fleet load failure -> error state -> Retry
  recovers; chat send failure -> inline error bubble -> Retry streams ok.
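The error-then-retry paths above need a mock that fails first and succeeds afterwards. One possible shape is sketched below; `failFirst` and the response type are hypothetical names, not the fixture's actual API.

```typescript
type MockResponse = { status: number; body: unknown };
type Handler = () => MockResponse;

// Wrap a handler so the first `times` calls return a 500, then delegate to `ok`.
function failFirst(times: number, ok: Handler): Handler {
  let remaining = times;
  return () => {
    if (remaining > 0) {
      remaining--;
      return { status: 500, body: { error: "mock failure" } };
    }
    return ok();
  };
}
```

A test would install the wrapped handler, assert the error state after the first load, click Retry, and then assert the recovered view once the second call succeeds.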

The E2E surfaced a real defect: the /models page hard-crashed (root error
boundary, "prices is not iterable") whenever the pricing payload lacked a
`prices` array. Harden buildPricingLookup to tolerate it, and correct the
test mock to the real { prices: [] } shape.

Switch the Playwright webServer from `next dev` to a production build + start.
Dev compiled routes on-demand and served SSR single-threaded, which raced the
heavy /models page under parallel workers; a prod build pre-compiles and serves
static/optimized output, so the suite is deterministic (72/72 under
--repeat-each=3). Also decouple the empty-fleet tests from the fixture default.

Co-authored-by: Cursor <cursoragent@cursor.com>


Comment thread console-ui/e2e/flows.spec.ts Fixed
…ng guards

Review follow-ups from the E2E expansion:
- earn/calc.ts had the same unguarded `pricing.prices.map(...)` crash the
  prior commit fixed in /models — a malformed pricing payload (200 body
  without `prices`) threw "reading 'map'" inside a fetch .then(), an
  unhandled rejection that left the earnings calculator broken. Guard it.
- Strengthen both guards to Array.isArray (covers null/undefined AND any
  non-array shape, matching the comments).
- Add regression tests (e2e/flows.spec.ts "page resilience") that seed a
  malformed { } pricing payload and assert /models and /earn still render —
  covering both failure modes (root error boundary for the render-path crash
  on /models, unhandled-rejection capture for the async crash on /earn).
  Verified red without the guards, green with them.
- Refresh the stale playwright.config timeout comment (no longer dev-server).

Co-authored-by: Cursor <cursoragent@cursor.com>


- billing: assert the balance renders from /api/payments/balance, then drive
  Buy Credits → Continue; the mocked Stripe checkout "redirects" back with the
  success flag so the full round-trip (checkout → success toast) is exercised.
- chat: seed two models and switch via the composer's model selector, asserting
  the active model flips.

Adds default payments route mocks (balance/usage/stripe-status/checkout) and a
seedModels() helper to the fixture.

Co-authored-by: Cursor <cursoragent@cursor.com>


Comment on lines +1 to +11
import {
  test,
  expect,
  makeProvider,
  makeProvidersResponse,
  seedProviders,
  seedKeys,
  seedModels,
  chatSse,
  CHAT_REPLY,
} from "./fixtures";
…omposer buttons

- chat: hold the chat request open, assert the composer shows Stop, click it,
  and assert generation cancels back to the idle Send state with no error bubble.
- invite: redeem a valid code (asserts the credited-success message) and an
  invalid code (asserts the error toast), mocking /api/invite/redeem.

The Send/Stop composer buttons were icon-only with no accessible name; add
aria-labels ("Send message" / "Stop generating") — an a11y fix that also gives
the tests a stable, semantic selector.

Co-authored-by: Cursor <cursoragent@cursor.com>


…y case

The existing "page resilience" tests seed a missing `prices` payload, which a
weaker `?? []` guard would also survive — so they didn't pin the Array.isArray
strengthening (review note). Parametrize over a second malformed shape,
`{ prices: {} }` (non-array, truthy): with `?? []` it still throws
("is not iterable" on /models, "map is not a function" on /earn), with
Array.isArray it renders cleanly. Verified red against `?? []`, green with the
guards. Now 33 tests.
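The difference between the two guards can be seen in isolation with a short sketch. The function names are made up for illustration; the two malformed shapes are the ones named in the commit message.

```typescript
type Payload = { prices?: unknown };

// Weak guard: `??` only replaces null/undefined, so a truthy non-array slips through.
function countWithNullish(payload: Payload): number {
  const prices = (payload.prices ?? []) as unknown[];
  return prices.map((p) => p).length; // throws when prices has no .map
}

// Strong guard: anything that is not an array becomes [].
function countWithIsArray(payload: Payload): number {
  const prices: unknown[] = Array.isArray(payload.prices) ? payload.prices : [];
  return prices.map((p) => p).length;
}

// True when the guard handles the payload without throwing.
function survives(guard: (p: Payload) => number, payload: Payload): boolean {
  try {
    guard(payload);
    return true;
  } catch {
    return false;
  }
}

// The two malformed shapes the regression tests parametrize over.
const missingPrices: Payload = {};
const nonArrayPrices: Payload = { prices: {} };
```

Only the `{ prices: {} }` case separates the two guards, which is why the second shape is needed to pin the Array.isArray strengthening.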

Co-authored-by: Cursor <cursoragent@cursor.com>



2 participants