test(console-ui): Playwright E2E — shell, hydration + authenticated user flows by anupsv · Pull Request #465 · Layr-Labs/d-inference

anupsv · 2026-06-24T21:21:13Z

Summary

Adds the real-browser test layer for console-ui. None existed before: no Playwright, Cypress, or Puppeteer, only jsdom Vitest unit tests. jsdom cannot observe SSR hydration or App Router navigation, which is why the nav/hydration regressions went unnoticed.

This boots the app and drives Chromium, asserting both the shell and real authenticated user flows (incl. error/retry paths) actually work in a browser.

Hermetic auth + API mocking

The app runs with Privy unconfigured + NEXT_PUBLIC_E2E_AUTH=1 (set only by the Playwright server), so mock-auth returns a usable token+user.
Every coordinator call (/api/*) is route-mocked with seeded data in e2e/fixtures.ts, including a stateful API-key store (GET/POST/PATCH/DELETE/rotate) and SSE chat.
No Privy tenant, coordinator, or secrets required. Production builds leave the hook off (NEXT_PUBLIC_E2E_AUTH unset → behaviour identical to today).
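
The stateful API-key mock mentioned above could be backed by a small in-memory store along these lines. This is only a sketch under assumptions, not the contents of e2e/fixtures.ts: the `ApiKey` fields, the `KeyPatch` type, the `KeyStore` class, and the choice to mask secrets in list results are all hypothetical, picked to mirror the GET/POST/PATCH/DELETE/rotate operations.

```typescript
// Hypothetical in-memory API-key store for route mocks. All names here
// (ApiKey, KeyPatch, KeyStore) are illustrative assumptions.
interface ApiKey {
  id: string;
  name: string;
  secret: string;
  disabled: boolean;
  spendCap: number | null;
}

type KeyPatch = Partial<Pick<ApiKey, "name" | "disabled" | "spendCap">>;

class KeyStore {
  private keys: ApiKey[] = [];
  private seq = 0;

  // Deterministic ids and secrets keep assertions stable across runs.
  private next(prefix: string): string {
    this.seq += 1;
    return `${prefix}-${this.seq}`;
  }

  // GET: list keys with the secret masked (an assumption of this sketch).
  list(): ApiKey[] {
    return this.keys.map((k) => ({ ...k, secret: "" }));
  }

  // POST: create a key; the secret is only revealed by this call.
  create(name: string): ApiKey {
    const key: ApiKey = {
      id: this.next("key"),
      name,
      secret: this.next("secret"),
      disabled: false,
      spendCap: null,
    };
    this.keys.push(key);
    return { ...key };
  }

  // PATCH: apply a partial update, for example a spend cap or the disabled flag.
  update(id: string, patch: KeyPatch): ApiKey | undefined {
    const key = this.keys.find((k) => k.id === id);
    if (!key) return undefined;
    Object.assign(key, patch);
    return { ...key, secret: "" };
  }

  // rotate: replace the secret and reveal the new value once.
  rotate(id: string): ApiKey | undefined {
    const key = this.keys.find((k) => k.id === id);
    if (!key) return undefined;
    key.secret = this.next("secret");
    return { ...key };
  }

  // DELETE: revoke a key; returns false when the id is unknown.
  remove(id: string): boolean {
    const before = this.keys.length;
    this.keys = this.keys.filter((k) => k.id !== id);
    return this.keys.length < before;
  }
}
```

Because the store keeps state between calls, a test can create a key, edit it, and then see the change reflected in the next list request, which is what the create/edit/disable/rotate/revoke flows rely on.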

Coverage (33 tests)

e2e/navigation.spec.ts — shell + hydration:

every shell route (/, /stats, /providers, /providers/setup, /providers/earnings, /earn, /api-console, /models, /billing, /settings) loads with no React hydration error;
clean hydration with a persisted technical verification-mode preference;
sidebar links switch routes; provider dashboard hides Setup/Earnings tabs until a machine is linked (fix(console-ui): repair provider dashboard tab navigation + gate post-install tabs #462).

e2e/flows.spec.ts — authenticated user flows:

provider onboarding — empty fleet → onboarding + "Set up a provider" nav; a linked machine renders and unlocks the Setup/Earnings tabs;
API-key management — create, edit (spend cap), disable, rotate (one-time secret reveal), revoke (confirm dialog), and the empty-list state;
chat — typing + Enter renders the streamed SSE assistant response; switching the model via the composer selector flips the active model; Stop cancels an in-flight generation back to the idle state;
billing — the balance renders from /api/payments/balance, and Buy Credits → Continue completes a (mocked) Stripe checkout round-trip through to the success toast;
invite codes — redeeming a valid code shows the credited confirmation; an invalid code surfaces an error;
error + retry — provider fleet load failure → error state → Retry recovers; chat send failure → inline error bubble → Retry streams successfully.
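
For the chat flows, the mocked endpoint has to answer in Server-Sent Events framing. The sketch below shows one way such a body could be built and read back. `buildSseBody`, `readSseBody`, the `{ delta }` chunk shape, and the `[DONE]` sentinel are assumptions for illustration; the real `chatSse` helper in e2e/fixtures.ts may use a different payload.

```typescript
// Hypothetical helper: frame text chunks as a Server-Sent Events body.
// Each event is a `data: <json>` line followed by a blank line.
function buildSseBody(chunks: string[]): string {
  const events = chunks.map(
    (delta) => `data: ${JSON.stringify({ delta })}\n\n`,
  );
  return events.join("") + "data: [DONE]\n\n";
}

// Parse the same framing back into the concatenated reply text.
function readSseBody(body: string): string {
  let text = "";
  for (const block of body.split("\n\n")) {
    if (!block.startsWith("data: ")) continue; // skip empty trailing block
    const payload = block.slice("data: ".length);
    if (payload === "[DONE]") break; // assumed end-of-stream sentinel
    const parsed = JSON.parse(payload) as { delta: string };
    text += parsed.delta;
  }
  return text;
}
```

In a Playwright route handler, a string like this could be passed as the `body` of `route.fulfill` with a `text/event-stream` content type. JSON-encoding each chunk keeps raw newlines out of the payload, so the blank-line event separator stays unambiguous.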

Real defect found by the E2E

A pricing payload without a prices array crashed two pages that consume the same endpoint: /models hard-crashed into the root error boundary (prices is not iterable, during render), and /earn failed on the same payload inside a fetch .then() (an unhandled rejection that broke the earnings calculator). Both buildPricingLookup implementations (src/app/models/page.tsx, src/app/earn/calc.ts) are now guarded with Array.isArray, and "page resilience" regression tests cover each page against two malformed shapes (a missing prices field and a non-array prices value). The tests were verified red without the guards and green with them, covering both the render-boundary and async-rejection failure modes.
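
The guard described above might look roughly like the following. This is a sketch under assumptions, not the code in src/app/models/page.tsx or src/app/earn/calc.ts: the `PriceEntry` fields and the plain-object return type are hypothetical. Only the `prices` field, the `buildPricingLookup` name, and the use of `Array.isArray` come from the description.

```typescript
// Hypothetical entry shape; the real pricing payload fields may differ.
interface PriceEntry {
  model: string;
  inputPrice: number;
}

// Accept an untrusted payload and tolerate a missing or non-array `prices`.
function buildPricingLookup(payload: unknown): Record<string, PriceEntry> {
  const lookup: Record<string, PriceEntry> = {};
  const prices = (payload as { prices?: unknown } | null | undefined)?.prices;
  // Array.isArray rejects undefined, null, and any non-array value such as {}.
  if (!Array.isArray(prices)) return lookup;
  for (const entry of prices as PriceEntry[]) {
    lookup[entry.model] = entry;
  }
  return lookup;
}
```

Returning an empty lookup for a malformed payload lets the page render without prices instead of throwing during render or inside a promise callback.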

Determinism

The Playwright webServer runs a production build + start (not next dev). Dev compiled routes on-demand and served SSR single-threaded, which raced the heavy /models page under parallel workers; a prod build pre-compiles and serves static/optimized output. Result when the switch was made: 72/72 under --repeat-each=3, zero flakes.

Plumbing

vitest.config.ts excludes e2e/; npm scripts test:e2e / test:e2e:ui.
CI: Console UI E2E (Playwright) job (installs Chromium, builds, runs the suite on PRs).

Test plan

npm run test:e2e — 33/33 in Chromium; stable under --repeat-each=3.
npx eslint src/ clean; npm run build passes (hook gated off in prod).
npx vitest run — e2e specs correctly excluded.

Made with Cursor

vercel · 2026-06-24T21:21:19Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
d-inference	Ready	Preview	Jun 25, 2026 4:27am
d-inference-console-ui-dev	Ready	Preview	Jun 25, 2026 4:27am
d-inference-landing	Ready	Preview	Jun 25, 2026 4:27am

blacksmith-sh · 2026-06-24T21:23:55Z

Found 1 test failure on Blacksmith runners:

Failure

Test	View Logs
`github.com/eigeninference/d-inference/e2e/TestIntegration_ConcurrentRequests`	View Logs

…guardrails)

Adds a real-browser E2E layer (Playwright + Chromium) for the console UI. Unlike the jsdom Vitest unit tests, these boot the app and drive an actual browser, so they catch SSR-hydration and client-navigation regressions jsdom cannot.

- playwright.config.ts: hermetic mock-auth dev server (Privy unconfigured); no coordinator or secrets needed.
- e2e/navigation.spec.ts: every shell route loads with no React hydration error; a persisted verification-mode preference hydrates cleanly; sidebar links and the provider-dashboard tabs switch routes.
- vitest.config.ts: exclude e2e/ so Vitest doesn't pick up the Playwright specs.
- npm scripts: test:e2e / test:e2e:ui.
- CI: new "Console UI E2E (Playwright)" job (installs chromium, runs the suite).

Honest scope note: this hermetic mock-auth harness does NOT reproduce the production #463 hydration break (the verification-mode consumers need real authenticated trust data to render divergent DOM), so the suite is a broad hydration + navigation guardrail rather than proof of that specific fix.

Verified locally: 11/11 pass.

Co-authored-by: Cursor <cursoragent@cursor.com>

Expand the browser E2E from shell/hydration guardrails to real authenticated flows:

- provider onboarding (empty fleet onboarding + "set up" nav; linked fleet renders a machine and unlocks the Setup/Earnings tabs)
- API-key creation (open form, submit, one-time secret reveal, list)
- chat send (streamed SSE assistant response)

Hermetic: an env-gated mock-auth hook (NEXT_PUBLIC_E2E_AUTH, set only by the Playwright dev server) returns a usable token+user, and all /api/* coordinator calls are route-mocked in e2e/fixtures.ts. No Privy tenant, coordinator, or secrets required; production builds leave the hook off.

Also raise assertion timeouts to absorb dev-server on-demand route compilation under parallel workers.

Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions · 2026-06-24T21:55:12Z

This PR improves E2E test infrastructure for the mock-auth path but introduces a new production attack surface via a client-side env-var gate that has the same shape as the existing T-020 misconfiguration risk.

Trust boundaries touched

TB-004 – Browser ↔ Coordinator (console UI auth state)

Threat analysis

T-020 — Mock auth active in production due to missing Privy config

⚠️ Weakens the mitigation (partially)

The original MOCK_AUTH was unconditionally authenticated: true with a null user and a noopToken. This PR keeps authenticated: true unconditionally in MOCK_AUTH but now conditionally populates a non-null user object and a non-null token string when NEXT_PUBLIC_E2E_AUTH === "1".

The concern is two-fold:

NEXT_PUBLIC_ prefix means the value is baked into the JS bundle at build time by Next.js and shipped to the browser. It is not a server-side secret. If a CI/CD pipeline, staging environment, or developer accidentally produces a build with NEXT_PUBLIC_E2E_AUTH=1 set, every visitor will receive authenticated: true with a valid-looking token string ("e2e-mock-token") and a user object. The coordinator will reject that token, but the UI will present a fully authenticated state — exactly the T-020 scenario.
"e2e-mock-token" is a static, hardcoded string. If any route or component passes getAccessToken() output to a coordinator request without checking the coordinator's response, or if a future BFF route is added that does something with the token before forwarding it, the blast radius of a mistaken build grows.

The mitigating factor noted in the comment ("unset in every real build") is a process control, not a code control — there is no if (process.env.NODE_ENV === 'production') throw guard, no CI enforcement visible in this diff, and no assertion at startup that prevents a production Next.js bundle from carrying this flag.

Recommended hardening at PrivyClientProvider.tsx line 24:

const E2E_AUTH =
  process.env.NEXT_PUBLIC_E2E_AUTH === "1" &&
  process.env.NODE_ENV !== "production";

This makes the gate fail-closed in production builds regardless of what env vars CI accidentally passes, and costs nothing at runtime.
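
To make the fail-closed behaviour concrete, the same condition can be written as a pure function and checked against a few environments. This is only an illustrative sketch: `isE2eAuthEnabled` and `EnvLike` are hypothetical names, and in real client code the `process.env.NEXT_PUBLIC_E2E_AUTH` expression would need to be referenced literally, since Next.js only inlines `NEXT_PUBLIC_` variables that are accessed that way.

```typescript
// Hypothetical pure form of the gate so its truth table can be checked.
type EnvLike = Record<string, string | undefined>;

function isE2eAuthEnabled(env: EnvLike): boolean {
  // Both conditions must hold: the flag is set AND this is not a production build.
  return (
    env["NEXT_PUBLIC_E2E_AUTH"] === "1" &&
    env["NODE_ENV"] !== "production"
  );
}

// Flag set in a test environment: hook on.
const inTest = isE2eAuthEnabled({ NEXT_PUBLIC_E2E_AUTH: "1", NODE_ENV: "test" });
// Flag accidentally set in a production build: hook stays off.
const inProd = isE2eAuthEnabled({ NEXT_PUBLIC_E2E_AUTH: "1", NODE_ENV: "production" });
// Flag unset: hook off regardless of environment.
const unset = isE2eAuthEnabled({ NODE_ENV: "development" });
```

One interaction worth checking before adopting this: `next build` and `next start` run with NODE_ENV set to production, so if the Playwright webServer uses a production build, a NODE_ENV-based gate would also switch the hook off for the E2E run itself, and a different build-time signal might be needed there.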

New attack surface not covered by an existing threat

Hardcoded credential string in a client-side bundle.
"e2e-mock-token" at line 35 is a static string that ships in the browser bundle whenever E2E_AUTH is true. If any logging, analytics, or error-reporting integration (Sentry, Datadog RUM, etc.) captures getAccessToken() output, this string will appear in those systems. More importantly, it creates a false sense that a "token" is present, which can mask missing-auth bugs in new routes during E2E development. This isn't covered by T-020 (which is about the UI presenting an authenticated state, not about a concrete token string being emitted). No existing threat ID maps to "hardcoded test credential leaking into production bundle analytics." Severity is low in isolation but worth noting.

SEC-* findings resolved

None — this PR does not close any open SEC-* findings. SEC-026 (the tracking issue for T-020) remains open; this change makes the code path more capable when accidentally triggered, not less.

Summary of recommended changes

Location	Change
`PrivyClientProvider.tsx` line 24	Add `&& process.env.NODE_ENV !== "production"` guard
`PrivyClientProvider.tsx` line 35	Consider a non-token-shaped value (e.g. `"e2e-mock-token-not-valid"`) to make misuse obvious
CI (`.github/workflows/ci.yml`)	Verify `NEXT_PUBLIC_E2E_AUTH` is absent from any workflow that produces or deploys a production artifact

🔐 Threat model: docs/threat-model.yaml · Updates on each push to this PR

ethenotethan

Automated Code Review — Layr-Labs/d-inference#

Verdict: COMMENT

Security — ✅ No issues found

Performance — ✅ No issues found

Type_diligence — ✅ No issues found

Additive_complexity — ✅ No issues found

✅ All four passes clean. No issues found.

🤖 Automated review by Centaur · DAR-186

Adds authenticated-flow coverage on top of the existing suite:

- API-key management: edit (spend cap), disable, rotate (secret reveal), revoke (confirm dialog), and the empty-list state, backed by a full stateful CRUD mock (GET/POST/PATCH/DELETE/rotate).
- Error + retry paths: provider fleet load failure -> error state -> Retry recovers; chat send failure -> inline error bubble -> Retry streams ok.

The E2E surfaced a real defect: the /models page hard-crashed (root error boundary, "prices is not iterable") whenever the pricing payload lacked a `prices` array. Harden buildPricingLookup to tolerate it, and correct the test mock to the real { prices: [] } shape.

Switch the Playwright webServer from `next dev` to a production build + start. Dev compiled routes on-demand and served SSR single-threaded, which raced the heavy /models page under parallel workers; a prod build pre-compiles and serves static/optimized output, so the suite is deterministic (72/72 under --repeat-each=3).

Also decouple the empty-fleet tests from the fixture default.

Co-authored-by: Cursor <cursoragent@cursor.com>

…ng guards

Review follow-ups from the E2E expansion:

- earn/calc.ts had the same unguarded `pricing.prices.map(...)` crash the prior commit fixed in /models: a malformed pricing payload (200 body without `prices`) threw "reading 'map'" inside a fetch .then(), an unhandled rejection that left the earnings calculator broken. Guard it.
- Strengthen both guards to Array.isArray (covers null/undefined AND any non-array shape, matching the comments).
- Add regression tests (e2e/flows.spec.ts "page resilience") that seed a malformed { } pricing payload and assert /models and /earn still render, covering both failure modes (root error boundary for the render-path crash on /models, unhandled-rejection capture for the async crash on /earn). Verified red without the guards, green with them.
- Refresh the stale playwright.config timeout comment (no longer dev-server).

Co-authored-by: Cursor <cursoragent@cursor.com>

- billing: assert the balance renders from /api/payments/balance, then drive Buy Credits → Continue; the mocked Stripe checkout "redirects" back with the success flag so the full round-trip (checkout → success toast) is exercised.
- chat: seed two models and switch via the composer's model selector, asserting the active model flips.

Adds default payments route mocks (balance/usage/stripe-status/checkout) and a seedModels() helper to the fixture.

Co-authored-by: Cursor <cursoragent@cursor.com>

…omposer buttons

- chat: hold the chat request open, assert the composer shows Stop, click it, and assert generation cancels back to the idle Send state with no error bubble.
- invite: redeem a valid code (asserts the credited-success message) and an invalid code (asserts the error toast), mocking /api/invite/redeem.

The Send/Stop composer buttons were icon-only with no accessible name; add aria-labels ("Send message" / "Stop generating"), an a11y fix that also gives the tests a stable, semantic selector.

Co-authored-by: Cursor <cursoragent@cursor.com>

…y case

The existing "page resilience" tests seed a missing `prices` payload, which a weaker `?? []` guard would also survive, so they didn't pin the Array.isArray strengthening (review note). Parametrize over a second malformed shape, `{ prices: {} }` (non-array, truthy): with `?? []` it still throws ("is not iterable" on /models, "map is not a function" on /earn), with Array.isArray it renders cleanly. Verified red against `?? []`, green with the guards. Now 33 tests.

Co-authored-by: Cursor <cursoragent@cursor.com>
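
The difference between the two guards can be reproduced in isolation. In this sketch, `weakCount`, `strictCount`, and `throws` are hypothetical stand-ins for the page code and the test harness; they only illustrate why `{ prices: {} }` distinguishes the Array.isArray guard from `?? []` while a missing `prices` does not.

```typescript
type Payload = { prices?: unknown };

// Weaker guard: `?? []` only replaces null and undefined.
function weakCount(p: Payload): number {
  const prices = (p.prices ?? []) as unknown[];
  return prices.map((x) => x).length; // TypeError when prices is {}
}

// Stronger guard: anything that is not an array becomes an empty list.
function strictCount(p: Payload): number {
  const prices: unknown[] = Array.isArray(p.prices) ? p.prices : [];
  return prices.map((x) => x).length;
}

// Small helper that reports whether a call throws.
function throws(fn: () => unknown): boolean {
  try {
    fn();
    return false;
  } catch {
    return true;
  }
}

// Parametrize over the two malformed shapes named in the commit message.
const shapes: Payload[] = [{}, { prices: {} }];
const weakThrows = shapes.map((s) => throws(() => weakCount(s)));
const strictThrows = shapes.map((s) => throws(() => strictCount(s)));
```

Only the second shape makes the weak guard throw, which is why a test seeded with just the missing-field payload could not tell the two guards apart.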

anupsv force-pushed the test/console-ui-playwright branch from e398b04 to 538003c Compare June 24, 2026 21:29

anupsv changed the base branch from fix/verification-mode-hydration-nav to master June 24, 2026 21:30

anupsv changed the title ~~test(console-ui): add Playwright browser E2E (navigation + hydration guardrails)~~ test(console-ui): Playwright E2E — shell, hydration + authenticated user flows Jun 24, 2026

github-code-quality Bot found potential problems Jun 25, 2026

Comment thread console-ui/e2e/flows.spec.ts Fixed

github-code-quality Bot found potential problems Jun 25, 2026

Comment thread console-ui/e2e/flows.spec.ts

Comment on lines +1 to +11

import {
  test,
  expect,
  makeProvider,
  makeProvidersResponse,
  seedProviders,
  seedKeys,
  seedModels,
  chatSse,
  CHAT_REPLY,
} from "./fixtures";

anupsv wants to merge 7 commits into master from test/console-ui-playwright
