Skip to content

fix(core): pre-bundle runtime-reached SSR deps for cold Cloudflare dev cache#1361

Open
MA2153 wants to merge 5 commits into
emdash-cms:mainfrom
MA2153:fix/cold-cache-ssr-dep-optimizer
Open

fix(core): pre-bundle runtime-reached SSR deps for cold Cloudflare dev cache#1361
MA2153 wants to merge 5 commits into
emdash-cms:mainfrom
MA2153:fix/cold-cache-ssr-dep-optimizer

Conversation

@MA2153
Copy link
Copy Markdown
Contributor

@MA2153 MA2153 commented Jun 5, 2026

What does this PR do?

On a cold Vite cache, pnpm dev in a Cloudflare-adapter EmDash site could throw repeatedly on the first SSR requests:

[vite] An error happened during full reload
The file does not exist at ".../node_modules/.vite/deps_ssr/chunk-XXXX.js?v=HASH" ...

Root cause: the cloudflare branch of ssr.optimizeDeps.include in packages/core/src/astro/integration/vite-config.ts pre-bundles SSR deps so workerd loads them in one cold-startup pass, but several deps are only reached at request time — the middleware chain (behind the excluded virtual:emdash boundary), lazy adapter entrypoints, first-render imports, and re-export specifiers like astro/zod. Vite's startup scan misses them, discovers each at request time, and re-optimizes; overlapping re-optimizations race on deps_ssr/ chunk hashes and crash in-flight requests.

This PR:

  • Adds a cold-cache guard test (packages/core/tests/integration/smoke/dep-optimizer-cold-cache.test.ts) that boots templates/starter-cloudflare with a wiped .vite cache, drives the dev-bypass / frontend / admin routes, and asserts Vite logs zero post-startup dependency re-optimizations. This fails automatically whenever a request-time SSR dep is missing from the include list — including future ones — instead of someone chasing the crash by hand. (The crash itself is a non-deterministic race; the test asserts on the deterministic upstream re-optimize signal.)
  • Seeds the missing specifiers into the cloudflare include branch (middleware chain, @emdash-cms/cloudflare/db/d1 + storage/r2, emdash/ui + runtime + media/local-runtime, astro/zod, plus bare forms of a few admin/auth deps the test caught re-optimizing).
  • Serializes smoke test files (fileParallelism: false) so the new test never races the existing site-matrix smoke suite on a shared site's .vite cache.

These additions are dev-server-only (ssr.optimizeDeps.include) and Cloudflare-only — no effect on production astro build output or the Node adapter branch.

Type of change

  • Bug fix
  • Feature (requires maintainer-approved Discussion)
  • Refactor (no behavior change)
  • Translation
  • Documentation
  • Performance improvement
  • Tests
  • Chore (dependencies, CI, tooling)

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm lint passes — clean for all files changed in this PR; the repo has 8 pre-existing diagnostics in demos/* / templates/* astro.config.mjs files unrelated to this change.
  • pnpm test passes (or targeted tests for my change) — the cold-cache guard test was verified RED→GREEN locally; CI's test-smoke job runs it as the authoritative check.
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • User-visible strings wrapped for translation — n/a, no admin UI strings.
  • I have added a changeset (emdash patch)
  • New features link to an approved Discussion — n/a, bug fix.

AI-generated code disclosure

  • This PR includes AI-generated code — model/tool: Claude Opus 4.8 (Claude Code)

Screenshots / test output

RED (before the fix), the guard reported re-optimize lines such as:

✨ new dependencies optimized: @lingui/react, @oslojs/crypto/hmac, @oslojs/crypto/subtle, @oslojs/crypto/rsa
✨ optimized dependencies changed. reloading
✨ new dependencies optimized: @cloudflare/kumo/primitives

GREEN after seeding the specifiers: 0 re-optimizations, 0 crash lines.

🤖 Generated with Claude Code

MA2153 and others added 2 commits June 5, 2026 14:53
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…v cache

Adds a cold-cache guard test that fails whenever a request-time SSR dep is
missing from the cloudflare optimizeDeps.include list, and seeds the 11
currently-missing specifiers so the cold optimize completes in one startup
pass (no deps_ssr re-optimize race).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Jun 5, 2026

🦋 Changeset detected

Latest commit: 0f88ba6

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
emdash Patch
@emdash-cms/cloudflare Patch
@emdash-cms/sandbox-workerd Patch
@emdash-cms/fixture-perf-site Patch
@emdash-cms/perf-demo-site Patch
@emdash-cms/cache-demo-site Patch
@emdash-cms/admin Patch
@emdash-cms/auth Patch
@emdash-cms/blocks Patch
@emdash-cms/gutenberg-to-portable-text Patch
@emdash-cms/x402 Patch
create-emdash Patch
@emdash-cms/auth-atproto Patch
@emdash-cms/plugin-embeds Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Jun 5, 2026

Open in StackBlitz

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@1361

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@1361

@emdash-cms/auth-atproto

npm i https://pkg.pr.new/@emdash-cms/auth-atproto@1361

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@1361

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@1361

@emdash-cms/contentful-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/contentful-to-portable-text@1361

emdash

npm i https://pkg.pr.new/emdash@1361

create-emdash

npm i https://pkg.pr.new/create-emdash@1361

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@1361

@emdash-cms/plugin-cli

npm i https://pkg.pr.new/@emdash-cms/plugin-cli@1361

@emdash-cms/plugin-types

npm i https://pkg.pr.new/@emdash-cms/plugin-types@1361

@emdash-cms/registry-client

npm i https://pkg.pr.new/@emdash-cms/registry-client@1361

@emdash-cms/registry-lexicons

npm i https://pkg.pr.new/@emdash-cms/registry-lexicons@1361

@emdash-cms/sandbox-workerd

npm i https://pkg.pr.new/@emdash-cms/sandbox-workerd@1361

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@1361

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@1361

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@1361

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@1361

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@1361

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@1361

@emdash-cms/plugin-field-kit

npm i https://pkg.pr.new/@emdash-cms/plugin-field-kit@1361

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@1361

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@1361

commit: 0f88ba6

Copy link
Copy Markdown
Contributor

@emdashbot emdashbot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach is sound and targeted. Adding a deterministic guard test that asserts zero post-startup re-optimizations is exactly the right way to prevent a non-deterministic deps_ssr re-optimize race from regressing. The seeding of runtime-reached specifiers into the Cloudflare-only ssr.optimizeDeps.include branch is appropriately scoped — no production build impact, no Node adapter impact.

I checked the new optimize-deps entries against package exports, verified the shared port doesn't collide with the existing site matrix, and traced the virtual-module call sites that justify each middleware and adapter entrypoint. The changeset is present and correctly scoped.

Two issues in the new test file should be addressed before merge:

  1. fetchWithRetry silently returns when exhausted. Unlike the identical helper in site-matrix-smoke.test.ts, this version falls through without throwing if the server is down or consistently 500s. That means the route-driving fetches can fail silently, the imports are never reached, and the test may falsely pass because no re-optimize lines appear. It should throw lastError after the loop.

  2. server.killed prevents the SIGKILL fallback. In Node.js ChildProcess, killed is set to true as soon as kill() is called, not when the process exits. After server.kill("SIGTERM") the if (!server.killed) guard is always false, so SIGKILL is never sent. Zombie dev-server processes can leak in CI. The idiomatic fix is to check server.exitCode === null instead.

Once those reliability issues are fixed, this is good to merge.

Comment on lines +51 to +54
// route so its imports are reached.
for (let attempt = 0; attempt <= retries; attempt++) {
try {
const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[needs fixing] fetchWithRetry exhausts all retries without throwing, so a crashed or persistently-500 server causes the route-driving fetches to fail silently. The imports are never reached, and the test may falsely pass because no re-optimize logs appear.

The sibling helper in site-matrix-smoke.test.ts correctly throws lastError after the loop. This implementation should do the same.

Suggested change
// route so its imports are reached.
for (let attempt = 0; attempt <= retries; attempt++) {
try {
const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) });
async function fetchWithRetry(url: string, retries = 8, delayMs = 1500): Promise<void> {
let lastError: unknown;
for (let attempt = 0; attempt <= retries; attempt++) {
try {
const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) });
if (res.status < 500) return;
lastError = new Error(`${url} returned ${res.status}`);
} catch (error) {
lastError = error;
}
if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
}
throw lastError instanceof Error ? lastError : new Error(`Request failed for ${url}`);
}

Copy link
Copy Markdown
Contributor

@emdashbot emdashbot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a well-targeted bug fix with a good approach. Pre-bundling request-time SSR deps in the Cloudflare branch of ssr.optimizeDeps.include is the right fix for the cold-cache re-optimize race, and adding a regression smoke test that asserts zero post-startup re-optimizations is exactly the guard needed to prevent future deps from slipping through. Serializing smoke tests with fileParallelism: false is the correct call because the new test wipes the shared .vite cache.

I checked the diff against origin/main, verified all added specifiers are real package exports (emdash/ui, emdash/runtime, emdash/media/local-runtime, @emdash-cms/cloudflare/db/d1, astro/zod, etc.), traced sibling smoke-test patterns, and reviewed AGENTS.md conventions. The production change (vite-config.ts) is clean and correct. The changeset is present and properly scoped.

The new test has one real issue that should be fixed before merge: fetchWithRetry silently returns after exhausting retries instead of throwing, which means the test can pass without driving the frontend and dev-bypass routes and miss future missing deps. I also noted a minor cleanup-code issue copied from the existing smoke suite.

Copy link
Copy Markdown
Contributor

@emdashbot emdashbot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR fixes a real, well-understood Cloudflare dev-server issue where missing ssr.optimizeDeps.include entries cause Vite to re-optimize SSR deps at request time on a cold cache, leading to a deps_ssr chunk-hash race and in-flight request crashes. The approach is sound and narrowly scoped:

  • The new specifiers added to the Cloudflare branch are all verified exports (emdash/ui, emdash/runtime, emdash/middleware/*, astro/zod, @emdash-cms/cloudflare/db/d1, @emdash-cms/cloudflare/storage/r2, plus bare forms of @lingui/react, @oslojs/crypto/*, and @cloudflare/kumo/primitives). The inline comments clearly explain why each group is needed and warn against removing apparent "duplicates."
  • The regression test is a pragmatic guard: it boots templates/starter-cloudflare with a wiped .vite cache, drives the core routes, and asserts zero post-startup re-optimization log lines. This is deterministic and catches the root cause rather than the flaky downstream crash.
  • fileParallelism: false in vitest.smoke.config.ts is the correct fix to prevent cache races between smoke test files.
  • The changeset targets emdash with a patch, which is correct.

One blocking finding on the test: fetchWithRetry in the new smoke test silently returns when all retries are exhausted instead of throwing. If a route persistently fails, the test proceeds without having exercised that route's imports, which defeats the test's core purpose of discovering missing force-includes. The sibling site-matrix-smoke.test.ts uses the same helper name but throws on exhaustion; the new test should match that behavior.

Once the test helper fails loudly on exhaustion, this is good to go.

Copy link
Copy Markdown
Contributor

@emdashbot emdashbot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The approach is sound: this is a targeted dev-server-only bug fix for a well-understood Vite cold-cache behavior on the Cloudflare adapter. Adding a failing smoke test, expanding ssr.optimizeDeps.include with the missing specifiers, and serializing smoke files so they don't race on a shared .vite cache is the right fix.

I checked the Vite config additions against the package export maps (emdash/* and @emdash-cms/cloudflare/*), traced the astro/zod usage in the tree, and compared the new test utilities to the existing smoke-suite patterns. The include list and changeset are correct.

However, the new test file copies two patterns from sibling smoke code that are actually bugs:

  1. fetchWithRetry silently returns after exhausting retries. Unlike the same-named helper in site-matrix-smoke.test.ts, the new version never throws. If the server persistently 500s or drops connections, the test proceeds without having actually driven the routes, producing a false green (no re-optimize logs because the lazy imports were never reached).

  2. server.killed is already true after kill("SIGTERM"), so the SIGKILL fallback is dead code. A hung server process would leak. The check should look at whether the process has actually exited.

Both issues are in the new dep-optimizer-cold-cache.test.ts file and should be fixed before merge.

} finally {
server.kill("SIGTERM");
await new Promise((r) => setTimeout(r, 1200));
if (!server.killed) server.kill("SIGKILL");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[needs fixing] server.kill("SIGTERM") sets server.killed = true synchronously, so if (!server.killed) is always false and the SIGKILL fallback is dead code. A hung server process would leak past the finally block.

Use exitCode / signalCode to check whether the process has actually exited:

Suggested change
if (!server.killed) server.kill("SIGKILL");
server.kill("SIGTERM");
await new Promise((r) => setTimeout(r, 1200));
if (server.exitCode === null && server.signalCode === null) {
server.kill("SIGKILL");
}
await new Promise((r) => setTimeout(r, 500));

Copy link
Copy Markdown
Contributor

@emdashbot emdashbot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the right change for the right problem. The cold-cache SSR deps_ssr race on the Cloudflare adapter branch is real, and extending ssr.optimizeDeps.include with runtime-reached specifiers (middleware chain, lazy adapter entrypoints, UI/runtime subpaths, and bare duplicates of already-chained deps) matches the existing pattern for auth and MCP entrypoints in the same file. The new smoke-test guard asserting zero post-startup re-optimizations is a good regression-prevention mechanism, and serializing smoke files with fileParallelism: false correctly prevents shared .vite cache races.

I found one confirmed issue in the new test: fetchWithRetry silently returns after exhausting all retries instead of throwing. If the server is still returning 500s or is unreachable when the retries run out, the test proceeds as if the route was successfully driven, so imports for that route are never reached and a missing dep could go undetected. The sibling site-matrix-smoke.test.ts already implements the correct throw-after-exhaustion pattern. Fix that and this looks good to land.

Comment on lines +51 to +64
// route so its imports are reached.
for (let attempt = 0; attempt <= retries; attempt++) {
try {
const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) });
if (res.status < 500) return;
} catch {
// retry
}
if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
}
}

/**
* Resolve once server output stops growing for `quietMs`, capped at `capMs`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[needs fixing] fetchWithRetry silently returns when every attempt fails (connection error or status ≥ 500). The callers don’t check the return value, so the test can proceed without ever having driven the route. If the imports for that route are never reached, a missing dep won’t be discovered and the test will give a false negative.

Mirror the sibling implementation in site-matrix-smoke.test.ts: track the last error and throw after the loop.

Suggested change
// route so its imports are reached.
for (let attempt = 0; attempt <= retries; attempt++) {
try {
const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) });
if (res.status < 500) return;
} catch {
// retry
}
if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
}
}
/**
* Resolve once server output stops growing for `quietMs`, capped at `capMs`.
async function fetchWithRetry(url: string, retries = 8, delayMs = 1500): Promise<void> {
let lastError: unknown;
for (let attempt = 0; attempt <= retries; attempt++) {
try {
const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) });
if (res.status < 500) return;
lastError = new Error(`${url} returned ${res.status}`);
} catch (error) {
lastError = error;
}
if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs));
}
throw lastError instanceof Error ? lastError : new Error(`Request failed for ${url}`);
}

@github-actions github-actions Bot added review/awaiting-author Reviewed; waiting on the author to respond and removed review/needs-review No maintainer or bot review yet labels Jun 5, 2026
@ascorbic
Copy link
Copy Markdown
Collaborator

ascorbic commented Jun 5, 2026

This should have been fixed already by #1359

Address PR review feedback on the dep-optimizer cold-cache guard:

- fetchWithRetry now throws lastError after exhausting retries instead of
  returning silently, so a persistently-5xx/unreachable server fails the
  test rather than producing a false green with the routes never driven.
- The SIGKILL fallback now checks `server.exitCode === null` instead of
  `server.killed` (which flips true the moment kill() is called, not on
  exit), so a server that ignores SIGTERM is actually force-killed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions github-actions Bot added review/needs-rereview Author pushed changes since the last review and removed review/awaiting-author Reviewed; waiting on the author to respond labels Jun 5, 2026
@MA2153
Copy link
Copy Markdown
Contributor Author

MA2153 commented Jun 5, 2026

Thanks for the reviews. Addressed the test reliability feedback and reconciled with the #1359 revert.

Test fixes (dep-optimizer-cold-cache.test.ts):

  • fetchWithRetry now throws lastError after exhausting retries instead of returning silently, matching site-matrix-smoke.test.ts. A persistently-5xx/unreachable server now fails the test rather than producing a false green with the routes never driven.
  • The SIGKILL fallback now checks server.exitCode === null instead of server.killed (which flips true the instant kill() is called, not on exit), so a server that ignores SIGTERM is actually force-killed.

Re: @ascorbic — "already fixed by #1359": #1359 was reverted on main in #1363, so that fix is no longer present. I've merged post-revert main into this branch, dropping the exclude/noExternal code that had been pulled in via an earlier merge. The diff against main is now just the ssr.optimizeDeps.include additions, the regression smoke test, fileParallelism: false, and the changeset.

But the include-list change isn't really the point of this PR — the regression test is. Whichever mechanism we land for the immediate cold-cache race, the durable value here is the guard: it boots a cold-cache Cloudflare site, drives the routes, and asserts zero post-startup re-optimizations. That fails automatically the next time any request-time SSR dep goes unoptimized — including future deps that don't exist yet — instead of someone rediscovering the deps_ssr crash by hand. #1359 fixed the instances that existed at the time but shipped no guard against the next one. This test is approach-agnostic; it would equally protect a re-landed exclude/noExternal fix.

Happy to keep just the test on top of whatever fix you prefer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/core cla: signed review/needs-rereview Author pushed changes since the last review size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants