fix(core): pre-bundle runtime-reached SSR deps for cold Cloudflare dev cache#1361
fix(core): pre-bundle runtime-reached SSR deps for cold Cloudflare dev cache#1361MA2153 wants to merge 5 commits into
Conversation
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…v cache Adds a cold-cache guard test that fails whenever a request-time SSR dep is missing from the cloudflare optimizeDeps.include list, and seeds the 11 currently-missing specifiers so the cold optimize completes in one startup pass (no deps_ssr re-optimize race). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
🦋 Changeset detectedLatest commit: 0f88ba6 The changes in this PR will be included in the next version bump. This PR includes changesets to release 14 packages
Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
@emdash-cms/admin
@emdash-cms/auth
@emdash-cms/auth-atproto
@emdash-cms/blocks
@emdash-cms/cloudflare
@emdash-cms/contentful-to-portable-text
emdash
create-emdash
@emdash-cms/gutenberg-to-portable-text
@emdash-cms/plugin-cli
@emdash-cms/plugin-types
@emdash-cms/registry-client
@emdash-cms/registry-lexicons
@emdash-cms/sandbox-workerd
@emdash-cms/x402
@emdash-cms/plugin-ai-moderation
@emdash-cms/plugin-atproto
@emdash-cms/plugin-audit-log
@emdash-cms/plugin-color
@emdash-cms/plugin-embeds
@emdash-cms/plugin-field-kit
@emdash-cms/plugin-forms
@emdash-cms/plugin-webhook-notifier
commit: |
There was a problem hiding this comment.
The approach is sound and targeted. Adding a deterministic guard test that asserts zero post-startup re-optimizations is exactly the right way to prevent a non-deterministic deps_ssr re-optimize race from regressing. The seeding of runtime-reached specifiers into the Cloudflare-only ssr.optimizeDeps.include branch is appropriately scoped — no production build impact, no Node adapter impact.
I checked the new optimize-deps entries against package exports, verified the shared port doesn't collide with the existing site matrix, and traced the virtual-module call sites that justify each middleware and adapter entrypoint. The changeset is present and correctly scoped.
Two issues in the new test file should be addressed before merge:
-
fetchWithRetrysilently returns when exhausted. Unlike the identical helper insite-matrix-smoke.test.ts, this version falls through without throwing if the server is down or consistently 500s. That means the route-driving fetches can fail silently, the imports are never reached, and the test may falsely pass because no re-optimize lines appear. It should throwlastErrorafter the loop. -
server.killedprevents the SIGKILL fallback. In Node.jsChildProcess,killedis set totrueas soon askill()is called, not when the process exits. Afterserver.kill("SIGTERM")theif (!server.killed)guard is always false, so SIGKILL is never sent. Zombie dev-server processes can leak in CI. The idiomatic fix is to checkserver.exitCode === nullinstead.
Once those reliability issues are fixed, this is good to merge.
| // route so its imports are reached. | ||
| for (let attempt = 0; attempt <= retries; attempt++) { | ||
| try { | ||
| const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) }); |
There was a problem hiding this comment.
[needs fixing] fetchWithRetry exhausts all retries without throwing, so a crashed or persistently-500 server causes the route-driving fetches to fail silently. The imports are never reached, and the test may falsely pass because no re-optimize logs appear.
The sibling helper in site-matrix-smoke.test.ts correctly throws lastError after the loop. This implementation should do the same.
| // route so its imports are reached. | |
| for (let attempt = 0; attempt <= retries; attempt++) { | |
| try { | |
| const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) }); | |
| async function fetchWithRetry(url: string, retries = 8, delayMs = 1500): Promise<void> { | |
| let lastError: unknown; | |
| for (let attempt = 0; attempt <= retries; attempt++) { | |
| try { | |
| const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) }); | |
| if (res.status < 500) return; | |
| lastError = new Error(`${url} returned ${res.status}`); | |
| } catch (error) { | |
| lastError = error; | |
| } | |
| if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs)); | |
| } | |
| throw lastError instanceof Error ? lastError : new Error(`Request failed for ${url}`); | |
| } |
There was a problem hiding this comment.
This is a well-targeted bug fix with a good approach. Pre-bundling request-time SSR deps in the Cloudflare branch of ssr.optimizeDeps.include is the right fix for the cold-cache re-optimize race, and adding a regression smoke test that asserts zero post-startup re-optimizations is exactly the guard needed to prevent future deps from slipping through. Serializing smoke tests with fileParallelism: false is the correct call because the new test wipes the shared .vite cache.
I checked the diff against origin/main, verified all added specifiers are real package exports (emdash/ui, emdash/runtime, emdash/media/local-runtime, @emdash-cms/cloudflare/db/d1, astro/zod, etc.), traced sibling smoke-test patterns, and reviewed AGENTS.md conventions. The production change (vite-config.ts) is clean and correct. The changeset is present and properly scoped.
The new test has one real issue that should be fixed before merge: fetchWithRetry silently returns after exhausting retries instead of throwing, which means the test can pass without driving the frontend and dev-bypass routes and miss future missing deps. I also noted a minor cleanup-code issue copied from the existing smoke suite.
There was a problem hiding this comment.
This PR fixes a real, well-understood Cloudflare dev-server issue where missing ssr.optimizeDeps.include entries cause Vite to re-optimize SSR deps at request time on a cold cache, leading to a deps_ssr chunk-hash race and in-flight request crashes. The approach is sound and narrowly scoped:
- The new specifiers added to the Cloudflare branch are all verified exports (
emdash/ui,emdash/runtime,emdash/middleware/*,astro/zod,@emdash-cms/cloudflare/db/d1,@emdash-cms/cloudflare/storage/r2, plus bare forms of@lingui/react,@oslojs/crypto/*, and@cloudflare/kumo/primitives). The inline comments clearly explain why each group is needed and warn against removing apparent "duplicates." - The regression test is a pragmatic guard: it boots
templates/starter-cloudflarewith a wiped.vitecache, drives the core routes, and asserts zero post-startup re-optimization log lines. This is deterministic and catches the root cause rather than the flaky downstream crash. fileParallelism: falseinvitest.smoke.config.tsis the correct fix to prevent cache races between smoke test files.- The changeset targets
emdashwith a patch, which is correct.
One blocking finding on the test: fetchWithRetry in the new smoke test silently returns when all retries are exhausted instead of throwing. If a route persistently fails, the test proceeds without having exercised that route's imports, which defeats the test's core purpose of discovering missing force-includes. The sibling site-matrix-smoke.test.ts uses the same helper name but throws on exhaustion; the new test should match that behavior.
Once the test helper fails loudly on exhaustion, this is good to go.
There was a problem hiding this comment.
The approach is sound: this is a targeted dev-server-only bug fix for a well-understood Vite cold-cache behavior on the Cloudflare adapter. Adding a failing smoke test, expanding ssr.optimizeDeps.include with the missing specifiers, and serializing smoke files so they don't race on a shared .vite cache is the right fix.
I checked the Vite config additions against the package export maps (emdash/* and @emdash-cms/cloudflare/*), traced the astro/zod usage in the tree, and compared the new test utilities to the existing smoke-suite patterns. The include list and changeset are correct.
However, the new test file copies two patterns from sibling smoke code that are actually bugs:
-
fetchWithRetrysilently returns after exhausting retries. Unlike the same-named helper insite-matrix-smoke.test.ts, the new version never throws. If the server persistently 500s or drops connections, the test proceeds without having actually driven the routes, producing a false green (no re-optimize logs because the lazy imports were never reached). -
server.killedis alreadytrueafterkill("SIGTERM"), so the SIGKILL fallback is dead code. A hung server process would leak. The check should look at whether the process has actually exited.
Both issues are in the new dep-optimizer-cold-cache.test.ts file and should be fixed before merge.
| } finally { | ||
| server.kill("SIGTERM"); | ||
| await new Promise((r) => setTimeout(r, 1200)); | ||
| if (!server.killed) server.kill("SIGKILL"); |
There was a problem hiding this comment.
[needs fixing] server.kill("SIGTERM") sets server.killed = true synchronously, so if (!server.killed) is always false and the SIGKILL fallback is dead code. A hung server process would leak past the finally block.
Use exitCode / signalCode to check whether the process has actually exited:
| if (!server.killed) server.kill("SIGKILL"); | |
| server.kill("SIGTERM"); | |
| await new Promise((r) => setTimeout(r, 1200)); | |
| if (server.exitCode === null && server.signalCode === null) { | |
| server.kill("SIGKILL"); | |
| } | |
| await new Promise((r) => setTimeout(r, 500)); |
There was a problem hiding this comment.
This is the right change for the right problem. The cold-cache SSR deps_ssr race on the Cloudflare adapter branch is real, and extending ssr.optimizeDeps.include with runtime-reached specifiers (middleware chain, lazy adapter entrypoints, UI/runtime subpaths, and bare duplicates of already-chained deps) matches the existing pattern for auth and MCP entrypoints in the same file. The new smoke-test guard asserting zero post-startup re-optimizations is a good regression-prevention mechanism, and serializing smoke files with fileParallelism: false correctly prevents shared .vite cache races.
I found one confirmed issue in the new test: fetchWithRetry silently returns after exhausting all retries instead of throwing. If the server is still returning 500s or is unreachable when the retries run out, the test proceeds as if the route was successfully driven, so imports for that route are never reached and a missing dep could go undetected. The sibling site-matrix-smoke.test.ts already implements the correct throw-after-exhaustion pattern. Fix that and this looks good to land.
| // route so its imports are reached. | ||
| for (let attempt = 0; attempt <= retries; attempt++) { | ||
| try { | ||
| const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) }); | ||
| if (res.status < 500) return; | ||
| } catch { | ||
| // retry | ||
| } | ||
| if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs)); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Resolve once server output stops growing for `quietMs`, capped at `capMs`. |
There was a problem hiding this comment.
[needs fixing] fetchWithRetry silently returns when every attempt fails (connection error or status ≥ 500). The callers don’t check the return value, so the test can proceed without ever having driven the route. If the imports for that route are never reached, a missing dep won’t be discovered and the test will give a false negative.
Mirror the sibling implementation in site-matrix-smoke.test.ts: track the last error and throw after the loop.
| // route so its imports are reached. | |
| for (let attempt = 0; attempt <= retries; attempt++) { | |
| try { | |
| const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) }); | |
| if (res.status < 500) return; | |
| } catch { | |
| // retry | |
| } | |
| if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs)); | |
| } | |
| } | |
| /** | |
| * Resolve once server output stops growing for `quietMs`, capped at `capMs`. | |
| async function fetchWithRetry(url: string, retries = 8, delayMs = 1500): Promise<void> { | |
| let lastError: unknown; | |
| for (let attempt = 0; attempt <= retries; attempt++) { | |
| try { | |
| const res = await fetch(url, { redirect: "manual", signal: AbortSignal.timeout(15_000) }); | |
| if (res.status < 500) return; | |
| lastError = new Error(`${url} returned ${res.status}`); | |
| } catch (error) { | |
| lastError = error; | |
| } | |
| if (attempt < retries) await new Promise((r) => setTimeout(r, delayMs)); | |
| } | |
| throw lastError instanceof Error ? lastError : new Error(`Request failed for ${url}`); | |
| } |
|
This should have been fixed already by #1359 |
Address PR review feedback on the dep-optimizer cold-cache guard: - fetchWithRetry now throws lastError after exhausting retries instead of returning silently, so a persistently-5xx/unreachable server fails the test rather than producing a false green with the routes never driven. - The SIGKILL fallback now checks `server.exitCode === null` instead of `server.killed` (which flips true the moment kill() is called, not on exit), so a server that ignores SIGTERM is actually force-killed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Thanks for the reviews. Addressed the test reliability feedback and reconciled with the #1359 revert. Test fixes (
Re: @ascorbic — "already fixed by #1359": #1359 was reverted on But the include-list change isn't really the point of this PR — the regression test is. Whichever mechanism we land for the immediate cold-cache race, the durable value here is the guard: it boots a cold-cache Cloudflare site, drives the routes, and asserts zero post-startup re-optimizations. That fails automatically the next time any request-time SSR dep goes unoptimized — including future deps that don't exist yet — instead of someone rediscovering the Happy to keep just the test on top of whatever fix you prefer. |
What does this PR do?
On a cold Vite cache,
pnpm devin a Cloudflare-adapter EmDash site could throw repeatedly on the first SSR requests:Root cause: the
cloudflarebranch ofssr.optimizeDeps.includeinpackages/core/src/astro/integration/vite-config.tspre-bundles SSR deps so workerd loads them in one cold-startup pass, but several deps are only reached at request time — the middleware chain (behind the excludedvirtual:emdashboundary), lazy adapter entrypoints, first-render imports, and re-export specifiers likeastro/zod. Vite's startup scan misses them, discovers each at request time, and re-optimizes; overlapping re-optimizations race ondeps_ssr/chunk hashes and crash in-flight requests.This PR:
packages/core/tests/integration/smoke/dep-optimizer-cold-cache.test.ts) that bootstemplates/starter-cloudflarewith a wiped.vitecache, drives the dev-bypass / frontend / admin routes, and asserts Vite logs zero post-startup dependency re-optimizations. This fails automatically whenever a request-time SSR dep is missing from the include list — including future ones — instead of someone chasing the crash by hand. (The crash itself is a non-deterministic race; the test asserts on the deterministic upstream re-optimize signal.)@emdash-cms/cloudflare/db/d1+storage/r2,emdash/ui+runtime+media/local-runtime,astro/zod, plus bare forms of a few admin/auth deps the test caught re-optimizing).fileParallelism: false) so the new test never races the existing site-matrix smoke suite on a shared site's.vitecache.These additions are dev-server-only (
ssr.optimizeDeps.include) and Cloudflare-only — no effect on productionastro buildoutput or the Node adapter branch.Type of change
Checklist
pnpm typecheckpassespnpm lintpasses — clean for all files changed in this PR; the repo has 8 pre-existing diagnostics indemos/*/templates/*astro.config.mjsfiles unrelated to this change.pnpm testpasses (or targeted tests for my change) — the cold-cache guard test was verified RED→GREEN locally; CI'stest-smokejob runs it as the authoritative check.pnpm formathas been runemdashpatch)AI-generated code disclosure
Screenshots / test output
RED (before the fix), the guard reported re-optimize lines such as:
GREEN after seeding the specifiers: 0 re-optimizations, 0 crash lines.
🤖 Generated with Claude Code