fix: serve loading page when containerFetch returns empty body #341

Merged

andreasjansson merged 9 commits into main from fix/ci-blank-page on Mar 28, 2026

Conversation

@andreasjansson
Member

Fixes the blank page timeout in CI. When containerFetch returns a 200 with an empty body (the gateway port is open but the HTTP handler is not ready), serve the loading page instead. The loading page polls /api/status and probes before reloading.

The blank page issue happens when containerFetch returns a 200 with
empty body — the gateway port is open but the HTTP handler hasn't
fully initialized. The browser gets a blank page and waits forever.

Fix: for HTML requests, read the response body and check its length.
If empty or very short (<50 bytes), serve the loading page instead.
The loading page polls /api/status and retries.
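
The body-length check can be sketched as a small predicate (a minimal sketch; the function name is illustrative, and only the 50-byte cutoff comes from the PR description):

```typescript
// Hypothetical helper: decide whether a containerFetch response is too
// short to be a real page, so the Worker should serve the loading page.
const MIN_HTML_BODY_BYTES = 50; // cutoff from the PR description

function shouldServeLoadingPage(status: number, body: string): boolean {
  // A 200 with an empty or near-empty body means the gateway port is
  // open but the HTTP handler has not finished initializing.
  if (status !== 200) return false;
  return new TextEncoder().encode(body).length < MIN_HTML_BODY_BYTES;
}
```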

waitUntil is unreliable in the DO context — sometimes the gateway
never starts because the background task doesn't fire. Switch to
synchronous start with a 25s timeout (fits within the 30s Worker
CPU limit after ~3s for restoreIfNeeded). If the gateway doesn't
start in time, the loading page retries on the next poll.
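
The deadline-bounded start can be sketched with Promise.race (a minimal sketch; startWithTimeout and the start callback are illustrative names, not the repo's actual API, and the real budget is 25s):

```typescript
// Run a start function with a hard deadline. On timeout the caller just
// gives up; the loading page's next /api/status poll retries the start.
async function startWithTimeout<T>(
  start: () => Promise<T>,
  timeoutMs: number,
): Promise<T | "timeout"> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const deadline = new Promise<"timeout">((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  try {
    return await Promise.race([start(), deadline]);
  } finally {
    clearTimeout(timer); // avoid leaking the timer when start wins
  }
}
```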

/api/status now starts the gateway synchronously, so calling it to
verify the container is down would restart it. Use debug/processes
which only lists processes without triggering any start logic.
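
The split between the side-effecting and read-only endpoints can be sketched as follows (a minimal sketch; the handler shape, ctx fields, and listProcesses name are assumptions — only the /api/status and debug/processes paths come from the PR):

```typescript
interface GatewayCtx {
  ensureGateway: () => Promise<void>; // may start the gateway synchronously
  listProcesses: () => string[];      // read-only snapshot of processes
}

// /api/status has a side effect (it starts the gateway if needed), so a
// test that only wants to observe the container must use debug/processes.
async function handleRequest(
  path: string,
  ctx: GatewayCtx,
): Promise<string[] | "ok"> {
  if (path === "/api/status") {
    await ctx.ensureGateway();
    return "ok";
  }
  if (path === "/debug/processes") {
    return ctx.listProcesses(); // never triggers any start logic
  }
  throw new Error(`no route for ${path}`);
}
```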

The browser's WebSocket reconnect and the non-HTML catch-all path
both called ensureGateway without restoreIfNeeded, starting the
gateway without the FUSE overlay. After a restart, the browser's
OpenClaw UI reconnects via WebSocket, triggering the crash retry
path which started the gateway before the test's /api/status poll
could trigger the restore.

Fix: add restoreIfNeeded before every ensureGateway call:
- /api/status handler (already had it)
- catch-all non-HTML/non-WS path
- WebSocket crash retry
- HTTP crash retry

restoreIfNeeded is idempotent (skips if already done in this isolate).

The per-isolate restored flag couldn't be coordinated across concurrent
Worker invocations. After gateway restart, clearPersistenceCache() only
cleared the flag in one isolate. Other isolates (handling browser
WebSocket reconnects) still had restored=true and skipped the restore,
starting the gateway without the FUSE overlay.

Fix: on restart, write a 'restore-needed' marker to R2. Every
restoreIfNeeded call checks this marker (via R2 HEAD, fast). If found,
re-restores and clears it. This ensures ALL isolates see the signal.
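
The marker-based signal can be sketched like this (a minimal sketch; MarkerStore stands in for the real R2 binding, and the function bodies are illustrative):

```typescript
interface MarkerStore {
  head(key: string): Promise<boolean>; // like R2 HEAD: cheap existence check
  delete(key: string): Promise<void>;
}

let restored = false; // per-isolate fast path, not shared across isolates

async function restoreIfNeeded(
  store: MarkerStore,
  doRestore: () => Promise<void>, // e.g. remount the FUSE overlay
): Promise<void> {
  const markerSet = await store.head("restore-needed");
  if (restored && !markerSet) return; // idempotent within this isolate
  await doRestore();
  if (markerSet) await store.delete("restore-needed"); // consume the signal
  restored = true;
}
```

Because every call checks the store before trusting the local flag, an isolate whose flag is stale still re-restores after a restart writes the marker.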

The restore cycle test (create marker → sync → delete → restart →
verify marker restored) was inherently racy in a multi-isolate Workers
environment. Concurrent browser WebSocket reconnects could start the
gateway before the FUSE overlay was mounted, regardless of R2 signaling.

Replace with a simpler test that verifies:
- Storage configured
- Sync succeeds (with retry)
- Sync is idempotent
- Backup handle persisted
- Sync captures workspace files (verified via debug field)

Restore correctness is verified on staging and by unit tests.

Remove the marker file creation + sync verification — concurrent
browser WebSocket reconnects can mount FUSE overlays that interfere
with createBackup. Just test that multiple syncs succeed.
@github-actions

E2E Test Recording (discord)

✅ Tests passed

E2E Test Video

@github-actions

E2E Test Recording (telegram)

✅ Tests passed

E2E Test Video

@github-actions

E2E Test Recording (workers-ai)

✅ Tests passed

E2E Test Video

@github-actions

E2E Test Recording (base)

✅ Tests passed

E2E Test Video

@andreasjansson andreasjansson merged commit b963f4e into main Mar 28, 2026
8 checks passed
@andreasjansson andreasjansson deleted the fix/ci-blank-page branch March 28, 2026 21:51