Problem
GET /api/sessions/{id}/codex-models intermittently returns 500 after 120 seconds, which gets wrapped as a 502 Bad Gateway by Cloudflare Tunnel.
Root Cause
In cli/src/modules/common/codexModels.ts, listCodexModels() creates a new CodexAppServerClient on every call, which spawns a fresh codex app-server child process:
```typescript
export async function listCodexModels(includeHidden: boolean = false): Promise<CodexModelSummary[]> {
  const client = new CodexAppServerClient();
  try {
    await client.connect();                                       // spawns "codex app-server" process
    await client.initialize(...);                                 // 30s timeout
    const response = await client.listModels({ includeHidden });  // 30s timeout
    ...
  } finally {
    await client.disconnect();                                    // kills the process
  }
}
```
The hub-side RPC timeout is MODEL_LIST_RPC_TIMEOUT_MS = 120_000 (120s). When the spawned codex app-server is slow to respond (e.g., when an OpenAI token refresh stalls), the RPC times out at 120s and the hub returns 500.
Evidence
From hub.log, out of ~168 codex-models requests:
- 163 (97%) succeeded in 1-5 seconds
- 5 (3%) timed out at exactly 120s → 500
From runner.log, the codex app-server processes spawned for model listing consistently exit within ~1 second:
```
[09:21:39.748] List Codex models request
[09:21:39.748] [CodexAppServer] Connected
[09:21:40.147] Codex app-server exited (code=0, signal=null)
[09:21:40.157] [CodexAppServer] Disconnected
```
The 1-second exit suggests the app-server sometimes fails silently (exits before completing listModels), triggering the full 120s RPC timeout on the hub side.
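If a silent early exit is what triggers the timeout, the runner could fail fast by racing the pending request against the child process's `exit` event. The sketch below is hypothetical. `raceAgainstExit` is not part of hapi, and the sketch assumes the runner can reach the `ChildProcess` that `CodexAppServerClient` spawns; the client may need to expose it.

```typescript
import type { ChildProcess } from "node:child_process";

/**
 * Hypothetical helper (not hapi code): settle with `work` unless the child
 * process exits first, in which case reject immediately rather than leaving
 * the caller to wait for an outer RPC timeout.
 */
export function raceAgainstExit<T>(child: ChildProcess, work: Promise<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const onExit = (code: number | null, signal: string | null) => {
      reject(new Error(`app-server exited early (code=${code}, signal=${signal})`));
    };

    // The process may already be gone before we start listening.
    if (child.exitCode !== null || child.signalCode !== null) {
      onExit(child.exitCode, child.signalCode);
      return;
    }

    child.once("exit", onExit);
    work.then(
      (value) => {
        child.off("exit", onExit); // normal completion: stop watching for exit
        resolve(value);
      },
      (err) => {
        child.off("exit", onExit);
        reject(err);
      },
    );
  });
}
```

Inside `listCodexModels()` this would wrap the request, e.g. `raceAgainstExit(child, client.listModels({ includeHidden }))`, where `child` is the assumed process handle. The hub would then receive an error shortly after the process exits instead of after 120s.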
Environment
- hapi: 0.19.0
- codex-cli: 0.136.0
- OS: Ubuntu 24.04 (Linux x86_64)
- codex auth mode: chatgpt (OAuth token with refresh)
Suggested Fix
- Cache the model list at the runner/machine level (models rarely change; cache for 5-10 minutes)
- Reuse a persistent app-server process instead of spawning one per request
- Reduce the RPC timeout: 120s is excessive for a model list that normally returns in under 5s
- Add faster failure detection: if the app-server exits early, return an error immediately instead of waiting for the full RPC timeout
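The caching and timeout suggestions could be combined in a small amount of code. The sketch below illustrates one approach and is not existing hapi code. `TtlCache` and `withTimeout` are hypothetical names, and the TTL and timeout values would need tuning. Concurrent requests share one in-flight load, so a burst of requests does not spawn several app-servers.

```typescript
// Hypothetical helpers, not hapi code: names, TTL and timeout values are illustrative.

/** Reject if `work` has not settled within `ms`. Does not cancel `work` itself. */
export function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  // Clear the timer either way so a fast success does not keep the event loop alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

/** Caches one value per key for `ttlMs` and shares a single in-flight load per key. */
export class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly inFlight = new Map<K, Promise<V>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: K, load: () => Promise<V>): Promise<V> {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return Promise.resolve(hit.value);

    // Concurrent callers await the same load instead of each spawning an app-server.
    const pending = this.inFlight.get(key);
    if (pending) return pending;

    const p = load()
      .then((value) => {
        // Only successful results are cached; a failed load is retried on the next call.
        this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
        return value;
      })
      .finally(() => this.inFlight.delete(key));
    this.inFlight.set(key, p);
    return p;
  }
}
```

For example, the runner could keep one `new TtlCache<boolean, CodexModelSummary[]>(5 * 60_000)` keyed by `includeHidden` and call `cache.get(includeHidden, () => withTimeout(listCodexModels(includeHidden), 15_000, "listCodexModels"))`. Note that `withTimeout` only stops waiting. The spawned process is still cleaned up by the `finally` block in `listCodexModels()` once the underlying call settles.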