perf(onboard): add deadline-based gateway wait#2492
perf(onboard): add deadline-based gateway wait#2492HOYALIM wants to merge 7 commits intoNVIDIA:mainfrom
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a synchronous polling primitive Changes
Sequence Diagram(s)sequenceDiagram
participant Onboard as Onboard/startGatewayWithOptions
participant Wait as waitUntil
participant Repair as RepairSecrets
participant Meta as AttachMetadata
participant Select as SelectGateway
participant Probe as HealthProbes
rect rgba(200,220,255,0.5)
Onboard->>Wait: call waitUntil(condition, options)
end
rect rgba(220,255,200,0.5)
Wait->>Repair: attempt repair of missing bootstrap secrets
Wait->>Meta: conditionally attach gateway metadata
Wait->>Select: re-select gateway if applicable
Wait->>Probe: run probes (status, gateway info -g <name>, gateway info)
Probe-->>Wait: return probe results
Wait-->>Onboard: isGatewayHealthy? (condition result)
end
rect rgba(255,230,200,0.5)
alt healthy
Onboard->>Onboard: proceed
else not healthy & attempts remain
Wait->>Wait: sleep (bounded/backoff) and retry
else deadline/attempts exhausted
Wait-->>Onboard: return false -> throw Error("Gateway failed to start")
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Pull request overview
This PR introduces a reusable, synchronous, deadline-based polling helper and applies it to the gateway startup health wait loop as part of the onboard latency work (refs #2001), while aiming to preserve existing health poll env knob behavior.
Changes:
- Added
waitUntil(...)tosrc/lib/wait.tswith deadline, backoff, injectable clock/sleeper hooks, and optional attempt cap. - Replaced the gateway startup health polling loop in
src/lib/onboard.tswith the newwaitUntil(...)helper. - Expanded
test/wait.test.tswith focused unit tests forwaitUntil(...)behavior (immediate success, retries, deadline, backoff, and attempt caps).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/wait.test.ts | Adds unit coverage for the new waitUntil(...) helper. |
| src/lib/wait.ts | Introduces the waitUntil(...) synchronous polling primitive and options. |
| src/lib/onboard.ts | Switches gateway health polling to use waitUntil(...) with existing poll count/interval inputs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
test/wait.test.ts (1)
42-172: Add a contract test for invaliddeadlineMs.Please add a test that asserts
waitUntilthrowsTypeErrorfor non-finitedeadlineMs(e.g.,NaN), so the input validation path is locked in.✅ Suggested test
+ it("waitUntil throws when deadlineMs is non-finite", () => { + expect(() => + waitUntil(() => false, { + deadlineMs: Number.NaN, + }), + ).toThrow(TypeError); + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/wait.test.ts` around lines 42 - 172, Add a new test in wait.test.ts using the existing waitUntil harness that verifies passing a non-finite deadlineMs (e.g., NaN) causes waitUntil to throw a TypeError: call waitUntil with a simple predicate and options including deadlineMs: NaN plus the same now and sleep mocks used in other tests, and assert with expect(() => waitUntil(...)).toThrow(TypeError) so the input validation path for deadlineMs is locked in.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3512-3541: The deadline calculation (healthDeadlineMs) only
accounts for sleep budget and can expire while each waitUntil attempt runs
multiple shell-outs (repairGatewayBootstrapSecrets, runCaptureOpenshell,
isGatewayHealthy), shortening the effective retry window; fix by extending or
removing the deadline: either compute healthDeadlineMs to also include an
allowance for per-attempt probe runtime (e.g., add healthPollCount *
probeTimeoutMs or add a conservative maxProbeRuntimeMs) so the deadline >= total
sleep budget + total probe runtime, or omit deadlineMs from the waitUntil call
to rely solely on maxAttempts/healthPollCount; update the healthDeadlineMs
computation or the waitUntil invocation accordingly (references:
healthPollIntervalMs, healthDeadlineMs, healthPollCount, waitUntil,
repairGatewayBootstrapSecrets, runCaptureOpenshell, isGatewayHealthy).
In `@src/lib/wait.ts`:
- Around line 98-99: The loop can hot-spin when intervalMs (or maxIntervalMs) is
0 and maxAttempts is unbounded; change the sleeper call and interval update to
enforce a minimum non-zero sleep (e.g., MIN_SLEEP_MS = 1) so you never call
sleeper(0). Specifically, in the block using sleeper(Math.min(intervalMs,
deadlineMs - currentMs)) and intervalMs = Math.min(maxIntervalMs, intervalMs *
backoffFactor), clamp the sleep duration with Math.max(MIN_SLEEP_MS, ...) and
ensure intervalMs is also floored to at least MIN_SLEEP_MS after applying
backoff so the loop yields CPU even when initialIntervalMs/maxIntervalMs are
zero (references: sleeper, intervalMs, maxIntervalMs, backoffFactor, deadlineMs,
currentMs).
---
Nitpick comments:
In `@test/wait.test.ts`:
- Around line 42-172: Add a new test in wait.test.ts using the existing
waitUntil harness that verifies passing a non-finite deadlineMs (e.g., NaN)
causes waitUntil to throw a TypeError: call waitUntil with a simple predicate
and options including deadlineMs: NaN plus the same now and sleep mocks used in
other tests, and assert with expect(() => waitUntil(...)).toThrow(TypeError) so
the input validation path for deadlineMs is locked in.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 88a34e7e-6332-4895-82d8-e2cfc9f03551
📒 Files selected for processing (3)
src/lib/onboard.tssrc/lib/wait.tstest/wait.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/wait.ts`:
- Around line 94-108: The loop currently calls condition() even after the
deadline guard because the deadline check (using now() and deadlineMs) comes
after condition(); move the deadline/time check so it runs before invoking
condition() (and keep the existing attempts checks), ensuring you check
Number.isFinite(currentMs) and currentMs >= deadlineMs prior to calling
condition() to avoid the extra probe; update the loop in src/lib/wait.ts
(references: attempts, maxAttempts, deadlineMs, now(), condition()) accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d13eeb32-c6e5-4ffb-bbea-dae420740d03
📒 Files selected for processing (2)
src/lib/wait.tstest/wait.test.ts
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/wait.ts (1)
93-109:⚠️ Potential issue | 🟠 MajorHonor an already-expired deadline before the first probe.
Line 95 only checks the deadline after at least one attempt, so
condition()on Line 107 still runs once whendeadlineMs <= now()at entry. For callers using this to drive real health probes, that is still one probe past the requested budget. Please move the deadline guard out of theattempts > 0branch and add a regression test for an already-expired deadline.Suggested fix
let attempts = 0; for (;;) { - if (attempts > 0) { - const currentMs = now(); - if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) { - return false; - } + const currentMs = now(); + if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) { + return false; } if (attempts >= maxAttempts) { return false; }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/wait.ts` around lines 93 - 109, The loop currently only checks the deadline after the first probe (gated by attempts > 0), so move the deadline guard to run on every iteration before invoking condition(): call const currentMs = now(); if (!Number.isFinite(currentMs) || currentMs >= deadlineMs) return false; (remove the attempts > 0 gate) so an already-expired deadline causes an immediate false; also add a regression test that sets deadlineMs <= now() and asserts the wait function returns false (and that condition() is not invoked) to prevent regressions; refer to symbols attempts, now(), deadlineMs, condition(), and maxAttempts when making changes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@src/lib/wait.ts`:
- Around line 93-109: The loop currently only checks the deadline after the
first probe (gated by attempts > 0), so move the deadline guard to run on every
iteration before invoking condition(): call const currentMs = now(); if
(!Number.isFinite(currentMs) || currentMs >= deadlineMs) return false; (remove
the attempts > 0 gate) so an already-expired deadline causes an immediate false;
also add a regression test that sets deadlineMs <= now() and asserts the wait
function returns false (and that condition() is not invoked) to prevent
regressions; refer to symbols attempts, now(), deadlineMs, condition(), and
maxAttempts when making changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ab7479bd-599b-456e-8318-8c5f61cdf8fa
📒 Files selected for processing (2)
src/lib/wait.tstest/wait.test.ts
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/wait.ts`:
- Around line 95-116: The code uses currentMs captured before calling
condition(), which can be stale if condition() is slow; update the timestamp
after the condition() call and before computing remainingMs/sleepDurationMs so
the sleep budget respects deadlineMs and avoids sleeping past the deadline; in
the wait function recompute currentMs = now() (or equivalent) after condition()
returns and use that value to derive remainingMs, requestedSleepMs and to clamp
sleeper(...) (referencing currentMs, now(), deadlineMs, condition(),
remainingMs, requestedSleepMs, MIN_UNCAPPED_SLEEP_MS, hasAttemptCap, and
sleeper).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3b8e8326-56d6-4585-bb6e-fa32ebce51cf
📒 Files selected for processing (2)
src/lib/wait.tstest/wait.test.ts
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
bb47782 to
5a709c4
Compare
|
✨ Thanks for submitting this pull request that proposes a way to improve the performance of NemoClaw's onboard process by introducing a deadline-based wait primitive. Related open issues: |
Summary
Adds a reusable deadline-based wait primitive and applies it to the gateway startup health loop.
This is a scoped first step for the broader onboard latency work in #2001 and keeps the existing health poll environment knobs compatible.
Related Issue
Refs #2001.
Changes
src/lib/wait.tswith synchronouswaitUntil(...)using an absolute deadline, configurable interval backoff, injectable clock/sleeper hooks, and an optional attempt cap.src/lib/onboard.tswithwaitUntil(...).NEMOCLAW_HEALTH_POLL_COUNTandNEMOCLAW_HEALTH_POLL_INTERVALbehavior by passing the configured attempt count and interval into the new wait helper.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)Additional validation run locally:
npm run build:clipasses./node_modules/.bin/vitest run test/wait.test.ts test/gateway-start-wait.test.tspassesnpm run typecheck:clipasses./node_modules/.bin/prettier --check src/lib/wait.ts test/wait.test.tspassesgit diff --checkpassesNotes:
npx prek run --all-filesand fullnpm testwere not completed in the Codex sandbox because the local sandbox blocks GitHub release metadata fetches and localhost/IPC socket binding.AI Disclosure
Signed-off-by: Ho Lim subhoya@gmail.com
Summary by CodeRabbit
New Features
Refactor
Tests