feat(cursor): operator-driven legacy stream-json -> ACP session migrator (transplant)#34
feat(cursor): operator-driven legacy stream-json -> ACP session migrator (transplant)#34heavygee wants to merge 20 commits into
Conversation
…tor (transplant) Adds a per-session and bulk migrator that moves pre-ACP Cursor sessions (left over from tiann#799's cutover) into the on-disk location `agent acp` reads from, then flips `cursorSessionProtocol = 'acp'` so the existing cursorAcpRemoteLauncher picks the session up on next resume. The migration is a transplant, not a replay: the legacy `store.db` is byte-identical in schema to the ACP one; only the directory layout differs. The migrator copies the legacy `~/.cursor/chats/<wsh>/<uuid>/store.db` into `~/.cursor/acp-sessions/<uuid>/`, synthesizes `meta.json`, drives `initialize + session/load + session/prompt` against a real `agent acp` in an isolated `$HOME` (staging - no pollution of the operator's real store), preserves `lastUsedModel` from the legacy meta record onto the HAPI sessions.model column, then removes the legacy source. Each step has a rollback path. Surfaces: - POST /api/sessions/:id/migrate-to-acp (web button) - POST /api/cursor/migrate-legacy + GET /api/cursor/legacy-candidates - Mirror endpoints under /cli/* (Bearer cliApiToken) for the CLI - `hapi cursor migrate <id> | --all | --dry-run | --keep-source | --force-archive-running | --list` - Web UI menu item on legacy session rows - Auto hapi.db backup before any bulk run Closes tiann#824 Refused-by-default cases: - non-cursor flavor; already on ACP; missing on-disk store; pre-existing ACP target dir; lifecycleState=running (override with --force-archive-running, which archives the session first and waits for the launcher to release the store.db lock). Verified end-to-end: - 25 unit tests covering every refusal and rollback branch (mocked `agent acp` probe + mocked metadata update) - 3 integration tests spawning real `agent acp` (opt-in via `CURSOR_AGENT_INTEGRATION=1`); one drives a synthetic store, two accept operator-supplied fodder uuids via env so real on-disk stores can be exercised without committing private fixtures.
…rict TS
CI typecheck (stricter than the local Bun-resolved tsc) flagged accesses
to ok-success fields (acpSessionId, sourceRemoved, etc.) as unsafe
because the success branch only had `dryRun?: false` while the dry-run
branch had `dryRun: true`. Making `dryRun: false` REQUIRED on the
production-success branch turns the union into a hard discriminator and
lets call sites narrow with `if (out.ok && !out.dryRun) {...}`.
No runtime change to the migrator's behavior - production-success
already returned the full payload; the test assertions just couldn't
prove it to tsc.
Also wires the explicit `dryRun: false` into the migrator's success
return and updates the tests to use the now-discriminator-friendly
narrowing.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2e3d0a18fc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Self-cold-review pass: tryRm(dirname(legacy.storeDbPath)) with recursive=true would silently destroy any non-store.db artifact a future cursor-agent version might drop into the chat dir. Use rmdirSync (only succeeds on empty dirs) so unknown files survive. In practice today only store.db, store.db-wal, store.db-shm ever live there (verified against operator's fodder), so behavior is unchanged on current cursor-agent; this is a defense-in-depth tweak.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a731c1ec55
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
…, P2 401-refresh) P1 (WAL preservation) — Without checkpoint, un-checkpointed transactions in the legacy store.db-wal sibling would either appear stale in the ACP target or be lost outright when the cleanup step removes the WAL file. Added a PRAGMA wal_checkpoint(TRUNCATE) flush of the legacy store before any cp, inside the same pre-flight window where the launcher has been confirmed quiescent. Idempotent on non-WAL stores (no-op). P2 (skipVerify must still run session/load) — The schema docstring described skipVerify as skipping the prompt step. The implementation was bypassing the entire verify block, which would have let a corrupted transplant flip HAPI metadata. Split the verify into 'always-load, optionally-prompt'; skipVerify now skips ONLY the prompt. Tests updated: the existing skipVerify-happy-path expectation rewritten + a new test verifies skipVerify still refuses on session/load failure. P2 (web client 401 refresh) — migrateCursorSessionToAcp now triggers the same onUnauthorized refresh dance as the shared request() helper instead of hard-failing on an expired JWT. Matches existing session-action UX.
…ues)
P1 — Lock release must do a real SQLite busy-probe, not just stat-poll.
killSession kicks off cleanup asynchronously; an idle-but-not-yet-closed
runner can let the stat-based size-stability heuristic report 'released'
while still owning the file lock. Added sqliteLockHeldByOtherProcess()
which opens the legacy store with BEGIN IMMEDIATE; SQLITE_BUSY is the
authoritative signal that another process holds a write lock. The
size-poll is retained as a secondary signal for partial writes.
P2 — preflightSession now refuses on session.active=true as well as
lifecycleState==='running'. Some legacy rows may lack lifecycleState
but still be active in the hub cache; the CLI/API paths relied on the
lifecycle-only check, while the web UI already hid the button for
active sessions. Now uniform across all entry points.
P2 — Atomic ACP target creation. Replaced the existsSync + mkdirSync
(recursive: true) TOCTOU with an exclusive mkdirSync(recursive: false);
EEXIST is caught and converted to a target_already_exists refusal. The
rollback path only fires for the dir we exclusively created.
P2 — cursorSessionId basename validation. Reject anything that isn't
[A-Za-z0-9_.-]+ before any path join, with explicit rejection of '.'
and '..'. Prevents path-traversal in either the chats lookup or the
acp-sessions target.
P2 — hapi.db backup now uses SQLite's serialize() so committed pages in
hapi.db-wal are included. The raw cp from before could omit recent
sessions or pre-migration metadata. Added Store.backupTo(destPath)
exposing the consistent-snapshot path.
Tests:
- session.active=true refusal
- cursorSessionId path-traversal refusal (../escape, has/slash, .,
.., has\backslash)
- existing tests unchanged; full hub suite 299 pass.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2cbe6edb41
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 — Lock release uses session.active poll as the primary signal. An archive() kicks off async cleanup; the closest hub-side proxy for "runner process has exited" is sessionCache.active flipping false (set when the runner socket disconnects). The migrator now polls that before falling back to the SQLite busy-probe + size-poll, then sleeps a 500ms grace after all three pass so any in-flight kernel syscall on the WAL/SHM can complete. SyncEngine injects the cache-poll fn. P2 — Abort bulk migration when the hapi.db backup fails. The previous code logged + continued; without the backup, operators have no rescue handle if a metadata flip lands wrong. Now throws unless the DB is in-memory (test fixtures). API handlers wrap in try/catch and surface the error as HTTP 500 with the backup-failure message. P2 — Windows agent-binary spawn. AcpVerifyProbe.start() now matches cursorAcpRemoteLauncher's shape with shell:true + windowsHide:true on win32, so the `agent.cmd` shim is reachable. Without this the verify probe fails with ENOENT even though normal Cursor ACP sessions work. P2 — Bulk CLI timeout removed. A finite 30-minute axios timeout could cut off a slow operator run mid-flight (verifyTimeoutMs is 120s per session by default; ~15 sessions hits the cap). The hub-side bulk operation is idempotent per session, so the request can safely block until completion. Operators that want to cap a run can split it with a smaller --lifecycle filter. Tests: full hub suite 299 pass + integration (real agent acp) 3 pass on the 34-msg fodder fixture. Typecheck clean across cli/hub/web. Co-authored-by: Cursor <cursoragent@cursor.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@codex review Operator has topped up Codex credits after hitting the 5h Code Review cap. Current HEAD is
Last automatic review verdict was on commit |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 96578a55b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 — Wait for real runner exit before transplanting. The previous awaitSessionInactive() poll was bogus: archiveSession() synchronously calls handleSessionEnd() which sets sessionCache.active=false, so any cache-based wait returns immediately. An idle Cursor process with no SQLite write lock could also pass the busy-probe instantly while still mid-SIGTERM-cleanup. Replaced with a real minimum dwell time (2s default) inside awaitLockRelease, gated on SQLite busy-probe + size-stability. Removed the SyncEngine-side awaitSessionInactive injection (it was the source of the false reassurance). P1 — Honor the agent ACP active-process lock. Cursor's `agent` binary enforces single-instance semantics (see cli/src/agent/backends/acp/agentCliGuard.ts); spawning the verify probe while another `agent acp` is live can SIGTERM the live one and crash the operator's Cursor session. The migrator now reads the $HAPI_HOME/locks/agent-acp-active/pid file and refuses with a new 'acp_transport_active' reason if the lock points to a live PID. The check is injectable for tests. P2 — Exclude active sessions from archived bulk candidates. The default archived bulk path filtered only on lifecycleState==='running', but preflightSession also treats session.active===true as running for legacy rows that lack lifecycle metadata. Without mirroring that here, `hapi cursor migrate --all` would include an active session, migrateOne would refuse it 'running_refused', and the bulk run would exit as failed. Extracted the filter to a pure exported predicate `isLegacyCursorMigrationCandidate()` and added unit tests. Tests: 35 pass (was 28) + integration 3 pass on the 34-msg fodder. Typecheck clean across cli/hub/web. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 4: addresses the 3 P1/P2 findings from the previous Codex pass on commit 96578a5:
HEAD is now eb446b3. 35 unit pass + 3 integration pass on the 34-msg fodder fixture. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: eb446b391b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-authored-by: Cursor <cursoragent@cursor.com>
P1 — Block resumes while mutating the store. Between preflight and the destructive flip+rm, a session could be resumed via the existing hub paths. The resume reads cursorSessionProtocol from hapi.db (still 'legacy' until the flip), spawns a stream-json runner against the legacy source — which we are about to remove. Added getCurrentSession dependency that the migrator calls right before the metadata flip; if it sees active=true / lifecycleState=running / a concurrent flip to acp, it rolls back the ACP placement and refuses with new reason 'session_resumed_during_migrate' (or 'already_acp' for the concurrent flip case). SyncEngine injects a real impl reading the session cache. P2 — Register ACP verify probes in the active lock. The probe now acquires the same $HAPI_HOME/locks/agent-acp-active/ lock the CLI guard uses, just before spawning agent acp, and releases it AFTER the SIGTERM in stop(). Without this, a concurrent `hapi cursor migrate` or model-list request could see no active ACP lock, spawn another agent acp, and the second spawn would SIGTERM this verify per Cursor's single-instance enforcement (cli/src/agent/backends/acp/agentCliGuard.ts). New `hapiHome` option on AcpVerifyProbe lets tests pin the lock to an isolated dir so they don't clobber the operator's real lock. P2 — Override Windows home variables for the verify spawn. HOME alone is insufficient on Windows; the agent binary may resolve via USERPROFILE / HOMEDRIVE / HOMEPATH and end up reading the operator's real .cursor tree. The probe env now sets all three on win32. P2 — Reject single-session --dry-run. The CLI accepted --dry-run for per-session use but the per-session API has no dryRun field, so the flag was silently dropped and the real migration ran. Now throws a clear error instructing the operator to use --keep-source or the bulk path. P2 — Preserve private permissions on hapi.db backups. Store.backupTo() now passes mode 0o600 to writeFileSync and chmods after as belt-and-braces. Without this the rescue copy (which contains session transcripts) could land world-readable under a permissive umask, even though the live hapi.db is locked down to 0600. Tests: 37 pass (was 35) — added rollback test for resume-mid-migration and concurrent-flip-to-acp scenarios. Integration 3 pass on the 34-msg fodder fixture; integration tests now use isolated $HAPI_HOME to avoid touching the operator's real ACP lock. Typecheck clean. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 5: addresses the 5 P1/P2 findings from the previous Codex pass on commit eb446b3:
HEAD is now be7696a. 37 unit pass (+2 rollback tests) + 3 integration pass. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: be7696a857
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P2 v2 — Atomic agent-acp-active lock acquisition. The previous mkdirSync(recursive:true) + writeFileSync was not race-safe: two concurrent migrations could both pass isAgentAcpTransportActive() and then BOTH 'acquire' the lock by writing their pid in succession. Now uses mkdirSync(recursive:false) which is atomic — exactly one caller wins on EEXIST contention. Stale locks (dead pid) are cleared and retried once. start() throws a typed error on race loss; verifyInTempHome catches it and the migrator surfaces it as the acp_transport_active refusal so the operator sees the authoritative reason. Added 4 unit tests for the lock primitive (acquire / refuse-when-held / stale-cleanup / no-clobber-on-stop). P1 v2 — Authoritative active-check inside the metadata flip. The migrator's getCurrentSession recheck (round 5) was best-effort: a resume landing AFTER the recheck but BEFORE the DB update would still slip through. Moved the same check INSIDE flipCursorSessionProtocolToAcp so the read-active and write-protocol pair are within a single synchronous Bun call (no JS interleaving). Added new flip result 'session-active' + corresponding UpdateAfterMigrateResult variant + corresponding refusal mapping back to 'session_resumed_during_migrate'. Added a unit test that drives the path with updateOverride returning session_active. Tests: 313 unit pass (was 308; +4 probe lock, +1 atomic flip) plus 3 integration on the 34-msg fodder fixture. Typecheck clean across cli/hub/web. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 6: addresses the 2 P1/P2 findings from previous pass on be7696a:
HEAD is now a6908df. 313 unit pass + 3 integration pass on 34-msg fodder fixture. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a6908df576
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 v3 — Detect resume-then-exit during the migration window. The active-flag rechecks (both the migrator's getCurrentSession and the flip-time atomic check) only catch resumes that are STILL active at the moment of the check. A brief resume that came in, wrote turns to store.db/store.db-wal, and exited before the check would slip through — the store.db would have diverged from the snapshot we copied to the ACP target. Added an mtime+size fingerprint of the legacy store.db captured right after the WAL checkpoint; compared immediately before the metadata flip and rolled back with new reason 'legacy_store_modified_during_migrate' on any divergence. P2 — Respect metadata.homeDir + metadata.host. Hub processes can run under a different account than the user who created the Cursor session (service-account / multi-machine setups). The legacy chats live under the recorded session-owner home, not the hub's. The migrator now prefers metadata.homeDir for resolving both the legacy source (~/.cursor/chats) and the ACP target (~/.cursor/acp-sessions), falling back to homeDir() when the field is absent. Cross-host sessions (metadata.host != local hostName) are refused with new reason 'cross_host_session' so we never touch a same-id store under the wrong user account. hostName is a new injectable dep with default process.env.HAPI_HOSTNAME || os.hostname(). P2 v3 — Don't clobber freshly-starting CLI ACP locks. The previous retry path treated 'lock dir exists but no pid file' as stale and cleared it — but the CLI guard's registerActiveAcpTransport does mkdir BEFORE writeFileSync, so a probe landing in that small window would delete the live caller's lock. probeLockHolder now returns 'starting' for the pidless case, which falls through to the refuse- path. Only a present-but-dead pid file is treated as stale. Tests: 317 hub pass (was 313; +cross-host refusal, +fingerprint divergence rollback, +homeDir resolution, +mid-startup probe lock refusal). Integration 3 pass on the 34-msg fodder fixture (with injected hostName). Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 7: addresses the 3 P1/P2 findings from pass on a6908df:
HEAD is now 8145b73. 317 hub pass (+4 new tests) + 3 integration pass. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8145b73551
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P1 v4 — Include WAL/SHM sidecars in the divergence fingerprint. The previous fingerprint only stat'd the main store.db. SQLite stages new commits in store.db-wal before merging into the main file, so a brief resume that wrote turns to the WAL but did not trigger a checkpoint would leave store.db unchanged while the WAL contained the new data. Now fingerprints all three (main, -wal, -shm) and refuses on any change/appearance/disappearance. Added a unit test that simulates the WAL-only divergence case. P2 v2 — Reorder preconditions BEFORE archive. Bulk runs with --force-archive-running could kill a session and then refuse to migrate it for reasons known at preflight time (cross-host, acp_transport_active, target_already_exists). Moved all filesystem/state-only refusals to run before archiveSession so we never amputate a session we cannot then migrate. New tests assert that archiveCalls remains empty for the two refusal cases the bot flagged (cross-host + acp-transport-active). Tests: 320 hub pass (was 317; +1 WAL fingerprint, +2 no-archive-on- refusal). Integration 3 pass on 34-msg fodder. Typecheck clean. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 8: addresses the 3 P1/P2 findings from pass on 8145b73:
HEAD is now 6bc2b78. 320 hub pass (+3 new tests) + 3 integration pass. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6bc2b7853a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
P2 v5 — Treat pid-less ACP locks as active. The CLI agent guard
creates the agent-acp-active lock directory BEFORE writing the pid
file. defaultIsAgentAcpTransportActive() only checked the pid file,
so a CLI startup window reported active=false; bulk migrations with
--force-archive-running would then archive a session and let
verifyInTempHome() refuse on the same lock anyway, leaving the
session orphaned. Now the default treats lock-dir-exists-but-no-pid
(and malformed pid files) as active so we refuse early, before any
archive side effect.
P3 — Bulk dry-run no longer counts as migrated. The previous
`migrated: migrated + dryRun` rollup meant a pure preview reported
`Migrated: N` even though no metadata or filesystem changes were
made. Split CursorMigrateBulkResult.dryRun out so operators and
automation can distinguish preview from real mutation. CLI summary
updated to show both counters.
P2 v5 — Resume-race recheck no longer trips on stale lifecycle.
After our own archiveSession() returns, the cache shows active=
false synchronously but the metadata cleanup that flips
lifecycleState 'running' → 'archived' may still be in flight. The
recheck was refusing as session_resumed_during_migrate in that
window — making force-archive migration of running sessions a
cleanup-race coin flip. Now we trust the active flag as the
authoritative live-runner signal:
- Resume-race recheck: refuse only on active=true (always), or on
lifecycleState=running when wasActive=false (external resume).
- Atomic flip in flipCursorSessionProtocolToAcp: refuse only on
active=true.
Tests: 2 new tests cover the stale-lifecycle no-trip + external-resume
trip branches. 46 pass in migrator suite, 320 in hub overall, 3
integration pass. Typecheck clean across hub/cli/web.
Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 9: addresses the 3 P2/P3 findings from pass on 6bc2b78:
HEAD: 06cb2e6. 320 hub pass (+2 new), 3 integration pass, typecheck clean across hub/cli/web. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06cb2e6bb2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-authored-by: Cursor <cursoragent@cursor.com>
P3 v6 — Dry-run reflects real refusals. Previously every dry-run candidate emitted ok:true/would-migrate without running ANY of the refusal checks. A bulk preview could say "would migrate 50" while the real run refused half (cross-host, missing legacy store, target collision, ACP transport active). Now the migrator accepts a dryRun option and short-circuits after ALL precondition refusals but BEFORE any side effect. Operators get the safety preview they thought they were getting. New tests cover the happy preview + two refusal-during-dry-run cases. P2 v6 — Don't archive stale-running rows with no live runner. wasActive used to be `lifecycleState === 'running' || active`. A stale lifecycle metadata row with active=false has no live runner but force-archive-running would still call archiveSession() on it, which fails because the runner has no registered handler — these sessions then could not be migrated at all. Now wasActive is `session.active === true` only; the stale lifecycle value is treated as metadata to clean up via the upcoming flip. New test asserts archive RPC is NOT called for stale-running-but-inactive rows. Existing force-archive tests updated to set active=true so they still exercise the archive path. P2 v6 — AcpVerifyProbe holds the agent-acp-active lock until child exits. stop() previously sent SIGTERM and released the lock immediately, but agent acp can take 100s of ms to tear down. In serial bulk runs the next verifier could acquire the now-free lock and spawn a second `agent acp` while the previous one was still alive, hitting Cursor's single-instance kill. stop() now awaits exit/close (with a 5s safety ceiling) before releasing. P3 v6 — Close the pid-less ACP lock window. Both CLI guard and the hub probe create the lock dir BEFORE writing the pid file, leaving a sub-syscall window where the dir exists but pid does not. The CLI guard's clearStaleAcpLockIfNeeded() used to delete pid-less dirs as "stale", which could clobber a freshly-acquired lock held by a still-starting holder. The CLI guard now treats pid-less dirs as a mid-startup window (NOT stale); only dirs with a pid file pointing at a dead pid are cleared. The probe's pid write was also tightened to follow mkdirSync with no intervening async. Test updated: pid-less → active (was: → cleared); new test for pidful-but-dead → cleared. Tests: 50 migrator pass (+4 new), 326 hub pass, 5 agentCliGuard pass (1 rewritten + 1 new), 3 integration pass. Typecheck clean across hub/cli/web. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review Round 10: addresses the 4 P2/P3 findings from pass on 06cb2e6:
HEAD: 9e2cd5e. 50 migrator pass (+4 new), 326 hub overall, 5 agentCliGuard pass, 3 integration pass. Typecheck clean across hub/cli/web. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e2cd5e014
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-authored-by: Cursor <cursoragent@cursor.com>
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
@codex review Quota's back. HEAD is c3a984d (9e2cd5e + empty nudge). Round 10 addressed the 4 P2/P3 findings you flagged on 06cb2e6:
CI is green, 326 hub tests + 50 migrator tests + 3 integration tests pass, typecheck clean across hub/cli/web. 0 unresolved threads. Looking for a clean pass. |
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5010bd77b9
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Six P2/P3 findings from chatgpt-codex-connector on commit 5010bd7: 1. P3 — Treat partially-written pid files as active (cli/src/agent/backends/acp/agentCliGuard.ts) The previous round's pid-less check only protected the "no file" case. An empty/malformed pid file (mid-write window between mkdirSync and writeFileSync flush) would fall through to readLockPid() returning null and removeAcpLockDir() would clobber the live holder. Add probeLockPidFile() that returns missing | starting | live | dead. Only the 'dead' state clears the lock. 2. P2 — Clear stale running lifecycle during ACP flip (hub/src/sync/syncEngine.ts) When migrating a stale-running row (active=false, lifecycleState= running) the flip used to preserve all old metadata and only add cursorSessionProtocol:'acp'. Stale lifecycle survived and downstream code keyed on metadata.lifecycleState would keep treating the archived ACP session as live. Set lifecycleState:'archived' when the old value was 'running'. 3. P2 — Kill Windows process tree before releasing ACP lock (hub/src/cursor/acpVerifyProbe.ts) On Windows we spawn 'agent acp' through a shell (shell: true), so proc.kill only signals the wrapper. The agent child can survive and the next verifier acquires the freed lock + spawns a second agent while ours is still running. Use 'taskkill /F /T /PID' to kill the tree on win32 with SIGTERM fallback. 4. P2 — Reserve ACP lock before archiving (hub/src/cursor/cursorLegacyMigrator.ts) isAgentAcpTransportActive() was a point-in-time check; the actual lock acquisition didn't happen until verifyInTempHome much later. In the gap another agent acp could start and a --force-archive migration would kill the legacy session before refusing. Extract tryAcquireAcpActiveLock() as a free helper, acquire it in migrateOne BEFORE archive, hold across the entire mutation window, release in finally. Verify probe gets skipLockAcquire:true so it inherits the caller's hold instead of double-acquiring. 5. P2 — Refuse when ACP lock cannot be claimed (hub/src/cursor/acpVerifyProbe.ts) Non-EEXIST mkdir errors (permission denied, read-only fs, disk full) used to fall through to lockHeld=false and start() spawned agent acp UNGUARDED. Throw instead so verifyInTempHome surfaces the refusal cleanly. 6. P2 — Remove pid-less locks owned by the probe (hub/src/cursor/acpVerifyProbe.ts) If we successfully mkdir'd the lock dir but pid write failed, the old releaseLock() skipped removal because the pid file was absent. With the CLI guard now treating pid-less dirs as 'starting' (active), that left a permanently-stuck lock. We own the mkdir; release() must rmdir even when pidless. Only skip removal when the pid file contains a DIFFERENT valid pid (another holder somehow took over). Tests: - hub/cursor: 50 unit + 2 new lifecycle tests (lock acquired/released on success and on mid-flow refusal) - hub/probe: 12 unit tests including new tryAcquireAcpActiveLock helper, skipLockAcquire bypass behavior, pidless-owned release - cli/guard: 7 tests including empty pid file + malformed pid file treated as starting (mid-write race) - integration: 3 pass with injected acquireAcpActiveLock pointing at tmp dir (isolates from operator's live HAPI_HOME lock) Typecheck clean across hub/cli/web. Total: 335 hub tests + 7 cli + 3 integration pass; 0 unresolved threads. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review HEAD is now 5899a00. Addressed all 6 round-11 findings:
335 hub + 7 cli guard + 3 integration tests pass. Typecheck clean across hub/cli/web. 0 unresolved threads. Looking for a clean pass. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5899a00870
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: fc7b2e15af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Two P2 findings from chatgpt-codex-connector on commit fc7b2e1: 1. P2 — Make real ACP spawns honor the migration lock (cli/src/agent/backends/acp/AcpStdioTransport.ts + agentCliGuard.ts) When the hub-side migrator held the agent-acp-active lock, a normal Cursor ACP resume STILL spawned `agent acp`: registerActiveAcpTransport() used mkdirSync(recursive:true), which silently succeeds on an existing dir and overwrote the migrator's pid file. Worse, AcpStdioTransport spawned agent FIRST and called register SECOND, so even an atomic register couldn't prevent the racy spawn. - registerActiveAcpTransport() now uses mkdirSync(recursive:false) and throws AcpLockHeldError when another live (or mid-startup) process holds the lock. Stale (dead-pid) locks are still cleared and retried once. In-process refcount is preserved so the same CLI process can re-enter without re-claiming the on-disk lock. - AcpStdioTransport reordered: register FIRST, spawn SECOND. On spawn failure the register is released. AcpLockHeldError now propagates to the caller so resume/start refuses cleanly instead of spawning an agent that will get killed by Cursor's single- instance enforcement. - unregisterActiveAcpTransport() now only removes the lock dir when the pid file confirms we own it (or no pid file exists, which means we own the mkdir-but-pid-write-failed case). 2. P2 — Reject WAL reappearance before setting the baseline (hub/src/cursor/cursorLegacyMigrator.ts) PRAGMA wal_checkpoint(TRUNCATE) zeros store.db-wal. The post- checkpoint fingerprint captured whatever was there at the time; if a legacy process resumed and wrote a turn between checkpoint return and our stat(), the WAL grew above zero and became the baseline. Since the verify-copy and ACP-placement copy only store.db, those frames would never transplant — and the post- fingerprint mtime/size comparison would still match, allowing cleanup to delete the legacy WAL with frames lost. Added a post-checkpoint stat: if store.db-wal exists with size > 0, refuse the migration with legacy_store_modified_during_migrate ('between checkpoint and fingerprint capture'). Extracted checkpointLegacyStore as an injectable dep so the unit test can simulate a writer landing in the gap. Tests: - hub/cursor: 53 unit tests, including new 'refuses early when WAL has content immediately after the checkpoint' test that injects a checkpoint stub which leaves 4 bytes in -wal. - cli/guard: 11 tests, including 4 new ones for registerActiveAcpTransport throws-on-held, refuses-on-pidless, in-process-refcount-reuse, and stale-lock-clearing. - cli/vitest: 944 tests (full suite), all green. - integration: 3 pass with real agent acp + injected lock target. Typecheck clean across hub/cli/web. 0 unresolved threads. Co-authored-by: Cursor <cursoragent@cursor.com>
|
@codex review HEAD is now cb614b6. Round 12 addressed both P2 findings:
336 hub + 944 cli (vitest) + 11 cli (bun:test) + 3 integration tests pass. Typecheck clean. 0 unresolved threads. Looking for a clean pass. |
|
Codex Review: Didn't find any major issues. You're on a roll. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
Summary
Adds an operator-driven migrator that moves pre-ACP Cursor sessions (left over after tiann#799's cutover) onto ACP via a transplant strategy:
~/.cursor/chats/<wsh>/<uuid>/store.dbinto~/.cursor/acp-sessions/<uuid>/store.dbmeta.jsonsidecar from HAPI's session metadata$HOMEby spawningagent acpand drivinginitialize+session/load+ one tinysession/promptmetadata.cursorSessionProtocol = 'acp'(and preserves legacylastUsedModelontosessions.model)Sees
tiann/hapi#824for the design and the spike report atdocs/plans/2026-06-06-cursor-legacy-to-acp-spike.mdfor the underlying evidence (identical SQLite schema, only the directory layout differs between legacy and ACP).Surfaces
POST /api/sessions/:id/migrate-to-acp(web UI button)POST /api/cursor/migrate-legacy(bulk, auto-backups~/.hapi/hapi.db)GET /cli/cursor/legacy-candidates(listing)/cli/*endpoints for the operator CLI (BearercliApiTokenauth)hapi cursor migrate <id> | --all | --dry-run | --keep-source | --force-archive-running | --listflavor=cursor,cursorSessionProtocol != 'acp', and!session.activeRefusal-by-default contract
lifecycleState === 'running'unless--force-archive-running(which archives the session first and waits for the launcher to release the file lock)Every failure path leaves the legacy
store.dbuntouched. The legacy source is removed ONLY aftersession/load+session/promptsucceed AND thehapi.dbflip lands cleanly. A--keep-sourceflag preserves the legacy store even after a successful migration (paranoid bulk runs).Verification
agent acpprobe + mocked metadata update).bun test src/cursor/cursorLegacyMigrator.test.ts→ 25/25 pass.agent acp(opt-in viaCURSOR_AGENT_INTEGRATION=1):buildSyntheticLegacyStoreLEGACY_FODDER_WSH+LEGACY_FODDER_UUIDso real on-disk stores can be exercised without committing private fixturesbun run typecheckclean across cli/web/hub.Risk profile
session/loadsuccess against the transplanted store.hapi.dbwrite is versioned (updateSessionMetadatawithmetadataVersioncheck + 1 retry on version-mismatch).hapi.db(*.bak.before-migrate-<ts>).$HOMEso the verify-prompt never pollutes the operator's realacp-sessionsstore - the real placement is just a freshcpof the original legacy store.Out of scope (mentioned in tiann#824 for future work)
hapi cursor import --from-cursor-diskcommand for the 300+ chats in~/.cursor/chats/that HAPI has no metadata row for (e.g. IDE-only chats never seen by the HAPI CLI). Separate concern (discovery + metadata synthesis), separate PR.Test plan (reviewer)
tiann/hapi#824cursorLegacyMigrator.test.tsmatch the contract inapiTypes.ts(CursorMigrateRefusalReason)flipCursorSessionProtocolToAcpengine method handlesversion-mismatchcorrectlycursor.ts(themigratesubcommand keyword)