Skip to content

feat(cursor): operator-driven legacy stream-json -> ACP session migrator (transplant)#34

Open
heavygee wants to merge 20 commits into
mainfrom
spike/cursor-legacy-to-acp-migrator
Open

feat(cursor): operator-driven legacy stream-json -> ACP session migrator (transplant)#34
heavygee wants to merge 20 commits into
mainfrom
spike/cursor-legacy-to-acp-migrator

Conversation

@heavygee
Copy link
Copy Markdown
Owner

@heavygee heavygee commented Jun 6, 2026

Summary

Adds an operator-driven migrator that moves pre-ACP Cursor sessions (left over after tiann#799's cutover) onto ACP via a transplant strategy:

  • Copies the legacy ~/.cursor/chats/<wsh>/<uuid>/store.db into ~/.cursor/acp-sessions/<uuid>/store.db
  • Synthesizes the meta.json sidecar from HAPI's session metadata
  • Verifies in an isolated temp $HOME by spawning agent acp and driving initialize + session/load + one tiny session/prompt
  • Flips metadata.cursorSessionProtocol = 'acp' (and preserves legacy lastUsedModel onto sessions.model)
  • Removes the legacy source store - gated on observable verify success

Sees tiann/hapi#824 for the design and the spike report at docs/plans/2026-06-06-cursor-legacy-to-acp-spike.md for the underlying evidence (identical SQLite schema, only the directory layout differs between legacy and ACP).

Surfaces

  • POST /api/sessions/:id/migrate-to-acp (web UI button)
  • POST /api/cursor/migrate-legacy (bulk, auto-backups ~/.hapi/hapi.db)
  • GET /cli/cursor/legacy-candidates (listing)
  • Mirrored /cli/* endpoints for the operator CLI (Bearer cliApiToken auth)
  • hapi cursor migrate <id> | --all | --dry-run | --keep-source | --force-archive-running | --list
  • Web UI "Migrate to ACP" menu item, gated on flavor=cursor, cursorSessionProtocol != 'acp', and !session.active

Refusal-by-default contract

  • not a cursor session
  • already on ACP
  • missing on-disk store
  • ACP target directory already exists (collision)
  • lifecycleState === 'running' unless --force-archive-running (which archives the session first and waits for the launcher to release the file lock)

Every failure path leaves the legacy store.db untouched. The legacy source is removed ONLY after session/load + session/prompt succeed AND the hapi.db flip lands cleanly. A --keep-source flag preserves the legacy store even after a successful migration (paranoid bulk runs).

Verification

  • 25 unit tests covering every refusal and rollback branch (mocked agent acp probe + mocked metadata update). bun test src/cursor/cursorLegacyMigrator.test.ts → 25/25 pass.
  • 3 integration tests spawning a real agent acp (opt-in via CURSOR_AGENT_INTEGRATION=1):
    • one drives a synthetic legacy store from buildSyntheticLegacyStore
    • two accept operator-supplied fodder UUIDs via LEGACY_FODDER_WSH + LEGACY_FODDER_UUID so real on-disk stores can be exercised without committing private fixtures
  • Full hub suite: 296/296 pass + 3 integration-skipped (CI-default).
  • Full CLI + shared + web suites: 1055/1055 pass.
  • bun run typecheck clean across cli/web/hub.

Risk profile

  • All mutations are gated on a session/load success against the transplanted store.
  • The hapi.db write is versioned (updateSessionMetadata with metadataVersion check + 1 retry on version-mismatch).
  • Bulk runs auto-backup hapi.db (*.bak.before-migrate-<ts>).
  • Verify happens in a temp $HOME so the verify-prompt never pollutes the operator's real acp-sessions store - the real placement is just a fresh cp of the original legacy store.

Out of scope (mentioned in tiann#824 for future work)

  • A hapi cursor import --from-cursor-disk command for the 300+ chats in ~/.cursor/chats/ that HAPI has no metadata row for (e.g. IDE-only chats never seen by the HAPI CLI). Separate concern (discovery + metadata synthesis), separate PR.

Test plan (reviewer)

  • Skim the migrator's 9-step recipe against the §Recipe block in tiann/hapi#824
  • Confirm refusal paths in cursorLegacyMigrator.test.ts match the contract in apiTypes.ts (CursorMigrateRefusalReason)
  • Verify the flipCursorSessionProtocolToAcp engine method handles version-mismatch correctly
  • Confirm the web button is invisible on already-ACP sessions and on running sessions
  • Spot-check the CLI dispatcher in cursor.ts (the migrate subcommand keyword)

heavygee added 2 commits June 6, 2026 21:28
…tor (transplant)

Adds a per-session and bulk migrator that moves pre-ACP Cursor sessions
(left over from tiann#799's cutover) into the on-disk location `agent acp` reads
from, then flips `cursorSessionProtocol = 'acp'` so the existing
cursorAcpRemoteLauncher picks the session up on next resume.

The migration is a transplant, not a replay: the legacy `store.db` is
byte-identical in schema to the ACP one; only the directory layout differs.
The migrator copies the legacy `~/.cursor/chats/<wsh>/<uuid>/store.db` into
`~/.cursor/acp-sessions/<uuid>/`, synthesizes `meta.json`, drives
`initialize + session/load + session/prompt` against a real `agent acp` in
an isolated `$HOME` (staging - no pollution of the operator's real store),
preserves `lastUsedModel` from the legacy meta record onto the HAPI
sessions.model column, then removes the legacy source. Each step has a
rollback path.

Surfaces:
  - POST /api/sessions/:id/migrate-to-acp (web button)
  - POST /api/cursor/migrate-legacy + GET /api/cursor/legacy-candidates
  - Mirror endpoints under /cli/* (Bearer cliApiToken) for the CLI
  - `hapi cursor migrate <id> | --all | --dry-run | --keep-source |
    --force-archive-running | --list`
  - Web UI menu item on legacy session rows
  - Auto hapi.db backup before any bulk run

Closes tiann#824

Refused-by-default cases:
  - non-cursor flavor; already on ACP; missing on-disk store;
    pre-existing ACP target dir; lifecycleState=running (override with
    --force-archive-running, which archives the session first and waits
    for the launcher to release the store.db lock).

Verified end-to-end:
  - 25 unit tests covering every refusal and rollback branch (mocked
    `agent acp` probe + mocked metadata update)
  - 3 integration tests spawning real `agent acp` (opt-in via
    `CURSOR_AGENT_INTEGRATION=1`); one drives a synthetic store, two
    accept operator-supplied fodder uuids via env so real on-disk
    stores can be exercised without committing private fixtures.
…rict TS

CI typecheck (stricter than the local Bun-resolved tsc) flagged accesses
to ok-success fields (acpSessionId, sourceRemoved, etc.) as unsafe
because the success branch only had `dryRun?: false` while the dry-run
branch had `dryRun: true`. Making `dryRun: false` REQUIRED on the
production-success branch turns the union into a hard discriminator and
lets call sites narrow with `if (out.ok && !out.dryRun) {...}`.

No runtime change to the migrator's behavior - production-success
already returned the full payload; the test assertions just couldn't
prove it to tsc.

Also wires the explicit `dryRun: false` into the migrator's success
return and updates the tests to use the now-discriminator-friendly
narrowing.
@heavygee heavygee marked this pull request as ready for review June 6, 2026 20:33
@heavygee heavygee marked this pull request as draft June 6, 2026 20:39
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2e3d0a18fc

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread web/src/api/client.ts
@heavygee heavygee marked this pull request as ready for review June 6, 2026 20:39
Self-cold-review pass: tryRm(dirname(legacy.storeDbPath)) with
recursive=true would silently destroy any non-store.db artifact a
future cursor-agent version might drop into the chat dir. Use
rmdirSync (only succeeds on empty dirs) so unknown files survive.

In practice today only store.db, store.db-wal, store.db-shm ever live
there (verified against operator's fodder), so behavior is unchanged
on current cursor-agent; this is a defense-in-depth tweak.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a731c1ec55

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
heavygee added 2 commits June 6, 2026 21:50
…, P2 401-refresh)

P1 (WAL preservation) — Without checkpoint, un-checkpointed transactions in
the legacy store.db-wal sibling would either appear stale in the ACP target
or be lost outright when the cleanup step removes the WAL file. Added a
PRAGMA wal_checkpoint(TRUNCATE) flush of the legacy store before any cp,
inside the same pre-flight window where the launcher has been confirmed
quiescent. Idempotent on non-WAL stores (no-op).

P2 (skipVerify must still run session/load) — The schema docstring
described skipVerify as skipping the prompt step. The implementation was
bypassing the entire verify block, which would have let a corrupted
transplant flip HAPI metadata. Split the verify into 'always-load,
optionally-prompt'; skipVerify now skips ONLY the prompt. Tests updated:
the existing skipVerify-happy-path expectation rewritten + a new test
verifies skipVerify still refuses on session/load failure.

P2 (web client 401 refresh) — migrateCursorSessionToAcp now triggers the
same onUnauthorized refresh dance as the shared request() helper instead
of hard-failing on an expired JWT. Matches existing session-action UX.
…ues)

P1 — Lock release must do a real SQLite busy-probe, not just stat-poll.
  killSession kicks off cleanup asynchronously; an idle-but-not-yet-closed
  runner can let the stat-based size-stability heuristic report 'released'
  while still owning the file lock. Added sqliteLockHeldByOtherProcess()
  which opens the legacy store with BEGIN IMMEDIATE; SQLITE_BUSY is the
  authoritative signal that another process holds a write lock. The
  size-poll is retained as a secondary signal for partial writes.

P2 — preflightSession now refuses on session.active=true as well as
  lifecycleState==='running'. Some legacy rows may lack lifecycleState
  but still be active in the hub cache; the CLI/API paths relied on the
  lifecycle-only check, while the web UI already hid the button for
  active sessions. Now uniform across all entry points.

P2 — Atomic ACP target creation. Replaced the existsSync + mkdirSync
  (recursive: true) TOCTOU with an exclusive mkdirSync(recursive: false);
  EEXIST is caught and converted to a target_already_exists refusal. The
  rollback path only fires for the dir we exclusively created.

P2 — cursorSessionId basename validation. Reject anything that isn't
  [A-Za-z0-9_.-]+ before any path join, with explicit rejection of '.'
  and '..'. Prevents path-traversal in either the chats lookup or the
  acp-sessions target.

P2 — hapi.db backup now uses SQLite's serialize() so committed pages in
  hapi.db-wal are included. The raw cp from before could omit recent
  sessions or pre-migration metadata. Added Store.backupTo(destPath)
  exposing the consistent-snapshot path.

Tests:
  - session.active=true refusal
  - cursorSessionId path-traversal refusal (../escape, has/slash, .,
    .., has\backslash)
  - existing tests unchanged; full hub suite 299 pass.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2cbe6edb41

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/acpVerifyProbe.ts
Comment thread hub/src/sync/syncEngine.ts
Comment thread cli/src/commands/cursorMigrate.ts Outdated
P1 — Lock release uses session.active poll as the primary signal.
  An archive() kicks off async cleanup; the closest hub-side proxy for
  "runner process has exited" is sessionCache.active flipping false (set
  when the runner socket disconnects). The migrator now polls that
  before falling back to the SQLite busy-probe + size-poll, then sleeps
  a 500ms grace after all three pass so any in-flight kernel syscall
  on the WAL/SHM can complete. SyncEngine injects the cache-poll fn.

P2 — Abort bulk migration when the hapi.db backup fails. The previous
  code logged + continued; without the backup, operators have no rescue
  handle if a metadata flip lands wrong. Now throws unless the DB is
  in-memory (test fixtures). API handlers wrap in try/catch and surface
  the error as HTTP 500 with the backup-failure message.

P2 — Windows agent-binary spawn. AcpVerifyProbe.start() now matches
  cursorAcpRemoteLauncher's shape with shell:true + windowsHide:true on
  win32, so the `agent.cmd` shim is reachable. Without this the verify
  probe fails with ENOENT even though normal Cursor ACP sessions work.

P2 — Bulk CLI timeout removed. A finite 30-minute axios timeout could
  cut off a slow operator run mid-flight (verifyTimeoutMs is 120s per
  session by default; ~15 sessions hits the cap). The hub-side bulk
  operation is idempotent per session, so the request can safely block
  until completion. Operators that want to cap a run can split it with
  a smaller --lifecycle filter.

Tests: full hub suite 299 pass + integration (real agent acp) 3 pass
on the 34-msg fodder fixture. Typecheck clean across cli/hub/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.
To continue using code reviews, you can upgrade your account or add credits to your account and enable them for code reviews in your settings.

@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Operator has topped up Codex credits after hitting the 5h Code Review cap. Current HEAD is 96578a55 (third round of Codex findings addressed) followed by:

  • a731c1ec fix(cursor): only rmdir empty parent on legacy source cleanup
  • 2e3d0a18 fix(cursor): tighten CursorMigrateOutcome dryRun discriminator
  • c0961375 feat(cursor): operator-driven legacy stream-json → ACP session migra…

Last automatic review verdict was on commit 2cbe6edb41 (2026-06-06T21:00:37Z) before the credit-cap kicked in - so the four commits above haven't been reviewed yet. Please pass over the current HEAD.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 96578a55b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts
P1 — Wait for real runner exit before transplanting. The previous
  awaitSessionInactive() poll was bogus: archiveSession() synchronously
  calls handleSessionEnd() which sets sessionCache.active=false, so any
  cache-based wait returns immediately. An idle Cursor process with no
  SQLite write lock could also pass the busy-probe instantly while
  still mid-SIGTERM-cleanup. Replaced with a real minimum dwell time
  (2s default) inside awaitLockRelease, gated on SQLite busy-probe +
  size-stability. Removed the SyncEngine-side awaitSessionInactive
  injection (it was the source of the false reassurance).

P1 — Honor the agent ACP active-process lock. Cursor's `agent` binary
  enforces single-instance semantics (see
  cli/src/agent/backends/acp/agentCliGuard.ts); spawning the verify
  probe while another `agent acp` is live can SIGTERM the live one and
  crash the operator's Cursor session. The migrator now reads the
  $HAPI_HOME/locks/agent-acp-active/pid file and refuses with a new
  'acp_transport_active' reason if the lock points to a live PID. The
  check is injectable for tests.

P2 — Exclude active sessions from archived bulk candidates. The default
  archived bulk path filtered only on lifecycleState==='running', but
  preflightSession also treats session.active===true as running for
  legacy rows that lack lifecycle metadata. Without mirroring that
  here, `hapi cursor migrate --all` would include an active session,
  migrateOne would refuse it 'running_refused', and the bulk run would
  exit as failed. Extracted the filter to a pure exported predicate
  `isLegacyCursorMigrationCandidate()` and added unit tests.

Tests: 35 pass (was 28) + integration 3 pass on the 34-msg fodder.
Typecheck clean across cli/hub/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Round 4: addresses the 3 P1/P2 findings from the previous Codex pass on commit 96578a5:

  • P1 syncEngine.ts:572 (real runner exit signal): removed bogus cache-based awaitSessionInactive; replaced with minimum-dwell + SQLite busy-probe + size-stability inside awaitLockRelease.
  • P2 syncEngine.ts:618 (active filter): extracted as exported predicate isLegacyCursorMigrationCandidate, added unit tests.
  • P1 acpVerifyProbe.ts:69 (ACP transport lock): migrator now reads $HAPI_HOME/locks/agent-acp-active/pid and refuses with new reason acp_transport_active when the lock holds a live PID.

HEAD is now eb446b3. 35 unit pass + 3 integration pass on the 34-msg fodder fixture.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb446b391b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread cli/src/commands/cursorMigrate.ts
Comment thread hub/src/store/index.ts Outdated
heavygee and others added 2 commits June 7, 2026 00:14
Co-authored-by: Cursor <cursoragent@cursor.com>
P1 — Block resumes while mutating the store. Between preflight and
  the destructive flip+rm, a session could be resumed via the existing
  hub paths. The resume reads cursorSessionProtocol from hapi.db (still
  'legacy' until the flip), spawns a stream-json runner against the
  legacy source — which we are about to remove. Added getCurrentSession
  dependency that the migrator calls right before the metadata flip;
  if it sees active=true / lifecycleState=running / a concurrent flip
  to acp, it rolls back the ACP placement and refuses with new reason
  'session_resumed_during_migrate' (or 'already_acp' for the concurrent
  flip case). SyncEngine injects a real impl reading the session cache.

P2 — Register ACP verify probes in the active lock. The probe now
  acquires the same $HAPI_HOME/locks/agent-acp-active/ lock the CLI
  guard uses, just before spawning agent acp, and releases it AFTER
  the SIGTERM in stop(). Without this, a concurrent
  `hapi cursor migrate` or model-list request could see no active
  ACP lock, spawn another agent acp, and the second spawn would
  SIGTERM this verify per Cursor's single-instance enforcement
  (cli/src/agent/backends/acp/agentCliGuard.ts). New `hapiHome`
  option on AcpVerifyProbe lets tests pin the lock to an isolated
  dir so they don't clobber the operator's real lock.

P2 — Override Windows home variables for the verify spawn. HOME alone
  is insufficient on Windows; the agent binary may resolve via
  USERPROFILE / HOMEDRIVE / HOMEPATH and end up reading the operator's
  real .cursor tree. The probe env now sets all three on win32.

P2 — Reject single-session --dry-run. The CLI accepted --dry-run for
  per-session use but the per-session API has no dryRun field, so the
  flag was silently dropped and the real migration ran. Now throws a
  clear error instructing the operator to use --keep-source or the
  bulk path.

P2 — Preserve private permissions on hapi.db backups. Store.backupTo()
  now passes mode 0o600 to writeFileSync and chmods after as
  belt-and-braces. Without this the rescue copy (which contains
  session transcripts) could land world-readable under a permissive
  umask, even though the live hapi.db is locked down to 0600.

Tests: 37 pass (was 35) — added rollback test for resume-mid-migration
and concurrent-flip-to-acp scenarios. Integration 3 pass on the 34-msg
fodder fixture; integration tests now use isolated $HAPI_HOME to avoid
touching the operator's real ACP lock. Typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Round 5: addresses the 5 P1/P2 findings from the previous Codex pass on commit eb446b3:

  • P1 resume-race: added getCurrentSession recheck before metadata flip; new reason session_resumed_during_migrate.
  • P2 ACP active lock: AcpVerifyProbe now acquires/releases the same agent-acp-active lock the CLI guard uses.
  • P2 Windows home: verifyInTempHome overrides USERPROFILE / HOMEDRIVE / HOMEPATH on win32.
  • P2 single-session dry-run: CLI now throws on --dry-run without --all/--list.
  • P2 backup perms: Store.backupTo() writes with mode 0o600 + chmods after.

HEAD is now be7696a. 37 unit pass (+2 rollback tests) + 3 integration pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: be7696a857

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
P2 v2 — Atomic agent-acp-active lock acquisition. The previous
  mkdirSync(recursive:true) + writeFileSync was not race-safe: two
  concurrent migrations could both pass isAgentAcpTransportActive()
  and then BOTH 'acquire' the lock by writing their pid in succession.
  Now uses mkdirSync(recursive:false) which is atomic — exactly one
  caller wins on EEXIST contention. Stale locks (dead pid) are
  cleared and retried once. start() throws a typed error on race
  loss; verifyInTempHome catches it and the migrator surfaces it as
  the acp_transport_active refusal so the operator sees the
  authoritative reason. Added 4 unit tests for the lock primitive
  (acquire / refuse-when-held / stale-cleanup / no-clobber-on-stop).

P1 v2 — Authoritative active-check inside the metadata flip. The
  migrator's getCurrentSession recheck (round 5) was best-effort: a
  resume landing AFTER the recheck but BEFORE the DB update would
  still slip through. Moved the same check INSIDE
  flipCursorSessionProtocolToAcp so the read-active and write-protocol
  pair are within a single synchronous Bun call (no JS interleaving).
  Added new flip result 'session-active' + corresponding
  UpdateAfterMigrateResult variant + corresponding refusal mapping
  back to 'session_resumed_during_migrate'. Added a unit test that
  drives the path with updateOverride returning session_active.

Tests: 313 unit pass (was 308; +4 probe lock, +1 atomic flip) plus 3
integration on the 34-msg fodder fixture. Typecheck clean across
cli/hub/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 6, 2026

@codex review

Round 6: addresses the 2 P1/P2 findings from previous pass on be7696a:

  • P2 atomic lock: mkdirSync(recursive:false) for race-safe acquire; throws on EEXIST + live holder; 4 new unit tests for the lock primitive.
  • P1 atomic flip: active-check moved INSIDE flipCursorSessionProtocolToAcp; new 'session-active' result type; rolled-back via session_resumed_during_migrate refusal.

HEAD is now a6908df. 313 unit pass + 3 integration pass on 34-msg fodder fixture.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a6908df576

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
P1 v3 — Detect resume-then-exit during the migration window. The
  active-flag rechecks (both the migrator's getCurrentSession and the
  flip-time atomic check) only catch resumes that are STILL active at
  the moment of the check. A brief resume that came in, wrote turns
  to store.db/store.db-wal, and exited before the check would slip
  through — the store.db would have diverged from the snapshot we
  copied to the ACP target. Added an mtime+size fingerprint of the
  legacy store.db captured right after the WAL checkpoint; compared
  immediately before the metadata flip and rolled back with new
  reason 'legacy_store_modified_during_migrate' on any divergence.

P2 — Respect metadata.homeDir + metadata.host. Hub processes can run
  under a different account than the user who created the Cursor
  session (service-account / multi-machine setups). The legacy chats
  live under the recorded session-owner home, not the hub's. The
  migrator now prefers metadata.homeDir for resolving both the legacy
  source (~/.cursor/chats) and the ACP target (~/.cursor/acp-sessions),
  falling back to homeDir() when the field is absent. Cross-host
  sessions (metadata.host != local hostName) are refused with new
  reason 'cross_host_session' so we never touch a same-id store under
  the wrong user account. hostName is a new injectable dep with default
  process.env.HAPI_HOSTNAME || os.hostname().

P2 v3 — Don't clobber freshly-starting CLI ACP locks. The previous
  retry path treated 'lock dir exists but no pid file' as stale and
  cleared it — but the CLI guard's registerActiveAcpTransport does
  mkdir BEFORE writeFileSync, so a probe landing in that small window
  would delete the live caller's lock. probeLockHolder now returns
  'starting' for the pidless case, which falls through to the refuse-
  path. Only a present-but-dead pid file is treated as stale.

Tests: 317 hub pass (was 313; +cross-host refusal, +fingerprint
divergence rollback, +homeDir resolution, +mid-startup probe lock
refusal). Integration 3 pass on the 34-msg fodder fixture (with
injected hostName).

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 7: addresses the 3 P1/P2 findings from pass on a6908df:

  • P1 v3 (resume-then-exit): fingerprint legacy store.db post-checkpoint, compare pre-flip; new reason legacy_store_modified_during_migrate.
  • P2 (homeDir): prefer metadata.homeDir over hub HOME; new cross_host_session refusal via hostName dep.
  • P2 v3 (probe lock): distinguish 'dead' from 'starting' (pidless mid-startup); refuse instead of clearing.

HEAD is now 8145b73. 317 hub pass (+4 new tests) + 3 integration pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8145b73551

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
P1 v4 — Include WAL/SHM sidecars in the divergence fingerprint. The
  previous fingerprint only stat'd the main store.db. SQLite stages
  new commits in store.db-wal before merging into the main file, so a
  brief resume that wrote turns to the WAL but did not trigger a
  checkpoint would leave store.db unchanged while the WAL contained
  the new data. Now fingerprints all three (main, -wal, -shm) and
  refuses on any change/appearance/disappearance. Added a unit test
  that simulates the WAL-only divergence case.

P2 v2 — Reorder preconditions BEFORE archive. Bulk runs with
  --force-archive-running could kill a session and then refuse to
  migrate it for reasons known at preflight time (cross-host,
  acp_transport_active, target_already_exists). Moved all
  filesystem/state-only refusals to run before archiveSession so we
  never amputate a session we cannot then migrate. New tests assert
  that archiveCalls remains empty for the two refusal cases the bot
  flagged (cross-host + acp-transport-active).

Tests: 320 hub pass (was 317; +1 WAL fingerprint, +2 no-archive-on-
refusal). Integration 3 pass on 34-msg fodder. Typecheck clean.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 8: addresses the 3 P1/P2 findings from pass on 8145b73:

  • P1 v4 (WAL sidecars): fingerprint now stats store.db + store.db-wal + store.db-shm.
  • P2 v2 (precondition ordering): cross_host_session, acp_transport_active, and target_already_exists checks all moved BEFORE archiveSession.

HEAD is now 6bc2b78. 320 hub pass (+3 new tests) + 3 integration pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6bc2b7853a

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
P2 v5 — Treat pid-less ACP locks as active. The CLI agent guard
  creates the agent-acp-active lock directory BEFORE writing the pid
  file. defaultIsAgentAcpTransportActive() only checked the pid file,
  so a CLI startup window reported active=false; bulk migrations with
  --force-archive-running would then archive a session and let
  verifyInTempHome() refuse on the same lock anyway, leaving the
  session orphaned. Now the default treats lock-dir-exists-but-no-pid
  (and malformed pid files) as active so we refuse early, before any
  archive side effect.

P3 — Bulk dry-run no longer counts as migrated. The previous
  `migrated: migrated + dryRun` rollup meant a pure preview reported
  `Migrated: N` even though no metadata or filesystem changes were
  made. Split CursorMigrateBulkResult.dryRun out so operators and
  automation can distinguish preview from real mutation. CLI summary
  updated to show both counters.

P2 v5 — Resume-race recheck no longer trips on stale lifecycle.
  After our own archiveSession() returns, the cache shows active=
  false synchronously but the metadata cleanup that flips
  lifecycleState 'running' → 'archived' may still be in flight. The
  recheck was refusing as session_resumed_during_migrate in that
  window — making force-archive migration of running sessions a
  cleanup-race coin flip. Now we trust the active flag as the
  authoritative live-runner signal:
  - Resume-race recheck: refuse only on active=true (always), or on
    lifecycleState=running when wasActive=false (external resume).
  - Atomic flip in flipCursorSessionProtocolToAcp: refuse only on
    active=true.

Tests: 2 new tests cover the stale-lifecycle no-trip + external-resume
trip branches. 46 pass in migrator suite, 320 in hub overall, 3
integration pass. Typecheck clean across hub/cli/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 9: addresses the 3 P2/P3 findings from pass on 6bc2b78:

  • P2 v5 (pid-less ACP locks → active): defaultIsAgentAcpTransportActive treats lock-dir-exists-but-no-pid (and malformed pid files) as active.
  • P3 (dry-run not migrated): CursorMigrateBulkResult split into .migrated + .dryRun. CLI summary line shows both.
  • P2 v5 (stale lifecycle after force-archive): resume-race recheck and flipCursorSessionProtocolToAcp atomic check both trust the active flag, not lifecycleState. wasActive flow no longer self-trips on its own lingering 'running' lifecycle metadata.

HEAD: 06cb2e6. 320 hub pass (+2 new), 3 integration pass, typecheck clean across hub/cli/web.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 06cb2e6bb2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
heavygee and others added 2 commits June 7, 2026 01:59
Co-authored-by: Cursor <cursoragent@cursor.com>
P3 v6 — Dry-run reflects real refusals. Previously every dry-run
  candidate emitted ok:true/would-migrate without running ANY of the
  refusal checks. A bulk preview could say "would migrate 50" while
  the real run refused half (cross-host, missing legacy store,
  target collision, ACP transport active). Now the migrator accepts
  a dryRun option and short-circuits after ALL precondition refusals
  but BEFORE any side effect. Operators get the safety preview
  they thought they were getting. New tests cover the happy preview
  + two refusal-during-dry-run cases.

P2 v6 — Don't archive stale-running rows with no live runner.
  wasActive used to be `lifecycleState === 'running' || active`. A
  stale lifecycle metadata row with active=false has no live runner
  but force-archive-running would still call archiveSession() on it,
  which fails because the runner has no registered handler — these
  sessions then could not be migrated at all. Now wasActive is
  `session.active === true` only; the stale lifecycle value is
  treated as metadata to clean up via the upcoming flip. New test
  asserts archive RPC is NOT called for stale-running-but-inactive
  rows. Existing force-archive tests updated to set active=true so
  they still exercise the archive path.

P2 v6 — AcpVerifyProbe holds the agent-acp-active lock until child
  exits. stop() previously sent SIGTERM and released the lock
  immediately, but agent acp can take 100s of ms to tear down. In
  serial bulk runs the next verifier could acquire the now-free lock
  and spawn a second `agent acp` while the previous one was still
  alive, hitting Cursor's single-instance kill. stop() now awaits
  exit/close (with a 5s safety ceiling) before releasing.

P3 v6 — Close the pid-less ACP lock window. Both CLI guard and the
  hub probe create the lock dir BEFORE writing the pid file, leaving
  a sub-syscall window where the dir exists but pid does not. The
  CLI guard's clearStaleAcpLockIfNeeded() used to delete pid-less
  dirs as "stale", which could clobber a freshly-acquired lock held
  by a still-starting holder. The CLI guard now treats pid-less
  dirs as a mid-startup window (NOT stale); only dirs with a pid
  file pointing at a dead pid are cleared. The probe's pid write
  was also tightened to follow mkdirSync with no intervening async.
  Test updated: pid-less → active (was: → cleared); new test for
  pidful-but-dead → cleared.

Tests: 50 migrator pass (+4 new), 326 hub pass, 5 agentCliGuard
pass (1 rewritten + 1 new), 3 integration pass. Typecheck clean
across hub/cli/web.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Round 10: addresses the 4 P2/P3 findings from pass on 06cb2e6:

  • P3 v6 (dry-run reflect real refusals): migrator now accepts dryRun option; short-circuits after preconditions, before any side effect.
  • P2 v6 (no archive on stale-running): wasActive = active===true only; stale lifecycle is cleaned by the flip itself.
  • P2 v6 (hold ACP lock until exit): stop() awaits child exit/close before releaseLock (5s ceiling).
  • P3 v6 (pid-less lock window): CLI guard treats pid-less dirs as starting (active), not stale.

HEAD: 9e2cd5e. 50 migrator pass (+4 new), 326 hub overall, 5 agentCliGuard pass, 3 integration pass. Typecheck clean across hub/cli/web.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9e2cd5e014

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread cli/src/agent/backends/acp/agentCliGuard.ts Outdated
Comment thread hub/src/sync/syncEngine.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/cursor/cursorLegacyMigrator.ts Outdated
Co-authored-by: Cursor <cursoragent@cursor.com>
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

Quota's back. HEAD is c3a984d (9e2cd5e + empty nudge). Round 10 addressed the 4 P2/P3 findings you flagged on 06cb2e6:

  1. P3 v6 — dry-run now runs ALL precondition refusals (cross-host, target collision, missing legacy store, ACP transport, basename validation) before short-circuiting. No more false 'would-migrate' previews.
  2. P2 v6 — wasActive is now session.active === true only. Stale lifecycleState='running' on inactive rows no longer triggers archiveSession() (which would fail with no-registered-handler); the metadata flip cleans up the stale lifecycle.
  3. P2 v6 — AcpVerifyProbe.stop() awaits child exit/close before releasing agent-acp-active lock (5s safety ceiling). Prevents serial bulk runs from spawning overlapping agent acp processes.
  4. P3 v6 — CLI guard's clearStaleAcpLockIfNeeded() no longer removes pid-less lock dirs (treats them as mid-startup, not stale). Probe-side pid write also tightened to follow mkdirSync with no intervening async.

CI is green, 326 hub tests + 50 migrator tests + 3 integration tests pass, typecheck clean across hub/cli/web. 0 unresolved threads. Looking for a clean pass.

Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5010bd77b9

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/acpVerifyProbe.ts Outdated
Comment thread hub/src/cursor/acpVerifyProbe.ts
Six P2/P3 findings from chatgpt-codex-connector on commit 5010bd7:

1. P3 — Treat partially-written pid files as active (cli/src/agent/backends/acp/agentCliGuard.ts)
   The previous round's pid-less check only protected the "no file" case.
   An empty/malformed pid file (mid-write window between mkdirSync and
   writeFileSync flush) would fall through to readLockPid() returning
   null and removeAcpLockDir() would clobber the live holder. Add
   probeLockPidFile() that returns missing | starting | live | dead.
   Only the 'dead' state clears the lock.

2. P2 — Clear stale running lifecycle during ACP flip (hub/src/sync/syncEngine.ts)
   When migrating a stale-running row (active=false, lifecycleState=
   running) the flip used to preserve all old metadata and only add
   cursorSessionProtocol:'acp'. Stale lifecycle survived and downstream
   code keyed on metadata.lifecycleState would keep treating the
   archived ACP session as live. Set lifecycleState:'archived' when
   the old value was 'running'.

3. P2 — Kill Windows process tree before releasing ACP lock (hub/src/cursor/acpVerifyProbe.ts)
   On Windows we spawn 'agent acp' through a shell (shell: true), so
   proc.kill only signals the wrapper. The agent child can survive and
   the next verifier acquires the freed lock + spawns a second agent
   while ours is still running. Use 'taskkill /F /T /PID' to kill the
   tree on win32 with SIGTERM fallback.

4. P2 — Reserve ACP lock before archiving (hub/src/cursor/cursorLegacyMigrator.ts)
   isAgentAcpTransportActive() was a point-in-time check; the actual
   lock acquisition didn't happen until verifyInTempHome much later.
   In the gap another agent acp could start and a --force-archive
   migration would kill the legacy session before refusing. Extract
   tryAcquireAcpActiveLock() as a free helper, acquire it in
   migrateOne BEFORE archive, hold across the entire mutation window,
   release in finally. Verify probe gets skipLockAcquire:true so it
   inherits the caller's hold instead of double-acquiring.

5. P2 — Refuse when ACP lock cannot be claimed (hub/src/cursor/acpVerifyProbe.ts)
   Non-EEXIST mkdir errors (permission denied, read-only fs, disk
   full) used to fall through to lockHeld=false and start() spawned
   agent acp UNGUARDED. Throw instead so verifyInTempHome surfaces
   the refusal cleanly.

6. P2 — Remove pid-less locks owned by the probe (hub/src/cursor/acpVerifyProbe.ts)
   If we successfully mkdir'd the lock dir but pid write failed, the
   old releaseLock() skipped removal because the pid file was absent.
   With the CLI guard now treating pid-less dirs as 'starting' (active),
   that left a permanently-stuck lock. We own the mkdir; release()
   must rmdir even when pidless. Only skip removal when the pid file
   contains a DIFFERENT valid pid (another holder somehow took over).

Tests:
- hub/cursor: 50 unit + 2 new lifecycle tests (lock acquired/released
  on success and on mid-flow refusal)
- hub/probe: 12 unit tests including new tryAcquireAcpActiveLock helper,
  skipLockAcquire bypass behavior, pidless-owned release
- cli/guard: 7 tests including empty pid file + malformed pid file
  treated as starting (mid-write race)
- integration: 3 pass with injected acquireAcpActiveLock pointing at
  tmp dir (isolates from operator's live HAPI_HOME lock)

Typecheck clean across hub/cli/web. Total: 335 hub tests + 7 cli +
3 integration pass; 0 unresolved threads.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

HEAD is now 5899a00. Addressed all 6 round-11 findings:

  1. P3 — pid-less / partially-written pid file detection: probeLockPidFile() distinguishes missing | starting | live | dead. Only 'dead' clears the lock.
  2. P2 — Stale lifecycleState='running' is now cleared alongside the ACP flip.
  3. P2 — Windows process tree kill via taskkill /F /T /PID before releasing lock.
  4. P2 — ACP lock is reserved BEFORE archive in migrateOne and held across the entire mutation window. Verify probe inherits via skipLockAcquire:true.
  5. P2 — Non-EEXIST mkdir failures now THROW; verifyInTempHome refuses cleanly instead of running unguarded.
  6. P2 — releaseLock removes the lock dir even when pid file is missing (probe owns the mkdir).

335 hub + 7 cli guard + 3 integration tests pass. Typecheck clean across hub/cli/web. 0 unresolved threads. Looking for a clean pass.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5899a00870

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc7b2e15af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread hub/src/cursor/cursorLegacyMigrator.ts
Two P2 findings from chatgpt-codex-connector on commit fc7b2e1:

1. P2 — Make real ACP spawns honor the migration lock
   (cli/src/agent/backends/acp/AcpStdioTransport.ts + agentCliGuard.ts)

   When the hub-side migrator held the agent-acp-active lock, a normal
   Cursor ACP resume STILL spawned `agent acp`: registerActiveAcpTransport()
   used mkdirSync(recursive:true), which silently succeeds on an existing
   dir and overwrote the migrator's pid file. Worse, AcpStdioTransport
   spawned agent FIRST and called register SECOND, so even an atomic
   register couldn't prevent the racy spawn.

   - registerActiveAcpTransport() now uses mkdirSync(recursive:false)
     and throws AcpLockHeldError when another live (or mid-startup)
     process holds the lock. Stale (dead-pid) locks are still cleared
     and retried once. In-process refcount is preserved so the same
     CLI process can re-enter without re-claiming the on-disk lock.
   - AcpStdioTransport reordered: register FIRST, spawn SECOND. On
     spawn failure the register is released. AcpLockHeldError now
     propagates to the caller so resume/start refuses cleanly instead
     of spawning an agent that will get killed by Cursor's single-
     instance enforcement.
   - unregisterActiveAcpTransport() now only removes the lock dir when
     the pid file confirms we own it (or no pid file exists, which
     means we own the mkdir-but-pid-write-failed case).

2. P2 — Reject WAL reappearance before setting the baseline
   (hub/src/cursor/cursorLegacyMigrator.ts)

   PRAGMA wal_checkpoint(TRUNCATE) zeros store.db-wal. The post-
   checkpoint fingerprint captured whatever was there at the time;
   if a legacy process resumed and wrote a turn between checkpoint
   return and our stat(), the WAL grew above zero and became the
   baseline. Since the verify-copy and ACP-placement copy only
   store.db, those frames would never transplant — and the post-
   fingerprint mtime/size comparison would still match, allowing
   cleanup to delete the legacy WAL with frames lost.

   Added a post-checkpoint stat: if store.db-wal exists with size
   > 0, refuse the migration with legacy_store_modified_during_migrate
   ('between checkpoint and fingerprint capture'). Extracted
   checkpointLegacyStore as an injectable dep so the unit test can
   simulate a writer landing in the gap.

Tests:
- hub/cursor: 53 unit tests, including new 'refuses early when WAL
  has content immediately after the checkpoint' test that injects a
  checkpoint stub which leaves 4 bytes in -wal.
- cli/guard: 11 tests, including 4 new ones for
  registerActiveAcpTransport throws-on-held, refuses-on-pidless,
  in-process-refcount-reuse, and stale-lock-clearing.
- cli/vitest: 944 tests (full suite), all green.
- integration: 3 pass with real agent acp + injected lock target.
Typecheck clean across hub/cli/web. 0 unresolved threads.

Co-authored-by: Cursor <cursoragent@cursor.com>
@heavygee
Copy link
Copy Markdown
Owner Author

heavygee commented Jun 7, 2026

@codex review

HEAD is now cb614b6. Round 12 addressed both P2 findings:

  1. CLI ACP spawns now honor the migration lock: registerActiveAcpTransport uses atomic mkdirSync(recursive:false) + throws AcpLockHeldError; AcpStdioTransport claims the lock BEFORE spawn; in-process refcount preserved.
  2. Post-checkpoint WAL size check refuses with legacy_store_modified_during_migrate when WAL grows in the checkpoint→fingerprint window.

336 hub + 944 cli (vitest) + 11 cli (bun:test) + 3 integration tests pass. Typecheck clean. 0 unresolved threads. Looking for a clean pass.

@chatgpt-codex-connector
Copy link
Copy Markdown

Codex Review: Didn't find any major issues. You're on a roll.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant