
Add state backup/restore command for droplet migration#202

Open
benvinegar wants to merge 4 commits into main from feat/state-backup-restore

Conversation


@benvinegar benvinegar commented Mar 15, 2026

Summary

  • add a new baudbot state command with backup and restore subcommands
  • include persistent runtime state in archives: memory, todos, settings, extensions, skills, and subagents
  • include secrets by default (~/.config/.env, ~/.pi/agent/auth.json) with --exclude-secrets option for shareable archives
  • add safe zip extraction (path traversal checks) and restore-time ownership/permission normalization
  • document migration flow in README.md and docs/operations.md
  • add shell test coverage for state round-trip behavior and CLI root requirement
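
For context, the path-traversal guard described in the summary could look roughly like the sketch below. This is illustrative only — the helper name `extract_zip_archive_safe` matches the PR, but the body here is an assumption, not the PR's actual code:

```python
import os
import zipfile


def extract_zip_archive_safe(archive_path, dest_dir):
    """Extract a zip archive, rejecting entries that would escape dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            name = info.filename
            # Reject absolute paths and null bytes outright.
            if name.startswith(("/", "\\")) or "\x00" in name:
                raise ValueError(f"unsafe zip entry: {name!r}")
            # Reject any parent-directory component.
            if ".." in name.split("/"):
                raise ValueError(f"unsafe zip entry: {name!r}")
            # Containment check as a final safety net.
            target = os.path.realpath(os.path.join(dest_root, name))
            if not target.startswith(dest_root + os.sep):
                raise ValueError(f"zip entry escapes destination: {name!r}")
            zf.extract(info, dest_root)
```

The layered checks mirror the review's description: explicit rejection of absolute paths, null bytes, and `..` components, with a `startswith` containment check as the last line of defense.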

Validation

  • bash bin/state.test.sh
  • bash bin/baudbot.test.sh
  • npm run lint:shell
  • npm run test:shell
  • npm test

Real droplet validation (Arch Linux)

  • provisioned two temporary DigitalOcean droplets with bin/ci/droplet.sh
  • seeded memory/todos/custom extension+skill and a secret marker on old droplet
  • ran sudo baudbot state backup /tmp/baudbot-state.zip on old droplet
  • copied archive to new droplet and ran sudo baudbot state restore /tmp/baudbot-state.zip
  • verified restored memory/todos/custom files + secret marker on new droplet
  • verified sensitive file permissions remained locked (600)
  • verified archive checksum consistency across hosts
  • destroyed both droplets + ephemeral SSH keys after test
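
The checksum-consistency step above can be reproduced with a small helper like the one below (a sketch — the PR doesn't specify the exact tooling used; `sha256sum` on each host would work equally well):

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archives aren't loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Run on both droplets after copying the archive; the digests must match.
```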


greptile-apps bot commented Mar 15, 2026

Greptile Summary

This PR adds a baudbot state backup/restore command that packages persistent agent state (memory, todos, extensions, skills, subagents, settings, and optionally secrets) into a zip archive to facilitate droplet migrations. The implementation is well-structured — it follows existing register_command conventions, includes safe zip extraction with path-traversal protection, post-restore ownership/permission normalization, and shell test coverage for the round-trip and --exclude-secrets flows.

Key observations:

  • The zip extraction guard in extract_zip_archive_safe correctly rejects absolute paths, null bytes, and .. components, and uses a startswith containment check as a final safety net.
  • write_metadata_file interpolates $host_name, $BAUDBOT_AGENT_USER, and $BAUDBOT_AGENT_HOME directly into a JSON heredoc without escaping, which could produce malformed JSON in unusual environments and cause a hard Python traceback on restore.
  • cmd_restore uses cp -a to merge the archive into the agent home rather than replacing it — any pre-existing files not in the archive are silently kept, which can leave the destination in an inconsistent state without any warning to the operator.
  • test_state_requires_root in baudbot.test.sh captures command output to the hardcoded path /tmp/baudbot-state.out instead of the per-test temp directory, making it fragile under concurrent test execution.

Confidence Score: 3/5

  • Safe to merge with minor improvements — no data-loss or security regressions in the happy path, but the metadata JSON generation and merge-on-restore behavior should be addressed before this is used in production migrations.
  • The core backup/restore logic is functionally correct and the zip path-traversal protection is solid. The two style-level issues (unescaped JSON metadata values, silent merge semantics on restore) are unlikely to trigger in normal deployments but could cause confusing failures or unexpected state in edge cases. The test isolation bug is a real fragility but doesn't affect production behavior.
  • Files to revisit: bin/state.sh (metadata JSON escaping, merge-on-restore semantics) and bin/baudbot.test.sh (hardcoded temp file path in test_state_requires_root)

Important Files Changed

Filename Overview
bin/state.sh New state backup/restore script. Path-traversal protection in extract_zip_archive_safe looks solid. Two concerns: metadata JSON values are shell-interpolated without escaping (could produce malformed JSON for unusual paths/hostnames), and cmd_restore silently merges into an existing agent home rather than replacing it, which can leave stale files.
bin/baudbot.test.sh New test_state_requires_root test uses a hardcoded /tmp/baudbot-state.out path for capturing output instead of the per-test temp dir already set up in the same function, causing a potential concurrency collision and leaving a stale file on failure.
bin/state.test.sh New shell test file covering round-trip backup/restore and --exclude-secrets. Uses isolated temp dirs and proper traps for cleanup. Tests look comprehensive and well-structured.
bin/baudbot Registers the new state command with root required. Change is minimal and follows the existing pattern exactly.
bin/test.sh Adds state.test.sh to the shell test runner. One-liner addition, no issues.
test/shell-scripts.test.mjs Adds Vitest coverage for the state backup/restore shell test suite. Follows existing pattern, no issues.
docs/operations.md Documents the droplet migration workflow with clear examples and appropriate security notes. No issues.
README.md Adds a concise migration snippet to the README. No issues.

Sequence Diagram

```mermaid
sequenceDiagram
    actor Operator
    participant CLI as baudbot CLI
    participant state.sh
    participant Python as Python3
    participant FS as Agent Home (~/.pi / ~/.config)

    Note over Operator,FS: Backup flow (old droplet)
    Operator->>CLI: sudo baudbot state backup /tmp/state.zip
    CLI->>state.sh: exec (requires root)
    state.sh->>state.sh: resolve_archive_path()
    state.sh->>FS: copy_path_if_present() for each STATE_PATH
    state.sh->>state.sh: write_metadata_file() → metadata.json
    state.sh->>Python: create_zip_archive(baudbot-state/ → state.zip)
    state.sh->>FS: chmod 600 state.zip
    state.sh-->>Operator: ✓ state backup created

    Note over Operator,FS: Restore flow (new droplet)
    Operator->>CLI: sudo baudbot stop
    Operator->>CLI: sudo baudbot state restore /tmp/state.zip
    CLI->>state.sh: exec (requires root)
    state.sh->>state.sh: service_running() → must be stopped
    state.sh->>Python: extract_zip_archive_safe() (path-traversal checked)
    state.sh->>Python: validate metadata.json format == "baudbot-state-v1"
    state.sh->>FS: cp -a payload/. → BAUDBOT_AGENT_HOME/ (merge)
    state.sh->>FS: restore_ownership_if_root() chown -R
    state.sh->>FS: restore_secure_permissions() chmod 600
    state.sh-->>Operator: ✓ state restored
    Operator->>CLI: sudo baudbot start
```
Last reviewed commit: 45e3840

Comment on lines +154 to +159
if PATH="$fakebin:$PATH" BAUDBOT_ROOT="$REPO_ROOT" bash "$CLI" state backup >/tmp/baudbot-state.out 2>&1; then
return 1
fi

out="$(cat /tmp/baudbot-state.out)"
rm -f /tmp/baudbot-state.out

Hardcoded temp file path causes test isolation failure

The test declares local out (line 136) but then redirects command output to the hardcoded global path /tmp/baudbot-state.out instead of using the already-available $tmp directory. If two test processes run concurrently (or a previous run left the file behind), they will clobber each other's output and the grep on line 160 could read stale data from a prior run.

Suggested change
if PATH="$fakebin:$PATH" BAUDBOT_ROOT="$REPO_ROOT" bash "$CLI" state backup >/tmp/baudbot-state.out 2>&1; then
return 1
fi
out="$(cat /tmp/baudbot-state.out)"
rm -f /tmp/baudbot-state.out
if PATH="$fakebin:$PATH" BAUDBOT_ROOT="$REPO_ROOT" bash "$CLI" state backup >"$tmp/state.out" 2>&1; then
return 1
fi
out="$(cat "$tmp/state.out")"

Member Author


Fixed in commit c0ccf59. I changed the test to write/read output via the per-test temp directory instead of a shared /tmp file. I applied the same isolation cleanup pattern to the nearby debug/broker root-requirement tests as well.

Responded by baudbot-dev-agent using openai/gpt-5.

bin/state.sh Outdated
Comment on lines +120 to +129
cat > "$metadata_file" <<EOF
{
"format": "$STATE_FORMAT",
"created_at": "$now",
"host": "$host_name",
"agent_user": "$BAUDBOT_AGENT_USER",
"agent_home": "$BAUDBOT_AGENT_HOME",
"include_secrets": $([ "$include_secrets" = "1" ] && echo true || echo false)
}
EOF

Shell variables interpolated into JSON without escaping

$host_name, $BAUDBOT_AGENT_USER, and $BAUDBOT_AGENT_HOME are embedded directly into the JSON heredoc. While typical Linux hostnames and usernames don't contain JSON-special characters, $BAUDBOT_AGENT_HOME is a file-system path and could (in unusual environments) contain a backslash or other character that would produce malformed JSON. When cmd_restore later calls json.load() on this file, a parse error will cause a hard failure with a Python traceback rather than a clear diagnostic message.

Consider using Python to write the metadata instead, or at minimum printf-escaping the values:

python3 - "$metadata_file" "$STATE_FORMAT" "$now" "$host_name" "$BAUDBOT_AGENT_USER" "$BAUDBOT_AGENT_HOME" "$include_secrets" <<'PY'
import json, sys
path, fmt, created, host, user, home, secrets = sys.argv[1:]
with open(path, "w") as f:
    json.dump({"format": fmt, "created_at": created, "host": host,
               "agent_user": user, "agent_home": home,
               "include_secrets": secrets == "1"}, f, indent=2)
    f.write("\n")
PY

Member Author


Agreed and fixed in commit c0ccf59. Metadata JSON is now written via Python json.dump instead of shell heredoc interpolation, so values are correctly escaped before restore-time parsing.

Responded by baudbot-dev-agent using openai/gpt-5.

Comment on lines +374 to +375
mkdir -p "$BAUDBOT_AGENT_HOME"
cp -a "$payload_root/." "$BAUDBOT_AGENT_HOME/"


Restore merges into agent home rather than replacing it

cp -a "$payload_root/." "$BAUDBOT_AGENT_HOME/" merges the archive contents on top of whatever already exists in the agent home. Any files present on the new droplet that are not in the archive (e.g., a fresh settings.json written by deploy) are silently kept alongside the restored files. This can lead to an inconsistent state that is difficult to diagnose.

The docs describe the operation as a "restore" which implies replacement. If merge semantics are intentional, at least surface a warning to the operator:

  # Warn if the agent home already has content
  if [ -n "$(ls -A "$BAUDBOT_AGENT_HOME" 2>/dev/null)" ]; then
    bb_warn "agent home is not empty; existing files not in the archive will be preserved"
  fi
  cp -a "$payload_root/." "$BAUDBOT_AGENT_HOME/"

Member Author


Good point. Merge semantics are intentional so restore only overlays managed state paths, but I added an explicit warning when the agent home is non-empty before copying archive contents (commit c0ccf59). That makes the behavior visible to operators.

Responded by baudbot-dev-agent using openai/gpt-5.

@benvinegar
Member Author

Correction to my previous note (shell ate inline code formatting):

Addressed all review comments and pushed fixes:

  • preserved zip entry mode bits on restore + added executable-permission regression check
  • switched metadata JSON writing to Python json.dump to avoid shell interpolation issues
  • added explicit warning for merge-style restore into non-empty agent home
  • fixed test isolation in bin/baudbot.test.sh (no shared /tmp output files)

Also fixed CI breakages discovered in integration runs:

  • added varlock compatibility symlink from ~/.varlock/bin/varlock to ~/.config/varlock/bin/varlock during setup
  • updated inference smoke model to anthropic/claude-haiku-4-5

PR checks are now green, including arch + ubuntu integration.

Comment on lines +157 to +159
os.makedirs(archive_dir, exist_ok=True)

with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zip_file:


Bug: The state restore process creates directories with default permissions (e.g., 755) instead of the required secure permissions (700), exposing sensitive data to other group users.
Severity: CRITICAL

Suggested Fix

Modify the backup process to explicitly write directory entries to the zip archive, preserving their original permissions. Alternatively, update the restore_secure_permissions function to recursively apply the correct 700 permissions to the ~/.pi directory and its critical subdirectories after the archive is extracted.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: bin/state.sh#L157-L159

Potential issue: The state restore process creates directories with incorrect
permissions. The backup function `create_zip_archive` does not save directory metadata.
During restore, `extract_zip_archive_safe` creates directories using `os.makedirs()`,
which applies default permissions (e.g., `755`) based on the system's umask. The
subsequent `restore_secure_permissions` function only corrects permissions for specific
files, not directories. As a result, security-critical directories like `~/.pi` and
`~/.pi/agent` are left with world-readable permissions instead of the required `700`,
allowing other users in the same group to access sensitive agent data.
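
A minimal sketch of the second suggested fix — recursively normalizing directory and file modes after extraction. This is illustrative only and is not the PR's actual `restore_secure_permissions` implementation; the 700/owner-only policy is an assumption based on the finding above:

```python
import os
import stat


def restore_secure_permissions_recursive(root):
    """Tighten a restored state tree: directories to 700, files to owner-only bits."""
    os.chmod(root, 0o700)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            os.chmod(os.path.join(dirpath, name), 0o700)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # don't follow symlinks when tightening modes
            mode = stat.S_IMODE(os.stat(path).st_mode)
            # Strip group/other bits while preserving the owner's bits,
            # so executables stay executable for the agent user.
            os.chmod(path, mode & 0o700)
```

Run against `BAUDBOT_AGENT_HOME` after `extract_zip_archive_safe`, this would close the gap where `os.makedirs` leaves umask-default directory modes.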

