Skip to content

feat: bash tool change attribution via filesystem snapshots, codex fixes#798

Merged
svarlamov merged 72 commits intomainfrom
johnw/bash-support
Apr 8, 2026
Merged

feat: bash tool change attribution via filesystem snapshots, codex fixes#798
svarlamov merged 72 commits intomainfrom
johnw/bash-support

Conversation

@jwiegley
Copy link
Copy Markdown
Collaborator

@jwiegley jwiegley commented Mar 25, 2026

Summary

  • Bash tool change attribution: Implements a stat-tuple-based filesystem snapshot system that detects which files were created, modified, or deleted when an AI coding agent executes bash commands. This enables accurate AI provenance tracking for tool invocations that bypass the normal edit/write tool path.
  • Multi-agent preset support: Integrates bash tool checkpoint handling into Claude, Gemini, Droid, ContinueCli, Amp, and OpenCode presets with tool classification filtering (only triggers on bash/shell tool invocations).
  • Async pre-hook content capture: Daemon watermark system tracks per-file and per-worktree mtimes so the bash pre-hook can identify and capture human-authored changes without blocking the tool call. Captured checkpoints are written to disk and consumed asynchronously by the daemon.
  • Two-tier path filtering: Git-tracked files are always included in change detection; new untracked files are filtered against frozen .gitignore rules to avoid noise from build artifacts.
  • Rename/move attribution preservation: Properly tracks both source and destination paths during file renames and moves, ensuring AI provenance transfers correctly through intra-commit file reorganization.
  • Hook catch-all migration: Claude Code, Droid/Factory, and Gemini hook registrations migrated from specific-tool matchers ("Write|Edit|MultiEdit", "^(Edit|Write|Create|ApplyPatch)$", "write_file|replace") to the catch-all "*" matcher so every tool call — including bash — triggers pre/post checkpoints. Full TDD migration with S1–S13/U1–U4/C1–C3 test matrix per agent.
  • OpenCode plugin bash support: OpenCode TypeScript plugin now handles bash tool calls in a dedicated branch (repo resolution from workdir/cwd without file-path extraction), alongside the existing file-edit tool path.
  • Comprehensive test coverage (3,100+ lines across 3 test suites):
    • bash_tool_conformance.rs — conformance tests validating file mutations, hook semantics, tool classification, gitignore filtering, rename handling, snapshot save/load
    • bash_tool_benchmark.rs — performance tests covering snapshot timing, large repo scaling, memory bounds, concurrent session isolation
    • bash_tool_provenance.rs — end-to-end tests exercising real bash commands (file creation, modification, deletion, build tools, git operations, pipelines, symlinks, batch ops, archives, edge cases)

Devin review fixes (latest commit)

  • B1tool_use_id collision for agents without per-call IDs (Gemini, ContinueCli): sidecar UUID file in .git/ai/bash_snapshots/ correlates pre/post hooks, preventing snapshot key collisions on sequential bash calls
  • B2 — Deleted files excluded from post-hook captured checkpoint edited_filepaths (no post-content to capture; were stored as empty blobs)
  • W1compute_watermarks_from_stat now uses symlink_metadata (lstat) to match snapshot mtime semantics
  • W2 — Cold-start stale-file dedup normalizes git-status paths before contains() check (fixes false duplicates on macOS/Windows case-insensitive FS)
  • W3 — Worktree watermark now set for all live Human checkpoints (previously the is_empty() guard prevented it from ever firing for bare-CLI checkpoints)
  • W4flake.nix version bumped to match Cargo.toml

Closes #150
Closes #756

Test plan

  • All bash tool tests pass (cargo test bash_tool)
  • All agent hook migration tests pass — S1–S13, U1–U4, C1–C3 for Claude Code, Droid, Gemini (cargo test -p git-ai -- claude_code droid gemini)
  • Full test suite passes (cargo test -p git-ai)
  • cargo clippy — zero errors
  • cargo fmt — no formatting issues
  • Rebased on latest origin/main

🤖 Generated with Claude Code

@jwiegley jwiegley force-pushed the johnw/bash-support branch from 1f6aba3 to 6c57ae6 Compare March 25, 2026 17:05
@jwiegley jwiegley self-assigned this Mar 25, 2026
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no bugs or issues to report.

Open in Devin Review

@jwiegley jwiegley requested a review from svarlamov March 25, 2026 17:15
@jwiegley jwiegley force-pushed the johnw/bash-support branch from 6c57ae6 to 05a9350 Compare March 25, 2026 17:16
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@jwiegley jwiegley force-pushed the johnw/bash-support branch 2 times, most recently from 9f8b8bb to 82fc9dd Compare March 31, 2026 22:22
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

@jwiegley jwiegley force-pushed the johnw/bash-support branch from c3369d4 to 059b508 Compare April 1, 2026 20:34
@jwiegley jwiegley changed the base branch from main to johnw/bug-fix April 1, 2026 20:35
Copy link
Copy Markdown

@github-advanced-security github-advanced-security AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Base automatically changed from johnw/bug-fix to main April 1, 2026 22:10
@jwiegley jwiegley force-pushed the johnw/bash-support branch from 059b508 to 8577330 Compare April 3, 2026 01:02
Copy link
Copy Markdown
Collaborator Author

jwiegley commented Apr 3, 2026

@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 5, 2026

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
2 out of 4 committers have signed the CLA.

✅ svarlamov
✅ jwiegley
❌ devin-ai-integration[bot]
❌ Test Runner


Test Runner seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

devin-ai-integration[bot]

This comment was marked as resolved.

) -> Result<(), GitAiError> {
let manifest_path = async_checkpoint_manifest_path(capture_id)?;
let mut manifest: PreparedCheckpointManifest =
serde_json::from_str(&fs::read_to_string(&manifest_path).map_err(|error| {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).
manifest.agent_run_result = Some(updated);
}

fs::write(&manifest_path, serde_json::to_vec(&manifest)?)?;

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1).
let mut contents = HashMap::new();
for rel_path in file_paths {
let abs_path = repo_root.join(rel_path);
match fs::metadata(&abs_path) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1). This path depends on a [user-provided value](2). This path depends on a [user-provided value](3).
}
_ => {}
}
match fs::read_to_string(&abs_path) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression

This path depends on a [user-provided value](1). This path depends on a [user-provided value](2). This path depends on a [user-provided value](3).
@svarlamov svarlamov force-pushed the johnw/bash-support branch from a198ae1 to c149c36 Compare April 5, 2026 02:05
devin-ai-integration[bot]

This comment was marked as resolved.

devin-ai-integration[bot]

This comment was marked as resolved.

svarlamov and others added 15 commits April 6, 2026 15:07
…opencode hook payloads

isBashTool now matches both "bash" and "shell" to stay in sync with the
Rust classify_tool which treats both as ToolClass::Bash.

All three hook payload constructions (file-edit pre-hook, bash pre-hook,
post-hook) now include tool_use_id: input.callID. Previously the field was
always omitted, so the Rust side always fell back to the sidecar UUID
mechanism. With real callIDs present, sequential bash calls no longer
collide: the sidecar is only used for agents that don't provide unique IDs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…as inflight signal

When an agent runs `echo foo > f && git commit -am x`, the git pre-commit
hook fires with no agent context and would default to Human attribution.

We detect this by checking for non-stale pre-snapshot JSON files in the
bash_snapshots cache. A snapshot is written at pre-hook time and consumed
(deleted) at post-hook time, so its presence is an exact signal that a bash
invocation is in flight — no new daemon messages or marker files needed, just
the existing snapshot lifecycle.

has_active_bash_inflight() in bash_tool.rs reads the cache directory and
returns true if any fresh (within SNAPSHOT_STALE_SECS) .json file exists.
In git_ai_handlers.rs the checkpoint kind is overridden from Human to AI when
this check fires and no explicit agent context is present.

Parallel bash calls are handled correctly: each call has its own snapshot file,
so removing one does not clear the signal while others are still running.
A regression test (test_inflight_parallel_calls_regression) verifies this.

Also adds: test_inflight_false_when_no_snapshots, test_inflight_true_when_snapshot_present,
test_inflight_false_after_snapshot_consumed, test_inflight_false_for_stale_snapshot,
test_inflight_ignores_sidecar_txt_files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…letion tracking

- Remove two-tier filtering (git-tracked always-include bypass).
  `should_include_new_file` now applied to ALL files uniformly; tracked
  files that match .gitignore / default patterns are excluded.

- Drop deletion tracking entirely (`StatDiffResult.deleted`, `tracked_files`,
  `missing_at_pre_hook`, `load_tracked_files` / `git ls-files` call).
  Pure-deletion bash calls return `NoChanges`; new files created after a
  deletion are attributed via timestamps as usual.

- Pass the git-ai ignore ruleset into WalkBuilder.filter_entry so entire
  ignored directories (node_modules, target, etc.) are pruned before the
  walker descends into them — this helps when the directory is in
  default_ignore_patterns but not yet in .gitignore.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- map_or(false, ...) → is_some_and(...) (unnecessary_map_or)
- Add blank line before doc paragraph to fix doc_lazy_continuation
- Remove duplicate #[test] attribute in bash_tool_provenance
- vec!["..."] → ["..."] for useless_vec
- get(...).is_none() → !contains_key(...) in family_actor tests
- Elide explicit lifetimes in claude_code test helper
- rustfmt formatting throughout

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ore support

The old collect_gitignores pre-walk had two problems: it never applied
the rules it was building during traversal (so gitignored dirs were only
skipped if they were in the hardcoded SKIP_DIR_NAMES list), and it
duplicated work already done by WalkBuilder with git_ignore(true).

build_gitignore now only adds the git-ai-specific patterns (defaults,
.git-ai-ignore, linguist-generated). Standard .gitignore discovery —
including nested files — is handled natively by WalkBuilder.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d Gemini bash AfterTool test

- Rename _MTIME_GRACE_WINDOW_SECS → MTIME_GRACE_WINDOW_SECS; the
  underscore prefix falsely signals "intentionally unused" but the
  constant is referenced directly by MTIME_GRACE_WINDOW_NS.

- Fix TOCTOU in load_and_consume_snapshot: drop the exists() check and
  handle NotFound from fs::read directly, so a concurrent post-hook
  consuming the same snapshot returns Ok(None) instead of an Err.

- Add test_gemini_preset_bash_tool_aftertool_detects_changes: exercises
  the GeminiPreset AfterTool path with tool_name="shell", verifying that
  the stat-diff snapshot logic detects files written during the bash call
  and surfaces them as edited_filepaths on the AI checkpoint result.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The two nested-gitignore tests specifically exercised collect_gitignores,
which was removed in favor of WalkBuilder's native git_ignore(true)
support.  Replace them with a comment pointing to the walker-level
coverage that already exists (test_snapshot_nested_gitignore_excludes_
matching_new_files, test_snapshot_walker_prunes_ignored_directories).

Update test_build_gitignore_parses_rules to assert the actual contract:
build_gitignore covers git-ai defaults (e.g. Cargo.lock) but not
standard .gitignore rules, which the walker handles separately.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When Step 1 strips git-ai from an old-style matcher block (e.g.
"Write|Edit|MultiEdit") and that block becomes empty as a result,
remove the entire block rather than leaving a hollow entry behind.
Pre-existing empty blocks (not touched by our removal) are left alone.

Update s3 tests in all three agents to assert the old matcher block is
completely absent after migration, not merely empty.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace STAT_DIFF_TIMEOUT_MS=5s with separate WALK_TIMEOUT_MS=1.5s and
  HOOK_TIMEOUT_MS=4s hard limits
- Walk timeout now returns Err (clean abandon) instead of a partial snapshot
- handle_bash_tool checks elapsed time at 4 checkpoints; returns Fallback
  with debug_log + observability::log_message(warning) on breach
- Rewrite benchmark to measure end-to-end handle_bash_tool latency (pre+post
  hooks) with a per-test isolated BenchDaemon instead of the system daemon
- Watermark cold-start fix: wm=None now uses git_index_mtime_ns as proxy so
  snapshot() never walks all files when daemon is not running
- MAX_TRACKED_FILES=50K cap: repos with too many recent files skip stat-diff
- Fast-exit socket check in query_daemon_watermarks avoids 500ms timeout
- Thread-local timeout overrides (test-support feature) for deterministic e2e
  tests without large repos; 6 new tests in bash_tool_timeouts.rs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The `_ => git_index_mtime_ns(repo_root)` fallback in snapshot() was
filtering files created within 2s of .git/index mtime, causing all
conformance tests to return NoChanges instead of Checkpoint.

The cold-start proxy now only applies when daemon is running but has no
worktree watermark yet. The None case (no daemon) passes through without
filtering, so direct snapshot() callers and no-daemon paths see all
recently-modified files. Walk time is still bounded by WALK_TIMEOUT_MS.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, efficient diff

- Extract `checkpoint_delegation_enabled()` to `utils.rs`; remove the
  duplicate `captured_checkpoint_delegate_enabled()` from bash_tool.rs
  and the private `daemon_checkpoint_delegate_enabled()` body from
  git_ai_handlers.rs (both now delegate to the shared fn)
- Import `normalize_to_posix` directly instead of using the full
  `crate::utils::normalize_to_posix` path at 9 call sites
- Rewrite `diff()` to use direct HashMap lookups instead of building two
  intermediate `HashSet`s — avoids allocations for snapshots up to 50K files
- Rewrite `capture_file_contents()` to open each file once (`File::open`
  + `fstat` + `read_to_string`) instead of separate `metadata` + `open`
- Fix stale `snapshot()` doc comment: `wm=None` means no filtering (not
  cold-start proxy); remove redundant `MAX_CAPTURE_FILE_SIZE` inline comment
- Remove narrating step-number comments in `attempt_pre/post_hook_capture`

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…point_id

- Add `Agent::Firebender` to the agent enum with `classify_tool` entry:
  file-editing tools (Write/Edit/Delete/RenameSymbol/DeleteSymbol) →
  FileEdit; "Bash" → Bash; everything else → Skip
- Wire bash pre/post hook handling into `FirebenderPreset::run`: on
  preToolUse take a pre-snapshot; on postToolUse diff snapshots and
  report changed paths via `BashCheckpointAction::Checkpoint`
- Replace the hardcoded tool-name `matches!` guard with
  `classify_tool(Agent::Firebender, ...)` so the bash case is no longer
  silently dropped
- Add `captured_checkpoint_id` to both `AgentRunResult` initializers in
  `FirebenderPreset` (required field missing after rebase onto main)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@svarlamov svarlamov changed the title feat: bash tool change attribution via filesystem snapshots feat: bash tool change attribution via filesystem snapshots, codex fixes Apr 7, 2026
"unexpected error message: {message}"
);
}
other => panic!("expected Codex PreToolUse skip PresetError, got {other:?}"),

Check failure

Code scanning / CodeQL

Cleartext logging of sensitive information High test

This operation writes
...::session_id_from_hook_data(...)
to a log file.
"unexpected error message: {message}"
);
}
other => panic!("expected Codex PreToolUse skip PresetError, got {other:?}"),

Check failure

Code scanning / CodeQL

Cleartext logging of sensitive information High test

This operation writes
...::session_id_from_hook_data(...)
to a log file.
"unexpected error message: {message}"
);
}
other => panic!("expected Codex PreToolUse skip PresetError, got {other:?}"),

Check failure

Code scanning / CodeQL

Cleartext logging of sensitive information High test

This operation writes
...::session_id_from_hook_data(...)
to a log file.
In daemon mode (no pre-commit hook), commits made during an AI bash tool
call were incorrectly attributed as Human.  The wrapper-mode pre-commit
hook detected inflight bash via checkpoint_context_from_active_bash(); the
daemon's commit replay path (sync_pre_commit_checkpoint_for_daemon_commit)
had no equivalent check.

Two fixes in sync_pre_commit_checkpoint_for_daemon_commit:

1. Carryover snapshot supplement: the daemon's carryover snapshot only
   contains files already tracked in the working log.  When a bash tool is
   in-flight, files edited after the last checkpoint are absent.  Supplement
   with the full committed diff so those files also receive attribution.

2. Bash context attribution: call checkpoint_context_from_active_bash() —
   the same check the wrapper pre-commit hook uses — and if a bash session
   is in flight, use AI attribution (edited_filepaths) instead of writing a
   synthetic Human replay checkpoint.

Fixes all 8 daemon-mode failures in codex integration tests.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@svarlamov svarlamov force-pushed the johnw/bash-support branch from 4024a27 to 432a8d9 Compare April 8, 2026 01:35
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@svarlamov svarlamov merged commit 13bf6aa into main Apr 8, 2026
26 of 28 checks passed
@svarlamov svarlamov deleted the johnw/bash-support branch April 8, 2026 02:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bash tool file changes not attributed to AI — only Write|Edit|MultiEdit are tracked Preserve attributions through intra-commit file renames/moves

4 participants