feat: bash tool change attribution via filesystem snapshots, codex fixes#798
feat: bash tool change attribution via filesystem snapshots, codex fixes#798
Conversation
1f6aba3 to
6c57ae6
Compare
6c57ae6 to
05a9350
Compare
9f8b8bb to
82fc9dd
Compare
c3369d4 to
059b508
Compare
There was a problem hiding this comment.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
059b508 to
8577330
Compare
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
Test Runner seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
| ) -> Result<(), GitAiError> { | ||
| let manifest_path = async_checkpoint_manifest_path(capture_id)?; | ||
| let mut manifest: PreparedCheckpointManifest = | ||
| serde_json::from_str(&fs::read_to_string(&manifest_path).map_err(|error| { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression
| manifest.agent_run_result = Some(updated); | ||
| } | ||
|
|
||
| fs::write(&manifest_path, serde_json::to_vec(&manifest)?)?; |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression
| let mut contents = HashMap::new(); | ||
| for rel_path in file_paths { | ||
| let abs_path = repo_root.join(rel_path); | ||
| match fs::metadata(&abs_path) { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression
| } | ||
| _ => {} | ||
| } | ||
| match fs::read_to_string(&abs_path) { |
Check failure
Code scanning / CodeQL
Uncontrolled data used in path expression
a198ae1 to
c149c36
Compare
…opencode hook payloads isBashTool now matches both "bash" and "shell" to stay in sync with the Rust classify_tool which treats both as ToolClass::Bash. All three hook payload constructions (file-edit pre-hook, bash pre-hook, post-hook) now include tool_use_id: input.callID. Previously the field was always omitted, so the Rust side always fell back to the sidecar UUID mechanism. With real callIDs present, sequential bash calls no longer collide: the sidecar is only used for agents that don't provide unique IDs. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…as inflight signal When an agent runs `echo foo > f && git commit -am x`, the git pre-commit hook fires with no agent context and would default to Human attribution. We detect this by checking for non-stale pre-snapshot JSON files in the bash_snapshots cache. A snapshot is written at pre-hook time and consumed (deleted) at post-hook time, so its presence is an exact signal that a bash invocation is in flight — no new daemon messages or marker files needed, just the existing snapshot lifecycle. has_active_bash_inflight() in bash_tool.rs reads the cache directory and returns true if any fresh (within SNAPSHOT_STALE_SECS) .json file exists. In git_ai_handlers.rs the checkpoint kind is overridden from Human to AI when this check fires and no explicit agent context is present. Parallel bash calls are handled correctly: each call has its own snapshot file, so removing one does not clear the signal while others are still running. A regression test (test_inflight_parallel_calls_regression) verifies this. Also adds: test_inflight_false_when_no_snapshots, test_inflight_true_when_snapshot_present, test_inflight_false_after_snapshot_consumed, test_inflight_false_for_stale_snapshot, test_inflight_ignores_sidecar_txt_files. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…letion tracking - Remove two-tier filtering (git-tracked always-include bypass). `should_include_new_file` now applied to ALL files uniformly; tracked files that match .gitignore / default patterns are excluded. - Drop deletion tracking entirely (`StatDiffResult.deleted`, `tracked_files`, `missing_at_pre_hook`, `load_tracked_files` / `git ls-files` call). Pure-deletion bash calls return `NoChanges`; new files created after a deletion are attributed via timestamps as usual. - Pass the git-ai ignore ruleset into WalkBuilder.filter_entry so entire ignored directories (node_modules, target, etc.) are pruned before the walker descends into them — this helps when the directory is in default_ignore_patterns but not yet in .gitignore. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- map_or(false, ...) → is_some_and(...) (unnecessary_map_or) - Add blank line before doc paragraph to fix doc_lazy_continuation - Remove duplicate #[test] attribute in bash_tool_provenance - vec!["..."] → ["..."] for useless_vec - get(...).is_none() → !contains_key(...) in family_actor tests - Elide explicit lifetimes in claude_code test helper - rustfmt formatting throughout Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ore support The old collect_gitignores pre-walk had two problems: it never applied the rules it was building during traversal (so gitignored dirs were only skipped if they were in the hardcoded SKIP_DIR_NAMES list), and it duplicated work already done by WalkBuilder with git_ignore(true). build_gitignore now only adds the git-ai-specific patterns (defaults, .git-ai-ignore, linguist-generated). Standard .gitignore discovery — including nested files — is handled natively by WalkBuilder. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d Gemini bash AfterTool test - Rename _MTIME_GRACE_WINDOW_SECS → MTIME_GRACE_WINDOW_SECS; the underscore prefix falsely signals "intentionally unused" but the constant is referenced directly by MTIME_GRACE_WINDOW_NS. - Fix TOCTOU in load_and_consume_snapshot: drop the exists() check and handle NotFound from fs::read directly, so a concurrent post-hook consuming the same snapshot returns Ok(None) instead of an Err. - Add test_gemini_preset_bash_tool_aftertool_detects_changes: exercises the GeminiPreset AfterTool path with tool_name="shell", verifying that the stat-diff snapshot logic detects files written during the bash call and surfaces them as edited_filepaths on the AI checkpoint result. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The two nested-gitignore tests specifically exercised collect_gitignores, which was removed in favor of WalkBuilder's native git_ignore(true) support. Replace them with a comment pointing to the walker-level coverage that already exists (test_snapshot_nested_gitignore_excludes_ matching_new_files, test_snapshot_walker_prunes_ignored_directories). Update test_build_gitignore_parses_rules to assert the actual contract: build_gitignore covers git-ai defaults (e.g. Cargo.lock) but not standard .gitignore rules, which the walker handles separately. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
When Step 1 strips git-ai from an old-style matcher block (e.g. "Write|Edit|MultiEdit") and that block becomes empty as a result, remove the entire block rather than leaving a hollow entry behind. Pre-existing empty blocks (not touched by our removal) are left alone. Update s3 tests in all three agents to assert the old matcher block is completely absent after migration, not merely empty. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace STAT_DIFF_TIMEOUT_MS=5s with separate WALK_TIMEOUT_MS=1.5s and HOOK_TIMEOUT_MS=4s hard limits - Walk timeout now returns Err (clean abandon) instead of a partial snapshot - handle_bash_tool checks elapsed time at 4 checkpoints; returns Fallback with debug_log + observability::log_message(warning) on breach - Rewrite benchmark to measure end-to-end handle_bash_tool latency (pre+post hooks) with a per-test isolated BenchDaemon instead of the system daemon - Watermark cold-start fix: wm=None now uses git_index_mtime_ns as proxy so snapshot() never walks all files when daemon is not running - MAX_TRACKED_FILES=50K cap: repos with too many recent files skip stat-diff - Fast-exit socket check in query_daemon_watermarks avoids 500ms timeout - Thread-local timeout overrides (test-support feature) for deterministic e2e tests without large repos; 6 new tests in bash_tool_timeouts.rs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The `_ => git_index_mtime_ns(repo_root)` fallback in snapshot() was filtering files created within 2s of .git/index mtime, causing all conformance tests to return NoChanges instead of Checkpoint. The cold-start proxy now only applies when daemon is running but has no worktree watermark yet. The None case (no daemon) passes through without filtering, so direct snapshot() callers and no-daemon paths see all recently-modified files. Walk time is still bounded by WALK_TIMEOUT_MS. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…, efficient diff - Extract `checkpoint_delegation_enabled()` to `utils.rs`; remove the duplicate `captured_checkpoint_delegate_enabled()` from bash_tool.rs and the private `daemon_checkpoint_delegate_enabled()` body from git_ai_handlers.rs (both now delegate to the shared fn) - Import `normalize_to_posix` directly instead of using the full `crate::utils::normalize_to_posix` path at 9 call sites - Rewrite `diff()` to use direct HashMap lookups instead of building two intermediate `HashSet`s — avoids allocations for snapshots up to 50K files - Rewrite `capture_file_contents()` to open each file once (`File::open` + `fstat` + `read_to_string`) instead of separate `metadata` + `open` - Fix stale `snapshot()` doc comment: `wm=None` means no filtering (not cold-start proxy); remove redundant `MAX_CAPTURE_FILE_SIZE` inline comment - Remove narrating step-number comments in `attempt_pre/post_hook_capture` Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…point_id - Add `Agent::Firebender` to the agent enum with `classify_tool` entry: file-editing tools (Write/Edit/Delete/RenameSymbol/DeleteSymbol) → FileEdit; "Bash" → Bash; everything else → Skip - Wire bash pre/post hook handling into `FirebenderPreset::run`: on preToolUse take a pre-snapshot; on postToolUse diff snapshots and report changed paths via `BashCheckpointAction::Checkpoint` - Replace the hardcoded tool-name `matches!` guard with `classify_tool(Agent::Firebender, ...)` so the bash case is no longer silently dropped - Add `captured_checkpoint_id` to both `AgentRunResult` initializers in `FirebenderPreset` (required field missing after rebase onto main) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
29af3b8 to
ba96738
Compare
…bash-recovery [codex] migrate Codex preset to hooks and shared bash recovery
| "unexpected error message: {message}" | ||
| ); | ||
| } | ||
| other => panic!("expected Codex PreToolUse skip PresetError, got {other:?}"), |
Check failure
Code scanning / CodeQL
Cleartext logging of sensitive information High test
| "unexpected error message: {message}" | ||
| ); | ||
| } | ||
| other => panic!("expected Codex PreToolUse skip PresetError, got {other:?}"), |
Check failure
Code scanning / CodeQL
Cleartext logging of sensitive information High test
| "unexpected error message: {message}" | ||
| ); | ||
| } | ||
| other => panic!("expected Codex PreToolUse skip PresetError, got {other:?}"), |
Check failure
Code scanning / CodeQL
Cleartext logging of sensitive information High test
13064ab to
57fea99
Compare
In daemon mode (no pre-commit hook), commits made during an AI bash tool call were incorrectly attributed as Human. The wrapper-mode pre-commit hook detected inflight bash via checkpoint_context_from_active_bash(); the daemon's commit replay path (sync_pre_commit_checkpoint_for_daemon_commit) had no equivalent check. Two fixes in sync_pre_commit_checkpoint_for_daemon_commit: 1. Carryover snapshot supplement: the daemon's carryover snapshot only contains files already tracked in the working log. When a bash tool is in-flight, files edited after the last checkpoint are absent. Supplement with the full committed diff so those files also receive attribution. 2. Bash context attribution: call checkpoint_context_from_active_bash() — the same check the wrapper pre-commit hook uses — and if a bash session is in flight, use AI attribution (edited_filepaths) instead of writing a synthetic Human replay checkpoint. Fixes all 8 daemon-mode failures in codex integration tests. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4024a27 to
432a8d9
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Summary
.gitignorerules to avoid noise from build artifacts."Write|Edit|MultiEdit","^(Edit|Write|Create|ApplyPatch)$","write_file|replace") to the catch-all"*"matcher so every tool call — including bash — triggers pre/post checkpoints. Full TDD migration with S1–S13/U1–U4/C1–C3 test matrix per agent.workdir/cwdwithout file-path extraction), alongside the existing file-edit tool path.bash_tool_conformance.rs— conformance tests validating file mutations, hook semantics, tool classification, gitignore filtering, rename handling, snapshot save/loadbash_tool_benchmark.rs— performance tests covering snapshot timing, large repo scaling, memory bounds, concurrent session isolationbash_tool_provenance.rs— end-to-end tests exercising real bash commands (file creation, modification, deletion, build tools, git operations, pipelines, symlinks, batch ops, archives, edge cases)Devin review fixes (latest commit)
tool_use_idcollision for agents without per-call IDs (Gemini, ContinueCli): sidecar UUID file in.git/ai/bash_snapshots/correlates pre/post hooks, preventing snapshot key collisions on sequential bash callsedited_filepaths(no post-content to capture; were stored as empty blobs)compute_watermarks_from_statnow usessymlink_metadata(lstat) to match snapshot mtime semanticscontains()check (fixes false duplicates on macOS/Windows case-insensitive FS)is_empty()guard prevented it from ever firing for bare-CLI checkpoints)flake.nixversion bumped to matchCargo.tomlCloses #150
Closes #756
Test plan
cargo test bash_tool)cargo test -p git-ai -- claude_code droid gemini)cargo test -p git-ai)cargo clippy— zero errorscargo fmt— no formatting issuesorigin/main🤖 Generated with Claude Code