Skip to content

fix: prevent silent node loss from qualified_name collisions#586

Open
CodeBlackwell wants to merge 2 commits into
tirth8205:mainfrom
CodeBlackwell:fix/qualified-name-collisions
Open

fix: prevent silent node loss from qualified_name collisions#586
CodeBlackwell wants to merge 2 commits into
tirth8205:mainfrom
CodeBlackwell:fix/qualified-name-collisions

Conversation

@CodeBlackwell

Copy link
Copy Markdown

Fixes #585.

Problem

qualified_name is the graph's stable identity key, built as file::name or file::parent.name. It is not unique per symbol: same-named symbols in one file (methods of different object literals, repeated generated helpers) produce the same key. upsert_node inserts with ON CONFLICT(qualified_name) DO UPDATE, so colliding rows collapse last writer wins and earlier definitions are silently dropped. A build then reports more nodes than the database stores, and caller/callee analysis on those names points at a single merged node.

Changes

Commit 1 — disambiguate colliding qualified_names. Deduplicate per file in store_file_nodes_edges and store_file_batch, which already hold the whole file's node list. The first occurrence keeps its bare key so existing edges still resolve to it; later collisions are suffixed with :L<line_start>, which is always unique because two definitions cannot share a start line. Unique nodes keep byte identical keys, so embeddings and incremental updates are unaffected.

Commit 2 — report disambiguated names. Adds GraphStore.find_disambiguated_nodes(), a disambiguated_nodes field in the full_build and incremental_update results, a log line, and a one line CLI summary after build/update. Counts are not rewritten: now that collisions are kept rather than dropped, the reported node total and the stored COUNT(*) already agree.

Verification

# before: build claims 4 nodes, status reports 2
# after:
$ code-review-graph build
Full build: 1 files, 4 nodes, 3 edges (postprocess=full)
Disambiguated 2 duplicate symbol name(s): .../db.ts::getById:L2, .../db.ts::getById:L3
$ code-review-graph status
Nodes: 4

New tests/test_collisions.py covers: both same-named methods persist, three module scope duplicates persist, an edge to the bare key still resolves to the first definition, the disambiguation report, and that unique names keep bare keys. Full suite green (1389 passed), ruff and mypy clean.

Caveat

A collided node's identity shifts if its own line_start moves. This affects only ambiguous duplicates, never the unique majority. Open to a different marker convention if you prefer (see #585).

🤖 Generated with Claude Code

CodeBlackwell and others added 2 commits June 27, 2026 13:17
qualified_name is the graph's stable identity key, built as file::name or
file::parent.name. It is not unique per symbol: same-named symbols in one
file (e.g. methods of different object literals, or repeated generated
helpers) produce the same key. upsert_node inserts with
ON CONFLICT(qualified_name) DO UPDATE, so colliding nodes collapsed
last-writer-wins and earlier definitions were silently dropped.

Deduplicate per file in the batch store functions, which already hold the
whole file's node list. The first occurrence keeps its bare key so existing
edges (which reference the same parser-computed key) still resolve to it;
later collisions are suffixed with :L<line_start>, which is always unique.
Unique nodes keep byte-identical keys, so embeddings and incremental updates
are unaffected.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Surfaces same-file qualified_name collisions that were previously invisible.
Adds GraphStore.find_disambiguated_nodes() (matches the :L<line> marker the
batch store appends), threads a disambiguated_nodes list into the full_build
and incremental_update results, logs it, and prints a one-line summary after
build/update. Counts are unchanged: now that collisions are kept rather than
dropped, the reported node total and the stored COUNT(*) already agree.

Adds tests/test_collisions.py covering no-drop, the bare-key-first invariant,
edge-resolution safety, and the disambiguation report.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Silent node loss: same-file qualified_name collisions are dropped on insert

1 participant