Skip to content

Silent node loss: same-file qualified_name collisions are dropped on insert #585

Description

@CodeBlackwell

Summary

Two same-named symbols in one file collapse into a single node on insert, so the graph silently drops the earlier definitions. A build reports more nodes than the database actually stores, and caller/callee analysis on those names resolves to a single merged node.

Root cause

qualified_name is the stable identity key, built by GraphStore._make_qualified as file::name or file::parent.name. That key is not unique per symbol: methods of different object literals, or repeated generated helpers, produce the same key. upsert_node inserts with ON CONFLICT(qualified_name) DO UPDATE, so colliding rows collapse last writer wins and earlier definitions are lost.

The discrepancy is also visible in the counts: full_build reports total_nodes summed in memory from the parsed node lists, while status reads SELECT COUNT(*). The gap is exactly the number of dropped nodes.

Reproduction

// db.ts
export const personas = { getById(id) { return id } };
export const users = { getById(id) { return id } };
export const cart = { getById(id) { return id } };
$ code-review-graph build
Full build: 1 files, 4 nodes, 3 edges (postprocess=full)
$ code-review-graph status
Nodes: 2

Build claims 4 nodes (one File plus three getById); the database holds 2 (the three getById collapse into one). The same pattern shows up in real object modules and in generated files with repeated helper names at module scope.

Proposed fix

Deduplicate per file in the batch store functions (store_file_nodes_edges, store_file_batch), which already receive the whole file's node list. Keep the first occurrence at its bare key so existing edges still resolve to it, and suffix later collisions with :L<line_start>, which is always unique because two definitions cannot share a start line. Unique nodes keep byte identical keys, so embeddings and incremental updates are unaffected.

I also propose surfacing the disambiguation so collisions stop being invisible: a find_disambiguated_nodes() helper, a disambiguated_nodes field in the build result, and a one line CLI summary.

Convention question

Does the :L<line_start> suffix work for you, or would you prefer a different marker for the second and later collisions? One honest caveat: a collided node's identity shifts if its own start line moves. That affects only ambiguous duplicates, never the unique majority.

A PR implementing this (two commits, tests, suite green) is ready to open against main.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions