graph/db: cross-version graph Store queries#10714
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the graph database layer to support cross-version gossip data. By introducing materialized mapping tables, the system can now automatically select the preferred version of nodes and channels based on defined priority rules. This change simplifies the API for callers and ensures that graph traversal methods consistently return the most relevant data, regardless of whether it was announced via gossip v1 or v2. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request refactors the graph Store interface to support cross-version operations, enabling node and channel traversals to work across gossip v1 and v2. It introduces preferred mapping tables in the SQL backend and PreferHighest fetch helpers to prioritize v2 data while falling back to v1. Feedback was provided to implement a feature fallback in the no-cache path for directed channel traversals, ensuring consistency with the FetchNodeFeatures logic by providing the best available feature set when a specific gossip version lacks data.
| } | ||
| } | ||
|
|
||
| toNodeCallback := func() route.Vertex { |
There was a problem hiding this comment.
In the no-cache path, we should implement a fallback for node features similar to the logic in ChannelGraph.FetchNodeFeatures. This ensures that if a node has v1 features but no v2 features, we still provide the v1 features for pathfinding even if the channel announcement is v2. This maintains consistency between the cache-enabled and cache-disabled code paths.
// Try v2 first, fall back to v1 if the v2 features are empty. This
// ensures consistency with FetchNodeFeatures and provides the best
// available feature set for the node.
preferredFeatures := featuresByVersion[gossipV2]
if preferredFeatures.IsEmpty() {
preferredFeatures = featuresByVersion[gossipV1]
}
toNodeCallback := func() route.Vertex {There was a problem hiding this comment.
I think the current behaviour is actually more correct here. Tying ToNodeFeatures to the channel's gossip version ensures the features match the protocol version of the channel advertisement. A v2 preferred channel should use v2 features, and a v1 preferred channel should use v1 features.
Using a single "best" feature set across all channels would mean a v1 channel could get v2 features assigned, which is a version mismatch.
The underlying ambiguity (empty features vs absent node for a given version) is a known limitation — the proper fix is to distinguish those two cases at the FetchNodeFeatures layer rather than papering over it here.
| cachedInPolicy.ToNodeFeatures = | ||
| featuresByVersion[lnwire.GossipVersion( | ||
| row.GraphChannel.Version, | ||
| )] |
There was a problem hiding this comment.
Use the pre-calculated preferredFeatures here to ensure the best available feature set is used for the node, regardless of the channel's gossip version.
| cachedInPolicy.ToNodeFeatures = | |
| featuresByVersion[lnwire.GossipVersion( | |
| row.GraphChannel.Version, | |
| )] | |
| cachedInPolicy.ToNodeFeatures = preferredFeatures |
32f201a to
652f923
Compare
652f923 to
24c9c8d
Compare
| GetChannelAndNodesBySCID(ctx context.Context, arg sqlc.GetChannelAndNodesBySCIDParams) (sqlc.GetChannelAndNodesBySCIDRow, error) | ||
| HighestSCID(ctx context.Context, version int16) ([]byte, error) | ||
| ListChannelsByNodeID(ctx context.Context, arg sqlc.ListChannelsByNodeIDParams) ([]sqlc.ListChannelsByNodeIDRow, error) | ||
| ListPreferredDirectedChannelsPaginated(ctx context.Context, arg sqlc.ListPreferredDirectedChannelsPaginatedParams) ([]sqlc.ListPreferredDirectedChannelsPaginatedRow, error) |
There was a problem hiding this comment.
ListNodesPaginated seems to be unused
Some other ones that could need a check:
GetNodesByIDsDeleteNodeGetZombieChannelsSCIDsGetPruneHashByHeightGetPruneEntriesForHeightsGetClosedChannelsSCIDsListChannelsWithPoliciesPaginatedGetChannelsByIDs
24c9c8d to
b66429f
Compare
PR Severity: CRITICAL
Critical (6 files):
High (4 files):
Low (1 file):
AnalysisThis PR introduces SQL migration 15 (000015_graph_preferred_lookups) adding new indexes/tables to the graph SQL schema, with corresponding sqlc-generated code and updated graph store implementations. Database migrations are always CRITICAL: they are irreversible in production and affect data integrity. Key concerns for reviewers:
To override, add a severity-override-{critical,high,medium,low} label. |
|
thanks @bitromortac - addressed 👍 |
| } | ||
|
|
||
| // If this version has policies, return immediately. | ||
| if p1 != nil || p2 != nil { |
There was a problem hiding this comment.
maybe we could change his to an &&, but I'm not sure if that half-policy protection is needed?
|
@bhandras: review reminder |
3a6f03f to
26ef5e2
Compare
26ef5e2 to
b4b65bb
Compare
When FetchChannelEdgesByID hits a zombie edge, it constructs the partial ChannelEdgeInfo it returns alongside ErrZombieEdge with a hard-coded ChannelID of zero instead of the actual channel ID that was looked up. Callers such as the gossiper's processZombieUpdate receive this struct and may use the ChannelID field; returning zero is incorrect and could mask bugs in downstream code. Pass the looked-up chanID through to NewV1Channel / NewV2Channel so the zombie ChannelEdgeInfo carries the correct ChannelID.
Add two precomputed mapping tables that track the "best" gossip version for each unique node (pub_key) and channel (SCID): - graph_preferred_nodes: pub_key -> node_id - graph_preferred_channels: scid -> channel_id Priority for nodes: v2 announced > v1 announced > v2 shell > v1 shell. Priority for channels: v2 with policies > v1 with policies > v2 > v1. These tables enable simple indexed-join queries for cross-version traversal (ForEachNode, ForEachChannel, ForEachNodeDirectedChannel) without expensive per-row COALESCE subqueries. The tables are populated from existing data during the migration and maintained by upsert/delete queries on every write path (added in the next commit).
Add version-agnostic channel fetch helpers that choose the preferred gossip-version row for a logical channel: the highest version with policy data, falling back to the highest bare version. Implement the SQL path inside a single read transaction instead of calling SQLStore methods recursively, preserve zombie edge info for SCID lookups, and keep KV behavior as v1-only delegation. Callers that need to know whether any version of a channel exists can rely on the ErrEdgeNotFound returned by the Preferred fetch path. Add coverage for preferred selection, missing channels, missing outpoints, and zombie edge info.
Add UpsertPreferredNode and UpsertPreferredChannel calls to every Store write path so the preferred mapping tables stay consistent: - upsertSourceNode, upsertNode, maybeCreateShellNode, DeleteNode - insertChannel, updateChanEdgePolicy, DeleteChannelEdges - pruneGraphNodes CASCADE deletes on the underlying graph_nodes and graph_channels tables automatically clean up preferred entries when a version is removed, so PruneGraph and DisconnectBlockAtHeight (which delete every version of a SCID) need no explicit upsert call.
Drop the GossipVersion parameter from ForEachNode and ForEachChannel. Both methods now iterate across all gossip versions, yielding one result per unique pub_key or SCID using the preferred mapping tables for pagination. Wire ChannelGraph.FetchChannelEdgesByID and FetchChannelEdgesByOutpoint through the Preferred variants so the public ChannelGraph API loses its GossipVersion parameter and becomes version-agnostic on read.
Make FetchNodeFeatures version-agnostic by trying v2 first and falling back to v1 when v2 features are empty. The no-cache fallback for ForEachNodeDirectedChannel stays version-scoped (v1 only) because the SQL backend always runs with the in-memory cache enabled, and the cache already merges v1+v2 with v2 precedence. populateCache now skips empty v2 feature entries so they don't shadow a non-empty v1 feature set when the cache is rebuilt, matching the FetchNodeFeatures precedence rule. The cache iterates v1 then v2 so v2 data overwrites v1 on key collision.
VersionedGraph wraps ChannelGraph for callers that want to operate against a specific gossip version. After the cross-version refactor, the ForEachNode and ForEachChannel methods on VersionedGraph silently ignored c.v and iterated across all versions — a surprising behaviour for a wrapper whose whole purpose is version-scoping. Move the cross-version iteration to ChannelGraph (where it belongs) and drop the foot-gun overrides from VersionedGraph. *VersionedGraph continues to expose these via the embedded *ChannelGraph, but now they are explicitly methods of the version-agnostic type. Add ChannelGraph.ForEachNode mirroring ChannelGraph.ForEachChannel, and update the only non-embedded callsite (rpcserver describeGraph) to take ChannelGraph directly. The two remaining test helpers also switch to *ChannelGraph since they only ever wanted a node count.
SQLStore.ForEachNodeCached takes a gossip version and uses it correctly when paginating through nodes, but the inner ListChannelsForNodeIDs call hardcoded GossipVersion1 instead of forwarding the requested version. Calling ForEachNodeCached(ctx, GossipVersion2, ...) therefore returned v2 nodes paired with v1 channels — silently inconsistent data. The bug was latent because every existing caller happens to request v1, but it would surface as soon as any v2-scoped caller appears. Add a regression test that creates a v2-only channel between two nodes that exist under both versions and asserts that ForEachNodeCached(ctx, GossipVersion2, ...) reports the channel for each endpoint.
The in-memory graph cache is only ever disabled on the bbolt KV backend in production, so the no-cache fallback paths in ChannelGraph.ForEachNodeDirectedChannel, ChannelGraph.FetchNodeFeatures and VersionedGraph.GraphSession will only run against a v1-only store. The v2-then-v1 loop in FetchNodeFeatures is therefore dead code in production, and the comments framing v1 as a temporary choice are misleading. Collapse FetchNodeFeatures to a direct v1 call, drop the "version-scoped"/"for now" comment language, and trim the TestPreferredNodeTraversal cases that were exercising the now-removed v2 fallback. Rename the test to TestNoCacheNodeTraversal to reflect what it actually covers (the v1 KV path).
VersionedGraph.GraphSession duplicated ChannelGraph.GraphSession with no observable difference: in the cache-loaded branch, both pass themselves to the callback, and the cache's NodeTraverser surface (GetFeatures, ForEachChannel) has no version concept — so receiving a *VersionedGraph vs a *ChannelGraph routes to the same cache lookups. In the no-cache branch, both delegate to c.db.GraphSession identically. Remove the override and fold its v1-only note into ChannelGraph's docstring so the production invariant (no-cache path is KV-only, which is v1-only) is preserved at the surviving callsite. This mirrors the hygiene from the earlier ForEachNode/ForEachChannel cleanup, where VersionedGraph overrides that ignored c.v were moved off the wrapper.
sqlNodeTraverser.ForEachNodeDirectedChannel and FetchNodeFeatures both pass lnwire.GossipVersion1 to their helpers without explanation. The reason this is correct is non-obvious: sqlNodeTraverser is only ever constructed by SQLStore.GraphSession, which is only reached when the in-memory graph cache is unavailable. Since the SQL backend always runs with the cache enabled in production, this fallback never executes at runtime — the cache-backed NodeTraverser (which already merges v1+v2) is what real callers see. Only tests exercise this code, and they operate on v1 data. Capture that rationale on the type doc so a future reader doesn't mistake the hardcoding for a bug. Also switch the call sites from lnwire.GossipVersion1 to the file-local gossipV1 alias to match the convention used elsewhere in this file.
6cc300b to
a7df913
Compare
|
Concept ACK. I like the direction of making the unversioned graph view cross-version via preferred node/channel mappings, while keeping version-specific access available where callers still need it. I took a quick pass over the latest shape and the main edge cases I was worried about (preferred recomputation after deletes/prunes, lower-version policy data beating a higher-version shell, and zombie resurrection via preferred SCID fetch) look conceptually right to me. |
Summary
This PR makes the graph
Storeinterface cross-version so thatForEachNode,ForEachChannel, andForEachNodeDirectedChannelwork across gossip v1 and v2, returning one preferred entry per pub_key/SCID. It also addsPreferHighestfetch helpers andGetVersionsqueries so callers can retrieve channels without knowing which gossip version announced them.Part of the gossip v2 epic: #10293
Spun off from: #10656
Design
Two new materialised mapping tables are introduced:
graph_preferred_nodes— one row per uniquepub_key, pointing at the bestgraph_nodesrow. Priority: v2 announced > v1 announced > v2 shell > v1 shell.graph_preferred_channels— one row per unique SCID, pointing at the bestgraph_channelsrow. Priority: v2 with policies > v1 with policies > v2 bare > v1 bare.These tables are maintained on every write path (
AddNode,AddChannelEdge,UpdateEdgePolicy,DeleteNode,DeleteChannelEdges,PruneGraphNodes) viaUpsertPreferredNode/UpsertPreferredChannelSQL queries, and seeded by a migration for existing data.Changes by commit
sqldb: add preferred-node and preferred-channel mapping tables— migration 000014 creating the tables and populating them from existing data.graph/db: add PreferHighest fetch methods and GetVersions queries— newStoreinterface methods:FetchChannelEdgesByIDPreferHighest,FetchChannelEdgesByOutpointPreferHighest,GetVersionsBySCID,GetVersionsByOutpoint. KV implementations delegate to v1.graph/db: wire preferred-table maintenance into write paths— callsUpsertPreferredNode/UpsertPreferredChannelon all SQL write paths.graph/db: make ForEachNode and ForEachChannel cross-version— removes theGossipVersionparameter fromForEachNodeandForEachChannelon theStoreinterface, replacing the underlying queries with preferred-table-backed paginated queries.graph/db: implement cross-version node traversal— addsForEachNodeDirectedChannelPreferHighestfor cache-disabled SQL path, updatesFetchNodeFeaturesto prefer v2 features with v1 fallback.docs: add release notePerformance
Benchmarked on Apple M1 Pro against a mainnet v1-only graph (16,216 nodes, 51,239 channels). No regressions observed — the preferred-table JOIN adds negligible overhead:
Test plan
TestPreferHighestAndGetVersions,TestPreferHighestForEachNode,TestPreferHighestForEachChannel,TestPreferHighestNodeTraversal,TestPreferHighestNodeDirectedChannelTraversal,TestDeleteNodePreferredRecomputationisSQLDBguard