perf(retrieval): lazy-load edges in BFS to reduce query latency by rajkripal · Pull Request #86 · rajkripal/cashew

rajkripal · 2026-06-06T19:08:48Z

Summary

Replace upfront full-table scan of derivation_edges with on-demand per-node queries during BFS traversal in both retrieve_recursive_bfs and retrieve_bfs_streaming.
Reduces work from O(all edges) to O(visited nodes × their neighbors). With 14.5M edges in the graph, this eliminates the dominant startup cost on every retrieval call.
A per-node _neighbor_cache dict prevents re-fetching the same node's edges within one BFS run.
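
The lazy-loading idea summarized above can be sketched as follows. This is a minimal illustration, not the code from this PR: `lazy_bfs`, `fetch_neighbors`, and the choice to share one cache across several seed nodes within a single retrieval call are all assumptions made for the example.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List, Optional


def lazy_bfs(
    start: str,
    fetch_neighbors: Callable[[str], Iterable[str]],  # hypothetical per-node edge lookup
    max_depth: int,
    neighbor_cache: Optional[Dict[str, List[str]]] = None,
) -> List[str]:
    """Breadth-first traversal that fetches a node's edges only when it is expanded."""
    if neighbor_cache is None:
        neighbor_cache = {}

    def neighbors(node: str) -> List[str]:
        # Hit the backing store at most once per node for the lifetime of the cache.
        if node not in neighbor_cache:
            neighbor_cache[node] = list(fetch_neighbors(node))
        return neighbor_cache[node]

    visited = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= max_depth:
            continue  # nodes at the depth limit are reported but not expanded
        for nxt in neighbors(node):
            if nxt not in visited:
                visited.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return order
```

Only nodes that are actually expanded trigger a lookup, so edges in unreachable parts of the graph are never read. Passing the same `neighbor_cache` dict to traversals from several seeds lets later traversals reuse earlier lookups.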

Test plan

All 451 tests pass (excluding one pre-existing failure in test_prepare_ingest.py that also fails on main)
Fixed three test files whose model-agnostic dimension checks had been accidentally replaced with hardcoded values (384, all-MiniLM-L6-v2); restored the resolve_embedding_dim() lookups

🤖 Generated with Claude Code

Instead of preloading all 14.5M edges upfront (O(E) complexity), neighbors are now fetched on-demand during BFS traversal. This reduces complexity from O(14.5M) to O(explored_nodes + their_neighbors). Typical exploration depth of 3 hops with 3 picks per hop only traverses 150-200 nodes, yielding 5-15x speedup by avoiding full graph preload.

Changes:
- retrieve_recursive_bfs: replaced upfront neighbors dict with get_neighbors()
- retrieve_bfs_streaming: same lazy-loading pattern for streaming variant
- get_neighbors() uses UNION to handle bidirectional edges in single query
- Local cache prevents re-fetching same node's neighbors

Test results: all 15 retrieval tests pass, no regressions.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
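
A single-query bidirectional lookup of the kind described in this commit might look like the sketch below. The column names `parent_id` and `child_id`, the use of `sqlite3`, and the helper `make_demo_db` are assumptions made for illustration; the actual schema and signature of `get_neighbors()` in the repository may differ.

```python
import sqlite3
from typing import Dict, List


def get_neighbors(conn: sqlite3.Connection, node_id: str,
                  cache: Dict[str, List[str]]) -> List[str]:
    """Return nodes connected to node_id in either direction, fetching at most once."""
    if node_id in cache:
        return cache[node_id]
    rows = conn.execute(
        # UNION merges both directions and removes duplicate neighbor ids.
        "SELECT child_id FROM derivation_edges WHERE parent_id = ? "
        "UNION "
        "SELECT parent_id FROM derivation_edges WHERE child_id = ?",
        (node_id, node_id),
    ).fetchall()
    cache[node_id] = sorted(row[0] for row in rows)
    return cache[node_id]


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny in-memory table with the assumed (hypothetical) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE derivation_edges (parent_id TEXT, child_id TEXT)")
    # Indexes on both columns keep each half of the UNION an index lookup.
    conn.execute("CREATE INDEX idx_edges_parent ON derivation_edges(parent_id)")
    conn.execute("CREATE INDEX idx_edges_child ON derivation_edges(child_id)")
    conn.executemany(
        "INSERT INTO derivation_edges VALUES (?, ?)",
        [("a", "b"), ("c", "a"), ("a", "b"), ("b", "d")],
    )
    return conn
```

Per-node queries only pay off if both edge columns are indexed; without indexes, each lookup would itself scan the table.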

Three test files had their model-agnostic dimension assertions replaced with hardcoded 384 / 'all-MiniLM-L6-v2', which fails against the actual configured model (thenlper/gte-large, dim=1024). Restore resolve_embedding_dim() lookups and remove the hardcoded model-name assertion.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
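
The difference between a hardcoded and a model-agnostic dimension check can be illustrated as below. Everything here is a stand-in: `resolve_embedding_dim_stub`, `fake_embed`, and `check_embedding_shape` are hypothetical names, and the real `resolve_embedding_dim()` in the repository may take different arguments and resolve the dimension differently.

```python
from typing import List

# Dimensions for the two models named in the commit message.
KNOWN_DIMS = {"all-MiniLM-L6-v2": 384, "thenlper/gte-large": 1024}


def resolve_embedding_dim_stub(model_name: str) -> int:
    """Hypothetical stand-in for the project's resolve_embedding_dim()."""
    return KNOWN_DIMS[model_name]


def fake_embed(text: str, model_name: str) -> List[float]:
    """Stand-in embedder that returns a zero vector of the model's dimension."""
    return [0.0] * resolve_embedding_dim_stub(model_name)


def check_embedding_shape(model_name: str) -> int:
    vec = fake_embed("hello", model_name)
    # Model-agnostic: compare against the resolved dimension, not a literal 384.
    assert len(vec) == resolve_embedding_dim_stub(model_name)
    return len(vec)
```

A check written as `assert len(vec) == 384` would pass for all-MiniLM-L6-v2 but fail once the configured model is thenlper/gte-large, which is the failure mode the commit describes.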

rajkripal and others added 2 commits May 26, 2026 22:47

rajkripal wants to merge 2 commits into main from fix/lazy-edge-loading-clean
