YTDB-646 Index-assisted pre-filtering for both() / bothE() MATCH patterns #982
Sandra Adamiec (sandrawar) wants to merge 8 commits into
Code Review
This pull request introduces the PreFilterableChainedIterable class to enable index-assisted pre-filtering for bidirectional traversals (both() and bothE()) within the MATCH engine. Previously, these traversals bypassed pre-filtering because they were wrapped in a standard chained iterable that did not support the pre-filter interface. The changes include updates to VertexEntityImpl to utilize the new class and enhancements to MatchExecutionPlanner to support class inference for bidirectional steps. Review feedback suggests extending this optimization to multi-label vertex traversals and improving the performance of the pre-filterable check in getEdgesInternal by replacing stream operations with a single loop.
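The single-loop suggestion from the review can be illustrated with a minimal sketch. The names `PreFilterable`, `FilterableBag`, and `PreFilterCheck` are hypothetical stand-ins, not the PR's actual types; the sketch only shows how one loop with an early exit gives the same answer as an `allMatch` stream pipeline.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker interface standing in for the PR's pre-filter interface.
interface PreFilterable {}

// Hypothetical iterable that supports pre-filtering (illustration only).
final class FilterableBag extends ArrayList<Object> implements PreFilterable {}

final class PreFilterCheck {
  // Stream-based variant: concise, but builds a stream pipeline on every call.
  static boolean allPreFilterableStream(List<? extends Iterable<?>> parts) {
    return parts.stream().allMatch(p -> p instanceof PreFilterable);
  }

  // Single-pass variant: one plain loop that exits at the first part
  // that does not support pre-filtering.
  static boolean allPreFilterableLoop(List<? extends Iterable<?>> parts) {
    for (Iterable<?> p : parts) {
      if (!(p instanceof PreFilterable)) {
        return false;
      }
    }
    return true;
  }
}
```

Both variants return the same result for any input; the loop form simply avoids the stream machinery on a path that may run once per traversed vertex.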
Coverage Gate Results
Thresholds: 85% line, 70% branch
Line Coverage: ✅ 100.0% (65/65 lines)
Branch Coverage: ✅ 100.0% (50/50 branches)
Test Count Gate Results
Tolerance: 5% drop allowed per module
Overall: ✅ 19954 tests (baseline: 19936, +18)
Benchmark results:
| Benchmark | develop (ops/s) | PR (ops/s) | Δ | Notes |
|---|---|---|---|---|
| `bothEKnows_recentConnections` (small-bag, Person→KNOWS ~100 edges, 95p date) | 6974.58 ± 229.5 | 7002.45 ± 110.0 | +0.40% | noise, as expected |
| `bothEHasMember_recentJoiners` (hub, top-100 Forums, .inV() + ORDER BY + LIMIT, 95p date) | 260.45 ± 23.2 | 288.96 ± 2.4 | +10.95% | real improvement + ~10× tighter error bars |
| `bothEHasMember_joinerCount` (hub, top-100 Forums, COUNT only, 99p date) | 702.8 ± 151.0 | 855.4 ± 14.2 | +21.7% nominal | improvement + ~10× tighter error bars |
Per-fork breakdown for joinerCount
The headline number understates the benefit — the real story is fork-level variance.
| Branch | Fork 1 | Fork 2 | Fork 3 |
|---|---|---|---|
| develop | 389.1 | 874.1 | 845.2 |
| PR | 880.5 | 854.7 | 831.1 |
One of develop's three forks runs at less than half the throughput of the others. Without the pre-filter, the query's speed depends heavily on page-cache residency of the HAS_MEMBER bag; with the pre-filter, work is bounded by the index RID set and stays deterministic across forks. The PR eliminates the worst-case fork entirely — in production this translates to consistent query latency instead of sporadic cold-cache stalls.
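As a quick check on the per-fork table, the following sketch recomputes the mean for each branch and a crude spread figure. The `range / mean` measure is an assumption chosen here for illustration; it is not the error metric reported by the benchmark harness.

```java
final class ForkStats {
  // Arithmetic mean of per-fork throughput (ops/s).
  static double mean(double[] xs) {
    double sum = 0;
    for (double x : xs) {
      sum += x;
    }
    return sum / xs.length;
  }

  // (max - min) / mean: a unitless, illustrative measure of fork-to-fork spread.
  static double relativeRange(double[] xs) {
    double min = xs[0];
    double max = xs[0];
    for (double x : xs) {
      min = Math.min(min, x);
      max = Math.max(max, x);
    }
    return (max - min) / mean(xs);
  }

  public static void main(String[] args) {
    double[] develop = {389.1, 874.1, 845.2}; // per-fork values from the table
    double[] pr = {880.5, 854.7, 831.1};
    System.out.printf("develop: mean=%.1f range/mean=%.2f%n", mean(develop), relativeRange(develop));
    System.out.printf("PR:      mean=%.1f range/mean=%.2f%n", mean(pr), relativeRange(pr));
  }
}
```

The means come out at about 702.8 and 855.4 ops/s, matching the headline `joinerCount` row, while the spread drops from roughly 0.69 on develop to roughly 0.06 on the PR branch.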
Interpretation
- Small-bag (`KNOWS`, ~100 edges/person): matches the benchmark author's documented prediction that pre-filter overhead balances the savings when bags are small. No regression.
- Hub-shape (`HAS_MEMBER`): +11% on the realistic "recent joiners" pattern and +22% (nominal) on the pure COUNT variant, the scenario the optimization is designed for. Even more importantly, error bars shrink ~10× in both hub benchmarks: the pre-filter doesn't just improve average throughput, it stabilises latency by making work independent of cache state.
Net: the optimization delivers its intended benefit on hub-shape bothE traversals with no measurable cost on small-bag traversals, plus a significant stability win that is as valuable as the throughput gain for a production query engine.
… to getVerticesOptimized, single-pass check
…dge-method and sets aliasClasses[e2]=X
PR Title:
YTDB-646 Index-assisted pre-filtering for both()/bothE() MATCH patterns
Motivation:
Extend the index intersection pre-filter to support bidirectional traversals (both() and bothE() in MATCH patterns). Currently, these patterns silently degrade to unfiltered iteration because VertexEntityImpl returns an Apache Commons ChainedIterable that does not implement PreFilterableLinkBagIterable.
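The degradation described above can be pictured with a small sketch of a chained iterable that keeps the pre-filter hook instead of hiding it. Everything here is hypothetical: `Filterable`, `withAllowed`, `ListBag`, and `ChainedBag` are illustrative names, and the real `PreFilterableLinkBagIterable` signature is not shown in this PR text. The sketch also materializes results eagerly for brevity, which a real implementation would presumably avoid.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the pre-filter interface (assumed shape).
interface Filterable<T> extends Iterable<T> {
  // Returns a view restricted to the allowed ids, e.g. an index RID set.
  Filterable<T> withAllowed(Set<T> allowed);
}

// One direction's edge bag, backed by a list for illustration.
final class ListBag<T> implements Filterable<T> {
  private final List<T> items;

  ListBag(List<T> items) {
    this.items = items;
  }

  @Override
  public Filterable<T> withAllowed(Set<T> allowed) {
    List<T> kept = new ArrayList<>();
    for (T t : items) {
      if (allowed.contains(t)) {
        kept.add(t);
      }
    }
    return new ListBag<>(kept);
  }

  @Override
  public Iterator<T> iterator() {
    return items.iterator();
  }
}

// Chains the out and in bags while still exposing the filter hook,
// so a caller checking for Filterable no longer falls back to a full scan.
final class ChainedBag<T> implements Filterable<T> {
  private final List<Filterable<T>> parts;

  ChainedBag(List<Filterable<T>> parts) {
    this.parts = parts;
  }

  @Override
  public Filterable<T> withAllowed(Set<T> allowed) {
    List<Filterable<T>> filtered = new ArrayList<>();
    for (Filterable<T> p : parts) {
      filtered.add(p.withAllowed(allowed)); // forward the filter to each direction
    }
    return new ChainedBag<>(filtered);
  }

  @Override
  public Iterator<T> iterator() {
    List<T> all = new ArrayList<>(); // eager concatenation, for brevity only
    for (Filterable<T> p : parts) {
      for (T t : p) {
        all.add(t);
      }
    }
    return all.iterator();
  }
}
```

A plain chained iterable would fail the `instanceof Filterable` style check and be iterated in full; the wrapper above passes the check and delegates the filter to both directions.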