
YTDB-646 Index-assisted pre-filtering for both() / bothE() MATCH patterns #982

Open
Sandra Adamiec (sandrawar) wants to merge 8 commits into develop from both-pre-filter-support



Sandra Adamiec (sandrawar), Collaborator, commented Apr 17, 2026

PR Title:

YTDB-646 Index-assisted pre-filtering for both() / bothE() MATCH patterns

Motivation:

Extend the index intersection pre-filter to support bidirectional traversals (both() and bothE() in MATCH patterns). Currently, these patterns silently degrade to unfiltered iteration because VertexEntityImpl returns an Apache Commons ChainedIterable that does not implement PreFilterableLinkBagIterable.
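To make the mechanism concrete, here is a minimal sketch under simplifying assumptions. `PreFilterableSketch`, `IdBagSketch` and `ChainedPreFilterableSketch` are hypothetical names, `long` ids stand in for RIDs, and `withRidFilter` is an assumed method rather than the actual PreFilterableLinkBagIterable API. What it illustrates is the core idea: a chained iterable only keeps the pre-filter alive if it implements the pre-filter interface itself and forwards the RID set to each chained part (for both(), the OUT side and the IN side).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for PreFilterableLinkBagIterable: an iterable that can
// hand back a narrowed view containing only elements whose id is in a RID set.
interface PreFilterableSketch<T> extends Iterable<T> {
  PreFilterableSketch<T> withRidFilter(Set<Long> acceptedRids);
}

// Hypothetical leaf "link bag": long ids stand in for RIDs.
final class IdBagSketch implements PreFilterableSketch<Long> {
  private final List<Long> ids;
  private final Set<Long> acceptedRids; // null means "no pre-filter applied"

  IdBagSketch(List<Long> ids) {
    this(ids, null);
  }

  private IdBagSketch(List<Long> ids, Set<Long> acceptedRids) {
    this.ids = ids;
    this.acceptedRids = acceptedRids;
  }

  @Override
  public PreFilterableSketch<Long> withRidFilter(Set<Long> accepted) {
    return new IdBagSketch(ids, accepted);
  }

  @Override
  public Iterator<Long> iterator() {
    if (acceptedRids == null) {
      return ids.iterator();
    }
    // Skip ids that are not in the index-derived RID set.
    List<Long> kept = new ArrayList<>();
    for (Long id : ids) {
      if (acceptedRids.contains(id)) {
        kept.add(id);
      }
    }
    return kept.iterator();
  }
}

// Chains several pre-filterable parts (e.g. the OUT and IN bags of both()) and,
// unlike a plain chained iterable, forwards the RID filter to every part.
final class ChainedPreFilterableSketch<T> implements PreFilterableSketch<T> {
  private final List<PreFilterableSketch<T>> parts;

  ChainedPreFilterableSketch(List<PreFilterableSketch<T>> parts) {
    this.parts = List.copyOf(parts);
  }

  @Override
  public PreFilterableSketch<T> withRidFilter(Set<Long> accepted) {
    List<PreFilterableSketch<T>> filtered = new ArrayList<>(parts.size());
    for (PreFilterableSketch<T> part : parts) {
      filtered.add(part.withRidFilter(accepted));
    }
    return new ChainedPreFilterableSketch<>(filtered);
  }

  @Override
  public Iterator<T> iterator() {
    // Walks the parts in order, moving on when one is exhausted.
    return new Iterator<T>() {
      private int index = 0;
      private Iterator<T> current = parts.isEmpty() ? null : parts.get(0).iterator();

      @Override
      public boolean hasNext() {
        while (current != null) {
          if (current.hasNext()) {
            return true;
          }
          index++;
          current = index < parts.size() ? parts.get(index).iterator() : null;
        }
        return false;
      }

      @Override
      public T next() {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        return current.next();
      }
    };
  }
}
```

Chaining `new IdBagSketch(List.of(1L, 2L, 3L))` with `new IdBagSketch(List.of(4L, 5L))` and applying `withRidFilter(Set.of(2L, 5L))` yields only 2 and 5, one from each side, and the result is still a `ChainedPreFilterableSketch`, so a later consumer can still see that it supports pre-filtering.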

gemini-code-assist (bot, Contributor) left a comment


Code Review

This pull request introduces the PreFilterableChainedIterable class to enable index-assisted pre-filtering for bidirectional traversals (both() and bothE()) within the MATCH engine. Previously, these traversals bypassed pre-filtering because they were wrapped in a standard chained iterable that did not support the pre-filter interface. The changes include updates to VertexEntityImpl to utilize the new class and enhancements to MatchExecutionPlanner to support class inference for bidirectional steps. Review feedback suggests extending this optimization to multi-label vertex traversals and improving the performance of the pre-filterable check in getEdgesInternal by replacing stream operations with a single loop.
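The last review suggestion, replacing stream operations with a single loop, can be sketched as follows. This assumes the check decides whether every chained part supports pre-filtering. `PreFilterableMarker`, `MarkedBag` and `PreFilterCheckSketch` are hypothetical names, since the actual getEdgesInternal code is not shown on this page. Both forms short-circuit on the first non-matching part. The loop simply avoids setting up a stream pipeline on what may be a hot path.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker standing in for PreFilterableLinkBagIterable.
interface PreFilterableMarker {}

// Hypothetical bag type that advertises pre-filter support.
final class MarkedBag extends ArrayList<Long> implements PreFilterableMarker {}

final class PreFilterCheckSketch {
  // Stream form: sets up a stream pipeline on every call.
  static boolean allPreFilterableStream(List<? extends Iterable<?>> parts) {
    return parts.stream().allMatch(part -> part instanceof PreFilterableMarker);
  }

  // Single-loop form: same result (including true for an empty list), no
  // stream pipeline; like allMatch, it stops at the first non-matching part.
  static boolean allPreFilterableLoop(List<? extends Iterable<?>> parts) {
    for (Iterable<?> part : parts) {
      if (!(part instanceof PreFilterableMarker)) {
        return false;
      }
    }
    return true;
  }
}
```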

github-actions (bot) commented Apr 17, 2026

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 100.0% (65/65 lines)

| File | Coverage | Uncovered Lines |
|---|---|---|
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/PreFilterableChainedIterable.java | ✅ 100.0% (37/37) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexEntityImpl.java | ✅ 100.0% (17/17) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchExecutionPlanner.java | ✅ 100.0% (11/11) | - |

Branch Coverage: ✅ 100.0% (50/50 branches)

| File | Coverage | Lines with Uncovered Branches |
|---|---|---|
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/PreFilterableChainedIterable.java | ✅ 100.0% (18/18) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/record/impl/VertexEntityImpl.java | ✅ 100.0% (8/8) | - |
| core/src/main/java/com/jetbrains/youtrackdb/internal/core/sql/executor/match/MatchExecutionPlanner.java | ✅ 100.0% (24/24) | - |

github-actions (bot) commented Apr 17, 2026

Test Count Gate Results

Tolerance: 5% drop allowed per module

Overall: ✅ 19954 tests (baseline: 19936, +18)

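| Module | Baseline | Current | Change | Status |
|---|---|---|---|---|
| core | 9388 | 9403 | +15 | |
| docker-tests | 1891 | 1891 | +0 | |
| embedded | 1931 | 1931 | +0 | |
| examples | 6 | 6 | +0 | |
| gremlin-annotations | 30 | 30 | +0 | |
| jmh-ldbc | 39 | 42 | +3 | |
| server | 5504 | 5504 | +0 | |
| tests | 1147 | 1147 | +0 | |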

sandrawar (Collaborator, Author) commented:

Benchmark results: both-pre-filter-support vs develop

Ran the three single-threaded bothE benchmarks on a Hetzner CCX33 (8 dedicated AMD vCPUs) with JDK 21 and LDBC SF1. Both branches used identical canonical curated parameters, and the database was freshly loaded for each branch because the schema differs (the PR adds a KNOWS.creationDate index). JMH params: -f 3 -wi 3 -w 10s -i 10 -r 30s (per fork: 3 warmup iterations of 10 s, then 10 measurement iterations of 30 s; 30 measurement iterations in total across the 3 forks).

Summary

| Benchmark | develop (ops/s) | PR (ops/s) | Δ | Notes |
|---|---|---|---|---|
| bothEKnows_recentConnections (small-bag, Person→KNOWS ~100 edges, 95p date) | 6974.58 ± 229.5 | 7002.45 ± 110.0 | +0.40% | noise, as expected |
| bothEHasMember_recentJoiners (hub, top-100 Forums, .inV() + ORDER BY + LIMIT, 95p date) | 260.45 ± 23.2 | 288.96 ± 2.4 | +10.95% | real improvement, ~10× tighter error bars |
| bothEHasMember_joinerCount (hub, top-100 Forums, COUNT only, 99p date) | 702.8 ± 151.0 | 855.4 ± 14.2 | +21.7% | nominal improvement, ~10× tighter error bars |
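The Δ column and the error-bar claim can be rechecked directly from the reported scores. This small sketch copies the table's numbers; `BenchDeltaSketch` and its `Row` record are illustrative names, and the "± error" values are taken as reported without recomputing JMH's confidence intervals.

```java
import java.util.List;

// Recomputes the Δ column and the error-bar shrinkage from the reported
// develop/PR scores (ops/s, mean ± JMH error), copied from the summary table.
final class BenchDeltaSketch {
  record Row(String name, double devScore, double devError, double prScore, double prError) {
    // Relative throughput change of the PR vs develop, in percent.
    double deltaPercent() {
      return (prScore / devScore - 1.0) * 100.0;
    }

    // How many times narrower the PR's error bar is than develop's.
    double errorBarShrink() {
      return devError / prError;
    }
  }

  static final List<Row> ROWS = List.of(
      new Row("bothEKnows_recentConnections", 6974.58, 229.5, 7002.45, 110.0),
      new Row("bothEHasMember_recentJoiners", 260.45, 23.2, 288.96, 2.4),
      new Row("bothEHasMember_joinerCount", 702.8, 151.0, 855.4, 14.2));

  public static void main(String[] args) {
    for (Row row : ROWS) {
      System.out.printf("%s: %+.2f%%, error bar %.1fx narrower%n",
          row.name(), row.deltaPercent(), row.errorBarShrink());
    }
  }
}
```

It prints deltas of +0.40%, +10.95% and +21.71%, with error bars 2.1×, 9.7× and 10.6× narrower. So the "~10× tighter" figure holds for the two hub benchmarks, while the small-bag KNOWS error bar shrinks by about 2×.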

Per-fork breakdown for joinerCount

The headline number understates the benefit — the real story is fork-level variance.

| Branch | Fork 1 (ops/s) | Fork 2 (ops/s) | Fork 3 (ops/s) |
|---|---|---|---|
| develop | 389.1 | 874.1 | 845.2 |
| PR | 880.5 | 854.7 | 831.1 |

One of develop's three forks runs at less than half the throughput of the other two (389.1 vs. 874.1 and 845.2 ops/s). Without the pre-filter, the query's speed depends heavily on page-cache residency of the HAS_MEMBER bag; with the pre-filter, work is bounded by the index RID set and stays deterministic across forks. The PR eliminates the worst-case fork entirely; in production this should translate to consistent query latency instead of sporadic cold-cache stalls.
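The fork-level spread can be quantified from the per-fork table. This sketch assumes each per-fork number is that fork's mean score (with equal iteration counts, their plain average should then match the headline score). `ForkSpreadSketch` is an illustrative name, and the worst/best ratio is just one simple way to express variance.

```java
import java.util.Arrays;

// Per-fork throughput (ops/s) for bothEHasMember_joinerCount, copied from the table.
final class ForkSpreadSketch {
  static final double[] DEVELOP = {389.1, 874.1, 845.2};
  static final double[] PR = {880.5, 854.7, 831.1};

  // Plain average of the fork means; with equal iterations per fork this
  // equals the overall mean reported as the headline score.
  static double mean(double[] forks) {
    return Arrays.stream(forks).average().orElseThrow();
  }

  // Slowest fork divided by fastest fork: 1.0 means perfectly even forks.
  static double worstToBest(double[] forks) {
    double min = Arrays.stream(forks).min().orElseThrow();
    double max = Arrays.stream(forks).max().orElseThrow();
    return min / max;
  }

  public static void main(String[] args) {
    System.out.printf("develop: mean %.1f, worst/best %.2f%n", mean(DEVELOP), worstToBest(DEVELOP));
    System.out.printf("PR:      mean %.1f, worst/best %.2f%n", mean(PR), worstToBest(PR));
  }
}
```

It prints a develop mean of 702.8 with a worst/best ratio of 0.45, versus 855.4 and 0.94 for the PR. This reproduces the headline scores and shows the slow develop fork running at under half the speed of the fastest one.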

Interpretation

  • Small-bag (KNOWS, ~100 edges/person): matches the benchmark author's documented prediction — pre-filter overhead balances the savings when bags are small. No regression.
  • Hub-shape (HAS_MEMBER): +11% on the realistic "recent joiners" pattern and +22% (nominal) on the pure COUNT variant — the scenario the optimization is designed for. Even more importantly, error bars shrink ~10× in both hub benchmarks: the pre-filter doesn't just improve average throughput, it stabilises latency by making work independent of cache state.

Net: the optimization delivers its intended benefit on hub-shape bothE traversals with no measurable cost on small-bag traversals, plus a significant stability win that is as valuable as the throughput gain for a production query engine.
