Skip to content

YTDB-655: Zero-deserialization fix for index point lookups#962

Open
Andrii Rodionov (arodionov) wants to merge 9 commits into
developfrom
YTDB-655-zero-deserialization-fix
Open

YTDB-655: Zero-deserialization fix for index point lookups#962
Andrii Rodionov (arodionov) wants to merge 9 commits into
developfrom
YTDB-655-zero-deserialization-fix

Conversation

@arodionov

Copy link
Copy Markdown
Contributor

Motivation:

After introducing snapshot-isolation for indexes (YTDB-523), the engine.get() hot path
deserialized every B-tree entry during binary search to compare keys — allocating temporary
objects on each comparison. Without the compareInByteBuffer optimization, JMH benchmarks
showed a severe regression:

Benchmark develop SI branch (no fix) Delta
engineGet (ops/s) 74,581 ± 5,206 42,183 ± 1,495 -43.4%
engineGetBatch1k (ops/ms) 1.048 ± 0.172 0.422 ± 0.062 -59.7%

This PR adds compareInByteBuffer() to IndexMultiValuKeySerializer, enabling field-by-field
comparison directly in the page byte buffer without deserializing entries. Combined with the
bucket.find(byte[]) path in getVisible(), this eliminates per-comparison allocations
during B-tree descent.

After fix (this branch vs pre-SI baseline 89499755):

Benchmark pre-SI baseline This branch Delta
engineGet (ops/s) 73,935 ± 1,051 70,243 ± 1,083 -5.0%
engineGetBatch1k (ops/ms) 0.982 ± 0.127 0.705 ± 0.017 -28.2%

The single-lookup regression is reduced from -43% to -5%. The batch regression (which
amplifies per-lookup overhead) is reduced from -60% to -28%, with the remaining cost
attributable to the version field comparison and snapshot visibility checks inherent to SI.

Changes

  • IndexMultiValuKeySerializer.compareInByteBuffer() — zero-allocation in-buffer
    comparison for composite keys during B-tree binary search
  • BTree.getVisible() switched to find(byte[]) — uses serialized key for binary
    search instead of deserializing + preprocessing
  • IndexGetBenchmark — JMH benchmark for direct engine.get() throughput on a
    1M-entry UNIQUE index (single-key and batch-1k variants)

Test plan

  • Existing unit tests pass (./mvnw -pl core clean test)
  • IndexMultiValuKeySerializerCompareTest covers type-specific comparison, LINK edge cases, and type mismatch assertions
  • JMH benchmark confirms regression recovery vs pre-SI baseline

Andrii Rodionov added 8 commits April 13, 2026 11:46
Zero-deserialization field-by-field comparison for composite index keys.
The non-WAL path delegates to per-type serializer compareInByteBuffer()
for LONG/INTEGER/SHORT/STRING/BINARY and inlines FLOAT/DOUBLE (stored as
int/long bits), BOOLEAN/BYTE, LINK (compacted format), and DECIMAL
(BigDecimal fallback). The WAL-aware path inlines primitive reads via
walChanges methods and falls back to deserialization for STRING/LINK/
BINARY/DECIMAL. 34 unit tests cover all 12 field types, null handling,
prefix comparison, and WAL overlay variants.
… tests

Add assert pageTypeId == searchTypeId in both compareInByteBuffer and
WAL variant to catch serialization inconsistencies during development.
Remove unnecessary @SuppressWarnings("unchecked") on compareDecimalFallback.
Add 3 test methods: LINK position 0 boundary, asymmetric numberSize
encoding widths, and non-zero buffer/key offsets.
These review and step files are working files that persist on disk
between sessions but should not be committed to git.
…earch

Reverse Track 4a: add serializeNativeAsWhole() after buildSearchKey() to
serialize the search key into a byte array. Pass both serializedKey
(for bucket.find(byte[], ...) and findBucketSerialized()) and searchKey
(for scanLeafForVisible()'s userKeyPrefixMatches()) to getVisibleOptimistic
and getVisiblePinned. This uses IndexMultiValuKeySerializer's new
compareInByteBuffer() for zero-deserialization field-by-field comparison
during the B-tree binary search, eliminating CompositeKey allocation on
every binary search step.
…tests

Add defensive assertions to compareInByteBuffer: non-negative keysCount
in both non-WAL and WAL paths, and valid numberSize range (0-8) in
readCompactedPosition/readCompactedPositionNative.

Strengthen all test methods to assert comparison direction (< 0, > 0,
== 0) at call sites instead of relying solely on the helper's internal
sign-match check.

Add three new boundary test cases: empty BINARY arrays (zero-length
field data in offset advancement), multi-byte UTF-8 strings (2-byte
and 3-byte sequences), and Integer.MIN_VALUE/MAX_VALUE boundaries.
Post-implementation artifacts updated to reflect the
compareInByteBuffer optimization (Track 5):
- design-final.md: added IndexMultiValuKeySerializer to class diagram,
  updated workflow to show find(byte[]) path, added zero-deserialization
  binary search section
- adr.md: added D6 decision record, updated D3 evolution through
  Tracks 4a→5, added key discoveries #6 and #7, updated performance
  results
Track 5 step file, technical review, risk review, and updated
implementation plan with Track 5 episode and strategy refresh.
Adds IndexGetBenchmark to measure raw engine.get() throughput on a
UNIQUE index with 1M entries, isolating the index lookup cost from
SQL parsing and transaction overhead. Includes single-key and
batch-1k variants for measuring per-op and amortized throughput.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements zero-deserialization field-by-field comparison for CompositeKey in IndexMultiValuKeySerializer, which optimizes index lookups by eliminating object allocations during binary search. The BTree implementation is updated to use this serialized-key search path. Feedback highlights a critical bug where the search key is not preprocessed before serialization, which is required for correct normalization of types like DATE. Additionally, the documentation for DECIMAL comparison should be updated to accurately describe its fallback deserialization behavior.

The preprocess step normalizes DATE keys (truncates time component)
and LINK keys (unwraps Identifiable to RID) so the search key matches
the stored representation. Without it, lookups on DATE-typed indexes
with un-truncated timestamps would silently miss entries. For common
types (STRING, INTEGER, LONG) preprocess is a no-op.
@github-actions

Copy link
Copy Markdown

Test Count Gate Results

✅ No baseline available yet — gate skipped (first run).

@github-actions

Copy link
Copy Markdown

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 95.7% (155/162 lines)

File Coverage Uncovered Lines
core/src/main/java/com/jetbrains/youtrackdb/internal/core/serialization/serializer/binary/impl/index/IndexMultiValuKeySerializer.java ✅ 95.6% (152/159) 346-347, 365, 620, 719, 821, 868
core/src/main/java/com/jetbrains/youtrackdb/internal/core/storage/index/sbtree/singlevalue/v3/BTree.java ✅ 100.0% (3/3) -

Branch Coverage: ✅ 94.0% (63/67 branches)

File Coverage Lines with Uncovered Branches
core/src/main/java/com/jetbrains/youtrackdb/internal/core/serialization/serializer/binary/impl/index/IndexMultiValuKeySerializer.java ✅ 94.0% (63/67) 590, 699, 798, 849

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant