YTDB-655: Zero-deserialization fix for index point lookups by arodionov · Pull Request #962 · JetBrains/youtrackdb

Andrii Rodionov (arodionov) · 2026-04-13T09:45:29Z

Motivation:

After introducing snapshot-isolation for indexes (YTDB-523), the engine.get() hot path
deserialized every B-tree entry during binary search to compare keys — allocating temporary
objects on each comparison. Without the compareInByteBuffer optimization, JMH benchmarks
showed a severe regression:

Benchmark	develop	SI branch (no fix)	Delta
`engineGet` (ops/s)	74,581 ± 5,206	42,183 ± 1,495	-43.4%
`engineGetBatch1k` (ops/ms)	1.048 ± 0.172	0.422 ± 0.062	-59.7%

This PR adds compareInByteBuffer() to IndexMultiValuKeySerializer, enabling field-by-field
comparison directly in the page byte buffer without deserializing entries. Combined with the
bucket.find(byte[]) path in getVisible(), this eliminates per-comparison allocations
during B-tree descent.

After fix (this branch vs pre-SI baseline 89499755):

Benchmark	pre-SI baseline	This branch	Delta
`engineGet` (ops/s)	73,935 ± 1,051	70,243 ± 1,083	-5.0%
`engineGetBatch1k` (ops/ms)	0.982 ± 0.127	0.705 ± 0.017	-28.2%

The single-lookup regression is reduced from -43% to -5%. The batch regression (which
amplifies per-lookup overhead) is reduced from -60% to -28%, with the remaining cost
attributable to the version field comparison and snapshot visibility checks inherent to SI.

Changes

IndexMultiValuKeySerializer.compareInByteBuffer() — zero-allocation in-buffer
comparison for composite keys during B-tree binary search
BTree.getVisible() switched to find(byte[]) — uses serialized key for binary
search instead of deserializing + preprocessing
IndexGetBenchmark — JMH benchmark for direct engine.get() throughput on a
1M-entry UNIQUE index (single-key and batch-1k variants)

Test plan

Existing unit tests pass (./mvnw -pl core clean test)
IndexMultiValuKeySerializerCompareTest covers type-specific comparison, LINK edge cases, and type mismatch assertions
JMH benchmark confirms regression recovery vs pre-SI baseline

Zero-deserialization field-by-field comparison for composite index keys. The non-WAL path delegates to per-type serializer compareInByteBuffer() for LONG/INTEGER/SHORT/STRING/BINARY and inlines FLOAT/DOUBLE (stored as int/long bits), BOOLEAN/BYTE, LINK (compacted format), and DECIMAL (BigDecimal fallback). The WAL-aware path inlines primitive reads via walChanges methods and falls back to deserialization for STRING/LINK/ BINARY/DECIMAL. 34 unit tests cover all 12 field types, null handling, prefix comparison, and WAL overlay variants.

@SuppressWarnings

… tests Add assert pageTypeId == searchTypeId in both compareInByteBuffer and WAL variant to catch serialization inconsistencies during development. Remove unnecessary @SuppressWarnings("unchecked") on compareDecimalFallback. Add 3 test methods: LINK position 0 boundary, asymmetric numberSize encoding widths, and non-zero buffer/key offsets.

These review and step files are working files that persist on disk between sessions but should not be committed to git.

…earch Reverse Track 4a: add serializeNativeAsWhole() after buildSearchKey() to serialize the search key into a byte array. Pass both serializedKey (for bucket.find(byte[], ...) and findBucketSerialized()) and searchKey (for scanLeafForVisible()'s userKeyPrefixMatches()) to getVisibleOptimistic and getVisiblePinned. This uses IndexMultiValuKeySerializer's new compareInByteBuffer() for zero-deserialization field-by-field comparison during the B-tree binary search, eliminating CompositeKey allocation on every binary search step.

…tests Add defensive assertions to compareInByteBuffer: non-negative keysCount in both non-WAL and WAL paths, and valid numberSize range (0-8) in readCompactedPosition/readCompactedPositionNative. Strengthen all test methods to assert comparison direction (< 0, > 0, == 0) at call sites instead of relying solely on the helper's internal sign-match check. Add three new boundary test cases: empty BINARY arrays (zero-length field data in offset advancement), multi-byte UTF-8 strings (2-byte and 3-byte sequences), and Integer.MIN_VALUE/MAX_VALUE boundaries.

Post-implementation artifacts updated to reflect the compareInByteBuffer optimization (Track 5): - design-final.md: added IndexMultiValuKeySerializer to class diagram, updated workflow to show find(byte[]) path, added zero-deserialization binary search section - adr.md: added D6 decision record, updated D3 evolution through Tracks 4a→5, added key discoveries #6 and #7, updated performance results

Track 5 step file, technical review, risk review, and updated implementation plan with Track 5 episode and strategy refresh.

Adds IndexGetBenchmark to measure raw engine.get() throughput on a UNIQUE index with 1M entries, isolating the index lookup cost from SQL parsing and transaction overhead. Includes single-key and batch-1k variants for measuring per-op and amortized throughput.

gemini-code-assist

Code Review

This pull request implements zero-deserialization field-by-field comparison for CompositeKey in IndexMultiValuKeySerializer, which optimizes index lookups by eliminating object allocations during binary search. The BTree implementation is updated to use this serialized-key search path. Feedback highlights a critical bug where the search key is not preprocessed before serialization, which is required for correct normalization of types like DATE. Additionally, the documentation for DECIMAL comparison should be updated to accurately describe its fallback deserialization behavior.

The preprocess step normalizes DATE keys (truncates time component) and LINK keys (unwraps Identifiable to RID) so the search key matches the stored representation. Without it, lookups on DATE-typed indexes with un-truncated timestamps would silently miss entries. For common types (STRING, INTEGER, LONG) preprocess is a no-op.

github-actions · 2026-04-13T10:58:02Z

Test Count Gate Results

✅ No baseline available yet — gate skipped (first run).

github-actions · 2026-04-13T10:58:05Z

Coverage Gate Results

Thresholds: 85% line, 70% branch

Line Coverage: ✅ 95.7% (155/162 lines)

File	Coverage	Uncovered Lines
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/serialization/serializer/binary/impl/index/IndexMultiValuKeySerializer.java`	✅ 95.6% (152/159)	346-347, 365, 620, 719, 821, 868
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/storage/index/sbtree/singlevalue/v3/BTree.java`	✅ 100.0% (3/3)	-

Branch Coverage: ✅ 94.0% (63/67 branches)

File	Coverage	Lines with Uncovered Branches
`core/src/main/java/com/jetbrains/youtrackdb/internal/core/serialization/serializer/binary/impl/index/IndexMultiValuKeySerializer.java`	✅ 94.0% (63/67)	590, 699, 798, 849

Andrii Rodionov added 8 commits April 13, 2026 11:46

YTDB-655: Remove accidentally tracked workflow working files

5d76cbd

These review and step files are working files that persist on disk between sessions but should not be committed to git.

YTDB-655: Add Track 5 workflow working files

3f0c60c

Track 5 step file, technical review, risk review, and updated implementation plan with Track 5 episode and strategy refresh.

Andrii Rodionov (arodionov) force-pushed the YTDB-655-zero-deserialization-fix branch from b88486f to a9565f7 Compare April 13, 2026 09:46

Andrii Rodionov (arodionov) self-assigned this Apr 13, 2026

gemini-code-assist Bot reviewed Apr 13, 2026

View reviewed changes

Comment thread ...n/java/com/jetbrains/youtrackdb/internal/core/storage/index/sbtree/singlevalue/v3/BTree.java Outdated

Comment thread ...db/internal/core/serialization/serializer/binary/impl/index/IndexMultiValuKeySerializer.java

Andrii Rodionov (arodionov) requested a review from Andrii Lomakin (andrii0lomakin) April 13, 2026 10:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

YTDB-655: Zero-deserialization fix for index point lookups#962

YTDB-655: Zero-deserialization fix for index point lookups#962
Andrii Rodionov (arodionov) wants to merge 9 commits into
developfrom
YTDB-655-zero-deserialization-fix

Andrii Rodionov (arodionov) commented Apr 13, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 13, 2026

Uh oh!

github-actions Bot commented Apr 13, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Andrii Rodionov (arodionov) commented Apr 13, 2026

Motivation:

Changes

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

github-actions Bot commented Apr 13, 2026

Test Count Gate Results

Uh oh!

github-actions Bot commented Apr 13, 2026

Coverage Gate Results

Line Coverage: ✅ 95.7% (155/162 lines)

Branch Coverage: ✅ 94.0% (63/67 branches)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant