Skip to content

CNDB-15948: Fix handling of partition-aware keys in anti-joins#2435

Open
pkolaczk wants to merge 3 commits into
mainfrom
c15948-aa-neq-query-failure
Open

CNDB-15948: Fix handling of partition-aware keys in anti-joins#2435
pkolaczk wants to merge 3 commits into
mainfrom
c15948-aa-neq-query-failure

Conversation

@pkolaczk

@pkolaczk pkolaczk commented May 22, 2026

Copy link
Copy Markdown

Why

KeyRangeAntiJoinIterator can cause false negatives if partition-aware keys
appear in its input(s). This was caught by
CollectionIndexingTest#testUpdateMapToNullValue reporting missing rows
after it was run on AA version of indexes.

Explanation

When a KeyRangeAntiJoinIterator got a non-precise, partition-aware
PrimaryKey, e.g. a key coming from an AA index or a key
for a static row, it was incorrectly matching it with any other key in
the same partition. However, for those partition-aware keys, we don't really know
which row exactly was indexed as we're missing the clustering information,
and we cannot match such keys to each other.

What was fixed

The join algorithm has been fixed to not cause false-negatives in
such cases. Partition-aware keys on both the left and right side
of an anti-join are now handled properly. The fix makes the queries
return correct results even when mixed version indexes are used.

New randomized tests have been added and CollectionIndexingTest
has been extended to run on multiple index versions.

@github-actions

github-actions Bot commented May 22, 2026

Copy link
Copy Markdown

Checklist before you submit for review

  • This PR adheres to the Definition of Done
  • Make sure there is a PR and ticket in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit staring with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits
  • All new files should contain the IBM copyright header instead of the Apache License one (no DataStax copyright any longer)

@pkolaczk pkolaczk force-pushed the c15948-aa-neq-query-failure branch 3 times, most recently from 80e1146 to 1e90fe2 Compare May 22, 2026 10:42
@sonarqubecloud

Copy link
Copy Markdown

@pkolaczk pkolaczk requested review from adelapena and k-rus and removed request for adelapena May 22, 2026 11:54
Comment thread src/java/org/apache/cassandra/index/sai/utils/PrimaryKey.java Outdated

@k-rus k-rus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think loading deferred is unnecessary.
I am confused with the comment describing the solution. I haven't checked the tests yet. Thus, it's only partial review.

Comment thread src/java/org/apache/cassandra/index/sai/disk/v2/RowAwarePrimaryKeyFactory.java Outdated
Comment thread src/java/org/apache/cassandra/index/sai/utils/PrimaryKey.java Outdated
Comment on lines +31 to +35
* In order to avoid false-negatives, the keys on the left side that can potentially match more than one row, e.g.
* keys that match whole partitions with static clustering or keys coming from partition-aware indexes (version AA),
* are always included in the result. Similarly, the keys on the right side that can potentially match more than one row
* are ignored, as they could match any key, but matching any key is not the same as matching all the keys
* in the partition. Hence, we cannot skip the whole partition from the result set in that case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds to me that it will now return false positives. Do I miss anything in the comment (it's not easy to read)?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

False positives are allowed.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I try to simplify it one more time. It's not easy to keep it terse and not lose the meaning ;)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, post filtering is always applied.
I haven't checked yet if false positives are called out explicitly as an expectation from an iterator.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can add javadoc to the iterators if it's not there

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this discussion addressed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see that the reason for false negatives come that iterators, which are input to anti-join, might have false positives, which happen for partition aware index version or for static clustering. The false positive in such cases contain all rows from a partition. The negation in anti-join leads that all rows from the partition are excluded instead of the single row.

This formulation gives actually doubts to me that static clustering needs the fix. Am I correct that both inputs to anti-join come from the same index? in such case all rows of a partition will match the static value and, thus, should be excluded. Do I miss anything?

@k-rus k-rus self-requested a review May 27, 2026 15:07

@k-rus k-rus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed KeyRangeAntiJoinIteratorTest yet. Looking to the affected tests do I understand correctly that there is no SQL test, which exercises anti-join iterator with static clustering? Can such test be added?

Am I correct that with this PR partition aware index is used with anti-join. Can this be mentioned in the PR description?
Can PR description also link explicitly to the issue being fixed?

Can you mention CNDB's PR in the PR description, so it's easy to navigate to it?

Can you comment about the coverage result in a PR comment?

ByteBuffer clusteringValue = LongType.instance.getSerializer().serialize(clustering);
DecoratedKey pk = Murmur3Partitioner.instance.decorateKey(pkValue);
Clustering<ByteBuffer> c = clustering == null ? Clustering.EMPTY : Clustering.make(clusteringValue);
Clustering<ByteBuffer> c = clustering == null ? Clustering.STATIC_CLUSTERING : Clustering.make(clusteringValue);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This violates the documentation of the method, which explicitly mentions returning Clustering.EMPTY.
I haven't investigated how usage of empty clustering differs from static clustering and what will be appropriate here or both.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static clustering is technically an empty clustering; but it's a special kind of empty.

Also I'm not aware of a situation where you would get a mix of EMPTY clusterings and non EMPTY on the same table. EMPTY is used when the table has no clustering columns. STATIC is used for indexing static rows; which actually implies the table must have clustering columns. Therefore having the test code generate STATIC/non-empty keys makes more sense than EMPTY/non-empty.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the union iterator test was using EMPTY incorrectly and should be using STATIC_CLUSTERING instead?
To me fixing to STATIC_CLUSTERING mean that there should be tests for skinny primary keys. I suggest either fix it now or create an issue to add necessary tests.

Documentation of the method must be fixed to correspond its implementation.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was using EMPTY incorrectly. Generally thet mixing of empty / non-empty clusterings was always meant to test the scenario where an index on a static column is intersected with an index on non-static (regular) column. This is basically what all those randomized iterator tests are doing and I added them because we had bugs in the logic when we mixed static/non-static. EMPTY and STATIC are treated the same way in most places but there is a subtle semantic difference between them. As I said, you won't get a mix of RowAwarePrimaryKeys with EMPTY and non empty in the same iterator chain, but you can get a mix of STATIC and non-empty.

Comment thread test/unit/org/apache/cassandra/index/sai/iterators/LongIterator.java Outdated
@k-rus

k-rus commented May 28, 2026

Copy link
Copy Markdown
Member

Can PR description also link explicitly to the issue being fixed?

The link to the issue can be found on the right side, thus not necessary

@pkolaczk

pkolaczk commented May 28, 2026

Copy link
Copy Markdown
Author

Looking to the affected tests do I understand correctly that there is no SQL test, which exercises anti-join iterator with static clustering? Can such test be added?

I think it's a good idea to add more tests in QueryWriteLifecycle. I'll look into that.
BTW: This PR is a result of a test failure, but it tests AA indexes, not static columns. So this is a good catch.

@pkolaczk

Copy link
Copy Markdown
Author

The link to the issue can be found on the right side, thus not necessary

Why are you responding to your own comments? ;)
Anyway, yes, the links should be visible as long as you're logged into cndb / DataStax repos.

@k-rus

k-rus commented May 28, 2026

Copy link
Copy Markdown
Member

The link to the issue can be found on the right side, thus not necessary

Why are you responding to your own comments? ;)

I am making visible to everyone following the PR that I changed my request

@k-rus k-rus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More comments from me. They are mainly about fixing minor things in understability. I also confused with generated data in new tests.

In addition to my previous general comments, like missing CNDB PR, I see a need to improve the PR description, which will be basis commit message:

I have hard time to understand the first paragraph. According to the template a PR description starts with describing the issue. I think turning the first paragraph in issue description can make easier to understand.

Second paragraph starts with:

The join algorithm has been fixed

Will it be better to say the anti-join algorithm. (I understand that the original version is also correct, but I think it's good to be specific)

Were partition-aware keys not allowed before the fix?

I don't see any fixes to testUpdateMapToNullValue. What am I missing?

Comment thread src/java/org/apache/cassandra/index/sai/utils/PrimaryKey.java Outdated
ByteBuffer clusteringValue = LongType.instance.getSerializer().serialize(clustering);
DecoratedKey pk = Murmur3Partitioner.instance.decorateKey(pkValue);
Clustering<ByteBuffer> c = clustering == null ? Clustering.EMPTY : Clustering.make(clusteringValue);
Clustering<ByteBuffer> c = clustering == null ? Clustering.STATIC_CLUSTERING : Clustering.make(clusteringValue);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that the union iterator test was using EMPTY incorrectly and should be using STATIC_CLUSTERING instead?
To me fixing to STATIC_CLUSTERING mean that there should be tests for skinny primary keys. I suggest either fix it now or create an issue to add necessary tests.

Documentation of the method must be fixed to correspond its implementation.

@pkolaczk

pkolaczk commented May 29, 2026

Copy link
Copy Markdown
Author

Were partition-aware keys not allowed before the fix?

They were allowed and handled incorrectly by the anti join algorithm.
They are still allowed and now they are handled correctly - this is the clou of the fix - avoid false negatives caused by partition-aware (or static) keys.

I don't see any fixes to testUpdateMapToNullValue. What am I missing?

This test was failing on AA version due to a bug in AntiJoinIterator. Now the test is passing.
I also changed it to versioned as Michael suggested.

Will it be better to say the anti-join algorithm. (I understand that the original version is also correct, but I think it's good to be specific)

Good catch!

@k-rus

k-rus commented May 30, 2026

Copy link
Copy Markdown
Member

e.g. a key coming from an AA index or a key
for a static row,

Is there such thing as static row? I understand the meaning, but it requires quite a lot of knowledge, since there is no such thing as static row. May be use static clustering, e.g., a key for static clustering value.
It can be just my issue as an outsider.

@pkolaczk pkolaczk requested a review from k-rus June 1, 2026 10:31
@pkolaczk pkolaczk force-pushed the c15948-aa-neq-query-failure branch from 5a03791 to c17f0b4 Compare June 1, 2026 10:42
@plpesvc-ds

Copy link
Copy Markdown

❌ Build ds-cassandra-pr-gate/PR-2435 rejected by Butler


3 regressions found
See build details here


Found 3 new test failures

Test Explanation Runs Upstream
o.a.c.distributed.test.RepairErrorsTest.testRemoteStreamFailure REGRESSION 🔵🔴 0 / 30
o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompaction[version=ec enableNVQ=true] REGRESSION 🔴 0 / 30
o.a.c.index.sai.cql.VectorSiftSmallTest.testRerankKZeroOrderMatchesFullPrecisionSimilarity[version=ed enableNVQ=true enableFused=false] REGRESSION 🔴 0 / 30

Found 8 known test failures

@k-rus k-rus left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's incorrect to apply the change for static clustering. Here is my reasoning.
Am I correct that both inputs to anti-join come from the same index? in such case all rows of a partition will match the static value and, thus, should be excluded.

Also this fix means for partition aware index (AA) means that all row from the left will be included, right? Thus, should anti-join iterator be sued for AA indexes at all? If so, this can be done as follow up to this PR.

Is there a CQL test for anti-join with static clustering? If not, I think it must have.

Some comments on PR description:

Can you complete the last two items in the checklist? Do you think CNDB's PR should contain the release notes?

The PR description has:

Partition-aware keys on both the left and right side
of an anti-join are now handled properly.

Can you expand handled properly?

In the PR description:

CollectionIndexingTest
has been extended to run on multiple index versions.

Can you change from multiple to all, since it actually runs against all index version?

Comment thread test/unit/org/apache/cassandra/index/sai/iterators/LongIterator.java Outdated
Comment on lines +31 to +35
* In order to avoid false-negatives, the keys on the left side that can potentially match more than one row, e.g.
* keys that match whole partitions with static clustering or keys coming from partition-aware indexes (version AA),
* are always included in the result. Similarly, the keys on the right side that can potentially match more than one row
* are ignored, as they could match any key, but matching any key is not the same as matching all the keys
* in the partition. Hence, we cannot skip the whole partition from the result set in that case.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to see that the reason for false negatives come that iterators, which are input to anti-join, might have false positives, which happen for partition aware index version or for static clustering. The false positive in such cases contain all rows from a partition. The negation in anti-join leads that all rows from the partition are excluded instead of the single row.

This formulation gives actually doubts to me that static clustering needs the fix. Am I correct that both inputs to anti-join come from the same index? in such case all rows of a partition will match the static value and, thus, should be excluded. Do I miss anything?

Comment on lines +109 to +110
var avgPartitions = 1 + testIteration / 10;
var avgRowsPerPartition = 1 + testIteration / 10;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me the type is not obvious here. Should it be explicit? It also inconsistent with existing code. C* community is also quite conservative on this question.
Note that BobIDE is not able to annotate types.

In Rust I would be on the opposite side and always support implicit typing.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced to int.

Note that BobIDE is not able to annotate types.

As a side note – BobIDE cannot also do many other things including simple find usages / go to definition. In terms of being an IDE for humans it's a lot behind. I use it only for AI assisted generation.

@pkolaczk pkolaczk marked this pull request as draft June 8, 2026 07:43
Piotr Kołaczkowski added 2 commits June 8, 2026 19:19
When a KeyRangeAntiJoinIterator gets a non-precise, partition-aware
PrimaryKey, e.g. a key coming from an AA index or a key
for a static row, it must not assume such key matches all the rows
inside the partition, because it is enough for a single row in
a partition to be present for the partition to be indexed. So such
a key should never be assumed to match any particular row,
or otherwise we might miss some rows from the result.

The join algorithm has been fixed to not allow false-negatives in
such cases. Partition-aware keys on both the left and right side
of an anti-join are allowed. The fix makes the queries
return correct results even when mixed version indexes are used.

New randomized tests have been added and CollectionIndexingTest
has been extended to run on multiple index versions.

Fixes CollectionIndexingTest#testUpdateMapToNullValue.
@pkolaczk pkolaczk force-pushed the c15948-aa-neq-query-failure branch from c17f0b4 to d56f6e9 Compare June 8, 2026 19:51
@pkolaczk pkolaczk marked this pull request as ready for review June 8, 2026 19:51
@pkolaczk pkolaczk requested a review from k-rus June 8, 2026 19:52
@pkolaczk pkolaczk force-pushed the c15948-aa-neq-query-failure branch 2 times, most recently from 4fd7336 to 7de0e6b Compare June 8, 2026 20:00
Removed identifiesUniqueRow from `PrimaryKey`.
Instead of trying to make `KeyRangeAntiJoinIterator`
work on AA version keys (by ignoring them),
make sure it's never used in those cases.
@pkolaczk pkolaczk force-pushed the c15948-aa-neq-query-failure branch from 7de0e6b to ec28f30 Compare June 8, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants