CNDB-15948: Fix handling of partition-aware keys in anti-joins by pkolaczk · Pull Request #2435 · datastax/cassandra

pkolaczk · 2026-05-22T10:08:56Z

Why

KeyRangeAntiJoinIterator can cause false negatives if partition-aware keys
appear in its input(s). This was caught by
CollectionIndexingTest#testUpdateMapToNullValue reporting missing rows
after it was run on AA version of indexes.

Explanation

When a KeyRangeAntiJoinIterator got a non-precise, partition-aware
PrimaryKey, e.g. a key coming from an AA index or a key
for a static row, it was incorrectly matching it with any other key in
the same partition. However, for those partition-aware keys, we don't really know
which row exactly was indexed as we're missing the clustering information,
and we cannot match such keys to each other.

What was fixed

The join algorithm has been fixed to not cause false-negatives in
such cases. Partition-aware keys on both the left and right side
of an anti-join are now handled properly. The fix makes the queries
return correct results even when mixed version indexes are used.

New randomized tests have been added and CollectionIndexingTest
has been extended to run on multiple index versions.

github-actions · 2026-05-22T10:09:12Z

sonarqubecloud · 2026-05-22T11:45:58Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
94.4% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

k-rus

I think loading deferred is unnecessary.
I am confused with the comment describing the solution. I haven't checked the tests yet. Thus, it's only partial review.

k-rus · 2026-05-26T08:59:51Z

+ * In order to avoid false-negatives, the keys on the left side that can potentially match more than one row, e.g.
+ * keys that match whole partitions with static clustering or keys coming from partition-aware indexes (version AA),
+ * are always included in the result. Similarly, the keys on the right side that can potentially match more than one row
+ * are ignored, as they could match any key, but matching any key is not the same as matching all the keys
+ * in the partition. Hence, we cannot skip the whole partition from the result set in that case.


It sounds to me that it will now return false positives. Do I miss anything in the comment (it's not easy to read)?

False positives are allowed.

Ok, I try to simplify it one more time. It's not easy to keep it terse and not lose the meaning ;)

Right, post filtering is always applied.
I haven't checked yet if false positives are called out explicitly as an expectation from an iterator.

Ok, I can add javadoc to the iterators if it's not there

Was this discussion addressed?

I would like to see that the reason for false negatives come that iterators, which are input to anti-join, might have false positives, which happen for partition aware index version or for static clustering. The false positive in such cases contain all rows from a partition. The negation in anti-join leads that all rows from the partition are excluded instead of the single row.

This formulation gives actually doubts to me that static clustering needs the fix. Am I correct that both inputs to anti-join come from the same index? in such case all rows of a partition will match the static value and, thus, should be excluded. Do I miss anything?

k-rus

I haven't reviewed KeyRangeAntiJoinIteratorTest yet. Looking to the affected tests do I understand correctly that there is no SQL test, which exercises anti-join iterator with static clustering? Can such test be added?

Am I correct that with this PR partition aware index is used with anti-join. Can this be mentioned in the PR description?
~~Can PR description also link explicitly to the issue being fixed?~~

Can you mention CNDB's PR in the PR description, so it's easy to navigate to it?

Can you comment about the coverage result in a PR comment?

k-rus · 2026-05-27T16:26:15Z

        ByteBuffer clusteringValue = LongType.instance.getSerializer().serialize(clustering);
        DecoratedKey pk = Murmur3Partitioner.instance.decorateKey(pkValue);
-        Clustering<ByteBuffer> c = clustering == null ? Clustering.EMPTY : Clustering.make(clusteringValue);
+        Clustering<ByteBuffer> c = clustering == null ? Clustering.STATIC_CLUSTERING : Clustering.make(clusteringValue);


This violates the documentation of the method, which explicitly mentions returning Clustering.EMPTY.
I haven't investigated how usage of empty clustering differs from static clustering and what will be appropriate here or both.

Static clustering is technically an empty clustering; but it's a special kind of empty.

Also I'm not aware of a situation where you would get a mix of EMPTY clusterings and non EMPTY on the same table. EMPTY is used when the table has no clustering columns. STATIC is used for indexing static rows; which actually implies the table must have clustering columns. Therefore having the test code generate STATIC/non-empty keys makes more sense than EMPTY/non-empty.

Do you mean that the union iterator test was using EMPTY incorrectly and should be using STATIC_CLUSTERING instead?
To me fixing to STATIC_CLUSTERING mean that there should be tests for skinny primary keys. I suggest either fix it now or create an issue to add necessary tests.

Documentation of the method must be fixed to correspond its implementation.

Yes, it was using EMPTY incorrectly. Generally thet mixing of empty / non-empty clusterings was always meant to test the scenario where an index on a static column is intersected with an index on non-static (regular) column. This is basically what all those randomized iterator tests are doing and I added them because we had bugs in the logic when we mixed static/non-static. EMPTY and STATIC are treated the same way in most places but there is a subtle semantic difference between them. As I said, you won't get a mix of RowAwarePrimaryKeys with EMPTY and non empty in the same iterator chain, but you can get a mix of STATIC and non-empty.

k-rus · 2026-05-28T08:26:05Z

Can PR description also link explicitly to the issue being fixed?

The link to the issue can be found on the right side, thus not necessary

pkolaczk · 2026-05-28T08:40:00Z

Looking to the affected tests do I understand correctly that there is no SQL test, which exercises anti-join iterator with static clustering? Can such test be added?

I think it's a good idea to add more tests in QueryWriteLifecycle. I'll look into that.
BTW: This PR is a result of a test failure, but it tests AA indexes, not static columns. So this is a good catch.

pkolaczk · 2026-05-28T08:41:31Z

The link to the issue can be found on the right side, thus not necessary

Why are you responding to your own comments? ;)
Anyway, yes, the links should be visible as long as you're logged into cndb / DataStax repos.

k-rus · 2026-05-28T10:12:40Z

The link to the issue can be found on the right side, thus not necessary

Why are you responding to your own comments? ;)

I am making visible to everyone following the PR that I changed my request

k-rus

More comments from me. They are mainly about fixing minor things in understability. I also confused with generated data in new tests.

In addition to my previous general comments, like missing CNDB PR, I see a need to improve the PR description, which will be basis commit message:

I have hard time to understand the first paragraph. According to the template a PR description starts with describing the issue. I think turning the first paragraph in issue description can make easier to understand.

Second paragraph starts with:

The join algorithm has been fixed

Will it be better to say the anti-join algorithm. (I understand that the original version is also correct, but I think it's good to be specific)

Were partition-aware keys not allowed before the fix?

I don't see any fixes to testUpdateMapToNullValue. What am I missing?

k-rus · 2026-05-29T08:02:24Z

        ByteBuffer clusteringValue = LongType.instance.getSerializer().serialize(clustering);
        DecoratedKey pk = Murmur3Partitioner.instance.decorateKey(pkValue);
-        Clustering<ByteBuffer> c = clustering == null ? Clustering.EMPTY : Clustering.make(clusteringValue);
+        Clustering<ByteBuffer> c = clustering == null ? Clustering.STATIC_CLUSTERING : Clustering.make(clusteringValue);


Do you mean that the union iterator test was using EMPTY incorrectly and should be using STATIC_CLUSTERING instead?
To me fixing to STATIC_CLUSTERING mean that there should be tests for skinny primary keys. I suggest either fix it now or create an issue to add necessary tests.

Documentation of the method must be fixed to correspond its implementation.

pkolaczk · 2026-05-29T08:50:50Z

Were partition-aware keys not allowed before the fix?

They were allowed and handled incorrectly by the anti join algorithm.
They are still allowed and now they are handled correctly - this is the clou of the fix - avoid false negatives caused by partition-aware (or static) keys.

I don't see any fixes to testUpdateMapToNullValue. What am I missing?

This test was failing on AA version due to a bug in AntiJoinIterator. Now the test is passing.
I also changed it to versioned as Michael suggested.

Will it be better to say the anti-join algorithm. (I understand that the original version is also correct, but I think it's good to be specific)

Good catch!

k-rus · 2026-05-30T06:19:38Z

e.g. a key coming from an AA index or a key
for a static row,

Is there such thing as static row? I understand the meaning, but it requires quite a lot of knowledge, since there is no such thing as static row. May be use static clustering, e.g., a key for static clustering value.
It can be just my issue as an outsider.

plpesvc-ds · 2026-06-01T11:46:13Z

❌ Build ds-cassandra-pr-gate/PR-2435 rejected by Butler

3 regressions found
See build details here

Found 3 new test failures

Test	Explanation	Runs	Upstream
o.a.c.distributed.test.RepairErrorsTest.testRemoteStreamFailure	REGRESSION	🔵🔴	0 / 30
o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompaction[version=ec enableNVQ=true]	REGRESSION	🔴⚪	0 / 30
o.a.c.index.sai.cql.VectorSiftSmallTest.testRerankKZeroOrderMatchesFullPrecisionSimilarity[version=ed enableNVQ=true enableFused=false]	REGRESSION	🔴⚪	0 / 30

Found 8 known test failures

k-rus

I think it's incorrect to apply the change for static clustering. Here is my reasoning.
Am I correct that both inputs to anti-join come from the same index? in such case all rows of a partition will match the static value and, thus, should be excluded.

Also this fix means for partition aware index (AA) means that all row from the left will be included, right? Thus, should anti-join iterator be sued for AA indexes at all? If so, this can be done as follow up to this PR.

Is there a CQL test for anti-join with static clustering? If not, I think it must have.

Some comments on PR description:

Can you complete the last two items in the checklist? Do you think CNDB's PR should contain the release notes?

The PR description has:

Partition-aware keys on both the left and right side
of an anti-join are now handled properly.

Can you expand handled properly?

In the PR description:

CollectionIndexingTest
has been extended to run on multiple index versions.

Can you change from multiple to all, since it actually runs against all index version?

k-rus · 2026-06-02T08:16:35Z

+ * In order to avoid false-negatives, the keys on the left side that can potentially match more than one row, e.g.
+ * keys that match whole partitions with static clustering or keys coming from partition-aware indexes (version AA),
+ * are always included in the result. Similarly, the keys on the right side that can potentially match more than one row
+ * are ignored, as they could match any key, but matching any key is not the same as matching all the keys
+ * in the partition. Hence, we cannot skip the whole partition from the result set in that case.


I would like to see that the reason for false negatives come that iterators, which are input to anti-join, might have false positives, which happen for partition aware index version or for static clustering. The false positive in such cases contain all rows from a partition. The negation in anti-join leads that all rows from the partition are excluded instead of the single row.

This formulation gives actually doubts to me that static clustering needs the fix. Am I correct that both inputs to anti-join come from the same index? in such case all rows of a partition will match the static value and, thus, should be excluded. Do I miss anything?

k-rus · 2026-06-02T10:01:48Z

+            var avgPartitions = 1 + testIteration / 10;
+            var avgRowsPerPartition = 1 + testIteration / 10;


To me the type is not obvious here. Should it be explicit? It also inconsistent with existing code. C* community is also quite conservative on this question.
Note that BobIDE is not able to annotate types.

In Rust I would be on the opposite side and always support implicit typing.

Replaced to int.

Note that BobIDE is not able to annotate types.

As a side note – BobIDE cannot also do many other things including simple find usages / go to definition. In terms of being an IDE for humans it's a lot behind. I use it only for AI assisted generation.

When a KeyRangeAntiJoinIterator gets a non-precise, partition-aware PrimaryKey, e.g. a key coming from an AA index or a key for a static row, it must not assume such key matches all the rows inside the partition, because it is enough for a single row in a partition to be present for the partition to be indexed. So such a key should never be assumed to match any particular row, or otherwise we might miss some rows from the result. The join algorithm has been fixed to not allow false-negatives in such cases. Partition-aware keys on both the left and right side of an anti-join are allowed. The fix makes the queries return correct results even when mixed version indexes are used. New randomized tests have been added and CollectionIndexingTest has been extended to run on multiple index versions. Fixes CollectionIndexingTest#testUpdateMapToNullValue.

Removed identifiesUniqueRow from `PrimaryKey`. Instead of trying to make `KeyRangeAntiJoinIterator` work on AA version keys (by ignoring them), make sure it's never used in those cases.

pkolaczk force-pushed the c15948-aa-neq-query-failure branch 3 times, most recently from 80e1146 to 1e90fe2 Compare May 22, 2026 10:42

pkolaczk requested review from adelapena and k-rus and removed request for adelapena May 22, 2026 11:54

k-rus reviewed May 25, 2026

View reviewed changes

Comment thread src/java/org/apache/cassandra/index/sai/utils/PrimaryKey.java Outdated

k-rus reviewed May 26, 2026

View reviewed changes

k-rus self-requested a review May 27, 2026 15:07

k-rus reviewed May 27, 2026

View reviewed changes

k-rus reviewed May 29, 2026

View reviewed changes

pkolaczk requested a review from k-rus June 1, 2026 10:31

pkolaczk force-pushed the c15948-aa-neq-query-failure branch from 5a03791 to c17f0b4 Compare June 1, 2026 10:42

k-rus reviewed Jun 2, 2026

View reviewed changes

pkolaczk marked this pull request as draft June 8, 2026 07:43

Piotr Kołaczkowski added 2 commits June 8, 2026 19:19

CNDB-15948: Address review remarks

011dc68

pkolaczk force-pushed the c15948-aa-neq-query-failure branch from c17f0b4 to d56f6e9 Compare June 8, 2026 19:51

pkolaczk marked this pull request as ready for review June 8, 2026 19:51

pkolaczk requested a review from k-rus June 8, 2026 19:52

pkolaczk force-pushed the c15948-aa-neq-query-failure branch 2 times, most recently from 4fd7336 to 7de0e6b Compare June 8, 2026 20:00

CNDB-15948: Different, simpler approach

ec28f30

Removed identifiesUniqueRow from `PrimaryKey`. Instead of trying to make `KeyRangeAntiJoinIterator` work on AA version keys (by ignoring them), make sure it's never used in those cases.

pkolaczk force-pushed the c15948-aa-neq-query-failure branch from 7de0e6b to ec28f30 Compare June 8, 2026 20:00

		var avgPartitions = 1 + testIteration / 10;
		var avgRowsPerPartition = 1 + testIteration / 10;

Uh oh!

Conversation

pkolaczk commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why

Explanation

What was fixed

Uh oh!

github-actions Bot commented May 22, 2026 • edited by pkolaczk Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Checklist before you submit for review

Uh oh!

sonarqubecloud Bot commented May 22, 2026

Quality Gate passed

Uh oh!

Uh oh!

k-rus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

k-rus left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

k-rus commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkolaczk commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pkolaczk commented May 28, 2026

Uh oh!

k-rus commented May 28, 2026

Uh oh!

k-rus left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pkolaczk commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

k-rus commented May 30, 2026

Uh oh!

plpesvc-ds commented Jun 1, 2026

❌ Build ds-cassandra-pr-gate/PR-2435 rejected by Butler

Found 3 new test failures

Found 8 known test failures

Uh oh!

k-rus left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

pkolaczk commented May 22, 2026 •

edited

Loading

github-actions Bot commented May 22, 2026 •

edited by pkolaczk

Loading

k-rus left a comment •

edited

Loading

k-rus commented May 28, 2026 •

edited

Loading

pkolaczk commented May 28, 2026 •

edited

Loading

pkolaczk commented May 29, 2026 •

edited

Loading

k-rus left a comment •

edited

Loading