[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668

asolimando · 2025-12-05T17:11:53Z

… NOT IN needle is NOT NULL

All details are in CALCITE-7317

… NOT IN needle is NOT NULL

mihaibudiu

I am surprised that no other plans change

… of IN expression in all cases, updated tests

asolimando · 2025-12-05T19:06:36Z

I am surprised that no other plans change

Indeed there are more changes, testAnyInProjectNullable in particular me realize that we should preserve the nullability of the IN expression in all cases, so I improved that.

I have also simplified the detection of null keys, a boolean value was already computed and I don't need the extra method anymore.

With RelOptRuleTest, every time I execute some individual tests in the IDE, there are problems with the regeneration of the computed XML file, so tests seem to pass but it's not executing all of them, it's not the first time I get tricked by that, but I haven't had time to get to the bottom of it, not sure it's Gradle caching or our own machinery.

mihaibudiu · 2025-12-05T21:35:49Z

core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml

-            LogicalAggregate(group=[{0}], c=[COUNT()])
-              LogicalProject(NAME=[$1])
-                LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
+  LogicalFilter(condition=[IS NULL($10)])


these plans are significantly simpler, I hope they are correct too.. not easy to tell

I have tried to explain step by step how I verified that the rewrite is sound (not a formal proof but it will help increase confidence in the change): #4668 (comment)

mihaibudiu · 2025-12-05T21:37:48Z

some of the tests seem to have different results, so maybe something is wrong

…s must be NOT NULL to lift the null-safety checks, updating tests and test results

… it's the CASE -> OR translation

sonarqubecloud · 2025-12-05T22:42:03Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

asolimando · 2025-12-05T23:05:50Z

some of the tests seem to have different results, so maybe something is wrong

I was under the impression that having the key NOT NULL was enough to lift the NULL safety checks, but I was wrong, the wrong results in the quidem test for subquery made me realize it.

This is my example from the Jira ticket (taken from the comments in the class suggesting the optimization):

SELECT e.deptno, e.deptno IN (SELECT deptno FROM emp)
FROM emp AS e

This is the current rewrite without this improvement, let's annotate what each part does:

SELECT e.deptno,
    CASE
    WHEN ct.c = 0 THEN false                     -- (1) empty check
    WHEN e.deptno IS NULL THEN null      -- (2) key NULL check
    WHEN dt.i IS NOT NULL THEN true      -- (3) match found
    WHEN ct.ck < ct.c THEN null                 -- (4) NULLs exist check
    ELSE false                                                -- (5) no match
    END
  FROM emp AS e
  LEFT JOIN (
    (SELECT COUNT(*) AS c, COUNT(deptno) AS ck FROM emp) AS ct
    CROSS JOIN (SELECT DISTINCT deptno, true AS i FROM emp)) AS dt
    ON e.deptno = dt.deptno

used to see if the subquery is empty (as COUNT(*) returns one row anyway): DROP (we will re-check this later)
if we know the key is NOT NULL, we can drop: DROP
this is needed in any case, we wouldn't know when we matched otherwise: KEEP
we can drop, if both sides are NOT NULL, they are trivially identical: DROP
needed to as it's the fallback: KEEP

So we are left with:

SELECT e.deptno,
    CASE
    WHEN dt.i IS NOT NULL THEN true      -- (3) match found
    ELSE false                                                -- (5) no match
    END
  FROM emp AS e
  LEFT JOIN (
    (SELECT COUNT(*) AS c, COUNT(deptno) AS ck FROM emp) AS ct
    CROSS JOIN (SELECT DISTINCT deptno, true AS i FROM emp)) AS dt
    ON e.deptno = dt.deptno

and we can see that we don't use ct anymore, we can drop it, which leaves us with:

SELECT e.deptno,
    CASE
    WHEN dt.i IS NOT NULL THEN true      -- (3) match found
    ELSE false                                                -- (5) no match
    END
  FROM emp AS e
  LEFT JOIN (
    SELECT DISTINCT deptno, true AS i FROM emp)
  AS dt ON e.deptno = dt.deptno

We were wondering about 1. (empty subquery check), but we can see it's not needed here, if emp is empty, then we get all dt.i to be NULL, so we get all false for every row.

It's unfortunate that the example in the original comment uses twice the same table, but it's not hard to see that the whole reasoning hold if you replace the emp in the subquery table with something else.

So in the last commit I have pushed the change checking for both the keys and the subquery columns to be NOT NULL, and thus applying the optimization described, which is not safe if either of the two are NULL, as covered by tests in RelOptRuleTest.

In the top-most commit I have left some separated test changes which are only due to a different order of OR operands after the CASE to OR rewrite, I am not sure where this is coming from, it doesn't alter the semantics.

I hope this helps double checking my understanding and what I implemented in this PR, since subqueries are always a tricky subject.

iwanttobepowerful · 2025-12-06T15:19:07Z

core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml

Could you add the SQL of the modified test cases in RelOptRulesTest.xml‎ to sub-query.iq?

mihaibudiu · 2025-12-06T19:49:53Z

I like the reasoning, could we preserve it somewhere in a comment?

[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when…

f3c584d

… NOT IN needle is NOT NULL

mihaibudiu approved these changes Dec 5, 2025

View reviewed changes

[CALCITE-7317] Simplified nullable keys logic, preserving nullability…

c168d1a

… of IN expression in all cases, updated tests

mihaibudiu approved these changes Dec 5, 2025

View reviewed changes

asolimando added 2 commits December 5, 2025 23:17

[CALCITE-7317] Fixing implementation as both keys and subquery column…

02f6860

…s must be NOT NULL to lift the null-safety checks, updating tests and test results

[CALCITE-7317] Cosmetic test results changes, no semantic difference,…

674439b

… it's the CASE -> OR translation

iwanttobepowerful reviewed Dec 6, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668

[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668

asolimando commented Dec 5, 2025

Uh oh!

mihaibudiu left a comment

Uh oh!

asolimando commented Dec 5, 2025 •

edited

Loading

Uh oh!

mihaibudiu Dec 5, 2025

Uh oh!

asolimando Dec 5, 2025

Uh oh!

mihaibudiu commented Dec 5, 2025

Uh oh!

sonarqubecloud bot commented Dec 5, 2025

Uh oh!

asolimando commented Dec 5, 2025

Uh oh!

iwanttobepowerful Dec 6, 2025

Uh oh!

mihaibudiu commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668

Are you sure you want to change the base?

[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668

Conversation

asolimando commented Dec 5, 2025

Uh oh!

mihaibudiu left a comment

Choose a reason for hiding this comment

Uh oh!

asolimando commented Dec 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mihaibudiu Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

asolimando Dec 5, 2025

Choose a reason for hiding this comment

Uh oh!

mihaibudiu commented Dec 5, 2025

Uh oh!

sonarqubecloud bot commented Dec 5, 2025

Quality Gate passed

Uh oh!

asolimando commented Dec 5, 2025

Uh oh!

iwanttobepowerful Dec 6, 2025

Choose a reason for hiding this comment

Uh oh!

mihaibudiu commented Dec 6, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

asolimando commented Dec 5, 2025 •

edited

Loading