-
Notifications
You must be signed in to change notification settings - Fork 2.5k
[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[CALCITE-7317] SubQueryRemoveRule should skip NULL-safety checks when… #4668
Conversation
… NOT IN needle is NOT NULL
mihaibudiu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised that no other plans change
… of IN expression in all cases, updated tests
Indeed there are more changes, I have also simplified the detection of null keys, a boolean value was already computed and I don't need the extra method anymore. With |
| LogicalAggregate(group=[{0}], c=[COUNT()]) | ||
| LogicalProject(NAME=[$1]) | ||
| LogicalTableScan(table=[[CATALOG, SALES, DEPT]]) | ||
| LogicalFilter(condition=[IS NULL($10)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these plans are significantly simpler, I hope they are correct too.. not easy to tell
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried to explain step by step how I verified that the rewrite is sound (not a formal proof but it will help increase confidence in the change): #4668 (comment)
|
some of the tests seem to have different results, so maybe something is wrong |
…s must be NOT NULL to lift the null-safety checks, updating tests and test results
… it's the CASE -> OR translation
|
I was under the impression that having the key NOT NULL was enough to lift the NULL safety checks, but I was wrong, the wrong results in the quidem test for subquery made me realize it. This is my example from the Jira ticket (taken from the comments in the class suggesting the optimization): This is the current rewrite without this improvement, let's annotate what each part does:
So we are left with: and we can see that we don't use We were wondering about 1. (empty subquery check), but we can see it's not needed here, if It's unfortunate that the example in the original comment uses twice the same table, but it's not hard to see that the whole reasoning hold if you replace the So in the last commit I have pushed the change checking for both the keys and the subquery columns to be NOT NULL, and thus applying the optimization described, which is not safe if either of the two are NULL, as covered by tests in In the top-most commit I have left some separated test changes which are only due to a different order of OR operands after the I hope this helps double checking my understanding and what I implemented in this PR, since subqueries are always a tricky subject. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add the SQL of the modified test cases in RelOptRulesTest.xml to sub-query.iq?
|
I like the reasoning, could we preserve it somewhere in a comment? |



… NOT IN needle is NOT NULL
All details are in CALCITE-7317