Fix MATCH on brand-new label after CREATE returning 0 rows#2341
Fix MATCH on brand-new label after CREATE returning 0 rows#2341gregfelice wants to merge 2 commits intoapache:masterfrom
Conversation
When a MATCH clause follows CREATE + WITH and re-uses bound variables (e.g. CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b)), the MATCH generates filter quals (age_start_id(e) = age_id(a), etc.) that reference only columns from the predecessor subquery. PostgreSQL's optimizer pushes these quals through the transparent subquery layers into the CREATE's child plan, where they evaluate on NULL values before CREATE has executed — always yielding 0 rows. Fix: mark the predecessor subquery RTE as security_barrier when the clause chain contains a data-modifying operation (CREATE, SET, DELETE, or MERGE). This prevents PostgreSQL from pushing filter quals into the subquery, ensuring they evaluate after the DML produces output values. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…che#2193) When CREATE introduces a new label and a subsequent MATCH references it (e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns 0 rows on first execution but works on the second. Root cause: match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE has not yet executed its transform (which creates the label table as a side effect), the label is not in the cache and the check generates a One-Time Filter: false plan that returns no rows. Fix: Skip the early label validity check when the predecessor clause chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE). After transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If the labels are still invalid at that point, generate an empty result via makeBoolConst(false). This preserves the existing behavior for MATCH without DML predecessors (e.g., MATCH-MATCH chains still get the early check and proper error messages for invalid labels). Depends on: PR apache#2340 (clause_chain_has_dml helper) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@gregfelice Dependencies?
|
There was a problem hiding this comment.
Pull request overview
Fixes an execution-time planning bug where MATCH immediately following CREATE ... WITH ... on a brand-new label returns 0 rows on first execution due to label-cache validation running before predecessor clause transformation.
Changes:
- Defer
MATCHlabel validity checking when the predecessor clause chain contains DML, then re-check aftertransform_prev_cypher_clause()completes. - Add
clause_chain_has_dml()helper and use it to gate early label checks and to protect predecessor subqueries from qual pushdown (security barrier). - Add/extend regression tests and expected outputs for issues #2193 and #2308.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/backend/parser/cypher_clause.c | Defers label validation for MATCH when DML predecessors exist; adds clause-chain DML detection helper; updates MATCH pattern transformation behavior. |
| regress/sql/cypher_match.sql | Adds regression coverage for the reported CREATE/WITH/MATCH cases and related scenarios. |
| regress/expected/cypher_match.out | Updates expected output to match new regression tests. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (valid_labels) | ||
| { | ||
| transform_match_pattern(cpstate, query, self->pattern, where); | ||
| } | ||
| else | ||
| { | ||
| query->rtable = pstate->p_rtable; | ||
| query->rteperminfos = pstate->p_rteperminfos; | ||
| query->jointree = makeFromExpr(pstate->p_joinlist, | ||
| makeBoolConst(false, false)); | ||
| } |
There was a problem hiding this comment.
When deferred label validation fails (valid_labels == false), the code skips transform_match_pattern() and builds a query that only projects variables from the previous clause. This can break later clauses that reference variables introduced by this MATCH pattern (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p) because p will never be added to the targetList/namespace. Consider still transforming the MATCH pattern to register any new variables/target entries, and then force an empty result by applying a constant-false qual (or equivalent) after the pattern has been transformed.
| SELECT * FROM cypher('issue_2193', $$ | ||
| CREATE (a:Person {name: 'Jane', livesIn: 'London'}), | ||
| (b:Person {name: 'Tom', livesIn: 'Copenhagen'}) | ||
| WITH a, b | ||
| MATCH (p:Person) | ||
| RETURN p.name | ||
| $$) AS (result agtype); | ||
|
|
||
| -- Single CREATE + MATCH on brand-new label | ||
| SELECT * FROM cypher('issue_2193', $$ | ||
| CREATE (a:City {name: 'Berlin'}) | ||
| WITH a | ||
| MATCH (c:City) | ||
| RETURN c.name | ||
| $$) AS (result agtype); | ||
|
|
||
| -- MATCH on a label that now exists (second execution) still works | ||
| SELECT * FROM cypher('issue_2193', $$ | ||
| CREATE (a:City {name: 'Paris'}) | ||
| WITH a | ||
| MATCH (c:City) | ||
| RETURN c.name | ||
| $$) AS (result agtype); |
There was a problem hiding this comment.
The regression tests added for issue_2193 assert a specific row order (e.g., Jane then Tom; Berlin then Paris) but the Cypher queries don't include an ORDER BY. Without ordering, result order can vary with plan changes (seq scan order, optimizer choices), making the test output potentially flaky. Consider adding an ORDER BY on the returned expression (or changing assertions to use counts/sets) to make the expected output deterministic.
| MATCH (x:NonExistentLabel) | ||
| RETURN x | ||
| $$) AS (result agtype); | ||
|
|
There was a problem hiding this comment.
The new label-deferral logic introduces a new branch for “DML predecessor + invalid label”. There’s no regression test covering a query that (1) has a DML predecessor and (2) references a non-existent label in MATCH while still returning a MATCH-introduced variable (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p). Adding a test for this case would help catch planner/namespace regressions in the deferred-label path.
| -- MATCH on non-existent label after DML predecessor still returns 0 rows | |
| SELECT * FROM cypher('issue_2193', $$ | |
| CREATE (a:Person {name: 'Alice'}) | |
| WITH a | |
| MATCH (p:NonExistentLabel) | |
| RETURN p | |
| $$) AS (result agtype); |
|
@gregfelice Please see Copilot's comments. Additionally, this PR likely needs to be rebased after #2340 has been merged. |
Issue
Fixes #2193
Problem
When
CREATEintroduces a new label and a subsequentMATCHreferences it viaWITH, the query returns 0 rows on first execution but works correctly on the second:First execution: 0 rows,
EXPLAINshowsOne-Time Filter: falseSecond execution: 2 rows,
EXPLAINshowsSeq Scan on "Person"Root Cause
match_check_valid_label()intransform_cypher_match()runs beforetransform_prev_cypher_clause()processes the predecessor chain. Since CREATE's transform has not yet executed (which creates the label table as a side effect), thePersonlabel is not in the label cache. The check concludes the label is invalid and generates a paradoxicalWHERE true = falseclause, producing aOne-Time Filter: falseplan that returns no rows.On the second execution the label table already exists from the first run, so the check passes.
Fix
Defer the label check when the predecessor chain contains DML:
In
transform_cypher_match(): skip the earlymatch_check_valid_label()whenclause_chain_has_dml(clause->prev)is true (CREATE, SET, DELETE, or MERGE in the predecessor chain).In
transform_cypher_match_pattern(): aftertransform_prev_cypher_clause()completes and any new labels exist in the cache, run a deferred label check. If labels are still invalid, generate an empty result viamakeBoolConst(false, false).This preserves existing behavior for MATCH without DML predecessors — e.g.,
MATCH (a) MATCH (a:invalid_label)still produces the proper "multiple labels" error.EXPLAIN after fix (first execution)
Dependencies
This PR is based on PR #2340 (
clause_chain_has_dmlhelper andsecurity_barrierfix for issue #2308).Regression tests added
All 31 regression tests pass.
AI Disclosure
AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.