Fix MATCH on brand-new label after CREATE returning 0 rows #2341

Open
gregfelice wants to merge 2 commits into apache:master from gregfelice:fix_2193_match_label_cache_timing

Conversation

gregfelice commented Feb 26, 2026

Issue

Fixes #2193

Problem

When CREATE introduces a new label and a subsequent MATCH references it via WITH, the query returns 0 rows on first execution but works correctly on the second:

SELECT * FROM cypher('test_graph', $$
   CREATE (a:Person {name: 'Jane'}), (b:Person {name: 'Tom'})
   WITH a, b
   MATCH (p:Person)
   RETURN p.name
$$) AS (result agtype);

First execution: 0 rows, EXPLAIN shows One-Time Filter: false
Second execution: 2 rows, EXPLAIN shows Seq Scan on "Person"

Root Cause

match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE's transform has not yet executed (which creates the label table as a side effect), the Person label is not in the label cache. The check concludes the label is invalid and generates a paradoxical WHERE true = false clause, producing a One-Time Filter: false plan that returns no rows.

On the second execution the label table already exists from the first run, so the check passes.
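The ordering problem can be modeled in a few lines of plain C. This is only a sketch under stated assumptions: `LabelCache`, `label_exists`, `transform_create`, and the two `match_label_valid_*` functions are hypothetical stand-ins invented for illustration and are not the AGE implementation. It only shows why validating the label before the predecessor's transform has run fails on the first execution and succeeds on the second.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_LABELS 8

/* Hypothetical stand-in for the label cache: a short list of label names. */
typedef struct LabelCache {
    const char *names[MAX_LABELS];
    size_t count;
} LabelCache;

bool label_exists(const LabelCache *cache, const char *name)
{
    assert(cache != NULL && name != NULL);
    for (size_t i = 0; i < cache->count; i++) {
        if (strcmp(cache->names[i], name) == 0)
            return true;
    }
    return false;
}

/* Models the side effect of transforming CREATE: register a missing label. */
void transform_create(LabelCache *cache, const char *label)
{
    if (!label_exists(cache, label) && cache->count < MAX_LABELS)
        cache->names[cache->count++] = label;
}

/* Order described in the root cause: check first, transform predecessor later. */
bool match_label_valid_early_check(LabelCache *cache, const char *label)
{
    bool valid = label_exists(cache, label); /* runs before CREATE's transform */
    transform_create(cache, label);
    return valid;
}

/* Deferred order: transform the predecessor first, then check the label. */
bool match_label_valid_deferred_check(LabelCache *cache, const char *label)
{
    transform_create(cache, label);
    return label_exists(cache, label);
}
```

With an empty cache, the early-check variant reports the label as invalid on the first call and valid on the second, which mirrors the 0-rows-then-2-rows behavior; the deferred variant reports it as valid on the first call.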

Fix

Defer the label check when the predecessor chain contains DML:

  1. In transform_cypher_match(): skip the early match_check_valid_label() when clause_chain_has_dml(clause->prev) is true (CREATE, SET, DELETE, or MERGE in the predecessor chain).

  2. In transform_cypher_match_pattern(): after transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If labels are still invalid, generate an empty result via makeBoolConst(false, false).

This preserves existing behavior for MATCH without DML predecessors — e.g., MATCH (a) MATCH (a:invalid_label) still produces the proper "multiple labels" error.
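As a rough illustration of the gating in step 1, the following sketch walks a predecessor chain looking for DML clauses. The `SketchClause` struct, the `SketchClauseKind` enum, and both function names are assumptions made up for this example; the real `clause_chain_has_dml()` from PR #2340 operates on AGE's own clause nodes and may differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical clause kinds; AGE uses its own node types. */
typedef enum SketchClauseKind {
    SK_MATCH,
    SK_WITH,
    SK_RETURN,
    SK_CREATE,
    SK_SET,
    SK_DELETE,
    SK_MERGE
} SketchClauseKind;

/* Hypothetical clause node: each clause points at its predecessor. */
typedef struct SketchClause {
    SketchClauseKind kind;
    struct SketchClause *prev;
} SketchClause;

/* Returns true if any clause in the chain starting at `clause` is DML. */
bool sketch_chain_has_dml(const SketchClause *clause)
{
    for (; clause != NULL; clause = clause->prev) {
        switch (clause->kind) {
        case SK_CREATE:
        case SK_SET:
        case SK_DELETE:
        case SK_MERGE:
            return true;
        default:
            break;
        }
    }
    return false;
}

/* The early label check is kept only when no predecessor is DML. */
bool sketch_should_check_labels_early(const SketchClause *match)
{
    assert(match != NULL);
    return !sketch_chain_has_dml(match->prev);
}
```

In this model a CREATE, WITH, MATCH chain defers the check, while a MATCH, MATCH chain still takes the early path, which is how the existing error behavior for non-DML chains is preserved.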

EXPLAIN after fix (first execution)

 Nested Loop
   ->  Result
         ->  Custom Scan (Cypher Create)
               ->  Subquery Scan on _age_default_alias_previous_cypher_clause
                     ->  Result
   ->  Seq Scan on "Person" p

Dependencies

This PR is based on PR #2340 (clause_chain_has_dml helper and security_barrier fix for issue #2308).

Regression tests added

  • Reporter's exact case (CREATE two nodes, WITH, MATCH — expects 2 rows)
  • Single CREATE + MATCH on brand-new label
  • CREATE + MATCH when label already exists (second execution)
  • MATCH on non-existent label without DML predecessor (still returns 0 rows)

All 31 regression tests pass.

AI Disclosure

AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.

gregfelice and others added 2 commits February 25, 2026 22:20
When a MATCH clause follows CREATE + WITH and re-uses bound variables
(e.g. CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b)), the MATCH
generates filter quals (age_start_id(e) = age_id(a), etc.) that
reference only columns from the predecessor subquery. PostgreSQL's
optimizer pushes these quals through the transparent subquery layers
into the CREATE's child plan, where they evaluate on NULL values before
CREATE has executed — always yielding 0 rows.

Fix: mark the predecessor subquery RTE as security_barrier when the
clause chain contains a data-modifying operation (CREATE, SET, DELETE,
or MERGE). This prevents PostgreSQL from pushing filter quals into the
subquery, ensuring they evaluate after the DML produces output values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…che#2193)

When CREATE introduces a new label and a subsequent MATCH references it
(e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns
0 rows on first execution but works on the second.

Root cause: match_check_valid_label() in transform_cypher_match() runs
before transform_prev_cypher_clause() processes the predecessor chain.
Since CREATE has not yet executed its transform (which creates the label
table as a side effect), the label is not in the cache and the check
generates a One-Time Filter: false plan that returns no rows.

Fix: Skip the early label validity check when the predecessor clause
chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE).
After transform_prev_cypher_clause() completes and any new labels exist
in the cache, run a deferred label check. If the labels are still
invalid at that point, generate an empty result via makeBoolConst(false).

This preserves the existing behavior for MATCH without DML predecessors
(e.g., MATCH-MATCH chains still get the early check and proper error
messages for invalid labels).

Depends on: PR apache#2340 (clause_chain_has_dml helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@jrgemignani

@gregfelice Dependencies?

Dependencies
This PR is based on PR #2340 (clause_chain_has_dml helper and security_barrier fix for issue #2308).

Copilot AI left a comment

Pull request overview

Fixes an execution-time planning bug where MATCH immediately following CREATE ... WITH ... on a brand-new label returns 0 rows on first execution due to label-cache validation running before predecessor clause transformation.

Changes:

  • Defer MATCH label validity checking when the predecessor clause chain contains DML, then re-check after transform_prev_cypher_clause() completes.
  • Add clause_chain_has_dml() helper and use it to gate early label checks and to protect predecessor subqueries from qual pushdown (security barrier).
  • Add/extend regression tests and expected outputs for issues #2193 and #2308.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File and description:

  • src/backend/parser/cypher_clause.c: Defers label validation for MATCH when DML predecessors exist; adds clause-chain DML detection helper; updates MATCH pattern transformation behavior.
  • regress/sql/cypher_match.sql: Adds regression coverage for the reported CREATE/WITH/MATCH cases and related scenarios.
  • regress/expected/cypher_match.out: Updates expected output to match new regression tests.

Comment on lines +2978 to +2988
if (valid_labels)
{
    transform_match_pattern(cpstate, query, self->pattern, where);
}
else
{
    query->rtable = pstate->p_rtable;
    query->rteperminfos = pstate->p_rteperminfos;
    query->jointree = makeFromExpr(pstate->p_joinlist,
                                   makeBoolConst(false, false));
}
Copilot AI Feb 27, 2026

When deferred label validation fails (valid_labels == false), the code skips transform_match_pattern() and builds a query that only projects variables from the previous clause. This can break later clauses that reference variables introduced by this MATCH pattern (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p) because p will never be added to the targetList/namespace. Consider still transforming the MATCH pattern to register any new variables/target entries, and then force an empty result by applying a constant-false qual (or equivalent) after the pattern has been transformed.
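The concern can be illustrated with a toy model of variable registration. Everything here is hypothetical and written only for this illustration (`SketchQuery`, `register_var`, `var_visible`, and the two `match_*` functions); it does not reproduce AGE's parse state or target-list handling, and it has not been checked against the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* Hypothetical query under construction: visible variables plus an
 * "always empty" flag standing in for a constant-false qual. */
typedef struct SketchQuery {
    const char *vars[MAX_VARS];
    size_t nvars;
    bool always_empty;
} SketchQuery;

void register_var(SketchQuery *q, const char *name)
{
    assert(q != NULL && q->nvars < MAX_VARS);
    q->vars[q->nvars++] = name;
}

bool var_visible(const SketchQuery *q, const char *name)
{
    for (size_t i = 0; i < q->nvars; i++) {
        if (strcmp(q->vars[i], name) == 0)
            return true;
    }
    return false;
}

/* Shape of the reviewed hunk: skip the pattern entirely on invalid labels. */
void match_skip_pattern_when_invalid(SketchQuery *q, const char *var,
                                     bool valid_labels)
{
    if (valid_labels)
        register_var(q, var);   /* pattern transform registers the variable */
    else
        q->always_empty = true; /* variable is never registered */
}

/* Shape of the suggestion: always transform, then force an empty result. */
void match_transform_then_force_empty(SketchQuery *q, const char *var,
                                      bool valid_labels)
{
    register_var(q, var);
    if (!valid_labels)
        q->always_empty = true;
}
```

In this model, a later `RETURN p` can resolve `p` only in the second variant; both variants still yield an empty result when the label is invalid.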

Comment on lines +1504 to +1526
SELECT * FROM cypher('issue_2193', $$
CREATE (a:Person {name: 'Jane', livesIn: 'London'}),
(b:Person {name: 'Tom', livesIn: 'Copenhagen'})
WITH a, b
MATCH (p:Person)
RETURN p.name
$$) AS (result agtype);

-- Single CREATE + MATCH on brand-new label
SELECT * FROM cypher('issue_2193', $$
CREATE (a:City {name: 'Berlin'})
WITH a
MATCH (c:City)
RETURN c.name
$$) AS (result agtype);

-- MATCH on a label that now exists (second execution) still works
SELECT * FROM cypher('issue_2193', $$
CREATE (a:City {name: 'Paris'})
WITH a
MATCH (c:City)
RETURN c.name
$$) AS (result agtype);
Copilot AI Feb 27, 2026

The regression tests added for issue_2193 assert a specific row order (e.g., Jane then Tom; Berlin then Paris) but the Cypher queries don't include an ORDER BY. Without ordering, result order can vary with plan changes (seq scan order, optimizer choices), making the test output potentially flaky. Consider adding an ORDER BY on the returned expression (or changing assertions to use counts/sets) to make the expected output deterministic.

MATCH (x:NonExistentLabel)
RETURN x
$$) AS (result agtype);

Copilot AI Feb 27, 2026

The new label-deferral logic introduces a new branch for “DML predecessor + invalid label”. There’s no regression test covering a query that (1) has a DML predecessor and (2) references a non-existent label in MATCH while still returning a MATCH-introduced variable (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p). Adding a test for this case would help catch planner/namespace regressions in the deferred-label path.

Suggested change
-- MATCH on non-existent label after DML predecessor still returns 0 rows
SELECT * FROM cypher('issue_2193', $$
CREATE (a:Person {name: 'Alice'})
WITH a
MATCH (p:NonExistentLabel)
RETURN p
$$) AS (result agtype);

jrgemignani left a comment

We may need to rebase this after #2340

jrgemignani commented Feb 27, 2026

@gregfelice Please see Copilot's comments. Additionally, this PR likely needs to be rebased after #2340 has been merged.

Development

Successfully merging this pull request may close these issues.

create … with … The execution plan of match is abnormal
