Fix MATCH on brand-new label after CREATE returning 0 rows by gregfelice · Pull Request #2341 · apache/age

gregfelice · 2026-02-26T04:05:43Z

Issue

Problem

When CREATE introduces a new label and a subsequent MATCH references it via WITH, the query returns 0 rows on first execution but works correctly on the second:

SELECT * FROM cypher('test_graph', $$
   CREATE (a:Person {name: 'Jane'}), (b:Person {name: 'Tom'})
   WITH a, b
   MATCH (p:Person)
   RETURN p.name
$$) AS (result agtype);

First execution: 0 rows, EXPLAIN shows One-Time Filter: false
Second execution: 2 rows, EXPLAIN shows Seq Scan on "Person"

Root Cause

match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. CREATE's transform, which creates the label table as a side effect, has not yet executed at that point, so the Person label is not in the label cache. The check concludes the label is invalid and generates an always-false WHERE true = false clause, producing a One-Time Filter: false plan that returns no rows.

On the second execution the label table already exists from the first run, so the check passes.
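The ordering problem can be illustrated with a small standalone model. This is only a sketch: LabelCache, transform_create() and the two match_sees_label_* functions are hypothetical names invented for illustration and are not AGE's actual data structures or functions. It assumes only what the description above states, namely that transforming CREATE makes the label visible as a side effect.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the label cache: a tiny fixed-size set of names. */
#define MAX_LABELS 8

typedef struct LabelCache {
    const char *names[MAX_LABELS];
    int count;
} LabelCache;

/* Returns true if the label is already known to the (toy) cache. */
bool label_exists(const LabelCache *cache, const char *label)
{
    assert(cache != NULL && label != NULL);
    for (int i = 0; i < cache->count; i++)
        if (strcmp(cache->names[i], label) == 0)
            return true;
    return false;
}

/* Models the side effect of transforming CREATE: the label becomes known. */
void transform_create(LabelCache *cache, const char *label)
{
    assert(cache != NULL && cache->count < MAX_LABELS);
    if (!label_exists(cache, label))
        cache->names[cache->count++] = label;
}

/* Order described under Root Cause: check the MATCH label first,
 * transform the predecessor (CREATE) afterwards. */
bool match_sees_label_check_first(LabelCache *cache, const char *label)
{
    bool valid = label_exists(cache, label);
    transform_create(cache, label);
    return valid;
}

/* Deferred order: transform the predecessor first, then check the label. */
bool match_sees_label_check_deferred(LabelCache *cache, const char *label)
{
    transform_create(cache, label);
    return label_exists(cache, label);
}
```

In this model, calling match_sees_label_check_first() twice on an empty cache reports the label as invalid the first time and valid the second time, mirroring the first-run and second-run behavior reported above, while the deferred variant sees the label on the first call.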

Fix

Defer the label check when the predecessor chain contains DML:

In transform_cypher_match(): skip the early match_check_valid_label() when clause_chain_has_dml(clause->prev) is true (CREATE, SET, DELETE, or MERGE in the predecessor chain).
In transform_cypher_match_pattern(): after transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If labels are still invalid, generate an empty result via makeBoolConst(false, false).

This preserves existing behavior for MATCH without DML predecessors — e.g., MATCH (a) MATCH (a:invalid_label) still produces the proper "multiple labels" error.
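As a rough sketch of the gating logic, the predecessor-chain walk might look like the following. The real clause_chain_has_dml() helper comes from PR #2340 and its implementation is not shown here; the Clause struct, the ClauseKind enum and both function names below are hypothetical simplifications, assuming only that each clause links to its predecessor via prev and that CREATE, SET, DELETE and MERGE count as DML.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified clause kinds (not AGE's node tags). */
typedef enum ClauseKind {
    CLAUSE_MATCH,
    CLAUSE_WITH,
    CLAUSE_RETURN,
    CLAUSE_CREATE,
    CLAUSE_SET,
    CLAUSE_DELETE,
    CLAUSE_MERGE
} ClauseKind;

/* Hypothetical clause node: each clause points at its predecessor. */
typedef struct Clause {
    ClauseKind kind;
    struct Clause *prev;
} Clause;

/* Walks the predecessor chain and reports whether any clause modifies data. */
bool chain_has_dml(const Clause *clause)
{
    for (; clause != NULL; clause = clause->prev) {
        switch (clause->kind) {
        case CLAUSE_CREATE:
        case CLAUSE_SET:
        case CLAUSE_DELETE:
        case CLAUSE_MERGE:
            return true;
        default:
            break;
        }
    }
    return false;
}

/* The early label check is skipped (deferred) only when a DML clause
 * appears somewhere before this MATCH. */
bool should_defer_label_check(const Clause *match)
{
    assert(match != NULL && match->kind == CLAUSE_MATCH);
    return chain_has_dml(match->prev);
}
```

Under these assumptions, a CREATE ... WITH ... MATCH chain defers the check, while a plain MATCH ... MATCH chain keeps the early check and its existing error behavior.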

EXPLAIN after fix (first execution)

 Nested Loop
   ->  Result
         ->  Custom Scan (Cypher Create)
               ->  Subquery Scan on _age_default_alias_previous_cypher_clause
                     ->  Result
   ->  Seq Scan on "Person" p

Dependencies

This PR is based on PR #2340 (clause_chain_has_dml helper and security_barrier fix for issue #2308).

Regression tests added

Reporter's exact case (CREATE two nodes, WITH, MATCH — expects 2 rows)
Single CREATE + MATCH on brand-new label
CREATE + MATCH when label already exists (second execution)
MATCH on non-existent label without DML predecessor (still returns 0 rows)

All 31 regression tests pass.

AI Disclosure

AI tools (Claude by Anthropic) were used to assist in developing this fix, including root cause analysis, code changes, and regression tests.

When a MATCH clause follows CREATE + WITH and re-uses bound variables (e.g. CREATE (a)-[e]->(b) WITH a,e,b MATCH p=(a)-[e]->(b)), the MATCH generates filter quals (age_start_id(e) = age_id(a), etc.) that reference only columns from the predecessor subquery. PostgreSQL's optimizer pushes these quals through the transparent subquery layers into the CREATE's child plan, where they evaluate on NULL values before CREATE has executed, always yielding 0 rows.

Fix: mark the predecessor subquery RTE as security_barrier when the clause chain contains a data-modifying operation (CREATE, SET, DELETE, or MERGE). This prevents PostgreSQL from pushing filter quals into the subquery, ensuring they evaluate after the DML produces output values.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…che#2193)

When CREATE introduces a new label and a subsequent MATCH references it (e.g., CREATE (:Person) WITH ... MATCH (p:Person)), the query returns 0 rows on first execution but works on the second.

Root cause: match_check_valid_label() in transform_cypher_match() runs before transform_prev_cypher_clause() processes the predecessor chain. Since CREATE has not yet executed its transform (which creates the label table as a side effect), the label is not in the cache and the check generates a One-Time Filter: false plan that returns no rows.

Fix: Skip the early label validity check when the predecessor clause chain contains a data-modifying operation (CREATE, SET, DELETE, MERGE). After transform_prev_cypher_clause() completes and any new labels exist in the cache, run a deferred label check. If the labels are still invalid at that point, generate an empty result via makeBoolConst(false).

This preserves the existing behavior for MATCH without DML predecessors (e.g., MATCH-MATCH chains still get the early check and proper error messages for invalid labels).

Depends on: PR apache#2340 (clause_chain_has_dml helper)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jrgemignani · 2026-02-27T17:16:41Z

@gregfelice Dependencies?

Dependencies
This PR is based on PR #2340 (clause_chain_has_dml helper and security_barrier fix for issue #2308).

Copilot

Pull request overview

Fixes an execution-time planning bug where MATCH immediately following CREATE ... WITH ... on a brand-new label returns 0 rows on first execution due to label-cache validation running before predecessor clause transformation.

Changes:

Defer MATCH label validity checking when the predecessor clause chain contains DML, then re-check after transform_prev_cypher_clause() completes.
Add clause_chain_has_dml() helper and use it to gate early label checks and to protect predecessor subqueries from qual pushdown (security barrier).
Add/extend regression tests and expected outputs for issues #2193 and #2308.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File	Description
src/backend/parser/cypher_clause.c	Defers label validation for MATCH when DML predecessors exist; adds clause-chain DML detection helper; updates MATCH pattern transformation behavior.
regress/sql/cypher_match.sql	Adds regression coverage for the reported CREATE/WITH/MATCH cases and related scenarios.
regress/expected/cypher_match.out	Updates expected output to match new regression tests.

Copilot · 2026-02-27T17:22:13Z

src/backend/parser/cypher_clause.c

+        if (valid_labels)
+        {
+            transform_match_pattern(cpstate, query, self->pattern, where);
+        }
+        else
+        {
+            query->rtable = pstate->p_rtable;
+            query->rteperminfos = pstate->p_rteperminfos;
+            query->jointree = makeFromExpr(pstate->p_joinlist,
+                                           makeBoolConst(false, false));
+        }


When deferred label validation fails (valid_labels == false), the code skips transform_match_pattern() and builds a query that only projects variables from the previous clause. This can break later clauses that reference variables introduced by this MATCH pattern (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p) because p will never be added to the targetList/namespace. Consider still transforming the MATCH pattern to register any new variables/target entries, and then force an empty result by applying a constant-false qual (or equivalent) after the pattern has been transformed.
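The concern in this comment can be illustrated with a toy model. Everything below is hypothetical: ToyQuery, register_var() and the two build_empty_* functions are invented for illustration and do not correspond to PostgreSQL's Query or ParseState structures. The sketch assumes only what the comment claims, namely that transforming the MATCH pattern is what registers the variables it introduces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_VARS 8

/* Hypothetical stand-in for a query under construction: the variables
 * visible to later clauses, plus a flag modeling a constant-false qual. */
typedef struct ToyQuery {
    const char *vars[MAX_VARS];
    int nvars;
    bool always_empty;
} ToyQuery;

/* Makes a variable visible to later clauses. */
void register_var(ToyQuery *q, const char *name)
{
    assert(q != NULL && name != NULL && q->nvars < MAX_VARS);
    q->vars[q->nvars++] = name;
}

/* Reports whether a later clause (e.g. RETURN) could resolve the name. */
bool var_is_visible(const ToyQuery *q, const char *name)
{
    assert(q != NULL && name != NULL);
    for (int i = 0; i < q->nvars; i++)
        if (strcmp(q->vars[i], name) == 0)
            return true;
    return false;
}

/* Models the reviewed branch: the pattern transform is skipped entirely,
 * so the MATCH variable is never registered. */
void build_empty_skipping_pattern(ToyQuery *q, const char *match_var)
{
    (void) match_var;          /* intentionally unused in this path */
    q->always_empty = true;    /* constant-false qual */
}

/* Models the suggested alternative: transform the pattern first so the
 * variable is registered, then force an empty result. */
void build_empty_after_pattern(ToyQuery *q, const char *match_var)
{
    register_var(q, match_var);
    q->always_empty = true;    /* constant-false qual applied afterwards */
}
```

In this model, a later RETURN p can only resolve p on the second path; whether the real transform behaves the same way depends on details of transform_match_pattern() that are not shown in this thread.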

Copilot · 2026-02-27T17:22:14Z

regress/sql/cypher_match.sql

+SELECT * FROM cypher('issue_2193', $$
+    CREATE (a:Person {name: 'Jane', livesIn: 'London'}),
+           (b:Person {name: 'Tom', livesIn: 'Copenhagen'})
+    WITH a, b
+    MATCH (p:Person)
+    RETURN p.name
+$$) AS (result agtype);
+
+-- Single CREATE + MATCH on brand-new label
+SELECT * FROM cypher('issue_2193', $$
+    CREATE (a:City {name: 'Berlin'})
+    WITH a
+    MATCH (c:City)
+    RETURN c.name
+$$) AS (result agtype);
+
+-- MATCH on a label that now exists (second execution) still works
+SELECT * FROM cypher('issue_2193', $$
+    CREATE (a:City {name: 'Paris'})
+    WITH a
+    MATCH (c:City)
+    RETURN c.name
+$$) AS (result agtype);


The regression tests added for issue_2193 assert a specific row order (e.g., Jane then Tom; Berlin then Paris) but the Cypher queries don't include an ORDER BY. Without ordering, result order can vary with plan changes (seq scan order, optimizer choices), making the test output potentially flaky. Consider adding an ORDER BY on the returned expression (or changing assertions to use counts/sets) to make the expected output deterministic.

Copilot · 2026-02-27T17:22:14Z

regress/sql/cypher_match.sql

+    MATCH (x:NonExistentLabel)
+    RETURN x
+$$) AS (result agtype);
+


The new label-deferral logic introduces a new branch for “DML predecessor + invalid label”. There’s no regression test covering a query that (1) has a DML predecessor and (2) references a non-existent label in MATCH while still returning a MATCH-introduced variable (e.g., CREATE ... WITH a MATCH (p:NoSuchLabel) RETURN p). Adding a test for this case would help catch planner/namespace regressions in the deferred-label path.

Suggested change

-- MATCH on non-existent label after DML predecessor still returns 0 rows
SELECT * FROM cypher('issue_2193', $$
    CREATE (a:Person {name: 'Alice'})
    WITH a
    MATCH (p:NonExistentLabel)
    RETURN p
$$) AS (result agtype);

jrgemignani

We may need to rebase this after #2340

jrgemignani · 2026-02-27T17:28:17Z

@gregfelice Please see Copilot's comments. Additionally, this PR likely needs to be rebased after #2340 has been merged.

gregfelice and others added 2 commits February 25, 2026 22:20

gregfelice mentioned this pull request Feb 26, 2026

create … with … The execution plan of match is abnormal #2193

Open

jrgemignani approved these changes Feb 27, 2026

jrgemignani requested a review from Copilot February 27, 2026 17:12

Copilot started reviewing on behalf of jrgemignani February 27, 2026 17:16

Copilot AI reviewed Feb 27, 2026

jrgemignani requested changes Feb 27, 2026

gregfelice wants to merge 2 commits into apache:master from gregfelice:fix_2193_match_label_cache_timing
