Add MERGE ON CREATE SET / ON MATCH SET support (#1619) #2347

Open
gregfelice wants to merge 1 commit into apache:master from gregfelice:feature_1619_merge_on_set

@gregfelice
Contributor

Summary

Implements the openCypher-standard ON CREATE SET and ON MATCH SET clauses for the MERGE statement, resolving #1619. This allows conditional property updates depending on whether MERGE created a new path or matched an existing one:

MERGE (n:Person {name: 'Alice'})
  ON CREATE SET n.created = timestamp()
  ON MATCH SET n.updated = timestamp()
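
The create-versus-match semantics can be illustrated with a small, self-contained C model. This is only a sketch under stated assumptions: ToyGraph, ToyNode, ToySetAction, and toy_merge are hypothetical names invented for illustration and are not part of AGE or this PR. The model finds a node by name or creates it, then runs exactly one of the two actions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_NODES 16

/* Hypothetical in-memory node; not an AGE structure. */
typedef struct {
    char name[32];
    long created;   /* 0 means "unset" */
    long updated;   /* 0 means "unset" */
} ToyNode;

typedef struct {
    ToyNode nodes[MAX_NODES];
    int count;
} ToyGraph;

/* Callback standing in for a list of SET items. */
typedef void (*ToySetAction)(ToyNode *node, long now);

static void set_created(ToyNode *node, long now) { node->created = now; }
static void set_updated(ToyNode *node, long now) { node->updated = now; }

/*
 * Find a node by name or create it, then run exactly one of the two
 * actions. Returns true when a node was created, false when matched.
 */
static bool toy_merge(ToyGraph *g, const char *name, long now,
                      ToySetAction on_create, ToySetAction on_match,
                      ToyNode **out)
{
    for (int i = 0; i < g->count; i++) {
        if (strcmp(g->nodes[i].name, name) == 0) {
            /* Matched an existing node: only the ON MATCH action runs. */
            if (on_match != NULL)
                on_match(&g->nodes[i], now);
            *out = &g->nodes[i];
            return false;
        }
    }

    /* No match: create the node, then run only the ON CREATE action. */
    assert(g->count < MAX_NODES);
    ToyNode *n = &g->nodes[g->count++];
    memset(n, 0, sizeof(*n));
    strncpy(n->name, name, sizeof(n->name) - 1);
    if (on_create != NULL)
        on_create(n, now);
    *out = n;
    return true;
}
```

In this model, calling toy_merge twice with the same name sets created on the first call and updated on the second, leaving created untouched.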

Design

The implementation spans parser, planner, and executor:

Parser: New grammar rules (merge_actions_opt, merge_actions, merge_action) in cypher_gram.y. The ON keyword is added to cypher_kwlist.h.

Nodes: on_match / on_create lists on cypher_merge, corresponding on_match_set_info / on_create_set_info on cypher_merge_information, and prop_expr on cypher_update_item. All fields are serialized through the copy/out/read funcs.

Transform: cypher_clause.c transforms ON SET items and stores prop_expr for direct expression evaluation.

Executor: apply_update_list() is extracted from process_update_list() in cypher_set.c as reusable SET logic. cypher_merge.c calls it at all merge decision points: simple merge, terminal, non-terminal with eager buffering, and first-clause-with-followers.

Why prop_expr?

The PostgreSQL planner strips target list entries for SET expressions that the CustomScan doesn't reference. This makes prop_position references into the scan tuple dangling. The solution: store the Expr* directly in cypher_update_item->prop_expr and evaluate it with ExecInitExpr / ExecEvalExpr, independent of scan tuple layout. This is only done for MERGE ON SET items — regular SET continues to use prop_position unchanged.
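
The difference between a positional reference and a stored expression can be sketched with a toy model in C. Everything here is an assumption made for illustration: ToyTuple, ToyUpdateItem, ToyExprFn, and toy_resolve are hypothetical and do not mirror the actual AGE structures or the PostgreSQL executor API. In particular, the model reports a stripped column through a bounds check, whereas a real dangling prop_position would not fail this cleanly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_COLS 4

/* Hypothetical scan tuple: a fixed array of integer columns. */
typedef struct {
    int values[TOY_MAX_COLS];
    int ncols;
} ToyTuple;

/* Hypothetical "expression": a function that computes the value. */
typedef int (*ToyExprFn)(void);

/* Hypothetical update item carrying both ways to find the value. */
typedef struct {
    int prop_position;    /* 1-based column index into the tuple */
    ToyExprFn prop_expr;  /* stored expression, or NULL if unused */
} ToyUpdateItem;

static int forty_two(void) { return 42; }

/*
 * Resolve the new property value. A stored expression is evaluated
 * directly and ignores the tuple layout; otherwise the value is read
 * by position, which fails if that column is no longer present.
 */
static bool toy_resolve(const ToyUpdateItem *item, const ToyTuple *tuple,
                        int *out)
{
    assert(item != NULL && tuple != NULL && out != NULL);

    if (item->prop_expr != NULL) {
        *out = item->prop_expr();
        return true;
    }
    if (item->prop_position >= 1 && item->prop_position <= tuple->ncols) {
        *out = tuple->values[item->prop_position - 1];
        return true;
    }
    return false; /* the position points past the stripped tuple */
}
```

With a three-column tuple, an item with prop_position 3 resolves by position. Once the third column is removed, only the item that carries an expression still resolves.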

Files changed (12)

| File | Changes |
|------|---------|
| src/include/parser/cypher_kwlist.h | Added ON keyword |
| src/backend/parser/cypher_gram.y | Grammar rules for merge actions |
| src/include/nodes/cypher_nodes.h | Node struct fields for on_match/on_create |
| src/backend/nodes/cypher_copyfuncs.c | Serialization for new fields |
| src/backend/nodes/cypher_outfuncs.c | Serialization for new fields |
| src/backend/nodes/cypher_readfuncs.c | Deserialization for new fields |
| src/backend/parser/cypher_clause.c | Transform ON MATCH/CREATE SET items |
| src/include/executor/cypher_utils.h | State fields + apply_update_list declaration |
| src/backend/executor/cypher_set.c | Extracted apply_update_list() from process_update_list() |
| src/backend/executor/cypher_merge.c | Wired apply_update_list at all merge decision points |
| regress/sql/cypher_merge.sql | Regression tests |
| regress/expected/cypher_merge.out | Expected output |

Test plan

  • All 31 existing regression tests pass (no regressions)
  • New tests cover: basic ON CREATE SET, basic ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items in a single clause, expression evaluation, interaction with WITH clause, edge property updates
  • Verified across all merge execution paths including non-terminal eager buffering (Fix chained MERGE not seeing sibling MERGE's changes (#1446) #2344)
  • Backward compatible — existing MERGE queries without ON SET clauses are unaffected

Closes #1619

Implements the openCypher-standard ON CREATE SET and ON MATCH SET
clauses for the MERGE statement. This allows conditional property
updates depending on whether MERGE created a new path or matched
an existing one:

  MERGE (n:Person {name: 'Alice'})
    ON CREATE SET n.created = timestamp()
    ON MATCH SET n.updated = timestamp()

Implementation spans parser, planner, and executor:

- Grammar: new merge_actions_opt/merge_actions/merge_action rules
  in cypher_gram.y, with ON keyword added to cypher_kwlist.h
- Nodes: on_match/on_create lists on cypher_merge, corresponding
  on_match_set_info/on_create_set_info on cypher_merge_information,
  and prop_expr on cypher_update_item (all serialized through
  copy/out/read funcs)
- Transform: cypher_clause.c transforms ON SET items and stores
  prop_expr for direct expression evaluation
- Executor: cypher_set.c extracts apply_update_list() from
  process_update_list(); cypher_merge.c calls it at all merge
  decision points (simple merge, terminal, non-terminal with
  eager buffering, and first-clause-with-followers paths)

Key design choice: prop_expr stores the Expr* directly in
cypher_update_item rather than using prop_position into the scan
tuple. The planner strips target list entries for SET expressions
that CustomScan doesn't need, making prop_position references
dangling. By storing the expression directly (only for MERGE ON
SET items), we evaluate it with ExecInitExpr/ExecEvalExpr
independent of the scan tuple layout.

Includes regression tests covering: basic ON CREATE SET, basic
ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items,
expression evaluation, interaction with WITH clause, and edge
property updates.

All 31 regression tests pass.
@gregfelice
Contributor Author

Friendly ping — this PR adds MERGE ON CREATE SET / ON MATCH SET support (issue #1619), one of the most requested Cypher features for AGE. This is critical for users migrating from Kuzu (recently archived) and Neo4j. The implementation adds grammar rules, executor support, and full regression test coverage. Would really appreciate a review when someone has bandwidth. Thanks!

Copilot AI left a comment

Pull request overview

This PR implements the ON CREATE SET and ON MATCH SET sub-clauses for MERGE statements (openCypher-standard feature, issue #1619). These allow conditional property updates depending on whether MERGE created a new path or matched an existing one.

Changes:

  • New grammar rules (ON keyword, merge_actions_opt/actions/action rules), new node fields (on_match/on_create on cypher_merge; on_match_set_info/on_create_set_info on cypher_merge_information; prop_expr on cypher_update_item) with full serialization support
  • Extracted shared apply_update_list() from the SET executor and wired it into all four MERGE execution paths (simple merge, terminal non-first clause, non-terminal eager-buffering path, first-clause-with-followers)
  • Regression tests covering basic ON CREATE SET, ON MATCH SET, combined clauses, multiple items, reverse ordering, duplicate clause error detection, and edge property updates

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

| File | Description |
|------|-------------|
| src/include/parser/cypher_kwlist.h | Adds on as a RESERVED_KEYWORD |
| src/backend/parser/cypher_gram.y | Adds ON token and merge_actions_opt/merge_actions/merge_action grammar rules |
| src/include/nodes/cypher_nodes.h | Adds on_match/on_create to cypher_merge; on_match_set_info/on_create_set_info to cypher_merge_information; prop_expr to cypher_update_item |
| src/backend/nodes/cypher_copyfuncs.c | Copies new prop_expr, on_match_set_info, on_create_set_info fields |
| src/backend/nodes/cypher_outfuncs.c | Serializes new fields; fixes wrong comment (cypher_delete → cypher_merge) |
| src/backend/nodes/cypher_readfuncs.c | Deserializes new prop_expr, on_match_set_info, on_create_set_info fields |
| src/backend/parser/cypher_clause.c | Transforms ON MATCH/CREATE SET item lists and stores prop_expr for direct evaluation |
| src/include/executor/cypher_utils.h | Declares apply_update_list(); adds on_match_set_info/on_create_set_info to scan state |
| src/backend/executor/cypher_set.c | Extracts apply_update_list() from process_update_list(); adds prop_expr-based direct evaluation path |
| src/backend/executor/cypher_merge.c | Calls apply_update_list() at all four merge decision points; reorders mark_tts_isnull/ExecStoreVirtualTuple for correctness |
| regress/sql/cypher_merge.sql | Adds regression tests for ON CREATE SET, ON MATCH SET, errors, and cleanup |
| regress/expected/cypher_merge.out | Expected output for new test cases |

Comment on lines +507 to +509
  expr_state = ExecInitExpr((Expr *)update_item->prop_expr,
                            (PlanState *)node);
  val = ExecEvalExpr(expr_state, econtext, &isnull);

Copilot AI Mar 2, 2026

ExecInitExpr is called on every invocation of apply_update_list (once per row in the non-terminal MERGE path, lines 640-720 in cypher_merge.c). Expression state initialization should happen once at plan initialization time (in create_cypher_merge_plan_state), not per row.

The established pattern in this codebase is to call ExecInitExpr once during the plan state setup (see cypher_merge.c:175-185 and cypher_create.c:115-125). Pre-initialized ExprState objects should be stored alongside cypher_update_item (e.g., a parallel prop_expr_state field) or in the cypher_merge_custom_scan_state. Calling ExecInitExpr on each row incurs unnecessary memory allocation and state-building overhead that grows linearly with the number of matched or created entities.
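
The cost difference described above can be sketched with a toy model in C. This is an illustration under stated assumptions, not the PostgreSQL API: ToyExpr, ToyExprState, toy_init_expr, and toy_eval_expr are hypothetical stand-ins, and the toy_init_calls counter exists only to make the number of setup calls visible.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT PostgreSQL types. */
typedef struct {
    int multiplier;          /* the "expression": row value * multiplier */
} ToyExpr;

typedef struct {
    const ToyExpr *expr;     /* prepared state pointing at the expression */
} ToyExprState;

/* Counts how often the (notionally expensive) setup runs. */
static int toy_init_calls = 0;

/* Stands in for the one-time cost of building expression state. */
static void toy_init_expr(ToyExprState *state, const ToyExpr *expr)
{
    assert(state != NULL && expr != NULL);
    state->expr = expr;
    toy_init_calls++;
}

static int toy_eval_expr(const ToyExprState *state, int row_value)
{
    return row_value * state->expr->multiplier;
}

/* Setup inside the loop: the init count grows with the row count. */
static long sum_init_per_row(const ToyExpr *expr, const int *rows, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        ToyExprState state;
        toy_init_expr(&state, expr);
        total += toy_eval_expr(&state, rows[i]);
    }
    return total;
}

/* Setup hoisted out of the loop: one init regardless of row count. */
static long sum_init_once(const ToyExpr *expr, const int *rows, size_t n)
{
    ToyExprState state;
    long total = 0;
    toy_init_expr(&state, expr);
    for (size_t i = 0; i < n; i++)
        total += toy_eval_expr(&state, rows[i]);
    return total;
}
```

Both functions compute the same total, but for n rows the first performs n setup calls and the second performs one, which is the shape of the change the review comment asks for.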

Comment on lines +906 to +913
-- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
SELECT * FROM cypher('merge_actions', $$
MATCH (a:Person {name: 'Alice'})
MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
ON CREATE SET b.source = 'merge_create'
RETURN a.name, b.name, b.source
$$) AS (a agtype, b agtype, source agtype);


Copilot AI Mar 2, 2026

The test labeled "ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)" only tests ON CREATE SET in Case 1 (MERGE with a predecessor clause). There is no test that verifies ON MATCH SET behavior in Case 1 — specifically, running a MATCH ... MERGE ... ON MATCH SET query where the path already exists and the ON MATCH SET clause fires. This execution path (lines 682-685 and 704-708 in cypher_merge.c) is untested.

Suggested change
- -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
- SELECT * FROM cypher('merge_actions', $$
- MATCH (a:Person {name: 'Alice'})
- MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
- ON CREATE SET b.source = 'merge_create'
- RETURN a.name, b.name, b.source
- $$) AS (a agtype, b agtype, source agtype);
+ -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor, first run = create)
+ SELECT * FROM cypher('merge_actions', $$
+ MATCH (a:Person {name: 'Alice'})
+ MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
+ ON CREATE SET b.source = 'merge_create'
+ ON MATCH SET b.source = 'merge_match'
+ RETURN a.name, b.name, b.source
+ $$) AS (a agtype, b agtype, source agtype);
+ -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor, second run = match)
+ SELECT * FROM cypher('merge_actions', $$
+ MATCH (a:Person {name: 'Alice'})
+ MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
+ ON CREATE SET b.source = 'merge_create'
+ ON MATCH SET b.source = 'merge_match'
+ RETURN a.name, b.name, b.source
+ $$) AS (a agtype, b agtype, source agtype);

RETURN n.name, n.created, n.matched
$$) AS (name agtype, created agtype, matched agtype);

-- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)

Copilot AI Mar 2, 2026

The comment says "ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)" but the test only uses ON CREATE SET. The comment is misleading — it should say "ON CREATE SET with MERGE after MATCH" or the test should also cover the ON MATCH SET path by re-running the same MERGE query so that the path is found on the second run, triggering ON MATCH SET.

Suggested change
- -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
+ -- ON CREATE SET with MERGE after MATCH (Case 1: has predecessor)

Development

Successfully merging this pull request may close these issues.

Can we add support for ON CREATE SET and ON MATCH SET. (like neo4j)

2 participants