Add MERGE ON CREATE SET / ON MATCH SET support (#1619) #2347
gregfelice wants to merge 1 commit into apache:master
Implements the openCypher-standard ON CREATE SET and ON MATCH SET
clauses for the MERGE statement. This allows conditional property
updates depending on whether MERGE created a new path or matched
an existing one:
MERGE (n:Person {name: 'Alice'})
ON CREATE SET n.created = timestamp()
ON MATCH SET n.updated = timestamp()
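The two branches can be illustrated with a small in-memory model. This is only a sketch of the semantics, not AGE code: `ToyGraph`, `ToyNode`, and `toy_merge` are hypothetical names invented for the illustration, and the `now` argument stands in for `timestamp()`.

```c
/* Toy model of MERGE with ON CREATE SET / ON MATCH SET.
 * Every name here is hypothetical; none of this is AGE code. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_MAX_NODES 16

typedef struct ToyNode {
    char name[32];
    long created;   /* written only when the merge creates the node */
    long updated;   /* written only when the merge matches the node */
} ToyNode;

typedef struct ToyGraph {
    ToyNode nodes[TOY_MAX_NODES];
    int count;
} ToyGraph;

/* Find a node by name or create it; *was_created reports which branch ran. */
ToyNode *toy_merge(ToyGraph *g, const char *name, long now, bool *was_created)
{
    for (int i = 0; i < g->count; i++) {
        if (strcmp(g->nodes[i].name, name) == 0) {
            g->nodes[i].updated = now;   /* ON MATCH SET n.updated = ... */
            *was_created = false;
            return &g->nodes[i];
        }
    }

    /* No match: create the node, then apply the ON CREATE action. */
    assert(g->count < TOY_MAX_NODES);
    assert(strlen(name) < sizeof(g->nodes[0].name));
    ToyNode *n = &g->nodes[g->count++];
    strcpy(n->name, name);
    n->created = now;                    /* ON CREATE SET n.created = ... */
    n->updated = 0;
    *was_created = true;
    return n;
}
```

Calling `toy_merge` twice with the same name on a zero-initialized `ToyGraph` takes the create branch first and the match branch second, so `created` keeps its first value while `updated` changes only on the second call.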
Implementation spans parser, planner, and executor:
- Grammar: new merge_actions_opt/merge_actions/merge_action rules
in cypher_gram.y, with ON keyword added to cypher_kwlist.h
- Nodes: on_match/on_create lists on cypher_merge, corresponding
on_match_set_info/on_create_set_info on cypher_merge_information,
and prop_expr on cypher_update_item (all serialized through
copy/out/read funcs)
- Transform: cypher_clause.c transforms ON SET items and stores
prop_expr for direct expression evaluation
- Executor: cypher_set.c extracts apply_update_list() from
process_update_list(); cypher_merge.c calls it at all merge
decision points (simple merge, terminal, non-terminal with
eager buffering, and first-clause-with-followers paths)
Key design choice: prop_expr stores the Expr* directly in
cypher_update_item rather than using prop_position into the scan
tuple. The planner strips target list entries for SET expressions
that CustomScan doesn't need, making prop_position references
dangling. By storing the expression directly (only for MERGE ON
SET items), we evaluate it with ExecInitExpr/ExecEvalExpr
independent of the scan tuple layout.
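The difference between a positional reference and a stored expression can be sketched with a toy model. Everything below is an assumption made for illustration: `ToyTuple`, `ToyUpdateItem`, and `toy_eval_item` are hypothetical, the position is a plain 0-based array index, and a function pointer stands in for the stored `Expr*`. It does not use or reproduce the PostgreSQL or AGE APIs.

```c
/* Toy model: positional lookup vs. directly stored expression.
 * Hypothetical types; not the PostgreSQL or AGE data structures. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ToyTuple {
    const long *values;
    int nvalues;
} ToyTuple;

typedef long (*ToyExprFn)(const ToyTuple *tuple);

typedef struct ToyUpdateItem {
    int prop_position;    /* 0-based index into the toy tuple */
    ToyExprFn prop_expr;  /* stored expression; NULL means "use the position" */
} ToyUpdateItem;

/* Returns false when the positional reference no longer points into the tuple. */
bool toy_eval_item(const ToyUpdateItem *item, const ToyTuple *tuple, long *out)
{
    assert(out != NULL);

    if (item->prop_expr != NULL) {
        /* Stored expression: does not depend on the tuple layout. */
        *out = item->prop_expr(tuple);
        return true;
    }

    if (item->prop_position < 0 || item->prop_position >= tuple->nvalues)
        return false;     /* the entry was pruned: the position dangles */

    *out = tuple->values[item->prop_position];
    return true;
}

/* A constant expression that ignores the tuple entirely. */
long toy_const_42(const ToyTuple *tuple)
{
    (void) tuple;
    return 42;
}
```

In this model, an item with only `prop_position = 1` evaluates correctly against a two-column tuple but fails once the second column is pruned, while an item carrying `prop_expr` still evaluates against the pruned tuple.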
Includes regression tests covering: basic ON CREATE SET, basic
ON MATCH SET, combined ON CREATE + ON MATCH, multiple SET items,
expression evaluation, interaction with WITH clause, and edge
property updates.
All 31 regression tests pass.
Friendly ping: this PR adds MERGE ON CREATE SET / ON MATCH SET support (issue #1619), one of the most requested Cypher features for AGE. This is critical for users migrating from Kuzu (recently archived) and Neo4j. The implementation adds grammar rules, executor support, and full regression test coverage. Would really appreciate a review when someone has bandwidth. Thanks!
Pull request overview
This PR implements the ON CREATE SET and ON MATCH SET sub-clauses for MERGE statements (openCypher-standard feature, issue #1619). These allow conditional property updates depending on whether MERGE created a new path or matched an existing one.
Changes:
- New grammar rules (ON keyword, merge_actions_opt/actions/action rules), new node fields (on_match/on_create on cypher_merge; on_match_set_info/on_create_set_info on cypher_merge_information; prop_expr on cypher_update_item) with full serialization support
- Extracted shared apply_update_list() from the SET executor and wired it into all four MERGE execution paths (simple merge, terminal non-first clause, non-terminal eager-buffering path, first-clause-with-followers)
- Regression tests covering basic ON CREATE SET, ON MATCH SET, combined clauses, multiple items, reverse ordering, duplicate clause error detection, and edge property updates
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/include/parser/cypher_kwlist.h | Adds on as a RESERVED_KEYWORD |
| src/backend/parser/cypher_gram.y | Adds ON token and merge_actions_opt/merge_actions/merge_action grammar rules |
| src/include/nodes/cypher_nodes.h | Adds on_match/on_create to cypher_merge; on_match_set_info/on_create_set_info to cypher_merge_information; prop_expr to cypher_update_item |
| src/backend/nodes/cypher_copyfuncs.c | Copies new prop_expr, on_match_set_info, on_create_set_info fields |
| src/backend/nodes/cypher_outfuncs.c | Serializes new fields; fixes wrong comment (cypher_delete → cypher_merge) |
| src/backend/nodes/cypher_readfuncs.c | Deserializes new prop_expr, on_match_set_info, on_create_set_info fields |
| src/backend/parser/cypher_clause.c | Transforms ON MATCH/CREATE SET item lists and stores prop_expr for direct evaluation |
| src/include/executor/cypher_utils.h | Declares apply_update_list(); adds on_match_set_info/on_create_set_info to scan state |
| src/backend/executor/cypher_set.c | Extracts apply_update_list() from process_update_list(); adds prop_expr-based direct evaluation path |
| src/backend/executor/cypher_merge.c | Calls apply_update_list() at all four merge decision points; reorders mark_tts_isnull/ExecStoreVirtualTuple for correctness |
| regress/sql/cypher_merge.sql | Adds regression tests for ON CREATE SET, ON MATCH SET, errors, and cleanup |
| regress/expected/cypher_merge.out | Expected output for new test cases |
    expr_state = ExecInitExpr((Expr *)update_item->prop_expr,
                              (PlanState *)node);
    val = ExecEvalExpr(expr_state, econtext, &isnull);
ExecInitExpr is called on every invocation of apply_update_list (once per row in the non-terminal MERGE path, lines 640-720 in cypher_merge.c). Expression state initialization should happen once at plan initialization time (in create_cypher_merge_plan_state), not per row.
The established pattern in this codebase is to call ExecInitExpr once during the plan state setup (see cypher_merge.c:175-185 and cypher_create.c:115-125). Pre-initialized ExprState objects should be stored alongside cypher_update_item (e.g., a parallel prop_expr_state field) or in the cypher_merge_custom_scan_state. Calling ExecInitExpr on each row incurs unnecessary memory allocation and state-building overhead that grows linearly with the number of matched or created entities.
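The init-once pattern suggested here can be sketched with a toy model. This is not PostgreSQL or AGE code: `ToyExprState`, `toy_init_expr`, `toy_begin`, and `toy_apply_row` are hypothetical stand-ins for the expression state, `ExecInitExpr`, plan-state setup, and the per-row update path. The `prop_expr_state` field follows the reviewer's naming suggestion, and a counter makes the number of initializations visible.

```c
/* Toy model: initialize expression state once at setup, evaluate per row.
 * Hypothetical names throughout; not the PostgreSQL executor API. */
#include <assert.h>
#include <stddef.h>

typedef struct ToyExprState {
    long constant;                  /* "compiled" form of the expression */
} ToyExprState;

typedef struct ToyUpdateItem {
    long prop_expr;                 /* stands in for the stored Expr* */
    ToyExprState *prop_expr_state;  /* cached state, NULL until setup runs */
    ToyExprState storage;           /* backing memory for the cached state */
} ToyUpdateItem;

int toy_init_calls = 0;             /* counts how often state is built */

/* Stand-in for the expensive state-building step. */
ToyExprState *toy_init_expr(ToyUpdateItem *item)
{
    toy_init_calls++;
    item->storage.constant = item->prop_expr;
    return &item->storage;
}

/* Setup: runs once per plan, building state for every update item. */
void toy_begin(ToyUpdateItem *items, int nitems)
{
    for (int i = 0; i < nitems; i++)
        items[i].prop_expr_state = toy_init_expr(&items[i]);
}

/* Per-row path: only evaluates the cached state, never rebuilds it. */
long toy_apply_row(const ToyUpdateItem *items, int nitems)
{
    long sum = 0;
    for (int i = 0; i < nitems; i++) {
        assert(items[i].prop_expr_state != NULL);
        sum += items[i].prop_expr_state->constant;
    }
    return sum;
}
```

With two update items and any number of rows, `toy_init_calls` stays at two after `toy_begin`, whereas calling `toy_init_expr` inside `toy_apply_row` would make it grow with the row count.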
    -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
    SELECT * FROM cypher('merge_actions', $$
    MATCH (a:Person {name: 'Alice'})
    MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
    ON CREATE SET b.source = 'merge_create'
    RETURN a.name, b.name, b.source
    $$) AS (a agtype, b agtype, source agtype);
The test labeled "ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)" only tests ON CREATE SET in Case 1 (MERGE with a predecessor clause). There is no test that verifies ON MATCH SET behavior in Case 1 — specifically, running a MATCH ... MERGE ... ON MATCH SET query where the path already exists and the ON MATCH SET clause fires. This execution path (lines 682-685 and 704-708 in cypher_merge.c) is untested.
Suggested change:
    - -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
    - SELECT * FROM cypher('merge_actions', $$
    - MATCH (a:Person {name: 'Alice'})
    - MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
    - ON CREATE SET b.source = 'merge_create'
    - RETURN a.name, b.name, b.source
    - $$) AS (a agtype, b agtype, source agtype);
    + -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor, first run = create)
    + SELECT * FROM cypher('merge_actions', $$
    + MATCH (a:Person {name: 'Alice'})
    + MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
    + ON CREATE SET b.source = 'merge_create'
    + ON MATCH SET b.source = 'merge_match'
    + RETURN a.name, b.name, b.source
    + $$) AS (a agtype, b agtype, source agtype);
    + -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor, second run = match)
    + SELECT * FROM cypher('merge_actions', $$
    + MATCH (a:Person {name: 'Alice'})
    + MERGE (a)-[:KNOWS]->(b:Person {name: 'Charlie'})
    + ON CREATE SET b.source = 'merge_create'
    + ON MATCH SET b.source = 'merge_match'
    + RETURN a.name, b.name, b.source
    + $$) AS (a agtype, b agtype, source agtype);
    RETURN n.name, n.created, n.matched
    $$) AS (name agtype, created agtype, matched agtype);

    -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
The comment says "ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)" but the test only uses ON CREATE SET. The comment is misleading — it should say "ON CREATE SET with MERGE after MATCH" or the test should also cover the ON MATCH SET path by re-running the same MERGE query so that the path is found on the second run, triggering ON MATCH SET.
Suggested change:
    - -- ON MATCH SET with MERGE after MATCH (Case 1: has predecessor)
    + -- ON CREATE SET with MERGE after MATCH (Case 1: has predecessor)
Summary
Implements the openCypher-standard ON CREATE SET and ON MATCH SET clauses for the MERGE statement, resolving #1619. This allows conditional property updates depending on whether MERGE created a new path or matched an existing one.

Design
The implementation spans parser, planner, and executor:
- Parser: New grammar rules (merge_actions_opt, merge_actions, merge_action) in cypher_gram.y. The ON keyword is added to cypher_kwlist.h.
- Nodes: on_match/on_create lists on cypher_merge, corresponding on_match_set_info/on_create_set_info on cypher_merge_information, and prop_expr on cypher_update_item. All fields serialized through copy/out/read funcs.
- Transform: cypher_clause.c transforms ON SET items and stores prop_expr for direct expression evaluation.
- Executor: apply_update_list() is extracted from process_update_list() in cypher_set.c as reusable SET logic. cypher_merge.c calls it at all merge decision points.

Why prop_expr?
The PostgreSQL planner strips target list entries for SET expressions that the CustomScan doesn't reference. This makes prop_position references into the scan tuple dangling. The solution: store the Expr* directly in cypher_update_item->prop_expr and evaluate it with ExecInitExpr/ExecEvalExpr, independent of scan tuple layout. This is only done for MERGE ON SET items; regular SET continues to use prop_position unchanged.

Files changed (12)
- src/include/parser/cypher_kwlist.h: ON keyword
- src/backend/parser/cypher_gram.y
- src/include/nodes/cypher_nodes.h
- src/backend/nodes/cypher_copyfuncs.c
- src/backend/nodes/cypher_outfuncs.c
- src/backend/nodes/cypher_readfuncs.c
- src/backend/parser/cypher_clause.c
- src/include/executor/cypher_utils.h: apply_update_list declaration
- src/backend/executor/cypher_set.c: apply_update_list() from process_update_list()
- src/backend/executor/cypher_merge.c: apply_update_list at all merge decision points
- regress/sql/cypher_merge.sql
- regress/expected/cypher_merge.out

Test plan
Closes #1619