DataFrame(v3): adopt nested row to separate system and user properties #290

@zipdoki

Summary

DataFrame(v3) currently represents each edge as a flat row where system properties (version, source, target, direction, ...) and user properties share a single namespace. When a user property has the same name as a system property, it overwrites the system property at row construction time, and any downstream code that reads the system property with its declared type fails because the stored value now has the user property's type.

A short-term workaround landed in #283: colliding property keys are wrapped in backticks (`version`) inside the row and unwrapped at the EdgePayload boundary. This unblocks the immediate breakage but does not address the underlying row design.
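For readers who have not seen #283, the workaround can be pictured roughly as follows. This is a minimal sketch, not the code from that change: `SYSTEM_KEYS`, `escapeKey`, and `unescapeKey` are hypothetical names, and the key set contains only the four system properties named in this issue (the real list is longer).

```kotlin
// Hypothetical, partial set of reserved system property names
// (the issue names these four and elides the rest).
val SYSTEM_KEYS = setOf("version", "source", "target", "direction")

// Wrap a colliding user property key in backticks before it enters the flat row.
fun escapeKey(key: String): String =
    if (key in SYSTEM_KEYS) "`${key}`" else key

// Strip the surrounding backticks again at the EdgePayload boundary.
fun unescapeKey(key: String): String =
    if (key.length >= 2 && key.startsWith("`") && key.endsWith("`")) {
        key.substring(1, key.length - 1)
    } else {
        key
    }
```

In this sketch every reader and writer of the row has to agree on the escaping rule, and a user key that is itself wrapped in backticks would be unescaped by mistake, which is part of why a structural fix is preferable.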

This issue proposes the structural fix: adopt a nested row representation, with system properties and properties placed in separate namespaces.

Background

The flat row layout is inherited from DataFrame(v2), where system properties used short names (ts / src / tgt / dir) that were unlikely to collide with property names. The v3 schema renamed them to version / source / target / direction, names common enough that real-world tables define properties with exactly the same names. The collision was first observed with such a table.

flat row (current) — Row.data: Map<String, Any?>
{
  "version" -> <Edge.version, Long>,
  "source"  -> "...",
  "target"  -> "...",
  "version" -> <properties.version, String>   # second put() overwrites
  ...
}
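The overwrite can be reproduced with a plain Kotlin map. The sketch below is illustrative only: `buildFlatRow` is a hypothetical helper, not the actual row construction code, and it assumes the row is filled with system properties first and user properties second.

```kotlin
// Hypothetical helper that builds a flat row the way the layout above implies:
// system properties first, then user properties into the same map.
fun buildFlatRow(
    version: Long,
    source: String,
    target: String,
    properties: Map<String, Any?>
): Map<String, Any?> {
    val data = mutableMapOf<String, Any?>()
    data["version"] = version
    data["source"] = source
    data["target"] = target
    // A user property named "version" replaces the Long stored above.
    data.putAll(properties)
    return data
}

// Typed read of the system property, as downstream code would do it.
// Returns null when the stored value is no longer a Long.
fun readEdgeVersion(row: Map<String, Any?>): Long? = row["version"] as? Long
```

With `properties = mapOf("version" to "draft")`, the row ends up with three keys, `row["version"]` holds the String, and `readEdgeVersion` returns null. An unchecked cast (`as Long`) in the same place would throw instead.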

Proposal

Make Row a nested structure so that system properties and properties are accessed as distinct sub-rows:

nested row (proposed)
{
  "version": <Edge.version, Long>,
  "source": "...",
  "target": "...",
  "properties": {                             # nested row
    "version": <properties.version, String>,
    ...
  }
}

This matches the externally observable EdgePayload shape (where properties is already nested), so the boundary translation in QueryService becomes a straight pass-through instead of a key rewrite, and the backtick workaround introduced in #283 can be removed.
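To make the pass-through concrete, here is a minimal sketch of the nested shape. It uses data classes for brevity, whereas the real Row is map-based with a StructType schema; `NestedRow`, `EdgePayloadSketch`, and `toPayload` are hypothetical names, and the field list is limited to the properties shown above.

```kotlin
// Hypothetical nested row: system properties are top-level fields,
// user properties live in their own sub-row.
data class NestedRow(
    val version: Long,
    val source: String,
    val target: String,
    val properties: Map<String, Any?>
)

// Hypothetical stand-in for the public EdgePayload shape,
// which already nests user properties.
data class EdgePayloadSketch(
    val version: Long,
    val source: String,
    val target: String,
    val properties: Map<String, Any?>
)

// Boundary translation: fields are copied one-to-one, no key rewriting needed.
fun toPayload(row: NestedRow): EdgePayloadSketch =
    EdgePayloadSketch(row.version, row.source, row.target, row.properties)
```

A user property called `version` now sits at `row.properties["version"]` and cannot shadow `row.version`, so both values survive with their own types.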

Scope

  • engine/sql: DataFrame / Row / StructType — support nested rows and nested field types.
  • TableBinding / V2BackedTableBinding: materialize the v2 flat row into the new nested row layout.
  • QueryService: DataFrame → EdgePayload conversion (simplified: no key rewrite).
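The TableBinding step could look roughly like the sketch below. It rests on two assumptions that the issue does not confirm: that the v2 short names map to the v3 names in the order they are listed (ts to version, src to source, tgt to target, dir to direction), and that every other key in a v2 flat row is a user property. `V2_TO_V3` and `materialize` are hypothetical names.

```kotlin
// Assumed mapping from v2 short names to v3 system property names,
// inferred from the order in which the issue lists them.
val V2_TO_V3 = mapOf(
    "ts" to "version",
    "src" to "source",
    "tgt" to "target",
    "dir" to "direction"
)

// Split a v2 flat row into v3 system properties plus a nested "properties" sub-row.
fun materialize(v2Row: Map<String, Any?>): Map<String, Any?> {
    val nested = mutableMapOf<String, Any?>()
    val properties = mutableMapOf<String, Any?>()
    for ((key, value) in v2Row) {
        val systemName = V2_TO_V3[key]
        if (systemName != null) {
            nested[systemName] = value   // renamed system property
        } else {
            properties[key] = value      // everything else is treated as a user property
        }
    }
    nested["properties"] = properties
    return nested
}
```

Because user properties are collected in a separate map, a v2 property named `version` ends up under `properties` and leaves the value derived from `ts` untouched.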

Backward compatibility

The change is internal — EdgePayload (the public response shape) is unchanged. Once the nested row representation lands, the backtick escape/unescape from #283 can be removed.
