[VL] Spark 4.1: Support Parquet struct field compatibility improvements #11914

@baibaichen

Backend

VL (Velox)

Bug description

Two Parquet struct field handling issues in GlutenParquetIOSuite on Spark 4.1:

Issue 1 (SPARK-53535): Spark 4.1 added a returnNullStructIfAllFieldsMissing parameter that controls how missing struct fields are handled. Gluten has not yet adapted to this change.

Issue 2 (vectorized reader, missing struct fields): A pre-existing test (SPARK-34863) covers complex type handling in the vectorized reader. SPARK-53535 modified this test, and it now fails on Spark 4.1 with Gluten.

The two issues have independent root causes, but both affect only Spark 4.1.
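
To make the behaviour behind Issue 1 concrete, here is a minimal sketch in plain Python. It is not Spark or Gluten code. The function `read_struct` and its arguments are hypothetical, and the semantics are an assumption inferred only from the parameter name returnNullStructIfAllFieldsMissing; SPARK-53535 remains the authoritative description.

```python
from typing import Any, Dict, List, Optional


def read_struct(
    file_struct: Optional[Dict[str, Any]],
    requested_fields: List[str],
    return_null_if_all_missing: bool,
) -> Optional[Dict[str, Any]]:
    """Hypothetical model of reading one struct value from a file.

    file_struct: the struct as stored in the file (None if the struct is null).
    requested_fields: field names in the read schema.
    return_null_if_all_missing: assumed meaning of the Spark 4.1 parameter.
    """
    # A struct that is null in the file stays null either way.
    if file_struct is None:
        return None

    # Which requested fields actually exist in the file?
    present = [name for name in requested_fields if name in file_struct]

    # Assumed behaviour: with the flag on, a struct whose requested fields
    # are all absent from the file is returned as a null struct.
    if not present and return_null_if_all_missing:
        return None

    # Otherwise return a non-null struct, filling absent fields with null.
    return {name: file_struct.get(name) for name in requested_fields}


row = {"a": 1}
print(read_struct(row, ["b", "c"], True))
print(read_struct(row, ["b", "c"], False))
```

Under this reading, the two settings differ only when none of the requested fields exist in the file, which is the case an adapted Gluten reader would need to distinguish.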

Parent issue: #11550

Impact

| Suite | Exclude | spark40 | spark41 |
|---|---|---|---|
| GlutenParquetIOSuite | SPARK-53535 | 🟢 | 🔴 |
| GlutenParquetIOSuite | vectorized reader: missing all struct fields | 🟢 | 🔴 |

Note: GlutenParquetIOSuite also has an exclude for SPARK-54220 (NullType), which is tracked separately under tracking issue #11910.

References
