[VL] Spark 4.x: JNI and Velox exception handling loses Spark error condition and exception type #11912

@baibaichen

Backend

VL (Velox)

Bug description

Gluten's exception handling has two layers of information loss that cause ~49 test excludes and 2 disabled suites:

Sub-problem 1: JNI → Java exception rewriting
Java-side exceptions are replaced with GlutenException when they cross the JNI boundary, which discards the original exception type and error condition code. Related: #11825.
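To illustrate the kind of loss described in sub-problem 1, here is a minimal, self-contained sketch. It does not use Gluten or Spark classes: `ConditionException` and `WrapperException` are hypothetical stand-ins for a Spark exception carrying an error condition and for the generic wrapper, and the two `rewrap*` helpers are illustrative only, not Gluten's actual code path. It assumes the wrapper is built either from the message alone (lossy) or with the original throwable as its cause (recoverable).

```java
import java.util.Optional;

public class JniRewrapSketch {

    /** Hypothetical stand-in for a Spark exception that carries an error condition. */
    public static class ConditionException extends RuntimeException {
        public final String condition;

        public ConditionException(String condition, String message) {
            super(message);
            this.condition = condition;
        }
    }

    /** Hypothetical stand-in for the generic wrapper thrown after the JNI boundary. */
    public static class WrapperException extends RuntimeException {
        public WrapperException(String message) {
            super(message);
        }

        public WrapperException(Throwable cause) {
            super(cause);
        }
    }

    /** Lossy rewrap: only the message survives; type and condition are gone. */
    public static RuntimeException rewrapLossy(Throwable original) {
        return new WrapperException(original.getMessage());
    }

    /** Preserving rewrap: the original exception is kept as the cause. */
    public static RuntimeException rewrapPreserving(Throwable original) {
        return new WrapperException(original);
    }

    /** Walks the cause chain looking for an exception that carries a condition. */
    public static Optional<String> findCondition(Throwable thrown) {
        for (Throwable cur = thrown; cur != null; cur = cur.getCause()) {
            if (cur instanceof ConditionException) {
                return Optional.of(((ConditionException) cur).condition);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Throwable original = new ConditionException(
            "NOT_NULL_ASSERT_VIOLATION", "null value in non-nullable field");

        // The lossy wrapper has no cause, so the condition cannot be recovered.
        System.out.println(findCondition(rewrapLossy(original)).isPresent());

        // The preserving wrapper still exposes the condition via its cause chain.
        System.out.println(findCondition(rewrapPreserving(original)).orElse("<lost>"));
    }
}
```

Under these assumptions, a caller (or test) that checks the exception type or condition can only succeed if the wrapper keeps the original as a cause, or if the original is rethrown unchanged.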

Sub-problem 2: Velox C++ → Spark exception mapping
Velox errors go through the JNI_METHOD_END macro and are uniformly thrown as GlutenException. Spark expects specific subtypes like SparkRuntimeException with error condition codes (e.g., NOT_NULL_ASSERT_VIOLATION).
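One possible direction for sub-problem 2, sketched below under stated assumptions: if the native side could forward an error condition name alongside the message, the Java side could choose a typed exception instead of a uniform one. Everything here is hypothetical, including the class names, the `toJavaException` helper, and the idea that a condition string is available from native code; it is not the behavior of `JNI_METHOD_END` or of any existing Gluten API.

```java
public class NativeErrorMappingSketch {

    /** Hypothetical stand-in for the uniform exception used today. */
    public static class GenericBackendException extends RuntimeException {
        public GenericBackendException(String message) {
            super(message);
        }
    }

    /** Hypothetical stand-in for a Spark runtime exception with an error condition. */
    public static class ConditionRuntimeException extends RuntimeException {
        public final String condition;

        public ConditionRuntimeException(String condition, String message) {
            super(message);
            this.condition = condition;
        }
    }

    /**
     * Maps a native error to a Java exception.
     * Assumption: the native layer passes a condition name, or null/empty
     * when it has none, in which case the generic exception is used.
     */
    public static RuntimeException toJavaException(String condition, String message) {
        if (condition == null || condition.isEmpty()) {
            return new GenericBackendException(message);
        }
        return new ConditionRuntimeException(condition, message);
    }

    public static void main(String[] args) {
        RuntimeException typed =
            toJavaException("NOT_NULL_ASSERT_VIOLATION", "null value in non-nullable field");
        RuntimeException generic = toJavaException(null, "unclassified native failure");

        System.out.println(typed.getClass().getSimpleName());
        System.out.println(generic.getClass().getSimpleName());
    }
}
```

The fallback to the generic exception keeps unclassified native failures behaving as they do now, while errors with a known condition would surface with the type and condition that callers check for.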

Parent issue: #11550

Impact

~49 excludes + 2 TODO suites:

| Category | Excludes | Affected suites |
| --- | --- | --- |
| assert_not_null (NOT_NULL_ASSERT_VIOLATION) | 14 | RuntimeNullChecksV2Writes(7), InsertSuite(1), DataSourceV2SQL(1), Delta/GroupBased(5) |
| merge cardinality (prefix excludes) | 24 (18 real + 6 collateral) | DeltaBasedMerge(x2), GroupBasedMerge |
| raise_error / assert_true | 3 | ColumnExpressionSuite(2), CachedTableSuite(1) |
| SparkRuntimeException → SparkException | 2 | DataFrameFunctionsSuite |
| Exception class/message mismatch | 4 | JsonExpressionsSuite(1), SQLQuerySuite(3) |
| TODO suites | 2 | GlutenGroupBasedUpdateTableSuite(1), GlutenSimpleSQLViewSuite(1) |

Cross-reference

GlutenSimpleSQLViewSuite has 2 failures: 1 belongs to this issue (error condition), 1 belongs to the decimal precision issue (SPARK-53968, spark41-only).

Labels: bug

    Type

    No type
    No fields configured for issues without a type.

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions