[VL] Enable enhanced tests for Spark 4.0 & fix failures #11868
infvg wants to merge 1 commit into apache:main
Conversation
Force-pushed from 2887032 to ef0d7ac.
auto inputRowVector = batch.getRowVector();
auto inputRowType = asRowType(inputRowVector->type());

// Filter columns to match the expected schema (rowType_)
The metadata columns should be appended at the end or the beginning of the schema, and the number of these columns should be a fixed value, so could we simplify the logic?
Also, the metadata columns have specific names, so we only need to match the name pattern to decide whether a column is a metadata column. Could you show an example schema to help us understand this issue?
I think we can simplify it by using field IDs. Field IDs above Integer.MAX_VALUE - 200 are reserved for metadata columns:
https://iceberg.apache.org/spec/#reserved-field-ids
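As a rough sketch of the field-ID approach (the cutoff constant and helper name below are assumptions for illustration, not code from this PR; verify the exact boundary against the linked spec):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical check for Iceberg's reserved metadata field IDs. Per the spec
// linked above, the top ~200 values below Integer.MAX_VALUE are reserved for
// metadata columns such as _file and _pos.
constexpr int32_t kFirstReservedFieldId =
    std::numeric_limits<int32_t>::max() - 200;

bool isMetadataFieldId(int32_t fieldId) {
  return fieldId >= kFirstReservedFieldId;
}
```

With such a predicate, data columns could be kept by field ID instead of matching column names.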
We can just slice the columns and remove any additional columns that appear at the end, so we don't have to add any loops.
Yes, that's what I want
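The slicing idea could look something like this minimal sketch (a plain `std::vector` stands in for the row vector's children; the struct and function names are hypothetical):

```cpp
#include <string>
#include <vector>

// Stand-in for one child column of the output row vector.
struct Column {
  std::string name;
};

// If metadata columns are always appended after the expected data columns,
// a single resize removes them; no per-column name-matching loop is needed.
std::vector<Column> dropTrailingMetadataColumns(
    std::vector<Column> columns, std::size_t expectedColumnCount) {
  if (columns.size() > expectedColumnCount) {
    columns.resize(expectedColumnCount);
  }
  return columns;
}
```

This relies on the assumption discussed above: the count and position of the metadata columns are fixed, so only the first `expectedColumnCount` columns are data columns.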
Force-pushed from e6dc9d7 to c34f5b7.
// Filter out metadata columns from the Spark output schema and reorder to match Iceberg schema
// Spark 4.0 may include metadata columns in the output schema during UPDATE operations,
// but these should not be written to the Iceberg table
val schemaFieldMap = schema.fields.map(f => f.name -> f).toMap
You could use IntelliJ to debug here and see the difference between writeSchema and schema: StructType; also, use slice to take only some of the columns.
Force-pushed from a12d8da to 73c9e38.
@infvg Thanks for the fix. Please fix the CI.
dataSink_->appendData(batch.getRowVector());
auto inputRowVector = batch.getRowVector();

auto outputRowVector = std::make_shared<RowVector>(
Why do you need it? Could you just set rowType_?
Co-authored-by: Yuan <yuanzhou@apache.org>
Force-pushed from 73c9e38 to 7704838.
This PR enables enhanced tests for Spark 4.0 and fixes SQL query failures in Iceberg caused by the new metadata columns.