Optimize native shuffle to write schema once per output partition instead of once per batch

### What is the problem the feature request solves?

Native shuffle currently writes the compression codec name and the IPC schema once per batch. Storing the schema per batch was originally required because dictionary encoding could cause the schema to vary between batches, but this is no longer the case.

### Describe the potential solution

Write the compression codec name and schema once per partition. This can be implemented in `ShuffleBlockWriter::try_new`. Update `ShuffleBlockWriter::write_batch` so that it no longer writes the header for each batch.

Make corresponding changes in the reader, `NativeBlockDecoderIterator.scala`.
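
To make the proposed layout concrete, here is a minimal sketch of "header once, then batches". It is not the current `ShuffleBlockWriter` code or the real on-disk format: `PartitionWriter`, `read_partition`, the 8-byte little-endian length prefix, and the use of opaque byte slices for the schema and batch payloads are all assumptions made for illustration.

```rust
use std::io::{self, Write};

// Hypothetical framing: an 8-byte little-endian length followed by the bytes.
fn write_chunk<W: Write>(out: &mut W, bytes: &[u8]) -> io::Result<()> {
    out.write_all(&(bytes.len() as u64).to_le_bytes())?;
    out.write_all(bytes)
}

// Split one length-prefixed chunk off the front of `buf`.
fn take_chunk<'a>(buf: &mut &'a [u8]) -> io::Result<&'a [u8]> {
    let whole: &'a [u8] = *buf;
    if whole.len() < 8 {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated length"));
    }
    let (len_bytes, rest) = whole.split_at(8);
    let mut arr = [0u8; 8];
    arr.copy_from_slice(len_bytes);
    let len = u64::from_le_bytes(arr) as usize;
    if rest.len() < len {
        return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated chunk"));
    }
    let (chunk, rest) = rest.split_at(len);
    *buf = rest;
    Ok(chunk)
}

/// Hypothetical per-partition writer (a stand-in, not the real type).
pub struct PartitionWriter<W: Write> {
    out: W,
}

impl<W: Write> PartitionWriter<W> {
    /// Writes the codec name and schema bytes exactly once.
    pub fn try_new(mut out: W, codec: &str, schema: &[u8]) -> io::Result<Self> {
        write_chunk(&mut out, codec.as_bytes())?;
        write_chunk(&mut out, schema)?;
        Ok(Self { out })
    }

    /// Writes only the batch payload; no header is repeated here.
    pub fn write_batch(&mut self, payload: &[u8]) -> io::Result<()> {
        write_chunk(&mut self.out, payload)
    }

    pub fn into_inner(self) -> W {
        self.out
    }
}

/// Reader side: parse the header once, then treat every remaining chunk as a batch.
pub fn read_partition(mut buf: &[u8]) -> io::Result<(String, Vec<u8>, Vec<Vec<u8>>)> {
    let codec = String::from_utf8_lossy(take_chunk(&mut buf)?).into_owned();
    let schema = take_chunk(&mut buf)?.to_vec();
    let mut batches = Vec::new();
    while !buf.is_empty() {
        batches.push(take_chunk(&mut buf)?.to_vec());
    }
    Ok((codec, schema, batches))
}
```

With this shape, the per-batch cost is just the length prefix plus the payload, and the reader only needs to decode the schema at the start of each partition.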

We need to make sure that this approach works correctly with spilling as well.
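
One way to reason about the spilling case, assuming spilled data for a partition is later copied byte-for-byte into the final partition output (an assumption, not a statement about the current implementation): spill segments would need to contain only batches, with the header written a single time when the partition is finalized. The sketch below uses hypothetical helpers (`write_framed`, `spill_batches`, `finish_partition`) and an invented length-prefixed framing to show that property.

```rust
use std::io::{self, Write};

// Hypothetical framing: an 8-byte little-endian length followed by the bytes.
fn write_framed<W: Write>(out: &mut W, bytes: &[u8]) -> io::Result<()> {
    out.write_all(&(bytes.len() as u64).to_le_bytes())?;
    out.write_all(bytes)
}

/// A spill segment holds framed batches only, never a codec/schema header.
pub fn spill_batches(batches: &[&[u8]]) -> io::Result<Vec<u8>> {
    let mut segment = Vec::new();
    for b in batches {
        write_framed(&mut segment, b)?;
    }
    Ok(segment)
}

/// Final output: header once, then spilled segments in order,
/// then any batches still held in memory.
pub fn finish_partition<W: Write>(
    out: &mut W,
    codec: &str,
    schema: &[u8],
    spills: &[Vec<u8>],
    in_memory: &[&[u8]],
) -> io::Result<()> {
    write_framed(out, codec.as_bytes())?;
    write_framed(out, schema)?;
    for segment in spills {
        out.write_all(segment)?;
    }
    for b in in_memory {
        write_framed(out, b)?;
    }
    Ok(())
}
```

A useful check for the real change would be the same one this sketch allows: the bytes for a partition should be identical whether or not a spill happened along the way.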

### Additional context

_No response_

Optimize native shuffle to write schema once per output partition instead of once per batch #2928