fix: use writer types in Skipper for resolved named record types by ariel-miculas · Pull Request #9605 · apache/arrow-rs

ariel-miculas · 2026-03-23T15:26:38Z

Which issue does this PR close?

Rationale for this change

When a writer-only field references a named Avro type that was previously resolved against a reader schema, `parse_type` returns the registered reader-resolved type from the shared resolver. This caused two problems:

1. The Skipper built its struct sub-skippers from the reader's field list, which omits writer-only fields. Their bytes were never consumed, leaving the cursor at the wrong position for all subsequent records.
2. Reader fields carry resolution-induced nullability (e.g. a writer plain `long` matched against a reader `["null", long]` gains `nullability = Some(NullFirst)`). The Skipper read a union-tag byte that was never written, causing "Unexpected EOF" errors.

Fix: store the writer's data type in `ResolvedField::ToReader` alongside the reader index. The Skipper's `Codec::Struct` arm now iterates `rec.writer_fields` and uses the writer type from every entry - both `ToReader(_, wdt)` and `Skip(wdt)` - so it always follows the writer's wire format.
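
To make the two failure modes concrete, here is a minimal, self-contained sketch. It does not use arrow-avro's actual `Skipper`, `Codec`, or `ResolvedField` types: `WireType`, `read_long`, `encode_longs`, and `skip` are hypothetical names invented for illustration. It assumes only the standard Avro binary encoding (zigzag varints for `long`, a branch-index varint before a union value).

```rust
/// Hypothetical, simplified description of how a value is laid out on the wire.
/// This is NOT arrow-avro's `Codec`; it only models what the example needs.
#[derive(Clone, Debug)]
enum WireType {
    /// Avro `long`: a single zigzag varint.
    Long,
    /// Avro `["null", T]`: a branch-index varint, then the value if the branch is 1.
    NullableFirst(Box<WireType>),
    /// Avro record: its fields written back to back, in schema order.
    Record(Vec<WireType>),
}

/// Decodes one zigzag varint starting at `*pos` and advances `*pos` past it.
fn read_long(buf: &[u8], pos: &mut usize) -> Result<i64, String> {
    let mut value: u64 = 0;
    let mut shift = 0u32;
    loop {
        let byte = *buf.get(*pos).ok_or_else(|| "Unexpected EOF".to_string())?;
        *pos += 1;
        value |= u64::from(byte & 0x7f) << shift;
        if byte & 0x80 == 0 {
            break;
        }
        shift += 7;
        if shift > 63 {
            return Err("varint too long".to_string());
        }
    }
    // Undo zigzag: the low bit carries the sign.
    Ok(((value >> 1) as i64) ^ -((value & 1) as i64))
}

/// Encodes each value as a zigzag varint, the way a writer emits plain `long` fields.
fn encode_longs(values: &[i64]) -> Vec<u8> {
    let mut out = Vec::new();
    for &v in values {
        let mut z = ((v << 1) ^ (v >> 63)) as u64;
        while z >= 0x80 {
            out.push((z as u8 & 0x7f) | 0x80);
            z >>= 7;
        }
        out.push(z as u8);
    }
    out
}

/// Advances `*pos` past one value of type `ty` without materializing it.
fn skip(ty: &WireType, buf: &[u8], pos: &mut usize) -> Result<(), String> {
    match ty {
        WireType::Long => read_long(buf, pos).map(|_| ()),
        WireType::NullableFirst(inner) => match read_long(buf, pos)? {
            0 => Ok(()), // null branch: no payload follows
            1 => skip(inner, buf, pos),
            other => Err(format!("invalid union branch {other}")),
        },
        WireType::Record(fields) => fields.iter().try_for_each(|f| skip(f, buf, pos)),
    }
}
```

Suppose the writer emits a two-field record of plain longs followed by another long, `encode_longs(&[1, 2, 42])`. Skipping with the writer layout `Record(vec![Long, Long])` consumes two bytes, and the next `read_long` returns 42. Skipping with a layout that omits the second field, `Record(vec![Long])`, consumes only one byte, so the next `read_long` returns 2 (the unskipped field) and every later read is misaligned. Skipping `encode_longs(&[1, 1])` with both fields wrapped in `NullableFirst` treats the first value as a union tag, runs out of input on the second field, and fails with "Unexpected EOF".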

What changes are included in this PR?

Are these changes tested?

Yes, added unit tests.

Are there any user-facing changes?

No

alamb · 2026-03-26T22:11:51Z

FYI @jecsand838

ariel-miculas · 2026-03-31T11:03:12Z

Fixed formatting and rebased onto main.
@jecsand838 can you please take a look?

jecsand838 · 2026-04-01T20:00:09Z

@ariel-miculas

I think the correctness fix here is right.

One improvement I'd recommend is pushing the writer-wire planning fully into codec.rs, instead of carrying full AvroDataTypes down into record.rs and having Skipper::from_avro reconstruct the skip tree there. However this can always be done in a follow-up as well.

ariel-miculas · 2026-04-01T21:06:17Z

> However this can always be done in a follow-up as well.

I'd prefer it this way, since refactoring would take a bit more consideration.

alamb

Thank you for this PR @ariel-miculas 🙏

I defer to @jecsand838's opinion on code structure.

Before approving this PR I would request that we:

Try to make the tests easier to understand (see comments below)
File a ticket that explains what an end user of this crate would see before this fix (I can't quite tell from the PR description, which focuses on what the code does rather than the end-user-visible behavior)

It would also be nice to file a ticket to track @jecsand838's suggestion for a different structure.

alamb · 2026-04-01T21:35:59Z

arrow-avro/src/reader/record.rs

+    /// writer's wire format for that type. Here the reader wraps the Timestamp's scalar
+    /// fields in nullable unions, but the writer wrote them as plain values; the Skipper
+    /// must not add a union-tag read for each field.
+    #[test]


These tests seem pretty repetitive, so it is hard for me to understand what they are supposed to be testing and to see what differs between them.

Can you please refactor the common functionality into helper functions so the tests themselves are easier to understand and validate?

github-actions bot added the arrow (Changes to the arrow crate) and arrow-avro (arrow-avro crate) labels Mar 23, 2026

ariel-miculas added 2 commits March 31, 2026 13:49

fix: formatting

1902d38

ariel-miculas force-pushed the fix-named-type-ref-fields branch from a768913 to 1902d38 Compare March 31, 2026 10:53

alamb reviewed Apr 1, 2026

ariel-miculas wants to merge 2 commits into apache:main from ariel-miculas:fix-named-type-ref-fields
