[FLINK-39035][format][avro] Support Avro fast-read and column pruning…#27536

Open
low-bee wants to merge 3 commits into apache:master from low-bee:avro_format_improve

@low-bee low-bee commented Feb 5, 2026

What is the purpose of the change

Optimize the read performance of the Avro format. See https://issues.apache.org/jira/browse/FLINK-39035 for details.

Brief change log

Add options for the Avro format:

  1. Support fastread.
  2. Support a user-configured writerSchemaString.
  3. Distinguish between the Avro reader schema and the Avro writer schema.
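
For context on item 3: in Avro, the writer schema describes how a record was encoded and the reader schema describes what the consumer wants. Fields are matched by name, writer-only fields are skipped, and reader-only fields must have a default. That name-based matching is what allows columns to be pruned. The sketch below models this resolution step in plain Java without the Avro library. `SchemaResolutionSketch` and `resolve` are hypothetical names used only for illustration; they are not taken from this PR or from Avro, and the real decoder works on binary data rather than on lists.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of Avro-style reader/writer field resolution.
final class SchemaResolutionSketch {

    /**
     * Projects a written record onto the reader's fields.
     *
     * @param writerFields   field names in the order the writer encoded them
     * @param writtenValues  decoded values, positionally aligned with writerFields
     * @param readerFields   field names the reader asks for (the pruned projection)
     * @param readerDefaults defaults for reader fields that the writer did not write
     */
    static Map<String, Object> resolve(
            List<String> writerFields,
            List<Object> writtenValues,
            List<String> readerFields,
            Map<String, Object> readerDefaults) {

        // Index writer positions by field name; resolution is by name, not position.
        Map<String, Integer> writerIndex = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            writerIndex.put(writerFields.get(i), i);
        }

        Map<String, Object> result = new LinkedHashMap<>();
        for (String field : readerFields) {
            Integer pos = writerIndex.get(field);
            if (pos != null) {
                // Field exists on both sides: take the written value.
                result.put(field, writtenValues.get(pos));
            } else if (readerDefaults.containsKey(field)) {
                // Reader-only field: fall back to the reader's default.
                result.put(field, readerDefaults.get(field));
            } else {
                throw new IllegalStateException(
                        "Reader field '" + field + "' is missing from the writer schema and has no default");
            }
        }
        // Writer-only fields (here: "payload") never reach the result, i.e. they are pruned.
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> row = resolve(
                List.of("id", "name", "payload"),
                List.of(42L, "alice", "large-blob"),
                List.of("id", "name", "region"),
                Map.of("region", "unknown"));
        System.out.println(row); // prints {id=42, name=alice, region=unknown}
    }
}
```

The reader needs the exact writer schema to locate fields in the encoded bytes, which is presumably why a configurable writerSchemaString is useful when the table definition covers only a subset of the written fields.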

Verifying this change

Unit tests have been added.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
The serializers: (yes)
The runtime per-record code paths (performance sensitive): (no)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (no)
The S3 file system connector: (no)

Documentation

Does this pull request introduce a new feature? (no)
If yes, how is the feature documented? (not applicable)

… in AvroDeserializationSchema via configuration

flinkbot commented Feb 5, 2026

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

+ "Avro Fastread improves Avro read speeds by constructing a resolution chain. "
+ "get more information about this feature, please visit https://issues.apache.org/jira/browse/AVRO-3230");

public static final ConfigOption<String> AVRO_WRITER_SCHEMA_STRING =

@davidradl davidradl Feb 5, 2026


I can't see any test for this option.

The Confluent Avro format, which inherits the Avro options, also lets you specify a schema, and the Confluent Schema Registry can supply the real schema. I think we should understand and document which of these options takes precedence.

I also suggest we state that this writer schema needs to be compatible with the table definition, and explain what that means. I am thinking about:

  • compatibility between nullable and non-nullable fields
  • what it means for pruning nested columns
  • changing to castable types.
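
To make the first bullet concrete, here is a minimal sketch of one possible compatibility rule: a field that the writer schema allows to be null is unsafe to read into a table column declared NOT NULL, while the reverse direction is harmless. This is only an assumed rule for discussion, not something the PR implements. `NullabilityCheckSketch` and `unsafeColumns` are hypothetical names, and schemas are reduced to maps from field name to a nullable flag.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a nullability compatibility check.
final class NullabilityCheckSketch {

    /**
     * Returns the table columns that are declared NOT NULL although the
     * writer schema allows null for the field of the same name.
     *
     * @param writerNullable field name -> true if the writer schema allows null
     * @param tableNullable  column name -> true if the table column is nullable
     */
    static List<String> unsafeColumns(
            Map<String, Boolean> writerNullable, Map<String, Boolean> tableNullable) {
        List<String> unsafe = new ArrayList<>();
        for (Map.Entry<String, Boolean> column : tableNullable.entrySet()) {
            Boolean writerAllowsNull = writerNullable.get(column.getKey());
            if (writerAllowsNull == null) {
                // Column is absent from the writer schema; handling that case
                // (defaults, errors) is outside the scope of this sketch.
                continue;
            }
            if (writerAllowsNull && !column.getValue()) {
                unsafe.add(column.getKey());
            }
        }
        return unsafe;
    }

    public static void main(String[] args) {
        Map<String, Boolean> writer = new LinkedHashMap<>();
        writer.put("id", false);
        writer.put("name", true);
        writer.put("region", true);

        Map<String, Boolean> table = new LinkedHashMap<>();
        table.put("id", false);     // NOT NULL, writer never writes null: fine
        table.put("name", false);   // NOT NULL, writer may write null: unsafe
        table.put("region", true);  // nullable on both sides: fine

        System.out.println(unsafeColumns(writer, table)); // prints [name]
    }
}
```

A similar table-driven check could cover the third bullet, using the type promotions that the Avro specification permits during schema resolution (for example int to long).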

Author


I'm glad to see your reply, and I will add more unit tests and documentation as suggested.
