bug(parquet): Disabling global statistics but enabling for particular column breaks reading #4587

@ozgrakkurt

Description

If I write files with:

.set_statistics_enabled(EnabledStatistics::None)
.set_column_statistics_enabled("block_number".into(), EnabledStatistics::Page)

When I query the file with DataFusion, or read it directly with the parquet crate's ParquetRecordBatchReaderBuilder, it fails with the error "missing offset index".

It seems that offset indexes are not written when page statistics are disabled globally, even if they are enabled for a specific column?

I would expect that if offset indexes aren't written, the reader shouldn't try to filter pages by statistics. It should also be documented that set_column_statistics_enabled doesn't override the global setting in this case.
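
The two expectations can be stated as a small standalone model. This is only a sketch of the behaviour the report asks for, not the parquet crate's implementation: `StatsLevel`, `StatsConfig`, `ChunkIndexes`, and `can_prune_pages` are hypothetical names that do not exist in the crate.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the crate's `EnabledStatistics` levels.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum StatsLevel {
    None,
    Chunk,
    Page,
}

/// Hypothetical writer configuration: a global level plus per-column overrides.
struct StatsConfig {
    global: StatsLevel,
    per_column: HashMap<String, StatsLevel>,
}

impl StatsConfig {
    /// A per-column setting, when present, wins over the global one.
    fn effective(&self, column: &str) -> StatsLevel {
        self.per_column.get(column).copied().unwrap_or(self.global)
    }
}

/// Hypothetical summary of which page-level indexes a column chunk carries.
struct ChunkIndexes {
    has_column_index: bool,
    has_offset_index: bool,
}

/// Page pruning needs both the page statistics and the page locations;
/// if either is missing, the reader should fall back to reading all pages
/// instead of failing.
fn can_prune_pages(indexes: &ChunkIndexes) -> bool {
    indexes.has_column_index && indexes.has_offset_index
}
```

Under this model, a configuration with `global: StatsLevel::None` and an override of `StatsLevel::Page` for `block_number` resolves to `Page` for that column only, and a chunk with page statistics but no offset index is read without pruning.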

Labels: bug, parquet (Changes to the parquet crate)