StrictMetricsEvaluator does not use column bounds to evaluate startsWith #15901

@bharos

Description

Feature Request / Improvement

StrictMetricsEvaluator.startsWith currently always returns ROWS_MIGHT_NOT_MATCH, ignoring file-level column bounds entirely.

When the column's lower and upper bounds prove that every value in the file starts with the given prefix, the evaluator should return ROWS_MUST_MATCH. This allows the engine to skip unnecessary row-level filtering for STARTS WITH predicates, improving query performance on string-heavy workloads.

Specifically, the evaluator can determine that all rows match STARTS WITH <prefix> when:

  • The column contains no null values
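  • Both the lower and upper bounds are present and start with the prefix, i.e. each bound, truncated to the prefix length, equals the prefix. Because every value in the file sorts between the two bounds, every value then starts with the prefix as well.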
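
The two conditions above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Iceberg's actual code: the class `StartsWithBoundsSketch`, the method `allRowsStartWith`, the `truncate` helper, and the use of plain `String` bounds with a nullable `Long` null count are all assumptions made for the example. The two result constants are redefined locally as booleans so the sketch compiles on its own.

```java
// Hypothetical sketch of the proposed check; names and signatures are
// illustrative only and are not taken from the Iceberg code base.
final class StartsWithBoundsSketch {
  // Local stand-ins for the evaluator's two possible results.
  static final boolean ROWS_MUST_MATCH = true;
  static final boolean ROWS_MIGHT_NOT_MATCH = false;

  private StartsWithBoundsSketch() {}

  // Truncates a bound to at most `length` chars so it can be compared to the prefix.
  static String truncate(String bound, int length) {
    return bound.length() <= length ? bound : bound.substring(0, length);
  }

  // Returns ROWS_MUST_MATCH only when the file-level stats prove that every
  // value starts with `prefix`; otherwise falls back to ROWS_MIGHT_NOT_MATCH.
  static boolean allRowsStartWith(String prefix, Long nullCount, String lower, String upper) {
    // Unknown null count or any null value: a null never matches STARTS WITH.
    if (nullCount == null || nullCount != 0L) {
      return ROWS_MIGHT_NOT_MATCH;
    }
    // Without both bounds nothing can be proven.
    if (lower == null || upper == null) {
      return ROWS_MIGHT_NOT_MATCH;
    }
    int length = prefix.length();
    // Both truncated bounds must equal the prefix. A bound shorter than the
    // prefix can never equal it, so that case is rejected here as well.
    if (truncate(lower, length).equals(prefix) && truncate(upper, length).equals(prefix)) {
      return ROWS_MUST_MATCH;
    }
    return ROWS_MIGHT_NOT_MATCH;
  }
}
```

For example, with prefix `ab`, bounds `abc` and `abz`, and a null count of zero, the sketch returns `ROWS_MUST_MATCH`; with a lower bound of `aa` it falls back to `ROWS_MIGHT_NOT_MATCH`. The check is deliberately conservative: if a writer stores an upper bound that no longer starts with the prefix, the result is simply `ROWS_MIGHT_NOT_MATCH`, which is still correct.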

This mirrors how eq and the recently added notStartsWith (#15882) already use bounds in the same evaluator.

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time
