StrictMetricsEvaluator does not use column bounds to evaluate notStartsWith #15882

@bharos

Description

Feature Request / Improvement

StrictMetricsEvaluator.notStartsWith currently always returns ROWS_MIGHT_NOT_MATCH, ignoring file-level column bounds entirely. There is an existing TODO in the code:

// TODO: Handle cases that definitely cannot match, such as notStartsWith("x") when the bounds
// are ["a", "b"].

When the column's lower and upper bounds prove that no value in the file can start with the given prefix, the evaluator should return ROWS_MUST_MATCH. This allows the engine to skip unnecessary row-level filtering for NOT STARTS WITH predicates, improving query performance on string-heavy workloads.

Specifically, the evaluator can determine that all rows match NOT STARTS WITH when:

  • The column contains only null values
  • The lower bound, truncated to min(prefix length, bound length), is strictly greater than the prefix truncated to the same length, so every value sorts above the prefix range
  • The upper bound, truncated to min(prefix length, bound length), is strictly less than the prefix truncated to the same length, so every value sorts below the prefix range

This mirrors how notEq and notIn already use bounds in the same evaluator.
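
The three conditions above can be sketched as a small standalone check. This is only an illustration of the proposed logic, not the actual Iceberg code. The class name `NotStartsWithSketch`, the method `allRowsMatchNotStartsWith`, and its parameters are hypothetical, and it compares plain Java strings with `String.compareTo` (UTF-16 code unit order), whereas the real evaluator works on the serialized bounds stored in file metadata.

```java
/**
 * Illustrative sketch only: decides whether file-level bounds prove that
 * every row satisfies NOT STARTS WITH for a given prefix.
 * All names here are hypothetical and not part of the Iceberg API.
 */
public final class NotStartsWithSketch {
  private NotStartsWithSketch() {}

  /**
   * Returns true when the bounds prove that no value starts with the prefix
   * (the ROWS_MUST_MATCH case), false when nothing can be proven
   * (the ROWS_MIGHT_NOT_MATCH case).
   *
   * @param prefix   the prefix from the notStartsWith predicate
   * @param lower    lower bound of the column in the file, or null if unknown
   * @param upper    upper bound of the column in the file, or null if unknown
   * @param allNulls true if the column contains only null values
   */
  static boolean allRowsMatchNotStartsWith(
      String prefix, String lower, String upper, boolean allNulls) {
    // Condition 1: a null value never starts with the prefix.
    if (allNulls) {
      return true;
    }

    // Condition 2: the truncated lower bound sorts strictly above the
    // truncated prefix, so every value is above the prefix range.
    if (lower != null) {
      int len = Math.min(prefix.length(), lower.length());
      if (lower.substring(0, len).compareTo(prefix.substring(0, len)) > 0) {
        return true;
      }
    }

    // Condition 3: the truncated upper bound sorts strictly below the
    // truncated prefix, so every value is below the prefix range.
    if (upper != null) {
      int len = Math.min(prefix.length(), upper.length());
      if (upper.substring(0, len).compareTo(prefix.substring(0, len)) < 0) {
        return true;
      }
    }

    // Bounds overlap the prefix range (or are missing): nothing is proven.
    return false;
  }
}
```

With the example from the TODO, `allRowsMatchNotStartsWith("x", "a", "b", false)` returns true because the upper bound "b" sorts below "x". With bounds ["aa", "ac"] and prefix "ab" it returns false, because a value such as "abc" could fall inside the bounds.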

Query engine

None

Willingness to contribute

  • I can contribute this improvement/feature independently
  • I would be willing to contribute this improvement/feature with guidance from the Iceberg community
  • I cannot contribute this improvement/feature at this time

Labels

improvement (PR that improves existing functionality)
