Feature Request / Improvement
StrictMetricsEvaluator.notStartsWith currently always returns ROWS_MIGHT_NOT_MATCH, ignoring file-level column bounds entirely. There is an existing TODO in the code:
// TODO: Handle cases that definitely cannot match, such as notStartsWith("x") when the bounds
// are ["a", "b"].
When the column's lower and upper bounds prove that no value in the file can start with the given prefix, the evaluator should return ROWS_MUST_MATCH. This allows the engine to skip unnecessary row-level filtering for NOT STARTS WITH predicates, improving query performance on string-heavy workloads.
Specifically, the evaluator can determine that all rows match NOT STARTS WITH when:
- The column contains only null values
- The lower bound, truncated to the shorter of the prefix length and the bound length, is strictly greater than the prefix, so every value sorts above the prefix range
- The upper bound, truncated to the shorter of the prefix length and the bound length, is strictly less than the prefix, so every value sorts below the prefix range
This mirrors how notEq and notIn already use bounds in the same evaluator.
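The three conditions above can be sketched roughly as follows. This is a standalone illustration, not the evaluator's actual code: the class `NotStartsWithSketch`, the method signature, and the `allNulls` flag are hypothetical, the bounds are passed as plain `String`s, and the comparison uses `String.compareTo` rather than whatever comparator the evaluator would use. It also assumes the prefix is truncated to the same length as the bound before comparing, which is the conservative reading of the rules.

```java
final class NotStartsWithSketch {
    // Assumed to mirror the boolean results named in this issue.
    static final boolean ROWS_MUST_MATCH = true;
    static final boolean ROWS_MIGHT_NOT_MATCH = false;

    // Hypothetical helper: lower/upper are the file-level column bounds,
    // or null when no bounds were recorded for the column.
    static boolean notStartsWith(String prefix, String lower, String upper, boolean allNulls) {
        // A column with only nulls has no value that starts with the prefix.
        if (allNulls) {
            return ROWS_MUST_MATCH;
        }
        // Without both bounds nothing can be proven.
        if (lower == null || upper == null) {
            return ROWS_MIGHT_NOT_MATCH;
        }

        // Lower bound above the prefix range: every value sorts after the prefix.
        int lowerLen = Math.min(prefix.length(), lower.length());
        if (lower.substring(0, lowerLen).compareTo(prefix.substring(0, lowerLen)) > 0) {
            return ROWS_MUST_MATCH;
        }

        // Upper bound below the prefix range: every value sorts before the prefix.
        int upperLen = Math.min(prefix.length(), upper.length());
        if (upper.substring(0, upperLen).compareTo(prefix.substring(0, upperLen)) < 0) {
            return ROWS_MUST_MATCH;
        }

        return ROWS_MIGHT_NOT_MATCH;
    }

    public static void main(String[] args) {
        // The case from the TODO: notStartsWith("x") with bounds ["a", "b"].
        System.out.println(notStartsWith("x", "a", "b", false));
        // The prefix falls inside the bounds, so nothing can be proven.
        System.out.println(notStartsWith("ab", "aa", "ac", false));
    }
}
```

When a bound is shorter than the prefix and equal to its leading characters, the sketch falls through to ROWS_MIGHT_NOT_MATCH, which is always a safe answer for a strict evaluator.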
Query engine
None
Willingness to contribute