Skip to content

Conversation

@youssef-tharwat
Copy link

Summary

  • Add support for toLower/toUpper (Cypher) and lower/upper (SQL) string functions
  • Change fallback for unsupported functions from lit(0) to ScalarValue::Null to prevent type coercion errors

Problem

When using toLower(s.name) CONTAINS 'x' with integer columns in the RETURN clause, a type coercion error occurred:

type_coercion: There isn't a common type to coerce Int32 and Utf8 in LIKE expression

This happened because toLower was not implemented and fell through to the default case which returned lit(0) (Int32). When combined with CONTAINS (translated to LIKE), DataFusion couldn't coerce Int32 to Utf8.

Solution

  1. Implemented toLower/toUpper using DataFusion's lower()/upper() functions
  2. Changed the fallback for unsupported functions from lit(0) to ScalarValue::Null, which coerces to any type in SQL semantics

Test plan

  • Added unit tests for toLower, toUpper, lower, upper functions
  • Added integration test reproducing the exact bug scenario
  • All 196 existing tests pass

- Add support for toLower/toUpper (Cypher) and lower/upper (SQL) string functions
- Change fallback for unsupported functions from lit(0) to ScalarValue::Null
  to prevent type coercion errors in any context (string or numeric)
- Add unit tests for new string functions
- Add integration tests including exact bug reproduction

Fixes type coercion error when using toLower(s.name) CONTAINS 'x' with
integer columns in RETURN clause.
@codecov-commenter
Copy link

Codecov Report

❌ Patch coverage is 90.81633% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
...t/lance-graph/src/datafusion_planner/expression.rs 90.81% 9 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Collaborator

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@youssef-tharwat Thanks so much for the fix!
I just left very minor comments for the tests. The fix generally looks good to me.

Meanwhile, could you fix the style check error of your PR? Feel free to tag me when you want me to review again.

assert_eq!(name, "expr", "Arithmetic should use generic name");
}

// ========================================================================
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a section "Unit tests for to_df_value_expr()" in the unit tests. Could you append your string function tests directly to that section? You don't need to create a new section here.

Comment on lines +4119 to +4121
// ============================================================================
// String Function Tests (toLower, toUpper)
// ============================================================================
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the previous comment, there has been a section called "String Operator Tests", so you don't need to create a new section. Actually, the new tests have already been in the "String Operator Tests" section.

Comment on lines +4118 to +4121

// ============================================================================
// String Function Tests (toLower, toUpper)
// ============================================================================
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// ============================================================================
// String Function Tests (toLower, toUpper)
// ============================================================================

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants