Skip to content

Support nested REE in arrow-ord partition function#9642

Open
liamzwbao wants to merge 1 commit intoapache:mainfrom
liamzwbao:issue-9640-support-nested-ree-partition
Open

Support nested REE in arrow-ord partition function#9642
liamzwbao wants to merge 1 commit intoapache:mainfrom
liamzwbao:issue-9640-support-nested-ree-partition

Conversation

@liamzwbao
Copy link
Copy Markdown
Contributor

@liamzwbao liamzwbao commented Apr 1, 2026

Which issue does this PR close?

Rationale for this change

Although rare, it's totally valid to have nested REE and dict encoding and we should handle it correctly.

What changes are included in this PR?

Process nested REE, nested dict, dict of REE, REE of dict gracefully

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Apr 1, 2026
@liamzwbao liamzwbao marked this pull request as ready for review April 1, 2026 01:56
Copy link
Copy Markdown
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me -- thank you @liamzwbao

I think it would be nice to move supports_distinct but we can do that as a follow on if you prefer

/// `compare_op` unwraps at most one level of dictionary, then dispatches on
/// the leaf type. Anything else (REE, nested dictionary, nested/complex types)
/// must go through `make_comparator` instead.
fn supports_distinct(dt: &DataType) -> bool {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a nice minimal code change 👍

Ok(Partitions(Some(acc)))
}

/// Returns true if `distinct` (via `compare_op`) can handle this data type.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would make more sense to put next to distinct rather than in this module

So that owuld mean moving to arrow-ord/src/cmp.rs I think (and making it pub crate)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support nested REE in arrow-ord partition function

2 participants