
feat(arrow-array): add GenericByteViewArray::total_bytes_len#9641

Open
hcrosse wants to merge 1 commit into apache:main from hcrosse:feat/total-bytes-len-byte-view-array



@hcrosse commented Apr 1, 2026

Which issue does this PR close?

Closes apache#9435.

Rationale for this change

total_buffer_bytes_used() only counts the bytes of non-inlined values (those longer than 12 bytes), so it returns 0 for an array whose values are all 12 bytes or shorter. This makes it unsuitable as a capacity hint when pre-allocating output buffers (e.g. in DataFusion's concat()/concat_ws()).
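To illustrate why short strings contribute nothing to that count, here is a minimal, std-only model of the 16-byte view layout: a u128 whose low 32 bits hold the value's length, with values of at most 12 bytes stored inline in the view itself and longer values stored as a 4-byte prefix, buffer index and offset. The helpers inline_view, buffer_view, view_len and buffer_bytes_used_model are hypothetical names for this sketch, not arrow-rs APIs, and the model ignores the null buffer.

```rust
/// Largest value length (in bytes) stored inline in a 16-byte view.
const MAX_INLINE_LEN: u32 = 12;

/// Hypothetical helper: build an inline view (bytes 0..4 = length,
/// bytes 4..16 = the value itself, zero-padded).
fn inline_view(value: &[u8]) -> u128 {
    assert!(value.len() <= MAX_INLINE_LEN as usize);
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..4 + value.len()].copy_from_slice(value);
    u128::from_le_bytes(bytes)
}

/// Hypothetical helper: build a non-inline view (length, 4-byte prefix,
/// index of the data buffer, offset into that buffer).
fn buffer_view(value: &[u8], buffer_index: u32, offset: u32) -> u128 {
    assert!(value.len() > MAX_INLINE_LEN as usize);
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(value.len() as u32).to_le_bytes());
    bytes[4..8].copy_from_slice(&value[0..4]);
    bytes[8..12].copy_from_slice(&buffer_index.to_le_bytes());
    bytes[12..16].copy_from_slice(&offset.to_le_bytes());
    u128::from_le_bytes(bytes)
}

/// The value's length lives in the low 32 bits of its view.
fn view_len(view: u128) -> u32 {
    view as u32
}

/// Model of the behaviour described above: only bytes that live in data
/// buffers (values longer than 12 bytes) are counted.
fn buffer_bytes_used_model(views: &[u128]) -> usize {
    views
        .iter()
        .map(|&v| view_len(v))
        .filter(|&len| len > MAX_INLINE_LEN)
        .map(|len| len as usize)
        .sum()
}

fn main() {
    // Every value here is at most 12 bytes, so all of them are inlined.
    let short: Vec<u128> = ["a", "bc", "hello world!"]
        .iter()
        .map(|s| inline_view(s.as_bytes()))
        .collect();
    println!("{}", buffer_bytes_used_model(&short)); // 0

    // Yet the values hold 1 + 2 + 12 bytes of data.
    let total: u32 = short.iter().map(|&v| view_len(v)).sum();
    println!("{}", total); // 15

    // Only the 26-byte value lives in a data buffer.
    let mixed = [inline_view(b"bc"), buffer_view(b"this is longer than twelve", 0, 0)];
    println!("{}", buffer_bytes_used_model(&mixed)); // 26
}
```

Because the inline and buffered cases share the same length field, summing that field over every view is enough to get the full byte length, which is the gap the new method fills.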

What changes are included in this PR?

Adds total_bytes_len() to GenericByteViewArray, which sums the byte lengths of all non-null values, including inlined strings.
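The semantics described here can be modelled without Arrow as a sum of each valid view's length field, then used as a capacity hint for a concatenation. This is a sketch under stated assumptions, not the PR's implementation: total_bytes_len_model and length_only_view are hypothetical helpers, and validity is a simplified Option<&[bool]> stand-in for Arrow's null bitmap (None meaning "no nulls").

```rust
/// Hypothetical helper: a view with only the length field (low 32 bits)
/// populated; the sum below never reads the other 12 bytes.
fn length_only_view(value: &str) -> u128 {
    value.len() as u32 as u128
}

/// Hypothetical model: add the length of every view whose slot is valid,
/// whether the value is inlined or stored in a data buffer.
fn total_bytes_len_model(views: &[u128], validity: Option<&[bool]>) -> usize {
    views
        .iter()
        .enumerate()
        .filter(|(i, _)| validity.map_or(true, |v| v[*i]))
        .map(|(_, &view)| (view as u32) as usize)
        .sum()
}

fn main() {
    // Slot 1 is null; its view carries a stale length that must be skipped.
    let values: [Option<&str>; 4] = [Some("ab"), None, Some("cde"), Some("a value longer than 12")];
    let views: Vec<u128> = values
        .iter()
        .map(|v| length_only_view(v.unwrap_or("ignored")))
        .collect();
    let validity: Vec<bool> = values.iter().map(Option::is_some).collect();

    // 2 + 3 + 22 bytes across the non-null values.
    let hint = total_bytes_len_model(&views, Some(validity.as_slice()));
    println!("capacity hint: {hint}"); // capacity hint: 27

    // Pre-allocate once, as a concat()-style kernel might, then append.
    let mut out = String::with_capacity(hint);
    for v in values.iter().flatten() {
        out.push_str(v);
    }
    assert_eq!(out.len(), hint);
}
```

Skipping null slots by validity rather than trusting their view contents keeps the hint exact here; with an exact hint the output buffer never needs to grow.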

Are there any user-facing changes?

New public method on GenericByteViewArray (and by extension StringViewArray / BinaryViewArray).

Sums byte lengths of all non-null values, including inlined strings that total_buffer_bytes_used() skips.

Closes apache#9435
github-actions bot added the arrow (Changes to the arrow crate) label Apr 1, 2026
@alamb (Contributor) left a comment


Thank you @hcrosse

This is great

FYI @neilconway

@neilconway

🎉 Awesome, thank you @hcrosse !


Labels

arrow (Changes to the arrow crate)


Development

Successfully merging this pull request may close these issues.

GenericByteViewArray: support finding total length of all strings

3 participants