[TASK-429] Pre-size Arrow builders and recommend jemalloc for write path #430
luoyuxia merged 4 commits into apache:main
@luoyuxia @leekeiabstraction PTAL 🙏
leekeiabstraction left a comment:
TY for the PR, left some comments
@leekeiabstraction Addressed magic number, PTAL 🙏
leekeiabstraction left a comment:
Approved, TY for the PR
Pull request overview
This PR implements two performance optimizations for the write path: pre-sizing Arrow column builders to DEFAULT_MAX_RECORD (256) capacity to eliminate growth reallocations, and documenting jemalloc (tikv-jemallocator) as the recommended allocator for production Linux deployments with an accompanying example setup.
Changes:
- Pre-size all Arrow column builders (`with_capacity` instead of `new`) in `RowAppendRecordBatchBuilder::create_builder`, using a fixed capacity of 256 and a 64-byte average data buffer for variable-width types.
- Add jemalloc documentation to the crate-level doc comment in `lib.rs` and configure it in the `examples` crate (`Cargo.toml` + `example_table.rs`).
- Remove a stray blank line in `crates/fluss/Cargo.toml`.
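To make the variable-width part of the first bullet concrete, here is a minimal sketch using plain `Vec`s rather than arrow-rs. An Arrow variable-width column is an offsets buffer (one entry per row plus a leading 0) and a values buffer, so sizing both for 256 rows at an assumed 64-byte average reserves 16,384 value bytes up front. `VarWidthBuffers` and `fill_full_batch` are hypothetical names, not the fluss implementation; in arrow-rs the corresponding call is `StringBuilder::with_capacity(item_capacity, data_capacity)`.

```rust
/// Batch size used for pre-sizing; mirrors the PR's `DEFAULT_MAX_RECORD` (256).
const DEFAULT_MAX_RECORD: usize = 256;
/// Assumed average payload per variable-width value, in bytes (the PR's 64).
const AVG_VALUE_BYTES: usize = 64;

/// Simplified stand-in for an Arrow variable-width builder (hypothetical, not
/// the fluss or arrow-rs type): offsets (rows + 1 entries) plus raw value bytes.
struct VarWidthBuffers {
    offsets: Vec<i32>,
    values: Vec<u8>,
}

impl VarWidthBuffers {
    /// Reserve room for `rows` values averaging `avg_value_bytes` each, up front.
    fn with_capacity(rows: usize, avg_value_bytes: usize) -> Self {
        let mut offsets = Vec::with_capacity(rows + 1);
        offsets.push(0); // Arrow offsets always begin at 0
        Self {
            offsets,
            values: Vec::with_capacity(rows * avg_value_bytes),
        }
    }

    /// Append one value: copy its bytes, then record the new end offset.
    fn append(&mut self, value: &[u8]) {
        self.values.extend_from_slice(value);
        self.offsets.push(self.values.len() as i32);
    }

    /// Current (offsets, values) capacities.
    fn capacities(&self) -> (usize, usize) {
        (self.offsets.capacity(), self.values.capacity())
    }
}

/// Fill one full batch of 10-byte values; return capacities before and after.
fn fill_full_batch() -> ((usize, usize), (usize, usize)) {
    let mut col = VarWidthBuffers::with_capacity(DEFAULT_MAX_RECORD, AVG_VALUE_BYTES);
    let before = col.capacities();
    for i in 0..DEFAULT_MAX_RECORD {
        // "user-00000" .. "user-00255": 10 bytes each, well under the 64-byte estimate
        col.append(format!("user-{i:05}").as_bytes());
    }
    (before, col.capacities())
}
```

The 64-byte figure is an estimate, not a limit: if values are larger on average, the values buffer simply grows as it would without pre-sizing.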
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `crates/fluss/src/record/arrow.rs` | All Arrow builders now use `with_capacity(256)` instead of `new()`; variable-width types also pre-allocate data buffers; `create_builder` updated with a capacity parameter. |
| `crates/fluss/src/lib.rs` | Crate-level doc comment expanded with a `# Performance` section recommending jemalloc for Linux production use. |
| `crates/examples/Cargo.toml` | Adds `tikv-jemallocator = "0.6"` as a conditional dependency for non-MSVC targets. |
| `crates/examples/src/example_table.rs` | Activates jemalloc as `#[global_allocator]` for the example binary. |
| `crates/fluss/Cargo.toml` | Removes trailing blank line in `dev-dependencies` section. |
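jemalloc is installed through Rust's `#[global_allocator]` attribute; the tikv-jemallocator README does this with `static GLOBAL: Jemalloc = Jemalloc;` behind `#[cfg(not(target_env = "msvc"))]`, matching the non-MSVC condition above. To keep this sketch dependency-free, it instead wraps `std::alloc::System` in a counting allocator (a swapped-in technique, not the PR's code), which also shows how one might check that pre-sized buffers avoid growth reallocations. `CountingAlloc`, `REALLOCS`, and `reallocs_while_filling` are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Delegates every call to the system allocator and counts `realloc` calls.
/// Replacing `System` with `tikv_jemallocator::Jemalloc` would route the same
/// calls through jemalloc instead.
struct CountingAlloc;

static REALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }

    unsafe fn realloc(&self, ptr: *mut u8, layout: Layout, new_size: usize) -> *mut u8 {
        // A growing Vec lands here each time it outgrows its current buffer.
        REALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.realloc(ptr, layout, new_size) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Push `rows` values into a Vec and return how many reallocations occurred.
fn reallocs_while_filling(rows: usize, presize: bool) -> usize {
    let before = REALLOCS.load(Ordering::Relaxed);
    let mut v: Vec<i64> = if presize { Vec::with_capacity(rows) } else { Vec::new() };
    for i in 0..rows as i64 {
        v.push(i);
    }
    // Keep the Vec observable so the optimizer cannot drop the allocations.
    std::hint::black_box(&v);
    REALLOCS.load(Ordering::Relaxed) - before
}
```

With `presize = true` a batch of 256 values should cause no reallocations, while starting from `Vec::new()` the buffer is regrown several times on the way to 256. Since the counter is process-wide, allocations on other threads could affect the numbers in a busier program.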
Addressed comments
Fixed CI/CD.
luoyuxia left a comment:
@fresh-borzoni Thanks for the improvement. LGTM
Summary
Closes #429
- Pre-size Arrow column builders to `DEFAULT_MAX_RECORD` (256) capacity to eliminate reallocations

Benchmark results (Linux, glibc, 8 threads, 100K cycles, 13-column schema)
I will address these findings during benchmarking in
Follow-ups