feat(parquet): stream-encode definition/repetition levels incrementally#9447

Merged
alamb merged 7 commits into apache:main from HippoBaro:main on Apr 1, 2026

@HippoBaro
Contributor

@HippoBaro HippoBaro commented Feb 20, 2026

Which issue does this PR close?

Rationale for this change

When writing a Parquet column with very sparse data, GenericColumnWriter accumulates unbounded memory for definition and repetition levels. The raw i16 values are appended into Vec<i16> sinks on every write_batch call and only RLE-encoded in bulk when a data page is flushed. For a column that is almost entirely nulls, the actual RLE-encoded output can be tiny, yet the intermediate buffer grows linearly with the number of rows.
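
To make the gap concrete, here is a rough sketch that compares the size of a raw `Vec<i16>` level buffer with the number of runs a run-length encoding would need for a mostly-null column. The helper names (`raw_level_bytes`, `count_runs`) and the row counts are hypothetical, and the run model is a toy one, not Parquet's hybrid RLE/bit-packed format.

```rust
/// Bytes held by a raw level buffer that stores one `i16` per row.
fn raw_level_bytes(num_rows: usize) -> usize {
    num_rows * std::mem::size_of::<i16>()
}

/// Number of (value, run_length) runs in a level sequence.
/// Toy run-length model for illustration only.
fn count_runs(levels: &[i16]) -> usize {
    if levels.is_empty() {
        return 0;
    }
    // Every change of value between neighbours starts a new run.
    1 + levels.windows(2).filter(|w| w[0] != w[1]).count()
}

fn main() {
    // Hypothetical sparse column: one non-null value every 100,000 rows.
    let num_rows = 1_000_000;
    let levels: Vec<i16> = (0..num_rows)
        .map(|i| if i % 100_000 == 0 { 1 } else { 0 })
        .collect();

    println!("raw buffer bytes: {}", raw_level_bytes(levels.len()));
    println!("run count:        {}", count_runs(&levels));
}
```

The raw buffer grows with the row count, while the run count depends only on how often the level value changes.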

What changes are included in this PR?

Replace the two raw-level Vec<i16> sinks (def_levels_sink / rep_levels_sink) with streaming LevelEncoder fields (def_levels_encoder / rep_levels_encoder). Behavior is unchanged, but the writer now keeps running RLE-encoded state in memory rather than one raw level per row. The existing encoding logic is reused.

Are these changes tested?

Yes, all tests pass.
Benchmarks show no regression. list_primitive benches improved by 3-5%:

Benchmarking list_primitive/default: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.1s, enable flat sampling, or reduce sample count to 60.
list_primitive/default  time:   [1.2109 ms 1.2171 ms 1.2248 ms]
                        thrpt:  [1.6999 GiB/s 1.7105 GiB/s 1.7194 GiB/s]
                 change:
                        time:   [−3.7197% −2.8848% −2.0036%] (p = 0.00 < 0.05)
                        thrpt:  [+2.0445% +2.9705% +3.8634%]
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe
Benchmarking list_primitive/bloom_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
list_primitive/bloom_filter
                        time:   [1.4405 ms 1.4810 ms 1.5292 ms]
                        thrpt:  [1.3615 GiB/s 1.4058 GiB/s 1.4452 GiB/s]
                 change:
                        time:   [−6.4332% −4.7568% −2.9048%] (p = 0.00 < 0.05)
                        thrpt:  [+2.9917% +4.9944% +6.8755%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
Benchmarking list_primitive/parquet_2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.3s, enable flat sampling, or reduce sample count to 60.
list_primitive/parquet_2
                        time:   [1.2271 ms 1.2311 ms 1.2362 ms]
                        thrpt:  [1.6841 GiB/s 1.6911 GiB/s 1.6966 GiB/s]
                 change:
                        time:   [−5.8536% −4.9672% −4.1905%] (p = 0.00 < 0.05)
                        thrpt:  [+4.3738% +5.2269% +6.2175%]
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  2 (2.00%) high mild
  3 (3.00%) high severe
list_primitive/zstd     time:   [2.0056 ms 2.0148 ms 2.0262 ms]
                        thrpt:  [1.0275 GiB/s 1.0333 GiB/s 1.0381 GiB/s]
                 change:
                        time:   [−4.7073% −3.6719% −2.6698%] (p = 0.00 < 0.05)
                        thrpt:  [+2.7431% +3.8118% +4.9398%]
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  2 (2.00%) high mild
  10 (10.00%) high severe
list_primitive/zstd_parquet_2
                        time:   [2.0455 ms 2.0730 ms 2.1120 ms]
                        thrpt:  [1009.4 MiB/s 1.0043 GiB/s 1.0178 GiB/s]
                 change:
                        time:   [−5.8626% −3.7672% −1.4196%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4401% +3.9146% +6.2277%]
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  2 (2.00%) high mild
  5 (5.00%) high severe

Benchmarking list_primitive_non_null/default: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 6.6s, enable flat sampling, or reduce sample count to 60.
list_primitive_non_null/default
                        time:   [1.3199 ms 1.3333 ms 1.3504 ms]
                        thrpt:  [1.5384 GiB/s 1.5581 GiB/s 1.5740 GiB/s]
                 change:
                        time:   [−4.1662% −2.3491% −0.7148%] (p = 0.01 < 0.05)
                        thrpt:  [+0.7200% +2.4056% +4.3473%]
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) high mild
  3 (3.00%) high severe
Benchmarking list_primitive_non_null/bloom_filter: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.4s, enable flat sampling, or reduce sample count to 50.
list_primitive_non_null/bloom_filter
                        time:   [1.6567 ms 1.6668 ms 1.6805 ms]
                        thrpt:  [1.2362 GiB/s 1.2464 GiB/s 1.2540 GiB/s]
                 change:
                        time:   [−2.7884% −1.3493% +0.2820%] (p = 0.07 > 0.05)
                        thrpt:  [−0.2812% +1.3677% +2.8684%]
                        No change in performance detected.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe
Benchmarking list_primitive_non_null/parquet_2: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.2s, enable flat sampling, or reduce sample count to 50.
list_primitive_non_null/parquet_2
                        time:   [1.4279 ms 1.4409 ms 1.4551 ms]
                        thrpt:  [1.4277 GiB/s 1.4418 GiB/s 1.4550 GiB/s]
                 change:
                        time:   [−2.0598% −0.9952% −0.1318%] (p = 0.04 < 0.05)
                        thrpt:  [+0.1319% +1.0052% +2.1032%]
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
list_primitive_non_null/zstd
                        time:   [2.6966 ms 2.7358 ms 2.7994 ms]
                        thrpt:  [759.93 MiB/s 777.60 MiB/s 788.89 MiB/s]
                 change:
                        time:   [−3.8379% −2.1418% +0.0785%] (p = 0.03 < 0.05)
                        thrpt:  [−0.0784% +2.1887% +3.9911%]
                        Change within noise threshold.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
list_primitive_non_null/zstd_parquet_2
                        time:   [2.7684 ms 2.7861 ms 2.8099 ms]
                        thrpt:  [757.07 MiB/s 763.55 MiB/s 768.44 MiB/s]
                 change:
                        time:   [−6.4460% −4.1387% −2.1474%] (p = 0.00 < 0.05)
                        thrpt:  [+2.1946% +4.3174% +6.8901%]
                        Performance has improved.

Are there any user-facing changes?

None. Some internal symbols are now unused. I added #[allow(dead_code)] attributes to them, since they are visible through the experimental API and might be relied on externally.

Previously, the column writer accumulated raw definition and repetition
levels in `Vec<i16>` sinks (`def_levels_sink` / `rep_levels_sink`) and
only RLE-encoded them in bulk at page-flush time.

Replace the two sinks with streaming `LevelEncoder` fields. Levels are
now encoded incrementally as each `write_batch` call arrives, so only
the compact encoded bytes are held in memory at all times. At page
flush, the encoder is consumed and its bytes are written directly into
the page buffer; a fresh encoder is swapped in for the next page.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
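
For readers who want the shape of the change without opening the diff, the following is a minimal sketch of the pattern described in the commit message above. `ToyLevelEncoder` and `ToyColumnWriter` are hypothetical stand-ins, not the crate's `LevelEncoder` or `GenericColumnWriter`, and the run list is a simplification of the real encoded bytes.

```rust
use std::mem;

/// Toy streaming run-length encoder for levels (illustrative only).
#[derive(Default)]
struct ToyLevelEncoder {
    runs: Vec<(i16, u32)>,
}

impl ToyLevelEncoder {
    /// Fold a batch into the running state, extending the last run when possible.
    fn put(&mut self, levels: &[i16]) {
        for &level in levels {
            if let Some((value, len)) = self.runs.last_mut() {
                if *value == level {
                    *len += 1;
                    continue;
                }
            }
            self.runs.push((level, 1));
        }
    }

    /// Consume the encoder and hand back its encoded runs.
    fn consume(self) -> Vec<(i16, u32)> {
        self.runs
    }
}

struct ToyColumnWriter {
    def_levels_encoder: ToyLevelEncoder,
}

impl ToyColumnWriter {
    /// Levels are encoded as each batch arrives; no raw copy is kept.
    fn write_batch(&mut self, def_levels: &[i16]) {
        self.def_levels_encoder.put(def_levels);
    }

    /// At page flush, swap in a fresh encoder and consume the old one.
    fn flush_page(&mut self) -> Vec<(i16, u32)> {
        let encoder = mem::replace(&mut self.def_levels_encoder, ToyLevelEncoder::default());
        encoder.consume()
    }
}

fn main() {
    let mut writer = ToyColumnWriter { def_levels_encoder: ToyLevelEncoder::default() };
    writer.write_batch(&[0, 0, 0]);
    writer.write_batch(&[0, 1, 1]);
    // Runs span batch boundaries because the encoder state persists between calls.
    println!("{:?}", writer.flush_page());
}
```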
@github-actions github-actions bot added the parquet Changes to the parquet crate label Feb 20, 2026
Contributor

@brunal brunal left a comment

one small nit

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
@HippoBaro HippoBaro requested a review from brunal February 20, 2026 17:20
@alamb
Contributor

alamb commented Mar 18, 2026

run benchmark arrow_writer

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

Cloning into '/workspace/arrow-rs-branch'...
main
fatal: refusing to fetch into branch 'refs/heads/main' checked out at '/workspace/arrow-rs-branch'

Contributor

@etseidl etseidl left a comment

Looks sound to me. Thanks!

Comment on lines -1158 to -1159

// Reset state.
Contributor

Suggested change
// Reset state.
// Reset state.

/// Computes max buffer size for level encoder/decoder based on encoding, max
/// repetition/definition level and number of total buffered values (includes null
/// values).
#[allow(dead_code)]
Contributor

should the dead code be deprecated?

Contributor

How about just removing it?

Contributor

ahh, I didn't notice that the encodings module is experimental...thought this was public. Yes, let's just remove the dead code then :)

@etseidl
Contributor

etseidl commented Mar 19, 2026

I plan to benchmark on my WS since the bot seems tired

@etseidl
Contributor

etseidl commented Mar 19, 2026

run benchmark arrow_writer

@adriangbot

Benchmark for this request failed.

Last 20 lines of output:

Cloning into '/workspace/arrow-rs-branch'...
main
fatal: refusing to fetch into branch 'refs/heads/main' checked out at '/workspace/arrow-rs-branch'

@etseidl
Contributor

etseidl commented Mar 19, 2026

Ahh, I get it. The benchbot can't handle the remote branch also being named "main".

Not much difference either way on my laptop

group                                     9447                                    main
-----                                     ----                                    ----
bool/bloom_filter                         1.00    157.7±1.83µs     6.7 MB/sec     1.00    157.4±2.00µs     6.7 MB/sec
bool/default                              1.01     67.5±2.00µs    15.7 MB/sec     1.00     66.6±1.55µs    15.9 MB/sec
bool/parquet_2                            1.01     83.1±1.88µs    12.8 MB/sec     1.00     82.1±1.65µs    12.9 MB/sec
bool/zstd                                 1.00     78.5±1.82µs    13.5 MB/sec     1.01     79.6±2.78µs    13.3 MB/sec
bool/zstd_parquet_2                       1.00     92.3±1.55µs    11.5 MB/sec     1.01     92.9±1.71µs    11.4 MB/sec
bool_non_null/bloom_filter                1.00    121.9±1.31µs     4.7 MB/sec     1.01    122.6±1.68µs     4.7 MB/sec
bool_non_null/default                     1.00     29.4±0.48µs    19.4 MB/sec     1.00     29.4±0.64µs    19.5 MB/sec
bool_non_null/parquet_2                   1.00     46.0±0.73µs    12.5 MB/sec     1.01     46.6±1.04µs    12.3 MB/sec
bool_non_null/zstd                        1.00     39.0±0.59µs    14.7 MB/sec     1.01     39.5±0.76µs    14.5 MB/sec
bool_non_null/zstd_parquet_2              1.00     56.5±1.05µs    10.1 MB/sec     1.01     57.0±1.07µs    10.0 MB/sec
float_with_nans/bloom_filter              1.01  1100.0±21.00µs    50.0 MB/sec     1.00  1090.2±15.25µs    50.4 MB/sec
float_with_nans/default                   1.00   712.1±20.32µs    77.2 MB/sec     1.01   718.0±28.49µs    76.5 MB/sec
float_with_nans/parquet_2                 1.01  1001.0±30.26µs    54.9 MB/sec     1.00   995.0±24.92µs    55.2 MB/sec
float_with_nans/zstd                      1.00   907.4±20.16µs    60.6 MB/sec     1.00   902.9±15.18µs    60.9 MB/sec
float_with_nans/zstd_parquet_2            1.01  1190.0±32.39µs    46.2 MB/sec     1.00  1181.1±25.41µs    46.5 MB/sec
list_primitive/bloom_filter               1.02      2.9±0.04ms   736.6 MB/sec     1.00      2.8±0.06ms   750.1 MB/sec
list_primitive/default                    1.00      2.0±0.05ms  1041.7 MB/sec     1.00      2.0±0.04ms  1044.9 MB/sec
list_primitive/parquet_2                  1.00      2.1±0.04ms  1036.1 MB/sec     1.00      2.1±0.04ms  1039.4 MB/sec
list_primitive/zstd                       1.00      3.5±0.06ms   614.6 MB/sec     1.00      3.5±0.08ms   614.2 MB/sec
list_primitive/zstd_parquet_2             1.00      3.6±0.07ms   593.3 MB/sec     1.01      3.6±0.07ms   585.1 MB/sec
list_primitive_non_null/bloom_filter      1.00      3.4±0.17ms   627.5 MB/sec     1.10      3.7±0.10ms   570.8 MB/sec
list_primitive_non_null/default           1.00      2.2±0.06ms   953.4 MB/sec     1.00      2.2±0.05ms   950.7 MB/sec
list_primitive_non_null/parquet_2         1.15      3.0±0.07ms   716.4 MB/sec     1.00      2.6±0.06ms   826.1 MB/sec
list_primitive_non_null/zstd              1.00      6.4±0.12ms   332.0 MB/sec     1.01      6.5±0.29ms   328.3 MB/sec
list_primitive_non_null/zstd_parquet_2    1.01      6.4±0.13ms   330.2 MB/sec     1.00      6.4±0.15ms   331.9 MB/sec
primitive/bloom_filter                    1.01  1688.0±39.53µs   104.2 MB/sec     1.00  1668.9±38.29µs   105.4 MB/sec
primitive/default                         1.00   863.1±15.45µs   203.8 MB/sec     1.00   861.5±18.26µs   204.2 MB/sec
primitive/parquet_2                       1.00   863.8±14.79µs   203.7 MB/sec     1.00   867.1±29.50µs   202.9 MB/sec
primitive/zstd                            1.00  1227.9±24.67µs   143.3 MB/sec     1.00  1230.7±30.77µs   143.0 MB/sec
primitive/zstd_parquet_2                  1.00  1273.0±31.28µs   138.2 MB/sec     1.00  1271.6±38.95µs   138.4 MB/sec
primitive_non_null/bloom_filter           1.00  1561.1±35.74µs   110.5 MB/sec     1.00  1555.0±39.18µs   110.9 MB/sec
primitive_non_null/default                1.00   670.8±20.08µs   257.2 MB/sec     1.00   671.3±13.90µs   257.0 MB/sec
primitive_non_null/parquet_2              1.00   674.1±22.93µs   255.9 MB/sec     1.00   676.5±13.99µs   255.0 MB/sec
primitive_non_null/zstd                   1.01  1070.7±27.28µs   161.1 MB/sec     1.00  1064.8±31.81µs   162.0 MB/sec
primitive_non_null/zstd_parquet_2         1.00  1151.1±49.02µs   149.9 MB/sec     1.01  1159.6±46.01µs   148.8 MB/sec
string/bloom_filter                       1.01  1641.3±35.39µs  1247.8 MB/sec     1.00  1628.6±32.02µs  1257.6 MB/sec
string/default                            1.00  1025.0±24.08µs  1998.1 MB/sec     1.00  1025.1±30.04µs  1998.0 MB/sec
string/parquet_2                          1.00  1028.6±28.34µs  1991.1 MB/sec     1.01  1039.8±35.29µs  1969.7 MB/sec
string/zstd                               1.00      2.8±0.06ms   719.8 MB/sec     1.00      2.8±0.06ms   721.0 MB/sec
string/zstd_parquet_2                     1.00      3.2±0.16ms   636.7 MB/sec     1.01      3.2±0.07ms   632.3 MB/sec
string_and_binary_view/bloom_filter       1.00   739.4±21.56µs   170.7 MB/sec     1.00   737.5±33.35µs   171.1 MB/sec
string_and_binary_view/default            1.01   473.7±12.23µs   266.4 MB/sec     1.00    468.0±8.82µs   269.6 MB/sec
string_and_binary_view/parquet_2          1.01    477.5±7.91µs   264.3 MB/sec     1.00    472.2±8.99µs   267.2 MB/sec
string_and_binary_view/zstd               1.01   765.5±27.71µs   164.8 MB/sec     1.00   757.7±12.94µs   166.5 MB/sec
string_and_binary_view/zstd_parquet_2     1.01   746.7±13.67µs   169.0 MB/sec     1.00   742.1±14.61µs   170.0 MB/sec
string_dictionary/bloom_filter            1.02   854.5±26.17µs  1207.8 MB/sec     1.00   835.9±14.69µs  1234.6 MB/sec
string_dictionary/default                 1.01   526.6±15.16µs  1959.9 MB/sec     1.00   522.1±19.22µs  1976.6 MB/sec
string_dictionary/parquet_2               1.03   539.7±23.54µs  1912.1 MB/sec     1.00   521.7±16.10µs  1978.3 MB/sec
string_dictionary/zstd                    1.04  1629.5±96.16µs   633.3 MB/sec     1.00  1564.0±35.74µs   659.9 MB/sec
string_dictionary/zstd_parquet_2          1.04  1451.9±78.11µs   710.8 MB/sec     1.00  1399.6±31.57µs   737.3 MB/sec
string_non_null/bloom_filter              1.00      2.3±0.12ms   894.3 MB/sec     1.03      2.3±0.07ms   871.6 MB/sec
string_non_null/default                   1.04  1525.9±103.77µs  1341.6 MB/sec    1.00  1471.2±71.91µs  1391.5 MB/sec
string_non_null/parquet_2                 1.00  1440.6±52.50µs  1421.1 MB/sec     1.01  1459.1±57.98µs  1403.0 MB/sec
string_non_null/zstd                      1.00      4.0±0.12ms   511.3 MB/sec     1.01      4.1±0.12ms   505.4 MB/sec
string_non_null/zstd_parquet_2            1.00      4.2±0.14ms   491.8 MB/sec     1.00      4.2±0.11ms   492.3 MB/sec

On my WS the times are all over the place :( The huge regressions can show up on either branch (could be too much else going on during benchmarking)

group                                     9447                                   main
-----                                     ----                                   ----
bool/bloom_filter                         1.01     61.7±0.24µs    17.2 MB/sec    1.00     61.3±0.59µs    17.3 MB/sec
bool/default                              1.01     27.0±0.10µs    39.2 MB/sec    1.00     26.9±0.28µs    39.4 MB/sec
bool/parquet_2                            1.02     31.2±0.31µs    34.0 MB/sec    1.00     30.7±0.21µs    34.5 MB/sec
bool/zstd                                 1.01     32.5±0.23µs    32.6 MB/sec    1.00     32.4±0.36µs    32.8 MB/sec
bool/zstd_parquet_2                       1.01     35.9±0.21µs    29.5 MB/sec    1.00     35.6±0.67µs    29.8 MB/sec
bool_non_null/bloom_filter                1.01     52.0±0.37µs    11.0 MB/sec    1.00     51.5±0.45µs    11.1 MB/sec
bool_non_null/default                     1.00     12.5±0.09µs    45.8 MB/sec    1.00     12.5±0.07µs    45.8 MB/sec
bool_non_null/parquet_2                   1.01     17.8±0.06µs    32.1 MB/sec    1.00     17.6±0.24µs    32.5 MB/sec
bool_non_null/zstd                        1.01     17.4±0.27µs    32.9 MB/sec    1.00     17.1±0.08µs    33.4 MB/sec
bool_non_null/zstd_parquet_2              1.02     22.6±0.09µs    25.3 MB/sec    1.00     22.2±0.24µs    25.7 MB/sec
float_with_nans/bloom_filter              1.00    412.4±6.75µs   133.3 MB/sec    1.00    414.5±1.54µs   132.6 MB/sec
float_with_nans/default                   1.00    243.8±2.62µs   225.5 MB/sec    1.02    247.9±1.34µs   221.7 MB/sec
float_with_nans/parquet_2                 1.00    383.7±1.83µs   143.2 MB/sec    1.02    391.6±5.04µs   140.4 MB/sec
float_with_nans/zstd                      1.00    354.7±3.99µs   155.0 MB/sec    1.01    357.4±1.57µs   153.8 MB/sec
float_with_nans/zstd_parquet_2            1.00    490.2±5.70µs   112.1 MB/sec    1.02    498.8±2.48µs   110.2 MB/sec
list_primitive/bloom_filter               1.00  1256.3±15.07µs  1697.0 MB/sec    1.56   1955.6±9.67µs  1090.1 MB/sec
list_primitive/default                    1.00    948.9±5.49µs     2.2 GB/sec    1.03   975.8±20.29µs     2.1 GB/sec
list_primitive/parquet_2                  1.00    970.7±3.47µs     2.1 GB/sec    1.32  1282.9±17.01µs  1661.8 MB/sec
list_primitive/zstd                       1.00  1611.5±16.26µs  1322.9 MB/sec    1.00   1611.3±9.27µs  1323.1 MB/sec
list_primitive/zstd_parquet_2             1.00   1623.5±8.28µs  1313.2 MB/sec    1.22  1977.4±17.31µs  1078.1 MB/sec
list_primitive_non_null/bloom_filter      1.00  1429.8±25.68µs  1487.9 MB/sec    1.54      2.2±0.01ms   965.6 MB/sec
list_primitive_non_null/default           1.00    972.8±6.84µs     2.1 GB/sec    1.02   992.6±12.82µs     2.1 GB/sec
list_primitive_non_null/parquet_2         1.00   1049.3±7.22µs  2027.3 MB/sec    1.37  1438.6±17.86µs  1478.8 MB/sec
list_primitive_non_null/zstd              1.00      2.1±0.03ms  1009.0 MB/sec    1.21      2.6±0.01ms   831.4 MB/sec
list_primitive_non_null/zstd_parquet_2    1.00      2.1±0.03ms   998.8 MB/sec    1.32      2.8±0.05ms   759.5 MB/sec
primitive/bloom_filter                    1.02  1739.9±56.99µs   101.1 MB/sec    1.00  1701.9±32.84µs   103.4 MB/sec
primitive/default                         1.02    351.6±1.71µs   500.4 MB/sec    1.00    344.4±4.48µs   510.9 MB/sec
primitive/parquet_2                       1.05    402.0±3.50µs   437.7 MB/sec    1.00   382.7±12.07µs   459.7 MB/sec
primitive/zstd                            1.02    512.4±6.91µs   343.4 MB/sec    1.00    502.7±4.61µs   350.0 MB/sec
primitive/zstd_parquet_2                  1.00    480.7±2.43µs   366.0 MB/sec    1.24    596.9±7.32µs   294.8 MB/sec
primitive_non_null/bloom_filter           1.00   690.2±11.84µs   250.0 MB/sec    2.42   1671.9±9.10µs   103.2 MB/sec
primitive_non_null/default                1.00    261.3±1.05µs   660.1 MB/sec    1.00    260.9±2.12µs   661.1 MB/sec
primitive_non_null/parquet_2              1.00    262.1±1.87µs   658.3 MB/sec    1.15    301.1±2.85µs   572.9 MB/sec
primitive_non_null/zstd                   1.02    399.8±4.15µs   431.6 MB/sec    1.00    393.3±2.69µs   438.6 MB/sec
primitive_non_null/zstd_parquet_2         1.00    398.6±5.22µs   432.8 MB/sec    1.30    518.4±4.04µs   332.8 MB/sec
string/bloom_filter                       1.00    663.8±4.72µs     3.0 GB/sec    1.69   1121.8±5.38µs  1825.7 MB/sec
string/default                            1.00    418.0±1.79µs     4.8 GB/sec    1.00    419.7±5.55µs     4.8 GB/sec
string/parquet_2                          1.00    433.1±1.89µs     4.6 GB/sec    1.43    618.4±6.76µs     3.2 GB/sec
string/zstd                               1.00   1194.9±6.74µs  1714.0 MB/sec    1.02  1212.9±28.17µs  1688.6 MB/sec
string/zstd_parquet_2                     1.00  1204.0±17.90µs  1701.1 MB/sec    1.41   1694.4±9.14µs  1208.7 MB/sec
string_and_binary_view/bloom_filter       1.00    296.9±2.99µs   425.0 MB/sec    1.02    304.3±2.02µs   414.7 MB/sec
string_and_binary_view/default            1.00    187.7±0.66µs   672.5 MB/sec    1.05    196.6±3.12µs   641.8 MB/sec
string_and_binary_view/parquet_2          1.00    191.8±2.10µs   657.8 MB/sec    1.05    201.4±0.88µs   626.5 MB/sec
string_and_binary_view/zstd               1.00    333.5±1.86µs   378.4 MB/sec    1.05    348.6±1.60µs   362.0 MB/sec
string_and_binary_view/zstd_parquet_2     1.00    326.6±2.85µs   386.4 MB/sec    1.07    349.3±1.93µs   361.3 MB/sec
string_dictionary/bloom_filter            1.00    331.9±2.10µs     3.0 GB/sec    1.00    330.8±3.34µs     3.0 GB/sec
string_dictionary/default                 1.02    211.2±3.43µs     4.8 GB/sec    1.00    208.0±0.88µs     4.8 GB/sec
string_dictionary/parquet_2               1.01    211.3±2.05µs     4.8 GB/sec    1.00    209.0±1.22µs     4.8 GB/sec
string_dictionary/zstd                    1.00    602.9±6.44µs  1711.7 MB/sec    1.00    601.1±3.78µs  1716.9 MB/sec
string_dictionary/zstd_parquet_2          1.00    597.6±3.65µs  1726.8 MB/sec    1.48   881.7±11.13µs  1170.5 MB/sec
string_non_null/bloom_filter              1.00   902.2±24.38µs     2.2 GB/sec    1.58  1424.6±21.18µs  1436.9 MB/sec
string_non_null/default                   1.00    587.5±6.36µs     3.4 GB/sec    1.01    591.1±3.54µs     3.4 GB/sec
string_non_null/parquet_2                 1.00    584.6±3.27µs     3.4 GB/sec    1.48   865.1±18.32µs     2.3 GB/sec
string_non_null/zstd                      1.00  1634.6±15.53µs  1252.4 MB/sec    1.31      2.1±0.01ms   956.0 MB/sec
string_non_null/zstd_parquet_2            1.00   1630.4±7.24µs  1255.6 MB/sec    1.39      2.3±0.01ms   902.4 MB/sec

Contributor

@alamb alamb left a comment

Thank you @HippoBaro I agree with @etseidl . I think this one is ready to go. I think we can merge it and address follow ons as another PR.

Let us know what you prefer

max_rep_level,
)[..],
let encoder = mem::replace(
&mut self.rep_levels_encoder,
Contributor

This path recreates the encoders each time (and thus probably allocates, etc)

It seems like the old code path does the same thing (in encode_levels_v1, ..)

@HippoBaro would you be open to exploring adding a "clear" method to the encoder now that you have it encapsulated to save the allocation?

Previously, flushing a data page would `mem::replace` each level encoder
with a freshly allocated one, consuming the old encoder to get its
buffer. This allocated new internal `Vec`s on every page boundary.

We now preserve the internal state of the encoder and reuse memory
across pages.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
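
The difference between swapping in a new encoder and resetting one in place can be sketched as follows. `ToyEncoder` and `flush_into` are hypothetical names chosen for illustration and are not taken from the PR; the only library fact relied on is that `Vec::clear` drops the contents while keeping the allocated capacity.

```rust
/// Toy encoder that owns a byte buffer (illustrative only).
struct ToyEncoder {
    buffer: Vec<u8>,
}

impl ToyEncoder {
    /// Append already-encoded bytes to the internal buffer.
    fn put(&mut self, bytes: &[u8]) {
        self.buffer.extend_from_slice(bytes);
    }

    /// Copy the encoded bytes into the page buffer, then reset in place.
    /// The allocation behind `self.buffer` is kept for the next page.
    fn flush_into(&mut self, page: &mut Vec<u8>) {
        page.extend_from_slice(&self.buffer);
        self.buffer.clear();
    }
}

fn main() {
    let mut encoder = ToyEncoder { buffer: Vec::with_capacity(1024) };
    let mut page = Vec::new();

    encoder.put(&[1, 2, 3]);
    let capacity_before = encoder.buffer.capacity();
    encoder.flush_into(&mut page);

    // The buffer is empty again, but no new allocation was needed.
    println!("page bytes: {:?}", page);
    println!("capacity kept: {}", encoder.buffer.capacity() == capacity_before);
}
```

Compared with `mem::replace` plus a fresh encoder, this trades one copy of the encoded bytes for not reallocating the internal `Vec` on every page boundary.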
@HippoBaro
Contributor Author

HippoBaro commented Mar 25, 2026

@alamb @etseidl Thank you both for the review! See 0e28fba, which now preserves internal state across pages. This latest change also adds `consume` to the set of dead symbols.

Your comment suggests we should address those in a follow-up. I’m happy to open another PR after this one. I can also remove them here if that’s more appropriate. Let me know.

@HippoBaro HippoBaro force-pushed the main branch 3 times, most recently from 757f11c to 0e28fba Compare March 27, 2026 14:57
@HippoBaro
Contributor Author

Apologies for the force-push there; using my fork's main branch was a mistake. The commits are restored as before.

@etseidl
Contributor

etseidl commented Mar 27, 2026

Thanks @HippoBaro, this looks even better now. I'm fine with simply removing the dead code in this PR.

This does have me wondering if we should remove the BIT_PACKED encoder. We need a decoder for backwards compatibility, but we shouldn't ever be using it for level encoding.

@alamb
Contributor

alamb commented Mar 31, 2026

> This does have me wondering if we should remove the BIT_PACKED encoder. We need a decoder for backwards compatibility, but we shouldn't ever be using it for level encoding.

@alamb alamb mentioned this pull request Mar 31, 2026
@alamb
Contributor

alamb commented Mar 31, 2026

I made a fake PR to run benchmarks on:

Assuming they all look good I will merge this PR

@alamb
Contributor

alamb commented Mar 31, 2026

🤔 the benchmarks found an issue: #9636 (comment)

It doesn't seem to be introduced by this PR though. Looking more carefully

@alamb
Contributor

alamb commented Mar 31, 2026

> I made a fake PR to run benchmarks on:
>
> Assuming they all look good I will merge this PR

> 🤔 the benchmarks found an issue: #9636 (comment)

The issue is

I made a PR to disable this benchmark temporarily:

The `v1`, `v2`, and `max_buffer_size` functions required knowing the
number of values upfront and pre-allocated buffers. All callers have
been migrated to the streaming variants (`v1_streaming`,
`v2_streaming`), so remove the dead code and switch the last remaining
caller in test utils.

Signed-off-by: Hippolyte Barraud <hippolyte.barraud@datadoghq.com>
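
As a rough illustration of why a pre-sizing function needs the value count up front, the sketch below derives a buffer bound from the maximum level and the number of values. Both helpers are hypothetical, and the bound is a simplification (plain bit-packing, no RLE run headers or padding), not the formula the removed `max_buffer_size` used.

```rust
/// Bits needed to store any level in `0..=max_level`.
fn level_bit_width(max_level: i16) -> u32 {
    16 - (max_level as u16).leading_zeros()
}

/// Upper bound, in bytes, for plainly bit-packing `num_values` levels.
/// Toy bound for illustration only.
fn packed_size_upper_bound(max_level: i16, num_values: usize) -> usize {
    (num_values * level_bit_width(max_level) as usize + 7) / 8
}

fn main() {
    // A pre-allocating constructor would have to be told `num_values` here,
    // whereas a streaming encoder can simply grow as levels arrive.
    let num_values = 1_000;
    let max_level = 3;
    println!("bit width: {}", level_bit_width(max_level));
    println!("bytes:     {}", packed_size_upper_bound(max_level, num_values));
}
```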
@HippoBaro
Contributor Author

@alamb All dead code removed as requested in f2767de

alamb added a commit that referenced this pull request Apr 1, 2026
# Which issue does this PR close?

- Part of #9637
# Rationale for this change

I can't benchmark the arrow-writer changes in
#9447 due to hitting a panic:
- #9637

# What changes are included in this PR?

Temporarily disable the cdc benchmarks until the underlying bug is fixed

# Are these changes tested?

# Are there any user-facing changes?

@alamb
Contributor

alamb commented Apr 1, 2026

I am looking into this one trying to get CI to pass / clean (and benchmarks run) so we can merge it in. So close

@alamb
Contributor

alamb commented Apr 1, 2026

The benchmarks on #9636 show no noticeable difference so I think this one is good to go

Thank you @HippoBaro for bearing with us and thank you @etseidl for the review

@alamb alamb merged commit a05129a into apache:main Apr 1, 2026
32 checks passed

Labels

parquet (Changes to the parquet crate), performance

Development

Successfully merging this pull request may close these issues.

Parquet: Raw level buffering causes unbounded memory growth for sparse columns

5 participants