[PoC]: Yet another implementation of PARQUET-2249: Introduce IEEE 754 total order by etseidl · Pull Request #9619 · apache/arrow-rs

etseidl · 2026-03-26T18:53:06Z

Which issue does this PR close?

Closes #8156 ([Parquet] Prototype: PARQUET-2249: Introduce IEEE 754 total order & NaN-counts #514).

Rationale for this change

This takes the implementation by @Xuanwo (#8158) and updates it for the new thrift format and for recent changes to the original proposal (apache/parquet-format#514).

What changes are included in this PR?

Adds needed thrift structures as well as NaN counts for pages and column chunks.

Are these changes tested?

Yes, new tests added (more may be needed).

Are there any user-facing changes?

Yes, this is a breaking change.

etseidl · 2026-03-26T23:29:45Z

parquet/src/column/writer/encoder.rs

+// For floating point we need to compare NaN values until we encounter a non-NaN
+// value which then becomes the new min/max. After this, only non-NaN values are
+// evaluated. If all values are NaN, then the min/max NaNs as determined by
+// IEEE 754 total order are returned.


This has me a bit worried. I need to do some benchmarking to make sure all the complicated NaN logic isn't killing performance.

Agree, this method is on the hot path. I had a look at optimizing it, but could not yet get the compiler to generate nice auto-vectorized code for the NaN handling. I think we can try optimizations in a follow-up; it is more important to get the semantics correct first and make sure there are tests for edge cases.

In that regard, about this requirement:

// If all values are NaN, then the min/max NaNs as determined by
// IEEE 754 total order are returned.

Does the current code correctly distinguish different NaN payloads according to their sign and bit patterns?

(Solved, GitHub was hiding the changes to compare_greater in mod.rs)
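
The payload question can be checked against the standard library: `f32::total_cmp` implements the IEEE 754 totalOrder predicate, which orders NaNs by sign bit and then by bit pattern. The following is a minimal sketch for illustration only, not the PR's `compare_greater`; the names `total_order_min_max` and `sample_nans` are hypothetical and do not exist in the parquet crate.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: (min, max) of a slice under IEEE 754 total order,
/// computed with the standard library's `f32::total_cmp`.
fn total_order_min_max(values: &[f32]) -> Option<(f32, f32)> {
    let mut iter = values.iter().copied();
    let first = iter.next()?;
    Some(iter.fold((first, first), |(min, max), v| {
        (
            if v.total_cmp(&min) == Ordering::Less { v } else { min },
            if v.total_cmp(&max) == Ordering::Greater { v } else { max },
        )
    }))
}

/// Four quiet NaNs that differ only in sign bit and payload bits.
fn sample_nans() -> [f32; 4] {
    [
        f32::from_bits(0x7FC0_0000), // +NaN, payload 0
        f32::from_bits(0xFFC0_0001), // -NaN, payload 1
        f32::from_bits(0x7FC0_0001), // +NaN, payload 1
        f32::from_bits(0xFFC0_0000), // -NaN, payload 0
    ]
}
```

For `sample_nans()`, the minimum is the negative NaN with the larger payload (`0xFFC0_0001`) and the maximum is the positive NaN with the larger payload (`0x7FC0_0001`). Negative NaNs sort below negative infinity and positive NaNs above positive infinity, so an all-NaN page still has a well-defined min and max.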

etseidl · 2026-03-26T23:32:46Z

parquet/src/column/writer/mod.rs


 fn update_min<T: ParquetValueType>(descr: &ColumnDescriptor, val: &T, min: &mut Option<T>) {
-    update_stat::<T, _>(descr, val, min, |cur| compare_greater(descr, cur, val))
+    if min.is_none() {


Changes here and to update_max also worry me. It's just complicated because we can't simply exclude NaN. In the case of all-NaN we have to properly order them so we get the min and max NaN, but if a non-NaN shows up we have to start over. Same thing in get_min_max().

This is why I prefer the other solution of simply using total order and dealing with the possibility of NaN in the statistics.
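
To make the "start over" behaviour concrete, here is a minimal sketch of a running min/max that follows the rule described in the encoder comment: NaNs are ordered among themselves by total order, the first non-NaN value replaces any NaN statistic, and later NaNs are only counted. This is an assumption-laden illustration, not the code in this PR; `FloatStats` and `pick` are hypothetical names, and the real `update_min`/`update_max` work on `ParquetValueType` with a `ColumnDescriptor`.

```rust
use std::cmp::Ordering;

/// Hypothetical running statistics for one float column (not the PR's types).
#[derive(Debug, Default)]
struct FloatStats {
    min: Option<f64>,
    max: Option<f64>,
    nan_count: u64,
}

impl FloatStats {
    /// Folds one value into the running min, max, and NaN count.
    fn update(&mut self, val: f64) {
        if val.is_nan() {
            self.nan_count += 1;
        }
        self.min = Some(match self.min {
            None => val,
            Some(cur) => pick(cur, val, Ordering::Less),
        });
        self.max = Some(match self.max {
            None => val,
            Some(cur) => pick(cur, val, Ordering::Greater),
        });
    }
}

/// Chooses between the current statistic and a new value.
/// `want` is `Less` when tracking the min and `Greater` when tracking the max.
fn pick(cur: f64, val: f64, want: Ordering) -> f64 {
    match (cur.is_nan(), val.is_nan()) {
        // A non-NaN value replaces any NaN seen so far (the "start over" case).
        (true, false) => val,
        // Once a non-NaN value is held, NaNs no longer affect min/max.
        (false, true) => cur,
        // Both NaN or both non-NaN: IEEE 754 total order decides.
        _ => {
            if val.total_cmp(&cur) == want {
                val
            } else {
                cur
            }
        }
    }
}
```

The extra `is_nan` branches in `pick` are the per-value cost that the plain total-order alternative avoids: there, every value would go through `total_cmp` alone and a NaN could end up in the statistics.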

etseidl added 5 commits March 24, 2026 14:24

add total order enum variants

3821611

clean up sort_order

fca8c2e

add nan counts from @Xuanwo

032ded6

mod tests for new float ordering

00d2786

add more tests from @Xuanwo

410f365

github-actions bot added the parquet Changes to the parquet crate label Mar 26, 2026

etseidl added the api-change Changes to the arrow API label Mar 26, 2026

etseidl mentioned this pull request Mar 26, 2026

PARQUET-2249: Introduce IEEE 754 total order & NaN-counts apache/parquet-format#514

Open

etseidl added 3 commits March 26, 2026 12:01

clippy and formatting

e0f9d07

add test of mixed all-nan/some-nan/no-nan pages

08b77cb

fix NaN updates across pages

0621be1

fix comment

2da596c

etseidl wants to merge 9 commits into apache:main from etseidl:total_order_514

jhorstmann · Mar 30, 2026 (edited)
2 participants