
perf: remove extra pass used for inverse transforms#114

Merged
Shnatsel merged 1 commit into main from improve-inverse-transforms on Apr 17, 2026


@smu160 (Member) commented Apr 17, 2026

No description provided.

@codecov-commenter

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.80%. Comparing base (f18c455) to head (1bc5dee).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #114      +/-   ##
==========================================
- Coverage   98.80%   98.80%   -0.01%     
==========================================
  Files           9        9              
  Lines        2090     2086       -4     
==========================================
- Hits         2065     2061       -4     
  Misses         25       25              


@smu160 (Member, Author) commented Apr 17, 2026

Benchmark results from running cargo bench --bench bench "Inverse" first on main and then on this branch (improve-inverse-transforms), so each "change:" entry compares this branch against main.
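As a reading aid for the log, here is a small Rust sketch that recomputes the two "thrpt:" figures and the verdict printed under each "change:" block from the numbers shown. It rests on assumptions not stated in the PR: each element is one interleaved complex value (8 bytes for f32, 16 for f64, which matches the printed Melem/s-to-GiB/s ratios), and the run uses Criterion's default significance level (0.05) and noise threshold (1%). The function and type names are hypothetical and are not part of PhastFT or Criterion.

```rust
// Reading aid for Criterion's `thrpt:` and `change:` lines. Assumptions (not
// taken from the PR): one element = one interleaved complex value, and the
// default significance level (0.05) and noise threshold (1%) are in effect.

/// Bytes occupied by one complex value whose components have type `T`.
fn bytes_per_complex<T>() -> usize {
    2 * std::mem::size_of::<T>()
}

/// Converts the mean time for one `n`-point transform into (Melem/s, GiB/s).
fn throughput(n: usize, time_ns: f64, bytes_per_elem: usize) -> (f64, f64) {
    let elems_per_sec = n as f64 / (time_ns * 1e-9);
    let gib_per_sec = elems_per_sec * bytes_per_elem as f64 / (1u64 << 30) as f64;
    (elems_per_sec / 1e6, gib_per_sec)
}

/// The four verdicts Criterion prints under a `change:` block.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// Classifies a change from its p-value and the lower/upper bounds of the
/// confidence interval on the relative change in time (-0.05 means -5%).
fn classify(p: f64, lb: f64, ub: f64, significance: f64, noise: f64) -> Verdict {
    if p >= significance {
        Verdict::NoChange // difference is not statistically significant
    } else if ub < -noise {
        Verdict::Improved // whole interval is faster by more than the threshold
    } else if lb > noise {
        Verdict::Regressed // whole interval is slower by more than the threshold
    } else {
        Verdict::WithinNoise // significant, but the interval overlaps +/- noise
    }
}

fn main() {
    let (sig, noise) = (0.05, 0.01); // assumed Criterion defaults

    // Inverse f32/PhastFT DIT/64: mean time 105.34 ns per transform.
    let (melem, gib) = throughput(64, 105.34, bytes_per_complex::<f32>());
    println!("f32 DIT/64: {melem:.2} Melem/s, {gib:.4} GiB/s");

    // Inverse f32/PhastFT DIT/64: p = 0.00, time change [-4.8154% .. -3.6227%].
    println!("{:?}", classify(0.0, -0.048154, -0.036227, sig, noise));
    // Inverse f32/PhastFT DIT/128: p = 0.00, time change [-1.9898% .. -0.5853%].
    println!("{:?}", classify(0.0, -0.019898, -0.005853, sig, noise));
}
```

Fed the first PhastFT entries of the log, the throughput call gives roughly 607.56 Melem/s and 4.53 GiB/s, and the two classify calls return Improved and WithinNoise, matching the verdicts printed for DIT/64 and DIT/128. The DIT/128 case shows why a statistically significant speed-up can still be reported as noise: the upper end of its interval (−0.59%) does not clear the −1% threshold.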

     Running benches/bench.rs (target/release/deps/bench-bbf0fdc7d8455996)
Inverse f32/PhastFT DIT/64
                        time:   [104.98 ns 105.34 ns 105.64 ns]
                        thrpt:  [605.83 Melem/s 607.56 Melem/s 609.62 Melem/s]
                        thrpt:  [4.5138 GiB/s 4.5267 GiB/s 4.5420 GiB/s]
                 change:
                        time:   [−4.8154% −4.1888% −3.6227%] (p = 0.00 < 0.05)
                        thrpt:  [+3.7589% +4.3720% +5.0590%]
                        Performance has improved.
Inverse f32/RustFFT/64  time:   [82.849 ns 83.078 ns 83.313 ns]
                        thrpt:  [768.19 Melem/s 770.36 Melem/s 772.49 Melem/s]
                        thrpt:  [5.7234 GiB/s 5.7396 GiB/s 5.7555 GiB/s]
                 change:
                        time:   [−0.2569% +0.3474% +0.9639%] (p = 0.28 > 0.05)
                        thrpt:  [−0.9547% −0.3462% +0.2576%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f32/PhastFT DIT/128
                        time:   [191.54 ns 192.26 ns 192.91 ns]
                        thrpt:  [663.51 Melem/s 665.76 Melem/s 668.28 Melem/s]
                        thrpt:  [4.9435 GiB/s 4.9603 GiB/s 4.9791 GiB/s]
                 change:
                        time:   [−1.9898% −1.2865% −0.5853%] (p = 0.00 < 0.05)
                        thrpt:  [+0.5887% +1.3033% +2.0302%]
                        Change within noise threshold.
Inverse f32/RustFFT/128 time:   [161.96 ns 162.57 ns 163.05 ns]
                        thrpt:  [785.05 Melem/s 787.37 Melem/s 790.30 Melem/s]
                        thrpt:  [5.8490 GiB/s 5.8664 GiB/s 5.8882 GiB/s]
                 change:
                        time:   [−0.4560% +0.1637% +0.7676%] (p = 0.61 > 0.05)
                        thrpt:  [−0.7617% −0.1634% +0.4581%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/256
                        time:   [384.81 ns 386.86 ns 388.49 ns]
                        thrpt:  [658.97 Melem/s 661.73 Melem/s 665.26 Melem/s]
                        thrpt:  [4.9097 GiB/s 4.9303 GiB/s 4.9566 GiB/s]
                 change:
                        time:   [+0.5728% +1.7399% +2.8282%] (p = 0.01 < 0.05)
                        thrpt:  [−2.7504% −1.7102% −0.5696%]
                        Change within noise threshold.
Inverse f32/RustFFT/256 time:   [356.68 ns 358.39 ns 359.98 ns]
                        thrpt:  [711.15 Melem/s 714.32 Melem/s 717.74 Melem/s]
                        thrpt:  [5.2985 GiB/s 5.3221 GiB/s 5.3476 GiB/s]
                 change:
                        time:   [−0.4336% +0.2326% +0.9208%] (p = 0.52 > 0.05)
                        thrpt:  [−0.9124% −0.2320% +0.4355%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/512
                        time:   [739.72 ns 742.29 ns 744.47 ns]
                        thrpt:  [687.74 Melem/s 689.75 Melem/s 692.16 Melem/s]
                        thrpt:  [5.1241 GiB/s 5.1391 GiB/s 5.1570 GiB/s]
                 change:
                        time:   [−2.7715% −1.9877% −1.2448%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2605% +2.0280% +2.8505%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f32/RustFFT/512 time:   [742.34 ns 745.53 ns 748.44 ns]
                        thrpt:  [684.09 Melem/s 686.76 Melem/s 689.71 Melem/s]
                        thrpt:  [5.0969 GiB/s 5.1167 GiB/s 5.1387 GiB/s]
                 change:
                        time:   [−2.8057% −1.4563% −0.2680%] (p = 0.03 < 0.05)
                        thrpt:  [+0.2687% +1.4778% +2.8867%]
                        Change within noise threshold.
Inverse f32/PhastFT DIT/1024
                        time:   [1.6186 µs 1.6252 µs 1.6306 µs]
                        thrpt:  [627.97 Melem/s 630.08 Melem/s 632.64 Melem/s]
                        thrpt:  [4.6787 GiB/s 4.6945 GiB/s 4.7136 GiB/s]
                 change:
                        time:   [−2.5054% −1.3760% −0.4269%] (p = 0.01 < 0.05)
                        thrpt:  [+0.4287% +1.3952% +2.5698%]
                        Change within noise threshold.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f32/RustFFT/1024
                        time:   [2.0160 µs 2.0399 µs 2.0530 µs]
                        thrpt:  [498.77 Melem/s 502.00 Melem/s 507.95 Melem/s]
                        thrpt:  [3.7162 GiB/s 3.7402 GiB/s 3.7845 GiB/s]
                 change:
                        time:   [−3.8195% +0.4465% +4.9310%] (p = 0.84 > 0.05)
                        thrpt:  [−4.6993% −0.4445% +3.9712%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/2048
                        time:   [3.4517 µs 3.4639 µs 3.4724 µs]
                        thrpt:  [589.79 Melem/s 591.23 Melem/s 593.34 Melem/s]
                        thrpt:  [4.3943 GiB/s 4.4050 GiB/s 4.4207 GiB/s]
                 change:
                        time:   [−3.7818% −2.7880% −1.8185%] (p = 0.00 < 0.05)
                        thrpt:  [+1.8522% +2.8680% +3.9304%]
                        Performance has improved.
Inverse f32/RustFFT/2048
                        time:   [4.4014 µs 4.4729 µs 4.5131 µs]
                        thrpt:  [453.79 Melem/s 457.87 Melem/s 465.31 Melem/s]
                        thrpt:  [3.3810 GiB/s 3.4114 GiB/s 3.4668 GiB/s]
                 change:
                        time:   [−6.1638% −0.1880% +5.6333%] (p = 0.95 > 0.05)
                        thrpt:  [−5.3329% +0.1884% +6.5687%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/4096
                        time:   [8.0130 µs 8.0716 µs 8.1104 µs]
                        thrpt:  [505.03 Melem/s 507.46 Melem/s 511.17 Melem/s]
                        thrpt:  [3.7628 GiB/s 3.7809 GiB/s 3.8085 GiB/s]
                 change:
                        time:   [−3.3384% −1.2498% +0.8140%] (p = 0.25 > 0.05)
                        thrpt:  [−0.8075% +1.2656% +3.4537%]
                        No change in performance detected.
Inverse f32/RustFFT/4096
                        time:   [8.5182 µs 8.5737 µs 8.6099 µs]
                        thrpt:  [475.73 Melem/s 477.74 Melem/s 480.85 Melem/s]
                        thrpt:  [3.5445 GiB/s 3.5594 GiB/s 3.5826 GiB/s]
                 change:
                        time:   [−1.9275% −0.1402% +1.5678%] (p = 0.88 > 0.05)
                        thrpt:  [−1.5436% +0.1404% +1.9654%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/8192
                        time:   [17.325 µs 17.453 µs 17.535 µs]
                        thrpt:  [467.17 Melem/s 469.36 Melem/s 472.85 Melem/s]
                        thrpt:  [3.4807 GiB/s 3.4970 GiB/s 3.5230 GiB/s]
                 change:
                        time:   [−4.2892% −2.0288% +0.3189%] (p = 0.10 > 0.05)
                        thrpt:  [−0.3179% +2.0708% +4.4815%]
                        No change in performance detected.
Inverse f32/RustFFT/8192
                        time:   [17.243 µs 17.341 µs 17.409 µs]
                        thrpt:  [470.56 Melem/s 472.39 Melem/s 475.08 Melem/s]
                        thrpt:  [3.5060 GiB/s 3.5196 GiB/s 3.5396 GiB/s]
                 change:
                        time:   [−1.3353% +0.1470% +1.6487%] (p = 0.85 > 0.05)
                        thrpt:  [−1.6220% −0.1467% +1.3534%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/16384
                        time:   [41.048 µs 41.355 µs 41.555 µs]
                        thrpt:  [394.27 Melem/s 396.18 Melem/s 399.14 Melem/s]
                        thrpt:  [2.9376 GiB/s 2.9518 GiB/s 2.9738 GiB/s]
                 change:
                        time:   [−4.4546% −2.3035% −0.3077%] (p = 0.04 < 0.05)
                        thrpt:  [+0.3087% +2.3578% +4.6623%]
                        Change within noise threshold.
Inverse f32/RustFFT/16384
                        time:   [39.217 µs 39.402 µs 39.548 µs]
                        thrpt:  [414.28 Melem/s 415.82 Melem/s 417.78 Melem/s]
                        thrpt:  [3.0866 GiB/s 3.0981 GiB/s 3.1127 GiB/s]
                 change:
                        time:   [−5.3656% −1.5434% +0.9524%] (p = 0.54 > 0.05)
                        thrpt:  [−0.9434% +1.5676% +5.6698%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/32768
                        time:   [89.950 µs 90.498 µs 90.858 µs]
                        thrpt:  [360.65 Melem/s 362.08 Melem/s 364.29 Melem/s]
                        thrpt:  [2.6871 GiB/s 2.6977 GiB/s 2.7142 GiB/s]
                 change:
                        time:   [−3.3441% −1.5469% +0.3864%] (p = 0.13 > 0.05)
                        thrpt:  [−0.3849% +1.5712% +3.4598%]
                        No change in performance detected.
Inverse f32/RustFFT/32768
                        time:   [84.313 µs 85.057 µs 85.593 µs]
                        thrpt:  [382.84 Melem/s 385.25 Melem/s 388.65 Melem/s]
                        thrpt:  [2.8523 GiB/s 2.8703 GiB/s 2.8957 GiB/s]
                 change:
                        time:   [−1.7249% +0.2592% +2.2033%] (p = 0.80 > 0.05)
                        thrpt:  [−2.1558% −0.2586% +1.7552%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/65536
                        time:   [213.17 µs 213.99 µs 214.75 µs]
                        thrpt:  [305.17 Melem/s 306.25 Melem/s 307.44 Melem/s]
                        thrpt:  [2.2737 GiB/s 2.2818 GiB/s 2.2906 GiB/s]
                 change:
                        time:   [−2.8097% −1.9807% −1.2297%] (p = 0.00 < 0.05)
                        thrpt:  [+1.2450% +2.0207% +2.8909%]
                        Performance has improved.
Inverse f32/RustFFT/65536
                        time:   [192.12 µs 194.24 µs 195.72 µs]
                        thrpt:  [334.85 Melem/s 337.40 Melem/s 341.11 Melem/s]
                        thrpt:  [2.4948 GiB/s 2.5138 GiB/s 2.5415 GiB/s]
                 change:
                        time:   [−1.9066% +0.8367% +3.7532%] (p = 0.57 > 0.05)
                        thrpt:  [−3.6174% −0.8298% +1.9437%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/131072
                        time:   [440.45 µs 443.68 µs 445.64 µs]
                        thrpt:  [294.12 Melem/s 295.42 Melem/s 297.58 Melem/s]
                        thrpt:  [2.1914 GiB/s 2.2011 GiB/s 2.2172 GiB/s]
                 change:
                        time:   [−3.0361% −1.6995% −0.2245%] (p = 0.04 < 0.05)
                        thrpt:  [+0.2250% +1.7289% +3.1311%]
                        Change within noise threshold.
Inverse f32/RustFFT/131072
                        time:   [388.39 µs 392.01 µs 394.25 µs]
                        thrpt:  [332.46 Melem/s 334.36 Melem/s 337.48 Melem/s]
                        thrpt:  [2.4770 GiB/s 2.4912 GiB/s 2.5144 GiB/s]
                 change:
                        time:   [−3.1126% −0.1263% +2.8645%] (p = 0.94 > 0.05)
                        thrpt:  [−2.7848% +0.1264% +3.2126%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/262144
                        time:   [1.0155 ms 1.0274 ms 1.0347 ms]
                        thrpt:  [253.34 Melem/s 255.16 Melem/s 258.13 Melem/s]
                        thrpt:  [1.8876 GiB/s 1.9011 GiB/s 1.9233 GiB/s]
                 change:
                        time:   [−5.6947% −2.0136% +1.9271%] (p = 0.33 > 0.05)
                        thrpt:  [−1.8907% +2.0550% +6.0386%]
                        No change in performance detected.
Inverse f32/RustFFT/262144
                        time:   [850.00 µs 855.50 µs 859.35 µs]
                        thrpt:  [305.05 Melem/s 306.42 Melem/s 308.41 Melem/s]
                        thrpt:  [2.2728 GiB/s 2.2830 GiB/s 2.2978 GiB/s]
                 change:
                        time:   [−2.1950% +0.2882% +2.9513%] (p = 0.83 > 0.05)
                        thrpt:  [−2.8667% −0.2874% +2.2442%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high mild
Inverse f32/PhastFT DIT/524288
                        time:   [2.1759 ms 2.1998 ms 2.2148 ms]
                        thrpt:  [236.72 Melem/s 238.33 Melem/s 240.96 Melem/s]
                        thrpt:  [1.7637 GiB/s 1.7757 GiB/s 1.7953 GiB/s]
                 change:
                        time:   [−6.7993% −2.6360% +1.6522%] (p = 0.24 > 0.05)
                        thrpt:  [−1.6254% +2.7074% +7.2954%]
                        No change in performance detected.
Inverse f32/RustFFT/524288
                        time:   [1.7782 ms 1.7868 ms 1.7939 ms]
                        thrpt:  [292.26 Melem/s 293.43 Melem/s 294.84 Melem/s]
                        thrpt:  [2.1775 GiB/s 2.1862 GiB/s 2.1968 GiB/s]
                 change:
                        time:   [−1.3720% +0.6762% +2.7677%] (p = 0.55 > 0.05)
                        thrpt:  [−2.6931% −0.6717% +1.3911%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low severe
  1 (5.00%) low mild
Inverse f32/PhastFT DIT/1048576
                        time:   [4.7073 ms 4.7595 ms 4.7981 ms]
                        thrpt:  [218.54 Melem/s 220.31 Melem/s 222.76 Melem/s]
                        thrpt:  [1.6282 GiB/s 1.6414 GiB/s 1.6597 GiB/s]
                 change:
                        time:   [−4.0123% −1.1806% +2.0324%] (p = 0.47 > 0.05)
                        thrpt:  [−1.9919% +1.1947% +4.1800%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f32/RustFFT/1048576
                        time:   [3.8205 ms 3.8386 ms 3.8656 ms]
                        thrpt:  [271.25 Melem/s 273.16 Melem/s 274.46 Melem/s]
                        thrpt:  [2.0210 GiB/s 2.0352 GiB/s 2.0449 GiB/s]
                 change:
                        time:   [−0.3582% +0.8426% +2.2313%] (p = 0.26 > 0.05)
                        thrpt:  [−2.1826% −0.8356% +0.3594%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  1 (5.00%) high mild
  2 (10.00%) high severe
Inverse f32/PhastFT DIT/2097152
                        time:   [10.431 ms 10.519 ms 10.579 ms]
                        thrpt:  [198.24 Melem/s 199.36 Melem/s 201.06 Melem/s]
                        thrpt:  [1.4770 GiB/s 1.4853 GiB/s 1.4980 GiB/s]
                 change:
                        time:   [−6.5955% −3.4958% −0.0574%] (p = 0.05 > 0.05)
                        thrpt:  [+0.0575% +3.6224% +7.0613%]
                        No change in performance detected.
Found 4 outliers among 20 measurements (20.00%)
  4 (20.00%) low mild
Inverse f32/RustFFT/2097152
                        time:   [8.8552 ms 8.9411 ms 9.0507 ms]
                        thrpt:  [231.71 Melem/s 234.55 Melem/s 236.83 Melem/s]
                        thrpt:  [1.7264 GiB/s 1.7476 GiB/s 1.7645 GiB/s]
                 change:
                        time:   [+0.2666% +1.4538% +2.6653%] (p = 0.03 < 0.05)
                        thrpt:  [−2.5961% −1.4330% −0.2659%]
                        Change within noise threshold.
Benchmarking Inverse f32/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.3s, enable flat sampling, or reduce sample count to 10.
Inverse f32/PhastFT DIT/4194304
                        time:   [24.328 ms 24.482 ms 24.589 ms]
                        thrpt:  [170.57 Melem/s 171.32 Melem/s 172.41 Melem/s]
                        thrpt:  [1.2709 GiB/s 1.2765 GiB/s 1.2845 GiB/s]
                 change:
                        time:   [−4.6222% −3.0360% −1.4847%] (p = 0.00 < 0.05)
                        thrpt:  [+1.5071% +3.1311% +4.8462%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Benchmarking Inverse f32/RustFFT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.5s, enable flat sampling, or reduce sample count to 10.
Inverse f32/RustFFT/4194304
                        time:   [20.714 ms 20.861 ms 21.000 ms]
                        thrpt:  [199.73 Melem/s 201.06 Melem/s 202.48 Melem/s]
                        thrpt:  [1.4881 GiB/s 1.4980 GiB/s 1.5086 GiB/s]
                 change:
                        time:   [−2.8285% −1.5673% −0.3945%] (p = 0.02 < 0.05)
                        thrpt:  [+0.3960% +1.5923% +2.9108%]
                        Change within noise threshold.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) high mild
  1 (5.00%) high severe
Inverse f32/PhastFT DIT/8388608
                        time:   [51.937 ms 52.674 ms 53.650 ms]
                        thrpt:  [156.36 Melem/s 159.26 Melem/s 161.52 Melem/s]
                        thrpt:  [1.1650 GiB/s 1.1865 GiB/s 1.2034 GiB/s]
                 change:
                        time:   [−1.9132% −0.4811% +1.5693%] (p = 0.62 > 0.05)
                        thrpt:  [−1.5451% +0.4834% +1.9505%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  1 (5.00%) high mild
  2 (10.00%) high severe
Inverse f32/RustFFT/8388608
                        time:   [43.708 ms 43.871 ms 44.060 ms]
                        thrpt:  [190.39 Melem/s 191.21 Melem/s 191.93 Melem/s]
                        thrpt:  [1.4185 GiB/s 1.4246 GiB/s 1.4300 GiB/s]
                 change:
                        time:   [−0.7289% −0.0763% +0.5399%] (p = 0.82 > 0.05)
                        thrpt:  [−0.5370% +0.0763% +0.7343%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f32/PhastFT DIT/16777216
                        time:   [107.79 ms 108.66 ms 109.96 ms]
                        thrpt:  [152.57 Melem/s 154.40 Melem/s 155.65 Melem/s]
                        thrpt:  [1.1367 GiB/s 1.1504 GiB/s 1.1597 GiB/s]
                 change:
                        time:   [−2.3257% −1.3417% −0.0303%] (p = 0.03 < 0.05)
                        thrpt:  [+0.0303% +1.3599% +2.3811%]
                        Change within noise threshold.
Found 4 outliers among 20 measurements (20.00%)
  1 (5.00%) high mild
  3 (15.00%) high severe
Inverse f32/RustFFT/16777216
                        time:   [91.657 ms 92.048 ms 92.514 ms]
                        thrpt:  [181.35 Melem/s 182.27 Melem/s 183.04 Melem/s]
                        thrpt:  [1.3511 GiB/s 1.3580 GiB/s 1.3638 GiB/s]
                 change:
                        time:   [−0.2693% +0.3543% +1.0430%] (p = 0.29 > 0.05)
                        thrpt:  [−1.0323% −0.3530% +0.2700%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) high mild

Inverse f64/PhastFT DIT/64
                        time:   [182.92 ns 183.71 ns 184.41 ns]
                        thrpt:  [347.05 Melem/s 348.38 Melem/s 349.87 Melem/s]
                        thrpt:  [5.1715 GiB/s 5.1913 GiB/s 5.2135 GiB/s]
                 change:
                        time:   [−1.5432% −0.3695% +0.8771%] (p = 0.56 > 0.05)
                        thrpt:  [−0.8695% +0.3709% +1.5673%]
                        No change in performance detected.
Inverse f64/RustFFT/64  time:   [149.02 ns 150.03 ns 150.73 ns]
                        thrpt:  [424.60 Melem/s 426.57 Melem/s 429.47 Melem/s]
                        thrpt:  [6.3270 GiB/s 6.3564 GiB/s 6.3995 GiB/s]
                 change:
                        time:   [+10.610% +12.997% +15.357%] (p = 0.00 < 0.05)
                        thrpt:  [−13.312% −11.502% −9.5927%]
                        Performance has regressed.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/PhastFT DIT/128
                        time:   [349.79 ns 350.91 ns 352.03 ns]
                        thrpt:  [363.60 Melem/s 364.76 Melem/s 365.94 Melem/s]
                        thrpt:  [5.4181 GiB/s 5.4354 GiB/s 5.4529 GiB/s]
                 change:
                        time:   [−1.4380% −0.4522% +0.4211%] (p = 0.37 > 0.05)
                        thrpt:  [−0.4193% +0.4542% +1.4590%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/128 time:   [275.73 ns 276.71 ns 277.71 ns]
                        thrpt:  [460.91 Melem/s 462.58 Melem/s 464.22 Melem/s]
                        thrpt:  [6.8680 GiB/s 6.8929 GiB/s 6.9175 GiB/s]
                 change:
                        time:   [+2.2164% +3.2408% +4.1455%] (p = 0.00 < 0.05)
                        thrpt:  [−3.9805% −3.1391% −2.1684%]
                        Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
  2 (10.00%) low mild
  1 (5.00%) high mild
Inverse f64/PhastFT DIT/256
                        time:   [670.95 ns 673.22 ns 675.36 ns]
                        thrpt:  [379.06 Melem/s 380.26 Melem/s 381.55 Melem/s]
                        thrpt:  [5.6484 GiB/s 5.6663 GiB/s 5.6855 GiB/s]
                 change:
                        time:   [−2.9335% −1.8078% −0.6886%] (p = 0.00 < 0.05)
                        thrpt:  [+0.6933% +1.8411% +3.0221%]
                        Change within noise threshold.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/256 time:   [639.40 ns 642.52 ns 646.31 ns]
                        thrpt:  [396.09 Melem/s 398.43 Melem/s 400.37 Melem/s]
                        thrpt:  [5.9023 GiB/s 5.9371 GiB/s 5.9660 GiB/s]
                 change:
                        time:   [+1.2415% +3.3998% +5.3466%] (p = 0.00 < 0.05)
                        thrpt:  [−5.0752% −3.2880% −1.2263%]
                        Performance has regressed.
Found 5 outliers among 20 measurements (25.00%)
  2 (10.00%) low severe
  1 (5.00%) low mild
  2 (10.00%) high mild
Inverse f64/PhastFT DIT/512
                        time:   [1.4733 µs 1.4785 µs 1.4837 µs]
                        thrpt:  [345.08 Melem/s 346.30 Melem/s 347.52 Melem/s]
                        thrpt:  [5.1420 GiB/s 5.1602 GiB/s 5.1785 GiB/s]
                 change:
                        time:   [−3.4986% −2.0739% −0.6840%] (p = 0.01 < 0.05)
                        thrpt:  [+0.6887% +2.1178% +3.6254%]
                        Change within noise threshold.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f64/RustFFT/512 time:   [1.6629 µs 1.6804 µs 1.6902 µs]
                        thrpt:  [302.92 Melem/s 304.68 Melem/s 307.89 Melem/s]
                        thrpt:  [4.5138 GiB/s 4.5401 GiB/s 4.5879 GiB/s]
                 change:
                        time:   [−4.2545% +0.3233% +5.4320%] (p = 0.90 > 0.05)
                        thrpt:  [−5.1521% −0.3222% +4.4436%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f64/PhastFT DIT/1024
                        time:   [3.0417 µs 3.0548 µs 3.0672 µs]
                        thrpt:  [333.86 Melem/s 335.21 Melem/s 336.66 Melem/s]
                        thrpt:  [4.9748 GiB/s 4.9951 GiB/s 5.0166 GiB/s]
                 change:
                        time:   [−4.5667% −3.4696% −2.3671%] (p = 0.00 < 0.05)
                        thrpt:  [+2.4245% +3.5943% +4.7852%]
                        Performance has improved.
Inverse f64/RustFFT/1024
                        time:   [3.1609 µs 3.1798 µs 3.1921 µs]
                        thrpt:  [320.79 Melem/s 322.03 Melem/s 323.96 Melem/s]
                        thrpt:  [4.7801 GiB/s 4.7986 GiB/s 4.8274 GiB/s]
                 change:
                        time:   [−2.5602% +0.6993% +3.7334%] (p = 0.67 > 0.05)
                        thrpt:  [−3.5990% −0.6945% +2.6275%]
                        No change in performance detected.
Found 4 outliers among 20 measurements (20.00%)
  3 (15.00%) low severe
  1 (5.00%) low mild
Inverse f64/PhastFT DIT/2048
                        time:   [7.2552 µs 7.3052 µs 7.3349 µs]
                        thrpt:  [279.21 Melem/s 280.35 Melem/s 282.28 Melem/s]
                        thrpt:  [4.1606 GiB/s 4.1775 GiB/s 4.2063 GiB/s]
                 change:
                        time:   [−6.1808% −3.6274% −0.9333%] (p = 0.01 < 0.05)
                        thrpt:  [+0.9420% +3.7640% +6.5880%]
                        Change within noise threshold.
Inverse f64/RustFFT/2048
                        time:   [7.2972 µs 7.3438 µs 7.3703 µs]
                        thrpt:  [277.87 Melem/s 278.87 Melem/s 280.65 Melem/s]
                        thrpt:  [4.1406 GiB/s 4.1555 GiB/s 4.1821 GiB/s]
                 change:
                        time:   [−3.4675% −0.4881% +2.5979%] (p = 0.76 > 0.05)
                        thrpt:  [−2.5321% +0.4905% +3.5921%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/4096
                        time:   [15.894 µs 15.989 µs 16.055 µs]
                        thrpt:  [255.12 Melem/s 256.18 Melem/s 257.71 Melem/s]
                        thrpt:  [3.8016 GiB/s 3.8174 GiB/s 3.8402 GiB/s]
                 change:
                        time:   [−5.3503% −3.1560% −0.6185%] (p = 0.02 < 0.05)
                        thrpt:  [+0.6223% +3.2589% +5.6527%]
                        Change within noise threshold.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/RustFFT/4096
                        time:   [14.851 µs 14.927 µs 14.981 µs]
                        thrpt:  [273.42 Melem/s 274.40 Melem/s 275.81 Melem/s]
                        thrpt:  [4.0743 GiB/s 4.0888 GiB/s 4.1098 GiB/s]
                 change:
                        time:   [−2.6367% −0.3836% +1.8375%] (p = 0.75 > 0.05)
                        thrpt:  [−1.8043% +0.3851% +2.7081%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/8192
                        time:   [35.768 µs 35.972 µs 36.095 µs]
                        thrpt:  [226.96 Melem/s 227.73 Melem/s 229.03 Melem/s]
                        thrpt:  [3.3819 GiB/s 3.3934 GiB/s 3.4128 GiB/s]
                 change:
                        time:   [−4.6258% −2.2902% +0.1576%] (p = 0.08 > 0.05)
                        thrpt:  [−0.1574% +2.3439% +4.8502%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/RustFFT/8192
                        time:   [33.324 µs 33.508 µs 33.632 µs]
                        thrpt:  [243.57 Melem/s 244.48 Melem/s 245.83 Melem/s]
                        thrpt:  [3.6295 GiB/s 3.6430 GiB/s 3.6631 GiB/s]
                 change:
                        time:   [−1.9950% −0.2648% +1.4104%] (p = 0.77 > 0.05)
                        thrpt:  [−1.3908% +0.2655% +2.0356%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/PhastFT DIT/16384
                        time:   [83.485 µs 84.063 µs 84.479 µs]
                        thrpt:  [193.94 Melem/s 194.90 Melem/s 196.25 Melem/s]
                        thrpt:  [2.8899 GiB/s 2.9043 GiB/s 2.9244 GiB/s]
                 change:
                        time:   [−4.8579% −2.6144% −0.5509%] (p = 0.03 < 0.05)
                        thrpt:  [+0.5540% +2.6846% +5.1059%]
                        Change within noise threshold.
Found 4 outliers among 20 measurements (20.00%)
  4 (20.00%) low mild
Inverse f64/RustFFT/16384
                        time:   [81.421 µs 82.011 µs 82.411 µs]
                        thrpt:  [198.81 Melem/s 199.78 Melem/s 201.23 Melem/s]
                        thrpt:  [2.9625 GiB/s 2.9769 GiB/s 2.9985 GiB/s]
                 change:
                        time:   [−3.1619% −0.6564% +1.9271%] (p = 0.62 > 0.05)
                        thrpt:  [−1.8906% +0.6607% +3.2651%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/32768
                        time:   [193.79 µs 194.67 µs 195.29 µs]
                        thrpt:  [167.79 Melem/s 168.33 Melem/s 169.09 Melem/s]
                        thrpt:  [2.5003 GiB/s 2.5083 GiB/s 2.5196 GiB/s]
                 change:
                        time:   [−4.5597% −2.9622% −1.4626%] (p = 0.00 < 0.05)
                        thrpt:  [+1.4844% +3.0526% +4.7775%]
                        Performance has improved.
Inverse f64/RustFFT/32768
                        time:   [175.52 µs 177.10 µs 178.21 µs]
                        thrpt:  [183.87 Melem/s 185.02 Melem/s 186.69 Melem/s]
                        thrpt:  [2.7398 GiB/s 2.7571 GiB/s 2.7819 GiB/s]
                 change:
                        time:   [−2.9608% +0.7329% +4.5833%] (p = 0.72 > 0.05)
                        thrpt:  [−4.3824% −0.7276% +3.0511%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/PhastFT DIT/65536
                        time:   [439.48 µs 443.02 µs 445.43 µs]
                        thrpt:  [147.13 Melem/s 147.93 Melem/s 149.12 Melem/s]
                        thrpt:  [2.1924 GiB/s 2.2043 GiB/s 2.2221 GiB/s]
                 change:
                        time:   [−5.8194% −2.8487% +0.2558%] (p = 0.08 > 0.05)
                        thrpt:  [−0.2552% +2.9323% +6.1789%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/RustFFT/65536
                        time:   [364.98 µs 367.61 µs 369.27 µs]
                        thrpt:  [177.48 Melem/s 178.28 Melem/s 179.56 Melem/s]
                        thrpt:  [2.6446 GiB/s 2.6565 GiB/s 2.6757 GiB/s]
                 change:
                        time:   [−3.8768% −0.8055% +2.1075%] (p = 0.61 > 0.05)
                        thrpt:  [−2.0640% +0.8120% +4.0332%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) low mild
Inverse f64/PhastFT DIT/131072
                        time:   [941.70 µs 952.94 µs 961.80 µs]
                        thrpt:  [136.28 Melem/s 137.54 Melem/s 139.19 Melem/s]
                        thrpt:  [2.0307 GiB/s 2.0496 GiB/s 2.0740 GiB/s]
                 change:
                        time:   [−9.0268% −4.0693% +0.3600%] (p = 0.11 > 0.05)
                        thrpt:  [−0.3587% +4.2419% +9.9225%]
                        No change in performance detected.
Inverse f64/RustFFT/131072
                        time:   [771.97 µs 776.40 µs 779.86 µs]
                        thrpt:  [168.07 Melem/s 168.82 Melem/s 169.79 Melem/s]
                        thrpt:  [2.5044 GiB/s 2.5156 GiB/s 2.5301 GiB/s]
                 change:
                        time:   [−10.338% −3.0347% +2.6789%] (p = 0.49 > 0.05)
                        thrpt:  [−2.6090% +3.1297% +11.530%]
                        No change in performance detected.
Found 4 outliers among 20 measurements (20.00%)
  2 (10.00%) low severe
  2 (10.00%) low mild
Inverse f64/PhastFT DIT/262144
                        time:   [2.0272 ms 2.0440 ms 2.0546 ms]
                        thrpt:  [127.59 Melem/s 128.25 Melem/s 129.31 Melem/s]
                        thrpt:  [1.9012 GiB/s 1.9110 GiB/s 1.9269 GiB/s]
                 change:
                        time:   [−4.3709% −1.5443% +1.4175%] (p = 0.32 > 0.05)
                        thrpt:  [−1.3977% +1.5685% +4.5706%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f64/RustFFT/262144
                        time:   [1.7030 ms 1.7115 ms 1.7185 ms]
                        thrpt:  [152.54 Melem/s 153.16 Melem/s 153.93 Melem/s]
                        thrpt:  [2.2730 GiB/s 2.2823 GiB/s 2.2938 GiB/s]
                 change:
                        time:   [−2.3810% +0.2548% +3.2585%] (p = 0.86 > 0.05)
                        thrpt:  [−3.1557% −0.2541% +2.4391%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low severe
Inverse f64/PhastFT DIT/524288
                        time:   [4.4325 ms 4.4788 ms 4.5140 ms]
                        thrpt:  [116.15 Melem/s 117.06 Melem/s 118.28 Melem/s]
                        thrpt:  [1.7307 GiB/s 1.7443 GiB/s 1.7625 GiB/s]
                 change:
                        time:   [−5.2729% −2.2894% +0.7804%] (p = 0.16 > 0.05)
                        thrpt:  [−0.7744% +2.3430% +5.5664%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) low mild
Inverse f64/RustFFT/524288
                        time:   [3.6458 ms 3.6609 ms 3.6788 ms]
                        thrpt:  [142.52 Melem/s 143.21 Melem/s 143.80 Melem/s]
                        thrpt:  [2.1237 GiB/s 2.1340 GiB/s 2.1429 GiB/s]
                 change:
                        time:   [+0.0388% +1.0905% +2.2356%] (p = 0.06 > 0.05)
                        thrpt:  [−2.1867% −1.0788% −0.0388%]
                        No change in performance detected.
Found 6 outliers among 20 measurements (30.00%)
  2 (10.00%) low severe
  1 (5.00%) low mild
  2 (10.00%) high mild
  1 (5.00%) high severe
Inverse f64/PhastFT DIT/1048576
                        time:   [9.7798 ms 9.8639 ms 9.9224 ms]
                        thrpt:  [105.68 Melem/s 106.30 Melem/s 107.22 Melem/s]
                        thrpt:  [1.5747 GiB/s 1.5841 GiB/s 1.5977 GiB/s]
                 change:
                        time:   [−5.0831% −2.0806% +0.8607%] (p = 0.21 > 0.05)
                        thrpt:  [−0.8533% +2.1248% +5.3553%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low mild
Inverse f64/RustFFT/1048576
                        time:   [7.9859 ms 8.0172 ms 8.0528 ms]
                        thrpt:  [130.21 Melem/s 130.79 Melem/s 131.30 Melem/s]
                        thrpt:  [1.9403 GiB/s 1.9489 GiB/s 1.9566 GiB/s]
                 change:
                        time:   [−0.9873% −0.1894% +0.6230%] (p = 0.66 > 0.05)
                        thrpt:  [−0.6191% +0.1897% +0.9972%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  2 (10.00%) high mild
Benchmarking Inverse f64/PhastFT DIT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 10.
Inverse f64/PhastFT DIT/2097152
                        time:   [22.423 ms 22.580 ms 22.706 ms]
                        thrpt:  [92.361 Melem/s 92.877 Melem/s 93.526 Melem/s]
                        thrpt:  [1.3763 GiB/s 1.3840 GiB/s 1.3937 GiB/s]
                 change:
                        time:   [−4.4515% −2.7032% −0.7173%] (p = 0.01 < 0.05)
                        thrpt:  [+0.7225% +2.7783% +4.6588%]
                        Change within noise threshold.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe
Benchmarking Inverse f64/RustFFT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.0s, enable flat sampling, or reduce sample count to 10.
Inverse f64/RustFFT/2097152
                        time:   [17.951 ms 18.165 ms 18.312 ms]
                        thrpt:  [114.52 Melem/s 115.45 Melem/s 116.83 Melem/s]
                        thrpt:  [1.7065 GiB/s 1.7203 GiB/s 1.7409 GiB/s]
                 change:
                        time:   [−3.2526% −1.5057% −0.0507%] (p = 0.08 > 0.05)
                        thrpt:  [+0.0507% +1.5287% +3.3620%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/4194304
                        time:   [50.549 ms 50.904 ms 51.256 ms]
                        thrpt:  [81.830 Melem/s 82.396 Melem/s 82.974 Melem/s]
                        thrpt:  [1.2194 GiB/s 1.2278 GiB/s 1.2364 GiB/s]
                 change:
                        time:   [−0.4467% +0.6978% +1.7297%] (p = 0.23 > 0.05)
                        thrpt:  [−1.7003% −0.6930% +0.4487%]
                        No change in performance detected.
Found 6 outliers among 20 measurements (30.00%)
  4 (20.00%) low mild
  2 (10.00%) high mild
Inverse f64/RustFFT/4194304
                        time:   [39.991 ms 40.251 ms 40.511 ms]
                        thrpt:  [103.54 Melem/s 104.20 Melem/s 104.88 Melem/s]
                        thrpt:  [1.5428 GiB/s 1.5527 GiB/s 1.5628 GiB/s]
                 change:
                        time:   [+1.8026% +2.6178% +3.3784%] (p = 0.00 < 0.05)
                        thrpt:  [−3.2680% −2.5511% −1.7707%]
                        Performance has regressed.
Inverse f64/PhastFT DIT/8388608
                        time:   [105.81 ms 106.67 ms 107.59 ms]
                        thrpt:  [77.967 Melem/s 78.639 Melem/s 79.278 Melem/s]
                        thrpt:  [1.1618 GiB/s 1.1718 GiB/s 1.1813 GiB/s]
                 change:
                        time:   [+0.1588% +1.1456% +2.1239%] (p = 0.03 < 0.05)
                        thrpt:  [−2.0798% −1.1326% −0.1586%]
                        Change within noise threshold.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) high mild
Inverse f64/RustFFT/8388608
                        time:   [80.531 ms 81.139 ms 81.799 ms]
                        thrpt:  [102.55 Melem/s 103.39 Melem/s 104.17 Melem/s]
                        thrpt:  [1.5281 GiB/s 1.5406 GiB/s 1.5522 GiB/s]
                 change:
                        time:   [+0.7409% +1.6107% +2.5218%] (p = 0.00 < 0.05)
                        thrpt:  [−2.4598% −1.5852% −0.7354%]
                        Change within noise threshold.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Benchmarking Inverse f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 5.4s, or reduce sample count to 10.
Inverse f64/PhastFT DIT/16777216
                        time:   [217.90 ms 220.08 ms 222.38 ms]
                        thrpt:  [75.444 Melem/s 76.232 Melem/s 76.994 Melem/s]
                        thrpt:  [1.1242 GiB/s 1.1359 GiB/s 1.1473 GiB/s]
                 change:
                        time:   [−1.1650% −0.0002% +1.2748%] (p = 1.00 > 0.05)
                        thrpt:  [−1.2587% +0.0002% +1.1787%]
                        No change in performance detected.
Inverse f64/RustFFT/16777216
                        time:   [169.85 ms 170.73 ms 171.71 ms]
                        thrpt:  [97.708 Melem/s 98.270 Melem/s 98.777 Melem/s]
                        thrpt:  [1.4560 GiB/s 1.4643 GiB/s 1.4719 GiB/s]
                 change:
                        time:   [−0.1602% +0.4821% +1.1261%] (p = 0.17 > 0.05)
                        thrpt:  [−1.1136% −0.4798% +0.1605%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
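Each pair of `thrpt:` lines is derived from the `time:` estimate and the transform size. The logged numbers are consistent with three conventions: n elements per iteration, decimal Melem/s (10^6), and binary GiB/s (2^30). They also imply 16 bytes per f64 element and 8 per f32 element, which matches one real and one imaginary value per point. That byte count is inferred from the figures, not read from the benchmark source. Below is a minimal sketch of the arithmetic; `throughput` is a hypothetical helper.

```rust
/// Turn a Criterion `time:` value (seconds per iteration) into the two `thrpt:` figures.
/// Assumes `n` elements per iteration and `bytes_per_elem` bytes per element
/// (16 for f64 and 8 for f32 here, inferred from the logged numbers).
fn throughput(n: u64, seconds: f64, bytes_per_elem: u64) -> (f64, f64) {
    let elems_per_sec = n as f64 / seconds;
    let melem_per_sec = elems_per_sec / 1e6; // "M" is decimal (10^6)
    let gib_per_sec = elems_per_sec * bytes_per_elem as f64 / (1u64 << 30) as f64; // "GiB" is 2^30
    (melem_per_sec, gib_per_sec)
}

fn main() {
    // Inverse f64/PhastFT DIT/65536, point estimate 443.02 µs.
    let (melem, gib) = throughput(65_536, 443.02e-6, 16);
    println!("{melem:.2} Melem/s, {gib:.4} GiB/s"); // 147.93 Melem/s, 2.2043 GiB/s

    // The slowest time bound (445.43 µs) gives the lowest throughput bound.
    let (melem_lo, _) = throughput(65_536, 445.43e-6, 16);
    println!("{melem_lo:.2} Melem/s"); // 147.13 Melem/s
}
```

Throughput is inversely proportional to time, so the bracket order flips. The upper `time:` bound pairs with the lower `thrpt:` bound, which is why an improvement shows as a negative time change and a positive throughput change.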

@smu160 smu160 marked this pull request as ready for review April 17, 2026 18:00
@Shnatsel
Collaborator

Seems like a small improvement, but maybe noise. Doesn't regress, for sure.

Forward f32/PhastFT DIT/64
                        time:   [105.27 ns 110.60 ns 113.66 ns]
                        thrpt:  [563.07 Melem/s 578.66 Melem/s 607.96 Melem/s]
                        thrpt:  [4.1952 GiB/s 4.3113 GiB/s 4.5296 GiB/s]
                 change:
                        time:   [−10.529% −5.2738% +0.4013%] (p = 0.08 > 0.05)
                        thrpt:  [−0.3997% +5.5674% +11.769%]
                        No change in performance detected.
Forward f32/PhastFT DIT/128
                        time:   [178.33 ns 188.68 ns 194.69 ns]
                        thrpt:  [657.44 Melem/s 678.41 Melem/s 717.79 Melem/s]
                        thrpt:  [4.8983 GiB/s 5.0545 GiB/s 5.3479 GiB/s]
                 change:
                        time:   [−9.4065% −3.2530% +3.7760%] (p = 0.36 > 0.05)
                        thrpt:  [−3.6386% +3.3624% +10.383%]
                        No change in performance detected.
Forward f32/PhastFT DIT/256
                        time:   [319.11 ns 338.64 ns 349.73 ns]
                        thrpt:  [731.99 Melem/s 755.97 Melem/s 802.22 Melem/s]
                        thrpt:  [5.4538 GiB/s 5.6324 GiB/s 5.9770 GiB/s]
                 change:
                        time:   [−8.9533% −1.8713% +5.6762%] (p = 0.63 > 0.05)
                        thrpt:  [−5.3714% +1.9069% +9.8338%]
                        No change in performance detected.
Forward f32/PhastFT DIT/512
                        time:   [632.22 ns 675.57 ns 700.00 ns]
                        thrpt:  [731.43 Melem/s 757.88 Melem/s 809.84 Melem/s]
                        thrpt:  [5.4495 GiB/s 5.6467 GiB/s 6.0338 GiB/s]
                 change:
                        time:   [−9.7190% −1.5221% +7.4845%] (p = 0.73 > 0.05)
                        thrpt:  [−6.9633% +1.5457% +10.765%]
                        No change in performance detected.
Forward f32/PhastFT DIT/1024
                        time:   [1.5119 µs 1.6416 µs 1.7157 µs]
                        thrpt:  [596.84 Melem/s 623.76 Melem/s 677.28 Melem/s]
                        thrpt:  [4.4468 GiB/s 4.6474 GiB/s 5.0462 GiB/s]
                 change:
                        time:   [−10.610% +1.2294% +14.313%] (p = 0.85 > 0.05)
                        thrpt:  [−12.521% −1.2145% +11.870%]
                        No change in performance detected.
Forward f32/PhastFT DIT/2048
                        time:   [3.2538 µs 3.5799 µs 3.7714 µs]
                        thrpt:  [543.04 Melem/s 572.09 Melem/s 629.42 Melem/s]
                        thrpt:  [4.0459 GiB/s 4.2624 GiB/s 4.6896 GiB/s]
                 change:
                        time:   [−11.843% +0.9028% +15.287%] (p = 0.89 > 0.05)
                        thrpt:  [−13.260% −0.8947% +13.433%]
                        No change in performance detected.
Forward f32/PhastFT DIT/4096
                        time:   [6.8302 µs 7.4484 µs 7.8092 µs]
                        thrpt:  [524.51 Melem/s 549.92 Melem/s 599.69 Melem/s]
                        thrpt:  [3.9079 GiB/s 4.0972 GiB/s 4.4680 GiB/s]
                 change:
                        time:   [−10.808% +0.6789% +13.504%] (p = 0.91 > 0.05)
                        thrpt:  [−11.897% −0.6743% +12.117%]
                        No change in performance detected.
Forward f32/PhastFT DIT/8192
                        time:   [14.184 µs 15.318 µs 15.982 µs]
                        thrpt:  [512.58 Melem/s 534.81 Melem/s 577.56 Melem/s]
                        thrpt:  [3.8190 GiB/s 3.9846 GiB/s 4.3031 GiB/s]
                 change:
                        time:   [−9.0964% +1.1956% +12.810%] (p = 0.83 > 0.05)
                        thrpt:  [−11.355% −1.1815% +10.007%]
                        No change in performance detected.
Forward f32/PhastFT DIT/16384
                        time:   [33.647 µs 35.859 µs 37.145 µs]
                        thrpt:  [441.08 Melem/s 456.91 Melem/s 486.94 Melem/s]
                        thrpt:  [3.2863 GiB/s 3.4042 GiB/s 3.6280 GiB/s]
                 change:
                        time:   [−5.9358% +1.8447% +10.114%] (p = 0.66 > 0.05)
                        thrpt:  [−9.1851% −1.8113% +6.3104%]
                        No change in performance detected.
Forward f32/PhastFT DIT/32768
                        time:   [69.818 µs 75.361 µs 78.674 µs]
                        thrpt:  [416.50 Melem/s 434.81 Melem/s 469.33 Melem/s]
                        thrpt:  [3.1032 GiB/s 3.2396 GiB/s 3.4968 GiB/s]
                 change:
                        time:   [−8.2072% +0.3083% +9.2415%] (p = 0.95 > 0.05)
                        thrpt:  [−8.4597% −0.3073% +8.9409%]
                        No change in performance detected.
Forward f32/PhastFT DIT/65536
                        time:   [127.70 µs 135.89 µs 140.56 µs]
                        thrpt:  [466.25 Melem/s 482.27 Melem/s 513.21 Melem/s]
                        thrpt:  [3.4738 GiB/s 3.5932 GiB/s 3.8237 GiB/s]
                 change:
                        time:   [−10.446% −3.6688% +3.4244%] (p = 0.34 > 0.05)
                        thrpt:  [−3.3110% +3.8086% +11.665%]
                        No change in performance detected.
Forward f32/PhastFT DIT/131072
                        time:   [222.05 µs 236.17 µs 243.92 µs]
                        thrpt:  [537.36 Melem/s 555.00 Melem/s 590.29 Melem/s]
                        thrpt:  [4.0037 GiB/s 4.1351 GiB/s 4.3980 GiB/s]
                 change:
                        time:   [−10.431% −0.9761% +9.3402%] (p = 0.85 > 0.05)
                        thrpt:  [−8.5423% +0.9857% +11.646%]
                        No change in performance detected.
Forward f32/PhastFT DIT/262144
                        time:   [420.44 µs 445.97 µs 460.01 µs]
                        thrpt:  [569.87 Melem/s 587.80 Melem/s 623.50 Melem/s]
                        thrpt:  [4.2458 GiB/s 4.3795 GiB/s 4.6454 GiB/s]
                 change:
                        time:   [−10.055% −0.8526% +9.3419%] (p = 0.87 > 0.05)
                        thrpt:  [−8.5437% +0.8599% +11.179%]
                        No change in performance detected.
Forward f32/PhastFT DIT/524288
                        time:   [844.79 µs 874.66 µs 891.31 µs]
                        thrpt:  [588.22 Melem/s 599.42 Melem/s 620.61 Melem/s]
                        thrpt:  [4.3826 GiB/s 4.4660 GiB/s 4.6239 GiB/s]
                 change:
                        time:   [−10.644% +0.0085% +12.446%] (p = 1.00 > 0.05)
                        thrpt:  [−11.069% −0.0085% +11.912%]
                        No change in performance detected.
Forward f32/PhastFT DIT/1048576
                        time:   [1.8279 ms 1.8708 ms 1.9009 ms]
                        thrpt:  [551.63 Melem/s 560.49 Melem/s 573.66 Melem/s]
                        thrpt:  [4.1099 GiB/s 4.1760 GiB/s 4.2741 GiB/s]
                 change:
                        time:   [−11.434% −1.7813% +9.3018%] (p = 0.73 > 0.05)
                        thrpt:  [−8.5102% +1.8136% +12.910%]
                        No change in performance detected.
Forward f32/PhastFT DIT/2097152
                        time:   [3.9793 ms 4.0358 ms 4.0940 ms]
                        thrpt:  [512.26 Melem/s 519.64 Melem/s 527.02 Melem/s]
                        thrpt:  [3.8166 GiB/s 3.8716 GiB/s 3.9266 GiB/s]
                 change:
                        time:   [−9.6296% +0.6791% +12.264%] (p = 0.90 > 0.05)
                        thrpt:  [−10.924% −0.6745% +10.656%]
                        No change in performance detected.
Found 5 outliers among 20 measurements (25.00%)
  5 (25.00%) low mild
Benchmarking Forward f32/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.6s, enable flat sampling, or reduce sample count to 10.
Forward f32/PhastFT DIT/4194304
                        time:   [8.8574 ms 8.9357 ms 9.0106 ms]
                        thrpt:  [465.49 Melem/s 469.39 Melem/s 473.54 Melem/s]
                        thrpt:  [3.4681 GiB/s 3.4972 GiB/s 3.5281 GiB/s]
                 change:
                        time:   [−4.7910% −1.2246% +2.2222%] (p = 0.54 > 0.05)
                        thrpt:  [−2.1739% +1.2398% +5.0321%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) low mild
  1 (5.00%) high severe
Forward f32/PhastFT DIT/8388608
                        time:   [22.850 ms 23.018 ms 23.198 ms]
                        thrpt:  [361.60 Melem/s 364.43 Melem/s 367.11 Melem/s]
                        thrpt:  [2.6942 GiB/s 2.7153 GiB/s 2.7352 GiB/s]
                 change:
                        time:   [−0.3485% +0.6181% +1.5569%] (p = 0.22 > 0.05)
                        thrpt:  [−1.5331% −0.6143% +0.3497%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Forward f32/PhastFT DIT/16777216
                        time:   [53.981 ms 54.265 ms 54.580 ms]
                        thrpt:  [307.39 Melem/s 309.17 Melem/s 310.80 Melem/s]
                        thrpt:  [2.2902 GiB/s 2.3035 GiB/s 2.3156 GiB/s]
                 change:
                        time:   [−8.3028% −7.2228% −6.0854%] (p = 0.00 < 0.05)
                        thrpt:  [+6.4797% +7.7851% +9.0546%]
                        Performance has improved.
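The `Found N outliers` lines label samples with Tukey's fences. A sample more than 1.5 interquartile ranges (IQR) beyond the first or third quartile is a mild outlier, and one more than 3 IQR beyond is severe. "Low" means faster than the bulk of the samples and "high" means slower. The sketch below is a textbook version of that rule, assuming linear-interpolation quartiles; Criterion's own percentile computation may differ slightly. `classify_outliers` and the sample times are hypothetical.

```rust
/// Tukey-fence classes matching the labels in Criterion's outlier report.
#[derive(Debug, PartialEq)]
enum Outlier {
    LowSevere,
    LowMild,
    NotAnOutlier,
    HighMild,
    HighSevere,
}

/// Linear-interpolation percentile of a sorted, non-empty slice (`p` in 0..=100).
fn percentile(sorted: &[f64], p: f64) -> f64 {
    let rank = p / 100.0 * (sorted.len() - 1) as f64;
    let (lo, hi) = (rank.floor() as usize, rank.ceil() as usize);
    sorted[lo] + (sorted[hi] - sorted[lo]) * (rank - lo as f64)
}

/// Label each sample against the mild (1.5 IQR) and severe (3 IQR) fences.
fn classify_outliers(samples: &[f64]) -> Vec<Outlier> {
    assert!(!samples.is_empty(), "need at least one sample");
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("samples must not be NaN"));
    let (q1, q3) = (percentile(&sorted, 25.0), percentile(&sorted, 75.0));
    let iqr = q3 - q1;
    samples
        .iter()
        .map(|&x| {
            if x < q1 - 3.0 * iqr {
                Outlier::LowSevere
            } else if x < q1 - 1.5 * iqr {
                Outlier::LowMild
            } else if x > q3 + 3.0 * iqr {
                Outlier::HighSevere
            } else if x > q3 + 1.5 * iqr {
                Outlier::HighMild
            } else {
                Outlier::NotAnOutlier
            }
        })
        .collect()
}

fn main() {
    // Hypothetical per-iteration times in ms: one unusually fast sample, one slow one.
    let samples = [3.0, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 11.5];
    let labels = classify_outliers(&samples);
    let found = labels.iter().filter(|o| **o != Outlier::NotAnOutlier).count();
    println!(
        "Found {} outliers among {} measurements ({:.2}%)",
        found,
        samples.len(),
        100.0 * found as f64 / samples.len() as f64
    ); // Found 2 outliers among 10 measurements (20.00%)
    println!("{:?} ... {:?}", labels[0], labels[9]); // LowSevere ... HighMild
}
```

With only 20 samples per benchmark, a handful of outliers moves the percentage in 5% steps. Outlier reports are therefore better read as a hint about measurement stability than as a verdict on the change.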

Inverse f32/PhastFT DIT/64
                        time:   [110.01 ns 117.19 ns 122.02 ns]
                        thrpt:  [524.50 Melem/s 546.11 Melem/s 581.74 Melem/s]
                        thrpt:  [3.9078 GiB/s 4.0689 GiB/s 4.3343 GiB/s]
                 change:
                        time:   [−10.367% −6.1947% −2.2789%] (p = 0.01 < 0.05)
                        thrpt:  [+2.3320% +6.6038% +11.567%]
                        Performance has improved.
Found 5 outliers among 20 measurements (25.00%)
  5 (25.00%) high mild
Inverse f32/PhastFT DIT/128
                        time:   [183.96 ns 198.98 ns 209.16 ns]
                        thrpt:  [611.98 Melem/s 643.27 Melem/s 695.81 Melem/s]
                        thrpt:  [4.5596 GiB/s 4.7928 GiB/s 5.1842 GiB/s]
                 change:
                        time:   [−8.2455% −2.5484% +4.1173%] (p = 0.42 > 0.05)
                        thrpt:  [−3.9545% +2.6150% +8.9865%]
                        No change in performance detected.
Found 5 outliers among 20 measurements (25.00%)
  5 (25.00%) high mild
Inverse f32/PhastFT DIT/256
                        time:   [330.17 ns 356.05 ns 372.17 ns]
                        thrpt:  [687.86 Melem/s 719.00 Melem/s 775.35 Melem/s]
                        thrpt:  [5.1250 GiB/s 5.3570 GiB/s 5.7768 GiB/s]
                 change:
                        time:   [−8.6178% −1.9669% +4.2634%] (p = 0.57 > 0.05)
                        thrpt:  [−4.0891% +2.0064% +9.4305%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/512
                        time:   [728.88 ns 800.36 ns 847.22 ns]
                        thrpt:  [604.33 Melem/s 639.71 Melem/s 702.45 Melem/s]
                        thrpt:  [4.5026 GiB/s 4.7662 GiB/s 5.2336 GiB/s]
                 change:
                        time:   [−1.8055% +6.6791% +16.329%] (p = 0.17 > 0.05)
                        thrpt:  [−14.037% −6.2609% +1.8387%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/1024
                        time:   [1.5415 µs 1.7152 µs 1.8362 µs]
                        thrpt:  [557.68 Melem/s 597.03 Melem/s 664.28 Melem/s]
                        thrpt:  [4.1550 GiB/s 4.4482 GiB/s 4.9493 GiB/s]
                 change:
                        time:   [−9.7390% −0.7493% +10.026%] (p = 0.89 > 0.05)
                        thrpt:  [−9.1128% +0.7550% +10.790%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/2048
                        time:   [3.2959 µs 3.6615 µs 3.9153 µs]
                        thrpt:  [523.07 Melem/s 559.34 Melem/s 621.37 Melem/s]
                        thrpt:  [3.8972 GiB/s 4.1674 GiB/s 4.6296 GiB/s]
                 change:
                        time:   [−9.9660% −0.1279% +11.216%] (p = 0.98 > 0.05)
                        thrpt:  [−10.085% +0.1281% +11.069%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/4096
                        time:   [6.8429 µs 7.5486 µs 8.0353 µs]
                        thrpt:  [509.75 Melem/s 542.62 Melem/s 598.58 Melem/s]
                        thrpt:  [3.7980 GiB/s 4.0428 GiB/s 4.4597 GiB/s]
                 change:
                        time:   [−10.193% −0.7970% +10.143%] (p = 0.88 > 0.05)
                        thrpt:  [−9.2089% +0.8034% +11.349%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/8192
                        time:   [14.215 µs 15.461 µs 16.333 µs]
                        thrpt:  [501.57 Melem/s 529.84 Melem/s 576.27 Melem/s]
                        thrpt:  [3.7370 GiB/s 3.9476 GiB/s 4.2936 GiB/s]
                 change:
                        time:   [−8.7996% −0.4356% +8.5698%] (p = 0.92 > 0.05)
                        thrpt:  [−7.8934% +0.4375% +9.6486%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/16384
                        time:   [32.634 µs 35.153 µs 36.963 µs]
                        thrpt:  [443.26 Melem/s 466.07 Melem/s 502.05 Melem/s]
                        thrpt:  [3.3025 GiB/s 3.4725 GiB/s 3.7405 GiB/s]
                 change:
                        time:   [−7.7197% −1.9848% +4.2377%] (p = 0.53 > 0.05)
                        thrpt:  [−4.0654% +2.0250% +8.3655%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f32/PhastFT DIT/32768
                        time:   [68.115 µs 74.108 µs 78.436 µs]
                        thrpt:  [417.77 Melem/s 442.17 Melem/s 481.07 Melem/s]
                        thrpt:  [3.1126 GiB/s 3.2944 GiB/s 3.5842 GiB/s]
                 change:
                        time:   [−6.1479% −0.2081% +6.4329%] (p = 0.95 > 0.05)
                        thrpt:  [−6.0441% +0.2085% +6.5506%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) high mild
Inverse f32/PhastFT DIT/65536
                        time:   [128.01 µs 138.07 µs 145.54 µs]
                        thrpt:  [450.30 Melem/s 474.67 Melem/s 511.95 Melem/s]
                        thrpt:  [3.3550 GiB/s 3.5366 GiB/s 3.8143 GiB/s]
                 change:
                        time:   [−5.6247% +0.4250% +7.0684%] (p = 0.89 > 0.05)
                        thrpt:  [−6.6017% −0.4232% +5.9599%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/131072
                        time:   [225.96 µs 245.56 µs 257.99 µs]
                        thrpt:  [508.05 Melem/s 533.77 Melem/s 580.06 Melem/s]
                        thrpt:  [3.7853 GiB/s 3.9769 GiB/s 4.3218 GiB/s]
                 change:
                        time:   [−11.642% −4.1096% +4.0524%] (p = 0.32 > 0.05)
                        thrpt:  [−3.8945% +4.2857% +13.176%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/262144
                        time:   [425.21 µs 458.59 µs 478.35 µs]
                        thrpt:  [548.01 Melem/s 571.63 Melem/s 616.50 Melem/s]
                        thrpt:  [4.0830 GiB/s 4.2590 GiB/s 4.5933 GiB/s]
                 change:
                        time:   [−10.661% −3.0742% +5.1482%] (p = 0.48 > 0.05)
                        thrpt:  [−4.8961% +3.1717% +11.934%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/524288
                        time:   [819.08 µs 882.74 µs 921.03 µs]
                        thrpt:  [569.24 Melem/s 593.93 Melem/s 640.09 Melem/s]
                        thrpt:  [4.2412 GiB/s 4.4251 GiB/s 4.7691 GiB/s]
                 change:
                        time:   [−16.412% −6.4084% +3.8027%] (p = 0.24 > 0.05)
                        thrpt:  [−3.6634% +6.8472% +19.634%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/1048576
                        time:   [1.8215 ms 1.9116 ms 1.9599 ms]
                        thrpt:  [535.02 Melem/s 548.54 Melem/s 575.65 Melem/s]
                        thrpt:  [3.9862 GiB/s 4.0870 GiB/s 4.2889 GiB/s]
                 change:
                        time:   [−14.316% −5.7023% +4.1753%] (p = 0.25 > 0.05)
                        thrpt:  [−4.0079% +6.0472% +16.707%]
                        No change in performance detected.
Inverse f32/PhastFT DIT/2097152
                        time:   [4.2070 ms 4.2642 ms 4.3317 ms]
                        thrpt:  [484.14 Melem/s 491.81 Melem/s 498.49 Melem/s]
                        thrpt:  [3.6071 GiB/s 3.6643 GiB/s 3.7140 GiB/s]
                 change:
                        time:   [−11.671% −2.1890% +8.9465%] (p = 0.68 > 0.05)
                        thrpt:  [−8.2118% +2.2379% +13.213%]
                        No change in performance detected.
Found 5 outliers among 20 measurements (25.00%)
  5 (25.00%) low mild
Benchmarking Inverse f32/PhastFT DIT/4194304: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.7s, enable flat sampling, or reduce sample count to 10.
Inverse f32/PhastFT DIT/4194304
                        time:   [9.5766 ms 9.6662 ms 9.7602 ms]
                        thrpt:  [429.73 Melem/s 433.91 Melem/s 437.98 Melem/s]
                        thrpt:  [3.2018 GiB/s 3.2329 GiB/s 3.2632 GiB/s]
                 change:
                        time:   [−7.5255% −4.3609% −1.3314%] (p = 0.01 < 0.05)
                        thrpt:  [+1.3494% +4.5597% +8.1380%]
                        Performance has improved.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) low mild
  1 (5.00%) high severe
Inverse f32/PhastFT DIT/8388608
                        time:   [24.806 ms 24.923 ms 25.044 ms]
                        thrpt:  [334.96 Melem/s 336.57 Melem/s 338.17 Melem/s]
                        thrpt:  [2.4956 GiB/s 2.5077 GiB/s 2.5196 GiB/s]
                 change:
                        time:   [−3.3570% −2.6878% −1.9007%] (p = 0.00 < 0.05)
                        thrpt:  [+1.9375% +2.7620% +3.4736%]
                        Performance has improved.
Inverse f32/PhastFT DIT/16777216
                        time:   [58.300 ms 58.642 ms 59.003 ms]
                        thrpt:  [284.34 Melem/s 286.10 Melem/s 287.77 Melem/s]
                        thrpt:  [2.1185 GiB/s 2.1316 GiB/s 2.1441 GiB/s]
                 change:
                        time:   [−17.040% −16.137% −15.216%] (p = 0.00 < 0.05)
                        thrpt:  [+17.946% +19.242% +20.540%]
                        Performance has improved.
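The verdict under each `change:` block comes from two separate checks. This explains why a result with p < 0.05 can still read "Change within noise threshold". The sketch below assumes Criterion's default significance level (0.05) and noise threshold (±1%). The p-value is tested first. Only a significant result is then compared against the noise band, using the confidence interval of the relative change in time. `classify` is a hypothetical helper rather than Criterion's code, though it reproduces the verdicts printed in these logs.

```rust
/// Verdicts Criterion prints after a `change:` block.
#[derive(Debug, PartialEq)]
enum Verdict {
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
    WithinNoise, // "Change within noise threshold."
    NoChange,    // "No change in performance detected."
}

/// `lb` and `ub` bound the relative change in time as fractions
/// (for example -0.075255 for "−7.5255%").
fn classify(p_value: f64, lb: f64, ub: f64, alpha: f64, noise: f64) -> Verdict {
    // Step 1: not statistically significant, so the noise band is never consulted.
    if p_value >= alpha {
        return Verdict::NoChange;
    }
    // Step 2: the whole interval must clear the noise band on one side.
    if lb < -noise && ub < -noise {
        Verdict::Improved
    } else if lb > noise && ub > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // Assumed Criterion defaults: significance level 0.05, noise threshold 1%.
    let (alpha, noise) = (0.05, 0.01);
    let cases = [
        ("Inverse f32/PhastFT DIT/4194304", 0.01, -0.075255, -0.013314),
        ("Inverse f32/PhastFT DIT/512", 0.17, -0.018055, 0.16329),
        ("Inverse f64/PhastFT DIT/2097152", 0.01, -0.044515, -0.007173),
        ("Inverse f64/RustFFT/4194304", 0.00, 0.018026, 0.033784),
    ];
    for (name, p, lb, ub) in cases {
        println!("{name}: {:?}", classify(p, lb, ub, alpha, noise));
    }
    // Prints Improved, NoChange, WithinNoise, Regressed, matching the logs.
}
```

Under this rule, an interval such as [−4.4515%, −0.7173%] is significant but not "improved", because its upper bound sits inside the ±1% band.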

Forward f64/PhastFT DIT/64
                        time:   [145.77 ns 156.71 ns 162.97 ns]
                        thrpt:  [392.72 Melem/s 408.41 Melem/s 439.04 Melem/s]
                        thrpt:  [5.8520 GiB/s 6.0858 GiB/s 6.5422 GiB/s]
                 change:
                        time:   [−9.9697% −1.0042% +8.7144%] (p = 0.84 > 0.05)
                        thrpt:  [−8.0159% +1.0144% +11.074%]
                        No change in performance detected.
Forward f64/PhastFT DIT/128
                        time:   [260.76 ns 281.74 ns 293.74 ns]
                        thrpt:  [435.76 Melem/s 454.31 Melem/s 490.88 Melem/s]
                        thrpt:  [6.4933 GiB/s 6.7698 GiB/s 7.3147 GiB/s]
                 change:
                        time:   [−11.727% −1.4644% +9.3682%] (p = 0.79 > 0.05)
                        thrpt:  [−8.5657% +1.4862% +13.285%]
                        No change in performance detected.
Forward f64/PhastFT DIT/256
                        time:   [556.40 ns 603.01 ns 629.52 ns]
                        thrpt:  [406.66 Melem/s 424.54 Melem/s 460.10 Melem/s]
                        thrpt:  [6.0597 GiB/s 6.3261 GiB/s 6.8561 GiB/s]
                 change:
                        time:   [−10.726% +0.2047% +13.325%] (p = 0.97 > 0.05)
                        thrpt:  [−11.758% −0.2043% +12.015%]
                        No change in performance detected.
Forward f64/PhastFT DIT/512
                        time:   [1.3274 µs 1.4704 µs 1.5559 µs]
                        thrpt:  [329.07 Melem/s 348.21 Melem/s 385.71 Melem/s]
                        thrpt:  [4.9035 GiB/s 5.1888 GiB/s 5.7475 GiB/s]
                 change:
                        time:   [−13.261% +1.1826% +17.384%] (p = 0.88 > 0.05)
                        thrpt:  [−14.810% −1.1688% +15.288%]
                        No change in performance detected.
Forward f64/PhastFT DIT/1024
                        time:   [2.8902 µs 3.1973 µs 3.3779 µs]
                        thrpt:  [303.15 Melem/s 320.27 Melem/s 354.30 Melem/s]
                        thrpt:  [4.5173 GiB/s 4.7724 GiB/s 5.2795 GiB/s]
                 change:
                        time:   [−12.443% +1.6781% +17.989%] (p = 0.83 > 0.05)
                        thrpt:  [−15.246% −1.6504% +14.211%]
                        No change in performance detected.
Forward f64/PhastFT DIT/2048
                        time:   [6.0778 µs 6.6955 µs 7.0502 µs]
                        thrpt:  [290.49 Melem/s 305.88 Melem/s 336.97 Melem/s]
                        thrpt:  [4.3286 GiB/s 4.5579 GiB/s 5.0212 GiB/s]
                 change:
                        time:   [−11.056% +2.5735% +17.994%] (p = 0.73 > 0.05)
                        thrpt:  [−15.250% −2.5089% +12.430%]
                        No change in performance detected.
Forward f64/PhastFT DIT/4096
                        time:   [12.947 µs 14.187 µs 14.915 µs]
                        thrpt:  [274.62 Melem/s 288.72 Melem/s 316.36 Melem/s]
                        thrpt:  [4.0921 GiB/s 4.3023 GiB/s 4.7141 GiB/s]
                 change:
                        time:   [−8.9237% +3.1126% +16.665%] (p = 0.64 > 0.05)
                        thrpt:  [−14.285% −3.0187% +9.7980%]
                        No change in performance detected.
Forward f64/PhastFT DIT/8192
                        time:   [27.909 µs 30.586 µs 32.208 µs]
                        thrpt:  [254.35 Melem/s 267.83 Melem/s 293.52 Melem/s]
                        thrpt:  [3.7901 GiB/s 3.9910 GiB/s 4.3738 GiB/s]
                 change:
                        time:   [−9.9286% +1.4388% +13.843%] (p = 0.82 > 0.05)
                        thrpt:  [−12.160% −1.4184% +11.023%]
                        No change in performance detected.
Forward f64/PhastFT DIT/16384
                        time:   [62.628 µs 67.762 µs 70.930 µs]
                        thrpt:  [230.99 Melem/s 241.79 Melem/s 261.61 Melem/s]
                        thrpt:  [3.4420 GiB/s 3.6029 GiB/s 3.8982 GiB/s]
                 change:
                        time:   [−7.3638% +2.4009% +13.246%] (p = 0.66 > 0.05)
                        thrpt:  [−11.697% −2.3446% +7.9492%]
                        No change in performance detected.
Forward f64/PhastFT DIT/32768
                        time:   [123.60 µs 134.60 µs 141.10 µs]
                        thrpt:  [232.23 Melem/s 243.45 Melem/s 265.12 Melem/s]
                        thrpt:  [3.4605 GiB/s 3.6276 GiB/s 3.9506 GiB/s]
                 change:
                        time:   [−9.6552% +0.6574% +12.306%] (p = 0.91 > 0.05)
                        thrpt:  [−10.958% −0.6531% +10.687%]
                        No change in performance detected.
Forward f64/PhastFT DIT/65536
                        time:   [203.34 µs 214.49 µs 220.85 µs]
                        thrpt:  [296.75 Melem/s 305.54 Melem/s 322.30 Melem/s]
                        thrpt:  [4.4219 GiB/s 4.5529 GiB/s 4.8026 GiB/s]
                 change:
                        time:   [−11.660% −3.5674% +5.5965%] (p = 0.43 > 0.05)
                        thrpt:  [−5.2999% +3.6994% +13.199%]
                        No change in performance detected.
Forward f64/PhastFT DIT/131072
                        time:   [421.27 µs 447.27 µs 462.42 µs]
                        thrpt:  [283.45 Melem/s 293.05 Melem/s 311.14 Melem/s]
                        thrpt:  [4.2237 GiB/s 4.3668 GiB/s 4.6363 GiB/s]
                 change:
                        time:   [−6.5153% +3.2493% +13.871%] (p = 0.54 > 0.05)
                        thrpt:  [−12.182% −3.1470% +6.9694%]
                        No change in performance detected.
Forward f64/PhastFT DIT/262144
                        time:   [802.18 µs 845.31 µs 869.07 µs]
                        thrpt:  [301.64 Melem/s 310.12 Melem/s 326.79 Melem/s]
                        thrpt:  [4.4947 GiB/s 4.6211 GiB/s 4.8696 GiB/s]
                 change:
                        time:   [−7.0254% +2.9396% +13.851%] (p = 0.58 > 0.05)
                        thrpt:  [−12.166% −2.8557% +7.5563%]
                        No change in performance detected.
Forward f64/PhastFT DIT/524288
                        time:   [1.7698 ms 1.8251 ms 1.8615 ms]
                        thrpt:  [281.64 Melem/s 287.27 Melem/s 296.24 Melem/s]
                        thrpt:  [4.1968 GiB/s 4.2806 GiB/s 4.4143 GiB/s]
                 change:
                        time:   [−9.6416% +0.0508% +11.454%] (p = 0.98 > 0.05)
                        thrpt:  [−10.277% −0.0508% +10.670%]
                        No change in performance detected.
Forward f64/PhastFT DIT/1048576
                        time:   [3.9503 ms 3.9981 ms 4.0615 ms]
                        thrpt:  [258.17 Melem/s 262.27 Melem/s 265.44 Melem/s]
                        thrpt:  [3.8471 GiB/s 3.9081 GiB/s 3.9554 GiB/s]
                 change:
                        time:   [−3.2307% +5.0936% +14.373%] (p = 0.27 > 0.05)
                        thrpt:  [−12.567% −4.8468% +3.3385%]
                        No change in performance detected.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) low severe
Benchmarking Forward f64/PhastFT DIT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.7s, enable flat sampling, or reduce sample count to 10.
Forward f64/PhastFT DIT/2097152
                        time:   [8.8456 ms 8.9459 ms 9.0526 ms]
                        thrpt:  [231.66 Melem/s 234.43 Melem/s 237.09 Melem/s]
                        thrpt:  [3.4521 GiB/s 3.4932 GiB/s 3.5328 GiB/s]
                 change:
                        time:   [−3.9830% −0.6905% +2.1816%] (p = 0.70 > 0.05)
                        thrpt:  [−2.1350% +0.6953% +4.1482%]
                        No change in performance detected.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high severe
Forward f64/PhastFT DIT/4194304
                        time:   [23.242 ms 23.429 ms 23.636 ms]
                        thrpt:  [177.46 Melem/s 179.03 Melem/s 180.46 Melem/s]
                        thrpt:  [2.6443 GiB/s 2.6677 GiB/s 2.6891 GiB/s]
                 change:
                        time:   [+1.3789% +2.3968% +3.4432%] (p = 0.00 < 0.05)
                        thrpt:  [−3.3286% −2.3407% −1.3601%]
                        Performance has regressed.
Found 3 outliers among 20 measurements (15.00%)
  3 (15.00%) high mild
Forward f64/PhastFT DIT/8388608
                        time:   [54.038 ms 54.436 ms 54.823 ms]
                        thrpt:  [153.01 Melem/s 154.10 Melem/s 155.24 Melem/s]
                        thrpt:  [2.2801 GiB/s 2.2963 GiB/s 2.3132 GiB/s]
                 change:
                        time:   [−7.6718% −6.6677% −5.6896%] (p = 0.00 < 0.05)
                        thrpt:  [+6.0329% +7.1440% +8.3092%]
                        Performance has improved.
Benchmarking Forward f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.3s, or reduce sample count to 10.
Forward f64/PhastFT DIT/16777216
                        time:   [153.54 ms 155.09 ms 156.69 ms]
                        thrpt:  [107.08 Melem/s 108.18 Melem/s 109.27 Melem/s]
                        thrpt:  [1.5955 GiB/s 1.6120 GiB/s 1.6282 GiB/s]
                 change:
                        time:   [−0.1641% +1.2476% +2.6801%] (p = 0.11 > 0.05)
                        thrpt:  [−2.6102% −1.2323% +0.1643%]
                        No change in performance detected.

Inverse f64/PhastFT DIT/64
                        time:   [151.89 ns 166.70 ns 176.04 ns]
                        thrpt:  [363.54 Melem/s 383.92 Melem/s 421.35 Melem/s]
                        thrpt:  [5.4172 GiB/s 5.7208 GiB/s 6.2785 GiB/s]
                 change:
                        time:   [−10.020% −1.9487% +6.7788%] (p = 0.69 > 0.05)
                        thrpt:  [−6.3484% +1.9875% +11.136%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/128
                        time:   [266.87 ns 294.16 ns 311.19 ns]
                        thrpt:  [411.33 Melem/s 435.14 Melem/s 479.64 Melem/s]
                        thrpt:  [6.1292 GiB/s 6.4841 GiB/s 7.1472 GiB/s]
                 change:
                        time:   [−12.608% −3.6754% +6.0798%] (p = 0.47 > 0.05)
                        thrpt:  [−5.7313% +3.8156% +14.427%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/256
                        time:   [633.30 ns 708.12 ns 756.44 ns]
                        thrpt:  [338.43 Melem/s 361.52 Melem/s 404.23 Melem/s]
                        thrpt:  [5.0430 GiB/s 5.3871 GiB/s 6.0236 GiB/s]
                 change:
                        time:   [−3.8133% +8.6525% +22.458%] (p = 0.20 > 0.05)
                        thrpt:  [−18.339% −7.9634% +3.9645%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/512
                        time:   [1.4132 µs 1.5824 µs 1.6916 µs]
                        thrpt:  [302.66 Melem/s 323.57 Melem/s 362.30 Melem/s]
                        thrpt:  [4.5100 GiB/s 4.8215 GiB/s 5.3987 GiB/s]
                 change:
                        time:   [−10.695% +1.9082% +16.561%] (p = 0.79 > 0.05)
                        thrpt:  [−14.208% −1.8725% +11.976%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/1024
                        time:   [2.9295 µs 3.2533 µs 3.4664 µs]
                        thrpt:  [295.41 Melem/s 314.76 Melem/s 349.55 Melem/s]
                        thrpt:  [4.4019 GiB/s 4.6902 GiB/s 5.2087 GiB/s]
                 change:
                        time:   [−13.419% +0.0661% +14.355%] (p = 0.99 > 0.05)
                        thrpt:  [−12.553% −0.0661% +15.499%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/2048
                        time:   [6.2084 µs 6.8588 µs 7.2791 µs]
                        thrpt:  [281.35 Melem/s 298.59 Melem/s 329.88 Melem/s]
                        thrpt:  [4.1925 GiB/s 4.4494 GiB/s 4.9155 GiB/s]
                 change:
                        time:   [−11.439% +0.8338% +13.716%] (p = 0.89 > 0.05)
                        thrpt:  [−12.061% −0.8269% +12.916%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/4096
                        time:   [12.974 µs 14.250 µs 15.061 µs]
                        thrpt:  [271.96 Melem/s 287.45 Melem/s 315.71 Melem/s]
                        thrpt:  [4.0525 GiB/s 4.2833 GiB/s 4.7045 GiB/s]
                 change:
                        time:   [−9.6278% +0.9202% +13.062%] (p = 0.88 > 0.05)
                        thrpt:  [−11.553% −0.9118% +10.654%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/8192
                        time:   [28.326 µs 30.981 µs 32.857 µs]
                        thrpt:  [249.33 Melem/s 264.42 Melem/s 289.21 Melem/s]
                        thrpt:  [3.7152 GiB/s 3.9402 GiB/s 4.3095 GiB/s]
                 change:
                        time:   [−11.365% −2.4171% +8.1728%] (p = 0.65 > 0.05)
                        thrpt:  [−7.5553% +2.4769% +12.822%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/16384
                        time:   [63.342 µs 68.824 µs 72.717 µs]
                        thrpt:  [225.31 Melem/s 238.06 Melem/s 258.66 Melem/s]
                        thrpt:  [3.3574 GiB/s 3.5473 GiB/s 3.8543 GiB/s]
                 change:
                        time:   [−6.5148% +2.0345% +11.726%] (p = 0.67 > 0.05)
                        thrpt:  [−10.495% −1.9939% +6.9688%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/32768
                        time:   [124.15 µs 135.44 µs 143.67 µs]
                        thrpt:  [228.07 Melem/s 241.94 Melem/s 263.94 Melem/s]
                        thrpt:  [3.3986 GiB/s 3.6052 GiB/s 3.9331 GiB/s]
                 change:
                        time:   [−9.4109% −0.8223% +9.1336%] (p = 0.86 > 0.05)
                        thrpt:  [−8.3692% +0.8291% +10.389%]
                        No change in performance detected.
Found 4 outliers among 20 measurements (20.00%)
  4 (20.00%) high mild
Inverse f64/PhastFT DIT/65536
                        time:   [206.83 µs 224.16 µs 234.61 µs]
                        thrpt:  [279.34 Melem/s 292.36 Melem/s 316.85 Melem/s]
                        thrpt:  [4.1625 GiB/s 4.3565 GiB/s 4.7215 GiB/s]
                 change:
                        time:   [−9.9062% −0.6898% +9.6704%] (p = 0.89 > 0.05)
                        thrpt:  [−8.8177% +0.6946% +10.995%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/131072
                        time:   [415.57 µs 452.68 µs 475.96 µs]
                        thrpt:  [275.39 Melem/s 289.55 Melem/s 315.41 Melem/s]
                        thrpt:  [4.1036 GiB/s 4.3146 GiB/s 4.6999 GiB/s]
                 change:
                        time:   [−11.715% −2.9848% +6.4427%] (p = 0.55 > 0.05)
                        thrpt:  [−6.0527% +3.0766% +13.270%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/262144
                        time:   [799.46 µs 866.48 µs 907.56 µs]
                        thrpt:  [288.84 Melem/s 302.54 Melem/s 327.90 Melem/s]
                        thrpt:  [4.3041 GiB/s 4.5082 GiB/s 4.8861 GiB/s]
                 change:
                        time:   [−15.070% −4.7076% +5.5877%] (p = 0.41 > 0.05)
                        thrpt:  [−5.2920% +4.9402% +17.744%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/524288
                        time:   [1.7639 ms 1.8543 ms 1.9037 ms]
                        thrpt:  [275.40 Melem/s 282.75 Melem/s 297.23 Melem/s]
                        thrpt:  [4.1037 GiB/s 4.2133 GiB/s 4.4290 GiB/s]
                 change:
                        time:   [−13.348% −4.7703% +5.0639%] (p = 0.36 > 0.05)
                        thrpt:  [−4.8198% +5.0093% +15.404%]
                        No change in performance detected.
Inverse f64/PhastFT DIT/1048576
                        time:   [4.0432 ms 4.1013 ms 4.1539 ms]
                        thrpt:  [252.43 Melem/s 255.67 Melem/s 259.35 Melem/s]
                        thrpt:  [3.7615 GiB/s 3.8097 GiB/s 3.8646 GiB/s]
                 change:
                        time:   [−8.9144% −0.0854% +9.7000%] (p = 0.99 > 0.05)
                        thrpt:  [−8.8423% +0.0854% +9.7868%]
                        No change in performance detected.
Benchmarking Inverse f64/PhastFT DIT/2097152: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 6.8s, enable flat sampling, or reduce sample count to 10.
Inverse f64/PhastFT DIT/2097152
                        time:   [9.5177 ms 9.6025 ms 9.6803 ms]
                        thrpt:  [216.64 Melem/s 218.40 Melem/s 220.34 Melem/s]
                        thrpt:  [3.2282 GiB/s 3.2544 GiB/s 3.2834 GiB/s]
                 change:
                        time:   [−6.6331% −2.8469% +0.7353%] (p = 0.16 > 0.05)
                        thrpt:  [−0.7299% +2.9303% +7.1044%]
                        No change in performance detected.
Found 2 outliers among 20 measurements (10.00%)
  1 (5.00%) low severe
  1 (5.00%) high severe
Inverse f64/PhastFT DIT/4194304
                        time:   [24.998 ms 25.098 ms 25.204 ms]
                        thrpt:  [166.41 Melem/s 167.12 Melem/s 167.79 Melem/s]
                        thrpt:  [2.4798 GiB/s 2.4903 GiB/s 2.5002 GiB/s]
                 change:
                        time:   [−2.5679% −1.9847% −1.3747%] (p = 0.00 < 0.05)
                        thrpt:  [+1.3939% +2.0249% +2.6355%]
                        Performance has improved.
Found 1 outliers among 20 measurements (5.00%)
  1 (5.00%) high mild
Inverse f64/PhastFT DIT/8388608
                        time:   [59.012 ms 59.387 ms 59.782 ms]
                        thrpt:  [140.32 Melem/s 141.25 Melem/s 142.15 Melem/s]
                        thrpt:  [2.0909 GiB/s 2.1048 GiB/s 2.1182 GiB/s]
                 change:
                        time:   [−12.940% −12.110% −11.291%] (p = 0.00 < 0.05)
                        thrpt:  [+12.728% +13.779% +14.863%]
                        Performance has improved.
Benchmarking Inverse f64/PhastFT DIT/16777216: Warming up for 3.0000 s
Warning: Unable to complete 20 samples in 5.0s. You may wish to increase target time to 7.7s, or reduce sample count to 10.
Inverse f64/PhastFT DIT/16777216
                        time:   [179.86 ms 181.68 ms 183.52 ms]
                        thrpt:  [91.418 Melem/s 92.345 Melem/s 93.280 Melem/s]
                        thrpt:  [1.3622 GiB/s 1.3760 GiB/s 1.3900 GiB/s]
                 change:
                        time:   [−1.8314% −0.5089% +0.8031%] (p = 0.48 > 0.05)
                        thrpt:  [−0.7967% +0.5115% +1.8656%]
                        No change in performance detected.
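
Each `thrpt` line can be recomputed from the `time` line directly above it. The sketch below uses hypothetical helpers that are not part of PhastFT or criterion. It assumes two things:

- The element rate is the transform length divided by the time estimate.
- The byte rate counts 16 bytes per element for the f64 benchmarks, since one complex value is two 8-byte floats. The f32 benchmarks would use 8 bytes per element.

Under those assumptions the reported figures are reproduced, for example for `Inverse f64/PhastFT DIT/8388608`. The sketch also converts a relative time change into the implied throughput change. The two `change` lines are reciprocal: a time change of t corresponds to a throughput change of 1/(1 + t) - 1.

```rust
/// Hypothetical helper: million elements per second for a transform of
/// length `n` that took `secs` seconds.
fn melem_per_s(n: u64, secs: f64) -> f64 {
    n as f64 / secs / 1e6
}

/// Hypothetical helper: throughput in GiB/s (2^30 bytes), assuming each
/// element occupies `bytes_per_elem` bytes (16 for complex f64, 8 for complex f32).
fn gib_per_s(n: u64, secs: f64, bytes_per_elem: u64) -> f64 {
    (n * bytes_per_elem) as f64 / secs / (1u64 << 30) as f64
}

/// Hypothetical helper: relative throughput change implied by a relative
/// time change (e.g. -0.12110 for a -12.110% time change).
fn thrpt_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Inverse f64/PhastFT DIT/8388608, middle time estimate: 59.387 ms.
    // The same formulas apply to the interval bounds, with the ends swapped:
    // the slowest time bound gives the lowest throughput bound.
    let n = 8_388_608;
    let secs = 59.387e-3;
    println!("{:.2} Melem/s", melem_per_s(n, secs)); // 141.25 Melem/s
    println!("{:.4} GiB/s", gib_per_s(n, secs, 16)); // 2.1048 GiB/s
    println!("{:+.3}%", thrpt_change(-0.12110) * 100.0); // +13.779%
}
```

Running it prints 141.25 Melem/s, 2.1048 GiB/s and +13.779%. These match the middle estimates reported for that size.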

Shnatsel merged commit 4d3d75d into main Apr 17, 2026
10 checks passed
Shnatsel deleted the improve-inverse-transforms branch April 17, 2026 19:43