Port thrust::minmax_element to CUB #8292

Draft
bernhardmgruber wants to merge 4 commits into NVIDIA:main from bernhardmgruber:port_minmax_element

Conversation

@bernhardmgruber
Contributor

@bernhardmgruber bernhardmgruber commented Apr 5, 2026

Replaces #4970

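The benchmark results are mixed. At 2^20 and 2^24 elements, Cmp is slower than Ref for every type except I64 and F64, by up to 33.86% (I8 at 2^24). At 2^28 elements, Cmp is faster for every type except I8, by up to 26.11% (F64). My guess is that we need better tunings.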

# minmax_element

## [0] NVIDIA B200

|  T{ct}  |  Elements  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |        Diff |   %Diff |  Status  |
|---------|------------|------------|-------------|------------|-------------|-------------|---------|----------|
|   I8    |    2^16    |  32.876 us |       3.68% |  33.464 us |       2.15% |    0.588 us |   1.79% |   SAME   |
|   I8    |    2^20    |  39.693 us |       2.31% |  49.214 us |       2.28% |    9.521 us |  23.99% |   SLOW   |
|   I8    |    2^24    |  67.258 us |       1.71% |  90.032 us |       1.30% |   22.774 us |  33.86% |   SLOW   |
|   I8    |    2^28    | 455.555 us |       0.30% | 504.361 us |       0.30% |   48.806 us |  10.71% |   SLOW   |
|   I16   |    2^16    |  32.700 us |       3.57% |  31.092 us |       2.10% |   -1.608 us |  -4.92% |   FAST   |
|   I16   |    2^20    |  39.276 us |       2.02% |  46.393 us |       1.02% |    7.117 us |  18.12% |   SLOW   |
|   I16   |    2^24    |  72.472 us |       1.54% |  84.794 us |       1.35% |   12.322 us |  17.00% |   SLOW   |
|   I16   |    2^28    | 569.882 us |       0.15% | 467.841 us |       0.65% | -102.041 us | -17.91% |   FAST   |
|   I32   |    2^16    |  32.200 us |       4.30% |  30.120 us |       4.46% |   -2.080 us |  -6.46% |   FAST   |
|   I32   |    2^20    |  39.319 us |       1.91% |  41.266 us |       2.61% |    1.948 us |   4.95% |   SLOW   |
|   I32   |    2^24    |  65.933 us |       1.66% |  75.461 us |       1.44% |    9.528 us |  14.45% |   SLOW   |
|   I32   |    2^28    | 465.526 us |       0.19% | 407.095 us |       0.53% |  -58.430 us | -12.55% |   FAST   |
|   I64   |    2^16    |  32.911 us |       2.49% |  28.392 us |       2.90% |   -4.519 us | -13.73% |   FAST   |
|   I64   |    2^20    |  38.265 us |       2.17% |  36.767 us |       1.57% |   -1.498 us |  -3.91% |   FAST   |
|   I64   |    2^24    |  78.854 us |       1.63% |  76.643 us |       1.36% |   -2.211 us |  -2.80% |   FAST   |
|   I64   |    2^28    | 597.821 us |       0.18% | 507.651 us |       0.28% |  -90.170 us | -15.08% |   FAST   |
|  I128   |    2^16    |  34.047 us |       3.20% |  31.389 us |       1.48% |   -2.658 us |  -7.81% |   FAST   |
|  I128   |    2^20    |  43.497 us |       1.98% |  53.521 us |       1.29% |   10.024 us |  23.04% |   SLOW   |
|  I128   |    2^24    | 132.239 us |       0.69% | 133.107 us |       0.52% |    0.868 us |   0.66% |   SLOW   |
|  I128   |    2^28    |   1.404 ms |       0.07% |   1.231 ms |       0.16% | -173.041 us | -12.33% |   FAST   |
|   F32   |    2^16    |  32.519 us |       3.79% |  30.147 us |       4.19% |   -2.372 us |  -7.29% |   FAST   |
|   F32   |    2^20    |  39.152 us |       2.51% |  41.185 us |       2.93% |    2.033 us |   5.19% |   SLOW   |
|   F32   |    2^24    |  65.839 us |       1.30% |  75.523 us |       1.34% |    9.685 us |  14.71% |   SLOW   |
|   F32   |    2^28    | 465.087 us |       0.25% | 407.306 us |       0.35% |  -57.781 us | -12.42% |   FAST   |
|   F64   |    2^16    |  32.050 us |       3.78% |  28.373 us |       2.45% |   -3.676 us | -11.47% |   FAST   |
|   F64   |    2^20    |  38.152 us |       2.04% |  36.842 us |       1.86% |   -1.310 us |  -3.43% |   FAST   |
|   F64   |    2^24    |  77.053 us |       1.34% |  73.956 us |       1.78% |   -3.097 us |  -4.02% |   FAST   |
|   F64   |    2^28    | 595.398 us |       0.15% | 439.956 us |       0.32% | -155.442 us | -26.11% |   FAST   |
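
For reading the table: the Diff and %Diff columns follow directly from the two time columns (Diff = Cmp Time - Ref Time, %Diff = Diff / Ref Time). The Status column is consistent, for every row above, with comparing |%Diff| against the smaller of the two noise values; that rule is inferred from the rows and is an assumption, not something stated in this PR. A minimal C++ sketch, with `Row`, `diff_us`, `diff_pct`, and `status` as hypothetical names:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <string>

// One row of the comparison table. Times are in microseconds, noise in percent.
struct Row {
  double ref_time_us;
  double ref_noise_pct;
  double cmp_time_us;
  double cmp_noise_pct;
};

// Diff column: Cmp Time minus Ref Time (negative means Cmp is faster).
inline double diff_us(const Row& r) { return r.cmp_time_us - r.ref_time_us; }

// %Diff column: Diff relative to Ref Time, in percent.
inline double diff_pct(const Row& r) {
  assert(r.ref_time_us > 0.0);
  return 100.0 * diff_us(r) / r.ref_time_us;
}

// Status column under the ASSUMED rule: a change no larger than the
// smaller of the two noise values is reported as SAME.
inline std::string status(const Row& r) {
  const double min_noise = std::min(r.ref_noise_pct, r.cmp_noise_pct);
  const double pct = diff_pct(r);
  if (std::abs(pct) <= min_noise) {
    return "SAME";
  }
  return pct < 0.0 ? "FAST" : "SLOW";
}
```

For example, the I8 row at 2^20 gives 49.214 - 39.693 = 9.521 us and 9.521 / 39.693 ≈ 23.99%, which exceeds both noise values and is therefore SLOW.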

Fixes: #1626
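
For context, finding both extrema can be expressed as a single reduction over (index, value) partial results, which is what makes it a fit for a device-wide reduce such as `cub::DeviceReduce` (named in the linked issue). The host-only C++ sketch below illustrates the idea; it is not the code in this PR. `MinMax`, `combine`, and `minmax_indices` are hypothetical names, and resolving ties to the lowest index is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Partial result: position and value of the smallest and largest element seen so far.
template <typename T>
struct MinMax {
  std::size_t min_idx;
  T min_val;
  std::size_t max_idx;
  T max_val;
};

// Combine two partial results. The operation is associative and commutative,
// so partial results may be merged in any grouping or order.
// Ties go to the lower index (an assumption of this sketch).
template <typename T>
MinMax<T> combine(const MinMax<T>& a, const MinMax<T>& b) {
  MinMax<T> r = a;
  if (b.min_val < r.min_val || (!(r.min_val < b.min_val) && b.min_idx < r.min_idx)) {
    r.min_idx = b.min_idx;
    r.min_val = b.min_val;
  }
  if (r.max_val < b.max_val || (!(b.max_val < r.max_val) && b.max_idx < r.max_idx)) {
    r.max_idx = b.max_idx;
    r.max_val = b.max_val;
  }
  return r;
}

// Sequential reference: fold every element into the accumulator.
// Returns (index of minimum, index of maximum).
template <typename T>
std::pair<std::size_t, std::size_t> minmax_indices(const std::vector<T>& v) {
  assert(!v.empty());
  MinMax<T> acc{0, v[0], 0, v[0]};
  for (std::size_t i = 1; i < v.size(); ++i) {
    acc = combine(acc, MinMax<T>{i, v[i], i, v[i]});
  }
  return {acc.min_idx, acc.max_idx};
}
```

On a GPU, each thread or block would produce such partial results and merge them with the same operator; the per-architecture choices of block size and items per thread are the kind of tunings the comment above refers to.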

@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 5, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Progress in CCCL Apr 5, 2026
@bernhardmgruber bernhardmgruber force-pushed the port_minmax_element branch 2 times, most recently from d9e4af4 to 2f69580 on April 7, 2026 08:00

Labels

None yet

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

Refactor thrust/extrema.h to use cub::DeviceReduce

1 participant