
Port thrust::min|max_element to CUB#8291

Open
bernhardmgruber wants to merge 2 commits into NVIDIA:main from bernhardmgruber:port_min_element

Conversation

@bernhardmgruber
Contributor

@bernhardmgruber bernhardmgruber commented Apr 4, 2026

Pulled out from: #4970

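Added a new Thrust benchmark. On B200, the port is faster in most configurations, by up to about 31% at 2^28 elements. The exceptions are a few 2^24 runs (I128 for both algorithms and F64 for max_element), which are roughly 3 to 4% slower.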
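
For reading the tables, here is a minimal sketch of the arithmetic behind the Diff and %Diff columns. It assumes Diff is Cmp Time minus Ref Time and %Diff is Diff relative to Ref Time, which matches the rows shown. The helper name `percent_diff` is made up for illustration and is not part of any benchmarking tool.

```cpp
#include <cassert>

// Hypothetical helper: relative change of the compared run versus the
// reference run, in percent. Negative values mean the compared run is faster.
double percent_diff(double ref_us, double cmp_us)
{
  assert(ref_us > 0.0); // a zero reference time would make the ratio meaningless
  return (cmp_us - ref_us) / ref_us * 100.0;
}
```

For the min_element I8 row at 2^16 elements, `percent_diff(29.767, 27.695)` evaluates to about -6.96, in line with the table.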

# min_element

## [0] NVIDIA B200

|  T{ct}  |  Elements  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------|------------|------------|-------------|------------|-------------|------------|---------|----------|
|   I8    |    2^16    |  29.767 us |       3.51% |  27.695 us |       2.50% |  -2.072 us |  -6.96% |   FAST   |
|   I8    |    2^20    |  34.445 us |       2.45% |  30.387 us |       3.75% |  -4.057 us | -11.78% |   FAST   |
|   I8    |    2^24    |  48.797 us |       1.97% |  46.842 us |       2.15% |  -1.955 us |  -4.01% |   FAST   |
|   I8    |    2^28    | 232.320 us |       0.71% | 173.622 us |       0.84% | -58.698 us | -25.27% |   FAST   |
|   I16   |    2^16    |  32.439 us |       4.30% |  30.093 us |       7.20% |  -2.347 us |  -7.23% |   FAST   |
|   I16   |    2^20    |  35.631 us |       3.93% |  31.878 us |       3.12% |  -3.753 us | -10.53% |   FAST   |
|   I16   |    2^24    |  52.312 us |       2.39% |  48.735 us |       2.00% |  -3.578 us |  -6.84% |   FAST   |
|   I16   |    2^28    | 237.099 us |       0.81% | 163.475 us |       1.07% | -73.624 us | -31.05% |   FAST   |
|   I32   |    2^16    |  31.905 us |       4.46% |  29.510 us |       3.11% |  -2.395 us |  -7.51% |   FAST   |
|   I32   |    2^20    |  35.995 us |       4.11% |  31.087 us |       3.98% |  -4.907 us | -13.63% |   FAST   |
|   I32   |    2^24    |  52.556 us |       2.99% |  47.221 us |       2.27% |  -5.334 us | -10.15% |   FAST   |
|   I32   |    2^28    | 259.893 us |       0.79% | 202.981 us |       0.83% | -56.913 us | -21.90% |   FAST   |
|   I64   |    2^16    |  31.984 us |       4.34% |  29.859 us |       2.94% |  -2.125 us |  -6.65% |   FAST   |
|   I64   |    2^20    |  35.956 us |       5.26% |  32.477 us |       2.94% |  -3.479 us |  -9.68% |   FAST   |
|   I64   |    2^24    |  63.105 us |       2.19% |  61.777 us |       1.71% |  -1.328 us |  -2.10% |   FAST   |
|   I64   |    2^28    | 363.831 us |       1.16% | 348.154 us |       0.63% | -15.677 us |  -4.31% |   FAST   |
|  I128   |    2^16    |  32.892 us |       5.53% |  30.926 us |       4.51% |  -1.966 us |  -5.98% |   FAST   |
|  I128   |    2^20    |  40.230 us |       4.75% |  39.214 us |       3.15% |  -1.016 us |  -2.53% |   SAME   |
|  I128   |    2^24    |  89.145 us |       1.47% |  92.932 us |       1.49% |   3.786 us |   4.25% |   SLOW   |
|  I128   |    2^28    | 721.795 us |       0.32% | 685.891 us |       1.41% | -35.904 us |  -4.97% |   FAST   |
|   F32   |    2^16    |  32.142 us |       3.67% |  29.304 us |       3.25% |  -2.838 us |  -8.83% |   FAST   |
|   F32   |    2^20    |  35.792 us |       4.37% |  31.437 us |       4.82% |  -4.355 us | -12.17% |   FAST   |
|   F32   |    2^24    |  52.600 us |       2.42% |  47.182 us |       2.07% |  -5.417 us | -10.30% |   FAST   |
|   F32   |    2^28    | 261.674 us |       0.50% | 202.987 us |       0.51% | -58.687 us | -22.43% |   FAST   |
|   F64   |    2^16    |  31.788 us |       4.75% |  29.779 us |       2.33% |  -2.008 us |  -6.32% |   FAST   |
|   F64   |    2^20    |  35.550 us |       4.09% |  32.349 us |       4.18% |  -3.201 us |  -9.00% |   FAST   |
|   F64   |    2^24    |  61.892 us |       2.09% |  61.993 us |       1.65% |   0.101 us |   0.16% |   SAME   |
|   F64   |    2^28    | 353.879 us |       0.40% | 347.793 us |       0.81% |  -6.086 us |  -1.72% |   FAST   |

# max_element

## [0] NVIDIA B200

|  T{ct}  |  Elements  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |       Diff |   %Diff |  Status  |
|---------|------------|------------|-------------|------------|-------------|------------|---------|----------|
|   I8    |    2^16    |  32.616 us |       5.73% |  29.773 us |       4.54% |  -2.842 us |  -8.71% |   FAST   |
|   I8    |    2^20    |  35.695 us |       4.96% |  31.616 us |       3.32% |  -4.080 us | -11.43% |   FAST   |
|   I8    |    2^24    |  48.544 us |       2.74% |  48.531 us |       1.98% |  -0.013 us |  -0.03% |   SAME   |
|   I8    |    2^28    | 230.603 us |       0.62% | 173.792 us |       0.81% | -56.811 us | -24.64% |   FAST   |
|   I16   |    2^16    |  32.419 us |       4.77% |  29.407 us |       3.41% |  -3.011 us |  -9.29% |   FAST   |
|   I16   |    2^20    |  36.012 us |       3.63% |  31.532 us |       3.82% |  -4.480 us | -12.44% |   FAST   |
|   I16   |    2^24    |  49.631 us |       4.07% |  48.665 us |       2.24% |  -0.966 us |  -1.95% |   SAME   |
|   I16   |    2^28    | 232.820 us |       0.73% | 162.812 us |       0.66% | -70.008 us | -30.07% |   FAST   |
|   I32   |    2^16    |  32.406 us |       4.96% |  29.634 us |       3.86% |  -2.772 us |  -8.55% |   FAST   |
|   I32   |    2^20    |  35.590 us |       3.25% |  31.042 us |       3.92% |  -4.548 us | -12.78% |   FAST   |
|   I32   |    2^24    |  51.926 us |       2.57% |  47.256 us |       2.21% |  -4.670 us |  -8.99% |   FAST   |
|   I32   |    2^28    | 257.810 us |       0.51% | 203.083 us |       0.55% | -54.727 us | -21.23% |   FAST   |
|   I64   |    2^16    |  32.068 us |       7.18% |  29.981 us |       4.47% |  -2.086 us |  -6.51% |   FAST   |
|   I64   |    2^20    |  36.542 us |       3.62% |  32.323 us |       3.15% |  -4.219 us | -11.55% |   FAST   |
|   I64   |    2^24    |  61.183 us |       3.07% |  61.967 us |       1.70% |   0.784 us |   1.28% |   SAME   |
|   I64   |    2^28    | 353.614 us |       0.58% | 347.825 us |       0.65% |  -5.790 us |  -1.64% |   FAST   |
|  I128   |    2^16    |  34.014 us |       5.03% |  31.267 us |       4.49% |  -2.747 us |  -8.08% |   FAST   |
|  I128   |    2^20    |  39.949 us |       3.06% |  39.196 us |       3.14% |  -0.752 us |  -1.88% |   SAME   |
|  I128   |    2^24    |  89.703 us |       1.24% |  92.680 us |       1.08% |   2.977 us |   3.32% |   SLOW   |
|  I128   |    2^28    | 733.036 us |       4.12% | 686.048 us |       1.72% | -46.988 us |  -6.41% |   FAST   |
|   F32   |    2^16    |  32.219 us |       4.26% |  29.352 us |       4.45% |  -2.867 us |  -8.90% |   FAST   |
|   F32   |    2^20    |  35.693 us |       3.41% |  31.447 us |       3.32% |  -4.246 us | -11.90% |   FAST   |
|   F32   |    2^24    |  51.790 us |       4.32% |  47.009 us |       2.75% |  -4.780 us |  -9.23% |   FAST   |
|   F32   |    2^28    | 257.314 us |       0.42% | 202.868 us |       0.64% | -54.447 us | -21.16% |   FAST   |
|   F64   |    2^16    |  31.941 us |       4.17% |  29.663 us |       3.68% |  -2.279 us |  -7.13% |   FAST   |
|   F64   |    2^20    |  35.494 us |       3.67% |  32.450 us |       2.87% |  -3.045 us |  -8.58% |   FAST   |
|   F64   |    2^24    |  60.304 us |       2.59% |  62.061 us |       1.88% |   1.757 us |   2.91% |   SLOW   |
|   F64   |    2^28    | 364.362 us |       3.27% | 347.638 us |       0.61% | -16.724 us |  -4.59% |   FAST   |

Fixes part of: #1626
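
For readers unfamiliar with how a min_element search maps onto a reduction, below is a host-only sketch of the general idea: reduce over (index, value) pairs with an operator that keeps the smaller value and breaks ties by the smaller index. This is an illustration under stated assumptions, not the code in this PR. The names `indexed_value`, `arg_min`, and `first_min_index` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical (index, value) pair used as the reduction accumulator.
template <class T>
struct indexed_value
{
  std::ptrdiff_t index;
  T value;
};

// Keep the smaller value; on ties keep the smaller index, so the first
// minimum wins regardless of the order in which partial results are combined.
template <class T>
indexed_value<T> arg_min(indexed_value<T> a, indexed_value<T> b)
{
  if (b.value < a.value) return b;
  if (a.value < b.value) return a;
  return a.index <= b.index ? a : b;
}

// Sequential reference: index of the first minimum of a non-empty range.
template <class T>
std::ptrdiff_t first_min_index(const std::vector<T>& in)
{
  assert(!in.empty()); // precondition of this sketch
  indexed_value<T> acc{0, in[0]};
  for (std::size_t i = 1; i < in.size(); ++i)
    acc = arg_min(acc, indexed_value<T>{static_cast<std::ptrdiff_t>(i), in[i]});
  return acc.index;
}
```

For the input `{3, 1, 4, 1, 5}`, `first_min_index` returns 1, the position of the first of the two minima. A max_element variant would flip the value comparisons and keep the same tie-break.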

@bernhardmgruber bernhardmgruber requested review from a team as code owners April 4, 2026 22:11
@bernhardmgruber bernhardmgruber requested a review from a team as a code owner April 4, 2026 22:11
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 4, 2026
@bernhardmgruber bernhardmgruber marked this pull request as draft April 4, 2026 22:11
@copy-pr-bot
Contributor

copy-pr-bot bot commented Apr 4, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 4, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Review to In Progress in CCCL Apr 4, 2026
@bernhardmgruber bernhardmgruber changed the title Port min element Port thrust::min|max_element to CUB Apr 4, 2026

@bernhardmgruber bernhardmgruber marked this pull request as ready for review April 7, 2026 07:16
@cccl-authenticator-app cccl-authenticator-app bot moved this from In Progress to In Review in CCCL Apr 7, 2026
@bernhardmgruber bernhardmgruber requested a review from a team as a code owner April 7, 2026 07:27

@miscco
Contributor

miscco commented Apr 8, 2026

Looks like we are not returning the right values:

 /home/coder/cccl/cub/test/catch2_test_device_reduce.cu:302: FAILED:
    REQUIRE( expected_result[0] == gpu_extremum )
  with expansion:
    0 == -9223372036854775808

@bernhardmgruber
Contributor Author

> Looks like we are not returning the right values:
>
>  /home/coder/cccl/cub/test/catch2_test_device_reduce.cu:302: FAILED:
>     REQUIRE( expected_result[0] == gpu_extremum )
>   with expansion:
>     0 == -9223372036854775808

This may have been caused by an issue in #8285, where abs() overflowed when negating INT_MIN and similar values. That has been fixed and I have rebased now. Let's see what CI says.
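
To illustrate the class of bug mentioned above: negating the most negative value of a signed type overflows, because its magnitude is not representable in that type. The sketch below shows one common way to avoid that, by computing the magnitude in the unsigned counterpart type. It is not necessarily the fix applied in #8285, and `unsigned_abs` is a hypothetical name.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: magnitude of a signed integer, returned as the
// corresponding unsigned type. The negation happens after conversion to
// unsigned, where arithmetic is modular and therefore well defined even for
// the most negative input.
template <class T>
constexpr typename std::make_unsigned<T>::type unsigned_abs(T x)
{
  static_assert(std::is_integral<T>::value && std::is_signed<T>::value,
                "unsigned_abs expects a signed integer type");
  using U = typename std::make_unsigned<T>::type;
  return x < 0 ? static_cast<U>(U{0} - static_cast<U>(x)) : static_cast<U>(x);
}
```

With this helper, `unsigned_abs(std::numeric_limits<std::int64_t>::min())` yields 9223372036854775808 as a `std::uint64_t`, the magnitude of the value that appears in the failing assertion.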

@bernhardmgruber
Contributor Author

I tried adding THRUST_INDEX_TYPE_DISPATCH and it was a lot slower!
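
For context, index type dispatch generally means choosing a 32-bit or 64-bit offset type at run time from the element count, then invoking code instantiated for that type. The sketch below shows only that general pattern in plain C++. It is not the definition of THRUST_INDEX_TYPE_DISPATCH, and `dispatch_index_type` is a hypothetical name. The comment above does not say why the dispatch was slower here, and this sketch does not try to explain it.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: calls f with the count as a 32-bit index when it fits,
// otherwise as a 64-bit index. f is instantiated once per index type and must
// return the same type in both cases.
template <class F>
auto dispatch_index_type(std::int64_t count, F&& f)
{
  assert(count >= 0); // element counts are non-negative in this sketch
  if (count <= std::numeric_limits<std::int32_t>::max())
    return f(static_cast<std::int32_t>(count));
  return f(count);
}
```

A generic lambda such as `[](auto n) { return static_cast<int>(sizeof(n)); }` passed as `f` returns 4 for a count of 1000 and 8 for a count of 2^40, which shows which instantiation was selected.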

@github-actions
Contributor

github-actions bot commented Apr 8, 2026

🥳 CI Workflow Results

🟩 Finished in 2h 46m: Pass: 100%/116 | Total: 4d 04h | Max: 2h 45m | Hits: 67%/172467

See results here.

