Add env overloads for InclusiveSum and future ExclusiveScan#8059
Add env overloads for InclusiveSum and future ExclusiveScan#8059gonidelis wants to merge 7 commits intoNVIDIA:mainfrom
Conversation
This comment has been minimized.
This comment has been minimized.
| //! - When ``d_in`` and ``d_out`` are equal, the scan is performed in-place. The | ||
| //! range ``[d_in, d_in + num_items)`` and ``[d_out, d_out + num_items)`` | ||
| //! shall not overlap in any other way. | ||
| //! |
There was a problem hiding this comment.
add env overload documentation info
This comment has been minimized.
This comment has been minimized.
| C2H_TEST("cub::DeviceScan::ExclusiveScan accepts not_guaranteed determinism requirements", "[scan][env]") | ||
| { | ||
| auto op = cuda::std::plus{}; | ||
| auto input = thrust::device_vector<int>{0, 1, 2, 3}; | ||
| auto output = thrust::device_vector<int>(4); | ||
| auto init = 0; | ||
|
|
||
| auto env = cuda::execution::require(cuda::execution::determinism::not_guaranteed); | ||
|
|
||
| auto error = cub::DeviceScan::ExclusiveScan(input.begin(), output.begin(), op, init, input.size(), env); | ||
| if (error != cudaSuccess) | ||
| { | ||
| std::cerr << "cub::DeviceScan::ExclusiveScan failed with status: " << error << std::endl; | ||
| } | ||
|
|
||
| thrust::device_vector<int> expected{0, 0, 1, 3}; | ||
|
|
||
| REQUIRE(error == cudaSuccess); | ||
| REQUIRE(output == expected); | ||
| } |
There was a problem hiding this comment.
Important: This example seems unused by any documentation. Please add it.
There was a problem hiding this comment.
@gonidelis you resolved this comment but there are still no // example-begin etc. comments in this example. Therefore, it's still not used anywhere. Please address this.
There was a problem hiding this comment.
there might be api test cases where we don't insert in the docs in order to avoid bloating. i added the guards but did not insert it in the rst docs with a literalinclude as a non_guaranteed example already exists
There was a problem hiding this comment.
we can also remove that
- dispatch_scan's warpspeed path calls max_dynamic_smem_size_for and
set_max_dynamic_smem_size_for on the launcher factory
- Previously untriggered because env tests used constant_iterator (non-contiguous),
which skips the warpspeed path
- Now needed for InclusiveSum env tests with device_vector (contiguous iterators)
…ents - Add default environment test for ExclusiveScan with FutureValue - Add not_guaranteed determinism test for ExclusiveScan - Remove duplicate in-place precondition from env overload descriptions (already in Preconditions section) - Remove @devicestorage from env overloads (no temp storage parameter)
| C2H_TEST("cub::DeviceScan::ExclusiveScan accepts not_guaranteed determinism requirements", "[scan][env]") | ||
| { | ||
| auto op = cuda::std::plus{}; | ||
| auto input = thrust::device_vector<int>{0, 1, 2, 3}; | ||
| auto output = thrust::device_vector<int>(4); | ||
| auto init = 0; | ||
|
|
||
| auto env = cuda::execution::require(cuda::execution::determinism::not_guaranteed); | ||
|
|
||
| auto error = cub::DeviceScan::ExclusiveScan(input.begin(), output.begin(), op, init, input.size(), env); | ||
| if (error != cudaSuccess) | ||
| { | ||
| std::cerr << "cub::DeviceScan::ExclusiveScan failed with status: " << error << std::endl; | ||
| } | ||
|
|
||
| thrust::device_vector<int> expected{0, 0, 1, 3}; | ||
|
|
||
| REQUIRE(error == cudaSuccess); | ||
| REQUIRE(output == expected); | ||
| } |
There was a problem hiding this comment.
@gonidelis you resolved this comment but there are still no // example-begin etc. comments in this example. Therefore, it's still not used anywhere. Please address this.
This comment has been minimized.
This comment has been minimized.
| { | ||
| using num_items_t = int; | ||
|
|
||
| num_items_t num_items = 10; |
There was a problem hiding this comment.
Use (small) random value to include corner-cases.
There was a problem hiding this comment.
this would require to make the logic for the expected vectors more complex, do you think this change is necessary? are you referring to zero sized inputs type of corner cases?
There was a problem hiding this comment.
Do use GENERATE() to generate size of input but elements are still all 1 using constant_iter
…ents
- Use non-identity init values
- Replace per-element REQUIRE(d_out[i]) with bulk thrust::equal or
device_vector comparison where possible
- Add example-begin/end markers for exclusive-scan-env-not-guaranteed
- Fix mismatched init/expected in inclusive-scan-init env test
🥳 CI Workflow Results🟩 Finished in 1h 56m: Pass: 100%/249 | Total: 5d 03h | Max: 1h 38m | Hits: 93%/160113See results here. |
| REQUIRE(d_block_size[0] == target_block_size); | ||
| } | ||
|
|
||
| TEST_CASE("Device scan exclusive scan with FutureValue works with default environment", "[scan][device]") |
There was a problem hiding this comment.
Add failed tests for float run_to_run here:
/home/ggonidelis/cccl/cub/test/test_device_exclusive_scan_determinism_fail.cu
|
|
||
| thrust::device_vector<float> expected{1.0f, 3.0f, 6.0f, 10.0f}; | ||
| // example-end inclusive-sum-env-stream | ||
| stream.sync(); |
There was a problem hiding this comment.
Important: we should not need any sync here. Also not in any other tests in this file. Why is it needed?
Adds miscellaneous env overloads for
InclusiveSumandExclusiveScanwithFutureValue