Add env overloads for InclusiveSum and future ExclusiveScan #8059

Open
gonidelis wants to merge 7 commits into NVIDIA:main from gonidelis:init_sum_env

Conversation

@gonidelis
Member

Adds miscellaneous env overloads for InclusiveSum and ExclusiveScan with FutureValue

@gonidelis gonidelis requested a review from a team as a code owner March 17, 2026 05:40
@gonidelis gonidelis requested a review from pauleonix March 17, 2026 05:40
@github-project-automation github-project-automation bot moved this to Todo in CCCL Mar 17, 2026
@gonidelis gonidelis changed the title Add ebv overloads for InclusiveSum and future ExclusiveScan Add env overloads for InclusiveSum and future ExclusiveScan Mar 17, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Mar 17, 2026
//! - When ``d_in`` and ``d_out`` are equal, the scan is performed in-place. The
//! range ``[d_in, d_in + num_items)`` and ``[d_out, d_out + num_items)``
//! shall not overlap in any other way.
//!
Member Author

Add documentation for the env overloads here.

Comment on lines +42 to +61
C2H_TEST("cub::DeviceScan::ExclusiveScan accepts not_guaranteed determinism requirements", "[scan][env]")
{
  auto op     = cuda::std::plus{};
  auto input  = thrust::device_vector<int>{0, 1, 2, 3};
  auto output = thrust::device_vector<int>(4);
  auto init   = 0;

  auto env = cuda::execution::require(cuda::execution::determinism::not_guaranteed);

  auto error = cub::DeviceScan::ExclusiveScan(input.begin(), output.begin(), op, init, input.size(), env);
  if (error != cudaSuccess)
  {
    std::cerr << "cub::DeviceScan::ExclusiveScan failed with status: " << error << std::endl;
  }

  thrust::device_vector<int> expected{0, 0, 1, 3};

  REQUIRE(error == cudaSuccess);
  REQUIRE(output == expected);
}
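
For reference, the expected vector in the test above can be reproduced on the host. This is only a minimal sketch: it assumes `std::exclusive_scan` (C++17) as a stand-in for the device call, and `exclusive_scan_reference` is a hypothetical helper name, not part of CUB or of this PR.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical host-side reference: std::exclusive_scan stands in for the
// device call, so this only illustrates the expected values, not the CUB API.
std::vector<int> exclusive_scan_reference(const std::vector<int>& input, int init)
{
  std::vector<int> output(input.size());
  // Element i of the output is init combined with input[0..i-1];
  // input[i] itself is excluded.
  std::exclusive_scan(input.begin(), input.end(), output.begin(), init, std::plus<>{});
  return output;
}
```

With the input `{0, 1, 2, 3}` and `init = 0` from the test, this yields `{0, 0, 1, 3}`, which matches the `expected` vector.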
Contributor

Important: This example seems unused by any documentation. Please add it.

Contributor

@gonidelis you resolved this comment but there are still no // example-begin etc. comments in this example. Therefore, it's still not used anywhere. Please address this.

Member Author

There may be API test cases that we deliberately leave out of the docs to avoid bloat. I added the guards but did not add a literalinclude to the rst docs, because a not_guaranteed example already exists.

Member Author

We can also remove it.
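
To make the role of the guards in this thread concrete: the markers only delimit a region of the test file, and the region shows up in the docs only if an rst `literalinclude` selects it (for example through its `start-after` and `end-before` options). The sketch below is a rough host-side model of that selection, not the actual docs tooling; `extract_example` is a hypothetical helper and the matching is a plain substring search.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: returns the lines strictly between
// "// example-begin <name>" and "// example-end <name>".
// Matching is by substring, so one name that is a prefix of another
// would also match; a real tool may be stricter.
std::string extract_example(const std::string& source, const std::string& name)
{
  const std::string begin_marker = "// example-begin " + name;
  const std::string end_marker   = "// example-end " + name;

  std::istringstream in(source);
  std::string line;
  std::string snippet;
  bool inside = false;
  while (std::getline(in, line))
  {
    if (!inside)
    {
      // Skip everything up to and including the begin marker.
      inside = line.find(begin_marker) != std::string::npos;
      continue;
    }
    if (line.find(end_marker) != std::string::npos)
    {
      break; // Stop at the end marker without including it.
    }
    snippet += line + "\n";
  }
  return snippet;
}
```

If no begin marker with the given name exists, the function returns an empty string, which mirrors the reviewer's point: a test without markers cannot be pulled into the documentation at all.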

@gonidelis gonidelis enabled auto-merge (squash) March 25, 2026 10:34
  - dispatch_scan's warpspeed path calls max_dynamic_smem_size_for and
    set_max_dynamic_smem_size_for on the launcher factory
  - Previously untriggered because env tests used constant_iterator (non-contiguous),
    which skips the warpspeed path
  - Now needed for InclusiveSum env tests with device_vector (contiguous iterators)
…ents

  - Add default environment test for ExclusiveScan with FutureValue
  - Add not_guaranteed determinism test for ExclusiveScan
  - Remove duplicate in-place precondition from env overload descriptions (already in Preconditions section)
  - Remove @devicestorage from env overloads (no temp storage parameter)

{
  using num_items_t = int;

  num_items_t num_items = 10;
Contributor

Use a (small) random value to cover corner cases.

Member Author (@gonidelis, Apr 2, 2026)

This would make the logic for the expected vectors more complex. Do you think this change is necessary? Are you referring to corner cases such as zero-sized inputs?

Member Author

Use GENERATE() to generate the input size, but keep all elements equal to 1 by using a constant_iterator.
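
The appeal of this approach is that an all-ones input keeps the expected vector trivial for any size, including zero. A minimal host-side sketch, assuming `std::inclusive_scan` as a stand-in for the device call; both function names are hypothetical and not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Closed form: the inclusive sum over num_items ones is 1, 2, ..., num_items.
std::vector<int> expected_inclusive_sum_of_ones(std::size_t num_items)
{
  std::vector<int> expected(num_items);
  std::iota(expected.begin(), expected.end(), 1);
  return expected;
}

// Host-side stand-in for the device scan over an all-ones input.
std::vector<int> host_inclusive_sum_of_ones(std::size_t num_items)
{
  const std::vector<int> ones(num_items, 1);
  std::vector<int> output(num_items);
  std::inclusive_scan(ones.begin(), ones.end(), output.begin());
  return output;
}
```

Because the closed form depends only on the size, the size itself can vary freely (for example 0, 1, or a small random value) without extra logic for building the expected result.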

…ents

  - Use non-identity init values
  - Replace per-element REQUIRE(d_out[i]) with bulk thrust::equal or
    device_vector comparison where possible
  - Add example-begin/end markers for exclusive-scan-env-not-guaranteed
  - Fix mismatched init/expected in inclusive-scan-init env test

github-actions bot commented Apr 2, 2026

🥳 CI Workflow Results

🟩 Finished in 1h 56m: Pass: 100%/249 | Total: 5d 03h | Max: 1h 38m | Hits: 93%/160113

See results here.

REQUIRE(d_block_size[0] == target_block_size);
}

TEST_CASE("Device scan exclusive scan with FutureValue works with default environment", "[scan][device]")
Member Author

Add failing tests for float with run_to_run determinism here:
/home/ggonidelis/cccl/cub/test/test_device_exclusive_scan_determinism_fail.cu
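
The note above does not say why float is singled out. One plausible reason, offered here as an assumption rather than something stated in the PR, is that floating-point addition is not associative, so the order in which partial sums are combined can change the result. A small host-side illustration, assuming IEEE-754 single precision and no fast-math style reassociation; the function names are hypothetical.

```cpp
#include <cassert>

// Same three operands, two different grouping orders.
float sum_left_to_right(float a, float b, float c)
{
  return (a + b) + c;
}

float sum_right_to_left(float a, float b, float c)
{
  return a + (b + c);
}
```

With `a = 1e8f`, `b = -1e8f`, and `c = 1.0f`, the first grouping cancels the large values before adding `c` and gives `1.0f`. In the second grouping `c` is absorbed when added to `-1e8f`, because neighbouring floats at that magnitude are 8 apart, so the result is `0.0f`.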


thrust::device_vector<float> expected{1.0f, 3.0f, 6.0f, 10.0f};
// example-end inclusive-sum-env-stream
stream.sync();
Contributor

Important: we should not need any sync here, nor in any other test in this file. Why is it needed?
