Skip to content

Add rocSPARSE CSR SpGEMM#51

Merged
yhmtsai merged 13 commits intomainfrom
dev/yhmtsai/rocsparse_spgemm
Mar 26, 2026
Merged

Add rocSPARSE CSR SpGEMM#51
yhmtsai merged 13 commits intomainfrom
dev/yhmtsai/rocsparse_spgemm

Conversation

@yhmtsai
Copy link
Copy Markdown
Contributor

@yhmtsai yhmtsai commented Mar 27, 2025

Summary:
This PR adds rocSPARSE CSR SpGEMM with 3 args (C = alpha A * B), 4 args (C = alpha A * B + beta D), and the symbolic/numeric phase.

Details:

  • Add spgemm state
  • Add 4 arguments version of SpGEMM (C = alpha A * B + beta D)
  • the 3 arguments is passing a null matrix D into 4 arguments version
  • also adds the symbolic_compute, symbolic_fill, and numeric stage
  • currently, three type of test: spgemm similar to the current CPU version, spgemm_reuse: checking symbolic/numeric, and spgemm_4args: checking 4 arguments version

potential TODO:

  • create a structure to handle the rocsparse handle in one place
  • helper for create rocsparse_mat (is already merged in another pr)

Merge Checklist:

  • Passing CI
  • Update documentation or README.md
  • Additional Test/example added (if applicable) and passing
  • At least one reviewer approval
  • (optional) Clang sanitizer scan run and triaged
  • Clang formatter applied (verified as part of passing CI)

@yhmtsai yhmtsai self-assigned this Mar 27, 2025
Copy link
Copy Markdown

@YvanMokwinski YvanMokwinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a general comment: I might not be used to cutting-edge C++, but I don't find the code friendly to read.

to_rocsparse_indextype<typename output_type::offset_type>(),
to_rocsparse_indextype<typename output_type::index_type>(),
rocsparse_index_base_zero,
to_rocsparse_datatype<typename output_type::scalar_type>()));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we could have something like:

using output_scalar_t = typename output_type::scalar_type;
It would simplify the writings.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have merged them into a helper function to create a matrix handle easily from the csr_view

to_rocsparse_datatype<value_type>(), rocsparse_spgemm_alg_default,
rocsparse_spgemm_stage_buffer_size, &buffer_size, nullptr));
if (buffer_size > this->buffer_size_) {
this->alloc_.deallocate(workspace_, this->buffer_size_);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's have a reallocate method in the allocator?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is close to the array more than the allocator?
do you imagine something like alloc_.reallocate(&workspace_, old_buffer_size_, new_buffer_size_)?

tensor_scalar_t<A> alpha = alpha_optional.value_or(1);
value_type alpha_val = alpha;
auto beta_optional = __detail::get_scaling_factor(d);
value_type beta = beta_optional.value_or(1);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that relying on a and d types for alpha and beta is the best approach.

__detail::is_csr_view_v<C>
void multiply_compute(spgemm_state_t& spgemm_handle, A&& a, B&& b, C&& c) {
auto d = __rocsparse::create_null_matrix<std::remove_reference_t<C>>();
spgemm_handle.multiply_compute(a, b, c, scaled(0.0, d));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we have either scaled(0.0, d) or d?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is to reuse the 4 arguments SpGEMM. It will give the same setup when using rocsparse_spgemm for C = A*B, null_matrix set the all pointer to nullptr.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to note that if down the road, we do start to enforce the difference in behaviour for

C = alpha op(A)*op(B)
C = alpha op(A)op(B) + betaop(D)

with regards to beta, with the first being a strong ZERO for beta meaning we write into C and values didn't need to be initialized, vs the second one with beta == 0.0, still having possibility of propogating NaNs or Infs and not going like BLAS does of saying beta == 0.0 leads to not accessing D. Right now this test as written does not make this distinction and the 3 arg test would fail if D had nans in it...

@yhmtsai yhmtsai force-pushed the dev/yhmtsai/rocsparse_spgemm branch from 605f085 to 64d662f Compare April 17, 2025 14:43
@BenBrock
Copy link
Copy Markdown
Collaborator

@yhmtsai What's the status of this PR? Anything required on my part here, or are you still updating, waiting for @YvanMokwinski to comment on your most recent updates?

@yhmtsai yhmtsai force-pushed the dev/yhmtsai/rocsparse_spgemm branch from c39ef55 to eeb3a67 Compare May 22, 2025 14:10
@yhmtsai yhmtsai marked this pull request as ready for review May 22, 2025 14:49
@yhmtsai
Copy link
Copy Markdown
Contributor Author

yhmtsai commented Jun 5, 2025

@YvanMokwinski @BenBrock forgot to mention it can be checked again.
I decide not to introduce a class to handle the rocsparse handle in this PR.
I am also thinking whether using shared_ptr to handle rocsparse matrix handle but I keep it as current state until matrix_opt.
I think 4arg SpGEMM are not supported by the other backend, so I keep them in rocsparse only

BenBrock
BenBrock previously approved these changes Jun 9, 2025
Copy link
Copy Markdown
Collaborator

@BenBrock BenBrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @yhmtsai, looks mostly good, I just have a couple of small comments.

As a favor to me, can you please avoid performing Git force pushes on branches for which you have an open PR? If no one is looking at your code yet, force pushing is mostly fine. However, if you force push to a branch I've already reviewed, it becomes very difficult to see what's changed recently. Please use merge commits instead of rebasing if necessary.

throw_if_error(rocsparse_create_csr_descr(
&descr, __backend::shape(a)[0], __backend::shape(a)[1], a.values().size(),
a.rowptr().data(), a.colind().data(), a.values().data(),
to_rocsparse_indextype<typename matrix_type::offset_type>(),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the tensor_traits inspector instead of depending on scalar_type, index_type, and offset_type typedefs. This is cleaner (you can directly write tensor_scalar_t<mat>) and will work with types we don't define like mdspan as well.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it be uniform for the other types? like Coo, offset_type does not make sense.


} // namespace __rocsparse

class spgemm_state_t {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is okay for now, but eventually we need to have one operation_state_t that encapsulates this (if that's how you want to design it).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, if the operation_state_t is for user-friendly in the end. for now, I think separating individually should be clearer and easier for future change

template <matrix A, matrix B, matrix C>
requires __detail::has_csr_base<A> && __detail::has_csr_base<B> &&
__detail::is_csr_view_v<C>
void multiply_inspect(spgemm_state_t& spgemm_handle, A&& a, B&& b, C&& c) {}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to at least add a version that takes in three matrices and returns a state (see multiply.hpp).

@yhmtsai
Copy link
Copy Markdown
Contributor Author

yhmtsai commented Jun 26, 2025

@BenBrock Sure, I can avoid force rebasing. It is what we need to do to keep linear history in Ginkgo and I try to reorganize the commit history to describe the change. As we always do squash, my commit history will be merged together anyway.

@yhmtsai
Copy link
Copy Markdown
Contributor Author

yhmtsai commented Aug 7, 2025

Hi @BenBrock I have used merge way to update this branch. hope this will help the review process.
I only do adaption on changes in this pr not revert something in this pr here

BenBrock
BenBrock previously approved these changes Aug 20, 2025
Copy link
Copy Markdown
Collaborator

@BenBrock BenBrock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have one comment about the CMakeLists.txt, but otherwise looks fine to merge.


function(add_rocsparse_lang file_list)
if (ENABLE_ROCSPARSE)
set_source_files_properties(${${file_list}} PROPERTIES LANGUAGE HIP)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we perhaps just have this inline? Personally I find that easier to read than needing to go to a separate function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. agree with it. It is quite small so no need for function now

Co-authored-by: Benjamin Brock <brock@cs.berkeley.edu>
@yhmtsai
Copy link
Copy Markdown
Contributor Author

yhmtsai commented Aug 21, 2025

after push, it automatically dismiss the review. @BenBrock could you check it again?

Copy link
Copy Markdown

@srobertp srobertp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these changes look good to me as well. Approved for merge

Copy link
Copy Markdown
Contributor

@spencerpatty spencerpatty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approved with my other github id, looks good!

@yhmtsai yhmtsai merged commit a138942 into main Mar 26, 2026
9 of 12 checks passed
@yhmtsai yhmtsai deleted the dev/yhmtsai/rocsparse_spgemm branch March 26, 2026 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants