operation_info_t info;

device_policy policy;

multiply_inspect(info, policy, a, x, y);
multiply_inspect(info, policy, transposed(a), x, y);

// Allocate more memory for y based on `info`

while (/* ... */) {
  multiply_execute(info, policy, a, x, y);
  // do something with y, update x...
  multiply_execute(info, policy, transposed(a), y, x);
  // Maybe do some more stuff...
}
I like this idea of having an info type that is directly associated with some matrix structure and that is filled with zero or more inspection-based optimizations (meaning it houses "stateful + read-only" optimizations). I wonder if it would be possible to have our multiply functions take a hybrid matrix_obj object, which consists of either a matrix_view alone or a matrix_view plus an associated matrix_info_t type, used along the lines of the following snippet:
csr_view<T,I,O> A(...);
matrix_info_t A_info(...);
multiply_inspect( matrix_obj{A, A_info}, descriptor, x, y, /*backend stuff*/ );
multiply_execute( matrix_obj{A, A_info}, descriptor, x, y, /*backend stuff*/ );
Or we might skip the inspection at the cost of reduced performance:
csr_view<T,I,O> A(...);
multiply_execute( matrix_obj{A}, descriptor, x, y, /*backend stuff*/ );
The benefit of this is that when we look at the sparse * sparse operation, we could have an A_info and B_info that may contain good (read-only, stateful) information about A and B that is useful while creating C, and then there may be another multiply_info_t which is particular to the multi-stage operation (stateful + read/write data):
csr_view<T,I,O> A(...);
matrix_info_t A_info(...);
csr_view<T,I,O> B(...);
matrix_info_t B_info(...);
csr_view<T,I,O> C(...);
multiply_info_t mult_info; // C = A * B^T
multiply_inspect(matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), desc, /*backend stuff*/ ); // fills A_info and/or B_info
multiply_execute_stage1( matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), mult_info, /*backend stuff*/ ); // fills mult_info and C
multiply_execute_stage2( matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), mult_info, /*backend stuff*/ ); // fills mult_info and C
multiply_execute_stage3( matrix_obj{C}, matrix_obj{A, A_info}, transpose(matrix_obj{B, B_info}), mult_info, /*backend stuff*/ ); // fills mult_info and C
mult_info might house the stateful + read/write optimizations pertaining to the multi-stage multiply process, while A_info and B_info might house stateful + read-only optimizations about A and/or B.
Does this idea make sense? Does anyone see any usability issues? Is it too ugly? I worry that we would have too many overloads if we kept A and possibly A_info etc. separated as inputs; bundling them also lets us distinguish between matrix inputs (plus their info) and operational info data.
(Snippet quoted above: spblas-reference/notes/spmv.hpp, lines 10 to 24 at 0930680)