Comments (7)
Here is an example of a sumInto function that uses the above relative offsets.
template<class KokkosSparseMatrix,
         class RelOffsetType = typename KokkosSparseMatrix::ordinal_type>
KOKKOS_FUNCTION
typename KokkosSparseMatrix::ordinal_type
sumIntoCrsMatrixRowValsByRelOffsets (const KokkosSparseMatrix& A, // matrix to modify (assumed to be passed in)
                                     const typename KokkosSparseMatrix::ordinal_type rowi,
                                     const RelOffsetType relOffsets[],
                                     const typename KokkosSparseMatrix::value_type vals[],
                                     const typename KokkosSparseMatrix::ordinal_type numVals,
                                     const bool forceAtomic = false)
{
  typedef typename KokkosSparseMatrix::ordinal_type LO;
  auto row_view = A.row (rowi);
  const LO length = row_view.length;
  LO numValid = 0; // number of valid offsets (likely not needed)
  for (LO i = 0; i < numVals; ++i) {
    const RelOffsetType relOffset = relOffsets[i];
    // Offsets equal to the row length are skipped (presumably the "not found" marker).
    if (relOffset != length) {
      if (forceAtomic) {
        Kokkos::atomic_add (&(row_view.value(relOffset)), vals[i]);
      }
      else {
        row_view.value(relOffset) += vals[i];
      }
      ++numValid;
    }
  }
  return numValid;
}
from kokkos-kernels.
The corresponding replace function is the same, except that it assigns instead of adding:
template<class KokkosSparseMatrix,
         class RelOffsetType = typename KokkosSparseMatrix::ordinal_type>
KOKKOS_FUNCTION
typename KokkosSparseMatrix::ordinal_type
replaceCrsMatrixRowValsByRelOffsets (const KokkosSparseMatrix& A, // matrix to modify (assumed to be passed in)
                                     const typename KokkosSparseMatrix::ordinal_type rowi,
                                     const RelOffsetType relOffsets[],
                                     const typename KokkosSparseMatrix::value_type vals[],
                                     const typename KokkosSparseMatrix::ordinal_type numVals,
                                     const bool forceAtomic = false)
{
  typedef typename KokkosSparseMatrix::ordinal_type LO;
  auto row_view = A.row (rowi);
  const LO length = row_view.length;
  LO numValid = 0; // number of valid offsets (likely not needed)
  for (LO i = 0; i < numVals; ++i) {
    const RelOffsetType relOffset = relOffsets[i];
    // Offsets equal to the row length are skipped (presumably the "not found" marker).
    if (relOffset != length) {
      if (forceAtomic) {
        Kokkos::atomic_assign (&(row_view.value(relOffset)), vals[i]);
      }
      else {
        row_view.value(relOffset) = vals[i];
      }
      ++numValid;
    }
  }
  return numValid;
}
Here's how you might get relative offsets:
typedef LO rel_offset_type;
// how many offsets? number of DOFs in workset?
Kokkos::View<rel_offset_type*> relOffsets ("relOffsets", totNumOffsets);
const bool matrixRowsAreSorted = true; // generally they are
size_t totNumValid = 0;
Kokkos::parallel_reduce (A.numRows (), KOKKOS_LAMBDA (const LO& i_lcl, size_t& lclNumValid) {
    // relOffsetsStart[i_lcl] is where row i_lcl's offsets begin in relOffsets (not defined in this snippet)
    rel_offset_type* const relOffsets_i = &relOffsets(relOffsetsStart[i_lcl]);
    const LO* const lclColInds = ...; // input local column indices for which to search
    const LO numLclColInds = ...; // # of input local column indices for which to search
    const LO curNumValid = getCrsMatrixRowOffsets (relOffsets_i, A, i_lcl, lclColInds, numLclColInds, matrixRowsAreSorted);
    lclNumValid += static_cast<size_t> (curNumValid); // thread-local contribution to the reduction
  }, totNumValid);
// ... error out if totNumValid != totNumOffsets ...
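For readers without the earlier comment at hand, here is a rough host-only sketch of what the per-row offset search could look like when the row is sorted. It is not the actual getCrsMatrixRowOffsets: the name findRowOffsets and its raw-array signature are made up for illustration, and it assumes that "not found" is recorded as an offset equal to the row length, which is what the sumInto and replace functions above skip.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical stand-in for the per-row offset search (not the kokkos-kernels API).
// For each input column index, store its position within the sorted row,
// or rowLength if the index does not occur in the row.
// Returns the number of input indices that were found.
template <class LO>
LO findRowOffsets (LO relOffsets[],
                   const LO rowColInds[], // column indices of one row, sorted ascending
                   const LO rowLength,
                   const LO inputColInds[],
                   const LO numInputColInds)
{
  LO numValid = 0;
  const LO* const rowEnd = rowColInds + rowLength;
  for (LO k = 0; k < numInputColInds; ++k) {
    // Binary search is valid only because the row is assumed sorted.
    const LO* const it = std::lower_bound (rowColInds, rowEnd, inputColInds[k]);
    if (it != rowEnd && *it == inputColInds[k]) {
      relOffsets[k] = static_cast<LO> (it - rowColInds);
      ++numValid;
    }
    else {
      relOffsets[k] = rowLength; // assumed "not found" marker
    }
  }
  return numValid;
}
```

The caller would compare the returned count against the number of input indices, in the same spirit as the totNumValid check above.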
@vbrunini Btw, this approach of storing offsets requires one offset for every entry of every element matrix, that is, one per pair of DOFs per element. Thus, you need (# elements on my MPI process) * (# DOF per element)^2 storage, which is way more than the sparse matrix itself (!). This is unlikely to be a good idea if you're already memory limited.
Since you only do sumInto once per DOF, no matter how many expressions there are, it won't pay to compute and store offsets temporarily per workset.
This implies that we should focus on improving sumInto performance, by the previously proposed approach of strip-mining input into chunks of (say) 8, sorting the input, then doing a single pass over the row and the input (assuming the row is also sorted). This should reduce the total number of comparisons and reads.
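To make the idea concrete, here is a rough host-only sketch of the chunk, sort, and merge approach. It is only an illustration under assumptions, not the code proposed in the Trilinos issue: the name sumIntoSortedRow, the raw-array interface, and the fixed chunk size of 8 are all hypothetical, and a device version would likely swap std::sort for a small in-register sort and add the atomic option.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical sketch: sum input values into one sorted matrix row.
// The input is processed in chunks of 8; each chunk is sorted by column index
// and then merged against the row in a single forward pass.
// Returns the number of input entries whose column index was found in the row.
template <class LO, class SC>
LO sumIntoSortedRow (SC rowVals[],
                     const LO rowColInds[], // sorted ascending (assumption)
                     const LO rowLength,
                     const LO inputColInds[],
                     const SC inputVals[],
                     const LO numInput)
{
  constexpr LO chunkSize = 8;
  LO numValid = 0;
  for (LO start = 0; start < numInput; start += chunkSize) {
    const LO chunkLen = std::min<LO> (chunkSize, numInput - start);
    std::pair<LO, SC> chunk[chunkSize];
    for (LO k = 0; k < chunkLen; ++k) {
      chunk[k] = std::make_pair (inputColInds[start + k], inputVals[start + k]);
    }
    // Sort the chunk by column index only.
    std::sort (chunk, chunk + chunkLen,
               [] (const std::pair<LO, SC>& a, const std::pair<LO, SC>& b) {
                 return a.first < b.first;
               });
    // Merge pass: j only moves forward, so the row is read at most once per chunk.
    LO j = 0;
    for (LO k = 0; k < chunkLen; ++k) {
      while (j < rowLength && rowColInds[j] < chunk[k].first) {
        ++j;
      }
      if (j == rowLength) {
        break; // remaining chunk entries are past the end of the row
      }
      if (rowColInds[j] == chunk[k].first) {
        rowVals[j] += chunk[k].second; // j is not advanced, so duplicate inputs also land here
        ++numValid;
      }
    }
  }
  return numValid;
}
```

With an unsorted input, each input entry costs a search of the row; here each chunk costs one small sort plus one pass over the row, which is where the reduction in comparisons and reads would come from.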
My previous comment refers to this issue: trilinos/Trilinos#877
Yeah I agree that the amount of storage probably makes this impractical and we should just pursue improving sumInto performance.