mhoemmen commented on May 25, 2024

Here is an example of a sumInto function that uses the above relative offsets.

  template<class KokkosSparseMatrix,
    class RelOffsetType = typename KokkosSparseMatrix::ordinal_type>
  KOKKOS_FUNCTION
  typename KokkosSparseMatrix::ordinal_type
  sumIntoCrsMatrixRowValsByRelOffsets (const KokkosSparseMatrix& A,
    const typename KokkosSparseMatrix::ordinal_type rowi,
    const RelOffsetType relOffsets[],
    const typename KokkosSparseMatrix::value_type vals[],
    const typename KokkosSparseMatrix::ordinal_type numVals,
    const bool forceAtomic = false)
  {
    typedef typename KokkosSparseMatrix::ordinal_type LO;

    // row_view.value(k) is the k-th stored entry of row rowi, so relOffsets
    // are positions within the row, not within the whole values array.
    auto row_view = A.row (rowi);
    const LO length = row_view.length;
    LO numValid = 0; // number of valid offsets (likely not needed)

    for (LO i = 0; i < numVals; ++i) {
      const RelOffsetType relOffset = relOffsets[i];
      // An offset equal to the row length means "column not found": skip it.
      if (static_cast<LO> (relOffset) != length) {
        if (forceAtomic) {
          Kokkos::atomic_add (&(row_view.value(relOffset)), vals[i]);
        }
        else {
          row_view.value(relOffset) += vals[i];
        }
        ++numValid;
      }
    }
    return numValid;
  }

from kokkos-kernels.

mhoemmen commented on May 25, 2024
  template<class KokkosSparseMatrix,
    class RelOffsetType = typename KokkosSparseMatrix::ordinal_type>
  KOKKOS_FUNCTION
  typename KokkosSparseMatrix::ordinal_type
  replaceCrsMatrixRowValsByRelOffsets (const KokkosSparseMatrix& A,
    const typename KokkosSparseMatrix::ordinal_type rowi,
    const RelOffsetType relOffsets[],
    const typename KokkosSparseMatrix::value_type vals[],
    const typename KokkosSparseMatrix::ordinal_type numVals,
    const bool forceAtomic = false)
  {
    typedef typename KokkosSparseMatrix::ordinal_type LO;

    // row_view.value(k) is the k-th stored entry of row rowi.
    auto row_view = A.row (rowi);
    const LO length = row_view.length;
    LO numValid = 0; // number of valid offsets (likely not needed)

    for (LO i = 0; i < numVals; ++i) {
      const RelOffsetType relOffset = relOffsets[i];
      // An offset equal to the row length means "column not found": skip it.
      if (static_cast<LO> (relOffset) != length) {
        if (forceAtomic) {
          Kokkos::atomic_assign (&(row_view.value(relOffset)), vals[i]);
        }
        else {
          row_view.value(relOffset) = vals[i];
        }
        ++numValid;
      }
    }
    return numValid;
  }

mhoemmen commented on May 25, 2024

Here's how you might get relative offsets:

typedef LO rel_offset_type;
// how many offsets? number of DOFs in workset?
Kokkos::View<rel_offset_type*> relOffsets ("relOffsets", totNumOffsets);

const bool matrixRowsAreSorted = true; // generally they are
size_t totNumValid = 0;
// relOffsetsStart[i_lcl]: where row i_lcl's offsets begin in relOffsets.
Kokkos::parallel_reduce (A.numRows (), KOKKOS_LAMBDA (const LO& i_lcl, size_t& lclNumValid) {
    rel_offset_type* const relOffsets_i = &relOffsets(relOffsetsStart[i_lcl]);
    const LO* const lclColInds = ...; // input local column indices for which to search
    const LO numLclColInds = ...; // # of input local column indices for which to search
    const LO curNumValid = getCrsMatrixRowOffsets (relOffsets_i, A, i_lcl, lclColInds, numLclColInds, matrixRowsAreSorted);
    lclNumValid += static_cast<size_t> (curNumValid); // per-thread partial sum
  }, totNumValid);

// ... error out if totNumValid != totNumOffsets ...

mhoemmen commented on May 25, 2024

@vbrunini Btw, this approach of storing offsets requires that you store one offset per element-matrix entry, that is, per (DOF, DOF) pair per element. Thus, you need (# elements on my MPI process) * (# DOF per element)^2 storage, which is much more than the sparse matrix itself (!), because the assembled matrix stores an entry shared by neighboring elements only once. This is unlikely to be a good idea if you're already memory limited.

Since you only do sumInto once per DOF, no matter how many expressions there are, it won't pay to compute and store offsets temporarily per workset.

mhoemmen commented on May 25, 2024

This implies that we should focus on improving sumInto performance, by the previously proposed approach of strip-mining input into chunks of (say) 8, sorting the input, then doing a single pass over the row and the input (assuming the row is also sorted). This should reduce the total number of comparisons and reads.

mhoemmen commented on May 25, 2024

My previous comment refers to this issue: trilinos/Trilinos#877

vbrunini commented on May 25, 2024

Yeah I agree that the amount of storage probably makes this impractical and we should just pursue improving sumInto performance.
