kokkos / kokkos-kernels

Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

License: Other

CMake 1.33% C++ 97.41% Python 0.22% C 0.01% Shell 1.01% Groovy 0.03%
kokkos linear-algebra sparse-matrix performance-portability blas

kokkos-kernels's People

Contributors

ajpowelsnl, ambrad, bartlettroscoe, brian-kelley, cgcgcg, crtrott, cwpearson, cz4rs, dalg24, e10harvey, eeprude, hcedwar, ibaned, iyamazaki, jczhang07, jennloe, jgfouca, kliegeois, kyungjoo-kim, lucbv, masterleinad, mndevec, mperrinel, ndellingwood, nmhamster, seheracer, srajama1, tmranse, uhetmaniuk, vqd8a

kokkos-kernels's Issues

Build warnings in sparse matrix-matrix multiply when MKL TPL is enabled

.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:111:62: warning: unused typedef 'device3' [-Wunused-local-typedef]
    typedef typename in_nonzero_value_view_type::device_type device3;
                                                             ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:99:36: warning: unused typedef 'idx_array_type' [-Wunused-local-typedef]
    typedef in_row_index_view_type idx_array_type;
                                   ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:110:62: warning: unused typedef 'device2' [-Wunused-local-typedef]
    typedef typename in_nonzero_index_view_type::device_type device2;
                                                             ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:109:58: warning: unused typedef 'device1' [-Wunused-local-typedef]
    typedef typename in_row_index_view_type::device_type device1;
                                                         ^
In file included from .../CHECKIN-CLANG-3.9.0/MPI_DEBUG_REAL/packages/tpetra/core/src/Tpetra_Details_packCrsMatrix_DOUBLE_INT_LONG_LONG_SERIAL.cpp:71:
In file included from .../Trilinos/packages/tpetra/core/src/Tpetra_Details_packCrsMatrix_def.hpp:52:
In file included from .../Trilinos/packages/tpetra/core/src/Tpetra_CrsMatrix_decl.hpp:64:
In file included from .../Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse.hpp:60:
In file included from .../Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse_spgemm.hpp:51:
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:97:56: warning: unused typedef 'device1' [-Wunused-local-typedef]
  typedef typename in_row_index_view_type::device_type device1;
                                                       ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:87:44: warning: unused typedef 'size_type' [-Wunused-local-typedef]
  typedef typename KernelHandle::size_type size_type;
                                           ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:88:34: warning: unused typedef 'idx_array_type' [-Wunused-local-typedef]
  typedef in_row_index_view_type idx_array_type;
                                 ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:98:60: warning: unused typedef 'device2' [-Wunused-local-typedef]
  typedef typename in_nonzero_index_view_type::device_type device2;
                                                           ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:236:62: warning: unused typedef 'device2' [-Wunused-local-typedef]
    typedef typename in_nonzero_index_view_type::device_type device2;
                                                             ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:222:46: warning: unused typedef 'size_type' [-Wunused-local-typedef]
    typedef typename KernelHandle::size_type size_type;
                                             ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:235:58: warning: unused typedef 'device1' [-Wunused-local-typedef]
    typedef typename in_row_index_view_type::device_type device1;
                                                         ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:223:36: warning: unused typedef 'idx_array_type' [-Wunused-local-typedef]
    typedef in_row_index_view_type idx_array_type;
                                   ^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:237:62: warning: unused typedef 'device3' [-Wunused-local-typedef]
    typedef typename in_nonzero_value_view_type::device_type device3;

List of required Unit Tests

Function Name | Parameters/Scenarios | Status
spgemm_numeric, spgemm_symbolic | Backends: All - Algorithms: kkmem, kkdense, mkl, cusparse - AxA for 4 matrices | Done
graph_color_symbolic | Backends: All | Done
gauss_seidel_symbolic, gauss_seidel_numeric, symmetric_gauss_seidel_apply, forward_sweep_gauss_seidel_apply, backward_sweep_gauss_seidel_apply | Backends: All | Done

Please add/remove fields and tests.
@srajama1 @crtrott @ambrad @mhoemmen @kyungjoo-kim @nmhamster

Get rid of use of TriBITS ETI in CMake

Story: #1

KokkosKernels currently uses TriBITS' ETI (explicit template instantiation) CMake functions and generated macros. We plan to get rid of those altogether, in favor of a new ETI system for KokkosKernels.

BLAS 1 Feature completeness

This is the issue to track progress for BLAS 1 feature completeness.

Function Operation Performed BLAS Name KK impl done TPL hooks Supports MV
abs Y(i) = abs(X(i)) -- X X
axpby Y(i) = a*X(i) + b*Y(i) -- X X
axpy Y(i) += a*X(i) axpy X X X
dot Sum( X(i)*Y(i) ) dot X X X
iamax MaxIndex(abs(X(i))) iamax X
mult Z(i)=a*A(i)*X(i)+c*Z(i) -- X X
nrm1 Sum( abs(X(i)) ) asum X X X
nrm2 sqrt(Sum(abs(X(i))*abs(X(i)))) nrm2 X X X
nrm2_squared Sum ( abs(X(i))*abs(X(i)) ) -- X X
nrminf Max(abs(X(i))) -- X X
reciprocal Y(i) = 1/X(i) -- X X
rot rot X X
rotm rotm X X
rotg rotg X X
rotmg rotmg X X
scal Y(i) = a*X(i) scal* X X X
sum Sum( X(i) ) -- X X
update Z(i)=a*X(i)+b*Y(i)+c*Z(i) -- X X

Write new ETI system compatible with Makefile-based build system

Story: #1

ETI stands for "explicit template instantiation." We weren't and aren't strictly using ETI. Instead, we "prebuild" kernels for a small set of template parameter combinations, determined at configure time. We also (will) have an option to disable use of template parameter combinations outside that set. This will help both developers and users reduce build times and library sizes, by determining whether they are unexpectedly using combinations outside the set of prebuilt combinations.

We used to use macros to generate prebuilt combinations. We're switching to the C++11 extern template approach, which obviates the need for definition macros that duplicate code in the templated definitions.
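
As a rough illustration of the C++11 extern template approach mentioned above, here is a minimal sketch. The `dot` function below is a hypothetical stand-in working on std::vector, not the actual KokkosKernels kernel, and the header/source split is only indicated by comments.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical templated kernel (stand-in for a KokkosKernels function).
template <class Scalar>
Scalar dot(const std::vector<Scalar>& x, const std::vector<Scalar>& y) {
  assert(x.size() == y.size());
  Scalar sum = Scalar();
  for (std::size_t i = 0; i < x.size(); ++i) {
    sum += x[i] * y[i];
  }
  return sum;
}

// Would live in the header: explicit instantiation declaration.
// It tells the compiler not to implicitly instantiate dot<double>
// in translation units that include the header.
extern template double dot<double>(const std::vector<double>&,
                                   const std::vector<double>&);

// Would live in exactly one .cpp file: explicit instantiation definition.
// This is the "prebuilt" combination.
template double dot<double>(const std::vector<double>&,
                            const std::vector<double>&);
```

Combinations outside the prebuilt set (for example `dot<float>` here) are still instantiated implicitly at the point of use, since the template definition stays visible; no definition macro has to duplicate the function body.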

The default solution needs to respect
trilinos/Trilinos#362
and thus, it needs to respect the following CMake variables:

  • Trilinos_ENABLE_FLOAT
  • Trilinos_ENABLE_COMPLEX_DOUBLE
  • Trilinos_ENABLE_COMPLEX_FLOAT
  • (in theory, also, Trilinos_ENABLE_COMPLEX, though no Trilinos package currently uses that)

New macro names:

  • KOKKOSKERNELS_INST_SCALAR_${SCALAR}
  • KOKKOSKERNELS_INST_LAYOUT_${LAYOUT}
  • KOKKOSKERNELS_INST_EXECSPACE_${EXECSPACE}
  • KOKKOSKERNELS_INST_MEMSPACE_${MEMSPACE}

The variables in the above macro names are upper-case and mangled-for-macro-use versions of the original type names. Here is a CMake rule for converting a type name into a name suitable for use either as a typedef used in a macro argument (macros don't like spaces, commas, etc.), or as part of a macro name (if made upper case).

FUNCTION(TPETRA_MANGLE_TEMPLATE_PARAMETER TYPE_MANGLED_OUT TYPE_IN)
  STRING(REPLACE "<" "0" TMP0 "${TYPE_IN}")
  STRING(REPLACE ">" "0" TMP1 "${TMP0}")
  STRING(REPLACE "::" "_" TMP2 "${TMP1}")
  # Spaces (as in "long long") get squished out.
  STRING(REPLACE " " "" TMP3 "${TMP2}")
  SET(${TYPE_MANGLED_OUT} ${TMP3} PARENT_SCOPE)
ENDFUNCTION(TPETRA_MANGLE_TEMPLATE_PARAMETER)

Summary of that rule:

  • < turns into 0
  • > turns into 0
  • :: turns into _
  • (space) turns into `` (empty string)
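
To check the rule by hand, the four replacements can be mirrored in a few lines of C++. This is only a sketch; `replaceAll` and `mangleTemplateParameter` are hypothetical helper names, not part of Tpetra or KokkosKernels.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Replace every occurrence of `from` in `s` with `to`, scanning left to right.
std::string replaceAll(std::string s, const std::string& from,
                       const std::string& to) {
  assert(!from.empty());
  std::size_t pos = 0;
  while ((pos = s.find(from, pos)) != std::string::npos) {
    s.replace(pos, from.size(), to);
    pos += to.size();  // continue after the text just inserted
  }
  return s;
}

// Hypothetical C++ mirror of the TPETRA_MANGLE_TEMPLATE_PARAMETER rule.
std::string mangleTemplateParameter(const std::string& typeIn) {
  std::string out = replaceAll(typeIn, "<", "0");
  out = replaceAll(out, ">", "0");
  out = replaceAll(out, "::", "_");
  out = replaceAll(out, " ", "");  // spaces (as in "long long") are removed
  return out;
}
```

For example, under this rule "std::complex<double>" becomes "std_complex0double0" and "long long" becomes "longlong".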

Here is a more Tpetra-specific CMake rule:

# Function that turns a valid Scalar, LocalOrdinal, or GlobalOrdinal
# template parameter into a macro name (all caps, with no white space
# and no punctuation other than underscore).
#
# NAME_OUT [out] The mangled type name.
#
# NAME_IN [in] The type to mangle.
FUNCTION(TPETRA_SLG_MACRO_NAME NAME_OUT NAME_IN)
  STRING(COMPARE EQUAL "${NAME_IN}" "__float128" IS_FLOAT128)
  IF(IS_FLOAT128)
    # __float128 is a special case; we remove the __ from the macro name.
    SET(${NAME_OUT} "FLOAT128" PARENT_SCOPE)
  ELSE()
    STRING(COMPARE EQUAL "${NAME_IN}" "std::complex<float>" IS_COMPLEX_FLOAT)
    IF(IS_COMPLEX_FLOAT)
      SET(${NAME_OUT} "COMPLEX_FLOAT" PARENT_SCOPE)
    ELSE()
      STRING(COMPARE EQUAL "${NAME_IN}" "std::complex<double>" IS_COMPLEX_DOUBLE)
      IF(IS_COMPLEX_DOUBLE)
        SET(${NAME_OUT} "COMPLEX_DOUBLE" PARENT_SCOPE)
      ELSE()
        # Make upper-case version of ${NAME_IN}.
        STRING(TOUPPER "${NAME_IN}" TMP0)
        # Use the generic algorithm for mangling the type name.
        TPETRA_MANGLE_TEMPLATE_PARAMETER(TMP1 "${TMP0}")
        SET(${NAME_OUT} ${TMP1} PARENT_SCOPE)
      ENDIF()
    ENDIF()
  ENDIF()
ENDFUNCTION(TPETRA_SLG_MACRO_NAME)

KokkosKernels Trilinos Integration

This issue tracks the integration of KokkosKernels into Trilinos.

After commits
77a2fde and f9b1559,
the CMake files should be ready to configure and compile "only" KokkosKernels in Trilinos.

The next step is to propagate the namespace and file name changes to the rest of Trilinos that uses KokkosKernels directly. To my knowledge, Ifpack2 and Tpetra use KokkosKernels directly. Please extend this list if there are more packages.

For changes to be done in Trilinos, I created the branch:
https://github.com/mndevec/Trilinos/tree/kk_integration

Jenkins Testing

The following are the environment variables set and the arguments passed to test_all_sandia to run the Jenkins jobs:

Apollo:

  • clang/4.0.0 --build-list=Pthreads

Bowman:

export OMP_NUM_THREADS=256
export OMP_PROC_BIND=close
export OMP_PLACES=threads
  • intel/17.0.098 --build-list=OpenMP
  • intel/17.0.098 --build-list=Serial

White:

export OMP_NUM_THREADS=64
export OMP_PROC_BIND=close
export OMP_PLACES=threads
  • cuda/8.0.44 --build-list=OpenMP_Cuda
  • gcc/5.4.0 --build-list=OpenMP

KokkosBlas::dot: Add unit test

KokkosKernels doesn't have a unit test for KokkosBlas::dot. We have been relying on Tpetra testing this and other KokkosBlas functionality, but since KokkosKernels is now stand-alone, it must have its own unit tests now. I'm working on this, because I'll need it for #13.

CrsMatrix: Add function to return relative offsets for a given list of column indices

@vbrunini @crtrott

Here is a first-pass implementation of such a function.

template<class KokkosSparseMatrix, 
  class RelOffsetType = typename KokkosSparseMatrix::ordinal_type>
KOKKOS_FUNCTION
typename KokkosSparseMatrix::ordinal_type
getCrsMatrixRowOffsets (RelOffsetType relOffsets[],
  const KokkosSparseMatrix& A,
  const typename KokkosSparseMatrix::ordinal_type lclRowInd,
  const typename KokkosSparseMatrix::ordinal_type lclColInds[],
  const typename KokkosSparseMatrix::ordinal_type numLclColInds,
  const bool rowIsSorted = false,
  const bool /* inputIsSorted */ = false)
{
  typedef typename KokkosSparseMatrix::ordinal_type LO;

  auto A_rowView = A.row (lclRowInd);
  const LO numEntInRow = A_rowView.length;
  const LO* const rowLclColInds = numEntInRow == 0 ? NULL : &(A_rowView.colidx(0));

  LO hint = 0; // Guess for offset of current column index in row
  LO numValid = 0; // number of valid local column indices

  for (LO i = 0; i < numLclColInds; ++i) {
    const LO relOffset = 
      KokkosSparse::findRelOffset (rowLclColInds, numEntInRow, lclColInds[i], hint, rowIsSorted);
    relOffsets[i] = static_cast<RelOffsetType> (relOffset);
    // If relOffset == numEntInRow, then the column index was not found in the row.
    // Compare to iterators returning end() in the C++ Standard Library.
    if (relOffset != numEntInRow) {
      hint = relOffset + 1; // optimize for the case where input == row
      ++numValid;
    }
  }
  return numValid;
}

You'll also need a function that does sumIntoValues / replaceValues, using the offsets rather than doing row search. I'll write that in the next comment. Note that Tpetra::BlockCrsMatrix already has functions like this, that use existing offsets to do sumInto / replace.
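
To make the idea concrete without Kokkos, here is a minimal sketch on plain arrays. Both functions are hypothetical stand-ins: `findRelOffsetSketch` only mimics the behavior described above (return the position in the row, or the row length if the column is absent), and `sumIntoRowWithOffsets` is one possible shape for a sumInto that uses precomputed offsets instead of searching the row.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a hinted search: returns the position of colInd
// within the row's column indices, or numEnt if it is not in the row.
int findRelOffsetSketch(const int* rowColInds, int numEnt, int colInd,
                        int hint, bool rowIsSorted) {
  assert(numEnt >= 0);
  // Fast path: the hint already points at the requested column.
  if (hint >= 0 && hint < numEnt && rowColInds[hint] == colInd) {
    return hint;
  }
  if (rowIsSorted) {  // binary search
    const int* end = rowColInds + numEnt;
    const int* it = std::lower_bound(rowColInds, end, colInd);
    return (it != end && *it == colInd) ? static_cast<int>(it - rowColInds)
                                        : numEnt;
  }
  for (int k = 0; k < numEnt; ++k) {  // linear search
    if (rowColInds[k] == colInd) return k;
  }
  return numEnt;
}

// Adds vals[i] into rowVals[relOffsets[i]] for every valid offset and
// returns the number of entries actually modified. No row search is done.
int sumIntoRowWithOffsets(double* rowVals, int numEnt, const int* relOffsets,
                          const double* vals, int numVals) {
  int numValid = 0;
  for (int i = 0; i < numVals; ++i) {
    const int k = relOffsets[i];
    if (k >= 0 && k < numEnt) {  // k == numEnt marks "not found"
      rowVals[k] += vals[i];
      ++numValid;
    }
  }
  return numValid;
}
```

The point of the split is that the offsets are computed once per sparsity pattern and then reused for every subsequent assembly into the same row.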

Some KokkosBlas kernels assume default execution space

The implementation of KokkosBlas::dot incorrectly assumes the default execution space, by using expressions like Kokkos::parallel_reduce(numRows, op);, instead of an explicit Kokkos::RangePolicy<execution_space, ...>. Check other KokkosBlas kernels as well.

Graph coloring build warnings

Reported by Stefan Domino.

nalu/src/LinearSolver.C
In file included from nalu/src/LinearSolver.C:9:
In file included from nalu/include/LinearSolver.h:25:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Factory.hpp:1:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Factory_decl.hpp:48:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_Factory.hpp:2:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_Factory_def.hpp:46:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_OneLevelFactory.hpp:2:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_OneLevelFactory_def.hpp:51:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Relaxation.hpp:2:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Relaxation_def.hpp:54:
In file included from TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GaussSeidel.hpp:47:
In file included from TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GaussSeidel_impl.hpp:44:
In file included from TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GraphColor.hpp:47:
TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GraphColor_impl.hpp:1296:27: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
      while ((forbidden[c]==i)) c++;
              ~~~~~~~~~~~~^~~
TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GraphColor_impl.hpp:1020:13: note: in instantiation of member function 'KokkosKernels::Experimental::Graph::Impl::GraphColor_VB<KokkosKernels::Experimental::Graph::GraphColoringHandle<Kokkos::View<const unsigned long *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0> >, Kokkos::View<int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::Serial, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<const unsigned long *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0> >, Kokkos::View<const int *, Kokkos::LayoutLeft, Kokkos::Device<Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<0> > >::resolveConflicts' requested here
      this->resolveConflicts(
            ^

Unsymmetric Graph Coloring

The graph coloring problem is normally defined on structurally symmetric graphs. The current kokkos-kernels implementation assumes the graph is symmetric; if it is not, a preprocessing step is required to symmetrize the graph. This symmetrization step can be significantly expensive.

Instead, the plan is to implement a distance-1 graph coloring that will also work on unsymmetric graphs.
The development can be tracked in the branch:
https://github.com/mndevec/kokkos-kernels/tree/develop_unsymmetric_coloring

Pthread ETI error

I get the error below when I enable Kokkos::Pthread.

In file included from /home/mndevec/work/trilinoses/Trilinos/packages/kokkos-kernels/src/impl/Kokkos_Blas1_MV_impl_abs.hpp(339),
from /home/mndevec/work/trilinoses/Trilinos/packages/kokkos-kernels/src/impl/generated_specializations_cpp/abs/KokkosBlas1_impl_MV_abs_inst_specialization_Kokkos_complex_double__LayoutLeft_Cuda_CudaSpace.cpp(45):
/home/mndevec/work/trilinoses/Trilinos/packages/kokkos-kernels/src/impl/generated_specializations_hpp/KokkosBlas1_impl_MV_abs_decl_specializations.hpp(67): error: namespace "Kokkos" has no member "Pthread"
KOKKOSBLAS1_IMPL_MV_ABS_DECL(double, Kokkos::LayoutLeft, Kokkos::Pthread, Kokkos::HostSpace)
^
....

Kokkos::ArithTraits<double>::nan() is very slow

I just got a report from Albany users (@lxmota and @calleman21). They switched from Teuchos::ScalarTraits<double>::nan() to Kokkos::ArithTraits<double>::nan() which slowed down their entire application by over 3X because all variables are initialized to NaN using this function.

KokkosKernels uses strtod() to implement this function (on the host), while Teuchos returns a global variable which is initialized to 0.0/0.0. @lxmota also recommended that we simply call std::numeric_limits<double>::quiet_NaN().

All the above also applies to float.

I think we should switch to either what Teuchos does or the quiet_NaN() from the standard library.
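
For reference, here is a minimal sketch of the two host-side approaches being compared. `nanViaStrtod` only imitates the strtod() strategy described above (it is not the actual ArithTraits code), and `quietNaN` is a hypothetical helper name wrapping the standard library call.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <limits>

// Parses a string at run time on every call, in the spirit of the
// strtod() approach described above.
inline double nanViaStrtod() {
  const double value = std::strtod("NAN", nullptr);
  assert(std::isnan(value));  // sanity check that the C library parsed "NAN"
  return value;
}

// Standard library alternative: no string parsing involved.
template <class T>
T quietNaN() {
  static_assert(std::numeric_limits<T>::has_quiet_NaN,
                "T must have a quiet NaN");
  return std::numeric_limits<T>::quiet_NaN();
}
```

Both return a NaN; the difference is only in how much work each call does, which matters when every variable in an application is initialized this way.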

@crtrott @mhoemmen any thoughts?

I'll pick one of these and submit a PR soon.

KokkosKernels Trilinos Integration Threads Compilation Failure

@crtrott @srajama1
I am having some issues when I enable Threads in KokkosKernels.

With standalone make:

  • if I enable serial and pthreads, it works fine.
  • if I enable openmp, serial, and pthreads, the unit tests fail, because Kokkos::initialize initializes OpenMP first, and then the initialization of pthreads fails. I don't know if that is a case we should support, and I do not know how to fix this issue.

Within Trilinos cmake,

  • If I enable pthreads, the compilation fails because of undefined reference errors.

KokkosKernels Unit Test Teuchos Dependency

I removed most of the Teuchos dependencies that are related to UnitTestHarness, and replaced them with gtest.

However, MV unit tests depend on Teuchos MPI. Are they supposed to move to Tpetra? They are the last bits before removing the Teuchos dependency.

Is this warning ok?

/ascldap/users/kyukim/Work/lib/kokkoskernels/master/src/Kokkos_ArithTraits.hpp(183): warning: pointless comparison of unsigned integer with zero                                                                                        
          detected during instantiation of "IntType <unnamed>::intPowSigned(IntType, IntType) [with IntType=char]"  
(1528): here                                                                                                        

/ascldap/users/kyukim/Work/lib/kokkoskernels/master/src/Kokkos_ArithTraits.hpp(187): warning: pointless comparison of unsigned integer with a negative constant                                                                         
          detected during instantiation of "IntType <unnamed>::intPowSigned(IntType, IntType) [with IntType=char]"  
(1528): here                                                                                                        
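
Warnings like these typically show up when a sign check written for signed integers is instantiated with an unsigned type; plain char is unsigned on some platforms. One common way to avoid the comparison altogether is to dispatch on std::is_signed. The sketch below shows that pattern with a hypothetical `isNegative` helper; it is not taken from Kokkos_ArithTraits.hpp.

```cpp
#include <type_traits>

namespace sketch {

// Signed types: the comparison is meaningful.
template <class IntType>
constexpr bool isNegativeImpl(IntType x, std::true_type /* is signed */) {
  return x < 0;
}

// Unsigned types: never negative, and no comparison with zero is emitted.
template <class IntType>
constexpr bool isNegativeImpl(IntType, std::false_type /* is unsigned */) {
  return false;
}

// Picks one of the two overloads at compile time.
template <class IntType>
constexpr bool isNegative(IntType x) {
  return isNegativeImpl(x, std::is_signed<IntType>());
}

}  // namespace sketch
```

With this, `sketch::isNegative(x)` compiles for any integer type, and the unsigned instantiation never contains the `x < 0` expression that triggers the warning.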

ETI System and file structure

Ok, I think I am finally close to making this ETI stuff work properly. There is some funky compiler stuff with regard to using extern template instantiations for classes, in particular if you want to allow instantiations of other types, but I believe my solution is now foolproof ......

Furthermore I believe the file structure and naming etc needs some cleanup. In particular this focus on MultiVector which historically comes from Tpetra is confusing for standalone users.

Let's start with some requirements for what we need to be able to do:

  • pre-compile functions, and prevent them from being implicitly instantiated (ETI)
  • Even with ETI on, allow other input types (say for example extended precision, or nonstandard data layouts)
  • call TPLs (MKL, CUBLAS etc.) for input types which allow it
  • disallow anything other than ETI types if requested
  • check what type of instantiation gets hit in apps (ETI, Non-ETI, TPL)

In order to do all this we came up with a design which has 3 functionality layers (I will go into details later):

  1. User Interface: void foo(ViewType a, Scalar alpha): takes views and accepts all kinds of combinations; calls the specialization layer
  2. Specialization Layer: struct Foo { static void foo(ViewInternalType a, Scalar alpha); }; makes sure that only the minimally necessary number of instantiations exists, serves as ETI specialization layer, serves as TPL specialization layer
  3. Implementation Layer: This is called by the specialization layer, and has the actual functors etc.
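
The three layers can be sketched roughly as follows, using std::vector instead of Kokkos views so that the example stands alone. All names here (`sketch::scal`, `Impl::Scal`, `Impl::scal_impl`, `Path`) are hypothetical and only illustrate the layering; the full specialization for double stands in for a prebuilt (ETI) or TPL-backed case.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Lets a caller check which kind of instantiation was hit.
enum class Path { ETI, NonETI };

namespace Impl {

// 3. Implementation layer: the actual loop (a functor in real Kokkos code).
template <class Scalar>
void scal_impl(Scalar* a, std::size_t n, Scalar alpha) {
  assert(a != nullptr || n == 0);
  for (std::size_t i = 0; i < n; ++i) {
    a[i] *= alpha;
  }
}

// 2. Specialization layer: the generic version accepts any Scalar.
template <class Scalar>
struct Scal {
  static Path path() { return Path::NonETI; }
  static void scal(Scalar* a, std::size_t n, Scalar alpha) {
    scal_impl(a, n, alpha);
  }
};

// Full specialization standing in for a prebuilt (ETI) combination.
// A TPL hook would replace the body with a library call.
template <>
struct Scal<double> {
  static Path path() { return Path::ETI; }
  static void scal(double* a, std::size_t n, double alpha) {
    scal_impl(a, n, alpha);
  }
};

}  // namespace Impl

// 1. User interface: accepts any combination and forwards a minimal
// set of types (pointer plus length) to the specialization layer.
template <class Scalar>
void scal(std::vector<Scalar>& a, Scalar alpha) {
  Impl::Scal<Scalar>::scal(a.data(), a.size(), alpha);
}

}  // namespace sketch
```

Disallowing non-ETI types on request would then amount to making the generic `Scal` fail to compile (for example with a static_assert) while leaving the specializations untouched.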

Now I want to go through a couple of design aspects in the next posts.

Fix unused variable warnings in spmv_impl_omp, spmv Test and graph color perf_test

/ascldap/users/crtrott/Kokkos/kokkos-kernels/src/sparse/impl/KokkosSparse_spmv_impl_omp.hpp:54:22: warning: unused variable 'rowCount' [-Wunused-variable]
/ascldap/users/crtrott/Kokkos/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:54:9: warning: unused variable 'nc' [-Wunused-variable]
/ascldap/users/crtrott/Kokkos/kokkos-kernels/perf_test/graph/KokkosGraph_color.cpp:178:15: warning: unused variable 'm' [-Wunused-variable]

These warnings were reported by the XL compiler.

spgemm: optimize for zero entries

I've been running some experiments in MueLu, and noticed that matrix-matrix multiplication time does not change when we use a filtered matrix with the OpenMP node.

My assumption is that the currently implemented spgemm in kokkos-kernels does not have any shortcuts or optimization when the left matrix may have multiple zero elements. In the serial Tpetra version, the code checks for zeros in A and then skips fetching corresponding rows of B, thus significantly improving performance.

My question is: would something like that be possible in the threaded spgemm? This could significantly help with some applications that use multigrid, particularly Nalu when used with high geometric anisotropy.
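
To illustrate the shortcut described above, here is a minimal serial sketch of a row-wise sparse matrix-matrix product that skips explicitly stored zeros in A. It is not the Tpetra or kokkos-kernels code; `CsrSketch` and `spgemmSkipZeros` are hypothetical names, and a std::map is used as the row accumulator purely for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Minimal CSR storage, for illustration only.
struct CsrSketch {
  int numRows;
  int numCols;
  std::vector<int> rowPtr;
  std::vector<int> colInd;
  std::vector<double> val;
};

// Computes C = A * B row by row. Stored zeros in A are skipped, so the
// corresponding rows of B are never fetched.
CsrSketch spgemmSkipZeros(const CsrSketch& A, const CsrSketch& B) {
  assert(A.numCols == B.numRows);
  CsrSketch C;
  C.numRows = A.numRows;
  C.numCols = B.numCols;
  C.rowPtr.push_back(0);
  for (int i = 0; i < A.numRows; ++i) {
    std::map<int, double> acc;  // column index -> accumulated value
    for (int ka = A.rowPtr[i]; ka < A.rowPtr[i + 1]; ++ka) {
      const double a = A.val[ka];
      if (a == 0.0) continue;  // the shortcut: skip this row of B
      const int k = A.colInd[ka];
      for (int kb = B.rowPtr[k]; kb < B.rowPtr[k + 1]; ++kb) {
        acc[B.colInd[kb]] += a * B.val[kb];
      }
    }
    for (const auto& entry : acc) {
      C.colInd.push_back(entry.first);
      C.val.push_back(entry.second);
    }
    C.rowPtr.push_back(static_cast<int>(C.colInd.size()));
  }
  return C;
}
```

Note that skipping stored zeros makes the sparsity pattern of C depend on the values of A, not only on its structure, which may matter for any algorithm that separates a symbolic phase from a numeric phase.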

Move Tpetra::Details::OrdinalTraits to KokkosKernels

Story: #1

KokkosSparse::Impl::getDiagCopyWithOffsets depends on it, but it lives in Tpetra. It's only a coincidence that the function builds correctly (probably because Tpetra is currently the only consumer of this file, and Tpetra must include the header file with OrdinalTraits before including the header file defining that function).

Timeline for team level dense linear algebra

@kyungjoo-kim, is it possible to get a timeline on Kokkos-Kernels team level dense linear algebra?

It's not critical at the moment, but it would be useful to have an estimate on when that capability will be available. We can discuss offline further if need be.

KokkosBlas: Add GEMV that wraps BLAS (and cuBLAS)

Blocks: trilinos/Trilinos#1169

KokkosBlas::gemv currently exists, but it does not currently call the BLAS library or cuBLAS where appropriate.

There is a subtlety in whether Tpetra uses this GEMV as a matrix-vector product, or as the dot product of each column of a MultiVector with a single Vector. The difference is that matrix-vector products do all computations with Scalar values (actually Tpetra::MultiVector::impl_scalar_type), while the intermediate sums in a dot product have type Tpetra::MultiVector::dot_type. The two types are usually the same, except for Scalar types that come from Stokhos. We have to decide whether we want GEMV to support both cases, or just one of them.

In practice, BLAS implementations only support types for which impl_scalar_type == dot_type. Thus, this is really about the interface that we present to users.

Split Tests spgemm and gauss_seidel into multiple object files.

XL 14 output during the build. Yes, we are looking at over 45 minutes of compile time, with memory footprints of about 7 GB and 3 GB respectively:

 27473 crtrott   20   0 7897280 7.436g  58560 R  84.0  1.5  44:58.57 /home/projects/pwr8-rhel73-lsf/ibm/xl/xlC/14.1.0/exe/ipa -comp -qalias=ansi -qthreaded -qtls -qtls -maltivec -qtls -qlanglvl=extended0x -qarch=pwr8 -qtune=pwr8 -qsmp=omp /tmp/xlcW0wbmEmP /tmp/xlcW13OIGeu Test_OpenMP_Sparse_spgemm.o /tmp/xlcLj2CORsT.lst /tmp/xlcW25m6I68                                                                                               
 27312 crtrott   20   0 3903744 3.159g  58624 R  80.0  0.6  45:19.68 /home/projects/pwr8-rhel73-lsf/ibm/xl/xlC/14.1.0/exe/ipa -comp -qalias=ansi -qthreaded -qtls -qtls -maltivec -qtls -qlanglvl=extended0x -qarch=pwr8 -qtune=pwr8 -qsmp=omp /tmp/xlcW02Rd4eQ /tmp/xlcW1XvNf7u Test_OpenMP_Sparse_gauss_seidel.o /tmp/xlcLOE0GKtT.lst /tmp/xlcW26HmrZ9  

header files name changes impact on MueLu

I noticed that the latest snapshot of kokkos in Trilinos included a large renaming of files which triggered some build errors in MueLu.
I talked with @mndevec and it seems that the integration test done before the snapshot push did not catch MueLu's dependencies on kokkos.
To have MueLu use Kokkos you need to add the following flags in your configure script:

-D MueLu_ENABLE_Experimental=ON \
-D MueLu_ENABLE_Kokkos_Refactor=ON \
-D Xpetra_ENABLE_Experimental=ON \
-D Xpetra_ENABLE_Kokkos_Refactor=ON \

As far as I can tell, the only issues are with the renaming of Kokkos_CrsMatrix.hpp to KokkosSparse_CrsMatrix.hpp.
@jhux2 @tawiesn @csiefer2 do you have any comments/additions to make?

Trilinos/KokkosKernels reports no ETI in almost any circumstance

I looked through a few of the dashboard tests and they all report KokkosKernels as not ETI-ing anything.

e.g.:


Processing ETI support: KokkosKernels
-- KokkosKernels: Processing ETI / test support
-- Enabled Scalar types:       
-- Enabled LocalOrdinal types: 
-- Enabled Device types:       
-- Set of enabled types, before exclusions: 

Is this to be expected?

UnitTest Compile and Runtime

Are the long compile and runtime for SPGEMM and GaussSeidel really necessary?

For OpenMP on my workstation, SPGEMM takes 200s of test time out of 340s for the whole library. If I add GaussSeidel, it is 296s out of 340s. Do we really need that for correctness checking?

Furthermore, the compile times are also pretty high: in a non-parallel build (i.e. -j 1), SPGEMM takes 78s and GaussSeidel 54s out of a total of 296s for all unit tests together.

BLAS/LAPACK calls people are interested in

This is just to collect stuff. I will update the first post if more comes in. This is not a promise of what is going to be there when; it's just to help us plan. I differentiate global, team, and thread kernels.

BLAS

Global:

  • AXPY
  • SCAL
  • GEMM
  • DOT

Team:

  • GEMM
  • GEMV
  • AXPY
  • DOT

Thread:

  • GEMM
  • GEMV
  • AXPY
  • DOT

LAPACK

Global:

  • SYEV
  • HEEVR

Team:

  • GETRF
  • GETRS
  • GETF2
  • GETRI

Thread:

  • GETRF
  • GETRS
