kokkos / kokkos-kernels
Kokkos C++ Performance Portability Programming Ecosystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels
License: Other
On Power, char is unsigned, while on most other platforms it is signed. This causes the abs function to emit a warning about the comparison x >= 0. I've got a fix in the works.
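One way to avoid this kind of warning is to pick the implementation at compile time based on signedness, so that the `x >= 0` comparison is never instantiated for an unsigned type. This is only a sketch of the idea, not necessarily the fix that was applied; `portable_abs` and `abs_impl` are hypothetical names.

```cpp
#include <cassert>
#include <type_traits>

namespace sketch {

// Signed types: comparing against zero is meaningful.
template <class T>
constexpr T abs_impl(T x, std::true_type /* is_signed */) {
  return x >= 0 ? x : static_cast<T>(-x);
}

// Unsigned types (including char where it is unsigned): the value is
// already non-negative, so no comparison against zero is generated.
template <class T>
constexpr T abs_impl(T x, std::false_type /* is_signed */) {
  return x;
}

// Dispatch on std::is_signed<T>, which derives from true_type or false_type.
template <class T>
constexpr T portable_abs(T x) {
  return abs_impl(x, std::is_signed<T>{});
}

}  // namespace sketch
```

Because the dispatch happens through overload resolution, the same call compiles cleanly whether `char` is signed or unsigned on the target platform.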
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:111:62: warning: unused typedef 'device3' [-Wunused-local-typedef]
typedef typename in_nonzero_value_view_type::device_type device3;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:99:36: warning: unused typedef 'idx_array_type' [-Wunused-local-typedef]
typedef in_row_index_view_type idx_array_type;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:110:62: warning: unused typedef 'device2' [-Wunused-local-typedef]
typedef typename in_nonzero_index_view_type::device_type device2;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl_impl.hpp:109:58: warning: unused typedef 'device1' [-Wunused-local-typedef]
typedef typename in_row_index_view_type::device_type device1;
^
In file included from .../CHECKIN-CLANG-3.9.0/MPI_DEBUG_REAL/packages/tpetra/core/src/Tpetra_Details_packCrsMatrix_DOUBLE_INT_LONG_LONG_SERIAL.cpp:71:
In file included from .../Trilinos/packages/tpetra/core/src/Tpetra_Details_packCrsMatrix_def.hpp:52:
In file included from .../Trilinos/packages/tpetra/core/src/Tpetra_CrsMatrix_decl.hpp:64:
In file included from .../Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse.hpp:60:
In file included from .../Trilinos/packages/kokkos-kernels/src/sparse/KokkosSparse_spgemm.hpp:51:
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:97:56: warning: unused typedef 'device1' [-Wunused-local-typedef]
typedef typename in_row_index_view_type::device_type device1;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:87:44: warning: unused typedef 'size_type' [-Wunused-local-typedef]
typedef typename KernelHandle::size_type size_type;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:88:34: warning: unused typedef 'idx_array_type' [-Wunused-local-typedef]
typedef in_row_index_view_type idx_array_type;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:98:60: warning: unused typedef 'device2' [-Wunused-local-typedef]
typedef typename in_nonzero_index_view_type::device_type device2;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:236:62: warning: unused typedef 'device2' [-Wunused-local-typedef]
typedef typename in_nonzero_index_view_type::device_type device2;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:222:46: warning: unused typedef 'size_type' [-Wunused-local-typedef]
typedef typename KernelHandle::size_type size_type;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:235:58: warning: unused typedef 'device1' [-Wunused-local-typedef]
typedef typename in_row_index_view_type::device_type device1;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:223:36: warning: unused typedef 'idx_array_type' [-Wunused-local-typedef]
typedef in_row_index_view_type idx_array_type;
^
.../Trilinos/packages/kokkos-kernels/src/sparse/impl/KokkosSparse_spgemm_mkl2phase_impl.hpp:237:62: warning: unused typedef 'device3' [-Wunused-local-typedef]
typedef typename in_nonzero_value_view_type::device_type device3;
Add a run-time branch in KokkosBlas::dot for 2-D Views where exactly one of the Views has a single column. We need this for trilinos/Trilinos#1013 .
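To illustrate what such a run-time branch could look like, here is a plain C++ sketch without Kokkos. It assumes the intended semantics are that the single-column operand is reused against every column of the other operand; `MultiVec` and `dot_columns` are hypothetical stand-ins, not KokkosBlas types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a 2-D View with column-major storage.
struct MultiVec {
  std::size_t numRows;
  std::size_t numCols;
  std::vector<double> data;  // entry (i, j) is stored at data[i + j * numRows]
  double operator()(std::size_t i, std::size_t j) const {
    return data[i + j * numRows];
  }
};

// result[j] = dot(X(:, jx), Y(:, jy)). If one operand has a single column,
// that column is paired with every column of the other operand.
std::vector<double> dot_columns(const MultiVec& X, const MultiVec& Y) {
  assert(X.numRows == Y.numRows);
  // Either the column counts match, or one of the operands has one column.
  assert(X.numCols == Y.numCols || X.numCols == 1 || Y.numCols == 1);
  const std::size_t numOut = X.numCols > Y.numCols ? X.numCols : Y.numCols;
  std::vector<double> result(numOut, 0.0);
  for (std::size_t j = 0; j < numOut; ++j) {
    // Run-time branch: pin the column index to 0 for a single-column operand.
    const std::size_t jx = (X.numCols == 1) ? 0 : j;
    const std::size_t jy = (Y.numCols == 1) ? 0 : j;
    double sum = 0.0;
    for (std::size_t i = 0; i < X.numRows; ++i) {
      sum += X(i, jx) * Y(i, jy);
    }
    result[j] = sum;
  }
  return result;
}
```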
Function Name | Parameters/Scenarios | Status |
---|---|---|
spgemm_numeric spgemm_symbolic | Backends:All - Algorithms:kkmem, kkdense, mkl, cusparse - AxA for 4 matrices | Done |
graph_color_symbolic | Backends:All | Done |
gauss_seidel_symbolic gauss_seidel_numeric symmetric_gauss_seidel_apply forward_sweep_gauss_seidel_apply backward_sweep_gauss_seidel_apply | Backends:All | Done |
Please add/remove fields and tests.
@srajama1 @crtrott @ambrad @mhoemmen @kyungjoo-kim @nmhamster
Story: #1
KokkosKernels currently uses TriBITS' ETI (explicit template instantiation) CMake functions and generated macros. We plan to get rid of those altogether, in favor of a new ETI system for KokkosKernels.
This is broken right now, due to a mismatch of Macro names.
This is the issue to track progress for BLAS 1 feature completeness.
Function | Operation Performed | BLAS Name | KK impl done | TPL hooks | Supports MV |
---|---|---|---|---|---|
abs | Y(i) = abs(X(i)) | -- | X | X | |
axpby | Y(i) = a*X(i) + b*Y(i) | -- | X | X | |
axpy | Y(i) += a*X(i) | axpy | X | X | X |
dot | Sum( X(i)*Y(i) ) | dot | X | X | X |
iamax | MaxIndex(abs(X(i))) | iamax | X | | |
mult | Z(i)=a*A(i)*X(i)+c*Z(i) | -- | X | X | |
nrm1 | Sum( abs(X(i)) ) | asum | X | X | X |
nrm2 | sqrt(Sum(abs(X(i))*abs(X(i)))) | nrm2 | X | X | X |
nrm2_squared | Sum ( abs(X(i))*abs(X(i)) ) | -- | X | X | |
nrminf | Max(abs(X(i))) | -- | X | X | |
reciprocal | Y(i) = 1/X(i) | -- | X | X | |
rot | | rot | X | X | |
rotm | | rotm | X | X | |
rotg | | rotg | X | X | |
rotmg | | rotmg | X | X | |
scal | Y(i) = a*X(i) | scal* | X | X | X |
sum | Sum( X(i) ) | -- | X | X | |
update | Z(i)=a*X(i)+b*Y(i)+c*Z(i) | -- | X | X | |
Story: #1
ETI stands for "explicit template instantiation." We weren't and aren't strictly using ETI. Instead, we "prebuild" kernels for a small set of template parameter combinations, determined at configure time. We also (will) have an option to disable use of template parameter combinations outside that set. This will help both developers and users reduce build times and library sizes, by determining whether they are unexpectedly using combinations outside the set of prebuilt combinations.
We used to use macros to generate prebuilt combinations. We're switching to the C++11 extern template approach, which obviates the need for definition macros that duplicate code in the templated definitions.
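For readers unfamiliar with the mechanism, here is a minimal single-file sketch of the C++11 extern template pattern. `DotKernel` is a hypothetical kernel, not an actual KokkosKernels class, and in a real build the three parts marked below would live in a header and a separate .cpp file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// --- Would live in a header: the generic, templated definition. ---
template <class Scalar>
struct DotKernel {
  static Scalar run(const std::vector<Scalar>& x, const std::vector<Scalar>& y) {
    assert(x.size() == y.size());
    Scalar sum = Scalar();
    for (std::size_t i = 0; i < x.size(); ++i) {
      sum += x[i] * y[i];
    }
    return sum;
  }
};

// --- Also in the header: explicit instantiation declarations. ---
// These ask each including translation unit to rely on an instantiation
// provided elsewhere for the "prebuilt" combinations.
extern template struct DotKernel<double>;
extern template struct DotKernel<float>;

// --- Would live in exactly one .cpp file: explicit instantiation definitions. ---
// The templated definition above is reused as-is; no macro repeats its body.
template struct DotKernel<double>;
template struct DotKernel<float>;

// Any other Scalar (for example int) is still instantiated implicitly at the
// point of use, unless the build is configured to reject such combinations.
```

The set of `extern template` / `template` line pairs is what a configure step would generate from the enabled type combinations.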
The default solution needs to respect trilinos/Trilinos#362 and thus, it needs to respect the following CMake variables:
Trilinos_ENABLE_FLOAT
Trilinos_ENABLE_COMPLEX_DOUBLE
Trilinos_ENABLE_COMPLEX_FLOAT
Trilinos_ENABLE_COMPLEX
, though no Trilinos package currently uses that)
New macro names:
The variables in the above macro names are upper-case and mangled-for-macro-use versions of the original type names. Here is a CMake rule for converting a type name into a name suitable for use either as a typedef used in a macro argument (macros don't like spaces, commas, etc.), or as part of a macro name (if made upper case).
FUNCTION(TPETRA_MANGLE_TEMPLATE_PARAMETER TYPE_MANGLED_OUT TYPE_IN)
STRING(REPLACE "<" "0" TMP0 "${TYPE_IN}")
STRING(REPLACE ">" "0" TMP1 "${TMP0}")
STRING(REPLACE "::" "_" TMP2 "${TMP1}")
# Spaces (as in "long long") get squished out.
STRING(REPLACE " " "" TMP3 "${TMP2}")
SET(${TYPE_MANGLED_OUT} ${TMP3} PARENT_SCOPE)
ENDFUNCTION(TPETRA_MANGLE_TEMPLATE_PARAMETER)
Summary of that rule:
< turns into 0
> turns into 0
:: turns into _
(space) turns into the empty string
Here is a more Tpetra-specific CMake rule:
# Function that turns a valid Scalar, LocalOrdinal, or GlobalOrdinal
# template parameter into a macro name (all caps, with no white space
# and no punctuation other than underscore).
#
# NAME_OUT [out] The mangled type name.
#
# NAME_IN [in] The type to mangle.
FUNCTION(TPETRA_SLG_MACRO_NAME NAME_OUT NAME_IN)
STRING(COMPARE EQUAL "${NAME_IN}" "__float128" IS_FLOAT128)
IF(IS_FLOAT128)
# __float128 is a special case; we remove the __ from the macro name.
SET(${NAME_OUT} "FLOAT128" PARENT_SCOPE)
ELSE()
STRING(COMPARE EQUAL "${NAME_IN}" "std::complex<float>" IS_COMPLEX_FLOAT)
IF(IS_COMPLEX_FLOAT)
SET(${NAME_OUT} "COMPLEX_FLOAT" PARENT_SCOPE)
ELSE()
STRING(COMPARE EQUAL "${NAME_IN}" "std::complex<double>" IS_COMPLEX_DOUBLE)
IF(IS_COMPLEX_DOUBLE)
SET(${NAME_OUT} "COMPLEX_DOUBLE" PARENT_SCOPE)
ELSE()
# Make upper-case version of ${NAME_IN}.
STRING(TOUPPER "${NAME_IN}" TMP0)
# Use the generic algorithm for mangling the type name.
TPETRA_MANGLE_TEMPLATE_PARAMETER(TMP1 "${TMP0}")
SET(${NAME_OUT} ${TMP1} PARENT_SCOPE)
ENDIF()
ENDIF()
ENDIF()
ENDFUNCTION(TPETRA_SLG_MACRO_NAME)
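To make the two CMake rules above concrete, here is a C++ transliteration that can be run on sample type names. The function names are hypothetical; only the replacement rules and the three special cases are taken from the CMake code.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Replace every occurrence of `from` in `s` with `to`,
// analogous to CMake's STRING(REPLACE ...).
std::string replace_all(std::string s, const std::string& from, const std::string& to) {
  assert(!from.empty());
  std::size_t pos = 0;
  while ((pos = s.find(from, pos)) != std::string::npos) {
    s.replace(pos, from.size(), to);
    pos += to.size();
  }
  return s;
}

// Mirrors TPETRA_MANGLE_TEMPLATE_PARAMETER: "<" and ">" become "0",
// "::" becomes "_", and spaces are removed.
std::string mangle_template_parameter(const std::string& type) {
  std::string s = replace_all(type, "<", "0");
  s = replace_all(s, ">", "0");
  s = replace_all(s, "::", "_");
  return replace_all(s, " ", "");
}

// Mirrors TPETRA_SLG_MACRO_NAME: three special cases, otherwise
// upper-case the name and apply the generic mangling.
std::string slg_macro_name(const std::string& type) {
  if (type == "__float128") return "FLOAT128";
  if (type == "std::complex<float>") return "COMPLEX_FLOAT";
  if (type == "std::complex<double>") return "COMPLEX_DOUBLE";
  std::string upper = type;
  for (char& c : upper) {
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  }
  return mangle_template_parameter(upper);
}
```

For example, the generic rule maps `std::complex<double>` to `std_complex0double0`, which is why the macro-name rule special-cases the complex types to get the shorter `COMPLEX_DOUBLE`.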
Rename files and NameSpace to KokkosKernels
The files start with Kokkos now. Namespace is Kokkos as well.
According to @crtrott , KokkosKernels is not allowed to assume UVM.
This issue is for the kokkoskernels integration into trilinos.
After commits 77a2fde and f9b1559, the CMake files should be ready to configure and compile "only" kokkoskernels in Trilinos.
The next step is to propagate the namespace and filename changes to the rest of Trilinos that uses kokkoskernels directly. To my knowledge, ifpack2 and tpetra use kokkoskernels directly. Please extend this list if there are more packages.
For changes to be done in Trilinos, I created the branch:
https://github.com/mndevec/Trilinos/tree/kk_integration
Or any TPL. Alternately, get rid of KokkosSparse::trsv. The current implementation is sequential and implicitly assumes UVM.
The following are the environment variables set and arguments used for test_all_sandia to run the jenkins jobs:
Apollo:
Bowman:
export OMP_NUM_THREADS=256
export OMP_PROC_BIND=close
export OMP_PLACES=threads
White:
export OMP_NUM_THREADS=64
export OMP_PROC_BIND=close
export OMP_PLACES=threads
KokkosKernels doesn't have a unit test for KokkosBlas::dot. We have been relying on Tpetra testing this and other KokkosBlas functionality, but since KokkosKernels is now stand-alone, it must have its own unit tests now. I'm working on this, because I'll need it for #13.
Here is a first-pass implementation of such a function.
template<class KokkosSparseMatrix,
         class RelOffsetType = typename KokkosSparseMatrix::ordinal_type>
KOKKOS_FUNCTION
typename KokkosSparseMatrix::ordinal_type
getCrsMatrixRowOffsets (RelOffsetType relOffsets[],
                        const KokkosSparseMatrix& A,
                        const typename KokkosSparseMatrix::ordinal_type lclRowInd,
                        const typename KokkosSparseMatrix::ordinal_type lclColInds[],
                        const typename KokkosSparseMatrix::ordinal_type numLclColInds,
                        const bool rowIsSorted = false,
                        const bool /* inputIsSorted */ = false)
{
  typedef typename KokkosSparseMatrix::ordinal_type LO;
  auto A_rowView = A.row (lclRowInd);
  const LO numEntInRow = A_rowView.length;
  // Avoid taking the address of a nonexistent entry if the row is empty.
  const LO* const rowLclColInds = numEntInRow == 0 ? nullptr : &(A_rowView.colidx(0));
  LO hint = 0; // Guess for offset of current column index in row
  LO numValid = 0; // number of valid local column indices
  for (LO i = 0; i < numLclColInds; ++i) {
    const LO relOffset =
      KokkosSparse::findRelOffset (rowLclColInds, numEntInRow, lclColInds[i], hint, rowIsSorted);
    relOffsets[i] = static_cast<RelOffsetType> (relOffset);
    // If relOffset == numEntInRow, then the column index was not found in the row.
    // Compare to iterators returning end() in the C++ Standard Library.
    if (relOffset != numEntInRow) {
      hint = relOffset + 1; // optimize for the case where input == row
      ++numValid;
    }
  }
  return numValid;
}
You'll also need a function that does sumIntoValues / replaceValues, using the offsets rather than doing row search. I'll write that in the next comment. Note that Tpetra::BlockCrsMatrix already has functions like this, that use existing offsets to do sumInto / replace.
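To show how the two pieces fit together, here is a self-contained plain C++ sketch of the "search once, then reuse the offsets" pattern. It does not use Kokkos; `Row`, `find_rel_offset`, `get_row_offsets`, and `sum_into_with_offsets` are hypothetical names, and the search is a simple linear scan rather than KokkosSparse::findRelOffset.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one row of a CRS matrix.
struct Row {
  std::vector<int> colidx;
  std::vector<double> values;
};

// Returns the position of `col` in the row, or the row length if absent.
// The hint is tried first, which is cheap when the input follows row order.
int find_rel_offset(const Row& row, int col, int hint) {
  const int n = static_cast<int>(row.colidx.size());
  if (hint >= 0 && hint < n && row.colidx[hint] == col) return hint;
  for (int k = 0; k < n; ++k) {
    if (row.colidx[k] == col) return k;
  }
  return n;
}

// Step 1: search once and record the offsets. Returns the number found.
int get_row_offsets(std::vector<int>& relOffsets, const Row& row,
                    const std::vector<int>& cols) {
  const int n = static_cast<int>(row.colidx.size());
  relOffsets.resize(cols.size());
  int hint = 0;
  int numValid = 0;
  for (std::size_t i = 0; i < cols.size(); ++i) {
    const int off = find_rel_offset(row, cols[i], hint);
    relOffsets[i] = off;
    if (off != n) {  // only count and advance the hint when found
      hint = off + 1;
      ++numValid;
    }
  }
  return numValid;
}

// Step 2: add values using the stored offsets, with no search at all.
// Offsets equal to the row length mark "not found" and are skipped.
int sum_into_with_offsets(Row& row, const std::vector<int>& relOffsets,
                          const std::vector<double>& vals) {
  assert(relOffsets.size() == vals.size());
  const int n = static_cast<int>(row.colidx.size());
  int numValid = 0;
  for (std::size_t i = 0; i < vals.size(); ++i) {
    const int off = relOffsets[i];
    if (off >= 0 && off < n) {
      row.values[off] += vals[i];
      ++numValid;
    }
  }
  return numValid;
}
```

The offsets only stay valid as long as the row's structure does not change, which is the usual assumption for a fill-complete graph.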
The implementation of KokkosBlas::dot incorrectly assumes the default execution space, by using expressions like Kokkos::parallel_reduce(numRows, op);, instead of an explicit Kokkos::RangePolicy<execution_space, ...>. Check other KokkosBlas kernels as well.
Need to change all the names etc.
Reported by Stefan Domino.
nalu/src/LinearSolver.C
In file included from nalu/src/LinearSolver.C:9:
In file included from nalu/include/LinearSolver.h:25:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Factory.hpp:1:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Factory_decl.hpp:48:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_Factory.hpp:2:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_Factory_def.hpp:46:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_OneLevelFactory.hpp:2:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Details_OneLevelFactory_def.hpp:51:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Relaxation.hpp:2:
In file included from TPLs_src/Trilinos_flat_headers/include/Ifpack2_Relaxation_def.hpp:54:
In file included from TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GaussSeidel.hpp:47:
In file included from TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GaussSeidel_impl.hpp:44:
In file included from TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GraphColor.hpp:47:
TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GraphColor_impl.hpp:1296:27: warning: equality comparison with extraneous parentheses [-Wparentheses-equality]
while ((forbidden[c]==i)) c++;
~~~~~~~~~~~~^~~
TPLs_src/Trilinos_flat_headers/include/KokkosKernels_GraphColor_impl.hpp:1020:13: note: in instantiation of member function 'KokkosKernels::Experimental::Graph::Impl::GraphColor_VB<<f>KokkosKernels::Experimental::Graph::GraphColoringHandle<<f>Kokkos::View<<f>const unsigned long *, Kokkos::LayoutLeft, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<<f>int *, Kokkos::LayoutLeft, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<<f>0> >, Kokkos::View<<f>int *, Kokkos::LayoutLeft, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::Serial, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace>, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace> >, Kokkos::View<<f>const unsigned long *, Kokkos::LayoutLeft, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<<f>0> >, Kokkos::View<<f>const int *, Kokkos::LayoutLeft, Kokkos::Device<<f>Kokkos::Serial, Kokkos::HostSpace>, Kokkos::MemoryTraits<<f>0> > >::resolveConflicts' requested here
this->resolveConflicts(
^
The graph coloring problem is normally defined on structurally symmetric graphs. The current kokkos-kernels implementation assumes the graph is symmetric; if it is not, a preprocessing step is required to symmetrize the graph. This symmetrization step can be significantly expensive.
Instead, the plan is to implement a distance-1 graph coloring that also works on unsymmetric graphs.
The development can be tracked in the branch:
https://github.com/mndevec/kokkos-kernels/tree/develop_unsymmetric_coloring
I get the error below when I enable Kokkos::Pthread.
In file included from /home/mndevec/work/trilinoses/Trilinos/packages/kokkos-kernels/src/impl/Kokkos_Blas1_MV_impl_abs.hpp(339),
from /home/mndevec/work/trilinoses/Trilinos/packages/kokkos-kernels/src/impl/generated_specializations_cpp/abs/KokkosBlas1_impl_MV_abs_inst_specialization_Kokkos_complex_double__LayoutLeft_Cuda_CudaSpace.cpp(45):
/home/mndevec/work/trilinoses/Trilinos/packages/kokkos-kernels/src/impl/generated_specializations_hpp/KokkosBlas1_impl_MV_abs_decl_specializations.hpp(67): error: namespace "Kokkos" has no member "Pthread"
KOKKOSBLAS1_IMPL_MV_ABS_DECL(double, Kokkos::LayoutLeft, Kokkos::Pthread, Kokkos::HostSpace)
^
....
I just got a report from Albany users (@lxmota and @calleman21). They switched from Teuchos::ScalarTraits<double>::nan() to Kokkos::ArithTraits<double>::nan(), which slowed down their entire application by over 3X because all variables are initialized to NaN using this function.
KokkosKernels uses strtod() to implement this function (on the host), while Teuchos returns a global variable that is initialized to 0.0/0.0. @lxmota also recommended that we simply call std::numeric_limits<double>::quiet_NaN().
All the above also applies to float.
I think we should switch to either what Teuchos does or the quiet_NaN() from the standard library.
@crtrott @mhoemmen any thoughts?
I'll pick one of these and submit a PR soon.
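For reference, the standard-library option would look roughly like this. `quiet_nan` is a hypothetical wrapper name, not the actual ArithTraits member, and this only covers the host side.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Returns a quiet NaN without any string parsing, so it is cheap enough
// to call for every variable initialization.
template <class T>
T quiet_nan() {
  static_assert(std::numeric_limits<T>::has_quiet_NaN,
                "T must have a quiet NaN representation");
  return std::numeric_limits<T>::quiet_NaN();
}
```

The same template covers both double and float, which matches the note above that the problem applies to both.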
A systematic new unit test framework shall be defined laying out how to add new unit tests.
@crtrott @srajama1
I am having some issues when I enable Threads in KokkosKernels.
With standalone make:
Within Trilinos cmake,
I believe in both cases the ++numValid; line needs to be within the if(offset != length) branch; right now it is executed unconditionally.
See discussion here:
I removed most of the Teuchos dependencies that are related to UnitTestHarness, and replaced them with gtest.
However, MV unit tests depend on Teuchos MPI. Are they supposed to move to Tpetra? They are the last bits before removing the Teuchos dependency.
/ascldap/users/kyukim/Work/lib/kokkoskernels/master/src/Kokkos_ArithTraits.hpp(183): warning: pointless comparison of unsigned integer with zero
detected during instantiation of "IntType <unnamed>::intPowSigned(IntType, IntType) [with IntType=char]"
(1528): here
/ascldap/users/kyukim/Work/lib/kokkoskernels/master/src/Kokkos_ArithTraits.hpp(187): warning: pointless comparison of unsigned integer with a negative constant
detected during instantiation of "IntType <unnamed>::intPowSigned(IntType, IntType) [with IntType=char]"
(1528): here
Currently the ETI stuff uses Tpetra variables to figure out what to do. Fix that.
See also kokkos/kokkos#1048; this needs to be -DBL_MAX, not DBL_MIN.
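The distinction matters because DBL_MIN is the smallest positive normalized double, not the most negative one. The linked issue is not quoted here, so the max-reduction context in this sketch is an assumption, and both function names are hypothetical.

```cpp
#include <cassert>
#include <cfloat>
#include <limits>
#include <vector>

// Maximum of an array, seeded with the most negative finite double.
// -DBL_MAX is the same value as std::numeric_limits<double>::lowest().
double max_value(const std::vector<double>& x) {
  double result = -DBL_MAX;
  for (double v : x) {
    if (v > result) result = v;
  }
  return result;
}

// Same loop, seeded with DBL_MIN to demonstrate the bug: for an array of
// negative numbers the seed itself (a tiny positive value) is returned.
double max_value_wrong_seed(const std::vector<double>& x) {
  double result = DBL_MIN;
  for (double v : x) {
    if (v > result) result = v;
  }
  return result;
}
```

With an all-negative input, the first version returns the largest entry while the second returns DBL_MIN, which is not in the array at all.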
Provide Interface to allow for strided views as input.
When every Cuda thread tries to call sumInto we have tons of non-coalesced access on the input views.
Being able to provide strided subviews as input would make it possible to fix this.
Or at least have a CMake option to disable them. A large customer really does not care about multiple right-hand sides, and is starting to hit the 4GB linker limit for debug builds.
@crtrott already has a macro for this, so this should just be a matter of adding a CMake option to define or undefine the macro, then testing with both options.
OK, I think I am finally close to making this ETI stuff work properly. There is some funky compiler behavior with regard to using extern template instantiations for classes, in particular if you want to allow instantiations of other types, but I believe my solution is now foolproof.
Furthermore I believe the file structure and naming etc needs some cleanup. In particular this focus on MultiVector which historically comes from Tpetra is confusing for standalone users.
Let's start with some requirements for what we need to be able to do:
In order to do all this we came up with a design which has 3 functionality layers (I will go into details later):
Now I want to go through a couple of design aspects in the next posts.
/ascldap/users/crtrott/Kokkos/kokkos-kernels/src/sparse/impl/KokkosSparse_spmv_impl_omp.hpp:54:22: warning: unused variable 'rowCount' [-Wunused-variable]
/ascldap/users/crtrott/Kokkos/kokkos-kernels/unit_test/sparse/Test_Sparse_spmv.hpp:54:9: warning: unused variable 'nc' [-Wunused-variable]
/ascldap/users/crtrott/Kokkos/kokkos-kernels/perf_test/graph/KokkosGraph_color.cpp:178:15: warning: unused variable 'm' [-Wunused-variable]
These warnings were reported by XL.
See trilinos/Trilinos#1194 for discussion.
I've been running some experiments in MueLu, and noticed that matrix-matrix multiplication time does not change when we use a filtered matrix with the OpenMP node.
My assumption is that the currently implemented spgemm in kokkos-kernels does not have any shortcuts or optimization when the left matrix may have multiple zero elements. In the serial Tpetra version, the code checks for zeros in A and then skips fetching corresponding rows of B, thus significantly improving performance.
My question is: would something like that be possible in the threaded spgemm? This could significantly help with some applications that use multigrid, particularly Nalu when used with high geometric anisotropy.
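To make the requested shortcut concrete, here is a serial, self-contained sketch of a row-wise (Gustavson-style) product that skips stored zeros in A. It is neither the Tpetra code nor the kokkos-kernels spgemm; `Crs`, `spgemm_skip_zeros`, and the `rowsOfBRead` counter are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal CRS matrix; a stand-in, not KokkosSparse::CrsMatrix.
struct Crs {
  int numRows;
  int numCols;
  std::vector<int> rowptr;  // size numRows + 1
  std::vector<int> colidx;
  std::vector<double> values;
};

// C = A * B, one row of C at a time, using a dense accumulator.
// Stored entries of A that are exactly zero are skipped, so the matching
// row of B is never read. rowsOfBRead counts the B rows actually fetched.
// Column indices within each row of C appear in first-touch order (unsorted).
Crs spgemm_skip_zeros(const Crs& A, const Crs& B, std::size_t& rowsOfBRead) {
  assert(A.numCols == B.numRows);
  Crs C;
  C.numRows = A.numRows;
  C.numCols = B.numCols;
  C.rowptr.assign(1, 0);
  std::vector<double> acc(B.numCols, 0.0);  // dense accumulator for one row
  std::vector<char> used(B.numCols, 0);     // which columns were touched
  std::vector<int> touched;
  rowsOfBRead = 0;
  for (int i = 0; i < A.numRows; ++i) {
    touched.clear();
    for (int ka = A.rowptr[i]; ka < A.rowptr[i + 1]; ++ka) {
      const double a = A.values[ka];
      if (a == 0.0) continue;  // the shortcut: do not fetch this row of B
      const int k = A.colidx[ka];
      ++rowsOfBRead;
      for (int kb = B.rowptr[k]; kb < B.rowptr[k + 1]; ++kb) {
        const int j = B.colidx[kb];
        if (!used[j]) {
          used[j] = 1;
          touched.push_back(j);
        }
        acc[j] += a * B.values[kb];
      }
    }
    // Flush the accumulator into C and reset it for the next row.
    for (int j : touched) {
      C.colidx.push_back(j);
      C.values.push_back(acc[j]);
      acc[j] = 0.0;
      used[j] = 0;
    }
    C.rowptr.push_back(static_cast<int>(C.colidx.size()));
  }
  return C;
}
```

Note that skipping zeros changes the sparsity pattern of C (fewer stored entries) and also changes results if B contains Inf or NaN, so a threaded version would likely want this as an opt-in.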
Story: #1
KokkosSparse::Impl::getDiagCopyWithOffsets depends on it, but it lives in Tpetra. It's only a coincidence that the function builds correctly (probably because Tpetra is currently the only consumer of this file, and Tpetra must include the header file with OrdinalTraits before including the header file defining that function).
@kyungjoo-kim, is it possible to get a timeline on Kokkos-Kernels team level dense linear algebra?
It's not critical at the moment, but it would be useful to have an estimate on when that capability will be available. We can discuss offline further if need be.
There is some teuchos usage in unit and performance tests. That needs to go.
Blocks: trilinos/Trilinos#1169
KokkosBlas::gemv currently exists, but it does not currently call the BLAS library or cuBLAS where appropriate.
There is a subtlety in whether Tpetra uses this GEMV as a matrix-vector product, or as the dot product of each column of a MultiVector with a single Vector. The difference is that matrix-vector products do all computations with Scalar values (actually Tpetra::MultiVector::impl_scalar_type), while the intermediate sums in a dot product have type Tpetra::MultiVector::dot_type. The two types are usually the same, except for Scalar types that come from Stokhos. We have to decide whether we want GEMV to support both cases, or just one of them.
In practice, BLAS implementations only support types for which impl_scalar_type == dot_type. Thus, this is really about the interface that we present to users.
XL 14 output during build. Yes we are looking at over 45mins compile time with 7 and 3 GB footprint respectively:
27473 crtrott 20 0 7897280 7.436g 58560 R 84.0 1.5 44:58.57 /home/projects/pwr8-rhel73-lsf/ibm/xl/xlC/14.1.0/exe/ipa -comp -qalias=ansi -qthreaded -qtls -qtls -maltivec -qtls -qlanglvl=extended0x -qarch=pwr8 -qtune=pwr8 -qsmp=omp /tmp/xlcW0wbmEmP /tmp/xlcW13OIGeu Test_OpenMP_Sparse_spgemm.o /tmp/xlcLj2CORsT.lst /tmp/xlcW25m6I68
27312 crtrott 20 0 3903744 3.159g 58624 R 80.0 0.6 45:19.68 /home/projects/pwr8-rhel73-lsf/ibm/xl/xlC/14.1.0/exe/ipa -comp -qalias=ansi -qthreaded -qtls -qtls -maltivec -qtls -qlanglvl=extended0x -qarch=pwr8 -qtune=pwr8 -qsmp=omp /tmp/xlcW02Rd4eQ /tmp/xlcW1XvNf7u Test_OpenMP_Sparse_gauss_seidel.o /tmp/xlcLOE0GKtT.lst /tmp/xlcW26HmrZ9
I noticed that the latest snapshot of kokkos in Trilinos included a large renaming of files which triggered some build errors in MueLu.
I talked with @mndevec and it seems that the integration test done before the snapshot push did not catch MueLu's dependencies on kokkos.
To have MueLu use Kokkos you need to add the following flags in your configure script:
-D MueLu_ENABLE_Experimental=ON \
-D MueLu_ENABLE_Kokkos_Refactor=ON \
-D Xpetra_ENABLE_Experimental=ON \
-D Xpetra_ENABLE_Kokkos_Refactor=ON \
As far as I can tell, the only issues we have are with the renaming of Kokkos_CrsMatrix.hpp into KokkosSparse_CrsMatrix.hpp.
@jhux2 @tawiesn @csiefer2 do you have any comments/additions to make?
I looked through a few of the dashboard tests and they all report KokkosKernels as not ETI-ing anything.
e.g.:
Processing ETI support: KokkosKernels
-- KokkosKernels: Processing ETI / test support
-- Enabled Scalar types:
-- Enabled LocalOrdinal types:
-- Enabled Device types:
-- Set of enabled types, before exclusions:
Is this to be expected?
Are the long compile and runtime for SPGEMM and GaussSeidel really necessary?
For OpenMP on my workstation, SPGEMM takes 200s of test time out of 340s for the whole library. If I add GaussSeidel, it is 296s out of 340s. Do we really need that for correctness checking?
Furthermore, the compile times are also pretty high: in a non-parallel build (i.e. -j 1), SPGEMM takes 78s and GaussSeidel 54s, out of a total of 296s for all the unit tests together.
--no-comit should be --no-commit
This is just to collect stuff. I will update the first post if more comes in. This is not a promise of what will be there when; it's just to help us plan. I differentiate global, team, and thread kernels.
BLAS
Global:
Team:
Thread:
LAPACK
Global:
Team:
Thread: