Giter Site home page Giter Site logo

johelli / kokkos-kernels Goto Github PK

View Code? Open in Web Editor NEW

This project forked from kokkos/kokkos-kernels

0.0 1.0 0.0 9.63 MB

Kokkos C++ Performance Portability Programming EcoSystem: Math Kernels - Provides BLAS, Sparse BLAS and Graph Kernels

License: Other

CMake 0.18% C++ 98.97% Makefile 0.21% Shell 0.47% Python 0.17% C 0.01%

kokkos-kernels's Introduction

Kokkos C++ Performance Portability Programming EcoSystem: Math Kernels -
Provides BLAS, Sparse BLAS and Graph Kernels 

KokkosKernels implements local computational kernels for linear
algebra and graph operations, using the Kokkos shared-memory parallel
programming model.  "Local" means not using MPI, or running within a
single MPI process without knowing about MPI.  "Computational kernels"
are coarse-grained operations; they take a lot of work and make sense
to parallelize inside using Kokkos.  KokkosKernels can be the building
block of a parallel linear algebra library like Tpetra that uses MPI
and threads for parallelism, or it can be used stand-alone in your
application.

Computational kernels in this subpackage include the following:

  - (Multi)vector dot products, norms, and updates (AXPY-like
    operations that add vectors together entry-wise)
  - Sparse matrix-vector multiply and other sparse matrix / dense
    vector kernels
  - Sparse matrix-matrix multiply
  - Graph coloring
  - Gauss-Seidel with coloring (generalization of red-black)
  - Other linear algebra and graph operations

KokkosKernels is licensed under standard 3-clause BSD terms of use.  For
specifics, please refer to the LICENSE file contained in the
repository or distribution.

We organize this directory as follows:

  1. Public interfaces to computational kernels live in the src/
     subdirectory (kokkos-kernels/src):

     - Kokkos_Blas1_MV.hpp: (Multi)vector operations that
       Tpetra::MultiVector uses
     - Kokkos_Sparse_CrsMatrix.hpp: Declaration and definition of
       KokkosSparse::CrsMatrix, the sparse matrix data structure used
       for the computational kernels below
     - KokkosSparse_spmv.hpp: Sparse matrix-vector multiply with a
       single vector, stored in a 1-D View + Sparse matrix-vector multiply with
       multiple vectors at a time (multivectors), stored in a 2-D View

  2. Implementations of computational kernels live in the src/impl/
     subdirectory (kokkos-kernels/src/impl)

  3. Correctness tests live in the unit_test/ subdirectory, and
     performance tests live in the perf_test/ subdirectory

  4. Simple example scripts to build Kokkoskernels are in
     example/buildlib/


Do NOT use or rely on anything in the KokkosBlas::Impl namespace, or
on anything in the impl/ subdirectory.

This separation of interface and implementation lets the interface
assign the users' Views to View types with the desired attributes
(e.g., read-only, RandomRead).  This also makes it easier to provide
full specializations of the implementation.  "Full specializations"
mean that all the template parameters are fixed, so that the compiler
can actually compile the code.  This technique keeps your library's or
application's build times down, since kernels are already precompiled
for certain template parameter combinations.  It also improves
performance, since compilers have an easier time optimizing code in
shorter .cpp files.

Building Kokkoskernels
----------------------

  1. Modify example/buildlib/compileKokkosKernelsSimple.sh or
     example/buildlib/compileKokkosKernels.sh for your environment
     and run it to generate the required makefiles. 
     - KOKKOS_DEVICES can be as below. You can remove any backend 
       that you don't need. If cuda backend is used, CXX compiler should point to ${KOKKOS_PATH}/bin/nvcc_wrapper.
       If you enable Cuda, a host space, either OpenMP or Serial should be enabled.
       KOKKOS_DEVICES=OpenMP,Serial,Cuda

     - For the best performance give the architecture flag to proper architecture.
       e.g. KNLs: KOKKOS_ARCHS=KNL, KOKKOS_ARCHS=HSW. 
       If you compile for P100 GPUs with Power8 Processor, give both architectures.
       KOKKOS_ARCHS=Pascal60,Power8
     
       For the architecture flags, run below command.
       %: scripts/generate_makefile.bash --help     

  2. Run "make build-test" to compile the tests.


Comments for building Trilinos with Kokkoskernels
----------------------
  - For Trilinos builds with the Cuda backend and complex double enabled with ETI,
    the cmake option below may need to be set to avoid Error 127 errors:
    CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS:BOOL=ON

    If the option above is not set, a warning will be issued during configuration:

      "The CMake option CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS is either
      undefined or OFF.  Please set
      CMAKE_CXX_USE_RESPONSE_FILE_FOR_OBJECTS:BOOL=ON when building with CUDA and
      complex double enabled."


Using Kokkoskernels Test Drivers 
--------------------------

In perf_test there are test drivers.

-- KokkosGraph_triangle.exe : Triangle counting driver. 
-- KokkosSparse_spgemm.exe : Sparse Matrix Sparse Matrix Multiply: 
   *****NOTE: KKMEM is outdated. Use default algorithm: KKSPGEMM = KKDEFAULT = DEFAULT****
   Or within the code:
       kh.create_spgemm_handle(KokkosSparse::SPGEMM_KK);
-- KokkosSparse_spmv.exe : Sparse matvec.
-- KokkosSparse_pcg.exe: CG method with Gauss Seidel as preconditioner.
-- KokkosGraph_color.exe: Distance-1 Graph coloring 
-- KokkosKernels_MatrixConverter.exe: given a matrix market format, converts it ".bin"
   binary format for fast input output readings, which can be read by other test drivers.
   
Please report bugs or performance issues to: https://github.com/kokkos/kokkos-kernels/issues

kokkos-kernels's People

Contributors

crtrott avatar mndevec avatar kyungjoo-kim avatar ndellingwood avatar vqd8a avatar srajama1 avatar dalg24 avatar lucbv avatar ambrad avatar nmhamster avatar ibaned avatar hcedwar avatar cgcgcg avatar csiefer2 avatar etphipp avatar mhoemmen avatar rppawlo avatar brian-kelley avatar aprokop avatar bavier avatar eric-c-cyr avatar pkestene avatar bartlettroscoe avatar william76 avatar avdelor avatar micheldemessieres avatar

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.