Comments (17)

mndevec commented on May 25, 2024

Okay, I moved this topic to here:
https://github.com/kokkos/kokkos-kernels/wiki/ETI-System-and-file-structure

from kokkos-kernels.

crtrott commented on May 25, 2024

I have now thought long and hard about a sane, sustainable way to organize this into files. Here is what I came up with:

src:
  KokkosBlas.hpp: includes all the KokkosBlas function header files
  KokkosBlas1_foo.hpp: contains the user interface functions for foo
src/impl:
  KokkosBlas1_foo_impl.hpp: The actual implementation of the functions (Functors etc.)
  KokkosBlas1_foo_spec.hpp: The specialization layer
src/impl/tpls:
  KokkosBlas1_foo_tpl_spec_avail.hpp: Availability of TPLs for particular types
  KokkosBlas1_foo_tpl_spec_decl.hpp: Specialization declarations for using TPLs
src/impl/generated_specializations_hpp:
  KokkosBlas1_foo_eti_spec_avail.hpp: Availability declarations for ETI types
  KokkosBlas1_foo_eti_spec_decl.hpp: Specialization declarations for ETI types
src/impl/generated_specializations_cpp/foo:
  KokkosBlas1_foo_eti_spec_inst_double_LayoutRight_Cuda_CudaSpace.cpp: one instantiation for an extern template

Let's talk about what you need to touch to do specific things:

Add a new function:

  • Add all those files based on the template provided later
  • Modify the scripts which generate the auto generated files

Modify the implementation of a function

  • Only src/impl/KokkosBlas1_foo_impl.hpp needs to be modified

Add a new ETI type

  • Modify the scripts which generate the auto generated files

Add a new TPL variant

  • Modify the files in impl/tpls/ to add the new TPL (declare its availability, and provide the implementation of how to call it)

crtrott commented on May 25, 2024

Let's look at the code and what those things do.

Public API in src/KokkosBlas1_foo.hpp

This file provides the public API for the function foo. Internally, the function calls the specialization layer after explicitly filling in all the necessary template arguments for the ViewTypes etc. For example, for a dot(a,b) product, const modifiers should be added to the scalar type if they are not already there. Otherwise the code would potentially have to be compiled 4 times:

  • dot(View<double*>, View<double*>);
  • dot(View<double*>, View<const double*>);
  • dot(View<const double*>, View<double*>);
  • dot(View<const double*>, View<const double*>);

If you then factor in explicit vs. implicit specification of Layout, Memory Space, and MemoryTraits, we end up with over 100 possible instantiations of something which is technically the exact same thing!

Furthermore, this function should also static_assert on things which are not allowed (for example, a View of the wrong rank), so that users get an early error in a function which they can directly associate with the code they have written.

Here is an example for foo:

// Include the specialization layer which defines the Impl::Foo struct
#include<impl/KokkosBlas1_foo_spec.hpp>

namespace KokkosBlas1 {
// User facing function accepts any ViewType
template<class ViewType>
void foo(const ViewType& a) {

  // Static assert on prohibited types
  static_assert(ViewType::rank==1, "Trying to call foo with View of rank other than 1");

  // Convert ViewType to internal ViewType to reduce instantiations.
  // Without this, whether you explicitly specify a Layout or not would give
  // two different instantiations, since Views have variadic template parameters.
  // Furthermore, this is the place to add missing const etc.
  typedef Kokkos::View<typename ViewType::data_type,
                       typename ViewType::array_layout,
                       typename ViewType::device_type>
          ViewTypeInternal;

  // Call the actual implementation
  Impl::Foo<ViewTypeInternal>::foo(a);
}
}
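
The const point above can be illustrated without Kokkos. What follows is only a minimal sketch under stated assumptions: MockView is a hypothetical stand-in for Kokkos::View, and const_data and const_view_t are hypothetical helpers that do not exist in Kokkos. The public dot accepts any const combination of its arguments but always forwards to the same Impl::Dot instantiation.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for Kokkos::View: it records the data type and
// holds a read-only pointer plus a length.
template <class DataType>
struct MockView {
  using data_type  = DataType;
  using value_type = typename std::remove_pointer<DataType>::type;
  const value_type* ptr;
  std::size_t n;
};

// Hypothetical helper: map both T* and const T* onto const T*.
template <class DataType> struct const_data;
template <class T> struct const_data<T*> { using type = const T*; };

template <class ViewType>
using const_view_t =
    MockView<typename const_data<typename ViewType::data_type>::type>;

namespace Impl {
// Stand-in for the specialization layer; one instantiation per
// canonical (const) pair of view types.
template <class XV, class YV>
struct Dot {
  static double dot(const XV& x, const YV& y) {
    double sum = 0.0;
    for (std::size_t i = 0; i < x.n; ++i) sum += x.ptr[i] * y.ptr[i];
    return sum;
  }
};
}  // namespace Impl

// Public function: accepts any const combination, forwards a canonical one.
template <class XV, class YV>
double dot(const XV& x, const YV& y) {
  using XI = const_view_t<XV>;
  using YI = const_view_t<YV>;
  return Impl::Dot<XI, YI>::dot(XI{x.ptr, x.n}, YI{y.ptr, y.n});
}

// Both spellings collapse onto the same internal type.
static_assert(std::is_same<const_view_t<MockView<double*>>,
                           const_view_t<MockView<const double*>>>::value,
              "const and non-const views share one internal type");

// Usage: a non-const and a const view go through one instantiation.
inline double example_dot() {
  double a[3] = {1.0, 2.0, 3.0};
  double b[3] = {4.0, 5.0, 6.0};
  MockView<double*> x{a, 3};
  MockView<const double*> y{b, 3};
  const double r = dot(x, y);
  assert(r == 32.0);  // 1*4 + 2*5 + 3*6
  return r;
}
```

In this sketch all four const combinations of dot end up in Impl::Dot<MockView<const double*>, MockView<const double*>>, which is the effect the public layer is meant to achieve.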

crtrott commented on May 25, 2024

Next up:

The Specialization Layer

This layer serves two purposes. It is the focal point for the unified instantiations that the public layer requires, and it is the layer which allows specialization for third party libraries (such as MKL and CUBLAS) and explicit template instantiation (ETI).

Generally this layer is very thin again and basically just passes through arguments.

The basic mechanism for ETI is the extern template mechanism of C++11. Unfortunately, it has some funky semantics with respect to classes. In particular, it looks like the compiler can still choose to inline the implementation of the class if it is visible in the same compilation unit, instead of calling the externally available instantiation. This might also be compiler dependent.

To enable both TPL specialization and ETI specialization, additional bool template parameters are added to the specialization layer. They default to values based on whether said specializations are available:

From impl/KokkosBlas1_foo_spec.hpp:

template<class ViewType>
struct foo_eti_spec_avail {
  enum : bool { value = false };
};

template<class ViewType, bool tpl_spec_avail = foo_tpl_spec_avail<ViewType>::value,
                         bool eti_spec_avail = foo_eti_spec_avail<ViewType>::value>
struct Foo {
  static void foo(const ViewType& a);
};

To declare a specialization available, a full specialization of foo_tpl_spec_avail or foo_eti_spec_avail must be provided. Those specializations live in impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp and impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_avail.hpp respectively, with the latter auto generated. We come back to those files in a bit.

The next part of the specialization layer is the definition used when no TPL is available. It calls the actual implementation provided in impl/KokkosBlas1_foo_impl.hpp.
Note that the TPL bool is set to false, while the ETI bool is set to KOKKOSKERNELS_IMPL_COMPILE_LIBRARY. The latter is only true while compiling the KokkosKernels library with its explicit template instantiations.

template<class ViewType>
struct Foo<ViewType,false,KOKKOSKERNELS_IMPL_COMPILE_LIBRARY> {
  static void foo(const ViewType& a) {
    execute_foo(a);
  }
};
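
The bool-defaulted dispatch described above can be tried out in isolation. This is a simplified sketch, not the KokkosKernels code: plain scalar types stand in for View types, the Path enum is a hypothetical marker for which branch was chosen, and the generic fallback is written as the primary template rather than as a specialization on KOKKOSKERNELS_IMPL_COMPILE_LIBRARY.

```cpp
#include <cassert>

// Hypothetical marker showing which branch of the layer was selected.
enum class Path { Generic, Eti, Tpl };

// Availability traits: false unless a full specialization says otherwise.
template <class T> struct foo_tpl_spec_avail { static constexpr bool value = false; };
template <class T> struct foo_eti_spec_avail { static constexpr bool value = false; };

// Assumption for this demo: float has a TPL, double has an ETI instantiation.
template <> struct foo_tpl_spec_avail<float>  { static constexpr bool value = true; };
template <> struct foo_eti_spec_avail<double> { static constexpr bool value = true; };

// Specialization layer: the bools default to the availability traits,
// so callers only ever write Foo<T>.
template <class T,
          bool tpl_spec_avail = foo_tpl_spec_avail<T>::value,
          bool eti_spec_avail = foo_eti_spec_avail<T>::value>
struct Foo {
  static Path foo(const T&) { return Path::Generic; }
};

// ETI available, no TPL.
template <class T>
struct Foo<T, false, true> {
  static Path foo(const T&) { return Path::Eti; }
};

// TPL available (with or without ETI).
template <class T, bool eti_spec_avail>
struct Foo<T, true, eti_spec_avail> {
  static Path foo(const T&) { return Path::Tpl; }
};

// Usage: the caller never names the bools.
inline void example_dispatch() {
  assert(Foo<int>::foo(0) == Path::Generic);
  assert(Foo<double>::foo(0.0) == Path::Eti);
  assert(Foo<float>::foo(0.0f) == Path::Tpl);
}
```

Adding support for a new type then only requires a new full specialization of one of the availability traits; the call sites stay unchanged.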

In this file we also need to define the macros which are later used in the auto generated files:

// Availability Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_AVAIL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template<> \
struct foo_eti_spec_avail<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE> > > { \
  enum : bool { value = true }; \
}; 

// Declaration Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_DECL( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
extern template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Instantiation Macro
#define KOKKOSBLAS1_IMPL_FOO_ETI_SPEC_INST( SCALAR, LAYOUT, EXECSPACE, MEMSPACE ) \
template struct Foo<Kokkos::View<SCALAR*,LAYOUT,Kokkos::Device<EXECSPACE,MEMSPACE>>,false,true>;

// Include the actual declarations for tpls and eti
#if !KOKKOSKERNELS_IMPL_COMPILE_LIBRARY
#include<impl/tpls/KokkosBlas1_foo_tpl_spec_decl.hpp>
#include<impl/generated_specializations_hpp/KokkosBlas1_foo_eti_spec_decl.hpp>
#endif

Note how the actual declarations of those classes are only included when we are NOT compiling the library.
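
As a reminder of how the declaration and instantiation macros pair up, here is a small single-file sketch of the C++11 extern template mechanism. Scale and the DEMO_ macros are hypothetical names made up for this illustration. In the real layout the declaration would sit in a header and the instantiation in a generated cpp file; they share one file here only to keep the example self-contained.

```cpp
#include <cassert>

// Hypothetical class template standing in for the specialization layer.
template <class Scalar>
struct Scale {
  static Scalar twice(const Scalar& x) { return x + x; }
};

// Declaration macro: promises that an instantiation exists elsewhere,
// which suppresses implicit instantiation in the including file.
#define DEMO_SCALE_ETI_SPEC_DECL(SCALAR) extern template struct Scale<SCALAR>;

// Instantiation macro: emits the instantiation exactly once.
#define DEMO_SCALE_ETI_SPEC_INST(SCALAR) template struct Scale<SCALAR>;

// Header-like part: what user code would see.
DEMO_SCALE_ETI_SPEC_DECL(double)

// cpp-like part: what a generated library source file would contain.
// Within one translation unit the definition must follow the declaration.
DEMO_SCALE_ETI_SPEC_INST(double)

// Usage: calls resolve to the explicitly instantiated Scale<double>.
inline double example_eti() {
  const double r = Scale<double>::twice(2.5);
  assert(r == 5.0);
  return r;
}
```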

I'll post the whole file later after discussing some more Macro stuff.

crtrott commented on May 25, 2024

The implementation layer in impl/KokkosBlas1_foo_impl.hpp is pretty much whatever we need it to be. In this case it's just a simple function:

  template<class ViewType>
  void execute_foo(const ViewType& a) {
    Kokkos::parallel_for("KokkosBlas1::foo",a.extent(0), KOKKOS_LAMBDA (const int& i) {
      a(i) = i;
    });
  }

If we want to distinguish between multivectors and normal vectors, the implementation layer may be one of the places to do that.

crtrott commented on May 25, 2024

The TPL layer consists of two files: the one which declares the availability of a specialization and the one which provides the specialization. The first one is impl/tpls/KokkosBlas1_foo_tpl_spec_avail.hpp:

template<class ViewType>
struct foo_tpl_spec_avail {
  enum : bool { value = false };
};

#ifdef KOKKOSKERNELS_ENABLE_MKL
template<>
struct foo_tpl_spec_avail<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>> {
  enum : bool { value = true };
};
#endif

Basically for every new TPL which we want to support we drop another full specialization of this stuff in.

The implementation is the counterpart to it. Note that we can use the implementation to decide, based on input parameters, whether to call our own code or the TPL code. We also need two full specializations here, depending on whether ETI for the same type combination is available or not.

#ifdef KOKKOSKERNELS_ENABLE_MKL
#include<mkl_foo.hpp>
namespace KokkosBlas1 {
namespace Impl {

// Only a TPL specialization is available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,false> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    #if (KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION)
    printf("Calling MKL Specialization\n");
    #endif
    mkl_foo(a.data(),a.extent(0));
  }
};

// Both a TPL specialization and an ETI instantiation are available
template<>
struct Foo<Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>>,true,true> {
  typedef Kokkos::View<double*,Kokkos::LayoutRight,Kokkos::Device<Kokkos::Serial,Kokkos::HostSpace>> ViewType;

  static void foo(const ViewType& a) {
    // Our code is better for large number of entries, so only use TPL for small lengths
    if(a.extent(0) < 100000)
      Foo<ViewType,true,false>::foo(a);
    else
      Foo<ViewType,false,true>::foo(a);
  }
};
}
}
#endif
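
The size-based choice between TPL and native code can be reproduced in a standalone sketch. Everything here is hypothetical and only mirrors the structure above: std::vector stands in for a View, the Backend enum reports which branch ran, and the "TPL" branch is an ordinary loop rather than a real vendor call.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical marker for which code path was taken.
enum class Backend { Native, Tpl };

template <class Vec, bool tpl_spec_avail, bool eti_spec_avail>
struct Fill;

// Native (ETI) path.
template <class Vec>
struct Fill<Vec, false, true> {
  static Backend run(Vec& v) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = static_cast<double>(i);
    return Backend::Native;
  }
};

// TPL-only path; a real version would call into the vendor library here.
template <class Vec>
struct Fill<Vec, true, false> {
  static Backend run(Vec& v) {
    for (std::size_t i = 0; i < v.size(); ++i) v[i] = static_cast<double>(i);
    return Backend::Tpl;
  }
};

// Both available: choose at run time based on the length.
template <class Vec>
struct Fill<Vec, true, true> {
  static Backend run(Vec& v) {
    const std::size_t tpl_max_size = 100000;  // same threshold as above
    return v.size() < tpl_max_size ? Fill<Vec, true, false>::run(v)
                                   : Fill<Vec, false, true>::run(v);
  }
};

// Usage: short vectors take the TPL branch, long ones the native branch.
inline void example_fill() {
  std::vector<double> small_vec(10), large_vec(100000);
  assert((Fill<std::vector<double>, true, true>::run(small_vec) == Backend::Tpl));
  assert((Fill<std::vector<double>, true, true>::run(large_vec) == Backend::Native));
  assert(large_vec[5] == 5.0);
}
```

Because the <true,true> specialization forwards to the other two, the threshold lives in exactly one place and neither backend needs to know about the other.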

crtrott commented on May 25, 2024

Last but not least, there are three kinds of auto generated files, which are kind of like the TPL files: they declare an ETI specialization available, provide the extern template declarations of those ETI specializations, and instantiate them in cpp files. They simply use the previously defined macros with the right type combinations.

There is one more detail using two additional macros:

  • KOKKOSKERNELS_ENABLE_ETI_ONLY: is used to prevent instantiations of Non-ETI or Non-TPL types. This is used to hide the actual definition of the specialization layer when not compiling the library cpp files.
  • KOKKOSKERNELS_ENABLE_CHECK_SPECIALIZATION: this is more of a debug option which enables print statements stating which specialization (ETI, Non-ETI, TPL) was called. This is useful to make sure we don't instantiate stuff in cases where we can't turn on full ETI_ONLY.

One more word on KOKKOSKERNELS_IMPL_COMPILE_LIBRARY: this macro is always defined as false, except inside the auto generated ETI cpp files.

crtrott commented on May 25, 2024

I will check in the actual full example code soon.

crtrott commented on May 25, 2024

Some more thoughts: while this is a lot of different files, we are trying to serve a pretty complex use-case scenario. Most of this stuff is pretty much boilerplate and doesn't really use much advanced C++. It basically comes down to a bunch of full specializations. The particularly nice thing about this scheme is that it decouples the actual implementation from providing specializations for TPLs and from providing ETI specializations. All three things can be modified independently. Furthermore, this scheme clearly separates which files are responsible for which part of the hierarchy.

crtrott commented on May 25, 2024

@mhoemmen @dsunder @hcedwar @srajama1
Most folks on KokkosKernels are not that much interested in software engineering as long as what they have to work with works. But maybe you guys wanna take a look and tell me what you think (and also if the explanation makes sense why this is the design I came up with).

hcedwar commented on May 25, 2024

You can static_assert( is_view<T>::value , ... as well

Thought (tbd): Should we have something in Kokkos core to canonicalize a View?

template< class ViewType >
using canonical_view_of_const = 
  View< typename ViewType::const_data_type 
          , typename ViewType::array_layout 
          , typename ViewType::device_type 
          , typename ViewType::memory_traits > ;

The foo_eti_spec_avail and foo_tpl_spec_avail traits are an unfortunate need and, at first glance, a good minimalist approach.
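
To make the canonicalization idea concrete, here is a toy sketch that does not use Kokkos. View, find_layout, SomeTrait and canonical_view_t are all hypothetical stand-ins invented for this example. It also simplifies the proposal above by canonicalizing only the layout, ignoring const, device and memory traits. The point it shows is that a variadic View yields distinct types for an implicit and an explicit layout, and that an alias can map both onto one type.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical property tags.
struct LayoutLeft {};
struct LayoutRight {};
struct SomeTrait {};               // any non-layout property
using DefaultLayout = LayoutRight; // assumption for this demo

// Find the layout among the properties, or fall back to the default.
template <class... Props>
struct find_layout { using type = DefaultLayout; };
template <class... Rest>
struct find_layout<LayoutLeft, Rest...> { using type = LayoutLeft; };
template <class... Rest>
struct find_layout<LayoutRight, Rest...> { using type = LayoutRight; };
template <class First, class... Rest>
struct find_layout<First, Rest...> : find_layout<Rest...> {};

// Hypothetical variadic View: different spellings are different types.
template <class DataType, class... Props>
struct View {
  using data_type    = DataType;
  using array_layout = typename find_layout<Props...>::type;
};

// Canonical form: always spell out the layout, drop everything else.
template <class ViewType>
using canonical_view_t =
    View<typename ViewType::data_type, typename ViewType::array_layout>;

// Implicit and explicit layout are distinct types before canonicalization...
static_assert(!std::is_same<View<double*>, View<double*, LayoutRight>>::value,
              "variadic spellings differ");
// ...and identical afterwards.
static_assert(std::is_same<canonical_view_t<View<double*>>,
                           canonical_view_t<View<double*, LayoutRight>>>::value,
              "canonical forms agree");

// Usage: a property before the layout does not change the result.
inline bool example_canonical() {
  const bool same =
      std::is_same<canonical_view_t<View<double*, SomeTrait, LayoutLeft>>,
                   View<double*, LayoutLeft>>::value;
  assert(same);
  return same;
}
```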

mhoemmen commented on May 25, 2024

@crtrott I like @hcedwar 's idea of adding some "canonicalize the View" type functions.

I think the design makes sense, especially its ETI / TPL aspects. In particular, I think it's enough for us to specialize on whether some TPL is available. Very few users in practice want to swap different TPLs in and out at compile or run time. (They just want to know what's the fastest TPL to use on each platform.) I don't think it's worth complicating the design for this use case, which may only be of interest to the occasional computer science publication. We're a national lab; that should be at best a tertiary interest for us.

This design is good for "node-global" kernels. What about single-team or single-thread kernels? Are we worried about potential inlining overhead at those lower levels?

Also, what about asynchronous dispatch? This is relevant to design of the implementation layer's interface, because Views may need to stay managed as they enter the implementation layer.

crtrott commented on May 25, 2024

Regarding asynchronous dispatch: the internal view types are function specific. So for asynchronous functions the internal views must be managed.

mndevec commented on May 25, 2024

By the way, it might be better to move this issue and #28 to Wiki.

mhoemmen commented on May 25, 2024

@mndevec I would say, @crtrott finished implementing the first-pass (more accurately, second-pass, or third-pass if you count Chris Baker's Tpetra kernels) design. Thus, it is my view that it would be proper to close this issue. We can always open new issues for new things to do.

mndevec commented on May 25, 2024

I mean, this issue was a nice guideline for me. It would be nice to save it in the KokkosKernels wiki so that it can be easily found, rather than searching for it in the issue history.

mhoemmen commented on May 25, 2024

@mndevec wrote:

It would be nice to save it in the KokkosKernels wiki so that it can be easily found, rather than searching for it in the issue history.

That's a good idea. I think it would be best, then, to close this issue, but copy its contents into the wiki. How about that?
