bassoy / ttv

C++ Header-Only Library for High-Performance Tensor-Vector Multiplication

License: GNU Lesser General Public License v3.0

ttv's Introduction

High-Performance Tensor-Vector Multiplication Library (TTV)

Summary

TTV is a header-only C++ library for high-performance tensor-vector multiplication. It provides free C++ functions that compute, in parallel, the mode-q tensor-times-vector product of the general form

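$$C[i_1,\dots,i_{q-1},i_{q+1},\dots,i_p] = \sum_{i_q=1}^{n[q]} A[i_1,\dots,i_q,\dots,i_p] \cdot b[i_q]$$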

where q is the contraction mode, A and C are tensors of order p and p-1, respectively, b is a tensor of order 1, thus a vector. Simple examples of tensor-vector multiplications are the inner-product c = a[i] * b[i] with q=1 and the matrix-vector multiplication c[i] = A[i,j] * b[j] with q=2. The number of dimensions (order) p and the dimensions n[r] as well as a non-hierarchical storage format pi of the tensors A and C can be chosen at runtime.
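To make the definition concrete, here is a minimal reference sketch of the mode-q product written with plain loops. It is not part of TTV: the names `ttv_naive` and `first_order_strides` are hypothetical, and the sketch assumes that A and C are stored in the first-order (column-major) format and that q is 1-based.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using index_vector = std::vector<std::size_t>;

// Hypothetical helper (not a TTV function): strides of the first-order
// (column-major) format, w[0] = 1 and w[r] = w[r-1] * n[r-1].
index_vector first_order_strides(const index_vector& n)
{
  index_vector w(n.size(), 1);
  for (std::size_t r = 1; r < n.size(); ++r) w[r] = w[r-1] * n[r-1];
  return w;
}

// Hypothetical reference implementation (not a TTV function):
// C[i_1,..,i_{q-1},i_{q+1},..,i_p] = sum over i_q of A[i_1,..,i_p] * b[i_q]
// na holds the extents of A, q is the 1-based contraction mode.
std::vector<double> ttv_naive(const std::vector<double>& a, const index_vector& na,
                              const std::vector<double>& b, std::size_t q)
{
  const std::size_t p = na.size();
  assert(q >= 1 && q <= p);
  assert(b.size() == na[q-1]);

  // The extents of C are those of A with mode q removed.
  index_vector nc;
  for (std::size_t r = 0; r < p; ++r)
    if (r != q-1) nc.push_back(na[r]);
  const index_vector wc = first_order_strides(nc);

  std::size_t size_a = 1, size_c = 1;
  for (auto e : na) size_a *= e;
  for (auto e : nc) size_c *= e;
  assert(a.size() == size_a);

  std::vector<double> c(size_c, 0.0);
  for (std::size_t k = 0; k < size_a; ++k) {
    // Decompose the linear index k of A into its multi-index and
    // accumulate the linear index jc of C from all modes except q.
    std::size_t rem = k, jc = 0, iq = 0;
    for (std::size_t r = 0, s = 0; r < p; ++r) {
      const std::size_t ir = rem % na[r];
      rem /= na[r];
      if (r == q-1) iq = ir;
      else          jc += ir * wc[s++];
    }
    c[jc] += a[k] * b[iq];
  }
  return c;
}
```

With p = 1 and q = 1 the function returns the inner product as a single element, and with p = 2 and q = 2 it returns the matrix-vector product, matching the two simple cases above.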

All function implementations are based on the Loops-Over-GEMM (LOG) approach and utilize high-performance GEMV or DOT routines of a BLAS implementation such as OpenBLAS or Intel MKL without transposing the tensor. The library is an extension of the boost/ublas tensor library, which contains the sequential version. Implementation details and the runtime behavior of the tensor-vector multiplication functions are described in the research article.
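The idea of looping over matrix-vector products can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes the first-order (column-major) format, the names `gemv_colmajor` and `ttv_loops_over_gemv` are hypothetical, and a plain loop kernel stands in for the BLAS GEMV call. TTV's actual slicing and dispatch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a BLAS GEMV call (hypothetical name): y = M * x for a
// column-major m-by-n matrix M with leading dimension m.
void gemv_colmajor(const double* M, const double* x, double* y,
                   std::size_t m, std::size_t n)
{
  for (std::size_t i = 0; i < m; ++i) y[i] = 0.0;
  for (std::size_t j = 0; j < n; ++j)
    for (std::size_t i = 0; i < m; ++i)
      y[i] += M[i + j*m] * x[j];
}

// Hypothetical sketch: mode-q product for a first-order tensor, expressed
// as a loop over matrix-vector products on slices of A. The slices are
// addressed with pointer offsets, so A is neither copied nor transposed.
std::vector<double> ttv_loops_over_gemv(const std::vector<double>& a,
                                        const std::vector<std::size_t>& n,
                                        const std::vector<double>& b,
                                        std::size_t q)
{
  const std::size_t p = n.size();
  assert(q >= 1 && q <= p);
  const std::size_t nq = n[q-1];
  assert(b.size() == nq);

  // View A as an nl x nq x nr array: nl is the product of the extents
  // before mode q, nr the product of the extents after mode q.
  std::size_t nl = 1, nr = 1;
  for (std::size_t r = 0; r + 1 < q; ++r) nl *= n[r];
  for (std::size_t r = q; r < p; ++r)     nr *= n[r];
  assert(a.size() == nl * nq * nr);

  std::vector<double> c(nl * nr);
  // Each slice r is a contiguous column-major nl x nq matrix.
  // For q = 1 every slice has a single row, so each call is a dot product.
  for (std::size_t r = 0; r < nr; ++r)
    gemv_colmajor(a.data() + r*nl*nq, b.data(), c.data() + r*nl, nl, nq);
  return c;
}
```

When q equals the tensor order, nr is 1 and the loop collapses to a single matrix-vector product over the whole tensor.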

Please have a look at the wiki page for more information about the usage, the function interfaces and the parameter settings.

Key Features

Flexibility

  • Contraction mode q, tensor order p, tensor extents n and tensor layout pi can be chosen at runtime
  • Supports any non-hierarchical storage format, including the first-order and last-order storage layouts
  • Offers two high-level interfaces and one C-like low-level interface for calling the tensor-times-vector multiplication
  • Implemented independent of a tensor data structure (can be used with std::vector and std::array)
  • Supports float, double, complex and double complex data types (and more if a BLAS library is not used)
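As an illustration of what a runtime-chosen layout means, the sketch below derives element strides from the extents n and a layout tuple pi. It assumes that pi is a permutation of the modes 1..p ordered from the fastest-varying to the slowest-varying mode, so that (1,2,...,p) is the first-order and (p,...,2,1) the last-order layout. The function names are hypothetical; TTV's own helpers may have different names and signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using index_vector = std::vector<std::size_t>;

// Hypothetical helper: true if pi is a permutation of the modes 1..p.
bool is_permutation_of_modes(const index_vector& pi)
{
  std::vector<bool> seen(pi.size(), false);
  for (auto m : pi) {
    if (m < 1 || m > pi.size() || seen[m-1]) return false;
    seen[m-1] = true;
  }
  return true;
}

// Hypothetical helper: strides w for extents n and layout tuple pi with
//   w[pi[0]] = 1  and  w[pi[r]] = w[pi[r-1]] * n[pi[r-1]]
// (modes in pi are 1-based, the vectors are indexed from 0).
index_vector strides_from_layout(const index_vector& n, const index_vector& pi)
{
  assert(n.size() == pi.size());
  assert(is_permutation_of_modes(pi));
  index_vector w(n.size(), 1);
  for (std::size_t r = 1; r < pi.size(); ++r) {
    const std::size_t prev = pi[r-1] - 1;
    w[pi[r]-1] = w[prev] * n[prev];
  }
  return w;
}
```

For the extents (4,3,2), the first-order layout (1,2,3) gives the strides (1,4,12) and the last-order layout (3,2,1) gives (6,2,1).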

Performance

  • Multi-threading support with OpenMP
  • Can be used with and without a BLAS implementation
  • Performs in-place operations without transposing the tensor - no extra memory needed
  • For large tensors reaches peak matrix-times-vector performance

Requirements

  • Requires the tensor elements to be contiguously stored in memory.
  • The element type must be an arithmetic type supporting the multiplication and addition operators.

Experiments

The experiments were carried out on an Intel Core i9-7900X processor with 10 cores and 20 hardware threads running at 3.3 GHz. The source code has been compiled with GCC v7.3 using the highest optimization level -Ofast and the flags -march=native, -pthread and -fopenmp. Parallel execution has been accomplished using GCC's implementation of the OpenMP v4.5 specification. We have used the dot and gemv implementations of the OpenBLAS library v0.2.20. The benchmark results of each of the following functions are the average of 10 runs.

The comparison includes three state-of-the-art libraries that implement three different approaches.

  • TCL (v0.1.1) implements the TTGT approach.
  • TBLIS (v1.0.0) implements the GETT approach.
  • EIGEN (v3.3.90) sequentially executes the tensor-times-vector in-place.

The experiments were carried out with asymmetrically-shaped and symmetrically-shaped tensors in order to provide comprehensive test coverage. The tensor elements are stored according to the first-order storage format. The tensor order of the asymmetrically- and symmetrically-shaped tensors has been varied from 2 to 10 and from 2 to 7, respectively. The contraction mode q has also been varied from 1 to the tensor order.

Symmetrically-Shaped Tensors

TTV has been executed with the parameters tlib::execution::blas, tlib::slicing::large and tlib::loop_fusion::all.

Asymmetrically-Shaped Tensors

TTV has been executed with the parameters tlib::execution::blas, tlib::slicing::small and tlib::loop_fusion::all.

Example

/*main.cpp*/
#include <vector>
#include <numeric>
#include <iostream>
#include <tlib/ttv.h>


int main()
{
  const auto q = 2ul; // contraction mode
  
  auto A = tlib::tensor<float>( {4,3,2} ); 
  auto B = tlib::tensor<float>( {3,1}   );
  std::iota(A.begin(),A.end(),1);
  std::fill(B.begin(),B.end(),1);

/*
  A =  { 1  5  9  | 13 17 21
         2  6 10  | 14 18 22
         3  7 11  | 15 19 23
         4  8 12  | 16 20 24 };

  B =   { 1 1 1 } ;
*/

  // computes the mode-2 tensor-times-vector product C1(i,j) = sum over k of A(i,k,j) * B(k)
  auto C1 = A (q)* B; 
  
/*
  C1 = { 1+5+ 9 | 13+17+21
         2+6+10 | 14+18+22
         3+7+11 | 15+19+23
         4+8+12 | 16+20+24 };
*/
}

Compile with g++ -I../include/ -std=c++17 -Ofast -fopenmp main.cpp -o main and additionally add -DUSE_OPENBLAS or -DUSE_INTELBLAS for fast execution.

Citation

If you want to refer to TTV as part of a research paper, please cite the article Design of a High-Performance Tensor-Vector Multiplication with BLAS

@inproceedings{ttv:bassoy:2019,
  author="Bassoy, Cem",
  editor="Rodrigues, Jo{\~a}o M. F. and Cardoso, Pedro J. S. and Monteiro, J{\^a}nio and Lam, Roberto and Krzhizhanovskaya, Valeria V. and Lees, Michael H. and Dongarra, Jack J. and Sloot, Peter M.A.",
  title="Design of a High-Performance Tensor-Vector Multiplication with BLAS",
  booktitle="Computational Science -- ICCS 2019",
  year="2019",
  publisher="Springer International Publishing",
  address="Cham",
  pages="32--45",
  isbn="978-3-030-22734-0"
}

ttv's People

Contributors

bassoy, hrhee

ttv's Issues

Tests linking fails with multiple undefined symbols

I get this failure at linking (tried gcc12 and gcc11, same result):

/opt/local/bin/g++-mp-11 -Wextra -Wall -Wpedantic -Ofast -std=c++17 -pthread -fopenmp  build/gtest_tlib_layout.o build/gtest_tlib_mtv.o build/gtest_tlib_shape.o build/gtest_tlib_strides.o build/gtest_tlib_ttv.o build/gtest_tlib_workload.o build/main.o -lgtest -lpthread -lgomp -lpthread -lm -lopenblas -o bin/main
Undefined symbols:
  "__ZN7testing8internal30GetBoolAssertionFailureMessageB5cxx11ERKNS_15AssertionResultEPKcS5_S5_", referenced from:
      __ZN30LayoutTest_inverse_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN30LayoutTest_inverse_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN30LayoutTest_inverse_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN30LayoutTest_inverse_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN28LayoutTest_inverse_mode_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN28LayoutTest_inverse_mode_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN29LayoutTest_output_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN29LayoutTest_output_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN29LayoutTest_output_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_4_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_1_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_1_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_1_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_1_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_1_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_1_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_2_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_2_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_2_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_2_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_3_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_3_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_3_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN32LayoutTest_generate_3_order_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN31LayoutTest_is_valid_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN31LayoutTest_is_valid_layout_Test8TestBodyEv in gtest_tlib_layout.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN23ShapeTest_is_valid_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_tensor_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_tensor_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_scalar_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_scalar_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_vector_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_vector_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_matrix_Test8TestBodyEv in gtest_tlib_shape.o
      __ZN24ShapeTest_is_matrix_Test8TestBodyEv in gtest_tlib_shape.o
      __ZZN36ShapeTest_generate_output_shape_Test8TestBodyEvENKUlRKT_RKT0_jE_clISt6vectorIS8_IjSaIjEESaISA_EESC_EEDaS2_S5_j.constprop.0 in gtest_tlib_shape.o
      __ZZN36ShapeTest_generate_output_shape_Test8TestBodyEvENKUlRKT_RKT0_jE_clISt6vectorIS8_IjSaIjEESaISA_EESC_EEDaS2_S5_j.constprop.0 in gtest_tlib_shape.o
      __ZZN28StridesTest_TensorShape_Test8TestBodyEvENKUlmRKT_RKT0_RKT1_RKT2_E_clISt6vectorImSaImEESG_SG_SG_EEDamS2_S5_S8_SB_.constprop.0 in gtest_tlib_strides.o
      __ZZN28StridesTest_TensorShape_Test8TestBodyEvENKUlmRKT_RKT0_RKT1_RKT2_E_clISt6vectorImSaImEESG_SG_SG_EEDamS2_S5_S8_SB_.constprop.0 in gtest_tlib_strides.o
      __ZZN28StridesTest_TensorShape_Test8TestBodyEvENKUlmRKT_RKT0_RKT1_RKT2_E_clISt6vectorImSaImEESG_SG_SG_EEDamS2_S5_S8_SB_.constprop.0 in gtest_tlib_strides.o
      __ZN28StridesTest_ScalarShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_VectorShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_VectorShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_VectorShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_MatrixShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_MatrixShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_MatrixShape_Test8TestBodyEv in gtest_tlib_strides.o
      __ZN28StridesTest_MatrixShape_Test8TestBodyEv in gtest_tlib_strides.o
  "__ZN7testing8internal20StringStreamToStringEPNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEE", referenced from:
      __ZN7testing8internal24CmpHelperFloatingPointEQIfEENS_15AssertionResultEPKcS4_T_S5_ in gtest_tlib_ttv.o
      __ZN7testing8internal24CmpHelperFloatingPointEQIfEENS_15AssertionResultEPKcS4_T_S5_ in gtest_tlib_ttv.o
  "__ZN7testing8internal9EqFailureEPKcS2_RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESA_b", referenced from:
      __ZN7testing8internal18CmpHelperEQFailureIdmEENS_15AssertionResultEPKcS4_RKT_RKT0_ in gtest_tlib_mtv.o
      __ZN7testing8internal18CmpHelperEQFailureIfmEENS_15AssertionResultEPKcS4_RKT_RKT0_ in gtest_tlib_mtv.o
      __ZN7testing8internal11CmpHelperEQIjmEENS_15AssertionResultEPKcS4_RKT_RKT0_.constprop.0 in gtest_tlib_strides.o
      __ZN7testing8internal18CmpHelperEQFailureImmEENS_15AssertionResultEPKcS4_RKT_RKT0_ in gtest_tlib_strides.o
      __ZN7testing8internal18CmpHelperEQFailureImjEENS_15AssertionResultEPKcS4_RKT_RKT0_ in gtest_tlib_strides.o
      __ZN7testing8internal24CmpHelperFloatingPointEQIfEENS_15AssertionResultEPKcS4_T_S5_ in gtest_tlib_ttv.o
      __ZN7testing8internal18CmpHelperEQFailureIjjEENS_15AssertionResultEPKcS4_RKT_RKT0_ in gtest_tlib_workload.o
ld: symbol(s) not found
collect2: error: ld returned 1 exit status
make: *** [bin/main] Error 1

What am I missing? gtest and OpenBLAS are installed and work fine otherwise.

MatrixTimesVector tests fail on macOS ppc (all others pass)

[==========] 31 tests from 5 test suites ran. (95552 ms total)
[  PASSED  ] 28 tests.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] MatrixTimesVector.Gemv
[  FAILED  ] MatrixTimesVector.GemvParallel
[  FAILED  ] MatrixTimesVector.GemvBLAS

 3 FAILED TESTS
