oneapi-src / onedpl

oneAPI DPC++ Library (oneDPL) https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-library.html

License: Apache License 2.0

C++ 97.45% Shell 0.04% Groovy 1.10% CMake 1.13% Makefile 0.07% NASL 0.07% Batchfile 0.02% BitBake 0.12%
oneapi

onedpl's Introduction

oneDPL is part of oneAPI

oneAPI DPC++ Library (oneDPL)

oneAPI DPC++ Library (oneDPL) works with the Intel® oneAPI DPC++/C++ Compiler to provide high-productivity APIs to developers, which can minimize Data Parallel C++ (DPC++) programming efforts across devices for high performance parallel applications.

Prerequisites

Install the Intel® oneAPI Base Toolkit (Base Kit) to use oneDPL. Refer to the specific system requirements for more information.

Release Information

Visit the latest Release Notes.

License

oneDPL is licensed under Apache License Version 2.0 with LLVM exceptions. Refer to the LICENSE file for the full license text and copyright notice.

Security

See the Intel Security Center for information on how to report a potential security issue or vulnerability. You can also view the Security Policy.

Contributing

See CONTRIBUTING.md for details.

Documentation

See the full documentation set for oneDPL.

Samples

You can find oneDPL samples at the oneDPL Samples page.

Support and Contribution

Please report issues and suggestions via GitHub issues.


Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.

* Other names and brands may be claimed as the property of others.

onedpl's People

Contributors

adamfidel, aidanbeltons, akukanov, alexandr-konovalov, alexveprev, andreyfe1, anuyawelling2801, danhoeflinger, dcbenito, dmitriy-sobolev, doyleli, dparanic, haonanya, jinge90, joeatodd, julianmi, kboyarinov, kcenia4010, kseniiabakina, leeks-int, mikedvorskiy, mmichel11, paveldyakov, rarutyun, reble, sergeykopienko, tbbdev, timmiesmith, valentinakats, zheltovs

onedpl's Issues

SIMD Support Detection Broken for Clang

Or is this intended?
In include/pstl/internal/pstl_config, adding the following to the Enable SIMD condition

(__PSTL_CLANG_VERSION>39000)

results in at least compilable code on my macOS with LLVM 7 built from source.
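
The proposed condition can be sketched as a stand-alone version check. This is only an illustration: it assumes the Clang version is encoded as major * 10000 + minor * 100 + patchlevel, under which a threshold of 39000 admits Clang 4.0 and later. The DEMO_ macro names and demo_simd_enabled are hypothetical and are not part of the library.

```cpp
// Hypothetical stand-alone sketch; the DEMO_ names are not part of PSTL.
#if defined(__clang__)
// Assumed encoding: major * 10000 + minor * 100 + patchlevel.
#    define DEMO_CLANG_VERSION (__clang_major__ * 10000 + __clang_minor__ * 100 + __clang_patchlevel__)
#else
// Not Clang: this sketch leaves the Clang-specific SIMD path disabled.
#    define DEMO_CLANG_VERSION 0
#endif

// Enable the SIMD path only when the compiler is a new enough Clang.
#if DEMO_CLANG_VERSION > 39000
#    define DEMO_ENABLE_SIMD 1
#else
#    define DEMO_ENABLE_SIMD 0
#endif

// Lets ordinary code query the decision made by the preprocessor.
constexpr bool demo_simd_enabled() { return DEMO_ENABLE_SIMD != 0; }
```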

Integrate async API changes into heterogeneous backend

The heterogeneous backend supports asynchronous behavior by design. The goal is to merge the changes made to the asynchronous backend of the experimental async API feature into the generic heterogeneous backend, so that they can be reused.

Remove async API dpcpp backend.

Integrate the changes for the asynchronous pattern from the experimental feature into the dpcpp backend. The goal is to remove code duplication; the overhead this adds to the blocking algorithms has to be analyzed carefully.

OneDPL headers inside extern "C++" wrapper

Can someone please provide any pointers on this build issue?

The C++ source file (compilation unit), compiled with the dpcpp compiler, has #define _GLIBCXX_USE_TBB_PAR_BACKEND 0 as its first line. This is the GCC 10 workaround for the limitations between oneTBB and oneDPL described in the release notes, and it resolved the oneTBB (task) build errors documented there.

The error below was generated by an application with a C++ source file that includes 3 headers. Of these, 2 headers contain only standard ISO C++ headers, and the third has a mixture of oneDPL, SYCL and some more C++ headers.

I wasn't able to reproduce this build issue with an isolated test case that uses oneDPL and C++ headers and defines the macro above to 0.

In file included from /soft/......./sdk/2021.04.30.001/oneapi/dpl/latest/linux/include/oneapi/dpl/execution:32:
In file included from /soft/packaging/spack-builds/linux-opensuse_leap15-x86_64/gcc-10.2.0/gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/execution:38:
/soft/packaging/spack-builds/linux-opensuse_leap15-x86_64/gcc-10.2.0/gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/pstl/glue_algorithm_impl.h:30:12: error: use of undeclared identifier '__internal'; did you mean '__pstl::__internal'?
    return __internal::__pattern_any_of(
           ^
/soft/packaging/spack-builds/linux-opensuse_leap15-x86_64/gcc-10.2.0/gcc-10.2.0-yudlyezca7twgd5o3wkkraur7wdbngdn/lib/gcc/x86_64-pc-linux-gnu/10.2.0/../../../../include/c++/10.2.0/pstl/numeric_fwd.h:18:11: note: '__pstl::__internal' declared here
namespace __internal
          ^           

GCC: gcc (Spack GCC) 10.2.0
DPCPP: Intel(R) oneAPI DPC++/C++ Compiler 2021.2.0 (2021.x.0.20210323)

[PSTL] Severe performance degradation in nth_element when input contains large number of duplicates

Hi,

I hope this is the right place to file issues for Intel's PSTL.

Background:

  • We have a relatively large dataset (100 million rows) for ML that we would like to split by timestamp (int64) to 70%:30%.
  • For some reason the precision of those timestamps is only up to minutes, which results in a large number of duplicated values in the dataset.
  • The execution time of std::nth_element(pstl::execution::par, begin, nth, end) jumped by more than 10x compared to other datasets that have few or no duplicate values.

I ran a test with 30 million int64 values and varying numbers of duplicated values at the nth position (code attached at the end). I got the following results (dual socket Xeon Gold 6132, 14C/28T each, gcc 9.3 with -O3):

sequential   unique (seconds): 0.315 0.319 0.315 0.315 0.317  (avg = 0.316)
parallel     unique (seconds): 0.212 0.190 0.200 0.182 0.186  (avg = 0.194)
sequential   1k dup (seconds): 0.329 0.326 0.327 0.329 0.327  (avg = 0.328)
parallel     1k dup (seconds): 0.249 0.263 0.223 0.230 0.219  (avg = 0.237)
sequential  10k dup (seconds): 0.394 0.390 0.390 0.390 0.390  (avg = 0.391)
parallel    10k dup (seconds): 0.406 0.362 0.374 0.349 0.363  (avg = 0.371)
sequential 100k dup (seconds): 0.292 0.293 0.291 0.291 0.292  (avg = 0.292)
parallel   100k dup (seconds): 2.103 2.090 2.089 2.066 2.101  (avg = 2.090)
sequential   1m dup (seconds): 0.323 0.322 0.323 0.322 0.322  (avg = 0.322)
parallel     1m dup (seconds): 28.456 28.364 28.188 28.215 28.377  (avg = 28.320)

I made a simple change to this line to skip duplicate values at the pivot point:

if (!__comp(*__nth, *__x) && !__comp(*__x, *__nth))

-            // if *x == *nth then we can start new partition with x+1
-            if (!__comp(*__nth, *__x) && !__comp(*__x, *__nth))
+            // if *x == *nth then we start the new partition at the next index where *x != *nth
+            while (!__comp(*__nth, *__x) && !__comp(*__x, *__nth) && __x - __nth < 0)
             {
                 ++__x;
             }
-            else
-            {
-                iter_swap(__nth, __x);
-            }
+
+            iter_swap(__nth, __x);
             __first = __x;

Then the execution time goes back to normal:

sequential   unique (seconds): 0.335 0.335 0.334 0.333 0.333  (avg = 0.334)
parallel     unique (seconds): 0.185 0.171 0.186 0.188 0.147  (avg = 0.175)
sequential   1k dup (seconds): 0.358 0.363 0.358 0.356 0.361  (avg = 0.359)
parallel     1k dup (seconds): 0.243 0.203 0.207 0.231 0.228  (avg = 0.223)
sequential  10k dup (seconds): 0.415 0.413 0.414 0.414 0.415  (avg = 0.414)
parallel    10k dup (seconds): 0.194 0.192 0.199 0.190 0.181  (avg = 0.191)
sequential 100k dup (seconds): 0.312 0.311 0.308 0.310 0.311  (avg = 0.311)
parallel   100k dup (seconds): 0.249 0.221 0.219 0.209 0.258  (avg = 0.231)
sequential   1m dup (seconds): 0.333 0.334 0.334 0.332 0.334  (avg = 0.333)
parallel     1m dup (seconds): 0.206 0.266 0.214 0.199 0.168  (avg = 0.211)

I'm not 100% sure this is the best solution in a parallel context (gcc's libstdc++ falls back to heap-select when quick-select is not progressing quickly enough). But at least it solves our problem (or perhaps we should just stick to the non-parallel version).

I hope this can be fixed in future releases. Thanks.
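
For comparison, a common way to make quick-select robust against duplicates is a three-way partition, which groups every element equal to the pivot in a single pass. The sketch below is a sequential illustration of that alternative technique, not the PSTL implementation and not the patch above; select_nth_three_way is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sequential quick-select with a three-way partition.
// After the call, v[n] holds the value it would have in sorted order,
// elements before it are <= v[n] and elements after it are >= v[n].
void select_nth_three_way(std::vector<std::int64_t>& v, std::size_t n) {
    assert(n < v.size());
    std::size_t lo = 0, hi = v.size();  // active range is [lo, hi)
    while (hi - lo > 1) {
        const std::int64_t pivot = v[lo + (hi - lo) / 2];
        // Partition [lo, hi) into:  < pivot | == pivot | > pivot
        std::size_t lt = lo, i = lo, gt = hi;
        while (i < gt) {
            if (v[i] < pivot) {
                std::swap(v[lt++], v[i++]);
            } else if (pivot < v[i]) {
                std::swap(v[i], v[--gt]);
            } else {
                ++i;  // equal to the pivot: leave it in the middle block
            }
        }
        if (n < lt) {
            hi = lt;   // target lies among the smaller elements
        } else if (n >= gt) {
            lo = gt;   // target lies among the larger elements
        } else {
            return;    // target lies inside the run equal to the pivot
        }
    }
}
```

Because the whole run of values equal to the pivot is excluded in one step, a large block of duplicates at the nth position ends the search immediately instead of shrinking the range one element at a time.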

===
Code for Performance Test:

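#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <iostream>
#include <random>
#include <string>
#include <vector>

#include <pstl/algorithm>
#include <pstl/execution>

// Builds total_size distinct values, overwrites num_dup of them around
// position nth with a single repeated value, then shuffles the result.
std::vector<int64_t> gen_test_data(size_t total_size, size_t num_dup, size_t nth) {
    std::vector<int64_t> result(total_size, 0);
    for (size_t i = 0; i < total_size; ++i) {
        result[i] = i;
    }

    size_t dup_start = nth > (num_dup / 2) ? nth - num_dup / 2 : 0;
    for (size_t i = 0; i < num_dup; ++i) {
        result[dup_start + i] = result[dup_start];
    }
    // std::random_shuffle was removed in C++17; std::shuffle with a
    // fixed-seed engine keeps the input reproducible between runs.
    std::mt19937_64 rng(42);
    std::shuffle(result.begin(), result.end(), rng);
    return result;
}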

void run_and_measure(const std::string &desc,
        const std::vector<int64_t> &test_data,
        std::function<void(std::vector<int64_t>)> func) {
    using clock_t = std::chrono::high_resolution_clock;
    using sec_t = std::chrono::duration<double, std::ratio<1>>;
    std::cout << desc << " (seconds): " << std::flush;
    double total_duration = 0;
    for (size_t i = 0; i < 5; ++i) {
        auto t_start = clock_t::now();
        func(test_data);
        auto t_end = clock_t::now();
        auto duration = std::chrono::duration_cast<sec_t>(t_end-t_start).count();

        printf("%.3f ", duration);
        std::cout << std::flush;
        total_duration += duration;
    }
    printf(" (avg = %.3f)\n", (total_duration / 5));
}

void test_nth_element() {
    const size_t size = 30000000;  // 30 million

    std::vector<size_t> dup_counts{0, 1000, 10000, 100000, 1000000};
    std::vector<std::string> desc {"  unique", "  1k dup", " 10k dup", "100k dup", "  1m dup"};
    for (size_t i = 0; i < dup_counts.size(); ++i) {
        auto test_data = gen_test_data(size, dup_counts[i], 10000000);
        run_and_measure("sequential " + desc[i], test_data, [](std::vector<int64_t> data){
            std::nth_element(data.begin(), data.begin() + 10000000, data.end());
        });
        run_and_measure("parallel   " + desc[i], test_data, [](std::vector<int64_t> data){
            std::nth_element(pstl::execution::par, data.begin(), data.begin() + 10000000, data.end());
        });
    }
}

int main(int, char **) {
    test_nth_element();
}

Can't run demo code with UHD630 cause ERROR InvalidBuiltinSetName: Expects OpenCL12, OpenCL20.

I can't run the demo code on a UHD 630.
It throws the error InvalidBuiltinSetName: Expects OpenCL12, OpenCL20. Actual is OpenCL.DebugInfo.100

Intel(R) FPGA Emulation Platform for OpenCL(TM)
Device: Intel(R) FPGA Emulation Device
Intel(R) OpenCL
Device: Intel(R) Core(TM) i5-9400 CPU @ 2.90GHz
Intel(R) OpenCL HD Graphics
Device: Intel(R) UHD Graphics 630
SYCL host platform
Device: SYCL host device
Running on device: Intel(R) UHD Graphics 630
driver_version: 27.20.100.8681
max_computer_units: 23
local_computer_units: 65536
Vector size: 10000
InvalidBuiltinSetName: Expects OpenCL12, OpenCL20. Actual is OpenCL.DebugInfo.100 [Src: D:\qb\workspace\21461\source\gfx-driver\Source\IGC\AdaptorOCL\SPIRV\libSPIRV\SPIRVModule.cpp:565 SPIRVBuiltinSetNameMap::rfind(BuiltinSetName, &BuiltinSet) ]

internal compiler error, abnormal program termination

counting_iterator is not default constructible (ForwardIterator requirement)

counting_iterator is very useful when you want to access several containers during parallel execution.

The iterator (based on TBB) is modelled as a random access iterator, but it is missing a default constructor, which is required by ForwardIterator (the requirement set that random access iterators build on):
https://en.cppreference.com/w/cpp/named_req/ForwardIterator

Would it be easy to fix this and provide a default constructor?

It looks like ZipIterator has the same issue.

I noticed this when I tried to use counting_iterator with the MSVC parallel STL implementation, as MSVC needs to default construct an iterator.
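
As an illustration of the request, here is a minimal sketch of a counting iterator that is default constructible. It is a hypothetical stand-alone class (counting_iter), not the TBB or oneDPL implementation, and it returns the counter by value when dereferenced, which is an assumption about the design rather than something stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical minimal counting iterator; not the TBB or oneDPL class.
class counting_iter {
public:
    using iterator_category = std::random_access_iterator_tag;
    using value_type = std::ptrdiff_t;
    using difference_type = std::ptrdiff_t;
    using pointer = const value_type*;
    using reference = value_type;  // dereferencing yields the counter by value

    counting_iter() = default;  // the member initialiser below sets the counter to 0
    explicit counting_iter(value_type start) : value_(start) {}

    reference operator*() const { return value_; }
    reference operator[](difference_type n) const { return value_ + n; }

    counting_iter& operator++() { ++value_; return *this; }
    counting_iter operator++(int) { counting_iter t = *this; ++value_; return t; }
    counting_iter& operator--() { --value_; return *this; }
    counting_iter operator--(int) { counting_iter t = *this; --value_; return t; }
    counting_iter& operator+=(difference_type n) { value_ += n; return *this; }
    counting_iter& operator-=(difference_type n) { value_ -= n; return *this; }

    friend counting_iter operator+(counting_iter it, difference_type n) { return it += n; }
    friend counting_iter operator+(difference_type n, counting_iter it) { return it += n; }
    friend counting_iter operator-(counting_iter it, difference_type n) { return it -= n; }
    friend difference_type operator-(const counting_iter& a, const counting_iter& b) { return a.value_ - b.value_; }

    friend bool operator==(const counting_iter& a, const counting_iter& b) { return a.value_ == b.value_; }
    friend bool operator!=(const counting_iter& a, const counting_iter& b) { return a.value_ != b.value_; }
    friend bool operator<(const counting_iter& a, const counting_iter& b) { return a.value_ < b.value_; }
    friend bool operator>(const counting_iter& a, const counting_iter& b) { return a.value_ > b.value_; }
    friend bool operator<=(const counting_iter& a, const counting_iter& b) { return a.value_ <= b.value_; }
    friend bool operator>=(const counting_iter& a, const counting_iter& b) { return a.value_ >= b.value_; }

private:
    value_type value_ = 0;
};

// The property this issue asks for.
static_assert(std::is_default_constructible<counting_iter>::value,
              "counting_iter must be default constructible");
```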

CL/sycl.hpp and ::sycl namespace usage

When rebasing my changes to support hipSYCL on the latest oneDPL, I noticed the following issue that I would like to get guidance on w.r.t. how I can best fix this upstream.

oneDPL and all tests include CL/sycl.hpp, but expect that SYCL objects are exposed in ::sycl. This is the case in DPC++, but I believe that this is neither guaranteed nor demanded by SYCL 2020. The relevant section is 4.3 of the SYCL 2020 spec:

SYCL provides one standard header file: <sycl/sycl.hpp>, which needs to be included in every translation unit that uses the SYCL programming API.
All SYCL classes, constants, types and functions defined by this specification should exist within the ::sycl namespace.
For compatibility with SYCL 1.2.1, SYCL provides another standard header file: <CL/sycl.hpp>, which can be included in place of <sycl/sycl.hpp>. In that case, all SYCL classes, constants, types and functions defined by this specification should exist within the ::cl::sycl C++ namespace.

My reading of this is that, when CL/sycl.hpp is used, everything should be exclusively in ::cl::sycl (after all, also exposing ::sycl would unnecessarily pollute the global namespace). As such, hipSYCL only exposes ::sycl when including the new header sycl/sycl.hpp.

This issue is also discussed in the Khronos SYCL WG, see e.g. KhronosGroup/SYCL-CTS#108

Since any change that fixes this potentially affects a lot of files in oneDPL, I'd like some guidance on how this can best be resolved. My ideas are:

  • Does DPC++ already support the sycl/sycl.hpp header? If so, changing all includes to this header would be best, as this is the official SYCL 2020 header.
  • Alternatively, we could special-case all includes so that they include sycl/sycl.hpp when using hipSYCL.
  • Or we could add a using namespace ::cl somewhere when using hipSYCL.
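
A variant of the second idea can be sketched with __has_include, which selects the header by availability rather than by implementation. This is only an illustration under the reading of the spec quoted above: demo_sycl, DEMO_SYCL_HEADER_KIND and demo_sycl_header_kind are hypothetical names, and the sketch deliberately compiles to a "no SYCL" configuration when neither header is present.

```cpp
// Sketch: pick the SYCL header by availability and expose one namespace alias.
// demo_sycl and DEMO_SYCL_HEADER_KIND are hypothetical names.
#if defined(__has_include)
#    if __has_include(<sycl/sycl.hpp>)
#        include <sycl/sycl.hpp>
// SYCL 2020 header: names live in ::sycl.
namespace demo_sycl = ::sycl;
#        define DEMO_SYCL_HEADER_KIND 2020
#    elif __has_include(<CL/sycl.hpp>)
#        include <CL/sycl.hpp>
// SYCL 1.2.1 compatibility header: names live in ::cl::sycl.
namespace demo_sycl = ::cl::sycl;
#        define DEMO_SYCL_HEADER_KIND 121
#    endif
#endif

#ifndef DEMO_SYCL_HEADER_KIND
// Neither header was found; no alias is defined in this configuration.
#    define DEMO_SYCL_HEADER_KIND 0
#endif

// Reports which header was selected: 2020, 121, or 0 for none.
constexpr int demo_sycl_header_kind() { return DEMO_SYCL_HEADER_KIND; }
```

Code written against the alias (demo_sycl::queue and so on) would then not depend on whether ::sycl happens to be visible after including CL/sycl.hpp.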

std::transform fails to apply to all elements

The following minimal reproducer fails randomly when using the par and par_unseq execution policies, but passes when using seq or the default transform without an execution policy.

#include <algorithm>
#include <execution>
#include <vector>

#include <cassert>

int main(int argc, char **argv) {
  constexpr int size = 1024;
  std::vector<double> v(size);
  for (int i = 0; i < size; i++) {
    const double value = static_cast<double>(i);
    v[i] = value * value + 5.0 * value;
  }

  constexpr double min_v = 1000000.0;
  std::vector<bool> ge_v(size);
  for (int pass = 0; pass < 10; pass++) {
    std::transform(std::execution::par, v.begin(), v.end(), ge_v.begin(),
                   [](const double v) { return v >= min_v; });
    for (int i = 0; i < v.size(); i++) {
      assert((v[i] >= min_v) == ge_v[i]);
    }
  }
  return 0;
}

This is with the version of the parallel STL included with GCC 9.1. My GCC version is 9.1.1 20190503 (Red Hat 9.1.1-1). My version of TBB is tbb-devel-2019.5-1.fc30.x86_64, which I believe is the latest release, 2019 Update 5.
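
One possible explanation, offered as an assumption rather than a confirmed diagnosis: std::vector<bool> may pack several elements into one storage word, and the standard's guarantee that different elements of a container can be modified concurrently without a data race explicitly excludes vector<bool>. Under a parallel policy, two threads could therefore update neighbouring flags that share a word. The sketch below stores one byte per flag instead. flags_at_least is a hypothetical helper, and it calls the serial std::transform so that it builds without a parallel backend; passing std::execution::par as the first argument would be the parallel form.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

// std::vector<bool> hands out proxy objects rather than bool&, which is
// what allows an implementation to pack several flags into one word.
static_assert(!std::is_same<std::vector<bool>::reference, bool&>::value,
              "vector<bool> elements are accessed through a proxy");

// Hypothetical helper: one unsigned char per flag, so every output
// element is a distinct memory location.
std::vector<unsigned char> flags_at_least(const std::vector<double>& values, double threshold) {
    std::vector<unsigned char> flags(values.size());
    // Serial call; an execution policy could be passed as the first argument.
    std::transform(values.begin(), values.end(), flags.begin(),
                   [threshold](double x) -> unsigned char { return x >= threshold ? 1 : 0; });
    return flags;
}
```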

Homebrew support

TBB has Homebrew support: tbb.rb.

Because PSTL is just headers, the formula should be extremely simple, as it just needs to grab the headers and register TBB as a dependency.

I'll try to work on this but it would be great if someone else did it first.

Update to TBB 2018 U2

I managed to use parallelstl (with gcc 7.2.1 on Linux) only by reverting to TBB 2018 (with Update 1 and Update 2, linking against tbb failed). Would it be possible to have an updated version in the near future?

Thanks

handle sycl event

Hi, I'd like to know how to handle dependencies when using kernels from the oneDPL library.

For example, suppose three kernels should be executed in order:

kernel1 -> stable_sort -> kernel2

How can I express this dependency chain using SYCL events?

Many thanks :)

Instruction to build and run tests, benchmark

Hello,

I want to test this repo as I plan to use the C++17 parallel algorithms (Parallelism TS) in my library.

My system has g++-9 on Ubuntu, which should be a supported configuration.

When I clone this repo, cmake runs fine and finds TBB 2019 (which I installed).
Running make then produces nothing at all.

I think the cmake target all should compile all the required code. Or am I doing something wrong?
Thanks for any help

demo examples

Could you please add code examples for the extension APIs (e.g. the segmented scan and reduce)?
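
As an illustration of what a segmented reduction computes, here is a sequential reference in plain standard C++. It is not the oneDPL extension API: reduce_by_segment_ref and its signature are hypothetical, and the sketch assumes that a segment is a run of consecutive equal keys and that the values of each segment are summed.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sequential reference: one (key, sum) pair per run of consecutive equal keys.
std::vector<std::pair<int, long>> reduce_by_segment_ref(const std::vector<int>& keys,
                                                        const std::vector<long>& values) {
    assert(keys.size() == values.size());
    std::vector<std::pair<int, long>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!out.empty() && out.back().first == keys[i]) {
            out.back().second += values[i];        // same segment: accumulate
        } else {
            out.emplace_back(keys[i], values[i]);  // key changed: open a new segment
        }
    }
    return out;
}
```

Under these assumptions, keys 1 1 2 2 2 1 with values 1 2 3 4 5 6 produce three segments: (1, 3), (2, 12) and (1, 6). The trailing key 1 starts a new segment because it is not adjacent to the first run.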

Thank you

Add transform_output_iterator

oneDPL has a variety of useful fancy iterators and views (transform, counting, iota, zip, permutation ...), but is missing an important type of iterator that can be used to fuse consecutive kernel calls and avoid intermediate storage:

transform_iterator operates like memory -> transformation -> kernel -> memory,
transform_output_iterator would operate like memory -> kernel -> transformation -> memory

A comparable iterator type would be Thrust's transform_output_iterator and transform_input_output_iterator wrappers.
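
To make the request concrete, here is a minimal host-side sketch of such an iterator in standard C++. It is a hypothetical illustration (transform_output_iter and make_transform_output are made-up names), not the Thrust class and not a proposed oneDPL interface: writing through the iterator applies the function first and then stores the result in the wrapped output iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

// Hypothetical sketch; not the oneDPL or Thrust class.
template <typename OutputIt, typename UnaryOp>
class transform_output_iter {
    // Proxy returned by operator*: assignment applies op, then stores.
    struct proxy {
        OutputIt& out;
        UnaryOp& op;
        template <typename T>
        proxy& operator=(T&& value) {
            *out = op(std::forward<T>(value));
            return *this;
        }
    };

public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    transform_output_iter(OutputIt out, UnaryOp op) : out_(out), op_(op) {}

    proxy operator*() { return proxy{out_, op_}; }
    transform_output_iter& operator++() { ++out_; return *this; }
    transform_output_iter operator++(int) { transform_output_iter t = *this; ++out_; return t; }

private:
    OutputIt out_;  // where transformed values are written
    UnaryOp op_;    // applied to every value before it is stored
};

// Convenience factory that deduces the template arguments.
template <typename OutputIt, typename UnaryOp>
transform_output_iter<OutputIt, UnaryOp> make_transform_output(OutputIt out, UnaryOp op) {
    return transform_output_iter<OutputIt, UnaryOp>(out, op);
}
```

With this, std::copy(src.begin(), src.end(), make_transform_output(dst.begin(), f)) stores f(x) for every x in src without an intermediate buffer, which is the memory -> kernel -> transformation -> memory order described above.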

[CMake] Incorrect TBB version detected by FindTBB.cmake

Regardless of the version and of where the TBB package comes from (the latest version via GetTBB.cmake, Intel's or aligned channels), for example:

$ conda list | grep tbb
tbb                       2019.6               hc9558a2_0    conda-forge
tbb-devel                 2019.6               hc9558a2_0    conda-forge

FindTBB.cmake still reports 2019.0 instead:
-- Found TBB: TBB::tbb;TBB::tbbmalloc (found version "2019.0") found components: tbb tbbmalloc

It also does not report the location or directory where TBB was found, as other packages do, e.g.:

-- Found Brotli: /local/miniconda3/envs/arrow/lib/libbrotlicommon.so
-- Checking for module 'libglog'
--   Found libglog, version 0.3.5
-- Found GLOG: /local/miniconda3/envs/arrow/include
-- Building (vendored) jemalloc from source
-- Found GTest: /local/miniconda3/envs/arrow/lib/libgtest.so
-- RapidJSON found. Headers: /local/miniconda3/envs/arrow/include
-- Found ZLIB: /local/miniconda3/envs/arrow/lib/libz.so (found version "1.2.11")
-- Checking for module 'liblz4'
--   Found liblz4, version 1.8.1
-- Found Lz4: /local/miniconda3/envs/arrow/lib/liblz4.so
-- Checking for module 'libzstd'
--   Found libzstd, version 1.3.3
-- Found ZSTD: /local/miniconda3/envs/arrow/lib/libzstd.so
-- Boost version: 1.68.0
-- Found the following Boost libraries:
--   regex
--   system
--   filesystem
-- Boost include dir: /local/miniconda3/envs/arrow/include

Integrate oneDPL extension tests into test framework

The tests for the by_segment and multi-value binary search algorithms need to be refactored to use the test framework used by the rest of the pstl algorithms. Specific issues to be addressed include:

  • Updating USM tests to use std::vector<usm::alloc> instead of raw allocations.
  • Using invoke_on_all_hetero_policies and the approach from test1buffer, test2buffers, etc. as is done for other tests

FindTBB.cmake should check for TBB_FOUND

I saw that you have recently added FindTBB.cmake which is used by CMakeLists.txt. I think you should put all contents from FindTBB.cmake in a

if(NOT TBB_FOUND)
...
endif()

At least that is what is done in most other FindXYZ.cmake files.
In my case find_library(TBB) is called somewhere else, which produces errors when FindTBB.cmake tries to add the targets a second time.

Cannot build with CMake

Any ideas? Just trying to build a simple CMake project.

cmake_minimum_required(VERSION 3.2.2)
project(Tutorial)

set(CMAKE_CXX_STANDARD 11)

include_directories("${PROJECT_BINARY_DIR}")

include_directories("/opt/intel/tbb/include")
include_directories("/opt/intel/pstl/include")

set(common_libraries stdc++ pthread)
add_executable(main ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)

# Input: add an additional dependency: tbb_debug.lib or tbb.lib ?

link_directories(/opt/intel/compilers_and_libraries_2018.1.163/linux/tbb/lib/intel64/gcc4.7)
target_link_libraries(main PRIVATE ${common_libraries})

#include <iostream>

#include "pstl/execution"
#include "pstl/algorithm"

int main() {
}

dom@dom:~/gu/Project/playground/intel/build$ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dom/gu/Project/playground/intel/build
dom@dom:~/gu/Project/playground/intel/build$ make
Scanning dependencies of target main
[ 50%] Building CXX object CMakeFiles/main.dir/main.cpp.o
[100%] Linking CXX executable main
CMakeFiles/main.dir/main.cpp.o: In function `__icp_algorithm::__PSTL_get_workers_num()':
main.cpp:(.text+0x5): undefined reference to `tbb::internal::tbb_thread_v3::hardware_concurrency()'
collect2: error: ld returned 1 exit status
CMakeFiles/main.dir/build.make:94: recipe for target 'main' failed
make[2]: *** [main] Error 1
CMakeFiles/Makefile2:67: recipe for target 'CMakeFiles/main.dir/all' failed
make[1]: *** [CMakeFiles/main.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

CMake config cannot be used with macOS Homebrew

This is a problem with some similarities to #16; see discussion on Homebrew/homebrew-core#53751.

The gist is that CMake doesn't install stdlib and that Homebrew (rightly) doesn't want to link the directory into /usr/local. /usr/local/lib/cmake/ParallelSTL is a symbolic link to the true install directory /usr/local/Cellar/parallelstl/<version>/lib/cmake/ParallelSTL, but ParallelSTLTargets.cmake doesn't recognize this, so tries to find stdlib in /usr/local rather than in /usr/local/Cellar/parallelstl/<version>.

I'm not sure what the proper solution on your end is, but one way might be to install stdlib into ${CMAKE_INSTALL_PREFIX}/include/pstl and adjust the INSTALL_INTERFACE to accommodate:

-    $<INSTALL_INTERFACE:stdlib>)
+    $<INSTALL_INTERFACE:include/pstl/stdlib>)
...
+install(DIRECTORY stdlib
+        DESTINATION include/pstl)

article request

Could you write an article about this project on the Intel blog at habrahabr.ru?

Namespace conflict or something missing

Hi there,
I installed the Intel® oneAPI Base Toolkit for Linux* (Beta) through the Linux local installer. When I tried to build a oneDPL example (dpc_reduce) from oneapi-cli, the build failed. The debug output is below:

kim@kim-NUC10i7FNK:~/dpc_reduce/build$ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kim/dpc_reduce/build
kim@kim-NUC10i7FNK:~/dpc_reduce/build$ make
[ 50%] Building CXX object CMakeFiles/dpc_reduce.dir/src/main.cpp.o
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
In file included from /opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:20:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/pstl/pstl_config.h:23:9: warning: '_PSTL_VERSION' macro redefined [-Wmacro-redefined]
#define _PSTL_VERSION 10000
        ^
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/pstl/pstl_config.h:14:9: note: previous definition is here
#define _PSTL_VERSION 9000
        ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
In file included from /opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:20:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/pstl/pstl_config.h:135:9: warning: '_PSTL_CPP17_EXECUTION_POLICIES_PRESENT' macro redefined [-Wmacro-redefined]
#define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT                                                                         \
        ^
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/pstl/pstl_config.h:84:9: note: previous definition is here
#define _PSTL_CPP17_EXECUTION_POLICIES_PRESENT (_MSC_VER >= 1912)
        ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
In file included from /opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:20:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/pstl/pstl_config.h:157:9: warning: '_PSTL_UDS_PRESENT' macro redefined [-Wmacro-redefined]
#define _PSTL_UDS_PRESENT (__INTEL_COMPILER >= 1900 && __INTEL_COMPILER_BUILD_DATE >= 20180626)
        ^
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/pstl/pstl_config.h:106:13: note: previous definition is here
#    define _PSTL_UDS_PRESENT 0
            ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
In file included from /opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:20:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/pstl/pstl_config.h:180:9: warning: '_PSTL_PRAGMA_DECLARE_REDUCTION' macro redefined [-Wmacro-redefined]
#define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME, OP)                                                                       \
        ^
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/pstl/pstl_config.h:130:9: note: previous definition is here
#define _PSTL_PRAGMA_DECLARE_REDUCTION(NAME, OP)                                                                       \
        ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:60:20: error: no member named 'any_of' in namespace 'oneapi::dpl'
using oneapi::dpl::any_of;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:61:20: error: no member named 'all_of' in namespace 'oneapi::dpl'
using oneapi::dpl::all_of;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:62:20: error: no member named 'none_of' in namespace 'oneapi::dpl'
using oneapi::dpl::none_of;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:63:20: error: no member named 'for_each' in namespace 'oneapi::dpl'
using oneapi::dpl::for_each;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:64:7: error: no member named 'for_each_n' in namespace 'oneapi::dpl'; did you mean '::std::for_each_n'?
using oneapi::dpl::for_each_n;
      ^~~~~~~~~~~~~~~~~~~~~~~
      ::std::for_each_n
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:3858:5: note: '::std::for_each_n' declared here
    for_each_n(_InputIterator __first, _Size __n, _Function __f)
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:65:7: error: no member named 'find_if' in namespace 'oneapi::dpl'; did you mean '::std::find_if'?
using oneapi::dpl::find_if;
      ^~~~~~~~~~~~~~~~~~~~
      ::std::find_if
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:3919:5: note: '::std::find_if' declared here
    find_if(_InputIterator __first, _InputIterator __last,
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:66:7: error: no member named 'find_if_not' in namespace 'oneapi::dpl'; did you mean '::std::find_if_not'?
using oneapi::dpl::find_if_not;
      ^~~~~~~~~~~~~~~~~~~~~~~~
      ::std::find_if_not
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:505:5: note: '::std::find_if_not' declared here
    find_if_not(_InputIterator __first, _InputIterator __last,
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:67:20: error: no member named 'find' in namespace 'oneapi::dpl'
using oneapi::dpl::find;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:68:7: error: no member named 'find_end' in namespace 'oneapi::dpl'; did you mean '::std::find_end'?
using oneapi::dpl::find_end;
      ^~~~~~~~~~~~~~~~~~~~~
      ::std::find_end
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:367:5: note: '::std::find_end' declared here
    find_end(_ForwardIterator1 __first1, _ForwardIterator1 __last1,
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:69:7: error: no member named 'find_first_of' in namespace 'oneapi::dpl'; did you mean '::std::find_first_of'?
using oneapi::dpl::find_first_of;
      ^~~~~~~~~~~~~~~~~~~~~~~~~~
      ::std::find_first_of
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:3951:5: note: '::std::find_first_of' declared here
    find_first_of(_InputIterator __first1, _InputIterator __last1,
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:70:7: error: no member named 'adjacent_find' in namespace 'oneapi::dpl'; did you mean '::std::adjacent_find'?
using oneapi::dpl::adjacent_find;
      ^~~~~~~~~~~~~~~~~~~~~~~~~~
      ::std::adjacent_find
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:4025:5: note: '::std::adjacent_find' declared here
    adjacent_find(_ForwardIterator __first, _ForwardIterator __last)
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:71:20: error: no member named 'count' in namespace 'oneapi::dpl'
using oneapi::dpl::count;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:72:7: error: no member named 'count_if' in namespace 'oneapi::dpl'; did you mean '::std::count_if'?
using oneapi::dpl::count_if;
      ^~~~~~~~~~~~~~~~~~~~~
      ::std::count_if
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:4101:5: note: '::std::count_if' declared here
    count_if(_InputIterator __first, _InputIterator __last, _Predicate __pred)
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:73:7: error: no member named 'search' in namespace 'oneapi::dpl'; did you mean '::bsearch'?
using oneapi::dpl::search;
      ^~~~~~~~~~~~~~~~~~~
      ::bsearch
/usr/include/x86_64-linux-gnu/bits/stdlib-bsearch.h:20:1: note: '::bsearch' declared here
bsearch (const void *__key, const void *__base, size_t __nmemb, size_t __size,
^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:74:7: error: no member named 'search_n' in namespace 'oneapi::dpl'; did you mean '::std::search_n'?
using oneapi::dpl::search_n;
      ^~~~~~~~~~~~~~~~~~~~~
      ::std::search_n
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:4218:5: note: '::std::search_n' declared here
    search_n(_ForwardIterator __first, _ForwardIterator __last,
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:75:20: error: no member named 'copy' in namespace 'oneapi::dpl'
using oneapi::dpl::copy;
      ~~~~~~~~~~~~~^
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:76:7: error: no member named 'copy_n' in namespace 'oneapi::dpl'; did you mean '::std::copy_n'?
using oneapi::dpl::copy_n;
      ^~~~~~~~~~~~~~~~~~~
      ::std::copy_n
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:768:5: note: '::std::copy_n' declared here
    copy_n(_InputIterator __first, _Size __n, _OutputIterator __result)
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:77:7: error: no member named 'copy_if' in namespace 'oneapi::dpl'; did you mean '::std::copy_if'?
using oneapi::dpl::copy_if;
      ^~~~~~~~~~~~~~~~~~~~
      ::std::copy_if
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/stl_algo.h:688:5: note: '::std::copy_if' declared here
    copy_if(_InputIterator __first, _InputIterator __last,
    ^
In file included from /home/kim/dpc_reduce/src/main.cpp:14:
/opt/intel/oneapi/dpl/2021.1-beta10/linux/include/oneapi/dpl/algorithm:78:7: error: no member named 'swap_ranges' in namespace 'oneapi::dpl'; did you mean '::std::swap_ranges'?
using oneapi::dpl::swap_ranges;
      ^~~~~~~~~~~~~~~~~~~~~~~~
      ::std::swap_ranges
/usr/lib/gcc/x86_64-linux-gnu/10/../../../../include/c++/10/bits/algorithmfwd.h:661:5: note: '::std::swap_ranges' declared here
    swap_ranges(_FIter1, _FIter1, _FIter2);
    ^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
4 warnings and 20 errors generated.
CMakeFiles/dpc_reduce.dir/build.make:62: recipe for target 'CMakeFiles/dpc_reduce.dir/src/main.cpp.o' failed
make[2]: *** [CMakeFiles/dpc_reduce.dir/src/main.cpp.o] Error 1
CMakeFiles/Makefile2:99: recipe for target 'CMakeFiles/dpc_reduce.dir/all' failed
make[1]: *** [CMakeFiles/dpc_reduce.dir/all] Error 2
Makefile:83: recipe for target 'all' failed
make: *** [all] Error 2

My environment is Ubuntu 18.04 LTS on an Intel NUC10.

I also tried specifying the include path with the -I option, but the build still failed.

It looks like there is a namespace conflict between oneDPL and gcc's standard library, or perhaps some files are missing or were not picked up correctly during the build.

I look forward to your reply. Thanks very much in advance!

Issues with reduce_by_segment with zip_iterators

Hi,

I was trying to put together a test case that uses dpl::reduce_by_segment with zip_iterators (tuples) and had some difficulty compiling it.

Can someone please comment on whether there is something wrong with the way the test case is set up?

#define PSTL_USE_PARALLEL_POLICIES 0
#define _GLIBCXX_USE_TBB_PAR_BACKEND 0

#include <CL/sycl.hpp>
#include <oneapi/dpl/execution>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/iterator>
#include <oneapi/dpl/functional>

#include <functional>
#include <iostream>
#include <vector>

int main()
{
    sycl::queue q(sycl::gpu_selector{});

    std::vector<int> keys1{11, 11, 21, 20, 21, 21, 21, 37, 37};
    std::vector<int> keys2{11, 11, 20, 20, 20, 21, 21, 37, 37};
    std::vector<int> values{0, 1, 2, 3, 4, 5, 6, 7, 8};
    std::vector<int> output_keys1(keys1.size());
    std::vector<int> output_keys2(keys2.size());    
    std::vector<int> output_values(values.size());

    int* d_keys1         = sycl::malloc_device<int>(9, q);
    int* d_keys2         = sycl::malloc_device<int>(9, q);
    int* d_values        = sycl::malloc_device<int>(9, q);
    int* d_output_keys1  = sycl::malloc_device<int>(9, q);
    int* d_output_keys2  = sycl::malloc_device<int>(9, q);
    int* d_output_values = sycl::malloc_device<int>(9, q);

    q.memcpy(d_keys1, keys1.data(), sizeof(int)*9);
    q.memcpy(d_keys2, keys2.data(), sizeof(int)*9);
    q.memcpy(d_values, values.data(), sizeof(int)*9);
    q.wait(); // the queue is out-of-order by default: finish the copies before the algorithm runs

    auto begin_keys_in = oneapi::dpl::make_zip_iterator(d_keys1, d_keys2);
    auto end_keys_in   = oneapi::dpl::make_zip_iterator(d_keys1 + 9, d_keys2 + 9);
    auto begin_keys_out= oneapi::dpl::make_zip_iterator(d_output_keys1, d_output_keys2);

    auto new_last = oneapi::dpl::reduce_by_segment(oneapi::dpl::execution::make_device_policy(q),
						   begin_keys_in, end_keys_in, d_values, begin_keys_out, d_output_values);

    q.memcpy(output_keys1.data(), d_output_keys1, sizeof(int)*9);
    q.memcpy(output_keys2.data(), d_output_keys2, sizeof(int)*9);    
    q.memcpy(output_values.data(), d_output_values, sizeof(int)*9);
    q.wait();

    // Expected output
    // {11, 11}: 1
    // {21, 20}: 2
    // {20, 20}: 3
    // {21, 20}: 4
    // {21, 21}: 11
    // {37, 37}: 15
    for(int i=0; i<9; i++) {
      // index output_keys2 as well; streaming the whole vector does not compile
      std::cout << "{" << output_keys1[i] << ", " << output_keys2[i] << "}: " << output_values[i] << std::endl;
    }
}

Environment
Target device and vendor: Intel GPUs
DPC++ version: Intel(R) oneAPI DPC++/C++ Compiler 2021.2.0 (2021.x.0.20210323)
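
For reference, the result listed in the "Expected output" comment can be reproduced on the host without oneDPL. The sketch below is only a sequential illustration of the intended semantics (consecutive equal key pairs are merged and their values summed), not the oneDPL implementation; `Segment` and `reduce_by_segment_host` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// One output segment: the (key1, key2) pair and the sum of its values.
struct Segment {
    std::pair<int, int> key;
    int sum;
};

// Hypothetical host-only helper (not a oneDPL API): sums runs of consecutive
// values whose (key1, key2) pairs compare equal.
std::vector<Segment> reduce_by_segment_host(const std::vector<int>& keys1,
                                            const std::vector<int>& keys2,
                                            const std::vector<int>& values)
{
    assert(keys1.size() == values.size() && keys2.size() == values.size());
    std::vector<Segment> out;
    for (std::size_t i = 0; i < values.size(); ++i) {
        const std::pair<int, int> key{keys1[i], keys2[i]};
        if (!out.empty() && out.back().key == key)
            out.back().sum += values[i];             // same key pair: extend the segment
        else
            out.push_back(Segment{key, values[i]});  // key pair changed: start a new segment
    }
    return out;
}
```

Called with the keys1, keys2, and values vectors from the test case above, this returns six segments whose key pairs and sums match the "Expected output" comment.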

Doesn't compile with C++20

std::binary_negate and std::unary_negate were removed in C++20:

/opt/local/include/oneapi/dpl/internal/../functional:31:14: error: no member named 'binary_negate' in namespace 'std'
using ::std::binary_negate;
      ~~~~~~~^
/opt/local/include/oneapi/dpl/internal/../functional:61:14: error: no member named 'unary_negate' in namespace 'std'
using ::std::unary_negate;
      ~~~~~~~^
In file included from 18.cpp:2:
In file included from /opt/local/include/oneapi/dpl/execution:38:
In file included from /opt/local/include/oneapi/dpl/internal/binary_search_impl.h:22:
/opt/local/include/oneapi/dpl/internal/../pstl/iterator_impl.h:708:39: warning: unused parameter 'other' [-Wunused-parameter]
    operator==(const ignore_copyable& other) const
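
For user code that runs into the same removal, the standard replacement since C++17 is std::not_fn. The sketch below is illustrative only and says nothing about how oneDPL itself addresses the problem; `IsEven`, `count_if_not`, and `is_strictly_increasing` are hypothetical names, and the code assumes a compiler in C++17 mode or later.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Unary predicate used only for illustration.
struct IsEven {
    bool operator()(int x) const { return x % 2 == 0; }
};

// Counts the elements for which pred is false. std::not_fn takes the place
// of std::not1 / std::unary_negate here.
template <typename Pred>
std::ptrdiff_t count_if_not(const std::vector<int>& v, Pred pred)
{
    return std::count_if(v.begin(), v.end(), std::not_fn(pred));
}

// Returns true if every element is strictly less than its successor.
// std::not_fn takes the place of std::not2 / std::binary_negate here:
// adjacent_find looks for the first pair where !(a < b) holds.
bool is_strictly_increasing(const std::vector<int>& v)
{
    return std::adjacent_find(v.begin(), v.end(),
                              std::not_fn(std::less<int>{})) == v.end();
}
```

Unlike the old negators, std::not_fn does not require the wrapped callable to expose argument_type or similar typedefs, so it also works with lambdas.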

/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:23:14: error: no member named 'add_const' in namespace 'std'

It seems like I should be able to compile this. This error was distilled out of a more complex application (LULESH).

bug.cc

#include <type_traits>

int main(void)
{
    return 0;
}

error

$ icpx -c -fast -std=c++17 -I/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl bug.cc 
In file included from bug.cc:1:
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:23:14: error: no member named 'add_const' in namespace 'std'
using ::std::add_const;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:24:14: error: no member named 'add_cv' in namespace 'std'
using ::std::add_cv;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:25:14: error: no member named 'add_lvalue_reference' in namespace 'std'
using ::std::add_lvalue_reference;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:26:14: error: no member named 'add_pointer' in namespace 'std'
using ::std::add_pointer;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:27:14: error: no member named 'add_rvalue_reference' in namespace 'std'
using ::std::add_rvalue_reference;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:28:14: error: no member named 'add_volatile' in namespace 'std'
using ::std::add_volatile;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:29:14: error: no member named 'aligned_storage' in namespace 'std'
using ::std::aligned_storage;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:30:14: error: no member named 'aligned_union' in namespace 'std'
using ::std::aligned_union;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:31:14: error: no member named 'alignment_of' in namespace 'std'
using ::std::alignment_of;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:32:14: error: no member named 'common_type' in namespace 'std'
using ::std::common_type;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:33:14: error: no member named 'conditional' in namespace 'std'
using ::std::conditional;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:34:14: error: no member named 'decay' in namespace 'std'
using ::std::decay;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:35:14: error: no member named 'enable_if' in namespace 'std'
using ::std::enable_if;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:36:14: error: no member named 'extent' in namespace 'std'
using ::std::extent;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:37:14: error: no member named 'false_type' in namespace 'std'
using ::std::false_type;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:38:14: error: no member named 'has_virtual_destructor' in namespace 'std'
using ::std::has_virtual_destructor;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:39:14: error: no member named 'integral_constant' in namespace 'std'
using ::std::integral_constant;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:40:14: error: no member named 'is_abstract' in namespace 'std'
using ::std::is_abstract;
      ~~~~~~~^
/opt/intel/oneapi/dpl/2021.2.0/linux/include/oneapi/dpl/type_traits:41:14: error: no member named 'is_arithmetic' in namespace 'std'
using ::std::is_arithmetic;
      ~~~~~~~^
fatal error: too many errors emitted, stopping now [-ferror-limit=]
20 errors generated.

compiler info

$ icpx --version
Intel(R) oneAPI DPC++ Compiler 2021.2.0 (2021.2.0.20210317)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/intel/oneapi/compiler/2021.2.0/linux/bin

Error on configuring parallelstl using CMake on Windows

CMake complains that the TBB package configuration is missing, even though TBB does not provide CMake configuration files or a CMake-based build.

CMake Error at CMakeLists.txt:39 (find_package):
By not providing "FindTBB.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "TBB", but
CMake did not find one.

Could not find a package configuration file provided by "TBB" (requested
version 2018) with any of the following names:

TBBConfig.cmake
tbb-config.cmake

Add the installation prefix of "TBB" to CMAKE_PREFIX_PATH or set "TBB_DIR"
to a directory containing one of the above files. If "TBB" provides a
separate development package or SDK, be sure it has been installed.

Error with TBB CMake on OS X

Hello,

I've been using parallelstl for a few months, and it is really a fantastic library.

I've just upgraded to the latest release, which includes CMake support. But, when I add the parallelstl subdirectory in CMake, I get the following error:

CMake Error at /usr/local/lib/cmake/TBB/TBBConfig.cmake:77 (message):
  Missed required Intel TBB component: tbb
Call Stack (most recent call first):
  third_party/parallelstl-20180619/CMakeLists.txt:39 (find_package)


-- Configuring incomplete, errors occurred!

To be clear, I definitely have TBB installed. I'm on OS X and I installed it via brew. Its header files and libraries are sitting in /usr/local/include and /usr/local/lib respectively. If I skip the parallelstl CMake and add things manually, everything is fine.

I was hoping there would be some variable somewhere, like TBB_ROOT or something, that I could set that would make the TBB CMake happy, but so far nothing has worked. Any thoughts?

Thanks!

Refactoring broke glue_algorithm_defs.h

In commit b962d08, the usages of the pstl namespace in glue_algorithm_defs.h should also have been renamed to __pstl.

I got errors trying to build the current master branch, and these were fixed by doing this renaming.

[BUG] Terrible DPC++ configuration for Linux Ubuntu machine

What is going on

Here are Intel's instructions for configuring the latest DPC++ on a new machine:

https://intel.github.io/llvm-docs/GetStartedGuide.html

The "Build DPC++ toolchain" and "Use DPC++ toolchain" sections do not explain clearly how to build and test the DPC++ libraries.

After building, I tried to run one of the reduce test cases from "oneAPI-Samples". It failed.

Expected

  1. Build it in a Linux environment such as Ubuntu 18.04 and run it with an example. This should be as simple as a few lines of automation in a script.
  2. Pass the tests, and be exportable and usable from a CMake project.

Build test failed:

OS : Ubuntu 18.04 docker OS, with necessary build essential toolkit
gcc : 7.5
cmake : 3.19

[Docker-cyber_dev_base] yiakwy@yiakwy-XPS-15-9500:~/sycl_workspace$ python llvm/buildbot/check.py 
args:Namespace(base_branch=None, branch=None, build_number=None, builder_dir=None, obj_dir=None, pr_number=None, src_dir=None, test_suite='check-all')
[Cmake Command]: cmake --build /home/yiakwy/sycl_workspace/llvm/build -- check-all -j 12
[2/3] cd /home/yiakwy/sycl_workspace/llvm/clang/bindings/python && /...ycl_workspace/llvm/build/lib /usr/bin/python3.6 -m unittest discover
..............................................................................................................................
----------------------------------------------------------------------
Ran 126 tests in 0.393s

OK
[2/3] Running all regression tests
llvm-lit: /home/yiakwy/sycl_workspace/llvm/llvm/utils/lit/lit/llvm/config.py:428: note: using clang: /home/yiakwy/sycl_workspace/llvm/build/bin/clang
llvm-lit: /home/yiakwy/sycl_workspace/llvm/llvm/utils/lit/lit/llvm/config.py:428: note: using clang: /home/yiakwy/sycl_workspace/llvm/build/bin/clang
FAIL: LLVM :: tools/llvm-ranlib/D-flag.test (6208 of 72287)
******************** TEST 'LLVM :: tools/llvm-ranlib/D-flag.test' FAILED ********************
Script:
--
: 'RUN: at line 4';   /home/yiakwy/sycl_workspace/llvm/build/bin/yaml2obj /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/../llvm-ar/Inputs/add-lib1.yaml -o /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.o
: 'RUN: at line 5';   env TZ=UTC touch -t 200001020304 /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.o
: 'RUN: at line 6';   rm -f /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a && /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar cqSU /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.o
: 'RUN: at line 9';   env TZ=UTC /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar tv /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=REAL-VALUES
: 'RUN: at line 12';   cp /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a && /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -D /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a
: 'RUN: at line 13';   env TZ=UTC /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar tv /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=DETERMINISTIC-VALUES
: 'RUN: at line 16';   cp /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a && /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -U /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a
: 'RUN: at line 17';   env TZ=UTC /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar tv /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=REAL-VALUES
: 'RUN: at line 20';   cp /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a && /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -UDU /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a
: 'RUN: at line 21';   env TZ=UTC /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar tv /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=REAL-VALUES
: 'RUN: at line 22';   cp /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a && /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -UUD /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a
: 'RUN: at line 23';   env TZ=UTC /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar tv /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=DETERMINISTIC-VALUES
: 'RUN: at line 26';   cp /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp-no-index.a /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a && /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -U /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a -D -U
: 'RUN: at line 27';   env TZ=UTC /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ar tv /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=REAL-VALUES
: 'RUN: at line 31';   not /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib --D /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a 2>&1 | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=BAD-OPT-D
: 'RUN: at line 33';   not /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib --U /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a 2>&1 | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=BAD-OPT-U
: 'RUN: at line 35';   not /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -x /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a 2>&1 | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=BAD-OPT-x
: 'RUN: at line 40';   not /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -Dx /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a 2>&1 | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=BAD-OPT-x
: 'RUN: at line 41';   not /home/yiakwy/sycl_workspace/llvm/build/bin/llvm-ranlib -DxD /home/yiakwy/sycl_workspace/llvm/build/test/tools/llvm-ranlib/Output/D-flag.test.tmp.a 2>&1 | /home/yiakwy/sycl_workspace/llvm/build/bin/FileCheck /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test --check-prefix=BAD-OPT-xD
--
Exit Code: 1

Command Output (stderr):
--
/home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test:44:25: error: DETERMINISTIC-VALUES: expected string not found in input
# DETERMINISTIC-VALUES: {{[-rwx]+}} 0/0 712 Jan 1 00:00 1970 D-flag.test.tmp.o
                        ^
<stdin>:1:1: note: scanning from here
rw-r--r-- 0/0 712 Dec 31 19:00 1969 D-flag.test.tmp.o
^

Input file: <stdin>
Check file: /home/yiakwy/sycl_workspace/llvm/llvm/test/tools/llvm-ranlib/D-flag.test

-dump-input=help explains the following input dump.

Input was:
<<<<<<
          1: rw-r--r-- 0/0 712 Dec 31 19:00 1969 D-flag.test.tmp.o 
check:44     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
>>>>>>

Suggestion

  1. Most libraries on Linux are built with gcc, not clang.
  2. Few people develop libraries with Make nowadays; please switch to CMake or Bazel.
  3. Provide instructions with complete examples. A Dockerfile would be even better.

Could not find a package configuration file provided by "TBB" with any of TBBConfig.cmake tbb-config.cmake

Environment:

OS: Ubuntu 18.04.5 LTS

$ cmake --version
cmake version 3.17.5
CMake suite maintained and supported by Kitware (kitware.com/cmake).

I run the project in OpenVINO.
In demo_squeezenet_download_convert_run.sh, it contains:

cmake -DCMAKE_BUILD_TYPE=Release $samples_path

However, it prints:


-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc - works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ - works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include unistd.h
-- Looking for C++ include unistd.h - found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - found
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - found
-- Looking for C++ include fnmatch.h
-- Looking for C++ include fnmatch.h - found
-- Looking for strtoll
-- Looking for strtoll - found
-- Found InferenceEngine: /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine.so (Required is at least version "2.1")
CMake Warning at /opt/intel/openvino_2021/deployment_tools/inference_engine/share/ie_parallel.cmake:6 (find_package):
  By not providing "FindTBB.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "TBB", but
  CMake did not find one.

  Could not find a package configuration file provided by "TBB" with any of
  the following names:

    TBBConfig.cmake
    tbb-config.cmake

  Add the installation prefix of "TBB" to CMAKE_PREFIX_PATH or set "TBB_DIR"
  to a directory containing one of the above files.  If "TBB" provides a
  separate development package or SDK, be sure it has been installed.
Call Stack (most recent call first):
  /opt/intel/openvino_2021/deployment_tools/inference_engine/share/InferenceEngineConfig.cmake:170 (include)
  CMakeLists.txt:141 (find_package)


CMake Warning at /opt/intel/openvino_2021/deployment_tools/inference_engine/share/InferenceEngineConfig.cmake:32 (message):
  TBB was not found by the configured TBB_DIR/TBBROOT path.  SEQ method will
  be used.
Call Stack (most recent call first):
  /opt/intel/openvino_2021/deployment_tools/inference_engine/share/ie_parallel.cmake:14 (ext_message)
  /opt/intel/openvino_2021/deployment_tools/inference_engine/share/InferenceEngineConfig.cmake:170 (include)
  CMakeLists.txt:141 (find_package)


-- Configuring done
-- Generating done
-- Build files have been written to: /home/apple/inference_engine_samples_build
[ 36%] Built target gflags_nothreads_static
[ 81%] Built target format_reader
[ 90%] Linking CXX executable ../intel64/Release/classification_sample_async
/usr/bin/ld: warning: libtbb.so.2, needed by /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine_legacy.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libngraph.so, needed by /opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine_legacy.so, not found (try using -rpath or -rpath-link)
//opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine_transformations.so: undefined reference to `ngraph::op::util::BinaryElementwiseComparison::validate_and_infer_types()'
//opt/intel/openvino_2021/deployment_tools/inference_engine/lib/intel64/libinference_engine_transformations.so: undefined reference to `ngraph::op::util::BinaryElementwiseComparison::visit_attributes(ngraph::AttributeVisitor&)'

I noticed that #13 and #14 suggest:

cd /tmp
git clone https://github.com/intel/parallelstl
wget https://github.com/01org/tbb/releases/download/2018_U5/tbb2018_20180618oss_lin.tgz
tar zxf tbb2018_20180618oss_lin.tgz
mkdir build && cd build
cmake -DTBB_DIR=/tmp/tbb2018_20180618oss/cmake /tmp/parallelstl 

However, /tmp/parallelstl does not contain a CMakeLists.txt:

app@ubuntu:/tmp$ mkdir build && cd build
app@ubuntu:/tmp/build$ cmake -DTBB_DIR=/tmp/tbb2018_20180618oss/cmake /tmp/parallelstl
CMake Error: The source directory "/tmp/parallelstl" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.

How can I solve this?

std::sort with device_policy causes an exception on large data

Greetings!
While using std::sort with oneapi::dpl::execution::device_policy, an exception occurs on a large amount of data. Sorting was performed on the CPU.

With plain std::sort or tbb::parallel_sort, everything works without problems.

Code:

#include <CL/sycl.hpp>
#include <oneapi/dpl/algorithm>
#include <oneapi/dpl/execution>
#include <oneapi/dpl/iterator>

#include <oneapi/tbb/parallel_sort.h>

#include <algorithm>  // std::generate, std::sort
#include <iostream>   // std::cin, std::cout
#include <random>
#include <vector>

template <typename T>
std::vector<T> make_random(size_t size, size_t random_range_left = 1,
                           size_t random_range_right = 10000) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::vector<T> out(size);
  std::uniform_int_distribution<T> dist(random_range_left, random_range_right);
  std::generate(out.begin(), out.end(), [&]() { return dist(gen); });
  return out;
}

void sycl_sort(std::vector<int> &src) {
  size_t buf_size = src.size();
  auto sel = sycl::cpu_selector{};
  sycl::queue q{sel};
  auto dev_policy = oneapi::dpl::execution::device_policy{sel};

  sycl::buffer<int> buff_src(src.data(), sycl::range<1>{buf_size});

  std::sort(dev_policy, oneapi::dpl::begin(buff_src),
            oneapi::dpl::end(buff_src));
}

void tbb_sort(std::vector<int> &src) { tbb::parallel_sort(begin(src), end(src)); }

int main(int argc, char* argv[]) {
  size_t buf_size;
  std::cin >> buf_size;
  std::vector<int> src = make_random<int>(buf_size);
  std::cout << "Data generated" << std::endl;

  tbb_sort(src);
  std::cout << "TBB sort performed" << std::endl;

  sycl_sort(src);
  std::cout << "SYCL sort performed" << std::endl;
}

$ ./sort_issue
1000000000
Data generated
TBB sort performed
terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  Native API failed. Native API returns: -5 (CL_OUT_OF_RESOURCES) -5 (CL_OUT_OF_RESOURCES)
Aborted (core dumped)

Environment:

  • OS: Linux
  • Device: Intel(R) Xeon(R) Platinum 8280L CPU @ 2.70GHz
