codeplaysoftware / computecpp-sdk

Collection of samples and utilities for using ComputeCpp, Codeplay's SYCL implementation

Home Page: https://developer.codeplay.com/computecppce/latest/overview

License: Other

Languages: CMake 9.22%, Makefile 0.57%, C++ 31.65%, Cuda 1.01%, Python 1.91%, C 55.63%
Topics: cpp, cpp11, cplusplus, sycl, opencl, gpgpu

computecpp-sdk's People

Contributors

adambrouwersharries, aerialmantis, bensuo, cjdb, codeplay-kostasrim, dependabot[bot], duncanmcbain, georgeweb, graham-codeplay, jwlawson, keryell, maria-ro, martinstarkov, mathiasmagnus, mcleary, mcopik, mehdi-goli, mmha, naghasan, neuralsandwich, progtx, psalz, rbiessy, rodburns, rossbrunton, ruyk, srividya-sundaram, thijswithaar, vtec234, wyaneva


computecpp-sdk's Issues

Is there a plan to support mobile GPUs?

Recently there has been a trend to deploy DNN models on mobile phones, so offloading the computation to the GPU can be a plus (although mobile GPUs like Mali/Adreno/PowerVR offer weaker speedups than their desktop/server counterparts). Is there a plan to support such GPUs?

"Advanced Micro Devices, Inc." vendor is unsupported.

inferno@hmstr:~$ /usr/local/computecpp/bin/computecpp_info
********************************************************************************

ComputeCpp Info (CE 0.1.1)

********************************************************************************

Toolchain information:

GLIBCXX: 20150426
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : NO - Unsupported vendor
  CL_DEVICE_NAME                          : Pitcairn
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 1912.5 (VM)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
********************************************************************************

********************************************************************************

********************************************************************************

My system is Ubuntu 16.04
clinfo output: https://gist.github.com/inferrna/60f2a7a9e8283a6b8edb9a260e55aef0

Unable to build samples

Hello, I am unable to build the samples provided with the repository.

[Screenshots attached: error, cmake_version, computecpp_info]

Any advice is appreciated.

Unhandled exception, "Failed to build program" error in TensorFlow sample

I have a Radeon RX580, running Ubuntu 18.04, with the amdgpu driver (18.20-579836) from AMD. Plain OpenCL works fine on the GPU in Blender Cycles, LuxMark, and darktable-cltest. (clinfo output)
ComputeCpp version is CE 0.8.0. (computecpp_info output)

After building TensorFlow (on the dev/amd_gpu branch, at tensorflow/tensorflow@00b0040 plus tensorflow/tensorflow#17508 cherry-picked on top of it), by following the instructions here, the models/tutorials/image/mnist sample outputs this error:

2018-05-17 01:15:34.090983: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:70] Found following OpenCL devices:
2018-05-17 01:15:34.091027: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:72] id: 0, type: GPU, name: Ellesmere, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE
terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> >'
Aborted (core dumped)

All I could figure out is that the actual exception is:

Error: [ComputeCpp:RT0100] Failed to build program

And it is thrown from somewhere inside PhiloxRandomOp::Compute, at least in this specific case. Trying to run the default generator from the textgenrnn project also results in the same error. I think I've also seen a similar exception coming from CastOpBase::Compute in a different sample.

I couldn't find out any more details about the error so far.

I think #77 might be related, but not quite the same.

Filtering / masking OpenCL devices

I could not find anything related to this in the documentation. One feature that CUDA has is the CUDA_VISIBLE_DEVICES environment variable, which makes it possible to mask/filter devices when running code that relies on CUDA. How can this be achieved with ComputeCpp? In my case I have two cards on my system, one NVIDIA and one AMD, and I'd like to use only the AMD GPU for running OpenCL code.
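
A minimal sketch of one possible approach (not from the original post, assuming the standard SYCL 1.2.1 device_selector interface that ComputeCpp implements): a custom selector that only accepts GPUs whose vendor string matches AMD, which effectively masks the NVIDIA card for any queue built from it. This does not globally hide devices the way CUDA_VISIBLE_DEVICES does, but it controls which device a given queue binds to.

#include <CL/sycl.hpp>
#include <string>

// Hypothetical example: score devices so that only AMD GPUs are selectable.
class amd_gpu_selector : public cl::sycl::device_selector {
 public:
  int operator()(const cl::sycl::device& dev) const override {
    if (dev.is_gpu()) {
      const auto vendor = dev.get_info<cl::sycl::info::device::vendor>();
      if (vendor.find("Advanced Micro Devices") != std::string::npos) {
        return 100;  // strongly prefer this device
      }
    }
    return -1;  // a negative score rejects the device
  }
};

int main() {
  cl::sycl::queue queue{amd_gpu_selector{}};
  // Submit kernels to `queue` as usual; they will only run on the AMD GPU.
}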

Need a method to detect compute++ compiler

Hi,
I want to check whether my source code is being processed by the compute++ compiler.
Is there a predefined macro, like __clang__ or _MSC_VER, for the compute++ compiler?
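
For what it's worth, a hedged sketch (my assumption, not confirmed in this thread): SYCL device compilers are expected to define __SYCL_DEVICE_ONLY__ during the device compilation pass, so it can play a role similar to __clang__ or _MSC_VER for distinguishing the compute++ device pass from the host compiler.

#ifdef __SYCL_DEVICE_ONLY__
  // This branch is seen by the SYCL device compiler (e.g. the compute++ device pass).
#else
  // This branch is seen by the host compiler (gcc, clang, MSVC, ...).
#endif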

-fno-ms-compatibility causes hard-to-find bugs with lambda const capture

I just spent a considerable amount of time trying to figure out why this kernel crashes somewhere inside ComputeCpp's process_functor_arguments:

my_queue.submit([&](cl::sycl::handler& cgh) {
  auto out = my_buf.get_access<cl::sycl::access::mode::write>(cgh);
  const int some_const_value = 16;
  auto my_kernel = [=](cl::sycl::item<1> item) {
    int cause_capture = some_const_value;
    out[item] = 0.f;
  };
  std::cout << "sizeof(my_kernel) = " << sizeof(my_kernel) << std::endl;
  cgh.parallel_for<class const_crash>(cl::sycl::range<1>(64), my_kernel);
});

As it turns out, this only happens on Windows, and only when using a CMake build (not the Visual Studio project wizard).

The culprit becomes apparent when comparing the *.sycl files generated by the device compiler.

CMake on Windows:

const size_t kernel_info< const_crash >::arg_size = 3;
const size_t kernel_info< const_crash >::arg_sizes[] = {
  8, // __attribute__((address_space(1))) float *
  16, // class cl::sycl::device_index_array
  16, // class cl::sycl::device_index_array
  0
};

Visual Studio wizard:

const size_t kernel_info< const_crash >::arg_size = 4;
const size_t kernel_info< const_crash >::arg_sizes[] = {
  4, // const int
  8, // __attribute__((address_space(1))) float *
  16, // class cl::sycl::device_index_array
  16, // class cl::sycl::device_index_array
  0
};

It appears that clang propagates the const value into the lambda, while MSVC (presumably) does not. This causes a mismatch between the lambda and the kernel at runtime.

When comparing the device compiler flags used by the two projects, I found the -fno-ms-compatibility flag set for CMake builds here. And, lo and behold, removing this flag fixes the issue.

I'm not sure whether the original reasons for including this flag are still relevant, or what exactly goes wrong, but I tried to compile the program above with const char16_t some_const_value = 16;, which appeared to work just fine.

Tested with ComputeCpp 0.6.1, 0.7.0 and 0.8.0, on Windows 10 and Ubuntu 16.04.
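
A hedged workaround sketch (my own suggestion, not from the report): since the mismatch seems to come from one compiler folding the const int into the lambda while the other captures it, declaring the value as non-const forces both the host and device compilers to capture it by copy and keeps the kernel argument layouts in sync.

my_queue.submit([&](cl::sycl::handler& cgh) {
  auto out = my_buf.get_access<cl::sycl::access::mode::write>(cgh);
  int some_value = 16;  // non-const: both MSVC and clang must capture it by copy
  auto my_kernel = [=](cl::sycl::item<1> item) {
    int cause_capture = some_value;
    out[item] = 0.f;
  };
  cgh.parallel_for<class const_workaround>(cl::sycl::range<1>(64), my_kernel);
});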

CMake script supporting multiple translation units

The current version of add_sycl_to_target supports only targets with one source file. This is a non-negligible limitation: many C++ applications and libraries consist of multiple translation units. Is there a technical reason behind this limitation?

In my fork I implemented an extension that creates a custom target for each source file. I use an additional counter to distinguish between targets created for source files with the same name, which can happen when the files are located in different directories.
A modified example shows that an application with SYCL kernels in two source files builds correctly; I can't verify that it executes correctly due to problems with the SYCL runtime, but I don't expect any issues there.

I can open a pull request, as long as you don't see a problem with this approach and you are interested in it.

CMAKE_INSTALL_DIR is unused

The readme mentions CMAKE_INSTALL_DIR. When this is set I get a warning saying it is unused by the project. I suspect that the correct variable is CMAKE_INSTALL_PREFIX.

Alongside this, when I set CMAKE_INSTALL_PREFIX there is no install target.

This is with cmake version 3.5.1 and ninja version 1.5.1.

Extract IR doesn't work when given llvm-dis argument

duncan:build$ ../tools/extract_ir/extract_ir.py -h
usage: extract_ir.py [-h] [-i INPUT_FILE] [-o OUTPUT_FILE]
                     [--llvm-dis LLVM_DIS]

Finds an IR module in a SYCL integration header and writes it out to a binary
file. If path to llvm-dis is provided, tries to disassemble the binary and
write the disassembled text into the output file instead.

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        The header to process, defaults to stdin.
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The file to output to, defaults to stdout.
  --llvm-dis LLVM_DIS   The path to llvm-dis. Bytecode will be disassembled
                        only if this is provided.
duncan:build$ ../tools/extract_ir/extract_ir.py -i complex/complex.cpp.sycl -o bitcode.bc
duncan:build$ ../tools/extract_ir/extract_ir.py -i complex/complex.cpp.sycl -o bitcode.bc --llvm-dis llvm-dis
Traceback (most recent call last):
  File "../tools/extract_ir/extract_ir.py", line 164, in <module>
    main()
  File "../tools/extract_ir/extract_ir.py", line 142, in main
    output_file_name)
  File "../tools/extract_ir/extract_ir.py", line 70, in try_disassemble_spir
    input_file,
NameError: global name 'input_file' is not defined

This is the output I get when using extract_ir.py on Ubuntu 14.04 with Python 2. I can provide more information if needed. It appears that when an llvm-dis path is provided on the command line, the script can no longer reference the input file.

cmake error for samples

[c++ newbie alert..!] Any guidance would be appreciated...

I'm trying to follow the https://developer.codeplay.com/computecppce/latest/getting-started-guide

computecpp is available in /usr/local/computecpp and bin is in the PATH
computecpp-sdk is available in ~/computecpp-sdk

I managed to build tensorflow and I can run the MNIST training example.

~/computecpp-sdk/build$ cat ~/.profile
# SNIP UNRELATED LINES.
export ComputeCpp_ROOT_DIR=/usr/local/computecpp
export COMPUTECPP_DIR=/usr/local/computecpp
export LD_LIBRARY_PATH=/usr/local/computecpp/lib
export PATH=${PATH}:/usr/local/computecpp/bin

However, I can't seem to build the SYCL samples:

~/computecpp-sdk/build$ cmake ../ -DComputeCpp_ROOT_DIR=/usr/local/computecpp
-- platform - your system can support ComputeCpp
-- Found ComputeCpp: /usr/local/computecpp (found version "CE ..")
-- compute++ flags - -O2;-mllvm;-inline-threshold=1000;-sycl;-intelspirmetadata;-sycl-target;spir64
-- Configuring done
-- Generating done
-- Build files have been written to: /home/derek/computecpp-sdk/build
~/computecpp-sdk/build$ make
[  1%] Building ComputeCpp integration header file /home/derek/computecpp-sdk/build/samples/simple-example-of-vectors/simple-example-of-vectors.cpp.sycl
remark: [Computecpp:CC0027]: Some memcpy/memset intrinsics added by the llvm optimizer were replaced by serial functions. This is a workaround for OpenCL drivers that do not support those
      intrinsics. This may impact performance, consider using -no-serial-memop. [-Rsycl-serial-memop]
[  1%] Built target simple-example-of-vectors_simple-example-of-vectors.cpp_0_ih
[  2%] Building CXX object samples/simple-example-of-vectors/CMakeFiles/simple-example-of-vectors.dir/simple-example-of-vectors.cpp.o
[  4%] Linking CXX executable simple-example-of-vectors
[  4%] Built target simple-example-of-vectors
[  5%] Building ComputeCpp integration header file /home/derek/computecpp-sdk/build/samples/custom-device-selector/custom-device-selector.cpp.sycl
remark: [Computecpp:CC0027]: Some memcpy/memset intrinsics added by the llvm optimizer were replaced by serial functions. This is a workaround for OpenCL drivers that do not support those
      intrinsics. This may impact performance, consider using -no-serial-memop. [-Rsycl-serial-memop]
[  5%] Built target custom-device-selector_custom-device-selector.cpp_0_ih
[  7%] Building CXX object samples/custom-device-selector/CMakeFiles/custom-device-selector.dir/custom-device-selector.cpp.o
[  8%] Linking CXX executable custom-device-selector
[  8%] Built target custom-device-selector
[ 10%] Building ComputeCpp integration header file /home/derek/computecpp-sdk/build/samples/tiled-convolution/tiled-convolution.cpp.sycl
/home/derek/computecpp-sdk/samples/tiled-convolution/tiled-convolution.cpp:79:28: error: no member named 'get_global_id' in 'cl::sycl::nd_item<2>'; did you mean 'get_global'?
    index_t id_m = item_id.get_global_id(0);  // global id with offset m
                           ^~~~~~~~~~~~~
                           get_global
/usr/local/computecpp/include/SYCL/item.h:254:10: note: 'get_global' declared here
  size_t get_global(unsigned int dimension) const {
         ^
/home/derek/computecpp-sdk/samples/tiled-convolution/tiled-convolution.cpp:80:28: error: no member named 'get_global_id' in 'cl::sycl::nd_item<2>'; did you mean 'get_global'?
    index_t id_n = item_id.get_global_id(1);  // global id with offser n
                           ^~~~~~~~~~~~~
                           get_global
/usr/local/computecpp/include/SYCL/item.h:254:10: note: 'get_global' declared here
  size_t get_global(unsigned int dimension) const {
         ^
2 errors generated.
samples/tiled-convolution/CMakeFiles/tiled-convolution_tiled-convolution.cpp_0_ih.dir/build.make:61: recipe for target 'samples/tiled-convolution/tiled-convolution.cpp.sycl' failed
make[2]: *** [samples/tiled-convolution/tiled-convolution.cpp.sycl] Error 1
CMakeFiles/Makefile2:383: recipe for target 'samples/tiled-convolution/CMakeFiles/tiled-convolution_tiled-convolution.cpp_0_ih.dir/all' failed
make[1]: *** [samples/tiled-convolution/CMakeFiles/tiled-convolution_tiled-convolution.cpp_0_ih.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

Output from computecpp_info:

computecpp_info
********************************************************************************

ComputeCpp Info (CE ..)

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 2 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Ellesmere
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2482.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v../platform-support-notes

********************************************************************************

PTX support

I noticed that ComputeCpp 0.5.0 has PTX support.

FindComputeCpp.cmake should probably be modified to enable the "--use-ptx" option.

Tiled convolution crash

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Loveland
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 1800.12
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
  CL_DEVICE_VERSION                       : OpenCL 1.2 AMD-APP (1800.12)
  CL_DEVICE_PROFILE                       : FULL_PROFILE
  CL_DEVICE_MAX_COMPUTE_UNITS             : 2
  CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS      : 3
  CL_DEVICE_MAX_WORK_ITEM_SIZES           : 256 / 256 / 256
  CL_DEVICE_MAX_WORK_GROUP_SIZE           : 256
  CL_DEVICE_MAX_CLOCK_FREQUENCY           : 492 MHz
  CL_DEVICE_ADDRESS_BITS                  : 32
  CL_DEVICE_HOST_UNIFIED_MEMORY           : YES
  CL_DEVICE_MAX_MEM_ALLOC_SIZE            : 128 MByte
  CL_DEVICE_GLOBAL_MEM_SIZE               : 256 MByte
  CL_DEVICE_ERROR_CORRECTION_SUPPORT      : NO
  CL_DEVICE_LOCAL_MEM_TYPE                : local
  CL_DEVICE_LOCAL_MEM_SIZE                : 32 KByte
  CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE      : 64 KByte
  CL_DEVICE_QUEUE_PROPERTIES              : CL_QUEUE_PROFILING_ENABLE
  CL_DEVICE_IMAGE_SUPPORT                 : YES
  CL_DEVICE_MAX_READ_IMAGE_ARGS           : 128
  CL_DEVICE_MAX_WRITE_IMAGE_ARGS          : 8
  CL_DEVICE_IMAGE2D_MAX_WIDTH             : 16384
  CL_DEVICE_IMAGE2D_MAX_HEIGHT            : 16384
  CL_DEVICE_IMAGE3D_MAX_WIDTH             : 2048
  CL_DEVICE_IMAGE3D_MAX_HEIGHT            : 2048
  CL_DEVICE_IMAGE3D_MAX_DEPTH             : 2048
  CL_DEVICE_PREFERRED_VECTOR_WIDTH        : CHAR 16 SHORT 8 INT 4 LONG 2 FLOAT 4 DOUBLE 0 
  CL_DEVICE_EXTENSIONS                    : cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_amd_image2d_from_buffer_read_only cl_khr_spir cl_khr_gl_event 

Unfortunately the crash happens entirely inside libamdocl64.so, so I can't provide any more data.

Zero-size SYCLMalloc returns non-unique addresses

Calling SYCLMalloc with size zero returns a non-null, non-unique address.

auto a = SYCLMalloc(0, pMap);
auto b = SYCLMalloc(0, pMap);
auto c = SYCLMalloc(0, pMap);

This currently sets a, b, and c all to 0x1000. If zero-size mallocs are allowed (which I believe is required for Eigen compatibility), then SYCLMalloc should return unique virtual addresses.

Kernels not running on GPU: cl::sycl::gpu_selector does not work

Hi,

I successfully installed computecpp-sdk on my PC with:

  • an NVIDIA GeForce GT 745M GPU hosted in an Intel i5 system.
  • Ubuntu 14.04.
  • CUDA Driver / Runtime 8.0 (and NVIDIA OpenCL implementation located in /usr/local/cuda/)
  • gcc / g++ 4.8 and 6.0
  • ComputeCpp release ComputeCpp-CE-0.2.1-Linux

All samples compiled without any compile-time or runtime errors.

When I wanted to check that the samples were running on the GPU using:
cl::sycl::gpu_selector selector; cl::sycl::queue myQueue(selector)

I got the following runtime error:

terminate called after throwing an instance of 'cl::sycl::exception'
what(): Error: [ComputeCpp:RT0106] Device not found
Aborted (core dumped)

When I declare the queue with queue myQueue; everything runs fine, but myQueue.is_host() returns true, so the kernels are running on the host.

I also notice the following in the CMake output:

platform - your system CANNOT support ComputeCpp

Can this explain the fact that gpu_selector does not work and that my kernels run on the host?

Thanks.
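
As a side note, a minimal sketch (my own example, assuming the standard SYCL 1.2.1 API) of how to detect this situation at run time instead of aborting: try the gpu_selector first and fall back to the default selector if no OpenCL GPU is visible to the runtime.

#include <CL/sycl.hpp>
#include <iostream>

int main() {
  cl::sycl::queue my_queue;  // default-constructed queue (may be the host device)
  try {
    my_queue = cl::sycl::queue{cl::sycl::gpu_selector{}};
  } catch (const cl::sycl::exception& e) {
    std::cout << "No usable GPU found (" << e.what()
              << "), falling back to the default device\n";
    my_queue = cl::sycl::queue{cl::sycl::default_selector{}};
  }
  std::cout << "Running on: "
            << my_queue.get_device().get_info<cl::sycl::info::device::name>()
            << ", is_host = " << my_queue.is_host() << "\n";
}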

Travis scripts rely on hardcoded ComputeCpp version number

At the moment, the Travis config will fail when we update ComputeCpp, as the version numbers change. We should change the scripts so that they either always download the same version (so we can update the number and avoid a failing config) or use whatever version is downloaded.

The Travis config is broken at the moment; I'll fix it for now, but a longer-term fix would do something like the above.

Unable to build the SDK

Using ComputeCpp CE 0.5.1. This commit may have something to do with it.

dom@sovu:~/computecpp-sdk/build$ cmake ../ -DCOMPUTECPP_PACKAGE_ROOT_DIR=/home/dom/computecpp-ce-0.5.1
-- host compiler - gcc 5.4.0
-- ComputeCpp package - Found
-- compute++ - Found
-- computecpp_info - Found
-- ComputeCpp runtime: /home/dom/computecpp-ce-0.5.1/lib/libComputeCpp.so - Found
-- ComputeCpp includes - Found
-- Package version - CE 0.5.1
-- compute++ flags - -O2 -mllvm -inline-threshold=1000 -sycl -emit-llvm -intelspirmetadata
-- platform - your system CANNOT support ComputeCpp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/dom/computecpp-sdk/build
dom@sovu:~/computecpp-sdk/build$ make
[  1%] Built target async-handler_async-handler.cpp_0_ih
[  4%] Built target async-handler
[  6%] Built target vptr_vptr.cpp_0_ih
[  9%] Built target vptr
[ 10%] Building ComputeCpp integration header file /home/dom/computecpp-sdk/build/samples/opencl-c-interop/opencl-c-interop.cpp.sycl
/home/dom/computecpp-sdk/samples/opencl-c-interop/opencl-c-interop.cpp:124:12: error: no matching constructor for initialization of 'cl::sycl::kernel'
    kernel pow_kernel(clKernel, gpu_queue.get_context());
           ^          ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:258:3: note: candidate constructor not viable: no known conversion from 'cl::sycl::context' to 'dprogram_shptr'
      (aka 'std::shared_ptr<cl::sycl::detail::program>') for 2nd argument
  kernel(cl_kernel clKernel, dprogram_shptr program);
  ^
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:340:3: note: candidate constructor not viable: cannot convert argument of incomplete type 'cl_kernel' (aka '_cl_kernel *') to 'const cl::sycl::context' for 1st
      argument
  kernel(const context &context, cl_kernel clKernel);
  ^
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:290:12: note: candidate constructor not viable: requires single argument 'detail', but 2 arguments were provided
  explicit kernel(dkernel_shptr detail);
           ^
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:297:3: note: candidate constructor not viable: requires single argument 'rhs', but 2 arguments were provided
  kernel(const kernel &rhs) : m_impl{rhs.get_impl()} {}
  ^
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:310:3: note: candidate constructor not viable: requires single argument 'rhs', but 2 arguments were provided
  kernel(kernel &&rhs) : m_impl{std::move(rhs.get_impl())} {}
  ^
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:333:3: note: candidate constructor not viable: requires single argument 'clKernel', but 2 arguments were provided
  kernel(cl_kernel clKernel);
  ^
/home/dom/computecpp-ce-0.5.1/include/SYCL/kernel.h:252:3: note: candidate constructor not viable: requires 0 arguments, but 2 were provided
  kernel();
  ^
1 error generated.
samples/opencl-c-interop/CMakeFiles/opencl-c-interop_opencl-c-interop.cpp_0_ih.dir/build.make:61: recipe for target 'samples/opencl-c-interop/opencl-c-interop.cpp.sycl' failed
make[2]: *** [samples/opencl-c-interop/opencl-c-interop.cpp.sycl] Error 1
CMakeFiles/Makefile2:340: recipe for target 'samples/opencl-c-interop/CMakeFiles/opencl-c-interop_opencl-c-interop.cpp_0_ih.dir/all' failed
make[1]: *** [samples/opencl-c-interop/CMakeFiles/opencl-c-interop_opencl-c-interop.cpp_0_ih.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

FindComputeCpp.cmake cannot deal with relative paths

Currently, if you give a relative path to add_sycl_to_target, the device compiler won't be able to find the file (for example, you might be in a build/ directory, so all your sources are located one directory up). CMake handles this for the host side and adjusts the path to be relative to the build directory, but FindComputeCpp.cmake does no such thing, so the device compiler can't find the file. We should fix this (or at the very least warn when a relative path is passed to add_sycl_to_target).

Identified by #86.

Update Travis config (possibly move away from 14.04?)

The Travis builds are becoming unstable on this older OS. Since we have a version of ComputeCpp that builds on 16.04, we should consider using that so that we don't have to add some other PPA to be able to get GCC 5.

It would also be nice if we only installed clang when building with clang, which would reduce the job time by a little.

Unresolved symbol: _Z13get_global_idj Aborting...

Hi! I am testing the samples, and after compiling and running simple_vector_add I got this crash:

Unresolved symbol: _Z13get_global_idj
Aborting...

My setup is an Intel board running Fedora Server 25:

uname -r
4.8.6-300.fc25.x86_64

OpenCL

clinfo
Number of platforms                               1
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 1.2 beignet 1.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
  Platform Extensions function suffix             Intel

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Cherryview
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.3
  Driver Version                                  1.3
  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.3
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               12
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None, None, None
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              16
  Preferred / native vector sizes
    char                                                16 / 8
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 0 / 8        (cl_khr_fp16)
    float                                                4 / 4
    double                                               0 / 2        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    32, Little-Endian
  Global memory size                              1011875840 (965MiB)
  Error Correction support                        No
  Max memory allocation                           758120448 (723MiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8192 (8KiB)
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4096 bytes
    Pitch alignment for 2D image buffers          1 bytes
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             8192x8192x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Global
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_spir cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing cl_khr_fp64 cl_khr_fp16

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel Gen OCL Driver
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [Intel]
  clCreateContext(NULL, ...) [default]            Success [Intel]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Cherryview
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Cherryview
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Cherryview
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Cherryview
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Cherryview

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

ComputeCPP

/usr/local/computecpp/bin/computecpp_info
********************************************************************************

ComputeCpp Info (CE 0.1.2)

********************************************************************************

Toolchain information:

GLIBCXX: 20150623
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Intel(R) HD Graphics Cherryview
  CL_DEVICE_VENDOR                        : Intel
  CL_DRIVER_VERSION                       : 1.3
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU
********************************************************************************

********************************************************************************

********************************************************************************

CMake Samples

cmake ../samples -DCOMPUTECPP_PACKAGE_ROOT_DIR=/usr/local/computecpp 
-- The C compiler identification is GNU 6.3.1
-- The CXX compiler identification is GNU 6.3.1
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - not found
-- Looking for CL_VERSION_1_2
-- Looking for CL_VERSION_1_2 - found
-- Found OpenCL: /usr/lib64/libOpenCL.so (found version "1.2")
-- host compiler - gcc 6.3.1 (note pre 5.1 gcc ABI enabled)
-- ComputeCpp package - Found
-- compute++ - Found
-- computecpp_info - Found
-- libComputeCpp.so - Found
-- ComputeCpp includes - Found
-- Package version - CE 0.1.2
-- compute++ flags - -O2 -mllvm -inline-threshold=1000 -sycl -intelspirmetadata -emit-llvm
Device open failed, aborting...
Device open failed, aborting...
-- platform - your system can support ComputeCpp
-- Configuring done
-- Generating done
-- Build files have been written to: ...

Searching for this error, I could only find this thread:
https://software.intel.com/en-us/forums/opencl/topic/565212

For Atom processors so far: only 32 bit SPIR is supported

That is my case, so I installed all the dependencies to build and link OpenCL in 32-bit mode, but every time the make process reaches libComputeCpp.so (which is 64-bit) it fails.
Is there a 32-bit version of libComputeCpp.so? Or maybe this error indicates another problem, unrelated to the 32-bit SPIR limitation?

TensorFlow build fails with --config=sycl

Exact command to reproduce:

bazel build -c opt --config=sycl //tensorflow:libtensorflow_cc.so

Logs

ERROR: /home/ashok/Ashok/tensorflow-c++/tensorflow/core/kernels/BUILD:3355:1: C++ compilation of rule '//tensorflow/core/kernels:sendrecv_ops' failed: computecpp failed: error executing command external/local_config_sycl/crosstool/computecpp -fPIE -fno-omit-frame-pointer -Wall -msse3 -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections '-std=c++11' -MD -MF ... (remaining 119 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1
In file included from tensorflow/core/kernels/sendrecv_ops.cc:16:
In file included from ./tensorflow/core/kernels/sendrecv_ops.h:19:
In file included from ./tensorflow/core/framework/op_kernel.h:19:
In file included from /usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:55:
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:1404:14: error: no matching constructor for initialization of 'tuple<const tensorflow::Status &&, const tensorflow::Rendezvous::Args &&, const tensorflow::Rendezvous::Args &&, const tensorflow::Tensor &&, bool &&>'
{ return tuple<_Elements&&...>(std::forward<_Elements>(__args)...); }
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:992:13: note: in instantiation of function template specialization 'std::forward_as_tuple<const tensorflow::Status &, const tensorflow::Rendezvous::Args &, const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool>' requested here
std::forward_as_tuple(std::forward<_Args>(__args)...),
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1731:2: note: in instantiation of function template specialization 'std::_Bind<(lambda at tensorflow/core/kernels/sendrecv_ops.cc:155:7) (std::function<void ()>, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)>::operator()<const tensorflow::Status &, const tensorflow::Rendezvous::Args &, const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool, void>' requested here
(*_Base::_M_get_pointer(__functor))(
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:2115:33: note: in instantiation of member function 'std::_Function_handler<void (const tensorflow::Status &, const tensorflow::Rendezvous::Args &, const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool), std::_Bind<(lambda at tensorflow/core/kernels/sendrecv_ops.cc:155:7) (std::function<void ()>, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)> >::_M_invoke' requested here
_M_invoker = &_My_handler::_M_invoke;
^
tensorflow/core/kernels/sendrecv_ops.cc:154:38: note: in instantiation of function template specialization 'std::function<void (const tensorflow::Status &, const tensorflow::Rendezvous::Args &, const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool)>::function<std::_Bind<(lambda at tensorflow/core/kernels/sendrecv_ops.cc:155:7) (std::function<void ()>, std::_Placeholder<1>, std::_Placeholder<2>, std::_Placeholder<3>, std::_Placeholder<4>, std::_Placeholder<5>)>, void, void>' requested here
Rendezvous::DoneCallback done_cb = std::bind(
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:600:18: note: candidate template ignored: disabled by 'enable_if' [with _Dummy = void]
_TCC<_Dummy>::template
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:611:18: note: candidate template ignored: disabled by 'enable_if' [with _Dummy = void]
_TCC<_Dummy>::template
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:628:5: note: candidate template ignored: disabled by 'enable_if' [with _UElements = <const tensorflow::Status &, const tensorflow::Rendezvous::Args &, const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool>]
_TC<sizeof...(_UElements) == 1, _Elements...>::template
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:641:5: note: candidate template ignored: disabled by 'enable_if' [with _UElements = <const tensorflow::Status &, const tensorflow::Rendezvous::Args &, const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool>]
_TC<sizeof...(_UElements) == 1, _Elements...>::template
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:737:19: note: candidate template ignored: disabled by 'enable_if' [with _Alloc = tensorflow::Rendezvous::Args, _UElements = <const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool>]
enable_if<_TMC<_UElements...>::template
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:748:19: note: candidate template ignored: disabled by 'enable_if' [with _Alloc = tensorflow::Rendezvous::Args, _UElements = <const tensorflow::Rendezvous::Args &, const tensorflow::Tensor &, bool>]
enable_if<_TMC<_UElements...>::template
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:579:17: note: candidate constructor template not viable: requires 0 arguments, but 5 were provided
constexpr tuple()
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:589:26: note: candidate constructor template not viable: requires 0 arguments, but 5 were provided
explicit constexpr tuple()
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:670:19: note: candidate constructor template not viable: requires single argument '__in', but 5 arguments were provided
constexpr tuple(const tuple<_UElements...>& __in)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:682:28: note: candidate constructor template not viable: requires single argument '__in', but 5 arguments were provided
explicit constexpr tuple(const tuple<_UElements...>& __in)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:694:19: note: candidate constructor template not viable: requires single argument '__in', but 5 arguments were provided
constexpr tuple(tuple<_UElements...>&& __in)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:705:28: note: candidate constructor template not viable: requires single argument '__in', but 5 arguments were provided
explicit constexpr tuple(tuple<_UElements...>&& __in)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:721:2: note: candidate constructor template not viable: requires 7 arguments, but 5 were provided
tuple(allocator_arg_t __tag, const _Alloc& __a,
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:732:11: note: candidate constructor template not viable: requires 7 arguments, but 5 were provided
explicit tuple(allocator_arg_t __tag, const _Alloc& __a,
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:711:2: note: candidate constructor template not viable: requires 2 arguments, but 5 were provided
tuple(allocator_arg_t __tag, const _Alloc& __a)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:759:2: note: candidate constructor template not viable: requires 3 arguments, but 5 were provided
tuple(allocator_arg_t __tag, const _Alloc& __a, const tuple& __in)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:763:2: note: candidate constructor template not viable: requires 3 arguments, but 5 were provided
tuple(allocator_arg_t __tag, const _Alloc& __a, tuple&& __in)
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:772:2: note: candidate constructor template not viable: requires 3 arguments, but 5 were provided
tuple(allocator_arg_t __tag, const _Alloc& __a,
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:784:11: note: candidate constructor template not viable: requires 3 arguments, but 5 were provided
explicit tuple(allocator_arg_t __tag, const _Alloc& __a,
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:796:2: note: candidate constructor template not viable: requires 3 arguments, but 5 were provided
tuple(allocator_arg_t __tag, const _Alloc& __a,
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:808:11: note: candidate constructor template not viable: requires 3 arguments, but 5 were provided
explicit tuple(allocator_arg_t __tag, const _Alloc& __a,
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:654:17: note: candidate constructor not viable: requires 1 argument, but 5 were provided
constexpr tuple(tuple&&) = default;
^
/usr/lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/tuple:652:17: note: candidate constructor not viable: requires 1 argument, but 5 were provided
constexpr tuple(const tuple&) = default;
^
1 error generated.
Target //tensorflow:libtensorflow_cc.so failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1003.058s, Critical Path: 53.74s
FAILED: Build did NOT complete successfully

Allow user device compiler flags to be set more easily

At the moment, device compiler flags are set internally, and it's hard for users to provide custom flags, which might be necessary under some circumstances. We should add a user flags option which can (optionally) be provided to allow, for example, 32-bit SPIR on a 64-bit device (see #19).

Custom OpenCL include dirs are not passed to the device compiler

When using the following line with a custom OpenCL include dir:

cmake .. -DCOMPUTECPP_PACKAGE_ROOT_DIR=$HOME/ComputeCPP/ComputeCpp-CE-0.1-Linux -DOpenCL_INCLUDE_DIR=/usr/local/include

The OpenCL include dir path is not passed to the device compiler, as you can see in the following compute++ command from the CMake-generated makefile:

cd $HOME/ComputeCPP/computecpp-sdk/samples/build/images && $HOME/ComputeCPP/ComputeCpp-CE-0.1-Linux/bin/compute++ -O2 -mllvm -inline-threshold=1000 -sycl -intelspirmetadata -emit-llvm -isystem $HOME/ComputeCPP/ComputeCpp-CE-0.1-Linux/include/ -I$HOME/ComputeCPP/ComputeCpp-CE-0.1-Linux/include -I$HOME/ComputeCPP/ComputeCpp-CE-0.1-Linux/include -o $HOME/ComputeCPP/computecpp-sdk/samples/build/images/images.cpp.sycl -c $HOME/ComputeCPP/computecpp-sdk/samples/images/images.cpp

Debugging kernel creation failure on Intel GPU w/Beignet driver (tl;dr use Intel Neo unified driver)

Trying to get OpenCL builds on top of TensorFlow, I am running into that kind of failure:

alex@portable-alex:~/tmp/deepspeech/sycl_eigen_hack$ LC_ALL=C ./deepspeech ../models/output_graph.pb ../audio/2830-3980-0043.wav ../models/alphabet.txt
2017-12-21 01:31:01.407356: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
Platform name intel gen ocl driver
2017-12-21 01:31:01.476690: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:66] Found following OpenCL devices:
2017-12-21 01:31:01.476719: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:68] id: 0, type: GPU, name: Intel(R) HD Graphics 5500 BroadWell U-Processor GT2, vendor: Intel, profile: FULL_PROFILE
One module without kernel function!
terminate called after throwing an instance of 'cl::sycl::cl_exception'
  what():  Error: [ComputeCpp:RT0101] Failed to create kernel ((Kernel Name: SYCL_cac2b3592d2272412db5415963f17f08_0))
Abandon (core dumped)
alex@portable-alex:~/tmp/deepspeech/sycl_eigen_hack$

I had to force-enable support for Intel GPUs with Beignet, as suggested in #78, but I am seeing the very same error (same kernel name, same error code) on an NVIDIA GTX1080 card. Now, I understand that neither is really expected to work, as documented here: #78 (comment) for NVIDIA, and the blacklist was likely there for a reason.

However, I'd like to dig deeper and understand why this is failing, especially in case it is not related to hardware/driver support but rather to the model itself. So far, searching the SDK source for this RT0101 error code has not been helpful at all, and I could not find any documentation on how to debug ComputeCpp kernel creation further.

Thanks for any debugging pointers, docs and tips :)

Capturing a sycl::cl_float16 variable in lambda kernel causes program to crash

Hi,

I'm writing a SYCL function to calculate the Euclidean distance between one 2D point and every point from a large 2D point set.

typedef std::vector<float, aligned_allocator<float, 64U>> aligned_float_vector;
typedef std::vector<cv::Point2f, aligned_allocator<cv::Point2f, 64U>> aligned_point2f_vector;

static boost::optional<sycl::device> cl_device;
static boost::optional<sycl::queue> cl_queue;

aligned_float_vector batch_distance_sycl(cv::Point2f p, const aligned_point2f_vector& d)
{
        // d.size() is always a multiple of 8
	aligned_float_vector result(d.size());
	if (!cl_queue)
	{
		sycl::intel_selector selector;
		cl_device = selector.select_device();
		cl_queue = sycl::queue(cl_device.get());
	}
	{
		sycl::range<1> input_size(d.size() / 8);
		sycl::property_list properties = sycl::property_list{ sycl::property::buffer::use_host_ptr() };
		sycl::buffer<const sycl::cl_float16, 1> in_buf((const sycl::cl_float16*)&d[0], input_size, properties);
		sycl::buffer<sycl::cl_float8, 1> out_buf((sycl::cl_float8*)&result[0], input_size, properties);
		sycl::cl_float16 p_s(p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y);
		cl_queue->submit([&](sycl::handler& cgh) {

			auto in_acc = in_buf.template get_access<sycl::access::mode::read>(cgh);
			auto out_acc = out_buf.template get_access<sycl::access::mode::discard_write>(cgh);

			cgh.parallel_for<class batch_distance_kernel>(input_size, [=](sycl::id<1> id) {
				sycl::cl_float16 diff = in_acc[id] - p_s;
				sycl::cl_float16 sqdiff = diff * diff;
				out_acc[id] = sycl::native_sqrt(sycl::cl_float8(sqdiff.s0() + sqdiff.s1(), sqdiff.s2() + sqdiff.s3(), sqdiff.s4() + sqdiff.s5(), sqdiff.s6() + sqdiff.s7()
					, sqdiff.s8() + sqdiff.s9(), sqdiff.sA() + sqdiff.sB(), sqdiff.sC() + sqdiff.sD(), sqdiff.sE() + sqdiff.sF()));
			});
		});
	}
	return result;
}

The above program crashes at run time (platform: Win10, VS2015, ComputeCpp 0.6.0), but if I do not use p_s inside the lambda function, it does not crash:

// sycl::cl_float16 diff = in_acc[id] - p_s;
sycl::cl_float16 diff = in_acc[id] - sycl::cl_float16(p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y, p.x, p.y);                               
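
A hedged alternative workaround (my own suggestion, not from the original post): capture the two scalar coordinates instead of the cl_float16 and rebuild the vector inside the kernel, so no vector object has to be captured into the lambda.

float px = p.x;
float py = p.y;
cgh.parallel_for<class batch_distance_kernel>(input_size, [=](sycl::id<1> id) {
	// Rebuild the broadcast point on the device instead of capturing p_s.
	sycl::cl_float16 p_s(px, py, px, py, px, py, px, py,
	                     px, py, px, py, px, py, px, py);
	sycl::cl_float16 diff = in_acc[id] - p_s;
	// ... rest of the kernel as before ...
});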

CMake integration for ComputeCpp

We have been very happy using CMake scripts for building SYCL apps with ComputeCpp in our project. It simplifies integration with existing codebases significantly, but our experience suggests that the current approach is not perfect. I don't know the rationale behind this design, but so far I have seen two ways of providing CMake integration in open-source projects:

  • CMake config file - in a third-party project the user only has to pass one install directory where the CMake config file can be located; variables are read from it and later used to find headers, libraries, compilation and linking flags, etc.
  • CMake find_package script

The first option is obviously not applicable in this scenario, because the interesting part is the code for adding targets that build SYCL code. The second option is convenient as long as the script is freely available, either already distributed with CMake itself or shipped with the package - then integration is built on top of one user-defined installation directory. In the case of ComputeCpp, the location of the script is completely independent of ComputeCpp's install location. Therefore, a user of HPX has to either modify CMAKE_MODULE_PATH on the command line or pass two variables, one used to extend the module path internally and one used to locate ComputeCpp. It's a very non-standard solution and quite inconvenient.

We could have included FindComputeCpp in our project and distributed it with HPX, but that is not our preferred option - we would have to keep watching your repo so as not to miss updates, and work out possible licensing issues; that may not be a problem, but we are always cautious about adding third-party code to HPX that is not under the Boost license.

Is there a specific reason why the CMake scripts are distributed with the SDK? We believe it would be perfectly reasonable to include them in the ComputeCpp installation package, which would allow SYCL-based projects to configure their builds with just one user-defined path.

Strange workaround in FindComputeCpp.cmake

The following piece of code from FindComputeCpp.cmake looks weird. The force-include flags, used in properties later on, should be lists, not strings. However, the comment seems to indicate that the Visual Studio generator requires strings rather than lists. This could be a bug in the CMake generator:

    # NOTE: The Visual Studio generators parse compile flags differently,
    # hence the different argument syntax
    if(CMAKE_GENERATOR MATCHES "Visual Studio")
      set(forceIncludeFlags "/FI\"${includedFile}\" /TP")
    else()
      set(forceIncludeFlags /FI ${includedFile} /TP)
    endif()   

FindComputeCpp sets C++11 on the host compiler, overriding defaults

The FindComputeCpp module sets the C++11 flags on the host compiler, overriding the user settings:

# Set the host compiler C++ standard to C++11
set_property(TARGET ${targetName} PROPERTY CXX_STANDARD 11) 

This should only be set if no C++ standard flag is already present (or not set at all, trusting the user).

Unable to build sdk samples: libComputeCpp.so: undefined reference to `clReleaseSampler@OPENCL_1.0'

I am trying to build the samples following the getting-started-guide:

The first cmake step goes well:

% cmake ../ -DCOMPUTECPP_PACKAGE_ROOT_DIR=/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is GNU 5.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- host compiler - gcc 5.4.0
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/local/lib/libOpenCL.so (found version "2.0") 
-- ComputeCpp package - Found
-- compute++ - Found
-- computecpp_info - Found
-- ComputeCpp runtime: /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so - Found
-- ComputeCpp includes - Found
-- Package version - CE 0.8.0
-- compute++ flags - -O2 -mllvm -inline-threshold=1000 -sycl -emit-llvm -intelspirmetadata
-- platform - your system can support ComputeCpp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-sdk/build

but then:

% make
Scanning dependencies of target accessors_accessors.cpp_0_ih
[  1%] Building ComputeCpp integration header file /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-sdk/build/samples/accessors/accessors.cpp.sycl
[  1%] Built target accessors_accessors.cpp_0_ih
Scanning dependencies of target accessors
[  2%] Building CXX object samples/accessors/CMakeFiles/accessors.dir/accessors.cpp.o
[  4%] Linking CXX executable accessors
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseSampler@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueWriteImage@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseMemObject@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainEvent@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clSetEventCallback@OPENCL_1.1'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetKernelWorkGroupInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clFlush@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseKernel@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseCommandQueue@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseEvent@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueMapImage@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueWriteBuffer@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueCopyBufferRect@OPENCL_1.1'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateBuffer@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clFinish@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueReadImage@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueCopyBuffer@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clBuildProgram@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueFillBuffer@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetContextInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetKernelInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseDevice@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clSetKernelArg@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainKernel@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateProgramWithBinary@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateUserEvent@OPENCL_1.1'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateContext@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseProgram@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetDeviceIDs@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueCopyImage@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainDevice@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueFillImage@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetEventProfilingInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueUnmapMemObject@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetPlatformInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainCommandQueue@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateProgramWithSource@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateSampler@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateKernel@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetDeviceInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clReleaseContext@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetEventInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetImageInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainContext@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCompileProgram@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateCommandQueue@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueNDRangeKernel@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clSetUserEventStatus@OPENCL_1.1'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clWaitForEvents@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueReadBuffer@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueMapBuffer@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetMemObjectInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateImage@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueWriteBufferRect@OPENCL_1.1'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainProgram@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clLinkProgram@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateSubDevices@OPENCL_1.2'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetProgramBuildInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetPlatformIDs@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetProgramInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clCreateSubBuffer@OPENCL_1.1'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clRetainMemObject@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clGetCommandQueueInfo@OPENCL_1.0'
/home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so: undefined reference to `clEnqueueReadBufferRect@OPENCL_1.1'
collect2: error: ld returned 1 exit status
samples/accessors/CMakeFiles/accessors.dir/build.make:96: recipe for target 'samples/accessors/accessors' failed
make[2]: *** [samples/accessors/accessors] Error 1
CMakeFiles/Makefile2:199: recipe for target 'samples/accessors/CMakeFiles/accessors.dir/all' failed
make[1]: *** [samples/accessors/CMakeFiles/accessors.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

Checking OpenCL info:

% clinfo
Number of platforms                               1
  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2580.4)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx900
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 1.2 AMD-APP (2580.4)
  Driver Version                                  2580.4 (PAL,HSAIL)
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         Radeon RX Vega
  Device Topology (AMD)                           PCI-E, 03:00.0
  Max compute units                               56
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1590MHz
  Graphics IP (AMD)                               9.0
  Device Partition                                (core)
    Max number of sub-devices                     56
    Supported partition types                     none specified
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              8573157376 (7.984GiB)
  Global free memory (AMD)                        8370112 (7.982GiB)
  Global memory channels (AMD)                    64
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           4244635648 (3.953GiB)
  Unified memory for Host and Device              No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               32768 (32KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max constant buffer size                        4244635648 (3.953GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1529492751692737778ns (Wed Jun 20 13:05:51 2018)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx900
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 AMD Accelerated Parallel Processing
    Device Name                                   gfx900

I wonder if the device is the problem (Vega56) as it is marked as UNTESTED:

% ../../computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/bin/computecpp_info                                                    
********************************************************************************

ComputeCpp Info (CE 0.8.0)

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : gfx900
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2580.4 (PAL,HSAIL)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.8.0/platform-support-notes

********************************************************************************

My system:

%lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 16.04.4 LTS
Release:	16.04
Codename:	xenial
% uname -r
4.13.0-45-generic

Any hints would be greatly appreciated!

-std=gnu++11 gets attached automatically

This is not exactly a computecpp-sdk issue, but I am unable to change the -std compiler flag as it gets overwritten by a default set (as I understand it) by ComputeCpp. My CMakeLists.txt:

cmake_minimum_required(VERSION 3.2.2)
project(Tutorial)
set(CMAKE_CXX_STANDARD 11)

set(CMAKE_CXX_COMPILER "/home/dom/computecpp-ce-0.6.0/bin/compute++")

set(COMPUTECPP_PACKAGE_ROOT_DIR /home/dom/computecpp-ce-0.6.0)

add_executable(main ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)

target_compile_options(main PRIVATE "-std=gnu++14")

Make verbose:

/home/dom/computecpp-ce-0.6.0/bin/compute++     -std=gnu++14 -std=gnu++11 -o CMakeFiles/main.dir/main.cpp.o -c /home/dom/playground/sycl/main.cpp

The -std=gnu++14 gets overwritten by -std=gnu++11 and I cannot figure out where it's coming from. Any advice?

Support for find_package in CMake

To uphold standard practice in the CMake community, it would be good to define a finder module for ComputeCpp compatible with find_package. This module would define COMPUTECPP_INCLUDE_DIR, COMPUTECPP_LIBRARY, etc. Due to the nature of compiling SYCL code with a device compiler, we also need a set of helper functions that make adding SYCL targets easy. These can be defined in a second CMake module. However, it is currently not easy to obtain an integration header for manual inclusion, which makes complex CMake build scripts more complicated than necessary. For this reason, we could provide an add_sycl_integration_header function, which would set a user-provided variable to the path of the integration header. add_sycl_to_target would then use this function internally, but also include the integration header for the user.

Error following Integration Guide using CMake

I was following the integration guide for using ComputeCpp with CMake.

After running the cmake command for the project, I get:

-- platform - your system can support ComputeCpp
-- compute++ flags - -O2;-mllvm;-inline-threshold=1000;-sycl;-intelspirmetadata;-sycl-target;spir64
CMake Error at /home/akshit/Libraries/computecpp-sdk/cmake/Modules/FindComputeCpp.cmake:420 (target_link_libraries):
  Cannot specify link libraries for target "PUBLIC" which is not built by
  this project.
Call Stack (most recent call first):
  CMakeLists.txt:13 (add_sycl_to_target)


-- Configuring incomplete, errors occurred!
See also "/home/akshit/Projects/012_SYCL_Hello_World/build/CMakeFiles/CMakeOutput.log".

Can someone help me compile a simple hello world with CMake?

Currently, my whole CMakeLists.txt file is as follows:

project(syclProgram)

cmake_minimum_required(VERSION 3.2.2)

set(CMAKE_MODULE_PATH /home/akshit/Libraries/computecpp-sdk/cmake/Modules/)

include(FindComputeCpp)

include_directories(${COMPUTECPP_INCLUDE_DIRECTORY})

add_executable(syclProgram ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp)

add_sycl_to_target(syclProgram 
     ${CMAKE_CURRENT_SOURCE_DIR}/main.cpp
     ${CMAKE_CURRENT_BINARY_DIR}
     )

Thanks

vec type conversion error in device code

Hi,

I created a template function to read from an accessor with an id range check (when the id is out of range, it returns a default-constructed value instead):

template<typename AccessorT>
inline typename AccessorT::value_type guarded_read(AccessorT& acc, const sycl::id<2>& id)
{
	auto range = acc.get_range();
	return (id[0] < range[0] && id[1] < range[1]) ? acc[id] : typename AccessorT::value_type();
}

This function works when AccessorT::value_type is a scalar type, but when AccessorT::value_type is a vec type such as cl::sycl::cl_uchar16, the following compiler error occurs:

1>  C:/Develop/TestSYCL/TestSYCL.cpp:36:9: error: no viable conversion from returned value of type 'unsigned char __attribute__((ext_vector_type(16)))' (vector of 16 'unsigned char' values) to function return type 'const cl::sycl::vec<unsigned char, 16>'
1>          return (id[0] < range[0] && id[1] < range[1]) ? acc[id] : typename AccessorT::value_type();
1>                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1>  C:/Program Files/Codeplay/ComputeCpp/include\SYCL/vec.h:9461:3: note: candidate constructor not viable: no known conversion from 'unsigned char __attribute__((ext_vector_type(16)))' (vector of 16 'unsigned char' values) to 'const vec<unsigned char, 16> &' for 1st argument
1>    vec(const vec<dataT, kElems> &rhs) {
1>    ^
1>  C:/Program Files/Codeplay/ComputeCpp/include\SYCL/vec.h:9437:3: note: candidate template ignored: could not match 'swizzled_vec<unsigned char, kElemsRhs, kIndexRhsN...>' against 'unsigned char __attribute__((ext_vector_type(16)))' (vector of 16 'unsigned char' values)
1>    vec(const swizzled_vec<dataT, kElemsRhs, kIndexRhsN...> &rhs) {

I'm using ComputeCpp 0.5.0 with Visual Studio 2015.

Update: the following variations of the function do compile without errors - both avoid letting the conditional operator deduce the built-in ext_vector_type (which, as the error shows, has no viable conversion back to cl::sycl::vec) as its result:

template<typename AccessorT>
inline typename AccessorT::value_type guarded_read(AccessorT& acc, const sycl::id<2>& id)
{
	auto range = acc.get_range();
	if (id[0] < range[0] && id[1] < range[1])
		return acc[id];
	else
		return typename AccessorT::value_type();
}
template<typename AccessorT>
inline typename AccessorT::value_type guarded_read(AccessorT& acc, const sycl::id<2>& id)
{
	auto range = acc.get_range();
	return (id[0] < range[0] && id[1] < range[1]) ? typename AccessorT::value_type(acc[id]) : typename AccessorT::value_type();
}

Segfault on TF tutorials

ComputeCpp Info (CE 0.5.0)
GLIBC version: 2.26
GLIBCXX: 20160609
This version of libstdc++ is supported.

Device Info:
Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : Loveland
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 1800.11
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 
Device 1:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : AMD E-350 Processor
  CL_DEVICE_VENDOR                        : AuthenticAMD
  CL_DRIVER_VERSION                       : 1800.11 (sse2)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU 

Gdb backtrace with MNIST

#0  0x00007f91d284712d in ?? () from /usr/lib/libamdocl12cl64.so
#1  0x00007f91d1e67eb9 in ?? () from /usr/lib/libamdocl12cl64.so
#2  0x00007f91d1fe6e1d in ?? () from /usr/lib/libamdocl12cl64.so
#3  0x00007f91d1e635be in ?? () from /usr/lib/libamdocl12cl64.so
#4  0x00007f91d283978f in ?? () from /usr/lib/libamdocl12cl64.so
#5  0x00007f91d2839b83 in ?? () from /usr/lib/libamdocl12cl64.so
#6  0x00007f91d2839e3f in ?? () from /usr/lib/libamdocl12cl64.so
#7  0x00007f91d2839f7c in ?? () from /usr/lib/libamdocl12cl64.so
#8  0x00007f91d194fb0a in ?? () from /usr/lib/libamdocl12cl64.so
#9  0x00007f91d194fdb0 in ?? () from /usr/lib/libamdocl12cl64.so
#10 0x00007f91d195c60a in ?? () from /usr/lib/libamdocl12cl64.so
#11 0x00007f91d195e71c in ?? () from /usr/lib/libamdocl12cl64.so
#12 0x00007f91b52578c9 in aclCompile () from /usr/lib/libamdocl64.so
#13 0x00007f91b495a0c5 in ?? () from /usr/lib/libamdocl64.so
#14 0x00007f91b497e3dc in ?? () from /usr/lib/libamdocl64.so
#15 0x00007f91b492802f in ?? () from /usr/lib/libamdocl64.so
#16 0x00007f91b4938120 in ?? () from /usr/lib/libamdocl64.so
#17 0x00007f91b49190e0 in clBuildProgram () from /usr/lib/libamdocl64.so
#18 0x00007f91dc5b465b in clBuildProgram () from /usr/lib/libOpenCL.so.1
#19 0x00007f91dcb9af00 in cl::sycl::detail::program::build_current_program(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#20 0x00007f91dcb9b26e in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#21 0x00007f91dcb9b4dd in cl::sycl::detail::program::create_program_for_binary(unsigned char const*, int, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#22 0x00007f91dcb1da6e in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /opt/ComputeCpp-CE-0.5.0-Ubuntu-16.04-64bit/lib/libComputeCpp.so
#23 0x00007f91e162cdf7 in cl::sycl::program cl::sycl::program::create_program_for_kernel<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> > >(cl::sycl::context) ()
   from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#24 0x00007f91e162c7fc in void cl::sycl::handler::parallel_for_impl<tensorflow::functor::FillRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float> >, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> >(cl::sycl::detail::nd_range_base const&, tensorflow::functor::FillPhiloxRandomKernel<tensorflow::random::TruncatedNormalDistribution<tensorflow::random::SingleSampleAdapter<tensorflow::random::PhiloxRandom>, float>, true> const&) () from /usr/lib/python3.6/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so

Memory issues in matrix_multiply example

Reproduced this on both Hawaii and Fiji GPUs. When the given matrix size is not a power of two, it fails.
Steps to reproduce:

$ git clone https://github.com/codeplaysoftware/computecpp-sdk
$ cd computecpp-sdk/samples
$ cmake . -DCOMPUTECPP_PACKAGE_ROOT_DIR=/usr/local
$ cd matrix_multiply
$ ./matrix_multiply 32
 Input matrix 
C++: Time: 0
GFLOPs: inf
 The Device Max Work Group Size is : 256
 The order is : 32
 The blockSize is : 8
SYCL: Time: 24
GFLOPs: 0.00273067
 Output 
Success
$ ./matrix_multiply 33
 Input matrix 
C++: Time: 1
GFLOPs: 0.071874
*** Error in `./matrix_multiply': free(): invalid next size (normal): 0x00000000015a7d40 ***
Aborted (core dumped)

I am hoping this is only a problem in the sample?
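
I have not dug into the sample's kernel, but a common cause of this kind of heap corruption when the size is not a power of two is rounding the global range up to a multiple of the work-group size without guarding the padded work-items, which then read or write past the end of the buffers. A minimal, self-contained sketch of such a guard (hypothetical kernel, not the sample's actual code):

#include <CL/sycl.hpp>
#include <iostream>
#include <vector>

class guarded_fill;  // forward-declared kernel name

int main() {
  namespace sycl = cl::sycl;

  constexpr size_t n = 33;      // deliberately not a power of two
  constexpr size_t wgSize = 8;  // work-group size
  // Round the global size up to the next multiple of the work-group size.
  constexpr size_t globalSize = ((n + wgSize - 1) / wgSize) * wgSize;

  std::vector<int> data(n, -1);
  {
    sycl::queue q;
    sycl::buffer<int, 1> buf(data.data(), sycl::range<1>(n));
    q.submit([&](sycl::handler& cgh) {
      auto acc = buf.get_access<sycl::access::mode::write>(cgh);
      cgh.parallel_for<guarded_fill>(
          sycl::nd_range<1>(sycl::range<1>(globalSize), sycl::range<1>(wgSize)),
          [=](sycl::nd_item<1> item) {
            const size_t i = item.get_global_id(0);
            if (i >= n) {
              return;  // padded work-item: acc[i] would be out of bounds
            }
            acc[i] = static_cast<int>(i);
          });
    });
  }  // buffer destructor copies the result back into 'data'
  std::cout << "data[n - 1] = " << data[n - 1] << std::endl;  // expect 32
  return 0;
}

If the sample's kernel (or its host-side verification loop) is missing a guard like this, that would be consistent with the corruption above, but I have not confirmed it.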

Building samples error

I have been following the guide on building the samples, and running cmake works fine. However, make fails with the following output:

[  2%] Building ComputeCpp integration header file /hdd/repos/computecpp-sdk/build/samples/simple-local-barrier/simple-local-barrier.cpp.sycl
compute++: error: no such file or directory: 'Error:'
compute++: error: no such file or directory: 'Fail'
compute++: error: no such file or directory: 'to'
compute++: error: no such file or directory: 'load'
compute++: error: no such file or directory: 'fglrx'
compute++: error: no such file or directory: 'kernel'
compute++: error: no such file or directory: 'module!'
compute++: error: no input files
make[2]: *** [samples/simple-local-barrier/simple-local-barrier.cpp.sycl] Error 1
make[1]: *** [samples/simple-local-barrier/CMakeFiles/simple-local-barrier_simple-local-barrier.cpp_0_ih.dir/all] Error 2
make: *** [all] Error 2

I'm unsure what direction to head in; I tried removing and reinstalling fglrx via apt, to no avail. Any help would be appreciated.

Make sample that has both synchronous and asynchronous exceptions

SYCL code can have both synchronous and asynchronous exceptions. While our samples demonstrate async handlers well, errors that happen synchronously are shown less often and handled even less thoroughly. To that end, we should add a sample that shows some of the ways synchronous exceptions can be thrown (e.g. a device selector that returns a negative number for all devices, or contexts constructed from invalid OpenCL contexts - there are lots of possibilities).
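
A minimal sketch of what such a sample could look like (illustration only, not the final sample; the selector name is made up):

#include <CL/sycl.hpp>
#include <iostream>

namespace sycl = cl::sycl;

// A selector that rejects every device: constructing a queue with it makes
// the runtime throw a *synchronous* exception.
class reject_all_selector : public sycl::device_selector {
 public:
  int operator()(const sycl::device&) const override { return -1; }
};

int main() {
  // Synchronous exception: no device receives a non-negative score, so the
  // queue constructor throws immediately.
  try {
    sycl::queue bad_queue{reject_all_selector{}};
  } catch (const sycl::exception& e) {
    std::cout << "Synchronous exception caught: " << e.what() << std::endl;
  }

  // Asynchronous exceptions are collected by the handler passed to the queue
  // and only surface when wait_and_throw() / throw_asynchronous() is called.
  auto async_handler = [](sycl::exception_list errors) {
    for (const std::exception_ptr& e : errors) {
      try {
        std::rethrow_exception(e);
      } catch (const sycl::exception& err) {
        std::cout << "Asynchronous exception caught: " << err.what() << std::endl;
      }
    }
  };

  sycl::queue q{sycl::default_selector{}, async_handler};
  // ... submit command groups here ...
  q.wait_and_throw();
  return 0;
}

The synchronous path throws from the queue constructor because no device is acceptable to the selector, while the asynchronous path only reports errors once wait_and_throw() (or throw_asynchronous()) is called.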

Program execution on GPU throwing "cl::sycl::detail::exception_implementation"

After the setup performed here, I am able to build the SDK samples successfully (/usr/local/lib/libOpenCL.so -> /opt/rocm/opencl/lib/x86_64/libamdocl64.so).

Now, executing the binaries on an AMD Vega 56 throws the following:

./hello-world 
Running on gfx900
terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> >'
[1]    30893 abort (core dumped)  ./hello-world

This is what I get using gdb and "thread apply all bt":

% gdb ./hello-world
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./hello-world...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-sdk/build/samples/hello-world/hello-world 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff1494700 (LWP 31388)]
[New Thread 0x7ffff0c93700 (LWP 31389)]
[New Thread 0x7ffff0492700 (LWP 31390)]
[New Thread 0x7fffefc91700 (LWP 31391)]
[New Thread 0x7fffef490700 (LWP 31392)]
[New Thread 0x7fffee44f700 (LWP 31395)]
[New Thread 0x7fffec944700 (LWP 31396)]
[New Thread 0x7fffe7fff700 (LWP 31397)]
[New Thread 0x7fffe77fe700 (LWP 31398)]
[New Thread 0x7fffe6ffd700 (LWP 31399)]
[New Thread 0x7ffff7fa7700 (LWP 31400)]
Running on gfx900
terminate called after throwing an instance of 'cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> >'

Thread 1 "hello-world" received signal SIGABRT, Aborted.
0x00007ffff2d2f428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) thread apply all bt

Thread 12 (Thread 0x7ffff7fa7700 (LWP 31400)):
#0  0x00007ffff2aec827 in futex_abstimed_wait_cancelable (private=0, abstime=0x0, expected=0, futex_word=0x7b6848)
    at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7b6848, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007ffff2aec8d4 in __new_sem_wait_slow (sem=0x7b6848, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007ffff2aec97a in __new_sem_wait (sem=<optimized out>) at sem_wait.c:29
#4  0x00007ffff3b7b710 in ?? () from /usr/local/lib/libOpenCL.so
#5  0x00007ffff3b79556 in ?? () from /usr/local/lib/libOpenCL.so
#6  0x00007ffff3b6ff3f in ?? () from /usr/local/lib/libOpenCL.so
#7  0x00007ffff3b704fd in ?? () from /usr/local/lib/libOpenCL.so
#8  0x00007ffff3ad9d4f in ?? () from /usr/local/lib/libOpenCL.so
#9  0x00007ffff3b79b5c in ?? () from /usr/local/lib/libOpenCL.so
#10 0x00007ffff2ae46ba in start_thread (arg=0x7ffff7fa7700) at pthread_create.c:333
#11 0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 11 (Thread 0x7fffe6ffd700 (LWP 31399)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff5d8730d in ?? () from /usr/local/lib/libOpenCL.so
#2  0x00007ffff5ca90aa in ?? () from /usr/local/lib/libOpenCL.so
#3  0x00007ffff2ae46ba in start_thread (arg=0x7fffe6ffd700) at pthread_create.c:333
#4  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 10 (Thread 0x7fffe77fe700 (LWP 31398)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff5d8730d in ?? () from /usr/local/lib/libOpenCL.so
#2  0x00007ffff5ca90aa in ?? () from /usr/local/lib/libOpenCL.so
#3  0x00007ffff2ae46ba in start_thread (arg=0x7fffe77fe700) at pthread_create.c:333
#4  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 9 (Thread 0x7fffe7fff700 (LWP 31397)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff5d8730d in ?? () from /usr/local/lib/libOpenCL.so
#2  0x00007ffff5ca90aa in ?? () from /usr/local/lib/libOpenCL.so
#3  0x00007ffff2ae46ba in start_thread (arg=0x7fffe7fff700) at pthread_create.c:333
#4  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 8 (Thread 0x7fffec944700 (LWP 31396)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff5d8730d in ?? () from /usr/local/lib/libOpenCL.so
#2  0x00007ffff5ca8d52 in ?? () from /usr/local/lib/libOpenCL.so
#3  0x00007ffff2ae46ba in start_thread (arg=0x7fffec944700) at pthread_create.c:333
#4  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 7 (Thread 0x7fffee44f700 (LWP 31395)):
#0  0x00007ffff2df6f47 in ioctl () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff1f03f88 in ?? () from /opt/rocm/libhsakmt/lib/libhsakmt.so.1
#2  0x00007ffff1efe42f in hsaKmtWaitOnMultipleEvents () from /opt/rocm/libhsakmt/lib/libhsakmt.so.1
#3  0x00007ffff287aac3 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#4  0x00007ffff2867006 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#5  0x00007ffff2876ad2 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#6  0x00007ffff2841817 in ?? () from /opt/rocm/hsa/lib/libhsa-runtime64.so.1
#7  0x00007ffff2ae46ba in start_thread (arg=0x7fffee44f700) at pthread_create.c:333
#8  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 6 (Thread 0x7fffef490700 (LWP 31392)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff338d91c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff7a8275e in std::thread::_Impl<std::_Bind_simple<cl::sycl::detail::worker_thread::start()::{lambda()#1} ()> >::_M_run() ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#3  0x00007ffff3392c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff2ae46ba in start_thread (arg=0x7fffef490700) at pthread_create.c:333
#5  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 5 (Thread 0x7fffefc91700 (LWP 31391)):
---Type <return> to continue, or q <return> to quit---
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff338d91c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff7a8275e in std::thread::_Impl<std::_Bind_simple<cl::sycl::detail::worker_thread::start()::{lambda()#1} ()> >::_M_run() ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#3  0x00007ffff3392c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff2ae46ba in start_thread (arg=0x7fffefc91700) at pthread_create.c:333
#5  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7ffff0492700 (LWP 31390)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff338d91c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff7a8275e in std::thread::_Impl<std::_Bind_simple<cl::sycl::detail::worker_thread::start()::{lambda()#1} ()> >::_M_run() ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#3  0x00007ffff3392c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff2ae46ba in start_thread (arg=0x7ffff0492700) at pthread_create.c:333
#5  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (Thread 0x7ffff0c93700 (LWP 31389)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00007ffff338d91c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#2  0x00007ffff7a8275e in std::thread::_Impl<std::_Bind_simple<cl::sycl::detail::worker_thread::start()::{lambda()#1} ()> >::_M_run() ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#3  0x00007ffff3392c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff2ae46ba in start_thread (arg=0x7ffff0c93700) at pthread_create.c:333
#5  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7ffff1494700 (LWP 31388)):
#0  0x00007ffff2de4827 in sched_yield () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007ffff7ab4e49 in cl::sycl::detail::scheduler::scheduler_loop() ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#2  0x00007ffff3392c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff2ae46ba in start_thread (arg=0x7ffff1494700) at pthread_create.c:333
#4  0x00007ffff2e0141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 1 (Thread 0x7ffff7fa9780 (LWP 31383)):
#0  0x00007ffff2d2f428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007ffff2d3102a in __GI_abort () at abort.c:89
#2  0x00007ffff336984d in __gnu_cxx::__verbose_terminate_handler() ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007ffff33676b6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007ffff3367701 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007ffff3367919 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00007ffff7ad39fd in void cl::sycl::detail::handle_sycl_log<cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)7, cl::sycl::detail::exception_implementation<(cl::sycl::detail::exception_types)6, cl::sycl::exception> > >(cl::sycl::detail::sycl_log&&) ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#7  0x00007ffff7acf0a3 in cl::sycl::detail::trigger_sycl_log(cl::sycl::log_type, char const*, int, int, cl::sycl::detail::cpp_error_code, cl::sycl::detail::context const*, char const*) ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#8  0x00007ffff7ae8bd7 in cl::sycl::detail::program::create_from_binary(unsigned char const*, unsigned long) ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#9  0x00007ffff7aeb9f6 in cl::sycl::detail::program::build(unsigned char const*, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
---Type <return> to continue, or q <return> to quit---
#10 0x00007ffff7adbe56 in cl::sycl::detail::context::create_program_for_binary(std::shared_ptr<cl::sycl::detail::context> const&, unsigned char const*, int, bool) ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#11 0x00007ffff7b0273f in cl::sycl::program::create_program_for_kernel_impl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>, bool) ()
   from /home/lvs/ESA/dev-ocladock-syclmaster/computecpp-installer/ComputeCpp-CE-0.8.0-Ubuntu-16.04-x86_64/lib/libComputeCpp.so
#12 0x000000000040bbda in cl::sycl::program cl::sycl::program::create_program_for_kernel<hello_world>(cl::sycl::context) ()
#13 0x0000000000409eb9 in void cl::sycl::handler::single_task_impl<hello_world, main::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda()#1}>(main::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda()#1} const&) ()
#14 0x0000000000409d4b in void cl::sycl::handler::single_task<hello_world, main::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda()#1}>(main::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const::{lambda()#1} const&) ()
#15 0x0000000000409b2d in main::{lambda(cl::sycl::handler&)#1}::operator()(cl::sycl::handler&) const ()
#16 0x000000000040a117 in cl::sycl::event cl::sycl::detail::command_group::submit_handler<main::{lambda(cl::sycl::handler&)#1}>(main::{lambda(cl::sycl::handler&)#1}, std::shared_ptr<cl::sycl::detail::queue> const&, cl::sycl::detail::standard_handler_tag) ()
#17 0x0000000000409dde in cl::sycl::event cl::sycl::queue::submit<main::{lambda(cl::sycl::handler&)#1}>(main::{lambda(cl::sycl::handler&)#1}) ()
#18 0x0000000000409c72 in main ()
(gdb) 

Moreover, pointing /usr/local/lib/libOpenCL.so at /opt/amdgpu-pro/lib/x86_64-linux-gnu/libamdocl-orca64.so instead results in full execution on the host device, but still not on our Vega 56 GPU:

% ./hello-world
Running on Host Device
Hello, World!
% ./matrix-multiply 32 sycl
 Input matrix 
 ***** SYCL 
 The Device Max Work Group Size is : 1024
 The order is : 32
 The blockSize is : 16
SYCL: Time: 51
GFLOPs: 0.00128502
 Output 
Success

I would really appreciate it if you could give some hints!

[Resolved] GCC libstdcxx ABI version issue on GCC 6.3.0

The initial CMake output:

-- The C compiler identification is GNU 6.3.0
-- The CXX compiler identification is GNU 6.3.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for CL_VERSION_2_0
-- Looking for CL_VERSION_2_0 - found
-- Found OpenCL: /usr/lib/libOpenCL.so (found version "2.0") 
-- host compiler - gcc 6.3.0
-- ComputeCpp package - Found
-- compute++ - Found
-- computecpp_info - Found
-- libComputeCpp.so - Found
-- ComputeCpp includes - Found
-- Package version - CE 0.2.0
-- compute++ flags - -O2 -mllvm -inline-threshold=1000 -sycl -intelspirmetadata -emit-llvm
-- platform - your system can support ComputeCpp
-- Configuring done
-- Generating done
-- Build files have been written to: /home/wojtek/Programming/C++/ccpp-sdk/samples/build

and the make output:

Scanning dependencies of target accessors_accessors.cpp_0_integration_header
[  1%] Building ComputeCpp integration header file /home/wojtek/Programming/C++/ccpp-sdk/samples/build/accessors/accessors.cpp.sycl
/home/wojtek/Programming/C++/ccpp/bin/compute++: /usr/lib/libtinfo.so.5: no version information available (required by /home/wojtek/Programming/C++/ccpp/bin/compute++)
/home/wojtek/Programming/C++/ccpp/bin/compute++: /usr/lib/libtinfo.so.5: no version information available (required by /home/wojtek/Programming/C++/ccpp/bin/compute++)
[  1%] Built target accessors_accessors.cpp_0_integration_header
Scanning dependencies of target accessors
[  3%] Building CXX object accessors/CMakeFiles/accessors.dir/accessors.cpp.o
[  5%] Linking CXX executable accessors
CMakeFiles/accessors.dir/accessors.cpp.o: In function `cl::sycl::program cl::sycl::program::create_program_for_kernel<multiply>(cl::sycl::context)':
accessors.cpp:(.text._ZN2cl4sycl7program25create_program_for_kernelI8multiplyEES1_NS0_7contextE[_ZN2cl4sycl7program25create_program_for_kernelI8multiplyEES1_NS0_7contextE]+0x1f5): undefined reference to `cl::sycl::program::create_program_for_kernel_impl(std::string, unsigned char const*, int, char const* const*, std::shared_ptr<cl::sycl::detail::context>)'
CMakeFiles/accessors.dir/accessors.cpp.o: In function `cl::sycl::kernel cl::sycl::program::get_kernel<multiply>() const':
accessors.cpp:(.text._ZNK2cl4sycl7program10get_kernelI8multiplyEENS0_6kernelEv[_ZNK2cl4sycl7program10get_kernelI8multiplyEENS0_6kernelEv]+0x59): undefined reference to `cl::sycl::program::get_kernel_impl(std::string) const'
accessors.cpp:(.text._ZNK2cl4sycl7program10get_kernelI8multiplyEENS0_6kernelEv[_ZNK2cl4sycl7program10get_kernelI8multiplyEENS0_6kernelEv]+0xae): undefined reference to `cl::sycl::program::get_kernel_impl(std::string) const'
collect2: error: ld returned 1 exit status
make[2]: *** [accessors/CMakeFiles/accessors.dir/build.make:97: accessors/accessors] Error 1
make[1]: *** [CMakeFiles/Makefile2:87: accessors/CMakeFiles/accessors.dir/all] Error 2
make: *** [Makefile:95: all] Error 2

Unable to link to standard libraries?

Hello, I am having issues adding SYCL to my target application. My situation is that I need to link against standard libraries; however, using add_sycl_to_target seems to give me an error when I try to use target_link_libraries.

# For testing
set(common_libraries stdc++ pthread)
add_executable(reduction ${CMAKE_CURRENT_SOURCE_DIR}/reduction.cpp)

target_link_libraries(reduction ${common_libraries})
add_sycl_to_target(reduction ${CMAKE_CURRENT_BINARY_DIR} ${CMAKE_CURRENT_SOURCE_DIR}/reduction.cpp)

The error:

CMake Error at /home/dom/computecpp-sdk/cmake/Modules/FindComputeCpp.cmake:366 (target_link_libraries):
  The plain signature for target_link_libraries has already been used with
  the target "reduction".  All uses of target_link_libraries with a target
  must be either all-keyword or all-plain.

Now, after a bit of googling, I understand that this is a generic CMake error caused by mixing the keyword and plain signatures of target_link_libraries, but I still cannot figure out how to resolve the issue. Any advice is appreciated.

constexpr problems with MSVC

I've noticed that the Gaussian blur sample fails to compile with Visual Studio 2015 using ComputeCpp 0.6:

gaussian-blur.cpp(159): error C2131: expression did not evaluate to a constant
gaussian-blur.cpp(159): note: failure was caused by non-constant arguments or reference to a non-constant symbol

This is referring to this line of the GaussianKernel kernel lambda:

constexpr auto offset = 3 * stddev;

Even though stddev is a constexpr variable that was declared outside the kernel lambda, it's not considered constexpr inside the kernel lambda.

I believe this is a Visual Studio bug that's not even been fixed in MSVC 2017 (https://godbolt.org/g/t7n1uW).

Interestingly, it seems that making the constexpr variable static constexpr solves this problem, so maybe this should be the default way of using constexpr variables inside kernels.
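
For reference, a minimal self-contained sketch of the workaround (hypothetical code, not the actual gaussian-blur sample):

#include <CL/sycl.hpp>
#include <iostream>

class offset_kernel;  // forward-declared kernel name

int main() {
  namespace sycl = cl::sycl;

  // 'static constexpr' rather than plain 'constexpr' keeps MSVC happy when
  // the value is used in a constant expression inside the kernel lambda.
  static constexpr float stddev = 2.0f;

  float result = 0.0f;
  {
    sycl::queue q;
    sycl::buffer<float, 1> buf(&result, sycl::range<1>(1));
    q.submit([&](sycl::handler& cgh) {
      auto out = buf.get_access<sycl::access::mode::write>(cgh);
      cgh.single_task<offset_kernel>([=]() {
        // With a plain 'constexpr float stddev' MSVC 2015/2017 reports C2131
        // here; with 'static constexpr' the expression is accepted.
        constexpr auto offset = 3.0f * stddev;
        out[0] = offset;
      });
    });
  }  // buffer destructor writes the result back
  std::cout << "offset = " << result << std::endl;  // expect 6
  return 0;
}

Since the value is folded at compile time, the static local should never actually need to be captured by the kernel lambda.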

Warn user about blacklisted device (allow override too?)

2017-12-17 12:00:03.729830: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:45] No OpenCL accelerator nor GPU found that is supported by ComputeCpp/triSYCL trying OpenCL CPU
2017-12-17 12:00:03.729943: W ./tensorflow/core/common_runtime/sycl/sycl_device.h:52] No OpenCL CPU found that is supported by ComputeCpp/triSYCL, checking for host sycl device
2017-12-17 12:00:03.729990: F ./tensorflow/core/common_runtime/sycl/sycl_device.h:67] No SYCL host and no OpenCL device found that is supported by ComputeCPP/triSYCL

Other CL programs have no problems utilizing it.
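
TensorFlow's device filtering is its own code, so the warning/override requested here would have to live there, but at the plain SYCL level an override could look like a custom device selector that deliberately accepts an otherwise untested OpenCL CPU device. A hypothetical sketch (not TensorFlow's actual selection code):

#include <CL/sycl.hpp>
#include <iostream>

namespace sycl = cl::sycl;

// Scores any OpenCL CPU device positively and rejects everything else,
// overriding a more conservative default selection.
class cpu_override_selector : public sycl::device_selector {
 public:
  int operator()(const sycl::device& dev) const override {
    return dev.is_cpu() ? 100 : -1;
  }
};

int main() {
  try {
    sycl::queue q{cpu_override_selector{}};
    std::cout << "Selected: "
              << q.get_device().get_info<sycl::info::device::name>()
              << std::endl;
  } catch (const sycl::exception& e) {
    std::cerr << "No usable CPU device: " << e.what() << std::endl;
    return 1;
  }
  return 0;
}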

ComputeCpp Info (CE 0.5.0)
Toolchain information:

GLIBC version: 2.26
GLIBCXX: 20160609
This version of libstdc++ is supported.

Device Info:
Discovered 1 devices matching:
  platform    : <any>
  device type : <any>

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Untested OS
  CL_DEVICE_NAME                          : AMD E-350 Processor
  CL_DEVICE_VENDOR                        : AuthenticAMD
  CL_DRIVER_VERSION                       : 1912.5 (sse2)
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_CPU

The odd thing is that it is recognized here.

cl::sycl::exception

I built this SDK with OpenCL 1.2. The build succeeds, but when I run hello-world, I get this error:

terminate called after throwing an instance of 'cl::sycl::exception'
Aborted (core dumped)

This is my environment:
Ubuntu 16.04, GCC 5.4.0, ComputeCpp 0.5.0
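
A minimal sketch of wrapping the work in a try/catch to get the exception message rather than an uncaught abort (hypothetical hello-world-style code, not the actual sample):

#include <CL/sycl.hpp>
#include <iostream>

class hello_kernel;  // forward-declared kernel name

int main() {
  namespace sycl = cl::sycl;
  try {
    sycl::queue q{sycl::default_selector{}};
    q.submit([&](sycl::handler& cgh) {
      cgh.single_task<hello_kernel>([=]() {});
    });
    q.wait_and_throw();
  } catch (const sycl::exception& e) {
    // e.what() usually contains the underlying OpenCL error code and a short
    // description, which is more useful than the bare terminate() output.
    std::cerr << "SYCL exception caught: " << e.what() << std::endl;
    return 1;
  }
  return 0;
}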

clinfo :

Number of platforms 1
Platform Name AMD Accelerated Parallel Processing
Platform Vendor Advanced Micro Devices, Inc.
Platform Version OpenCL 2.0 AMD-APP (2482.3)
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_amd_event_callback cl_amd_offline_devices
Platform Extensions function suffix AMD

Platform Name AMD Accelerated Parallel Processing
Number of devices 1
Device Name Hainan
Device Vendor Advanced Micro Devices, Inc.
Device Vendor ID 0x1002
Device Version OpenCL 1.2 AMD-APP (2482.3)
Driver Version 2482.3
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Board Name (AMD) AMD Radeon HD 8500M
Device Topology (AMD) PCI-E, 04:00.0
Max compute units 4
SIMD per compute unit (AMD) 4
SIMD width (AMD) 16
SIMD instruction width (AMD) 1
Max clock frequency 850MHz
Graphics IP (AMD) 6.0
Device Partition (core)
Max number of sub-devices 4
Supported partition types none specified
Max work item dimensions 3
Max work item sizes 256x256x256
Max work group size 256
Preferred work group size multiple 64
Wavefront width (AMD) 64
Preferred / native vector sizes
char 4 / 4
short 2 / 2
int 1 / 1
long 1 / 1
half 1 / 1 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals No
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 32, Little-Endian
Global memory size 2140311552 (1.993GiB)
Global free memory (AMD) <printDeviceInfo:68: get number of CL_DEVICE_GLOBAL_FREE_MEMORY_AMD : error -33>
Global memory channels (AMD) 2
Global memory banks per channel (AMD) 8
Global memory bank width (AMD) 256 bytes
Error Correction support No
Max memory allocation 1591773593 (1.482GiB)
Unified memory for Host and Device No
Minimum alignment for any data type 128 bytes
Alignment of base address 2048 bits (256 bytes)
Global Memory cache type Read/Write
Global Memory cache size 16384
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 16
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 256 bytes
Pitch alignment for 2D image buffers 256 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 128
Max number of write image args 8
Local memory type Local
Local memory size 32768 (32KiB)
Local memory syze per CU (AMD) 65536 (64KiB)
Local memory banks (AMD) 32
Max constant buffer size 65536 (64KiB)
Max number of constant args 8
Max size of kernel argument 1024
Queue properties
Out-of-order execution No
Profiling Yes
Prefer user sync for interop Yes
Profiling timer resolution 1ns
Profiling timer offset since Epoch (AMD) 1514963480756412943ns (Wed Jan 3 15:11:20 2018)
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Thread trace supported (AMD) No
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

NULL platform behavior
clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...) AMD Accelerated Parallel Processing
clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...) Success [AMD]
clCreateContext(NULL, ...) [default] Success [AMD]
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Hainan
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM) No devices found in platform
clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL) Success (1)
Platform Name AMD Accelerated Parallel Processing
Device Name Hainan

ICD loader properties
ICD loader Name OpenCL ICD Loader
ICD loader Vendor OCL Icd free software
ICD loader Version 2.2.8
ICD loader Profile OpenCL 1.2
NOTE: your OpenCL library declares to support OpenCL 1.2,
but it seems to support up to OpenCL 2.1 too.

computecpp_info :

ComputeCpp Info (CE 0.5.0)


Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.


Device Info:

Discovered 1 devices matching:
  platform    : <any>
  device type : <any>


Device 0:

Device is supported : UNTESTED - Vendor not tested on this OS
CL_DEVICE_NAME : Hainan
CL_DEVICE_VENDOR : Advanced Micro Devices, Inc.
CL_DRIVER_VERSION : 2482.3
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU

ARM cross-compilation (tl;dr: use proper SPIR target)

As mentioned in mozilla/DeepSpeech#1346, I'm currently investigating how much we can rely on the OpenCL VC4CL [https://github.com/doe300/VC4CL#opencl-support] driver to leverage the RPi3's GPU.

So far, I have successfully built the driver with a Linaro cross-compiler, and vc4c's test suite more or less works. I could also verify that computecpp_info can at least see the device.

Now I am facing a basic issue: how to cross-compile the SYCL branches for ARM. We already have a setup to cross-compile for ARM and ARMv8 on https://github.com/mozilla/tensorflow, so I blindly ran a configure step referencing the ARM version of ComputeCpp:

echo "" | TF_NEED_GCP=0 TF_NEED_GDR=0 TF_NEED_HDFS=0 TF_NEED_S3=0 TF_NEED_JEMALLOC=1 TF_ENABLE_XLA=0 TF_NEED_MKL=0 TF_NEED_VERBS=0 TF_NEED_MPI=0 TF_NEED_CUDA=0 TF_NEED_OPENCL_SYCL=1 TF_NEED_COMPUTECPP=1 COMPUTECPP_TOOLKIT_PATH=../ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/ TF_USE_DOUBLE_SYCL=0 TF_USE_HALF_SYCL=0 ./configure

And then, I built it:

bazel build --config=sycl -s -j 96 --config=monolithic  --config=rpi3 --config=rpi3_opt -c opt --copt=-fvisibility=hidden --copt=-DCTC_DISABLE_OMP --verbose_failures //native_client:libdeepspeech.so //native_client:deepspeech_utils

This does build an ARM lib, linked against libComputeCpp.so, but at runtime it does not seem to run any OpenCL.

Now, I've also stumbled upon TF_SYCL_CROSS_TOOLCHAIN and TF_SYCL_CROSS_TOOLCHAIN_NAME, but they lack documentation, and trying to use them fails:

$ echo "" | TF_NEED_GCP=0 TF_NEED_GDR=0 TF_NEED_HDFS=0 TF_NEED_S3=0 TF_NEED_JEMALLOC=1 TF_ENABLE_XLA=0 TF_NEED_MKL=0 TF_NEED_VERBS=0 TF_NEED_MPI=0 TF_NEED_CUDA=0 TF_NEED_OPENCL_SYCL=1 TF_NEED_COMPUTECPP=1 COMPUTECPP_TOOLKIT_PATH=ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/ TF_SYCL_CROSS_TOOLCHAIN=gcc-linaro-4.9.4-2017.01-x86_64_arm-linux-gnueabihf/bin/ TF_SYCL_CROSS_TOOLCHAIN_NAME=arm-linux-gnueabihf- TF_USE_DOUBLE_SYCL=0 TF_USE_HALF_SYCL=0 ./configure 
$ bazel build --config=sycl -s -j 96 --config=monolithic -c opt --copt=-fvisibility=hidden --copt=-DCTC_DISABLE_OMP --verbose_failures //native_client:libdeepspeech.so //native_client:deepspeech_utils
[...]
  ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/bin/compute -ffunction-sections -fdata-sections -fPIE -fno-omit-frame-pointer -Wall -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections -DGEMMLOWP_ALLOW_SLOW_SCALAR_FALLBACK '-DDISABLE_SKINNY=1' '-fvisibility=hidden' -DCTC_DISABLE_OMP '-std=c++11' -fsycl-ih-last -sycl-driver -Xclang -cl-denorms-are-zero -Xclang -cl-fp32-correctly-rounded-divide-sqrt -Xclang -cl-mad-enable -sycl-target spir64 '-DTENSORFLOW_USE_SYCL=1' '-DEIGEN_USE_SYCL=1' '-DEIGEN_HAS_C99_MATH=1' '-DEIGEN_HAS_CXX11_MATH=1' -Wno-unused-variable -Wno-unused-const-variable '-DTENSORFLOW_SYCL_NO_HALF=1' '-DTENSORFLOW_SYCL_NO_DOUBLE=1' -MD -MF bazel-out/k8-opt/bin/external/double_conversion/_objs/double-conversion/external/double_conversion/double-conversion/diy-fp.pic.d '-frandom-seed=bazel-out/k8-opt/bin/external/double_conversion/_objs/double-conversion/external/double_conversion/double-conversion/diy-fp.pic.o' -fPIC -iquote external/double_conversion -iquote bazel-out/k8-opt/genfiles/external/double_conversion -iquote external/bazel_tools -iquote bazel-out/k8-opt/genfiles/external/bazel_tools -isystem external/double_conversion -isystem bazel-out/k8-opt/genfiles/external/double_conversion -isystem external/bazel_tools/tools/cpp/gcc3 -Wno-builtin-macro-redefined '-D__DATE__="redacted"' '-D__TIMESTAMP__="redacted"' '-D__TIME__="redacted"' -no-canonical-prefixes -c external/double_conversion/double-conversion/diy-fp.cc -o bazel-out/k8-opt/bin/external/double_conversion/_objs/double-conversion/external/double_conversion/double-conversion/diy-fp.pic.o)
ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/bin/compute: 1 ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/bin/compute: Syntax error: word unexpected (expecting ")")

which seems expected, given that the compiler being invoked is an ARM binary: ComputeCpp-CE-0.7.0-Ubuntu-14.04-ARM_32/bin/compute++: ELF 32-bit LSB executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for GNU/Linux 2.6.32, BuildID[sha1]=df02ab122bb64fc87724de838f7d5a45b8e3f1a5, not stripped

So, what step am I missing to be able to cross-compile?
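
For anyone else hitting this, the title's tl;dr points at the resolution: use the proper SPIR target. Two things stand out in the log above: the compute++ being invoked is the ARM binary, which cannot run on the x86_64 build host, and the device code is targeted at spir64 while the VC4CL stack on the RPi3 is 32-bit. A sketch of the direction the device-compiler invocation needs to take — the x86_64 package path is a placeholder, and the exact 32-bit target name (spir in this sketch) should be confirmed with compute++ --help:

# host-arch device compiler, emitting 32-bit SPIR for the VC4CL driver
ComputeCpp-CE-0.7.0-Ubuntu-14.04-x86_64/bin/compute++ ... -sycl-target spir ...

That is, only the ARM package's headers and libComputeCpp.so are used for the target, while the device compilation itself runs with a compute++ built for the build machine and emits 32-bit SPIR.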
