Giter Site home page Giter Site logo

adaptivecpp / adaptivecpp Goto Github PK

View Code? Open in Web Editor NEW
1.0K 40.0 150.0 12 MB

Implementation of SYCL and C++ standard parallelism for CPUs and GPUs from all vendors: The independent, community-driven compiler for C++-based heterogeneous programming models. Lets applications adapt themselves to all the hardware in the system - even at runtime!

Home Page: https://adaptivecpp.github.io/

License: BSD 2-Clause "Simplified" License

C++ 94.15% CMake 2.44% Python 1.68% Shell 1.70% C 0.02%
sycl gpgpu high-performance gpu-computing high-performance-computing hipsycl opensycl stdpar adaptivecpp compiler

adaptivecpp's Introduction

Project logo

AdaptiveCpp (formerly known as hipSYCL / Open SYCL)

(Note: This project is currently in progress of changing its name to AdaptiveCpp due to external legal pressure. Documentation and code may still use the older name hipSYCL / Open SYCL)

AdaptiveCpp is the independent, community-driven modern platform for C++-based heterogeneous programming models targeting CPUs and GPUs from all major vendors. AdaptiveCpp lets applications adapt themselves to all the hardware found in the system. This includes use cases where a single binary needs to be able to target all supported hardware, or utilize hardware from different vendors simultaneously.

It currently supports the following programming models:

  1. SYCL: At its core is a highly competitive and flexible SYCL implementation that supports many compilation flows.
  2. C++ standard parallelism: Additionally, AdaptiveCpp features experimental support for offloading C++ algorithms from the parallel STL. See here for details on which algorithms can be offloaded. AdaptiveCpp is currently the only solution capable of demonstrating C++ standard parallelism performance across NVIDIA, AMD and Intel GPUs, and in most cases outperforms vendor compilers.

AdaptiveCpp supports CPUs (including x86, arm and other LLVM-supported architectures) as well as GPUs from Intel, NVIDIA, and AMD. This includes the ability to generate a single binary that can offload to all supported devices.

AdaptiveCpp's compilation flows include

  1. A powerful, generic LLVM JIT compiler. This is AdaptiveCpp's default, most portable and usually most performant compilation flow. It is also the world's only SYCL compiler that only needs to parse the source code a single time across both host and device compilation.
  2. Compilation flows focused on providing interoperability at source code level with vendor programming models (including e.g. the ability to mix-and-match CUDA and SYCL in the same source file)
  3. Library-only compilation flows focused on deployment simplicity. These compilation flows allow utilizing third-party compilers, with AdaptiveCpp merely acting as a library.

A full list of its compilation capabilities can be found here.

Because a program compiled with AdaptiveCpp appears just like any other program written in vendor-supported programming models (like CUDA or HIP) to vendor-provided software, vendor tools such as profilers or debuggers also work with AdaptiveCpp.

An illustration on how the project fits into the SYCL ecosystem can be found (here).

Performance & benchmarking

See the AdaptiveCpp performance guide.

Installing and using AdaptiveCpp

In order to compile software with AdaptiveCpp, use acpp. acpp can be used like a regular compiler, i.e. you can use acpp -o test test.cpp to compile your application called test.cpp with AdaptiveCpp.

acpp accepts both command line arguments and environment variables to configure its behavior (e.g., to select the target to compile for). See acpp --help for a comprehensive list of options.

For details and instructions on using AdaptiveCpp in CMake projects, please see the documentation on using AdaptiveCpp.

About the project

Development of AdaptiveCpp is currently primarily led by Heidelberg University, with contributions from a growing community. We see AdaptiveCpp as a community-driven project, in contrast to the many vendor-driven heterogeneous compiler efforts. AdaptiveCpp not only serves as a research platform, but is also a solution used in production on machines of all scales, including some of the most powerful supercomputers.

Getting in touch

Join us on Discord! Alternatively, open a discussion or issue in this repository.

Contributing to AdaptiveCpp

We encourage contributions and are looking forward to your pull request! Please have a look at CONTRIBUTING.md. If you need any guidance, please just open an issue and we will get back to you shortly.

If your institution or organization is considering to support the AdaptiveCpp development in some official capacity, we are always happy to discuss collaborations and to broaden the developer community. Please do reach out :-)

If you are a student at Heidelberg University and wish to work on AdaptiveCpp, please get in touch with us. There are various options possible and we are happy to include you in the project :-)

Citing AdaptiveCpp

AdaptiveCpp is a production platform for heterogeneous computing, but also a research project. As such, if you use AdaptiveCpp in your research, we kindly request that you cite one of the following publications, depending on your focus:

  • A general overview, SYCL 2020, performance and the relationship with oneAPI: Aksel Alpay, Bálint Soproni, Holger Wünsche, and Vincent Heuveline. 2022. Exploring the possibility of a hipSYCL-based implementation of oneAPI. In International Workshop on OpenCL (IWOCL'22). Association for Computing Machinery, New York, NY, USA, Article 10, 1–12. https://doi.org/10.1145/3529538.3530005
  • The generic single-pass compiler: Aksel Alpay and Vincent Heuveline. 2023. One Pass to Bind Them: The First Single-Pass SYCL Compiler with Unified Code Representation Across Backends. In Proceedings of the 2023 International Workshop on OpenCL (IWOCL '23). Association for Computing Machinery, New York, NY, USA, Article 7, 1–12. https://doi.org/10.1145/3585341.3585351
  • Our CPU compiler: Joachim Meyer, Aksel Alpay, Sebastian Hack, Holger Fröning, and Vincent Heuveline. 2023. Implementation Techniques for SPMD Kernels on CPUs. In Proceedings of the 2023 International Workshop on OpenCL (IWOCL '23). Association for Computing Machinery, New York, NY, USA, Article 1, 1–12. https://doi.org/10.1145/3585341.3585342
  • The original talk and the idea of implementing SYCL on non-OpenCL backends: Aksel Alpay and Vincent Heuveline. 2020. SYCL beyond OpenCL: The architecture, current state and future direction of hipSYCL. In Proceedings of the International Workshop on OpenCL (IWOCL ’20). Association for Computing Machinery, New York, NY, USA, Article 8, 1. DOI:https://doi.org/10.1145/3388333.3388658

(The latter is a talk and available online. Note that some of the content in this talk is outdated by now)

Acknowledgements

We gratefully acknowledge contributions from the community.

Documentation

adaptivecpp's People

Contributors

9prady9 avatar aaronmondal avatar acmnpv avatar al42and avatar atom3333 avatar ax3l avatar bkmgit avatar blinkfrog avatar bmanga avatar breyerml avatar diegoldeneente avatar fknorr avatar fodinabor avatar fxzjshm avatar illuhad avatar invexed avatar mabraham avatar michoumichmich avatar mrzv avatar nazavode avatar nilsfriess avatar nmnobre avatar normallytangent avatar peterth avatar psalz avatar psath avatar sbalint98 avatar tdavidcl avatar tom91136 avatar vilelasagna avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

adaptivecpp's Issues

hipsycl_transform_source: Rewrite includes

Currently, during compilation hipsycl_transform_source doesn't rewrite includes. This means that the automatic addition of __host__,__device__ markers won't work for included header files.
Ideally, we shouldn't rewrite all includes to avoid conflicts between clang's standard library headers and and whatever nvcc or hcc decide to do. Instead, best would be to only rewrite those headers where we found missing __host__ or __device__ attributes.

SYCL extension: Extend parallel for ndrange and parallel for work item with mechanism to assert no synchronization

This issue serves to track progress on implementing a SYCL extension to assert to the SYCL implementation that no synchronization

  • happens in parallel for ndrange
  • or independently is required at the end of parallel_for_work_item().

For ndrange, this is beneficial since, if we know that no barriers are expected in the kernel, we can select a more efficient execution model on CPU. Once we have support in the clang plugin for splitting the inner loop over the work items at every barrier() call, this is no longer required.

For parallel_for_work_item(), we may have unnecessary barriers on GPU, for example if the user launches two parallel_for_work_item() without accessing local memory in them. This could be a use case e.g. if the user is just interested in using two different flexible work group sizes for the two calls.
Note that on CPU, hipSYCL never inserts barriers at the the end of parallel_for_work_item() calls since it's inherently unnecessary for our execution model of multithreading over work groups and iterating/vectorizing over work items within a group.

As a solution, a mechanism could be implemented that tells the SYCL implementation to launch the kernel without assuming that synchronization is required. This could either be a template parameter

group.parallel_for_work_item<barrier_mode::nosync>(...);

or a new function variant group.parallel_for_work_item_nosync(...);
(analogously for parallel for ndrange).

syclcc-clang mangle assertion

Hello,

I would like to install hipSYCL on a NVIDIA GPU cluster, but unfortunately, have been running into errors that I struggle to make progress on. I can build and install hipSYCL without any problems, but when trying to compile using cyclcc-clang on any code (for instance, the vector addition example of the README.md) I'm getting mangler assertions.

Commands I use for building:

$ cmake -DCMAKE_INSTALL_PREFIX=/p/project/celerity/local/ -DWITH_CPU_BACKEND=ON -DWITH_CUDA_BACKEND=ON /p/project/celerity/code/hipSYCL/
$ make install

Then, compiling the vector addition example yields an assertion failure of "Invalid mangleName() call on 'structor decl!"':

$ syclcc-clang --hipsycl-gpu-arch=sm_70 -o test test.cpp
clang-9: /dev/shm/swmanage/Clang/9.0.0/GCCcore-8.3.0-rc1-CUDA-10.1.105/llvm-9.0.0-rc1/tools/clang/lib/AST/ItaniumMangle.cpp:4861: virtual void (anonymous namespace)::ItaniumMangleContextImpl::mangleCXXName(const clang::NamedDecl *, llvm::raw_ostream &): Assertion `!isa<CXXConstructorDecl>(D) && !isa<CXXDestructorDecl>(D) && "Invalid mangleName() call on 'structor decl!"' failed.
Stack dump:
0.      Program arguments: /software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9 -cc1 -triple nvptx64-nvidia-cuda-aux-triple x86_64-unknown-linux-gnu -S -disable-free -main-file-name test.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -no-integrated-as -fuse-init-array 
-fcuda-is-device -mlink-builtin-bitcode /software/CUDA/10.1.105/nvvm/libdevice/libdevice.10.bc -target-feature +ptx64 -target-sdk-version=10.1 -target-cpu sm_70 -dwarf-column-info -debugger-tuning=gdb -resource-dir /software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/lib/clang/9.0.0 -internal-isystem /
software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/lib/clang/9.0.0/include/cuda_wrappers -internal-isystem /software/CUDA/10.1.105/include -include __clang_cuda_runtime_wrapper.h -D HIPSYCL_CLANG -I /p/project/celerity/local/bin/../include/ -I /p/project/celerity/local/bin/../include/hipSYCL/ -I/softw
are/gettext/0.19.8.1-GCCcore-8.3.0/include -I/software/cURL/7.64.1-GCCcore-8.3.0/include -I/software/Boost/1.69.0-gompi-2019a-Python-3.6.8/include -I/software/ICU/64.1-GCCcore-8.3.0/include -I/software/Python/3.6.8-GCCcore-8.3.0/include -I/software/libjpeg-turbo/2.0.2-GCCcore-8.3.0/include -I/software/l
ibspatialindex/1.9.0-GCCcore-8.3.0/include -I/software/gflags/2.2.2-GCCcore-8.3.0/include -I/software/protobuf/3.7.1-GCCcore-8.3.0/include -I/software/PostgreSQL/11.2-GCCcore-8.3.0/include -I/software/Java/1.8/include -I/software/libyaml/0.2.2-GCCcore-8.3.0/include -I/software/libxslt/1.1.33-GCCcore-8.3
.0/include -I/software/Tk/8.6.9-GCCcore-8.3.0/include -I/software/SQLite/3.27.2-GCCcore-8.3.0/include -I/software/Tcl/8.6.9-GCCcore-8.3.0/include -I/software/libreadline/8.0-GCCcore-8.3.0/include -I/software/OpenMPI/4.0.1-GCC-8.3.0/include -I/software/UCX/1.5.1-GCCcore-8.3.0/include -I/software/hwloc/2.
0.3-GCCcore-8.3.0/include -I/software/X11/20190311-GCCcore-8.3.0/include -I/software/fontconfig/2.13.1-GCCcore-8.3.0/include -I/software/util-linux/2.33.1-GCCcore-8.3.0/include -I/software/ncurses/6.1-GCCcore-8.3.0/include -I/software/freetype/2.10.0-GCCcore-8.3.0/include/freetype2 -I/software/libpng/1.
6.36-GCCcore-8.3.0/include -I/software/expat/2.2.6-GCCcore-8.3.0/include -I/software/bzip2/1.0.6-GCCcore-8.3.0/include -I/software/libxml2/2.9.9-GCCcore-8.3.0/include/libxml2 -I/software/libxml2/2.9.9-GCCcore-8.3.0/include -I/software/XZ/5.2.4-GCCcore-8.3.0/include -I/software/numactl/2.0.12-GCCcore-8.3
.0/include -I/software/zlib/1.2.11-GCCcore-8.3.0/include -I/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/include -I/software/CUDA/10.1.105/nvvm/include -I/software/CUDA/10.1.105/extras/CUPTI/include -I/software/CUDA/10.1.105/include -I/software/GMP/6.1.2-GCCcore-8.3.0/include -I/software/binutil
s/2.32-GCCcore-8.3.0/include -I/software/GCCcore/8.3.0/include -I/gpfs/software/global/jscslurm/opt/slurm-jsc1711/include -internal-isystem /software/GCCcore/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0 -internal-isystem /software/GCCcore/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/
../../../../include/c++/8.3.0/x86_64-pc-linux-gnu -internal-isystem /software/GCCcore/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward -internal-isystem /software/GCCcore/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0 -internal-isystem /software/GCCco
re/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/x86_64-pc-linux-gnu -internal-isystem /software/GCCcore/8.3.0/lib/gcc/x86_64-pc-linux-gnu/8.3.0/../../../../include/c++/8.3.0/backward -internal-isystem /usr/local/include -internal-isystem /software/Clang/9.0.0-GCCcore-8.3.0-rc1-C
UDA-10.1.105/lib/clang/9.0.0/include -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/lib/clang/9.0.0/include -internal-externc-isystem /include -internal-externc-isystem
 /usr/include -std=c++14 -fdeprecated-macro -fno-dwarf-directory-asm -fdebug-compilation-dir /p/project/celerity/sandbox -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -load /p/project/celerity/local/bin/../lib/libhipSYC
L_clang.so -o /tmp/test-20701f.s -x cuda test.cpp
1.      /software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/lib/clang/9.0.0/include/__clang_cuda_builtin_vars.h:63:1: current parser token 'struct'
 #0 0x00000000018249c4 PrintStackTraceSignalHandler(void*) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x18249c4)
 #1 0x000000000182267e llvm::sys::RunSignalHandlers() (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x182267e)
 #2 0x0000000001824dd8 SignalHandler(int) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x1824dd8)
 #3 0x00002b43a95ff5d0 __restore_rt (/lib64/libpthread.so.0+0xf5d0)
 #4 0x00002b43aa17a207 raise (/lib64/libc.so.6+0x36207)
 #5 0x00002b43aa17b8f8 abort (/lib64/libc.so.6+0x378f8)
 #6 0x00002b43aa173026 __assert_fail_base (/lib64/libc.so.6+0x2f026)
 #7 0x00002b43aa1730d2 (/lib64/libc.so.6+0x2f0d2)
 #8 0x00000000033d6925 (anonymous namespace)::ItaniumMangleContextImpl::mangleCXXName(clang::NamedDecl const*, llvm::raw_ostream&) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x33d6925)
 #9 0x00002b43aa54e2f4 hipsycl::FrontendASTVisitor::getMangledName[abi:cxx11](clang::FunctionDecl*) (/p/project/celerity/local/bin/../lib/libhipSYCL_clang.so+0x3d2f4)
#10 0x00002b43aa54db0c hipsycl::FrontendASTVisitor::processFunctionDecl(clang::FunctionDecl*) (/p/project/celerity/local/bin/../lib/libhipSYCL_clang.so+0x3cb0c)
#11 0x00002b43aa547bb3 clang::RecursiveASTVisitor<hipsycl::FrontendASTVisitor>::TraverseDecl(clang::Decl*) (/p/project/celerity/local/bin/../lib/libhipSYCL_clang.so+0x36bb3)
#12 0x00002b43aa549557 clang::RecursiveASTVisitor<hipsycl::FrontendASTVisitor>::TraverseDecl(clang::Decl*) (/p/project/celerity/local/bin/../lib/libhipSYCL_clang.so+0x38557)
#13 0x00002b43aa5473cb hipsycl::FrontendASTConsumer::HandleTopLevelDecl(clang::DeclGroupRef) (/p/project/celerity/local/bin/../lib/libhipSYCL_clang.so+0x363cb)
#14 0x0000000001f52bcc clang::MultiplexConsumer::HandleTopLevelDecl(clang::DeclGroupRef) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x1f52bcc)
#15 0x00000000028b7624 clang::ParseAST(clang::Sema&, bool, bool) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x28b7624)
#16 0x0000000001f1b4b8 clang::FrontendAction::Execute() (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x1f1b4b8)
#17 0x0000000001ebe7c2 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x1ebe7c2)
#18 0x0000000001fba5dd clang::ExecuteCompilerInvocation(clang::CompilerInstance*) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x1fba5dd)
#19 0x00000000008de1d0 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x8de1d0)
#20 0x00000000008dc225 main (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x8dc225)
#21 0x00002b43aa1663d5 __libc_start_main (/lib64/libc.so.6+0x223d5)
#22 0x00000000008d921d _start (/software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin/clang-9+0x8d921d)
clang-9: error: unable to execute command: Aborted
clang-9: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 9.0.0 
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /software/Clang/9.0.0-GCCcore-8.3.0-rc1-CUDA-10.1.105/bin
clang-9: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.

I am not entirely sure what may be causing it, and was wondering whether someone else has seen this sort of issue before? I'm suspecting some sort of incompatibility, but then it builds fine without any issues.

swizzled vector implementation is incomplete

The detail::vec_swizzle<> class at the moment lacks most of the operator overloads that vec<> provides. This causes most operator expressions to fail if only vec_swizzle objects are involved in the operation. The missing operators should therefore be implemented.

Permission issue in manual build

Hello,
I'm trying to build hipSYCL manually following the steps in README file.

git clone --recurse-submodules https://github.com/illuhad/hipSYCL
cd hipSYCL
mkdir build
cd build
cmake -DCMAKE_INSTALL_PREFIX=. ..

and this is the output

-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Warning (dev) at CMakeLists.txt:17 (set):
  implicitly converting 'INTEGER' to 'STRING' type.
This warning is for project developers.  Use -Wno-dev to suppress it.

-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Found CUDA: /software/nvidia/cuda/10.0 (found version "10.0")
-- Boost version: 1.57.0
-- Found the following Boost libraries:
--   filesystem
--   system
-- Boost version: 1.57.0
-- Configuring done
-- Generating done
-- Build files have been written to: /path/hipSYCL/build

then I tried to build it with:
make install
it causes fatal error:

make install
Scanning dependencies of target hipSYCL_cuda
[  2%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o
syclcc fatal error: [Errno 13] Permission denied: '/path/hipSYCL/bin/'
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o] Error 255
make[1]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/all] Error 2
make: *** [all] Error 2

however the permissions for source and build directory have not changed since their creation and also /path/hipSYCL/bin/ exists and contains syclcc syclcc-clang.

Implement 2d/3d copy functions in hipCPU

As discussed in PR #39, hipCPU at the moment lacks implementations for 2d and 3d copies, which causes linking errors for SYCL applications that use the CPU backend and use the SYCL explicit copy API.

We should therefore implement these functions in hipCPU.

(Note: I'm opening an issue in hipSYCL instead of hipCPU since hipSYCL is at the moment pretty much the only user of hipCPU, and this allows us to have a better overview over open tasks)

OpenCL interoperability for NVIDIA GPUs with pocl's cuda backend

One of the major caveats of hipSYCL is that doesn't support the OpenCL interoperability layer required by the spec. This is because it doesn't build on OpenCL but on HIP/CUDA instead.
We could however have interoperability to pocl's cuda backend. This OpenCL implementation internally makes use of CUDA objects, with which hipSYCL could then interact.
We would then get OpenCL interoperability at least when using nvidia GPUs. It won't work with NVIDIA's official OpenCL implementation, but that implementation doesn't support SPIR and is hence a second-class citizen in the SYCL world anyway :)

Fatal error when using logf in kernel

With the following source:

#include <math.h>
#include <stdio.h>

#include <CL/sycl.hpp>

namespace sycl = cl::sycl;

class InitB;

int main() {
	sycl::queue q;
	sycl::buffer<float, 1> b{{64}};
	q.submit([&](sycl::handler& cgh) {
		auto B = b.get_access<sycl::access::mode::write>(cgh);
		cgh.parallel_for<InitB>(sycl::range<1>{64}, [=](sycl::id<1> index) {
			B[index] = logf(1 + index[0]);
		});
	});
	auto B = b.get_access<sycl::access::mode::read>();
	for (size_t i = 0; i < 64; ++i) {
		printf("B[%zu] = %f\n", i, B[i]);
	}
	return 0;
}

I get the following error:

$ /opt/hipSYCL/bin/syclcc-clang test-log.cpp -o test-log --hipsycl-platform=cuda --hipsycl-gpu-arch=sm_62 --std=c++17 -Wall -Wextra -Wpedantic -lm

[snip]

fatal error: error in backend: Undefined external symbol "logf"
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 8.0.0-3~ubuntu18.04.1 (tags/RELEASE_800/final)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/test-log-204a3f.cu
clang: note: diagnostic msg: /tmp/test-log-40d41f.cu
clang: note: diagnostic msg: /tmp/test-log-204a3f.sh
clang: note: diagnostic msg: 

********************

Replacing logf with sinf or cosf produces a different message like this one:

fatal error: error in backend: Cannot select: 0x23214a40: f32 = fsin 0x23214150
  0x23214150: f32 = uint_to_fp 0x23221628
    0x23221628: i64 = add 0x232219d0, Constant:i64<1>
      0x232219d0: i64,ch = load<(dereferenceable load 8 from %ir.9)> 0x23221218:1, 0x23221218, undef:i64
        0x23221218: i64,ch,glue = NVPTXISD::ProxyReg 0x23221148, 0x23220fa8, 0x23221148:1
          0x23220fa8: i64,ch,glue = NVPTXISD::LoadParam<(load 8)> 0x23220ed8, Constant:i32<1>, Constant:i32<0>, 0x23220ed8:1
            0x2321cdb8: i32 = Constant<1>
            0x23221420: i32 = Constant<0>
            0x23220ed8: ch,glue = NVPTXISD::CallArgEnd 0x23220e08, Constant:i32<1>, 0x23220e08:1
              0x2321cdb8: i32 = Constant<1>
              0x23220e08: ch,glue = NVPTXISD::LastCallArg 0x23220da0, Constant:i32<1>, Constant:i32<1>, 0x23220da0:1
                0x2321cdb8: i32 = Constant<1>
                0x2321cdb8: i32 = Constant<1>
                0x23220da0: ch,glue = NVPTXISD::CallArg 0x2321d3d0, Constant:i32<1>, Constant:i32<0>, 0x2321d3d0:1
                  0x2321cdb8: i32 = Constant<1>
                  0x23221420: i32 = Constant<0>
                  0x2321d3d0: ch,glue = NVPTXISD::CallArgBegin 0x2321d300, 0x2321d300:1

          0x23221148: ch,glue = callseq_end 0x23220fa8:1, TargetConstant:i64<12>, TargetConstant:i64<13>, 0x23220fa8:2
            0x23224648: i64 = TargetConstant<12>
            0x23221078: i64 = TargetConstant<13>
            0x23220fa8: i64,ch,glue = NVPTXISD::LoadParam<(load 8)> 0x23220ed8, Constant:i32<1>, Constant:i32<0>, 0x23220ed8:1
              0x2321cdb8: i32 = Constant<1>
              0x23221420: i32 = Constant<0>
              0x23220ed8: ch,glue = NVPTXISD::CallArgEnd 0x23220e08, Constant:i32<1>, 0x23220e08:1
                0x2321cdb8: i32 = Constant<1>
                0x23220e08: ch,glue = NVPTXISD::LastCallArg 0x23220da0, Constant:i32<1>, Constant:i32<1>, 0x23220da0:1
                  0x2321cdb8: i32 = Constant<1>
                  0x2321cdb8: i32 = Constant<1>
                  0x23220da0: ch,glue = NVPTXISD::CallArg 0x2321d3d0, Constant:i32<1>, Constant:i32<0>, 0x2321d3d0:1



        0x23221a38: i64 = undef
      0x23221c40: i64 = Constant<1>
In function: _ZZZ4mainENK3$_0clERN2cl4sycl7handlerEENKUlNS1_2idILi1EEEE_clES5_
clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
clang version 8.0.0-3~ubuntu18.04.1 (tags/RELEASE_800/final)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/test-log-3eeeb4.cu
clang: note: diagnostic msg: /tmp/test-log-1259b2.cu
clang: note: diagnostic msg: /tmp/test-log-3eeeb4.sh
clang: note: diagnostic msg: 

********************

Info about my system, copied from an earlier issue:

  • distro is Ubuntu 18.04 (on aarch64, Jetson TX2 to be precise)
  • Clang: 8.0.0
  • CUDA: 10.0
  • Python: 3.6.8 (in case you want to know :P)
  • Boost: 1.65

Forward declared template functions not correctly pruned

Consider this simple program:

template<typename T>
void foo(T);

template <typename T>
void bar(T) {}

template<typename T>
void foo(T value) {
    bar(value);
}

int main() {
    return 0;
}                                                              

After the rewrite and transformation passes, nvcc complains with #endif without #if and unterminated #if. The temporary *.cu file reveals the reason:

# 1 "<built-in>"
# 1 "/tmp/forward_decl.cpp"
template<typename T>
void foo(T)
#endif
;

template <typename T>
void bar(T) {}

template<typename T>
void foo(T value) ;
#if 0 // -- definition stripped by hipsycl_transform_source
{
        bar(value);
}

int main() {
        return 0;
}

Apparently, according to the Clang AST, the beginning of the body is at the definition, while the end is at the declaration. This of course causes a complete mess and breaks compilation of larger headers (e.g. Boost). I've been told that his might be annoying to fix, so my question is: Why are you pruning these uninstantiated templates anyway? Is this just an optimization or is there a practical reason?

More testing with complex software required

hipSYCL could use more testing. triSYCL contains a lot of tests, some of which may also be interesting test cases for hipSYCL. hipSYCL should also be tested with more complex pieces of software such as the sycl parallel STL or sycl blas.

Compilation fails at standard macros

Hey,
I'm currently trying to compile a bunch of tests with hipSYCL on Ubuntu 18.04 with CUDA 10 with syclcc-clang and clang 8.
I stumbled upon a seemingly banal issue.. It looks like the preprocessing is doing something weird.. Is there special preprocessing happening in hipSYCL or is it just using standard clangs's preprocessing?

here the error:

warning: argument unused during compilation: '-L/home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/external/bin/../lib' [-Wunused-command-line-argument]
warning: argument unused during compilation: '-L/home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/external/bin/../lib' [-Wunused-command-line-argument]
In file included from /home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/test/blackbox/src/test_sycl.cpp:1:
/home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/test/./catch.hpp:298:38: error: token is not a valid binary operator in a preprocessor subexpression
#  if (1)/*__has_include(<variant>)*/# 297 "/home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/test/./catch.hpp" 3
      ~~~                            ^
In file included from /home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/test/performance/src/test_sycl_performance.cpp:1:
/home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/test/./catch.hpp:298:38: error: token is not a valid binary operator in a preprocessor subexpression
#  if (1)/*__has_include(<variant>)*/# 297 "/home/user/Documents/joachim-ba/bachelor-thesis-gpgpu/Project/test/./catch.hpp" 3
      ~~~                            ^
1 error generated.
Error while processing /tmp/hipsycl_80fad85028f1e033/hipsycl_f27f8ed03d95abc8.cpp.
syclcc fatal error: Error during source-to-source transformation, aborting compilation.
test/blackbox/CMakeFiles/test-gpgpuevaluation-blackbox.dir/build.make:182: recipe for target 'test/blackbox/CMakeFiles/test-gpgpuevaluation-blackbox.dir/src/test_sycl.cpp.o' failed
make[2]: *** [test/blackbox/CMakeFiles/test-gpgpuevaluation-blackbox.dir/src/test_sycl.cpp.o] Error 255
CMakeFiles/Makefile2:176: recipe for target 'test/blackbox/CMakeFiles/test-gpgpuevaluation-blackbox.dir/all' failed
make[1]: *** [test/blackbox/CMakeFiles/test-gpgpuevaluation-blackbox.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
1 error generated.
Error while processing /tmp/hipsycl_5e235c1b499ea1ac/hipsycl_f027440edd185992.cpp.
syclcc fatal error: Error during source-to-source transformation, aborting compilation.
test/performance/CMakeFiles/test-gpgpuevaluation-performance.dir/build.make:182: recipe for target 'test/performance/CMakeFiles/test-gpgpuevaluation-performance.dir/src/test_sycl_performance.cpp.o' failed
make[2]: *** [test/performance/CMakeFiles/test-gpgpuevaluation-performance.dir/src/test_sycl_performance.cpp.o] Error 255
CMakeFiles/Makefile2:232: recipe for target 'test/performance/CMakeFiles/test-gpgpuevaluation-performance.dir/all' failed
make[1]: *** [test/performance/CMakeFiles/test-gpgpuevaluation-performance.dir/all] Error 2
Makefile:94: recipe for target 'all' failed
make: *** [all] Error 2

here the relevant line from Catch2 starting with line 297:

#if defined(__has_include)
#  if __has_include(<variant>) && defined(CATCH_CPP17_OR_GREATER)

#    if defined(__clang__) && (__clang_major__ < 8)
       // work around clang bug with libstdc++ https://bugs.llvm.org/show_bug.cgi?id=31852
       // fix should be in clang 8, workaround in libstdc++ 8.2
#      include <ciso646>
#      if defined(__GLIBCXX__) && defined(_GLIBCXX_RELEASE) && (_GLIBCXX_RELEASE < 9)
#        define CATCH_CONFIG_NO_CPP17_VARIANT
#      else
#        define CATCH_INTERNAL_CONFIG_CPP17_VARIANT
#      endif // defined(__GLIBCXX__) && defined(_GLIBCXX_RELEASE) && (_GLIBCXX_RELEASE < 9)
#    else
#      define CATCH_INTERNAL_CONFIG_CPP17_VARIANT
#    endif // defined(__clang__) && (__clang_major__ < 8)
#  endif // __has_include(<variant>) && defined(CATCH_CPP17_OR_GREATER)
#endif // __has_include

So.. do you have a clue what happens / where I should investigate?

Pruned definition of class template function causes re-declaration error

In this example

class foo {
    template<typename T>
    void bar(T value);
};

template <typename T>
void baz(T) {}

template <typename T>
void foo::bar(T value) {
    baz(value);
}

int main() {
    return 0;
}

nvcc will throw a re-declaration error of foo::bar, as only the declaration is left after pruning. I think the correct approach in this case would be to comment out the entire definition outside the class.

Windows Support

As of now, hipSyCL is the only option for building and running SyCL codes with CUDA backend.
It would be of high interest to get this working on Windows because this opens possibitilies to writing cross-platform compute kernels with something other than OpenCL which has many downsides. (Nvidia tooling is nowhere as good as CUDA, Nvidia only supports OpenCL 1.2, etc..)

This ticket proposes two things:

  • Verify and document that hipSyCL with CUDA backend successfully compiles and runs on Windows
  • If there's something preventing hipSyCL from building and running on Windows, create a PR to fix that issue.

For #2 I would like to make a PR eventually once I get this working on my machine.

Install location of syclcc.json -> /etc/syclcc.json?

Installing hipSYCL to e.g. "/usr/local/" results in Gentoo Linux complaining about this as a non standard location. If I install to /usr, then syclcc.json would go to /usr/etc. Which I think is also not the desired location. Is it possible to install to "/etc/syclcc.json" in general?

Compiler error when indexing sycl::accessor with dimensions > 1

When compiling this code:

$ /opt/hipSYCL/bin/syclcc-clang mandelbrot.cpp -o mandelbrot --hipsycl-platform=cuda --hipsycl-gpu-arch=sm_62 --std=c++17 -Wall -Wextra -Wpedantic

[snip]

In file included from mandelbrot.cpp:4:
In file included from /opt/hipSYCL/bin/../include/CL/sycl.hpp:42:
In file included from /opt/hipSYCL/bin/../include/CL/sycl/queue.hpp:41:
In file included from /opt/hipSYCL/bin/../include/CL/sycl/handler.hpp:37:
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:528:76: error: calling a private constructor of class 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read,
      cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>'
    accessor<dataT, dimensions-1, accessmode, accessTarget, isPlaceholder> sub_accessor;
                                                                           ^
mandelbrot.cpp:74:19: note: in instantiation of function template specialization 'cl::sycl::accessor<unsigned char, 2, cl::sycl::access::mode::read, cl::sycl::access::target::host_buffer,
      cl::sycl::access::placeholder::false_t>::operator[]<2, void>' requested here
                        uint8_t idx = I[y][x];
                                       ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:598:3: note: declared private here
  accessor(){}
  ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:529:18: error: '_range' is a private member of 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read,
      cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>'
    sub_accessor._range = detail::range::omit_first_dimension(this->_range);
                 ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:602:21: note: declared private here
  range<dimensions> _range;
                    ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:530:18: error: '_buffer_range' is a private member of 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read,
      cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>'
    sub_accessor._buffer_range = _buffer_range;
                 ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:601:21: note: declared private here
  range<dimensions> _buffer_range;
                    ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:530:32: error: no viable overloaded '='
    sub_accessor._buffer_range = _buffer_range;
    ~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~
/opt/hipSYCL/bin/../include/CL/sycl/detail/../id.hpp:42:7: note: candidate function (the implicit move assignment operator) not viable: no known conversion from 'const range<2 aka 2>'
      to 'range<1>' for 1st argument
class range;
      ^
/opt/hipSYCL/bin/../include/CL/sycl/detail/../range.hpp:42:7: note: candidate function (the implicit copy assignment operator) not viable: no known conversion from 'const range<2 aka 2>'
      to 'const range<1>' for 1st argument
class range {
      ^
In file included from mandelbrot.cpp:4:
In file included from /opt/hipSYCL/bin/../include/CL/sycl.hpp:42:
In file included from /opt/hipSYCL/bin/../include/CL/sycl/queue.hpp:41:
In file included from /opt/hipSYCL/bin/../include/CL/sycl/handler.hpp:37:
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:531:18: error: '_ptr' is a private member of 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read,
      cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>'
    sub_accessor._ptr = this->_ptr + index * sub_accessor._range.size();
                 ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:600:16: note: declared private here
  pointer_type _ptr;
               ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:531:59: error: '_range' is a private member of 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read,
      cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>'
    sub_accessor._ptr = this->_ptr + index * sub_accessor._range.size();
                                                          ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:602:21: note: declared private here
  range<dimensions> _range;
                    ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:598:3: error: constructor for 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read, cl::sycl::access::target::host_buffer,
      cl::sycl::access::placeholder::false_t>' must explicitly initialize the base class 'detail::accessor_base<(target)2018, (placeholder)0>' which does not have a default constructor
  accessor(){}
  ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:528:76: note: in instantiation of member function 'cl::sycl::accessor<unsigned char, 1, cl::sycl::access::mode::read,
      cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>::accessor' requested here
    accessor<dataT, dimensions-1, accessmode, accessTarget, isPlaceholder> sub_accessor;
                                                                           ^
mandelbrot.cpp:74:19: note: in instantiation of function template specialization 'cl::sycl::accessor<unsigned char, 2, cl::sycl::access::mode::read, cl::sycl::access::target::host_buffer,
      cl::sycl::access::placeholder::false_t>::operator[]<2, void>' requested here
                        uint8_t idx = I[y][x];
                                       ^
/opt/hipSYCL/bin/../include/CL/sycl/accessor.hpp:197:7: note: 'cl::sycl::detail::accessor_base<cl::sycl::access::target::host_buffer, cl::sycl::access::placeholder::false_t>' declared here
class accessor_base<access::target::host_buffer, IsPlaceholder>
      ^

For the errors regarding private members, I think adding a templated friend declaration inside the definition of accessor is enough:

template<typename dataT2, int dimensions2, access::mode accessmode2, access::target accessTarget2, access::placeholder isPlaceholder2>
     friend class accessor;

The assigment on line 530 of accessor.hpp also needs to explicitly convert _buffer_range into the appropriate type. I'm not sure how to fix the error about the base class initialization, though.

Building sample program fails: undefined reference to symbol '_ZTVN10__cxxabiv117__class_type_infoE@@CXXABI_1.3'

  • distro is Ubuntu 18.04 (on aarch64, Jetson TX2 to be precise)
  • Clang: 8.0.0
  • CUDA: 10.0
  • Python: 3.6.8 (in case you want to know :P)
  • Boost: 1.65

I built hipSYCL successfully, but when I try to compile a program with syclcc-clang, I get the linker error below:

$ /opt/hipSYCL/bin/syclcc-clang test.cpp -o test --hipsycl-platform=cuda --hipsycl-gpu-arch=sm_62 -v
clang version 8.0.0-3~ubuntu18.04.1 (tags/RELEASE_800/final)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/aarch64-linux-gnu/7
Found candidate GCC installation: /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/bin/../lib/gcc/aarch64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/7.4.0
Found candidate GCC installation: /usr/lib/gcc/aarch64-linux-gnu/8
Selected GCC installation: /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /usr/local/cuda-10.0, version 10.0
 "/usr/lib/llvm-8/bin/clang" -cc1 -triple nvptx64-nvidia-cuda -aux-triple aarch64-unknown-linux-gnu -S -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -no-integrated-as -fuse-init-array -fcuda-is-device -mlink-builtin-bitcode /usr/local/cuda-10.0/nvvm/libdevice/libdevice.10.bc -target-feature +ptx61 -target-cpu sm_62 -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/llvm-8/lib/clang/8.0.0 -internal-isystem /usr/lib/llvm-8/lib/clang/8.0.0/include/cuda_wrappers -internal-isystem /usr/local/cuda-10.0/include -include __clang_cuda_runtime_wrapper.h -D HIPSYCL_CLANG -I /opt/hipSYCL/bin/../include/ -I /opt/hipSYCL/bin/../include/hipSYCL/ -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward -internal-isystem /usr/include/clang/8.0.0/include/ -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward -internal-isystem /usr/include/clang/8.0.0/include/ -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-8/lib/clang/8.0.0/include -internal-externc-isystem /usr/include/aarch64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-8/lib/clang/8.0.0/include -internal-externc-isystem /usr/include/aarch64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++14 -fdeprecated-macro -fno-dwarf-directory-asm -fdebug-compilation-dir /home/kozet/sycl-test -ferror-limit 19 -fmessage-length 190 -fno-signed-char -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -load /opt/hipSYCL/bin/../lib/libhipSYCL_clang.so -o /tmp/test-6d28b3.s -x cuda test.cpp
clang -cc1 version 8.0.0 based upon LLVM 8.0.0 default target aarch64-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward"
ignoring duplicate directory "/usr/include/clang/8.0.0/include"
ignoring duplicate directory "/usr/include/clang/8.0.0/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/usr/include/clang/8.0.0/include"
ignoring duplicate directory "/usr/include/aarch64-linux-gnu"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/hipSYCL/bin/../include
 /opt/hipSYCL/bin/../include/hipSYCL
 /usr/lib/llvm-8/lib/clang/8.0.0/include/cuda_wrappers
 /usr/local/cuda-10.0/include
 /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0
 /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0
 /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward
 /usr/include/clang/8.0.0/include
 /usr/local/include
 /usr/include/aarch64-linux-gnu
 /usr/include
End of search list.
 "/usr/local/cuda-10.0/bin/ptxas" -m64 -O0 -v --gpu-name sm_62 --output-file /tmp/test-50e5c7.o /tmp/test-6d28b3.s
ptxas info    : 0 bytes gmem
ptxas info    : Compiling entry function '__hipsycl_kernel_$vector_add' for 'sm_62'
ptxas info    : Function properties for __hipsycl_kernel_$vector_add
    296 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 33 registers, 448 bytes cmem[0]
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC1ERKS2_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC1Ev
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC1ILb0EEERKNS0_4itemILi1EXT_EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC1ILi1EvEEm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC2ERKS2_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC2Ev
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC2ILb0EEERKNS0_4itemILi1EXT_EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl2idILi1EEC2ILi1EvEEm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl4itemILi1ELb0EEC1ERKNS0_2idILi1EEERKNS0_5rangeILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl4itemILi1ELb0EEC2ERKNS0_2idILi1EEERKNS0_5rangeILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail12device_arrayImLm1EEixEm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail13get_global_idILi1EEENS0_2idIXT_EEEv
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail15get_global_id_xEv
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail8dispatch16item_is_in_rangeILi1ELb0EEEbRKNS0_4itemIXT_EXT0_EEERKNS0_5rangeIXT_EEERKNS0_2idIXT_EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail8dispatch6device20get_global_id_helperILi1EEENS0_2idIXT_EEEv
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail9item_baseILi1EEC2ERKNS0_2idILi1EEERKNS0_5rangeILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail9linear_idILi1EE3getERKNS0_2idILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail9linear_idILi1EE3getERKNS0_2idILi1EEERKNS0_5rangeILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZN2cl4sycl6detail9make_itemILi1EEENS0_4itemIXT_ELb0EEERKNS0_2idIXT_EEERKNS0_5rangeIXT_EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl2idILi1EE3getEi
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl2idILi1EEixEi
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl5rangeILi1EE3getEi
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl6detail12device_arrayImLm1EEixEm
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl6detail9item_baseILi1EE6get_idEi
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl8accessorIfLi1ELNS0_6access4modeE1024ELNS2_6targetE2014ELNS2_11placeholderE0EEixILi1ELS3_1024EvEEfNS0_2idILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZNK2cl4sycl8accessorIfLi1ELNS0_6access4modeE1025ELNS2_6targetE2014ELNS2_11placeholderE0EEixILS3_1025ELi1EvEERfNS0_2idILi1EEE
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Function properties for _ZZZ3addRN2cl4sycl5queueERKSt6vectorIfSaIfEES7_ENK3$_0clERNS0_7handlerEENKUlNS0_2idILi1EEEE_clESC_
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
 "/usr/local/cuda-10.0/bin/fatbinary" --cuda -64 --create /tmp/test-d4e11e.fatbin --image=profile=sm_62,file=/tmp/test-50e5c7.o --image=profile=compute_62,file=/tmp/test-6d28b3.s
 "/usr/lib/llvm-8/bin/clang" -cc1 -triple aarch64-unknown-linux-gnu -aux-triple nvptx64-nvidia-cuda -emit-obj -mrelax-all -disable-free -disable-llvm-verifier -discard-value-names -main-file-name test.cpp -mrelocation-model static -mthread-model posix -mdisable-fp-elim -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu generic -target-feature +neon -target-abi aapcs -fallow-half-arguments-and-returns -dwarf-column-info -debugger-tuning=gdb -v -resource-dir /usr/lib/llvm-8/lib/clang/8.0.0 -internal-isystem /usr/lib/llvm-8/lib/clang/8.0.0/include/cuda_wrappers -internal-isystem /usr/local/cuda-10.0/include -include __clang_cuda_runtime_wrapper.h -D HIPSYCL_CLANG -I /opt/hipSYCL/bin/../include/ -I /opt/hipSYCL/bin/../include/hipSYCL/ -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward -internal-isystem /usr/include/clang/8.0.0/include/ -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0 -internal-isystem /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward -internal-isystem /usr/include/clang/8.0.0/include/ -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-8/lib/clang/8.0.0/include -internal-externc-isystem /usr/include/aarch64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-8/lib/clang/8.0.0/include -internal-externc-isystem /usr/include/aarch64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -std=c++14 -fdeprecated-macro -fdebug-compilation-dir /home/kozet/sycl-test -ferror-limit 19 -fmessage-length 190 -fno-signed-char -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -load /opt/hipSYCL/bin/../lib/libhipSYCL_clang.so -o /tmp/test-6a824a.o -x cuda test.cpp -fcuda-include-gpubinary /tmp/test-d4e11e.fatbin -faddrsig
clang -cc1 version 8.0.0 based upon LLVM 8.0.0 default target aarch64-unknown-linux-gnu
ignoring nonexistent directory "/include"
ignoring nonexistent directory "/include"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0"
ignoring duplicate directory "/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward"
ignoring duplicate directory "/usr/include/clang/8.0.0/include"
ignoring duplicate directory "/usr/include/clang/8.0.0/include"
ignoring duplicate directory "/usr/local/include"
ignoring duplicate directory "/usr/include/clang/8.0.0/include"
ignoring duplicate directory "/usr/include/aarch64-linux-gnu"
ignoring duplicate directory "/usr/include"
#include "..." search starts here:
#include <...> search starts here:
 /opt/hipSYCL/bin/../include
 /opt/hipSYCL/bin/../include/hipSYCL
 /usr/lib/llvm-8/lib/clang/8.0.0/include/cuda_wrappers
 /usr/local/cuda-10.0/include
 /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0
 /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/aarch64-linux-gnu/c++/7.4.0
 /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../../include/c++/7.4.0/backward
 /usr/include/clang/8.0.0/include
 /usr/local/include
 /usr/include/aarch64-linux-gnu
 /usr/include
End of search list.
 "/usr/bin/ld" -EL -z relro --hash-style=gnu --build-id --eh-frame-hdr -m aarch64linux -dynamic-linker /lib/ld-linux-aarch64.so.1 -o test /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../aarch64-linux-gnu/crt1.o /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../aarch64-linux-gnu/crti.o /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/crtbegin.o -L/usr/local/cuda-10.0/lib64/ -L/opt/hipSYCL/bin/../lib/ -L/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0 -L/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../aarch64-linux-gnu -L/lib/aarch64-linux-gnu -L/usr/lib/aarch64-linux-gnu -L/usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../.. -L/usr/lib/llvm-8/bin/../lib -L/lib -L/usr/lib -lcudart -lhipSYCL_cuda /tmp/test-6a824a.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/crtend.o /usr/bin/../lib/gcc/aarch64-linux-gnu/7.4.0/../../../aarch64-linux-gnu/crtn.o
/usr/bin/ld: /opt/hipSYCL/bin/../lib//libhipSYCL_cuda.so: undefined reference to symbol '_ZTVN10__cxxabiv117__class_type_infoE@@CXXABI_1.3'
//usr/lib/aarch64-linux-gnu/libstdc++.so.6: error adding symbols: DSO missing from command line
clang: error: linker command failed with exit code 1 (use -v to see invocation)

SYCL extension: Parallel primitives?

I believe that it may be useful to have functions for parallel primitives (sort, reduction, scan etc) available as a SYCL extension. The reason is that at the moment we already target three different devices (CPU, AMD GPUs, NVIDIA GPUs), for each of which there are high performance vendor-optimized libraries available [For us, hipCUB would probably be a solid choice here].
So, from a performance portability point of view it would be useful to have a SYCL interface that just forwards the operations to these highly optimized libraries.
Personally, I also believe that this interface should be within the scope of SYCL implementations (i.e. as a SYCL extension instead of an independent library), because of the variety of hardware focus of the different implementations. In that way, each SYCL implementation can focus on the hardware of its expertise.

Implementing this would require some effort, so I would first like to ask:

  • Is there an interest in this at all?
  • Do you think that it makes sense to implement this as a SYCL extension inside hipSYCL, opposed to e.g. modifying libraries like SYCL parallel STL to make use of vendor-optimized libraries?
  • Which algorithms/features do you think are most important to get started?

Problems with compilation on ubuntu 18.04 with rocm 2.3

I tried to run cmake with default parameters seems on my system it will be compiled with gcc 7.3.0 but I am getting errors like error: no member named 'make_unique' in namespace 'std' so I tried to switch to clang 6.0.0 which I have on my system but with same result CC=clang CXX=clang++ cmake. I thought clang 6 has default std c++14. So I tried it with CXXFLAGS+=-std=c++17 cmake but I am getting error The platform rocm was explicitly chosen, but it is not available.. This was again with gcc and 7.3.0 so I suppose gcc is not supported. I would expect that your cmake config will try to choose clang as C++ compiler as I see it listed as dependency and will add -std=c++14 at least.

At last I tried CC=clang CXX=clang++ CXXFLAGS+=-std=c++17 cmake and it was finally successful (with warning in many places warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]). But when I tried to compile sycl source syclcc test.cpp --hipsycl-platform=amd --std=c++17 I got this error:

warning: argument unused during compilation: '-L/usr/local/bin/../lib' [-Wunused-command-line-argument]
ld: /opt/rocm/bin/../lib/libhip_hcc.so: undefined reference to symbol 'hsa_system_major_extension_supported@@ROCR_1'
//opt/rocm/hsa/lib/libhsa-runtime64.so.1: error adding symbols: DSO missing from command line
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

This clang-9 come from rocm hcc. So I am going to try to set default C and C++ compiler on my system to newest clang.

So I tried it with clang-7. I again had to specify CXXFLAGS=-std=c++14 before cmake. This time I tried c++14 and hipsycl compiles without warning. But syclcc test.cpp --hipsycl-platform=rocm gives this strange error:

warning: argument unused during compilation: '-L/usr/local/bin/../lib' [-Wunused-command-line-argument]
syclcc fatal error: [Errno 2] No such file or directory: 'hipsycl_211bd330025b7952.cpp'

I tried also singulatiry container. I see that it is using clang 6 and adds std c++14:

+ export CXX=clang++-6.0
+ cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_CXX_FLAGS=-std=c++14 -DWITH_CPU_BACKEND=ON -DWITH_ROCM_BACKEND=ON ..
-- The C compiler identification is GNU 5.4.0
-- The CXX compiler identification is Clang 6.0.0

But when compiling sycl program I am getting similar error as previously (with additional perl locale warning):

rocm-user@miso:~$ syclcc test.cpp --hipsycl-platform=rocm -std=c++14
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = "en_US:en",
	LC_ALL = (unset),
	LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
warning: argument unused during compilation: '-L/usr/bin/../lib' [-Wunused-command-line-argument]
ld: /opt/rocm/bin/../lib/libhip_hcc.so: undefined reference to symbol 'hsa_system_major_extension_supported@@ROCR_1'
//opt/rocm/hsa/lib/libhsa-runtime64.so.1: error adding symbols: DSO missing from command line
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

I have latest rocm 2.3. I now tested sample hip program and hcc program which is calling function hsa_system_major_extension_supported and linking with -lhsa-runtime64 and seems everything is working fine.

But seems when I skip -lhsa-runtime64 then I am getting similar error as syclcc:

ld: /tmp/tmp.EcD56X0on0/main.host.o: undefined reference to symbol 'hsa_system_major_extension_supported@@ROCR_1'
//opt/rocm/hsa/lib/libhsa-runtime64.so.1: error adding symbols: DSO missing from command line
clang-9: error: linker command failed with exit code 1 (use -v to see invocation)

And this finally worked from singularity container: syclcc test.cpp --hipsycl-platform=rocm -std=c++14 -lhsa-runtime64

hipCYCL build fails on Ubuntu 18.04 with CUDA 10.0

Hi,
I have tried to build hipCYCL on Ubuntu 18.04, with various versions of gcc, clang and cuda installed, but so far no combination has worked.

Here is an example using GCC 7.3, clang 8, and CUDA 10.0.

Configuring seems fine, as

git clone --recurse-submodules https://github.com/illuhad/hipSYCL
mkdir build
cd build
CXX=`which g++-7` CC=`which gcc-7` cmake -DCMAKE_INSTALL_PREFIX=/opt/hipSYCL -DLLVM_DIR=/usr/lib/llvm-8 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda-10.0 ..

results in

-- The C compiler identification is GNU 7.3.0
-- The CXX compiler identification is GNU 7.3.0
-- Check for working C compiler: /usr/bin/gcc-7
-- Check for working C compiler: /usr/bin/gcc-7 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/g++-7
-- Check for working CXX compiler: /usr/bin/g++-7 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda-10.0 (found version "10.0") 
-- Boost version: 1.65.1
-- Found the following Boost libraries:
--   filesystem
--   system
-- Boost version: 1.65.1
-- Configuring done
-- Generating done

However the actual build

make

fails with

Scanning dependencies of target hipSYCL_cpu
[  3%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/application.cpp.o
[  6%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device.cpp.o
[  9%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device_selector.cpp.o
[ 12%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/exception.cpp.o
[ 15%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o
[ 18%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/handler.cpp.o
[ 21%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/buffer.cpp.o
[ 24%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/task_graph.cpp.o
[ 27%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/accessor.cpp.o
[ 30%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/async_worker.cpp.o
[ 33%] Linking CXX shared library libhipSYCL_cpu.so
[ 33%] Built target hipSYCL_cpu
Scanning dependencies of target hipSYCL_cuda
[ 36%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:62: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o] Error 1
CMakeFiles/Makefile2:156: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/all' failed
make[1]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

I managed to compile the failing file by showing the actual command line with make VERBOSE=1 and then re-running it adding the option --cuda-path=/usr/local/cuda-10.0 - but then it waill fail on the next file, etc.

Is there a way to let CMake pass this option automatically to all the commands that need it ?

Proposal: Support Non-Standard SYCL Extensions / ComputeCpp compatibility

As I’m transitioning my application from ComputeCpp to hipSYCL, I’m realizing that there are many things I’ve been using that aren’t necessarily “SYCL conformant”, or at least somewhat ambiguous with regards to the spec. Supporting those in hipSYCL might seem like a bad idea, but hear me out:

There are many places were the ComputeCpp implementation allows stuff that currently does not compile with hipSYCL. Some things, like having an alternative <SYCL/sycl.hpp> entry-point I’m not so sure about (presumably, if SYCL does move away from OpenCL in the future, this might become relevant). Some other things are really convenient though. For example:

  • cl::sycl::id and cl::sycl::range can not only be constructed from one another (which is in the spec), but also from instances of other dimensionality. For example, if I construct a cl::sycl::range<3>(cl::sycl::id<1>(5)) I will end up with {5, 1, 1} (this of course also works for two cl::sycl::id or cl::sycl::range).
  • cl::sycl::id and cl::sycl::range are implicitly 3-dimensional with default values for index >= dimensions. For example, cl::sycl::range<1>(...)[2] returns 1.

Now I’d argue that these examples are actually representative of two types of extensions that ComputeCpp makes:

  • The first is a straight-up API extension. If Khronos intended this to exist, it would be listed in the table of constructors for both types (and hopefully it will be in the next version).
  • Since according to Codeplay, ComputeCpp is passing the Khronos-provided SYCL conformance test suite, I would say that the second example is essentially unspecified behavior.

ComputeCpp is probably the most prevalent SYCL implementation right now, and I suspect that many people will begin their “SYCL journey” with ComputeCpp. Thus, I think for the sake of compatibility (and convenience) we should strive to support a similar feature-set as ComputeCpp.

However that being said, I would not necessarily do it by default. There certainly is merit to the idea of having a fully spec-conformant program, especially when trying to target arbitrary SYCL implementations.

I think a good way of approaching this could thus be through feature flags (macros):

  • For example, add a COMPUTECPP_COMPAT macro that enables extensions such as the first example.
  • Alternatively, enable it by default and add a HIPSYCL_STRICT (or SYCL_STRICT) macro that disables these behaviors.

With regards to the second example – undefined behavior – it’s of course hard to say what e.g. a HIPSYCL_STRICT would entail. In any case, I do think that some way or another these (and other) use-cases should be supported.

To be clear: I’m not suggesting to now meticulously compare all the differences between hipSYCL and ComputeCpp and try to reach parity, but rather to have a plan for where I (and others) can put these kinds of extensions whenever they are encountered.

Use llvm cxx flags when compiling hipSYCL clang plugin

When compiling the clang plugin, we should add the llvm cxx flags to the compiler flags to make sure that llvm and the clang plugin are binary compatible. For example, some llvm distributions are compiled with fno-rtti and some some without. Getting this wrong will prevent the plugin from loading correctly.

Transformation steps currently have hard dependency on OpenMP

With the addition of the hipCPU backend the source transformation steps now include the hipCPU version of hip_runtime.h (here). This unfortunately causes compilation to fail on my system, as apparently my OpenMP headers are located outside the default system include search paths. The culprit is kernel_execution_context.hpp, which includes <omp.h>.

I don't know all the reasons for including hipCPU during transformations (presumably one being the definition of the __host__ and __device__ macros) so I don't know what the best course of action is here.

hipsycl_transform_source: Generalized interpretation of constructor call locations

At the moment, hipsycl_transform_source doesn't recognize all locations of constructor calls. For example, default constructors of base classes or default constructors of attributes are not yet found as a dependency of the class constructor. Consequently, automatic __host__, __device__ attributes don't work for these cases.

how do I tell hipSYCL where to find CUDA?

I installed CUDA from Apt so it's in /usr not /usr/local as LLVM seems to want. Is there a CMake command to tell hipSYCL to pass the right flags to Clang? I am surprised that Clang can't figure out how to use /usr. Thanks.

~/Work/OpenCL/hipSYCL/build$ rm -rf * ; cmake -DCMAKE_INSTALL_PREFIX=/opt/sycl/hipsycl .. && make -k
-- The C compiler identification is GNU 7.4.0
-- The CXX compiler identification is GNU 7.4.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr (found version "9.1") 
-- Boost version: 1.66.0
-- Found the following Boost libraries:
--   filesystem
--   system
-- Boost version: 1.66.0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jrhammon/Work/OpenCL/hipSYCL/build
Scanning dependencies of target hipSYCL_cpu
[  2%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/application.cpp.o
[  5%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device.cpp.o
[  8%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device_selector.cpp.o
[ 11%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/exception.cpp.o
[ 14%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o
[ 17%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/handler.cpp.o
[ 20%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/buffer.cpp.o
[ 22%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/task_graph.cpp.o
[ 25%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/accessor.cpp.o
[ 28%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/async_worker.cpp.o
[ 31%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/local_memory.cpp.o
[ 34%] Linking CXX shared library libhipSYCL_cpu.so
[ 34%] Built target hipSYCL_cpu
Scanning dependencies of target hipSYCL_cuda
[ 37%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:62: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/application.cpp.o] Error 1
[ 40%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/device.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:75: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/device.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/device.cpp.o] Error 1
[ 42%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/device_selector.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:88: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/device_selector.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/device_selector.cpp.o] Error 1
[ 45%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/exception.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:101: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/exception.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/exception.cpp.o] Error 1
[ 48%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/queue.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:114: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/queue.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/queue.cpp.o] Error 1
[ 51%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/handler.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:127: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/handler.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/handler.cpp.o] Error 1
[ 54%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/buffer.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:140: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/buffer.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/buffer.cpp.o] Error 1
[ 57%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/task_graph.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:153: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/task_graph.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/task_graph.cpp.o] Error 1
[ 60%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/accessor.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:166: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/accessor.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/accessor.cpp.o] Error 1
[ 62%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/async_worker.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:179: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/async_worker.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/async_worker.cpp.o] Error 1
[ 65%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/local_memory.cpp.o
clang: error: cannot find libdevice for sm_52. Provide path to different CUDA installation via --cuda-path, or pass -nocudalib to build without linking with libdevice.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
clang: error: cannot find CUDA installation.  Provide its path via --cuda-path, or pass -nocudainc to build without CUDA includes.
src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/build.make:192: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/local_memory.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cuda.dir/local_memory.cpp.o] Error 1

RFC: Using a modified clang-cuda frontend instead of source-to-source transformation?

At the moment, we are using a source-to-source transformation to allow SYCL code to be ingested by regular CUDA/HIP compilers. clang already has builtin support both for CUDA/HIP, both of which are, as far as I know, handled by the same CUDA frontend.
This raises the question if it would not be possible to modify clang's CUDA frontend such that, if a certain flag is passed, e.g. -sycl-cuda

  • it automatically infers whether a function should be compiled for __device__ and adds __device__ attributes instead of raising an error if such attributes are missing
  • it considers declarations in hierarchical parallel for invocation as __shared__, unless they are of type cl::sycl::private_memory<T>.

If we get this implemented, we wouldn't need the source-to-source transformation at all and would likely end up with a solution that is both very robust and very fast in terms of compilation times. This would also allow for easily integrating further transformations of the AST or IR for optimizations in the future.
The biggest downside I see is that people would need to compile a patched clang before they can use hipSYCL. We also would have to maintain a fork of clang (unless we can get our changes into mainline, which is perhaps rather unlikely since they are so specific?).

In this issue, I would like to collect thoughts on this approach regarding the usefulness, feasibility and technical approach.

One file of interest that I have already found is clang/lib/Sema/SemaCUDA.cpp. This is where clang seems to decide if a function can be called from another function based on their __host__, __device__ attributes.

Potential hipSYCL related crash in Mandelbrot fractal viewer

@VileLasagna, I'm moving the discussion here so that we don't clutter the cmake discussion :) I could not reproduce any crashes so far. Your SYCL code looks okay at first glance. However, I don't see a fractal - there's a black screen and if I click and drag the mouse, white rectangles are drawn on top. Is this expected (e.g. unfinished rendering code)?

EDIT: Nevermind - it seems that although the command group handler runs, the actual kernel is never running. This is likely a problem in hipSYCL.

error: __float128 is not supported on this target

I am trying to build hipSYCL for the Radeon GPU (Advanced Micro Devices, Inc. [AMD/ATI] Polaris 22 [Radeon RX Vega M GH] (rev c0)) that is associated with the Intel Hades Canyon NUC (Intel(R) Core(TM) i7-8809G CPU).

I cannot figure out how to get past the __float error.

System C++ compiler

$ /usr/bin/c++ -v Using built-in specs. COLLECT_GCC=/usr/bin/c++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/8/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 8.2.0-7ubuntu1' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu Thread model: posix gcc version 8.2.0 (Ubuntu 8.2.0-7ubuntu1)

CMake output

-- The C compiler identification is GNU 8.2.0
-- The CXX compiler identification is GNU 8.2.0
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Boost version: 1.67.0
-- Found the following Boost libraries:
--   filesystem
--   system
-- Boost version: 1.67.0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/user/AMD/hipSYCL/build

Make output

/usr/bin/cmake -H/home/user/AMD/hipSYCL -B/home/user/AMD/hipSYCL/build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/bin/cmake -E cmake_progress_start /home/user/AMD/hipSYCL/build/CMakeFiles /home/user/AMD/hipSYCL/build/CMakeFiles/progress.marks
make -f CMakeFiles/Makefile2 all
make[1]: Entering directory '/home/user/AMD/hipSYCL/build'
make -f src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/build.make src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/depend
make[2]: Entering directory '/home/user/AMD/hipSYCL/build'
cd /home/user/AMD/hipSYCL/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/user/AMD/hipSYCL /home/user/AMD/hipSYCL/src/libhipSYCL /home/user/AMD/hipSYCL/build /home/user/AMD/hipSYCL/build/src/libhipSYCL /home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/DependInfo.cmake --color=
Dependee "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/DependInfo.cmake" is newer than depender "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/depend.internal".
Dependee "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/depend.internal".
Scanning dependencies of target hipSYCL_cpu
make[2]: Leaving directory '/home/user/AMD/hipSYCL/build'
make -f src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/build.make src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/build
make[2]: Entering directory '/home/user/AMD/hipSYCL/build'
[  3%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/application.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/application.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/application.cpp
[  6%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/device.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/device.cpp
[  9%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device_selector.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/device_selector.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/device_selector.cpp
[ 12%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/exception.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/exception.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/exception.cpp
[ 16%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/queue.cpp
[ 19%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/handler.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/handler.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/handler.cpp
[ 22%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/buffer.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/buffer.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/buffer.cpp
[ 25%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/task_graph.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/task_graph.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/task_graph.cpp
[ 29%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/accessor.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/accessor.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/accessor.cpp
[ 32%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/async_worker.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_cpu_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=cpu --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_cpu.dir/async_worker.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/async_worker.cpp
[ 35%] Linking CXX shared library libhipSYCL_cpu.so
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /usr/bin/cmake -E cmake_link_script CMakeFiles/hipSYCL_cpu.dir/link.txt --verbose=1
/home/user/AMD/hipSYCL/bin/syclcc -fPIC -O3 -DNDEBUG  -shared -Wl,-soname,libhipSYCL_cpu.so -o libhipSYCL_cpu.so CMakeFiles/hipSYCL_cpu.dir/application.cpp.o CMakeFiles/hipSYCL_cpu.dir/device.cpp.o CMakeFiles/hipSYCL_cpu.dir/device_selector.cpp.o CMakeFiles/hipSYCL_cpu.dir/exception.cpp.o CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o CMakeFiles/hipSYCL_cpu.dir/handler.cpp.o CMakeFiles/hipSYCL_cpu.dir/buffer.cpp.o CMakeFiles/hipSYCL_cpu.dir/task_graph.cpp.o CMakeFiles/hipSYCL_cpu.dir/accessor.cpp.o CMakeFiles/hipSYCL_cpu.dir/async_worker.cpp.o --hipsycl-platform=cpu --hipsycl-bootstrap 
make[2]: Leaving directory '/home/user/AMD/hipSYCL/build'
[ 35%] Built target hipSYCL_cpu
make -f src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/build.make src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/depend
make[2]: Entering directory '/home/user/AMD/hipSYCL/build'
cd /home/user/AMD/hipSYCL/build && /usr/bin/cmake -E cmake_depends "Unix Makefiles" /home/user/AMD/hipSYCL /home/user/AMD/hipSYCL/src/libhipSYCL /home/user/AMD/hipSYCL/build /home/user/AMD/hipSYCL/build/src/libhipSYCL /home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/DependInfo.cmake --color=
Dependee "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/DependInfo.cmake" is newer than depender "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/depend.internal".
Dependee "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/CMakeDirectoryInformation.cmake" is newer than depender "/home/user/AMD/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/depend.internal".
Scanning dependencies of target hipSYCL_rocm
make[2]: Leaving directory '/home/user/AMD/hipSYCL/build'
make -f src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/build.make src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/build
make[2]: Entering directory '/home/user/AMD/hipSYCL/build'
[ 38%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/application.cpp.o
cd /home/user/AMD/hipSYCL/build/src/libhipSYCL && /home/user/AMD/hipSYCL/bin/syclcc  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_rocm_EXPORTS -I/home/user/AMD/hipSYCL/include -I/home/user/AMD/hipSYCL/contrib/hipCPU/include  -O3 -DNDEBUG -fPIC   --hipsycl-platform=rocm --hipsycl-bootstrap -std=gnu++14 -o CMakeFiles/hipSYCL_rocm.dir/application.cpp.o -c /home/user/AMD/hipSYCL/src/libhipSYCL/application.cpp
In file included from /home/user/AMD/hipSYCL/src/libhipSYCL/application.cpp:28:
In file included from /home/user/AMD/hipSYCL/bin/../include/CL/sycl/detail/application.hpp:31:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/memory:62:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/stl_algobase.h:64:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/stl_pair.h:59:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/move.h:55:
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/type_traits:335:39: error: __float128 is not supported on this target
    struct __is_floating_point_helper<__float128>
                                      ^
In file included from /home/user/AMD/hipSYCL/src/libhipSYCL/application.cpp:28:
In file included from /home/user/AMD/hipSYCL/bin/../include/CL/sycl/detail/application.hpp:31:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/memory:80:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/unique_ptr.h:37:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/tuple:39:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/array:39:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/stdexcept:39:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/string:52:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/basic_string.h:6391:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/ext/string_conversions.h:41:
In file included from /usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/cstdlib:77:
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:102:7: error: __float128 is not supported on this target
  abs(__float128 __x)
      ^
/usr/lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/bits/std_abs.h:101:3: error: __float128 is not supported on this target
  __float128
  ^
3 errors generated.
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/build.make:63: src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/application.cpp.o] Error 1
make[2]: *** Deleting file 'src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/application.cpp.o'
make[2]: Leaving directory '/home/user/AMD/hipSYCL/build'
make[1]: *** [CMakeFiles/Makefile2:154: src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/all] Error 2
make[1]: Leaving directory '/home/user/AMD/hipSYCL/build'
make: *** [Makefile:130: all] Error 2

First fix attempt

$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 0b23568..1065fa6 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,6 +1,8 @@
 cmake_minimum_required (VERSION 3.5)
 project(hipSYCL)
 
+set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_EXTENSIONS ON)
 
 set(HIPSYCL_DEVICE_COMPILER ${PROJECT_SOURCE_DIR}/bin/syclcc)
 set(HIPSYCL_SOURCE_DIR ${PROJECT_SOURCE_DIR})
diff --git a/bin/syclcc b/bin/syclcc
index 0b30add..92e70e1 100755
--- a/bin/syclcc
+++ b/bin/syclcc
@@ -251,7 +251,7 @@ class cuda_clang_compiler:
 
       gpu_target_arch = self._get_gpu_target_arch(args)
       compiler_args = ["--cuda-gpu-arch="+gpu_target_arch] if gpu_target_arch else []
-      compiler_args += ["-std=c++14",
+      compiler_args += ["-std=gnu++14",
                         "-pthread",
                         "-Wno-unused-command-line-argument",
                         "-Wno-deprecated-declarations"]
@@ -397,7 +397,7 @@ class hip_compiler:
 
       compiler_args = ["-Wno-unused-command-line-argument",
                         "-Wno-ignored-attributes",
-                        "-std=c++14"]+self.hcc_options
+                        "-std=gnu++14"]+self.hcc_options
       compiler_args += self.config.hipsycl_common_arguments
       compiler_args += transformed_args
 
@@ -446,7 +446,7 @@ class hipcpu_compiler:
 
       compiler_args = ["-Wno-unused-command-line-argument",
                         "-Wno-ignored-attributes",
-                        "-std=c++14",
+                        "-std=gnu++14",
                         "-fopenmp"]
       compiler_args += self.config.hipsycl_common_arguments
       compiler_args += transformed_args

Clues

/home/jrhammon/AMD/hipSYCL/bin/syclcc --hipsycl-platform=rocm
clang-9: error: ROCm agent detector could not identify any valid targets; please specify the target explicitly by passing a valid value to -amdgpu-target
clang-9: warning: -amdgpu-target argument 'gfx000' is not recognized; using gfx803 instead [-Winvalid-command-line-argument]

Second fix attempt

$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index 0b23568..1065fa6 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -1,6 +1,8 @@
 cmake_minimum_required (VERSION 3.5)
 project(hipSYCL)
 
+set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_EXTENSIONS ON)
 
 set(HIPSYCL_DEVICE_COMPILER ${PROJECT_SOURCE_DIR}/bin/syclcc)
 set(HIPSYCL_SOURCE_DIR ${PROJECT_SOURCE_DIR})
diff --git a/bin/syclcc b/bin/syclcc
index 0b30add..5d9e3bf 100755
--- a/bin/syclcc
+++ b/bin/syclcc
@@ -251,7 +251,7 @@ class cuda_clang_compiler:
 
       gpu_target_arch = self._get_gpu_target_arch(args)
       compiler_args = ["--cuda-gpu-arch="+gpu_target_arch] if gpu_target_arch else []
-      compiler_args += ["-std=c++14",
+      compiler_args += ["-std=gnu++14",
                         "-pthread",
                         "-Wno-unused-command-line-argument",
                         "-Wno-deprecated-declarations"]
@@ -397,7 +397,7 @@ class hip_compiler:
 
       compiler_args = ["-Wno-unused-command-line-argument",
                         "-Wno-ignored-attributes",
-                        "-std=c++14"]+self.hcc_options
+                        "-std=gnu++14"]+self.hcc_options
       compiler_args += self.config.hipsycl_common_arguments
       compiler_args += transformed_args
 
@@ -417,6 +417,8 @@ class hip_compiler:
     additional_args = []
     if self.config.gpu_target_arch != None:
       additional_args.append("-amdgpu-target="+self.config.gpu_target_arch)
+    else:
+      additional_args.append("-amdgpu-target=gfx803")
 
     # ToDo: What if the hcc include/library paths in cxxconfig and ldconfig
     # contain spaces?
@@ -446,7 +448,7 @@ class hipcpu_compiler:
 
       compiler_args = ["-Wno-unused-command-line-argument",
                         "-Wno-ignored-attributes",
-                        "-std=c++14",
+                        "-std=gnu++14",
                         "-fopenmp"]
       compiler_args += self.config.hipsycl_common_arguments
       compiler_args += transformed_args

Silent cudaLaunch failures when compiling with clang's CUDA implementation

Moving the discussion from #42 to a dedicated issue. As mentioned in #42, @psalz found out that this simple code:

#include <CL/sycl.hpp>

int main() {
    cl::sycl::queue queue;
    cl::sycl::buffer<float, 1> buf(10);

    queue.submit([&](cl::sycl::handler& cgh) {}); // The culprit

    queue.submit([&](cl::sycl::handler& cgh) {
        auto acc = buf.get_access<cl::sycl::access::mode::discard_write>(cgh);
        cgh.parallel_for<class fail>(buf.get_range(), [=](cl::sycl::item<1> item) {
            acc[item] = 1.f;
        }); 
    }); 

    return 0;
}

Silently fails to run the kernel. The error is however not restricted to this particular code and can also seemingly "strike at random". An error can only be seen if using cuda-memcheck, which reveals error cudaErrorInvalidDeviceFunction (error 8) due to "invalid device function" on CUDA API call to cudaLaunch.

We know:

  • the problem disappears when changing small seemingly unrelated bits of code (in this case, either removing the first empty command group or moving it after the second command group solves the issue)
  • Generated device code and cuda launch code on the host side is the same for both working and non-working versions
  • Reproducible (at least?) with clang 8
  • Everything works fine if compiled with nvcc.

Things to try:

  • Does adding an explicit queue.wait_and_throw() at the end change anything? Terminating a program without synchronization either via queue or by creating a host accessor is not allowed by spec, although hipSYCL historically has handled that well. The question is: Could it happen that hipSYCL in some destructor tries to run the kernel while CUDA runtime has already started shutting down? EDIT: No, explicit synchronization doesn't help

clang compatibility question

I'm having trouble getting syclcc-clang to work on NixOS with nixos-rocm, and realized I may be trying something known not to work.

The REAMDE says,

In order to use the new clang-based toolchain, hipSYCL must be compiled against the same clang version used by ROCm (at the moment clang 9)

and

llvm/clang (with development headers and libraries). LLVM/clang 6, 7 and 8 are supported at the moment.

Should I expect to be able to be able to get clang-9 as used by ROCm 2.4 to work with syclcc-clang? I'm hitting issues with amp annotations in hcc headers, so this is entirely possibly something I'm doing wrong with exposing hcc's clang, but I'd like to be sure ROCm 2.4 and clang-9 is a supported configuration.

Proposal: Explicit inclusion rewriting

I propose adding a special flag for syclcc which controls whether an include path should be rewritten or not. Here’s why:

First of all I suspect that for many smaller SYCL applications, all kernel code will be situated within implementation files anyway, so the entire rewrite pass could be omitted, improving compilation speeds.
More importantly though, inlining of headers can sometimes cause compilation errors. As an example, the following program does not compile with syclcc (using nvcc, as of Boost 1.68):

#include <boost/optional.hpp>

int main() {
    boost::optional<int> foo, bar;
    foo = bar;
    return 0;
}

If I prevent the inclusion rewriter from inlining Boost (by adding a condition within RewriteSelector::shouldIncludeBeRewritten), everything works fine. I can’t tell you the exact reason, as Boost is huge and contains a metric ton of workarounds for various compilers (I really don’t envy those guys!), however, to me this looks like a case of compiler detection which causes the compilation to take different branches during the inclusion rewriting and the compilation with nvcc. I actually have no idea how they do it, I tried to create a small test case to replicate this and couldn’t come up with one. Nevertheless, it apparently does happen.

Moreover, all of the weird egde-cases I’ve encountered with pruning were within libraries that don’t contain any device code at all. Omitting them from the rewrite pass would also alleviate those problems.

Lastly, this could generally improve compilation performance, as we would have less text to dump back to disk.


So, instead of rewriting everything but hipSYCL and system libraries, I propose that headers which should be rewritten be made explicit through some flag for syclcc, e.g. -IR or --rewrite-includes-path=. I suspect that in many cases the rewrite step would only concern local includes (e.g. -IR./include) or could even be omitted entirely. Also maintaining a list of system library headers to exclude wouldn’t be required anymore either.

Build error on fedora 31

I want to test hipSYCL;

  • singularity cointainer with fedora 31
  • rocm from the rpms
  • hipSYCL trunk checkout
  • cmake3 -DCMAKE_INSTALL_PREFIX=/opt/hipSYCL/ ../git/
  • VERBOSE=1 make -j1
[  2%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_rocm.dir/application.cpp.o
cd /opt/hipSYCL/build/src/libhipSYCL && /opt/hipSYCL/git/bin/syclcc-clang  -DHIPSYCL_DEBUG_LEVEL=1 -DhipSYCL_rocm_EXPORTS -I/opt/hipSYCL/git/include -I/opt/hipSYCL/git/contrib/hipCPU/include -I/opt/rocm/include  -O2 -DNDEBUG -fPIC   --hipsycl-platform=rocm --hipsycl-gpu-arch=gfx900 --hipsycl-bootstrap --hipsycl-config-file=/opt/hipSYCL/build/syclcc.json -std=c++14 -o CMakeFiles/hipSYCL_rocm.dir/application.cpp.o -c /opt/hipSYCL/git/src/libhipSYCL/application.cpp
clang-8: error: no such file or directory: 'hip.amdgcn.bc'
clang-8: error: no such file or directory: 'opencl.amdgcn.bc'
clang-8: error: no such file or directory: 'ocml.amdgcn.bc'
clang-8: error: no such file or directory: 'ockl.amdgcn.bc'
clang-8: error: no such file or directory: 'oclc_finite_only_off.amdgcn.bc'
clang-8: error: no such file or directory: 'oclc_daz_opt_off.amdgcn.bc'
clang-8: error: no such file or directory: 'oclc_correctly_rounded_sqrt_on.amdgcn.bc'
clang-8: error: no such file or directory: 'oclc_unsafe_math_off.amdgcn.bc'
clang-8: error: no such file or directory: 'oclc_isa_version_900.amdgcn.bc'
  • The same with clang-9 from rocm
find /opt/rocm/ -name hip.amdgcn.bc
/opt/rocm/hcc/lib/hip.amdgcn.bc
/opt/rocm/hcc/rocdl/lib/hip.amdgcn.bc
/opt/rocm/hcc/rocdl/hip/hip.amdgcn.bc
/opt/rocm/opencl/lib/x86_64/bitcode/hip.amdgcn.bc

How can i add a search directory for the bc files?

Strategy: The future of the source-to-source transformation

Since we'll have a clang plugin hopefully soonish that will directly allow the clang CUDA/HIP frontend to ingest SYCL code (see issue #34), we could in principle drop the source-to-source transformation entirely.
I'd like to start a discussion here with hipSYCL users and developers to get some feedback on possible futures of the hipSYCL compilation toolchain. Is the source-to-source transformation important to you and we should support both source-to-source and the new clang plugin? Do you need nvcc support? Or is clang support (with the plugin) sufficient for you?

Here are some pros of the source-to-source transformation that come to my mind:

  • Allows compilation of SYCL code with nvcc. This can be interesting from a marketing position ("you can do anything that nvcc can and can use the newest CUDA features right away")
  • Possible to specify areas in the code with preprocessor definitions that hipSYCL shouldn't modify. This could be beneficial if you're interested in mix-and-match with SYCL and CUDA/HIP code.

The new clang plugin on the other hand gives us:

  • Much more robustness and reliability (there are edge cases in the source-to-source transformation...)
  • Faster compilation speed
  • Solid SYCL support
  • Paves the way for runtime selection whether a kernel should be executed on host or device (could in principle however also be implemented with source-to-source and clang's CUDA implementation [but likely not with nvcc])
  • Potentially even some parts of the C++ standard library could be used in kernels (although not yet with the initial version of the plugin)
  • Implementation of specific optimizations in the future since we have access to the IR

While not impossible, it may require some additional effort to support both the current source-to-source transformation and the new clang plugin approach because the clang plugin treats any function without attributes implicitly as __host__ __device__. This means that all functions for SYCL kernels (e.g. math functions) must also support compilation for both host and device. At the moment, we assume in the runtime that everything used in kernels is __device__ only. This is also assumed by the current source-to-source transformation.
Also, if we still have to support source-to-source, it may limit our ability to implement things with IR transformations.

unit tests: error: in "device_test_suite/hierarchical_dispatch": check computed == expected has failed

Running unit tests results in errors:

./unit_tests:

Running 32 test cases...
Entering test module "hipsycl unit tests"
[path]/hipSYCL/tests/unit_tests.cpp(30): Entering test suite "device_test_suite"
[path]/hipSYCL/tests/unit_tests.cpp(32): Entering test case "basic_single_task"
[path]/hipSYCL/tests/unit_tests.cpp(32): Leaving test case "basic_single_task"; testing time: 2241us
[path]/hipSYCL/tests/unit_tests.cpp(45): Entering test case "basic_parallel_for"
[path]/hipSYCL/tests/unit_tests.cpp(45): Leaving test case "basic_parallel_for"; testing time: 1750us
[path]/hipSYCL/tests/unit_tests.cpp(62): Entering test case "basic_parallel_for_with_offset"
[path]/hipSYCL/tests/unit_tests.cpp(62): Leaving test case "basic_parallel_for_with_offset"; testing time: 1775us
[path]/hipSYCL/tests/unit_tests.cpp(84): Entering test case "basic_parallel_for_nd"
[path]/hipSYCL/tests/unit_tests.cpp(84): Leaving test case "basic_parallel_for_nd"; testing time: 1665us
[path]/hipSYCL/tests/unit_tests.cpp(104): Entering test case "hierarchical_dispatch"
[path]/hipSYCL/tests/unit_tests.cpp(144): error: in "device_test_suite/hierarchical_dispatch": check computed == expected has failed [0 != 32640]
[path]/hipSYCL/tests/unit_tests.cpp(144): error: in "device_test_suite/hierarchical_dispatch": check computed == expected has failed [0 != 98176]
[path]/hipSYCL/tests/unit_tests.cpp(144): error: in "device_test_suite/hierarchical_dispatch": check computed == expected has failed [0 != 163712]
[path]/hipSYCL/tests/unit_tests.cpp(144): error: in "device_test_suite/hierarchical_dispatch": check computed == expected has failed [0 != 229248]
[path]/hipSYCL/tests/unit_tests.cpp(104): Leaving test case "hierarchical_dispatch"; testing time: 1871us
[path]/hipSYCL/tests/unit_tests.cpp(148): Entering test case "dynamic_local_memory"
[path]/hipSYCL/tests/unit_tests.cpp(148): Leaving test case "dynamic_local_memory"; testing time: 2028us
[path]/hipSYCL/tests/unit_tests.cpp(189): Entering test case "placeholder_accessors"
[path]/hipSYCL/tests/unit_tests.cpp(189): Leaving test case "placeholder_accessors"; testing time: 1415607us
[path]/hipSYCL/tests/unit_tests.cpp(221): Entering test case "task_graph_synchronization"
Test case device_test_suite/task_graph_synchronization did not check any assertions
[path]/hipSYCL/tests/unit_tests.cpp(221): Leaving test case "task_graph_synchronization"; testing time: 35666us
[path]/hipSYCL/tests/unit_tests.cpp(266): Entering test case "buffer_versioning"
[path]/hipSYCL/tests/unit_tests.cpp(266): Leaving test case "buffer_versioning"; testing time: 1848us
[path]/hipSYCL/tests/unit_tests.cpp(294): Entering test case "vec_api"
[path]/hipSYCL/tests/unit_tests.cpp(294): Leaving test case "vec_api"; testing time: 1552us
[path]/hipSYCL/tests/unit_tests.cpp(404): Entering test case "range_api<mpl___integral_c<int, 1>>"
[path]/hipSYCL/tests/unit_tests.cpp(404): Leaving test case "range_api<mpl___integral_c<int, 1>>"; testing time: 74us
[path]/hipSYCL/tests/unit_tests.cpp(404): Entering test case "range_api<mpl___integral_c<int, 2>>"
[path]/hipSYCL/tests/unit_tests.cpp(404): Leaving test case "range_api<mpl___integral_c<int, 2>>"; testing time: 56us
[path]/hipSYCL/tests/unit_tests.cpp(404): Entering test case "range_api<mpl___integral_c<int, 3>>"
[path]/hipSYCL/tests/unit_tests.cpp(404): Leaving test case "range_api<mpl___integral_c<int, 3>>"; testing time: 58us
[path]/hipSYCL/tests/unit_tests.cpp(489): Entering test case "id_api<mpl___integral_c<int, 1>>"
[path]/hipSYCL/tests/unit_tests.cpp(489): Leaving test case "id_api<mpl___integral_c<int, 1>>"; testing time: 52us
[path]/hipSYCL/tests/unit_tests.cpp(489): Entering test case "id_api<mpl___integral_c<int, 2>>"
[path]/hipSYCL/tests/unit_tests.cpp(489): Leaving test case "id_api<mpl___integral_c<int, 2>>"; testing time: 55us
[path]/hipSYCL/tests/unit_tests.cpp(489): Entering test case "id_api<mpl___integral_c<int, 3>>"
[path]/hipSYCL/tests/unit_tests.cpp(489): Leaving test case "id_api<mpl___integral_c<int, 3>>"; testing time: 74us
[path]/hipSYCL/tests/unit_tests.cpp(580): Entering test case "item_api<mpl___integral_c<int, 1>>"
[path]/hipSYCL/tests/unit_tests.cpp(580): Leaving test case "item_api<mpl___integral_c<int, 1>>"; testing time: 2100us
[path]/hipSYCL/tests/unit_tests.cpp(580): Entering test case "item_api<mpl___integral_c<int, 2>>"
[path]/hipSYCL/tests/unit_tests.cpp(580): Leaving test case "item_api<mpl___integral_c<int, 2>>"; testing time: 2258us
[path]/hipSYCL/tests/unit_tests.cpp(580): Entering test case "item_api<mpl___integral_c<int, 3>>"
[path]/hipSYCL/tests/unit_tests.cpp(580): Leaving test case "item_api<mpl___integral_c<int, 3>>"; testing time: 5532us
[path]/hipSYCL/tests/unit_tests.cpp(819): Entering test case "explicit_buffer_copy_host_ptr<mpl___integral_c<int, 1>>"
[path]/hipSYCL/tests/unit_tests.cpp(819): Leaving test case "explicit_buffer_copy_host_ptr<mpl___integral_c<int, 1>>"; testing time: 1761us
[path]/hipSYCL/tests/unit_tests.cpp(819): Entering test case "explicit_buffer_copy_host_ptr<mpl___integral_c<int, 2>>"
[path]/hipSYCL/tests/unit_tests.cpp(819): Leaving test case "explicit_buffer_copy_host_ptr<mpl___integral_c<int, 2>>"; testing time: 3858us
[path]/hipSYCL/tests/unit_tests.cpp(930): Entering test case "explicit_buffer_copy_two_accessors_d2d<mpl___integral_c<int, 1>>"
[path]/hipSYCL/tests/unit_tests.cpp(930): Leaving test case "explicit_buffer_copy_two_accessors_d2d<mpl___integral_c<int, 1>>"; testing time: 244us
[path]/hipSYCL/tests/unit_tests.cpp(930): Entering test case "explicit_buffer_copy_two_accessors_d2d<mpl___integral_c<int, 2>>"
[hipSYCL Error] task_graph: submit() caught async error,  invoking async handler.

Proposal: Add support for compiling CUDA with Clang

I’d love to see support for compiling CUDA using Clang (as detailed here) added to syclcc. As far as I can tell this pretty much only has advantages over using nvcc (I could be wrong though, as I don’t have much CUDA experience).

For example, Clang will happily compile this program:

#include <CL/sycl.hpp>

int main() {
    cl::sycl::queue queue;

    queue.submit([](auto& cgh) {
        cgh.template parallel_for<class test>(cl::sycl::range<1>(10), [=](cl::sycl::item<1> item) {
            printf("Hello %d\n", item[0]);
        });
    });

    return 0;
}

As you know, generic lambas are among the things that nvcc’s (experimental!) extended lambda feature currently does not support at all. Don’t ask me why, but I actually need this.

As for feasibility: I added Clang as a third option into syclcc and successfully compiled both libhipSYCL itself (with very minor adjustments) as well as my application (which makes use of generic lambdas).

This would likely also allow us to omit the pruning step, as the call-graphs should be the same between the libTooling and Clang invocations.

Other advantages:

  • All (except for the syclcc specific) parameters can be simply passed to clang++, no juggling required.
  • You get Clang’s awesome error reporting.

In fact I think it might even be worth considering making Clang the default for compiling to CUDA, and dropping nvcc altogether. Anyone using hipSYCL has to have Clang installed anyway, so that is not a problem. However again, I’m not certain if there’s any downsides to this, like e.g. any features that aren’t supported. Dropping nvcc would allow us to use features that only Clang supports, like overloading based on device and host attributes. Although this might be moot as I’m not sure what hcc’s stance is on this.

Clang 9.0.0 chokes on hipSYCL

I installed LLVM 9.0.0 from the official LLVM Apt packages on Ubuntu 18.04 and it choked as follows. Have you seen this before? I can file an upstream bug report but I would like to know it's something worth addressing and not a known problem related to hipSYCL. Thanks.

~/Work/OpenCL/hipSYCL/build$ rm -rf * ; cmake -DCMAKE_INSTALL_PREFIX=/opt/sycl/hipsycl -DCMAKE_C_COMPILER=clang-9 -DCMAKE_CXX_COMPILER=clang++-9 .. 
-- The C compiler identification is Clang 9.0.0
-- The CXX compiler identification is Clang 9.0.0
-- Check for working C compiler: /usr/bin/clang-9
-- Check for working C compiler: /usr/bin/clang-9 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/clang++-9
-- Check for working CXX compiler: /usr/bin/clang++-9 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr (found version "9.1") 
-- Boost version: 1.66.0
-- Found the following Boost libraries:
--   filesystem
--   system
-- Boost version: 1.66.0
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jrhammon/Work/OpenCL/hipSYCL/build
jrhammon@klondike:~/Work/OpenCL/hipSYCL/build$  make
Scanning dependencies of target hipSYCL_cpu
[  2%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/application.cpp.o
[  5%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device.cpp.o
[  8%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/device_selector.cpp.o
[ 11%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/exception.cpp.o
[ 14%] Building CXX object src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o
Stack dump:
0.	Program arguments: /usr/lib/llvm-9/bin/clang -cc1 -triple x86_64-pc-linux-gnu -emit-obj -disable-free -disable-llvm-verifier -discard-value-names -main-file-name queue.cpp -mrelocation-model pic -pic-level 2 -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -munwind-tables -fuse-init-array -target-cpu x86-64 -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/jrhammon/Work/OpenCL/hipSYCL/build/src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.gcno -resource-dir /usr/lib/llvm-9/lib/clang/9.0.0 -D HIPSYCL_DEBUG_LEVEL=1 -D hipSYCL_cpu_EXPORTS -I /home/jrhammon/Work/OpenCL/hipSYCL/include -I /home/jrhammon/Work/OpenCL/hipSYCL/contrib/hipCPU/include -D NDEBUG -I/opt/intel/compilers_and_libraries_2019.3.199/linux/ipp/include -I/opt/intel/compilers_and_libraries_2019.3.199/linux/mkl/include -I/opt/intel/compilers_and_libraries_2019.3.199/linux/pstl/include -I/opt/intel/compilers_and_libraries_2019.3.199/linux/tbb/include -I/opt/intel/compilers_and_libraries_2019.3.199/linux/tbb/include -I/opt/intel/compilers_and_libraries_2019.3.199/linux/daal/include -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/x86_64-linux-gnu/c++/8 -internal-isystem /usr/bin/../lib/gcc/x86_64-linux-gnu/8/../../../../include/c++/8/backward -internal-isystem /usr/include/clang/9.0.0/include/ -internal-isystem /usr/local/include -internal-isystem /usr/lib/llvm-9/lib/clang/9.0.0/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -O3 -std=c++14 -fdeprecated-macro -fdebug-compilation-dir /home/jrhammon/Work/OpenCL/hipSYCL/build/src/libhipSYCL -ferror-limit 19 -fmessage-length 0 -fopenmp -fobjc-runtime=gcc -fcxx-exceptions -fexceptions -fdiagnostics-show-option -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -o CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o -x c++ /home/jrhammon/Work/OpenCL/hipSYCL/src/libhipSYCL/queue.cpp 
1.	/home/jrhammon/Work/OpenCL/hipSYCL/include/CL/sycl/group.hpp:211:5: current parser token 'mem_fence'
2.	/home/jrhammon/Work/OpenCL/hipSYCL/include/CL/sycl/group.hpp:40:1: parsing namespace 'cl'
3.	/home/jrhammon/Work/OpenCL/hipSYCL/include/CL/sycl/group.hpp:41:1: parsing namespace 'cl::sycl'
4.	/home/jrhammon/Work/OpenCL/hipSYCL/include/CL/sycl/group.hpp:44:1: parsing struct/union/class body 'cl::sycl::group'
5.	/home/jrhammon/Work/OpenCL/hipSYCL/include/CL/sycl/group.hpp:199:3: parsing function body 'cl::sycl::group::async_work_group_copy'
6.	/home/jrhammon/Work/OpenCL/hipSYCL/include/CL/sycl/group.hpp:199:3: in compound statement ('{}')
 #0 0x00007fd8a6cf0d7f llvm::sys::PrintStackTrace(llvm::raw_ostream&) (/usr/lib/x86_64-linux-gnu/libLLVM-9.so.1+0xa2ed7f)
 #1 0x00007fd8a6cef1a0 llvm::sys::RunSignalHandlers() (/usr/lib/x86_64-linux-gnu/libLLVM-9.so.1+0xa2d1a0)
 #2 0x00007fd8a6cf1181 (/usr/lib/x86_64-linux-gnu/libLLVM-9.so.1+0xa2f181)
 #3 0x00007fd8ac8d9890 __restore_rt (/lib/x86_64-linux-gnu/libpthread.so.0+0x12890)
 #4 0x00007fd8ab495a68 isFromASTFile /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/AST/DeclBase.h:702:39
 #5 0x00007fd8ab495a68 getFirstDecl /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/AST/Redeclarable.h:329:13
 #6 0x00007fd8ab495a68 getCanonicalDecl /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/AST/Decl.h:2808:54
 #7 0x00007fd8ab495a68 getCanonicalDecl /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:875:14
 #8 0x00007fd8ab495a68 checkDecl /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:4944:9
 #9 0x00007fd8ab49596d VisitStmt /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:5002:35
#10 0x00007fd8ab4957c0 doesDependOnLoopCounter /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:5027:7
#11 0x00007fd8ab4957c0 setLCDeclAndLB /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:4846:22
#12 0x00007fd8ab496d68 checkOpenMPIterationSpace /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:5672:11
#13 0x00007fd8ab476298 checkOpenMPLoop /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:6036:9
#14 0x00007fd8ab467f6e clang::Sema::ActOnOpenMPSimdDirective(llvm::ArrayRef<clang::OMPClause*>, clang::Stmt*, clang::SourceLocation, clang::SourceLocation, llvm::SmallDenseMap<clang::ValueDecl const*, clang::Expr const*, 4u, llvm::DenseMapInfo<clang::ValueDecl const*>, llvm::detail::DenseMapPair<clang::ValueDecl const*, clang::Expr const*> >&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:6721:30
#15 0x00007fd8ab4614ad clang::Sema::ActOnOpenMPExecutableDirective(clang::OpenMPDirectiveKind, clang::DeclarationNameInfo const&, clang::OpenMPDirectiveKind, llvm::ArrayRef<clang::OMPClause*>, clang::Stmt*, clang::SourceLocation, clang::SourceLocation) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Sema/SemaOpenMP.cpp:4066:11
#16 0x00007fd8aac36a61 clang::Parser::ParseOpenMPDeclarativeOrExecutableDirective(clang::Parser::ParsedStmtContext) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseOpenMP.cpp:1464:25
#17 0x00007fd8aac4a9d2 clang::Parser::ParseStatementOrDeclarationAfterAttributes(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*, clang::Parser::ParsedAttributesWithRange&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseStmt.cpp:366:12
#18 0x00007fd8aac494ea clang::Parser::ParseStatementOrDeclaration(llvm::SmallVector<clang::Stmt*, 32u>&, clang::Parser::ParsedStmtContext, clang::SourceLocation*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseStmt.cpp:106:20
#19 0x00007fd8aac50ae0 clang::Parser::ParseCompoundStatementBody(bool) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseStmt.cpp:1063:11
#20 0x00007fd8aac517ed clang::Parser::ParseFunctionStatementBody(clang::Decl*, clang::Parser::ParseScope&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseStmt.cpp:2076:21
#21 0x00007fd8aabcc72e isNot /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Lex/Token.h:98:52
#22 0x00007fd8aabcc72e clang::Parser::ParseLexedMethodDef(clang::Parser::LexedMethod&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseCXXInlineMethods.cpp:556:14
#23 0x00007fd8aabcb83e clang::Parser::ParseLexedMethodDefs(clang::Parser::ParsingClass&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseCXXInlineMethods.cpp:479:63
#24 0x00007fd8aabf7239 clang::Parser::ParseCXXMemberSpecification(clang::SourceLocation, clang::SourceLocation, clang::Parser::ParsedAttributesWithRange&, unsigned int, clang::Decl*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDeclCXX.cpp:3353:21
#25 0x00007fd8aabf4c0d clang::Parser::ParseClassSpecifier(clang::tok::TokenKind, clang::SourceLocation, clang::DeclSpec&, clang::Parser::ParsedTemplateInfo const&, clang::AccessSpecifier, bool, clang::Parser::DeclSpecContext, clang::Parser::ParsedAttributesWithRange&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDeclCXX.cpp:0:7
#26 0x00007fd8aabd7d3f getPointer /build/llvm-toolchain-snapshot-9~svn366056/include/llvm/ADT/PointerIntPair.h:58:58
#27 0x00007fd8aabd7d3f isNull /build/llvm-toolchain-snapshot-9~svn366056/include/llvm/ADT/PointerUnion.h:189:19
#28 0x00007fd8aabd7d3f empty /build/llvm-toolchain-snapshot-9~svn366056/include/llvm/ADT/TinyPtrVector.h:162:13
#29 0x00007fd8aabd7d3f empty /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Sema/ParsedAttr.h:816:40
#30 0x00007fd8aabd7d3f clang::Parser::ParseDeclarationSpecifiers(clang::DeclSpec&, clang::Parser::ParsedTemplateInfo const&, clang::AccessSpecifier, clang::Parser::DeclSpecContext, clang::Parser::LateParsedAttrList*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDecl.cpp:3820:23
#31 0x00007fd8aac57d80 is /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Lex/Token.h:97:49
#32 0x00007fd8aac57d80 clang::Parser::ParseSingleDeclarationAfterTemplate(clang::DeclaratorContext, clang::Parser::ParsedTemplateInfo const&, clang::ParsingDeclRAIIObject&, clang::SourceLocation&, clang::ParsedAttributes&, clang::AccessSpecifier) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseTemplate.cpp:216:11
#33 0x00007fd8aac572c6 clang::Parser::ParseTemplateDeclarationOrSpecialization(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedAttributes&, clang::AccessSpecifier) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseTemplate.cpp:0:0
#34 0x00007fd8aac56cd4 clang::Parser::ParseDeclarationStartingWithTemplate(clang::DeclaratorContext, clang::SourceLocation&, clang::ParsedAttributes&, clang::AccessSpecifier) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseTemplate.cpp:0:0
#35 0x00007fd8aabd7337 clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDecl.cpp:1740:18
#36 0x00007fd8aac6685d clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/Parser.cpp:0:3
#37 0x00007fd8aabee7ca ~AttributePool /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Sema/ParsedAttr.h:712:22
#38 0x00007fd8aabee7ca ~ParsedAttributes /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Sema/ParsedAttr.h:906:7
#39 0x00007fd8aabee7ca clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDeclCXX.cpp:247:5
#40 0x00007fd8aabee3ad Exit /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Parse/Parser.h:1020:11
#41 0x00007fd8aabee3ad clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDeclCXX.cpp:227:18
#42 0x00007fd8aabd7431 clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDecl.cpp:0:0
#43 0x00007fd8aac6685d clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/Parser.cpp:0:3
#44 0x00007fd8aabee7ca ~AttributePool /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Sema/ParsedAttr.h:712:22
#45 0x00007fd8aabee7ca ~ParsedAttributes /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Sema/ParsedAttr.h:906:7
#46 0x00007fd8aabee7ca clang::Parser::ParseInnerNamespace(llvm::SmallVector<clang::Parser::InnerNamespaceInfo, 4u> const&, unsigned int, clang::SourceLocation&, clang::ParsedAttributes&, clang::BalancedDelimiterTracker&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDeclCXX.cpp:247:5
#47 0x00007fd8aabee3ad Exit /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/include/clang/Parse/Parser.h:1020:11
#48 0x00007fd8aabee3ad clang::Parser::ParseNamespace(clang::DeclaratorContext, clang::SourceLocation&, clang::SourceLocation) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDeclCXX.cpp:227:18
#49 0x00007fd8aabd7431 clang::Parser::ParseDeclaration(clang::DeclaratorContext, clang::SourceLocation&, clang::Parser::ParsedAttributesWithRange&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseDecl.cpp:0:0
#50 0x00007fd8aac6685d clang::Parser::ParseExternalDeclaration(clang::Parser::ParsedAttributesWithRange&, clang::ParsingDeclSpec*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/Parser.cpp:0:3
#51 0x00007fd8aac65450 clang::Parser::ParseTopLevelDecl(clang::OpaquePtr<clang::DeclGroupRef>&, bool) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/Parser.cpp:682:10
#52 0x00007fd8aabc877d clang::ParseAST(clang::Sema&, bool, bool) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Parse/ParseAST.cpp:157:5
#53 0x00007fd8abe64e68 clang::FrontendAction::Execute() /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Frontend/FrontendAction.cpp:938:10
#54 0x00007fd8abe25830 getPtr /build/llvm-toolchain-snapshot-9~svn366056/include/llvm/Support/Error.h:273:42
#55 0x00007fd8abe25830 operator bool /build/llvm-toolchain-snapshot-9~svn366056/include/llvm/Support/Error.h:236:16
#56 0x00007fd8abe25830 clang::CompilerInstance::ExecuteAction(clang::FrontendAction&) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/Frontend/CompilerInstance.cpp:944:23
#57 0x00007fd8abec77b0 clang::ExecuteCompilerInvocation(clang::CompilerInstance*) /build/llvm-toolchain-snapshot-9~svn366056/tools/clang/lib/FrontendTool/ExecuteCompilerInvocation.cpp:291:25
#58 0x0000000000498646 cc1_main(llvm::ArrayRef<char const*>, char const*, void*) (/usr/lib/llvm-9/bin/clang+0x498646)
#59 0x00000000004969c1 main (/usr/lib/llvm-9/bin/clang+0x4969c1)
#60 0x00007fd8a55b3b97 __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:344:0
#61 0x0000000000493e6a _start (/usr/lib/llvm-9/bin/clang+0x493e6a)
clang: error: unable to execute command: Segmentation fault (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 9.0.0-svn366056-1~exp1+0~20190715114903.1088~1.gbp7d3830 (trunk)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
clang: note: diagnostic msg: PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: 
********************

PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
Preprocessed source(s) and associated run script(s) are located at:
clang: note: diagnostic msg: /tmp/queue-d95b49.cpp
clang: note: diagnostic msg: /tmp/queue-d95b49.sh
clang: note: diagnostic msg: 

********************
src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/build.make:114: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o' failed
make[2]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/queue.cpp.o] Error 254
CMakeFiles/Makefile2:146: recipe for target 'src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/all' failed
make[1]: *** [src/libhipSYCL/CMakeFiles/hipSYCL_cpu.dir/all] Error 2
Makefile:129: recipe for target 'all' failed
make: *** [all] Error 2

using namespace cl::sycl leads to name collisions

When using namespace cl::sycl, terrible name collisions will appear because some SYCL types/functions have the same name as some CUDA/HIP types and functions living in the global namespace. This affects e.g. the vector types such as int4.
Not sure how to fix that in a reliable way at the moment. Please let me know if you have ideas :)

buffer: Don't transfer data if access mode is discard_write

Simple performance optimization: If access mode is discard_write, there's no point transferring data between host and device since it will be overwritten anyway. For this, we only need to modify buffer_state_monitor::register_host_access and buffer_state_monitor::register_device_access.

multiple definitions of cl::sycl::program::get_kernel

With the latest master, when I am linking multiple files with SYCL kernels in them together, I am getting multiple definitions linker errors

/home/ireguly/OP2-Common/op2/c/lib/libop2_sycl.a(op_sycl_rt_support.o): In function ~_Sp_counted_ptr_inplace': /usr/local/bin/../include/CL/sycl/kernel.hpp:141: multiple definition of cl::sycl::program::get_kernel(std::__cxx11::basic_string<char, std::char_traits, std::allocator >) const'
/tmp/airfoil_kernels-31c8a3.o:airfoil_kernels.cpp:(.text+0x0): first defined here

This was not present with 46216fe

Unit testing

While trying to compile more complex applications I often find areas of hipSYCL where I’d like to contribute something, be it a missing operator on some class or error handling for some particularly obscure situation (into which I tend to run into quite frequently for some reason). Also I’m getting a lot of (mostly minor) compiler warnings with hipSYCL right now, and I’d like to cut down on those as well. Anyway, I’d like to not break the existing implementation while doing this – so I think it’s time to add unit tests :).

While at some point it would be great to have test coverage for all the different components of hipSYCL (i.e. the source transformations, syclcc, ...), I think for now the most important thing is to add tests for the C++ library itself.

With SYCL being a specification, it certainly could be a good approach to try and add tests by systematically going through the spec. Unfortunately I don’t have the resources to embark on such a journey right now, so I’d suggest to take a more ad-hoc approach for now (i.e., adding tests as new features are added or when bugs are fixed). Once the conformance test suite goes public (#6) things should also become a lot easier in this regard.

A couple of questions of how to approach testing come to mind (I’m sure there’s more to figure out further down the line...):

  • Some functionality will have to be tested on the host, some on a device and some on both. It would be good to have a concise way of specifying this within a test case.
  • Should functions in tests include “__device__” attributes, or should they also exercise the source transformation step?

Do you have any preferences regarding a unit testing framework? I can highly recommend using Catch2, as it's modern, very easy to set up and doesn't get in your way.

Let me know your thoughts on this; I can begin to set this up if you’d like.


PS: Sorry for the issue-avalanche, as I mentioned I had some stuff in my backlog and I figured to just do it all today and be done with it ;-).

syclcc-clang fails to build libhipsycl_rocm: missing basic cstdint types (uint8_t, etc..)

So, I've updating things on my end and testing out the new syclcc-clang stuff now that the change by psalz on the CMake side. Took me a while to sort out the whole rocm side, updating everything to 2.6 (still not quite done there), understanding and migrating to the hip-clang plugin... But after some back and forth I've done it and quite like not having to rely on hcc.

Still not 100% and this issue is about one problem which I actually found a fix to. As I was getting this up, the issue mentioned in #74 where clang would fail the build complaining about not knowing basic int types was back. After banging my head in frustration for a while, I've managed to sort it out and am going to open a PR for it, would just like to confirm direction for the fix.

The root cause of the issue is that clang was trying to load incorrect internal headers. In my case, as I've built the packages from source from my fork of Experimental_ROC, rocm's raw clang gets put into <rocm_path>/llvm/ with the executables in bin and the headers located in lib/clang/9.0.0/include

This is relevant because in line 434, syclcc-clang looks for these files in <rocm_path>/hcc/lib/clang, which are still there because I have hcc as well (it's still required to build HC and is it's final release). Turns out those headers don't quite match up and the fix for the build is correcting that path.

My doubt here, though, is that I'm somewhat surprised I seem to be the first one bumping into this issue given how it happens. Not returning that path explicitly seems to cause troubles as well (I've tried to return empty or /usr/include and build did not succeed).

As such I'm trying to both understand a bit more how come no one else seems to have this issue as well as what would be the ideal fix to put in the PR.

My suggestion would be walking back one dir up from hipsycl-clang's location and, from there, look into /lib/clang instead of assuming the full path. Again, only unsure because I seem to be the odd one out here =P

ps: I've mentioned that this is not the only issue I'm having, and while this is true, the others are not related to this one and aren't things that I've explored. Just for curiosity's sake, here they are:

  • My performance on CPU is very very bad. My own version of toybrot with OpenMP takes about 1-1.5 secs, and this is taking 15 and not behaving. But I think this is on the included project so not quite hipsycl's scope
  • I've also been trying to build CUDA from the same clang. On it's own it seems to work fine but my project through hipsycl just crashes out. I haven't begun investigating this yet
  • I'm going to open a pr with a change to the ADD_SYCL_TO_TARGET CMake macro to enable us to forward hipsycl-platform per target. Reasoning is I want to be able to have the same project build, say, tbSYCL-rocm``tbSYCL-cuda and tbSYCL-openmp all at once. The way we're currently doing it, we need to change the cmake cache variable and also delete the build folder as CMake doesn't even invalidate the build when you do it, so it's still a little clunky in this one very specific area

ROCm backend - build instructions

Hi,

to build hipSYCL with ROCm backend, the instructions say that the "amd-common" branch for llvm/clang/lld from AMD should be used (Link). But currently this is a llvm/clang/lld version 10 (Link).

And according to the "CMakeLists.txt" of hipSYCL it supports only llvm/clang/lld up to version 9.

Shouldn´t it be "roc-ocl-2.7x" instead of "amd-common" branch?

Performance: multiple cudaSetDevice calls before kernel launch

Profiling the execution of SYCL code on NVIDIA GPUs, I see that before each kernel launch there are about 7-10 calls to cudaSetDevice. These introduce considerable latency along with whatever is done on the CPU: for my smaller application it's up to ~150 us (and the execution of my kernel is ~130 us).
I'm not sure where this is coming from, but I can't think of a reason for calling cudaSetDevice repeatedly... It would considerably help kernel launch overheads to get rid of this.

hipCPU/hip/hip_runtime.h: No such file or directory

Just did a clean checkout from master. Compiling yields:

In file included from /data/code/hipSYCL/include/CL/sycl/detail/task_graph.hpp:32:0,
                 from /data/code/hipSYCL/include/CL/sycl/detail/runtime.hpp:32,
                 from /data/code/hipSYCL/include/CL/sycl/detail/application.hpp:33,
                 from /data/code/hipSYCL/src/libhipSYCL/application.cpp:28:
/data/code/hipSYCL/include/CL/sycl/detail/../backend/backend.hpp:58:38: fatal error: hipCPU/hip/hip_runtime.h: No such file or directory
compilation terminated.

Fully support syclcc in CMake

I’ve been trying to make hipSYCL work with a more “complex” CMake setup than what is used for the tests, in particular, one that finds and links other libraries (standard CMake stuff, really). I did manage to get it working, however not without some changes.

Right now there's a couple of issues as far as I can tell:

The most pressing one is the in-source temporary files. CMake often uses try_compile for feature detection or to determine a library version. For example, the FindMPI module compiles a small program to see if a particular library it found actually works. The problem is that these files are often located somewhere in a read-only location (if it’s a system installation), so syclcc cannot write its temporaries and the find_package call simply fails (and the reason is not particularly well communicated by CMake).

Now within syclcc you remarked that temporaries are located in-source for more robust include resolution, however so far I’ve had no problems with out-of-source builds. Maybe this could be made configurable through a CLI parameter? In any case I think writing into the source tree is a bit hairy and should best be avoided, at least by default.

Another thing is compiler detection. The very first thing CMake does is to compile a small program (however this time within the local CMakeTmp folder) to try and determine the compiler version. Right now this identifies nvcc as gcc, since (afaik) that is the default host compiler of nvcc. I’m not sure what the implications of this are, i.e. if there is any difference in parameters that will be passed to syclcc, e.g. something gcc-specific, which could then maybe interfere with the Clang preprocessing passes. As an aside, and to alleviate this, it might be worth considering setting the host compiler of nvcc to Clang by default, which would fit better with the rest of the compilation stages anyway, as they are all LLVM-based.

In addition to these points, if /tmp is a tmpfs mount, compilation performance could also be greatly improved by moving all temporaries there.

Finally, something that might be worth investigating is to provide a CMake function like ComputeCpp's add_sycl_to_target, as provided by their SDK. I see several benefits to this:

  • First it would allow to only compile certain translation units with the device compiler. For larger projects this could really cut down on compilation times.
  • Secondly, this would improve compatibility with existing projects that work with ComputeCpp.
  • This could also make the specification of rewrite paths -- as proposed in #20 -- more convenient, as instead of adding a CLI parameter this could be done through an additional parameter for add_sycl_to_target.

Releases or at least tags?

Do you plan any releases or tagging to mark a specific amount of features?
I would make the packaging easier.

unsupported atomic accessors ?

This code snippet:

  // construct a SYCL queue for the selected device
  auto queue = cl::sycl::queue(device);
  
  // SYCL buffers
  int counter = 0;
  auto counter_buf = cl::sycl::buffer<int>(& counter, 1);
  
  // submit the kernel to the queue
  queue.submit([&](cl::sycl::handler &cgh) {
    // access the SYCL buffers from the device kernel
    auto counter_d = counter_buf.get_access<cl::sycl::access::mode::atomic>(cgh);
    
    // launch the kernel
    cgh.parallel_for_work_group<class count_groups>(
        cl::sycl::range<1>{num_work_groups},
        cl::sycl::range<1>{work_group_size},
        [=](cl::sycl::group<1> group) {
        
      // print the id of all the groups
      printf("group id: %lu\n", group.get_id(0));
      
      // print the id of all the threads
      group.parallel_for_work_item([&](cl::sycl::h_item<1> item) {
        printf("global thread id: %zu\n", item.get_global_id(0));
      });
      
      cl::sycl::atomic_fetch_add(counter_d[0], 1);
    });

it fails with

test.cc:53:43: error: no viable overloaded operator[] for type 'const cl::sycl::accessor<int, 1, cl::sycl::access::mode::atomic, cl::sycl::access::target::global_buffer,
      cl::sycl::access::placeholder::false_t>'
      cl::sycl::atomic_fetch_add(counter_d[0], 1);
                                 ~~~~~~~~~^~

and in fact the atomic accessors in include/CL/sycl/accessor.hpp are commented out:

  /* Available only when: accessMode == access::mode::atomic && dimensions == 0*/
  //operator atomic<dataT, access::address_space::global_space> () const;

  /* Available only when: accessMode == access::mode::atomic && dimensions > 0*/
  //atomic<dataT, access::address_space::global_space> operator[](
  //    id<dimensions> index) const;

  //atomic<dataT, access::address_space::global_space> operator[](
  //    size_t index) const;

What is the reason for that ?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.