
Comments (22)

illuhad commented on May 22, 2024

Thank you very much for your feedback! I'm sorry it was so difficult for you to get hipSYCL working. From your feedback, it seems we should link with -lhsa-runtime64 in syclcc on ROCm by default, and pass -std=c++14. Is there anything else you would like to suggest? The latter seems to clash with the CUDA side of things, see the discussion in PR #55.

but I am getting error The platform rocm was explicitly chosen, but it is not available.

This usually happens when syclcc cannot find hcc. Was hcc in $PATH?

VileLasagna commented on May 22, 2024

Hey @misos1, I'm looking at this, actually, as I'd already tried to enforce the CXX standard before but did an oopsie and missed a detail. As a general note, though, when you're passing these things to CMake, you generally want to pass something like cmake -D CMAKE_CXX_COMPILER=clang++ -D CMAKE_C_COMPILER=clang -D CMAKE_CXX_STANDARD=14 .. instead of using environment variables and raw compile flags.

The new changes are in PR #57. If there's anything else on the CMake side of things that I can help with, lemme know.

misos1 commented on May 22, 2024

Yes, it seems syclcc needs to include -lhsa-runtime64 for compiling a SYCL program on ROCm. I see hipSYCL uses some HSA functions in at least one header. Maybe it was not necessary to add -lhsa-runtime64 manually before, but in some version of ROCm this changed.

C++14 is needed to compile hipSYCL itself and also a SYCL program with syclcc, so why not include it by default, at least when the platform is one of the ROCm options (or is there some problem when targeting CUDA with hipcc?), and for the hipSYCL build when it is needed? There should also be the possibility for the user to use C++17 and later for SYCL sources. (Why are there so many platforms like amd, rocm, hip and hcc, all for AMD hardware? Is there some difference?)

If I understand correctly, CUDA hardware can be targeted by using the cuda platform, where code is compiled directly by clang, or with the hip platform, where it will be compiled with hipcc, which supports both CUDA and AMD backends.

Regarding The platform rocm was explicitly chosen, but it is not available: ignore it. I got it when I ran sudo make install without running make first (I needed sudo for the install part because I left the default prefix /usr/local, and your readme said to just run make install after cmake, so I first just followed that ;) ). So I was probably wrong, and maybe hipSYCL can also be compiled with gcc 7.3.0. I will try this again.

I also have some problems with the cpu platform: hipsycl_b67153a7c55c312b.cpp:(.text+0x1b3): undefined reference to cl::sycl::detail::check_error(hipError_t)'. Ok, it seems for this I need to configure cmake with -DWITH_CPU_BACKEND=ON. It would be great to have such info in the readme. With the Singularity container this worked fine when I forced syclcc to use clang 6. With the default gcc 5.4 there was a compilation error:

/usr/include/hipSYCL/hipCPU/hip/detail/runtime.hpp:271:25: error: enclosing class of constexpr non-static member function 'std::size_t hipcpu::device::get_max_shared_memory() const' is not a literal type
   constexpr std::size_t get_max_shared_memory() const
                         ^
/usr/include/hipSYCL/hipCPU/hip/detail/runtime.hpp:184:7: note: 'hipcpu::device' is not literal because:
 class device
       ^
/usr/include/hipSYCL/hipCPU/hip/detail/runtime.hpp:184:7: note:   'hipcpu::device' has a non-trivial destructor

misos1 commented on May 22, 2024

As a general note, though, when you're passing these things to CMake, you generally want to pass something like cmake -D CMAKE_CXX_COMPILER=clang++ -D CMAKE_C_COMPILER=clang -D CMAKE_CXX_STANDARD=14 .. instead of using environment variables and raw compile flags.

Btw, the Singularity container uses export CXX=clang++-6.0 and -DCMAKE_CXX_FLAGS=-std=c++14.

illuhad commented on May 22, 2024

Yes, it seems syclcc needs to include -lhsa-runtime64 for compiling a SYCL program on ROCm. I see hipSYCL uses some HSA functions in at least one header. Maybe it was not necessary to add -lhsa-runtime64 manually before, but in some version of ROCm this changed.

Possibly. We don't call HSA stuff ourselves, but HIP likely does.

C++14 is needed to compile hipSYCL itself and also a SYCL program with syclcc, so why not include it by default, at least when the platform is one of the ROCm options (or is there some problem when targeting CUDA with hipcc?), and for the hipSYCL build when it is needed? There should also be the possibility for the user to use C++17 and later for SYCL sources. (Why are there so many platforms like amd, rocm, hip and hcc, all for AMD hardware? Is there some difference?)

As @VileLasagna mentioned, we are already working on setting std=c++14 per default everywhere. I agree that C++17 should also be allowed. We do not target cuda with hipcc, but always use CUDA compilers directly. This is because we don't want to require NVIDIA users to have dependencies on ROCm bits.
The platforms amd, rocm, hip, hcc are all synonymous. Perhaps we should simplify that in the future to avoid confusion (but that may break existing build configurations of users).

If I understand correctly, CUDA hardware can be targeted by using the cuda platform, where code is compiled directly by clang, or with the hip platform, where it will be compiled with hipcc, which supports both CUDA and AMD backends.

No. Platform hip just means AMD. We do not use hipcc but use hcc directly [we had some issues with hipcc in earlier ROCm versions, so we decided to use hcc directly. Also, since hipcc is just another compiler wrapper like syclcc, I think that another layer of compiler wrappers fiddling around with arguments is likely not really desirable]. If you have used hipcc previously, this may also be a reason why you didn't have to link with -lhsa-runtime64 but had to do it with syclcc, since hipcc may link with that library automatically.

I also have some problems with the cpu platform: hipsycl_b67153a7c55c312b.cpp:(.text+0x1b3): undefined reference to cl::sycl::detail::check_error(hipError_t)'. Ok, it seems for this I need to configure cmake with -DWITH_CPU_BACKEND=ON. It would be great to have such info in the readme. With the Singularity container this worked fine when I forced syclcc to use clang 6. With the default gcc 5.4 there was a compilation error:

-DWITH_CPU_BACKEND=ON should be enabled by default. Perhaps you disabled it previously when trying to get ROCm to work? I've never tried the CPU backend with such an old gcc version. In general I think it's a good idea for us to expand the set of compilers automatically tested in CI, which at the moment focuses pretty much exclusively on clang. The motivation for this focus is that there is tighter integration planned between hipSYCL and clang in the near future. clang will then be a requirement for compilation with syclcc (this also includes hcc, which is a clang fork).

misos1 commented on May 22, 2024

I overlooked that HSA is actually used only in the HIP headers. Now I understand. You use only the HIP headers (so you do not have to write specific things for CUDA and hcc), and you compile this directly with clang (CUDA mode) or hcc.

I suppose hipcc is better now. You would not need to add C++14 or hsa-runtime64, or possibly account for other future changes in HIP, if you let hipcc do these things ;). I suppose hipcc is thin and just calls hcc directly, supplying the required arguments and flags. Also, hcc will unfortunately be deprecated in a few months:

https://github.com/RadeonOpenCompute/hcc#deprecation-notice
ROCm/hcc#1098 (comment)

I've never tried the CPU backend with such an old gcc version.

GCC 5.4 is the default C++ compiler in your Singularity container. For example, update-alternatives can be used to make a different compiler the default (but perhaps syclcc should use clang instead of c++ and check its version(s)).

The motivation for this focus is that there is tighter integration planned between hipSYCL and clang in the near future.

I would like to know more about this.

illuhad commented on May 22, 2024

I would like to know more about this.

See the discussion in #42 :) The short summary is that we want to replace the source-to-source toolchain with a clang plugin that does our required AST and IR transformations to make clang's CUDA frontend accept SYCL code. clang's HIP implementation also relies on the CUDA frontend, so with this plugin we will still be able to compile for both NVIDIA and AMD.
The current state is that I have a working version of the plugin (in an experimental state), but many changes still need to be done at the runtime/SYCL header level to be able to compile code with the plugin. The main difference is that everything that's callable from kernels must then be __host__ __device__ instead of __device__ as it is now.
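To illustrate, a minimal hypothetical sketch (not hipSYCL's actual headers, just an illustration) of what that change means for a helper function called from kernels:

#include <hip/hip_runtime.h>

// With the plugin, anything callable from kernels must be usable on host and device:
__host__ __device__ inline float twice(float x) { return 2.0f * x; }

// Previously (source-to-source toolchain) it would have been marked device-only:
// __device__ inline float twice(float x) { return 2.0f * x; }

__global__ void scale_kernel(float* out) { out[0] = twice(21.0f); }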

I try to focus my efforts at the moment mostly on getting this plugin ready instead of fixing the complexities of the source-to-source transformation. I expect that most of your issues will be solved once the plugin lands :)

illuhad commented on May 22, 2024

Probably not an issue anymore because of the completed transition to the clang plugin.

misos1 commented on May 22, 2024

I actually started writing another comment about my experiences with the new hipSYCL last weekend, but I did not have time to finish it. I hoped to find some time this weekend.

I had a few problems with dependencies like FileCheck and some "sample analyzer", which were all found once I installed the clang and llvm dev and tools packages, but more problems with lit-cpuid, which I could not find with apt-file but eventually found in the lldb package. I needed these to run the hipSYCL cmake, and if I am not wrong I did not need them when compiling HIP-clang. Then there were problems like error: no member named 'int8_t' in the global namespace.

I hope to elaborate this weekend. I plan to try again with the CPU-only backend and the ROCm backend.

Now I noticed there are new commits this week related to building, and this was after my effort last weekend, so I will see this weekend whether everything is ok now.

illuhad commented on May 22, 2024

That FileCheck and lit-cpuid problem is a problem with the llvm packaging and the cmake files coming with llvm. I don't believe there's anything we can do about that from the hipSYCL side until llvm fixes their cmake integration, but if you have ideas I would really like to hear them, since this is also annoying the hell out of me ;)

Were the int8_t errors on ROCm? If so, it is highly likely that they were fixed by PR #113.

VileLasagna commented on May 22, 2024

@misos1 I've had those issues as well a while back. As mentioned, they should be gone on current main. If you still run into them, it's likely they are caused by conflicting clang internal includes from different compilers.

In my case, I was building things with LLVM from ROCm (9), but hipSYCL had been built using the system LLVM (8), and as such I believe it was looking for system includes in the wrong place. The problem went away once I pointed it to the correct place (for me it's /opt/rocm/llvm/lib/clang/9.0.0/include).

misos1 commented on May 22, 2024

Ok, now I am able to run cmake and also make successfully. For cmake I needed the packages clang-9, libclang-9-dev (SampleAnalyzerPlugin.so), llvm-9-tools (FileCheck), lldb-9 (lit-cpuid). But it seems llvm is now broken even more, because when I installed lldb-9 it uninstalled llvm-9-tools. It seems these two cannot be installed at the same time, so I just installed llvm-9-tools, because it contained more of the required files, and copied lit-cpuid into the llvm-9 bin directory. For make I needed lld-9 and clang-tools-9 (clang-offload-bundler).

Maybe the readme could mention which clang and llvm packages are needed to successfully build hipSYCL, instead of "LLVM and clang >= 8 must be installed, including development files". Including lldb, until the llvm packaging is fixed.

Also, for me it was better to run make and then make install separately, because for make install I needed sudo and I did not want the files inside the build directory to be owned by root.

Interestingly I was able to compile your sample sycl source by using just hipcc -std=c++17 -lhipSYCL_rocm test.cpp.

And syclcc-clang worked: syclcc-clang --hipsycl-gpu-arch=gfx900 test.cpp.

But with syclcc-clang --hipsycl-platform=cpu test.cpp I got segmentation fault.

With clang-10 there was no clash between llvm-10-tools and lldb-10, but I got compilation errors:

In file included from /home/miso/sycl/src/hipsycl_clang_plugin/HipsyclClangPlugin.cpp:30:0:
/home/miso/sycl/src/hipsycl_clang_plugin/FrontendPlugin.hpp: In member function ‘virtual std::unique_ptr<clang::ASTConsumer> hipsycl::FrontendASTAction::CreateASTConsumer(clang::CompilerInstance&, llvm::StringRef)’:
/home/miso/sycl/src/hipsycl_clang_plugin/FrontendPlugin.hpp:47:18: error: ‘make_unique’ is not a member of ‘llvm’
     return llvm::make_unique<FrontendASTConsumer>(CI);
                  ^~~~~~~~~~~
/home/miso/sycl/src/hipsycl_clang_plugin/FrontendPlugin.hpp:47:18: note: suggested alternative: ‘make_range’
     return llvm::make_unique<FrontendASTConsumer>(CI);
                  ^~~~~~~~~~~
                  make_range
/home/miso/sycl/src/hipsycl_clang_plugin/FrontendPlugin.hpp:47:49: error: expected primary-expression before ‘>’ token
     return llvm::make_unique<FrontendASTConsumer>(CI);

Btw it would be nice to have SYCL support directly in regular clang.

illuhad commented on May 22, 2024

Ok, now I am able to run cmake and also make successfully. For cmake I needed the packages clang-9, libclang-9-dev (SampleAnalyzerPlugin.so), llvm-9-tools (FileCheck), lldb-9 (lit-cpuid). But it seems llvm is now broken even more, because when I installed lldb-9 it uninstalled llvm-9-tools. It seems these two cannot be installed at the same time, so I just installed llvm-9-tools, because it contained more of the required files, and copied lit-cpuid into the llvm-9 bin directory. For make I needed lld-9 and clang-tools-9 (clang-offload-bundler).
Maybe the readme could mention which clang and llvm packages are needed to successfully build hipSYCL, instead of "LLVM and clang >= 8 must be installed, including development files". Including lldb, until the llvm packaging is fixed.

Not sure if this is a good idea. This varies greatly from distribution to distribution, and if you use the nightly llvm packages from apt.llvm.org, it is a moving target and can change from one day to the next (e.g. these issues appeared suddenly a couple of weeks ago when they branched llvm 10). I feel like providing llvm installation instructions for all sorts of distributions is beyond the scope of the project, in particular because these llvm package issues don't only affect hipSYCL, they affect any software that uses cmake and llvm.
If you had trouble with that, it is probably better for you to file a bug report with upstream llvm so that they fix their stuff.
As we work towards the 0.8 release, my plan is rather to provide installer scripts that automatically fetch and install supported versions of ROCm, CUDA and llvm inside a given directory.

In general, please note that hipSYCL is mostly a research project. While I care deeply about the project and do what I can to help users, there's no company backing this. So I do not have resources to provide support on the level a company might be able to.

Interestingly I was able to compile your sample sycl source by using just hipcc -std=c++17 -lhipSYCL_rocm test.cpp.

Interesting. This is absolutely unsupported and I have no idea what happened there. It is impossible that this compiles correctly for GPU. For CPU perhaps it could work.

But with syclcc-clang --hipsycl-platform=cpu test.cpp I got segmentation fault.

Please be more specific. Does the compiler segfault or the resulting binary? There were issues with earlier hipSYCL versions that clang 9 would crash with certain OpenMP constructs that we use. This was fixed in clang 10, and newer hipSYCL versions circumvent the issue.

With clang-10 there was no clash between llvm-10-tools and lldb-10, but I got compilation errors:

This was introduced by a change in the llvm 10 packages on Thursday or Friday, where they removed those functions from upstream llvm (as I said, it's a moving target). I will have to adapt the hipSYCL source accordingly in the next few days.

Btw it would be nice to have SYCL support directly in regular clang.

Sure, this is what Intel is working on, at least for Intel CPUs/GPUs. I do not have the resources to maintain a clang fork and try to upstream hipSYCL stuff (this is a big effort), which is why we are using clang's plugin mechanism.

illuhad commented on May 22, 2024

llvm 10 build errors regarding make_unique() should be fixed by PR #122.

misos1 commented on May 22, 2024

Not sure if this is a good idea. This varies greatly from distribution to distribution, and if you use the nightly llvm packages from apt.llvm.org, it is a moving target and can change from one day to the next (e.g. these issues appeared suddenly a couple of weeks ago when they branched llvm 10). I feel like providing llvm installation instructions for all sorts of distributions is beyond the scope of the project, in particular because these llvm package issues don't only affect hipSYCL, they affect any software that uses cmake and llvm.

Ok, so at least this info for Ubuntu/Debian is now here in these comments. I used apt.llvm.org because it was the only place where I found clang with a high enough version - ...stable, qualification and development branches (currently 8, 9 and 10)... - before, stable was 7. Now I can see the package clang-8 in the Ubuntu universe repository.

In general, please note that hipSYCL is mostly a research project. While I care deeply about the project and do what I can to help users, there's no company backing this. So I do not have resources to provide support on the level a company might be able to.

It was just a suggestion.

Interestingly I was able to compile your sample sycl source by using just hipcc -std=c++17 -lhipSYCL_rocm test.cpp.

Interesting. This is absolutely unsupported and I have no idea what happened there. It is impossible that this compiles correctly for GPU. For CPU perhaps it could work.

It is really working with your example from the readme, and for GPU. But I suppose it is just an accident, maybe because your kernel is a lambda function. If I am not wrong, hcc or HIP can sometimes annotate functions automatically:

https://rocm-documentation.readthedocs.io/en/latest/ROCm_API_References/HCC-API.html#annotation-of-device-functions

"The [[hc]] annotation for the kernel function called by parallel_for_each is optional as it is automatically annotated as a device function by the hcc compiler. The compiler also supports partial automatic [[hc]] annotation for functions that are called by other device functions within the same source file:"

But with syclcc-clang --hipsycl-platform=cpu test.cpp I got segmentation fault.

Please be more specific. Does the compiler segfault or the resulting binary? There were issues with earlier hipSYCL versions that clang 9 would crash with certain OpenMP constructs that we use. This was fixed in clang 10, and newer hipSYCL versions circumvent the issue.

The resulting binary (also the computed results are wrong; they should be 0 4 0 8 0):

0
0
0
0
0
Process 112722 stopped
* thread #1, name = 'a.out', stop reason = signal SIGSEGV: invalid address (fault address: 0x71)
    frame #0: 0x00007ffff705cfa0 libpthread.so.0`__GI___pthread_mutex_lock(mutex=0x0000000000000061) at pthread_mutex_lock.c:65

Seems building with clang-10 is fixed.

Btw it would be nice to have SYCL support directly in regular clang.

Sure, this is what Intel is working on, at least for Intel CPUs/GPUs. I do not have the resources to maintain a clang fork and try to upstream hipSYCL stuff (this is a big effort), which is why we are using clang's plugin mechanism.

This Intel thing would be interesting if it also supported AMD GPUs. I think it would be great if ROCm added SYCL.

illuhad commented on May 22, 2024

Ok, so at least this info for Ubuntu/Debian is now here in these comments. I used apt.llvm.org because it was the only place where I found clang with a high enough version - ...stable, qualification and development branches (currently 8, 9 and 10)... - before, stable was 7. Now I can see the package clang-8 in the Ubuntu universe repository.

Yes, I understand the situation.

It was just a suggestion.

sure, thanks for the suggestion :) To be clear, it's not that I don't want to make it easier for people to use hipSYCL. In fact, I am currently investing a huge amount of time in that topic, but with a slightly different strategy: I'm trying to create installer scripts that download and compile/install supported versions of CUDA, the entire ROCm stack and clang/llvm (that can then reliably target both NVIDIA and AMD), and then compiles & installs hipSYCL against these stacks.
Since these scripts take forever to run on most machines due to lots of compilation happening, I intend to release binary packages of the compiled stack for the most common distributions. This already works very well on the NVIDIA side of things, but I'm having a lot of trouble on the ROCm side. Compiling HIP with clang seems to be pretty much beta quality, and I haven't yet found a single configuration that was completely free of bugs such that HIP (and therefore hipSYCL) works 100%. Unfortunately, it's also not very well documented by AMD which makes things even more difficult.

Most bug reports I see for hipSYCL on ROCm boil down to some issue in ROCm or clang (as seems to be the case for your segfault issue), where I cannot really help from the hipSYCL side.

"The [[hc]] annotation for the kernel function called by parallel_for_each is optional as it is automatically annotated as a device function by the hcc compiler. The compiler also supports partial automatic [[hc]] annotation for functions that are called by other device functions within the same source file:"

I didn't know hcc could do that, thank you! In that case, we might perhaps be able to use hcc as a fallback until hip-clang is more mature. It won't work for all SYCL features (since some, like allocation of local memory in hierarchical parallel for, definitely need dedicated compiler support), but maybe for a subset that is sufficient for most users.
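For context, a rough sketch (kernel name and sizes made up) of the kind of hierarchical parallel for that needs dedicated compiler support - variables declared at work-group scope are supposed to end up in local memory:

#include <CL/sycl.hpp>

void run(cl::sycl::queue& q) {
  q.submit([&](cl::sycl::handler& cgh) {
    cgh.parallel_for_work_group<class hier_example>(
        cl::sycl::range<1>(8), cl::sycl::range<1>(64),
        [=](cl::sycl::group<1> grp) {
          int scratch[64]; // work-group scope: should live in local memory
          grp.parallel_for_work_item([&](cl::sycl::h_item<1> item) {
            scratch[item.get_local().get_id(0)] =
                static_cast<int>(item.get_global().get_id(0));
          });
        });
  });
}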

This Intel thing would be interesting if it also supported AMD GPUs. I think it would be great if ROCm added SYCL.

Intel SYCL supports anything that supports SPIR-V. So, AMD's OpenCL implementation would only need to support ingestion of SPIR-V kernels (which is in the spec of newer OpenCL versions anyway), and then it should work. Of course, personally, I would prefer if AMD supported hipSYCL ;)

misos1 commented on May 22, 2024

I'm trying to create installer scripts that download and compile/install supported versions of CUDA, the entire ROCm stack

Maybe it would be good to also leave the option to use an already installed ROCm stack.

Most bug reports I see for hipSYCL on ROCm boil down to some issue in ROCm or clang (as seems to be the case for your segfault issue), where I cannot really help from the hipSYCL side.

It was the same with clang-9, clang-10 and gcc 7.4, which is really strange.

I didn't know hcc could do that, thank you! In that case, we might perhaps be able to use hcc as a fallback until hip-clang is more mature.

Actually, this hcc automatic annotation is not so perfect. I was myself surprised that it worked, because I always annotate hcc kernels: as soon as you try to do something more in such an unannotated kernel, like calling a function which is already marked as device, you can get errors like no overloaded function has restriction specifiers that are compatible with the ambient context. Maybe because that function was not in the same source file, as they wrote. I do not think that hcc is as good at automatic annotations as hipSYCL.

It won't work for all SYCL features (since some, like allocation of local memory in hierarchical parallel for, definitely need dedicated compiler support), but maybe for a subset that is sufficient for most users.

You mean this is possible when using hip (or hip-clang) but not when using hcc?

Regarding the OpenCL path and SPIR-V, I am afraid that I would lose the possibility to use AMD-specific things like __amdgcn_move_dpp/__llvm_amdgcn_move_dpp or maybe some HSA interoperability, which, if I am not wrong, should still be possible in hipSYCL.

illuhad commented on May 22, 2024

Maybe it would be good to also leave the option to use an already installed ROCm stack.

Of course this is always possible by manually compiling hipSYCL against your ROCm installation, but if we are to provide binary packages we need to have some stack to compile them against. Additionally, I would like to have a stack where I can validate functionality and then tell people that if they use it, everything should work.
With pre-compiled ROCm packages, things may or may not work. For example, if ROCm LLVM is compiled with enabled debug assertions, it won't work with hipSYCL because we do things that the HIP toolchain inside clang does not expect and therefore trigger assert() statements in clang that are arguably too strict. Most clang installations from distributions ship without assertions, but AMD's aomp compiler for example does not.
For packaging, I would also like to be able to only provide a single hipSYCL package to users that will work both for nvidia and amd. Again, with precompiled ROCm binaries this may not work because they may have the nvptx backend disabled.

You mean this is possible when using hip (or hip-clang) but not when using hcc?

In hipSYCL, we use a clang plugin to do some additional AST passes. This clang plugin makes the clang CUDA/HIP frontend accept and correctly compile SYCL code and take care of these issues. Since the clang plugin is written for the clang CUDA/HIP toolchain, it would be of no use if you loaded it into hcc since hcc does not use clang's CUDA frontend. So, in order to have SYCL with compiler support you have to use hip-clang (which syclcc-clang will call for you). Otherwise you are effectively only using hipSYCL as a runtime library (which is possible - we call this "manual mode" where you have to manually take care of annotation and correct semantics in hierarchical parallel for etc).

It was the same with clang-9, clang-10 and gcc 7.4, which is really strange.

I understood that your problem was fixed by moving to clang 10? Is that not the case?

Actually, this hcc automatic annotation is not so perfect. I was myself surprised that it worked, because I always annotate hcc kernels: as soon as you try to do something more in such an unannotated kernel, like calling a function which is already marked as device, you can get errors like no overloaded function has restriction specifiers that are compatible with the ambient context. Maybe because that function was not in the same source file, as they wrote. I do not think that hcc is as good at automatic annotations as hipSYCL.

Okay. Thanks for the information.

Regarding the OpenCL path and SPIR-V, I am afraid that I would lose the possibility to use AMD-specific things like __amdgcn_move_dpp/__llvm_amdgcn_move_dpp or maybe some HSA interoperability, which, if I am not wrong, should still be possible in hipSYCL.

Yes, combining open standards with such vendor specific optimizations is exactly the rationale behind hipSYCL. Very happy to see that there are indeed users who have precisely this use case and therefore really need hipSYCL :)

misos1 commented on May 22, 2024

Additionally, I would like to have a stack where I can validate functionality and then tell people that if they use it, everything should work.

That sounds great.

Most clang installations from distributions ship without assertions, but AMD's aomp compiler for example does not.

I do not understand what this aomp is for. There is HIP support in official clang, and they have a clang fork for HIP-clang. So I suppose the aomp clang should be better at something. They are abandoning hcc, and aomp was formerly hcc2. Maybe OpenMP on GPU?

I wonder why the CUDA guys created it in a way that requires annotations of device functions. It could have been improved a long time ago, and then HIP would not require them either and hipSYCL would be much simpler.

Also I noticed that your cmake installs syclcc without the transform tools when COMPILE_SOURCE_TRANSFORMATION_TOOLS is false. Without them, syclcc does not work. I still think that syclcc-clang is much better.

It was the same with clang-9, clang-10 and gcc 7.4, which is really strange.

I understood that your problem was fixed by moving to clang 10? Is that not the case?

No, that problem occurs with all the compilers I tried. I tested this HIP example, which works (correct computation and no segfault) when compiled with:
c++ -I/usr/local/include/hipSYCL/hipCPU -fopenmp test_hip.cpp

But this SYCL example gives the mentioned problem when compiled with:
syclcc-clang --hipsycl-platform=cpu test_sycl.cpp

illuhad commented on May 22, 2024

I do not understand what this aomp is for. There is HIP support in official clang, and they have a clang fork for HIP-clang. So I suppose the aomp clang should be better at something. They are abandoning hcc, and aomp was formerly hcc2. Maybe OpenMP on GPU?

aomp is a distribution of hip-clang with added support for targeted OpenMP (i.e. OpenMP on GPU, as you say). Additionally, it contains other bits to make things work, e.g. their hostcall library to make printf work properly in device code, which does not necessarily work with regular clang. The interesting thing about it compared to plain hip-clang is that AMD seems to be actively supporting it and even releases binary packages. aomp formerly being hcc2 does not mean that it is going to be abandoned like hcc; it rather hints at it being intended as the clang distribution to actually succeed hcc.
HIP support in mainline clang is often buggy and you need to have exactly the right version of the ROCm device library, while aomp brings everything it needs with it. This is also why I would prefer to build the hipSYCL stack on top of aomp, since it simplifies maintenance.

I wonder why the CUDA guys created it in a way that requires annotations of device functions. It could have been improved a long time ago, and then HIP would not require them either and hipSYCL would be much simpler.

Well, CUDA is a 10-year-old programming model and it shows. The reason why they needed those annotations in the first place is probably related to the fact that they are doing split compilation, i.e. nvcc physically splits all device and host code into two separate source files, compiles one with the regular host compiler and compiles the other one itself. This can get really messy, and I suppose explicitly annotating what should be where can simplify this. In modern clang, where everything is compiled by the same compiler, this is much simpler - but I suppose clang wasn't really a thing yet when they started designing CUDA.

I doubt that would have made hipSYCL much simpler - we need a compiler portion anyway since there are other things in SYCL (e.g. hierarchical parallel for) where the compiler needs to know what is happening.

Also I noticed that your cmake installs syclcc without the transform tools when COMPILE_SOURCE_TRANSFORMATION_TOOLS is false. Without them, syclcc does not work. I still think that syclcc-clang is much better.

The old toolchain is deprecated, don't use it. For the pending 0.8 release, I will likely make syclcc a symlink to syclcc-clang.

But this SYCL example gives the mentioned problem when compiled with:
syclcc-clang --hipsycl-platform=cpu test_sycl.cpp

Thank you, I can reproduce now. The crash is because there is no call to queue::wait() at the end of the program. In original SYCL 1.2, queue did an implicit wait in the destructor. This was removed for SYCL 1.2.1. We still kind of do this wait for compatibility reasons; however, sometimes this leads to a crash if there is no explicit wait() at the end of the program, since the SYCL runtime may then start to do the work for the wait() after the main function (when the queue is destroyed), at which point some things may already be destroyed.
In general, it's good practice to have a wait() call before the queue is destroyed. We should probably update the example and perhaps remove the implicit wait on queue destruction.
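In other words, a pattern like this hypothetical snippet avoids the problem by waiting explicitly before the queue goes out of scope:

#include <CL/sycl.hpp>

int main() {
  cl::sycl::queue q;
  q.submit([&](cl::sycl::handler& cgh) {
    cgh.single_task<class do_nothing>([]() { /* kernel work */ });
  });
  q.wait(); // make sure all work has finished before ~queue() runs
}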

The issue with the wrong results seems to be because the input data is apparently not copied correctly to the buffer if the input data is const - I will investigate.
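For reference, the pattern in question is roughly this (hypothetical data), i.e. constructing a buffer from a pointer to const host data:

#include <CL/sycl.hpp>
#include <vector>

int main() {
  const std::vector<float> input = {1.f, 2.f, 3.f, 4.f, 5.f};
  // SYCL has a buffer constructor taking const T*; the host data should be
  // copied into the buffer when it is constructed.
  cl::sycl::buffer<float, 1> buf(input.data(), cl::sycl::range<1>(input.size()));
}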

illuhad commented on May 22, 2024

Incorrect results issue should be fixed by PR #131.

illuhad commented on May 22, 2024

Segfault issue should be fixed by issue #132.

