uob-hpc / babelstream
STREAM, for lots of devices written in many programming models
License: Other
It was mentioned in #21 that GPU-STREAM is switching to a Makefile-based build (I am not sure of the reasons for this, as pointed out there). I wanted to create a PR contributing an HCC-compiled memory benchmark based on ROCm. How should I go about it? Use a Makefile or integrate it into the CMake setup of this repo?
CUDA and RAJA interfere in CMakeLists.txt
This is because map(tofrom: sum) should be before the reduction clause.
This is due to the order of clause evaluation in the OpenMP 4.5 specification.
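A minimal sketch of the clause ordering described above, assuming a dot-product kernel similar to the one in the OpenMP 4.5 implementation. The helper name dot_map_first and its signature are illustrative and not taken from the repository. When compiled without OpenMP support the pragma is ignored and the loop runs serially on the host.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper, not from the repository: dot product with the
// map(tofrom: sum) clause written before the reduction clause.
double dot_map_first(const std::vector<double>& x, const std::vector<double>& y)
{
  assert(x.size() == y.size());
  const double* a = x.data();
  const double* b = y.data();
  const int n = static_cast<int>(x.size());
  double sum = 0.0;

  // map(tofrom: sum) appears before reduction(+:sum), as described above.
  #pragma omp target teams distribute parallel for \
      map(to: a[0:n], b[0:n]) map(tofrom: sum) reduction(+:sum)
  for (int i = 0; i < n; i++)
    sum += a[i] * b[i];

  return sum;
}
```

Calling dot_map_first on two equally sized vectors returns their dot product; whether a given compiler also accepts the clauses in the opposite order is not something this sketch establishes.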
To target NVIDIA with OpenMP with the clang-based CCE, we need to do something like this:
make COMPILER=CRAY TARGET=NVIDIA EXTRA_FLAGS="-fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target -march=sm_60"
Add option for OpenCL 2.0 shared virtual memory buffers. This will allow measurement of the overhead of the implementations over the device interconnect.
Hi, I just wanted to start adding a bare HC stream and started off with the HIPified version in cc90cef and got this:
$ make -f HIP.make
hipcc -std=c++11 -DHIP main.cpp HIPStream.cu -o hip-stream
HIPStream.cu:185:10: error: cannot combine with previous 'extern' declaration specifier
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
^
/opt/rocm/include/hip/hcc_detail/host_defines.h:49:24: note: expanded from macro '__shared__'
#define __shared__ tile_static
^
/opt/rocm/hcc/include/hc_defines.h:52:21: note: expanded from macro 'tile_static'
#define tile_static static __attribute__((section("clamp_opencl_local")))
^
HIPStream.cu:185:31: error: expected parameter declarator
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
^
HIPStream.cu:185:31: error: expected ')'
HIPStream.cu:185:30: note: to match this '('
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
^
HIPStream.cu:185:21: error: C++ requires a type specifier for all declarations
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
~~~~~~~~~~~~~~~~~ ^
HIPStream.cu:185:21: error: tile_static can only be applied to a variable declaration
HIPStream.cu:185:41: error: expected ';' at end of declaration
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
^
;
HIPStream.cu:185:56: error: definition of variable with array type needs an explicit size or an initializer
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
^
HIPStream.cu:188:11: error: use of undeclared identifier 'blockDim'
int i = blockDim.x * blockIdx.x + threadIdx.x;
^
HIPStream.cu:188:24: error: use of undeclared identifier 'blockIdx'
int i = blockDim.x * blockIdx.x + threadIdx.x;
^
HIPStream.cu:188:37: error: use of undeclared identifier 'threadIdx'
int i = blockDim.x * blockIdx.x + threadIdx.x;
^
HIPStream.cu:189:26: error: use of undeclared identifier 'threadIdx'
const size_t local_i = threadIdx.x;
^
HIPStream.cu:192:31: error: use of undeclared identifier 'blockDim'
for (; i < array_size; i += blockDim.x*gridDim.x)
^
HIPStream.cu:192:42: error: use of undeclared identifier 'gridDim'
for (; i < array_size; i += blockDim.x*gridDim.x)
^
HIPStream.cu:195:21: error: use of undeclared identifier 'blockDim'
for (int offset = blockDim.x / 2; offset > 0; offset /= 2)
^
HIPStream.cu:205:9: error: use of undeclared identifier 'blockIdx'
sum[blockIdx.x] = tb_sum[local_i];
^
15 errors generated.
Died at /opt/rocm//bin/hipcc line 378.
make: *** [hip-stream] Error 1
which comes from this line in HIPStream.cu
(btw, I think this file needs a suffix rename as it technically doesn't contain any CUDA code):
extern __shared__ __align__(sizeof(T)) unsigned char smem[];
T *tb_sum = reinterpret_cast<T*>(smem);
Hi,
using that configuration I get:
In file included from ocl-stream.cpp:38:
In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/iostream:38:
In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/ios:216:
In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/__locale:15:
In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/string:439:
In file included from /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/algorithm:626:
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/utility:253:9: error:
field has incomplete type 'cl::Device'
_T1 first;
I will update Xcode to see if it fixes the issue.
I downloaded the code and configured it with cmake .. which results in this error:
$ /usr/bin/nvcc /home/jrhammon/Work/GPU/GPU-STREAM/CUDAStream.cu -c
-o /home/jrhammon/Work/GPU/GPU-STREAM/build/CMakeFiles/gpu-stream-cuda.dir//./gpu-stream-cuda_generated_CUDAStream.cu.o
-ccbin /usr/bin/cc -m64 -Xcompiler ,\"-O3\",\"-DNDEBUG\" --std=c++11 -DNVCC
-I/usr/include -I$HOME/Work/GPU/GPU-STREAM/build -I/usr/include
/usr/include/string.h: In function ‘void* __mempcpy_inline(void*, const void*, size_t)’:
/usr/include/string.h:652:42: error: ‘memcpy’ was not declared in this scope
return (char *) memcpy (__dest, __src, __n) + __n;
cc points to a reasonable toolchain (see below), so I do not think it is to blame for this error.
$ cc -v
Using built-in specs.
COLLECT_GCC=cc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/5/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 5.4.0-6ubuntu1~16.04.4'
--with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++
--prefix=/usr --program-suffix=-5
--enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext
--enable-threads=posix --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new
--enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin
--with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.4)
Examples for updates are:
Can there be one OpenMP version?
The --arch flag is defined in two places in the Kokkos build system, so single-command compile-and-link invocations, as used in BabelStream, now fail.
The solution seems to be to require separate build and link steps in the Makefile.
Ref: kokkos/kokkos#1394
The Kokkos View mechanism does not require that a layout (seen as a DEVICE in this code) is passed to the View type. That way the build of the Kokkos library determines the best layout for us.
I'm not a CMake guru but this project should test for OpenMP 4.5 compiler support generically rather than just associate this implementation with the Cray toolchain.
http://www.openmp.org/resources/openmp-compilers/ has the full list, but at least GCC 6.1+ and Intel 17+ support OpenMP 4.5, and I just confirmed that GCC 6.2.0 can correctly run gpu-stream-omp45.
$ git diff
diff --git a/CMakeLists.txt b/CMakeLists.txt
index efee733..574fcc3 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -118,12 +118,12 @@ endif ()
#-------------------------------------------------------------------------------
# OpenMP 4.5
#-------------------------------------------------------------------------------
-if ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Cray")
- if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 4.5)
+#if ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Cray")
+# if (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 4.5)
add_executable(gpu-stream-omp45 main.cpp OMP45Stream.cpp)
target_compile_definitions(gpu-stream-omp45 PUBLIC OMP45)
- endif ()
-endif ()
+# endif ()
+#endif ()
The Intel OneAPI HPC Toolkit Beta links to an example for OpenMP target. We can see in the CMakeLists.txt there that the icpc flags that need to be added to this Makefile might be similar to:
-qnextgen -fiopenmp -std=c++11 -fopenmp-targets=spir64 -O3
Binaries don't tend to build with the correct flags. This needs fixing. In the meantime, things can be compiled manually, but the following should be run first:
echo '#define VERSION_STRING "2.0"' > common.h
I am posting this because others may encounter it and the solution should be logged for posterity.
/opt/gcc/5.4.0/include/c++/5.4.0/type_traits(311): error: identifier "__float128" is undefined
GCC 5.4.0 include/c++/5.4.0/type_traits needs to be modified to add && !defined(__CUDACC__) in the following, which is at line 311 in my installation.
#if !defined(__STRICT_ANSI__) && defined(_GLIBCXX_USE_FLOAT128) && !defined(__CUDACC__)
template<>
struct __is_floating_point_helper<__float128>
: public true_type { };
#endif
Alternatively, compile with strict ISO/ANSI flags (e.g. -std=c++11 instead of the default -std=gnu++11) so that the GCC quadmath extensions are disabled.
nvcc fatal : redefinition of argument 'std'
/usr/bin/nvcc -M -D__CUDACC__ ~/Work/GPU/GPU-STREAM/CUDAStream.cu -o ~/Work/GPU/GPU-STREAM/build/CMakeFiles/gpu-stream-cuda.dir//gpu-stream-cuda_generated_CUDAStream.cu.o.NVCC-depend -ccbin /opt/gcc/5.4.0/bin/gcc-5.4 -m64 --std c++11 -Xcompiler ,\"-g\",\"-O3\",\"-fopenmp\",\"-fopenacc\",\"-Wall\",\"-Wextra\",\"-O3\",\"-DNDEBUG\" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 --std=c++11 -DNVCC -I/usr/include -I~/Work/GPU/GPU-STREAM/build -I/usr/include
nvcc fatal : redefinition of argument 'std'
Both Caffe and FFTW have identified the same issue with nvcc.
More detail can be found on StackOverflow.
Add options to do something like:
instead of just a number of elements
Is it required to have the map clause on all target regions?
The pointers are defined using the unstructured data directives.
It works OK without for the Clang compiler.
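To make the question concrete, here is a minimal sketch of the pattern being asked about: the array is mapped once with the unstructured target enter data directive, and the following target region carries no map clause for it. The function scale_in_place is hypothetical and not from the repository. The report above only confirms that Clang accepts this; the sketch does not establish that every compiler does. Compiled without OpenMP support, the pragmas are ignored and the loop runs on the host.

```cpp
#include <cassert>

// Hypothetical sketch, not from the repository: scale an array in place.
void scale_in_place(double* a, int n, double scalar)
{
  assert(n >= 0);

  // Unstructured mapping: the array is copied to the device here ...
  #pragma omp target enter data map(to: a[0:n])

  // ... so this target region has no map clause for a.
  #pragma omp target teams distribute parallel for
  for (int i = 0; i < n; i++)
    a[i] *= scalar;

  // Copy the result back and release the device copy.
  #pragma omp target exit data map(from: a[0:n])
}
```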
The Makefiles should define CXXFLAGS etc. with ?= instead of = so that it doesn't matter whether you set the variable on the left-hand or right-hand side of the make invocation, i.e.:
VAR=val make
make VAR=val
GPU-STREAM is pretty ill-named now that we have v3.0 running across CPUs, GPUs, Xeon Phi, etc. The different programming models also add a different dimension to the supported hardware. As such, we should probably rename the project to something more accurate. Potential names are:
Thoughts/suggestions?
For processors with strong NUMA effects, we need to make sure run() doesn't allocate too much memory before each model allocates its own memory. The memory in run() is just an abstraction for checking the results, so it shouldn't impact the performance of the computation.
These would be self-contained ports, as I'm not sure it's worth attempting to plug these languages into the C++ main routine here.
Hi,
just checked out caf367f and did:
$ cd repo
$ mkdir build && cd build
$ cmake ..
#...
- Configuring done
CMake Error at CMakeLists.txt:111 (add_executable):
Cannot find source file:
OMP3Stream.cpp
Tried extensions .c .C .c++ .cc .cpp .cxx .m .M .mm .h .hh .h++ .hm .hpp
.hxx .in .txx
CMake Error: CMake can not determine linker language for target: gpu-stream-omp3
CMake Error: Cannot determine link language for target "gpu-stream-omp3".
-- Generating done
-- Build files have been written to: /projects/hpcsupport/steinbac/development/gpu-stream/build
I see OMPStream.cpp in the source directory, but OMP3Stream.cpp as well as OMP45Stream.cpp are missing.
I guess it's not a big issue since the output is a table with only one row, but still. Tested with the CUDA-version only.
$ cuda-stream --csv
Using CUDA device Tesla P100-SXM2-16GB
Driver: 9020
function,num_times,n_elements,sizeof,max_mbytes_per_sec,min_runtime,max_runtime,avg_runtime
Copy,100,..,
Mul,100,...,
Add,100,...,
Triad,100,...,
Dot,100,...,
$ cuda-stream --csv --triad-only
Running triad 100 times
Number of elements: 33554432
Precision: double
Array size: 268435.5 KB (=268.4 MB)
Total size: 805306.4 KB (=805.3 MB)
Using CUDA device Tesla P100-SXM2-16GB
Driver: 9020
--------------------------------
Runtime (seconds): ...
Bandwidth (GB/s): ...
-foffload=amdgcn-amdhsa="-march=gfx906"
-foffload=nvptx-none="-march=sm_75"
It seems like parallel loop is becoming best practice for OpenACC.
On systems with GLIBC < 2.17, C11 is not available, and so aligned_alloc is not defined. It can be defined using the POSIX equivalent by adding the following to the code:
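#include <stdlib.h>

/* Fallback for C libraries that lack C11 aligned_alloc. */
void* aligned_alloc(size_t alignment, size_t size)
{
  void* mem = NULL;
  /* posix_memalign returns non-zero on failure; report that as NULL. */
  if (posix_memalign(&mem, alignment, size) != 0)
    return NULL;
  return mem;
}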
We probably just need to increase the tolerance. The error will also be proportional to the size of the arrays (unlike with the other kernels), so we need to make sure whatever error checking tolerance we use is robust enough to avoid these sorts of false positives for any sort of input.
Validation failed on sum. Error 0.000209808
Sum was 39.7910385131836 but should be 39.7912483215332
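One way to make the check less sensitive to the magnitude of the sum is to compare a relative error against a multiple of machine epsilon. The sketch below is an assumption about how that could look, not the project's actual validation code. The function name dot_within_tolerance and the default factor of 100 are illustrative. Because the error also grows with the array length, the factor may need to depend on the number of elements.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical check, not the project's code: compare a computed dot
// product with its reference value using a relative error.
template <typename T>
bool dot_within_tolerance(T sum, T gold, T factor = T(100))
{
  assert(gold != T(0));
  // Assumed tolerance: a multiple of machine epsilon for T.
  const T tolerance = std::numeric_limits<T>::epsilon() * factor;
  const T rel_err = std::fabs((sum - gold) / gold);
  return rel_err <= tolerance;
}
```

With the single-precision values reported above, the relative error is about 5.3e-6, which is below 100 times the float machine epsilon (about 1.2e-5), so this check would accept the result.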
This was caught by #91 after trying to enable ComputeCpp (still not possible yet due to authentication requirements).
I've added better filtering in #91 as well so all the compiler warnings show up in the CI log.
> /opt/ComputeCpp-CE-2.3.0-x86_64-linux-gnu/bin/compute++ -sycl -O2 -mllvm -inline-threshold=1000 -intelspirmetadata -sycl-target spir64 -std=c++1z -I"/opt/computecpp_archive/ComputeCpp-CE-2.3.0-x86_64-linux-gnu/include" -I"/home/tom/babelstream-upstream/CL" -DSYCL -DCL_TARGET_OPENCL_VERSION=220 -D_GLIBCXX_USE_CXX11_ABI=0 SYCLStream.cpp
SYCLStream.cpp:94:44: error: non-constant-expression cannot be narrowed from type 'int' to 'size_t' (aka 'unsigned long') in initializer list [-Wc++11-narrowing]
cgh.parallel_for<copy_kernel>(range<1>{array_size}, [=](id<1> idx)
^~~~~~~~~~
SYCLStream.cpp:313:16: note: in instantiation of member function 'SYCLStream<float>::copy' requested here
template class SYCLStream<float>;
^
SYCLStream.cpp:94:44: note: insert an explicit cast to silence this issue
cgh.parallel_for<copy_kernel>(range<1>{array_size}, [=](id<1> idx)
^~~~~~~~~~
... reports the same thing for all range<1>{array_size} calls ...
This is without any extra warning flags. We also got the same thing, but as warnings, in hipSYCL:
/opt/hipsycl/cff515c/lib/cmake/hipSYCL/syclcc-launcher --launcher-cxx-compiler=/usr/lib64/ccache/c++ --launcher-syclcc=/opt/hipsycl/cff515c/bin/syclcc-clang --hipsycl-platform=omp /usr/lib64/ccache/c++ -DNDEBUG CMakeFiles/babelstream.dir/SYCLStream.cpp.o CMakeFiles/babelstream.dir/main.cpp.o -o babelstream -Wl,-rpath,/opt/hipsycl/cff515c/lib /opt/hipsycl/cff515c/lib/libhipSYCL-rt.so
SYCLStream.cpp:94:44: warning: narrowing conversion of ‘(int)((SYCLStream<float>*)this)->SYCLStream<float>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:126:43: warning: narrowing conversion of ‘(int)((SYCLStream<float>*)this)->SYCLStream<float>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:110:43: warning: narrowing conversion of ‘(int)((SYCLStream<float>*)this)->SYCLStream<float>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:143:45: warning: narrowing conversion of ‘(int)((SYCLStream<float>*)this)->SYCLStream<float>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:160:47: warning: narrowing conversion of ‘(int)((SYCLStream<float>*)this)->SYCLStream<float>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:221:44: warning: narrowing conversion of ‘(int)((SYCLStream<float>*)this)->SYCLStream<float>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:94:44: warning: narrowing conversion of ‘(int)((SYCLStream<double>*)this)->SYCLStream<double>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:126:43: warning: narrowing conversion of ‘(int)((SYCLStream<double>*)this)->SYCLStream<double>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:110:43: warning: narrowing conversion of ‘(int)((SYCLStream<double>*)this)->SYCLStream<double>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:143:45: warning: narrowing conversion of ‘(int)((SYCLStream<double>*)this)->SYCLStream<double>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:160:47: warning: narrowing conversion of ‘(int)((SYCLStream<double>*)this)->SYCLStream<double>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
SYCLStream.cpp:221:44: warning: narrowing conversion of ‘(int)((SYCLStream<double>*)this)->SYCLStream<double>::array_size’ from ‘int’ to ‘size_t’ {aka ‘long unsigned int’} [-Wnarrowing]
DPC++ compiled fine and reported nothing (!)
We've got several options here:
1. -Wno-narrowing, ComputeCpp compiles after this but it is probably not a good thing
2. static_cast to the correct type at the warning/error site, or don't use initialiser lists
3. size_t for array_size
The only places array_size is used for SYCL are:
- new buffer<T>(array_size);
- range<1>{array_size} for all the parallel_for calls
- for (int i = 0; i < array_size; i++) in SYCLStream<T>::read_arrays(...)
- size_t N = array_size; ... for (; i < N; i += global_size) in SYCLStream<T>::dot()
Not part of this issue, but the N in the dot kernel might need to be int as per 9a69d3d.
Other than that, we aren't using it in any benchmark kernels directly; I vote option 3.
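The difference between casting at each use site and storing the size as size_t can be sketched in plain C++. Everything here is hypothetical: range1 is a stand-in for a SYCL range<1> so that the example builds without a SYCL compiler, and the two struct names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for a SYCL range<1>; hypothetical, not the SYCL type.
struct range1 { std::size_t n; };

// Cast at the use site: keep an int member and convert explicitly at
// every brace-initialisation, which avoids the narrowing diagnostic.
struct StreamWithInt
{
  int array_size;
  range1 launch_range() const
  {
    assert(array_size >= 0);
    return range1{static_cast<std::size_t>(array_size)};
  }
};

// Store the size as size_t: no conversion is needed at any use site.
struct StreamWithSizeT
{
  std::size_t array_size;
  range1 launch_range() const { return range1{array_size}; }
};
```

Writing range1{array_size} with an int member is the form that triggers the narrowing diagnostic quoted above; the size_t variant removes the conversion entirely rather than hiding it.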
Istvan Reguly Today at 1:08 PM
And here it is with @shinji Sumimoto’s flags:
FCC main.cpp OMPStream.cpp -DOMP -DSTREAM_ARRAY_SIZE=60000000 -O3 -Kfast,openmp -KA64FX -KSVE -KARMV8_3_A -Kzfill=100 -Kprefetch_sequential=soft -Kprefetch_line=8 -Kprefetch_line_L2=16 -o stream_fujitsu -std=c++11
Hi,
I just downloaded v3.0 and during cmake execution, I get:
$ cmake ..
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
...
CMake Error: File /home/steinbac/development/gpu-stream/GPU-STREAM-3.0/common.h.in does not exist.
CMake Error at CMakeLists.txt:25 (configure_file):
configure_file Problem configuring file
No CMAKE_BUILD_TYPE specified, defaulting to 'Release'
...
-- Configuring incomplete, errors occurred!
See also "/home/steinbac/development/gpu-stream/GPU-STREAM-3.0/build/CMakeFiles/CMakeOutput.log".
See also "/home/steinbac/development/gpu-stream/GPU-STREAM-3.0/build/CMakeFiles/CMakeError.log".
This error comes from CMakeLists.txt#L25: https://github.com/UoB-HPC/GPU-STREAM/blob/master/CMakeLists.txt#L25
I tried to run SYCL stream but it crashes as shown below. I have my own implementation in the PRK project that works, but with some caveats. The NVIDIA GPU only works with -sycl-target ptx64 (and not -sycl-target spir64). The Intel CPU (using Intel OpenCL) only works when I do the opposite.
It seems that I am hitting this issue here, but I don't know how to change your build system to do ptx64. My attempts were unsuccessful.
Any suggestions? The reason I am interested is that my implementation performs poorly on the GPU and I am trying to rule out user error.
$ make -f SYCL.make clean ; make -f SYCL.make && ./sycl-stream
rm -f sycl-stream SYCLStream.sycl SYCLStream.bc
/opt/sycl/latest/bin/compute++ SYCLStream.cpp -O2 -mllvm -inline-threshold=1000 -sycl -emit-llvm -intelspirmetadata -c -I/opt/sycl/latest/include -o SYCLStream.sycl
remark: [Computecpp:CC0027]: Some memcpy/memset intrinsics added by the llvm optimizer were replaced by serial functions. This
is a workaround for OpenCL drivers that do not support those intrinsics. This may impact performance, consider using
-no-serial-memop. [-Rsycl-serial-memop]
g++ -O3 -std=c++11 -DSYCL main.cpp SYCLStream.cpp -I/opt/sycl/latest/include -include SYCLStream.sycl -L/opt/sycl/latest/lib -lComputeCpp -lOpenCL -Wl,--rpath=/opt/sycl/latest/lib/ -o sycl-stream
BabelStream
Version: 3.3
Implementation: SYCL
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Using SYCL device GeForce GTX 960
Driver: 390.30
Reduction kernel config: 32 groups of size 1024
terminate called after throwing an instance of 'cl::sycl::exception'
Aborted (core dumped)
$ /opt/sycl/latest/bin/computecpp_info
********************************************************************************
ComputeCpp Info (CE 0.5.1)
********************************************************************************
Toolchain information:
GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.
********************************************************************************
Device Info:
Discovered 3 devices matching:
platform : <any>
device type : <any>
--------------------------------------------------------------------------------
Device 0:
Device is supported : NO - Device does not support SPIR
CL_DEVICE_NAME : GeForce GTX 960
CL_DEVICE_VENDOR : NVIDIA Corporation
CL_DRIVER_VERSION : 390.30
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:
Device is supported : YES - Tested internally by Codeplay Software Ltd.
CL_DEVICE_NAME : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : 1.2.0.10
CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU
--------------------------------------------------------------------------------
Device 2:
Device is supported : YES - Tested internally by Codeplay Software Ltd.
CL_DEVICE_NAME : Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : 1.2.0.25
CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU
If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.5.1/platform-support-notes
********************************************************************************
$ clinfo
Number of platforms 3
Platform Name NVIDIA CUDA
Platform Vendor NVIDIA Corporation
Platform Version OpenCL 1.2 CUDA 9.1.84
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform Extensions function suffix NV
Platform Name Experimental OpenCL 2.1 CPU Only Platform
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 2.1 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
Platform Host timer resolution 1ns
Platform Extensions function suffix INTEL
Platform Name Intel(R) OpenCL
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 1.2 LINUX
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64
Platform Extensions function suffix INTEL
Platform Name NVIDIA CUDA
Number of devices 1
Device Name GeForce GTX 960
Device Vendor NVIDIA Corporation
Device Vendor ID 0x10de
Device Version OpenCL 1.2 CUDA
Driver Version 390.30
Device OpenCL C Version OpenCL C 1.2
Device Type GPU
Device Profile FULL_PROFILE
Device Topology (NV) PCI-E, 03:00.0
Max compute units 8
Max clock frequency 1228MHz
Compute Capability (NV) 5.2
Device Partition (core)
Max number of sub-devices 1
Supported partition types None
Max work item dimensions 3
Max work item sizes 1024x1024x64
Max work group size 1024
Preferred work group size multiple 32
Warp size (NV) 32
Preferred / native vector sizes
char 1 / 1
short 1 / 1
int 1 / 1
long 1 / 1
half 0 / 0 (n/a)
float 1 / 1
double 1 / 1 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations Yes
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 2096168960 (1.952GiB)
Error Correction support No
Max memory allocation 524042240 (499.8MiB)
Unified memory for Host and Device No
Integrated memory (NV) No
Minimum alignment for any data type 128 bytes
Alignment of base address 4096 bits (512 bytes)
Global Memory cache type Read/Write
Global Memory cache size 131072
Global Memory cache line 128 bytes
Image support Yes
Max number of samplers per kernel 32
Max size for 1D images from buffer 134217728 pixels
Max 1D or 2D image array size 2048 images
Max 2D image size 16384x16384 pixels
Max 3D image size 4096x4096x4096 pixels
Max number of read image args 256
Max number of write image args 16
Local memory type Local
Local memory size 49152 (48KiB)
Registers per block (NV) 65536
Max constant buffer size 65536 (64KiB)
Max number of constant args 9
Max size of kernel argument 4352 (4.25KiB)
Queue properties
Out-of-order execution Yes
Profiling Yes
Prefer user sync for interop No
Profiling timer resolution 1000ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels No
Kernel execution timeout (NV) No
Concurrent copy and kernel execution (NV) Yes
Number of async copy engines 2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
Platform Name Experimental OpenCL 2.1 CPU Only Platform
Number of devices 1
Device Name Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.1 (Build 10)
Driver Version 1.2.0.10
Device OpenCL C Version OpenCL C 2.0
Device Type CPU
Device Profile FULL_PROFILE
Max compute units 16
Max clock frequency 3000MHz
Device Partition (core)
Max number of sub-devices 16
Supported partition types by counts, equally, by names (Intel)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Preferred work group size multiple 128
Max sub-groups per work group 1
Preferred / native vector sizes
char 1 / 32
short 1 / 16
int 1 / 8
long 1 / 4
half 0 / 0 (n/a)
float 1 / 8
double 1 / 4 (cl_khr_fp64)
Half-precision Floating-point support (n/a)
Single-precision Floating-point support (core)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero No
Round to infinity No
IEEE754-2008 fused multiply-add No
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Double-precision Floating-point support (cl_khr_fp64)
Denormals Yes
Infinity and NANs Yes
Round to nearest Yes
Round to zero Yes
Round to infinity Yes
IEEE754-2008 fused multiply-add Yes
Support is emulated in software No
Correctly-rounded divide and sqrt operations No
Address bits 64, Little-Endian
Global memory size 16645128192 (15.5GiB)
Error Correction support No
Max memory allocation 4161282048 (3.875GiB)
Unified memory for Host and Device Yes
Shared Virtual Memory (SVM) capabilities (core)
Coarse-grained buffer sharing Yes
Fine-grained buffer sharing Yes
Fine-grained system sharing Yes
Atomics Yes
Minimum alignment for any data type 128 bytes
Alignment of base address 1024 bits (128 bytes)
Preferred alignment for atomics
SVM 64 bytes
Global 64 bytes
Local 0 bytes
Max size for global variable 65536 (64KiB)
Preferred total size of global vars 65536 (64KiB)
Global Memory cache type Read/Write
Global Memory cache size 262144
Global Memory cache line 64 bytes
Image support Yes
Max number of samplers per kernel 480
Max size for 1D images from buffer 260080128 pixels
Max 1D or 2D image array size 2048 images
Base address alignment for 2D image buffers 64 bytes
Pitch alignment for 2D image buffers 64 bytes
Max 2D image size 16384x16384 pixels
Max 3D image size 2048x2048x2048 pixels
Max number of read image args 480
Max number of write image args 480
Max number of read/write image args 480
Max number of pipe args 16
Max active pipe reservations 16383
Max pipe packet size 1024
Local memory type Global
Local memory size 32768 (32KiB)
Max constant buffer size 131072 (128KiB)
Max number of constant args 480
Max size of kernel argument 3840 (3.75KiB)
Queue properties (on host)
Out-of-order execution Yes
Profiling Yes
Local thread execution (Intel) Yes
Queue properties (on device)
Out-of-order execution Yes
Profiling Yes
Preferred size 4294967295 (4GiB)
Max size 4294967295 (4GiB)
Max queues on device 4294967295
Max events on device 4294967295
Prefer user sync for interop No
Profiling timer resolution 1ns
Execution capabilities
Run OpenCL kernels Yes
Run native kernels Yes
Sub-group independent forward progress No
IL version SPIR-V_1.0
SPIR versions 1.2
printf() buffer size 1048576 (1024KiB)
Built-in kernels
Device Available Yes
Compiler Available Yes
Linker Available Yes
Device Extensions cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
Platform Name Intel(R) OpenCL
Number of devices 1
Device Name Intel(R) Core(TM) i7-5960X CPU @ 3.00GHz
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 1.2 (Build 25)
Driver Version 1.2.0.25
Device OpenCL C Version OpenCL C 1.2
Device Type CPU
Device Profile FULL_PROFILE
Max compute units 16
Max clock frequency 3000MHz
Device Partition (core)
Max number of sub-devices 16
Supported partition types by counts, equally, by names (Intel)
Max work item dimensions 3
Max work item sizes 8192x8192x8192
Max work group size 8192
Stack dump:
0. Running pass 'ChannelPipeTransformation' on module 'main'.
Segmentation fault (core dumped)
The Makefile should specify -Xopenmp-target -march=sm_?? when building with LLVM for NVPTX targets.
BabelStream is not working with the HIP programming model on the latest ROCm 1.7.1 (1.7-137).
The application should be modified as per the latest changes in HIP, correct?
Can you please take a look at the log below and provide a resolution?
taccuser@ROCM-DTB-VG10:~/Desktop/BabelStream$ make -f HIP.make
/opt/rocm/hip/bin/hipcc -std=c++11 -DHIP main.cpp HIPStream.cpp -o hip-stream
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::add' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float *, float )] not viable: no
overload of 'add_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::mul' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'mul_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::triad' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float *, float )] not viable: no
overload of 'triad_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::dot' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float , unsigned int>,
F = void ()(hip_impl::Empty_launch_parm, float *, float *, float ,
unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
()(hip_impl::Empty_launch_parm, float *, float *, float *, unsigned int)'
for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double )] not viable: no
overload of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::add' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double *, double )] not viable:
no overload of 'add_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::mul' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double )] not viable: no
overload of 'mul_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::triad' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double *, double )] not viable:
no overload of 'triad_kernel' matching 'void
()(hip_impl::Empty_launch_parm, double *, double *, double *)' for 1st
argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::dot' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double , unsigned
int>, F = void ()(hip_impl::Empty_launch_parm, double *, double *, double
, unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
()(hip_impl::Empty_launch_parm, double *, double *, double *, unsigned
int)' for 1st argument
void hipLaunchKernel(
^
10 errors generated.
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::add' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float *, float )] not viable: no
overload of 'add_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::mul' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'mul_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::triad' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float *, float )] not viable: no
overload of 'triad_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::dot' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float , unsigned int>,
F = void ()(hip_impl::Empty_launch_parm, float *, float *, float ,
unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
()(hip_impl::Empty_launch_parm, float *, float *, float *, unsigned int)'
for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double )] not viable: no
overload of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::add' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double *, double )] not viable:
no overload of 'add_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::mul' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double )] not viable: no
overload of 'mul_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::triad' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double *, double )] not viable:
no overload of 'triad_kernel' matching 'void
()(hip_impl::Empty_launch_parm, double *, double *, double *)' for 1st
argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::dot' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double , unsigned
int>, F = void ()(hip_impl::Empty_launch_parm, double *, double *, double
, unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
()(hip_impl::Empty_launch_parm, double *, double *, double *, unsigned
int)' for 1st argument
void hipLaunchKernel(
^
10 errors generated.
Died at /opt/rocm/hip/bin/hipcc line 498.
HIP.make:7: recipe for target 'hip-stream' failed
make: *** [hip-stream] Error 1
The OpenACC implementation requires GCC 6+ to compile, and even then only with a patch.
Please rename the patch to remove the .txt suffix, which GitHub forced me to add.
0001-use-restrict-instead-of-restrict-which-is-not-a-.patch.txt
Unfortunately, my version of CUDA only supports GCC 5, so I'm kludging them together like this (the important piece is -ccbin):
cmake \
-DCMAKE_CXX_COMPILER=g++-6.2 \
-DCMAKE_C_COMPILER=gcc-6.2 \
-DCMAKE_CXX_FLAGS="-g -O3 -std=gnu++11 -fopenmp -fopenacc -Wall -Wextra" \
-DCMAKE_C_FLAGS="-g -O3 -std=gnu11 -fopenmp -fopenacc -Wall -Wextra" \
-DCUDA_NVCC_FLAGS="-gencode arch=compute_20,code=sm_20 \
-gencode arch=compute_30,code=sm_30 \
-gencode arch=compute_35,code=sm_35 \
-gencode arch=compute_37,code=sm_37 \
-gencode arch=compute_50,code=sm_50 \
-gencode arch=compute_52,code=sm_52 \
-ccbin gcc-5.4 " ..
jrhammon@klondike:~/Work/GPU/GPU-STREAM/build$ /opt/gcc/5.4.0/bin/g++-5.4 -DACC -I/home/jrhammon/Work/GPU/GPU-STREAM/build -g -O3 -std=gnu++11 -fopenmp -fopenacc -Wall -Wextra -O3 -DNDEBUG -std=gnu++11 -o CMakeFiles/gpu-stream-acc.dir/main.cpp.o -c /home/jrhammon/Work/GPU/GPU-STREAM/main.cpp
In file included from /home/jrhammon/Work/GPU/GPU-STREAM/ACCStream.h:15:0,
from /home/jrhammon/Work/GPU/GPU-STREAM/main.cpp:32:
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:69:40: error: expected primary-expression before ‘)’ token
int acc_get_num_devices (acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:70:41: error: expected primary-expression before ‘)’ token
void acc_set_device_type (acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:71:41: error: expected primary-expression before ‘)’ token
acc_device_t acc_get_device_type (void) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:72:45: error: expected primary-expression before ‘)’ token
void acc_set_device_num (int, acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:73:39: error: expected primary-expression before ‘)’ token
int acc_get_device_num (acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:74:26: error: expected primary-expression before ‘)’ token
int acc_async_test (int) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:75:31: error: expected primary-expression before ‘)’ token
int acc_async_test_all (void) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:76:21: error: expected primary-expression before ‘)’ token
void acc_wait (int) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:77:32: error: expected primary-expression before ‘)’ token
void acc_wait_async (int, int) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:78:26: error: expected primary-expression before ‘)’ token
void acc_wait_all (void) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:79:31: error: expected primary-expression before ‘)’ token
void acc_wait_all_async (int) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:80:30: error: expected primary-expression before ‘)’ token
void acc_init (acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:81:34: error: expected primary-expression before ‘)’ token
void acc_shutdown (acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:82:34: error: expected primary-expression before ‘)’ token
int acc_on_device (acc_device_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:83:27: error: expected primary-expression before ‘)’ token
void *acc_malloc (size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:84:24: error: expected primary-expression before ‘)’ token
void acc_free (void *) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:87:35: error: expected primary-expression before ‘)’ token
void *acc_copyin (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:88:46: error: expected primary-expression before ‘)’ token
void *acc_present_or_copyin (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:89:35: error: expected primary-expression before ‘)’ token
void *acc_create (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:90:46: error: expected primary-expression before ‘)’ token
void *acc_present_or_create (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:91:35: error: expected primary-expression before ‘)’ token
void acc_copyout (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:92:34: error: expected primary-expression before ‘)’ token
void acc_delete (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:93:41: error: expected primary-expression before ‘)’ token
void acc_update_device (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:94:39: error: expected primary-expression before ‘)’ token
void acc_update_self (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:95:44: error: expected primary-expression before ‘)’ token
void acc_map_data (void *, void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:96:30: error: expected primary-expression before ‘)’ token
void acc_unmap_data (void *) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:97:30: error: expected primary-expression before ‘)’ token
void *acc_deviceptr (void *) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:98:28: error: expected primary-expression before ‘)’ token
void *acc_hostptr (void *) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:99:37: error: expected primary-expression before ‘)’ token
int acc_is_present (void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:100:52: error: expected primary-expression before ‘)’ token
void acc_memcpy_to_device (void *, void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:101:54: error: expected primary-expression before ‘)’ token
void acc_memcpy_from_device (void *, void *, size_t) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:109:42: error: expected primary-expression before ‘)’ token
void *acc_get_current_cuda_device (void) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:110:43: error: expected primary-expression before ‘)’ token
void *acc_get_current_cuda_context (void) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:111:33: error: expected primary-expression before ‘)’ token
void *acc_get_cuda_stream (int) __GOACC_NOTHROW;
^
/opt/gcc/5.4.0/lib/gcc/x86_64-unknown-linux-gnu/5.4.0/include/openacc.h:112:39: error: expected primary-expression before ‘)’ token
int acc_set_cuda_stream (int, void *) __GOACC_NOTHROW;
^
jrhammon@klondike:~/Work/GPU/GPU-STREAM/build$ g++-6.2 -DACC -I/home/jrhammon/Work/GPU/GPU-STREAM/build -g -O3 -std=gnu++11 -fopenmp -fopenacc -Wall -Wextra -O3 -DNDEBUG -std=gnu++11 -o CMakeFiles/gpu-stream-acc.dir/main.cpp.o -c /home/jrhammon/Work/GPU/GPU-STREAM/main.cpp
Due to some ambiguities in the SYCL spec, BabelStream currently fails to build with hipSYCL. Workarounds to use until the SYCL spec and hipSYCL are fixed are detailed in AdaptiveCpp/AdaptiveCpp#185
BabelStream uses base 10 output where MB = 10^6.
It would be useful to add a --base2 or --mibibytes option to use base 2 output to calculate bandwidth for MiB = 2^20.
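The difference between the two conventions can be sketched as follows. This is a minimal illustration, not BabelStream's actual code: the function names `bytes_moved` and `bandwidth` and the `use_base2` parameter are hypothetical, chosen only to show where the divisor changes.

```cpp
#include <cstddef>

// Bytes moved by one kernel invocation: `arrays` arrays of `n` elements each.
// For a copy kernel that is 2 arrays (one read, one written).
// (Hypothetical helper, for illustration only.)
inline double bytes_moved(std::size_t arrays, std::size_t n, std::size_t elem_size)
{
    return static_cast<double>(arrays) * static_cast<double>(n) *
           static_cast<double>(elem_size);
}

// Bandwidth in "megabytes" per second.
// use_base2 == false: MB/s,  where 1 MB  = 10^6 bytes.
// use_base2 == true:  MiB/s, where 1 MiB = 2^20 bytes.
inline double bandwidth(double bytes, double seconds, bool use_base2)
{
    const double unit = use_base2 ? 1048576.0 : 1.0e6;
    return bytes / unit / seconds;
}
```

For the same run, the base 2 figure is smaller than the base 10 figure by a factor of 2^20 / 10^6, roughly 4.9%, which is why the two should not be compared directly.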
If you compile the code with VS2013 or earlier, the timer resolution is not fine enough to collect accurate results [1]. In this case you can expect to see minimum timings of zero and, as a result, very large reported memory bandwidths.
This should be fixed in VS2015.
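One way to check whether a clock is fine enough on a given toolchain is to compare its nominal tick with the smallest step it actually reports. The sketch below assumes the timings are taken with a `std::chrono` clock; the helper names `tick_ns` and `observed_step_ns` are hypothetical.

```cpp
#include <chrono>

// Nominal tick length of a clock in nanoseconds, taken from its period type.
template <class Clock>
double tick_ns()
{
    using period = typename Clock::period;
    return 1.0e9 * static_cast<double>(period::num) /
           static_cast<double>(period::den);
}

// Smallest non-zero step seen between two consecutive readings of the clock,
// in nanoseconds. Spins until the reported time changes.
template <class Clock>
double observed_step_ns()
{
    const auto t0 = Clock::now();
    auto t1 = Clock::now();
    while (t1 == t0)
        t1 = Clock::now();
    return std::chrono::duration<double, std::nano>(t1 - t0).count();
}
```

If `observed_step_ns<std::chrono::high_resolution_clock>()` is comparable to, or larger than, the run time of a single kernel, some timings will be recorded as zero.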
There will probably be a number of changes, but the main one is to update the accessors to use the simpler form:
auto ka = d_a->template get_access<access::mode::read>(cgh); // SYCL 1.2.1
auto ka = d_a->template get_access(cgh, sycl::read_only); // SYCL 2020 Provisional
Should probably use accessor constructors instead of get_access, with read and write versions.
Implement Dot using the reduction support in SYCL 2020 Provisional.
Compile errors when building with both double and single precision instantiations of the class. This only seems to happen when the dot kernel is included.
RAJA 0.3.x introduces breaking changes, which means that BabelStream no longer builds with the latest versions of RAJA.
We should use the get_devices() function of the cl::sycl::device class to get all the devices in the system. This means we don't have to search through the platforms first.
I wanted to benchmark a GTX 1080 with CUDA 8.0.27 under CentOS 7.2.1511. The gpu-stream-cuda app behaves normally with the default parameters.
Strangely enough, though, when I provide more than the default number of elements in the array:
$ gpu-stream-cuda --arraysize 67108864
the copy kernel dispatch throws CUDA API error 0xb, which is Invalid Argument. I tracked the problem down to [this line of code](https://github.com/UoB-HPC/GPU-STREAM/blob/master/CUDAStream.cu#L112):
template <class T>
void CUDAStream<T>::copy()
{
  copy_kernel<<<array_size/TBSIZE, TBSIZE>>>(d_a, d_c);
  check_error();
  cudaDeviceSynchronize();
  check_error();
}
Strangely enough, the value of array_size/TBSIZE looks plausible: array_size/TBSIZE = 65536.
Does anyone have an idea where this is coming from? (As this is a release-candidate CUDA, I see no problem forwarding this issue to NVIDIA.)
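The arithmetic behind the launch can be checked on the host. The sketch below assumes TBSIZE is 1024, which is what the quoted value of 65536 implies. The limit of 65535 is the maximum grid x-dimension on devices of compute capability 2.x; whether the binary in this report was built for such a target is an assumption, not something the report confirms. The helper names `grid_blocks` and `grid_fits` are hypothetical.

```cpp
#include <cstddef>

// Number of thread blocks requested by
// copy_kernel<<<array_size/TBSIZE, TBSIZE>>>(...).
inline std::size_t grid_blocks(std::size_t array_size, std::size_t tbsize)
{
    return array_size / tbsize;
}

// True if the requested grid x-dimension does not exceed `max_grid_x`.
// 65535 applies to compute capability 2.x, 2147483647 to 3.0 and later.
inline bool grid_fits(std::size_t blocks, std::size_t max_grid_x)
{
    return blocks <= max_grid_x;
}
```

With these numbers, `grid_blocks(67108864, 1024)` is 65536, which is one block over the 65535 limit but well within the limit for newer architectures, so the target architecture of the build is worth checking.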
BabelStream does not build with the HIP programming model on the latest ROCm 1.7.1 (1.7-137).
Does the application need to be updated to match the latest changes in HIP?
Could you please take a look at the log below and suggest a fix?
taccuser@ROCM-DTB-VG10:~/Desktop/BabelStream$ make -f HIP.make
/opt/rocm/hip/bin/hipcc -std=c++11 -DHIP main.cpp HIPStream.cpp -o hip-stream
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::add' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float *, float )] not viable: no
overload of 'add_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::mul' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'mul_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::triad' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float *, float )] not viable: no
overload of 'triad_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::dot' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float , unsigned int>,
F = void ()(hip_impl::Empty_launch_parm, float *, float *, float ,
unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
()(hip_impl::Empty_launch_parm, float *, float *, float *, unsigned int)'
for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double )] not viable: no
overload of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::add' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double *, double )] not viable:
no overload of 'add_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::mul' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double )] not viable: no
overload of 'mul_kernel' matching 'void ()(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::triad' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double >, F = void
()(hip_impl::Empty_launch_parm, double *, double *, double )] not viable:
no overload of 'triad_kernel' matching 'void
()(hip_impl::Empty_launch_parm, double *, double *, double *)' for 1st
argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream::dot' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double , unsigned
int>, F = void ()(hip_impl::Empty_launch_parm, double *, double *, double
, unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
()(hip_impl::Empty_launch_parm, double *, double *, double *, unsigned
int)' for 1st argument
void hipLaunchKernel(
^
10 errors generated.
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream::copy' requested here
template class HIPStream;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float >, F = void
()(hip_impl::Empty_launch_parm, float *, float )] not viable: no overload
of 'copy_kernel' matching 'void ()(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream<float>::add' requested here
template class HIPStream<float>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float *>, F = void
(*)(hip_impl::Empty_launch_parm, float *, float *, float *)] not viable: no
overload of 'add_kernel' matching 'void (*)(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream<float>::mul' requested here
template class HIPStream<float>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *>, F = void
(*)(hip_impl::Empty_launch_parm, float *, float *)] not viable: no overload
of 'mul_kernel' matching 'void (*)(hip_impl::Empty_launch_parm, float *,
float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream<float>::triad' requested here
template class HIPStream<float>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float *>, F = void
(*)(hip_impl::Empty_launch_parm, float *, float *, float *)] not viable: no
overload of 'triad_kernel' matching 'void (*)(hip_impl::Empty_launch_parm,
float *, float *, float *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:270:16: note: in instantiation of member function
'HIPStream<float>::dot' requested here
template class HIPStream<float>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <float *, float *, float *, unsigned int>,
F = void (*)(hip_impl::Empty_launch_parm, float *, float *, float *,
unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
(*)(hip_impl::Empty_launch_parm, float *, float *, float *, unsigned int)'
for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:129:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(copy_kernel), dim3(array_size/TBSIZE), ...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream<double>::copy' requested here
template class HIPStream<double>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *>, F = void
(*)(hip_impl::Empty_launch_parm, double *, double *)] not viable: no
overload of 'copy_kernel' matching 'void (*)(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:162:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(add_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream<double>::add' requested here
template class HIPStream<double>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double *>, F = void
(*)(hip_impl::Empty_launch_parm, double *, double *, double *)] not viable:
no overload of 'add_kernel' matching 'void (*)(hip_impl::Empty_launch_parm,
double *, double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:146:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(mul_kernel), dim3(array_size/TBSIZE), d...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream<double>::mul' requested here
template class HIPStream<double>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *>, F = void
(*)(hip_impl::Empty_launch_parm, double *, double *)] not viable: no
overload of 'mul_kernel' matching 'void (*)(hip_impl::Empty_launch_parm,
double *, double *)' for 1st argument
void hipLaunchKernel(
^
HIPStream.cpp:179:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(triad_kernel), dim3(array_size/TBSIZE)...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream<double>::triad' requested here
template class HIPStream<double>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double *>, F = void
(*)(hip_impl::Empty_launch_parm, double *, double *, double *)] not viable:
no overload of 'triad_kernel' matching 'void
(*)(hip_impl::Empty_launch_parm, double *, double *, double *)' for 1st
argument
void hipLaunchKernel(
^
HIPStream.cpp:213:3: error: no matching function for call to 'hipLaunchKernel'
hipLaunchKernel(HIP_KERNEL_NAME(dot_kernel), dim3(DOT_NUM_BLOCKS), dim3...
^~~~~~~~~~~~~~~
HIPStream.cpp:271:16: note: in instantiation of member function
'HIPStream<double>::dot' requested here
template class HIPStream<double>;
^
/opt/rocm/hip/include/hip/hcc_detail/functional_grid_launch.hpp:142:6: note:
candidate function [with Args = <double *, double *, double *, unsigned
int>, F = void (*)(hip_impl::Empty_launch_parm, double *, double *, double
*, unsigned int)] not viable: no overload of 'dot_kernel' matching 'void
(*)(hip_impl::Empty_launch_parm, double *, double *, double *, unsigned
int)' for 1st argument
void hipLaunchKernel(
^
10 errors generated.
Died at /opt/rocm/hip/bin/hipcc line 498.
HIP.make:7: recipe for target 'hip-stream' failed
make: *** [hip-stream] Error 1
Cray C/C++ 9.x now needs Clang-style flags for the C++ standard and -fopenmp.