rocm / rocthrust
ROCm Thrust - run Thrust dependent software on AMD GPUs
Home Page: https://rocm.docs.amd.com/projects/rocThrust/en/latest/
License: Apache License 2.0
https://github.com/ROCmSoftwarePlatform/rocThrust/tree/master/testing/unittest
Please add the missing copyright headers and modifications by AMD to files in this directory.
Please check the rest of the files in this directory for missing copyright notices.
https://github.com/ROCmSoftwarePlatform/rocThrust/tree/master/testing/
I compiled rocThrust-rocm-5.0.2 for gfx1031 and found that 1 test (async_copy) out of 113 failed. All other tests passed.
| Hardware | description |
|---|---|
| GPU | Navy_flounder [Radeon RX 6700XT] |
| CPU | AMD Ryzen 9 5950X |

| Software | version |
|---|---|
| Linux | 5.17.3 |
| Distribution | Gentoo |
| ROCK | Upstream Kernel |
| ROCR | v5.0.2 |
| Host Compiler | gcc-11.2 |
| Device Compiler | hipcc-5.0.2 |
Command: "/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-5.0.2_build/test/async_copy.hip"
Directory: /ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-5.0.2_build/test
"async_copy.hip" start time: May 02 15:50 CST
Output:
----------------------------------------------------------
Running main() from /opt/build/portage/dev-cpp/gtest-1.11.0/work/googletest-aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e/googletest/src/gtest_main.cc
[==========] Running 32 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from AsyncCopyTests/0, where TypeParam = Params<short>
[ RUN ] AsyncCopyTests/0.TestAsyncTriviallyRelocatableElementsHostToDevice
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:78: Failure
Expected equality of these values:
h0
Which is: { -32768 }
d0
Which is: { 0 }
Google Test trace:
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:66: with seed= 1
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:63: with size = 1
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:85: with device_id= 0
[ FAILED ] AsyncCopyTests/0.TestAsyncTriviallyRelocatableElementsHostToDevice, where TypeParam = Params<short> (17005 ms)
When doing add_subdirectory(ThirdParty/rocThrust) from a project, rocThrust fails to install rocprim with the following error message:
CMake Error at cmake/VerifyCompiler.cmake:29 (message):
On ROCm platform 'hipcc' or HIP-aware Clang must be used as C++ compiler.
Call Stack (most recent call first):
CMakeLists.txt:82 (include)
Please correct me if I'm wrong: this message requires CMAKE_CXX_COMPILER to be hipcc. I believe this requirement contradicts the way modern CMake is designed. Starting from CMake 3.18/3.23, CUDA and HIP are CMake "languages" that can be enabled with, e.g., enable_language(HIP). The hipcc compiler is then provided by CMAKE_HIP_COMPILER, independently of the main CMAKE_CXX_COMPILER. Furthermore, CMake chooses the compiler depending on the file extension (.cpp, .cu, .hip, etc.). To comply with this flexibility of CMake, rocThrust should not raise this error, and should instead check for CMAKE_HIP_COMPILER and use it internally if it wishes.
I installed architecture specific versions of rocPRIM and rocThrust in the subdirectory:
/opt/rocm-5.4.0/myspecialrocm
When I try to compile an application for this architecture, the compiler picks up the wrong version from the Linux-distribution-specific installation. When compiling rocThrust itself, setting -DCMAKE_NO_SYSTEM_FROM_IMPORTED=TRUE and -DCMAKE_INSTALL_PREFIX=/opt/rocm-5.4.0/myspecialrocm did the trick.
How can I set the path etc. to this alternative rocThrust installation to be used by the compiler?
I'm using reduce_by_key to implement a segmented argmin. The following piece of code fails with an access violation on ROCm 5.6.0 (since little has changed since then, I assume it still fails); the equivalent code works fine with NVIDIA Thrust. I haven't been able to debug the issue, since I don't have access to a system with sufficient driver support for rocgdb. My educated guess is that the reduction is being evaluated for invalid/out-of-bounds argument values, which leads to out-of-bounds accesses to degree.
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/pair.h>
#include <thrust/reduce.h>

// Binary reduction operator: of two node indices, keep the one with the
// smaller (degree, index) pair.
template <typename IndexType>
struct node_min_degree_reduction {
    __device__ __forceinline__ IndexType operator()(IndexType u, IndexType v)
    {
        return thrust::make_pair(degree[u], u) < thrust::make_pair(degree[v], v)
                   ? u
                   : v;
    }
    const IndexType* degree;  // device pointer, indexed by node
};

int main() {
    using IndexType = int;
    thrust::device_vector<int> k(1000, 0);  // keys: a single segment
    thrust::device_vector<int> v{1000, 1};  // values (note the braces, unlike k and degrees)
    thrust::device_vector<int> res1(1);
    thrust::device_vector<int> res2(1);
    thrust::device_vector<int> degrees(1000, 1);
    thrust::reduce_by_key(k.begin(), k.end(), v.begin(), res1.begin(), res2.begin(), thrust::equal_to<IndexType>{},
                          node_min_degree_reduction<IndexType>{degrees.data().get()});
}
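To test the out-of-bounds guess without rocgdb, one option is to run the same reduction on the host with bounds-checked lookups. The block below is a minimal sketch, not a Thrust API. `segmented_argmin` is a hypothetical helper that mimics what reduce_by_key does with the functor above, and `degree.at()` throws as soon as a value is not a valid index into `degree`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Host-side reference for a segmented argmin (hypothetical helper).
// Consecutive equal keys form one segment; every value is an index into `degree`.
// Returns one (key, argmin index) pair per segment.
std::vector<std::pair<int, int>> segmented_argmin(const std::vector<int>& keys,
                                                  const std::vector<int>& values,
                                                  const std::vector<int>& degree)
{
    assert(keys.size() == values.size());
    std::vector<std::pair<int, int>> out;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const int v = values[i];
        // at() throws std::out_of_range if v is not a valid index into degree.
        const int dv = degree.at(static_cast<std::size_t>(v));
        if (out.empty() || out.back().first != keys[i]) {
            // First element of a new segment.
            out.emplace_back(keys[i], v);
        } else {
            int& best = out.back().second;
            const int db = degree.at(static_cast<std::size_t>(best));
            // Same ordering as the device functor: smaller (degree, index) wins.
            if (std::make_pair(dv, v) < std::make_pair(db, best)) {
                best = v;
            }
        }
    }
    return out;
}
```

Feeding it the same keys, values and degrees as the device code shows whether the inputs themselves already contain an index outside `degree`, independently of how the device reduction is scheduled.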
Log: rocthrust_build.log
Build settings: rocthrust.nix
These generally seem to be in the flavor of no matching X.
I'm not sure what's really going on here, but any help would be appreciated!
https://github.com/ROCmSoftwarePlatform/rocThrust/blob/b3410b4a87798818046213afd4764c997ba17b8b/thrust/system/hip/detail/util.h#L106
Inline asm("trap") is invalid for ROCm and does not compile.
Instead, we can apparently use __builtin_trap() for both the CPU and GPU path in that function and avoid yucky ASM and ifdefing altogether.
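A minimal sketch of the suggestion, assuming a GCC- or Clang-compatible compiler. `terminate_now` and `checked_divide` are hypothetical names for illustration and are not taken from util.h. Whether the builtin behaves as intended in HIP device code should be checked against the toolchain in use.

```cpp
#include <cstdlib>

// Hypothetical helper: stop execution without inline assembly.
// __builtin_trap() is a GCC/Clang builtin; other compilers fall back to std::abort().
[[noreturn]] inline void terminate_now()
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_trap();
#else
    std::abort();
#endif
}

// Example use: trap when an invariant is violated.
inline int checked_divide(int numerator, int denominator)
{
    if (denominator == 0) {
        terminate_now();
    }
    return numerator / denominator;
}
```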
@sbalint98;@Maetveis;@MathiasMagnus;@nolmoonen
We are going to enforce two factor authentication in (https://github.com/ROCmSoftwarePlatform/) organization on 29th April, 2022 .
Since we identified you as outside collaborator for ROCmSoftwarePlatform organization, you need to enable two factor authentication in your github account else you shall be removed from the organization after the enforcement.
Please skip if already done.
To set up two factor authentication, please go through the steps in below link:
Please email "[email protected]" for queries
Hi,
I was looking at the performance of thrust::partition_copy and found it quite slow compared to what I believe is a similar operation, that is, thrust::copy_if followed by thrust::remove_copy_if. In fact, partition_copy seems to be 2x slower.
| Hardware | description |
|---|---|
| GPU | MI100:amdgcn-amd-amdhsa--gfx908:sramecc+:xnack- |
| CPU | AMD EPYC 7542 |

| Software | version |
|---|---|
| Distribution | Redhat |
| ROCm | 4.5.0 to 5.0.2 |
The reproducer is a bit verbose but the code is straightforward:
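#include <assert.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/reverse_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/partition.h>
#include <thrust/random.h>
#include <thrust/remove.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>
#include <chrono>
#include <ctime>
#include <iostream>
#include <iterator>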
template <typename Duration = std::chrono::nanoseconds>
class StopWatch {
public:
/// Nanoseconds by default
///
using duration = Duration;
using time_point = std::chrono::steady_clock::time_point;
static_assert(std::chrono::steady_clock::is_steady, "Only steady clocks (the ones that only go forward) !");
public:
/// Start the StopWatch
///
StopWatch();
/// Return the time difference (duration) since the last Lap(), Reset() or object construction.
/// Do not call Elapsed() and then Reset(), just call Lap() instead.
///
duration Elapsed() const;
/// Same as lap but do not return the time difference (duration).
/// Do not call Elapsed() and then Reset(), just call Lap() instead.
///
void Reset();
protected:
time_point the_last_point_in_time_;
};
////////////////////////////////////////////////////////////////////////
// StopWatch methods definition
////////////////////////////////////////////////////////////////////////
template <typename Duration>
StopWatch<Duration>::StopWatch()
: the_last_point_in_time_{std::chrono::steady_clock::now()} {
// EMPTY
}
template <typename Duration>
typename StopWatch<Duration>::duration
StopWatch<Duration>::Elapsed() const {
// The cast is a noop if std::chrono::steady_clock::duration "is same" duration. For other duration, I dunno.
return std::chrono::duration_cast<duration>(std::chrono::steady_clock::now() - the_last_point_in_time_);
}
template <typename Duration>
void StopWatch<Duration>::Reset() {
the_last_point_in_time_ = std::chrono::steady_clock::now();
}
struct KeyInfZero {
template <typename Tuple>
__host__ __device__ constexpr bool
operator()(const Tuple& a_tuple) const {
static_cast<void>(a_tuple);
return thrust::get<0>(a_tuple) < 0;
}
};
template <typename T>
void initialize_keys(thrust::device_vector<T>& keys) {
thrust::default_random_engine rng;
thrust::uniform_int_distribution<int> dist(0, keys.size());
thrust::host_vector<T> h_keys(keys.size());
for(size_t i = 0; i < h_keys.size(); i++) {
h_keys[i] = dist(rng);
}
keys = h_keys;
}
int main(void) {
static constexpr size_t N = 10'000'000; // / 10;
{
thrust::device_vector<short> values0(N);
thrust::device_vector<double> x(N);
thrust::device_vector<double> y(N);
thrust::device_vector<double> z(N);
thrust::device_vector<double> mx(N);
thrust::device_vector<double> my(N);
thrust::device_vector<double> mz(N);
thrust::device_vector<double> c(N);
thrust::device_vector<double> w(N);
thrust::device_vector<short> _values0(N);
thrust::device_vector<double> _x(N);
thrust::device_vector<double> _y(N);
thrust::device_vector<double> _z(N);
thrust::device_vector<double> _mx(N);
thrust::device_vector<double> _my(N);
thrust::device_vector<double> _mz(N);
thrust::device_vector<double> _c(N);
thrust::device_vector<double> _w(N);
initialize_keys(values0);
// thrust::sort(std::begin(values0), std::end(values0)); // Sorting the keys does not improve the partitioning speed
auto input_iterator = thrust::make_zip_iterator(thrust::make_tuple(std::begin(values0),
std::begin(x),
std::begin(y),
std::begin(z),
std::begin(mx),
std::begin(my),
std::begin(mz),
std::begin(c),
std::begin(w)));
auto output_iterator_true = thrust::make_zip_iterator(thrust::make_tuple(std::begin(_values0),
std::begin(_x),
std::begin(_y),
std::begin(_z),
std::begin(_mx),
std::begin(_my),
std::begin(_mz),
std::begin(_c),
std::begin(_w)));
// reverse iterators have no overhead in this case
auto output_iterator_false = thrust::make_zip_iterator(thrust::make_tuple(thrust::make_reverse_iterator(std::end(_values0)),
thrust::make_reverse_iterator(std::end(_x)),
thrust::make_reverse_iterator(std::end(_y)),
thrust::make_reverse_iterator(std::end(_z)),
thrust::make_reverse_iterator(std::end(_mx)),
thrust::make_reverse_iterator(std::end(_my)),
thrust::make_reverse_iterator(std::end(_mz)),
thrust::make_reverse_iterator(std::end(_c)),
thrust::make_reverse_iterator(std::end(_w))));
StopWatch<> t;
thrust::partition_copy(input_iterator, input_iterator + N, output_iterator_true, output_iterator_false, KeyInfZero{});
auto duration = 1e-6 * t.Elapsed().count();
std::cout << "partition_copy: " << duration << " ms" << std::endl;
assert(thrust::is_partitioned(output_iterator_true, output_iterator_true + N, KeyInfZero{}));
}
{
thrust::device_vector<short> values0(N);
thrust::device_vector<double> x(N);
thrust::device_vector<double> y(N);
thrust::device_vector<double> z(N);
thrust::device_vector<double> mx(N);
thrust::device_vector<double> my(N);
thrust::device_vector<double> mz(N);
thrust::device_vector<double> c(N);
thrust::device_vector<double> w(N);
thrust::device_vector<short> _values0(N);
thrust::device_vector<double> _x(N);
thrust::device_vector<double> _y(N);
thrust::device_vector<double> _z(N);
thrust::device_vector<double> _mx(N);
thrust::device_vector<double> _my(N);
thrust::device_vector<double> _mz(N);
thrust::device_vector<double> _c(N);
thrust::device_vector<double> _w(N);
initialize_keys(values0);
// thrust::sort(std::begin(values0), std::end(values0)); // Sorting the keys does not improve the partitioning speed
auto input_iterator = thrust::make_zip_iterator(thrust::make_tuple(std::begin(values0),
std::begin(x),
std::begin(y),
std::begin(z),
std::begin(mx),
std::begin(my),
std::begin(mz),
std::begin(c),
std::begin(w)));
auto output_iterator_true = thrust::make_zip_iterator(thrust::make_tuple(std::begin(_values0),
std::begin(_x),
std::begin(_y),
std::begin(_z),
std::begin(_mx),
std::begin(_my),
std::begin(_mz),
std::begin(_c),
std::begin(_w)));
// reverse iterators have no overhead in this case
auto output_iterator_false = thrust::make_zip_iterator(thrust::make_tuple(thrust::make_reverse_iterator(std::end(_values0)),
thrust::make_reverse_iterator(std::end(_x)),
thrust::make_reverse_iterator(std::end(_y)),
thrust::make_reverse_iterator(std::end(_z)),
thrust::make_reverse_iterator(std::end(_mx)),
thrust::make_reverse_iterator(std::end(_my)),
thrust::make_reverse_iterator(std::end(_mz)),
thrust::make_reverse_iterator(std::end(_c)),
thrust::make_reverse_iterator(std::end(_w))));
StopWatch<> t;
thrust::copy_if(input_iterator, input_iterator + N, output_iterator_true, KeyInfZero{});
thrust::remove_copy_if(input_iterator, input_iterator + N, output_iterator_false, KeyInfZero{});
auto duration = 1e-6 * t.Elapsed().count();
std::cout << "partition_copy with copy_if: " << duration << " ms" << std::endl;
assert(thrust::is_partitioned(output_iterator_true, output_iterator_true + N, KeyInfZero{}));
}
return 0;
}
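For reference, the equivalence the benchmark relies on can be sketched on the host with the standard library counterparts of the three algorithms. This is only an illustration of the semantics, not a performance model. `IsNegative`, `Split`, `split_one_pass` and `split_two_pass` are hypothetical names.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Predicate analogous to KeyInfZero in the reproducer.
struct IsNegative {
    bool operator()(int x) const { return x < 0; }
};

using Split = std::pair<std::vector<int>, std::vector<int>>;

// One pass: elements satisfying the predicate go to `first`, the rest to `second`.
Split split_one_pass(const std::vector<int>& in)
{
    Split out;
    std::partition_copy(in.begin(), in.end(),
                        std::back_inserter(out.first),
                        std::back_inserter(out.second),
                        IsNegative{});
    return out;
}

// Two passes: copy_if selects the matches, remove_copy_if selects everything else.
Split split_two_pass(const std::vector<int>& in)
{
    Split out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out.first), IsNegative{});
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out.second), IsNegative{});
    return out;
}
```

Both functions produce the same two sequences. The two-pass variant reads the input twice, which is why one might expect it to be the slower of the two rather than the faster.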
I cannot find any description of the thrust::hip::par_nosync policy.
Here is a reproducer:
test_header.h
#include <thrust/complex.h>
test.cc
#include "test_header.h"
int main()
{}
If I compile test.cc with hipcc, everything works, but with g++ I get the following errors:
In file included from /opt/rocm-4.3.1/include/thrust/system/cuda/detail/execution_policy.h:33,
from /opt/rocm-4.3.1/include/thrust/iterator/detail/device_system_tag.h:23,
from /opt/rocm-4.3.1/include/thrust/iterator/iterator_traits.h:111,
from /opt/rocm-4.3.1/include/thrust/detail/type_traits/pointer_traits.h:23,
from /opt/rocm-4.3.1/include/thrust/type_traits/is_contiguous_iterator.h:27,
from /opt/rocm-4.3.1/include/thrust/type_traits/is_trivially_relocatable.h:19,
from /opt/rocm-4.3.1/include/thrust/detail/complex/complex.inl:20,
from /opt/rocm-4.3.1/include/thrust/complex.h:1032,
from complex_test.cc:2:
/opt/rocm-4.3.1/include/thrust/system/cuda/config.h:76:10: fatal error: cub/util_namespace.cuh: No such file or directory
#include <cub/util_namespace.cuh> // This includes <cub/version.cuh> in newer releases.
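Until the header can be included from a plain host compiler, one possible workaround on the application side is to guard the include. The sketch below assumes that `__HIPCC__` is defined only when hipcc compiles the translation unit. `complex_t` and `square` are hypothetical names. It does not fix the include chain shown above; it only keeps host-only files from reaching it.

```cpp
#include <complex>

// Assumption: __HIPCC__ is defined only when this file is compiled by hipcc.
#if defined(__HIPCC__)
#include <thrust/complex.h>
template <typename T>
using complex_t = thrust::complex<T>;
#else
template <typename T>
using complex_t = std::complex<T>;
#endif

// Code that only needs basic complex arithmetic can be written against the alias.
inline complex_t<double> square(const complex_t<double>& z)
{
    return z * z;
}
```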
Compile time error:
In file included from /pytorch/aten/src/ATen/native/hip/Unique.hip:13:
In file included from /opt/rocm/include/thrust/adjacent_difference.h:245:
In file included from /opt/rocm/include/thrust/detail/adjacent_difference.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/adl/adjacent_difference.h:44:
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:284:44: error: no type named 'result_type' in '(lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:
5)'
using result_type = typename BinaryOp::result_type;
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:374:9: note: in instantiation of function template specialization 'thrust::hip_rocprim::__adjacent_differen
ce::doit_step<long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
doit_step(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:426:32: note: in instantiation of function template specialization 'thrust::hip_rocprim::__adjacent_differe
nce::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Un
ique.hip:183:5)>' requested here
__adjacent_difference::adjacent_difference<Derived, InputIt, OutputIt, BinaryOp>
^
/opt/rocm/include/thrust/detail/adjacent_difference.inl:54:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::adjacent_difference<thrust::
detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
return adjacent_difference(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, result, binary_op);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:49:13: note: in instantiation of function template specialization 'thrust::adjacent_difference<thrust::detail::execute_with_all
ocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
thrust::adjacent_difference(policy, data, data + num_inp, inv_loc_ptr, not_equal);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:170:48: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::compute_unique<thrust::d
etail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:173:5), (lambda at /pytorch/aten/sr
c/ATen/native/hip/Unique.hip:183:5)>' requested here
std::tie(inverse_indices, counts, num_out) = compute_unique(
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:225:12: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::unique_dim_cuda_template
' requested here
return unique_dim_cuda_template<scalar_t>(self, dim, false, return_inverse, return_counts);
^
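The error above arises because a lambda has no nested result_type. A minimal sketch of the usual alternative is to deduce the result type from a call expression. `binary_result_t` and `apply_binary` are hypothetical names, and this is not the code rocThrust uses.

```cpp
#include <utility>

// Deduce the result type of calling BinaryOp with two arguments of type T,
// instead of requiring a nested BinaryOp::result_type.
template <typename BinaryOp, typename T>
using binary_result_t =
    decltype(std::declval<BinaryOp&>()(std::declval<T>(), std::declval<T>()));

// Works for lambdas, function pointers and functors alike.
template <typename T, typename BinaryOp>
binary_result_t<BinaryOp, T> apply_binary(BinaryOp op, T a, T b)
{
    return op(a, b);
}
```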
PyTorch requires the thrust::constant_iterator and thrust::make_constant_iterator APIs, which currently seem to be missing from rocThrust.
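For readers unfamiliar with the concept, a constant iterator yields the same value at every position. The host-only sketch below illustrates the idea. `constant_iterator_sketch` and `make_constant_iterator_sketch` are hypothetical names and this is not the Thrust implementation.

```cpp
#include <cstddef>
#include <iterator>

// Minimal sketch: dereferencing always yields the same value, while an internal
// position makes ranges such as [it, it + n) well defined.
template <typename T>
class constant_iterator_sketch {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = T;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const T*;
    using reference         = T;

    explicit constant_iterator_sketch(T value, difference_type position = 0)
        : value_(value), position_(position) {}

    reference operator*() const { return value_; }

    constant_iterator_sketch& operator++() { ++position_; return *this; }
    constant_iterator_sketch operator++(int) { auto old = *this; ++position_; return old; }

    constant_iterator_sketch operator+(difference_type n) const
    {
        return constant_iterator_sketch(value_, position_ + n);
    }

    // Two iterators compare equal when they are at the same position.
    friend bool operator==(const constant_iterator_sketch& a, const constant_iterator_sketch& b)
    {
        return a.position_ == b.position_;
    }
    friend bool operator!=(const constant_iterator_sketch& a, const constant_iterator_sketch& b)
    {
        return !(a == b);
    }

private:
    T               value_;
    difference_type position_;
};

template <typename T>
constant_iterator_sketch<T> make_constant_iterator_sketch(T value)
{
    return constant_iterator_sketch<T>(value);
}
```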
I have a program that builds against rocthrust via CMake.
My build fails because rocthrust links against roc::rocprim_hip (in rocthrust-targets.cmake).
So apparently rocthrust depends on rocprim, but it doesn't require the rocprim package. I can fix this by adding find_package(rocprim) to rocthrust-config.cmake.
As a side note, I was a bit confused that the target is rocthrust and not roc::rocthrust, which is not really consistent with, e.g., roc::rocprim.
How can I specify the location of the rocPRIM installation or adapted sources to be used by rocThrust?
The CMakeLists.txt option is to download and build rocPRIM but I need to use the version I built that supports my platform or the sources I modified to support my platform.
I tried to redirect hipcc to my custom version of rocThrust but it mixes (see >> comments) the custom and the default installations when I try to compile a program.
hipcc --offload-arch=gfx1031 -I/opt/rocm-5.4.0/myspecialrocm/include/thrust -I/opt/rocm-5.4.0/myspecialrocm/include/rocprim discrete_voronoi.cu -o testvoronoi
In file included from :1:
In file included from /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/__clang_hip_runtime_wrapper.h:115:
In file included from /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/__clang_hip_math.h:22:
In file included from /opt/rocm-5.4.0/myspecialrocm/include/thrust/limits.h:10: >> using custom version, ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config.h:23: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config/config.h:26: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config/compiler.h:29: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/hip/hip_runtime.h:62:
In file included from /opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:105:
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:5243:25: error: use of undeclared identifier 'UINT_MAX'
size_t size __dparm(UINT_MAX));
^
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:6790:75: error: use of undeclared identifier 'UINT_MAX'
const void* devPtr, size_t size = UINT_MAX) {
^
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:6797:75: error: use of undeclared identifier 'UINT_MAX'
const struct hipChannelFormatDesc& desc, size_t size = UINT_MAX) {
^
The example is one included in rocThrust, and it compiled correctly when I installed rocThrust and opted to build the examples. Now I want to use the custom version to port software to my notebook, but I am not able to compile a program. I assume the errors are caused by the parts pulled from the default rocThrust in the ROCm stack, which doesn't support my GPU.
What's the correct way to enforce the use of the custom version?
It seems that "#include <thrust/system/cuda/config.h>" needs to be added to "thrust/system/hip/detail/error.inl"; otherwise compilation fails with the following errors.
In file included from /opt/rocm/include/thrust/system/hip/error.h:183:0,
/opt/rocm/include/thrust/system/hip/detail/error.inl:23:1: error: ‘BEGIN_NS_THRUST’ does not name a type
BEGIN_NS_THRUST
^~~~~~~~~~~~~~~
/opt/rocm/include/thrust/system/hip/detail/error.inl:81:1: error: ‘END_NS_THRUST’ does not name a type
END_NS_THRUST
^~~~~~~~~~~~~
Please let me know if any of my configuration is not correct, thanks :)
The following algorithms appear to be missing, even though they are all part of the so called ParallelSTL set of standard algorithms:
adjacent_find
find_end
find_first_of
inplace_merge
is_heap
is_heap_until
lexicographical_compare
nth_element
partial_sort
partial_sort_copy
rotate
rotate_copy
search
search_n
shift_left
shift_right
uninitialized_default_construct
uninitialized_default_construct_n
uninitialized_move
uninitialized_move_n
uninitialized_value_construct
uninitialized_value_construct_n
Whilst these are not currently exposed in Thrust itself, it might be opportune to seize the initiative, as many of the above are both useful and used. Thank you.
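As a reminder of the semantics a parallel version would have to match, here is a small host-only sketch built on the standard library versions of two of the listed algorithms. `first_adjacent_duplicate` and `rotated_left` are hypothetical wrapper names for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Index of the first element that equals its successor, or v.size() if none exists.
std::size_t first_adjacent_duplicate(const std::vector<int>& v)
{
    return static_cast<std::size_t>(std::adjacent_find(v.begin(), v.end()) - v.begin());
}

// Copy of v rotated left by k positions: element k becomes the first element.
std::vector<int> rotated_left(std::vector<int> v, std::size_t k)
{
    assert(k <= v.size());
    std::rotate(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(k), v.end());
    return v;
}
```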
I'm trying to build a library using roc::rocthrust. I have the following lines in my CMakeLists.txt
...
find_package(rocprim REQUIRED)
find_package(rocthrust REQUIRED)
...
target_link_libraries(strumpack PUBLIC roc::hipblas roc::rocblas roc::rocsolver roc::rocthrust)
The problem is that the roc::rocthrust target also adds flags to the Fortran compilation commands:
gfortran: error: unrecognized command line option ‘--hip-device-lib-path=/opt/rocm-3.8.0/lib’
gfortran: error: unrecognized command line option ‘--cuda-gpu-arch=gfx900’
gfortran: error: unrecognized command line option ‘--cuda-gpu-arch=gfx906’
I want to build the examples and I'm following the instructions in the README and my error message is below.
Seems closest to issue #116 where use of "-x hip" resolved it, but I need help on where to make the change.
root@b5f40797d3af:~/rocThrust/build# make
--
-- ******** Summary ********
-- General:
-- System : Linux
-- HIP ROOT :
-- C++ compiler : /opt/rocm/hip/bin/hipcc
-- C++ compiler version : 11.0.0
-- CXX flags : -Wno-unused-command-line-argument
-- Build type : Release
-- Install prefix : /opt/rocm
-- Device targets : gfx900:xnack-;gfx906:xnack-;gfx908:xnack-
--
-- DISABLE_WERROR : ON
-- DOWNLOAD_ROCPRIM : OFF
-- BUILD_TEST : OFF
-- BUILD_EXAMPLES : ON
-- BUILD_BENCHMARKS : OFF
-- Building examples
-- Configuring done
-- Generating done
-- Build files have been written to: /root/rocThrust/build
[ 1%] Building CXX object examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o
clang-11: error: Unsupported CUDA gpu architecture: gfx900:xnack-
clang-11: error: Unsupported CUDA gpu architecture: gfx906:xnack-
clang-11: error: Unsupported CUDA gpu architecture: gfx908:xnack-
examples/CMakeFiles/example_thrust_monte_carlo.dir/build.make:62: recipe for target 'examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o' failed
make[2]: *** [examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o] Error 1
CMakeFiles/Makefile2:197: recipe for target 'examples/CMakeFiles/example_thrust_monte_carlo.dir/all' failed
make[1]: *** [examples/CMakeFiles/example_thrust_monte_carlo.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2
From https://rocm.docs.amd.com/projects/rocThrust/en/latest/setup/requirements.html:
please see the [CHANGELOG.md].
Not sure if Changelog is published to the docs site or the link should point to the GitHub repo version of the file.
key_value_pair is used in min functions and with the less-than (<) operator. This currently does not work:
/pytorch/caffe2/operators/hip/rmac_regions_op.hip:44:12: error: invalid operands to binary expression ('KeyValuePair' (aka 'key_value_pair<int, float>') and 'KeyValuePair')
if (kv < min_kv) {
~~ ^ ~~~~~~
/pytorch/c10/util/typeid.h:90:13: note: candidate function not viable: no known conversion from 'KeyValuePair' (aka 'key_value_pair<int, float>') to 'caffe2::TypeIdentifier
' for 1st argument
inline bool operator<(TypeIdentifier lhs, TypeIdentifier rhs) {
^
/opt/rocm/rocprim/include/rocprim/device/../types/tuple.hpp:894:6: note: candidate template ignored: could not match 'tuple' against 'key_value_pair'
bool operator<(const tuple<TTypes...>& lhs, const tuple<UTypes...>& rhs)
^
In file included from /pytorch/caffe2/operators/hip/rmac_regions_op.hip:2:
In file included from /opt/rocm/hipcub/include/hipcub/hipcub.hpp:37:
In file included from /opt/rocm/hipcub/include/hipcub/rocprim/hipcub.hpp:38:
/opt/rocm/hipcub/include/hipcub/rocprim/device/../thread/thread_operators.hpp:101:18: error: invalid operands to binary expression ('const rocprim::key_value_pair<int, floa
t>' and 'const rocprim::key_value_pair<int, float>')
return a < b ? a : b;
~ ^ ~
/opt/rocm/rocprim/include/rocprim/warp/detail/warp_reduce_shuffle.hpp:61:22: note: in instantiation of function template specialization 'hipcub::Min::operator()<rocprim::ke
y_value_pair<int, float> >' requested here
output = reduce_op(output, value);
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:218:22: note: in instantiation of function template specialization 'rocprim::detail::warp_reduce
_shuffle<rocprim::key_value_pair<int, float>, 64, false>::reduce<hipcub::Min>' requested here
WarpReduce().reduce(
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:174:9: note: in instantiation of function template specialization 'rocprim::detail::block_reduce
_warp_reduce<rocprim::key_value_pair<int, float>, 128>::warp_reduce<false, rocprim::detail::warp_reduce_shuffle<rocprim::key_value_pair<int, float>, 64, false>, hipcub::Min
' requested here
warp_reduce<!block_size_is_warp_multiple_, warp_reduce_input_type>(
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:81:15: note: in instantiation of function template specialization 'rocprim::detail::block_reduce
_warp_reduce<rocprim::key_value_pair<int, float>, 128>::reduce_impl<hipcub::Min>' requested here
this->reduce_impl(
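What the failing code needs is an ordering on the pair type. The sketch below uses a stand-in struct rather than rocprim::key_value_pair. The ordering (by value, ties broken by key) is an assumption made for illustration, since the report does not state which ordering is expected. `KeyValuePairSketch` and `min_pair` are hypothetical names.

```cpp
#include <algorithm>

// Stand-in for key_value_pair<int, float> (hypothetical type for illustration).
struct KeyValuePairSketch {
    int   key;
    float value;
};

// Assumed ordering: compare by value first, then by key to break ties.
inline bool operator<(const KeyValuePairSketch& a, const KeyValuePairSketch& b)
{
    return a.value < b.value || (a.value == b.value && a.key < b.key);
}

// With operator< available, std::min (and expressions like `a < b ? a : b`) compile.
inline KeyValuePairSketch min_pair(const KeyValuePairSketch& a, const KeyValuePairSketch& b)
{
    return std::min(a, b);
}
```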
Hello,
It looks like the block size for a parallel_for using the HIP execution policy is hardcoded to 256:
Is that correct? If so, is there a way to change it without creating a custom execution policy?
Thanks in advance.
Some of the const modifiers for Thrust seem inconsistent with rocPRIM/rocThrust
Following compile-time error:
In file included from /pytorch/aten/src/THH/THHTensorMode.hip:7:
In file included from /opt/rocm/include/thrust/device_ptr.h:26:
In file included from /opt/rocm/include/thrust/memory.h:25:
In file included from /opt/rocm/include/thrust/detail/reference.h:173:
In file included from /opt/rocm/include/thrust/detail/reference.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/adl/get_value.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/get_value.h:23:
In file included from /opt/rocm/include/thrust/system/detail/adl/assign_value.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/assign_value.h:25:
In file included from /opt/rocm/include/thrust/system/hip/detail/copy.h:99:
In file included from /opt/rocm/include/thrust/system/hip/detail/internal/copy_device_to_device.h:35:
In file included from /opt/rocm/include/thrust/system/hip/detail/transform.h:44:
In file included from /opt/rocm/rocprim/include/rocprim/rocprim.hpp:55:
In file included from /opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:30:
/opt/rocm/rocprim/include/rocprim/device/detail/device_merge_sort.hpp:365:25: error: no matching function for call to object of type 'const ThrustHalfLess'
const bool ab = compare_function(rocprim::get<0>(a), rocprim::get<0>(b));
^~~~~~~~~~~~~~~~
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_sort_kernel_impl<256, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
block_sort_kernel_impl(
^
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:179:25: note: in instantiation of function template specialization 'rocprim::detail::block_sort_kernel<256, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
HIP_KERNEL_NAME(block_sort_kernel<block_size>),
^
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:429:20: note: in instantiation of function template specialization 'rocprim::detail::merge_sort_impl<rocprim::default_config, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return detail::merge_sort_impl(
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:89:23: note: in instantiation of function template specialization 'rocprim::merge_sort<rocprim::default_config, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return rocprim::merge_sort(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:123:36: note: in instantiation of function template specialization 'thrust::hip_rocprim::__merge_sort::dispatch<thrust::detail::integral_constant<bool, true> >::doit<thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, long, ThrustHalfLess>' requested here
status = dispatch<SORT_ITEMS>::doit(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:351:19: note: in instantiation of function template specialization 'thrust::hip_rocprim::__merge_sort::merge_sort<thrust::detail::integral_constant<bool, true>, thrust::hip_rocprim::execution_policy<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
__merge_sort::merge_sort<SORT_ITEMS>(policy,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:428:19: note: in instantiation of function template specialization 'thrust::hip_rocprim::__smart_sort::smart_sort<thrust::detail::integral_constant<bool, true>, thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
__smart_sort::smart_sort<detail::true_type, Derived, KeysIt, ValuesIt, CompareOp>
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:448:16: note: in instantiation of function template specialization 'thrust::hip_rocprim::stable_sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
hip_rocprim::stable_sort_by_key(policy, keys_first, keys_last, values, compare_op);
^
/opt/rocm/include/thrust/detail/sort.inl:115:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return sort_by_key(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), keys_first, keys_last, values_first, comp);
^
/pytorch/aten/src/THH/generic/THHTensorMode.hip:43:11: note: in instantiation of function template specialization 'thrust::sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
thrust::sort_by_key(
^
/pytorch/aten/src/THH/THHTensorMode.cuh:11:35: note: candidate function not viable: 'this' argument has type 'const ThrustHalfLess', but method is not marked const
__host__ __device__ inline bool operator()(const at::Half& lhs, const at::Half& rhs) {
^
The const qualifiers used by thrust::tabulate are inconsistent with the API that Thrust expects:
Compile time error for PyTorch:
In file included from /pytorch/aten/src/ATen/native/hip/RangeFactories.hip:9:
In file included from /opt/rocm/include/thrust/sequence.h:295:
In file included from /opt/rocm/include/thrust/detail/sequence.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/generic/sequence.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/sequence.inl:20:
In file included from /opt/rocm/include/thrust/tabulate.h:128:
In file included from /opt/rocm/include/thrust/detail/tabulate.inl:22:
In file included from /opt/rocm/include/thrust/system/detail/adl/tabulate.h:44:
/opt/rocm/include/thrust/system/hip/detail/tabulate.h:55:62: error: no matching function for call to object of type 'const at::native::LinspaceOp<double, double>'
[tabulate_op](size_type i) { return tabulate_op(i); });
^~~~~~~~~~~
/opt/rocm/include/thrust/detail/tabulate.inl:37:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::tabulate<thrust::hip_rocprim::tag, thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
return tabulate(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, unary_op);
^
/opt/rocm/include/thrust/detail/tabulate.inl:52:18: note: in instantiation of function template specialization 'thrust::tabulate<thrust::hip_rocprim::tag, thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
return thrust::tabulate(select_system(system), first, last, unary_op);
^
/pytorch/aten/src/ATen/native/hip/RangeFactories.hip:59:15: note: in instantiation of function template specialization 'thrust::tabulate<thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
thrust::tabulate(data_, data_ + steps, linspace_method);
^
/pytorch/aten/src/ATen/native/hip/RangeFactories.hip:18:32: note: candidate function not viable: 'this' argument has type 'const at::native::LinspaceOp<double, double>', but method is not marked const
__device__ __forceinline__ T operator()(ptrdiff_t index) {
^
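Both diagnostics above report the same pattern: the functor is invoked through a const object, but its operator() is not const-qualified. In the tabulate case the const object is a by-value lambda capture, which is const inside the lambda body unless the lambda is declared mutable. A minimal host-only sketch of the pattern and the usual fix, assuming a hypothetical `ScaleOp` functor and `apply_all` helper that merely stand in for the real functor and call site:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical functor standing in for the user-supplied operation.
// The trailing const is the fix: without it, the call inside the lambda
// below fails with "method is not marked const".
struct ScaleOp {
    double start;
    double step;
    double operator()(std::ptrdiff_t index) const { return start + step * index; }
};

// Hypothetical helper that mirrors the shape of the failing call site:
// the functor is captured by value in a non-mutable lambda, so it is
// const inside the lambda body.
template <class Op>
std::vector<double> apply_all(std::size_t n, Op op) {
    auto call = [op](std::ptrdiff_t i) { return op(i); };
    std::vector<double> out(n);
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = call(static_cast<std::ptrdiff_t>(i));
    }
    return out;
}
```

Marking the call operator const on the caller's side is the smaller change; whether the library should instead accept non-const functors is a separate question.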
I wrote a benchmark whose random data generation is partly based on rocThrust. The code is available at
https://github.com/LeiWang1999/rocblas-benchmark
While benchmarking, I found that data generation spends too much time in thrust::copy when the problem size is raised to
std::make_tuple(2048, 32768, 8192, false, false),
std::make_tuple(16384, 16384, 16384, false, false),
Do you have any comments?
When I try to compile rocThrust, I get an error: no member named 'init_offset_scan_state_kernel' ...
...
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/detail/scan.inl:29:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/detail/adl/scan_by_key.h:44:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/scan_by_key.h:36:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/execution_policy.h:81:
/home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/set_operations.h:956:61: error: no member named 'init_offset_scan_state_kernel' in namespace 'rocprim::detail'; did you mean 'init_lookback_scan_state_kernel'?
hipLaunchKernelGGL(HIP_KERNEL_NAME(rocprim::detail::init_offset_scan_state_kernel),
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
init_lookback_scan_state_kernel
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:199:30: note: expanded from macro 'HIP_KERNEL_NAME'
#define HIP_KERNEL_NAME(...) __VA_ARGS__
^~~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:251:74: note: expanded from macro 'hipLaunchKernelGGL'
#define hipLaunchKernelGGL(kernelName, ...) hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__)
^~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:248:9: note: expanded from macro 'hipLaunchKernelGGLInternal'
kernelName<<<(numBlocks), (numThreads), (memPerBlock), (streamId)>>>(__VA_ARGS__); \
^~~~~~~~~~
/opt/rocm-5.4.0/include/rocprim/device/detail/device_scan_common.hpp:76:60: note: 'init_lookback_scan_state_kernel' declared here
__launch_bounds__(ROCPRIM_DEFAULT_MAX_BLOCK_SIZE) void init_lookback_scan_state_kernel(
^
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/test/test_zip_iterator.cpp:18:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/copy.h:512:
...
This error appears several times before the compilation stops.
I tested rocPRIM after installing it, and all of its tests passed.
How should I approach this error? Is it related to rocPRIM or to rocThrust?
Hi all,
I am looking at the rocThrust library. Are there any open-source workloads, benchmarks, applications, or sample codes that use rocThrust directly?
Thanks
I am currently at commit 10c7281. As in the previously reported issue #163, I am encountering the following warnings:
/rocThrust/thrust/system/hip/detail/util.h:116:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipStreamSynchronize(stream(policy));
/rocThrust/thrust/system/hip/memory_resource.h:59:17: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipGetLastError(); // Clear the HIP global error state.
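Both warnings come from calls whose return value is silently dropped. A minimal host-only sketch of the two usual ways to silence such a warning, either checking the status or discarding it explicitly. `Status`, `fake_sync`, `check`, and `clear_error_state` are hypothetical stand-ins so that the snippet builds without HIP, and this is not necessarily how rocThrust resolved it:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical status type and API, standing in for a HIP-style call
// that is declared nodiscard. These are not HIP names.
enum class Status { Success = 0, Failure = 1 };

[[nodiscard]] inline Status fake_sync(bool ok) {
    return ok ? Status::Success : Status::Failure;
}

// Option 1: consume the status and turn a failure into an exception.
inline void check(Status s, const char* what) {
    if (s != Status::Success) {
        throw std::runtime_error(std::string(what) + " failed");
    }
}

// Option 2: the call is made only for its side effect (for example,
// clearing an error state), so the result is discarded on purpose.
inline void clear_error_state() {
    (void)fake_sync(false);  // explicit discard, no -Wunused-result warning
}
```

Option 1 suits synchronization calls whose failure matters; option 2 documents that ignoring the result is intentional.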
If HCC is deprecated, is this not deprecated as well?
Compile time error from PyTorch:
In file included from /pytorch/aten/src/ATen/native/hip/Unique.hip:5:
In file included from /opt/rocm/include/thrust/execution_policy.h:31:
In file included from /opt/rocm/include/thrust/system/cpp/execution_policy.h:64:
In file included from /opt/rocm/include/thrust/system/cpp/detail/sort.h:22:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.h:63:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.inl:21:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.inl:20:
In file included from /opt/rocm/include/thrust/copy.h:513:
In file included from /opt/rocm/include/thrust/detail/copy_if.h:74:
In file included from /opt/rocm/include/thrust/detail/copy_if.inl:20:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.inl:31:
In file included from /opt/rocm/include/thrust/scan.h:1563:
In file included from /opt/rocm/include/thrust/detail/scan.inl:29:
In file included from /opt/rocm/include/thrust/system/detail/adl/scan_by_key.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/scan_by_key.h:33:
In file included from /opt/rocm/include/thrust/system/hip/execution_policy.h:41:
In file included from /opt/rocm/include/thrust/system/hip/detail/count.h:35:
In file included from /opt/rocm/include/thrust/system/hip/detail/reduce.h:42:
In file included from /opt/rocm/include/thrust/device_vector.h:26:
In file included from /opt/rocm/include/thrust/detail/vector_base.h:547:
In file included from /opt/rocm/include/thrust/detail/vector_base.inl:25:
In file included from /opt/rocm/include/thrust/equal.h:237:
In file included from /opt/rocm/include/thrust/detail/equal.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/generic/equal.h:47:
In file included from /opt/rocm/include/thrust/system/detail/generic/equal.inl:21:
In file included from /opt/rocm/include/thrust/mismatch.h:259:
In file included from /opt/rocm/include/thrust/detail/mismatch.inl:27:
In file included from /opt/rocm/include/thrust/system/detail/generic/mismatch.h:57:
In file included from /opt/rocm/include/thrust/system/detail/generic/mismatch.inl:21:
In file included from /opt/rocm/include/thrust/find.h:384:
In file included from /opt/rocm/include/thrust/detail/find.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/generic/find.h:62:
In file included from /opt/rocm/include/thrust/system/detail/generic/find.inl:19:
In file included from /opt/rocm/include/thrust/reduce.h:784:
In file included from /opt/rocm/include/thrust/detail/reduce.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/generic/reduce_by_key.h:88:
In file included from /opt/rocm/include/thrust/system/detail/generic/reduce_by_key.inl:30:
In file included from /opt/rocm/include/thrust/scatter.h:422:
In file included from /opt/rocm/include/thrust/detail/scatter.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/adl/scatter.h:44:
/opt/rocm/include/thrust/system/hip/detail/scatter.h:50:20: error: type 'long *' cannot be used prior to '::' because it has no members
typedef typename ItemsIt::value_type ItemsTy;
^
/opt/rocm/include/thrust/detail/scatter.inl:45:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::scatter<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, const long *, long *>' requested here
return scatter(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, map, output);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:51:13: note: in instantiation of function template specialization 'thrust::scatter<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, const long *, long *>' requested here
thrust::scatter(policy, inv_loc_ptr, inv_loc_ptr + num_inp, sorted_indices_ptr, inverse_indices_ptr);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:105:48: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::compute_unique<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, unsigned char, thrust::equal_to, thrust::not_equal_to >' requested here
std::tie(inverse_indices, counts, num_out) = compute_unique(
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:207:46: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::unique_cuda_template' requested here
std::tie(output, inverse, std::ignore) = unique_cuda_template<scalar_t>(self, false, return_inverse, false);
^
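The error arises because `ItemsIt::value_type` only works for class-type iterators; a raw pointer such as `long *` has no nested members. `std::iterator_traits` handles both cases. A minimal host-only sketch, where `value_type_of` and `scatter_sketch` are hypothetical names used for illustration and not the rocThrust implementation:

```cpp
#include <iterator>
#include <type_traits>
#include <vector>

// Works for raw pointers and class-type iterators alike.
template <class It>
using value_type_of = typename std::iterator_traits<It>::value_type;

static_assert(std::is_same<value_type_of<long*>, long>::value,
              "raw pointer");
static_assert(std::is_same<value_type_of<const long*>, long>::value,
              "pointer to const: cv-qualifiers are stripped");
static_assert(std::is_same<value_type_of<std::vector<int>::iterator>, int>::value,
              "class-type iterator");

// Hypothetical sequential scatter: out[map[i]] = in[i].
template <class InIt, class MapIt, class OutIt>
void scatter_sketch(InIt first, InIt last, MapIt map, OutIt out) {
    for (; first != last; ++first, ++map) {
        value_type_of<InIt> item = *first;  // no InIt::value_type needed
        out[*map] = item;
    }
}
```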
Hi all, I am starting to compile rocThrust.
I tried to cmake with:
cmake ../. -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc
-- The CXX compiler identification is Clang 12.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
----------------------------------- ERROR -----------------------------------
It seems that CMake needs CMAKE_CXX_FLAGS to be set in order to generate a correct makefile; currently "make -j" does not compile anything.
However, I did not find any instructions for setting CMAKE_CXX_FLAGS. Are there any other details or documents I missed for compiling rocThrust?
Thanks!
The issue is similar to ROCm/rocPRIM#570.
With hardened libstdc++ (compiled with -D_GLIBCXX_ASSERTIONS), a few rocThrust 6.1.1 tests fail with:
[ RUN ] AsyncReduceTests/3.TestAsyncReduce
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned short>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned short]: Assertion '_M_a <= _M_b' failed.
[ RUN ] AsyncTransformTests/3.TestAsyncTransformUnary
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned short>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned short]: Assertion '_M_a <= _M_b' failed.
[ RUN ] PrimitiveGatherTests/0.Gather
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned int>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned int]: Assertion '_M_a <= _M_b' failed.
[ RUN ] PrimitiveInnerProductTests/0.InnerProductWithRandomData
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<short>::param_type::param_type(_IntType, _IntType) [_IntType = short]: Assertion '_M_a <= _M_b' failed.
[ RUN ] ScanByKeyTests.TestScanByKeyMixedTypes
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned int>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned int]: Assertion '_M_a <= _M_b' failed.
In lines like get_random_data<T>(2, -100, 100, seed), the tests attempt to generate values of T = unsigned int in a range where min (4294967196) > max (100), which is undefined behavior. Can you fix it (and maybe in rocPRIM too)? Thanks!
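The value 4294967196 is what -100 becomes when converted to a 32-bit unsigned int, so the distribution is constructed with a > b. One possible way to avoid this, sketched under the assumption that clamping the requested bounds into the range of T is acceptable for the tests; `clamp_bound` and `random_in_range` are hypothetical helpers, not the rocThrust test utilities:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <random>
#include <type_traits>
#include <vector>

// Clamp a requested bound into the representable range of T.
template <class T>
T clamp_bound(long long v) {
    static_assert(std::is_integral<T>::value, "integral types only");
    if (std::is_unsigned<T>::value) {
        if (v < 0) return T(0);  // negative bounds collapse to 0
        const unsigned long long u = static_cast<unsigned long long>(v);
        const unsigned long long hi = std::numeric_limits<T>::max();
        return u > hi ? std::numeric_limits<T>::max() : static_cast<T>(u);
    }
    // Signed T: long long can represent every bound of T.
    const long long lo = static_cast<long long>(std::numeric_limits<T>::min());
    const long long hi = static_cast<long long>(std::numeric_limits<T>::max());
    return static_cast<T>(v < lo ? lo : (v > hi ? hi : v));
}

// Generate n values in [min, max] after clamping both bounds to T.
template <class T>
std::vector<T> random_in_range(std::size_t n, long long min, long long max, unsigned seed) {
    const T a = clamp_bound<T>(min);
    const T b = clamp_bound<T>(max);
    assert(a <= b);  // precondition of std::uniform_int_distribution
    std::mt19937 gen(seed);
    std::uniform_int_distribution<T> dist(a, b);
    std::vector<T> out(n);
    for (auto& x : out) x = dist(gen);
    return out;
}
```

With this sketch, a request for [-100, 100] yields [0, 100] for unsigned types and is unchanged for signed ones.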
Gentoo
AMD Instinct MI300X
ROCm 6.1.0
Is there a rocThrust equivalent of thrust::cuda::par()? Thanks.
thrust::transform(
thrust::rocm::par.on(stream),
Hi all,
I have successfully built and tested my application with HCC (ROCm 3.3). Now I am trying to build it with hip-clang (ROCm 3.5), since HCC is deprecated, but I am getting compilation errors because of the rocThrust library. The Thrust device system is being selected as CUDA, but it should be HIP.
I am attaching two files for further details.
I have been unable to resolve this issue and would like to know how to do so. If more details are required, please let me know.
Best regards,
Saurabh.
On an Ubuntu 20.04 system with ROCm 5.2, compiling code that uses rocThrust fails (Score.cpp contains "#include <thrust/host_vector.h>", which causes the compilation to fail):
clang++ -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.2.0/include -I/opt/rocm-5.2.0/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.2.0/hsa/include -c -std=c++20 -g -O0 -pg Score.cpp
In file included from Score.cpp:11:
In file included from /opt/rocm-5.2.0/include/thrust/host_vector.h:27:
In file included from /opt/rocm-5.2.0/include/thrust/detail/vector_base.h:25:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/normal_iterator.h:27:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/iterator_adaptor.h:36:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/iterator_facade.h:37:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/iterator_facade_category.h:22:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/device_system_tag.h:23:
In file included from /opt/rocm-5.2.0/include/thrust/system/cuda/detail/execution_policy.h:35:
/opt/rocm-5.2.0/include/thrust/system/cuda/config.h:33:10: fatal error: 'cub/util_namespace.cuh' file not found
#include <cub/util_namespace.cuh>
^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
I followed the example https://github.com/ROCm-Developer-Tools/HIP-Examples/tree/master/openmp-helloworld and built it successfully with hipcc. But when I tried to add rocThrust containers to the source file, errors occurred.
I have also tried building this example with rocThrust alone, and it works fine.
My environment:
Error messages:
In file included from openmp_helloworld.cpp:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.h:29:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.h:240:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:23:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.h:35:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:22:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/uninitialized_fill.h:274:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:26:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/adl/uninitialized_fill.h:44:
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:62:13: error: no matching function for call to 'operator new'
::new(static_cast<void*>(&out)) value_type(value);
^~
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:66:17: note: in instantiation of function template specialization 'thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>::operator()<unsigned long>' requested here
f(tile_base + idx);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:95:44: note: in instantiation of function template specialization 'thrust::hip_rocprim::__parallel_for::kernel<256, 1, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<block_size, items_per_thread, F, Size>),
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:131:50: note: in instantiation of function template specialization 'thrust::hip_rocprim::__parallel_for::parallel_for<thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
hipError_t status = __parallel_for::parallel_for(count, f, stream);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:119:12: note: in instantiation of member function 'thrust::hip_rocprim::parallel_for(execution_policy<thrust::hip_rocprim::tag> &, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long)::workaround::par' requested here
struct workaround
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:76:18: note: in instantiation of function template specialization 'thrust::hip_rocprim::parallel_for<thrust::hip_rocprim::tag, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
hip_rocprim::parallel_for(policy, functor_t(first, x), count);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:54:10: note: (skipping 3 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
return uninitialized_fill_n(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, n, x);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:251:3: note: in instantiation of function template specialization 'thrust::detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
default_construct_range(m_allocator, first.base(), n);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:220:15: note: in instantiation of member function 'thrust::detail::contiguous_storage<int, thrust::device_allocator<int>>::default_construct_n' requested here
m_storage.default_construct_n(begin(), size());
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:65:3: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::default_init' requested here
default_init(n);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:95:8: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::vector_base' requested here
:Parent(n) {}
^
openmp_helloworld.cpp:41:32: note: in instantiation of member function 'thrust::device_vector<int, thrust::device_allocator<int>>::device_vector' requested here
thrust::device_vector<int> v(10);
^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:134:7: note: candidate function not viable: call to __host__ function from __device__ function
void* operator new(std::size_t, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:168:14: note: candidate function not viable: call to __host__ function from __device__ function
inline void* operator new(std::size_t, void* __p) _GLIBCXX_USE_NOEXCEPT
^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:120:7: note: candidate function not viable: requires 1 argument, but 2 were provided
void* operator new(std::size_t) _GLIBCXX_THROW (std::bad_alloc)
^
note: candidate function not viable: requires 1 argument, but 2 were provided
In file included from openmp_helloworld.cpp:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.h:29:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.h:240:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:22:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/copy_construct_range.h:46:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/copy_construct_range.inl:21:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/copy.h:90:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/copy.inl:21:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/copy.h:58:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/copy.inl:23:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/transform.h:724:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/transform.inl:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/transform.h:105:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/transform.inl:19:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/for_each.h:279:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/for_each.inl:27:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/adl/for_each.h:44:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/for_each.h:35:
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:141:17: error: no matching function for call to object of type 'thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>'
f(idx);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:119:12: note: in instantiation of member function 'thrust::hip_rocprim::parallel_for(execution_policy<thrust::hip_rocprim::tag> &, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long)::workaround::seq' requested here
struct workaround
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:76:18: note: in instantiation of function template specialization 'thrust::hip_rocprim::parallel_for<thrust::hip_rocprim::tag, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
hip_rocprim::parallel_for(policy, functor_t(first, x), count);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:54:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::uninitialized_fill_n<thrust::hip_rocprim::tag, thrust::device_ptr<int>, unsigned long, int>' requested here
return uninitialized_fill_n(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, n, x);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:94:11: note: in instantiation of function template specialization 'thrust::uninitialized_fill_n<thrust::hip_rocprim::tag, thrust::device_ptr<int>, unsigned long, int>' requested here
thrust::uninitialized_fill_n(allocator_system<Allocator>::get(a), p, n, typename pointer_element<Pointer>::type());
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:105:35: note: in instantiation of function template specialization 'thrust::detail::allocator_traits_detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
return allocator_traits_detail::default_construct_range(a,p,n);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:251:3: note: in instantiation of function template specialization 'thrust::detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
default_construct_range(m_allocator, first.base(), n);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:220:15: note: in instantiation of member function 'thrust::detail::contiguous_storage<int, thrust::device_allocator<int>>::default_construct_n' requested here
m_storage.default_construct_n(begin(), size());
^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:65:3: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::default_init' requested here
default_init(n);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:95:8: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::vector_base' requested here
:Parent(n) {}
^
openmp_helloworld.cpp:41:32: note: in instantiation of member function 'thrust::device_vector<int, thrust::device_allocator<int>>::device_vector' requested here
thrust::device_vector<int> v(10);
^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:58:41: note: candidate template ignored: substitution failure [with Size = unsigned long]
void THRUST_HIP_DEVICE_FUNCTION operator()(Size idx)
^
2 errors generated when compiling for host.
openmp_helloworld.cpp:
// HIP header
#include <hip/hip_runtime.h>
#include <thrust/device_vector.h>
//OpenMP header
#include <omp.h>
#include <stdio.h>
#include <stdlib.h>
__global__
void hip_helloworld(unsigned omp_id)
{
    printf("Hello World... from HIP thread = %u\n", omp_id);
}
int main(int argc, char* argv[])
{
    thrust::device_vector<int> v(10);
    // No errors
    //thrust::device_vector<int> v;
    // Beginning of parallel region
    #pragma omp parallel
    {
        auto tid = omp_get_thread_num();
        printf("Hello World... from OMP thread = %d\n", tid);
        hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, tid);
    }
    // Ending of parallel region
    hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, /*id=*/ 0);
    hipStreamSynchronize(0);
    printf("PASSED!\n");
    return 0;
}
Commands:
hipcc -xhip -fopenmp=libomp openmp_helloworld.cpp
CMake gives the same error. This is my cmake file:
project(openmp_helloworld)
cmake_minimum_required(VERSION 3.16)
# Search for rocm in common locations
if(NOT DEFINED ROCM_PATH)
    if(DEFINED ENV{ROCM_PATH})
        set(ROCM_PATH $ENV{ROCM_PATH} CACHE PATH "ROCm path")
    elseif(DEFINED ENV{HIP_PATH})
        set(ROCM_PATH "$ENV{HIP_PATH}/.." CACHE PATH "ROCm path")
    else()
        set(ROCM_PATH "/opt/rocm" CACHE PATH "ROCm path")
    endif()
endif()
set(HIP_PATH "${ROCM_PATH}/hip" CACHE PATH "HIP path")
list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH} ${HIP_PATH})
find_package(hip REQUIRED)
find_package(rocprim REQUIRED CONFIG)
find_package(rocthrust REQUIRED CONFIG)
# Find OpenMP
find_package(OpenMP REQUIRED)
# Source files
set(CPP_SOURCES ${CMAKE_SOURCE_DIR}/openmp_helloworld.cpp)
# Preparing the executable
add_executable(test_openmp_helloworld ${CPP_SOURCES})
target_link_libraries(test_openmp_helloworld
    PRIVATE hip::device
            roc::rocthrust
            OpenMP::OpenMP_CXX
)
I am attempting to run some benchmarks that use thrust in them through the gem5 simulator, which currently only supports ROCm 1.6.x. While hipify'ing the applications, I need to port the thrust calls to use rocThrust (which I did). However, after looking through the release tags and the commits for rocThrust, it appears that rocThrust only supports ROCm 2.5+. Is this correct? Or is there an (older) commit that has support for older versions of ROCm?
Thanks,
Matt
I have an AMD Ryzen 9 7900X 12-core / 24-threads processor and I'm trying to run our code on both the CPU and iGP 0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c2)
for comparison.
Some of our code depends on Thrust, for which we are using rocThrust for HIP (1.17.2p0 as reported by the one shipping with ROCm 5.6.0, official .deb from AMD).
We have noticed that a sort_by_key with custom operator is extremely slow on GPU, averaging 30ms for 1.4M data points compared to ~20ms on CPU with the OpenMP backend.
Sample code for testing can be found in the rocthrust-perf branch of my titanxstall repo. The results can be reproduced by running:
git clone git@github.com:Oblomov/titanxstall -b rocthrust-perf thrust-perf
cd thrust-perf
make clean ; make test
make clean ; make cpu=1 openmp=1 test
and comparing the results between the HIP and OpenMP runs. The sort is run 5 times, and on GPU the first run should be discarded as it includes host-to-device data transfers due to the use of managed allocations. (The original code does not use managed allocations, but similar performance differences have been observed, so this doesn't seem to be related.)
Please add support for NAVI22 and NAVI23, i.e. gfx1031 and gfx1032, to enable HIP development with rocThrust on notebooks, as ALL AMD mobile GPUs are NAVI22 or NAVI23, i.e. gfx1031 or gfx1032.
Relevant engineering and scientific software uses Thrust. Porting this software to AMD hardware requires rocThrust support on development hardware. Please coordinate with the rocPRIM developers, as it is a dependency.
My target operating system is RHEL9 = AlmaLinux 9, Rocky Linux ...
From Requirements:
Device Architectures
Thrust and CUB support all NVIDIA device architectures since SM 35.
From User Guide:
The HIP ported library works on HIP/ROCm platforms. Currently there is no CUDA backend in place.
So there is no CUDA back-end, yet NVIDIA devices are supported?
Also, the ROCm link is broken - ROCm.
Hi everyone,
I am new to ROCm and just installed rocThrust. When I tried to compile my HIP C++ code using hipcc (ROCm 5.4.0), I got
rocThrust/include/thrust/type_traits/integer_sequence.h:61:31: error: no template named 'integer_sequence' in namespace 'std'.
This 'integer_sequence' should come from the <utility> header. I used the -std=c++14 flag, but the error persisted. When I included <utility> in a test program and compiled it using g++, things went well, but hipcc still fails to compile the same test program with the same error. Can anyone tell me how to fix this issue?
Thanks a lot!
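One way to narrow this down is to check which language standard the compiler is really using. The following is only a minimal sketch in plain C++ (no rocThrust involved); the helper names active_cpp_standard and count_indices are made up for this example. If it fails to compile with hipcc -std=c++14 but succeeds with g++ -std=c++14, the flag is probably not reaching the compiler invocation, or hipcc is picking up a different standard library.

```cpp
#include <cstddef>
#include <utility>  // declares std::integer_sequence and std::index_sequence (C++14)

// Hypothetical helper: reports the language standard the compiler is using.
constexpr long active_cpp_standard() { return __cplusplus; }

// Fails early, with a readable message, when the compiler is not in C++14 mode.
static_assert(__cplusplus >= 201402L,
              "std::integer_sequence needs C++14 or newer");

// Hypothetical helper: counts the indices carried by a std::index_sequence.
template <std::size_t... Is>
constexpr std::size_t count_indices(std::index_sequence<Is...>) {
    return sizeof...(Is);
}

// Compile-time use of the C++14 facility that the rocThrust header relies on.
static_assert(count_indices(std::make_index_sequence<4>{}) == 4,
              "expected four indices");
```

Printing active_cpp_standard() from a small main() shows the value hipcc actually sees.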
Hi all,
I tried to compile the rocThrust test.
I tried to cmake with:
CXX=hipcc cmake -DBUILD_TEST=ON ../.
and then:
make
It shows:
/home/echo/workspace/rocThrust/thrust/../thrust/system/hip/detail/guarded_driver_types.h:50:10: fatal error: 'hip/amd_detail/host_defines.h' file not found
#include <hip/amd_detail/host_defines.h>
However, under the ROCm directory I cannot find hip/amd_detail/host_defines.h; instead I found /opt/rocm/hip/include/hip/hcc_detail/host_defines.h.
Am I missing any configuration, installation, or redirection steps?
Other than that, I also have a build problem in #179.
Could that error be causing this problem?
Thanks!
A recent clang change (llvm/llvm-project#90152) revealed a bug in the rocThrust develop branch:
Line 2756 in f3a28e4
rocThrust/thrust/../thrust/optional.h:2756:11: error: no member named 'construct' in 'optional<type-parameter-0-0 &>'
In this->construct, the built-in arrow operator applied to the this pointer cannot be overloaded, so it can only reach members of optional<T &> itself. However, construct is not one of them, since optional<T &> does not define construct. Although optional<T> inherits construct from its base classes, optional<T &> does not, because the two have no inheritance relation.
Since optional<T &> behaves like a delegate of its member m_value, my guess is that this->construct was intended to perform m_value->construct, i.e., to use the overloaded arrow operator. To make the overloaded arrow operator take effect, (*this)-> can be used. Therefore, the fix is
(*this)->construct(std::forward<Args>(args)...);
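The difference between this-> and (*this)-> can be seen in a small standalone sketch. It is not rocThrust code: Target, RefWrapper, emplace_and_read and self_check are hypothetical names, and the wrapper only mimics the delegate-like shape described above.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for the wrapped object; it owns construct().
struct Target {
    int value = 0;
    void construct(int v) { value = v; }
};

// Hypothetical wrapper that forwards member access through operator->.
template <class T>
struct RefWrapper {
    T* m_value;

    T* operator->() { return m_value; }

    template <class... Args>
    void emplace(Args&&... args) {
        // this->construct(...) would look for RefWrapper::construct, which does
        // not exist: -> on the raw pointer `this` is the built-in operator and
        // never calls the overloaded operator-> above.
        // (*this)-> applies the overloaded operator-> and reaches T::construct.
        (*this)->construct(std::forward<Args>(args)...);
    }
};

// Exercises the wrapper and returns the value stored in the target.
inline int emplace_and_read(int v) {
    Target t;
    RefWrapper<Target> w{&t};
    w.emplace(v);
    return t.value;
}

inline void self_check() { assert(emplace_and_read(7) == 7); }
```

Replacing (*this)-> with this-> inside emplace makes the template ill-formed, which is the kind of error the newer clang reports.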
Ubuntu 22.04
any
AMD Radeon Pro W7900
ROCm 6.1.0
Hello,
this is illustrated by trying to compile this 4-liner
#define HIP_TEMPLATE_KERNEL_LAUNCH
#include <thrust/device_vector.h>
device_ptr<float4> a;
device_ptr<uint> b : c < inclusive_scan_by_key(b, b, a, a);
obtained by applying Creduce to a larger piece of code.
With HIP_TEMPLATE_KERNEL_LAUNCH, this fails with:
error: no matching function for call to 'hipLaunchKernelGGL'
Without the define, inclusive_scan_by_key compiles correctly. This is a regression, since in the past (at least in 4.2.0) this used to work.
We've recently tried to build our library (ArborX) against the newly released ROCm 6.0 and rocThrust 3. We have
find_package(rocthrust 2.10.5 REQUIRED CONFIG)
in our CMake config file. This fails for the new rocThrust 3.0 because it uses the SameMajorVersion compatibility mode.
I just wanted to check whether that is intentional, or whether it could be switched to AnyNewerVersion so that downstream codes don't have to work around it.
Compiling a HIP program produces the following errors. Thanks in advance for any suggestions.
In file included from /opt/rocm-6.0.0/include/thrust/system/detail/adl/malloc_and_free.h:26:
/opt/rocm-6.0.0/include/thrust/system/detail/sequential/malloc_and_free.h:45:8: error: reference to __host__ function 'free' in __host__ __device__ function
std::free(thrust::raw_pointer_cast(ptr));
In file included from /opt/rocm-6.0.0/include/thrust/system/detail/adl/malloc_and_free.h:26:
/opt/rocm-6.0.0/include/thrust/system/detail/sequential/malloc_and_free.h:37:15: error: reference to __host__ function 'malloc' in __host__ __device__ function
return std::malloc(n);
rocThrust is triggering a warning in a call to hipGetLastError
/rocthrust/thrust/system/hip/detail/malloc_and_free.h:71:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
hipGetLastError(); // Clear global hip error state.
^~~~~~~~~~~~~~~
This is with:
% clang --version
clang version 12.0.0 (/llvm/llvm-project/clang 009181e63cd6a46681319335b18180d91d05f241)
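A common way to quiet this kind of warning is to discard the result explicitly with a cast to void. The sketch below is plain C++ and only an illustration: Status, global_status, take_last_status, clear_status and self_check are hypothetical stand-ins, with take_last_status playing the role of a [[nodiscard]] call that returns the stored error and resets it. Whether rocThrust should apply the same pattern at the line quoted above is an assumption.

```cpp
#include <cassert>

// Hypothetical error code type standing in for a HIP-style status value.
enum class Status { ok, failure };

// Hypothetical global error state.
inline Status& global_status() {
    static Status s = Status::ok;
    return s;
}

// Hypothetical [[nodiscard]] function: returns the stored status and resets it.
[[nodiscard]] inline Status take_last_status() {
    Status previous = global_status();
    global_status() = Status::ok;
    return previous;
}

// Clears the error state; the cast to void states that the return value is
// dropped on purpose, so the unused-result diagnostic is not emitted.
inline void clear_status() {
    (void)take_last_status();
}

inline void self_check() {
    global_status() = Status::failure;
    clear_status();
    assert(global_status() == Status::ok);
}
```

The cast keeps the call's side effect (resetting the state) while making the intent visible to both the compiler and the reader.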