
rocm / rocthrust


ROCm Thrust - run Thrust dependent software on AMD GPUs

Home Page: https://rocm.docs.amd.com/projects/rocThrust/en/latest/

License: Apache License 2.0

CMake 1.37% C++ 76.38% Cuda 17.79% C 3.03% Shell 0.17% Groovy 0.11% Python 1.14% Makefile 0.02%

rocthrust's Issues

rocm-5.0.2 async_copy test failed on 6700xt

Summary

I compiled rocThrust-rocm-5.0.2 for gfx1031 and found 1 failing test (async_copy) among 113. The other tests passed.

Environment

Hardware description
GPU Navy_flounder [Radeon RX 6700XT]
CPU AMD Ryzen 9 5950X
Software version
Linux 5.17.3
Distribution Gentoo
ROCK Upstream Kernel
ROCR v5.0.2
Host Compiler gcc-11.2
Device Compiler hipcc-5.0.2

Log

Command: "/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-5.0.2_build/test/async_copy.hip"
Directory: /ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-5.0.2_build/test
"async_copy.hip" start time: May 02 15:50 CST
Output:
----------------------------------------------------------
Running main() from /opt/build/portage/dev-cpp/gtest-1.11.0/work/googletest-aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e/googletest/src/gtest_main.cc
[==========] Running 32 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from AsyncCopyTests/0, where TypeParam = Params<short>
[ RUN      ] AsyncCopyTests/0.TestAsyncTriviallyRelocatableElementsHostToDevice
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:78: Failure
Expected equality of these values:
  h0
    Which is: { -32768 }
  d0
    Which is: { 0 }
Google Test trace:
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:66: with seed= 1
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:63: with size = 1
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:85: with device_id= 0
[  FAILED  ] AsyncCopyTests/0.TestAsyncTriviallyRelocatableElementsHostToDevice, where TypeParam = Params<short> (17005 ms)

C++ compiler should not be enforced globally

When doing add_subdirectory(ThirdParty/rocThrust) from a project, rocThrust fails to install rocprim with the following error message:

CMake Error at cmake/VerifyCompiler.cmake:29 (message):
  On ROCm platform 'hipcc' or HIP-aware Clang must be used as C++ compiler.
Call Stack (most recent call first):
  CMakeLists.txt:82 (include)

Please correct me if I'm wrong: this message requires CMAKE_CXX_COMPILER to be hipcc. I believe this requirement contradicts the way modern CMake is designed. CUDA and HIP are CMake "languages" (CUDA since CMake 3.8, HIP since CMake 3.21) that can be enabled with, e.g., enable_language(HIP). The hipcc compiler is then provided by CMAKE_HIP_COMPILER, independently of the main CMAKE_CXX_COMPILER. Furthermore, CMake picks the compiler for each source file based on its language, which by default follows the file extension (.cpp, .cu, .hip, etc.). To comply with this flexibility, rocThrust should not raise this error; instead it should check for CMAKE_HIP_COMPILER and use it internally if it wishes.

How can I set the path etc. to an alternative rocThrust installation to be used by the compiler?

I installed architecture specific versions of rocPRIM and rocThrust in the subdirectory:

/opt/rocm-5.4.0/myspecialrocm

When I try to compile an application for this architecture, the compiler picks the wrong version from the Linux distribution's own installation. When compiling rocThrust itself, setting -DCMAKE_NO_SYSTEM_FROM_IMPORTED=TRUE and -DCMAKE_INSTALL_PREFIX=/opt/rocm-5.4.0/myspecialrocm did the trick.

How can I set the path etc. to this alternative rocThrust installation to be used by the compiler?

`reduce_by_key` fails with custom reduction

I'm using reduce_by_key to implement a segmented argmin. The following piece of code fails with an access violation on ROCm 5.6.0 (since little has changed since then, I assume it still fails), while the equivalent code works fine with NVIDIA Thrust. I haven't been able to debug the issue, since I don't have access to a system with sufficient driver support for rocgdb. My educated guess is that the reduction is being evaluated for invalid/out-of-bounds argument values, which leads to out-of-bounds accesses to degree.

#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/pair.h>
#include <thrust/reduce.h>

template <typename IndexType>
struct node_min_degree_reduction {
    __device__ __forceinline__ IndexType operator()(IndexType u, IndexType v)
    {
        return thrust::make_pair(degree[u], u) < thrust::make_pair(degree[v], v)
                   ? u
                   : v;
    }

    const IndexType* degree;
};

int main() {
        using IndexType = int;
        thrust::device_vector<int> k(1000, 0);
        thrust::device_vector<int> v{1000, 1};
        thrust::device_vector<int> res1(1);
        thrust::device_vector<int> res2(1);
        thrust::device_vector<int> degrees(1000, 1);
        thrust::reduce_by_key(k.begin(), k.end(), v.begin(), res1.begin(), res2.begin(), thrust::equal_to<IndexType>{},
                          node_min_degree_reduction<IndexType>{degrees.data().get()});

}
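
One detail in the reproducer may be worth ruling out before blaming the reduction: `v{1000, 1}` uses braces where the neighbouring vectors use parentheses. With `std::vector`, braces select the initializer-list constructor and produce two elements (1000 and 1) rather than 1000 copies of 1. The host-only sketch below shows that difference. It uses `std::vector` as a stand-in and assumes, without confirmation from the report, that the `thrust::device_vector` in use also provides an initializer-list constructor. If it does, the value range would be shorter than the key range, and the value 1000 would index one past the end of the 1000-element `degrees` array.

```cpp
#include <cstddef>
#include <vector>

// Host-side analogue of the two initialization styles in the reproducer.
// std::vector stands in for thrust::device_vector (an assumption).

// Parentheses: (count, value) constructor, 1000 elements each equal to 1.
inline std::size_t size_with_parens() {
    std::vector<int> v(1000, 1);
    return v.size();
}

// Braces: initializer-list constructor, two elements with values 1000 and 1.
inline std::size_t size_with_braces() {
    std::vector<int> v{1000, 1};
    return v.size();
}

// The first element of the brace-initialized vector is the literal 1000.
inline int first_element_with_braces() {
    std::vector<int> v{1000, 1};
    return v[0];
}
```

If this is the cause, writing `v(1000, 1)` would make the value range match the key range; whether that resolves the reported access violation is not verified here.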

Please enable two factor authentication in your github account

@sbalint98;@Maetveis;@MathiasMagnus;@nolmoonen

We are going to enforce two-factor authentication in the ROCmSoftwarePlatform organization (https://github.com/ROCmSoftwarePlatform/) on 29th April, 2022.
Since we identified you as an outside collaborator of the ROCmSoftwarePlatform organization, you need to enable two-factor authentication in your GitHub account; otherwise you will be removed from the organization after the enforcement.
Please skip this if already done.

To set up two-factor authentication, please follow the steps at the link below:

https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/configuring-two-factor-authentication

Please email "[email protected]" for queries

partition_copy vs copy_if/remove_copy_if

Hi,

I was looking at the performance of thrust::partition_copy and found it quite slow compared to what I believe is a similar operation, namely thrust::copy_if followed by thrust::remove_copy_if. In fact, partition_copy seems to be about 2x slower.

Hardware description
GPU MI100:amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-
CPU AMD EPYC 7542
Software version
Distribution Redhat
ROCm 4.5.0 to 5.0.2

The reproducer is a bit verbose but the code is straightforward:

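#include <assert.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/reverse_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/partition.h>
#include <thrust/random.h>
#include <thrust/remove.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>

#include <chrono>
#include <ctime>
#include <iostream>
#include <iterator>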


template <typename Duration = std::chrono::nanoseconds>
class StopWatch {
public:
    /// Nanoseconds by default
    ///
    using duration   = Duration;
    using time_point = std::chrono::steady_clock::time_point;

    static_assert(std::chrono::steady_clock::is_steady, "Only steady clocks (the ones that only go forward) !");

public:
    /// Start the StopWatch
    ///
    StopWatch();

    /// Return the time difference (duration) since the last Lap(), Reset() or object construction.
    /// Do not call Elapsed() and then Reset(), just call Lap() instead.
    ///
    duration Elapsed() const;

    /// Same as lap but do not return the time difference (duration).
    /// Do not call Elapsed() and then Reset(), just call Lap() instead.
    ///
    void Reset();

protected:
    time_point the_last_point_in_time_;
};

////////////////////////////////////////////////////////////////////////
// StopWatch methods definition
////////////////////////////////////////////////////////////////////////

template <typename Duration>
StopWatch<Duration>::StopWatch()
    : the_last_point_in_time_{std::chrono::steady_clock::now()} {
    // EMPTY
}

template <typename Duration>
typename StopWatch<Duration>::duration
StopWatch<Duration>::Elapsed() const {
    // The cast is a noop if std::chrono::steady_clock::duration "is same" duration. For other duration, I dunno.
    return std::chrono::duration_cast<duration>(std::chrono::steady_clock::now() - the_last_point_in_time_);
}

template <typename Duration>
void StopWatch<Duration>::Reset() {
    the_last_point_in_time_ = std::chrono::steady_clock::now();
}

struct KeyInfZero {
    template <typename Tuple>
    __host__ __device__ constexpr bool
    operator()(const Tuple& a_tuple) const {
        static_cast<void>(a_tuple);
        return thrust::get<0>(a_tuple) < 0;
    }
};

template <typename T>
void initialize_keys(thrust::device_vector<T>& keys) {
    thrust::default_random_engine         rng;
    thrust::uniform_int_distribution<int> dist(0, keys.size());

    thrust::host_vector<T> h_keys(keys.size());

    for(size_t i = 0; i < h_keys.size(); i++) {
        h_keys[i] = dist(rng);
    }

    keys = h_keys;
}

int main(void) {
    static constexpr size_t N = 10'000'000; // / 10;

    {
        thrust::device_vector<short>  values0(N);
        thrust::device_vector<double> x(N);
        thrust::device_vector<double> y(N);
        thrust::device_vector<double> z(N);
        thrust::device_vector<double> mx(N);
        thrust::device_vector<double> my(N);
        thrust::device_vector<double> mz(N);
        thrust::device_vector<double> c(N);
        thrust::device_vector<double> w(N);

        thrust::device_vector<short>  _values0(N);
        thrust::device_vector<double> _x(N);
        thrust::device_vector<double> _y(N);
        thrust::device_vector<double> _z(N);
        thrust::device_vector<double> _mx(N);
        thrust::device_vector<double> _my(N);
        thrust::device_vector<double> _mz(N);
        thrust::device_vector<double> _c(N);
        thrust::device_vector<double> _w(N);

        initialize_keys(values0);

        // thrust::sort(std::begin(values0), std::end(values0)); // Sorting the keys does not improve the partitioning speed

        auto input_iterator = thrust::make_zip_iterator(thrust::make_tuple(std::begin(values0),
                                                                           std::begin(x),
                                                                           std::begin(y),
                                                                           std::begin(z),
                                                                           std::begin(mx),
                                                                           std::begin(my),
                                                                           std::begin(mz),
                                                                           std::begin(c),
                                                                           std::begin(w)));

        auto output_iterator_true = thrust::make_zip_iterator(thrust::make_tuple(std::begin(_values0),
                                                                                 std::begin(_x),
                                                                                 std::begin(_y),
                                                                                 std::begin(_z),
                                                                                 std::begin(_mx),
                                                                                 std::begin(_my),
                                                                                 std::begin(_mz),
                                                                                 std::begin(_c),
                                                                                 std::begin(_w)));

        // reverse iterators have no overhead in this case
        auto output_iterator_false = thrust::make_zip_iterator(thrust::make_tuple(thrust::make_reverse_iterator(std::end(_values0)),
                                                                                  thrust::make_reverse_iterator(std::end(_x)),
                                                                                  thrust::make_reverse_iterator(std::end(_y)),
                                                                                  thrust::make_reverse_iterator(std::end(_z)),
                                                                                  thrust::make_reverse_iterator(std::end(_mx)),
                                                                                  thrust::make_reverse_iterator(std::end(_my)),
                                                                                  thrust::make_reverse_iterator(std::end(_mz)),
                                                                                  thrust::make_reverse_iterator(std::end(_c)),
                                                                                  thrust::make_reverse_iterator(std::end(_w))));

        StopWatch<> t;

        thrust::partition_copy(input_iterator, input_iterator + N, output_iterator_true, output_iterator_false, KeyInfZero{});

        auto duration = 1e-6 * t.Elapsed().count();
        std::cout << "partition_copy: " << duration << " ms" << std::endl;
        assert(thrust::is_partitioned(output_iterator_true, output_iterator_true + N, KeyInfZero{}));
    }


    {
        thrust::device_vector<short>  values0(N);
        thrust::device_vector<double> x(N);
        thrust::device_vector<double> y(N);
        thrust::device_vector<double> z(N);
        thrust::device_vector<double> mx(N);
        thrust::device_vector<double> my(N);
        thrust::device_vector<double> mz(N);
        thrust::device_vector<double> c(N);
        thrust::device_vector<double> w(N);

        thrust::device_vector<short>  _values0(N);
        thrust::device_vector<double> _x(N);
        thrust::device_vector<double> _y(N);
        thrust::device_vector<double> _z(N);
        thrust::device_vector<double> _mx(N);
        thrust::device_vector<double> _my(N);
        thrust::device_vector<double> _mz(N);
        thrust::device_vector<double> _c(N);
        thrust::device_vector<double> _w(N);

        initialize_keys(values0);

        // thrust::sort(std::begin(values0), std::end(values0)); // Sorting the keys does not improve the partitioning speed

        auto input_iterator = thrust::make_zip_iterator(thrust::make_tuple(std::begin(values0),
                                                                           std::begin(x),
                                                                           std::begin(y),
                                                                           std::begin(z),
                                                                           std::begin(mx),
                                                                           std::begin(my),
                                                                           std::begin(mz),
                                                                           std::begin(c),
                                                                           std::begin(w)));

        auto output_iterator_true = thrust::make_zip_iterator(thrust::make_tuple(std::begin(_values0),
                                                                                 std::begin(_x),
                                                                                 std::begin(_y),
                                                                                 std::begin(_z),
                                                                                 std::begin(_mx),
                                                                                 std::begin(_my),
                                                                                 std::begin(_mz),
                                                                                 std::begin(_c),
                                                                                 std::begin(_w)));

        // reverse iterators have no overhead in this case
        auto output_iterator_false = thrust::make_zip_iterator(thrust::make_tuple(thrust::make_reverse_iterator(std::end(_values0)),
                                                                                  thrust::make_reverse_iterator(std::end(_x)),
                                                                                  thrust::make_reverse_iterator(std::end(_y)),
                                                                                  thrust::make_reverse_iterator(std::end(_z)),
                                                                                  thrust::make_reverse_iterator(std::end(_mx)),
                                                                                  thrust::make_reverse_iterator(std::end(_my)),
                                                                                  thrust::make_reverse_iterator(std::end(_mz)),
                                                                                  thrust::make_reverse_iterator(std::end(_c)),
                                                                                  thrust::make_reverse_iterator(std::end(_w))));

        StopWatch<> t;

        thrust::copy_if(input_iterator, input_iterator + N, output_iterator_true, KeyInfZero{});
        thrust::remove_copy_if(input_iterator, input_iterator + N, output_iterator_false, KeyInfZero{});

        auto duration = 1e-6 * t.Elapsed().count();
        std::cout << "partition_copy with copy_if: " << duration << " ms" << std::endl;
        assert(thrust::is_partitioned(output_iterator_true, output_iterator_true + N, KeyInfZero{}));
    }
    return 0;
}
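
The comparison above rests on `partition_copy` producing the same two outputs as `copy_if` followed by `remove_copy_if`. The host-only sketch below checks that equivalence with the standard-library counterparts of the three algorithms. It is an illustration under simplifying assumptions, not a benchmark: it uses plain `int` elements instead of zipped tuples, appends the rejected elements in forward order rather than through reverse iterators, and the names `is_negative`, `split_with_partition_copy` and `split_with_two_copies` are made up for this example.

```cpp
#include <algorithm>
#include <iterator>
#include <utility>
#include <vector>

// Predicate mirroring KeyInfZero from the reproducer, applied to plain ints.
struct is_negative {
    bool operator()(int x) const { return x < 0; }
};

// .first receives elements satisfying the predicate, .second the rest.
using two_outputs = std::pair<std::vector<int>, std::vector<int>>;

// Single pass over the input with partition_copy.
inline two_outputs split_with_partition_copy(const std::vector<int>& in) {
    two_outputs out;
    std::partition_copy(in.begin(), in.end(),
                        std::back_inserter(out.first),
                        std::back_inserter(out.second),
                        is_negative{});
    return out;
}

// Two passes over the same input: copy_if, then remove_copy_if.
inline two_outputs split_with_two_copies(const std::vector<int>& in) {
    two_outputs out;
    std::copy_if(in.begin(), in.end(),
                 std::back_inserter(out.first), is_negative{});
    std::remove_copy_if(in.begin(), in.end(),
                        std::back_inserter(out.second), is_negative{});
    return out;
}
```

Both functions preserve the relative order of elements within each output, so their results can be compared directly with `==`.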

[Documentation]: `thrust::hip::par_nosync`

Description of errors

I cannot find any description of the thrust::hip::par_nosync policy.


Compilation error when using non-hipcc compiler

Here is a reproducer:

test_header.h
#include<thrust/complex.h>

test.cc
#include"test_header.h"

int main()
{}

If I compile test.cc with hipcc, everything works, but with g++ I get the following errors:

In file included from /opt/rocm-4.3.1/include/thrust/system/cuda/detail/execution_policy.h:33,
                 from /opt/rocm-4.3.1/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /opt/rocm-4.3.1/include/thrust/iterator/iterator_traits.h:111,
                 from /opt/rocm-4.3.1/include/thrust/detail/type_traits/pointer_traits.h:23,
                 from /opt/rocm-4.3.1/include/thrust/type_traits/is_contiguous_iterator.h:27,
                 from /opt/rocm-4.3.1/include/thrust/type_traits/is_trivially_relocatable.h:19,
                 from /opt/rocm-4.3.1/include/thrust/detail/complex/complex.inl:20,
                 from /opt/rocm-4.3.1/include/thrust/complex.h:1032,
                 from complex_test.cc:2:
/opt/rocm-4.3.1/include/thrust/system/cuda/config.h:76:10: fatal error: cub/util_namespace.cuh: No such file or directory
 #include <cub/util_namespace.cuh> // This includes <cub/version.cuh> in newer releases.

adjacent_difference requires result_type in BinaryOp not available

Compile time error:

In file included from /pytorch/aten/src/ATen/native/hip/Unique.hip:13:
In file included from /opt/rocm/include/thrust/adjacent_difference.h:245:
In file included from /opt/rocm/include/thrust/detail/adjacent_difference.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/adl/adjacent_difference.h:44:
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:284:44: error: no type named 'result_type' in '(lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)'
using result_type = typename BinaryOp::result_type;
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:374:9: note: in instantiation of function template specialization 'thrust::hip_rocprim::__adjacent_difference::doit_step<long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
doit_step(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:426:32: note: in instantiation of function template specialization 'thrust::hip_rocprim::__adjacent_difference::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
__adjacent_difference::adjacent_difference<Derived, InputIt, OutputIt, BinaryOp>
^
/opt/rocm/include/thrust/detail/adjacent_difference.inl:54:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
return adjacent_difference(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, result, binary_op);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:49:13: note: in instantiation of function template specialization 'thrust::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
thrust::adjacent_difference(policy, data, data + num_inp, inv_loc_ptr, not_equal);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:170:48: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::compute_unique<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:173:5), (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
std::tie(inverse_indices, counts, num_out) = compute_unique(
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:225:12: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::unique_dim_cuda_template' requested here
return unique_dim_cuda_template<scalar_t>(self, dim, false, return_inverse, return_counts);
^
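
The error arises because a lambda has no nested `result_type`, so `typename BinaryOp::result_type` cannot be resolved. The sketch below shows one common way to obtain the same information without that member: deduce the type from a call expression with `decltype`. It is a host-only illustration, not the fix adopted in rocThrust; `binary_result_t` and `adjacent_difference_sketch` are hypothetical names introduced here.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Deduce the result of calling BinaryOp on two values of type T, without
// requiring a nested BinaryOp::result_type (lambdas do not provide one).
template <typename BinaryOp, typename T>
using binary_result_t = std::decay_t<decltype(
    std::declval<BinaryOp&>()(std::declval<const T&>(), std::declval<const T&>()))>;

// Host-only stand-in for adjacent_difference:
// out[0] = in[0], out[i] = op(in[i], in[i - 1]) for i > 0.
template <typename T, typename BinaryOp>
std::vector<binary_result_t<BinaryOp, T>>
adjacent_difference_sketch(const std::vector<T>& in, BinaryOp op) {
    using result_type = binary_result_t<BinaryOp, T>;
    std::vector<result_type> out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(i == 0 ? static_cast<result_type>(in[0])
                             : static_cast<result_type>(op(in[i], in[i - 1])));
    }
    return out;
}
```

Called with a lambda such as `[](long a, long b) -> long { return a != b ? 1 : 0; }`, the deduced element type is `long`, which is the information the failing line tried to read from `BinaryOp::result_type`.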

missing constant iterator

PyTorch requires the thrust::constant_iterator and thrust::make_constant_iterator APIs, which currently seem to be missing from rocThrust.
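
For readers unfamiliar with the concept, a constant iterator yields the same value at every position, so a sequence of identical elements can be passed to an algorithm without allocating storage for it. The host-only sketch below illustrates the idea; it is not the Thrust or rocThrust implementation, and `constant_iterator_sketch` and `make_constant_iterator_sketch` are hypothetical names.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>

// Hypothetical host-only sketch of a constant iterator.
// Dereferencing always yields the stored value; only the position changes.
template <typename T>
class constant_iterator_sketch {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type        = T;
    using difference_type   = std::ptrdiff_t;
    using pointer           = const T*;
    using reference         = const T&;

    explicit constant_iterator_sketch(T value, difference_type position = 0)
        : value_(value), position_(position) {}

    reference operator*() const { return value_; }

    constant_iterator_sketch& operator++() { ++position_; return *this; }
    constant_iterator_sketch operator++(int) {
        constant_iterator_sketch old = *this;
        ++position_;
        return old;
    }
    constant_iterator_sketch operator+(difference_type n) const {
        return constant_iterator_sketch(value_, position_ + n);
    }

    // Two iterators compare equal when they refer to the same position.
    friend bool operator==(const constant_iterator_sketch& a,
                           const constant_iterator_sketch& b) {
        return a.position_ == b.position_;
    }
    friend bool operator!=(const constant_iterator_sketch& a,
                           const constant_iterator_sketch& b) {
        return !(a == b);
    }

private:
    T               value_;
    difference_type position_;
};

template <typename T>
constant_iterator_sketch<T> make_constant_iterator_sketch(T value) {
    return constant_iterator_sketch<T>(value);
}
```

For example, with `auto first = make_constant_iterator_sketch(7);`, the call `std::accumulate(first, first + 5, 0)` sums five sevens without any backing array.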

CMake dependency on rocprim missing

I have a program that builds against rocthrust via CMake.

My build fails because rocthrust links against roc::rocprim_hip (in rocthrust-targets.cmake).
So apparently rocthrust depends on rocprim, but it doesn't require the rocprim package. I can fix this by adding find_package(rocprim) to rocthrust-config.cmake.

And as a comment, I was a bit confused that the target is rocthrust and not roc::rocthrust, which is not really consistent e.g. with roc::rocprim.

How to include a custom header library version?

I tried to redirect hipcc to my custom version of rocThrust but it mixes (see >> comments) the custom and the default installations when I try to compile a program.

hipcc --offload-arch=gfx1031 -I/opt/rocm-5.4.0/myspecialrocm/include/thrust -I/opt/rocm-5.4.0/myspecialrocm/include/rocprim discrete_voronoi.cu -o testvoronoi

In file included from :1:
In file included from /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/__clang_hip_runtime_wrapper.h:115:
In file included from /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/__clang_hip_math.h:22:
In file included from /opt/rocm-5.4.0/myspecialrocm/include/thrust/limits.h:10: >> using custom version, ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config.h:23: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config/config.h:26: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config/compiler.h:29: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/hip/hip_runtime.h:62:
In file included from /opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:105:
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:5243:25: error: use of undeclared identifier 'UINT_MAX'
size_t size __dparm(UINT_MAX));
^
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:6790:75: error: use of undeclared identifier 'UINT_MAX'
const void* devPtr, size_t size = UINT_MAX) {
^
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:6797:75: error: use of undeclared identifier 'UINT_MAX'
const struct hipChannelFormatDesc& desc, size_t size = UINT_MAX) {
^

The example ships with rocThrust and compiled correctly when I installed rocThrust with the examples enabled. Now I want to use the custom version to port software to my notebook, but I am not able to compile a program. I assume the errors are caused by the parts pulled from the default rocThrust installation in the ROCm stack, which doesn't support my GPU.

What's the correct way to enforce the use of the custom version?

BEGIN_NS_THRUST does not name a type

It seems that "#include <thrust/system/cuda/config.h>" needs to be added to "thrust/system/hip/detail/error.inl"; otherwise compilation fails with the following error.

In file included from /opt/rocm/include/thrust/system/hip/error.h:183:0,
/opt/rocm/include/thrust/system/hip/detail/error.inl:23:1: error: ‘BEGIN_NS_THRUST’ does not name a type
 BEGIN_NS_THRUST
 ^~~~~~~~~~~~~~~
/opt/rocm/include/thrust/system/hip/detail/error.inl:81:1: error: ‘END_NS_THRUST’ does not name a type
 END_NS_THRUST
 ^~~~~~~~~~~~~

Please let me know if any of my configuration is not correct, thanks :)

Missing algorithms

The following algorithms appear to be missing, even though they are all part of the so-called ParallelSTL set of standard algorithms:

  1. adjacent_find
  2. find_end
  3. find_first_of
  4. inplace_merge
  5. is_heap
  6. is_heap_until
  7. lexicographical_compare
  8. nth_element
  9. partial_sort
  10. partial_sort_copy
  11. rotate
  12. rotate_copy
  13. search
  14. search_n
  15. shift_left
  16. shift_right
  17. uninitialized_default_construct
  18. uninitialized_default_construct_n
  19. uninitialized_move
  20. uninitialized_move_n
  21. uninitialized_value_construct
  22. uninitialized_value_construct_n

Whilst these are not currently exposed in thrust itself, it might be opportune to seize the initiative, as many of the above are both useful and used. Thank you.
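
Until such algorithms are provided, some of them can be composed from primitives that already exist. As an illustration, the host-only sketch below expresses `adjacent_find` through `std::mismatch` by comparing the range against itself shifted by one element. It assumes a device version could use `thrust::mismatch` in the same way, which is not verified here; `adjacent_find_via_mismatch` is a hypothetical name.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <vector>

// Returns an iterator to the first element that equals its successor,
// or v.end() if no such element exists, using only std::mismatch.
inline std::vector<int>::const_iterator
adjacent_find_via_mismatch(const std::vector<int>& v) {
    if (v.size() < 2) {
        return v.end();
    }
    // Compare [begin, end - 1) against [begin + 1, end). mismatch stops at
    // the first position where the predicate is false, that is, where two
    // neighbouring elements are equal.
    auto hit = std::mismatch(v.begin(), std::prev(v.end()),
                             std::next(v.begin()),
                             std::not_equal_to<int>{});
    return hit.first == std::prev(v.end()) ? v.end() : hit.first;
}
```

The same shifted-range trick does not cover every entry in the list (for example `nth_element` or `inplace_merge`), so it is a workaround for a few cases rather than a general substitute.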

CMake issue when mixing Fortran/c++

I'm trying to build a library using roc::rocthrust. I have the following lines in my CMakeLists.txt

...
find_package(rocprim REQUIRED)
find_package(rocthrust REQUIRED)
...
target_link_libraries(strumpack PUBLIC roc::hipblas roc::rocblas roc::rocsolver roc::rocthrust)

The problem is that the roc::rocthrust target also adds flags to the fortran compilation commands:

gfortran: error: unrecognized command line option ‘--hip-device-lib-path=/opt/rocm-3.8.0/lib’
gfortran: error: unrecognized command line option ‘--cuda-gpu-arch=gfx900’
gfortran: error: unrecognized command line option ‘--cuda-gpu-arch=gfx906’

Hit with "clang-11: error: Unsupported CUDA gpu architecture: gfx900:xnack-" while making examples

I want to build the examples and am following the instructions in the README; the error message is below.
This seems closest to issue #116, where using "-x hip" resolved it, but I need help on where to make the change.

root@b5f40797d3af:~/rocThrust/build# make
--
-- ******** Summary ********
-- General:
--   System                : Linux
--   HIP ROOT              :
--   C++ compiler          : /opt/rocm/hip/bin/hipcc
--   C++ compiler version  : 11.0.0
--   CXX flags             : -Wno-unused-command-line-argument
--   Build type            : Release
--   Install prefix        : /opt/rocm
--   Device targets        : gfx900:xnack-;gfx906:xnack-;gfx908:xnack-
--
--   DISABLE_WERROR        : ON
--   DOWNLOAD_ROCPRIM      : OFF
--   BUILD_TEST            : OFF
--   BUILD_EXAMPLES        : ON
--   BUILD_BENCHMARKS      : OFF
-- Building examples
-- Configuring done
-- Generating done
-- Build files have been written to: /root/rocThrust/build
[  1%] Building CXX object examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o
clang-11: error: Unsupported CUDA gpu architecture: gfx900:xnack-
clang-11: error: Unsupported CUDA gpu architecture: gfx906:xnack-
clang-11: error: Unsupported CUDA gpu architecture: gfx908:xnack-
examples/CMakeFiles/example_thrust_monte_carlo.dir/build.make:62: recipe for target 'examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o' failed
make[2]: *** [examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o] Error 1
CMakeFiles/Makefile2:197: recipe for target 'examples/CMakeFiles/example_thrust_monte_carlo.dir/all' failed
make[1]: *** [examples/CMakeFiles/example_thrust_monte_carlo.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

keyvaluepair does not provide access to < operator

KeyValuePair is used in min functions together with the less-than (<) operator. This currently does not compile:

/pytorch/caffe2/operators/hip/rmac_regions_op.hip:44:12: error: invalid operands to binary expression ('KeyValuePair' (aka 'key_value_pair<int, float>') and 'KeyValuePair')
if (kv < min_kv) {
~~ ^ ~~~~~~
/pytorch/c10/util/typeid.h:90:13: note: candidate function not viable: no known conversion from 'KeyValuePair' (aka 'key_value_pair<int, float>') to 'caffe2::TypeIdentifier' for 1st argument
inline bool operator<(TypeIdentifier lhs, TypeIdentifier rhs) {
^
/opt/rocm/rocprim/include/rocprim/device/../types/tuple.hpp:894:6: note: candidate template ignored: could not match 'tuple' against 'key_value_pair'
bool operator<(const tuple<TTypes...>& lhs, const tuple<UTypes...>& rhs)
^
In file included from /pytorch/caffe2/operators/hip/rmac_regions_op.hip:2:
In file included from /opt/rocm/hipcub/include/hipcub/hipcub.hpp:37:
In file included from /opt/rocm/hipcub/include/hipcub/rocprim/hipcub.hpp:38:
/opt/rocm/hipcub/include/hipcub/rocprim/device/../thread/thread_operators.hpp:101:18: error: invalid operands to binary expression ('const rocprim::key_value_pair<int, float>' and 'const rocprim::key_value_pair<int, float>')
return a < b ? a : b;
~ ^ ~
/opt/rocm/rocprim/include/rocprim/warp/detail/warp_reduce_shuffle.hpp:61:22: note: in instantiation of function template specialization 'hipcub::Min::operator()<rocprim::key_value_pair<int, float> >' requested here
output = reduce_op(output, value);
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:218:22: note: in instantiation of function template specialization 'rocprim::detail::warp_reduce_shuffle<rocprim::key_value_pair<int, float>, 64, false>::reduce<hipcub::Min>' requested here
WarpReduce().reduce(
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:174:9: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_warp_reduce<rocprim::key_value_pair<int, float>, 128>::warp_reduce<false, rocprim::detail::warp_reduce_shuffle<rocprim::key_value_pair<int, float>, 64, false>, hipcub::Min>' requested here
warp_reduce<!block_size_is_warp_multiple_, warp_reduce_input_type>(
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:81:15: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_warp_reduce<rocprim::key_value_pair<int, float>, 128>::reduce_impl<hipcub::Min>' requested here
this->reduce_impl(
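
The log shows that `a < b ? a : b` fails because no `operator<` accepts two `key_value_pair` operands. The host-only sketch below shows the kind of free operator that would make such an expression compile. It uses a stand-in struct rather than the real rocPRIM type, and the ordering (value first, then key to break ties) is an assumption chosen for an argmin-style use, not something the report specifies. All names ending in `_sketch` are hypothetical.

```cpp
// Hypothetical stand-in for rocprim::key_value_pair<Key, Value>; the real
// type lives in rocPRIM and is not included here.
template <typename Key, typename Value>
struct key_value_pair_sketch {
    Key   key;
    Value value;
};

// Order by value first, then by key to break ties (an assumed ordering).
template <typename Key, typename Value>
bool operator<(const key_value_pair_sketch<Key, Value>& lhs,
               const key_value_pair_sketch<Key, Value>& rhs) {
    return lhs.value < rhs.value
        || (!(rhs.value < lhs.value) && lhs.key < rhs.key);
}

// Mirrors the `a < b ? a : b` expression from the failing Min functor.
template <typename T>
T min_sketch(const T& a, const T& b) {
    return a < b ? a : b;
}
```

An alternative that avoids touching the pair type is to pass a custom comparison functor to the reduction instead of relying on `operator<`.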

correct const modifiers for merge sort

Some of the const modifiers in Thrust seem inconsistent with rocPRIM/rocThrust. This produces the following compile-time error:
In file included from /pytorch/aten/src/THH/THHTensorMode.hip:7:
In file included from /opt/rocm/include/thrust/device_ptr.h:26:
In file included from /opt/rocm/include/thrust/memory.h:25:
In file included from /opt/rocm/include/thrust/detail/reference.h:173:
In file included from /opt/rocm/include/thrust/detail/reference.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/adl/get_value.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/get_value.h:23:
In file included from /opt/rocm/include/thrust/system/detail/adl/assign_value.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/assign_value.h:25:
In file included from /opt/rocm/include/thrust/system/hip/detail/copy.h:99:
In file included from /opt/rocm/include/thrust/system/hip/detail/internal/copy_device_to_device.h:35:
In file included from /opt/rocm/include/thrust/system/hip/detail/transform.h:44:
In file included from /opt/rocm/rocprim/include/rocprim/rocprim.hpp:55:
In file included from /opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:30:
/opt/rocm/rocprim/include/rocprim/device/detail/device_merge_sort.hpp:365:25: error: no matching function for call to object of type 'const ThrustHalfLess'
const bool ab = compare_function(rocprim::get<0>(a), rocprim::get<0>(b));
^~~~~~~~~~~~~~~~
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_sort_kernel_impl<256, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
block_sort_kernel_impl(
^
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:179:25: note: in instantiation of function template specialization 'rocprim::detail::block_sort_kernel<256, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
HIP_KERNEL_NAME(block_sort_kernel<block_size>),
^
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:429:20: note: in instantiation of function template specialization 'rocprim::detail::merge_sort_impl<rocprim::default_config, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return detail::merge_sort_impl(
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:89:23: note: in instantiation of function template specialization 'rocprim::merge_sort<rocprim::default_config, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return rocprim::merge_sort(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:123:36: note: in instantiation of function template specialization 'thrust::hip_rocprim::__merge_sort::dispatch<thrust::detail::integral_constant<bool, true> >::doit<thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, long, ThrustHalfLess>' requested here
status = dispatch<SORT_ITEMS>::doit(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:351:19: note: in instantiation of function template specialization 'thrust::hip_rocprim::__merge_sort::merge_sort<thrust::detail::integral_constant<bool, true>, thrust::hip_rocprim::execution_policy<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base> >, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
__merge_sort::merge_sort<SORT_ITEMS>(policy,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:428:19: note: in instantiation of function template specialization 'thrust::hip_rocprim::__smart_sort::smart_sort<thrust::detail::integral_constant<bool, true>, thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
__smart_sort::smart_sort<detail::true_type, Derived, KeysIt, ValuesIt, CompareOp>
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:448:16: note: in instantiation of function template specialization 'thrust::hip_rocprim::stable_sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
hip_rocprim::stable_sort_by_key(policy, keys_first, keys_last, values, compare_op);
^
/opt/rocm/include/thrust/detail/sort.inl:115:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return sort_by_key(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), keys_first, keys_last, values_first, comp);
^
/pytorch/aten/src/THH/generic/THHTensorMode.hip:43:11: note: in instantiation of function template specialization 'thrust::sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptrc10::Half >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
thrust::sort_by_key(
^
/pytorch/aten/src/THH/THHTensorMode.cuh:11:35: note: candidate function not viable: 'this' argument has type 'const ThrustHalfLess', but method is not marked const
__host__ __device__ inline bool operator()(const at::Half& lhs, const at::Half& rhs) {
^
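
The last note in the log says the comparator's operator() is not marked const, while the library calls it through a const object. A minimal host-only sketch of that situation and the usual fix follows; LessSketch, invoke_const and sorted_copy are hypothetical names made up for illustration, not PyTorch or rocPRIM code:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical comparator standing in for ThrustHalfLess.
// The trailing `const` is the point: the library invokes the comparator
// through a const object, so a non-const operator() is not viable.
struct LessSketch {
    bool operator()(float lhs, float rhs) const { return lhs < rhs; }
};

// Mimics library code that only holds a const reference to the comparator.
template <typename Compare>
bool invoke_const(const Compare& compare_function, float a, float b) {
    return compare_function(a, b);  // compiles only if operator() is const
}

// Small usage helper: sort a copy with the comparator and return it.
inline std::vector<float> sorted_copy(std::vector<float> values) {
    std::sort(values.begin(), values.end(), LessSketch{});
    assert(std::is_sorted(values.begin(), values.end(), LessSketch{}));
    return values;
}
```

Removing the const from LessSketch::operator() should reproduce the same kind of "method is not marked const" diagnostic in invoke_const.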

correct const modifiers for thrust::tabulate

The const qualifiers for thrust::tabulate are inconsistent with what the Thrust API expects.

Compile-time error from PyTorch:
In file included from /pytorch/aten/src/ATen/native/hip/RangeFactories.hip:9:
In file included from /opt/rocm/include/thrust/sequence.h:295:
In file included from /opt/rocm/include/thrust/detail/sequence.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/generic/sequence.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/sequence.inl:20:
In file included from /opt/rocm/include/thrust/tabulate.h:128:
In file included from /opt/rocm/include/thrust/detail/tabulate.inl:22:
In file included from /opt/rocm/include/thrust/system/detail/adl/tabulate.h:44:
/opt/rocm/include/thrust/system/hip/detail/tabulate.h:55:62: error: no matching function for call to object of type 'const at::native::LinspaceOp<double, double>'
[tabulate_op](size_type i) { return tabulate_op(i); });
^~~~~~~~~~~
/opt/rocm/include/thrust/detail/tabulate.inl:37:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::tabulate<thrust::hip_rocprim::tag, thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
return tabulate(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, unary_op);
^
/opt/rocm/include/thrust/detail/tabulate.inl:52:18: note: in instantiation of function template specialization 'thrust::tabulate<thrust::hip_rocprim::tag, thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
return thrust::tabulate(select_system(system), first, last, unary_op);
^
/pytorch/aten/src/ATen/native/hip/RangeFactories.hip:59:15: note: in instantiation of function template specialization 'thrust::tabulate<thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
thrust::tabulate(data_, data_ + steps, linspace_method);
^
/pytorch/aten/src/ATen/native/hip/RangeFactories.hip:18:32: note: candidate function not viable: 'this' argument has type 'const at::native::LinspaceOp<double, double>', but method is not marked const
__device__ __forceinline__ T operator()(ptrdiff_t index) {
^
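
The failing line captures tabulate_op by value in a lambda, and by-value captures are const inside the lambda body unless the lambda is declared mutable. A small host-only sketch of why a non-const operator() then stops compiling; LinspaceSketch and apply_via_lambda are hypothetical stand-ins, not the PyTorch or rocThrust code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical functor standing in for LinspaceOp; names are illustrative.
struct LinspaceSketch {
    double start;
    double step;
    // const-qualified so it can be called on a const copy, such as a
    // by-value capture inside a non-mutable lambda.
    double operator()(std::ptrdiff_t index) const {
        return start + step * static_cast<double>(index);
    }
};

// Wraps the functor the way the failing line does: captured by value.
// By-value captures are const inside the lambda body unless the lambda
// is declared `mutable`.
template <typename Op>
double apply_via_lambda(Op tabulate_op, std::ptrdiff_t i) {
    auto wrapper = [tabulate_op](std::ptrdiff_t idx) { return tabulate_op(idx); };
    return wrapper(i);
}
```

Either side can resolve the mismatch: the functor can mark operator() const, or the wrapping lambda can be declared mutable.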

Fixing compile error: no member named 'init_offset_scan_state_kernel' ... ?

When I try to compile rocThrust, I get an error: no member named 'init_offset_scan_state_kernel' ...

...
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/detail/scan.inl:29:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/detail/adl/scan_by_key.h:44:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/scan_by_key.h:36:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/execution_policy.h:81:
/home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/set_operations.h:956:61: error: no member named 'init_offset_scan_state_kernel' in namespace 'rocprim::detail'; did you mean 'init_lookback_scan_state_kernel'?
hipLaunchKernelGGL(HIP_KERNEL_NAME(rocprim::detail::init_offset_scan_state_kernel),
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
init_lookback_scan_state_kernel
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:199:30: note: expanded from macro 'HIP_KERNEL_NAME'
#define HIP_KERNEL_NAME(...) __VA_ARGS__
^~~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:251:74: note: expanded from macro 'hipLaunchKernelGGL'
#define hipLaunchKernelGGL(kernelName, ...) hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__)
^~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:248:9: note: expanded from macro 'hipLaunchKernelGGLInternal'
kernelName<<<(numBlocks), (numThreads), (memPerBlock), (streamId)>>>(__VA_ARGS__); \
^~~~~~~~~~
/opt/rocm-5.4.0/include/rocprim/device/detail/device_scan_common.hpp:76:60: note: 'init_lookback_scan_state_kernel' declared here
__launch_bounds__(ROCPRIM_DEFAULT_MAX_BLOCK_SIZE) void init_lookback_scan_state_kernel(
^
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/test/test_zip_iterator.cpp:18:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/copy.h:512:
...

This error comes up several times before the compile process is stopped.

I tested rocPRIM after the installation and all tests were successful.

How should I approach this error? Is it related to rocPRIM or to rocThrust?

Apps/sample code using rocThrust

Hi all,

I am looking at the rocThrust library. Are there any open-source workloads, benchmarks, applications, or sample codes that use rocThrust directly?

Thanks

More -Wunused-result warnings

I am currently at commit 10c7281. As in the previously reported issue #163, I am encountering the following warnings:

/rocThrust/thrust/system/hip/detail/util.h:116:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
        hipStreamSynchronize(stream(policy));
/rocThrust/thrust/system/hip/memory_resource.h:59:17: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
                hipGetLastError(); // Clear the HIP global error state.
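
For context, a [[nodiscard]] result (C++17) can either be checked or be discarded explicitly with a cast to void. The sketch below uses made-up stand-ins (Status, sync_sketch, clear_error_sketch, synchronize_checked) rather than the HIP API, and is not necessarily how rocThrust addresses these warnings:

```cpp
#include <cassert>

// Hypothetical status type and function; they stand in for a runtime call
// whose return value is marked [[nodiscard]]. They are not the HIP API.
enum class Status { Ok, Failed };

[[nodiscard]] inline Status sync_sketch(bool succeed) {
    return succeed ? Status::Ok : Status::Failed;
}

// Option 1: the result is deliberately unused, so say so explicitly.
inline void clear_error_sketch() {
    (void)sync_sketch(true);  // explicit cast to void discards the result
}

// Option 2: inspect the result and report it to the caller.
inline bool synchronize_checked(bool succeed) {
    const Status status = sync_sketch(succeed);
    return status == Status::Ok;
}
```

Option 1 fits a call made only for its side effect, such as clearing an error state; option 2 fits a synchronization whose failure the caller should see.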

scatter cannot be used w/ long*

Compile time error from PyTorch:

In file included from /pytorch/aten/src/ATen/native/hip/Unique.hip:5:
In file included from /opt/rocm/include/thrust/execution_policy.h:31:
In file included from /opt/rocm/include/thrust/system/cpp/execution_policy.h:64:
In file included from /opt/rocm/include/thrust/system/cpp/detail/sort.h:22:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.h:63:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.inl:21:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.inl:20:
In file included from /opt/rocm/include/thrust/copy.h:513:
In file included from /opt/rocm/include/thrust/detail/copy_if.h:74:
In file included from /opt/rocm/include/thrust/detail/copy_if.inl:20:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.inl:31:
In file included from /opt/rocm/include/thrust/scan.h:1563:
In file included from /opt/rocm/include/thrust/detail/scan.inl:29:
In file included from /opt/rocm/include/thrust/system/detail/adl/scan_by_key.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/scan_by_key.h:33:
In file included from /opt/rocm/include/thrust/system/hip/execution_policy.h:41:
In file included from /opt/rocm/include/thrust/system/hip/detail/count.h:35:
In file included from /opt/rocm/include/thrust/system/hip/detail/reduce.h:42:
In file included from /opt/rocm/include/thrust/device_vector.h:26:
In file included from /opt/rocm/include/thrust/detail/vector_base.h:547:
In file included from /opt/rocm/include/thrust/detail/vector_base.inl:25:
In file included from /opt/rocm/include/thrust/equal.h:237:
In file included from /opt/rocm/include/thrust/detail/equal.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/generic/equal.h:47:
In file included from /opt/rocm/include/thrust/system/detail/generic/equal.inl:21:
In file included from /opt/rocm/include/thrust/mismatch.h:259:
In file included from /opt/rocm/include/thrust/detail/mismatch.inl:27:
In file included from /opt/rocm/include/thrust/system/detail/generic/mismatch.h:57:
In file included from /opt/rocm/include/thrust/system/detail/generic/mismatch.inl:21:
In file included from /opt/rocm/include/thrust/find.h:384:
In file included from /opt/rocm/include/thrust/detail/find.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/generic/find.h:62:
In file included from /opt/rocm/include/thrust/system/detail/generic/find.inl:19:
In file included from /opt/rocm/include/thrust/reduce.h:784:
In file included from /opt/rocm/include/thrust/detail/reduce.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/generic/reduce_by_key.h:88:
In file included from /opt/rocm/include/thrust/system/detail/generic/reduce_by_key.inl:30:
In file included from /opt/rocm/include/thrust/scatter.h:422:
In file included from /opt/rocm/include/thrust/detail/scatter.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/adl/scatter.h:44:
/opt/rocm/include/thrust/system/hip/detail/scatter.h:50:20: error: type 'long *' cannot be used prior to '::' because it has no members
typedef typename ItemsIt::value_type ItemsTy;
^
/opt/rocm/include/thrust/detail/scatter.inl:45:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::scatter<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, const long *, long *>' requested here
return scatter(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, map, output);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:51:13: note: in instantiation of function template specialization 'thrust::scatter<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, const long *, long *>' requested here
thrust::scatter(policy, inv_loc_ptr, inv_loc_ptr + num_inp, sorted_indices_ptr, inverse_indices_ptr);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:105:48: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::compute_unique<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, unsigned char, thrust::equal_to, thrust::not_equal_to >' requested here
std::tie(inverse_indices, counts, num_out) = compute_unique(
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:207:46: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::unique_cuda_template' requested here
std::tie(output, inverse, std::ignore) = unique_cuda_template<scalar_t>(self, false, return_inverse, false);
^
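
The error comes from `typename ItemsIt::value_type`, which only works when the iterator is a class type with a nested value_type; a raw pointer such as long* has no members. The standard way to query the element type of both pointers and class iterators is std::iterator_traits. A minimal sketch, where value_type_of and sum_range are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// Works for raw pointers and class-type iterators alike, because
// std::iterator_traits has a partial specialization for T*.
template <typename ItemsIt>
using value_type_of = typename std::iterator_traits<ItemsIt>::value_type;

static_assert(std::is_same<value_type_of<long*>, long>::value,
              "raw pointer");
static_assert(std::is_same<value_type_of<const long*>, long>::value,
              "pointer to const: value_type drops the const");
static_assert(std::is_same<value_type_of<std::vector<int>::iterator>, int>::value,
              "class-type iterator");

// Hypothetical helper: sums a range using the deduced value type.
template <typename It>
value_type_of<It> sum_range(It first, It last) {
    value_type_of<It> total{};
    for (; first != last; ++first) total += *first;
    return total;
}
```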

Need to set up CMAKE_CXX_FLAGS to cmake rocThrust?

Hi all, I am starting to compile rocThrust.

I ran cmake with:

cmake ../. -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc

cmake then reports an error like this:

-- The CXX compiler identification is Clang 12.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success


----------------------------------- ERROR -----------------------------------

  • The variable 'CMAKE_CXX_FLAGS' should only be set by the cmake toolchain,
  • either by calling 'cmake -DCMAKE_CXX_FLAGS=" -Wno-unused-command-line-argument"' or
  • set in a toolchain file and added with
  • 'cmake -DCMAKE_TOOLCHAIN_FILE='.
    -----------------------------------------------------------------------------

...

It seems that CMAKE_CXX_FLAGS has to be set for cmake to generate a working makefile; currently "make -j" does not compile anything.

However, I did not find any instructions for setting CMAKE_CXX_FLAGS. Are there other details or documents I missed for compiling rocThrust?

Thanks!

[Issue]: Failing tests with libstdc++ assertions due to unsigned overflow

Problem Description

The issue is similar to ROCm/rocPRIM#570.

With hardened libstdc++ (compiled with -D_GLIBCXX_ASSERTIONS), a few rocThrust 6.1.1 tests fail with:

[ RUN      ] AsyncReduceTests/3.TestAsyncReduce
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned short>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned short]: Assertion '_M_a <= _M_b' failed.

[ RUN      ] AsyncTransformTests/3.TestAsyncTransformUnary
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned short>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned short]: Assertion '_M_a <= _M_b' failed.

[ RUN      ] PrimitiveGatherTests/0.Gather
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned int>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned int]: Assertion '_M_a <= _M_b' failed.

[ RUN      ] PrimitiveInnerProductTests/0.InnerProductWithRandomData
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<short>::param_type::param_type(_IntType, _IntType) [_IntType = short]: Assertion '_M_a <= _M_b' failed.

[ RUN      ] ScanByKeyTests.TestScanByKeyMixedTypes
/usr/lib/gcc/x86_64-pc-linux-gnu/13/include/g++-v13/bits/uniform_int_dist.h:108: std::uniform_int_distribution<unsigned int>::param_type::param_type(_IntType, _IntType) [_IntType = unsigned int]: Assertion '_M_a <= _M_b' failed.

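In lines like get_random_data<T>(2, -100, 100, seed), the tests attempt to generate values of T = unsigned int in a range where min (4294967196, which is -100 converted to unsigned int) > max (100). This violates the precondition of std::uniform_int_distribution and is undefined behavior. Can you fix it (and maybe in rocPRIM too)? Thanks!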
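
One possible way to avoid the wraparound is to clamp the requested bounds to the range T can represent before constructing the distribution. The sketch below is only an illustration under that assumption; random_ints_sketch is a hypothetical helper, not the test suite's get_random_data, and it only covers integer types narrower than long long:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <type_traits>
#include <vector>

// The conversion behind the failing assertion: -100 wraps to 2^32 - 100.
static_assert(static_cast<std::uint32_t>(-100) == 4294967196u,
              "-100 converted to a 32-bit unsigned value");

// Hypothetical helper: clamps the requested bounds to what T can represent,
// so the precondition a <= b of std::uniform_int_distribution always holds.
template <typename T>
std::vector<T> random_ints_sketch(std::size_t size, long long min_value,
                                  long long max_value, unsigned int seed) {
    static_assert(std::is_integral<T>::value && sizeof(T) < sizeof(long long),
                  "sketch only covers integer types narrower than long long");
    assert(min_value <= max_value);
    const long long t_min = std::numeric_limits<T>::min();
    const long long t_max = std::numeric_limits<T>::max();
    const T lo = static_cast<T>(std::min(std::max(min_value, t_min), t_max));
    const T hi = static_cast<T>(std::min(std::max(max_value, t_min), t_max));
    std::mt19937 engine(seed);
    std::uniform_int_distribution<T> dist(lo, hi);  // lo <= hi by construction
    std::vector<T> out(size);
    for (auto& x : out) x = dist(engine);
    return out;
}
```

With T = unsigned int and bounds (-100, 100), this draws from [0, 100] instead of passing a wrapped lower bound to the distribution.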

Operating System

Gentoo

CPU

GPU

AMD Instinct MI300X

ROCm Version

ROCm 6.1.0

Compilation issue from rocthrust library on ROCm-3.5.

Hi all,
I successfully built and tested my application with HCC (ROCm-3.3). Now I am trying to build it with hip-clang (ROCm-3.5), since HCC is deprecated, but I am getting compilation errors caused by the rocThrust library. The Thrust device system is being selected as CUDA, but it should be HIP.
I am attaching two files for further details.

How can I resolve this issue? I have been unable to do so myself. If more details are required, please let me know.

Best regards,
Saurabh.

How to compile a program using rocThrust on Ubuntu 20.04?

On an Ubuntu 20.04 system with ROCm 5.2, compiling code that uses rocThrust fails (Score.cpp contains "#include <thrust/host_vector.h>", which causes the compile to fail):

clang++ -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.2.0/include -I/opt/rocm-5.2.0/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.2.0/hsa/include -c -std=c++20 -g -O0 -pg Score.cpp
In file included from Score.cpp:11:
In file included from /opt/rocm-5.2.0/include/thrust/host_vector.h:27:
In file included from /opt/rocm-5.2.0/include/thrust/detail/vector_base.h:25:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/normal_iterator.h:27:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/iterator_adaptor.h:36:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/iterator_facade.h:37:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/iterator_facade_category.h:22:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/device_system_tag.h:23:
In file included from /opt/rocm-5.2.0/include/thrust/system/cuda/detail/execution_policy.h:35:
/opt/rocm-5.2.0/include/thrust/system/cuda/config.h:33:10: fatal error: 'cub/util_namespace.cuh' file not found
#include <cub/util_namespace.cuh>
^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

How to use rocThrust and OpenMP at the same time

I followed the example https://github.com/ROCm-Developer-Tools/HIP-Examples/tree/master/openmp-helloworld and built it successfully with hipcc. However, when I added rocThrust containers to the source file, compilation errors occurred.

I have also tried to build this example with only rocthrust and it works fine.

My environment:

  • ROCm 3.9
  • CMake 3.16.5

Error messages:

In file included from openmp_helloworld.cpp:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.h:29:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.h:240:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:23:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.h:35:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:22:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/uninitialized_fill.h:274:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:26:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/adl/uninitialized_fill.h:44:
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:62:13: error: no matching function for call to 'operator new'
            ::new(static_cast<void*>(&out)) value_type(value);
            ^~
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:66:17: note: in instantiation of function template specialization 'thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>::operator()<unsigned long>' requested here
                f(tile_base + idx);
                ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:95:44: note: in instantiation of function template specialization 'thrust::hip_rocprim::__parallel_for::kernel<256, 1, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
        hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<block_size, items_per_thread, F, Size>),
                                           ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:131:50: note: in instantiation of function template specialization 'thrust::hip_rocprim::__parallel_for::parallel_for<thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
            hipError_t  status = __parallel_for::parallel_for(count, f, stream);
                                                 ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:119:12: note: in instantiation of member function 'thrust::hip_rocprim::parallel_for(execution_policy<thrust::hip_rocprim::tag> &, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long)::workaround::par' requested here
    struct workaround
           ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:76:18: note: in instantiation of function template specialization 'thrust::hip_rocprim::parallel_for<thrust::hip_rocprim::tag, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
    hip_rocprim::parallel_for(policy, functor_t(first, x), count);
                 ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:54:10: note: (skipping 3 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
  return uninitialized_fill_n(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, n, x);
         ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:251:3: note: in instantiation of function template specialization 'thrust::detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
  default_construct_range(m_allocator, first.base(), n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:220:15: note: in instantiation of member function 'thrust::detail::contiguous_storage<int, thrust::device_allocator<int>>::default_construct_n' requested here
    m_storage.default_construct_n(begin(), size());
              ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:65:3: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::default_init' requested here
  default_init(n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:95:8: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::vector_base' requested here
      :Parent(n) {}
       ^
openmp_helloworld.cpp:41:32: note: in instantiation of member function 'thrust::device_vector<int, thrust::device_allocator<int>>::device_vector' requested here
    thrust::device_vector<int> v(10);
                               ^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:134:7: note: candidate function not viable: call to __host__ function from __device__ function
void* operator new(std::size_t, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
      ^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:168:14: note: candidate function not viable: call to __host__ function from __device__ function
inline void* operator new(std::size_t, void* __p) _GLIBCXX_USE_NOEXCEPT
             ^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:120:7: note: candidate function not viable: requires 1 argument, but 2 were provided
void* operator new(std::size_t) _GLIBCXX_THROW (std::bad_alloc)
      ^
note: candidate function not viable: requires 1 argument, but 2 were provided
In file included from openmp_helloworld.cpp:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.h:29:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.h:240:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:22:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/copy_construct_range.h:46:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/copy_construct_range.inl:21:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/copy.h:90:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/copy.inl:21:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/copy.h:58:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/copy.inl:23:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/transform.h:724:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/transform.inl:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/transform.h:105:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/transform.inl:19:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/for_each.h:279:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/for_each.inl:27:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/adl/for_each.h:44:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/for_each.h:35:
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:141:17: error: no matching function for call to object of type 'thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>'
                f(idx);
                ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:119:12: note: in instantiation of member function 'thrust::hip_rocprim::parallel_for(execution_policy<thrust::hip_rocprim::tag> &, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long)::workaround::seq' requested here
    struct workaround
           ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:76:18: note: in instantiation of function template specialization 'thrust::hip_rocprim::parallel_for<thrust::hip_rocprim::tag, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
    hip_rocprim::parallel_for(policy, functor_t(first, x), count);
                 ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:54:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::uninitialized_fill_n<thrust::hip_rocprim::tag, thrust::device_ptr<int>, unsigned long, int>' requested here
  return uninitialized_fill_n(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, n, x);
         ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:94:11: note: in instantiation of function template specialization 'thrust::uninitialized_fill_n<thrust::hip_rocprim::tag, thrust::device_ptr<int>, unsigned long, int>' requested here
  thrust::uninitialized_fill_n(allocator_system<Allocator>::get(a), p, n, typename pointer_element<Pointer>::type());
          ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:105:35: note: in instantiation of function template specialization 'thrust::detail::allocator_traits_detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
  return allocator_traits_detail::default_construct_range(a,p,n);
                                  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:251:3: note: in instantiation of function template specialization 'thrust::detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
  default_construct_range(m_allocator, first.base(), n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:220:15: note: in instantiation of member function 'thrust::detail::contiguous_storage<int, thrust::device_allocator<int>>::default_construct_n' requested here
    m_storage.default_construct_n(begin(), size());
              ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:65:3: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::default_init' requested here
  default_init(n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:95:8: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::vector_base' requested here
      :Parent(n) {}
       ^
openmp_helloworld.cpp:41:32: note: in instantiation of member function 'thrust::device_vector<int, thrust::device_allocator<int>>::device_vector' requested here
    thrust::device_vector<int> v(10);
                               ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:58:41: note: candidate template ignored: substitution failure [with Size = unsigned long]
        void THRUST_HIP_DEVICE_FUNCTION operator()(Size idx)
                                        ^
2 errors generated when compiling for host.

openmp_helloworld.cpp:

// HIP header
#include <hip/hip_runtime.h>
#include <thrust/device_vector.h>

//OpenMP header
#include <omp.h>

#include <stdio.h>
#include <stdlib.h>

__global__
void hip_helloworld(unsigned omp_id)
{
    printf("Hello World... from HIP thread = %u\n", omp_id);
}

int main(int argc, char* argv[])
{
    thrust::device_vector<int> v(10);

    // No errors
    //thrust::device_vector<int> v;

    // Beginning of parallel region
    #pragma omp parallel
    {   
        auto tid = omp_get_thread_num();
        printf("Hello World... from OMP thread = %d\n", tid);

        hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, tid);
    }
    // Ending of parallel region

    hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, /*id=*/ 0);
    hipStreamSynchronize(0);

    printf("PASSED!\n");
    return 0;
}

Commands:

hipcc -xhip -fopenmp=libomp openmp_helloworld.cpp

CMake gives the same error. This is my cmake file:

project(openmp_helloworld)

cmake_minimum_required(VERSION 3.16)

# Search for rocm in common locations
if(NOT DEFINED ROCM_PATH)
    if(DEFINED ENV{ROCM_PATH})
        set(ROCM_PATH $ENV{ROCM_PATH} CACHE PATH "ROCm path")
    elseif(DEFINED ENV{HIP_PATH})
        set(ROCM_PATH "$ENV{HIP_PATH}/.." CACHE PATH "ROCm path")
    else()
        set(ROCM_PATH "/opt/rocm" CACHE PATH "ROCm path")
    endif()
endif()

set(HIP_PATH "${ROCM_PATH}/hip" CACHE PATH "HIP path")

list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH} ${HIP_PATH})
find_package(hip REQUIRED)
find_package(rocprim REQUIRED CONFIG)
find_package(rocthrust REQUIRED CONFIG)

# Find OpenMP
find_package(OpenMP REQUIRED)

# Source files
set(CPP_SOURCES ${CMAKE_SOURCE_DIR}/openmp_helloworld.cpp)

# Preparing the executable
add_executable(test_openmp_helloworld ${CPP_SOURCES})
target_link_libraries(test_openmp_helloworld
    PRIVATE hip::device
            roc::rocthrust
            OpenMP::OpenMP_CXX
)

Version of rocThrust compatible with older versions of ROCm?

I am attempting to run some benchmarks that use thrust in them through the gem5 simulator, which currently only supports ROCm 1.6.x. While hipify'ing the applications, I need to port the thrust calls to use rocThrust (which I did). However, after looking through the release tags and the commits for rocThrust, it appears that rocThrust only supports ROCm 2.5+. Is this correct? Or is there an (older) commit that has support for older versions of ROCm?

Thanks,
Matt

Slow sort on gfx1036 with custom operator on ROCm 5.6.0

I have an AMD Ryzen 9 7900X (12 cores / 24 threads) and I'm trying to run our code on both the CPU and the iGPU (0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c2)) for comparison.

Some of our code depends on Thrust, for which we are using rocThrust for HIP (1.17.2p0 as reported by the one shipping with ROCm 5.6.0, official .deb from AMD).

We have noticed that a sort_by_key with custom operator is extremely slow on GPU, averaging 30ms for 1.4M data points compared to ~20ms on CPU with the OpenMP backend.

Sample code for testing can be found in my titanxstall repo, rocthrust-perf branch. The results can be tested as:

git clone [email protected]:Oblomov/titanxstall -b rocthrust-perf thrust-perf
cd thrust-perf
make clean ; make test
make clean ; make cpu=1 openmp=1 test

and comparing the results between the HIP and OpenMP runs. The sort is run 5 times, and on GPU the first run should be discarded as it includes host-to-device data transfers due to the use of managed allocations. (The original code does not use managed allocations, but similar performance differences have been observed, so this doesn't seem to be related.)

Feature request: Add support for NAVI22 and NAVI23 i.e. gfx1031 and gfx1032

Please add support for NAVI22 and NAVI23 i.e. gfx1031 and gfx1032 to enable HIP development leveraging rocThrust on Notebooks as ALL AMD mobile GPUs are NAVI22 or NAVI23 i.e. gfx1031 or gfx1032.

Relevant engineering and scientific software uses Thrust. Porting this software to AMD hardware requires rocThrust support on development hardware. Please align with the rocPRIM developers, as it's a dependency.

My target operating system is RHEL9 = AlmaLinux 9, Rocky Linux ...

Can't use utility.h in thrust/type_traits/integer_sequence.h

Hi everyone,

I am new to ROCm and just installed rocThrust. When I tried to compile my HIP C++ code using hipcc (ROCm 5.4.0), I got:

rocThrust/include/thrust/type_traits/integer_sequence.h:61:31: error: no template named 'integer_sequence' in namespace 'std'.

This 'integer_sequence' should come from the standard <utility> header. I used the -std=c++14 flag, but the error persisted. When I included <utility> in a small test program and compiled it with g++, things went well, but hipcc still fails to compile the same test program with the same error. Can anyone tell me how to fix this issue?
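A small test program of the kind described above might look like the following. This is only a sketch: the helper names `sequence_length` and `language_standard` are made up for illustration and are not part of rocThrust. If the `static_assert` compiles, `std::integer_sequence` is visible to the compiler; checking `__cplusplus` shows which language standard the compiler actually applied.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility> // std::integer_sequence, std::index_sequence (C++14)

// std::index_sequence<Is...> is an alias for
// std::integer_sequence<std::size_t, Is...>.
static_assert(std::is_same<std::make_index_sequence<3>,
                           std::integer_sequence<std::size_t, 0, 1, 2>>::value,
              "std::integer_sequence is not usable in this build");

// Hypothetical helper: number of indices carried by the sequence.
template <std::size_t... Is>
constexpr std::size_t sequence_length(std::index_sequence<Is...>)
{
    return sizeof...(Is);
}

// Value of __cplusplus seen by the compiler
// (201402L or larger means C++14 or newer).
constexpr long language_standard()
{
    return __cplusplus;
}
```

Printing `language_standard()` from both g++ and hipcc is one way to check whether the `-std=c++14` flag actually reached the compiler in each case.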

Thanks a lot!

rocThrust test needs hip/amd_detail/host_defines.h

Hi all,

I tried to compile the rocThrust test.

I tried to cmake with:

CXX=hipcc cmake -DBUILD_TEST=ON ../.

and then:

make

It shows:

/home/echo/workspace/rocThrust/thrust/../thrust/system/hip/detail/guarded_driver_types.h:50:10: fatal error: 'hip/amd_detail/host_defines.h' file not found
#include <hip/amd_detail/host_defines.h>

However, I cannot find hip/amd_detail/host_defines.h under the ROCm directory; instead, I found /opt/rocm/hip/include/hip/hcc_detail/host_defines.h.

Am I missing any configuration, installation, or redirection steps?

Separately, I also have a build problem in #179.
Could that error be causing this problem?

Thanks!

[Issue]: optional.h uses non-member function with `->`

Problem Description

A recent clang change (llvm/llvm-project#90152) revealed a bug in the rocThrust develop branch:

this->construct(std::forward<Args>(args)...);

rocThrust/thrust/../thrust/optional.h:2756:11: error: no member named 'construct' in 'optional<type-parameter-0-0 &>'

In this->construct, the arrow is the built-in operator applied to the raw pointer this, which cannot be overloaded, so it can only access members of the class itself. However, construct is not a member of optional<T &>, since optional<T &> does not define it. Although optional<T> inherits construct from its base classes, optional<T &> does not inherit it, because the two specializations are not related by inheritance.

Since optional<T &> behaves like a delegate of its member m_value, my guess is that the intention of this->construct is to perform m_value->construct, i.e., to use the overloaded arrow operator. For the overloaded arrow operator to take effect, (*this)-> can be used. Therefore, the fix is:

(*this)->construct(std::forward<Args>(args)...);
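The language rule behind this can be sketched in isolation. The types `Payload` and `Handle` below are hypothetical stand-ins invented for illustration; they are not rocThrust's actual optional implementation. The sketch shows only that `this->` never goes through a user-defined `operator->`, while `(*this)->` does.

```cpp
#include <cassert>

// Hypothetical pointee type that owns the 'construct' member.
struct Payload
{
    int value = 0;
    void construct(int v) { value = v; }
};

// Hypothetical wrapper that delegates member access to m_value.
struct Handle
{
    Payload* m_value;

    // Overloaded arrow operator: forwards member access to the pointee.
    Payload* operator->() { return m_value; }

    void emplace(int v)
    {
        // this->construct(v);  // ill-formed: 'this' is a plain Handle*, and
        //                      // Handle has no member named 'construct'.
        (*this)->construct(v);  // OK: *this is a Handle, so operator-> is used.
    }
};

// Small driver: stores v through the wrapper and reads it back.
int emplace_and_read(int v)
{
    Payload p;
    Handle  h{&p};
    h.emplace(v);
    return p.value;
}
```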

Operating System

Ubuntu 22.04

CPU

any

GPU

AMD Radeon Pro W7900

ROCm Version

ROCm 6.1.0

ROCm Component

No response

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

inclusive_scan_by_key does not compile if HIP_TEMPLATE_KERNEL_LAUNCH is defined

Hello,

this is illustrated by trying to compile this 4-liner

#define HIP_TEMPLATE_KERNEL_LAUNCH
#include <thrust/device_vector.h>
device_ptr<float4> a;
device_ptr<uint> b : c < inclusive_scan_by_key(b, b, a, a);

obtained by applying Creduce to a larger piece of code.

With HIP_TEMPLATE_KERNEL_LAUNCH, this fails with:

error: no matching function for call to 'hipLaunchKernelGGL'

Without the define, inclusive_scan_by_key compiles correctly. This is a regression since in the past (at least in 4.2.0) this used to work.

[Feature]: Use `AnyNewerVersion` instead of `SameMajorVersion` for CMake compatibility

Suggestion Description

We've recently tried to build our library (ArborX) against the newly released ROCm 6.0 and rocThrust 3. We have

  find_package(rocthrust 2.10.5 REQUIRED CONFIG)

in our CMake config file. This fails with the new rocThrust 3.0 because it uses the SameMajorVersion compatibility mode.

I just wanted to check if that is intentional, or whether it could be switched to AnyNewerVersion so that downstream codes don't have to work around it.
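To make the difference between the two modes concrete, here is a rough C++ model of the version check as CMake documents it for package version files. It is a sketch, not CMake's implementation: the names `Version`, `Compatibility`, `not_older` and `is_compatible` are invented for illustration, and the installed version is written as 3.0.0 only as an assumed example of a 3.x release.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical three-part version number.
struct Version
{
    int major_part;
    int minor_part;
    int patch_part;
};

enum class Compatibility { AnyNewerVersion, SameMajorVersion };

// True if 'installed' is the same as or newer than 'requested'.
bool not_older(const Version& installed, const Version& requested)
{
    return std::tie(installed.major_part, installed.minor_part, installed.patch_part)
        >= std::tie(requested.major_part, requested.minor_part, requested.patch_part);
}

// Both modes reject an installed version older than the requested one;
// SameMajorVersion additionally requires identical major versions.
bool is_compatible(Compatibility mode, const Version& installed, const Version& requested)
{
    if (!not_older(installed, requested))
        return false;
    if (mode == Compatibility::SameMajorVersion)
        return installed.major_part == requested.major_part;
    return true;
}
```

Under this model, a request for 2.10.5 is rejected by an installed 3.0.0 in SameMajorVersion mode but accepted in AnyNewerVersion mode.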

Operating System

No response

GPU

No response

ROCm Component

No response

reference to __host__ function 'free'/'malloc' in __host__ __device__ function

Compiling a HIP program produces the following errors. Any suggestions would be appreciated.

In file included from /opt/rocm-6.0.0/include/thrust/system/detail/adl/malloc_and_free.h:26:
/opt/rocm-6.0.0/include/thrust/system/detail/sequential/malloc_and_free.h:45:8: error: reference to __host__ function 'free' in __host__ __device__ function
  std::free(thrust::raw_pointer_cast(ptr));


In file included from /opt/rocm-6.0.0/include/thrust/system/detail/adl/malloc_and_free.h:26:
/opt/rocm-6.0.0/include/thrust/system/detail/sequential/malloc_and_free.h:37:15: error: reference to __host__ function 'malloc' in __host__ __device__ function
  return std::malloc(n);

hipGetLastError(); -Wunused-result

rocThrust triggers a warning on a call to hipGetLastError:

/rocthrust/thrust/system/hip/detail/malloc_and_free.h:71:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
        hipGetLastError(); // Clear global hip error state.
        ^~~~~~~~~~~~~~~

This is with:

% clang --version
clang version 12.0.0 (/llvm/llvm-project/clang 009181e63cd6a46681319335b18180d91d05f241)
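One common way to silence this kind of warning is to discard the result explicitly with a cast to void. The sketch below uses a hypothetical stand-in, `get_last_status`, rather than the real HIP call, so that it compiles without ROCm; it assumes C++17 for the `[[nodiscard]]` attribute and is not necessarily the fix adopted in rocThrust.

```cpp
#include <cassert>

// Hypothetical stand-in for an error-returning runtime call such as
// hipGetLastError(); it is not the real HIP API.
enum class Status { Success, Failure };

static Status g_last_status = Status::Failure;

// Returns the last recorded status and resets the stored state to Success.
[[nodiscard]] Status get_last_status()
{
    Status s      = g_last_status;
    g_last_status = Status::Success;
    return s;
}

// Clears the stored state when the returned value is of no interest.
void clear_status()
{
    // get_last_status();     // would warn: ignoring return value of function
    //                        // declared with 'nodiscard' attribute
    (void)get_last_status();  // explicit discard: no warning
}
```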
