
rocm / rocthrust

ROCm Thrust - run Thrust dependent software on AMD GPUs

Home Page: https://rocm.docs.amd.com/projects/rocThrust/en/latest/

License: Apache License 2.0

CMake 1.38% C++ 76.38% Cuda 17.78% C 3.03% Shell 0.17% Groovy 0.11% Python 1.14% Makefile 0.02%

rocthrust's Introduction

AMD ROCm Software

ROCm is an open-source stack, composed primarily of open-source software, designed for graphics processing unit (GPU) computation. ROCm consists of a collection of drivers, development tools, and APIs that enable GPU programming from low-level kernel to end-user applications.

With ROCm, you can customize your GPU software to meet your specific needs. You can develop, collaborate, test, and deploy your applications in a free, open source, integrated, and secure software ecosystem. ROCm is particularly well-suited to GPU-accelerated high-performance computing (HPC), artificial intelligence (AI), scientific computing, and computer aided design (CAD).

ROCm is powered by AMD’s Heterogeneous-computing Interface for Portability (HIP), an open-source C++ GPU programming environment and its corresponding runtime. HIP allows ROCm developers to create portable applications that can be deployed on a range of platforms, from dedicated gaming GPUs to exascale HPC clusters.

ROCm supports programming models, such as OpenMP and OpenCL, and includes all necessary open source software compilers, debuggers, and libraries. ROCm is fully integrated into machine learning (ML) frameworks, such as PyTorch and TensorFlow.

Getting the ROCm Source Code

AMD ROCm is built from open source software. It is, therefore, possible to modify the various components of ROCm by downloading the source code and rebuilding the components. The source code for ROCm components can be cloned from each of the GitHub repositories using git. For easy access to download the correct versions of each of these tools, the ROCm repository contains a repo manifest file called default.xml. You can use this manifest file to download the source code for ROCm software.

Installing the repo tool

The repo tool from Google allows you to manage multiple git repositories simultaneously. Run the following commands to install the repo tool:

mkdir -p ~/bin/
curl https://storage.googleapis.com/git-repo-downloads/repo > ~/bin/repo
chmod a+x ~/bin/repo

Note: The ~/bin/ folder is used as an example. You can specify a different folder to install the repo tool into if you desire.

Installing git-lfs

Some ROCm projects use the Git Large File Storage (LFS) format that may require you to install git-lfs. Refer to Git Large File Storage for more information. For example, to install git-lfs for Ubuntu, use the following command:

sudo apt-get install git-lfs

Downloading the ROCm source code

The following example shows how to use the repo tool to download the ROCm source code. If you choose a directory other than ~/bin/ to install the repo tool, you must use that chosen directory in the code as shown below:

mkdir -p ~/ROCm/
cd ~/ROCm/
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.0.x
~/bin/repo sync

Note: Using this sample code will cause the repo tool to download the open source code associated with the specified ROCm release. Ensure that you have ssh-keys configured on your machine for your GitHub ID prior to the download as explained at Connecting to GitHub with SSH.

Building the ROCm source code

Each ROCm component repository contains directions for building that component, such as the rocSPARSE documentation Installation and Building for Linux. Refer to the specific component documentation for instructions on building the repository.

Each release of the ROCm software supports specific hardware and software configurations. Refer to System requirements (Linux) for the current supported hardware and OS.

Build ROCm from source

The build uses as many processors as it can find to build in parallel. Some compile steps can consume as much as 10 GB of RAM, so make sure you have plenty of swap space.

By default, the ROCm build compiles for all supported GPU architectures and takes approximately 500 CPU hours. Build time drops significantly if you limit the GPU architectures to build for by setting the GPU_ARCHS environment variable, as shown below.

# --------------------------------------
# Step 1: Clone source code
# --------------------------------------

mkdir -p ~/WORKSPACE/      # Or any folder name other than WORKSPACE
cd ~/WORKSPACE/
export ROCM_VERSION=6.1.0   # or 6.1.1 6.1.2
~/bin/repo init -u http://github.com/ROCm/ROCm.git -b roc-6.1.x -m tools/rocm-build/rocm-${ROCM_VERSION}.xml
~/bin/repo sync

# --------------------------------------
# Step 2: Prepare build environment
# --------------------------------------

# Option 1: Start a docker container
# Pulling required base docker images:
# Ubuntu20.04 built from ROCm/tools/rocm-build/docker/ubuntu20/Dockerfile
docker pull rocm/rocm-build-ubuntu-20.04:6.1
# Ubuntu22.04 built from ROCm/tools/rocm-build/docker/ubuntu22/Dockerfile
docker pull rocm/rocm-build-ubuntu-22.04:6.1

# Start docker container and mount the source code folder:
docker run -ti \
    -e ROCM_VERSION=${ROCM_VERSION} \
    -e CCACHE_DIR=$HOME/.ccache \
    -e CCACHE_ENABLED=true \
    -e DOCK_WORK_FOLD=/src \
    -w /src \
    -v $PWD:/src \
    -v /etc/passwd:/etc/passwd \
    -v /etc/shadow:/etc/shadow \
    -v ${HOME}/.ccache:${HOME}/.ccache \
    -u $(id -u):$(id -g) \
    <replace_with_required_ubuntu_base_docker_image> bash

# Option 2: Install required packages into the host machine
# For ubuntu20.04 system
cd ROCm/tools/rocm-build/docker/ubuntu20
bash install-prerequisites.sh
# For ubuntu22.04 system
cd ROCm/tools/rocm-build/docker/ubuntu22
bash install-prerequisites.sh

# --------------------------------------
# Step 3: Run build command line
# --------------------------------------

# Select GPU targets before building:
# When GPU_ARCHS is not set, default GPU targets supported by ROCm6.1 will be used.
# To build against a subset of GFX architectures you can use the below env variable.
# Support MI300 (gfx940, gfx941, gfx942).
export GPU_ARCHS="gfx942"               # Example
export GPU_ARCHS="gfx940;gfx941;gfx942" # Example

# Pick and run build commands in the docker container:
# Build rocm-dev packages
make -f ROCm/tools/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} rocm-dev
# Build all ROCm packages
make -f ROCm/tools/rocm-build/ROCm.mk -j ${NPROC:-$(nproc)} all
# List all ROCm components to find required components
make -f ROCm/tools/rocm-build/ROCm.mk list_components
# Build a single ROCm package
make -f ROCm/tools/rocm-build/ROCm.mk T_rocblas

# Find built packages in ubuntu20.04:
out/ubuntu-20.04/20.04/deb/
# Find built packages in ubuntu22.04:
out/ubuntu-22.04/22.04/deb/

# Find built logs in ubuntu20.04:
out/ubuntu-20.04/20.04/logs/
# Find built logs in ubuntu22.04:
out/ubuntu-22.04/22.04/logs/
# All logs pertaining to failed components end with the .errors extension.
out/ubuntu-22.04/22.04/logs/rocblas.errors          # Example
# All logs pertaining to building components, end with .inprogress extension.
out/ubuntu-22.04/22.04/logs/rocblas.inprogress  # Example
# All logs pertaining to passed components, use the component names.
out/ubuntu-22.04/22.04/logs/rocblas             # Example
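
The log naming convention above can be checked mechanically. The following is a minimal C++ sketch, assuming only the three cases listed (a .errors suffix for failed components, a .inprogress suffix for components still building, and the bare component name for passed ones). The names BuildStatus, ends_with, and classify_log are hypothetical helpers and are not part of the ROCm build tooling.

```cpp
#include <string>

// Hypothetical status type for illustration; not part of ROCm.mk.
enum class BuildStatus { Failed, InProgress, Passed };

// Returns true when `text` ends with `suffix`.
inline bool ends_with(const std::string& text, const std::string& suffix) {
    return text.size() >= suffix.size() &&
           text.compare(text.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Maps a log file name (or path) to a status, following the convention
// described above: ".errors" = failed, ".inprogress" = still building,
// anything else is treated as a passed component.
inline BuildStatus classify_log(const std::string& log_name) {
    if (ends_with(log_name, ".errors")) {
        return BuildStatus::Failed;
    }
    if (ends_with(log_name, ".inprogress")) {
        return BuildStatus::InProgress;
    }
    return BuildStatus::Passed;
}
```

For example, classify_log("rocblas.errors") yields BuildStatus::Failed, while classify_log("rocblas") yields BuildStatus::Passed.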

Note: Overview for ROCm.mk

ROCm documentation

This repository contains the manifest file for ROCm releases, changelogs, and release information.

The default.xml file contains information for all repositories and the associated commit used to build the current ROCm release; default.xml uses the Manifest Format repository.

Source code for our documentation is located in the /docs folder of most ROCm repositories. The develop branch of our repositories contains content for the next ROCm release.

The ROCm documentation homepage is rocm.docs.amd.com.

Building the documentation

For a quick-start build, use the following code. For more options and detail, refer to Building documentation.

cd docs
pip3 install -r sphinx/requirements.txt
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

Alternatively, CMake build is supported.

cmake -B build
cmake --build build --target=doc

Older ROCm releases

For release information for older ROCm releases, refer to the CHANGELOG.

rocthrust's People

Contributors

ajcodes, alliepiper, amdkila, andrewcorrigan, arvindcheru, bjude, brycelelbach, cgmb, dependabot[bot], dkolsen-pgi, doctorcolinsmith, eidenyoshida, ex-rzr, germasch, griwes, lawruble13, mfep, mfrancis95, neon60, nolmoonen, raydouglass, rmalavally, robsonrlemos, rocmmathlibrariesbot, rongou, saadrahim, samjwu, snektron, stanleytsang-amd, vincentsc

rocthrust's Issues

Apps/sample code using rocThrust

Hi all,

I am looking at the rocThrust library. Are there any open-source workloads, benchmarks, applications, or sample codes that use rocThrust directly?

Thanks

[Feature]: Use `AnyNewerVersion` instead of `SameMajorVersion` for CMake compatibility

Suggestion Description

We recently tried to build our library (ArborX) against the newly released ROCm 6.0 and rocThrust 3. We have

  find_package(rocthrust 2.10.5 REQUIRED CONFIG)

in our CMake config file. This fails for the new 3.0 rocthrust because it has SameMajorVersion compatibility mode.

I just wanted to check if that is intentional, or whether it could be switched to AnyNewerVersion so that downstream codes don't have to work around it.
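
The difference between the two modes can be sketched in a few lines of C++. This is a simplified model of CMake's documented AnyNewerVersion and SameMajorVersion rules, not CMake's actual implementation. The Version struct and the satisfies function are hypothetical names, and the installed version 3.0.0 used in the usage note is an assumption standing in for "rocthrust 3".

```cpp
#include <tuple>

// Hypothetical version triple used only for this illustration.
struct Version {
    int major_v;
    int minor_v;
    int patch_v;
};

enum class Compatibility { AnyNewerVersion, SameMajorVersion };

// True when `installed` is the same as or newer than `requested`.
inline bool at_least(const Version& installed, const Version& requested) {
    return std::tie(installed.major_v, installed.minor_v, installed.patch_v) >=
           std::tie(requested.major_v, requested.minor_v, requested.patch_v);
}

// Simplified model: both modes require installed >= requested;
// SameMajorVersion additionally requires matching major versions.
inline bool satisfies(const Version& installed,
                      const Version& requested,
                      Compatibility mode) {
    if (!at_least(installed, requested)) {
        return false;
    }
    if (mode == Compatibility::SameMajorVersion) {
        return installed.major_v == requested.major_v;
    }
    return true;
}
```

Under this model, a request for 2.10.5 against an installed 3.0.0 is rejected with SameMajorVersion and accepted with AnyNewerVersion, which matches the failure described above.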

Operating System

No response

GPU

No response

ROCm Component

No response

Feature request: Add support for NAVI22 and NAVI23 i.e. gfx1031 and gfx1032

Please add support for NAVI22 and NAVI23 i.e. gfx1031 and gfx1032 to enable HIP development leveraging rocThrust on Notebooks as ALL AMD mobile GPUs are NAVI22 or NAVI23 i.e. gfx1031 or gfx1032.

Relevant engineering and scientific software uses Thrust. Porting this software to AMD hardware requires rocThrust support on development hardware. Please align with the rocPRIM developers, as it's a dependency.

My target operating system is RHEL9 = AlmaLinux 9, Rocky Linux ...

Need to set up CMAKE_CXX_FLAGS to cmake rocThrust?

Hi all, I am starting to compile rocThrust.

I ran cmake with:

cmake ../. -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc

CMake then reports the following error:

-- The CXX compiler identification is Clang 12.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/bin/hipcc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Setting build type to 'Release' as none was specified.
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success


----------------------------------- ERROR -----------------------------------

  • The variable 'CMAKE_CXX_FLAGS' should only be set by the cmake toolchain,
  • either by calling 'cmake -DCMAKE_CXX_FLAGS=" -Wno-unused-command-line-argument"' or
  • set in a toolchain file and added with
  • 'cmake -DCMAKE_TOOLCHAIN_FILE='.
    -----------------------------------------------------------------------------

...

It seems that CMAKE_CXX_FLAGS must be set to some value for CMake to generate a correct makefile; currently "make -j" does not compile anything.

But I did not see any instructions for setting CMAKE_CXX_FLAGS. Are there any other details or documents I missed for compiling rocThrust?

Thanks!

How to use rocThrust and OpenMP at the same time

I followed the example https://github.com/ROCm-Developer-Tools/HIP-Examples/tree/master/openmp-helloworld and built it successfully with hipcc. But when I tried to add rocthrust containers in the source file, errors occurred.

I have also tried to build this example with only rocthrust and it works fine.

My environment:

  • ROCm 3.9
  • CMake 3.16.5

Error messages:

In file included from openmp_helloworld.cpp:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.h:29:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.h:240:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:23:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.h:35:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:22:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/uninitialized_fill.h:274:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:26:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/adl/uninitialized_fill.h:44:
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:62:13: error: no matching function for call to 'operator new'
            ::new(static_cast<void*>(&out)) value_type(value);
            ^~
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:66:17: note: in instantiation of function template specialization 'thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>::operator()<unsigned long>' requested here
                f(tile_base + idx);
                ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:95:44: note: in instantiation of function template specialization 'thrust::hip_rocprim::__parallel_for::kernel<256, 1, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
        hipLaunchKernelGGL(HIP_KERNEL_NAME(kernel<block_size, items_per_thread, F, Size>),
                                           ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:131:50: note: in instantiation of function template specialization 'thrust::hip_rocprim::__parallel_for::parallel_for<thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
            hipError_t  status = __parallel_for::parallel_for(count, f, stream);
                                                 ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:119:12: note: in instantiation of member function 'thrust::hip_rocprim::parallel_for(execution_policy<thrust::hip_rocprim::tag> &, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long)::workaround::par' requested here
    struct workaround
           ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:76:18: note: in instantiation of function template specialization 'thrust::hip_rocprim::parallel_for<thrust::hip_rocprim::tag, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
    hip_rocprim::parallel_for(policy, functor_t(first, x), count);
                 ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:54:10: note: (skipping 3 contexts in backtrace; use -ftemplate-backtrace-limit=0 to see all)
  return uninitialized_fill_n(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, n, x);
         ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:251:3: note: in instantiation of function template specialization 'thrust::detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
  default_construct_range(m_allocator, first.base(), n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:220:15: note: in instantiation of member function 'thrust::detail::contiguous_storage<int, thrust::device_allocator<int>>::default_construct_n' requested here
    m_storage.default_construct_n(begin(), size());
              ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:65:3: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::default_init' requested here
  default_init(n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:95:8: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::vector_base' requested here
      :Parent(n) {}
       ^
openmp_helloworld.cpp:41:32: note: in instantiation of member function 'thrust::device_vector<int, thrust::device_allocator<int>>::device_vector' requested here
    thrust::device_vector<int> v(10);
                               ^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:134:7: note: candidate function not viable: call to __host__ function from __device__ function
void* operator new(std::size_t, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT
      ^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:168:14: note: candidate function not viable: call to __host__ function from __device__ function
inline void* operator new(std::size_t, void* __p) _GLIBCXX_USE_NOEXCEPT
             ^
/public/opt/devtoolset-7/root/usr/include/c++/7/new:120:7: note: candidate function not viable: requires 1 argument, but 2 were provided
void* operator new(std::size_t) _GLIBCXX_THROW (std::bad_alloc)
      ^
note: candidate function not viable: requires 1 argument, but 2 were provided
In file included from openmp_helloworld.cpp:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.h:29:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.h:240:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:22:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/copy_construct_range.h:46:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/copy_construct_range.inl:21:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/copy.h:90:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/copy.inl:21:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/copy.h:58:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/copy.inl:23:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/transform.h:724:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/transform.inl:25:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/transform.h:105:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/generic/transform.inl:19:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/for_each.h:279:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/detail/for_each.inl:27:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/detail/adl/for_each.h:44:
In file included from /public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/for_each.h:35:
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:141:17: error: no matching function for call to object of type 'thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>'
                f(idx);
                ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/parallel_for.h:119:12: note: in instantiation of member function 'thrust::hip_rocprim::parallel_for(execution_policy<thrust::hip_rocprim::tag> &, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long)::workaround::seq' requested here
    struct workaround
           ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:76:18: note: in instantiation of function template specialization 'thrust::hip_rocprim::parallel_for<thrust::hip_rocprim::tag, thrust::hip_rocprim::__uninitialized_fill::functor<thrust::device_ptr<int>, int>, unsigned long>' requested here
    hip_rocprim::parallel_for(policy, functor_t(first, x), count);
                 ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/uninitialized_fill.inl:54:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::uninitialized_fill_n<thrust::hip_rocprim::tag, thrust::device_ptr<int>, unsigned long, int>' requested here
  return uninitialized_fill_n(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, n, x);
         ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:94:11: note: in instantiation of function template specialization 'thrust::uninitialized_fill_n<thrust::hip_rocprim::tag, thrust::device_ptr<int>, unsigned long, int>' requested here
  thrust::uninitialized_fill_n(allocator_system<Allocator>::get(a), p, n, typename pointer_element<Pointer>::type());
          ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/allocator/default_construct_range.inl:105:35: note: in instantiation of function template specialization 'thrust::detail::allocator_traits_detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
  return allocator_traits_detail::default_construct_range(a,p,n);
                                  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/contiguous_storage.inl:251:3: note: in instantiation of function template specialization 'thrust::detail::default_construct_range<thrust::device_allocator<int>, thrust::device_ptr<int>, unsigned long>' requested here
  default_construct_range(m_allocator, first.base(), n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:220:15: note: in instantiation of member function 'thrust::detail::contiguous_storage<int, thrust::device_allocator<int>>::default_construct_n' requested here
    m_storage.default_construct_n(begin(), size());
              ^
/public/opt/rocm/rocm-3.9.1/include/thrust/detail/vector_base.inl:65:3: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::default_init' requested here
  default_init(n);
  ^
/public/opt/rocm/rocm-3.9.1/include/thrust/device_vector.h:95:8: note: in instantiation of member function 'thrust::detail::vector_base<int, thrust::device_allocator<int>>::vector_base' requested here
      :Parent(n) {}
       ^
openmp_helloworld.cpp:41:32: note: in instantiation of member function 'thrust::device_vector<int, thrust::device_allocator<int>>::device_vector' requested here
    thrust::device_vector<int> v(10);
                               ^
/public/opt/rocm/rocm-3.9.1/include/thrust/system/hip/detail/uninitialized_fill.h:58:41: note: candidate template ignored: substitution failure [with Size = unsigned long]
        void THRUST_HIP_DEVICE_FUNCTION operator()(Size idx)
                                        ^
2 errors generated when compiling for host.

openmp_helloworld.cpp:

// HIP header
#include <hip/hip_runtime.h>
#include <thrust/device_vector.h>

//OpenMP header
#include <omp.h>

#include <stdio.h>
#include <stdlib.h>

__global__
void hip_helloworld(unsigned omp_id)
{
    printf("Hello World... from HIP thread = %u\n", omp_id);
}

int main(int argc, char* argv[])
{
    thrust::device_vector<int> v(10);

    // No errors
    //thrust::device_vector<int> v;

    // Beginning of parallel region
    #pragma omp parallel
    {   
        auto tid = omp_get_thread_num();
        printf("Hello World... from OMP thread = %d\n", tid);

        hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, tid);
    }
    // Ending of parallel region

    hipLaunchKernelGGL(hip_helloworld, dim3(1), dim3(1), 0, 0, /*id=*/ 0);
    hipStreamSynchronize(0);

    printf("PASSED!\n");
    return 0;
}

Commands:

hipcc -xhip -fopenmp=libomp openmp_helloworld.cpp

CMake gives the same error. This is my cmake file:

project(openmp_helloworld)

cmake_minimum_required(VERSION 3.16)

# Search for rocm in common locations
if(NOT DEFINED ROCM_PATH)
    if(DEFINED ENV{ROCM_PATH})
        set(ROCM_PATH $ENV{ROCM_PATH} CACHE PATH "ROCm path")
    elseif(DEFINED ENV{HIP_PATH})
        set(ROCM_PATH "$ENV{HIP_PATH}/.." CACHE PATH "ROCm path")
    else()
        set(ROCM_PATH "/opt/rocm" CACHE PATH "ROCm path")
    endif()
endif()

set(HIP_PATH "${ROCM_PATH}/hip" CACHE PATH "HIP path")

list(APPEND CMAKE_PREFIX_PATH ${ROCM_PATH} ${HIP_PATH})
find_package(hip REQUIRED)
find_package(rocprim REQUIRED CONFIG)
find_package(rocthrust REQUIRED CONFIG)

# Find OpenMP
find_package(OpenMP REQUIRED)

# Source files
set(CPP_SOURCES ${CMAKE_SOURCE_DIR}/openmp_helloworld.cpp)

# Preparing the executable
add_executable(test_openmp_helloworld ${CPP_SOURCES})
target_link_libraries(test_openmp_helloworld
    PRIVATE hip::device
            roc::rocthrust
            OpenMP::OpenMP_CXX
)

reference to __host__ function 'free'/'malloc' in __host__ __device__ function

Compiling a HIP program shows the following errors. Thanks for your suggestion.

In file included from /opt/rocm-6.0.0/include/thrust/system/detail/adl/malloc_and_free.h:26:
/opt/rocm-6.0.0/include/thrust/system/detail/sequential/malloc_and_free.h:45:8: error: reference to __host__ function 'free' in __host__ __device__ function
  std::free(thrust::raw_pointer_cast(ptr));


In file included from /opt/rocm-6.0.0/include/thrust/system/detail/adl/malloc_and_free.h:26:
/opt/rocm-6.0.0/include/thrust/system/detail/sequential/malloc_and_free.h:37:15: error: reference to __host__ function 'malloc' in __host__ __device__ function
  return std::malloc(n);

correct const modifiers for merge sort

Some of the const modifiers for Thrust seem inconsistent with rocPRIM/rocThrust.
This produces the following compile-time error:
In file included from /pytorch/aten/src/THH/THHTensorMode.hip:7:
In file included from /opt/rocm/include/thrust/device_ptr.h:26:
In file included from /opt/rocm/include/thrust/memory.h:25:
In file included from /opt/rocm/include/thrust/detail/reference.h:173:
In file included from /opt/rocm/include/thrust/detail/reference.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/adl/get_value.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/get_value.h:23:
In file included from /opt/rocm/include/thrust/system/detail/adl/assign_value.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/assign_value.h:25:
In file included from /opt/rocm/include/thrust/system/hip/detail/copy.h:99:
In file included from /opt/rocm/include/thrust/system/hip/detail/internal/copy_device_to_device.h:35:
In file included from /opt/rocm/include/thrust/system/hip/detail/transform.h:44:
In file included from /opt/rocm/rocprim/include/rocprim/rocprim.hpp:55:
In file included from /opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:30:
/opt/rocm/rocprim/include/rocprim/device/detail/device_merge_sort.hpp:365:25: error: no matching function for call to object of type 'const ThrustHalfLess'
const bool ab = compare_function(rocprim::get<0>(a), rocprim::get<0>(b));
^~~~~~~~~~~~~~~~
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:58:5: note: in instantiation of function template specialization 'rocprim::detail::block_sort_kernel_impl<256, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
block_sort_kernel_impl(
^
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:179:25: note: in instantiation of function template specialization 'rocprim::detail::block_sort_kernel<256, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
HIP_KERNEL_NAME(block_sort_kernel<block_size>),
^
/opt/rocm/rocprim/include/rocprim/device/device_merge_sort.hpp:429:20: note: in instantiation of function template specialization 'rocprim::detail::merge_sort_impl<rocprim::default_config, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return detail::merge_sort_impl(
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:89:23: note: in instantiation of function template specialization 'rocprim::merge_sort<rocprim::default_config, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return rocprim::merge_sort(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:123:36: note: in instantiation of function template specialization 'thrust::hip_rocprim::__merge_sort::dispatch<thrust::detail::integral_constant<bool, true> >::doit<thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, long, ThrustHalfLess>' requested here
status = dispatch<SORT_ITEMS>::doit(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:351:19: note: in instantiation of function template specialization 'thrust::hip_rocprim::__merge_sort::merge_sort<thrust::detail::integral_constant<bool, true>, thrust::hip_rocprim::execution_policy<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base> >, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
__merge_sort::merge_sort<SORT_ITEMS>(policy,
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:428:19: note: in instantiation of function template specialization 'thrust::hip_rocprim::__smart_sort::smart_sort<thrust::detail::integral_constant<bool, true>, thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
__smart_sort::smart_sort<detail::true_type, Derived, KeysIt, ValuesIt, CompareOp>
^
/opt/rocm/include/thrust/system/hip/detail/sort.h:448:16: note: in instantiation of function template specialization 'thrust::hip_rocprim::stable_sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
hip_rocprim::stable_sort_by_key(policy, keys_first, keys_last, values, compare_op);
^
/opt/rocm/include/thrust/detail/sort.inl:115:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
return sort_by_key(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), keys_first, keys_last, values_first, comp);
^
/pytorch/aten/src/THH/generic/THHTensorMode.hip:43:11: note: in instantiation of function template specialization 'thrust::sort_by_key<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, thrust::detail::normal_iterator<thrust::device_ptr<c10::Half> >, thrust::detail::normal_iterator<thrust::device_ptr >, ThrustHalfLess>' requested here
thrust::sort_by_key(
^
/pytorch/aten/src/THH/THHTensorMode.cuh:11:35: note: candidate function not viable: 'this' argument has type 'const ThrustHalfLess', but method is not marked const
__host__ __device__ inline bool operator()(const at::Half& lhs, const at::Half& rhs) {
^
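
The final note in the log names the cause: the sort invokes the comparator through a const object, so operator() has to be const-qualified. The following is a minimal host-only sketch of that pattern. HalfLess, compare_through_const and demo_const_comparator are hypothetical names, and float stands in for at::Half so that the sketch builds without HIP; in the real code the operator would also carry __host__ __device__.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for ThrustHalfLess; float replaces at::Half so the
// sketch builds with any host compiler.
struct HalfLess {
    // The trailing `const` is what allows a `const HalfLess` to be invoked.
    bool operator()(const float& lhs, const float& rhs) const {
        return lhs < rhs;
    }
};

// Mimics a library that only holds the comparator as a const object.
inline bool compare_through_const(const HalfLess& cmp, float a, float b) {
    return cmp(a, b); // would be rejected if operator() were not const
}

// Sorts a copy of the input with the comparator above.
inline std::vector<float> sorted_copy(std::vector<float> v) {
    std::sort(v.begin(), v.end(), HalfLess{});
    return v;
}

// Usage: the comparator works both directly and through a const reference.
inline void demo_const_comparator() {
    assert(compare_through_const(HalfLess{}, 1.0f, 2.0f));
    assert(sorted_copy({3.0f, 1.0f, 2.0f}).front() == 1.0f);
}
```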

rocm-5.0.2 async_copy test failed on 6700xt

Summary

I compiled rocThrust-rocm-5.0.2 for gfx1031 and found that 1 test (async_copy) out of 113 failed. The other tests passed.

Environment

Hardware description
GPU Navy_flounder [Radeon RX 6700XT]
CPU AMD Ryzen 9 5950X
Software version
Linux 5.17.3
Distribution Gentoo
ROCK Upstream Kernel
ROCR v5.0.2
Host Compiler gcc-11.2
Device Compiler hipcc-5.0.2

Log

Command: "/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-5.0.2_build/test/async_copy.hip"
Directory: /ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-5.0.2_build/test
"async_copy.hip" start time: May 02 15:50 CST
Output:
----------------------------------------------------------
Running main() from /opt/build/portage/dev-cpp/gtest-1.11.0/work/googletest-aee0f9d9b5b87796ee8a0ab26b7587ec30e8858e/googletest/src/gtest_main.cc
[==========] Running 32 tests from 8 test suites.
[----------] Global test environment set-up.
[----------] 4 tests from AsyncCopyTests/0, where TypeParam = Params<short>
[ RUN      ] AsyncCopyTests/0.TestAsyncTriviallyRelocatableElementsHostToDevice
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:78: Failure
Expected equality of these values:
  h0
    Which is: { -32768 }
  d0
    Which is: { 0 }
Google Test trace:
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:66: with seed= 1
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:63: with size = 1
/ext4-disk/build/portage/sci-libs/rocThrust-5.0.2/work/rocThrust-rocm-5.0.2/test/test_async_copy.cpp:85: with device_id= 0
[  FAILED  ] AsyncCopyTests/0.TestAsyncTriviallyRelocatableElementsHostToDevice, where TypeParam = Params<short> (17005 ms)

Please enable two factor authentication in your github account

@sbalint98;@Maetveis;@MathiasMagnus;@nolmoonen

We are going to enforce two-factor authentication in the ROCmSoftwarePlatform organization (https://github.com/ROCmSoftwarePlatform/) on 29th April, 2022.
Since we identified you as an outside collaborator of the ROCmSoftwarePlatform organization, you need to enable two-factor authentication in your GitHub account; otherwise you will be removed from the organization after the enforcement.
Please skip if already done.

To set up two factor authentication, please go through the steps in below link:

https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/configuring-two-factor-authentication

Please email "[email protected]" for queries

Fixing compile error: no member named 'init_offset_scan_state_kernel' ... ?

When I try to compile rocThrust, I get an error: no member named 'init_offset_scan_state_kernel' ...

...
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/detail/scan.inl:29:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/detail/adl/scan_by_key.h:44:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/scan_by_key.h:36:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/execution_policy.h:81:
/home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/system/hip/detail/set_operations.h:956:61: error: no member named 'init_offset_scan_state_kernel' in namespace 'rocprim::detail'; did you mean 'init_lookback_scan_state_kernel'?
hipLaunchKernelGGL(HIP_KERNEL_NAME(rocprim::detail::init_offset_scan_state_kernel),
~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
init_lookback_scan_state_kernel
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:199:30: note: expanded from macro 'HIP_KERNEL_NAME'
#define HIP_KERNEL_NAME(...) __VA_ARGS__
^~~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:251:74: note: expanded from macro 'hipLaunchKernelGGL'
#define hipLaunchKernelGGL(kernelName, ...) hipLaunchKernelGGLInternal((kernelName), __VA_ARGS__)
^~~~~~~~~~
/opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:248:9: note: expanded from macro 'hipLaunchKernelGGLInternal'
kernelName<<<(numBlocks), (numThreads), (memPerBlock), (streamId)>>>(__VA_ARGS__); \
^~~~~~~~~~
/opt/rocm-5.4.0/include/rocprim/device/detail/device_scan_common.hpp:76:60: note: 'init_lookback_scan_state_kernel' declared here
__launch_bounds__(ROCPRIM_DEFAULT_MAX_BLOCK_SIZE) void init_lookback_scan_state_kernel(
^
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/test/test_zip_iterator.cpp:18:
In file included from /home/klaus/Programme/rocalution_install/rocThrust-rocm-5.1.3/thrust/../thrust/copy.h:512:
...

This error comes up several times before the compile process is stopped.

I tested rocPRIM after the installation and all tests were successful.

How should I approach this error? Is it related to rocPRIM or to rocThrust?

hipGetLastError(); -Wunused-result

rocThrust is triggering a warning in a call to hipGetLastError

/rocthrust/thrust/system/hip/detail/malloc_and_free.h:71:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
        hipGetLastError(); // Clear global hip error state.
        ^~~~~~~~~~~~~~~

This is with:

% clang --version
clang version 12.0.0 (/llvm/llvm-project/clang 009181e63cd6a46681319335b18180d91d05f241)
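
One common way to state that a nodiscard result is dropped on purpose is an explicit cast to void. The sketch below shows the idea on the host only. Status, g_last_error, get_last_error and clear_error_state are hypothetical stand-ins for the HIP status type and hipGetLastError(), not the actual HIP declarations or the actual rocThrust fix.

```cpp
#include <cassert>

// Hypothetical stand-in for a HIP status code.
enum class Status { success, failure };

// Hypothetical global error state, reset whenever it is read.
static Status g_last_error = Status::success;

// Stand-in for hipGetLastError(): returns the stored error and clears it.
[[nodiscard]] inline Status get_last_error() {
    Status previous = g_last_error;
    g_last_error = Status::success;
    return previous;
}

inline void clear_error_state() {
    // The cast to void documents that the return value is ignored on
    // purpose, which avoids the nodiscard diagnostic.
    (void)get_last_error();
}

// Usage: a pending error is cleared without inspecting it.
inline void demo_clear_error_state() {
    g_last_error = Status::failure;
    clear_error_state();
    assert(g_last_error == Status::success);
}
```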

C++ compiler should not be enforced globally

When doing add_subdirectory(ThirdParty/rocThrust) from a project, rocThrust fails to install rocprim with the following error message:

CMake Error at cmake/VerifyCompiler.cmake:29 (message):
  On ROCm platform 'hipcc' or HIP-aware Clang must be used as C++ compiler.
Call Stack (most recent call first):
  CMakeLists.txt:82 (include)

Please correct me if I'm wrong: this message requires CMAKE_CXX_COMPILER to be hipcc. I believe this requirement contradicts the way modern CMake is designed. Starting from CMake 3.18/3.23, CUDA and HIP are CMake "languages" that can be enabled with, e.g., enable_language(HIP). The hipcc compiler is then provided by CMAKE_HIP_COMPILER, independently of the main CMAKE_CXX_COMPILER. Furthermore, CMake chooses the compiler depending on the file extension (.cpp, .cu, .hip, etc.). To fit in with this flexibility, rocThrust should not raise this error; it should instead check for CMAKE_HIP_COMPILER and use it internally if it wishes to.

CMake issue when mixing Fortran/c++

I'm trying to build a library using roc::rocthrust. I have the following lines in my CMakeLists.txt

...
find_package(rocprim REQUIRED)
find_package(rocthrust REQUIRED)
...
target_link_libraries(strumpack PUBLIC roc::hipblas roc::rocblas roc::rocsolver roc::rocthrust)

The problem is that the roc::rocthrust target also adds flags to the fortran compilation commands:

gfortran: error: unrecognized command line option ‘--hip-device-lib-path=/opt/rocm-3.8.0/lib’
gfortran: error: unrecognized command line option ‘--cuda-gpu-arch=gfx900’
gfortran: error: unrecognized command line option ‘--cuda-gpu-arch=gfx906’

Version of rocThrust compatible with older versions of ROCm?

I am attempting to run some benchmarks that use thrust in them through the gem5 simulator, which currently only supports ROCm 1.6.x. While hipify'ing the applications, I need to port the thrust calls to use rocThrust (which I did). However, after looking through the release tags and the commits for rocThrust, it appears that rocThrust only supports ROCm 2.5+. Is this correct? Or is there an (older) commit that has support for older versions of ROCm?

Thanks,
Matt

How can I set the path etc. to an alternative rocThrust installation to be used by the compiler?

I installed architecture specific versions of rocPRIM and rocThrust in the subdirectory:

/opt/rocm-5.4.0/myspecialrocm

When I try to compile an application for this architecture, the compiler picks the wrong version from the Linux distribution specific installation. When compiling rocThrust itself, setting -DCMAKE_NO_SYSTEM_FROM_IMPORTED=TRUE and -DCMAKE_INSTALL_PREFIX=/opt/rocm-5.4.0/myspecialrocm did the trick.

How can I set the path etc. to this alternative rocThrust installation to be used by the compiler?

CMake dependency on rocprim missing

I have a program that builds against rocthrust via CMake.

My build fails because rocthrust links against roc::rocprim_hip (in rocthrust-targets.cmake).
So apparently rocthrust depends on rocprim, but it doesn't require the rocprim package. I can fix this by adding find_package(rocprim) to rocthrust-config.cmake.

As a side note, I was a bit confused that the target is rocthrust and not roc::rocthrust, which is not really consistent with, e.g., roc::rocprim.

How to compile a program using rocThrust on Ubuntu 20.04?

On an Ubuntu 20.04 system with ROCm 5.2, compiling code that uses rocThrust fails (Score.cpp contains "#include <thrust/host_vector.h>", which causes the compile to fail):

clang++ -D__HIP_PLATFORM_HCC__= -D__HIP_PLATFORM_AMD__= -I/opt/rocm-5.2.0/include -I/opt/rocm-5.2.0/llvm/bin/../lib/clang/14.0.0 -I/opt/rocm-5.2.0/hsa/include -c -std=c++20 -g -O0 -pg Score.cpp
In file included from Score.cpp:11:
In file included from /opt/rocm-5.2.0/include/thrust/host_vector.h:27:
In file included from /opt/rocm-5.2.0/include/thrust/detail/vector_base.h:25:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/normal_iterator.h:27:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/iterator_adaptor.h:36:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/iterator_facade.h:37:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/iterator_facade_category.h:22:
In file included from /opt/rocm-5.2.0/include/thrust/iterator/detail/device_system_tag.h:23:
In file included from /opt/rocm-5.2.0/include/thrust/system/cuda/detail/execution_policy.h:35:
/opt/rocm-5.2.0/include/thrust/system/cuda/config.h:33:10: fatal error: 'cub/util_namespace.cuh' file not found
#include <cub/util_namespace.cuh>
^~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.

inclusive_scan_by_key does not compile if HIP_TEMPLATE_KERNEL_LAUNCH is defined

Hello,

this is illustrated by trying to compile this 4-liner

#define HIP_TEMPLATE_KERNEL_LAUNCH
#include <thrust/device_vector.h>
device_ptr<float4> a;
device_ptr<uint> b : c < inclusive_scan_by_key(b, b, a, a);

obtained by applying Creduce to a larger piece of code.

With HIP_TEMPLATE_KERNEL_LAUNCH, this fails with:

error: no matching function for call to 'hipLaunchKernelGGL'

Without the define, inclusive_scan_by_key compiles correctly. This is a regression since in the past (at least in 4.2.0) this used to work.

More -Wunused-result warnings

I am currently at commit 10c7281. Just like in a previously reported issue #163, I am encountering the following warnings

/rocThrust/thrust/system/hip/detail/util.h:116:9: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
        hipStreamSynchronize(stream(policy));
/rocThrust/thrust/system/hip/memory_resource.h:59:17: warning: ignoring return value of function declared with 'nodiscard' attribute [-Wunused-result]
                hipGetLastError(); // Clear the HIP global error state.
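
Where the status of a call actually matters, as it plausibly does for a stream synchronization, an alternative to discarding the value is to check it and report failures. This is a host-only sketch of that idea. ErrorCode, synchronize and throw_on_error are hypothetical stand-ins, not HIP or rocThrust APIs.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a HIP status code.
enum class ErrorCode { ok, launch_failure };

// Stand-in for a nodiscard call such as hipStreamSynchronize().
[[nodiscard]] inline ErrorCode synchronize(bool healthy) {
    return healthy ? ErrorCode::ok : ErrorCode::launch_failure;
}

// Turns a failing status into an exception instead of dropping it.
inline void throw_on_error(ErrorCode code, const std::string& what) {
    if (code != ErrorCode::ok) {
        throw std::runtime_error(what);
    }
}

// Returns true if synchronizing reported a failure.
inline bool synchronize_reports_failure(bool healthy) {
    try {
        throw_on_error(synchronize(healthy), "synchronize failed");
        return false;
    } catch (const std::runtime_error&) {
        return true;
    }
}

// Usage: the return value is consumed, so no warning is produced.
inline void demo_checked_synchronize() {
    assert(!synchronize_reports_failure(true));
    assert(synchronize_reports_failure(false));
}
```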

Hit with "clang-11: error: Unsupported CUDA gpu architecture: gfx900:xnack-" while making examples

I want to build the examples and am following the instructions in the README; my error message is below.
This seems closest to issue #116, where using "-x hip" resolved it, but I need help on where to make the change.

root@b5f40797d3af:~/rocThrust/build# make
--
-- ******** Summary ********
-- General:
--   System                : Linux
--   HIP ROOT              :
--   C++ compiler          : /opt/rocm/hip/bin/hipcc
--   C++ compiler version  : 11.0.0
--   CXX flags             : -Wno-unused-command-line-argument
--   Build type            : Release
--   Install prefix        : /opt/rocm
--   Device targets        : gfx900:xnack-;gfx906:xnack-;gfx908:xnack-
--
--   DISABLE_WERROR        : ON
--   DOWNLOAD_ROCPRIM      : OFF
--   BUILD_TEST            : OFF
--   BUILD_EXAMPLES        : ON
--   BUILD_BENCHMARKS      : OFF
-- Building examples
-- Configuring done
-- Generating done
-- Build files have been written to: /root/rocThrust/build
[  1%] Building CXX object examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o
clang-11: error: Unsupported CUDA gpu architecture: gfx900:xnack-
clang-11: error: Unsupported CUDA gpu architecture: gfx906:xnack-
clang-11: error: Unsupported CUDA gpu architecture: gfx908:xnack-
examples/CMakeFiles/example_thrust_monte_carlo.dir/build.make:62: recipe for target 'examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o' failed
make[2]: *** [examples/CMakeFiles/example_thrust_monte_carlo.dir/monte_carlo.cu.o] Error 1
CMakeFiles/Makefile2:197: recipe for target 'examples/CMakeFiles/example_thrust_monte_carlo.dir/all' failed
make[1]: *** [examples/CMakeFiles/example_thrust_monte_carlo.dir/all] Error 2
Makefile:149: recipe for target 'all' failed
make: *** [all] Error 2

adjacent_difference requires result_type in BinaryOp not available

Compile time error:

In file included from /pytorch/aten/src/ATen/native/hip/Unique.hip:13:
In file included from /opt/rocm/include/thrust/adjacent_difference.h:245:
In file included from /opt/rocm/include/thrust/detail/adjacent_difference.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/adl/adjacent_difference.h:44:
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:284:44: error: no type named 'result_type' in '(lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)'
using result_type = typename BinaryOp::result_type;
~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:374:9: note: in instantiation of function template specialization 'thrust::hip_rocprim::__adjacent_difference::doit_step<long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
doit_step(d_temp_storage,
^
/opt/rocm/include/thrust/system/hip/detail/adjacent_difference.h:426:32: note: in instantiation of function template specialization 'thrust::hip_rocprim::__adjacent_difference::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
__adjacent_difference::adjacent_difference<Derived, InputIt, OutputIt, BinaryOp>
^
/opt/rocm/include/thrust/detail/adjacent_difference.inl:54:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
return adjacent_difference(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, result, binary_op);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:49:13: note: in instantiation of function template specialization 'thrust::adjacent_difference<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, long *, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
thrust::adjacent_difference(policy, data, data + num_inp, inv_loc_ptr, not_equal);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:170:48: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::compute_unique<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long, (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:173:5), (lambda at /pytorch/aten/src/ATen/native/hip/Unique.hip:183:5)>' requested here
std::tie(inverse_indices, counts, num_out) = compute_unique(
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:225:12: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::unique_dim_cuda_template' requested here
return unique_dim_cuda_template<scalar_t>(self, dim, false, return_inverse, return_counts);
^
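
The error arises because a lambda has no nested result_type. One way to support lambdas is to deduce the result of calling the operation instead of asking the operation for a typedef. The sketch below illustrates this on the host with standard C++ only. binary_result_t, adjacent_apply and demo_adjacent_apply are hypothetical names, and this is not the rocThrust implementation.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Deduces what `op(a, b)` returns without requiring a nested
// BinaryOp::result_type, so lambdas qualify as well as functor classes.
template <typename BinaryOp, typename T>
using binary_result_t =
    decltype(std::declval<BinaryOp&>()(std::declval<T>(), std::declval<T>()));

// Hypothetical helper: writes out[i] = op(in[i], in[i - 1]) for i > 0 and
// copies in[0] to out[0] unchanged.
template <typename T, typename BinaryOp>
void adjacent_apply(const T* in, T* out, int n, BinaryOp op) {
    using result_type = binary_result_t<BinaryOp, T>;
    static_assert(std::is_convertible<result_type, T>::value,
                  "op result must be convertible to the output type");
    if (n <= 0) {
        return;
    }
    out[0] = in[0];
    for (int i = 1; i < n; ++i) {
        out[i] = static_cast<T>(op(in[i], in[i - 1]));
    }
}

// Usage: a lambda is accepted although it has no result_type member.
inline void demo_adjacent_apply() {
    const long in[] = {1, 1, 2};
    long out[3] = {0, 0, 0};
    adjacent_apply(in, out, 3, [](long a, long b) { return a != b ? 1L : 0L; });
    assert(out[0] == 1 && out[1] == 0 && out[2] == 1);
}
```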

How to include a custom header library version?

I tried to redirect hipcc to my custom version of rocThrust but it mixes (see >> comments) the custom and the default installations when I try to compile a program.

hipcc --offload-arch=gfx1031 -I/opt/rocm-5.4.0/myspecialrocm/include/thrust -I/opt/rocm-5.4.0/myspecialrocm/include/rocprim discrete_voronoi.cu -o testvoronoi

In file included from <built-in>:1:
In file included from /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/__clang_hip_runtime_wrapper.h:115:
In file included from /opt/rocm-5.4.0/llvm/lib/clang/15.0.0/include/__clang_hip_math.h:22:
In file included from /opt/rocm-5.4.0/myspecialrocm/include/thrust/limits.h:10: >> using custom version, ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config.h:23: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config/config.h:26: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/thrust/detail/config/compiler.h:29: >> using default version, not ok
In file included from /opt/rocm-5.4.0/include/hip/hip_runtime.h:62:
In file included from /opt/rocm-5.4.0/include/hip/amd_detail/amd_hip_runtime.h:105:
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:5243:25: error: use of undeclared identifier 'UINT_MAX'
size_t size __dparm(UINT_MAX));
^
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:6790:75: error: use of undeclared identifier 'UINT_MAX'
const void* devPtr, size_t size = UINT_MAX) {
^
/opt/rocm-5.4.0/include/hip/hip_runtime_api.h:6797:75: error: use of undeclared identifier 'UINT_MAX'
const struct hipChannelFormatDesc& desc, size_t size = UINT_MAX) {
^

The example is an example included in rocThrust and was compiled correctly when I installed rocThrust and opted to compile the examples, now I want to make use of the custom version to port software to work on my notebook but am not able to compile a program. I assume the resulting errors are caused by the parts pulled from the default rocThrust rocm stack that doesn't support my GPU.

What's the correct way to enforce the use of the custom version?

Slow sort on gfx1036 with custom operator on ROCm 5.6.0

I have an AMD Ryzen 9 7900X 12-core / 24-threads processor and I'm trying to run our code on both the CPU and iGP 0d:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Raphael [1002:164e] (rev c2) for comparison.

Some of our code depends on Thrust, for which we are using rocThrust for HIP (1.17.2p0 as reported by the one shipping with ROCm 5.6.0, official .deb from AMD).

We have noticed that a sort_by_key with custom operator is extremely slow on GPU, averaging 30ms for 1.4M data points compared to ~20ms on CPU with the OpenMP backend.

Sample code for testing can be found in my titanxstall repo, rocthrust-perf branch. The results can be tested as:

git clone [email protected]:Oblomov/titanxstall -b rocthrust-perf thrust-perf
cd thrust-perf
make clean ; make test
make clean ; make cpu=1 openmp=1 test

and comparing the results between the HIP and OpenMP runs. The sort is run 5 times, and on GPU the first run should be discarded as it includes host-to-device data transfers due to the use of managed allocations. (The original code does not use managed allocations, but similar performance differences have been observed, so this doesn't seem to be related.)

scatter cannot be used w/ long*

Compile time error from PyTorch:

In file included from /pytorch/aten/src/ATen/native/hip/Unique.hip:5:
In file included from /opt/rocm/include/thrust/execution_policy.h:31:
In file included from /opt/rocm/include/thrust/system/cpp/execution_policy.h:64:
In file included from /opt/rocm/include/thrust/system/cpp/detail/sort.h:22:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.h:63:
In file included from /opt/rocm/include/thrust/system/detail/sequential/sort.inl:23:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_primitive_sort.inl:21:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.h:55:
In file included from /opt/rocm/include/thrust/system/detail/sequential/stable_radix_sort.inl:20:
In file included from /opt/rocm/include/thrust/copy.h:513:
In file included from /opt/rocm/include/thrust/detail/copy_if.h:74:
In file included from /opt/rocm/include/thrust/detail/copy_if.inl:20:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/copy_if.inl:31:
In file included from /opt/rocm/include/thrust/scan.h:1563:
In file included from /opt/rocm/include/thrust/detail/scan.inl:29:
In file included from /opt/rocm/include/thrust/system/detail/adl/scan_by_key.h:44:
In file included from /opt/rocm/include/thrust/system/hip/detail/scan_by_key.h:33:
In file included from /opt/rocm/include/thrust/system/hip/execution_policy.h:41:
In file included from /opt/rocm/include/thrust/system/hip/detail/count.h:35:
In file included from /opt/rocm/include/thrust/system/hip/detail/reduce.h:42:
In file included from /opt/rocm/include/thrust/device_vector.h:26:
In file included from /opt/rocm/include/thrust/detail/vector_base.h:547:
In file included from /opt/rocm/include/thrust/detail/vector_base.inl:25:
In file included from /opt/rocm/include/thrust/equal.h:237:
In file included from /opt/rocm/include/thrust/detail/equal.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/generic/equal.h:47:
In file included from /opt/rocm/include/thrust/system/detail/generic/equal.inl:21:
In file included from /opt/rocm/include/thrust/mismatch.h:259:
In file included from /opt/rocm/include/thrust/detail/mismatch.inl:27:
In file included from /opt/rocm/include/thrust/system/detail/generic/mismatch.h:57:
In file included from /opt/rocm/include/thrust/system/detail/generic/mismatch.inl:21:
In file included from /opt/rocm/include/thrust/find.h:384:
In file included from /opt/rocm/include/thrust/detail/find.inl:25:
In file included from /opt/rocm/include/thrust/system/detail/generic/find.h:62:
In file included from /opt/rocm/include/thrust/system/detail/generic/find.inl:19:
In file included from /opt/rocm/include/thrust/reduce.h:784:
In file included from /opt/rocm/include/thrust/detail/reduce.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/generic/reduce_by_key.h:88:
In file included from /opt/rocm/include/thrust/system/detail/generic/reduce_by_key.inl:30:
In file included from /opt/rocm/include/thrust/scatter.h:422:
In file included from /opt/rocm/include/thrust/detail/scatter.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/adl/scatter.h:44:
/opt/rocm/include/thrust/system/hip/detail/scatter.h:50:20: error: type 'long *' cannot be used prior to '::' because it has no members
typedef typename ItemsIt::value_type ItemsTy;
^
/opt/rocm/include/thrust/detail/scatter.inl:45:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::scatter<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, const long *, long *>' requested here
return scatter(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, map, output);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:51:13: note: in instantiation of function template specialization 'thrust::scatter<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, long *, const long *, long *>' requested here
thrust::scatter(policy, inv_loc_ptr, inv_loc_ptr + num_inp, sorted_indices_ptr, inverse_indices_ptr);
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:105:48: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::compute_unique<thrust::detail::execute_with_allocator<THCThrustAllocator, execute_on_stream_base>, unsigned char, thrust::equal_to, thrust::not_equal_to >' requested here
std::tie(inverse_indices, counts, num_out) = compute_unique(
^
/pytorch/aten/src/ATen/native/hip/Unique.hip:207:46: note: in instantiation of function template specialization 'at::native::(anonymous namespace)::unique_cuda_template' requested here
std::tie(output, inverse, std::ignore) = unique_cuda_template<scalar_t>(self, false, return_inverse, false);
^
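
The failing line asks the iterator type itself for value_type, which only works for class types. Going through an iterator traits template works for raw pointers too. A minimal host-only sketch using std::iterator_traits follows. value_type_of, scatter_sketch and demo_scatter_sketch are hypothetical names, and this is not the rocThrust code.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>
#include <vector>

// std::iterator_traits handles class-type iterators and raw pointers
// alike, whereas `typename It::value_type` fails for `long *`.
template <typename It>
using value_type_of = typename std::iterator_traits<It>::value_type;

static_assert(std::is_same<value_type_of<long*>, long>::value, "");
static_assert(std::is_same<value_type_of<const long*>, long>::value, "");
static_assert(
    std::is_same<value_type_of<std::vector<int>::iterator>, int>::value, "");

// Hypothetical scatter: output[map[i]] = items[i] for every input element.
template <typename ItemsIt, typename MapIt, typename OutIt>
void scatter_sketch(ItemsIt first, ItemsIt last, MapIt map, OutIt output) {
    for (; first != last; ++first, ++map) {
        value_type_of<ItemsIt> item = *first;
        output[*map] = item;
    }
}

// Usage: raw pointers are accepted for items, map and output.
inline void demo_scatter_sketch() {
    long items[] = {10, 20};
    const long map[] = {1, 0};
    long out[2] = {0, 0};
    scatter_sketch(items, items + 2, map, out);
    assert(out[0] == 20 && out[1] == 10);
}
```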

[Issue]: optional.h uses non-member function with `->`

Problem Description

A recent clang change (llvm/llvm-project#90152) revealed a bug in the rocThrust develop branch:

this->construct(std::forward<Args>(args)...);

rocThrust/thrust/../thrust/optional.h:2756:11: error: no member named 'construct' in 'optional<type-parameter-0-0 &>'

In this->construct, this is a raw pointer, so the built-in arrow operator applies (an overloaded operator-> is never used here) and only members of the class itself are accessible. However, construct is not such a member, since optional<T &> does not define construct. Although optional<T> inherits construct from its base classes, that member is not inherited by optional<T &>, because the two specializations have no inheritance relationship.

Since optional<T &> behaves like a delegate of its member m_value, my guess is that this->construct was intended to perform m_value->construct, i.e., to use the overloaded arrow operator. For the overloaded arrow operator to take effect, (*this)-> can be used. Therefore, the fix is

(*this)->construct(std::forward<Args>(args)...);
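
The difference between this-> and (*this)-> can be reproduced in a few lines of standard C++. In the sketch below, Target and Handle are hypothetical types that only loosely mirror the relationship described above; they are not the optional.h classes.

```cpp
#include <cassert>

// Hypothetical wrapped type that provides construct().
struct Target {
    int construct(int v) {
        value = v;
        return value;
    }
    int value = 0;
};

// Hypothetical reference-like wrapper that forwards member access.
struct Handle {
    Target* m_value;

    // Overloaded arrow: forwards to the wrapped object.
    Target* operator->() { return m_value; }

    int emplace(int v) {
        // `this->construct(v)` would not compile: `this` is a raw pointer,
        // so the built-in arrow looks for Handle::construct, which does
        // not exist. `(*this)->` yields a Handle first and then goes
        // through the overloaded operator-> to reach Target::construct.
        return (*this)->construct(v);
    }
};

// Usage: emplace reaches the wrapped object through operator->.
inline void demo_handle_arrow() {
    Target t;
    Handle h{&t};
    assert(h.emplace(7) == 7);
    assert(t.value == 7);
}
```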

Operating System

Ubuntu 22.04

CPU

any

GPU

AMD Radeon Pro W7900

ROCm Version

ROCm 6.1.0

keyvaluepair does not provide access to < operator

key_value_pair is used in min functions together with the less-than (<) operator. This currently does not work:

/pytorch/caffe2/operators/hip/rmac_regions_op.hip:44:12: error: invalid operands to binary expression ('KeyValuePair' (aka 'key_value_pair<int, float>') and 'KeyValuePair')
if (kv < min_kv) {
~~ ^ ~~~~~~
/pytorch/c10/util/typeid.h:90:13: note: candidate function not viable: no known conversion from 'KeyValuePair' (aka 'key_value_pair<int, float>') to 'caffe2::TypeIdentifier' for 1st argument
inline bool operator<(TypeIdentifier lhs, TypeIdentifier rhs) {
^
/opt/rocm/rocprim/include/rocprim/device/../types/tuple.hpp:894:6: note: candidate template ignored: could not match 'tuple' against 'key_value_pair'
bool operator<(const tuple<TTypes...>& lhs, const tuple<UTypes...>& rhs)
^
In file included from /pytorch/caffe2/operators/hip/rmac_regions_op.hip:2:
In file included from /opt/rocm/hipcub/include/hipcub/hipcub.hpp:37:
In file included from /opt/rocm/hipcub/include/hipcub/rocprim/hipcub.hpp:38:
/opt/rocm/hipcub/include/hipcub/rocprim/device/../thread/thread_operators.hpp:101:18: error: invalid operands to binary expression ('const rocprim::key_value_pair<int, float>' and 'const rocprim::key_value_pair<int, float>')
return a < b ? a : b;
~ ^ ~
/opt/rocm/rocprim/include/rocprim/warp/detail/warp_reduce_shuffle.hpp:61:22: note: in instantiation of function template specialization 'hipcub::Min::operator()<rocprim::key_value_pair<int, float> >' requested here
output = reduce_op(output, value);
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:218:22: note: in instantiation of function template specialization 'rocprim::detail::warp_reduce_shuffle<rocprim::key_value_pair<int, float>, 64, false>::reducehipcub::Min' requested here
WarpReduce().reduce(
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:174:9: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_warp_reduce<rocprim::key_value_pair<int, float>, 128>::warp_reduce<false, rocprim::detail::warp_reduce_shuffle<rocprim::key_value_pair<int, float>, 64, false>, hipcub::Min' requested here
warp_reduce<!block_size_is_warp_multiple_, warp_reduce_input_type>(
^
/opt/rocm/rocprim/include/rocprim/block/detail/block_reduce_warp_reduce.hpp:81:15: note: in instantiation of function template specialization 'rocprim::detail::block_reduce_warp_reduce<rocprim::key_value_pair<int, float>, 128>::reduce_implhipcub::Min' requested here
this->reduce_impl(
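
Both errors come down to `a < b` having no viable operator< for the pair type. The sketch below shows what such an ordering could look like, using a hypothetical KeyValue struct as a stand-in for key_value_pair<int, float>. The ordering (by value, then by key) is an assumption made for illustration only; the library may define a different one, and passing a custom reduction functor instead of relying on operator< is another option.

```cpp
#include <cassert>

// Hypothetical stand-in for key_value_pair<int, float>.
struct KeyValue {
    int key;
    float value;
};

// Ordering assumed for this sketch: by value, with the key as tie-breaker.
inline bool operator<(const KeyValue& a, const KeyValue& b) {
    if (a.value != b.value) {
        return a.value < b.value;
    }
    return a.key < b.key;
}

// Same shape as the `a < b ? a : b` expression in the log above.
inline KeyValue min_of(const KeyValue& a, const KeyValue& b) {
    return a < b ? a : b;
}

// Usage: the pair with the smaller value wins regardless of its key.
inline void demo_key_value_min() {
    assert(min_of(KeyValue{1, 2.0f}, KeyValue{2, 1.0f}).key == 2);
}
```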

missing constant iterator

PyTorch requires the thrust::constant_iterator and thrust::make_constant_iterator APIs, which currently seem to be missing from rocThrust.
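
For readers unfamiliar with the concept: a constant iterator yields the same value at every position, so a sequence of repeated values can be fed to an algorithm without storing it. The following host-only sketch shows the idea in standard C++. constant_iterator_sketch and make_constant_iterator_sketch are hypothetical names; this is neither the Thrust interface nor a complete iterator.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Minimal sketch: every position dereferences to the same value; only an
// internal index distinguishes one iterator from another.
template <typename T>
class constant_iterator_sketch {
public:
    using iterator_category = std::input_iterator_tag;
    using value_type = T;
    using difference_type = std::ptrdiff_t;
    using pointer = const T*;
    using reference = T;

    explicit constant_iterator_sketch(T value, difference_type index = 0)
        : value_(value), index_(index) {}

    T operator*() const { return value_; }
    T operator[](difference_type) const { return value_; }

    constant_iterator_sketch& operator++() {
        ++index_;
        return *this;
    }
    constant_iterator_sketch operator+(difference_type n) const {
        return constant_iterator_sketch(value_, index_ + n);
    }
    difference_type operator-(const constant_iterator_sketch& other) const {
        return index_ - other.index_;
    }
    bool operator==(const constant_iterator_sketch& other) const {
        return index_ == other.index_;
    }
    bool operator!=(const constant_iterator_sketch& other) const {
        return !(*this == other);
    }

private:
    T value_;
    difference_type index_;
};

template <typename T>
constant_iterator_sketch<T> make_constant_iterator_sketch(T value) {
    return constant_iterator_sketch<T>(value);
}

// Usage: four positions of the value 7 sum to 28.
inline void demo_constant_iterator() {
    auto first = make_constant_iterator_sketch(7);
    auto last = first + 4;
    int sum = 0;
    for (auto it = first; it != last; ++it) {
        sum += *it;
    }
    assert(sum == 28);
}
```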

BEGIN_NS_THRUST does not name a type

It seems that "#include <thrust/system/cuda/config.h>" needs to be added to "thrust/system/hip/detail/error.inl"; otherwise the following error is raised.

In file included from /opt/rocm/include/thrust/system/hip/error.h:183:0,
/opt/rocm/include/thrust/system/hip/detail/error.inl:23:1: error: ‘BEGIN_NS_THRUST’ does not name a type
 BEGIN_NS_THRUST
 ^~~~~~~~~~~~~~~
/opt/rocm/include/thrust/system/hip/detail/error.inl:81:1: error: ‘END_NS_THRUST’ does not name a type
 END_NS_THRUST
 ^~~~~~~~~~~~~

Please let me know if any of my configuration is not correct, thanks :)

Compilation issue from rocthrust library on ROCm-3.5.

Hi all,
I have successfully built and tested my application with HCC (ROCm-3.3). Now I am trying to build my app with hip-clang (ROCm-3.5), as HCC is deprecated, but I am getting compilation errors caused by the rocThrust library. The Thrust device system is being selected as CUDA, but it should be HIP.
I am attaching two files for further details.

I would like to know how I can resolve this issue, as I have been unable to do so. If more details are required, please let me know.

Best regards,
Saurabh.

Can't use utility.h in thrust/type_traits/integer_sequence.h

Hi everyone,

I am new to Rocm and just installed rocThrust. When I tried to compiled my hip cpp code using hipcc (rocm 5.4.0). I got

rocThrust/include/thrust/type_traits/integer_sequence.h:61:31: error: no template named 'integer_sequence' in namespace 'std'.

'integer_sequence' should come from the <utility> header (utility.h). I passed the -std=c++14 flag, but the error persisted. When I included the header in a test program and compiled it with g++, everything worked, but hipcc still fails to compile the same test program with the same error. Can anyone tell me how to fix this issue?

Thanks a lot!
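
A small standalone check like the following can help separate a toolchain problem from a rocThrust problem. It is only a sketch: it uses nothing but the standard library (std::index_sequence is defined in <utility> from C++14 onwards), and the function name sum_indices is made up for illustration. If this file fails to compile with the same compiler and flags, the compiler is probably not running in C++14 mode or is picking up an older standard library.

```cpp
#include <cassert>   // for the runtime checks in the usage note
#include <cstddef>
#include <utility>   // std::integer_sequence, std::index_sequence (C++14)

// Hypothetical helper: sums the indices 0..N-1 carried by an index_sequence.
template <std::size_t... Is>
constexpr std::size_t sum_indices(std::index_sequence<Is...>)
{
    // Expand the pack into an array; the trailing 0 keeps it non-empty.
    const std::size_t values[] = {Is..., 0};
    std::size_t total = 0;
    for (std::size_t v : values) {
        total += v;
    }
    return total;
}

// Fails at compile time if std::integer_sequence is unavailable or broken.
static_assert(sum_indices(std::make_index_sequence<4>{}) == 6,
              "std::index_sequence should be usable in C++14 mode");
```

Compiling this file with the same command line as the failing code (same compiler, same -std flag) shows whether <utility> itself provides the template.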

rocThrust test needs hip/amd_detail/host_defines.h

Hi all,

I tried to compile the rocThrust test.

I tried to cmake with:

CXX=hipcc cmake -DBUILD_TEST=ON ../.

and then:

make

It shows:

/home/echo/workspace/rocThrust/thrust/../thrust/system/hip/detail/guarded_driver_types.h:50:10: fatal error: 'hip/amd_detail/host_defines.h' file not found
#include <hip/amd_detail/host_defines.h>

However, under ROCm directory, I do not find hip/amd_detail/host_defines.h, instead I found /opt/rocm/hip/include/hip/hcc_detail/host_defines.h.

Am I missing any configuration, installation, or redirection steps?

Other than that, I also have a build problem in #179.
Could that error introduce this problem?

Thanks!

`reduce_by_key` fails with custom reduction

I'm using reduce_by_key to implement a segmented argmin. The following piece of code fails with an access violation on ROCm 5.6.0 (since little has changed since then, I assume it still fails); the equivalent code works fine with NVIDIA Thrust. I haven't been able to debug the issue, since I don't have access to a system with sufficient driver support for rocgdb. My educated guess is that the reduction is being evaluated for invalid/out-of-bounds argument values, which leads to out-of-bounds accesses to degree:

#include <thrust/device_vector.h>
#include <thrust/functional.h> // thrust::equal_to
#include <thrust/pair.h>       // thrust::make_pair
#include <thrust/reduce.h>

template <typename IndexType>
struct node_min_degree_reduction {
    __device__ __forceinline__ IndexType operator()(IndexType u, IndexType v)
    {
        return thrust::make_pair(degree[u], u) < thrust::make_pair(degree[v], v)
                   ? u
                   : v;
    }

    const IndexType* degree;
};

int main() {
        using IndexType = int;
        thrust::device_vector<int> k(1000, 0);
        thrust::device_vector<int> v{1000, 1};
        thrust::device_vector<int> res1(1);
        thrust::device_vector<int> res2(1);
        thrust::device_vector<int> degrees(1000, 1);
        thrust::reduce_by_key(k.begin(), k.end(), v.begin(), res1.begin(), res2.begin(), thrust::equal_to<IndexType>{},
                          node_min_degree_reduction<IndexType>{degrees.data().get()});

}
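
To make the intended semantics concrete, here is a host-only reference sketch of the segmented argmin, written with the standard library only. It assumes that every value is a valid index into degree, which is exactly the precondition the custom reduction above relies on; the function name segmented_argmin is hypothetical and the sketch says nothing about how rocThrust evaluates the reduction internally.

```cpp
#include <cassert>   // for the runtime checks in the usage note
#include <cstddef>
#include <utility>
#include <vector>

// For each run of equal consecutive keys, return the value (an index into
// `degree`) whose (degree, index) pair is smallest.
// Assumption: every element of `values` is a valid index into `degree`.
std::vector<int> segmented_argmin(const std::vector<int>& keys,
                                  const std::vector<int>& values,
                                  const std::vector<int>& degree)
{
    std::vector<int> result;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        const int candidate = values[i];
        if (i == 0 || keys[i] != keys[i - 1]) {
            // A new segment starts: the first value is the current best.
            result.push_back(candidate);
            continue;
        }
        int& best = result.back();
        if (std::make_pair(degree[candidate], candidate) <
            std::make_pair(degree[best], best)) {
            best = candidate;
        }
    }
    return result;
}
```

Comparing the device output against such a host reference on small inputs is one way to check whether the reduction operator is ever invoked with arguments that did not come from the value range.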

correct const modifiers for thrust::tabulate

The const modifiers for thrust::tabulate are inconsistent with the API that Thrust expects:

Compile time error for PyTorch:
In file included from /pytorch/aten/src/ATen/native/hip/RangeFactories.hip:9:
In file included from /opt/rocm/include/thrust/sequence.h:295:
In file included from /opt/rocm/include/thrust/detail/sequence.inl:26:
In file included from /opt/rocm/include/thrust/system/detail/generic/sequence.h:63:
In file included from /opt/rocm/include/thrust/system/detail/generic/sequence.inl:20:
In file included from /opt/rocm/include/thrust/tabulate.h:128:
In file included from /opt/rocm/include/thrust/detail/tabulate.inl:22:
In file included from /opt/rocm/include/thrust/system/detail/adl/tabulate.h:44:
/opt/rocm/include/thrust/system/hip/detail/tabulate.h:55:62: error: no matching function for call to object of type 'const at::native::LinspaceOp<double, double>'
[tabulate_op](size_type i) { return tabulate_op(i); });
^~~~~~~~~~~
/opt/rocm/include/thrust/detail/tabulate.inl:37:10: note: in instantiation of function template specialization 'thrust::hip_rocprim::tabulate<thrust::hip_rocprim::tag, thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
return tabulate(thrust::detail::derived_cast(thrust::detail::strip_const(exec)), first, last, unary_op);
^
/opt/rocm/include/thrust/detail/tabulate.inl:52:18: note: in instantiation of function template specialization 'thrust::tabulate<thrust::hip_rocprim::tag, thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
return thrust::tabulate(select_system(system), first, last, unary_op);
^
/pytorch/aten/src/ATen/native/hip/RangeFactories.hip:59:15: note: in instantiation of function template specialization 'thrust::tabulate<thrust::device_ptr, at::native::LinspaceOp<double, double> >' requested here
thrust::tabulate(data_, data_ + steps, linspace_method);
^
/pytorch/aten/src/ATen/native/hip/RangeFactories.hip:18:32: note: candidate function not viable: 'this' argument has type 'const at::native::LinspaceOp<double, double>', but method is not marked const
__device__ __forceinline__ T operator()(ptrdiff_t index) {
^
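
The error above follows from a general C++ rule: a functor captured by value in a non-mutable lambda is const inside the lambda body, so only a const-qualified operator() can be called on it. The host-only sketch below reproduces the pattern from the log ([tabulate_op](size_type i) { return tabulate_op(i); }) with hypothetical names (LinearOp, tabulate_host); it is an illustration of the language rule, not the rocThrust or PyTorch code.

```cpp
#include <cassert>   // for the runtime checks in the usage note
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a functor such as LinspaceOp. The call operator
// is const-qualified; removing `const` here makes tabulate_host fail to
// compile with a "not marked const" diagnostic like the one in the log.
struct LinearOp {
    double start;
    double step;
    double operator()(std::ptrdiff_t index) const {
        return start + step * static_cast<double>(index);
    }
};

// Host-only stand-in for a tabulate implementation that wraps the functor
// in a lambda with a by-value capture.
template <typename Op>
std::vector<double> tabulate_host(std::size_t n, Op op)
{
    std::vector<double> out(n);
    // `op` is captured by copy and the lambda is not `mutable`, so the
    // captured copy is const inside the body.
    auto wrapped = [op](std::ptrdiff_t i) { return op(i); };
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = wrapped(static_cast<std::ptrdiff_t>(i));
    }
    return out;
}
```

Two generic ways to make such code compile are to mark the functor's operator() as const, or to declare the wrapping lambda mutable; which of the two is appropriate for rocThrust is a question for the maintainers.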

Missing algorithms

The following algorithms appear to be missing, even though they are all part of the so called ParallelSTL set of standard algorithms:

  1. adjacent_find
  2. find_end
  3. find_first_of
  4. inplace_merge
  5. is_heap
  6. is_heap_until
  7. lexicographical_compare
  8. nth_element
  9. partial_sort
  10. partial_sort_copy
  11. rotate
  12. rotate_copy
  13. search
  14. search_n
  15. shift_left
  16. shift_right
  17. uninitialized_default_construct
  18. uninitialized_default_construct_n
  19. uninitialized_move
  20. uninitialized_move_n
  21. uninitialized_value_construct
  22. uninitialized_value_construct_n

Whilst these are not currently exposed in thrust itself, it might be opportune to seize the initiative, as many of the above are both useful and used. Thank you.

partition_copy vs copy_if/remove_copy_if

Hi,

I was looking at the performance of thrust::partition_copy and found it quite slow compared to what I believe is a similar operation, namely thrust::copy_if followed by thrust::remove_copy_if. In fact, partition_copy seems to be about 2x slower.

Hardware description
  GPU: MI100 (amdgcn-amd-amdhsa--gfx908:sramecc+:xnack-)
  CPU: AMD EPYC 7542

Software version
  Distribution: Redhat
  ROCm: 4.5.0 to 5.0.2

The reproducer is a bit verbose, but the code is straightforward:

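#include <assert.h>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/functional.h>
#include <thrust/host_vector.h>
#include <thrust/iterator/reverse_iterator.h>
#include <thrust/iterator/zip_iterator.h>
#include <thrust/partition.h>
#include <thrust/random.h>
#include <thrust/remove.h>
#include <thrust/sort.h>
#include <thrust/tuple.h>

#include <chrono>   // std::chrono::steady_clock, used by StopWatch
#include <ctime>
#include <iostream> // std::cout
#include <iterator>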


template <typename Duration = std::chrono::nanoseconds>
class StopWatch {
public:
    /// Nanoseconds by default
    ///
    using duration   = Duration;
    using time_point = std::chrono::steady_clock::time_point;

    static_assert(std::chrono::steady_clock::is_steady, "Only steady clocks (the ones that only go forward) !");

public:
    /// Start the StopWatch
    ///
    StopWatch();

    /// Return the time difference (duration) since the last Lap(), Reset(), or object construction.
    /// Do not call Elapsed() and then Reset(), just call Lap() instead.
    ///
    duration Elapsed() const;

    /// Same as lap but do not return the time difference (duration).
    /// Do not call Elapsed() and then Reset(), just call Lap() instead.
    ///
    void Reset();

protected:
    time_point the_last_point_in_time_;
};

////////////////////////////////////////////////////////////////////////
// StopWatch methods definition
////////////////////////////////////////////////////////////////////////

template <typename Duration>
StopWatch<Duration>::StopWatch()
    : the_last_point_in_time_{std::chrono::steady_clock::now()} {
    // EMPTY
}

template <typename Duration>
typename StopWatch<Duration>::duration
StopWatch<Duration>::Elapsed() const {
    // The cast is a noop if std::chrono::steady_clock::duration "is same" duration. For other duration, I dunno.
    return std::chrono::duration_cast<duration>(std::chrono::steady_clock::now() - the_last_point_in_time_);
}

template <typename Duration>
void StopWatch<Duration>::Reset() {
    the_last_point_in_time_ = std::chrono::steady_clock::now();
}

struct KeyInfZero {
    template <typename Tuple>
    __host__ __device__ constexpr bool
    operator()(const Tuple& a_tuple) const {
        static_cast<void>(a_tuple);
        return thrust::get<0>(a_tuple) < 0;
    }
};

template <typename T>
void initialize_keys(thrust::device_vector<T>& keys) {
    thrust::default_random_engine         rng;
    thrust::uniform_int_distribution<int> dist(0, keys.size());

    thrust::host_vector<T> h_keys(keys.size());

    for(size_t i = 0; i < h_keys.size(); i++) {
        h_keys[i] = dist(rng);
    }

    keys = h_keys;
}

int main(void) {
    static constexpr size_t N = 10'000'000; // / 10;

    {
        thrust::device_vector<short>  values0(N);
        thrust::device_vector<double> x(N);
        thrust::device_vector<double> y(N);
        thrust::device_vector<double> z(N);
        thrust::device_vector<double> mx(N);
        thrust::device_vector<double> my(N);
        thrust::device_vector<double> mz(N);
        thrust::device_vector<double> c(N);
        thrust::device_vector<double> w(N);

        thrust::device_vector<short>  _values0(N);
        thrust::device_vector<double> _x(N);
        thrust::device_vector<double> _y(N);
        thrust::device_vector<double> _z(N);
        thrust::device_vector<double> _mx(N);
        thrust::device_vector<double> _my(N);
        thrust::device_vector<double> _mz(N);
        thrust::device_vector<double> _c(N);
        thrust::device_vector<double> _w(N);

        initialize_keys(values0);

        // thrust::sort(std::begin(values0), std::end(values0)); // Sorting the keys does not improve the partitioning speed

        auto input_iterator = thrust::make_zip_iterator(thrust::make_tuple(std::begin(values0),
                                                                           std::begin(x),
                                                                           std::begin(y),
                                                                           std::begin(z),
                                                                           std::begin(mx),
                                                                           std::begin(my),
                                                                           std::begin(mz),
                                                                           std::begin(c),
                                                                           std::begin(w)));

        auto output_iterator_true = thrust::make_zip_iterator(thrust::make_tuple(std::begin(_values0),
                                                                                 std::begin(_x),
                                                                                 std::begin(_y),
                                                                                 std::begin(_z),
                                                                                 std::begin(_mx),
                                                                                 std::begin(_my),
                                                                                 std::begin(_mz),
                                                                                 std::begin(_c),
                                                                                 std::begin(_w)));

        // reverse iterators have no overhead in this case
        auto output_iterator_false = thrust::make_zip_iterator(thrust::make_tuple(thrust::make_reverse_iterator(std::end(_values0)),
                                                                                  thrust::make_reverse_iterator(std::end(_x)),
                                                                                  thrust::make_reverse_iterator(std::end(_y)),
                                                                                  thrust::make_reverse_iterator(std::end(_z)),
                                                                                  thrust::make_reverse_iterator(std::end(_mx)),
                                                                                  thrust::make_reverse_iterator(std::end(_my)),
                                                                                  thrust::make_reverse_iterator(std::end(_mz)),
                                                                                  thrust::make_reverse_iterator(std::end(_c)),
                                                                                  thrust::make_reverse_iterator(std::end(_w))));

        StopWatch<> t;

        thrust::partition_copy(input_iterator, input_iterator + N, output_iterator_true, output_iterator_false, KeyInfZero{});

        auto duration = 1e-6 * t.Elapsed().count();
        std::cout << "partition_copy: " << duration << " ms" << std::endl;
        assert(thrust::is_partitioned(output_iterator_true, output_iterator_true + N, KeyInfZero{}));
    }


    {
        thrust::device_vector<short>  values0(N);
        thrust::device_vector<double> x(N);
        thrust::device_vector<double> y(N);
        thrust::device_vector<double> z(N);
        thrust::device_vector<double> mx(N);
        thrust::device_vector<double> my(N);
        thrust::device_vector<double> mz(N);
        thrust::device_vector<double> c(N);
        thrust::device_vector<double> w(N);

        thrust::device_vector<short>  _values0(N);
        thrust::device_vector<double> _x(N);
        thrust::device_vector<double> _y(N);
        thrust::device_vector<double> _z(N);
        thrust::device_vector<double> _mx(N);
        thrust::device_vector<double> _my(N);
        thrust::device_vector<double> _mz(N);
        thrust::device_vector<double> _c(N);
        thrust::device_vector<double> _w(N);

        initialize_keys(values0);

        // thrust::sort(std::begin(values0), std::end(values0)); // Sorting the keys does not improve the partitioning speed

        auto input_iterator = thrust::make_zip_iterator(thrust::make_tuple(std::begin(values0),
                                                                           std::begin(x),
                                                                           std::begin(y),
                                                                           std::begin(z),
                                                                           std::begin(mx),
                                                                           std::begin(my),
                                                                           std::begin(mz),
                                                                           std::begin(c),
                                                                           std::begin(w)));

        auto output_iterator_true = thrust::make_zip_iterator(thrust::make_tuple(std::begin(_values0),
                                                                                 std::begin(_x),
                                                                                 std::begin(_y),
                                                                                 std::begin(_z),
                                                                                 std::begin(_mx),
                                                                                 std::begin(_my),
                                                                                 std::begin(_mz),
                                                                                 std::begin(_c),
                                                                                 std::begin(_w)));

        // reverse iterators have no overhead in this case
        auto output_iterator_false = thrust::make_zip_iterator(thrust::make_tuple(thrust::make_reverse_iterator(std::end(_values0)),
                                                                                  thrust::make_reverse_iterator(std::end(_x)),
                                                                                  thrust::make_reverse_iterator(std::end(_y)),
                                                                                  thrust::make_reverse_iterator(std::end(_z)),
                                                                                  thrust::make_reverse_iterator(std::end(_mx)),
                                                                                  thrust::make_reverse_iterator(std::end(_my)),
                                                                                  thrust::make_reverse_iterator(std::end(_mz)),
                                                                                  thrust::make_reverse_iterator(std::end(_c)),
                                                                                  thrust::make_reverse_iterator(std::end(_w))));

        StopWatch<> t;

        thrust::copy_if(input_iterator, input_iterator + N, output_iterator_true, KeyInfZero{});
        thrust::remove_copy_if(input_iterator, input_iterator + N, output_iterator_false, KeyInfZero{});

        auto duration = 1e-6 * t.Elapsed().count();
        std::cout << "partition_copy with copy_if: " << duration << " ms" << std::endl;
        assert(thrust::is_partitioned(output_iterator_true, output_iterator_true + N, KeyInfZero{}));
    }
    return 0;
}
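
The equivalence the benchmark relies on can be checked on the host with the standard-library counterparts of the three algorithms. The sketch below is an illustration only: it uses std:: rather than thrust:: algorithms, plain int elements instead of zipped tuples, and appends the rejected elements in forward order instead of writing them through reverse iterators. The names IsNegative, Split, split_with_partition_copy and split_with_two_copies are hypothetical.

```cpp
#include <algorithm>
#include <cassert>   // for the runtime checks in the usage note
#include <iterator>
#include <utility>
#include <vector>

// Predicate analogous to KeyInfZero, for plain ints.
struct IsNegative {
    bool operator()(int x) const { return x < 0; }
};

// first: elements satisfying the predicate, second: the rest.
using Split = std::pair<std::vector<int>, std::vector<int>>;

// One pass over the input, two output ranges.
Split split_with_partition_copy(const std::vector<int>& in)
{
    Split out;
    std::partition_copy(in.begin(), in.end(),
                        std::back_inserter(out.first),
                        std::back_inserter(out.second),
                        IsNegative{});
    return out;
}

// Two passes over the input, one output range each.
Split split_with_two_copies(const std::vector<int>& in)
{
    Split out;
    std::copy_if(in.begin(), in.end(),
                 std::back_inserter(out.first), IsNegative{});
    std::remove_copy_if(in.begin(), in.end(),
                        std::back_inserter(out.second), IsNegative{});
    return out;
}
```

Both functions produce the same pair of output vectors; the two-copy variant reads the input twice, which is why a single partition_copy being slower than the pair is surprising.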

Compilation error when using non-hipcc compiler

Here is a reproducer:

test_header.h
#include<thrust/complex.h>

test.cc
#include"test_header.h"

int main()
{}

If I compile test.cc with hipcc, everything works, but with g++ I get the following errors:

In file included from /opt/rocm-4.3.1/include/thrust/system/cuda/detail/execution_policy.h:33,
                 from /opt/rocm-4.3.1/include/thrust/iterator/detail/device_system_tag.h:23,
                 from /opt/rocm-4.3.1/include/thrust/iterator/iterator_traits.h:111,
                 from /opt/rocm-4.3.1/include/thrust/detail/type_traits/pointer_traits.h:23,
                 from /opt/rocm-4.3.1/include/thrust/type_traits/is_contiguous_iterator.h:27,
                 from /opt/rocm-4.3.1/include/thrust/type_traits/is_trivially_relocatable.h:19,
                 from /opt/rocm-4.3.1/include/thrust/detail/complex/complex.inl:20,
                 from /opt/rocm-4.3.1/include/thrust/complex.h:1032,
                 from complex_test.cc:2:
/opt/rocm-4.3.1/include/thrust/system/cuda/config.h:76:10: fatal error: cub/util_namespace.cuh: No such file or directory
 #include <cub/util_namespace.cuh> // This includes <cub/version.cuh> in newer releases.
