rocm / hipcub Goto Github PK

Reusable software components for ROCm developers

Home Page: https://rocm.docs.amd.com/projects/hipCUB/en/latest/

License: Other

CMake 2.27% C++ 95.89% Shell 0.55% Groovy 0.26% Python 1.02%

hipcub's Introduction

hipCUB

hipCUB is a thin wrapper library on top of rocPRIM or CUB. You can use it to port a CUB project into HIP so you can use AMD hardware (and ROCm software).

In the ROCm environment, hipCUB uses the rocPRIM library as the backend. On CUDA platforms, it uses CUB as the backend.

Documentation

Documentation for hipCUB is available at https://rocm.docs.amd.com/projects/hipCUB/en/latest/.

To build our documentation locally, run the following code:

# Go to the hipCUB docs directory
cd hipCUB; cd docs

# Install required pip packages
python3 -m pip install -r .sphinx/requirements.txt

# Build the documentation
python3 -m sphinx -T -E -b html -d _build/doctrees -D language=en . _build/html

# For e.g. serve the HTML docs locally
cd _build/html
python3 -m http.server

Requirements

Git
CMake (3.16 or later)
For AMD GPUs:
- AMD ROCm software (1.8.0 or later)
  - The HIP-clang compiler (you must, set this as the C++ compiler for ROCm)
- The rocPRIM library
  - Automatically downloaded and built by the CMake script
  - Requires CMake 3.16.9 or later
For NVIDIA GPUs:
- CUDA Toolkit
- CUB library
  - Automatically downloaded and built by the CMake script
  - Requires CMake 3.15.0 or later
Python 3.6 or higher (for HIP on Windows only; this is only required for install scripts)
Visual Studio 2019 with Clang support (HIP on Windows only)
Strawberry Perl (HIP on Windows only)

Optional:

GoogleTest and Google Benchmark are automatically downloaded and built by the CMake script.

Build and install

To build and install hipCub, run the following code:

git clone https://github.com/ROCm/hipCUB.git

# Go to hipCUB directory, create and go to the build directory.
cd hipCUB; mkdir build; cd build

# Configure hipCUB, setup options for your system.
# Build options:
#   BUILD_TEST - OFF by default,
#   BUILD_BENCHMARK - OFF by default.
#   DEPENDENCIES_FORCE_DOWNLOAD - OFF by default and at ON the dependencies will be downloaded to build folder,
#
# ! IMPORTANT !
# Set C++ compiler to HIP-aware clang. You can do it by adding 'CXX=<path-to-compiler>'
# before 'cmake' or setting cmake option 'CMAKE_CXX_COMPILER' to path to the compiler.
#
[CXX=hipcc] cmake ../. # or cmake-gui ../.

# To configure hipCUB for Nvidia platforms, 'CXX=<path-to-nvcc>', `CXX=nvcc` or omitting the flag
# entirely before 'cmake' is sufficient
[CXX=nvcc] cmake -DBUILD_TEST=ON ../. # or cmake-gui ../.
# or
cmake -DBUILD_TEST=ON ../. # or cmake-gui ../.
# or to build benchmarks
cmake -DBUILD_BENCHMARK=ON ../.

# Build
make -j4

# Optionally, run tests if they're enabled.
ctest --output-on-failure

# Package
make package

# Install
[sudo] make install

HIP on Windows

Initial support for HIP on Windows is available. You can install it using the provided rmake.py Python script:

git clone https://github.com/ROCm/hipCUB.git
cd hipCUB

# the -i option will install rocPRIM to C:\hipSDK by default
python rmake.py -i

# the -c option will build all clients including unit tests
python rmake.py -c

Using hipCUB

To use hipCUB in a CMake project, we recommended using the package configuration files.

# On ROCm hipCUB requires rocPRIM
find_package(rocprim REQUIRED CONFIG PATHS "/opt/rocm/rocprim")

# "/opt/rocm" - default install prefix
find_package(hipcub REQUIRED CONFIG PATHS "/opt/rocm/hipcub")

...
# On ROCm: includes hipCUB headers and roc::rocprim_hip target
# On CUDA: includes only hipCUB headers, user has to include CUB directory
target_link_libraries(<your_target> hip::hipcub)

Include only the main header file:

#include <hipcub/hipcub.hpp>

Depending on your current HIP platform, hipCUB includes CUB or rocPRIM headers.

Running unit tests

# Go to hipCUB build directory
cd hipCUB; cd build

# To run all tests
ctest

# To run unit tests for hipCUB
./test/hipcub/<unit-test-name>

Using custom seeds for the tests

Go to the hipCUB/test/hipcub/test_seed.hpp file.

//(1)
static constexpr int random_seeds_count = 10;

//(2)
static constexpr unsigned int seeds [] = {0, 2, 10, 1000};

//(3)
static constexpr size_t seed_size = sizeof(seeds) / sizeof(seeds[0]);

(1) Defines a constant that sets how many passes are performed over the tests with runtime-generated seeds. Modify at will.

(2) Defines the user-generated seeds. Each of the elements of the array are used as seeds for all tests. Modify at will. If no static seeds are desired, leave the array empty.

static constexpr unsigned int seeds [] = {};

(3) Never modified this line.

Running benchmarks

# Go to hipCUB build directory
cd hipCUB; cd build

# To run benchmark for warp functions:
# Further option can be found using --help
# [] Fields are optional
./benchmark/benchmark_warp_<function_name> [--size <size>] [--trials <trials>]

# To run benchmark for block functions:
# Further option can be found using --help
# [] Fields are optional
./benchmark/benchmark_block_<function_name> [--size <size>] [--trials <trials>]

# To run benchmark for device functions:
# Further option can be found using --help
# [] Fields are optional
./benchmark/benchmark_device_<function_name> [--size <size>] [--trials <trials>]

Support

Bugs and feature requests can be reported through the GitHub issue tracker.

Contributing

Contributions are most welcome! Learn more at CONTRIBUTING.

hipcub's People

Contributors

Stargazers

Watchers

Forkers

amdkila saadrahim skyreflectedinmirrors aaronenyeshi yoichiyoshida streamhsa eidenyoshida pruthvistony jglaser glotzerlab ansurya nelsonc-amd stanleytsang-amd cjatin cgmb lawruble13 raramakr rocmmathlibrariesbot pavahora voldmare nunnikri arvindcheru sergiykostrov mfep samjwu chip-spv streamhpc qelk123 milthorpe lajagapp trixirt umfranzw benson31 swraw abhimeda amd-jmacaran nguyennhudi peterjunpark amd-garydeng snektron

hipcub's Issues

allow including headers only with NVCC in cmake projects

How should hipCUB be used as a subproject inside a cmake project? We would like to include it in HOOMD for the case where a system wide install is not available and compilation is done through nvcc directly (no hipcc). The only thing that's holding us back is that hipCUB/hipcub/include/hipcub/hipcub_version.hpp does not exist and needs to be generated by the CMake system. Since for nvcc, hipcub does not depend on rocPRIM and is therefore header-only, I think the easiest solution would be to not include that file on when __HIP_PLATFORM_NVCC__ is defined. Would that be acceptable to the developers?

Otherwise, we would have include hipCUB's cmake logic into ours via add_subdirectory. However, currently it looks like hipCUB's cmake does it's own testing of the CUDA compiler and also pulls in additional dependencies (google test), which seems like an undesirable complication.

#include <cfloat> for compatability w/ CUB?

It seems certain programs (PyTorch...) rely on including cub/cub.cuh to provide access to, among others, FLT_MIN and FLT_MAX. Currently this does not work with hipCUB.

Compatibility code for hip-clang and C++14

hipCUB test build is having the same C++14 issue as rocPRIM: ROCm/rocPRIM#100

To add compatibility code for hip-clang and C++14, this code should be removed when the hip-clang project adds functionality.

Some tests fail after build

Some tests fail after building, on AMD machines:

$ CXX=hcc cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_MODULE_PATH=/home/kaveh/software/hip/cmake/  -DCMAKE_INSTALL_PREFIX=/home/kaveh/software/lib/hipCUB ..
...
  Environment variable ROCM_ROOT is set to:

    /opt/rocm/

  For compatibility, CMake is ignoring the variable.
Call Stack (most recent call first):
  CMakeLists.txt:59 (include)
This warning is for project developers.  Use -Wno-dev to suppress it.

-- ******** Summary ********
-- General:
--   System                : Linux
--   HIP ROOT              : /home/kaveh/software/hip
--   C++ compiler          : /opt/rocm/bin/hcc
--   C++ compiler version  : 9.0.0
--   CXX flags             : -Wno-unused-command-line-argument -Wall -Wextra
--   Build type            : Release
--   Install prefix        : /home/kaveh/software/lib/hipCUB
--   Device targets        : gfx803;gfx900;gfx906;gfx908
-- 
--   BUILD_TEST            : ON
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kaveh/source/hipCUB/CLANG

build:

make -j8
Scanning dependencies of target test_hipcub_block_reduce
Scanning dependencies of target test_hipcub_basic
Scanning dependencies of target test_hipcub_device_radix_sort
Scanning dependencies of target test_hipcub_block_load_store
Scanning dependencies of target test_hipcub_device_histogram
Scanning dependencies of target test_hipcub_block_scan
Scanning dependencies of target test_hipcub_block_exchange
Scanning dependencies of target test_hipcub_caching_device_allocator
...
[ 95%] Built target test_hipcub_warp_scan
[ 97%] Linking CXX executable test_hipcub_device_radix_sort
clang-9: warning: -amdgpu-target argument 'gfx908' is not recognized; using gfx803 instead [-Winvalid-command-line-argument]
[ 97%] Built target test_hipcub_device_radix_sort
[100%] Linking CXX executable test_hipcub_device_segmented_radix_sort
clang-9: warning: -amdgpu-target argument 'gfx908' is not recognized; using gfx803 instead [-Winvalid-command-line-argument]
[100%] Built target test_hipcub_device_segmented_radix_sort

tests:

67% tests passed, 7 tests failed out of 21

Total Test time (real) = 225.76 sec

The following tests FAILED:
	  3 - hipcub.BlockDiscontinuity (SEGFAULT)
	  4 - hipcub.BlockExchange (SEGFAULT)
	  5 - hipcub.BlockHistogram (SEGFAULT)
	  6 - hipcub.BlockLoadStore (SEGFAULT)
	  7 - hipcub.BlockRadixSort (SEGFAULT)
	  8 - hipcub.BlockReduce (SEGFAULT)
	  9 - hipcub.BlockScan (SEGFAULT)
Errors while running CTest
make: *** [Makefile:152: test] Error 8

more info:

$ hcc --version
HCC clang version 9.0.0 (/data/jenkins_workspace/compute-rocm-rel-2.7/external/hcc-tot/clang 5c2570257a8bbd74eff632fbc60692ef61ef8ecb) (/data/jenkins_workspace/compute-rocm-rel-2.7/external/hcc-tot/compiler 9a9477021e6998100ff64d1360dcfe64f65cebe5) (based on HCC 2.7.19315-346267d-5c25702-9a94770 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm//bin

$ hipconfig 
HIP version  : 2.8.19382-e93e22fb

== hipconfig
HIP_PATH     : /home/kaveh/software/hip
HIP_PLATFORM : hcc
CPP_CONFIG   :  -D__HIP_PLATFORM_HCC__=   -I/home/kaveh/software/hip/include -I/opt/rocm/hcc/include -I/opt/rocm/hsa/include

== hcc
HSA_PATH     : /opt/rocm/hsa
HCC_HOME     : /opt/rocm/hcc
HCC clang version 9.0.0 (/data/jenkins_workspace/compute-rocm-rel-2.7/external/hcc-tot/clang 5c2570257a8bbd74eff632fbc60692ef61ef8ecb) (/data/jenkins_workspace/compute-rocm-rel-2.7/external/hcc-tot/compiler 9a9477021e6998100ff64d1360dcfe64f65cebe5) (based on HCC 2.7.19315-346267d-5c25702-9a94770 )
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /opt/rocm/hcc/bin
LLVM (http://llvm.org/):
  LLVM version 9.0.0svn
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: znver1

  Registered Targets:
    amdgcn - AMD GCN GPUs
    r600   - AMD GPUs HD2XXX-HD6XXX
    x86    - 32-bit X86: Pentium-Pro and above
    x86-64 - 64-bit X86: EM64T and AMD64
HCC-cxxflags :  -hc -std=c++amp -I/opt/rocm/hcc/include -I/opt/rocm/includeHCC-ldflags  :  -hc -std=c++amp -L/opt/rocm/hcc/lib -Wl,--rpath=/opt/rocm/hcc/lib -ldl -lm -lpthread -lhc_am -Wl,--whole-archive -lmcwamp -Wl,--no-whole-archive

=== Environment Variables
PATH=/home/kaveh/software/hip/bin:/usr/lib/llvm-9//bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/games:/usr/games:/opt/rocm//bin
LD_LIBRARY_PATH=/home/kaveh/software/hip/lib:/usr/lib/llvm-9//lib::/opt/rocm//lib:/home/kaveh/software/lib/rocRAND/hiprand/lib:/home/kaveh/software/lib/rocRAND/rocrand/lib

== Linux Kernel
Hostname     : hosname
Linux zam101 4.19.0-5-amd64 #1 SMP Debian 4.19.37-5+deb10u2 (2019-08-08) x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Debian
Description:	Debian GNU/Linux 10 (buster)
Release:	10
Codename:	buster

No CMAKE_CUDA_COMPILER could be found

I am on ubuntu 20.04, and I have install ROCm and HIP already.

When I tried to follow the instructions, and when I hit the line
[CXX=hipcc] cmake ../. # or cmake-gui ../.

root@python-polaris:/home/qipe/hipCUB/build# CXX=/opt/rocm/bin/hipcc cmake ../.
-- The CUDA compiler identification is unknown
CMake Error at cmake/SetupNVCC.cmake:81 (enable_language):
  No CMAKE_CUDA_COMPILER could be found.

  Tell CMake where to find the compiler by setting either the environment
  variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
  path to the compiler, or to the compiler name if it is in the PATH.
Call Stack (most recent call first):
  cmake/VerifyCompiler.cmake:39 (include)
  CMakeLists.txt:66 (include)


-- Configuring incomplete, errors occurred!
See also "/home/qipe/hipCUB/build/CMakeFiles/CMakeOutput.log".
See also "/home/qipe/hipCUB/build/CMakeFiles/CMakeError.log".

What may I be missing?

missing cub/util_allocator.cuh header

It seems hipCUB is missing the cub/util_allocator.cuh header which is used in PyTorch's Caffe2 backend.

Arguments out of order in built-in rocmPRIM DeviceScan::ExclusiveScan

edit: removed my previous description, as I've figure out it was a simple issue of the arguments being out of order:

The current call of:

        return ExclusiveScan(
            d_temp_storage, temp_storage_bytes,
            d_in, d_out, T(0), num_items, ::hipcub::Sum(),
            stream, debug_synchronous
        );

in hipcub/rocprim/device/device_scan.hpp is incorrect, as template method with the ScanOpT expects the scan operation immediately after d_out.

This results in the rather confusing error:

/opt/rocm/bin/hipcc -std=c++11 -O2 -march=native -ffast-math -DNAMD_CUDA  -D__HIP_PLATFORM_HCC__=   -I/opt/rocm/include -I/opt/rocm/hcc/include -I/opt/rocm/include/ -o obj/CudaTileListKernel.o -c `echo src/`CudaTileListKernel.cu
In file included from src/CudaTileListKernel.cu:1:
In file included from src/HIPWrapperCUB.h:42:
In file included from /opt/rocm/include/hipcub/hipcub.hpp:34:
In file included from /opt/rocm/include/hipcub/rocprim/hipcub.hpp:65:
/opt/rocm/include/hipcub/rocprim/device/device_scan.hpp:99:16: error: use of undeclared identifier 'ExclusiveScan'
        return ExclusiveScan(
               ^
src/CudaTileListKernel.cu:1246:38: note: in instantiation of function template specialization 'hipcub::DeviceScan::ExclusiveSum<int *, int *>' requested here
          cudaCheck(cub::DeviceScan::ExclusiveSum(NULL, size,
                                     ^
/opt/rocm/include/hipcub/rocprim/device/device_scan.hpp:113:16: note: must qualify identifier to find this declaration in dependent base class
    hipError_t ExclusiveScan(void *d_temp_storage,
               ^
/opt/rocm/include/hipcub/rocprim/device/device_scan.hpp:99:16: error: no matching function for call to 'ExclusiveScan'
        return ExclusiveScan(
               ^~~~~~~~~~~~~
/opt/rocm/include/hipcub/rocprim/device/device_scan.hpp:113:16: note: candidate function template not viable: no known conversion from '::hipcub::Sum' to 'int' for 7th argument
    hipError_t ExclusiveScan(void *d_temp_storage,
               ^
/opt/rocm/include/hipcub/rocprim/device/device_scan.hpp:99:16: error: no matching function for call to 'ExclusiveScan'
        return ExclusiveScan(
               ^~~~~~~~~~~~~
/opt/rocm/include/hipcub/rocprim/device/device_scan.hpp:113:16: note: candidate function template not viable: no known conversion from '::hipcub::Sum' to 'int' for 7th argument
    hipError_t ExclusiveScan(void *d_temp_storage,
               ^
3 errors generated.

Submitting a PR shortly

make fails on Nvidia machines

$ cmake -DCMAKE_BUILD_TYPE=Release -DCMAKE_MODULE_PATH=$HIP_PATH/cmake 
-DCMAKE_CXX_COMPILER=/home/k/software/cuda/9.0//bin/g++  
 -DCMAKE_INSTALL_PREFIX=/home/k/software/hipCUB ../
...
-- Found GTest: /home/k/source/hipCUB/build/gtest/lib/libgtest.so  
-- ******** Summary ********
-- General:
--   System                : Linux
--   HIP ROOT              : /home/k/software/hip
--   C++ compiler          : /home/k/software/cuda/9.0//bin/g++
--   C++ compiler version  : 6.5.0
--   CXX flags             : -Wno-unknown-pragmas -Wno-deprecated-declarations -Wall -Wextra
--   Build type            : Release
--   Install prefix        : /home/k/software/hipCUB
-- 
--   BUILD_TEST            : ON
-- Configuring done
-- Generating done
-- Build files have been written to: /home/k/source/hipCUB/build

$ make
...
[ 74%] Building NVCC (Device) object test/hipcub/CMakeFiles/test_hipcub_device_histogram.dir/test_hipcub_device_histogram_generated_test_hipcub_device_histogram.cpp.o
/usr/include/c++/6/tuple: In instantiation of ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_MoveConstructibleTuple() [with _UElements = {std::tuple<int, int, int>}; bool <anonymous> = true; _Elements = {int, int, int}]’:
/usr/include/c++/6/tuple:626:248:   required by substitution of ‘template<class ... _UElements, typename std::enable_if<(((std::_TC<(sizeof... (_UElements) == 1), int, int, int>::_NotSameTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), int, int, int>::_MoveConstructibleTuple<_UElements ...>()) && std::_TC<(1ul == sizeof... (_UElements)), int, int, int>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && (3ul >= 1)), bool>::type <anonymous> > constexpr std::tuple< <template-parameter-1-1> >::tuple(_UElements&& ...) [with _UElements = {std::tuple<int, int, int>}; typename std::enable_if<(((std::_TC<(sizeof... (_UElements) == 1), int, int, int>::_NotSameTuple<_UElements ...>() && std::_TC<(1ul == sizeof... (_UElements)), int, int, int>::_MoveConstructibleTuple<_UElements ...>()) && std::_TC<(1ul == sizeof... (_UElements)), int, int, int>::_ImplicitlyMoveConvertibleTuple<_UElements ...>()) && (3ul >= 1)), bool>::type <anonymous> = <missing>]’
/usr/include/c++/6/tuple:1396:58:   required from ‘constexpr std::tuple<typename std::__decay_and_strip<_Elements>::__type ...> std::make_tuple(_Elements&& ...) [with _Elements = {int, int, int}]’
/home/k/source/hipCUB/test/hipcub/test_hipcub_device_histogram.cpp:49:106:   required from here
/usr/include/c++/6/tuple:483:67: error: mismatched argument pack lengths while expanding ‘std::is_constructible<_Elements, _UElements&&>’
       return __and_<is_constructible<_Elements, _UElements&&>...>::value;
                                                                   ^~~~~
/usr/include/c++/6/tuple:484:1: error: body of constexpr function ‘static constexpr bool std::_TC<<anonymous>, _Elements>::_MoveConstructibleTuple() [with _UElements = {std::tuple<int, int, int>}; bool <anonymous> = true; _Elements = {int, int, int}]’ not a return-statement
     }
 ^
 ...

Additional info:

$ git log -1
commit 325a9e61701d9b729a4585cce9c5d804f1fb368f (HEAD -> master, origin/master)
Merge: 5f73bda 218af7e
Author: Yoichi Yoshida <[email protected]>
Date:   Tue Jul 30 13:38:02 2019 -0600

    Merge pull request #37 from yoichiyoshida/gfx908
    
    adding gfx908 build target

$ hipconfig 
HIP version  : 1.5.19356-29d09eb2

== hipconfig
HIP_PATH     : /home/k/software/hip
HIP_PLATFORM : nvcc
CPP_CONFIG   :  -D__HIP_PLATFORM_NVCC__=  -I/home/k/software/hip/include -I/home/k/software/cuda/9.0//include

== nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

=== Environment Variables
PATH=/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/bin/intel64:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/bin:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin:/home/k/software/intel/debugger_2019/gdb/intel64/bin:/home/k/software/pgi/18.10/linux86-64/18.10/bin:/home/k/software/cuda/9.0//bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin:/home/k/software/llvm/6.0.1//bin:/home/k/software/hip/bin
CUDAROOT=/home/k/software/cuda/9.0/
LD_LIBRARY_PATH=/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/libfabric/lib:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib/release:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/lib:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/ipp/lib/intel64:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/compiler/lib/intel64_lin:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/mkl/lib/intel64_lin:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/tbb/lib/intel64/gcc4.7:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/tbb/lib/intel64/gcc4.7:/home/k/software/intel/debugger_2019/libipt/intel64/lib:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/daal/lib/intel64_lin:/home/k/software/intel/compilers_and_libraries_2019.4.243/linux/daal/../tbb/lib/intel64_lin/gcc4.4:/home/k/software/cuda/9.0/lib64::/home/k/software/pgi/18.10/linux86-64/18.10/lib:/home/k/software/llvm/6.0.1//lib
HIP_PATH=/home/k/software/hip
CUDA_ROOT=/home/k/software/cuda/9.0/
CUDA_HOME=/home/k/software/cuda/9.0/
CUDA_PATH=/home/k/software/cuda/9.0/

== Linux Kernel
Hostname     : zam1283
Linux zam1283 4.18.0-25-generic #26~18.04.1-Ubuntu SMP Thu Jun 27 07:28:31 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.3 LTS
Release:	18.04
Codename:	bionic

exclusive scan not callable from device/device_scan.hpp

Error during PyTorch compilation:
/opt/rocm/hipcub/include/hipcub/rocprim/device/device_scan.hpp:113:16: note: must qualify identifier to find this declaration in dependent base class
hipError_t ExclusiveScan(void *d_temp_storage,
^
/opt/rocm/hipcub/include/hipcub/rocprim/device/device_scan.hpp:99:16: error: no matching function for call to 'ExclusiveScan'
return ExclusiveScan(
^~~~~~~~~~~~~
/opt/rocm/hipcub/include/hipcub/rocprim/device/device_scan.hpp:113:16: note: candidate function template not viable: no known conversion from '::hipcub::Sum' to 'int' for 7
th argument
hipError_t ExclusiveScan(void *d_temp_storage,
^
/opt/rocm/hipcub/include/hipcub/rocprim/device/device_scan.hpp:99:16: error: no matching function for call to 'ExclusiveScan'
return ExclusiveScan(
^~~~~~~~~~~~~
/opt/rocm/hipcub/include/hipcub/rocprim/device/device_scan.hpp:113:16: note: candidate function template not viable: no known conversion from '::hipcub::Sum' to 'int' for 7
th argument
hipError_t ExclusiveScan(void *d_temp_storage,

Type error

[Issue]: calling function min with (int, unsigned int) causing ambiguity

Problem Description

when we try to fix a compiler bug about function min by llvm/llvm-project#82956 we encountered build failure in hipCUB. Then we found the line that causes the compilation failure is

https://github.com/ROCm/hipCUB/blob/develop/hipcub/include/hipcub/backend/rocprim/device/device_spmv.hpp#L142

Basically this line is calling function min defined in clang header with (int, unsigned int). Previously the compiler only defined min(int, int). As we introduce min(unsigned, unsigned), min(long, long), min(unsigned long, unsigned long). The compiler could not choose among these candidates since there is no exact match.

We do not want to define min(int, unsigned int) because that would do implicit conversion from int to unsigned int then compare. This may cause unexpected results, e.g. min(-1, 1U) will return 1U. We want users to do explicit cast to indicate their intention. We think it is better than do silent implicit conversion.

Therefore we would like hipCUB to the line https://github.com/ROCm/hipCUB/blob/develop/hipcub/include/hipcub/backend/rocprim/device/device_spmv.hpp#L142

by modifying it to be

size_t block_size = min(num_cols, static_cast<int>(DeviceSpmv::CsrMVKernel_MaxThreads));

which will keep its original behavior, or otherwise as it suits.

Thanks.

Operating System

Ubuntu 22.04

CPU

any

GPU

AMD Radeon RX 7900 XTX

ROCm Version

ROCm 6.0.0

ROCm Component

hipCUB

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

set HIP_PLATFORM=nvcc for nvidai GPU, cmake error

[alice@prj47-rack-96 build]$ CXX=hipcc cmake -DBUILD_TEST ..
Parse error in command line argument: -DBUILD_TEST
Should be: VAR:type=value
CMake Error: No cmake script provided.
CMake Error: Problem processing arguments. Aborting.

[alice@prj47-rack-96 build]$ CXX=hipcc cmake -DBUILD_TEST=ON ..
CMake Warning (dev) at CMakeLists.txt:44 (set):
implicitly converting 'BOOLEAN' to 'STRING' type.
This warning is for project developers. Use -Wno-dev to suppress it.

-- CUB will be used as hipCUB's backend.
-- Downloading and building GTest.
-- Downloading/updating googletest
-- Configuring done
-- Generating done
-- Build files have been written to: /home/alice/test/hipCUB/build/googletest-download
Scanning dependencies of target googletest-download
[ 11%] Skipping update step for 'googletest-download'
[ 22%] Performing configure step for 'googletest-download'
CMake Error at /home/alice/test/hipCUB/build/googletest-download/googletest-download-prefix/src/googletest-download-stamp/googletest-download-configure-.cmake:49 (message):
Command failed: 1

'/home/alice/cern/nvidia-test/alice/sw/slc7_x86-64/CMake/v3.15.3-1/bin/cmake' '-DBUILD_GTEST=ON' '-DINSTALL_GTEST=ON' '-Dgtest_force_shared_crt=ON' '-DBUILD_SHARED_LIBS=ON' '-DCMAKE_INSTALL_PREFIX=/home/alice/test/hipCUB/build/gtest' '-GUnix Makefiles' '/home/alice/test/hipCUB/build/googletest-src'

Cmake issues for hipCUB

/opt/rocm/hipcub/lib/cmake/hipcub/hipcub-config.cmake contains:
set(hipcub_LIBRARIES hipcub-targets)
set(hipcub_LIBRARY hipcub-targets)
set(HIPCUB_LIBRARIES hipcub-targets)
set(HIPCUB_LIBRARY hipcub-targets)
set(hipcub_LIBRARIES hipcub-targets)
set(hipcub_LIBRARY hipcub-targets)

There is no hipcub-targets library and using the cmake config causes PyTorch compilation to fail. These sets need to be removed.

Saad: I think these issues are related to rocm-cmake. This might need a patch for that project.

macros and debug support

CubDebugExit(), CUB_MAX() and CUB_MIN() may be commonly used in CUB programs.

thread_operators.hpp misses include of "../util_type.hpp"

https://github.com/ROCmSoftwarePlatform/hipCUB/blob/master/hipcub/include/hipcub/rocprim/thread/thread_operators.hpp

WIthout said include, KeyValuePair is not defined and PyTorch compilation crashes.

HIP with hipCub fails build c++17

The following simple test case, just including the hipcub header, fails to compile with c++ std std to c++17:

#include <hipcub/hipcub.hpp>
int main(int argc, char** argv) {
    return 0;
}

compiled with

hipcc --amdgpu-target=gfx906 -o test test.hip.cpp -std=c++17

yields:

In file included from test.hip.cpp:1:
In file included from /opt/rocm/include/hipcub/hipcub.hpp:31:
In file included from /opt/rocm/include/hipcub/config.hpp:50:
In file included from /opt/rocm/include/rocprim/rocprim.hpp:33:
In file included from /opt/rocm/include/rocprim/intrinsics.hpp:29:
In file included from /opt/rocm/include/rocprim/intrinsics/thread.hpp:27:
In file included from /opt/rocm/include/rocprim/intrinsics/../detail/various.hpp:27:
In file included from /opt/rocm/include/rocprim/intrinsics/../detail/../types.hpp:32:
/opt/rocm/include/rocprim/types/tuple.hpp:386:19: error: no template named 'index_sequence' in namespace 'rocprim'; did you mean 'std::index_sequence'?
struct tuple_impl<::rocprim::index_sequence<Indices...>, Types...>
                  ^~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/8.3.0/include/g++-v8/utility:336:5: note: 'std::index_sequence' declared here
    using index_sequence = integer_sequence<size_t, _Idx...>;
    ^

and several similar errors.

WarpReduce::TempStorage always got ZERO after temp storage reduce

I have some simple codes attached here, which only need to run "bash builld.sh" in ROCM environment to reproduce this issue.

depthconv.tar.gz

The kernel code of WarpReduce::TempStorage is as following:

template <typename T>
__device__ __inline__ void CudaAtomicAddWithWarp(T* sum, T value) {
  int gbid = ((blockIdx.z * gridDim.y) + blockIdx.y) * gridDim.x + blockIdx.x;
  if(gbid == 4) CUDA_PRINT("==Before WrapReuce== gbid=%d, value=%2.1f", gbid, value);
#ifdef __HIPCC__
  typedef hipcub::WarpReduce<T> WarpReduce;
#else
  typedef cub::WarpReduce<T> WarpReduce;
#endif
  __shared__ typename WarpReduce::TempStorage temp_storage;
  value = WarpReduce(temp_storage).Sum(value);
  if(gbid == 4) CUDA_PRINT("==After WrapReuce== gbid=%d, value=%2.1f", gbid, value);
#ifdef __HIPCC__
  if (hipcub::LaneId() == 0) atomicAdd(sum, value);
#else
  if (cub::LaneId() == 0) atomicAdd(sum, value);
#endif
}

In CUDA and CUB environment I can get correct output as following:

[tid.x=<0> tid.y=<0> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<1> tid.y=<0> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<0> tid.y=<1> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<1> tid.y=<1> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<0> tid.y=<0> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=4.0
[tid.x=<1> tid.y=<0> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=3.0
[tid.x=<0> tid.y=<1> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=2.0
[tid.x=<1> tid.y=<1> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=1.0

But in ROCM and HIPCUB environment (verified on 4.0.1, 3.10, and 3.9.1), the output of value is always ZERO after wrapduce.

[tid.x=<0> tid.y=<0> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<1> tid.y=<0> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<0> tid.y=<1> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<1> tid.y=<1> bid.x=<1> bid.y=<1>]: ==Before WrapReuce== gbid=4, value=1.0
[tid.x=<0> tid.y=<0> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=0.0
[tid.x=<1> tid.y=<0> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=0.0
[tid.x=<0> tid.y=<1> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=0.0
[tid.x=<1> tid.y=<1> bid.x=<1> bid.y=<1>]: ==After WrapReuce== gbid=4, value=0.0

I also tried to change the code as following, but the result is also ZERO in HIPCUB

template <typename T>
__device__ __inline__ void CudaAtomicAddWithWarp(T* sum, T value) {
  int gbid = ((blockIdx.z * gridDim.y) + blockIdx.y) * gridDim.x + blockIdx.x;
  if(gbid == 4) CUDA_PRINT("==Before WrapReuce== gbid=%d, value=%2.1f", gbid, value);
#ifdef __HIPCC__
  typedef hipcub::WarpReduce<T, 64> WarpReduce;
#else
  typedef cub::WarpReduce<T> WarpReduce;
#endif
  __shared__ typename WarpReduce::TempStorage temp_storage;
  // value = WarpReduce(temp_storage).Sum(value);
  value = WarpReduce(temp_storage).Reduce(value, hipcub::Sum());
  if(gbid == 4) CUDA_PRINT("==After WrapReuce== gbid=%d, value=%2.1f", gbid, value);
#ifdef __HIPCC__
  if (hipcub::LaneId() == 0) atomicAdd(sum, value);
#else
  if (cub::LaneId() == 0) atomicAdd(sum, value);
#endif
}

Missing cub::DeviceMemcpy::Batched

cub::DeviceMemcpy::Batched was introduce in https://togithub.com/NVIDIA/cub/pull/359 and is missing from hipcub and rocprim

Segfault / "invalid device function" in `DeviceReduce::Sum`

I am trying to use DeviceReduce::Sum but:

with ROCm 5.4.3, I get a segfault,
with ROCm 5.6.1 and 5.7.0, I get HIP error: invalid device function on the line where actual reduction takes place.

The context here is to add an initial reduction support for AMD GPUs in Chapel (chapel-lang.org), so my actual use case is a bit complicated. However, in my system I was able to reproduce the behavior with a standalone test based on this unit test. Full code is at the end of my post.

I have:

> hipcc --version
HIP version: 5.6.31062-73ed8adfd

and compile the code with

> hipcc --std=c++14 reduceTest.cpp

I am trying to run this on an MI250x.

I can provide more details.

Reproducer

#include <iostream>
#include <vector>
#include <hip/hip_runtime.h>
#include <hip/hip_runtime_api.h>
#include <hip/hip_common.h>
#include <hipcub/hipcub.hpp>

#define HIP_CHECK(condition)         \
{                                    \
  hipError_t error = condition;    \
  if(error != hipSuccess){         \
    std::cout << "HIP error: " << hipGetErrorString(error) << " line: " << __LINE__ << std::endl; \
    exit(error); \
  } \
}

int main(int argc, char* argv[]) {
  using T = int;
  using U = int;
  int device_id = 0;
  HIP_CHECK(hipSetDevice(device_id));

  size_t size = 1000;

  bool debug_synchronous = true;

  hipStream_t stream = 0; // default

  // Generate data
  std::vector<T> input(1, (U) 0.0f);
  std::vector<U> output(1, (U) 0.0f);

  T * d_input;
  U * d_output;
  HIP_CHECK(hipMalloc(&d_input, input.size() * sizeof(T)));
  HIP_CHECK(hipMalloc(&d_output, output.size() * sizeof(U)));
  HIP_CHECK(
      hipMemcpy(
        d_input, input.data(),
        input.size() * sizeof(T),
        hipMemcpyHostToDevice
        )
      );
  HIP_CHECK(hipDeviceSynchronize());

  // Calculate expected results on host
  U expected = U(0.0f);
  for(unsigned int i = 0; i < input.size(); i++)
  {
    expected = expected + (U) input[i];
  }

  // temp storage
  size_t temp_storage_size_bytes;
  void * d_temp_storage = nullptr;
  // Get size of d_temp_storage
  HIP_CHECK(
      hipcub::DeviceReduce::Sum(
        d_temp_storage, temp_storage_size_bytes,
        d_input, d_output, input.size(),
        stream, debug_synchronous
        )
      );

  // temp_storage_size_bytes must be >0
  assert(temp_storage_size_bytes > 0U);

  // allocate temporary storage
  HIP_CHECK(hipMalloc(&d_temp_storage, temp_storage_size_bytes));
  HIP_CHECK(hipDeviceSynchronize());

  // Run
  HIP_CHECK(
      hipcub::DeviceReduce::Sum(
        d_temp_storage, temp_storage_size_bytes,
        d_input, d_output, input.size(),
        stream, debug_synchronous
        )
      );
  HIP_CHECK(hipPeekAtLastError());
  HIP_CHECK(hipDeviceSynchronize());

  // Copy output to host
  HIP_CHECK(
      hipMemcpy(
        output.data(), d_output,
        output.size() * sizeof(U),
        hipMemcpyDeviceToHost
        )
      );
  HIP_CHECK(hipDeviceSynchronize());

  hipFree(d_input);
  hipFree(d_output);
  hipFree(d_temp_storage);

}

backend/rocprim/hipcub.hpp does not include device/device_partition.hpp

This creates incompatibility with CUB.
When the code references cub::DevicePartition and only includes cub.cuh, it compiles with CUB but not hipCUB.

In this case, the code also cannot be hipified and human intervention is needed.

hipCUB build fails with 4.3

According to instruction, it was building on 4.1 but fails on 4.3:

4.3:


[root@localhost build]# cmake ..
-- The CXX compiler identification is unknown
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - failed
-- Check for working CXX compiler: /opt/rocm/hip/bin/hipcc
-- Check for working CXX compiler: /opt/rocm/hip/bin/hipcc - broken
CMake Error at /usr/share/cmake/Modules/CMakeTestCXXCompiler.cmake:59 (message):
  The C++ compiler

    "/opt/rocm/hip/bin/hipcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /root/ROCm-4.3/hipCUB/build/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/gmake -f Makefile cmTC_60632/fast && /usr/bin/gmake  -f CMakeFiles/cmTC_60632.dir/build.make CMakeFiles/cmTC_60632.dir/build
    gmake[1]: Entering directory '/root/ROCm-4.3/hipCUB/build/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_60632.dir/testCXXCompiler.cxx.o
    /opt/rocm/hip/bin/hipcc    -o CMakeFiles/cmTC_60632.dir/testCXXCompiler.cxx.o -c /root/ROCm-4.3/hipCUB/build/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    clang-13: error: cannot find ROCm device library. Provide its path via --rocm-path or --rocm-device-lib-path, or pass -nogpulib to build without ROCm device library.
    gmake[1]: *** [CMakeFiles/cmTC_60632.dir/build.make:78: CMakeFiles/cmTC_60632.dir/testCXXCompiler.cxx.o] Error 1
    gmake[1]: Leaving directory '/root/ROCm-4.3/hipCUB/build/CMakeFiles/CMakeTmp'
    gmake: *** [Makefile:127: cmTC_60632/fast] Error 2





  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:32 (project)


-- Configuring incomplete, errors occurred!
See also "/root/ROCm-4.3/hipCUB/build/CMakeFiles/CMakeOutput.log".
See also "/root/ROCm-4.3/hipCUB/build/CMakeFiles/CMakeError.log".
[root@localhost build]#

How to porting cub::BaseTraits in ROCM platform

Hi There,

I'm working on poring a deep learning framework from CUDA to HIP, and facing some codes with no idea how to change it from CUDA to HIP? Would you please kindly help to support? Thanks in advance :)

// set cub base traits in order to handle float16
namespace cub {
template <>
struct NumericTraits<paddle::platform::float16>
    : BaseTraits<FLOATING_POINT, true, false, uint16_t,
                 paddle::platform::float16> {};
}  // namespace cub

I changed the above lines of code to the following lines, while compile errors thrown as hipCUB is actually a wrapper of rocPRIM on HIP platform.

// set cub base traits in order to handle float16
namespace hipcub {
template <>
struct NumericTraits<paddle::platform::float16>
    : BaseTraits<FLOATING_POINT, true, false, uint16_t,
                 paddle::platform::float16> {};
}  // namespace cub

CuPy Issue reported to hip

ROCm/HIP#2247

rocm / hipcub Goto Github PK

hipcub's Introduction

hipCUB

Documentation

Requirements

Build and install

HIP on Windows

Using hipCUB

Running unit tests

Using custom seeds for the tests

Running benchmarks

Support

Contributing

hipcub's People

Contributors

Stargazers

Watchers

Forkers

hipcub's Issues

Problem Description

Operating System

CPU

GPU

ROCm Version

ROCm Component

Steps to Reproduce

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

Additional Information

Recommend Projects

Recommend Topics

Recommend Org