Comments (3)
256 should be optimal for the default case, but @nolmoonen will review.
from rocthrust.
Hi @cwsmith, I agree with Colin here: a block size of 256 is generally a good choice for AMD cards. It would be helpful to know why you would want to change it.
As it stands, it is not possible to change the kernel configuration for `parallel_for` with a custom policy (or in any other way). rocThrust must maintain compatibility with Thrust, which also does not offer this functionality.
Perhaps you would want to have a look at rocPRIM's `device_transform`, which does the same thing as `parallel_for` but does take a kernel configuration as a parameter.
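To make the "kernel configuration" point concrete, here is a minimal host-side sketch of how a block size and an items-per-thread count translate into the number of blocks launched. The names `kernel_config`, `number_of_blocks`, `default_like` and `smaller` are hypothetical and exist only for this illustration; they are not part of rocThrust or rocPRIM.

```cpp
#include <cstddef>

// Hypothetical compile-time kernel configuration (illustration only).
// It carries the two knobs discussed in this thread: threads per block
// and the number of elements each thread processes.
template <unsigned int BlockSize, unsigned int ItemsPerThread>
struct kernel_config
{
    static constexpr unsigned int block_size       = BlockSize;
    static constexpr unsigned int items_per_thread = ItemsPerThread;
};

// Number of blocks needed to cover `size` elements when each block
// handles block_size * items_per_thread elements.
template <class Config>
constexpr std::size_t number_of_blocks(std::size_t size)
{
    constexpr std::size_t items_per_block
        = static_cast<std::size_t>(Config::block_size) * Config::items_per_thread;
    return (size + items_per_block - 1) / items_per_block; // ceiling division
}

using default_like = kernel_config<256, 1>; // 256 threads per block, as discussed above
using smaller      = kernel_config<128, 4>; // a candidate to compare against

// 1000 elements: 256 per block needs 4 blocks, 512 per block needs 2.
static_assert(number_of_blocks<default_like>(1000) == 4, "ceil(1000 / 256)");
static_assert(number_of_blocks<smaller>(1000) == 2, "ceil(1000 / 512)");
```

With rocPRIM itself, to the best of my knowledge, the equivalent knob is a config type passed as the first template argument, along the lines of `rocprim::transform<rocprim::transform_config<128, 4>>(input, output, size, op, stream)`. Please check the exact signature against the rocPRIM version you have installed.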
Hello @doctorcolinsmith and @nolmoonen.
I was looking at register usage for some of our kernels and wanted to experiment with different block sizes.
Thank you for the info on the default [roc]Thrust block size and the link to the rocPRIM API.