
illuhad commented on June 12, 2024

Hi!

A couple of pointers:

  • The SYCL specification does not support non-scalar reductions. For example, you cannot reduce a 3D array along one axis to obtain a 2D array. This is independent of span<> support. For such a numpy-like operation, you would need a higher-level library (I don't know whether one exists), as this is beyond the scope of the language, which, after all, focuses on providing lower-level primitives that aid in building such high-level libraries.
  • What you can do is use multiple reduction objects in one kernel, e.g. q.parallel_for(range, sycl::reduction(/* specify 1st reduction */), sycl::reduction(/* specify 2nd reduction */), [...specify more reductions if desired...], kernel);. This is an appropriate solution if you know the number of reductions at compile time.
  • That the reduction result ends up in the first element of the buffer regardless of the accessor offset might be a bug. However, this API is non-standard anyway: the SYCL 2020 final specification changed the API to work in terms of buffer directly instead of accessor, and we have not yet updated our implementation. Of course, when working with a buffer you won't be able to specify an offset at all.
  • To solve this, I recommend using USM pointers instead of the buffer-accessor API, as this gives you much more control over where reduction results end up (you can simply pass in an arbitrary pointer with an arbitrary offset). Also, the USM memory model is generally more efficient than the buffer-accessor model because its overheads are lower.
  • In general, be aware that the sycl::reduction support built into the language is intended as a building block, not necessarily as a performance-portable algorithm. For example, sycl::reduction does not control how many input elements are reduced by a single work item (i.e., how many calls to combine() you have per work item). Depending on the hardware, however, this can be an extremely important tuning parameter!
  • Also, be aware that our sycl::reduction support is incomplete. As I have mentioned, it does not yet implement the SYCL 2020 final API, and it is also not supported on all compilation flows.
  • Our higher-level algorithms, hipsycl::algorithms::transform_reduce() and its cousin from our parallel STL offloading support, std::transform_reduce(std::execution::par_unseq, ...), are supported across all compilation flows and already handle the most important tuning parameters for you, so they generally perform better, at least out of the box. However, the transform_reduce() API only supports a single reduction.

from adaptivecpp.

eigenraum commented on June 12, 2024

Hi illuhad,

thanks for your reply; that is very helpful! I will investigate the use of USM instead of buffers/accessors and also have a look at the transform_reduce high-level algorithm that you mentioned.

Concerning the algorithmic problem: the issue is that I do not know the number of reductions at compile time, because it depends on intermediate results. That probably means I am best off enqueueing each reduction separately, right?

Best regards!
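One way to issue a runtime-determined number of reductions separately is sketched below using std::transform_reduce with std::execution::par_unseq, the standard-algorithm route mentioned in the reply above. Whether these calls are actually offloaded to a device depends on building with AdaptiveCpp's C++ standard parallelism (stdpar) support, which is an assumption here; with an ordinary host compiler they simply run on the CPU. The segment layout (an offsets vector holding N+1 boundaries) and the function name segment_sums_of_squares are hypothetical, chosen only for illustration.

```cpp
#include <cstddef>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: one sum of squares per segment of `data`.
// `segment_offsets` holds N+1 ascending boundaries, so the number of
// reductions (N) is only known at run time.
std::vector<double> segment_sums_of_squares(
    const std::vector<double>& data,
    const std::vector<std::size_t>& segment_offsets) {
  if (segment_offsets.size() < 2) {
    return {};  // fewer than two boundaries: nothing to reduce
  }
  std::vector<double> results(segment_offsets.size() - 1);
  for (std::size_t k = 0; k + 1 < segment_offsets.size(); ++k) {
    auto first = data.begin() + static_cast<std::ptrdiff_t>(segment_offsets[k]);
    auto last = data.begin() + static_cast<std::ptrdiff_t>(segment_offsets[k + 1]);
    // Each call blocks until its result is available, which is what lets a
    // caller decide on further reductions based on earlier results.
    results[k] = std::transform_reduce(
        std::execution::par_unseq, first, last, 0.0,
        std::plus<>{},                    // reduction operation: sum
        [](double x) { return x * x; });  // transform applied per element
  }
  return results;
}
```

For example, data {1, 2, 3, 4, 5, 6} with offsets {0, 2, 5, 6} yields three separate reductions: 5, 50 and 36. The trade-off is one independent reduction, and one synchronization point, per segment.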
