
Comments (4)

danhoeflinger commented on July 23, 2024

@al42and
Thank you very much for the feedback. This is something we are aware of as a potentially beneficial feature for users who are interested in improved performance at the cost of some convenience. Your post very clearly lays out this demand.

My expectation is that if oneDPL were to support something like this, it would be in the context of the kernel template APIs. It matches the mindset behind kernel templates, which is to give the user more control in pursuit of better performance at the cost of some generality. We are considering this feature, among others, as part of prioritizing performance within that effort.


al42and commented on July 23, 2024

Hi @danhoeflinger,

> My expectation is that if oneDPL were to support something like this, it would be in the context of kernel template APIs.

If you have a clear idea of what the API will look like, could you elaborate further, please? I don't see how a runtime pointer could be passed this way. Or are you suggesting a flag like "persist the working buffer past the launch and reuse the old one if it exists"?

> We are considering this feature as well as others to prioritize performance within that effort.

To be clear: this is not a priority for us (GROMACS). So far, we operate on an array with a fixed size of 8k elements, so it's a single kernel launch without any working buffers. But that size was chosen arbitrarily, so this problem may become pressing if we ever go past 16k, and I decided to raise the issue proactively.


danhoeflinger commented on July 23, 2024

@al42and
I don't have specific information at this time about what the API would look like, but reusing temporary memory allocations is something we are considering. The "kernel template" APIs are not bound to the C++ standard library's parallel algorithms specification, and we envision API adjustments to support functionality like this. There are multiple possible approaches; one option is to add an extra API that queries the required temporary space, plus extra runtime parameter(s) that accept externally allocated memory.
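The "query the temporary space, then pass in externally allocated memory" option can be pictured as a two-phase interface. The sketch below is purely illustrative: `hypothetical_reduce_scratch_size` and `hypothetical_reduce` are made-up names, not oneDPL functions, host `std::vector` stands in for device USM memory, and the one-partial-sum-per-256-element-block scratch layout is an assumption chosen only to make the size query concrete.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL two-phase interface (not an existing oneDPL API).
// Phase 1: report how many scratch elements are needed for n inputs.
// Assumption: one partial result per block of `block_size` inputs.
std::size_t hypothetical_reduce_scratch_size(std::size_t n, std::size_t block_size = 256) {
    return (n + block_size - 1) / block_size;
}

// Phase 2: run the operation using caller-provided scratch space, so the
// library never allocates per call. A blocked host sum stands in for a
// multi-kernel device reduction.
long long hypothetical_reduce(const std::vector<int>& in, std::vector<long long>& scratch,
                              std::size_t block_size = 256) {
    const std::size_t blocks = hypothetical_reduce_scratch_size(in.size(), block_size);
    assert(scratch.size() >= blocks && "caller must provide enough scratch space");
    // First pass: one partial sum per block, written into the external buffer.
    for (std::size_t b = 0; b < blocks; ++b) {
        long long partial = 0;
        const std::size_t end = std::min(in.size(), (b + 1) * block_size);
        for (std::size_t i = b * block_size; i < end; ++i) partial += in[i];
        scratch[b] = partial;
    }
    // Second pass: combine the partial sums.
    long long total = 0;
    for (std::size_t b = 0; b < blocks; ++b) total += scratch[b];
    return total;
}
```

In this pattern the caller queries the size once, allocates the scratch buffer once, and reuses it across every call of the same (or smaller) size, which is the allocation cost the thread is concerned about.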

Separately from a potential oneDPL feature, the oneAPI DPC++ compiler already offers ways to mitigate the performance penalty of these repeated temporary allocations. I suggest taking a look at the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable:
https://intel.github.io/llvm-docs/EnvironmentVariables.html#debugging-variables-for-level-zero-plugin
This environment variable configures the memory pool sizes used by the Level Zero USM allocator. For repeated oneDPL calls of the same size, it can reduce the cost of the temporary allocations by reusing allocations from a memory pool. If you need help selecting values for this environment variable, we can work with you on your specific use case, but some experimentation may be necessary.
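The benefit of pooling for repeated same-size calls can be illustrated with a tiny size-keyed cache. This is a hypothetical sketch of the general idea only (the `ScratchCache` class is made up, host `std::vector<char>` stands in for USM memory), and it is not how the Level Zero plugin's pool is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// HYPOTHETICAL size-keyed scratch cache: the first request for a given byte
// count allocates a buffer; later requests for the same size return the same
// buffer, so repeated calls of identical size allocate only once.
class ScratchCache {
public:
    std::vector<char>& get(std::size_t bytes) {
        auto it = buffers_.find(bytes);
        if (it == buffers_.end()) {
            ++allocations_;  // count real allocations to make reuse observable
            it = buffers_.emplace(bytes, std::vector<char>(bytes)).first;
        }
        return it->second;  // std::map references stay valid across insertions
    }
    std::size_t allocations() const { return allocations_; }

private:
    std::map<std::size_t, std::vector<char>> buffers_;
    std::size_t allocations_ = 0;
};
```

A real pool also has to bound its total size and handle requests of differing sizes, which is part of what the environment variable's settings control.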

When we have more information, we will update here. Thanks again for the feedback.


al42and commented on July 23, 2024

> I suggest taking a look at the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable:
> intel.github.io/llvm-docs/EnvironmentVariables.html#debugging-variables-for-level-zero-plugin

That will not work for other backends, unfortunately, and even for L0 it is not a user-friendly solution.

But thanks for suggesting it as a workaround; it could definitely be helpful during development.
