Persistent working buffer for scans (oneDPL issue, open, 4 comments)

al42and commented on July 23, 2024

Persistent working buffer for scans

danhoeflinger commented on July 23, 2024

@al42and
Thank you very much for the feedback. This is something we are aware of as a potentially beneficial feature for users who are interested in improved performance at the cost of some convenience. Your post very clearly lays out this demand.

My expectation is that if oneDPL were to support something like this, it would be in the context of kernel template APIs. It matches with the mindset behind kernel templates, which is to give the user more control in the pursuit of better performance and at the cost of some generality. We are considering this feature as well as others to prioritize performance within that effort.

al42and commented on July 23, 2024

Hi @danhoeflinger,

> My expectation is that if oneDPL were to support something like this, it would be in the context of kernel template APIs.

If you have a clear idea of what the API will look like, could you elaborate further, please? I can't understand how a runtime pointer can be passed this way. Or are you suggesting a flag along the lines of "persist the working buffer past the launch and reuse the old one if it exists"?

> We are considering this feature as well as others to prioritize performance within that effort.

To be clear: this is not a priority for us (GROMACS). So far, we are operating on an array with a fixed size of 8k elements, so it's a single kernel launch without any working buffers. But that size was set arbitrarily, so this problem could eventually become pressing if we go past 16k, and I decided to raise the issue proactively.
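For illustration only, the "persist and reuse" idea mentioned above could look roughly like the following host-side sketch. `PersistentScratch` is a hypothetical helper, not an oneDPL or SYCL type, and plain heap memory stands in for device memory here.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper (not part of oneDPL): keeps one scratch allocation
// alive across calls and reallocates only when a larger size is requested.
class PersistentScratch {
public:
    // Returns a buffer of at least `bytes` bytes. The previous buffer is
    // reused when it is already large enough; its contents are not preserved
    // when a larger buffer has to be allocated.
    void* acquire(std::size_t bytes) {
        if (bytes > capacity_) {
            buffer_.reset(new unsigned char[bytes]);
            capacity_ = bytes;
            ++allocations_;
        }
        return buffer_.get();
    }

    std::size_t capacity() const { return capacity_; }

    // Number of real allocations performed so far (useful for checking reuse).
    std::size_t allocations() const { return allocations_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_ = 0;
    std::size_t allocations_ = 0;
};
```

With such a cache, repeated scans of the same or a smaller size would pay for the allocation only once. The trade-off is that the memory stays reserved until the cache object is destroyed.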

danhoeflinger commented on July 23, 2024

@al42and
I don't have specific information at this time about what the API would look like, but reuse of temporary memory allocations is something we are considering. The "kernel template" APIs are not bound to the C++ standard library's parallel algorithms specification, and we envision API adjustments to support functionality like this. There are multiple possible approaches to supporting this feature. One option is to add an extra API that queries the temporary space required, plus extra runtime parameter(s) that accept externally allocated memory.
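As a rough sketch of that last option, the following plain C++ code shows a "query size, then pass scratch" pattern for a blocked inclusive scan. All names (`sketch::scan_temp_bytes`, `sketch::inclusive_scan`, `kBlock`) are hypothetical and run on the host only; this is not the oneDPL API, which has not been specified in this thread.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>

namespace sketch {  // hypothetical names, not oneDPL API

constexpr std::size_t kBlock = 4;  // deliberately tiny block size for illustration

// Phase 1: report how many bytes of scratch a scan over n ints needs
// (one partial sum per block).
inline std::size_t scan_temp_bytes(std::size_t n) {
    return ((n + kBlock - 1) / kBlock) * sizeof(int);
}

// Phase 2: inclusive scan that uses caller-provided scratch and never
// allocates. `temp` must be suitably aligned for int.
inline void inclusive_scan(const int* in, int* out, std::size_t n,
                           void* temp, std::size_t temp_bytes) {
    if (temp_bytes < scan_temp_bytes(n))
        throw std::invalid_argument("scratch buffer too small");
    int* block_sums = static_cast<int*>(temp);
    const std::size_t blocks = (n + kBlock - 1) / kBlock;

    // Step 1: scan each block independently and record its total.
    for (std::size_t b = 0; b < blocks; ++b) {
        int running = 0;
        const std::size_t end = std::min(n, (b + 1) * kBlock);
        for (std::size_t i = b * kBlock; i < end; ++i) {
            running += in[i];
            out[i] = running;
        }
        block_sums[b] = running;
    }

    // Step 2: convert block totals into exclusive offsets.
    int offset = 0;
    for (std::size_t b = 0; b < blocks; ++b) {
        const int total = block_sums[b];
        block_sums[b] = offset;
        offset += total;
    }

    // Step 3: add each block's offset to its elements.
    for (std::size_t b = 0; b < blocks; ++b) {
        const std::size_t end = std::min(n, (b + 1) * kBlock);
        for (std::size_t i = b * kBlock; i < end; ++i) out[i] += block_sums[b];
    }
}

}  // namespace sketch
```

A caller would allocate `scan_temp_bytes(n)` bytes once and pass the same buffer to every subsequent `inclusive_scan` call of that size or smaller, so the scan itself triggers no allocation.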

Separate from a potential oneDPL feature, there are ways to mitigate the performance penalty from these repeated temporary allocations currently available in the oneAPI DPC++ compiler. I suggest taking a look at the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable:
https://intel.github.io/llvm-docs/EnvironmentVariables.html#debugging-variables-for-level-zero-plugin
This environment variable configures the memory pool sizes used by the Level Zero USM allocator. For repeated oneDPL calls of the same size, this can help reduce the performance impact of the temporary allocation by reusing allocations from a memory pool. If you need help selecting values for this environment variable, we can work with you on your specific use case, but some experimentation may be necessary.

When we have more information, we will update here. Thanks again for the feedback.

al42and commented on July 23, 2024

> I suggest taking a look at the SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR environment variable:
> https://intel.github.io/llvm-docs/EnvironmentVariables.html#debugging-variables-for-level-zero-plugin

Unfortunately, that will not work for other backends, and even for L0 it is not a user-friendly solution.

But thanks for suggesting it as a workaround; it could definitely be helpful during development.
