Comments (2)
Yep, this is a trade-off that I made at the outset, for two reasons:
- First, figuring out an appropriate launch config for strided accesses would add considerable overhead to an already too-hot code path leading up to arithmetic. (cf. numarray)
- Second, the existing code generation infrastructure won't easily stretch to true n-dimensional arrays. (https://github.com/inducer/loopy was intended, among other things, to address that use case)
It's obviously embarrassing that adding different-stride arrays silently gives wrong results, and I've been meaning to at least add a (C-based, to avoid a big impact on said hot code path) check to raise an error. PRs are definitely welcome along those lines.
That said, even if this issue were fixed, eagerly evaluating array calculations is still not a winning move, so I'm not super likely to devote much more attention than adding that check, cf. https://github.com/inducer/pytato/ for a possible remedy.
from pyopencl.
Thanks for the clarification. It is clear to me that the array arithmetic is not optimal in regard with performance - but convenient to use. I consider moving my code to pytorch or JAX for this purpose.
To add a check that throws an error if memory layout is different, I ended up patching PyOpenCL
_pyopencl_array_get_broadcasted_binary_op_result_SAVED = None
def patch_pyopencl_array():
import pyopencl.array
global _pyopencl_array_get_broadcasted_binary_op_result_SAVED
if _pyopencl_array_get_broadcasted_binary_op_result_SAVED is None:
_pyopencl_array_get_broadcasted_binary_op_result_SAVED = pyopencl.array._get_broadcasted_binary_op_result
def _get_broadcasted_binary_op_result_PATCHED(obj1, obj2, cq, dtype_getter=pyopencl.array._get_common_dtype):
if obj1.strides != obj2.strides:
raise NotImplementedError('Memory layouts (strides) of arrays are not the same')
else:
return _pyopencl_array_get_broadcasted_binary_op_result_SAVED(
obj1, obj2, cq, dtype_getter
)
pyopencl.array._get_broadcasted_binary_op_result = _get_broadcasted_binary_op_result_PATCHED
Essentially, it adds a check that the strides are equal. This check is performed for add, mul and the other binary arithmetic operators, but not for the inplace operators. The inplace operators have all a explicit tests for matching shape, a little bit of refactoring would be required to move this into a common method that incorporates the additional check for same strides. Will such a PR be acceptable? Compared to the code already in place I thinks the overhead is acceptable to avoid silent failures. I am wondering if it sensible to make the check more general, similar to strides_equal
Gregor
from pyopencl.
Related Issues (20)
- Warn when using SVM whose deallocation synchronizes on different queue
- enqueue_copy does not support IMAGE2D_ARRAY HOT 4
- Possible memory leak when collecting events for profiling HOT 5
- Allow None in wait_for lists HOT 2
- Memory issue on cauculations of pyopencl.array.Array HOT 4
- Crash in test_slice on Intel CL HOT 7
- abrupt in code HOT 3
- SVM `bind_to_queue` and `unbind_queue` cannot be safely used
- Array allocation failure HOT 2
- TypeError: __class__ assignment: 'KernelWithCustomEnqueue' object layout differs from 'pyopencl._cl.Kernel' HOT 2
- New np.isscalar checks in array arithmetic break operations with unit-length arrays HOT 1
- Array: check that queues match
- source tarball contains binary artefact _skbuild/linux-x86_64-3.11/cmake-install/pyopencl/_cl.cpython-311-x86_64-linux-gnu.so HOT 1
- test_clmath.py::test_fmod fails on i386 using pocl 3.1 built with llvm 15 HOT 2
- Disable kernel caching from within pyopencl code HOT 1
- Expose create_buffer_gc in a header HOT 4
- Windows Intel CL Github CI fails HOT 2
- Fails to build with OpenCL 3.0 headers HOT 3
- UHD Graphics 600 | Calling kernel + enqueue_copy more than once, results in OUT_OF_RESOURCES error or freeze HOT 4
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pyopencl.