Comments (7)
Good catch. Can you provide a patch?
from pyopencl.
Works for me on one-dimensional arrays, but I can't run all of the provided tests because my laptop only has two buggy OpenCL implementations, pocl and beignet. The tests crash both with and without my changes.
diff --git a/pyopencl/array.py b/pyopencl/array.py
index 71ce3d8..5f4cddd 100644
--- a/pyopencl/array.py
+++ b/pyopencl/array.py
@@ -1998,19 +1998,22 @@ def multi_put(arrays, dest_indices, dest_shape=None, out=None, queue=None,

     chunk_size = _builtin_min(vec_count, 10)

-    def make_func_for_chunk_size(chunk_size):
+    vals_count = arrays[0].size
+
+    def make_func_for_chunk_size(chunk_size, vc=0):
         knl = elementwise.get_put_kernel(
                 context,
-                a_dtype, dest_indices.dtype, vec_count=chunk_size)
+                a_dtype, dest_indices.dtype, vec_count=chunk_size, vals_count=vc)
         return knl

-    knl = make_func_for_chunk_size(chunk_size)
+    knl = make_func_for_chunk_size(chunk_size, vals_count)

     for start_i in range(0, len(arrays), chunk_size):
         chunk_slice = slice(start_i, start_i+chunk_size)
+        vals_count = arrays[start_i].size

         if start_i + chunk_size > vec_count:
-            knl = make_func_for_chunk_size(vec_count-start_i)
+            knl = make_func_for_chunk_size(vec_count-start_i, vals_count)

         gs, ls = dest_indices.get_sizes(queue,
                 knl.get_work_group_info(
diff --git a/pyopencl/elementwise.py b/pyopencl/elementwise.py
index 398936f..a4973fd 100644
--- a/pyopencl/elementwise.py
+++ b/pyopencl/elementwise.py
@@ -413,7 +413,7 @@ def get_take_put_kernel(context, dtype, idx_dtype, with_offsets, vec_count=1):


 @context_dependent_memoize
-def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
+def get_put_kernel(context, dtype, idx_dtype, vec_count=1, vals_count=0):
     ctx = {
             "idx_tp": dtype_to_ctype(idx_dtype),
             "tp": dtype_to_ctype(dtype),
@@ -429,9 +429,12 @@ def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
                 for i in range(vec_count)
             ]

+    idxs = ('i','0','i%'+str(vals_count))
+    idx = idxs[min(vals_count, 2)]
+
     body = (
             "%(idx_tp)s dest_idx = gmem_dest_idx[i];\n" % ctx
-            + "\n".join("dest%d[dest_idx] = src%d[i];" % (i, i)
+            + "\n".join("dest%d[dest_idx] = src%d[%s];" % (i, i, idx)
                 for i in range(vec_count)))

     return get_elwise_kernel(context, args, body, name="put")
I'm not quite happy with this, for two reasons:
- The code is a little wasteful in terms of read bandwidth. Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.
- The code is a little hard to decipher, with the computed index into a pre-constructed tuple. I'd prefer somewhat plainer (if marginally slower) if-then code. I'm also not convinced that "modulo mode" is sufficiently general to merit being included in a general-purpose array toolkit.
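For comparison, the index selection from the patch could be spelled out as plain if-then code. This is only a sketch: `source_index_expr` is a hypothetical helper name, not part of pyopencl, and it keeps the patch's convention that `vals_count=0` means "sizes assumed to match".

```python
def source_index_expr(vals_count):
    """Return the C expression used to index the source array in the put kernel.

    Hypothetical helper mirroring the tuple lookup in the patch above.
    """
    if vals_count == 0:
        # Default from the patch: one source value per index, read src[i].
        return "i"
    elif vals_count == 1:
        # Single value: every work item reads src[0].
        return "0"
    else:
        # "Modulo mode": wrap around the source array.
        return "i%" + str(vals_count)


for n in (0, 1, 3):
    print(n, "->", "src0[%s]" % source_index_expr(n))
```

The three branches return the same strings as `idxs[min(vals_count, 2)]` in the patch, so the generated kernel body would be unchanged.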
> Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.

Agreed. But hardcoding the value into the kernel would require recompilation each time. Two-step broadcasting over local data may be a solution. Is there currently any way to do this with pyopencl? If not, an extension to ElementwiseKernel that works with an array and a single value would also be helpful.
Passing a kernel argument seems like a good solution--you do have to bake the type into the kernel (but you have to do that anyway), but the value can be provided at runtime.
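A rough sketch of what that might look like: the kernel source depends only on the C types, while the scalar itself is an ordinary by-value argument. `scalar_put_source` and the argument names `dest` and `value` are made up for illustration; the strings are in the argument/operation form that `pyopencl.elementwise.ElementwiseKernel` accepts, but nothing here has been run on a device.

```python
def scalar_put_source(value_ctype, idx_ctype):
    """Build (arguments, operation) strings for a scalar-broadcast put.

    Hypothetical helper: only the C types are baked into the source.
    The scalar is passed by value when the kernel is called, so changing
    it does not require a recompile.
    """
    # The index array is listed first on the assumption that the kernel
    # should run once per index rather than once per destination entry.
    arguments = "%s *gmem_dest_idx, %s *dest, %s value" % (
        idx_ctype, value_ctype, value_ctype)
    # No source array is read; each work item writes the same scalar.
    operation = "dest[gmem_dest_idx[i]] = value;"
    return arguments, operation


arguments, operation = scalar_put_source("float", "int")
print(arguments)
print(operation)
```

These strings could then be handed to something like `ElementwiseKernel(context, arguments, operation, name="put_scalar")`, where the kernel name is again only a placeholder.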
@inducer I'm currently working on a solution to this. If we don't want to use "modulo mode" in the general case, does it make sense to throw an Exception if the indices array and the source array have different lengths (excluding, of course, the case where the source array has length == 1, in which case we fill all indices with the value)? Without "modulo mode", behavior for a case with, say, 2 values and 3 indices would be undefined and could potentially segfault or write garbage memory to the destination array (as it does in my build). However, I believe adding an Exception would not be backwards compatible.
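The proposed check could be sketched on the host side roughly as follows. `check_put_sizes` is a hypothetical name, and `ValueError` is an assumption about which exception type would be appropriate.

```python
def check_put_sizes(n_indices, n_values):
    """Decide how a put with the given sizes would be handled.

    Hypothetical helper: returns "broadcast" for a single value,
    "elementwise" when the sizes match, and raises otherwise.
    """
    if n_values == 1:
        # One value: fill every index with it.
        return "broadcast"
    if n_values == n_indices:
        # One value per index.
        return "elementwise"
    # Anything else would read past the end of the source array.
    raise ValueError(
        "put: got %d values for %d indices; sizes must match "
        "or there must be exactly one value" % (n_values, n_indices))


print(check_put_sizes(3, 3))
print(check_put_sizes(3, 1))
try:
    check_put_sizes(3, 2)
except ValueError as exc:
    print("rejected:", exc)
```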
Upon reconsidering this, I'm OK with (and will take patches for) either (a) matching numpy behavior exactly (but doing the broadcast via kernel arguments) or (b) only taking care of the same-size case and throwing an error in all other cases.
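For reference, the numpy behavior that option (a) would match can be seen with `numpy.put`, which repeats the values as needed when there are fewer values than indices. The arrays below are made up for illustration.

```python
import numpy as np

# Fewer values than indices: numpy cycles through the values.
dest = np.zeros(5, dtype=np.int64)
np.put(dest, [0, 2, 4], [7, 9])
print(dest.tolist())  # [7, 0, 9, 0, 7]

# A single scalar is written to every index.
filled = np.zeros(4, dtype=np.int64)
np.put(filled, [1, 3], 5)
print(filled.tolist())  # [0, 5, 0, 5]
```

The first case is the "modulo mode" discussed above; the second is the scalar broadcast.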