Comments (7)

inducer commented on May 18, 2024

Good catch. Can you provide a patch?

from pyopencl.

inferrna commented on May 18, 2024

Works for me on one-dimensional arrays, but I can't run all the provided tests because my laptop only has two buggy OpenCL implementations, pocl and beignet. The test run crashes with or without my changes.

diff --git a/pyopencl/array.py b/pyopencl/array.py
index 71ce3d8..5f4cddd 100644
--- a/pyopencl/array.py
+++ b/pyopencl/array.py
@@ -1998,19 +1998,22 @@ def multi_put(arrays, dest_indices, dest_shape=None, out=None, queue=None,

     chunk_size = _builtin_min(vec_count, 10)

-    def make_func_for_chunk_size(chunk_size):
+    vals_count = arrays[0].size
+
+    def make_func_for_chunk_size(chunk_size, vc=0):
         knl = elementwise.get_put_kernel(
                 context,
-                a_dtype, dest_indices.dtype, vec_count=chunk_size)
+                a_dtype, dest_indices.dtype, vec_count=chunk_size, vals_count=vc)
         return knl

-    knl = make_func_for_chunk_size(chunk_size)
+    knl = make_func_for_chunk_size(chunk_size, vals_count)

     for start_i in range(0, len(arrays), chunk_size):
         chunk_slice = slice(start_i, start_i+chunk_size)

+        vals_count = arrays[start_i].size
         if start_i + chunk_size > vec_count:
-            knl = make_func_for_chunk_size(vec_count-start_i)
+            knl = make_func_for_chunk_size(vec_count-start_i, vals_count)

         gs, ls = dest_indices.get_sizes(queue,
                 knl.get_work_group_info(
diff --git a/pyopencl/elementwise.py b/pyopencl/elementwise.py
index 398936f..a4973fd 100644
--- a/pyopencl/elementwise.py
+++ b/pyopencl/elementwise.py
@@ -413,7 +413,7 @@ def get_take_put_kernel(context, dtype, idx_dtype, with_offsets, vec_count=1):


 @context_dependent_memoize
-def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
+def get_put_kernel(context, dtype, idx_dtype, vec_count=1, vals_count=0):
     ctx = {
             "idx_tp": dtype_to_ctype(idx_dtype),
             "tp": dtype_to_ctype(dtype),
@@ -429,9 +429,12 @@ def get_put_kernel(context, dtype, idx_dtype, vec_count=1):
                 for i in range(vec_count)
             ]

+    idxs = ('i','0','i%'+str(vals_count))
+    idx = idxs[min(vals_count, 2)]
+
     body = (
             "%(idx_tp)s dest_idx = gmem_dest_idx[i];\n" % ctx
-            + "\n".join("dest%d[dest_idx] = src%d[i];" % (i, i)
+            + "\n".join("dest%d[dest_idx] = src%d[%s];" % (i, i, idx)
                 for i in range(vec_count)))

     return get_elwise_kernel(context, args, body, name="put")

inducer commented on May 18, 2024

I'm not quite happy with this, for two reasons:

  • The code is a little wasteful in terms of read bandwidth. Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.
  • The code is a little hard to decipher, with the computed index into a pre-constructed tuple. I'd prefer somewhat plainer (if marginally slower) if-then code. I'm also not convinced that "modulo mode" is sufficiently general to merit being included in a general-purpose array toolkit.
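The "plainer if-then code" mentioned above could look roughly like the following sketch. It only rewrites the patch's tuple lookup (`idxs[min(vals_count, 2)]`) as explicit branches and keeps the patch's convention for `vals_count`; the helper name `source_index_expr` and the rejection of negative counts are assumptions made for illustration, not part of pyopencl.

```python
def source_index_expr(vals_count):
    """Return the C index expression used to read from the source array.

    Follows the convention of the patch above:
      0  -> source has one value per index, read src[i]
      1  -> source holds a single value, read src[0]
      >1 -> "modulo mode", read src[i % vals_count]
    """
    if vals_count < 0:
        # Assumption of this sketch: negative counts are rejected outright.
        raise ValueError("vals_count must be non-negative")
    if vals_count == 0:
        return "i"
    elif vals_count == 1:
        return "0"
    else:
        return "i%" + str(vals_count)


def put_body_line(vec_index, vals_count):
    """Build one assignment line of the kernel body, as in the patch."""
    return "dest%d[dest_idx] = src%d[%s];" % (
        vec_index, vec_index, source_index_expr(vals_count))
```

The generated text is the same as in the patch; only the selection logic is spelled out.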

inferrna commented on May 18, 2024

> Broadcasting the scalar value to all threads through a kernel argument would likely cut bandwidth usage.

Agreed. But hardcoding the value into the kernel requires recompilation each time. Two-step broadcasting over local data may be a solution. Is there currently any way to do this with pyopencl? If not, an extension of ElementwiseKernel that works with an array and a single value would also be helpful.

inducer commented on May 18, 2024

Passing a kernel argument seems like a good solution: you do have to bake the type into the kernel (but you have to do that anyway), but the value can be provided at runtime.
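A minimal sketch of that idea, assuming the kernel text is assembled as an argument string plus an operation string that could then be handed to `pyopencl.elementwise.ElementwiseKernel`. The helper `make_scalar_put_source` and the argument names `dest` and `value` are hypothetical. The point it illustrates is that the C types appear in the source, while the scalar itself does not, so a new value would not require a new kernel.

```python
def make_scalar_put_source(value_ctype, idx_ctype):
    """Build (arguments, operation) strings for a scalar 'put' kernel.

    The C types are baked into the generated source. The scalar is an
    ordinary kernel argument named `value`, supplied at call time, so
    the source text is identical for every value of a given type.
    """
    arguments = "%s *gmem_dest_idx, %s *dest, %s value" % (
        idx_ctype, value_ctype, value_ctype)
    # `i` is the per-work-item index that elementwise kernels provide.
    operation = "dest[gmem_dest_idx[i]] = value;"
    return arguments, operation


# Same types give the same source, whatever value is passed later.
float_src = make_scalar_put_source("float", "int")
# A different element type gives a different source, hence a new kernel.
double_src = make_scalar_put_source("double", "int")
```

Because the source depends only on the types, a memoized kernel builder keyed on those types would compile once per type combination.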

hrfuller commented on May 18, 2024

@inducer I'm currently working on a solution to this. If we don't want to use "modulo mode" in the general case, does it make sense to throw an Exception if the indices array and the source array have different lengths (excluding, of course, the case where the source array has length 1, in which case we fill all indices with that value)? Behavior for a case with, say, 2 values and 3 indices would be undefined without "modulo mode" and could potentially segfault or write garbage memory to the destination array (as it does in my build). However, I believe adding an Exception would not be backwards compatible.
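The check described here could be sketched as a small host-side validation run before any kernel is launched. The function name `check_put_sizes` and the choice of `ValueError` are assumptions for illustration; the thread does not settle on an exception type.

```python
def check_put_sizes(n_indices, n_values):
    """Reject size combinations that would read past the source array.

    Accepted: one value per index, or exactly one value (fill mode).
    Everything else raises instead of being left to the kernel.
    """
    if n_values == n_indices or n_values == 1:
        return
    raise ValueError(
        "source has %d values but %d indices were given; "
        "expected equal sizes or a single value" % (n_values, n_indices))
```

With 2 values and 3 indices this raises on the host, rather than letting the kernel read `src[2]` out of bounds.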

inducer commented on May 18, 2024

Upon reconsidering this, I'm OK with (and will take patches for) either (a) matching numpy behavior exactly (but doing the broadcast via kernel arguments) or (b) only taking care of the same-size case and throwing an error in all other cases.
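For reference, option (a) can be modelled on the host in a few lines. As I understand `numpy.put`, values shorter than the index list are repeated as necessary, which is what this pure-Python sketch mimics. The function `put_reference` is hypothetical and works on plain lists; it is meant as a model for checking results, not as the way pyopencl would implement the operation. Raising on an empty value list is a choice of this sketch.

```python
def put_reference(dest, indices, values):
    """Return a copy of dest with values written at indices.

    Values are reused cyclically when there are fewer values than
    indices, mirroring the repetition rule described for numpy.put.
    """
    if len(values) == 0:
        # Assumption of this sketch: an empty value list is an error.
        raise ValueError("values must not be empty")
    out = list(dest)
    for k, dest_idx in enumerate(indices):
        out[dest_idx] = values[k % len(values)]
    return out


# Two values, three indices: the first value is reused for the third index.
cyclic = put_reference([0, 0, 0, 0, 0], [0, 1, 2], [10, 20])
# A single value fills every listed index.
filled = put_reference([0, 0, 0, 0, 0], [0, 1, 2], [7])
```

Option (b) would instead reject the first call, since the sizes differ.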
