inducer / pyopencl

OpenCL integration for Python, plus shiny features

Home Page: http://mathema.tician.de/software/pyopencl

License: Other

Python 60.88% C 16.33% C++ 22.30% Makefile 0.02% Shell 0.30% Vim Script 0.18%
python gpu opencl heterogeneous-parallel-programming nvidia cuda amd performance array multidimensional-arrays

pyopencl's Introduction

PyOpenCL: Pythonic Access to OpenCL, with Arrays and Algorithms

Gitlab Build Status

Github Build Status

Python Package Index Release Page

Zenodo DOI for latest release

PyOpenCL lets you access GPUs and other massively parallel compute devices from Python. It tries to offer computing goodness in the spirit of its sister project PyCUDA:

  • Object cleanup tied to lifetime of objects. This idiom, often called RAII in C++, makes it much easier to write correct, leak- and crash-free code.
  • Completeness. PyOpenCL puts the full power of OpenCL's API at your disposal, if you wish. Every obscure get_info() query and all CL calls are accessible.
  • Automatic Error Checking. All CL errors are automatically translated into Python exceptions.
  • Speed. PyOpenCL's base layer is written in C++, so all the niceties above are virtually free.
  • Helpful and complete Documentation as well as a Wiki.
  • Liberal license. PyOpenCL is open-source under the MIT license and free for commercial, academic, and private use.
  • Broad support. PyOpenCL was tested and works with Apple's, AMD's, and Nvidia's CL implementations.

Simple 4-step install instructions using Conda on Linux and macOS (that also install a working OpenCL implementation!) can be found in the documentation.

What you'll need if you do not want to use the convenient instructions above and instead build from source:

  • gcc/g++ new enough to be compatible with pybind11 (see their FAQ)
  • numpy, and
  • an OpenCL implementation. (See this howto for how to get one.)

pyopencl's People

Contributors

adityapb, alexfikl, benma, cancan101, cgohlke, dependabot[bot], drewbo, eynuel, gaohao95, geggo, gmagno, gw0, hrfuller, inducer, isuruf, jonnoftw, kaushikcfd, kif, matthiasdiener, mattwala, maweigert, polarnick239, rebecca-palmer, shahzaibgill, shane-j-latham, sirver, stefanv, yuyichao, zachjweiner, zehanort


pyopencl's Issues

ImportError: No module named 'pyopencl.IPython'

On Python 3.x for Windows, %load_ext pyopencl.ipython fails with ImportError: No module named 'pyopencl.IPython'. Apparently the 2to3 tool converts "from IPython.core.magic import ..." into "from .IPython.core.magic import ...":

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-2-83a3f95db82d> in <module>()
----> 1 get_ipython().magic('load_ext pyopencl.ipython')

X:\Python33\lib\site-packages\IPython\core\interactiveshell.py in magic(self, arg_s)
   2203         magic_name, _, magic_arg_s = arg_s.partition(' ')
   2204         magic_name = magic_name.lstrip(prefilter.ESC_MAGIC)
-> 2205         return self.run_line_magic(magic_name, magic_arg_s)
   2206 
   2207     #-------------------------------------------------------------------------

X:\Python33\lib\site-packages\IPython\core\interactiveshell.py in run_line_magic(self, magic_name, line)
   2124                 kwargs['local_ns'] = sys._getframe(stack_depth).f_locals
   2125             with self.builtin_trap:
-> 2126                 result = fn(*args,**kwargs)
   2127             return result
   2128 

X:\Python33\lib\site-packages\IPython\core\magics\extension.py in load_ext(self, module_str)

X:\Python33\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

X:\Python33\lib\site-packages\IPython\core\magics\extension.py in load_ext(self, module_str)
     61         if not module_str:
     62             raise UsageError('Missing module name.')
---> 63         res = self.shell.extension_manager.load_extension(module_str)
     64 
     65         if res == 'already loaded':

X:\Python33\lib\site-packages\IPython\core\extensions.py in load_extension(self, module_str)
     96             if module_str not in sys.modules:
     97                 with prepended_to_syspath(self.ipython_extension_dir):
---> 98                     __import__(module_str)
     99             mod = sys.modules[module_str]
    100             if self._call_load_ipython_extension(mod):

X:\Python33\lib\site-packages\pyopencl\ipython.py in <module>()
      1 
      2 
----> 3 from .IPython.core.magic import (magics_class, Magics, cell_magic)
      4 
      5 import pyopencl as cl

ImportError: No module named 'pyopencl.IPython'

Import crashes due to a regex_error

This is on the branch submitted as #83, but I didn't touch regexps at all so I assume it will hold for a different fix of the build problem as well:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyopencl as cl
terminate called after throwing an instance of 'std::regex_error'
  what():  regex_error
Aborted

Read Image back with enqueue_read_image: TypeError: expected a writeable buffer object

I am trying to read the image back with pyopencl's enqueue_read_image:

a = np.random.random((2,8,4)).astype(np.complex64)

test = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

fmt = cl.ImageFormat(cl.channel_order.RG, cl.channel_type.FLOAT)
b = cl.Image(ctx, mf.READ_WRITE, fmt, shape = (4,16) )

writeImage(queue, (4,16), None, b)

cl.enqueue_read_image(queue, b, (0,0,0), (4,16,1), test)

The simple kernel writeImage is

kernel void writeImage(__write_only image2d_t out)
{
 int x = get_global_id(0); // (x,y) = pixel to process
 int y = get_global_id(1); // in this work item
 write_imagef(out, (int2)(x,y), (float4)(1.0, 2.0, 0.0, 1.0)); // Load and Store 
}

The enqueue_read_image creates the following error (I tried enqueue_copy and ended up getting the same error).

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-68-21a39c99d185> in <module>()
      8 writeImage(queue, (4,16), None, b)
      9 
---> 10 cl.enqueue_read_image(queue, b, (0,0,0), (4,16,1), test)

/Users/recondev/.virtualenvs/ipythonenv/lib/python2.7/site-packages/pyopencl/__init__.pyc in new_func(*args, **kwargs)
    900                 "enqueue_copy() instead." % func.__name__[1:], DeprecationWarning,
    901                 stacklevel=2)
--> 902         return func(*args, **kwargs)
    903 
    904     try:

TypeError: expected a writeable buffer object

I can't figure out what I am doing wrong. Should I set the test buffer differently?
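A likely cause: the destination passed to enqueue_read_image must be a writable host-side buffer (such as a numpy array), but test here is a cl.Buffer, i.e. device memory. A minimal stdlib sketch of the writability distinction the error message refers to (for numpy arrays, the analogous check is a.flags.writeable):

```python
# Sketch of the "writeable buffer" distinction (stdlib only, no pyopencl).
# enqueue_read_image needs a host destination it can write into.
ro = memoryview(bytes(16))        # bytes -> read-only buffer
rw = memoryview(bytearray(16))    # bytearray -> writable buffer

print(ro.readonly)  # True  -> would be rejected as a destination
print(rw.readonly)  # False -> acceptable as a destination
```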

Import error

Hi, i got this error when i try to run the demo.py example.

python demo.py
Traceback (most recent call last):
File "demo.py", line 1, in
import pyopencl as cl
File "/usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7-linux-x86_64.egg/pyopencl/init.py", line 28, in
import pyopencl._cl as _cl
ImportError: /usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7-linux-x86_64.egg/pyopencl/_cl.so: undefined symbol: _ZTIN5boost6python15instance_holderE

Compilation terminated: cannot open source file

I have two kernels that share a common helper function, which I abstracted out into a helper file that then gets #included. This works fine on my OS X system with the stock Apple OpenCL compiler/library, but on a Windows 10 machine with the AMD OpenCL compiler/library I get a "cannot open source file" error: the compiler can't find the header. How do I tell pyopencl where to look for headers?
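The OpenCL compiler's include path can be extended through the options passed to Program.build(); -I is a standard OpenCL compiler option. As a sketch (the helper name is hypothetical, only the option assembly is shown):

```python
# Hypothetical helper: build an option list adding -I entries for header
# directories. PyOpenCL forwards these options to the OpenCL compiler
# when Program.build(options=...) is called.
def include_options(dirs):
    opts = []
    for d in dirs:
        opts.extend(["-I", str(d)])
    return opts

print(include_options(["kernels/include"]))  # ['-I', 'kernels/include']
# assumed usage: cl.Program(ctx, src).build(options=include_options([...]))
```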

bitonic

Is there any way to sort signed float values with pyopencl? For the implemented radix sort, keys must be "integer-valued C expressions", but bitonic sort does not have this limitation.
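One workaround within radix sort's integer-key requirement is the classic IEEE-754 bit trick: map each float to an unsigned integer whose unsigned order matches the float's numeric order. A stdlib sketch of the key transform:

```python
import struct

def float_key(x):
    """Map an IEEE-754 float32 to a uint32 whose unsigned order matches
    the float's numeric order (classic radix-sort key transform)."""
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    # negative floats: flip all bits; non-negative floats: set the sign bit
    return u ^ 0xFFFFFFFF if u & 0x80000000 else u | 0x80000000

vals = [3.5, -0.25, -7.0, 0.0, 1.5]
assert sorted(vals, key=float_key) == sorted(vals)
```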

Does not compile with python-3

Hi all.

I use Gentoo Linux with multiple Python versions installed (3.2.3-r2 and 2.7.3-r3). When I use the ebuild from the official portage tree, dev-python/pyopencl (versions 2012.1 and 9999, it does not matter), it compiles without any error for Python 2, but it does not even try to compile for Python 3. (I think this will be fixed in the future; I can fix it myself, anyway.)

I tried to compile it by hand without portage (using ./configure.py && make) and I see the following error:

building '_pvt_struct' extension
x86_64-pc-linux-gnu-g++ -pthread -fPIC
-I/usr/lib64/python3.2/site-packages/numpy/core/include
-I/usr/lib64/python3.2/site-packages/numpy/core/include
-I/usr/include/python3.2 -c src/wrapper/_pvt_struct_v3.cpp -o
build/temp.linux-x86_64-3.2/src/wrapper/_pvt_struct_v3.o
src/wrapper/_pvt_struct_v3.cpp: In function 'int s_init(PyObject*, PyObject*,
PyObject*)':
src/wrapper/_pvt_struct_v3.cpp:1045:41: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1047:5: error: 'PyStructType' was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp: In function 'PyObject* s_unpack(PyObject*,
PyObject*)':
src/wrapper/_pvt_struct_v3.cpp:1138:5: error: 'PyStructType' was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp: In function 'PyObject*
s_unpack_from(PyObject*, PyObject*, PyObject*)':
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1164:51: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1172:5: error: 'PyStructType' was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp: In function 'PyObject* s_pack(PyObject*,
PyObject*)':
src/wrapper/_pvt_struct_v3.cpp:1296:5: error: 'PyStructType' was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp: In function 'PyObject* s_pack_into(PyObject*,
PyObject*)':
src/wrapper/_pvt_struct_v3.cpp:1336:5: error: 'PyStructType' was not declared
in this scope
src/wrapper/_pvt_struct_v3.cpp: At global scope:
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
src/wrapper/_pvt_struct_v3.cpp:1414:1: warning: deprecated conversion from
string constant to 'char*' [-Wwrite-strings]
error: command 'x86_64-pc-linux-gnu-g++' failed with exit status 1
make: *** [all] Error 1

In my fork I deleted the lines that cause the compile error, and compilation worked fine. What is the problem? Is it a bug, or a problem local to my setup?

(the troubled line is "assert(PyStruct_Check(self));")

float3 data type requires explicit padding value

Hi,
I am using PyOpenCL 2013.2. When creating a float3 data type, an explicit padding value is required: np.array((x, y, z, PADDING_VALUE), array.vec.float3).

The array.vec.make_float3 flavor doesn't work at all.

This is not a huge problem, but a little hint in the documentation would help a lot (if there is one, I missed it).

Thanks a bunch for PyOpenCL. It's just great!

Cheers,
Tom
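For background: the OpenCL spec gives 3-component vector types the size and alignment of their 4-component counterparts, so a device-side float3 occupies 16 bytes and the host value needs the fourth padding component. A stdlib sketch of the required layout (the helper is illustrative, not part of PyOpenCL):

```python
import struct

# float3 occupies 16 bytes on the device (same as float4), so the host
# representation needs an explicit fourth padding component.
def pack_float3(x, y, z, pad=0.0):
    return struct.pack("<4f", x, y, z, pad)

buf = pack_float3(1.0, 2.0, 3.0)
print(len(buf))  # 16, not 12
```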

internal boolean type for array

For expressions like array[array > 3], the results of numpy and pyopencl differ completely. I rewrote some operators myself to support boolean indexing. Would it be possible to add an internal boolean type to the Array class? My (dirty and non-optimal, of course) solution is below as an example.

class myclArray(clarray.Array):
    def __init__(self, *args, **kwargs):
        clarray.Array.__init__(self, *args, **kwargs)
        self.ndim = len(self.shape)
        self.is_boolean = False

    def __lt__(self, other):
        result = clarray.Array.__lt__(self, other)
        result.is_boolean = True
        return result
    def __getitem__(self, index):
        if isinstance(index, myclArray) and index.is_boolean == True:
            x, y, z = algorithm.copy_if(self.reshape((self.size,)), "index[i]!=0", [("index", index.reshape((index.size,)))])
            _res = x[:y.get()]
            res = myclArray(queue, _res.shape, _res.dtype, data=_res.data)
        else:
            res = clarray.Array.__getitem__(self, index)
        return res

    def __setitem__(self, subscript, value):
        if isinstance(subscript, myclArray) and subscript.is_boolean == True:
            idxcl = clarray.arange(queue, 0, self.size, 1, dtype=np.int32)
            x, y, z = algorithm.copy_if(idxcl, "index[i]!=0", [("index", subscript.reshape((subscript.size,)))])
            _res = x[:y.get()]
            clarray.Array.setitem(self.reshape((self.size,)), _res, value, queue=queue)
        else:
            self.setitem(subscript, value, queue=queue)
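The compaction that the copy_if calls perform corresponds to numpy's boolean-mask semantics. A pure-Python model of the intended behavior (function names are illustrative):

```python
# Pure-Python model of boolean-mask indexing: __getitem__ with a mask
# compacts the selected elements (what copy_if does on the device),
# and __setitem__ with a mask assigns only at selected positions.
def mask_select(values, mask):
    return [v for v, m in zip(values, mask) if m]

def mask_assign(values, mask, x):
    return [x if m else v for v, m in zip(values, mask)]

a = [1, 4, 2, 5]
m = [v > 3 for v in a]
print(mask_select(a, m))     # [4, 5]
print(mask_assign(a, m, 0))  # [1, 0, 2, 0]
```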

pyopencl's cache doesn't account for #include

I share some code between OpenCL files. With OpenCL 1.1, the easiest way is to include the common CL code into both .cl files, like this:

common.cl

void test(void){
  return;
}

kernel.cl:

#include "common.cl"

__kernel void test_kernel(__global int *result)
{
  test();
}

If I modify test in common.cl, pyopencl doesn't see the change and uses the cached version of common.cl. Even if I delete test from common.cl, pyopencl doesn't notice. This is quite dangerous: for example, I spent a lot of time tracking down an error that turned out to be this cache issue.
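One way a build cache can be made sensitive to included files is to hash the source together with the contents of everything it #includes. A hedged stdlib sketch of such a cache key (not PyOpenCL's actual implementation; read_include is a hypothetical hook):

```python
import hashlib
import re

def cache_key(src, read_include=lambda name: ""):
    """Hash kernel source plus the contents of every #include "..." in it.
    read_include is a hypothetical hook returning an included file's text;
    a real version would resolve paths on disk."""
    h = hashlib.sha256(src.encode())
    for name in re.findall(r'#include\s+"([^"]+)"', src):
        h.update(read_include(name).encode())
    return h.hexdigest()

src = '#include "common.cl"\n__kernel void k(void) {}'
k1 = cache_key(src, lambda n: "void test(void) { return; }")
k2 = cache_key(src, lambda n: "void test(void) { /* changed */ }")
print(k1 != k2)  # True: editing common.cl changes the cache key
```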

Restrict work group size for elementwise kernel

For the ElementwiseKernel, there is no mechanism to determine the work group size before invocation. Sometimes, this is a problem, e.g. if you want to allocate local memory proportional to the expected work group size.

What is the best pattern to overcome this limitation? Would it be a good idea to incorporate a mechanism for controlling the work group size before invocation into ElementwiseKernel?

Thanks in advance,
Johannes

uchar3 interpreted as uchar4

Consider the following test case:

import numpy as np
import pyopencl as cl

vector = np.asarray([[1,2,3] for i in range(0, 4)], dtype='uint8').T

kernel = """
__kernel void run(__global const uchar3 *vals) {
    int index = get_global_id(0);
    printf("(%d) = (%d, %d, %d)\\n", index, vals[index].x, vals[index].y, vals[index].z);
}
"""

ctx = cl.Context(dev_type=cl.device_type.CPU)
queue = cl.CommandQueue(ctx)

program = cl.Program(ctx, kernel).build()

buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE, size=vector.nbytes)
cl.enqueue_copy(queue, buf, vector)

program.run(queue, (4, ), None, buf)

It should output:

(0) = (1, 2, 3)
(1) = (1, 2, 3)
(2) = (1, 2, 3)
(3) = (1, 2, 3)

But outputs:

(0) = (1, 2, 3)
(1) = (2, 3, 1)
(2) = (3, 1, 2)
(3) = (0, 0, 0)

But when changing the above test case to use:

vector = np.asarray([[1,2,3,4] for i in range(0, 4)], dtype='uint8').T

kernel = """
__kernel void run(__global const uchar4 *vals) {
    int index = get_global_id(0);
    printf("(%d) = (%d, %d, %d)\\n", index, vals[index].x, vals[index].y, vals[index].z);
}
"""

the output is as I would expect:

(0) = (1, 2, 3)
(1) = (1, 2, 3)
(2) = (1, 2, 3)
(3) = (1, 2, 3)
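This matches the OpenCL rule that 3-component vector types have the size and alignment of the 4-component type, so each host-side uchar3 element needs a pad byte (the .T transpose additionally makes the host array non-contiguous, compounding the mismatch). A stdlib sketch of the padded layout the device expects:

```python
import struct

# uchar3 has the size/alignment of uchar4, so each element is 4 bytes
# on the device: three channel bytes plus one pad byte.
def pack_uchar3_array(triples):
    return b"".join(struct.pack("<3Bx", *t) for t in triples)

buf = pack_uchar3_array([(1, 2, 3)] * 4)
print(len(buf))  # 16: four 4-byte elements, not 12
print(buf[:4])   # b'\x01\x02\x03\x00'
```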

TypeError: can't concat bytes to str

ipython-demo.ipynb fails on Python 3 for Windows:

%%cl_kernel

__kernel void sum_vector(__global const float *a,
__global const float *b, __global float *c)
{
  int gid = get_global_id(0);
  c[gid] = a[gid] + b[gid];
}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-b6be44adac0d> in <module>()
----> 1 get_ipython().run_cell_magic('cl_kernel', '', '\n__kernel void sum_vector(__global const float *a,\n__global const float *b, __global float *c)\n{\n  int gid = get_global_id(0);\n  c[gid] = a[gid] + b[gid];\n}')

X:\Python33\lib\site-packages\IPython\core\interactiveshell.py in run_cell_magic(self, magic_name, line, cell)
   2160             magic_arg_s = self.var_expand(line, stack_depth)
   2161             with self.builtin_trap:
-> 2162                 result = fn(magic_arg_s, cell)
   2163             return result
   2164 

X:\Python33\lib\site-packages\pyopencl\ipython.py in cl_kernel(self, line, cell)

X:\Python33\lib\site-packages\IPython\core\magic.py in <lambda>(f, *a, **k)
    191     # but it's overkill for just that one bit of state.
    192     def magic_deco(arg):
--> 193         call = lambda f, *a, **k: f(*a, **k)
    194 
    195         if callable(arg):

X:\Python33\lib\site-packages\pyopencl\ipython.py in cl_kernel(self, line, cell)
     28                     "present in namespace as 'cl_ctx' or 'ctx'")
     29 
---> 30         prg = cl.Program(ctx, cell.encode("utf8")).build()
     31 
     32         for knl in prg.all_kernels():

X:\Python33\lib\site-packages\pyopencl\__init__.py in build(self, options, devices, cache_dir)
    207                         self._context, self._source, options, devices,
    208                         cache_dir=cache_dir),
--> 209                     options=options, source=self._source)
    210 
    211             del self._context

X:\Python33\lib\site-packages\pyopencl\__init__.py in _build_and_catch_errors(self, build_func, options, source)
    215     def _build_and_catch_errors(self, build_func, options, source=None):
    216         try:
--> 217             return build_func()
    218         except _cl.RuntimeError as e:
    219             from pytools import Record

X:\Python33\lib\site-packages\pyopencl\__init__.py in <lambda>()
    206                     lambda: create_built_program_from_source_cached(
    207                         self._context, self._source, options, devices,
--> 208                         cache_dir=cache_dir),
    209                     options=options, source=self._source)
    210 

X:\Python33\lib\site-packages\pyopencl\cache.py in create_built_program_from_source_cached(ctx, src, options, devices, cache_dir)
    473         if cache_dir is not False:
    474             prg, already_built = _create_built_program_from_source_cached(
--> 475                     ctx, src, options, devices, cache_dir)
    476         else:
    477             prg = _cl._Program(ctx, src)

X:\Python33\lib\site-packages\pyopencl\cache.py in _create_built_program_from_source_cached(ctx, src, options, devices, cache_dir)
    395         from uuid import uuid4
    396         src = src + "\n\n__constant int pyopencl_defeat_cache_%s = 0;" % (
--> 397                 uuid4().hex)
    398 
    399         prg = _cl._Program(ctx, src)

TypeError: can't concat bytes to str

Looks like src is bytes while uuid4().hex is str.
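A sketch of a type-consistent fix (illustrative, not PyOpenCL's actual code): make the defeat-cache suffix match whichever type src arrived as:

```python
from uuid import uuid4

# Append the defeat-cache line whether src arrived as str or bytes.
def add_defeat_cache(src):
    suffix = "\n\n__constant int pyopencl_defeat_cache_%s = 0;" % uuid4().hex
    if isinstance(src, bytes):
        return src + suffix.encode()
    return src + suffix

print(type(add_defeat_cache(b"__kernel void k(void) {}")))  # <class 'bytes'>
print(type(add_defeat_cache("__kernel void k(void) {}")))   # <class 'str'>
```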

2013.1: easy access to the resulting GenericScanKernel-based kernels

From #9

If I understood better what you'd like to do with the kernels, perhaps I could serve you better. If you've got time to write what you'd like to do, please reopen this bug.

I have two reasons for it

  • I want to call a GenericScanKernel-based algorithm many times during program execution. I split the input into many smaller sets and apply the GenericScanKernel-based algorithm to each set. I want to get rid of kernel compilation and similar overheads, and the only way I see is to reuse the generated kernels.
  • I want to get as much performance as I can, so I want to tweak such kernels. For example, in some cases my input is a boolean array of 0s and 1s, so I don't need that part of the scan. I want to turn such parts off, and I can't without access to the kernels.

building on ubuntu python 3.4 makes the 2.7 version

Hi, I am trying to build pyopencl on Ubuntu with Anaconda Python 3.4 (64-bit), but I end up with warnings about the deprecated numpy API, and then:

>>> import pyopencl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/immanuel/programs/pyopencl-2015.1/pyopencl/__init__.py", line 222
    except _cl.RuntimeError, e:
                           ^
SyntaxError: invalid syntax

Seems like I am building a 2.7 version

Installation procedure:

tar zxvf AMD-APP-SDK-v2.8.1.0-lnx64.tgz
chmod +x Install-AMD-APP.sh
./Install-AMD-APP.sh
tar zxvf icd-registration.tgz
sudo cp -r etc/OpenCL/* /etc/OpenCL/
rm -r etc

sudo apt-get install python-numpy libboost1.55-all-dev python-setuptools

wget http://pypi.python.org/packages/source/p/pyopencl/pyopencl-2011.2.tar.gz
tar xfz pyopencl-2015.1.tar.gz
cd pyopencl-2015.1/
sudo python configure.py                            \
   --boost-inc-dir=/usr/include/boost          \
   --boost-lib-dir=/usr/lib                    \
   --no-use-shipped-boost                      \
   --boost-python-libname=boost_python-py34 \
   --cl-inc-dir=/opt/AMDAPP/include/           \
   --cl-lib-dir=/opt/AMDAPP/lib/x86_64         \
   --cl-libname=OpenCL
make
sudo make install

optimization: support for 2-D/3-D arrays with strides >= for better memory bandwidth utilization

http://developer.amd.com/tools-and-sdks/opencl-zone/amd-accelerated-parallel-processing-app-sdk/opencl-optimization-guide/#50401334_pgfId-472173

You can see that, depending on the graphics card model, there are stride values to avoid, often strides that are powers of two, which are fairly common numbers. If you ignore this, you can under-utilize the memory channels available in hardware while also increasing bank conflicts (so the hit is two-fold). The fix is simply to pad the strides slightly to avoid these conflicts (the values are found in vendor/model documentation). This should be useful for any GPU implementation and likely some accelerators. It should not be performed when the device type is CPU, or for code doing vector operations directly on memory where the hardware has alignment restrictions.
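The padding rule described above can be sketched in a few lines: round the row pitch up to the device's alignment, then bump it by one alignment unit if the result is a power of two (the strides the guide says to avoid; both the 256-byte default and the power-of-two criterion here are device-specific assumptions):

```python
def padded_pitch(width_bytes, align=256):
    """Round a row pitch up to `align` bytes, then avoid power-of-two
    pitches by adding one more alignment unit. The alignment value and
    the exact pitches to avoid are device-specific (see vendor docs)."""
    pitch = -(-width_bytes // align) * align   # ceil to alignment
    if pitch & (pitch - 1) == 0:               # power of two -> bank conflicts
        pitch += align
    return pitch

print(padded_pitch(2044 * 4))  # 8448 for a 2044-pixel uint32 row
```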

As such, I tried to use such an array with pyopencl but it doesn't work out very well in manipulation/printing/inspection:

    def make_cl_image_buffer(queue, img_dims, img_dtype, linestep = None):
        img_dtype = dtype_of(img_dtype)
        if linestep is None:
            linestep = img_dims[1] * img_dtype.itemsize
            if queue.device.type == cl.device_type.GPU:
                #make sure to have an uneven line width for better memory channel utilization, as per AMD recommendation
                if linestep % 2 == 0:
                    linestep += 1
        buf = cl.Buffer(queue.context, cl.mem_flags.READ_WRITE, img_dims[0]*linestep)
        return buf, linestep

    def make_cl_image(queue, img_dims, img_dtype, linestep = None):
        img_dtype = dtype_of(img_dtype)
        img_buf, linestep = make_cl_image_buffer(queue, img_dims, img_dtype, linestep)
        strides = (linestep, dtype_of(img_dtype).itemsize)
        img = clarray.Array(queue, img_dims, img_dtype, strides=strides, data=img_buf)
        return img


In [3]: img=com.make_cl_image(queue, (2044, 2044), np.uint32)

In [12]: img.shape
Out[12]: (2044, 2044)

In [13]: img.strides
Out[13]: (8177, 4)
    /home/jason/tmp/pyopencl.git/pyopencl/array.py in get(self, queue, ary, async)
        685                 raise TypeError("'ary' has non-matching type")
        686
    --> 687         assert self.flags.forc, "Array in get() must be contiguous"
        688
        689         if self.size:

    AssertionError: Array in get() must be contiguous

I searched around and found someone else curious about this last year - he also put some work into this:
#54

I wouldn't mind adding some lines to the library to accomplish good, no-surprises manipulation, but I was unsure of the best way to do it, since it seems everything relies on the elementwise kernels. I think the best/most compatible way of doing this is to have the elementwise kernels take an additional stride parameter, which they use to calculate the offset of the element they process.

What do you think? Is it possible for you to add this? If you don't have the time - can you outline what is the way to get it done?

P.S. I tried bringing this up for discussion on the mailing list, but it got stuck in moderation.

problem with enqueue_fill_image

I'm pretty sure I'm using enqueue_fill_image according to the spec:

import pyopencl as cl

#...

output = cl.Image(mgr.context, cl.mem_flags.READ_WRITE, cl.ImageFormat(cl.channel_order.RGBA,cl.channel_type.UNSIGNED_INT32),
                          shape = (mgr.cell_width, mgr.cell_height*2))

fill_evt = cl.enqueue_fill_image(mgr.queue,output,np.zeros((4,),dtype=np.uint32),origin=(0,0),region=output.shape)

I'm getting an error that clearly indicates there is something different in the expected arguments list:

ArgumentError: Python argument types in
    pyopencl._cl.enqueue_fill_image(CommandQueue, Image, numpy.ndarray)
did not match C++ signature:
    enqueue_fill_image(pyopencl::command_queue {lvalue}, pyopencl::image {lvalue}, pyopenclboost::python::api::object, pyopenclboost::python::api::object queue, pyopenclboost::python::api::object mem, unsigned long color, unsigned long origin, pyopenclboost::python::api::object region, bool wait_for=None)

It seems like either the reference is outdated, in which case I want to know how to actually use this function, or something in the code is incorrect. Could you look into this?

-Thanks

Problems with installing the `cffi` branch.

On my Arch Linux machine, some hacks are necessary to install the cffi branch on PyPy. (Some of the problems might not be pyopencl's fault, but I don't really know where the problem lies at this point.)

  1. The _wrap.so library is compiled using cc and is therefore not linked with libstdc++.so, causing problems while loading. Using a C++ compiler (by hacking PATH) solves the problem.
  2. The _wrapcl.so is actually named _wrapcl.py-22.so here and therefore cannot be found at run time. Creating a symlink solves the problem.
  3. pyopencl/c_wrapper/wrap_cl_gl_core.h is not installed and needs to be installed by hand.

pypy version info:

Python 2.7.3 (87aa9de10f9ca71da9ab4a3d53e0ba176b67d086, Dec 04 2013, 12:50:47)
[PyPy 2.2.1 with GCC 4.8.2]

Double precision complex numbers

Hello,

First, thanks for this great piece of software.
I am trying to use double-precision complex numbers (complex128) with PyOpenCL on an ATI platform. I guess I am not doing it properly, because I get an error message (see below). Or is it an issue with the OpenCL extension cl_khr_fp64?
Adding a preamble as in http://documen.tician.de/pyopencl/array.html#complex-numbers leads to the same result.

On the CPU (intel i7), the following will work:

 norm2 = ElementwiseKernel(ctx,
            "cdouble_t *x, cdouble_t *z",
            "z[i] = cdouble_mul(x[i], cdouble_conj(x[i]))",
            "norm2")

Also, it seems that many tests in pyopencl/test/ fail with the same issue.
I am using the latest Git version of PyOpenCL on Linux 3.2.7 x86_64, gcc 4.6.2, ATI Catalyst 12.1, OpenCL SDK 2.6.
Any hints?
Thanks,

norm2 = ElementwiseKernel(ctx,
    "cdouble_t *x, cdouble_t *z",
    "z[i] = cdouble_mul(x[i], cdouble_conj(x[i]))",
    "norm2")
## -- End pasted text --
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/home/tnorth/Documents/phd/conferences/UP2012/simus/ssfs_20120209_beta3/<ipython-input-93-22eceb07b4a1> in <module>()
      2     "cdouble_t *x, cdouble_t *z",
      3     "z[i] = cdouble_mul(x[i], cdouble_conj(x[i]))",
----> 4     "norm2")

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/elementwise.pyc in __init__(self, context, arguments, operation, name, options, **kwargs)                                                              
    141             context, arguments, operation,
    142             name=name, options=options,
--> 143             **kwargs)
    144                                                                                                              
    145         if not [i for i, arg in enumerate(self.arguments)

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/elementwise.pyc in get_elwise_kernel_and_types(context, arguments, operation, name, options, preamble, **kwargs)                                       
    102     prg = get_elwise_program(
    103         context, parsed_args, operation,
--> 104         name=name, options=options, preamble=preamble, **kwargs)
    105                                                                                                              
    106     scalar_arg_dtypes = []

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/elementwise.pyc in get_elwise_program(context, arguments, operation, name, options, preamble, loop_prep, after_loop)                                   
     67             })
     68 
---> 69     return Program(context, source).build(options)
     70 
     71 

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/__init__.pyc in build(self, options, devices, cache_dir)                                                                                               
    122             self._prg = create_built_program_from_source_cached(
    123                     self._context, self._source, options, devices,
--> 124                     cache_dir=cache_dir)
    125                                                                                                              
    126         return self

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/cache.pyc in create_built_program_from_source_cached(ctx, src, options, devices, cache_dir)                                                            
    457         if cache_dir != False:
    458             prg, already_built = _create_built_program_from_source_cached(
--> 459                     ctx, src, options, devices, cache_dir)
    460         else:                                                                                                
    461             prg = _cl._Program(ctx, src)

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/cache.pyc in _create_built_program_from_source_cached(ctx, src, options, devices, cache_dir)                                                           
    381 
    382         prg = _cl._Program(ctx, src)
--> 383         prg.build(options, [devices[i] for i in to_be_built_indices])
    384 
    385         prg_devs = prg.get_info(_cl.program_info.DEVICES)

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/__init__.pyc in program_build(self, options, devices)                                                                                                  
    375             # Python 3.2 outputs the whole list of currently active exceptions
    376             # This serves to remove one (redundant) level from that nesting.
--> 377             raise err
    378 
    379         message = (75*"="+"\n").join(

RuntimeError: clBuildProgram failed: build program failure - 

Build on <pyopencl.Device 'Juniper' on 'AMD Accelerated Parallel Processing' at 0x1647f20>:

/tmp/OCLLZXKXc.cl(2): error: can't enable all OpenCL extensions or
          unrecognized OpenCL extension
          #pragma OPENCL EXTENSION cl_khr_fp64: enable
                                                ^

/usr/lib64/python2.7/site-packages/pyopencl-2011.2-py2.7-linux-x86_64.egg/pyopencl/../include/pyopencl/pyopencl-complex.h(223): error: 
          identifier "double2" is undefined
  PYOPENCL_DECLARE_COMPLEX_TYPE(double, DBL);
  ^
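The 'Juniper' device above has no double-precision support, so `cl_khr_fp64` cannot be enabled and the `double2` type never gets defined. A minimal sketch of guarding against this before using float64 data, assuming the standard space-separated extension string (`supports_fp64` is a hypothetical helper name):

```python
def supports_fp64(device):
    # OpenCL devices report their extensions as a space-separated string;
    # cl_khr_fp64 must be present before double/complex128 kernels can build.
    return "cl_khr_fp64" in device.extensions.split()

# Usage sketch: fall back to single precision on devices without doubles.
# dtype = numpy.complex128 if supports_fp64(dev) else numpy.complex64
```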

Cannot run the same script twice

After installing pyopencl and running some examples, I found this strange behaviour: running the same program twice produces a RuntimeError. For instance, running demo.py from the examples folder:

$ python demo.py 
[ 0.  0.  0. ...,  0.  0.  0.]
0.0

$ python demo.py 
Traceback (most recent call last):
  File "demo.py", line 22, in <module>
    """).build()
  File "/usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py", line 213, in build
    options=options, source=self._source)
  File "/usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7-linux-x86_64.egg/pyopencl/__init__.py", line 253, in _build_and_catch_errors
    raise err
pyopencl.RuntimeError: clBuildProgram failed: invalid program - 

Build on <pyopencl.Device 'Intel HD Graphics Family' on 'Experiment Intel Gen OCL Driver' at 0x7f273f4c2720>:

(options: -I /usr/local/lib/python2.7/dist-packages/pyopencl-2014.1-py2.7-linux-x86_64.egg/pyopencl/cl)
(source saved as /tmp/tmpT6Tv8O.cl)

I tried installing both the pip version of pyopencl and the git version; both produce the same error. Compiling C/C++ OpenCL code against the installed driver doesn't produce any errors.

Here is the generated .cl file:

$ cat /tmp/tmpw0tqln.cl

__kernel void sum(__global const float *a_g, __global const float *b_g, __global float *res_g) {
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}

However, after removing the cached output under /tmp, the script runs again without any problems:

$ rm -r /tmp/pyopencl-compiler-cache-v2-uidvalerio-py2.7.6.final.0
$ python demo.py 
[ 0.  0.  0. ...,  0.  0.  0.]
0.0

Any ideas on how to fix this problem? :)
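Per the `cache.py` frames in the first traceback above, passing `cache_dir=False` to `Program.build()` bypasses the compiler cache entirely. Until the underlying bug is fixed, a workaround sketch that clears stale cache directories before a run (the directory pattern is taken from the `rm -r` command above; `clear_pyopencl_cache` is a hypothetical helper):

```python
import glob
import os
import shutil
import tempfile

def clear_pyopencl_cache(tmp_root=None):
    """Remove pyopencl compiler-cache directories under the temp root."""
    tmp_root = tmp_root or tempfile.gettempdir()
    removed = []
    for path in glob.glob(os.path.join(tmp_root, "pyopencl-compiler-cache-*")):
        shutil.rmtree(path, ignore_errors=True)
        removed.append(path)
    return removed
```

Alternatively, `cl.Program(ctx, src).build(cache_dir=False)` skips the cache for a single build.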

Build error on dev version

I'm able to install the newest release (2015.1) using the instructions here. (By the way, these instructions need to be updated, since configure.py no longer seems to take boost arguments. [EDIT: Sorry, it's just in master that boost arguments no longer work. They're fine in v2015.1.] Also, it's a bit confusing having documentation both at that site and at http://documen.tician.de/pyopencl/.)

But when I try to build what's currently in master, I get a bunch of errors. I've posted the first one below.

x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -fwrapv -Wall -O3 -DNDEBUG -fPIC -DPYGPU_PACKAGE=pyopencl -DPYGPU_PYOPENCL=1 -DPYOPENCL_USE_DEVICE_FISSION=1 -I/opt/AMDAPP/include -Isrc/c_wrapper/ -I/usr/include/python2.7 -c src/c_wrapper/wrap_cl.cpp -o build/temp.linux-x86_64-2.7/src/c_wrapper/wrap_cl.o -std=c++0x
In file included from src/c_wrapper/clhelper.h:1:0,
                 from src/c_wrapper/wrap_cl.cpp:2:
src/c_wrapper/error.h: In instantiation of ‘void call_guarded(cl_int (*)(ArgTypes ...), const char*, ArgTypes2&& ...) [with ArgTypes2 = {device* const, int, ArgBuffer<_cl_platform_id*, (ArgType)1>, std::nullptr_t}; ArgTypes = {_cl_device_id*, unsigned int, long unsigned int, void*, long unsigned int*}; cl_int = int]’:
src/c_wrapper/device.h:39:17:   required from here
src/c_wrapper/error.h:218:51: error: no matching function for call to ‘CLArgPack<device* const, int, ArgBuffer<_cl_platform_id*, (ArgType)1>, std::nullptr_t>::clcall(cl_int (*&)(_cl_device_id*, unsigned int, long unsigned int, void*, long unsigned int*), const char*&)’
     cl_int status_code = argpack.clcall(func, name);
                                                   ^

Error building master on Mac OS

building 'pyopencl._cffi' extension
clang -fno-strict-aliasing -fno-common -dynamic -fwrapv -Wall -O3 -DNDEBUG -DPYGPU_PACKAGE=pyopencl -DPYGPU_PYOPENCL=1 -DPYOPENCL_USE_DEVICE_FISSION=1 -Isrc/c_wrapper/ -I/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c build/temp.macosx-10.8-x86_64-2.7/pyopencl._cffi.cpp -o build/temp.macosx-10.8-x86_64-2.7/build/temp.macosx-10.8-x86_64-2.7/pyopencl._cffi.o -std=c++0x -arch i386 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk
clang -fno-strict-aliasing -fno-common -dynamic -fwrapv -Wall -O3 -DNDEBUG -DPYGPU_PACKAGE=pyopencl -DPYGPU_PYOPENCL=1 -DPYOPENCL_USE_DEVICE_FISSION=1 -Isrc/c_wrapper/ -I/usr/local/Cellar/python/2.7.6/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c src/c_wrapper/wrap_cl.cpp -o build/temp.macosx-10.8-x86_64-2.7/src/c_wrapper/wrap_cl.o -std=c++0x -arch i386 -arch x86_64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk
In file included from src/c_wrapper/wrap_cl.cpp:1:
In file included from src/c_wrapper/pyhelper.h:5:
src/c_wrapper/function.h:14:32: error: no type named 'remove_reference' in namespace 'std'
using rm_ref_t = typename std::remove_reference<T>::type;
                 ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~

I'm guessing my clang version needs to be updated, but I didn't see a minimum version on the wiki.

Mako for Python 3 does not support disabling Unicode

Hi, I'm trying the current git version (30a83a7) on Python 3.2.
When executing the following small program,

import numpy as np
import pyopencl as cl
from pyopencl.scan import InclusiveScanKernel

ctx = cl.create_some_context()
knl = InclusiveScanKernel(ctx, np.int32, 'a + b', neutral='0')

the following error occurs.

Traceback (most recent call last):
  File "scan.py", line 6, in <module>
    knl = InclusiveScanKernel(ctx, np.int32, 'a + b', neutral='0')
  File "/Users/likr/local/python32/lib/python3.2/site-packages/pyopencl-2012.1-py3.2-macosx-10.7-x86_64.egg/pyopencl/scan.py", line 1170, in __init__
    options=options, preamble=preamble, devices=devices)
  File "/Users/likr/local/python32/lib/python3.2/site-packages/pyopencl-2012.1-py3.2-macosx-10.7-x86_64.egg/pyopencl/scan.py", line 894, in __init__
    store_segment_start_flags=self.store_segment_start_flags)
  File "/Users/likr/local/python32/lib/python3.2/site-packages/pyopencl-2012.1-py3.2-macosx-10.7-x86_64.egg/pyopencl/scan.py", line 1008, in build_scan_kernel
    scan_tpl = _make_template(SCAN_INTERVALS_SOURCE)
  File "/Users/likr/local/python32/lib/python3.2/site-packages/pyopencl-2012.1-py3.2-macosx-10.7-x86_64.egg/pyopencl/scan.py", line 760, in _make_template
    return mako.template.Template(s, strict_undefined=True, disable_unicode=True)
  File "/Users/likr/local/python32/lib/python3.2/site-packages/mako/template.py", line 253, in __init__
    "Mako for Python 3 does not "
mako.exceptions.UnsupportedError: Mako for Python 3 does not support disabling Unicode
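Mako's Python 3 build rejects `disable_unicode=True`, so the argument can only be passed on Python 2. A sketch of the conditional keyword handling (`mako_template_kwargs` is a hypothetical helper, not the actual fix that landed):

```python
import sys

def mako_template_kwargs():
    # disable_unicode is only meaningful (and only accepted) on Python 2 Mako.
    kw = {"strict_undefined": True}
    if sys.version_info[0] < 3:
        kw["disable_unicode"] = True
    return kw

# Usage sketch:
# tpl = mako.template.Template(source, **mako_template_kwargs())
```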

Issue with NVIDIA Build Option in OpenCL

This should be a valid build option for the NVIDIA compiler: http://www.khronos.org/registry/cl/extensions/nv/cl_nv_compiler_options.txt

prog = cl.Program(ctx, txt).build(options="-cl-nv-verbose")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-3-2e721669fcb8> in <module>()
     18 }
     19 """
---> 20 prog = cl.Program(ctx, txt).build(options="-cl-nv-verbose")

/usr/local/lib/python2.7/site-packages/pyopencl/__init__.pyc in build(self, options, devices, cache_dir)
    207                         self._context, self._source, options, devices,
    208                         cache_dir=cache_dir),
--> 209                     options=options, source=self._source)
    210 
    211             del self._context

/usr/local/lib/python2.7/site-packages/pyopencl/__init__.pyc in _build_and_catch_errors(self, build_func, options, source)
    247         # Python 3.2 outputs the whole list of currently active exceptions
    248         # This serves to remove one (redundant) level from that nesting.
--> 249         raise err
    250 
    251     # }}}

RuntimeError: clBuildProgram failed: invalid build options - 

Build on <pyopencl.Device 'GeForce GT 750M' on 'Apple' at 0x1022700>:


(options: -cl-nv-verbose -I /usr/local/lib/python2.7/site-packages/pyopencl/cl)
(source saved as /var/folders/nk/5v0p39pn4yg7c_3vtydljk000000gn/T/tmpRUnaG1.cl)

2013.1: Errors with enqueue_marker

I have such code:

import numpy
import pyopencl as cl
from pyopencl.scan import GenericScanKernel

context = cl.create_some_context()
queue = cl.CommandQueue(context, properties=cl.command_queue_properties.PROFILING_ENABLE)

array = numpy.zeros(1000000, dtype=numpy.int32)
gpu_array = cl.array.to_device(queue, array)

knl = GenericScanKernel(
        context, numpy.int32,
        arguments="__global int *ary",
        input_expr="ary[i]",
        scan_expr="a+b", neutral="0",
        output_statement="ary[i+1] = item;")

start_event = cl.enqueue_marker(queue)
knl(gpu_array, queue=queue)
stop_event = cl.enqueue_marker(queue)
stop_event.wait()

elapsed_seconds = (stop_event.profile.end - start_event.profile.start)
print(elapsed_seconds)

When I run it and choose the device Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz on Intel(R) OpenCL, it prints 0.
When I run it on the same Intel(R) Core(TM) i7-2600K CPU @ 3.40GHz on AMD Accelerated Parallel Processing, it crashes with this error:

Traceback (most recent call last):
  File "test_copy_id_if.py", line 66, in <module>
    elapsed_seconds = (stop_event.profile.end - start_event.profile.start)
  File "/usr/lib/python3.3/site-packages/pyopencl/__init__.py", line 493, in __getattr__
    return self.event.get_profiling_info(inf_attr)
pyopencl.RuntimeError: clGetEventProfilingInfo failed: profiling info not available

Same behaviour on AMD Opteron(tm) Processor 6176 SE on AMD Accelerated Parallel Processing.

When I run it on Tahiti on AMD Accelerated Parallel Processing, it works fine and I get some nonzero number.

PyOpenCL 2013.1, git version; the last commit is 14f6b74043ce7f5fec054107db9818f436ee1497.

Platform name: AMD Accelerated Parallel Processing
Platform profile: FULL_PROFILE
Platform vendor: Advanced Micro Devices, Inc.
Platform version: OpenCL 1.2 AMD-APP (1084.4)
===============================================================
Platform name: Intel(R) OpenCL
Platform profile: FULL_PROFILE
Platform vendor: Intel(R) Corporation
Platform version: OpenCL 1.1 LINUX

In the case of the Tahiti and AMD Opteron it is:

Platform name: AMD Accelerated Parallel Processing
Platform profile: FULL_PROFILE
Platform vendor: Advanced Micro Devices, Inc.
Platform version: OpenCL 1.2 AMD-APP (938.1)

But the following code works perfectly on the GPU (Tahiti) and CPU (both Opteron and Core i7) with the AMD SDK, and still returns zero on the Intel SDK:

import numpy as np
import pyopencl as cl
from pyopencl.scan import GenericScanKernel

context = cl.create_some_context()
queue = cl.CommandQueue(context, properties=cl.command_queue_properties.PROFILING_ENABLE)
knl = GenericScanKernel(
        context, np.int32,
        arguments="__global int *ary",
        input_expr="ary[i]",
        scan_expr="a+b", neutral="0",
        output_statement="ary[i+1] = item;")

a = cl.array.arange(queue, 10000, dtype=np.int32)
start_event = cl.enqueue_marker(queue)
knl(a, queue=queue)
stop_event = cl.enqueue_marker(queue)
stop_event.wait()

elapsed_seconds = (stop_event.profile.end - start_event.profile.end)
print(elapsed_seconds)
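Two observations about the snippets above: `profile.start`/`profile.end` are reported in nanoseconds, so the `elapsed_seconds` variable actually holds nanoseconds, and whether markers carry profiling info at all evidently varies by platform (hence the `profiling info not available` error). A small conversion helper, with a name of my own choosing:

```python
def profiled_seconds(start_event, stop_event):
    # OpenCL profiling counters are reported in nanoseconds, so convert.
    return (stop_event.profile.end - start_event.profile.start) * 1e-9
```

Timing off the kernel's own returned event, where available, may be more portable than `enqueue_marker` on these platforms.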

Calling set_args followed by enqueue_nd_range_kernel raises INVALID_KERNEL_ARGS exception

One receives an INVALID_KERNEL_ARGS exception when trying to split a kernel call into separate set_arg(s) and enqueue_nd_range_kernel methods. This occurs on Windows 10 and Linux using OpenCL 1.1/1.2 Nvidia/Intel on PyOpenCL versions 2014.1, 2015.1 and current master.

Specifically, the following code (adapted from the main example)

import numpy as np
import pyopencl as cl

a_np = np.random.rand(50000).astype(np.float32)
b_np = np.random.rand(50000).astype(np.float32)

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

mf = cl.mem_flags
a_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a_np)
b_g = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=b_np)

prg = cl.Program(ctx, """
__kernel void sum(__global const float *a_g, __global const float *b_g, __global float *res_g) {
  int gid = get_global_id(0);
  res_g[gid] = a_g[gid] + b_g[gid];
}
""").build()

res_g = cl.Buffer(ctx, mf.WRITE_ONLY, a_np.nbytes)

#prg.sum(queue, a_np.shape, None, a_g, b_g, res_g)
prg.sum.set_arg(0, a_g)
prg.sum.set_arg(1, b_g)
prg.sum.set_arg(2, res_g)
#prg.sum.set_args(a_g, b_g, res_g)
ev = cl.enqueue_nd_range_kernel(queue, prg.sum, a_np.shape, None)
ev.wait()

res_np = np.empty_like(a_np)
cl.enqueue_copy(queue, res_np, res_g)

# Check on CPU with Numpy:
print(res_np - (a_np + b_np))
print(np.linalg.norm(res_np - (a_np + b_np)))

gives the following error

Traceback (most recent call last):
  File "C:\..\Scratch\opencl.py", line 28, in <module>
    ev = cl.enqueue_nd_range_kernel(queue, prg.sum, a_np.shape, None)
  File "D:\..l\pyo\lib\site-packages\pyopencl-2015.1-py3.4-win-amd64.egg\pyopencl\cffi_cl.py", line 1197, in enqueue_nd_range_kernel
    global_work_size, local_work_size, c_wait_for, num_wait_for))
  File "D:\..\pyo\lib\site-packages\pyopencl-2015.1-py3.4-win-amd64.egg\pyopencl\cffi_cl.py", line 549, in _handle_error
    raise e
pyopencl.cffi_cl.LogicError: clenqueuendrangekernel failed: INVALID_KERNEL_ARGS

Please see this thread for more information.
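One likely culprit (an assumption, but consistent with the thread): each attribute access `prg.sum` can hand back a distinct `Kernel` object, so the three `set_arg` calls above each configure a different, incompletely-set-up kernel, and the one passed to `enqueue_nd_range_kernel` has no arguments at all. Capturing the kernel once and setting every argument on that single object is the safe pattern; `set_all_args` is a hypothetical helper:

```python
def set_all_args(kernel, args):
    # Set every argument on ONE kernel object, then return it so the very
    # same object can be passed to enqueue_nd_range_kernel.
    for i, arg in enumerate(args):
        kernel.set_arg(i, arg)
    return kernel

# Usage sketch with the program above:
# knl = prg.sum                            # capture the Kernel once
# set_all_args(knl, (a_g, b_g, res_g))
# ev = cl.enqueue_nd_range_kernel(queue, knl, a_np.shape, None)
```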

Avoid using double if result is float

Multiplying a float32 Array by a scalar value generates a kernel like this:

//CL//

                        #if __OPENCL_C_VERSION__ < 120
                        #pragma OPENCL EXTENSION cl_khr_fp64: enable
                        #endif
                        #define PYOPENCL_DEFINE_CDOUBLE



        #define PYOPENCL_ELWISE_CONTINUE continue

        __kernel void axpb(__global float *z__base, long z__offset, double a, __global float *x__base, long x__offset, float b, long n)
        {
          int lid = get_local_id(0);
          int gsize = get_global_size(0);
          int work_group_start = get_local_size(0)*get_group_id(0);
          long i;

          __global float *z = (__global float *) ((__global char *) z__base + z__offset);
__global float *x = (__global float *) ((__global char *) x__base + x__offset);;
          //CL//
          for (i = work_group_start + lid; i < n; i += gsize)
          {
            z[i] = a*x[i] + b;
          }

          ;
        }

where "a" is passed to the kernel as a double. This has at least two disadvantages:

  1. it forces the GPU to use double registers (this may depend on the implementation)
  2. it adds an extra type conversion.
    For example, beignet (http://cgit.freedesktop.org/beignet/) even requires enabling cl_khr_fp64 for this kernel to compile, while it declares OpenCL 1.2 compatibility. I have also reported a bug about this:
    https://bugs.freedesktop.org/show_bug.cgi?id=90308
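A common workaround until the code generator is smarter: pass the scalar with an explicit 32-bit dtype, so the generated signature takes `float a` rather than `double a` (a sketch, based on the scalar's numpy dtype driving the kernel argument type in the generated code above):

```python
import numpy as np

# A bare Python float is treated as a 64-bit double when the kernel
# signature is generated; an explicit np.float32 scalar stays 32-bit.
a_double = 2.5
a_single = np.float32(2.5)

# Usage sketch: z = a_single * x_gpu + np.float32(b)
```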

MemoryError: clCreateImage failed: mem object allocation failure

I have a "complex-valued" array read from a Matlab .mat data file:

datas = scipy.io.loadmat('/Users/recondev/Projects/RMA/Data/RMA_to_iPython_interpolator_only.mat')
dataIn = datas['vec_in'].astype(np.complex64).view(cl.array.vec.float2).T

dataIn.dtype returns complex64

I am trying to create a 1D image object from the dataIn array using pyopencl.image_from_array, but I receive the following error:

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
<ipython-input-69-07fda49d952e> in <module>()
----> 1 dataInImg = pyopencl.image_from_array(ctx, dataInCont)

/Users/recondev/.virtualenvs/ipythonenv/lib/python2.7/site-packages/pyopencl/__init__.pyc in image_from_array(ctx, ary, num_channels, mode, norm_int)
   1178             ImageFormat(img_format, channel_type),
   1179             shape=shape[::-1], pitches=strides[::-1][1:],
-> 1180             hostbuf=ary)
   1181 
   1182 # }}}

/Users/recondev/.virtualenvs/ipythonenv/lib/python2.7/site-packages/pyopencl/__init__.pyc in image_init(self, context, flags, format, shape, pitches, hostbuf, is_array, buffer)
    676             desc.buffer = buffer
    677 
--> 678             image_old_init(self, context, flags, format, desc, hostbuf)
    679         else:
    680             # legacy init for CL 1.1 and older

MemoryError: clCreateImage failed: mem object allocation failure

By reducing the array to a much smaller size (the original array contains 1728×256×13 complex64 values, and the reduced array contains only 1728 complex64 values), I was able to eliminate the error.

What I am having trouble getting my head around is that when I create an image from a float32 array (as compared to complex64), I am able to create an image containing a much larger array. I can create an image containing 1728×256×13 float32 values (23003136 bytes), yet I keep failing to create an image containing 1728×256 complex64 values (2097152 bytes).

I wonder if this is expected behavior, and what the reason behind it is.
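One thing worth checking (an assumption on my part): the `.T` in the snippet above produces a non-C-contiguous view, while the float32 case that succeeds may have been contiguous. A sketch of reinterpreting complex64 data as two float32 channels and forcing contiguity before handing it to `image_from_array` (shapes here are illustrative, not taken from the .mat file):

```python
import numpy as np

c = np.zeros((1728, 256, 13), dtype=np.complex64).T  # like dataIn: a transposed view
c = np.ascontiguousarray(c)                          # copy into C order
f2 = c.view(np.float32).reshape(c.shape + (2,))      # real/imag as 2 channels
```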

Issue with Using Image Interpolation

Something seems to be wrong with the interpolated values that I am getting here:

__kernel 
void interp1d(__read_only image1d_t testImg,
              __global float* out1,
              __global float* out2)
{
     const sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE  | CLK_FILTER_LINEAR;

    const size_t localId = get_local_id(0);

    out1[localId] = (float)1.5f+0.1f*localId;
    out2[localId] = read_imagef(testImg, sampler, 1.5f+0.1f*localId).x;  
}
test = np.arange(10).astype(np.float32)*10
testImg = cl.image_from_array(ctx, np.ascontiguousarray(test))

out1 = cl.array.empty(queue, (test.size,), dtype=np.float32)
out2 = cl.array.empty(queue, (test.size,), dtype=np.float32)

event = interp1d(queue, (out1.size,), None, testImg, out1.data, out2.data)

print out1.get()
print out2.get()
[ 1.5         1.60000002  1.70000005  1.79999995  1.89999998  2.          2.0999999
  2.20000005  2.29999995  2.4000001 ]
[ 10.         11.015625   11.9921875  13.0078125  13.984375   15.
  16.015625   16.9921875  18.0078125  18.984375 ]

I expect the values in out2 to be 10, 11, 12, ... The discrepancy seems much larger than I would expect from using float32s.
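The pattern in out2 is explained if the hardware quantizes the linear-filter weight to 8 fractional bits, a common hardware limit for `CLK_FILTER_LINEAR` (the OpenCL spec leaves filter precision largely implementation-defined). A model sketch, my own reconstruction rather than anything in pyopencl, that reproduces the numbers above exactly:

```python
def quantized_lerp(x, data, frac_bits=8):
    # With CLK_NORMALIZED_COORDS_FALSE, texel centers sit at i + 0.5, and
    # (hypothetically) the lerp weight is snapped to multiples of 1/2**frac_bits.
    i = int(x - 0.5)
    frac = (x - 0.5) - i
    q = round(frac * 2**frac_bits) / 2**frac_bits
    return (1.0 - q) * data[i] + q * data[i + 1]

data = [10.0 * k for k in range(10)]
print([quantized_lerp(1.5 + 0.1 * k, data) for k in range(5)])
# -> [10.0, 11.015625, 11.9921875, 13.0078125, 13.984375], matching out2
```

So the values are "right" up to an 8-bit interpolation weight; for full float32 precision, fetch the two neighbouring texels with `CLK_FILTER_NEAREST` and blend in the kernel.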

"TypeError: expected an object with a writable buffer interface" when "getting" f_contiguous array

Hi,
When you try to use the get method on an array created with order='F', it raises a "TypeError: expected an object with a writable buffer interface".

I've put a minimum working example on https://gist.github.com/dzamlo/9b38af519a17b6647483

After some searching, it looks like the exception is raised in the "enqueue_read_buffer" method in wrap_cl.hpp on line 1809:

if (PyObject_AsWriteBuffer(buffer.ptr(), &buf, &len))
  throw py::error_already_set();

Unfortunately my skills stop me here.

Thanks for your work on PyOpenCL and other open source code.
Loïc

Add support for clBLAS and others.

It would be great to add support for AMD's clBLAS and, optionally, the clFFT library. There is almost nothing to do: the functions take the same OpenCL arguments as clEnqueueNDRangeKernel. Example:
https://github.com/clMathLibraries/clBLAS/blob/master/src/samples/example_sdot.c#LC117

But they need some setup calls first. For clBLAS: https://github.com/clMathLibraries/clBLAS/blob/master/src/samples/example_sdot.c#LC97
For clFFT: https://github.com/clMathLibraries/clFFT/blob/develop/src/examples/fft3d.c#LC66
In my opinion it would be better to keep the results of that setup in some kind of cache to avoid overhead. I have no idea how to implement this; I'm not experienced with C/C++ tricks.

_new_like_me() doesn't copy strides/order

In _new_like_me(), I'd recommend setting strides outside the 'else' clause (patch below), so that storage order is preserved even if you don't specify a dtype. One side effect of this is that array.copy() does the right thing even when the array is not C-contiguous.

--- pyopencl/array.py.orig	2014-07-08 15:13:03.341845254 -0400
+++ pyopencl/array.py	2014-07-08 15:13:26.659777866 -0400
@@ -798,9 +798,8 @@
         strides = None
         if dtype is None:
             dtype = self.dtype
-        else:
-            if dtype == self.dtype:
-                strides = self.strides
+        if dtype == self.dtype:
+            strides = self.strides
 
         queue = queue or self.queue
         if queue is not None:

Cannot use pyopencl.array on pypy

I get the following error message when importing pyopencl.array on pypy.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/pypy/site-packages/pyopencl/array.py", line 32, in <module>
    import pyopencl.elementwise as elementwise
  File "/opt/pypy/site-packages/pyopencl/elementwise.py", line 31, in <module>
    from pyopencl.tools import context_dependent_memoize
  File "/opt/pypy/site-packages/pyopencl/tools.py", line 58, in <module>
    _register_types()
  File "/opt/pypy/site-packages/pyopencl/tools.py", line 46, in _register_types
    _fill_dtype_registry(respect_windows=False, include_bool=False)
  File "/opt/pypy/site-packages/pyopencl/compyte/dtypes.py", line 134, in _fill_dtype_registry
    is_64_bit = tuple.__itemsize__ * 8 == 64
AttributeError: type object 'tuple' has no attribute '__itemsize__'

It seems that pyopencl is using an undocumented CPython API, and PyPy has no plan to support it (and perhaps cannot).
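A portable sketch of the same 64-bit check that avoids the CPython-only `tuple.__itemsize__` attribute and also works on PyPy:

```python
import struct

# Pointer size in bits, via the standard struct module ("P" is the
# native pointer format) rather than a CPython implementation detail.
is_64_bit = struct.calcsize("P") * 8 == 64
```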

image_from_array does not work with Complex Numbers

This works:

In [355]:
cl.image_from_array(ctx, np.ones((10,), dtype=cl.array.vec.float2))

Out[355]:
<pyopencl._cl.Image at 0x11e926ec0>

But this does not:

In [354]:
cl.image_from_array(ctx, np.ones((10,), dtype=np.complex64))

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-354-34759afd9caa> in <module>()
----> 1 cl.image_from_array(ctx, np.ones((10,), dtype=np.complex64))

/usr/local/lib/python2.7/site-packages/pyopencl/__init__.pyc in image_from_array(ctx, ary, num_channels, mode, norm_int)
   1173         channel_type = DTYPE_TO_CHANNEL_TYPE_NORM[dtype]
   1174     else:
-> 1175         channel_type = DTYPE_TO_CHANNEL_TYPE[dtype]
   1176 
   1177     return Image(ctx, mode_flags | mem_flags.COPY_HOST_PTR,

KeyError: dtype('complex64')

I would assume that complex64 should just be mapped to float2 behind the scenes (per: http://documen.tician.de/pyopencl/array.html#complex-numbers)
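Until such a mapping exists, a workaround sketch: view the complex64 array as a two-channel float dtype (which the working `float2` example above accepts) before calling `image_from_array`. With pyopencl one would use `cl.array.vec.float2`; a plain two-field stand-in shows the idea without an OpenCL context:

```python
import numpy as np

a = np.ones((10,), dtype=np.complex64)

# Reinterpret in place: same 8-byte items, now seen as two float32 channels
# (real, imag). With pyopencl, use cl.array.vec.float2 as the target dtype.
float2 = np.dtype([("x", np.float32), ("y", np.float32)])
a_f2 = a.view(float2)

# then: cl.image_from_array(ctx, a_f2)
```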

Memory leak: events list never cleared in Array

I have encountered a problem which seems to be a bug in PyOpenCL: the following code consumes about 1.5 GB of host memory on my machine (on the CUDA platform; when selecting AMD OpenCL, about 500 MB is leaked).

import pyopencl, numpy
import pyopencl.array

ctx = pyopencl.create_some_context()
queue = pyopencl.CommandQueue(ctx)

arr = pyopencl.array.zeros(queue, (1000,1000), dtype=numpy.float32)

for i in xrange(10000):
    arr.fill(0.0)
    print len(arr.events)
    queue.finish() # (has no effect here)
    #arr.finish() # uncommenting fixes the problem

The memory leak is caused by the 'events' list building up, and can be fixed by calling the undocumented method 'finish'.

I am running PyOpenCL version 2013.2 on Debian with Nvidia driver version 337.19. This problem did not exist in v. 2012.1.

(P.S. I attempted to post this message in the PyOpenCL mailing list about a month ago but it apparently did not get through)
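Until the list is trimmed automatically, a workaround sketch based on the observation above that `Array.finish()` waits on and clears the accumulated events (`drain_events` is a hypothetical helper name):

```python
def drain_events(arr, max_pending=64):
    # Wait on and drop accumulated events once the list grows too long,
    # bounding the host memory they keep alive.
    if len(arr.events) > max_pending:
        arr.finish()
```

Calling this every few iterations of the fill loop above should keep host memory flat.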

Make __setitem__ fill all subscript positions even if the values array is smaller than the subscript

This would make PyOpenCL more compatible with numpy. For example:

>>> sx = array.empty(queue, 4, np.int32)
>>> sx
array([0, 0, 0, 0], dtype=int32)
>>> idx = clrandom.rand(queue, 2, np.int32, luxury=None, a=0, b=4)
>>> idx
array([3, 1], dtype=int32)
>>> val = clrandom.rand(queue, 1, np.int32, luxury=None, a=0, b=99)
>>> val
array([48], dtype=int32)
>>> sx[idx] = val
>>> sx
array([ 0,  0,  0, 48], dtype=int32) #Only one position filled

>>> npsx = np.zeros(sx.shape, dtype=sx.dtype)
>>> npsx
array([0, 0, 0, 0], dtype=int32)
>>> npval = np.array([48], dtype=np.int32)
>>> npval
array([48], dtype=int32)
>>> npidx = np.array([3, 1], dtype=np.int32)
>>> npidx
array([3, 1], dtype=int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48,  0, 48], dtype=int32) #Both positions filled

Also, as I observed in other cases, numpy uses something like values[n % values.size] (where n is the current subscript position) to pick the value for each subscript index. For example:

>>> npval = np.array([48, 32], dtype=np.int32)
>>> npidx = np.array([1, 2, 3], dtype=np.int32)
>>> npsx = np.zeros((4,), dtype=np.int32)
>>> npsx
array([0, 0, 0, 0], dtype=int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48, 32, 48], dtype=int32)

>>> npidx = np.array([2, 3, 1], dtype=np.int32)
>>> npsx = np.zeros((4,), dtype=np.int32)
>>> npsx[npidx] = npval
>>> npsx
array([ 0, 48, 48, 32], dtype=int32)

I think it is possible to make custom kernels for cases like:

  • values is an array of size 1
  • values is an array smaller than the subscript
  • values is an array of size greater than or equal to the subscript size

ImportError: cannot import name intern

I didn't know where else to report this, but here's the problem. I compiled pyopencl 2014.1 with an outdated version of six (1.5.3).
When I tried to run anything importing pyopencl, I got this error.

I looked it up and found that I had to upgrade my six library.

I am now on six 1.9.0. I tried compiling and installing pyopencl again, but I still had the same import error.

Here is the log of the second installation. At line 137 we can see that it actually detects my new six.

I don't have a clue where to go from there.

I'm currently running Linux Mint 17.1 with kernel 3.16.0-30.

[enqueue_copy] Add byte_count option to Buffer <-> Host transfers

The byte_count option is available only for Buffer <-> Buffer transfers using enqueue_copy:

byte_count – (optional) If not specified, defaults to the size of the source in versions 2012.x and earlier, and to the minimum of the size of the source and target from 2013.1 on.

http://documen.tician.de/pyopencl/runtime.html#pyopencl.enqueue_copy

It would be great if this option were also available for Buffer <-> Host transfers. It is very useful when, for example, you use atomics in your kernels (as I do) and the output buffer size varies.
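In the meantime, a workaround sketch: for Buffer <-> Host transfers the host array's size governs how much is transferred, so slicing the host array to the valid length limits the byte count (an assumption about enqueue_copy's behavior; `n_valid` stands in for a count read back from the kernel's atomic counter):

```python
import numpy as np

n_valid = 100                 # e.g. read back from an atomic counter buffer
host = np.empty(1000, dtype=np.float32)
part = host[:n_valid]         # contiguous view covering n_valid * 4 bytes

# cl.enqueue_copy(queue, part, dev_buf)   # would copy only part.nbytes
```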
