odlcuda's People

Contributors

adler-j, kohr-h, larswestergren, niinimaki

odlcuda's Issues

Remove pyinstall

We should remove the custom pyinstall target and instead use the built-in install support in CMake. This is used in our STIR clone.

installation with local odl

I managed to install odlcuda with the odl that comes from pip. However, that version is outdated, so I would like to use odlcuda with the latest odl version. Following exactly the same installation steps as with the off-the-shelf odl version, odl cannot find 'cuda' as an implementation, e.g.

NotImplementedError: no corresponding data space available for space FunctionSpace(IntervalProd([-333.8016, -333.8016, 0. ], [ 333.8016 , 333.8016 , 257.96875]), out_dtype='float32') and implementation 'cuda'

This is all strange, because the installation does find my odl version, and that odl version works perfectly fine without cuda. Also, my system finds odlcuda, as it shows up in auto-completion after import odl. Any ideas what might have gone wrong here? What is the mechanism that tells odl that odlcuda is present?
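For what it's worth, odl typically discovers optional backends through setuptools entry points registered at install time. A minimal sketch of such a lookup follows; the group name 'odl.space.entry_points' is an assumption taken from the module name in the circular-dependency issue below, so check odl's and odlcuda's setup.py for the actual group:

```python
import pkg_resources

def discover_backends(group):
    # Every installed package that registered an entry point under this group
    # shows up here; if odlcuda's entry point is missing or was installed into
    # a different environment, odl never learns that 'cuda' exists.
    return {ep.name: ep.module_name
            for ep in pkg_resources.iter_entry_points(group)}

backends = discover_backends('odl.space.entry_points')
```

If `backends` comes up empty on a machine where odlcuda is installed, the two packages are likely in different site-packages directories.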

Better error with wrong GCC

Currently, using mismatched GCC and CUDA versions (in particular GCC 5.x with CUDA < 8) produces cryptic errors. Perhaps we should check for this combination to help users.

Errors when installing odlcuda with GCC.

We have installed odl and run pytest with no errors. But when we tried to install odlcuda using the command "CUDA_ROOT=/usr/local/cuda-10.2 CUDA_COMPUTE=75 conda build ./conda", the following error occurred:
"conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {"gcc[version='<5']"}".
I wonder what the problem is and how to solve it. Thanks a lot.

using odlcuda

This is kind of a continuation of issue odlgroup/odl#1074.

A few observations:

  1. I can import odlcuda. The import also works without _install_location = __file__, but in the following I left it in.

  2. The order matters. I first tried

import odlcuda
import odl

which causes odl not to know 'cuda' but

import odl
import odlcuda

works!

  3. This is not really related to CUDA but still weird: I tried to test timings similar to my application.

domain_cpu = odl.uniform_discr([0], [1], [3e+8], impl='numpy')

and failed.

Traceback (most recent call last):

File "", line 4, in
domain_cpu = odl.uniform_discr([0], [1], [3e+8], impl='numpy')

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/lp_discr.py", line 1311, in uniform_discr
**kwargs)

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/lp_discr.py", line 1222, in uniform_discr_fromintv
**kwargs)

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/lp_discr.py", line 1136, in uniform_discr_fromspace
nodes_on_bdry)

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/partition.py", line 940, in uniform_partition_fromintv
grid = uniform_grid_fromintv(intv_prod, shape, nodes_on_bdry=nodes_on_bdry)

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/grid.py", line 1092, in uniform_grid_fromintv
shape = normalized_scalar_param_list(shape, intv_prod.ndim, safe_int_conv)

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/util/normalize.py", line 149, in normalized_scalar_param_list
out_list.append(param_conv(p))

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/util/normalize.py", line 396, in safe_int_conv
raise ValueError('cannot safely convert {} to integer'.format(number))

ValueError: cannot safely convert 300000000.0 to integer
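The failure above comes from the strict integer conversion in odl.util.normalize. Below is a minimal reconstruction of that check (a sketch inferred from the traceback; odl's actual implementation may differ): float shape entries are rejected even when their value is integral, so passing int(3e+8) instead of 3e+8 avoids the error.

```python
import numpy as np

def safe_int_conv(number):
    # Reject anything that cannot be cast to int under numpy's 'safe' casting
    # rule. 300000000.0 is integral in value but still a float, and float ->
    # int is not a 'safe' cast, matching the ValueError in the traceback above.
    try:
        return int(np.array(number).astype(int, casting='safe'))
    except TypeError:
        raise ValueError('cannot safely convert {} to integer'.format(number))

safe_int_conv(int(3e+8))  # fine: the input is already an integer
# safe_int_conv(3e+8)     # raises ValueError
```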

  4. Now the CUDA issue: instead of 1D, I went 3D:

domain_gpu = odl.uniform_discr([0, 0, 0], [1, 1, 1], [4000, 300, 400], impl='cuda')
x_gpu = domain_gpu.one()

error:

Traceback (most recent call last):

File "", line 4, in
x_gpu = domain_gpu.one()

File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/discretization.py", line 473, in one
return self.element_type(self, self.dspace.one())

File "/home/me404/.local/lib/python2.7/site-packages/odlcuda-0.5.0-py2.7.egg/odlcuda/cu_ntuples.py", line 912, in one
return self.element_type(self, self._vector_impl(self.size, 1))

RuntimeError: function_attributes(): after cudaFuncGetAttributes: invalid device function

Any idea what is wrong here?

nd arrays

We currently only support 1d arrays. I'll look into how we could improve this, either with true nd-array support or at least 2d and 3d support.
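As a stopgap, nd semantics can be emulated on top of flat 1d storage by carrying the shape separately and mapping indices with row-major strides. A numpy sketch of the idea (class and attribute names are illustrative, not an API proposal):

```python
import numpy as np

class FlatNdArray(object):
    """Toy nd view over flat 1d storage, the way a 1d-only backend could fake it."""

    def __init__(self, shape, data=None):
        self.shape = tuple(shape)
        self.size = int(np.prod(self.shape))
        self.flat = (np.zeros(self.size) if data is None
                     else np.asarray(data, dtype=float).ravel())

    def __getitem__(self, index):
        # Map the nd index to a row-major offset into the flat buffer.
        return self.flat[np.ravel_multi_index(index, self.shape)]

a = FlatNdArray((2, 3), data=range(6))
a[(1, 2)]  # flat offset 1 * 3 + 2 = 5, so this returns 5.0
```

Elementwise ufuncs then work unchanged on the flat buffer; only indexing, slicing and reductions over axes need the shape information.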

performance of odlcuda

I was running some experiments, and there seems to be a performance issue with CUDA. However, I am not sure whether odlcuda is causing it or whether this is a general CUDA phenomenon. Below is the code I ran with its output. I don't think it is important to know what this code does, but if need be, I am happy to explain.

With CUDA, the run time scales roughly linearly with the number of subsets, even though each run executes approximately the same number of flops. The timings are fairly constant for numpy. The question is whether this is an issue with odlcuda or a general CUDA phenomenon. In all of these timings there is no copying to or from the GPU. I am aware that launching kernels incurs overhead, but I would not have thought it was so dramatic, in particular since the smallest subset still contains 262 x 65000 = 17 million elements.

Do you think that this is related to the way ODL is written or do you think this is a general CUDA thing?

for impl in ['cuda', 'numpy']:
    for nsubsets in [1, 4, 16]:
        shape = [4200 // nsubsets, 65000]
        print('impl:{}, shape:{}, nsubsets:{}'.format(impl, shape, nsubsets))
        Y = odl.ProductSpace(odl.uniform_discr([0, 0], shape, shape,
                                               impl=impl), nsubsets)
        data = Y.one()
        background = Y.one()
        f = src_odl.KullbackLeibler(Y, data, background)
        x = 2 * Y.one()
        %time fx = f(x)
    
    shape = [4200, 65000]
    print('impl:{}, shape:{}'.format(impl, shape))
    Y = odl.uniform_discr([0, 0], shape, shape, impl=impl)
    data = Y.one()
    background = Y.one()
    f = src_odl.KullbackLeibler(Y, data, background)
    x = 2 * Y.one()
    %time fx = f(x)

impl:cuda, shape:[4200, 65000], nsubsets:1
CPU times: user 88.2 ms, sys: 24 ms, total: 112 ms
Wall time: 114 ms

impl:cuda, shape:[1050, 65000], nsubsets:4
CPU times: user 325 ms, sys: 28.7 ms, total: 354 ms
Wall time: 361 ms

impl:cuda, shape:[262, 65000], nsubsets:16
CPU times: user 1.43 s, sys: 31.7 ms, total: 1.46 s
Wall time: 1.49 s

impl:cuda, shape:[4200, 65000]
CPU times: user 93.6 ms, sys: 20.2 ms, total: 114 ms
Wall time: 116 ms

impl:numpy, shape:[4200, 65000], nsubsets:1
CPU times: user 10.7 s, sys: 352 ms, total: 11.1 s
Wall time: 6.88 s

impl:numpy, shape:[1050, 65000], nsubsets:4
CPU times: user 13.4 s, sys: 562 ms, total: 14 s
Wall time: 6.9 s

impl:numpy, shape:[262, 65000], nsubsets:16
CPU times: user 24.7 s, sys: 1 s, total: 25.7 s
Wall time: 7.04 s

impl:numpy, shape:[4200, 65000]
CPU times: user 10.8 s, sys: 380 ms, total: 11.1 s
Wall time: 6.92 s
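One hedged reading of the numbers above: if each subset evaluation pays a roughly fixed cost (kernel launches plus synchronization), the total time grows linearly with the subset count while the arithmetic stays constant. A toy cost model, fitted by eye to the measurements, not derived from odlcuda internals:

```python
def predicted_time(nsubsets, per_subset_overhead, base_time):
    # total = fixed compute time for the whole data set
    #         + a constant per-subset cost (launches, sync, Python dispatch)
    return base_time + nsubsets * per_subset_overhead

# With ~0.09 s per subset and ~0.03 s of base compute, the model gives
# 0.12 s, 0.39 s and 1.47 s for 1, 4 and 16 subsets, close to the measured
# cuda wall times above (0.114 s, 0.361 s, 1.49 s).
estimates = [predicted_time(k, 0.09, 0.03) for k in (1, 4, 16)]
```

If this picture is right, the per-subset cost dominates long before the data gets small, which would point at dispatch overhead rather than the kernels themselves.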

for impl in ['cuda', 'numpy']:
    for nsubsets in [1, 4, 16]:
        shape = [4200 // nsubsets, 65000]
        print('impl:{}, shape:{}, nsubsets:{}'.format(impl, shape, nsubsets))
        Y = odl.ProductSpace(odl.uniform_discr([0, 0], shape, shape, impl=impl), nsubsets)
        data = Y.one()
        background = Y.one()
        f = src_odl.KullbackLeibler(Y, data, background)
        x = 2 * Y.one()
        out = Y.element()
        
        t = 0        
        for i in range(len(Y)):
            f_prox = f[i].convex_conj.proximal(x[i])
            src.tic()
            f_prox(x[i], out=out[i])
            t += src.toc()
        print('time:{}, average:{}'.format(t, t / len(Y)))
    
    shape = [4200, 65000]
    print('impl:{}, shape:{}'.format(impl, shape))
    Y = odl.uniform_discr([0, 0], shape, shape, impl=impl)
    data = Y.one()
    background = Y.one()
    f = src_odl.KullbackLeibler(Y, data, background)
    x = 2 * Y.one()
    out = Y.element()
    f_prox = f.convex_conj.proximal(x)
    %time f_prox(x, out=out)

impl:cuda, shape:[4200, 65000], nsubsets:1
time:0.607445955276, average:0.607445955276

impl:cuda, shape:[1050, 65000], nsubsets:4
time:0.405075311661, average:0.101268827915

impl:cuda, shape:[262, 65000], nsubsets:16
time:2.36511826515, average:0.147819891572

impl:cuda, shape:[4200, 65000]
CPU times: user 146 ms, sys: 127 µs, total: 146 ms
Wall time: 150 ms

impl:numpy, shape:[4200, 65000], nsubsets:1
time:3.18624901772, average:3.18624901772

impl:numpy, shape:[1050, 65000], nsubsets:4
time:3.12681221962, average:0.781703054905

impl:numpy, shape:[262, 65000], nsubsets:16
time:3.25435972214, average:0.203397482634

impl:numpy, shape:[4200, 65000]
CPU times: user 13.8 s, sys: 1.47 s, total: 15.2 s
Wall time: 3.19 s

Installation issues

Hi, when I try to build, I get the following error:

CMake Error at CMakeLists.txt:28 (add_dependencies):
  add_dependencies called with incorrect number of arguments


CMake Warning (dev) in CMakeLists.txt:
  No cmake_minimum_required command is present.  A line of code such as

    cmake_minimum_required(VERSION 2.8)

  should be added at the top of the file.  The version specified may be lower
  if you wish to support older CMake versions for this project.  For more
  information run "cmake --help-policy CMP0000".
This warning is for project developers.  Use -Wno-dev to suppress it.

Circular dependency with CUDA

We have a circular dependency: odlcuda imports odl, and odl.space.entry_points imports odlcuda. We need to solve this somehow.
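One common way to break such a cycle (a generic sketch, not a proposal for odl's actual plugin API) is to defer the optional import until it is first needed, so neither module imports the other at load time:

```python
import importlib

def load_backend(name):
    """Lazily import an optional backend module; return None when it is absent."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None

# At odl import time nothing from the plugin is touched; only when a 'cuda'
# space is first requested would the backend actually be imported.
backend = load_backend('odlcuda')  # None on machines without odlcuda installed
```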

Add optimized versions of p-norm and p-dist for p != 2

The standard p-norm can be implemented with the help of the CUDA sum and abs ufuncs, but this involves a copy. We need a C++ implementation if we want efficiency.

For p = inf I currently don't see a way of implementing this in Python. We could include that case in the same C++ function, and while we're at it, max() and min() functions would be good to have.
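For reference, the ufunc-based route looks like the following numpy sketch; the temporaries produced by abs and power are the copy the issue refers to, which a fused C++ kernel would avoid:

```python
import numpy as np

def pnorm(x, p):
    # abs and ** each materialize a temporary the size of x -- this is the
    # extra copy that a dedicated C++ implementation would fuse away.
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def inf_norm(x):
    # The p = inf case reduces to a max of absolute values.
    return np.max(np.abs(x))

x = np.array([3.0, -4.0])
pnorm(x, 2)   # 5.0
inf_norm(x)   # 4.0
```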

Change from boost to pybind11

pybind11 is an alternative to Boost.Python that does not require a compiled component (or the rest of Boost); this would simplify compilation and distribution via PyPI.

Error when trying to install odlcuda

We installed odl and ran its tests with no issues. Then we tried to install odlcuda, but we had a problem when we ran the command "CUDA_ROOT=/usr/local/cuda-9.0 CUDA_COMPUTE=37 conda build ./conda". The error shows "conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {"odl[version='>=0.3.0']"}".
Do you know what the problem is?

Build for multiple CUDA compute versions?

Currently, as far as I can see, you can only specify a single CUDA_COMPUTE value. For conda packages it would be good to build for several values so that the package works on different GPU architectures.
I don't know much about this topic, so I don't know whether this is necessary at all or whether it's fine to just set the minimum required version. The only thing I know is that the packages on the conda channel are built with compute capability 52 and fail on lower versions with "invalid device function" errors.

slow maximum function

I noticed that the maximum function is very slow in odlcuda on the GPU. In fact, it is slower than computing the maximum on the CPU. Please see my example test case below. Any ideas why that is and how to fix it?

import odl

X = odl.rn(300 * 10**6, dtype='float32')
x = 0.5 * X.one()
y = X.one()
%time x.ufuncs.maximum(1, out=y)
%time x.ufuncs.log(out=y)

X = odl.rn(300 * 10**6, dtype='float32', impl='cuda')
x = 0.5 * X.one()
y = X.one()
%time x.ufuncs.maximum(1, out=y)
%time x.ufuncs.log(out=y)

numpy maximum:
CPU times: user 346 ms, sys: 200 µs, total: 346 ms
Wall time: 347 ms

numpy log:
CPU times: user 1.44 s, sys: 0 ns, total: 1.44 s
Wall time: 1.43 s

cuda maximum:
CPU times: user 838 ms, sys: 341 ms, total: 1.18 s
Wall time: 1.18 s

cuda log:
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 91.1 µs

Installation without admin rights

I have been trying to install odlcuda without admin rights. I have changed the CMAKE_INSTALL_PREFIX folder to one where I do have write access; this approach has worked well for installing other software packages. However, odlcuda seems to ignore this path and tries to install into the default path

/usr/local/lib/python2.7/dist-packages/

instead.

I tried two ways to change the installation path, both of which should work according to some forums, but neither does here.

  • cmake -DCMAKE_INSTALL_PREFIX=~/.local/lib/python2.7/site-packages ../
  • make DESTDIR=~/.local/lib/python2.7/site-packages pyinstall

Any ideas of what is going on here?
