odlgroup / odlcuda
C++ backend for ODL
License: GNU General Public License v3.0
The current install script assumes the user wants to install for Python, and it also happens to have a somewhat restrictive license. When discussing this issue in STIR with Kris, he pointed out a better script; we should switch to that. Together with fixing issue #9, this would greatly improve the install process.
You removed Eigen in 9196001, but it is still referenced in
https://github.com/odlgroup/odl-cpp-utils/blob/master/utils/Ellipse.h
so Jenkins CI can't build the project from scratch; it fails with "Cannot open include file: 'Eigen/Core': No such file or directory".
Should we revert the Eigen deletion, or update odl-cpp-utils?
/Lars
Some users at Elekta "require" MSVC 2012 support, which currently fails. We should try to get this working.
We should remove the custom pyinstall target and instead use the built-in install command found in CMake. This is what our STIR clone uses.
I managed to install odlcuda with odl that comes from pip. However, that version is outdated so I would like to use odlcuda with the latest odl version. Following exactly the same installation steps as with the off-the-shelf odl version, my odl cannot find "cuda" as an implementation, e.g.
NotImplementedError: no corresponding data space available for space FunctionSpace(IntervalProd([-333.8016, -333.8016, 0. ], [ 333.8016 , 333.8016 , 257.96875]), out_dtype='float32') and implementation 'cuda'
This is all strange, because odl does find my odl version within the installation, and that odl version works perfectly fine without CUDA. Also, my system finds odlcuda: it shows up in auto-complete after import odl. Any ideas what might have gone wrong here? What is the mechanism that tells odl that odlcuda is present?
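For reference, here is a hedged sketch of the discovery mechanism as I understand it: odl scans a setuptools entry-point group when a space implementation is requested, so odlcuda must be installed with matching entry-point metadata for 'cuda' to appear. The group name 'odl.space' and the helper name below are my assumptions; check odl/space/entry_points.py for the real details.

```python
import pkg_resources

def discover_backends(group='odl.space'):
    """Collect {name: loader} for every plugin registered under `group`."""
    backends = {}
    for ep in pkg_resources.iter_entry_points(group=group):
        backends[ep.name] = ep.load  # call ep.load() lazily to import the plugin
    return backends

# Without odlcuda's entry-point metadata installed, 'cuda' is simply absent:
print(sorted(discover_backends()))
```

If 'cuda' is missing from this dict in your environment, one thing worth checking is whether the odlcuda egg metadata was registered against the old pip-installed odl rather than your source checkout.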
Currently, using mismatched GCC and CUDA versions (in particular GCC 5.x with CUDA < 8) gives weird errors. Perhaps we should check for this to help users.
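A minimal sketch of such a preflight check, comparing major versions only. The compatibility table here is just the single known-bad pairing mentioned above, not an authoritative list:

```python
def gcc_cuda_compatible(gcc_version, cuda_version):
    """Return False for the known-bad pairing: CUDA < 8 with GCC >= 5."""
    gcc_major = int(gcc_version.split('.')[0])
    cuda_major = int(cuda_version.split('.')[0])
    return not (cuda_major < 8 and gcc_major >= 5)

assert not gcc_cuda_compatible('5.4', '7.5')  # the error-prone combination
assert gcc_cuda_compatible('5.4', '8.0')
```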
We have installed odl and run pytest with no errors. But when we tried to install odlcuda with the command "CUDA_ROOT=/usr/local/cuda-10.2 CUDA_COMPUTE=75 conda build ./conda", the following error occurred:
"conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {"gcc[version='<5']"}"
I wonder what the problem is and how to solve it. Thanks a lot.
This is kind of a continuation of issue odlgroup/odl#1074.
A few observations:
I can import odlcuda. The import also works without _install_location = __file__, but in the following I left it in.
The order matters. I first tried
import odlcuda
import odl
which causes odl not to know 'cuda' but
import odl
import odlcuda
works!
I tried
domain_cpu = odl.uniform_discr([0], [1], [3e+8], impl='numpy')
and it failed:
Traceback (most recent call last):
  File "", line 4, in
    domain_cpu = odl.uniform_discr([0], [1], [3e+8], impl='numpy')
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/lp_discr.py", line 1311, in uniform_discr
    **kwargs)
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/lp_discr.py", line 1222, in uniform_discr_fromintv
    **kwargs)
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/lp_discr.py", line 1136, in uniform_discr_fromspace
    nodes_on_bdry)
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/partition.py", line 940, in uniform_partition_fromintv
    grid = uniform_grid_fromintv(intv_prod, shape, nodes_on_bdry=nodes_on_bdry)
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/grid.py", line 1092, in uniform_grid_fromintv
    shape = normalized_scalar_param_list(shape, intv_prod.ndim, safe_int_conv)
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/util/normalize.py", line 149, in normalized_scalar_param_list
    out_list.append(param_conv(p))
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/util/normalize.py", line 396, in safe_int_conv
    raise ValueError('cannot safely convert {} to integer'.format(number))
ValueError: cannot safely convert 300000000.0 to integer
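The failure above looks like it comes from the shape entry 3e+8 being a Python float: NumPy will not 'safe'-cast a float64 to an integer type even when the value is exactly integral, which appears to be the check behind safe_int_conv. Passing a true int avoids it:

```python
import numpy as np

# float64 -> int64 is never a 'safe' cast, regardless of the value:
assert not np.can_cast(np.float64, np.int64, casting='safe')

# So use an integer shape entry instead of the float literal 3e+8:
shape = [int(3e+8)]
assert shape == [300000000]
```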
Then
domain_gpu = odl.uniform_discr([0, 0, 0], [1, 1, 1], [4000, 300, 400], impl='cuda')
x_gpu = domain_gpu.one()
gives this error:
Traceback (most recent call last):
  File "", line 4, in
    x_gpu = domain_gpu.one()
  File "/mhome/damtp/s/me404/store/repositories/git_ODL/odl/discr/discretization.py", line 473, in one
    return self.element_type(self, self.dspace.one())
  File "/home/me404/.local/lib/python2.7/site-packages/odlcuda-0.5.0-py2.7.egg/odlcuda/cu_ntuples.py", line 912, in one
    return self.element_type(self, self._vector_impl(self.size, 1))
RuntimeError: function_attributes(): after cudaFuncGetAttributes: invalid device function
Any idea what is wrong here?
We currently only support 1d arrays. I'll look into how we could improve this to either true Nd array support, or to at least 2d, 3d support.
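One incremental route (a sketch of the idea, not a committed design): keep the CUDA storage 1d and carry the shape on the Python side, reshaping only when data is copied back to the host. Illustrated here with a NumPy array standing in for the flat device buffer:

```python
import numpy as np

class NdView(object):
    """Nd facade over a flat 1d buffer; only host copies see the shape."""

    def __init__(self, flat, shape):
        assert flat.size == int(np.prod(shape))
        self.flat = flat           # stays 1d, as the CUDA backend requires
        self.shape = tuple(shape)

    def asarray(self):
        # Reshape happens on the host copy only; device code stays 1d.
        return np.asarray(self.flat).reshape(self.shape)

v = NdView(np.arange(6.0), (2, 3))
print(v.asarray().shape)  # (2, 3)
```

Elementwise operations would work unchanged on the flat buffer; only shaped reductions and indexing would need real Nd kernels.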
NumPy supports a long list of ufuncs; we should support all of them.
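For scoping the work, the full list can be enumerated programmatically:

```python
import numpy as np

# Every public ufunc NumPy exposes; a complete backend would mirror this list.
ufuncs = sorted(name for name in dir(np)
                if isinstance(getattr(np, name), np.ufunc))
print(len(ufuncs))  # dozens of entries, from 'absolute' and 'add' to 'trunc'
```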
I was running some experiments and there seems to be a performance issue with CUDA. However, I am not sure whether odlcuda is causing it or this is just a general CUDA phenomenon. Below is the code that I ran with its output. I don't think it is very important to know what this code is doing, but if need be, I am happy to explain.
There seems to be a phenomenon that with CUDA the code scales linearly with the number of subsets even though in each run of the code approximately the same number of flops are executed. The timings are pretty constant for numpy. The question is whether this is an issue with odlcuda or this is a general CUDA phenomenon. In all of these timings, there is no copying from or to the GPU. I am aware that executing kernels generates an overhead but I would not have thought this is so dramatic. In particular as one can see that the smallest subset still contains 262 x 65000 = 17 million elements.
Do you think that this is related to the way ODL is written or do you think this is a general CUDA thing?
for impl in ['cuda', 'numpy']:
    for nsubsets in [1, 4, 16]:
        shape = [4200 // nsubsets, 65000]
        print('impl:{}, shape:{}, nsubsets:{}'.format(impl, shape, nsubsets))
        Y = odl.ProductSpace(odl.uniform_discr([0, 0], shape, shape,
                                               impl=impl), nsubsets)
        data = Y.one()
        background = Y.one()
        f = src_odl.KullbackLeibler(Y, data, background)
        x = 2 * Y.one()
        %time fx = f(x)

    shape = [4200, 65000]
    print('impl:{}, shape:{}'.format(impl, shape))
    Y = odl.uniform_discr([0, 0], shape, shape, impl=impl)
    data = Y.one()
    background = Y.one()
    f = src_odl.KullbackLeibler(Y, data, background)
    x = 2 * Y.one()
    %time fx = f(x)
impl:cuda, shape:[4200, 65000], nsubsets:1
CPU times: user 88.2 ms, sys: 24 ms, total: 112 ms
Wall time: 114 ms
impl:cuda, shape:[1050, 65000], nsubsets:4
CPU times: user 325 ms, sys: 28.7 ms, total: 354 ms
Wall time: 361 ms
impl:cuda, shape:[262, 65000], nsubsets:16
CPU times: user 1.43 s, sys: 31.7 ms, total: 1.46 s
Wall time: 1.49 s
impl:cuda, shape:[4200, 65000]
CPU times: user 93.6 ms, sys: 20.2 ms, total: 114 ms
Wall time: 116 ms
impl:numpy, shape:[4200, 65000], nsubsets:1
CPU times: user 10.7 s, sys: 352 ms, total: 11.1 s
Wall time: 6.88 s
impl:numpy, shape:[1050, 65000], nsubsets:4
CPU times: user 13.4 s, sys: 562 ms, total: 14 s
Wall time: 6.9 s
impl:numpy, shape:[262, 65000], nsubsets:16
CPU times: user 24.7 s, sys: 1 s, total: 25.7 s
Wall time: 7.04 s
impl:numpy, shape:[4200, 65000]
CPU times: user 10.8 s, sys: 380 ms, total: 11.1 s
Wall time: 6.92 s
for impl in ['cuda', 'numpy']:
    for nsubsets in [1, 4, 16]:
        shape = [4200 // nsubsets, 65000]
        print('impl:{}, shape:{}, nsubsets:{}'.format(impl, shape, nsubsets))
        Y = odl.ProductSpace(odl.uniform_discr([0, 0], shape, shape,
                                               impl=impl), nsubsets)
        data = Y.one()
        background = Y.one()
        f = src_odl.KullbackLeibler(Y, data, background)
        x = 2 * Y.one()
        out = Y.element()
        t = 0
        for i in range(len(Y)):
            f_prox = f[i].convex_conj.proximal(x[i])
            src.tic()
            f_prox(x[i], out=out[i])
            t += src.toc()
        print('time:{}, average:{}'.format(t, t / len(Y)))

    shape = [4200, 65000]
    print('impl:{}, shape:{}'.format(impl, shape))
    Y = odl.uniform_discr([0, 0], shape, shape, impl=impl)
    data = Y.one()
    background = Y.one()
    f = src_odl.KullbackLeibler(Y, data, background)
    x = 2 * Y.one()
    out = Y.element()
    f_prox = f.convex_conj.proximal(x)
    %time f_prox(x, out=out)
impl:cuda, shape:[4200, 65000], nsubsets:1
time:0.607445955276, average:0.607445955276
impl:cuda, shape:[1050, 65000], nsubsets:4
time:0.405075311661, average:0.101268827915
impl:cuda, shape:[262, 65000], nsubsets:16
time:2.36511826515, average:0.147819891572
impl:cuda, shape:[4200, 65000]
CPU times: user 146 ms, sys: 127 µs, total: 146 ms
Wall time: 150 ms
impl:numpy, shape:[4200, 65000], nsubsets:1
time:3.18624901772, average:3.18624901772
impl:numpy, shape:[1050, 65000], nsubsets:4
time:3.12681221962, average:0.781703054905
impl:numpy, shape:[262, 65000], nsubsets:16
time:3.25435972214, average:0.203397482634
impl:numpy, shape:[4200, 65000]
CPU times: user 13.8 s, sys: 1.47 s, total: 15.2 s
Wall time: 3.19 s
Hi, when I try to build I get the following error:
CMake Error at CMakeLists.txt:28 (add_dependencies):
add_dependencies called with incorrect number of arguments
CMake Warning (dev) in CMakeLists.txt:
No cmake_minimum_required command is present. A line of code such as
cmake_minimum_required(VERSION 2.8)
should be added at the top of the file. The version specified may be lower
if you wish to support older CMake versions for this project. For more
information run "cmake --help-policy CMP0000".
This warning is for project developers. Use -Wno-dev to suppress it.
We have a circular dependency wherein odlcuda imports odl, and odl.space.entry_points imports odlcuda. We need to solve this somehow.
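One possible way out (a sketch under my own naming, not odl's actual API): defer the plugin import to first use, so neither module needs the other at import time.

```python
import importlib

_BACKENDS = {}

def tensor_space_impl(name):
    """Load the backend module for `name` lazily, on first request."""
    if name not in _BACKENDS:
        # Importing here, inside the call, breaks the import-time cycle:
        # odl can finish importing before odlcuda is ever touched.
        module_for = {'cuda': 'odlcuda'}
        if name not in module_for:
            raise ValueError('unknown impl {!r}'.format(name))
        _BACKENDS[name] = importlib.import_module(module_for[name])
    return _BACKENDS[name]
```

odlcuda could apply the same trick in reverse, importing odl only inside the functions that need it.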
The standard p-norm can be implemented with the help of the CUDA sum and abs ufuncs, but this involves a copy; we need a C++ implementation if we want efficiency. For p=inf I currently don't see a way of implementing it in Python at all. We could include that case in the same C++ function, too, and while we're at it, max() and min() functions would be good to have.
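For reference, the Python-level fallback sketched in NumPy terms: the intermediate abs(x)**p array is the extra copy a fused C++ kernel would avoid, and p=inf is exactly where a max() reduction becomes necessary.

```python
import numpy as np

def pnorm(x, p):
    """p-norm via abs/sum ufuncs; p=inf needs a max() reduction instead."""
    if p == float('inf'):
        return np.abs(x).max()
    return (np.abs(x) ** p).sum() ** (1.0 / p)  # abs(x)**p is the extra copy

x = np.array([3.0, -4.0])
print(pnorm(x, 2))             # 5.0
print(pnorm(x, float('inf')))  # 4.0
```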
PyBind11 is an alternative to Boost.Python that is header-only, so it does not require a compiled Boost component (or the rest of Boost); this would simplify compilation and distribution via PyPI.
Python is always needed so this flag should be removed.
Currently they only work with float32; our users would expect them to work with any dtype.
We installed odl and ran the tests with no issues. Then we tried to install odlcuda, but we had a problem when we ran the command "CUDA_ROOT=/usr/local/cuda-9.0 CUDA_COMPUTE=37 conda build ./conda". The error is "conda_build.exceptions.DependencyNeedsBuildingError: Unsatisfiable dependencies for platform linux-64: {"odl[version='>=0.3.0']"}".
Do you know what the problem is?
The numeric module has been removed in Boost 1.65, thus odlcuda currently doesn't support that version. For building we need Boost 1.64 or lower.
Currently, AFAICS, you can only specify a single CUDA_COMPUTE value. For conda packages it would be good to build for a number of values so the package works on different GPU architectures.
I don't know very much about this topic so I don't know if this is necessary at all or if it's fine to just set the minimum version required. The only thing I know is that the packages on the conda channel are built with 52 and fail for lower versions due to "invalid device function" errors.
I noticed that the maximum function is very slow in odlcuda on the GPU. In fact, it is slower than computing the maximum on the CPU. Please see my example test case below. Any ideas why that is and how to fix it?
import odl
X = odl.rn(300 * 10**6, dtype='float32')
x = 0.5 * X.one()
y = X.one()
%time x.ufuncs.maximum(1, out=y)
%time x.ufuncs.log(out=y)
X = odl.rn(300 * 10**6, dtype='float32', impl='cuda')
x = 0.5 * X.one()
y = X.one()
%time x.ufuncs.maximum(1, out=y)
%time x.ufuncs.log(out=y)
x.ufuncs.maximum(1, out=y), numpy:
CPU times: user 346 ms, sys: 200 µs, total: 346 ms
Wall time: 347 ms
x.ufuncs.log(out=y), numpy:
CPU times: user 1.44 s, sys: 0 ns, total: 1.44 s
Wall time: 1.43 s
x.ufuncs.maximum(1, out=y), cuda:
CPU times: user 838 ms, sys: 341 ms, total: 1.18 s
Wall time: 1.18 s
x.ufuncs.log(out=y), cuda:
CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 91.1 µs
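One thing to rule out before comparing: CUDA kernel launches are asynchronous, so a 91.1 µs wall time for the CUDA log almost certainly measures only the launch, not the computation. A fairer timing forces synchronization, e.g. by copying a value back to the host. The helper below is my sketch, assuming the odlcuda element exposes ufuncs and asarray() as in the example above:

```python
import time

def timed_ufunc(x, y):
    """Time x.ufuncs.log(out=y), forcing the device queue to drain."""
    t0 = time.time()
    x.ufuncs.log(out=y)    # may return as soon as the kernel is enqueued
    float(y.asarray()[0])  # device-to-host copy blocks until work finishes
    return time.time() - t0
```

With synchronization in place, if maximum is still slower than the CPU version, the kernel itself becomes the prime suspect.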
Eigen should not be needed as a dependency for the CUDA part.
I have been trying to install odlcuda without admin rights. I have changed the "CMAKE_INSTALL_PREFIX" folder to a folder where I do have write access. This approach worked well to install other software packages. However, odlcuda seems to ignore this path and tries to install it into the default path
/usr/local/lib/python2.7/dist-packages/
instead.
I tried two options for changing the installation path, both of which should work according to some forums, but neither works here.
Any ideas what is going on?
I'll see if I can get it back alive; that said, the cupy approach seems far superior.