Comments (9)
Hi Hicham,
Once again, thanks for the detailed report!
This time, it looks like a problem with CMake and your library path: the standard FindCUDA script could not find the libcudart_static library, as discussed e.g. in this thread on the NVIDIA devtalk forum. Could you please report the result of the
locate libcudart_static
command? If locate is not installed on your machine,
sudo apt-get install mlocate
sudo updatedb
should do the trick. For reference, the output on my laptop:
jean@jean-XPS-15-9550:~$ locate libcudart_static
/usr/lib/x86_64-linux-gnu/libcudart_static.a
And on a new Google Colab session:
!sudo apt-get install mlocate
!sudo updatedb
!locate libcudart_static
/usr/local/cuda-10.0/targets/x86_64-linux/lib/libcudart_static.a
If locate cannot find the libcudart_static file in your CUDA install, updating the linker path with ldconfig may allow you to fix this problem.
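If installing mlocate is not an option, a quick Python sketch can scan the usual prefixes by hand; the glob patterns below are assumptions based on the two outputs above, so extend them to match your install:

```python
import glob

def find_libcudart_static():
    """Return every libcudart_static.a found under common install prefixes."""
    patterns = [
        # Debian/Ubuntu system package location (as on my laptop above):
        "/usr/lib/x86_64-linux-gnu/libcudart_static*",
        # Typical toolkit installs (as on the Colab session above):
        "/usr/local/cuda*/lib64/libcudart_static*",
        "/usr/local/cuda*/targets/*/lib/libcudart_static*",
    ]
    hits = []
    for pattern in patterns:
        hits.extend(glob.glob(pattern))
    return sorted(set(hits))

if __name__ == "__main__":
    # An empty list suggests FindCUDA will fail for the same reason.
    print(find_libcudart_static())
```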
N.B. for @bcharlier : FindCUDA has recently been deprecated by Kitware/CMake. Long-term, we may have to switch to their new "first-class" support for CUDA.
from keops.
Hi Hicham,
In your issue #2 (comment), cmake seems to have found the NVIDIA stuff, as this line tells us:
-- Autodetected CUDA architecture(s): 6.0 6.0 6.0 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.7
This should correspond to the compute capabilities of the 13 GPUs (?!) available on your system. Is that correct? If not, please provide the output of nvidia-smi.
In any case, the FindCuda module of cmake doesn't do much more than look in /usr/local/cuda... So, the best practice is to ask the admin to create a /usr/local/cuda symbolic link pointing to the latest CUDA toolkit installed on your system. As documented here, https://cmake.org/cmake/help/latest/module/FindCUDA.html :
To use a different installed version of the toolkit set the environment variable CUDA_BIN_PATH before running cmake (e.g. CUDA_BIN_PATH=/usr/local/cuda1.0 instead of the default /usr/local/cuda)
That is not very handy when using keops from python... even if it could be done in your conda env.
Hope this helps.
b.
PS: @jeanfeydy, in fact, we use a mixture of the "first-class" support of the CUDA language and the old FindCuda module. As a matter of fact, the "first-class" support of CUDA by cmake was still young as of 2018/19. The good old FindCuda module has some features (mainly for detection purposes, like the internal cmake variables CUDA_NVCC_EXECUTABLE, CUDA_FOUND, etc.) that are not available when one simply uses enable_language(CUDA). As cmake is moving quickly, this could change in the near future...
Hi,
I am using the same server as Hicham hence encountering the same issues. Could you elaborate on how to set CUDA_BIN_PATH locally in a conda env?
Many thanks!
Gwendoline
For conda, you should find a how-to in the docs here:
https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#macos-and-linux
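Alternatively, since pykeops only invokes cmake when it compiles a formula, setting the variable from Python before that first compilation may also do the trick. A minimal sketch (the toolkit path below is a placeholder, adjust it to your install):

```python
import os

# Point CMake's FindCUDA at a specific toolkit *before* pykeops triggers
# any compilation. "/usr/local/cuda-10.2" is a placeholder path.
os.environ["CUDA_BIN_PATH"] = "/usr/local/cuda-10.2"

# Only import (and hence compile) afterwards:
# import pykeops
# pykeops.test_torch_bindings()
```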
b.
Hi Benjamin,
Just a quick note on these two issues: according to a mail that Hicham sent me last week, #2 is on "Marco's machine" hosted by the MokaPlan Inria team at Gare de Lyon, while this issue #3 is on the machine of the Parietal Inria team at Neurospin.
As far as I can tell, the problems are not 100% the same for both configurations, so it may be best to keep both tracks separate, waiting for Hicham to come back to these issues ;-)
This is most likely due to a system configuration problem. I'm closing the issue for now; do not hesitate to re-open it if needed.
Hi @bcharlier ,
I'm having similar issues where cuda exists and torch finds it but pykeops doesn't.
I ran:
import pykeops
pykeops.verbose = True
pykeops.build_type = 'Debug'
pykeops.clean_pykeops()
pykeops.test_torch_bindings()
And get:
Environment variable CUDA_ROOT is set to:
/usr/local/cuda
For compatibility, CMake is ignoring the variable.
Call Stack (most recent call first):
CMakeLists.txt:15 (include)
This warning is for project developers. Use -Wno-dev to suppress it.
-- The CXX compiler identification is GNU 9.2.1
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Looking for C++ include pthread.h
-- Looking for C++ include pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- No GPU detected. USE_CUDA set to FALSE.
.....
RuntimeError: [KeOps] This KeOps shared object has been compiled without cuda support:
- to perform computations on CPU, simply set tagHostDevice to 0
- to perform computations on GPU, please recompile the formula with a working version of cuda.
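For what it's worth, when pykeops prints "No GPU detected" while torch works, it can help to first check whether the CUDA runtime is visible to the dynamic linker at all. A minimal stdlib-only sketch (not the actual KeOps detection code):

```python
import ctypes.util

def cudart_visible():
    """Return the libcudart name the linker resolves, or None if it cannot."""
    return ctypes.util.find_library("cudart")

name = cudart_visible()
if name is None:
    # The linker cannot see the CUDA runtime: run ldconfig or extend
    # LD_LIBRARY_PATH, then clean and recompile the pykeops formulas.
    print("libcudart is not on the linker path.")
else:
    print("CUDA runtime found as", name)
```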
Cheers
Hi,
I tried to dig into the problem on my own, but I don't know anything about cmake.
When you say :
In any cases, the FindCuda module of cmake don't do much than looking in /usr/local/cuda ... So, the best practice is to contact the admin to create a symbolic /usr/local/cuda link to the last cuda lib installed on your system. As documented here, https://cmake.org/cmake/help/latest/module/FindCUDA.html : To use a different installed version of the toolkit set the environment variable CUDA_BIN_PATH before running cmake (e.g. CUDA_BIN_PATH=/usr/local/cuda1.0 instead of the default /usr/local/cuda)
You were thinking about this line:
Line 12 in ae0b921
Right?
But I tested an "empty" cmake that tries to find cuda:
https://gist.github.com/daidedou/dc0e43070195d3b4f8899eed3fc3062f
It appears that cmake says that cuda is found:
-- Found CUDA: /usr/local/cuda (found version "10.2")
-- Configuring done
-- Generating done
-- Build files have been written to: /home/machin/Documents/these/build
So the problem might well come from your handmade script that counts the GPUs, rather than from finding CUDA itself. If you want to share any information about what you wanted to do (it seems simple, but still), I could try to continue digging on my own.
It might also be interesting to reopen the issue.
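For context, the GPU-counting step essentially boils down to a call to cudaGetDeviceCount from the CUDA runtime. The sketch below reproduces that call through ctypes, as a rough stand-in for the detection script (the library lookup is an assumption, and None just means libcudart could not be loaded):

```python
import ctypes
import ctypes.util

def cuda_device_count():
    """Return the number of CUDA devices, or None if libcudart is unusable."""
    libname = ctypes.util.find_library("cudart")
    if libname is None:
        return None
    try:
        cudart = ctypes.CDLL(libname)
    except OSError:
        return None
    count = ctypes.c_int(0)
    # cudaGetDeviceCount returns cudaSuccess (0) on success.
    if cudart.cudaGetDeviceCount(ctypes.byref(count)) != 0:
        return None
    return count.value

print(cuda_device_count())
```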
Ok, so you were right: it was a configuration problem. I'm on Fedora, and there are a lot of things to do in order to get CUDA working: there is a specific version of g++ to install, and I need to pass -ccbin=cuda-g++ each time.
My problem was that this part :
Lines 18 to 45 in ae0b921
ends up calling nvcc without that flag. I also had to set CMAKE_CUDA_HOST_COMPILER to cuda-g++.
But still, this is too tied to my particular configuration.
I guess that if you want to check whether you are able to find cuda or not, you should do the following (described in detail for those unfamiliar with cmake, like me):
- create an empty folder, and put a file named CMakeLists.txt inside it, filled with this code: https://gist.github.com/daidedou/dc0e43070195d3b4f8899eed3fc3062f
- create a folder named "build"; from inside it, launch cmake .. in the command line
- if it fails to find cuda, it means that FindCuda is failing
- if not, replace CMakeLists.txt with this one: https://gist.github.com/daidedou/700cd87d6e24fbff1e2190fad1224c08
- relaunch cmake .. inside build; it will fail and create a file named detect_cuda_props.cu
- then run nvcc --run detect_cuda_props.cu, and you will get a proper error that you can dig into (if pytorch does in fact detect your GPUs!)
Hope it will help someone!
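To script the last step, a small stdlib-only helper can first check that nvcc is on the PATH before attempting the --run compilation (cuda-g++ and the -ccbin flag below are the Fedora-specific assumptions from this comment):

```python
import shutil
import subprocess

def nvcc_version():
    """Return the output of `nvcc --version`, or None if nvcc is not on PATH."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    return subprocess.run([nvcc, "--version"],
                          capture_output=True, text=True).stdout

out = nvcc_version()
if out is None:
    print("nvcc not found: fix your PATH before trying detect_cuda_props.cu")
else:
    print(out)
    # On Fedora, the actual compile step then needs the host-compiler flag:
    # subprocess.run(["nvcc", "-ccbin=cuda-g++", "--run",
    #                 "detect_cuda_props.cu"])
```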