I tried to run your tutorial from bootstrap/mnist/train.ipynb, but it crashes because

Runtime errors related to ninja_build about lava-dl HOT 7 OPEN

diana273 commented on September 26, 2024

Runtime errors related to ninja_build

from lava-dl.

Comments (7)

mgkwill commented on September 26, 2024

Hi @diana273 thanks for this bug issue!

It looks like @bamsumit is looking into it.

daevem commented on September 26, 2024

Hi, I was just wondering if this has been resolved.
I have been using lava-dl 0.2.0 and noticed that this issue arises especially when I try to use two different types of neuron models. I found that all the models were being JIT-compiled and loaded under the same name (dynamics), which caused conflicts. I changed the names (e.g., cuba_dynamics or alif_dynamics) so that they differ for every neuron and have stopped receiving this kind of error, but I'm unsure whether this is OK for other use cases.

EDIT: Manually setting the path to the correct version of Visual Studio was also part of the solution to the problem for me.
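
The renaming workaround described above can be sketched as follows. This is only an illustration of the idea, not lava-dl's actual code: the helper names extension_name and load_dynamics are hypothetical, and the sketch assumes the kernels are compiled through torch.utils.cpp_extension.load, whose name argument identifies the built module.

```python
from typing import Sequence


def extension_name(neuron_model: str, base: str = "dynamics") -> str:
    """Build a per-model extension name such as 'cuba_dynamics' (hypothetical helper)."""
    # Keep only characters that are safe in a module name.
    safe = "".join(c if c.isalnum() else "_" for c in neuron_model.lower())
    return f"{safe}_{base}"


def load_dynamics(neuron_model: str, sources: Sequence[str]):
    """JIT-compile `sources` under a name unique to `neuron_model` (hypothetical helper)."""
    # Imported lazily so the naming helper also works without PyTorch installed.
    from torch.utils.cpp_extension import load

    return load(name=extension_name(neuron_model), sources=list(sources))


print(extension_name("cuba"))
print(extension_name("alif"))
```

With distinct names, each neuron model gets its own build directory and shared object, so two models should no longer overwrite each other's compiled module.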

bamsumit commented on September 26, 2024

Hi @daevem, the issue for Windows has not been resolved. I don't see the issue with the same dynamics name in a Linux environment.

import torch
from lava.lib.dl import slayer
device = torch.device('cuda')
cuba = slayer.neuron.cuba.Neuron(threshold=10, current_decay=0.5, voltage_decay=0.5).to(device)
alif = slayer.neuron.alif.Neuron(threshold=10, current_decay=0.5, voltage_decay=0.5, threshold_step=1, threshold_decay=0, refractory_decay=0).to(device)
x = torch.rand([1, 10, 100]).to(device)
alif(cuba(x))

It is fine to change the name of dynamics to a unique name if that works in Windows. If you have made progress toward fixing the Windows issue, the steps would be useful for other Windows users.

ParsaOmidi commented on September 26, 2024

The same error keeps appearing on my Linux machines (I tested it on two different machines). Is there a solution to this problem? The code runs fine on the CPU, but slowly.
My environment is:

Ubuntu 20.04, python 3.9, PyTorch 2, cudnn 11.7

RuntimeError: Error building extension 'dynamics': [1/1] c++ leaky_integrator.cuda.o -shared -L/home/rescue/anaconda3/envs/lavaEnv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/rescue/anaconda3/lib64 -lcudart -o dynamics.so
FAILED: dynamics.so
c++ leaky_integrator.cuda.o -shared -L/home/rescue/anaconda3/envs/lavaEnv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/rescue/anaconda3/lib64 -lcudart -o dynamics.so
/usr/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
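
A linker failure like the one above can be narrowed down by checking which -l libraries are actually present in the -L directories of the failing command. The sketch below is a rough diagnostic under stated assumptions: missing_libraries is a hypothetical helper, and it only looks in the -L directories given on the command line, whereas the real linker also searches system default directories.

```python
import shlex
from pathlib import Path
from typing import List


def missing_libraries(link_command: str) -> List[str]:
    """Return the -l names that no -L directory in `link_command` provides."""
    tokens = shlex.split(link_command)
    search_dirs = [Path(t[2:]) for t in tokens if t.startswith("-L") and len(t) > 2]
    libraries = [t[2:] for t in tokens if t.startswith("-l") and len(t) > 2]

    missing = []
    for lib in libraries:
        # The linker resolves -lfoo to libfoo.so or libfoo.a; a versioned file
        # such as libfoo.so.11.0 on its own does not satisfy -lfoo.
        found = any(
            (directory / f"lib{lib}{ext}").exists()
            for directory in search_dirs
            for ext in (".so", ".a")
        )
        if not found:
            missing.append(lib)
    return missing
```

Passing the full c++ command from the log to missing_libraries lists the libraries it could not locate. If cudart is among them, the -L directory in the command does not contain libcudart.so, which points to a missing or misplaced CUDA toolkit.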

ParsaOmidi commented on September 26, 2024

On a Windows machine, I received a similar error. As I tracked down the error, I found this in the file cpp_extension.py:

cl_paths = subprocess.check_output(['where', 'cl']).decode(*SUBPROCESS_DECODE_ARGS).split('\r\n')

This indicates that cl.exe cannot be found on your computer. To resolve this issue, I installed a new version of MSVC C++ and then found the directory in MSVC that contained cl.exe. I then added the directory to my system PATH via "Environment Variables". After rebooting the computer, everything worked perfectly. Hope this helps someone.
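
Before launching a build, the same lookup can be reproduced from Python. This is a minimal sketch: find_compiler is a hypothetical helper name, and it relies on the standard-library shutil.which, which searches PATH in the same spirit as the where cl call above.

```python
import shutil
from typing import Optional


def find_compiler(executable: str = "cl") -> Optional[str]:
    """Return the full path of `executable` if it is on PATH, else None."""
    return shutil.which(executable)


path = find_compiler()
if path is None:
    print("cl.exe is not on PATH; add the MSVC directory that contains it.")
else:
    print(f"cl.exe found at {path}")
```

If the function returns None in the same environment that runs the notebook, the JIT build is likely to fail at the where cl step.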

bamsumit commented on September 26, 2024

Hi @ParsaOmidi, the issue has not been resolved for Windows. It's a PyTorch issue with compiling extensions on Windows. For your Linux system, it looks like a CUDA library needed for the build (cudart in your log) is missing. Most likely the CUDA compiler, nvcc, is not installed or its path is not configured correctly.

Try:

nvcc --version

to check if it is installed. If it is not installed, please install it first. Make sure the CUDA version of nvcc matches both your nvidia-runtime CUDA version and your PyTorch CUDA version.

If it still does not work after installing nvcc, you might need to export some environment variables like this:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
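
The version comparison suggested above can be scripted. The sketch below is one possible way to do it, with hypothetical helper names: it reads the release number from nvcc --version and the CUDA version PyTorch was built against from torch.version.cuda, returning None when nvcc or PyTorch is unavailable.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_nvcc_release(version_output: str) -> Optional[str]:
    """Extract the 'release X.Y' number from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", version_output)
    return match.group(1) if match else None


def installed_nvcc_release() -> Optional[str]:
    """Return the release of the nvcc found on PATH, or None if there is none."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(
        ["nvcc", "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


def pytorch_cuda_release() -> Optional[str]:
    """Return the CUDA version PyTorch was built with, or None if unavailable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.version.cuda


if __name__ == "__main__":
    print("nvcc:", installed_nvcc_release())
    print("PyTorch built for CUDA:", pytorch_cuda_release())
```

If the two values differ, or nvcc is reported as None, that mismatch is worth fixing before looking at the environment variables.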

R-Gaurav commented on September 26, 2024

I was facing a similar issue on Ubuntu 22.04, and as per @bamsumit's comment above, I installed the cuda toolkit to resolve my issue and it helped. However, there are some nuances to installing nvcc for different GPUs which I would like to share here; hopefully useful to someone else too!

TLDR:

Check the Compute Capability of your NVIDIA GPU and install a version of the NVIDIA CUDA toolkit (i.e., nvcc) that supports your GPU to get rid of the compilation issues. You can find the supported mappings here.

More details with my trials/failures/successes below.

I have three machines: one with an RTX 2080, another with an RTX A2000, and the last with an RTX 4060 Ti. All three machines run Ubuntu 22.04, and nvidia-smi on them reports the following output.

RTX 2080 and RTX 4060 Ti:

NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2

RTX A2000:

NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2

I am mentioning the installation case one by one below.

RTX A2000 -- Compute Capability: 8.6

The installation through the lava_dl-0.5.0 binary was all smooth until I faced a ninja build issue (similar to the one reported here) when I ran my lava network. I checked nvcc --version and it produced:

Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit

Upon installing nvidia-cuda-toolkit via the above command my ninja build issue was resolved; and nvcc --version produced:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

Note that the CUDA version of nvcc above is 11.5, which is supported on Compute Capability 8.6. However, I could not find the cuda symlink or directory under /usr/local/, and I don't think I set the env vars either (unlike what @bamsumit suggested above).

RTX 4060 Ti -- Compute Capability: 8.9

Installation through the lava_dl-0.5.0 binary was a breeze until I faced a ninja build issue again. nvcc --version produced the same error as above, prompting me to install nvidia-cuda-toolkit. I did so, but to my surprise it did not resolve the issue, even though nvcc --version now ran successfully and reported the same CUDA version, 11.5.

After some extensive searching I came to understand that the CUDA version of nvidia-cuda-toolkit (that is, of nvcc) should be supported by the GPU, as per the mapping linked above. I therefore installed the latest 12.2 CUDA toolkit from here, which is supported for Compute Capability 8.9. Note that I installed only the toolkit and not the driver; the installer prompts you with options to do so.

This time I did find the cuda symlink pointing to cuda-12.2 in /usr/local/, as used above by @bamsumit. After setting the suggested env vars, PATH and LD_LIBRARY_PATH, I got past the build issue and my network compiled successfully. nvcc --version now produces:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0

Note that cuda version above is 12.2, which is supported on 8.9.
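
The check described for the A2000 and the 4060 Ti can be expressed as a small lookup. This is a sketch under assumptions: toolkit_supports is a hypothetical helper, and the minimum toolkit releases in the table (11.1 for Compute Capability 8.6, 11.8 for 8.9) are filled in from NVIDIA's published support information rather than from this thread, so verify them against the official mapping before relying on them.

```python
from typing import Dict, Optional, Tuple

Version = Tuple[int, int]

# ASSUMPTION: minimum CUDA toolkit release per compute capability, covering
# only the two cards discussed above. Verify against NVIDIA's official table.
MIN_CUDA_FOR_CAPABILITY: Dict[Version, Version] = {
    (8, 6): (11, 1),  # e.g. RTX A2000
    (8, 9): (11, 8),  # e.g. RTX 4060 Ti
}


def toolkit_supports(capability: Version, toolkit: Version) -> Optional[bool]:
    """Return True/False if the table knows `capability`, else None."""
    minimum = MIN_CUDA_FOR_CAPABILITY.get(capability)
    if minimum is None:
        return None
    # Tuples compare element by element, so (11, 5) < (11, 8) < (12, 2).
    return toolkit >= minimum


print(toolkit_supports((8, 6), (11, 5)))
print(toolkit_supports((8, 9), (11, 5)))
print(toolkit_supports((8, 9), (12, 2)))
```

On a machine with PyTorch and a working driver, torch.cuda.get_device_capability() returns the (major, minor) pair to pass as capability.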

RTX 2080 -- Compute Capability: 7.5

Here too, sudo apt install nvidia-cuda-toolkit installed the 11.5 CUDA version of nvcc, and it didn't work. As I read the mapping, the max supported CUDA toolkit version for Compute Capability 7.5 is 10.2, and although I installed it following the same process as for the RTX 4060 Ti, the compilation of my lava network still failed with the following error:

.
.
.
nvcc fatal   : Value 'c++17' is not defined for option 'std'
ninja: build stopped: subcommand failed.

which I suppose has got something to do with setting up the CMake properly as described here, although I haven't investigated it further yet.
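
The failures reported in this thread have recognizable signatures, so a build log can be triaged automatically. The sketch below is illustrative only: diagnose and the HINTS table are hypothetical, the hint for the cudart error paraphrases the advice given earlier in the thread, and the hint for the c++17 error rests on the assumption that nvcc accepts -std=c++17 only from CUDA 11.0 onward.

```python
from typing import List, Tuple

# (substring to look for in the build log, hint to show). The second hint is
# an ASSUMPTION based on nvcc gaining C++17 support in CUDA 11.0.
HINTS: List[Tuple[str, str]] = [
    (
        "cannot find -lcudart",
        "CUDA runtime library not found: install the CUDA toolkit or fix "
        "PATH and LD_LIBRARY_PATH.",
    ),
    (
        "Value 'c++17' is not defined for option 'std'",
        "This nvcc does not accept C++17: a newer CUDA toolkit is likely needed.",
    ),
]


def diagnose(build_log: str) -> List[str]:
    """Return the hints whose signature appears in `build_log`."""
    return [hint for signature, hint in HINTS if signature in build_log]


log = "nvcc fatal   : Value 'c++17' is not defined for option 'std'"
for hint in diagnose(log):
    print(hint)
```

An empty result only means the log matches none of the known signatures, not that the build environment is healthy.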

NOTE: If you have mistakenly installed nvidia-cuda-toolkit via apt install, and need to install the cuda toolkit directly from NVIDIA's website, do purge your current nvidia-cuda-toolkit install before proceeding ahead; I followed the steps here -- I executed only sudo apt-get purge nvidia-cuda-toolkit (I suppose --auto-remove purges the nvidia drivers too which I did not want).

Hope this detailed info helps you!
