Comments (7)
Hi @diana273, thanks for filing this bug! It looks like @bamsumit is looking into it.
from lava-dl.
Hi, I was just wondering if this has been resolved.
I have been using lava-dl 0.2.0 and noticed that this issue arose especially when I was trying to use two different types of neuron models. I then found that all the models were being JIT-compiled and loaded under the same name (dynamics), which caused conflicts. I changed the names (e.g., cuba_dynamics or alif_dynamics) so that they differ for every neuron type, and I have stopped receiving this kind of error, but I'm unsure whether this is safe for other use cases.
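As a toy illustration of why a shared name conflicts (this is an analogy, not lava-dl's actual loader): a JIT loader that keys compiled modules by name hands back whatever was compiled first under that name.

```python
# Toy module cache keyed by name. A real JIT loader (e.g.
# torch.utils.cpp_extension.load) also keys compiled artifacts by `name`,
# so reusing a name can reuse a stale artifact built for a different model.
_cache = {}

def load(name, source):
    # Only compile/store on first use of a name; later loads reuse it.
    if name not in _cache:
        _cache[name] = source
    return _cache[name]

print(load('dynamics', 'cuba kernel'))       # cuba kernel
print(load('dynamics', 'alif kernel'))       # cuba kernel  <- stale, conflict
print(load('alif_dynamics', 'alif kernel'))  # alif kernel  <- unique name works
```

Giving each neuron type its own extension name sidesteps the collision, which matches the workaround described above.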
EDIT: Manually setting the path to the correct version of Visual Studio was also part of the solution for me.
Hi @daevem, the issue on Windows has not been resolved. I don't see the issue with the same dynamics name in a Linux environment:
import torch
from lava.lib.dl import slayer

device = torch.device('cuda')
cuba = slayer.neuron.cuba.Neuron(threshold=10, current_decay=0.5, voltage_decay=0.5).to(device)
alif = slayer.neuron.alif.Neuron(threshold=10, current_decay=0.5, voltage_decay=0.5,
                                 threshold_step=1, threshold_decay=0, refractory_decay=0).to(device)
x = torch.rand([1, 10, 100]).to(device)
alif(cuba(x))  # two different neuron types chained without a name conflict
It is fine to change the dynamics names to unique ones if that works on Windows. If you have made progress toward fixing the Windows issue, the steps would be useful for other Windows users.
The same error keeps appearing on my Linux machines (I tested it on two different machines). Are there any solutions to this problem? The code runs fine on the CPU, but slowly.
My environment is:
Ubuntu 20.04, python 3.9, PyTorch 2, cudnn 11.7
RuntimeError: Error building extension 'dynamics': [1/1] c++ leaky_integrator.cuda.o -shared -L/home/rescue/anaconda3/envs/lavaEnv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/rescue/anaconda3/lib64 -lcudart -o dynamics.so
FAILED: dynamics.so
c++ leaky_integrator.cuda.o -shared -L/home/rescue/anaconda3/envs/lavaEnv/lib/python3.9/site-packages/torch/lib -lc10 -lc10_cuda -ltorch_cpu -ltorch_cuda -ltorch -ltorch_python -L/home/rescue/anaconda3/lib64 -lcudart -o dynamics.so
/usr/bin/ld: cannot find -lcudart
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
On a Windows machine, I received a similar error. As I tracked down the error, I found this in the file cpp_extension.py:
cl_paths=subprocess.check_output(['where', 'cl']).decode(*SUBPROCESS_DECODE_ARGS).split('\r\n').
This indicates that cl.exe cannot be found on your computer. To resolve this issue, I installed a new version of MSVC C++ and then found the directory in MSVC that contained cl.exe. I then added the directory to my system PATH via "Environment Variables". After rebooting the computer, everything worked perfectly. Hope this helps someone.
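A quick, cross-platform way to check whether the host compiler is reachable before trying to build extensions (the helper function is mine; it just mirrors which compiler name PyTorch looks up per platform):

```python
import platform
import shutil

def host_compiler(system: str) -> str:
    # torch.utils.cpp_extension looks up 'cl' on Windows (via `where cl`)
    # and 'c++' elsewhere; this mirrors that choice.
    return 'cl' if system == 'Windows' else 'c++'

# If this prints None, extension builds will fail before any CUDA step,
# and fixing PATH (as described above) is the first thing to try.
print(shutil.which(host_compiler(platform.system())))
```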
Hi @ParsaOmidi, the issue has not been resolved for Windows; it's a PyTorch issue with compiling extensions on Windows. For your Linux system, it looks like the CUDA compile library is missing, most likely because you do not have the CUDA compiler (nvcc) installed or the path is not configured correctly.
Try:
nvcc --version
to check whether it is installed. If it is not installed, please install it first. Make sure the CUDA version of nvcc matches your NVIDIA runtime CUDA version and your PyTorch CUDA version.
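For the version match, only the major.minor part matters in practice; a small sketch of the comparison (helper names are mine, and in a real check the strings would come from `nvcc --version` and `torch.version.cuda`):

```python
def cuda_mm(version: str) -> tuple:
    # Reduce a CUDA version string like '11.7.99' to its (major, minor) part.
    major, minor = version.split('.')[:2]
    return int(major), int(minor)

def versions_compatible(nvcc_version: str, torch_cuda: str) -> bool:
    # Extension builds generally need nvcc and the PyTorch CUDA build to
    # agree on major.minor; the patch level is irrelevant.
    return cuda_mm(nvcc_version) == cuda_mm(torch_cuda)

print(versions_compatible('11.7.99', '11.7'))   # True
print(versions_compatible('11.5.119', '11.7'))  # False -> expect build trouble
```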
Only if it still does not work after installing nvcc: you might need to export some environment variables like this:
export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
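The `${VAR:+:${VAR}}` idiom in those exports appends the old value only when it is non-empty, which avoids leaving a stray trailing `:` (an empty PATH entry means "current directory"). A standalone illustration with a throwaway variable:

```shell
# Start from an unset variable: the :+ expansion yields nothing,
# so no trailing colon is appended.
unset DEMO
DEMO=/usr/local/cuda/bin${DEMO:+:${DEMO}}
echo "$DEMO"   # /usr/local/cuda/bin

# Now DEMO is non-empty: the :+ expansion yields ":<old value>".
DEMO=/extra/bin${DEMO:+:${DEMO}}
echo "$DEMO"   # /extra/bin:/usr/local/cuda/bin
```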
I was facing a similar issue on Ubuntu 22.04 and, as per @bamsumit's comment above, I installed the CUDA toolkit, which resolved my issue. However, there are some nuances to installing nvcc for different GPUs which I would like to share here; hopefully useful to someone else too!
TLDR:
Check the Compute Capability of your NVIDIA GPU and install the correct version of the CUDA toolkit, i.e., an nvcc supported for your GPU, to get rid of the compilation issues. You can find the supported mappings here.
More details with my trials/failures/successes below.
I have three machines: one with an RTX 2080, another with an RTX A2000, and the last with an RTX 4060 Ti. All three machines run Ubuntu 22.04, and nvidia-smi on them reports the following output.
- RTX 2080 and RTX 4060 Ti:
NVIDIA-SMI 535.161.07 Driver Version: 535.161.07 CUDA Version: 12.2
- RTX A2000:
NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2
I describe the installation cases one by one below.
- RTX A2000 -- Compute Capability: 8.6
The installation through the lava_dl-0.5.0 binary was smooth until I hit a ninja build issue (similar to the one reported here) when I ran my lava network. I checked nvcc --version and it produced:
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
Upon installing nvidia-cuda-toolkit via the above command, my ninja build issue was resolved, and nvcc --version produced:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Note that the CUDA version of nvcc above is 11.5, which is supported on 8.6. However, I could not find the cuda symlink or directory under /usr/local/, and I did not set the env vars either (unlike what @bamsumit suggests above).
- RTX 4060 Ti -- Compute Capability: 8.9
Installation through the lava_dl-0.5.0 binary was a breeze until I faced a ninja build issue again; nvcc --version produced the same error output as above, suggesting to install nvidia-cuda-toolkit. I did so, but to my surprise it did not resolve the issue, even though nvcc --version worked after the apt install and produced the same output as above (CUDA version 11.5). After some extensive searching, I came to understand that nvidia-cuda-toolkit's (or nvcc's) CUDA version should be supported by the GPU, as per the mapping linked above. Therefore I installed the latest 12.2 CUDA toolkit from here, which supports 8.9 Compute Capability. Note that I installed only the toolkit and not the driver; you will get command-prompt options to do so while installing. This time I did find the cuda symlink pointing to cuda-12.2 under /usr/local/, as used above by @bamsumit. Upon setting the suggested env vars PATH and LD_LIBRARY_PATH, I was able to get past the build issue and my network compiled successfully. Now I get the following output from nvcc --version:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Jun_13_19:16:58_PDT_2023
Cuda compilation tools, release 12.2, V12.2.91
Build cuda_12.2.r12.2/compiler.32965470_0
Note that the CUDA version above is 12.2, which is supported on 8.9.
- RTX 2080 -- Compute Capability: 7.5
Here too, sudo apt install nvidia-cuda-toolkit installed the 11.5 CUDA version of nvcc, and it didn't work. For 7.5 Compute Capability, the max supported CUDA toolkit version is 10.2, and although I installed it following the same process as for the RTX 4060 Ti above, the compilation of my lava network still failed with the following error:
.
.
.
nvcc fatal : Value 'c++17' is not defined for option 'std'
ninja: build stopped: subcommand failed.
which I suppose has something to do with setting up CMake properly, as described here, although I haven't investigated further yet.
NOTE: If you have mistakenly installed nvidia-cuda-toolkit via apt install and need to install the CUDA toolkit directly from NVIDIA's website, do purge your current nvidia-cuda-toolkit install before proceeding; I followed the steps here and executed only sudo apt-get purge nvidia-cuda-toolkit (I suppose --auto-remove purges the NVIDIA drivers too, which I did not want).
Hope this detailed info helps you!
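As a small addendum: the nvcc --version outputs quoted above can be compared programmatically by parsing the release line (the helper name and regex are mine, not part of any toolkit):

```python
import re

# Sample output, copied from the RTX A2000 case above.
NVCC_OUTPUT = """\
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0"""

def nvcc_release(output: str) -> str:
    # nvcc prints "release <major>.<minor>" on the compilation-tools line.
    return re.search(r'release (\d+\.\d+)', output).group(1)

print(nvcc_release(NVCC_OUTPUT))  # 11.5
```

This makes it easy to script the "does nvcc match my GPU's supported toolkit version" check described in the cases above.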