Comments (11)
Hi @VishalPathak-GTRI, I wanted to check whether you are able to run all the unit tests.
It looks like you don't have a CUDA-enabled machine. If you don't have a GPU-capable machine, set the code to use the CPU instead:
device = torch.device('cpu')
in block 4 of the notebook.
from lava-dl.
So I am actually on a SLURM-managed GPU cluster running CentOS. I have confirmed that the torch installation can use the GPU with torch.cuda.is_available() and torch.cuda.get_device_name(0). Despite this, I still have issues with the notebook. There is another post with similar issues, which I also replied to.
Try this piece of code in a Python interpreter in your virtual environment and see if you get the error:
import torch
torch.round(torch.arange(10) / 2)
device = torch.device('cuda')
torch.round(torch.arange(10).to(device) / 2)
Alright, so I'm still having some issues but I think I can provide more information in order to get some advice:
That image is of the unittests for lava passing successfully.
That image is of the unittests for lava-dl failing.
This is the output of the commands that you had requested I perform.
What would you suggest?
Okay, you have a problem using the GPU from torch. It looks like there is a CUDA version mismatch between your torch installation and your NVIDIA drivers.
Let's see the output of the following commands:
python -c "import torch; print(torch.__version__)"
nvidia-smi | grep CUDA
nvcc --version
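The three commands above can also be gathered in one pass. The following is a minimal sketch, not part of lava-dl: the helper names (run, driver_cuda_version, nvcc_cuda_version, torch_versions) are hypothetical, and the regular expressions assume the usual "CUDA Version: X.Y" banner printed by nvidia-smi and the "release X.Y" line printed by nvcc --version.

```python
import re
import shutil
import subprocess


def run(cmd):
    """Return the stdout of cmd, or None if the tool is missing or fails."""
    if shutil.which(cmd[0]) is None:
        return None
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return None
    return result.stdout


def driver_cuda_version(smi_text):
    """Extract X.Y from the 'CUDA Version: X.Y' banner of nvidia-smi."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_text or "")
    return match.group(1) if match else None


def nvcc_cuda_version(nvcc_text):
    """Extract X.Y from the 'release X.Y' line of nvcc --version."""
    match = re.search(r"release\s+(\d+\.\d+)", nvcc_text or "")
    return match.group(1) if match else None


def torch_versions():
    """Return (torch version, CUDA version torch was built for), or (None, None)."""
    try:
        import torch
    except (ImportError, OSError):
        return None, None
    # torch.version.cuda is None for CPU-only builds.
    return torch.__version__, torch.version.cuda


torch_version, torch_cuda = torch_versions()
print("torch:", torch_version, "built for CUDA", torch_cuda)
print("driver supports up to CUDA:", driver_cuda_version(run(["nvidia-smi"])))
print("nvcc CUDA release:", nvcc_cuda_version(run(["nvcc", "--version"])))
```

Any value printed as None means the corresponding tool or package was not found in the current environment, which on a cluster may simply mean the script ran on a node without the GPU stack loaded.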
Happy to oblige, but I did just figure something out first: I am accessing GPUs on a remote cluster. I had been working consistently on a Tesla K20Xm; when I grabbed a GeForce GTX 1080 Ti instead, the error that I linked above disappeared.
The above image shows the outcome of the unit tests. Strangely, the errors all report a module not being found. Additionally, the below image shows the error in Jupyter itself:
This error occurs immediately upon executing the first cell under "Training Loop" in the NMNIST tutorial.
As for the outputs of the commands you requested, the below image should suffice:
Okay, there is quite a bit of mismatch in your CUDA versions:
- Your torch installation is for CUDA 10.2
- Your NVIDIA driver is for CUDA 11.2
- This is the reason you are getting "no kernel image is available": there is no runtime for CUDA 10.2 in torch
- Your nvcc compiler is for CUDA 10
- You will encounter problems when you use lava-dl compiled code
Bottom line: the CUDA versions of all three must be the same.
Hello, I managed to make CUDA the same version as nvcc, as I show in the screenshot.
But the tests still give me the same error, "CUDA error: no kernel image is available for execution on the device", and the same test coverage.
How can I resolve this problem? Thanks for your help.
P.S. I also tried updating the torch version in the virtual env to the latest one, but it still doesn't work.
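One more thing that may be worth checking when the versions already match, under the assumption that the "no kernel image is available" error comes from the GPU's compute capability not being covered by the architectures the installed torch build was compiled for. The sketch below is not from the thread: torch.cuda.get_device_capability and torch.cuda.get_arch_list are existing torch calls, while arch_supported and check_current_device are hypothetical helper names. The matching rule is a simplification: an sm_XY entry is treated as covering devices with the same major version and a minor version of at least Y, and a compute_XY (PTX) entry as covering any device at X.Y or newer.

```python
import re

ARCH_RE = re.compile(r"^(sm|compute)_(\d+)(\d)")


def arch_supported(capability, arch_list):
    """Return True if a device with this (major, minor) capability is covered."""
    major, minor = capability
    for tag in arch_list:
        match = ARCH_RE.match(tag)
        if match is None:
            continue
        kind = match.group(1)
        built = (int(match.group(2)), int(match.group(3)))
        # Native binaries run on the same major version with minor >= built minor.
        if kind == "sm" and built[0] == major and built[1] <= minor:
            return True
        # PTX can be JIT-compiled for any device at or above the built version.
        if kind == "compute" and built <= (major, minor):
            return True
    return False


def check_current_device(index=0):
    """Check the local GPU against the local torch build; None if unavailable."""
    try:
        import torch
    except (ImportError, OSError):
        return None
    if not torch.cuda.is_available():
        return None
    capability = torch.cuda.get_device_capability(index)
    return arch_supported(capability, torch.cuda.get_arch_list())


# Illustrative architecture list only, not the list of any particular torch release.
example_archs = ["sm_37", "sm_50", "sm_60", "sm_70"]
print("capability 3.5 covered:", arch_supported((3, 5), example_archs))
print("capability 6.1 covered:", arch_supported((6, 1), example_archs))
print("this machine:", check_current_device())
```

For reference, the Tesla K20Xm has compute capability 3.5 and the GeForce GTX 1080 Ti has 6.1, so a build without support for 3.5 would be consistent with the error disappearing after switching GPUs, though the thread does not confirm that this is the cause.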
You can look at #55 for a similar issue and investigate my case.
There are two things I would like to mention. First, I think there is no PyTorch build for CUDA 11.6; the latest one is for 11.3.
Secondly, I strongly disagree with @bamsumit. The CUDA version reported by the NVIDIA driver only concerns compatibility; it need not be the same as the torch CUDA version or the nvcc compiler version. However, the installed PyTorch CUDA version and the nvcc compiler version should be the same, and the driver's CUDA version should be equal to or higher than them.
I have struggled with a similar issue for the past few weeks, and my suggestion would be to keep the NVIDIA driver as it is and install CUDA 11.3 and the latest stable PyTorch built for 11.3.
cheers,
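The rule described in this comment (torch CUDA version equal to the nvcc version, driver CUDA version equal to or higher than both) can be written down as a small check. This is only a sketch of that rule as stated here, not an official compatibility test; parse_version and check_versions are hypothetical names, and the version strings are supplied by hand.

```python
def parse_version(text):
    """Turn 'X.Y' (or 'X') into a (major, minor) tuple of ints."""
    parts = [int(p) for p in text.split(".")[:2]]
    while len(parts) < 2:
        parts.append(0)  # treat a bare '10' as 10.0
    return tuple(parts)


def check_versions(torch_cuda, nvcc_cuda, driver_cuda):
    """Return a list of human-readable problems; empty if the rule holds."""
    torch_v = parse_version(torch_cuda)
    nvcc_v = parse_version(nvcc_cuda)
    driver_v = parse_version(driver_cuda)
    problems = []
    if torch_v != nvcc_v:
        problems.append(f"torch CUDA {torch_cuda} differs from nvcc {nvcc_cuda}")
    if driver_v < max(torch_v, nvcc_v):
        problems.append(f"driver CUDA {driver_cuda} is older than torch/nvcc")
    return problems


# Matching toolkit versions under a newer driver: no problems reported.
print(check_versions("11.3", "11.3", "11.6"))
# Mismatched torch and nvcc versions: one problem reported.
print(check_versions("10.2", "10.0", "11.2"))
```

Comparing only major and minor numbers is a deliberate simplification; patch releases are ignored.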
@VishalPathak-GTRI, as I said, you want to have the CUDA versions for torch, nvcc, and the NVIDIA driver at the same version if you can. You can sometimes make things work with different versions, but that's not guaranteed. You can dig into the NVIDIA CUDA compatibility documentation for details.
As @ahmetakman pointed out, there is no official torch build for CUDA versions later than 11.3, so I am guessing your torch is 1.11.0+cu113. So you should at least:
- change your nvcc to CUDA 11.3
I have working configurations with the NVIDIA driver at CUDA 11.6, so you can try leaving your driver unchanged and see if it works.
@VishalPathak-GTRI is this issue resolved?