Comments (11)

bamsumit avatar bamsumit commented on June 17, 2024

Hi @VishalPathak-GTRI, I wanted to check whether you are able to run all the unit tests.

It looks like you don't have a CUDA-enabled machine. If you don't have a GPU-capable machine, you will want to set the code to use the CPU instead:

device = torch.device('cpu')

in block 4 of the notebook.

from lava-dl.

VishalPathak-GTRI avatar VishalPathak-GTRI commented on June 17, 2024

So I am actually on a SLURM-managed GPU cluster running CentOS; I have confirmed that the torch installation can use the GPU with torch.cuda.is_available() and with torch.cuda.get_device_name(0). Despite this, I still have issues with the notebook. There is another post, which I also replied to, that describes similar issues.

bamsumit avatar bamsumit commented on June 17, 2024

Try this piece of code in the Python interpreter in your virtual environment and see if you get the error:

import torch
torch.round(torch.arange(10) / 2)
device = torch.device('cuda')
torch.round(torch.arange(10).to(device) / 2)
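
A side note on the snippet above: torch.cuda.is_available() can return True while the first real computation on the GPU still fails, which is what this thread runs into. The following is a minimal sketch, not part of lava-dl, that wraps the same computation in a probe and falls back to the CPU when it fails. The names `torch_cuda_probe` and `pick_device` are hypothetical helpers introduced only for illustration.

```python
def torch_cuda_probe():
    """Run the same small computation as above on the GPU.

    torch is imported lazily so that pick_device() can also be
    exercised with a custom probe on a machine without torch.
    """
    import torch

    device = torch.device('cuda')
    result = torch.round(torch.arange(10).to(device) / 2)
    # Copy back to the host so that an asynchronous CUDA error surfaces here.
    result.cpu()


def pick_device(probe=torch_cuda_probe):
    """Return 'cuda' if the probe runs without raising, otherwise 'cpu'."""
    try:
        probe()
    except Exception as err:  # e.g. RuntimeError: CUDA error: no kernel image ...
        print(f'GPU probe failed, falling back to CPU: {err}')
        return 'cpu'
    return 'cuda'
```

In a notebook this could be used as `device = torch.device(pick_device())`, so a broken GPU setup degrades to a slow CPU run instead of crashing in the training loop.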

VishalPathak-GTRI avatar VishalPathak-GTRI commented on June 17, 2024

Alright, so I'm still having some issues but I think I can provide more information in order to get some advice:

image

That image is of the unittests for lava passing successfully.

image

That image is of the unittests for lava-dl failing.

image

This is the output of the commands that you had requested I perform.

What would you suggest?

bamsumit avatar bamsumit commented on June 17, 2024

Okay, you have a problem using the GPU from torch. It looks like there is a CUDA version mismatch between your torch installation and your Nvidia drivers.

Let's see the output of the following commands:

python -c "import torch; print(torch.__version__)"
nvidia-smi | grep CUDA
nvcc --version
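
For anyone who wants to compare the three outputs programmatically, here is a rough sketch that pulls the CUDA version out of each one. It assumes the usual output shapes: a `+cuXYZ` suffix on the torch version string, a `CUDA Version: X.Y` field in the nvidia-smi header, and a `release X.Y` field in the nvcc banner. The function names are hypothetical, and the sample strings in the usage note are illustrative, not taken from the screenshots in this thread.

```python
import re


def torch_cuda_version(torch_version):
    """Extract the CUDA tag from a torch version string.

    '1.11.0+cu113' -> '11.3'. Returns None when there is no '+cuXYZ'
    suffix (CPU-only or untagged builds). With torch importable,
    torch.version.cuda reports the same information directly.
    """
    match = re.search(r'\+cu(\d{2,})$', torch_version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, everything before it the major.
    return f'{digits[:-1]}.{digits[-1]}'


def driver_cuda_version(nvidia_smi_output):
    """Extract 'X.Y' from the 'CUDA Version: X.Y' field of nvidia-smi."""
    match = re.search(r'CUDA Version:\s*(\d+\.\d+)', nvidia_smi_output)
    return match.group(1) if match else None


def nvcc_cuda_version(nvcc_output):
    """Extract 'X.Y' from the 'release X.Y' field of nvcc --version."""
    match = re.search(r'release\s+(\d+\.\d+)', nvcc_output)
    return match.group(1) if match else None
```

For example, `torch_cuda_version('1.11.0+cu113')` gives `'11.3'`, and `nvcc_cuda_version('Cuda compilation tools, release 10.0, V10.0.130')` gives `'10.0'`.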

VishalPathak-GTRI avatar VishalPathak-GTRI commented on June 17, 2024

Happy to oblige, but I did just figure something out first: I am accessing GPUs on a remote cluster. I had been consistently working on a Tesla K20Xm; when I grabbed a GeForce GTX 1080 Ti instead, the error that I linked above disappeared.

image

The above image shows the accuracy of the unit testing. Strangely, the errors all report a module not being found. Additionally, the image below shows the error in Jupyter itself:

image

This error occurs immediately upon executing the first cell under "Training Loop" in the NMNIST tutorial.

As for the outputs of the commands you requested, the image below should suffice:

image

bamsumit avatar bamsumit commented on June 17, 2024

Okay, there is quite a bit of mismatch in your CUDA versions:

  • Your torch installation is for CUDA 10.2
  • Your Nvidia driver is for CUDA 11.2
    • This is the reason you are getting "no kernel image is available", as there is no runtime for CUDA 10.2 in torch
  • Your nvcc compiler is for CUDA 10
    • You will encounter problems when you use lava-dl compiled code

Bottom line: the CUDA versions of all three must be the same.

franzhd avatar franzhd commented on June 17, 2024

Hello, I managed to make CUDA the same version as nvcc, as I show in the screenshot.
image
But the tests still give me the same error, "CUDA error: no kernel image is available for execution on the device", and the same test coverage.
image
How can I resolve this problem? Thanks for your help.
P.S. I also tried updating the torch version of the virtual env to the latest one, but it still doesn't work.
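
One more thing that may be worth checking when this error persists with matched versions: whether the installed torch build contains kernels for the GPU's compute capability at all. Earlier in the thread the error went away after switching from a Tesla K20Xm (compute capability 3.5) to a GeForce GTX 1080 Ti (compute capability 6.1), which points in that direction. The sketch below is a simplified check under stated assumptions: `arch_supported` is a hypothetical helper, its inputs are meant to come from `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()`, and the architecture list used in the usage note is made up for illustration.

```python
def arch_supported(capability, arch_list):
    """Roughly check whether a build with `arch_list` can run on `capability`.

    capability: (major, minor), as returned by torch.cuda.get_device_capability()
    arch_list:  strings such as 'sm_60' or 'compute_70', as returned by
                torch.cuda.get_arch_list()

    Simplified rules: an 'sm_XY' binary runs on devices with the same major
    version and an equal or higher minor version; 'compute_XY' PTX can be
    JIT-compiled for any device of equal or higher capability.
    """
    major, minor = capability
    for arch in arch_list:
        kind, _, digits = arch.partition('_')
        if not digits.isdigit():
            continue  # skip entries this sketch does not understand
        a_major, a_minor = int(digits[:-1]), int(digits[-1])
        if kind == 'sm' and a_major == major and a_minor <= minor:
            return True
        if kind == 'compute' and (a_major, a_minor) <= (major, minor):
            return True
    return False
```

With an illustrative list such as `['sm_37', 'sm_50', 'sm_60', 'sm_70']`, `arch_supported((3, 5), ...)` is False while `arch_supported((6, 1), ...)` is True. If the check fails for your GPU, matching the torch, nvcc and driver versions alone will probably not remove the error.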

ahmetakman avatar ahmetakman commented on June 17, 2024

You can look at #55 for a similar issue and investigate my case.
There are two things I would like to mention. First, I think there is no PyTorch build for CUDA 11.6; the latest one is for 11.3.
Second, I strongly disagree with @bamsumit. The CUDA version reported by the Nvidia driver is only about compatibility; it need not be the same as the torch CUDA version or the nvcc compiler version. However, the installed PyTorch CUDA version and the nvcc compiler version should be the same, and the CUDA version of the Nvidia driver should be equal to or higher than both.
I struggled with a similar issue over the past weeks, and my suggestion would be to keep the Nvidia driver as it is and install CUDA 11.3 and the latest stable PyTorch built for CUDA 11.3.
Cheers,

bamsumit avatar bamsumit commented on June 17, 2024

@VishalPathak-GTRI as I said, you want to have your CUDA versions for torch, nvcc and the Nvidia driver at the same version if you can. You can sometimes make things work with different versions, but that is not always guaranteed. You can dig into the Nvidia CUDA compatibility documentation for details.

As @ahmetakman pointed out, there is no official torch build for CUDA versions later than CUDA 11.3, so I am guessing your torch is 1.11.0+cu113. So you should at least

  • change your nvcc to CUDA 11.3

I have working configurations with the Nvidia driver at CUDA 11.6. So you can try leaving your driver unchanged and see if it works.
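
The rule of thumb discussed above (torch and nvcc on the same CUDA version, driver at that version or newer) can be written down as a small check. This is only a sketch of that rule, not an official compatibility test, and `check_cuda_versions` is a hypothetical name.

```python
def _as_tuple(version):
    """Turn '11.3' or '10' into a comparable (major, minor) tuple."""
    parts = [int(p) for p in version.split('.')]
    while len(parts) < 2:
        parts.append(0)  # treat a bare '10' as '10.0'
    return tuple(parts[:2])


def check_cuda_versions(torch_cuda, nvcc_cuda, driver_cuda):
    """Return a list of problems; an empty list means the rule is satisfied.

    Rule assumed here, following the discussion in this thread:
      * torch and nvcc must target the same CUDA version
      * the driver's CUDA version must be equal to or higher than both
    """
    torch_v, nvcc_v, driver_v = (
        _as_tuple(v) for v in (torch_cuda, nvcc_cuda, driver_cuda)
    )
    problems = []
    if torch_v != nvcc_v:
        problems.append(
            f'torch is built for CUDA {torch_cuda} but nvcc is CUDA {nvcc_cuda}'
        )
    if driver_v < max(torch_v, nvcc_v):
        problems.append(
            f'driver supports CUDA {driver_cuda}, older than torch/nvcc'
        )
    return problems
```

With the versions reported earlier (torch 10.2, nvcc 10, driver 11.2), `check_cuda_versions('10.2', '10', '11.2')` flags only the torch/nvcc mismatch, while `check_cuda_versions('11.3', '11.3', '11.6')` returns an empty list.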

bamsumit avatar bamsumit commented on June 17, 2024

@VishalPathak-GTRI is this issue resolved?
