Comments (5)

steffenWi commented on June 3, 2024

As for a list of supported GPUs, you can find that here, and to figure out which GPU is meant by something like gfx900, you can look here. You can also execute clinfo | grep gfx in a terminal and it will output something like this: Name: gfx1010:xnack-.

As you may notice, my GPU is not supported. More to the point, no RDNA1 GPU is supported. To work around that you can set HSA_OVERRIDE_GFX_VERSION=10.3.0. That way pytorch/torchvision will think it is running on a gfx1030 GPU, i.e. a Radeon RX 6800 class card. I've used that workaround on other projects before and it usually works just fine.
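A minimal Python sketch of the same workaround, for anyone who wants to apply it from a script. The helper names (parse_gfx_target, gfx_to_override, apply_override) are made up for illustration and are not part of ROCm or pytorch. It assumes the usual gfx naming scheme, where the last two characters are the minor version and the stepping (hex digits) and everything before them is the major version, and it assumes the variable has to be set before the ROCm runtime is loaded, so before importing torch.

```python
import os
import re


def parse_gfx_target(clinfo_output):
    """Return the first gfx target (e.g. 'gfx1010') found in clinfo output, or None."""
    match = re.search(r"\bgfx[0-9a-f]+\b", clinfo_output)
    return match.group(0) if match else None


def gfx_to_override(target):
    """Convert a target name like 'gfx1030' to a version string like '10.3.0'.

    Assumption: the last two characters are minor and stepping (hex digits),
    the remaining leading digits are the major version.
    """
    digits = target[len("gfx"):]
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def apply_override(version):
    """Set HSA_OVERRIDE_GFX_VERSION for this process and its children."""
    os.environ["HSA_OVERRIDE_GFX_VERSION"] = version


if __name__ == "__main__":
    detected = parse_gfx_target("  Name: gfx1010:xnack-")
    print("detected target:", detected, "->", gfx_to_override(detected))
    # Pretend to be a gfx1030 card, as in the workaround above.
    apply_override(gfx_to_override("gfx1030"))
    # Only import torch after this point, so the runtime sees the override.
```

The override only changes what the runtime reports, it does not add missing hardware features, so it may still fail on some workloads.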

from dashcamcleaner.

joshinils commented on June 3, 2024

I have no clue about pytorch, maybe @tfaehse can help with that.
It may be possible that you need the rocm version of pytorch?
https://pytorch.org/blog/pytorch-for-amd-rocm-platform-now-available-as-python-package/
But I cannot find the supported GPU list they linked there...

I would like to use this as well, if that works. I also have both CPU and GPU from AMD, on Ubuntu.

steffenWi commented on June 3, 2024

Small update: this isn't an issue with DashcamCleaner. I ran some tests and get the same hang with relatively simple python scripts as well. It seems there is something wrong with my installation, or the GPU is simply unable to do some things.
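For anyone who wants to tell a hang apart from a crash when running such test scripts, here is a rough sketch using only the Python standard library. The function name run_with_timeout and the result strings are made up for illustration. It assumes a POSIX system, where a negative return code from subprocess means the child was killed by a signal (11 would be a segmentation fault).

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and classify the outcome as 'ok', 'hang', a signal, or an exit code."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # The child was killed after timeout_s seconds without finishing.
        return "hang"
    if result.returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return f"killed by signal {-result.returncode}"
    return "ok" if result.returncode == 0 else f"exit {result.returncode}"


if __name__ == "__main__":
    # Stand-in for a real GPU test script; replace with your own command.
    print(run_with_timeout([sys.executable, "-c", "print('hello')"], 10))
```

Pick the timeout generously, since the first GPU call can take a while to initialize.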

steffenWi commented on June 3, 2024

Just in case anyone else runs into this: some of the things I tried running were the tests included in the 'roctracer' package. Some of them would run, others wouldn't. The ones that didn't run all crashed with segmentation faults. For example, this would already crash:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
./run.sh 0

I got around to debugging this with gdb by running gdb ./test/MatrixTranspose and then entering run at the gdb prompt. For all tests that failed, the stack trace would end up at this point:

hip::FatBinaryInfo::AddDevProgram (device_id=<optimized out>, this=<optimized out>) at /usr/src/debug/hip-runtime-amd/hipamd-rocm-5.4.3/src/hip_fatbin.cpp:122
122       if (fbd_info->add_dev_prog_ == false) { 

In every case it turned out that fbd_info was unassigned/NULL. Looking at the source code of hip_fatbin.cpp, I noticed that the current version contains a change that fixes the segmentation fault. This check was added directly in front of the if statement:

  if (fbd_info == nullptr) {
    return hipErrorInvalidKernelFile;
  }

I'm now going to wait until the 5.5 version hits the Arch Linux repository and see what happens then.

steffenWi commented on June 3, 2024

Short update: I downloaded the docker pytorch/rocm container, updated it, and compiled everything for rocm 5.5, as that version is still not available for Arch Linux. I got the same exception as before. I then swapped my RDNA1 GPU for a friend's RDNA2 GPU. With the RDNA2 GPU it works fine. After swapping back to RDNA1 it fails again.
