Comments (5)

ssube commented on May 18, 2024

This will require a new container built on the ROCm base image, plus adding the ROCMExecutionProvider to the list of platforms. The DML execution provider is currently named AMD, which will be confusing once ROCm is listed as well, so it may have to be renamed.
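As a rough illustration of the platform list mentioned above, the sketch below maps user-facing platform names to ONNX Runtime execution provider names and keeps only the ones the runtime reports as available. The `PLATFORM_PROVIDERS` table, its keys (including the `directml` entry standing in for the one currently named AMD), and `available_platforms` are hypothetical and are not the actual onnx-web code; only the provider strings are real ONNX Runtime identifiers.

```python
from typing import Dict, Iterable, List

# Hypothetical platform table: the keys are illustrative, the values are
# real ONNX Runtime execution provider names.
PLATFORM_PROVIDERS: Dict[str, str] = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",  # assumed new name for the entry currently called AMD
    "rocm": "ROCMExecutionProvider",
}


def available_platforms(providers: Iterable[str]) -> List[str]:
    """Return the platform keys whose provider appears in `providers`."""
    present = set(providers)
    return [
        name
        for name, provider in PLATFORM_PROVIDERS.items()
        if provider in present
    ]
```

With ONNX Runtime installed, `available_platforms(onnxruntime.get_available_providers())` would list the platforms usable on the current machine.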

The ROCm + PyTorch image is only available with Python 3.7 and 3.8. The 3.8 base should work, but so far I've only tested 3.9 and 3.10 (3.11 and later do not work).

from onnx-web.

ssube commented on May 18, 2024

I set up an Ubuntu drive for my desktop with the ROCm 5.4 drivers. rocminfo shows my GPU from within the container, but the ROCm execution provider is not available. Next, I built an image with a custom ONNX Runtime build that has ROCm support, but importing it fails with a Python version mismatch, even though the builder and runner use the same FROM image:

>>> import onnxruntime
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: Python version mismatch: module was compiled for Python 3.7, but the interpreter version is incompatible: 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0].
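One way to surface this kind of mismatch early is to parse the error text and compare it with the running interpreter. The sketch below assumes the message format shown in the traceback above; `compiled_python_version` and `explain_mismatch` are hypothetical helper names, and the code does not diagnose why the build picked up a different interpreter.

```python
import re
import sys
from typing import Optional, Tuple

# Matches the wording of the ImportError shown in the traceback above.
_COMPILED_FOR = re.compile(r"module was compiled for Python (\d+)\.(\d+)")


def compiled_python_version(message: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a version-mismatch error message."""
    match = _COMPILED_FOR.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def explain_mismatch(
    message: str, running: Tuple[int, int] = sys.version_info[:2]
) -> str:
    """Summarize how the compiled-for version compares with `running`."""
    compiled = compiled_python_version(message)
    if compiled is None:
        return "no version mismatch found"
    if compiled == running:
        return "versions match"
    return (
        f"extension built for Python {compiled[0]}.{compiled[1]}, "
        f"interpreter is {running[0]}.{running[1]}"
    )
```

Calling `explain_mismatch(str(exc))` inside an `except ImportError as exc:` block around `import onnxruntime` would give a one-line summary that could be printed during the image build.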

ssube commented on May 18, 2024

I tried both ROCm 5.2 and 5.4, but neither version works correctly. The container does start and rocminfo runs, and I got somewhat further:

(onnx_env) root@ssube-notwin:/home/ssube/onnx-web/api# python
Python 3.9.16 (main, Dec  7 2022, 01:12:08) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime
No ROCm runtime is found, using ROCM_HOME='/opt/rocm-5.4.2'
>>> onnxruntime.get_available_providers()
['ROCMExecutionProvider', 'CPUExecutionProvider']
>>> import torch
>>> sess = onnxruntime.InferenceSession('../models/upscaling-real-esrgan-x2-plus.onnx', providers=['ROCMExecutionProvider'])
Inconsistency detected by ld.so: dl-version.c: 204: _dl_check_map_versions: Assertion `needed != NULL' failed!

or

root@d5146e7bb5e0:/onnx-web/api# python
Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime
No ROCm runtime is found, using ROCM_HOME='/opt/rocm'
>>> import torch
>>> sess = onnxruntime.InferenceSession('/data/models/upscaling-real-esrgan-x2-plus.onnx', providers=['ROCMExecutionProvider'])
2023-01-23 23:12:18.977468070 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-01-23 23:12:18.977482750 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
>>> [i.name for i in sess.get_inputs()]
['data']
>>> [o.name for o in sess.get_outputs()]
['output']
>>> image = torch.zeros([1, 3, 512, 512])
>>> sess.run(['output'], {'data': image.cpu().numpy()})
Segmentation fault (core dumped)
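Because both failures above take down the whole interpreter, a smoke test like this may be easier to automate if it runs in a child process. The following is a minimal sketch, assuming a POSIX host; `run_isolated` is a hypothetical helper, not part of onnx-web or ONNX Runtime.

```python
import signal
import subprocess
import sys
from typing import Optional, Tuple


def run_isolated(code: str, timeout: float = 300.0) -> Tuple[int, Optional[str]]:
    """Run `code` in a fresh interpreter.

    Returns (returncode, signal name), where the signal name is None
    unless the child process was killed by a signal.
    """
    result = subprocess.run([sys.executable, "-c", code], timeout=timeout)
    if result.returncode < 0:
        # On POSIX, a negative return code is the negated signal number.
        return result.returncode, signal.Signals(-result.returncode).name
    return result.returncode, None
```

Passing the `InferenceSession` and `sess.run` lines from the transcript as `code` would let a CI job record a crash such as `SIGSEGV` as a test failure instead of dying with it.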

I don't want to delay the v0.5.0 release any longer because of this. I will leave the image in the CI pipeline, but I can't guarantee that it works.

ssube commented on May 18, 2024

ROCm is working outside of the container:

request from 10.2.2.16: 75 rounds of EulerAncestralDiscreteScheduler using ../models/stable-diffusion-onnx-v1-5 on ROCMExecutionProvider, 512x512, 6.0, 909959654 - an astronaut eating a hamburger
invalid selection: 
invalid selection: 
txt2img output: txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png
10.2.2.16 - - [24/Jan/2023 03:28:35] "POST /api/txt2img?cfg=6.00&steps=75&scheduler=euler-a&seed=-1&prompt=an+astronaut+eating+a+hamburger&negativePrompt=&model=&platform=rocm&upscaling=&correction=&width=512&height=512 HTTP/1.1" 200 -
reusing existing pipeline
running garbage collection during pipeline change
10.2.2.16 - - [24/Jan/2023 03:28:35] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
 17%|██████████████████████████▎                                                                                                                             | 13/75 [00:04<00:15,  4.05it/s]10.2.2.16 - - [24/Jan/2023 03:28:40] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
 45%|████████████████████████████████████████████████████████████████████▉                                                                                   | 34/75 [00:09<00:09,  4.18it/s]10.2.2.16 - - [24/Jan/2023 03:28:45] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
 72%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                          | 54/75 [00:14<00:05,  4.18it/s]10.2.2.16 - - [24/Jan/2023 03:28:50] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:19<00:00,  3.79it/s]
10.2.2.16 - - [24/Jan/2023 03:28:55] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
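For reference, the request path in the log above can be rebuilt from its parameters. This sketch only reproduces the query string as logged: the parameter names and defaults are copied from the POST line, `txt2img_path` is a hypothetical helper, and the response body is not shown in the log, so it is not handled here.

```python
from urllib.parse import urlencode


def txt2img_path(
    prompt: str,
    *,
    cfg: float = 6.0,
    steps: int = 75,
    scheduler: str = "euler-a",
    seed: int = -1,
    platform: str = "rocm",
    width: int = 512,
    height: int = 512,
) -> str:
    """Build the /api/txt2img path with the query parameters seen in the log."""
    params = {
        "cfg": f"{cfg:.2f}",  # logged as 6.00
        "steps": steps,
        "scheduler": scheduler,
        "seed": seed,
        "prompt": prompt,
        # The next parameters were sent empty in the logged request.
        "negativePrompt": "",
        "model": "",
        "platform": platform,
        "upscaling": "",
        "correction": "",
        "width": width,
        "height": height,
    }
    return "/api/txt2img?" + urlencode(params)
```

In the log, this path is sent as a POST, and the client then polls `/api/ready?output=...` with the output filename until the image is finished.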

ssube commented on May 18, 2024

This container should be working, but my build runners keep running out of disk space, and uploading the image from a local build eventually fails with an incorrect digest error.

For now, you can run a ROCm container if you build the image yourself.
