get container working with AMD acceleration about onnx-web HOT 5 OPEN

ssube commented on May 18, 2024

get container working with AMD acceleration

from onnx-web.

Comments (5)

ssube commented on May 18, 2024

This will require a new container based on the ROCm base image and adding the ROCMExecutionProvider to the list of platforms. The DML execution provider is currently named AMD, which will be confusing once ROCm is added, so it may have to be renamed.

The ROCm + pytorch image is only available with Python 3.7 and 3.8. The 3.8 base should work, but I've only tested 3.9 and 3.10 (3.11+ do not work).
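
As a rough sketch of the platform change described above, the mapping below keeps DirectML and ROCm under separate names. The `PLATFORM_PROVIDERS` dictionary and both helper functions are hypothetical and are not taken from the onnx-web code; only `onnxruntime.get_available_providers()` and the provider name strings are real ONNX Runtime identifiers.

```python
# Hypothetical mapping from platform names to ONNX Runtime provider names.
# "amd" mirrors the current naming for DirectML; "rocm" is the new entry.
PLATFORM_PROVIDERS = {
    "amd": "DmlExecutionProvider",
    "rocm": "ROCMExecutionProvider",
    "cpu": "CPUExecutionProvider",
}


def detect_providers():
    """Return the providers reported by the installed ONNX Runtime build."""
    try:
        import onnxruntime
    except ImportError:
        # Runtime missing or built for another Python version.
        return []
    return onnxruntime.get_available_providers()


def available_platforms(providers):
    """Keep only the platforms whose provider is present in this build."""
    return [
        name
        for name, provider in PLATFORM_PROVIDERS.items()
        if provider in providers
    ]


if __name__ == "__main__":
    print(available_platforms(detect_providers()))
```

Filtering against the detected providers means a platform is only offered when the runtime build actually includes it, which is the case that fails when the stock package is installed in a ROCm container.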

ssube commented on May 18, 2024

I set up an Ubuntu drive for my desktop with the ROCm 5.4 drivers, and rocminfo shows my GPU from within the container, but the ROCm execution provider is not available. Next, I built an image with a custom ONNX runtime with ROCm support, but it complains that the Python version does not match despite the builder and runner using the same FROM image:

>>> import onnxruntime
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/onnx-web/api/onnx_env/lib/python3.8/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: Python version mismatch: module was compiled for Python 3.7, but the interpreter version is incompatible: 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0].

ssube commented on May 18, 2024

I set up and tried with both ROCm 5.2 and 5.4, but neither version works correctly. The container does start and rocminfo runs, and I got somewhat further:

(onnx_env) root@ssube-notwin:/home/ssube/onnx-web/api# python
Python 3.9.16 (main, Dec  7 2022, 01:12:08) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime
No ROCm runtime is found, using ROCM_HOME='/opt/rocm-5.4.2'
>>> onnxruntime.get_available_providers()
['ROCMExecutionProvider', 'CPUExecutionProvider']
>>> import torch
>>> sess = onnxruntime.InferenceSession('../models/upscaling-real-esrgan-x2-plus.onnx', providers=['ROCMExecutionProvider'])
Inconsistency detected by ld.so: dl-version.c: 204: _dl_check_map_versions: Assertion `needed != NULL' failed!

root@d5146e7bb5e0:/onnx-web/api# python
Python 3.8.13 (default, Mar 28 2022, 11:38:47) 
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import onnxruntime
No ROCm runtime is found, using ROCM_HOME='/opt/rocm'
>>> import torch
>>> sess = onnxruntime.InferenceSession('/data/models/upscaling-real-esrgan-x2-plus.onnx', providers=['ROCMExecutionProvider'])
2023-01-23 23:12:18.977468070 [W:onnxruntime:, session_state.cc:1136 VerifyEachNodeIsAssignedToAnEp] Some nodes were not assigned to the preferred execution providers which may or may not have an negative impact on performance. e.g. ORT explicitly assigns shape related ops to CPU to improve perf.
2023-01-23 23:12:18.977482750 [W:onnxruntime:, session_state.cc:1138 VerifyEachNodeIsAssignedToAnEp] Rerunning with verbose output on a non-minimal build will show node assignments.
>>> [i.name for i in sess.get_inputs()]
['data']
>>> [o.name for o in sess.get_outputs()]
['output']
>>> image = torch.zeros([1, 3, 512, 512])
>>> sess.run(['output'], {'data': image.cpu().numpy()})
Segmentation fault (core dumped)

I don't want to delay the v0.5.0 release any longer because of this. I will leave the image in the CI pipeline, but I can't guarantee that it works.
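
Because the segmentation fault kills the whole interpreter, one way to check an image automatically is to run the same inference in a child process and inspect its exit code. This is only a sketch: `CHILD_SCRIPT`, `run_isolated`, and `describe` are hypothetical names, the model path and the input and output names are copied from the session above, and the signal reporting assumes a POSIX host.

```python
import subprocess
import sys

# Hypothetical child script: repeats the failing session with a zero image.
CHILD_SCRIPT = """
import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession(
    '/data/models/upscaling-real-esrgan-x2-plus.onnx',
    providers=['ROCMExecutionProvider'],
)
image = np.zeros([1, 3, 512, 512], dtype=np.float32)
sess.run(['output'], {'data': image})
"""


def run_isolated(script, timeout=300):
    """Run a script in a separate interpreter so a crash cannot take down the caller."""
    result = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


def describe(returncode):
    """Translate an exit code; on POSIX a negative value means death by signal."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        return f"killed by signal {-returncode}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    code, stderr = run_isolated(CHILD_SCRIPT)
    print(describe(code))
```

A segmentation fault would show up here as signal 11 rather than ending the calling process, so a CI job could report the failure and continue.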

ssube commented on May 18, 2024

ROCm is working outside of the container:

request from 10.2.2.16: 75 rounds of EulerAncestralDiscreteScheduler using ../models/stable-diffusion-onnx-v1-5 on ROCMExecutionProvider, 512x512, 6.0, 909959654 - an astronaut eating a hamburger
invalid selection: 
invalid selection: 
txt2img output: txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png
10.2.2.16 - - [24/Jan/2023 03:28:35] "POST /api/txt2img?cfg=6.00&steps=75&scheduler=euler-a&seed=-1&prompt=an+astronaut+eating+a+hamburger&negativePrompt=&model=&platform=rocm&upscaling=&correction=&width=512&height=512 HTTP/1.1" 200 -
reusing existing pipeline
running garbage collection during pipeline change
10.2.2.16 - - [24/Jan/2023 03:28:35] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
 17%|██████████████████████████▎                                                                                                                             | 13/75 [00:04<00:15,  4.05it/s]10.2.2.16 - - [24/Jan/2023 03:28:40] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
 45%|████████████████████████████████████████████████████████████████████▉                                                                                   | 34/75 [00:09<00:09,  4.18it/s]10.2.2.16 - - [24/Jan/2023 03:28:45] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
 72%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████▍                                          | 54/75 [00:14<00:05,  4.18it/s]10.2.2.16 - - [24/Jan/2023 03:28:50] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 75/75 [00:19<00:00,  3.79it/s]
10.2.2.16 - - [24/Jan/2023 03:28:55] "GET /api/ready?output=txt2img_909959654_73ccc88818151b74dab48b14fdc59a2c8761904d7c603a7e67e893962c1139e7_1674530915.png HTTP/1.1" 200 -

ssube commented on May 18, 2024

This container should work, but my build runners keep running out of disk space, and uploading it from a local build eventually fails with an incorrect digest error.

For now, you can run a ROCm container if you build your own.
