nickfraser / optimum-amd

This project forked from huggingface/optimum-amd

AMD related optimizations for transformer models

Home Page: https://huggingface.co/docs/optimum/amd/index

License: MIT License


optimum-amd's Introduction

Optimum-AMD

🤗 Optimum-AMD is an extension to the Hugging Face libraries that enables performance optimizations for AMD GPUs through ROCm and for AMD's NPU accelerators through Ryzen AI.

Install

The Optimum-AMD library can be installed through pip:

pip install --upgrade-strategy eager optimum[amd]

Installation from source is also possible:

git clone https://github.com/huggingface/optimum-amd.git
cd optimum-amd
pip install -e .

ROCm support for AMD GPUs

Hugging Face libraries natively support AMD GPUs through PyTorch for ROCm, with no code changes required.

🤗 Transformers natively supports Flash Attention 2 and GPTQ quantization with ROCm. The 🤗 Text Generation Inference library for LLM deployment also has native ROCm support, including Flash Attention 2, Paged Attention, and fused positional-encoding and layer-norm kernels.
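As a minimal sketch of the no-code-changes promise (the checkpoint and generation settings here are illustrative, and the attn_implementation argument assumes a recent transformers release with a ROCm build of flash-attn installed), Flash Attention 2 can be enabled at load time with the usual Transformers API:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/falcon-7b-instruct"  # illustrative checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",  # assumes flash-attn built for ROCm
    device_map="auto",
)

inputs = tokenizer("What is Deep Learning?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))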

Find out more about these integrations in the documentation!

In the future, Optimum-AMD may host more ROCm-specific optimizations.

How to use it: Text Generation Inference

The Text Generation Inference library for LLM deployment supports AMD Instinct MI210/MI250 GPUs. Deployment can be done as follows:

  1. Install ROCm 5.7 on the host machine.
  2. Example LLM server setup: launch a Falcon-7B model server in the ROCm-enabled Docker container:
model=tiiuae/falcon-7b-instruct
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --cap-add=SYS_PTRACE --security-opt seccomp=unconfined \
    --device=/dev/kfd --device=/dev/dri --group-add video \
    --ipc=host --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:1.2-rocm \
    --model-id $model
  3. Client setup: open another shell and run:
curl 127.0.0.1:8080/generate \
    -X POST \
    -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":20}}' \
    -H 'Content-Type: application/json'
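The same request can be made from Python with huggingface_hub's InferenceClient; a minimal sketch, assuming the server above is reachable on port 8080:

from huggingface_hub import InferenceClient

# Point the client at the local TGI server started above.
client = InferenceClient("http://127.0.0.1:8080")

# Mirrors the curl example: generate up to 20 new tokens.
result = client.text_generation("What is Deep Learning?", max_new_tokens=20)
print(result)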

How to use it: ONNX Runtime with ROCm

The Optimum ONNX Runtime integration supports ROCm for AMD GPUs. Usage is as follows:

  1. Install ROCm 5.7 on the host machine.
  2. Use the example Dockerfile or install the onnxruntime-rocm package locally from source, as pip wheels are not available at this time.
  3. Run a BERT text-classification ONNX model using the ROCMExecutionProvider:
from optimum.onnxruntime import ORTModelForSequenceClassification
from optimum.pipelines import pipeline
from transformers import AutoTokenizer

# Export the model to ONNX on the fly and place it on the AMD GPU via ROCm.
ort_model = ORTModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english",
    export=True,
    provider="ROCMExecutionProvider",
)
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")
pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer, device="cuda:0")
result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
print(result)
# [{'label': 'POSITIVE', 'score': 0.9997727274894714}]
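Note that with ROCm, AMD GPUs are exposed through the usual cuda device strings, which is why device="cuda:0" targets the first AMD GPU here.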

Ryzen AI

AMD's Ryzen™ AI family of laptop processors provides an integrated Neural Processing Unit (NPU) that offloads AI processing tasks from the host CPU and GPU. Ryzen™ AI software consists of the Vitis™ AI execution provider (EP) for ONNX Runtime, combined with quantization tools and a pre-optimized model zoo. It is built on the AMD XDNA™ architecture, which is purpose-built to run AI workloads efficiently and locally.

Optimum-AMD provides an easy interface for loading and running inference with Hugging Face models on the Ryzen AI accelerator.

Ryzen AI Environment setup

A Ryzen AI environment needs to be enabled to use this library. Please refer to Ryzen AI's Installation and Runtime Setup.

How to use it?

  • Quantize the ONNX model with Optimum or with the RyzenAI quantization tools, as sketched below.

For more information on quantization, refer to the Model Quantization guide.
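A rough sketch of the Optimum path follows. The model directory, calibration dataset, and preset configuration below are assumptions for illustration, based on the RyzenAIOnnxQuantizer interface in optimum.amd.ryzenai; consult the Model Quantization guide for the exact API:

from optimum.amd.ryzenai import AutoQuantizationConfig, RyzenAIOnnxQuantizer

# Assumed path for illustration: a directory containing an exported ONNX model.
quantizer = RyzenAIOnnxQuantizer.from_pretrained("path/to/exported_onnx_model")

# Preset quantization configuration targeting CNN-style models on the NPU.
quantization_config = AutoQuantizationConfig.ipu_cnn_config()

quantizer.quantize(
    quantization_config=quantization_config,
    dataset=calibration_dataset,  # assumed: a small calibration set prepared beforehand
    save_dir="quantized_model",
)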

  • Load the model with a Ryzen AI class.

To load a model and run inference with RyzenAI, you can just replace your AutoModelForXxx class with the corresponding RyzenAIModelForXxx class.

import requests
from PIL import Image

- from transformers import AutoModelForImageClassification
+ from optimum.amd.ryzenai import RyzenAIModelForImageClassification
from transformers import AutoFeatureExtractor, pipeline

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model_id = <path of the model>
- model = AutoModelForImageClassification.from_pretrained(model_id)
+ model = RyzenAIModelForImageClassification.from_pretrained(model_id)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_id)
cls_pipe = pipeline("image-classification", model=model, feature_extractor=feature_extractor)
outputs = cls_pipe(image)
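The pipeline output keeps the standard image-classification format (a list of label/score pairs), so downstream code needs no changes.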

Tests

An extensive test suite is included to test the library's behavior. The test suite can be found in the tests folder. To run the tests, navigate to the root of the repository and specify a path to a subfolder or a specific test file.

Before running the tests, make sure to install the necessary dependencies by using the following command:

pip install .[tests]

and then run:

pytest -s -v ./tests/ryzenai/

You can also specify a smaller set of tests in order to test only the feature you're working on.

Running Slow Tests

By default, slow tests are skipped, but you can set the RUN_SLOW environment variable to 1 to run them.

RUN_SLOW=1 pytest -s -v ./tests/ryzenai/

NOTE: Enabling slow tests will involve downloading several gigabytes of models. Ensure you have enough disk space and a good internet connection!

Windows PowerShell

For Windows PowerShell, use the following command to run the slow tests:

$env:RUN_SLOW=1; pytest -s -v ./tests/ryzenai/

Note: The current operators baseline is generated using Ryzen AI Software 1.1. To generate a baseline for your SDK version, follow the steps in README.md.

If you find any issues, please open an issue or submit a pull request.

optimum-amd's People

Contributors

mht-sharma, fxmarty, nickfraser, echarlaix, giuseppe5, ilyasmoutawwakil, amritgupta98, chaoli-amd, mfuntowicz, glegendre01

