The aws-neuron-samples from philschmid

aws-neuron-samples's Introduction

Hello, my name is Philipp

I am a Technical Lead at Hugging Face, leading our collaboration and partnerships with major cloud providers like AWS, Google Cloud, Azure, and Cloudflare.

At Hugging Face, I have been instrumental in driving the company's growth through our Cloud and Hardware offerings. I serve as the Technical/Engineering Lead for our partnerships with AWS (SageMaker & Neuron), Google (Vertex AI & GKE), Azure (AzureML), and Cloudflare, and I created Hugging Face Inference Endpoints, the easiest way to deploy LLMs into production.

If I'm not putting Transformers and Generative AI models into services or production, I'm collaborating with our open-source and science teams to make LLMs more accessible. I am currently focusing on LLMs using RLHF for enterprise and business use cases.

More than seven years ago, I discovered my passion for cloud concepts and machine learning. Since then, I have used it to design and implement cloud-native machine learning architectures for fin-tech and insurance companies. In recognition of my expertise in this field, I became the first German AWS Machine Learning Hero in June 2021.

In addition to my work at Hugging Face, I engage in evangelism and advocacy efforts, sharing my knowledge and insights through research publications, blog posts on LinkedIn, and posts on X (formerly Twitter).

I write about everything I am working on at philschmid.de.

⚡Technologies

Below is a list of technologies (mostly open-source frameworks, libraries, and languages) I regularly use and enjoy working with. If you want to see more of what I do or have done, check out my GitHub.

🤖 Machine Learning
Transformers, PyTorch, Scikit-Learn, LangChain, Weights & Biases, DeepSpeed, TensorRT, Triton, ONNX.

☁️ Cloud
AWS, GCP, Azure, Kubernetes, Kubeflow, Docker, Terraform, GitHub Actions, CDK.

🏗️ Non-ML
Rust, Next.js, Svelte, Tailwind, FastAPI, Shadcn, React, gRPC.

aws-neuron-samples's Issues

Model compilation happening at every step

Hello!

I am trying to train a Flan-T5 small model using the Hugging Face Seq2SeqTrainer and am noticing that optimum is compiling the model at every step. Is there a good way to prevent this? I tried to follow these steps: https://huggingface.co/docs/optimum-neuron/guides/cache_system but when I run
optimum-cli neuron cache create
I get this error:
Optimum CLI tool: error: invalid choice: 'neuron' (choose from 'export', 'env', 'onnxruntime')
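
One possible explanation, offered as an assumption and not a confirmed diagnosis: the error lists only the `export`, `env`, and `onnxruntime` subcommands, which would be consistent with the `optimum-neuron` package being missing (or too old to register the `neuron` subcommand) in the environment that `optimum-cli` runs from. A minimal sketch of a check, using only the standard library, is shown here; the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be located in the current environment."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. `optimum`) is itself missing.
        return False


# Check both the base package and the Neuron extension.
for name in ("optimum", "optimum.neuron"):
    print(f"{name}: {'found' if is_installed(name) else 'missing'}")
```

If `optimum.neuron` is reported as missing, installing `optimum-neuron` into the same Python environment as the `optimum-cli` executable would be the first thing to try.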

Unable to use Neuron Cores while fine-tuning BERT on Trainium

Hey!

I am trying to follow this guide: https://huggingface.co/docs/optimum-neuron/tutorials/fine_tune_bert to fine-tune BERT on a trn1.2xlarge instance. I set up the datasets as described in the blog and then ran the training script, but Neuron core utilization stays at 0%. This matters to me because the expected training time is close to 5 hours.

cc: @philschmid

Issues when trying to convert Donut model to Inferentia2

Hi Phil,
Hope you are doing great, and apologies for piggy-backing on this GitHub repository, but I thought you might be able to give me some pointers since you have worked extensively with the Donut model.

I've been trying to convert this Donut model to Inferentia2. I based my work on the excellent inference_transformers_vision.py script provided, but I am getting an exception when running the trace-model Python script.

Here is the code used (trace-model.py):

```python
import torch
import os
import importlib
import requests
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image

chip_type = os.environ.get("CHIP_TYPE", "inf2")

print(f"Selecting chip type: {chip_type}")

# Import the Neuron library that matches the target chip
if chip_type == "inf1":
    import torch_neuron as neuron_lib
elif chip_type == "inf2":
    import torch_neuronx as neuron_lib

batch_size = 1
sequence_length = 128
model_name = 'naver-clova-ix/donut-base-finetuned-cord-v2'

# 2. LOAD PRE-TRAINED MODEL
print(f'\nLoading pre-trained model: {model_name}')
processor = DonutProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# 3. TOKENIZE THE INPUT
# Note: if you don't include return_tensors='pt' you'll get a list of lists,
# which is easier for exploration but cannot be fed into a model.

# Move model to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Let's perform inference on an image
url = "https://media.snopes.com/2017/07/walmart-jajket.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image = image.convert("RGB")

# Prepare decoder inputs
task_prompt = ""
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
pixel_values = processor(image, return_tensors="pt").pixel_values

print('\nTracing model ...')

pipeline_cores = 1
model_traced = neuron_lib.trace(model, pixel_values, compiler_workdir=f'{chip_type}-compiler-workdir')
print(' tracing completed.')

model_traced.save(f'./compiled-model-bs-{batch_size}.pt')
print('\n Model Traced and Saved')
```
And the exception I've been getting is shown below:

Tracing model ...
Traceback (most recent call last):
  File "/trace-model/trace-model.py", line 47, in <module>
    model_traced = neuron_lib.trace(model, tuple(pixel_values), tuple(decoder_input_ids), compiler_workdir=f'{chip_type}-compiler-workdir')
  File "/opt/conda/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 289, in trace
    neff_filename, metaneff, flattener, packer = _trace(
  File "/opt/conda/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 326, in _trace
    hlo, input_parameter_names, constant_parameter_tensors, flattener, packer = xla_trace(
  File "/opt/conda/lib/python3.10/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 94, in xla_trace
    outputs = func(*example_inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py", line 581, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py", line 934, in forward
    embedding_output, input_dimensions = self.embeddings(pixel_values, bool_masked_pos=bool_masked_pos)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py", line 177, in forward
    embeddings, output_dimensions = self.patch_embeddings(pixel_values)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py", line 228, in forward
    _, num_channels, height, width = pixel_values.shape
ValueError: not enough values to unpack (expected 4, got 3)

Please let me know what I might be missing here. Thanks.
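
A hedged observation on the report above: the `trace` call in the traceback passes `tuple(pixel_values)`, which differs from the call in the posted listing. Converting a tensor with `tuple(...)` iterates over its first (batch) axis, so each element would be 3-D instead of 4-D, which is consistent with "expected 4, got 3". The sketch below reproduces that shape effect with plain nested lists as a stand-in for a tensor, so it runs without PyTorch or Neuron hardware; the `shape` helper and the tiny 2x2 image size are assumptions made for illustration only.

```python
def shape(x):
    """Return the shape of a regular nested list, e.g. [[0, 0]] -> (1, 2)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# Stand-in for pixel_values: (batch=1, channels=3, height=2, width=2).
pixel_values = [[[[0.0] * 2 for _ in range(2)] for _ in range(3)]]

# tuple(x) iterates over the first axis, so each element loses the batch dimension.
split_inputs = tuple(pixel_values)

# (x,) wraps the whole batch as one example input and keeps all four dimensions.
wrapped_inputs = (pixel_values,)

print(shape(split_inputs[0]))    # (3, 2, 2)
print(shape(wrapped_inputs[0]))  # (1, 3, 2, 2)
```

Under this reading, grouping the example inputs into a single tuple such as `(pixel_values, decoder_input_ids)` would keep the batch dimension intact. Whether the full encoder-decoder model then traces cleanly on Inferentia2 is not confirmed by anything in this thread.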
