aws-neuron-samples's Introduction

Hello, my name is Philipp

I am a Technical Lead at Hugging Face, leading our collaboration and partnerships with major cloud providers like AWS, Google Cloud, Azure, and Cloudflare.

At Hugging Face, I have been instrumental in driving the company's growth through our Cloud and Hardware offerings. I serve as the Technical/Engineering Lead for our partnerships with AWS (SageMaker & Neuron), Google (Vertex AI & GKE), Azure (AzureML), and Cloudflare, and I created Hugging Face Inference Endpoints, the easiest way to deploy LLMs into production.

If I'm not putting Transformers and Generative AI models into services or production, I'm collaborating with our open-source and science teams to make LLMs more accessible. I am currently focusing on LLMs using RLHF for enterprise and business use cases.

My passion for cloud concepts and machine learning started more than seven years ago. Since then, I have used it to design and implement cloud-native machine learning architectures for fin-tech and insurance companies. In recognition of my expertise in this field, I became the first German AWS Machine Learning Hero in June 2021.

In addition to my work at Hugging Face, I engage in evangelism and advocacy efforts, sharing my knowledge and insights through research publications, blog posts on LinkedIn, and posts on X (formerly Twitter).

I write about everything I am working on at philschmid.de.

🔗 Connect with me

Personal Website Twitter Medium LinkedIn

⚡Technologies

Below is a list of technologies (mostly open-source frameworks, libraries, and languages) I regularly use and enjoy working with. If you want to see more of what I do or have done, check out my GitHub.

🤖 Machine Learning
Transformers, PyTorch, scikit-learn, LangChain, Weights & Biases, DeepSpeed, TensorRT, Triton, ONNX.

☁️ Cloud
AWS, GCP, Azure, Kubernetes, Kubeflow, Docker, Terraform, GitHub Actions, CDK.

🏗️ Non-ML
Rust, Next.js, Svelte, Tailwind, FastAPI, Shadcn, React, gRPC.

aws-neuron-samples's Issues

Model compilation happening at every step

Hello!

I am trying to train a Flan-T5 small model using the Hugging Face `Seq2SeqTrainer` and am noticing that Optimum compiles the model at every step. Is there a good way to avoid this? I tried to follow these steps: https://huggingface.co/docs/optimum-neuron/guides/cache_system but when I run
`optimum-cli neuron cache create`
I get the following error:
`Optimum CLI tool: error: invalid choice: 'neuron' (choose from 'export', 'env', 'onnxruntime')`
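
The error lists only `export`, `env`, and `onnxruntime` as valid choices. This suggests that the `optimum-neuron` package, which is expected to provide the `neuron` subcommand, may not be installed in the environment that `optimum-cli` runs from. That is an assumption, not something the issue confirms. A minimal sketch for checking it follows; the helper name `installed_versions` is hypothetical and only the standard library is used:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this Python environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumption: the `neuron` subcommand is registered by the optimum-neuron package.
    for name, version in installed_versions(["optimum", "optimum-neuron"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If `optimum-neuron` is reported as not installed, installing it into the same environment as `optimum` would be the first thing to try before re-running the cache command.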

Issues when trying to convert Donut model to Inferentia2

Hi Phil,
I hope you are doing great, and apologies for piggy-backing on this GitHub repository, but I thought you might be able to give me some pointers, as you have worked extensively with the Donut model.

I've been trying to convert this Donut model to Inferentia2. I based my work on the excellent script provided, inference_transformers_vision.py, but I am getting exceptions when running the model tracing script.

Here is the code used (trace-model.py):

```python
import torch
import os
import importlib
import requests
from transformers import DonutProcessor, VisionEncoderDecoderModel
from PIL import Image

chip_type = os.environ.get("CHIP_TYPE", "inf2")

print(f"Selecting chip type: {chip_type}")

if chip_type == "inf1":
    import torch_neuron as neuron_lib
elif chip_type == "inf2":
    import torch_neuronx as neuron_lib

batch_size = 1
sequence_length = 128
model_name = 'naver-clova-ix/donut-base-finetuned-cord-v2'

# 2. LOAD PRE-TRAINED MODEL
print(f'\nLoading pre-trained model: {model_name}')
processor = DonutProcessor.from_pretrained(model_name)
model = VisionEncoderDecoderModel.from_pretrained(model_name)

# 3. TOKENIZE THE INPUT
# note: if you don't include return_tensors='pt' you'll get a list of lists which is easier for exploration but you cannot feed that into a model.

# Move model to GPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# let's perform inference on an image
url = "https://media.snopes.com/2017/07/walmart-jajket.jpg"
image = Image.open(requests.get(url, stream=True).raw)
image = image.convert("RGB")

# prepare decoder inputs
task_prompt = ""
decoder_input_ids = processor.tokenizer(task_prompt, add_special_tokens=False, return_tensors="pt").input_ids
pixel_values = processor(image, return_tensors="pt").pixel_values

print('\nTracing model ...')

pipeline_cores = 1
model_traced = neuron_lib.trace(model, pixel_values, compiler_workdir=f'{chip_type}-compiler-workdir')
print(' tracing completed.')

model_traced.save('./compiled-model-bs-'+str(batch_size)+'.pt')
print('\n Model Traced and Saved')
```
And the exception I've been getting is shown below:

Tracing model ...
Traceback (most recent call last):
  File "/trace-model/trace-model.py", line 47, in <module>
    model_traced = neuron_lib.trace(model, tuple(pixel_values), tuple(decoder_input_ids), compiler_workdir=f'{chip_type}-compiler-workdir')
  File "/opt/conda/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 289, in trace
    neff_filename, metaneff, flattener, packer = _trace(
  File "/opt/conda/lib/python3.10/site-packages/torch_neuronx/xla_impl/trace.py", line 326, in _trace
    hlo, input_parameter_names, constant_parameter_tensors, flattener, packer = xla_trace(
  File "/opt/conda/lib/python3.10/site-packages/torch_neuronx/xla_impl/hlo_conversion.py", line 94, in xla_trace
    outputs = func(*example_inputs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/vision_encoder_decoder/modeling_vision_encoder_decoder.py", line 581, in forward
    encoder_outputs = self.encoder(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py", line 934, in forward
    embedding_output, input_dimensions = self.embeddings(pixel_values, bool_masked_pos=bool_masked_pos)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py", line 177, in forward
    embeddings, output_dimensions = self.patch_embeddings(pixel_values)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/donut/modeling_donut_swin.py", line 228, in forward
    _, num_channels, height, width = pixel_values.shape
ValueError: not enough values to unpack (expected 4, got 3)

Please let me know what I might be missing here. Thanks.
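
One possible reading of the traceback: the failing call wraps the batch in `tuple(pixel_values)`, and iterating a batched tensor walks its first (batch) dimension, so each element has one dimension fewer. That would turn a 4-D `(batch, channels, height, width)` input into a 3-D one, matching "expected 4, got 3". The sketch below reproduces the effect with nested lists as a stand-in for tensors so that it runs without PyTorch or Neuron hardware; the `shape` helper and the small dimensions are illustrative assumptions.

```python
def shape(x):
    """Return the shape of a rectangular nested list, similar to tensor.shape."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# Stand-in for pixel_values with shape (batch=1, channels=3, height=4, width=5).
pixel_values = [[[[0.0] * 5 for _ in range(4)] for _ in range(3)] for _ in range(1)]

# tuple(batch) iterates over dimension 0, so each element loses the batch dimension.
split_inputs = tuple(pixel_values)

# Grouping whole inputs in a tuple keeps each input intact.
grouped_inputs = (pixel_values,)

print(shape(pixel_values))       # (1, 3, 4, 5)
print(shape(split_inputs[0]))    # (3, 4, 5)
print(shape(grouped_inputs[0]))  # (1, 3, 4, 5)
```

If the same thing is happening in the script, passing the example inputs as one tuple, for example `(pixel_values, decoder_input_ids)`, instead of `tuple(pixel_values), tuple(decoder_input_ids)` may keep the 4-D shape. This is an untested suggestion, not a confirmed fix.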
