huggingface / exporters

Export Hugging Face models to Core ML and TensorFlow Lite

License: Apache License 2.0

Python 100.00%
coreml deep-learning machine-learning model-converter pytorch tensorflow tflite transformer coremltools

exporters' Issues

Problem: `Got max absolute difference of: nan`

I was trying to convert Voicelab/vlt5-base-keywords to a Core ML model. Everything went well until the end, when I got this error during validation:

Validating Core ML model...
        -[✓] Core ML model output names match reference model ({'last_hidden_state'})
        - Validating Core ML model output "last_hidden_state":
                -[✓] (1, 128, 768) matches (1, 128, 768)
                -[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 146, in main
    convert_model(
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 70, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/validate.py", line 220, in validate_model_outputs
    raise ValueError(
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: nan
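One way to narrow this down (a minimal, untested sketch; the package path and the tensor names "input_ids", "attention_mask" and "last_hidden_state" are assumptions, so check mlmodel.get_spec() for the actual ones) is to run the exported package directly and see whether the Core ML output itself contains NaNs:

import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Voicelab/vlt5-base-keywords")
# Assumed output path; for seq2seq exports it may be encoder_Model.mlpackage instead.
mlmodel = ct.models.MLModel("exported/Model.mlpackage")

# Build fixed-length int32 inputs matching the validated (1, 128, ...) shape.
encoded = tokenizer(
    "this is a test sentence",
    padding="max_length", max_length=128, truncation=True, return_tensors="np",
)
inputs = {
    "input_ids": encoded["input_ids"].astype(np.int32),
    "attention_mask": encoded["attention_mask"].astype(np.int32),
}
outputs = mlmodel.predict(inputs)
print("NaNs in Core ML output:", np.isnan(outputs["last_hidden_state"]).any())

If the Core ML output is clean, the NaNs are coming from the reference PyTorch pass inside the validator instead.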

Support for `Voicelab/vlt5-base-keywords`

I ran into a problem when trying to convert a vlt5 model to a Core ML model:

KeyError: "voicelab/vlt5-base-keywords is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'falcon', 'gpt2', 'gpt_bigcode', 'gptj', 'gpt_neo', 'gpt_neox', 'levit', 'llama', 'm2m_100', 'marian', 'mistral', 'mobilebert', 'mobilevit', 'mobilevitv2', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos'] are supported. If you want to support voicelab/vlt5-base-keywords please propose a PR or open up an issue."
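If the checkpoint is a standard T5 architecture under the hood, a possible workaround (an untested sketch mirroring the Python API pattern from the README; the task name is an assumption) is to bypass the model-type lookup and use the T5 config class directly:

from exporters.coreml import export
from exporters.coreml.models import T5CoreMLConfig
from transformers import AutoTokenizer, T5ForConditionalGeneration

model_ckpt = "Voicelab/vlt5-base-keywords"
base_model = T5ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = AutoTokenizer.from_pretrained(model_ckpt)

# Reuse the existing T5 Core ML config instead of the (unsupported) model-type lookup.
coreml_config = T5CoreMLConfig(base_model.config, task="text2text-generation")
mlmodel = export(preprocessor, base_model, coreml_config)
mlmodel.save("exported/vlt5-base-keywords.mlpackage")

For a seq2seq model the encoder and decoder may still need to be exported as two separate packages, as the CLI does.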

Error when exporting a Sentence Transformers model to Core ML

Description

Hi, I encounter the following error when exporting sentence-transformers/all-MiniLM-L6-v2 (a PyTorch model) to a Core ML model.

python -m exporters.coreml --model=sentence-transformers/all-MiniLM-L6-v2 exported/

Using framework PyTorch: 1.12.1
Overriding 1 configuration item(s)
	- use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:  0%|                         | 0/342 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 99%|█████████████████████████████████████▊| 340/342 [00:00<00:00, 2753.73 ops/s]
Running MIL Common passes:  0%|                               | 0/40 [00:00<?, ? passes/s]/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '546', of the source model, has been renamed to 'var_546' in the Core ML model.
 warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|████████████████████████████████████████████████████| 40/40 [00:00<00:00, 233.90 passes/s]
Running MIL Clean up passes: 100%|██████████████████████████████████████████████████| 11/11 [00:00<00:00, 132.81 passes/s]
/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
 _warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
 File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
  return _run_code(code, main_globals, None,
 File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/runpy.py", line 86, in _run_code
  exec(code, run_globals)
 File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 166, in <module>
  main()
 File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 154, in main
  convert_model(
 File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 65, in convert_model
  validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
 File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
  coreml_outputs = mlmodel.predict(coreml_inputs)
 File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py", line 553, in predict
  raise self._framework_error
 File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py", line 144, in _get_proxy_and_spec
  return _MLModelProxy(filename, compute_units.name), specification, None
RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".

The problem is similar to the one mentioned in #9. I also tried the workaround to fix it; however, I get the following error when I try to run a prediction. Note that "Model.mlpackage" was obtained with the command above.

import torch
import transformers
import coremltools as ct
import numpy as np
from exporters.coreml.models import BertCoreMLConfig
from transformers import AutoConfig

model_name = 'sentence-transformers/all-MiniLM-L6-v2'
config = AutoConfig.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_fast=True)

mlmodel = ct.models.MLModel("Model.mlpackage")

# Workaround from #9: clear the declared shape of the second output so the
# Core ML compiler no longer rejects the 'pooler_output' shape mismatch.
del mlmodel._spec.description.output[1].type.multiArrayType.shape[:]
mlmodel = ct.models.MLModel(mlmodel._spec, weights_dir=mlmodel.weights_dir)
mlmodel.save("ModelFixed.mlpackage")


sentences = ['This is an example sentence']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = mlmodel.predict(cml_inputs)
print(pred_coreml)

This is the error I get:
KeyError: 'Provided key "token_type_ids", in the input dict, does not match any of the model input name(s), which are: input_ids,attention_mask'
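Since the conversion log prints "Skipping token_type_ids input", the exported package only declares input_ids and attention_mask. A small follow-on sketch (continuing from the snippet above) is to filter the tokenizer output down to the inputs the Core ML model actually declares before calling predict:

# Only pass the inputs the Core ML model actually declares; the tokenizer also
# returns token_type_ids, which the exporter dropped during conversion.
spec = mlmodel.get_spec()
model_input_names = {inp.name for inp in spec.description.input}
cml_inputs = {
    k: v.to(torch.int32).numpy()
    for k, v in encoded_input.items()
    if k in model_input_names
}
pred_coreml = mlmodel.predict(cml_inputs)
print(pred_coreml)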

Detr-Resnet-50 Model Conversion to CoreML

I noticed that facebook/detr-resnet-50 cannot be converted to Core ML format when using the command line prompt "python -m exporters.coreml --model=path_to_checkpoint path_to_converted_model".

In MODELS.md, the entry for the Detr model states: "The conversion completes without errors but the Core ML compiler cannot load the model", with the error "Invalid operation output name: got 'tensor' when expecting token of type 'ID'".

Are you planning to release a complete export for Detr models soon? Could you please keep me posted?

Support for .safetensors files?

When trying to convert a model stored with safetensors weights, exporters fails with "[MODEL] does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack". Adding support would help out a lot, especially as safetensors seems to be pushed as the new standard for storing weights.
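Until native support lands, a possible workaround (a sketch, assuming the checkpoint loads with the standard Auto classes; the model id and output directory below are placeholders) is to re-save the weights with safe_serialization disabled so that a pytorch_model.bin is produced, then point the exporter at the local copy:

# Materialize a local copy that contains pytorch_model.bin so the exporter finds it.
from transformers import AutoModel, AutoTokenizer

model_id = "org/model-with-safetensors"  # placeholder for the checkpoint in question
model = AutoModel.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

model.save_pretrained("local_copy", safe_serialization=False)
tokenizer.save_pretrained("local_copy")
# then: python -m exporters.coreml --model=local_copy exported/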

M2M100 Example?

Hello,
I'm trying to convert M2M100 to CoreML. I saw that it is partially supported, and I was wondering if there's any example script to do this.
Here's what I tried:

from exporters.coreml import export
from exporters.coreml.models import M2M100CoreMLConfig
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_ckpt = "facebook/m2m100_418M"
base_model = M2M100ForConditionalGeneration.from_pretrained(
    model_ckpt, torchscript=True
)
preprocessor = M2M100Tokenizer.from_pretrained(model_ckpt)
coreml_config = M2M100CoreMLConfig(
    base_model.config,
    task="text2text-generation",
    use_past=False,
)
mlmodel = export(
    preprocessor, base_model, coreml_config
)

However, when trying to run this code, I get the following error:

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

Thank you in advance!
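For reference, the CLI converts seq2seq models as two separate packages (encoder and decoder), which is probably why calling export() on the full model complains about decoder_input_ids. A possible adaptation of the snippet above (an untested sketch; it assumes the config classes accept the same seq2seq argument the CLI code path uses) is:

from exporters.coreml import export
from exporters.coreml.models import M2M100CoreMLConfig
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_ckpt = "facebook/m2m100_418M"
base_model = M2M100ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = M2M100Tokenizer.from_pretrained(model_ckpt)

# Export the encoder and decoder as separate Core ML packages, mirroring what
# `python -m exporters.coreml --feature=text2text-generation` does for T5.
for part in ("encoder", "decoder"):
    coreml_config = M2M100CoreMLConfig(
        base_model.config,
        task="text2text-generation",
        use_past=False,
        seq2seq=part,  # assumption: keyword used by the CLI path for seq2seq models
    )
    mlmodel = export(preprocessor, base_model, coreml_config)
    mlmodel.save(f"exported/{part}_m2m100.mlpackage")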

Requesting Support for Salesforce/blip2-opt-2.7b

Hello,

I am very new to Hugging Face and machine learning in general. I understand that the Blip model is not supported for conversion to Core ML. Can this be added to this repo? If not, is there a way I can write my own conversion code?

Thanks


Conversion Settings:

    Model: Salesforce/blip2-opt-2.7b
    Task: None
    Framework: None
    Compute Units: None
    Precision: None
    Tolerance: None
    Push to: None

    Error: "blip is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos'] are supported. If you want to support blip please propose a PR or open up an issue."

Export & use T5-Base model for summarization

Hey guys,

I'm pretty new to Core ML conversion and took the naive approach of converting a T5-Base model to Core ML (I want to use it to generate summaries). As laid out in the README, I created an encoder and a decoder model, which worked without a problem:

(base) me@me-MacBook-Pro ~/Development/projects/exporters$ python -m exporters.coreml --model=t5-small --feature=text2text-generation exported                                                      ✭main 
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.0.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Converting encoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 755/756 [00:00<00:00, 2482.08 ops/s]
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 73.01 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 27.71 passes/s]
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/encoder_Model.mlpackage
Converting decoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/transformers/modeling_utils.py:828: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_mask.shape[1] < attention_mask.shape[1]:
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1260/1262 [00:00<00:00, 2404.55 ops/s]
Running MIL Common passes:   5%|████████▊                                                                                                                                                                  | 2/39 [00:00<00:02, 15.47 passes/s]/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1761', of the source model, has been renamed to 'var_1761' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 36.73 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 14.41 passes/s]
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'logits'})
	- Validating Core ML model output "logits":
		-[✓] (1, 64, 32100) matches (1, 64, 32100)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/decoder_Model.mlpackage

This is where the fun begins :) I've only ever worked with the t5 model through transformers & pipelines. Like this:

from torchvision import models
from torchsummary import summary

from transformers import T5TokenizerFast, T5ForConditionalGeneration, pipeline

text = "summarize: The quick brown fox jumps over the lazy dog"
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base", return_dict=True)
model.to('cuda')

tokens = tokenizer(text, return_tensors="pt")
input_ids = tokens.input_ids

outputs = model.generate(input_ids.cuda(), max_length=40)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

As far as I understand, when using the model.generate method the transformers utilities do all the heavy lifting: creating the attention masks, running the encoder, passing the encoder_hidden_states along, and so on.
Am I right to assume that I would have to implement all this functionality by hand if I want to work with the Core ML encoder / decoder models?

I'm not only worried about using them in Python, but I'd also like to use them in Swift. But I guess there's no easy plug-and-play solution here, right? :)
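Essentially yes: with the raw Core ML packages you drive generation yourself. Below is a minimal greedy-decoding sketch (untested; the input/output names and the fixed sequence lengths of 128 and 64 are assumptions taken from the validation log above, so check the two packages' specs for the real names). A Swift version would follow the same loop using the generated model classes.

import numpy as np
import coremltools as ct
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
encoder = ct.models.MLModel("exported/encoder_Model.mlpackage")
decoder = ct.models.MLModel("exported/decoder_Model.mlpackage")

text = "summarize: The quick brown fox jumps over the lazy dog"
enc = tokenizer(text, padding="max_length", max_length=128, truncation=True, return_tensors="np")
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

# Run the encoder once; its hidden states are reused at every decoding step.
encoder_out = encoder.predict({"input_ids": input_ids, "attention_mask": attention_mask})
encoder_hidden = encoder_out["last_hidden_state"]

# Greedy loop: feed the tokens generated so far, padded to the decoder's fixed
# length (64 in the validation log), and take the argmax at the last real position.
decoder_ids = [tokenizer.pad_token_id]  # T5 uses <pad> as the decoder start token
for _ in range(40):
    ids = np.full((1, 64), tokenizer.pad_token_id, dtype=np.int32)
    mask = np.zeros((1, 64), dtype=np.int32)
    ids[0, : len(decoder_ids)] = decoder_ids
    mask[0, : len(decoder_ids)] = 1
    out = decoder.predict({
        # Input names below are assumptions; print decoder.get_spec() to confirm them.
        "decoder_input_ids": ids,
        "decoder_attention_mask": mask,
        "encoder_last_hidden_state": encoder_hidden,
        "encoder_attention_mask": attention_mask,
    })
    next_token = int(out["logits"][0, len(decoder_ids) - 1].argmax())
    decoder_ids.append(next_token)
    if next_token == tokenizer.eos_token_id:
        break

print(tokenizer.decode(decoder_ids, skip_special_tokens=True))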

Exporter being killed

Similar to #61, my exporter process is being killed. I'd like to verify that this is a resource constraint and not an issue in the project. I am running python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 mistral.mlpackage on an M3 MacBook Pro with 18GB of memory.

model-00001-of-00002.safetensors: 100%|████| 9.94G/9.94G [07:47<00:00, 21.3MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.54G/4.54G [04:42<00:00, 16.1MB/s]
Downloading shards: 100%|████████████████████████| 2/2 [12:31<00:00, 375.71s/it]████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.54G/4.54G [04:42<00:00, 16.7MB/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:25<00:00, 12.58s/it]
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
	- use_cache -> False
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:161: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Patching PyTorch conversion 'log' with <function MistralCoreMLConfig.patch_pytorch_ops.<locals>.log at 0x13a115300>
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__contains__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__getitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__delitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__setitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                             | 0/4506 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4505/4506 [00:01<00:00, 3255.50 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 13.02 passes/s]
Running MIL default pipeline:  14%|████████████████████                                                                                                                          | 10/71 [00:00<00:03, 15.93 passes/s]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '5409', of the source model, has been renamed to 'var_5409' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  73%|████████████████████████████████████████████████████████████████████████████████████████████████████████                                      | 52/71 [03:36<02:09,  6.79s/ passes]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
  return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [07:27<00:00,  6.30s/ passes]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 168.96 passes/s]
zsh: killed     python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 
willwalker misty > /opt/homebrew/Cellar/python@3.11/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Support OneFormer Model

Would it be possible to support the OneFormer model? I am not experienced with ML, but would love to use that model on mobile devices if possible.

Thank you so much!

Converting llama-2-7b failed

It runs well until:
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
Then I checked Activity Monitor and saw that the Python process had stopped running.

How to fix this?

(LLM_env) tim@TPE exporters % python -m exporters.coreml --model=/Users/tim/GitLab/survey/LLM/llama-meta/Llama-2-7b-hf exported/ 
Torch version 2.0.1 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:02<00:00, 31.31s/it]
Using framework PyTorch: 2.0.1
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:375: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                                     | 0/3627 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 3626/3627 [00:01<00:00, 3155.13 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 18.10 passes/s]
Running MIL default pipeline:  15%|██████████████████████▋                                                                                                                               | 10/66 [00:01<00:05, 10.96 passes/s]/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '4530', of the source model, has been renamed to 'var_4530' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                  | 51/66 [03:23<01:40,  6.70s/ passes]/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
  return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66/66 [09:44<00:00,  8.86s/ passes]
Running MIL backend_mlprogram pipeline: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 65.90 passes/s]
zsh: killed     python -m exporters.coreml  exported/
(LLM_env) tim@TPE exporters % /Users/tim/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Export of Llama2 fails

I'm unable to use exporters for the meta-llama/Llama-2-7b-chat-hf model.

Here is my command:

python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage

And here is the output:

 % python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage
Torch version 2.3.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.44s/it]
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/modeling_utils.py:4371: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
  warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:1094: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if sequence_length != 1:
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                             | 0/3690 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!


ERROR - converting 'full' op (located at: 'model'):

Converting PyTorch Frontend ==> MIL Ops:   1%|▉                                                                                                                                 | 28/3690 [00:00<00:00, 5249.21 ops/s]
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
              ^^^^^^^
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 660, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 553, in export_pytorch
    mlmodel = ct.convert(
              ^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 581, in convert
    mlmodel = mil_convert(
              ^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 288, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 82, in load
    return _perform_torch_convert(converter, debug)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 116, in _perform_torch_convert
    prog = converter.convert()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 581, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 86, in convert_nodes
    raise e     # re-raise exception
    ^^^^^^^
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 81, in convert_nodes
    convert_single_node(context, node)
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 134, in convert_single_node
    add_op(context, node)
  File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4211, in full
    else NUM_TO_NUMPY_DTYPE[TORCH_DTYPE_TO_NUM[inputs[2].val]]
                            ~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 6

I was able to generate an mlpackage for distilbert-base-uncased-finetuned-sst-2-english with this command: python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification models/defaults.mlpackage, so I have some confidence that the environment is correct and working.

Exporter failing due to output shape

When trying to export the Hugging Face models Deeppavlov/rubert-base-cased and ckiplab/bert-base-chinese-ner using the command line, it fails with the following output:

Some weights of the model checkpoint at Deeppavlov/rubert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.12.1
Overriding 1 configuration item(s)
        - use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                           | 0/630 [00:00<?, ? ops/s]CoreML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████▋| 628/630 [00:00<00:00, 4660.63 ops/s]
Running MIL Common passes:   0%|                                                                                                                       | 0/39 [00:00<?, ? passes/s]/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1020', of the source model, has been renamed to 'var_1020' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 47.87 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 31.91 passes/s]
/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py:145: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error:  Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
  _warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 166, in <module>
    main()
  File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 154, in main
    convert_model(
  File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 65, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/starlight/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
    coreml_outputs = mlmodel.predict(coreml_inputs)
  File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 545, in predict
    raise self._framework_error
  File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 143, in _get_proxy_and_spec
    return (_MLModelProxy(filename, compute_units.name), specification, None)
RuntimeError: Error compiling model: "compiler error:  Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
Exception ignored in: <function MLModel.__del__ at 0x11ebe1ee0>
Traceback (most recent call last):
  File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 369, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down

It runs correctly with --model=distilbert-base-uncased.
Using:
Python 3.9.13
coremltools 6.1
torch 1.12.1

A .mlpackage file is created, but it's not usable since I can't call predict() on it.
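As a stopgap, the spec-patching workaround from the sentence-transformers issue above may apply here too (a sketch; the offending output is located by name rather than by index, and the package path is whatever the exporter produced):

import coremltools as ct

mlmodel = ct.models.MLModel("exported/Model.mlpackage")
spec = mlmodel._spec

# Clear the declared shape of the offending output so the compiler no longer
# rejects the mismatch between 'pooler_output' and the graph's return value.
for output in spec.description.output:
    if output.name == "pooler_output":
        del output.type.multiArrayType.shape[:]

fixed = ct.models.MLModel(spec, weights_dir=mlmodel.weights_dir)
fixed.save("exported/ModelFixed.mlpackage")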

Error when transforming Roberta models

Hello, I'm currently encountering an issue while converting RoBERTa-based models.

These are text classification models that evaluate emotions or whether content is hateful, so they are supposed to be very simple text classification models.

I tried to use the exporter with the following models:

  • roberta-large-mnli
  • facebook/roberta-hate-speech-dynabench-r4-target
  • SamLowe/roberta-base-go_emotions

First I used the web tool to convert the model directly, and I could only select the "text-generation" option. When trying directly from the Python tool, the following error is returned:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/__main__.py", line 141, in main
    model_kind, model_coreml_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature)
  File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/features.py", line 498, in check_supported_model_or_raise
    raise ValueError(
ValueError: roberta doesn't support feature text-classification. Supported values are: {'text-generation': functools.partial(<bound method CoreMLConfig.from_model_config of <class 'exporters.coreml.models.RobertaCoreMLConfig'>>, task='text-generation'), 'text-generation-with-past': functools.partial(<bound method CoreMLConfig.with_past of <class 'exporters.coreml.models.RobertaCoreMLConfig'>>, task='text-generation')}

If I understand correctly, all these models are supposed to be trained and directly available to use. Am I missing a step or a configuration to make them work?

Thank you.
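For what it's worth, the checkpoints themselves are fine; the error comes from the exporter side, where the RoBERTa Core ML config currently only registers text-generation features. A quick way to see what is registered for a model type (a sketch, assuming exporters mirrors the transformers.onnx FeaturesManager API) is:

# List which features/tasks the exporter has registered for "roberta".
from exporters.coreml.features import FeaturesManager

print(FeaturesManager.get_supported_features_for_model_type("roberta"))
# Per the error above this only contains text-generation variants; supporting
# text classification would need a new entry in exporters.coreml.models.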

Exporting to Core ML format throws errors

I am doing this:

python -m exporters.coreml --model=bert-base-uncased exported/

and running into this error:

RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".

Did the underlying BERT implementation's API change?

I hit similar errors with some of the other models mentioned in the README (ready-made configurations).

Is it possible to add support for Kosmos-2

Hello Hugging Face and its wonderful employees!
I was just checking whether it is possible for me to convert the "https://huggingface.co/microsoft/kosmos-2-patch14-224" model to Core ML so that I can use it on my Mac.

It's an image-to-text (image captioning) model.

I have tried it, but it says this model is not supported. Is there any way I or we could add support for this?

Thanks!!!!

GPTBigCode Support?

Out of sheer curiosity I tried to export bigcode/starcoder to CoreML and got the following error after downloading the weights:
"gpt_bigcode is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos']

I understand GPTBigCode is an optimized GPT2 Model with support for Multi-Query Attention.
https://huggingface.co/docs/transformers/model_doc/gpt_bigcode

Python isn't my strong suit but I just wanted to flag this here. Would running Starcoder on CoreML even be feasible or is it too large?

Converting EleutherAI/Pythia Models

I was wondering if it's possible to support the conversion of the Pythia models to Core ML. Naively, I ran python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/ which gave me this error:

Original output
python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   4%|█████▏                                                                                                                                  | 86/2272 [00:00<00:01, 2038.49 ops/s]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 687, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 552, in export_pytorch
    mlmodel = ct.convert(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py", line 530, in convert
    mlmodel = mil_convert(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 439, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4502, in gather
    res = mb.gather_along_axis(x=inputs[0], indices=inputs[2], axis=inputs[1], name=node.name)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py", line 182, in _add_op
    new_op.type_value_inference()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py", line 253, in type_value_inference
    output_types = self.type_inference()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/scatter_gather.py", line 312, in type_inference
    assert self.x.shape[i] == self.indices.shape[i]
AssertionError

I tried bypassing this error by commenting the assertion out, which sometimes results in what looks like a memory leak (my memory usage goes up to 60 GB). I was able to export the model once, but it fails the performance report in Xcode. When commenting out the line I get this output:

Output with the check bypassed
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                                                                                                                                                                                         | 0/2272 [00:00<?, ? ops/s](is13, 1, 2048, 64) (is11, 1, is12, 64)
(is14, 1, 2048, 64) (is11, 1, is12, 64)
(is53, 1, 2048, 64) (is51, 1, is52, 64)
(is54, 1, 2048, 64) (is51, 1, is52, 64)
Converting PyTorch Frontend ==> MIL Ops:  11%|███████████████████████████████▎                                                                                                                                                                                                                                                             | 250/2272 [00:00<00:00, 2499.35 ops/s](is107, 1, 2048, 64) (is105, 1, is106, 64)
(is108, 1, 2048, 64) (is105, 1, is106, 64)
(is161, 1, 2048, 64) (is159, 1, is160, 64)
(is162, 1, 2048, 64) (is159, 1, is160, 64)
Converting PyTorch Frontend ==> MIL Ops:  23%|████████████████████████████████████████████████████████████████▎                                                                                                                                                                                                                            | 513/2272 [00:00<00:00, 2575.44 ops/s](is215, 1, 2048, 64) (is213, 1, is214, 64)
(is216, 1, 2048, 64) (is213, 1, is214, 64)
Converting PyTorch Frontend ==> MIL Ops:  34%|████████████████████████████████████████████████████████████████████████████████████████████████▋                                                                                                                                                                                            | 771/2272 [00:00<00:00, 2514.44 ops/s](is269, 1, 2048, 64) (is267, 1, is268, 64)
(is270, 1, 2048, 64) (is267, 1, is268, 64)
(is323, 1, 2048, 64) (is321, 1, is322, 64)
(is324, 1, 2048, 64) (is321, 1, is322, 64)
Converting PyTorch Frontend ==> MIL Ops:  45%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉                                                                                                                                                            | 1023/2272 [00:00<00:00, 2458.22 ops/s](is377, 1, 2048, 64) (is375, 1, is376, 64)
(is378, 1, 2048, 64) (is375, 1, is376, 64)
(is431, 1, 2048, 64) (is429, 1, is430, 64)
(is432, 1, 2048, 64) (is429, 1, is430, 64)
Converting PyTorch Frontend ==> MIL Ops:  56%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                                                                                                            | 1274/2272 [00:00<00:00, 2413.73 ops/s](is485, 1, 2048, 64) (is483, 1, is484, 64)
(is486, 1, 2048, 64) (is483, 1, is484, 64)
(is539, 1, 2048, 64) (is537, 1, is538, 64)
(is540, 1, 2048, 64) (is537, 1, is538, 64)
Converting PyTorch Frontend ==> MIL Ops:  67%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                                                              | 1516/2272 [00:00<00:00, 2176.52 ops/s](is593, 1, 2048, 64) (is591, 1, is592, 64)
(is594, 1, 2048, 64) (is591, 1, is592, 64)
Converting PyTorch Frontend ==> MIL Ops:  76%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                                                                  | 1738/2272 [00:00<00:00, 2144.58 ops/s](is647, 1, 2048, 64) (is645, 1, is646, 64)
(is648, 1, 2048, 64) (is645, 1, is646, 64)
(is701, 1, 2048, 64) (is699, 1, is700, 64)
(is702, 1, 2048, 64) (is699, 1, is700, 64)
Converting PyTorch Frontend ==> MIL Ops:  87%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▏                                     | 1969/2272 [00:00<00:00, 2149.72 ops/s](is755, 1, 2048, 64) (is753, 1, is754, 64)
(is756, 1, 2048, 64) (is753, 1, is754, 64)
(is809, 1, 2048, 64) (is807, 1, is808, 64)
(is810, 1, 2048, 64) (is807, 1, is808, 64)
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 2271/2272 [00:01<00:00, 2253.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 36.95 passes/s]
Running MIL default pipeline:  14%|██████████████████████████████████████████▋                                                                                                                                                                                                                                                                | 9/63 [00:00<00:03, 17.14 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '2680', of the source model, has been renamed to 'var_2680' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  38%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▌                                                                                                                                                                                        | 24/63 [00:01<00:01, 28.21 passes/s](1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
Running MIL default pipeline:  59%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                                                                                                           | 37/63 [00:01<00:00, 28.56 passes/s](1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
Running MIL default pipeline:  92%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▎                       | 58/63 [00:03<00:00, 12.22 passes/s](1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 63/63 [00:04<00:00, 14.28 passes/s]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 190.00 passes/s]

Any ideas?

`huggingface-cli env`
Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.15.1
- Platform: macOS-13.4-arm64-arm-64bit
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /Users/kendreaditya/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.2
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /Users/kendreaditya/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /Users/kendreaditya/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/kendreaditya/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
`pip freeze`
appnope==0.1.3
asttokens==2.2.1
attrs==23.1.0
backcall==0.2.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.1.0
comm==0.1.3
coremltools==7.0b1
debugpy==1.6.7
decorator==5.1.1
einops==0.6.1
exceptiongroup==1.1.1
executing==1.2.0
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
fancycompleter==0.9.1
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
ipykernel==6.23.2
ipython==8.14.0
jedi==0.18.2
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.1
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mpmath==1.3.0
nest-asyncio==1.5.6
networkx==3.1
numpy==1.24.2
packaging==23.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.6.0
prompt-toolkit==3.0.38
protobuf==3.20.1
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==23.5.9
Pygments==2.15.1
pyrepl==0.9.0
python-dateutil==2.8.2
PyYAML==6.0
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
six==1.16.0
stack-data==0.6.2
sympy==1.12
tokenizers==0.13.3
torch==2.0.0
tornado==6.3.2
tqdm==4.65.0
traitlets==5.9.0
transformers==4.29.2
typing_extensions==4.6.3
urllib3==2.0.3
wcwidth==0.2.6
wmctrl==0.4

Converting a pipeline

What would it take to convert an entire pipeline to a coreml model?

For instance, I have saved the stable-diffusion checkpoint, and several of its component models have their own configs, but of course they're not the ready-made exporter configs.

Screen Shot 2022-09-08 at 10 24 00 PM

Would this be just a long, hard, custom slog via exporters and not worth it? Or is there something here worth pursuing?
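
For reference, here is a minimal sketch of what converting a single pipeline component by hand could look like, using coremltools directly rather than exporters (the checkpoint name, subfolder layout, and the fixed 77-token prompt length are assumptions on my part):

import numpy as np
import torch
import coremltools as ct
from transformers import CLIPTextModel, CLIPTokenizer

repo = "runwayml/stable-diffusion-v1-5"  # example checkpoint, not prescriptive
tokenizer = CLIPTokenizer.from_pretrained(repo, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(
    repo, subfolder="text_encoder", torchscript=True
).eval()

# Trace with a fixed-length prompt (CLIP uses a 77-token context)
example = tokenizer(
    "a photo of an astronaut",
    padding="max_length",
    max_length=tokenizer.model_max_length,
    return_tensors="pt",
)
traced = torch.jit.trace(text_encoder, example["input_ids"])

mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(name="input_ids", shape=example["input_ids"].shape, dtype=np.int32)],
    convert_to="mlprogram",
)
mlmodel.save("TextEncoder.mlpackage")

Each remaining component (UNet, VAE) would need the same treatment, which is exactly the slog mentioned above, so pipeline-level support in exporters would still be valuable.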

`trust_remote_code=True`

I'm using exporters to convert tiiuae/falcon-7b-instruct to a Core ML model:
python -m exporters.coreml --model=tiiuae/falcon-7b-instruct exported/

It fails with an error about trust_remote_code (screenshot below):
(screenshot of the error)
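
As a workaround, here is a rough sketch of running the export from Python so that trust_remote_code=True can be passed when the model is loaded (the FalconCoreMLConfig class below is just a local stand-in following the generic text-model pattern, not the built-in config, and the task name is my assumption):

from transformers import AutoTokenizer, AutoModelForCausalLM
from exporters.coreml import CoreMLConfig, export

model_id = "tiiuae/falcon-7b-instruct"

# trust_remote_code=True is needed because this checkpoint ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
pt_model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torchscript=True
)

# Minimal stand-in config (assumption), mirroring the pattern used for other text models
class FalconCoreMLConfig(CoreMLConfig):
    modality = "text"

coreml_config = FalconCoreMLConfig(pt_model.config, task="text-generation")
mlmodel = export(tokenizer, pt_model, coreml_config)
mlmodel.save("Falcon.mlpackage")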

Error when exporting gpt2

I get this error when trying to convert gpt2

/site-packages/coremltools/converters/mil/mil/input_type.py", line 162, in validate_inputs
    raise ValueError(msg.format(name, var.name, input_type.type_str,
ValueError: Op "137" (op_type: fill) Input shape="136" expects tensor or scalar of dtype from type domain ['int32'] but got tensor[0,fp32]

I first tried:

python -m exporters.coreml --model=gpt2 --framework=pt --feature=causal-lm models/gpt2.mlpackage

Next I tried:

from exporters.coreml import export
from exporters.coreml.models import GPT2CoreMLConfig
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_ckpt = "gpt2"
base_model = GPT2LMHeadModel.from_pretrained(
    model_ckpt, torchscript=True
)
preprocessor = GPT2Tokenizer.from_pretrained(model_ckpt)

coreml_config = GPT2CoreMLConfig(
    base_model.config, 
    task="causal-lm",
)
mlmodel = export(
    preprocessor, base_model, coreml_config
)

mlmodel.save(f"models/{model_ckpt}.mlpackage")

But they both give the same error

I realise this repo is WIP, but I had seen the list here saying GPT2 model is supported: https://github.com/huggingface/exporters/blob/main/MODELS.md

`GPTNeoX` incompatible with transformers >= 4.28.0

As discovered in #42.

The incompatibility was introduced in huggingface/transformers@7dcd870

Concretely, the reason for the problem lies in the use of torch.gather. When converted to Core ML, this assertion fails if shapes are flexible.

(There's a new implementation of gather_along_axis for iOS17 but by looking at the source code I don't think it would fix the problem).

The obvious workaround is to disable flexible shapes for GPTNeoX. This, in fact, is better for performance as flexible shapes don't seem to be compatible with GPU or ANE.
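
A sketch of what disabling flexible shapes could look like when exporting from Python; this assumes the InputDescription objects returned by CoreMLConfig.inputs expose a sequence_length field that can be pinned to a single value:

from transformers import AutoTokenizer, AutoModelForCausalLM
from exporters.coreml import CoreMLConfig, export

model_id = "EleutherAI/pythia-160m"  # example GPT-NeoX checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_id)
pt_model = AutoModelForCausalLM.from_pretrained(model_id, torchscript=True)

class FixedShapeGPTNeoXConfig(CoreMLConfig):
    modality = "text"

    @property
    def inputs(self):
        input_descs = super().inputs
        # Pin the sequence length to one value (assumption: a plain int here means a
        # fixed, non-flexible shape), so torch.gather sees static shapes after conversion
        input_descs["input_ids"].sequence_length = 128
        return input_descs

coreml_config = FixedShapeGPTNeoXConfig(pt_model.config, task="text-generation")
mlmodel = export(tokenizer, pt_model, coreml_config)
mlmodel.save("GPTNeoX.mlpackage")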

Support for OPT Models

Would be great to figure out how to support OPT models. models.md has a note that OPT is not supported yet:

OPT [TODO verify] Conversion error on a slicing operation.

Bloom still has the same note but is now fully supported by exporters. So I'm wondering whether there actually is still an issue with the OPT models, or whether the underlying issue was already resolved. If so, they could be listed as supported. Happy to pitch in if anyone has context on outstanding issues with the OPT models.

Thanks!

Export Phi-2

Hi!

I'm converting the Microsoft's Phi-2 model to use with swift-transformers.

The conversion process is actually very seamless:

from transformers import AutoTokenizer, AutoModelForCausalLM
from exporters.coreml import CoreMLConfig
from exporters.coreml import export

model = "microsoft/phi-2"

# Load tokenizer and PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
pt_model = AutoModelForCausalLM.from_pretrained(model, trust_remote_code=True, torchscript=True)

class Phi2CoreMLConfig(CoreMLConfig):
    modality = "text"


coreml_config = Phi2CoreMLConfig(pt_model.config, task="text-generation")
mlmodel = export(tokenizer, pt_model, coreml_config)
mlmodel.save("Phi2.mlpackage")

Note that by default the export function is using float32.

Then I'm using the swift-chat repo to run the model, with the Llama-2 tokenizer. It works out of the box; the only missing token was the space (' '), but apart from that it works perfectly well.

The issue is that it is super, super slow (I have a MacBook Pro with an M1 and 16 GB of RAM) and it's using close to 11 GB of memory. Although inference is slow, the output makes sense.

Given that it is so slow, I converted the model using float16:

mlmodel = export(tokenizer, pt_model, coreml_config, quantize="float16")

The model is now 5 GB, but inference gives me gibberish (before, the output made sense; now it's just a bunch of exclamation marks). I also loaded the 5 GB model onto my iPhone 14 Pro, and after a few seconds, while it is loading, the app just closes itself.

  1. How can I further decrease the model size? Can we quantize the model even more using CoreML?
  2. Why is the inference speed so slow (with the default float32)?
  3. Why is the model with quantize="float16" basically instantaneous, but outputting gibberish?

Thank you so much for the help!

Error convert pytorch bert-small-uncased for text classification

Hello. I am trying to convert a fine-tuned PyTorch version of the bert-small-uncased model to a Core ML one, but I am getting the following error:

python -m exporters.coreml --model=./small_legal_bert --feature text-classification  exported/ 

Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
        - use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                            | 0/345 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops:  99%|███████████████████████████████████████████████████████████████████████████████▌| 343/345 [00:00<00:00, 4742.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 948.04 passes/s]
Running MIL default pipeline:   0%|                                                                                                     | 0/56 [00:00<?, ? passes/s]/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '555', of the source model, has been renamed to 'var_555' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 56/56 [00:00<00:00, 159.49 passes/s]
Running MIL backend_mlprogram pipeline: 100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1016.90 passes/s]
/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type classify: Classifier probabilities must have a fully known shape.".
  _warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
  File "/Users/dgilim/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/dgilim/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 175, in <module>
    main()
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 163, in main
    convert_model(
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 67, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
    coreml_outputs = mlmodel.predict(coreml_inputs)
  File "/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py", line 554, in predict
    raise self._framework_error
  File "/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py", line 144, in _get_proxy_and_spec
    return _MLModelProxy(filename, compute_units.name), specification, None
RuntimeError: Error compiling model: "Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type classify: Classifier probabilities must have a fully known shape.".

Also attaching config.json from the model:

{
  "_name_or_path": "nlpaueb/legal-bert-small-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_ids": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_labels": 2,
  "num_attention_heads": 8,
  "num_hidden_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}

mlpackage vs mlmodel for Falcon 7B

Hi! Sorry for a noob question, but I've had an experience using BERT in *.mlmodel format, where I just added it to my project, created a *.swift file with its class, and it worked on iOS. Now, when I use exporters, they create *.mlpackage files and I don't understand how to use them.

I want to use Falcon 7B locally and don't understand how to convert it to *.mlmodel and how to use it in my iPhone app.

Support for smaller quantization, 8 or 4 at least

This tool is amazing. I had tried scripting with the Core ML library by hand and ran into all kinds of fun issues; then I tried this and it is all orchestrated/abstracted for you. Excellent 👏

However, I noticed that quantization is only supported down to 16 bits, and I would love to have smaller options. I believe Core ML is capable of this, so it may just be a matter of adding that call to this wrapper.

I did look in convert.py and I see a flag use_legacy_format being checked before the 16-bit quantization is performed. Is there something different about how an ML Program handles or performs lower-bit quantization?
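
For what it's worth, coremltools itself can compress an already-exported mlpackage after the fact; here is a rough sketch using the coremltools 7 optimize APIs (the file path and the 4-bit setting are just examples):

import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("exported/Model.mlpackage")  # path is a placeholder

# 4-bit weight palettization: cluster each weight tensor into 2**4 = 16 values
op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=4)
palettized = cto.palettize_weights(mlmodel, cto.OptimizationConfig(global_config=op_config))
palettized.save("exported/Model-4bit.mlpackage")

# 8-bit linear weight quantization works the same way
lin_config = cto.OpLinearQuantizerConfig(mode="linear_symmetric")
quantized = cto.linear_quantize_weights(mlmodel, cto.OptimizationConfig(global_config=lin_config))
quantized.save("exported/Model-int8.mlpackage")

So even if the exporter wrapper only exposes float16 today, lower-bit compression could be applied as a post-processing step.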

CoreML Convert Error for distilbert-base-uncased-squad2 Question/Answering model - ValueError: node input.19 (gelu) got 2 input(s), expected [1]

I get a gelu ValueError when trying to convert a distilbert-base-uncased-squad2 model. I also get the same error with the full BERT model bert-large-cased-whole-word-masking-finetuned-squad. Is it that the Core ML converter cannot handle 2 inputs, one input for the "question" and another input for the "context"? How can this be fixed?

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained('twmkn9/distilbert-base-uncased-squad2')
model = AutoModelForQuestionAnswering.from_pretrained('twmkn9/distilbert-base-uncased-squad2', torchscript=True)

tokenizer.save_pretrained("local-pt-checkpoint")
model.save_pretrained("local-pt-checkpoint")

Command Line> python -m exporters.coreml --model=twmkn9/distilbert-base-uncased-squad2 --feature=question-answering local-pt-checkpoint/

ValueError: node input.19 (gelu) got 2 input(s), expected [1]

Exporter being killed

Similar to #61, my exporter process is being killed. I'd like to verify this is a resource constraint and not an issue in the project. I am running python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 mistral.mlpackage on an M3 MacBook Pro with 18 GB of memory.

model-00001-of-00002.safetensors: 100%|████| 9.94G/9.94G [07:47<00:00, 21.3MB/s]
model-00002-of-00002.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.54G/4.54G [04:42<00:00, 16.1MB/s]
Downloading shards: 100%|████████████████████████| 2/2 [12:31<00:00, 375.71s/it]████████████████████████████████████████████████████████████████████████████████████████████████▉| 4.54G/4.54G [04:42<00:00, 16.7MB/s]
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:25<00:00, 12.58s/it]
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
	- use_cache -> False
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:161: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Patching PyTorch conversion 'log' with <function MistralCoreMLConfig.patch_pytorch_ops.<locals>.log at 0x13a115300>
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__contains__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__getitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__delitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__setitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                             | 0/4506 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4505/4506 [00:01<00:00, 3255.50 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 13.02 passes/s]
Running MIL default pipeline:  14%|████████████████████                                                                                                                          | 10/71 [00:00<00:03, 15.93 passes/s]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '5409', of the source model, has been renamed to 'var_5409' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  73%|████████████████████████████████████████████████████████████████████████████████████████████████████████                                      | 52/71 [03:36<02:09,  6.79s/ passes]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
  return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [07:27<00:00,  6.30s/ passes]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 168.96 passes/s]
zsh: killed     python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 
willwalker misty > /opt/homebrew/Cellar/[email protected]/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

SegFormer model exported to CoreML is slow

I was trying to export Segformer models to CoreML but the exported model is slow compared to the same model exported on my own.

I tried to export the model using the following command:

python -m exporters.coreml --model=nvidia/mit-b2 --feature=semantic-segmentation exports/

This model's median prediction time is 500 ms on my MacBook Pro M1 using all the available accelerators (ANE, GPU, CPU), compared to 300 ms for the same model exported on my own using coremltools directly.

I did a little profiling with Xcode Instruments to identify the issue. It looks like the model is exported and executed in Float32. This greatly undermines performance, since Float16 data is required for the ANE to be used. Thus, the ANE is not used at all and the model is executed only on the GPU on most devices. Also, Float32 computations are slower than Float16 computations on the GPU, so Float32 should be avoided when possible. In the coremltools documentation Apple suggests using Float16 as a default, and as of version 7.0 Float16 is the default precision for Core ML exports.

With the option --quantize=float16, the inference time is on par with the model I exported myself (around 300 ms). I suggest using the coremltools default Float16 precision instead of Float32 in order to get the most out of the specialized hardware on Apple platforms.

I also noted another issue, though it is not related to the exporters framework. In Float16 and with the ANE, the Instruments trace suggests that half of the prediction time is spent in GPU kernels. That is weird, since only one operator is executed on the GPU in this case: the argmax operation at the end of the model. This slowdown needs further investigation, but it may be due to the large size of the output tensor (1000x512x512). I tried with only 16 output classes and the inference time dropped to 60 ms.

Screenshot 2023-10-01 at 12 31 47

Error for Keras (TF) models

It seems like each of the Keras models needs a config.json file.

For example, python -m exporters.coreml --model=keras-io/transformers-qa exported/ works because this model has a config.json, but python -m exporters.coreml --model=keras-io/image-captioning exported/ fails with the message OSError: keras-io/image-captioning does not appear to have a file named config.json. Checkout 'https://huggingface.co/keras-io/image-captioning/main' for available files.

Is there any workaround, or does each of the Keras models need a config file for the exporter to work?
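
In the meantime, a quick way to check up front whether a repo ships a config.json at all (using huggingface_hub, which the exporter already pulls in):

from huggingface_hub import list_repo_files

for repo_id in ("keras-io/transformers-qa", "keras-io/image-captioning"):
    has_config = "config.json" in list_repo_files(repo_id)
    print(repo_id, "has config.json:", has_config)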

why illegal hardware instruction

➜ exporters git:(main) python3 -m exporters.coreml --model=distilbert-base-uncased exported/

[1] 51183 illegal hardware instruction python3 -m exporters.coreml --model=distilbert-base-uncased exported/

Implement optimizations as in `ane_transformers`

ane_transformers (https://github.com/apple/ml-ane-transformers and https://machinelearning.apple.com/research/neural-engine-transformers) suggests weight-compatible changes to transformer models that allow better mapping of the ops to the ANE, resulting in significant performance improvements.

@hollance do you think these optimizations "belong" in 🤗 Exporters? If yes, how do you envision their implementation: within the CoreMLConfig abstraction or somewhere else?
