huggingface / exporters
Export Hugging Face models to Core ML and TensorFlow Lite
License: Apache License 2.0
I was planning to convert Voicelab/vlt5-base-keywords to a Core ML model. Everything went well, but I got an error at the end:
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'last_hidden_state'})
- Validating Core ML model output "last_hidden_state":
-[✓] (1, 128, 768) matches (1, 128, 768)
-[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 146, in main
convert_model(
File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 70, in convert_model
validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/validate.py", line 220, in validate_model_outputs
raise ValueError(
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: nan
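A minimal way to check whether the converted encoder itself emits NaNs (as the "max absolute difference of: nan" suggests) is to call predict() directly. This is a sketch: the output path and the input/output names are assumptions based on the validation log above.
import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Voicelab/vlt5-base-keywords")
mlmodel = ct.models.MLModel("exported/encoder_Model.mlpackage")  # assumed export path

enc = tokenizer("Keywords: some example text", return_tensors="np",
                padding="max_length", max_length=128)
out = mlmodel.predict({
    "input_ids": enc["input_ids"].astype(np.int32),
    "attention_mask": enc["attention_mask"].astype(np.int32),
})["last_hidden_state"]
print(np.isnan(out).sum(), "NaN values out of", out.size)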
I ran into a problem when trying to convert a vlT5 model to a Core ML model:
KeyError: "voicelab/vlt5-base-keywords is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'falcon', 'gpt2', 'gpt_bigcode', 'gptj', 'gpt_neo', 'gpt_neox', 'levit', 'llama', 'm2m_100', 'marian', 'mistral', 'mobilebert', 'mobilevit', 'mobilevitv2', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos'] are supported. If you want to support voicelab/vlt5-base-keywords please propose a PR or open up an issue."
Hi, I encountered the following error when exporting sentence-transformers/all-MiniLM-L6-v2 (a PyTorch model) to a Core ML model.
python -m exporters.coreml --model=sentence-transformers/all-MiniLM-L6-v2 exported/
Using framework PyTorch: 1.12.1
Overriding 1 configuration item(s)
- use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/342 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 99%|█████████████████████████████████████▊| 340/342 [00:00<00:00, 2753.73 ops/s]
Running MIL Common passes: 0%| | 0/40 [00:00<?, ? passes/s]/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '546', of the source model, has been renamed to 'var_546' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|████████████████████████████████████████████████████| 40/40 [00:00<00:00, 233.90 passes/s]
Running MIL Clean up passes: 100%|██████████████████████████████████████████████████| 11/11 [00:00<00:00, 132.81 passes/s]
/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
_warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 166, in <module>
main()
File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 154, in main
convert_model(
File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 65, in convert_model
validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
coreml_outputs = mlmodel.predict(coreml_inputs)
File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py", line 553, in predict
raise self._framework_error
File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py", line 144, in _get_proxy_and_spec
return _MLModelProxy(filename, compute_units.name), specification, None
RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
The problem is similar to the one mentioned in #9, so I also tried the workaround from that issue. However, I got the following error when running the prediction. Note that "Model.mlpackage" was produced by the command above.
import torch
import transformers
import coremltools as ct
import numpy as np
from exporters.coreml.models import BertCoreMLConfig
from transformers import AutoConfig
model_name = "sentence-transformers/all-MiniLM-L6-v2"
config = AutoConfig.from_pretrained(model_name)
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_fast=True)
mlmodel = ct.models.MLModel("Model.mlpackage")
del mlmodel._spec.description.output[1].type.multiArrayType.shape[:]
mlmodel = ct.models.MLModel(mlmodel._spec, weights_dir=mlmodel.weights_dir)
mlmodel.save("ModelFixed.mlpackage")
sentences = ["This is an example sentence"]
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = mlmodel.predict(cml_inputs)
print(pred_coreml)
What I got is the following error:
KeyError: 'Provided key "token_type_ids", in the input dict, does not match any of the model input name(s), which are: input_ids,attention_mask'
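A minimal sketch of a fix, continuing the snippet above: the export log said "Skipping token_type_ids input", so the compiled model only declares input_ids and attention_mask. Filtering the tokenizer output to the inputs the model actually declares avoids the KeyError.
# Only feed the inputs the compiled Core ML model declares.
expected = {inp.name for inp in mlmodel.get_spec().description.input}
cml_inputs = {
    k: v.to(torch.int32).numpy()
    for k, v in encoded_input.items()
    if k in expected  # drops token_type_ids
}
pred_coreml = mlmodel.predict(cml_inputs)
print(pred_coreml)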
I noticed that facebook/detr-resnet-50 cannot be converted to Core ML format when using the command line prompt python -m exporters.coreml --model=path_to_checkpoint path_to_converted_model.
In MODELS.md, the entry for the Detr model states that "the conversion completes without errors but the Core ML compiler cannot load the model", failing with: Invalid operation output name: got 'tensor' when expecting token of type 'ID'.
Are you planning to release a complete export for Detr models soon? Could you please keep me posted?
When trying to convert a model with safetensors weights, exporters fails with: [MODEL] does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack. Adding support would help out a lot, especially as safetensors seems to be pushed as the new standard for storing weights.
This would be really helpful for an app we are building.
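Until native support lands, one possible workaround (a sketch; "your-org/your-model" is a hypothetical placeholder for the safetensors-only checkpoint) is to re-serialize the weights to the classic pytorch_model.bin layout and point the exporter at the local directory:
from transformers import AutoModel

# "your-org/your-model" is a hypothetical placeholder checkpoint id.
model = AutoModel.from_pretrained("your-org/your-model")
model.save_pretrained("local-checkpoint", safe_serialization=False)
# Then: python -m exporters.coreml --model=local-checkpoint exported/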
Hello,
I'm trying to convert M2M100 to CoreML. I saw that it is partially supported, and I was wondering if there's any example script to do this.
Here's what I tried:
from exporters.coreml import export
from exporters.coreml.models import M2M100CoreMLConfig
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
model_ckpt = "facebook/m2m100_418M"
base_model = M2M100ForConditionalGeneration.from_pretrained(
model_ckpt, torchscript=True
)
preprocessor = M2M100Tokenizer.from_pretrained(model_ckpt)
coreml_config = M2M100CoreMLConfig(
base_model.config,
task="text2text-generation",
use_past=False,
)
mlmodel = export(
preprocessor, base_model, coreml_config
)
However, when trying to run this code, I get the following error:
ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds
Thank you in advance!
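For what it's worth, the CLI exports text2text models as two separate Core ML models (an encoder and a decoder), which may be why tracing the whole model trips over decoder_input_ids. A minimal sketch, assuming the seq2seq argument the CLI passes to the config internally (see exporters/coreml/__main__.py):
from exporters.coreml import export
from exporters.coreml.models import M2M100CoreMLConfig
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_ckpt = "facebook/m2m100_418M"
base_model = M2M100ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = M2M100Tokenizer.from_pretrained(model_ckpt)

# Export the encoder half...
encoder_config = M2M100CoreMLConfig(
    base_model.config, task="text2text-generation", seq2seq="encoder"
)
export(preprocessor, base_model, encoder_config).save("exported/encoder_Model.mlpackage")

# ...and the decoder half separately.
decoder_config = M2M100CoreMLConfig(
    base_model.config, task="text2text-generation", use_past=False, seq2seq="decoder"
)
export(preprocessor, base_model, decoder_config).save("exported/decoder_Model.mlpackage")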
Hello,
I am very new to HuggingFace and machine learning in general. I understand that the Blip model is not supported for conversion to Core ML. Can this be added to this repo? If not, is there a way I can write my own conversion code?
Thanks
Conversion Settings:
Model: Salesforce/blip2-opt-2.7b
Task: None
Framework: None
Compute Units: None
Precision: None
Tolerance: None
Push to: None
Error: "blip is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos'] are supported. If you want to support blip please propose a PR or open up an issue."
Hey guys,
I'm pretty new to Core ML conversion stuff and took the naive approach of converting a T5-Base model to Core ML (I want to use it to generate summarisations). As laid out in the README I created an encoder and a decoder model, which worked without a problem:
(base) me@me-MacBook-Pro ~/Development/projects/exporters$ python -m exporters.coreml --model=t5-small --feature=text2text-generation exported
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.0.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Converting encoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 755/756 [00:00<00:00, 2482.08 ops/s]
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 73.01 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 27.71 passes/s]
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'last_hidden_state'})
- Validating Core ML model output "last_hidden_state":
-[✓] (1, 128, 768) matches (1, 128, 768)
-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/encoder_Model.mlpackage
Converting decoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/transformers/modeling_utils.py:828: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if causal_mask.shape[1] < attention_mask.shape[1]:
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▊| 1260/1262 [00:00<00:00, 2404.55 ops/s]
Running MIL Common passes: 5%|████████▊ | 2/39 [00:00<00:02, 15.47 passes/s]/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1761', of the source model, has been renamed to 'var_1761' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:01<00:00, 36.73 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 14.41 passes/s]
Validating Core ML model...
-[✓] Core ML model output names match reference model ({'logits'})
- Validating Core ML model output "logits":
-[✓] (1, 64, 32100) matches (1, 64, 32100)
-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/decoder_Model.mlpackage
This is where the fun begins :) I've only ever worked with the t5 model through transformers & pipelines. Like this:
from torchvision import models
from torchsummary import summary
from transformers import T5TokenizerFast, T5ForConditionalGeneration, pipeline

text = "summarize: The quick brown fox jumps over the lazy dog"
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base", return_dict=True)
model.to('cuda')
tokens = tokenizer(text, return_tensors="pt")
input_ids = tokens.input_ids
outputs = model.generate(input_ids.cuda(), max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
As far as I understand, by using the model.generate method the transformers utilities do all the heavy lifting here, like creating the attention masks, running the encoder, passing the encoder_hidden_states along, and so on.
Am I right to assume that I would have to implement all this functionality by hand if I want to work with the CoreML encoder / decoder models?
I'm not only worried about using them in Python; I would also like to use them in Swift. But I guess there's no easy plug'n'play solution here, right? :)
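Essentially yes: with the exported .mlpackage files you own the generation loop yourself (attention masks, the encoder call, feeding encoder states to the decoder, picking the next token). Below is a minimal greedy-decoding sketch in Python. The input/output names and the flexible sequence lengths are assumptions based on the logs above and on how exporters names seq2seq tensors; verify them against mlmodel.get_spec().description before relying on this.
import numpy as np
import coremltools as ct
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
encoder = ct.models.MLModel("exported/encoder_Model.mlpackage")
decoder = ct.models.MLModel("exported/decoder_Model.mlpackage")

text = "summarize: The quick brown fox jumps over the lazy dog"
enc = tokenizer(text, return_tensors="np", padding="max_length", max_length=128)
input_ids = enc["input_ids"].astype(np.int32)
attention_mask = enc["attention_mask"].astype(np.int32)

# Run the encoder once; its hidden states are reused at every decoding step.
encoder_state = encoder.predict({
    "input_ids": input_ids,
    "attention_mask": attention_mask,
})["last_hidden_state"]

decoder_ids = [tokenizer.pad_token_id]  # T5 starts decoding from the pad token
for _ in range(40):
    ids = np.array([decoder_ids], dtype=np.int32)
    logits = decoder.predict({
        "decoder_input_ids": ids,
        "decoder_attention_mask": np.ones_like(ids),
        "encoder_last_hidden_state": encoder_state,
        "encoder_attention_mask": attention_mask,
    })["logits"]
    next_id = int(logits[0, -1].argmax())  # greedy pick of the next token
    decoder_ids.append(next_id)
    if next_id == tokenizer.eos_token_id:
        break

print(tokenizer.decode(decoder_ids[1:], skip_special_tokens=True))
In Swift the loop is conceptually the same: run the encoder MLModel once, then call the decoder MLModel per step with the growing decoder_input_ids.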
Similar to #61, my exporter process is being killed. I'd like to verify this is a resource constraint, and not an issue in the project. I am running python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 mistral.mlpackage on an M3 MacBook Pro with 18GB of memory.
model-00001-of-00002.safetensors: 100%|████| 9.94G/9.94G [07:47<00:00, 21.3MB/s]
model-00002-of-00002.safetensors: 100%|████| 4.54G/4.54G [04:42<00:00, 16.1MB/s]
Downloading shards: 100%|████| 2/2 [12:31<00:00, 375.71s/it]
Loading checkpoint shards: 100%|████| 2/2 [00:25<00:00, 12.58s/it]
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
- use_cache -> False
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:161: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Patching PyTorch conversion 'log' with <function MistralCoreMLConfig.patch_pytorch_ops.<locals>.log at 0x13a115300>
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__contains__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__getitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__delitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__setitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/4506 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 4505/4506 [00:01<00:00, 3255.50 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 13.02 passes/s]
Running MIL default pipeline: 14%|████████████████████ | 10/71 [00:00<00:03, 15.93 passes/s]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '5409', of the source model, has been renamed to 'var_5409' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 73%|████████████████████████████████████████████████████████████████████████████████████████████████████████ | 52/71 [03:36<02:09, 6.79s/ passes]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [07:27<00:00, 6.30s/ passes]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 168.96 passes/s]
zsh: killed python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1
willwalker misty > /opt/homebrew/Cellar/[email protected]/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Would it be possible to support the OneFormer model? I am not experienced with ML, but would love to use that model on mobile devices if possible.
Thank you so much!
It runs well until:
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
Then I saw in Activity Monitor that the Python process had stopped running.
How can I fix this?
(LLM_env) tim@TPE exporters % python -m exporters.coreml --model=/Users/tim/GitLab/survey/LLM/llama-meta/Llama-2-7b-hf exported/
Torch version 2.0.1 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [01:02<00:00, 31.31s/it]
Using framework PyTorch: 2.0.1
Overriding 1 configuration item(s)
- use_cache -> False
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if input_shape[-1] > 1:
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:375: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/3627 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 3626/3627 [00:01<00:00, 3155.13 ops/s]
Running MIL frontend_pytorch pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 18.10 passes/s]
Running MIL default pipeline: 15%|██████████████████████▋ | 10/66 [00:01<00:05, 10.96 passes/s]/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '4530', of the source model, has been renamed to 'var_4530' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 77%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉ | 51/66 [03:23<01:40, 6.70s/ passes]/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 66/66 [09:44<00:00, 8.86s/ passes]
Running MIL backend_mlprogram pipeline: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 65.90 passes/s]
zsh: killed python -m exporters.coreml exported/
(LLM_env) tim@TPE exporters % /Users/tim/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
Recently onnx export for AutoModelForVision2Seq has been added: huggingface/transformers#19254
I would be really interested in converting this topology to Core ML. Is this feature planned for Core ML as well?
Thanks!
Just FYI, the blocker for exporting the Speech2Text model (the glu op) has been added to coremltools in this PR.
I'm unable to use exporters for the meta-llama/Llama-2-7b-chat-hf model.
Here is my command:
python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage
And here is the output:
% python -m exporters.coreml --model=meta-llama/Llama-2-7b-chat-hf models/llama2.mlpackage
Torch version 2.3.0 has not been tested with coremltools. You may run into unexpected errors. Torch 2.2.0 is the most recent version that has been tested.
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:30<00:00, 15.44s/it]
Using framework PyTorch: 2.3.0
Overriding 1 configuration item(s)
- use_cache -> False
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/modeling_utils.py:4371: FutureWarning: `_is_quantized_training_enabled` is going to be deprecated in transformers 4.39.0. Please use `model.hf_quantizer.is_trainable` instead
warnings.warn(
The cos_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
The sin_cached attribute will be removed in 4.39. Bear in mind that its contents changed in v4.38. Use the forward method of RoPE from now on instead. It is not used in the `LlamaAttention` class
/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:1094: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if sequence_length != 1:
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/3690 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
ERROR - converting 'full' op (located at: 'model'):
Converting PyTorch Frontend ==> MIL Ops: 1%|▉ | 28/3690 [00:00<00:00, 5249.21 ops/s]
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 166, in main
convert_model(
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
mlmodel = export(
^^^^^^^
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 660, in export
return export_pytorch(preprocessor, model, config, quantize, compute_units)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/LLAMA2/exporters/src/exporters/coreml/convert.py", line 553, in export_pytorch
mlmodel = ct.convert(
^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 581, in convert
mlmodel = mil_convert(
^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
proto, mil_program = mil_convert_to_proto(
^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 288, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
return load(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 82, in load
return _perform_torch_convert(converter, debug)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 116, in _perform_torch_convert
prog = converter.convert()
^^^^^^^^^^^^^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 581, in convert
convert_nodes(self.context, self.graph)
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 86, in convert_nodes
raise e # re-raise exception
^^^^^^^
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 81, in convert_nodes
convert_single_node(context, node)
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 134, in convert_single_node
add_op(context, node)
File "/Users/user/anaconda3/envs/hf-exporters/lib/python3.11/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4211, in full
else NUM_TO_NUMPY_DTYPE[TORCH_DTYPE_TO_NUM[inputs[2].val]]
~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
KeyError: 6
I was able to generate an mlpackage for distilbert-base-uncased-finetuned-sst-2-english with this command: python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english --feature=sequence-classification models/defaults.mlpackage, so I have some confidence that the environment is correct and working.
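One hedged observation: the KeyError: 6 is raised inside coremltools' handler for the torch full op, and the log itself warns that Torch 2.3.0 is untested ("Torch 2.2.0 is the most recent version that has been tested"). So a first thing worth trying (an assumption, not a confirmed fix) is pinning torch to the last tested version and re-running the same command:
pip install "torch==2.2.0"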
Embedding models are very useful and, in terms of hardware requirements, can easily be run on device. It would be awesome if swift-transformers worked with them.
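In the meantime, once an embedding checkpoint exports cleanly, running it from Python is straightforward. A sketch, assuming the export path and the input/output names seen in the logs earlier in this page (input_ids, attention_mask, last_hidden_state):
import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
mlmodel = ct.models.MLModel("exported/Model.mlpackage")  # assumed export path

enc = tokenizer("This is an example sentence", return_tensors="np",
                padding="max_length", max_length=128)
outputs = mlmodel.predict({
    "input_ids": enc["input_ids"].astype(np.int32),
    "attention_mask": enc["attention_mask"].astype(np.int32),
})

# Mean pooling over real (non-padding) tokens gives the sentence embedding.
mask = enc["attention_mask"][..., None]
embedding = (outputs["last_hidden_state"] * mask).sum(axis=1) / mask.sum(axis=1)
print(embedding.shape)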
When trying to export the Hugging Face models Deeppavlov/rubert-base-cased and ckiplab/bert-base-chinese-ner using the command line, it fails with the following output:
Some weights of the model checkpoint at Deeppavlov/rubert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.12.1
Overriding 1 configuration item(s)
- use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/630 [00:00<?, ? ops/s]CoreML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████▋| 628/630 [00:00<00:00, 4660.63 ops/s]
Running MIL Common passes: 0%| | 0/39 [00:00<?, ? passes/s]/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1020', of the source model, has been renamed to 'var_1020' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 39/39 [00:00<00:00, 47.87 passes/s]
Running MIL Clean up passes: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 31.91 passes/s]
/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py:145: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
_warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 166, in <module>
main()
File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 154, in main
convert_model(
File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 65, in convert_model
validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
File "/Users/starlight/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
coreml_outputs = mlmodel.predict(coreml_inputs)
File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 545, in predict
raise self._framework_error
File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 143, in _get_proxy_and_spec
return (_MLModelProxy(filename, compute_units.name), specification, None)
RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
Exception ignored in: <function MLModel.__del__ at 0x11ebe1ee0>
Traceback (most recent call last):
File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 369, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down
It runs correctly with --model=distilbert-base-uncased.
Using Python 3.9.13, coremltools 6.1, torch 1.12.1.
A .mlpackage file is created, but I can't use a model I can't call predict() on.
Hello, I'm currently encountering an issue while converting RoBERTa-based models.
RoBERTa here is a text classification model that evaluates emotions or whether content is hateful, so it should be a very simple text classification model.
I tried to use the exporter with the following models:
First I used the web tool to convert the model directly, and I could only select the option "text-generation". When trying directly from the Python tool, the following error is returned:
Traceback (most recent call last):
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/__main__.py", line 141, in main
model_kind, model_coreml_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature)
File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/features.py", line 498, in check_supported_model_or_raise
raise ValueError(
ValueError: roberta doesn't support feature text-classification. Supported values are: {'text-generation': functools.partial(<bound method CoreMLConfig.from_model_config of <class 'exporters.coreml.models.RobertaCoreMLConfig'>>, task='text-generation'), 'text-generation-with-past': functools.partial(<bound method CoreMLConfig.with_past of <class 'exporters.coreml.models.RobertaCoreMLConfig'>>, task='text-generation')}
If I understand correctly, all models are supposed to be trained and directly available to use. Am I missing a step or a configuration to make them work?
Thank you.
I am doing this:
python -m exporters.coreml --model=bert-base-uncased exported/
and running into this error:
RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
Did the underlying BERT implementation's API change? I hit similar errors with some of the other models mentioned in the README (ready-made configurations).
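The workaround from issue #9 (also shown earlier on this page) may apply here too: clear the declared shape of the offending output so the Core ML compiler stops rejecting it. A sketch, assuming pooler_output is output index 1 in the generated package — verify with get_spec():
import coremltools as ct

mlmodel = ct.models.MLModel("exported/Model.mlpackage")  # assumed export path
spec = mlmodel.get_spec()
del spec.description.output[1].type.multiArrayType.shape[:]  # assumes index 1 is pooler_output
fixed = ct.models.MLModel(spec, weights_dir=mlmodel.weights_dir)
fixed.save("exported/ModelFixed.mlpackage")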
From this tweet https://twitter.com/pcuenq/status/1664605575882366980?s=20, it seems possible?
Hello Hugging Face and its wonderful employees!!
I was just checking whether it is possible for me to convert the "https://huggingface.co/microsoft/kosmos-2-patch14-224" model to Core ML so that I can use it on my Mac. It's an image-to-text (image captioning) model.
I have tried it now, but it says this model is not supported. Is there any way I or we could add support for this?
Thanks!!!!
Out of sheer curiosity I tried to export bigcode/starcoder to Core ML and got the following error after downloading the weights:
"gpt_bigcode is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos']"
I understand GPTBigCode is an optimized GPT2 Model with support for Multi-Query Attention.
https://huggingface.co/docs/transformers/model_doc/gpt_bigcode
Python isn't my strong suit, but I just wanted to flag this here. Would running StarCoder on Core ML even be feasible, or is it too large?
I was wondering if it's possible to support the conversion of the Pythia models to Core ML. Naively I ran python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/, which gave me this error:
python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 4%|█████▏ | 86/2272 [00:00<00:01, 2038.49 ops/s]
Traceback (most recent call last):
File "/opt/homebrew/Cellar/[email protected]/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/homebrew/Cellar/[email protected]/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
main()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 166, in main
convert_model(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
mlmodel = export(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 687, in export
return export_pytorch(preprocessor, model, config, quantize, compute_units)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 552, in export_pytorch
mlmodel = ct.convert(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py", line 530, in convert
mlmodel = mil_convert(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
proto, mil_program = mil_convert_to_proto(
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
prog = frontend_converter(model, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
return load(*args, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
return _perform_torch_convert(converter, debug)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
prog = converter.convert()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 439, in convert
convert_nodes(self.context, self.graph)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
add_op(context, node)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4502, in gather
res = mb.gather_along_axis(x=inputs[0], indices=inputs[2], axis=inputs[1], name=node.name)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
return cls._add_op(op_cls_to_add, **kwargs)
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py", line 182, in _add_op
new_op.type_value_inference()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py", line 253, in type_value_inference
output_types = self.type_inference()
File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/scatter_gather.py", line 312, in type_inference
assert self.x.shape[i] == self.indices.shape[i]
AssertionError
I tried bypassing this error by commenting out the line, which sometimes results in what I think is a memory leak (my memory usage climbs to 60 GB). I was able to export the model once, but it fails the performance report in Xcode. When commenting out the line I get this output:
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/2272 [00:00<?, ? ops/s]
(is13, 1, 2048, 64) (is11, 1, is12, 64)
(is14, 1, 2048, 64) (is11, 1, is12, 64)
[... the same pair of shape lines is printed for every layer (is53/is54 through is809/is810), interleaved with progress updates ...]
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████| 2271/2272 [00:01<00:00, 2253.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████| 5/5 [00:00<00:00, 36.95 passes/s]
Running MIL default pipeline: 14%|█▍ | 9/63 [00:00<00:03, 17.14 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '2680', of the source model, has been renamed to 'var_2680' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
(1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is863, 64)
[... the same pair of shape lines repeats for is889 through is1706 while the pipeline runs ...]
Running MIL default pipeline: 100%|██████████| 63/63 [00:04<00:00, 14.28 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████| 11/11 [00:00<00:00, 190.00 passes/s]
Any ideas?
Copy-and-paste the text below in your GitHub issue.
- huggingface_hub version: 0.15.1
- Platform: macOS-13.4-arm64-arm-64bit
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /Users/kendreaditya/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.2
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /Users/kendreaditya/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /Users/kendreaditya/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/kendreaditya/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
appnope==0.1.3
asttokens==2.2.1
attrs==23.1.0
backcall==0.2.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.1.0
comm==0.1.3
coremltools==7.0b1
debugpy==1.6.7
decorator==5.1.1
einops==0.6.1
exceptiongroup==1.1.1
executing==1.2.0
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
fancycompleter==0.9.1
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
ipykernel==6.23.2
ipython==8.14.0
jedi==0.18.2
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.1
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mpmath==1.3.0
nest-asyncio==1.5.6
networkx==3.1
numpy==1.24.2
packaging==23.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.6.0
prompt-toolkit==3.0.38
protobuf==3.20.1
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==23.5.9
Pygments==2.15.1
pyrepl==0.9.0
python-dateutil==2.8.2
PyYAML==6.0
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
six==1.16.0
stack-data==0.6.2
sympy==1.12
tokenizers==0.13.3
torch==2.0.0
tornado==6.3.2
tqdm==4.65.0
traitlets==5.9.0
transformers==4.29.2
typing_extensions==4.6.3
urllib3==2.0.3
wcwidth==0.2.6
wmctrl==0.4
What would it take to convert an entire pipeline to a Core ML model?
For instance, I have saved the stable-diffusion checkpoint, and several of the models have their own configs, but of course they're not the ready-made configs.
Would this be just a long, hard, custom slog via exporters and not worth it? Or is there something here worth pursuing?
I get this error when trying to convert gpt2
/site-packages/coremltools/converters/mil/mil/input_type.py", line 162, in validate_inputs
raise ValueError(msg.format(name, var.name, input_type.type_str,
ValueError: Op "137" (op_type: fill) Input shape="136" expects tensor or scalar of dtype from type domain ['int32'] but got tensor[0,fp32]
I first tried:
python -m exporters.coreml --model=gpt2 --framework=pt --feature=causal-lm models/gpt2.mlpackage
Next I tried:
from exporters.coreml import export
from exporters.coreml.models import GPT2CoreMLConfig
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_ckpt = "gpt2"
# torchscript=True is needed so the model can be traced for conversion
base_model = GPT2LMHeadModel.from_pretrained(model_ckpt, torchscript=True)
preprocessor = GPT2Tokenizer.from_pretrained(model_ckpt)
coreml_config = GPT2CoreMLConfig(base_model.config, task="causal-lm")
mlmodel = export(preprocessor, base_model, coreml_config)
mlmodel.save(f"models/{model_ckpt}.mlpackage")
But they both give the same error.
I realise this repo is a WIP, but I had seen the list here saying the GPT2 model is supported: https://github.com/huggingface/exporters/blob/main/MODELS.md
Running any model that I've exported myself returns nonsensical generation, often the same token repeated multiple times.
As discovered in #42.
The incompatibility was introduced in huggingface/transformers@7dcd870
Concretely, the reason for the problem lies in the use of torch.gather. When converted to Core ML, this assertion fails if shapes are flexible. (There's a new implementation of gather_along_axis for iOS17, but looking at the source code I don't think it would fix the problem.)
The obvious workaround is to disable flexible shapes for GPTNeoX. This is, in fact, better for performance, as flexible shapes don't seem to be compatible with the GPU or ANE.
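For reference, a minimal sketch of such an override, with a fixed length replacing the flexible range (the exact config class name and the InputDescription attribute are assumptions based on exporters' other text configs):

from collections import OrderedDict

from exporters.coreml.models import GPTNeoXCoreMLConfig  # assumed class name

class FixedShapeGPTNeoXConfig(GPTNeoXCoreMLConfig):
    @property
    def inputs(self) -> OrderedDict:
        input_descs = super().inputs
        # A fixed int here (instead of a (min, max) range) pins the sequence
        # length, which should disable flexible shapes in the exported model.
        input_descs["input_ids"].sequence_length = 128
        return input_descs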
Would be great to figure out how to support OPT models. models.md has a note that OPT is not supported yet:
OPT [TODO verify] Conversion error on a slicing operation.
Bloom still has the same note but is now fully supported by exporters, so I'm wondering whether there is actually still an issue with the OPT models, or whether the underlying problem was already resolved. If so, they could be listed as supported. Happy to pitch in if anyone has context on outstanding issues with the OPT models.
Thanks!
Hi!
I'm converting Microsoft's Phi-2 model to use with swift-transformers.
The conversion process is actually very seamless:
from transformers import AutoTokenizer, AutoModelForCausalLM
from exporters.coreml import CoreMLConfig
from exporters.coreml import export

model = "microsoft/phi-2"

# Load tokenizer and PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
pt_model = AutoModelForCausalLM.from_pretrained(model, trust_remote_code=True, torchscript=True)

# Phi-2 has no ready-made config in exporters, so define a minimal one
class Phi2CoreMLConfig(CoreMLConfig):
    modality = "text"

coreml_config = Phi2CoreMLConfig(pt_model.config, task="text-generation")
mlmodel = export(tokenizer, pt_model, coreml_config)
mlmodel.save("Phi2.mlpackage")
Note that by default the export function uses float32.
Then, I'm using the swift-chat repo to run the model, with the Llama-2 tokenizer. It works perfectly well out of the box; there was only one missing token, the space (' '), but apart from that it works.
The issue is that it is super, super slow (I have a MacBook Pro with an M1 and 16 GB of RAM) and it's using close to 11 GB of memory. Although inference is slow, the output makes sense.
Given that it is so slow, I converted the model using float16:
mlmodel = export(tokenizer, pt_model, coreml_config, quantize="float16")
The model is now 5GB, but inference is giving me gibberish (the output before was something that made sense; now it's just a bunch of exclamation marks). I downloaded the model (the 5GB one) onto my iPhone 14 Pro and, after a few seconds of loading, the app just closes itself.
So, two questions: why is the original (float32) model so slow? And why is the quantize="float16" one basically instantaneous, but outputting gibberish? Thank you so much for the help!
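One way to check whether fp16 overflow is the culprit is to compare the PyTorch fp32 logits with the Core ML export on a single prompt. A minimal sketch reusing the names from the snippet above (the Core ML input name and output layout are assumptions and may differ in your export):

import numpy as np
import torch

# inf/nan values in the Core ML output would explain the wall of "!" tokens
enc = tokenizer("Hello, my name is", return_tensors="pt")
with torch.no_grad():
    ref_logits = pt_model(enc["input_ids"])[0].numpy()

pred = mlmodel.predict({"input_ids": enc["input_ids"].numpy().astype(np.int32)})
coreml_logits = np.asarray(next(iter(pred.values())))

print("non-finite values in Core ML output:", (~np.isfinite(coreml_logits)).sum())
print("max abs diff vs PyTorch:", np.nanmax(np.abs(ref_logits - coreml_logits)))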
Hello. I am trying to convert a finetuned PyTorch version of the bert-small-uncased model to a Core ML one, but I'm getting the following error:
python -m exporters.coreml --model=./small_legal_bert --feature text-classification exported/
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
- use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/345 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 99%|███████████████████████████████████████████████████████████████████████████████▌| 343/345 [00:00<00:00, 4742.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 948.04 passes/s]
Running MIL default pipeline: 0%| | 0/56 [00:00<?, ? passes/s]/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '555', of the source model, has been renamed to 'var_555' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████| 56/56 [00:00<00:00, 159.49 passes/s]
Running MIL backend_mlprogram pipeline: 100%|████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 1016.90 passes/s]
/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type classify: Classifier probabilities must have a fully known shape.".
_warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
File "/Users/dgilim/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/Users/dgilim/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 175, in <module>
main()
File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 163, in main
convert_model(
File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 67, in convert_model
validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
File "/Users/dgilim/Projects/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
coreml_outputs = mlmodel.predict(coreml_inputs)
File "/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py", line 554, in predict
raise self._framework_error
File "/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py", line 144, in _get_proxy_and_spec
return _MLModelProxy(filename, compute_units.name), specification, None
RuntimeError: Error compiling model: "Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type classify: Classifier probabilities must have a fully known shape.".
Also attaching config.json from the model:
{
  "_name_or_path": "nlpaueb/legal-bert-small-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_ids": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_labels": 2,
  "num_attention_heads": 8,
  "num_hidden_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
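The "Classifier probabilities must have a fully known shape" error suggests the flexible input shape is what trips up the classifier layer. A hedged workaround sketch that pins the sequence length before exporting (the InputDescription attribute name is an assumption based on exporters' text configs):

from collections import OrderedDict

from transformers import AutoTokenizer, BertForSequenceClassification
from exporters.coreml import export
from exporters.coreml.models import BertCoreMLConfig

class FixedShapeBertConfig(BertCoreMLConfig):
    @property
    def inputs(self) -> OrderedDict:
        input_descs = super().inputs
        # Pin the sequence length so the classifier probabilities get a
        # fully known shape (assumed attribute name).
        input_descs["input_ids"].sequence_length = 128
        return input_descs

tokenizer = AutoTokenizer.from_pretrained("./small_legal_bert")
model = BertForSequenceClassification.from_pretrained("./small_legal_bert", torchscript=True)
config = FixedShapeBertConfig(model.config, task="text-classification")
mlmodel = export(tokenizer, model, config)
mlmodel.save("exported/small_legal_bert.mlpackage")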
Hi! Sorry for a noob question, but I've had experience using BERT in *.mlmodel format, where I just added it to my project, created a *.swift file with its class, and it worked on iOS. Now, when I use exporters, it creates MLPackage files and I don't understand how to use them.
I want to use Falcon 7B locally and don't understand how to convert it to *.mlmodel and how to use it in my iPhone app.
This tool is amazing. I had tried scripting with the coremltools library by hand and ran into all kinds of fun issues; with exporters everything is orchestrated/abstracted for you, and that's excellent 👏
I noticed that there's only quantization support down to 16 bits, however, and would love to have smaller options. I believe Core ML is capable of these, so it may just be a matter of adding that call to this wrapper.
I did look in convert.py and I see a flag use_legacy_format being checked before the 16-bit quantization is performed. Is there something different about how an ML Program handles, or performs, lower-bit quantization?
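For what it's worth, coremltools 7.x can compress an exported model below 16 bits after the fact. A minimal sketch of post-export 4-bit palettization (module and function names per the coremltools 7 optimize API; verify against your installed version):

import coremltools as ct
import coremltools.optimize.coreml as cto

mlmodel = ct.models.MLModel("exported/Model.mlpackage")

# 4-bit k-means weight palettization applied to an already-converted mlpackage
op_config = cto.OpPalettizerConfig(mode="kmeans", nbits=4)
config = cto.OptimizationConfig(global_config=op_config)
compressed = cto.palettize_weights(mlmodel, config=config)
compressed.save("exported/Model-4bit.mlpackage")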
When the fixes have been released upstream in coremltools. Reference:
exporters/src/exporters/coreml/models.py, line 253 in 5711503
Alluded to in #56.
I get a gelu ValueError when trying to convert a distilbert-base-uncased-squad2 model. I also get the same error with the full BERT model bert-large-cased-whole-word-masking-finetuned-squad. Is it that the Core ML converter cannot handle 2 inputs, one input for the "question" and another input for the "context"? How can this be fixed?
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch
tokenizer = AutoTokenizer.from_pretrained('twmkn9/distilbert-base-uncased-squad2')
model = AutoModelForQuestionAnswering.from_pretrained('twmkn9/distilbert-base-uncased-squad2', torchscript=True)
tokenizer.save_pretrained("local-pt-checkpoint")
model.save_pretrained("local-pt-checkpoint")
Command Line> python -m exporters.coreml --model=twmkn9/distilbert-base-uncased-squad2 --feature=question-answering local-pt-checkpoint/
ValueError: node input.19 (gelu) got 2 input(s), expected [1]
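For what it's worth, the two inputs are probably the tensor plus the approximate-mode argument that newer PyTorch versions pass to gelu, not the question/context inputs (a question-answering model still takes a single concatenated sequence). A hedged sketch of a possible workaround via the patch_pytorch_ops hook (the same mechanism visible in the Mistral log further below); the conversion-function signature follows coremltools' torch frontend and should be treated as an assumption:

from coremltools.converters.mil import Builder as mb
from exporters.coreml.models import DistilBertCoreMLConfig

class PatchedDistilBertConfig(DistilBertCoreMLConfig):
    def patch_pytorch_ops(self):
        def gelu(context, node):
            # The traced op is likely gelu(x, approximate="none"/"tanh");
            # ignore the second (string) input and emit a plain GELU.
            x = context[node.inputs[0]]
            res = mb.gelu(x=x, name=node.name)
            context.add(res)
        return {"gelu": gelu}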
Similar to #61, my exporter process is being killed. I'd like to verify this is a resource constraint and not an issue in the project. I am running python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 mistral.mlpackage on an M3 MacBook Pro with 18GB of memory.
model-00001-of-00002.safetensors: 100%|██████████| 9.94G/9.94G [07:47<00:00, 21.3MB/s]
model-00002-of-00002.safetensors: 100%|██████████| 4.54G/4.54G [04:42<00:00, 16.1MB/s]
Downloading shards: 100%|██████████| 2/2 [12:31<00:00, 375.71s/it]
Loading checkpoint shards: 100%|██████████| 2/2 [00:25<00:00, 12.58s/it]
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
- use_cache -> False
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:161: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if past_key_values_length > 0:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if seq_len > self.max_seq_len_cached:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Patching PyTorch conversion 'log' with <function MistralCoreMLConfig.patch_pytorch_ops.<locals>.log at 0x13a115300>
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__contains__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__getitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__delitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__setitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
warnings.warn(msg, category=FutureWarning)
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/4506 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████| 4505/4506 [00:01<00:00, 3255.50 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████| 5/5 [00:00<00:00, 13.02 passes/s]
Running MIL default pipeline: 14%|█▍ | 10/71 [00:00<00:03, 15.93 passes/s]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '5409', of the source model, has been renamed to 'var_5409' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 73%|███████▎ | 52/71 [03:36<02:09, 6.79s/ passes]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████| 71/71 [07:27<00:00, 6.30s/ passes]
Running MIL backend_mlprogram pipeline: 100%|██████████| 12/12 [00:00<00:00, 168.96 passes/s]
zsh: killed python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1
willwalker misty > /opt/homebrew/Cellar/[email protected]/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
I was trying to export Segformer models to Core ML, but the exported model is slow compared to the same model exported on my own.
I tried to export the model using the following command:
python -m exporters.coreml --model=nvidia/mit-b2 --feature=semantic-segmentation exports/
This model's median prediction time is 500 ms on my MacBook Pro M1 using all the available accelerators (ANE, GPU, CPU), above the 300 ms of the same model exported on my own using coremltools directly.
I did a little profiling to identify the issue using Xcode Instruments. It looks like the model is exported and executed in Float32. This greatly undermines performance, since Float16 data is required for the ANE to be used. Thus, the ANE is not used at all and the model is executed on the GPU only on most devices. Also, Float32 computations are slower than Float16 computations on the GPU, so Float32 should be avoided when possible. In the coremltools documentation, Apple suggests using Float16 as a default, and as of version 7.0 Float16 is the default precision for Core ML exports.
With the option --quantize=float16, the inference time is on par with the model I exported (around 300 ms). I suggest using the coremltools default Float16 precision instead of Float32 in order to get the most out of the specialized hardware on Apple platforms.
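For completeness, the same float16 export via the Python API, a sketch mirroring the --quantize=float16 CLI flag used above (the Segformer config class name and preprocessor class are assumptions):

from transformers import SegformerForSemanticSegmentation, SegformerImageProcessor
from exporters.coreml import export
from exporters.coreml.models import SegformerCoreMLConfig

model = SegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b2", torchscript=True)
preprocessor = SegformerImageProcessor.from_pretrained("nvidia/mit-b2")
config = SegformerCoreMLConfig(model.config, task="semantic-segmentation")

# quantize="float16" mirrors the --quantize=float16 CLI flag
mlmodel = export(preprocessor, model, config, quantize="float16")
mlmodel.save("exports/mit-b2-fp16.mlpackage")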
I also noted another issue, though not related to the exporters framework. In Float16 and with the ANE, the Instruments trace suggests that half of the prediction time is spent in GPU kernels. That is odd, since only one operator is executed on the GPU in this case: the argmax operation at the end of the model. This slowdown needs further investigation, but it may be due to the large size of the argmax's input tensor (1000x512x512). I tried with only 16 output classes and the inference time dropped to 60 ms.
Seems like each of the Keras models needs a config.json file.
For example,
python -m exporters.coreml --model=keras-io/transformers-qa exported/
works because this model has a config.json, but
python -m exporters.coreml --model=keras-io/image-captioning exported/
fails with the message OSError: keras-io/image-captioning does not appear to have a file named config.json. Checkout 'https://huggingface.co/keras-io/image-captioning/main' for available files.
Is there any workaround, or does each Keras model need a config file for the exporter to work?
➜ exporters git:(main) python3 -m exporters.coreml --model=distilbert-base-uncased exported/
[1] 51183 illegal hardware instruction python3 -m exporters.coreml --model=distilbert-base-uncased exported/
ane_transformers (https://github.com/apple/ml-ane-transformers and https://machinelearning.apple.com/research/neural-engine-transformers) suggests weight-compatible changes to transformers that allow better mapping of the ops to the ANE, resulting in significant performance improvements.
@hollance do you think these optimizations "belong" in 🤗 Exporters? If yes, how do you envision their implementation: within the CoreMLConfig abstraction or somewhere else?