
Export Hugging Face models to Core ML and TensorFlow Lite

License: Apache License 2.0


🤗 Exporters

👷 WORK IN PROGRESS 👷

This package lets you export 🤗 Transformers models to Core ML.

For converting models to TFLite, we recommend using Optimum.

When to use 🤗 Exporters

🤗 Transformers models are implemented in PyTorch, TensorFlow, or JAX. However, for deployment you might want to use a different framework such as Core ML. This library makes it easy to convert Transformers models to this format.

The aim of the Exporters package is to be more convenient than writing your own conversion script with coremltools and to be tightly integrated with the 🤗 Transformers library and the Hugging Face Hub.

For an even more convenient approach, Exporters powers a no-code Transformers-to-Core ML conversion Space. You can try it out without installing anything to check whether the model you are interested in can be converted. If conversion succeeds, the converted Core ML weights will be pushed to the Hub. For additional flexibility and details about the conversion process, please read on.

Note: Keep in mind that Transformer models are usually quite large and are not always suitable for use on mobile devices. It might be a good idea to optimize the model for inference first using 🤗 Optimum.

Installation

Clone this repo:

$ git clone https://github.com/huggingface/exporters.git

Install it as a Python package:

$ cd exporters
$ pip install -e .

All done!

Note: The Core ML exporter can be used from Linux but macOS is recommended.

Core ML

Core ML is Apple's software library for fast on-device model inference with neural networks and other types of machine learning models. It can be used on macOS, iOS, tvOS, and watchOS, and is optimized for using the CPU, GPU, and Apple Neural Engine. Although the Core ML framework is proprietary, the Core ML file format is an open format.

The Core ML exporter uses coremltools to perform the conversion from PyTorch or TensorFlow to Core ML.

The exporters.coreml package enables you to convert model checkpoints to a Core ML model by leveraging configuration objects. These configuration objects come ready-made for a number of model architectures, and are designed to be easily extendable to other architectures.

Ready-made configurations include the following architectures:

  • BEiT
  • BERT
  • ConvNeXT
  • CTRL
  • CvT
  • DistilBERT
  • DistilGPT2
  • GPT2
  • LeViT
  • MobileBERT
  • MobileViT
  • SegFormer
  • SqueezeBERT
  • Vision Transformer (ViT)
  • YOLOS

See MODELS.md for a complete list of supported models.

Exporting a model to Core ML

The exporters.coreml package can be used as a Python module from the command line. To export a checkpoint using a ready-made configuration, do the following:

python -m exporters.coreml --model=distilbert-base-uncased exported/

This exports a Core ML version of the checkpoint defined by the --model argument. In this example it is distilbert-base-uncased, but it can be any checkpoint on the Hugging Face Hub or one that's stored locally.

The resulting Core ML file will be saved to the exported directory as Model.mlpackage. Instead of a directory you can specify a filename, such as DistilBERT.mlpackage.

It's normal for the conversion process to output many warning messages and other logging information. You can safely ignore these. If all went well, the export should conclude with the following logs:

Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/Model.mlpackage

Note: While it is possible to export models to Core ML on Linux, the validation step will only be performed on Mac, as it requires the Core ML framework to run the model.

The resulting file is Model.mlpackage. This file can be added to an Xcode project and loaded into a macOS or iOS app.

The exported Core ML models use the mlpackage format with the ML Program model type. This format was introduced in 2021 and requires at least iOS 15, macOS 12.0, and Xcode 13. We prefer to use this format as it is the future of Core ML. The Core ML exporter can also make models in the older .mlmodel format, but this is not recommended.

The process is identical for TensorFlow checkpoints on the Hub. For example, you can export a pure TensorFlow checkpoint from the Keras organization as follows:

python -m exporters.coreml --model=keras-io/transformers-qa exported/

To export a model that's stored locally, you'll need to have the model's weights and tokenizer files stored in a directory. For example, we can load and save a checkpoint as follows:

>>> from transformers import AutoTokenizer, AutoModelForSequenceClassification

>>> # Load tokenizer and PyTorch weights from the Hub
>>> tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> pt_model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")
>>> # Save to disk
>>> tokenizer.save_pretrained("local-pt-checkpoint")
>>> pt_model.save_pretrained("local-pt-checkpoint")

Once the checkpoint is saved, you can export it to Core ML by pointing the --model argument to the directory holding the checkpoint files:

python -m exporters.coreml --model=local-pt-checkpoint exported/

Selecting features for different model topologies

Each ready-made configuration comes with a set of features that enable you to export models for different types of topologies or tasks. As shown in the table below, each feature is associated with a different auto class:

Feature                                      Auto Class
default, default-with-past                   AutoModel
causal-lm, causal-lm-with-past               AutoModelForCausalLM
ctc                                          AutoModelForCTC
image-classification                         AutoModelForImageClassification
masked-im                                    AutoModelForMaskedImageModeling
masked-lm                                    AutoModelForMaskedLM
multiple-choice                              AutoModelForMultipleChoice
next-sentence-prediction                     AutoModelForNextSentencePrediction
object-detection                             AutoModelForObjectDetection
question-answering                           AutoModelForQuestionAnswering
semantic-segmentation                        AutoModelForSemanticSegmentation
seq2seq-lm, seq2seq-lm-with-past             AutoModelForSeq2SeqLM
sequence-classification                      AutoModelForSequenceClassification
speech-seq2seq, speech-seq2seq-with-past     AutoModelForSpeechSeq2Seq
token-classification                         AutoModelForTokenClassification

For each configuration, you can find the list of supported features via the FeaturesManager. For example, for DistilBERT we have:

>>> from exporters.coreml.features import FeaturesManager

>>> distilbert_features = list(FeaturesManager.get_supported_features_for_model_type("distilbert").keys())
>>> print(distilbert_features)
['default', 'masked-lm', 'multiple-choice', 'question-answering', 'sequence-classification', 'token-classification']

You can then pass one of these features to the --feature argument in the exporters.coreml package. For example, to export a text-classification model we can pick a fine-tuned model from the Hub and run:

python -m exporters.coreml --model=distilbert-base-uncased-finetuned-sst-2-english \
                           --feature=sequence-classification exported/

which will display the following logs:

Validating Core ML model...
	- Core ML model is classifier, validating output
		-[✓] predicted class NEGATIVE matches NEGATIVE
		-[✓] number of classes 2 matches 2
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/Model.mlpackage

Notice that in this case, the exported model is a Core ML classifier, which predicts the highest scoring class name in addition to a dictionary of probabilities, instead of the last_hidden_state we saw with the distilbert-base-uncased checkpoint earlier. This is expected since the fine-tuned model has a sequence classification head.

The features that have a with-past suffix (e.g. causal-lm-with-past) correspond to model topologies with precomputed hidden states (key and values in the attention blocks) that can be used for fast autoregressive decoding.

Configuring the export options

To see the full list of possible options, run the following from the command line:

python -m exporters.coreml --help

Exporting a model requires at least these arguments:

  • -m <model>: The model ID from the Hugging Face Hub, or a local path to load the model from.
  • --feature <task>: The task the model should perform, for example "image-classification". See the table above for possible task names.
  • <output>: The path where to store the generated Core ML model.

The output path can be a folder, in which case the file will be named Model.mlpackage, or you can also specify the filename directly.

Additional arguments that can be provided:

  • --preprocessor <value>: Which type of preprocessor to use. auto tries to automatically detect it. Possible values are: auto (the default), tokenizer, feature_extractor, processor.
  • --atol <number>: The absolute difference tolerance used when validating the model. The default value is 1e-4.
  • --quantize <value>: Whether to quantize the model weights. The possible quantization options are: float32 for no quantization (the default) or float16 for 16-bit floating point.
  • --compute_units <value>: Whether to optimize the model for CPU, GPU, and/or Neural Engine. Possible values are: all (the default), cpu_and_gpu, cpu_only, cpu_and_ne.
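
For example, combining these options to export a quantized image classifier optimized for the CPU and Neural Engine (the checkpoint name here is just an illustration; any supported image-classification checkpoint works):

python -m exporters.coreml --model=google/vit-base-patch16-224 \
                           --feature=image-classification \
                           --quantize=float16 \
                           --compute_units=cpu_and_ne \
                           exported/ViT.mlpackage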

Using the exported model

Using the exported model in an app is just like using any other Core ML model. After adding the model to Xcode, it will auto-generate a Swift class that lets you make predictions from within the app.

Depending on the chosen export options, you may still need to preprocess or postprocess the input and output tensors.

For image inputs, there is no need to perform any preprocessing as the Core ML model will already normalize the pixels. For classifier models, the Core ML model will output the predictions as a dictionary of probabilities. For other models, you might need to do more work.

Core ML does not have the concept of a tokenizer, so text models will still require manual tokenization of the input data. Here is an example of how to perform tokenization in Swift.
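
In Python, a minimal sketch of running the exported DistilBERT model looks like this. It assumes the exporter's default input and output names (input_ids, attention_mask, last_hidden_state, as seen in the validation logs above) and the default sequence length of 128; mlmodel.predict() only works on a Mac:

import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

# Load the exported model and a matching tokenizer.
mlmodel = ct.models.MLModel("exported/Model.mlpackage")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# Pad to the sequence length the model was exported with (up to 128 by default).
inputs = tokenizer("Exporters makes Core ML conversion easy!",
                   padding="max_length", max_length=128, return_tensors="np")

# The input and output names below are the exporter's defaults.
outputs = mlmodel.predict({
    "input_ids": inputs["input_ids"].astype(np.int32),
    "attention_mask": inputs["attention_mask"].astype(np.int32),
})
print(outputs["last_hidden_state"].shape)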

Overriding default choices in the configuration object

An important goal of Core ML is to make it easy to use the models inside apps. Where possible, the Core ML exporter will add extra operations to the model, so that you do not have to do your own pre- and postprocessing.

In particular,

  • Image models will automatically perform pixel normalization as part of the model. You do not need to preprocess the image yourself, except potentially resizing or cropping it.

  • For classification models, a softmax layer is added and the labels are included in the model file. Core ML makes a distinction between classifier models and other types of neural networks. For a model that outputs a single classification prediction per input example, Core ML makes it so that the model predicts the winning class label and a dictionary of probabilities instead of a raw logits tensor. Where possible, the exporter uses this special classifier model type.

  • Other models predict logits but do not fit into Core ML's definition of a classifier, such as the token-classification task that outputs a prediction for each token in the sequence. Here, the exporter also adds a softmax to convert the logits into probabilities. The label names are added to the model's metadata. Core ML ignores these label names but they can be retrieved by writing a few lines of Swift code.

  • A semantic-segmentation model will upsample the output image to the original spatial dimensions and apply an argmax to obtain the predicted class label indices. It does not automatically apply a softmax.

The Core ML exporter makes these choices because they are the settings you're most likely to need. To override any of the above defaults, you must create a subclass of the configuration object, and then export the model to Core ML by writing a short Python program.

Example: To prevent the MobileViT semantic segmentation model from upsampling the output image, you would create a subclass of MobileViTCoreMLConfig and override the outputs property to set do_upsample to False. Other options you can set for this output are do_argmax and do_softmax.

from collections import OrderedDict
from transformers import AutoModelForSemanticSegmentation
from exporters.coreml.models import MobileViTCoreMLConfig
from exporters.coreml.config import OutputDescription

# Load the model whose configuration we want to customize
# (the checkpoint name is just an illustrative example).
model = AutoModelForSemanticSegmentation.from_pretrained("apple/deeplabv3-mobilevit-small")

class MyCoreMLConfig(MobileViTCoreMLConfig):
    @property
    def outputs(self) -> OrderedDict[str, OutputDescription]:
        return OrderedDict(
            [
                (
                    "logits",
                    OutputDescription(
                        "classLabels",
                        "Classification scores for each pixel",
                        do_softmax=True,
                        do_upsample=False,
                        do_argmax=False,
                    )
                ),
            ]
        )

config = MyCoreMLConfig(model.config, "semantic-segmentation")

Here you can also change the name of the output from classLabels to something else, or fill in the output description ("Classification scores for each pixel").

It is also possible to change the properties of the model inputs. For example, for text models the default sequence length is between 1 and 128 tokens. To set the input sequence length on a DistilBERT model to a fixed length of 32 tokens, you could override the config object as follows:

from collections import OrderedDict
from transformers import AutoModelForSequenceClassification
from exporters.coreml.models import DistilBertCoreMLConfig
from exporters.coreml.config import InputDescription

# Load the model whose configuration we want to customize.
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

class MyCoreMLConfig(DistilBertCoreMLConfig):
    @property
    def inputs(self) -> OrderedDict[str, InputDescription]:
        input_descs = super().inputs
        input_descs["input_ids"].sequence_length = 32
        return input_descs

config = MyCoreMLConfig(model.config, "text-classification")

Using a fixed sequence length generally produces a simpler, and possibly faster, Core ML model. However, for many models the input needs to have a flexible length. In that case, specify a tuple for sequence_length to set the (min, max) lengths. Use (1, -1) to have no upper limit on the sequence length. (Note: if sequence_length is set to a fixed value, then the batch size is fixed to 1.)

To find out what input and output options are available for the model you're interested in, create its CoreMLConfig object and examine the config.inputs and config.outputs properties.

Not all inputs or outputs are always required: For text models, you may remove the attention_mask input. Without this input, the attention mask is always assumed to be filled with ones (no padding). However, if the task requires a token_type_ids input, there must also be an attention_mask input.

Removing inputs and/or outputs is accomplished by making a subclass of CoreMLConfig and overriding the inputs and outputs properties.
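
For example, a minimal sketch (following the same subclassing pattern as above; the class name is arbitrary) that removes the attention_mask input from DistilBERT:

from collections import OrderedDict
from exporters.coreml.models import DistilBertCoreMLConfig
from exporters.coreml.config import InputDescription

class NoAttentionMaskConfig(DistilBertCoreMLConfig):
    @property
    def inputs(self) -> OrderedDict[str, InputDescription]:
        input_descs = super().inputs
        # Without this input, the attention mask is assumed to be all ones (no padding).
        input_descs.pop("attention_mask")
        return input_descs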

By default, a model is generated in the ML Program format. By overriding the use_legacy_format property to return True, the older NeuralNetwork format will be used. This is not recommended and only exists as a workaround for models that fail to convert to the ML Program format.
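
A sketch of the corresponding override, again using DistilBERT as the example architecture:

from exporters.coreml.models import DistilBertCoreMLConfig

class LegacyDistilBertConfig(DistilBertCoreMLConfig):
    @property
    def use_legacy_format(self) -> bool:
        # Export to the older NeuralNetwork (.mlmodel) format instead of ML Program.
        return True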

Once you have the modified config instance, you can use it to export the model following the instructions from the section "Exporting the model" below.

Not everything is described by the configuration objects. The behavior of the converted model is also determined by the model's tokenizer or feature extractor. For example, to use a different input image size, you'd create the feature extractor with different resizing or cropping settings and use that during the conversion instead of the default feature extractor.

Exporting a model for an unsupported architecture

If you wish to export a model whose architecture is not natively supported by the library, there are three main steps to follow:

  1. Implement a custom Core ML configuration.
  2. Export the model to Core ML.
  3. Validate the outputs of the PyTorch and exported models.

In this section, we'll look at how DistilBERT was implemented to show what's involved with each step.

Implementing a custom Core ML configuration

TODO: didn't write this section yet because the implementation is not done yet

Let’s start with the configuration object. We provide an abstract class that you should inherit from, CoreMLConfig.

from exporters.coreml import CoreMLConfig

TODO: stuff to cover here:

  • modality property
  • how to implement custom ops + link to coremltools documentation on this topic
  • decoder models (use_past) and encoder-decoder models (seq2seq)

Exporting the model

Once you have implemented the Core ML configuration, the next step is to export the model. Here we can use the export() function provided by the exporters.coreml package. This function expects the Core ML configuration, along with the base model and tokenizer (for text models) or feature extractor (for vision models):

from transformers import AutoConfig, AutoModelForSequenceClassification, AutoTokenizer
from exporters.coreml import export
from exporters.coreml.models import DistilBertCoreMLConfig

model_ckpt = "distilbert-base-uncased"
base_model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, torchscript=True)
preprocessor = AutoTokenizer.from_pretrained(model_ckpt)

coreml_config = DistilBertCoreMLConfig(base_model.config, task="text-classification")
mlmodel = export(preprocessor, base_model, coreml_config)

Note: For the best results, pass the argument torchscript=True to from_pretrained when loading the model. This allows the model to configure itself for PyTorch tracing, which is needed for the Core ML conversion.

Additional options that can be passed into export():

  • quantize: Use "float32" for no quantization (the default), "float16" to quantize the weights to 16-bit floats.
  • compute_units: Whether to optimize the model for CPU, GPU, and/or Neural Engine. Defaults to coremltools.ComputeUnit.ALL.

To export the model with precomputed hidden states (key and values in the attention blocks) for fast autoregressive decoding, pass the argument use_past=True when creating the CoreMLConfig object.

It is normal for the Core ML exporter to print out a lot of warning and information messages. In particular, you might see messages such as these:

TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!

Those messages are to be expected and are a normal part of the conversion process. If there is a real problem, the converter will throw an error.

If the export succeeded, the return value from export() is a coremltools.models.MLModel object. Write print(mlmodel) to examine the Core ML model's inputs, outputs, and metadata.

Optionally fill in the model's metadata:

mlmodel.short_description = "Your awesome model"
mlmodel.author = "Your name"
mlmodel.license = "Fill in the copyright information here"
mlmodel.version = "1.0"

Finally, save the model. You can open the resulting mlpackage file in Xcode and examine it there.

mlmodel.save("DistilBert.mlpackage")

Note: If the configuration object used returns True from use_legacy_format, the model can be saved as ModelName.mlmodel instead of .mlpackage.

Exporting a decoder model

Decoder-based models can use a past_key_values input that contains pre-computed hidden states (key and values in the self-attention blocks), which allows for much faster sequential decoding. This feature is enabled by passing use_cache=True to the Transformers model.

To enable this feature with the Core ML exporter, set the use_past=True argument when creating the CoreMLConfig object:

coreml_config = CTRLCoreMLConfig(base_model.config, task="text-generation", use_past=True)

# or:
coreml_config = CTRLCoreMLConfig.with_past(base_model.config, task="text-generation")

This adds multiple new inputs and outputs to the model with names such as past_key_values_0_key, past_key_values_0_value, ... (inputs) and present_key_values_0_key, present_key_values_0_value, ... (outputs).

Enabling this option makes the model less convenient to use, since you will have to keep track of many additional tensors, but it does make sequential inference much faster.

The Transformers model must be loaded with is_decoder=True, for example:

base_model = BigBirdForCausalLM.from_pretrained("google/bigbird-roberta-base", torchscript=True, is_decoder=True)

TODO: Example of how to use this in Core ML. The past_key_values tensors will grow larger over time. The attention_mask tensor must have the size of past_key_values plus new input_ids.
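
Until that example is written, here is a rough, untested sketch of how one decoding step could be wired up in Python. The past/present naming pattern follows the input and output names mentioned above; the layer count, the tensor layout (batch, heads, sequence, head_dim), and the logits output name are assumptions that depend on the model:

import numpy as np

def decode_step(mlmodel, next_token, past, num_layers=12):
    """One autoregressive step; `past` is a list of (key, value) arrays or None."""
    # Assumed key/value layout: (batch, heads, sequence, head_dim).
    past_length = past[0][0].shape[2] if past else 0
    # The attention mask must cover the past tokens plus the new input token.
    inputs = {
        "input_ids": np.array([[next_token]], dtype=np.int32),
        "attention_mask": np.ones((1, past_length + 1), dtype=np.int32),
    }
    # Note: the exported model may require zero-length past tensors on the very first step.
    if past:
        for i, (key, value) in enumerate(past):
            inputs[f"past_key_values_{i}_key"] = key
            inputs[f"past_key_values_{i}_value"] = value
    outputs = mlmodel.predict(inputs)
    # Feed the "present" outputs back in as the next step's "past".
    new_past = [(outputs[f"present_key_values_{i}_key"],
                 outputs[f"present_key_values_{i}_value"]) for i in range(num_layers)]
    return int(outputs["logits"][0, -1].argmax()), new_past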

Exporting an encoder-decoder model

TODO: properly write this section

You'll need to export the model as two separate Core ML models: the encoder and the decoder.

Export the model like so:

coreml_config = TODOCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="encoder")
encoder_mlmodel = export(preprocessor, base_model.get_encoder(), coreml_config)

coreml_config = TODOCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="decoder")
decoder_mlmodel = export(preprocessor, base_model, coreml_config)

When the seq2seq option is used, the sequence length in the Core ML model is always unbounded. The sequence_length specified in the configuration object is ignored.

This can also be combined with use_past=True. TODO: explain how to use this.
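
In the meantime, here is a rough, untested sketch of greedy decoding with the two exported models (without use_past). The decoder input names decoder_input_ids and encoder_last_hidden_state are assumptions; only last_hidden_state and logits appear in the exporter's validation logs:

import numpy as np
import coremltools as ct
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
encoder = ct.models.MLModel("exported/encoder_Model.mlpackage")
decoder = ct.models.MLModel("exported/decoder_Model.mlpackage")

prompt = "translate English to German: The house is wonderful."
tokens = tokenizer(prompt, return_tensors="np")
input_ids = tokens["input_ids"].astype(np.int32)
attention_mask = tokens["attention_mask"].astype(np.int32)

# Run the encoder once, then feed its hidden states to the decoder on every step.
encoder_out = encoder.predict({"input_ids": input_ids, "attention_mask": attention_mask})

decoder_ids = [tokenizer.pad_token_id]  # T5 starts decoding from the pad token
for _ in range(40):
    out = decoder.predict({
        "decoder_input_ids": np.array([decoder_ids], dtype=np.int32),
        "encoder_last_hidden_state": encoder_out["last_hidden_state"],
        "attention_mask": attention_mask,
    })
    next_token = int(out["logits"][0, -1].argmax())
    if next_token == tokenizer.eos_token_id:
        break
    decoder_ids.append(next_token)

print(tokenizer.decode(decoder_ids[1:], skip_special_tokens=True))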

Validating the model outputs

The final step is to validate that the outputs from the base and exported model agree within some absolute tolerance. You can use the validate_model_outputs() function provided by the exporters.coreml package as follows.

First enable logging:

from exporters.utils import logging
logger = logging.get_logger("exporters.coreml")
logger.setLevel(logging.INFO)

Then validate the model:

from exporters.coreml import validate_model_outputs

validate_model_outputs(
    coreml_config, preprocessor, base_model, mlmodel, coreml_config.atol_for_validation
)

Note: validate_model_outputs only works on Mac computers, as it depends on the Core ML framework to make predictions with the model.

This function uses the CoreMLConfig.generate_dummy_inputs() method to generate inputs for the base and exported model, and the absolute tolerance can be defined in the configuration. We generally find numerical agreement in the 1e-6 to 1e-4 range, although anything smaller than 1e-3 is likely to be OK.

If validation fails with an error such as the following, it doesn't necessarily mean the model is broken:

ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: 0.12345

The comparison is done using an absolute difference value, which in this example is 0.12345. That is much larger than the default tolerance value of 1e-4, hence the reported error. However, the magnitude of the activations also matters. For a model whose activations are on the order of 1e+3, a maximum absolute difference of 0.12345 would usually be acceptable.

If validation fails with this error and you're not entirely sure if this is a true problem, call mlmodel.predict() on a dummy input tensor and look at the largest absolute magnitude in the output tensor.
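
For example, a quick sketch to check the activation scale, reusing `mlmodel` and the tokenized `inputs` from the earlier prediction sketch:

import numpy as np

# `mlmodel` and `inputs` as in the earlier prediction sketch.
outputs = mlmodel.predict(inputs)
max_activation = np.abs(outputs["last_hidden_state"]).max()
print("largest activation magnitude:", max_activation)
# A max difference of 0.12345 is roughly 1e-4 relative to activations of order 1e+3.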

Contributing a new configuration to 🤗 Transformers

We are looking to expand the set of ready-made configurations and welcome contributions from the community! If you would like to contribute your addition to the library, you will need to:

  • Implement the Core ML configuration in the models.py file
  • Include the model architecture and corresponding features in the FeaturesManager (in coreml/features.py)
  • Add your model architecture to the tests in test_coreml.py

Troubleshooting: What if Core ML Exporters doesn't work for your model?

It's possible that the model you wish to export fails to convert with Core ML Exporters, or even when you use coremltools directly. When running these automated conversion tools, the conversion may bail out with an inscrutable error message, or it may appear to succeed while the model does not work or produces incorrect outputs.

The most common reasons for conversion errors are:

  • You provided incorrect arguments to the converter. The task argument should match the chosen model architecture. For example, the "feature-extraction" task should only be used with models of type AutoModel, not AutoModelForXYZ. Additionally, the seq2seq argument is required to tell apart encoder-decoder type models from encoder-only or decoder-only models. Passing invalid choices for these arguments may give an error during the conversion process or it may create a model that works but does the wrong thing.

  • The model performs an operation that is not supported by Core ML or coremltools. It's also possible coremltools has a bug or can't handle particularly complex models.

If the Core ML export fails due to the latter, you have a few options:

  1. Implement the missing operator in the CoreMLConfig's patch_pytorch_ops() function.

  2. Fix the original model. This requires a deep understanding of how the model works and is not trivial. However, sometimes the fix is to hardcode certain values rather than letting PyTorch or TensorFlow calculate them from the shapes of tensors.

  3. Fix coremltools. It is sometimes possible to hack coremltools so that it ignores the issue.

  4. Forget about automated conversion and build the model from scratch using MIL. This is the intermediate language that coremltools uses internally to represent models. It's similar in many ways to PyTorch. See the sketch after this list.

  5. Submit an issue and we'll see what we can do. 😀
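
As a taste of option 4, here is a toy MIL program built directly with the coremltools Builder (a minimal sketch, unrelated to any particular Transformers model):

import coremltools as ct
from coremltools.converters.mil import Builder as mb

# Define y = relu(x + 1) directly in MIL, then convert it to an ML Program.
@mb.program(input_specs=[mb.TensorSpec(shape=(1, 4))])
def prog(x):
    y = mb.add(x=x, y=1.0)
    return mb.relu(x=y)

mlmodel = ct.convert(prog, convert_to="mlprogram")
print(mlmodel)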

Known issues

The Core ML exporter writes models in the mlpackage format. Unfortunately, for some models the generated ML Program is incorrect, in which case it's recommended to convert the model to the older NeuralNetwork format by setting the configuration object's use_legacy_format property to True. On certain hardware, the older format may also run more efficiently. If you're not sure which one to use, export the model twice and compare the two versions.

Known models that need to be exported with use_legacy_format=True are: GPT2, DistilGPT2.

Using flexible input sequence length with GPT2 or GPT-Neo causes the converter to be extremely slow and allocate over 200 GB of RAM. This is clearly a bug in coremltools or the Core ML framework, as the allocated memory is never used (the computer won't start swapping). After many minutes, the conversion does succeed, but the model may not be 100% correct. Loading the model afterwards takes a very long time and makes similar memory allocations. Likewise for making predictions. While theoretically the conversion succeeds (if you have enough patience), the model is not really usable like this.

Pushing the model to the Hugging Face Hub

The Hugging Face Hub can also host your Core ML models. You can use the huggingface_hub package to upload the converted model to the Hub from Python.

First log in to your Hugging Face account with the following command:

huggingface-cli login

Once you are logged in, save the mlpackage to the Hub as follows:

from huggingface_hub import Repository

with Repository(
        "<model name>", clone_from="https://huggingface.co/<user>/<model name>",
        use_auth_token=True).commit(commit_message="add Core ML model"):
    mlmodel.save("<model name>.mlpackage")

Make sure to replace <model name> with the name of the model and <user> with your Hugging Face username.
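
Alternatively (not part of the original instructions above), recent versions of huggingface_hub can upload the saved .mlpackage directory directly, since an .mlpackage is a folder on disk:

from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="<model name>.mlpackage",
    path_in_repo="<model name>.mlpackage",
    repo_id="<user>/<model name>",
    commit_message="add Core ML model",
)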

Contributors

hollance, laclouis5, lucasnewman, pcuenca, petrukha-ivan, regisss


Issues

Problems of `Got max absolute difference of: nan`

I was planning to convert Voicelab/vlt5-base-keywords to a Core ML model. Everything went well, but I got an error at the end:

Validating Core ML model...
        -[✓] Core ML model output names match reference model ({'last_hidden_state'})
        - Validating Core ML model output "last_hidden_state":
                -[✓] (1, 128, 768) matches (1, 128, 768)
                -[x] values not close enough (atol: 0.0001)
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 146, in main
    convert_model(
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/__main__.py", line 70, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/zhuhaoyu/UTS/HeadingJsonGen/pythonProject/exporters/src/exporters/coreml/validate.py", line 220, in validate_model_outputs
    raise ValueError(
ValueError: Output values do not match between reference model and Core ML exported model: Got max absolute difference of: nan

why illegal hardware instruction

➜ exporters git:(main) python3 -m exporters.coreml --model=distilbert-base-uncased exported/

[1] 51183 illegal hardware instruction python3 -m exporters.coreml --model=distilbert-base-uncased exported/

mlpackage vs mlmodel for Falcon 7B

Hi! Sorry for a noob question, but I've had experience using BERT in *.mlmodel format, where I just added it to my project, created a *.swift file with its class, and it worked on iOS. Now, when I use exporters, it creates MLPackage files and I don't understand how to use them.

I want to use Falcon 7B locally and don't understand how to convert it to *.mlmodel and how to use it in my iPhone app.

Support for .safetensors files?

When trying to convert a model with safetensors weights, exporters fails with [MODEL] does not appear to have a file named pytorch_model.bin, tf_model.h5, model.ckpt or flax_model.msgpack. Adding support would help out a lot, especially as safetensors seems to be pushed as the new standard for storing weights.

Error when exporting gpt2

I get this error when trying to convert gpt2

/site-packages/coremltools/converters/mil/mil/input_type.py", line 162, in validate_inputs
    raise ValueError(msg.format(name, var.name, input_type.type_str,
ValueError: Op "137" (op_type: fill) Input shape="136" expects tensor or scalar of dtype from type domain ['int32'] but got tensor[0,fp32]

I first tried:

python -m exporters.coreml --model=gpt2 --framework=pt --feature=causal-lm models/gpt2.mlpackage

Next I tried:

from exporters.coreml import export
from exporters.coreml.models import GPT2CoreMLConfig
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model_ckpt = "gpt2"
base_model = GPT2LMHeadModel.from_pretrained(
    model_ckpt, torchscript=True
)
preprocessor = GPT2Tokenizer.from_pretrained(model_ckpt)

coreml_config = GPT2CoreMLConfig(
    base_model.config, 
    task="causal-lm",
)
mlmodel = export(
    preprocessor, base_model, coreml_config
)

mlmodel.save(f"models/{model_ckpt}.mlpackage")

But they both give the same error

I realise this repo is WIP, but I had seen the list here saying GPT2 model is supported: https://github.com/huggingface/exporters/blob/main/MODELS.md

Support for smaller quantization, 8 or 4 at least

This tool is amazing. Having tried scripting with the coremltools library by hand and running into all kinds of fun issues, then trying this and having it all orchestrated/abstracted for you, this is excellent 👍

I noticed that there's only quantization support down to 16 bits, however, and would love to have smaller options. I do believe Core ML is capable of these, so it may just be a matter of adding that call to this wrapper.

I did look in convert.py and I see a flag use_legacy_format being checked before performing the 16-bit quantization. Is there something different about how the ML Program format handles lower-bit quantization?

Requesting Support for Salesforce/blip2-opt-2.7b

Hello,

I am very new to Hugging Face and machine learning in general. I understand that the Blip model is not supported for conversion to Core ML. Can this be added to this repo? If not, is there a way I can write my own conversion code?

Thanks


Conversion Settings:

    Model: Salesforce/blip2-opt-2.7b
    Task: None
    Framework: None
    Compute Units: None
    Precision: None
    Tolerance: None
    Push to: None

    Error: "blip is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos'] are supported. If you want to support blip please propose a PR or open up an issue."

Export & use T5-Base model for summarization

Hey guys,

I'm pretty new to Core ML conversion stuff and took the naive approach of converting a T5-Base model to Core ML (I want to use it to generate summaries). As laid out in the README I created an encoder and a decoder model, which worked without a problem:

(base) me@me-MacBook-Pro ~/Development/projects/exporters$ python -m exporters.coreml --model=t5-small --feature=text2text-generation exported
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.0.0 has not been tested with coremltools. You may run into unexpected errors. Torch 1.12.1 is the most recent version that has been tested.
Converting encoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▊| 755/756 [00:00<00:00, 2482.08 ops/s]
Running MIL Common passes: 100%|██████████| 39/39 [00:00<00:00, 73.01 passes/s]
Running MIL Clean up passes: 100%|██████████| 11/11 [00:00<00:00, 27.71 passes/s]
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'last_hidden_state'})
	- Validating Core ML model output "last_hidden_state":
		-[✓] (1, 128, 768) matches (1, 128, 768)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/encoder_Model.mlpackage
Converting decoder model...
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/transformers/modeling_utils.py:828: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if causal_mask.shape[1] < attention_mask.shape[1]:
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▊| 1260/1262 [00:00<00:00, 2404.55 ops/s]
Running MIL Common passes:   5%|▌         | 2/39 [00:00<00:02, 15.47 passes/s]/opt/homebrew/Caskroom/miniconda/base/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1761', of the source model, has been renamed to 'var_1761' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|██████████| 39/39 [00:01<00:00, 36.73 passes/s]
Running MIL Clean up passes: 100%|██████████| 11/11 [00:00<00:00, 14.41 passes/s]
Validating Core ML model...
	-[✓] Core ML model output names match reference model ({'logits'})
	- Validating Core ML model output "logits":
		-[✓] (1, 64, 32100) matches (1, 64, 32100)
		-[✓] all values close (atol: 0.0001)
All good, model saved at: exported/decoder_Model.mlpackage

This is where the fun begins :) I've only ever worked with the T5 model through transformers & pipelines, like this:

from transformers import T5TokenizerFast, T5ForConditionalGeneration

text = "summarize: The quick brown fox jumps over the lazy dog"
tokenizer = T5TokenizerFast.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base", return_dict=True)
model.to('cuda')

tokens = tokenizer(text, return_tensors="pt")
input_ids = tokens.input_ids

outputs = model.generate(input_ids.cuda(), max_length=40)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

As far as I understand, by using the model.generate method the transformers utilities do all the heavy lifting here, like creating the attention_masks, running the encoder, passing the encoder_hidden_states along, etc.
Am I right to assume that I would have to implement all this functionality by hand if I want to work with the CoreML encoder / decoder models?

I'm not only worried about using them in Python, but would also like to use them in Swift. But I guess there's no easy plug'n play solution here, right? :)

Converting EleutherAI/Pythia Models

I was wondering if it's possible to support the conversion of the Pythia models to Core ML. Naively I ran python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/ which gave me this error:

Original output
python -m exporters.coreml --model=EleutherAI/pythia-1b-deduped mlmodels/pythia-1b-deduped-exported/
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   4%|▍         | 86/2272 [00:00<00:01, 2038.49 ops/s]
Traceback (most recent call last):
  File "/opt/homebrew/Cellar/[email protected]/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/opt/homebrew/Cellar/[email protected]/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 166, in main
    convert_model(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/__main__.py", line 45, in convert_model
    mlmodel = export(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 687, in export
    return export_pytorch(preprocessor, model, config, quantize, compute_units)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/exporters/src/exporters/coreml/convert.py", line 552, in export_pytorch
    mlmodel = ct.convert(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/_converters_entry.py", line 530, in convert
    mlmodel = mil_convert(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/converter.py", line 108, in __call__
    return load(*args, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 63, in load
    return _perform_torch_convert(converter, debug)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/load.py", line 102, in _perform_torch_convert
    prog = converter.convert()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/converter.py", line 439, in convert
    convert_nodes(self.context, self.graph)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 92, in convert_nodes
    add_op(context, node)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/frontend/torch/ops.py", line 4502, in gather
    res = mb.gather_along_axis(x=inputs[0], indices=inputs[2], axis=inputs[1], name=node.name)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 183, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/builder.py", line 182, in _add_op
    new_op.type_value_inference()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/operation.py", line 253, in type_value_inference
    output_types = self.type_inference()
  File "/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/scatter_gather.py", line 312, in type_inference
    assert self.x.shape[i] == self.indices.shape[i]
AssertionError

I tried bypassing this error by commenting the line out, which sometimes results in what I think is a memory leak (my memory usage goes to 60 GB). I was able to export it one time, but it fails the performance report in Xcode. When commenting out the line I get this output:

Check bypassed output
Some weights of the model checkpoint at EleutherAI/pythia-1b-deduped were not used when initializing GPTNeoXModel: ['embed_out.weight']
- This IS expected if you are initializing GPTNeoXModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing GPTNeoXModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
	- use_cache -> False
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:503: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert batch_size > 0, "batch_size has to be defined and > 0"
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:269: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:221: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  alpha=(torch.tensor(1.0, dtype=self.norm_factor.dtype, device=self.norm_factor.device) / self.norm_factor),
/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py:228: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  mask_value = torch.tensor(mask_value, dtype=attn_scores.dtype).to(attn_scores.device)
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/2272 [00:00<?, ? ops/s](is13, 1, 2048, 64) (is11, 1, is12, 64)
(is14, 1, 2048, 64) (is11, 1, is12, 64)
(is53, 1, 2048, 64) (is51, 1, is52, 64)
(is54, 1, 2048, 64) (is51, 1, is52, 64)
Converting PyTorch Frontend ==> MIL Ops:  11%|█         | 250/2272 [00:00<00:00, 2499.35 ops/s](is107, 1, 2048, 64) (is105, 1, is106, 64)
(is108, 1, 2048, 64) (is105, 1, is106, 64)
(is161, 1, 2048, 64) (is159, 1, is160, 64)
(is162, 1, 2048, 64) (is159, 1, is160, 64)
Converting PyTorch Frontend ==> MIL Ops:  23%|██▎       | 513/2272 [00:00<00:00, 2575.44 ops/s](is215, 1, 2048, 64) (is213, 1, is214, 64)
(is216, 1, 2048, 64) (is213, 1, is214, 64)
Converting PyTorch Frontend ==> MIL Ops:  34%|███▍      | 771/2272 [00:00<00:00, 2514.44 ops/s](is269, 1, 2048, 64) (is267, 1, is268, 64)
(is270, 1, 2048, 64) (is267, 1, is268, 64)
(is323, 1, 2048, 64) (is321, 1, is322, 64)
(is324, 1, 2048, 64) (is321, 1, is322, 64)
Converting PyTorch Frontend ==> MIL Ops:  45%|████▌     | 1023/2272 [00:00<00:00, 2458.22 ops/s](is377, 1, 2048, 64) (is375, 1, is376, 64)
(is378, 1, 2048, 64) (is375, 1, is376, 64)
(is431, 1, 2048, 64) (is429, 1, is430, 64)
(is432, 1, 2048, 64) (is429, 1, is430, 64)
Converting PyTorch Frontend ==> MIL Ops:  56%|█████▌    | 1274/2272 [00:00<00:00, 2413.73 ops/s](is485, 1, 2048, 64) (is483, 1, is484, 64)
(is486, 1, 2048, 64) (is483, 1, is484, 64)
(is539, 1, 2048, 64) (is537, 1, is538, 64)
(is540, 1, 2048, 64) (is537, 1, is538, 64)
Converting PyTorch Frontend ==> MIL Ops:  67%|██████▋   | 1516/2272 [00:00<00:00, 2176.52 ops/s](is593, 1, 2048, 64) (is591, 1, is592, 64)
(is594, 1, 2048, 64) (is591, 1, is592, 64)
Converting PyTorch Frontend ==> MIL Ops:  76%|███████▋  | 1738/2272 [00:00<00:00, 2144.58 ops/s](is647, 1, 2048, 64) (is645, 1, is646, 64)
(is648, 1, 2048, 64) (is645, 1, is646, 64)
(is701, 1, 2048, 64) (is699, 1, is700, 64)
(is702, 1, 2048, 64) (is699, 1, is700, 64)
Converting PyTorch Frontend ==> MIL Ops:  87%|████████▋ | 1969/2272 [00:00<00:00, 2149.72 ops/s](is755, 1, 2048, 64) (is753, 1, is754, 64)
(is756, 1, 2048, 64) (is753, 1, is754, 64)
(is809, 1, 2048, 64) (is807, 1, is808, 64)
(is810, 1, 2048, 64) (is807, 1, is808, 64)
Converting PyTorch Frontend ==> MIL Ops: 100%|█████████▉| 2271/2272 [00:01<00:00, 2253.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00, 36.95 passes/s]
Running MIL default pipeline:  14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹                                                                                                                                                                                                                                                                | 9/63 [00:00<00:03, 17.14 passes/s]/Users/kendreaditya/Documents/workspace/neural-engine-benchmark/neural-engine-venv/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '2680', of the source model, has been renamed to 'var_2680' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  38%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ                                                                                                                                                                                        | 24/63 [00:01<00:01, 28.21 passes/s](1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is863, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is889, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is915, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is941, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is967, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is993, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1019, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1045, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1071, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1097, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1123, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1149, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1175, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1201, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1227, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
(1, 1, 2048, 64) (1, 1, is1253, 64)
Running MIL default pipeline:  59%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                                                                           | 37/63 [00:01<00:00, 28.56 passes/s](1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1289, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1315, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1341, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1367, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1393, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1419, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1445, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1471, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1497, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1523, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1549, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1575, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1601, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1627, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1653, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
(1, 1, 2048, 64) (1, 1, is1679, 64)
Running MIL default pipeline:  92%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž                       | 58/63 [00:03<00:00, 12.22 passes/s](1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
(1, 1, 2048, 64) (1, 1, is1706, 64)
Running MIL default pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 63/63 [00:04<00:00, 14.28 passes/s]
Running MIL backend_mlprogram pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 190.00 passes/s]

Any ideas?

`huggingface-cli env`
Copy-and-paste the text below in your GitHub issue.

- huggingface_hub version: 0.15.1
- Platform: macOS-13.4-arm64-arm-64bit
- Python version: 3.10.12
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: /Users/kendreaditya/.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers: osxkeychain
- FastAI: N/A
- Tensorflow: N/A
- Torch: 2.0.0
- Jinja2: 3.1.2
- Graphviz: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.2
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: /Users/kendreaditya/.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: /Users/kendreaditya/.cache/huggingface/assets
- HF_TOKEN_PATH: /Users/kendreaditya/.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
`pip freeze`
appnope==0.1.3
asttokens==2.2.1
attrs==23.1.0
backcall==0.2.0
cattrs==23.1.2
certifi==2023.5.7
charset-normalizer==3.1.0
comm==0.1.3
coremltools==7.0b1
debugpy==1.6.7
decorator==5.1.1
einops==0.6.1
exceptiongroup==1.1.1
executing==1.2.0
-e git+https://github.com/huggingface/exporters.git@d83cf6268fcaf1c6259511ddbd32dc9dcd79bc03#egg=exporters
fancycompleter==0.9.1
filelock==3.12.2
fsspec==2023.6.0
huggingface-hub==0.15.1
idna==3.4
ipykernel==6.23.2
ipython==8.14.0
jedi==0.18.2
Jinja2==3.1.2
jupyter_client==8.2.0
jupyter_core==5.3.1
MarkupSafe==2.1.3
matplotlib-inline==0.1.6
mpmath==1.3.0
nest-asyncio==1.5.6
networkx==3.1
numpy==1.24.2
packaging==23.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==3.6.0
prompt-toolkit==3.0.38
protobuf==3.20.1
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pyaml==23.5.9
Pygments==2.15.1
pyrepl==0.9.0
python-dateutil==2.8.2
PyYAML==6.0
pyzmq==25.1.0
regex==2023.6.3
requests==2.31.0
six==1.16.0
stack-data==0.6.2
sympy==1.12
tokenizers==0.13.3
torch==2.0.0
tornado==6.3.2
tqdm==4.65.0
traitlets==5.9.0
transformers==4.29.2
typing_extensions==4.6.3
urllib3==2.0.3
wcwidth==0.2.6
wmctrl==0.4

Converting llama-2-7b failed

It runs well until
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
Then I saw in Activity Monitor that the Python process had stopped running.

How can I fix this?

(LLM_env) tim@TPE exporters % python -m exporters.coreml --model=/Users/tim/GitLab/survey/LLM/llama-meta/Llama-2-7b-hf exported/ 
Torch version 2.0.1 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [01:02<00:00, 31.31s/it]
Using framework PyTorch: 2.0.1
Overriding 1 configuration item(s)
        - use_cache -> False
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:808: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if input_shape[-1] > 1:
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:146: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:375: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:382: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:392: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/3627 [00:00<?, ? ops/s]
Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3626/3627 [00:01<00:00, 3155.13 ops/s]
Running MIL frontend_pytorch pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00, 18.10 passes/s]
Running MIL default pipeline:  15%|β–ˆβ–Œ        | 10/66 [00:01<00:05, 10.96 passes/s]
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '4530', of the source model, has been renamed to 'var_4530' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  77%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰  | 51/66 [03:23<01:40,  6.70s/ passes]
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/Users/tim/GitLab/survey/LLM/LLM_env/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
  return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 66/66 [09:44<00:00,  8.86s/ passes]
Running MIL backend_mlprogram pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 12/12 [00:00<00:00, 65.90 passes/s]
zsh: killed     python -m exporters.coreml  exported/
(LLM_env) tim@TPE exporters % /Users/tim/.pyenv/versions/3.11.5/lib/python3.11/multiprocessing/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Detr-Resnet-50 Model Conversion to CoreML

I noticed that facebook/detr-resnet-50 cannot be converted into the Core ML format when using the command line prompt "python -m exporters.coreml --model=path_to_checkpoint path_to_converted_model".

In MODELS.md, for the Detr model, it states: "The conversion completes without errors but the Core ML compiler cannot load the model. Invalid operation output name: got 'tensor' when expecting token of type 'ID'".

Are you planning to release a complete export for Detr models soon? Could you please keep me posted?

Error convert pytorch bert-small-uncased for text classification

Hello. I am trying to convert a fine-tuned PyTorch version of the bert-small-uncased model to a Core ML one, but am getting the following error:

python -m exporters.coreml --model=./small_legal_bert --feature text-classification  exported/ 

Using framework PyTorch: 2.0.0
Overriding 1 configuration item(s)
        - use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/345 [00:00<?, ? ops/s]
Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 343/345 [00:00<00:00, 4742.81 ops/s]
Running MIL frontend_pytorch pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00, 948.04 passes/s]
Running MIL default pipeline:   0%|          | 0/56 [00:00<?, ? passes/s]
/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:262: UserWarning: Output, '555', of the source model, has been renamed to 'var_555' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 56/56 [00:00<00:00, 159.49 passes/s]
Running MIL backend_mlprogram pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 10/10 [00:00<00:00, 1016.90 passes/s]
/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type classify: Classifier probabilities must have a fully known shape.".
  _warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
  File "/Users/dgilim/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/dgilim/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 175, in <module>
    main()
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 163, in main
    convert_model(
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/__main__.py", line 67, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/dgilim/Projects/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
    coreml_outputs = mlmodel.predict(coreml_inputs)
  File "/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py", line 554, in predict
    raise self._framework_error
  File "/Users/dgilim/anaconda3/lib/python3.10/site-packages/coremltools/models/model.py", line 144, in _get_proxy_and_spec
    return _MLModelProxy(filename, compute_units.name), specification, None
RuntimeError: Error compiling model: "Failed to parse the model specification. Error: Unable to parse ML Program: in operation of type classify: Classifier probabilities must have a fully known shape.".

Also attaching config.json from the model:

{
  "_name_or_path": "nlpaueb/legal-bert-small-uncased",
  "architectures": [
    "BertForSequenceClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bos_token_id": 0,
  "classifier_dropout": null,
  "eos_token_ids": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 512,
  "initializer_range": 0.02,
  "intermediate_size": 2048,
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_labels": 2,
  "num_attention_heads": 8,
  "num_hidden_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "problem_type": "single_label_classification",
  "torch_dtype": "float32",
  "transformers_version": "4.28.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
}
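
The compiler error says the classifier probabilities must have a fully known shape, which points at the flexible input shapes used by default. A minimal sketch of pinning the sequence length through the Python API, following the input-override pattern from the exporters configuration docs (the inputs property and the sequence_length attribute are assumptions if your version differs):

from exporters.coreml import export
from exporters.coreml.models import BertCoreMLConfig
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_ckpt = "./small_legal_bert"
model = AutoModelForSequenceClassification.from_pretrained(model_ckpt, torchscript=True)
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

class FixedShapeBertConfig(BertCoreMLConfig):
    @property
    def inputs(self):
        input_descs = super().inputs
        # A plain int pins the length; a (min, max) tuple would keep it flexible.
        input_descs["input_ids"].sequence_length = 128
        return input_descs

coreml_config = FixedShapeBertConfig(model.config, task="text-classification")
mlmodel = export(tokenizer, model, coreml_config)
mlmodel.save("exported/Model.mlpackage")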

Exporter failing due to output shape

When trying to export the Hugging Face models Deeppavlov/rubert-base-cased and ckiplab/bert-base-chinese-ner using the command line, it fails with the output

Some weights of the model checkpoint at Deeppavlov/rubert-base-cased were not used when initializing BertModel: ['cls.seq_relationship.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.predictions.decoder.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.bias']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Using framework PyTorch: 1.12.1
Overriding 1 configuration item(s)
        - use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/630 [00:00<?, ? ops/s]
CoreML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‹| 628/630 [00:00<00:00, 4660.63 ops/s]
Running MIL Common passes:   0%|          | 0/39 [00:00<?, ? passes/s]
/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '1020', of the source model, has been renamed to 'var_1020' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 39/39 [00:00<00:00, 47.87 passes/s]
Running MIL Clean up passes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 31.91 passes/s]
/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py:145: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error:  Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
  _warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 166, in <module>
    main()
  File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 154, in main
    convert_model(
  File "/Users/starlight/exporters/src/exporters/coreml/__main__.py", line 65, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Users/starlight/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
    coreml_outputs = mlmodel.predict(coreml_inputs)
  File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 545, in predict
    raise self._framework_error
  File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 143, in _get_proxy_and_spec
    return (_MLModelProxy(filename, compute_units.name), specification, None)
RuntimeError: Error compiling model: "compiler error:  Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
Exception ignored in: <function MLModel.__del__ at 0x11ebe1ee0>
Traceback (most recent call last):
  File "/Users/starlight/NERConversion/.venv/lib/python3.9/site-packages/coremltools/models/model.py", line 369, in __del__
ImportError: sys.meta_path is None, Python is likely shutting down

It runs correctly with --model=distilbert-base-uncased.
Using
python 3.9.13,
coremltools 6.1
torch 1.12.1

A .mlpackage file is created, but I can't use it since I can't call predict() on it.
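
A hedged workaround for this 'pooler_output' shape mismatch, the same spec surgery shown in the sentence-transformers issue further down, is to clear the declared shape of that output and re-save the package:

import coremltools as ct

mlmodel = ct.models.MLModel("exported/Model.mlpackage")
# Output index 1 is assumed to be pooler_output; check mlmodel._spec.description.output.
del mlmodel._spec.description.output[1].type.multiArrayType.shape[:]
mlmodel = ct.models.MLModel(mlmodel._spec, weights_dir=mlmodel.weights_dir)
mlmodel.save("exported/ModelFixed.mlpackage")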

Support for `Voicelab/vlt5-base-keywords`

I ran into a problem when trying to convert a vlt5 model to a Core ML model:

KeyError: "voicelab/vlt5-base-keywords is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'falcon', 'gpt2', 'gpt_bigcode', 'gptj', 'gpt_neo', 'gpt_neox', 'levit', 'llama', 'm2m_100', 'marian', 'mistral', 'mobilebert', 'mobilevit', 'mobilevitv2', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos'] are supported. If you want to support voicelab/vlt5-base-keywords please propose a PR or open up an issue."

`trust_remote_code=True`

I'm using exporters to convert tiiuae/falcon-7b-instruct to a Core ML model.
python -m exporters.coreml --model=tiiuae/falcon-7b-instruct exported/

It shows an error (screenshot attached).
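
Judging by the issue title, the error presumably asks for trust_remote_code=True. That is a transformers loading option rather than an exporters one, so a hedged first step is to load the model in Python (whether the exporters CLI exposes an equivalent flag is not confirmed here):

from transformers import AutoModelForCausalLM, AutoTokenizer

# trust_remote_code=True lets transformers run the custom model code shipped with the checkpoint.
model = AutoModelForCausalLM.from_pretrained("tiiuae/falcon-7b-instruct", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b-instruct")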

SegFormer model exported to CoreML is slow

I was trying to export Segformer models to CoreML but the exported model is slow compared to the same model exported on my own.

I tried to export the model using the following command:

python -m exporters.coreml --model=nvidia/mit-b2 --feature=semantic-segmentation exports/

This model's median prediction time is 500 ms on my MacBook Pro M1 using all the available accelerators (ANE, GPU, CPU), versus 300 ms for the same model exported on my own using coremltools directly.

I did a little profiling with Xcode Instruments to identify the issue. It looks like the model is exported and executed in Float32. This greatly undermines performance, since Float16 data is required for the ANE to be used. Thus the ANE is not used at all, and the model is executed on the GPU only on most devices. Also, Float32 computations are slower than Float16 computations on the GPU, so Float32 should be avoided when possible. In the coremltools documentation Apple suggests using Float16 as a default, and as of version 7.0 Float16 is the default precision for Core ML exports.

With the option --quantize=float16 the inference time is on par with the model I exported (around 300 ms). I suggest using the coremltools default Float16 precision instead of Float32 in order to get the most out of the specialized hardware on Apple platforms.
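
For comparison, a rough sketch of the direct coremltools route in Float16 (the checkpoint, segmentation head, and 512x512 input resolution here are illustrative assumptions):

import torch
import coremltools as ct
from transformers import SegformerForSemanticSegmentation

model = SegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b2", torchscript=True).eval()
example = torch.rand(1, 3, 512, 512)
traced = torch.jit.trace(model, example)

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    compute_precision=ct.precision.FLOAT16,  # explicit here; the default since coremltools 7.0
    inputs=[ct.TensorType(name="pixel_values", shape=example.shape)],
)
mlmodel.save("SegformerFP16.mlpackage")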

I also noted another issue, though it is not related to the exporters framework. In Float16 and with the ANE, the Instruments trace suggests that half of the prediction time is spent in GPU kernels. That is weird since only 1 operator is executed on the GPU in this case: the argmax operation at the end of the model. This slowdown needs further investigation, but it may be due to the large size of the input tensor (1000x512x512). I tried with only 16 output classes and the inference time dropped down to 60 ms.


Exporter being killed

Similar to #61, my exporter process is being killed. I'd like to verify this is a resource constraint, and not an issue in the project. I am running python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 mistral.mlpackage on a M3 MacBook Pro with 18GB of memory.

model-00001-of-00002.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 9.94G/9.94G [07:47<00:00, 21.3MB/s]
model-00002-of-00002.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.54G/4.54G [04:42<00:00, 16.1MB/s]
Downloading shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [12:31<00:00, 375.71s/it]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:25<00:00, 12.58s/it]
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
	- use_cache -> False
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:161: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Patching PyTorch conversion 'log' with <function MistralCoreMLConfig.patch_pytorch_ops.<locals>.log at 0x13a115300>
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__contains__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__getitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__delitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__setitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/4506 [00:00<?, ? ops/s]
Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 4505/4506 [00:01<00:00, 3255.50 ops/s]
Running MIL frontend_pytorch pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00, 13.02 passes/s]
Running MIL default pipeline:  14%|β–ˆβ–        | 10/71 [00:00<00:03, 15.93 passes/s]
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '5409', of the source model, has been renamed to 'var_5409' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž  | 52/71 [03:36<02:09,  6.79s/ passes]
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
  return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 71/71 [07:27<00:00,  6.30s/ passes]
Running MIL backend_mlprogram pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 12/12 [00:00<00:00, 168.96 passes/s]
zsh: killed     python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 
willwalker misty > /opt/homebrew/Cellar/[email protected]/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Is it possible to add support for Kosmos-2

Hello Hugging Face and its wonderful employees!!
I was just checking whether it is possible for me to convert the "https://huggingface.co/microsoft/kosmos-2-patch14-224" model to Core ML so that I can use it on my Mac.

It's an image-to-text (image captioning) model.

I have tried it now, but it says this model is not supported. Is there any way I or we could add support for this?

Thanks!!!!

Error when transforming Roberta models

Hello, I'm currently encountering an issue while transforming models based on RoBERTa.

RoBERTa here is a text classification model for evaluating emotions or whether the content is hateful. So it is supposed to be a very simple text classification model.

I tried to use the exporter with the following models:

  • roberta-large-mnli
  • facebook/roberta-hate-speech-dynabench-r4-target
  • SamLowe/roberta-base-go_emotions

First I was using the web tool to transform the model directly, and I could only select the option "text-generation". When trying directly from the Python tool, the following error is returned:

Traceback (most recent call last):
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/__main__.py", line 178, in <module>
    main()
  File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/__main__.py", line 141, in main
    model_kind, model_coreml_config = FeaturesManager.check_supported_model_or_raise(model, feature=args.feature)
  File "/Users/******/Documents/projects/Tests/exporters/src/exporters/coreml/features.py", line 498, in check_supported_model_or_raise
    raise ValueError(
ValueError: roberta doesn't support feature text-classification. Supported values are: {'text-generation': functools.partial(<bound method CoreMLConfig.from_model_config of <class 'exporters.coreml.models.RobertaCoreMLConfig'>>, task='text-generation'), 'text-generation-with-past': functools.partial(<bound method CoreMLConfig.with_past of <class 'exporters.coreml.models.RobertaCoreMLConfig'>>, task='text-generation')}

If I understand correctly, all models are supposed to be trained and directly available to use. Am I missing a step or a configuration to make them work?

Thank you.

Support for OPT Models

Would be great to figure out how to support OPT models. models.md has a note that OPT is not supported yet:

OPT [TODO verify] Conversion error on a slicing operation.

Bloom still has the same note but is now fully supported by exporters. So I'm wondering if there is actually still an issue with the OPT models, or if the underlying issue was resolved already. If so, then they could be listed as supported. Happy to pitch in if anyone has context on outstanding issues with the OPT models.

Thanks!

Support OneFormer Model

Would it be possible to support the OneFormer model? I am not experienced with ML, but would love to use that model on mobile devices if possible.

Thank you so much!

`GPTNeoX` incompatible with transformers >= 4.28.0

As discovered in #42.

The incompatibility was introduced in huggingface/transformers@7dcd870

Concretely, the reason for the problem lies in the use of torch.gather. When converted to Core ML, this assertion fails if shapes are flexible.

(There's a new implementation of gather_along_axis for iOS17 but by looking at the source code I don't think it would fix the problem).

The obvious workaround is to disable flexible shapes for GPTNeoX. This, in fact, is better for performance as flexible shapes don't seem to be compatible with GPU or ANE.
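
Sketched concretely, disabling flexible shapes means replacing the default (min, max) sequence-length range with a fixed int in the config's input descriptions (the GPTNeoXCoreMLConfig name, the checkpoint, and the sequence_length attribute are assumed from exporters' conventions):

from exporters.coreml import export
from exporters.coreml.models import GPTNeoXCoreMLConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_ckpt = "EleutherAI/pythia-70m"  # hypothetical GPT-NeoX checkpoint for illustration
model = AutoModelForCausalLM.from_pretrained(model_ckpt, torchscript=True)
tokenizer = AutoTokenizer.from_pretrained(model_ckpt)

class FixedShapeGPTNeoXConfig(GPTNeoXCoreMLConfig):
    @property
    def inputs(self):
        input_descs = super().inputs
        input_descs["input_ids"].sequence_length = 128  # a fixed length disables flexible shapes
        return input_descs

mlmodel = export(tokenizer, model, FixedShapeGPTNeoXConfig(model.config, task="text-generation"))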

M2M100 Example?

Hello,
I'm trying to convert M2M100 to CoreML. I saw that it is partially supported, and I was wondering if there's any example script to do this.
Here's what I tried:

from exporters.coreml import export
from exporters.coreml.models import M2M100CoreMLConfig
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
model_ckpt = "facebook/m2m100_418M"
base_model = M2M100ForConditionalGeneration.from_pretrained(
    model_ckpt, torchscript=True
)
preprocessor = M2M100Tokenizer.from_pretrained(model_ckpt)
coreml_config = M2M100CoreMLConfig(
    base_model.config, 
    task="text2text-generation",
    use_past=False,
)
mlmodel = export(
    preprocessor, base_model, coreml_config
)

However, when trying to run this code, I get the following error:

ValueError: You have to specify either decoder_input_ids or decoder_inputs_embeds

Thank you in advance!
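
The error suggests only half of the seq2seq graph is being traced. The CLI converts text2text models as separate encoder and decoder Core ML models; assuming the Python API mirrors that via a seq2seq argument on the config (unverified here), a sketch would be:

from exporters.coreml import export
from exporters.coreml.models import M2M100CoreMLConfig
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_ckpt = "facebook/m2m100_418M"
base_model = M2M100ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = M2M100Tokenizer.from_pretrained(model_ckpt)

# Export the encoder and decoder halves separately.
encoder_config = M2M100CoreMLConfig(base_model.config, task="text2text-generation", seq2seq="encoder")
encoder_mlmodel = export(preprocessor, base_model, encoder_config)

decoder_config = M2M100CoreMLConfig(base_model.config, task="text2text-generation", seq2seq="decoder")
decoder_mlmodel = export(preprocessor, base_model, decoder_config)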

exporting to coreml format throws errors

I am doing this:

python -m exporters.coreml --model=bert-base-uncased exported/

and running into error:

RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".

Did the underlying BERT implementation's API change?

I hit similar errors with some of the other models mentioned in the README (ready-made configurations).

GPTBigCode Support?

Out of sheer curiosity I tried to export bigcode/starcoder to CoreML and got the following error after downloading the weights:
"gpt_bigcode is not supported yet. Only ['bart', 'beit', 'bert', 'big_bird', 'bigbird_pegasus', 'blenderbot', 'blenderbot_small', 'bloom', 'convnext', 'ctrl', 'cvt', 'data2vec', 'distilbert', 'ernie', 'gpt2', 'gpt_neo', 'levit', 'm2m_100', 'marian', 'mobilebert', 'mobilevit', 'mvp', 'pegasus', 'plbart', 'roberta', 'roformer', 'segformer', 'splinter', 'squeezebert', 't5', 'vit', 'yolos']

I understand GPTBigCode is an optimized GPT2 Model with support for Multi-Query Attention.
https://huggingface.co/docs/transformers/model_doc/gpt_bigcode

Python isn't my strong suit but I just wanted to flag this here. Would running Starcoder on CoreML even be feasible or is it too large?

Converting a pipeline

What would it take to convert an entire pipeline to a coreml model?

For instance, I have saved the stable-diffusion checkpoint, and several of the models have their own configs, but of course they're not the ready-made configs.


Would this be just a long, hard, custom slog via exporters and not worth it? Or is there something here worth pursuing?

Error for Keras (TF) models

Seems like each of the Keras models needs a config.json file.

For example,
python -m exporters.coreml --model=keras-io/transformers-qa exported/ works because this model has a config.json, but python -m exporters.coreml --model=keras-io/image-captioning exported/ fails with the message OSError: keras-io/image-captioning does not appear to have a file named config.json. Checkout 'https://huggingface.co/keras-io/image-captioning/main' for available files.

Is there any workaround, or does each of the Keras models need a config file for the exporter to work?

Error when exporting Sentence Transformer to Core ML models

Description

Hi, I encounter the following error when exporting sentence-transformers/all-MiniLM-L6-v2 (a PyTorch model) to a Core ML model.

python -m exporters.coreml --model=sentence-transformers/all-MiniLM-L6-v2 exported/

Using framework PyTorch: 1.12.1
Overriding 1 configuration item(s)
	- use_cache -> False
Skipping token_type_ids input
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops:   0%|          | 0/342 [00:00<?, ? ops/s]
Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops:  99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Š| 340/342 [00:00<00:00, 2753.73 ops/s]
Running MIL Common passes:   0%|          | 0/40 [00:00<?, ? passes/s]
/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/name_sanitization_utils.py:135: UserWarning: Output, '546', of the source model, has been renamed to 'var_546' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL Common passes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 40/40 [00:00<00:00, 233.90 passes/s]
Running MIL Clean up passes: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 11/11 [00:00<00:00, 132.81 passes/s]
/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".
  _warnings.warn(
Validating Core ML model...
Traceback (most recent call last):
  File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 166, in <module>
    main()
  File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 154, in main
    convert_model(
  File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/__main__.py", line 65, in convert_model
    validate_model_outputs(coreml_config, preprocessor, model, mlmodel, args.atol)
  File "/Volumes/swd_yuqi/MLSession/huggingfaceExport/exporters/src/exporters/coreml/validate.py", line 108, in validate_model_outputs
    coreml_outputs = mlmodel.predict(coreml_inputs)
  File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py", line 553, in predict
    raise self._framework_error
  File "/Users/t_wangyu/miniconda3/envs/coremltools-env/lib/python3.10/site-packages/coremltools/models/model.py", line 144, in _get_proxy_and_spec
    return _MLModelProxy(filename, compute_units.name), specification, None
RuntimeError: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model output 'pooler_output' has a different shape than its corresponding return value to main.".

This problem is similar to the one mentioned in #9, so I also tried the workaround suggested there. However, I got the following error when running prediction. Note that "Model.mlpackage" was produced by the command above.

import torch
import transformers
import coremltools as ct

model_name = 'sentence-transformers/all-MiniLM-L6-v2'
tokenizer = transformers.AutoTokenizer.from_pretrained(model_name, use_fast=True)

mlmodel = ct.models.MLModel("Model.mlpackage")

# Workaround from #9: clear the hardcoded shape of the second output,
# then rebuild and save the model from the patched spec.
del mlmodel._spec.description.output[1].type.multiArrayType.shape[:]
mlmodel = ct.models.MLModel(mlmodel._spec, weights_dir=mlmodel.weights_dir)
mlmodel.save("ModelFixed.mlpackage")

sentences = ['This is an example sentence']
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
cml_inputs = {k: v.to(torch.int32).numpy() for k, v in encoded_input.items()}
pred_coreml = mlmodel.predict(cml_inputs)
print(pred_coreml)

What I got was the following error:
KeyError: 'Provided key "token_type_ids", in the input dict, does not match any of the model input name(s), which are: input_ids,attention_mask'
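
A possible workaround (my own sketch, not an exporters API): since token_type_ids was skipped during export, pass only the features the Core ML model actually declares before calling predict():

# Sketch: keep only the inputs declared in the converted model's spec.
input_names = {inp.name for inp in mlmodel._spec.description.input}
cml_inputs = {k: v for k, v in cml_inputs.items() if k in input_names}
pred_coreml = mlmodel.predict(cml_inputs)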

Implement optimizations as in `ane_transformers`

ane_transformers (https://github.com/apple/ml-ane-transformers and https://machinelearning.apple.com/research/neural-engine-transformers) suggests weight-compatible changes to transformers that allow better mapping of the ops to the ANE, resulting in significant performance improvements.

@hollance do you think these optimizations "belong" in πŸ€— Exporters? If yes, how do you envision their implementation: within CoreMLConfig abstraction or somewhere else?
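
For reference, a minimal sketch of the kind of weight-compatible change ane_transformers makes (illustrative only; the helper below is hypothetical, not part of exporters or ane_transformers): an nn.Linear can be replaced by an equivalent 1x1 nn.Conv2d so activations stay in the (batch, channels, 1, seq) layout the Neural Engine prefers.

import torch
import torch.nn as nn

def linear_to_conv2d(linear: nn.Linear) -> nn.Conv2d:
    conv = nn.Conv2d(linear.in_features, linear.out_features,
                     kernel_size=1, bias=linear.bias is not None)
    with torch.no_grad():
        # The (out, in) Linear weight becomes an (out, in, 1, 1) kernel,
        # so the swap is weight-compatible: no retraining needed.
        conv.weight.copy_(linear.weight[:, :, None, None])
        if linear.bias is not None:
            conv.bias.copy_(linear.bias)
    return conv

linear = nn.Linear(768, 3072)
conv = linear_to_conv2d(linear)
x = torch.randn(1, 16, 768)              # (batch, seq, hidden)
x_bc1s = x.transpose(1, 2).unsqueeze(2)  # (batch, hidden, 1, seq)
assert torch.allclose(linear(x), conv(x_bc1s).squeeze(2).transpose(1, 2), atol=1e-4)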

Exporter being killed

Similar to #61, my exporter process is being killed. I'd like to verify this is a resource constraint, and not an issue in the project. I am running python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 mistral.mlpackage on an M3 MacBook Pro with 18 GB of memory.

model-00001-of-00002.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆ| 9.94G/9.94G [07:47<00:00, 21.3MB/s]
model-00002-of-00002.safetensors: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4.54G/4.54G [04:42<00:00, 16.1MB/s]
Downloading shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [12:31<00:00, 375.71s/it]
Loading checkpoint shards: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 2/2 [00:25<00:00, 12.58s/it]
Using framework PyTorch: 2.1.0
Overriding 1 configuration item(s)
	- use_cache -> False
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if (input_shape[-1] > 1 or self.sliding_window is not None) and self.is_causal:
/opt/homebrew/lib/python3.11/site-packages/transformers/modeling_attn_mask_utils.py:161: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if past_key_values_length > 0:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if seq_len > self.max_seq_len_cached:
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:285: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_weights.size() != (bsz, self.num_heads, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:292: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attention_mask.size() != (bsz, 1, q_len, kv_seq_len):
/opt/homebrew/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py:304: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if attn_output.size() != (bsz, self.num_heads, q_len, self.head_dim):
Skipping token_type_ids input
Patching PyTorch conversion 'log' with <function MistralCoreMLConfig.patch_pytorch_ops.<locals>.log at 0x13a115300>
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__contains__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__getitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__delitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
/opt/homebrew/lib/python3.11/site-packages/coremltools/models/_deprecation.py:27: FutureWarning: Function _TORCH_OPS_REGISTRY.__setitem__ is deprecated and will be removed in 7.2.; Please use coremltools.converters.mil.frontend.torch.register_torch_op
  warnings.warn(msg, category=FutureWarning)
Converting PyTorch Frontend ==> MIL Ops:   0%|                                                                                                                                             | 0/4506 [00:00<?, ? ops/s]Saving value type of int64 into a builtin type of int32, might lose precision!
Saving value type of int64 into a builtin type of int32, might lose precision!
Converting PyTorch Frontend ==> MIL Ops: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 4505/4506 [00:01<00:00, 3255.50 ops/s]
Running MIL frontend_pytorch pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00, 13.02 passes/s]
Running MIL default pipeline:  14%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                                                                                                          | 10/71 [00:00<00:03, 15.93 passes/s]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '5409', of the source model, has been renamed to 'var_5409' in the Core ML model.
  warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline:  73%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                      | 52/71 [03:36<02:09,  6.79s/ passes]/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
  return input_var.val.astype(dtype=string_to_nptype(dtype_val))
/opt/homebrew/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:896: RuntimeWarning: overflow encountered in cast
  return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 71/71 [07:27<00:00,  6.30s/ passes]
Running MIL backend_mlprogram pipeline: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 12/12 [00:00<00:00, 168.96 passes/s]
zsh: killed     python3 -m exporters.coreml --model=mistralai/Mistral-7B-v0.1 
willwalker misty > /opt/homebrew/Cellar/[email protected]/3.11.7/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

Export Phi-2

Hi!

I'm converting Microsoft's Phi-2 model to use with swift-transformers.

The conversion process is actually very seamless:

from transformers import AutoTokenizer, AutoModelForCausalLM
from exporters.coreml import CoreMLConfig
from exporters.coreml import export

model = "microsoft/phi-2"

# Load tokenizer and PyTorch weights from the Hub
tokenizer = AutoTokenizer.from_pretrained(model, trust_remote_code=True)
pt_model = AutoModelForCausalLM.from_pretrained(model, trust_remote_code=True, torchscript=True)

class Phi2CoreMLConfig(CoreMLConfig):
    modality = "text"


coreml_config = Phi2CoreMLConfig(pt_model.config, task="text-generation")
mlmodel = export(tokenizer, pt_model, coreml_config)
mlmodel.save("Phi2.mlpackage")

Note that by default the export function uses float32.

Then I'm using the swift-chat repo to run the model, with the Llama-2 tokenizer. Apart from one missing token, the space (' '), it works perfectly well out of the box.

The issue is that it is super slow (I have a MacBook Pro with an M1 and 16 GB of RAM) and it's using close to 11 GB of memory. Although the inference is slow, the output makes sense.

Given that it is so slow, I converted the model using float16:

mlmodel = export(tokenizer, pt_model, coreml_config, quantize="float16")

The model is now 5 GB, but inference gives me gibberish (before, the output made sense; now it's just a bunch of exclamation marks). I also downloaded the 5 GB model onto my iPhone 14 Pro, and after a few seconds, while it is loading, the app just closes itself.

  1. How can I further decrease the model size? Can we quantize the model even more using CoreML?
  2. Why is the inference speed so slow (with the default float32)?
  3. Why is the model with quantize="float16" basically instantaneous, but outputting gibberish?

Thank you so much for the help!
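
For what it's worth on question 1, a sketch of further compressing the exported package with coremltools weight palettization (assuming coremltools >= 7; the 6-bit k-means settings are illustrative choices, not a recommendation):

import coremltools as ct
from coremltools.optimize.coreml import (
    OpPalettizerConfig, OptimizationConfig, palettize_weights,
)

# Load the exported package and cluster each weight tensor to 2**6 values.
mlmodel = ct.models.MLModel("Phi2.mlpackage")
op_config = OpPalettizerConfig(mode="kmeans", nbits=6)
config = OptimizationConfig(global_config=op_config)
compressed = palettize_weights(mlmodel, config)
compressed.save("Phi2-6bit.mlpackage")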

Core ML conversion error for distilbert-base-uncased-squad2 question-answering model - ValueError: node input.19 (gelu) got 2 input(s), expected [1]

I get a gelu ValueError when trying to convert the distilbert-base-uncased-squad2 model. I also get the same error with the full BERT model bert-large-cased-whole-word-masking-finetuned-squad. Is it that the Core ML converter cannot handle 2 inputs, one for the "question" and another for the "context"? How can this be fixed?

from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import torch

tokenizer = AutoTokenizer.from_pretrained('twmkn9/distilbert-base-uncased-squad2')
model = AutoModelForQuestionAnswering.from_pretrained('twmkn9/distilbert-base-uncased-squad2', torchscript=True)

tokenizer.save_pretrained("local-pt-checkpoint")
model.save_pretrained("local-pt-checkpoint")

Command Line> python -m exporters.coreml --model=twmkn9/distilbert-base-uncased-squad2 --feature=question-answering local-pt-checkpoint/

ValueError: node input.19 (gelu) got 2 input(s), expected [1]
