Comments (16)
Indeed you would have to manage all that stuff yourself.
Edit: It might be useful if we provided some Swift wrapper code for this that would hide the complexity (since it's the same for most Transformer models) but right now we don't have this.
from exporters.
I think I originally made it ignore the sequence_length because seq2seq models always need variable-length inputs. Well, unless you're trying to work around Core ML limitations, I guess. ;-)
Testing T5 is high on my to-do list; I hope to get to it pretty soon and will hopefully have some insight then :) Sorry for the non-answer though.
Yikes! I was ready to put my gloves on, but I've now spent two days trying to get the encoder / decoder models to run in Python without going through model.generate, without success (except for generating gibberish sentences :)
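For anyone attempting the same, a greedy-decoding loop that bypasses model.generate might look roughly like the sketch below (a minimal illustration, not the exporters' reference implementation; it uses t5-small purely as an example checkpoint):

```python
# Minimal greedy decoding with separate encoder/decoder passes,
# mirroring what model.generate does internally for T5.
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
tokenizer = T5TokenizerFast.from_pretrained("t5-small")

inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)

with torch.no_grad():
    # Run the encoder once; its output is reused at every decoding step.
    encoder_out = model.encoder(
        input_ids=inputs.input_ids, attention_mask=inputs.attention_mask
    )

    # T5 starts decoding from its decoder_start_token_id (the pad token).
    decoder_ids = torch.tensor([[model.config.decoder_start_token_id]])
    for _ in range(32):
        out = model(
            encoder_outputs=encoder_out,
            attention_mask=inputs.attention_mask,
            decoder_input_ids=decoder_ids,
        )
        # Greedy choice: take the highest-scoring token at the last position.
        next_token = out.logits[0, -1].argmax().view(1, 1)
        decoder_ids = torch.cat([decoder_ids, next_token], dim=1)
        if next_token.item() == model.config.eos_token_id:
            break

text = tokenizer.decode(decoder_ids[0], skip_special_tokens=True)
print(text)
```

The key detail is feeding the full decoder_input_ids prefix back in at every step; getting that loop (and the start token) wrong is a common source of gibberish output.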
@hollance Hey, I got around to implementing "that stuff" and have it running in Swift on macOS and iOS now :)
However, the converted model runs exclusively on the CPU (although the Performance Report suggests that some layers are eligible for GPU / ANE processing, see screenshot). Is there anything I can do to make that happen? Right now it works, but it's rather slow.
Hi @seboslaw!
I've recently done a similar exercise, and discovered that if the model accepts flexible shapes, then Core ML only uses the CPU. In the case of sequence-to-sequence models such as T5, the decoder is configured to accept inputs whose length is unbounded, as you can see in the Predictions tab of Xcode (1 x 1... means a batch size of 1 and a sequence length of at least 1, with no upper bound):
I tried to work around this issue by using fixed shapes, but so far I've only tested autoregressive models. Using a fixed sequence length of, say, 128 makes it possible for Core ML to engage the GPU (even though the ANE is still unused). I'm not sure if this is practical or even possible for the model you are interested in, as the sequence length depends a lot on your particular use case.
In addition, using fixed shapes requires that you prepare your inputs using padding and the appropriate attention masks, which is a bit more work to be done in the Swift code.
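As a rough illustration of that extra preparation work (a pure-numpy sketch; the function name and pad id are illustrative, not part of any existing wrapper):

```python
import numpy as np

SEQ_LEN = 128  # the fixed sequence length the model was converted with


def pad_inputs(token_ids, seq_len=SEQ_LEN, pad_id=0):
    """Right-pad token ids to seq_len and build the matching attention mask."""
    n = len(token_ids)
    input_ids = np.full((1, seq_len), pad_id, dtype=np.int32)
    input_ids[0, :n] = token_ids
    # 1 for real tokens, 0 for padding, so attention ignores the padded tail.
    attention_mask = np.zeros((1, seq_len), dtype=np.int32)
    attention_mask[0, :n] = 1
    return input_ids, attention_mask


ids, mask = pad_inputs([37, 629, 19, 1627, 1])
print(ids.shape, int(mask.sum()))  # (1, 128) 5
```

The same layout has to be reproduced on the Swift side when filling the MLMultiArray inputs.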
This is a very interesting area for us, and as Matthijs mentioned we are considering whether to create some Swift wrappers and a set of "best practices" for conversion to help with these tasks. (No promises though, we're still assessing the problem :)
Hey @pcuenca, thx for your reply!
I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters. However, the Performance Report still says "CPU only" (see below) :(
I used coremltools to edit the inputs of my already converted decoder model:
import coremltools

# Load the converted decoder and grab its spec for editing.
model = coremltools.models.MLModel('../Common/dec.mlpackage')
spec = model.get_spec()

# Cap the flexible sequence dimension (index 1) of each input and the output.
spec.description.input[0].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 128
spec.description.input[1].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1
spec.description.input[2].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1
spec.description.input[3].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1
spec.description.output[0].type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1

# Rebuild the model from the edited spec (weights_dir is needed for ML Programs).
model = coremltools.models.MLModel(spec, weights_dir=model.weights_dir)
model.save("YourNewModel.mlpackage")
Since this didn't seem to work, I looked into providing the inputs to the hf exporters tool directly. But then I saw that "the sequence_length specified in the configuration object is ignored" if "seq2seq" is provided.
I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters
Sorry, I think I wasn't clear. I didn't mean to make the upper limit bounded, but to use fixed shapes for all dimensions. This is an example of a model where Core ML uses the GPU for all operations:
The first dimension is always 1, and the second dimension is always 128. My apologies for the confusion!
Hey @pcuenca,
no worries - you were clear, I simply lack experience with the exporter :) I think I understand what needs to be done now; however, it seems that exporters doesn't currently support this, right?
I need to export the T5 as two separate models, thus providing the seq2seq parameter to my custom MLConfig. However, the README states that if I set sequence_length in my custom MLConfig, it will be ignored:
The sequence_length specified in the configuration object is ignored if "seq2seq" is provided.
Why is it this way anyway? And is there a way to get this done aside from patching convert.py?
This is what I've started with (only decoder_input_ids for now):
from collections import OrderedDict

from transformers import T5TokenizerFast, T5ForConditionalGeneration
from exporters.coreml import export
from exporters.coreml.models import T5CoreMLConfig
from exporters.coreml.config import InputDescription

class MyCoreMLConfig(T5CoreMLConfig):
    @property
    def inputs(self) -> OrderedDict[str, InputDescription]:
        input_descs = super().inputs
        input_descs["decoder_input_ids"].sequence_length = 128
        return input_descs

model_ckpt = "Einmalumdiewelt/T5-Base_GNAD"
base_model = T5ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = T5TokenizerFast.from_pretrained(model_ckpt)

coreml_config = MyCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="decoder")
decoder_mlmodel = export(preprocessor, base_model, coreml_config)
decoder_mlmodel.save('Test.mlpackage')
In the meantime I've tried editing the MLModel exported by the exporter through coremltools:
import coremltools
from coremltools.proto import FeatureTypes_pb2

# Load the exported decoder and grab its spec for editing.
model = coremltools.models.MLModel('../Common/dec.mlpackage')
spec = model.get_spec()
input = spec.description.input[0]

# Create a new MultiArrayType with a fixed (1, 128) shape
new_type = FeatureTypes_pb2.ArrayFeatureType()
new_type.shape.extend([1, 128])
new_type.dataType = FeatureTypes_pb2.ArrayFeatureType.INT32

# Replace the old type with the new one
input.type.multiArrayType.CopyFrom(new_type)

# Rebuild and save (weights_dir is needed for ML Programs).
model = coremltools.models.MLModel(spec, weights_dir=model.weights_dir)
model.save("YourNewModel.mlpackage")
However, I receive this error:
/opt/homebrew/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error: Encountered an error while compiling a neural network model: validator error: Model input 'decoder_input_ids' has a different shape than its corresponding parameter to main.".
_warnings.warn(
So, as far as I understand, modifying an exported MLModel is off the table. @pcuenca Do you think doing it the way described in my previous post will be possible?
@pcuenca no worries and I totally understand :)
Could you tell me real quick though why the sequence_length specified in the configuration object is ignored if "seq2seq" is provided? That way I can maybe start digging into the exporters implementation and try to fix it on my end.
@seboslaw What you tried to do here used to work, but in newer versions of Core ML it results in the error you've seen. The problem is that the model was compiled with flexible shapes and this is inconsistent with the (fixed) shape you assign later on.
I'm working in a local branch with some quick and dirty modifications to convert T5 using fixed shapes. I can push it later today so that you can keep testing on your end.
@seboslaw This is the branch: #37. I have other local changes, so I hope I didn't break or miss anything. I verified that T5 encoder and decoder export with fixed shapes for all their inputs, and that Xcode's performance report successfully chooses the GPU for all operations. I haven't tried to run inference inside an app yet.
@pcuenca awesome! I’ll give it a try as soon as I’m in front of my computer. Thanks a lot already for the effort!
@pcuenca I tried it, but unfortunately it gives different results when compared to the non-GPU model. Hopefully, I simply messed up the padding. Right now I'm focussing on the decoder. I padded as follows:
- decoder_input_ids: padded with 0s
- decoder_attention_mask: leading 1s the size of the unpadded decoder_input_ids
- encoder_last_hidden_state (1 x 128 x 768): padded the 2nd dimension (formerly 104, now 128) with zero-filled [768] arrays/tensors
- encoder_attention_mask: leading 1s the size of the unpadded decoder_input_ids
Would you say that's correct?
EDIT: Another problem I found is that the decoder_output.token_scores have the "wrong" dimension. Before, my decoder inputs on the very first run looked like this:
- decoder_input_ids: [0]
- decoder_attention_mask: [1]
- encoder_last_hidden_state: array with dim 1x104x768
- encoder_attention_mask: [1]
decoder_output.token_scores then had the output dimension 1x1x768.
With the new model my inputs look like this:
- decoder_input_ids: [0,0,0,....0] (dim=128)
- decoder_attention_mask: [1,0,0,0,0,....0] (dim=128)
- encoder_last_hidden_state: MLMultiArray with dim 1x128x768 (the last 24 tensors of the 2nd dim are filled with 0s)
- encoder_attention_mask: [1,0,0,0,0,....0] (dim=128)
decoder_output.token_scores now has the output dimension 1x128x768.
I'm not experienced with the seq2seq model architecture, but aren't the attention_masks supposed to suppress the additional decoder_input_ids entries/padding?
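For what it's worth, the masks stop the model from attending to the padded positions, but a fixed-shape decoder still emits a score vector for every slot; the usual approach is to read the scores at the position of the last real token yourself. A minimal numpy sketch (shapes and vocabulary size are made up for the demo, not taken from this model):

```python
import numpy as np

SEQ_LEN, VOCAB = 128, 32  # VOCAB is a made-up demo size
rng = np.random.default_rng(0)

# Fixed-shape decoder output: one score vector per slot, padded or not.
token_scores = rng.standard_normal((1, SEQ_LEN, VOCAB))
attention_mask = np.zeros((1, SEQ_LEN), dtype=np.int32)
attention_mask[0, :1] = 1  # only the first slot holds a real token so far

# Select the row at the last real token's position; this recovers the
# same information the old 1 x 1 x V output carried.
last_pos = int(attention_mask[0].sum()) - 1
next_token = int(token_scores[0, last_pos].argmax())
print(last_pos, next_token)
```

So a 1x128xV output is expected with fixed shapes; only the slice at last_pos is meaningful for picking the next token.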
@seboslaw Did you get summarization to work in Swift? How did you implement it? I converted the model, but don't know how to use it, and wasn't able to find much information online.