
Comments (16)

hollance commented on May 18, 2024

Indeed you would have to manage all that stuff yourself.

Edit: It might be useful if we provided some Swift wrapper code for this that would hide the complexity (since it's the same for most Transformer models), but right now we don't have this.

hollance commented on May 18, 2024

I think I originally made it ignore the sequence_length because seq2seq models always need variable-length inputs. Well, unless you're trying to work around Core ML limitations, I guess. ;-)

pcuenca commented on May 18, 2024

Testing T5 is high up on my to-do list; I hope to get to it pretty soon, and hopefully I'll have some insight then :) Sorry for the non-answer though.

seboslaw commented on May 18, 2024

Yikes! I was ready to put my gloves on, but I've now spent two days trying to get the encoder/decoder models to run in Python without going through model.generate, with no success (except for generating gibberish sentences :)

seboslaw commented on May 18, 2024

@hollance Hey, I got around to implementing "that stuff" and have it running in Swift on macOS and iOS now :)
However, the converted model runs exclusively on the CPU (although the Performance Report suggests that some layers are available for GPU / ANE processing, see screenshot). Is there anything I can do to make this happen? Right now it works, but it's rather slow.

[Screenshot: Xcode performance report showing CPU-only execution]

pcuenca commented on May 18, 2024

Hi @seboslaw!

I've recently done a similar exercise, and discovered that if the model accepts flexible shapes, then Core ML only uses the CPU. In the case of sequence-to-sequence models such as T5, the decoder is configured to accept inputs whose length is unbounded, as you can see in the Predictions tab of Xcode (1 x 1... means a batch size of 1 and a sequence length of at least 1, with no upper bound):

[Screenshot: Xcode Predictions tab showing a flexible input shape of 1 × 1...]
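
You can also check this programmatically on a converted package. A minimal sketch (path hypothetical; the "ShapeFlexibility" oneof name comes from coremltools' FeatureTypes.proto):

```python
import coremltools as ct

model = ct.models.MLModel("dec.mlpackage")
spec = model.get_spec()

for inp in spec.description.input:
    array_type = inp.type.multiArrayType
    # A populated ShapeFlexibility oneof (shapeRange or enumeratedShapes)
    # means the input accepts flexible shapes.
    print(inp.name, list(array_type.shape), array_type.WhichOneof("ShapeFlexibility"))
```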

I tried to work around this issue by using fixed shapes, but so far I've only tested autoregressive models. Using a fixed sequence length of, say, 128, makes it possible for Core ML to engage the GPU (even though the ANE is still unused). I'm not sure if this is practical or even possible for the model you are interested in, as the sequence length depends a lot on your particular use case.

In addition, using fixed shapes requires that you prepare your inputs using padding and the appropriate attention masks, which is a bit more work to be done in the Swift code.
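
In Python, that preparation step looks roughly like this; the Swift code would mirror it (a minimal sketch, the checkpoint name is just an example):

```python
import numpy as np
from transformers import T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-base")

# Pad (or truncate) to the fixed length the converted model expects.
encoded = tokenizer(
    "translate English to German: The house is wonderful.",
    padding="max_length",
    max_length=128,
    truncation=True,
    return_tensors="np",
)
input_ids = encoded["input_ids"].astype(np.int32)            # shape (1, 128)
attention_mask = encoded["attention_mask"].astype(np.int32)  # 1 = real token, 0 = padding
```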

This is a very interesting area for us, and as Matthijs mentioned we are considering whether to create some Swift wrappers and a set of "best practices" for conversion to help with these tasks. (No promises though, we're still assessing the problem :)

seboslaw commented on May 18, 2024

Hey @pcuenca, thanks for your reply!

I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters. However, the Performance Report still says "CPU only" (see below) :(

I used coremltools to edit the inputs of my already converted decoder model:

```python
import coremltools

model = coremltools.models.MLModel('../Common/dec.mlpackage')
spec = model.get_spec()

# Bound the flexible sequence dimension of each input.
input = spec.description.input[0]
input.type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 128

input = spec.description.input[1]
input.type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1

input = spec.description.input[2]
input.type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1

input = spec.description.input[3]
input.type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1

output = spec.description.output[0]
output.type.multiArrayType.shapeRange.sizeRanges[1].upperBound = 1

# Rebuild the model from the edited spec (weights_dir is needed for mlprogram models).
model = coremltools.models.MLModel(spec, weights_dir=model.weights_dir)
model.save("YourNewModel.mlpackage")
```

Since this didn't seem to work, I looked into providing the inputs to the hf exporters tool directly. But then I saw in the README that "The sequence_length specified in the configuration object is ignored" if "seq2seq" is provided.

[Screenshots: Xcode performance report still showing CPU-only execution, and the exporters README note about sequence_length]

pcuenca commented on May 18, 2024

> I've tried your suggestion (I think I did :) and updated the upperBounds of the input parameters

Sorry, I think I wasn't clear. I didn't mean to make the upper limit bounded, but to use fixed shapes for all dimensions. This is an example of a model where Core ML uses the GPU for all operations:

[Screenshot: Xcode performance report with all operations running on the GPU; input shapes fixed at 1 × 128]

The first dimension is always 1, and the second dimension is always 128. My apologies for the confusion!

seboslaw commented on May 18, 2024

Hey @pcuenca,

no worries - you were clear, I simply lack experience with the exporter :) I think I understand what needs to be done now. However, it seems that exporters doesn't currently support this, right?

I need to export the T5 as two separate models, thus providing the seq2seq parameter to my custom MLConfig. However, the README states that if I set sequence_length in my custom MLConfig, it will be ignored:

https://github.com/huggingface/exporters/tree/20e849200d2e4fb29711a7ed8f37c7a16234e60f#exporting-an-encoder-decoder-model

> The sequence_length specified in the configuration object is ignored if "seq2seq" is provided.

Why is it this way anyway? And is there a way to get this done aside from patching convert.py?

This is what I've started with (only decoder_input_ids for now):

```python
from collections import OrderedDict

from transformers import T5TokenizerFast, T5ForConditionalGeneration
from exporters.coreml import export
from exporters.coreml.config import InputDescription
from exporters.coreml.models import T5CoreMLConfig

class MyCoreMLConfig(T5CoreMLConfig):
    @property
    def inputs(self) -> OrderedDict[str, InputDescription]:
        input_descs = super().inputs
        input_descs["decoder_input_ids"].sequence_length = 128
        return input_descs

model_ckpt = "Einmalumdiewelt/T5-Base_GNAD"
base_model = T5ForConditionalGeneration.from_pretrained(model_ckpt, torchscript=True)
preprocessor = T5TokenizerFast.from_pretrained(model_ckpt)

coreml_config = MyCoreMLConfig(base_model.config, task="text2text-generation", seq2seq="decoder")
decoder_mlmodel = export(preprocessor, base_model, coreml_config)

decoder_mlmodel.save('Test.mlpackage')
```
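
For completeness, the encoder half would presumably be exported the same way, following the README pattern linked above (a sketch, untested; it uses the stock config because the decoder_input_ids override only concerns the decoder):

```python
# Encoder half: same export call with seq2seq="encoder".
encoder_config = T5CoreMLConfig(base_model.config, task="text2text-generation", seq2seq="encoder")
encoder_mlmodel = export(preprocessor, base_model, encoder_config)
encoder_mlmodel.save("TestEncoder.mlpackage")
```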

seboslaw commented on May 18, 2024

In the meantime I've tried editing the exported MLModel (produced by exporters) through coremltools:

```python
import coremltools
from coremltools.proto import FeatureTypes_pb2

model = coremltools.models.MLModel('../Common/dec.mlpackage')
spec = model.get_spec()

input = spec.description.input[0]

# Create a new MultiArrayType with a fixed (1, 128) shape
new_type = FeatureTypes_pb2.ArrayFeatureType()
new_type.shape.extend([1, 128])
new_type.dataType = FeatureTypes_pb2.ArrayFeatureType.INT32

# Replace the old type with the new one
input.type.multiArrayType.CopyFrom(new_type)

# Rebuild and save (weights_dir is needed for mlprogram models)
model = coremltools.models.MLModel(spec, weights_dir=model.weights_dir)
model.save("YourNewModel.mlpackage")
```

However, I receive this error:

```
/opt/homebrew/lib/python3.10/site-packages/coremltools/models/model.py:146: RuntimeWarning: You will not be able to run predict() on this Core ML model. Underlying exception message was: Error compiling model: "compiler error:  Encountered an error while compiling a neural network model: validator error: Model input 'decoder_input_ids' has a different shape than its corresponding parameter to main.".
  _warnings.warn(
```

So as far as I understand, modifying an exported MLModel is off the table. @pcuenca Do you think doing it the way described in my previous post will be possible?

seboslaw commented on May 18, 2024

@pcuenca no worries and I totally understand :)
Could you tell me real quick though why the sequence_length specified in the configuration object is ignored if "seq2seq" is provided? That way I can maybe start digging into the exporters implementation and try to fix it on my end.

pcuenca commented on May 18, 2024

@seboslaw What you tried to do here used to work, but in newer versions of Core ML it results in the error you've seen. The problem is that the model was compiled with flexible shapes and this is inconsistent with the (fixed) shape you assign later on.

I'm working on a local branch with some quick-and-dirty modifications to convert T5 using fixed shapes. I can push it later today so that you can keep testing on your end.
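
For context, the essential difference is fixing the shape at conversion time rather than editing the spec afterwards. A minimal, self-contained coremltools sketch with a toy module (not the actual exporters change):

```python
import numpy as np
import torch
import coremltools as ct

# Toy stand-in for a decoder; the point is the fully fixed input shape.
class Toy(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(32128, 64)

    def forward(self, input_ids):
        return self.emb(input_ids).sum(dim=-1)

traced = torch.jit.trace(Toy().eval(), torch.zeros(1, 128, dtype=torch.long))

mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    # A fixed (1, 128) shape (no ct.RangeDim) lets Core ML plan for the GPU.
    inputs=[ct.TensorType(name="input_ids", shape=(1, 128), dtype=np.int32)],
)
mlmodel.save("FixedShapeToy.mlpackage")
```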

pcuenca commented on May 18, 2024

@seboslaw This is the branch: #37. I have other local changes, so I hope I didn't break or miss anything. I verified that T5 encoder and decoder export with fixed shapes for all their inputs, and that Xcode's performance report successfully chooses the GPU for all operations. I haven't tried to run inference inside an app yet.

seboslaw commented on May 18, 2024

@pcuenca awesome! I’ll give it a try as soon as I’m in front of my computer. Thanks a lot already for the effort!

seboslaw commented on May 18, 2024

@pcuenca I tried it, but unfortunately it gives different results compared to the non-GPU model. Hopefully I simply messed up the padding. Right now I'm focusing on the decoder. I padded as follows:

- decoder_input_ids: padded with 0s
- decoder_attention_mask: leading 1s the size of the unpadded decoder_input_ids
- encoder_last_hidden_state (1 x 128 x 768): padded the 2nd dimension (formerly 104, now 128) with zero-filled [768] arrays/tensors
- encoder_attention_mask: leading 1s the size of the unpadded decoder_input_ids

Would you say that's correct?

EDIT: Another problem I found is that decoder_output.token_scores has the "wrong" dimensions. Before, my decoder inputs on the very first run looked like this:

- decoder_input_ids: [0]
- decoder_attention_mask: [1]
- encoder_last_hidden_state: Array with dim 1x104x768
- encoder_attention_mask: [1]

decoder_output.token_scores then had the output dimension: 1x1x768.

With the new model my inputs look like this:

- decoder_input_ids: [0,0,0,....0] (dim=128)
- decoder_attention_mask: [1,0,0,0,0,....0] (dim=128)
- encoder_last_hidden_state: MLMultiArray with dim 1x128x768 (the last 24 tensors of the 2nd dim are filled with 0s)
- encoder_attention_mask: [1,0,0,0,0,....0] (dim=128)

decoder_output.token_scores now has the output dimension: 1x128x768.
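
If I understand the fixed-shape setup correctly, that part may be expected: the model now returns scores for all 128 positions, so I presumably need to read the row at the last real-token position instead of position 0. A sketch of what I mean (token_scores assumed to be a NumPy-like array):

```python
import numpy as np

def next_token_id(token_scores: np.ndarray, n_real: int) -> int:
    # token_scores: (1, 128, d) output of the fixed-shape decoder;
    # n_real: number of non-padding decoder tokens fed in (1 on the first step).
    return int(np.argmax(token_scores[0, n_real - 1]))
```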

I'm not experienced with the seq2seq model architecture, but aren't the attention_masks supposed to suppress the additional decoder_input_ids entries/padding?

rishaandesai commented on May 18, 2024

@seboslaw Did you get summarization to work in Swift? How did you implement it? I converted the model, but don't know how to use it, and wasn't able to find much information online.
