Summarization, translation, sentiment-analysis, text-generation and more at blazing speed using a T5 version implemented in ONNX.

License: Apache License 2.0


onnxt5's Introduction

ONNX T5

Summarization, translation, Q&A, text generation and more at blazing speed using a T5 version implemented in ONNX.

This package is still in alpha, so some functionality, such as beam search, is still in development.

Installation

ONNX-T5 is available on PyPI.

pip install onnxt5

For the development version, run the following:

git clone https://github.com/abelriboulot/onnxt5
cd onnxt5
pip install -e .

Usage

The simplest way to get started for generation is to use the default pre-trained version of T5 on ONNX included in the package.

NOTE: the first time you call get_encoder_decoder_tokenizer, the models are downloaded, which may take a minute or two.

from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'

output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."

Other tasks only require changing the prefix in your prompt, for instance for summarization:

prompt = 'summarize: <PARAGRAPH>'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)

If you want to get the embeddings of a text, you can run the following:

from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)

ONNXT5 also lets you export and use your own models. See the examples/ folder for more detailed examples.
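As a rough illustration of what an export looks like, the encoder of a Hugging Face T5 model can be exported with torch.onnx.export along these lines (a sketch with a hypothetical output path and wrapper, not necessarily the exact helper used in examples/):

import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

class EncoderWrapper(torch.nn.Module):
    """Wrap the T5 encoder so the exporter sees a module returning a plain tensor."""
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder

    def forward(self, input_ids):
        return self.encoder(input_ids=input_ids)[0]

model = T5ForConditionalGeneration.from_pretrained('t5-base')
tokenizer = T5Tokenizer.from_pretrained('t5-base')
dummy_input = tokenizer('translate English to French: example', return_tensors='pt').input_ids

torch.onnx.export(
    EncoderWrapper(model.encoder),
    dummy_input,
    't5-encoder.onnx',  # hypothetical output path
    input_names=['input_ids'],
    output_names=['hidden_states'],
    dynamic_axes={'input_ids': {0: 'batch', 1: 'sequence'},
                  'hidden_states': {0: 'batch', 1: 'sequence'}},
    opset_version=12,
)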

T5 works with task prefixes such as summarize:, translate English to German:, or question: ... context:. You can find the list of pretrained tasks and their prefixes in Appendix D of the original paper.

Functionalities

  • Run any of the pretrained T5 tasks in a single line (translation, summarization, sentiment analysis, completion, generation)
  • Export your own T5 models to ONNX easily
  • Utility functions to generate what you need quickly
  • Up to 4X speedup compared to PyTorch execution for smaller contexts

Benchmarks

The speedup varies heavily with the length of the context. For contexts shorter than ~500 words, ONNX greatly outperforms PyTorch, with up to a 4X speedup. The longer the context, the smaller the gain, and PyTorch becomes faster above ~500 words.

GPU Benchmark, Embedding Task

GPU Benchmark, Generation Task
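A rough way to reproduce the comparison for a single prompt (an illustrative sketch, not the exact benchmark script; the prompt and max_length are arbitrary):

import time
from transformers import T5ForConditionalGeneration, T5Tokenizer
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

prompt = 'translate English to French: I was a victim of a series of accidents.'

# ONNX path
decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
onnx_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
start = time.perf_counter()
onnx_t5(prompt, max_length=32, temperature=0.)
print('ONNX   :', time.perf_counter() - start)

# PyTorch path
torch_model = T5ForConditionalGeneration.from_pretrained('t5-base')
torch_tokenizer = T5Tokenizer.from_pretrained('t5-base')
input_ids = torch_tokenizer(prompt, return_tensors='pt').input_ids
start = time.perf_counter()
torch_model.generate(input_ids, max_length=32)
print('PyTorch:', time.perf_counter() - start)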

Contributing

The project is still in its infancy, so I would love your feedback: what problems you are trying to solve, what issues you're encountering, and which features would help you. Feel free to send me an e-mail (see my profile for the address!) or join our Slack community.

Acknowledgements

This repo is based on the work of Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu from Google, as well as the Hugging Face team's implementation of T5, the work of the Microsoft ONNX and onnxruntime teams (in particular Tianlei Wu), and the work of Thomas Wolf on text generation.

Original T5 Paper

@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}

Microsoft onnxruntime repo

HuggingFace implementation of T5

onnxt5's People

Contributors

brymck, ki6an


onnxt5's Issues

Inference time on gpu vs onnxt5-gpu

@abelriboulot, @Ki6an, @brymck:
I have finetuned a T5 model for a paraphrasing task as described here: Paraphrase with t5.

I want to reduce inference time, so I exported the finetuned T5 model using onnxt5. However, the ONNX model on GPU takes more time than the PyTorch model on GPU.

PyTorch model on GPU:
time taken = 0.2357314471155405
time taken = 0.24958523781970143
time taken = 0.20342689706012607
time taken = 0.5490081580355763
time taken = 0.10756197292357683

ONNX model on GPU (onnxt5):
time taken = 0.5277913622558117
time taken = 0.6335883080027997
time taken = 0.6975196991115808
time taken = 1.9159171842038631
time taken = 0.7938353712670505

Did I make a mistake in exporting or loading the model?
gpu code
onnxt5-gpu code
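As a side note rather than a diagnosis of the numbers above, it can be worth confirming that the ONNX sessions actually run on the CUDA execution provider, since onnxruntime silently falls back to CPU when the GPU build or its CUDA dependencies are missing. A minimal check, assuming encoder_sess and decoder_sess are the loaded sessions:

import onnxruntime as ort

print(ort.get_device())              # 'GPU' only with the onnxruntime-gpu build installed
print(encoder_sess.get_providers())  # 'CUDAExecutionProvider' should come first
print(decoder_sess.get_providers())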

How to suppress output

How to suppress output?
Setting the verbosity logging level does nothing.
5%|█████████▊ | 16/300 [00:01<00:18, 15.65it/s]
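The bar format suggests it comes from tqdm rather than the logging module, so one blunt workaround (an unverified sketch, assuming the progress bar is created via tqdm/trange) is to disable tqdm globally before generating:

from functools import partialmethod
from tqdm import tqdm

# Disable every tqdm progress bar created after this point (including trange).
tqdm.__init__ = partialmethod(tqdm.__init__, disable=True)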

Running example "export_pretrained_model.py" as-is fails (See details)

86%|████████▌ | 18/21 [00:00<00:00, 44.29it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-f543e3365977> in <module>()
     27 # Generating text
     28 generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
---> 29 generative_t5('translate English to French: I was a victim of a series of accidents.', 21, temperature=0.)[0]

3 frames
/usr/local/lib/python3.7/dist-packages/transformers/tokenization_utils_fast.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
    505         if isinstance(token_ids, int):
    506             token_ids = [token_ids]
--> 507         text = self._tokenizer.decode(token_ids, skip_special_tokens=skip_special_tokens)
    508 
    509         if clean_up_tokenization_spaces:

TypeError: 'float' object cannot be interpreted as an integer

Any possible version conflicts that you know of?

Use OnnxRuntime IO Binding to improve GPU inference performance

In the current benchmark results, ONNX is slower than PyTorch above 500 words. I think the cause is the OnnxRuntime API used for inference:

encoder_outputs_prompt = self.encoder.run(None, {"input_ids": generated.cpu().numpy()})[0]

For GPU inference, that API needs extra memory copies (from CPU to GPU for input tensors, and from GPU to CPU for output tensors). When the sequence length is large, the IO latency can be significant.

I suggest trying OnnxRuntime IO Binding to avoid the extra memory copies.
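For reference, an IO binding call on the encoder session might look roughly like this (a sketch assuming the onnxruntime-gpu build, that encoder_sess is the encoder InferenceSession, and that the exported graph names its output 'hidden_states'; the actual output name may differ):

import numpy as np
import onnxruntime as ort

# Example token ids; in practice these come from the tokenizer.
input_ids = np.array([[37, 423, 215, 1]], dtype=np.int64)

binding = encoder_sess.io_binding()

# Keep the input on the GPU so run() does not copy it from host memory each step.
input_ortvalue = ort.OrtValue.ortvalue_from_numpy(input_ids, 'cuda', 0)
binding.bind_ortvalue_input('input_ids', input_ortvalue)

# Allocate the output on the GPU as well and fetch it only when needed.
binding.bind_output('hidden_states', 'cuda')

encoder_sess.run_with_iobinding(binding)
encoder_outputs = binding.copy_outputs_to_cpu()[0]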

cpu only inferencing

Hi there,
For text translation tasks, can the ONNX model be run using the CPU only? How much RAM does it require?
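In principle this only requires creating the onnxruntime sessions with the CPU execution provider; a minimal sketch (the .onnx paths are placeholders for wherever the exported models live, and RAM usage depends on the model size):

import onnxruntime as ort
from transformers import T5Tokenizer
from onnxt5 import GenerativeT5

# Placeholder paths: point these at the exported encoder/decoder ONNX files.
encoder_sess = ort.InferenceSession('t5-encoder.onnx', providers=['CPUExecutionProvider'])
decoder_sess = ort.InferenceSession('t5-decoder-with-lm-head.onnx', providers=['CPUExecutionProvider'])
tokenizer = T5Tokenizer.from_pretrained('t5-base')

model = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
output_text, _ = model('translate English to French: How are you?', max_length=32, temperature=0.)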

Can this be used with Flan-T5?

Specifically, I am using google/flan-t5-large in a Colab, but the inference time is rather slow for my needs. Can it benefit from onnxt5?

int() argument must be a string, when running the example

Hello, I can't run the first example:

from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'

output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
 # output_text: "J'ai été victime d'une série d'accidents." 

The model starts computing, but before the end I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-257f12b63043> in <module>
      5 prompt = 'translate English to French: I was a victim of a series of accidents.'
      6 
----> 7 output_text, output_logits = generative_t5(prompt, max_length=16, temperature=0.)
      8 # output_text: "J'ai été victime d'une série d'accidents."

~\Anaconda3\envs\onnxt5\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    720             result = self._slow_forward(*input, **kwargs)
    721         else:
--> 722             result = self.forward(*input, **kwargs)
    723         for hook in itertools.chain(
    724                 _global_forward_hooks.values(),

~\Anaconda3\envs\onnxt5\lib\site-packages\onnxt5\models.py in forward(self, prompt, max_length, temperature, repetition_penalty, top_k, top_p, max_context_length)
    145                 new_tokens.append(next_token)
    146 
--> 147             return self.tokenizer.decode(new_tokens), new_logits

~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils_base.py in decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, **kwargs)
   3000             skip_special_tokens=skip_special_tokens,
   3001             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
-> 3002             **kwargs,
   3003         )
   3004 

~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils.py in _decode(self, token_ids, skip_special_tokens, clean_up_tokenization_spaces, spaces_between_special_tokens)
    730         spaces_between_special_tokens: bool = True,
    731     ) -> str:
--> 732         filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
    733 
    734         # To avoid mixing byte-level and unicode for byte-level BPT

~\Anaconda3\envs\onnxt5\lib\site-packages\transformers\tokenization_utils.py in convert_ids_to_tokens(self, ids, skip_special_tokens)
    708         tokens = []
    709         for index in ids:
--> 710             index = int(index)
    711             if skip_special_tokens and index in self.all_special_ids:
    712                 continue

TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

I have no idea how to solve this. If you have any solution, please share. Thanks!

Implement beam search

A next step for better generation is to implement beam search. An example can be seen in the huggingface repo here; this would require adding such a function to the GenerativeT5 model in onnxt5/models.py.
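For illustration, the core bookkeeping of a basic beam search over per-step log-probabilities could look roughly like the sketch below; step(tokens) is a hypothetical stand-in for a decoder forward pass returning next-token log-probabilities, not the actual GenerativeT5 integration:

import numpy as np

def beam_search(step, bos_token_id, eos_token_id, num_beams=4, max_length=20):
    """Generic beam search; step(tokens) returns log-probabilities over the vocabulary."""
    beams = [([bos_token_id], 0.0)]  # (token sequence, cumulative log-probability)
    for _ in range(max_length):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == eos_token_id:
                candidates.append((tokens, score))  # finished beams carry over unchanged
                continue
            log_probs = step(tokens)
            for token_id in np.argsort(log_probs)[-num_beams:]:
                candidates.append((tokens + [int(token_id)], score + float(log_probs[token_id])))
        # Keep only the num_beams highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(tokens[-1] == eos_token_id for tokens, _ in beams):
            break
    return beams[0][0]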
