
LongPegasus

A Longer Version of Pegasus TF Model For Abstractive Summarization 🤖


The LongPegasus package induces Longformer self-attention over a base Pegasus abstractive summarization model to increase the token limit and performance. Pegasus is a large Transformer-based encoder-decoder model with a pre-training objective tailored to abstractive summarization: the objective, called "Gap Sentence Generation (GSG)", masks important sentences in a document and trains the model to generate these gap sentences. The Longformer, on the other hand, is a Transformer that replaces the full attention mechanism (with its quadratic dependency on sequence length) with a novel attention mechanism that scales linearly with the input sequence length. Consequently, the Longformer can process sequences up to 4,096 tokens long, 8 times longer than BERT, which is limited to 512 tokens. This package plugs the Longformer's attention mechanism into Pegasus in order to perform abstractive summarization on long documents. The base modules are built on the TensorFlow platform.
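To see why the windowed attention scales linearly while full attention scales quadratically, here is a minimal NumPy sketch (purely illustrative, not part of the package) comparing the number of attention scores computed for a single head:

import numpy as np

def full_attention_scores(q, k):
    # every query attends to every key: n * n scores (quadratic in n)
    return q @ k.T

def windowed_attention_scores(q, k, window=4):
    # each query attends only to keys within +/- window//2 positions:
    # n * (window + 1) scores (linear in n)
    n, _ = q.shape
    half = window // 2
    scores = np.full((n, window + 1), -np.inf)
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        scores[i, : hi - lo] = q[i] @ k[lo:hi].T
    return scores

n, d = 4096, 64
q, k = np.random.randn(n, d), np.random.randn(n, d)
print(full_attention_scores(q, k).size)      # 16777216 scores
print(windowed_attention_scores(q, k).size)  # 20480 scores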

Usage

The package can be installed from PyPI. The latest stable release is 0.3, which resolves an issue related to the Keras base layer (Issue #2):

!pip install LongPegasus==0.3

Using version 0.1

For an older version (0.1), the same syntax can be used with the appropriate version pin. Version 0.1 does not support different pretrained Pegasus summarization models from Hugging Face and resorts to the default pretrained Pegasus model from Google. The sentencepiece package also has to be installed manually (e.g. on Google Colab) for this version. The driver_test_long_model.py file contains the steps to run the package, described as follows:

  • Importing the LongPegasus module from the package
from LongPegasus.LongPegasus import LongPegasus
  • Instantiating an object from that module class and calling the function create_long_model.
long_ = LongPegasus()
model, tokenizer = long_.create_long_model(save_model="E:\\Pegasus\\", attention_window=512, max_pos=4096)
  • This stores the model and the tokenizer in the 'save_model' folder. The arguments include attention_window (extendable up to 4096) and max_pos, which is the default Longformer encoder size (4096 tokens). For version 0.1, this only creates a long form of the pegasus-xsum model for summarization.

  • The model and tokenizer can then be loaded and used for inference, either from the stored results in the save folder or directly with TFPegasusForConditionalGeneration, as sketched below:
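A minimal loading sketch, assuming the model was previously saved to the illustrative E:\\Pegasus\\ folder used above:

from transformers import TFPegasusForConditionalGeneration, PegasusTokenizer

# reload the long model and tokenizer from the save_model folder
model = TFPegasusForConditionalGeneration.from_pretrained("E:\\Pegasus\\")
tokenizer = PegasusTokenizer.from_pretrained("E:\\Pegasus\\")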

ARTICLE_TO_SUMMARIZE = (
    "PG&E stated it scheduled the blackouts in response to forecasts for high winds "
    "amid dry conditions. The aim is to reduce the risk of wildfires. Nearly 800 thousand customers were "
    "scheduled to be affected by the shutoffs which were expected to last through at least midday tomorrow."
    )
# truncate inputs to the extended 4096-token encoder limit
inputs = tokenizer([ARTICLE_TO_SUMMARIZE], max_length=4096, truncation=True, return_tensors='tf')
    
# Generate Summary
summary_ids = model.generate(inputs['input_ids'])
print([tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False) for g in summary_ids])

The following is the output from the google/pegasus-xsum model for the input article:

['Thousands of people have been affected by wildfires across the US over the past few weeks.']
  • Sentencepiece is not installed by default in this version and requires a manual installation via pip:
!pip install sentencepiece
  • For inference, it is important to specify 'return_tensors' as tf, since the module uses the TensorFlow backend.

Using the latest versions 0.3 (& 0.2)

  • The only difference is in the arguments of create_long_model. There is an additional parameter called 'model_name', which can be None or any model from the pretrained model list for Pegasus. If the model_name parameter is None, the default 'google/pegasus-xsum' model is loaded, as in version 0.1. The syntax for the create_long_model method is as follows:
model_name = 'human-centered-summarization/financial-summarization-pegasus'
model, tokenizer = long_.create_long_model(save_model="E:\\Pegasus\\", attention_window=4096, max_pos=4096, model_name=model_name)
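Passing model_name=None instead falls back to the version 0.1 behaviour; a sketch mirroring the earlier example:

# loads the default google/pegasus-xsum checkpoint
model, tokenizer = long_.create_long_model(save_model="E:\\Pegasus\\", attention_window=512, max_pos=4096, model_name=None)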

The rest of the code segment from the previous version is unchanged. It is important to highlight that model_name should be either None or a valid Pegasus model from the Hugging Face model hub.

  • On Google Colab (and possibly other notebook environments), sentencepiece must be installed for transformers to function properly. This is done by default from version 0.2 onwards, so there is no need to install it manually.

  • For inference, it is important to specify 'return_tensors' as tf, since the module uses the TensorFlow backend.

  • Due to an issue with the Keras base layer arguments, Colab sometimes raises an error for the 'trainable' argument. This is resolved in the 0.3 stable version. A Colab notebook is available in the repository.

Finetuning with the LongPegasus models

The models and tokenizers stored on local drives through this package are longer versions of Pegasus and can be finetuned for different downstream tasks as well. Follow-up notebooks on this are planned, and the Hugging Face site contains steps for finetuning the models.
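A minimal finetuning sketch, assuming a recent transformers release (which uses the model's built-in seq2seq loss when 'labels' are supplied to fit) and the illustrative save path from above:

import tensorflow as tf
from transformers import TFPegasusForConditionalGeneration, PegasusTokenizer

model = TFPegasusForConditionalGeneration.from_pretrained("E:\\Pegasus\\")
tokenizer = PegasusTokenizer.from_pretrained("E:\\Pegasus\\")

# toy document/summary pair; substitute a real summarization dataset
docs = ["A very long document about wildfires and power shutoffs ..."]
summaries = ["A short summary."]

enc = tokenizer(docs, max_length=4096, truncation=True, padding=True, return_tensors="tf")
labels = tokenizer(summaries, max_length=64, truncation=True, padding=True, return_tensors="tf").input_ids

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5))
model.fit(x={"input_ids": enc.input_ids, "attention_mask": enc.attention_mask, "labels": labels}, epochs=1)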

Samples

More example notebooks will be shared from Kaggle/Colab. In the meantime, the package can be tried on Kaggle as well. A simple walkthrough is provided in the Colab link.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT


longpegasus's Issues

Cannot get started

Downloading: 100% 1.24k/1.24k [00:00<00:00, 24.3kB/s]
Downloading: 100% 2.12G/2.12G [01:17<00:00, 28.3MB/s]
All model checkpoint layers were used when initializing TFPegasusForConditionalGeneration.

All the layers of TFPegasusForConditionalGeneration were initialized from the model checkpoint at human-centered-summarization/financial-summarization-pegasus.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFPegasusForConditionalGeneration for predictions without further training.
Downloading: 100% 1.82M/1.82M [00:00<00:00, 7.27MB/s]
Downloading: 100% 1.31k/1.31k [00:00<00:00, 29.2kB/s]
Downloading: 100% 1.40k/1.40k [00:00<00:00, 29.2kB/s]
max_encoder_position_embeddings: 4096
max_decoder_position_embeddings: 512

TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 model_name='human-centered-summarization/financial-summarization-pegasus'
----> 2 model,tokenizer=long_.create_long_model(save_model=".\Pegasus\", attention_window=4096, max_pos=4096,model_name=model_name)

3 frames
/usr/local/lib/python3.7/dist-packages/keras/engine/base_layer.py in __init__(self, trainable, name, dtype, dynamic, **kwargs)
    339             trainable.dtype is tf.bool)):
    340           raise TypeError(
--> 341               'Expected trainable argument to be a boolean, '
    342               f'but got: {trainable}')
    343         self._trainable = trainable
TypeError: Expected trainable argument to be a boolean, but got: LongformerPegasusConfig {
  "_name_or_path": "google/pegasus-xsum",
  "activation_dropout": 0.1,
  "activation_function": "relu",
  "add_bias_logits": false,
  "add_final_layer_norm": true,
  "architectures": ["LongformerForPegasus"],
  "attention_dilation": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
  "attention_dropout": 0.1,
  "attention_mode": "sliding_chunks",
  "attention_probs_dropout_prob": 0.1,
  "attention_window": [4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096, 4096],
  "autoregressive": false,
  "bos_token_id": 0,
  "classif_dropout": 0.0,
  "classifier_dropout": 0.0,
  "d_model": 1024,
  "decoder_attention_heads": 16,
  "decoder_ffn_dim": 4096,
  "decoder_layerdrop": 0.0,
  "decoder_layers": 16,
  "decoder_start_token_id": 0,
  "do_blenderbot_90_layernorm": false,
  "dropout": 0.1,
  "encoder_attention_heads": 16,
  "encoder_ffn_dim": 4096,
  "encoder_layerdrop": 0.0,
  "encoder_layers": 16,
  "eos_token_id": 1,
  "extra_pos_embeddings": 1,
  "force_bos_token_to_be_generated": false,
  "forced_eos_token_id": 1,
  "gradient_checkpointing": false,
  "hidden_dropout_prob": 1e-05,
  "id2label": {"0": "LABEL_0", "1": "LABEL_1", "2": "LABEL_2"},
  "init_std": 0.02,
  "initializer_range": null,
  "is_encoder_decoder": true,
  "label2id": {"LABEL_0": 0, "LABEL_1": 1, "LABEL_2": 2},
  "layer_norm_eps": 1e-05,
  "length_penalty": 0.6,
  "max_decoder_position_embeddings": 512,
  "max_encoder_position_embeddings": 4096,
  "max_length": 64,
  "max_position_embeddings": 512,
  "model_type": "pegasus",
  "normalize_before": true,
  "normalize_embedding": false,
  "num_beams": 8,
  "num_hidden_layers": 16,
  "pad_token_id": 0,
  "scale_embedding": true,
  "static_position_embeddings": true,
  "transformers_version": "4.12.5",
  "use_cache": true,
  "vocab_size": 96103
}

Something about the attention_mask setting

Hi, I'm just interested in your work. I'm trying to turn BART into a Longformer version. In your test program, you just use the Pegasus tokenizer to tokenize the sentence and don't care about the attention_mask, right?
I mean, in the Hugging Face Longformer version, "-1, 0, 1" means "no attention, local attention, and global attention", but this is different from the definition of the BART tokenizer (where 1 means attention and 0 means no attention; does Pegasus work the same way?). So when it comes to the attention_mask, the situation may be more complicated, right? The user has to define the attention_mask first.
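For reference, in the Hugging Face Longformer implementation the user-facing attention_mask keeps the familiar 0/1 convention (1 = attend, 0 = padding), the same as the BART and Pegasus tokenizers produce; global attention is requested through a separate global_attention_mask. A minimal sketch with the stock Longformer checkpoint (illustrative, outside this package):

import tensorflow as tf
from transformers import LongformerTokenizer, TFLongformerModel

tokenizer = LongformerTokenizer.from_pretrained("allenai/longformer-base-4096")
model = TFLongformerModel.from_pretrained("allenai/longformer-base-4096")

inputs = tokenizer("A long document ...", return_tensors="tf")

# global attention on the first token only: 1 = global, 0 = local
global_attention_mask = tf.concat(
    [tf.ones_like(inputs["input_ids"][:, :1]), tf.zeros_like(inputs["input_ids"][:, 1:])],
    axis=-1,
)
outputs = model(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # 0/1 padding mask, as with BART/Pegasus
    global_attention_mask=global_attention_mask,
)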
