
ymfa / seq2seq-summarizer

356 stars · 15 watchers · 72 forks · 75 KB

Pointer-generator reinforced seq2seq summarization in PyTorch

License: MIT License

Python 100.00%
pytorch seq2seq attention-mechanism pointer-network reinforcement-learning abstractive-summarization coverage-mechanism attention coverage summarizer

seq2seq-summarizer's People

Contributors

mollynatsu, ymfa


seq2seq-summarizer's Issues

./ROUGE-1.5.5.pl: Permission denied

I run the code on a GPU server, but when ROUGE is run to evaluate the results, this error comes out:
/bin/sh: 1: ./ROUGE-1.5.5.pl: Permission denied

Do you know what happened? I can't solve it.
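
The error means the Perl script does not have the execute bit set. One possible fix (a sketch, assuming ROUGE-1.5.5.pl sits in the repository's data/ directory, from which utils.py invokes it according to the traceback further below) is to make the script executable before running evaluation:

    import os
    import stat

    # Assumed location of the script; adjust if it lives elsewhere.
    rouge_script = os.path.join("data", "ROUGE-1.5.5.pl")

    # Add the execute bits (equivalent to `chmod +x ROUGE-1.5.5.pl`).
    mode = os.stat(rouge_script).st_mode
    os.chmod(rouge_script, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)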

UnicodeEncodeError: 'charmap' codec can't encode character

The code in make_cnndm_data.py throws this error:

File "c:/Users/mmiin/pointer_gen/data/makecnndmdata.py", line 116, in
fout.write(" ".join(text) + "\t" + " ".join(summary) + "\n")
File "C:\Users\mmiin\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2032' in position 614: character maps to

How can I fix this issue?
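
On Windows, open() defaults to the cp1252 codec, which cannot represent characters such as '\u2032'. A minimal sketch of the usual fix (the variable names and file path here are illustrative, not necessarily those used in make_cnndm_data.py) is to open the output file with an explicit UTF-8 encoding:

    # Dummy tokens standing in for a real (text, summary) pair.
    text = ["the", "film", "runs", "90\u2032"]
    summary = ["film", "runs"]

    # encoding="utf-8" avoids the cp1252 default used by open() on Windows.
    with open("cnndm.part.txt", "w", encoding="utf-8") as fout:
        fout.write(" ".join(text) + "\t" + " ".join(summary) + "\n")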

word vector

Sorry, but how can I get data/.vector_cache/glove.6B.100d.txt?
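
data/.vector_cache is a torchtext-style cache directory, so one way to obtain the file (a sketch, assuming torchtext is installed; alternatively, download glove.6B.zip from https://nlp.stanford.edu/projects/glove/ and unzip it into that folder) is to let torchtext download and unpack the GloVe 6B vectors:

    from torchtext.vocab import GloVe

    # Downloads glove.6B.zip (~820 MB) on first use and extracts it into the
    # given cache directory, leaving glove.6B.100d.txt there.
    vectors = GloVe(name="6B", dim=100, cache="data/.vector_cache")
    print(vectors.vectors.shape)  # roughly (400000, 100)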

The code in utils is weird. Can you give some explanation?

class Example(NamedTuple):
  src: List[str]
  tgt: List[str]
  src_len: int  # inclusive of EOS, so that it corresponds to tensor shape
  tgt_len: int  # inclusive of EOS, so that it corresponds to tensor shape

The code above does not seem to implement anything in class Example.
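
For context, this is just the typing.NamedTuple idiom rather than an empty class: a body containing only annotated fields is a complete definition, and Python generates the constructor, field access, and tuple behaviour automatically. A small standalone illustration:

    from typing import List, NamedTuple

    class Example(NamedTuple):
        src: List[str]
        tgt: List[str]
        src_len: int  # inclusive of EOS
        tgt_len: int  # inclusive of EOS

    # No methods are needed; the generated constructor and tuple semantics suffice.
    ex = Example(src=["a", "cat", "<EOS>"], tgt=["cat", "<EOS>"], src_len=3, tgt_len=2)
    print(ex.src_len, ex[0])  # 3 ['a', 'cat', '<EOS>']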

The implementation of the coverage mechanism seems to be wrong.

The author of the pointer-generator proposes a method called the coverage mechanism.
The coverage vector is the sum of the attention distributions over all previous decoder timesteps, but the coverage vector in this repository seems to sum attention over the encoder!
Please help me find the correct way to implement the mechanism, or tell me where my mistake is.
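
For reference, a minimal sketch of the coverage mechanism as described by See et al. (illustrative only, not the repository's code): the coverage vector is accumulated over decoder timesteps, and each step's coverage loss penalises re-attending to already-covered source positions.

    import torch

    batch_size, src_len, dec_steps = 2, 7, 5
    coverage = torch.zeros(batch_size, src_len)  # one entry per source position
    coverage_loss = torch.tensor(0.0)

    for t in range(dec_steps):
        # attn would come from the attention module; random here for illustration.
        attn = torch.softmax(torch.randn(batch_size, src_len), dim=1)
        # Penalise overlap between this step's attention and accumulated coverage.
        coverage_loss = coverage_loss + torch.min(attn, coverage).sum(dim=1).mean()
        # Accumulate the attention distributions of previous decoder timesteps.
        coverage = coverage + attn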

Why are beam search results sorted by loss, not by ROUGE score?

Currently, the results of the beam search are sorted by loss:

sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size] (here)

Then the ROUGE score taken is the one corresponding to the first result:

r1 += scores[0]['1_f'] (here)


Why this decision? Aren't we looking for a better ROUGE score (we don't really care about the loss at test time, do we)?
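
To make the question concrete, here is a self-contained toy illustration (using a plain unigram-F1 proxy instead of the repository's ROUGE wrapper): the hypothesis ranked first by model score is not necessarily the one with the best overlap score, so reporting scores[0] understates the oracle score of the beam.

    from collections import Counter

    def unigram_f1(reference: str, hypothesis: str) -> float:
        """Toy unigram-overlap F1, a stand-in for ROUGE-1 F."""
        ref, hyp = Counter(reference.split()), Counter(hypothesis.split())
        overlap = sum((ref & hyp).values())
        if overlap == 0:
            return 0.0
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        return 2 * precision * recall / (precision + recall)

    # Hypothetical beam output, already sorted by avg_log_prob (best model score first).
    hypotheses = ["police arrest suspect downtown",
                  "police arrest a suspect after a robbery downtown",
                  "robbery downtown"]
    reference = "police arrest a suspect after a downtown robbery"

    scores = [unigram_f1(reference, h) for h in hypotheses]
    print(scores[0])    # score of the top-of-beam hypothesis (what is reported)
    print(max(scores))  # oracle score over the whole beam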

The computation of neg_reward is wrong

This code uses the batch-averaged (sample_rouge - baseline_rouge), but that does not make sense mathematically; this term should be sample-wise, because what we really want to maximize is this:
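
(The objective was embedded as an image in the original issue; a sketch of the sample-wise self-critical objective it refers to, in the style of Paulus et al., would be:)

    L_{rl} = \frac{1}{N} \sum_{i=1}^{N} \Big( r(\hat{y}^{(i)}) - r(y^{s,(i)}) \Big) \sum_{t} \log p\big( y_t^{s,(i)} \mid y_{<t}^{s,(i)}, x^{(i)} \big)

i.e. the ROUGE difference between the greedy baseline \hat{y}^{(i)} and the sampled summary y^{s,(i)} is computed per sample i and multiplied by that sample's log-likelihood, rather than first averaging the differences over the batch.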

ROUGE score / training a basic seq2seq with attention: training is too slow

I'm trying to run your (very nice) code with a basic seq2seq-with-attention architecture, in order to get ROUGE scores.

To get a basic seq2seq architecture, I used the following parameters:

python train.py --vocab_size 50000 --hidden_size 256 --dec_hidden_size 256 --embed_size 128 --pointer False --enc_attn_cover False --cover_loss 0 --optimizer adagrad --lr 0.15 --batch_size 16 --n_batches 2000 --val_batch_size 16 --n_val_batches 200 --grad_norm 2 --min_out_len 40 --max_out_len 120 --embed_file None

(removing the coverage mechanism, pointer network, and RL, and setting the same parameters as described in the paper Get To The Point)

I'm training on the CNN/DailyMail dataset.


My problem is that training takes way too long.

It takes 6 s per iteration (training on a GPU); with a batch size of 16, that is about 17,500 iterations per epoch, i.e. roughly 29 h per epoch.

In the paper, the model was trained for 33 epochs, which would mean more than a month of training time with this implementation.


Why is it so slow to train? How can I fix it?

Steps for implementation

Hello,
I hope you are doing well.
I was going through your code and found it pretty interesting. There is a lot I can learn from it. It would be really helpful if you could also describe some installation or getting-started steps: the format of the data, how to train this model, and its software requirements.

Thank you.

When I try to run the script make_cnndm_data.py, it gives an error

I am new to Python and NLP, so I am not sure how to solve this problem.
I would appreciate it if you could tell me how to solve it.

FileNotFoundError: [Errno 2] No such file or directory: '/Users/******/Downloads/seq2seq-summarizer-master/data/cnndm/all_train.txt'

The end of the summarization

I want to ask about the end token of the generated summaries. For example:

extras needed for film that starts island in the island ; ' for rise again '' film '' sowing learning learning tonight year in '' production : starts - island -- in that tonight production - in - extras : 40,000 production : extras commissioner island , , year -- starts stunning tonight production 40,000 production -- in in later -- -- -- island -- -- ) -- -- -- launch also in later -- are month -- -- -- -- bahama ) -- -- production -- extras year fulton , rookie 2 ; bakersfield prohibition . easy ,

The sentence is very long and has many redundancies. I don't know what I should do to make the decoder learn to end the sentence. Is it the beam search mechanism? I want to ask for the author's help. Thanks a lot.

The validation loss increases

Hello, I have a problem.
When I train the model using the Google data, the training loss decreases, but the validation loss starts increasing at around 6; they differ greatly.
I don't know what is wrong.
Also, the results always repeat the same words within a sentence, such as "pia has so transported over hajj hajj" and "new hartford man charged with public lewdness lewdness". I am confused about whether coverage is working.

One problem with ROUGE

I'm now running this model on Colab, but there is a problem with the ROUGE evaluation.

Reading dataset data/cnndm.gz... 287089 pairs.
Vocabulary loaded, 30004 words.
29584 pre-trained embeddings loaded.
Reading dataset data/cnndm.val.gz... 13366 pairs.
Training 3609207 trainable parameters...
Epoch 1:   0% 0/1000 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
  warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Epoch 1: 100% 1000/1000 [1:11:31<00:00,  4.29s/it, loss=4.84341]
Valid 1:   0% 0/100 [00:00<?, ?it/s]/bin/sh: 1: ./ROUGE-1.5.5.pl: not found
Traceback (most recent call last):
  File "train.py", line 232, in <module>
    train(train_gen, v, m, p, val_gen, train_status)
  File "train.py", line 144, in train
    show_cover_loss=params.show_cover_loss)
  File "/content/drive/My Drive/Mylab/seq2seq-summarizer/test.py", line 72, in eval_batch
    scores = rouge(gold_summaries, decoded_batch)
  File "/content/drive/My Drive/Mylab/seq2seq-summarizer/utils.py", line 395, in rouge
    shell=True, cwd=os.path.join(this_dir, 'data'))
  File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command './ROUGE-1.5.5.pl -e data -a -n 2 -2 4 -u /tmp/tmp9eh8e5z4/task.xml' returned non-zero exit status 127.
Valid 1:   0% 0/100 [00:03<?, ?it/s]

I would appreciate it if you could give me some help.

Some questions about ROUGE, and how can I train more quickly?

I run these codes on Google Colab.

  1. About ROUGE
    I downloaded ROUGE-1.5.5.pl from the web and just put the file under the folder without doing anything else (the same file is also in ./data). I do not know if that works.
    By doing this, the "not found" problem disappeared, but there is a new problem: "/bin/sh: 1: ./ROUGE-1.5.5.pl: Permission denied"

I want to know how to solve it.

2. How to train more quickly
I did not modify any parameters; I just ran "python train.py", and it takes me around 1 hour to train one epoch.
I want to know if there is some way to change the implementation to make it quicker.

The first question is the important one, and I would appreciate an early reply.

ROUGE score

Hello, @ymfa
I am now training this model. Could you tell me what ROUGE score I should expect? Thank you.

About rl_loss

Thank you for this excellent work. I still have some questions about rl_loss. Here rl_loss = neg_reward * sample_out.loss, where neg_reward is obtained as greedy_rouge - sample_rouge and sample_out.loss is the cross-entropy loss, i.e. equal to -log P(). However, in the paper the self-critical policy gradient training algorithm uses log P(). This confuses me; could you please explain it?


Update:
I have read the SeqGAN code; according to the policy gradient, the loss is computed as loss += -out[j][target.data[i][j]] * reward[j], where out is the log-softmax output, so the author adds the "-" in order to use gradient descent later.
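
For anyone tracing the sign convention in the update above, here is a toy check (illustrative only, not the repository's code): with a loss of the SeqGAN form -(log P) * reward, a gradient-descent step increases log P whenever the reward (advantage) is positive.

    import torch

    log_p = torch.tensor(-2.0, requires_grad=True)  # log P of a sampled sequence
    reward = 0.05                                    # positive advantage for this sample

    loss = -log_p * reward   # SeqGAN-style: loss += -out[...] * reward
    loss.backward()

    lr = 0.1
    new_log_p = log_p - lr * log_p.grad  # one gradient-descent step
    print(log_p.grad.item())   # -0.05
    print(new_log_p.item())    # -1.995, i.e. log P moved up as intended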
