ymfa / seq2seq-summarizer
Pointer-generator reinforced seq2seq summarization in PyTorch
License: MIT License
Good job. How about ROUGE scores?
The code in make_cnndm_data.py throws this error:
File "c:/Users/mmiin/pointer_gen/data/make_cnndm_data.py", line 116, in <module>
  fout.write(" ".join(text) + "\t" + " ".join(summary) + "\n")
File "C:\Users\mmiin\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2032' in position 614: character maps to <undefined>
How can I fix this issue?
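The error comes from Windows' default text encoding (cp1252), which cannot represent characters like U+2032 (the prime symbol). Presumably the `open()` call that creates `fout` in make_cnndm_data.py just needs an explicit `encoding='utf-8'`. A minimal self-contained reproduction of the fix:

```python
# cp1252 (the Windows default) cannot encode '\u2032'; passing
# encoding='utf-8' to open() avoids the UnicodeEncodeError entirely.
import os
import tempfile

text = "prime \u2032 symbol"
path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(path, "w", encoding="utf-8") as fout:   # explicit encoding is the fix
    fout.write(text + "\n")

with open(path, "r", encoding="utf-8") as fin:    # read back with the same encoding
    round_trip = fin.read().strip()
```

The same `encoding='utf-8'` argument should be added wherever the script opens its output files for writing.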
Sorry, but how can I get data/.vector_cache/glove.6B.100d.txt?
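The `.vector_cache` path follows torchtext's caching convention; the file itself comes from unzipping the Stanford NLP `glove.6B.zip` archive. A stdlib-only sketch (the URL and archive member name are the standard Stanford distribution; `ensure_glove` is a hypothetical helper, not part of this repository):

```python
# Sketch: fetch the GloVe 6B vectors into the cache path the repo expects.
import os
import urllib.request
import zipfile

GLOVE_URL = 'https://nlp.stanford.edu/data/glove.6B.zip'  # Stanford NLP distribution

def glove_path(cache_dir=os.path.join('data', '.vector_cache')):
    """Path where the training script expects the 100-d vectors."""
    return os.path.join(cache_dir, 'glove.6B.100d.txt')

def ensure_glove(cache_dir=os.path.join('data', '.vector_cache')):
    """Download and extract glove.6B.100d.txt if it is not already cached."""
    txt = glove_path(cache_dir)
    if not os.path.exists(txt):
        os.makedirs(cache_dir, exist_ok=True)
        zip_path = os.path.join(cache_dir, 'glove.6B.zip')
        urllib.request.urlretrieve(GLOVE_URL, zip_path)   # ~822 MB download
        with zipfile.ZipFile(zip_path) as zf:
            zf.extract('glove.6B.100d.txt', cache_dir)
    return txt
```

Calling `ensure_glove()` once before training would populate the cache; alternatively, download and unzip the archive manually into `data/.vector_cache/`.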
class Example(NamedTuple):
    src: List[str]
    tgt: List[str]
    src_len: int  # inclusive of EOS, so that it corresponds to tensor shape
    tgt_len: int  # inclusive of EOS, so that it corresponds to tensor shape

The code above defines the class Example, but it never seems to be used anywhere.
The author of pointer-generator proposes a method called the coverage mechanism.
The coverage vector should be the sum of the attention distributions over all previous decoder timesteps, but the coverage vector in this repository seems to sum the attention over encoder steps instead!
Please help me find the correct way to implement the mechanism, or tell me where my mistake is.
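For reference, a minimal plain-Python sketch of the coverage mechanism as described by See et al. (2017), with toy numbers and hypothetical variable names: the coverage vector c^t is the running sum of the attention distributions a^1 ... a^(t-1) from previous decoder steps, and each a^t is itself a distribution over the encoder positions (so "summing attention over the encoder" and "summing over decoder steps" can look similar in code):

```python
# Coverage: c^1 = 0;  c^{t+1} = c^t + a^t, summed over decoder steps.
# Each a^t is an attention distribution over the encoder positions.
num_enc_steps = 4
coverage = [0.0] * num_enc_steps          # c^1 = zero vector

attn_history = [                          # toy attention distributions
    [0.7, 0.1, 0.1, 0.1],                 # a^1 (decoder step 1)
    [0.1, 0.6, 0.2, 0.1],                 # a^2 (decoder step 2)
]

cov_losses = []
for attn in attn_history:                 # one iteration per decoder step
    # The coverage loss at step t is sum_i min(a^t_i, c^t_i); it penalizes
    # attending again to encoder positions that are already covered.
    cov_losses.append(sum(min(a, c) for a, c in zip(attn, coverage)))
    coverage = [c + a for c, a in zip(coverage, attn)]   # accumulate
```

In the real model, `coverage` would additionally be fed into the attention score computation at each step; this sketch only shows the accumulation and the loss term.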
Currently, the results of the beam search are sorted by the loss:
sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size]
Then the ROUGE score taken is the one corresponding to the first result:
r1 += scores[0]['1_f']
Why this decision? Aren't we looking for a better ROUGE score (we don't really care about the loss at test time, do we)?
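For context, a hedged sketch of what sorting by `avg_log_prob` typically means: hypotheses are ranked by their total log-probability divided by their length, so longer candidates are not unfairly penalized. This is a model-confidence ranking, not a ROUGE ranking (ROUGE needs the gold summary, which is unavailable when actually summarizing new text). Hypothetical class, toy numbers:

```python
# Re-ranking beam hypotheses by length-normalized log-probability.
from typing import List, NamedTuple

class Hypothesis(NamedTuple):
    tokens: List[str]
    log_probs: List[float]          # per-token log-probabilities

    @property
    def avg_log_prob(self) -> float:
        return sum(self.log_probs) / len(self.log_probs)

results = [
    Hypothesis(['a', 'b'], [-0.5, -0.7]),          # avg = -0.6
    Hypothesis(['a', 'b', 'c'], [-0.2, -0.3, -0.4]),  # avg = -0.3
]
beam_size = 2
best = sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size]
```

Picking the max-ROUGE hypothesis at test time would be an oracle evaluation, which is presumably why the first (most probable) hypothesis is scored instead.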
I'm trying to run your (very great) code using a basic seq2seq-with-attention architecture, in order to get ROUGE scores.
To get a basic seq2seq architecture, I used the following parameters:
python train.py --vocab_size 50000 --hidden_size 256 --dec_hidden_size 256 --embed_size 128 --pointer False --enc_attn_cover False --cover_loss 0 --optimizer adagrad --lr 0.15 --batch_size 16 --n_batches 2000 --val_batch_size 16 --n_val_batches 200 --grad_norm 2 --min_out_len 40 --max_out_len 120 --embed_file None
(removing the coverage mechanism, pointer network, and RL, and setting the same parameters as described in the paper Get To The Point)
I'm training on CNN / DailyMail dataset.
My problem is: training is far too slow.
It takes 6 s per iteration (training on GPU); with a batch size of 16, that means 17,500 iterations per epoch => 29 h per epoch.
In the paper, they trained the model for 33 epochs, which would mean more than a month of training time with this implementation.
Why is it so slow to train? How can I fix it?
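One common cause of this kind of slowness (a sketch of a general technique, not a claim about this repository's internals) is padding waste: if batches mix very short and very long articles, every sequence pads to the batch maximum. Sorting examples by source length before batching makes each batch roughly uniform in length:

```python
# Length-sorted batching: each batch pads only to its own maximum length,
# instead of every batch padding to the global maximum.
examples = [
    ['a'] * 90, ['b'] * 12, ['c'] * 88, ['d'] * 10,   # toy "documents"
]
batch_size = 2

by_len = sorted(examples, key=len)
batches = [by_len[i:i + batch_size] for i in range(0, len(by_len), batch_size)]

# Encoder cost is roughly batch_size * (max length in the batch), per batch.
sorted_cost = sum(len(b) * max(len(x) for x in b) for b in batches)
naive_cost = len(batches) * batch_size * max(len(x) for x in examples)
```

On real CNN/DailyMail data the saving is substantial. Using `torch.nn.utils.rnn.pack_padded_sequence` for the encoder RNN avoids computing over padding tokens as well.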
Hello,
I hope you are doing good.
I was going through your code and found it pretty interesting; there are a lot of things I can learn from it. It would be really helpful if you could also describe some installation steps, the expected data format, how to train the model, and the software requirements.
Thank you.
I am new to Python & NLP, so I am not sure how to solve this problem.
I would appreciate it if you could tell me how to solve it:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/******/Downloads/seq2seq-summarizer-master/data/cnndm/all_train.txt'
Hello
Do you think you can include the pre-trained models anytime soon? And how long did it take to train your model?
I want to ask about the end token of the generated summarization.
for example. extras needed for film that starts island in the island ; ' for rise again '' film '' sowing learning learning tonight year in '' production : starts - island -- in that tonight production - in - extras : 40,000 production : extras commissioner island , , year -- starts stunning tonight production 40,000 production -- in in later -- -- -- island -- -- ) -- -- -- launch also in later -- are month -- -- -- -- bahama ) -- -- production -- extras year fulton , rookie 2 ; bakersfield prohibition . easy ,
The sentence is so long and has many redundancies; I don't know what I should do to make the decoder learn to stop the sentence. Is it the mechanism of beam search? I would like to ask for the author's help. Thanks a lot.
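For reference, the usual way beam search handles stopping (a hedged sketch with hypothetical names, not this repository's exact code): a hypothesis that emits the EOS token is moved to a "completed" list and no longer expanded, and hypotheses shorter than a minimum length are not allowed to end:

```python
# Route candidate expansions of a beam hypothesis into completed vs. active,
# terminating on EOS but enforcing a minimum output length.
EOS = '<eos>'
min_out_len = 3

def step(hypothesis, candidates, completed, active):
    """Expand one hypothesis with each candidate next token."""
    for token in candidates:
        new_hyp = hypothesis + [token]
        if token == EOS:
            if len(new_hyp) - 1 >= min_out_len:   # length excluding EOS
                completed.append(new_hyp)         # finished: stop expanding
            # else: drop, too short to end here
        else:
            active.append(new_hyp)                # keep expanding next step

completed, active = [], []
step(['the', 'cat', 'sat'], [EOS, 'down'], completed, active)
```

If the decoder rarely assigns probability mass to EOS at all, the usual causes are undertraining or EOS not being appended to the targets during preprocessing, rather than the beam search itself.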
Hello, I have a problem.
When I train the model using the Google data,
the training loss decreases, but the validation loss increases at around 6; they differ greatly.
I don't know what is wrong.
Also, the results always repeat the same words within a sentence, such as "pia has so transported over hajj hajj" and "new hartford man charged with public lewdness lewdness". I am confused about whether the coverage is working.
I'm really looking for an effective word-level summarization solution. It isn't clear to me how to turn off the "generator" part of the pointer-generator network.
Let me know if this is possible and how I can achieve this.
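A numeric sketch of why this should be possible in principle (hypothetical variables, not the repository's API): the pointer-generator's output distribution is P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w), so forcing the generation probability p_gen to 0 leaves only the copy distribution, i.e. a pure pointer (word-level extractive) model:

```python
# Final pointer-generator distribution with the generator head forced off.
vocab = ['the', 'cat', 'ran', '<unk>']
p_vocab = {'the': 0.4, 'cat': 0.1, 'ran': 0.4, '<unk>': 0.1}  # generator head
p_copy = {'the': 0.5, 'cat': 0.5}   # attention mass over source words only

p_gen = 0.0                          # forced to 0 -> pointer-only model
final = {w: p_gen * p_vocab.get(w, 0.0) + (1 - p_gen) * p_copy.get(w, 0.0)
         for w in set(vocab) | set(p_copy)}
```

Every word with nonzero probability must then appear in the source document. In practice this means clamping whatever tensor plays the role of p_gen to zero (or bypassing the generator branch) in the decoder's forward pass.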
I'm now running this model on Colab, but there is one problem with the ROUGE evaluation.
Reading dataset data/cnndm.gz... 287089 pairs.
Vocabulary loaded, 30004 words.
29584 pre-trained embeddings loaded.
Reading dataset data/cnndm.val.gz... 13366 pairs.
Training 3609207 trainable parameters...
Epoch 1: 0% 0/1000 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Epoch 1: 100% 1000/1000 [1:11:31<00:00, 4.29s/it, loss=4.84341]
Valid 1: 0% 0/100 [00:00<?, ?it/s]/bin/sh: 1: ./ROUGE-1.5.5.pl: not found
Traceback (most recent call last):
File "train.py", line 232, in <module>
train(train_gen, v, m, p, val_gen, train_status)
File "train.py", line 144, in train
show_cover_loss=params.show_cover_loss)
File "/content/drive/My Drive/Mylab/seq2seq-summarizer/test.py", line 72, in eval_batch
scores = rouge(gold_summaries, decoded_batch)
File "/content/drive/My Drive/Mylab/seq2seq-summarizer/utils.py", line 395, in rouge
shell=True, cwd=os.path.join(this_dir, 'data'))
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command './ROUGE-1.5.5.pl -e data -a -n 2 -2 4 -u /tmp/tmp9eh8e5z4/task.xml' returned non-zero exit status 127.
Valid 1: 0% 0/100 [00:03<?, ?it/s]
I would appreciate it if you could give me some help.
I run these codes on Google Colab.
I want to know how to solve it.
2. How can I train it more quickly?
I did not modify any parameters; I just ran "python train.py", and it takes around 1 hour to train one epoch.
I want to know if there is some way to change the implementation to make it quicker.
The first question is the significant one, and I would appreciate an early reply.
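On the first question: exit status 127 means the shell could not find or execute `./ROUGE-1.5.5.pl`. The ROUGE-1.5.5 Perl package is distributed separately from this repository, so it must be placed in the `data` directory and marked executable (and Perl, with the XML::DOM module, must be installed). A self-contained sketch of the executable-bit fix, demonstrated on a stand-in file rather than the real script:

```python
# The "exit status 127" failure mode: the .pl script exists but lacks the
# executable bit (or is missing entirely). This adds u+x, as `chmod u+x` would.
import os
import stat
import tempfile

def make_executable(path):
    """Add the owner-executable bit to a file's permissions."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Stand-in for data/ROUGE-1.5.5.pl so this sketch is self-contained:
with tempfile.NamedTemporaryFile('w', suffix='.pl', delete=False) as f:
    f.write("#!/usr/bin/perl\n")
    script = f.name

make_executable(script)
is_executable = os.access(script, os.X_OK)
os.remove(script)
```

On Colab the equivalent shell commands would be copying the ROUGE-1.5.5 package into `data/` and running `chmod u+x data/ROUGE-1.5.5.pl`.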
Hello, @ymfa
I am now training this model. Could you tell me what the ideal ROUGE score is? Thank you.
Thank you for this excellent job. I still have some questions about rl_loss = neg_reward * sample_out.loss: the neg_reward is obtained as greedy_rouge - sample_rouge, and sample_out.loss is the cross-entropy loss, i.e. -LogP(). However, in the paper, the self-critical policy gradient training algorithm uses LogP(), which confused me. Could you please explain this?
Update
I have read the SeqGAN code; according to the policy gradient, its loss is computed as loss += -out[j][target.data[i][j]] * reward[j], where out is the log_softmax output, so the author adds the minus sign in order to use gradient descent later.