ymfa / seq2seq-summarizer
Pointer-generator reinforced seq2seq summarization in PyTorch
License: MIT License
Good job. How about ROUGE scores?
The code in make_cnndm_data.py throws this error:
File "c:/Users/mmiin/pointer_gen/data/make_cnndm_data.py", line 116, in <module>
  fout.write(" ".join(text) + "\t" + " ".join(summary) + "\n")
File "C:\Users\mmiin\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 19, in encode
  return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2032' in position 614: character maps to <undefined>
How can I fix this issue?
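The error comes from Windows' default text encoding (cp1252), which cannot represent characters like U+2032 (the prime symbol). Presumably the `open()` call that creates `fout` in make_cnndm_data.py just needs an explicit `encoding='utf-8'`. A minimal self-contained reproduction of the fix:

```python
# cp1252 (the Windows default) cannot encode '\u2032'; passing
# encoding='utf-8' to open() avoids the UnicodeEncodeError entirely.
import os
import tempfile

text = "prime \u2032 symbol"
path = os.path.join(tempfile.mkdtemp(), "out.txt")

with open(path, "w", encoding="utf-8") as fout:   # explicit encoding is the fix
    fout.write(text + "\n")

with open(path, "r", encoding="utf-8") as fin:    # read back with the same encoding
    round_trip = fin.read().strip()
```

The same `encoding='utf-8'` argument should be added wherever the script opens its output files for writing.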
Sorry, but how can I get data/.vector_cache/glove.6B.100d.txt?
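The `.vector_cache` path follows torchtext's caching convention; the file itself comes from unzipping the Stanford NLP `glove.6B.zip` archive. A stdlib-only sketch (the URL and archive member name are the standard Stanford distribution; `ensure_glove` is a hypothetical helper, not part of this repository):

```python
# Sketch: fetch the GloVe 6B vectors into the cache path the repo expects.
import os
import urllib.request
import zipfile

GLOVE_URL = 'https://nlp.stanford.edu/data/glove.6B.zip'  # Stanford NLP distribution

def glove_path(cache_dir=os.path.join('data', '.vector_cache')):
    """Path where the training script expects the 100-d vectors."""
    return os.path.join(cache_dir, 'glove.6B.100d.txt')

def ensure_glove(cache_dir=os.path.join('data', '.vector_cache')):
    """Download and extract glove.6B.100d.txt if it is not already cached."""
    txt = glove_path(cache_dir)
    if not os.path.exists(txt):
        os.makedirs(cache_dir, exist_ok=True)
        zip_path = os.path.join(cache_dir, 'glove.6B.zip')
        urllib.request.urlretrieve(GLOVE_URL, zip_path)   # ~822 MB download
        with zipfile.ZipFile(zip_path) as zf:
            zf.extract('glove.6B.100d.txt', cache_dir)
    return txt
```

Calling `ensure_glove()` once before training would populate the cache; alternatively, download and unzip the archive manually into `data/.vector_cache/`.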
class Example(NamedTuple):
    src: List[str]
    tgt: List[str]
    src_len: int  # inclusive of EOS, so that it corresponds to tensor shape
    tgt_len: int  # inclusive of EOS, so that it corresponds to tensor shape

The code above defines the class Example, but it never seems to be used anywhere.
The author of pointer-generator proposes a method called the coverage mechanism.
The coverage vector should be the sum of the attention distributions over all previous decoder timesteps, but the coverage vector in this repository seems to sum the attention over encoder steps instead!
Please help me find the correct way to implement the mechanism, or tell me where my mistake is.
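For reference, a minimal plain-Python sketch of the coverage mechanism as described by See et al. (2017), with toy numbers and hypothetical variable names: the coverage vector c^t is the running sum of the attention distributions a^1 ... a^(t-1) from previous decoder steps, and each a^t is itself a distribution over the encoder positions (so "summing attention over the encoder" and "summing over decoder steps" can look similar in code):

```python
# Coverage: c^1 = 0;  c^{t+1} = c^t + a^t, summed over decoder steps.
# Each a^t is an attention distribution over the encoder positions.
num_enc_steps = 4
coverage = [0.0] * num_enc_steps          # c^1 = zero vector

attn_history = [                          # toy attention distributions
    [0.7, 0.1, 0.1, 0.1],                 # a^1 (decoder step 1)
    [0.1, 0.6, 0.2, 0.1],                 # a^2 (decoder step 2)
]

cov_losses = []
for attn in attn_history:                 # one iteration per decoder step
    # The coverage loss at step t is sum_i min(a^t_i, c^t_i); it penalizes
    # attending again to encoder positions that are already covered.
    cov_losses.append(sum(min(a, c) for a, c in zip(attn, coverage)))
    coverage = [c + a for c, a in zip(coverage, attn)]   # accumulate
```

In the real model, `coverage` would additionally be fed into the attention score computation at each step; this sketch only shows the accumulation and the loss term.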
Currently, the results of the beam search are sorted by the loss:
sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size]
Then the ROUGE score taken is the one corresponding to the first result:
r1 += scores[0]['1_f']
Why this decision? Aren't we looking for a better ROUGE score (we don't really care about the loss at test time, do we)?
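For context, a hedged sketch of what sorting by `avg_log_prob` typically means: hypotheses are ranked by their total log-probability divided by their length, so longer candidates are not unfairly penalized. This is a model-confidence ranking, not a ROUGE ranking (ROUGE needs the gold summary, which is unavailable when actually summarizing new text). Hypothetical class, toy numbers:

```python
# Re-ranking beam hypotheses by length-normalized log-probability.
from typing import List, NamedTuple

class Hypothesis(NamedTuple):
    tokens: List[str]
    log_probs: List[float]          # per-token log-probabilities

    @property
    def avg_log_prob(self) -> float:
        return sum(self.log_probs) / len(self.log_probs)

results = [
    Hypothesis(['a', 'b'], [-0.5, -0.7]),          # avg = -0.6
    Hypothesis(['a', 'b', 'c'], [-0.2, -0.3, -0.4]),  # avg = -0.3
]
beam_size = 2
best = sorted(results, key=lambda h: -h.avg_log_prob)[:beam_size]
```

Picking the max-ROUGE hypothesis at test time would be an oracle evaluation, which is presumably why the first (most probable) hypothesis is scored instead.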
I'm trying to run your (very great) code using a basic seq2seq-with-attention architecture, in order to get ROUGE scores.
To get a basic seq2seq architecture, I used the following parameters:
python train.py --vocab_size 50000 --hidden_size 256 --dec_hidden_size 256 --embed_size 128 --pointer False --enc_attn_cover False --cover_loss 0 --optimizer adagrad --lr 0.15 --batch_size 16 --n_batches 2000 --val_batch_size 16 --n_val_batches 200 --grad_norm 2 --min_out_len 40 --max_out_len 120 --embed_file None
(removing the coverage mechanism, pointer network, and RL, and setting the same parameters as described in the paper Get To The Point)
I'm training on CNN / DailyMail dataset.
My problem is: training is far too slow.
It takes 6 s per iteration (training on GPU); with a batch size of 16, that means 17,500 iterations per epoch => 29 h per epoch.
In the paper, they trained the model for 33 epochs, which would mean more than a month of training time with this implementation.
Why is it so slow to train? How can I fix it?
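One common cause of this kind of slowness (a sketch of a general technique, not a claim about this repository's internals) is padding waste: if batches mix very short and very long articles, every sequence pads to the batch maximum. Sorting examples by source length before batching makes each batch roughly uniform in length:

```python
# Length-sorted batching: each batch pads only to its own maximum length,
# instead of every batch padding to the global maximum.
examples = [
    ['a'] * 90, ['b'] * 12, ['c'] * 88, ['d'] * 10,   # toy "documents"
]
batch_size = 2

by_len = sorted(examples, key=len)
batches = [by_len[i:i + batch_size] for i in range(0, len(by_len), batch_size)]

# Encoder cost is roughly batch_size * (max length in the batch), per batch.
sorted_cost = sum(len(b) * max(len(x) for x in b) for b in batches)
naive_cost = len(batches) * batch_size * max(len(x) for x in examples)
```

On real CNN/DailyMail data the saving is substantial. Using `torch.nn.utils.rnn.pack_padded_sequence` for the encoder RNN avoids computing over padding tokens as well.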
Hello,
I hope you are doing good.
I was going through your code and found it pretty interesting; there are a lot of things I can learn from it. It would be really helpful if you could also describe some installation steps, the expected data format, how to train the model, and the software requirements.
Thank you.
I am new to Python & NLP, so I am not sure how to solve this problem.
I would appreciate it if you could tell me how to solve it:
FileNotFoundError: [Errno 2] No such file or directory: '/Users/******/Downloads/seq2seq-summarizer-master/data/cnndm/all_train.txt'
Hello
Do you think you can include the pre-trained models anytime soon? And how long did it take to train your model?
I want to ask about the end token of the generated summarization.
for example. extras needed for film that starts island in the island ; ' for rise again '' film '' sowing learning learning tonight year in '' production : starts - island -- in that tonight production - in - extras : 40,000 production : extras commissioner island , , year -- starts stunning tonight production 40,000 production -- in in later -- -- -- island -- -- ) -- -- -- launch also in later -- are month -- -- -- -- bahama ) -- -- production -- extras year fulton , rookie 2 ; bakersfield prohibition . easy ,
The sentence is so long and has many redundancies; I don't know what I should do to make the decoder learn to stop the sentence. Is it the mechanism of beam search? I would like to ask for the author's help. Thanks a lot.
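For reference, the usual way beam search handles stopping (a hedged sketch with hypothetical names, not this repository's exact code): a hypothesis that emits the EOS token is moved to a "completed" list and no longer expanded, and hypotheses shorter than a minimum length are not allowed to end:

```python
# Route candidate expansions of a beam hypothesis into completed vs. active,
# terminating on EOS but enforcing a minimum output length.
EOS = '<eos>'
min_out_len = 3

def step(hypothesis, candidates, completed, active):
    """Expand one hypothesis with each candidate next token."""
    for token in candidates:
        new_hyp = hypothesis + [token]
        if token == EOS:
            if len(new_hyp) - 1 >= min_out_len:   # length excluding EOS
                completed.append(new_hyp)         # finished: stop expanding
            # else: drop, too short to end here
        else:
            active.append(new_hyp)                # keep expanding next step

completed, active = [], []
step(['the', 'cat', 'sat'], [EOS, 'down'], completed, active)
```

If the decoder rarely assigns probability mass to EOS at all, the usual causes are undertraining or EOS not being appended to the targets during preprocessing, rather than the beam search itself.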
Hello, I have a problem.
When I train the model using the Google data,
the training loss decreases, but the validation loss increases at around 6; they differ greatly.
I don't know what is wrong.
Also, the results always repeat the same words within a sentence, such as "pia has so transported over hajj hajj" and "new hartford man charged with public lewdness lewdness". I am confused about whether the coverage is working.
I'm really looking for an effective word-level summarization solution. It isn't clear to me how to turn off the "generator" part of the pointer-generator network.
Let me know if this is possible and how I can achieve this.
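A numeric sketch of why this should be possible in principle (hypothetical variables, not the repository's API): the pointer-generator's output distribution is P(w) = p_gen * P_vocab(w) + (1 - p_gen) * P_copy(w), so forcing the generation probability p_gen to 0 leaves only the copy distribution, i.e. a pure pointer (word-level extractive) model:

```python
# Final pointer-generator distribution with the generator head forced off.
vocab = ['the', 'cat', 'ran', '<unk>']
p_vocab = {'the': 0.4, 'cat': 0.1, 'ran': 0.4, '<unk>': 0.1}  # generator head
p_copy = {'the': 0.5, 'cat': 0.5}   # attention mass over source words only

p_gen = 0.0                          # forced to 0 -> pointer-only model
final = {w: p_gen * p_vocab.get(w, 0.0) + (1 - p_gen) * p_copy.get(w, 0.0)
         for w in set(vocab) | set(p_copy)}
```

Every word with nonzero probability must then appear in the source document. In practice this means clamping whatever tensor plays the role of p_gen to zero (or bypassing the generator branch) in the decoder's forward pass.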
I'm now running this model on Colab, but there is one problem with the ROUGE evaluation.
Reading dataset data/cnndm.gz... 287089 pairs.
Vocabulary loaded, 30004 words.
29584 pre-trained embeddings loaded.
Reading dataset data/cnndm.val.gz... 13366 pairs.
Training 3609207 trainable parameters...
Epoch 1: 0% 0/1000 [00:00<?, ?it/s]/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py:1639: UserWarning: nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.
warnings.warn("nn.functional.sigmoid is deprecated. Use torch.sigmoid instead.")
Epoch 1: 100% 1000/1000 [1:11:31<00:00, 4.29s/it, loss=4.84341]
Valid 1: 0% 0/100 [00:00<?, ?it/s]/bin/sh: 1: ./ROUGE-1.5.5.pl: not found
Traceback (most recent call last):
File "train.py", line 232, in <module>
train(train_gen, v, m, p, val_gen, train_status)
File "train.py", line 144, in train
show_cover_loss=params.show_cover_loss)
File "/content/drive/My Drive/Mylab/seq2seq-summarizer/test.py", line 72, in eval_batch
scores = rouge(gold_summaries, decoded_batch)
File "/content/drive/My Drive/Mylab/seq2seq-summarizer/utils.py", line 395, in rouge
shell=True, cwd=os.path.join(this_dir, 'data'))
File "/usr/lib/python3.6/subprocess.py", line 356, in check_output
**kwargs).stdout
File "/usr/lib/python3.6/subprocess.py", line 438, in run
output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command './ROUGE-1.5.5.pl -e data -a -n 2 -2 4 -u /tmp/tmp9eh8e5z4/task.xml' returned non-zero exit status 127.
Valid 1: 0% 0/100 [00:03<?, ?it/s]
I would appreciate it if you could give me some help.
I run these codes on Google Colab.
I want to know how to solve it.
2. How can I train it more quickly?
I did not modify any parameters; I just ran "python train.py", and it takes around 1 hour to train one epoch.
I want to know if there is some way to change the implementation to make it quicker.
The first question is the significant one, and I would appreciate an early reply.
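On the first question: exit status 127 means the shell could not find or execute `./ROUGE-1.5.5.pl`. The ROUGE-1.5.5 Perl package is distributed separately from this repository, so it must be placed in the `data` directory and marked executable (and Perl, with the XML::DOM module, must be installed). A self-contained sketch of the executable-bit fix, demonstrated on a stand-in file rather than the real script:

```python
# The "exit status 127" failure mode: the .pl script exists but lacks the
# executable bit (or is missing entirely). This adds u+x, as `chmod u+x` would.
import os
import stat
import tempfile

def make_executable(path):
    """Add the owner-executable bit to a file's permissions."""
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)

# Stand-in for data/ROUGE-1.5.5.pl so this sketch is self-contained:
with tempfile.NamedTemporaryFile('w', suffix='.pl', delete=False) as f:
    f.write("#!/usr/bin/perl\n")
    script = f.name

make_executable(script)
is_executable = os.access(script, os.X_OK)
os.remove(script)
```

On Colab the equivalent shell commands would be copying the ROUGE-1.5.5 package into `data/` and running `chmod u+x data/ROUGE-1.5.5.pl`.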
Hello, @ymfa
I am now training this model. Could you tell me what the ideal ROUGE score is? Thank you.
Thank you for this excellent job. I still have some questions about rl_loss = neg_reward * sample_out.loss: the neg_reward is obtained as greedy_rouge - sample_rouge, and sample_out.loss is the cross-entropy loss, i.e. -LogP(). However, in the paper, the self-critical policy gradient training algorithm uses LogP(), which confused me. Could you please explain this?
Update
I have read the SeqGAN code; according to the policy gradient, its loss is computed as loss += -out[j][target.data[i][j]] * reward[j], where out is the log_softmax output, so the author adds the minus sign in order to use gradient descent later.