Comments (18)
I tend to see this kind of output in the earlier phases of training (i.e. when the model is still under-trained). Look at the loss curve in TensorBoard -- has the loss decreased much? It may be that the model needs further training.
from pointer-generator.
Thanks @abisee, it seems you are right. I found that the loss is still high; the model needs more training steps.
@abisee: thank you and this is a great piece of work.
@fishermanff, can you say what counts as a high loss? Following the instructions, I am running train and eval concurrently -- is that correct? Also, any suggestions on when to stop training? Is it OK to stop when the loss stops decreasing? Thank you.
@makcbe Yes, eval mode is designed to be run concurrently with train mode. The idea is that you can see the loss on the validation set plotted alongside the loss on the training set in TensorBoard, helping you to spot overfitting etc.
About when to stop training: there's no easy answer for this. You might keep training until you find that the loss on the validation set is not decreasing any more. You might find that after some time your validation set loss starts to rise while your training set loss keeps falling (overfitting); in that case you want to stop training. If your loss curve has gone flat you can try lowering your learning rate.
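One way to make that stopping rule concrete -- a minimal sketch, assuming you record the validation loss at each eval checkpoint (the function and its parameters are illustrative, not from this repo):

```python
def should_stop(val_losses, patience=3, min_delta=0.01):
    """Stop when the validation loss has not improved by at least
    min_delta over the last `patience` evaluations (a sign the curve
    has flattened, or is rising due to overfitting)."""
    if len(val_losses) <= patience:
        return False  # not enough history yet
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    return recent_best > best_before - min_delta

# Example: loss flattens and then rises -> stop
history = [5.2, 4.6, 4.1, 3.9, 3.9, 4.0, 4.1]
print(should_stop(history))  # True
```

A real run would call this after each eval pass and either stop or lower the learning rate when it fires.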
In any case you should run decode mode and look at some generated summaries. The visualization tool will make this much more informative.
@makcbe Hi, the author has answered it well. I have run training for over 16.77k steps (as seen in TensorBoard) and the loss is about 4.0. The generated summaries contain some correct output, but overall performance is still far from the ACL paper's results. Hence, I think further training is needed.
Thank you both for the support; that's definitely useful.
Hi @fishermanff @abisee ,
when I trained for 3k steps, the generated summaries began to repeat the first sentence of the source text. Did that happen to you?
Thanks.
Hi @fishermanff @abisee,
when I trained to 40k steps, the output turned into: INFO:tensorflow:GENERATED SUMMARY: [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] [UNK] ....
Just want to make sure further training will make it better.
@lilyzl What's the loss? Does it converge? Check in TensorBoard.
@fishermanff Thanks for replying.
The [UNK] results were due to NaN in the loss. I fixed it based on solutions from previous issues.
Another question: is the generated summary length variable? I set the minimum length to 30, and then all results became exactly 30 tokens. How should I deal with that?
Thanks a lot!
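For anyone hitting the same NaN-then-[UNK] failure: one common guard (a sketch, not code from this repo) is to check the loss each step and stop immediately, so you can restore a pre-NaN checkpoint instead of training on with corrupted weights:

```python
import math

def check_loss(loss, step):
    """Guard for the NaN-loss problem: once the loss goes non-finite,
    the weights are effectively corrupted (the all-[UNK] output is a
    symptom), so stop and restore the last good checkpoint rather
    than continuing. Names here are illustrative."""
    if not math.isfinite(loss):
        raise ValueError("non-finite loss at step %d; "
                         "restore an earlier checkpoint" % step)
    return loss
```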
@lilyzl Maybe you can stop decoding in your code when the decoder reaches the STOP token.
Hi @lilyzl
- Yes, repetition is very common (it is one of the two big problems we aim to fix, as noted in the ACL paper). That's what the coverage setting is for -- to reduce repetition.
- Yes, the generated summary length is variable. It's generated using beam search; essentially it keeps producing tokens until it produces the STOP token. I'm not sure why your decoded summaries are all length 30 if your minimum length is 30. Have a look at the code in beam_search.py.
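To illustrate how a minimum-length setting can pin every summary to exactly that length: a toy greedy decode loop (hypothetical names, not the repo's beam_search.py) that refuses STOP before min_len tokens have been emitted:

```python
STOP = "[STOP]"

def decode(next_token, min_len=5, max_len=20):
    """Toy decode loop: accept STOP only after min_len tokens.
    `next_token(tokens)` returns the model's top choice; when STOP
    arrives too early we emit a fallback token instead, which is why
    a summary whose natural end comes early gets padded to min_len."""
    tokens = []
    while len(tokens) < max_len:
        tok = next_token(tokens)
        if tok == STOP:
            if len(tokens) >= min_len:
                break          # natural end, allowed now
            tok = "<filler>"   # too early to stop; emit something else
        tokens.append(tok)
    return tokens

# A "model" that wants to stop after 3 tokens:
wants_short = lambda toks: ["a", "b", "c"][len(toks)] if len(toks) < 3 else STOP
print(decode(wants_short, min_len=5))  # ['a', 'b', 'c', '<filler>', '<filler>']
```

If a model regularly wants to stop earlier than min_dec_steps, every output ends up exactly at the minimum, matching the behavior described above.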
Hi @abisee
I have trained the model for 80k steps and then pressed Ctrl+C to terminate the training process. Will the variables saved in logs/train/ be restored automatically when I rerun run_summarization.py in 'train' mode, or do I need to add code like tf.train.Saver.restore() myself to restore the pretrained variables?
Hi @fishermanff
Yes, running run_summarization.py in train mode should restore your last training checkpoint. I think it's handled by the supervisor.
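To illustrate what "restore your last training checkpoint" means mechanically: a pure-Python sketch of how the newest checkpoint in logs/train/ can be selected (the filenames are the usual model.ckpt-&lt;step&gt; pattern; the real TF Supervisor/Saver does this for you via the 'checkpoint' index file rather than by listing the directory):

```python
import os
import re

def latest_checkpoint(train_dir):
    """Illustrative stand-in for tf.train.latest_checkpoint: return
    the model.ckpt-<step> path with the highest step number found in
    train_dir, or None if no checkpoint exists (fresh start)."""
    best_step, best_path = -1, None
    for name in os.listdir(train_dir):
        m = re.match(r"model\.ckpt-(\d+)\.index$", name)
        if m and int(m.group(1)) > best_step:
            best_step = int(m.group(1))
            best_path = os.path.join(train_dir, "model.ckpt-%d" % best_step)
    return best_path
```

On restart, training resumes from the returned checkpoint when one is found, and initializes fresh variables otherwise -- so no manual tf.train.Saver.restore() call is needed here.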
Thanks @abisee, copy that.
Hello, I also get [UNK] in my summary results. Could you tell me how to solve this problem? I found nothing in previous issues. Thanks a lot.
I also have the same kind of problem. If you have any solution, please suggest it.
Did anyone solve this UNK problem?