Comments (17)

robinsongh381 commented on August 22, 2024

I have faced the same repetition issue when training a Korean model.
After some research, I found that this is a general problem in natural language generation, known as degeneration.

I have added an extra module to the decoder, replacing beam search.

Please let me know if anyone is interested in it.

Thanks

robinsongh381 commented on August 22, 2024

@colanim
Sorry for the late reply.
I have replaced the decoder with the following method, which proposes a new way of sampling tokens at each decoding step rather than relying only on beam search.

The paper suggests two methods, and they are implemented here.

In my experience, the proposed method avoided the repetition issue and hence improved the ROUGE score!

Hope this helps.
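A minimal sketch of what such a decoding step could look like, assuming the two methods in question are top-k and nucleus (top-p) sampling; the function name and default values are illustrative, not the poster's actual code:

import torch
import torch.nn.functional as F

def sample_next_token(logits, top_k=50, top_p=0.9, temperature=1.0):
    # logits: decoder output for the current step, shape (batch, vocab_size).
    logits = logits / temperature
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    probs = F.softmax(sorted_logits, dim=-1)
    cum_probs = torch.cumsum(probs, dim=-1)

    drop = torch.zeros_like(sorted_logits, dtype=torch.bool)
    if top_k > 0:
        drop[..., top_k:] = True                 # top-k: discard everything past the k best tokens
    if top_p < 1.0:
        drop |= (cum_probs - probs) > top_p      # nucleus: discard the tail once mass top_p is covered
    sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))

    choice = torch.multinomial(F.softmax(sorted_logits, dim=-1), num_samples=1)
    return sorted_idx.gather(-1, choice)         # map sorted positions back to vocabulary ids

In the generation loop, a call like this would replace the beam-search step: sample a token, append it to the decoder input, and repeat until the end-of-sequence token.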

rajeshsahu09 commented on August 22, 2024

Mine is also getting the same results. All outputs are the same.

chen1234yue commented on August 22, 2024

me too.

nlpyang commented on August 22, 2024

Please paste your training commands here

rajeshsahu09 commented on August 22, 2024

python3 train.py -task abs -mode train -bert_data_path bert_data/ -dec_dropout 0.2 -model_path model_abs/ -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0 -log_file abs_bert_cnndm

nlpyang commented on August 22, 2024

With only 1 gpu for training, you need to accumulate the gradient for a much larger step, or the model cannot be trained effectively.
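As a rough illustration (not an official recipe), assuming gradients simply add up across GPUs and accumulation steps, accum_count on one GPU should grow by roughly the number of GPUs it replaces in order to keep the same effective batch. Using the accum_count 5 / batch_size 140 command above and the 4 GPUs mentioned later in this thread for abstractive training:

# Assumption: effective batch per optimizer step ~ batch_size * accum_count * n_gpus.
batch_size = 140
ref_accum, ref_gpus = 5, 4
target = batch_size * ref_accum * ref_gpus        # effective batch of the 4-GPU reference run
single_gpu_accum = target // (batch_size * 1)     # = 20
print(f"-accum_count {single_gpu_accum}")         # i.e. roughly 4x larger on a single GPU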

rajeshsahu09 commented on August 22, 2024

Sir, but I have only one GPU. Can't the training be effective on that?

nlpyang commented on August 22, 2024

You can use our Trained Models.

cuthbertjohnkarawa commented on August 22, 2024

With only 1 gpu for training, you need to accumulate the gradient for a much larger step, or the model cannot be trained effectively.

So how many GPUs do you need for training?

astariul commented on August 22, 2024

For extractive summarization, the author trained the model on 3 GPUs.

For abstractive summarization, the author trained the model on 4 GPUs for 2 days.

astariul commented on August 22, 2024

@robinsongh381 We are interested!

So you replaced beam search and got better results?

astariul commented on August 22, 2024

Thanks for the message!

Do you remember (approximately) how big the difference in ROUGE score is?

cuthbertjohnkarawa commented on August 22, 2024

@colanim
Sorry for the late reply.
I have replaced the decoder with the following method, which proposes a new way of sampling tokens at each decoding step rather than relying only on beam search.

The paper suggests two methods, and they are implemented here.

In my experience, the proposed method avoided the repetition issue and hence improved the ROUGE score!

Hope this helps.

Can you share your results?

Shanzaay commented on August 22, 2024

Hi. I am using the pre-trained models for testing on the CNN dataset.
This is the command I am giving:
python train.py -task abs -mode test -test_from ~/Downloads/cnndm_baseline_best.pt -batch_size 3000 -test_batch_size 500 -bert_data_path ../bert_data/test -log_file ../logs/val_abs_bert_cnndm -sep_optim true -use_interval true -visible_gpus 0 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../logs/abs_bert_cnndm

But my candidate result is the same as the very first single file I used; after that it doesn't change. What am I doing wrong?

Rumi4 commented on August 22, 2024

With only 1 gpu for training, you need to accumulate the gradient for a much larger step, or the model cannot be trained effectively.

While training on 1 GPU, you said we need to set the gradient accumulation count higher than 5. How much should it be? Please help.

wbchief commented on August 22, 2024

python3 train.py -task abs -mode train -bert_data_path bert_data/ -dec_dropout 0.2 -model_path model_abs/ -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0 -log_file abs_bert_cnndm

Did you solve the problem of identical generated sentences when using a single GPU?
