Comments (17)

robinsongh381 commented on August 22, 2024

I have faced the same repetition issue when training a Korean model.
After some research, I found that this is a general problem in natural language generation, known as degeneration.

I have added an extra module to the decoder, replacing beam search.

Please let me know if anyone is interested in it.

Thanks

robinsongh381 commented on August 22, 2024

@colanim
Sorry for the late reply.
I have replaced the decoder with the following method, which proposes a new way of sampling tokens at each decoding step rather than relying only on beam search.

The paper suggests two methods, and they are implemented here.

In my experience, the proposed method avoided the repetition issue and hence improved the ROUGE score!

Hope this helps.
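A minimal sketch of what such a decoding step could look like, assuming the two methods in question are top-k and nucleus (top-p) sampling; the function name and default values are illustrative, not the poster's actual code:

import torch
import torch.nn.functional as F

def sample_next_token(logits, top_k=50, top_p=0.9, temperature=1.0):
    # logits: decoder output for the current step, shape (batch, vocab_size).
    logits = logits / temperature
    sorted_logits, sorted_idx = torch.sort(logits, descending=True)
    probs = F.softmax(sorted_logits, dim=-1)
    cum_probs = torch.cumsum(probs, dim=-1)

    drop = torch.zeros_like(sorted_logits, dtype=torch.bool)
    if top_k > 0:
        drop[..., top_k:] = True                 # top-k: discard everything past the k best tokens
    if top_p < 1.0:
        drop |= (cum_probs - probs) > top_p      # nucleus: discard the tail once mass top_p is covered
    sorted_logits = sorted_logits.masked_fill(drop, float("-inf"))

    choice = torch.multinomial(F.softmax(sorted_logits, dim=-1), num_samples=1)
    return sorted_idx.gather(-1, choice)         # map sorted positions back to vocabulary ids

In the generation loop, a call like this would replace the beam-search step: sample a token, append it to the decoder input, and repeat until the end-of-sequence token.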

rajeshsahu09 commented on August 22, 2024

Mine is also getting the same results. All outputs are the same.

chen1234yue commented on August 22, 2024

me too.

nlpyang commented on August 22, 2024

Please paste your training commands here

rajeshsahu09 commented on August 22, 2024

python3 train.py -task abs -mode train -bert_data_path bert_data/ -dec_dropout 0.2 -model_path model_abs/ -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0 -log_file abs_bert_cnndm

nlpyang commented on August 22, 2024

With only 1 gpu for training, you need to accumulate the gradient for a much larger step, or the model cannot be trained effectively.
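As a rough illustration (not an official recipe), assuming gradients simply add up across GPUs and accumulation steps, accum_count on one GPU should grow by roughly the number of GPUs it replaces in order to keep the same effective batch. Using the accum_count 5 / batch_size 140 command above and the 4 GPUs mentioned later in this thread for abstractive training:

# Assumption: effective batch per optimizer step ~ batch_size * accum_count * n_gpus.
batch_size = 140
ref_accum, ref_gpus = 5, 4
target = batch_size * ref_accum * ref_gpus        # effective batch of the 4-GPU reference run
single_gpu_accum = target // (batch_size * 1)     # = 20
print(f"-accum_count {single_gpu_accum}")         # i.e. roughly 4x larger on a single GPU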

rajeshsahu09 commented on August 22, 2024

Sir, but I have only one GPU. Can't the training be effective on that?

nlpyang commented on August 22, 2024

You can use our Trained Models.

cuthbertjohnkarawa commented on August 22, 2024

With only 1 gpu for training, you need to accumulate the gradient for a much larger step, or the model cannot be trained effectively.

So how many GPUs do you need for training?

astariul commented on August 22, 2024

For extractive summarization, the author trained the model on 3 GPUs.

For abstractive summarization, the author trained the model on 4 GPUs for 2 days.

astariul commented on August 22, 2024

@robinsongh381 We are interested!

So you replaced beam search and got better results?

astariul commented on August 22, 2024

Thanks for the message!

Do you remember (approximately) how big the difference in ROUGE score is?

cuthbertjohnkarawa commented on August 22, 2024

@colanim
Sorry for the late reply.
I have replaced the decoder with the following method, which proposes a new way of sampling tokens at each decoding step rather than relying only on beam search.

The paper suggests two methods, and they are implemented here.

In my experience, the proposed method avoided the repetition issue and hence improved the ROUGE score!

Hope this helps.

Can you share your results?

Shanzaay commented on August 22, 2024

Hi. I am using the pre-trained models for testing on the CNN dataset.
This is the command I am giving:
python train.py -task abs -mode test -test_from ~/Downloads/cnndm_baseline_best.pt -batch_size 3000 -test_batch_size 500 -bert_data_path ../bert_data/test -log_file ../logs/val_abs_bert_cnndm -sep_optim true -use_interval true -visible_gpus 0 -max_pos 512 -max_length 200 -alpha 0.95 -min_length 50 -result_path ../logs/abs_bert_cnndm

But my candidate result is the same as the very first single file I used; after that it doesn't change. What am I doing wrong?

Rumi4 commented on August 22, 2024

With only 1 gpu for training, you need to accumulate the gradient for a much larger step, or the model cannot be trained effectively.

While training on 1 GPU, you said we need to set the gradient accumulation count higher than 5. How much should it be? Please help.

wbchief commented on August 22, 2024

python3 train.py -task abs -mode train -bert_data_path bert_data/ -dec_dropout 0.2 -model_path model_abs/ -sep_optim true -lr_bert 0.002 -lr_dec 0.2 -save_checkpoint_steps 2000 -batch_size 140 -train_steps 200000 -report_every 50 -accum_count 5 -use_bert_emb true -use_interval true -warmup_steps_bert 20000 -warmup_steps_dec 10000 -max_pos 512 -visible_gpus 0 -log_file abs_bert_cnndm

Did you solve the problem of identical generated sentences when using a single GPU?
