csebuetnlp / crosssum Goto Github PK

This repository contains the code, data, and models of the paper titled "CrossSum: Beyond English-Centric Cross-Lingual Summarization for 1,500+ Language Pairs" published in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (ACL’23), July 9-14, 2023.

Python 91.73% Shell 0.62% Jupyter Notebook 7.31% Makefile 0.02% Dockerfile 0.06% Jsonnet 0.01% CSS 0.06% JavaScript 0.19%

cross-lingual-summarization cross-lingual-transfer multilingual-nlp

crosssum's Issues

Flan-t5 for training?

Does CrossSum support Flan-t5 for summarization???

Finetuning on HF using Seq2SeqTrainer.

I need to finetune csebuetnlp/mT5_m2m_crossSum using the Seq2SeqTrainer on HuggingFace. What special symbols need to be inserted during tokenization of input-output pairs?

Questions about replicating m2m model

Hi there, happy new year and thanks for releasing the code for your nice work! I have some questions about the training configs and I hope you could clarify them for me.

Specifically, I would like to replicate the released m2m model and I am using the provided trainer.sh. As I use two A100s, I set PER_DEVICE_TRAIN_BATCH_SIZE=32 so as to keep the effective batch size to 256. I keep the rest configs intact.

Then I do inference on the Chinese Simplified-English test set with the checkpoint at the 25K step. It gets 21.65 on ROUGE-L, while the released m2m model gets 26.75.

After inspecting the model outputs, I found that the model of my replication sometimes generates summaries of non-target languages. For example, for the Chinese Simplified-English test set, around 10% of the generated summaries are Chinese, while the released model is able to generate only English summaries. This may explain the above performance gap.

Another observation is that the above-mentioned problem is more severe with the checkpoint at the 20K step, so I wonder if this is due to underfitting, and it may vanish with more training steps (e.g., 30K). I have not validated this assumption yet as I would like to adopt your original training configs if possible.

I would appreciate it if you could shed some light on how to correctly replicate your m2m model. Are there any particular training configs that I should adopt? It would also help if you could share which checkpoint (training steps) the released m2m model is.

Many thanks!

Some questions about training settings

Dear authors:
I have read your paper for many times, it's a awesome work!
I tried to reproduce your model for my future works. However, I can't get the same result as you provided models.
Would you like to share the training settings so that I can better reproduce your work? Such like: training steps, learning rate and scheduler type choice.
Thanks a lot!

Enhanced model

Hi,

Thank you for this great contribution to the problem of cross-lingual summarization.

I evaluated both mT5_m2m_crossSum and mT5_m2m_crossSum_enhanced in a few language pairs and confirmed that the enhanced model actually has significantly higher ROUGE scores. I am curious to learn more about the specific training process of the enhanced model. Was the improvement primarily the result of an extensive hyperparameter search, or did you employ any additional training techniques that were not detailed in the paper?

Thank you in advance for your response.

Best,
Diogo

Something about Input and output

Hello, I have read your paper 4 times, awesome works!
I got confused about the input and output. If we want to summarize an article that uses English, into French and Chinese. Can we use the same model to do that? Or Train two models, one is English-French, another is English-Chinese?

How was pearson correlation calculated in the experiments ?

Hi, thanks for the great works !

There are a bit of details regarding correlation of LaSE in the paper that I did not quite understand. For each target language, the top-5 source languages were used to evaluate LaSE's correlation with ROUGE-2 for out-lang scenario.

Let's assume those 5 languages are lan1, lan2 .. lan5, with the target language being tgt_lan0. I'm assuming that the procedure is like this: generate summaries in tgt_lan with 5 src_lans to obtain 5 prediction sets pred1, pred2 .. pred5. Aggregate those prediction sets, and evaluate LaSE score with references similar to source languages (ref1, ref2 .. ref5), then calculate ROUGE-2 with ref0. In total, we have len(pred1)+len(pred2) + .. len(pred5) scores each for LaSE and ROUGE-2.

After this, we calculate pearson correlation based on two 1D arrays formed of these two lines of scores. Is this interpretation correct ? If it is, since scores of different references-predictions pair might differ (e.g. a similar score of 0.5 might be bad for certain pairs, but considered good for some other pairs), do you think aggregating them this way is suboptimal ?

Could you help clarify this @Tahmid04 ?

csebuetnlp / crosssum Goto Github PK

crosssum's Issues

Flan-t5 for training?

Finetuning on HF using Seq2SeqTrainer.

Questions about replicating m2m model

Some questions about training settings

Enhanced model

Something about Input and output

How was pearson correlation calculated in the experiments ?

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent