hysonlab / videberta

ViDeBERTa: A powerful pre-trained language model for Vietnamese, EACL 2023

Home Page: https://aclanthology.org/2023.findings-eacl.79.pdf

License: MIT License

Jupyter Notebook 98.32% Shell 0.02% Python 1.66%

bert language-model large-language-models vietnamese-nlp

videberta's Issues

Question about v3 pretraining code of DeBERTa

Hi @DaoTranbk and @HyTruongSon,

many thanks for open sourcing the repo for ViDeBERTa!

I'm very interested in the v3 pretraining of a DeBERTa model. In the current version of the pretraining code, I can see that the normal DeBERTa package is called:

ViDeBERTa/pre-training/bash/pre-train_model.sh

Line 13 in 8270cce

CUDA_VISIBLE_DEVICES=1 python -m DeBERTa.apps.run \

However, the publicly available DeBERTa code does not yet include support for Gradient Disentangled Embedding Sharing (GDES), see e.g.: microsoft/DeBERTa#93.

Did you modify the code to add support for GDES? I would be very interested in that implementation.

Many thanks and cheers,

Stefan

Will you release the model checkpoints publicly?

Issue when loading model checkpoints

Hi there,
When I load your base or xsmall model checkpoint, I see this warning:
`Some weights of the model checkpoint at Fsoft-AIC/videberta-base were not used when initializing DebertaV2Model: ['mask_predictions.LayerNorm.weight', 'deberta.embeddings.word_embeddings._weight', 'mask_predictions.LayerNorm.bias', 'mask_predictions.dense.weight', 'mask_predictions.classifier.weight', 'mask_predictions.classifier.bias', 'mask_predictions.dense.bias']

This IS expected if you are initializing DebertaV2Model from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
This IS NOT expected if you are initializing DebertaV2Model from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
`
It seems that the embedding layer weights are missing when the model is loaded.
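
One way to read such a warning is to separate the unused keys that belong to a pre-training head from those that belong to the backbone. The sketch below is a minimal, hypothetical helper: the name `split_unused_keys` and the assumption that the `mask_predictions.` prefix marks the masked-language-model head are not taken from the repository. It only processes the key names copied from the warning above.

```python
# Hypothetical helper, not part of ViDeBERTa or transformers.
# Assumption: keys under "mask_predictions." belong to the pre-training head,
# which a bare encoder such as DebertaV2Model does not contain.
HEAD_PREFIXES = ("mask_predictions.",)


def split_unused_keys(unused_keys, head_prefixes=HEAD_PREFIXES):
    """Split unused checkpoint keys into (head_keys, other_keys)."""
    head_keys, other_keys = [], []
    for key in unused_keys:
        # str.startswith accepts a tuple of prefixes.
        if key.startswith(head_prefixes):
            head_keys.append(key)
        else:
            other_keys.append(key)
    return head_keys, other_keys


# Key names copied from the warning quoted above.
unused = [
    "mask_predictions.LayerNorm.weight",
    "deberta.embeddings.word_embeddings._weight",
    "mask_predictions.LayerNorm.bias",
    "mask_predictions.dense.weight",
    "mask_predictions.classifier.weight",
    "mask_predictions.classifier.bias",
    "mask_predictions.dense.bias",
]

head, other = split_unused_keys(unused)
print("head keys:", len(head))
print("other keys:", other)
```

Under that assumption, the only key left outside the head is `deberta.embeddings.word_embeddings._weight`. Its `_weight` suffix may not match the `weight` name that a standard PyTorch `nn.Embedding` exposes, so it is the key worth checking against the parameter names the model class expects.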

Missing data preprocessing file and bug when training the MRC task

Hi.
I am trying to fine-tune the ViDeBERTa model for an MRC task on the ViQuAD dataset. However, the file Finetuning/QA/extractive-qa-mrc/utils/preprocess.py, which the provided code relies on, is missing.

I then used the load_dataset function from the datasets library instead, and got an error during model training with the following code.

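# Imports needed by the snippet below.
from transformers import RobertaForQuestionAnswering, Trainer, TrainingArguments

model_checkpoint = "Fsoft-AIC/videberta-base"
# The model is loaded through a RoBERTa class here, as in the original report.
model = RobertaForQuestionAnswering.from_pretrained(model_checkpoint)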

model_name = model_checkpoint.split("/")[-1]
args = TrainingArguments(
    f"{model_name}-finetuned-quad2.0",
    num_train_epochs=2.0,
    evaluation_strategy = "epoch",
    learning_rate=2e-5,
    warmup_ratio=0.05,
    weight_decay=0.01,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    load_best_model_at_end=True,
    save_strategy="epoch",
    save_total_limit=5,
    # do_train = True,
    # do_eval = False,
    #change the number of training epochs to get a better result
    #push_to_hub=True,
)

from transformers import default_data_collator
data_collator = default_data_collator

trainer = Trainer(
    model,
    args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_valid,
    data_collator=data_collator,
    tokenizer=tokenizer,
)

Looking forward to getting an answer to solve this problem.

Where is the Model?

Hello,

I noticed that the model code seems to be missing from the repository. I understand that this may limit the functionality and potential usefulness of the repository. Would it be possible to kindly provide an update on the status of the model code or if there are any plans to include it in the future?

Thank you for your attention to this matter, and I appreciate your help in resolving this.

Reproduce MRC task

I want to reproduce the MRC task result on the ViSquad 1. dataset. I tried using your code, but ran into some problems:

ViSquad dataset: some samples still have wrong answers. How did you deal with them?
For example:
{
"context": "Ngày 22-7-1954 , Chủ_tịch Hồ_Chí_Minh ra lời kêu_gọi : " Đấu_tranh để củng_cố hoà_bình , thực_hiện thống_nhất , hoàn_thành độc_lập dân_chủ cũng là một cuộc đấu_tranh lâu_dài và gian_khổ " và khẳng_định : " Trung , Nam , Bắc đều là bờ_cõi của nước ta , nước ta nhất_định thống_nhất , đồng_bào cả nước nhất_định được giải_phóng " . Cũng trong ngày này Thủ_tướng Quốc_gia Việt_Nam Ngô_Đình_Diệm ra_lệnh treo cờ rủ toàn Miền Nam từ vĩ_tuyến 17 trở vào để bày_tỏ quan_điểm phản_đối sự chia đôi đất_nước . Tuy_nhiên , trưởng_đoàn đại_biểu Việt_Nam Dân_chủ Cộng_hoà đã lên_tiếng : " Những_ai yêu nước Việt_Nam , những_ai yêu sự thống_nhất Việt_Nam thì không cần phải khóc hôm_nay . Hãy thực_hiện tốt những gì đã ký_kết hôm_nay , thì 2 năm nữa sẽ có một nước Việt_Nam thống_nhất , độc_lập , hoà_bình và giàu_mạnh . Những gì Chính_phủ Việt_Nam Dân_chủ Cộng_hoà làm trong những năm qua chính là vì mục_đích đó . Nước_mắt của chúng_tôi đổ ra trong cuộc đấu_tranh cho sự_nghiệp đó nhiều hơn rất nhiều so với những giọt lệ mà quý_vị ( Quốc_gia Việt_Nam ) nhỏ ra ở đây " .",
"question": "Mục_đích đấu_tranh của Chính_phủ Việt_Nam Dân_chủ Cộng_Hoà là gì ?",
"answers": {
"answer_start": [
-1
],
"answer_end": [
-1
],
"text": [
"Việt_Nam thống_nhất , độc_lập , hoà_bình và giàu mạn"
]
}
},
With the pyvi library, how did you segment the data? Did you simply segment each context and each answer separately? I ran into problems doing that, as shown below:
{
"context": "Nguồn_gốc của Mặt_Trăng hiện_nay còn chưa chắc_chắn , mặc_dù đa_số bằng_chứng tồn_tại ủng_hộ giả_thuyết sự va_chạm dữ_dội . Trái_Đất có_thể không phải là hành_tinh duy_nhất được tạo thành ở khoảng_cách 150 triệu km từ Mặt_trời . Một giả_thuyết cho rằng một tập_hợp vật_chất khác với khoảng_cách 150 triệu km từ cả Trái_Đất và Mặt_trời , ở điểm Lagrange thứ tư hay thứ năm . Hành_tinh này được gọi là Theia , nó được cho là nhỏ hơn so với Trái_Đất lúc đó , có_lẽ có cùng kích_thước và khối_lượng như Sao_Hoả . Quỹ_đạo của nó ban_đầu là ổn_định nhưng về sau khi Trái_Đất ngày_càng có khối_lượng lớn hơn khi thu_thập thêm vật_chất ở xung_quanh , thì quỹ_đạo của Theia trở_nên bất_ổn_định . Theia đu_đưa tới_lui theo Trái_Đất cho tới khi , cuối_cùng , cách nay khoảng 4.533 tỷ năm ( có_lẽ 0 giờ 05 phút đêm theo giờ cái đồng_hồ của chúng_ta ) , nó va_chạm vào Trái_Đất theo một góc thấp và chéo . Tốc_độ chậm và góc nhỏ không đủ để nó tiêu_diệt Trái_Đất , nhưng một tỷ_lệ lớn lớp vỏ của nó bị bắn ra . Những phần_tử nặng từ Theia chìm sâu vào vỏ Trái_Đất , trong khi những phần còn lại và vật_chất phóng ra tập_hợp lại thành một vật_thể duy_nhất trong vài tuần . Dưới ảnh_hưởng của trọng_lực của chính nó , có_lẽ trong một năm , nó trở_thành một vật_thể có hình_cầu : là Mặt_Trăng . Sự va_chạm cũng được cho rằng đã làm thay_đổi trục của Trái_Đất làm nó nghiêng đi 23,5 ° , trục_quay nghiêng gây ra mùa trên Trái_Đất . ( Một hình_thức lý_tưởng và đơn_giản về nguồn_gốc hành_tinh sẽ có các trục nghiêng 0 ° và không gây ra mùa . ) Có_thể nó cũng đã làm tốc_độ quay của Trái_Đất tăng thêm và khởi_động những kiến_tạo địa_tầng .",
"question": "Mặt_Trăng có kích_thước gần giống với hành_tinh nào trong hệ Mặt_Trời ?",
"answers": {
"answer_start": [
-1,
-1,
-1,
-1
],
"answer_end": [
-1,
-1,
-1,
-1
],
"text": [
"Sao Hoả",
"Sao Hoả",
"Sao Hoả",
"Sao Hoả"
]
}
},

Correct me if I'm wrong. It would help if you could provide more details or your code so that I can reproduce the results. Thanks in advance!
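
For the `-1` offsets shown above, one possible repair is to recompute `answer_start` by searching for the answer text in the context while treating the `_` inserted by word segmentation as a space. The sketch below is an assumption about how such a repair could look, not the authors' preprocessing; `find_answer_span` is a hypothetical name.

```python
def find_answer_span(context, answer):
    """Return (start, end) character offsets of `answer` in `context`.

    Returns (-1, -1) when the answer cannot be found. Underscores are mapped
    to spaces on both sides before searching. The mapping is one character to
    one character, so the offsets stay valid for the original, segmented
    context.
    """
    normalized_context = context.replace("_", " ")
    normalized_answer = answer.replace("_", " ").strip()
    if not normalized_answer:
        return -1, -1
    start = normalized_context.find(normalized_answer)
    if start == -1:
        return -1, -1
    return start, start + len(normalized_answer)


# Shortened excerpt of the second example: the context is segmented,
# the answer is not.
context = "có_lẽ có cùng kích_thước và khối_lượng như Sao_Hoả ."
start, end = find_answer_span(context, "Sao Hoả")
print(start, end, context[start:end])
```

A span found this way can still cut a word in half, as with the truncated answer ending in `giàu mạn` in the first example, so matched spans are worth checking against word boundaries before training.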
