How would one add a new language to the existing set? How would one extend what is


Adding a New Language and Extending Previous Languages (detoxify, open issue)

unitaryai commented on May 20, 2024


from detoxify.

Comments (2)

laurahanu commented on May 20, 2024

If you have enough labelled examples, you should be able to fine-tune the multilingual model directly (you might want to check whether XLM-RoBERTa was trained on your language). Details on how to train this model are in the README; you might need to create a data loader for your new dataset.
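As a rough illustration of what "a data loader for your new dataset" could look like, here is a minimal sketch that uses only the standard library. The column names (`comment_text`, `toxic`) follow the Jigsaw CSV layout but are assumptions about your file, the class name `CommentDataset` is hypothetical, and the example rows are made up. The training code in the repository may expect a different interface, so treat this as a starting point rather than the project's actual loader.

```python
import csv
import io

# Assumed label columns; adjust to match the header of your own CSV.
LABELS = ["toxic"]


class CommentDataset:
    """Map-style dataset: supports len() and integer indexing."""

    def __init__(self, csv_file, text_col="comment_text", label_cols=LABELS):
        reader = csv.DictReader(csv_file)
        self.samples = []
        for row in reader:
            text = row[text_col].strip()
            if not text:
                continue  # skip rows whose comment is empty
            labels = [float(row[col]) for col in label_cols]
            self.samples.append((text, labels))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]


# Tiny in-memory example (made-up rows, not real data).
example_csv = io.StringIO(
    "id,comment_text,toxic\n"
    "1,Wat een mooie dag,0\n"
    "2,,1\n"
    "3,Jij bent een idioot,1\n"
)
dataset = CommentDataset(example_csv)
```

An object with `__len__` and `__getitem__` like this one is the shape PyTorch's `DataLoader` expects of a map-style dataset, so it should be possible to wrap it for batching; tokenization would still need to happen in the model or in a collate step.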

If you don't have enough labelled examples, you could translate the Jigsaw datasets into the new language and retrain the model, although you would probably need to create a labelled test set in that language to check the performance.
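The translate-then-evaluate route could be sketched roughly as follows. Everything here is illustrative: `translate_rows` and `f1_score` are hypothetical helpers, the `translate` argument stands in for whatever translation tool you use (none is assumed), and the `comment_text` column name is an assumption about the CSV layout. The point is that only the text is translated, the labels are carried over unchanged, and the scores are then checked against a separately labelled test set.

```python
def translate_rows(rows, translate, text_col="comment_text"):
    """Return copies of rows with text_col translated; labels stay untouched."""
    translated = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are not modified
        new_row[text_col] = translate(row[text_col])
        translated.append(new_row)
    return translated


def f1_score(gold, scores, threshold=0.5):
    """Binary F1 of thresholded model scores against gold 0/1 labels."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for g, p in zip(gold, preds) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, preds) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, preds) if g == 1 and p == 0)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Stand-in "translator" for demonstration only; plug in a real one here.
rows = [{"comment_text": "hello", "toxic": "0"}]
translated = translate_rows(rows, str.upper)
```

A test set that is itself machine-translated inherits the translator's mistakes, which is one reason a small hand-labelled set in the target language is worth having.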

Hope this helps!


SaadAhmed433 commented on May 20, 2024

@laurahanu I am trying to extend the multilingual model to the Dutch language. Since I did not have any labelled examples, I have translated the Jigsaw datasets to Dutch. Now I have the following questions:

1. Do we need to perform any preprocessing on the translated datasets?
2. The instructions in the README only show the command for training the model with one of the config files. Is the stage 2 config used at all? Please clarify whether it's used or not.
3. Two sources are mentioned for the translated datasets, and I am unsure which one to use.
4. Since I don't have a test dataset of Dutch comments, should I translate the entries in the test.csv files to include Dutch comments?

Thanks a lot.
