Comments (2)
If you have a sufficient number of labelled examples, you should be able to fine-tune the multilingual model directly (you might want to check whether XLM-RoBERTa was trained on your language). Details on how to train this model are in the README; you might need to create a data loader for your new dataset.
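A data loader for a translated dataset can start as simply as reading the CSV into (text, labels) pairs. The sketch below is a stdlib-only illustration, not Detoxify's own data loader: the helper names `parse_jigsaw_rows` and `load_translated_jigsaw` are hypothetical, and it assumes the translated files keep the Jigsaw layout with the text in `comment_text` and one 0/1 column per label.

```python
import csv
from typing import Iterable, List, Sequence, Tuple

# Assumption: the translated files keep the original Jigsaw column layout,
# with the translated text in "comment_text" and one 0/1 column per label.
# These label names are those of the original Jigsaw Toxic Comment
# Classification data; change them to match your actual files.
DEFAULT_LABELS = ("toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate")


def parse_jigsaw_rows(
    lines: Iterable[str],
    text_column: str = "comment_text",
    label_columns: Sequence[str] = DEFAULT_LABELS,
) -> List[Tuple[str, List[float]]]:
    """Hypothetical helper: turn CSV lines into (text, label vector) pairs."""
    examples = []
    for row in csv.DictReader(lines):
        text = (row.get(text_column) or "").strip()
        if not text:
            # Skip rows whose translation came back empty.
            continue
        labels = [float(row[col]) for col in label_columns]
        examples.append((text, labels))
    return examples


def load_translated_jigsaw(path: str, **kwargs) -> List[Tuple[str, List[float]]]:
    """Hypothetical helper: read a translated CSV file from disk."""
    with open(path, newline="", encoding="utf-8") as f:
        return parse_jigsaw_rows(f, **kwargs)
```

The resulting pairs could then be wrapped in a `torch.utils.data.Dataset` or adapted to whatever format the training script expects.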
If you don't have enough labelled examples, you could translate the Jigsaw datasets used into the new language and retrain the model, although you would probably need to create a labelled test set to check its performance.
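To check performance on a labelled test set, a common metric for toxicity classifiers is ROC AUC (the Jigsaw competitions scored submissions with AUC-based metrics). The sketch below computes it for one label from 0/1 ground truth and predicted scores using the Mann-Whitney rank formulation. It is an illustrative stdlib implementation, not the project's evaluation code, and `roc_auc` is a hypothetical name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via the Mann-Whitney U statistic (tied scores get average ranks)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sort indices by score and assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts how often a positive outranks a negative; normalise to [0, 1].
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

The scores would be the model's per-label predictions on the translated test set (for example its toxicity output), computed separately for each label you care about.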
Hope this helps!
@laurahanu I am trying to extend the multilingual model to Dutch. Since I did not have any labelled examples, I translated the Jigsaw datasets into Dutch. I now have the following questions:
- Do we need to perform any preprocessing on the translated datasets?
- The instructions in the README only show the command for training the model with one of the config files. Is the stage 2 config not used? Please clarify whether it is used or not.
- Two sources are mentioned for the translated datasets, and I am confused about which one to use.
- Since I don't have a test dataset of Dutch comments, should I translate the entries in the test.csv files to include Dutch comments?
Thanks a lot.
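Whatever preprocessing turns out to be needed, one thing worth verifying after machine translation is that the translated files still line up with the originals: same rows in the same order, unchanged labels, and no empty translations. The sketch below is a hypothetical stdlib check (`check_translation_alignment` is an illustrative name); it assumes both files share an `id` column, a `comment_text` column, and the label columns you pass in.

```python
import csv
from typing import Iterable, List, Sequence


def check_translation_alignment(
    original_lines: Iterable[str],
    translated_lines: Iterable[str],
    label_columns: Sequence[str],
    id_column: str = "id",
    text_column: str = "comment_text",
) -> List[str]:
    """Hypothetical check: return a list of problems found (empty list means OK)."""
    original = list(csv.DictReader(original_lines))
    translated = list(csv.DictReader(translated_lines))
    problems = []
    if len(original) != len(translated):
        problems.append(f"row count differs: {len(original)} vs {len(translated)}")
    for orig_row, trans_row in zip(original, translated):
        row_id = orig_row[id_column]
        if trans_row[id_column] != row_id:
            problems.append(f"id mismatch: {row_id} vs {trans_row[id_column]}")
            continue
        # Translation should only touch the text, never the labels.
        for col in label_columns:
            if orig_row[col] != trans_row[col]:
                problems.append(f"{row_id}: label '{col}' changed")
        if not (trans_row[text_column] or "").strip():
            problems.append(f"{row_id}: empty translation")
    return problems
```

The same check applies to a translated test.csv, since its labels would otherwise be silently misaligned with the translated text.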
Related Issues (20)
- Pinpoint the parts of the speech that trigger high values
- OSError: Unable to load vocabulary from file. Please check that the provided vocabulary is accessible and not corrupted.
- Number of epochs to get the best model
- Progress Bar
- TypeError: 'NoneType' object is not subscriptable
- UnicodeDecodeError when installing from git
- Add dutch language
- Question regards use case
- Question - Adding additional models and labels.
- Question regards training with other models
- Error during training
- Multi GPU predict
- Add license information for toxic-bert on HF
- Converting model to AWS Inferentia hardware using Optimum-cli
- FileNotFoundError: [Errno 2] No such file or directory: 'jigsaw_data/jigsaw-toxic-comment-classification-challenge/val.csv'
- What are the max token lengths for the models?
- Error on PIP install
- Bump up Transformers version
- "torch" error by just importing Detoxify
- OSError: None is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'