Comments (2)
Hi @keloemma,
Questions 1, 2, and 3: The goal of the script extract_split_cls.py is to prepare data in the proper format for the fine-tuning script. In particular, it extracts the text, gets the labels, and splits the training set into train and dev sets, since no separate dev set is provided for this task. If your data is different from the CLS dataset, you should modify this script or write your own to process it.
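For reference, the train/dev split step can be sketched in a few lines of Python. This is a minimal sketch, not the actual code of extract_split_cls.py: it assumes your data is already a list of (text, label) pairs, and the function name `split_train_dev`, the 10% dev ratio and the fixed seed are illustrative choices.

```python
import random
from typing import List, Tuple

# Hypothetical example layout: one (text, label) pair per review.
Example = Tuple[str, int]


def split_train_dev(
    examples: List[Example], dev_ratio: float = 0.1, seed: int = 42
) -> Tuple[List[Example], List[Example]]:
    """Shuffle a copy of the examples with a fixed seed and hold out a dev set."""
    if not 0.0 < dev_ratio < 1.0:
        raise ValueError("dev_ratio must be strictly between 0 and 1")
    shuffled = list(examples)  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_ratio)
    # The first n_dev shuffled examples form the dev set, the rest the train set.
    return shuffled[n_dev:], shuffled[:n_dev]


if __name__ == "__main__":
    data = [(f"review {i}", i % 2) for i in range(100)]
    train, dev = split_train_dev(data, dev_ratio=0.1)
    print(len(train), len(dev))
```

Fixing the seed keeps the split reproducible, so dev-set results stay comparable across runs.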
The format of the output files depends on the task. If you use Hugging Face's Transformers library for fine-tuning and your task is similar to the ones in the GLUE benchmark, you can check out the details of the output format here and prepare your data accordingly.
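As an example of preparing files in a GLUE-like layout, the sketch below writes single-sentence classification data as a tab-separated file with a `sentence\tlabel` header row, similar to GLUE's SST-2. This layout is an assumption for illustration, so check the format your version of the fine-tuning script actually expects; the helper names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Tuple


def clean_text(text: str) -> str:
    """Collapse all whitespace runs (including tabs and newlines) to single spaces."""
    return " ".join(text.split())


def write_glue_style_tsv(path: Path, examples: Iterable[Tuple[str, int]]) -> None:
    """Write (sentence, label) pairs as one tab-separated line each, after a header row."""
    with path.open("w", encoding="utf-8") as f:
        f.write("sentence\tlabel\n")
        for sentence, label in examples:
            f.write(f"{clean_text(sentence)}\t{label}\n")


def read_glue_style_tsv(path: Path) -> List[Tuple[str, int]]:
    """Read the file back, skipping the header row."""
    with path.open(encoding="utf-8") as f:
        rows = [line.rstrip("\n").split("\t") for line in f]
    return [(sentence, int(label)) for sentence, label in rows[1:]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "train.tsv"
        write_glue_style_tsv(path, [("Un livre\tpassionnant.", 1), ("Très décevant.\n", 0)])
        print(read_glue_style_tsv(path))
```

Cleaning the text first matters because a stray tab or newline inside a review would otherwise shift the columns of the TSV file.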
There is also a "parse.py" script. At which point is it used in the flow of the architecture?
It is included in the downloaded data but we do not use it in our code.
About fine-tuning: what is the purpose of the config file? How is it obtained?
The configuration file stores the training parameters, so you can run experiments with different configurations using the same command. For more details, check out the configuration files in the examples to see which parameters they set; these parameters are used in the run command.
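To illustrate the idea of reusing one command with different configurations, here is a generic Python sketch that loads parameters from a JSON config and turns them into command-line flags. It is not necessarily how the FlauBERT example configuration files are structured: the JSON format, the keys (`lr`, `batch_size`, `epochs`, `fp16`) and the script name `finetune.py` are hypothetical, and boolean values are assumed to be on/off flags.

```python
import json
import shlex
from typing import Any, Dict, List


def config_to_args(config: Dict[str, Any]) -> List[str]:
    """Convert a flat config dict into command-line arguments."""
    args: List[str] = []
    for key, value in config.items():
        if isinstance(value, bool):
            # Assumed on/off flags: emit "--key" only when the value is true.
            if value:
                args.append(f"--{key}")
        else:
            args.extend([f"--{key}", str(value)])
    return args


if __name__ == "__main__":
    # Hypothetical config file content; real parameter names come from the example configs.
    raw_config = '{"lr": 5e-6, "batch_size": 8, "epochs": 30, "fp16": true}'
    config = json.loads(raw_config)
    # "finetune.py" is a placeholder script name, not a real FlauBERT entry point.
    command = ["python", "finetune.py"] + config_to_args(config)
    print(shlex.join(command))
```

Switching experiments then only means pointing the same command at a different config file.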
from flaubert.
I assume you got your answer, @keloemma?