Comments (3)
Hi James!
Although we had not originally considered this use case, we received some requests for it and added the functionality. That is why all of the examples assume you have more data. However, as I mentioned, you can train with just src, mt and TER.
To do so, specify the following in your config file (in addition to the rest of the parameters):
sentence-level: True
predict-gaps: False
predict-target: False
predict-source: False
Keep in mind that there are some config options based on word-level tags that might not work when training just for sentence-level.
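As a quick sanity check before launching training, the four settings above can be verified with a few lines of plain Python. This is only a sketch: `parse_flat_config` and `check_sentence_only` are hypothetical helpers written for this illustration, not part of OpenKiwi, and the parser only understands flat `key: value` lines rather than full YAML.

```python
# Settings quoted in the comment above for sentence-level-only training.
SENTENCE_ONLY_FLAGS = {
    "sentence-level": True,
    "predict-gaps": False,
    "predict-target": False,
    "predict-source": False,
}


def parse_flat_config(text):
    """Parse flat `key: value` lines (hypothetical helper, not a YAML parser)."""
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        lowered = value.lower()
        if lowered in ("true", "false"):
            config[key.strip()] = lowered == "true"
        else:
            config[key.strip()] = value
    return config


def check_sentence_only(config):
    """Return (key, expected, found) for every flag that differs."""
    return [
        (key, expected, config.get(key))
        for key, expected in SENTENCE_ONLY_FLAGS.items()
        if config.get(key) != expected
    ]


# Example config text with one flag left over from a word-level setup.
example = """
sentence-level: True
predict-gaps: False
predict-target: True   # left over from a word-level setup
predict-source: False
"""

problems = check_sentence_only(parse_flat_config(example))
for key, expected, found in problems:
    print(f"{key}: expected {expected}, found {found}")
```

An empty `problems` list means the four flags match the values given above; it says nothing about the other word-level options that may still need attention.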
Let us know if you find any errors while training only with sentences!
Miguel
from openkiwi.
Closing this since there have been no updates. Feel free to re-open if you have further questions!
Thanks Miguel. I successfully trained a predictor-estimator model on WMT data following your advice. I did this using a modified version of the config file in the experiments directory. In case it's useful, here are the modifications I made:
- OpenKiwi/experiments/train_estimator.yaml, line 32 in 715eba7
- OpenKiwi/experiments/train_estimator.yaml, line 46 in 715eba7
- OpenKiwi/experiments/train_estimator.yaml, line 108 in 715eba7
- OpenKiwi/experiments/train_estimator.yaml, line 114 in 715eba7