
Error with checkpoint (mlconvgec2018, closed)

nusnlp commented on August 19, 2024
Error with checkpoint


Comments (7)

shamilcm commented on August 19, 2024

The run.sh script only works with the released pre-trained models as-is. If you are going to generate with a model you trained yourself, run it with the correct bin directory, i.e.

mlconvgec2018/training/processed/bin

instead of

/home/michelle/mlc/mlconvgec2018/models/data_bin

What is the number of lines in mlconvgec2018/training/processed/bin/dict.src.txt and mlconvgec2018/training/processed/bin/dict.trg.txt?

Also, when you run generate.py on a trained model, just use a single model (checkpoint_best.pt is a good choice). It is not a good idea to include all checkpoints from 1 to 10 while decoding. If you want to do ensembling, either choose the last 2-4 checkpoints, or run training multiple times from the beginning and use the checkpoint_best.pt files from all training runs.
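For illustration, a minimal sketch of the two options, assuming this fairseq-py version accepts a colon-separated list of checkpoints for --path (run.sh builds such a list when it is given a model directory); the paths are illustrative and the other flags mirror what run.sh passes to generate.py:

# single model: the checkpoint with the best validation loss
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path training/models/checkpoint_best.pt --beam 12 --nbest 12 --interactive --workers 12 training/processed/bin

# small ensemble: only the last few checkpoints, joined with ':'
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path training/models/checkpoint8.pt:training/models/checkpoint9.pt:training/models/checkpoint10.pt --beam 12 --nbest 12 --interactive --workers 12 training/processed/bin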


michellegiang commented on August 19, 2024

Hi Shamil,

Thank you for your support!

  1. If I generate using my own trained model, I just use your run.sh but replace "$MODEL_DIR/data_bin" with my model's bin directory (e.g. mlconvgec2018/training/processed/bin), right?
  2. The number of lines in dict.src.txt is 28795 and in dict.trg.txt is 28980.
  3. My development file is the CoNLL 2013 test file, so should I use generate.py or generate.py -i? (It seems generate.py works with binarized files and generate.py -i works with raw text files, and my CoNLL 2013 test file is binarized.)
  4. I don't know why, when I use a single model, I get the error below (while it works fine if I replace "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt" with "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed").

./run.sh "/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src" "/home/michelle/mlc/test" 2 "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt"

++ source paths.sh
+++++ dirname paths.sh
++++ cd .
++++ pwd
+++ BASE_DIR=/home/michelle/mlc/mlconvgec2018
+++ DATA_DIR=/home/michelle/mlc/mlconvgec2018/data
+++ MODEL_DIR=/home/michelle/mlc/mlconvgec2018/models
+++ SCRIPTS_DIR=/home/michelle/mlc/mlconvgec2018/scripts
+++ SOFTWARE_DIR=/home/michelle/mlc/mlconvgec2018/software
++ '[' 4 -ge 4 ']'
++ input_file=/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src
++ output_dir=/home/michelle/mlc/test
++ device=2
++ model_path=/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt
++ '[' 4 -eq 6 ']'
++ '[' -d /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt ']'
++ '[' -f /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt ']'
++ model=/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt
++ FAIRSEQPY=/home/michelle/mlc/mlconvgec2018/software/fairseq-py
++ NBEST_RERANKER=/home/michelle/mlc/mlconvgec2018/software/nbest-reranker
++ beam=12
++ nbest=12
++ threads=12
++ mkdir -p /home/michelle/mlc/test
++ /home/michelle/mlc/mlconvgec2018/scripts/apply_bpe.py -c /home/michelle/mlc/mlconvgec2018/models/bpe_model/train.bpe.model
++ CUDA_VISIBLE_DEVICES=2
++ python3.6 /home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py --no-progress-bar --path --beam 12 --nbest 12 --interactive --workers 12 /home/michelle/mlc/mlconvgec2018/models/data_bin
usage: generate.py [-h] [--no-progress-bar] [--log-interval N] [--seed N]
--path FILE [-s SRC] [-t TARGET] [-j N] [--max-positions N]
[-i] [--batch-size N] [--gen-subset SPLIT] [--beam N]
[--nbest N] [--max-len-a N] [--max-len-b N] [--remove-bpe]
[--no-early-stop] [--unnormalized] [--cpu]
[--no-beamable-mm] [--lenpen LENPEN]
[--unk-replace-dict UNK_REPLACE_DICT]
DIR
generate.py: error: argument --path: expected one argument


shamilcm commented on August 19, 2024

If I generate using my own trained model, I just use your run.sh but replace "$MODEL_DIR/data_bin" with my model's bin directory (e.g. mlconvgec2018/training/processed/bin), right?

Yes. If you generate using a model that you trained yourself (i.e. trained on the bin files that you created with the preprocessing step), then you have to point decoding at that same binary directory. This is because the dictionary files used to train the model and the dictionary files used during generation must match.
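Concretely, the binarized data directory is the last positional argument that run.sh passes to generate.py, so the change is just that argument (a sketch with illustrative paths and checkpoint names):

# released pre-trained models: decode against the shipped dictionaries
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path $MODEL_DIR/mlconv_embed/model1.pt --beam 12 --nbest 12 --interactive --workers 12 $MODEL_DIR/data_bin

# self-trained model: decode against the bin/ directory created by your own preprocessing
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path your_checkpoint_best.pt --beam 12 --nbest 12 --interactive --workers 12 $BASE_DIR/training/processed/bin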

The number of lines in dict.src.txt is 28795 and dict.trg.txt is 28980

This indicates that the training data you used contains fewer than 30K unique BPE tokens. It would be better to use a lower vocabulary size for your model, so that some rare tokens map to UNK during training and the model can learn a better estimate for the UNK token. Otherwise, every token seen during training is a known vocabulary word and the UNK embedding never gets trained. The data used in the paper is the concatenation of Lang-8 v2 and NUCLE, which has more than 30K unique BPE tokens, hence a 30K vocabulary was fine there.
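For instance, a smaller vocabulary can be obtained by learning the BPE model with fewer merge operations before re-running preprocessing and training (a sketch in subword-nmt style, which is where the apply_bpe.py used by run.sh comes from; the 25000 merge count and the file names are illustrative, not the repository's defaults):

# learn a smaller BPE model on the tokenized training source side
python3.6 learn_bpe.py -s 25000 < train.tok.src > train.bpe.model

# re-apply it to the training and dev data, then redo preprocessing and training
python3.6 apply_bpe.py -c train.bpe.model < train.tok.src > train.bpe.src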

My development file is the CoNLL 2013 test file, so should I use generate.py or generate.py -i? (It seems generate.py works with binarized files and generate.py -i works with raw text files, and my CoNLL 2013 test file is binarized.)

If you want to decode a file that is already inside your bin/ directory, i.e. the binarized test or valid split, you can use generate.py without the -i flag. However, if you want to decode from a raw text file, as the run.sh script does, use generate.py -i.
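Roughly, the two modes look like this (a sketch; the flags follow the trace above, and the checkpoint and file names are illustrative):

# decode the binarized test split that preprocess wrote into bin/
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path checkpoint_best.pt --beam 12 --nbest 12 --gen-subset test training/processed/bin

# decode a raw, BPE-encoded text file from stdin, as run.sh does
python3.6 $SCRIPTS_DIR/apply_bpe.py -c $MODEL_DIR/bpe_model/train.bpe.model < dev.tok.src | python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path checkpoint_best.pt --beam 12 --nbest 12 --interactive --workers 12 training/processed/bin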

I don't know why, when I use a single model, I get the error below (while it works fine if I replace "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt" with "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed").

Sorry, there was a typo in the run.sh script. I have now fixed it in commit f7b9fca. Thanks.
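With the fix, the single-model invocation should expand so that the checkpoint actually follows --path, instead of --path being left without an argument as in the trace above (a sketch based on that trace):

python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt --beam 12 --nbest 12 --interactive --workers 12 /home/michelle/mlc/mlconvgec2018/models/data_bin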


michellegiang commented on August 19, 2024

Thank you very much!

So could you please let me know if I need to increase my unique tokens to 30K? I mean, how can I solve the "RuntimeError: inconsistent tensor size, expected tensor [30004 x 500] and src [28799 x 500]"?

This indicates that the training data you used contains fewer than 30K unique BPE tokens. It would be better to use a lower vocabulary size for your model, so that some rare tokens map to UNK during training and the model can learn a better estimate for the UNK token. Otherwise, every token seen during training is a known vocabulary word and the UNK embedding never gets trained. The data used in the paper is the concatenation of Lang-8 v2 and NUCLE, which has more than 30K unique BPE tokens, hence a 30K vocabulary was fine there.


shamilcm commented on August 19, 2024

So could you please let me know if I need to increase my unique tokens to 30K? I mean, how can I solve the "RuntimeError: inconsistent tensor size, expected tensor [30004 x 500] and src [28799 x 500]"?

I believe this problem is due to using the wrong bin/ directory: the two sizes in the error appear to correspond to a 30K vocabulary (30000 + 4 special tokens = 30004) and to your own dictionary (28795 + 4 = 28799), which suggests the model and the dictionaries come from different runs. Isn't the problem solved when you use the bin directory within training/?


shamilcm commented on August 19, 2024

Is the issue solved?


michellegiang commented on August 19, 2024

Hi Shamil,

Thank you for your support. It is solved. I've finished training and testing successfully.

Regards,
Michelle
