
Error with checkpoint (mlconvgec2018, closed)

nusnlp commented on August 19, 2024
Error with checkpoint


Comments (7)

shamilcm commented on August 19, 2024

The run.sh script only works with the released pre-trained models as-is. If you are going to generate with a model you trained yourself, run it with the correct bin directory, i.e.

mlconvgec2018/training/processed/bin

instead of

/home/michelle/mlc/mlconvgec2018/models/data_bin

What is the number of lines in mlconvgec2018/training/processed/bin/dict.src.txt and mlconvgec2018/training/processed/bin/dict.trg.txt?

Also, when you run generate.py on a trained model, just use a single model (checkpoint_best.pt is a good choice). It is not a good idea to include all checkpoints from 1 to 10 while decoding. If you want to do ensembling, either choose the last 2-4 checkpoints, or run training multiple times from the beginning and use the checkpoint_best.pt files from all training runs.
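For illustration, a minimal sketch of the two options, assuming this fairseq-py version accepts a colon-separated list of checkpoints for --path (run.sh builds such a list when it is given a model directory); the paths are illustrative and the other flags mirror what run.sh passes to generate.py:

# single model: the checkpoint with the best validation loss
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path training/models/checkpoint_best.pt --beam 12 --nbest 12 --interactive --workers 12 training/processed/bin

# small ensemble: only the last few checkpoints, joined with ':'
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path training/models/checkpoint8.pt:training/models/checkpoint9.pt:training/models/checkpoint10.pt --beam 12 --nbest 12 --interactive --workers 12 training/processed/bin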


michellegiang commented on August 19, 2024

Hi Shamil,

Thank you for your support!

  1. If I generate using my own trained model, I just use your run.sh but replace "$MODEL_DIR/data_bin" with my model's bin directory (e.g. mlconvgec2018/training/processed/bin), right?
  2. The number of lines in dict.src.txt is 28795 and in dict.trg.txt is 28980.
  3. My development file is the CoNLL 2013 test file, so should I use generate.py or generate.py -i? (It seems generate.py works with binarized files and generate.py -i works with raw text files, and my CoNLL 2013 test file is binarized.)
  4. I don't know why, when I use a single model, I get the error below (while it works fine if I replace "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt" with "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed").

./run.sh "/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src" "/home/michelle/mlc/test" 2 "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt"

++ source paths.sh
+++++ dirname paths.sh
++++ cd .
++++ pwd
+++ BASE_DIR=/home/michelle/mlc/mlconvgec2018
+++ DATA_DIR=/home/michelle/mlc/mlconvgec2018/data
+++ MODEL_DIR=/home/michelle/mlc/mlconvgec2018/models
+++ SCRIPTS_DIR=/home/michelle/mlc/mlconvgec2018/scripts
+++ SOFTWARE_DIR=/home/michelle/mlc/mlconvgec2018/software
++ '[' 4 -ge 4 ']'
++ input_file=/home/michelle/mlc/mlconvgec2018/data/test/conll14st-test/conll14st-test.tok.src
++ output_dir=/home/michelle/mlc/test
++ device=2
++ model_path=/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt
++ '[' 4 -eq 6 ']'
++ '[' -d /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt ']'
++ '[' -f /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt ']'
++ model=/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model1.pt
++ FAIRSEQPY=/home/michelle/mlc/mlconvgec2018/software/fairseq-py
++ NBEST_RERANKER=/home/michelle/mlc/mlconvgec2018/software/nbest-reranker
++ beam=12
++ nbest=12
++ threads=12
++ mkdir -p /home/michelle/mlc/test
++ /home/michelle/mlc/mlconvgec2018/scripts/apply_bpe.py -c /home/michelle/mlc/mlconvgec2018/models/bpe_model/train.bpe.model
++ CUDA_VISIBLE_DEVICES=2
++ python3.6 /home/michelle/mlc/mlconvgec2018/software/fairseq-py/generate.py --no-progress-bar --path --beam 12 --nbest 12 --interactive --workers 12 /home/michelle/mlc/mlconvgec2018/models/data_bin
usage: generate.py [-h] [--no-progress-bar] [--log-interval N] [--seed N]
--path FILE [-s SRC] [-t TARGET] [-j N] [--max-positions N]
[-i] [--batch-size N] [--gen-subset SPLIT] [--beam N]
[--nbest N] [--max-len-a N] [--max-len-b N] [--remove-bpe]
[--no-early-stop] [--unnormalized] [--cpu]
[--no-beamable-mm] [--lenpen LENPEN]
[--unk-replace-dict UNK_REPLACE_DICT]
DIR
generate.py: error: argument --path: expected one argument


shamilcm commented on August 19, 2024

If I generate using my own trained model, I just use your run.sh but replace "$MODEL_DIR/data_bin" with my model's bin directory (e.g. mlconvgec2018/training/processed/bin), right?

Yes. If you generate using a model that you trained yourself (i.e. trained on the bin files that you created with the preprocessing step), then you have to point decoding at that same binary directory. This is because the dictionary files used to train the model and the dictionary files used during generation must match.
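Concretely, the binarized data directory is the last positional argument that run.sh passes to generate.py, so the change is just that argument (a sketch with illustrative paths and checkpoint names):

# released pre-trained models: decode against the shipped dictionaries
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path $MODEL_DIR/mlconv_embed/model1.pt --beam 12 --nbest 12 --interactive --workers 12 $MODEL_DIR/data_bin

# self-trained model: decode against the bin/ directory created by your own preprocessing
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path your_checkpoint_best.pt --beam 12 --nbest 12 --interactive --workers 12 $BASE_DIR/training/processed/bin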

The number of lines in dict.src.txt is 28795 and dict.trg.txt is 28980

This indicates that the training data you used contains fewer than 30K unique BPE tokens. It would be better to use a lower vocabulary size for your model, so that some rare tokens map to UNK during training and the model can learn a better estimate for the UNK token. Otherwise, every token seen during training is a known vocabulary word and the UNK embedding never gets trained. The data used in the paper is the concatenation of Lang-8 v2 and NUCLE, which has more than 30K unique BPE tokens, hence a 30K vocabulary was fine there.
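For instance, a smaller vocabulary can be obtained by learning the BPE model with fewer merge operations before re-running preprocessing and training (a sketch in subword-nmt style, which is where the apply_bpe.py used by run.sh comes from; the 25000 merge count and the file names are illustrative, not the repository's defaults):

# learn a smaller BPE model on the tokenized training source side
python3.6 learn_bpe.py -s 25000 < train.tok.src > train.bpe.model

# re-apply it to the training and dev data, then redo preprocessing and training
python3.6 apply_bpe.py -c train.bpe.model < train.tok.src > train.bpe.src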

My development file is the CoNLL 2013 test file, so should I use generate.py or generate.py -i? (It seems generate.py works with binarized files and generate.py -i works with raw text files, and my CoNLL 2013 test file is binarized.)

If you want to decode a file that is already inside your bin/ directory, i.e. the binarized test or valid split, you can use generate.py without the -i flag. However, if you want to decode from a raw text file, as the run.sh script does, use generate.py -i.
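Roughly, the two modes look like this (a sketch; the flags follow the trace above, and the checkpoint and file names are illustrative):

# decode the binarized test split that preprocess wrote into bin/
python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path checkpoint_best.pt --beam 12 --nbest 12 --gen-subset test training/processed/bin

# decode a raw, BPE-encoded text file from stdin, as run.sh does
python3.6 $SCRIPTS_DIR/apply_bpe.py -c $MODEL_DIR/bpe_model/train.bpe.model < dev.tok.src | python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path checkpoint_best.pt --beam 12 --nbest 12 --interactive --workers 12 training/processed/bin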

I don't know why, when I use a single model, I get the error below (while it works fine if I replace "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt" with "/home/michelle/mlc/mlconvgec2018/models/mlconv_embed").

Sorry, there was a typo in the run.sh script. I have now fixed it in commit f7b9fca. Thanks.
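With the fix, the single-model invocation should expand so that the checkpoint actually follows --path, instead of --path being left without an argument as in the trace above (a sketch based on that trace):

python3.6 $FAIRSEQPY/generate.py --no-progress-bar --path /home/michelle/mlc/mlconvgec2018/models/mlconv_embed/model4.pt --beam 12 --nbest 12 --interactive --workers 12 /home/michelle/mlc/mlconvgec2018/models/data_bin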


michellegiang commented on August 19, 2024

Thank you very much!

So could you please let me know if I need to increase my unique tokens to 30K? I mean, how can I solve the "RuntimeError: inconsistent tensor size, expected tensor [30004 x 500] and src [28799 x 500]"?

This indicates that the training data you used contains fewer than 30K unique BPE tokens. It would be better to use a lower vocabulary size for your model, so that some rare tokens map to UNK during training and the model can learn a better estimate for the UNK token. Otherwise, every token seen during training is a known vocabulary word and the UNK embedding never gets trained. The data used in the paper is the concatenation of Lang-8 v2 and NUCLE, which has more than 30K unique BPE tokens, hence a 30K vocabulary was fine there.


shamilcm commented on August 19, 2024

So could you please let me know if I need to increase my unique tokens to 30K? I mean, how can I solve the "RuntimeError: inconsistent tensor size, expected tensor [30004 x 500] and src [28799 x 500]"?

I believe this problem is due to using the wrong bin/ directory: the two sizes in the error appear to correspond to a 30K vocabulary (30000 + 4 special tokens = 30004) and to your own dictionary (28795 + 4 = 28799), which suggests the model and the dictionaries come from different runs. Isn't the problem solved when you use the bin directory within training/?


shamilcm commented on August 19, 2024

Is the issue solved?


michellegiang commented on August 19, 2024

Hi Shamil,

Thank you for your support. It is solved. I've finished training and testing successfully.

Regards,
Michelle
