About testing models (mlconvgec2018) · 12 comments · closed

nusnlp commented on July 19, 2024
About testing models

from mlconvgec2018.

Comments (12)

renhongkai commented on July 19, 2024

1. The data set was provided by our teacher and includes Lang-8 and NUCLE (version 3.2); 5,458 sentence pairs from NUCLE were taken out to be used as the development data. The training data include 132M sentence pairs.
2. I will try interactive.py instead of generate.py.
3. You mean I need to turn the test set (the conll14st-test.tok.src file) into the --testpref by running the command
python3.5 $FAIRSEQPY/preprocess.py --source-lang src --target-lang trg --trainpref processed/train --validpref processed/dev --testpref processed/dev --nwordssrc 30000 --nwordstgt 30000 --destdir processed/bin
4. Can you explain what the /training/processed/bin directory is for?
5. If I use that version of Fairseq-py (which uses PyTorch 0.2.0), do I need to compile and install PyTorch from source instead of installing via pip? And do any other parameters need to be changed?

shamilcm commented on July 19, 2024

Use interactive.py instead of generate.py to decode the test set if you are using the latest Fairseq-py version. I was saying that alternatively, you can use generate.py itself if you had used conll14st-test for --testpref while doing preprocessing. The reason, I believe, is that in the current Fairseq-py, generate.py automatically uses the test.src-trg.{src,trg}.{bin,idx} files within processed/bin directory to perform decoding. And interactive.py decodes any input file that is passed through standard input.

  1. The training/processed/bin directory contains the binarized and indexed versions of the training, development and test datasets for faster loading during training, validation and testing. Also, it contains the vocabulary files (dict.src.txt and dict.trg.txt).

  2. Yes, I had to compile PyTorch from source, since the Fairseq-py version that I used required the ATen library, which was only available in the GitHub version of PyTorch and not in the official release back then.
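As a side note on the dictionary files mentioned above: in fairseq's plain-text dictionary format, each line is a `<token> <count>` pair. A minimal sketch (the tokens and counts below are made up):

```shell
# Toy dictionary in fairseq's plain-text format:
# one "<token> <count>" pair per line (tokens/counts made up here).
cat > dict.src.txt <<'EOF'
the 104923
, 98765
. 97531
to 65432
EOF

# The vocabulary size is simply the number of lines.
wc -l < dict.src.txt
```

With --nwordssrc 30000 during preprocessing, the real dict.src.txt would contain about 30,000 such lines.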

shamilcm commented on July 19, 2024

Use the output.tok.txt file. We use the M2scorer, which is the standard scorer used for evaluating the CoNLL-2014 shared task systems. Note that evaluation on some sentences can take a long time with the standard scorer.
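For concreteness, a hedged sketch of this evaluation step (the m2scorer path below is an assumption; adjust it to wherever the scorer is installed). Since each sentence in an M2 gold file begins with an `S ` line, the sentence count can also be sanity-checked before scoring:

```shell
# Scoring (path is an assumption; m2scorer is invoked as
# "m2scorer [system output] [gold m2 file]"):
# python2 m2scorer/scripts/m2scorer.py output.tok.txt conll14st-test.m2

# Sanity check on a toy M2 file: every sentence starts with an "S " line.
cat > toy.m2 <<'EOF'
S This are a sentence .
A 1 2|||Vform|||is|||REQUIRED|||-NONE-|||0

S Another one .
EOF

grep -c '^S ' toy.m2   # number of gold sentences
```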

renhongkai commented on July 19, 2024

Thank you very much. I have encountered a new problem: the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file. The output.tok.txt file has 5,458 sentences, the same as the validation set. Can you help me?
I would be obliged if you could reply at your earliest convenience. Thanks a lot in advance for your time and attention.

renhongkai commented on July 19, 2024

I used the command:

./run.sh ./data/test/conll14st-test/conll14st-test.tok.src ./data/test/conll14st-test/output 0 ./training/models/mlconv/model1000

The relevant lines in run.sh are:

$SCRIPTS_DIR/apply_bpe.py -c $TRAINING_DIR/models/bpe_model/train.bpe.model < $input_file > $output_dir/input.bpe.txt

# running fairseq on the test data
CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/generate.py --no-progress-bar --path $models --beam $beam --nbest $beam --workers $threads $TRAINING_DIR/processed/bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt --skip-invalid-size-inputs-valid-test

shamilcm commented on July 19, 2024

The flag --interactive is necessary while running fairseq on a custom input test set.

CUDA_VISIBLE_DEVICES=$device python3.5 $FAIRSEQPY/generate.py --no-progress-bar --path $models --beam $beam --nbest $beam --interactive --workers $threads $MODEL_DIR/data_bin < $output_dir/input.bpe.txt > $output_dir/output.bpe.nbest.txt

renhongkai commented on July 19, 2024

Thanks a lot in advance for your time and attention. I have summed up the problems I encountered; I think this is a version problem.
First, I used the software directory's download.sh file to download fairseq-py (GitHub: https://github.com/shamilcm/fairseq-py), but when I ran the command "python setup.py build", I got an error: cffi.error.VerificationError: CompileError: command 'x86_64-linux-gnu-gcc' failed with exit status 1.
So I changed the fairseq-py version (GitHub: https://github.com/facebookresearch/fairseq-py.git), and this error did not appear.
But then I found that the parameters do not correspond. When I ran the command "./run.sh ./data/test/conll14st-test/conll14st-test.tok.src ./data/test/conll14st-test/output 0 ./training/models/mlconv/model1000", there were two errors. The first was generate.py: error: unrecognized arguments: --interactive, so I removed the --interactive flag.
The other error was Exception: Sample #10 has size (src=1, dst=1) but max size is 1022. Skip this example with --skip-invalid-size-inputs-valid-test, so I added the --skip-invalid-size-inputs-valid-test flag. The command then ran successfully, but the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file.
Can you help me? Thank you very much.
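A quick way to confirm the mismatch described above is to compare line counts of the source file and the decoder output; if --skip-invalid-size-inputs-valid-test silently drops sentences, or generate.py decodes the binarized test set instead of your input, the counts will differ. A minimal sketch, with toy files standing in for the real conll14st-test.tok.src and output.tok.txt:

```shell
# Toy stand-ins for the real files.
printf 'sent one\nsent two\nsent three\n' > conll14st-test.tok.src
printf 'corrected one\ncorrected two\n'   > output.tok.txt

src_n=$(wc -l < conll14st-test.tok.src)
out_n=$(wc -l < output.tok.txt)

# Report a mismatch between source and output sentence counts.
if [ "$src_n" -ne "$out_n" ]; then
  echo "Mismatch: $src_n source sentences vs $out_n output sentences"
fi
```

For a valid M2 evaluation the two counts must be equal, since the scorer aligns system output sentences to gold sentences by position.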

shamilcm commented on July 19, 2024

Oh, OK. The version of Fairseq-py in the download.sh script compiles only against an older version of PyTorch (0.2.0) that is built from source.

In the recent version of fairseq-py, the developers have replaced generate.py --interactive with a different script, interactive.py:

https://github.com/facebookresearch/fairseq-py/blob/master/interactive.py

renhongkai commented on July 19, 2024

1. So you mean I can use PyTorch 0.3.0 and remove the --interactive flag? How should I fix the mismatch between the number of sentences in the output.tok.txt file and the number of sentences in the conll14st-test.tok.src file?
2. I also tested with the pre-trained models. I ran the command "./run.sh ./data/test/conll14st-test/conll14st-test.m2 ./log/ 0 ./models/mlconv_embed/ eolm" and got the same error: the number of sentences in the output.tok.txt file differs from the number of sentences in the conll14st-test.tok.src file.
Thank you very much.

shamilcm commented on July 19, 2024
  1. If you use the recent version of Fairseq-py (which uses PyTorch 0.3.0), you should use the script interactive.py (https://github.com/facebookresearch/fairseq-py/blob/master/interactive.py) instead of generate.py.

  2. If you run run.sh with the recent version of Fairseq-py, and not the one mentioned in the download.sh script, you may encounter this error, because generate.py does not have the --interactive flag anymore. I believe it will use the test set within the processed/bin directory and not the one that is provided through standard input. In our training script, we pass the development data itself to the --testpref flag. See:

python3.5 $FAIRSEQPY/preprocess.py --source-lang src --target-lang trg --trainpref processed/train --validpref processed/dev --testpref processed/dev --nwordssrc 30000 --nwordstgt 30000 --destdir processed/bin

Btw, where did you obtain the 5,458-sentence development set from? Did you download and process the training data yourself?

NikhilCherian commented on July 19, 2024

@renhongkai @shamilcm

Hello again. I am also trying to test the models using run.sh, but I ran into the same problem. I want to get the M2 scores, which run.sh does not produce; its output is output.bpe.nbest.txt. How do I get those scores with the trained models?
I will follow the new fairseq implementation.

Any help is appreciated.
Thanks
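One way to get from output.bpe.nbest.txt to a 1-best file that M2scorer can read: in the old fairseq-py generate output, hypothesis lines start with H-<sentence id> followed by a tab-separated score and the BPE tokens, and BPE can be undone by deleting the "@@ " continuation markers. A minimal sketch on a toy file (the exact line format of your fairseq version may differ):

```shell
# Toy stand-in for output.bpe.nbest.txt in (old) fairseq-py generate
# format: "H-<id>\t<score>\t<tokens>" (S- lines are the source sentences).
printf 'S-0\tthis are a sent@@ ence .\n'        >  output.bpe.nbest.txt
printf 'H-0\t-0.41\tthis is a sent@@ ence .\n'  >> output.bpe.nbest.txt
printf 'H-0\t-0.97\tthis is an sent@@ ence .\n' >> output.bpe.nbest.txt
printf 'S-1\tan other one .\n'                  >> output.bpe.nbest.txt
printf 'H-1\t-0.25\tanother one .\n'            >> output.bpe.nbest.txt

# Keep the first (best-scoring) hypothesis per sentence id, restore
# sentence order, and delete the BPE continuation markers "@@ ".
grep '^H-' output.bpe.nbest.txt \
  | sort -t'-' -k2,2n -s \
  | awk -F'\t' '!seen[$1]++ { print $3 }' \
  | sed 's/@@ //g' > output.tok.txt

cat output.tok.txt
```

The repository's run.sh performs its own reformatting and reranking of the n-best file before producing output.tok.txt; the pipeline above only illustrates the idea for a plain 1-best extraction.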

YoonJeongLulu commented on July 19, 2024

Thank you for the wonderful source code.
I have a favor to ask of you.
The only GPU I can use is the Colab GPU, so I couldn't pretrain the model myself and wanted to use the pre-trained one.

https://tinyurl.com/yd6wvhgw/mlconvgec2018/models

Can I download test.src-trg.src.bin, test.src-trg.src.idx, etc., in addition to dict.src.txt, which is published at the link above?

I am referring to https://github.com/kanekomasahiro/bert-gec.
