fairseq-laser's Issues

A general question: Why not use the multilingual_translation task directly?

Hello Raymond:

Long time no see!

I have a general question: why not use the multilingual_translation task directly?
For example, would a setting like this have a similar effect, with a few modifications to the laser_lstm model?
--task multilingual_translation --arch laser \
--lang-pairs de-en,de-es,en-es,es-en,fr-en,fr-es \
--share-decoders --share-decoder-input-output-embed

I am asking because I also implemented a Transformer baseline with the settings below:
--task multilingual_translation --arch multilingual_transformer_iwslt_de_en \
--lang-pairs de-en,de-es,en-es,es-en,fr-en,fr-es \
--share-decoders --share-decoder-input-output-embed

What I want to achieve is to compare LASER's LSTM structure with the Transformer's attention-based structure. While doing this, I found that the only major difference between the laser task and the multilingual_translation task is that the former loads the data through a "multi_corpus_sampled_dataset" while the latter uses a "round_robin_zip_datasets" dataset, and that multilingual_translation provides a more general-purpose setting.

So would there be a performance difference between the round_robin_zip and multi_corpus_sampled methods in this task? I think the sampling is uniform, so in theory they should be roughly the same?
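
To make the question concrete, here is a toy sketch of how I understand the two batching strategies to differ (plain Python, not the actual fairseq classes; the corpora are made up):

import random
from collections import OrderedDict

# Hypothetical per-language-pair corpora (lists of examples).
corpora = OrderedDict([
    ("de-en", ["de-en ex1", "de-en ex2", "de-en ex3"]),
    ("fr-en", ["fr-en ex1", "fr-en ex2", "fr-en ex3"]),
])

def multi_corpus_sampled(corpora, num_samples):
    # Roughly how I understand MultiCorpusSampledDataset: for each item,
    # pick a language pair uniformly at random, then one example from it.
    for _ in range(num_samples):
        key = random.choice(list(corpora))
        yield key, random.choice(corpora[key])

def round_robin_zip(corpora):
    # Roughly how I understand RoundRobinZipDatasets: each item bundles one
    # example from every language pair in lock-step.
    for items in zip(*corpora.values()):
        yield dict(zip(corpora.keys(), items))

print(list(multi_corpus_sampled(corpora, 4)))
print(list(round_robin_zip(corpora)))

The round-robin version yields one example per pair at every index, while the sampled version draws a single pair per index; with uniform sampling, both see the pairs equally often in expectation.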

Why did you implement a separate LASER task instead of directly using the default multilingual_translation task provided by fairseq, sharing the encoder and decoder across all languages and using the same dictionary? Am I missing something?

Thank you very much!

I run into an error when loading a previously trained model.

Is this code still compatible with recent fairseq?
I can train the model with this code (but only with one GPU card; two cards cause an error).
I also run into an error when loading a previously trained model, so I cannot produce embeddings.

I also get a warning saying that FairseqMultiModel is deprecated; is that the cause of the error?

RuntimeWarning: overflow encountered in reduce ret = umr_sum(arr, axis, dtype, out, keepdims)

Hi:

When I run bucc.sh, there is no test file in the dataset, and the F score is 0.
Should I copy de-en.sample.en and rename it to de-en.test.en?

It also seems to encounter an overflow problem:

RuntimeWarning: overflow encountered in reduce ret = umr_sum(arr, axis, dtype, out, keepdims)

I trained the model with FP32, while the original script uses FP16 for training; is there any problem with that?
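
For what it's worth, I also checked the stored embeddings directly (a small sketch; I am assuming the .enc files are raw float32 vectors of dimension 1024, as the log below reports, and the path is just my local one):

import numpy as np

# Hypothetical local path to one of the embedding files produced by bucc.sh.
enc_path = "bucc_data/embed/bucc2018.de-en.train.enc.de"
dim = 1024  # embedding dimension reported in the log below (413869x1024)

# Assumption: the .enc file stores raw float32 vectors, one per sentence.
emb = np.fromfile(enc_path, dtype=np.float32).reshape(-1, dim)

print("shape:", emb.shape)
print("any NaN:", bool(np.isnan(emb).any()))
print("any Inf:", bool(np.isinf(emb).any()))
print("max |value|:", float(np.abs(emb).max()))
# Accumulating in float64 avoids the float32 overflow when checking the norms.
print("max L2 norm:", float(np.linalg.norm(emb.astype(np.float64), axis=1).max()))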

log:
#=========================================================================
#=========================================================================

  • extract from tar bucc2018-de-en.sample-gold.tar.bz2

  • extract from tar bucc2018-de-en.training-gold.tar.bz2
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.dev in en
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.dev in de
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train in en
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train in de
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.test in en
    cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.en: No such file or directory
    cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.en: No such file or directory
  • extract files /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.test in de
    cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.de: No such file or directory
    cat: /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.test.de: No such file or directory
    Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
    Read 677693430 words (40248 unique) from vocabulary file.
    Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
    Read 40000 codes from the codes file.
    Namespace(all_gather_list_size=16384, beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.de', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='de', target_lang='en', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
    | loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
    | Sentence buffer size: 2000
    Loading vocabulary from europarl_en_de_es_fr/bpe.40k/vocab ...
    Read 677693430 words (40248 unique) from vocabulary file.
    Loading codes from europarl_en_de_es_fr/bpe.40k/codes ...
    Read 40000 codes from the codes file.
    Namespace(all_gather_list_size=16384, beam=5, bpe=None, buffer_size=2000, cpu=False, criterion='cross_entropy', data='data-bin/europarl.de_en_es_fr.bpe40k/', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, diversity_rate=-1.0, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='-', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,de-es,en-es,es-en,fr-en,fr-es', left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='/4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.en', path='checkpoints/laser_lstm/checkpoint_last.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=False, source_lang='en', target_lang='es', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir='laser/', warmup_updates=0, weight_decay=0.0)
    | loading model(s) from checkpoints/laser_lstm/checkpoint_last.pt
    | Sentence buffer size: 2000
    LASER: tool to search, score or mine bitexts
  • knn will run on all available GPUs (recommended)
  • loading texts /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.txt.de: 413869 lines, 412909 unique
  • loading texts /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.txt.en: 399337 lines, 397151 unique
  • Embeddings: /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.de, 413869x1024
  • unify embeddings: 413869 -> 412909
  • Embeddings: /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.enc.en, 399337x1024
  • unify embeddings: 399337 -> 397151
  • perform 4-nn source against target
    /4tssd/wliax/anaconda3/lib/python3.7/site-packages/numpy/core/_methods.py:151: RuntimeWarning: overflow encountered in reduce
    ret = umr_sum(arr, axis, dtype, out, keepdims)
  • perform 4-nn target against source
  • mining for parallel data
  • scoring 412909 candidates
  • scoring 397151 candidates
  • writing alignments to /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.candidates.tsv
    LASER: tools for BUCC bitext mining
  • reading sentences and IDs
  • reading candidates /4tssd/wliax/research_2020/fairseq/bucc_data/embed/bucc2018.de-en.train.candidates.tsv
  • optimizing threshold on gold alignments /4tssd/wliax/research_2020/fairseq/bucc_data/bucc2018/de-en/de-en.training.gold
  • best threshold=0.000000: precision=0.00, recall=0.00, F1=0.00

Exception: Size of sample #3996 is invalid (={'fr-en': (1619, 0)}) since max_positions={'fr-en': (1024, 1024)},

Hi:
I am testing the embeddings on the MLDoc task and have been trying to connect them to the default MLDoc task in the LASER folder.

I inserted a new function into the LASER mldoc.py:
# These imports are assumed to be at the top of my modified mldoc.py.
from fairseq import options
import embed_liwei  # my local copy of the embedding script


def encode_file_lw(input_fn, output_fn, lang, buffer_size):
    print('enter encode_file_lw')

    parser = options.get_generation_parser(interactive=False)
    parser.add_argument('--buffer_size', type=int, required=True,
                        help='buffer size')
    parser.add_argument('--input', required=True,
                        help='input sentence file')
    parser.add_argument('--output-file', required=True,
                        help='output sentence embeddings')
    parser.add_argument('--spm-model',
                        help='(optional) path to SentencePiece model')

    data = '/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/data-bin/iwslt17.de_fr.en.bpe16k'
    path = '/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/checkpoints//laser_lstm5_newcodetest/checkpoint_best.pt'
    spm_model = '/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model'
    batch_size = '128'

    # Both French and German inputs are embedded against English.
    if lang in ('fr', 'de'):
        tar_lang = 'en'

    para_ls = [data,
               '--input', input_fn,
               '--task', 'translation_laser',
               '--lang-pairs', 'de-en,fr-en',
               '--path', path,
               '--source-lang', lang,
               '--target-lang', tar_lang,
               '--buffer_size', str(buffer_size),
               '--batch-size', batch_size,
               '--output-file', output_fn,
               '--spm-model', spm_model]

    args = options.parse_args_and_arch(parser, input_args=para_ls)
    embed_liwei.main(args)

    print('exit encode_file_lw')

I also use it to replace the BPE step and the embedding step in the original LASER processing pipeline:


print('\nProcessing:')

for part in ('train.1000', 'dev', 'test'):
    # for lang in "en" if part == 'train1000' else args.lang:
    for lang in args.lang:
        cfname = os.path.join(args.data_dir, 'mldoc.' + part)
        Token(cfname + '.txt.' + lang,
              cfname + '.tok.' + lang,
              lang=lang,
              romanize=(True if lang == 'el' else False),
              lower_case=True, gzip=False,
              verbose=args.verbose, over_write=False)
        SplitLines(cfname + '.tok.' + lang,
                   cfname + '.split.' + lang,
                   cfname + '.sid.' + lang)

        # The original BPEfastApply(...) and EncodeFile(...) calls are commented
        # out; encode_file_lw below replaces both the BPE step and the encoding
        # step (it is the wrapper around your embedding script cited above).
        # BPEfastApply(cfname + '.split.' + lang,
        #              cfname + '.split.bpe.' + lang,
        #              args.bpe_codes,
        #              verbose=args.verbose, over_write=False)
        # EncodeFile(enc,
        #            cfname + '.split.bpe.' + lang,
        #            cfname + '.split.enc.' + lang,
        #            verbose=args.verbose, over_write=False,
        #            buffer_size=args.buffer_size)

        encode_file_lw(input_fn=cfname + '.split.' + lang,
                       output_fn=cfname + '.split.enc.' + lang,
                       lang=lang,
                       buffer_size=args.buffer_size)

        JoinEmbed(cfname + '.split.enc.' + lang,
                  cfname + '.sid.' + lang,
                  cfname + '.enc.' + lang)

It seems to work fine when producing the English and German files, but I encountered this error when producing the French files.
Do you have an idea what could cause this?

BTW: I saw that you have updated the code and the README.
In the previous version I used, your embedding function seemed to take a plain text file and convert it into embeddings.

However, from what I read in your BUCC task, the current version of the code applies BPE first.
Does that mean that if I use the current version, I should also apply BPE first?
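
In case the answer is yes, my understanding is that applying the subword model ahead of time would look roughly like this (a sketch using the sentencepiece Python package with my local model path; I am not sure this matches your bucc.sh, which loads a codes/vocab pair instead):

import sentencepiece as spm

# My local SentencePiece model (the same one I pass to the embedding script).
sp = spm.SentencePieceProcessor()
sp.Load('/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model')

with open('embed/mldoc.train.1000.split.fr', encoding='utf-8') as fin, \
     open('embed/mldoc.train.1000.split.spm.fr', 'w', encoding='utf-8') as fout:
    for line in fin:
        # EncodeAsPieces returns the subword tokens; join them with spaces.
        fout.write(' '.join(sp.EncodeAsPieces(line.strip())) + '\n')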

Extracting MLDoc data
LASER: calculate embeddings for MLDoc
 - loading encoder /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/LASER_lw/models/bilstm.93langs.2018-12-26.pt

Processing:
 - SplitLines: embed/mldoc.train.1000.split.de already exists
enter encode_file_lw
Namespace(beam=5, bpe=None, buffer_size=4000, cpu=False, criterion='cross_entropy', data='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/data-bin/iwslt17.de_fr.en.bpe16k', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='embed/mldoc.train.1000.split.de', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,fr-en', lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='embed/mldoc.train.1000.split.enc.de', path='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/checkpoints//laser_lstm5_newcodetest/checkpoint_best.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=True, source_lang='de', spm_model='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model', target_lang='en', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, warmup_updates=0, weight_decay=0.0)
| [de] dictionary: 13880 types
| [en] dictionary: 13880 types
| [fr] dictionary: 13880 types
| loading model(s) from /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/checkpoints//laser_lstm5_newcodetest/checkpoint_best.pt
/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/fairseq/models/fairseq_model.py:280: UserWarning: FairseqModel is deprecated, please use FairseqEncoderDecoderModel or BaseFairseqModel instead
  for key in self.keys
| Sentence buffer size: 4000
| Reading input sentence from stdin
exit encode_file_lw
 - JoinEmbed: embed/mldoc.train.1000.enc.de already exists
 - SplitLines: embed/mldoc.train.1000.split.fr already exists
enter encode_file_lw
Namespace(beam=5, bpe=None, buffer_size=4000, cpu=False, criterion='cross_entropy', data='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/data-bin/iwslt17.de_fr.en.bpe16k', dataset_impl=None, decoder_langtok=False, decoding_format=None, diverse_beam_groups=-1, diverse_beam_strength=0.5, empty_cache_freq=0, encoder_langtok=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, gen_subset='test', input='embed/mldoc.train.1000.split.fr', iter_decode_eos_penalty=0.0, iter_decode_force_max_iter=False, iter_decode_max_iter=10, iter_decode_with_beam=1, iter_decode_with_external_reranker=False, lang_pairs='de-en,fr-en', lazy_load=False, left_pad_source='True', left_pad_target='False', lenpen=1, log_format=None, log_interval=1000, lr_scheduler='fixed', lr_shrink=0.1, match_source_len=False, max_len_a=0, max_len_b=200, max_sentences=128, max_source_positions=1024, max_target_positions=1024, max_tokens=None, memory_efficient_fp16=False, min_len=1, min_loss_scale=0.0001, model_overrides='{}', momentum=0.99, nbest=1, no_beamable_mm=False, no_early_stop=False, no_progress_bar=False, no_repeat_ngram_size=0, num_shards=1, num_workers=1, optimizer='nag', output_file='embed/mldoc.train.1000.split.enc.fr', path='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/checkpoints//laser_lstm5_newcodetest/checkpoint_best.pt', prefix_size=0, print_alignment=False, print_step=False, quiet=False, raw_text=False, remove_bpe=None, replace_unk=None, required_batch_size_multiple=8, results_path=None, retain_iter_history=False, sacrebleu=False, sampling=False, sampling_topk=-1, sampling_topp=-1.0, score_reference=False, seed=1, shard_id=0, skip_invalid_size_inputs_valid_test=True, source_lang='fr', spm_model='/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/examples/translation/iwslt17.de_fr.en.bpe16k/sentencepiece.bpe.model', target_lang='en', task='translation_laser', temperature=1.0, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, unkpen=0, unnormalized=False, upsample_primary=1, user_dir=None, warmup_updates=0, weight_decay=0.0)
| [de] dictionary: 13880 types
| [en] dictionary: 13880 types
| [fr] dictionary: 13880 types
| loading model(s) from /home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/checkpoints//laser_lstm5_newcodetest/checkpoint_best.pt
| Sentence buffer size: 4000
| Reading input sentence from stdin
Traceback (most recent call last):
  File "mldoc.py", line 166, in <module>
    encode_file_lw(input_fn=cfname + '.split.' + lang,output_fn= cfname + '.split.enc.' + lang,lang=lang,buffer_size=args.buffer_size)
  File "mldoc.py", line 92, in encode_file_lw
    embed_liwei.main(args)
  File "/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/embed_liwei.py", line 112, in main
    for batch in make_batches(inputs, args, task, max_positions, encode_fn):
  File "/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/embed_liwei.py", line 43, in make_batches
    max_positions=max_positions,
  File "/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/fairseq/tasks/fairseq_task.py", line 150, in get_batch_iterator
    indices, dataset, max_positions, raise_exception=(not ignore_invalid_inputs),
  File "/home/wei/LIWEI_workspace/fairseq_liweimod/fairseq/fairseq/data/data_utils.py", line 188, in filter_by_size
    ).format(ignored[0], dataset.size(ignored[0]), max_positions))
Exception: Size of sample #3996 is invalid (={'fr-en': (1619, 0)}) since max_positions={'fr-en': (1024, 1024)}, skip this example with --skip-invalid-size-inputs-valid-test
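
For reference, the workaround I am trying is to cut down over-long lines before calling encode_file_lw (a rough sketch with my assumptions: a 1000-token cutoff to stay under max_positions=1024, and my local file names; it counts whitespace tokens, which only approximates the subword length):

MAX_TOKENS = 1000  # keep a margin below the 1024-position limit

def truncate_long_lines(in_path, out_path, max_tokens=MAX_TOKENS):
    # Assumption: the failing fr-en sample is a split line that still exceeds
    # max_positions=1024 after subword segmentation, so cut it down beforehand.
    with open(in_path, encoding='utf-8') as fin, \
         open(out_path, 'w', encoding='utf-8') as fout:
        for line in fin:
            tokens = line.rstrip('\n').split()
            if len(tokens) > max_tokens:
                tokens = tokens[:max_tokens]  # drop the tail of over-long sentences
            fout.write(' '.join(tokens) + '\n')

truncate_long_lines('embed/mldoc.train.1000.split.fr',
                    'embed/mldoc.train.1000.split.trunc.fr')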


Transformer based Multilingual document Embedding model

Hi Raymondhs:
Long time no see!
I wrote a paper and used your code framework for the LASER baseline and utilities.

It is called "Transformer based Multilingual document Embedding model" and is intended for the upcoming EACL conference (deadline Sep. 20th); my manuscript is here:
https://arxiv.org/pdf/2008.08567.pdf

Is there any related publication of yours that I need to cite in the paper?

May I ask for your comments and advice on my paper? Since there is still about one month to go, do you have any suggestions for improving its chance of acceptance (e.g., are there important experiments that still need to be done)? I especially welcome negative comments that point out drawbacks and flaws, so I can fix them in time.

Of course, if you provide original ideas that change the content of the paper in a significant way, I can include you in the author list (this would need to be agreed by my co-author).

BTW: you can send an email to [email protected] with negative comments :), or I can email you if I may have your email address.

Regards!
Wei

ModuleNotFoundError: No module named 'laser'

I ran fairseq-train with the following command:

fairseq-train ../data/laser/data-bin/ \
  --max-epoch 7 \
  --ddp-backend=no_c10d \
  --task translation_laser --lang-pairs ne-en \
  --arch laser_lstm_artetxe \
  --encoder-num-layers 5 \
  --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
  --lr 0.001 --lr-scheduler fixed \
  --weight-decay 0.0 --criterion cross_entropy \
  --save-dir checkpoints/laser_lstm \
  --max-tokens 3584 \
  --update-freq 8 \
  --log-interval 50 \
  --user-dir ./fairseq-laser/laser

However, "No module named 'laser'" was displayed during training.

Could you please tell me how to fix this?

| distributed init (rank 0): tcp://localhost:16810
| distributed init (rank 1): tcp://localhost:16810
| initialized host wmt19_pcf as rank 1
| initialized host wmt19_pcf as rank 0
Namespace(adam_betas='(0.9, 0.98)', adam_eps=1e-08, arch='laser_lstm_artetxe', best_checkpoint_metric='loss', bpe=None, bucket_cap_mb=25, clip_norm=0.0, cpu=False, criterion='cross_entropy', curriculum=0, data='../data/laser/data-bin/', dataset_impl=None, ddp_backend='no_c10d', decoder_dropout=0.1, decoder_embed_dim=320, decoder_hidden_dim=2048, decoder_langtok=False, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method='tcp://localhost:16810', distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=2, encoder_dropout=0.1, encoder_embed_dim=320, encoder_hidden_dim=512, encoder_langtok=None, encoder_num_layers=5, find_unused_parameters=False, fix_batches_to_gpus=False, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_interval_updates=-1, keep_last_epochs=-1, lang_embed_dim=32, lang_embeddings=True, lang_pairs='ne-en', lazy_load=False, left_pad_source='True', left_pad_target='False', log_format=None, log_interval=50, lr=[0.001], lr_scheduler='fixed', lr_shrink=0.1, max_epoch=7, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=3584, max_tokens_valid=3584, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, no_epoch_checkpoints=False, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', raw_text=False, required_batch_size_multiple=8, reset_dataloader=False, reset_lr_scheduler=False, reset_meters=False, reset_optimizer=False, restore_file='checkpoint_last.pt', save_dir='checkpoints/laser_lstm', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, skip_invalid_size_inputs_valid_test=False, source_lang=None, target_lang=None, task='translation_laser', tbmf_wrapper=False, tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, train_subset='train', update_freq=[8], upsample_primary=1, use_bmuf=False, user_dir='./fairseq-laser/laser', valid_subset='valid', validate_interval=1, warmup_updates=0, weight_decay=0.0)
| [en] dictionary: 49880 types
| [ne] dictionary: 49880 types
| loaded 5457 examples from: ../data/laser/data-bin/valid.ne-en.ne
| loaded 5457 examples from: ../data/laser/data-bin/valid.ne-en.en
| ../data/laser/data-bin/ valid ne-en 5457 examples
/usr/local/lib/python3.6/dist-packages/fairseq/models/fairseq_model.py:275: UserWarning: FairseqModel is deprecated, please use FairseqEncoderDecoderModel or BaseFairseqModel instead
  for key in self.keys
/usr/local/lib/python3.6/dist-packages/fairseq/models/fairseq_model.py:275: UserWarning: FairseqModel is deprecated, please use FairseqEncoderDecoderModel or BaseFairseqModel instead
  for key in self.keys
LaserLSTMModel(
  (models): ModuleDict(
    (ne-en): FairseqModel(
      (encoder): LaserLSTMEncoder(
        (embed_tokens): Embedding(49880, 320, padding_idx=1)
        (lstm): LSTM(320, 512, num_layers=5, dropout=0.1, bidirectional=True)
      )
      (decoder): LaserLSTMDecoder(
        (embed_tokens): Embedding(49880, 320, padding_idx=1)
        (embed_lang): Embedding(3, 32, padding_idx=0)
        (lstm): LSTM(1376, 2048)
        (sentemb_hidden_proj): Linear(in_features=1024, out_features=2048, bias=True)
        (sentemb_cell_proj): Linear(in_features=1024, out_features=2048, bias=True)
        (fc_out): Linear(in_features=2048, out_features=49880, bias=True)
      )
    )
  )
)
| model laser_lstm_artetxe, criterion CrossEntropyCriterion
| num. model params: 195006264 (num. trained: 195006264)
| training on 2 GPUs
| max tokens per GPU = 3584 and max sentences per GPU = None
| no existing checkpoint found checkpoints/laser_lstm/checkpoint_last.pt
| loading train data for epoch 0
| loaded 11328074 examples from: ../data/laser/data-bin/train.ne-en.ne
| loaded 11328074 examples from: ../data/laser/data-bin/train.ne-en.en
| ../data/laser/data-bin/ train ne-en 11328074 examples
| epoch 001:   0%|                                     | 0/2577 [00:00<?, ?it/s]Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "/usr/lib/python3.6/multiprocessing/spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
ModuleNotFoundError: No module named 'laser'
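
For reference, the workaround I am currently trying is sketched below; my assumption (not confirmed) is that the per-GPU processes spawned for two-card training cannot import the --user-dir module 'laser' by name, so I put the directory containing the 'laser' package on PYTHONPATH before launching:

import os
import subprocess

env = dict(os.environ)
# ./fairseq-laser is the checkout that contains the 'laser' package.
env['PYTHONPATH'] = os.path.abspath('./fairseq-laser') + os.pathsep + env.get('PYTHONPATH', '')

subprocess.run(
    ['fairseq-train', '../data/laser/data-bin/',
     '--task', 'translation_laser', '--lang-pairs', 'ne-en',
     '--arch', 'laser_lstm_artetxe',
     '--user-dir', './fairseq-laser/laser'],
    env=env, check=True)

Alternatively, training on a single GPU avoids the multi-process spawn entirely.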
