jiaaoc commented on May 27, 2024

Are you using BART-large? One possible reason is that Facebook has updated the BPE, so there may be a mismatch between the initialized embedding matrix and the ID of our special separator token.
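A quick way to test the mismatch hypothesis is to check that the dictionary size matches the number of rows in the token-embedding matrix and that the separator token gets a valid ID. The sketch below assumes fairseq's dictionary layout: four special symbols (`<s>`, `<pad>`, `</s>`, `<unk>`) come first, which agrees with `padding_idx=1` in the model printout later in this thread, followed by one `token count` entry per line of `dict.txt`. The helper names and the separator string are hypothetical. In a real run, `embedding_rows` would come from the checkpoint's token-embedding shape (the log later in this thread prints `Embedding(50264, 1024, padding_idx=1)`).

```python
"""Check that a fairseq-style dictionary lines up with a token-embedding matrix.

Assumptions (hypothetical, not taken from the multi-view-seq2seq repo):
- dict.txt holds one "<token> <count>" entry per line;
- four special symbols are prepended, as in fairseq's Dictionary;
- the separator token string passed to check_alignment is a placeholder.
"""

SPECIALS = ["<s>", "<pad>", "</s>", "<unk>"]  # IDs 0..3; <pad> -> 1


def build_vocab(dict_lines, extra_symbols=()):
    """Map symbol -> ID: specials first, then dict.txt order, then any extras."""
    vocab = {sym: i for i, sym in enumerate(SPECIALS)}
    for line in dict_lines:
        line = line.strip()
        if not line:
            continue
        symbol = line.rsplit(" ", 1)[0]  # drop the trailing count column
        vocab.setdefault(symbol, len(vocab))
    for symbol in extra_symbols:
        vocab.setdefault(symbol, len(vocab))
    return vocab


def check_alignment(vocab, embedding_rows, separator):
    """Return a list of problems; an empty list means the two line up."""
    problems = []
    if len(vocab) != embedding_rows:
        problems.append(
            f"dictionary has {len(vocab)} symbols, embedding has {embedding_rows} rows"
        )
    sep_id = vocab.get(separator)
    if sep_id is None:
        problems.append(f"separator {separator!r} is not in the dictionary")
    elif sep_id >= embedding_rows:
        problems.append(f"separator ID {sep_id} is outside the embedding matrix")
    return problems


if __name__ == "__main__":
    # Toy data standing in for dict.txt; a real BART dictionary has ~50k entries.
    toy_vocab = build_vocab(["hello 10", "world 5"], extra_symbols=["<sep>"])
    for problem in check_alignment(toy_vocab, embedding_rows=6, separator="<sep>"):
        print(problem)
```

The toy run reports both failure modes at once: a separator appended after the last embedding row makes the dictionary one symbol too large and gives the separator an out-of-range ID.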

from multi-view-seq2seq.

jiaaoc commented on May 27, 2024

Also, it is unusual that you get the best performance after 16 epochs. In my previous runs, the best single-view/multi-view model was reached after 6 or 7 epochs.
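Which epoch ends up "best" also depends on early stopping. Both logs in this thread select checkpoints by validation loss (`best_checkpoint_metric='loss'`), but jiaaoc's BART-base log sets `patience=5` while negrinho's run uses `patience=-1`. The sketch below models patience-based stopping under an assumed rule (stop after `patience` consecutive validations without a strict improvement; `patience <= 0` disables it), based on our reading of fairseq's `--patience` rather than on code from this repo. The function name is hypothetical and the loss values are made up.

```python
def early_stopping(valid_losses, patience):
    """Return (best_epoch, last_epoch_run), both 1-indexed.

    Assumed semantics, modeled on our reading of fairseq's --patience: stop once
    the validation loss has not strictly improved for `patience` consecutive
    validations; patience <= 0 disables early stopping.
    """
    best_loss, best_epoch, runs_without_improvement = None, None, 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if best_loss is None or loss < best_loss:
            best_loss, best_epoch, runs_without_improvement = loss, epoch, 0
        else:
            runs_without_improvement += 1
            if patience > 0 and runs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, len(valid_losses)


if __name__ == "__main__":
    # Made-up validation losses, one per epoch (validate_interval=1).
    losses = [3.10, 2.70, 2.55, 2.48, 2.46, 2.47, 2.49, 2.52, 2.55, 2.60, 2.63]
    print(early_stopping(losses, patience=5))   # (5, 10)
    print(early_stopping(losses, patience=-1))  # (5, 11)
```

Under this rule the best epoch is the same either way; patience only controls how long training keeps running past it.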

jiaaoc commented on May 27, 2024

I think the code in this repo should be fine, since I have received emails from other people saying that they could replicate similar results.

negrinho commented on May 27, 2024

I haven't changed anything. I just cloned the repo and used Colab to run your experiments. Here is the link to the Colab notebook if you want to take a look (https://colab.research.google.com/drive/1tzmWGhSlnXBuBkYE2Llvzl0cS7k1KW-m?usp=sharing). You just have to upload your compressed data to your Google Drive folder, and you should be able to run the notebook right away. I have cleaned up the notebook a bit since then, but I definitely got those results the last time I ran the code in Colab.

negrinho commented on May 27, 2024

The learning rate seems very low. Can you post the output logs that result from running this code on your setup?

2020-11-06 17:46:21 | INFO | fairseq_cli.train | Namespace(T=1, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_large', attention_dropout=0.1, balance=False, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=16, decoder_embed_dim=1024, decoder_embed_path=None, decoder_ffn_embed_dim=4096, decoder_input_dim=1024, decoder_layerdrop=0, decoder_layers=12, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=1024, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=16, encoder_embed_dim=1024, encoder_embed_path=None, encoder_ffn_embed_dim=4096, encoder_layerdrop=0, encoder_layers=12, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format=None, log_interval=1000, lr=[3e-05], 
lr_scheduler='polynomial_decay', lr_weight=1, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=False, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=-1, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.large/model.pt', save_dir='checkpoints_stage', save_interval=1, save_interval_updates=0, seed=14632, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=5000, train_subset='train', truncate_source=True, update_freq=[32], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=200, weight_decay=0.01)
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | [source] dictionary: 50264 types
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | [target] dictionary: 50264 types
2020-11-06 17:46:21 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.source
2020-11-06 17:46:21 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.target
2020-11-06 17:46:21 | INFO | fairseq.tasks.translation | cnn_dm-bin valid source-target 818 examples
2020-11-06 17:46:31 | INFO | fairseq_cli.train | BARTModel(
  (encoder): TransformerEncoder(
    (embed_tokens): Embedding(50264, 1024, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (6): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (7): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (8): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (9): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (10): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (11): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
    (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(50264, 1024, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 1024, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (6): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (7): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (8): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (9): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (10): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
      (11): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (v_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (q_proj): Linear(in_features=1024, out_features=1024, bias=True)
          (out_proj): Linear(in_features=1024, out_features=1024, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=1024, out_features=4096, bias=True)
        (fc2): Linear(in_features=4096, out_features=1024, bias=True)
        (final_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      )
    )
    (layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
  (classification_heads): ModuleDict()
  (section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0)
  (section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (section): LSTM(1024, 1024)
  (w_proj_layer_norm): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  (w_proj): Linear(in_features=1024, out_features=1024, bias=True)
  (w_context_vector): Linear(in_features=1024, out_features=1, bias=False)
  (softmax): Softmax(dim=1)
)
2020-11-06 17:46:31 | INFO | fairseq_cli.train | model bart_large, criterion LabelSmoothedCrossEntropyCriterion
2020-11-06 17:46:31 | INFO | fairseq_cli.train | num. model params: 416791552 (num. trained: 416791552)
2020-11-06 17:46:38 | INFO | fairseq_cli.train | training on 1 GPUs
2020-11-06 17:46:38 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt
2020-11-06 17:46:38 | INFO | fairseq.trainer | loading train data for epoch 0
2020-11-06 17:46:38 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source
2020-11-06 17:46:38 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.target
2020-11-06 17:46:38 | INFO | fairseq.tasks.translation | cnn_dm-bin train source-target 14731 examples
2020-11-06 17:46:38 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269]
group1: 
511
group2: 
12
2020-11-06 17:46:38 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
False
epoch 001:  40% 37/93 [06:54<10:44, 11.52s/it, loss=14.612, nll_loss=14.464, ppl=22602, wps=377.8, ups=0.09, wpb=4223.9, bsz=156.7, num_updates=37, lr=5.55e-06, gnorm=4.996, clip=100, oom=0, train_wall=410, wall=415] 
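The low learning rate is what linear warmup would produce at this point: the progress bar is at update 37, `warmup_updates=200`, and the peak `lr` is `3e-05`, so 3e-05 * 37 / 200 = 5.55e-06, which is exactly the logged `lr=5.55e-06`. The sketch below re-implements, as we understand it, the warmup-then-polynomial-decay rule behind fairseq's `polynomial_decay` scheduler. It is not imported from fairseq, and the function name is our own.

```python
def polynomial_decay_lr(num_updates, peak_lr, warmup_updates, total_num_update,
                        end_lr=0.0, power=1.0):
    """Learning rate after `num_updates` optimizer updates.

    Linear warmup from 0 to peak_lr over `warmup_updates`, then polynomial decay
    to end_lr at `total_num_update` (our re-implementation of the rule we believe
    fairseq's polynomial_decay scheduler uses).
    """
    if warmup_updates > 0 and num_updates <= warmup_updates:
        return peak_lr * num_updates / warmup_updates
    if num_updates >= total_num_update:
        return end_lr
    pct_remaining = 1 - (num_updates - warmup_updates) / (total_num_update - warmup_updates)
    return (peak_lr - end_lr) * pct_remaining ** power + end_lr


if __name__ == "__main__":
    # Settings from the Namespace above: lr=[3e-05], warmup_updates=200,
    # total_num_update=5000, end_learning_rate=0.0, power=1.0.
    lr = polynomial_decay_lr(37, 3e-05, 200, 5000)
    print(f"{lr:.2e}")  # 5.55e-06, the lr shown at num_updates=37
    # With 93 updates per epoch (the "37/93" in the progress bar),
    # warmup ends partway through epoch 3, at update 200.
    for update in (93, 186, 200, 279):
        print(update, f"{polynomial_decay_lr(update, 3e-05, 200, 5000):.3e}")
```

By the same formula, jiaaoc's BART-base settings (`warmup_updates=120`, `total_num_update=2000`) reach the peak learning rate sooner, after 120 updates.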

jiaaoc commented on May 27, 2024

Hi, this is an example log from training multi-view BART-base:

2020-10-16 20:22:37 | INFO | fairseq_cli.train | Namespace(T=0.2, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_base', attention_dropout=0.1, balance=True, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=0.1, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='cnn_dm-bin_2', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=12, decoder_embed_dim=768, decoder_embed_path=None, decoder_ffn_embed_dim=3072, decoder_input_dim=768, decoder_layerdrop=0, decoder_layers=6, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=768, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.1, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, end_learning_rate=0.0, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, force_anneal=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format='json', log_interval=1000, lr=[3e-05], lr_scheduler='polynomial_decay', 
lr_weight=500.0, max_epoch=0, max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=True, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=5, pooler_activation_fn='tanh', pooler_dropout=0.0, power=1.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.base/model.pt', save_dir='checkpoints_multi_base_1', save_interval=1, save_interval_updates=0, seed=1, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, total_num_update=2000, train_subset='train', truncate_source=True, update_freq=[16], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, warmup_updates=120, weight_decay=0.01)
2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | [source] dictionary: 51200 types
2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | [target] dictionary: 51200 types
2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin_2/valid.source-target.source
2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin/valid.source-target.source
2020-10-16 20:22:37 | INFO | fairseq.data.data_utils | loaded 818 examples from: cnn_dm-bin_2/valid.source-target.target
2020-10-16 20:22:37 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 valid source-target 818 examples
!!! 818 818
2020-10-16 20:22:40 | INFO | fairseq_cli.train | BARTModel(
(encoder): TransformerEncoder(
(embed_tokens): Embedding(51200, 768, padding_idx=1)
(embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1)
(layers): ModuleList(
(0): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerEncoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(decoder): TransformerDecoder(
(embed_tokens): Embedding(51200, 768, padding_idx=1)
(embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1)
(layers): ModuleList(
(0): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(1): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(2): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(3): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(4): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(5): TransformerDecoderLayer(
(self_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(encoder_attn): MultiheadAttention(
(k_proj): Linear(in_features=768, out_features=768, bias=True)
(v_proj): Linear(in_features=768, out_features=768, bias=True)
(q_proj): Linear(in_features=768, out_features=768, bias=True)
(out_proj): Linear(in_features=768, out_features=768, bias=True)
)
(encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(fc1): Linear(in_features=768, out_features=3072, bias=True)
(fc2): Linear(in_features=3072, out_features=768, bias=True)
(final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
)
(layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
)
(classification_heads): ModuleDict()
(section_positions): LearnedPositionalEmbedding(1025, 1024, padding_idx=0)
(section_layernorm_embedding): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
(section): LSTM(768, 768)
(w_proj_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
(w_proj): Linear(in_features=768, out_features=768, bias=True)
(w_context_vector): Linear(in_features=768, out_features=1, bias=False)
(softmax): Softmax(dim=1)
)
2020-10-16 20:22:40 | INFO | fairseq_cli.train | model bart_base, criterion LabelSmoothedCrossEntropyCriterion
2020-10-16 20:22:40 | INFO | fairseq_cli.train | num. model params: 146507776 (num. trained: 146507776)
2020-10-16 20:22:43 | INFO | fairseq_cli.train | training on 1 GPUs
2020-10-16 20:22:43 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
2020-10-16 20:22:43 | INFO | fairseq.trainer | loaded checkpoint ./bart.base/model.pt (epoch 14 @ 0 updates)
group1:
259
group2:
12
2020-10-16 20:22:43 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
2020-10-16 20:22:43 | INFO | fairseq.trainer | loading train data for epoch 0
2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.source
2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin/train.source-target.source
2020-10-16 20:22:43 | INFO | fairseq.data.data_utils | loaded 14731 examples from: cnn_dm-bin_2/train.source-target.target
2020-10-16 20:22:43 | INFO | fairseq.tasks.translation | cnn_dm-bin_2 train source-target 14731 examples
!!! 14731 14731
2020-10-16 20:22:43 | WARNING | fairseq.data.data_utils | 6 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269, 8197]
True
2020-10-16 20:28:05 | INFO | train | {"epoch": 1, "train_loss": "5.334", "train_nll_loss": "3.491", "train_ppl": "11.247", "train_wps": "1206.4", "train_ups": "0.59", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "189", "train_lr": "2.88989e-05", "train_gnorm": "6.384", "train_clip": "100", "train_oom": "0", "train_train_wall": "303", "train_wall": "323"}
/pytorch/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha)
2020-10-16 20:28:11 | INFO | valid | {"epoch": 1, "valid_loss": "4.494", "valid_nll_loss": "2.632", "valid_ppl": "6.201", "valid_wps": "3638.8", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "189"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.39893327494573744, 'p': 0.48739021531416354, 'r': 0.3672381425752768}, 'rouge-2': {'f': 0.19168286247403196, 'p': 0.23579704030498724, 'r': 0.1772675131514576}, 'rouge-l': {'f': 0.38773004473056544, 'p': 0.4650643030400437, 'r': 0.3571665562085555}}
2020-10-16 20:29:16 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 1 @ 189 updates, score 4.494) (writing took 2.720979069825262 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3881301009025109, 'p': 0.47039422544482545, 'r': 0.3606226446800223}, 'rouge-2': {'f': 0.17881792205695904, 'p': 0.21852663998652969, 'r': 0.16731151894505894}, 'rouge-l': {'f': 0.3800338863639725, 'p': 0.4518477819676159, 'r': 0.35300024402391994}}
/pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
2020-10-16 20:35:49 | INFO | train | {"epoch": 2, "train_loss": "4.432", "train_nll_loss": "2.625", "train_ppl": "6.168", "train_wps": "835.9", "train_ups": "0.41", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "378", "train_lr": "2.5883e-05", "train_gnorm": "2.332", "train_clip": "100", "train_oom": "0", "train_train_wall": "313", "train_wall": "786"}
2020-10-16 20:35:54 | INFO | valid | {"epoch": 2, "valid_loss": "4.322", "valid_nll_loss": "2.492", "valid_ppl": "5.627", "valid_wps": "3825.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "378", "valid_best_loss": "4.322"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4279867966292195, 'p': 0.4571306118661938, 'r': 0.44022592801096416}, 'rouge-2': {'f': 0.21371075541532447, 'p': 0.2285121015700478, 'r': 0.22154388398488878}, 'rouge-l': {'f': 0.4196784994354386, 'p': 0.4456033541138049, 'r': 0.4278439895292393}}
2020-10-16 20:37:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 2 @ 378 updates, score 4.322) (writing took 11.695231701247394 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4135366281231374, 'p': 0.444507361098039, 'r': 0.423101539955033}, 'rouge-2': {'f': 0.19432386047889444, 'p': 0.21068450242099074, 'r': 0.19930810921791703}, 'rouge-l': {'f': 0.4059372042257342, 'p': 0.43402206622848205, 'r': 0.41056085492657124}}
2020-10-16 20:44:23 | INFO | train | {"epoch": 3, "train_loss": "4.19", "train_nll_loss": "2.368", "train_ppl": "5.162", "train_wps": "753.5", "train_ups": "0.37", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "567", "train_lr": "2.2867e-05", "train_gnorm": "2.312", "train_clip": "100", "train_oom": "0", "train_train_wall": "323", "train_wall": "1300"}
2020-10-16 20:44:28 | INFO | valid | {"epoch": 3, "valid_loss": "4.242", "valid_nll_loss": "2.397", "valid_ppl": "5.266", "valid_wps": "3877.8", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "567", "valid_best_loss": "4.242"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4315874956151277, 'p': 0.48791636226347873, 'r': 0.41966804049154866}, 'rouge-2': {'f': 0.2188827333313949, 'p': 0.2480313270750059, 'r': 0.21386282199141377}, 'rouge-l': {'f': 0.41806919758048416, 'p': 0.4660477028142457, 'r': 0.40645590435600293}}
2020-10-16 20:45:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 3 @ 567 updates, score 4.242) (writing took 8.149096994195133 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4138701252407696, 'p': 0.4678226724582506, 'r': 0.4065260752587133}, 'rouge-2': {'f': 0.19377017502951563, 'p': 0.22115075342729995, 'r': 0.1904202394946015}, 'rouge-l': {'f': 0.4010310496097956, 'p': 0.44619094434249496, 'r': 0.3929682749938597}}
2020-10-16 20:52:16 | INFO | train | {"epoch": 4, "train_loss": "4.009", "train_nll_loss": "2.171", "train_ppl": "4.505", "train_wps": "818.8", "train_ups": "0.4", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "756", "train_lr": "1.98511e-05", "train_gnorm": "2.142", "train_clip": "100", "train_oom": "0", "train_train_wall": "298", "train_wall": "1773"}
2020-10-16 20:52:21 | INFO | valid | {"epoch": 4, "valid_loss": "4.192", "valid_nll_loss": "2.352", "valid_ppl": "5.104", "valid_wps": "3870.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "756", "valid_best_loss": "4.192"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.43669851905346935, 'p': 0.47079025587829726, 'r': 0.4442491635654981}, 'rouge-2': {'f': 0.21691059357860815, 'p': 0.23458404727614512, 'r': 0.220700188219085}, 'rouge-l': {'f': 0.424007418421588, 'p': 0.4522198002073135, 'r': 0.4289625411274865}}
2020-10-16 20:53:51 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 4 @ 756 updates, score 4.192) (writing took 9.192486869171262 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4225243192276242, 'p': 0.45223754352590134, 'r': 0.43401393897009366}, 'rouge-2': {'f': 0.19391878994057452, 'p': 0.20822517840877766, 'r': 0.1994167266777721}, 'rouge-l': {'f': 0.4080676812912881, 'p': 0.43246768831954485, 'r': 0.4165620181146185}}
2020-10-16 21:00:31 | INFO | train | {"epoch": 5, "train_loss": "3.887", "train_nll_loss": "2.039", "train_ppl": "4.11", "train_wps": "781.9", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "945", "train_lr": "1.68351e-05", "train_gnorm": "2.074", "train_clip": "100", "train_oom": "0", "train_train_wall": "303", "train_wall": "2269"}
2020-10-16 21:00:37 | INFO | valid | {"epoch": 5, "valid_loss": "4.186", "valid_nll_loss": "2.342", "valid_ppl": "5.071", "valid_wps": "3877.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "945", "valid_best_loss": "4.186"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4461844243274079, 'p': 0.45666747672161484, 'r': 0.4767132459495706}, 'rouge-2': {'f': 0.22200930553762793, 'p': 0.22706208545278755, 'r': 0.23842818519947412}, 'rouge-l': {'f': 0.43306061447923366, 'p': 0.44238726551550944, 'r': 0.4563430992482793}}
2020-10-16 21:02:12 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 5 @ 945 updates, score 4.186) (writing took 11.725180207751691 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4333094974371008, 'p': 0.4428674781714452, 'r': 0.46658344288666276}, 'rouge-2': {'f': 0.20111995145025907, 'p': 0.20625643433873106, 'r': 0.217710295965629}, 'rouge-l': {'f': 0.42099773909932536, 'p': 0.42971386878603307, 'r': 0.44643271326223904}}
2020-10-16 21:08:55 | INFO | train | {"epoch": 6, "train_loss": "3.787", "train_nll_loss": "1.93", "train_ppl": "3.81", "train_wps": "769.1", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1134", "train_lr": "1.38191e-05", "train_gnorm": "2.034", "train_clip": "100", "train_oom": "0", "train_train_wall": "302", "train_wall": "2772"}
2020-10-16 21:09:00 | INFO | valid | {"epoch": 6, "valid_loss": "4.18", "valid_nll_loss": "2.343", "valid_ppl": "5.075", "valid_wps": "3875.3", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1134", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.44834014142761747, 'p': 0.47490514105110054, 'r': 0.46418072489161993}, 'rouge-2': {'f': 0.22448318205207965, 'p': 0.23786780200771554, 'r': 0.23428752100684014}, 'rouge-l': {'f': 0.431713338823393, 'p': 0.4549432489377001, 'r': 0.44269410749902055}}
2020-10-16 21:10:27 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_best.pt (epoch 6 @ 1134 updates, score 4.18) (writing took 7.535578944720328 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4353075633115118, 'p': 0.45960422226275544, 'r': 0.4524936710724383}, 'rouge-2': {'f': 0.20453167333689543, 'p': 0.21761783799140255, 'r': 0.21275645602953855}, 'rouge-l': {'f': 0.419203755880583, 'p': 0.43906384085100353, 'r': 0.43266115281125556}}
2020-10-16 21:17:11 | INFO | train | {"epoch": 7, "train_loss": "3.715", "train_nll_loss": "1.85", "train_ppl": "3.604", "train_wps": "781.3", "train_ups": "0.38", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1323", "train_lr": "1.08032e-05", "train_gnorm": "2.042", "train_clip": "100", "train_oom": "0", "train_train_wall": "306", "train_wall": "3268"}
2020-10-16 21:17:16 | INFO | valid | {"epoch": 7, "valid_loss": "4.181", "valid_nll_loss": "2.345", "valid_ppl": "5.081", "valid_wps": "3853.4", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1323", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4481947505100136, 'p': 0.46780376177094585, 'r': 0.4699430608678166}, 'rouge-2': {'f': 0.22568330262401542, 'p': 0.23667032669984672, 'r': 0.2375391979501824}, 'rouge-l': {'f': 0.43290810524697976, 'p': 0.4484189183310228, 'r': 0.45029655273945113}}
2020-10-16 21:18:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 7 @ 1323 updates, score 4.181) (writing took 3.958364794962108 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42504939222262134, 'p': 0.4437278759292623, 'r': 0.44713921113126553}, 'rouge-2': {'f': 0.19796039505355403, 'p': 0.2078876050553927, 'r': 0.20811213382174953}, 'rouge-l': {'f': 0.41333885103722146, 'p': 0.42888192585788004, 'r': 0.4307527462400245}}
2020-10-16 21:25:20 | INFO | train | {"epoch": 8, "train_loss": "3.655", "train_nll_loss": "1.783", "train_ppl": "3.442", "train_wps": "791.5", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1512", "train_lr": "7.78723e-06", "train_gnorm": "2.018", "train_clip": "100", "train_oom": "0", "train_train_wall": "296", "train_wall": "3757"}
2020-10-16 21:25:25 | INFO | valid | {"epoch": 8, "valid_loss": "4.188", "valid_nll_loss": "2.358", "valid_ppl": "5.126", "valid_wps": "3883.7", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1512", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4497831590533799, 'p': 0.4743879319512072, 'r': 0.46696211017029543}, 'rouge-2': {'f': 0.22560126332234826, 'p': 0.23866609130364316, 'r': 0.23526072930189967}, 'rouge-l': {'f': 0.4331148209032409, 'p': 0.4523118415115414, 'r': 0.4466045581005789}}
2020-10-16 21:26:54 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 8 @ 1512 updates, score 4.188) (writing took 6.4392871032468975 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42990064221283536, 'p': 0.4527699793144171, 'r': 0.44901333807162896}, 'rouge-2': {'f': 0.20288890059810166, 'p': 0.21534024009359243, 'r': 0.21117838919893522}, 'rouge-l': {'f': 0.41598493880199483, 'p': 0.4349848431222833, 'r': 0.4314877150138075}}
2020-10-16 21:33:30 | INFO | train | {"epoch": 9, "train_loss": "3.608", "train_nll_loss": "1.731", "train_ppl": "3.32", "train_wps": "789.6", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1701", "train_lr": "4.77128e-06", "train_gnorm": "1.996", "train_clip": "100", "train_oom": "0", "train_train_wall": "298", "train_wall": "4248"}
2020-10-16 21:33:36 | INFO | valid | {"epoch": 9, "valid_loss": "4.19", "valid_nll_loss": "2.359", "valid_ppl": "5.129", "valid_wps": "3839.3", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1701", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4469569361973397, 'p': 0.4798210784938584, 'r': 0.45528237283388123}, 'rouge-2': {'f': 0.2263709493623402, 'p': 0.2433974733721229, 'r': 0.23132458588703095}, 'rouge-l': {'f': 0.4316170975348626, 'p': 0.45903993291697, 'r': 0.43719317336834507}}
2020-10-16 21:34:59 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 9 @ 1701 updates, score 4.19) (writing took 3.861534607131034 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42781119671981177, 'p': 0.46124989848989056, 'r': 0.436261581761792}, 'rouge-2': {'f': 0.20098620993180524, 'p': 0.21921218562638198, 'r': 0.20423255976966606}, 'rouge-l': {'f': 0.41332683231640294, 'p': 0.44168964220540274, 'r': 0.4191730895606922}}
2020-10-16 21:41:41 | INFO | train | {"epoch": 10, "train_loss": "3.575", "train_nll_loss": "1.693", "train_ppl": "3.233", "train_wps": "789.3", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "1890", "train_lr": "1.75532e-06", "train_gnorm": "1.981", "train_clip": "100", "train_oom": "0", "train_train_wall": "307", "train_wall": "4739"}
2020-10-16 21:41:47 | INFO | valid | {"epoch": 10, "valid_loss": "4.203", "valid_nll_loss": "2.369", "valid_ppl": "5.167", "valid_wps": "3792", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "1890", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.4489107167227575, 'p': 0.472013059581364, 'r': 0.46714979898744746}, 'rouge-2': {'f': 0.22538691538484998, 'p': 0.23704805111264826, 'r': 0.23613808765906907}, 'rouge-l': {'f': 0.4316388762633301, 'p': 0.45038233458541693, 'r': 0.44596016494663443}}
2020-10-16 21:43:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 10 @ 1890 updates, score 4.203) (writing took 3.90104684792459 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.4306987172598409, 'p': 0.4531877008856705, 'r': 0.4487099330991109}, 'rouge-2': {'f': 0.2012845838629506, 'p': 0.21321799010752718, 'r': 0.20959143334202265}, 'rouge-l': {'f': 0.41571496654100026, 'p': 0.43409465797013363, 'r': 0.4300684739298519}}
2020-10-16 21:50:12 | INFO | train | {"epoch": 11, "train_loss": "3.556", "train_nll_loss": "1.671", "train_ppl": "3.185", "train_wps": "758.1", "train_ups": "0.37", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "2079", "train_lr": "0", "train_gnorm": "1.957", "train_clip": "100", "train_oom": "0", "train_train_wall": "316", "train_wall": "5249"}
2020-10-16 21:50:18 | INFO | valid | {"epoch": 11, "valid_loss": "4.203", "valid_nll_loss": "2.371", "valid_ppl": "5.172", "valid_wps": "3874.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "2079", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.446642502255209, 'p': 0.4723302484715944, 'r': 0.46238440805767694}, 'rouge-2': {'f': 0.22346453229760987, 'p': 0.2373075042245457, 'r': 0.23220979841378647}, 'rouge-l': {'f': 0.4295766026698361, 'p': 0.4502630944373418, 'r': 0.4416669748596527}}
2020-10-16 21:51:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 11 @ 2079 updates, score 4.203) (writing took 4.998790017794818 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42976349488980414, 'p': 0.4532897733258767, 'r': 0.44788645672433053}, 'rouge-2': {'f': 0.199710011641502, 'p': 0.21187374900537204, 'r': 0.20828099794102461}, 'rouge-l': {'f': 0.4141090520180718, 'p': 0.43257857065512195, 'r': 0.42896840192756164}}
2020-10-16 21:58:20 | INFO | train | {"epoch": 12, "train_loss": "3.55", "train_nll_loss": "1.665", "train_ppl": "3.172", "train_wps": "793.6", "train_ups": "0.39", "train_wpb": "2049.3", "train_bsz": "77.9", "train_num_updates": "2268", "train_lr": "0", "train_gnorm": "1.939", "train_clip": "100", "train_oom": "0", "train_train_wall": "300", "train_wall": "5738"}
2020-10-16 21:58:26 | INFO | valid | {"epoch": 12, "valid_loss": "4.203", "valid_nll_loss": "2.371", "valid_ppl": "5.172", "valid_wps": "3878.2", "valid_wpb": "130.4", "valid_bsz": "5", "valid_num_updates": "2268", "valid_best_loss": "4.18"}
here bpe NONE
here!
Test on val set:
Val {'rouge-1': {'f': 0.446642502255209, 'p': 0.4723302484715944, 'r': 0.46238440805767694}, 'rouge-2': {'f': 0.22346453229760987, 'p': 0.2373075042245457, 'r': 0.23220979841378647}, 'rouge-l': {'f': 0.4295766026698361, 'p': 0.4502630944373418, 'r': 0.4416669748596527}}
2020-10-16 21:59:52 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_multi_base_1/checkpoint_last.pt (epoch 12 @ 2268 updates, score 4.203) (writing took 4.920202174689621 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.42976349488980414, 'p': 0.4532897733258767, 'r': 0.44788645672433053}, 'rouge-2': {'f': 0.199710011641502, 'p': 0.21187374900537204, 'r': 0.20828099794102461}, 'rouge-l': {'f': 0.4141090520180718, 'p': 0.43257857065512195, 'r': 0.42896840192756164}}
2020-10-16 22:01:12 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 5 runs
2020-10-16 22:01:12 | INFO | fairseq_cli.train | done training in 5908.6 seconds
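For reference, the `train_lr` values in the log above can be reproduced from the logged settings (`lr=[3e-05]`, `warmup_updates=120`, `total_num_update=2000`, `end_learning_rate=0.0`, `power=1.0`). The sketch below re-implements what I understand to be the formula behind fairseq's `polynomial_decay` scheduler. It is an illustration, not fairseq's own code. At 189 updates per epoch, the scheduled learning rate reaches 0 partway through epoch 11. That is consistent with epoch 12 reporting exactly the same validation and test scores as epoch 11.

```python
def polynomial_decay_lr(num_updates: int,
                        lr: float = 3e-5,
                        warmup_updates: int = 120,
                        total_num_update: int = 2000,
                        end_learning_rate: float = 0.0,
                        power: float = 1.0) -> float:
    """Learning rate after `num_updates` optimizer steps.

    Linear warmup to `lr`, then polynomial decay to `end_learning_rate`
    at `total_num_update`. Defaults are the values from the Namespace line
    of the log above.
    """
    if warmup_updates > 0 and num_updates <= warmup_updates:
        # Warmup: ramp linearly from 0 up to the peak learning rate.
        return lr * num_updates / warmup_updates
    if num_updates >= total_num_update:
        # Schedule exhausted: every later update uses end_learning_rate (0 here).
        return end_learning_rate
    # Fraction of the decay phase that is still left.
    pct_remaining = 1 - (num_updates - warmup_updates) / (total_num_update - warmup_updates)
    return (lr - end_learning_rate) * pct_remaining ** power + end_learning_rate


if __name__ == "__main__":
    # train_num_updates grows by 189 per epoch in the log (189, 378, 567, ...).
    updates_per_epoch = 189
    total_num_update = 2000
    for epoch in range(1, 13):
        n = epoch * updates_per_epoch
        print(f"epoch {epoch:2d}  updates {n:4d}  lr {polynomial_decay_lr(n):.5e}")
    # First epoch during which the scheduled learning rate hits 0.
    print("lr reaches 0 during epoch", total_num_update // updates_per_epoch + 1)
```

With `power=1.0` the decay is simply linear. So the total of 2000 updates, rather than the peak learning rate, is what limits training here: everything after update 2000 is effectively frozen.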

jiaaoc commented on May 27, 2024

Since I don't have access to the P100 machines, I am testing train_single_view.sh with max_length = 500. I will post the training log here later.

negrinho commented on May 27, 2024

Thanks for looking into this. I think you may be right that there is some discrepancy between the tokenizations, since I get much lower results. The preprocessed files may no longer match the library versions that Colab is pulling. If you get a chance to run the Colab notebook, that would be great. In the meantime, I will preprocess the data again and see whether the results change.
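The suspected problem is a mismatch between the dictionary used to binarize the data and the one shipped with the BART checkpoint. A quick check before re-preprocessing is to diff the two fairseq dictionary files. The sketch below makes a few assumptions. It expects the plain fairseq dictionary format, with one `<symbol> <count>` entry per line. The example paths are hypothetical. It compares only the order of the symbols, not the BPE encoder itself.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Optional, Tuple


def parse_dict_lines(lines: Iterable[str]) -> List[str]:
    """Extract symbols from fairseq-style dictionary lines ("<symbol> <count>").

    Assumes symbols contain no whitespace, which holds for GPT-2 BPE ids such
    as "13". Blank lines are ignored; counts and any trailing flags are dropped.
    """
    return [line.strip().split()[0] for line in lines if line.strip()]


def read_dict(path: str) -> List[str]:
    return parse_dict_lines(Path(path).read_text(encoding="utf-8").splitlines())


def first_mismatch(a: List[str], b: List[str]) -> Optional[Tuple[int, str, str]]:
    """Return (line_index, symbol_in_a, symbol_in_b) for the first difference, or None."""
    for i, (sa, sb) in enumerate(zip(a, b)):
        if sa != sb:
            return i, sa, sb
    if len(a) != len(b):
        # One dictionary is a strict prefix of the other.
        i = min(len(a), len(b))
        return i, (a[i] if i < len(a) else "<missing>"), (b[i] if i < len(b) else "<missing>")
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python compare_dicts.py bart.base/dict.txt cnn_dm-bin/dict.source.txt
    diff = first_mismatch(read_dict(sys.argv[1]), read_dict(sys.argv[2]))
    if diff is None:
        print("dictionaries match")
    else:
        i, sa, sb = diff
        # fairseq normally places 4 special symbols (<s>, <pad>, </s>, <unk>) first,
        # so file line i usually corresponds to token id i + 4.
        print(f"first mismatch at file line {i} (token id ~{i + 4}): {sa!r} vs {sb!r}")
```

If the files match, the separator-token mismatch discussed earlier is less likely, and the difference probably lies elsewhere, such as in the BPE step that produced the text files.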

jiaaoc commented on May 27, 2024

Yes, I have also tried training from scratch without BART initialization, and the results were better than what you observed.

jiaaoc commented on May 27, 2024

This is the log for single-view training with a BART-base encoder and a randomly initialized decoder:

2020-09-12 23:48:39 | INFO | fairseq_cli.train | Namespace(T=1, activation_fn='gelu', adam_betas='(0.9, 0.999)', adam_eps=1e-08, adaptive_softmax_cutoff=None, adaptive_softmax_dropout=0, all_gather_list_size=16384, arch='bart_encoder_base', attention_dropout=0.2, balance=False, best_checkpoint_metric='loss', bpe=None, broadcast_buffers=False, bucket_cap_mb=25, clip_norm=5.0, cpu=False, criterion='label_smoothed_cross_entropy', cross_self_attention=False, curriculum=0, data='data_none', dataset_impl=None, ddp_backend='no_c10d', decoder_attention_heads=4, decoder_embed_dim=768, decoder_embed_path=None, decoder_ffn_embed_dim=3072, decoder_input_dim=768, decoder_layerdrop=0, decoder_layers=2, decoder_layers_to_keep=None, decoder_learned_pos=True, decoder_normalize_before=False, decoder_output_dim=768, device_id=0, disable_validation=False, distributed_backend='nccl', distributed_init_method=None, distributed_no_spawn=False, distributed_port=-1, distributed_rank=0, distributed_world_size=1, dropout=0.2, empty_cache_freq=0, encoder_attention_heads=12, encoder_embed_dim=768, encoder_embed_path=None, encoder_ffn_embed_dim=3072, encoder_layerdrop=0, encoder_layers=6, encoder_layers_to_keep=None, encoder_learned_pos=True, encoder_normalize_before=False, eval_bleu=False, eval_bleu_args=None, eval_bleu_detok='space', eval_bleu_detok_args=None, eval_bleu_print_samples=False, eval_bleu_remove_bpe=None, eval_tokenized_bleu=False, fast_stat_sync=False, find_unused_parameters=True, fix_batches_to_gpus=False, fixed_validation_seed=None, fp16=False, fp16_init_scale=128, fp16_no_flatten_grads=False, fp16_scale_tolerance=0.0, fp16_scale_window=None, keep_best_checkpoints=-1, keep_interval_updates=-1, keep_last_epochs=-1, label_smoothing=0.1, layer_wise_attention=False, layernorm_embedding=True, left_pad_source='True', left_pad_target='False', load_alignments=False, log_format='json', log_interval=1000, lr=[3e-05], lr_scheduler='inverse_sqrt', lr_weight=100.0, max_epoch=0, 
max_sentences=None, max_sentences_valid=None, max_source_positions=1024, max_target_positions=1024, max_tokens=800, max_tokens_valid=800, max_update=0, maximize_best_checkpoint_metric=False, memory_efficient_fp16=False, min_loss_scale=0.0001, min_lr=-1, multi_views=False, no_cross_attention=False, no_epoch_checkpoints=True, no_last_checkpoints=False, no_progress_bar=False, no_save=False, no_save_optimizer_state=False, no_scale_embedding=True, no_token_positional_embeddings=False, num_workers=1, optimizer='adam', optimizer_overrides='{}', patience=30, pooler_activation_fn='tanh', pooler_dropout=0.0, relu_dropout=0.0, required_batch_size_multiple=1, reset_dataloader=True, reset_lr_scheduler=False, reset_meters=True, reset_optimizer=True, restore_file='./bart.base/model.pt', save_dir='checkpoints_scratch_1', save_interval=1, save_interval_updates=0, seed=0, sentence_avg=False, share_all_embeddings=True, share_decoder_input_output_embed=True, skip_invalid_size_inputs_valid_test=True, source_lang='source', target_lang='target', task='translation', temp_file='bart_base_scratch', tensorboard_logdir='', threshold_loss_scale=None, tokenizer=None, train_subset='train', truncate_source=True, update_freq=[16], upsample_primary=1, use_bmuf=False, use_old_adam=False, user_dir=None, valid_subset='valid', validate_interval=1, view_2_path='None', warmup_init_lr=-1, warmup_updates=400, weight_decay=0.1)
2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | [source] dictionary: 51200 types
2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | [target] dictionary: 51200 types
2020-09-12 23:48:39 | INFO | fairseq.data.data_utils | loaded 818 examples from: data_none/valid.source-target.source
2020-09-12 23:48:39 | INFO | fairseq.data.data_utils | loaded 818 examples from: data_none/valid.source-target.target
2020-09-12 23:48:39 | INFO | fairseq.tasks.translation | data_none valid source-target 818 examples
2020-09-12 23:48:44 | INFO | fairseq_cli.train | BARTModel(
  (encoder): TransformerEncoder(
    (embed_tokens): Embedding(51200, 768, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (2): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (3): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (4): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (5): TransformerEncoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
    )
    (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (decoder): TransformerDecoder(
    (embed_tokens): Embedding(51200, 768, padding_idx=1)
    (embed_positions): LearnedPositionalEmbedding(1026, 768, padding_idx=1)
    (layers): ModuleList(
      (0): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
      (1): TransformerDecoderLayer(
        (self_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (self_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (encoder_attn): MultiheadAttention(
          (k_proj): Linear(in_features=768, out_features=768, bias=True)
          (v_proj): Linear(in_features=768, out_features=768, bias=True)
          (q_proj): Linear(in_features=768, out_features=768, bias=True)
          (out_proj): Linear(in_features=768, out_features=768, bias=True)
        )
        (encoder_attn_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (fc1): Linear(in_features=768, out_features=3072, bias=True)
        (fc2): Linear(in_features=3072, out_features=768, bias=True)
        (final_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
      )
    )
    (layernorm_embedding): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (classification_heads): ModuleDict()
  (section): LSTM(768, 768)
  (w_proj_layer_norm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  (w_proj): Linear(in_features=768, out_features=768, bias=True)
  (w_context_vector): Linear(in_features=768, out_features=1, bias=False)
  (softmax): Softmax(dim=1)
)
2020-09-12 23:48:44 | INFO | fairseq_cli.train | model bart_encoder_base, criterion LabelSmoothedCrossEntropyCriterion
2020-09-12 23:48:44 | INFO | fairseq_cli.train | num. model params: 107649024 (num. trained: 107649024)
2020-09-12 23:48:48 | INFO | fairseq_cli.train | training on 1 GPUs
2020-09-12 23:48:48 | INFO | fairseq_cli.train | max tokens per GPU = 800 and max sentences per GPU = None
bart_encoder_base
2020-09-12 23:48:48 | INFO | fairseq.trainer | loaded checkpoint ./bart.base/model.pt (epoch 14 @ 0 updates)
group1:
103
group2:
61
2020-09-12 23:48:48 | INFO | fairseq.trainer | NOTE: your device may support faster training with --fp16
here schedule!
2020-09-12 23:48:48 | INFO | fairseq.trainer | loading train data for epoch 0
2020-09-12 23:48:48 | INFO | fairseq.data.data_utils | loaded 14731 examples from: data_none/train.source-target.source
2020-09-12 23:48:48 | INFO | fairseq.data.data_utils | loaded 14731 examples from: data_none/train.source-target.target
2020-09-12 23:48:48 | INFO | fairseq.tasks.translation | data_none train source-target 14731 examples
2020-09-12 23:48:49 | WARNING | fairseq.data.data_utils | 5 samples have invalid sizes and will be skipped, max_positions=(800, 800), first few sample ids=[6248, 12799, 12502, 9490, 4269]
False
2020-09-12 23:51:50 | INFO | train | {"epoch": 1, "train_loss": "10.421", "train_nll_loss": "9.332", "train_ppl": "644.363", "train_wps": "2140", "train_ups": "1.01", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "182", "train_lr": "1.365e-05", "train_gnorm": "2.41", "train_clip": "8.2", "train_oom": "0", "train_train_wall": "170", "train_wall": "182"}
/pytorch/torch/csrc/utils/python_arg_parser.cpp:756: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, *, Number alpha)
2020-09-12 23:51:54 | INFO | valid | {"epoch": 1, "valid_loss": "7.602", "valid_nll_loss": "6.152", "valid_ppl": "71.102", "valid_wps": "4947.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "182"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.21663793375370063, 'p': 0.22864870158974218, 'r': 0.2287239463029277}, 'rouge-2': {'f': 0.054124139134273476, 'p': 0.05736806219841362, 'r': 0.057741245203472076}, 'rouge-l': {'f': 0.22703430161156996, 'p': 0.2589247848795017, 'r': 0.21878059623920781}}
2020-09-12 23:53:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 1 @ 182 updates, score 7.602) (writing took 3.440072625875473 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.2143088315976068, 'p': 0.22446929769837956, 'r': 0.2289120110339817}, 'rouge-2': {'f': 0.05192138274699011, 'p': 0.05429316369788991, 'r': 0.05615224649979625}, 'rouge-l': {'f': 0.2247126841007863, 'p': 0.25170078332283147, 'r': 0.22005435135775175}}
/pytorch/aten/src/ATen/native/BinaryOps.cpp:66: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
2020-09-12 23:57:52 | INFO | train | {"epoch": 2, "train_loss": "7.32", "train_nll_loss": "5.913", "train_ppl": "60.24", "train_wps": "1069.3", "train_ups": "0.5", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "364", "train_lr": "2.73e-05", "train_gnorm": "3.339", "train_clip": "16.5", "train_oom": "0", "train_train_wall": "170", "train_wall": "545"}
2020-09-12 23:57:57 | INFO | valid | {"epoch": 2, "valid_loss": "7.298", "valid_nll_loss": "5.785", "valid_ppl": "55.125", "valid_wps": "4876", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "364", "valid_best_loss": "7.298"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.2902339258903693, 'p': 0.35493902025894375, 'r': 0.27679295120544645}, 'rouge-2': {'f': 0.09023858321606967, 'p': 0.1113931624274892, 'r': 0.08757866133645495}, 'rouge-l': {'f': 0.2997845837858199, 'p': 0.401101296990871, 'r': 0.26031260412345514}}
2020-09-12 23:59:21 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 2 @ 364 updates, score 7.298) (writing took 7.891215533949435 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.28494613701125965, 'p': 0.3516514035592438, 'r': 0.2699670810086879}, 'rouge-2': {'f': 0.08420177692788793, 'p': 0.10510303003646479, 'r': 0.08140088225186674}, 'rouge-l': {'f': 0.292386051869777, 'p': 0.39432852537102037, 'r': 0.2527454550206724}}
2020-09-13 00:03:40 | INFO | train | {"epoch": 3, "train_loss": "7.642", "train_nll_loss": "6.298", "train_ppl": "78.696", "train_wps": "1114.8", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "546", "train_lr": "2.56776e-05", "train_gnorm": "9.301", "train_clip": "99.5", "train_oom": "0", "train_train_wall": "174", "train_wall": "892"}
2020-09-13 00:03:45 | INFO | valid | {"epoch": 3, "valid_loss": "6.83", "valid_nll_loss": "5.266", "valid_ppl": "38.479", "valid_wps": "4512.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "546", "valid_best_loss": "6.83"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.28004401636313203, 'p': 0.45120614382755203, 'r': 0.22356730714002326}, 'rouge-2': {'f': 0.1042067031811556, 'p': 0.17100306931561168, 'r': 0.0837493874679366}, 'rouge-l': {'f': 0.2834823452672735, 'p': 0.46510518092133646, 'r': 0.22059007016249965}}
2020-09-13 00:04:39 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 3 @ 546 updates, score 6.83) (writing took 8.794937412254512 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.26935009742145355, 'p': 0.43147623041129457, 'r': 0.21399299095643845}, 'rouge-2': {'f': 0.0928812558099079, 'p': 0.15352525158121336, 'r': 0.07361179442736973}, 'rouge-l': {'f': 0.2711337233458924, 'p': 0.44552105877903414, 'r': 0.20951603467624907}}
2020-09-13 00:08:36 | INFO | train | {"epoch": 4, "train_loss": "6.986", "train_nll_loss": "5.561", "train_ppl": "47.22", "train_wps": "1310", "train_ups": "0.62", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "728", "train_lr": "2.22375e-05", "train_gnorm": "7.492", "train_clip": "90.1", "train_oom": "0", "train_train_wall": "179", "train_wall": "1188"}
2020-09-13 00:08:41 | INFO | valid | {"epoch": 4, "valid_loss": "6.512", "valid_nll_loss": "4.873", "valid_ppl": "29.308", "valid_wps": "4090.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "728", "valid_best_loss": "6.512"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3058323314303917, 'p': 0.45101872632920387, 'r': 0.255721901517651}, 'rouge-2': {'f': 0.12803820225900034, 'p': 0.19195720632392738, 'r': 0.10762098110671683}, 'rouge-l': {'f': 0.3125195961416917, 'p': 0.4667913347552745, 'r': 0.25371581048821884}}
2020-09-13 00:09:39 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 4 @ 728 updates, score 6.512) (writing took 8.845161844976246 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3067963357485839, 'p': 0.4412779946655861, 'r': 0.2587007664111758}, 'rouge-2': {'f': 0.12092130534646459, 'p': 0.17638283156638504, 'r': 0.10283473953818749}, 'rouge-l': {'f': 0.31175281522793147, 'p': 0.4556750348102311, 'r': 0.2543185560104487}}
2020-09-13 00:13:33 | INFO | train | {"epoch": 5, "train_loss": "6.593", "train_nll_loss": "5.121", "train_ppl": "34.798", "train_wps": "1301.7", "train_ups": "0.61", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "910", "train_lr": "1.98898e-05", "train_gnorm": "6.119", "train_clip": "61", "train_oom": "0", "train_train_wall": "173", "train_wall": "1485"}
2020-09-13 00:13:38 | INFO | valid | {"epoch": 5, "valid_loss": "6.283", "valid_nll_loss": "4.649", "valid_ppl": "25.088", "valid_wps": "4915", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "910", "valid_best_loss": "6.283"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.32631664800109805, 'p': 0.4096853241359607, 'r': 0.30773293666069307}, 'rouge-2': {'f': 0.139532263771638, 'p': 0.17597922407733474, 'r': 0.13409987180914382}, 'rouge-l': {'f': 0.3375112445140239, 'p': 0.44056842796025036, 'r': 0.29934067025889066}}
2020-09-13 00:14:56 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 5 @ 910 updates, score 6.283) (writing took 8.922468357719481 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.32195108164976866, 'p': 0.3986055423428635, 'r': 0.3072986476227612}, 'rouge-2': {'f': 0.13255688382054837, 'p': 0.16593460499965418, 'r': 0.12889569667431885}, 'rouge-l': {'f': 0.3336251633595829, 'p': 0.4308272105408829, 'r': 0.2980551182792911}}
2020-09-13 00:19:12 | INFO | train | {"epoch": 6, "train_loss": "6.299", "train_nll_loss": "4.79", "train_ppl": "27.674", "train_wps": "1144.9", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1092", "train_lr": "1.81568e-05", "train_gnorm": "4.966", "train_clip": "25.8", "train_oom": "0", "train_train_wall": "173", "train_wall": "1824"}
2020-09-13 00:19:16 | INFO | valid | {"epoch": 6, "valid_loss": "6.073", "valid_nll_loss": "4.4", "valid_ppl": "21.112", "valid_wps": "5230.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1092", "valid_best_loss": "6.073"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3334118320946748, 'p': 0.4410821539215469, 'r': 0.29588426535787143}, 'rouge-2': {'f': 0.13739283912842795, 'p': 0.18316512984611386, 'r': 0.12292501717355879}, 'rouge-l': {'f': 0.3367563539841945, 'p': 0.45088752831270734, 'r': 0.2903563191983348}}
2020-09-13 00:20:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 6 @ 1092 updates, score 6.073) (writing took 8.304749015718699 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.32018946591540753, 'p': 0.43157002485898993, 'r': 0.2830485811595604}, 'rouge-2': {'f': 0.127293851871242, 'p': 0.17473306979593142, 'r': 0.11274861530450966}, 'rouge-l': {'f': 0.3211206923058116, 'p': 0.43612260347418846, 'r': 0.2759708589901207}}
2020-09-13 00:24:22 | INFO | train | {"epoch": 7, "train_loss": "6.062", "train_nll_loss": "4.522", "train_ppl": "22.981", "train_wps": "1246.6", "train_ups": "0.59", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1274", "train_lr": "1.681e-05", "train_gnorm": "3.981", "train_clip": "13.2", "train_oom": "0", "train_train_wall": "171", "train_wall": "2134"}
2020-09-13 00:24:27 | INFO | valid | {"epoch": 7, "valid_loss": "5.974", "valid_nll_loss": "4.283", "valid_ppl": "19.47", "valid_wps": "4940.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1274", "valid_best_loss": "5.974"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.34953445294994195, 'p': 0.45927096966662395, 'r': 0.3099192101544123}, 'rouge-2': {'f': 0.15042073792671246, 'p': 0.1998221241410976, 'r': 0.1338806674885114}, 'rouge-l': {'f': 0.3493160643305676, 'p': 0.4626922715315685, 'r': 0.3021469814220278}}
2020-09-13 00:25:40 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 7 @ 1274 updates, score 5.974) (writing took 9.386484532617033 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3454603555704317, 'p': 0.45500522729028614, 'r': 0.307872752358408}, 'rouge-2': {'f': 0.14921191873819553, 'p': 0.20035740171083352, 'r': 0.13298830103947873}, 'rouge-l': {'f': 0.35050286922913615, 'p': 0.46836836790711445, 'r': 0.30264623667919816}}
2020-09-13 00:29:36 | INFO | train | {"epoch": 8, "train_loss": "5.833", "train_nll_loss": "4.263", "train_ppl": "19.199", "train_wps": "1234.4", "train_ups": "0.58", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1456", "train_lr": "1.57243e-05", "train_gnorm": "3.139", "train_clip": "1.6", "train_oom": "0", "train_train_wall": "170", "train_wall": "2448"}
2020-09-13 00:29:40 | INFO | valid | {"epoch": 8, "valid_loss": "5.84", "valid_nll_loss": "4.149", "valid_ppl": "17.745", "valid_wps": "5219.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1456", "valid_best_loss": "5.84"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.37134382327991744, 'p': 0.4483338326050663, 'r': 0.3499853577404138}, 'rouge-2': {'f': 0.16603384362767615, 'p': 0.19866856937856062, 'r': 0.15915049114742058}, 'rouge-l': {'f': 0.37879774030543395, 'p': 0.4653144943165627, 'r': 0.34473968051528203}}
2020-09-13 00:30:53 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 8 @ 1456 updates, score 5.84) (writing took 9.01857946626842 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.36901160113448783, 'p': 0.44108292289748047, 'r': 0.3526763811870377}, 'rouge-2': {'f': 0.16666756440683705, 'p': 0.19984167677488363, 'r': 0.1618494645628281}, 'rouge-l': {'f': 0.3778187479842473, 'p': 0.4602486552120849, 'r': 0.3473739981188793}}
2020-09-13 00:35:03 | INFO | train | {"epoch": 9, "train_loss": "5.66", "train_nll_loss": "4.064", "train_ppl": "16.729", "train_wps": "1184", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1638", "train_lr": "1.4825e-05", "train_gnorm": "2.92", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "174", "train_wall": "2776"}
2020-09-13 00:35:08 | INFO | valid | {"epoch": 9, "valid_loss": "5.799", "valid_nll_loss": "4.105", "valid_ppl": "17.213", "valid_wps": "5105", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1638", "valid_best_loss": "5.799"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.36842622898234595, 'p': 0.44961516033632526, 'r': 0.34284042718663166}, 'rouge-2': {'f': 0.16566291407189612, 'p': 0.20170401529199547, 'r': 0.15595426964593812}, 'rouge-l': {'f': 0.37329684998556206, 'p': 0.46107006212992346, 'r': 0.3381812931818861}}
2020-09-13 00:36:22 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 9 @ 1638 updates, score 5.799) (writing took 8.725783603265882 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.36854854367280065, 'p': 0.446462179037044, 'r': 0.3453575409922443}, 'rouge-2': {'f': 0.15766752097723266, 'p': 0.19205310732297123, 'r': 0.14911261178538193}, 'rouge-l': {'f': 0.3723420714636821, 'p': 0.45483385435551693, 'r': 0.33894738585564493}}
2020-09-13 00:40:36 | INFO | train | {"epoch": 10, "train_loss": "5.499", "train_nll_loss": "3.88", "train_ppl": "14.719", "train_wps": "1164.2", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "1820", "train_lr": "1.40642e-05", "train_gnorm": "2.644", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "171", "train_wall": "3108"}
2020-09-13 00:40:41 | INFO | valid | {"epoch": 10, "valid_loss": "5.764", "valid_nll_loss": "4.061", "valid_ppl": "16.685", "valid_wps": "4140", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "1820", "valid_best_loss": "5.764"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3732009581077997, 'p': 0.45658435160220856, 'r': 0.34843650153963096}, 'rouge-2': {'f': 0.16955068859082412, 'p': 0.20811159765922504, 'r': 0.16092543242290838}, 'rouge-l': {'f': 0.37735274143117503, 'p': 0.46684437321975286, 'r': 0.34234094727570397}}
2020-09-13 00:41:52 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 10 @ 1820 updates, score 5.764) (writing took 9.631018000654876 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3716137675352023, 'p': 0.45290106682582165, 'r': 0.350055845189764}, 'rouge-2': {'f': 0.16761863723149356, 'p': 0.2054667073198634, 'r': 0.159476663711512}, 'rouge-l': {'f': 0.3766825457695764, 'p': 0.46341365477026236, 'r': 0.3429991475782948}}
2020-09-13 00:45:59 | INFO | train | {"epoch": 11, "train_loss": "5.389", "train_nll_loss": "3.752", "train_ppl": "13.477", "train_wps": "1200", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2002", "train_lr": "1.34097e-05", "train_gnorm": "2.583", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "3431"}
2020-09-13 00:46:02 | INFO | valid | {"epoch": 11, "valid_loss": "5.732", "valid_nll_loss": "4.006", "valid_ppl": "16.062", "valid_wps": "5937", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2002", "valid_best_loss": "5.732"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3712171758030596, 'p': 0.46406767790531694, 'r': 0.3399832503572037}, 'rouge-2': {'f': 0.17340701068522252, 'p': 0.218702325075554, 'r': 0.15972205472044512}, 'rouge-l': {'f': 0.37458693081114397, 'p': 0.47097623661393523, 'r': 0.3349146889430698}}
2020-09-13 00:47:13 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 11 @ 2002 updates, score 5.732) (writing took 8.599667175672948 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3759528658322778, 'p': 0.4600031515568631, 'r': 0.34994197804675164}, 'rouge-2': {'f': 0.16784789754607138, 'p': 0.2078567819662484, 'r': 0.15733512603695773}, 'rouge-l': {'f': 0.37885456493676783, 'p': 0.4665253560189782, 'r': 0.34335369752238043}}
2020-09-13 00:51:19 | INFO | train | {"epoch": 12, "train_loss": "5.266", "train_nll_loss": "3.61", "train_ppl": "12.21", "train_wps": "1211.6", "train_ups": "0.57", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2184", "train_lr": "1.28388e-05", "train_gnorm": "2.559", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "3751"}
2020-09-13 00:51:22 | INFO | valid | {"epoch": 12, "valid_loss": "5.695", "valid_nll_loss": "3.958", "valid_ppl": "15.541", "valid_wps": "5973.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2184", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.37738179617980694, 'p': 0.45113007243822967, 'r': 0.35812026539253183}, 'rouge-2': {'f': 0.174381935944973, 'p': 0.20928232150430362, 'r': 0.1667457526834203}, 'rouge-l': {'f': 0.381769707403124, 'p': 0.4607884172272312, 'r': 0.3523183895063523}}
2020-09-13 00:52:37 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_best.pt (epoch 12 @ 2184 updates, score 5.695) (writing took 8.865372630767524 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37599974934852276, 'p': 0.44884508733258244, 'r': 0.3585387811436595}, 'rouge-2': {'f': 0.16874446443891233, 'p': 0.20483613250924657, 'r': 0.1618007263923004}, 'rouge-l': {'f': 0.3813476499552326, 'p': 0.4596740958090682, 'r': 0.35183457630713216}}
2020-09-13 00:56:48 | INFO | train | {"epoch": 13, "train_loss": "5.158", "train_nll_loss": "3.483", "train_ppl": "11.182", "train_wps": "1176.3", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2366", "train_lr": "1.23351e-05", "train_gnorm": "2.471", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "4080"}
2020-09-13 00:56:53 | INFO | valid | {"epoch": 13, "valid_loss": "5.712", "valid_nll_loss": "3.969", "valid_ppl": "15.661", "valid_wps": "4424.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2366", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3884982705551785, 'p': 0.4491673629778516, 'r': 0.37780412740852076}, 'rouge-2': {'f': 0.17685692236323902, 'p': 0.2042786532892362, 'r': 0.17430026356282213}, 'rouge-l': {'f': 0.3935735209373062, 'p': 0.4617984008105712, 'r': 0.37031878737194673}}
2020-09-13 00:58:10 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 13 @ 2366 updates, score 5.712) (writing took 4.134681691415608 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37886357099711665, 'p': 0.43458029252143165, 'r': 0.3729596920501715}, 'rouge-2': {'f': 0.16854719468745627, 'p': 0.19559694299078864, 'r': 0.1669496828829494}, 'rouge-l': {'f': 0.38364553730125733, 'p': 0.44831425303180794, 'r': 0.3636603790462315}}
2020-09-13 01:02:30 | INFO | train | {"epoch": 14, "train_loss": "5.054", "train_nll_loss": "3.361", "train_ppl": "10.275", "train_wps": "1131.2", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2548", "train_lr": "1.18864e-05", "train_gnorm": "2.442", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "4423"}
2020-09-13 01:02:35 | INFO | valid | {"epoch": 14, "valid_loss": "5.707", "valid_nll_loss": "3.958", "valid_ppl": "15.544", "valid_wps": "4832.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2548", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3998796789622521, 'p': 0.44386233353207316, 'r': 0.402440874022752}, 'rouge-2': {'f': 0.18989748323963274, 'p': 0.20863603806938805, 'r': 0.1955903717688269}, 'rouge-l': {'f': 0.4074896840434411, 'p': 0.4639988402234145, 'r': 0.3924353598143554}}
2020-09-13 01:03:55 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 14 @ 2548 updates, score 5.707) (writing took 4.522745947353542 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.389306128602423, 'p': 0.43236154677359795, 'r': 0.3928469004687963}, 'rouge-2': {'f': 0.1749124813810089, 'p': 0.19233446115494, 'r': 0.18101805932331297}, 'rouge-l': {'f': 0.3974381446141553, 'p': 0.45394463380960515, 'r': 0.3816860716155822}}
2020-09-13 01:08:15 | INFO | train | {"epoch": 15, "train_loss": "4.955", "train_nll_loss": "3.246", "train_ppl": "9.486", "train_wps": "1124.8", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2730", "train_lr": "1.14834e-05", "train_gnorm": "2.476", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "4767"}
2020-09-13 01:08:19 | INFO | valid | {"epoch": 15, "valid_loss": "5.72", "valid_nll_loss": "3.969", "valid_ppl": "15.664", "valid_wps": "5135.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2730", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.385496212805097, 'p': 0.45344757978858063, 'r': 0.37208548238317946}, 'rouge-2': {'f': 0.17987898260277432, 'p': 0.2117429818111738, 'r': 0.17606453929508845}, 'rouge-l': {'f': 0.3877792425552155, 'p': 0.45854887620658535, 'r': 0.3646117908261955}}
2020-09-13 01:09:32 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 15 @ 2730 updates, score 5.72) (writing took 4.130691207945347 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37790240747430026, 'p': 0.43971254753318373, 'r': 0.367746760947724}, 'rouge-2': {'f': 0.16425074216297936, 'p': 0.19264495690629893, 'r': 0.16212163303873436}, 'rouge-l': {'f': 0.38059486158567596, 'p': 0.4492443951232903, 'r': 0.35745930848360596}}
2020-09-13 01:13:41 | INFO | train | {"epoch": 16, "train_loss": "4.867", "train_nll_loss": "3.141", "train_ppl": "8.82", "train_wps": "1187.1", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "2912", "train_lr": "1.11187e-05", "train_gnorm": "2.448", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "5093"}
2020-09-13 01:13:45 | INFO | valid | {"epoch": 16, "valid_loss": "5.701", "valid_nll_loss": "3.929", "valid_ppl": "15.235", "valid_wps": "6218.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "2912", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38106680686091043, 'p': 0.4805848063501437, 'r': 0.3447942668199565}, 'rouge-2': {'f': 0.175794098245, 'p': 0.22236168866719144, 'r': 0.16057201572100432}, 'rouge-l': {'f': 0.3806436944039715, 'p': 0.4789247600612308, 'r': 0.3390977899415061}}
2020-09-13 01:14:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 16 @ 2912 updates, score 5.701) (writing took 4.02212131395936 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3718838641380148, 'p': 0.4666792110932706, 'r': 0.3399888800206822}, 'rouge-2': {'f': 0.1671697426502001, 'p': 0.21140715464498486, 'r': 0.1542190531743976}, 'rouge-l': {'f': 0.3728099296900896, 'p': 0.46619239167445015, 'r': 0.3346634641171861}}
2020-09-13 01:18:54 | INFO | train | {"epoch": 17, "train_loss": "4.782", "train_nll_loss": "3.04", "train_ppl": "8.228", "train_wps": "1238.2", "train_ups": "0.58", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3094", "train_lr": "1.07868e-05", "train_gnorm": "2.405", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "5406"}
2020-09-13 01:18:58 | INFO | valid | {"epoch": 17, "valid_loss": "5.733", "valid_nll_loss": "3.974", "valid_ppl": "15.715", "valid_wps": "5150", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3094", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.391276670828434, 'p': 0.43821071505750875, 'r': 0.3908326539960555}, 'rouge-2': {'f': 0.17966333440452606, 'p': 0.2006399715420435, 'r': 0.18217533769836514}, 'rouge-l': {'f': 0.3948289271214795, 'p': 0.4524492153099438, 'r': 0.3798363085924027}}
2020-09-13 01:20:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 17 @ 3094 updates, score 5.733) (writing took 4.8997919876128435 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3845810681201655, 'p': 0.4282195186242171, 'r': 0.3862897923232961}, 'rouge-2': {'f': 0.17073755266846638, 'p': 0.1894686524531282, 'r': 0.17556790042355605}, 'rouge-l': {'f': 0.391043491686661, 'p': 0.4431799101178343, 'r': 0.3779568118927497}}
2020-09-13 01:24:28 | INFO | train | {"epoch": 18, "train_loss": "4.697", "train_nll_loss": "2.94", "train_ppl": "7.672", "train_wps": "1159.4", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3276", "train_lr": "1.04828e-05", "train_gnorm": "2.467", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "5740"}
2020-09-13 01:24:32 | INFO | valid | {"epoch": 18, "valid_loss": "5.724", "valid_nll_loss": "3.944", "valid_ppl": "15.396", "valid_wps": "5295.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3276", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.37861704558770404, 'p': 0.4419875194307351, 'r': 0.3667373863426889}, 'rouge-2': {'f': 0.1726066525356969, 'p': 0.20114080833632547, 'r': 0.1692294466791264}, 'rouge-l': {'f': 0.3817555654265253, 'p': 0.4502364007470248, 'r': 0.3588787943829516}}
2020-09-13 01:25:43 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 18 @ 3276 updates, score 5.724) (writing took 5.381191832944751 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3723413859817262, 'p': 0.4282642263502103, 'r': 0.36388886661332226}, 'rouge-2': {'f': 0.15908020930468886, 'p': 0.18290676480735674, 'r': 0.15773721634104743}, 'rouge-l': {'f': 0.3772182373129486, 'p': 0.4386399747551869, 'r': 0.35658496212946833}}
2020-09-13 01:29:54 | INFO | train | {"epoch": 19, "train_loss": "4.614", "train_nll_loss": "2.841", "train_ppl": "7.163", "train_wps": "1189.8", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3458", "train_lr": "1.02033e-05", "train_gnorm": "2.388", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "6066"}
2020-09-13 01:29:59 | INFO | valid | {"epoch": 19, "valid_loss": "5.742", "valid_nll_loss": "3.966", "valid_ppl": "15.631", "valid_wps": "3865.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3458", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38044592866113847, 'p': 0.46748203688066003, 'r': 0.355039752203741}, 'rouge-2': {'f': 0.17804838492656566, 'p': 0.21772699643126558, 'r': 0.16901485873684227}, 'rouge-l': {'f': 0.3813148465803212, 'p': 0.46892094755586644, 'r': 0.3488118629587613}}
2020-09-13 01:31:08 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 19 @ 3458 updates, score 5.742) (writing took 4.588620846159756 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3701339419496253, 'p': 0.45463960006384346, 'r': 0.3417336798726597}, 'rouge-2': {'f': 0.16203221978539503, 'p': 0.20101164160722118, 'r': 0.15035608008610543}, 'rouge-l': {'f': 0.3689392742120454, 'p': 0.4531052832313649, 'r': 0.3346368169251904}}
2020-09-13 01:35:14 | INFO | train | {"epoch": 20, "train_loss": "4.54", "train_nll_loss": "2.754", "train_ppl": "6.744", "train_wps": "1207.7", "train_ups": "0.57", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3640", "train_lr": "9.9449e-06", "train_gnorm": "2.43", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "6387"}
2020-09-13 01:35:19 | INFO | valid | {"epoch": 20, "valid_loss": "5.765", "valid_nll_loss": "3.981", "valid_ppl": "15.794", "valid_wps": "5212", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3640", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38917925586628854, 'p': 0.4545888729297281, 'r': 0.37521025150304493}, 'rouge-2': {'f': 0.18128938136060216, 'p': 0.2108173460566329, 'r': 0.1779577175289945}, 'rouge-l': {'f': 0.3912123475237053, 'p': 0.4626659366951105, 'r': 0.36707352170149965}}
2020-09-13 01:36:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 20 @ 3640 updates, score 5.765) (writing took 4.491002192720771 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37944542213327404, 'p': 0.4419097225174281, 'r': 0.36700773898625333}, 'rouge-2': {'f': 0.16679895064998046, 'p': 0.19638910895842057, 'r': 0.16331358354493733}, 'rouge-l': {'f': 0.3822408225204509, 'p': 0.4509045323855888, 'r': 0.3583453351231742}}
2020-09-13 01:40:40 | INFO | train | {"epoch": 21, "train_loss": "4.459", "train_nll_loss": "2.656", "train_ppl": "6.305", "train_wps": "1190.5", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "3822", "train_lr": "9.70523e-06", "train_gnorm": "2.442", "train_clip": "0", "train_oom": "0", "train_train_wall": "168", "train_wall": "6712"}
2020-09-13 01:40:44 | INFO | valid | {"epoch": 21, "valid_loss": "5.801", "valid_nll_loss": "4.013", "valid_ppl": "16.145", "valid_wps": "5059.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "3822", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3951976657153381, 'p': 0.4313811856628662, 'r': 0.40212658363768433}, 'rouge-2': {'f': 0.18158599410484844, 'p': 0.19714233842488113, 'r': 0.1880275069846484}, 'rouge-l': {'f': 0.400753788566494, 'p': 0.4453694229577254, 'r': 0.39373391050291107}}
2020-09-13 01:42:04 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 21 @ 3822 updates, score 5.801) (writing took 4.3676733430475 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37992031392670983, 'p': 0.4103137483489205, 'r': 0.39027553986147556}, 'rouge-2': {'f': 0.16588326288147193, 'p': 0.1786441782759031, 'r': 0.17327696142366736}, 'rouge-l': {'f': 0.3866994768252795, 'p': 0.4268482135450003, 'r': 0.38055574575484663}}
2020-09-13 01:46:22 | INFO | train | {"epoch": 22, "train_loss": "4.388", "train_nll_loss": "2.573", "train_ppl": "5.95", "train_wps": "1132.6", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4004", "train_lr": "9.48209e-06", "train_gnorm": "2.438", "train_clip": "0", "train_oom": "0", "train_train_wall": "169", "train_wall": "7054"}
2020-09-13 01:46:27 | INFO | valid | {"epoch": 22, "valid_loss": "5.806", "valid_nll_loss": "4.013", "valid_ppl": "16.15", "valid_wps": "4606.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4004", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38750722254797787, 'p': 0.4277637839583848, 'r': 0.3917518647792673}, 'rouge-2': {'f': 0.17892783275537408, 'p': 0.1954041191558916, 'r': 0.18563060651094296}, 'rouge-l': {'f': 0.3901464960798033, 'p': 0.4382336878278232, 'r': 0.38022299917227464}}
2020-09-13 01:47:44 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 22 @ 4004 updates, score 5.806) (writing took 4.484134818427265 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37365247137031243, 'p': 0.4107872063220731, 'r': 0.37916166737146184}, 'rouge-2': {'f': 0.1645194217180295, 'p': 0.18036856200061924, 'r': 0.17038522739655573}, 'rouge-l': {'f': 0.3788353883034097, 'p': 0.42472162650174466, 'r': 0.36942716034722173}}
2020-09-13 01:51:58 | INFO | train | {"epoch": 23, "train_loss": "4.321", "train_nll_loss": "2.493", "train_ppl": "5.628", "train_wps": "1151.3", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4186", "train_lr": "9.27367e-06", "train_gnorm": "2.455", "train_clip": "0.5", "train_oom": "0", "train_train_wall": "171", "train_wall": "7391"}
2020-09-13 01:52:03 | INFO | valid | {"epoch": 23, "valid_loss": "5.827", "valid_nll_loss": "4.033", "valid_ppl": "16.372", "valid_wps": "4875", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4186", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38822926461557655, 'p': 0.45348295623984464, 'r': 0.37020477945877106}, 'rouge-2': {'f': 0.1777558814272791, 'p': 0.20734387979453298, 'r': 0.17123390284814333}, 'rouge-l': {'f': 0.3870378946705613, 'p': 0.453376818512756, 'r': 0.3616563152380114}}
2020-09-13 01:53:12 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 23 @ 4186 updates, score 5.827) (writing took 4.091774018481374 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3752148029208675, 'p': 0.43863114918235374, 'r': 0.358486619417161}, 'rouge-2': {'f': 0.16180559507864367, 'p': 0.19086761931101376, 'r': 0.1558113872763702}, 'rouge-l': {'f': 0.3749091393825165, 'p': 0.44034156816542647, 'r': 0.3498163907525991}}
2020-09-13 01:57:21 | INFO | train | {"epoch": 24, "train_loss": "4.255", "train_nll_loss": "2.414", "train_ppl": "5.331", "train_wps": "1200.3", "train_ups": "0.56", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4368", "train_lr": "9.07841e-06", "train_gnorm": "2.447", "train_clip": "0", "train_oom": "0", "train_train_wall": "170", "train_wall": "7713"}
2020-09-13 01:57:25 | INFO | valid | {"epoch": 24, "valid_loss": "5.827", "valid_nll_loss": "4.02", "valid_ppl": "16.219", "valid_wps": "5117.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4368", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38534117547261393, 'p': 0.445273298289193, 'r': 0.3742189594129202}, 'rouge-2': {'f': 0.17705063825525105, 'p': 0.20385055188358236, 'r': 0.17461070647467383}, 'rouge-l': {'f': 0.38760590954771934, 'p': 0.45168003704103943, 'r': 0.36699590315532576}}
2020-09-13 01:58:41 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 24 @ 4368 updates, score 5.827) (writing took 4.580549734644592 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37376658505892546, 'p': 0.42513689289838574, 'r': 0.36824430078261083}, 'rouge-2': {'f': 0.16449381722326553, 'p': 0.18792195589835636, 'r': 0.16472545042139783}, 'rouge-l': {'f': 0.37640842528791896, 'p': 0.4302305671551292, 'r': 0.36077114610382716}}
2020-09-13 02:02:58 | INFO | train | {"epoch": 25, "train_loss": "4.189", "train_nll_loss": "2.336", "train_ppl": "5.049", "train_wps": "1148.2", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4550", "train_lr": "8.89499e-06", "train_gnorm": "2.448", "train_clip": "0", "train_oom": "0", "train_train_wall": "173", "train_wall": "8051"}
2020-09-13 02:03:03 | INFO | valid | {"epoch": 25, "valid_loss": "5.868", "valid_nll_loss": "4.082", "valid_ppl": "16.938", "valid_wps": "5108.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4550", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3871704147563257, 'p': 0.42776014371986826, 'r': 0.39233499649709835}, 'rouge-2': {'f': 0.1735141403257615, 'p': 0.18983819963280862, 'r': 0.1801092131406639}, 'rouge-l': {'f': 0.38881395036855076, 'p': 0.4373233829107774, 'r': 0.37913972883849767}}
2020-09-13 02:04:26 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 25 @ 4550 updates, score 5.868) (writing took 4.335143620148301 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3844215514195955, 'p': 0.42117882229965664, 'r': 0.38836392231731154}, 'rouge-2': {'f': 0.16712838914507203, 'p': 0.184210597325008, 'r': 0.16991914128454474}, 'rouge-l': {'f': 0.38612055818280827, 'p': 0.43107652154520104, 'r': 0.37593315070715955}}
2020-09-13 02:08:44 | INFO | train | {"epoch": 26, "train_loss": "4.123", "train_nll_loss": "2.257", "train_ppl": "4.78", "train_wps": "1122.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4732", "train_lr": "8.72226e-06", "train_gnorm": "2.424", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "8396"}
2020-09-13 02:08:48 | INFO | valid | {"epoch": 26, "valid_loss": "5.882", "valid_nll_loss": "4.086", "valid_ppl": "16.987", "valid_wps": "5032.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4732", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.39059463228636443, 'p': 0.44845589456484236, 'r': 0.37968675708755983}, 'rouge-2': {'f': 0.17511335211715656, 'p': 0.19906234650600044, 'r': 0.1739893469380071}, 'rouge-l': {'f': 0.3879455405786022, 'p': 0.4496592808236638, 'r': 0.3675400057841032}}
2020-09-13 02:10:03 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 26 @ 4732 updates, score 5.882) (writing took 4.394494019448757 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37693971962065465, 'p': 0.43127100071839813, 'r': 0.37002592342998164}, 'rouge-2': {'f': 0.1637296222980477, 'p': 0.187921960463544, 'r': 0.16291519359218756}, 'rouge-l': {'f': 0.37658267706833715, 'p': 0.43598237022943104, 'r': 0.3580529888703775}}
2020-09-13 02:14:15 | INFO | train | {"epoch": 27, "train_loss": "4.067", "train_nll_loss": "2.192", "train_ppl": "4.569", "train_wps": "1170.3", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "4914", "train_lr": "8.55921e-06", "train_gnorm": "2.464", "train_clip": "0", "train_oom": "0", "train_train_wall": "171", "train_wall": "8727"}
2020-09-13 02:14:19 | INFO | valid | {"epoch": 27, "valid_loss": "5.915", "valid_nll_loss": "4.113", "valid_ppl": "17.299", "valid_wps": "5272.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "4914", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38434759891377074, 'p': 0.4249833165393893, 'r': 0.38662224677719503}, 'rouge-2': {'f': 0.17192755651144725, 'p': 0.19044749867481733, 'r': 0.1749212887993873}, 'rouge-l': {'f': 0.38774917991430324, 'p': 0.4365569133041221, 'r': 0.37632971013264493}}
2020-09-13 02:15:35 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 27 @ 4914 updates, score 5.915) (writing took 4.622201276943088 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37903839501209047, 'p': 0.4130527827785208, 'r': 0.38674322772887626}, 'rouge-2': {'f': 0.16408303228928034, 'p': 0.17857210979076796, 'r': 0.169772298030173}, 'rouge-l': {'f': 0.3828725069419752, 'p': 0.42540750664994087, 'r': 0.3750088333333603}}
2020-09-13 02:19:56 | INFO | train | {"epoch": 28, "train_loss": "4.005", "train_nll_loss": "2.117", "train_ppl": "4.339", "train_wps": "1133.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5096", "train_lr": "8.40498e-06", "train_gnorm": "2.417", "train_clip": "0", "train_oom": "0", "train_train_wall": "175", "train_wall": "9069"}
2020-09-13 02:20:01 | INFO | valid | {"epoch": 28, "valid_loss": "5.924", "valid_nll_loss": "4.113", "valid_ppl": "17.301", "valid_wps": "5072.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5096", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3789412021753395, 'p': 0.44009502943218975, 'r': 0.3673566576087134}, 'rouge-2': {'f': 0.172794589968652, 'p': 0.19937717288075993, 'r': 0.17077481830782684}, 'rouge-l': {'f': 0.37829677515222787, 'p': 0.44274850813900934, 'r': 0.3577681743468191}}
2020-09-13 02:21:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 28 @ 5096 updates, score 5.924) (writing took 4.540925501845777 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.36527019423752055, 'p': 0.4208007754021258, 'r': 0.35525130146740025}, 'rouge-2': {'f': 0.1572877265691285, 'p': 0.1817244103207206, 'r': 0.15514638974429046}, 'rouge-l': {'f': 0.36299730631095184, 'p': 0.42155101744468565, 'r': 0.34371323149546557}}
2020-09-13 02:25:31 | INFO | train | {"epoch": 29, "train_loss": "3.946", "train_nll_loss": "2.047", "train_ppl": "4.134", "train_wps": "1157.1", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5278", "train_lr": "8.25879e-06", "train_gnorm": "2.41", "train_clip": "0", "train_oom": "0", "train_train_wall": "176", "train_wall": "9403"}
2020-09-13 02:25:35 | INFO | valid | {"epoch": 29, "valid_loss": "5.948", "valid_nll_loss": "4.141", "valid_ppl": "17.646", "valid_wps": "5013.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5278", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.39392286385154907, 'p': 0.44045543067367227, 'r': 0.39267143683548494}, 'rouge-2': {'f': 0.17773576677957223, 'p': 0.1967579777648653, 'r': 0.18110397051020538}, 'rouge-l': {'f': 0.3941220431573893, 'p': 0.44562952698546576, 'r': 0.3812440137558409}}
2020-09-13 02:26:56 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 29 @ 5278 updates, score 5.948) (writing took 4.303143389523029 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.38080278641895143, 'p': 0.4233969204218693, 'r': 0.3843529679579355}, 'rouge-2': {'f': 0.1626977341974013, 'p': 0.18099048848906363, 'r': 0.16602780449388885}, 'rouge-l': {'f': 0.38036795751367164, 'p': 0.43089260804423674, 'r': 0.369552310419994}}
2020-09-13 02:31:11 | INFO | train | {"epoch": 30, "train_loss": "3.895", "train_nll_loss": "1.988", "train_ppl": "3.966", "train_wps": "1139.3", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5460", "train_lr": "8.11998e-06", "train_gnorm": "2.409", "train_clip": "0", "train_oom": "0", "train_train_wall": "170", "train_wall": "9743"}
2020-09-13 02:31:16 | INFO | valid | {"epoch": 30, "valid_loss": "5.981", "valid_nll_loss": "4.171", "valid_ppl": "18.011", "valid_wps": "4936.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5460", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38020709374000805, 'p': 0.425796857846852, 'r': 0.37872173949599347}, 'rouge-2': {'f': 0.16795702651937106, 'p': 0.18762125339060082, 'r': 0.17083357743421537}, 'rouge-l': {'f': 0.38090418992038316, 'p': 0.4310409715465135, 'r': 0.36867556519723077}}
2020-09-13 02:32:29 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 30 @ 5460 updates, score 5.981) (writing took 4.3459603087976575 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37965358788271986, 'p': 0.4206577074238292, 'r': 0.380269873971165}, 'rouge-2': {'f': 0.16305562269169338, 'p': 0.18144303909345297, 'r': 0.16565712858345996}, 'rouge-l': {'f': 0.3809386293041335, 'p': 0.4267185372492751, 'r': 0.3700918835000441}}
2020-09-13 02:36:41 | INFO | train | {"epoch": 31, "train_loss": "3.83", "train_nll_loss": "1.911", "train_ppl": "3.76", "train_wps": "1174.4", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5642", "train_lr": "7.98794e-06", "train_gnorm": "2.399", "train_clip": "0", "train_oom": "0", "train_train_wall": "169", "train_wall": "10073"}
2020-09-13 02:36:47 | INFO | valid | {"epoch": 31, "valid_loss": "5.991", "valid_nll_loss": "4.188", "valid_ppl": "18.225", "valid_wps": "3896.5", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5642", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38086083577746893, 'p': 0.43856192527978594, 'r': 0.36852514930253355}, 'rouge-2': {'f': 0.1720548694357604, 'p': 0.19719350666715524, 'r': 0.1688531110170121}, 'rouge-l': {'f': 0.37940756909649354, 'p': 0.43991824611795655, 'r': 0.3593013025166184}}
2020-09-13 02:38:05 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 31 @ 5642 updates, score 5.991) (writing took 4.520576075650752 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.370679717039701, 'p': 0.4252353421116953, 'r': 0.362809943313995}, 'rouge-2': {'f': 0.15675368799422942, 'p': 0.18101486674427333, 'r': 0.15407280461847872}, 'rouge-l': {'f': 0.3695017426852557, 'p': 0.4269730776434452, 'r': 0.3521532197453053}}
2020-09-13 02:42:13 | INFO | train | {"epoch": 32, "train_loss": "3.782", "train_nll_loss": "1.855", "train_ppl": "3.617", "train_wps": "1166.1", "train_ups": "0.55", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "5824", "train_lr": "7.86214e-06", "train_gnorm": "2.394", "train_clip": "0", "train_oom": "0", "train_train_wall": "166", "train_wall": "10405"}
2020-09-13 02:42:17 | INFO | valid | {"epoch": 32, "valid_loss": "6", "valid_nll_loss": "4.194", "valid_ppl": "18.306", "valid_wps": "5229.5", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "5824", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38137652179916004, 'p': 0.43313790455810236, 'r': 0.3772229819910243}, 'rouge-2': {'f': 0.16691298025795592, 'p': 0.18836874967588021, 'r': 0.16863772276161515}, 'rouge-l': {'f': 0.38013909271578145, 'p': 0.4366292868501368, 'r': 0.3653993312593402}}
2020-09-13 02:43:34 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 32 @ 5824 updates, score 6.0) (writing took 4.351189863868058 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37981067693153064, 'p': 0.42971224310521866, 'r': 0.37687873722717913}, 'rouge-2': {'f': 0.15992774092760015, 'p': 0.18342171362162557, 'r': 0.15837989344554051}, 'rouge-l': {'f': 0.3799422632878725, 'p': 0.43464593193037043, 'r': 0.36586217115314146}}
2020-09-13 02:47:51 | INFO | train | {"epoch": 33, "train_loss": "3.726", "train_nll_loss": "1.789", "train_ppl": "3.455", "train_wps": "1147.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6006", "train_lr": "7.7421e-06", "train_gnorm": "2.331", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "10743"}
2020-09-13 02:47:55 | INFO | valid | {"epoch": 33, "valid_loss": "6.02", "valid_nll_loss": "4.212", "valid_ppl": "18.537", "valid_wps": "5786.6", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6006", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.37915361599396163, 'p': 0.4177151715958892, 'r': 0.38225942974637367}, 'rouge-2': {'f': 0.16823645287535424, 'p': 0.18276700397376724, 'r': 0.17357744357848057}, 'rouge-l': {'f': 0.38136667391475326, 'p': 0.4250451610297481, 'r': 0.3732515439655231}}
2020-09-13 02:49:11 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 33 @ 6006 updates, score 6.02) (writing took 4.523772260174155 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3718742933369427, 'p': 0.4099806920967013, 'r': 0.3767762931178083}, 'rouge-2': {'f': 0.15370868260410603, 'p': 0.17047928337500037, 'r': 0.15684602569324277}, 'rouge-l': {'f': 0.3688537640181489, 'p': 0.41053246892705925, 'r': 0.3627296345795084}}
2020-09-13 02:53:29 | INFO | train | {"epoch": 34, "train_loss": "3.669", "train_nll_loss": "1.723", "train_ppl": "3.3", "train_wps": "1146.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6188", "train_lr": "7.62739e-06", "train_gnorm": "2.323", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "11081"}
2020-09-13 02:53:33 | INFO | valid | {"epoch": 34, "valid_loss": "6.066", "valid_nll_loss": "4.269", "valid_ppl": "19.278", "valid_wps": "5174.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6188", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.387153766867382, 'p': 0.42792687135337976, 'r': 0.3892787185709572}, 'rouge-2': {'f': 0.17028365573313206, 'p': 0.1865320310783622, 'r': 0.17456691285290213}, 'rouge-l': {'f': 0.3850822596116743, 'p': 0.43329137654837147, 'r': 0.3746488459567868}}
2020-09-13 02:54:49 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 34 @ 6188 updates, score 6.066) (writing took 4.374015459790826 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37794691927940804, 'p': 0.4146940259079912, 'r': 0.3847413121249158}, 'rouge-2': {'f': 0.1627020618517039, 'p': 0.17883773270864975, 'r': 0.1677699635770461}, 'rouge-l': {'f': 0.38046893390803, 'p': 0.4244051828628842, 'r': 0.37318182299953945}}
2020-09-13 02:59:05 | INFO | train | {"epoch": 35, "train_loss": "3.621", "train_nll_loss": "1.667", "train_ppl": "3.175", "train_wps": "1152.5", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6370", "train_lr": "7.51764e-06", "train_gnorm": "2.332", "train_clip": "0", "train_oom": "0", "train_train_wall": "172", "train_wall": "11417"}
2020-09-13 02:59:10 | INFO | valid | {"epoch": 35, "valid_loss": "6.081", "valid_nll_loss": "4.28", "valid_ppl": "19.431", "valid_wps": "4026.8", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6370", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3863770618665606, 'p': 0.41960423765025195, 'r': 0.39353742674033426}, 'rouge-2': {'f': 0.16849963347107963, 'p': 0.18116140757543353, 'r': 0.1752595358690957}, 'rouge-l': {'f': 0.3868700708130419, 'p': 0.4284176759227474, 'r': 0.38066459517190543}}
2020-09-13 03:00:35 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 35 @ 6370 updates, score 6.081) (writing took 4.590174483135343 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3785464402442006, 'p': 0.4101153566021953, 'r': 0.3892325582359251}, 'rouge-2': {'f': 0.1635688031477682, 'p': 0.17727291271497997, 'r': 0.17079954619481266}, 'rouge-l': {'f': 0.38081945358634506, 'p': 0.4199370914132296, 'r': 0.37711290928707314}}
2020-09-13 03:04:57 | INFO | train | {"epoch": 36, "train_loss": "3.576", "train_nll_loss": "1.613", "train_ppl": "3.059", "train_wps": "1101.4", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6552", "train_lr": "7.41249e-06", "train_gnorm": "2.297", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "11769"}
2020-09-13 03:05:01 | INFO | valid | {"epoch": 36, "valid_loss": "6.083", "valid_nll_loss": "4.281", "valid_ppl": "19.438", "valid_wps": "4978.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6552", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38480255056640067, 'p': 0.41896340222175316, 'r': 0.3925201975546814}, 'rouge-2': {'f': 0.16801251321048902, 'p': 0.18062486712904227, 'r': 0.17507848305413856}, 'rouge-l': {'f': 0.382260686554629, 'p': 0.42213632057966827, 'r': 0.37822787809887537}}
2020-09-13 03:06:23 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 36 @ 6552 updates, score 6.083) (writing took 4.2699774550274014 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37602470396509274, 'p': 0.411255046807531, 'r': 0.38215708009749216}, 'rouge-2': {'f': 0.1597641638757478, 'p': 0.17617491833186072, 'r': 0.16325763230195292}, 'rouge-l': {'f': 0.37238202458966757, 'p': 0.41385589088931496, 'r': 0.36541661129126923}}
2020-09-13 03:10:43 | INFO | train | {"epoch": 37, "train_loss": "3.52", "train_nll_loss": "1.549", "train_ppl": "2.925", "train_wps": "1118.5", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6734", "train_lr": "7.31164e-06", "train_gnorm": "2.273", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "12115"}
2020-09-13 03:10:46 | INFO | valid | {"epoch": 37, "valid_loss": "6.094", "valid_nll_loss": "4.296", "valid_ppl": "19.645", "valid_wps": "6031.3", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6734", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38520799197303024, 'p': 0.4115952466989357, 'r': 0.401465322621547}, 'rouge-2': {'f': 0.16788612832087196, 'p': 0.17883063755349102, 'r': 0.1782279060988303}, 'rouge-l': {'f': 0.3861095337034907, 'p': 0.4201549510855608, 'r': 0.3877350945267522}}
2020-09-13 03:12:15 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 37 @ 6734 updates, score 6.094) (writing took 4.625352236442268 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3743512324371333, 'p': 0.3969204019868071, 'r': 0.3928996754216558}, 'rouge-2': {'f': 0.15513973954352014, 'p': 0.1640198600019053, 'r': 0.1656530221490409}, 'rouge-l': {'f': 0.3754077971732901, 'p': 0.4063686794601194, 'r': 0.3776842747295159}}
2020-09-13 03:16:43 | INFO | train | {"epoch": 38, "train_loss": "3.475", "train_nll_loss": "1.497", "train_ppl": "2.822", "train_wps": "1074.9", "train_ups": "0.51", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "6916", "train_lr": "7.21479e-06", "train_gnorm": "2.282", "train_clip": "0", "train_oom": "0", "train_train_wall": "176", "train_wall": "12476"}
2020-09-13 03:16:48 | INFO | valid | {"epoch": 38, "valid_loss": "6.091", "valid_nll_loss": "4.292", "valid_ppl": "19.588", "valid_wps": "4328.9", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "6916", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3906977486823824, 'p': 0.4322244472662039, 'r': 0.3904387899271124}, 'rouge-2': {'f': 0.17065932983497842, 'p': 0.18826351149371626, 'r': 0.17272125641072303}, 'rouge-l': {'f': 0.3853463228641199, 'p': 0.43086111541067484, 'r': 0.37498756649126375}}
2020-09-13 03:18:01 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 38 @ 6916 updates, score 6.091) (writing took 4.299341707490385 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3778648195003795, 'p': 0.4124395622043999, 'r': 0.38638326843135784}, 'rouge-2': {'f': 0.15725456356423087, 'p': 0.17166198095409285, 'r': 0.16285370981177014}, 'rouge-l': {'f': 0.3730423527751766, 'p': 0.4130845975685863, 'r': 0.36802060468026837}}
2020-09-13 03:22:20 | INFO | train | {"epoch": 39, "train_loss": "3.431", "train_nll_loss": "1.444", "train_ppl": "2.721", "train_wps": "1149.8", "train_ups": "0.54", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7098", "train_lr": "7.12169e-06", "train_gnorm": "2.279", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "12812"}
2020-09-13 03:22:24 | INFO | valid | {"epoch": 39, "valid_loss": "6.127", "valid_nll_loss": "4.333", "valid_ppl": "20.159", "valid_wps": "6119.1", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7098", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3850664930124747, 'p': 0.41624993500454843, 'r': 0.39500751417890606}, 'rouge-2': {'f': 0.16650493821705545, 'p': 0.17903338552596454, 'r': 0.1734290759497076}, 'rouge-l': {'f': 0.38333622366926445, 'p': 0.4211383929772211, 'r': 0.38013266010559055}}
2020-09-13 03:23:47 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 39 @ 7098 updates, score 6.127) (writing took 4.8811688451096416 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3791954685119963, 'p': 0.4095449415666187, 'r': 0.38998344500957005}, 'rouge-2': {'f': 0.1587988932718357, 'p': 0.1721871158793693, 'r': 0.16477044960993284}, 'rouge-l': {'f': 0.3762471094410884, 'p': 0.4124276670798772, 'r': 0.3733499950158345}}
2020-09-13 03:28:07 | INFO | train | {"epoch": 40, "train_loss": "3.382", "train_nll_loss": "1.389", "train_ppl": "2.618", "train_wps": "1116.7", "train_ups": "0.52", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7280", "train_lr": "7.03211e-06", "train_gnorm": "2.242", "train_clip": "0", "train_oom": "0", "train_train_wall": "174", "train_wall": "13159"}
2020-09-13 03:28:13 | INFO | valid | {"epoch": 40, "valid_loss": "6.151", "valid_nll_loss": "4.357", "valid_ppl": "20.485", "valid_wps": "3936.6", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7280", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3890387077350804, 'p': 0.42607329008636396, 'r': 0.39286505207167427}, 'rouge-2': {'f': 0.1711647526477222, 'p': 0.18699617326343265, 'r': 0.17517819545443322}, 'rouge-l': {'f': 0.3875280498801758, 'p': 0.42916514940329664, 'r': 0.3812118893956973}}
2020-09-13 03:29:30 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 40 @ 7280 updates, score 6.151) (writing took 4.4904003804549575 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3726439693825235, 'p': 0.40527439922433883, 'r': 0.38066306967102415}, 'rouge-2': {'f': 0.1534322184436473, 'p': 0.16804306661481672, 'r': 0.15854620618257684}, 'rouge-l': {'f': 0.3707951982072947, 'p': 0.40862711990312467, 'r': 0.36647558059887014}}
2020-09-13 03:33:52 | INFO | train | {"epoch": 41, "train_loss": "3.35", "train_nll_loss": "1.351", "train_ppl": "2.551", "train_wps": "1122.4", "train_ups": "0.53", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7462", "train_lr": "6.94582e-06", "train_gnorm": "2.213", "train_clip": "0", "train_oom": "0", "train_train_wall": "177", "train_wall": "13504"}
2020-09-13 03:33:58 | INFO | valid | {"epoch": 41, "valid_loss": "6.181", "valid_nll_loss": "4.391", "valid_ppl": "20.984", "valid_wps": "4025.2", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7462", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3843756468861572, 'p': 0.41117690198532353, 'r': 0.39729475731101416}, 'rouge-2': {'f': 0.16308450642519431, 'p': 0.17295082127690298, 'r': 0.17102846873431943}, 'rouge-l': {'f': 0.37910076864649406, 'p': 0.4103915731589071, 'r': 0.3803033642081009}}
2020-09-13 03:35:23 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 41 @ 7462 updates, score 6.181) (writing took 4.650649065151811 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3769917556036458, 'p': 0.3944497389851229, 'r': 0.40075455946786587}, 'rouge-2': {'f': 0.1521325700581021, 'p': 0.1589907345850074, 'r': 0.16469397096833938}, 'rouge-l': {'f': 0.3697896732897443, 'p': 0.39433115895849924, 'r': 0.37772640423431814}}
2020-09-13 03:39:50 | INFO | train | {"epoch": 42, "train_loss": "3.311", "train_nll_loss": "1.307", "train_ppl": "2.473", "train_wps": "1084", "train_ups": "0.51", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7644", "train_lr": "6.86264e-06", "train_gnorm": "2.195", "train_clip": "0", "train_oom": "0", "train_train_wall": "179", "train_wall": "13862"}
2020-09-13 03:39:54 | INFO | valid | {"epoch": 42, "valid_loss": "6.192", "valid_nll_loss": "4.401", "valid_ppl": "21.12", "valid_wps": "4981.4", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7644", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.3839548844216928, 'p': 0.40919695990657146, 'r': 0.39678524748658583}, 'rouge-2': {'f': 0.16309939507844493, 'p': 0.1717916669360824, 'r': 0.17226327704521602}, 'rouge-l': {'f': 0.38036248312741655, 'p': 0.4108766577386286, 'r': 0.381103948466934}}
2020-09-13 03:41:25 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 42 @ 7644 updates, score 6.192) (writing took 4.394819853827357 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.37688485075539657, 'p': 0.39846735624071966, 'r': 0.3945277424880961}, 'rouge-2': {'f': 0.1555117580701479, 'p': 0.1640307380388293, 'r': 0.16614039633603137}, 'rouge-l': {'f': 0.37278658003532866, 'p': 0.4006159776983299, 'r': 0.37631725030144797}}
2020-09-13 03:45:51 | INFO | train | {"epoch": 43, "train_loss": "3.28", "train_nll_loss": "1.27", "train_ppl": "2.412", "train_wps": "1070.6", "train_ups": "0.5", "train_wpb": "2128.5", "train_bsz": "80.9", "train_num_updates": "7826", "train_lr": "6.78237e-06", "train_gnorm": "2.213", "train_clip": "0", "train_oom": "0", "train_train_wall": "177", "train_wall": "14224"}
2020-09-13 03:45:55 | INFO | valid | {"epoch": 43, "valid_loss": "6.194", "valid_nll_loss": "4.405", "valid_ppl": "21.181", "valid_wps": "5927.7", "valid_wpb": "135.3", "valid_bsz": "5.1", "valid_num_updates": "7826", "valid_best_loss": "5.695"}
here bpe NONE
here!
Val {'rouge-1': {'f': 0.38388910329361303, 'p': 0.4258477894016644, 'r': 0.3848852055702036}, 'rouge-2': {'f': 0.16835026873515999, 'p': 0.18584556947743405, 'r': 0.17154795199389153}, 'rouge-l': {'f': 0.3794284964164359, 'p': 0.42386850952855576, 'r': 0.37060369000872206}}
2020-09-13 03:47:31 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_scratch_1/checkpoint_last.pt (epoch 43 @ 7826 updates, score 6.194) (writing took 4.312541832216084 seconds)
Test on testing set:
Test {'rouge-1': {'f': 0.3649928682374293, 'p': 0.40100515797443215, 'r': 0.37178177746836855}, 'rouge-2': {'f': 0.14818549602209022, 'p': 0.16269601833548097, 'r': 0.15371445974590012}, 'rouge-l': {'f': 0.3619876909686621, 'p': 0.40180170267059645, 'r': 0.35681259215856076}}
2020-09-13 03:48:41 | INFO | fairseq_cli.train | early stop since valid performance hasn't improved for last 30 runs
2020-09-13 03:48:41 | INFO | fairseq_cli.train | done training in 14391.9 seconds
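The last two lines above come from patience-based early stopping. Training ends once the validation loss has not improved on its best value (valid_best_loss, 5.695 here) for 30 consecutive validation runs. Below is a minimal sketch of that rule, written for illustration rather than taken from fairseq. The function names are hypothetical, and the sketch assumes one validation run per epoch and that a lower loss is better. It also shows how to work backwards from the stopping point: a stop at epoch 43 with a patience of 30 means the last improvement happened at epoch 13.

```python
from typing import Optional, Sequence


def early_stop_epoch(valid_losses: Sequence[float], patience: int) -> Optional[int]:
    """Return the 1-based epoch at which patience-based early stopping fires.

    Sketch only: assumes one validation run per epoch and that a lower
    validation loss is better. Returns None if training never stops early.
    """
    best = None
    runs_without_improvement = 0
    for epoch, loss in enumerate(valid_losses, start=1):
        if best is None or loss < best:
            # New best loss: reset the counter.
            best = loss
            runs_without_improvement = 0
        else:
            runs_without_improvement += 1
            if runs_without_improvement >= patience:
                return epoch
    return None


def best_epoch_from_stop(stop_epoch: int, patience: int) -> int:
    """Under the rule above, the last improvement came `patience` runs before the stop."""
    return stop_epoch - patience


if __name__ == "__main__":
    # Toy losses: best at epoch 3, then three runs without improvement.
    losses = [6.5, 6.0, 5.7, 5.8, 5.9, 6.0]
    print(early_stop_epoch(losses, patience=3))   # stops at epoch 6
    # Numbers from the log above: stopped at epoch 43 with patience 30.
    print(best_epoch_from_stop(43, patience=30))  # last improvement at epoch 13
```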

jiaaoc commented on May 27, 2024

I think I figured out the reason: if you did not download the pre-trained model into the folder, the model is not initialized from pre-trained BART; instead, its weights are randomly initialized.

Please download the pre-trained BART model from the fairseq BART page (https://github.com/pytorch/fairseq/tree/master/examples/bart).

jiaaoc commented on May 27, 2024

As shown in your log:

2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt
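This warning is easy to miss in a long training log. The minimal sketch below scans a saved log and reports every checkpoint path the trainer said it could not find, so a run that silently started from random weights can be caught early. The helper name missing_checkpoints is hypothetical. The sketch only matches the exact message shown above, and its capitalization may differ between fairseq versions, which is why it matches case-insensitively.

```python
import re
from typing import Iterable, List

# Message as it appears in the log above; matched case-insensitively because
# the capitalization may vary between fairseq versions (an assumption).
_MISSING_CKPT = re.compile(r"no existing checkpoint found (\S+)", re.IGNORECASE)


def missing_checkpoints(log_lines: Iterable[str]) -> List[str]:
    """Return every checkpoint path the trainer reported as missing."""
    missing = []
    for line in log_lines:
        match = _MISSING_CKPT.search(line)
        if match:
            missing.append(match.group(1))
    return missing


if __name__ == "__main__":
    log = [
        "2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt",
    ]
    for path in missing_checkpoints(log):
        print(f"WARNING: {path} was not loaded; the model did not start from pre-trained weights")
```

Any iterable of lines works, so an open log file can be passed directly, e.g. missing_checkpoints(open("train.log")), where train.log is a placeholder name.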

jiaaoc commented on May 27, 2024

That's probably why your results are so low.

negrinho commented on May 27, 2024

Good catch. You are right. I'm training the single-view model, and the results seem to match those reported in the paper. Thanks for the help.

Test on val set: 
100% 817/817 [03:08<00:00,  4.33it/s]
Val {'rouge-1': {'f': 0.47053820487934117, 'p': 0.481068078158503, 'r': 0.5012747517270539}, 'rouge-2': {'f': 0.23280899121248622, 'p': 0.23762821988807867, 'r': 0.2502566730166665}, 'rouge-l': {'f': 0.45843104678080715, 'p': 0.4705976576858032, 'r': 0.47959589277465375}}
2020-11-06 21:23:24 | INFO | fairseq.checkpoint_utils | saved checkpoint checkpoints_stage/checkpoint_best.pt (epoch 1 @ 93 updates, score 4.057) (writing took 234.77840801099956 seconds)
Test on testing set: 
100% 818/818 [03:15<00:00,  4.19it/s]
Test {'rouge-1': {'f': 0.46512253633774703, 'p': 0.4772971625389979, 'r': 0.4974225331330478}, 'rouge-2': {'f': 0.2247942720566339, 'p': 0.23095709935043798, 'r': 0.24239651268780865}, 'rouge-l': {'f': 0.452616026351333, 'p': 0.46413084533332033, 'r': 0.47522827237678494}}
epoch 002:  73% 68/93 [14:01<05:11, 12.46s/it, loss=4.098, nll_loss=2.26, ppl=4.791, wps=191.4, ups=0.05, wpb=4184.4, bsz=160.2, num_updates=161, lr=2.415e-05, gnorm=2.919, clip=100, oom=0, train_wall=833, wall=2710]
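To quantify the improvement, the printed score dictionaries can be parsed back into Python and their F-scores ('f') compared. Below is a minimal sketch, assuming each result line has the form shown in these logs: a Val or Test label followed by a Python dict literal. The helper name rouge_f is hypothetical. The two lines in the example are copied from this thread. The first is the last Test line of the longer log above (epoch 43). The second is the Test line after epoch 1 of this run.

```python
import ast
from typing import Dict


def rouge_f(result_line: str) -> Dict[str, float]:
    """Parse a line like "Test {'rouge-1': {'f': ..., 'p': ..., 'r': ...}, ...}"
    and return {metric_name: F-score}."""
    # Everything from the first "{" onward is a Python dict literal.
    scores = ast.literal_eval(result_line[result_line.index("{"):])
    return {metric: values["f"] for metric, values in scores.items()}


if __name__ == "__main__":
    # Copied from the logs above.
    earlier = ("Test {'rouge-1': {'f': 0.3649928682374293, 'p': 0.40100515797443215, 'r': 0.37178177746836855}, "
               "'rouge-2': {'f': 0.14818549602209022, 'p': 0.16269601833548097, 'r': 0.15371445974590012}, "
               "'rouge-l': {'f': 0.3619876909686621, 'p': 0.40180170267059645, 'r': 0.35681259215856076}}")
    pretrained = ("Test {'rouge-1': {'f': 0.46512253633774703, 'p': 0.4772971625389979, 'r': 0.4974225331330478}, "
                  "'rouge-2': {'f': 0.2247942720566339, 'p': 0.23095709935043798, 'r': 0.24239651268780865}, "
                  "'rouge-l': {'f': 0.452616026351333, 'p': 0.46413084533332033, 'r': 0.47522827237678494}}")
    before, after = rouge_f(earlier), rouge_f(pretrained)
    for metric in before:
        print(f"{metric}: {before[metric]:.4f} -> {after[metric]:.4f}")
```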

chostyouwang commented on May 27, 2024

as shown in your log:

2020-11-06 17:46:38 | INFO | fairseq.trainer | no existing checkpoint found ./bart.large/model.pt

I have the same problem, but I have already downloaded the 'model.pt' file to /content/drive/MyDrive/Multi-View-Seq2Seq/train_sh/bart.large/model.pt. Why can't it be found?
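One possible cause is offered here as an assumption, not a confirmed diagnosis. The path in the log, ./bart.large/model.pt, is relative, so Python resolves it against the process's current working directory rather than against the folder containing the training script. If training is launched from a different directory (for example the repository root instead of train_sh), the file is not found even though it exists. The minimal sketch below shows where a relative path actually points and whether the file is there. The helper resolve_checkpoint is hypothetical, and the demo recreates a similar folder layout in a temporary directory.

```python
import os
import tempfile
from pathlib import Path


def resolve_checkpoint(relative_path: str) -> Path:
    """Show where a relative checkpoint path points from the current working directory."""
    resolved = Path(relative_path).resolve()
    status = "found" if resolved.is_file() else "MISSING"
    print(f"cwd={Path.cwd()}  ->  {resolved} [{status}]")
    return resolved


if __name__ == "__main__":
    # Recreate a layout like repo/train_sh/bart.large/model.pt in a temp dir.
    original_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as repo:
        ckpt = Path(repo, "train_sh", "bart.large", "model.pt")
        ckpt.parent.mkdir(parents=True)
        ckpt.touch()
        try:
            os.chdir(repo)  # launched from the repository root
            resolve_checkpoint("./bart.large/model.pt")  # reports MISSING
            os.chdir(Path(repo, "train_sh"))  # launched from train_sh
            resolve_checkpoint("./bart.large/model.pt")  # reports found
        finally:
            # Restore the working directory before the temp dir is deleted.
            os.chdir(original_cwd)
```

Passing an absolute path to the checkpoint, or changing into the script's folder before launching training, avoids depending on the working directory.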
