xlnet's People

Contributors

cclauss, charliebickerton, graykode, kimiyoung, manrajgrover, nirantk, shujian2015, ymcui, zihangdai

xlnet's Issues

Why not use a partial factorization?

Hi, thanks for the great work.
I have a question about the pre-training process.
Assume a length-T sequence with a permutation z. Since we use partial prediction (only predicting the tokens after z_c), why don't we also use a partial factorization? That is, use an all-zero attention mask for the first c tokens so that the context tokens can all see each other.

Thank you.
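
To make the question concrete, here is a minimal sketch of the two masks being compared. It is an illustration only, not the repository's code: the helper names are hypothetical, positions are 0-based, and it assumes the convention that mask[i][j] = 1 means position i may not attend to position j.

```python
def permutation_mask(perm):
    """mask[i][j] = 1 if position i may NOT attend to position j.

    Position i sees position j only if j comes strictly earlier than i
    in the factorization order `perm`.
    """
    n = len(perm)
    rank = {pos: t for t, pos in enumerate(perm)}  # position -> index in the order
    return [[0 if rank[j] < rank[i] else 1 for j in range(n)] for i in range(n)]


def partial_factorization_mask(perm, c):
    """Same as above, but the first c positions in `perm` all see each other."""
    mask = permutation_mask(perm)
    context = perm[:c]
    for i in context:
        for j in context:
            mask[i][j] = 0  # context tokens attend to each other freely
    return mask


perm = [2, 0, 3, 1]  # factorization order over positions 0..3
full = permutation_mask(perm)
partial = partial_factorization_mask(perm, c=2)
```

The rows for the predicted positions (the last T - c entries of the order) are identical in both masks; only the rows of the c context positions change.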

Ranking formulation for the QNLI GLUE task on the v2 version

Hey,
I was wondering: in the paper you mention a ranking formulation for the QNLI GLUE task (similar to MT-DNN). How did you achieve that with the QNLI v2 data splits?
Even the MT-DNN source code is unable to do that on the new split.

The model does not distinguish all possible factorization orders

Good paper!
I am confused about the factorization order. It looks like the model does not distinguish all possible factorization orders. Following Figure 1, when predicting x3 the model cannot distinguish permutation Pa (2-4-3) from Pb (4-2-3) the way a traditional AR model would. So does the model not actually model all possible factorization orders?
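
The observation can be checked with a few lines of Python. This is only a sketch with a hypothetical helper: it shows that the set of tokens visible when predicting x3 is identical under both orders, so a model that conditions only on that set cannot tell the two apart at this step.

```python
def context_for(perm, target):
    """Return the set of positions that precede `target` in the order `perm`."""
    return frozenset(perm[:perm.index(target)])


pa = [2, 4, 3]
pb = [4, 2, 3]
same_context = context_for(pa, 3) == context_for(pb, 3)
print(same_context)
```

The two orders still differ at the earlier steps: under Pa, x4 is predicted with x2 visible, while under Pb, x2 is predicted with x4 visible.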

What should I do?

AttributeError: 'MirroredStrategy' object has no attribute 'num_replicas_in_sync'
Thanks!

Comparison with BERT using the same dataset

I notice that XLNet uses Giga5 (16GB text), ClueWeb 2012-B (19GB), and Common Crawl (78GB) for pretraining. For a possibly better comparison with BERT, I'm curious what the performance would be if XLNet used only BooksCorpus and English Wikipedia.

Throws an exception when used with a Colab TPU [with solution]

Hello,
run_classifier.py raises an int() error when evaluating the pretrained model after finetuning:
invalid literal for int() with base 10: 'gs://my_bert_2/xlnet/models/TASK/xlnet_model.ckpt'.

Solution:
Add a try/except block at line 776 of run_classifier.py:

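    try:
        # Checkpoint names normally end in "-<global_step>". A name without
        # that suffix (such as xlnet_model.ckpt) makes int() raise ValueError.
        global_step = int(cur_filename.split("-")[-1])
        tf.logging.info("Add {} to eval list.".format(cur_filename))
        steps_and_files.append([global_step, cur_filename])
    except ValueError:
        tf.logging.info("{} skipped".format(cur_filename))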

Best,
Aditya Malte
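
As an alternative to catching the exception, the step number can be parsed explicitly. The following is a sketch, not the repository's code; parse_global_step is a hypothetical helper that returns None for checkpoint paths without a trailing "-<digits>" suffix.

```python
import re

# Matches a trailing "-<digits>" such as the "-1200" in "model.ckpt-1200".
_STEP_SUFFIX = re.compile(r"-(\d+)$")


def parse_global_step(ckpt_path):
    """Return the trailing step number of a checkpoint path, or None if absent."""
    match = _STEP_SUFFIX.search(ckpt_path)
    return int(match.group(1)) if match else None
```

A caller can then skip any path for which the function returns None instead of relying on an exception for control flow.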

Provide a utility to extract features from texts?

Congratulations on the wonderful work!

I can't wait to try the model. Do you plan to provide a utility to extract vectors for some sample texts, like extract_features.py in the BERT repo? Or are there other ways to easily run the model on raw text?

Stopping condition for pretraining?

Congratulations to the authors on this excellent work.

I'd like to pretrain my own XLNet models. Then, I plan to do some NAS / design space exploration on models similar to XLNet.

My question is: What results on the pretraining set (Wikipedia + BooksCorpus + ...) did you get before you stopped training?
I'm being deliberately vague because I don't know how you measured success on the pretraining set (training loss, test loss, perplexity, bits per character ... or what).

Thanks in advance for the help!
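
For reference, the candidate metrics listed above are simple transformations of one another when the loss is an average negative log-likelihood measured in nats. The sketch below assumes exactly that; the function names are hypothetical, and chars_per_token is a corpus statistic the caller would have to measure.

```python
import math


def perplexity(nll_nats):
    """Perplexity from an average negative log-likelihood in nats per token."""
    return math.exp(nll_nats)


def bits_per_token(nll_nats):
    """Convert nats per token to bits per token."""
    return nll_nats / math.log(2)


def bits_per_char(nll_nats, chars_per_token):
    """Bits per character, given the average number of characters per token."""
    return bits_per_token(nll_nats) / chars_per_token
```

These conversions only make numbers comparable between runs that share the same tokenization and the same prediction targets.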

Default parameters for base model

Hi @kimiyoung and @zihangdai (and all others from the xlnet team),

thanks for sharing the implementation and pre-trained model(s) ❤️

I have some questions regarding pre-training XLNet:

  • Could you provide the default parameters for the train.py and train_gpu.py scripts when training a base XLNet model? The current README only shows parameters for a large model.
  • Do you think a single TPU v3 is sufficient to pre-train a base model?

Thanks so much,

Stefan

Can't load the model from GCS directly

When I wanted to run the model on TPU, I replaced ${LARGE_DIR} with "gs://...", but this resulted in an IOError.

Traceback (most recent call last):
  File "run_classifier.py", line 903, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_classifier.py", line 722, in main
    sp.Load(FLAGS.spiece_model_file)
  File "/usr/local/lib/python2.7/dist-packages/sentencepiece.py", line 118, in Load
    return _sentencepiece.SentencePieceProcessor_Load(self, filename)
IOError: Not found: "gs://ykproject/pre-trained/xlnet_cased_L-24_H-1024_A-16/spiece.model": No such file or directory Error #2

Does this mean sp.Load() doesn't support loading files from GCS, and that I should change the code? Or is there something else I should do?

Has the data been split into segments for pretraining

The paper says

During the pretraining phase, following BERT, we randomly sample two segments (either from the same context or not) and treat the concatenation of two segments as one sequence to perform permutation language modeling.

I don't really get this. If there is no next sentence prediction, what is the point of concatenating segments that do not belong to the same context? Won't that degrade the performance of the model? What is the objective behind using two segments (both from the same context and not)?
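
For readers unfamiliar with the procedure in the quoted passage, here is a rough sketch of BERT-style two-segment sampling. It is an assumption-laden illustration, not the XLNet data pipeline: the function name is hypothetical, the 0.5 probability is assumed rather than taken from the paper, and it expects at least two documents with at least two segments each.

```python
import random


def sample_segment_pair(docs, rng, same_context_prob=0.5):
    """Sample (seg_a, seg_b, same_context) from a list of documents.

    Each document is a list of segments. With probability `same_context_prob`
    seg_b is the segment that follows seg_a; otherwise it is drawn from a
    different document.
    """
    doc_idx = rng.randrange(len(docs))
    doc = docs[doc_idx]
    a_idx = rng.randrange(len(doc) - 1)  # leave room for a following segment
    seg_a = doc[a_idx]
    if rng.random() < same_context_prob:
        return seg_a, doc[a_idx + 1], True
    other_idx = rng.choice([i for i in range(len(docs)) if i != doc_idx])
    return seg_a, rng.choice(docs[other_idx]), False
```

The returned flag is what a next-sentence-prediction loss would consume; without such a loss, it only records how the pair was built.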

Out of memory with TPU v3-8 when running tpu_squad_large.sh

Hello, thank you for the interesting paper and for releasing your code alongside the paper!

I am trying to train XLNet on SQuAD, but I am getting OOM errors when running scripts/tpu_squad_large.sh. This strikes me as odd, because you say in the README that you can run this script without issues. I have not modified the parameters of the script, except for specifying the necessary data/model directories.

For context, my setup is as follows. I spun up a TPU v3-8 using ctpu up in the us-central1-a region. I preprocessed the data as directed, using scripts/prepro_squad.sh, and moved to a Google Storage bucket in the same region as the TPU. I have model checkpoint folders both locally (for sentencepiece) and in the cloud (for loading the model).

I have worked with TPUs before, but only TPU-v2 (not v3); is there something I am doing incorrectly?

When I run scripts/tpu_squad_large.sh, loading and initialization work fine, but the script breaks with what I believe is a memory issue:

# ... normal tensorflow logs ...

I0621 17:53:36.702727 140612788254144 tpu_estimator.py:536] Enqueue next (1000) batch(es) of data to infeed.
I0621 17:53:36.703403 140612788254144 tpu_estimator.py:540] Dequeue next (1000) batch(es) of data from outfeed.
I0621 17:56:15.833373 140611248187136 error_handling.py:70] Error recorded from outfeed: Bad hardware status: 0x1

# ... stack trace ...

Status code: Resource exhausted [9x]
  Compilation failure: Ran out of memory in memory space hbm. Used 20.90G of 16.00G hbm. Exceeded hbm capacity by 4.90G.

  Total hbm usage >= 20.90G:
      reserved        528.00M
      program          20.38G
      arguments       unknown size

  Output size unknown.

Is there something I am doing incorrectly?

Also, have others managed to run scripts/tpu_squad_large.sh successfully (with batch size 48, etc.)?

Pre-training: checkpoint files are not written

Hi,

I was able to train a smaller model from scratch with a v3-8 TPU. However, after the final 100,000 training steps, no checkpoint files were written.

I specified gs://model_dir as the model_dir parameter, but only the following files are located under this directory:

[image: screenshot of the directory listing]

Last log of the training script:

I0625 05:33:03.994360 139956438738368 basic_session_run_hooks.py:247] loss = 2.5767722, step = 100000 (266.367 sec)
I0625 05:33:03.995930 139956438738368 tpu_estimator.py:1874] global_step/sec: 3.75421
I0625 05:33:03.996390 139956438738368 tpu_estimator.py:1875] examples/sec: 60.0674
I0625 05:33:04.449701 139956438738368 tpu_estimator.py:545] Stop infeed thread controller
I0625 05:33:04.450102 139956438738368 tpu_estimator.py:392] Shutting down InfeedController thread.
I0625 05:33:04.450336 139955057714944 tpu_estimator.py:387] InfeedController received shutdown signal, stopping.
I0625 05:33:04.450455 139955057714944 tpu_estimator.py:479] Infeed thread finished, shutting down.
I0625 05:33:04.450696 139956438738368 error_handling.py:93] infeed marked as finished
I0625 05:33:04.450809 139956438738368 tpu_estimator.py:549] Stop output thread controller
I0625 05:33:04.450900 139956438738368 tpu_estimator.py:392] Shutting down OutfeedController thread.
I0625 05:33:04.451042 139955049322240 tpu_estimator.py:387] OutfeedController received shutdown signal, stopping.
I0625 05:33:04.451132 139955049322240 tpu_estimator.py:488] Outfeed thread finished, shutting down.
I0625 05:33:04.451303 139956438738368 error_handling.py:93] outfeed marked as finished
I0625 05:33:04.451407 139956438738368 tpu_estimator.py:553] Shutdown TPU system.
I0625 05:33:07.445307 139956438738368 estimator.py:359] Loss for final step: 2.5767722.
I0625 05:33:07.446149 139956438738368 error_handling.py:93] training_loop marked as finished

Could you help? Thanks ❤️

A Question about "How Standard LM Parameterization Fails"

[image: screenshot from the appendix of the paper]

I am just using a screenshot from the appendix of the paper for simplicity.

My question is: based on the first condition, it seems that the second equation does not hold, for two reasons. (Assume we are predicting the 3rd element given z_{<3} under the permutations {x2, x1, x3, x4} and {x2, x1, x4, x3}, i.e. x3 and x4 respectively.)

  1. Since i = 3 ≠ j = 4, even though the inputs to h(z_{<3}) are the same, {e(x2), e(x1)}, the weights are different: {w_23, w_13} to predict x3, and {w_24, w_14} to predict x4.
    For all layers, the weights also differ for different prediction targets, even with the same input neurons.
    Therefore, h(x_{z<t}) is not equal for the two permutations, even with the same prefix.

  2. Since i ≠ j:
    2.1. If X_i ≠ X_j, then e(X_i) ≠ e(X_j), and the first term in exp() is not equal when predicting z_3.
    2.2. If X_i = X_j (i.e. x3 and x4 are the same word), then e(X_i) = e(X_j), and the first term in exp() is equal when predicting z_3.

So the equality of e(x) h(x_{z<t}) may not hold for the two permutations with the same prefix.

This is my confusion. Please correct me if I'm wrong.

Thx a lot.
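
For readers without the screenshot, the argument being questioned can be written out as follows. This is reconstructed from memory of the paper's appendix, so the notation may differ slightly from the original. Under the standard softmax parameterization, the next-token distribution is

```latex
p_\theta(X_{z_t} = x \mid \mathbf{x}_{\mathbf{z}_{<t}})
  = \frac{\exp\!\left(e(x)^\top h(\mathbf{x}_{\mathbf{z}_{<t}})\right)}
         {\sum_{x'} \exp\!\left(e(x')^\top h(\mathbf{x}_{\mathbf{z}_{<t}})\right)}
```

and for two permutations $\mathbf{z}^{(1)}$, $\mathbf{z}^{(2)}$ with the same prefix $\mathbf{z}^{(1)}_{<t} = \mathbf{z}^{(2)}_{<t} = \mathbf{z}_{<t}$ but different targets $z^{(1)}_t = i \neq j = z^{(2)}_t$, the claim is

```latex
p_\theta(X_i = x \mid \mathbf{x}_{\mathbf{z}_{<t}})
  = p_\theta(X_j = x \mid \mathbf{x}_{\mathbf{z}_{<t}})
```

In this notation e(x) is the embedding of the candidate word x being scored, not of the word at a particular position, and h receives only the prefix as input, with no indication of which position is being predicted.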

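A question about the XLNet pre-training process

For a piece of text, select K of its words, mask out only one of them at a time to generate K training examples, and then maximize the log-probability of the correct word for each of the K examples.

Could this also achieve the same effect as XLNet?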
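
The scheme proposed in this question can be sketched as follows. The helper name and the "<mask>" token are hypothetical, and the code only builds the K inputs; it does not train anything.

```python
import random


def one_mask_examples(tokens, k, rng, mask_token="<mask>"):
    """Pick k positions and emit one example per position with only that token masked.

    Each example is (masked_tokens, position, target_token).
    """
    positions = sorted(rng.sample(range(len(tokens)), k))
    examples = []
    for pos in positions:
        masked = list(tokens)
        masked[pos] = mask_token
        examples.append((masked, pos, tokens[pos]))
    return examples
```

In this scheme each target is predicted from a separate input in which every other token, including the other K - 1 selected words, is visible, so one sequence costs K forward passes.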

I have questions about creating a pre-training model.

Hi, we're working on a pre-training model. I have two questions about this process.

First, I have about 180 million sentences, and it takes too long to create the tfrecords. I need advice on how to create them.

Second, would there be any performance problem if I change the model type to another one, such as bpe, char, or word, when I create the SentencePiece model?
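
One common way to cut the wall-clock time of TFRecord creation is to split the input files into disjoint shards and run one preprocessing job per shard in parallel. The sketch below shows only the sharding step; shard_files is a hypothetical helper, and whether the preprocessing script exposes matching options should be checked against its flags.

```python
def shard_files(paths, num_tasks, task):
    """Return the subset of `paths` that worker `task` (0-based) should process.

    Files are assigned round-robin after sorting, so every worker computes
    the same assignment without any coordination.
    """
    if not 0 <= task < num_tasks:
        raise ValueError("task must be in [0, num_tasks)")
    return sorted(paths)[task::num_tasks]
```

Each worker calls the function with the same file list and its own task index, then writes its own output files.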

AttributeError: lr_layer_decay_rate

Traceback (most recent call last):
  File "train_gpu.py", line 328, in <module>
    tf.app.run()
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_gpu.py", line 324, in main
    train("/gpu:0")
  File "train_gpu.py", line 258, in train
    grads_and_vars=grads_and_vars)
  File "/home/husein/xlnet/model_utils.py", line 147, in get_train_op
    if FLAGS.lr_layer_decay_rate != 1.0:
  File "/usr/local/lib/python3.6/dist-packages/absl/flags/_flagvalues.py", line 473, in __getattr__
    raise AttributeError(name)
AttributeError: lr_layer_decay_rate

This happens with both train.py and train_gpu.py: lr_layer_decay_rate is missing from the FLAGS in both.

When I check run_classifier.py, it is defined as:

flags.DEFINE_float("lr_layer_decay_rate", 1.0,
                   "Top layer: lr[L] = FLAGS.learning_rate."
                   "Low layer: lr[l-1] = lr[l] * lr_layer_decay_rate.")

The default value there is 1.0, but I am not sure whether the default value during pretraining is also 1.0.
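
The flag's help text describes a layer-wise learning rate schedule, which can be worked through in a few lines. This is a sketch of that formula only, with a hypothetical function name and 0-based layer indices; it is not the code in model_utils.py.

```python
def layerwise_learning_rates(learning_rate, n_layer, decay_rate):
    """Learning rates for layers 0..n_layer-1, top layer last.

    Implements lr[l-1] = lr[l] * decay_rate, with the top layer
    (index n_layer - 1) using `learning_rate` itself.
    """
    return [learning_rate * decay_rate ** (n_layer - 1 - l) for l in range(n_layer)]
```

With decay_rate = 1.0 every layer receives the same learning rate, so a value of 1.0 effectively disables the decay.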

Getting the following error when trying to run tpu_squad_large.sh

W0624 16:40:52.848234 140595823699392 __init__.py:44] file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/__init__.py", line 41, in autodetect
    from . import file_cache
  File "/usr/local/lib/python2.7/dist-packages/googleapiclient/discovery_cache/file_cache.py", line 41, in <module>
    'file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth')
ImportError: file_cache is unavailable when using oauth2client >= 4.0.0 or google-auth
I0624 16:40:53.032814 140595823699392 model_utils.py:32] Use TPU without distribute strategy.
W0624 16:40:53.034595 140595823699392 estimator.py:1924] Estimator's model_fn (<function model_fn at 0x7fded610ded8>) includes params argument, but params are not passed to Estimator.
I0624 16:40:53.035511 140595823699392 estimator.py:201] Using config: {'_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
, '_keep_checkpoint_max': 5, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fded610c2d0>, '_model_dir': 'gs://question-answering/experiment/squad', '_protocol': None, '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tpu_config': TPUConfig(iterations_per_loop=1000, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_tf_random_seed': None, '_save_summary_steps': 100, '_device_fn': None, '_cluster': None, '_experimental_distribute': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': None, '_evaluation_master': u'grpc://10.240.1.2:8470', '_eval_distribute': None, '_global_id_in_cluster': 0, '_master': u'grpc://10.240.1.2:8470'}
I0624 16:40:53.035886 140595823699392 tpu_context.py:202] _TPUContext: eval_on_tpu True
I0624 16:40:53.036292 140595823699392 run_squad.py:940] Input tfrecord file glob gs://question-answering/proc_data/squad/spiece.model.*.slen-512.qlen-64.train.tf_record
I0624 16:40:53.103672 140595823699392 run_squad.py:943] Find 0 input paths []
I0624 16:40:53.243366 140595823699392 tpu_system_metadata.py:59] Querying Tensorflow master (grpc://10.240.1.2:8470) for TPU system metadata.
2019-06-24 16:40:53.244997: W tensorflow/core/distributed_runtime/rpc/grpc_session.cc:354] GrpcSession::ListDevices will initialize the session with an empty graph and other defaults because the session has not yet been created.
I0624 16:40:53.250566 140595823699392 tpu_system_metadata.py:120] Found TPU system:
I0624 16:40:53.250852 140595823699392 tpu_system_metadata.py:121] *** Num TPU Cores: 8
I0624 16:40:53.251368 140595823699392 tpu_system_metadata.py:122] *** Num TPU Workers: 1
I0624 16:40:53.251487 140595823699392 tpu_system_metadata.py:124] *** Num TPU Cores Per Worker: 8
I0624 16:40:53.251578 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 13676165870058292740)
I0624 16:40:53.251995 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18431886415160989968)
I0624 16:40:53.252130 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 1709911759425913454)
I0624 16:40:53.252240 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 10844450437283158931)
I0624 16:40:53.252331 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 6304466678072412335)
I0624 16:40:53.252414 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 1347834186282897648)
I0624 16:40:53.252512 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 2010934665306124677)
I0624 16:40:53.252598 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 1558411301377583255)
I0624 16:40:53.252691 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 15582409736436553171)
I0624 16:40:53.252773 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 13427578911967334923)
I0624 16:40:53.252856 140595823699392 tpu_system_metadata.py:126] *** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 17740777277430650014)
W0624 16:40:53.257469 140595823699392 deprecation.py:323] From /usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
I0624 16:40:53.268704 140595823699392 estimator.py:1111] Calling model_fn.
W0624 16:40:53.273418 140595823699392 deprecation.py:323] From run_squad.py:1001: parallel_interleave (from tensorflow.contrib.data.python.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.parallel_interleave(...)`.
I0624 16:40:53.275295 140595823699392 error_handling.py:70] Error recorded from training_loop: Tensor conversion requested dtype string for Tensor with dtype float32: 'Tensor("arg0:0", shape=(), dtype=float32, device=/job:tpu_worker/task:0/device:CPU:0)'
I0624 16:40:53.275455 140595823699392 error_handling.py:93] training_loop marked as finished
W0624 16:40:53.275588 140595823699392 error_handling.py:127] Reraising captured error
Traceback (most recent call last):
  File "run_squad.py", line 1310, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "run_squad.py", line 1209, in main
    estimator.train(input_fn=train_input_fn, max_steps=FLAGS.train_steps)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2457, in train
    rendezvous.raise_errors()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/error_handling.py", line 128, in raise_errors
    six.reraise(typ, value, traceback)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2452, in train
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 358, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1124, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1154, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2251, in _call_model_fn
    config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow_estimator/python/estimator/estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2547, in _model_fn
    input_holders.generate_infeed_enqueue_ops_and_dequeue_fn())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1167, in generate_infeed_enqueue_ops_and_dequeue_fn
    self._invoke_input_fn_and_record_structure())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 1243, in _invoke_input_fn_and_record_structure
    self._inputs_structure_recorder, host_device, host_id))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 830, in generate_per_host_v2_enqueue_ops_fn_for_host
    inputs = _Inputs.from_input_fn(input_fn(user_context))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/tpu/python/tpu/tpu_estimator.py", line 2423, in _input_fn
    return input_fn(**kwargs)
  File "run_squad.py", line 1001, in input_fn
    cycle_length=cycle_length))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1605, in apply
    return DatasetV1Adapter(super(DatasetV1, self).apply(transformation_func))
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 1127, in apply
    dataset = transformation_func(self)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/experimental/ops/interleave_ops.py", line 88, in _apply_fn
    buffer_output_elements, prefetch_input_elements)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/readers.py", line 133, in __init__
    cycle_length, block_length)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2827, in __init__
    super(InterleaveDataset, self).__init__(input_dataset, map_func)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2798, in __init__
    map_func, self._transformation_name(), dataset=input_dataset)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2124, in __init__
    self._function.add_to_graph(ops.get_default_graph())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py", line 490, in add_to_graph
    self._create_definition_if_needed()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py", line 341, in _create_definition_if_needed
    self._create_definition_if_needed_impl()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py", line 355, in _create_definition_if_needed_impl
    whitelisted_stateful_ops=self._whitelisted_stateful_ops)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py", line 883, in func_graph_from_py_func
    outputs = func(*func_graph.inputs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/dataset_ops.py", line 2099, in tf_data_structured_function_wrapper
    ret = func(*nested_args)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/readers.py", line 247, in __init__
    filenames, compression_type, buffer_size, num_parallel_reads)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/data/ops/readers.py", line 199, in __init__
    filenames = ops.convert_to_tensor(filenames, dtype=dtypes.string)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1039, in convert_to_tensor
    return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1097, in convert_to_tensor_v2
    as_ref=False)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1175, in internal_convert_to_tensor
    ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 977, in _TensorTensorConversionFunction
    (dtype.name, t.dtype.name, str(t)))
ValueError: Tensor conversion requested dtype string for Tensor with dtype float32: 'Tensor("arg0:0", shape=(), dtype=float32, device=/job:tpu_worker/task:0/device:CPU:0)'
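
The log line "Find 0 input paths []" suggests that the tfrecord glob matched no files. In TensorFlow an empty Python list converts to a float32 tensor by default, which would be consistent with the dtype error at the end of the traceback, although this is an inference from the log rather than a confirmed diagnosis. A defensive check before building the dataset could look like this sketch (the function name is hypothetical):

```python
def require_input_paths(paths, pattern):
    """Fail early, with a readable message, if a file glob matched nothing."""
    if not paths:
        raise ValueError(
            "No input files matched {!r}; check the preprocessing output "
            "directory and the file-name pattern.".format(pattern))
    return sorted(paths)
```

Calling it right after the glob turns the opaque tensor-conversion error into a message that names the pattern that failed to match.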

XLNet stuck for Text Classification task

Hello,

I want to use the Text Classification task on our own data:

1- In BERT the data is formatted as (id, label, etc.). I understand that XLNet requires the data to be formatted in the same way, correct?

2- I tried to run XLNet. At first it worked and saved some files, but then it reached this point (the last thing printed, as the screenshot shows) and was stuck for hours without any update, without saving any files (checkpoints), and without showing any error.

I am using a Google Colab GPU.

Thanks,
[image: screenshot of the last printed output]

Error while running the pretrained model on MNLI

Used the following command to run MNLI using the pretrained model:

python run_classifier.py --do_train=False --do_eval=True --task_name=mnli_matched --data_dir=../MNLI/MNLI --output_dir=results --model_dir=model/xlnet_cased_L-24_H-1024_A-16 --uncased=False --spiece_model_file=model/xlnet_cased_L-24_H-1024_A-16/spiece.model --model_config_path=model/xlnet_cased_L-24_H-1024_A-16/xlnet_config.json --max_seq_length=128 --eval_batch_size=8 --num_hosts=1 --num_core_per_host=1 --eval_all_ckpt=False --is_regression=False

It throws the following error:
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key model/classification_mnli_matched/logit/bias not found in checkpoint
[[node save/RestoreV2 (defined at /home/demo/anaconda2/lib/python2.7/site-packages/tensorflow_estimator/python/estimator/estimator.py:1537) ]]

Am I using the correct command (TF version 1.13.1)? Thank you.

Poor performance when evaluating on the CoLA dataset

I tried to use the XLNet-large-cased ckpt to evaluate on the CoLA dataset. However, the accuracy on eval.tsv turned out to be only 0.042...

I modified run_classifier.py as follows.

class ColaProcessor(GLUEProcessor):
  def __init__(self):
    super(ColaProcessor, self).__init__()
    self.label_column = 1
    self.text_a_column = 3
    self.test_text_a_column = 1

  def get_labels(self):
    return ["0", "1"]

  def _create_examples(self, lines, set_type):
    """Creates examples for the training and dev sets."""
    examples = []
    for (i, line) in enumerate(lines):
      if i == 0 and self.contains_header and set_type != "test":
        continue
      if i == 0 and self.test_contains_header and set_type == "test":
        continue
      guid = "%s-%s" % (set_type, i)

      a_column = (self.text_a_column if set_type != "test" else
          self.test_text_a_column)

      # there are some incomplete lines in QNLI
      if len(line) <= a_column:
        tf.logging.warning('Incomplete line, ignored.')
        continue
      text_a = line[a_column]

      if set_type == "test":
        label = self.get_labels()[0]
      else:
        if len(line) <= self.label_column:
          tf.logging.warning('Incomplete line, ignored.')
          continue
        label = line[self.label_column]
      examples.append(
          InputExample(guid=guid, text_a=text_a, text_b=None, label=label))
    return examples

The command line is the following:

CUDA_VISIBLE_DEVICES=0,1,2,3 python run_classifier.py \
  --do_train=True \
  --do_eval=True \
  --do_predict=True \
  --eval_split=eval \
  --task_name=CoLA \
  --data_dir=${GLUE_DIR}/CoLA \
  --output_dir=proc_data/${TASK_NAME} \
  --model_dir=exp/${TASK_NAME} \
  --predict_dir=result/${TASK_NAME} \
  --uncased=False \
  --spiece_model_file=${LARGE_DIR}/spiece.model \
  --model_config_path=${LARGE_DIR}/xlnet_config.json \
  --init_checkpoint=${LARGE_DIR}/xlnet_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=4 \
  --learning_rate=5e-5 \
  --train_steps=1200 \
  --warmup_steps=120 \
  --save_steps=600 \
  --is_regression=False

Could anyone help me with this?

Could you check the memory usage for the XLNet-large model?

I use a single V100 GPU with 32GB of memory and run the STS-B training script given in the README. The script succeeds when I set the batch size to 4; with anything larger it stops with an OOM error. This differs from the description in the memory usage table. Could you check the numbers in that table?

P.S. I notice you set the batch size to 8 even though you use 4 GPUs.

Long Sequence in SQuAD

Case: SQuAD task, sequence length > 512

Does your script utilize cached memory/extended context in a segment, so that predictions are inferred from sequences longer than 512 tokens?

If yes, where is the code that achieves this?

If not, what do you suggest for utilizing cached memory to perform the QA task?

Thank you for such great work!
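
As background for this question: a common way to handle documents longer than the maximum sequence length in extractive QA, used for example by BERT's SQuAD script, is to split the document into overlapping windows and predict on each window independently. Below is a sketch of that idea with a hypothetical helper. It is not a statement of what run_squad.py does, and it does not carry any cached memory from one window to the next.

```python
def sliding_windows(tokens, max_len, stride):
    """Split `tokens` into overlapping windows of at most `max_len` tokens.

    Returns a list of (start_offset, window_tokens); consecutive windows
    start `stride` tokens apart, and the last window reaches the end.
    """
    if max_len <= 0 or stride <= 0:
        raise ValueError("max_len and stride must be positive")
    windows = []
    start = 0
    while True:
        windows.append((start, tokens[start:start + max_len]))
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows
```

The start offsets let predicted answer spans be mapped back to positions in the original document.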

Shuffle the examples in prediction phase.

I notice that in run_classifier.py, line 409, your code shuffles the input examples.

In the prediction phase, however, we actually need the inputs in their original order for online submission.

You may want to add a flag here.

tf.logging.info("Create new tfrecord {}.".format(output_file))
writer = tf.python_io.TFRecordWriter(output_file)

## Here
np.random.shuffle(examples)

if num_passes > 1:
    examples *= num_passes
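
The suggested change could look roughly like the sketch below, which makes shuffling opt-in. The function name and the shuffle and seed parameters are hypothetical and are not existing flags of run_classifier.py.

```python
import random


def prepare_examples(examples, shuffle, num_passes=1, seed=None):
    """Return a new list, shuffled only if `shuffle` is true, then repeated.

    Mirrors the order of operations in the snippet above: shuffle first,
    then repeat the list `num_passes` times.
    """
    examples = list(examples)  # copy so the caller's list keeps its order
    if shuffle:
        random.Random(seed).shuffle(examples)
    if num_passes > 1:
        examples = examples * num_passes
    return examples
```

Prediction would then call it with shuffle=False so that the output order matches the input file.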

Typo in README

I found a small typo in README.

--mask_alpht in the options for data_utils.py should be --mask_alpha.

Thanks.

run_squad gives wrong and inconsistent answers

I am using XLNet with the given uncased checkpoint, in predict mode only (no training). For predict mode I have added the following code to run_squad to load the provided checkpoint:

   if mode == tf.estimator.ModeKeys.PREDICT:
      if FLAGS.init_checkpoint:
        # tf.logging.info("init_checkpoint not being used in predict mode.")
        print(">> calling init_from_checkpoint")
        from model_utils import init_from_checkpoint # added by sandeep
        init_from_checkpoint(FLAGS)   # added by sandeep

After running in predict mode, the answers produced by XLNet in nbest_predictions.json are nowhere close to the question asked. Also, every time I run the same code with the same question on the same data, the answer changes.

FYI, the following log shows that the checkpoint is loaded:

>> calling init_from_checkpoint
I0625 12:28:59.124107 139827578967936 model_utils.py:71] Initialize from the ckpt xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt
W0625 12:28:59.127810 139827578967936 deprecation_wrapper.py:119] From /content/xlnet/model_utils.py:82: The name tf.train.init_from_checkpoint is deprecated. Please use tf.compat.v1.train.init_from_checkpoint instead.

I0625 12:29:00.149512 139827578967936 model_utils.py:85] **** Global Variables ****
I0625 12:29:00.149802 139827578967936 model_utils.py:91]   name = model/transformer/r_w_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.149934 139827578967936 model_utils.py:91]   name = model/transformer/r_r_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150032 139827578967936 model_utils.py:91]   name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.150125 139827578967936 model_utils.py:91]   name = model/transformer/r_s_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150226 139827578967936 model_utils.py:91]   name = model/transformer/seg_embed:0, shape = (24, 2, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150321 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150409 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150493 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150575 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150685 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.150772 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.150850 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.150926 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.151008 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.151085 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.151165 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.151241 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.151317 139827578967936 model_utils.py:91]   name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.151393 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.151474 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.151556 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.151659 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.151744 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.151826 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.151904 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.151981 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.152060 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.152152 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.152232 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.152308 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.152383 139827578967936 model_utils.py:91]   name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.152458 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.152543 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.152738 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.152846 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.152931 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.153010 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.153096 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.153177 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.153260 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.153337 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.153416 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.153492 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.153567 139827578967936 model_utils.py:91]   name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.153665 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.153754 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.153837 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.153921 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.154012 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.154094 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.154173 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.154251 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.154331 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.154409 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.154487 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.154564 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.154660 139827578967936 model_utils.py:91]   name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.154741 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.154823 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.154904 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.154985 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.155069 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.155152 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.155229 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.155306 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.155385 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.155462 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.155541 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.155623 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.155720 139827578967936 model_utils.py:91]   name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.155797 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.155880 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.155970 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.156053 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.156136 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.156218 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.156294 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.156371 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.156452 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.156529 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.156615 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.156714 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.156797 139827578967936 model_utils.py:91]   name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.156875 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.156957 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.157040 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.157121 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.157203 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.157286 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.157362 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.157437 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.157515 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.157592 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.157692 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.157770 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.157846 139827578967936 model_utils.py:91]   name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.157930 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.158014 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.158096 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.158179 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.158267 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.158349 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.158425 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.158501 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.158581 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.158678 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.158758 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.158835 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.158910 139827578967936 model_utils.py:91]   name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.158985 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.159067 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.159148 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.159230 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.159312 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.159395 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.159472 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.159549 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.159646 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.159723 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.159804 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.241137 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.241423 139827578967936 model_utils.py:91]   name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.241559 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.241753 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.241889 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.242016 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.242137 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.242250 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.242371 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.242488 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.242654 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.242802 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.242927 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.243036 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.243145 139827578967936 model_utils.py:91]   name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.243255 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.243373 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.243489 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.243620 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.243768 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.243888 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.244001 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.244110 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.244223 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.244331 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.244446 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.244552 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.244694 139827578967936 model_utils.py:91]   name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.244808 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.244924 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.245042 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.245158 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.245272 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.245412 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.245530 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.245676 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.245800 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.245911 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.246023 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.246131 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.246239 139827578967936 model_utils.py:91]   name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.246342 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.246455 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.246571 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.246725 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.246848 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.246965 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.247072 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.247182 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.247294 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.247399 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.247510 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.247651 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.247772 139827578967936 model_utils.py:91]   name = model/transformer/layer_12/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.247880 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.247998 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.248111 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.248227 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.248344 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.248459 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.248564 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.248716 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.248831 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.248938 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.249052 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.249160 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.249265 139827578967936 model_utils.py:91]   name = model/transformer/layer_13/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.249411 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.249537 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.249694 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.249818 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.249934 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.250050 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.250161 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.250265 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.250376 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.250483 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.250591 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.250747 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.250855 139827578967936 model_utils.py:91]   name = model/transformer/layer_14/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.250958 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.251072 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.251189 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.251302 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.251416 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.251530 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.251698 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.251814 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.251927 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.252049 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.252192 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.252303 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.252407 139827578967936 model_utils.py:91]   name = model/transformer/layer_15/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.252516 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.252660 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.252786 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.252919 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.253034 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.253150 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.253260 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.253366 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.253477 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.253585 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.253736 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.253846 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.253953 139827578967936 model_utils.py:91]   name = model/transformer/layer_16/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.254056 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.254172 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.254290 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.254402 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.254515 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.254661 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.254776 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.254885 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.254998 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.255104 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.255215 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.255322 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.255427 139827578967936 model_utils.py:91]   name = model/transformer/layer_17/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.255531 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.255674 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.255794 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.255907 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.256023 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.256139 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.256246 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.256351 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.256463 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.256568 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.256716 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.256827 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.256930 139827578967936 model_utils.py:91]   name = model/transformer/layer_18/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.257036 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.257151 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.257261 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.257374 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.257488 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.257611 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.257746 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.257856 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.257965 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.258070 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.258180 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.258285 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.258390 139827578967936 model_utils.py:91]   name = model/transformer/layer_19/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.258496 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.258620 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.258768 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.258884 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.258999 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.259113 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.259223 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.259328 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.259440 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.259546 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.259688 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.259802 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.259908 139827578967936 model_utils.py:91]   name = model/transformer/layer_20/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.260013 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.260128 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.260245 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.260358 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.260472 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.260590 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.260744 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.260855 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.260971 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.261080 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.261193 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.261302 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.261406 139827578967936 model_utils.py:91]   name = model/transformer/layer_21/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.261512 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.261659 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.261783 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.261898 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.262012 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.262125 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.262235 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.262340 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.262448 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.262553 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.262698 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.262808 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.262915 139827578967936 model_utils.py:91]   name = model/transformer/layer_22/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.263020 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.263138 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.263252 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.263367 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.263481 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
I0625 12:29:00.263594 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.263743 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.263850 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
I0625 12:29:00.263962 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
I0625 12:29:00.264071 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
I0625 12:29:00.264182 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.264290 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.264398 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
I0625 12:29:00.264501 139827578967936 model_utils.py:91]   name = start_logits/dense/kernel:0, shape = (1024, 1)
I0625 12:29:00.264626 139827578967936 model_utils.py:91]   name = start_logits/dense/bias:0, shape = (1,)
I0625 12:29:00.264764 139827578967936 model_utils.py:91]   name = end_logits/dense_0/kernel:0, shape = (2048, 1024)
I0625 12:29:00.264877 139827578967936 model_utils.py:91]   name = end_logits/dense_0/bias:0, shape = (1024,)
I0625 12:29:00.264984 139827578967936 model_utils.py:91]   name = end_logits/LayerNorm/beta:0, shape = (1024,)
I0625 12:29:00.265092 139827578967936 model_utils.py:91]   name = end_logits/LayerNorm/gamma:0, shape = (1024,)
I0625 12:29:00.265203 139827578967936 model_utils.py:91]   name = end_logits/dense_1/kernel:0, shape = (1024, 1)
I0625 12:29:00.265314 139827578967936 model_utils.py:91]   name = end_logits/dense_1/bias:0, shape = (1,)
I0625 12:29:00.265426 139827578967936 model_utils.py:91]   name = answer_class/dense_0/kernel:0, shape = (2048, 1024)
I0625 12:29:00.265540 139827578967936 model_utils.py:91]   name = answer_class/dense_0/bias:0, shape = (1024,)
I0625 12:29:00.265679 139827578967936 model_utils.py:91]   name = answer_class/dense_1/kernel:0, shape = (1024, 1)
I0625 12:29:00.266186 139827578967936 estimator.py:1147] Done calling model_fn.
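
A log like the one above is easier to check with a small script than by eye. The following is a minimal sketch, assuming the lines keep the `name = ..., shape = (...)` layout shown here and that a missing `*INIT_FROM_CKPT*` marker means the variable was not restored from the checkpoint. The helper names `parse_variable_log` and `summarize` are hypothetical and are not part of the XLNet code base.

```python
import re
from math import prod

# Matches lines such as:
#   "... name = model/.../kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*"
VAR_LINE = re.compile(
    r"name = (?P<name>\S+), shape = \((?P<shape>[^)]*)\)"
    r"(?P<ckpt>, \*INIT_FROM_CKPT\*)?"
)


def parse_variable_log(lines):
    """Return (name, shape, from_ckpt) for every variable line found."""
    records = []
    for line in lines:
        match = VAR_LINE.search(line)
        if match is None:
            continue  # skip lines that do not describe a variable
        dims = tuple(
            int(d) for d in match.group("shape").split(",") if d.strip()
        )
        records.append(
            (match.group("name"), dims, match.group("ckpt") is not None)
        )
    return records


def summarize(records):
    """Count restored parameters and list variables without the marker."""
    restored = sum(prod(shape) for _, shape, from_ckpt in records if from_ckpt)
    fresh = [(name, shape) for name, shape, from_ckpt in records if not from_ckpt]
    return restored, fresh


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = [
        "I0625 12:29:00.263962 139827578967936 model_utils.py:91]   name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*",
        "I0625 12:29:00.264501 139827578967936 model_utils.py:91]   name = start_logits/dense/kernel:0, shape = (1024, 1)",
        "I0625 12:29:00.266186 139827578967936 estimator.py:1147] Done calling model_fn.",
    ]
    restored, fresh = summarize(parse_variable_log(sample))
    print("restored parameters:", restored)
    print("not marked as restored:", fresh)
```

Run over the full log, the second list should contain only the task-specific head variables (`start_logits`, `end_logits`, `answer_class`), which is what one would expect when fine-tuning from a pretrained checkpoint.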

Training stalls when fine-tuning XLNet-Large on the STS-B task with 4 GPUs.

My training is stuck at training step 0. My STS-B fine-tuning command is as follows:

GLUE_DIR=glue_data
LARGE_DIR=xlnet_cased_L-24_H-1024_A-16

python run_classifier.py \
  --do_train=True \
  --do_eval=False \
  --task_name=sts-b \
  --data_dir=${GLUE_DIR}/STS-B \
  --output_dir=proc_data/sts-b \
  --model_dir=exp/sts-b \
  --uncased=False \
  --spiece_model_file=${LARGE_DIR}/spiece.model \
  --model_config_path=${LARGE_DIR}/xlnet_config.json \
  --init_checkpoint=${LARGE_DIR}/xlnet_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=8 \
  --num_hosts=1 \
  --num_core_per_host=4 \
  --learning_rate=5e-5 \
  --train_steps=1200 \
  --warmup_steps=120 \
  --save_steps=600 \
  --is_regression=True
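
For orientation, the schedule implied by these flags can be worked out with a few lines of arithmetic. This is only a sketch: the flag values are copied from the command above and the sample count (5749) from the log below, while the meaning of `train_batch_size` (examples per step across all GPUs versus per GPU) is an assumption that the command itself does not settle, so the number of examples per step is left as a parameter. The function name `epochs_seen` is hypothetical.

```python
# Values copied from the command above.
TRAIN_STEPS = 1200
WARMUP_STEPS = 120
SAVE_STEPS = 600
TRAIN_BATCH_SIZE = 8
NUM_CORE_PER_HOST = 4

# "Num of train samples" as reported in the log.
NUM_TRAIN_SAMPLES = 5749

# Fraction of all training steps spent in learning-rate warmup.
warmup_fraction = WARMUP_STEPS / TRAIN_STEPS

# Steps at which a checkpoint would be written during training.
checkpoint_steps = list(range(SAVE_STEPS, TRAIN_STEPS + 1, SAVE_STEPS))


def epochs_seen(examples_per_step):
    """Passes over the training set for a given number of examples per step."""
    return TRAIN_STEPS * examples_per_step / NUM_TRAIN_SAMPLES


if __name__ == "__main__":
    print("warmup fraction:", warmup_fraction)
    print("checkpoint steps:", checkpoint_steps)
    # Assumption A: train_batch_size is the total number of examples per step.
    print("epochs if batch is global:", round(epochs_seen(TRAIN_BATCH_SIZE), 2))
    # Assumption B: each of the 4 GPUs processes train_batch_size examples.
    print(
        "epochs if batch is per GPU:",
        round(epochs_seen(TRAIN_BATCH_SIZE * NUM_CORE_PER_HOST), 2),
    )
```

Because the first checkpoint is only due at step 600, a run that never leaves step 0 will not produce any checkpoint in `exp/sts-b`, so an empty model directory does not by itself distinguish a hang from slow progress.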

And here's my log:

1,2,4,5
gqxx-01-071
Sat Jun 22 18:19:32 CST 2019
2019-06-22 18:19:35.814846: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-22 18:19:35.821830: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2400050000 Hz
2019-06-22 18:19:35.822174: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x4752650 executing computations on platform Host. Devices:
2019-06-22 18:19:35.822229: I tensorflow/compiler/xla/service/service.cc:158]   StreamExecutor device (0): <undefined>, <undefined>
INFO:tensorflow:Device is available but not used by distribute strategy: /device:CPU:0
INFO:tensorflow:Device is available but not used by distribute strategy: /device:XLA_CPU:0
WARNING:tensorflow:Not all devices in `tf.distribute.Strategy` are visible to TensorFlow.
INFO:tensorflow:Use MirroredStrategy with 4 devices.
INFO:tensorflow:Initializing RunConfig with distribution strategies.
INFO:tensorflow:Not using Distribute Coordinator.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_keep_checkpoint_max': 0, '_task_type': 'worker', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f3143b1ded0>, '_keep_checkpoint_every_n_hours': 10000, '_service': None, '_num_ps_replicas': 0, '_tpu_config': TPUConfig(iterations_per_loop=600, num_shards=4, num_cores_per_replica=None, per_host_input_for_training=3, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_tf_random_seed': None, '_device_fn': None, '_cluster': None, '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_evaluation_master': '', '_eval_distribute': None, '_train_distribute': <tensorflow.contrib.distribute.python.mirrored_strategy.MirroredStrategy object at 0x7f314584e8d0>, '_distribute_coordinator_mode': None, '_session_config': allow_soft_placement: true
, '_global_id_in_cluster': 0, '_is_chief': True, '_protocol': None, '_save_checkpoints_steps': 600, '_experimental_distribute': None, '_save_summary_steps': 100, '_model_dir': 'exp/sts-b', '_master': ''}
WARNING:tensorflow:Estimator's model_fn (<function model_fn at 0x7f3143a579b0>) includes params argument, but params are not passed to Estimator.
INFO:tensorflow:Use tfrecord file proc_data/sts-b/spiece.model.len-128.train.tf_record
INFO:tensorflow:Num of train samples: 5749
INFO:tensorflow:Do not overwrite tfrecord proc_data/sts-b/spiece.model.len-128.train.tf_record exists.
INFO:tensorflow:Input tfrecord file proc_data/sts-b/spiece.model.len-128.train.tf_record
WARNING:tensorflow:From run_classifier.py:535: map_and_batch (from tensorflow.contrib.data.python.ops.batching) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.experimental.map_and_batch(...)`.
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/anaconda3/envs/xlnet/lib/python2.7/site-packages/tensorflow/python/data/ops/dataset_ops.py:1419: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
OMP: Info #204: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #202: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {5,6,7,33,34}
OMP: Info #156: KMP_AFFINITY: 5 available OS procs
OMP: Info #158: KMP_AFFINITY: Nonuniform topology
OMP: Info #179: KMP_AFFINITY: 1 packages x 3 cores/pkg x 2 threads/core (3 total cores)
OMP: Info #206: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 5 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 33 maps to package 0 core 5 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 6 thread 0 
OMP: Info #171: KMP_AFFINITY: OS proc 34 maps to package 0 core 6 thread 1 
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 8 thread 0 
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 0 bound to OS proc set {5}
2019-06-22 18:19:37.729498: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/NLP/xlnet/modeling.py:532: dropout (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dropout instead.
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/anaconda3/envs/xlnet/lib/python2.7/site-packages/tensorflow/python/keras/layers/core.py:143: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/NLP/xlnet/modeling.py:67: dense (from tensorflow.python.layers.core) is deprecated and will be removed in a future version.
Instructions for updating:
Use keras.layers.dense instead.
INFO:tensorflow:#params: 361318401
INFO:tensorflow:Initialize from the ckpt xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
INFO:tensorflow:#params: 361318401
INFO:tensorflow:Initialize from the ckpt xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
INFO:tensorflow:#params: 361318401
INFO:tensorflow:Initialize from the ckpt xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:memory input None
INFO:tensorflow:Use float type <dtype: 'float32'>
INFO:tensorflow:#params: 361318401
INFO:tensorflow:Initialize from the ckpt xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_0/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_0/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_1/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_1/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_10/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_10/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_11/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_11/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_12/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_12/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_13/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_13/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_14/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_14/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_15/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_15/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_16/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_16/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_17/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_17/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_18/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_18/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_19/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_19/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_2/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_2/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_20/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_20/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_21/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_21/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_22/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_22/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_23/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_23/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_3/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_3/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_4/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_4/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_5/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_5/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_6/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_6/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_7/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_7/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_8/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_8/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/ff/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/ff/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/ff/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/ff/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/ff/layer_1/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/ff/layer_1/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/ff/layer_1/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/ff/layer_1/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/ff/layer_2/bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/ff/layer_2/bias
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/ff/layer_2/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/ff/layer_2/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/LayerNorm/beta:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/LayerNorm/beta
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/LayerNorm/gamma:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/LayerNorm/gamma
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/k/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/k/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/o/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/o/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/q/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/q/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/r/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/r/kernel
DEBUG:tensorflow:Initialize variable model/transformer/layer_9/rel_attn/v/kernel:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/layer_9/rel_attn/v/kernel
DEBUG:tensorflow:Initialize variable model/transformer/r_r_bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/r_r_bias
DEBUG:tensorflow:Initialize variable model/transformer/r_s_bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/r_s_bias
DEBUG:tensorflow:Initialize variable model/transformer/r_w_bias:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/r_w_bias
DEBUG:tensorflow:Initialize variable model/transformer/seg_embed:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/seg_embed
DEBUG:tensorflow:Initialize variable model/transformer/word_embedding/lookup_table:0 from checkpoint xlnet_cased_L-24_H-1024_A-16/xlnet_model.ckpt with model/transformer/word_embedding/lookup_table
INFO:tensorflow:**** Global Variables ****
INFO:tensorflow:  name = model/transformer/r_w_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_r_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_s_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/seg_embed:0, shape = (24, 2, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/sequnece_summary/summary/kernel:0, shape = (1024, 1024)
INFO:tensorflow:  name = model/sequnece_summary/summary/bias:0, shape = (1024,)
INFO:tensorflow:  name = model/regression_sts-b/logit/kernel:0, shape = (1024, 1)
INFO:tensorflow:  name = model/regression_sts-b/logit/bias:0, shape = (1,)
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/anaconda3/envs/xlnet/lib/python2.7/site-packages/tensorflow/python/training/learning_rate_decay_v2.py:321: div (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Deprecated in favor of operator or tf.math.divide.
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/anaconda3/envs/xlnet/lib/python2.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
INFO:tensorflow:**** Global Variables ****
INFO:tensorflow:  name = model/transformer/r_w_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_r_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_s_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/seg_embed:0, shape = (24, 2, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/sequnece_summary/summary/kernel:0, shape = (1024, 1024)
INFO:tensorflow:  name = model/sequnece_summary/summary/bias:0, shape = (1024,)
INFO:tensorflow:  name = model/regression_sts-b/logit/kernel:0, shape = (1024, 1)
INFO:tensorflow:  name = model/regression_sts-b/logit/bias:0, shape = (1,)
INFO:tensorflow:**** Global Variables ****
INFO:tensorflow:  name = model/transformer/r_w_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_r_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_s_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/seg_embed:0, shape = (24, 2, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/sequnece_summary/summary/kernel:0, shape = (1024, 1024)
INFO:tensorflow:  name = model/sequnece_summary/summary/bias:0, shape = (1024,)
INFO:tensorflow:  name = model/regression_sts-b/logit/kernel:0, shape = (1024, 1)
INFO:tensorflow:  name = model/regression_sts-b/logit/bias:0, shape = (1,)
INFO:tensorflow:**** Global Variables ****
INFO:tensorflow:  name = model/transformer/r_w_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_r_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/word_embedding/lookup_table:0, shape = (32000, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/r_s_bias:0, shape = (24, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/seg_embed:0, shape = (24, 2, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_0/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_1/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_2/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_3/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_4/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_5/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_6/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_7/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_8/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_9/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_10/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_11/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_12/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_13/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_14/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_15/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_16/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_17/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_18/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_19/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_20/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_21/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_22/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/q/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/k/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/v/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/r/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/o/kernel:0, shape = (1024, 16, 64), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/rel_attn/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/kernel:0, shape = (1024, 4096), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_1/bias:0, shape = (4096,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/kernel:0, shape = (4096, 1024), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/layer_2/bias:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/beta:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/transformer/layer_23/ff/LayerNorm/gamma:0, shape = (1024,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = model/sequnece_summary/summary/kernel:0, shape = (1024, 1024)
INFO:tensorflow:  name = model/sequnece_summary/summary/bias:0, shape = (1024,)
INFO:tensorflow:  name = model/regression_sts-b/logit/kernel:0, shape = (1024, 1)
INFO:tensorflow:  name = model/regression_sts-b/logit/bias:0, shape = (1,)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
2019-06-22 18:21:58.667821: I tensorflow/core/common_runtime/process_util.cc:71] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/anaconda3/envs/xlnet/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
INFO:tensorflow:Restoring parameters from exp/sts-b/model.ckpt-0
WARNING:tensorflow:From /mnt/lustre/sjtu/home/myl01/anaconda3/envs/xlnet/lib/python2.7/site-packages/tensorflow/python/training/saver.py:1070: get_checkpoint_mtimes (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file utilities to get mtimes.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into exp/sts-b/model.ckpt.
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 1 bound to OS proc set {6}
INFO:tensorflow:Initialize strategy
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 2 bound to OS proc set {7}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 3 bound to OS proc set {33}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 4 bound to OS proc set {34}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 6 bound to OS proc set {6}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 5 bound to OS proc set {5}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 7 bound to OS proc set {7}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 8 bound to OS proc set {33}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 9 bound to OS proc set {34}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 10 bound to OS proc set {5}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 11 bound to OS proc set {6}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 12 bound to OS proc set {7}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 13 bound to OS proc set {33}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 14 bound to OS proc set {34}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 15 bound to OS proc set {5}
OMP: Info #242: KMP_AFFINITY: pid 11876 thread 16 bound to OS proc set {6}
INFO:tensorflow:loss = 12.0331, step = 0
