kamalkraj / ALBERT-TF2.0
ALBERT model Pretraining and Fine Tuning using TF 2.0
License: Apache License 2.0
How can I run pretraining on a TPU?
Please containerize this model for fast fine-tuning tasks.
I can't get past this error with run_classifier.py
AssertionError: Nothing except the root object matched a checkpointed value. Typically this means that the checkpoint does not match the Python program. The following objects have no matching checkpointed value: [MirroredVariable:{
0 /job:localhost/replica:0/task:0/device:GPU:0: <tf.Variable 'albert_model/encoder/shared_layer/self_attention/value/bias:0' shape=(1024,) dtype=float32, numpy=array([0., 0., 0., ..., 0., 0., 0.], dtype=float32)> ...
Below is how I call the script. I am only testing the workflow, so I pretrained for just 1 epoch. I made a custom task for my particular use case.
ALBERT_CONFIG=$HOME/idbd-bio-dev/top-binner-albert/data/configs/config_10mers_tf2_2.json
EVAL=$HOME/mnt/corpuses/finetune_corpus_10mers_test/fine_tune_tf_records/eval.tfrecord
TRAIN=$HOME/mnt/corpuses/finetune_corpus_10mers_test/fine_tune_tf_records/training.tfrecord
META=$HOME/mnt/corpuses/finetune_corpus_10mers_test/fine_tune_tf_records/metadata.txt
OUTPUT_DIR=$HOME/mnt/models/albert_finetune_10mer_15_len
INIT_CHKPNT=$HOME/mnt/models/albert_pretrain_10mer_tf2_15_len/ctl_step_31250.ckpt-1
VOCAB=$HOME/mnt/vocab/10mers.vocab
SPM_MODEL=$HOME/mnt/vocab/10mers.model
export PYTHONPATH=$PYTHONPATH:../../albert_tf2
cd ../../albert_tf2
python run_classifer.py \
--albert_config_file=$ALBERT_CONFIG \
--eval_data_path=$EVAL \
--input_meta_data_path=$META \
--train_data_path=$TRAIN \
--strategy_type=mirror \
--output_dir=$OUTPUT_DIR \
--vocab_file=$VOCAB \
--spm_model_file=$SPM_MODEL \
--do_train=True \
--do_eval=True \
--do_predict=False \
--max_seq_length=15 \
--optimizer=AdamW \
--task_name=GENOMIC \
--train_batch_size=32 \
--init_checkpoint=$INIT_CHKPNT
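One way to debug this kind of checkpoint/model mismatch (a sketch of my own, not from the repo) is to list what the checkpoint actually contains and compare the names and shapes against the variables named in the AssertionError:

import os
import tensorflow as tf

# Same path as $INIT_CHKPNT above.
init_chkpnt = os.path.expanduser("~/mnt/models/albert_pretrain_10mer_tf2_15_len/ctl_step_31250.ckpt-1")
for name, shape in tf.train.list_variables(init_chkpnt):
    print(name, shape)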
ALBERT-TF2.0/run_pretraining.py
Line 174 in 8d0cc21
When you ran SQuAD 2.0 training, you used the default value for "version_2_with_negative", which is False. Should you be using True instead?
In order to save the model, I have added this line after the training loop:
tf.saved_model.save(model, os.path.join(FLAGS.output_dir, "1"))
in order to get assets, saved_model.pb, and variables.
From there, I am trying to load the model and predict a single value:
import os
import tensorflow as tf
import tokenization
import classifier_data_lib

# model_dir and spm_model_file are defined elsewhere.
loaded = tf.saved_model.load(os.path.join(model_dir, "1"))
# Same tokenizer settings as used during training.
tokenizer = tokenization.FullTokenizer(vocab_file=None, spm_model_file=spm_model_file, do_lower_case=True)
text_a = "the movie was not good"
example = classifier_data_lib.InputExample(guid=0, text_a=text_a, text_b=None, label=0)
labels = [0, 1]
max_seq_length = 128
feature = classifier_data_lib.convert_single_example(ex_index=0, example=example, label_list=labels, max_seq_length=max_seq_length, tokenizer=tokenizer)
# A batch of one example for each of the model's three inputs.
test_input_word_ids = tf.convert_to_tensor([feature.input_ids], dtype=tf.int32, name='input_word_ids')
test_input_mask = tf.convert_to_tensor([feature.input_mask], dtype=tf.int32, name='input_mask')
test_input_type_ids = tf.convert_to_tensor([feature.segment_ids], dtype=tf.int32, name='input_type_ids')
logit = loaded.signatures["serving_default"](input_mask=test_input_mask, input_type_ids=test_input_type_ids, input_word_ids=test_input_word_ids)
pred = tf.argmax(logit['output'], axis=-1, output_type=tf.int32)
prob = tf.nn.softmax(logit['output'], axis=-1)
print(f'Prediction: {pred} Probabilities: {prob}')
This solution works for a single value. Thanks
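As a follow-up, the same signature accepts batches too. A sketch, where `examples` is a hypothetical list of InputExample objects:

features = [classifier_data_lib.convert_single_example(
                ex_index=i, example=ex, label_list=labels,
                max_seq_length=max_seq_length, tokenizer=tokenizer)
            for i, ex in enumerate(examples)]
# Stack per-example features along the batch axis and call the signature once.
batch_word_ids = tf.constant([f.input_ids for f in features], dtype=tf.int32)
batch_mask = tf.constant([f.input_mask for f in features], dtype=tf.int32)
batch_type_ids = tf.constant([f.segment_ids for f in features], dtype=tf.int32)
logits = loaded.signatures["serving_default"](
    input_word_ids=batch_word_ids, input_mask=batch_mask, input_type_ids=batch_type_ids)
preds = tf.argmax(logits['output'], axis=-1, output_type=tf.int32)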
Hi, I am working on the STS-B data set and I am executing the following commands in Ubuntu:
export GLUE_DIR=glue_data
export ALBERT_DIR=model_configs/large
export TASK_NAME=STS
export OUTPUT_DIR=stsb_processed
mkdir $OUTPUT_DIR
python create_finetuning_data.py \
  --input_data_dir=${GLUE_DIR}/ \
  --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
  --train_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_output_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
  --meta_data_file_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
  --fine_tuning_task_type=classification --max_seq_length=128 \
  --classification_task_name=${TASK_NAME}
I1206 14:39:44.645808 139799230306112 classifier_data_lib.py:761] Writing example 0 of 5749
Traceback (most recent call last):
  File "create_finetuning_data.py", line 149, in <module>
    app.run(main)
  File "/home/chirag/venv/lib/python3.6/site-packages/absl/app.py", line 299, in run
    _run_main(main, args)
  File "/home/chirag/venv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
    sys.exit(main(argv))
  File "create_finetuning_data.py", line 137, in main
    input_meta_data = generate_classifier_dataset()
  File "create_finetuning_data.py", line 122, in generate_classifier_dataset
    do_lower_case=FLAGS.do_lower_case)
  File "/home/chirag/git/ALBERT-TF2.0/classifier_data_lib.py", line 835, in generate_tf_record_from_data_file
    train_data_output_path)
  File "/home/chirag/git/ALBERT-TF2.0/classifier_data_lib.py", line 764, in file_based_convert_examples_to_features
    max_seq_length, tokenizer)
  File "/home/chirag/git/ALBERT-TF2.0/classifier_data_lib.py", line 732, in convert_single_example
    label_id = label_map[example.label]
KeyError: 5.0
Any idea why I am getting the KeyError? Thanks in advance.
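For context, here is a minimal sketch (my reading of the traceback, so treat it as a hypothesis) of why a float label breaks the classification path:

# STS-B labels are continuous similarity scores in [0.0, 5.0]. With
# --fine_tuning_task_type=classification, convert_single_example builds a
# discrete label_map and looks each example's label up in it:
label_list = ["0", "1"]   # hypothetical discrete labels for a classification task
label_map = {label: i for i, label in enumerate(label_list)}
label_id = label_map[5.0]  # -> KeyError: 5.0, exactly as in the traceback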
I have a 1660 Ti with 6 GB of memory, but when I check the GPU usage it is only at 2 to 4%. Can you tell me why this is happening, or is there a way I can make it use my GPU?
I am running the following command for the COLA example:
python run_classifer.py \
  --train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
  --eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
  --input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
  --albert_config_file=${ALBERT_DIR}/config.json \
  --task_name=${TASK_NAME} \
  --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
  --output_dir=${MODEL_DIR} \
  --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
  --do_train --do_eval --train_batch_size=16 --learning_rate=1e-5 --custom_training_loop
I have also created the tf2_model.h5 model file using this link: https://github.com/kamalkraj/ALBERT-TF2.0/blob/master/converter.md
But I am still getting an OOM error. Can you help with this?
Limit: 2312241152
InUse: 2299682816
MaxInUse: 2299704576
NumAllocs: 1254
MaxAllocSize: 31680256
2020-01-17 12:39:30.108357: W tensorflow/core/common_runtime/bfc_allocator.cc:424] ****************************************************************************************************
2020-01-17 12:39:30.108429: W tensorflow/core/framework/op_kernel.cc:1622] OP_REQUIRES failed at cwise_ops_common.cc:82 : Resource exhausted: OOM when allocating tensor with shape[16,128,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2020-01-17 12:39:30.108517: W tensorflow/core/common_runtime/base_collective_executor.cc:216] BaseCollectiveExecutor::StartAbort Resource exhausted: OOM when allocating tensor with shape[16,128,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node model_1/albert_model/encoder/shared_layer_10/intermediate/add}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
Traceback (most recent call last):
File "run_classifer.py", line 452, in <module>
app.run(main)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_classifer.py", line 355, in main
custom_callbacks = custom_callbacks)
File "/media/xxxx/NewVolume/ALBERT-TF2.0/model_training_utils.py", line 324, in run_customized_training_loop
train_single_step(train_iterator)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 457, in __call__
result = self._call(*args, **kwds)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/def_function.py", line 520, in _call
return self._stateless_fn(*args, **kwds)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1823, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1141, in _filtered_call
self.captured_inputs)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 1224, in _call_flat
ctx, args, cancellation_manager=cancellation_manager)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/function.py", line 511, in call
ctx=ctx)
File "/media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/eager/execute.py", line 67, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[16,128,3072] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node model_1/albert_model/encoder/shared_layer_10/intermediate/add (defined at /media/xxxx/NewVolume/kamal/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py:1751) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.
[Op:__inference_train_single_step_24488]
Function call stack:
train_single_step
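For rough context, the allocator limit above is ~2.3 GB, and activations like the reported [16, 128, 3072] tensor are kept alive for backprop in every layer, so the budget runs out quickly. The usual mitigations (my suggestion, not from the repo) are a smaller --train_batch_size, a shorter --max_seq_length, or a smaller ALBERT config:

# Back-of-the-envelope sizing of one such activation (not exact accounting):
batch, seq, intermediate = 16, 128, 3072
print(batch * seq * intermediate * 4 / 2**20, "MiB")  # 24.0 MiB per tensor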
What's the command to create fine-tuning data?
Example?
Where can I find the large//tf2_model.h5? I am trying to execute the COLA example by running the run_classifer.py file.
I am doing pre-training from scratch. It seems that training has started, since the GPUs are being used, but nothing appears on the terminal except this:
***** Number of cores used : 4
I0227 09:00:31.841020 140137372948224 run_pretraining.py:226] Training using customized training loop TF 2.0 with distrubutedstrategy.
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I0227 09:00:44.563593 140137372948224 cross_device_ops.py:427] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
... (the same Reduce/broadcast pair repeats every second or so through 09:00:57) ...
INFO:tensorflow:batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I0227 09:01:07.835676 140137372948224 cross_device_ops.py:748] batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I0227 09:01:28.672055 140137372948224 cross_device_ops.py:748] batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
2020-02-27 09:01:50.162839: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
I also tried with smaller text data, but got the same results.
@kamalkraj
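A debugging sketch that might help here (my own suggestion, not from the repo): confirm TF actually sees the GPUs, and log op placement. Note also that the first step of a MirroredStrategy custom training loop can spend several minutes tracing and compiling before the first "Train Step" line is printed.

import tensorflow as tf

print(tf.config.experimental.list_physical_devices('GPU'))  # expect one entry per GPU
tf.debugging.set_log_device_placement(True)  # must be called before any ops run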
Hi,
Thanks for the code!
I'm trying to run the code on the SQuAD 2.0 dataset. I used the Version 2 base model (not the xxlarge one). When I ran
python3 run_squad.py \
  --mode=train_and_predict \
  --input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
  --albert_config_file=${ALBERT_DIR}/config.json \
  --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
  --spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
  --train_batch_size=24 \
  --predict_batch_size=24 \
  --learning_rate=1.5e-5 \
  --num_train_epochs=3 \
  --model_dir=${OUTPUT_DIR} \
  --strategy_type=mirror \
  --version_2_with_negative \
  --max_seq_length=384
An exception occurred,
AttributeError: 'AdamWeightDecay' object has no attribute '_decayed_lr_t'
I ran the code on Ubuntu 18.04, and my TensorFlow versions are as follows:
tb-nightly 1.14.0a20190603
tensorboard 1.14.0
tensorflow 2.0.0b1
tensorflow-estimator 1.14.0
tensorflow-gpu 2.0.0b1
tf-estimator-nightly 1.14.0.dev2019060501
Is there something wrong with the versions?
The detailed error information is as follows:
W1116 11:33:43.881644 140642252482304 optimizer_v2.py:979] Gradients does not exist for variables ['albert_model/pooler_transform/kernel:0', 'albert_model/pooler_transform/bias:0'] when minimizing the loss.
I1116 11:33:43.977660 140648276031296 coordinator.py:219] Error reported to Coordinator: 'AdamWeightDecay' object has no attribute '_decayed_lr_t'
Traceback (most recent call last):
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
yield
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 189, in _call_for_each_replica
**merge_kwargs)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 476, in _distributed_apply
var, apply_grad_to_update_var, args=(grad,), group=False))
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py", line 1458, in update
return self._update(var, fn, args, kwargs, group)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py", line 766, in _update
**values.select_device_mirrored(d, kwargs)))
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 460, in apply_grad_to_update_var
grad.values, var, grad.indices)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 850, in _resource_apply_sparse_duplicate_indices
return self._resource_apply_sparse(summed_grad, handle, unique_indices)
File "/home/cjy/Albert/ALBERT/optimization.py", line 168, in _resource_apply_sparse
var.device, var.dtype.base_dtype, apply_state)
File "/home/cjy/Albert/ALBERT/optimization.py", line 148, in _get_lr
return self._decayed_lr_t[var_dtype], {}
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 542, in getattribute
raise e
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py", line 532, in getattribute
return super(OptimizerV2, self).getattribute(name)
AttributeError: 'AdamWeightDecay' object has no attribute '_decayed_lr_t'
Traceback (most recent call last):
File "run_squad.py", line 845, in
app.run(main)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "run_squad.py", line 837, in main
train_squad(strategy, input_meta_data)
File "run_squad.py", line 742, in train_squad
custom_callbacks=custom_callbacks)
File "/home/cjy/Albert/ALBERT/model_training_utils.py", line 328, in run_customized_training_loop
tf.convert_to_tensor(steps, dtype=tf.int32))
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 416, in call
self._initialize(args, kwds, add_initializers_to=initializer_map)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 359, in _initialize
*args, **kwds))
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1360, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1648, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 1541, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 716, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 309, in wrapped_fn
return weak_wrapped_fn().wrapped(*args, **kwds)
File "/home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 706, in wrapper
raise e.ag_error_metadata.to_exception(type(e))
AttributeError: in converted code:

    /home/cjy/Albert/ALBERT/model_training_utils.py:239 train_steps  *
        strategy.experimental_run_v2(_replicated_step, args=(next(iterator),))
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:708 experimental_run_v2
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:1710 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py:708 _call_for_each_replica
        fn, args, kwargs)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py:195 _call_for_each_replica
        coord.join(threads)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py:389 join
        six.reraise(*self._exc_info_to_raise)
    /usr/lib/python3/dist-packages/six.py:693 reraise
        raise value
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py:297 stop_on_exception
        yield
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py:189 _call_for_each_replica
        **merge_kwargs)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:476 _distributed_apply
        var, apply_grad_to_update_var, args=(grad,), group=False))
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/distribute_lib.py:1458 update
        return self._update(var, fn, args, kwargs, group)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_strategy.py:766 _update
        **values.select_device_mirrored(d, kwargs)))
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:460 apply_grad_to_update_var
        grad.values, var, grad.indices)
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:850 _resource_apply_sparse_duplicate_indices
        return self._resource_apply_sparse(summed_grad, handle, unique_indices)
    /home/cjy/Albert/ALBERT/optimization.py:168 _resource_apply_sparse
        var.device, var.dtype.base_dtype, apply_state)
    /home/cjy/Albert/ALBERT/optimization.py:148 _get_lr
        return self._decayed_lr_t[var_dtype], {}
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:542 __getattribute__
        raise e
    /home/cjy/.local/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py:532 __getattribute__
        return super(OptimizerV2, self).__getattribute__(name)

    AttributeError: 'AdamWeightDecay' object has no attribute '_decayed_lr_t'
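Since the traceback dies inside optimizer internals, the version list is a plausible culprit. A minimal check, under my assumption that the repo targets the TF 2.0 stable release rather than the 2.0.0b1 beta:

import tensorflow as tf

print(tf.__version__)  # '2.0.0b1' in the report above
# The AttributeError suggests the installed OptimizerV2 does not provide the
# internals that optimization.py's AdamWeightDecay expects. If a beta version
# prints here, upgrading is the simplest thing to try:
#   pip install -U tensorflow-gpu==2.0.0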
Hi,
I have finetuned the base_2 model on squad2.0 for 3 epochs. Now I would like to continue the training process for another several epochs, and when I run the training instruction, the training process immediately ended.
What option should I add to continue the finetuning?
Thanks!
Running the CoLA script returns:
2020-01-15 17:53:21.504699: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2020-01-15 17:53:21.505194: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-01-15 17:53:21.518577: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3599910000 Hz
2020-01-15 17:53:21.519665: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3c2f130 executing computations on platform Host. Devices:
2020-01-15 17:53:21.519701: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
Traceback (most recent call last):
File "run_classifer.py", line 457, in <module>
app.run(main)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/usr/local/lib/python3.6/dist-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_classifer.py", line 307, in main
loss_multiplier=loss_multiplier)
File "run_classifer.py", line 195, in get_model
pooled_output, _ = albert_layer(input_word_ids, input_mask, input_type_ids)
File "/root/ALBERT-TF2.0/albert.py", line 212, in __call__
return super(AlbertModel, self).__call__(inputs, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py", line 842, in __call__
outputs = call_fn(cast_inputs, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
raise e.ag_error_metadata.to_exception(e)
RuntimeError: in converted code:
/root/ALBERT-TF2.0/albert.py:229 call *
word_embeddings = self.embedding_lookup(input_word_ids)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:817 __call__
self._maybe_build(inputs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:2141 _maybe_build
self.build(input_shapes)
/root/ALBERT-TF2.0/albert.py:273 build
dtype=self.dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer.py:522 add_weight
aggregation=aggregation)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/training/tracking/base.py:744 _add_variable_with_custom_getter
**kwargs_for_getter)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer_utils.py:139 make_variable
shape=variable_shape if variable_shape else None)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:258 __call__
return cls._variable_v1_call(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:219 _variable_v1_call
shape=shape)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:65 getter
return captured_getter(captured_previous, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/distribute_lib.py:1322 creator_with_resource_vars
return self._create_variable(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/distribute/one_device_strategy.py:262 _create_variable
return next_creator(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:197 <lambda>
previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variable_scope.py:2507 default_variable_creator
shape=shape)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/variables.py:262 __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1406 __init__
distribute_strategy=distribute_strategy)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1537 _init_from_args
initial_value() if init_from_fn else initial_value,
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/keras/engine/base_layer_utils.py:119 <lambda>
init_val = lambda: initializer(shape, dtype=dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/init_ops_v2.py:343 __call__
self.stddev, dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/init_ops_v2.py:809 truncated_normal
shape=shape, mean=mean, stddev=stddev, dtype=dtype, seed=self.seed)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/random_ops.py:171 truncated_normal
mean_tensor = ops.convert_to_tensor(mean, dtype=dtype, name="mean")
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1184 convert_to_tensor
return convert_to_tensor_v2(value, dtype, preferred_dtype, name)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1242 convert_to_tensor_v2
as_ref=False)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/ops.py:1296 internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/tensor_conversion_registry.py:52 _default_conversion_function
return constant_op.constant(value, dtype, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:227 constant
allow_broadcast=True)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:235 _constant_impl
t = convert_to_eager_tensor(value, ctx, dtype)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/constant_op.py:96 convert_to_eager_tensor
return ops.EagerTensor(value, ctx.device_name, dtype)
RuntimeError: /job:localhost/replica:0/task:0/device:GPU:0 unknown device.
I am trying pre-training from scratch in Japanese using GPUs, but the pre-training seems strange. In the following log, masked_lm_accuracy and sentence_order_accuracy suddenly dropped.
..
I1211 00:37:45.981178 139995264022336 model_training_utils.py:346] Train Step: 45595/273570 / loss = 0.8961147665977478 masked_lm_accuracy = 0.397345 lm_example_loss = 2.636538 sentence_order_accuracy = 0.772450 sentence_order_mean_loss = 0.425534
I1211 14:28:47.512063 139995264022336 model_training_utils.py:346] Train Step: 91190/273570 / loss = 0.7142021656036377 masked_lm_accuracy = 0.454914 lm_example_loss = 2.074183 sentence_order_accuracy = 0.810986 sentence_order_mean_loss = 0.372746
I1212 04:19:05.215945 139995264022336 model_training_utils.py:346] Train Step: 136785/273570 / loss = 1.9355322122573853 masked_lm_accuracy = 0.062883 lm_example_loss = 5.900585 sentence_order_accuracy = 0.572066 sentence_order_mean_loss = 0.668080
..
Has someone succeeded in pre-training from scratch?
In the readme, performance on the CoLA task is given as accuracy, but this task is always measured by Matthews correlation. Does it mean the same thing, i.e. is Matthews_corr just being reported under the name accuracy?
I am new to TF 2.0. I tried to save the model with "tf.saved_model.save(squad_m......", but I always get errors such as: "start_positions = inputs["start_positions"] KeyError: 'start_positions'". I am guessing this is because of the use of Keras model subclassing ("class ALBertQAModel(tf.keras.Model):"). Could you confirm, or help me understand if otherwise?
Thanks,
Jim
Is it possible to do this and could you please, if possible, provide some general instructions?
Thanks in anticipation.
Hi,
Can the script do prediction? I may have missed it, but I didn't see a "do_pred" flag.
I have generated pretraining data using the steps given in this repo.
I am doing this for the Hindi language with 22 GB of data. Generating the pretraining data itself took 1 month!
So I have a meta_data file associated with each tf_record file. I have added up the train_data_size values from all the meta_data files to make one combined meta_data file, because run_pretraining.py requires it. My final meta_data file looks something like this:
{
"task_type": "albert_pretraining",
"train_data_size": 596972848,
"max_seq_length": 512,
"max_predictions_per_seq": 20
}
Here the number of training steps is calculated as below:
num_train_steps = int(total_train_examples / train_batch_size) * num_train_epochs
So total_train_examples is 596972848, hence I am getting num_train_steps of 9327700 with a batch size of 64 and 1 epoch only. I saw in the readme that num_train_steps=125000, and I don't understand what went wrong here.
With such a huge number of train steps, it will take forever to train ALBERT. Even if I raise the batch size to 512 with only 1 epoch, the number of training steps will be 1165962, which is still huge!
Since ALBERT was trained on very large data, why are there only 125000 steps?
I also want to know how many epochs the ALBERT training for English used.
Can anyone suggest what went wrong and what I should do now?
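For scale, a quick back-of-the-envelope check (my own arithmetic; the batch size of 4096 and the 125k steps are the figures from the ALBERT paper):

total_train_examples = 596_972_848
print(total_train_examples // 64)     # 9,327,700 steps for one epoch at batch 64
print(total_train_examples // 512)    # 1,165,962 steps at batch 512
print(total_train_examples // 4096)   # ~145,745 steps at batch 4096
# The paper's 125,000 steps at batch 4096 mean the model saw
# 4096 * 125_000 = 512,000,000 sequences -- slightly under one epoch over a
# corpus of this size. The step count is small only because the batch is huge.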
I have a question regarding your experiment fine-tuning for SQuAD 2.0 with 4x Titan RTX 24 GB. How long was the total training time? I'm running the same experiment with 8x Tesla V100 16 GB, which according to my calculations will take about 200 hrs. I was expecting a much lower training time with 8 GPUs.
python albert-tf2/run_squad.py \
  --mode=train_and_predict \
  --input_meta_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_meta_data \
  --train_data_path=${OUTPUT_DIR}/squad_${SQUAD_VERSION}_train.tf_record \
  --predict_file=${SQUAD_DIR}/dev-${SQUAD_VERSION}.json \
  --albert_config_file=${ALBERT_DIR}/config.json \
  --init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
  --spm_model_file=${ALBERT_DIR}/30k-clean.model \
  --train_batch_size=32 \
  --predict_batch_size=32 \
  --learning_rate=1.5e-5 \
  --num_train_epochs=3 \
  --model_dir=${OUTPUT_DIR} \
  --strategy_type=mirror \
  --version_2_with_negative \
  --max_seq_length=384
Thanks in advance!
I have tried to perform pre-training from scratch on GPUs using the following command:
python run_pretraining.py --albert_config_file=albert_config.json --do_train --input_files=/somewhere/*/tf_examples.*.tfrecord --meta_data_file_path=/somewhere/train_meta_data --output_dir=/somewhere --strategy_type=mirror --train_batch_size=128 --num_train_epochs=2
But it seems to be stuck as follows:
...
INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
I1209 00:48:14.076103 139679391237952 cross_device_ops.py:427] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
INFO:tensorflow:batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 00:48:24.566839 139679391237952 cross_device_ops.py:748] batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
INFO:tensorflow:batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
I1209 00:48:45.377745 139679391237952 cross_device_ops.py:748] batch_all_reduce: 32 all-reduces with algorithm = nccl, num_packs = 1, agg_small_grads_max_bytes = 0 and agg_small_grads_max_group = 10
2019-12-09 00:49:16.104345: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
The GPUs are busy, but no output appears.
The core of the pre-training code is similar to the TensorFlow BERT code below, and I have succeeded in running that pre-training code:
https://github.com/tensorflow/models/tree/master/official/nlp/bert
My environment is as follows:
Thanks in advance.
It might be nice to, as a final step, show an instance of actual inference on the model so a reader can "tie it all together". It isn't strictly necessary, but for anyone who doesn't know a lot of the terminology it would bring it home.
@kamalkraj Thank you for giving clear instructions with respect to SQuAD which wasn't available in the main repo
With the following parameters:
run_squad.py --mode=predict \
--albert_config_file=../albert_base_resources/config.json \
--model_dir=../albert_base_resources/ \
--input_meta_data_path=../squad_out_v1.1/squad_v1.1_meta_data \
--predict_file=../squad_dataset/dev_small.json.txt \
--spm_model_file=../albert_base_resources/vocab/30k-clean.model
This fails with a "referenced before assignment" error.
So, shouldn't these lines:
Lines 565 to 567 in 8d0cc21
if FLAGS.version_2_with_negative:
predicted = get_raw_results_v2(predictions)
else:
predicted = get_raw_results(predictions)
for result in predicted:
After training a model for some epochs, how can I restore it and continue training from the checkpoints that were output, given that they are not in the hdf5 format?
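A restore sketch, under my assumption that the ctl_step_*.ckpt files were written with tf.train.Checkpoint by the custom training loop (as model_training_utils.py does):

import tensorflow as tf

# `model` and `optimizer` must be rebuilt exactly as in the original run.
checkpoint = tf.train.Checkpoint(model=model, optimizer=optimizer)
latest = tf.train.latest_checkpoint(model_dir)  # directory holding ctl_step_*.ckpt-*
if latest:
    checkpoint.restore(latest)
# ...then resume the training loop from here.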
What's the equivalent of bert-joint nq dataset prep script for ALBERT?
I ran converter.py to convert the ALBERT TensorFlow Hub model to a TF 2.0 model with the following commands:
MODEL_DIR=albert-base
SIZE=base
# Converting weights to TF 2.0
python converter.py --tf_hub_path=${MODEL_DIR}/ --model_type=albert_encoder --version=2 --model=${SIZE}
# Copy albert_config.json to config.json
cp ${MODEL_DIR}/assets/albert_config.json ${MODEL_DIR}/config.json
# Rename assets to vocab
mv ${MODEL_DIR}/assets/ ${MODEL_DIR}/vocab
However, at the end of the conversion, it shows the following messages:
Done loading 25 ALBERT weights from: pretrain/albert-base-v2// into <albert.AlbertModel object at 0x7f393e172b00> (prefix:albert). Count of weights not found in the checkpoint was: [0]. Count of weights with mismatched shape: [0]
Unused weights from saved model:
cls/predictions/output_bias
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/kernel
Does this message mean the conversion succeeded?
Hi,
Thanks for your code :) It's very helpful for me to study ALBERT.
As far as I know, the ALBERT batch size is 4096 in the paper.
Have you ever tried to pretrain from scratch on GPU?
I've seen your guide for SQuAD fine-tuning but couldn't find any information about pretraining from scratch.
Please let me know if you have any info on that.
Running the CoLA script returns:
FATAL Flags parsing error: flag --classification_task_name=CoLA: value should be one of <COLA|STS|SST|MNLI|QNLI|QQP|RTE|MRPC|WNLI|XNLI>
Hi @kamalkraj Thank you for the previous fix.
I am working on the STS-B data set and I am executing the following commands in Ubuntu:
export GLUE_DIR=glue_data
export ALBERT_DIR=model_configs/large
export TASK_NAME=STS
export OUTPUT_DIR=stsb_processed
mkdir $OUTPUT_DIR
export MODEL_DIR=output_stsb
python run_classifer.py \
--train_data_path=${OUTPUT_DIR}/${TASK_NAME}_train.tf_record \
--eval_data_path=${OUTPUT_DIR}/${TASK_NAME}_eval.tf_record \
--input_meta_data_path=${OUTPUT_DIR}/${TASK_NAME}_meta_data \
--albert_config_file=${ALBERT_DIR}/config.json \
--task_name=${TASK_NAME} \
--spm_model_file=${ALBERT_DIR}/vocab/30k-clean.model \
--output_dir=${MODEL_DIR} \
--init_checkpoint=${ALBERT_DIR}/tf2_model.h5 \
--do_train \
--do_eval \
--train_batch_size=16 \
--learning_rate=1e-5 \
--custom_training_loop
I1209 13:14:37.739436 140685254485824 run_classifer.py:306] ***** Running training *****
I1209 13:14:37.739539 140685254485824 run_classifer.py:307] Num examples = 5749
I1209 13:14:37.739591 140685254485824 run_classifer.py:308] Batch size = 16
I1209 13:14:37.739633 140685254485824 run_classifer.py:309] Num steps = 1077
Traceback (most recent call last):
File "run_classifer.py", line 452, in
app.run(main)
File "/home/vv/venvv/lib/python3.6/site-packages/absl/app.py", line 299, in run
_run_main(main, args)
File "/home/vv/venvv/lib/python3.6/site-packages/absl/app.py", line 250, in _run_main
sys.exit(main(argv))
File "run_classifer.py", line 355, in main
custom_callbacks = custom_callbacks)
File "/home/vv/git/ALBERT-TF2.0/model_training_utils.py", line 155, in run_customized_training_loop
assert tf.executing_eagerly()
AssertionError
Any idea what is going wrong here?
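The assertion that fails here just checks eager execution, which TF 2.x enables by default; a quick sanity check (my own suggestion):

import tensorflow as tf

print(tf.__version__)          # the repo expects TF 2.x
print(tf.executing_eagerly())  # must print True, or run_customized_training_loop asserts
# On TF 1.x, eager mode can be enabled explicitly with:
#   tf.compat.v1.enable_eager_execution()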
When training, I see progress followed by degradation. This is likely because the model is overfitting due to the limited corpus size of 8k samples: we are overwriting the pre-trained weights during the fine-tuning task. What we would like to do is freeze the original layers, and we need to figure out how to do that.
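A minimal freezing sketch (assuming the classifier is built as in get_model of run_classifer.py, where `albert_layer` is the AlbertModel instance; the approach itself is standard Keras):

albert_layer.trainable = False  # freeze all pre-trained ALBERT weights

# Build the classification head on top as before. After compiling, only the
# head's variables (e.g. the final Dense layer) receive gradient updates, so
# the pre-trained weights can no longer be overwritten during fine-tuning.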
I have tried running run_classifer.py: it works well on GPU, and after a small fix it runs well on TPU too. However, when I tried to run run_squad.py, I hit this bug on both GPU and TPU:
Model was constructed with shape Tensor("unique_ids:0", shape=(None, 1), dtype=int32) for input (None, 1), but it was re-called on a Tensor with incompatible shape (None,).
WARNING:tensorflow:Gradients do not exist for variables ['albert_model/pooler_transform/kernel:0', 'albert_model/pooler_transform/bias:0'] when minimizing the loss.
W1211 11:49:55.828053 139666686506368 optimizer_v2.py:1043] Gradients do not exist for variables ['albert_model/pooler_transform/kernel:0', 'albert_model/pooler_transform/bias:0'] when minimizing the loss.
WARNING:tensorflow:Model was constructed with shape Tensor("unique_ids:0", shape=(None, 1), dtype=int32) for input (None, 1), but it was re-called on a Tensor with incompatible shape (None,).
W1211 11:49:59.275795 139666686506368 network.py:847] Model was constructed with shape Tensor("unique_ids:0", shape=(None, 1), dtype=int32) for input (None, 1), but it was re-called on a Tensor with incompatible shape (None,).
WARNING:tensorflow:Gradients do not exist for variables ['albert_model/pooler_transform/kernel:0', 'albert_model/pooler_transform/bias:0'] when minimizing the loss.
W1211 11:50:02.947960 139666686506368 optimizer_v2.py:1043] Gradients do not exist for variables ['albert_model/pooler_transform/kernel:0', 'albert_model/pooler_transform/bias:0'] when minimizing the loss.
Hi there, I am having some issues getting the model to fine-tune.
I'm sort of confused and could use some help. Is there a forum where I could ask?
The issue is that the model doesn't learn; it just stays at ~0.5 accuracy. (N.B. the output is a 2-class dense layer.)
Here's a sample output:
Layer (type)                 Output Shape                  Param #    Connected to
===================================================================================
input_word_ids (InputLayer)  [(None, 35)]                  0
input_mask (InputLayer)      [(None, 35)]                  0
input_type_ids (InputLayer)  [(None, 35)]                  0
albert_model (AlbertModel)   [(None, 1024), (None, ...)]   17683968   input_word_ids[0][0]
                                                                      input_mask[0][0]
                                                                      input_type_ids[0][0]
dropout (Dropout)            (None, 1024)                  0          albert_model[0][0]
output (Dense)               (None, 2)                     2050       dropout[0][0]
===================================================================================
Total params: 17,686,018
Trainable params: 17,686,018
Non-trainable params: 0
I0416 20:14:06.850114 140122845333248 finetune.py:186] ***** Running training *****
I0416 20:14:06.850288 140122845333248 finetune.py:187] Num examples = 52500
I0416 20:14:06.850376 140122845333248 finetune.py:188] Batch size = 32
I0416 20:14:06.850451 140122845333248 finetune.py:189] Num steps = 32812
Train on 47261 samples, validate on 5252 samples
Epoch 1/20
2020-04-16 20:14:41.742967: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
4064/47261 [=>............................] - ETA: 25:16 - loss: 0.8179 - sparse_categorical_accuracy: 0.4783
Could you give some more info about the weights linked here? Trained on an English corpus only, as in the original article?
You write that the last layers are not available. That would probably mean they cannot be used for additional domain-specific pre-training, right? What would be required to do this?
Hi,
Can you please share an ALBERT xxlarge model fine-tuned on SQuAD 2.0 and, if possible, a REST API like your previous BERT-SQuAD (https://github.com/kamalkraj/BERT-SQuAD), or at least the inference code where the program takes a paragraph and a question as input and returns the answer.
I have limited resources and limited time to fine-tune the model, so I'm requesting you to share it.
Thank you!!!
I want to train ALBERT from scratch in a non-English language. I have access to a corpus of 1-2 B words. Would that be sufficient?
Would training on a single Cloud TPU v3 with 128 GB RAM be feasible? Can you give an estimated training time for base, large, and xlarge?
I am trying to do online inference with TensorFlow 2.0. My code is as follows:
self.graph = tf.Graph()
with self.graph.as_default() as g:
    self.input_ids = tf.compat.v1.placeholder(
        tf.int32, [FLAGS.batch_size, FLAGS.max_seq_length], name="input_ids")
    self.input_mask = tf.compat.v1.placeholder(
        tf.int32, [FLAGS.batch_size, FLAGS.max_seq_length], name="input_mask")
    self.p_mask = tf.compat.v1.placeholder(
        tf.float32, [FLAGS.batch_size, FLAGS.max_seq_length], name="p_mask")
    self.segment_ids = tf.compat.v1.placeholder(
        tf.int32, [FLAGS.batch_size, FLAGS.max_seq_length], name="segment_ids")
    self.cls_index = tf.compat.v1.placeholder(
        tf.int32, [FLAGS.batch_size], name="segment_ids")
    self.unique_ids = tf.compat.v1.placeholder(
        tf.int32, [FLAGS.batch_size], name="unique_ids")
    # unpacked_inputs = tf_utils.unpack_inputs(inputs)
    self.squad_model = ALBertQAModel(
        albert_config, FLAGS.max_seq_length, init_checkpoint,
        FLAGS.start_n_top, FLAGS.end_n_top, FLAGS.squad_dropout)
    learning_rate_fn = tf.keras.optimizers.schedules.PolynomialDecay(
        initial_learning_rate=1e-5, decay_steps=10000, end_learning_rate=0.0)
    optimizer_fn = AdamWeightDecay
    optimizer = optimizer_fn(
        learning_rate=learning_rate_fn,
        weight_decay_rate=0.01,
        beta_1=0.9,
        beta_2=0.999,
        epsilon=1e-6,
        exclude_from_weight_decay=['layer_norm', 'bias'])
    self.squad_model.optimizer = optimizer
    graph_init_op = tf.compat.v1.global_variables_initializer()
    y = self.squad_model(
        self.unique_ids, self.input_ids, self.input_mask, self.segment_ids,
        self.cls_index, self.p_mask, training=False)
    (self.unique_ids, self.start_tlp, self.start_ti,
     self.end_tlp, self.end_ti, self.cls_logits) = y

self.sess = tf.compat.v1.Session(graph=self.graph, config=gpu_config)
self.sess.run(graph_init_op)
with self.sess.as_default() as sess:
    self.squad_model.load_weights(FLAGS.model_dir)
This code is executable, but it produces bad results. It looks like the parameters are not loaded. I guess this is probably because I'm not using a tf.Session to set the parameters on the model, e.g. `saver.restore(sess, tf.train.latest_checkpoint(init_checkpoint))`.
I've tried several ways to do this, but nothing has worked. And there are very few examples of online inference with TensorFlow 2.0 on the internet, so I'm having trouble finding a solution. :((((
May I get some help here? Thanks very much!!
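For what it's worth, a sketch of plain eager-mode inference (my suggestion, not from the repo): in TF 2.0 there is no need for Graph/Session/placeholders at all.

import tensorflow as tf

squad_model = ALBertQAModel(albert_config, FLAGS.max_seq_length, init_checkpoint,
                            FLAGS.start_n_top, FLAGS.end_n_top, FLAGS.squad_dropout)

# If the weights were saved with tf.train.Checkpoint (as in the training loop):
ckpt = tf.train.Checkpoint(model=squad_model)
ckpt.restore(tf.train.latest_checkpoint(FLAGS.model_dir)).expect_partial()
# With a Keras save_weights file, squad_model.load_weights(path) works instead,
# once the model has been built by calling it on one batch.

# unique_ids, input_ids, etc. are tensors shaped as during training.
outputs = squad_model(unique_ids, input_ids, input_mask, segment_ids,
                      cls_index, p_mask, training=False)
unique_ids, start_tlp, start_ti, end_tlp, end_ti, cls_logits = outputs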
should they be different?