separius / bert-keras
Keras implementation of BERT with pre-trained weights
License: GNU General Public License v3.0
I don't quite understand one point. When I downloaded your Keras implementation of BERT and checked the number of trainable parameters in the model summary, it showed ~177 million parameters, while the official BERT base model should have 110 million. Could you explain where this difference comes from?
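For reference, here is a back-of-the-envelope sketch of where the official ~110M figure comes from, using only the published BERT-base hyperparameters; it doesn't explain the extra ~67M, which would have to come from additional weights in this implementation:
```
# rough parameter count for BERT-base (uncased): vocab 30522, hidden 768,
# 12 layers, FFN size 3072, max position 512 (all published hyperparameters)
V, H, L, I, P = 30522, 768, 12, 3072, 512
embeddings = V * H + P * H + 2 * H + 2 * H   # token + position + segment + LayerNorm
per_layer = (4 * (H * H + H)      # Q, K, V and output projections, with biases
             + (H * I + I)        # FFN up-projection
             + (I * H + H)        # FFN down-projection
             + 2 * 2 * H)         # two LayerNorms (gamma + beta)
pooler = H * H + H
print(embeddings + L * per_layer + pooler)   # 109482240, i.e. the ~110M figure
```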
I was running the tutorial notebook on Google Colab and ran into this issue.
# @title Compile keras model here
import os  # needed for os.environ below
from transformer.train import train_model

if use_tpu:
    assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; maybe you should switch the hardware accelerator to TPU for TPU support'
    import tensorflow as tf
    tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    strategy = tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address))
    g_bert = tf.contrib.tpu.keras_to_tpu_model(
        g_bert, strategy=strategy)
g_bert.compile('adam', 'mse')
InvalidArgumentError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1334 try:
-> 1335 return fn(*args)
1336 except errors.OpError as e:
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
1317 # Ensure any changes to the graph are reflected in the runtime.
-> 1318 self._extend_graph()
1319 return self._call_tf_sessionrun(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
1352 with self._graph._session_run_lock(): # pylint: disable=protected-access
-> 1353 tf_session.ExtendSession(self._session)
1354
InvalidArgumentError: NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: {{node layer_0/c_attn/conv1d}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
During handling of the above exception, another exception occurred:
InvalidArgumentError Traceback (most recent call last)
<ipython-input-9-8db02f074733> in <module>()
9 g_bert = tf.contrib.tpu.keras_to_tpu_model(
10 g_bert, strategy=strategy)
---> 11 g_bert.compile('adam', 'mse')
/content/bert_keras_repo/transformer/__init__.py in tpu_compile(self, optimizer, loss, metrics, loss_weights, sample_weight_mode, weighted_metrics, target_tensors, **kwargs)
36 sample_weight_mode, weighted_metrics,
37 target_tensors, **kwargs)
---> 38 initialize_uninitialized_variables() # for unknown reason, we should run this after compile sometimes
39 KerasTPUModel.compile = tpu_compile
40
/content/bert_keras_repo/transformer/__init__.py in initialize_uninitialized_variables()
15 from tensorflow.contrib.tpu.python.tpu.keras_support import KerasTPUModel
16 def initialize_uninitialized_variables():
---> 17 sess = K.get_session()
18 uninitialized_variables = set([i.decode('ascii') for i in sess.run(tf.report_uninitialized_variables())])
19 init_op = tf.variables_initializer(
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py in get_session()
430 if not _MANUAL_VAR_INIT:
431 with session.graph.as_default():
--> 432 _initialize_variables(session)
433 return session
434
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py in _initialize_variables(session)
706 # marked as initialized.
707 is_initialized = session.run(
--> 708 [variables_module.is_variable_initialized(v) for v in candidate_vars])
709 uninitialized_vars = []
710 for flag, v in zip(is_initialized, candidate_vars):
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
928 try:
929 result = self._run(None, fetches, feed_dict, options_ptr,
--> 930 run_metadata_ptr)
931 if run_metadata:
932 proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
1151 if final_fetches or final_targets or (handle and feed_dict_tensor):
1152 results = self._do_run(handle, final_targets, final_fetches,
-> 1153 feed_dict_tensor, options, run_metadata)
1154 else:
1155 results = []
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
1327 if handle is None:
1328 return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1329 run_metadata)
1330 else:
1331 return self._do_call(_prun_fn, handle, feeds, fetches)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
1347 pass
1348 message = error_interpolation.interpolate(message, self._graph)
-> 1349 raise type(e)(node_def, op, message)
1350
1351 def _extend_graph(self):
InvalidArgumentError: NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: node layer_0/c_attn/conv1d (defined at bert_keras_repo/transformer/model.py:20) . (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).
Errors may have originated from an input operation.
Input Source operations connected to node layer_0/c_attn/conv1d:
layer_normalization/gamma (defined at bert_keras_repo/transformer/layers.py:45)
layer_normalization/beta (defined at bert_keras_repo/transformer/layers.py:46)
layer_normalization/Mean/reduction_indices (defined at bert_keras_repo/transformer/layers.py:50)
layer_normalization/add/y (defined at bert_keras_repo/transformer/layers.py:52)
PositionEmbedding/embeddings (defined at bert_keras_repo/transformer/embedding.py:67)
token_input (defined at bert_keras_repo/transformer/model.py:63)
position_input (defined at bert_keras_repo/transformer/model.py:65)
segment_input (defined at bert_keras_repo/transformer/model.py:64)
TokenEmbedding/embeddings (defined at bert_keras_repo/transformer/embedding.py:68)
layer_normalization/Mean_1/reduction_indices (defined at bert_keras_repo/transformer/layers.py:51)
keras_learning_phase/input (defined at bert_keras_repo/transformer/embedding.py:73)
SegmentEmbedding/embeddings (defined at bert_keras_repo/transformer/embedding.py:66)
Thanks for your code. Is there any plan to support Colab & TPU?
I have installed tensorflow-gpu 1.7.0, but when I run this code on a GPU it fails with "ModuleNotFoundError: No module named 'tensorflow.contrib.tpu.python.tpu.keras_support'". How can I solve this?
I have a fundamental question about BERT: do I need to run it only on a TPU, or is it doable on a GPU and also on a CPU?
I am testing the example on my local machine without connecting to a cloud TPU or GPU, and I'm wondering why I hit this error after running tutorial.py:
File "/Users/.../BERT-keras/transformer/__init__.py", line 16, in tpu_compatible
from tensorflow.contrib.tpu.python.tpu.keras_support import KerasTPUModel
ModuleNotFoundError: No module named 'tensorflow.contrib'
If a GPU would resolve this, I can use an eGPU and give it a go.
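One way to make the code importable without a TPU (a sketch of a guard, not a confirmed fix for this repo) is to make the tf.contrib import optional:
```
# hypothetical guard around the import in transformer/__init__.py, so that
# CPU/GPU runs don't require tf.contrib's TPU-Keras support at all
try:
    from tensorflow.contrib.tpu.python.tpu.keras_support import KerasTPUModel
    HAS_TPU_SUPPORT = True
except ImportError:  # TF builds without tf.contrib (e.g. TF >= 2.0), or too old
    KerasTPUModel = None
    HAS_TPU_SUPPORT = False
```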
Hello, I tried to simplify your code for an NER task. I built a model as below:
```
def load_model(self):
    self.encoder = create_transformer(embedding_layer_norm=True,
                                      neg_inf=-10000.0,
                                      use_attn_mask=self.config.use_attn_mask,
                                      vocab_size=self.bert_config.vocab_size,
                                      accurate_gelu=True,
                                      layer_norm_epsilon=1e-12,
                                      max_len=self.config.max_len,
                                      use_one_embedding_dropout=True,
                                      d_hid=self.bert_config.intermediate_size,
                                      embedding_dim=self.bert_config.hidden_size,
                                      num_layers=self.bert_config.num_hidden_layers,
                                      num_heads=self.bert_config.num_attention_heads,
                                      residual_dropout=self.bert_config.hidden_dropout_prob,
                                      attention_dropout=self.bert_config.attention_probs_dropout_prob)
    self.encoder = load_google_bert(self.encoder, self.bert_config.vocab_size,
                                    self.config.bert_dir_path, self.config.max_len,
                                    self.config.verbose)
    decoder = Dense(units=self.config.num_classes)
    logits = TimeDistributed(decoder)(
        Dropout(self.config.dropout)(self.encoder.outputs[0]))
    task_target = Input(batch_shape=(None, self.config.max_len), dtype='int32')
    task_mask = Input(batch_shape=(None, self.config.max_len), dtype='int32')
    task_loss = Lambda(lambda x: masked_classification_loss(x[0], x[1], x[2]))(
        [task_target, logits, task_mask])
    # sharing layers between training model and prediction model
    self.train_model = Model(inputs=self.encoder.inputs + [task_target, task_mask],
                             outputs=task_loss)
    self.model = Model(inputs=self.encoder.inputs, outputs=logits)

def compile(self, *args, **kwargs):
    return self.train_model.compile(*args, loss=pass_through_loss, **kwargs)
```
Then I train the model with:
```
model = XXXX(config)
model.compile(optimizer='adam')
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)
checkpoint = ModelCheckpoint(
    os.path.join(config.dir_output, 'best-weights.h5'),
    monitor='val_loss',
    verbose=1,
    save_best_only=True,
    save_weights_only=True
)
model.train_model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch,
                                validation_data=dev_generator,
                                validation_steps=dev_steps, verbose=1,
                                callbacks=[earlystop, checkpoint],
                                shuffle=False, epochs=100)
```
In addition, I modified the function load_google_bert and commented out the line
`weights[w_id][vocab_size + TextEncoder.EOS_OFFSET] = saved[3 + TextEncoder.BERT_UNUSED_COUNT]`
because the variable `TextEncoder.BERT_SPECIAL_COUNT` is 4 instead of 5, so the created model does not have that many weights.
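For context, here is a minimal sketch of the loss helpers this snippet relies on, reconstructed from the repo's traceback quoted further down this page (the `K.switch` line matches transformer/train.py; the final reduction over unmasked tokens is an assumption):
```
import keras.backend as K

def classification_loss(y_true, y_pred):
    return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

def masked_classification_loss(y_true, y_pred, y_mask):
    # zero the per-token loss wherever the mask is 0 (padding), keep it elsewhere
    l = K.switch(y_mask, classification_loss(y_true, y_pred),
                 K.zeros_like(y_mask, dtype=K.floatx()))
    # average only over the unmasked (real) tokens; this reduction is assumed
    return K.sum(l) / (K.cast(K.sum(y_mask), K.floatx()) + K.epsilon())

def pass_through_loss(_, y_pred):
    return y_pred  # train_model already outputs the loss tensor itself
```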
Traceback (most recent call last):
File "tutorial.py", line 6, in <module>
model_name='tutorial', vocab_size=20)
File "/media/bin_lab/C4F6073207B3A949/Linux/Bert/BERT-keras-master/data/vocab.py", line 59, in __init__
spm.SentencePieceTrainer.Train(
NameError: name 'spm' is not defined
The pad_id as defined in data/vocab.py seems inconsistent with BERT's pad id.
Let's say I run:
bert_text_encoder = BERTTextEncoder(vocab_file='./google_bert/model/uncased_L-12_H-768_A-12/vocab.txt')
(In this example, the vocab file is installed in the google_bert submodule under a directory called model.)
I can get pad_id with:
bert_text_encoder.pad_id
which gives me 30522.
BUT if I dig into the BERT tokenizer I get something else:
bert_text_encoder.tokenizer.inv_vocab[0]
which gives me '[PAD]', suggesting that the real id of the pad token is 0.
In short, the id definitions in the TextEncoder base class look like they disagree with the BERT tokenizer's use of ids.
I am looking at section 4.2 of the BERT paper, on how to set up BERT for reading comprehension. It looks like a module needs to be added on top of BERT, S and E are new parameter vectors, and a log-softmax loss is calculated over the start and end positions.
This extension is included in the original TensorFlow BERT as the run_squad.py script in that repository.
Does such an extension exist for BERT-keras?
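For illustration, here is a minimal sketch of such a span head. It is not part of BERT-keras; the use of outputs[0], the layer wiring, and the Dense(2) packing of the S and E vectors from section 4.2 are all assumptions:
```
from keras.layers import Dense, Lambda
from keras.models import Model

def add_squad_head(bert_encoder):
    hidden = bert_encoder.outputs[0]              # (batch, max_len, hidden_size)
    # one Dense(2): column 0 plays the role of S, column 1 the role of E
    span_logits = Dense(2)(hidden)                # (batch, max_len, 2)
    start_logits = Lambda(lambda t: t[:, :, 0])(span_logits)  # (batch, max_len)
    end_logits = Lambda(lambda t: t[:, :, 1])(span_logits)
    return Model(inputs=bert_encoder.inputs, outputs=[start_logits, end_logits])

# training would use a log-softmax over positions, i.e. sparse categorical
# crossentropy from logits, with the start/end token indices as labels
```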
I was trying to run tutorial.ipynb and encountered this error while running the following cell.
# This is a tutorial on using this library
# first off we need a text_encoder so we would know our vocab_size (and later on use it to encode sentences)
from data.vocab import SentencePieceTextEncoder  # you could also import OpenAITextEncoder

sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='openai/model/params_shapes.json',
                                                  model_name='tutorial', vocab_size=20)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-9-a0a8b2fa2e06> in <module>()
2
3 sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='bert_keras_repo/openai/model/params_shapes.json',
----> 4 model_name='tutorial', vocab_size=20)
/content/bert_keras_repo/data/vocab.py in __init__(self, text_corpus_address, model_name, vocab_size, spm_model_type)
64 '--training_sentence_size=100000000'.format(
65 input=text_corpus_address, model_name=model_name, vocab_size=vocab_size, coverage=1,
---> 66 model_type=spm_model_type.lower()))
67 self.sp = spm.SentencePieceProcessor()
68 self.sp.load('{}.model'.format(model_name))
OSError: Not found: unknown field name "training_sentence_size" in TrainerSpec.
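A possible workaround sketch for the Train call in data/vocab.py (this rests on an assumption: recent sentencepiece releases removed the training_sentence_size flag, and input_sentence_size is the closest surviving option):
```
import sentencepiece as spm

# hypothetical replacement for the Train(...) call in data/vocab.py:
# drop --training_sentence_size and use --input_sentence_size instead
spm.SentencePieceTrainer.Train(
    '--input={input} --model_prefix={model_name} --vocab_size={vocab_size} '
    '--character_coverage={coverage} --model_type={model_type} '
    '--input_sentence_size=100000000'.format(
        input=text_corpus_address, model_name=model_name,
        vocab_size=vocab_size, coverage=1, model_type=spm_model_type.lower()))
```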
Is there any way to train a BERT-based classifier using the [CLS] vector, as described in the BERT paper?
I was able to load the BERT encoder successfully using:
bert_encoder = load_google_bert(base_location='./google_bert/uncased_L-12_H-768_A-12/',
                                use_attn_mask=False, max_len=512, verbose=False)
but I am not able to find a way to wrap the encoder in a Keras Model.
I was hoping for something like this:
bert_encoder = load_google_bert(base_location='./google_bert/uncased_L-12_H-768_A-12/',
                                use_attn_mask=False, max_len=512, verbose=False)
outputs = Dense(n_classes, activation='softmax')(bert_encoder.outputs)
classifier = Model(inputs=bert_encoder.inputs, outputs=outputs)
classifier.compile()
classifier.fit()
.....
If there is such a solution, it would be the easiest way possible to use BERT!
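A minimal sketch of such a wrapper, under two assumptions: the encoder's first output is the (batch, max_len, hidden) sequence, and the [CLS] token sits at position 0 as in Google's BERT input packing:
```
from keras.layers import Dense, Lambda
from keras.models import Model

# take the hidden vector at the [CLS] position (assumed to be index 0)
cls_vector = Lambda(lambda x: x[:, 0, :])(bert_encoder.outputs[0])  # (batch, hidden)
outputs = Dense(n_classes, activation='softmax')(cls_vector)
classifier = Model(inputs=bert_encoder.inputs, outputs=outputs)
classifier.compile(optimizer='adam', loss='categorical_crossentropy')
```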
I am using tf 1.13 on Google Colab.
When compiling the model for TPU, I ran into an issue:
training_model_tpu = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
164 cls = module_objects.get(class_name)
165 if cls is None:
--> 166 raise ValueError('Unknown ' + printable_module_name + ': ' + class_name)
167 return (cls, config['config'])
168
ValueError: Unknown layer: LayerNormalization
How do I compile a BERT-keras model for TPU?
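One workaround sketch (an assumption, not a confirmed fix): register the repo's custom layer with Keras before the TPU conversion, so the layer can be found when the model config is re-instantiated. The import path is guessed from the tracebacks elsewhere on this page:
```
import tensorflow as tf
from transformer.layers import LayerNormalization  # path assumed from the tracebacks above

# make the custom layer visible to Keras deserialization by name
tf.keras.utils.get_custom_objects()['LayerNormalization'] = LayerNormalization
```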
When running the example notebook (on Colab), I receive the following error:
ValueError: You need tensorflow >= 1.3 for better keras tpu support!
This error appears when I execute:
g_bert.compile('adam', 'mse')
Any idea where it comes from?
I checked the version of TensorFlow with:
!pip list | grep tensorflow
And it gives the expected result:
mesh-tensorflow 0.0.5
tensorflow 1.12.0
tensorflow-hub 0.2.0
tensorflow-metadata 0.9.0
tensorflow-probability 0.5.0
I'm trying to fine-tune BERT-keras on the STS-B dataset.
Has anyone already used this repo to fine-tune BERT on an end-to-end task? Is there a code example for this?
I'm having difficulty making it work; my runtime dies before even training on a single batch...
You can take a look at my notebook here: Colab
Hi, I have a dataset like:
From Monday to Friday most people are busy working or studying, but in the evenings and weekends they are free and _ themselves.
And there are four candidates for the missing blank:
["love", "work", "enjoy", "play"]. Here "enjoy" is the correct answer. This is a cloze-style task and looks like the masked LM in BERT; the difference is that I don't want to search for the answer over all tokens, only over the four given candidates. How can I do this? It looks like a negative-sampling method. Do you have any idea? Thank you!
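One simple way to do this is to keep the full masked-LM head and only compare the logits of the four candidates. A sketch, where `lm_logits` is assumed to be the LM logit vector at the blank position and `token_id` is a hypothetical word-to-id lookup:
```
import numpy as np

candidates = ["love", "work", "enjoy", "play"]
candidate_ids = [token_id(w) for w in candidates]     # hypothetical lookup helper

candidate_logits = lm_logits[candidate_ids]           # score only the 4 options
probs = np.exp(candidate_logits - candidate_logits.max())
probs /= probs.sum()                                  # softmax over candidates only
answer = candidates[int(np.argmax(probs))]            # ideally "enjoy"
```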
When I use BERT-keras, I don't understand this part:
class TextEncoder:
    PAD_OFFSET = 0
    MSK_OFFSET = 1
    BOS_OFFSET = 2
    DEL_OFFSET = 3  # delimiter
    EOS_OFFSET = 4
    SPECIAL_COUNT = 5
    NUM_SEGMENTS = 2
    BERT_UNUSED_COUNT = 99  # bert pretrained models
    BERT_SPECIAL_COUNT = 4  # they don't have DEL
Why is it set up like this? And how are BERT_UNUSED_COUNT = 99 and BERT_SPECIAL_COUNT = 4 used in load_google_bert?
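If I read load_google_bert correctly, these constants mirror the layout of Google's released vocab.txt, while the repo appends its own specials after the vocabulary (which is why pad_id comes out as 30522 in the earlier issue on this page):
```
# layout of Google's uncased vocab.txt (30522 rows), which these constants mirror:
#   row 0        [PAD]
#   rows 1-99    [unused0] .. [unused98]    -> BERT_UNUSED_COUNT = 99
#   rows 100-103 [UNK] [CLS] [SEP] [MASK]   -> BERT_SPECIAL_COUNT = 4 (no DEL)
#   rows 104+    real wordpieces
# this repo instead appends its own special tokens after the vocabulary:
vocab_size = 30522
pad_id = vocab_size + TextEncoder.PAD_OFFSET  # 30522, cf. the pad_id issue above
msk_id = vocab_size + TextEncoder.MSK_OFFSET  # 30523
```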
Hi,
Just running the first cell of tutorial.ipynb:
from data.vocab import SentencePieceTextEncoder  # you could also import OpenAITextEncoder
sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='/openai/model/params_shapes.json',
                                                  model_name='tutorial', vocab_size=20)
I'm facing this error from the vocab.py file:
File "/Users/shabnamrashtchi/Dropbox/Deep leanring 2019_2020 reserch/Embedinng/BERT-keras-2/data/vocab.py", line 69, in __init__
model_type=spm_model_type.lower()))
OSError: Not found: unknown field name "training_sentence_size" in TrainerSpec.
Awesome job!
Do you plan to support the BERT pre-trained weights, now that they are released?
I'm trying to fine-tune BERT on the STS-B dataset.
I used the following notebook to fine-tune it with BERT-keras.
(As described in the paper, I just added a classification layer using the [CLS] token of the output of BERT.)
However, there are great differences in performance and results between this notebook and the script used in the official version for fine-tuning:

| | BERT-keras | Official BERT |
|---|---|---|
| Pearson | 0.0254 | 0.8956 |
| Spearman | 0.0289 | 0.7942 |
| MSE | 2.2691 | 0.5456 |
| Training time | 9h | 10min |

Note: Pearson and Spearman correlation are the metrics used to evaluate accuracy on the STS-B dataset.
Why is there such a difference between the two approaches?
It works fine with the TensorFlow backend; now I'm testing Theano.
When running train_model from tutorial.ipynb, we get an "x must be 1-d or 2-d tensor" error (the output is TensorType(float32, 3D)) from T.nnet.softmax() inside K.sparse_categorical_crossentropy:
<ipython-input-22-27837df85ad1> in classification_loss(y_true, y_pred)
2 import keras.backend as K
3 def classification_loss(y_true, y_pred):
----> 4 return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
5 train.classification_loss = classification_loss
/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in sparse_categorical_crossentropy(target, output, from_logits, axis)
1788 target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
1789 target = reshape(target, shape(output))
-> 1790 return categorical_crossentropy(target, output, from_logits, axis=-1)
1791
1792
/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in categorical_crossentropy(target, output, from_logits, axis)
1762 target = permute_dimensions(target, permutation)
1763 if from_logits:
-> 1764 output = T.nnet.softmax(output)
1765 else:
1766 # scale preds so that the class probas of each sample sum to 1
/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in softmax(c)
813 if c.broadcastable[-1]:
814 warnings.warn("The softmax is applied on a dimension of shape 1, which does not have a semantic meaning.")
--> 815 return softmax_op(c)
816
817
/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
613 """
614 return_list = kwargs.pop('return_list', False)
--> 615 node = self.make_node(*inputs, **kwargs)
616
617 if config.compute_test_value != 'off':
/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in make_node(self, x)
428 or x.type.dtype not in tensor.float_dtypes:
429 raise ValueError('x must be 1-d or 2-d tensor of floats. Got %s' %
--> 430 x.type)
431 if x.ndim == 1:
432 warnings.warn("DEPRECATION: If x is a vector, Softmax will not automatically pad x "
ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)
Then I use this to avoid it:
import keras.backend as K

_softmax = K.T.nnet.softmax

def softmax(x):
    if x.ndim == 3:
        d1, d2, d3 = x.shape
        return _softmax(x.reshape((d1 * d2, d3))).reshape((d1, d2, d3))
    return _softmax(x)

K.T.nnet.softmax = softmax
But running this again:
m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
                finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
                finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)
we get this error:
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_logits and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
.format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 6), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
.format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_flatten and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 8, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
.format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_gather and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 8, 6), (None, 1)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
.format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 1), (None, 8, 2)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
.format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_random_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 25), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
.format(self.name, input_shape))
Epoch 1/100
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
902 outputs =\
--> 903 self.fn() if output_subset is None else\
904 self.fn(output_subset=output_subset)
/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
891 def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892 r = p(n, [x[0] for x in i], o)
893 for o in node.outputs:
/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
2338 if self.set_instead_of_inc:
-> 2339 out[0][inputs[2:]] = inputs[1]
2340 else:
IndexError: index 8 is out of bounds for axis 1 with size 6
During handling of the above exception, another exception occurred:
IndexError Traceback (most recent call last)
<ipython-input-39-7b7276d2ce06> in <module>()
1 m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
2 finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
----> 3 finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)
4 # now m is ready to be used!
5 print(m.inputs)
/content/bert_keras_repo/transformer/train.py in train_model(base_model, is_causal, tasks_meta_data, pretrain_generator, finetune_generator, pretrain_epochs, pretrain_optimizer, pretrain_steps, pretrain_callbacks, finetune_epochs, finetune_optimizer, finetune_steps, finetune_callbacks, verbose, TPUStrategy)
145
146 if pretrain_generator is not None:
--> 147 train_step(True)
148 if finetune_generator is not None:
149 train_step(False)
/content/bert_keras_repo/transformer/train.py in train_step(is_pretrain)
142 _model.fit_generator(_generator, steps_per_epoch=pretrain_steps if is_pretrain else finetune_steps,
143 verbose=verbose, callbacks=pretrain_callbacks if is_pretrain else finetune_callbacks,
--> 144 shuffle=False, epochs=pretrain_epochs if is_pretrain else finetune_epochs)
145
146 if pretrain_generator is not None:
/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
89 warnings.warn('Update your `' + object_name + '` call to the ' +
90 'Keras 2 API: ' + signature, stacklevel=2)
---> 91 return func(*args, **kwargs)
92 wrapper._original_function = func
93 return wrapper
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
1416 use_multiprocessing=use_multiprocessing,
1417 shuffle=shuffle,
-> 1418 initial_epoch=initial_epoch)
1419
1420 @interfaces.legacy_generator_methods_support
/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
215 outs = model.train_on_batch(x, y,
216 sample_weight=sample_weight,
--> 217 class_weight=class_weight)
218
219 outs = to_list(outs)
/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
1215 ins = x + y + sample_weights
1216 self._make_train_function()
-> 1217 outputs = self.train_function(ins)
1218 return unpack_singleton(outputs)
1219
/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in __call__(self, inputs)
1386 def __call__(self, inputs):
1387 assert isinstance(inputs, (list, tuple))
-> 1388 return self.function(*inputs)
1389
1390
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
915 node=self.fn.nodes[self.fn.position_of_error],
916 thunk=thunk,
--> 917 storage_map=getattr(self.fn, 'storage_map', None))
918 else:
919 # old-style linkers raise their own exceptions
/usr/local/lib/python3.6/dist-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
323 # extra long error message in that case.
324 pass
--> 325 reraise(exc_type, exc_value, exc_trace)
326
327
/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
690 value = tp()
691 if value.__traceback__ is not tb:
--> 692 raise value.with_traceback(tb)
693 raise value
694 finally:
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
901 try:
902 outputs =\
--> 903 self.fn() if output_subset is None else\
904 self.fn(output_subset=output_subset)
905 except Exception:
/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
890 # default arguments are stored in the closure of `rval`
891 def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892 r = p(n, [x[0] for x in i], o)
893 for o in node.outputs:
894 compute_map[o][0] = True
/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
2337
2338 if self.set_instead_of_inc:
-> 2339 out[0][inputs[2:]] = inputs[1]
2340 else:
2341 np.add.at(out[0], tuple(inputs[2:]), inputs[1])
IndexError: index 8 is out of bounds for axis 1 with size 6
Apply node that caused the error: AdvancedIncSubtensor{inplace=False, set_instead_of_inc=True}(Alloc.0, TensorConstant{1}, ARange{dtype='int64'}.0, Reshape{1}.0)
Toposort index: 315
Inputs types: [TensorType(float32, matrix), TensorType(int8, scalar), TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(64, 6), (), (64,), (64,)]
Inputs strides: [(24, 4), (), (8,), (4,)]
Inputs values: ['not shown', array(1, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[Reshape{3}(AdvancedIncSubtensor{inplace=False, set_instead_of_inc=True}.0, MakeVector{dtype='int64'}.0)]]
Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "bert_keras_repo/transformer/train.py", line 68, in train_model
[task_loss_weight, task_target, logits, task_mask])
File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__
output = self.call(inputs, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/keras/layers/core.py", line 687, in call
return self.function(inputs, **arguments)
File "bert_keras_repo/transformer/train.py", line 67, in <lambda>
task_loss = Lambda(lambda x: x[0] * masked_classification_loss(x[1], x[2], x[3]), name=task.name + '_loss')(
File "bert_keras_repo/transformer/train.py", line 20, in masked_classification_loss
return _mask_loss(y_true, y_pred, y_mask, classification_loss)
File "bert_keras_repo/transformer/train.py", line 11, in _mask_loss
l = K.switch(y_mask, element_wise_loss(y_true, y_pred), K.zeros_like(y_mask, dtype=K.floatx()))
File "<ipython-input-22-27837df85ad1>", line 4, in classification_loss
return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py", line 1788, in sparse_categorical_crossentropy
target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
It doesn't seem to be a bug in my code, because I checked out the branch from before TPU support.
Hi Separius,
Thank you, and really nice work! I think line 118 in lm_dataset should be `[0] * (len(sent.tokens) + 2)`, right? After I fixed this, the LM could run. Many thanks.
I want a Chinese version of BERT-keras; does it exist?