
separius / bert-keras

814 stars, 31 watchers, 198 forks, 565 KB

Keras implementation of BERT with pre-trained weights

License: GNU General Public License v3.0

Python 86.20% Jupyter Notebook 13.80%
keras transformer theano language-modeling nlp transfer-learning pretrained-models tensorflow

bert-keras's People

Contributors

highcwu, separius


bert-keras's Issues

number of trainable parameters

I don't quite understand one point. When I downloaded your Keras implementation of BERT and checked the number of trainable parameters in the model summary, it showed ~177 million parameters, while the official BERT base model should have about 110 million. Could you explain where this difference comes from?
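Not part of the original question; below is only a sketch for locating where the extra parameters live, assuming `model` is the Keras model returned by this repo's create_transformer / load_google_bert. A per-layer breakdown usually shows how much of the gap comes from the embedding matrices, which scale with max_len and the (padded) vocabulary size.

import keras.backend as K

def report_trainable_params(model):
    # Print per-layer trainable parameter counts and the overall total,
    # to compare against model.summary() and the official 110M figure.
    total = 0
    for layer in model.layers:
        count = sum(K.count_params(w) for w in layer.trainable_weights)
        if count:
            print('{}: {:,}'.format(layer.name, count))
        total += count
    print('total trainable: {:,}'.format(total))

# report_trainable_params(model)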

Error while compiling the model after loading the google-BERT model on tpu

I was running the tutorial notebook on Google Colab and ran into this issue.

Code

# @title Compile keras  model here
from transformer.train import train_model
if use_tpu:
  assert 'COLAB_TPU_ADDR' in os.environ, 'ERROR: Not connected to a TPU runtime; Maybe you should switch hardware accelerator to TPU for TPU support'
  import tensorflow as tf
  tpu_address = 'grpc://' + os.environ['COLAB_TPU_ADDR']
  strategy = tf.contrib.tpu.TPUDistributionStrategy(
          tf.contrib.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
  )
  g_bert = tf.contrib.tpu.keras_to_tpu_model(
                      g_bert, strategy=strategy)
g_bert.compile('adam', 'mse')  

Error

InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1334     try:
-> 1335       return fn(*args)
   1336     except errors.OpError as e:

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run_fn(feed_dict, fetch_list, target_list, options, run_metadata)
   1317       # Ensure any changes to the graph are reflected in the runtime.
-> 1318       self._extend_graph()
   1319       return self._call_tf_sessionrun(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _extend_graph(self)
   1352     with self._graph._session_run_lock():  # pylint: disable=protected-access
-> 1353       tf_session.ExtendSession(self._session)
   1354 

InvalidArgumentError: NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: {{node layer_0/c_attn/conv1d}}. (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

During handling of the above exception, another exception occurred:

InvalidArgumentError                      Traceback (most recent call last)
<ipython-input-9-8db02f074733> in <module>()
      9   g_bert = tf.contrib.tpu.keras_to_tpu_model(
     10                       g_bert, strategy=strategy)
---> 11 g_bert.compile('adam', 'mse')

/content/bert_keras_repo/transformer/__init__.py in tpu_compile(self, optimizer, loss, metrics, loss_weights, sample_weight_mode, weighted_metrics, target_tensors, **kwargs)
     36                     sample_weight_mode, weighted_metrics,
     37                     target_tensors, **kwargs)
---> 38         initialize_uninitialized_variables() # for unknown reason, we should run this after compile sometimes
     39     KerasTPUModel.compile = tpu_compile
     40 

/content/bert_keras_repo/transformer/__init__.py in initialize_uninitialized_variables()
     15     from tensorflow.contrib.tpu.python.tpu.keras_support import KerasTPUModel
     16     def initialize_uninitialized_variables():
---> 17         sess = K.get_session()
     18         uninitialized_variables = set([i.decode('ascii') for i in sess.run(tf.report_uninitialized_variables())])
     19         init_op = tf.variables_initializer(

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py in get_session()
    430   if not _MANUAL_VAR_INIT:
    431     with session.graph.as_default():
--> 432       _initialize_variables(session)
    433   return session
    434 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/backend.py in _initialize_variables(session)
    706     # marked as initialized.
    707     is_initialized = session.run(
--> 708         [variables_module.is_variable_initialized(v) for v in candidate_vars])
    709     uninitialized_vars = []
    710     for flag, v in zip(is_initialized, candidate_vars):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    928     try:
    929       result = self._run(None, fetches, feed_dict, options_ptr,
--> 930                          run_metadata_ptr)
    931       if run_metadata:
    932         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1151     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1152       results = self._do_run(handle, final_targets, final_fetches,
-> 1153                              feed_dict_tensor, options, run_metadata)
   1154     else:
   1155       results = []

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1327     if handle is None:
   1328       return self._do_call(_run_fn, feeds, fetches, targets, options,
-> 1329                            run_metadata)
   1330     else:
   1331       return self._do_call(_prun_fn, handle, feeds, fetches)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1347           pass
   1348       message = error_interpolation.interpolate(message, self._graph)
-> 1349       raise type(e)(node_def, op, message)
   1350 
   1351   def _extend_graph(self):

InvalidArgumentError: NodeDef mentions attr 'explicit_paddings' not in Op<name=Conv2D; signature=input:T, filter:T -> output:T; attr=T:type,allowed=[DT_HALF, DT_BFLOAT16, DT_FLOAT, DT_DOUBLE]; attr=strides:list(int); attr=use_cudnn_on_gpu:bool,default=true; attr=padding:string,allowed=["SAME", "VALID"]; attr=data_format:string,default="NHWC",allowed=["NHWC", "NCHW"]; attr=dilations:list(int),default=[1, 1, 1, 1]>; NodeDef: node layer_0/c_attn/conv1d (defined at bert_keras_repo/transformer/model.py:20) . (Check whether your GraphDef-interpreting binary is up to date with your GraphDef-generating binary.).

Errors may have originated from an input operation.
Input Source operations connected to node layer_0/c_attn/conv1d:
 layer_normalization/gamma (defined at bert_keras_repo/transformer/layers.py:45)	
 layer_normalization/beta (defined at bert_keras_repo/transformer/layers.py:46)	
 layer_normalization/Mean/reduction_indices (defined at bert_keras_repo/transformer/layers.py:50)	
 layer_normalization/add/y (defined at bert_keras_repo/transformer/layers.py:52)	
 PositionEmbedding/embeddings (defined at bert_keras_repo/transformer/embedding.py:67)	
 token_input (defined at bert_keras_repo/transformer/model.py:63)	
 position_input (defined at bert_keras_repo/transformer/model.py:65)	
 segment_input (defined at bert_keras_repo/transformer/model.py:64)	
 TokenEmbedding/embeddings (defined at bert_keras_repo/transformer/embedding.py:68)	
 layer_normalization/Mean_1/reduction_indices (defined at bert_keras_repo/transformer/layers.py:51)	
 keras_learning_phase/input (defined at bert_keras_repo/transformer/embedding.py:73)	
 SegmentEmbedding/embeddings (defined at bert_keras_repo/transformer/embedding.py:66)

how to run code on GPU

I have installed tensorflow-gpu 1.7.0, but when I run this code on a GPU I get the error "ModuleNotFoundError: No module named 'tensorflow.contrib.tpu.python.tpu.keras_support'". How can I solve it?
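Not an official answer; a small check under the assumption that tensorflow.contrib.tpu.python.tpu.keras_support simply does not exist in TensorFlow 1.7 and only ships with later 1.x releases.

# Verify which TensorFlow build is active; the keras_support module this repo
# imports is only present in newer 1.x releases (e.g. 1.12).
import tensorflow as tf
print(tf.__version__)

# One possible fix (an assumption, not verified for every CUDA/driver combo):
#   pip install --upgrade tensorflow-gpu==1.12.0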

BERT on TPU or GPU?

I have a fundamental question about BERT: do I need to run it only on a TPU, or can it also be run on a GPU or even a CPU?
Because I am testing the example on my local machine without connecting to a cloud TPU or GPU, I am wondering why I get this error after running tutorial.py:
File "/Users/.../BERT-keras/transformer/__init__.py", line 16, in tpu_compatible
from tensorflow.contrib.tpu.python.tpu.keras_support import KerasTPUModel

ModuleNotFoundError: No module named 'tensorflow.contrib'
If it can be resolved with a GPU, I can use an eGPU and give it a go.

loss does not decrease during training

Hello, I tried to adapt your code for an NER task. I built the model as below:

 def load_model(self):
        self.encoder = create_transformer(embedding_layer_norm=True,
                                          neg_inf=-10000.0,
                                          use_attn_mask=self.config.use_attn_mask,
                                          vocab_size=self.bert_config.vocab_size,
                                          accurate_gelu=True,
                                          layer_norm_epsilon=1e-12,
                                          max_len=self.config.max_len,
                                          use_one_embedding_dropout=True,
                                          d_hid=self.bert_config.intermediate_size,
                                          embedding_dim=self.bert_config.hidden_size,
                                          num_layers=self.bert_config.num_hidden_layers,
                                          num_heads=self.bert_config.num_attention_heads,
                                          residual_dropout=self.bert_config.hidden_dropout_prob,
                                          attention_dropout=self.bert_config.attention_probs_dropout_prob)

        self.encoder = load_google_bert(self.encoder, self.bert_config.vocab_size, self.config.bert_dir_path, self.config.max_len, self.config.verbose)
        
        decoder = Dense(units=self.config.num_classes)
        logits = TimeDistributed(decoder)(
            Dropout(self.config.dropout)(self.encoder.outputs[0]))
        task_target = Input(batch_shape=(None, self.config.max_len,), dtype='int32')
        task_mask = Input(batch_shape=(None, self.config.max_len), dtype='int32')
        task_loss = Lambda(lambda x: masked_classification_loss(x[0], x[1], x[2]))([task_target, logits, task_mask])

        # sharing layers between training model and prediction model
        self.train_model = Model(inputs=self.encoder.inputs+[task_target, task_mask], outputs=task_loss)
        self.model = Model(inputs=self.encoder.inputs, outputs=logits)

    def compile(self, *args, **kwargs):
        return self.train_model.compile(*args, loss=pass_through_loss, **kwargs)

Then I train the model with:

model = XXXX(config)
model.compile(optimizer='adam')
earlystop = EarlyStopping(monitor='val_loss', min_delta=0, patience=10, verbose=1)
checkpoint = ModelCheckpoint(
    os.path.join(config.dir_output, 'best-weights.h5'),
    monitor='val_loss',
    verbose=1,
    save_best_only=True,
    save_weights_only=True
)
model.train_model.fit_generator(train_generator, steps_per_epoch=steps_per_epoch,
                                validation_data=dev_generator,
                                validation_steps=dev_steps, verbose=1,
                                callbacks=[earlystop, checkpoint],
                                shuffle=False, epochs=100)

In addition, I modified the function load_google_bert and commented out the line
`weights[w_id][vocab_size + TextEncoder.EOS_OFFSET] = saved[3 + TextEncoder.BERT_UNUSED_COUNT]`
because the variable `TextEncoder.BERT_SPECIAL_COUNT` is 4 instead of 5,
so the created model does not have that many weights.

name 'spm' is not defined

Traceback (most recent call last):
File "tutorial.py", line 6, in
model_name='tutorial', vocab_size=20)
File "/media/bin_lab/C4F6073207B3A949/Linux/Bert/BERT-keras-master/data/vocab.py", line 59, in init
spm.SentencePieceTrainer.Train(
NameError: name 'spm' is not defined
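Not from the thread; a hedged guess at the cause: `spm` in data/vocab.py refers to the sentencepiece package, so this NameError usually means the package is not installed (the import presumably fails and the name is never bound). A quick check:

# Install the dependency first:
#   pip install sentencepiece
import sentencepiece as spm  # this is the module bound to the name `spm`
print(spm.SentencePieceTrainer)  # should resolve without a NameError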

Possible inconsistency in how ids are used for text encoders

The pad_id as defined in data/vocab.py seems inconsistent with BERT's pad id.

Let's say I run:

bert_text_encoder = BERTTextEncoder(vocab_file = './google_bert/model/uncased_L-12_H-768_A-12/vocab.txt')

(In this example, the path for the vocab file is installed in the google_bert submodule under a directory called model.)

I can get pad_id with:

bert_text_encoder.pad_id

which gives me 30522.

BUT if I dig into the BERT tokenizer I get something else:

bert_text_encoder.tokenizer.inv_vocab[0]

which gives me '[PAD]', suggesting that the real id of the pad token is 0.

In short, the id definitions in the TextEncoder base class look like they disagree with the BERT tokenizer's use of ids.
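A minimal sketch of the mismatch (hypothetical usage, assuming `tokenizer.vocab` is the token-to-id mapping that inverts the `inv_vocab` dict used above):

bert_text_encoder = BERTTextEncoder(vocab_file='./google_bert/model/uncased_L-12_H-768_A-12/vocab.txt')

# TextEncoder convention: its special ids are appended after the original vocabulary.
print(bert_text_encoder.pad_id)                    # 30522
# Google BERT convention: [PAD] sits at the start of the checkpoint's vocabulary.
print(bert_text_encoder.tokenizer.vocab['[PAD]'])  # 0 (assuming vocab is the inverse of inv_vocab)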

How to set up reading comprehension for BERT Keras

I am looking at section 4.2 of the BERT paper on how to set up BERT for reading comprehension. It looks like a module needs to be added on top of BERT, S and E are new parameters, and a log-softmax loss is calculated for the start and end positions.

This extension is included in the original tensorflow BERT in the 'run_squad.py' script in the repository.

Does such an extension exist for BERT-keras?
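There does not appear to be one in the repo; below is only a sketch of the span head described in section 4.2, not the repository's API. It assumes an encoder built with create_transformer / load_google_bert whose first output is the per-token sequence of shape (batch, max_len, hidden).

from keras.layers import Dense, Lambda, Softmax
from keras.models import Model

# encoder = load_google_bert(...)  # hypothetical; built as elsewhere in this repo
seq_output = encoder.outputs[0]                           # (batch, max_len, hidden)
span_logits = Dense(2, name='span_logits')(seq_output)    # one start and one end logit per token
start_logits = Lambda(lambda t: t[:, :, 0])(span_logits)  # (batch, max_len)
end_logits = Lambda(lambda t: t[:, :, 1])(span_logits)
start_probs = Softmax(name='start')(start_logits)         # softmax over positions
end_probs = Softmax(name='end')(end_logits)

squad_model = Model(inputs=encoder.inputs, outputs=[start_probs, end_probs])
# Targets are the start/end token indices of the answer span.
squad_model.compile('adam', loss='sparse_categorical_crossentropy')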

Error in tutorial notebook

I was trying to run tutorial.ipynb and encountered this error while running the following cell.

Cell

# This is a tutorial on using this library
# first off we need a text_encoder so we would know our vocab_size (and later on use it to encode sentences)
from data.vocab import SentencePieceTextEncoder  # you could also import OpenAITextEncoder

sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='openai/model/params_shapes.json',
                                                  model_name='tutorial', vocab_size=20)

Error

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-9-a0a8b2fa2e06> in <module>()
      2 
      3 sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='bert_keras_repo/openai/model/params_shapes.json',
----> 4                                                   model_name='tutorial', vocab_size=20)

/content/bert_keras_repo/data/vocab.py in __init__(self, text_corpus_address, model_name, vocab_size, spm_model_type)
     64                 '--training_sentence_size=100000000'.format(
     65                     input=text_corpus_address, model_name=model_name, vocab_size=vocab_size, coverage=1,
---> 66                     model_type=spm_model_type.lower()))
     67         self.sp = spm.SentencePieceProcessor()
     68         self.sp.load('{}.model'.format(model_name))

OSError: Not found: unknown field name "training_sentence_size" in TrainerSpec.
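Not a confirmed fix; the error suggests the installed sentencepiece release no longer knows the --training_sentence_size trainer flag that vocab.py passes. A hedged workaround is to either pin an older sentencepiece version or train without that flag, roughly like this (the corpus path and vocab_size are just the tutorial's own values):

import sentencepiece as spm

spm.SentencePieceTrainer.Train(
    '--input=openai/model/params_shapes.json --model_prefix=tutorial '
    '--vocab_size=20 --model_type=bpe --character_coverage=1.0')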

Build classifier on top of BERT

Is there any way to train a BERT-based classifier using the [CLS] vector, as described in the BERT paper?

I was able to load BERT encoder successfully using:

bert_encoder = load_google_bert(base_location='./google_bert/uncased_L-12_H-768_A-12/',
                                use_attn_mask=False, max_len=512, verbose=False)

but I am not able to find a way to wrap the encoder in a Keras Model.

I was hoping for something like this:

bert_encoder = load_google_bert(base_location='./google_bert/uncased_L-12_H-768_A-12/',
                                use_attn_mask=False, max_len=512, verbose=False)
outputs = Dense(n_classes, activation='softmax')(bert_encoder.outputs)

classifier = Model(inputs=bert_encoder.inputs, outputs=outputs)
classifier.compile()
classifier.fit()
.....

If there is such a solution, it would be the easiest way possible to use BERT!
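One possible way (only a sketch, not part of the repo): take the first position of the encoder's per-token output as the [CLS] vector and put a Dense softmax on top. It assumes outputs[0] has shape (batch, max_len, hidden) and that position 0 holds the classification token after encoding; n_classes is a placeholder.

from keras.layers import Dense, Lambda
from keras.models import Model

bert_encoder = load_google_bert(base_location='./google_bert/uncased_L-12_H-768_A-12/',
                                use_attn_mask=False, max_len=512, verbose=False)

# Slice out the first token's vector as the sentence representation.
cls_vector = Lambda(lambda seq: seq[:, 0, :], name='cls_token')(bert_encoder.outputs[0])
probs = Dense(n_classes, activation='softmax', name='classifier')(cls_vector)  # n_classes: placeholder

classifier = Model(inputs=bert_encoder.inputs, outputs=probs)
classifier.compile('adam', 'sparse_categorical_crossentropy', metrics=['accuracy'])
# classifier.fit(...)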

Error while compiling on TPU

I am using tf 1.13 on Google Colab.
When compiling the model for TPU I got an issue:

training_model_tpu = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/utils/generic_utils.py in class_and_config_for_serialized_keras_object(config, module_objects, custom_objects, printable_module_name)
    164     cls = module_objects.get(class_name)
    165     if cls is None:
--> 166       raise ValueError('Unknown ' + printable_module_name + ': ' + class_name)
    167   return (cls, config['config'])
    168 

ValueError: Unknown layer: LayerNormalization

How do I compile a BERT-keras model for TPU?
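Not a verified fix; keras_to_tpu_model re-serializes the model, so deserialization has to know the repo's custom layers. A hedged workaround is to register them in Keras' global custom-object registry before converting (assuming LayerNormalization is importable from transformer.layers; other custom layers may need registering too, and a model built with standalone Keras may need keras.utils.get_custom_objects() instead).

import tensorflow as tf
from transformer.layers import LayerNormalization  # assumption: the class lives here

# Make the custom layer visible to Keras deserialization before converting.
tf.keras.utils.get_custom_objects().update({'LayerNormalization': LayerNormalization})

training_model_tpu = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_ADDRESS)))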

ValueError: You need tensorflow >= 1.3 for better keras tpu support!

When running the example notebook (on Colab), I receive the following error:

ValueError: You need tensorflow >= 1.3 for better keras tpu support!

This error appears when I execute:

g_bert.compile('adam', 'mse')

Any idea where it comes from?

I checked the version of TensorFlow with:

!pip list | grep tensorflow

And it gives the expected result:

mesh-tensorflow 0.0.5
tensorflow 1.12.0
tensorflow-hub 0.2.0
tensorflow-metadata 0.9.0
tensorflow-probability 0.5.0

@HighCWu

Fine-tuning BERT-keras

I'm trying to fine-tune BERT-keras on the STS-B dataset.

Has anyone already used this repo to fine-tune BERT on an end-to-end task? Is there a code example for this?

I'm having difficulty making it work. My runtime dies before training on even a single batch...

You can take a look at my notebook here: Colab

@HighCWu

How can I apply BERT to a cloze task?

Hi, I have a dataset like :

From Monday to Friday most people are busy working or studying, but in the evenings and weekends they are free and _ themselves.

And there are four candidates for the missing blank area:

["love", "work", "enjoy", "play"], here "enjoy" is the correct answer, it is a cloze-style task, and it looks like the maskLM in the BERT, the difference is that I don't want to search the candidate from all the tokens but the four given candidates, how can I do this? It looks like negtive sampling method. Do you have any idea? Thank you!

What's the meaning of TextEncoder.BERT_SPECIAL_COUNT and TextEncoder.BERT_UNUSED_COUNT?

When I use BERT-keras, I don't understand this part:

class TextEncoder:
    PAD_OFFSET = 0
    MSK_OFFSET = 1
    BOS_OFFSET = 2
    DEL_OFFSET = 3  # delimiter
    EOS_OFFSET = 4
    SPECIAL_COUNT = 5
    NUM_SEGMENTS = 2
    BERT_UNUSED_COUNT = 99  # bert pretrained models
    BERT_SPECIAL_COUNT = 4  # they don't have DEL

Why would you set it up like this? BERT_UNUSED_COUNT = 99 and BERT_SPECIAL_COUNT = 4 are used in load_google_bert.
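A hedged illustration only (inferred from how pad_id and load_google_bert behave elsewhere on this page, not from the repo's documentation): the library appends its own five special tokens after the original vocabulary, so the offsets turn into ids roughly like this.

vocab_size = 30522  # bert-base uncased

pad_id = vocab_size + 0  # TextEncoder.PAD_OFFSET -> 30522, matching the pad_id issue above
msk_id = vocab_size + 1  # MSK_OFFSET
bos_id = vocab_size + 2  # BOS_OFFSET
del_id = vocab_size + 3  # DEL_OFFSET (no Google BERT counterpart, hence BERT_SPECIAL_COUNT = 4)
eos_id = vocab_size + 4  # EOS_OFFSET
print(pad_id, msk_id, bos_id, del_id, eos_id)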

unknown field name "training_sentence_size" in TrainerSpec.

Hi
Just running the first cell of tutorial.ipynb
from data.vocab import SentencePieceTextEncoder # you could also import OpenAITextEncoder

sentence_piece_encoder = SentencePieceTextEncoder(text_corpus_address='/openai/model/params_shapes.json',model_name='tutorial', vocab_size=20)

I'm facing this error from the vocab.py file:

File "/Users/shabnamrashtchi/Dropbox/Deep leanring 2019_2020 reserch/Embedinng/BERT-keras-2/data/vocab.py", line 69, in init
model_type=spm_model_type.lower()))

OSError: Not found: unknown field name "training_sentence_size" in TrainerSpec.

Poor performance and poor results

I'm trying to fine-tune BERT on the STS-B dataset.

I used the following notebook to fine-tune it using BERT-keras.
(As described in the paper, I just added a classification layer using the [CLS] token of the output of BERT.)

However, there are large differences in performance and results between this notebook and the official fine-tuning script:

Metric          BERT-keras   Official BERT
Pearson         0.0254       0.8956
Spearman        0.0289       0.7942
MSE             2.2691       0.5456
Training time   9 h          10 min

Note: Pearson and Spearman correlation are the metrics used to evaluate accuracy on the STS-B dataset.


Why is there such a difference between the two approaches?

error of sparse_categorical_crossentropy when using theano backend

There is no problem at all when using the TensorFlow backend; now I am testing Theano.
When running train_model from tutorial.ipynb, T.nnet.softmax() inside K.sparse_categorical_crossentropy raises an "x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)" error:

<ipython-input-22-27837df85ad1> in classification_loss(y_true, y_pred)
      2 import keras.backend as K
      3 def classification_loss(y_true, y_pred):
----> 4     return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
      5 train.classification_loss = classification_loss

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in sparse_categorical_crossentropy(target, output, from_logits, axis)
   1788     target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])
   1789     target = reshape(target, shape(output))
-> 1790     return categorical_crossentropy(target, output, from_logits, axis=-1)
   1791 
   1792 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in categorical_crossentropy(target, output, from_logits, axis)
   1762         target = permute_dimensions(target, permutation)
   1763     if from_logits:
-> 1764         output = T.nnet.softmax(output)
   1765     else:
   1766         # scale preds so that the class probas of each sample sum to 1

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in softmax(c)
    813     if c.broadcastable[-1]:
    814         warnings.warn("The softmax is applied on a dimension of shape 1, which does not have a semantic meaning.")
--> 815     return softmax_op(c)
    816 
    817 

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in __call__(self, *inputs, **kwargs)
    613         """
    614         return_list = kwargs.pop('return_list', False)
--> 615         node = self.make_node(*inputs, **kwargs)
    616 
    617         if config.compute_test_value != 'off':

/usr/local/lib/python3.6/dist-packages/theano/tensor/nnet/nnet.py in make_node(self, x)
    428                 or x.type.dtype not in tensor.float_dtypes:
    429             raise ValueError('x must be 1-d or 2-d tensor of floats. Got %s' %
--> 430                              x.type)
    431         if x.ndim == 1:
    432             warnings.warn("DEPRECATION: If x is a vector, Softmax will not automatically pad x "

ValueError: x must be 1-d or 2-d tensor of floats. Got TensorType(float32, 3D)

Then I use this to avoid it:

import keras.backend as K

# Theano's softmax op only accepts 1-d or 2-d input, so monkey-patch it to
# collapse the first two axes of a 3-d tensor, apply softmax, and reshape back.
_softmax = K.T.nnet.softmax

def softmax(x):
    if x.ndim == 3:
        d1, d2, d3 = x.shape
        return _softmax(x.reshape((d1 * d2, d3))).reshape((d1, d2, d3))
    return _softmax(x)

K.T.nnet.softmax = softmax

but when I run

m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
                finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
                finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)

again, I get this error:

/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_logits and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 6), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_flatten and cannot be automatically inferred with the Theano backend. Defaulting to output shape `(None, 8, 6)` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_gather and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 8, 6), (None, 1)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer odd_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 1), (None, 8, 2)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
/usr/local/lib/python3.6/dist-packages/keras/layers/core.py:665: UserWarning: `output_shape` argument not specified for layer lm_random_loss and cannot be automatically inferred with the Theano backend. Defaulting to output shape `[(None, 1), (None, 8), (None, 8, 25), (None, 8)]` (same as input shape). If the expected output shape is different, specify it via the `output_shape` argument.
  .format(self.name, input_shape))
Epoch 1/100
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:

IndexError: index 8 is out of bounds for axis 1 with size 6

During handling of the above exception, another exception occurred:

IndexError                                Traceback (most recent call last)
<ipython-input-39-7b7276d2ce06> in <module>()
      1 m = train_model(base_model=sequence_encoder, is_causal=False, tasks_meta_data=tasks, pretrain_generator=generator,
      2                 finetune_generator=generator, pretrain_epochs=100, pretrain_steps=number_of_pretrain_steps // 100,
----> 3                 finetune_epochs=100, finetune_steps=number_of_finetune_steps // 100, verbose=2, TPUStrategy=strategy)
      4 # now m is ready to be used!
      5 print(m.inputs)

/content/bert_keras_repo/transformer/train.py in train_model(base_model, is_causal, tasks_meta_data, pretrain_generator, finetune_generator, pretrain_epochs, pretrain_optimizer, pretrain_steps, pretrain_callbacks, finetune_epochs, finetune_optimizer, finetune_steps, finetune_callbacks, verbose, TPUStrategy)
    145 
    146     if pretrain_generator is not None:
--> 147         train_step(True)
    148     if finetune_generator is not None:
    149         train_step(False)

/content/bert_keras_repo/transformer/train.py in train_step(is_pretrain)
    142         _model.fit_generator(_generator, steps_per_epoch=pretrain_steps if is_pretrain else finetune_steps,
    143                              verbose=verbose, callbacks=pretrain_callbacks if is_pretrain else finetune_callbacks,
--> 144                              shuffle=False, epochs=pretrain_epochs if is_pretrain else finetune_epochs)
    145 
    146     if pretrain_generator is not None:

/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

/usr/local/lib/python3.6/dist-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py in __call__(self, inputs)
   1386     def __call__(self, inputs):
   1387         assert isinstance(inputs, (list, tuple))
-> 1388         return self.function(*inputs)
   1389 
   1390 

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    915                     node=self.fn.nodes[self.fn.position_of_error],
    916                     thunk=thunk,
--> 917                     storage_map=getattr(self.fn, 'storage_map', None))
    918             else:
    919                 # old-style linkers raise their own exceptions

/usr/local/lib/python3.6/dist-packages/theano/gof/link.py in raise_with_op(node, thunk, exc_info, storage_map)
    323         # extra long error message in that case.
    324         pass
--> 325     reraise(exc_type, exc_value, exc_trace)
    326 
    327 

/usr/local/lib/python3.6/dist-packages/six.py in reraise(tp, value, tb)
    690                 value = tp()
    691             if value.__traceback__ is not tb:
--> 692                 raise value.with_traceback(tb)
    693             raise value
    694         finally:

/usr/local/lib/python3.6/dist-packages/theano/compile/function_module.py in __call__(self, *args, **kwargs)
    901         try:
    902             outputs =\
--> 903                 self.fn() if output_subset is None else\
    904                 self.fn(output_subset=output_subset)
    905         except Exception:

/usr/local/lib/python3.6/dist-packages/theano/gof/op.py in rval(p, i, o, n)
    890             # default arguments are stored in the closure of `rval`
    891             def rval(p=p, i=node_input_storage, o=node_output_storage, n=node):
--> 892                 r = p(n, [x[0] for x in i], o)
    893                 for o in node.outputs:
    894                     compute_map[o][0] = True

/usr/local/lib/python3.6/dist-packages/theano/tensor/subtensor.py in perform(self, node, inputs, out_)
   2337 
   2338         if self.set_instead_of_inc:
-> 2339             out[0][inputs[2:]] = inputs[1]
   2340         else:
   2341             np.add.at(out[0], tuple(inputs[2:]), inputs[1])

IndexError: index 8 is out of bounds for axis 1 with size 6
Apply node that caused the error: AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}(Alloc.0, TensorConstant{1}, ARange{dtype='int64'}.0, Reshape{1}.0)
Toposort index: 315
Inputs types: [TensorType(float32, matrix), TensorType(int8, scalar), TensorType(int64, vector), TensorType(int32, vector)]
Inputs shapes: [(64, 6), (), (64,), (64,)]
Inputs strides: [(24, 4), (), (8,), (4,)]
Inputs values: ['not shown', array(1, dtype=int8), 'not shown', 'not shown']
Outputs clients: [[Reshape{3}(AdvancedIncSubtensor{inplace=False,  set_instead_of_inc=True}.0, MakeVector{dtype='int64'}.0)]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
  File "bert_keras_repo/transformer/train.py", line 68, in train_model
    [task_loss_weight, task_target, logits, task_mask])
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/base_layer.py", line 457, in __call__
    output = self.call(inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/layers/core.py", line 687, in call
    return self.function(inputs, **arguments)
  File "bert_keras_repo/transformer/train.py", line 67, in <lambda>
    task_loss = Lambda(lambda x: x[0] * masked_classification_loss(x[1], x[2], x[3]), name=task.name + '_loss')(
  File "bert_keras_repo/transformer/train.py", line 20, in masked_classification_loss
    return _mask_loss(y_true, y_pred, y_mask, classification_loss)
  File "bert_keras_repo/transformer/train.py", line 11, in _mask_loss
    l = K.switch(y_mask, element_wise_loss(y_true, y_pred), K.zeros_like(y_mask, dtype=K.floatx()))
  File "<ipython-input-22-27837df85ad1>", line 4, in classification_loss
    return K.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)
  File "/usr/local/lib/python3.6/dist-packages/keras/backend/theano_backend.py", line 1788, in sparse_categorical_crossentropy
    target = T.extra_ops.to_one_hot(target, nb_class=output.shape[-1])

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

It seems it's not a bug in my code, because I checked out the branch from before TPU support and the error is still there.

Some issues when using training and lm_gen

Hi Separius,

Thank you, and really nice work! I think line 118 in lm_dataset should be [0] * (len(sent.tokens) + 2), right? After I fixed this, the LM can run. Many thanks.
