kpe / bert-for-tf2

A Keras TensorFlow 2.0 implementation of BERT, ALBERT and adapter-BERT.

Home Page: https://github.com/kpe/bert-for-tf2

License: MIT License

Languages: Python 99.80%, Shell 0.20%
Topics: bert, keras, tensorflow, transformer

bert-for-tf2's Introduction

BERT for TensorFlow v2


This repo contains a TensorFlow 2.0 Keras implementation of google-research/bert with support for loading the original pre-trained weights, and it produces activations numerically identical to those calculated by the original model.

ALBERT and adapter-BERT are also supported by setting the corresponding configuration parameters (shared_layer=True and embedding_size for ALBERT, adapter_size for adapter-BERT). Setting both results in an adapter-ALBERT: the BERT parameters are shared across all layers, while every layer is adapted with a layer-specific adapter.
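For example, an adapter-ALBERT configuration could look like the following (a minimal sketch; the parameter names are those used in the Usage section below, the concrete values are only illustrative, and all other parameters keep their defaults):

from bert import BertModelLayer

l_albert = BertModelLayer(**BertModelLayer.Params(
    vocab_size     = 30000,
    num_layers     = 12,
    hidden_size    = 768,

    shared_layer   = True,   # share encoder weights across layers (ALBERT)
    embedding_size = 128,    # factorized wordpiece embedding size (ALBERT)
    adapter_size   = 64,     # add adapter-BERT style adapter layers

    name           = "albert"
))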

The implementation is built from scratch using only basic TensorFlow operations, following the code in google-research/bert/modeling.py (but skipping dead code and applying some simplifications). It also utilizes kpe/params-flow to reduce common Keras boilerplate code (related to passing model and layer configuration arguments).

bert-for-tf2 should work with both TensorFlow 2.0 and TensorFlow 1.14 or newer.

NEWS

  • 30.Jul.2020 - VERBOSE=0 env variable for suppressing stdout output.
  • 06.Apr.2020 - using latest py-params introducing WithParams base for Layer and Model. See the news in kpe/py-params for how to update (the _construct() signature has changed and requires calling super().__construct()).
  • 06.Jan.2020 - support for loading the tar format weights from google-research/ALBERT.
  • 18.Nov.2019 - ALBERT tokenization added (make sure to import as from bert import albert_tokenization or from bert import bert_tokenization).
  • 08.Nov.2019 - using v2 by default when loading the TFHub/albert weights of google-research/ALBERT.
  • 05.Nov.2019 - minor ALBERT word embeddings refactoring (word_embeddings_2 -> word_embeddings_projector) and related parameter freezing fixes.
  • 04.Nov.2019 - support for extra (task specific) token embeddings using negative token ids.
  • 29.Oct.2019 - support for loading of the pre-trained ALBERT weights released by google-research/ALBERT at TFHub/albert.
  • 11.Oct.2019 - support for loading of the pre-trained ALBERT weights released by brightmart/albert_zh ALBERT for Chinese.
  • 10.Oct.2019 - support for ALBERT through the shared_layer=True and embedding_size=128 params.
  • 03.Sep.2019 - walkthrough on fine tuning with adapter-BERT and storing the fine tuned fraction of the weights in a separate checkpoint (see tests/test_adapter_finetune.py).
  • 02.Sep.2019 - support for extending the token type embeddings of a pre-trained model by returning the mismatched weights in load_stock_weights() (see tests/test_extend_segments.py).
  • 25.Jul.2019 - there are now two colab notebooks under examples/ showing how to fine-tune an IMDB Movie Reviews sentiment classifier from pre-trained BERT weights using an adapter-BERT model architecture on a GPU or TPU in Google Colab.
  • 28.Jun.2019 - v.0.3.0 supports adapter-BERT (google-research/adapter-bert) for "Parameter-Efficient Transfer Learning for NLP", i.e. fine-tuning small overlay adapter layers over BERT's transformer encoders without changing the frozen BERT weights.

LICENSE

MIT. See License File.

Install

bert-for-tf2 is on the Python Package Index (PyPI):

pip install bert-for-tf2

Usage

BERT in bert-for-tf2 is implemented as a Keras layer. You could instantiate it like this:

from bert import BertModelLayer

l_bert = BertModelLayer(**BertModelLayer.Params(
  vocab_size               = 16000,        # embedding params
  use_token_type           = True,
  use_position_embeddings  = True,
  token_type_vocab_size    = 2,

  num_layers               = 12,           # transformer encoder params
  hidden_size              = 768,
  hidden_dropout           = 0.1,
  intermediate_size        = 4*768,
  intermediate_activation  = "gelu",

  adapter_size             = None,         # see arXiv:1902.00751 (adapter-BERT)

  shared_layer             = False,        # True for ALBERT (arXiv:1909.11942)
  embedding_size           = None,         # None for BERT, wordpiece embedding size for ALBERT

  name                     = "bert"        # any other Keras layer params
))

or by using the bert_config.json from a pre-trained google model:

import bert

model_dir = ".models/uncased_L-12_H-768_A-12"

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

now you can use the BERT layer in your Keras model like this:

from tensorflow import keras

max_seq_len = 128
l_input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32')
l_token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')

# using the default token_type/segment id 0
output = l_bert(l_input_ids)                              # output: [batch_size, max_seq_len, hidden_size]
model = keras.Model(inputs=l_input_ids, outputs=output)
model.build(input_shape=(None, max_seq_len))

# provide a custom token_type/segment id as a layer input
output = l_bert([l_input_ids, l_token_type_ids])          # [batch_size, max_seq_len, hidden_size]
model = keras.Model(inputs=[l_input_ids, l_token_type_ids], outputs=output)
model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])

if you choose to use adapter-BERT by setting the adapter_size parameter, you will likely also want to freeze all the original BERT layers by calling:

l_bert.apply_adapter_freeze()

and once the model has been built or compiled, the original pre-trained weights can be loaded into the BERT layer:

import os
import bert

bert_ckpt_file   = os.path.join(model_dir, "bert_model.ckpt")
bert.load_stock_weights(l_bert, bert_ckpt_file)

N.B. see tests/test_bert_activations.py for a complete example.
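Putting the pieces above together, a minimal end-to-end sketch might look like this (assuming a Google BERT checkpoint unpacked under model_dir as above; the [CLS]-based classification head is only illustrative):

import os
import bert
from tensorflow import keras

model_dir   = ".models/uncased_L-12_H-768_A-12"
max_seq_len = 128

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

l_input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')
output  = l_bert(l_input_ids)                                  # [batch_size, max_seq_len, hidden_size]
cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)
logits  = keras.layers.Dense(2)(cls_out)

model = keras.Model(inputs=l_input_ids, outputs=logits)
model.build(input_shape=(None, max_seq_len))

# the pre-trained weights are loaded only after model.build()/compile()
bert.load_stock_weights(l_bert, os.path.join(model_dir, "bert_model.ckpt"))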

FAQ

  1. In all the examples below, please note the line:
# use in a Keras Model here, and call model.build()

for a quick test, you can replace it with something like:

model = keras.models.Sequential([
  keras.layers.InputLayer(input_shape=(128,)),
  l_bert,
  keras.layers.Lambda(lambda x: x[:, 0, :]),
  keras.layers.Dense(2)
])
model.build(input_shape=(None, 128))
  2. How to use BERT with the google-research/bert pre-trained weights?
model_name = "uncased_L-12_H-768_A-12"
model_dir = bert.fetch_google_bert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in a Keras Model here, and call model.build()

bert.load_bert_weights(l_bert, model_ckpt)      # should be called after model.build()
  3. How to use ALBERT with the google-research/ALBERT pre-trained weights (fetching from TFHub)?

see tests/nonci/test_load_pretrained_weights.py:

model_name = "albert_base"
model_dir    = bert.fetch_tfhub_albert_model(model_name, ".models")
model_params = bert.albert_params(model_name)
l_bert = bert.BertModelLayer.from_params(model_params, name="albert")

# use in a Keras Model here, and call model.build()

bert.load_albert_weights(l_bert, model_dir)      # should be called after model.build()
  4. How to use ALBERT with the google-research/ALBERT pre-trained weights (non TFHub)?

see tests/nonci/test_load_pretrained_weights.py:

model_name = "albert_base_v2"
model_dir    = bert.fetch_google_albert_model(model_name, ".models")
model_ckpt   = os.path.join(model_dir, "model.ckpt-best")

model_params = bert.albert_params(model_dir)
l_bert = bert.BertModelLayer.from_params(model_params, name="albert")

# use in a Keras Model here, and call model.build()

bert.load_albert_weights(l_bert, model_ckpt)      # should be called after model.build()
  5. How to use ALBERT with the brightmart/albert_zh pre-trained weights?

see tests/nonci/test_albert.py:

model_name = "albert_base"
model_dir = bert.fetch_brightmart_albert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "albert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in a Keras Model here, and call model.build()

bert.load_albert_weights(l_bert, model_ckpt)      # should be called after model.build()
  6. How to tokenize the input for the google-research/bert models?
do_lower_case = not (model_name.find("cased") == 0 or model_name.find("multi_cased") == 0)
bert.bert_tokenization.validate_case_matches_checkpoint(do_lower_case, model_ckpt)
vocab_file = os.path.join(model_dir, "vocab.txt")
tokenizer = bert.bert_tokenization.FullTokenizer(vocab_file, do_lower_case)
tokens = tokenizer.tokenize("Hello, BERT-World!")
token_ids = tokenizer.convert_tokens_to_ids(tokens)
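The snippet above produces ids for the raw tokens only; to feed them to a model built as in the Usage section you would typically also add the [CLS]/[SEP] markers and pad to max_seq_len. A minimal sketch (padding with id 0, i.e. [PAD], follows the original BERT convention and is an assumption here):

import numpy as np

max_seq_len = 128
tokens    = ["[CLS]"] + tokenizer.tokenize("Hello, BERT-World!") + ["[SEP]"]
token_ids = tokenizer.convert_tokens_to_ids(tokens)
token_ids = token_ids + [0] * (max_seq_len - len(token_ids))   # pad up to max_seq_len

input_ids = np.array([token_ids], dtype='int32')               # shape: [1, max_seq_len]
# input_ids can now be passed to model.predict() of a model built for max_seq_len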
  7. How to tokenize the input for brightmart/albert_zh?
import params_flow as pf

# fetch the vocab file
albert_zh_vocab_url = "https://raw.githubusercontent.com/brightmart/albert_zh/master/albert_config/vocab.txt"
vocab_file = pf.utils.fetch_url(albert_zh_vocab_url, model_dir)

tokenizer = bert.albert_tokenization.FullTokenizer(vocab_file)
tokens = tokenizer.tokenize("你好世界")
token_ids = tokenizer.convert_tokens_to_ids(tokens)
  8. How to tokenize the input for the google-research/ALBERT models?
import sentencepiece as spm

spm_model = os.path.join(model_dir, "assets", "30k-clean.model")
sp = spm.SentencePieceProcessor()
sp.load(spm_model)
do_lower_case = True

processed_text = bert.albert_tokenization.preprocess_text("Hello, World!", lower=do_lower_case)
token_ids = bert.albert_tokenization.encode_ids(sp, processed_text)
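As in the WordPiece case, the encoded ids cover the text only; the [CLS]/[SEP] control ids can be looked up from the SentencePiece model and the sequence padded. A minimal sketch (it assumes the released ALBERT SentencePiece model contains the "[CLS]" and "[SEP]" pieces and uses id 0, i.e. <pad>, for padding):

max_seq_len = 128
cls_id = sp.piece_to_id("[CLS]")
sep_id = sp.piece_to_id("[SEP]")

input_ids = [cls_id] + token_ids + [sep_id]
input_ids = input_ids + [0] * (max_seq_len - len(input_ids))   # pad up to max_seq_len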
  9. How to tokenize the input for the Chinese google-research/ALBERT models?
import bert

vocab_file = os.path.join(model_dir, "vocab.txt")
tokenizer = bert.albert_tokenization.FullTokenizer(vocab_file=vocab_file)
tokens = tokenizer.tokenize(u"你好世界")
token_ids = tokenizer.convert_tokens_to_ids(tokens)

Resources

bert-for-tf2's People

Contributors

birdmw, dfarren, kpe


bert-for-tf2's Issues

get error when predict

I load a pretrained Chinese model and predict semantic similarity, then I get the following error message:

File "Predictor.py", line 154, in _classify
outputs = self.bert(inputs, mask, training);
File "/home/xieyi/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 712, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/xieyi/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 753, in call
return self._run_internal_graph(inputs, training=training, mask=mask)
File "/home/xieyi/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/network.py", line 905, in _run_internal_graph
assert str(id(x)) in tensor_dict, 'Could not compute output ' + str(x)
AssertionError: Could not compute output Tensor("bert/Identity:0", shape=(None, 128, 768), dtype=float32)

I am using tensorflow 2.0 beta1. I load the pretrained BERT with the following code:

#!/usr/bin/python3

import os;
import tensorflow as tf;
from bert import BertModelLayer;
from bert.loader import StockBertConfig, load_stock_weights;
from bert.tokenization import FullTokenizer;

def flatten_layers(root_layer):
    if isinstance(root_layer, tf.keras.layers.Layer):
        yield root_layer
    for layer in root_layer._layers:
        for sub_layer in flatten_layers(layer):
            yield sub_layer

def freeze_bert_layers(l_bert):
    """
    Freezes all but LayerNorm and adapter layers - see arXiv:1902.00751.
    """
    for layer in flatten_layers(l_bert):
        if layer.name in ["LayerNorm", "adapter-down", "adapter-up"]:
            layer.trainable = True
        elif len(layer._layers) == 0:
            layer.trainable = False
        l_bert.embeddings_layer.trainable = False

def BERT(max_seq_len = 128, bert_model_dir = 'models/chinese_L-12_H-768_A-12', do_lower_case = False):

    # load bert parameters
    with tf.io.gfile.GFile(os.path.join(bert_model_dir, "bert_config.json"), "r") as reader:
        stock_params = StockBertConfig.from_json_string(reader.read());
        bert_params = stock_params.to_bert_model_layer_params();
    # create bert structure according to the parameters
    bert = BertModelLayer.from_params(bert_params, name = "bert");
    # inputs
    input_token_ids = tf.keras.Input((max_seq_len,), dtype = tf.int32, name = 'input_ids');
    input_segment_ids = tf.keras.Input((max_seq_len,), dtype = tf.int32, name = 'token_type_ids');
    # outputs
    output = bert([input_token_ids, input_segment_ids]);
    # create model containing only bert layer
    model = tf.keras.Model(inputs = [input_token_ids, input_segment_ids], outputs = output);
    model.build(input_shape = [(None, max_seq_len), (None, max_seq_len)]);
    # freeze_bert_layers
    freeze_bert_layers(bert);
    # load bert layer weights
    load_stock_weights(bert, os.path.join(bert_model_dir, "bert_model.ckpt"));
    # create tokenizer, chinese character needs no lower case.
    tokenizer = FullTokenizer(vocab_file = os.path.join(bert_model_dir, "vocab.txt"), do_lower_case = do_lower_case);
    return model, tokenizer;

AttributeError: module 'tensorflow' has no attribute 'logging'

I think tf.logging is gone and we need to remove all tf.logging.

python3.6/site-packages/bert/tokenization/albert_tokenization.py", line 242, in __init__
    tf.logging.info("loading sentence piece model")
AttributeError: module 'tensorflow' has no attribute 'logging'

ImportError: No module named 'params_flow'

looks like an issue with pip install and a missing module?

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-p1fqbaa5/bert-for-tf2/setup.py", line 10, in <module>
        import bert
      File "/tmp/pip-install-p1fqbaa5/bert-for-tf2/bert/__init__.py", line 7, in <module>
        from .layer import Layer
      File "/tmp/pip-install-p1fqbaa5/bert-for-tf2/bert/layer.py", line 10, in <module>
        import params_flow
    ImportError: No module named 'params_flow'
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Finetune Albert on MovieReview dataset

Hi!
I tried finetuning the albert base/large model on the MovieReview dataset that is used in the bert example.
The model is created like this:

def create_model(max_seq_len):
    albert_model_name = "albert_base"
    albert_dir = bert.fetch_tfhub_albert_model(albert_model_name, ".models")
    model_params = bert.albert_params(albert_dir)
    l_bert = bert.BertModelLayer.from_params(model_params, name="albert")

    input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
    #token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="token_type_ids")
    #output         = l_bert([input_ids, token_type_ids])
    output         = l_bert(input_ids)

    print("bert shape", output.shape)
    cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)
    cls_out = keras.layers.Dropout(0.5)(cls_out)
    logits = keras.layers.Dense(units=1024, activation="tanh")(cls_out)
    logits = keras.layers.Dropout(0.5)(logits)
    logits = keras.layers.Dense(units=2, activation="softmax")(logits)

    # model = keras.Model(inputs=[input_ids, token_type_ids], outputs=logits)
    # model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])
    model = keras.Model(inputs=input_ids, outputs=logits)
    model.build(input_shape=(None, max_seq_len))
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=[keras.metrics.SparseCategoricalAccuracy(name="acc")])

    # load the pre-trained model weights
    bert.load_albert_weights(l_bert, albert_dir)

    model.summary()

    return model

I've used from bert.tokenization.albert_tokenization import FullTokenizer for tokenization

Everything else is like in the provided bert example.

When executing the training loop, the accuracy stays at 50% and the loss doesn't really change from 0.7.

Has anybody successfully finetuned any pretrained albert model on the MovieReview dataset? If yes, what am I doing wrong? Thanks in advance!

How to get MLM outputs?

Hello, great library!
How can I return the vocabulary-sized logits used in the Masked Language Model instead of embeddings? Any examples?

Problems loading the readme

I am attempting to get a pre-trained BERT layer working in TF 2.0. Said differently, I don't have the computing resources to train BERT myself, and as a result I am looking to use pre-trained weights so that I can do an advanced sentiment analysis in English.

When I run the README python (below), I keep getting list index out of range errors. I am not sure if the ckpt file is correct or if I am not pulling the weights from the correct location. Thoughts?

wget https://storage.googleapis.com/bert_models/2018_11_03/multilingual_L-12_H-768_A-12.zip
unzip multilingual_L-12_H-768_A-12.zip


ralph_a_brooks@main-keras-p4-instance:~$ pwd
/home/ralph_a_brooks
ralph_a_brooks@main-keras-p4-instance:~$ cd multilingual_L-12_H-768_A-12
ralph_a_brooks@main-keras-p4-instance:~/multilingual_L-12_H-768_A-12$ ls
bert_config.json  bert_model.ckpt.data-00000-of-00001  bert_model.ckpt.index  bert_model.ckpt.meta  vocab.txt
ralph_a_brooks@main-keras-p4-instance:~/multilingual_L-12_H-768_A-12$ python
Python 3.7.0 (default, Oct  9 2018, 10:31:47) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> import tensorflow as tf
>>> from tensorflow.python import keras
>>> from bert import BertModelLayer
>>> from bert.loader import StockBertConfig, load_stock_weights
>>> 
>>> print(os.environ['HOME_PATH'])
/home/ralph_a_brooks
>>> 
>>> model_dir = os.environ['HOME_PATH']+"/multilingual_L-12_H-768_A-12"
>>> print(model_dir)
/home/ralph_a_brooks/multilingual_L-12_H-768_A-12
>>> 
>>> bert_config_file = os.path.join(model_dir, "bert_config.json")
>>> bert_ckpt_file   = os.path.join(model_dir, "bert_model.ckpt")
>>> 
>>> with tf.io.gfile.GFile(bert_config_file, "r") as reader:
...   stock_params = StockBertConfig.from_json_string(reader.read())
...   bert_params  = stock_params.to_bert_model_layer_params()
... 
>>> l_bert = BertModelLayer.from_params(bert_params, name="bert")
>>> load_stock_weights(l_bert, bert_ckpt_file)
WARNING: Logging before flag parsing goes to stderr.
W0605 17:39:47.570813 139787705705856 deprecation.py:323] From /home/ralph_a_brooks/.local/lib/python3.7/site-packages/bert/loader.py:113: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ralph_a_brooks/.local/lib/python3.7/site-packages/bert/loader.py", line 116, in load_stock_weights
    bert_prefix = bert.weights[0].name.split("/")[0]
IndexError: list index out of range

ImportError: cannot import name 'FullTokenizer'

Hi,

when running the jupyter notebook, I get the error:

ImportError: cannot import name 'FullTokenizer'

Tensorflow 2.0

I looked at the python environment:

# grep -i full *
grep: __pycache__: Is a directory
albert_tokenization.py:class FullTokenizer(object):
albert_tokenization.py:        return FullTokenizer(vocab_file, do_lower_case, spm_model_file)
albert_tokenization.py:        return FullTokenizer(
bert_tokenization.py:class FullTokenizer(object):

Also 'from bert.tokenization import *' later on gives the error: FullTokenizer not defined

Any idea / help

Regards Heiko

installation error: Unicode Error

Looking in indexes: https://pypi.org/simple, https://pip:****@pip.ml.moodysanalytics.com/simple
Collecting bert-for-tf2
  Using cached https://files.pythonhosted.org/packages/93/31/1f9d1d5ccafb5b8bb621b02c4c5bd9e9f6599ec9b305f7307f1b6c5ae0b5/bert-for-tf2-0.12.7.tar.gz
  ERROR: Command errored out with exit status 1:
   command: /workspaces/albert/anaconda3/envs/tf_gpu/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-mzla9yhq/bert-for-tf2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-mzla9yhq/bert-for-tf2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-mzla9yhq/bert-for-tf2/pip-egg-info
       cwd: /tmp/pip-install-mzla9yhq/bert-for-tf2/
  Complete output (7 lines):
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-install-mzla9yhq/bert-for-tf2/setup.py", line 21, in <module>
      long_description = fh.read()
    File "/workspaces/albert/anaconda3/envs/tf_gpu/lib/python3.6/encodings/ascii.py", line 26, in decode
      return codecs.ascii_decode(input, self.errors)[0]
  UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 8479: ordinal not in range(128)
  ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

Environment:
Python: 3.6.9
OS: CentOS

Question: Sentence Embedding from BERT layer

Hi, I've followed your guide for implementing the BERT model as a Keras layer. I have a question about the output of this layer; I've written this model:

model_word_embedding = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(4,), dtype='int32', name='input_ids'),
    bert_layer
])

Then I want to extract the embeddings for a word:

sentences = ["ciao"]
predict = model_word_embedding.predict(sentences)

I receive this

print(predict)
print(len(predict))

...

[[[-0.02768866 -0.7341324   1.9084396  ... -0.65953904  0.26496622
    1.1610721 ]
  [-0.19322394 -1.3134469   0.10383344 ...  1.1250225  -0.2988368
   -0.2323082 ]
  [-1.4576151  -1.4579685   0.78580517 ... -0.8898649  -1.1016986
    0.6008501 ]
  [ 1.41647    -0.92478925 -1.3651332  ... -0.9197768  -1.5469263
    0.03305872]]]
4

My question is: since I passed only one word with max_seq_len equal to 4, I expected one vector in the output instead of 4.
Why 4 vectors?
How can I obtain the embeddings for a sentence?
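The layer returns one hidden vector per input position, i.e. a [batch_size, max_seq_len, hidden_size] tensor, which is why four vectors come back for max_seq_len = 4. A single sentence vector is usually obtained by pooling the per-token outputs, e.g. taking the first ([CLS]) position or averaging over the sequence. A minimal sketch, assuming bert_layer is the layer from the snippet above:

import tensorflow as tf
from tensorflow import keras

max_seq_len = 4
input_ids  = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name='input_ids')
seq_output = bert_layer(input_ids)                      # [batch_size, max_seq_len, hidden_size]

cls_vector  = keras.layers.Lambda(lambda seq: seq[:, 0, :])(seq_output)                 # [CLS] position
mean_vector = keras.layers.Lambda(lambda seq: tf.reduce_mean(seq, axis=1))(seq_output)  # mean pooling

sentence_model = keras.Model(inputs=input_ids, outputs=cls_vector)
sentence_model.build(input_shape=(None, max_seq_len))
# note: the model expects token ids, not raw strings, so the text must be tokenized first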

albert tokenization

using your albert code, it does not generate a vocab.txt file, but under assets there is a *.vocab file similar to those used in spm.

Given that in the network the weight shape is (30000, ...) and not (30522, ...), I'm unsure if for proper alignment I should use BERT's vocab.txt but delete the empty space, or use spm. Either way, you should probably update the documentation to incorporate this.

text classification error


I followed your code to write a ten-class text classification and it failed (output predictions are all the same); the logits seem to be correspondingly equal. I don't know why, and am asking for help, thank you.

my model is just that simple:
lbert(tokens)
dense(10)

But after I froze the BERT layers and added some additional layers like an LSTM, it worked. Well, accuracy is still under my expectation...

SORRY!
It's my mistake!
I wrote a wrong sequence padder and a wrong tokenizer...

load pretrained model weights problems

model_name = "albert_base"
model_dir = bert.fetch_brightmart_albert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "albert_model.ckpt")
bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params)
bert.load_bert_weights(l_bert, model_ckpt) # should be called after model.build()

problems:
179 def bert_prefix(bert: BertModelLayer):
180 re_bert = re.compile(r'(.*)/(embeddings|encoder)/(.+):0')
--> 181 match = re_bert.match(bert.weights[0].name)
182 assert match, "Unexpected bert layer: {} weight:{}".format(bert, bert.weights[0].name)
183 prefix = match.group(1)
IndexError: list index out of range

What should I do?

Gradients do not exist for variables

Hello everyone, when I start training my custom model I meet two problems. One is this; the other one is:

W0120 15:20:05.422765 12180 optimizer_v2.py:1029] Gradients do not exist for variables ['bert/embeddings/word_embeddings/embeddings:0', 'bert/embeddings/position_embeddings/embeddings:0', 'bert/embeddings/LayerNorm/gamma:0', 'bert/embeddings/LayerNorm/beta:0', 'bert/encoder/layer_0/attention/self/query/kernel:0', 'bert/encoder/layer_0/attention/self/query/bias:0', 'bert/encoder/layer_0/attention/self/key/kernel:0', 'bert/encoder/layer_0/attention/self/key/bias:0', 'bert/encoder/layer_0/attention/self/value/kernel:0', 'bert/encoder/layer_0/attention/self/value/bias:0', 'bert/encoder/layer_0/attention/output/dense/kernel:0', 'bert/encoder/layer_0/attention/output/dense/bias:0', 'bert/encoder/layer_0/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_0/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_0/intermediate/kernel:0', 'bert/encoder/layer_0/intermediate/bias:0', 'bert/encoder/layer_0/output/dense/kernel:0', 'bert/encoder/layer_0/output/dense/bias:0', 'bert/encoder/layer_0/output/LayerNorm/gamma:0', 'bert/encoder/layer_0/output/LayerNorm/beta:0', 'bert/encoder/layer_1/attention/self/query/kernel:0', 'bert/encoder/layer_1/attention/self/query/bias:0', 'bert/encoder/layer_1/attention/self/key/kernel:0', 'bert/encoder/layer_1/attention/self/key/bias:0', 'bert/encoder/layer_1/attention/self/value/kernel:0', 'bert/encoder/layer_1/attention/self/value/bias:0', 'bert/encoder/layer_1/attention/output/dense/kernel:0', 'bert/encoder/layer_1/attention/output/dense/bias:0', 'bert/encoder/layer_1/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_1/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_1/intermediate/kernel:0', 'bert/encoder/layer_1/intermediate/bias:0', 'bert/encoder/layer_1/output/dense/kernel:0', 'bert/encoder/layer_1/output/dense/bias:0', 'bert/encoder/layer_1/output/LayerNorm/gamma:0', 'bert/encoder/layer_1/output/LayerNorm/beta:0', 'bert/encoder/layer_2/attention/self/query/kernel:0', 'bert/encoder/layer_2/attention/self/query/bias:0', 'bert/encoder/layer_2/attention/self/key/kernel:0', 'bert/encoder/layer_2/attention/self/key/bias:0', 'bert/encoder/layer_2/attention/self/value/kernel:0', 'bert/encoder/layer_2/attention/self/value/bias:0', 'bert/encoder/layer_2/attention/output/dense/kernel:0', 'bert/encoder/layer_2/attention/output/dense/bias:0', 'bert/encoder/layer_2/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_2/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_2/intermediate/kernel:0', 'bert/encoder/layer_2/intermediate/bias:0', 'bert/encoder/layer_2/output/dense/kernel:0', 'bert/encoder/layer_2/output/dense/bias:0', 'bert/encoder/layer_2/output/LayerNorm/gamma:0', 'bert/encoder/layer_2/output/LayerNorm/beta:0', 'bert/encoder/layer_3/attention/self/query/kernel:0', 'bert/encoder/layer_3/attention/self/query/bias:0', 'bert/encoder/layer_3/attention/self/key/kernel:0', 'bert/encoder/layer_3/attention/self/key/bias:0', 'bert/encoder/layer_3/attention/self/value/kernel:0', 'bert/encoder/layer_3/attention/self/value/bias:0', 'bert/encoder/layer_3/attention/output/dense/kernel:0', 'bert/encoder/layer_3/attention/output/dense/bias:0', 'bert/encoder/layer_3/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_3/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_3/intermediate/kernel:0', 'bert/encoder/layer_3/intermediate/bias:0', 'bert/encoder/layer_3/output/dense/kernel:0', 'bert/encoder/layer_3/output/dense/bias:0', 'bert/encoder/layer_3/output/LayerNorm/gamma:0', 'bert/encoder/layer_3/output/LayerNorm/beta:0', 
'bert/encoder/layer_4/attention/self/query/kernel:0', 'bert/encoder/layer_4/attention/self/query/bias:0', 'bert/encoder/layer_4/attention/self/key/kernel:0', 'bert/encoder/layer_4/attention/self/key/bias:0', 'bert/encoder/layer_4/attention/self/value/kernel:0', 'bert/encoder/layer_4/attention/self/value/bias:0', 'bert/encoder/layer_4/attention/output/dense/kernel:0', 'bert/encoder/layer_4/attention/output/dense/bias:0', 'bert/encoder/layer_4/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_4/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_4/intermediate/kernel:0', 'bert/encoder/layer_4/intermediate/bias:0', 'bert/encoder/layer_4/output/dense/kernel:0', 'bert/encoder/layer_4/output/dense/bias:0', 'bert/encoder/layer_4/output/LayerNorm/gamma:0', 'bert/encoder/layer_4/output/LayerNorm/beta:0', 'bert/encoder/layer_5/attention/self/query/kernel:0', 'bert/encoder/layer_5/attention/self/query/bias:0', 'bert/encoder/layer_5/attention/self/key/kernel:0', 'bert/encoder/layer_5/attention/self/key/bias:0', 'bert/encoder/layer_5/attention/self/value/kernel:0', 'bert/encoder/layer_5/attention/self/value/bias:0', 'bert/encoder/layer_5/attention/output/dense/kernel:0', 'bert/encoder/layer_5/attention/output/dense/bias:0', 'bert/encoder/layer_5/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_5/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_5/intermediate/kernel:0', 'bert/encoder/layer_5/intermediate/bias:0', 'bert/encoder/layer_5/output/dense/kernel:0', 'bert/encoder/layer_5/output/dense/bias:0', 'bert/encoder/layer_5/output/LayerNorm/gamma:0', 'bert/encoder/layer_5/output/LayerNorm/beta:0', 'bert/encoder/layer_6/attention/self/query/kernel:0', 'bert/encoder/layer_6/attention/self/query/bias:0', 'bert/encoder/layer_6/attention/self/key/kernel:0', 'bert/encoder/layer_6/attention/self/key/bias:0', 'bert/encoder/layer_6/attention/self/value/kernel:0', 'bert/encoder/layer_6/attention/self/value/bias:0', 'bert/encoder/layer_6/attention/output/dense/kernel:0', 'bert/encoder/layer_6/attention/output/dense/bias:0', 'bert/encoder/layer_6/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_6/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_6/intermediate/kernel:0', 'bert/encoder/layer_6/intermediate/bias:0', 'bert/encoder/layer_6/output/dense/kernel:0', 'bert/encoder/layer_6/output/dense/bias:0', 'bert/encoder/layer_6/output/LayerNorm/gamma:0', 'bert/encoder/layer_6/output/LayerNorm/beta:0', 'bert/encoder/layer_7/attention/self/query/kernel:0', 'bert/encoder/layer_7/attention/self/query/bias:0', 'bert/encoder/layer_7/attention/self/key/kernel:0', 'bert/encoder/layer_7/attention/self/key/bias:0', 'bert/encoder/layer_7/attention/self/value/kernel:0', 'bert/encoder/layer_7/attention/self/value/bias:0', 'bert/encoder/layer_7/attention/output/dense/kernel:0', 'bert/encoder/layer_7/attention/output/dense/bias:0', 'bert/encoder/layer_7/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_7/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_7/intermediate/kernel:0', 'bert/encoder/layer_7/intermediate/bias:0', 'bert/encoder/layer_7/output/dense/kernel:0', 'bert/encoder/layer_7/output/dense/bias:0', 'bert/encoder/layer_7/output/LayerNorm/gamma:0', 'bert/encoder/layer_7/output/LayerNorm/beta:0', 'bert/encoder/layer_8/attention/self/query/kernel:0', 'bert/encoder/layer_8/attention/self/query/bias:0', 'bert/encoder/layer_8/attention/self/key/kernel:0', 'bert/encoder/layer_8/attention/self/key/bias:0', 'bert/encoder/layer_8/attention/self/value/kernel:0', 
'bert/encoder/layer_8/attention/self/value/bias:0', 'bert/encoder/layer_8/attention/output/dense/kernel:0', 'bert/encoder/layer_8/attention/output/dense/bias:0', 'bert/encoder/layer_8/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_8/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_8/intermediate/kernel:0', 'bert/encoder/layer_8/intermediate/bias:0', 'bert/encoder/layer_8/output/dense/kernel:0', 'bert/encoder/layer_8/output/dense/bias:0', 'bert/encoder/layer_8/output/LayerNorm/gamma:0', 'bert/encoder/layer_8/output/LayerNorm/beta:0', 'bert/encoder/layer_9/attention/self/query/kernel:0', 'bert/encoder/layer_9/attention/self/query/bias:0', 'bert/encoder/layer_9/attention/self/key/kernel:0', 'bert/encoder/layer_9/attention/self/key/bias:0', 'bert/encoder/layer_9/attention/self/value/kernel:0', 'bert/encoder/layer_9/attention/self/value/bias:0', 'bert/encoder/layer_9/attention/output/dense/kernel:0', 'bert/encoder/layer_9/attention/output/dense/bias:0', 'bert/encoder/layer_9/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_9/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_9/intermediate/kernel:0', 'bert/encoder/layer_9/intermediate/bias:0', 'bert/encoder/layer_9/output/dense/kernel:0', 'bert/encoder/layer_9/output/dense/bias:0', 'bert/encoder/layer_9/output/LayerNorm/gamma:0', 'bert/encoder/layer_9/output/LayerNorm/beta:0', 'bert/encoder/layer_10/attention/self/query/kernel:0', 'bert/encoder/layer_10/attention/self/query/bias:0', 'bert/encoder/layer_10/attention/self/key/kernel:0', 'bert/encoder/layer_10/attention/self/key/bias:0', 'bert/encoder/layer_10/attention/self/value/kernel:0', 'bert/encoder/layer_10/attention/self/value/bias:0', 'bert/encoder/layer_10/attention/output/dense/kernel:0', 'bert/encoder/layer_10/attention/output/dense/bias:0', 'bert/encoder/layer_10/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_10/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_10/intermediate/kernel:0', 'bert/encoder/layer_10/intermediate/bias:0', 'bert/encoder/layer_10/output/dense/kernel:0', 'bert/encoder/layer_10/output/dense/bias:0', 'bert/encoder/layer_10/output/LayerNorm/gamma:0', 'bert/encoder/layer_10/output/LayerNorm/beta:0', 'bert/encoder/layer_11/attention/self/query/kernel:0', 'bert/encoder/layer_11/attention/self/query/bias:0', 'bert/encoder/layer_11/attention/self/key/kernel:0', 'bert/encoder/layer_11/attention/self/key/bias:0', 'bert/encoder/layer_11/attention/self/value/kernel:0', 'bert/encoder/layer_11/attention/self/value/bias:0', 'bert/encoder/layer_11/attention/output/dense/kernel:0', 'bert/encoder/layer_11/attention/output/dense/bias:0', 'bert/encoder/layer_11/attention/output/LayerNorm/gamma:0', 'bert/encoder/layer_11/attention/output/LayerNorm/beta:0', 'bert/encoder/layer_11/intermediate/kernel:0', 'bert/encoder/layer_11/intermediate/bias:0', 'bert/encoder/layer_11/output/dense/kernel:0', 'bert/encoder/layer_11/output/dense/bias:0', 'bert/encoder/layer_11/output/LayerNorm/gamma:0', 'bert/encoder/layer_11/output/LayerNorm/beta:0'] when minimizing the loss.

Can anyone help me, THX!

test_albert_chinese_weights FAILED

Hi, it seems that this version fails to load the brightmart weights.

test_load_pretrained_weights.py::TestLoadPreTrainedWeights::test_albert_chinese_weights FAILED [100%]Already  fetched:  albert_base_zh.zip
already unpacked at: .models/albert_base_zh
bert/embeddings/word_embeddings/embeddings:0
bert/embeddings/word_embeddings_projector/projector:0
bert/embeddings/word_embeddings_projector/bias:0
bert/embeddings/token_type_embeddings/embeddings:0
bert/embeddings/position_embeddings/embeddings:0
bert/embeddings/LayerNorm/gamma:0
bert/embeddings/LayerNorm/beta:0
bert/encoder/layer_shared/attention/self/query/kernel:0
bert/encoder/layer_shared/attention/self/query/bias:0
bert/encoder/layer_shared/attention/self/key/kernel:0
bert/encoder/layer_shared/attention/self/key/bias:0
bert/encoder/layer_shared/attention/self/value/kernel:0
bert/encoder/layer_shared/attention/self/value/bias:0
bert/encoder/layer_shared/attention/output/dense/kernel:0
bert/encoder/layer_shared/attention/output/dense/bias:0
bert/encoder/layer_shared/attention/output/LayerNorm/gamma:0
bert/encoder/layer_shared/attention/output/LayerNorm/beta:0
bert/encoder/layer_shared/intermediate/kernel:0
bert/encoder/layer_shared/intermediate/bias:0
bert/encoder/layer_shared/output/dense/kernel:0
bert/encoder/layer_shared/output/dense/bias:0
bert/encoder/layer_shared/output/LayerNorm/gamma:0
bert/encoder/layer_shared/output/LayerNorm/beta:0
Loading brightmart/albert_zh weights...
loader: No value for:[bert/embeddings/word_embeddings_projector/bias:0], i.e.:[bert/embeddings/word_embeddings_2/bias] in:[.models/albert_base_zh/albert_model.ckpt]
loader: Skipping weight:[bert/embeddings/token_type_embeddings/embeddings:0] as the weight shape:[(2, 128)] is not compatible with the checkpoint:[bert/embeddings/token_type_embeddings] shape:(2, 768)
loader: Skipping weight:[bert/embeddings/position_embeddings/embeddings:0] as the weight shape:[(512, 128)] is not compatible with the checkpoint:[bert/embeddings/position_embeddings] shape:(512, 768)
loader: Skipping weight:[bert/embeddings/LayerNorm/gamma:0] as the weight shape:[(128,)] is not compatible with the checkpoint:[bert/embeddings/LayerNorm/gamma] shape:(768,)
loader: Skipping weight:[bert/embeddings/LayerNorm/beta:0] as the weight shape:[(128,)] is not compatible with the checkpoint:[bert/embeddings/LayerNorm/beta] shape:(768,)
Done loading 18 BERT weights from: .models/albert_base_zh/albert_model.ckpt into <bert.model.BertModelLayer object at 0x7fb5fec254a8> (prefix:bert). Count of weights not found in the checkpoint was: [1]. Count of weights with mismatched shape: [4]
Unused weights from checkpoint: 
	bert/embeddings/LayerNorm/beta
	bert/embeddings/LayerNorm/gamma
	bert/embeddings/position_embeddings
	bert/embeddings/token_type_embeddings
	bert/pooler/dense/bias
	bert/pooler/dense/kernel
	cls/predictions/output_bias
	cls/predictions/transform/LayerNorm/beta
	cls/predictions/transform/LayerNorm/gamma
	cls/predictions/transform/dense/bias
	cls/predictions/transform/dense/kernel
	cls/seq_relationship/output_bias
	cls/seq_relationship/output_weights

 
4 != 0

Expected :0
Actual   :4

Exception with TFLiteConverter convert method call

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input
from bert import BertModelLayer, params_from_pretrained_ckpt, load_bert_weights
from tensorflow.compat.v2.lite import TFLiteConverter

bert_params = params_from_pretrained_ckpt(folder)
l_bert = BertModelLayer.from_params(bert_params, name="bert")

max_seq_len = 512
l_input_ids = Input(shape=(max_seq_len,), dtype='int32')
l_token_type_ids = Input(shape=(max_seq_len,), dtype='int32')

output = l_bert([l_input_ids, l_token_type_ids])

model = Model(inputs=[l_input_ids, l_token_type_ids], outputs=output)
model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])

load_bert_weights(l_bert, checkpoint_path)

converter = TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/importer.py in _import_graph_def_internal(graph_def, input_map, return_elements, validate_colocation_constraints, name, op_dict, producer_op_list)
    500         results = c_api.TF_GraphImportGraphDefWithResults(
--> 501             graph._c_graph, serialized, options)  # pylint: disable=protected-access
    502         results = c_api_util.ScopedTFImportGraphDefResults(results)

InvalidArgumentError: Input 0 of node model_6/bert/embeddings/word_embeddings/embedding_lookup was passed float from model_6/bert/embeddings/word_embeddings/embedding_lookup/Read/ReadVariableOp/resource:0 incompatible with expected resource.

During handling of the above exception, another exception occurred:

ValueError                                Traceback (most recent call last)
 in 
     21 
     22 converter = TFLiteConverter.from_keras_model(model)
---> 23 tflite_model = converter.convert()

/usr/local/lib/python3.6/dist-packages/tensorflow_core/lite/python/lite.py in convert(self)
    403 
    404     frozen_func = _convert_to_constants.convert_variables_to_constants_v2(
--> 405         self._funcs[0], lower_control_flow=False)
    406     input_tensors = [
    407         tensor for tensor in frozen_func.inputs

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/convert_to_constants.py in convert_variables_to_constants_v2(func, lower_control_flow)
    573   output_graph_def.versions.CopyFrom(graph_def.versions)
    574   return _construct_concrete_function(func, output_graph_def,
--> 575                                       converted_input_indices)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/convert_to_constants.py in _construct_concrete_function(func, output_graph_def, converted_input_indices)
    369   new_func = wrap_function.function_from_graph_def(output_graph_def,
    370                                                    new_input_names,
--> 371                                                    new_output_names)
    372 
    373   # Manually propagate shape for input tensors where the shape is not correctly

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/wrap_function.py in function_from_graph_def(graph_def, inputs, outputs)
    618     importer.import_graph_def(graph_def, name="")
    619 
--> 620   wrapped_import = wrap_function(_imports_graph_def, [])
    621   import_graph = wrapped_import.graph
    622   return wrapped_import.prune(

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/wrap_function.py in wrap_function(fn, signature, name)
    596           signature=signature,
    597           add_control_dependencies=False,
--> 598           collections={}),
    599       variable_holder=holder,
    600       signature=signature)

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    913                                           converted_func)
    914 
--> 915       func_outputs = python_func(*func_args, **func_kwargs)
    916 
    917       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/wrap_function.py in __call__(self, *args, **kwargs)
     81 
     82   def __call__(self, *args, **kwargs):
---> 83     return self.call_with_variable_creator_scope(self._fn)(*args, **kwargs)
     84 
     85   def call_with_variable_creator_scope(self, fn):

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/wrap_function.py in wrapped(*args, **kwargs)
     87     def wrapped(*args, **kwargs):
     88       with variable_scope.variable_creator_scope(self.variable_creator_scope):
---> 89         return fn(*args, **kwargs)
     90 
     91     return wrapped

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/wrap_function.py in _imports_graph_def()
    616 
    617   def _imports_graph_def():
--> 618     importer.import_graph_def(graph_def, name="")
    619 
    620   wrapped_import = wrap_function(_imports_graph_def, [])

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/util/deprecation.py in new_func(*args, **kwargs)
    505                 'in a future version' if date is None else ('after %s' % date),
    506                 instructions)
--> 507       return func(*args, **kwargs)
    508 
    509     doc = _add_deprecated_arg_notice_to_docstring(

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/importer.py in import_graph_def(graph_def, input_map, return_elements, name, op_dict, producer_op_list)
    403       name=name,
    404       op_dict=op_dict,
--> 405       producer_op_list=producer_op_list)
    406 
    407 

/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/importer.py in _import_graph_def_internal(graph_def, input_map, return_elements, validate_colocation_constraints, name, op_dict, producer_op_list)
    503       except errors.InvalidArgumentError as e:
    504         # Convert to ValueError for backwards compatibility.
--> 505         raise ValueError(str(e))
    506 
    507     # Create _DefinedFunctions for any imported functions.

ValueError: Input 0 of node model_6/bert/embeddings/word_embeddings/embedding_lookup was passed float from model_6/bert/embeddings/word_embeddings/embedding_lookup/Read/ReadVariableOp/resource:0 incompatible with expected resource.

PyPi package does not work with TF-GPU-2.0.0-alpha0

KPE,

Just saw your post re BERT for TF 2.0. Looks interesting but I am already stuck out of the gate. Thought you might be able to help.

I have the following issue (with error messages):

pip install --user tensorflow-gpu==2.0.0-alpha0
pip install --user bert-for-tf2
Collecting bert-for-tf2
  Downloading https://files.pythonhosted.org/packages/29/71/0ed46e4c3f8791d0b86e7283474bb20549e16d1445d5fba5205c984
145cf/bert-for-tf2-0.1.5.tar.gz
Collecting tensorflow>=1.13.99 (from bert-for-tf2)
  Could not find a version that satisfies the requirement tensorflow>=1.13.99 (from bert-for-tf2) (from versions: 1
.13.0rc1, 1.13.0rc2, 1.13.1, 1.14.0rc0, 2.0.0a0)
No matching distribution found for tensorflow>=1.13.99 (from bert-for-tf2)

In short, I am getting the error even though it looks like TF 2.0 is installed. Proof of installation is as follows:

ralph_a_brooks@main-keras-p4-instance:~$ python
Python 3.7.0 (default, Oct  9 2018, 10:31:47) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.__version__
'2.0.0-alpha0'

Any thoughts as to what is going wrong?

Best,

Ralph

Word Vectors and Sentence Vectors

OK, I got how to use a pre-trained model. But I was looking for how to get the word vectors and also the doc2vec-like property of BERT?

adapter-BERT: loader reports missing weights

I am using TensorFlow 2.0 and encountered some problems while loading pre-trained model weights. I tried different models (like uncased_L-12_H-768_A-12), but that did not change anything. Calling load_stock_weights(l_bert, bert_ckpt_file) causes this:

loader: No value for:[bert_5/encoder/layer_0/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_0/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_0/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_0/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_0/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_0/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_0/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_0/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_0/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_0/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_1/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_1/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_1/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_1/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_1/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_1/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_1/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_1/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_1/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_2/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_2/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_2/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_2/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_2/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_2/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_2/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_2/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_2/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_3/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_3/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_3/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_3/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_3/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_3/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_3/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_3/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_3/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_4/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_4/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_4/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_4/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_4/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_4/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_4/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_4/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_4/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_5/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_5/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_5/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_5/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_5/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_5/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_5/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_5/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_5/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_6/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_6/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_6/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_6/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_6/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_6/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_6/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_6/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_6/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_7/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_7/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_7/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_7/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_7/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_7/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_7/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_7/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_7/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_8/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_8/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_8/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_8/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_8/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_8/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_8/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_8/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_8/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_9/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_9/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_9/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_9/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_9/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_9/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_9/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_9/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_9/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_10/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_10/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_10/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_10/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_10/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_10/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_10/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_10/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_10/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/attention/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_11/attention/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/attention/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_11/attention/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/attention/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_11/attention/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/attention/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_11/attention/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/output/adapter-down/kernel:0], i.e.:[bert/encoder/layer_11/output/adapter-down/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/output/adapter-down/bias:0], i.e.:[bert/encoder/layer_11/output/adapter-down/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/output/adapter-up/kernel:0], i.e.:[bert/encoder/layer_11/output/adapter-up/kernel] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
loader: No value for:[bert_5/encoder/layer_11/output/adapter-up/bias:0], i.e.:[bert/encoder/layer_11/output/adapter-up/bias] in:[.model/uncased_L-12_H-768_A-12/bert_model.ckpt]
Done loading 196 BERT weights from: .model/uncased_L-12_H-768_A-12/bert_model.ckpt into <bert.model.BertModelLayer object at 0x7fe01aa6c630> (prefix:bert_5). Count of weights not found in the checkpoint was: [96]. Count of weights with mismatched shape: [0]
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_ids (InputLayer)       [(None, 256)]             0         
_________________________________________________________________
bert (BertModelLayer)        (None, 256, 768)          111269376 
=================================================================
Total params: 111,269,376
Trainable params: 0
Non-trainable params: 111,269,376
_________________________________________________________________

I created a simple Google Colab notebook with the code: https://colab.research.google.com/drive/13vHlKukauXaOtRl3sCROYxGLw6YEO-2k

%tensorflow_version 2.x

!pip install tqdm  >> /dev/null
!pip install bert-for-tf2 >> /dev/null

import os
import math
import datetime
from tqdm import tqdm
import pandas as pd
import numpy as np
import tensorflow as tf

import bert
from bert import BertModelLayer
from bert.loader import StockBertConfig, map_stock_config_to_params, load_stock_weights
from bert.tokenization import FullTokenizer

# pretrained model

bert_model_dir="2018_10_18"
bert_model_name="uncased_L-12_H-768_A-12"

!mkdir -p .model .model/$bert_model_name

for fname in ["bert_config.json", "vocab.txt", "bert_model.ckpt.meta", "bert_model.ckpt.index", "bert_model.ckpt.data-00000-of-00001"]:
  cmd = f"gsutil cp gs://bert_models/{bert_model_dir}/{bert_model_name}/{fname} .model/{bert_model_name}"
  !$cmd

bert_ckpt_dir    = os.path.join(".model/",bert_model_name)
bert_ckpt_file   = os.path.join(bert_ckpt_dir, "bert_model.ckpt")
bert_config_file = os.path.join(bert_ckpt_dir, "bert_config.json")

# create custom model

bert_params = bert.params_from_pretrained_ckpt(bert_ckpt_dir)
bert_params.adapter_size = 64

l_bert = BertModelLayer.from_params(bert_params, name="bert")
l_bert.apply_adapter_freeze()

max_seq_len = 256

input_ids = tf.keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
output = l_bert(input_ids)          

model = tf.keras.Model(inputs=input_ids, outputs=output)
model.build(input_shape=[(None, max_seq_len)])

load_stock_weights(l_bert, bert_ckpt_file)

model.summary()

Is this expected behavior, a bug, or did I miss something?
Thanks for your help.

Can't freeze pre-trained params

Thanks for releasing this repo, but I've hit a problem: I can't freeze the pre-trained params. I used the following code to freeze them, but it didn't work.
import os
import bert
from tensorflow import keras

model_dir = "D:/ProgramData/Pre_Traines_Model_Of_Bert/chinese_L-12_H-768_A-12"

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
l_bert.apply_adapter_freeze()
max_seq_len = 128
l_input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')
# l_token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')

# using the default token_type/segment id 0
output = l_bert(l_input_ids)  # output: [batch_size, max_seq_len, hidden_size]
model = keras.Model(inputs=l_input_ids, outputs=output)
model.build(input_shape=(None, max_seq_len))
bert.loader.load_stock_weights(l_bert, os.path.join(model_dir, "bert_model.ckpt"))
model.summary()

And I get the following result:

[screenshot of the model.summary() output]
As you can see, all the params are trainable. Could you help? Thanks! :)
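
For what it's worth, a minimal sketch of two ways to end up with frozen weights, based on the adapter example earlier on this page: apply_adapter_freeze() appears to only freeze the pre-trained weights when adapter_size is set (adapter-BERT mode); without adapters, the whole layer can simply be frozen with the standard Keras trainable flag.

import bert

model_dir = "D:/ProgramData/Pre_Traines_Model_Of_Bert/chinese_L-12_H-768_A-12"
bert_params = bert.params_from_pretrained_ckpt(model_dir)

# option 1: adapter-BERT - freeze the pre-trained weights, train only the adapter layers
bert_params.adapter_size = 64                  # enables the adapter layers
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
l_bert.apply_adapter_freeze()                  # freezes the non-adapter (pre-trained) weights

# option 2: plain Keras - freeze the whole BERT layer
# l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
# l_bert.trainable = False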

Switched multi-output prediction arrays

Hello @kpe,
first of all, thank you for sharing your very useful repos.

Using bert-for-tf2 for a QA task I created a model with 4 outputs.
For the model creation I specify the outputs as a dictionary with keys in this order:
"id", "start", "end", "type".
When predicting I would expect the predictions output as a list of arrays in the same order:
[id, start, end, type]
but I receive instead a list in the following order:
[type, end, start, id]
(actually, "end" and "start" have the same shape so I am not sure about their relative positions).

Is this "position-switching" of the output arrays a normal behavior or could it be an issue with bert-for-tf2?

Thank you
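
As far as I can tell this comes from Keras rather than from bert-for-tf2: model.predict() returns the output arrays in the order of model.outputs / model.output_names, which is not guaranteed to match the insertion order of the dictionary the model was built with. A small, position-independent sketch (x_eval is a placeholder for your inputs; the output names are taken from the question above):

preds = model.predict(x_eval)                        # list of arrays, one per model output
preds_by_name = dict(zip(model.output_names, preds))

example_id   = preds_by_name["id"]
start_logits = preds_by_name["start"]
end_logits   = preds_by_name["end"]
answer_type  = preds_by_name["type"]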

How to load the vocab file for ALBERT

Hello,

I tried to implement your library in order to fine tune Albert:

import bert
from tensorflow import keras
from tensorflow.keras.metrics import Precision, Recall

def create_model(max_seq_len):
    """Creates a classification model."""
    albert_model_name = "albert_base"
    albert_dir = bert.fetch_tfhub_albert_model(albert_model_name, ".models")

    albert_params = bert.albert_params(albert_model_name)
    l_bert = bert.BertModelLayer.from_params(albert_params, name="albert")
        
    input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
    token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="token_type_ids")
    output         = l_bert([input_ids, token_type_ids])

    print("bert shape", output.shape)
    output = keras.layers.Lambda(lambda x: x[:, 0, :])(output)
    output = keras.layers.Dense(1, activation="sigmoid")(output)

    model = keras.Model(inputs=[input_ids, token_type_ids], outputs=output)
    model.build(input_shape=[(None, max_seq_len)])


    for weight in l_bert.weights:
        print(weight.name)


    model.compile(optimizer=keras.optimizers.Adam(),
        loss="binary_crossentropy",
        metrics=["accuracy", Precision(), Recall()])

    bert.load_albert_weights(l_bert, albert_dir)
    
    model.summary()
        
    return model

When training, I have an index issue regarding the embeddings (my assumption is that the vocab / vocab_size is different between bert and albert). I tried to import it from tf_hub but wasn't able to find it. Am I missing something?

Thank you in advance! You are doing amazing work!
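
In case it helps, the ALBERT archive fetched by fetch_tfhub_albert_model ships a SentencePiece model under its assets/ directory (see also the padding question further down this page); a minimal tokenization sketch, assuming the albert_base model and the 30k-clean.model asset name used elsewhere in these issues:

import os
import sentencepiece as spm
import bert
from bert import albert_tokenization

model_dir = bert.fetch_tfhub_albert_model("albert_base", ".models")
spm_model = os.path.join(model_dir, "assets", "30k-clean.model")

sp = spm.SentencePieceProcessor()
sp.load(spm_model)

text = albert_tokenization.preprocess_text("a movie review to classify", lower=True)
token_ids = albert_tokenization.encode_ids(sp, text)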

Inconsistency in number of parameters for the loaded original pre-trained BERT models

Hi, I tried creating the BertModelLayer using the parameter files of the original pre-trained BERT models. As per the official BERT repo, both uncased_L-12_H-768_A-12 and multi_cased_L-12_H-768_A-12 should have about 110M parameters. But when I create a BertModelLayer with bert-for-tf2 for these models, they have 108M and 177M parameters respectively. Is this the expected behavior? What could be the reason?

Code:

from tensorflow import keras
from bert import BertModelLayer, params_from_pretrained_ckpt

model_dir = 'uncased_L-12_H-768_A-12'   # 'multi_cased_L-12_H-768_A-12'

bert_params = params_from_pretrained_ckpt(model_dir)
l_bert = BertModelLayer.from_params(bert_params, name="bert")

max_seq_len = 256
l_input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32')
l_token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')

output = l_bert([l_input_ids, l_token_type_ids])          
model = keras.Model(inputs=[l_input_ids, l_token_type_ids], outputs=output)
model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])
model.summary()

Output for uncased_L-12_H-768_A-12

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
bert (BertModelLayer)           (None, 256, 768)     108891648   input_1[0][0]                    
                                                                 input_2[0][0]                    
==================================================================================================
Total params: 108,891,648
Trainable params: 108,891,648
Non-trainable params: 0

Output for multi_cased_L-12_H-768_A-12

Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
input_2 (InputLayer)            [(None, 256)]        0                                            
__________________________________________________________________________________________________
bert (BertModelLayer)           (None, 256, 768)     177262848   input_1[0][0]                    
                                                                 input_2[0][0]                    
==================================================================================================
Total params: 177,262,848
Trainable params: 177,262,848
Non-trainable params: 0
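
The gap between the two models comes almost entirely from the vocabulary size in the released bert_config.json files (about 30,522 word pieces for the uncased English model vs. 119,547 for the multilingual cased one); the roughly 1M missing relative to the quoted 110M are the pooler and pre-training heads, which BertModelLayer does not include. A quick back-of-the-envelope check (the vocab sizes below are assumptions taken from the stock configs):

hidden_size = 768

vocab_uncased = 30522          # uncased_L-12_H-768_A-12 bert_config.json
vocab_multi   = 119547         # multi_cased_L-12_H-768_A-12 bert_config.json

extra_embeddings = (vocab_multi - vocab_uncased) * hidden_size
print(extra_embeddings)                  # 68371200
print(177262848 - 108891648)             # 68371200 -- matches the two summaries above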

bert-pretrain

Hi,
How can I pre-train BERT using bert-for-tf2?
We need input_ids, mask_ids and segment_ids.
Once we have prepared those, along with the vocab and config, how do we proceed with bert-for-tf2?

bert-for-tf2 for Sentiment Analysis

Hi,
Can anyone provide me with a guide on how to use bert-for-tf2 for a custom task like sentiment analysis?
I have been trying but to no avail. My model is training, but it's not giving me any accuracy.

Below is the method I am using

model_name = "multi_cased_L-12_H-768_A-12"
model_dir = "models/multi_cased_L-12_H-768_A-12"
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")
bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert", trainable=True)

l_input_ids      = tf.keras.layers.Input(shape=(max_seq_len,), dtype='int32')
l_token_type_ids = tf.keras.layers.Input(shape=(max_seq_len,), dtype='int32')  # defined but not used below

l_bert_o = l_bert(l_input_ids)
conv_blocks = []
for k_size in FILTER_SIZES:
    conv = tf.keras.layers.Conv1D(filters=NUM_FILTERS,
                         kernel_size=k_size,
                         padding="valid",
                         activation="relu",
                         strides=1)(l_bert_o)
    conv = tf.keras.layers.MaxPooling1D(pool_size=max_seq_len - k_size + 1)(conv)
    conv = tf.keras.layers.Flatten()(conv)
    conv_blocks.append(conv)
concat = tf.keras.layers.Concatenate()(conv_blocks) if len(conv_blocks) > 1 else conv_blocks[0]
concat = tf.keras.layers.Dropout(0.2)(concat)
x = tf.keras.layers.Dense(256, activation="relu")(concat)
output = tf.keras.layers.Dense(3, activation="softmax")(x)
model = tf.keras.models.Model(inputs=l_input_ids, outputs=output)

model.compile(loss="categorical_crossentropy", optimizer="rmsprop",
              metrics=["accuracy"])

bert.load_bert_weights(l_bert, model_ckpt)
model.summary()

bert_custom model

Hi there,
Any idea how to use a convolutional network or an LSTM for sentiment analysis on top of bert-for-tf2 embeddings? Will I have to replace the Lambda layer, or is there another way?
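
The BERT layer returns the full token sequence with shape (batch_size, max_seq_len, 768), so a CNN (as in the previous issue) or an LSTM can consume it directly instead of the Lambda/[CLS] pooling. A minimal sketch with an LSTM head (the model path and layer sizes are arbitrary example values):

import os
import bert
from tensorflow import keras

model_dir   = ".model/uncased_L-12_H-768_A-12"      # assumed checkpoint location
max_seq_len = 128

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
seq_out   = l_bert(input_ids)                                    # (batch, max_seq_len, 768)
x         = keras.layers.Bidirectional(keras.layers.LSTM(128))(seq_out)
x         = keras.layers.Dropout(0.2)(x)
output    = keras.layers.Dense(3, activation="softmax")(x)

model = keras.Model(inputs=input_ids, outputs=output)
model.build(input_shape=(None, max_seq_len))
bert.load_bert_weights(l_bert, os.path.join(model_dir, "bert_model.ckpt"))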

What would be a good way to pad input texts?

Currently, I am just adding 0s to token_ids to match max_seq_len.
import os
import numpy as np
import sentencepiece as spm
import bert
from bert import albert_tokenization

def tokenize_text(texts):
    model_name = "albert_base"
    max_seq_len = 64
    model_dir = bert.fetch_tfhub_albert_model(model_name, ".models")
    spm_model = os.path.join(model_dir, "assets", "30k-clean.model")
    sp = spm.SentencePieceProcessor()
    sp.load(spm_model)
    do_lower_case = True

    tokenized = []
    for text in texts:
        processed_text = albert_tokenization.preprocess_text(text, lower=do_lower_case)
        token_ids = albert_tokenization.encode_ids(sp, processed_text)
        token_ids = np.append(token_ids, np.zeros(max_seq_len - len(token_ids)))
        tokenized.append(token_ids)
    return np.array(tokenized)

However, I found out that even the zero tokens are embedded to non-zero vectors. Is this something I have to worry about? If it is, what is the proper way of padding input texts?
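
One caveat with the snippet above: np.zeros(max_seq_len - len(token_ids)) fails for texts that tokenize to more than max_seq_len ids, so it is safer to truncate before padding. As for the non-zero embeddings of the padding positions, whether they matter depends on the layers you put on top; if it is a concern, make the downstream pooling or masking ignore the padded positions. A small pad-and-truncate helper (plain NumPy, nothing bert-for-tf2 specific):

import numpy as np

def pad_token_ids(token_ids, max_seq_len, pad_id=0):
    """Truncate to max_seq_len and right-pad with pad_id."""
    ids = list(token_ids)[:max_seq_len]
    ids = ids + [pad_id] * (max_seq_len - len(ids))
    return np.array(ids, dtype=np.int32)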

Load Google's official Chinese ALBERT model

Hi, I'm trying to load the newly released official Chinese models from Google. These models aren't in TFHub format, so I have to load them through load_stock_weights. However, load_stock_weights seems to have been written for brightmart's version: its map_to_stock_variable_name returns the wrong names for the official checkpoints, while map_to_tfhub_albert_variable_name, interestingly, makes the correct guess.

Unused weights

When I load a pre-trained BERT model (multilingual), it seems that some weights from the checkpoint are unused.

I load the model in this way.

self.model_dir ="PretrainedModels/multilingual_L-12_H-768_A-12" 
self.max_seq_len = 64
bert_params = bert.params_from_pretrained_ckpt(self.model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")
l_input_ids = keras.layers.Input(shape=(self.max_seq_len,), dtype='int32')
model = keras.Sequential()
model.add(l_input_ids)
model.add(l_bert)
model.add(keras.layers.GlobalAveragePooling1D())
output = model(l_input_ids)
self.model = keras.Model(inputs=l_input_ids, outputs=output)
self.model.build(input_shape=(None, self.max_seq_len))
bert_ckpt_file   = os.path.join(self.model_dir, "bert_model.ckpt")
bert.load_stock_weights(l_bert, bert_ckpt_file)

The output from the console seems to show these weights:

Done loading 196 BERT weights from: PretrainedModels/multilingual_L-12_H-768_A-12\bert_model.ckpt into <bert.model.BertModelLayer object at 0x00000209EA5B3188> (prefix:bert). Count of weights not found in the checkpoint was: [0]. Count of weights with mismatched shape: [0]
unused weights from checkpoint:

bert/embeddings/token_type_embeddings
bert/pooler/dense/bias
bert/pooler/dense/kernel
cls/predictions/output_bias
cls/predictions/transform/LayerNorm/beta
cls/predictions/transform/LayerNorm/gamma
cls/predictions/transform/dense/bias
cls/predictions/transform/dense/kernel
cls/seq_relationship/output_bias
cls/seq_relationship/output_weights
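
Most of these are expected: the cls/* and bert/pooler/* variables belong to the pre-training (MLM/NSP) heads, which BertModelLayer (the encoder only) does not implement. bert/embeddings/token_type_embeddings shows up as unused here presumably because the layer was called with input_ids only; feeding token_type_ids as a second input builds and loads that embedding as well. A sketch of the two-input variant, following the same pattern as the other examples in these issues:

import os
import bert
from tensorflow import keras

model_dir   = "PretrainedModels/multilingual_L-12_H-768_A-12"
max_seq_len = 64

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

l_input_ids      = keras.layers.Input(shape=(max_seq_len,), dtype='int32')
l_token_type_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32')

output = l_bert([l_input_ids, l_token_type_ids])        # token_type_embeddings gets built too
output = keras.layers.GlobalAveragePooling1D()(output)

model = keras.Model(inputs=[l_input_ids, l_token_type_ids], outputs=output)
model.build(input_shape=[(None, max_seq_len), (None, max_seq_len)])

bert.load_stock_weights(l_bert, os.path.join(model_dir, "bert_model.ckpt"))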

Constructor with Params not working

Hi!

The BertModelLayer constructor expects no positional params (only self):
TypeError: __init__() takes 1 positional argument but 2 were given

The example in the README uses BertModelLayer(BertModelLayer.Params(...)) for the original BERT params, but it's not working.
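
The pattern used throughout these issues (and the README examples) is the from_params class method rather than passing a Params instance positionally; a minimal sketch (the checkpoint path is an example):

import bert

model_dir = ".model/uncased_L-12_H-768_A-12"        # example path

bert_params = bert.params_from_pretrained_ckpt(model_dir)
# bert_params.adapter_size = 64                     # individual params can be adjusted here
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")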

Fine-tuning bert-for-tf2 on a Q&A task

Hi and thanks for this great repo,
I was trying to adapt BERT to TF 2.0, but I'm too much of a novice for this.
Now my question is: how do I fine-tune this on a personal dataset?
My goal is to make a Q&A system using BERT.
Thank you very much
Vincenzo

load bert chinese pretrained model problems

model_name = "chinese_L-12_H-768_A-12.zip"
model_dir = bert.fetch_google_bert_model(model_name,'models')
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in a Keras model here, and call model.build()

bert.load_bert_weights(l_bert, model_ckpt) # should be called after model.build()

Problem:

NotFoundError: models/chinese_L-12_H-768_A-12/chinese_L-12_H-768_A-12.zip; No such file or directory

What should I do?
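
The NotFoundError path suggests the archive was already fetched and unpacked into models/chinese_L-12_H-768_A-12, and the ".zip" suffix in model_name then gets appended to that directory. Passing the model name without the .zip extension (as in the other fetch_google_bert_model examples in these issues) should avoid the doubled path; a sketch:

import os
import bert

model_name = "chinese_L-12_H-768_A-12"              # note: no ".zip" suffix
model_dir  = bert.fetch_google_bert_model(model_name, "models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in a Keras model here, and call model.build()

bert.load_bert_weights(l_bert, model_ckpt)          # should be called after model.build()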

Can't Load Google research model

Hi and thanks for this great repo.
I was trying to fine-tune the Google Research BERT model,
but I can't get it to work.

import bert
import os

model_name = "uncased_L-12_H-768_A-12"
model_dir = bert.fetch_google_bert_model(model_name, ".models")
model_ckpt = os.path.join(model_dir, "bert_model.ckpt")

bert_params = bert.params_from_pretrained_ckpt(model_dir)
l_bert = bert.BertModelLayer.from_params(bert_params, name="bert")

# use in Keras Model here, and call model.build()

bert.load_bert_weights(l_bert, model_ckpt)      # should be called after model.build()

then,

Already  fetched:  uncased_L-12_H-768_A-12.zip
already unpacked at: .models/uncased_L-12_H-768_A-12
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-5b1d4714b5bb> in <module>
     10 # use in Keras Model here, and call model.build()
     11 
---> 12 bert.load_bert_weights(l_bert, model_ckpt)      # should be called after model.build()

~/anaconda3/envs/tensorflow2.0/lib/python3.6/site-packages/bert/loader.py in load_stock_weights(bert, ckpt_path)
    200     stock_weights = set(ckpt_reader.get_variable_to_dtype_map().keys())
    201 
--> 202     prefix = bert_prefix(bert)
    203 
    204     loaded_weights = set()

~/anaconda3/envs/tensorflow2.0/lib/python3.6/site-packages/bert/loader.py in bert_prefix(bert)
    179 def bert_prefix(bert: BertModelLayer):
    180     re_bert = re.compile(r'(.*)/(embeddings|encoder)/(.+):0')
--> 181     match = re_bert.match(bert.weights[0].name)
    182     assert match, "Unexpected bert layer: {} weight:{}".format(bert, bert.weights[0].name)
    183     prefix = match.group(1)

IndexError: list index out of range
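
The IndexError comes from the layer's weights list being empty: the layer has not been built yet, and, as the comment in the snippet says, load_bert_weights should only be called after the layer has been used in a Keras model and model.build() has been called. A sketch of the missing step, continuing from the snippet above (l_bert and model_ckpt as already defined; max_seq_len is an example value):

from tensorflow import keras

max_seq_len = 128

input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
output    = l_bert(input_ids)                   # calling the layer creates its weights

model = keras.Model(inputs=input_ids, outputs=output)
model.build(input_shape=(None, max_seq_len))

bert.load_bert_weights(l_bert, model_ckpt)      # now l_bert.weights is populated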

Installation issues for tensorflow-gpu

I was trying to set up a new conda environment to use this package. I want to use tensorflow with gpu support so I first run pip install tensorflow-gpu==1.14. Running tensorflow.test.is_gpu_available() returns True and all works.

I then run pip install bert-for-tf2, and in the process of installing its requirements, params-flow pulls in tensorflow, so tensorflow==1.14 gets installed at this point. Now running tensorflow.test.is_gpu_available() returns False. Both tensorflow and tensorflow-gpu are in the conda environment.

I'm not sure what the proper way to deal with this inconsistency is, but the way I got around it was to first install tensorflow-gpu, then clone the params-flow repo, remove tensorflow from its requirements, and install it with python setup.py install from within the repo. Finally, bert-for-tf2 can be installed with pip and everything seems to work.

EDIT:

Just read one of the closed issues, and things seem to work if bert-for-tf2 is installed first and then a gpu version of tensorflow is installed afterwards.
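
For reference, the order that worked per the edit above, written in the same notebook style as the commands earlier on this page (the TF version is just the one mentioned in this issue):

!pip install bert-for-tf2 >> /dev/null
!pip install tensorflow-gpu==1.14 >> /dev/null    # install the GPU build after bert-for-tf2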

Serving through API

Hello @kpe

I have a repo showcasing how to serve vanilla BERT multiclass predictions through a dockerized Flask API. The repo contains a complete example, all the way from training to serving through the API. I recently bumped the project to TensorFlow 2.0 using kpe/bert-for-tf2. I thought it might be a useful resource. Feel free to check it out: sarnikowski/bert_in_a_flask. Thanks.

tf.logging does not exist in TF 2.0

tf.logging was removed in TF 2.0, but albert_tokenization.py has a few instances of tf.logging. I suggest changing them to tf.compat.v1.logging or replacing them with absl-py, as per the migration docs.
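
A sketch of the two drop-in replacements mentioned above:

# option 1: the TF1 compatibility shim
import tensorflow as tf
tf.compat.v1.logging.info("loading vocab...")

# option 2: absl-py, as recommended by the TF 2.0 migration docs
from absl import logging
logging.info("loading vocab...")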

Fine Tune Layer Possible Error in Examples

When creating the fine-tuning layers, I notice we only take the first element of the second dimension of the BERT layer's output. The BERT layer's output dimensions are [batch_size, seq_len, hidden_size], meaning the Lambda operation below (shown in both the GPU and TPU examples) is just taking the first token to do any learning under a classification task. Is this an error? Or perhaps I am misinterpreting the layer architecture here.

  input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
  output = bert(input_ids)
  cls_out = keras.layers.Lambda(lambda seq: seq[:, 0, :])(output)   # first token = [CLS], used as the pooled sentence representation

The num_labels in bert_config doesn't work?

The num_labels in bert_config doesn't work? I want to build a 27-class classifier, but when I add
"num_labels": 28 to bert_config it doesn't work; it still fails with the error:
Received a label value of 26 which is outside the valid range of [0, 2). Label val....
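
As far as bert-for-tf2 is concerned, bert_config.json only configures the encoder; BertModelLayer has no classification head, so the number of classes is set by the output layer you add yourself (the [0, 2) in the error points at a 2-unit output). A sketch of a 27-class head, assuming integer labels 0..26 and an l_bert layer and max_seq_len defined as in the earlier examples:

from tensorflow import keras

input_ids = keras.layers.Input(shape=(max_seq_len,), dtype='int32', name="input_ids")
seq_out   = l_bert(input_ids)
cls_out   = keras.layers.Lambda(lambda seq: seq[:, 0, :])(seq_out)   # [CLS] token
logits    = keras.layers.Dense(27, activation="softmax")(cls_out)    # 27 classes

model = keras.Model(inputs=input_ids, outputs=logits)
model.compile(optimizer=keras.optimizers.Adam(),
              loss="sparse_categorical_crossentropy",                # integer labels 0..26
              metrics=["accuracy"])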
