dbmdz / berts
DBMDZ BERT, DistilBERT, ELECTRA, GPT-2 and ConvBERT models
License: MIT License
Hi guys, tremendous work, I love using your German pretrained models. I was wondering if there is any plan to release a pretrained version of ELECTRA. It seems like a really good approach.
Thanks!
Luca :)
Can you please clarify the license for these models?
Hello, I need the TensorFlow checkpoints for the German BERT model. I saw an older issue with the links, but they no longer work.
Thanks!
Hey,
is the following
The source data for the model consists of a recent Wikipedia dump, EU Bookshop corpus, Open Subtitles, CommonCrawl, ParaCrawl and News Crawl. This results in a dataset with a size of 16GB and 2,350,234,427 tokens.
For sentence splitting, we use spacy. Our preprocessing steps (sentence piece model for vocab generation) follow those used for training SciBERT. The model is trained with an initial sequence length of 512 subwords and was performed for 1.5M steps.
the only information available about the German BERT model? Or is there a paper about it?
I am interested in News Crawl. Do you know which news sources it contains (I assume you used a language classifier to keep only German news)? Does it really include an archive of online news sources such as spiegel.de, WELT or n-tv? I am currently trying to work out how to obtain such German news article text myself.
Thank you!
PS:
What is the difference between these models?
dbmdz/bert-base-german-uncased and
bert-base-german-dbmdz-uncased
Hi, could you please specify in what proportions Wikipedia, OPUS and the OSCAR corpus were used for training ita-bert-xxl? Thanks
Hi,
did you sample each dataset (Wikipedia, Common Crawl, Subtitles etc.) equally during German BERT training?
OpenAI uses unequal sampling, which may lead to better results, as stated in the GPT-3 paper:
Note that during training, datasets are not sampled in proportion to their size, but rather datasets we view as higher-quality are
sampled more frequently, such that CommonCrawl and Books2 datasets are sampled less than once during training, but the other datasets are sampled 2-3 times. This essentially accepts a small amount of overfitting in exchange for higher quality training data
If you did weight the datasets, which parameters did you use?
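The trade-off described in the GPT-3 quote can be made concrete with a small sketch. It assumes that each training example is drawn from a corpus chosen with a fixed probability; the corpus sizes, the weights and the helper names `effective_epochs` and `sample_corpus` are all hypothetical and say nothing about how the dbmdz models were actually trained.

```python
import random


def effective_epochs(corpus_tokens, sampling_weights, training_tokens):
    """Return how many passes each corpus receives during training.

    A corpus drawn with probability w contributes about w * training_tokens
    tokens, so it is seen (w * training_tokens / corpus size) times.
    """
    total_weight = sum(sampling_weights.values())
    return {
        name: (sampling_weights[name] / total_weight) * training_tokens / size
        for name, size in corpus_tokens.items()
    }


def sample_corpus(sampling_weights, rng):
    """Pick the corpus that supplies the next training example."""
    names = list(sampling_weights)
    weights = [sampling_weights[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
corpus_tokens = {"wikipedia": 1_000, "common_crawl": 9_000}
sampling_weights = {"wikipedia": 0.3, "common_crawl": 0.7}

epochs = effective_epochs(corpus_tokens, sampling_weights, training_tokens=10_000)
rng = random.Random(0)
next_source = sample_corpus(sampling_weights, rng)
print(epochs, next_source)
```

With these made-up numbers the small corpus is seen about three times and the large one less than once, which is the pattern the quote describes.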
Greetings dear dbmdz team
I have one question. The HuggingFace page of the model says: "The source data for the Italian BERT model consists of a recent Wikipedia dump and various texts from the OPUS corpora collection... For the XXL Italian models, we use the same training data from OPUS and extend it with data from the Italian part of the OSCAR corpus". Were these datasets originally written in Italian, or are they English texts that were translated?
I'm asking because my medical corpus was originally written in English, and I used the Google Translate API to translate it. I would like to estimate the bias introduced by this operation.
Many thanks,
Cheers
The vocab size declared in config.json, and the related dimension of the model parameters (32102), is greater than the actual number of rows in vocab.txt (31102).
Is there a reason for that, or is the vocabulary file not the correct one?
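The mismatch can be checked locally with a short script. This is a minimal sketch, assuming config.json holds a `vocab_size` key and vocab.txt holds one token per line; the helper name `vocab_size_gap` and the tiny demo files are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def vocab_size_gap(config_path, vocab_path):
    """Return config vocab_size minus the number of tokens in vocab.txt."""
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    with open(vocab_path, encoding="utf-8") as handle:
        # Count every non-empty line as one vocabulary entry.
        num_tokens = sum(1 for line in handle if line.strip("\n"))
    return config["vocab_size"] - num_tokens


# Demo on throwaway files; point the paths at the downloaded model instead.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    vocab_path = Path(tmp) / "vocab.txt"
    config_path.write_text(json.dumps({"vocab_size": 5}), encoding="utf-8")
    vocab_path.write_text("[PAD]\n[UNK]\nciao\n", encoding="utf-8")
    gap = vocab_size_gap(config_path, vocab_path)

print(gap)
```

With the numbers reported above, the gap would be 32102 - 31102 = 1000. A positive gap means the embedding matrix has rows that no token id from vocab.txt ever selects; whether those rows are intentional is a question only the maintainers can answer.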
How can I use the BERT Turkish sentiment cased model to calculate sentiment scores for inputs longer than 512 tokens?
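One common workaround, sketched here as an assumption rather than as the maintainers' recommendation, is to split the token ids into overlapping windows that fit the 512-token limit, score each window separately, and average the scores. The helper names, the overlap of 64 tokens and the placeholder scores are hypothetical; the model call itself is left out.

```python
def split_into_windows(token_ids, max_len=512, overlap=64):
    """Split token ids into overlapping windows that fit the model.

    Two positions per window are reserved for [CLS] and [SEP], which the
    tokenizer or the caller must add afterwards.
    """
    body_len = max_len - 2
    if not 0 <= overlap < body_len:
        raise ValueError("overlap must be in [0, max_len - 2)")
    step = body_len - overlap
    windows = []
    start = 0
    while True:
        windows.append(token_ids[start:start + body_len])
        if start + body_len >= len(token_ids):
            break
        start += step
    return windows


def average_scores(window_scores):
    """Average per-class scores over all windows of one document."""
    count = len(window_scores)
    return [sum(column) / count for column in zip(*window_scores)]


# 1200 dummy token ids stand in for a long tokenized review.
windows = split_into_windows(list(range(1200)))
# Placeholder scores; in practice each window is passed through the model.
document_score = average_scores([[0.2, 0.8], [0.4, 0.6]])
print(len(windows), document_score)
```

Averaging is only one aggregation choice; taking the maximum or weighting windows by length are alternatives that may suit some data better.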
Hello all,
How can we get access to free TPUs via TFRC for pre-training a BERT language model on a specific language?
The blog post below says that we have to sign up at https://services.google.com/fb/forms/tpusignup/ , but that URL seems to be dead.
https://ai.googleblog.com/2017/05/introducing-tensorflow-research-cloud.html
Nothing happens when I click "Apply now" at https://www.tensorflow.org/tfrc
Hi, I don't quite understand how large the training set of bert-ita-xxl-cased is.
The Hugging Face page reports the size of the "training corpus" as 13B tokens; is that the size of the training set or of the entire dataset used?
Thanks
Hi,
I need the TensorFlow checkpoints for the bert-base-turkish-cased model. Could you release the checkpoints?
Awesome work creating another German BERT model trained on rather scientific texts, dbmdz team!
I would like to use your model with Bert-as-Service and would need the TF checkpoints for that. Do you have them lying around somewhere by any chance?
Hi, thanks again for these models! I was trying to use the bert-base-italian-xxl models, but I noticed that there is a discrepancy between the vocabulary size in the config.json file (32102) and the actual size of the vocabulary (31102). Is it possible that the wrong vocabulary is uploaded?
Hi there, thank you for all of the helpful advice on training Transformer models!
In your recent paper German’s Next Language Model, you compare ELECTRA and BERT at various checkpoints. I have tried to do the same thing on my own data. For ELECTRA it is working (saving every 50k steps to a GCS bucket), but for BERT the checkpoints are being overwritten. I have tried setting a value for save_checkpoint_steps, but this still seems to keep only the 5 most recent checkpoints. May I ask how you were able to keep the older checkpoints from being overwritten?
I am using the official BERT repository from Google: https://github.com/google-research/bert
Thanks!
Thanks for sharing.
I want to train a different language model (Hindi).
How did you train your bert-base-italian-* models? Are those steps covered anywhere?
Hi, I'm trying to use dbmdz/bert-base-italian-xxl-cased to create a Keras model for a classification task.
I've followed the documentation, but I keep getting the following error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[5,0] = 102 is not in [0, 2)
[[node functional_1/bert/embeddings/token_type_embeddings/embedding_lookup (defined at /anaconda3/envs/profanity-detector/lib/python3.7/site-packages/transformers/modeling_tf_bert.py:186) ]] [Op:__inference_train_function_29179]
This is the model:
import tensorflow as tf
from transformers import TFBertModel, BertTokenizer

# `constants` is the poster's own module (MAX_SEQ_LENGTH, CLASSES).
bert_model = TFBertModel.from_pretrained("dbmdz/bert-base-italian-xxl-cased")
tokenizer = BertTokenizer.from_pretrained("dbmdz/bert-base-italian-xxl-cased")

# One int32 input tensor per BERT input
input_ids = tf.keras.layers.Input(shape=(constants.MAX_SEQ_LENGTH,), dtype=tf.int32)
token_type_ids = tf.keras.layers.Input(shape=(constants.MAX_SEQ_LENGTH,), dtype=tf.int32)
attention_mask = tf.keras.layers.Input(shape=(constants.MAX_SEQ_LENGTH,), dtype=tf.int32)

seq_output, _ = bert_model({
    "input_ids": input_ids,
    "token_type_ids": token_type_ids,
    "attention_mask": attention_mask
})

# Classification head on top of the mean-pooled sequence output
pooling = tf.keras.layers.GlobalAveragePooling1D()(seq_output)
dropout = tf.keras.layers.Dropout(0.2)(pooling)
output = tf.keras.layers.Dense(constants.CLASSES, activation="softmax")(dropout)

model = tf.keras.Model(
    inputs=[input_ids, token_type_ids, attention_mask],
    outputs=[output]
)
model.compile(optimizer=tf.optimizers.Adam(lr=0.00001), loss='sparse_categorical_crossentropy', metrics=['accuracy'])
My dataset is tokenized by this method:
import logging
import traceback

logger = logging.getLogger(__name__)

def map_to_dict(self, input_ids, attention_masks, token_type_ids, labels):
    return {
        "input_ids": input_ids,
        "token_type_ids": token_type_ids,
        "attention_mask": attention_masks,
    }, labels

def tokenize_sequences(self, tokenizer, max_sequence_length, data, labels):
    try:
        token_ids = []
        token_type_ids = []
        attention_mask = []
        for sentence in data:
            bert_input = tokenizer.encode_plus(
                sentence,
                add_special_tokens=True,  # add [CLS], [SEP]
                max_length=max_sequence_length,  # max length of the text that can go to BERT
                truncation=True,
                pad_to_max_length=True,  # add [PAD] tokens
                return_attention_mask=True  # add attention mask to not focus on pad tokens
            )
            token_ids.append(bert_input["input_ids"])
            token_type_ids.append(bert_input["token_type_ids"])
            attention_mask.append(bert_input["attention_mask"])
        # The tuple order must match map_to_dict's parameter order
        # (the original post passed token_type_ids before attention_mask).
        return tf.data.Dataset.from_tensor_slices((token_ids, attention_mask, token_type_ids, labels)).map(self.map_to_dict)
    except Exception as e:
        stacktrace = traceback.format_exc()
        logger.error("{}".format(stacktrace))
        raise e

ds_train_encoded = tokenize_sequences(tokenizer, 512, X_train, y_train).shuffle(10000).batch(6)
X_train examples:
["Questo video è davvero bellissimo", "La qualità del video non è proprio il massimo"......]
y_train examples:
[[1], [0]...]
I keep getting the error described above:
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[5,0] = 102 is not in [0, 2)
[[node functional_1/bert/embeddings/token_type_embeddings/embedding_lookup (defined at /anaconda3/envs/profanity-detector/lib/python3.7/site-packages/transformers/modeling_tf_bert.py:186) ]] [Op:__inference_train_function_29179]
If I use TFBertForSequenceClassification instead, everything works fine (so I am ruling out tokenization problems).
Can you please provide a solution, or a well-documented guide for using the TFBertModel class inside a Keras model (I cannot find one)?
Thank you
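One possible explanation, offered as an unverified assumption rather than a confirmed diagnosis: the three Input layers are unnamed, so Keras may match the dataset's dict to the model inputs by position after sorting the dict keys instead of by name. Sorted alphabetically, "input_ids" comes second, which is the slot the model declares for token_type_ids; the id 102 at position 0 in the error message would then be a token id landing in the segment embedding lookup, which only accepts 0 and 1. The pure-Python sketch below only simulates that assumed matching; the helper names and the example ids are hypothetical.

```python
# Order in which the post's Keras model declares its (unnamed) inputs.
model_input_order = ["input_ids", "token_type_ids", "attention_mask"]

# One encoded example; 102 is the id reported at position 0 in the error.
features = {
    "input_ids": [102, 2097, 103],
    "token_type_ids": [0, 0, 0],
    "attention_mask": [1, 1, 1],
}


def match_by_sorted_keys(batch, declared_order):
    """ASSUMED behavior: dict values are flattened in sorted-key order and
    then assigned to the declared inputs by position."""
    flattened = [batch[key] for key in sorted(batch)]
    return dict(zip(declared_order, flattened))


def match_by_name(batch, declared_order):
    """What happens when each Input layer's name equals its dict key."""
    return {name: batch[name] for name in declared_order}


mismatched = match_by_sorted_keys(features, model_input_order)
matched = match_by_name(features, model_input_order)
print(mismatched["token_type_ids"])  # token ids end up in the segment slot
print(matched["token_type_ids"])     # segment ids stay where they belong
```

If this is indeed the cause, giving each layer a matching name, for example tf.keras.layers.Input(..., name="input_ids"), should let Keras match the dict keys by name.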
Hi,
thanks for the BERT models. Is there any chance of seeing an uncased version of convbert-base-turkish anytime soon?
BR.
Hello there,
First of all, thank you very much for your awesome work! Such repositories are of great help.
I was wondering if it is possible to get access to TensorFlow checkpoints for the Italian pre-trained version of BERT (or if some are already available). I'm currently looking for a good pre-trained Italian model for research.
Thanks in advance!
Best,
Federico
Hi there,
First, thanks for making and sharing these German Bert models.
I am wondering whether it is possible to fine-tune the models on question answering tasks. If so, what is the procedure to follow?
Thanks in advance.
Dear dbmdz team
I would like to use one of the Italian BERT models that you pre-trained to create a model trained on a specific domain (medical language).
I would like to ask a few things that are not entirely clear to me:
Sorry if these may be trivial questions. Thanks a lot
Hello,
I have 2 short questions:
is it correct that the model 'distilbert-base-german-cased' (https://huggingface.co/distilbert-base-german-cased) was distilled from the model 'dbmdz/bert-base-german-cased' (https://huggingface.co/dbmdz/bert-base-german-cased)?
is there a paper on the 'dbmdz/bert-base-german-cased' and / or the 'distilbert-base-german-cased' (which can also be used for citation purposes)?
Thanks in advance!
Hello, I'm raising this issue to ask whether it is possible to get the weights of bert-base-italian-cased for TensorFlow.
TypeError Traceback (most recent call last)
Cell In[57], line 1
----> 1 TFBertEmbeddings = bert(input_ids,attention_mask = attention_mask)[1]
File ~/work/myenv/lib/python3.11/site-packages/tf_keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
File ~/work/myenv/lib/python3.11/site-packages/transformers/modeling_tf_utils.py:428, in unpack_inputs.<locals>.run_call_with_unpacked_inputs(self, *args, **kwargs)
425 config = self.config
427 unpacked_inputs = input_processing(func, config, **fn_args_and_kwargs)
--> 428 return func(self, **unpacked_inputs)
File ~/work/myenv/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py:1234, in TFBertModel.call(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict, training)
1190 @unpack_inputs
1191 @add_start_docstrings_to_model_forward(BERT_INPUTS_DOCSTRING.format("batch_size, sequence_length"))
1192 @add_code_sample_docstrings(
(...)
1212 training: Optional[bool] = False,
1213 ) -> Union[TFBaseModelOutputWithPoolingAndCrossAttentions, Tuple[tf.Tensor]]:
1214 r"""
1215 encoder_hidden_states (tf.Tensor of shape (batch_size, sequence_length, hidden_size), optional):
1216 Sequence of hidden-states at the output of the last layer of the encoder. Used in the cross-attention if
(...)
1232 past_key_values). Set to False during training, True during generation
1233 """
-> 1234 outputs = self.bert(
1235 input_ids=input_ids,
1236 attention_mask=attention_mask,
1237 token_type_ids=token_type_ids,
1238 position_ids=position_ids,
1239 head_mask=head_mask,
1240 inputs_embeds=inputs_embeds,
1241 encoder_hidden_states=encoder_hidden_states,
1242 encoder_attention_mask=encoder_attention_mask,
1243 past_key_values=past_key_values,
1244 use_cache=use_cache,
1245 output_attentions=output_attentions,
1246 output_hidden_states=output_hidden_states,
1247 return_dict=return_dict,
1248 training=training,
1249 )
1250 return outputs
File ~/work/myenv/lib/python3.11/site-packages/transformers/modeling_tf_utils.py:428, in unpack_inputs.<locals>.run_call_with_unpacked_inputs(self, *args, **kwargs)
425 config = self.config
427 unpacked_inputs = input_processing(func, config, **fn_args_and_kwargs)
--> 428 return func(self, **unpacked_inputs)
File ~/work/myenv/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py:912, in TFBertMainLayer.call(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict, training)
909 if token_type_ids is None:
910 token_type_ids = tf.fill(dims=input_shape, value=0)
--> 912 embedding_output = self.embeddings(
913 input_ids=input_ids,
914 position_ids=position_ids,
915 token_type_ids=token_type_ids,
916 inputs_embeds=inputs_embeds,
917 past_key_values_length=past_key_values_length,
918 training=training,
919 )
921 # We create a 3D attention mask from a 2D tensor mask.
922 # Sizes are [batch_size, 1, 1, to_seq_length]
923 # So we can broadcast to [batch_size, num_heads, from_seq_length, to_seq_length]
924 # this attention mask is more simple than the triangular masking of causal attention
925 # used in OpenAI GPT, we just need to prepare the broadcast dimension here.
926 attention_mask_shape = shape_list(attention_mask)
File ~/work/myenv/lib/python3.11/site-packages/transformers/models/bert/modeling_tf_bert.py:206, in TFBertEmbeddings.call(self, input_ids, position_ids, token_type_ids, inputs_embeds, past_key_values_length, training)
203 raise ValueError("Need to provide either input_ids or input_embeds.")
205 if input_ids is not None:
--> 206 check_embeddings_within_bounds(input_ids, self.config.vocab_size)
207 inputs_embeds = tf.gather(params=self.weight, indices=input_ids)
209 input_shape = shape_list(inputs_embeds)[:-1]
File ~/work/myenv/lib/python3.11/site-packages/transformers/tf_utils.py:163, in check_embeddings_within_bounds(tensor, embed_dim, tensor_name)
153 def check_embeddings_within_bounds(tensor: tf.Tensor, embed_dim: int, tensor_name: str = "input_ids") -> None:
154 """
155 tf.gather, on which TF embedding layers are based, won't check positive out of bound indices on GPU, returning
156 zeros instead. This function adds a check against that dangerous silent behavior.
(...)
161 tensor_name (str, optional): The name of the tensor to use in the error message.
162 """
--> 163 tf.debugging.assert_less(
164 tensor,
165 tf.cast(embed_dim, dtype=tensor.dtype),
166 message=(
167 f"The maximum value of {tensor_name} ({tf.math.reduce_max(tensor)}) must be smaller than the embedding "
168 f"layer's input dimension ({embed_dim}). The likely cause is some problem at tokenization time."
169 ),
170 )
File ~/work/myenv/lib/python3.11/site-packages/keras/src/layers/core/tf_op_layer.py:119, in KerasOpDispatcher.handle(self, op, args, kwargs)
114 """Handle the specified operation with the specified arguments."""
115 if any(
116 isinstance(x, keras_tensor.KerasTensor)
117 for x in tf.nest.flatten([args, kwargs])
118 ):
--> 119 return TFOpLambda(op)(*args, **kwargs)
120 else:
121 return self.NOT_SUPPORTED
File ~/work/myenv/lib/python3.11/site-packages/keras/src/utils/traceback_utils.py:70, in filter_traceback.<locals>.error_handler(*args, **kwargs)
67 filtered_tb = _process_traceback_frames(e.__traceback__)
68 # To get the full stack trace, call:
69 # tf.debugging.disable_traceback_filtering()
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
TypeError: Exception encountered when calling layer 'embeddings' (type TFBertEmbeddings).
Could not build a TypeSpec for name: "tf.debugging.assert_less_2/assert_less/Assert/Assert"
op: "Assert"
input: "tf.debugging.assert_less_2/assert_less/All"
input: "tf.debugging.assert_less_2/assert_less/Assert/Assert/data_0"
input: "tf.debugging.assert_less_2/assert_less/Assert/Assert/data_1"
input: "tf.debugging.assert_less_2/assert_less/Assert/Assert/data_2"
input: "Placeholder"
input: "tf.debugging.assert_less_2/assert_less/Assert/Assert/data_4"
input: "tf.debugging.assert_less_2/assert_less/y"
attr {
key: "summarize"
value {
i: 3
}
}
attr {
key: "T"
value {
list {
type: DT_STRING
type: DT_STRING
type: DT_STRING
type: DT_INT32
type: DT_STRING
type: DT_INT32
}
}
}
of unsupported type <class 'tensorflow.python.framework.ops.Operation'>.
Call arguments received by layer 'embeddings' (type TFBertEmbeddings):
• input_ids=<KerasTensor: shape=(None, 400) dtype=int32 (created by layer 'input_ids')>
• position_ids=None
• token_type_ids=<KerasTensor: shape=(None, 400) dtype=int32 (created by layer 'tf.fill_3')>
• inputs_embeds=None
• past_key_values_length=0
• training=False
I have tried everything I could think of, but nothing has solved the issue.
Hi, thanks a lot for releasing the Turkish BERT model; your work is just amazing. I'm raising this issue per your statement here. A TensorFlow model is especially needed in TF-only environments, and Rasa is one such example. With the HFTransformersNLP component added in v1.8, we can now use Rasa with Hugging Face Transformers models. As I read in the Rasa source code here, they load weights specifically with the TFBertModel class, which is, as expected, unable to load the missing tf_model.h5 for dbmdz/bert-base-turkish-cased. I tried to update the Rasa source code to use AutoTokenizer and AutoModel; however, it seems that this would require a breaking change in several core Rasa features. I think a TF model might also be useful in other TF-only environments. So, would it be possible for you to release it? I would happily collaborate on this if help is needed.
Hello,
Do you plan to release a TensorFlow version of bert-base-turkish-uncased?
Hi, I am wondering how the tokenizer or the German model treats input words with special characters such as "ß", "ö", "ä", "ü".
I have some input sentences in Latin-1 where the special characters are transliterated, e.g. "ß" -> "ss" or "ö" -> "oe". Will training with this data be effective, or do I have to convert them back to "ß", "ö", "ä", "ü"?
Thanks
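A small sketch shows why converting the transliterated text back is not a simple search and replace: the mapping is not invertible. The mapping table and both helper names are hypothetical; the sketch says nothing about how well the model copes with transliterated input, which would have to be measured.

```python
# Common German transliteration, as described in the question.
TRANSLITERATION = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}


def transliterate(text):
    """Replace umlauts and ß with their ASCII digraphs."""
    return "".join(TRANSLITERATION.get(char, char) for char in text)


def naive_restore(text):
    """Blindly map every digraph back; shown only to expose the problem."""
    for original, replacement in TRANSLITERATION.items():
        text = text.replace(replacement, original)
    return text


# Two different words collapse onto the same transliterated form ...
collision = transliterate("Maße") == transliterate("Masse")
# ... and blind restoration damages words that never had an umlaut.
damaged = naive_restore("aktuell")
print(collision, damaged)
```

Restoring the original spelling therefore needs a word list or a spell checker rather than a character map. Comparing tokenizer.tokenize output for both spellings of a few words is a quick way to see how differently the vocabulary splits them.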
Hi DBMDZ team, this is not really an issue but I don't know how else I can get in touch with you.
My team and I are using the dbmdz/bert-base-italian-xxl-cased LLM from HuggingFace for a research project, and we are now writing a paper about our initial results. We would be really glad to cite your work, on which ours is based, but I cannot find any reference paper. Can you point me to one?
By the way, thanks very much for your amazing work.
M.
Hi,
I need access to the TensorFlow checkpoints, since I want to use the weights in an additional layer of my proposed architecture.
Sincerely yours
There is a question mark character in one of the Universal Dependencies datasets which gets wiped out by the tokenizer for the Italian bert & electra models:
https://github.com/UniversalDependencies/UD_Italian-PoSTWITA
warning: big file
https://raw.githubusercontent.com/UniversalDependencies/UD_Italian-PoSTWITA/master/it_postwita-ud-train.conllu
search for "ewww" in the training file
It looks like this if I copy and paste it:
ewww — in viaggio Roma
according to emacs describe-char, it is character 0xFE4FA
Anyway, hopefully that's enough background to figure out which character is causing the problem. If I run the following sentences through the tokenizer with tokenizer.tokenize(sentence) I get the following:
ewww 🐈 — in viaggio Roma # another random character
ewww — in viaggio Roma # to test, maybe need to check that this is the weird character, not just a box
ewww — in viaggio Roma
# i printed the word pieces & their IDs
(['e', '##www', '[UNK]', '—', 'in', 'viaggio', 'Roma'], [126, 18224, 101, 986, 139, 2395, 2097])
(['e', '##www', '—', 'in', 'viaggio', 'Roma'], [126, 18224, 986, 139, 2395, 2097])
(['e', '##www', '—', 'in', 'viaggio', 'Roma'], [126, 18224, 986, 139, 2395, 2097])
The missing word causes confusion for me when trying to correlate the BERT embeddings with the words they represent. Can the tokenizer be fixed to treat that character (or any other strange character) as [UNK] as well?
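Character 0xFE4FA lies in the Supplementary Private Use Area-A, so its Unicode general category is "Co". A plausible but unverified explanation is that the tokenizer's text-cleaning step drops every character whose category starts with "C" before any word splitting happens, which would leave no token behind. The sketch below is a pre-processing workaround under that assumption; the helper name `mark_dropped_characters` and the default placeholder are hypothetical, and whether the tokenizer keeps the placeholder as a single token should be checked before relying on it.

```python
import unicodedata


def mark_dropped_characters(text, placeholder="[UNK]"):
    """Replace characters that text cleaning would likely drop.

    Tab, newline and carriage return are kept; every other character in a
    Unicode "C*" category (control, format, private use, unassigned,
    surrogate) becomes a space-padded placeholder.
    """
    pieces = []
    for char in text:
        if char in "\t\n\r" or not unicodedata.category(char).startswith("C"):
            pieces.append(char)
        else:
            pieces.append(f" {placeholder} ")
    return "".join(pieces)


category = unicodedata.category(chr(0xFE4FA))
sentence = "ewww \U000FE4FA in viaggio Roma"
marked = mark_dropped_characters(sentence)
print(category, marked.split())
```

Running the marked sentence through tokenizer.tokenize should then yield one piece per original word, which keeps embeddings and words aligned.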
Hi,
I read about your German BERT model on Hugging Face. I would like to train a RoBERTa model.
Since I also want to give the work back to the community as open source, and could reference you:
Is it possible to use your German text corpus? You write:
recent Wikipedia dump, EU Bookshop corpus, Open Subtitles, CommonCrawl, ParaCrawl and News Crawl. This results in a dataset with a size of 16GB and 2,350,234,427 tokens.
Would it be possible to provide a BibTeX entry for citing your model in papers and theses?
Even without an official paper, a few models on Hugging Face have a BibTeX entry so that their work can at least be acknowledged.
It's okay to cite the repo URL, but some more information such as the authors would be nice :)
Hello,
for starters thanks for creating the Italian models.
I'm pretty new to the whole transformers / BERT architecture, so forgive me if this question is dumb. Anyway, I have a dataset on which I'm trying to do NLP-based multi-class classification. Each row is a sentence of variable length plus the target class in a separate attribute.
I started by using:
TFBertModel.from_pretrained('bert-base-multilingual-cased')
Since I wasn't entirely happy, I tried changing the argument from multilingual to the dbmdz italian one, but received this error:
AssertionError: Error retrieving file /root/.cache/torch/transformers/076c0d3e6f9148ae7c8bc48e2818d4ff03ec5bc68115b361cfa8b1795b4c9683.h5
I had also tried using AutoModel, but that doesn't seem to fit my case as it's not headless.
Any suggestion is gonna be much appreciated.
Hi, we're using the deepset BERT model together with AllenNLP 0.9.0 (i.e. the older Pytorch-transformers library). Is there an easy way to make the dbmdz models (cased & uncased) usable in that environment? A non-easy way would also do. Thanks a lot!
Hi dbmdz team,
it's me again^^
I just saw that there is a PyTorch model for distilbert-base-german-cased in huggingface's repo. After my last test with the bigger model, we, the IKON team at the FU Berlin's HCC lab, would be super excited to use these models in our application. Did you also run this distillation experiment by any chance, and do you have the TF checkpoints lying around?
size mismatch for classifier.weight: copying a param with shape torch.Size([8, 1024]) from checkpoint, the shape in current model is torch.Size([9, 1024]).
Hello, I would like to know why your config.json file has only 8 labels for the CoNLL-2003 dataset. I think it should have 9 labels.
We are using both Italian and French DBMDZ BERT in a multilingual research project, and I would like to acknowledge its authorship. Is there any publication to be cited? Website project? Or at least the authors' names? Many thanks.
Hello, I am working on my graduation project. I am trying to find the words that best represent a given text, for example the top 3 words. I would like to use your model for this; could you help me with how to proceed? Thanks.
Is the generator counterpart to this model: dbmdz/electra-small-turkish-cased-discriminator available? Thanks!
Hi all, thanks for sharing your models!
I noticed that in the xxl cases, the two Italian models report to have a "vocab_size": 32102 (info taken from the config.json), but the size of the vocab.txt is 31102.
Hi.
How can I train the XXL Italian model for a downstream NER task?
Thank you very much for creating these great BERT models for Italian! I've read that you plan to release the TF checkpoints, which I would greatly appreciate. May I ask when you are planning to release them? Thanks again
Hello,
where can I find the tensorflow checkpoints for the German BERT model?
Thanks in advance!
Hello!
First of all, thanks for this wonderful collection of pretrained models. I wonder what the domain of the corpora used for pretraining BERTurk, DistilBERTurk, ConvBERTurk and ELECTRA is. I would like to cite these models in a scientific publication and give an idea of the domain knowledge made available to the models during pretraining.
Hi!
First off, thanks for making and sharing these models.
Second, is there a license that can be applied to the pre-trained models themselves?
Thanks,
Josh
Hi,
I have one question.
For how many epochs was the BERTurk uncased model trained?
Thanks in advance for your answer.
Hey guys,
great to see another public BERT model for German 👍
Could you say something about how you managed to handle the 16GB of text data in SentencePiece for vocab creation? I get "bad_alloc" all the time when I try to process all of my data. I am aware of VOC_SIZE, INPUT_SENTENCE_SIZE, SHUFFLE_INPUT_SENTENCE, etc., but still, I don't want to use a subset of my data. What's your approach?
Hi.
Could you please release checkpoints for 'bert-base-turkish-128k-uncased' ?
Hi guys.
Could you guys please release TF checkpoints for DistilBERTurk?
I have my own Turkish Albert, though with less than desired performance due to me using only the Wiki dump and some pdfs as original training dataset.
I'd like to use your DistilBERT in my intent&slot prediction TF pipeline to compare the accuracies of these models.