Comments (18)
But when I print model.embeddings.token_type_embeddings, it is Embedding(16, 768).
Which model are you loading?
The pre-trained model chinese_L-12_H-768_A-12.
My code:
import torch
# import path for the pytorch-pretrained-bert package of that era (assumed)
from pytorch_pretrained_bert import BertConfig, BertModel

bert_config = BertConfig.from_json_file('bert_config.json')
model = BertModel(bert_config)
model.load_state_dict(torch.load('pytorch_model.bin'))
The error:
RuntimeError: Error(s) in loading state_dict for BertModel:
size mismatch for embeddings.token_type_embeddings.weight: copying a param of torch.Size([16, 768]) from checkpoint, where the shape is torch.Size([2, 768]) in current model.
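One way to check which side actually carries the 16 (a minimal sketch; the key name is taken from the error message above):
import torch

# Load only the converted weights and inspect the offending tensor.
state_dict = torch.load('pytorch_model.bin', map_location='cpu')
print(state_dict['embeddings.token_type_embeddings.weight'].shape)
# Prints torch.Size([16, 768]) here: the converted checkpoint itself was built
# with type_vocab_size=16, while the model built from bert_config.json uses 2.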
I'm testing the Chinese model.
Do you use the config.json of chinese_L-12_H-768_A-12? Can you send the content of your config.json?
In the config.json of chinese_L-12_H-768_A-12, type_vocab_size is 2. But even when I change config.type_vocab_size to 16, it still errors.
{
"attention_probs_dropout_prob": 0.1,
"directionality": "bidi",
"hidden_act": "gelu",
"hidden_dropout_prob": 0.1,
"hidden_size": 768,
"initializer_range": 0.02,
"intermediate_size": 3072,
"max_position_embeddings": 512,
"num_attention_heads": 12,
"num_hidden_layers": 12,
"pooler_fc_size": 768,
"pooler_num_attention_heads": 12,
"pooler_num_fc_layers": 3,
"pooler_size_per_head": 128,
"pooler_type": "first_token_transform",
"type_vocab_size": 2,
"vocab_size": 21128
}
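For context, type_vocab_size is the number of segment (token type) embeddings; in BERT's embedding layer it becomes the first dimension of token_type_embeddings. A minimal sketch:
import torch.nn as nn

hidden_size = 768      # "hidden_size" in the config above
type_vocab_size = 2    # "type_vocab_size" in the config above

# One embedding row per token type (sentence A vs. sentence B).
token_type_embeddings = nn.Embedding(type_vocab_size, hidden_size)
print(token_type_embeddings.weight.shape)  # torch.Size([2, 768])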
I changed my code:
bert_config = BertConfig.from_json_file('bert_config.json')
bert_config.type_vocab_size = 16  # override before the model is built
model = BertModel(bert_config)
model.load_state_dict(torch.load('pytorch_model.bin'))
It still errors.
I see you have "type_vocab_size": 2 in your config file, how is that?
Yes, but I changed it in my code.
Is your pytorch_model.bin the correctly converted Chinese model (and not an English one)?
I think it's good.
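A quick way to verify that the converted checkpoint is really the Chinese one (a minimal sketch; 21128 is the vocab_size from the Chinese config above, 30522 is the English bert-base vocabulary size):
import torch

state_dict = torch.load('pytorch_model.bin', map_location='cpu')
# The Chinese checkpoint has 21128 word-embedding rows; an English
# bert-base checkpoint would have 30522.
print(state_dict['embeddings.word_embeddings.weight'].shape[0])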
OK, I have the models. I think type_vocab_size should be 2 for Chinese as well. I am wondering why it is 16 in your pytorch_model.bin.
I have no idea. Did my conversion go wrong?
I am testing that right now. I haven't played with the multi-lingual models yet.
I am also using it for the first time. I am looking forward to your test results.
When I was converting the model:
Traceback (most recent call last):
  File "convert_tf_checkpoint_to_pytorch.py", line 95, in <module>
    convert()
  File "convert_tf_checkpoint_to_pytorch.py", line 85, in convert
    assert pointer.shape == array.shape
AssertionError: (torch.Size([16, 768]), (2, 768))
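Here pointer is a parameter of the freshly built PyTorch model and array is the corresponding tensor read from the TF checkpoint, so the assertion says the PyTorch model was constructed with the default type_vocab_size of 16 while the released Chinese weights use 2. A standalone sketch of the failing check, with hypothetical stand-ins for the two sides:
import numpy as np
import torch.nn as nn

# pointer: the PyTorch parameter, shaped by the (defaulted) config -> [16, 768]
pointer = nn.Embedding(16, 768).weight
# array: the tensor as read from the TF checkpoint -> (2, 768)
array = np.zeros((2, 768), dtype=np.float32)

# The conversion script asserts shape equality before copying weights over;
# this is exactly the check that fails here.
assert pointer.shape == array.shape, (pointer.shape, array.shape)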
Are you supplying a config file with "type_vocab_size": 2 to the conversion script?
I used the bert_config.json of chinese_L-12_H-768_A-12 when I was converting.
OK, I think I found the issue: your BertConfig is not built from the configuration file for some reason and thus uses the default value of type_vocab_size in BertConfig, which is 16.
This error happens on my system when I use config = BertConfig('bert_config.json') instead of config = BertConfig.from_json_file('bert_config.json').
I will make sure these two ways of initializing the configuration (from parameters or from a JSON file) cannot be mixed up.
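In other words, a minimal sketch of the two initialization paths (import path assumed for the pytorch-pretrained-bert package of that era):
from pytorch_pretrained_bert import BertConfig  # import path assumed

# Wrong (at the time of this issue): the string lands in an ordinary
# constructor slot, the JSON file is never read, and every field keeps its
# default value, including type_vocab_size=16.
config = BertConfig('bert_config.json')

# Right: explicitly parse the JSON file, so type_vocab_size comes out as 2.
config = BertConfig.from_json_file('bert_config.json')
assert config.type_vocab_size == 2  # sanity check before building the model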
RuntimeError: Error(s) in loading state_dict for BertModel: size mismatch for embeddings.token_type_embeddings.weight: copying a param of torch.Size([16, 768]) from checkpoint, where the shape is torch.Size([2, 768]) in current model.
I have the same problem as you. Did you solve it?