Comments (5)
I can open a PR :) But I'm not sure about setting padding_idx to None; it should be set from config.pad_token_id, as is done in e.g. LLaMA. (Obviously it's your call for BC reasons, but why maintain bugs?)
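For concreteness, a minimal toy comparison of the two patterns (illustrative config values, not the actual modeling code):

from torch import nn

class Cfg:  # tiny stand-in for a model config, values are illustrative
    vocab_size, hidden_size, pad_token_id = 32, 8, 3

cfg = Cfg()

# LLaMA-style construction: the row for pad_token_id is zero-initialized and never receives gradient updates
llama_like = nn.Embedding(cfg.vocab_size, cfg.hidden_size, padding_idx=cfg.pad_token_id)

# current BLOOM construction: no padding_idx, so the pad row is randomly initialized and trainable
bloom_like = nn.Embedding(cfg.vocab_size, cfg.hidden_size)

print(llama_like.weight[cfg.pad_token_id].abs().sum())  # exactly zero
print(bloom_like.weight[cfg.pad_token_id].abs().sum())  # some nonzero value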
Nice catch @PaulLerner! It does indeed sound like a bug; would you like to open a PR for the fix? I think we should add a warning if padding_idx is not None (and initialize it to None by default on the config) to ensure BC and educate users about the consequences. What do you think?
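Something along these lines is what I have in mind (a rough sketch only; the attribute name and the exact message would be settled in the PR):

import warnings

from torch import nn

def build_word_embeddings(config):
    # default to None on the config to preserve BC with existing checkpoints
    padding_idx = getattr(config, "padding_idx", None)
    if padding_idx is not None:
        warnings.warn(
            "padding_idx is set: the embedding row for the pad token will be zero-initialized "
            "and will never be updated during training, unlike in previously released checkpoints."
        )
    return nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=padding_idx)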
Good point, yes! Sounds good to me! Let me know when you open the PR 🙏
Found the same problem in several other models, including more GPT-based ones (it's OK for position embeddings; I'm just showing the output of grep):
src/transformers/models/blip/modeling_blip.py: self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/clip/modeling_clip.py: self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gptj/modeling_gptj.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py: self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gpt_neox_japanese/modeling_gpt_neox_japanese.py: self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gpt_neox/modeling_gpt_neox.py: self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.extra_position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
Do you want me to correct them as well? (I only looked at the models I know that matched the following regex; there may be more.)
grep "\.Embedding(" src/transformers/models/*/*py
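To decide which of these actually need the fix, one quick check is whether each model's default config defines a pad_token_id at all (checkpoint names below are only examples):

from transformers import AutoConfig

for name in ["bigscience/bloom-560m", "EleutherAI/gpt-neox-20b", "openai/clip-vit-base-patch32"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, cfg.pad_token_id)  # None means there is nothing to pass as padding_idx anyway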
Hey, so, after some thought, I'm not sure it makes sense to correct this. It seems like BLOOM was not trained with padding_idx specified:
In [10]: model.transformer.word_embeddings.weight[model.config.pad_token_id]
# would be all zeros if padding_idx had been specified,
# see https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
Out[10]: tensor([ 0.0028, -0.0032,  0.0008,  ..., -0.0020, -0.0012, -0.0015],
                grad_fn=<SelectBackward0>)
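(For context, the transcript above assumes something along these lines; the exact checkpoint should not matter:)

from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
print(model.config.pad_token_id)  # 3 for the BLOOM checkpoints I looked at
print(model.transformer.word_embeddings.weight[model.config.pad_token_id])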
So fixing it would mean that, when instantiating a new BLOOM model, the padding embedding would be correctly initialized, but it would then be overwritten by the pre-trained BLOOM weights anyway.
Also, note that this does not actually affect the loss if the padding tokens are properly masked. (By the way, the BloomForCausalLM doc stating that it's enough to pass labels = input_ids is a bit underspecified, since the loss actually expects pad labels to be set to -100.)
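Concretely, "properly masked" means something like this (a minimal sketch; the checkpoint name is only an example):

from transformers import AutoTokenizer, BloomForCausalLM

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")

batch = tok(["short", "a somewhat longer sentence"], padding=True, return_tensors="pt")
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # cross-entropy ignores positions labelled -100
loss = model(**batch, labels=labels).loss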
Your choice :)