Comments (5)
I can open a PR :) But I'm not sure about setting padding_idx to None; it should be set from config.pad_token_id, as is done in e.g. LLaMA. (Obviously it's your call for BC reasons, but why maintain bugs?)
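For concreteness, a minimal toy comparison of the two patterns (illustrative config values, not the actual modeling code):

from torch import nn

class Cfg:  # tiny stand-in for a model config, values are illustrative
    vocab_size, hidden_size, pad_token_id = 32, 8, 3

cfg = Cfg()

# LLaMA-style construction: the row for pad_token_id is zero-initialized and never receives gradient updates
llama_like = nn.Embedding(cfg.vocab_size, cfg.hidden_size, padding_idx=cfg.pad_token_id)

# current BLOOM construction: no padding_idx, so the pad row is randomly initialized and trainable
bloom_like = nn.Embedding(cfg.vocab_size, cfg.hidden_size)

print(llama_like.weight[cfg.pad_token_id].abs().sum())  # exactly zero
print(bloom_like.weight[cfg.pad_token_id].abs().sum())  # some nonzero value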
Nice catch @PaulLerner! It does indeed sound like a bug; would you like to open a PR for the fix? I think we should add a warning if padding_idx is not None (and initialize it to None by default on the config) to ensure BC and educate users about the consequences. What do you think?
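Something along these lines is what I have in mind (a rough sketch only; the attribute name and the exact message would be settled in the PR):

import warnings

from torch import nn

def build_word_embeddings(config):
    # default to None on the config to preserve BC with existing checkpoints
    padding_idx = getattr(config, "padding_idx", None)
    if padding_idx is not None:
        warnings.warn(
            "padding_idx is set: the embedding row for the pad token will be zero-initialized "
            "and will never be updated during training, unlike in previously released checkpoints."
        )
    return nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=padding_idx)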
Good point, yes! Sounds good to me! Let me know when you open the PR 🙏
Found the same problem in several other models, including more GPT-based ones (it's OK for position embeddings; I'm just showing the output of grep):
src/transformers/models/blip/modeling_blip.py: self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/clip/modeling_clip.py: self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py: self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gptj/modeling_gptj.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py: self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py: self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gpt_neox_japanese/modeling_gpt_neox_japanese.py: self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gpt_neox/modeling_gpt_neox.py: self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py: self.extra_position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
Do you want me to correct them as well? (I only looked at the models I know that matched the following regex; there may be more.)
grep "\.Embedding(" src/transformers/models/*/*py
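To decide which of these actually need the fix, one quick check is whether each model's default config defines a pad_token_id at all (checkpoint names below are only examples):

from transformers import AutoConfig

for name in ["bigscience/bloom-560m", "EleutherAI/gpt-neox-20b", "openai/clip-vit-base-patch32"]:
    cfg = AutoConfig.from_pretrained(name)
    print(name, cfg.pad_token_id)  # None means there is nothing to pass as padding_idx anyway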
Hey, so, after some thought, I'm not sure it makes sense to correct this. It seems like BLOOM was not trained with padding_idx specified:
In [10]: model.transformer.word_embeddings.weight[model.config.pad_token_id]
# would be all zeros if padding_idx had been specified,
# see https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
Out[10]: tensor([ 0.0028, -0.0032,  0.0008,  ..., -0.0020, -0.0012, -0.0015],
                grad_fn=<SelectBackward0>)
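(For context, the transcript above assumes something along these lines; the exact checkpoint should not matter:)

from transformers import BloomForCausalLM

model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")
print(model.config.pad_token_id)  # 3 for the BLOOM checkpoints I looked at
print(model.transformer.word_embeddings.weight[model.config.pad_token_id])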
So fixing it would mean that, when instantiating a new BLOOM model, the padding embedding would be correctly initialized, but it would then be overwritten by the pre-trained BLOOM weights anyway.
Also, note that this does not actually affect the loss if the padding tokens are properly masked. (By the way, the BloomForCausalLM doc stating that it's enough to pass labels = input_ids is a bit underspecified, since the loss actually expects pad labels to be set to -100.)
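Concretely, "properly masked" means something like this (a minimal sketch; the checkpoint name is only an example):

from transformers import AutoTokenizer, BloomForCausalLM

tok = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = BloomForCausalLM.from_pretrained("bigscience/bloom-560m")

batch = tok(["short", "a somewhat longer sentence"], padding=True, return_tensors="pt")
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # cross-entropy ignores positions labelled -100
loss = model(**batch, labels=labels).loss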
Your choice :)