Comments (5)

PaulLerner commented on May 22, 2024

I can open a PR :)
But I'm not sure about setting padding_idx to None; it should be set from config.pad_token_id, as is done in e.g. LLaMA:

self.padding_idx = config.pad_token_id

(Obviously the BC decision is yours, but why keep maintaining bugs?)
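For reference, the pattern in modeling_llama.py looks roughly like this (a sketch of the relevant __init__ lines, not a verbatim copy):

self.padding_idx = config.pad_token_id
self.embed_tokens = nn.Embedding(config.vocab_size, config.hidden_size, self.padding_idx)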

younesbelkada commented on May 22, 2024

Nice catch @PaulLerner! That indeed sounds like a bug; would you like to open a PR for the fix? I think we should add a warning if padding_idx is not None (and initialize it to None by default on the config) to ensure BC and educate users about the consequences - what do you think?
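Something like this rough sketch of the BC-preserving behaviour in the model's __init__ (names and wording are illustrative, not a final API):

import warnings
from torch import nn

# config.pad_token_id would default to None, which preserves today's behaviour;
# setting a real id opts in to padding_idx and triggers a notice about the consequences.
if config.pad_token_id is not None:
    warnings.warn(
        "padding_idx is set from config.pad_token_id: the embedding row for the pad "
        "token is zero-initialized and receives no gradient updates during training."
    )
self.word_embeddings = nn.Embedding(config.vocab_size, config.hidden_size, padding_idx=config.pad_token_id)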

younesbelkada commented on May 22, 2024

Good point, yes! Sounds good to me! Let me know when you open the PR 🙏

PaulLerner commented on May 22, 2024

Found the same problem in

self.tokens_embed = nn.Embedding(config.vocab_size, config.n_embd)

self.wte = nn.Embedding(config.vocab_size, self.embed_dim)

and in more GPT-based models (the position-embedding lines are fine; I'm just showing the raw grep output):

src/transformers/models/blip/modeling_blip.py:        self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/clip/modeling_clip.py:        self.token_embedding = nn.Embedding(config.vocab_size, embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:        self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_bigcode/modeling_gpt_bigcode.py:        self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gptj/modeling_gptj.py:        self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py:        self.wte = nn.Embedding(config.vocab_size, self.embed_dim)
src/transformers/models/gpt_neo/modeling_gpt_neo.py:        self.wpe = nn.Embedding(config.max_position_embeddings, self.embed_dim)
src/transformers/models/gpt_neox_japanese/modeling_gpt_neox_japanese.py:        self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gpt_neox/modeling_gpt_neox.py:        self.embed_in = nn.Embedding(config.vocab_size, config.hidden_size)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py:        self.position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py:        self.embed_tokens = nn.Embedding(config.vocab_size, config.d_model)
src/transformers/models/gptsan_japanese/modeling_gptsan_japanese.py:            self.extra_position_embeddings = nn.Embedding(config.max_position_embeddings, config.d_model)

Do you want me to correct them as well? (I only looked at the models I know that matched the following regex; there may be more.)

grep "\.Embedding(" src/transformers/models/*/*py

PaulLerner commented on May 22, 2024

Hey, so, after some thought, I'm not sure it makes sense to correct this.
It seems like BLOOM was not trained with padding_idx specified:

In [10]: model.transformer.word_embeddings.weight[model.config.pad_token_id]
# this row would be all zeros if padding_idx had been specified, see https://pytorch.org/docs/stable/generated/torch.nn.Embedding.html
Out[10]: 
tensor([ 0.0028, -0.0032,  0.0008,  ..., -0.0020, -0.0012, -0.0015],
       grad_fn=<SelectBackward0>)
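To make the padding_idx behaviour concrete, here is a minimal standalone check (toy sizes, unrelated to BLOOM's actual dimensions):

import torch.nn as nn

emb = nn.Embedding(10, 4, padding_idx=3)
print(emb.weight[3])  # all zeros: the pad row is zero-initialized,
# and per the PyTorch docs its gradient is never updated during training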

So fixing it would mean that, when instantiating a new BLOOM model, the padding embedding would be correctly initialized to zeros, but it would then be overwritten by the pre-trained BLOOM weights anyway.

Also, note that this does not actually affect the loss as long as the padding tokens are properly masked (by the way, the BloomForCausalLM doc stating that it's enough to pass labels = input_ids is a bit underspecified, as the loss actually expects pad labels to be -100):

`labels = input_ids`. Indices are selected in `[-100, 0, ..., config.vocab_size]`. All labels set to `-100` are ignored (masked); the loss is only computed for labels in `[0, ..., config.vocab_size]`.
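In practice, that means masking the pad positions yourself before passing the labels, e.g. (tokenizer stands in for whatever tokenizer produced input_ids):

labels = input_ids.clone()
labels[labels == tokenizer.pad_token_id] = -100  # positions set to -100 are ignored by the loss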

Your choice :)
