Comments (19)
Guys, are [CLS] and [SEP] tokens mandatory for this example?
I noticed that the probability for longer sentences, regardless of how related they are to the same subject, is higher than for shorter ones. For example, I added some random sentences to the end of the first or second part and observed a significant increase in the first logit value. Is there a way to regularize the model for next sentence prediction?
@parth126 have you seen #1788 and is it related to your issue?
Yes it was the same issue. And the solution worked like a charm.
Many thanks @LysandreJik
I think it should work. You should get a [1, 2] tensor of logits where predictions[0, 0] is the score for the next sentence being True and predictions[0, 1] is the score for it being False. So just take the max of the two (or use a softmax to get probabilities).
Did you try it?
The model behaves better on longer sentences, of course (it's mainly trained on 512-token inputs).
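A minimal sketch of that interpretation (the logits tensor below is made up for illustration, not real model output):

import torch

# Suppose `predictions` is the [1, 2] logits tensor returned for one sentence pair.
predictions = torch.tensor([[6.0, -5.7]])           # made-up logits; index 0 = "is next"
probs = torch.softmax(predictions, dim=1)           # logits -> probabilities
is_next = predictions.argmax(dim=1).item() == 0     # True if the "is next" score wins
print(probs, is_next)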
This is not super clear, even wrong in the examples, but there is this note in the docstring for BertModel:
`pooled_output`: a torch.FloatTensor of size [batch_size, hidden_size] which is the output of a
classifier pretrained on top of the hidden state associated to the first character of the
input (`CLF`) to train on the Next-Sentence task (see BERT's paper).
That seems to suggest pretty strongly that you have to put in the CLF token.
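If you are on a recent transformers version, the tokenizer can add [CLS] and [SEP] for you; a rough sketch assuming the encode_plus API (with the older pytorch_pretrained_bert you insert the special tokens by hand, as in the snippets in this thread):

from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
enc = tokenizer.encode_plus("How old are you?", "I am 193 years old", return_tensors='pt')
# The pair is wrapped as [CLS] sentence A [SEP] sentence B [SEP],
# and token_type_ids marks which sentence each token belongs to.
print(tokenizer.convert_ids_to_tokens(enc['input_ids'][0].tolist()))
print(enc['token_type_ids'])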
@pbabvey I am observing the same thing.
Are the probabilities length-normalized?
Those are the logits, because you did not pass the next_sentence_label.
My understanding is that you could apply a softmax and get the probability for the sequence to be a possible sequence.
Sentence 1: How old are you?
Sentence 2: The Eiffel Tower is in Paris
tensor([[-2.3808, 5.4018]], grad_fn=<AddmmBackward>)
Sentence 1: How old are you?
Sentence 2: I am 193 years old
tensor([[ 6.0164, -5.7138]], grad_fn=<AddmmBackward>)
For the first example the probability that the second sentence is a probable continuation is very low.
For the second example the probability is very high (I am looking at the first logit).

I'm getting different scores for the sentences that you have tried. Please advise why; my code is below.
import torch
from transformers import BertTokenizer, BertModel, BertForMaskedLM, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
BertNSP = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')

text1 = "How old are you?"
text2 = "The Eiffel Tower is in Paris"
text1_toks = ["[CLS]"] + tokenizer.tokenize(text1) + ["[SEP]"]
text2_toks = tokenizer.tokenize(text2) + ["[SEP]"]
text = text1_toks + text2_toks
print(text)

indexed_tokens = tokenizer.convert_tokens_to_ids(text1_toks + text2_toks)
segments_ids = [0] * len(text1_toks) + [1] * len(text2_toks)
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
print(indexed_tokens)
print(segments_ids)

BertNSP.eval()
prediction = BertNSP(tokens_tensor, segments_tensors)
prediction = prediction[0]  # tuple to tensor
print(prediction)

softmax = torch.nn.Softmax(dim=1)
prediction_sm = softmax(prediction)
print(prediction_sm)

Output of prediction:
tensor([[ 2.1772, -0.8097]], grad_fn=<AddmmBackward>)
Output of prediction_sm:
tensor([[0.9923, 0.0077]], grad_fn=<SoftmaxBackward>)

Why is the score still high (0.9923) even after applying softmax?
I am facing the same issue. No matter what sentences I use, I always get very high probability of the second sentence being related to the first.
Closing that for now, feel free to reopen if there is another issue.
import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM, BertForNextSentencePrediction
# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
# Tokenized input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
# Define sentence A and B indices associated to 1st and 2nd sentences (see paper)
segments_ids = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
# Convert inputs to PyTorch tensors
tokens_tensor = torch.tensor([indexed_tokens])
segments_tensors = torch.tensor([segments_ids])
# Load pre-trained model (weights)
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()
# Predict whether sentence B is the next sentence
predictions = model(tokens_tensor, segments_tensors )
print(predictions)
tensor([[ 6.3714, -6.3910]], grad_fn=<AddmmBackward>)
How do I interpret this as true or false?
Those are the logits, because you did not pass the next_sentence_label.
My understanding is that you could apply a softmax and get the probability for the sequence to be a possible sequence.
Sentence 1: How old are you?
Sentence 2: The Eiffel Tower is in Paris
tensor([[-2.3808, 5.4018]], grad_fn=<AddmmBackward>)
Sentence 1: How old are you?
Sentence 2: I am 193 years old
tensor([[ 6.0164, -5.7138]], grad_fn=<AddmmBackward>)
For the first example the probability that the second sentence is a probable continuation is very low.
For the second example the probability is very high (I am looking at the first logit)
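A rough sketch of applying that to the logits printed above:

import torch

logits = torch.tensor([[6.3714, -6.3910]])   # the tensor printed for the Jim Henson pair
probs = torch.softmax(logits, dim=1)
print(probs)                                  # close to [[1.0, 0.0]]
print("is next sentence:", probs.argmax(dim=1).item() == 0)   # index 0 = "is next" -> True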
predictions = model(tokens_tensor, segments_tensors)
I run this code more than once; why do I get different results?
Sometimes predictions[0, 0] is higher, and sometimes, for the same sentence pair, predictions[0, 0] is lower.
Maybe your model is not in evaluation mode (model.eval())?
You need to do this to deactivate the dropout modules.
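Something like this, reusing the model and tensors from the snippet above:

model.eval()               # disables dropout, so repeated runs give the same output
with torch.no_grad():      # no gradients needed for inference
    predictions = model(tokens_tensor, segments_tensors)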
It is OK. Thanks a lot.
error:
--> 197     embeddings = words_embeddings + position_embeddings + token_type_embeddings
    198     embeddings = self.LayerNorm(embeddings)
    199     embeddings = self.dropout(embeddings)
The size of tensor a (21) must match the size of tensor b (14) at non-singleton dimension 1
The above issue got resolved when I added a few extra 1's and 0's to make tokens_tensor and segments_tensors the same shape. Just wondering whether I am using it the right way.
My predictions output is a tensor of size 21 x 30522.
As far as I can tell, that example predicts the word at the [MASK] position. Can you also please guide me on how to predict the next sentence?
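For the shape issue, one way to avoid it is to build the segment ids from the same tokenized lists as the token ids, so they always match; and for the next-sentence task, use BertForNextSentencePrediction rather than BertForMaskedLM (whose 21 x 30522 output is per-token vocabulary logits). A rough sketch, assuming pytorch_pretrained_bert as in the example above (sentence strings taken from it; variable names are mine):

import torch
from pytorch_pretrained_bert import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
sent_a = ["[CLS]"] + tokenizer.tokenize("Who was Jim Henson ?") + ["[SEP]"]
sent_b = tokenizer.tokenize("Jim Henson was a puppeteer") + ["[SEP]"]

indexed_tokens = tokenizer.convert_tokens_to_ids(sent_a + sent_b)
segments_ids = [0] * len(sent_a) + [1] * len(sent_b)   # same length as the token ids by construction
assert len(segments_ids) == len(indexed_tokens)

# Next-sentence head instead of the masked-LM head.
model = BertForNextSentencePrediction.from_pretrained('bert-base-uncased')
model.eval()
predictions = model(torch.tensor([indexed_tokens]), torch.tensor([segments_ids]))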
Maybe your model is not in evaluation mode (model.eval())?
You need to do this to deactivate the dropout modules.
@thomwolf Actually, even when I used model.eval() I still got different results. I observed this with every model in the package (BertModel, BertForNextSentencePrediction, etc.). Only when I fixed the length of the input (e.g. to 128) did I get the same results; to do that I had to pad indexed_tokens with 0 so it has a fixed length.
Could you explain why it is like this, or did I make a mistake somewhere?
Thank you so much!
Make sure:
- input_ids, input_mask, and segment_ids have the same length
- the vocabulary file for the tokenizer is from the same config dir as your bert_config.json
I had similar symptoms when the vocab and config were from different BERTs.
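A rough sketch of the first check, assuming the variables (indexed_tokens, segments_ids, BertNSP) from the earlier snippet:

max_len = 128
pad = [0] * (max_len - len(indexed_tokens))

input_ids   = indexed_tokens + pad                  # 0 is the [PAD] token id
input_mask  = [1] * len(indexed_tokens) + pad       # 1 for real tokens, 0 for padding
segment_ids = segments_ids + pad

assert len(input_ids) == len(input_mask) == len(segment_ids) == max_len

outputs = BertNSP(torch.tensor([input_ids]),
                  token_type_ids=torch.tensor([segment_ids]),
                  attention_mask=torch.tensor([input_mask]))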
@LysandreJik thanks for the information