p-lambda / swords


29.0 29.0 6.0 132.38 MB

The Stanford Word Substitution (Swords) Benchmark

Dockerfile 1.07% Jupyter Notebook 13.28% Shell 0.57% Python 85.07%

benchmark lexical-substitution nlp swords writing-assistant

swords's People

caterinalacerra allan-pm smousav9 ahhy3on billyzhang24kobe cynic01

swords's Issues

Bug found in the code for BERT-K and BERT-LS

Hi, I've found a bug in the following method, which BERT-K and BERT-LS use to encode sentences.

**** In "methods/bert.py/_logits_for_dropout_target" ****
with torch.no_grad():
    embeddings = self.bert.embeddings(self.list_to_tensor(context_target_enc))

(Dropout procedure omitted)

logits = self.bert_mlm(inputs_embeds=embeddings)[0][0, target_tok_off]

Here, you use "self.bert.embeddings" to generate the input embeddings (i.e. "embeddings"), but this module returns "token embeddings + token_type_embeddings + positional embeddings". What "self.bert_mlm" expects as "inputs_embeds", however, is the token embeddings only. So I think your code adds "token_type_embeddings" and "positional embeddings" to the token embeddings twice. To fix this, I think the first line needs to be changed to "self.bert.embeddings.word_embeddings(self.list_to_tensor(context_target_enc))".
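To illustrate the double addition, here is a minimal toy sketch in plain Python. It is not the swords code or the real BERT implementation: the tables WORD, POSITION and TOKEN_TYPE and all function names are hypothetical stand-ins, and the real embedding layer also applies LayerNorm and dropout, which are left out here.

```python
# Toy stand-in for BERT's embedding layer. All names and numbers are
# illustrative assumptions; LayerNorm and dropout are intentionally omitted.
WORD = {0: [1.0, 0.0], 1: [0.0, 1.0]}   # token id -> word embedding
POSITION = [[0.1, 0.1], [0.2, 0.2]]      # position index -> positional embedding
TOKEN_TYPE = [0.5, 0.5]                  # embedding of the single segment


def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]


def word_embeddings(ids):
    """Analogue of bert.embeddings.word_embeddings: token lookup only."""
    return [WORD[i] for i in ids]


def full_embeddings(inputs_embeds):
    """Analogue of bert.embeddings: adds positional and token-type terms."""
    return [add(add(e, POSITION[p]), TOKEN_TYPE)
            for p, e in enumerate(inputs_embeds)]


def forward_from_ids(ids):
    """What the model feeds its encoder when given input ids."""
    return full_embeddings(word_embeddings(ids))


def forward_from_embeds(inputs_embeds):
    """What the model feeds its encoder when given inputs_embeds."""
    return full_embeddings(inputs_embeds)


ids = [0, 1]
reference = forward_from_ids(ids)
# Buggy path: full embeddings passed as inputs_embeds, so the positional
# and token-type terms are added a second time.
buggy = forward_from_embeds(full_embeddings(word_embeddings(ids)))
# Fixed path: only the word embeddings are passed as inputs_embeds.
fixed = forward_from_embeds(word_embeddings(ids))

print(fixed == reference)   # the fixed path matches the input-ids path
print(buggy == reference)   # the buggy path does not
```

In this toy, only the path that passes word embeddings alone reproduces what the model would compute from the input ids.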

Best,

Takashi

Is it possible to evaluate a method without Docker?

Legacy metric 'best' is inaccurate

Thanks for your code and contribution. I've run some tests on the legacy metrics, and there appears to be an issue with the legacy 'best' score.
When evaluating the bert-ls model on the 'LS07' test data, I obtained a 'best' score of 1.12, and on the 'ls14' test data a 'best' score of 1.18. These differ significantly from the previous results, so I believe there is an issue with the metric calculation.
