Text NER baseline issues, about asappresearch/slue-toolkit

ankitapasad commented on July 18, 2024

Hi Sid,

That's my bad, I broke it in one of the previous PRs. I have created a new PR, #17, and have tested that text NER training and evaluation work fine. Kindly refer to the PR comments for more details.

We appreciate the updates! Thank you for your patience.

-Ankita

from slue-toolkit.

siddalmia commented on July 18, 2024

Hi @ankitapasad ,

Thanks, I can now verify that the code runs without any issues. However, something strange is happening that I can't explain.

I start by training the model, and the final evaluation values are:

bash baselines/ner/nlp_scripts/ft-deberta.sh deberta-base combined

  eval_overall_accuracy   =      0.987
  eval_overall_f1         =     0.8642
  eval_overall_precision  =     0.8379
  eval_overall_recall     =     0.8922
  eval_runtime            = 0:00:05.90
  eval_samples            =       1753
  eval_samples_per_second =    296.799
  eval_steps_per_second   =      4.741

However, when I run the evaluation script I seem to be getting a really low F1 score.

bash baselines/ner/nlp_scripts/eval-deberta.sh deberta-base dev combined
[micro-averaged F1] Precision: 0.00, recall: 0.01, fscore = 0.00
[micro-averaged F1] Precision: 0.02, recall: 0.49, fscore = 0.04

Is it possible that it's not loading the correct model or something?

Thanks
Sid

sshon-asapp commented on July 18, 2024

I think this might be a label conversion problem, and the line below might be causing it.
In your case train_label should be "combined", but it is hard-coded to "raw".
@ankitapasad can you check this?

slue-toolkit/slue_toolkit/text_ner/ner_deberta.py

Line 30 in f53bf87

train_label="raw",
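To illustrate why a label-set mismatch alone can collapse the score, here is a minimal sketch, not the toolkit's actual scoring code. The `micro_f1` and `to_combined` helpers, the `RAW_TO_COMBINED` mapping, and the toy entities are all hypothetical and only stand in for the idea of a raw tag set and a combined tag set.

```python
from typing import Dict, Set, Tuple

# An entity is (label, start, end). Hypothetical representation for this sketch.
Entity = Tuple[str, int, int]

# Hypothetical raw -> combined mapping, NOT copied from slue-toolkit.
RAW_TO_COMBINED: Dict[str, str] = {
    "GPE": "PLACE",
    "LOC": "PLACE",
    "DATE": "WHEN",
    "TIME": "WHEN",
}


def micro_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact entity matches."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def to_combined(entities: Set[Entity]) -> Set[Entity]:
    """Map raw labels to combined labels; unmapped labels are kept as-is."""
    return {(RAW_TO_COMBINED.get(label, label), s, e) for label, s, e in entities}


# Gold entities in the raw tag set, predictions in the combined tag set.
gold_raw = {("GPE", 0, 1), ("DATE", 4, 6), ("PERSON", 8, 10)}
pred_combined = {("PLACE", 0, 1), ("WHEN", 4, 6), ("PERSON", 8, 10)}

# Spans are all correct, but only PERSON has the same name in both tag sets.
mismatched = micro_f1(gold_raw, pred_combined)
# Once both sides use the same tag set, every entity matches.
matched = micro_f1(to_combined(gold_raw), pred_combined)

print("mismatched:", mismatched)
print("matched:   ", matched)
```

In this toy case the spans are identical in both runs; only the label vocabulary differs, which is consistent with a model that evaluates well during training but scores close to zero in a separate evaluation script.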

ankitapasad commented on July 18, 2024

Hi @siddalmia @sshon-asapp

Yes, that's right, Suwon: the value of train_label must be changed to the appropriate option. Additionally, the evaluation code wasn't handling this combination correctly (training and evaluating on the combined label tags):

slue-toolkit/slue_toolkit/text_ner/ner_deberta_modules.py

Line 451 in f53bf87

if "combined" in self.eval_label:

I added PR #18 to resolve this.
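
For readers following along, the general idea of handling each train/eval label combination explicitly could be sketched as below. This is an assumption about the shape of such a fix, not what PR #18 actually does; `convert_predictions`, `map_tag`, and the mapping argument are hypothetical names, and keeping unmapped labels unchanged is a simplification.

```python
from typing import Dict, List


def map_tag(tag: str, raw_to_combined: Dict[str, str]) -> str:
    """Map one BIO tag such as 'B-GPE' to the combined tag set."""
    if tag == "O":
        return tag
    prefix, label = tag.split("-", 1)
    # Simplification: labels without a mapping are left unchanged.
    return f"{prefix}-{raw_to_combined.get(label, label)}"


def convert_predictions(
    tags: List[str],
    train_label: str,
    eval_label: str,
    raw_to_combined: Dict[str, str],
) -> List[str]:
    """Convert predicted tags only when the train and eval tag sets differ."""
    if train_label == eval_label:
        # Model already predicts in the evaluation tag set: do not convert.
        return list(tags)
    if train_label == "raw" and eval_label == "combined":
        return [map_tag(t, raw_to_combined) for t in tags]
    # A combined label cannot be mapped back to a single raw label.
    raise ValueError(f"unsupported combination: {train_label} -> {eval_label}")


mapping = {"GPE": "PLACE"}  # hypothetical mapping for the example
print(convert_predictions(["B-GPE", "I-GPE", "O"], "raw", "combined", mapping))
print(convert_predictions(["B-PLACE", "O"], "combined", "combined", mapping))
```

The point of branching on both values is that a check on the evaluation label alone cannot distinguish a model trained on raw tags from one trained on combined tags.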

-Ankita

siddalmia commented on July 18, 2024

Thanks @ankitapasad ! It's working now.

bash baselines/ner/nlp_scripts/eval-deberta.sh deberta-base dev combined combined
[micro-averaged F1] Precision: 0.85, recall: 0.89, fscore = 0.87
[micro-averaged F1] Precision: 0.89, recall: 0.92, fscore = 0.90

Just a minor suggestion: maybe you can add a label, such as "standard" or "label", next to each of the two scores so it is clear which is which?
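
Something along these lines would be enough, as a rough sketch. The `format_scores` helper is hypothetical, and which of the two printed lines corresponds to which score name is an assumption here, not something taken from the toolkit.

```python
def format_scores(name: str, precision: float, recall: float, fscore: float) -> str:
    """Format one score line with an explicit name in front of the metric."""
    return (
        f"[{name} micro-averaged F1] "
        f"Precision: {precision:.2f}, recall: {recall:.2f}, fscore = {fscore:.2f}"
    )


# Assumed names for the two lines; the values are the ones printed above.
print(format_scores("standard", 0.85, 0.89, 0.87))
print(format_scores("label", 0.89, 0.92, 0.90))
```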

Thanks
Sid

sshon-asapp commented on July 18, 2024

@siddalmia Thanks for the suggestion. Updated!

Text NER baseline issues about slue-toolkit HOT 6 CLOSED
