
level2_klue-nlp-07's Introduction

KLUE - RE(Relation Extraction)

👋 Team Members

  • 권지은
  • 김재연
  • 박영준
  • 정다혜
  • 최윤진

📌 Roles

  • 권지은 - data preprocessing, comparison experiments for model selection, TAPT, tuning of the studio-ousia/mluke-large model
  • 김재연 - baseline refactoring, custom model built with an entity embedding layer and an LSTM classifier, implementation of the CoRE paper
  • 박영준 - data preprocessing, entity markers, data augmentation, model training
  • 정다혜 - EDA, data preprocessing, subject entity type experiments, ensemble experiments and result analysis
  • 최윤진 - data analysis, literature survey, model training, baseline improvements, collaboration environment setup

📃 Task Overview

Identifying how the words in a sentence relate to one another is a great help in interpreting the sentence's meaning or intent.

Untitled

As in the example in the figure, the summarized information can be used to build and operate a QA system. More generally, summarized linguistic information makes it possible to build efficient systems and services.

Relation Extraction is the problem of predicting the attributes of, and the relation between, the words (entities) in a sentence. It is a core component of knowledge graph construction and is important in natural language processing applications such as structured search, sentiment analysis, question answering, and summarization. By extracting structured triples from unstructured natural language sentences, it summarizes the information and pinpoints the key elements.

The task is to train a model that infers the relation between words in a sentence from information about the sentence and the words. Through this, the model learns concepts by identifying the attributes of words and the relations between them.


📊 EDA

  • Labels 1

  • A histogram of the number of samples per label showed that no_relation is the most frequent label and that the labels are severely imbalanced. We therefore used stratified sampling when implementing the validation split and k-fold, so that the training and validation sets have the same label distribution.

Untitled

  • Subject entity types occur in similar proportions, whereas object entity types are imbalanced.

image
  • After tokenization, most sentence lengths fall between 30 and 50 tokens.
  • The sentence-length distributions of the train and test sets are similar.
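The stratified splitting mentioned above can be sketched as follows. This is a minimal pure-Python illustration, not the team's code: the function name `stratified_kfold`, the number of folds, the seed, and the toy label counts are all assumptions. In practice a library routine such as scikit-learn's `StratifiedKFold` serves the same purpose.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose label proportions roughly match."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_splits)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal each label's samples round-robin so every fold gets its share.
        for pos, idx in enumerate(indices):
            folds[pos % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx


# Toy, made-up label list with the same kind of skew as the real data.
labels = ["no_relation"] * 50 + ["org:top_members/employees"] * 20 + ["per:siblings"] * 5
for train_idx, val_idx in stratified_kfold(labels, n_splits=5):
    print(Counter(labels[i] for i in val_idx))
```

Each validation fold keeps the 10 : 4 : 1 ratio of the full toy list, so a rare label such as per:siblings is never missing from a fold.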

📚 Preprocess

Unifying 〈〉 and 《》 as '

| seed | original f1 | unified f1 |
|---|---|---|
| 5 | 84.401 | 84.634 |
| 11 | 84.543 | 84.523 |
| 42 | 84.567 | 84.305 |
| mean | 84.504 | 84.487 |
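A minimal sketch of the bracket unification, assuming it is a plain character-for-character replacement. The names `BRACKET_TABLE` and `unify_brackets` are hypothetical.

```python
# Map the four angle-bracket characters to a plain single quote.
BRACKET_TABLE = str.maketrans({"〈": "'", "〉": "'", "《": "'", "》": "'"})


def unify_brackets(sentence: str) -> str:
    """Return the sentence with 〈〉 and 《》 replaced by '."""
    return sentence.translate(BRACKET_TABLE)


print(unify_brackets("〈Something〉는 조지 해리슨이 쓰고 비틀즈가 1969년 앨범 《Abbey Road》에 담은 노래다."))
```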

Handling anomalous data

  • During EDA we found anomalous samples in the data.
  • A sample was treated as anomalous if it met either of the following criteria.
    • The subject entity type differs from the first part (the prefix) of the label.
    • The subject entity type is mis-tagged: an organization (ORG) tagged as a person (PER), or vice versa.
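The first criterion above can be checked mechanically. The sketch below assumes labels of the form `org:...` / `per:...` plus `no_relation`, and uses hypothetical field names (`subject_type`, `label`). It only flags samples; what was done with them afterwards is not shown here.

```python
def is_anomalous(subject_type: str, label: str) -> bool:
    """Flag a sample whose label prefix disagrees with its subject entity type."""
    if label == "no_relation":
        return False  # no prefix to compare against
    prefix = label.split(":", 1)[0].upper()  # "org:founded" -> "ORG"
    return prefix != subject_type.upper()


# Made-up samples for illustration only.
samples = [
    {"subject_type": "ORG", "label": "org:founded"},
    {"subject_type": "PER", "label": "org:members"},  # PER subject with an org: label
    {"subject_type": "ORG", "label": "no_relation"},
]
flagged = [s for s in samples if is_anomalous(s["subject_type"], s["label"])]
print(flagged)
```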

Entity marker ์ถ”๊ฐ€

image

์ถœ์ฒ˜:An Improved Baseline for Sentence-level Relation Extraction

์œ„ ๋…ผ๋ฌธ์— ์†Œ๊ฐœ๋œ ๊ธฐ๋ฒ•์„ ์ฐจ์šฉํ•˜์—ฌ ๋ชจ๋ธ์—๊ฒŒ ์—”ํ‹ฐํ‹ฐ์˜ ์œ„์น˜๋ฅผ ์•Œ๋ ค์ค„ ์ˆ˜ ์žˆ๋Š” ๋ฐฉ์•ˆ์„ ๋ชจ์ƒ‰ํ•˜์˜€๋‹ค.

  • Entity marker ์˜ˆ์‹œ
    • ๊ธฐ์กด : ๋น„ํ‹€์ฆˆ [SEP] ์กฐ์ง€ ํ•ด๋ฆฌ์Šจ [SEP] ใ€ˆSomethingใ€‰๋Š” ์กฐ์ง€ ํ•ด๋ฆฌ์Šจ์ด ์“ฐ๊ณ  ๋น„ํ‹€์ฆˆ๊ฐ€ 1969๋…„ ์•จ๋ฒ” ใ€ŠAbbey Roadใ€‹์— ๋‹ด์€ ๋…ธ๋ž˜๋‹ค.
    • Method1: ใ€ˆSomethingใ€‰๋Š” #^PER^์กฐ์ง€ ํ•ด๋ฆฌ์Šจ#์ด ์“ฐ๊ณ  @ORG๋น„ํ‹€์ฆˆ@๊ฐ€ 1969๋…„ ์•จ๋ฒ” ใ€ŠAbbey Roadใ€‹์— ๋‹ด์€ ๋…ธ๋ž˜๋‹ค.
    • Method2: ใ€ˆSomethingใ€‰๋Š” #^person^์กฐ์ง€ ํ•ด๋ฆฌ์Šจ#์ด ์“ฐ๊ณ  @organization๋น„ํ‹€์ฆˆ@๊ฐ€ 1969๋…„ ์•จ๋ฒ” ใ€ŠAbbey Roadใ€‹์— ๋‹ด์€ ๋…ธ๋ž˜๋‹ค.
Method micro_f1 auprc
๊ธฐ์กด 66.2409 68.2921
Method 1 68.2769 69.7847
Method 2 68.6772 72.1613
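The marker insertion shown in the examples above can be sketched as below. The function names and the `(start, end, type)` span format (with `end` exclusive) are assumptions for illustration, and the marker strings are copied from the examples exactly as they appear on this page.

```python
def find_span(sentence, entity, entity_type):
    """Return (start, end, type) for the first occurrence of `entity`; end is exclusive."""
    start = sentence.index(entity)
    return start, start + len(entity), entity_type


def add_typed_markers(sentence, subj, obj):
    """Wrap the subject in @TYPE...@ and the object in #^TYPE^...#."""
    spans = [
        (subj[0], subj[1], f"@{subj[2]}", "@"),
        (obj[0], obj[1], f"#^{obj[2]}^", "#"),
    ]
    # Insert from the rightmost span first so earlier offsets stay valid.
    for start, end, open_mark, close_mark in sorted(spans, reverse=True):
        sentence = (
            sentence[:start] + open_mark + sentence[start:end] + close_mark + sentence[end:]
        )
    return sentence


sentence = "〈Something〉는 조지 해리슨이 쓰고 비틀즈가 1969년 앨범 《Abbey Road》에 담은 노래다."
method_1 = add_typed_markers(
    sentence,
    subj=find_span(sentence, "비틀즈", "ORG"),
    obj=find_span(sentence, "조지 해리슨", "PER"),
)
method_2 = add_typed_markers(
    sentence,
    subj=find_span(sentence, "비틀즈", "organization"),
    obj=find_span(sentence, "조지 해리슨", "person"),
)
print(method_1)
print(method_2)
```

The only difference between the two methods is the type string placed inside the marker: the tag (`PER`, `ORG`) for Method 1 and a natural-language word (`person`, `organization`) for Method 2.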

๋ฐ์ดํ„ฐ ์ฆ๊ฐ•

EDA๋ฅผ ํ†ตํ•ด ๋ฐ์ดํ„ฐ์˜ ๋ถˆ๊ท ํ˜•์ด ์‹ฌํ•˜๋‹ค๋Š” ๊ฒƒ์„ ํ™•์ธํ–ˆ๋‹ค. ๋˜ํ•œ ์•„๋ž˜ ๊ทธ๋ž˜ํ”„๋ฅผ ๋ณด๋ฉด validation dataset์˜ ๊ฒ€์ฆ ๊ฒฐ๊ณผ๋ฅผ ๋ดค์„ ๋•Œ ๊ฐ€์žฅ ๋งŽ์•˜๋˜ no relation์€ prob๊ฐ’์— ๋งŽ์ด ๋“ฑ์žฅํ•œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์—ˆ๊ณ  ๋ฐ˜๋Œ€๋กœ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฑฐ์˜ ์—†์—ˆ๋˜ per: siblings์€ prob๊ฐ’์— ํ•ญ์ƒ 0์— ๊ฐ€๊น๊ฒŒ ๋‚˜ํƒ€๋‚œ ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ๋ถˆ๊ท ํ˜•๊ณผ ํŽธํ–ฅ์„ ํ•ด๊ฒฐํ•˜๊ณ ์ž ๋ฐ์ดํ„ฐ ์ฆ๊ฐ•์„ ์ง„ํ–‰ํ–ˆ๋‹ค. image

ํ•œ๊ตญ์–ด ์ƒํ˜ธ์ฐธ์กฐํ•ด๊ฒฐ์„ ์œ„ํ•œ BERT ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ ์ฆ๊ฐ• ๊ธฐ๋ฒ•์„ ์ฐธ๊ณ ํ•˜์—ฌ MLM์„ ์ด์šฉํ•ด ๋ฐ์ดํ„ฐ์˜ ๋ฌธ๋งฅ์— ์˜ํ–ฅ์„ ๋ผ์น˜์ง€ ์•Š๋Š” ๋‹จ์–ด๋กœ ์น˜ํ™˜ํ•˜์—ฌ ์ฆ๊ฐ•ํ•˜๋Š” ๋ฐฉ๋ฒ•์œผ๋กœ ๋ ˆ๋ฒจ ์˜ˆ์ธก์„ ์ž˜ ๋ชปํ•˜๋˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ฆ๊ฐ•์‹œํ‚ค๋ฉด ๋ถˆ๊ท ํ˜•๊ณผ ํŽธํ–ฅ์ด ํ•ด๊ฒฐ๋  ๊ฒƒ์ด๋‹ค. ๊ธฐ์กด ๊ฒฐ๊ณผ์˜ accuracy๊ฐ€ 0.8 ์ดํ•˜์ธ ๋ผ๋ฒจ๋“ค์— ๋Œ€ํ•ด์„œ ์ฆ๊ฐ•์„ ์ง„ํ–‰ํ•˜์˜€๊ณ , ์—”ํ‹ฐํ‹ฐ๋ฅผ ์ œ์™ธํ•œ ๋‚˜๋จธ์ง€ ํ† ํฐ๋“ค์„ 10ํผ์„ผํŠธ ํ™•๋ฅ ๋กœ maskํ† ํฐ์œผ๋กœ ๋ฐ”๊พผ ํ›„ MLM์„ ์ด์šฉํ•˜์—ฌ ์ฑ„์›Œ ๋„ฃ๋Š” ๋ฐฉ์‹์œผ๋กœ ์ฆ๊ฐ• ํ›„ ์„ฑ๋Šฅ์„ ํ™•์ธํ•˜์˜€๋‹ค.

  • ์‹คํ—˜ ๊ฒฐ๊ณผ
Method micro_f1 auprc
๊ธฐ์กด 72.1022 76.6324
์ฆ๊ฐ• ํ›„ 71.8970 76.5956

🗄️ Model

  • Model test 1: micro F1 scores after fine-tuning each model for 7 epochs with a learning rate of 3e-5. Mixed precision was used.

| Model | batch size 16 | batch size 32 |
|---|---|---|
| studio-ousia/mluke-large | 84.629 | 83.563 |
| xlm-roberta-large | 84.008 | 84.167 |
| kykim/bert-kor-base | 83.205 | 83.405 |
| klue/bert-base | 83.119 | 83.059 |
| sentence-transformers/xlm-r-large-en-ko-nli-ststb | 82.814 | 82.874 |
| kykim/funnel-kor-base | 82.556 | 82.781 |
| snunlp/KR-ELECTRA-discriminator | 82.483 | 81.832 |
| bert-base-multilingual-uncased | 81.167 | 81.319 |
| beomi/KcELECTRA-base-v2022 | 81.145 | 79.932 |
| skt/kogpt2-base-v2 | 78.979 | 79.637 |

  • Model test 2: micro F1 scores after fine-tuning each model for 7 epochs with a learning rate of 5e-5.

| Model | batch size 16 | batch size 32 |
|---|---|---|
| klue/roberta-large | 84.956 | 84.925 |
| monologg/koelectra-base-v3-discriminator | 82.953 | 81.650 |
| monologg/kobert | 57.763 | 57.249 |

Adding an Entity Embedding Layer

  1. Add entity_loc_ids to encode the entity positions

    In the tokenization step, we tokenize the sentence, locate the tokens that belong to each entity, and mark subject tokens with 1 and object tokens with 2.

    Example:

    {
        'input_ids': tensor([[    0, 24380, 12242, 12951,  2386,  2189,     2, 11214,     2, 24380,
             12242, 12951,  2386,  2189,  2259, 11214,  5993,  1761,  2194,  4443,
              2079, 19230,  2628, 27135,  4713,  2138,  3670,  2205,  2507,  2062,
                 2]]), 
        'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
             0, 0, 0, 0, 0, 0, 0]]), 
        'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
             1, 1, 1, 1, 1, 1, 1]]),
        'entity_loc_ids': tensor([[0, 1, 1, 1, 1, 1, 0, 2, 0, 1, 1, 1, 1, 1, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])
    }
    
  2. Build a custom model with a custom embedding layer

    We split the RobertaForSequenceClassification model into the Roberta model and the classifier, added a custom embedding module inside the Roberta model, and added an entity location embedding inside that custom embedding module.
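Step 1 above can be sketched as a sub-sequence search over the token ids. The function name `build_entity_loc_ids` is hypothetical, and marking every occurrence of an entity (including the copy in the `subject [SEP] object [SEP]` prefix) is an assumption chosen because it reproduces the example output shown above.

```python
def build_entity_loc_ids(input_ids, subject_ids, object_ids):
    """Mark subject tokens with 1 and object tokens with 2; everything else is 0."""
    loc_ids = [0] * len(input_ids)

    def mark(pattern, value):
        n = len(pattern)
        for start in range(len(input_ids) - n + 1):
            if input_ids[start:start + n] == pattern:
                loc_ids[start:start + n] = [value] * n

    mark(subject_ids, 1)
    mark(object_ids, 2)
    return loc_ids


# Token ids copied from the example above.
input_ids = [0, 24380, 12242, 12951, 2386, 2189, 2, 11214, 2, 24380,
             12242, 12951, 2386, 2189, 2259, 11214, 5993, 1761, 2194, 4443,
             2079, 19230, 2628, 27135, 4713, 2138, 3670, 2205, 2507, 2062, 2]
entity_loc_ids = build_entity_loc_ids(
    input_ids,
    subject_ids=[24380, 12242, 12951, 2386, 2189],
    object_ids=[11214],
)
print(entity_loc_ids)
```

In the custom model, a tensor like this would be looked up in an extra embedding table with three rows (0, 1, 2) and added to the usual token embeddings. That part is not shown here.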

  • Results

| entity embedding | micro f1 score | auprc |
|---|---|---|
| without | 69.8393 | 72.7379 |
| with | 71.0866 | 76.2210 |

Focal Loss

Introduced to address the label imbalance.

  • Results

Untitled (2)
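For reference, focal loss scales the cross-entropy of each sample by `(1 - p_t) ** gamma`, where `p_t` is the probability assigned to the true class, so well-classified samples contribute less. The sketch below is a single-sample pure-Python version. The value `gamma=2.0` is an assumption, since the value used in the experiments is not stated here.

```python
import math


def cross_entropy(probs, target):
    """Standard cross-entropy for one sample."""
    return -math.log(probs[target])


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)**gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = [0.9, 0.05, 0.05]  # confident, correct prediction for class 0
hard = [0.2, 0.5, 0.3]    # class 0 is the target but gets low probability

for name, probs in [("easy", easy), ("hard", hard)]:
    print(name, cross_entropy(probs, 0), focal_loss(probs, 0))
```

With `gamma=2.0` the easy sample keeps about 1% of its cross-entropy while the hard sample keeps 64%, which shifts the training signal toward hard samples, typically those from under-represented labels.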

Learning Rate Scheduler

The best score was usually reached at around 3k steps, after which training no longer progressed stably, so we introduced a learning rate scheduler.

  • Results

image

| scheduler | micro f1 score | auprc |
|---|---|---|
| linear | 71.0407 | 76.9576 |
| exponential | 71.3977 | 76.4372 |
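The two schedules compared above differ in how fast the learning rate falls. The sketch below shows the basic shapes only. `total_steps`, `gamma`, and per-step decay are hypothetical choices, and any warmup phase is left out.

```python
def linear_lr(step, base_lr, total_steps):
    """Decay linearly from base_lr at step 0 to 0 at total_steps."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


def exponential_lr(step, base_lr, gamma):
    """Multiply the rate by gamma after every step."""
    return base_lr * gamma ** step


base_lr = 1e-5       # learning rate listed for the final model
total_steps = 6000   # hypothetical
gamma = 0.9995       # hypothetical per-step decay factor

for step in (0, 1500, 3000, 4500, 6000):
    print(step, linear_lr(step, base_lr, total_steps), exponential_lr(step, base_lr, gamma))
```

With these illustrative values the exponential schedule drops faster early on and then flattens out, whereas the linear schedule keeps falling at a constant rate until it reaches zero.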

Label Constraints

The model infers the relation from the attention between the entities rather than from the textual context. → entity bias

To address this, we applied the techniques used in the CoRE paper.

The main debiasing method used in the CoRE paper is as follows.

new_preds = (prob + lamb_1 * prob_mask_1 + lamb_2 * prob_mask_2 + label_constraint).argmax(1)
Untitled (3)

Here the metric function is the micro F1 score, and [a, b] is set to [-2, 2].

prob : the probabilities output by the model
prob_mask_1 : the probabilities predicted from the entities alone
prob_mask_2 : a distribution in which the probability of no_relation is set to 1 and all others are padded with 0
label_constraint : a list that restricts which relations each entity type can take

| Label constraint | micro f1 score | auprc |
|---|---|---|
| without | 73.9659 | 77.8363 |
| with | 74.2457 | 78.0171 |
Task Adaptive Pretraining (TAPT)

Following the paper Don't Stop Pretraining: Adapt Language Models to Domains and Tasks, we introduced this technique on the expectation that adapting the pretrained model to the competition dataset would lead to better predictions after fine-tuning.

| Model | perplexity | RE f1 | RE auprc |
|---|---|---|---|
| klue/roberta-large | 331.737 | 71.1024 | 74.2235 |
| klue/roberta-large + TAPT | 4.169 | 72.6778 | 72.7815 |
| studio-ousia/mluke-large | 4.089 | 85.447 | - |
| studio-ousia/mluke-large + TAPT | 3.588 | 85.709 | - |

Using an LSTM layer as the classifier

The original classification head uses a dense layer and a tanh layer. We introduced an LSTM on the assumption that its ability to capture long-term dependencies while retaining short-term memory would improve sequence classification performance.

  • Implementation
    1. Split the RobertaForSequenceClassification model into RobertaModel and the classifier
    2. Replace the classifier with one that adds an LSTM layer → CustomLSTMClassificationHead
    3. Switch the LSTM to a bi-LSTM, which can capture context in both directions, and run the classification experiments

LSTM

  • Results

| Classifier | micro f1 score | auprc |
|---|---|---|
| Baseline | 71.3977 | 76.4372 |
| LSTM | 71.8922 | 75.4229 |
| bi-LSTM | 71.7605 | 76.4502 |

Reflecting the Subject Entity Type

During EDA we confirmed that the 30 labels are determined by the subject entity type (ORG, PER). We also found that the model sometimes predicted labels that do not match the subject entity type, and introduced this step to fix that.

In the output probabilities, we set to 0 the probability of every label that does not match the type of the subject entity, and then rescaled the remaining values so that they sum to 1 again.

  • Results

| Subject entity type applied | micro f1 score | auprc |
|---|---|---|
| without | 73.9659 | 77.8363 |
| with | 74.2457 | 78.0171 |
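The zero-and-renormalize step described above can be sketched as follows. The function name and the four-label toy set are hypothetical, and treating no_relation as compatible with every subject type is an assumption.

```python
def apply_subject_type(probs, labels, subject_type):
    """Zero out labels that do not fit the subject type, then renormalize to sum to 1."""
    prefix = subject_type.lower() + ":"
    kept = [
        p if label == "no_relation" or label.startswith(prefix) else 0.0
        for p, label in zip(probs, labels)
    ]
    total = sum(kept)
    # Fall back to the original distribution if nothing survives the filter.
    return [p / total for p in kept] if total > 0 else list(probs)


labels = ["no_relation", "org:founded", "per:siblings", "per:spouse"]
probs = [0.2, 0.4, 0.3, 0.1]  # made-up model output; argmax is org:founded

adjusted = apply_subject_type(probs, labels, subject_type="PER")
print(adjusted)
```

For a PER subject the mass on org:founded is removed, so the prediction moves to per:siblings, the most probable label that is consistent with the subject type.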

📌 Final Model

  • The best-performing single model was klue/roberta-large with the following options.
    • Loss function: focal loss
    • lr scheduler: exponential
    • LSTM layer added
    • Learning rate: 1e-5
    • Batch size: 32
    • Entity Embedding layer added
    • TAPT
    • label constraint
  • To minimize bias, the final submission used a soft-voting ensemble: the label probabilities from the following 8 outputs are summed and averaged, and the label with the highest averaged probability is chosen as the final prediction.
    • klue/roberta-large + focal loss + exponential lr scheduler + LSTM layer + Entity Embedding layer + label constraint
    • klue/roberta-large + focal loss + exponential lr scheduler + Bi-LSTM layer + Entity Embedding layer
    • klue/roberta-large + focal loss + exponential lr scheduler + LSTM layer + Entity Embedding layer
    • klue/roberta-large + TAPT + focal loss + exponential lr scheduler + embedding layer
    • klue/roberta-large + TAPT + cross entropy
    • klue/roberta-large + focal loss + exponential lr scheduler + Entity Embedding layer + data augmentation
    • xlm-roberta-large + focal loss + exponential lr scheduler + LSTM layer + Entity Embedding layer
    • studio-ousia/mluke-large

| | public | private |
|---|---|---|
| micro-f1 | 75.2196 | 74.7872 |
| auprc | 80.8363 | 82.8973 |
| rank | 8 | 4 |
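The soft-voting rule described above, for a single sample, can be sketched as below. The function name and the three made-up probability vectors are for illustration only; the actual ensemble averaged 8 model outputs.

```python
def soft_vote(model_probs):
    """Average per-label probabilities over models; return (best label index, averages)."""
    n_models = len(model_probs)
    averaged = [sum(col) / n_models for col in zip(*model_probs)]
    best = max(range(len(averaged)), key=averaged.__getitem__)
    return best, averaged


# One row per model, one column per label (made-up numbers).
model_probs = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.5, 0.2],
]
best, averaged = soft_vote(model_probs)
print(best, averaged)
```

Here the first model alone would pick label 0, but the averaged probabilities favour label 1. Unlike hard (majority) voting, soft voting takes each model's confidence into account.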

Untitled (4)

level2_klue-nlp-07's People

Contributors

yunjinchoidev, jlake310, lectura7942
