
Triaffine-nested-ner

Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition [ACL 2022 Findings]. Paper: https://arxiv.org/abs/2110.07480


Environment

All code is tested under Python 3.7, PyTorch 1.7.0, and Transformers 4.6.1. You also need to install opt_einsum for the einsum calculations. A GPU with at least 16 GB of memory is needed for training.
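
A minimal setup sketch, assuming pip and a CUDA-enabled build of PyTorch (the exact wheel for your CUDA version may differ):

pip install torch==1.7.0 transformers==4.6.1 opt_einsum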

Dataset

We only provide a few samples for each dataset. You can download the full GENIA dataset here. There are no direct download links for the other datasets; you need to obtain licenses from the LDC for ACE2004, ACE2005, and KBP2017. You can email me with your LDC licenses to obtain the processed datasets ([email protected]). Please put the data under data/dataset_name; you can also refer to word_embed.generate_vocab_embed for the data paths.
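
The expected layout is roughly as follows. This is a sketch inferred from the --version values used in the reproduce commands below; check word_embed.generate_vocab_embed for the exact file names.

data/
  ace04/
  ace05/
  genia91/
  kbp/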

Extract word embedding

Please download cc.en.300.bin and BioWordVec_PubMed_MIMICIII_d200.bin, then run python word_embed.py to generate the required JSON files. You need to change the paths to the word embedding files in the script first.
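
For reference, the conversion essentially maps each vocabulary word to its embedding vector and dumps the result as JSON. A minimal sketch of the idea, not the actual word_embed.py (the vocabulary source and output file name here are assumptions):

import json
import fasttext

# Load the pretrained fastText binary; adjust the path to your download location.
model = fasttext.load_model("cc.en.300.bin")

# The real script builds the vocabulary from the datasets; this is a placeholder.
vocab = ["protein", "cell"]

# Map each word to its vector and write the mapping as JSON.
embed = {w: model.get_word_vector(w).tolist() for w in vocab}
with open("cc_embed.json", "w") as f:
    json.dump(embed, f)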

Checkpoint

Genia

Reproduce

Note that you may need to change the --bert_name_or_path argument to point to your own model paths.
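
If you do not have a local copy, the corresponding Hugging Face Hub identifier should also work, since Transformers accepts either a local path or a model id (an assumption based on standard Transformers behavior, not verified against this repository):

--bert_name_or_path bert-large-cased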

ace04 + bert

CUDA_VISIBLE_DEVICES=0 python main.py --version ace04 --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/bert-large-cased/ --learning_rate 1e-5 --batch_size 1 --gradient_accumulation_steps 8 --train_epoch 50 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --word_embed cc --char --pos --use_context --warmup_ratio 0.0 --att_dim 256  --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 1e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2 --dp 0.1

ace05 + bert

CUDA_VISIBLE_DEVICES=0 python main.py --version ace05 --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/bert-large-cased/ --learning_rate 3e-5 --batch_size 1 --gradient_accumulation_steps 72 --train_epoch 50 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --word_embed cc --char --pos --use_context --warmup_ratio 0.0 --att_dim 256  --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 5e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2 --dp 0.1

genia + biobert

CUDA_VISIBLE_DEVICES=0 python main.py --version genia91 --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/biobert_v1.1/ --learning_rate 3e-5 --batch_size 1 --gradient_accumulation_steps 48 --train_epoch 15 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --char --pos --use_context --warmup_ratio 0.0  --att_dim 320 --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 5e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2 --dp 0.2

kbp + bert

CUDA_VISIBLE_DEVICES=0 python main.py --version kbp --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/bert-large-cased/ --learning_rate 1e-5 --batch_size 1 --gradient_accumulation_steps 8 --train_epoch 50 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --word_embed cc --char --pos --use_context --warmup_ratio 0.0 --att_dim 256  --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 1e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2 --dp 0.1

ace04 + albert

CUDA_VISIBLE_DEVICES=0 python main.py --version ace04 --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/albert-xxlarge-v2/ --learning_rate 1e-5 --batch_size 1 --gradient_accumulation_steps 8 --train_epoch 10 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --word_embed cc --char --pos --use_context --warmup_ratio 0.0 --att_dim 256  --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 1e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2 --dp 0.1

ace05 + albert

CUDA_VISIBLE_DEVICES=0 python main.py --version ace05 --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/albert-xxlarge-v2/ --learning_rate 1e-5 --batch_size 1 --gradient_accumulation_steps 8 --train_epoch 10 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --word_embed cc --char --pos --use_context --warmup_ratio 0.0 --att_dim 256  --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 1e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2 --dp 0.1

kbp + albert

CUDA_VISIBLE_DEVICES=0 python main.py --version kbp --model SpanAttModelV3 --bert_name_or_path ../pretraining-models/albert-xxlarge-v2/ --learning_rate 3e-5 --batch_size 1 --gradient_accumulation_steps 72 --train_epoch 10 --score tri_affine --truncate_length 192 --word --word_dp 0.2 --word_embed cc --char --pos --use_context --warmup_ratio 0.0 --att_dim 256  --bert_before_lstm --lstm_dim 1024 --lstm_layer 2 --encoder_learning_rate 5e-4 --max_span_count 30 --share_parser --subword_aggr max --init_std 1e-2  --dp 0.2

Citations

@inproceedings{yuan-etal-2022-fusing,
    title = "Fusing Heterogeneous Factors with Triaffine Mechanism for Nested Named Entity Recognition",
    author = "Yuan, Zheng  and
      Tan, Chuanqi  and
      Huang, Songfang  and
      Huang, Fei",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2022",
    month = may,
    year = "2022",
    address = "Dublin, Ireland",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2022.findings-acl.250",
    pages = "3174--3186",
    abstract = "Nested entities are observed in many domains due to their compositionality, which cannot be easily recognized by the widely-used sequence labeling framework.A natural solution is to treat the task as a span classification problem.To learn better span representation and increase classification performance, it is crucial to effectively integrate heterogeneous factors including inside tokens, boundaries, labels, and related spans which could be contributing to nested entities recognition.To fuse these heterogeneous factors, we propose a novel triaffine mechanism including triaffine attention and scoring.Triaffine attention uses boundaries and labels as queries and uses inside tokens and related spans as keys and values for span representations.Triaffine scoring interacts with boundaries and span representations for classification.Experiments show that our proposed method outperforms previous span-based methods, achieves the state-of-the-art $F_1$ scores on nested NER datasets GENIA and KBP2017, and shows comparable results on ACE2004 and ACE2005.",
}


Issues

CUDA out of memory error on another dataset with a 45 GB GPU

Can you suggest anything I can do to run it successfully?

File "model5/Triaffine-nested-ner/model/span_att_v2.py", line 405, in get_class_position
score = contract('bxi,bzrk,ikjr,bxj->bzxr', x3, h_span3, self.parser_list[0].parser1.weight, y3)
File "anaconda3/envs/model5/lib/python3.7/site-packages/opt_einsum/contract.py", line 507, in contract
return _core_contract(operands, contraction_list, backend=backend, **einsum_kwargs)
File "anaconda3/envs/model5/lib/python3.7/site-packages/opt_einsum/contract.py", line 573, in _core_contract
new_view = _tensordot(*tmp_operands, axes=(tuple(left_pos), tuple(right_pos)), backend=backend)
File "anaconda3/envs/model5/lib/python3.7/site-packages/opt_einsum/sharing.py", line 131, in cached_tensordot
return tensordot(x, y, axes, backend=backend)
File "anaconda3/envs/model5/lib/python3.7/site-packages/opt_einsum/contract.py", line 374, in _tensordot
return fn(x, y, axes=axes)
File "anaconda3/envs/model5/lib/python3.7/site-packages/opt_einsum/backends/torch.py", line 54, in tensordot
return torch.tensordot(x, y, dims=axes)
File "anaconda3/envs/model5/lib/python3.7/site-packages/torch/functional.py", line 936, in tensordot
return _VF.tensordot(a, b, dims_a, dims_b) # type: ignore

RuntimeError: CUDA out of memory. Tried to allocate 10.69 GiB (GPU 0; 44.40 GiB total capacity; 34.47 GiB already allocated; 8.00 GiB free; 35.18 GiB reserved in total by PyTorch)
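
A possible mitigation, offered as an untested sketch rather than an official fix: the reproduce commands above expose several flags that bound the number and length of candidate spans, which dominate peak memory in the triaffine contraction; lowering them may help, e.g.:

CUDA_VISIBLE_DEVICES=0 python main.py --version your_dataset ... --truncate_length 128 --max_span_count 20 --batch_size 1 --gradient_accumulation_steps 8

Here "your_dataset" and "..." stand in for your own dataset name and the remaining arguments from the reproduce commands.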

Question about data format

I am trying to apply the model to another corpus. What is your input data format, and how should I preprocess my data before passing it to the model? Thank you.

I'm getting an F1 score of zero

Hello,
First, I want to thank you for making your code publicly available.
I was trying to reproduce your results on the ACE04 dataset (using only the samples provided here, since I don't have access to the full dataset), but I'm getting an F1 score of 0:
Dev_Epoch24 {'p': 0, 'r': 0, 'f1': 0}

Do you have any insights into what might be causing the problem?
Thank you!
