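Chinese Named Entity Recognition
Surveys:
- 命名实体识别方法研究综述 (A Survey of Named Entity Recognition Methods)
2022
http://fcst.ceaj.org/CN/10.3778/j.issn.1673-9418.2112109
- 中文命名实体识别综述 (A Survey of Chinese Named Entity Recognition)
2021
http://fcst.ceaj.org/CN/abstract/abstract2902.shtml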
Papers:
- Boundary Smoothing for Named Entity Recognition
ACL 2022
https://arxiv.org/pdf/2204.12031v1.pdf
https://github.com/syuoni/eznlp
- NFLAT: Non-Flat-Lattice Transformer for Chinese Named Entity Recognition
2022
https://arxiv.org/pdf/2205.05832.pdf
- Unified Structure Generation for Universal Information Extraction
(unifies entity recognition, relation extraction, event extraction, and sentiment analysis)
ACL 2022
https://arxiv.org/pdf/2203.12277.pdf
https://github.com/PaddlePaddle/PaddleNLP/tree/develop/model_zoo/uie
https://github.com/universal-ie/UIE
The following paper is also a general-purpose framework, but it targets English only and reports no experiments on Chinese data:
- DEEPSTRUCT: Pretraining of Language Models for Structure Prediction
2022
https://arxiv.org/pdf/2205.10475v1.pdf
https://github.com/cgraywang/deepstruct
- Parallel Instance Query Network for Named Entity Recognition
2022
https://arxiv.org/pdf/2203.10545v1.pdf
- Delving Deep into Regularity: A Simple but Effective Method for Chinese Named Entity Recognition
NAACL 2022
https://arxiv.org/pdf/2204.05544.pdf
- TURNER: The Uncertainty-based Retrieval Framework for Chinese NER
2022
https://arxiv.org/pdf/2202.09022
- kNN-NER: Named Entity Recognition with Nearest Neighbor Search
2022
https://arxiv.org/pdf/2203.17103
https://github.com/ShannonAI/KNN-NER
- Unified Named Entity Recognition as Word-Word Relation Classification
AAAI 2022
https://arxiv.org/abs/2112.10070
https://github.com/ljynlp/W2NER.git
- MarkBERT: Marking Word Boundaries Improves Chinese BERT
2022
https://arxiv.org/pdf/2203.06378
- MFE-NER: Multi-feature Fusion Embedding for Chinese Named Entity Recognition
2021
https://arxiv.org/pdf/2109.07877
- AdaK-NER: An Adaptive Top-K Approach for Named Entity Recognition with Incomplete Annotations
2021
https://arxiv.org/pdf/2109.05233
- ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information
ACL 2021
https://arxiv.org/pdf/2106.16038
https://github.com/ShannonAI/ChineseBert
- Enhanced Language Representation with Label Knowledge for Span Extraction
EMNLP 2021
https://aclanthology.org/2021.emnlp-main.379.pdf
https://github.com/Akeepers/LEAR
- Lex-BERT: Enhancing BERT based NER with lexicons
ICLR 2021
https://arxiv.org/pdf/2101.00396v1.pdf
- Lexicon Enhanced Chinese Sequence Labeling Using BERT Adapter
ACL 2021
https://arxiv.org/pdf/2105.07148.pdf
https://github.com/liuwei1206/LEBERT
- MECT: Multi-Metadata Embedding based Cross-Transformer for Chinese Named Entity Recognition
ACL 2021
https://arxiv.org/pdf/2107.05418v1.pdf
https://github.com/CoderMusou/MECT4CNER
- Locate and Label: A Two-stage Identifier for Nested Named Entity Recognition
ACL 2021
https://arxiv.org/pdf/2105.06804v2.pdf
https://github.com/tricktreat/locate-and-label
- Dynamic Modeling Cross- and Self-Lattice Attention Network for Chinese NER
AAAI 2021
https://ojs.aaai.org/index.php/AAAI/article/view/17706/17513
https://github.com/zs50910/DCSAN-for-Chinese-NER
- Improving Named Entity Recognition with Attentive Ensemble of Syntactic Information
EMNLP 2020
https://arxiv.org/pdf/2010.15466
https://github.com/cuhksz-nlp/AESINER
- ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations
ACL 2020
https://arxiv.org/pdf/1911.00720v1.pdf
https://github.com/sinovation/ZEN
- A Unified MRC Framework for Named Entity Recognition
ACL 2020
https://arxiv.org/pdf/1910.11476v6.pdf
https://github.com/ShannonAI/mrc-for-flat-nested-ner
- Simplify the Usage of Lexicon in Chinese NER
ACL 2020
https://arxiv.org/pdf/1908.05969.pdf
https://github.com/v-mipeng/LexiconAugmentedNER
- Dice Loss for Data-imbalanced NLP Tasks
ACL 2020
https://arxiv.org/pdf/1911.02855v3.pdf
https://github.com/ShannonAI/dice_loss_for_NLP
- Porous Lattice Transformer Encoder for Chinese NER
COLING 2020
https://aclanthology.org/2020.coling-main.340.pdf
- FLAT: Chinese NER Using Flat-Lattice Transformer
ACL 2020
https://arxiv.org/pdf/2004.11795v2.pdf
https://github.com/LeeSureman/Flat-Lattice-Transformer
- FGN: Fusion Glyph Network for Chinese Named Entity Recognition
2020
https://arxiv.org/pdf/2001.05272v6.pdf
https://github.com/AidenHuen/FGN-NER
- SLK-NER: Exploiting Second-order Lexicon Knowledge for Chinese NER
2020
https://arxiv.org/pdf/2007.08416v1.pdf
https://github.com/zerohd4869/SLK-NER
- Entity Enhanced BERT Pre-training for Chinese NER
EMNLP 2020
https://aclanthology.org/2020.emnlp-main.518.pdf
https://github.com/jiachenwestlake/Entity_BERT
- Named Entity Recognition for Social Media Texts with Semantic Augmentation
EMNLP 2020
https://arxiv.org/pdf/2010.15458v1.pdf
https://github.com/cuhksz-nlp/SANER
- CLUENER2020: Fine-grained Named Entity Recognition Dataset and Benchmark for Chinese
2020
https://arxiv.org/pdf/2001.04351v4.pdf
https://github.com/CLUEbenchmark/CLUENER2020
- ERNIE: Enhanced Representation through Knowledge Integration
2019
https://arxiv.org/pdf/1904.09223v1.pdf
https://github.com/PaddlePaddle/ERNIE
- TENER: Adapting Transformer Encoder for Named Entity Recognition
2019
https://arxiv.org/pdf/1911.04474v3.pdf
https://github.com/fastnlp/TENER
- Chinese NER Using Lattice LSTM
ACL 2018
https://arxiv.org/pdf/1805.02023v4.pdf
https://github.com/jiesutd/LatticeLSTM
- ERNIE 2.0: A Continual Pre-training Framework for Language Understanding
2019
https://arxiv.org/pdf/1907.12412v2.pdf
https://github.com/PaddlePaddle/ERNIE
- Glyce: Glyph-vectors for Chinese Character Representations
NeurIPS 2019
https://arxiv.org/pdf/1901.10125v5.pdf
https://github.com/ShannonAI/glyce
- CAN-NER: Convolutional Attention Network for Chinese Named Entity Recognition
NAACL 2019
https://arxiv.org/pdf/1904.02141v3.pdf
https://github.com/microsoft/vert-papers/tree/master/papers/CAN-NER
https://aclanthology.org/N19-1342/
- Neural Chinese Named Entity Recognition via CNN-LSTM-CRF and Joint Training with Word Segmentation
2019
https://arxiv.org/pdf/1905.01964v1.pdf
https://github.com/rxy007/cnn-lstm-crf
- Chinese Named Entity Recognition Augmented with Lexicon Memory
2019
https://arxiv.org/pdf/1912.08282v2.pdf
https://github.com/dugu9sword/LEMON
- Exploiting Multiple Embeddings for Chinese Named Entity Recognition
2019
https://arxiv.org/pdf/1908.10657v1.pdf
https://github.com/WHUIR/ME-CNER
- Dependency-Guided LSTM-CRF for Named Entity Recognition
EMNLP-IJCNLP 2019
https://arxiv.org/pdf/1909.10148v1.pdf
https://github.com/allanj/ner_with_dependency
- CNN-Based Chinese NER with Lexicon Rethinking
IJCAI 2019
https://www.ijcai.org/proceedings/2019/0692.pdf
- Leverage Lexical Knowledge for Chinese Named Entity Recognition via Collaborative Graph Network
EMNLP-IJCNLP 2019
https://aclanthology.org/D19-1396.pdf
https://github.com/DianboWork/Graph4CNER
- Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning
COLING 2018
https://aclanthology.org/C18-1183.pdf
https://github.com/rainarch/DSNER
- Adversarial Transfer Learning for Chinese Named Entity Recognition with Self-Attention Mechanism
EMNLP 2018
https://aclanthology.org/D18-1017.pdf
https://github.com/CPF-NLPR/AT4ChineseNER
The following papers report no experiments on Chinese, but their methods are **worth borrowing**:
- A Unified Generative Framework for Various NER Subtasks
(uses the BART generative model for named entity recognition)
ACL-IJCNLP 2021
https://arxiv.org/pdf/2106.01223.pdf
https://github.com/yhcc/BARTNER
(The following four papers are prompt-based named entity recognition methods.)
- Template-Based Named Entity Recognition Using BART
https://arxiv.org/abs/2106.01760
https://github.com/Nealcly/templateNER
- Good Examples Make A Faster Learner: Simple Demonstration-based Learning for Low-resource NER
https://arxiv.org/abs/2110.08454
https://github.com/INK-USC/fewNER
- LightNER: A Lightweight Generative Framework with Prompt-guided Attention for Low-resource NER
https://arxiv.org/abs/2109.00720
https://github.com/zjunlp/DeepKE/blob/main/example/ner/few-shot/README_CN.md
- Template-free Prompt Tuning for Few-shot NER
https://arxiv.org/abs/2109.13532
https://github.com/rtmaww/EntLM/
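Datasets:
- MSRA
- Resume
- OntoNotes 4
- OntoNotes 5
- A dataset provided by a company, covering person names, place names, organization names, and proper nouns.
- People's Daily Online (人民网, 2004)
- Entity-annotated data for film/TV, music, and books
- Chinese medical text named entity recognition (CCKS 2020)
- Yidu Cloud (医渡云) entity recognition dataset
- CLUENER2020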
Pretrained models:
- ChineseBERT
- MacBERT
- SpanBERT
- XLNet
- RoBERTa
- BERT
- StructBERT
- WoBERT
- ELECTRA
- ERNIE 1.0
- ERNIE 2.0
- ERNIE 3.0
- NEZHA
- Mengzi
- Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words
- PERT: Pre-Training BERT with Permuted Language Model
Toolkits:
- Stanza
- LAC
- LTP
- HanLP
- FoolNLTK
- NLTK
- BosonNLP
- FudanNLP
- JioNLP
- HarvestText
Shared tasks and other corpora:
- CCKS 2017 evaluation data on Chinese electronic medical records.
Task 1: https://biendata.com/competition/CCKS2017_1/
Task 2: https://biendata.com/competition/CCKS2017_2/
- CCKS 2018 music-domain entity recognition task.
Task: https://biendata.com/competition/CCKS2018_2/
- (CoNLL 2002) Annotated Corpus for Named Entity Recognition.
URL: https://www.kaggle.com/abhinavwalia95/entity-annotated-corpus
- NLPCC 2018 spoken language understanding evaluation for task-oriented dialogue systems.
URL: http://tcci.ccf.org.cn/conference/2018/taskdata.php